Hi,
I have been playing around with Spark for a couple of days. I am
using spark-1.0.1-bin-hadoop1 and the Java API. The main idea of the
implementation is to run Hive queries on Spark. I used JavaHiveContext to
achieve this (as per the examples).
I have 2 questions.
1. I am wondering how I
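A minimal Scala sketch of the setup described above (the Java API mirrors it via JavaHiveContext; the table name src is hypothetical):

  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)
  // hql() parses and plans a HiveQL query and returns a SchemaRDD.
  val result = hiveContext.hql("SELECT key, value FROM src")
  result.take(10).foreach(println)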
For your second question: hql() (as well as sql()) does not launch a
Spark job immediately; instead, it fires off the Spark SQL
parser/optimizer/planner pipeline first, and a Spark job will be
started after a physical execution plan is selected. Therefore,
your hand-rolled end-to-end
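To make the lazy behavior concrete, a minimal Scala sketch (assuming a 1.0/1.1-era HiveContext named hiveContext; the table name src is hypothetical):

  // hql() only runs the parser/optimizer/planner pipeline; no job starts here.
  val query = hiveContext.hql("SELECT key, value FROM src")

  // The selected physical plan can be inspected without launching a job.
  println(query.queryExecution)

  // A Spark job is submitted only when an action such as collect() runs.
  val rows = query.collect()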
Actually, with HiveContext, you can join hive tables with registered
temporary tables.
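A minimal sketch of such a join (assuming the 1.1-style API, where registerTempTable replaced the earlier registerAsTable; the case class and table names are hypothetical):

  case class Record(key: Int, value: String)
  import hiveContext.createSchemaRDD  // implicit RDD-to-SchemaRDD conversion

  val temp = sc.parallelize(1 to 100).map(i => Record(i, "val_" + i))
  temp.registerTempTable("temp_records")

  // The temporary table can be joined directly against a Hive table.
  hiveContext.hql(
    "SELECT t.key, h.value FROM temp_records t JOIN some_hive_table h ON t.key = h.key"
  ).take(10).foreach(println)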
On Fri, Aug 22, 2014 at 9:07 PM, chutium teng@gmail.com wrote:
oops, thanks Yan, you are right, I got:
scala> sqlContext.sql("select * from a join b").take(10)
java.lang.RuntimeException: Table Not Found:
hi, all
I suggest that Spark not use the assembly jar as its default run-time
dependency (spark-submit/spark-class depend on the assembly jar); using a
library directory of all the third-party dependency jars, as hadoop/hive/hbase
do, seems more reasonable.
1. The assembly jar packages all third-party jars into one big jar, so we need
to rebuild this jar if
Hm, are you suggesting that the Spark distribution be a bag of 100
JARs? It doesn't quite seem reasonable. It does not remove version
conflicts, just pushes them to run-time, which isn't good. The
assembly is also necessary because that's where shading happens. In
development, you want to run
Yes. I am not sure exactly what happens when building the assembly jar; in my
understanding it just packages all the dependency jars into one big one.
On 2014/9/2 16:45, Sean Owen wrote:
Hm, are you suggesting that the Spark distribution be a bag of 100
JARs? It doesn't quite seem reasonable. It does not
Sorry, the quick reply didn't cc the dev list.
Sean, sometimes I have to use the spark-shell to confirm some behavior change.
In that case, I have to re-assemble the whole project. Is there another way
around this, not using the big jar in development? For the original question, I
have no
Hi Sean Owen,
here are some problems I ran into when I used the assembly jar:
1. I put spark-assembly-*.jar into the lib directory of my application, and it
threw a compile error:
Error:scalac: Error: class scala.reflect.BeanInfo not found.
scala.tools.nsc.MissingRequirementError: class scala.reflect.BeanInfo not found.
Zongheng pointed out in my SPARK-3329 PR
(https://github.com/apache/spark/pull/2220) that Aaron had already fixed this
issue but that it had gotten inadvertently clobbered by another patch. I don't
know how the project handles this kind of problem, but I've rewritten my
SPARK-3329 branch to
This doesn't help for every dependency, but Spark provides an option to
build the assembly jar without Hadoop and its dependencies. We make use of
this in CDH packaging.
-Sandy
On Tue, Sep 2, 2014 at 2:12 AM, scwf wangf...@huawei.com wrote:
Hi Sean Owen,
here are some problems I ran into when I used
in our hive warehouse there are many tables with a lot of partitions, such as:
scala> hiveContext.sql("use db_external")
scala> val result = hiveContext.sql("show partitions et_fullorders").count
result: Long = 5879
I noticed that this part of the code:
so, i had a meeting w/the databricks guys on friday and they recommended i
send an email out to the list to say 'hi' and give you guys a quick intro.
:)
hi! i'm shane knapp, the new AMPLab devops engineer, and will be spending
time getting the jenkins build infrastructure up to production
Welcome, Shane!
On Tuesday, September 2, 2014, shane knapp skn...@berkeley.edu wrote:
so, i had a meeting w/the databricks guys on friday and they recommended i
send an email out to the list to say 'hi' and give you guys a quick intro.
:)
hi! i'm shane knapp, the new AMPLab devops
Hi Shane!
Thank you for doing the Jenkins upgrade last week. It's nice to know that
infrastructure is gonna get some dedicated TLC going forward.
Welcome aboard!
Nick
On Tue, Sep 2, 2014 at 1:35 PM, shane knapp skn...@berkeley.edu wrote:
so, i had a meeting w/the databricks guys on friday
Having an SSD helps tremendously with assembly time.
Without that, you can do the following so that Spark picks up the freshly
compiled classes ahead of the assembly jar at runtime:
export SPARK_PREPEND_CLASSES=true
On Tue, Sep 2, 2014 at 9:10 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
This doesn't
Hey Shane,
Thanks for your work so far and I'm really happy to see investment in
this infrastructure. This is a key productivity tool for us and
something we'd love to expand over time to improve the development
process of Spark.
- Patrick
On Tue, Sep 2, 2014 at 10:47 AM, Nicholas Chammas
Welcome, Shane. As a former prof and eng dir at Google, I've been expecting
this to be a first-class engineering college subject. I just didn't expect
it to come through this route :-)
So congrats, and I hope you represent the beginning of a great new trend at
universities.
Sent while mobile.
Hi,
I want to incorporate some intelligence into choosing the resources for
RDD replication. I thought that if we replicate an RDD on specially chosen
nodes, based on their capabilities, the next application that requires this
RDD could be executed more efficiently. But I found that an RDD created by an
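For reference, replication today is requested only through a StorageLevel, with no control over which nodes hold the replicas; a minimal Scala sketch:

  import org.apache.spark.storage.StorageLevel

  val rdd = sc.parallelize(1 to 1000000)
  // MEMORY_ONLY_2 keeps two in-memory copies on two different executors,
  // but the scheduler decides which nodes receive them.
  rdd.persist(StorageLevel.MEMORY_ONLY_2)
  rdd.count()  // materialize (and replicate) the RDD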
Yea, SSD + SPARK_PREPEND_CLASSES totally changed my life :)
Maybe we should add a developer notes page to document all these useful
bits of black magic.
On Tue, Sep 2, 2014 at 10:54 AM, Reynold Xin r...@databricks.com wrote:
Having a SSD help tremendously with assembly time.
Without that, you can
SPARK_PREPEND_CLASSES is documented on the Spark Wiki (which could probably be
easier to find):
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
On September 2, 2014 at 11:53:49 AM, Cheng Lian (lian.cs@gmail.com) wrote:
Yea, SSD + SPARK_PREPEND_CLASSES totally
Cool, didn't notice that, thanks Josh!
On Tue, Sep 2, 2014 at 11:55 AM, Josh Rosen rosenvi...@gmail.com wrote:
SPARK_PREPEND_CLASSES is documented on the Spark Wiki (which could
probably be easier to find):
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
On
Welcome Shane =)
- Henry
On Tue, Sep 2, 2014 at 10:35 AM, shane knapp skn...@berkeley.edu wrote:
so, i had a meeting w/the databricks guys on friday and they recommended i
send an email out to the list to say 'hi' and give you guys a quick intro.
:)
hi! i'm shane knapp, the new AMPLab
Welcome Shane! Glad to see that a hero has finally jumped in to tame Jenkins
:)
On Tue, Sep 2, 2014 at 12:44 PM, Henry Saputra henry.sapu...@gmail.com
wrote:
Welcome Shane =)
- Henry
On Tue, Sep 2, 2014 at 10:35 AM, shane knapp skn...@berkeley.edu wrote:
so, i had a meeting w/the
+1
Tested Scala/MLlib apps on Fedora 20 (OpenJDK 7) and OS X 10.9 (Oracle JDK 8).
best,
wb
- Original Message -
From: Patrick Wendell pwend...@gmail.com
To: dev@spark.apache.org
Sent: Saturday, August 30, 2014 5:07:52 PM
Subject: [VOTE] Release Apache Spark 1.1.0 (RC3)
Please
+1
- Tested Thrift server and SQL CLI locally on OSX 10.9.
- Checked datanucleus dependencies in distribution tarball built by
make-distribution.sh without SPARK_HIVE defined.
On Tue, Sep 2, 2014 at 2:30 PM, Will Benton wi...@redhat.com wrote:
+1
Tested Scala/MLlib apps on
Hey guys,
I’m trying to run connected components on graphs that end up running for a
fairly large number of iterations (25-30) and taking 5-6 hours. I find that
more than half the time I end up getting fetch failures and losing an executor
after a number of iterations. Then it has to go back and
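For context, a minimal GraphX sketch of the kind of job described (the edge-list path is hypothetical):

  import org.apache.spark.graphx._

  // Load an edge list and compute connected components; each vertex is
  // labelled with the smallest vertex id in its component.
  val graph = GraphLoader.edgeListFile(sc, "hdfs:///path/to/edges.txt")
  val cc = graph.connectedComponents().vertices
  cc.take(10).foreach(println)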
Hi, I am phoenixlee, a Spark programmer in Korea.
This is a good opportunity: I will be teaching Spark to college students and
office workers.
This course will be run with the support of the government. Can I use the
data (pictures, samples, etc.) on the Spark homepage for this course? Of
I think in general that is fine. It would be great if your slides come with
proper attribution.
On Tue, Sep 2, 2014 at 3:31 PM, Sanghoon Lee phoenixl...@gmail.com wrote:
Hi, I am phoenixlee, a Spark programmer in Korea.
This is a good opportunity: I will be teaching Spark to college students
+1
On Tue, Sep 2, 2014 at 3:08 PM, Cheng Lian lian.cs@gmail.com wrote:
+1
- Tested Thrift server and SQL CLI locally on OSX 10.9.
- Checked datanucleus dependencies in distribution tarball built by
make-distribution.sh without SPARK_HIVE defined.
On Tue, Sep 2, 2014 at
+1
Verified PySpark InputFormat/OutputFormat examples.
On Tue, Sep 2, 2014 at 4:10 PM, Reynold Xin r...@databricks.com wrote:
+1
On Tue, Sep 2, 2014 at 3:08 PM, Cheng Lian lian.cs@gmail.com wrote:
+1
- Tested Thrift server and SQL CLI locally on OSX 10.9.
- Checked
+1
On Tue, Sep 2, 2014 at 5:18 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
+1
Tested on Mac OS X.
Matei
On September 2, 2014 at 5:03:19 PM, Kan Zhang (kzh...@apache.org) wrote:
+1
Verified PySpark InputFormat/OutputFormat examples.
On Tue, Sep 2, 2014 at 4:10 PM, Reynold Xin
+1 Tested on Mac OSX, Thrift Server, SparkSQL
On September 2, 2014 at 17:29:29, Michael Armbrust (mich...@databricks.com)
wrote:
+1
On Tue, Sep 2, 2014 at 5:18 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
+1
Tested on Mac OS X.
Matei
On September 2, 2014 at
+1
From: Patrick Wendell [pwend...@gmail.com]
Sent: Saturday, August 30, 2014 4:08 PM
To: dev@spark.apache.org
Subject: [VOTE] Release Apache Spark 1.1.0 (RC3)
Please vote on releasing the following candidate as Apache Spark version 1.1.0!
The tag to be
+1
and we're back and building!
On Tue, Sep 2, 2014 at 5:07 PM, shane knapp skn...@berkeley.edu wrote:
since our queue is really short, i'm waiting for a couple of builds to
finish and will be restarting jenkins to install/update some plugins. the
github pull request builder looks like it has
+1
Tested on HDP 2.1 Sandbox, Thrift Server with Simba Shark ODBC
Paolo
From: Jeremy Freeman freeman.jer...@gmail.com
Sent: Wednesday, September 3, 2014, 02:34
To: d...@spark.incubator.apache.org
+1
In light of the discussion on SPARK-, I'll revoke my -1 vote. The
issue does not appear to be serious.
On Sun, Aug 31, 2014 at 5:14 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
-1: I believe I've found a regression from 1.0.2. The report is captured
in SPARK-
Yea, SSD + SPARK_PREPEND_CLASSES is great for iterative development!
Then why is it OK with a bag of third-party jars but an error is thrown with
the assembly jar? Does anyone have an idea?
On 2014/9/3 2:57, Cheng Lian wrote:
Cool, didn't notice that, thanks Josh!
On Tue, Sep 2, 2014 at 11:55 AM, Josh Rosen
Thanks everyone for voting on this. Two issues (one of them a blocker)
were found that warrant cutting a new RC. For those who voted
+1 on this release, I'd encourage you to +1 rc4 when it comes out
unless you have been testing issues specific to the EC2 scripts. This
will move the
On Sun, Aug 31, 2014 at 8:27 PM, Ian O'Connell i...@ianoconnell.com wrote:
I'm not sure what you mean here? Parquet is, at its core, just a format; you
could store that data anywhere.
Though it sounds like you're saying, correct me if I'm wrong: you basically
want a columnar abstraction layer