Hi all,
I'm new to Spark and I'm trying to play with it in order to understand how
it works.
So I began by running the LocalPi and SparkPi examples on my laptop in
local mode.
I noticed that LocalPi is 3 times faster than SparkPi, which is supposed to
be multi-threaded.
Furthermore I have the
I am trying to set up Spark with YARN 2.2.0. My Hadoop is the plain Hadoop from
the Apache Hadoop website. When I build with SBT against 2.2.0 it fails, while it
compiles with a lot of warnings when I try against Hadoop 2.0.5-alpha.
How can I compile Spark against YARN 2.2.0?
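For reference, the SBT incantation that era of Spark documented for YARN builds looked like the following; the Hadoop version value here is an assumption, and 2.2.0 only builds once the new YARN support lands:

```shell
# Hedged sketch of the Spark 0.8-era YARN build flags. Swap the Hadoop
# version for whatever your cluster actually runs (value is an assumption).
SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
```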
There is a related thread here:
I don't think YARN 2.2 is supported in 0.8, and very soon it will not be
supported in master either. Read this thread:
http://mail-archives.apache.org/mod_mbox/spark-dev/201312.mbox/browser
On Thu, Dec 12, 2013 at 4:24 PM, Pinak Pani
nishant.has.a.quest...@gmail.com wrote:
I am trying to setup
Do you mean it has been decided not to support YARN 2.2 in any future
release of version 0.8?
http://mail-archives.apache.org has a big usability issue: you do not get a URL
at the thread level, only at the month level. Can you please tell me the subject
of the mail you are referring to? I will search in the
Hey,
On Thu, Dec 12, 2013 at 5:10 PM, Pinak Pani
nishant.has.a.quest...@gmail.com wrote:
Do you mean it has been decided not to support YARN 2.2 in any future
release of version 0.8?
Well, AFAIK. But it might get into 0.9.
http://mail-archives.apache.org has big usability issue. You do not
Alright. Thanks, guys. So, what version of Hadoop is currently supported
by Spark? Also, I am not a Hadoop person; is it possible to access HDFS in
Spark without YARN?
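For what it's worth, HDFS access does not require YARN: any hdfs:// URL works from local or standalone mode. A minimal sketch, where the master URL, namenode host/port, and file path are all hypothetical placeholders:

```scala
// Hedged sketch: reading HDFS from Spark without YARN. Requires a running
// namenode; every name below (host, port, path) is made up for illustration.
import org.apache.spark.SparkContext

val sc = new SparkContext("local[2]", "hdfs-demo")   // no YARN involved
val lines = sc.textFile("hdfs://namenode:9000/data/input.txt")
println(lines.count())
```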
On Thu, Dec 12, 2013 at 5:19 PM, Prashant Sharma scrapco...@gmail.com wrote:
Hey,
On Thu, Dec 12, 2013 at 5:10 PM, Pinak
Hi,
I'm reading through the STDERR logs of my slaves, and about 1/4 of them
don't actually start. Instead, the only thing on the log is the command
that should have launched the process. Thoughts?
Thanks
Obviously it depends on
what is missing, but if I were you, I'd try monkey patching pyspark with
the functionality you need first (along with submitting a pull request,
of course). The pyspark code is very readable, and a lot of
functionality just builds on top of a few primitives, as in the
Hi all,
I've had smashing success with Spark 0.7.x with this code, and this same
code on Spark 0.8.0 using a smaller data set. However, when I try to use a
larger data set, some strange behavior occurs.
I'm trying to do L2 regularization with Logistic Regression using the new
ML Lib.
Reading
Hello,
When trying to read from a file, sc.textFile() hangs for exactly one minute.
From the spark shell,
scala> val v = sc.textFile("README.txt")  // Hangs for one minute
After one minute the command successfully returns the result. Now, v.count also
blocks for one minute but returns the
How big is your data set?
Did you set SPARK_MEM and SPARK_WORKER_MEMORY environmental variables?
On Thu, Dec 12, 2013 at 9:07 AM, Walrus theCat walrusthe...@gmail.com wrote:
Hi all,
I've had smashing success with Spark 0.7.x with this code, and this same
code on Spark 0.8.0 using a smaller
Yeah, I’m curious which APIs you found missing in Python. I know we have a lot
on the Scala side that aren’t yet in there, but I’m not sure how to prioritize
them. If you do want to call Python from Scala, you can also use the RDD.pipe()
operation to pass data through an external process. However
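The mechanism behind RDD.pipe() is plain line-oriented stdin/stdout with an external process; the same plumbing can be sketched with scala.sys.process (the choice of `tr` as the external command is arbitrary):

```scala
import scala.sys.process._
import java.io.ByteArrayInputStream

// RDD.pipe(cmd) writes each element to cmd's stdin, one per line, and turns
// cmd's stdout lines into the resulting RDD. The same idea without Spark:
val input = Seq("spark", "pipe", "demo").mkString("\n")
val output =
  ("tr a-z A-Z" #< new ByteArrayInputStream(input.getBytes("UTF-8"))).!!
val results = output.split("\n").toSeq   // upper-cased lines, in order
```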
How long did they run for? The JVM takes a few seconds to start up and compile
code, not to mention that Spark takes some time to initialize too, so you won’t
see a major difference unless the application is taking longer. One other
problem in this job is that it might use Math.random(), which
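On the Math.random() point: it delegates to a single shared java.util.Random, so every thread contends on one generator's atomic seed update. A per-thread source avoids that; this sketch uses ThreadLocalRandom, which assumes Java 7+:

```scala
import java.util.concurrent.ThreadLocalRandom

// Math.random() funnels all threads through one shared java.util.Random,
// a contention point in multi-threaded loops like SparkPi's sampling.
// A per-thread generator has no shared state to fight over:
val x = ThreadLocalRandom.current().nextDouble()  // uncontended
val y = Math.random()                             // shared generator
```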
The hadoopFile method reuses the Writable object between records that it reads
by default, so you get back the same object. You should clone them if you need
to cache them. This is kind of an unintuitive behavior that we’ll probably need
to turn off by default; it’s helpful when you don’t need
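The pitfall is easy to reproduce without Hadoop at all; this plain-Scala sketch (the Record class and values are hypothetical) shows why cached references to a reused record all end up holding the last value:

```scala
// A reader that reuses ONE mutable record per read, the way hadoopFile
// reuses its Writable instances:
class Record { var value: Int = 0 }
val shared = new Record

// Caching without cloning stores three references to the same object,
// so every entry reflects the last value written:
val cached = Iterator(1, 2, 3)
  .map { i => shared.value = i; shared }
  .toList.map(_.value)                    // all see the final value

// Cloning each record as it arrives preserves the distinct values:
val cloned = Iterator(1, 2, 3)
  .map { i => shared.value = i; shared }
  .map { r => val c = new Record; c.value = r.value; c }
  .toList.map(_.value)
```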
Hi Philip,
I got this bit of code to work in the spark-shell using scala against our dev
hbase cluster.
-bash-4.1$ export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cloudera/parcels/CDH/lib/hbase/hbase.jar:/opt/cloudera/parcels/CDH/lib/hbase/conf:/opt/cloudera/parcels/CDH/lib/hadoop/conf
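With that classpath in place, an HBase read from the spark-shell is typically newAPIHadoopRDD with TableInputFormat. A hedged sketch of what such code might look like (the table name is hypothetical, and `sc` is the shell's SparkContext; this is not the sender's actual code, which was not included):

```scala
// Assumes the HBase jars and conf are on SPARK_CLASSPATH as exported above.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")  // hypothetical table

val rows = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
println(rows.count())
```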
I'm going to try what Ewen suggested--the Python wrappers seem pretty straightforward to understand and very readable. In particular, I am interested in SparkContext.hadoopRDD() and RDD.saveAsTextFile() (with compression). To elaborate on the first count, I'd like to be able to take XML files in
You might also check the spark/work/ directory for application (Executor)
logs on the slaves.
On Tue, Nov 19, 2013 at 6:13 PM, Umar Javed umarj.ja...@gmail.com wrote:
I have a scala script that I'm trying to run on a Spark standalone cluster
with just one worker (existing on the master node).
See if there are any logs on the slaves that suggest why the tasks are
failing. Right now the master log is just saying some stuff is failing
but it's not clear why.
On Thu, Dec 12, 2013 at 9:36 AM, Taka Shinagawa taka.epsi...@gmail.com wrote:
How big is your data set?
Did you set SPARK_MEM
Ah, got it, makes a lot more sense now. I couldn't figure out what w was;
I should have figured it was weights.
As Evan suggested, using zip is almost certainly what you want.
val pointsAndWeights: RDD[(Double,Double)] = ...
zipping together id_x and id_w will give you exactly that, but maybe
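The shape of the result is the same as Seq.zip in plain Scala, which makes the idea easy to see (the id_x / id_w values here are made up):

```scala
// RDD.zip pairs the i-th element of one RDD with the i-th of the other,
// exactly as Seq.zip does here:
val id_x = Seq(1.0, 2.0, 3.0)    // feature values, hypothetical
val id_w = Seq(0.5, 0.25, 0.1)   // weights, hypothetical
val pointsAndWeights: Seq[(Double, Double)] = id_x.zip(id_w)
```

Note that RDD.zip additionally requires both RDDs to have the same number of partitions and the same number of elements in each partition.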
Thanks, Matei. I expected something along these lines.
Robert
On Fri, Dec 13, 2013 at 5:28 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
The hadoopFile method reuses the Writable object between records that it
reads by default, so you get back the same object. You should clone them if
When I call rdd.saveAsTextFile("hdfs://...") it uses my username to
write to the HDFS drive. If I try to write to an HDFS directory that I
do not have permissions to, then I get an error like this:
Permission denied: user=me, access=WRITE,
inode=/user/you/:you:us:drwxr-xr-x
I can obviously
Hey Philip,
How do you get Spark to write to HDFS with your user name? When I use Spark
it writes to HDFS as the user that runs the Spark services... I wish it
read and wrote as me.
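If the cluster uses Hadoop's default simple authentication (an assumption; this does not apply under Kerberos), HDFS trusts the client-supplied identity, which can be overridden per process before launching Spark:

```shell
# Assumption: simple (non-Kerberos) Hadoop authentication. The user name
# is hypothetical; HDFS will record writes as this user.
export HADOOP_USER_NAME=me
# ...then start spark-shell (or your driver) from this same shell.
```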
On Thu, Dec 12, 2013 at 6:37 PM, Philip Ogren philip.og...@oracle.com wrote:
When I call