java.lang.NoClassDefFoundError: org/apache/spark/deploy/worker/Worker

2014-05-18 Thread Hao Wang
Hi, all *Spark version: bae07e3 [behind 1] fix different versions of commons-lang dependency and apache/spark#746 addendum* I have six worker nodes, and four of them hit this NoClassDefFoundError when I use the start-slaves.sh script on my driver node. However, running ./bin/spark-class

Re: breeze DGEMM slow in spark

2014-05-18 Thread wxhsdp
Hi, xiangrui, I checked the stderr of the worker node; yes, it says "failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS"... What do you mean by include breeze-natives or netlib:all? Things I've already done: 1. add the breeze and breeze-natives dependencies in the sbt build file
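For reference only (not from the original thread), a minimal build.sbt sketch of such a dependency block might look like the following; the version number is illustrative:

    // build.sbt sketch -- the version is illustrative, not taken from the thread
    libraryDependencies ++= Seq(
      "org.scalanlp" %% "breeze"         % "0.7",
      // breeze-natives pulls in the netlib-java native BLAS/LAPACK bindings
      // behind com.github.fommil.netlib.NativeSystemBLAS
      "org.scalanlp" %% "breeze-natives" % "0.7"
    )

Without breeze-natives (or netlib:all) on the runtime classpath, netlib-java falls back to its reference Java implementation, which is typically much slower for DGEMM.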

Re: breeze DGEMM slow in spark

2014-05-18 Thread wxhsdp
Hi, xiangrui, you said: "It doesn't work if you put the netlib-native jar inside an assembly jar. Try to mark it provided in the dependencies, and use --jars to include them with spark-submit. -Xiangrui" I'm not using an assembly jar that contains everything, and I also mark breeze
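As context for the quoted advice (this is an assumption about what "mark it provided" and --jars amount to, not code from the thread):

    // illustrative build.sbt fragment: keep the native bindings out of the assembly jar
    libraryDependencies += "org.scalanlp" %% "breeze-natives" % "0.7" % "provided"
    // then ship that jar separately at submit time, roughly:
    //   spark-submit --jars breeze-natives_2.10-0.7.jar ...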

Re: Configuring Spark for reduceByKey on on massive data sets

2014-05-18 Thread lukas nalezenec
Hi, try using *reduceByKeyLocally*. Regards, Lukas Nalezenec. On Sun, May 18, 2014 at 3:33 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Make sure you set up enough reduce partitions so you don’t overload them. Another thing that may help is checking whether you’ve run out of local disk space
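A minimal sketch of the two options being weighed here (the variable names and the partition count are illustrative, not from the thread):

    import org.apache.spark.SparkContext._
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("reduce-sketch").setMaster("local[2]"))
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // reduceByKeyLocally merges values per key and returns a Map on the driver,
    // so it only suits a modest number of distinct keys
    val onDriver: scala.collection.Map[String, Int] = pairs.reduceByKeyLocally(_ + _)

    // for a massive data set, reduceByKey with more reduce partitions spreads the load
    // across executors instead of pulling everything back to the driver
    val distributed = pairs.reduceByKey(_ + _, 512)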

Re: File list read into single RDD

2014-05-18 Thread Pat Ferrel
Doesn’t using an HDFS path pattern then restrict the URI to an HDFS URI? Since Spark supports several FS schemes, I’m unclear how much to assume about using the Hadoop file system APIs and conventions. Concretely, if I pass a pattern in with an HTTPS file system, will the pattern work?

Re: File list read into single RDD

2014-05-18 Thread Andrew Ash
Spark's sc.textFile() (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L456) method delegates to sc.hadoopFile(), which uses Hadoop's
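As an illustration of the point under discussion (the paths and schemes below are made up, not from the thread), the URI scheme passed to sc.textFile picks the Hadoop FileSystem implementation, and glob patterns are resolved by that FileSystem rather than by Spark itself:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("textfile-sketch").setMaster("local[2]"))

    // the scheme selects the Hadoop FileSystem implementation
    val fromHdfs  = sc.textFile("hdfs://namenode:8020/logs/2014-05-*/part-*")
    val fromLocal = sc.textFile("file:///tmp/input/*.txt")
    // whether a pattern works for some other scheme depends on that scheme's
    // Hadoop FileSystem support, which is the question being asked above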

Re: Passing runtime config to workers?

2014-05-18 Thread Robert James
I see - I didn't realize that scope would work like that. Are you saying that any variable that is in scope of the lambda passed to map will be automagically propagated to all workers? What if it's not explicitly referenced in the map, only used by it? E.g.: def main: settings.setSettings

IllegalAccessError when writing to HBase?

2014-05-18 Thread Nan Zhu
Hi, all, I tried to write data to HBase in a Spark 1.0 rc8 application; the application is terminated due to a java.lang.IllegalAccessError. The HBase shell works fine, and the same application works with a standalone HBase deployment. java.lang.IllegalAccessError:

Re: IllegalAccessError when writing to HBase?

2014-05-18 Thread Nan Zhu
I tried hbase-0.96.2/0.98.1/0.98.2; the HDFS version is 2.3. -- Nan Zhu On Sunday, May 18, 2014 at 4:18 PM, Nan Zhu wrote: Hi, all, I tried to write data to HBase in a Spark 1.0 rc8 application; the application is terminated due to a java.lang.IllegalAccessError. The HBase shell works

Re: unsubscribe

2014-05-18 Thread Andrew Ash
Hi Shangyu (and everyone else looking to unsubscribe!), If you'd like to get off this mailing list, please send an email to user-unsubscribe@spark.apache.org, not the regular user@spark.apache.org list. How to use the Apache mailing list infrastructure is documented here:

Re: Text file and shuffle

2014-05-18 Thread Han JU
I think the shuffle is unavoidable given that the input partitions (probably Hadoop input splits in your case) are not arranged in the way a cogroup job needs. But maybe you can try: 1) co-partition your data for cogroup: val par = HashPartitioner(128) val big =
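The preview cuts off mid-example; a minimal sketch of the co-partitioning idea it starts to show (the file names and key extraction are made up) could look like:

    import org.apache.spark.SparkContext._
    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("cogroup-sketch").setMaster("local[2]"))
    val par = new HashPartitioner(128)

    // partition both sides with the same partitioner so records sharing a key
    // land in the same partition number on each side
    val big   = sc.textFile("big.txt").map(l => (l.split("\t")(0), l)).partitionBy(par)
    val small = sc.textFile("small.txt").map(l => (l.split("\t")(0), l)).partitionBy(par)

    // with matching partitioners, cogroup avoids re-shuffling the pre-partitioned inputs
    val grouped = big.cogroup(small)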

First sample with Spark Streaming and three Time's?

2014-05-18 Thread Jacek Laskowski
Hi, I'm quite new to Spark Streaming and developed the following application to pass 4 strings, process them and shut down: val conf = new SparkConf(false) // skip loading external settings .setMaster("local[1]") // run locally with one thread .setAppName("Spark Streaming with
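The preview is truncated; as a hedged reconstruction of that kind of setup (the queue-based input, batch interval, and app name are assumptions, not taken from the original message), a self-contained local job over a handful of strings might look like:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import scala.collection.mutable

    val conf = new SparkConf(false)            // skip loading external settings
      .setMaster("local[1]")                   // run locally with one thread
      .setAppName("Spark Streaming sketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // feed four strings in as a single queued RDD and print what each batch contains
    val queue = mutable.Queue(ssc.sparkContext.parallelize(Seq("one", "two", "three", "four")))
    ssc.queueStream(queue).print()

    ssc.start()
    Thread.sleep(3000)                         // let a few batches run
    ssc.stop(stopSparkContext = true)          // then shut the application down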

Re: breeze DGEMM slow in spark

2014-05-18 Thread wxhsdp
ok Spark Executor Command: java -cp

Re: Passing runtime config to workers?

2014-05-18 Thread DB Tsai
When you reference any variable outside the executor's scope, Spark will automatically serialize it in the driver and send it to the executors, which implies those variables have to implement Serializable. For the example you mention, Spark will serialize object F, and if it's not
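A minimal sketch of the point being made (the Settings class and threshold value are made-up illustrations, not code from the thread): anything the lambda reaches, even indirectly, is serialized with the task, so it must be Serializable.

    import org.apache.spark.{SparkConf, SparkContext}

    case class Settings(threshold: Int)        // case classes are Serializable by default

    val sc = new SparkContext(new SparkConf().setAppName("closure-sketch").setMaster("local[2]"))
    val settings = Settings(threshold = 10)    // lives on the driver

    // `settings` is captured by the closure, serialized on the driver, and shipped with the task;
    // if it held a non-serializable member, the job would fail with a NotSerializableException
    val bigEnough = sc.parallelize(1 to 100).filter(_ > settings.threshold)
    println(bigEnough.count())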

sync master with slaves with bittorrent?

2014-05-18 Thread Daniel Mahler
I am launching a rather large cluster on EC2. It seems like the launch is taking forever on "Setting up spark RSYNC'ing /root/spark to slaves..." ... It seems that BitTorrent might be a faster way to replicate the sizeable Spark directory to the slaves, particularly if there is a lot of not

Re: sync master with slaves with bittorrent?

2014-05-18 Thread Aaron Davidson
Out of curiosity, do you have a library in mind that would make it easy to set up a BitTorrent network and distribute files in an rsync-like (i.e., apply a diff to a tree, ideally) fashion? I'm not familiar with this space, but we do want to minimize the complexity of our standard EC2 launch scripts to

Re: sync master with slaves with bittorrent?

2014-05-18 Thread Daniel Mahler
I am not an expert in this space either. I thought the initial rsync during launch is really just a straight copy that does not need the tree diff. So it seemed like having the slaves do the copying among each other would be better than having the master copy to everyone directly. That made me