Spark on YARN, ExecutorLostFailure for long running computations in map

2014-11-08 Thread jan.zikes
Hi,

I am getting ExecutorLostFailure when I run Spark on YARN and perform very long 
tasks (a couple of hours) in map. The error log is below.

Do you know if there is something I can set to make it possible for Spark to 
run these very long map jobs?

Thank you very much for any advice.

Best regards,
Jan 
 
Spark log:
4533,931: [GC 394578K->20882K(1472000K), 0,0226470 secs]
Traceback (most recent call last):
  File "/home/hadoop/spark_stuff/spark_lda.py", line 112, in 
    models.saveAsTextFile(sys.argv[1])
  File "/home/hadoop/spark/python/pyspark/rdd.py", line 1324, in saveAsTextFile
    keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)
  File 
"/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 
538, in __call__
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", 
line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in 
stage 0.0 failed 4 times, most recent failure: Lost task 28.3 in stage 0.0 (TID 
41, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure (executor 
lost)
Driver stacktrace:
        at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 
 
 
Yarn log:
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-152.us-west-2.compute.internal:41091 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-152.us-west-2.compute.internal:39160 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-152.us-west-2.compute.internal:45058 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-241.us-west-2.compute.internal:54111 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-238.us-west-2.compute.internal:45772 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-241.us-west-2.compute.internal:59509 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-238.us-west-2.compute.internal:35720 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
14/11/08 08:21:11 ERROR network.ConnectionManager: Corresponding 
SendingConnection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509) not found
14/11/08 08:21:11 INFO cluster.YarnClientSchedulerBackend: Executor 10 
disconnected, so removing it
14/11/08 08:21:11 ERROR cluster.YarnClientClusterScheduler: Lost executor 10 on 
ip-172-16-1-241.us-west-2.compute.internal: remote Akka client disas

Issue with Custom Key Class

2014-11-08 Thread Bahubali Jain
Hi,
I have a custom key class in which equals() and hashCode() have been
overridden. I have a JavaPairRDD that uses this class as the key. When
groupByKey() or reduceByKey() is called, a null object is passed to
*equals*(Object obj), and as a result the grouping fails.
Is this a known issue?
I am using Spark version 0.9.

Thanks,
Baahu


-- 
Twitter:http://twitter.com/Baahu


Re: Issue with Custom Key Class

2014-11-08 Thread Sean Owen
Does your RDD contain a null key?
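
If it does, one option is to make the key's equals()/hashCode() null-safe so the
shuffle-time comparisons can't blow up. A minimal sketch of such a key class,
written here in Scala (the same idea applies to a Java key class; the field name
is just an illustration):

// Sketch: a pair-RDD key whose equals/hashCode tolerate null, so the comparisons
// performed during groupByKey/reduceByKey never fail on a null argument.
class CustomKey(val id: String) extends Serializable {
  override def equals(other: Any): Boolean = other match {
    case that: CustomKey => that.id == this.id  // a null `other` falls through to the default case
    case _               => false
  }
  override def hashCode: Int = if (id == null) 0 else id.hashCode
}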

On Sat, Nov 8, 2014 at 11:15 AM, Bahubali Jain  wrote:
> Hi,
> I have a custom key class in which equals() and hashCode() have been
> overridden. I have a JavaPairRDD that uses this class as the key. When
> groupByKey() or reduceByKey() is called, a null object is passed to
> equals(Object obj), and as a result the grouping fails.
> Is this a known issue?
> I am using Spark version 0.9.
>
> Thanks,
> Baahu
>
>
> --
> Twitter:http://twitter.com/Baahu
>




Re: org/apache/commons/math3/random/RandomGenerator issue

2014-11-08 Thread lev
Hi,
I'm using breeze.stats.distributions.Binomial with spark 1.1.0 and having
the same error.
I tried to add the dependency to math3 with versions 3.11, 3.2, 3.3 and it
didn't help.

Any ideas what might be the problem?

Thanks,
Lev.


anny9699 wrote
> I use the breeze.stats.distributions.Bernoulli in my code, however met
> this problem
> java.lang.NoClassDefFoundError:
> org/apache/commons/math3/random/RandomGenerator








Embedding static files in a spark app

2014-11-08 Thread Jay Vyas
Hi Spark, I have a set of text files that are dependencies of my app.

They are less than 2 MB in total size.

What's the idiom for packaging text file dependencies for a Spark-based jar 
file? Class resources in packages? Or distributing them separately?
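
For what it's worth, if they go in as class resources (e.g. under
src/main/resources in a typical sbt or Maven layout) they travel inside the jar
to every executor, and something like the sketch below can read them back (the
resource name is just an example); SparkContext.addFile, or --files on
spark-submit, is the usual alternative when shipping them separately:

import scala.io.Source

// Sketch: read a small text file that was packaged into the application jar as a
// classpath resource, so it is available on the driver and inside tasks.
def loadResourceLines(name: String): List[String] = {
  val in = getClass.getResourceAsStream("/" + name)  // e.g. "/lookup-table.txt"
  try Source.fromInputStream(in, "UTF-8").getLines().toList
  finally in.close()
}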



Re: Spark on YARN, ExecutorLostFailure for long running computations in map

2014-11-08 Thread jan.zikes
So it seems that this problem was related to 
http://apache-spark-developers-list.1001551.n3.nabble.com/Lost-executor-on-YARN-ALS-iterations-td7916.html
 and increasing the executor memory worked for me.
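
In case it is useful to anyone else, a minimal sketch of the kind of settings I
mean, written in Scala; the property names are the ones documented for Spark on
YARN in this era, the app name and values are only placeholders to tune to your
cluster, and the same property names apply when setting them from PySpark's
SparkConf or on spark-submit:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: give each executor more heap, plus extra YARN container headroom,
// so long-running map tasks are not killed for exceeding their container limit.
val conf = new SparkConf()
  .setAppName("long-running-map-job")                  // placeholder name
  .set("spark.executor.memory", "6g")                  // executor heap
  .set("spark.yarn.executor.memoryOverhead", "1024")   // off-heap headroom, in MB
val sc = new SparkContext(conf)
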
__


Hi,

I am getting ExecutorLostFailure when I run Spark on YARN and perform very long 
tasks (a couple of hours) in map. The error log is below.

Do you know if there is something I can set to make it possible for Spark to 
run these very long map jobs?

Thank you very much for any advice.

Best regards,
Jan 
 
Spark log:
4533,931: [GC 394578K->20882K(1472000K), 0,0226470 secs]
Traceback (most recent call last):
  File "/home/hadoop/spark_stuff/spark_lda.py", line 112, in 
    models.saveAsTextFile(sys.argv[1])
  File "/home/hadoop/spark/python/pyspark/rdd.py", line 1324, in saveAsTextFile
    keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)
  File 
"/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 
538, in __call__
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", 
line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in 
stage 0.0 failed 4 times, most recent failure: Lost task 28.3 in stage 0.0 (TID 
41, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure (executor 
lost)
Driver stacktrace:
        at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 
 
 
Yarn log:
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-152.us-west-2.compute.internal:41091 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-152.us-west-2.compute.internal:39160 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-152.us-west-2.compute.internal:45058 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-241.us-west-2.compute.internal:54111 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-238.us-west-2.compute.internal:45772 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-241.us-west-2.compute.internal:59509 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-238.us-west-2.compute.internal:35720 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
14/11/08 08:21:11 ERROR network.ConnectionManager: Corresponding 
SendingConnection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.interna

Debian package for spark?

2014-11-08 Thread Kevin Burton
Are there debian packages for spark?

If not I plan on making one… I threw one together in about 20 minutes as
they are somewhat easy with maven and jdeb.  But of course there are other
things I need to install like cassandra support and an init script.

So I figured I’d ask here first.

If not we will open source our packaging code and put it on github.  It’s
about 50 lines of code :-P

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: org/apache/commons/math3/random/RandomGenerator issue

2014-11-08 Thread aross
lev wrote
> I'm using breeze.stats.distributions.Binomial with spark 1.1.0 and having
> the same error.
> I tried to add the dependency to math3 with versions 3.11, 3.2, 3.3 and it
> didn't help.

I am experiencing likewise with all the breeze.stats.distributions using any
math3 version. 

I run 'spark-shell --jars commons-math3-3.1.1.jar', import the required
classes (org.apache.commons.math3.random.RandomGenerator, etc.), and am
unable to create any distributions from breeze despite having just imported
the offending class:

java.lang.NoClassDefFoundError:
org/apache/commons/math3/random/RandomGenerator
Caused by: java.lang.ClassNotFoundException:
org.apache.commons.math3.random.RandomGenerator







Re: Debian package for spark?

2014-11-08 Thread Kevin Burton
Nice!  Not sure how I missed that.  Building it now.  If it has all the
init scripts and config in the right place I might use that.

I might have to build a cassandra package too which adds cassandra
support.. I *think* at least.

Maybe distribute this .deb with the standard downloads?

Kevin

On Sat, Nov 8, 2014 at 11:19 AM, Eugen Cepoi  wrote:

> Yep there is one have a look here
> http://spark.apache.org/docs/latest/building-with-maven.html#building-spark-debian-packages
> Le 8 nov. 2014 19:48, "Kevin Burton"  a écrit :
>
> Are there debian packages for spark?
>>
>> If not I plan on making one… I threw one together in about 20 minutes as
>> they are somewhat easy with maven and jdeb.  But of course there are other
>> things I need to install like cassandra support and an init script.
>>
>> So I figured I’d ask here first.
>>
>> If not we will open source our packaging code and put it on github.  It’s
>> about 50 lines of code :-P
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>>
>>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: Debian package for spark?

2014-11-08 Thread Kevin Burton
looks like it doesn’t work:

> [ERROR] Failed to execute goal org.vafer:jdeb:0.11:jdeb (default) on
project spark-assembly_2.10: Failed to create debian package
/Users/burton/Dropbox/projects-macbook-pro-2013-09/spark-1.1.0/assembly/target/spark_1.1.0-${buildNumber}_all.deb:
Could not create deb package: Control file descriptor keys are invalid
[Version]. The following keys are mandatory [Package, Version, Section,
Priority, Architecture, Maintainer, Description]. Please check your
pom.xml/build.xml and your control file. -> [Help 1]

On Sat, Nov 8, 2014 at 11:24 AM, Kevin Burton  wrote:

> Nice!  Not sure how I missed that.  Building it now.  If it has all the
> init scripts and config in the right place I might use that.
>
> I might have to build a cassandra package too which adds cassandra
> support.. I *think* at least.
>
> Maybe distribute this .deb with the standard downloads?
>
> Kevin
>
> On Sat, Nov 8, 2014 at 11:19 AM, Eugen Cepoi 
> wrote:
>
>> Yep there is one have a look here
>> http://spark.apache.org/docs/latest/building-with-maven.html#building-spark-debian-packages
>> Le 8 nov. 2014 19:48, "Kevin Burton"  a écrit :
>>
>> Are there debian packages for spark?
>>>
>>> If not I plan on making one… I threw one together in about 20 minutes as
>>> they are somewhat easy with maven and jdeb.  But of course there are other
>>> things I need to install like cassandra support and an init script.
>>>
>>> So I figured I’d ask here first.
>>>
>>> If not we will open source our packaging code and put it on github.
>>> It’s about 50 lines of code :-P
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>> 
>>>
>>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: Debian package for spark?

2014-11-08 Thread Kevin Burton
OK… here’s my version.

https://github.com/spinn3r/spark-deb

it’s just two files really.  so if the standard spark packages get fixed
I’ll just switch to them.

Doesn’t look like there’s an init script and the conf isn’t in /etc …

On Sat, Nov 8, 2014 at 12:06 PM, Kevin Burton  wrote:

> looks like it doesn’t work:
>
> > [ERROR] Failed to execute goal org.vafer:jdeb:0.11:jdeb (default) on
> project spark-assembly_2.10: Failed to create debian package
> /Users/burton/Dropbox/projects-macbook-pro-2013-09/spark-1.1.0/assembly/target/spark_1.1.0-${buildNumber}_all.deb:
> Could not create deb package: Control file descriptor keys are invalid
> [Version]. The following keys are mandatory [Package, Version, Section,
> Priority, Architecture, Maintainer, Description]. Please check your
> pom.xml/build.xml and your control file. -> [Help 1]
>
> On Sat, Nov 8, 2014 at 11:24 AM, Kevin Burton  wrote:
>
>> Nice!  Not sure how I missed that.  Building it now.  If it has all the
>> init scripts and config in the right place I might use that.
>>
>> I might have to build a cassandra package too which adds cassandra
>> support.. I *think* at least.
>>
>> Maybe distribute this .deb with the standard downloads?
>>
>> Kevin
>>
>> On Sat, Nov 8, 2014 at 11:19 AM, Eugen Cepoi 
>> wrote:
>>
>>> Yep there is one have a look here
>>> http://spark.apache.org/docs/latest/building-with-maven.html#building-spark-debian-packages
>>> Le 8 nov. 2014 19:48, "Kevin Burton"  a écrit :
>>>
>>> Are there debian packages for spark?

 If not I plan on making one… I threw one together in about 20 minutes
 as they are somewhat easy with maven and jdeb.  But of course there are
 other things I need to install like cassandra support and an init script.

 So I figured I’d ask here first.

 If not we will open source our packaging code and put it on github.
 It’s about 50 lines of code :-P

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 


>>
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>>
>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: org/apache/commons/math3/random/RandomGenerator issue

2014-11-08 Thread anny9699
Hi Lev,

I also finally couldn't solve that problem and switched to Java.util.Random.

Thanks~
Anny

On Sat, Nov 8, 2014 at 4:21 AM, lev [via Apache Spark User List] <
ml-node+s1001560n18406...@n3.nabble.com> wrote:

> Hi,
> I'm using breeze.stats.distributions.Binomial with spark 1.1.0 and having
> the same error.
> I tried to add the dependency to math3 with versions 3.11, 3.2, 3.3 and it
> didn't help.
>
> Any ideas what might be the problem?
>
> Thanks,
> Lev.
>
> anny9699 wrote
> I use the breeze.stats.distributions.Bernoulli in my code, however met
> this problem
> java.lang.NoClassDefFoundError:
> org/apache/commons/math3/random/RandomGenerator
>
>
>
>





Re: Debian package for spark?

2014-11-08 Thread Kevin Burton
Another note for the official debs.  ‘spark’ is a bad package name because
of confusion with the spark programming lang based on ada.

There are packages for this already named ‘spark’

so I put mine as ‘apache-spark’



On Sat, Nov 8, 2014 at 12:21 PM, Kevin Burton  wrote:

> OK… here’s my version.
>
> https://github.com/spinn3r/spark-deb
>
> it’s just two files really.  so if the standard spark packages get fixed
> I’ll just switch to them.
>
> Doesn’t look like there’s an init script and the conf isn’t in /etc …
>
> On Sat, Nov 8, 2014 at 12:06 PM, Kevin Burton  wrote:
>
>> looks like it doesn’t work:
>>
>> > [ERROR] Failed to execute goal org.vafer:jdeb:0.11:jdeb (default) on
>> project spark-assembly_2.10: Failed to create debian package
>> /Users/burton/Dropbox/projects-macbook-pro-2013-09/spark-1.1.0/assembly/target/spark_1.1.0-${buildNumber}_all.deb:
>> Could not create deb package: Control file descriptor keys are invalid
>> [Version]. The following keys are mandatory [Package, Version, Section,
>> Priority, Architecture, Maintainer, Description]. Please check your
>> pom.xml/build.xml and your control file. -> [Help 1]
>>
>> On Sat, Nov 8, 2014 at 11:24 AM, Kevin Burton  wrote:
>>
>>> Nice!  Not sure how I missed that.  Building it now.  If it has all the
>>> init scripts and config in the right place I might use that.
>>>
>>> I might have to build a cassandra package too which adds cassandra
>>> support.. I *think* at least.
>>>
>>> Maybe distribute this .deb with the standard downloads?
>>>
>>> Kevin
>>>
>>> On Sat, Nov 8, 2014 at 11:19 AM, Eugen Cepoi 
>>> wrote:
>>>
 Yep there is one have a look here
 http://spark.apache.org/docs/latest/building-with-maven.html#building-spark-debian-packages
 Le 8 nov. 2014 19:48, "Kevin Burton"  a écrit :

 Are there debian packages for spark?
>
> If not I plan on making one… I threw one together in about 20 minutes
> as they are somewhat easy with maven and jdeb.  But of course there are
> other things I need to install like cassandra support and an init script.
>
> So I figured I’d ask here first.
>
> If not we will open source our packaging code and put it on github.
> It’s about 50 lines of code :-P
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>
>>>
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>> 
>>>
>>>
>>
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>>
>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: org/apache/commons/math3/random/RandomGenerator issue

2014-11-08 Thread Sean Owen
This means you haven't actually included commons-math3 in your
application. Check the contents of your final app jar and then go
check your build file again.
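
For example, with sbt the dependency has to be declared so that it actually ends
up in the assembly jar handed to spark-submit. A rough sketch of a build.sbt
fragment (the versions here are only illustrative):

// build.sbt sketch: declare breeze and commons-math3 explicitly so a plugin like
// sbt-assembly bundles them into the fat jar that is submitted to the cluster.
libraryDependencies ++= Seq(
  "org.scalanlp"       %% "breeze"        % "0.9",
  "org.apache.commons" %  "commons-math3" % "3.2"
)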

On Sat, Nov 8, 2014 at 12:20 PM, lev  wrote:
> Hi,
> I'm using breeze.stats.distributions.Binomial with spark 1.1.0 and having
> the same error.
> I tried to add the dependency to math3 with versions 3.11, 3.2, 3.3 and it
> didn't help.
>
> Any ideas what might be the problem?
>
> Thanks,
> Lev.
>
>
> anny9699 wrote
>> I use the breeze.stats.distributions.Bernoulli in my code, however met
>> this problem
>> java.lang.NoClassDefFoundError:
>> org/apache/commons/math3/random/RandomGenerator
>
>
>
>
>
>




Re: Debian package for spark?

2014-11-08 Thread Mark Hamstra
The building of the Debian package in Spark works just fine for me -- I
just did it using a clean check-out of 1.1.1-SNAPSHOT and `mvn -U -Pdeb
-DskipTests clean package`.  There's likely something else amiss in your
build.

Actually, that's not quite true.  There is one small problem with the
Debian packaging that you should be aware of:

https://issues.apache.org/jira/browse/SPARK-3624
https://github.com/apache/spark/pull/2477#issuecomment-58291272

You should also know there is no such thing as standard Debian packages or
official debs for Spark, nor is it likely that there ever will be.  What is
available was never intended as anything more than a convenient hack (or a
starting point for a custom hack) for Spark developers or users who need a
way to create a Spark package sufficient to use in a configuration
management system or something of that nature.  A proper collection of debs
that divides Spark up into multiple parts, properly reflects inter-package
dependencies, relocates executables, configuration and libraries to conform
to the expectations of a larger system, etc. is something that the Apache
Spark Project does not do, probably won't do, and probably shouldn't do --
something like that is better handled by the distributors of OSes or larger
software systems like Apache Bigtop.

On Sat, Nov 8, 2014 at 1:17 PM, Kevin Burton  wrote:

> Another note for the official debs.  ‘spark’ is a bad package name because
> of confusion with the spark programming lang based on ada.
>
> There are packages for this already named ‘spark’
>
> so I put mine as ‘apache-spark’
>
>
>
> On Sat, Nov 8, 2014 at 12:21 PM, Kevin Burton  wrote:
>
>> OK… here’s my version.
>>
>> https://github.com/spinn3r/spark-deb
>>
>> it’s just two files really.  so if the standard spark packages get fixed
>> I’ll just switch to them.
>>
>> Doesn’t look like there’s an init script and the conf isn’t in /etc …
>>
>> On Sat, Nov 8, 2014 at 12:06 PM, Kevin Burton  wrote:
>>
>>> looks like it doesn’t work:
>>>
>>> > [ERROR] Failed to execute goal org.vafer:jdeb:0.11:jdeb (default) on
>>> project spark-assembly_2.10: Failed to create debian package
>>> /Users/burton/Dropbox/projects-macbook-pro-2013-09/spark-1.1.0/assembly/target/spark_1.1.0-${buildNumber}_all.deb:
>>> Could not create deb package: Control file descriptor keys are invalid
>>> [Version]. The following keys are mandatory [Package, Version, Section,
>>> Priority, Architecture, Maintainer, Description]. Please check your
>>> pom.xml/build.xml and your control file. -> [Help 1]
>>>
>>> On Sat, Nov 8, 2014 at 11:24 AM, Kevin Burton 
>>> wrote:
>>>
 Nice!  Not sure how I missed that.  Building it now.  If it has all the
 init scripts and config in the right place I might use that.

 I might have to build a cassandra package too which adds cassandra
 support.. I *think* at least.

 Maybe distribute this .deb with the standard downloads?

 Kevin

 On Sat, Nov 8, 2014 at 11:19 AM, Eugen Cepoi 
 wrote:

> Yep there is one have a look here
> http://spark.apache.org/docs/latest/building-with-maven.html#building-spark-debian-packages
> Le 8 nov. 2014 19:48, "Kevin Burton"  a écrit :
>
> Are there debian packages for spark?
>>
>> If not I plan on making one… I threw one together in about 20 minutes
>> as they are somewhat easy with maven and jdeb.  But of course there are
>> other things I need to install like cassandra support and an init script.
>>
>> So I figured I’d ask here first.
>>
>> If not we will open source our packaging code and put it on github.
>> It’s about 50 lines of code :-P
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>>
>>


 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 


>>>
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>> 
>>>
>>>
>>
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>>
>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>


Does Spark work on multicore systems?

2014-11-08 Thread hmushtaq
I am a Spark newbie and I use Python (PySpark). I am trying to run a program
on a 64-core system, but no matter what I do, it always uses 1 core. It
doesn't matter whether I run it using "spark-submit --master local[64] run.sh" or
call x.repartition(64) on an RDD in my code; the Spark program always
uses one core. Does anyone have experience successfully running Spark programs
on multicore processors? Can someone provide a very simple example that
properly runs on all cores of a multicore system? 






Does Spark work on multicore systems?

2014-11-08 Thread Blind Faith
I am a Spark newbie and I use python (pyspark). I am trying to run a
program on a 64 core system, but no matter what I do, it always uses 1
core. It doesn't matter if I run it using "spark-submit --master local[64]
run.sh" or I call x.repartition(64) in my code with an RDD, the spark
program always uses one core. Has anyone experience of running spark
programs on multicore processors with success? Can someone provide me a
very simple example that does properly run on all cores of a multicore
system?


Re: Debian package for spark?

2014-11-08 Thread Kevin Burton
Weird… I’m using a 1.1.0 source tar.gz …

but if it’s fixed in 1.1.1 that’s good.

On Sat, Nov 8, 2014 at 2:08 PM, Mark Hamstra 
wrote:

> The building of the Debian package in Spark works just fine for me -- I
> just did it using a clean check-out of 1.1.1-SNAPSHOT and `mvn -U -Pdeb
> -DskipTests clean package`.  There's likely something else amiss in your
> build.
>
> Actually, that's not quite true.  There is one small problem with the
> Debian packaging that you should be aware of:
>
> https://issues.apache.org/jira/browse/SPARK-3624
> https://github.com/apache/spark/pull/2477#issuecomment-58291272
>
> You should also know there is no such thing as standard Debian packages or
> official debs for Spark, nor is it likely that there ever will be.  What is
> available was never intended as anything more than a convenient hack (or a
> starting point for a custom hack) for Spark developers or users who need a
> way to create a Spark package sufficient to use in a configuration
> management system or something of that nature.  A proper collection of debs
> that divides Spark up into multiple parts, properly reflects inter-package
> dependencies, relocates executables, configuration and libraries to conform
> to the expectations of a larger system, etc. is something that the Apache
> Spark Project does not do, probably won't do, and probably shouldn't do --
> something like that is better handled by the distributors of OSes or larger
> software systems like Apache Bigtop.
>
> On Sat, Nov 8, 2014 at 1:17 PM, Kevin Burton  wrote:
>
>> Another note for the official debs.  ‘spark’ is a bad package name
>> because of confusion with the spark programming lang based on ada.
>>
>> There are packages for this already named ‘spark’
>>
>> so I put mine as ‘apache-spark’
>>
>>
>>
>> On Sat, Nov 8, 2014 at 12:21 PM, Kevin Burton  wrote:
>>
>>> OK… here’s my version.
>>>
>>> https://github.com/spinn3r/spark-deb
>>>
>>> it’s just two files really.  so if the standard spark packages get fixed
>>> I’ll just switch to them.
>>>
>>> Doesn’t look like there’s an init script and the conf isn’t in /etc …
>>>
>>> On Sat, Nov 8, 2014 at 12:06 PM, Kevin Burton 
>>> wrote:
>>>
 looks like it doesn’t work:

 > [ERROR] Failed to execute goal org.vafer:jdeb:0.11:jdeb (default) on
 project spark-assembly_2.10: Failed to create debian package
 /Users/burton/Dropbox/projects-macbook-pro-2013-09/spark-1.1.0/assembly/target/spark_1.1.0-${buildNumber}_all.deb:
 Could not create deb package: Control file descriptor keys are invalid
 [Version]. The following keys are mandatory [Package, Version, Section,
 Priority, Architecture, Maintainer, Description]. Please check your
 pom.xml/build.xml and your control file. -> [Help 1]

 On Sat, Nov 8, 2014 at 11:24 AM, Kevin Burton 
 wrote:

> Nice!  Not sure how I missed that.  Building it now.  If it has all
> the init scripts and config in the right place I might use that.
>
> I might have to build a cassandra package too which adds cassandra
> support.. I *think* at least.
>
> Maybe distribute this .deb with the standard downloads?
>
> Kevin
>
> On Sat, Nov 8, 2014 at 11:19 AM, Eugen Cepoi 
> wrote:
>
>> Yep there is one have a look here
>> http://spark.apache.org/docs/latest/building-with-maven.html#building-spark-debian-packages
>> Le 8 nov. 2014 19:48, "Kevin Burton"  a écrit :
>>
>> Are there debian packages for spark?
>>>
>>> If not I plan on making one… I threw one together in about 20
>>> minutes as they are somewhat easy with maven and jdeb.  But of course 
>>> there
>>> are other things I need to install like cassandra support and an init
>>> script.
>>>
>>> So I figured I’d ask here first.
>>>
>>> If not we will open source our packaging code and put it on github.
>>> It’s about 50 lines of code :-P
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>> 
>>>
>>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>


 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 


>>>
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 

Re: Does Spark work on multicore systems?

2014-11-08 Thread Aaron Davidson
oops, meant to cc userlist too

On Sat, Nov 8, 2014 at 3:13 PM, Aaron Davidson  wrote:

> The default local master is "local[*]", which should use all cores on your
> system. So you should be able to just do "./bin/pyspark" and
> "sc.parallelize(range(1000)).count()" and see that all your cores were used.
>
> On Sat, Nov 8, 2014 at 2:20 PM, Blind Faith 
> wrote:
>
>> I am a Spark newbie and I use python (pyspark). I am trying to run a
>> program on a 64 core system, but no matter what I do, it always uses 1
>> core. It doesn't matter if I run it using "spark-submit --master local[64]
>> run.sh" or I call x.repartition(64) in my code with an RDD, the spark
>> program always uses one core. Has anyone experience of running spark
>> programs on multicore processors with success? Can someone provide me a
>> very simple example that does properly run on all cores of a multicore
>> system?
>>
>
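
One more thing worth ruling out, since we haven't seen the program itself: a
master set programmatically on the SparkConf takes precedence over --master on
the command line, so a hard-coded "local" in the script would pin the job to a
single core no matter what spark-submit is told. A minimal sketch (in Scala; the
app name is a placeholder) of leaving the master to the launcher:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: no setMaster here; let spark-submit's --master (or the shells'
// local[*] default) decide how many cores to use.
val conf = new SparkConf().setAppName("multicore-check")
val sc = new SparkContext(conf)
// With 64 partitions, this map should spread across all local cores.
println(sc.parallelize(1 to 1000000, 64).map(_ * 2).count())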
>


Unresolved Attributes

2014-11-08 Thread Srinivas Chamarthi
I get an exception when I try to run a simple WHERE-clause query. I can see
that the name attribute is present in the schema, but it still throws the
exception.

query = "select name from business where business_id=" + business_id

What am I doing wrong?

thx
srinivas


Exception in thread "main"
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
attributes: 'name, tree:
Project ['name]
 Filter (business_id#1 = 'Ba1hXOqb3Yhix8bhE0k_WQ)
  Subquery business
   SparkLogicalPlan (ExistingRdd
[attributes#0,business_id#1,categories#2,city#3,full_address#4,hours#5,latitude#6,longitude#7,name#8,neighborhoods#9,open#10,review_count#11,stars#12,state#13,type#14],
MappedRDD[5] at map at JsonRDD.scala:38)
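
(One thing that may matter here: because business_id is concatenated in without
quotes, the parser appears to be treating the value as a column reference, note
the 'Ba1hXOqb... in the Filter node, rather than as a string literal. A small
sketch of the quoted form, keeping the table and column names from above;
sqlContext stands in for whatever SQLContext the query is run against:)

// Sketch: single-quote the string value so Spark SQL parses it as a literal.
val businessId = "Ba1hXOqb3Yhix8bhE0k_WQ"
val query = s"SELECT name FROM business WHERE business_id = '$businessId'"
val rows = sqlContext.sql(query)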


Re: Debian package for spark?

2014-11-08 Thread Mark Hamstra
No change from 1.1.0 to 1.1.1-SNAPSHOT.  The deb profile hasn't changed
since before the 1.0.2 release.

On Sat, Nov 8, 2014 at 3:12 PM, Kevin Burton  wrote:

> Weird… I’m using a 1.1.0 source tar.gz …
>
> but if it’s fixed in 1.1.1 that’s good.
>
> On Sat, Nov 8, 2014 at 2:08 PM, Mark Hamstra 
> wrote:
>
>> The building of the Debian package in Spark works just fine for me -- I
>> just did it using a clean check-out of 1.1.1-SNAPSHOT and `mvn -U -Pdeb
>> -DskipTests clean package`.  There's likely something else amiss in your
>> build.
>>
>> Actually, that's not quite true.  There is one small problem with the
>> Debian packaging that you should be aware of:
>>
>> https://issues.apache.org/jira/browse/SPARK-3624
>> https://github.com/apache/spark/pull/2477#issuecomment-58291272
>>
>> You should also know there is no such thing as standard Debian packages
>> or official debs for Spark, nor is it likely that there ever will be.  What
>> is available was never intended as anything more than a convenient hack (or
>> a starting point for a custom hack) for Spark developers or users who need
>> a way to create a Spark package sufficient to use in a configuration
>> management system or something of that nature.  A proper collection of debs
>> that divides Spark up into multiple parts, properly reflects inter-package
>> dependencies, relocates executables, configuration and libraries to conform
>> to the expectations of a larger system, etc. is something that the Apache
>> Spark Project does not do, probably won't do, and probably shouldn't do --
>> something like that is better handled by the distributors of OSes or larger
>> software systems like Apache Bigtop.
>>
>> On Sat, Nov 8, 2014 at 1:17 PM, Kevin Burton  wrote:
>>
>>> Another note for the official debs.  ‘spark’ is a bad package name
>>> because of confusion with the spark programming lang based on ada.
>>>
>>> There are packages for this already named ‘spark’
>>>
>>> so I put mine as ‘apache-spark’
>>>
>>>
>>>
>>> On Sat, Nov 8, 2014 at 12:21 PM, Kevin Burton 
>>> wrote:
>>>
 OK… here’s my version.

 https://github.com/spinn3r/spark-deb

 it’s just two files really.  so if the standard spark packages get
 fixed I’ll just switch to them.

 Doesn’t look like there’s an init script and the conf isn’t in /etc …

 On Sat, Nov 8, 2014 at 12:06 PM, Kevin Burton 
 wrote:

> looks like it doesn’t work:
>
> > [ERROR] Failed to execute goal org.vafer:jdeb:0.11:jdeb (default) on
> project spark-assembly_2.10: Failed to create debian package
> /Users/burton/Dropbox/projects-macbook-pro-2013-09/spark-1.1.0/assembly/target/spark_1.1.0-${buildNumber}_all.deb:
> Could not create deb package: Control file descriptor keys are invalid
> [Version]. The following keys are mandatory [Package, Version, Section,
> Priority, Architecture, Maintainer, Description]. Please check your
> pom.xml/build.xml and your control file. -> [Help 1]
>
> On Sat, Nov 8, 2014 at 11:24 AM, Kevin Burton 
> wrote:
>
>> Nice!  Not sure how I missed that.  Building it now.  If it has all
>> the init scripts and config in the right place I might use that.
>>
>> I might have to build a cassandra package too which adds cassandra
>> support.. I *think* at least.
>>
>> Maybe distribute this .deb with the standard downloads?
>>
>> Kevin
>>
>> On Sat, Nov 8, 2014 at 11:19 AM, Eugen Cepoi 
>> wrote:
>>
>>> Yep there is one have a look here
>>> http://spark.apache.org/docs/latest/building-with-maven.html#building-spark-debian-packages
>>> Le 8 nov. 2014 19:48, "Kevin Burton"  a écrit :
>>>
>>> Are there debian packages for spark?

 If not I plan on making one… I threw one together in about 20
 minutes as they are somewhat easy with maven and jdeb.  But of course 
 there
 are other things I need to install like cassandra support and an init
 script.

 So I figured I’d ask here first.

 If not we will open source our packaging code and put it on
 github.  It’s about 50 lines of code :-P

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 


>>
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>>
>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 

contains in array in Spark SQL

2014-11-08 Thread Srinivas Chamarthi
hi,

What would be the syntax to check for a value inside an array-typed attribute
in my WHERE clause?

select * from business where categories contains 'X' // something like
this; is this the right syntax?

attribute: categories
type: Array

thx
srinivas
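
In case it helps: I am not sure the plain SQL parser accepts a contains predicate
at all; array_contains is a Hive UDF, so it may only be available through a
HiveContext. A fallback that definitely works is to filter on the RDD itself. A
rough sketch, where the Business case class, the sample rows, and sc (the shell's
SparkContext) are only stand-ins for however the data was actually loaded:

// Sketch: skip SQL and filter the array column directly on the RDD.
case class Business(business_id: String, name: String, categories: Seq[String])

val businessRDD = sc.parallelize(Seq(
  Business("b1", "Cafe", Seq("X", "Coffee")),
  Business("b2", "Bar",  Seq("Drinks"))
))
val withCategoryX = businessRDD.filter(b => b.categories != null && b.categories.contains("X"))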


Re: org/apache/commons/math3/random/RandomGenerator issue

2014-11-08 Thread Shivaram Venkataraman
I ran into this problem too and I know of a workaround but don't exactly
know what is happening. The work around is to explicitly add either the
commons math jar or your application jar (shaded with commons math)
to spark.executor.extraClassPath.
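
Concretely, something along these lines (a sketch only; the jar path is made up
and must point at wherever the jar actually lives on the worker nodes):

import org.apache.spark.SparkConf

// Sketch of the workaround: put a jar containing commons-math3 (or the shaded
// application jar) on the executors' classpath explicitly.
val conf = new SparkConf()
  .set("spark.executor.extraClassPath", "/opt/jars/commons-math3-3.2.jar")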

My hunch is that this is related to the class loader problem described in
[1] where Spark loads breeze at the beginning and then having commons math
in the user's jar somehow doesn't get picked up.

Thanks
Shivaram
[1]
http://apache-spark-user-list.1001560.n3.nabble.com/Native-library-can-not-be-loaded-when-using-Mllib-PCA-td7042.html#a8307

On Sat, Nov 8, 2014 at 1:21 PM, Sean Owen  wrote:

> This means you haven't actually included commons-math3 in your
> application. Check the contents of your final app jar and then go
> check your build file again.
>
> On Sat, Nov 8, 2014 at 12:20 PM, lev  wrote:
> > Hi,
> > I'm using breeze.stats.distributions.Binomial with spark 1.1.0 and having
> > the same error.
> > I tried to add the dependency to math3 with versions 3.11, 3.2, 3.3 and
> it
> > didn't help.
> >
> > Any ideas what might be the problem?
> >
> > Thanks,
> > Lev.
> >
> >
> > anny9699 wrote
> >> I use the breeze.stats.distributions.Bernoulli in my code, however met
> >> this problem
> >> java.lang.NoClassDefFoundError:
> >> org/apache/commons/math3/random/RandomGenerator
> >
> >
> >
> >
> >
> >
>
>
>


weird caching

2014-11-08 Thread Nathan Kronenfeld
RDD Name: 8
Storage Level: Memory Deserialized 1x Replicated
Cached Partitions: 426
Fraction Cached: 107%
Size in Memory: 59.7 GB
Size in Tachyon: 0.0 B
Size on Disk: 0.0 B

Anyone understand what it means to have more than 100% of an RDD cached?

Thanks,
-Nathan


Re: weird caching

2014-11-08 Thread Matei Zaharia
It might mean that some partition was computed on two nodes, because a task for 
it wasn't able to be scheduled locally on the first node. Did the RDD really 
have 426 partitions total? You can click on it and see where there are copies 
of each one.

Matei

> On Nov 8, 2014, at 10:16 PM, Nathan Kronenfeld  
> wrote:
> 
> RDD Name: 8
> Storage Level: Memory Deserialized 1x Replicated
> Cached Partitions: 426
> Fraction Cached: 107%
> Size in Memory: 59.7 GB
> Size in Tachyon: 0.0 B
> Size on Disk: 0.0 B
> Anyone understand what it means to have more than 100% of an RDD cached?
> 
> Thanks,
> -Nathan
> 



Make Spark Job Board permanent.

2014-11-08 Thread Egor Pahomov
During Spark Summit 2014 there was a Job Board
(http://spark-summit.org/2014/jobs) for positions related to Spark
technology. It is a great thing, because it's hard to search for positions
related to such a young technology, and such a board is good for the Spark
community because it makes it easy for companies to find people working on
this technology.

Could Databricks or the Spark Summit organizers make such a board permanent?

P.S. It is possible to find non-remote jobs on resources like Dice, but it's
hard to find remote jobs, because they are mostly at startups, which don't
use Dice.

-- 



Sincerely yours,
Egor Pakhomov
Scala Developer, Yandex