Re: Possible to broadcast a function?

2016-06-30 Thread Aaron Perrin
-user-list.1001560.n3.nabble.com/Running-a-task-once-on-each-executor-td3203.html > > Yong > > From: aper...@timerazor.com > Date: Wed, 29 Jun 2016 14:00:07 +0000 > Subject: Possible to broadcast a function? > To: user@spark.apache.org > > The user guide describes a broadcast

RE: Possible to broadcast a function?

2016-06-30 Thread Yong Zhang
How about this old discussion of a problem similar to yours? http://apache-spark-user-list.1001560.n3.nabble.com/Running-a-task-once-on-each-executor-td3203.html Yong From: aper...@timerazor.com Date: Wed, 29 Jun 2016 14:00:07 + Subject: Possible to broadcast a function? To: user

Re: Possible to broadcast a function?

2016-06-29 Thread Bin Fan
Following this suggestion, Aaron, you may want to take a look at Alluxio as off-heap, in-memory data storage for the input/output of Spark jobs, if that works for you. See more intro on how to run Spark with Alluxio as data input/output.

Re: Possible to broadcast a function?

2016-06-29 Thread Sean Owen
Ah, I completely read over the "250GB" part. Yeah, you have a huge heap then, and indeed you can run into problems with GC pauses. You can probably still manage such huge executors with a fair bit of care with the GC and memory settings, and you have a good reason to consider this. In particular I

Re: Possible to broadcast a function?

2016-06-29 Thread Aaron Perrin
From what I've read, people have seen performance issues when the JVM uses more than 60 GiB of memory. I haven't tested it myself, but I guess that's not true? Also, how does one optimize memory when the driver allocates some on one node? For example, let's say my cluster has N nodes each with 500 GiB

Re: Possible to broadcast a function?

2016-06-29 Thread Sean Owen
If you have one executor per machine, which is the right default thing to do, and this is a singleton in the JVM, then this does just have one copy per machine. Of course an executor is tied to an app, so if you mean to hold this data across executors that won't help. On Wed, Jun 29, 2016 at
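
For illustration only, the "singleton in the JVM" idea above might look like the following minimal Java sketch. The class name SharedLookup, its lookup method, and the one-entry placeholder table are hypothetical and not from the thread; a real load of the large data set would replace the constructor body. It uses the initialization-on-demand holder idiom, so the instance is built once per JVM, on first access.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for a large read-only data set shared by all tasks in one JVM.
final class SharedLookup {
    private final Map<String, Integer> table;

    private SharedLookup() {
        // Placeholder for the expensive load; runs at most once per JVM.
        table = new HashMap<>();
        table.put("example", 1);
    }

    // The JVM initializes Holder lazily and thread-safely on first use of get().
    private static final class Holder {
        static final SharedLookup INSTANCE = new SharedLookup();
    }

    static SharedLookup get() {
        return Holder.INSTANCE;
    }

    // Returns the stored value, or -1 if the key is absent.
    int lookup(String key) {
        return table.getOrDefault(key, -1);
    }
}
```

Tasks running in the same executor would call SharedLookup.get() and share the one instance. A different application gets its own executors, and therefore its own copy.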

Re: Possible to broadcast a function?

2016-06-29 Thread Sonal Goyal
Have you looked at Alluxio (formerly Tachyon)? Best Regards, Sonal, Founder, Nube Technologies

Possible to broadcast a function?

2016-06-29 Thread Aaron Perrin
The user guide describes a broadcast as a way to move a large dataset to each node: "Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. They can be used, for example, to give every node a copy of a large input