Re: Possible to broadcast a function?

2016-06-30 Thread Aaron Perrin
That's helpful, thanks. I didn't see that thread earlier. But it sounds like the best solution is to use singletons in the executors, which I'm already doing. (BTW, the reason I consider that method kind of hack-ish is that it makes the code a bit more difficult for others to

RE: Possible to broadcast a function?

2016-06-30 Thread Yong Zhang
How about this old discussion of a problem similar to yours? http://apache-spark-user-list.1001560.n3.nabble.com/Running-a-task-once-on-each-executor-td3203.html Yong From: aper...@timerazor.com Date: Wed, 29 Jun 2016 14:00:07 + Subject: Possible to broadcast a function? To:

Re: Possible to broadcast a function?

2016-06-29 Thread Bin Fan
Following this suggestion, Aaron, you might look at Alluxio as off-heap, in-memory storage for the input and output of Spark jobs, if that works for you. See the introduction on how to run Spark with Alluxio as data input/output.

Re: Possible to broadcast a function?

2016-06-29 Thread Sean Owen
Ah, I completely read over the "250GB" part. Yeah, you have a huge heap then, and indeed you can run into problems with GC pauses. You can probably still manage such huge executors with a fair bit of care with the GC and memory settings, and you have a good reason to consider this. In particular I

Re: Possible to broadcast a function?

2016-06-29 Thread Aaron Perrin
From what I've read, people have seen performance issues when the JVM uses more than 60 GiB of memory. I haven't tested it myself, but I guess that's not true? Also, how does one optimize memory when the driver allocates some on one node? For example, let's say my cluster has N nodes each with 500 GiB

Re: Possible to broadcast a function?

2016-06-29 Thread Sean Owen
If you have one executor per machine, which is the right default thing to do, and this is a singleton in the JVM, then there is just one copy per machine. Of course an executor is tied to an app, so if you mean to hold this data across executors that won't help. On Wed, Jun 29, 2016 at
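
The singleton-per-JVM idea above can be sketched roughly as follows. This is only an illustration in plain Java, not Spark code and not anyone's actual implementation from this thread: `ExecutorSingletonDemo`, `LargeResource`, `runTasks`, and the tiny lookup table are all hypothetical names standing in for the real data, and a thread pool stands in for the tasks that share one executor JVM. In a real job, the `LargeResource.get()` call would presumably sit inside the task closure (for example within `mapPartitions`), so that it runs on the executor rather than on the driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorSingletonDemo {

    /** Hypothetical stand-in for the large, read-only data set. */
    static final class LargeResource {
        // Counts how many times the expensive load ran in this JVM.
        private static final AtomicInteger LOADS = new AtomicInteger();

        // Initialization-on-demand holder: the JVM runs this initializer
        // once, on first access, and makes it safe across threads.
        private static final class Holder {
            static final LargeResource INSTANCE = new LargeResource();
        }

        private final int[] table;

        private LargeResource() {
            LOADS.incrementAndGet();
            // Assumption: a small table replaces the real (huge) data.
            table = new int[1000];
            for (int i = 0; i < table.length; i++) {
                table[i] = i * 2;
            }
        }

        static LargeResource get() {
            return Holder.INSTANCE;
        }

        static int loadCount() {
            return LOADS.get();
        }

        int lookup(int key) {
            return table[key];
        }
    }

    /** Simulates several tasks running concurrently in one executor JVM. */
    static int runTasks(int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < taskCount; t++) {
                final int key = t;
                // Each "task" fetches the shared instance instead of
                // carrying its own copy of the data.
                Callable<Integer> task = () -> LargeResource.get().lookup(key);
                results.add(pool.submit(task));
            }
            int sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get();
            }
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int sum = runTasks(8);
        System.out.println("sum=" + sum + " loads=" + LargeResource.loadCount());
    }
}
```

However many tasks run, the load counter stays at one for the JVM, which matches the "one copy per machine" point when there is one executor per machine. As noted above, the instance does not outlive the executor, so this does not share data across applications.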

Re: Possible to broadcast a function?

2016-06-29 Thread Sonal Goyal
Have you looked at Alluxio (formerly Tachyon)? Best Regards, Sonal, Founder, Nube Technologies