Error running spark-sql-perf version 0.3.2 against Spark 1.6

2016-04-27 Thread Michael Slavitch
Hello; I'm trying to run spark-sql-perf version 0.3.2 (hash cb0347b) against Spark 1.6. When running ./bin/run --benchmark DatsetPerformance I get the following: Exception in thread "main" java.lang.ClassNotFoundException: com.databricks.spark.sql.perf.DatsetPerformance Even though the
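
A generic way to diagnose a ClassNotFoundException like this from spark-shell is to probe the classpath directly; the fully qualified name below is copied verbatim from the report, so verify its spelling ("Datset" vs. "Dataset") against the spark-sql-perf sources:

    // Probe the driver classpath for the benchmark class; the name is taken
    // verbatim from the error above -- check the spelling against the repo.
    try {
      Class.forName("com.databricks.spark.sql.perf.DatsetPerformance")
      println("class found on classpath")
    } catch {
      case e: ClassNotFoundException => println("not on classpath: " + e.getMessage)
    }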

Re: Spark on Mobile platforms

2016-04-07 Thread Michael Slavitch
You should consider mobile agents that feed data into a Spark datacenter via Spark Streaming. > On Apr 7, 2016, at 8:28 AM, Ashic Mahtab wrote: > > Spark may not be the right tool for this. Working on just the mobile device, > you won't be scaling out stuff, and as such most
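
A minimal sketch of the receiving side of that suggestion, assuming the agents push newline-delimited events over a socket; host, port, and batch interval are illustrative placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Datacenter-side ingest: mobile agents stream text events to a socket.
    val conf = new SparkConf().setAppName("MobileAgentIngest")
    val ssc = new StreamingContext(conf, Seconds(10))
    val events = ssc.socketTextStream("ingest.example.com", 9999)
    events.count().print()  // placeholder for real per-batch processing
    ssc.start()
    ssc.awaitTermination()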

Re: lost executor due to large shuffle spill memory

2016-04-06 Thread Michael Slavitch
3.2xlarge nodes. Should I increase > spark.storage.memoryFraction? Also I'm thinking maybe I should repartition > all_pairs so that each partition will be small enough to be handled. > > On Tue, Apr 5, 2016 at 8:03 PM, Michael Slavitch <slavi...@gmail.com>
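
A sketch of the repartitioning idea from that message; all_pairs is the RDD named in the thread, and the target partition count is illustrative -- size it so each partition fits comfortably in a single task's share of executor memory:

    // More, smaller partitions => smaller per-task shuffle spill.
    val repartitioned = all_pairs.repartition(2000)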

Re: lost executor due to large shuffle spill memory

2016-04-05 Thread Michael Slavitch
Do you have enough disk space for the spill? It seems it has lots of memory reserved but not enough for the spill. You will need a disk that can handle the entire data partition for each host. Compression of the spilled data saves about 50% in most if not all cases. Given the large data set I
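
A hedged configuration sketch of those two points -- compression of spilled and shuffled data (on by default in this era of Spark) and scratch directories with enough capacity for each host's slice of the shuffled data; the paths are illustrative:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.shuffle.compress", "true")        // compress shuffle output
      .set("spark.shuffle.spill.compress", "true")  // compress spilled data
      .set("spark.local.dir", "/data1/spark-tmp,/data2/spark-tmp")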

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Michael Slavitch
Just to be sure: Have spark-env.sh and spark-defaults.conf been correctly propagated to all nodes? Are they identical? > On Apr 4, 2016, at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote: > > [ CC'ing dev list since nearly identical questions have occurred in > user list recently w/o
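
One quick way to compare effective settings across nodes is to dump the configuration each driver/shell actually loaded and diff the output between hosts:

    // Print the loaded configuration, sorted for easy diffing across hosts.
    sc.getConf.getAll.sorted.foreach { case (k, v) => println(k + "=" + v) }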

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
avoid FS overhead of ramdisk, you could try to > hack a new shuffle implementation, since the shuffle framework is pluggable. > > On Sat, Apr 2, 2016 at 6:48 AM, Michael Slavitch <slavi...@gmail.com> > wrote: > >> As I mentioned earlier this flag is now ignored.
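
The pluggable hook referred to here is the spark.shuffle.manager setting; a hedged sketch with a hypothetical class name (a real implementation would have to extend org.apache.spark.shuffle.ShuffleManager):

    import org.apache.spark.SparkConf

    // "com.example.InMemoryShuffleManager" is hypothetical, for illustration.
    val conf = new SparkConf()
      .set("spark.shuffle.manager", "com.example.InMemoryShuffleManager")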

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
y read my email, but spill has > nothing to do with the shuffle files on disk. It was for the partitioning > (i.e. sorting) process. If that flag is off, Spark will just run out of > memory when data doesn't fit in memory. > > On Fri, Apr 1, 2016 at 3:28 PM, Michael Slavitch

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
> Are there concerns with taking that approach to test? (I don't see > any, but I am not sure if I missed something). > > > Regards, > Mridul > > > > > On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch <slavi...@gmail.com> > wrote: > > I totally disagree that i

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
common to run Spark on a small number of beefy > nodes (e.g. 2 nodes each with 1TB of RAM). We do want to look into improving > performance for those. Meantime, you can set up local ramdisks on each node > for shuffle writes. > > > > On Fri, Apr 1, 2016 at 11:32 AM, Michael S
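
A sketch of that ramdisk workaround, assuming a tmpfs is already mounted at /mnt/ramdisk on every node; the mount point and path are illustrative:

    import org.apache.spark.SparkConf

    // Point shuffle scratch space at a node-local ramdisk.
    val conf = new SparkConf()
      .set("spark.local.dir", "/mnt/ramdisk/spark")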

Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
Hello; I’m working on Spark with very large memory systems (2TB+) and notice that Spark spills to disk in shuffle. Is there a way to force Spark to stay in memory when doing shuffle operations? The goal is to keep the shuffle data either in the heap or in off-heap memory (in 1.6.x) and
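
For the 1.6.x off-heap option mentioned here, Spark 1.6 exposes the settings below; the size is illustrative, and note this governs execution/storage memory rather than guaranteeing shuffle files never touch disk:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.memory.offHeap.enabled", "true")
      .set("spark.memory.offHeap.size", "274877906944")  // bytes (256 GB)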
