Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-03-03 Thread Lior Chaga
No reference. I opened a ticket about the missing documentation for it, and Sean Owen answered that this is not meant for spark users. I explained that it's an issue, but no news so far. As for the memory management, I'm not experienced with it, but I suggest you read:

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-03-02 Thread Koert Kuipers
with the locality issue resolved, i am still struggling with the new memory management. i am seeing tasks on tiny amounts of data take 15 seconds, of which 14 are spent in GC. with the legacy memory management (spark.memory.useLegacyMode = true) they complete in 1 - 2 seconds. since we are
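The comparison described above can be reproduced by toggling the memory-manager flag at submit time. This is a hedged sketch: the fraction values shown are the Spark 1.6 defaults used for illustration, not tuning advice, and the application jar name is a placeholder.

```shell
# Run with the legacy (pre-1.6) memory manager, which the thread reports
# avoids the long GC pauses (app jar name is a placeholder):
spark-submit \
  --conf spark.memory.useLegacyMode=true \
  --conf spark.storage.memoryFraction=0.6 \
  --conf spark.shuffle.memoryFraction=0.2 \
  my-app.jar

# Run with the unified (1.6+) memory manager; instead of reverting,
# its own knobs can be adjusted (values shown are the 1.6 defaults):
spark-submit \
  --conf spark.memory.useLegacyMode=false \
  --conf spark.memory.fraction=0.75 \
  --conf spark.memory.storageFraction=0.5 \
  my-app.jar
```

Under legacy mode the old `spark.storage.memoryFraction` / `spark.shuffle.memoryFraction` settings apply again, while the `spark.memory.*` fractions only matter in unified mode.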

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Koert Kuipers
setting spark.shuffle.reduceLocality.enabled=false worked for me, thanks. is there any reference to the benefits of setting reduceLocality to true? i am tempted to disable it across the board. On Mon, Feb 29, 2016 at 9:51 AM, Yin Yang wrote: > The default value for

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Yin Yang
The default value for spark.shuffle.reduceLocality.enabled is true. To reduce surprise to users of 1.5 and earlier releases, should the default value be set to false? On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga wrote: > Hi Koert, > Try

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Lior Chaga
Hi Koert, Try spark.shuffle.reduceLocality.enabled=false. This is an undocumented configuration. See: https://github.com/apache/spark/pull/8280 https://issues.apache.org/jira/browse/SPARK-10567 It solved the problem for me (both with and without legacy memory mode) On Sun, Feb 28, 2016 at 11:16
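The workaround above is a submit-time setting. A minimal sketch, assuming the flag behaves as described in the linked PR and JIRA (the class and jar names below are placeholders):

```shell
# Disable reduce-task locality preferences. This flag is undocumented in
# Spark 1.6 (see SPARK-10567); app class/jar names are placeholders.
spark-submit \
  --class com.example.MyApp \
  --conf spark.shuffle.reduceLocality.enabled=false \
  my-app.jar
```

With the flag set to false, reduce tasks no longer prefer the executors holding most of their map output, which in this thread restored an even task spread across executors.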

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-28 Thread Koert Kuipers
i find it particularly confusing that a new memory management module would change the locations. it's not like the hash partitioner got replaced. i can switch back and forth between legacy and "new" memory management and see the distribution change... fully reproducible On Sun, Feb 28, 2016 at

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-28 Thread Lior Chaga
Hi, I've experienced a similar problem upgrading from spark 1.4 to spark 1.6. The data is not evenly distributed across executors, but in my case it also reproduced with legacy mode. Also tried 1.6.1 rc-1, with same results. Still looking for resolution. Lior On Fri, Feb 19, 2016 at 2:01 AM,

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-18 Thread Koert Kuipers
looking at the cached rdd i see a similar story: with useLegacyMode = true the cached rdd is spread out across 10 executors, but with useLegacyMode = false the data for the cached rdd sits on only 3 executors (the rest all show 0s). my cached RDD is a key-value RDD that got partitioned (hash

spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-18 Thread Koert Kuipers
hello all, we are just testing a semi-realtime application (it should return results in less than 20 seconds from cached RDDs) on spark 1.6.0. before this it ran on spark 1.5.1. in spark 1.6.0 the performance is similar to 1.5.1 if i set spark.memory.useLegacyMode = true, however if i