Re: spark 1.6 new memory management - some issues with tasks not using all executors
No reference. I opened a ticket about the missing documentation for it, and was answered by Sean Owen that this is not meant for Spark users. I explained that it's an issue, but no news so far.

As for the memory management, I'm not experienced with it, but I suggest you read:
http://0x0fff.com/spark-memory-management/
http://0x0fff.com/spark-architecture/

It could be that the effective default storage memory in Spark 1.6 is a bit lower than in Spark 1.5, and your application can't borrow from the execution memory.
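For reference, the storage/execution split in the 1.6 unified memory manager is controlled by two settings. Below is a minimal sketch, assuming the 1.6.0 defaults of 0.75 for spark.memory.fraction and 0.5 for spark.memory.storageFraction (worth verifying against your release):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of the Spark 1.6 unified memory knobs.
    // Assumed 1.6.0 defaults: fraction = 0.75, storageFraction = 0.5.
    val conf = new SparkConf()
      .setAppName("unified-memory-sketch")
      // Share of (heap minus ~300 MB reserved) used for execution + storage combined.
      .set("spark.memory.fraction", "0.75")
      // Portion of that pool that cached blocks keep even under execution pressure;
      // storage may borrow beyond it but can be evicted back down to this floor.
      .set("spark.memory.storageFraction", "0.5")
    val sc = new SparkContext(conf)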
On Thu, Mar 3, 2016 at 2:35 AM, Koert Kuipers wrote:
> With the locality issue resolved, I am still struggling with the new memory management.
Re: spark 1.6 new memory management - some issues with tasks not using all executors
With the locality issue resolved, I am still struggling with the new memory management.

I am seeing tasks on tiny amounts of data take 15 seconds, of which 14 are spent in GC. With the legacy memory management (spark.memory.useLegacyMode = true) they complete in 1-2 seconds.

Since we are permanently caching a very large number of RDDs, my suspicion is that with the new memory management these cached RDDs happily gobble up all the memory and need to be evicted to run my small job, leading to the slowness.

I can revert to legacy memory management mode, so this is not blocking, but I am worried that at some point legacy memory management will be deprecated and then I am stuck with this performance issue.
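A minimal sketch of the revert, including the pre-1.6 static fractions; the 0.6/0.2 values are assumed defaults from the 1.5 line and worth double-checking:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: opt back into the pre-1.6 static memory manager.
    // Assumed legacy defaults: storage 0.6, shuffle 0.2 (verify for your release).
    val conf = new SparkConf()
      .setAppName("legacy-memory-sketch")
      .set("spark.memory.useLegacyMode", "true")
      // Fixed heap fraction reserved for cached RDD blocks.
      .set("spark.storage.memoryFraction", "0.6")
      // Fixed heap fraction reserved for shuffle and aggregation buffers.
      .set("spark.shuffle.memoryFraction", "0.2")
    val sc = new SparkContext(conf)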
On Mon, Feb 29, 2016 at 12:47 PM, Koert Kuipers wrote:
> Setting spark.shuffle.reduceLocality.enabled=false worked for me, thanks.
Re: spark 1.6 new memory management - some issues with tasks not using all executors
Setting spark.shuffle.reduceLocality.enabled=false worked for me, thanks.

Is there any reference to the benefits of setting reduceLocality to true? I am tempted to disable it across the board.

On Mon, Feb 29, 2016 at 9:51 AM, Yin Yang wrote:
> The default value for spark.shuffle.reduceLocality.enabled is true.
Re: spark 1.6 new memory management - some issues with tasks not using all executors
The default value for spark.shuffle.reduceLocality.enabled is true.

To reduce surprise to users of 1.5 and earlier releases, should the default value be set to false?

On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga wrote:
> Hi Koert,
> Try spark.shuffle.reduceLocality.enabled=false. This is an undocumented configuration.
Re: spark 1.6 new memory management - some issues with tasks not using all executors
Hi Koert,

Try spark.shuffle.reduceLocality.enabled=false. This is an undocumented configuration. See:
https://github.com/apache/spark/pull/8280
https://issues.apache.org/jira/browse/SPARK-10567

It solved the problem for me (both with and without legacy memory mode).

On Sun, Feb 28, 2016 at 11:16 PM, Koert Kuipers wrote:
> I find it particularly confusing that a new memory management module would change the locations. It's not like the hash partitioner got replaced.
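A minimal sketch of applying the workaround programmatically; since the flag is undocumented, treat its exact behavior as an assumption to verify against SPARK-10567:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: disable locality-aware scheduling of reduce tasks.
    // Undocumented flag (see SPARK-10567); behavior may change between releases.
    val conf = new SparkConf()
      .setAppName("reduce-locality-sketch")
      .set("spark.shuffle.reduceLocality.enabled", "false")
    val sc = new SparkContext(conf)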
Re: spark 1.6 new memory management - some issues with tasks not using all executors
I find it particularly confusing that a new memory management module would change the locations. It's not like the hash partitioner got replaced. I can switch back and forth between legacy and "new" memory management and see the distribution change... fully reproducible.

On Sun, Feb 28, 2016 at 11:24 AM, Lior Chaga wrote:
> Hi,
> I've experienced a similar problem upgrading from Spark 1.4 to Spark 1.6.
Re: spark 1.6 new memory management - some issues with tasks not using all executors
Hi,

I've experienced a similar problem upgrading from Spark 1.4 to Spark 1.6. The data is not evenly distributed across executors, but in my case it also reproduced with legacy mode. I also tried 1.6.1 RC1, with the same results.

Still looking for a resolution.

Lior

On Fri, Feb 19, 2016 at 2:01 AM, Koert Kuipers wrote:
> Looking at the cached RDD I see a similar story: with useLegacyMode = true the cached RDD is spread out across 10 executors, but with useLegacyMode = false the data for the cached RDD sits on only 3 executors (the rest all show 0s).
Re: spark 1.6 new memory management - some issues with tasks not using all executors
Looking at the cached RDD I see a similar story: with useLegacyMode = true the cached RDD is spread out across 10 executors, but with useLegacyMode = false the data for the cached RDD sits on only 3 executors (the rest all show 0s). My cached RDD is a key-value RDD that got partitioned (hash partitioner, 50 partitions) before being cached.

On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers wrote:
> Hello all, we are just testing a semi-realtime application (it should return results in less than 20 seconds from cached RDDs) on Spark 1.6.0.
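For context, a minimal sketch of the kind of setup described, a key-value RDD hash-partitioned into 50 partitions and then cached; the keys, values, and sizes below are illustrative, not from the original job:

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    // Sketch of the described setup: a key-value RDD, hash-partitioned into
    // 50 partitions, then cached. Data here is illustrative only.
    val conf = new SparkConf().setAppName("partitioned-cache-sketch")
    val sc = new SparkContext(conf)

    val kv = sc.parallelize(1 to 1000000).map(i => (i % 1000, i.toLong))
    val cached = kv
      .partitionBy(new HashPartitioner(50)) // fix the partitioning before caching
      .cache()

    cached.count() // materialize; the Storage tab then shows block distribution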
spark 1.6 new memory management - some issues with tasks not using all executors
Hello all,

We are just testing a semi-realtime application (it should return results in less than 20 seconds from cached RDDs) on Spark 1.6.0. Before this it used to run on Spark 1.5.1.

In Spark 1.6.0 the performance is similar to 1.5.1 if I set spark.memory.useLegacyMode = true; however, if I switch to spark.memory.useLegacyMode = false, the queries take about 50% to 100% more time.

The issue becomes clear when I focus on a single stage: the individual tasks are not slower at all, but they run on fewer executors. In my test query I have 50 tasks and 10 executors. Both with useLegacyMode = true and useLegacyMode = false the tasks finish in about 3 seconds and show as running PROCESS_LOCAL. However, when useLegacyMode = false the tasks run on just 3 executors out of 10, while with useLegacyMode = true they spread out across all 10 executors. All the tasks running on just a few executors leads to the slower results.

Any idea why this would happen?

Thanks!
Koert
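To put numbers on the skew without clicking through the UI, one could tally completed tasks per executor with a listener. A rough sketch against the Spark 1.x listener API; sc is the running SparkContext, and the exact event fields are worth re-checking for your version:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
    import scala.collection.mutable

    // Sketch: count finished tasks per executor to quantify the skew described
    // above. Listener callbacks arrive on a single bus thread, so a plain
    // mutable map is adequate for this rough measurement.
    val tasksPerExecutor = mutable.Map.empty[String, Int].withDefaultValue(0)

    sc.addSparkListener(new SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        tasksPerExecutor(taskEnd.taskInfo.executorId) += 1
      }
    })

    // ... run the test query, then inspect the tally:
    tasksPerExecutor.toSeq.sortBy(-_._2).foreach { case (exec, n) =>
      println(s"executor $exec ran $n tasks")
    }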