with the locality issue resolved, i am still struggling with the new memory
management.

i am seeing tasks on tiny amounts of data take 15 seconds, of which 14 are
spent in GC. with the legacy memory management (spark.memory.useLegacyMode
= true) they complete in 1-2 seconds.

since we are permanently caching a very large number of RDDs, my suspicion
is that with the new memory management these cached RDDs happily gobble up
all the memory, and need to be evicted to run my small job, leading to the
slowness.

i can revert to the legacy memory management mode, so this is not blocking
me for now, but i am worried that at some point the legacy memory management
will be deprecated and eventually removed, and then i am stuck with this
performance issue.
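
for reference, a minimal sketch (scala) of how one can pin the two settings
discussed in this thread when building the SparkContext. the two config keys
are the ones from this thread; the app name, master and everything else below
are just placeholders, not our real setup:

  import org.apache.spark.{SparkConf, SparkContext}

  // sketch only: app name and master are placeholders
  val conf = new SparkConf()
    .setAppName("semi-realtime-app")
    .setMaster("local[*]")
    // revert to the pre-1.6 memory manager (the fast case in my tests)
    .set("spark.memory.useLegacyMode", "true")
    // undocumented switch discussed below; turns off reduce-side locality
    .set("spark.shuffle.reduceLocality.enabled", "false")

  val sc = new SparkContext(conf)

the same two settings can also be passed to spark-submit with --conf or put
in spark-defaults.conf.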

On Mon, Feb 29, 2016 at 12:47 PM, Koert Kuipers <ko...@tresata.com> wrote:

> setting spark.shuffle.reduceLocality.enabled=false worked for me, thanks
>
>
> is there any reference to the benefits of setting reduceLocality to true?
> i am tempted to disable it across the board.
>
> On Mon, Feb 29, 2016 at 9:51 AM, Yin Yang <yy201...@gmail.com> wrote:
>
>> The default value for spark.shuffle.reduceLocality.enabled is true.
>>
>> To reduce the surprise for users of 1.5 and earlier releases, should the
>> default value be set to false?
>>
>> On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga <lio...@taboola.com> wrote:
>>
>>> Hi Koert,
>>> Try spark.shuffle.reduceLocality.enabled=false
>>> This is an undocumented configuration.
>>> See:
>>> https://github.com/apache/spark/pull/8280
>>> https://issues.apache.org/jira/browse/SPARK-10567
>>>
>>> It solved the problem for me (both with and without legacy memory mode).
>>>
>>>
>>> On Sun, Feb 28, 2016 at 11:16 PM, Koert Kuipers <ko...@tresata.com>
>>> wrote:
>>>
>>>> i find it particularly confusing that a new memory management module
>>>> would change where tasks end up running. it's not like the hash
>>>> partitioner got replaced. i can switch back and forth between legacy and
>>>> "new" memory management and see the distribution change... fully
>>>> reproducible.
>>>>
>>>> On Sun, Feb 28, 2016 at 11:24 AM, Lior Chaga <lio...@taboola.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> I've experienced a similar problem upgrading from spark 1.4 to spark
>>>>> 1.6.
>>>>> The data is not evenly distributed across executors, but in my case it
>>>>> is also reproducible with legacy mode.
>>>>> I also tried 1.6.1 RC1, with the same results.
>>>>>
>>>>> Still looking for resolution.
>>>>>
>>>>> Lior
>>>>>
>>>>> On Fri, Feb 19, 2016 at 2:01 AM, Koert Kuipers <ko...@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> looking at the cached RDD i see a similar story: with useLegacyMode =
>>>>>> true the cached RDD is spread out across 10 executors, but with
>>>>>> useLegacyMode = false the data for the cached RDD sits on only 3
>>>>>> executors (the rest all show 0s). my cached RDD is a key-value RDD that
>>>>>> got partitioned (hash partitioner, 50 partitions) before being cached.
>>>>>>
>>>>>> On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> hello all,
>>>>>>> we are just testing a semi-realtime application (it should return
>>>>>>> results in less than 20 seconds from cached RDDs) on spark 1.6.0.
>>>>>>> before this it ran on spark 1.5.1.
>>>>>>>
>>>>>>> in spark 1.6.0 the performance is similar to 1.5.1 if i set
>>>>>>> spark.memory.useLegacyMode = true; however, if i switch to
>>>>>>> spark.memory.useLegacyMode = false the queries take about 50% to 100%
>>>>>>> more time.
>>>>>>>
>>>>>>> the issue becomes clear when i focus on a single stage: the
>>>>>>> individual tasks are not slower at all, but they run on fewer
>>>>>>> executors. in my test query i have 50 tasks and 10 executors. both
>>>>>>> with useLegacyMode = true and useLegacyMode = false the tasks finish
>>>>>>> in about 3 seconds and show as running PROCESS_LOCAL. however, with
>>>>>>> useLegacyMode = false the tasks run on just 3 of the 10 executors,
>>>>>>> while with useLegacyMode = true they spread out across all 10. all the
>>>>>>> tasks running on just a few executors is what leads to the slower
>>>>>>> results.
>>>>>>>
>>>>>>> any idea why this would happen?
>>>>>>> thanks! koert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
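
ps: for anyone trying to reproduce the layout described earlier in the
thread, here is a minimal sketch of the cached RDD setup (reusing the
SparkContext sc from the sketch near the top of this mail; the data is made
up, only the hash partitioner with 50 partitions and the cache() call match
the setup described in the quoted mails):

  import org.apache.spark.HashPartitioner

  // made-up key-value data standing in for the real cached RDD
  val kv = sc.parallelize(1 to 1000000).map(i => (i % 500, i))

  // hash-partition into 50 partitions and cache, as described above
  val cached = kv.partitionBy(new HashPartitioner(50)).cache()

  // materialize the cache, then compare the storage tab of the UI with
  // useLegacyMode = true vs false to see how the blocks spread over executors
  cached.count()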
