Reynold, thanks for replying.

getPageSize parameters: maxMemory=515396075, numCores=0
Calculated values: cores=8, default=4194304

So am I getting a large page size as I only have 8 cores?
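
For what it's worth, here is a rough sketch of the arithmetic as I read it
(the 16x safety factor, the power-of-two rounding and the names are my own
inference from that line, not a copy of the real implementation):

  // Illustrative only: split execution memory across cores and a safety
  // factor, round up to the next power of 2, clamp to [1MB, 64MB].
  def estimatePageSize(maxMemory: Long, numCores: Int): Long = {
    val minPageSize = 1L * 1024 * 1024    // 1MB
    val maxPageSize = 64L * minPageSize   // 64MB
    val cores =
      if (numCores > 0) numCores else Runtime.getRuntime.availableProcessors()
    val safetyFactor = 16                 // assumed
    val raw = maxMemory / cores / safetyFactor
    val rounded = java.lang.Long.highestOneBit(raw - 1) << 1  // next power of 2
    math.min(maxPageSize, math.max(minPageSize, rounded))
  }

  // numCores=0 falls back to availableProcessors, which is 8 on my machine:
  println(estimatePageSize(515396075L, 8))  // 515396075 / 8 / 16 -> 4194304 (4MB)

If that reading is right, 8 cores with ~515MB of execution memory lands
exactly on the 4MB default I printed, and a machine with more cores would get
a smaller page.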

On 15 September 2015 at 00:40, Reynold Xin <r...@databricks.com> wrote:

> Pete - can you do me a favor?
>
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
>
> Print the parameters that are passed into the getPageSize function, and
> check their values.
>
> On Mon, Sep 14, 2015 at 4:32 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> Is this on latest master / branch-1.5?
>>
>> out of the box we reserve only 16% (0.2 * 0.8) of the memory for
>> execution (e.g. aggregate, join) / shuffle sorting. With a 3GB heap, that's
>> 480MB. So each task gets 480MB / 32 = 15MB, and each operator reserves at
>> least one page for execution. If your page size is 4MB, it only takes 3
>> operators to use up a task's memory.
>>
>> The thing is, the page size is dynamically determined -- and in your case
>> it should be smaller than 4MB.
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
>>
>> Maybe there is a place in the Maven tests where we explicitly set the page
>> size (spark.buffer.pageSize) to 4MB? If so, we need to find it and just
>> remove it.
>>
>>
>> On Mon, Sep 14, 2015 at 4:16 AM, Pete Robbins <robbin...@gmail.com>
>> wrote:
>>
>>> I keep hitting errors running the tests on 1.5 such as
>>>
>>>
>>> - join31 *** FAILED ***
>>>   Failed to execute query using catalyst:
>>>   Error: Job aborted due to stage failure: Task 9 in stage 3653.0 failed
>>> 1 times, most recent failure: Lost task 9.0 in stage 3653.0 (TID 123363,
>>> localhost): java.io.IOException: Unable to acquire 4194304 bytes of memory
>>>       at
>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
>>>
>>>
>>> This is using the command
>>> build/mvn -Pyarn -Phadoop-2.2 -Phive -Phive-thriftserver  test
>>>
>>>
>>> I don't see these errors in any of the AMPLab Jenkins builds. Do those
>>> builds have any configuration/environment that I may be missing? My build
>>> is running with whatever defaults are in the top-level pom.xml, e.g. -Xmx3G.
>>>
>>> I can make these tests pass by setting spark.shuffle.memoryFraction=0.6
>>> in the HiveCompatibilitySuite rather than the default 0.2 value.
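>>>
>>> (Illustrative only -- roughly what such an override looks like on the
>>> SparkConf used to build a local test context; not the exact change I made
>>> in HiveCompatibilitySuite:)
>>>
>>>   import org.apache.spark.SparkConf
>>>
>>>   val testConf = new SparkConf()
>>>     .setMaster("local[2]")
>>>     .setAppName("HiveCompatibilitySuite-sketch")
>>>     .set("spark.shuffle.memoryFraction", "0.6")  // default is 0.2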
>>>
>>> Trying to analyze what is going on with the test, it appears to be related
>>> to the number of active tasks, which seems to rise to 32; the
>>> ShuffleMemoryManager then allows less memory per task, even though most of
>>> those tasks do not have any memory allocated to them.
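>>>
>>> My mental model of the per-task policy (illustrative names, and only my
>>> reading of it -- the real code may differ) is roughly:
>>>
>>>   // With N active tasks, each task is capped at 1/N of the execution
>>>   // memory and only guaranteed 1/(2N) before it has to block or spill.
>>>   def perTaskBounds(maxMemory: Long, activeTasks: Int): (Long, Long) =
>>>     (maxMemory / (2L * activeTasks), maxMemory / activeTasks)
>>>
>>>   // ~480MB across 32 active tasks: floor ~7.5MB, cap ~15MB, so under this
>>>   // model a task holding three 4MB pages already fails on the fourth.
>>>   println(perTaskBounds(480L * 1024 * 1024, 32))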
>>>
>>> Has anyone seen issues like this before?
>>>
>>
>>
>
