That test explicitly sets the number of executor cores to 32.

object TestHive
  extends TestHiveContext(
    new SparkContext(
      System.getProperty("spark.sql.test.master", "local[32]"),
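
So the suite runs with 32 task slots no matter how many cores the box actually has. The master can be overridden through that system property on the test JVM, roughly like this (a sketch -- getting the -D flag into the forked test JVM depends on how the maven plugins wire up argLine):

// TestHive falls back to local[32] unless the property is set, e.g. with
// -Dspark.sql.test.master=local[8] to match an 8-core machine.
val master = System.getProperty("spark.sql.test.master", "local[32]")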


On Mon, Sep 14, 2015 at 11:22 PM, Reynold Xin <r...@databricks.com> wrote:
> Yea, I think this is where the heuristic is failing -- it uses 8 cores to
> approximate the number of active tasks, but the tests are somehow using 32
> (maybe because they explicitly set it to that, or you set it yourself? I'm
> not sure which one).
>
> On Mon, Sep 14, 2015 at 11:06 PM, Pete Robbins <robbin...@gmail.com> wrote:
>>
>> Reynold, thanks for replying.
>>
>> getPageSize parameters: maxMemory=515396075, numCores=0
>> Calculated values: cores=8, default=4194304
>>
>> So am I getting a large page size as I only have 8 cores?
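
Working those numbers backwards does suggest exactly that (a quick sketch; the divide-by-16 safety factor and the round-up-to-a-power-of-2 step are inferred from the logged default of 4194304, not checked against the source):

// Reproduce the logged default page size from the logged inputs.
val maxMemory = 515396075L   // from the debug output above
val cores = 8                // availableProcessors(), since numCores == 0
def nextPowerOf2(n: Long): Long =
  if (java.lang.Long.bitCount(n) == 1) n else java.lang.Long.highestOneBit(n) << 1
val default = nextPowerOf2(maxMemory / cores / 16)   // 4194304, i.e. a 4MB page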
>>
>> On 15 September 2015 at 00:40, Reynold Xin <r...@databricks.com> wrote:
>>>
>>> Pete - can you do me a favor?
>>>
>>>
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
>>>
>>> Print the parameters that are passed into the getPageSize function, and
>>> check their values.
>>>
>>> On Mon, Sep 14, 2015 at 4:32 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>
>>>> Is this on latest master / branch-1.5?
>>>>
>>>> Out of the box we reserve only 16% (0.2 * 0.8) of the memory for
>>>> execution (e.g. aggregate, join) / shuffle sorting. With a 3GB heap, that's
>>>> 480MB. So each task gets 480MB / 32 = 15MB, and each operator reserves at
>>>> least one page for execution. If your page size is 4MB, it only takes 3
>>>> operators to use up a task's memory.
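
Spelling that arithmetic out (just the numbers above, nothing Spark-specific):

val heap = 3L * 1024 * 1024 * 1024
val executionMemory = (heap * 0.2 * 0.8).toLong   // the ~480MB figure above
val perTask  = executionMemory / 32               // the ~15MB-per-task figure
val pageSize = 4L * 1024 * 1024                   // the 4MB page from the error
val pages    = perTask / pageSize                 // 3 -- three operators holding one page each use it up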
>>>>
>>>> The thing is, the page size is dynamically determined -- and in your case
>>>> it should be smaller than 4MB.
>>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
>>>>
>>>> Maybe there is a place in the maven tests where we explicitly set the
>>>> page size (spark.buffer.pageSize) to 4MB? If so, we need to find it and
>>>> just remove it.
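
A plain grep over the tree should be enough to find any such place, e.g.:

git grep -n "spark.buffer.pageSize"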
>>>>
>>>>
>>>> On Mon, Sep 14, 2015 at 4:16 AM, Pete Robbins <robbin...@gmail.com>
>>>> wrote:
>>>>>
>>>>> I keep hitting errors running the tests on 1.5 such as
>>>>>
>>>>>
>>>>> - join31 *** FAILED ***
>>>>>   Failed to execute query using catalyst:
>>>>>   Error: Job aborted due to stage failure: Task 9 in stage 3653.0
>>>>> failed 1 times, most recent failure: Lost task 9.0 in stage 3653.0 (TID
>>>>> 123363, localhost): java.io.IOException: Unable to acquire 4194304 bytes of memory
>>>>>       at
>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
>>>>>
>>>>>
>>>>> This is using the command
>>>>> build/mvn -Pyarn -Phadoop-2.2 -Phive -Phive-thriftserver  test
>>>>>
>>>>>
>>>>> I don't see these errors in any of the AMPLab Jenkins builds. Do those
>>>>> builds have any configuration/environment that I may be missing? My build
>>>>> is running with whatever defaults are in the top-level pom.xml, e.g. -Xmx3G.
>>>>>
>>>>> I can make these tests pass by setting spark.shuffle.memoryFraction=0.6
>>>>> in the HiveCompatibilitySuite rather than the default 0.2 value.
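
For anyone trying the same workaround, it boils down to something like this on the SparkConf the test context is built from (a sketch; the exact hook depends on where the suite creates its SparkContext, and the setting has to be in place before the context starts):

import org.apache.spark.SparkConf

// Raise the execution/shuffle fraction from the 0.2 default, for the test run only.
val conf = new SparkConf().set("spark.shuffle.memoryFraction", "0.6")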
>>>>>
>>>>> Trying to analyze what is going on with the test, it seems related to the
>>>>> number of active tasks, which rises to 32, so the ShuffleMemoryManager
>>>>> allows less memory per task even though most of those tasks have no
>>>>> memory allocated to them.
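
That matches my understanding of the per-task bounds: with N active tasks a task can get at most 1/N of the pool but is only guaranteed 1/(2N) of it. A rough sketch with the logged pool size:

val maxMemory   = 515396075L   // execution/shuffle pool from the debug output
val activeTasks = 32
val upperBound  = maxMemory / activeTasks         // ~15.4MB best case per task
val guaranteed  = maxMemory / (2 * activeTasks)   // ~7.7MB guaranteed -- less than two 4MB pages

So a task that already holds one 4MB page can be refused its next one, which is exactly the "Unable to acquire 4194304 bytes of memory" failure above.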
>>>>>
>>>>> Has anyone seen issues like this before?
>>>>
>>>>
>>>
>>
>



-- 
Marcelo
