Run more, smaller executors: for example, change `spark.executor.memory` to
32g and `spark.executor.cores` to 2-4.
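
Something along these lines (a sketch only; the right numbers depend on your
cluster, and executor settings have to be in place before the SparkContext
starts, e.g. via spark-submit or spark-defaults.conf):

    import org.apache.spark.sql.SparkSession

    // Example values, not a recommendation: a smaller heap and fewer cores
    // per executor mean each task works with smaller memory pages.
    // (The master is normally supplied by spark-submit.)
    val spark = SparkSession.builder()
      .config("spark.executor.memory", "32g")
      .config("spark.executor.cores", "4")
      .getOrCreate()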

Changing the driver's memory won't help because the driver doesn't
participate in execution.

On Fri, Sep 30, 2016 at 2:58 PM, Babak Alipour <babak.alip...@gmail.com>
wrote:

> Thank you for your replies.
>
> @Mich, using LIMIT 100 in the query prevents the exception but given the
> fact that there's enough memory, I don't think this should happen even
> without LIMIT.
>
> @Vadim, here's the full stack trace:
>
> Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes
>         at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
>         at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
>         at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>         at org.apache.spark.scheduler.Task.run(Task.scala:85)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> I'm running Spark in local mode, so there is only one executor (the
> driver), and spark.driver.memory is set to 64g. Changing the driver's
> memory doesn't help.
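>
> For context, the session itself is created along these lines (a sketch;
> the actual launch flags aren't shown, and spark.driver.memory has to be
> set before the JVM starts, e.g. with --driver-memory on spark-submit or
> spark-shell):
>
>     import org.apache.spark.sql.SparkSession
>
>     // local[*] runs the driver and executor in a single JVM, so the
>     // driver's 64g heap is all the memory tasks can use.
>     val spark = SparkSession.builder()
>       .master("local[*]")
>       .getOrCreate()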
>
> *Babak Alipour,*
> *University of Florida*
>
> On Fri, Sep 30, 2016 at 2:05 PM, Vadim Semenov <vadim.seme...@datadoghq.com>
> wrote:
>
>> Can you post the whole exception stack trace?
>> What are your executor memory settings?
>>
>> Right now I assume that it happens in UnsafeExternalRowSorter ->
>> UnsafeExternalSorter:insertRecord
>>
>> Running more executors with lower `spark.executor.memory` should help.
>>
>>
>> On Fri, Sep 30, 2016 at 12:57 PM, Babak Alipour <babak.alip...@gmail.com>
>> wrote:
>>
>>> Greetings everyone,
>>>
>>> I'm trying to read a single field of a Hive table stored as Parquet in
>>> Spark (~140GB for the entire table; this single field should be just a
>>> few GB) and look at the sorted output using the following:
>>>
>>> sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC")
>>>
>>> But this simple line of code gives:
>>>
>>> Caused by: java.lang.IllegalArgumentException: Cannot allocate a page
>>> with more than 17179869176 bytes
>>>
>>> Same error for:
>>>
>>> sql("SELECT " + field + " FROM MY_TABLE).sort(field)
>>>
>>> and:
>>>
>>> sql("SELECT " + field + " FROM MY_TABLE).orderBy(field)
>>>
>>>
>>> I'm running this on a machine with more than 200GB of RAM, in local mode
>>> with spark.driver.memory set to 64g.
>>>
>>> I do not know why it cannot allocate a big enough page, or why it is
>>> trying to allocate such a big page in the first place.
>>>
>>> I hope someone with more knowledge of Spark can shed some light on this.
>>> Thank you!
>>>
>>>
>>> *Best regards,*
>>> *Babak Alipour,*
>>> *University of Florida*
>>>
>>
>>
>
