Run more, smaller executors: for example, lower `spark.executor.memory` to 32g and set `spark.executor.cores` to 2-4.
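As a rough sketch, that suggestion would look something like this on the command line. The instance/memory/core values are illustrative only, and the class name and jar are placeholders, not anything from this thread:

```shell
# Illustrative spark-submit flags for "more, smaller executors".
# 8 executors x 32g x 4 cores is an example split, not a recommendation;
# com.example.MyJob and my-job.jar are hypothetical placeholders.
spark-submit \
  --conf spark.executor.instances=8 \
  --conf spark.executor.memory=32g \
  --conf spark.executor.cores=4 \
  --class com.example.MyJob \
  my-job.jar
```

The same settings can equally be put in spark-defaults.conf or passed to SparkConf programmatically.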
Changing the driver's memory won't help because the driver doesn't participate in execution.

On Fri, Sep 30, 2016 at 2:58 PM, Babak Alipour <babak.alip...@gmail.com> wrote:

> Thank you for your replies.
>
> @Mich, using LIMIT 100 in the query prevents the exception, but given that
> there's enough memory, I don't think this should happen even without LIMIT.
>
> @Vadim, here's the full stack trace:
>
> Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes
>     at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
>     at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
>     at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>     at org.apache.spark.scheduler.Task.run(Task.scala:85)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> I'm running Spark in local mode, so there is only one executor (the driver),
> and spark.driver.memory is set to 64g. Changing the driver's memory doesn't help.
>
> Babak Alipour,
> University of Florida
>
> On Fri, Sep 30, 2016 at 2:05 PM, Vadim Semenov <vadim.seme...@datadoghq.com> wrote:
>
>> Can you post the whole exception stack trace?
>> What are your executor memory settings?
>>
>> Right now I assume that it happens in UnsafeExternalRowSorter ->
>> UnsafeExternalSorter:insertRecord
>>
>> Running more executors with a lower `spark.executor.memory` should help.
>>
>> On Fri, Sep 30, 2016 at 12:57 PM, Babak Alipour <babak.alip...@gmail.com> wrote:
>>
>>> Greetings everyone,
>>>
>>> I'm trying to read a single field of a Hive table stored as Parquet in
>>> Spark (~140GB for the entire table; this single field should be just a
>>> few GB) and look at the sorted output using the following:
>>>
>>> sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC")
>>>
>>> But this simple line of code gives:
>>>
>>> Caused by: java.lang.IllegalArgumentException: Cannot allocate a page
>>> with more than 17179869176 bytes
>>>
>>> The same error occurs for:
>>>
>>> sql("SELECT " + field + " FROM MY_TABLE").sort(field)
>>>
>>> and:
>>>
>>> sql("SELECT " + field + " FROM MY_TABLE").orderBy(field)
>>>
>>> I'm running this on a machine with more than 200GB of RAM, in local
>>> mode with spark.driver.memory set to 64g.
>>>
>>> I do not know why it cannot allocate a big enough page, and why it is
>>> trying to allocate such a big page in the first place.
>>>
>>> I hope someone with more knowledge of Spark can shed some light on this.
>>> Thank you!
>>>
>>> Best regards,
>>> Babak Alipour,
>>> University of Florida
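A note that may demystify the number in the exception: 17179869176 bytes is exactly (2^31 - 1) * 8, which appears to be Spark's hard cap on a single memory page (MAXIMUM_PAGE_SIZE_BYTES in TaskMemoryManager in the Spark 2.0 sources), so the sorter is requesting more memory in one page than Spark can ever grant regardless of heap size. A quick check of the arithmetic:

```shell
# The exception's limit is (2^31 - 1) * 8 bytes: the largest number of
# 8-byte words addressable with a 31-bit offset, just under 16 GB.
echo $(( (2 ** 31 - 1) * 8 ))   # → 17179869176
```

This is why raising spark.driver.memory doesn't help here: the failure is the per-page cap, not total memory.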