In Spark 1.5, we have a new way to manage memory (part of Project
Tungsten). The default unit of memory allocation is a 64 MB page, which is
far too large when you only have 1 GB of memory in total and more than 4
threads competing for it; the 67108864 bytes your stack trace fails to
acquire is exactly one such page.
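
Roughly, assuming the 1.x defaults (spark.shuffle.memoryFraction = 0.2 and
spark.shuffle.safetyFraction = 0.8), only about

    1 GB * 0.2 * 0.8 = ~160 MB

of the heap is available for these pages, so a handful of tasks each asking
for a 64 MB page is already enough to exhaust the pool.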

We will reduce the default page size before releasing 1.5. For now, you
can simply set the spark.buffer.pageSize property to a lower value (e.g. 16m).

https://github.com/apache/spark/blob/702aa9d7fb16c98a50e046edfd76b8a7861d0391/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala#L125
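
In case it helps, here is a minimal sketch of how you could set it when
building the context yourself (the app name below is just illustrative, not
from your repo; you can equally pass --conf spark.buffer.pageSize=16m to
spark-submit):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Shrink Tungsten's page size so each task needs less memory per page.
    val conf = new SparkConf()
      .setAppName("join-testcase")          // illustrative name
      .setMaster("local[*]")
      .set("spark.buffer.pageSize", "16m")

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)     // run the failing join on top of this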

On Wed, Aug 5, 2015 at 9:25 AM, Alexis Seigneurin <aseigneu...@ippon.fr>
wrote:

> Hi,
>
> I'm receiving a memory allocation error with a recent build of Spark 1.5:
>
> java.io.IOException: Unable to acquire 67108864 bytes of memory
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
>   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
>   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
>   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
>   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)
>
>
> The issue appears when joining 2 datasets: one with 6084 records, the
> other with 200 records. I'm expecting to receive 200 records in the
> result.
>
> I'm using a homemade build prepared from "branch-1.5" with commit ID
> "eedb996". I have run "mvn -DskipTests clean install" to generate that
> build.
>
> Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.
>
> I've prepared a test case that can be built and executed very easily (data
> files are included in the repo):
> https://github.com/aseigneurin/spark-testcase
>
> One thing to note is that the issue arises when the master is set to
> "local[*]" but not when it is set to "local". Both options work without a
> problem with Spark 1.4, though.
>
> Any help will be greatly appreciated!
>
> Many thanks,
> Alexis
>
