Re: Memory allocation error with Spark 1.5

2015-08-06 Thread Alexis Seigneurin
Works like a charm. Thanks, Reynold, for the quick and efficient response!

Alexis

2015-08-05 19:19 GMT+02:00 Reynold Xin r...@databricks.com:

 In Spark 1.5, we have a new way to manage memory (part of Project
 Tungsten). The default unit of memory allocation is 64 MB, which is way too
 high when you only have 1 GB of memory allocated in total and more than 4
 threads.

 We will reduce the default page size before releasing 1.5. For now, you
 can just set the spark.buffer.pageSize variable to a lower value (e.g. 16m).


 https://github.com/apache/spark/blob/702aa9d7fb16c98a50e046edfd76b8a7861d0391/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala#L125

 On Wed, Aug 5, 2015 at 9:25 AM, Alexis Seigneurin aseigneu...@ippon.fr
 wrote:

 Hi,

 I'm receiving a memory allocation error with a recent build of Spark 1.5:

 java.io.IOException: Unable to acquire 67108864 bytes of memory
   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)


 The issue appears when joining two datasets: one with 6084 records, the
 other with 200 records. I'm expecting to receive 200 records in the
 result.

 I'm using a homemade build prepared from branch-1.5 with commit ID
 eedb996. I have run mvn -DskipTests clean install to generate that
 build.

 Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.

 I've prepared a test case that can be built and executed very easily
 (data files are included in the repo):
 https://github.com/aseigneurin/spark-testcase

 One thing to note is that the issue arises when the master is set to
 local[*] but not when it is set to local. Both options work without problems
 with Spark 1.4, though.

 Any help will be greatly appreciated!

 Many thanks,
 Alexis





Memory allocation error with Spark 1.5

2015-08-05 Thread Alexis Seigneurin
Hi,

I'm receiving a memory allocation error with a recent build of Spark 1.5:

java.io.IOException: Unable to acquire 67108864 bytes of memory
  at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
  at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
  at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
  at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
  at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
  at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)


The issue appears when joining two datasets: one with 6084 records, the other
with 200 records. I'm expecting to receive 200 records in the result.
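
A minimal sketch of the kind of join this describes, with made-up column
names and generated data (the actual code and data are in the repository
linked below):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object JoinRepro {
      def main(args: Array[String]): Unit = {
        // Hypothetical data standing in for the real datasets.
        val sc = new SparkContext(
          new SparkConf().setAppName("join-repro").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        val large = sc.parallelize(1 to 6084)
          .map(i => (i, s"value_$i")).toDF("key", "payload")
        val small = sc.parallelize(1 to 200)
          .map(i => (i, s"label_$i")).toDF("key", "label")

        // Joining the 200-record dataset with the 6084-record one should
        // yield 200 rows. The real test case hits TungstenSort during this
        // kind of join; whether this toy version does depends on the plan
        // chosen (broadcast vs sort-merge join).
        val joined = small.join(large, "key")
        println(joined.count()) // expected: 200

        sc.stop()
      }
    }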

I'm using a homemade build prepared from branch-1.5 with commit ID
eedb996. I have run mvn -DskipTests clean install to generate that
build.

Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.

I've prepared a test case that can be built and executed very easily (data
files are included in the repo):
https://github.com/aseigneurin/spark-testcase

One thing to note is that the issue arises when the master is set to
local[*] but not when it is set to local. Both options work without problems
with Spark 1.4, though.
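
For reference, the only difference between the failing and the working run
is the master URL, e.g.:

    import org.apache.spark.SparkConf

    // Illustrative configs only: the job fails with "local[*]" (one task
    // thread per core) but completes with "local" (a single task thread).
    val failing = new SparkConf().setAppName("join-testcase").setMaster("local[*]")
    val working = new SparkConf().setAppName("join-testcase").setMaster("local")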

Any help will be greatly appreciated!

Many thanks,
Alexis


Re: Memory allocation error with Spark 1.5

2015-08-05 Thread Reynold Xin
In Spark 1.5, we have a new way to manage memory (part of Project
Tungsten). The default unit of memory allocation is 64 MB, which is way too
high when you only have 1 GB of memory allocated in total and more than 4
threads.

We will reduce the default page size before releasing 1.5. For now, you
can just set the spark.buffer.pageSize variable to a lower value (e.g. 16m).

https://github.com/apache/spark/blob/702aa9d7fb16c98a50e046edfd76b8a7861d0391/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala#L125
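
A minimal sketch of the workaround, assuming a plain SparkConf-based setup
(the application name is illustrative, and 16m is just the example value
mentioned above):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("join-testcase")
      .setMaster("local[*]")
      // Lower the Tungsten page size from the 64m default of this
      // pre-release build, as suggested above.
      .set("spark.buffer.pageSize", "16m")

    val sc = new SparkContext(conf)

The same value can also be passed on the command line via
spark-submit --conf spark.buffer.pageSize=16m.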

On Wed, Aug 5, 2015 at 9:25 AM, Alexis Seigneurin aseigneu...@ippon.fr
wrote:

 Hi,

 I'm receiving a memory allocation error with a recent build of Spark 1.5:

 java.io.IOException: Unable to acquire 67108864 bytes of memory
   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)


 The issue appears when joining two datasets: one with 6084 records, the
 other with 200 records. I'm expecting to receive 200 records in the
 result.

 I'm using a homemade build prepared from branch-1.5 with commit ID
 eedb996. I have run mvn -DskipTests clean install to generate that
 build.

 Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.

 I've prepared a test case that can be built and executed very easily (data
 files are included in the repo):
 https://github.com/aseigneurin/spark-testcase

 One thing to note is that the issue arises when the master is set to
 local[*] but not when it is set to local. Both options work without problems
 with Spark 1.4, though.

 Any help will be greatly appreciated!

 Many thanks,
 Alexis