Re: Memory allocation error with Spark 1.5

2015-08-06 Thread Alexis Seigneurin
Works like a charm. Thanks, Reynold, for the quick and efficient response!

Alexis

2015-08-05 19:19 GMT+02:00 Reynold Xin r...@databricks.com:

 In Spark 1.5, we have a new way to manage memory (part of Project
 Tungsten). The default unit of memory allocation is 64 MB, which is way too
 high when you only have 1 GB of memory allocated in total and more than 4
 threads.

 We will reduce the default page size before releasing 1.5. For now, you
 can just set the spark.buffer.pageSize variable to a lower value (e.g. 16m).


 https://github.com/apache/spark/blob/702aa9d7fb16c98a50e046edfd76b8a7861d0391/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala#L125

 On Wed, Aug 5, 2015 at 9:25 AM, Alexis Seigneurin aseigneu...@ippon.fr
 wrote:

 Hi,

 I'm receiving a memory allocation error with a recent build of Spark 1.5:

 java.io.IOException: Unable to acquire 67108864 bytes of memory
   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)


 The issue appears when joining two datasets: one with 6084 records, the
 other with 200 records. I'm expecting to receive 200 records in the
 result.

 I'm using a homemade build prepared from branch-1.5 with commit ID
 eedb996. I have run mvn -DskipTests clean install to generate that
 build.

 Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.

 I've prepared a test case that can be built and executed very easily
 (data files are included in the repo):
 https://github.com/aseigneurin/spark-testcase

 One thing to note is that the issue arises when the master is set to
 local[*] but not when it is set to local. Both options work without problems
 with Spark 1.4, though.

 Any help will be greatly appreciated!

 Many thanks,
 Alexis





Memory allocation error with Spark 1.5

2015-08-05 Thread Alexis Seigneurin
Hi,

I'm receiving a memory allocation error with a recent build of Spark 1.5:

java.io.IOException: Unable to acquire 67108864 bytes of memory
  at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
  at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
  at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
  at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
  at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
  at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)


The issue appears when joining two datasets: one with 6084 records, the other
with 200 records. I'm expecting to receive 200 records in the result.
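
A minimal sketch of the kind of join this describes, with made-up column
names and generated data (the actual code and data are in the repository
linked below):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object JoinRepro {
      def main(args: Array[String]): Unit = {
        // Hypothetical data standing in for the real datasets.
        val sc = new SparkContext(
          new SparkConf().setAppName("join-repro").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        val large = sc.parallelize(1 to 6084)
          .map(i => (i, s"value_$i")).toDF("key", "payload")
        val small = sc.parallelize(1 to 200)
          .map(i => (i, s"label_$i")).toDF("key", "label")

        // Joining the 200-record dataset with the 6084-record one should
        // yield 200 rows. The real test case hits TungstenSort during this
        // kind of join; whether this toy version does depends on the plan
        // chosen (broadcast vs sort-merge join).
        val joined = small.join(large, "key")
        println(joined.count()) // expected: 200

        sc.stop()
      }
    }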

I'm using a homemade build prepared from branch-1.5 with commit ID
eedb996. I have run mvn -DskipTests clean install to generate that
build.

Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.

I've prepared a test case that can be built and executed very easily (data
files are included in the repo):
https://github.com/aseigneurin/spark-testcase

One thing to note is that the issue arises when the master is set to
local[*] but not when it is set to local. Both options work without problems
with Spark 1.4, though.
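
For reference, the only difference between the failing and the working run
is the master URL, e.g.:

    import org.apache.spark.SparkConf

    // Illustrative configs only: the job fails with "local[*]" (one task
    // thread per core) but completes with "local" (a single task thread).
    val failing = new SparkConf().setAppName("join-testcase").setMaster("local[*]")
    val working = new SparkConf().setAppName("join-testcase").setMaster("local")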

Any help will be greatly appreciated!

Many thanks,
Alexis


Re: Memory allocation error with Spark 1.5

2015-08-05 Thread Reynold Xin
In Spark 1.5, we have a new way to manage memory (part of Project
Tungsten). The default unit of memory allocation is 64 MB, which is way too
high when you only have 1 GB of memory allocated in total and more than 4
threads.

We will reduce the default page size before releasing 1.5. For now, you
can just set the spark.buffer.pageSize variable to a lower value (e.g. 16m).

https://github.com/apache/spark/blob/702aa9d7fb16c98a50e046edfd76b8a7861d0391/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala#L125
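
A minimal sketch of the workaround, assuming a plain SparkConf-based setup
(the application name is illustrative, and 16m is just the example value
mentioned above):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("join-testcase")
      .setMaster("local[*]")
      // Lower the Tungsten page size from the 64m default of this
      // pre-release build, as suggested above.
      .set("spark.buffer.pageSize", "16m")

    val sc = new SparkContext(conf)

The same value can also be passed on the command line via
spark-submit --conf spark.buffer.pageSize=16m.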

On Wed, Aug 5, 2015 at 9:25 AM, Alexis Seigneurin aseigneu...@ippon.fr
wrote:

 Hi,

 I'm receiving a memory allocation error with a recent build of Spark 1.5:

 java.io.IOException: Unable to acquire 67108864 bytes of memory
   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)


 The issue appears when joining two datasets: one with 6084 records, the
 other with 200 records. I'm expecting to receive 200 records in the
 result.

 I'm using a homemade build prepared from branch-1.5 with commit ID
 eedb996. I have run mvn -DskipTests clean install to generate that
 build.

 Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.

 I've prepared a test case that can be built and executed very easily (data
 files are included in the repo):
 https://github.com/aseigneurin/spark-testcase

 One thing to note is that the issue arises when the master is set to
 local[*] but not when it is set to local. Both options work without problems
 with Spark 1.4, though.

 Any help will be greatly appreciated!

 Many thanks,
 Alexis