Re: Memory allocation error with Spark 1.5
Works like a charm. Thanks Reynold for the quick and efficient response!

Alexis

2015-08-05 19:19 GMT+02:00 Reynold Xin r...@databricks.com:
> In Spark 1.5, we have a new way to manage memory (part of Project Tungsten). [...]
Memory allocation error with Spark 1.5
Hi,

I'm receiving a memory allocation error with a recent build of Spark 1.5:

    java.io.IOException: Unable to acquire 67108864 bytes of memory
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
        at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
        at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
        at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)

The issue appears when joining two datasets: one with 6084 records, the other with 200 records. I'm expecting 200 records in the result.

I'm using a custom build prepared from branch-1.5 at commit eedb996, generated with mvn -DskipTests clean install. Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.

I've prepared a test case that can be built and executed very easily (data files are included in the repo): https://github.com/aseigneurin/spark-testcase

One thing to note is that the issue arises when the master is set to local[*] but not when it is set to local. Both options work without problem with Spark 1.4, though.

Any help will be greatly appreciated!

Many thanks,
Alexis
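For reference, here is a minimal sketch of the kind of job that triggers the error. The file paths, column name, and schema are placeholders; the actual, runnable test case is in the repo linked above:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object JoinTestCase {
      def main(args: Array[String]): Unit = {
        // local[*] runs one task thread per core; plain "local" runs a single thread.
        val conf = new SparkConf().setAppName("join-testcase").setMaster("local[*]")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)

        // Placeholder inputs: one dataset of ~6084 records, one of ~200.
        val large = sqlContext.read.json("data/large.json")
        val small = sqlContext.read.json("data/small.json")

        // The join fails during the Tungsten sort phase when running under local[*].
        val joined = large.join(small, large("key") === small("key"))
        println(joined.count()) // expected: 200

        sc.stop()
      }
    }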
Re: Memory allocation error with Spark 1.5
In Spark 1.5, we have a new way to manage memory (part of Project Tungsten). The default unit of memory allocation is 64MB, which is way too high when you have 1GB of memory allocated in total and more than 4 threads. We will reduce the default page size before releasing 1.5. For now, you can just set the spark.buffer.pageSize property to a lower value (e.g. 16m).

https://github.com/apache/spark/blob/702aa9d7fb16c98a50e046edfd76b8a7861d0391/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala#L125

On Wed, Aug 5, 2015 at 9:25 AM, Alexis Seigneurin aseigneu...@ippon.fr wrote:
> Hi, I'm receiving a memory allocation error with a recent build of Spark 1.5 [...]
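To make the suggested workaround concrete, here is a minimal sketch of setting that property when building the SparkContext (the app name and master are placeholders; spark.buffer.pageSize and the 16m value are the ones referenced above):

    import org.apache.spark.{SparkConf, SparkContext}

    object PageSizeWorkaround {
      def main(args: Array[String]): Unit = {
        // Lower the Tungsten page size from the 64MB default to the
        // suggested 16m. App name and master are placeholders.
        val conf = new SparkConf()
          .setAppName("join-testcase")
          .setMaster("local[*]")
          .set("spark.buffer.pageSize", "16m")
        val sc = new SparkContext(conf)
        // ... run the job as before ...
        sc.stop()
      }
    }

The same property can also be passed on the command line when submitting the job, e.g. spark-submit --conf spark.buffer.pageSize=16m.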