[jira] [Commented] (SPARK-11617) MEMORY LEAK: ByteBuf.release() was not called before it's garbage-collected

2015-11-12 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001815#comment-15001815
 ] 

Naden Franciscus commented on SPARK-11617:
--

Can confirm both of these issues. Could it be related to the recent Snappy
library upgrade?

Also seeing lots of java.lang.OutOfMemoryError: Direct buffer memory coming 
from the Netty library.

We are executing a number of SQL statements in parallel, and the code continues
to work fine in 1.4.1/1.5.0, so this is a recent regression.
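
A hedged sketch of how the knobs mentioned in the log below can be set while
debugging this (the config keys are standard Spark/Netty/JVM options; the
values are illustrative only, not something verified on this ticket):

{code}
// Sketch: surface where the leaked ByteBuf is allocated and give executors
// more direct-memory/overhead headroom while investigating.
val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-Dio.netty.leakDetectionLevel=advanced -XX:MaxDirectMemorySize=2g")
  .set("spark.driver.extraJavaOptions",
    "-Dio.netty.leakDetectionLevel=advanced")
  // YARN kills the container at the physical memory limit, so off-heap growth
  // should be reflected in the overhead as well (the value here is a guess).
  .set("spark.yarn.executor.memoryOverhead", "2048")
val sc = new org.apache.spark.SparkContext(conf)
{code}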

> MEMORY LEAK: ByteBuf.release() was not called before it's garbage-collected
> ---
>
> Key: SPARK-11617
> URL: https://issues.apache.org/jira/browse/SPARK-11617
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.6.0
>Reporter: LingZhou
>
> The problem may be related to
> [SPARK-11235][NETWORK] Add ability to stream data using network lib.
> While running in yarn-client mode, there are error messages:
> 15/11/09 10:23:55 ERROR util.ResourceLeakDetector: LEAK: ByteBuf.release() 
> was not called before it's garbage-collected. Enable advanced leak reporting 
> to find out where the leak occurred. To enable advanced leak reporting, 
> specify the JVM option '-Dio.netty.leakDetectionLevel=advanced' or call 
> ResourceLeakDetector.setLevel() See 
> http://netty.io/wiki/reference-counted-objects.html for more information.
> and then it will cause 
> cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN 
> for exceeding memory limits. 9.0 GB of 9 GB physical memory used. Consider 
> boosting spark.yarn.executor.memoryOverhead.
> and WARN scheduler.TaskSetManager: Lost task 105.0 in stage 1.0 (TID 2616, 
> gsr489): java.lang.IndexOutOfBoundsException: index: 130828, length: 16833 
> (expected: range(0, 524288)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-10929) Tungsten fails to acquire memory writing to HDFS

2015-11-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984615#comment-14984615
 ] 

Naden Franciscus edited comment on SPARK-10929 at 11/2/15 12:48 AM:


Can confirm this issue has been resolved.

Any reason we can't target this for 1.5.3?

It is a serious bug and it would mean that any HDP distros that eventually 
deploy Spark 1.5 would be unusable for many of us.


was (Author: nadenf):
Can confirm this issue has been resolved.

Any reason we can't target this for 1.5.3?

It is a serious bug and it would mean that any HDP distros that use Spark 1.5 
would be unusable for many of us.

> Tungsten fails to acquire memory writing to HDFS
> 
>
> Key: SPARK-10929
> URL: https://issues.apache.org/jira/browse/SPARK-10929
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1
>Reporter: Naden Franciscus
>Assignee: Davies Liu
>Priority: Blocker
> Fix For: 1.6.0
>
>
> We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
> hitting the following issue pretty routinely.
> 40GB heap x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
> 0.1 with no difference. 
> {code}
> .16): org.apache.spark.SparkException: Task failed while writing rows.
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:250)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Unable to acquire 16777216 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
> at 
> org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
> at 
> org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
> at 
> org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
> at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
> at 
> scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.collect(TraversableLike.scala:278)
> at scala.collection.AbstractTraversable.collect(Traversable.scala:105)
> at 
> org.apache.spark.rdd.ZippedPartitionsBaseRDD.tryPrepareParents(ZippedPartitionsRDD.scala:82)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:97)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> {code}
> I have tried setting spark.buffer.pageSize to both 1MB and 64MB (in 
> spark-defaults.conf) and it makes no difference.
> It also tries to acquire 33554432 bytes of memory in both cases.






[jira] [Commented] (SPARK-10929) Tungsten fails to acquire memory writing to HDFS

2015-11-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984615#comment-14984615
 ] 

Naden Franciscus commented on SPARK-10929:
--

Can confirm this issue has been resolved.

Any reason we can't target this for 1.5.3?

It is a serious bug and it would mean that any HDP distros that use Spark 1.5 
would be unusable for many of us.

> Tungsten fails to acquire memory writing to HDFS
> 
>
> Key: SPARK-10929
> URL: https://issues.apache.org/jira/browse/SPARK-10929
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1
>Reporter: Naden Franciscus
>Assignee: Davies Liu
>Priority: Blocker
> Fix For: 1.6.0
>
>
> We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
> hitting the following issue pretty routinely.
> 40GB heap x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
> 0.1 with no difference. 
> {code}
> .16): org.apache.spark.SparkException: Task failed while writing rows.
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:250)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Unable to acquire 16777216 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
> at 
> org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
> at 
> org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
> at 
> org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
> at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
> at 
> scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.collect(TraversableLike.scala:278)
> at scala.collection.AbstractTraversable.collect(Traversable.scala:105)
> at 
> org.apache.spark.rdd.ZippedPartitionsBaseRDD.tryPrepareParents(ZippedPartitionsRDD.scala:82)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:97)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> {code}
> I have tried setting spark.buffer.pageSize to both 1MB and 64MB (in 
> spark-defaults.conf) and it makes no difference.
> It also tries to acquire 33554432 bytes of memory in both cases.






[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-11-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984614#comment-14984614
 ] 

Naden Franciscus commented on SPARK-10474:
--

Can confirm this issue has been resolved. Nice work.

> TungstenAggregation cannot acquire memory for pointer array after switching 
> to sort-based
> -
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In an aggregation case, a lost task happened with the error below.
> {code}
>  java.io.IOException: Could not acquire 65536 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
> at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126)
> at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Key SQL Query
> {code:sql}
> INSERT INTO TABLE test_table
> SELECT
>   ss.ss_customer_sk AS cid,
>   count(CASE WHEN i.i_class_id=1  THEN 1 ELSE NULL END) AS id1,
>   count(CASE WHEN i.i_class_id=3  THEN 1 ELSE NULL END) AS id3,
>   count(CASE WHEN i.i_class_id=5  THEN 1 ELSE NULL END) AS id5,
>   count(CASE WHEN i.i_class_id=7  THEN 1 ELSE NULL END) AS id7,
>   count(CASE WHEN i.i_class_id=9  THEN 1 ELSE NULL END) AS id9,
>   count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11,
>   count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13,
>   count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15,
>   count(CASE WHEN i.i_class_id=2  THEN 1 ELSE NULL END) AS id2,
>   count(CASE WHEN i.i_class_id=4  THEN 1 ELSE NULL END) AS id4,
>   count(CASE WHEN i.i_class_id=6  THEN 1 ELSE NULL END) AS id6,
>   count(CASE WHEN i.i_class_id=8  THEN 1 ELSE NULL END) AS id8,
>   count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10,
>   count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14,
>   count(CASE WHEN i.i_class_id=16 THEN 1 ELSE NULL END) AS id16
> FROM store_sales ss
> INNER JOIN item i ON ss.ss_item_sk = i.i_item_sk
> WHERE i.i_category IN ('Books')
> AND ss.ss_customer_sk IS NOT NULL
> GROUP BY ss.ss_customer_sk
> HAVING count(ss.ss_item_sk) > 5
> {code}
> Note:
> the store_sales is a big fact table and item is a small dimension table.




[jira] [Commented] (SPARK-4226) SparkSQL - Add support for subqueries in predicates

2015-10-10 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951663#comment-14951663
 ] 

Naden Franciscus commented on SPARK-4226:
-

Is there a reason this pull request can't be merged in?

It's been open for so long, and it's such a critical feature for us.
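
A commonly suggested workaround, sketched here under the assumption that
HiveQL's LEFT SEMI JOIN is available (table and column names are taken from
the report below; this is not code from the ticket):

{code}
// Rewrite the IN-subquery as a LEFT SEMI JOIN, which the HiveQL parser accepts.
// sc is an existing SparkContext.
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
hc.sql("""
  SELECT a.customerid
  FROM sparkbug a
  LEFT SEMI JOIN (
    SELECT customerid FROM sparkbug WHERE customerid IN (2, 3)
  ) b ON a.customerid = b.customerid
""")
{code}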

> SparkSQL - Add support for subqueries in predicates
> ---
>
> Key: SPARK-4226
> URL: https://issues.apache.org/jira/browse/SPARK-4226
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.2.0
> Environment: Spark 1.2 snapshot
>Reporter: Terry Siu
>
> I have a test table defined in Hive as follows:
> {code:sql}
> CREATE TABLE sparkbug (
>   id INT,
>   event STRING
> ) STORED AS PARQUET;
> {code}
> and insert some sample data with ids 1, 2, 3.
> In a Spark shell, I then create a HiveContext and execute the following
> HQL to test out subquery predicates:
> {code}
> val hc = new HiveContext(sc)
> hc.hql("select customerid from sparkbug where customerid in (select 
> customerid from sparkbug where customerid in (2,3))")
> {code}
> I get the following error:
> {noformat}
> java.lang.RuntimeException: Unsupported language features in query: select 
> customerid from sparkbug where customerid in (select customerid from sparkbug 
> where customerid in (2,3))
> TOK_QUERY
>   TOK_FROM
> TOK_TABREF
>   TOK_TABNAME
> sparkbug
>   TOK_INSERT
> TOK_DESTINATION
>   TOK_DIR
> TOK_TMP_FILE
> TOK_SELECT
>   TOK_SELEXPR
> TOK_TABLE_OR_COL
>   customerid
> TOK_WHERE
>   TOK_SUBQUERY_EXPR
> TOK_SUBQUERY_OP
>   in
> TOK_QUERY
>   TOK_FROM
> TOK_TABREF
>   TOK_TABNAME
> sparkbug
>   TOK_INSERT
> TOK_DESTINATION
>   TOK_DIR
> TOK_TMP_FILE
> TOK_SELECT
>   TOK_SELEXPR
> TOK_TABLE_OR_COL
>   customerid
> TOK_WHERE
>   TOK_FUNCTION
> in
> TOK_TABLE_OR_COL
>   customerid
> 2
> 3
> TOK_TABLE_OR_COL
>   customerid
> scala.NotImplementedError: No parse rules for ASTNode type: 817, text: 
> TOK_SUBQUERY_EXPR :
> TOK_SUBQUERY_EXPR
>   TOK_SUBQUERY_OP
> in
>   TOK_QUERY
> TOK_FROM
>   TOK_TABREF
> TOK_TABNAME
>   sparkbug
> TOK_INSERT
>   TOK_DESTINATION
> TOK_DIR
>   TOK_TMP_FILE
>   TOK_SELECT
> TOK_SELEXPR
>   TOK_TABLE_OR_COL
> customerid
>   TOK_WHERE
> TOK_FUNCTION
>   in
>   TOK_TABLE_OR_COL
> customerid
>   2
>   3
>   TOK_TABLE_OR_COL
> customerid
> " +
>  
> org.apache.spark.sql.hive.HiveQl$.nodeToExpr(HiveQl.scala:1098)
> 
> at scala.sys.package$.error(package.scala:27)
> at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:252)
> at 
> org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:50)
> at 
> org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:49)
> at 
> scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
> {noformat}
> [This 
> thread|http://apache-spark-user-list.1001560.n3.nabble.com/Subquery-in-having-clause-Spark-1-1-0-td17401.html]
>  also brings up the lack of subquery support in SparkSQL. It would be nice to
> have subquery predicate support in a near-future release (1.3, maybe?).






[jira] [Created] (SPARK-10929) Tungsten fails to acquire memory writing to HDFS

2015-10-05 Thread Naden Franciscus (JIRA)
Naden Franciscus created SPARK-10929:


 Summary: Tungsten fails to acquire memory writing to HDFS
 Key: SPARK-10929
 URL: https://issues.apache.org/jira/browse/SPARK-10929
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.1, 1.5.0
Reporter: Naden Franciscus









[jira] [Updated] (SPARK-10929) Tungsten fails to acquire memory writing to HDFS

2015-10-05 Thread Naden Franciscus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naden Franciscus updated SPARK-10929:
-
Priority: Blocker  (was: Major)

> Tungsten fails to acquire memory writing to HDFS
> 
>
> Key: SPARK-10929
> URL: https://issues.apache.org/jira/browse/SPARK-10929
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1
>Reporter: Naden Franciscus
>Priority: Blocker
>







[jira] [Updated] (SPARK-10929) Tungsten fails to acquire memory writing to HDFS

2015-10-05 Thread Naden Franciscus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naden Franciscus updated SPARK-10929:
-
Description: 
We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
hitting the following issue pretty routinely.

40GB heap x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
0.1 with no difference. 

{code}
.16): org.apache.spark.SparkException: Task failed while writing rows.
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:250)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to acquire 16777216 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
at 
org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at 
scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
scala.collection.TraversableLike$class.collect(TraversableLike.scala:278)
at scala.collection.AbstractTraversable.collect(Traversable.scala:105)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD.tryPrepareParents(ZippedPartitionsRDD.scala:82)
at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:97)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)

{code}

I have tried setting spark.buffer.pageSize to both 1MB and 64MB (in 
spark-defaults.conf) and it makes no difference.

It also tries to acquire 33554432 bytes of memory in both cases.


  was:
We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
hitting the following issue pretty routinely.

40GB heap x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
0.1 with no difference. 

{code}
.16): org.apache.spark.SparkException: Task failed while writing rows.
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:250)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to acquire 16777216 bytes of memory
at 

[jira] [Updated] (SPARK-10929) Tungsten fails to acquire memory writing to HDFS

2015-10-05 Thread Naden Franciscus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naden Franciscus updated SPARK-10929:
-
Description: 
We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
hitting the following issue pretty routinely.

40GB heap x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
0.1 with no difference. 

{code}
.16): org.apache.spark.SparkException: Task failed while writing rows.
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:250)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to acquire 16777216 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
at 
org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at 
scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
scala.collection.TraversableLike$class.collect(TraversableLike.scala:278)
at scala.collection.AbstractTraversable.collect(Traversable.scala:105)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD.tryPrepareParents(ZippedPartitionsRDD.scala:82)
at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:97)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)

{code}

> Tungsten fails to acquire memory writing to HDFS
> 
>
> Key: SPARK-10929
> URL: https://issues.apache.org/jira/browse/SPARK-10929
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1
>Reporter: Naden Franciscus
>Priority: Blocker
>
> We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
> hitting the following issue pretty routinely.
> 40GB heap x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
> 0.1 with no difference. 
> {code}
> .16): org.apache.spark.SparkException: Task failed while writing rows.
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:250)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> 

[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-10-02 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942096#comment-14942096
 ] 

Naden Franciscus commented on SPARK-10474:
--

I have tried setting spark.buffer.pageSize to both 1MB and 64MB and it makes no
difference.

It also tries to acquire 33554432 bytes of memory in both cases.

Can we please reopen this ticket?
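
For reference, a hedged sketch of how the two settings discussed in this thread
would be applied programmatically (spark.buffer.pageSize and
spark.shuffle.memoryFraction are the 1.5 config names used above; the values
are simply the ones already tried, and the same lines can go into
spark-defaults.conf instead):

{code}
// Sketch only; these are the values reported as making no difference.
val conf = new org.apache.spark.SparkConf()
  .set("spark.buffer.pageSize", "64m")          // also tried at 1m
  .set("spark.shuffle.memoryFraction", "0.1")   // also tried at the 0.2 default
val sc = new org.apache.spark.SparkContext(conf)
{code}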

> TungstenAggregation cannot acquire memory for pointer array after switching 
> to sort-based
> -
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In an aggregation case, a lost task happened with the error below.
> {code}
>  java.io.IOException: Could not acquire 65536 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
> at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126)
> at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Key SQL Query
> {code:sql}
> INSERT INTO TABLE test_table
> SELECT
>   ss.ss_customer_sk AS cid,
>   count(CASE WHEN i.i_class_id=1  THEN 1 ELSE NULL END) AS id1,
>   count(CASE WHEN i.i_class_id=3  THEN 1 ELSE NULL END) AS id3,
>   count(CASE WHEN i.i_class_id=5  THEN 1 ELSE NULL END) AS id5,
>   count(CASE WHEN i.i_class_id=7  THEN 1 ELSE NULL END) AS id7,
>   count(CASE WHEN i.i_class_id=9  THEN 1 ELSE NULL END) AS id9,
>   count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11,
>   count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13,
>   count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15,
>   count(CASE WHEN i.i_class_id=2  THEN 1 ELSE NULL END) AS id2,
>   count(CASE WHEN i.i_class_id=4  THEN 1 ELSE NULL END) AS id4,
>   count(CASE WHEN i.i_class_id=6  THEN 1 ELSE NULL END) AS id6,
>   count(CASE WHEN i.i_class_id=8  THEN 1 ELSE NULL END) AS id8,
>   count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10,
>   count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14,
>   count(CASE WHEN i.i_class_id=16 THEN 1 ELSE NULL END) AS id16
> FROM store_sales ss
> INNER JOIN item i ON ss.ss_item_sk = i.i_item_sk
> WHERE i.i_category IN ('Books')
> AND ss.ss_customer_sk IS NOT NULL
> GROUP BY ss.ss_customer_sk
> HAVING count(ss.ss_item_sk) > 5
> {code}
> Note:
> the store_sales is a big fact table and item is a small dimension table.

[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-10-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939808#comment-14939808
 ] 

Naden Franciscus commented on SPARK-10474:
--

We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
hitting this issue pretty routinely.

40GB heaps x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
0.1 with no difference.

Caused by: java.io.IOException: Unable to acquire 16777216 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
at 
org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at 
scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
scala.collection.TraversableLike$class.collect(TraversableLike.scala:278)
at scala.collection.AbstractTraversable.collect(Traversable.scala:105)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD.tryPrepareParents(ZippedPartitionsRDD.scala:82)
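
As a rough illustration of why shrinking shuffle.memoryFraction does not help
here, a hedged back-of-the-envelope of the 1.5 shuffle memory model (the
per-task floor of roughly pool / (2 * running tasks) is approximate, and the
task count is an assumption, not a measurement from this cluster):

{code}
// Sketch of the arithmetic only; not an exact reproduction of ShuffleMemoryManager.
val heapBytes      = 40L * 1024 * 1024 * 1024   // 40 GB executor heap
val memoryFraction = 0.1                        // value tried above
val safetyFraction = 0.8                        // 1.5 default for spark.shuffle.safetyFraction
val runningTasks   = 128                        // assumed concurrency across the 20 parallel jobs
val pool           = (heapBytes * memoryFraction * safetyFraction).toLong
val perTaskFloor   = pool / (2 * runningTasks)  // roughly 13 MB with these numbers
// UnsafeExternalSorter asks for a 16 MB page (16777216 bytes), so with enough
// concurrent tasks the grant can fall short and the IOException above fires.
println(s"pool=$pool bytes, guaranteed per task=$perTaskFloor bytes")
{code}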


> TungstenAggregation cannot acquire memory for pointer array after switching 
> to sort-based
> -
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In an aggregation case, a lost task happened with the error below.
> {code}
>  java.io.IOException: Could not acquire 65536 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
> at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126)
> at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>

[jira] [Comment Edited] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-10-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939808#comment-14939808
 ] 

Naden Franciscus edited comment on SPARK-10474 at 10/1/15 1:30 PM:
---

We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
hitting this issue pretty routinely.

40GB heaps x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
0.1 with no difference. I am using the HEAD of the Spark 1.5 branch so it 
definitely includes all the commits above.

Caused by: java.io.IOException: Unable to acquire 16777216 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
at 
org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at 
scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
scala.collection.TraversableLike$class.collect(TraversableLike.scala:278)
at scala.collection.AbstractTraversable.collect(Traversable.scala:105)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD.tryPrepareParents(ZippedPartitionsRDD.scala:82)



was (Author: nadenf):
We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
hitting this issue pretty routinely.

40GB heaps x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
0.1 with no difference.

Caused by: java.io.IOException: Unable to acquire 16777216 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
at 
org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at 
scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
scala.collection.TraversableLike$class.collect(TraversableLike.scala:278)
at scala.collection.AbstractTraversable.collect(Traversable.scala:105)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD.tryPrepareParents(ZippedPartitionsRDD.scala:82)


> TungstenAggregation cannot acquire memory for pointer array after switching 
> to sort-based
> -
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In an aggregation case, a lost task happened with the error below.
> {code}
>  

[jira] [Comment Edited] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-10-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939808#comment-14939808
 ] 

Naden Franciscus edited comment on SPARK-10474 at 10/1/15 1:33 PM:
---

We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
hitting this issue pretty routinely.

40GB heaps x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
0.1 with no difference. I am using the HEAD of the Spark 1.5 branch so it 
definitely includes all the commits above.

.16): org.apache.spark.SparkException: Task failed while writing rows.
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:250)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to acquire 16777216 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
at 
org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at 
scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
scala.collection.TraversableLike$class.collect(TraversableLike.scala:278)
at scala.collection.AbstractTraversable.collect(Traversable.scala:105)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD.tryPrepareParents(ZippedPartitionsRDD.scala:82)
at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:97)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)



was (Author: nadenf):
We are executing 20 Spark SQL jobs in parallel using Spark Job Server and 
hitting this issue pretty routinely.

40GB heaps x 6 nodes. Have tried adjusting shuffle.memoryFraction from 0.2 -> 
0.1 with no difference. I am using the HEAD of the Spark 1.5 branch so it 
definitely includes all the commits above.

Caused by: java.io.IOException: Unable to acquire 16777216 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
at 
org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
  

[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-10-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940535#comment-14940535
 ] 

Naden Franciscus commented on SPARK-10474:
--

[~yhuai] Standalone

> TungstenAggregation cannot acquire memory for pointer array after switching 
> to sort-based
> -
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In an aggregation case, a lost task happened with the error below.
> {code}
>  java.io.IOException: Could not acquire 65536 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
> at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126)
> at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Key SQL Query
> {code:sql}
> INSERT INTO TABLE test_table
> SELECT
>   ss.ss_customer_sk AS cid,
>   count(CASE WHEN i.i_class_id=1  THEN 1 ELSE NULL END) AS id1,
>   count(CASE WHEN i.i_class_id=3  THEN 1 ELSE NULL END) AS id3,
>   count(CASE WHEN i.i_class_id=5  THEN 1 ELSE NULL END) AS id5,
>   count(CASE WHEN i.i_class_id=7  THEN 1 ELSE NULL END) AS id7,
>   count(CASE WHEN i.i_class_id=9  THEN 1 ELSE NULL END) AS id9,
>   count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11,
>   count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13,
>   count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15,
>   count(CASE WHEN i.i_class_id=2  THEN 1 ELSE NULL END) AS id2,
>   count(CASE WHEN i.i_class_id=4  THEN 1 ELSE NULL END) AS id4,
>   count(CASE WHEN i.i_class_id=6  THEN 1 ELSE NULL END) AS id6,
>   count(CASE WHEN i.i_class_id=8  THEN 1 ELSE NULL END) AS id8,
>   count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10,
>   count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14,
>   count(CASE WHEN i.i_class_id=16 THEN 1 ELSE NULL END) AS id16
> FROM store_sales ss
> INNER JOIN item i ON ss.ss_item_sk = i.i_item_sk
> WHERE i.i_category IN ('Books')
> AND ss.ss_customer_sk IS NOT NULL
> GROUP BY ss.ss_customer_sk
> HAVING count(ss.ss_item_sk) > 5
> {code}
> Note:
> the store_sales is a big fact table and item is a small dimension table.




[jira] [Comment Edited] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-10-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940487#comment-14940487
 ] 

Naden Franciscus edited comment on SPARK-10474 at 10/1/15 9:58 PM:
---

I can't provide the explain plan since we are executing 1000s of SQL statements
and it is hard to tell which is which.

Have increased heap to 50GB + shuffle.memoryFraction to 0.6 and 0.8. No change.

Will file this in another ticket.
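
A hedged sketch of one way to recover the failing plan when thousands of
statements run in parallel (sqlContext and statements are placeholders here,
not code from this ticket):

{code}
// Tag each submitted statement so a lost task can be matched back to its plan.
statements.zipWithIndex.foreach { case (stmt, i) =>
  val df = sqlContext.sql(stmt)
  println(s"=== query $i ===")
  df.explain(true)   // prints the logical and physical plans
  df.count()         // placeholder action; the real job writes to HDFS instead
}
{code}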


was (Author: nadenf):
I can't provide the explain plan since we are executing 1000s of SQL statements
and it is hard to tell which is which.

Have increased heap to 50GB + shuffle.memoryFraction to 0.6 and 0.8. No change.

@Andrew: is there a ticket for this?

> TungstenAggregation cannot acquire memory for pointer array after switching 
> to sort-based
> -
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In an aggregation case, a lost task happened with the error below.
> {code}
>  java.io.IOException: Could not acquire 65536 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
> at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126)
> at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Key SQL Query
> {code:sql}
> INSERT INTO TABLE test_table
> SELECT
>   ss.ss_customer_sk AS cid,
>   count(CASE WHEN i.i_class_id=1  THEN 1 ELSE NULL END) AS id1,
>   count(CASE WHEN i.i_class_id=3  THEN 1 ELSE NULL END) AS id3,
>   count(CASE WHEN i.i_class_id=5  THEN 1 ELSE NULL END) AS id5,
>   count(CASE WHEN i.i_class_id=7  THEN 1 ELSE NULL END) AS id7,
>   count(CASE WHEN i.i_class_id=9  THEN 1 ELSE NULL END) AS id9,
>   count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11,
>   count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13,
>   count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15,
>   count(CASE WHEN i.i_class_id=2  THEN 1 ELSE NULL END) AS id2,
>   count(CASE WHEN i.i_class_id=4  THEN 1 ELSE NULL END) AS id4,
>   count(CASE WHEN i.i_class_id=6  THEN 1 ELSE NULL END) AS id6,
>   count(CASE WHEN i.i_class_id=8  THEN 1 ELSE NULL END) AS id8,
>   count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10,
>   count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14,
>   

[jira] [Commented] (SPARK-10309) Some tasks failed with Unable to acquire memory

2015-10-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940494#comment-14940494
 ] 

Naden Franciscus commented on SPARK-10309:
--

It has been difficult to get a clean stack trace/explain plan because we are 
executing lots of SQL commands in parallel and we don't know which one is 
failing. We are definitely doing lots of joins/aggregations/sorts.

I have tried increasing shuffle.memoryFraction to 0.8 but that didn't help.

This is still an issue with the latest Spark 1.5.2 branch.
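
For reference, a minimal sketch (assuming Spark 1.5.x; the app name is 
hypothetical and the values simply mirror the ones tried in these comments, not 
a recommendation) of how the heap and shuffle settings discussed above would 
typically be applied:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: values mirror what was tried in this thread.
val conf = new SparkConf()
  .setAppName("parallel-sql-jobs")             // hypothetical app name
  .set("spark.executor.memory", "50g")         // executor heap
  .set("spark.shuffle.memoryFraction", "0.8")  // share of heap for shuffle/sort buffers
val sc = new SparkContext(conf)
{code}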



> Some tasks failed with Unable to acquire memory
> ---
>
> Key: SPARK-10309
> URL: https://issues.apache.org/jira/browse/SPARK-10309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Davies Liu
>
> While running Q53 of TPCDS (scale = 1500) on 24 nodes cluster (12G memory on 
> executor):
> {code}
> java.io.IOException: Unable to acquire 33554432 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68)
> at 
> org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:45)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The task could finished after retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-10-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940505#comment-14940505
 ] 

Naden Franciscus commented on SPARK-10474:
--

Can confirm I am also getting this issue now. There must be something common to 
both, right?

An acquire should never fail unless the OS is out of memory, right?
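
A rough back-of-the-envelope sketch (my own simplification of the 1.5-style 
shuffle memory pool, not Spark source) of why an acquire can fail even when the 
OS has plenty of free memory: each task is capped at a share of a pool that is 
only a fraction of the executor heap.

{code}
// Rough sketch: pool = heap * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction,
// and each of N concurrently running tasks can reserve at most ~1/N of that pool.
val executorHeapBytes = 30L * 1024 * 1024 * 1024  // e.g. -Xmx30g
val memoryFraction    = 0.2                       // spark.shuffle.memoryFraction default
val safetyFraction    = 0.8                       // spark.shuffle.safetyFraction default
val concurrentTasks   = 32                        // e.g. many parallel SQL jobs on one executor

val poolBytes  = (executorHeapBytes * memoryFraction * safetyFraction).toLong
val perTaskCap = poolBytes / concurrentTasks      // upper bound one task can reserve
println(s"pool = ${poolBytes >> 20} MB, per-task cap = ${perTaskCap >> 20} MB")
// ~4.8 GB pool and a ~150 MB per-task cap here, so the JVM and OS can be far
// from out of memory while a task's request still gets refused.
{code}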

> TungstenAggregation cannot acquire memory for pointer array after switching 
> to sort-based
> -
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In aggregation case, a  Lost task happened with below error.
> {code}
>  java.io.IOException: Could not acquire 65536 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
> at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:126)
> at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Key SQL Query
> {code:sql}
> INSERT INTO TABLE test_table
> SELECT
>   ss.ss_customer_sk AS cid,
>   count(CASE WHEN i.i_class_id=1  THEN 1 ELSE NULL END) AS id1,
>   count(CASE WHEN i.i_class_id=3  THEN 1 ELSE NULL END) AS id3,
>   count(CASE WHEN i.i_class_id=5  THEN 1 ELSE NULL END) AS id5,
>   count(CASE WHEN i.i_class_id=7  THEN 1 ELSE NULL END) AS id7,
>   count(CASE WHEN i.i_class_id=9  THEN 1 ELSE NULL END) AS id9,
>   count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11,
>   count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13,
>   count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15,
>   count(CASE WHEN i.i_class_id=2  THEN 1 ELSE NULL END) AS id2,
>   count(CASE WHEN i.i_class_id=4  THEN 1 ELSE NULL END) AS id4,
>   count(CASE WHEN i.i_class_id=6  THEN 1 ELSE NULL END) AS id6,
>   count(CASE WHEN i.i_class_id=8  THEN 1 ELSE NULL END) AS id8,
>   count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10,
>   count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14,
>   count(CASE WHEN i.i_class_id=16 THEN 1 ELSE NULL END) AS id16
> FROM store_sales ss
> INNER JOIN item i ON ss.ss_item_sk = i.i_item_sk
> WHERE i.i_category IN ('Books')
> AND ss.ss_customer_sk IS NOT NULL
> GROUP BY ss.ss_customer_sk
> HAVING count(ss.ss_item_sk) > 5
> {code}
> Note:
> the store_sales is a big fact table and item is a small dimension table.



--
This message 

[jira] [Commented] (SPARK-10474) TungstenAggregation cannot acquire memory for pointer array after switching to sort-based

2015-10-01 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940487#comment-14940487
 ] 

Naden Franciscus commented on SPARK-10474:
--

I can't provide the explain plan since we are executing 1000s of SQL statements 
and it's hard to tell which is which.

Have increased the heap to 50GB and shuffle.memoryFraction to 0.6 and 0.8. No change.

@Andrew: is there a ticket for this?

> TungstenAggregation cannot acquire memory for pointer array after switching 
> to sort-based
> -
>
> Key: SPARK-10474
> URL: https://issues.apache.org/jira/browse/SPARK-10474
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yi Zhou
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> In aggregation case, a  Lost task happened with below error.
> {code}
>  java.io.IOException: Could not acquire 65536 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
> at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:126)
> at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Key SQL Query
> {code:sql}
> INSERT INTO TABLE test_table
> SELECT
>   ss.ss_customer_sk AS cid,
>   count(CASE WHEN i.i_class_id=1  THEN 1 ELSE NULL END) AS id1,
>   count(CASE WHEN i.i_class_id=3  THEN 1 ELSE NULL END) AS id3,
>   count(CASE WHEN i.i_class_id=5  THEN 1 ELSE NULL END) AS id5,
>   count(CASE WHEN i.i_class_id=7  THEN 1 ELSE NULL END) AS id7,
>   count(CASE WHEN i.i_class_id=9  THEN 1 ELSE NULL END) AS id9,
>   count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11,
>   count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13,
>   count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15,
>   count(CASE WHEN i.i_class_id=2  THEN 1 ELSE NULL END) AS id2,
>   count(CASE WHEN i.i_class_id=4  THEN 1 ELSE NULL END) AS id4,
>   count(CASE WHEN i.i_class_id=6  THEN 1 ELSE NULL END) AS id6,
>   count(CASE WHEN i.i_class_id=8  THEN 1 ELSE NULL END) AS id8,
>   count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10,
>   count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14,
>   count(CASE WHEN i.i_class_id=16 THEN 1 ELSE NULL END) AS id16
> FROM store_sales ss
> INNER JOIN item i ON ss.ss_item_sk = i.i_item_sk
> WHERE i.i_category IN ('Books')
> AND ss.ss_customer_sk IS NOT NULL
> GROUP BY ss.ss_customer_sk
> HAVING count(ss.ss_item_sk) > 5
> {code}
> Note:
> the store_sales is a 

[jira] [Created] (SPARK-10908) ClassCastException in HadoopRDD.getJobConf

2015-10-01 Thread Naden Franciscus (JIRA)
Naden Franciscus created SPARK-10908:


 Summary: ClassCastException in HadoopRDD.getJobConf
 Key: SPARK-10908
 URL: https://issues.apache.org/jira/browse/SPARK-10908
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.5.2
Reporter: Naden Franciscus


Whilst running a Spark SQL job (I can't provide an explain plan as many of 
these are happening concurrently) the following exception is thrown:

java.lang.ClassCastException: [B cannot be cast to 
org.apache.spark.util.SerializableConfiguration
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:144)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.ShuffleDependency.(Dependency.scala:82)
at 
org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:78)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10309) Some tasks failed with Unable to acquire memory

2015-09-09 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736304#comment-14736304
 ] 

Naden Franciscus commented on SPARK-10309:
--

Still working on the physical plan but we have been testing with the latest 
branch-1.5.0 releases which included this fix.

> Some tasks failed with Unable to acquire memory
> ---
>
> Key: SPARK-10309
> URL: https://issues.apache.org/jira/browse/SPARK-10309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Davies Liu
>
> While running Q53 of TPCDS (scale = 1500) on 24 nodes cluster (12G memory on 
> executor):
> {code}
> java.io.IOException: Unable to acquire 33554432 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68)
> at 
> org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:45)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The task could finished after retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-10309) Some tasks failed with Unable to acquire memory

2015-09-09 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736304#comment-14736304
 ] 

Naden Franciscus edited comment on SPARK-10309 at 9/9/15 6:43 AM:
--

Still working on the physical plan but we have been testing with the latest 
branch-1.5.0 releases which included this fix. It doesn't help.


was (Author: nadenf):
Still working on the physical plan but we have been testing with the latest 
branch-1.5.0 releases which included this fix.

> Some tasks failed with Unable to acquire memory
> ---
>
> Key: SPARK-10309
> URL: https://issues.apache.org/jira/browse/SPARK-10309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Davies Liu
>
> While running Q53 of TPCDS (scale = 1500) on 24 nodes cluster (12G memory on 
> executor):
> {code}
> java.io.IOException: Unable to acquire 33554432 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68)
> at 
> org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:45)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The task could finished after retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-10309) Some tasks failed with Unable to acquire memory

2015-09-08 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735872#comment-14735872
 ] 

Naden Franciscus edited comment on SPARK-10309 at 9/9/15 12:08 AM:
---

We are using Spark Job Server to submit the job.

Each job consists of:

1) Execute an SQL statement against HDFS.
2) Write the results into HDFS.
3) Write the result into MongoDB using a normal Java adapter.

We do many of these in parallel. We have a 6-node cluster with 30GB allocated to 
Spark (-Xmx30g) and 60GB free. We constantly get these failures.


was (Author: nadenf):
We are using Spark Job Server to submit the job.

Each job consists of:

1) Execute an SQL statement against HDFS.
2) Write the results into HDFS.
3) Write the result into MongoDB using a normal Java adapter.

We do many of these in parallel. We have a 6-node cluster with 30GB allocated to 
Spark (-Xmx30g) and 60GB free.

> Some tasks failed with Unable to acquire memory
> ---
>
> Key: SPARK-10309
> URL: https://issues.apache.org/jira/browse/SPARK-10309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Davies Liu
>
> While running Q53 of TPCDS (scale = 1500) on 24 nodes cluster (12G memory on 
> executor):
> {code}
> java.io.IOException: Unable to acquire 33554432 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68)
> at 
> org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:45)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The task could finished after retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10309) Some tasks failed with Unable to acquire memory

2015-09-08 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735872#comment-14735872
 ] 

Naden Franciscus commented on SPARK-10309:
--

We are using Spark Job Server to submit the job.

Each job consists of:

1) Execute an SQL statement against HDFS.
2) Write the results into HDFS.
3) Write the result into MongoDB using a normal Java adapter.

We do many of these in parallel. We have a 6-node cluster with 30GB allocated to 
Spark (-Xmx30g) and 60GB free.
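
For context, a minimal sketch of the job shape described above (the table name 
and output path are hypothetical, and the MongoDB write through the Java adapter 
is omitted):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SqlToHdfsJob {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("sql-to-hdfs"))
    val ctx = new HiveContext(sc)

    // 1) Execute an SQL statement against data on HDFS.
    val result = ctx.sql("SELECT cid, count(*) AS cnt FROM some_table GROUP BY cid")

    // 2) Write the result back into HDFS.
    result.write.parquet("hdfs:///tmp/some_output")

    // 3) The result would then be pushed to MongoDB via a separate Java adapter (not shown).
  }
}
{code}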

> Some tasks failed with Unable to acquire memory
> ---
>
> Key: SPARK-10309
> URL: https://issues.apache.org/jira/browse/SPARK-10309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Davies Liu
>
> While running Q53 of TPCDS (scale = 1500) on 24 nodes cluster (12G memory on 
> executor):
> {code}
> java.io.IOException: Unable to acquire 33554432 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68)
> at 
> org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:45)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The task could finished after retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10309) Some tasks failed with Unable to acquire memory

2015-09-05 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731862#comment-14731862
 ] 

Naden Franciscus commented on SPARK-10309:
--

Why is this targeted for 1.6? We are finding this issue with basic Spark SQL 
executions in our applications.

Job aborted due to stage failure: Task 1 in stage 25.0 failed 4 times, most 
recent failure: Lost task 1.3 in stage 25.0 (TID 3962, 39.6.64.17): 
java.io.IOException: Unable to acquire 16777216 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68)
at 
org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at 
scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
at scala.collection.immutable.List.foreach(List.scala:318)


> Some tasks failed with Unable to acquire memory
> ---
>
> Key: SPARK-10309
> URL: https://issues.apache.org/jira/browse/SPARK-10309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Davies Liu
>
> While running Q53 of TPCDS (scale = 1500) on 24 nodes cluster (12G memory on 
> executor):
> {code}
> java.io.IOException: Unable to acquire 33554432 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68)
> at 
> org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:45)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The task could finished after retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (SPARK-10309) Some tasks failed with Unable to acquire memory

2015-09-05 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731862#comment-14731862
 ] 

Naden Franciscus edited comment on SPARK-10309 at 9/5/15 7:37 AM:
--

Why is this targeted for 1.6? We are finding this issue with basic Spark SQL 
executions in our applications. Is the expectation that Tungsten sort will be 
disabled in an upcoming check-in?

Job aborted due to stage failure: Task 1 in stage 25.0 failed 4 times, most 
recent failure: Lost task 1.3 in stage 25.0 (TID 3962, 39.6.64.17): 
java.io.IOException: Unable to acquire 16777216 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68)
at 
org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at 
scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
at scala.collection.immutable.List.foreach(List.scala:318)
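
On the question above about disabling Tungsten sort: a stopgap sometimes 
suggested for 1.5.x (an assumption on my part, not something confirmed in this 
ticket) is to switch Tungsten execution off for the session, falling back to the 
older sort/aggregate paths at a performance cost.

{code}
// Assumes a sqlContext is already in scope (e.g. in spark-shell).
sqlContext.setConf("spark.sql.tungsten.enabled", "false")
// or at submit time: --conf spark.sql.tungsten.enabled=false
{code}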



was (Author: nadenf):
Why is this targeted for 1.6? We are finding this issue with basic Spark SQL 
executions in our applications.

Job aborted due to stage failure: Task 1 in stage 25.0 failed 4 times, most 
recent failure: Lost task 1.3 in stage 25.0 (TID 3962, 39.6.64.17): 
java.io.IOException: Unable to acquire 16777216 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68)
at 
org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:83)
at 
org.apache.spark.rdd.ZippedPartitionsBaseRDD$$anonfun$tryPrepareParents$1.applyOrElse(ZippedPartitionsRDD.scala:82)
at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at 
scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
at scala.collection.immutable.List.foreach(List.scala:318)


> Some tasks failed with Unable to acquire memory
> ---
>
> Key: SPARK-10309
> URL: https://issues.apache.org/jira/browse/SPARK-10309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Davies Liu
>
> While running Q53 of TPCDS (scale = 1500) on 24 nodes cluster (12G memory on 
> executor):
> {code}
> java.io.IOException: Unable to acquire 33554432 bytes of memory
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68)
> at 
> org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)

[jira] [Commented] (SPARK-6412) Add Char support in dataTypes.

2015-06-21 Thread Naden Franciscus (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595280#comment-14595280
 ] 

Naden Franciscus commented on SPARK-6412:
-

This needs to be reopened.

The use case couldn't be clearer: querying a table that has a CHAR data type is 
unsupported.
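
A small sketch of the limitation (illustrative only; WithChar and WithString are 
hypothetical case classes): schema derivation has no mapping for scala.Char, so 
the usual workaround is to model such columns as String.

{code}
import org.apache.spark.sql.catalyst.ScalaReflection

case class WithChar(code: Char, id: Int)
// ScalaReflection.schemaFor[WithChar]   // fails: no schema mapping for Char

case class WithString(code: String, id: Int)
ScalaReflection.schemaFor[WithString]    // works: code is derived as StringType
{code}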

 Add Char support in dataTypes.
 --

 Key: SPARK-6412
 URL: https://issues.apache.org/jira/browse/SPARK-6412
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Chen Song

 We can't get the schema of case class PrimitiveData, because 
 ScalaReflection.schemaFor and dataTypes don't support CharType.
 case class PrimitiveData(
 charField: Char,// Can't get the char schema info
 intField: Int,
 longField: Long,
 doubleField: Double,
 floatField: Float,
 shortField: Short,
 byteField: Byte,
 booleanField: Boolean)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-6412) Add Char support in dataTypes.

2015-06-21 Thread Naden Franciscus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naden Franciscus updated SPARK-6412:

Comment: was deleted

(was: This needs to be reopened.

The use case couldn't be more clear. Querying from a table that has a CHAR data 
type is unsupported.)

 Add Char support in dataTypes.
 --

 Key: SPARK-6412
 URL: https://issues.apache.org/jira/browse/SPARK-6412
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Chen Song

 We can't get the schema of case class PrimitiveData, because 
 ScalaReflection.schemaFor and dataTypes don't support CharType.
 case class PrimitiveData(
 charField: Char,// Can't get the char schema info
 intField: Int,
 longField: Long,
 doubleField: Double,
 floatField: Float,
 shortField: Short,
 byteField: Byte,
 booleanField: Boolean)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org