[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-21 Thread Deenar Toraskar (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341844#comment-15341844 ]

Deenar Toraskar commented on SPARK-15704:
-----------------------------------------

Done; see https://issues.apache.org/jira/browse/SPARK-16100


> TungstenAggregate crashes
> -------------------------
>
> Key: SPARK-15704
> URL: https://issues.apache.org/jira/browse/SPARK-15704
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hiroshi Inoue
>Assignee: Hiroshi Inoue
> Fix For: 2.0.0
>
>
> When I run DatasetBenchmark, the JVM crashes while executing the "Dataset complex Aggregator" test case due to an IndexOutOfBoundsException.
> The error happens in TungstenAggregate; the mappings between bufferSerializer and bufferDeserializer are broken due to an unresolved attribute.
> {quote}
> 16/06/02 01:41:05 ERROR Executor: Exception in task 0.0 in stage 67.0 (TID 232)
> java.lang.IndexOutOfBoundsException: -1
>   at scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
>   at scala.collection.immutable.List.apply(List.scala:84)
>   at org.apache.spark.sql.catalyst.expressions.aggregate.DeclarativeAggregate$RichAttribute.right(interfaces.scala:389)
>   at org.apache.spark.sql.execution.aggregate.TypedAggregateExpression$$anonfun$3.applyOrElse(TypedAggregateExpression.scala:110)
>   at org.apache.spark.sql.execution.aggregate.TypedAggregateExpression$$anonfun$3.applyOrElse(TypedAggregateExpression.scala:109)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:265)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:265)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:68)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:270)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:270)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:307)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1336)
>   at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:356)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:270)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:270)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:270)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:336)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:334)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1336)
>   at ...
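
For context, the failing "Dataset complex Aggregator" benchmark case exercises a typed Aggregator whose buffer is a multi-field case class, so its buffer encoder produces several serializer/deserializer expressions that must be paired up attribute by attribute; the "-1" above is consistent with an attribute lookup that misses because the attribute is unresolved. A minimal sketch of that shape (illustrative names and data, not the benchmark's actual code):

{code:title=ComplexAggregatorSketch.scala|borderStyle=solid}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

case class Data(l: Long, s: String)

// A multi-field ("complex") aggregation buffer: the encoder for Data emits
// one serializer and one deserializer expression per field, and
// TungstenAggregate must map these onto the resolved buffer attributes.
object ComplexAggregator extends Aggregator[Data, Data, Long] {
  def zero: Data = Data(0, "")
  def reduce(b: Data, a: Data): Data = Data(b.l + a.l, "")
  def merge(b1: Data, b2: Data): Data = Data(b1.l + b2.l, "")
  def finish(reduction: Data): Long = reduction.l
  def bufferEncoder: Encoder[Data] = Encoders.product[Data]
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

// Usage sketch (Spark 2.0 API):
// val spark = SparkSession.builder.master("local[*]").getOrCreate()
// import spark.implicits._
// spark.range(100).map(i => Data(i, i.toString))
//   .select(ComplexAggregator.toColumn).show()
{code}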

[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-21 Thread Hiroshi Inoue (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341618#comment-15341618 ]

Hiroshi Inoue commented on SPARK-15704:
---------------------------------------

Yes, please. Thank you.


[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-21 Thread Deenar Toraskar (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341598#comment-15341598 ]

Deenar Toraskar commented on SPARK-15704:
-----------------------------------------

[~inouehrs] thanks for checking this out. Do you want me to raise another JIRA?


[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-21 Thread Hiroshi Inoue (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341587#comment-15341587 ]

Hiroshi Inoue commented on SPARK-15704:
---------------------------------------

I confirmed the same error by executing Deenar's code. It seems to be a separate issue.



[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-21 Thread Deenar Toraskar (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341478#comment-15341478 ]

Deenar Toraskar commented on SPARK-15704:
-----------------------------------------

Hi guys

I get a similar error when using complex types in an Aggregator. I am not sure whether this is the same issue or something else.

{code:title=Agg.scala|borderStyle=solid}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.TypedColumn
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Row}
import sqlContext.implicits._

object CustomSummer extends Aggregator[Valuation, Map[Int, Seq[Double]], Seq[Seq[Double]]] with Serializable {

  def zero: Map[Int, Seq[Double]] = Map()

  def reduce(b: Map[Int, Seq[Double]], a: Valuation): Map[Int, Seq[Double]] = {
    val timeInterval: Int = a.timeInterval
    val currentSum: Seq[Double] = b.get(timeInterval).getOrElse(Nil)
    val currentRow: Seq[Double] = a.pvs
    b.updated(timeInterval, sumArray(currentSum, currentRow))
  }

  // stub for the repro: the real implementation sums the two rows element-wise
  def sumArray(a: Seq[Double], b: Seq[Double]): Seq[Double] = Nil

  def merge(b1: Map[Int, Seq[Double]], b2: Map[Int, Seq[Double]]): Map[Int, Seq[Double]] = {
    /* merges two maps; ++ replaces any (k, v) from the map on the left side
       of ++ (here b1) by the (k, v) from the right-side map if (k, _) already
       exists in the left-side map, e.g. Map(1 -> 1) ++ Map(1 -> 2) results
       in Map(1 -> 2) */
    b1 ++ b2.map { case (timeInterval, exposures) =>
      timeInterval -> sumArray(exposures, b1.getOrElse(timeInterval, Nil))
    }
  }

  def finish(exposures: Map[Int, Seq[Double]]): Seq[Seq[Double]] = {
    exposures.size match {
      case 0 => null
      case _ =>
        val range = exposures.keySet.max
        // convert the map to a two-dimensional collection:
        // timeInterval x Seq(expScn1, expScn2, ...)
        (0 to range).map(x => exposures.getOrElse(x, Nil))
    }
  }

  override def bufferEncoder: Encoder[Map[Int, Seq[Double]]] = ExpressionEncoder()
  override def outputEncoder: Encoder[Seq[Seq[Double]]] = ExpressionEncoder()
}

case class Valuation(timeInterval: Int, pvs: Seq[Double])

val valns = sc.parallelize(Seq(
  Valuation(0, Seq(1.0, 2.0, 3.0)),
  Valuation(2, Seq(1.0, 2.0, 3.0)),
  Valuation(1, Seq(1.0, 2.0, 3.0)),
  Valuation(2, Seq(1.0, 2.0, 3.0)),
  Valuation(0, Seq(1.0, 2.0, 3.0)))).toDS

val g_c1 = valns.groupByKey(_.timeInterval).agg(CustomSummer.toColumn).show(false)
{code}

I get the following error:

{quote}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 10.0 failed 1 times, most recent failure: Lost task 1.0 in stage 10.0 (TID 19, localhost): java.lang.IndexOutOfBoundsException: 0
at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:43)
at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:47)
at scala.collection.mutable.ArrayBuffer.remove(ArrayBuffer.scala:167)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:244)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:214)
at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:156)
at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:154)
at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:155)
at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:155)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:155)
at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:154)
at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:155)
at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:155)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at ...
{quote}
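
Not a fix, but a possible isolation/workaround sketch: if the failure comes from the Catalyst ExpressionEncoder for the complex Map buffer, swapping the buffer encoder for a Kryo-based binary encoder (Encoders.kryo) avoids per-field buffer attribute resolution entirely. This is untested against this exact bug; KryoBufferSummer is a hypothetical name and the reduce logic is simplified to keep the sketch short.

{code:title=KryoBufferSketch.scala|borderStyle=solid}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders}

// Hypothetical variant of CustomSummer: same buffer and output types, but the
// buffer is (de)serialized as an opaque binary blob via Kryo, so Catalyst has
// no per-field buffer serializer/deserializer expressions to resolve.
object KryoBufferSummer extends Aggregator[Valuation, Map[Int, Seq[Double]], Seq[Seq[Double]]] with Serializable {
  def zero: Map[Int, Seq[Double]] = Map()
  def reduce(b: Map[Int, Seq[Double]], a: Valuation): Map[Int, Seq[Double]] =
    b.updated(a.timeInterval, a.pvs) // simplified; no element-wise summing here
  def merge(b1: Map[Int, Seq[Double]], b2: Map[Int, Seq[Double]]): Map[Int, Seq[Double]] =
    b1 ++ b2
  def finish(exposures: Map[Int, Seq[Double]]): Seq[Seq[Double]] =
    if (exposures.isEmpty) null
    else (0 to exposures.keySet.max).map(exposures.getOrElse(_, Nil))

  // Kryo-based binary encoder for the buffer instead of ExpressionEncoder()
  def bufferEncoder: Encoder[Map[Int, Seq[Double]]] = Encoders.kryo[Map[Int, Seq[Double]]]
  def outputEncoder: Encoder[Seq[Seq[Double]]] = ExpressionEncoder()
}

// valns.groupByKey(_.timeInterval).agg(KryoBufferSummer.toColumn).show(false)
{code}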

[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-01 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310650#comment-15310650 ]

Apache Spark commented on SPARK-15704:
--------------------------------------

User 'inouehrs' has created a pull request for this issue:
https://github.com/apache/spark/pull/13446
