Re: internal unit tests failing against the latest spark master
I confirmed that an Encoder[Array[Int]] is no longer serializable, while with my Spark build from March 7 it was. I believe the issue is commit 295747e59739ee8a697ac3eba485d3439e4a04c3, and I have sent Wenchen an email about it.

On Wed, Apr 12, 2017 at 4:31 PM, Koert Kuipers wrote:
> i believe the error is related to an
> org.apache.spark.sql.expressions.Aggregator
> where the buffer type (BUF) is Array[Int]
>
> On Wed, Apr 12, 2017 at 4:19 PM, Koert Kuipers wrote:
>> hey all,
>> today i tried upgrading the spark version we use internally by creating a
>> new internal release from the spark master branch. last time i did this was
>> march 7.
>>
>> with this updated spark i am seeing some serialization errors in the unit
>> tests for our own libraries. looks like a scala reflection type that is not
>> serializable is getting sucked into serialization for the encoder?
>> see below.
>> best,
>> koert
>>
>> [...]
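For anyone who wants to reproduce the check described above, one way to confirm that the derived encoder itself no longer survives Java serialization is to round-trip an ExpressionEncoder by hand. This is a sketch, not the exact test from our internal suite:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// Build the encoder Spark derives for Array[Int] and push it through
// plain Java serialization -- essentially what the ClosureCleaner does
// when the encoder gets captured in a task closure.
val enc = ExpressionEncoder[Array[Int]]()
val oos = new ObjectOutputStream(new ByteArrayOutputStream())
// On the affected build this throws NotSerializableException for
// scala.reflect.internal.BaseTypeSeqs$BaseTypeSeq, matching the
// serialization stack in the original report.
oos.writeObject(enc)
oos.close()
```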
Re: internal unit tests failing against the latest spark master
I believe the error is related to an org.apache.spark.sql.expressions.Aggregator where the buffer type (BUF) is Array[Int].

On Wed, Apr 12, 2017 at 4:19 PM, Koert Kuipers wrote:
> hey all,
> today i tried upgrading the spark version we use internally by creating a
> new internal release from the spark master branch. last time i did this was
> march 7.
>
> with this updated spark i am seeing some serialization errors in the unit
> tests for our own libraries. looks like a scala reflection type that is not
> serializable is getting sucked into serialization for the encoder?
> see below.
> best,
> koert
>
> [...]
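To make the failing shape concrete, here is a minimal Aggregator with an Array[Int] buffer of the kind I mean (a hypothetical sketch, not our actual code):

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

// Sums the inputs while counting them; the buffer is an Array[Int],
// which forces Spark to derive an Encoder[Array[Int]] for the buffer.
object SumWithCount extends Aggregator[Int, Array[Int], Int] {
  def zero: Array[Int] = Array(0, 0)
  def reduce(buf: Array[Int], x: Int): Array[Int] = {
    buf(0) += x; buf(1) += 1; buf
  }
  def merge(a: Array[Int], b: Array[Int]): Array[Int] =
    Array(a(0) + b(0), a(1) + b(1))
  def finish(buf: Array[Int]): Int = buf(0)
  // This buffer encoder is where the non-serializable scala-reflect
  // type gets captured on the affected build.
  def bufferEncoder: Encoder[Array[Int]] = ExpressionEncoder[Array[Int]]()
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}
```

Running this aggregator over a Dataset[Int] (e.g. ds.select(SumWithCount.toColumn)) would then be expected to fail at job submission with the "Task not serializable" error quoted in the original report.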
internal unit tests failing against the latest spark master
hey all,

today I tried upgrading the Spark version we use internally by creating a new internal release from the Spark master branch. The last time I did this was March 7.

With this updated Spark I am seeing serialization errors in the unit tests for our own libraries. It looks like a non-serializable Scala reflection type is getting pulled into serialization for the encoder? See below.

best,
koert

[info] org.apache.spark.SparkException: Task not serializable
[info]   at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
[info]   at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
[info]   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
[info]   at org.apache.spark.SparkContext.clean(SparkContext.scala:2284)
[info]   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2058)
...
[info] Serialization stack:
[info] - object not serializable (class: scala.reflect.internal.BaseTypeSeqs$BaseTypeSeq, value: BTS(Int,AnyVal,Any))
[info] - field (class: scala.reflect.internal.Types$TypeRef, name: baseTypeSeqCache, type: class scala.reflect.internal.BaseTypeSeqs$BaseTypeSeq)
[info] - object (class scala.reflect.internal.Types$ClassNoArgsTypeRef, Int)
[info] - field (class: org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, name: elementType$2, type: class scala.reflect.api.Types$TypeApi)
[info] - object (class org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, )
[info] - field (class: org.apache.spark.sql.catalyst.expressions.objects.UnresolvedMapObjects, name: function, type: interface scala.Function1)
[info] - object (class org.apache.spark.sql.catalyst.expressions.objects.UnresolvedMapObjects, unresolvedmapobjects(, getcolumnbyordinal(0, ArrayType(IntegerType,false)), Some(interface scala.collection.Seq)))
[info] - field (class: org.apache.spark.sql.catalyst.expressions.objects.WrapOption, name: child, type: class org.apache.spark.sql.catalyst.expressions.Expression)
[info] - object (class org.apache.spark.sql.catalyst.expressions.objects.WrapOption, wrapoption(unresolvedmapobjects(, getcolumnbyordinal(0, ArrayType(IntegerType,false)), Some(interface scala.collection.Seq)), ObjectType(interface scala.collection.Seq)))
[info] - writeObject data (class: scala.collection.immutable.List$SerializationProxy)
[info] - object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@69040c85)
[info] - writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
[info] - object (class scala.collection.immutable.$colon$colon, List(wrapoption(unresolvedmapobjects(, getcolumnbyordinal(0, ArrayType(IntegerType,false)), Some(interface scala.collection.Seq)), ObjectType(interface scala.collection.Seq
[info] - field (class: org.apache.spark.sql.catalyst.expressions.objects.NewInstance, name: arguments, type: interface scala.collection.Seq)
[info] - object (class org.apache.spark.sql.catalyst.expressions.objects.NewInstance, newInstance(class scala.Tuple1))
[info] - field (class: org.apache.spark.sql.catalyst.encoders.ExpressionEncoder, name: deserializer, type: class org.apache.spark.sql.catalyst.expressions.Expression)
[info] - object (class org.apache.spark.sql.catalyst.encoders.ExpressionEncoder, class[_1[0]: array])
...