Re: internal unit tests failing against the latest spark master

2017-04-12 Thread Koert Kuipers
i confirmed that an Encoder[Array[Int]] is no longer serializable, while with
my spark build from march 7 it still was.

i believe the issue was introduced by commit
295747e59739ee8a697ac3eba485d3439e4a04c3, and i sent wenchen an email about it.
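
for reference, the check can be done in isolation, something along these
lines (a minimal sketch, not our actual test code): build the expression
encoder for Array[Int] and java-serialize it, which is roughly what the
ClosureCleaner does before a task ships.

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

object EncoderSerializationCheck {
  def main(args: Array[String]): Unit = {
    // the expression encoder spark derives for Array[Int]
    val enc = ExpressionEncoder[Array[Int]]()

    // round-trip through java serialization, as the ClosureCleaner
    // effectively does; on current master this throws
    // java.io.NotSerializableException on
    // scala.reflect.internal.BaseTypeSeqs$BaseTypeSeq, on my march 7
    // build it succeeds
    val oos = new ObjectOutputStream(new ByteArrayOutputStream())
    oos.writeObject(enc)
    oos.close()
  }
}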

On Wed, Apr 12, 2017 at 4:31 PM, Koert Kuipers  wrote:

> i believe the error is related to an 
> org.apache.spark.sql.expressions.Aggregator
> where the buffer type (BUF) is Array[Int]
>
> On Wed, Apr 12, 2017 at 4:19 PM, Koert Kuipers  wrote:
>
>> hey all,
>> today i tried upgrading the spark version we use internally by creating a
>> new internal release from the spark master branch. last time i did this was
>> march 7.
>>
>> with this updated spark i am seeing some serialization errors in the unit
>> tests for our own libraries. looks like a scala reflection type that is not
>> serializable is getting sucked into serialization for the encoder?
>> see below.
>> best,
>> koert
>>
>> ...
>


Re: internal unit tests failing against the latest spark master

2017-04-12 Thread Koert Kuipers
i believe the error is related to an
org.apache.spark.sql.expressions.Aggregator where the buffer type (BUF) is
Array[Int]
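
for context, the failing case is shaped more or less like this (a minimal
sketch with made-up names, not our actual code):

import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

// sketch of an aggregator whose buffer type (BUF) is Array[Int]
object SumPairs extends Aggregator[(Int, Int), Array[Int], Seq[Int]] {
  def zero: Array[Int] = Array(0, 0)
  def reduce(buf: Array[Int], in: (Int, Int)): Array[Int] = {
    buf(0) += in._1
    buf(1) += in._2
    buf
  }
  def merge(b1: Array[Int], b2: Array[Int]): Array[Int] =
    Array(b1(0) + b2(0), b1(1) + b2(1))
  def finish(buf: Array[Int]): Seq[Int] = buf.toSeq
  // the expression encoder for the Array[Int] buffer is the part
  // that fails to serialize
  def bufferEncoder: Encoder[Array[Int]] = ExpressionEncoder()
  def outputEncoder: Encoder[Seq[Int]] = ExpressionEncoder()
}

using it, e.g. ds.select(SumPairs.toColumn).collect() on a
Dataset[(Int, Int)], is the kind of call that blows up in the
ClosureCleaner with the trace below.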

On Wed, Apr 12, 2017 at 4:19 PM, Koert Kuipers  wrote:

> hey all,
> today i tried upgrading the spark version we use internally by creating a
> new internal release from the spark master branch. last time i did this was
> march 7.
>
> with this updated spark i am seeing some serialization errors in the unit
> tests for our own libraries. looks like a scala reflection type that is not
> serializable is getting sucked into serialization for the encoder?
> see below.
> best,
> koert
>
> ...


internal unit tests failing against the latest spark master

2017-04-12 Thread Koert Kuipers
hey all,
today i tried upgrading the spark version we use internally by creating a
new internal release from the spark master branch. last time i did this was
march 7.

with this updated spark i am seeing some serialization errors in the unit
tests for our own libraries. looks like a scala reflection type that is not
serializable is getting sucked into serialization for the encoder?
see below.
best,
koert

[info]   org.apache.spark.SparkException: Task not serializable
[info]   at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
[info]   at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
[info]   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
[info]   at org.apache.spark.SparkContext.clean(SparkContext.scala:2284)
[info]   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2058)
...
[info] Serialization stack:
[info] - object not serializable (class: scala.reflect.internal.BaseTypeSeqs$BaseTypeSeq, value: BTS(Int,AnyVal,Any))
[info] - field (class: scala.reflect.internal.Types$TypeRef, name: baseTypeSeqCache, type: class scala.reflect.internal.BaseTypeSeqs$BaseTypeSeq)
[info] - object (class scala.reflect.internal.Types$ClassNoArgsTypeRef, Int)
[info] - field (class: org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, name: elementType$2, type: class scala.reflect.api.Types$TypeApi)
[info] - object (class org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, )
[info] - field (class: org.apache.spark.sql.catalyst.expressions.objects.UnresolvedMapObjects, name: function, type: interface scala.Function1)
[info] - object (class org.apache.spark.sql.catalyst.expressions.objects.UnresolvedMapObjects, unresolvedmapobjects(, getcolumnbyordinal(0, ArrayType(IntegerType,false)), Some(interface scala.collection.Seq)))
[info] - field (class: org.apache.spark.sql.catalyst.expressions.objects.WrapOption, name: child, type: class org.apache.spark.sql.catalyst.expressions.Expression)
[info] - object (class org.apache.spark.sql.catalyst.expressions.objects.WrapOption, wrapoption(unresolvedmapobjects(, getcolumnbyordinal(0, ArrayType(IntegerType,false)), Some(interface scala.collection.Seq)), ObjectType(interface scala.collection.Seq)))
[info] - writeObject data (class: scala.collection.immutable.List$SerializationProxy)
[info] - object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@69040c85)
[info] - writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
[info] - object (class scala.collection.immutable.$colon$colon, List(wrapoption(unresolvedmapobjects(, getcolumnbyordinal(0, ArrayType(IntegerType,false)), Some(interface scala.collection.Seq)), ObjectType(interface scala.collection.Seq
[info] - field (class: org.apache.spark.sql.catalyst.expressions.objects.NewInstance, name: arguments, type: interface scala.collection.Seq)
[info] - object (class org.apache.spark.sql.catalyst.expressions.objects.NewInstance, newInstance(class scala.Tuple1))
[info] - field (class: org.apache.spark.sql.catalyst.encoders.ExpressionEncoder, name: deserializer, type: class org.apache.spark.sql.catalyst.expressions.Expression)
[info] - object (class org.apache.spark.sql.catalyst.encoders.ExpressionEncoder, class[_1[0]: array])
...
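
for anyone who hits this before it gets fixed: a possible stopgap
(untested, just a sketch) is to give the aggregator a kryo-based buffer
encoder, so no expression encoder, and none of the scala-reflect types it
captures, is involved:

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// hypothetical workaround: kryo serialization for the Array[Int] buffer
object SumInts extends Aggregator[Int, Array[Int], Int] {
  def zero: Array[Int] = Array(0)
  def reduce(b: Array[Int], a: Int): Array[Int] = { b(0) += a; b }
  def merge(b1: Array[Int], b2: Array[Int]): Array[Int] = { b1(0) += b2(0); b1 }
  def finish(b: Array[Int]): Int = b(0)
  def bufferEncoder: Encoder[Array[Int]] = Encoders.kryo[Array[Int]]
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}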