Kousuke Saruta has posted comments on this change.

Change subject: KuduRDD.collect fails because of NoSerializableException
......................................................................


Patch Set 4:

Hi Dan,

Thank you for the review! I have a question and a comment.

 > I'm not keen on adding io.Serializable to the java client classes
 > due to compatibility concerns with the Java Serializable API. 

What kind of compatibility issues do you expect? Do you have any examples?

 > looked into this issue, and it seems our biggest spark users are
 > using Kryo serialization instead of Java serialization, since it
 > provides much better performance (and it should be compatible with
 > KuduRDD).  Is it an option to use Kryo?  Using it should be as
 > simple as setting the "spark.serializer" option on the SparkConf:
 > 
 > new SparkConf().set("spark.serializer", 
 > "org.apache.spark.serializer.KryoSerializer")

As you mentioned, Kryo is more efficient than the Java serializer, but
unfortunately we can't serialize/deserialize those classes with Kryo.
When we try, we get an exception like the following.

```
16/12/15 15:40:58 ERROR TaskResultGetter: Exception while getting task result
com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException
Serialization trace:
columnsByIndex (org.apache.kudu.Schema)
schema (org.apache.kudu.client.RowResult)
rowResult (org.apache.kudu.spark.kudu.KuduRow)
        at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:396)
        at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:307)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:327)
        at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:88)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:72)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException
        at org.apache.kudu.client.shaded.com.google.common.collect.ImmutableCollection.add(ImmutableCollection.java:96)
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        ... 21 more
```

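We get the exception above because Kryo can't deserialize Guava's
ImmutableList, which is the type of columnsByIndex in Schema. As the
"Caused by" section shows, Kryo's CollectionSerializer.read calls add() on
the collection, and ImmutableCollection.add throws
UnsupportedOperationException.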
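
To illustrate the failure mode, here is a minimal sketch that uses only the
JDK. It is not Kryo's or Kudu's actual code: naiveRead is a hypothetical
helper that mimics what the trace suggests CollectionSerializer.read does
(filling the target collection through add()), and
Collections.unmodifiableList stands in for the shaded Guava ImmutableList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ImmutableReadSketch {

    // Hypothetical stand-in for an element-by-element read loop:
    // fill an already-instantiated collection by calling add().
    static <T> void naiveRead(List<T> target, List<T> decoded) {
        for (T element : decoded) {
            target.add(element); // throws if the target is immutable
        }
    }

    // Returns true if filling the target through add() fails with
    // UnsupportedOperationException, as in the trace above.
    static boolean failsOnAdd(List<String> target) {
        try {
            naiveRead(target, Arrays.asList("key", "value"));
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A mutable list accepts add(), so the read loop succeeds.
        System.out.println("mutable fails:   " + failsOnAdd(new ArrayList<>()));
        // An unmodifiable list (stand-in for ImmutableList) rejects add().
        System.out.println("immutable fails: "
                + failsOnAdd(Collections.unmodifiableList(new ArrayList<>())));
    }
}
```

In this sketch the mutable list is filled without error, while the
unmodifiable one fails on the first add(), which matches the "Caused by"
frames in the trace.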

-- 
To view, visit http://gerrit.cloudera.org:8080/5496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If0463424481a33c66fd7464345c305062420cfe9
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Kousuke Saruta <saru...@oss.nttdata.co.jp>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Kousuke Saruta <saru...@oss.nttdata.co.jp>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No
