[ https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741627#comment-14741627 ]

Glenn Strycker commented on SPARK-10569:
----------------------------------------

This looks very similar to this thread: 
https://groups.google.com/forum/#!topic/spark-users/Whf1cGwZlD8, and 
[~joshrosen] commented there, so at least one known Spark contributor is aware 
of this issue :-)

I tried registering the keys and values separately, in addition to the RDD 
being sorted... is sortByKey remapping the RDD into another form?  For example, 
if my key is a pair (A,B) and sortByKey first sorts by A, maybe it maps things 
to (A, (B, V)), assigns an order index, and then maps to (B, (A, V, index1)) or 
something?  If so, please let me know what additional forms are used, and I can 
register them, such as "(Any, (Any, Any, Any))"
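
One more guess about the class name in the error: "scala.Tuple3[]" is Java's 
notation for an *array* of Tuple3, which on the JVM is a different class from 
Tuple3 itself, so registering the tuple classes alone would not cover it.  
(I suspect sortByKey's range partitioner collects sampled keys back to the 
driver in arrays of tuples, which would explain where the array comes from, 
though I haven't confirmed this in the Spark source.)  A quick sketch of what 
I mean, with the registrator calls at the end being my guess at the fix:

{code}
// "scala.Tuple3[]" in the error is the JVM array-of-Tuple3 class.
// Due to erasure, Array[(Any, Any, Any)] and any other Array[Tuple3[...]]
// share the same runtime class, named "[Lscala.Tuple3;".
val arrayOfTuple3 = classOf[Array[(Any, Any, Any)]]
println(arrayOfTuple3.getName)  // prints "[Lscala.Tuple3;"

// The same class can also be looked up by its JVM name:
assert(arrayOfTuple3 == Class.forName("[Lscala.Tuple3;"))

// So in the registrator, something like this might be the missing piece
// (my guess, not confirmed):
//   kryo.register(classOf[Array[(Any, Any, Any)]])
//   kryo.register(classOf[Array[(Any, Any)]])
{code}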

> Kryo serialization fails on sortByKey operation on registered RDDs
> ------------------------------------------------------------------
>
>                 Key: SPARK-10569
>                 URL: https://issues.apache.org/jira/browse/SPARK-10569
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Glenn Strycker
>
> I have code that creates RDDs, persists, checkpoints, and materializes (using 
> count()), and these RDDs are serialized with Kryo, using the standard code.
> I have "kryo.setRegistrationRequired(true)", which is useful for debugging my 
> code to find out which RDDs I haven't registered.  Unfortunately, having this 
> setting turned on does not seem compatible with Spark internals.
> When my code encounters a sortByKey, it fails, giving me an error:
> {noformat}
> User class threw exception: Job aborted due to stage failure: Task 1 in stage 
> 25.0 failed 40 times, most recent failure: Lost task 1.39 in stage 25.0 (TID 
> 232, <server name>): java.lang.IllegalArgumentException: Class is not 
> registered: scala.Tuple3[]
> Note: To register this class use: kryo.register(scala.Tuple3[].class);
> at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:162)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> {noformat}
> Why is scala.Tuple3[] not registered?  I attempted to register it using 
> various forms of "kryo.register(scala.Tuple3[].class)", but this didn't seem 
> to work.
> I tried making sure that both the keys and the values of my RDD are 
> registered, in addition to the entire RDD.  I have lines like this:
> {code}
>     kryo.register(classOf[(((Any,Any),(Any,Any)),((Any,Any),Any))])
>     kryo.register(classOf[((Any,Any),(Any,Any))])
>     kryo.register(classOf[((Any, Any),Any)])
> {code}
> Again, my program is only dying on the sortByKey command.  If I get rid of 
> it, the code proceeds just fine, but I need it for certain operations 
> (assigning indices based on sort order).
> FYI, it is failing on RDDs of all types... I have verified this in several 
> places in my program.
> {code}
> myRDD.sortByKey(ascending=true).collect().foreach(println)
> {code}
> doesn't work (gives the error above), but
> {code}
> myRDD.collect().foreach(println)
> {code}
> works just fine.  My code also works if I turn off 
> "kryo.setRegistrationRequired(true)".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
