[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741486#comment-14741486
 ] 

Glenn Strycker commented on SPARK-10569:


Playing around with adding additional registrations, I tried adding things like 
"kryo.register(classOf[(Any,Any,Any)])" and 
"kryo.register(classOf[Array[(Any,Any,Any)]])", and I once got an error saying 
"User class threw exception: Task not serializable" instead of the "Class is 
not registered: scala.Tuple3[]" error.

It's still in the same spot of the code, though -- sortByKey
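
Registrations like these normally go in a custom KryoRegistrator that is wired 
up through the Spark configuration. A minimal sketch of that wiring (the class 
name MyRegistrator is hypothetical, and the registrations shown are just the 
ones discussed above, not the reporter's actual code):

{code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // The tuple class itself...
    kryo.register(classOf[(Any, Any, Any)])
    // ...and its array class, which is what Kryo's "scala.Tuple3[]"
    // notation in the error message refers to.
    kryo.register(classOf[Array[(Any, Any, Any)]])
  }
}
{code}

The registrator would then be enabled with 
"spark.kryo.registrator=MyRegistrator" alongside 
"spark.serializer=org.apache.spark.serializer.KryoSerializer".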

> Kryo serialization fails on sortByKey operation on registered RDDs
> --
>
> Key: SPARK-10569
> URL: https://issues.apache.org/jira/browse/SPARK-10569
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Glenn Strycker
>
> I have code that creates RDDs, persists, checkpoints, and materializes (using 
> count()), and these RDDs are serialized with Kryo, using the standard code.
> I have "kryo.setRegistrationRequired(true)", which is useful for debugging my 
> code to find out which RDDs I haven't registered.  Unfortunately, having this 
> setting turned on does not seem compatible with Spark internals.
> When my code encounters a sortByKey, it fails, giving me an error:
> {noformat}
> User class threw exception: Job aborted due to stage failure: Task 1 in stage 
> 25.0 failed 40 times, most recent failure: Lost task 1.39 in stage 25.0 (TID 
> 232, ): java.lang.IllegalArgumentException: Class is not 
> registered: scala.Tuple3[]
> Note: To register this class use: kryo.register(scala.Tuple3[].class);
> at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:162)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> {noformat}
> Why is scala.Tuple3[] not registered?  I attempted to register it using 
> various forms of "kryo.register(scala.Tuple3[].class)", but this didn't seem 
> to work.
> I tried making sure that both the keys and values of my RDD are registered 
> in addition to the entire RDD.  I have lines like this:
> {code}
> kryo.register(classOf[(((Any,Any),(Any,Any)),((Any,Any),Any))])
> kryo.register(classOf[((Any,Any),(Any,Any))])
> kryo.register(classOf[((Any, Any),Any)])
> {code}
> Again, my program dies only on the sortByKey command.  If I remove it, the 
> code proceeds just fine, but I need it for certain operations 
> (assigning indices based on sort order).
> FYI, it is failing on RDDs of all types... I verified this in several places 
> in my program.
> {code}
> myRDD.sortByKey(ascending=true).collect().foreach(println)
> {code}
> doesn't work (gives the error above), but
> {code}
> myRDD.collect().foreach(println)
> {code}
> works just fine.  My code also works if I turn off 
> "kryo.setRegistrationRequired(true)".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741582#comment-14741582
 ] 

Glenn Strycker commented on SPARK-10569:


Is this issue related to HIVE-7540 or SPARK-2421?




[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741633#comment-14741633
 ] 

Josh Rosen commented on SPARK-10569:


This actually sounds like an instance of SPARK-10251, which should be re-opened.




[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741638#comment-14741638
 ] 

Glenn Strycker commented on SPARK-10569:


Note that I am still using 1.3.0.  I noticed that SPARK-10251 was an issue for 
1.4.1... so if that one is resolved, perhaps this one is as well?




[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741554#comment-14741554
 ] 

Glenn Strycker commented on SPARK-10569:


I'm also seeing the occasional "User class threw exception: Task not 
serializable" error.




[jira] [Commented] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

2015-09-11 Thread Glenn Strycker (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741627#comment-14741627
 ] 

Glenn Strycker commented on SPARK-10569:


It looks very similar to this thread: 
https://groups.google.com/forum/#!topic/spark-users/Whf1cGwZlD8, and 
[~joshrosen] commented on it, so at least one known Spark contributor is aware 
of this issue :-)

I tried registering the keys and values separately in addition to the RDD being 
sorted... is sortByKey remapping the RDD into another form?  For example, if my 
key is a pair (A,B), and sortByKey is first sorting by A, maybe it is mapping 
things to (A, (B, V)), assigning an order index, then mapping (B, (A,V,index1)) 
or something?  If so, please let me know what additional forms are used, and I 
can register those forms, such as "(Any, (Any, Any, Any))"
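
One plausible reading of the stack trace (an assumption from reading the Spark 
1.x source, not something confirmed in this thread): sortByKey constructs a 
RangePartitioner, which samples keys on each partition and collects the samples 
back to the driver as an array of (partitionIndex, count, sample) Tuple3s. If 
so, the unregistered class would be that internal Tuple3 array, not the user's 
RDD type, and a workaround sketch would be:

{code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical workaround: register the internal sample types that
// RangePartitioner appears to ship back to the driver during sortByKey.
class SortByKeyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // "scala.Tuple3[]" in the error message is the *array* class:
    kryo.register(classOf[Array[(Any, Any, Any)]])
    kryo.register(classOf[(Any, Any, Any)])
  }
}
{code}

That registering the array class changed the error earlier in this thread is at 
least consistent with this being the class involved.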
