Re: Serialization Problem in Spark Program
Awesome. Thanks.

Best Regards

On Fri, Mar 27, 2015 at 7:26 AM, donhoff_h 165612...@qq.com wrote:

Hi, Akhil

Yes, that is exactly where the problem lies. Thanks very much for pointing out my mistake.

------------------ Original ------------------
From: Akhil Das ak...@sigmoidanalytics.com
Send time: Thursday, Mar 26, 2015 3:23 PM
To: donhoff_h 165612...@qq.com
Cc: user user@spark.apache.org
Subject: Re: Serialization Problem in Spark Program

Try registering your MyObject[] with Kryo.
Re: Serialization Problem in Spark Program
Try registering your MyObject[] with Kryo.

On 25 Mar 2015 13:17, donhoff_h 165612...@qq.com wrote:
Serialization Problem in Spark Program
Hi, experts

I wrote a very simple Spark program to test the Kryo serialization function. The code is as follows:

    object TestKryoSerialization {
      def main(args: Array[String]) {
        val conf = new SparkConf()
        conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        conf.set("spark.kryo.registrationRequired", "true")  // I use this statement to force checking registration.
        conf.registerKryoClasses(Array(classOf[MyObject]))
        val sc = new SparkContext(conf)
        val rdd = sc.textFile("hdfs://dhao.hrb:8020/user/spark/tstfiles/charset/A_utf8.txt")
        val objs = rdd.map(new MyObject(_, 1)).collect()
        for (x <- objs) {
          x.printMyObject
        }
      }
    }

The class MyObject is also a very simple class, used only to test the serialization function:

    class MyObject {
      var myStr: String = ""
      var myInt: Int = 0

      def this(inStr: String, inInt: Int) {
        this()
        this.myStr = inStr
        this.myInt = inInt
      }

      def printMyObject {
        println("MyString is : " + myStr + "\tMyInt is : " + myInt)
      }
    }

But when I ran the application, it reported the following error:

    java.lang.IllegalArgumentException: Class is not registered: dhao.test.Serialization.MyObject[]
    Note: To register this class use: kryo.register(dhao.test.Serialization.MyObject[].class);
        at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
        at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:161)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I don't understand what causes this problem. I have used conf.registerKryoClasses to register my class. Could anyone help me? Thanks.

By the way, the Spark version is 1.3.0.
Re: Serialization Problem in Spark Program
You also need to register *array*s of MyObject. So change:

    conf.registerKryoClasses(Array(classOf[MyObject]))

to:

    conf.registerKryoClasses(Array(classOf[MyObject], classOf[Array[MyObject]]))

On Wed, Mar 25, 2015 at 2:44 AM, donhoff_h 165612...@qq.com wrote:
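The reason registering the element class alone isn't enough: on the JVM, an array type has its own runtime Class object, distinct from the element type, and Kryo's registrationRequired check looks up exactly that class. A small, Spark-free Java sketch of the distinction (using String in place of MyObject):

```java
public class ArrayClassDemo {
    public static void main(String[] args) {
        // The element class and the array class are different runtime classes;
        // array class names use the JVM's internal "[L...;" notation.
        System.out.println(String.class.getName());              // java.lang.String
        System.out.println(String[].class.getName());            // [Ljava.lang.String;
        System.out.println(String.class.equals(String[].class)); // false
    }
}
```

This is why registering classOf[MyObject] does not cover MyObject[], which Spark serializes when it ships collected task results back to the driver.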
Re: Serialization problem in Spark
Did you try to register the class with the Kryo serializer?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi

On Mon, Jun 23, 2014 at 7:00 PM, rrussell25 rrussel...@gmail.com wrote:
Re: Serialization problem in Spark
Thanks for the pointer... tried Kryo and ran into a strange error:

    org.apache.spark.SparkException: Job aborted due to stage failure: Exception while deserializing and fetching task: com.esotericsoftware.kryo.KryoException: Unable to find class: rg.apache.hadoop.hbase.io.ImmutableBytesWritable

It is strange in that the complaint is for rg.apache... (the missing o is not a typo).

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Serialization-problem-in-Spark-tp7049p8123.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Serialization problem in Spark
I'm not sure if this is a Hadoop-centric issue or not. I had similar issues with non-serializable external library classes. I used a Kryo config (as illustrated here: https://spark.apache.org/docs/latest/tuning.html#data-serialization) and registered the one troublesome class. It seemed to work after that.

Here's a link to the thread I asked on: http://apache-spark-user-list.1001560.n3.nabble.com/Un-serializable-3rd-party-classes-Spark-Java-td7815.html Take a look at the other solutions proposed there as well.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Serialization-problem-in-Spark-tp7049p7975.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Serialization problem in Spark
Where are you getting the serialization error? It's likely to be a different problem. Which class is not getting serialized?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi

On Thu, Jun 5, 2014 at 6:32 PM, Vibhor Banga vibhorba...@gmail.com wrote:

Any inputs on this will be helpful.

Thanks,
-Vibhor

On Thu, Jun 5, 2014 at 3:41 PM, Vibhor Banga vibhorba...@gmail.com wrote:
Serialization problem in Spark
Hi,

I am trying to do something like the following in Spark:

    JavaPairRDD<byte[], MyObject> eventRDD = hBaseRDD.map(
        new PairFunction<Tuple2<ImmutableBytesWritable, Result>, byte[], MyObject>() {
          @Override
          public Tuple2<byte[], MyObject> call(Tuple2<ImmutableBytesWritable, Result> immutableBytesWritableResultTuple2) throws Exception {
            return new Tuple2<byte[], MyObject>(immutableBytesWritableResultTuple2._1.get(),
                MyClass.get(immutableBytesWritableResultTuple2._2));
          }
        });

    eventRDD.foreach(new VoidFunction<Tuple2<byte[], Event>>() {
      @Override
      public void call(Tuple2<byte[], Event> eventTuple2) throws Exception {
        processForEvent(eventTuple2._2);
      }
    });

The processForEvent() function flow contains some processing and ultimately writes to an HBase table. But I am getting serialisation issues with the Hadoop and HBase built-in classes. How do I solve this? Does using Kryo serialisation help in this case?

Thanks,
-Vibhor
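For context on the failure mode above: with Spark's default Java serialization, any object captured by a task closure must implement java.io.Serializable, and Hadoop/HBase classes such as ImmutableBytesWritable and Result do not. A minimal, Spark-free sketch of what Java serialization does in that case (the Unserializable class here is a hypothetical stand-in for such a library class):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class SerializationCheck {
    // Hypothetical stand-in for a library class that does not
    // implement java.io.Serializable (e.g. an HBase Result).
    static class Unserializable {
        int value = 42;
    }

    public static void main(String[] args) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            // Java serialization rejects the object outright with
            // NotSerializableException, naming the offending class.
            out.writeObject(new Unserializable());
            System.out.println("serialized ok");
        } catch (NotSerializableException e) {
            System.out.println("NotSerializableException: " + e.getMessage());
        }
    }
}
```

Kryo side-steps this because it does not require Serializable, which is why registering the troublesome classes with Kryo (as suggested elsewhere in this thread) is a common fix; the alternative is to keep the non-serializable objects out of the closure, e.g. by extracting the plain bytes inside the map function.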
Re: Serialization problem in Spark
Any inputs on this will be helpful.

Thanks,
-Vibhor

On Thu, Jun 5, 2014 at 3:41 PM, Vibhor Banga vibhorba...@gmail.com wrote:

--
Vibhor Banga
Software Development Engineer
Flipkart Internet Pvt. Ltd., Bangalore