Re: Serialization Problem in Spark Program

2015-03-27 Thread Akhil Das
Awesome.

Thanks
Best Regards

On Fri, Mar 27, 2015 at 7:26 AM, donhoff_h 165612...@qq.com wrote:

 Hi, Akhil

 Yes, that's where the problem lies. Thanks very much for pointing out my mistake.

 -- Original --
 *From:* Akhil Das <ak...@sigmoidanalytics.com>
 *Send time:* Thursday, Mar 26, 2015 3:23 PM
 *To:* donhoff_h <165612...@qq.com>
 *Cc:* user <user@spark.apache.org>
 *Subject:* Re: Serialization Problem in Spark Program

 Try registering your MyObject[] with Kryo.
 On 25 Mar 2015 13:17, donhoff_h 165612...@qq.com wrote:

 [snip]




Re: Serialization Problem in Spark Program

2015-03-26 Thread Akhil Das
Try registering your MyObject[] with Kryo.
On 25 Mar 2015 13:17, donhoff_h 165612...@qq.com wrote:

 [snip]




Re: Serialization Problem in Spark Program

2015-03-25 Thread Imran Rashid
You also need to register *arrays* of MyObject, so change:

conf.registerKryoClasses(Array(classOf[MyObject]))

to

conf.registerKryoClasses(Array(classOf[MyObject], classOf[Array[MyObject]]))
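
The array class matters because collect() returns an Array[MyObject], which
the executor serializes with Kryo when shipping the task result back to the
driver (that is the KryoSerializerInstance.serialize frame in your stack
trace); with spark.kryo.registrationRequired=true, any unregistered type is an
error. A minimal sketch of the corrected setup (untested, using the class
names from the quoted program):

    val conf = new SparkConf()
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.kryo.registrationRequired", "true")
    // Register both the element class and its array class, since
    // collect() ships an Array[MyObject] back to the driver.
    conf.registerKryoClasses(Array(classOf[MyObject], classOf[Array[MyObject]]))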


On Wed, Mar 25, 2015 at 2:44 AM, donhoff_h 165612...@qq.com wrote:

 Hi, experts

 I wrote a very simple Spark program to test the Kryo serialization
 function. The code is as follows:

 object TestKryoSerialization {
   def main(args: Array[String]) {
     val conf = new SparkConf()
     conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
     conf.set("spark.kryo.registrationRequired", "true")  // I use this statement to force checking of registration.
     conf.registerKryoClasses(Array(classOf[MyObject]))

     val sc = new SparkContext(conf)
     val rdd = sc.textFile("hdfs://dhao.hrb:8020/user/spark/tstfiles/charset/A_utf8.txt")
     val objs = rdd.map(new MyObject(_, 1)).collect()
     for (x <- objs) {
       x.printMyObject
     }
   }
 }

 The class MyObject is also a very simple class, which is only used to
 test the serialization function:
 class MyObject {
   var myStr : String = ""
   var myInt : Int = 0
   def this(inStr : String, inInt : Int) {
     this()
     this.myStr = inStr
     this.myInt = inInt
   }
   def printMyObject {
     println("MyString is : " + myStr + "\tMyInt is : " + myInt)
   }
 }

 But when I ran the application, it reported the following error:
 java.lang.IllegalArgumentException: Class is not registered:
 dhao.test.Serialization.MyObject[]
 Note: To register this class use:
 kryo.register(dhao.test.Serialization.MyObject[].class);
   at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
   at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
   at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
   at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:161)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)

 I don't understand what causes this problem. I have used
 conf.registerKryoClasses to register my class. Could anyone help me?
 Thanks

 By the way, the Spark version is 1.3.0.




Re: Serialization problem in Spark

2014-06-24 Thread Mayur Rustagi
did you try to register the class in Kryo serializer?
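
For reference, a minimal sketch of such a registration (untested; the class
name is taken from the "Unable to find class" error quoted below, and
SparkConf.registerKryoClasses assumes Spark 1.2+ -- older versions need a
KryoRegistrator instead, as in the sketch further down the thread):

    val conf = new SparkConf()
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Register the HBase writable that Kryo complained about.
    conf.registerKryoClasses(Array(
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable]))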

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi



On Mon, Jun 23, 2014 at 7:00 PM, rrussell25 rrussel...@gmail.com wrote:

 Thanks for the pointer... tried Kryo and ran into a strange error:

 org.apache.spark.SparkException: Job aborted due to stage failure:
 Exception
 while deserializing and fetching task:
 com.esotericsoftware.kryo.KryoException: Unable to find class:
 rg.apache.hadoop.hbase.io.ImmutableBytesWritable

 It is strange in that the complaint is for rg.apache... (the missing o
 is not a typo).








Re: Serialization problem in Spark

2014-06-23 Thread rrussell25
Thanks for the pointer... tried Kryo and ran into a strange error:

org.apache.spark.SparkException: Job aborted due to stage failure: Exception
while deserializing and fetching task:
com.esotericsoftware.kryo.KryoException: Unable to find class:
rg.apache.hadoop.hbase.io.ImmutableBytesWritable

It is strange in that the complaint is for rg.apache... (the missing o
is not a typo).







Re: Serialization problem in Spark

2014-06-19 Thread Daedalus
I'm not sure if this is a Hadoop-centric issue or not. I had similar issues
with non-serializable external library classes.

I used a Kryo config (as illustrated here:
https://spark.apache.org/docs/latest/tuning.html#data-serialization ) and
registered the one troublesome class. It seemed to work after that.

Here's a link to the thread I asked on:
http://apache-spark-user-list.1001560.n3.nabble.com/Un-serializable-3rd-party-classes-Spark-Java-td7815.html
Take a look at the other solutions proposed as well.
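
For context, the Spark tuning page of that era illustrated registration via a
custom KryoRegistrator; here is a minimal sketch (untested -- MyRegistrator is
an illustrative name, and the HBase writable is borrowed from the messages
above as an example of a troublesome class):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Hypothetical registrator; list every troublesome class here.
    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable])
      }
    }

    val conf = new SparkConf()
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")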






Re: Serialization problem in Spark

2014-06-06 Thread Mayur Rustagi
Where are you getting the serialization error? It's likely to be a different
problem. Which class is not getting serialized?


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi



On Thu, Jun 5, 2014 at 6:32 PM, Vibhor Banga vibhorba...@gmail.com wrote:

 Any inputs on this will be helpful.

 Thanks,
 -Vibhor


 On Thu, Jun 5, 2014 at 3:41 PM, Vibhor Banga vibhorba...@gmail.com
 wrote:

 [snip]




Re: Serialization problem in Spark

2014-06-05 Thread Vibhor Banga
Any inputs on this will be helpful.

Thanks,
-Vibhor


On Thu, Jun 5, 2014 at 3:41 PM, Vibhor Banga vibhorba...@gmail.com wrote:

 Hi,

 I am trying to do something like the following in Spark:

 JavaPairRDD<byte[], MyObject> eventRDD = hBaseRDD.map(
     new PairFunction<Tuple2<ImmutableBytesWritable, Result>, byte[], MyObject>() {
       @Override
       public Tuple2<byte[], MyObject> call(
           Tuple2<ImmutableBytesWritable, Result> immutableBytesWritableResultTuple2)
           throws Exception {
         return new Tuple2<byte[], MyObject>(
             immutableBytesWritableResultTuple2._1.get(),
             MyClass.get(immutableBytesWritableResultTuple2._2));
       }
     });

 eventRDD.foreach(new VoidFunction<Tuple2<byte[], Event>>() {
   @Override
   public void call(Tuple2<byte[], Event> eventTuple2) throws Exception {
     processForEvent(eventTuple2._2);
   }
 });


 The processForEvent() function flow contains some processing and ultimately
 writes to an HBase table. But I am getting serialisation issues with Hadoop
 and HBase built-in classes. How do I solve this? Does using Kryo
 serialisation help in this case?

 Thanks,
 -Vibhor




-- 
Vibhor Banga
Software Development Engineer
Flipkart Internet Pvt. Ltd., Bangalore