Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.
Thanks for the info frank. Twitter's-chill avro serializer looks great. But how does spark identifies it as serializer, as its not extending from KryoSerializer. (sorry scala is an alien lang for me). - Thanks & Regards, Mohan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14649.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.
Thanks for the info frank. so your suggestion could be to use Avro serializer. i just have to configure it like Kryo for the same property? and is there any registering process for this or just specify serializer? Also does it effect performance. what measures to be taken to avoid. (im using kryo just because of its high performance, size compared to avro) - Thanks & Regards, Mohan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14567.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.
Hi frank, thanks for the info, thats great. but im not saying Avro serializer is failing. Kryo is failing but im using kryo serializer. and registering Avro generated classes with kryo. sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"); sparkConf.set("spark.kryo.registrator", "com.globallogic.goliath.platform.PlatformKryoRegistrator"); But how did it able to perform output operation when the message is simple. but not when the message is complex.(please observe no avro schema changes) just the data is changed. providing you more info below. avro schema: = record KeyValueObject { union{boolean, int, long, float, double, bytes, string} name; union {boolean, int, long, float, double, bytes, string, array, KeyValueObject} value; } record Datum { union {boolean, int, long, float, double, bytes, string, array, KeyValueObject} value; } record ResourceMessage { string version; string sequence; string resourceGUID; string GWID; string GWTimestamp; union {Datum, array} data; } simple message is as below: === {"version": "01", "sequence": "1", "resourceGUID": "001", "GWID": "002", "GWTimestamp": "1409823150737", "data": {"value": "30"}} complex message is as below: === {"version": "01", "sequence": "1", "resource": "sensor-001", "controller": "002", "controllerTimestamp": "1411038710358", "data": {"value": [{"name": "Temperature", "value": "30"}, {"name": "Speed", "value": "60"}, {"name": "Location", "value": ["+401213.1", "-0750015.1"]}, {"name": "Timestamp", "value": "2014-09-09T08:15:25-05:00"}]}} both messages can fit in to the schema, actually the message is coming from kafka, which is avro binary. at spark converting the message to Avro objects(ResourceMessage) using decoders.(this is also working). able to perform some mappings, able to convert the stream to stream now the events need to be pushed to flume source. for this i need to collect the RDD, and then send to flume client. end to end worked fine with simple message. problem is with complex message. - Thanks & Regards, Mohan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14565.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.
Added some more info on this issue in the tracker Spark-3447 https://issues.apache.org/jira/browse/SPARK-3447 - Thanks & Regards, Mohan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14557.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Kryo fails with avro having Arrays and unions, but succeeds with simple avro.
*I am facing similar issue to Spark-3447 with spark streaming Api, Kryo Serializer, Avro messages. If avro message is simple, its fine. but if the avro message has Union/Arrays its failing with the exception Below:* ERROR scheduler.JobScheduler: Error running job streaming job 1411043845000 ms.0 org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException Serialization trace: value (com.globallogic.goliath.model.Datum) data (com.globallogic.goliath.model.ResourceMessage) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) *Above exception shows up when used output operations.* *below is the avro message. {"version": "01", "sequence": "1", "resource": "sensor-001", "controller": "002", "controllerTimestamp": "1411038710358", "data": {"value": [{"name": "Temperature", "value": "30"}, {"name": "Speed", "value": "60"}, {"name": "Location", "value": ["+401213.1", "-0750015.1"]}, {"name": "Timestamp", "value": "2014-09-09T08:15:25-05:00"}]}} message is been successfully decoded in decoder, but throws exception for output operation.* - Thanks & Regards, Mohan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org