Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.

2014-09-19 Thread mohan.gadm
Thanks for the info, Frank.
Twitter's chill-avro serializer looks great.
But how does Spark identify it as a serializer, given that it doesn't extend
KryoSerializer?
(Sorry, Scala is an alien language to me.)
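For what it's worth, here is a minimal sketch of how chill-avro is typically wired in (the registrator class name is hypothetical, and the exact chill-avro entry point should be checked against the chill-avro version in use). The key point: Spark keeps org.apache.spark.serializer.KryoSerializer as spark.serializer, and chill-avro's serializer is plugged in per class through a custom KryoRegistrator, so it never needs to extend KryoSerializer itself:

```scala
import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.avro.AvroSerializer
import org.apache.spark.serializer.KryoRegistrator

// Spark still uses KryoSerializer as spark.serializer; this registrator only
// tells Kryo to delegate (de)serialization of the Avro-generated classes
// (ResourceMessage, Datum, from the schema in this thread) to chill-avro.
class AvroKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[ResourceMessage],
      AvroSerializer.SpecificRecordSerializer[ResourceMessage])
    kryo.register(classOf[Datum],
      AvroSerializer.SpecificRecordSerializer[Datum])
  }
}
```

The registrator is then named in spark.kryo.registrator, exactly as with any other Kryo registration.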



-
Thanks & Regards,
Mohan
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14649.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.

2014-09-18 Thread mohan.gadm
Thanks for the info, Frank.
So your suggestion is to use the Avro serializer.
Do I just configure it like Kryo, via the same property? And is there any
registration step for it, or do I just specify the serializer?
Also, does it affect performance, and what measures should be taken to avoid
that? (I am using Kryo precisely because of its high performance and small
serialized size compared to Avro.)
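In case it helps, a sketch of the configuration side (the registrator class name com.example.AvroKryoRegistrator is a placeholder; the property names are the standard Spark ones): an Avro-backed Kryo serializer rides on the same two Kryo properties, so no new property is needed:

```scala
import org.apache.spark.SparkConf

// Same two properties as for plain Kryo; the Avro-specific behaviour comes
// from whatever the registrator registers, not from a different serializer.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.example.AvroKryoRegistrator")
  // Optional: fail fast when a class was forgotten in the registrator,
  // instead of silently falling back to Kryo's default field serializer.
  .set("spark.kryo.registrationRequired", "true")
```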



-
Thanks & Regards,
Mohan
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14567.html



Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.

2014-09-18 Thread mohan.gadm
Hi Frank, thanks for the info, that's great. But I'm not saying the Avro
serializer is failing; Kryo is failing.
I'm using the Kryo serializer and registering the Avro-generated classes
with Kryo:
sparkConf.set("spark.serializer",
"org.apache.spark.serializer.KryoSerializer");
sparkConf.set("spark.kryo.registrator",
"com.globallogic.goliath.platform.PlatformKryoRegistrator");

But how was it able to perform the output operation when the message is
simple, yet not when the message is complex? (Please note there are no Avro
schema changes; only the data changed.)
More info below.

avro schema:
=
record KeyValueObject {
union{boolean, int, long, float, double, bytes, string} name;
union {boolean, int, long, float, double, bytes, string,
array, KeyValueObject} value;
}
record Datum {
union {boolean, int, long, float, double, bytes, string,
array, KeyValueObject} value;
}
record ResourceMessage {
string version;
string sequence;
string resourceGUID;
string GWID;
string GWTimestamp;
union {Datum, array} data;
}

simple message is as below:
===
{"version": "01", "sequence": "1", "resourceGUID": "001", "GWID": "002",
"GWTimestamp": "1409823150737", "data": {"value": "30"}}

complex message is as below:
===
{"version": "01", "sequence": "1", "resource": "sensor-001",
"controller": "002", "controllerTimestamp": "1411038710358", "data":
{"value": [{"name": "Temperature", "value": "30"}, {"name": "Speed",
"value": "60"}, {"name": "Location", "value": ["+401213.1", "-0750015.1"]},
{"name": "Timestamp", "value": "2014-09-09T08:15:25-05:00"}]}}


Both messages fit the schema.

The messages come from Kafka as Avro binary.
In Spark, the messages are converted to Avro objects (ResourceMessage) using
decoders (this also works).
I am able to perform some mappings and to convert the stream of decoded bytes
into a stream of ResourceMessage objects.

Now the events need to be pushed to a Flume source. For this I need to
collect the RDD and then send it to the Flume client.

End to end worked fine with the simple message; the problem is with the
complex message.
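As a sketch of the failing step (the stream, client, and helper names below are placeholders, not from the original code): collecting the batch on the driver is what forces Kryo to serialize the task results, which is where the SPARK-3447 exception surfaces for the complex records:

```scala
// Hypothetical names: resourceStream is the decoded DStream[ResourceMessage];
// flumeClient and toFlumeEvent stand in for the actual Flume RPC plumbing.
resourceStream.foreachRDD { rdd =>
  // collect() ships the ResourceMessage task results back to the driver.
  // Kryo serializes them at this point, and this is where the
  // NullPointerException appears for messages containing unions/arrays.
  val messages = rdd.collect()
  messages.foreach(msg => flumeClient.append(toFlumeEvent(msg)))
}
```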




-
Thanks & Regards,
Mohan
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14565.html



Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.

2014-09-18 Thread mohan.gadm
Added some more info on this issue in the tracker, SPARK-3447:
https://issues.apache.org/jira/browse/SPARK-3447



-
Thanks & Regards,
Mohan
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14557.html



Kryo fails with avro having Arrays and unions, but succeeds with simple avro.

2014-09-18 Thread mohan.gadm
*I am facing an issue similar to SPARK-3447 with the Spark Streaming API, the
Kryo serializer, and Avro messages.
If the Avro message is simple, it is fine; but if the Avro message has
unions/arrays, it fails with the exception below:*
ERROR scheduler.JobScheduler: Error running job streaming job 1411043845000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
value (com.globallogic.goliath.model.Datum)
data (com.globallogic.goliath.model.ResourceMessage)
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

*The above exception shows up when output operations are used.*

*Below is the Avro message:*
{"version": "01", "sequence": "1", "resource": "sensor-001",
"controller": "002", "controllerTimestamp": "1411038710358", "data":
{"value": [{"name": "Temperature", "value": "30"}, {"name": "Speed",
"value": "60"}, {"name": "Location", "value": ["+401213.1", "-0750015.1"]},
{"name": "Timestamp", "value": "2014-09-09T08:15:25-05:00"}]}}

*The message is decoded successfully in the decoder, but the exception is
thrown on the output operation.*



-
Thanks & Regards,
Mohan
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549.html