[ 
https://issues.apache.org/jira/browse/SPARK-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597973#comment-14597973
 ] 

Joseph Batchik commented on SPARK-746:
--------------------------------------

Spark can currently serialize all three types of Avro records if the user 
specifies Kryo. Specific and Reflect records serialize just fine, as long as 
the user registers them ahead of time, since Kryo handles ordinary classes 
efficiently. The problem lies in generic records, which Kryo cannot serialize 
without a large amount of overhead. This causes issues for users who want to 
use Avro records during a shuffle. To alleviate this, I implemented a custom 
Kryo serializer for generic records that tries to reduce the amount of network 
IO.

https://github.com/JDrit/spark/commit/6f1106bc20eb670e963d45a191dfc4517d46543b

This works by sending a compressed form of the schema with each message, rather 
than having Kryo serialize the in-memory representation itself. Since the same 
schema is going to be sent numerous times, it caches previously seen values so 
as to reduce the computation needed. It also allows users to register their 
schemas ahead of time, in which case it sends only the schema's unique ID with 
each message instead of the entire schema.
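
A rough sketch of the idea described above, for illustration only. It is not 
the code from the linked commit and does not use Kryo or Avro: the class name 
SchemaHeaderCodec, the one-byte tag layout, and the use of a truncated SHA-256 
as the schema ID are all assumptions made for this example, and schemas are 
treated as plain JSON strings.

```python
import hashlib
import struct
import zlib


class SchemaHeaderCodec:
    """Hypothetical helper that writes and reads the schema part of a message."""

    def __init__(self, registered_schemas=()):
        # Schemas registered ahead of time are known to both sides by ID.
        self._id_by_schema = {s: self._fingerprint(s) for s in registered_schemas}
        self._schema_by_id = {fp: s for s, fp in self._id_by_schema.items()}
        # Caches so each distinct schema is compressed or decompressed only once.
        self._compressed = {}
        self._decompressed = {}

    @staticmethod
    def _fingerprint(schema_json):
        # Assumption: 8 bytes of SHA-256 stand in for a real schema fingerprint.
        return hashlib.sha256(schema_json.encode("utf-8")).digest()[:8]

    def encode(self, schema_json):
        """Return the header bytes that precede the record payload."""
        fp = self._id_by_schema.get(schema_json)
        if fp is not None:
            # Registered schema: tag byte 1 followed by the 8-byte ID only.
            return b"\x01" + fp
        blob = self._compressed.get(schema_json)
        if blob is None:
            blob = zlib.compress(schema_json.encode("utf-8"))
            self._compressed[schema_json] = blob
        # Unregistered schema: tag byte 0, 4-byte length, compressed schema.
        return b"\x00" + struct.pack(">I", len(blob)) + blob

    def decode(self, data):
        """Return (schema_json, number of header bytes consumed)."""
        if data[0:1] == b"\x01":
            return self._schema_by_id[data[1:9]], 9
        (length,) = struct.unpack(">I", data[1:5])
        blob = bytes(data[5:5 + length])
        schema = self._decompressed.get(blob)
        if schema is None:
            schema = zlib.decompress(blob).decode("utf-8")
            self._decompressed[blob] = schema
        return schema, 5 + length
```

With a registered schema the header is a fixed 9 bytes regardless of schema 
size; otherwise the compressed schema travels with every message, but the 
compression and decompression work is done once per distinct schema.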

Could I get some feedback on this approach? Please let me know if I am missing 
anything important.

> Automatically Use Avro Serialization for Avro Objects
> -----------------------------------------------------
>
>                 Key: SPARK-746
>                 URL: https://issues.apache.org/jira/browse/SPARK-746
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Cogan
>
> All generated objects extend org.apache.avro.specific.SpecificRecordBase (or 
> there may be a higher up class as well).
> Since Avro records aren't Java-serializable by default, people currently 
> have to wrap their records. It would be good if we could use an implicit 
> conversion to do this for them.



