[ 
https://issues.apache.org/jira/browse/SPARK-17829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610399#comment-15610399
 ] 

Cody Koeninger commented on SPARK-17829:
----------------------------------------

I'm not telling you to do it that way, just asking whether you had considered it.
The general advantages of typeclasses are separation of concerns (should all of
these classes need to know about JSON?) and getting inductive definitions for free
(if you have a serializer for a container, you have a serializer for that container
holding any serializable type, however deeply nested). If everything you're looking
at modifying already knows about Java serialization, though, it may not be a big deal.
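A minimal sketch of the typeclass approach, in Scala since that is what Spark is written in. Everything here is hypothetical: `JsonWriter`, `TopicOffsets` and the instances are illustrative names, not Spark APIs, and the JSON rendering is hand-rolled (no JSON library) to keep the example self-contained. The offset class itself contains no JSON code, and the `Seq` and `Map` rules compose to cover nested types.

```scala
// Hypothetical typeclass (not a Spark API): knows how to render an A as JSON text.
trait JsonWriter[A] {
  def write(a: A): String
}

object JsonWriter {
  // Entry point: the compiler supplies the matching writer for A.
  def toJson[A](a: A)(implicit w: JsonWriter[A]): String = w.write(a)

  implicit val longWriter: JsonWriter[Long] = new JsonWriter[Long] {
    def write(a: Long): String = a.toString
  }

  // Minimal escaping of quotes and backslashes only; control characters are not handled.
  implicit val stringWriter: JsonWriter[String] = new JsonWriter[String] {
    def write(a: String): String =
      "\"" + a.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
  }

  // Inductive rule: a writer for A yields a writer for Seq[A] with no extra code.
  implicit def seqWriter[A](implicit w: JsonWriter[A]): JsonWriter[Seq[A]] =
    new JsonWriter[Seq[A]] {
      def write(as: Seq[A]): String = as.map(w.write).mkString("[", ",", "]")
    }

  // Inductive rule for string-keyed maps; keys are sorted so the output is stable.
  implicit def mapWriter[V](implicit w: JsonWriter[V]): JsonWriter[Map[String, V]] =
    new JsonWriter[Map[String, V]] {
      def write(m: Map[String, V]): String =
        m.toSeq.sortBy(_._1)
          .map { case (k, v) => stringWriter.write(k) + ":" + w.write(v) }
          .mkString("{", ",", "}")
    }
}

// Hypothetical offset class: it carries data only and knows nothing about JSON.
final case class TopicOffsets(offsets: Map[String, Long])

object TopicOffsets {
  // The JSON concern lives here, outside the class, and reuses the Map rule.
  implicit val writer: JsonWriter[TopicOffsets] = new JsonWriter[TopicOffsets] {
    def write(t: TopicOffsets): String = JsonWriter.toJson(t.offsets)
  }
}
```

For example, `JsonWriter.toJson(Seq(TopicOffsets(Map("t-0" -> 5L))))` returns `[{"t-0":5}]` even though no writer for `Seq[TopicOffsets]` was written by hand; the compiler assembles it from `seqWriter` and `TopicOffsets.writer`.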

Specifically, about using a Seq instead of an Array for the compactible file
stream: isn't there an existing warning in the code explaining why it uses an
Array, due to pathological behavior with large linked lists?

> Stable format for offset log
> ----------------------------
>
>                 Key: SPARK-17829
>                 URL: https://issues.apache.org/jira/browse/SPARK-17829
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Tyson Condie
>
> Currently we use Java serialization for the WAL that stores the offsets
> contained in each batch. This has two main issues:
>  - It can break across Spark releases (though this is not the only thing
> preventing us from upgrading a running query).
>  - It is unnecessarily opaque to the user.
> I'd propose we require offsets to provide a user-readable serialization and
> use that instead. JSON is probably a good option.
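One way to read the proposal, sketched in Scala under stated assumptions: each offset exposes its own JSON string, and the log stores those strings as plain text that a user can open and read. `SketchOffset`, `LongOffsetSketch` and `OffsetLogSketch` are hypothetical names, not Spark's API, and the leading version line is an assumed addition meant to make cross-release changes detectable rather than silently breaking.

```scala
// Hypothetical base type (name and shape are assumptions, not Spark's API):
// every offset must provide a user-readable, single-line JSON rendering.
abstract class SketchOffset {
  def json: String
}

// Hypothetical offset for a source whose position is a single number.
final case class LongOffsetSketch(value: Long) extends SketchOffset {
  def json: String = value.toString
}

// Hypothetical log entry format: a version line, then one offset's JSON per line.
object OffsetLogSketch {
  val Version = "v1"

  def serialize(offsets: Seq[SketchOffset]): String =
    (Version +: offsets.map(_.json)).mkString("\n")

  // Reads back an entry whose offsets are all LongOffsetSketch values;
  // an unknown version is rejected instead of being misread.
  def parseLongOffsets(text: String): Seq[LongOffsetSketch] = {
    val lines = text.split("\n").toSeq
    require(lines.headOption.contains(Version), s"unsupported log version: ${lines.headOption}")
    lines.tail.map(line => LongOffsetSketch(line.trim.toLong))
  }
}
```

Here `OffsetLogSketch.serialize(Seq(LongOffsetSketch(3L), LongOffsetSketch(7L)))` produces the text `v1`, `3`, `7` on three lines, which is readable with any text tool. Unlike the typeclass approach, this puts the JSON responsibility on each offset class itself.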



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
