[ https://issues.apache.org/jira/browse/SPARK-17829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609499#comment-15609499 ]

Tyson Condie commented on SPARK-17829:
--------------------------------------

In the process of adopting a complete JSON serialization standard for metadata 
logging, TD raised a question about the scope of such a change, particularly 
as it pertains to CompactibleFileStreamLog. I would like to propose the 
following change items:
1. FileEntry and SinkFileStatus classes should provide JSON serialization 
routines.
2. Define a JsonSerializable trait with a 'def json: String' method. This is 
better suited to a trait than to an abstract class.
3. Impose an upper bound on the generic type in MetadataLog, i.e. 'trait 
MetadataLog[T <: JsonSerializable]'.
4. Define a JsonSerializable class that holds a sequence of MetadataLog 
entries, i.e. 'class MetadataLogSeq[T](entries: Seq[T]) extends 
JsonSerializable'.
5. Revise CompactibleFileStreamLog to use MetadataLogSeq instead of Array. 
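
A minimal Scala sketch of items 1 to 4 follows, assuming a hand-rolled JSON 
encoder so that it needs no libraries. The SinkFileStatus fields, the JsonUtil 
helper, the InMemoryMetadataLog class, and the one-method MetadataLog are 
hypothetical illustrations, not Spark's actual code. MetadataLogSeq is also 
given the same upper bound as MetadataLog (an assumption), because its json 
method must call json on each entry.

```scala
// Hypothetical sketch of change items 1-4; not Spark's actual implementation.
import scala.collection.mutable

/** Item 2: anything stored in a metadata log renders itself as JSON. */
trait JsonSerializable {
  def json: String
}

/** Minimal hand-rolled JSON string quoting (assumed helper, no JSON library). */
object JsonUtil {
  def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append("\\u%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"').toString
  }
}

/** Item 1 (simplified): a sink file entry with assumed fields. */
case class SinkFileStatus(path: String, size: Long, action: String)
    extends JsonSerializable {
  def json: String =
    "{\"path\":" + JsonUtil.quote(path) +
      ",\"size\":" + size +
      ",\"action\":" + JsonUtil.quote(action) + "}"
}

/** Item 3: the log's type parameter is bounded by JsonSerializable. */
trait MetadataLog[T <: JsonSerializable] {
  def add(batchId: Long, metadata: T): Boolean
}

/** Item 4: a JSON-serializable sequence of entries, encoded as a JSON array. */
class MetadataLogSeq[T <: JsonSerializable](val entries: Seq[T])
    extends JsonSerializable {
  def json: String = entries.map(_.json).mkString("[", ",", "]")
}

/** Toy in-memory log (hypothetical) that keeps only each batch's JSON text. */
class InMemoryMetadataLog[T <: JsonSerializable] extends MetadataLog[T] {
  private val batches = mutable.LinkedHashMap.empty[Long, String]

  /** Returns false if the batch was already written, true otherwise. */
  def add(batchId: Long, metadata: T): Boolean =
    if (batches.contains(batchId)) false
    else { batches(batchId) = metadata.json; true }

  def rawJson(batchId: Long): Option[String] = batches.get(batchId)
}
```

For example, adding a MetadataLogSeq that holds the single entry 
SinkFileStatus("/out/part-0", 1024L, "add") as batch 0 stores the text 
[{"path":"/out/part-0","size":1024,"action":"add"}], which a user can read 
directly, unlike a Java-serialized Array.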


> Stable format for offset log
> ----------------------------
>
>                 Key: SPARK-17829
>                 URL: https://issues.apache.org/jira/browse/SPARK-17829
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Tyson Condie
>
> Currently we use Java serialization for the WAL that stores the offsets 
> contained in each batch. This has two main issues:
>  - It can break across Spark releases (though this is not the only thing 
> preventing us from upgrading a running query).
>  - It is unnecessarily opaque to the user.
> I'd propose we require offsets to provide a user-readable serialization and 
> use that instead. JSON is probably a good option.
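
The request quoted above, that each offset provide its own user-readable 
serialization, can be sketched in Scala as follows. This is a minimal 
illustration under stated assumptions: OffsetJson, LongOffset, 
PartitionOffsets, BatchOffsets, and OffsetParsing are hypothetical names, not 
Spark's actual classes, and the JSON is built by hand to stay dependency-free.

```scala
// Hypothetical sketch: offsets supply their own human-readable JSON form,
// replacing Java serialization in the offset write-ahead log.

/** Every offset must be able to describe itself as JSON text. */
trait OffsetJson {
  def json: String
}

/** A single-counter offset; its JSON form is just the number, e.g. 42. */
case class LongOffset(offset: Long) extends OffsetJson {
  def json: String = offset.toString
}

/** Offsets per partition, encoded as {"partition":offset,...} (assumed shape). */
case class PartitionOffsets(byPartition: Map[Int, Long]) extends OffsetJson {
  def json: String =
    byPartition.toSeq
      .sortBy(_._1) // fixed key order keeps the log stable and diff-friendly
      .map { case (p, o) => "\"" + p + "\":" + o }
      .mkString("{", ",", "}")
}

/** One WAL entry: the offset of every source in a batch (None = no data yet). */
case class BatchOffsets(offsets: Seq[Option[OffsetJson]]) {
  def json: String =
    offsets.map(_.map(_.json).getOrElse("null")).mkString("[", ",", "]")
}

/** Reading an offset back only requires parsing the text it wrote. */
object OffsetParsing {
  def parseLongOffset(json: String): LongOffset = LongOffset(json.trim.toLong)
}
```

For instance, a batch whose three sources are at LongOffset(42), at no data 
yet, and at offsets 5 and 7 for partitions 0 and 1 would be logged as 
[42,null,{"0":5,"1":7}], text a user can inspect and that does not depend on 
Java serialization compatibility between Spark releases.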



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
