[ https://issues.apache.org/jira/browse/SPARK-17829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609499#comment-15609499 ]
Tyson Condie commented on SPARK-17829:
--------------------------------------

While adopting JSON serialization as the standard for all metadata logs, TD raised a question about how far the change should extend, particularly with respect to CompactibleFileStreamLog. I would like to propose the following change items:

1. FileEntry and SinkFileStatus classes should provide JSON serialization routines.
2. Define a JsonSerializable trait with a 'def json: String' method. A trait seems a better fit here than an abstract class.
3. Impose an upper bound on the generic type in MetadataLog, i.e. 'trait MetadataLog[T <: JsonSerializable]'.
4. Define a class that can contain a sequence of MetadataLog entries and is itself JsonSerializable, i.e. 'class MetadataLogSeq[T](entries: Seq[T]) extends JsonSerializable'.
5. Revise CompactibleFileStreamLog to use MetadataLogSeq instead of Array.

> Stable format for offset log
> ----------------------------
>
>                 Key: SPARK-17829
>                 URL: https://issues.apache.org/jira/browse/SPARK-17829
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Tyson Condie
>
> Currently we use Java serialization for the WAL that stores the offsets contained in each batch. This has two main issues:
> - It can break across Spark releases (though this is not the only thing preventing us from upgrading a running query)
> - It is unnecessarily opaque to the user.
> I'd propose we require offsets to provide a user-readable serialization and use that instead. JSON is probably a good option.
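
For illustration only, the five change items in the comment could be sketched in Scala roughly as follows. This is not the code that was committed to Spark. The fields of SinkFileStatus, the add/get methods on MetadataLog, the JsonUtil helper, and InMemoryMetadataLog are simplified, hypothetical stand-ins, and the JSON is built by hand to keep the sketch dependency-free. The sketch also assumes MetadataLogSeq needs the same 'T <: JsonSerializable' bound as MetadataLog so that it can call 'json' on its entries.

```scala
// Hypothetical helper: minimal escaping of backslashes and quotes only.
object JsonUtil {
  def escape(s: String): String = s.replace("\\", "\\\\").replace("\"", "\\\"")
}

// Item 2: anything written to a metadata log can render itself as JSON.
trait JsonSerializable {
  def json: String
}

// Item 3: the log only accepts JSON-serializable entries.
// The add/get methods are a simplified assumption, not Spark's actual API.
trait MetadataLog[T <: JsonSerializable] {
  def add(batchId: Long, metadata: T): Boolean
  def get(batchId: Long): Option[T]
}

// Item 1: a simplified stand-in for SinkFileStatus (fields are hypothetical).
case class SinkFileStatus(path: String, size: Long) extends JsonSerializable {
  def json: String = s"""{"path":"${JsonUtil.escape(path)}","size":$size}"""
}

// Item 4: a sequence of entries that is itself JSON-serializable.
// The bound on T is an assumption added so that entries expose `json`.
class MetadataLogSeq[T <: JsonSerializable](val entries: Seq[T]) extends JsonSerializable {
  def json: String = entries.map(_.json).mkString("[", ",", "]")
}

// Item 5, in spirit: a log whose per-batch value is a MetadataLogSeq rather
// than an Array. Kept in memory here purely for demonstration.
class InMemoryMetadataLog[T <: JsonSerializable] extends MetadataLog[T] {
  private val batches = scala.collection.mutable.Map.empty[Long, T]

  // Returns false if the batch was already written, mirroring write-once semantics.
  def add(batchId: Long, metadata: T): Boolean =
    if (batches.contains(batchId)) false
    else { batches(batchId) = metadata; true }

  def get(batchId: Long): Option[T] = batches.get(batchId)
}

object MetadataLogSketch {
  def main(args: Array[String]): Unit = {
    val log = new InMemoryMetadataLog[MetadataLogSeq[SinkFileStatus]]
    val batch = new MetadataLogSeq(Seq(
      SinkFileStatus("/out/part-0", 10L),
      SinkFileStatus("/out/part-1", 20L)))
    log.add(0L, batch)
    // Print the user-readable form of batch 0.
    log.get(0L).foreach(b => println(b.json))
  }
}
```

With the bound in place, a log of plain arrays no longer type-checks, because Array does not extend JsonSerializable. That is the reason item 5 replaces Array with MetadataLogSeq.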