[ https://issues.apache.org/jira/browse/SPARK-17829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595716#comment-15595716 ]
Michael Armbrust commented on SPARK-17829:
------------------------------------------

Yeah, I agree. I think it could be really simple. We can have one that just holds JSON, and sources can convert to something more specific internally.

{code}
abstract class Offset { def json: String }

/** Used when loading an offset back from the log. */
case class SerializedOffset(json: String) extends Offset

/** Used to convert to a more specific type. */
object LongOffset {
  def apply(serialized: Offset) = LongOffset(parse(serialized.json).as[Long])
}

case class LongOffset(value: Long) extends Offset {
  def json = value.toString
}
{code}

> Stable format for offset log
> ----------------------------
>
>                 Key: SPARK-17829
>                 URL: https://issues.apache.org/jira/browse/SPARK-17829
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Tyson Condie
>
> Currently we use Java serialization for the WAL that stores the offsets contained in each batch. This has two main issues:
> - It can break across Spark releases (though this is not the only thing preventing us from upgrading a running query).
> - It is unnecessarily opaque to the user.
> I'd propose we require offsets to provide a user-readable serialization and use that instead. JSON is probably a good option.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
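A self-contained sketch of how the proposed API might round-trip through the offset log. This is an illustration, not the committed Spark implementation: the class names come from the comment above, but the JSON parsing is simplified to `toLong` (the comment's `parse(...).as[Long]` implies a JSON library that is not specified here).

{code}
// Sketch of the proposed JSON-based offset API (names from the comment above).
abstract class Offset { def json: String }

// Raw form read back from the offset log; a source converts it as needed.
case class SerializedOffset(json: String) extends Offset

// Source-specific offset; its json is just the decimal string of the value.
case class LongOffset(value: Long) extends Offset {
  def json: String = value.toString
}

object LongOffset {
  // Convert any Offset (typically a SerializedOffset loaded from the WAL)
  // into the source-specific type. Assumes the json is a bare long literal,
  // so plain toLong stands in for a real JSON parser.
  def apply(serialized: Offset): LongOffset = LongOffset(serialized.json.toLong)
}

// Round trip: write the human-readable form, read it back, recover the value.
val written  = LongOffset(42L).json
val restored = LongOffset(SerializedOffset(written))
assert(restored == LongOffset(42L))
{code}

Because the on-disk form is plain text, a user can inspect or even hand-edit the offset log, and a newer Spark release can re-parse offsets written by an older one, which is the portability goal of this issue.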