[
https://issues.apache.org/jira/browse/SAMZA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277853#comment-14277853
]
Chris Riccomini commented on SAMZA-484:
---------------------------------------
I think there are three basic functions being discussed here (and in
SAMZA-429), which can be summarized by three different method calls:
# envelope.getMessage
# envelope.getGenericMessage
# envelope.getGenericMessageSchema
I like [~jkreps]' comment on SAMZA-429:
bq. One intermediate position would be to leave the framework pluggable ... but
provide a good implementation of one serialization type that could be kind of
recommended for those who don't have a preference.
Although not exactly the same, what I propose is that we leave the
IncomingMessageEnvelope API as it is--just support (1)--and define a Tuple
interface which has a generic getField (2) method and a getSchema (3) method.
This should cover all of our use cases. We can then implement a
GenericRecordTuple, ProtobufTuple, JsonTuple, etc.
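A minimal sketch of what such a Tuple interface could look like, written in Java because Samza's API is Java. Nothing here exists in Samza: the Tuple method signatures, the simplified schema representation (field name to Java class), and the MapTuple class (a stand-in for something like JsonTuple that avoids a JSON or Avro dependency) are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed Tuple abstraction; not a Samza API.
interface Tuple {
    // (2) generic, format-independent field access
    Object getField(String name);

    // (3) schema access, simplified here to field name -> Java type
    Map<String, Class<?>> getSchema();

    // The underlying message, i.e. what envelope.getMessage() returned (1)
    Object getRawObject();
}

// Hypothetical stand-in for a JsonTuple: wraps an already-parsed object.
final class MapTuple implements Tuple {
    private final Map<String, Object> fields;

    MapTuple(Map<String, Object> fields) {
        this.fields = fields;
    }

    @Override
    public Object getField(String name) {
        return fields.get(name);
    }

    @Override
    public Map<String, Class<?>> getSchema() {
        Map<String, Class<?>> schema = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            // A null value carries no type information, so report Object.
            schema.put(e.getKey(),
                e.getValue() == null ? Object.class : e.getValue().getClass());
        }
        return Collections.unmodifiableMap(schema);
    }

    @Override
    public Object getRawObject() {
        return fields;
    }
}
```

An Avro- or Protobuf-backed implementation would presumably delegate getField and getSchema to the wrapped record in the same way, while getRawObject hands the original message back for sending.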
If the Tuple interface proves useful, I could imagine it being used inside raw
StreamTasks, not just in the SQL layer. This would naturally lead to
utility methods and libraries for tasks that use Tuples, but would still leave
things pluggable for those that don't.
A pseudo-code example joiner class that uses the Tuple interface:
{code}
def init(...) {
  joiner = new Joiner(key="member_id")
}

def process(envelope: IncomingMessageEnvelope, ...) {
  val tuple = new AvroTuple((GenericRecord) envelope.getMessage());
  val joinedTuple = joiner.join(envelope.getSystemStreamPartition.getStream, tuple)
  if (joinedTuple != null) {
    collector.send((GenericRecord) joinedTuple.getRawObject())
  }
}
{code}
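To make the join call in the pseudo-code more concrete, here is one way such a Joiner might work, sketched in Java under several assumptions: the Joiner class is hypothetical, it keeps only the latest tuple per key and stream in memory, and it returns the matched pair as a map keyed by stream name rather than a merged Tuple. The Tuple interface is reduced to the single method the joiner needs so that the snippet compiles on its own.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Reduced, hypothetical form of the proposed Tuple interface.
interface Tuple {
    Object getField(String name);
}

// Hypothetical in-memory joiner; not part of Samza.
final class Joiner {
    private final String key;
    // stream name -> (join key value -> latest tuple seen on that stream)
    private final Map<String, Map<Object, Tuple>> buffers = new HashMap<>();

    Joiner(String key) {
        this.key = key;
    }

    // Buffers the tuple, then looks for a tuple with the same key value on
    // any other stream. Returns the matched pair keyed by stream name, or
    // null when no other stream has seen this key yet.
    Map<String, Tuple> join(String stream, Tuple tuple) {
        Object k = tuple.getField(key);
        buffers.computeIfAbsent(stream, s -> new HashMap<>()).put(k, tuple);
        for (Map.Entry<String, Map<Object, Tuple>> e : buffers.entrySet()) {
            if (e.getKey().equals(stream)) {
                continue; // never join a stream with itself
            }
            Tuple match = e.getValue().get(k);
            if (match != null) {
                Map<String, Tuple> joined = new LinkedHashMap<>();
                joined.put(stream, tuple);
                joined.put(e.getKey(), match);
                return joined;
            }
        }
        return null;
    }
}
```

Because the joiner only calls getField, it works unchanged for any Tuple implementation, whatever the underlying serialization format. The buffers here are unbounded and not fault tolerant; a real task would need some form of durable, bounded state instead.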
We could then use Tuple for SQL as well. Again, this wouldn't require any API
changes to Samza, which seems like a good sign.
> Define the serialization/deserialization format for stream tuple
> ----------------------------------------------------------------
>
> Key: SAMZA-484
> URL: https://issues.apache.org/jira/browse/SAMZA-484
> Project: Samza
> Issue Type: Sub-task
> Components: sql
> Reporter: Yi Pan (Data Infrastructure)
> Priority: Minor
> Labels: project
>
> It came out in the discussion for streaming SQL that we will need to define
> the serialization/deserialization format for stream tuple.
> The ideal serialization/deserialization format should allow both forward and
> backward compatibility on additional/missing fields in the data.
> Several choices to be considered:
> 1) Avro
> 2) Protobuf
> 3) FlatBuffers
> It might also be interesting to consider a pluggable serialization interface
> that allows different serialization methods for different Samza jobs.