[jira] [Commented] (SAMZA-484) Define the serialization/deserialization format for stream tuple

Chris Riccomini (JIRA) Wed, 14 Jan 2015 15:04:12 -0800

    [ 
https://issues.apache.org/jira/browse/SAMZA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277853#comment-14277853
 ]


Chris Riccomini commented on SAMZA-484:
---------------------------------------

I think there are three basic functions being discussed here (and in 
SAMZA-429), which can be summarized by three different method calls:

# envelope.getMessage
# envelope.getGenericMessage
# envelope.getGenericMessageSchema

I like [~jkreps]' comment on SAMZA-429:

bq. One intermediate position would be to leave the framework pluggable ... but 
provide a good implementation of one serialization type that could be kind of 
recommended for those who don't have a preference.

Although not exactly the same, what I propose is that we leave the 
IncomingMessageEnvelope API as it is--just support (1)--and define a Tuple 
interface which as a generic getField (2) method and a getSchema (3) method. 
This should cover all of our use cases. We can then implement a 
GenericRecordTuple, ProtobufTuple, JsonTuple, etc.

If the Tuple interface is proves useful, I could imagine where it becomes used 
inside raw StreamTasks, not just in the SQL-layer. This would naturally lead to 
utility methods and libraries for tasks that use Tuples, but would still leave 
things pluggable for those that don't.

A pseudo-code example joiner class that uses the Tuple interface:

{code}
def init(...) {
  joiner = new Joiner(key="member_id")
}

def process(envelope: IncomingMessageEnvelope, ...) {
  val tuple = new AvroTuple((GenericRecord) envelope.getMessage());
  val joinedTuple = joiner.join(envelope.getSystemStreamPartition.getStream, 
tuple)
  if(joinedTuple != null) {
    collector.send((GenericRecord) joinedTuple.getRawObject())
  }
}
{code}

We could then use Tuple for SQL as well. Again, this wouldn't require any API 
changes to Samza, which seems like a good sign.

> Define the serialization/deserialization format for stream tuple
> ----------------------------------------------------------------
>
>                 Key: SAMZA-484
>                 URL: https://issues.apache.org/jira/browse/SAMZA-484
>             Project: Samza
>          Issue Type: Sub-task
>          Components: sql
>            Reporter: Yi Pan (Data Infrastructure)
>            Priority: Minor
>              Labels: project
>
> It came out in the discussion for streaming SQL that we will need to define 
> the serialization/deserialization format for stream tuple.
> The ideal serialization/deserialization format should allow both forward and 
> backward compatibility on additional/missing fields in the data.
> Several choices to be considered:
> 1) Avro
> 2) Protobuf
> 3) Flatbuffer
> It might also be interesting to consider a pluggable serialization interface 
> that allows different serialization methods for different Samza jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (SAMZA-484) Define the serialization/deserialization format for stream tuple

Reply via email to