[ 
https://issues.apache.org/jira/browse/SAMZA-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170034#comment-14170034
 ] 

Jonathan Herriott commented on SAMZA-429:
-----------------------------------------

I'm assuming that (1) is what I was proposing, and that (2) is getting raw 
library objects (such as the HashMap<String, Object> returned by JSONSerde).

Having envelope.getMessage() return Object doesn't explicitly specify either 
one, but implicitly it does.  Realistically, given two Serde developers who 
aren't communicating, each will return a different subtype of Object that is 
incompatible with the other's, and we end up with an ecosystem where no two 
Serdes are compatible in what they return from deserialization.  That leaves 
it up to the Task itself to either handle multiple protocol libraries or 
accept only one.  The latter is the "easiest" to implement, so that is what 
people will do.  In terms of an ecosystem, it will be much easier to publish 
Samza jobs on GitHub or wherever for re-use *IF* I don't have to worry about 
which protocol is being used and can just assume (1).  At least, that's how I 
envision the direction of the Samza ecosystem: people can just trade their 
Samza Tasks.  Otherwise, lots of people end up writing the exact same thing 
for different protocols, and the ecosystem becomes fragmented by Serde.

So yes, I do agree that it doesn't explicitly enforce either.  However, based 
on my experience with human behavior, I think it implicitly does.
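
To make (1) a bit more concrete, here is a rough sketch of the kind of common 
interface I have in mind.  Everything in it is hypothetical and not part of 
Samza today: the Record interface, the MapRecord adapter, and the describe() 
helper are names I made up for illustration.  It assumes a JSON-style Serde 
that hands back a Map<String, Object>; an Avro-backed adapter would wrap a 
GenericRecord in the same way.

```java
import java.util.HashMap;
import java.util.Map;

public class RecordSketch {

    /** Hypothetical protocol-neutral view of a deserialized message. */
    interface Record {
        /** Returns the raw value of a field, or null if it is absent. */
        Object get(String field);

        /** Returns a nested hierarchical field as a Record, or null if absent. */
        Record getRecord(String field);

        /** Returns true if the field is present. */
        boolean has(String field);
    }

    /** Hypothetical adapter over the Map that a JSON-style Serde produces. */
    static final class MapRecord implements Record {
        private final Map<String, Object> fields;

        MapRecord(Map<String, Object> fields) {
            this.fields = fields;
        }

        @Override
        public Object get(String field) {
            return fields.get(field);
        }

        @Override
        @SuppressWarnings("unchecked")
        public Record getRecord(String field) {
            Object value = fields.get(field);
            if (value == null) {
                return null;
            }
            if (!(value instanceof Map)) {
                throw new IllegalArgumentException(field + " is not a nested record");
            }
            return new MapRecord((Map<String, Object>) value);
        }

        @Override
        public boolean has(String field) {
            return fields.containsKey(field);
        }
    }

    /** Task-side logic: it only sees Record, never HashMap or GenericRecord. */
    static String describe(Record message) {
        Record user = message.getRecord("user");
        return user.get("name") + " did " + message.get("action");
    }

    public static void main(String[] args) {
        // Stand-in for what a JSON-style Serde would return from deserialization.
        Map<String, Object> user = new HashMap<>();
        user.put("name", "alice");
        Map<String, Object> event = new HashMap<>();
        event.put("user", user);
        event.put("action", "login");

        System.out.println(describe(new MapRecord(event))); // prints "alice did login"
    }
}
```

With something like this, swapping JSON for Avro would mean swapping the 
adapter that the Serde returns, while describe() and the rest of the Task 
stay untouched.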

> Decouple Protocol from Task
> ---------------------------
>
>                 Key: SAMZA-429
>                 URL: https://issues.apache.org/jira/browse/SAMZA-429
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Jonathan Herriott
>
> Maybe someone can point me in the right direction if this is wrong.  One 
> thing I've disliked about tasks is the fact that the protocols have to be 
> baked directly into the Task, so if you want to process JSON, you have to 
> treat the message contents as a HashMap, but if you want to use Avro, it 
> needs to be treated as a GenericRecord object, etc.  I think it would be 
> super beneficial to fully abstract this from the Task object and just treat 
> each thing as a "Message" object.  I think the advantage of this is that you 
> can test with JSON and run with Avro in production or whatever as debugging 
> with JSON is a lot easier than Avro.
> The thing is, in the Task, I only care about the structure, I don't really 
> care about what protocol it is.  Maybe this statement is a bit naive, but I 
> don't think there would ever be a good situation in which you would pass just 
> a string or integer or whatever instead of some form of hierarchical message. 
>  In my opinion, all Serde should return a common interface for a Record for 
> deserialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
