Chris,

Thanks for the clarification.

BTW what I think led me astray re serializers.default was this page so
perhaps that could be removed from there too:

http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/configuration.html

Thanks!
Garry

-----Original Message-----
From: Chris Riccomini [mailto:[email protected]]
Sent: 11 February 2014 18:06
To: [email protected]
Subject: Re: Serde defaults and config hierarchy

Hey Garry,

Only that default key is deprecated. The other ones are still used (i.e.
the string serde snippet you've pasted is valid).

Cheers,
Chris

On 2/11/14 8:38 AM, "Garry Turkington" <[email protected]> wrote:

>Hi Chris,
>
>Thanks for the explanation, all makes sense.
>
>Is it only that one key in the serializers.* config namespace that is
>deprecated or are entries re serializer factory no longer needed, e.g:
>
>serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
>
>Thanks
>Garry
>
>-----Original Message-----
>From: Chris Riccomini [mailto:[email protected]]
>Sent: 11 February 2014 01:10
>To: [email protected]
>Subject: Re: Serde defaults and config hierarchy
>
>Hey Garry,
>
>The first one you have is now defunct:
>
>  serializers.default=string
>
>This was a relic of Samza 0.6.0. It shouldn't do anything.
>
>You are correct regarding the other two. The system-level serde defines
>a serde that is used by default for all streams in the system. The
>stream-level serde overrides any system level serde, and defines the
>serde only for the specific stream.
>
>Actually, no levels are required. Samza is not strongly typed
>internally--it just uses Object everywhere. Internally, consumers and
>producers just get objects and need to know what to do with them. If
>you DO define a serde, Samza will pass the object through the serde,
>and give the consumer/producer the results. This sounds a little goofy
>until you start thinking about things like JDBC consumers, which
>receive result sets, and never have bytes.
>It's easy for such consumers to hand objects directly to Samza, rather
>than try to hack around a byte-specific serde interface.
>
>In general, it's best practice to define key and message serdes at the
>system level for Kafka systems, which sounds like what you've done.
>
>Regarding the serde package split (core vs. serializers), the main
>reason for a separate serializers package is to isolate dependencies
>from core. We want as few dependencies as possible in samza-core, since
>it's a low-level framework, and we want to avoid version conflicts where
>developers might want (or need) to use version X of a library, and we
>depend on an API-incompatible version Y of the same library.
>
>That said, the split is completely arbitrary right now, because Samza
>already depends on Jackson for a number of things, and all serdes that
>exist so far either use Jackson, or Java primitives (e.g. Integer serde).
>We didn't put a ton of thought into that--it just evolved organically
>to what it is now. We'll probably need to refactor it at some point.
>
>Cheers,
>Chris
>
>On 2/10/14 3:33 PM, "Garry Turkington" <[email protected]>
>wrote:
>
>>Hi,
>>
>>Damn, Chris asked for my task config which is only going to show how
>>confused I am on serde config options. So I want to avoid any
>>embarrassment. :)
>>
>>Looking at config files there seem to be 3 places to define the serde,
>>for example:
>>
>>serializers.default=string
>>systems.kafka.samza.msg.serde=string
>>systems.kafka.streams.msgs-parsed.samza.msg.serde=string
>>
>>I've been reading this as the first is the default for all defined
>>systems, the 2nd for a given system and the 3rd is specifying for a
>>given stream. Is this correct? If so are all levels required or could
>>I for example get away with only the 2nd if I only used Kafka and only
>>had streams requiring the string serde? I got myself into some knots
>>with a task with multiple streams each with different serdes so
>>clarity would be good.
>>
>>And as an aside any reason why two serdes are in samza-serializers and
>>the rest are in samza-core? At first blush it looked like a
>>system/user-facing split but they both seem to have a mix
>>(JSON/metrics in one, Integer/Checkpoint etc in the other).
>>
>>Thanks
>>Garry
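
For reference, the serde precedence discussed in this thread could be sketched as the config fragment below. It is only an illustration, not a tested job config: it reuses the kafka system and msgs-parsed stream names from the original question, and the json serde entry with its JsonSerdeFactory class is an assumption added here to show a stream-level override, so check it against the Samza version in use.

```
# Registry entry mapping the serde name "string" to its factory class
# (taken from the thread).
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory

# Assumed for illustration only; not part of the original thread.
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory

# System level: default message serde for every stream in the "kafka" system.
systems.kafka.samza.msg.serde=string

# Stream level: overrides the system-level serde for this one stream only.
systems.kafka.streams.msgs-parsed.samza.msg.serde=json

# A relic of Samza 0.6.0 that should no longer do anything; leave it out.
# serializers.default=string
```

With a fragment like this, every stream in the kafka system would use the string serde except msgs-parsed, which would use json. Neither level is required: if no serde is defined, Samza hands the object to the consumer or producer as-is.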
