Hey Garry,

The first one you have is now defunct:
serializers.default=string

This was a relic of Samza 0.6.0. It shouldn't do anything.

You are correct regarding the other two. The system-level serde defines a serde that is used by default for all streams in the system. The stream-level serde overrides any system-level serde, and defines the serde only for the specific stream.

Actually, no levels are required. Samza is not strongly typed internally--it just uses Object everywhere. Internally, consumers and producers just get objects and need to know what to do with them. If you DO define a serde, Samza will pass the object through the serde, and give the consumer/producer the result. This sounds a little goofy until you start thinking about things like JDBC consumers, which receive result sets, and never have bytes. It's easy for such consumers to hand objects directly to Samza, rather than try to hack around a byte-specific serde interface.

In general, it's best practice to define key and message serdes at the system level for Kafka systems, which sounds like what you've done.

Regarding the serde package split (core vs. serializers), the main reason for a separate serializers package is to isolate dependencies from core. We want as few dependencies as possible in samza-core, since it's a low-level framework, and we want to avoid version conflicts where developers might want (or need) to use version X of a library, and we depend on an API-incompatible version Y of the same library. That said, the split is completely arbitrary right now, because Samza already depends on Jackson for a number of things, and all serdes that exist so far either use Jackson, or Java primitives (e.g. Integer serde). We didn't put a ton of thought into that--it just evolved organically to what it is now. We'll probably need to refactor it at some point.
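
To make the system-level vs. stream-level override concrete, here is a minimal sketch of a config that uses both. The system name (kafka) and stream name (msgs-parsed) are taken from your example. The json serde on that stream and the two registry entries are assumptions for illustration only, and the factory class names should be checked against the Samza version you are running.

```properties
# Register the serde names referenced below (assumed factory class names).
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory

# System level: default key and message serde for every stream in the "kafka" system.
systems.kafka.samza.key.serde=string
systems.kafka.samza.msg.serde=string

# Stream level: overrides the system-level message serde for this one stream only.
systems.kafka.streams.msgs-parsed.samza.msg.serde=json
```

With a config like this, every Kafka stream would use the string serde except msgs-parsed, whose messages would go through the json serde. Dropping the last line would leave msgs-parsed on the system-level default.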
Cheers,
Chris

On 2/10/14 3:33 PM, "Garry Turkington" <[email protected]> wrote:

>Hi,
>
>Damn, Chris asked for my task config which is only going to show how
>confused I am on serde config options. So I want to avoid any
>embarrassment. :)
>
>Looking at config files there seem to be 3 places to define the serde,
>for example:
>
>serializers.default=string
>systems.kafka.samza.msg.serde=string
>systems.kafka.streams.msgs-parsed.samza.msg.serde=string
>
>I've been reading this as the first is the default for all defined
>systems, the 2nd for a given system and the 3rd is specifying for a given
>stream. Is this correct? If so are all levels required or could I for
>example get away with only the 2nd if I only used Kafka and only had
>streams requiring the string serde? I got myself into some knots with a
>task with multiple streams each with different serdes so clarity would be
>good.
>
>And as an aside any reason why two serdes are in samza-serializers and
>the rest are in samza-core? At first blush it looked like a
>system/user-facing split but they both seem to have a mix (JSON/metrics
>in one, Integer/Checkpoint etc in the other).
>
>Thanks
>Garry
>
