Re: Serde defaults and config hierarchy

Chris Riccomini Mon, 10 Feb 2014 16:10:34 -0800

Hey Garry,

The first one you have is now defunct:

  serializers.default=string

This was a relic of Samza 0.6.0. It shouldn't do anything.

You are correct regarding the other two. The system-level serde defines a
serde that is used by default for all streams in the system. The
stream-level serde overrides any system level serde, and defines the serde
only for the specific stream.
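
To make the precedence concrete, here is a minimal sketch of the lookup order described above. It is not Samza's actual code. The function name resolve_msg_serde, the "json" serde name, and the msgs-raw stream are made up for illustration; only the config key patterns come from this thread.

```python
from typing import Mapping, Optional


def resolve_msg_serde(config: Mapping[str, str], system: str, stream: str) -> Optional[str]:
    """Return the serde name for a stream, or None if no serde is configured."""
    stream_key = f"systems.{system}.streams.{stream}.samza.msg.serde"
    system_key = f"systems.{system}.samza.msg.serde"
    # The stream-level setting wins; otherwise fall back to the system level.
    if stream_key in config:
        return config[stream_key]
    # No global default is consulted, so this may legitimately be None.
    return config.get(system_key)


config = {
    "serializers.default": "string",  # defunct relic of Samza 0.6.0; never read here
    "systems.kafka.samza.msg.serde": "string",
    "systems.kafka.streams.msgs-parsed.samza.msg.serde": "json",  # illustrative value
}

print(resolve_msg_serde(config, "kafka", "msgs-parsed"))  # stream-level override
print(resolve_msg_serde(config, "kafka", "msgs-raw"))     # falls back to system level
```

A stream with neither setting resolves to None, which corresponds to the "no serde" case.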

Actually, no levels are required. Samza is not strongly typed
internally--it just uses Object everywhere. Internally, consumers and
producers just get objects and need to know what to do with them. If you
DO define a serde, Samza will pass the object through the serde, and give
the consumer/producer the results. This sounds a little goofy until you
start thinking about things like JDBC consumers, which receive result
sets, and never have bytes. It's easy for such consumers to hand objects
directly to Samza, rather than try to hack around a byte-specific serde
interface.
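
A rough sketch of that pass-through behavior, under the assumption that a serde is just an object with from_bytes/to_bytes methods. JsonSerde and deliver_to_task are hypothetical names for illustration, not Samza classes.

```python
import json
from typing import Any, Optional


class JsonSerde:
    """Hypothetical serde: converts between bytes and objects via JSON."""

    def from_bytes(self, data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))

    def to_bytes(self, obj: Any) -> bytes:
        return json.dumps(obj).encode("utf-8")


def deliver_to_task(raw: Any, serde: Optional[JsonSerde] = None) -> Any:
    """Run the consumer's object through the serde only if one is defined."""
    if serde is None:
        # No serde configured: hand the object over untouched.
        return raw
    return serde.from_bytes(raw)


# A byte-oriented consumer (Kafka-like) with a serde configured.
parsed = deliver_to_task(b'{"id": 1}', JsonSerde())

# An object-oriented consumer (JDBC-like) with no serde: the row passes through as-is.
row = ("alice", 42)
same_row = deliver_to_task(row)
```

The second call is the JDBC-style case: the consumer never had bytes, so nothing needs to be decoded.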

In general, it's best practice to define key and message serdes at the
system level for Kafka systems, which sounds like what you've done.

Regarding the serde package split (core vs. serializers), the main reason
for a separate serializers package is to isolate dependencies from core.
We want as few dependencies as possible in samza-core, since it's a
low-level framework, and we want to avoid version conflicts where
developers might want (or need) to use version X of a library, and we
depend on an API-incompatible version Y of the same library.

That said, the split is completely arbitrary right now, because Samza
already depends on Jackson for a number of things, and all serdes that
exist so far either use Jackson, or Java primitives (e.g. Integer serde).
We didn't put a ton of thought into that--it just evolved organically to
what it is now. We'll probably need to refactor it at some point.

Cheers,
Chris

On 2/10/14 3:33 PM, "Garry Turkington" <[email protected]>
wrote:

>Hi,
>
>Damn, Chris asked for my task config which is only going to show how
>confused I am on serde config options. So I want to avoid any
>embarrassment. :)
>
>Looking at config files there seem to be 3 places to define the serde,
>for example:
>
>serializers.default=string
>systems.kafka.samza.msg.serde=string
>systems.kafka.streams.msgs-parsed.samza.msg.serde=string
>
>I've been reading this as the first is the default for all defined
>systems, the 2nd for a given system and the 3rd is specifying for a given
>stream. Is this correct? If so are all levels required or could I for
>example get away with only the 2nd if I only used Kafka and only had
>streams requiring the string serde? I got myself into some knots with a
>task with multiple streams each with different serdes so clarity would be
>good.
>
>And as an aside any reason why two serdes are in samza-serializers and
>the rest are in samza-core? At first blush it looked like a
>system/user-facing split but they both seem to have a mix (JSON/metrics
>in one, Integer/Checkpoint etc in the other).
>
>Thanks
>Garry
>
