I need to take more time to think about this. Here are a few off-the-cuff
remarks:

- To date we have tried really, really hard to keep the data model for
messages simple since, after all, you can always add whatever you like
inside the message body.

- For system tags, why not just make them first-class fields in the
message? The purpose of a system tag is presumably that... Why have a bunch
of key-value pairs rather than first-class fields?

- You don't necessarily need application-level tags explicitly represented
in the message format for efficiency. The application can define its own
header (e.g., its message could be a size-delimited header followed by a
size-delimited body). But if you use Avro, I don't think you even need
this: Avro can deserialize just the "header" fields of your message. Avro
has a notion of reader and writer schemas. The writer schema is whatever
the message was written with. If the reader schema contains just the
header fields, Avro will skip any fields it doesn't need and deserialize
only the ones it does. This is actually a much more usable and flexible way
to define a header, since you get all the types Avro allows instead of just
bytes.

- We will need to think carefully about what to do with timestamps if we
end up including them. There are actually several timestamps:
  - The time the producer created the message
  - The time the leader received the message
  - The time the current broker received the message
Producer timestamps won't necessarily be increasing at all. The leader
timestamp will be mostly increasing, except when the clock changes or
leadership moves. This somewhat complicates the use of these timestamps.
From the producer's point of view, the only time that matters is the time
the message was created. However, since the producer sets it, it can be
arbitrarily bad (remember all the NTP issues and 1970 timestamps we used to
get). Say the heuristic were to use the timestamp of the first message in a
segment file for retention: the problem is that segment timestamps need not
even be sequential, and a single bad producer could send data timestamped
in the distant past or future, causing data to be deleted prematurely or
retained forever. Using the broker timestamp at write time is better,
though that would obviously be overwritten when data is mirrored between
clusters (the mirror would then assign a different time, and if mirroring
ever stopped, that gap could be large). One approach would be to use the
client timestamp but have the broker overwrite it if it is too far off
(e.g., by more than a minute).
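A minimal sketch of the size-delimited header-plus-body framing mentioned in
the third point, assuming a 4-byte big-endian length prefix for each section
(the prefix width, byte order, and function names are illustrative choices,
not anything Kafka defines). The broker would see only opaque bytes; a
consumer that needs just the header reads the first length and stops.

```python
from typing import Tuple
import struct

# Hypothetical framing: each section is preceded by a 4-byte big-endian
# unsigned length. Kafka would only ever see the concatenated bytes.
_LEN = struct.Struct(">I")


def encode_message(header: bytes, body: bytes) -> bytes:
    """Pack header and body as two size-delimited sections."""
    return _LEN.pack(len(header)) + header + _LEN.pack(len(body)) + body


def decode_header(message: bytes) -> bytes:
    """Return only the header; the body is never sliced out or parsed."""
    (header_len,) = _LEN.unpack_from(message, 0)
    return message[_LEN.size:_LEN.size + header_len]


def decode_message(message: bytes) -> Tuple[bytes, bytes]:
    """Return (header, body)."""
    header = decode_header(message)
    offset = _LEN.size + len(header)
    (body_len,) = _LEN.unpack_from(message, offset)
    start = offset + _LEN.size
    return header, message[start:start + body_len]


if __name__ == "__main__":
    msg = encode_message(b"trace-id=abc123", b"...large payload...")
    print(decode_header(msg))  # b'trace-id=abc123'
    print(decode_message(msg))
```

The header here is still just bytes, which is the limitation the Avro
suggestion avoids.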
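To illustrate the reader/writer-schema point, here is a toy, hand-rolled
version of Avro-style schema resolution. It is not the Avro library: it
supports only `long` (zigzag varint) and `string` (length-prefixed UTF-8)
fields, laid out as in the Avro binary encoding, and it ignores defaults,
type promotion, and the rest of the real resolution rules. The schema
representation and field names are hypothetical.

```python
import io
from typing import Any, Dict, List, Tuple

# Toy schema: ordered (field_name, type) pairs; type is "long" or "string".
Schema = List[Tuple[str, str]]


def _write_long(out: io.BytesIO, n: int) -> None:
    # Zigzag-encode, then emit 7 bits per byte, low bits first.
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def _read_long(inp: io.BytesIO) -> int:
    shift, z = 0, 0
    while True:
        b = inp.read(1)[0]
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1)  # undo zigzag


def write_record(schema: Schema, record: Dict[str, Any]) -> bytes:
    out = io.BytesIO()
    for name, typ in schema:  # fields are concatenated, no delimiters
        if typ == "long":
            _write_long(out, record[name])
        else:  # "string": byte length as a long, then UTF-8 bytes
            data = record[name].encode("utf-8")
            _write_long(out, len(data))
            out.write(data)
    return out.getvalue()


def read_record(data: bytes, writer: Schema, reader: Schema) -> Dict[str, Any]:
    wanted = {name for name, _ in reader}
    missing = wanted - {name for name, _ in writer}
    if missing:
        raise ValueError(f"reader fields absent from writer schema: {missing}")
    inp = io.BytesIO(data)
    result: Dict[str, Any] = {}
    for name, typ in writer:  # must walk fields in the writer's order
        if len(result) == len(wanted):
            break  # every requested field found; remaining bytes untouched
        if typ == "long":
            value = _read_long(inp)
        else:
            length = _read_long(inp)
            if name not in wanted:
                inp.seek(length, io.SEEK_CUR)  # skip without decoding
                continue
            value = inp.read(length).decode("utf-8")
        if name in wanted:
            result[name] = value
    return result


if __name__ == "__main__":
    WRITER = [("event_time", "long"), ("source", "string"), ("payload", "string")]
    HEADER_READER = [("event_time", "long"), ("source", "string")]
    msg = write_record(WRITER, {"event_time": 1412985232000,
                                "source": "web-01", "payload": "x" * 10_000})
    print(read_record(msg, WRITER, HEADER_READER))
    # {'event_time': 1412985232000, 'source': 'web-01'}
```

Because resolution matches fields by name, the header fields need not come
first in the writer schema; placing them first just lets this toy reader
stop early.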
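A sketch of the last suggestion in the timestamp point: keep the client
timestamp unless it is more than a minute from the broker's clock, in which
case the broker substitutes its own. The one-minute tolerance comes from the
example above; the function names and the first-message retention rule used
for the demonstration are hypothetical.

```python
MAX_SKEW_MS = 60_000  # hypothetical tolerance: "off by more than a minute"


def effective_timestamp(client_ts_ms: int, broker_now_ms: int,
                        max_skew_ms: int = MAX_SKEW_MS) -> int:
    """Keep the producer's creation time unless it is implausibly far
    from the broker's clock, in which case use the broker's time."""
    if abs(client_ts_ms - broker_now_ms) > max_skew_ms:
        return broker_now_ms
    return client_ts_ms


def segment_expired(first_message_ts_ms: int, now_ms: int,
                    retention_ms: int) -> bool:
    """Hypothetical retention rule: age a segment by its first message."""
    return now_ms - first_message_ts_ms > retention_ms


if __name__ == "__main__":
    now = 1_412_985_232_000  # arbitrary broker clock reading (ms)
    week = 7 * 24 * 3600 * 1000
    bad_1970 = 0  # producer with a broken clock
    far_future = now + 365 * 24 * 3600 * 1000

    # Trusting raw producer time: the 1970 message makes a fresh segment
    # look ancient, while the future one won't expire for over a year.
    print(segment_expired(bad_1970, now, week))    # True
    print(segment_expired(far_future, now, week))  # False
    # With the broker overriding badly skewed timestamps at write time:
    print(segment_expired(effective_timestamp(bad_1970, now), now, week))  # False
    print(effective_timestamp(now - 5_000, now) == now - 5_000)  # True
```

One consequence of this design: if a mirror applied the same check, data
mirrored within the tolerance would keep its creation time, while data
arriving after a longer mirroring lag would be re-stamped with the mirror's
clock.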

-Jay

On Fri, Oct 10, 2014 at 11:21 PM, Joel Koshy <jjkosh...@gmail.com> wrote:

> Thanks Guozhang! This is an excellent write-up and the approach nicely
> consolidates a number of long-standing issues. It would be great if
> everyone could review this carefully and give feedback.
>
> Also, for past discussions we have used a mix of wiki comments and the
> mailing list. Personally, I think it is better to discuss on the mailing
> list (for more visibility) and just post a bold link to the (archived)
> mailing list thread on the wiki.
>
> Joel
>
> On Fri, Oct 10, 2014 at 05:33:52PM -0700, Guozhang Wang wrote:
> > Hello all,
> >
> > I put together some thoughts on enhancing our current message metadata
> > format to solve a bunch of existing issues:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Enriched+Message+Metadata
> >
> > This wiki page is for kicking off some discussion about the feasibility
> > of adding more info to the message header and, if feasible, how we
> > would add it.
> >
> > -- Guozhang
>
>
