FYI I can imagine a world in which we have no coders. We could define the
entire model on top of schemas. Today's "Coder" is completely equivalent to
a single-field schema with a logical-type field (actually the latter is
slightly more expressive as you aren't forced to serialize into bytes).
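
To make the equivalence concrete, here is a rough sketch in plain Python. It is not the Beam API: `Coder`, `LogicalType`, `Field`, `Schema`, `schema_from_coder`, and the `example:` URNs are all hypothetical names made up for illustration. It only shows that any bytes-oriented coder can be wrapped as a single-field schema whose field is a logical type over bytes, and that a logical type may also pick a non-bytes representation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Coder:
    """Hypothetical stand-in for a coder: maps values to and from bytes only."""
    encode: Callable[[Any], bytes]
    decode: Callable[[bytes], Any]


@dataclass(frozen=True)
class LogicalType:
    """Hypothetical logical type: maps values to and from a named representation."""
    urn: str
    representation: str  # e.g. "bytes" or "int64"
    to_repr: Callable[[Any], Any]
    from_repr: Callable[[Any], Any]


@dataclass(frozen=True)
class Field:
    name: str
    type: LogicalType


@dataclass(frozen=True)
class Schema:
    fields: Tuple[Field, ...]


def schema_from_coder(urn: str, coder: Coder) -> Schema:
    """Wrap a coder as a single-field schema with a logical type over bytes."""
    logical = LogicalType(urn, "bytes", coder.encode, coder.decode)
    return Schema((Field("value", logical),))


# A coder expressed as a schema: same information, different packaging.
utf8_coder = Coder(lambda s: s.encode("utf-8"), lambda b: b.decode("utf-8"))
string_schema = schema_from_coder("example:coder:utf8:v1", utf8_coder)

# A logical type that is not forced through bytes: its representation is an
# integer count of milliseconds, which a coder alone cannot express.
timestamp_millis = LogicalType(
    "example:logical:timestamp_millis:v1",
    "int64",
    lambda dt: round(dt.timestamp() * 1000),
    lambda ms: datetime.fromtimestamp(ms / 1000, tz=timezone.utc),
)
```

Round-tripping a value through `string_schema.fields[0].type` behaves exactly like calling the coder directly, while `timestamp_millis` keeps its representation as an integer rather than opaque bytes.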

Due to compatibility constraints and the effort that would be involved in
such a change, I think the practical decision should be for schemas and
coders to coexist for the time being. However, when we start planning Beam
3.0, deprecating coders is something I would like to suggest.

On Thu, May 9, 2019 at 7:48 AM Robert Bradshaw <rober...@google.com> wrote:

> From: Kenneth Knowles <k...@apache.org>
> Date: Thu, May 9, 2019 at 10:05 AM
> To: dev
>
> > This is a huge development. Top posting because I can be more compact.
> >
> > I really think after the initial idea converges this needs a design doc
> > with goals and alternatives. It is an extraordinarily consequential model
> > change. So in the spirit of doing the work / bias towards action, I created
> > a quick draft at https://s.apache.org/beam-schemas and added everyone on
> > this thread as editors. I am still in the process of writing this to match
> > the thread.
>
> Thanks! Added some comments there.
>
> > *Multiple timestamp resolutions*: you can use logical types to represent
> > nanos the same way Java and proto do.
>
> As per the other discussion, I'm unsure the value in supporting
> multiple timestamp resolutions is high enough to outweigh the cost.
>
> > *Why multiple int types?* The domains of values for these types are
> > different. For a language with one "int" or "number" type, that's another
> > domain of values.
>
> What is the value in having different domains? If your data has a
> natural domain, chances are it doesn't line up exactly with one of
> these. I guess it's for languages whose types have specific domains?
> (There's also compactness in representation, encoded and in-memory,
> though I'm not sure that's high.)
>
> > *Columnar/Arrow*: making sure we unlock the ability to take this path is
> > paramount. So tying it directly to a row-oriented coder seems
> > counterproductive.
>
> I don't think Coders are necessarily row-oriented. They are, however,
> bytes-oriented. (Perhaps they need not be.) There seems to be a lot of
> overlap between what Coders express in terms of element typing
> information and what Schemas express, and I'd rather have one concept
> if possible. Or have a clear division of responsibilities.
>
> > *Multimap*: what does it add over an array-valued map or
> > large-iterable-valued map? (honest question, not rhetorical)
>
> Multimap has a different notion of what it means to contain a value,
> can handle (unordered) unions of non-disjoint keys, etc. Maybe this
> isn't worth a new primitive type.
>
> > *URN/enum for type names*: I see the case for both. The core types are
> > fundamental enough they should never really change - after all, proto,
> > thrift, avro, arrow, have addressed this (not to mention most programming
> > languages). Maybe additions once every few years. I prefer the smallest
> > intersection of these schema languages. A oneof is more clear, while URN
> > emphasizes the similarity of built-in and logical types.
>
> Hmm... Do we have any examples of the multi-level primitive/logical
> type in any of these other systems? I have a bias towards all types
> being on the same footing unless there is a compelling reason to divide
> things into primitive/user-defined ones.
>
> Here it seems like the most essential value of the primitive type set
> is to describe the underlying representation, for encoding elements in
> a variety of ways (notably columnar, but also interfacing with other
> external systems like IOs). Perhaps, rather than the previous
> suggestion of making everything a logical of bytes, this could be made
> clear by still making everything a logical type, but renaming
> "TypeName" to Representation. There would be URNs (typically with
> empty payloads) for the various primitive types (whose mapping to
> their representations would be the identity).
>
> - Robert
>
