FYI I can imagine a world in which we have no coders. We could define the entire model on top of schemas. Today's "Coder" is completely equivalent to a single-field schema with a logical-type field (actually the latter is slightly more expressive as you aren't forced to serialize into bytes).
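
To make the equivalence concrete, here is a minimal sketch in plain Python. It does not use Beam's APIs: `LogicalType`, `Field`, `Schema`, `VarIntCoder`, `schema_from_coder`, and the `example:` URNs are all hypothetical names invented for illustration. It wraps a toy coder as a one-field schema whose field is a bytes-backed logical type, then shows a logical type whose representation is an integer, so no byte serialization is forced.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class LogicalType:
    """Hypothetical logical type: a URN plus a mapping to a base representation."""
    urn: str
    representation: str  # illustrative names such as "BYTES" or "INT64"
    to_base: Callable[[Any], Any]
    from_base: Callable[[Any], Any]


@dataclass(frozen=True)
class Field:
    name: str
    type: LogicalType


@dataclass(frozen=True)
class Schema:
    fields: Tuple[Field, ...]


class VarIntCoder:
    """Toy coder: non-negative ints as base-128 varint bytes."""

    def encode(self, value: int) -> bytes:
        if value < 0:
            raise ValueError("only non-negative ints are supported")
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7
            if value:
                out.append(byte | 0x80)  # more bytes follow
            else:
                out.append(byte)
                return bytes(out)

    def decode(self, data: bytes) -> int:
        result = 0
        for shift, byte in enumerate(data):
            result |= (byte & 0x7F) << (7 * shift)
        return result


def schema_from_coder(urn: str, coder) -> Schema:
    """Express a coder as a single-field schema with a bytes-backed logical type."""
    logical = LogicalType(urn, "BYTES", coder.encode, coder.decode)
    return Schema((Field("value", logical),))


# A coder, viewed as a one-field schema.
varint_schema = schema_from_coder("example:varint:v1", VarIntCoder())

# A logical type that maps to an integer rather than to bytes.
duration_millis = LogicalType(
    "example:duration_millis:v1",
    "INT64",
    lambda td: td // timedelta(milliseconds=1),
    lambda ms: timedelta(milliseconds=ms),
)
```

The second value is the "slightly more expressive" case: its base representation is an integer, which a runner could store or encode however it likes.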
Due to compatibility constraints and the effort that would be involved in such a change, I think the practical decision should be for schemas and coders to coexist for the time being. However, when we start planning Beam 3.0, deprecating coders is something I would like to suggest.

On Thu, May 9, 2019 at 7:48 AM Robert Bradshaw <[email protected]> wrote:

> From: Kenneth Knowles <[email protected]>
> Date: Thu, May 9, 2019 at 10:05 AM
> To: dev
>
> > This is a huge development. Top posting because I can be more compact.
> >
> > I really think after the initial idea converges this needs a design doc
> > with goals and alternatives. It is an extraordinarily consequential model
> > change. So in the spirit of doing the work / bias towards action, I created
> > a quick draft at https://s.apache.org/beam-schemas and added everyone on
> > this thread as editors. I am still in the process of writing this to match
> > the thread.
>
> Thanks! Added some comments there.
>
> > *Multiple timestamp resolutions*: you can use logical types to represent
> > nanos the same way Java and proto do.
>
> As per the other discussion, I'm unsure the value in supporting
> multiple timestamp resolutions is high enough to outweigh the cost.
>
> > *Why multiple int types?* The domains of values for these types are
> > different. For a language with one "int" or "number" type, that's another
> > domain of values.
>
> What is the value in having different domains? If your data has a
> natural domain, chances are it doesn't line up exactly with one of
> these. I guess it's for languages whose types have specific domains?
> (There's also compactness in representation, encoded and in-memory,
> though I'm not sure that's high.)
>
> > *Columnar/Arrow*: making sure we unlock the ability to take this path is
> > paramount. So tying it directly to a row-oriented coder seems
> > counterproductive.
>
> I don't think Coders are necessarily row-oriented. They are, however,
> bytes-oriented. (Perhaps they need not be.) There seems to be a lot of
> overlap between what Coders express in terms of element typing
> information and what Schemas express, and I'd rather have one concept
> if possible. Or have a clear division of responsibilities.
>
> > *Multimap*: what does it add over an array-valued map or
> > large-iterable-valued map? (honest question, not rhetorical)
>
> Multimap has a different notion of what it means to contain a value,
> can handle (unordered) unions of non-disjoint keys, etc. Maybe this
> isn't worth a new primitive type.
>
> > *URN/enum for type names*: I see the case for both. The core types are
> > fundamental enough they should never really change - after all, proto,
> > thrift, avro, arrow, have addressed this (not to mention most programming
> > languages). Maybe additions once every few years. I prefer the smallest
> > intersection of these schema languages. A oneof is more clear, while URN
> > emphasizes the similarity of built-in and logical types.
>
> Hmm... Do we have any examples of the multi-level primitive/logical
> type in any of these other systems? I have a bias towards all types
> being on the same footing unless there is a compelling reason to divide
> things into primitive/user-defined ones.
>
> Here it seems like the most essential value of the primitive type set
> is to describe the underlying representation, for encoding elements in
> a variety of ways (notably columnar, but also interfacing with other
> external systems like IOs). Perhaps, rather than the previous
> suggestion of making everything a logical of bytes, this could be made
> clear by still making everything a logical type, but renaming
> "TypeName" to Representation. There would be URNs (typically with
> empty payloads) for the various primitive types (whose mapping to
> their representations would be the identity).
>
> - Robert
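
One way to picture the closing suggestion in the quoted message (every type is a logical type, "TypeName" becomes Representation, and primitives are URNs with empty payloads and identity mappings) is the sketch below. It is plain Python, not Beam's model or API: the `Representation` members, the `LogicalType` fields, the `primitive` helper, and the `example:` URNs are assumptions made up for illustration. The nanosecond timestamp is stored as a single integer count here purely to keep the example short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Representation(Enum):
    """Hypothetical stand-in for the renamed "TypeName": how values are stored."""
    BYTES = "bytes"
    INT64 = "int64"
    STRING = "string"


def _identity(value: Any) -> Any:
    return value


@dataclass(frozen=True)
class LogicalType:
    """Every type, primitive or not, is a URN + payload over a representation."""
    urn: str
    payload: bytes
    representation: Representation
    to_representation: Callable[[Any], Any] = _identity
    from_representation: Callable[[Any], Any] = _identity


def primitive(rep: Representation) -> LogicalType:
    """A primitive is a logical type with an empty payload and identity mapping."""
    return LogicalType(
        urn=f"example:primitive:{rep.value}:v1",  # hypothetical URN scheme
        payload=b"",
        representation=rep,
    )


INT64 = primitive(Representation.INT64)
STRING = primitive(Representation.STRING)

NANOS_PER_SECOND = 1_000_000_000

# A user-defined type on the same footing: (seconds, nanos) pairs stored as
# one integer count of nanoseconds.
NANOS_TIMESTAMP = LogicalType(
    urn="example:nanos_timestamp:v1",
    payload=b"",
    representation=Representation.INT64,
    to_representation=lambda parts: parts[0] * NANOS_PER_SECOND + parts[1],
    from_representation=lambda total: divmod(total, NANOS_PER_SECOND),
)
```

Under this shape, a columnar encoder would only need to look at `representation`, while `urn` and `payload` carry the meaning of the type.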
