On Wed, Jun 12, 2019 at 8:29 PM Kenneth Knowles <k...@apache.org> wrote:

> Can we choose a first step? I feel there's consensus around:
>
> - the basic idea of what a schema looks like, ignoring logical types or
> SDK-specific bits
> - the version of logical type which is a standardized URN+payload plus a
> representation
>
> Perhaps we could commit this and see what it looks like to try to use it?
>
> It also seems like there might be consensus around the idea of each of:
>
> - a coder that simply encodes rows; its payload is just a schema; it is
> minimalist, canonical
> - a coder that encodes a non-row using the serialization format of a row;
> this has to be a coder (versus Convert transforms) so that to/from row
> conversions can be elided when primitives are fused (just like to/from
> bytes is elided)
>

Actually this doesn't make sense to me. I think from the portability perspective, all we have is schemas - the rest is just a convenience for the SDK. As such, I don't think it makes sense at all to model this as a Coder.

>
> Can we also just have both of these, with different URNs?
>
> Kenn
>
> On Wed, Jun 12, 2019 at 3:57 PM Reuven Lax <re...@google.com> wrote:
>
>>
>> On Wed, Jun 12, 2019 at 3:46 PM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> On Tue, Jun 11, 2019 at 8:04 PM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>>
>>>> I believe the schema registry is a transient construction-time concept.
>>>> I don't think there's any need for a concept of a registry in the portable
>>>> representation.
>>>>
>>>>> I'd rather urn:beam:schema:logicaltype:javasdk not be used whenever one
>>>>> has (say) a Java POJO as that would prevent other SDKs from
>>>>> "understanding"
>>>>> it as above (unless we had a way of declaring it as "just an
>>>>> alias/wrapper").
>>>>>
>>>>
>>>> I didn't understand the example I snipped, but I think I understand
>>>> your concern here. Is this what you want? (a) something presented as a POJO
>>>> in Java (b) encoded to a row, but still decoded to the POJO and (c)
>>>> non-Java SDK knows that it is "just a struct" so it is safe to mess about
>>>> with or even create new ones. If this is what you want it seems potentially
>>>> useful, but also easy to live without. This can also be done entirely
>>>> within the Java SDK via conversions, leaving no logical type in the
>>>> portable pipeline.
>>>>
>>>
>>> I'm imagining a world where someone defines a PTransform that takes a POJO
>>> for a constructor, and consumes and produces a POJO, and is now usable from
>>> Go with no additional work on the PTransform author's part. But maybe I'm
>>> thinking about this wrong and the POJO <-> Row conversion is part of
>>> the @ProcessElement magic, not encoded in the schema itself.
>>>
>>
>> The user's output would have to be explicitly schema. They would somehow
>> have to tell Beam to infer a schema from the output POJO (e.g. one way to
>> do this is to annotate the POJO with the @DefaultSchema annotation). We
>> don't currently magically turn a POJO into a schema unless we are asked to
>> do so.
>>
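
To make the opt-in point in the quoted paragraph above concrete, here is a minimal, self-contained Java sketch of "no schema unless asked". It is not Beam's API or implementation: the @InferSchema marker, the Purchase and Unannotated classes, and the inferSchema helper are all hypothetical names invented for illustration, and the "schema" is just a sorted map from field name to type name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class SchemaOptInSketch {

  /** Hypothetical opt-in marker; a stand-in for the role @DefaultSchema plays in the thread. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface InferSchema {}

  /** A POJO whose author has asked for a schema to be inferred. */
  @InferSchema
  static class Purchase {
    public String userId;
    public long amountCents;
  }

  /** A POJO with no opt-in; it must not be given a schema implicitly. */
  static class Unannotated {
    public String userId;
  }

  /**
   * Returns a toy schema (field name -> simple type name, sorted by field name)
   * only when the class carries the opt-in marker; otherwise returns empty.
   */
  static Optional<Map<String, String>> inferSchema(Class<?> type) {
    if (!type.isAnnotationPresent(InferSchema.class)) {
      return Optional.empty();
    }
    Map<String, String> fields = new TreeMap<>();
    for (Field f : type.getDeclaredFields()) {
      // Skip static and compiler-generated fields; they are not part of the record.
      if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
        continue;
      }
      fields.put(f.getName(), f.getType().getSimpleName());
    }
    return Optional.of(fields);
  }

  public static void main(String[] args) {
    System.out.println(inferSchema(Purchase.class));    // Optional[{amountCents=long, userId=String}]
    System.out.println(inferSchema(Unannotated.class)); // Optional.empty
  }
}
```

The toy schema is sorted by field name only to keep the output deterministic, since reflection does not guarantee declaration order. How field order should be fixed in a real schema is a separate question that this sketch does not address.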