Re: [DISCUSS] Portability representation of schemas

Kenneth Knowles Wed, 12 Jun 2019 20:29:29 -0700

Can we choose a first step? I feel there's consensus around:

 - the basic idea of what a schema looks like, ignoring logical types or
SDK-specific bits
 - the version of logical type which is a standardized URN+payload plus a
representation
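
As a rough illustration of these two points, here is a minimal sketch in Python. Every name in it (the classes, the field names, the example URN) is hypothetical and is not the actual Beam proto definition. It only shows the shape: a schema is a list of named, typed fields, and a logical type is a URN plus an opaque payload plus a representation type that any SDK can fall back to.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class AtomicType:
    name: str  # e.g. "INT64" or "STRING" (hypothetical type names)


@dataclass(frozen=True)
class LogicalType:
    urn: str                      # standardized identifier for the logical type
    payload: bytes                # opaque to SDKs that do not know the URN
    representation: "FieldType"   # type every SDK can read and write


@dataclass(frozen=True)
class RowType:
    schema: "Schema"              # nested row


FieldType = Union[AtomicType, LogicalType, RowType]


@dataclass(frozen=True)
class Field:
    name: str
    type: FieldType
    nullable: bool = False


@dataclass(frozen=True)
class Schema:
    fields: Tuple[Field, ...]


def representation_of(field_type: FieldType) -> FieldType:
    """Peel off logical types until a non-logical type remains."""
    while isinstance(field_type, LogicalType):
        field_type = field_type.representation
    return field_type


# Hypothetical URN, used only for this example.
timestamp = LogicalType(
    urn="urn:example:logicaltype:timestamp_micros",
    payload=b"",
    representation=AtomicType("INT64"),
)
event_schema = Schema(fields=(Field("id", AtomicType("STRING")),
                              Field("when", timestamp)))
```

An SDK that does not recognize the URN can still treat the `when` field as whatever `representation_of` returns, which is the "just a struct" behaviour discussed further down the thread.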


Perhaps we could commit this and see what it looks like to try to use it?

It also seems like there might be consensus around the idea of each of:

 - a coder that simply encodes rows; its payload is just a schema; it is
minimalist, canonical
 - a coder that encodes a non-row using the serialization format of a row;
this has to be a coder (versus Convert transforms) so that to/from row
conversions can be elided when primitives are fused (just like to/from
bytes is elided)

Can we also just have both of these, with different URNs?
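
To make the two-coder idea concrete, here is a minimal sketch in Python. The URNs, class names, and the JSON wire format are all assumptions chosen for brevity; they are not Beam's actual URNs or row encoding. The point is only that the second coder reuses the row serialization and adds to/from row conversions on top of it.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Tuple

ROW_CODER_URN = "urn:example:coder:row"        # hypothetical URN
SCHEMA_CODER_URN = "urn:example:coder:schema"  # hypothetical URN


@dataclass(frozen=True)
class RowCoder:
    """Encodes rows only; its payload is just the schema (field names here)."""
    field_names: Tuple[str, ...]
    urn: str = ROW_CODER_URN

    def encode(self, row: tuple) -> bytes:
        assert len(row) == len(self.field_names)
        return json.dumps(list(row)).encode("utf-8")  # stand-in wire format

    def decode(self, data: bytes) -> tuple:
        return tuple(json.loads(data.decode("utf-8")))


@dataclass(frozen=True)
class SchemaCoder:
    """Encodes a non-row value using the serialization format of a row."""
    row_coder: RowCoder
    to_row: Callable[[Any], tuple]
    from_row: Callable[[tuple], Any]
    urn: str = SCHEMA_CODER_URN

    def encode(self, value: Any) -> bytes:
        return self.row_coder.encode(self.to_row(value))

    def decode(self, data: bytes) -> Any:
        return self.from_row(self.row_coder.decode(data))


@dataclass(frozen=True)
class Point:  # stands in for a user type such as a Java POJO
    x: int
    y: int


row_coder = RowCoder(field_names=("x", "y"))
point_coder = SchemaCoder(row_coder,
                          to_row=lambda p: (p.x, p.y),
                          from_row=lambda r: Point(*r))
```

Because the conversions live inside the coder, a runner that fuses two steps and never calls `encode`/`decode` between them also never calls `to_row`/`from_row`, which is the elision described above.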

Kenn

On Wed, Jun 12, 2019 at 3:57 PM Reuven Lax wrote:

>
>
> On Wed, Jun 12, 2019 at 3:46 PM Robert Bradshaw wrote:
>
>> On Tue, Jun 11, 2019 at 8:04 PM Kenneth Knowles wrote:
>>
>>>
>>> I believe the schema registry is a transient construction-time concept.
>>> I don't think there's any need for a concept of a registry in the portable
>>> representation.
>>>
>>>> I'd rather urn:beam:schema:logicaltype:javasdk not be used whenever one
>>>> has (say) a Java POJO as that would prevent other SDKs from "understanding"
>>>> it as above (unless we had a way of declaring it as "just an
>>>> alias/wrapper").
>>>>
>>>
>>> I didn't understand the example I snipped, but I think I understand your
>>> concern here. Is this what you want? (a) something presented as a POJO in
>>> Java (b) encoded to a row, but still decoded to the POJO and (c) non-Java
>>> SDK knows that it is "just a struct" so it is safe to mess about with or
>>> even create new ones. If this is what you want it seems potentially useful,
>>> but also easy to live without. This can also be done entirely within the
>>> Java SDK via conversions, leaving no logical type in the portable pipeline.
>>>
>>
>> I'm imagining a world where someone defines a PTransform that takes a POJO
>> for a constructor, and consumes and produces a POJO, and is now usable from
>> Go with no additional work on the PTransform author's part.  But maybe I'm
>> thinking about this wrong and the POJO <-> Row conversion is part of
>> the @ProcessElement magic, not encoded in the schema itself.
>>
>
> The user's output would have to explicitly have a schema. They would somehow
> have to tell Beam to infer a schema from the output POJO (e.g. one way to
> do this is to annotate the POJO with the @DefaultSchema annotation).  We
> don't currently magically turn a POJO into a schema unless we are asked to
> do so.
>
