Re: [DISCUSS] Portability representation of schemas

Reuven Lax Wed, 12 Jun 2019 20:47:26 -0700

On Wed, Jun 12, 2019 at 8:29 PM Kenneth Knowles <[email protected]> wrote:


> Can we choose a first step? I feel there's consensus around:
>
>  - the basic idea of what a schema looks like, ignoring logical types or
> SDK-specific bits
>  - the version of logical type which is a standardized URN+payload plus a
> representation
>
> Perhaps we could commit this and see what it looks like to try to use it?
>
> It also seems like there might be consensus around the idea of each of:
>
>  - a coder that simply encodes rows; its payload is just a schema; it is
> minimalist, canonical
>  - a coder that encodes a non-row using the serialization format of a row;
> this has to be a coder (versus Convert transforms) so that to/from row
> conversions can be elided when primitives are fused (just like to/from
> bytes is elided)
>

Actually this doesn't make sense to me. I think from the portability
perspective, all we have is schemas - the rest is just a convenience for
the SDK. As such, I don't think it makes sense at all to model this as a
Coder.
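
For concreteness, the shape that seems to have consensus above (a plain schema, plus a logical type that is a URN + payload + representation) could be sketched roughly as below. This is only an illustration in Python: the class names, the `wire_type` helper, and the example URN are all made up here and are not Beam's proto definitions or API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LogicalType:
    urn: str                      # standardized identifier (hypothetical value below)
    payload: bytes                # opaque, URN-specific argument
    representation: "FieldType"   # the type actually used to encode values


@dataclass(frozen=True)
class FieldType:
    # Exactly one of these is expected to be set in this sketch.
    atomic: Optional[str] = None            # e.g. "INT64", "STRING"
    row: Optional["Schema"] = None          # nested struct
    logical: Optional[LogicalType] = None   # URN + payload + representation


@dataclass(frozen=True)
class Field:
    name: str
    type: FieldType


@dataclass(frozen=True)
class Schema:
    fields: Tuple[Field, ...]


def wire_type(t: FieldType) -> FieldType:
    """Unwrap logical types until a non-logical representation remains."""
    while t.logical is not None:
        t = t.logical.representation
    return t


INT64 = FieldType(atomic="INT64")
STRING = FieldType(atomic="STRING")
# Hypothetical logical type: a timestamp represented as an INT64.
MILLIS = FieldType(logical=LogicalType("urn:example:logicaltype:millis", b"", INT64))

schema = Schema((Field("user", STRING), Field("ts", MILLIS)))
```

An SDK that does not recognize a given URN could still fall back to `wire_type(...)` and treat the field as its representation, which is the "just a struct" behaviour discussed further down the thread.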


>
> Can we also just have both of these, with different URNs?
>
> Kenn
>
> On Wed, Jun 12, 2019 at 3:57 PM Reuven Lax <[email protected]> wrote:
>
>>
>>
>> On Wed, Jun 12, 2019 at 3:46 PM Robert Bradshaw <[email protected]>
>> wrote:
>>
>>> On Tue, Jun 11, 2019 at 8:04 PM Kenneth Knowles <[email protected]> wrote:
>>>
>>>>
>>>> I believe the schema registry is a transient construction-time concept.
>>>> I don't think there's any need for a concept of a registry in the portable
>>>> representation.
>>>>
>>>>> I'd rather urn:beam:schema:logicaltype:javasdk not be used whenever one
>>>>> has (say) a Java POJO as that would prevent other SDKs from
>>>>> "understanding" it as above (unless we had a way of declaring it as
>>>>> "just an alias/wrapper").
>>>>>
>>>>
>>>> I didn't understand the example I snipped, but I think I understand
>>>> your concern here. Is this what you want? (a) something presented as a POJO
>>>> in Java (b) encoded to a row, but still decoded to the POJO and (c)
>>>> non-Java SDK knows that it is "just a struct" so it is safe to mess about
>>>> with or even create new ones. If this is what you want it seems potentially
>>>> useful, but also easy to live without. This can also be done entirely
>>>> within the Java SDK via conversions, leaving no logical type in the
>>>> portable pipeline.
>>>>
>>>
>>> I'm imagining a world where someone defines a PTransform that takes a POJO
>>> for a constructor, and consumes and produces a POJO, and is now usable from
>>> Go with no additional work on the PTransform author's part.  But maybe I'm
>>> thinking about this wrong and the POJO <-> Row conversion is part of
>>> the @ProcessElement magic, not encoded in the schema itself.
>>>
>>
>> The user's output would have to explicitly have a schema. They would somehow
>> have to tell Beam to infer a schema from the output POJO (e.g. one way to
>> do this is to annotate the POJO with the @DefaultSchema annotation).  We
>> don't currently magically turn a POJO into a schema unless we are asked to
>> do so.
>>
>
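
The opt-in behaviour described in the quoted reply (no schema unless the user asks for one) can be sketched in a language-neutral way. The Python below is only an analogy: `default_schema`, `schema_of`, `to_row`, and the two example classes are hypothetical names for illustration, not the Java SDK's @DefaultSchema implementation. The registry is an in-process, construction-time dictionary and nothing more.

```python
from dataclasses import dataclass, fields, is_dataclass

# Construction-time only: class -> list of (field name, type name).
_SCHEMAS = {}


def default_schema(cls):
    """Hypothetical opt-in marker: infer a schema from the class's fields."""
    if not is_dataclass(cls):
        raise TypeError(f"{cls.__name__} must be a dataclass")
    _SCHEMAS[cls] = [
        (f.name, getattr(f.type, "__name__", str(f.type))) for f in fields(cls)
    ]
    return cls


def schema_of(cls):
    """Return the inferred schema, or None if the class never opted in."""
    return _SCHEMAS.get(cls)


def to_row(obj):
    """Convert an opted-in object to a plain tuple in schema field order."""
    schema = schema_of(type(obj))
    if schema is None:
        raise ValueError(f"no schema registered for {type(obj).__name__}")
    return tuple(getattr(obj, name) for name, _ in schema)


@default_schema
@dataclass
class Purchase:
    user: str
    amount: int


@dataclass
class Untagged:
    user: str
```

Here `Purchase` gets a schema because it was explicitly annotated, while `Untagged` does not, so `to_row` refuses to convert it rather than inferring anything implicitly.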
