Registration/definition and use are different sites. The TypeDescriptor
always comes from the user fn or the transform. For Jeff's example, the
AutoValueSchema provider is registered with MyClass.class, which is fine.
Then, when a user writes a DoFn that accepts or returns a
MyClass<ActualType>, you first infer a schema ActualSchema for ActualType
and pass it along. The invocation is along the lines of
AutoValueSchemaProvider(MyClass.class, ImmutableList.of(ActualSchema)), and
you'd get a legitimate schema for MyClass<ActualType>.
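
To make the split between registration site and use site concrete, here is a
rough, self-contained sketch in plain Java. It is not Beam code: Provider,
TypeToken, ParameterizedSchemaSketch and the string "schemas" are all
hypothetical stand-ins, and the real SchemaRegistry/SchemaProvider APIs would
look different. It only shows a provider keyed on the raw class being handed
the schemas inferred for the type arguments.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: every type here is a hypothetical stand-in, not a Beam class. */
public class ParameterizedSchemaSketch {

  /** Hypothetical provider: receives schemas already inferred for the type arguments. */
  public interface Provider {
    String schemaFor(List<String> typeArgumentSchemas);
  }

  /** Hypothetical type token: an anonymous subclass records its type argument. */
  public abstract static class TypeToken<T> {
    public Type captured() {
      ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
      return superType.getActualTypeArguments()[0];
    }
  }

  /** Stand-in for the generic class from the original question. */
  public static class MyClass<T> {}

  // Registration site: providers are keyed on the raw class only.
  private final Map<Class<?>, Provider> providers = new HashMap<>();

  public void register(Class<?> rawType, Provider provider) {
    providers.put(rawType, provider);
  }

  // Use site: walk the full type, inferring type-argument schemas first.
  public String getSchema(Type type) {
    List<String> argumentSchemas = new ArrayList<>();
    Type rawType = type;
    if (type instanceof ParameterizedType) {
      ParameterizedType parameterized = (ParameterizedType) type;
      rawType = parameterized.getRawType();
      for (Type argument : parameterized.getActualTypeArguments()) {
        argumentSchemas.add(getSchema(argument));
      }
    }
    // An unresolved type variable is not a registered class, so it fails here.
    Provider provider = providers.get(rawType);
    if (provider == null) {
      throw new IllegalArgumentException("No provider for " + type.getTypeName());
    }
    return provider.schemaFor(argumentSchemas);
  }

  public static String demo() {
    ParameterizedSchemaSketch registry = new ParameterizedSchemaSketch();
    registry.register(String.class, typeArgs -> "STRING");
    registry.register(
        MyClass.class, typeArgs -> "ROW<field1: " + typeArgs.get(0) + ", field2: STRING>");
    // The full type is only known where the class is used.
    Type useSiteType = new TypeToken<MyClass<String>>() {}.captured();
    return registry.getSchema(useSiteType);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints ROW<field1: STRING, field2: STRING>
  }
}
```

Nothing about String is known when MyClass.class is registered; the type
argument only shows up in the type captured at the use site.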

I expect this would be a decent amount of work in the schema machinery, and
the AutoValueSchemaProvider would also need to be type-variable aware.
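
As a rough idea of what "type-variable aware" could mean, the sketch below
uses only java.lang.reflect to bind T to the actual type argument before
reading getter types. TypeVariableSketch, UserFn and resolveGetters are
hypothetical names for illustration, not Beam APIs, and the sketch only
handles a getter that returns the type variable directly. Nested uses such as
List<T> would need a recursive substitution, which is the harder case.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: hypothetical names, plain reflection, no Beam classes. */
public class TypeVariableSketch {

  /** Same shape as the class from the original question, minus AutoValue. */
  public abstract static class MyClass<T> {
    public abstract T getField1();
    public abstract String getField2();
  }

  /** Hypothetical user fn whose method signature carries the resolved type. */
  public static class UserFn {
    public void process(MyClass<Integer> element) {}
  }

  /** Maps each getter name to its return type, with type variables substituted. */
  public static Map<String, String> resolveGetters(ParameterizedType type) {
    Class<?> raw = (Class<?>) type.getRawType();

    // Bind each declared type variable (T) to the actual argument (Integer).
    Map<TypeVariable<?>, Type> bindings = new HashMap<>();
    TypeVariable<?>[] variables = raw.getTypeParameters();
    Type[] arguments = type.getActualTypeArguments();
    for (int i = 0; i < variables.length; i++) {
      bindings.put(variables[i], arguments[i]);
    }

    // Sorted map so the result does not depend on reflection ordering.
    Map<String, String> fieldTypes = new TreeMap<>();
    for (Method method : raw.getDeclaredMethods()) {
      if (method.getName().startsWith("get") && method.getParameterCount() == 0) {
        Type declared = method.getGenericReturnType();
        // Only a bare type variable is substituted; List<T> is left as-is.
        Type resolved = bindings.getOrDefault(declared, declared);
        fieldTypes.put(method.getName(), resolved.getTypeName());
      }
    }
    return fieldTypes;
  }

  public static Map<String, String> demo() {
    try {
      // Generic parameter types of a concrete method survive erasure.
      Method process = UserFn.class.getDeclaredMethod("process", MyClass.class);
      Type elementType = process.getGenericParameterTypes()[0];
      return resolveGetters((ParameterizedType) elementType);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // prints {getField1=java.lang.Integer, getField2=java.lang.String}
    System.out.println(demo());
  }
}
```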

Kenn

On Sun, Feb 10, 2019 at 8:24 PM Reuven Lax <re...@google.com> wrote:

> Ok, so actually SchemaRegistry is based on TypeDescriptors, so it does not
> have this limitation (I was wrong about that).
>
> However, I'm still not sure that the @DefaultSchema annotation-based
> registration would work here. Right now it tries to infer a schema eagerly,
> which clearly would not work. I guess we could create a SchemaProvider that
> lazily resolves the schema only upon use, when we should have a good
> TypeDescriptor. However, I'm still worried that we often won't have a good
> type descriptor. It works well for DoFn, because usually the user's DoFn is
> a concrete class with resolved types. I'm not sure that this is easy to do
> with AutoValue; the user can't create a concrete subclass of their
> AutoValue class, as that won't work with the code AutoValue generates.
>
> Reuven
>
> On Sun, Feb 10, 2019 at 8:00 PM Kenneth Knowles <k...@google.com> wrote:
>
>> Hmm, this is a huge limitation relative to the CoderRegistry, which very
>> explicitly does support constructing parameterized coders via
>> CoderProvider. The root CoderProvider is still keyed on raw type, but the
>> CoderProvider is passed inferred coders for the concrete parameters. Here's
>> how List.class is registered:
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L116
>>
>> The one thing that _is_ required for this is that at the call site a good
>> TypeDescriptor is captured. That is mostly automatic for DoFns, hence the
>> CoderRegistry works fairly well. There are special methods in various user
>> fns and boilerplate in transforms like MapElements to provide a good
>> TypeDescriptor.
>>
>> Kenn
>>
>> On Sun, Feb 10, 2019 at 5:11 PM Reuven Lax <re...@google.com> wrote:
>>
>>> This is an interesting question.
>>>
>>> In general, I don't think schema inference can handle these generics
>>> today. Right now the SchemaRegistry is keyed off the Java class, and due to
>>> type erasure all different instances of MyClass<T> will look the same.
>>>
>>> Now it might be possible to include generic type parameters in the
>>> registry. You would not be able to use the @DefaultSchema annotation to
>>> infer a schema, but you might be able to dynamically register a schema
>>> using a TypeDescriptor. Unfortunately I think this would only sometimes
>>> work, e.g. my experience has been that given a type T you can often figure
>>> out T using reflection, but if there are nested types (e.g. List<T>) then
>>> Java doesn't always preserve these types for introspection.
>>>
>>> In sum, I think we could do a bit better for these types of classes, but
>>> not a whole lot better.
>>>
>>> Reuven
>>>
>>> On Mon, Feb 4, 2019 at 6:02 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>>>
>>>> I've started experimenting with Beam schemas in the context of creating
>>>> custom AutoValue-based classes and using AutoValueSchema to generate
>>>> schemas and thus coders.
>>>>
>>>> AFAICT, schemas need to have types fully specified, so it doesn't
>>>> appear to be possible to define an AutoValue class with a type parameter
>>>> and then create a schema for it. Basically, I want to confirm whether the
>>>> following type would ever be possible to create a schema for:
>>>>
>>>> @DefaultSchema(AutoValueSchema.class)
>>>> @AutoValue
>>>> public abstract class MyClass<T> {
>>>>   public abstract T getField1();
>>>>   public abstract String getField2();
>>>>   public static <T> MyClass<T> of(T field1, String field2) {
>>>>     return new AutoValue_MyClass<>(field1, field2);
>>>>   }
>>>> }
>>>>
>>>> This may be an entirely reasonable limitation of the schema machinery,
>>>> but I want to make sure I'm not missing something.
>>>>
>>>
