Some initial thoughts:

Making schema inference handle generic classes would be a nice improvement
- users occasionally bump into this restriction, and there's no reason not
to improve it.

I would recommend using the new Java reflection APIs (i.e.
getRecordComponents) to directly infer the schema. I think we'll end up
with less error-prone code that way.
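As a rough illustration of what I mean (just a sketch, not Beam code: the Purchase record and the inferFields helper are made-up names, and a real implementation would build a Beam Schema rather than strings), the reflection side could look something like this on Java 16+:

```java
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;

public class RecordSchemaSketch {
  // Hypothetical example record; not part of Beam.
  record Purchase(long userId, String item, double price) {}

  // Hypothetical helper: returns "name:type" for each record component,
  // in declaration order, as a stand-in for building real schema fields.
  static List<String> inferFields(Class<?> clazz) {
    if (!clazz.isRecord()) {
      throw new IllegalArgumentException(clazz.getName() + " is not a record");
    }
    List<String> fields = new ArrayList<>();
    for (RecordComponent component : clazz.getRecordComponents()) {
      fields.add(component.getName() + ":" + component.getType().getSimpleName());
    }
    return fields;
  }

  public static void main(String[] args) {
    // Prints [userId:long, item:String, price:double]
    System.out.println(inferFields(Purchase.class));
  }
}
```

getRecordComponents returns the components in the order they are declared, so the field order falls out of the record header without any heuristics.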

We should still use the codegen path for generating efficient Row objects
here; otherwise Record classes will end up being significantly less
efficient than regular Java objects. Since I believe that Record classes
compile down to normal classes, we should be able to reuse the existing
code (i.e. JavaFieldSchema.java and POJOUtils.java) with maybe some small
modifications.
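To show why I think the existing field-based code could mostly be reused, here is a small sketch (the Point record and both helper methods are hypothetical names, used only for illustration) that inspects a record with ordinary class reflection. Each component shows up as a private final field plus a public accessor named after the component (x(), not getX()), which is probably where the small modifications would come in:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

public class RecordAsClassSketch {
  // Hypothetical example record; not part of Beam.
  record Point(int x, int y) {}

  // Names of the instance fields that are both private and final.
  static Set<String> privateFinalFields(Class<?> clazz) {
    Set<String> names = new TreeSet<>();
    for (Field field : clazz.getDeclaredFields()) {
      int mods = field.getModifiers();
      if (!Modifier.isStatic(mods) && Modifier.isPrivate(mods) && Modifier.isFinal(mods)) {
        names.add(field.getName());
      }
    }
    return names;
  }

  // True if the class declares a zero-argument method with the given name.
  static boolean hasAccessor(Class<?> clazz, String name) {
    try {
      clazz.getDeclaredMethod(name);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(privateFinalFields(Point.class)); // prints [x, y]
    System.out.println(hasAccessor(Point.class, "x"));   // prints true
    System.out.println(hasAccessor(Point.class, "getX")); // prints false
  }
}
```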

On Mon, Apr 15, 2024 at 8:03 AM Maciej Szwaja via dev <dev@beam.apache.org>
wrote:

> Hi team,
>
> I'd like to propose a new Java SDK extension feature: support for
> Java record schema inference. See the design doc here:
> https://docs.google.com/document/d/1zSQ9cnqtVM8ttJEuHBDE6hw4qjUuJy1dpZWB6IBTuOs/edit?usp=sharing
>
> In short - adding this extension's jar to the classpath would enable users
> to use java 17 record classes as elements of the PCollections simply by
> annotating them with DefaultSchema annotation (pointing to the new
> RecordSchema provider) similarly to how it's currently possible with
> JavaBean or AutoValue classes.
>
> Let me know what you think. There's already an open feature request
> created last year (https://github.com/apache/beam/issues/27802); I
> could simply take it and start working on it if the proposal gets approved.
>
> Thanks,
> Maciej
>
