Re: Incomplete Beam Schema -> Avro Schema conversion
It's still better to have an asymmetric conversion that supports more data types than to leave these unimplemented, right? This contribution seems simple enough, but that's definitely not true for the other direction (... and I'm also biased, I only need Beam->Avro).

Brian Hulette via dev wrote (Tue, Aug 23, 2022, 1:53):

> I don't think there's a reason for this, it's just that these logical
> types were defined after the Avro <-> Beam schema conversion. I think it
> would be worthwhile to add support for them, but we'd also need to look at
> the reverse (avro to beam) direction, which would map back to the catch-all
> DATETIME primitive type [1]. Changing that could break backwards
> compatibility.
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L771-L776
Re: Incomplete Beam Schema -> Avro Schema conversion
I don't think there's a reason for this, it's just that these logical types were defined after the Avro <-> Beam schema conversion. I think it would be worthwhile to add support for them, but we'd also need to look at the reverse (Avro to Beam) direction, which would map back to the catch-all DATETIME primitive type [1]. Changing that could break backwards compatibility.

[1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L771-L776

On Wed, Aug 17, 2022 at 2:53 PM Balázs Németh wrote:

> Am I missing a reason for that or is it just not done properly yet? If
> it is the latter, am I right to assume that they should be represented in
> the Avro format the same way as the already existing cases?
> "beam:logical_type:date:v1" vs "DATE"
> "beam:logical_type:time:v1" vs "TIME"
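
The backward-compatibility concern above can be made concrete with a toy model. This is only a sketch, not Beam code: the enums and method names are hypothetical stand-ins, and the reverse mapping of an Avro date to DATETIME is assumed from the description in this thread. It shows that once Beam -> Avro handles the date logical type, a round trip no longer returns the type it started with.

```java
/** Toy model of the two conversion directions; all names here are hypothetical. */
final class RoundTripSketch {
  private RoundTripSketch() {}

  /** Stand-ins for a Beam primitive type and a Beam logical type. */
  enum BeamType { DATETIME, LOGICAL_DATE }

  /** Stand-ins for Avro logical types. */
  enum AvroType { TIMESTAMP_MILLIS, DATE }

  /** Beam -> Avro, including the proposed (assumed) mapping for the date logical type. */
  static AvroType toAvro(BeamType type) {
    switch (type) {
      case DATETIME:
        return AvroType.TIMESTAMP_MILLIS;
      case LOGICAL_DATE:
        return AvroType.DATE; // the addition discussed in this thread
      default:
        throw new IllegalArgumentException("Unhandled type " + type);
    }
  }

  /** Avro -> Beam as described in the thread: everything maps to the catch-all DATETIME. */
  static BeamType toBeam(AvroType type) {
    switch (type) {
      case TIMESTAMP_MILLIS:
      case DATE:
        return BeamType.DATETIME;
      default:
        throw new IllegalArgumentException("Unhandled type " + type);
    }
  }

  /** True if converting to Avro and back yields the original Beam type. */
  static boolean roundTrips(BeamType type) {
    return toBeam(toAvro(type)) == type;
  }

  public static void main(String[] args) {
    for (BeamType type : BeamType.values()) {
      AvroType avro = toAvro(type);
      System.out.println(
          type + " -> " + avro + " -> " + toBeam(avro) + " (round-trips: " + roundTrips(type) + ")");
    }
  }
}
```

Changing `toBeam` so that DATE maps back to the logical type would fix the round trip, but it would also change the schema that existing Avro readers get, which is the compatibility risk mentioned above.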
Incomplete Beam Schema -> Avro Schema conversion
java.lang.RuntimeException: Unhandled logical type beam:logical_type:date:v1
    at org.apache.beam.sdk.schemas.utils.AvroUtils.getFieldSchema(AvroUtils.java:943)
    at org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroField(AvroUtils.java:306)
    at org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java:341)
    at org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java

In
https://github.com/apache/beam/blob/7bb755906c350d77ba175e1bd990096fbeaf8e44/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L902-L944
it seems to me there are some missing options.

For example
- FixedBytes.IDENTIFIER,
- EnumerationType.IDENTIFIER,
- OneOfType.IDENTIFIER
are there, but:
- org.apache.beam.sdk.schemas.logicaltypes.Date.IDENTIFIER ("beam:logical_type:date:v1")
- org.apache.beam.sdk.schemas.logicaltypes.DateTime.IDENTIFIER ("beam:logical_type:datetime:v1")
- org.apache.beam.sdk.schemas.logicaltypes.Time.IDENTIFIER ("beam:logical_type:time:v1")
are missing.

This is an example that fails:

    import java.time.LocalDate;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils;
    import org.apache.beam.sdk.schemas.Schema;
    import org.apache.beam.sdk.schemas.Schema.FieldType;
    import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes;
    import org.apache.beam.sdk.schemas.utils.AvroUtils;
    import org.apache.beam.sdk.values.Row;

    // ...

    // A schema with a single field that uses the DATE logical type.
    final Schema schema =
        Schema.builder()
            .addField("ymd", FieldType.logicalType(SqlTypes.DATE))
            .build();

    final Row row =
        Row.withSchema(schema)
            .withFieldValue("ymd", LocalDate.now())
            .build();

    // The BigQuery conversion handles the logical type.
    System.out.println(BigQueryUtils.toTableSchema(schema)); // works
    System.out.println(BigQueryUtils.toTableRow(row)); // works

    // The Avro conversion throws the RuntimeException shown above.
    System.out.println(AvroUtils.toAvroSchema(schema)); // fails
    System.out.println(AvroUtils.toGenericRecord(row)); // fails

Am I missing a reason for that or is it just not done properly yet? If it is the latter, am I right to assume that they should be represented in the Avro format the same way as the already existing cases?
"beam:logical_type:date:v1" vs "DATE"
"beam:logical_type:time:v1" vs "TIME"
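
For reference, here is a rough sketch of the raw values a Beam -> Avro mapping would have to produce if the Avro specification's `date`, `time-millis` and `time-micros` logical types were chosen as targets. That choice is an assumption, not something settled in this thread, and the class and method names below are hypothetical, not part of AvroUtils. The sketch uses only java.time.

```java
import java.time.LocalDate;
import java.time.LocalTime;

/** Hypothetical helper, not part of Beam: computes the values Avro logical types store. */
final class AvroTemporalSketch {
  private AvroTemporalSketch() {}

  /** Avro "date" annotates an int: the number of days since 1970-01-01. */
  static int toAvroDate(LocalDate date) {
    return Math.toIntExact(date.toEpochDay());
  }

  /** Avro "time-millis" annotates an int: milliseconds after midnight. */
  static int toAvroTimeMillis(LocalTime time) {
    return Math.toIntExact(time.toNanoOfDay() / 1_000_000L);
  }

  /** Avro "time-micros" annotates a long: microseconds after midnight. */
  static long toAvroTimeMicros(LocalTime time) {
    return time.toNanoOfDay() / 1_000L;
  }

  public static void main(String[] args) {
    LocalDate date = LocalDate.of(2022, 8, 17);
    LocalTime time = LocalTime.of(14, 53);
    System.out.println("date        = " + toAvroDate(date));
    System.out.println("time-millis = " + toAvroTimeMillis(time));
    System.out.println("time-micros = " + toAvroTimeMicros(time));
  }
}
```

Note that `time-millis` and `time-micros` both truncate sub-unit precision, so whichever one is picked determines how much of a `LocalTime` value survives the conversion.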