Re: Incomplete Beam Schema -> Avro Schema conversion

2022-09-09 Thread Balázs Németh
Isn't it still better to have an asymmetric conversion that supports more
data types than to not have these implemented at all? This contribution
seems simple enough, but that's definitely not true for the other direction
(... and I'm also biased, since I only need Beam->Avro).
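
To illustrate why the Beam->Avro direction looks small, here is a minimal
sketch of the value-level encodings only. It relies on the Avro
specification, which defines "date" as an int counting days since the Unix
epoch and "time-millis" as an int counting milliseconds after midnight. The
class and method names are hypothetical, and this is not how AvroUtils is
implemented.

```java
import java.time.LocalDate;
import java.time.LocalTime;

// Hypothetical helper, not part of Beam or Avro.
final class AvroTemporalEncodingSketch {
  // Avro "date": int, number of days since 1970-01-01.
  static int toAvroDate(LocalDate date) {
    return Math.toIntExact(date.toEpochDay());
  }

  // Avro "time-millis": int, milliseconds after midnight.
  // Sub-millisecond precision is truncated here.
  static int toAvroTimeMillis(LocalTime time) {
    return Math.toIntExact(time.toNanoOfDay() / 1_000_000L);
  }

  // Decoding a "date" value back into a LocalDate is lossless.
  static LocalDate fromAvroDate(int daysSinceEpoch) {
    return LocalDate.ofEpochDay(daysSinceEpoch);
  }

  public static void main(String[] args) {
    LocalDate today = LocalDate.now();
    int encoded = toAvroDate(today);
    System.out.println(encoded + " -> " + fromAvroDate(encoded));
    System.out.println(toAvroTimeMillis(LocalTime.now()));
  }
}
```

The hard part is not this encoding but deciding what the decoded Avro value
should become on the Beam side.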


Re: Incomplete Beam Schema -> Avro Schema conversion

2022-08-22 Thread Brian Hulette via dev
I don't think there's a reason for this; it's just that these logical types
were defined after the Avro <-> Beam schema conversion. I think it would be
worthwhile to add support for them, but we'd also need to look at the
reverse (Avro to Beam) direction, which would map back to the catch-all
DATETIME primitive type [1]. Changing that could break backwards
compatibility.

[1]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L771-L776
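
The round-trip concern can be sketched with plain strings standing in for
the schema types. This is a simplified model, not the AvroUtils code: the
class and method names are hypothetical, the date branch in the forward
direction is the hypothetical addition, and the mapping of DATETIME to
Avro's "timestamp-millis" is an assumption about the existing behavior.

```java
// Hypothetical stand-alone model of the two conversion directions.
final class RoundTripSketch {
  // Forward direction (Beam -> Avro).
  static String beamToAvro(String beamType) {
    switch (beamType) {
      case "DATETIME":
        return "timestamp-millis"; // assumed existing behavior
      case "beam:logical_type:date:v1":
        return "date"; // hypothetical new branch
      default:
        throw new IllegalArgumentException("Unhandled Beam type " + beamType);
    }
  }

  // Reverse direction (Avro -> Beam): everything collapses into DATETIME.
  static String avroToBeam(String avroLogicalType) {
    switch (avroLogicalType) {
      case "timestamp-millis":
      case "date":
        return "DATETIME";
      default:
        throw new IllegalArgumentException(
            "Unhandled Avro logical type " + avroLogicalType);
    }
  }

  // True if converting to Avro and back yields the original Beam type.
  static boolean roundTrips(String beamType) {
    return beamType.equals(avroToBeam(beamToAvro(beamType)));
  }

  public static void main(String[] args) {
    System.out.println(roundTrips("DATETIME")); // prints "true"
    System.out.println(roundTrips("beam:logical_type:date:v1")); // prints "false"
  }
}
```

Making the second case round-trip would mean changing what avroToBeam
returns for "date", and that is the change existing users could notice.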


Incomplete Beam Schema -> Avro Schema conversion

2022-08-17 Thread Balázs Németh
java.lang.RuntimeException: Unhandled logical type beam:logical_type:date:v1
  at org.apache.beam.sdk.schemas.utils.AvroUtils.getFieldSchema(AvroUtils.java:943)
  at org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroField(AvroUtils.java:306)
  at org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java:341)
  at org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java

In
https://github.com/apache/beam/blob/7bb755906c350d77ba175e1bd990096fbeaf8e44/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L902-L944
it seems to me there are some missing options.

For example
- FixedBytes.IDENTIFIER,
- EnumerationType.IDENTIFIER,
- OneOfType.IDENTIFIER
are there, but:
- org.apache.beam.sdk.schemas.logicaltypes.Date.IDENTIFIER ("beam:logical_type:date:v1")
- org.apache.beam.sdk.schemas.logicaltypes.DateTime.IDENTIFIER ("beam:logical_type:datetime:v1")
- org.apache.beam.sdk.schemas.logicaltypes.Time.IDENTIFIER ("beam:logical_type:time:v1")
are missing.
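
The pattern behind the exception can be sketched as a dispatch on the
identifier string that falls through to a throw. This is a simplified
stand-alone model, not the actual AvroUtils code: the class, method, and
flag names are hypothetical, only the "DATE" and "TIME" cases mentioned at
the end of this message are modeled as handled, and the Avro logical type
names chosen as targets ("date", "time-millis") are assumptions.

```java
// Hypothetical model of an identifier-based dispatch.
final class IdentifierDispatchSketch {
  // With acceptNewIdentifiers == false this models the current behavior;
  // with true it models treating the newer identifiers like the old ones.
  static String avroLogicalTypeFor(String identifier, boolean acceptNewIdentifiers) {
    switch (identifier) {
      case "DATE":
        return "date";
      case "TIME":
        return "time-millis";
      case "beam:logical_type:date:v1":
        if (acceptNewIdentifiers) {
          return "date";
        }
        break;
      case "beam:logical_type:time:v1":
        if (acceptNewIdentifiers) {
          return "time-millis";
        }
        break;
      default:
        break;
    }
    // Same message shape as in the stack trace above.
    throw new RuntimeException("Unhandled logical type " + identifier);
  }

  public static void main(String[] args) {
    System.out.println(avroLogicalTypeFor("DATE", false));
    System.out.println(avroLogicalTypeFor("beam:logical_type:date:v1", true));
    try {
      avroLogicalTypeFor("beam:logical_type:date:v1", false);
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Which Avro logical type is the right target for each newer identifier is a
design question this sketch does not settle.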

This is an example that fails:

import java.time.LocalDate;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.Schema.FieldType;
import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes;
import org.apache.beam.sdk.schemas.utils.AvroUtils;
import org.apache.beam.sdk.values.Row;

// ...

final Schema schema =
    Schema.builder()
        .addField("ymd", FieldType.logicalType(SqlTypes.DATE))
        .build();

final Row row =
    Row.withSchema(schema)
        .withFieldValue("ymd", LocalDate.now())
        .build();

System.out.println(BigQueryUtils.toTableSchema(schema)); // works
System.out.println(BigQueryUtils.toTableRow(row)); // works

System.out.println(AvroUtils.toAvroSchema(schema)); // fails
System.out.println(AvroUtils.toGenericRecord(row)); // fails


Am I missing a reason for that, or is it just not implemented yet? If it is
the latter, am I right to assume that they should be represented in the
Avro format the same way as the already existing cases?
"beam:logical_type:date:v1" vs "DATE"
"beam:logical_type:time:v1" vs "TIME"