Re: [PR] AVRO-3884: Add `local-timestamp-nanos` and `timestamp-nanos` [avro]
tjwp commented on code in PR #2554:
URL: https://github.com/apache/avro/pull/2554#discussion_r1437935439

## doc/content/en/docs/++version++/Specification/_index.md:
## @@ -852,25 +852,25 @@ The `time-micros` logical type represents a time of day, with no reference to a

 A `time-micros` logical type annotates an Avro `long`, where the long stores the number of microseconds after midnight, 00:00:00.00.

-### Timestamp (millisecond precision) {#timestamp_ms}
-The `timestamp-millis` logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one millisecond. Please note that time zone information gets lost in this process. Upon reading a value back, we can only reconstruct the instant, but not the original representation. In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment.
+### Timestamps {#timestamps}

-A `timestamp-millis` logical type annotates an Avro `long`, where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000 UTC.
+The `timestamp-{millis,micros,nanos}` logical type represents an instant on the global timeline, independent of a particular time zone or calendar. Upon reading a value back, we can only reconstruct the instant, but not the original representation. In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment.

-### Timestamp (microsecond precision)
-The `timestamp-micros` logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one microsecond. Please note that time zone information gets lost in this process. Upon reading a value back, we can only reconstruct the instant, but not the original representation. In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment.
+- `timestamp-millis`: logical type annotates an Avro `long`, where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000.
+- `timestamp-micros`: logical type annotates an Avro `long`, where the long stores the number of microseconds from the unix epoch, 1 January 1970 00:00:00.00.
+- `timestamp-nanos`: logical type annotates an Avro `long`, where the long stores the number of nanoseconds from the unix epoch, 1 January 1970 00:00:00.0.

-A `timestamp-micros` logical type annotates an Avro `long`, where the long stores the number of microseconds from the unix epoch, 1 January 1970 00:00:00.00 UTC.
+Example: Given an event at noon local time (12:00) on January 1, 2000, in Helsinki where the local time was two hours east of UTC (UTC+2). The timestamp is first shifted to UTC 2000-01-01T10:00:00 and that is then converted to Avro long 94672080 (milliseconds) and written.

-### Local timestamp (millisecond precision) {#local_timestamp_ms}
-The `local-timestamp-millis` logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local, with a precision of one millisecond.
+### Local Timestamps {#local_timestamp}

-A `local-timestamp-millis` logical type annotates an Avro `long`, where the long stores the number of milliseconds, from 1 January 1970 00:00:00.000.
+The `local-timestamp-{millis,micros,nanos}` logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local.

-### Local timestamp (microsecond precision)
-The `local-timestamp-micros` logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local, with a precision of one microsecond.
+- `local-timestamp-millis`: logical type annotates an Avro `long`, where the long stores the number of milliseconds, from 1 January 1970 00:00:00.000.
+- `local-timestamp-micros`: logical type annotates an Avro `long`, where the long stores the number of microseconds, from 1 January 1970 00:00:00.00.
+- `local-timestamp-nanos`: logical type annotates an Avro `long`, where the long stores the number of nanoseconds, from 1 January 1970 00:00:00.0.

-A `local-timestamp-micros` logical type annotates an Avro `long`, where the long stores the number of microseconds, from 1 January 1970 00:00:00.00.
+Example: Given an event at noon local time (12:00) on January 1, 2000, in Helsinki where the local time was two hours east of UTC (UTC+2). The timestamp is converted to Avro long 94668480 (milliseconds) and then written.

Review Comment:
@Fokko Is the example value `94668480` correct? This corresponds to `2000-01-01 00:00:00 UTC`. I was expecting that the value would
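For reference, the arithmetic in both examples can be reproduced with `java.time`. This is a minimal sketch, not code from the PR. It rests on two assumptions:

- Helsinki is treated as a fixed `UTC+2` offset on that date, with no time-zone database lookup.
- `local-timestamp-*` counts the wall-clock fields from 1970-01-01T00:00:00 as if they were UTC.

Under those assumptions, noon in Helsinki gives 946720800000 for `timestamp-millis` and 946728000000 for `local-timestamp-millis`. Midnight UTC on 2000-01-01, the instant mentioned in the review, would be 946684800000.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

public class TimestampExample {

  // timestamp-*: resolve the wall-clock time to an instant using the offset in
  // effect (assumed here to be a fixed UTC+2), then count from the Unix epoch.
  static Instant toInstant(LocalDateTime wallClock, ZoneOffset offset) {
    return wallClock.toInstant(offset);
  }

  // local-timestamp-*: count the wall-clock fields from 1970-01-01T00:00:00
  // as if they were UTC, so no zone shift is applied (assumption).
  static Instant asLocal(LocalDateTime wallClock) {
    return wallClock.toInstant(ZoneOffset.UTC);
  }

  static long millis(Instant instant) {
    return instant.toEpochMilli();
  }

  static long micros(Instant instant) {
    return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
  }

  static long nanos(Instant instant) {
    // Fits in a long for dates up to roughly the year 2262.
    return ChronoUnit.NANOS.between(Instant.EPOCH, instant);
  }

  public static void main(String[] args) {
    LocalDateTime noonHelsinki = LocalDateTime.of(2000, 1, 1, 12, 0);
    Instant utc = toInstant(noonHelsinki, ZoneOffset.ofHours(2));
    Instant local = asLocal(noonHelsinki);
    System.out.println("timestamp-millis       = " + millis(utc));   // 946720800000
    System.out.println("local-timestamp-millis = " + millis(local)); // 946728000000
    System.out.println("timestamp-micros       = " + micros(utc));
    System.out.println("timestamp-nanos        = " + nanos(utc));
  }
}
```

The micro- and nanosecond values are the same instant scaled by 1,000 and 1,000,000, which is why all three precisions can share one `long`-based description.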
[jira] [Resolved] (AVRO-3921) Test against Ruby 3.3
[ https://issues.apache.org/jira/browse/AVRO-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Perkins resolved AVRO-3921.
Fix Version/s: 1.12.0
Resolution: Fixed

> Test against Ruby 3.3
>
> Key: AVRO-3921
> URL: https://issues.apache.org/jira/browse/AVRO-3921
> Project: Apache Avro
> Issue Type: Test
> Components: ruby
> Affects Versions: 1.12.0
> Reporter: Tim Perkins
> Assignee: Tim Perkins
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 1.12.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Ruby 3.3.0 was released on December 25, 2023

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (AVRO-3922) Add timestamp-nanos support to Ruby
[ https://issues.apache.org/jira/browse/AVRO-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated AVRO-3922:
Labels: pull-request-available (was: )

> Add timestamp-nanos support to Ruby
>
> Key: AVRO-3922
> URL: https://issues.apache.org/jira/browse/AVRO-3922
> Project: Apache Avro
> Issue Type: New Feature
> Components: ruby
> Affects Versions: 1.12.0
> Reporter: Tim Perkins
> Assignee: Tim Perkins
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Scoping this to only add timestamp-nanos support and addressing the
> local-timestamp-* logical types separately.
Re: [PR] AVRO-3918: add uuid with bytes and fixed [avro]
Fokko commented on code in PR #2652: URL: https://github.com/apache/avro/pull/2652#discussion_r1437795342 ## doc/content/en/docs/++version++/Specification/_index.md: ## @@ -827,7 +827,31 @@ Here, as scale property is stored in value itself it needs more bytes than prece ### UUID The `uuid` logical type represents a random generated universally unique identifier (UUID). -A `uuid` logical type annotates an Avro `string`. The string has to conform with [RFC-4122](https://www.ietf.org/rfc/rfc4122.txt) +A `uuid` logical type annotates an Avro `string`, `fixed` of length 16 or `bytes`. The string has to conform with [RFC-4122](https://www.ietf.org/rfc/rfc4122.txt) + +The following schemas represent a uuid: +```json +{ + "type": "string", + "logicalType": "uuid" +} +``` + +```json +{ + "type": "fixed", + "size": "16", + "logicalType": "uuid" +} +``` + +```json +{ + "type": "bytes", + "logicalType": "uuid" +} +``` Review Comment: I don't think this adds value since a UUID is always 128 bits. Storing it as bytes will be suboptimal since you first need to encode the length of the bytes which is always the same. ## lang/java/avro/src/main/java/org/apache/avro/LogicalTypes.java: ## @@ -296,8 +296,12 @@ private Uuid() { @Override public void validate(Schema schema) { super.validate(schema); - if (schema.getType() != Schema.Type.STRING) { -throw new IllegalArgumentException("Uuid can only be used with an underlying string type"); + if (schema.getType() != Schema.Type.STRING && schema.getType() != Schema.Type.FIXED + && schema.getType() != Schema.Type.BYTES) { +throw new IllegalArgumentException("Uuid can only be used with an underlying string, fixed or byte type"); + } + if (schema.getType() == Schema.Type.FIXED && schema.getFixedSize() != 2 * Long.BYTES) { Review Comment: Should we create a new constant with `UUID_BYTES`? 
## doc/content/en/docs/++version++/Specification/_index.md: ## @@ -827,7 +827,31 @@ Here, as scale property is stored in value itself it needs more bytes than prece ### UUID The `uuid` logical type represents a random generated universally unique identifier (UUID). -A `uuid` logical type annotates an Avro `string`. The string has to conform with [RFC-4122](https://www.ietf.org/rfc/rfc4122.txt) +A `uuid` logical type annotates an Avro `string`, `fixed` of length 16 or `bytes`. The string has to conform with [RFC-4122](https://www.ietf.org/rfc/rfc4122.txt) + +The following schemas represent a uuid: +```json +{ + "type": "string", + "logicalType": "uuid" +} +``` + +```json +{ + "type": "fixed", + "size": "16", + "logicalType": "uuid" +} +``` + +```json +{ + "type": "bytes", + "logicalType": "uuid" +} +``` +(UUID will be sorted differently if the underlying type is a string, a fixed or a bytes) Review Comment: I would not expect this to be true, but I have to dig into the details. ## lang/java/avro/src/test/java/org/apache/avro/TestLogicalType.java: ## @@ -208,7 +208,7 @@ void uuidExtendsString() { assertEquals(LogicalTypes.uuid(), uuidSchema.getLogicalType()); assertThrows("UUID requires a string", IllegalArgumentException.class, -"Uuid can only be used with an underlying string type", +"Uuid can only be used with an underlying string, fixed or byte type", Review Comment: The current proposal does not include bytes. 
```suggestion "Uuid can only be used with an underlying string or fixed type", ``` ## lang/java/avro/src/main/java/org/apache/avro/Conversions.java: ## @@ -68,6 +80,68 @@ public UUID fromCharSequence(CharSequence value, Schema schema, LogicalType type public CharSequence toCharSequence(UUID value, Schema schema, LogicalType type) { return value.toString(); } + +@Override +public UUID fromFixed(final GenericFixed value, final Schema schema, final LogicalType type) { + long mostSigBits = 0; + long leastSigBits = 0; + byte[] bytes = value.bytes(); + for (int i = 0; i < Long.BYTES; i++) { +mostSigBits |= ((long) (bytes[i] & 255)) << (Byte.SIZE * i); +leastSigBits |= ((long) (bytes[i + Long.BYTES] & 255)) << (Byte.SIZE * i); + } + + return new UUID(this.convert(mostSigBits), this.convert(leastSigBits)); +} + +private long convert(long value) { + if (this.isBigEndian) { +return value; + } else { +return Long.reverseBytes(value); + } +} + +@Override +public UUID fromBytes(final ByteBuffer value, final Schema schema, final LogicalType type) { + BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(value.array(), null); + try { +final long mostSigBits = decoder.readLong(); Review Comment: In Iceberg we [first read it to a
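For the `fixed` variant, the 16 bytes can be produced and read back with `java.nio.ByteBuffer` instead of assembling each `long` byte by byte. `ByteBuffer` is big-endian by default. This is a minimal sketch, not the PR's `Conversions` code, and it assumes the following:

- The most significant 64 bits come first and each half is stored big-endian. This is the same order in which the canonical string form prints the digits.
- The class name and the `UUID_BYTES` constant are hypothetical; the constant follows the reviewer's suggestion.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidFixedSketch {
  // Hypothetical constant along the lines of the `UUID_BYTES` suggestion in the review.
  static final int UUID_BYTES = 2 * Long.BYTES; // 16

  // Writes the most significant 64 bits first; ByteBuffer writes each long big-endian.
  static byte[] toFixed(UUID uuid) {
    return ByteBuffer.allocate(UUID_BYTES)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }

  // Reads 16 big-endian bytes back into a UUID and rejects any other length.
  static UUID fromFixed(byte[] bytes) {
    if (bytes.length != UUID_BYTES) {
      throw new IllegalArgumentException("Expected " + UUID_BYTES + " bytes, got " + bytes.length);
    }
    ByteBuffer buffer = ByteBuffer.wrap(bytes); // big-endian unless order(...) is changed
    long mostSigBits = buffer.getLong();
    long leastSigBits = buffer.getLong();
    return new UUID(mostSigBits, leastSigBits);
  }

  public static void main(String[] args) {
    UUID original = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
    byte[] fixed = toFixed(original);
    System.out.println(fixed.length);                      // 16
    System.out.println(fixed[0] == 0x12);                  // true
    System.out.println(fromFixed(fixed).equals(original)); // true
  }
}
```

Keeping the byte order in the buffer confines the endianness decision to one place. A little-endian layout would instead call `order(ByteOrder.LITTLE_ENDIAN)` on the buffer before reading or writing.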
Re: [PR] AVRO-3918: add uuid with bytes and fixed [avro]
Fokko commented on PR #2652:
URL: https://github.com/apache/avro/pull/2652#issuecomment-1871359420

Thanks @clesaec for working on this; this is great!

@KalleOlaviNiemitalo Thanks for chiming in here. It would be great to get this on the devlist as well: https://lists.apache.org/thread/h4lvc2lk194om6cjm76wzosm22cop2xw Or on the Google doc: https://docs.google.com/document/d/16_oSWrEM7AFUCTe0uuraAEHxywezLfoEz5ahzwvhGUk/edit#heading=h.43xuauwfk7ow

You raised some valid points, but I think we sort on the logical type rather than on the actual physical type. This means that how you store them is more of an implementation detail than something that should be visible to the user. Java should just sort on a `java.util.UUID`.

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@avro.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
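Whether the physical type affects ordering can be checked for `string` versus `fixed`. The Avro specification compares `string` data by Unicode code point and `fixed` data lexicographically by unsigned 8-bit values.

Assume the 16 bytes are stored big-endian. Each byte then becomes two lowercase hex digits, and '0' to '9' sort before 'a' to 'f'. So the canonical string and the bytes should sort the same way.

The sketch below checks this with random UUIDs. The class name is hypothetical, and the sketch does not cover `bytes` or Java's own `UUID.compareTo`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Random;
import java.util.UUID;

public class UuidOrderSketch {
  // 16-byte big-endian layout assumed for a fixed(16) UUID.
  static byte[] toFixed(UUID uuid) {
    return ByteBuffer.allocate(16)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }

  // True if comparing the canonical (lowercase) strings and comparing the bytes
  // as unsigned values put the two UUIDs in the same order.
  static boolean sameOrder(UUID a, UUID b) {
    int byString = Integer.signum(a.toString().compareTo(b.toString()));
    int byBytes = Integer.signum(Arrays.compareUnsigned(toFixed(a), toFixed(b)));
    return byString == byBytes;
  }

  public static void main(String[] args) {
    Random random = new Random(42); // fixed seed so the run is repeatable
    int mismatches = 0;
    for (int i = 0; i < 10_000; i++) {
      UUID a = new UUID(random.nextLong(), random.nextLong());
      UUID b = new UUID(random.nextLong(), random.nextLong());
      if (!sameOrder(a, b)) {
        mismatches++;
      }
    }
    System.out.println("mismatches: " + mismatches); // mismatches: 0
  }
}
```

If each half were stored little-endian instead, the two orders would generally disagree. So the agreement depends on the byte order chosen for `fixed`.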
[jira] [Commented] (AVRO-3921) Test against Ruby 3.3
[ https://issues.apache.org/jira/browse/AVRO-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801032#comment-17801032 ]

ASF subversion and git services commented on AVRO-3921:

Commit a0b277ffcc18ce7f0bc1faa0e0b8c61326e93753 in avro's branch refs/heads/main from Tim Perkins
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=a0b277ffc ]

AVRO-3921: [ruby] Test against Ruby 3.3 (#2655)

> Test against Ruby 3.3
>
> Key: AVRO-3921
> URL: https://issues.apache.org/jira/browse/AVRO-3921
> Project: Apache Avro
> Issue Type: Test
> Components: ruby
> Affects Versions: 1.12.0
> Reporter: Tim Perkins
> Assignee: Tim Perkins
> Priority: Trivial
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Ruby 3.3.0 was released on December 25, 2023
[jira] [Updated] (AVRO-3923) Add Avro 1.11.3 release blog
[ https://issues.apache.org/jira/browse/AVRO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated AVRO-3923:
Labels: pull-request-available (was: )

> Add Avro 1.11.3 release blog
>
> Key: AVRO-3923
> URL: https://issues.apache.org/jira/browse/AVRO-3923
> Project: Apache Avro
> Issue Type: Improvement
> Components: website
> Affects Versions: 1.11.3
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Assigned] (AVRO-3923) Add Avro 1.11.3 release blog
[ https://issues.apache.org/jira/browse/AVRO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong reassigned AVRO-3923:

> Add Avro 1.11.3 release blog
>
> Key: AVRO-3923
> URL: https://issues.apache.org/jira/browse/AVRO-3923
> Project: Apache Avro
> Issue Type: Improvement
> Components: website
> Affects Versions: 1.11.3
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Fix For: 1.12.0
[jira] [Commented] (AVRO-2728) idl2schemata: types with the same name in different namespaces => overwritten files
[ https://issues.apache.org/jira/browse/AVRO-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801013#comment-17801013 ]

Carsten Seibert commented on AVRO-2728:

We are also facing this issue when importing different domains from our quite extensive company models. Any chance to have this prioritised?

> idl2schemata: types with the same name in different namespaces => overwritten files
>
> Key: AVRO-2728
> URL: https://issues.apache.org/jira/browse/AVRO-2728
> Project: Apache Avro
> Issue Type: Bug
> Components: tools
> Affects Versions: 1.10.0, 1.9.1
> Reporter: Jarek Rosiek
> Priority: Major
>
> For the following AVDL:
> {code:java}
> @namespace("ns1")
> protocol Proto {
>   record Foo {
>     ns2.Foo foo;
>   }
>   @namespace("ns2")
>   record Foo {
>     int x;
>   }
> }
> {code}
> the tool will generate just a single file named {{Foo.avsc}} with the schema
> of the last record Foo.
>
> Suggested solution:
> Add an option to the {{idl2schemata}} command named, for example,
> {{-fqnames}} that will make the tool generate files named using fully
> qualified type names. For example, for the protocol shown above the files
> would be named:
> {code:java}
> ns1.Foo.avsc
> ns2.Foo.avsc{code}
>
> Impact:
> My company wanted to use Avro IDL as a specification format for the schemas
> (more readable, ability to import other files). We need to push the schemas
> to Kafka Schema Registry. We could use {{idl2schemata}} to generate schemas
> in JSON format, but we risk pushing the wrong schema whenever different teams
> working on schemas from different namespaces create a type with the same base
> name.
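The collision, and the file naming proposed in the issue, can be modelled in a few lines. This is not the `idl2schemata` code, and `-fqnames` is only the option name suggested in the report. `NamedType` and `plan` are hypothetical, and a map stands in for the files written to disk.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaFileNames {
  // A named schema collected from a protocol (hypothetical, for illustration only).
  static final class NamedType {
    final String namespace; // null or "" means the null namespace
    final String name;

    NamedType(String namespace, String name) {
      this.namespace = namespace;
      this.name = name;
    }

    String fullName() {
      return namespace == null || namespace.isEmpty() ? name : namespace + "." + name;
    }
  }

  // Maps each output file name to the schema written into it. With fqnames == false
  // only the simple name is used, so a later type replaces an earlier one, which
  // mirrors the overwrite described in the issue.
  static Map<String, NamedType> plan(List<NamedType> types, boolean fqnames) {
    Map<String, NamedType> files = new LinkedHashMap<>();
    for (NamedType type : types) {
      String base = fqnames ? type.fullName() : type.name;
      files.put(base + ".avsc", type);
    }
    return files;
  }

  public static void main(String[] args) {
    List<NamedType> types = List.of(new NamedType("ns1", "Foo"), new NamedType("ns2", "Foo"));
    System.out.println(plan(types, false).keySet()); // [Foo.avsc]
    System.out.println(plan(types, true).keySet());  // [ns1.Foo.avsc, ns2.Foo.avsc]
  }
}
```

Because Avro fullnames are unique within a protocol, naming files by fullname avoids the overwrite without any extra bookkeeping.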
[jira] [Updated] (AVRO-3830) Handle namespace properly if a name starts with dot
[ https://issues.apache.org/jira/browse/AVRO-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong updated AVRO-3830:
Component/s: rust

> Handle namespace properly if a name starts with dot
>
> Key: AVRO-3830
> URL: https://issues.apache.org/jira/browse/AVRO-3830
> Project: Apache Avro
> Issue Type: Bug
> Components: rust
> Affects Versions: 1.12.0
> Reporter: Kousuke Saruta
> Assignee: Kousuke Saruta
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.0, 1.11.3
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The specification says the following about names and namespaces:
> ??The empty string may also be used as a namespace to indicate the null
> namespace??
> ??If the name specified contains a dot, then it is assumed to be a fullname,
> and any namespace also specified is ignored??
> According to this specification, if a name in a name field starts with a dot,
> the namespace is considered null and the corresponding namespace field
> should be ignored.
> For example, given the following schema:
> {code}
> {
>   "name": ".record1",
>   "namespace": "ns1",
>   "type": "record",
>   "fields": []
> }
> {code}
> The name and namespace should be "record1" and null respectively.
> But the namespace is considered to be "ns1" in the current Rust binding.
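The two quoted rules can be combined into a small resolution function. The sketch is in Java, the language of the reference implementation, although this issue concerns the Rust binding. It is a minimal sketch under stated assumptions:

- `resolve` and `FullName` are hypothetical names.
- `null` stands for the null namespace.
- Name syntax is not validated.

```java
public class AvroNames {
  // A resolved name; namespace == null stands for the null namespace.
  static final class FullName {
    final String namespace;
    final String name;

    FullName(String namespace, String name) {
      this.namespace = namespace;
      this.name = name;
    }
  }

  // Resolves a "name" attribute, an optional "namespace" attribute and the
  // namespace of the enclosing schema (all hypothetical parameters).
  static FullName resolve(String name, String namespace, String enclosingNamespace) {
    int lastDot = name.lastIndexOf('.');
    if (lastDot >= 0) {
      // The name contains a dot, so it is a fullname and any "namespace" is ignored.
      // A leading dot leaves an empty prefix, i.e. the null namespace.
      String prefix = name.substring(0, lastDot);
      return new FullName(prefix.isEmpty() ? null : prefix, name.substring(lastDot + 1));
    }
    // No dot: use the explicit namespace if given, otherwise the enclosing one.
    String ns = namespace != null ? namespace : enclosingNamespace;
    // The empty string also denotes the null namespace.
    return new FullName(ns == null || ns.isEmpty() ? null : ns, name);
  }

  public static void main(String[] args) {
    FullName r = resolve(".record1", "ns1", null);
    System.out.println(r.namespace + " / " + r.name); // null / record1
  }
}
```

Checking for a dot before looking at the `namespace` attribute is what makes `".record1"` ignore `"ns1"`, which is the behaviour the issue asks the Rust binding to adopt.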
[jira] [Assigned] (AVRO-3922) Add timestamp-nanos support to Ruby
[ https://issues.apache.org/jira/browse/AVRO-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Perkins reassigned AVRO-3922:

> Add timestamp-nanos support to Ruby
>
> Key: AVRO-3922
> URL: https://issues.apache.org/jira/browse/AVRO-3922
> Project: Apache Avro
> Issue Type: New Feature
> Components: ruby
> Affects Versions: 1.12.0
> Reporter: Tim Perkins
> Assignee: Tim Perkins
> Priority: Minor
>
> Scoping this to only add timestamp-nanos support and addressing the
> local-timestamp-* logical types separately.
[jira] [Updated] (AVRO-3921) Test against Ruby 3.3
[ https://issues.apache.org/jira/browse/AVRO-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated AVRO-3921:
Labels: pull-request-available (was: )

> Test against Ruby 3.3
>
> Key: AVRO-3921
> URL: https://issues.apache.org/jira/browse/AVRO-3921
> Project: Apache Avro
> Issue Type: Test
> Components: ruby
> Affects Versions: 1.12.0
> Reporter: Tim Perkins
> Assignee: Tim Perkins
> Priority: Trivial
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Ruby 3.3.0 was released on December 25, 2023
[jira] [Resolved] (AVRO-3919) Add UUID type example
[ https://issues.apache.org/jira/browse/AVRO-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong resolved AVRO-3919.
Fix Version/s: 1.12.0
Assignee: Fokko Driesprong
Resolution: Fixed

> Add UUID type example
>
> Key: AVRO-3919
> URL: https://issues.apache.org/jira/browse/AVRO-3919
> Project: Apache Avro
> Issue Type: Improvement
> Components: spec
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
[jira] [Commented] (AVRO-3919) Add UUID type example
[ https://issues.apache.org/jira/browse/AVRO-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801007#comment-17801007 ]

ASF subversion and git services commented on AVRO-3919:

Commit 8e651d8a1f91c9d279388cb22afe8c092b49a216 in avro's branch refs/heads/main from Fokko Driesprong
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=8e651d8a1 ]

AVRO-3919: [Spec] Clarify UUID type (#2646)

With an example as we have with the other types as well.

> Add UUID type example
>
> Key: AVRO-3919
> URL: https://issues.apache.org/jira/browse/AVRO-3919
> Project: Apache Avro
> Issue Type: Improvement
> Components: spec
> Reporter: Fokko Driesprong
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
Re: [PR] AVRO-3919: [Spec] Clarify UUID type [avro]
Fokko commented on PR #2646:
URL: https://github.com/apache/avro/pull/2646#issuecomment-1871193677

Thanks for the reviews @martin-g and @KalleOlaviNiemitalo
[jira] [Assigned] (AVRO-3921) Test against Ruby 3.3
[ https://issues.apache.org/jira/browse/AVRO-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Perkins reassigned AVRO-3921:

> Test against Ruby 3.3
>
> Key: AVRO-3921
> URL: https://issues.apache.org/jira/browse/AVRO-3921
> Project: Apache Avro
> Issue Type: Test
> Components: ruby
> Affects Versions: 1.12.0
> Reporter: Tim Perkins
> Assignee: Tim Perkins
> Priority: Trivial
>
> Ruby 3.3.0 was released on December 25, 2023
Re: [PR] AVRO-2956: [java] add reserved word escape character to the end of reserved and contextual keywords [avro]
Fokko commented on PR #2544:
URL: https://github.com/apache/avro/pull/2544#issuecomment-1871165852

@stephenkittelson Thanks anyway, and thank you for working on this!