[jira] [Updated] (AVRO-3374) [Java] Fully qualified type reference "ns.int" loses namespace.

2022-05-17 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec updated AVRO-3374:
-
Attachment: AVRO-3374.patch

> [Java] Fully qualified type reference "ns.int" loses namespace.
> ---
>
> Key: AVRO-3374
> URL: https://issues.apache.org/jira/browse/AVRO-3374
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: Ryan Skraba
>Assignee: Christophe Le Saec
>Priority: Minor
> Attachments: AVRO-3374.patch
>
>
> While brainstorming for AVRO-3370, I came across this special case where a 
> type-reference could be considered ambiguous if the SDK is not careful when 
> simplifying inherited namespaces:
> {code:json}
> {
>   "type" : "record",
>   "name" : "ns.int",
>   "fields" : [
>     {"name" : "value", "type" : "int"},
>     {"name" : "next", "type" : [ "null", "ns.int" ]}
>   ]
> }
> {code}
> In Java, if this code is parsed, it works as expected (as a linked list).
> If the schema is turned to a String using toString(), the namespace is 
> dropped off the last {*}{{ns.int}}{*}, turning it into the primitive. That 
> string can still be parsed into a Schema, but the "round-trip" modifies the 
> schema in an incompatible way.
> That namespace shouldn't be dropped when producing the JSON string 
> representing the Schema in Java.
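
The guard that the description asks for can be sketched in a few lines. This is only an illustration under stated assumptions, not Avro's actual serialization code: the class TypeReferenceNames and its reference method are hypothetical names, and the sketch assumes that a reference is shortened whenever its namespace equals the enclosing one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; this is not Avro's implementation.
final class TypeReferenceNames {
  // The eight primitive type names of the Avro specification.
  private static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
      "null", "boolean", "int", "long", "float", "double", "bytes", "string"));

  // Returns the name to write for a reference to fullName inside enclosingNamespace.
  static String reference(String fullName, String enclosingNamespace) {
    int dot = fullName.lastIndexOf('.');
    if (dot < 0) {
      return fullName; // no namespace to simplify
    }
    String namespace = fullName.substring(0, dot);
    String simpleName = fullName.substring(dot + 1);
    boolean inherited = namespace.equals(enclosingNamespace);
    // Shorten only when the short form cannot be mistaken for a primitive.
    if (inherited && !PRIMITIVES.contains(simpleName)) {
      return simpleName;
    }
    return fullName;
  }
}
```

With this rule a reference to "ns.MyRecord" inside namespace "ns" is still written as "MyRecord", while "ns.int" keeps its namespace, so the round-trip no longer turns the record into the primitive.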



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (AVRO-3374) [Java] Fully qualified type reference "ns.int" loses namespace.

2022-05-17 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec updated AVRO-3374:
-
Labels: pull-requests-available  (was: pull-request-available)

> [Java] Fully qualified type reference "ns.int" loses namespace.
> ---
>
> Key: AVRO-3374
> URL: https://issues.apache.org/jira/browse/AVRO-3374
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: Ryan Skraba
>Assignee: Christophe Le Saec
>Priority: Minor
>  Labels: pull-requests-available
> Attachments: AVRO-3374.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Comment Edited] (AVRO-3374) [Java] Fully qualified type reference "ns.int" loses namespace.

2022-05-17 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538178#comment-17538178
 ] 

Christophe Le Saec edited comment on AVRO-3374 at 5/17/22 1:08 PM:
---

Good question indeed.

[For the first part of the sentence|https://avro.apache.org/docs/current/spec.html#names], 
"Primitive type names have no namespace", the example is fine: "ns.int" defines 
a record, not a primitive.

The second part, "their names may not be defined in any namespace", is not 
entirely clear to me. If it means that the names of primitives may not be used 
to define other types such as records, then the JSON shouldn't parse in Java 
(it should throw an exception). If it means that a new primitive type cannot be 
defined in a schema ('positive integer', for instance), then the example is 
consistent with it.


was (Author: JIRAUSER289541):
Good question indeed.

[For first part of 
sentence|https://avro.apache.org/docs/current/spec.html#names] "Primitive type 
names have no namespace", it's OK as in example, "ns.int" is a define a record.

For second part  : "their names may not be defined in any namespace", it's not 
entirely clear to me. If it means that names of primitive may not be used to 
define other types as record; then the json shouldn't be parsed correctly in 
Java (and it should throw an exception). If it means that a new primitive type 
cannot be defined in a schema (said 'positive integer' for instance), the 
example is fine with it.



[jira] [Commented] (AVRO-3374) [Java] Fully qualified type reference "ns.int" loses namespace.

2022-05-17 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538178#comment-17538178
 ] 

Christophe Le Saec commented on AVRO-3374:
--

Good question indeed.

[For the first part of the sentence|https://avro.apache.org/docs/current/spec.html#names], 
"Primitive type names have no namespace", the example is fine: "ns.int" defines 
a record, not a primitive.

The second part, "their names may not be defined in any namespace", is not 
entirely clear to me. If it means that the names of primitives may not be used 
to define other types such as records, then the JSON shouldn't parse in Java 
(it should throw an exception). If it means that a new primitive type cannot be 
defined in a schema ('positive integer', for instance), then the example is 
consistent with it.



[jira] [Created] (AVRO-3523) How to contribute : remove patch section

2022-05-17 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3523:


 Summary: How to contribute : remove patch section
 Key: AVRO-3523
 URL: https://issues.apache.org/jira/browse/AVRO-3523
 Project: Apache Avro
  Issue Type: Task
  Components: community
Reporter: Christophe Le Saec


The [How To 
Contribute|https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute#HowToContribute-CommittingGuidelinesforcommitters]
 page still describes how to submit a patch, but a pull request is enough.

The aim of this Jira is to remove the patch section from the website.





[jira] [Commented] (AVRO-3520) CustomEncoding doesn't expose the read schema

2022-05-18 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538645#comment-17538645
 ] 

Christophe Le Saec commented on AVRO-3520:
--

Hi [~itstheceo],

Could you please supply more context? Code snippets or other details would be 
welcome.

> CustomEncoding doesn't expose the read schema
> -
>
> Key: AVRO-3520
> URL: https://issues.apache.org/jira/browse/AVRO-3520
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: Colin
>Priority: Major
>
> Currently it is not possible to detect a schema change when using 
> `CustomEncoding`.





[jira] [Commented] (AVRO-3438) NPE when serializing Avro GenericRecord with Kryo

2022-05-18 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538782#comment-17538782
 ] 

Christophe Le Saec commented on AVRO-3438:
--

Hello [~tashoyan] ,

This code snippet
{code:java}
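import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;

import org.apache.avro.Schema;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import com.esotericsoftware.kryo.serializers.JavaSerializer;

void test() throws IOException {
  Kryo kryo = new Kryo();

  // Start from an empty file.dat at the root of the classpath
  final URL resource = Thread.currentThread().getContextClassLoader().getResource(".");
  File data = new File(resource.getPath(), "file.dat");
  if (data.exists()) {
    data.delete();
  }
  data.createNewFile();
  Input input = new Input(new FileInputStream(data));
  Output output = new Output(new FileOutputStream(data));

  // A primitive schema carrying one custom property
  final Schema schema = Schema.create(Schema.Type.LONG);
  schema.addProp("Hello", "World");

  // Let standard Java serialization handle this Schema class
  kryo.register(schema.getClass(), new JavaSerializer());

  kryo.writeClassAndObject(output, schema);
  output.close();

  Schema theObject = (Schema) kryo.readClassAndObject(input);
  input.close();

  System.out.println("The schema : " + theObject.getClass().getName());
  System.out.println("  Hello : " + theObject.getProp("Hello"));
}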
{code}
works like a charm (with [kryo 
2.24.0|https://mvnrepository.com/artifact/com.esotericsoftware.kryo/kryo/2.24.0]
 and with the more recent [Kryo 
5.3.0|https://mvnrepository.com/artifact/com.esotericsoftware/kryo/5.3.0]), 
tested with Avro 1.11.0, which still has this anonymous class.
Did I miss something?

> NPE when serializing Avro GenericRecord with Kryo
> -
>
> Key: AVRO-3438
> URL: https://issues.apache.org/jira/browse/AVRO-3438
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0, 1.10.2
>Reporter: Arseniy Tashoyan
>Priority: Major
>
> We have an Apache Flink application, that processes Avro GenericRecords. 
> Flink uses Kryo serialization. The serialization fails:
>  
> com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
> Serialization trace:
> props (org.apache.avro.Schema$Field)
> fieldMap (org.apache.avro.Schema$RecordSchema)
> schema (org.apache.avro.generic.GenericData$Record)
> values (org.apache.avro.generic.GenericData$Record)
> values (org.apache.avro.generic.GenericData$Record)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:82) 
> ~[kryo-2.24.0.jar:?]
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>  ~[kryo-2.24.0.jar:?]
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:599) 
> ~[kryo-2.24.0.jar:?]
> at 
> com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:95)
>  ~[kryo-2.24.0.jar:?]
> at 
> com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:21)
>  ~[kryo-2.24.0.jar:?]
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523) 
> ~[kryo-2.24.0.jar:?]
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61) 
> ~[kryo-2.24.0.jar:?]
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>  ~[kryo-2.24.0.jar:?]
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523) 
> ~[kryo-2.24.0.jar:?]
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61) 
> ~[kryo-2.24.0.jar:?]
> ...
> Caused by: java.lang.NullPointerException
> at org.apache.avro.JsonProperties$2$1$1.<init>(JsonProperties.java:175) 
> ~[avro-1.10.2.jar:1.10.2]
> at org.apache.avro.JsonProperties$2$1.iterator(JsonProperties.java:174) 
> ~[avro-1.10.2.jar:1.10.2]
> at 
> com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:80)
>  ~[kryo-2.24.0.jar:?]
> at 
> com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:21)
>  ~[kryo-2.24.0.jar:?]
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523) 
> ~[kryo-2.24.0.jar:?]
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61) 
> ~[kryo-2.24.0.jar:?]
> ...
>  
> This problem occurs, because Kryo uses Unsafe to copy objects. When creating 
> an object, Unsafe does not call the class constructor. Therefore the field 
> *props* in the class JsonProperties is not properly initialized. The field 
> *props* is implemented via anonymous inner classes. The NullPointerException 
> occurs at JsonProperties.java:175, because the reference to the enclosing 
> instance of AbstractSet is null.
> A possible fix is to avoid using anonymous classes, use normal classes 
> instead.  A similar case: 
> [https://stackoverflow.com/questions/36902471/inner-class-reference-to-enclosing-class-instance-is-null]
>  
> Avro: 1.10.2
> Kryo: 2.24.0
> Flink: 1.12.4
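
The mechanism described above can be reproduced in isolation. The following is a toy sketch, not Avro or Kryo code: Holder and View are hypothetical stand-ins for JsonProperties and its anonymous classes, and the sketch calls sun.misc.Unsafe directly (an internal JDK API, so the compiler may print a warning) instead of going through Kryo. It only shows that an inner-class instance allocated without running a constructor has no enclosing instance.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
import sun.misc.Unsafe;

// Toy reproduction; Holder and View are hypothetical stand-ins, not Avro classes.
final class UnsafeAllocationDemo {

  static final class Holder {
    private final Map<String, String> props = new HashMap<>();

    // Inner (non-static) class: it reads a field of its enclosing Holder.
    final class View {
      int size() {
        return props.size();
      }
    }
  }

  // Allocates a View without running any constructor, then calls it.
  static boolean callFailsWithoutConstructor() {
    try {
      Field field = Unsafe.class.getDeclaredField("theUnsafe");
      field.setAccessible(true);
      Unsafe unsafe = (Unsafe) field.get(null);
      Holder.View view = (Holder.View) unsafe.allocateInstance(Holder.View.class);
      try {
        view.size();
        return false;
      } catch (NullPointerException expected) {
        return true; // the hidden reference to the enclosing Holder is null
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

A static nested class that receives the map through an ordinary field would not depend on the hidden enclosing reference, which is the direction of the fix proposed in the description.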





[jira] [Commented] (AVRO-3130) Schema.java errors with "No type" when a type object is provided instead of text type

2022-05-18 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538855#comment-17538855
 ] 

Christophe Le Saec commented on AVRO-3130:
--

Hello [~honoredb],

It does work with:
{code:json}
{
  "type" : "record",
  "name" : "AccountEvent",
  "fields" : [
    {
      "name" : "NullableLongArray",
      "type" : [
        "null",
        { "type" : "array", "items" : "long" }
      ]
    }
  ]
}
{code}

Maybe I missed something, but I don't see the purpose of the 'accountList' name 
(_as it's not a field_).

> Schema.java errors with "No type" when a type object is provided instead of 
> text type
> -
>
> Key: AVRO-3130
> URL: https://issues.apache.org/jira/browse/AVRO-3130
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Reporter: Aaron Zinger
>Priority: Minor
>
> When a schema object is provided inside a union or as an array items type, 
> the parser looks for a text value for type, doesn't find it, and throws a 
> confusing error message of the form "No type: ".
> The error message could be fixed by having GetRequiredText distinguish 
> between the key not being present and the value not being text, with the 
> latter returning a message like "Value for type must be string".
> However, my preferred change would be to handle schema objects in these 
> situations--it seems like it'd be compliant with the spec to have {"type": 
> "string"} resolve to "string", and it would let schema generation code be a 
> little simpler by letting it reuse schemas. This could also address 
> https://issues.apache.org/jira/browse/AVRO-1977, I think.
> Examples below.
> {code:json}
> {
>   "type": "record",
>   "name": "AccountEvent",
>   "fields": [
>     {
>       "type": [
>         "null",
>         {
>           "name": "accountList",
>           "type": {
>             "type": "array",
>             "items": "long"
>           }
>         }
>       ],
>       "name": "NullableLongArray"
>     }
>   ]
> }
> {code}
> Fails with {code:java}"No type: 
> {\"name\":\"accountList\",\"type\":{\"type\":\"array\",\"items\":\"long\"}}"{code}
> The error is similar for
> {code:json}
> {
> "type": "array",
> "name": "FavoriteNumbers",
> "items": {"type":{"type": ["null","long"], "name": "NullableNumber"}}
> }
> {code}
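
The error-message improvement suggested in the description (distinguishing a missing key from a non-text value) could look roughly like this. It is a sketch on a plain Map, not the real parser: the class RequiredText is a hypothetical stand-in for the GetRequiredText helper named above, and the message wording is only an example.

```java
import java.util.Map;

// Hypothetical stand-in for the parser helper; this is not Avro's actual code.
final class RequiredText {
  // Returns the text stored under key, with a distinct error for each failure.
  static String get(Map<String, ?> node, String key) {
    if (!node.containsKey(key)) {
      throw new IllegalArgumentException("No " + key + ": " + node);
    }
    Object value = node.get(key);
    if (!(value instanceof String)) {
      throw new IllegalArgumentException(
          "Value for " + key + " must be string: " + value);
    }
    return (String) value;
  }
}
```

With such a split, the nested {"type": {...}} case would report that the value is not a string instead of claiming that no type was given.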





[jira] [Commented] (AVRO-3520) CustomEncoding doesn't expose the read schema

2022-05-19 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539481#comment-17539481
 ] 

Christophe Le Saec commented on AVRO-3520:
--

For existing encoded binaries, which do not embed schema information, I suggest 
that the reader try to read the added field:
{code:java}
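  @Override
  protected MyCustomv2 read(Object object, Decoder in) throws IOException {
    MyCustomv2 mc = new MyCustomv2(); // assumed: the instance being filled
    // ... fill in the fields of the first version
    // Fields added in the new version may be missing from old data
    try {
      mc.setNewField(in.readString());
    } catch (IOException ex) {
      // Old binary without the new field: keep its default value
    }
    return mc;
  }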
{code}
(This implies that new fields are added at the end of the record, and it 
supports no other kind of change.)
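
The fallback idea can be tried without Avro at all. The sketch below uses plain java.io streams instead of Avro's Decoder, and Item, encode and decode are hypothetical names chosen for the illustration. It also catches only EOFException, which is narrower than the IOException caught in the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Plain java.io stand-in for the idea; Item and its fields are hypothetical.
final class TrailingFieldReader {
  static final class Item {
    int id;
    String label = "n/a"; // default used when the data predates the field
  }

  // Writes the first-version layout, plus the added field when label is non-null.
  static byte[] encode(int id, String label) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(id);
      if (label != null) {
        out.writeUTF(label);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the first-version field, then tries the field added at the end.
  static Item decode(byte[] data) {
    Item item = new Item();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      item.id = in.readInt();
      try {
        item.label = in.readUTF();
      } catch (EOFException oldFormat) {
        // No trailing bytes: the data was written by the first version.
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return item;
  }
}
```

This only works because the new field sits at the very end of the stream; a field inserted in the middle would be read as garbage rather than detected.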

For future encoded binaries, you could put schema information in the record itself:
{code:java}
final Field version = new Field("version", Schema.create(Schema.Type.INT), 
    "schema version", 0);
...
Schema.createRecord("record", "doc", "namespace", false, 
    Arrays.asList(version, ...));
{code}
Then use this "metadata" field to store the object version.

For instance, with the second version:
{code:java}
@Override
protected void write(Object datum, Encoder out) throws IOException {
  ...
  out.writeInt(2); // version 2
  ...
}

@Override
protected MyCustom_versionX read(Object reuse, Decoder in) throws IOException {
  int version = in.readInt();
  // Branch on the version to run the matching decoding code
}

@Override
protected Schema getSchema() {
  // return version 2
}
(This allows more kinds of modification than simply adding a field, such as 
changing a field type, from int to float for example.)
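
As a self-contained illustration of the version-prefix approach, here is a sketch that again uses plain java.io streams rather than Avro's Encoder and Decoder. VersionedCodec and its methods are hypothetical names; version 1 stores the value as an int and version 2 as a float, matching the example of a type change given above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical codec for illustration; it does not use Avro's API.
final class VersionedCodec {
  // Version 1 layout: version number, then the value as an int.
  static byte[] writeV1(int value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(1);
      out.writeInt(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Version 2 layout: version number, then the value as a float.
  static byte[] writeV2(float value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(2);
      out.writeFloat(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the version first, then decodes the payload accordingly.
  static float read(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int version = in.readInt();
      switch (version) {
        case 1:
          return in.readInt(); // old int payload, widened to float
        case 2:
          return in.readFloat();
        default:
          throw new IllegalStateException("Unknown version: " + version);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The cost is one extra int per record, and every reader must know all versions that may still be found in stored data.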

Hope it will help.



[jira] [Commented] (AVRO-3523) How to contribute : remove patch section

2022-05-19 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539518#comment-17539518
 ] 

Christophe Le Saec commented on AVRO-3523:
--

Indeed, I hadn't thought about that.

[~rskraba], what is your view on this: should I cancel this Jira?

> How to contribute : remove patch section
> 
>
> Key: AVRO-3523
> URL: https://issues.apache.org/jira/browse/AVRO-3523
> Project: Apache Avro
>  Issue Type: Task
>  Components: community
>Reporter: Christophe Le Saec
>Priority: Trivial
>  Labels: documentation
>
> The [How To 
> Contribute|https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute#HowToContribute-CommittingGuidelinesforcommitters]
>  page still describes how to submit a patch, but a pull request is enough.
> The aim of this Jira is to remove the patch section from the website.





[jira] [Commented] (AVRO-3520) CustomEncoding doesn't expose the read schema

2022-05-23 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540841#comment-17540841
 ] 

Christophe Le Saec commented on AVRO-3520:
--

Great job!

It would be even better if the schema were supplied to the 
CustomEncoding.read(Object reuse, Decoder in) method, i.e. 
CustomEncoding.read(Object reuse, Decoder in, Schema.Field f), but that would 
require too many changes.

> CustomEncoding doesn't expose the read schema
> -
>
> Key: AVRO-3520
> URL: https://issues.apache.org/jira/browse/AVRO-3520
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: Colin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently it is not possible to detect a schema change when using 
> `CustomEncoding`.





[jira] [Updated] (AVRO-3520) CustomEncoding doesn't expose the read schema

2022-05-23 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec updated AVRO-3520:
-
Attachment: patchTest.txt

> CustomEncoding doesn't expose the read schema
> -
>
> Key: AVRO-3520
> URL: https://issues.apache.org/jira/browse/AVRO-3520
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: Colin
>Priority: Major
>  Labels: pull-request-available
> Attachments: patchTest.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently it is not possible to detect a schema change when using 
> `CustomEncoding`.





[jira] [Commented] (AVRO-3520) CustomEncoding doesn't expose the read schema

2022-05-23 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540917#comment-17540917
 ] 

Christophe Le Saec commented on AVRO-3520:
--

Finally, I tested it by changing the method signature to:
{code:java}
CustomEncoding.read(Object reuse, Decoder in, Schema.Field f)
{code}
See [^patchTest.txt].
It works well since I changed the write test method to:
{code:java}
  private byte[] write(Custom custom) {
ReflectData rd = new ReflectData();
Schema schema = rd.getSchema(Wrapper.class);
ReflectDatumWriter datumWriter = new ReflectDatumWriter<>(rd);
{code}
This prevents the singleton issue.
It should also work when, "in the wild", you have to read some input streams 
written with the old version and some with the new version.




[jira] [Commented] (AVRO-3520) CustomEncoding doesn't expose the read schema

2022-05-24 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541349#comment-17541349
 ] 

Christophe Le Saec commented on AVRO-3520:
--

And what about reading two Avro files with different schemas in parallel, in 
two different threads?
[As the cache for accessors is 
static|https://github.com/apache/avro/blob/ff4eaf32c1fb4f04770a6ad39f2769cc907006e4/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java#L347],
 shouldn't it throw an error?
If you change the API of the CustomEncoding.read method to 
{code:java}
CustomEncoding.read(Object reuse, Decoder in, Schema.Field f){code}
you could extract the real schema from field f, which comes directly from the 
input file, and so avoid any confusion?
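
The concern can be pictured with a toy model. This is not ReflectData's real cache: SchemaPassingSketch, cachedSchema and explicitSchema are hypothetical names, and schemas are reduced to plain strings. The sketch only contrasts a shared static lookup, where the first caller's schema wins, with passing the schema along with each call as the proposed read signature would.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model for illustration; it does not reproduce ReflectData's cache.
final class SchemaPassingSketch {
  // Shared static cache: one entry per class for the whole JVM.
  private static final Map<Class<?>, String> CACHE = new ConcurrentHashMap<>();

  // The first schema registered for a class is returned to every later caller.
  static String cachedSchema(Class<?> type, String schemaFromFile) {
    return CACHE.computeIfAbsent(type, t -> schemaFromFile);
  }

  // Explicit parameter: each call sees the schema of the file being read.
  static String explicitSchema(Class<?> type, String schemaFromFile) {
    return schemaFromFile;
  }
}
```

In this model a second file with a different schema silently gets the first file's entry from the cache, whereas the explicit variant always returns what the caller read from its own file.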



[jira] [Commented] (AVRO-3374) [Java] Fully qualified type reference "ns.int" loses namespace.

2022-05-24 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541361#comment-17541361
 ] 

Christophe Le Saec commented on AVRO-3374:
--

[~rskraba] [~opwvhk]: can we reach a consensus on how to fix this?
We have two options:
- Fix the issue and update the documentation (for example, from "Primitive type 
names have no namespace and their names may not be defined in any namespace." to 
"Primitive type names have no namespace").
- Change the parsing code to throw an exception when it encounters "ns.int" as 
a record name. This conforms better to the current documentation, but introduces 
a *breaking change*.

(I have only explored the Java code so far; should we also align the behavior 
of the other modules, such as Rust and C++?)
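
For comparison, the second option amounts to a check of the following kind at parse time. This is a sketch only: StrictNameValidator is a hypothetical class, not part of Avro's parser, and the exception type and message are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical validator illustrating the second option; not Avro's parser.
final class StrictNameValidator {
  // The eight primitive type names of the Avro specification.
  private static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
      "null", "boolean", "int", "long", "float", "double", "bytes", "string"));

  // Returns fullName unchanged, or throws if its last segment is a primitive name.
  static String validate(String fullName) {
    String simpleName = fullName.substring(fullName.lastIndexOf('.') + 1);
    if (PRIMITIVES.contains(simpleName)) {
      throw new IllegalArgumentException(
          "Primitive type name used as a named type: " + fullName);
    }
    return fullName;
  }
}
```

Schemas such as the "ns.int" example, which parse today, would start failing under such a check, which is why this option is the breaking one.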



[jira] [Commented] (AVRO-3374) [Java] Fully qualified type reference "ns.int" loses namespace.

2022-05-24 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541579#comment-17541579
 ] 

Christophe Le Saec commented on AVRO-3374:
--

If the Avro project were starting from scratch, I would always write the 
complete name (with namespace), even for *ns.MyRecord*. That would easily allow 
*ns1.MyRecord* and *ns.MyRecord* in the same JSON (same name, different 
namespaces), and since this concerns schemas rather than records, skipping the 
namespace saves little memory. It would also automatically allow "ns.int" as a 
record schema name.

Currently, the SDK doesn't fail to load a JSON schema containing "ns.int"; it 
just can't write it back correctly with the toString function, because it 
translates "ns.int" to "int". Reloading that string then goes wrong because the 
"int" type is misinterpreted (the same would hold with the fix: a bare "int" 
can be misinterpreted). The fix only changes the toString method, not the 
loading, so there is no compatibility problem (I think).

I think this fix goes in the right direction (allowing "ns.int"), and we could 
even discuss whether to write the namespace systematically.




[jira] [Commented] (AVRO-3520) CustomEncoding doesn't expose the read schema

2022-05-30 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543844#comment-17543844
 ] 

Christophe Le Saec commented on AVRO-3520:
--

You're right (I had to modify your unit test to figure it out :P); I added a 
comment to the PR.



[jira] [Assigned] (AVRO-3523) How to contribute : remove patch section

2022-05-31 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec reassigned AVRO-3523:


Assignee: Christophe Le Saec

> How to contribute : remove patch section
> 
>
> Key: AVRO-3523
> URL: https://issues.apache.org/jira/browse/AVRO-3523
> Project: Apache Avro
>  Issue Type: Task
>  Components: community
>Reporter: Christophe Le Saec
>Assignee: Christophe Le Saec
>Priority: Trivial
>  Labels: documentation
>
> The [How To 
> Contribute|https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute#HowToContribute-CommittingGuidelinesforcommitters]
>  page still describes how to submit a patch, but a pull request is enough.
> The aim of this Jira is to remove the patch section from the website.





[jira] [Updated] (AVRO-3523) How to contribute : remove patch section

2022-05-31 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec updated AVRO-3523:
-
Labels: documentation pull-request-available  (was: documentation)

> How to contribute : remove patch section
> 
>
> Key: AVRO-3523
> URL: https://issues.apache.org/jira/browse/AVRO-3523
> Project: Apache Avro
>  Issue Type: Task
>  Components: community
>Reporter: Christophe Le Saec
>Assignee: Christophe Le Saec
>Priority: Trivial
>  Labels: documentation, pull-request-available





[jira] [Work started] (AVRO-3523) How to contribute : remove patch section

2022-05-31 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-3523 started by Christophe Le Saec.

> How to contribute : remove patch section
> 
>
> Key: AVRO-3523
> URL: https://issues.apache.org/jira/browse/AVRO-3523
> Project: Apache Avro
>  Issue Type: Task
>  Components: community
>Reporter: Christophe Le Saec
>Assignee: Christophe Le Saec
>Priority: Trivial
>  Labels: documentation, pull-request-available





[jira] [Commented] (AVRO-3523) How to contribute : remove patch section

2022-05-31 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544224#comment-17544224
 ] 

Christophe Le Saec commented on AVRO-3523:
--

I added the ["How to contribute" markdown 
file|https://github.com/apache/avro/blob/af1c4371923429e114ddb31eddc6e94c7c891d37/doc/content/en/project/How%20to%20contribute/_index.md]
 to the pull request, copied from the [same page in the 
wiki|https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute]; I only 
moved "Patch management" into a separate section.

I also replaced some site references:
- http://java.sun.com/j2se/javadoc/writingdoccomments/ (for javadoc comments) 
by 
https://www.oracle.com/fr/technical-resources/articles/java/javadoc-tool.html 
(I don't know if it's the right replacement, or if there is Apache documentation for that).
- http://java.sun.com/docs/codeconv/ (for Sun's conventions) by 
https://www.oracle.com/java/technologies/javase/codeconventions-introduction.html
 (for Oracle's conventions); same question: is there a kind of "Apache Java 
convention"?

Then, there are some details about JUnit that I hesitated to drop (like "Define 
methods within your class and tag them with the @Test annotation. Call JUnit's 
..."); that is JUnit manual material.

> How to contribute : remove patch section
> 
>
> Key: AVRO-3523
> URL: https://issues.apache.org/jira/browse/AVRO-3523
> Project: Apache Avro
>  Issue Type: Task
>  Components: community
>Reporter: Christophe Le Saec
>Assignee: Christophe Le Saec
>Priority: Trivial
>  Labels: documentation, pull-request-available





[jira] [Commented] (AVRO-3374) [Java] Fully qualified type reference "ns.int" loses namespace.

2022-06-01 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544941#comment-17544941
 ] 

Christophe Le Saec commented on AVRO-3374:
--

So, is the attached pull request OK? Do we need to find an Avro committer to 
review it and maybe merge it?

> [Java] Fully qualified type reference "ns.int" loses namespace.
> ---
>
> Key: AVRO-3374
> URL: https://issues.apache.org/jira/browse/AVRO-3374
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: Ryan Skraba
>Assignee: Christophe Le Saec
>Priority: Minor
>  Labels: pull-requests-available
> Attachments: AVRO-3374.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h





[jira] [Commented] (AVRO-3374) [Java] Fully qualified type reference "ns.int" loses namespace.

2022-06-01 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545289#comment-17545289
 ] 

Christophe Le Saec commented on AVRO-3374:
--

Sorry, maybe I wasn't clear: the [Pull 
request|https://github.com/apache/avro/pull/1688] already exists and is already 
linked to this JIRA (see the *issue links* section).

> [Java] Fully qualified type reference "ns.int" loses namespace.
> ---
>
> Key: AVRO-3374
> URL: https://issues.apache.org/jira/browse/AVRO-3374
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: Ryan Skraba
>Assignee: Christophe Le Saec
>Priority: Minor
>  Labels: pull-requests-available
> Attachments: AVRO-3374.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h





[jira] [Commented] (AVRO-3527) Generated equals() and hashCode() for SpecificRecords

2022-06-02 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545431#comment-17545431
 ] 

Christophe Le Saec commented on AVRO-3527:
--

For the [hashCode 
method|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L1099-L],
 it should be possible, for records and arrays, to limit the number of 
elements visited ({_}number of fields in a record, elements in an array, and 
depth when a value is itself a record or array{_}), as hashCode doesn't always 
need to differentiate objects.
But I can't see what kind of improvement we can make to the equals method (which 
calls 
[compare|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L1144]),
 as we have to compare all elements until we find a difference.
Any ideas?
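
To illustrate the idea of a bounded hashCode, here is a minimal sketch. It is not Avro code: the class name BoundedHash and the parameters maxElements and maxDepth are hypothetical, and plain java.util.List values stand in for Avro records and arrays. It only shows that a hash computed over a limited prefix and depth stays consistent with equals (equal values still hash equally), at the price of more collisions.

```java
import java.util.List;

/** Hypothetical helper, not part of Avro: hashes only a bounded part of a nested value. */
final class BoundedHash {
  private BoundedHash() {
  }

  /**
   * Hashes at most maxElements elements per list and descends at most maxDepth levels.
   * Values that are equal always get the same hash; unequal values may collide.
   */
  static int hash(Object value, int maxElements, int maxDepth) {
    if (value == null) {
      return 0;
    }
    if (!(value instanceof List)) {
      return value.hashCode(); // leaf value: use its own hash
    }
    if (maxDepth <= 0) {
      return 1; // too deep: stop descending and return a constant
    }
    int h = 1;
    int seen = 0;
    for (Object element : (List<?>) value) {
      if (seen++ >= maxElements) {
        break; // ignore the remaining elements
      }
      h = 31 * h + hash(element, maxElements, maxDepth - 1);
    }
    return h;
  }
}
```

With maxElements = 3, two lists that differ only in their fourth element get the same hash, so equals still has to do the full comparison to tell them apart.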

> Generated equals() and hashCode() for SpecificRecords
> -
>
> Key: AVRO-3527
> URL: https://issues.apache.org/jira/browse/AVRO-3527
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Steven Aerts
>Priority: Major
> Attachments: equals_hashcode_after.txt, equals_hashcode_before.txt, 
> flame_graph.jpeg
>
>
> When profiling our production system, we found that it was spending almost 
> 40% of its overall time in the {{SpecificRecordBase.hashCode()}} and 
> {{SpecificRecordBase.equals()}} implementations.
> In some sections of its logic we see that almost all time is spent in those 
> functions, as can be seen in the attached flame graph (blue "pyramids").
> !flame_graph.jpeg|width=385,height=99!
> By generating the {{.equals()}} and {{.hashCode()}} all this overhead 
> disappeared and this application became 35% faster overall. 
> Also on other Avro-heavy applications we saw noticeable performance gains 
> where we hadn't expected them, due to this improvement.
> A generated implementation of {{.hashCode()}} becomes 5 to 10 times faster 
> than its generic counterpart. For {{.equals()}} it is 10 to 20 times faster.
> This is also visible in the attached JMH benchmarks.





[jira] [Created] (AVRO-3532) Align naming rules on code

2022-06-09 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3532:


 Summary: Align naming rules on code
 Key: AVRO-3532
 URL: https://issues.apache.org/jira/browse/AVRO-3532
 Project: Apache Avro
  Issue Type: Wish
Reporter: Christophe Le Saec


The [naming rule in the 
documentation|https://avro.apache.org/docs/current/spec.html#names] says that names must
{noformat}
- start with [A-Za-z_]
- subsequently contain only [A-Za-z0-9_]
{noformat}
But the [Java 
code|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L1578]
 uses the Character.isLetter method:
{code:java}
// Excerpt: 'length' is defined earlier in the method.
char first = name.charAt(0);
if (!(Character.isLetter(first) || first == '_'))
  throw new SchemaParseException("Illegal initial character: " + name);
for (int i = 1; i < length; i++) {
  char c = name.charAt(i);
  if (!(Character.isLetterOrDigit(c) || c == '_'))
    throw new SchemaParseException("Illegal character in: " + name);
}
return name;
{code}
This method accepts accented letters (éùàçË ...) and also Chinese characters (我) ...

So, the aim of this ticket is to see whether we can update the documentation, 
and whether the other implementations (Rust, C# ...) are also compatible with this.
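
The gap between the two rules can be reproduced outside Avro with a small standalone sketch. The class NameRuleDemo and its method names are hypothetical; the regular expression is a direct transcription of the documented rule, and the second method mirrors the character checks quoted above but returns a boolean instead of throwing.

```java
import java.util.regex.Pattern;

/** Hypothetical demo class, not part of Avro. */
final class NameRuleDemo {
  // The rule as written in the specification: [A-Za-z_] then [A-Za-z0-9_]*.
  private static final Pattern SPEC_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  private NameRuleDemo() {
  }

  /** True if the name follows the documented (ASCII-only) rule. */
  static boolean matchesSpec(String name) {
    return name != null && SPEC_NAME.matcher(name).matches();
  }

  /** True if the name passes the Character.isLetter / isLetterOrDigit checks. */
  static boolean passesCharacterChecks(String name) {
    if (name == null || name.isEmpty()) {
      return false;
    }
    char first = name.charAt(0);
    if (!(Character.isLetter(first) || first == '_')) {
      return false;
    }
    for (int i = 1; i < name.length(); i++) {
      char c = name.charAt(i);
      if (!(Character.isLetterOrDigit(c) || c == '_')) {
        return false;
      }
    }
    return true;
  }
}
```

For a name such as "\u00e9l\u00e8ve" (élève) or "\u6211" (我), matchesSpec returns false while passesCharacterChecks returns true; for an ASCII name such as "field_1" both return true.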





[jira] [Comment Edited] (AVRO-3532) Align naming rules on code

2022-06-09 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552239#comment-17552239
 ] 

Christophe Le Saec edited comment on AVRO-3532 at 6/9/22 2:08 PM:
--

About the 
[discussion|https://lists.apache.org/thread/8q666sll1hkh7hq7kbdthlv2dd9dfnr8], 
I share Nilesh's opinion; this concerns a lot of data sources: the GCP storage he 
mentioned, but also relational databases, the JSON file format ...

Transport data formats like [Apache 
Arrow|https://arrow.apache.org/docs/format/Columnar.html#struct-layout] 
({_}Each field must have a UTF8-encoded name{_}) or Amazon Ion ({_}which accepts 
every JSON{_}) also accept UTF-8 names, and they compete directly with Avro.

So, I think it would be great to evolve Avro to align with them (but I understand 
this would be a lot of work).


was (Author: JIRAUSER289541):
About 
[discussion|https://lists.apache.org/thread/8q666sll1hkh7hq7kbdthlv2dd9dfnr8], 
i would have same opinion as Nilesh, lot of data sources like GCP storage he 
mentioned but also relational database, json file format ...


Transport data format like [Apache 
Arrow|https://arrow.apache.org/docs/format/Columnar.html#struct-layout] 
({_}Each field must have a UTF8-encoded name{_}) or Amazon ion, ({_}that accept 
every json{_}) also accept name, and it competes directly with Avro.


So, i think it could be great to develop Avro to align it (but, i understand 
this would be a big work).

> Align naming rules on code
> --
>
> Key: AVRO-3532
> URL: https://issues.apache.org/jira/browse/AVRO-3532
> Project: Apache Avro
>  Issue Type: Wish
>Reporter: Christophe Le Saec
>Priority: Major





[jira] [Commented] (AVRO-3532) Align naming rules on code

2022-06-09 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552239#comment-17552239
 ] 

Christophe Le Saec commented on AVRO-3532:
--

About the 
[discussion|https://lists.apache.org/thread/8q666sll1hkh7hq7kbdthlv2dd9dfnr8], 
I share Nilesh's opinion; this concerns a lot of data sources: the GCP storage he 
mentioned, but also relational databases, the JSON file format ...

Transport data formats like [Apache 
Arrow|https://arrow.apache.org/docs/format/Columnar.html#struct-layout] 
({_}Each field must have a UTF8-encoded name{_}) or Amazon Ion ({_}which accepts 
every JSON{_}) also accept such names, and they compete directly with Avro.

So, I think it would be great to evolve Avro to align with them (but I understand 
this would be a lot of work).

> Align naming rules on code
> --
>
> Key: AVRO-3532
> URL: https://issues.apache.org/jira/browse/AVRO-3532
> Project: Apache Avro
>  Issue Type: Wish
>Reporter: Christophe Le Saec
>Priority: Major





[jira] [Commented] (AVRO-3531) GenericDatumReader in multithread lead to infinite loop cause misused of IdentityHashMap

2022-06-10 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552737#comment-17552737
 ] 

Christophe Le Saec commented on AVRO-3531:
--

Hello,
I think we have to make this class thread-safe,

either with a quick change like
{code:java}
// Note: this first get() runs outside the lock, so it can still overlap a put.
Class c = stringClassCache.get(s);
if (c == null) {
  synchronized (stringClassCache) {
    c = stringClassCache.computeIfAbsent(s, this::findStringClass);
  }
}
return c;
{code}
(treating the stringCtorCache instance the same way ...),
or, as these 2 elements are linked and work together, by extracting them into a new class.
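
As a rough illustration of the "new class" option, the sketch below wraps an IdentityHashMap in Collections.synchronizedMap so that every read and write goes through the same lock. The class name SynchronizedIdentityCache and its loader parameter are hypothetical and are not taken from Avro or from any pull request; this is one possible shape under those assumptions, not the actual fix.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper, not part of Avro: an identity-keyed cache guarded by a single lock. */
final class SynchronizedIdentityCache<K, V> {
  // synchronizedMap makes get, put and computeIfAbsent mutually exclusive.
  private final Map<K, V> cache = Collections.synchronizedMap(new IdentityHashMap<>());
  private final Function<K, V> loader;

  SynchronizedIdentityCache(Function<K, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value for this exact key instance, computing it once if absent. */
  V get(K key) {
    return cache.computeIfAbsent(key, loader);
  }
}
```

One instance could hold the string classes and another the constructors, so both caches follow the same locking rule. The cost is that every lookup takes the lock, including pure cache hits.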


> GenericDatumReader in multithread lead to infinite loop cause misused of 
> IdentityHashMap
> 
>
> Key: AVRO-3531
> URL: https://issues.apache.org/jira/browse/AVRO-3531
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: tansion
>Priority: Critical
>
> Hi, 
> I am working on a Java project that uses Kafka with Avro 
> serialization/deserialization in a messaging platform.
> In the production environment, we hit a serious issue in the deserialization 
> process. The GenericDatumReader somehow gets into an infinite loop, 
> and it happens occasionally.
> When the issue happens, the thread stack looks like this:
>  
> {code:java}
> "DmqFixedRateConsumer-Thread-17" #453 daemon prio=5 os_prio=0 
> tid=0x7f2ae1832800 nid=0xef49 runnable [0x7f2a743fc000]
>    java.lang.Thread.State: RUNNABLE
>     at java.util.IdentityHashMap.get(IdentityHashMap.java:337)
>     at 
> org.apache.avro.generic.GenericDatumReader.getStringClass(GenericDatumReader.java:503)
>     at 
> org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:454)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:191)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at 
> org.apache.avro.generic.GenericDatumReader.r

[jira] [Comment Edited] (AVRO-3531) GenericDatumReader in multithread lead to infinite loop cause misused of IdentityHashMap

2022-06-12 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552737#comment-17552737
 ] 

Christophe Le Saec edited comment on AVRO-3531 at 6/13/22 6:20 AM:
---

Hello,
I think we have to make this class thread-safe,

either with a quick change like
{code:java}
Class c;
synchronized (stringClassCache) {
  c = stringClassCache.computeIfAbsent(s, this::findStringClass);
}
return c;
{code}
(treating the stringCtorCache instance the same way ...),
or, as these 2 elements are linked and work together, by extracting them into a new class.
(Currently, I can't reproduce the error in a unit test.)


was (Author: JIRAUSER289541):
Hello,
I think we have to make this class threadsafe,

either a quick change like
{code:java}
Class c = stringClassCache.get(s);
if (c == null) {
  synchronized (stringClassCache) {
c = stringClassCache.computeIfAbsent(s, this::findStringClass);
  }
}
return c;
{code}
And also treat stringCtorCache instance the same way ...
Or, as this 2 elements are linked and work together, extract it in a new class.


> GenericDatumReader in multithread lead to infinite loop cause misused of 
> IdentityHashMap
> 
>
> Key: AVRO-3531
> URL: https://issues.apache.org/jira/browse/AVRO-3531
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: tansion
>Priority: Critical

[jira] [Commented] (AVRO-3531) GenericDatumReader in multithread lead to infinite loop cause misused of IdentityHashMap

2022-06-13 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553437#comment-17553437
 ] 

Christophe Le Saec commented on AVRO-3531:
--

About this, I proposed this [PR|https://github.com/apache/avro/pull/1719], but 
the *synchronized* on stringClassCache has an impact on performance (the unit test 
runs in 200 ms without it versus 1.1 s with it), so I don't know whether this 
proposal is valid.

> GenericDatumReader in multithread lead to infinite loop cause misused of 
> IdentityHashMap
> 
>
> Key: AVRO-3531
> URL: https://issues.apache.org/jira/browse/AVRO-3531
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: tansion
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h

[jira] [Commented] (AVRO-3531) GenericDatumReader in multithread lead to infinite loop cause misused of IdentityHashMap

2022-06-13 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553595#comment-17553595
 ] 

Christophe Le Saec commented on AVRO-3531:
--

Hi,
No, on a HashMap you can't do a get and a put at the same time; this can lead to a crash. 
Here, you can have x gets and one put at the same time.
It may be possible with a ReentrantReadWriteLock; I will try this.
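
A minimal sketch of the ReentrantReadWriteLock idea follows. The class name ReadWriteLockedCache and its loader parameter are hypothetical and not taken from Avro or the pull request. Lookups share the read lock, so many gets can run together; a miss takes the write lock, which excludes all readers while the map is modified.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical helper, not part of Avro: identity-keyed cache with a read/write lock. */
final class ReadWriteLockedCache<K, V> {
  private final Map<K, V> cache = new IdentityHashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Function<K, V> loader;

  ReadWriteLockedCache(Function<K, V> loader) {
    this.loader = loader;
  }

  V get(K key) {
    // Fast path: several threads may hold the read lock at once.
    lock.readLock().lock();
    try {
      V cached = cache.get(key);
      if (cached != null) {
        return cached;
      }
    } finally {
      lock.readLock().unlock();
    }
    // Slow path: the write lock excludes every reader and writer.
    lock.writeLock().lock();
    try {
      // computeIfAbsent re-checks, in case another thread filled the entry meanwhile.
      return cache.computeIfAbsent(key, loader);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Since the cache is mostly read once it is warm, the read lock should keep contention lower than a plain synchronized block, though that would need to be measured.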

> GenericDatumReader in multithread lead to infinite loop cause misused of 
> IdentityHashMap
> 
>
> Key: AVRO-3531
> URL: https://issues.apache.org/jira/browse/AVRO-3531
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.0
>Reporter: tansion
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hi, 
> I am working on a Java project that uses Kafka with Avro 
> serialization/deserialization in a messaging platform.
> In the production environment, we hit a serious issue in the deserialization 
> process. The GenericDatumReader somehow gets into an infinite loop, 
> and it happens occasionally.
> When the issue happens, the thread stack looks like this:
>  
> {code:java}
> "DmqFixedRateConsumer-Thread-17" #453 daemon prio=5 os_prio=0 
> tid=0x7f2ae1832800 nid=0xef49 runnable [0x7f2a743fc000]
>    java.lang.Thread.State: RUNNABLE
>     at java.util.IdentityHashMap.get(IdentityHashMap.java:337)
>     at 
> org.apache.avro.generic.GenericDatumReader.getStringClass(GenericDatumReader.java:503)
>     at 
> org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:454)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:191)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:187)
>     at 
> org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:291)
>     at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at 
> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at 
> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>   

[jira] [Commented] (AVRO-3531) GenericDatumReader in multithread lead to infinite loop cause misused of IdentityHashMap

2022-06-13 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553612#comment-17553612
 ] 

Christophe Le Saec commented on AVRO-3531:
--

Indeed,
I updated the PR to use ReentrantReadWriteLock; it works and performs better 
than a simple synchronized block, but the code becomes a little more complex.


[jira] [Commented] (AVRO-2787) Hadoop Mapreduce job fails when creating Writer

2022-06-16 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554953#comment-17554953
 ] 

Christophe Le Saec commented on AVRO-2787:
--

Since development is done with "Avro version 1.9.2" and the job is executed "With 
Avro 1.8.2", this can happen, especially as the constructors of this class are 
not the same between the two versions (some were removed, others added).

This can lead to a NoSuchMethodError without implying a bug in Avro. Can 
you try again after aligning the Avro versions (compile & run)?
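
One way to check which version is actually picked up at run time is to ask the JVM where a class was loaded from and which constructors it really has. The sketch below is only an illustration: ClasspathProbe is a hypothetical helper name, and it defaults to a JDK class so that it runs without Avro on the classpath. Passing org.apache.avro.Schema$Field as the argument inside the job's environment would be the interesting case.

```java
import java.lang.reflect.Constructor;
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper: reports the origin and constructors of a loaded class. */
final class ClasspathProbe {

  /** Location (jar or directory) the class was loaded from, if the JVM exposes it. */
  static String origin(Class<?> type) {
    CodeSource source = type.getProtectionDomain().getCodeSource();
    URL location = (source == null) ? null : source.getLocation();
    // JDK classes loaded by the bootstrap loader have no code source.
    return (location == null) ? "(bootstrap / unknown)" : location.toString();
  }

  /** Signatures of all constructors declared by the class as loaded at run time. */
  static List<String> constructorSignatures(Class<?> type) {
    List<String> signatures = new ArrayList<>();
    for (Constructor<?> constructor : type.getDeclaredConstructors()) {
      signatures.add(constructor.toGenericString());
    }
    return signatures;
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // Assumption: the class name is passed as the first argument.
    String className = (args.length > 0) ? args[0] : "java.lang.String";
    Class<?> type = Class.forName(className);
    System.out.println(className + " loaded from " + origin(type));
    for (String signature : constructorSignatures(type)) {
      System.out.println("  " + signature);
    }
  }
}
```

If the printed location is not the jar you compiled against, or the four-argument constructor from the stack trace is missing from the list, the versions are not aligned.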

> Hadoop Mapreduce job fails when creating Writer
> ---
>
> Key: AVRO-2787
> URL: https://issues.apache.org/jira/browse/AVRO-2787
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.9.2
> Environment: Development
>  * OS: Fedora 31
>  * Java version 8
>  * Gradle version 6.2.2
>  * Avro version 1.9.2
>  * Shadow version 5.2.0
>  * Gradle-avro-plugin version 0.19.1
> Running in a Podman container
>  * OS: Ubuntu 18.04
>  * Podman 1.8.2
>  * Hadoop version 3.2.1
>  * Java version 8
>Reporter: Anton Oellerer
>Priority: Blocker
> Attachments: CategoryData.avsc, CategoryTokensReducer.java, 
> TextprocessingfundamentalsApplication.java
>
>
> Hey,
> I am trying to create a Hadoop pipeline that computes the chi-squared value for 
> tokens in reviews saved as JSON.
> For this, I created multiple Hadoop jobs, and the communication between them 
> happens, partly, with Avro Data containers.
> When trying to run this pipeline, I get the following error at the end of the 
> first reduce Job (Signature
> {code:java}
> public class CategoryTokensReducer extends Reducer AvroKey, AvroValue>{code}
> )
> Error:
> {code:java}
> java.lang.Exception: java.lang.NoSuchMethodError: 
> org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
>     at org.apache.avro.hadoop.io.AvroKeyValue.getSchema(AvroKeyValue.java:111)
>     at org.apache.avro.mapreduce.AvroKeyValueRecordWriter.<init>(AvroKeyValueRecordWriter.java:84)
>     at org.apache.avro.mapreduce.AvroKeyValueOutputFormat.getRecordWriter(AvroKeyValueOutputFormat.java:70)
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> The Job is setup like this:
> {code:java}
> Job jsonToCategoryTokensJob = Job.getInstance(conf, "json to category data");
> AvroJob.setOutputKeySchema(jsonToCategoryTokensJob, 
> Schema.create(Schema.Type.STRING));
> AvroJob.setOutputValueSchema(jsonToCategoryTokensJob, 
> CategoryData.getClassSchema());
> jsonToCategoryTokensJob.setJarByClass(TextprocessingfundamentalsApplication.class);
> jsonToCategoryTokensJob.setMapperClass(JsonToCategoryTokensMapper.class);
> jsonToCategoryTokensJob.setMapOutputKeyClass(Text.class);
> jsonToCategoryTokensJob.setMapOutputValueClass(StringArrayWritable.class);
> jsonToCategoryTokensJob.setReducerClass(CategoryTokensReducer.class);
> jsonToCategoryTokensJob.setOutputFormatClass(AvroKeyValueOutputFormat.class);
> String in = otherArgs.get(0);
> String out = otherArgs.get(1);
> FileInputFormat.addInputPath(jsonToCategoryTokensJob, new Path(in));
> FileOutputFormat.setOutputPath(jsonToCategoryTokensJob, new Path(out, 
> "outCategoryData"));
> {code}
> The pipeline is run by first building a shadowJar from the source in the 
> development environment and then running it in a podman container.
> With Avro 1.8.2 and gradle plugin 0.16.0 the reduce job works. 
> Does someone know what the pr

[jira] [Commented] (AVRO-2160) Json to Avro with non required value and union schema failing

2022-06-16 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554992#comment-17554992
 ] 

Christophe Le Saec commented on AVRO-2160:
--

The issue is not about the "union type" but rather about the default value; it also 
occurs with this schema:
{code:json}
{"type":"record", "namespace":"foo","name":"Person",
  "fields":[
    {"name":"lastname","type": "string", "default": "last"},
    {"name":"firstname","type":"string"},
    {"name":"age","type":["null","int"], "default":null}
  ]}
{code}
and this input data (lastname is missing, so the default value "last" should replace 
it):
{code:json}
{"firstname":"John","age":{"int":35}}
{code}
I think the [FieldAdjustAction Symbol 
class|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/io/parsing/Symbol.java#L587]
 should embed the default value, which could easily be provided by 
[JsonGrammarGenerator|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/io/parsing/JsonGrammarGenerator.java#L85];
 then [JsonDecoder could use it (if there is one) before throwing an 
exception|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/io/JsonDecoder.java#L473].
(At least, in my view, that's the meaning of a default value.)
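
As a rough illustration of the intended behaviour, here is a sketch in plain Java. It is not Avro's parsing code: DefaultFallbackSketch, FieldSpec and resolve are hypothetical names, and a Map stands in for the decoded JSON object. A field that is absent from the input takes its default if it has one, and only a missing field without a default raises the error.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Avro code): fall back to a field default before failing. */
final class DefaultFallbackSketch {

  /** Stand-in for a schema field: a name, plus an optional (possibly null) default. */
  static final class FieldSpec {
    final String name;
    final boolean hasDefault;
    final Object defaultValue;

    private FieldSpec(String name, boolean hasDefault, Object defaultValue) {
      this.name = name;
      this.hasDefault = hasDefault;
      this.defaultValue = defaultValue;
    }

    static FieldSpec required(String name) {
      return new FieldSpec(name, false, null);
    }

    static FieldSpec withDefault(String name, Object defaultValue) {
      return new FieldSpec(name, true, defaultValue);
    }
  }

  /** Resolves every schema field against the input, using defaults for absent fields. */
  static Map<String, Object> resolve(List<FieldSpec> fields, Map<String, Object> input) {
    Map<String, Object> record = new LinkedHashMap<>();
    for (FieldSpec field : fields) {
      if (input.containsKey(field.name)) {
        // Present in the input, even if explicitly null: keep the given value.
        record.put(field.name, input.get(field.name));
      } else if (field.hasDefault) {
        // Absent from the input: use the default carried by the field.
        record.put(field.name, field.defaultValue);
      } else {
        throw new IllegalArgumentException("Expected field name not found: " + field.name);
      }
    }
    return record;
  }
}
```

The separate hasDefault flag is a deliberate choice in this sketch, because null is itself a legitimate default (as for the age field in the schema above).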

> Json to Avro with non required value and union schema failing
> -
>
> Key: AVRO-2160
> URL: https://issues.apache.org/jira/browse/AVRO-2160
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2
>Reporter: Lydie
>Priority: Critical
>  Labels: java
>
> I am trying to convert this string:
> str str4
> using this schema:
> {"type":"record", 
> "namespace":"foo","name":"Person","fields":[{"name":"lastname","type": 
> ["null","string"], "default":null},
> {"name":"firstname","type":"string"},{"name":"age","type":["null","int"], 
> "default":null}]}
> I get this error: 
> com.syapse.messagePublisher.publisher.AvroEncodeException: 
> Expected field name not found: 
> lastnamein{"firstname":"John","age":{"int":35}} at 
> com.syapse.messagePublisher.publisher.AvroEncoder.convertJsonToAvro(AvroEncoder.java:78)
>  
> Although this should be the correct syntax for a non-required field.
> Note that it works for 
> {"lastname":{"string" : "Doe"},"firstname":"John","age":{"int":36}}
>  
> What am I missing (using Avro 1.8.2)?
> Here is my code:
>  
> {code:java}
> public static byte[] convertJsonToAvro(byte[] data, String schemaStr) throws 
> AvroEncodeException {
>   InputStream input = null;
>   DataFileWriter<GenericRecord> writer = null;
>   ByteArrayOutputStream output = null;
>   try {
>     Schema schema = new Schema.Parser().parse(schemaStr);
>     DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
>     input = new ByteArrayInputStream(data);
>     DataInputStream din = new DataInputStream(input);
>     output = new ByteArrayOutputStream();
>     writer = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>());
>     writer.create(schema, output);
>     Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
>     GenericRecord datum = null;
>     // Read JSON records until the input is exhausted.
>     while (true) {
>       try {
>         datum = reader.read(null, decoder);
>       } catch (EOFException eofe) {
>         break;
>       }
>       writer.append(datum);
>     }
>     writer.flush();
>     writer.close();
>     return output.toByteArray();
>   } catch (AvroTypeException e) {
>     throw new AvroEncodeException(e.getMessage() + "in" + new String(data));
>   } catch (IOException e1) {
>     throw new AvroEncodeException("Error decoding Json " + e1.getMessage());
>   } finally {
>     try {
>       input.close();
>     } catch (Exception e) {
>       // Ignore failures while closing the input stream.
>     }
>   }
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (AVRO-2160) Json to Avro with non required value and union schema failing

2022-06-17 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec updated AVRO-2160:
-
Labels: java pull-request-available  (was: java)



[jira] [Commented] (AVRO-2160) Json to Avro with non required value and union schema failing

2022-06-17 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555481#comment-17555481
 ] 

Christophe Le Saec commented on AVRO-2160:
--

Yes, JSON is commonly used without a schema, so it can't behave the same as Avro; but 
[swagger allows a default 
value|https://swagger.io/blog/api-development/unlocking-the-spec-the-default-keyword/].

The linked PR allows that: for a field defined as
{code:json}
{"name":"lastname","type":["string","null"],"default":"last"} 
{code}
{"lastname": null} and {}
won't have the same meaning: lastname will be "last" for the second example, which is 
the definition of a default (and this also works for the binary format).

The only thing this PR won't do is use the default value when writing. When you 
have an indexed record with a field value that matches the default value, you 
can't know whether, in the source, the value was missing or equal to the default; 
so writing will explicitly write the value.



[jira] [Commented] (AVRO-3527) Generated equals() and hashCode() for SpecificRecords

2022-06-17 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555650#comment-17555650
 ] 

Christophe Le Saec commented on AVRO-3527:
--

For the GenericData implementation, you can't easily optimize the equals method; 
as you said, you have to "{_}loop over all it fields{_}". But that is not 
mandatory for the [hashCode 
method|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L1099-L]:
 you can, for example, scan just the first 4 fields (and only the first value of an Array).
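
A minimal sketch of that idea in plain Java follows. It is not the GenericData code: PartialHashSketch and partialHash are hypothetical names, and a List of field values stands in for a record. Equal records still get equal hashes, so the equals/hashCode contract holds; the price is more collisions between records that differ only in later fields.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch (not Avro code): hash only a bounded prefix of a record's fields. */
final class PartialHashSketch {
  /** Assumption for illustration: only the first 4 fields take part in the hash. */
  static final int MAX_FIELDS = 4;

  static int partialHash(List<?> fieldValues) {
    int hash = 1;
    int limit = Math.min(MAX_FIELDS, fieldValues.size());
    for (int i = 0; i < limit; i++) {
      Object value = fieldValues.get(i);
      if (value instanceof List) {
        // For an array field, look at its first element only.
        List<?> array = (List<?>) value;
        value = array.isEmpty() ? null : array.get(0);
      }
      hash = 31 * hash + Objects.hashCode(value);
    }
    return hash;
  }
}
```

This keeps hashCode cost bounded regardless of record size, while equals still has to compare every field.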

> Generated equals() and hashCode() for SpecificRecords
> -
>
> Key: AVRO-3527
> URL: https://issues.apache.org/jira/browse/AVRO-3527
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Steven Aerts
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
> Attachments: equals_hashcode_after.txt, equals_hashcode_before.txt, 
> flame_graph.jpeg
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When profiling our production system, we found that it was spending almost 
> 40% of its overall time in the {{SpecificRecordBase.hashCode()}} and 
> {{SpecificRecordBase.equals()}} implementations.
> In some sections of its logic we see that almost all time is spent in those 
> functions, as can be seen in the attached flame graph (blue "pyramids").
> !flame_graph.jpeg|width=385,height=99!
> By generating the {{.equals()}} and {{.hashCode()}} all this overhead 
> disappeared and this application became 35% faster overall. 
> Thanks to this improvement, we also saw noticeable performance gains in other 
> Avro-heavy applications where we hadn't expected them.
> A generated implementation of {{.hashCode()}} becomes 5 to 10 times faster 
> than its generic counterpart. For {{.equals()}} it is 10 to 20 times faster.
> This is also visible in the attached JMH benchmarks.





[jira] [Created] (AVRO-3537) TestNettyTransceiverWhenFailsToConnect often fail

2022-06-20 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3537:


 Summary: TestNettyTransceiverWhenFailsToConnect often fail
 Key: AVRO-3537
 URL: https://issues.apache.org/jira/browse/AVRO-3537
 Project: Apache Avro
  Issue Type: Bug
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec


The TestNettyTransceiverWhenFailsToConnect unit test expects a failure in the netty 
[connection due to a short timeout of 1 
ms|https://github.com/apache/avro/blob/master/lang/java/ipc-netty/src/test/java/org/apache/avro/ipc/netty/TestNettyTransceiverWhenFailsToConnect.java#L41].
 
Unfortunately, 1 ms is sometimes enough to establish the connection, so I propose to 
replace it with zero, which does trigger the IO exception and fixes the 
unit test.





[jira] [Updated] (AVRO-3537) TestNettyTransceiverWhenFailsToConnect often fail

2022-06-20 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec updated AVRO-3537:
-
Labels: pull-requests-available  (was: pull-request-available)

> TestNettyTransceiverWhenFailsToConnect often fail
> -
>
> Key: AVRO-3537
> URL: https://issues.apache.org/jira/browse/AVRO-3537
> Project: Apache Avro
>  Issue Type: Bug
>Reporter: Christophe Le Saec
>Assignee: Christophe Le Saec
>Priority: Minor
>  Labels: pull-requests-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The TestNettyTransceiverWhenFailsToConnect unit test expects a failure in the netty 
> [connection due to a short timeout of 1 
> ms|https://github.com/apache/avro/blob/master/lang/java/ipc-netty/src/test/java/org/apache/avro/ipc/netty/TestNettyTransceiverWhenFailsToConnect.java#L41].
>  
> Unfortunately, 1 ms is sometimes enough to establish the connection, so I propose to 
> replace it with zero, which does trigger the IO exception and fixes the 
> unit test.





[jira] [Assigned] (AVRO-3537) TestNettyTransceiverWhenFailsToConnect often fail

2022-06-20 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec reassigned AVRO-3537:


Assignee: (was: Christophe Le Saec)

> TestNettyTransceiverWhenFailsToConnect often fail
> -
>
> Key: AVRO-3537
> URL: https://issues.apache.org/jira/browse/AVRO-3537
> Project: Apache Avro
>  Issue Type: Bug
>Reporter: Christophe Le Saec
>Priority: Minor
>  Labels: pull-requests-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The TestNettyTransceiverWhenFailsToConnect unit test expects a failure in the netty 
> [connection due to a short timeout of 1 
> ms|https://github.com/apache/avro/blob/master/lang/java/ipc-netty/src/test/java/org/apache/avro/ipc/netty/TestNettyTransceiverWhenFailsToConnect.java#L41].
>  
> Unfortunately, 1 ms is sometimes enough to establish the connection, so I propose to 
> replace it with zero, which does trigger the IO exception and fixes the 
> unit test.





[jira] [Commented] (AVRO-3554) Create original art for the Avro logo

2022-07-01 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561473#comment-17561473
 ] 

Christophe Le Saec commented on AVRO-3554:
--

I'm not an expert on marketing and logos, but it's fun to participate :D
For *Stream*, why not a [design 
salmon|https://www.svgrepo.com/svg/323927/salmon], maybe with a smile... As it 
follows rivers & streams, it's dynamic (for the performance image), and it's a kind of 
symbol for ecology.

I thought that the Pulsar logo was more about the radio signal graph from a star (or from 
a pulsar), like on the Joy Division album.

> Create original art for the Avro logo
> -
>
> Key: AVRO-3554
> URL: https://issues.apache.org/jira/browse/AVRO-3554
> Project: Apache Avro
>  Issue Type: Improvement
>Reporter: Ryan Skraba
>Priority: Major
> Fix For: 1.12.0
>
> Attachments: OldAsmLogo.png, cuttlefish oscar 2.svg, cuttlefish 
> oscar.svg
>
>
> There was a quick discussion with Apache Trademarks along the lines "If it 
> ever came to a legal challenge, would we care enough to defend our usage? If 
> not, changing it discreetly and voluntarily is the best route."
> The result: we _must_ change our logo this year to some original art.
>  
> Criteria for a new logo, as discussed on the mailing list: 
> [https://lists.apache.org/thread/m5w7fmkjnrl8m4hlbw8xhzr4v69xg3ml]:
> Must:
>  * be unique and not similar to anything “out there”
>  * be available in a vector format (SVG) so it can be scaled with full 
> accuracy
> Should:
>  * show either something of the name (like Beam, Airflow, Ant, Drill) or what 
> it does (like Mahout, Bookkeeper, Ratis, Pulsar)
>  * have 2 variants: Square/Circle (social media, favicon, etc) and rectangle 
> (top banner)
> Could:
>  * have the banner logo be the compact logo with the name attached, or more 
> elaborate
> Would:
>  * have limited colors; at most 5-10 (so like Iceberg, not like Flex with the 
> gradients)
>  
> _To allow sufficient artistic freedom, we should limit new requirements. 
> Instead, please add ideas in the comments._
>  
> See [https://www.apache.org/logos/] for many examples



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (AVRO-2774) missing @Override annotations in generated code

2022-07-06 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563202#comment-17563202
 ] 

Christophe Le Saec commented on AVRO-2774:
--

Hello,
The linked pull request adds the @Override annotation by modifying the vm files 
(record.vm & enum.vm), and also adds a unit test to ensure that all overriding 
methods in the generated code have this annotation. 

> missing @Override annotations in generated code
> ---
>
> Key: AVRO-2774
> URL: https://issues.apache.org/jira/browse/AVRO-2774
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.9.1, 1.9.2
> Environment: openjdk version "11.0.6" 2020-01-14 LTS
> avro 1.9.2
> gradle avro plugin
>Reporter: Tim Spriggs
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When applying errorProne to my project, I get errors from the MissingOverride 
> rule. eg:
> error: [MissingOverride] getSpecificData overrides method in 
> SpecificRecordBase
> error: [MissingOverride] getSchema implements method in SpecificRecordBase
> error: [MissingOverride] get implements method in SpecificRecordBase
> error: [MissingOverride] put implements method in SpecificRecordBase
>  
> If these are always tagged with @Override then static analysis and IDE hints 
> perform better.





[jira] [Commented] (AVRO-530) allow for mutual recursion in type definitions

2022-07-07 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563650#comment-17563650
 ] 

Christophe Le Saec commented on AVRO-530:
-

Ten years later ;)
This Java code
{code:java}
Schema parse = new Schema.Parser().parse(" {\"type\":\"record\", \"name\":\"SelfRefType\","
    + " \"fields\":[{\"type\": \"SelfRefType\", \"name\":\"self\"}]} ");
{code}
works well, so is it time to close this ticket?

> allow for mutual recursion in type definitions
> --
>
> Key: AVRO-530
> URL: https://issues.apache.org/jira/browse/AVRO-530
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: spec
>Affects Versions: 1.3.2
>Reporter: Jeff Hodges
>Priority: Major
>
> Suppose you have these two types in your protocol:
> {code}
> {"name": "User", "type": "record", "fields": [{"name": "current_status", 
> "type": "Status"}]}
> {"name": "Status", "type": "record", "fields": [{"name": "author", "type": 
> "User"}]}
> {code}
> This will raise an error! The current workaround is to define one of them at 
> their first usage. Like:
> {code}
> {"name": "User", "type": "record", "fields": [{"name": "current_status", 
> "type": {"name": "Status", "type": "record", "fields": [.. lots of fields 
> ...]}]}
> {code}
> But this is incredibly unwieldy. It would be really nice for the spec to 
> require all the parsers to allow for mutual recursion, instead. It could be 
> done by implementing a two-pass parser. One pass to acquire names referenced, 
> and a second to fill in those names with their appropriate references.





[jira] [Commented] (AVRO-1365) NPE thrown when comparing objects using GenericData.compare

2022-07-07 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563660#comment-17563660
 ] 

Christophe Le Saec commented on AVRO-1365:
--

Here, o1 and o2 cannot be null when the schema type is ENUM; null is only valid 
for a NULL schema or a union of ENUM and NULL.

So an NPE means that at least one object does not conform to the schema.
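
To illustrate the point, here is a minimal sketch in plain Java. It is not Avro's actual code: the class EnumCompareSketch and the method compareEnum are hypothetical, and a list of symbols stands in for an ENUM schema. It rejects a null datum with an explicit message instead of letting toString() throw a bare NPE.

```java
import java.util.List;

// Hypothetical illustration only; not part of Avro.
class EnumCompareSketch {

  // The position of a symbol in the list plays the role of the enum ordinal.
  static int compareEnum(List<String> symbols, Object o1, Object o2) {
    if (o1 == null || o2 == null) {
      // A null datum is not valid for a plain ENUM schema,
      // only for a NULL schema or a union that contains NULL.
      throw new IllegalArgumentException("null datum does not conform to an ENUM schema");
    }
    return symbols.indexOf(o1.toString()) - symbols.indexOf(o2.toString());
  }
}
```

With symbols A, B, C, comparing "A" to "C" gives a negative result, while passing a null datum fails with a message that points at the schema violation.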

> NPE thrown when comparing objects using GenericData.compare
> ---
>
> Key: AVRO-1365
> URL: https://issues.apache.org/jira/browse/AVRO-1365
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.5
>Reporter: Douglas Kaminsky
>Priority: Minor
>  Labels: beginner
>
> When comparing two objects using GenericData.compare (directly or 
> indirectly), null values in fields of record type objects are not 
> sufficiently protected against, resulting in NPE
> e.g.
> {code}
> case ENUM:
>   return s.getEnumOrdinal(o1.toString()) - s.getEnumOrdinal(o2.toString());
> {code}
> This is prevalent throughout the {{compare}} method. This impacts 
> {{compareTo}}, and {{equals}} implementations as well.





[jira] [Commented] (AVRO-530) allow for mutual recursion in type definitions

2022-07-07 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563769#comment-17563769
 ] 

Christophe Le Saec commented on AVRO-530:
-

OK, so to stay consistent, could we imagine two classes, Registry and 
RegistryBuilder, where RegistryBuilder could parse a new record schema whose 
fields refer to not-yet-known schemas, and where the check is done in the 
"Registry RegistryBuilder.build()" method, which would require that all 
referenced schemas exist?
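
A rough sketch of that idea, in plain Java, could look like the following. It is only an assumption of how such classes might behave: neither Registry nor RegistryBuilder exists in Avro, the wrapper class RegistrySketch and the add method are made up for illustration, and a "schema" is reduced to a record name plus the names of the types its fields refer to.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; these classes are not part of Avro.
class RegistrySketch {

  static final class Registry {
    private final Map<String, List<String>> schemas;

    Registry(Map<String, List<String>> schemas) {
      this.schemas = schemas;
    }

    boolean contains(String name) {
      return schemas.containsKey(name);
    }
  }

  static final class RegistryBuilder {
    private final Map<String, List<String>> pending = new LinkedHashMap<>();

    // Accepts a record even if the named types it refers to are not known yet.
    RegistryBuilder add(String recordName, String... referencedTypes) {
      pending.put(recordName, Arrays.asList(referencedTypes));
      return this;
    }

    // The check is deferred to build(): every referenced type must exist by now.
    Registry build() {
      Set<String> missing = new TreeSet<>();
      for (List<String> references : pending.values()) {
        for (String reference : references) {
          if (!pending.containsKey(reference)) {
            missing.add(reference);
          }
        }
      }
      if (!missing.isEmpty()) {
        throw new IllegalStateException("Undefined types: " + missing);
      }
      return new Registry(new LinkedHashMap<>(pending));
    }
  }
}
```

Adding User (referring to Status) and then Status (referring to User) builds without error, whereas building with only User fails because Status is never defined.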



[jira] [Commented] (AVRO-530) allow for mutual recursion in type definitions

2022-07-12 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565449#comment-17565449
 ] 

Christophe Le Saec commented on AVRO-530:
-

Hello,
The linked pull request should resolve the issue (the examples above and in the 
description work in unit tests).

The code change is fairly large.



[jira] [Commented] (AVRO-530) allow for mutual recursion in type definitions

2022-07-12 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565475#comment-17565475
 ] 

Christophe Le Saec commented on AVRO-530:
-

Yes, maybe. I don't know how improvements and new features are usually 
synchronized across all languages, and I won't be able to implement this for 
most of them.



[jira] [Created] (AVRO-3579) Java Test : From Junit4 to JUnit5

2022-07-14 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3579:


 Summary: Java Test : From Junit4 to JUnit5
 Key: AVRO-3579
 URL: https://issues.apache.org/jira/browse/AVRO-3579
 Project: Apache Avro
  Issue Type: Improvement
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec


Progressively migrate from JUnit4 to JUnit5





[jira] [Commented] (AVRO-2918) Schema polymorphism

2022-07-17 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567837#comment-17567837
 ] 

Christophe Le Saec commented on AVRO-2918:
--

The above example would also imply that an array of Person could contain some 
Employee instances (the same goes for a field declared as a Person).
So each record datum that could belong to an extensible record type should 
store its actual type next to its value.

Would this imply modifications to the Encoder/Decoder classes?



> Schema polymorphism
> ---
>
> Key: AVRO-2918
> URL: https://issues.apache.org/jira/browse/AVRO-2918
> Project: Apache Avro
>  Issue Type: New Feature
>  Components: logical types, misc, spec
>Reporter: Jonathan Rapoport
>Priority: Critical
>  Labels: features
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Include the option to use named types as base types for a new schema. Allow 
> for MRO generation. Field inheritance. 
> The benefits of this approach include:
>  * Defining a schema as validation for a certain wire, and so allowing the 
> receiver to be certain of the structure of the data (this works today). 
> However, defining an extension of this schema, or certain schemas which can 
> be normalized to the original schema, but contain additional information, 
> will not allow it to be sent over the same wire.
>  * Backwards compatibility through inheritance - you never break the old 
> schema, thus allowing a long integration period, with no need to recode all 
> processes familiar with the schema. The new schema will simply inherit the 
> old one, and only add information.
>  * Allow for full data control through polymorphism, and the ability to 
> replace structures within any supported language. 





[jira] [Commented] (AVRO-2918) Schema polymorphism

2022-07-18 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568065#comment-17568065
 ] 

Christophe Le Saec commented on AVRO-2918:
--

I thought that having an array of Person should be almost like having an array 
of Union[ Person | Employee ], and the solution should not be far from that 
({_}such as indexing all records in a hierarchy to get the real type{_}).



[jira] [Commented] (AVRO-530) allow for mutual recursion in type definitions

2022-07-19 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568567#comment-17568567
 ] 

Christophe Le Saec commented on AVRO-530:
-

At least it already works for Rust with the Schema::parse_list function :)
Nevertheless, I will create the Jira ticket for the other languages.



[jira] [Created] (AVRO-3584) allow for mutual recursion in type definitions for all languages

2022-07-21 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3584:


 Summary: allow for mutual recursion in type definitions for all 
languages
 Key: AVRO-3584
 URL: https://issues.apache.org/jira/browse/AVRO-3584
 Project: Apache Avro
  Issue Type: Wish
Reporter: Christophe Le Saec


The idea is to allow Avro to load schemas with mutual recursion, described like this:
{code:json}
{"name": "User", "type": "record", "fields": [{"name": "current_status", 
"type": "Status"}]}
{"name": "Status", "type": "record", "fields": [{"name": "author", "type": 
"User"}]}
{code}

- Java : [AVRO-530|https://issues.apache.org/jira/browse/AVRO-530] is ready to 
be checked (PR ready).
- Rust : A unit test shows that it already works (it may be better to add a unit 
test in a PR).

For other languages (C, C++, C#, Python ...); 





[jira] [Commented] (AVRO-3584) allow for mutual recursion in type definitions for all languages

2022-07-21 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569290#comment-17569290
 ] 

Christophe Le Saec commented on AVRO-3584:
--

The Rust unit test works:
{code}
let schema_str_a = r#"{
    "name": "A",
    "type": "record",
    "fields": [ {"name": "field_one", "type": "B"} ]
}"#;

let schema_str_b = r#"{
    "name": "B",
    "type": "record",
    "fields": [ {"name": "field_one", "type": "A"} ]
}"#;

// we get Error::GetNameField if we put ["A", "B"] directly here.
let schema_str_c = r#"{
    "name": "C",
    "type": "record",
    "fields": [ {"name": "field_one", "type": ["A", "B"]} ]
}"#;

let list = Schema::parse_list(&[schema_str_a, schema_str_b, schema_str_c]).unwrap();
{code}

> allow for mutual recursion in type definitions for all languages
> 
>
> Key: AVRO-3584
> URL: https://issues.apache.org/jira/browse/AVRO-3584
> Project: Apache Avro
>  Issue Type: Wish
>Reporter: Christophe Le Saec
>Priority: Major
>
> This idea is to allow AVRO to load schemas with recursion described like this:
> {code:json}
> {"name": "User", "type": "record", "fields": [{"name": "current_status", 
> "type": "Status"}]}
> {"name": "Status", "type": "record", "fields": [{"name": "author", "type": 
> "User"}]}
> {code}
> - Java : [AVRO-530|https://issues.apache.org/jira/browse/AVRO-530] is ready 
> to checked (PR ready).
> - Rust : Unit test show that it already works (May be better to add a unit 
> test in a PR).
> For others languages (C, C++, C#, Python ...); 





[jira] [Commented] (AVRO-3584) allow for mutual recursion in type definitions for all languages

2022-07-21 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569307#comment-17569307
 ] 

Christophe Le Saec commented on AVRO-3584:
--

Hi Martin, I'll create it



[jira] [Commented] (AVRO-3584) allow for mutual recursion in type definitions for all languages

2022-07-21 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569311#comment-17569311
 ] 

Christophe Le Saec commented on AVRO-3584:
--

[Pull request added|https://github.com/apache/avro/pull/1775]



[jira] [Commented] (AVRO-2918) Schema polymorphism

2022-07-21 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569480#comment-17569480
 ] 

Christophe Le Saec commented on AVRO-2918:
--

As a possible example of polymorphism, I built this 
[pull request|https://github.com/apache/avro/pull/1776].
It uses a discriminator when a value is a subtype of the declared type (for 
example, if a field is declared as Person and its actual value is an Employee, 
or in an array of Person, each Employee instance will use a discriminator).
 - For a set of schemas, sub-schemas are stored in Names during parsing 
(parse(JsonNode schema, Names names)).
Otherwise, the Protocol class does the job ({_}this is a problem I already faced 
when building a PR for AVRO-530: allowing a schema to be used before it is 
defined, and so allowing mutual recursion{_}).
 - Encoding and decoding are taken into account (more tests are needed).
 - I didn't test the generated code.

(_The code is inspired by what was already done for Union_)
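
The discriminator idea can be sketched in plain Java as follows. This is not the code of the pull request: the class DiscriminatorSketch, its fixed Person/Employee hierarchy, and both method names are assumptions made for illustration, loosely modelled on how a union writes the index of the selected branch before the value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the implementation from the pull request.
class DiscriminatorSketch {

  // Index 0 is the declared type; its subtypes follow.
  static final List<String> HIERARCHY = Arrays.asList("Person", "Employee");

  // Returns the discriminator that would be written before the record's fields.
  static int discriminatorFor(String actualType) {
    int index = HIERARCHY.indexOf(actualType);
    if (index < 0) {
      throw new IllegalArgumentException(
          actualType + " is not in the hierarchy of " + HIERARCHY.get(0));
    }
    return index;
  }

  // Maps a discriminator read from the data back to the actual record type.
  static String typeFor(int discriminator) {
    return HIERARCHY.get(discriminator);
  }
}
```

A reader that sees discriminator 1 for a field declared as Person would then know to decode the value with the Employee schema.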



[jira] [Commented] (AVRO-3579) Java Test : From Junit4 to JUnit5

2022-07-22 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569994#comment-17569994
 ] 

Christophe Le Saec commented on AVRO-3579:
--

New [PR for step 2|https://github.com/apache/avro/pull/1777], based on 
[rewrite-testing-frameworks|https://github.com/openrewrite/rewrite-testing-frameworks],
but it does not do everything, so further steps will be needed.

> Java Test : From Junit4 to JUnit5
> -
>
> Key: AVRO-3579
> URL: https://issues.apache.org/jira/browse/AVRO-3579
> Project: Apache Avro
>  Issue Type: Improvement
>Reporter: Christophe Le Saec
>Assignee: Christophe Le Saec
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Progressively migrate from JUnit4 to JUnit5





[jira] [Commented] (AVRO-3532) Align naming rules on code

2022-07-26 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571519#comment-17571519
 ] 

Christophe Le Saec commented on AVRO-3532:
--

My first intention is to keep the Java implementation of 
[validateName|https://github.com/clesaec/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L1593]
and change the documentation to make this code the official rule.
The objective is to avoid introducing a breaking change (in the Java 
implementation at least).
So, for now, I am adding a unit test for that and only touching the Rust code 
to adapt it.
I will have a look at the C/C++ code and read the article you referred to, Oscar.

> Align naming rules on code
> --
>
> Key: AVRO-3532
> URL: https://issues.apache.org/jira/browse/AVRO-3532
> Project: Apache Avro
>  Issue Type: Wish
>Reporter: Christophe Le Saec
>Priority: Major
>
> The [naming rule in the 
> documentation|https://avro.apache.org/docs/current/spec.html#names] is
> {noformat}
> - start with [A-Za-z_]
> - subsequently contain only [A-Za-z0-9_]
> {noformat}
> But the [Java 
> code|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L1578]
>  uses the Character.isLetter method
> {code:java}
> char first = name.charAt(0);
> if (!(Character.isLetter(first) || first == '_'))
>   throw new SchemaParseException("Illegal initial character: " + name);
> for (int i = 1; i < length; i++) {
>   char c = name.charAt(i);
>   if (!(Character.isLetterOrDigit(c) || c == '_'))
>     throw new SchemaParseException("Illegal character in: " + name);
> }
> return name;
> {code}
> This method accepts accented letters (éùàçË ...) and also Chinese characters 
> (我) ...
> So the aim of this ticket is to see whether we can update the documentation, 
> and whether other implementations (Rust, C# ...) are also compatible with it.





[jira] [Commented] (AVRO-3532) Align naming rules on code

2022-07-27 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571895#comment-17571895
 ] 

Christophe Le Saec commented on AVRO-3532:
--

For C (as in Java), the null character should stay forbidden. I tested accented 
characters, and they do not work as expected because of the 
[is_avro_id|https://github.com/apache/avro/blob/master/lang/c/src/schema.c#L49] 
function. I wonder whether we could use an external library such as 
[ICU|https://unicode-org.github.io/icu/], with a function like 
[u_isUAlphabetic()|https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/uchar_8h.html#a063b8b8c01c1c8246682dd81dd46da00].
If so, how do we add such a dependency in C? (What is the Maven equivalent? ;) )



[jira] [Commented] (AVRO-3532) Align naming rules on code

2022-07-28 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572334#comment-17572334
 ] 

Christophe Le Saec commented on AVRO-3532:
--

About [ICU|https://unicode-org.github.io/icu/], do I even have the right to use 
it, given its [license|https://www.unicode.org/copyright.html]?



[jira] [Commented] (AVRO-3532) Align naming rules on code

2022-07-28 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572777#comment-17572777
 ] 

Christophe Le Saec commented on AVRO-3532:
--

The other point to take into account is that the current Avro Java code ({_}at 
least from version 1.8.2 to 1.11.0{_}) already accepts field names such as 
{*}"Âge"{*}, because the "Character.isLetter" method it uses returns true for Â.
So the idea here is to keep this code, change the documentation, and adapt the 
other languages.

(FYI: the Apache Arrow project [limits field names to 
UTF8|https://arrow.apache.org/docs/format/Columnar.html#struct-layout]: 
"{_}Each field must have a UTF8-encoded name{_}")
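
The difference between the two rules can be checked with a small standalone sketch. The class NameRuleSketch and its two methods are hypothetical names introduced here. The first method applies the documented pattern, the second repeats the character checks from the Java snippet in the ticket description but returns a boolean instead of throwing. The empty-name check is an added assumption.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the documented rule and the Java behaviour.
class NameRuleSketch {

  // The rule as written in the specification.
  private static final Pattern SPEC_RULE = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  static boolean validBySpec(String name) {
    return SPEC_RULE.matcher(name).matches();
  }

  // Same character checks as the quoted Java code, without exceptions.
  static boolean validByJavaCode(String name) {
    if (name.isEmpty()) {
      return false; // assumption: an empty name is rejected
    }
    char first = name.charAt(0);
    if (!(Character.isLetter(first) || first == '_')) {
      return false;
    }
    for (int i = 1; i < name.length(); i++) {
      char c = name.charAt(i);
      if (!(Character.isLetterOrDigit(c) || c == '_')) {
        return false;
      }
    }
    return true;
  }
}
```

For the name "Âge" (written "\u00C2ge" in source to avoid encoding problems), validBySpec returns false while validByJavaCode returns true, which is exactly the mismatch this ticket is about. Both methods agree on plain ASCII names such as "age_1".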



[jira] [Commented] (AVRO-3532) Align naming rules on code

2022-08-01 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573641#comment-17573641
 ] 

Christophe Le Saec commented on AVRO-3532:
--

OK, thanks [~kniemitalo], so it would be fine to use the Unicode license.

Nevertheless, I finally succeeded in building a first PR in C without a Unicode 
library, using the mbstowcs and iswalpha C functions ({_}just with a weird 
setlocale call, which I don't know where to put{_}).



[jira] [Updated] (AVRO-2796) Generated schema class can't be compiled: code too large

2022-08-03 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec updated AVRO-2796:
-
Attachment: recordfixed.avsc

> Generated schema class can't be compiled: code too large
> 
>
> Key: AVRO-2796
> URL: https://issues.apache.org/jira/browse/AVRO-2796
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.9.2
> Environment: {code:java}
> $ uname -a
> Linux barlnx 5.4.28-gentoo-PRI #1 SMP PREEMPT Mon Mar 30 19:39:16 EEST 2020 
> x86_64 Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz GenuineIntel GNU/Linux{code}
> {code:java}
> $ java -version
> java version "1.8.0_202"
> Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
> Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode){code}
>Reporter: DMYTRO TRUNYKOV
>Priority: Major
> Attachments: record.avsc, recordfixed.avsc
>
>
> Hi,
> I have a schema (see the attachment) that can be compiled into Java source code 
> and then compiled by javac.
> This works with Avro release 1.8.2, but fails with 1.9.2.
> Below are examples.
> Happy path (AVRO version 1.8.2):
> {code:java}
> $ java -jar avro-tools-1.8.2.jar compile -string schema record.avsc 
> generated_sources/
> Input files to compile:
>  record.avsc
> log4j:WARN No appenders could be found for logger (AvroVelocityLogChute).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> $ javac -cp avro-tools-1.8.2.jar generated_sources/example/avro/therecord.java
> $ tree generated_sources/
> generated_sources/
> └── example
>  └── avro
>  ├── therecord$1.class
>  ├── therecord$Builder.class
>  ├── therecord.class
>  └── therecord.java
> {code}
>  
> Unhappy path (AVRO version 1.9.2):
> {code:java}
> $ java -jar avro-tools-1.9.2.jar compile -string schema record.avsc 
> generated_sources/
> Input files to compile:
>  record.avsc
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'resource.loader' has been deprecated in favor of 'resource.loaders'
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'class.resource.loader.class' has been deprecated in favor of 'resour
> ce.loader.class.class'
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'file.resource.loader.class' has been deprecated in favor of 'resourc
> e.loader.file.class'
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'file.resource.loader.path' has been deprecated in favor of 'resource
> .loader.file.path'
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'runtime.references.strict' has been deprecated in favor of 'runtime.
> strict_mode.enable'
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'space.gobbling' has been deprecated in favor of 'parser.space_gobbling'
> 20/04/12 13:51:27 WARN specific.SpecificCompiler: Record 
> 'example.avro.therecord' contains more than 254 parameters which exceeds the 
> JVM spec for the number of permitted constructor arguments. Clients must rely 
> on the builder pattern to create objects instead. For more info see JIRA 
> ticket AVRO-1642.
> $ javac -cp avro-tools-1.9.2.jar generated_sources/example/avro/therecord.java
> generated_sources/example/avro/therecord.java:24600: error: code too large
>  private Builder(example.avro.therecord.Builder other) {
>  ^
> generated_sources/example/avro/therecord.java:90993: error: code too large
>  @Override public void customDecode(org.apache.avro.io.ResolvingDecoder in)
>  ^
> 2 errors
> {code}





[jira] [Commented] (AVRO-2796) Generated schema class can't be compiled: code too large

2022-08-03 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574719#comment-17574719
 ] 

Christophe Le Saec commented on AVRO-2796:
--

The model in [^record.avsc] looks tricky. The [^recordfixed.avsc] model simply replaces 
long lists of fields such as "prop1", "prop2" ... "prop100" ... with a single field, props, 
that is an array of ... That is enough to make it work. There is no need for 
sub-records such as "mobile", "city" ... even though the model could be better with them.

So, what about closing this ticket?



[jira] [Updated] (AVRO-2796) Generated schema class can't be compiled: code too large

2022-08-04 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec updated AVRO-2796:
-
Attachment: therecord.java

> Generated schema class can't be compiled: code too large
> 
>
> Key: AVRO-2796
> URL: https://issues.apache.org/jira/browse/AVRO-2796
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.9.2
> Environment: {code:java}
> $ uname -a
> Linux barlnx 5.4.28-gentoo-PRI #1 SMP PREEMPT Mon Mar 30 19:39:16 EEST 2020 
> x86_64 Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz GenuineIntel GNU/Linux{code}
> {code:java}
> $ java -version
> java version "1.8.0_202"
> Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
> Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode){code}
>Reporter: DMYTRO TRUNYKOV
>Priority: Major
> Attachments: record.avsc, recordfixed.avsc, therecord.java
>
>
> Hi,
> I have a schema (see the attachment) that can be compiled into Java source code 
> and then compiled by javac.
> This works with Avro release 1.8.2, but fails with 1.9.2.
> Below are examples.
> Happy path (AVRO version 1.8.2):
> {code:java}
> $ java -jar avro-tools-1.8.2.jar compile -string schema record.avsc 
> generated_sources/
> Input files to compile:
>  record.avsc
> log4j:WARN No appenders could be found for logger (AvroVelocityLogChute).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> $ javac -cp avro-tools-1.8.2.jar generated_sources/example/avro/therecord.java
> $ tree generated_sources/
> generated_sources/
> └── example
>  └── avro
>  ├── therecord$1.class
>  ├── therecord$Builder.class
>  ├── therecord.class
>  └── therecord.java
> {code}
>  
> Unhappy path (AVRO version 1.9.2):
> {code:java}
> $ java -jar avro-tools-1.9.2.jar compile -string schema record.avsc 
> generated_sources/
> Input files to compile:
>  record.avsc
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'resource.loader' has been deprecated in favor of 'resource.loaders'
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'class.resource.loader.class' has been deprecated in favor of 'resour
> ce.loader.class.class'
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'file.resource.loader.class' has been deprecated in favor of 'resourc
> e.loader.file.class'
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'file.resource.loader.path' has been deprecated in favor of 'resource
> .loader.file.path'
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'runtime.references.strict' has been deprecated in favor of 'runtime.
> strict_mode.enable'
> 20/04/12 13:51:27 WARN velocity.deprecation: configuration key 
> 'space.gobbling' has been deprecated in favor of 'parser.space_gobbling'
> 20/04/12 13:51:27 WARN specific.SpecificCompiler: Record 
> 'example.avro.therecord' contains more than 254 parameters which exceeds the 
> JVM spec for the number of permitted constructor arguments. Clients must rely 
> on the builder pattern to create objects instead. For more info see JIRA 
> ticket AVRO-1642.
> $ javac -cp avro-tools-1.9.2.jar generated_sources/example/avro/therecord.java
> generated_sources/example/avro/therecord.java:24600: error: code too large
>  private Builder(example.avro.therecord.Builder other) {
>  ^
> generated_sources/example/avro/therecord.java:90993: error: code too large
>  @Override public void customDecode(org.apache.avro.io.ResolvingDecoder in)
>  ^
> 2 errors
> {code}





[jira] [Commented] (AVRO-2796) Generated schema class can't be compiled: code too large

2022-08-05 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575609#comment-17575609
 ] 

Christophe Le Saec commented on AVRO-2796:
--

The purpose is to transform fields like
{code:json}
{"name": "prop1", "type": ["null", "string"]},
{"name": "prop2", "type": ["null", "string"]},
{"name": "prop3", "type": ["null", "string"]},{code}
which give Java code like
{code:java}
private java.lang.CharSequence prop1;
private java.lang.CharSequence prop2;
private java.lang.CharSequence prop3;
{code}
into
{code:json}
{"name": "props", "type": { "type": "array", "items": ["null", "string"]}},
{code}
which gives
{code:java}
private java.util.List props;
{code}
So there is no loss of type here (see [^therecord.java] for the generated Java 
code).

 

The second goal (not done in the example) was to transform fields like
{code:json}
 {"name": "mobile_id", "type": ["null", "int"]},
 {"name": "mobileacquisitionclicks", "type": ["null", "string"]},
 {"name": "mobileaction", "type": ["null", "string"]},
 {"name": "mobileactioninapptime", "type": ["null", "string"]},
 {"name": "mobileactiontotaltime", "type": ["null", "string"]},
 {"name": "mobileappid", "type": ["null", "string"]},
 {"name": "mobileappperformanceaffectedusers", "type": ["null", "string"]},
 {"name": "mobileappperformanceappid", "type": ["null", "string"]} ...
{code}
into a mobile sub-record, something like:
{code:json}
{
  "name": "mobile",
  "type": {
    "type": "record",
    "name": "mobile",
    "fields": [
      { "name": "id", "type": [ "null", "int" ] },
      { "name": "acquisitionclicks", "type": [ "null", "string" ] },
      { "name": "action", "type": [ "null", "string" ] },
      { "name": "actioninapptime", "type": [ "null", "string" ] },
      { "name": "actiontotaltime", "type": [ "null", "string" ] },
      { "name": "mobileappid", "type": [ "null", "string" ] },
      { "name": "appperformanceaffectedusers", "type": [ "null", "string" ] },
      { "name": "appperformanceappid", "type": [ "null", "string" ] } ...
    ]
  }
}
{code}
So, here again, there is no loss of typing.
I hope this explanation is clearer than the previous one :)
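
As a rough illustration of the second transformation, the sketch below derives the field names of a nested record from a flat list of names that share a prefix. It works on plain strings only; the class and method names (PrefixGrouping, nestedNames) and the rule of dropping one optional "_" separator are assumptions made for this example, not something taken from the attached schemas or from Avro.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixGrouping {
    // Returns the names a nested record could use: every flat name that starts
    // with the prefix, with the prefix (and one optional '_' separator) removed.
    // Names that do not start with the prefix are left out.
    static List<String> nestedNames(List<String> flatNames, String prefix) {
        List<String> nested = new ArrayList<>();
        for (String name : flatNames) {
            if (name.startsWith(prefix) && name.length() > prefix.length()) {
                String rest = name.substring(prefix.length());
                if (rest.startsWith("_") && rest.length() > 1) {
                    rest = rest.substring(1);
                }
                nested.add(rest);
            }
        }
        return nested;
    }

    public static void main(String[] args) {
        List<String> flat = Arrays.asList(
            "mobile_id", "mobileaction", "mobileactiontotaltime", "city");
        // prints [id, action, actiontotaltime]
        System.out.println(nestedNames(flat, "mobile"));
    }
}
```

Each group found this way would become one record-typed field, so the generated top-level class has far fewer members and its builder and decode methods shrink accordingly.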


[jira] [Work started] (AVRO-3618) [Java] TestBinaryDecoder should check consistency with directBinaryDecoder

2022-08-29 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-3618 started by Christophe Le Saec.

> [Java] TestBinaryDecoder should check consistency with directBinaryDecoder
> --
>
> Key: AVRO-3618
> URL: https://issues.apache.org/jira/browse/AVRO-3618
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Reporter: Ryan Skraba
>Assignee: Christophe Le Saec
>Priority: Major
>  Labels: starter
> Fix For: 1.12.0
>
>
> The unit tests for TestBinaryDecoder were originally parameterized so that 
> _every_ test verifies that {{*BinaryDecoder*}} and {{*DirectBinaryDecoder*}} 
> show the same behaviour.
> In the meantime, some tests have been added or modified that only test 
> BinaryDecoder (twice).  Where possible, this should be fixed so that both 
> classes are checked and that their behaviour is the same.
> Notably: the {{testNegativeBytesLength}} test throws a different exception when 
> DirectBinaryDecoder is used.
> Please pay special attention around the tests deserializing invalid binary 
> data: this might be an unrecoverable error for Avro that throws a runtime 
> exception but shouldn't consume unnecessary resources or cause 
> OutOfMemoryErrors.
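
The kind of check asked for here can be sketched without Avro at all: run the same input through two implementations and compare either the decoded value or the type of exception. In the sketch below everything is hypothetical (the DecoderConsistency class, the LongReader interface and both readers); the two readers only stand in for a buffered and a direct decoder and decode a zig-zag variable-length long, the format Avro uses for long values.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class DecoderConsistency {
    /** Hypothetical stand-in for a decoder under test. */
    interface LongReader {
        long read(byte[] data) throws IOException;
    }

    // Variant 1: decodes a zig-zag varint directly from the array.
    static long readFromArray(byte[] data) throws IOException {
        long raw = 0;
        int pos = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= data.length) {
                throw new EOFException("truncated varint");
            }
            int b = data[pos++] & 0xff;
            raw |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
            }
        }
        throw new IOException("varint too long");
    }

    // Variant 2: decodes the same format byte by byte from a stream.
    static long readFromStream(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("truncated varint");
            }
            raw |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
            }
        }
        throw new IOException("varint too long");
    }

    // Summarises a run as either the decoded value or the exception type.
    static String outcome(LongReader reader, byte[] data) {
        try {
            return "value:" + reader.read(data);
        } catch (IOException e) {
            return "error:" + e.getClass().getSimpleName();
        }
    }

    // True when both variants agree, on valid and on invalid input alike.
    static boolean consistent(byte[] data) {
        return outcome(DecoderConsistency::readFromArray, data)
            .equals(outcome(DecoderConsistency::readFromStream, data));
    }

    public static void main(String[] args) {
        byte[][] inputs = {{0x02}, {0x01}, {(byte) 0x80, 0x01}, {(byte) 0x80}};
        for (byte[] input : inputs) {
            System.out.println(outcome(DecoderConsistency::readFromArray, input)
                + " consistent=" + consistent(input));
        }
    }
}
```

Comparing outcomes rather than values alone is what catches a case like testNegativeBytesLength, where both implementations fail but with different exceptions.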





[jira] [Assigned] (AVRO-3618) [Java] TestBinaryDecoder should check consistency with directBinaryDecoder

2022-08-29 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec reassigned AVRO-3618:


Assignee: Christophe Le Saec



[jira] [Commented] (AVRO-3618) [Java] TestBinaryDecoder should check consistency with directBinaryDecoder

2022-08-29 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597073#comment-17597073
 ] 

Christophe Le Saec commented on AVRO-3618:
--

Hi Ryan,
Here are some small updates to the unit tests so that they use the newDecoder function ({_}and so 
DirectBinaryDecoder{_}), ahead of the conversion to JUnit 5 in AVRO-3579.
 



[jira] [Assigned] (AVRO-3597) Recent changes in GenericDatumReader.java break compatibility

2022-08-30 Thread Christophe Le Saec (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christophe Le Saec reassigned AVRO-3597:


Assignee: Christophe Le Saec

> Recent changes in GenericDatumReader.java break compatibility
> -
>
> Key: AVRO-3597
> URL: https://issues.apache.org/jira/browse/AVRO-3597
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.11.1
>Reporter: Viktor Dvoretskii
>Assignee: Christophe Le Saec
>Priority: Major
>
> We use a custom SpecificDatumReader which overrides the string creation 
> logic. It looks like this:
> {code:java}
> class CustomSpecificDatumReader extends SpecificDatumReader {
>     @Override
>     protected Object newInstanceFromString(Class c, String s) {
>         // custom logic
>     }
> } {code}
> With [this 
> commit|https://github.com/apache/avro/commit/820ed6e5ea4417b5735078bfd26c99f1305ea363],
>  the newInstanceFromString() method is no longer called. Instead, strings are 
> created with the hard-coded logic within GenericDatumReader.ReaderCache.
> It would be appreciated if you made it possible to override the string 
> creation logic used by GenericDatumReader.ReaderCache. Preferably make use of 
> the newInstanceFromString() method again to maintain backward compatibility.
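
The request boils down to keeping an overridable hook on the read path. A minimal sketch of that pattern follows; BaseReader, UpperCaseReader and readString are hypothetical names used only for illustration, and this is not the code of GenericDatumReader or of its ReaderCache.

```java
import java.util.Locale;

class OverrideHookSketch {
    /** Hypothetical base reader; not Avro's GenericDatumReader. */
    static class BaseReader {
        // Overridable hook, playing the role of newInstanceFromString.
        protected Object newInstanceFromString(Class<?> c, String s) {
            return s;
        }

        // The read path delegates to the hook, so a subclass override is honoured.
        public Object readString(Class<?> c, String raw) {
            return newInstanceFromString(c, raw);
        }
    }

    /** Hypothetical subclass with custom string creation logic. */
    static class UpperCaseReader extends BaseReader {
        @Override
        protected Object newInstanceFromString(Class<?> c, String s) {
            return s.toUpperCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseReader().readString(String.class, "abc"));
        System.out.println(new UpperCaseReader().readString(String.class, "abc"));
    }
}
```

If the read path built strings itself instead of calling the hook, the subclass override would silently stop having any effect, which is the regression described in this issue.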





[jira] [Commented] (AVRO-3597) Recent changes in GenericDatumReader.java break compatibility

2022-08-31 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598230#comment-17598230
 ] 

Christophe Le Saec commented on AVRO-3597:
--

Attached PR should fix the issue.



[jira] [Commented] (AVRO-3617) [C++] Integer overflow risks with Validator::count_ and Validator::counters_

2022-09-01 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598961#comment-17598961
 ] 

Christophe Le Saec commented on AVRO-3617:
--

I think the [C++ size_t 
documentation|https://en.cppreference.com/w/cpp/types/size_t] encourages using 
size_t for all collection counts, file sizes ... So Validator::count_ could be 
declared as size_t (keeping int32 / int64 for business data where 
convenient). WDYT?

> [C++] Integer overflow risks with Validator::count_ and Validator::counters_
> 
>
> Key: AVRO-3617
> URL: https://issues.apache.org/jira/browse/AVRO-3617
> Project: Apache Avro
>  Issue Type: Bug
>  Components: c++
>Reporter: Kalle Niemitalo
>Priority: Minor
>
> In Validator, there seems to be some inconsistency with {{std::vector<size_t> 
> counters_}} and {{int64_t count_}}:
> - Validator::countingSetup converts int64_t to size_t: 
> {{counters_.push_back(static_cast<size_t>(count_));}}
> - Validator::countingAdvance converts size_t to int: {{int count = 
> --counters_.back();}}
> - Validator::unionAdvance converts size_t to int64_t: {{if (count_ < 
> static_cast<int64_t>(node->leaves()))}}
> - Validator::unionAdvance converts int64_t to int and that to size_t: 
> {{setupOperation(node->leafAt(static_cast<int>(count_)));}}
> I did not verify whether these integers can actually grow so high that 
> overflow is possible. Nevertheless, it would be safest to use integer types 
> consistently.
> (Originally posted as 
> [https://github.com/apache/avro/pull/1836#issuecomment-1225303643].)





[jira] [Commented] (AVRO-3591) Improve interoperability tests with a common test suite

2022-09-01 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599038#comment-17599038
 ] 

Christophe Le Saec commented on AVRO-3591:
--

Hi [~rskraba],

To make the idea concrete, I started this [PR|https://github.com/apache/avro/pull/1850] 
with a shared folder "share/test/data/schemas" that contains one subfolder 
per test, each holding two files (schema.json & data.avro), along with a [JUnit 
test|https://github.com/apache/avro/blob/4688029b5bd32a94ae686245a195bf73ecac180e/lang/java/avro/src/test/java/org/apache/avro/TestSchemaCommons.java#L27]
 that walks all subfolders and applies read/write operations.
Does that correspond to your idea? (There is no schema-tests.txt file in this 
first try.)
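
For readers who have not opened the PR, the folder walk can be sketched with the JDK alone. The class and method names below (SharedTestFolders, completeTestCases) are assumptions for illustration and this is not the code of TestSchemaCommons; only the folder layout and the two file names come from the description above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SharedTestFolders {
    // Returns, sorted by name, the sub-folders of root that contain both
    // schema.json and data.avro; incomplete folders are skipped.
    static List<String> completeTestCases(Path root) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path dir : entries) {
                if (!Files.isDirectory(dir)) {
                    continue;
                }
                if (Files.isRegularFile(dir.resolve("schema.json"))
                        && Files.isRegularFile(dir.resolve("data.avro"))) {
                    names.add(dir.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the comment above; pass another root as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "share/test/data/schemas");
        if (!Files.isDirectory(root)) {
            System.out.println("No such folder: " + root);
            return;
        }
        for (String name : completeTestCases(root)) {
            System.out.println("test case: " + name);
        }
    }
}
```

Each SDK could run the same walk in its own language and then apply its read/write checks to every folder it finds, so adding a test case means adding one folder.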

> Improve interoperability tests with a common test suite
> ---
>
> Key: AVRO-3591
> URL: https://issues.apache.org/jira/browse/AVRO-3591
> Project: Apache Avro
>  Issue Type: New Feature
>Reporter: Ryan Skraba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We sometimes see interoperability issues like FLINK-25962.  There are dozens 
> (or hundreds) of test schemas and data in the various language SDKs. 
> It would be a huge win if we had a way to share test schemas between 
> different SDKs and versions.
> The {{share/test/data/schema-tests.txt}} is a good start.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (AVRO-3617) [C++] Integer overflow risks with Validator::count_ and Validator::counters_

2022-09-01 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599047#comment-17599047
 ] 

Christophe Le Saec commented on AVRO-3617:
--

(y)



[jira] [Commented] (AVRO-3617) [C++] Integer overflow risks with Validator::count_ and Validator::counters_

2022-09-02 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599508#comment-17599508
 ] 

Christophe Le Saec commented on AVRO-3617:
--

[~kniemitalo]: I tried starting a branch [with this 
PR|https://github.com/apache/avro/pull/1852] just to check whether the warnings could be 
easily removed, but Travis shows me none. Are you using any tools or build options 
to show the warnings?

> [C++] Integer overflow risks with Validator::count_ and Validator::counters_
> 
>
> Key: AVRO-3617
> URL: https://issues.apache.org/jira/browse/AVRO-3617
> Project: Apache Avro
>  Issue Type: Bug
>  Components: c++
>Reporter: Kalle Niemitalo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In Validator, there seems to be some inconsistency with {{std::vector<size_t> 
> counters_}} and {{int64_t count_}}:
> - Validator::countingSetup converts int64_t to size_t: 
> {{counters_.push_back(static_cast<size_t>(count_));}}
> - Validator::countingAdvance converts size_t to int: {{int count = 
> --counters_.back();}}
> - Validator::unionAdvance converts size_t to int64_t: {{if (count_ < 
> static_cast<int64_t>(node->leaves()))}}
> - Validator::unionAdvance converts int64_t to int and that to size_t: 
> {{setupOperation(node->leafAt(static_cast<int>(count_)));}}
> I did not verify whether these integers can actually grow so high that 
> overflow is possible. Nevertheless, it would be safest to use integer types 
> consistently.
> (Originally posted as 
> [https://github.com/apache/avro/pull/1836#issuecomment-1225303643].)





[jira] [Commented] (AVRO-3617) [C++] Integer overflow risks with Validator::count_ and Validator::counters_

2022-09-02 Thread Christophe Le Saec (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599549#comment-17599549
 ] 

Christophe Le Saec commented on AVRO-3617:
--

OK, I found a way: adding the -Wconversion option next to -Wall in the file 
"lang/c++/CMakeLists.txt" makes the build show the warnings.






[jira] [Created] (AVRO-3704) Naming rules : multiple choice

2023-01-11 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3704:


 Summary: Naming rules : multiple choice
 Key: AVRO-3704
 URL: https://issues.apache.org/jira/browse/AVRO-3704
 Project: Apache Avro
  Issue Type: Improvement
  Components: java
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec


Following a long discussion about [naming 
rules|https://lists.apache.org/thread/39v98os6wdpyr6w31xdkz0yzol51fsrr], and 
given that name checking currently differs between modules (Rust follows the 
current doc, Java is more flexible and accepts accents) and that the check can 
now be deactivated, I propose a change that lets the user choose a "name 
controller": either one of several provided ones (no control, the current 
controller, a strict controller that follows the doc) or a custom one supplied 
by the client.
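
The choice described above could be modelled as a small strategy interface. The following is only a sketch under stated assumptions: {{NameCheck}}, {{NO_CHECK}}, {{UNICODE_CHECK}} and {{STRICT_CHECK}} are hypothetical names, not Avro's API. The strict variant assumes the rule from the specification (a name starts with [A-Za-z_] and continues with [A-Za-z0-9_]), and the "current" variant assumes a check based on Java's Unicode letter test, which is what lets accented characters through.

```java
// Hypothetical sketch: none of these names belong to Avro's API.
interface NameCheck {
  boolean isValid(String name);

  // No control: every name is accepted.
  NameCheck NO_CHECK = name -> true;

  // Assumed "current Java" behaviour: Unicode letters, including accents, are accepted.
  NameCheck UNICODE_CHECK = name -> {
    if (name == null || name.isEmpty()) {
      return false;
    }
    char first = name.charAt(0);
    if (!(Character.isLetter(first) || first == '_')) {
      return false;
    }
    for (int i = 1; i < name.length(); i++) {
      char c = name.charAt(i);
      if (!(Character.isLetterOrDigit(c) || c == '_')) {
        return false;
      }
    }
    return true;
  };

  // Strict: ASCII-only rule taken from the documentation.
  NameCheck STRICT_CHECK = name -> name != null && name.matches("[A-Za-z_][A-Za-z0-9_]*");
}

class NameCheckDemo {
  public static void main(String[] args) {
    String accented = "pr\u00e9nom";
    System.out.println("no check:      " + NameCheck.NO_CHECK.isValid(accented));
    System.out.println("unicode check: " + NameCheck.UNICODE_CHECK.isValid(accented));
    System.out.println("strict check:  " + NameCheck.STRICT_CHECK.isValid(accented));

    // A client-supplied rule is just another implementation of the interface.
    NameCheck shortStrict = name -> NameCheck.STRICT_CHECK.isValid(name) && name.length() <= 32;
    System.out.println("custom check:  " + shortStrict.isValid("value"));
  }
}
```

Keeping the rule behind a single-method interface means a custom controller needs no subclassing, only a lambda.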





[jira] [Created] (AVRO-3779) Any big decimal conversion

2023-06-13 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3779:


 Summary: Any big decimal conversion
 Key: AVRO-3779
 URL: https://issues.apache.org/jira/browse/AVRO-3779
 Project: Apache Avro
  Issue Type: Improvement
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec


The current big decimal conversion requires a predetermined precision and 
scale, which is fine when values don't vary too much but is a problem otherwise.

The idea is to build a binary Big Decimal conversion that extracts the 
precision and scale from the value itself.
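
As an illustration of the idea, here is a minimal sketch of a self-describing binary form. The byte layout (a 4-byte scale followed by the unscaled value in two's complement) and the class name {{SelfDescribingDecimal}} are assumptions made for this example, not the encoding used by Avro.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical sketch; the byte layout is an assumption, not Avro's wire format.
class SelfDescribingDecimal {

  // Layout: 4-byte big-endian scale, then the unscaled value as two's-complement bytes.
  static byte[] toBytes(BigDecimal value) {
    byte[] unscaled = value.unscaledValue().toByteArray();
    return ByteBuffer.allocate(Integer.BYTES + unscaled.length)
        .putInt(value.scale())
        .put(unscaled)
        .array();
  }

  // Reads the scale back from the payload, so no schema-level precision or scale is needed.
  static BigDecimal fromBytes(byte[] bytes) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    int scale = buffer.getInt();
    byte[] unscaled = new byte[buffer.remaining()];
    buffer.get(unscaled);
    return new BigDecimal(new BigInteger(unscaled), scale);
  }

  public static void main(String[] args) {
    BigDecimal small = new BigDecimal("0.00000000012");
    BigDecimal large = new BigDecimal("123456789012345678901234567890.5");
    System.out.println(fromBytes(toBytes(small)).equals(small));
    System.out.println(fromBytes(toBytes(large)).equals(large));
  }
}
```

Because the scale travels with each value, two values with very different magnitudes can share the same schema.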





[jira] [Created] (AVRO-3801) Modify Schema.Parser.addTypes signature

2023-07-17 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3801:


 Summary: Modify Schema.Parser.addTypes signature
 Key: AVRO-3801
 URL: https://issues.apache.org/jira/browse/AVRO-3801
 Project: Apache Avro
  Issue Type: Improvement
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec


In the Java module, the method addTypes is defined with a Map as its input 
parameter:
{code:java}
public Parser addTypes(Map<String, Schema> types) {
{code}
But the key is never used.
So a very simple improvement is to change it to 
{code:java}
public Parser addTypes(Iterable<Schema> types) {
{code}
(keeping the current one as deprecated for backward compatibility).
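
A minimal sketch of how the two overloads could coexist is shown here. {{MiniSchema}} and {{MiniParser}} are hypothetical stand-ins so the example compiles without Avro on the classpath; they are not the real Schema and Schema.Parser classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a named schema.
class MiniSchema {
  final String fullName;

  MiniSchema(String fullName) {
    this.fullName = fullName;
  }
}

// Hypothetical stand-in for the parser that collects known types.
class MiniParser {
  private final Map<String, MiniSchema> known = new LinkedHashMap<>();

  // Old signature, kept for backward compatibility; the map keys are ignored.
  @Deprecated
  MiniParser addTypes(Map<String, MiniSchema> types) {
    return addTypes(types.values());
  }

  // New signature: only the schemas themselves are needed.
  MiniParser addTypes(Iterable<MiniSchema> types) {
    for (MiniSchema type : types) {
      known.put(type.fullName, type);
    }
    return this;
  }

  boolean knows(String fullName) {
    return known.containsKey(fullName);
  }
}

class AddTypesDemo {
  @SuppressWarnings("deprecation")
  public static void main(String[] args) {
    MiniParser parser = new MiniParser();
    parser.addTypes(Arrays.asList(new MiniSchema("ns.A"), new MiniSchema("ns.B")));

    // Existing callers that still pass a Map keep working.
    Map<String, MiniSchema> legacy = new LinkedHashMap<>();
    legacy.put("ignored key", new MiniSchema("ns.C"));
    parser.addTypes(legacy);

    System.out.println(parser.knows("ns.A") && parser.knows("ns.C"));
  }
}
```

Since a Map is not an Iterable, the two overloads do not clash, and the deprecated one can simply delegate to the new one.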







[jira] [Created] (AVRO-3813) Use list of primitiv

2023-07-24 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3813:


 Summary: Use list of primitiv
 Key: AVRO-3813
 URL: https://issues.apache.org/jira/browse/AVRO-3813
 Project: Apache Avro
  Issue Type: Sub-task
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec


*Implementation of second part of the JIRA description*

Use lists of primitive types to avoid garbage collection.





[jira] [Created] (AVRO-3826) Commons test for C++ module

2023-08-09 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3826:


 Summary: Commons test for C++ module
 Key: AVRO-3826
 URL: https://issues.apache.org/jira/browse/AVRO-3826
 Project: Apache Avro
  Issue Type: Sub-task
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec








[jira] [Created] (AVRO-3829) JUnit4 to JUnit5 : continue

2023-08-11 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3829:


 Summary: JUnit4 to JUnit5 : continue
 Key: AVRO-3829
 URL: https://issues.apache.org/jira/browse/AVRO-3829
 Project: Apache Avro
  Issue Type: Improvement
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec


[AVRO-3579|https://issues.apache.org/jira/browse/AVRO-3579] started converting 
JUnit4 tests to JUnit5.
This JIRA is to finish the upgrade, at least for the avro module.





[jira] [Created] (AVRO-3876) JacksonUtils is not symmetric

2023-09-28 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3876:


 Summary: JacksonUtils is not symmetric
 Key: AVRO-3876
 URL: https://issues.apache.org/jira/browse/AVRO-3876
 Project: Apache Avro
  Issue Type: Bug
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec


For each JSON node, the following should hold:
{code:java}
Object object = JacksonUtils.toObject(node);
JsonNode node1 = JacksonUtils.toJsonNode(object);
Assertions.assertEquals(node, node1);
{code}

But that's not true for
{code:java}
JsonNodeFactory.instance.numberNode(33.33000183105469f);
{code}
where the test gives
{noformat}
org.opentest4j.AssertionFailedError: 
Expected :33.33
Actual   :33.33000183105469
{noformat}
because the toObject method converts the float to a double and, by default, 
the Float.toString method gives rounded values, as this code
{code:java}
float x = 33.33000183105469f;
System.out.println("value float  = " + x);
System.out.println("value double = " + (double) x);
{code}
shows
{noformat}
value float  = 33.33
value double = 33.33000183105469
{noformat}
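
To make the asymmetry concrete, the sketch below compares a plain cast with widening through the float's shortest decimal string. {{widenViaDecimal}} is a hypothetical helper written for this illustration; it is not part of JacksonUtils and is not necessarily the fix adopted for this issue.

```java
// Sketch only: widenViaDecimal is a hypothetical helper, not part of JacksonUtils.
class FloatWidening {

  // Widens through the float's shortest decimal string instead of its exact binary value.
  static double widenViaDecimal(float value) {
    return Double.parseDouble(Float.toString(value));
  }

  public static void main(String[] args) {
    float x = 33.33000183105469f;
    System.out.println("plain cast  = " + (double) x);
    System.out.println("via decimal = " + widenViaDecimal(x));
    // Narrowing the widened value again gives back the original float.
    System.out.println("round trip  = " + ((float) widenViaDecimal(x) == x));
  }
}
```

The plain cast keeps the exact binary value of the float, which is why the extra digits appear once it is printed as a double.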










[jira] [Created] (AVRO-3888) CVE with common compress

2023-10-20 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3888:


 Summary: CVE with common compress
 Key: AVRO-3888
 URL: https://issues.apache.org/jira/browse/AVRO-3888
 Project: Apache Avro
  Issue Type: Bug
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec








[jira] [Created] (AVRO-3912) Issue with deserialization for BigDecimal in rust

2023-11-28 Thread Christophe Le Saec (Jira)
Christophe Le Saec created AVRO-3912:


 Summary: Issue with deserialization for BigDecimal in rust
 Key: AVRO-3912
 URL: https://issues.apache.org/jira/browse/AVRO-3912
 Project: Apache Avro
  Issue Type: Bug
Reporter: Christophe Le Saec
Assignee: Christophe Le Saec


Deserialization with a reader does not work in Rust.


