[ 
https://issues.apache.org/jira/browse/AVRO-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011070#comment-14011070
 ] 

Michael Pershyn commented on AVRO-1500:
---------------------------------------

Intially I was looking forward to make this test in avro proj, but haven't 
found where to put it.
ThriftData looks a good place, so I'm on it, but have some questions:
* There is environment dependency - current test is compiled with thrift v0.7 
dependency, so I had to fire up a vagrant environment that has all the 
necessary tools for thrift (provisioning to install thrift 0.7, java, maven). 
In my dev environment I have thrift 0.9.1, so I found virtual env really useful 
way to avoid versions clash, but have doubts even to suggest to contribute 
that, because it is specific version of linux only... So I need some advice 
here :)
* On my dev environment test fail on clear checkout from trunk, when compiling 
Apache Avro IPC:
Failed tests:   
testBasicMaxSpans(org.apache.avro.ipc.trace.TestFileSpanStorage): expected:<9> 
but was:<10>
in virtual environment this test passes... no bloker, just strange.
* generated thrift files are checked-in, so it requires manual updating of 
those, while in pom.xml there is a profile definition "thrift-generate", but it 
is not updated when generated.
So I have a question - why maven-thrift-plugin is not used here? My suggestion 
is that this was done years ago and back then this plugin may not have been 
around. Also, should I commit the generated thrift-objects back?

> Unknown datum type exception during union type resolution (no short to int 
> conversion).
> ---------------------------------------------------------------------------------------
>
>                 Key: AVRO-1500
>                 URL: https://issues.apache.org/jira/browse/AVRO-1500
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.7.4, 1.7.5, 1.7.6
>         Environment: java API
>            Reporter: Michael Pershyn
>              Labels: easyfix, newbie, patch
>         Attachments: AVRO-1500.patch
>
>
> There is a conversion for values of type {{short}} (and other numeric types) 
> if in the schema they are declared as int:
> {code:title=/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumWriter.java}
>  protected void write(Schema schema, Object datum, Encoder out)
>     throws IOException {
>     try {
>       switch (schema.getType()) {
>         // ...
>         case INT:     out.writeInt(((Number)datum).intValue()); break;
>         // ...
> {code}
> So, if a value of {{short}} type is passed to INT field, it will be converted 
> and saved as INT in avro.
> But, when there is next field in schema: 
> {code}
>  ["null",{"type":"int","thrift":"short"}] {code}
> which is a union with int in it, and short is passed in (lets say 5), then we 
> are having the next exception:
> {code}
> org.apache.avro.AvroRuntimeException: Unknown datum type: 5
>       at 
> org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:593)
>       at 
> org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:558)
>       at 
> org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:144)
>  
> {code}
> This happens because in {{org.apache.avro.generic.GenericData}} there is no 
> check if the passed object has a type of {{java.lang.Short}}, and it is not 
> get converted then in {{write}} method of {{GenericDatumWriter}}:
> {code:title=lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java}
> /** Return the schema full name for a datum.  Called by {@link
>    * #resolveUnion(Schema,Object)}. */
> protected String getSchemaName(Object datum) {
> /* ... */
> if (isInteger(datum))
>   return Type.INT.getName();
> if (isLong(datum))
>   return Type.LONG.getName();
>  if (isFloat(datum))
>  return Type.FLOAT.getName();
> if (isDouble(datum))
>   return Type.DOUBLE.getName();
>  if (isBoolean(datum))
>   return Type.BOOLEAN.getName();
> throw new AvroRuntimeException("Unknown datum type: "+datum);
> {code}
> This error initially occured during thrift to avro conversion, when thrift 
> obj had optional field of type i16.
> In thrift to avro schema converter, if the type is short in thrift (i16) it 
> will be implicitly converted to int in avro-schema, so the values should be 
> converted as well. This is already done if they are not in the union (not 
> optional). Otherwise the exception is thrown.
> The snippet from schema conversion code is below:
> {code:title=lang/java/thrift/src/main/java/org/apache/avro/thrift/ThriftData.java}
> private Schema getSchema(FieldValueMetaData f) {
>   switch (f.type) {
>   /* ... */ 
>     case TType.I16:
>       Schema s = Schema.create(Schema.Type.INT);
>       s.addProp(THRIFT_PROP, "short");
>     return s;
>    /* ... */
> {code}
> Proposal is to add isShort check to generic data, as well as isShort method 
> implementation:
> {code:title=lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java}
> protected String getSchemaName(Object datum) {
> // ..
> if (isInteger(datum) || isShort(datum))
>    return Type.INT.getName();
> // ..
> {code}
> or maybe even some kind of isNumeric method, so the behaviour will be same 
> for INT fields and INT fields that are in Union.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to