[ https://issues.apache.org/jira/browse/AVRO-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914931#action_12914931 ]

Bruce Martin commented on AVRO-656:
-----------------------------------

In Java (Avro 1.4), if you use any enum other than the first one in a 
union, you can get an exception when writing to a file:


java.lang.NullPointerException: null of SaleType of union in field f02 of fields
        at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:90)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:85)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:56)
        at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)

Caused by: java.lang.NullPointerException
        at org.apache.avro.Schema$EnumSchema.getEnumOrdinal(Schema.java:651)
        at org.apache.avro.generic.GenericDatumWriter.writeEnum(GenericDatumWriter.java:120)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:102)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:64)


Another issue: the same symbol can appear in more than one enum, but when 
those enums are used in the same union the code cannot decide which enum 
the value belongs to.

For example, I have used RETURN in both SaleType and PoType, and then used 
SaleType and PoType in the same union:

  enum SaleType {
      RETURN,
      OTHER,
      SALE
  }
  
  enum PoType {
    PURCHASE_ORDER,
    DIRECT_DELIVERY,
    RETURN,
    CONSIGNMENT
  }
  
 

  record fields {
    union {null, int, float, double, SaleType, PoType, letters, string} f02;
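
To make the ambiguity concrete, here is a minimal Java sketch. It does not use Avro's API: EnumType and UnionEnumSketch are hypothetical stand-ins, and only the enum branches of the union are modelled. It contrasts always looking a symbol up in the first enum with choosing the branch by enum name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a named enum schema; this is not Avro's Schema class.
final class EnumType {
    final String name;
    final List<String> symbols;

    EnumType(String name, String... symbols) {
        this.name = name;
        this.symbols = Arrays.asList(symbols);
    }
}

final class UnionEnumSketch {
    // Models the reported behaviour: the symbol is always looked up in the
    // first enum of the union. Returns -1 when the symbol is not found there.
    static int ordinalInFirstEnum(List<EnumType> enums, String symbol) {
        return enums.get(0).symbols.indexOf(symbol);
    }

    // Models a name-aware lookup: pick the branch whose enum name matches the
    // datum's enum name. Returns -1 when no branch matches.
    static int branchByName(List<EnumType> enums, String enumName) {
        for (int i = 0; i < enums.size(); i++) {
            if (enums.get(i).name.equals(enumName)) {
                return i;
            }
        }
        return -1;
    }
}
```

With SaleType listed before PoType, looking up CONSIGNMENT in the first enum finds nothing, and RETURN resolves to position 0 of SaleType even when PoType's RETURN (position 2) was meant. Choosing the branch by enum name first removes the ambiguity, provided the value carries the name of its enum.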

> writing unions with multiple records, fixed or enums can choose wrong branch 
> -----------------------------------------------------------------------------
>
>                 Key: AVRO-656
>                 URL: https://issues.apache.org/jira/browse/AVRO-656
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-656.patch
>
>
> According to the specification, a union may contain multiple instances of a 
> named type, provided they have different names.  There are several bugs in 
> the Java implementation of this when writing data:
>  - for record, only the short-name of the record is checked, so the branch 
> for a record of the same name in a different namespace may be used by mistake
>  - for enum and fixed, the name of the type is not checked, so the first 
> enum or fixed in the union will always be assumed when writing.  In many 
> cases this may cause the wrong data to be written, potentially corrupting 
> output.
> This is not a regression.  This has never been implemented correctly by Java. 
>  Python and Ruby never check names, but rather perform a full, recursive 
> validation of content.
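
The record case described above can be sketched the same way. Again this is not Avro's code: NamedType, UnionRecordSketch, and the namespaces used in the usage note are hypothetical. The sketch only shows why matching on the short name can select a different branch than matching on the full, namespace-qualified name.

```java
import java.util.List;

// Hypothetical stand-in for a named record schema; this is not Avro's Schema class.
final class NamedType {
    final String namespace;
    final String name;

    NamedType(String namespace, String name) {
        this.namespace = namespace;
        this.name = name;
    }

    String fullName() {
        return namespace + "." + name;
    }
}

final class UnionRecordSketch {
    // Matches on the short name only, so the first record with that short
    // name wins regardless of namespace. Returns -1 when nothing matches.
    static int branchByShortName(List<NamedType> branches, NamedType datumType) {
        for (int i = 0; i < branches.size(); i++) {
            if (branches.get(i).name.equals(datumType.name)) {
                return i;
            }
        }
        return -1;
    }

    // Matches on namespace plus name, so records that share a short name
    // stay distinct. Returns -1 when nothing matches.
    static int branchByFullName(List<NamedType> branches, NamedType datumType) {
        for (int i = 0; i < branches.size(); i++) {
            if (branches.get(i).fullName().equals(datumType.fullName())) {
                return i;
            }
        }
        return -1;
    }
}
```

For a union holding sales.Order followed by purchasing.Order, a purchasing.Order datum matches branch 0 by short name but branch 1 by full name.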

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
