[ https://issues.apache.org/jira/browse/AVRO-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914931#action_12914931 ]
Bruce Martin commented on AVRO-656:
-----------------------------------
In Java (Avro version 1.4), if you use anything other than the first enum in a
union, you can get an exception when writing to a file:
java.lang.NullPointerException: null of SaleType of union in field f02 of fields
    at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:90)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:85)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:56)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
Caused by: java.lang.NullPointerException
    at org.apache.avro.Schema$EnumSchema.getEnumOrdinal(Schema.java:651)
    at org.apache.avro.generic.GenericDatumWriter.writeEnum(GenericDatumWriter.java:120)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:102)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:64)
Another issue: the same symbol can appear in more than one enum, but when
those enums are used in the same union the code cannot decide which enum the
value belongs to. For example, I used RETURN in both SaleType and PoType, then
used SaleType and PoType in the same union:
enum SaleType {
  RETURN,
  OTHER,
  SALE
}
enum PoType {
  PURCHASE_ORDER,
  DIRECT_DELIVERY,
  RETURN,
  CONSIGNMENT
}
record fields {
  union {null, int, float, double, SaleType, PoType, letters, string} f02;
}
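To make the two symptoms concrete, here is a minimal, self-contained sketch of
what I think is going on. It does not use the Avro library: EnumType,
firstEnum and candidates are hypothetical names of mine that only model a
writer which always picks the first enum branch of a union.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FirstEnumWins {
    // Hypothetical stand-in for an enum schema: a name plus symbol -> ordinal.
    static final class EnumType {
        final String name;
        final Map<String, Integer> ordinals = new LinkedHashMap<>();

        EnumType(String name, String... symbols) {
            this.name = name;
            for (int i = 0; i < symbols.length; i++) {
                ordinals.put(symbols[i], i);
            }
        }

        // Unboxing a missing entry (null) throws NullPointerException.
        int ordinalOf(String symbol) {
            return ordinals.get(symbol);
        }
    }

    // Models the reported behaviour: the first enum branch is always chosen.
    static EnumType firstEnum(List<EnumType> union) {
        return union.get(0);
    }

    // Names of every enum branch that contains the bare symbol.
    static List<String> candidates(List<EnumType> union, String symbol) {
        List<String> names = new ArrayList<>();
        for (EnumType e : union) {
            if (e.ordinals.containsKey(symbol)) {
                names.add(e.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        EnumType saleType = new EnumType("SaleType", "RETURN", "OTHER", "SALE");
        EnumType poType = new EnumType("PoType",
                "PURCHASE_ORDER", "DIRECT_DELIVERY", "RETURN", "CONSIGNMENT");
        List<EnumType> union = Arrays.asList(saleType, poType);

        // A bare symbol does not say which enum it came from.
        System.out.println(candidates(union, "RETURN")); // [SaleType, PoType]

        // PoType.RETURN has ordinal 2, but the first enum reports 0.
        System.out.println(firstEnum(union).ordinalOf("RETURN")); // 0

        // A symbol that exists only in the second enum fails outright.
        try {
            firstEnum(union).ordinalOf("CONSIGNMENT");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for CONSIGNMENT");
        }
    }
}
```

The second case is the worrying one: no exception is raised, and the wrong
ordinal would simply be written.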
> writing unions with multiple records, fixed or enums can choose wrong branch
> -----------------------------------------------------------------------------
>
> Key: AVRO-656
> URL: https://issues.apache.org/jira/browse/AVRO-656
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.4.0
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Attachments: AVRO-656.patch
>
>
> According to the specification, a union may contain multiple instances of a
> named type, provided they have different names. There are several bugs in
> the Java implementation of this when writing data:
> - for record, only the short-name of the record is checked, so the branch
> for a record of the same name in a different namespace may be used by mistake
> - for enum and fixed, the name of the type is not checked, so the first
> enum or fixed in the union will always be assumed when writing. In many
> cases this may cause the wrong data to be written, potentially corrupting
> output.
> This is not a regression. This has never been implemented correctly by Java.
> Python and Ruby never check names, but rather perform a full, recursive
> validation of content.
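The record case in the description can be sketched the same way. This is only
an illustration under my own assumptions, not the code in the attached patch:
NamedType, indexByShortName and indexByFullName are hypothetical names, and
the com.example namespaces are made up. It shows how matching on the short
name picks the first branch, while matching on the full name (namespace plus
name) picks the intended one.

```java
import java.util.Arrays;
import java.util.List;

final class UnionBranchLookup {
    // Hypothetical stand-in for a named schema (record, enum or fixed).
    static final class NamedType {
        final String namespace;
        final String name;

        NamedType(String namespace, String name) {
            this.namespace = namespace;
            this.name = name;
        }

        // Full name is the namespace and the name joined by a dot.
        String fullName() {
            return namespace.isEmpty() ? name : namespace + "." + name;
        }
    }

    // Index of the first branch whose short name matches, or -1.
    static int indexByShortName(List<NamedType> union, String name) {
        for (int i = 0; i < union.size(); i++) {
            if (union.get(i).name.equals(name)) {
                return i;
            }
        }
        return -1;
    }

    // Index of the branch whose full name matches, or -1.
    static int indexByFullName(List<NamedType> union, String fullName) {
        for (int i = 0; i < union.size(); i++) {
            if (union.get(i).fullName().equals(fullName)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two records with the same short name in different namespaces.
        List<NamedType> union = Arrays.asList(
                new NamedType("com.example.sales", "Order"),
                new NamedType("com.example.purchasing", "Order"));

        // Short-name lookup cannot tell the two apart.
        System.out.println(indexByShortName(union, "Order")); // 0

        // Full-name lookup selects the intended branch.
        System.out.println(
                indexByFullName(union, "com.example.purchasing.Order")); // 1
    }
}
```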