[ 
https://issues.apache.org/jira/browse/AVRO-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906924#action_12906924
 ] 

Scott Carey commented on AVRO-656:
----------------------------------

bq. A high-fidelity implementation can read and write data without alteration, 
but an implementation that cannot write data exactly as read might still be 
both useful and correctly implement the Avro specification.

I agree; an implementation doesn't need to have that ability.  I am wary of 
restricting what unions can express to what is 'easy' in languages with 
weaker type systems.

bq. A primary question of this issue is whether to continue to permit multiple 
enums and fixed in a union, distinguished by name. No implementation takes 
advantage of this today, and it might make implementations simpler to drop 
this, permitting only a single enum and fixed per union. So far, no one has 
presented a use case for this feature.

To be clear, would that break this:

{code}
[
{"name": "com.rr.avro.Fixed16", "type": "fixed", "size":16},
{"name": "com.rr.avro.Fixed4", "type": "fixed", "size":4},
{"name": "com.rr.avro.MyRecord", "type": "record", "fields": [
  {"name": "hostIp", "type": ["Fixed4", "Fixed16"],
   "doc": "should always be 4 bytes (IPv4) or 16 bytes (IPv6)"},
   ... (more fields)
  ]}
]
{code}

I have this schema in use in production right now.  I could switch to bytes 
and enforce the size restrictions on the client side, but schema migration 
might be a bit annoying in that case.  In particular, would new code be able 
to read old data written with the above schema?

I have a hard time thinking of a use case for multiple enums.  A union of two 
different enums is too much like a single, larger enum.

A union of multiple fixed types has some uses, but can always be replaced 
with bytes.  The main motivation for a union of two fixed types instead of 
bytes is that it saves space when the union has a third member: when not 
null, ["null", "Fixed4", "Fixed16"] takes one byte less than 
["null", "bytes"], because fixed carries no length prefix.
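
The byte count above can be checked with a small sketch.  It hand-rolls 
Avro's zig-zag varint encoding for the union index and the bytes length 
prefix instead of calling an Avro library; the helper names 
(zigzag_varint, union_fixed, union_bytes) are made up for illustration.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an integer as Avro encodes int/long: zig-zag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def union_fixed(branch_index: int, payload: bytes) -> bytes:
    # Union = branch index, then the value; a fixed value has no length prefix.
    return zigzag_varint(branch_index) + payload


def union_bytes(branch_index: int, payload: bytes) -> bytes:
    # Union = branch index, then the value; a bytes value is length + raw bytes.
    return zigzag_varint(branch_index) + zigzag_varint(len(payload)) + payload


ipv4 = bytes([10, 0, 0, 1])
ipv6 = bytes(16)

# ["null", "Fixed4", "Fixed16"]: Fixed4 is branch 1, Fixed16 is branch 2.
size_fixed4 = len(union_fixed(1, ipv4))
size_fixed16 = len(union_fixed(2, ipv6))

# ["null", "bytes"]: bytes is branch 1.
size_bytes4 = len(union_bytes(1, ipv4))
size_bytes16 = len(union_bytes(1, ipv6))

print(size_fixed4, size_bytes4, size_fixed16, size_bytes16)
```

In both the IPv4 and the IPv6 case the fixed variant comes out one byte 
shorter, which is the length prefix that bytes has to carry.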


On a different note about unions: while doing some research and 
experimentation with Scala recently, I found it interesting that Avro unions 
map almost 1:1 to Scala 'case classes'.  It is a bit annoying to map unions 
to Java polymorphically (perhaps with AVRO-648), but it would be simple in 
Scala.
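
As a rough illustration of that mapping, here is a sketch in Python rather 
than Scala, using frozen dataclasses as a stand-in for case classes.  The 
class names mirror the schema above; the HostIp alias and the describe 
function are hypothetical and not part of any Avro API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Fixed4:
    data: bytes

    def __post_init__(self):
        # Enforce the size that the schema's fixed type guarantees.
        if len(self.data) != 4:
            raise ValueError("Fixed4 needs exactly 4 bytes")


@dataclass(frozen=True)
class Fixed16:
    data: bytes

    def __post_init__(self):
        if len(self.data) != 16:
            raise ValueError("Fixed16 needs exactly 16 bytes")


# The union ["Fixed4", "Fixed16"] becomes a closed set of alternatives.
HostIp = Union[Fixed4, Fixed16]


def describe(ip: HostIp) -> str:
    # Dispatch on the concrete class, much like a pattern match on case classes.
    if isinstance(ip, Fixed4):
        return "IPv4 " + ".".join(str(b) for b in ip.data)
    if isinstance(ip, Fixed16):
        return "IPv6 " + ip.data.hex()
    raise TypeError("not a member of the union: %r" % (ip,))


print(describe(Fixed4(bytes([10, 0, 0, 1]))))
```

Because each value carries its own class, the union branch is never 
ambiguous, even though both alternatives wrap plain bytes.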

> writing unions with multiple records, fixed or enums can choose wrong branch 
> -----------------------------------------------------------------------------
>
>                 Key: AVRO-656
>                 URL: https://issues.apache.org/jira/browse/AVRO-656
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-656.patch
>
>
> According to the specification, a union may contain multiple instances of a 
> named type, provided they have different names.  There are several bugs in 
> the Java implementation of this when writing data:
>  - for record, only the short-name of the record is checked, so the branch 
> for a record of the same name in a different namespace may be used by mistake
>  - for enum and fixed, the name of the record is not checked, so the first 
> enum or fixed in the union will always be assumed when writing.  in many 
> cases this may cause the wrong data to be written, potentially corrupting 
> output.
> This is not a regression.  This has never been implemented correctly by Java. 
>  Python and Ruby never check names, but rather perform a full, recursive 
> validation of content.
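
The enum/fixed problem described above can be illustrated with a toy branch 
resolver.  This is not the Java implementation's code; the schema dicts and 
both function names are assumptions made for the sketch, which only 
contrasts "first branch of the same kind wins" with matching on the full 
name.

```python
UNION = [
    {"type": "fixed", "name": "com.rr.avro.Fixed4", "size": 4},
    {"type": "fixed", "name": "com.rr.avro.Fixed16", "size": 16},
]


def branch_by_kind(union, datum_schema):
    """Buggy lookup: ignores the name, so the first fixed branch always wins."""
    for index, branch in enumerate(union):
        if branch["type"] == datum_schema["type"]:
            return index
    raise ValueError("no matching branch")


def branch_by_full_name(union, datum_schema):
    """Name-aware lookup: the kind and the full name must both match."""
    for index, branch in enumerate(union):
        if (branch["type"] == datum_schema["type"]
                and branch["name"] == datum_schema["name"]):
            return index
    raise ValueError("no matching branch")


# A datum whose schema is Fixed16, i.e. the second branch of the union.
datum_schema = UNION[1]

buggy_index = branch_by_kind(UNION, datum_schema)
correct_index = branch_by_full_name(UNION, datum_schema)
print(buggy_index, correct_index)
```

With the kind-only lookup, a 16-byte value would be written under the 
Fixed4 branch index, which is the kind of mismatch that can corrupt output.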
