[ https://issues.apache.org/jira/browse/AVRO-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063912#comment-13063912 ]

Joe Crobak commented on AVRO-816:
---------------------------------

bq. a) some thoughts on functionality - if you're going for set operations, you 
should just go for it and give the full set of operations:

I think it's unintuitive (and potentially incorrect) to think about Avro 
schemas in terms of set operations. Besides "union" already meaning something 
different in Avro, it's unclear to me what several of the other operations 
would be. E.g., what is the intersection of int and long? Is it int, or {} 
(and what would an empty set even mean)? There are plenty of other examples 
where the analogy breaks down.

That said, I can see a use for some of the methods you mention. I'd prefer to 
add those in a separate patch, though (this one is already getting rather 
large). subsumes/unify grew out of the need to determine the correct "reader" 
schema for reading multiple Avro data files with different but compatible 
schemas; they have nothing to do with set operations.
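To make the subsumes/unify idea concrete, here is a minimal sketch over a toy model of a few primitive types. `SchemaPromotionSketch`, `Prim`, `subsumes` and `unify` are hypothetical names used only for illustration. They are not the patch's code and not the org.apache.avro API. Only the numeric promotions from the Avro spec are modeled: int to long, float or double; long to float or double; float to double.

```java
import java.util.EnumSet;
import java.util.Map;

// Hypothetical sketch: a toy model of a few Avro primitive types,
// NOT org.apache.avro.Schema. Only numeric promotions are modeled.
class SchemaPromotionSketch {
    enum Prim { BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING }

    // For each writer type, the set of reader types that can read its data.
    private static final Map<Prim, EnumSet<Prim>> READABLE_AS = Map.of(
        Prim.BOOLEAN, EnumSet.of(Prim.BOOLEAN),
        Prim.INT, EnumSet.of(Prim.INT, Prim.LONG, Prim.FLOAT, Prim.DOUBLE),
        Prim.LONG, EnumSet.of(Prim.LONG, Prim.FLOAT, Prim.DOUBLE),
        Prim.FLOAT, EnumSet.of(Prim.FLOAT, Prim.DOUBLE),
        Prim.DOUBLE, EnumSet.of(Prim.DOUBLE),
        Prim.STRING, EnumSet.of(Prim.STRING));

    // True if data written with 'writer' can be read using 'reader'.
    static boolean subsumes(Prim reader, Prim writer) {
        return READABLE_AS.get(writer).contains(reader);
    }

    // Returns whichever of a or b can read data written with both,
    // or null if neither can (e.g. boolean vs. int).
    static Prim unify(Prim a, Prim b) {
        if (subsumes(a, b)) return a;
        if (subsumes(b, a)) return b;
        return null;
    }

    public static void main(String[] args) {
        System.out.println(subsumes(Prim.LONG, Prim.INT)); // true
        System.out.println(subsumes(Prim.INT, Prim.LONG)); // false
        System.out.println(unify(Prim.INT, Prim.LONG));    // LONG
        System.out.println(unify(Prim.BOOLEAN, Prim.INT)); // null
    }
}
```

In this toy model the numeric types form a chain, so for any two numeric types one always subsumes the other. Real schemas (records, enums, unions) would need a recursive comparison, and two schemas may have a common reader schema that is neither of them.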

{quote}
b) some thoughts on naming -

I would favor "compose" "join" or "combine" over "unify"
"subsumes" seems a bit... obtuse? I'd leave it as "isSupersetOf" or "contains"
{quote}

I'm terrible at naming :). compose, join, and combine would all be fine with 
me. Per the above, I'd like to avoid set-operation terms in the names (which 
rules out isSupersetOf), and I think contains would be confusing (e.g. a record 
with an int field "contains" an int schema). Other suggestions?



> Schema Comparison Utils
> -----------------------
>
>                 Key: AVRO-816
>                 URL: https://issues.apache.org/jira/browse/AVRO-816
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Joe Crobak
>            Assignee: Joe Crobak
>            Priority: Minor
>         Attachments: AVRO-816.patch, AVRO-816.patch, AVRO-816.patch, 
> AVRO-816.patch
>
>
> From my post on the mailing list, and Doug's response:
> {quote}
> On 05/05/2011 10:29 AM, Joe Crobak wrote:
> > We've recently come across a situation where we have two data files with
> > different schemas that we'd like to process together using
> > GenericDatumReader.  One schema is promotable to the other, but not vice
> > versa.  We'd like to programmatically determine which of the schemas to
> > use.  I did a brief look through javadoc and tests, and I couldn't find
> > any examples of checking if one schema is promotable to the other.  Has
> > anyone else come across this?
> >
> > For some context, we're considering patching AvroStorage [1] to remove
> > the assumption that all files have the same schema.  In our case, our
> > schema has evolved in that a field that was an int was promoted to a long.
> A boolean method that tells you if one schema is promotable to another
> would work in this case, but would not help in cases where, e.g.,
> different fields had changed in different versions.  For example, in
> branched development, two branches might each add a distinct symbol to
> an enum.  So I think you might be better off with a method that, given
> two schemas, returns their superset, a schema that can read data written
> by either.
> Such a method does not yet exist in Avro, but should not be difficult to
> add.  Please file an issue in Jira if this sounds of interest.
> Doug
> {quote}
> I think it would be useful to have both of the methods that Doug mentioned in 
> some sort of schema utils class.
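
Doug's branched-development example can be sketched in the same toy style. This sketch assumes an enum schema is reduced to its list of symbols. The hypothetical `unifySymbols` below keeps the first version's symbols in order and appends any new symbols from the second. Avro resolves enum values by symbol name, so a reader enum containing every symbol should be able to read data written with either version. This is an illustration only, not an Avro API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch (not an Avro API): merge the symbols of two enum
// versions so that the result contains every symbol from both.
class EnumSuperset {
    static List<String> unifySymbols(List<String> a, List<String> b) {
        // LinkedHashSet keeps a's order, then appends b's new symbols in order
        LinkedHashSet<String> merged = new LinkedHashSet<>(a);
        merged.addAll(b);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Two branches each added a distinct symbol to the same enum.
        List<String> branchA = List.of("RED", "GREEN", "BLUE");
        List<String> branchB = List.of("RED", "GREEN", "YELLOW");
        System.out.println(unifySymbols(branchA, branchB)); // [RED, GREEN, BLUE, YELLOW]
    }
}
```

Symbol order in an Avro enum affects sort order but not resolution, so appending the new symbols is one reasonable choice among several.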
