[
https://issues.apache.org/jira/browse/AVRO-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879903#action_12879903
]
Doug Cutting commented on AVRO-580:
-----------------------------------
> I'm not sure it's a good idea to encourage using multiple APIs in the same
> code.
The goal is actually to reduce the number of APIs. Folks could use a single
API (specific) to read & write data. If they've generated classes for records,
those classes will be used; otherwise GenericRecord will be used. The only
reason to use GenericDatumReader and GenericDatumWriter explicitly would be to
disable this, to force everything to use the generic representation, which
might be useful if one must, e.g., walk data generically.
Alternately, instead of modifying SpecificDatumReader & SpecificDatumWriter, we
could add a third kind of reader/writer that has this behaviour.
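To make the proposed behaviour concrete, here is a minimal sketch of the
fallback rule: use a generated class when one is known for a schema, otherwise
fall back to a generic record. It deliberately does not use Avro's classes.
FallbackSketch, Rec, GenericRec, User, register and newRecord are all
hypothetical stand-ins invented for illustration, and the real lookup in
SpecificDatumReader may well work differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins only; none of these types are Avro's real classes.
final class FallbackSketch {
    /** Stand-in for a record whose fields are addressed by position. */
    interface Rec {
        void put(int i, Object v);
        Object get(int i);
    }

    /** Stand-in for the generic representation: fields held in an array. */
    static final class GenericRec implements Rec {
        final String schemaName;
        private final Object[] fields;

        GenericRec(String schemaName, int fieldCount) {
            this.schemaName = schemaName;
            this.fields = new Object[fieldCount];
        }

        public void put(int i, Object v) { fields[i] = v; }
        public Object get(int i) { return fields[i]; }
    }

    /** Stand-in for a generated (specific) class with one field. */
    static final class User implements Rec {
        String name;

        public void put(int i, Object v) {
            if (i != 0) throw new IndexOutOfBoundsException("field " + i);
            name = (String) v;
        }

        public Object get(int i) {
            if (i != 0) throw new IndexOutOfBoundsException("field " + i);
            return name;
        }
    }

    // Assumed registry: schema name -> factory for the generated class.
    private final Map<String, Supplier<Rec>> generated = new HashMap<>();

    void register(String schemaName, Supplier<Rec> factory) {
        generated.put(schemaName, factory);
    }

    /** Use the generated class if one is registered, else the generic form. */
    Rec newRecord(String schemaName, int fieldCount) {
        Supplier<Rec> factory = generated.get(schemaName);
        return factory != null ? factory.get() : new GenericRec(schemaName, fieldCount);
    }

    public static void main(String[] args) {
        FallbackSketch reader = new FallbackSketch();
        reader.register("example.User", User::new);
        // A registered schema yields the generated class.
        System.out.println(reader.newRecord("example.User", 1).getClass().getSimpleName());
        // An unregistered schema falls back to the generic representation.
        System.out.println(reader.newRecord("example.Event", 2).getClass().getSimpleName());
    }
}
```

With this rule the caller never chooses a representation up front. Forcing
everything to be generic then amounts to using a reader with an empty registry.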
> Is this to facilitate making Pair easier to program
It may not even be required by Pair. Pair can be implemented as a
manually-written SpecificRecord, so that the specific reader/writer can handle
it. The generic writer can write instances of this (since it only requires
IndexedRecord, which both SpecificRecord and GenericRecord implement) and the
generic reader could similarly read instances of this, except for instance
creation. So the approach needing the fewest lines of code might be, when the
user has requested generic data, to use a subclass of GenericDatumReader that
special-cases Pair. But this subclass would be equivalent to changing
SpecificDatumWriter to punt to GenericDatumWriter when it sees an unknown
class, and the latter implementation is more generally useful.
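The point about the writer needing only positional field access can be
sketched as follows. This is an illustration under stated assumptions, not
Avro code: Indexed is a hypothetical stand-in for IndexedRecord (with an
invented fieldCount() in place of a schema), and Pair, GenericRec and write
are likewise made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these types are Avro's real classes.
final class PairSketch {
    /** Stand-in for positional field access; fieldCount() replaces a schema. */
    interface Indexed {
        Object get(int i);
        void put(int i, Object v);
        int fieldCount();
    }

    /** A manually written record: field 0 is the key, field 1 the value. */
    static final class Pair<K, V> implements Indexed {
        K key;
        V value;

        Pair(K key, V value) { this.key = key; this.value = value; }

        public Object get(int i) {
            if (i == 0) return key;
            if (i == 1) return value;
            throw new IndexOutOfBoundsException("field " + i);
        }

        @SuppressWarnings("unchecked")
        public void put(int i, Object v) {
            if (i == 0) key = (K) v;
            else if (i == 1) value = (V) v;
            else throw new IndexOutOfBoundsException("field " + i);
        }

        public int fieldCount() { return 2; }
    }

    /** Stand-in for a generic record: fields held in an array. */
    static final class GenericRec implements Indexed {
        private final Object[] fields;

        GenericRec(Object... fields) { this.fields = fields.clone(); }

        public Object get(int i) { return fields[i]; }
        public void put(int i, Object v) { fields[i] = v; }
        public int fieldCount() { return fields.length; }
    }

    /** Flattens a record tree into a list, using positional access only. */
    static List<Object> write(Indexed record) {
        List<Object> out = new ArrayList<>();
        writeInto(record, out);
        return out;
    }

    private static void writeInto(Indexed record, List<Object> out) {
        for (int i = 0; i < record.fieldCount(); i++) {
            Object field = record.get(i);
            if (field instanceof Indexed) {
                writeInto((Indexed) field, out); // nested record, any representation
            } else {
                out.add(field);                  // leaf value
            }
        }
    }

    public static void main(String[] args) {
        // A manually written Pair holding a generic record as its value.
        Pair<String, Indexed> pair = new Pair<>("k", new GenericRec(1, "x"));
        System.out.println(write(pair)); // prints [k, 1, x]
    }
}
```

The writer never asks which kind of record it is handling, which is why a
manually written Pair and a generic record can sit in the same tree. Reading
is the harder direction, because the reader has to decide which class to
instantiate.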
The specific/generic distinction is confusing. Rather than telling folks
who're, e.g., writing a mapreduce program that they need to decide which
representation they'll use, we can tell them that, if they generate code, it
will be used; otherwise generic representations will be used.
> is there a genuine use case when you want to use one API for the first
> element of the pair, and a different API for the second element?
In reduce logic we'd like to process pairs identically regardless of how their
keys and values are represented, so using different classes makes for more
work. We could have a common interface for the two, but that's hard when one
would naturally be a GenericRecord and the other a manually written class.
Currently object trees must all be either specific or generic. By extending
SpecificRecord, one can intermix manually-written classes with specific
classes, permitting things like a generic Pair<K,V> that's not easily handled
by generated code, since the key/value schemas vary. It seems to me that it'd
be a feature to be able to intermix generic data into trees too, simplifying
lots of things.
> java: permit generic data within specific
> -----------------------------------------
>
> Key: AVRO-580
> URL: https://issues.apache.org/jira/browse/AVRO-580
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 1.4.0
>
>
> It should be possible to intermix specific and generic data. For example, if
> some fields of a record have specific classes defined, while others do not,
> the latter should use the generic representation.