[ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]
Doug Cutting updated HADOOP-525:
--------------------------------
Priority: Major (was: Minor)
A raw comparator shouldn't have to deserialize fields, but should operate
directly on the field data. For primitive fields we'd generate calls to
methods like WritableComparator.{readInt,readLong,...}. For Text, we'd
generate calls to WritableComparator.compareBytes(). For complex objects we'd
generate calls to their raw comparator.
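
As a rough illustration of what generated code might look like, here is a
minimal sketch for a record with one int field and one string field. The
class name, the record layout (a 4-byte big-endian int, then a 4-byte length
prefix, then the UTF-8 bytes) and the helper methods are assumptions made for
this example. The local readInt and compareBytes are stand-ins for the
WritableComparator methods of the same name, so the sketch runs without
Hadoop on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a generated raw comparator; not actual generated code.
class RawRecordComparatorSketch {

    // Stand-in for WritableComparator.readInt: decode a big-endian int in place.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }

    // Stand-in for WritableComparator.compareBytes: unsigned lexicographic order.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2; // equal prefix: the shorter field sorts first
    }

    // Assumed layout for this example: [int count][int nameLength][name bytes].
    static byte[] serialize(int count, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + utf8.length)
                .putInt(count).putInt(utf8.length).put(utf8).array();
    }

    // Compare two serialized records field by field without building objects.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int c1 = readInt(b1, s1);
        int c2 = readInt(b2, s2);
        if (c1 != c2) {
            return c1 < c2 ? -1 : 1;
        }
        int len1 = readInt(b1, s1 + 4);
        int len2 = readInt(b2, s2 + 4);
        return compareBytes(b1, s1 + 8, len1, b2, s2 + 8, len2);
    }
}
```

Note that the string field is ordered by its raw bytes, so a cooked
comparator generated from the same source would have to use the same byte
order to stay compatible.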
Besides having a huge performance benefit, adding raw comparators to records
would solve other problems with Hadoop's I/O framework. Currently it is
possible for raw and cooked comparators to differ, but if both are
auto-generated from the same source they are guaranteed to be compatible.
Also, hand-written raw comparators are fragile and difficult to develop, since
they bypass all type mechanisms. Generated code would ensure correctness.
I've increased the priority of this issue. We should implement this and start
using records more extensively. Previously we've mostly thought of records as
an aid for interoperability with other programming languages, but I think
they'll also be valuable for performance and correctness.
> Need raw comparators for hadoop record types
> --------------------------------------------
>
> Key: HADOOP-525
> URL: http://issues.apache.org/jira/browse/HADOOP-525
> Project: Hadoop
> Issue Type: Improvement
> Components: record
> Affects Versions: 0.6.0
> Reporter: Sameer Paranjpye
> Assigned To: Milind Bhandarkar
> Fix For: 0.8.0
>
> Attachments: TypeBuilder-support.tar, TypeBuilder.java,
> WordCountType.java
>
>
> Raw comparators are not generated for types that are generated with the
> Hadoop record framework. This could have a substantial performance impact
> when using hadoop record generated types in Map/Reduce. The record i/o
> framework should auto-generate raw comparators for types.
> Comparison for hadoop record i/o types is defined to be member-wise
> comparison of objects. A possible implementation could only deserialize one
> member from each object at a time, compare them and either return or move on
> to the next member if the values are equal.
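
The member-at-a-time approach described in the issue could be sketched
roughly as follows. This is only an illustration under assumed names: the
class, the two-field record (an int count followed by a string word) and its
DataOutputStream-based layout are hypothetical and are not the format the
record framework actually emits.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: deserialize one member from each record at a time.
class MemberWiseCompareSketch {

    // Assumed layout for this example: writeInt(count), then writeUTF(word).
    static byte[] serialize(int count, String word) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(count);
            out.writeUTF(word);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int compare(byte[] a, byte[] b) {
        try {
            DataInputStream in1 = new DataInputStream(new ByteArrayInputStream(a));
            DataInputStream in2 = new DataInputStream(new ByteArrayInputStream(b));

            // First member: return immediately if it already decides the order.
            int c = Integer.compare(in1.readInt(), in2.readInt());
            if (c != 0) {
                return c; // the string member is never deserialized
            }

            // Second member: only read when the first members are equal.
            return in1.readUTF().compareTo(in2.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unlike a comparator that works directly on the bytes, this still allocates
a String for each record whenever the first members tie.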