[ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Doug Cutting updated HADOOP-525:
--------------------------------

    Priority: Major  (was: Minor)

A raw comparator shouldn't have to deserialize fields, but should operate 
directly on the field data.  For primitive fields we'd generate calls to 
methods like WritableComparator.{readInt,readLong,...}.  For Text, we'd 
generate calls to WritableComparator.compareBytes().  For complex objects we'd 
generate calls to their raw comparator.
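
As a rough illustration of the kind of code that might be generated, here is a minimal sketch for a hypothetical record with an int field followed by a string field. The class name, the field layout (4-byte big-endian int, then a 4-byte length and UTF-8 bytes), and the serialize helper are assumptions made for this example only, not the actual record wire format. The local readInt and compareBytes methods are stand-ins for the WritableComparator helpers of the same names, so the sketch compiles without Hadoop on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RawRecordComparatorSketch {

    // Stand-in for WritableComparator.readInt: big-endian int at offset.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }

    // Stand-in for WritableComparator.compareBytes:
    // unsigned, lexicographic comparison of two byte ranges.
    static int compareBytes(byte[] b1, int s1, int l1,
                            byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    // Compares two serialized records field by field without building objects.
    // Assumed layout: [int id][int nameLength][name bytes in UTF-8].
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int id1 = readInt(b1, s1);
        int id2 = readInt(b2, s2);
        if (id1 != id2) {
            return id1 < id2 ? -1 : 1;   // first field decides, stop early
        }
        int len1 = readInt(b1, s1 + 4);
        int len2 = readInt(b2, s2 + 4);
        return compareBytes(b1, s1 + 8, len1, b2, s2 + 8, len2);
    }

    // Hypothetical serializer used only to produce test input for the sketch.
    static byte[] serialize(int id, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + utf8.length)
                .putInt(id).putInt(utf8.length).put(utf8).array();
    }

    public static void main(String[] args) {
        byte[] a = serialize(1, "apple");
        byte[] b = serialize(1, "banana");
        System.out.println(Integer.signum(compareRaw(a, 0, b, 0)));
    }
}
```

A nested record field would be handled the same way, by delegating to that record's own compareRaw at the right offset.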

Besides having a huge performance benefit, adding raw comparators to records 
would solve other problems with Hadoop's io framework: currently it is possible 
for raw and cooked comparators to differ.  But if both are auto-generated from 
the same source they'll be guaranteed compatible.  Also, raw comparators are 
fragile and difficult to develop, since they bypass all type mechanisms.  
Generated code would ensure correctness.

I've increased the priority of this issue.  We should implement this and start 
using records more extensively.  Until now we've mostly thought of records as an 
aid for interoperability with other programming languages, but I think they'll 
also be valuable for performance and correctness.

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.8.0
>
>         Attachments: TypeBuilder-support.tar, TypeBuilder.java, 
> WordCountType.java
>
>
> Raw comparators are not generated for types that are generated with the 
> Hadoop record framework. This could have a substantial performance impact 
> when using hadoop record generated types in Map/Reduce. The record i/o 
> framework should auto-generate raw comparators for types.
> Comparison for hadoop record i/o types is defined to be member wise 
> comparison of objects. A possible implementation could only deserialize one 
> member from each object at a time, compare them and either return or move on 
> to the next member if the values are equal.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
