[
https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596730#action_12596730
]
Enis Soztutar commented on HADOOP-3380:
---------------------------------------
With the introduction of the serialization framework, the rationale for
RawComparator is somewhat broken.
In theory, an object of some type (for example Double) can be serialized to its
byte[] form in an arbitrary way by different serializers, so it is not possible
to compare two byte arrays efficiently without actually deserializing the objects.
However, some objects, especially Writables, know exactly how they are
serialized and can therefore benefit from raw byte comparison (in short, we
should keep RawComparator).
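To make this concrete, here is a minimal Java sketch. It assumes ints are written Writable-style with DataOutput.writeInt (4 bytes, big-endian). A comparator that knows this layout can order keys directly from the bytes, while a format-agnostic unsigned byte-by-byte comparison gets negative numbers wrong because it treats the sign bit as an ordinary high bit. RawIntComparator and its helpers are hypothetical names. The compare parameters only mirror RawComparator#compare(byte[], int, int, byte[], int, int).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical raw comparator for ints written with DataOutput.writeInt (4 bytes, big-endian). */
class RawIntComparator {
    /** Serialize an int the way a Writable would, via DataOutput.writeInt. */
    static byte[] serialize(int v) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(v);
        } catch (IOException e) {
            throw new AssertionError(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Reassemble the big-endian int at offset s without creating any key object. */
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24) | ((b[s + 1] & 0xff) << 16)
             | ((b[s + 2] & 0xff) << 8) | (b[s + 3] & 0xff);
    }

    /** Layout-aware comparison: decodes only the 4 bytes it needs from each key. */
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    /** Layout-agnostic comparison: unsigned, byte by byte, as for opaque byte[]s. */
    static int compareBytes(byte[] b1, byte[] b2) {
        int n = Math.min(b1.length, b2.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(b1[i] & 0xff, b2[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(b1.length, b2.length);
    }
}
```

With -1 and 1 serialized this way, compare(...) correctly reports -1 < 1. compareBytes(...) reports the opposite, because -1 is encoded as 0xFFFFFFFF.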
Similarly, the RawComparators returned by Serialization#getComparator() cannot
do much beyond deserializing the objects and calling {{o1.compareTo(o2)}} (see
{{DeserializerComparator}} and {{JavaSerializationComparator}}).
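Here is a rough sketch of what such a deserializing comparator amounts to. It assumes the deserializer can be modelled as a plain Function<byte[], T>. Hadoop's actual Deserializer interface is stream-based (open/deserialize/close). DeserializingComparator and LittleEndianDoubles are hypothetical. LittleEndianDoubles is an arbitrary Double encoding, chosen to show that the comparator never looks at the byte layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.function.Function;

/** Hypothetical comparator that knows nothing about the byte layout of its keys. */
class DeserializingComparator<T extends Comparable<T>> {
    private final Function<byte[], T> deserializer; // assumed decoding hook

    DeserializingComparator(Function<byte[], T> deserializer) {
        this.deserializer = deserializer;
    }

    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Copy out each key's bytes and rebuild the objects...
        T o1 = deserializer.apply(Arrays.copyOfRange(b1, s1, s1 + l1));
        T o2 = deserializer.apply(Arrays.copyOfRange(b2, s2, s2 + l2));
        // ...after which compareTo is all that is left to do.
        return o1.compareTo(o2);
    }
}

/** Hypothetical serializer that writes Doubles little-endian (an arbitrary choice). */
final class LittleEndianDoubles {
    private LittleEndianDoubles() {}

    static byte[] serialize(double d) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(d).array();
    }

    static Double deserialize(byte[] b) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getDouble();
    }
}
```

With this encoding, a byte-wise comparison would sort 1.5 after 2.5, because the low-order bytes come first. Decoding first is what keeps the order correct, and it is also what makes this path slow.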
I think we should
# not change Serialization interface
# introduce DefaultComparator, which extends DeserializerComparator, implements
Configurable, and has static {{register(Class, RawComparator)}} and
{{get(Class)}} methods.
DefaultComparator.get(Class keyClass) should first check for a Comparator
registered for keyClass; if none is found, it should return a DefaultComparator
instance that obtains its Deserializer by calling
serializationFactory.getDeserializer(keyClass);
# replace usages of WritableComparator#define() with
DefaultComparator#register(),
# make WritableComparator extend DefaultComparator,
# fix JobConf#getOutputValueGroupingComparator(), so that it uses
DefaultComparator.
# deprecate JavaSerializationComparator (since it is not needed once we have
DefaultComparator extending DeserializerComparator).
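The proposed register/get behaviour could look roughly like the following Java sketch. Everything here is hypothetical. RawCmp stands in for RawComparator and DefaultComparatorSketch for the proposed DefaultComparator. The Function<byte[], T> argument stands in for the Deserializer that would come from serializationFactory.getDeserializer(...). Unlike the proposal, the fallback is returned as a lambda rather than as a configured DefaultComparator instance.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Stand-in for RawComparator's compare signature (assumption, not the real interface). */
interface RawCmp {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

/** Hypothetical DefaultComparator: registered raw comparators first, deserializing fallback otherwise. */
final class DefaultComparatorSketch {
    private static final Map<Class<?>, RawCmp> REGISTRY = new ConcurrentHashMap<>();

    private DefaultComparatorSketch() {}

    /** Counterpart of WritableComparator#define(): remember a fast comparator for a class. */
    static void register(Class<?> c, RawCmp comparator) {
        REGISTRY.put(c, comparator);
    }

    /**
     * Return the comparator registered for c; if there is none, return one that
     * decodes both keys with the supplied deserializer and calls compareTo.
     */
    static <T extends Comparable<T>> RawCmp get(Class<T> c, Function<byte[], T> deserializer) {
        RawCmp registered = REGISTRY.get(c);
        if (registered != null) {
            return registered;
        }
        return (b1, s1, l1, b2, s2, l2) -> {
            T o1 = deserializer.apply(Arrays.copyOfRange(b1, s1, s1 + l1));
            T o2 = deserializer.apply(Arrays.copyOfRange(b2, s2, s2 + l2));
            return o1.compareTo(o2);
        };
    }
}
```

Registration plays the role WritableComparator#define() plays today. Classes with a known byte layout get a fast comparator, and every other class still sorts correctly through deserialization.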
Thoughts?
> need comparators in serializer framework
> ----------------------------------------
>
> Key: HADOOP-3380
> URL: https://issues.apache.org/jira/browse/HADOOP-3380
> Project: Hadoop Core
> Issue Type: New Feature
> Components: io
> Reporter: Doug Cutting
>
> The new serialization framework permits Hadoop to incorporate different
> serialization systems, including Hadoop's Writable, Thrift, Java
> Serialization, etc. It provides a generic, extensible means
> (SerializationFactory) to create serializers and deserializers for arbitrary
> Java classes. However it does not include a generic means to create
> comparators for these classes. Comparators are required for MapReduce keys
> and many other computations. Thus we should enhance the serialization
> framework to provide comparators too.
--
This message is automatically generated by JIRA.