[
https://issues.apache.org/jira/browse/HADOOP-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Douglas updated HADOOP-3665:
----------------------------------
Attachment: 3665-0.patch
bq. The whole point is that I would like to understand how a Reduce job can
output a file without any key values in it. NullWritable seemed to be an
ideal candidate for this, but unfortunately I ran into exceptions when trying
it. So I made a quick and dirty fix which is not meant to be production ready
(obviously NullWritable should not be special-cased in any way!).
I'm sorry, I hadn't understood this. If you only want to output null keys from
your reduce, then the RecordWriter used by your OutputFormat can encode or
ignore null keys (e.g. TextOutputFormat). SequenceFiles, as you discovered,
explicitly disallow zero-length keys, so you'll need to pick a different binary
file format to store output records. Glancing at the code, this constraint is
inconsistently enforced, and not for any particular reason that I can discern.
Adapting SequenceFile to handle zero-length keys might be as simple as allowing
zero-length keys from the Writers, since the Reader looks like it could handle
it.
bq. On the other hand, there seem to be some questions which need to be asked
and possibly addressed. One of them is that ReflectionUtils is able to call any
constructor once setAccessible is set to true, but is this what we really want
for singleton keys? And do we really need singleton keys at all? (I believe the
answer is positive).
There's already a fair amount of object reuse. We need an object to deserialize
into per the Writable contract, so a registration system like the one in
WritableComparator would be necessary in ReflectionUtils to make singletons
work (i.e. a map of classes to instances checked before the map of classes to
constructors). Other than NullWritable, every use case I can think of is
badly designed, though there are likely good ones.
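The registration idea above can be sketched roughly as follows. This is a minimal stand-alone illustration, not the ReflectionUtils code: NullKey and InstanceFactory are hypothetical names, and the sketch only assumes the lookup order described above (registered instances first, then cached constructors).

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a singleton key type such as NullWritable.
final class NullKey {
    private static final NullKey INSTANCE = new NullKey();
    private NullKey() {}
    static NullKey get() { return INSTANCE; }
}

// Hypothetical factory: registered singletons are checked before constructors.
final class InstanceFactory {
    private static final Map<Class<?>, Object> SINGLETONS = new ConcurrentHashMap<>();
    private static final Map<Class<?>, Constructor<?>> CONSTRUCTORS = new ConcurrentHashMap<>();

    static <T> void registerSingleton(Class<T> type, T instance) {
        SINGLETONS.put(type, instance);
    }

    static <T> T newInstance(Class<T> type) {
        // 1. The map of classes to instances is consulted first.
        Object singleton = SINGLETONS.get(type);
        if (singleton != null) {
            return type.cast(singleton);
        }
        // 2. Otherwise fall back to the map of classes to no-arg constructors.
        try {
            Constructor<?> ctor = CONSTRUCTORS.get(type);
            if (ctor == null) {
                ctor = type.getDeclaredConstructor();
                ctor.setAccessible(true); // this is what lets reflection bypass "private"
                CONSTRUCTORS.put(type, ctor);
            }
            return type.cast(ctor.newInstance());
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Without registration, setAccessible silently creates a second "singleton".
        System.out.println(newInstance(NullKey.class) == NullKey.get()); // false
        registerSingleton(NullKey.class, NullKey.get());
        System.out.println(newInstance(NullKey.class) == NullKey.get()); // true
    }
}
```

Checking the instance map first is what keeps the singleton property intact; the constructor path stays unchanged for every other class.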
bq. How about size (length) of key value? Is it allowed to be zero?
It depends on where in the framework you're looking. The OutputFormat defines
how to encode/handle null/NullWritable keys from the reduce (or the map if
you're running without reduces). In 0.17, intermediate data is stored in
SequenceFiles, so zero-length keys can't be emitted from the map. In 0.18,
zero-length keys are supported, but their semantics are kind of odd. In most
cases, emitting NullWritable keys from the map is not a scalable design.
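One way to see the scalability concern: assuming the job uses hash partitioning in the style of Hadoop's default partitioner, and assuming a singleton key always hashes to the same value, every record lands in the same partition and a single reduce receives all of the data. A small sketch of that arithmetic follows; PartitionSketch and its methods are hypothetical names, not Hadoop APIs.

```java
import java.util.Arrays;

// Hypothetical illustration of hash partitioning; not part of Hadoop.
final class PartitionSketch {
    // Mask off the sign bit, then take the remainder by the partition count.
    static int partitionFor(int keyHash, int numPartitions) {
        return (keyHash & Integer.MAX_VALUE) % numPartitions;
    }

    // Counts how many records each partition would receive.
    static int[] countPerPartition(int[] keyHashes, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int hash : keyHashes) {
            counts[partitionFor(hash, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1000 records whose keys all hash to 0 (assumed for a singleton key).
        int[] constantKeys = new int[1000];
        System.out.println(Arrays.toString(countPerPartition(constantKeys, 4)));
        // prints [1000, 0, 0, 0]
    }
}
```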
bq. And why does WritableComparator call the newInstance method when this causes
issues with any class having a non-public constructor?
Most WritableComparable types use RawComparator, which provides much better
performance while rendering this consideration irrelevant. Unfortunately,
WritableComparator creates new instances of its internal keys whether it
requires them or not! This is easily remedied. This patch does the following:
* No longer creates instances of the WritableComparable in WritableComparator
when a class has registered a WritableComparator (neither does it create a
buffer). This makes super.compare(byte[], off1, len1, byte[], off2, len2)
illegal, but I doubt this is a problem. Though one could imagine a situation
where a raw comparator attempts an efficient comparison but uses the slow
comparator when the result is ambiguous, such a comparator is easily adapted.
* Lets WritableComparators be configurable, so WritableComparable objects not
defining RawComparators are still configured before being compared.
* Defines a raw comparator for NullWritable.
* Changes checks in SequenceFile Writer classes to check only for key lengths
less than zero; this doesn't require any changes to the Reader, which already
supports zero-length keys, so the SequenceFile version doesn't need to be
adjusted, either.
* Adds a test case for reading/writing NullWritable keys.
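Two of the items above, the raw comparator for a zero-length key and the relaxed key-length check, can be sketched as follows. This is not the code from the attached patch: RawBytesComparator, ZeroLengthKeyComparator and KeyLengthCheck are hypothetical stand-ins, and the comparator signature is assumed to match the compare(byte[], off1, len1, byte[], off2, len2) form mentioned above.

```java
// Hypothetical stand-in for a raw (byte-level) comparator interface.
interface RawBytesComparator {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// A key that serializes to zero bytes has exactly one value, so any two
// serialized keys are equal and no key instance or buffer is ever needed.
final class ZeroLengthKeyComparator implements RawBytesComparator {
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        if (l1 != 0 || l2 != 0) {
            throw new IllegalArgumentException("expected zero-length keys");
        }
        return 0;
    }
}

// Sketch of the relaxed writer-side check: only negative lengths are rejected.
final class KeyLengthCheck {
    static void checkKeyLength(int keyLength) {
        if (keyLength < 0) { // assumed to have been a stricter check before
            throw new IllegalArgumentException("negative length keys not allowed: " + keyLength);
        }
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        System.out.println(new ZeroLengthKeyComparator().compare(empty, 0, 0, empty, 0, 0)); // 0
        checkKeyLength(0); // accepted: zero-length keys are allowed
    }
}
```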
> WritableComparator newKey() fails for NullWritable
> --------------------------------------------------
>
> Key: HADOOP-3665
> URL: https://issues.apache.org/jira/browse/HADOOP-3665
> Project: Hadoop Core
> Issue Type: Bug
> Components: io
> Affects Versions: 0.16.0, 0.16.1, 0.16.2, 0.16.3, 0.16.4, 0.17.0
> Environment: n/a
> Reporter: Lukas Vlcek
> Priority: Minor
> Fix For: 0.19.0
>
> Attachments: 3665-0.patch, HADOOP-3665.path
>
>
> It is not possible to use NullWritable as a key in order to suppress key
> value in output.
> Syndrome exception:
> Caused by: java.lang.IllegalAccessException: Class
> org.apache.hadoop.io.WritableComparator can not access a member of class
> org.apache.hadoop.io.NullWritable with modifiers "private"
> The problem is that NullWritable is a singleton and does not provide public
> non-parametric constructor. The following code in WritableComparator causes
> the exception: return (WritableComparable)keyClass.newInstance();
> Proposed simple solution is to use ReflectionUtils instead (it requires
> modification as well).
> This issue is probably related to HADOOP-2922