[
http://issues.apache.org/jira/browse/HADOOP-120?page=comments#action_12373224 ]
Owen O'Malley commented on HADOOP-120:
--------------------------------------
After thinking about it for a bit, the problem with this patch is that it is
going to encode the type name in each and every record. So if your value type is
an ArrayWritable of UTF8, you are going to spend an extra
2 + strlen("org.apache.hadoop.io.UTF8") bytes per record. That's a fair amount
of overhead.
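As a rough sanity check on that number (this isn't part of the patch, just
arithmetic): a length-prefixed class name written the way
DataOutputStream.writeUTF does it costs a 2-byte length plus the string bytes,
so 28 extra bytes per record here:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ClassNameOverhead {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    // 2-byte length prefix + 26 bytes of "org.apache.hadoop.io.UTF8"
    out.writeUTF("org.apache.hadoop.io.UTF8");
    out.flush();
    System.out.println("extra bytes per record: " + buf.size()); // prints 28
  }
}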
We also have to be careful with the serialization of ArrayWritable because it
is used in the DFS name node logs.
I'm not sure what the right solution is. For right now, I would probably derive
a subclass of ArrayWritable that is specific to your type. It isn't pretty,
but it is guaranteed to be safe.
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.UTF8;

// Hard-wiring the element type in the no-arg constructor means valueClass is
// already set by the time readFields runs on a deserialized instance.
public class UTF8Array extends ArrayWritable {
  public UTF8Array() {
    super(UTF8.class);
  }
}
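For what it's worth, a rough sketch of the round trip with that subclass
(untested; "out" and "in" are just placeholders for whatever DataOutput and
DataInput the framework hands you):

UTF8Array written = new UTF8Array();
written.set(new UTF8[] { new UTF8("a"), new UTF8("b") });
written.write(out);

UTF8Array read = new UTF8Array();  // valueClass is already UTF8.class
read.readFields(in);               // no longer tries to instantiate null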
> Reading an ArrayWritable does not work because valueClass does not get
> initialized
> --------------------------------------------------------------------------------
>
> Key: HADOOP-120
> URL: http://issues.apache.org/jira/browse/HADOOP-120
> Project: Hadoop
> Type: Bug
> Components: io
> Environment: Red Hat
> Reporter: Dick King
> Attachments: hadoop-120-fix.patch
>
> If you have a Reducer whose value type is an ArrayWritable, it gets serialized
> fine, but at reconstruction time, when ArrayWritable::readFields(DataInput in)
> runs on a DataInput that holds a non-empty ArrayWritable, newInstance fails
> trying to instantiate the null class.