[
https://issues.apache.org/jira/browse/HADOOP-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475832
]
David Bowen commented on HADOOP-941:
------------------------------------
Milind,
Excuse my newbie-ness. I didn't realize that readVLong etc were old code from
WritableUtils. Are these methods now duplicated in record.Utils simply to
facilitate using the o.a.h.r package stand-alone? That would seem unfortunate.
I still can't see that these methods are correct. I see the sign bit removed
from negative numbers, but I don't see where it is put back. In any case, it
would seem logical for writeVLong to use fewer than 8 bytes for small negative
values, and it does not appear to do that.
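For illustration, one well-known way to make small negative values encode
compactly is a zigzag mapping, which interleaves negative and non-negative
numbers so that values near zero (of either sign) get small unsigned codes
that a vint scheme can then store in few bytes. This is only a sketch of the
general idea, not the scheme WritableUtils actually uses:

```java
// Sketch: zigzag mapping between signed longs and small unsigned codes.
// -1 -> 1, 0 -> 0, 1 -> 2, -2 -> 3, ... so small magnitudes stay small.
public class ZigZagSketch {
    static long encode(long n) { return (n << 1) ^ (n >> 63); }
    static long decode(long z) { return (z >>> 1) ^ -(z & 1); }

    public static void main(String[] args) {
        System.out.println(encode(-1));                    // 1
        System.out.println(decode(encode(-123456789L)));   // -123456789
    }
}
```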
On a separate topic: it might be worth considering a different approach to code
generation for record Comparators. E.g. a generated record could have an
additional method to return its "legend", like this:
private static final byte[] legend = { TYPE_BOOL, TYPE_FLOAT, TYPE_LONG,
TYPE_USTRING };
public byte[] getLegend() { return legend; }
where the TYPE_* constants are static final bytes. Then you could have a
single Comparator that knows how to compare the binary forms of any records
that expose a legend - it just iterates over the legend, using a switch
statement to do the right thing for each type.
I think there is a maintenance benefit in keeping the generated code as small
and as simple as possible. Performance-wise, this adds the overhead of a
for-loop and a switch statement dispatch, but I don't think that would be
significant.
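Here is a rough sketch of what such a legend-driven comparator might look
like. The TYPE_* codes and the assumed serialized layout (plain DataOutput
encoding: 1-byte boolean, 4-byte float, 8-byte long, length-prefixed UTF
string) are hypothetical, not the actual o.a.h.record wire format:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch: compare two serialized records field-by-field, driven by a legend.
public class LegendComparator {
    static final byte TYPE_BOOL = 0, TYPE_FLOAT = 1, TYPE_LONG = 2, TYPE_USTRING = 3;

    public static int compare(byte[] legend, byte[] a, byte[] b) throws IOException {
        DataInputStream da = new DataInputStream(new ByteArrayInputStream(a));
        DataInputStream db = new DataInputStream(new ByteArrayInputStream(b));
        for (byte type : legend) {
            int c;
            switch (type) {
                case TYPE_BOOL:    c = Boolean.compare(da.readBoolean(), db.readBoolean()); break;
                case TYPE_FLOAT:   c = Float.compare(da.readFloat(), db.readFloat()); break;
                case TYPE_LONG:    c = Long.compare(da.readLong(), db.readLong()); break;
                case TYPE_USTRING: c = da.readUTF().compareTo(db.readUTF()); break;
                default: throw new IOException("unknown type code " + type);
            }
            if (c != 0) return c;   // first differing field decides the order
        }
        return 0;
    }
}
```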
- David
> Make Hadoop Record I/O Easier to use outside Hadoop
> ---------------------------------------------------
>
> Key: HADOOP-941
> URL: https://issues.apache.org/jira/browse/HADOOP-941
> Project: Hadoop
> Issue Type: Improvement
> Components: record
> Affects Versions: 0.10.1
> Environment: All
> Reporter: Milind Bhandarkar
> Assigned To: Milind Bhandarkar
> Attachments: jute-patch.txt
>
>
> Hadoop record I/O can be used effectively outside of Hadoop. Its utility
> would increase if developers could use it without having to import Hadoop
> classes or depend on Hadoop jars. The following changes to the current
> translator and runtime are proposed.
> Proposed Changes:
> 1. Use java.lang.String as a native type for ustring (instead of Text.)
> 2. Provide a Buffer class as a native Java type for buffer (instead of
> BytesWritable), so that later BytesWritable could be implemented as following
> DDL:
> module org.apache.hadoop.io {
>     record BytesWritable {
>         buffer value;
>     }
> }
> 3. Member names in generated classes should not have the prefix 'm' before
> their names. In the above example, the private member would be named 'value',
> not 'mvalue' as it is now.
> 4. Convert getters and setters to CamelCase. e.g. in the above example
> the getter will be:
> public Buffer getValue();
> 5. Provide a 'swiggable' C binding, so that processing the generated C code
> with swig allows it to be used in scripting languages such as Python and Perl.
> 6. The default --language="java" target would generate record classes that
> do not depend on Hadoop's WritableComparable interface, but instead have
> "implements Record, Comparable". (i.e. they will not have write() and
> readFields() methods.) An additional option "--writable" will need to be
> specified on the rcc command line to generate classes that "implements
> Record, WritableComparable".
> 7. Optimize generated write() and readFields() methods, so that they do not
> have to create BinaryOutputArchive or BinaryInputArchive every time these
> methods are called on a record.
> 8. Implement ByteInStream and ByteOutStream for C++ runtime, as they will be
> needed for using Hadoop Record I/O with forthcoming C++ MapReduce framework
> (currently, only FileStreams are provided.)
> 9. Generate clone() methods for records in Java i.e. the generated classes
> should implement Cloneable.
> 10. As part of Hadoop build process, produce a tar bundle for Record I/O
> alone. This tar bundle will contain the translator classes and ant task
> (lib/rcc.jar), translator script (bin/rcc), Java runtime (recordio.jar) that
> includes org.apache.hadoop.record.*, sources for the java runtime (src/java),
> and c/c++ runtime sources with Makefiles (src/c++, src/c).
> 11. Make the generated Java code for maps and vectors use Java generics.
> These are the proposed user-visible changes. Internally, the translator will
> be restructured so that it is easier to plug in translators for different
> targets.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.