[ https://issues.apache.org/jira/browse/HADOOP-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467970 ]
Runping Qi commented on HADOOP-941:
-----------------------------------

The generated classes should have a static method returning the IDL string from which the classes are generated. Also, it may be a good time to start thinking about how to deal with the issues of IDL versioning/evolution. Of course, that is a big issue in itself, and certainly deserves a separate JIRA :)

> Make Hadoop Record I/O Easier to use outside Hadoop
> ---------------------------------------------------
>
>                 Key: HADOOP-941
>                 URL: https://issues.apache.org/jira/browse/HADOOP-941
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.10.1
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.11.0
>
>
> Hadoop Record I/O can be used effectively outside of Hadoop. It would increase its utility if developers could use it without having to import Hadoop classes or depend on Hadoop jars. The following changes to the current translator and runtime are proposed.
>
> Proposed changes:
>
> 1. Use java.lang.String as the native type for ustring (instead of Text).
>
> 2. Provide a Buffer class as the native Java type for buffer (instead of BytesWritable), so that BytesWritable could later be implemented with the following DDL:
>
>     module org.apache.hadoop.io {
>         record BytesWritable {
>             buffer value;
>         }
>     }
>
> 3. Member names in generated classes should not have the prefix 'm' before their names. In the above example, the private member name would be 'value', not 'mvalue' as it is now.
>
> 4. Convert getters and setters to CamelCase, e.g. in the above example the getter will be:
>
>     public Buffer getValue();
>
> 5. Provide a 'swiggable' C binding, so that processing the generated C code with SWIG allows it to be used in scripting languages such as Python and Perl.
> 6. The default --language="java" target would generate class code for records without a Hadoop dependency on the WritableComparable interface; instead the classes would have "implements Record, Comparable" (i.e. they will not have write() and readFields() methods). An additional option, "--writable", will need to be specified on the rcc command line to generate classes that have "implements Record, WritableComparable".
>
> 7. Optimize the generated write() and readFields() methods so that they do not have to create a BinaryOutputArchive or BinaryInputArchive every time these methods are called on a record.
>
> 8. Implement ByteInStream and ByteOutStream for the C++ runtime, as they will be needed for using Hadoop Record I/O with the forthcoming C++ MapReduce framework (currently, only file streams are provided).
>
> 9. Generate clone() methods for records in Java, i.e. the generated classes should implement Cloneable.
>
> 10. As part of the Hadoop build process, produce a tar bundle for Record I/O alone. This bundle will contain the translator classes and Ant task (lib/rcc.jar), the translator script (bin/rcc), the Java runtime (recordio.jar) that includes org.apache.hadoop.record.*, sources for the Java runtime (src/java), and C/C++ runtime sources with Makefiles (src/c++, src/c).
>
> 11. Make the generated Java code for maps and vectors use Java generics.
>
> These are the proposed user-visible changes. Internally, the translator will be restructured so that it is easier to plug in translators for different targets.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
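To make the shape of items 3, 4, 6, and 9 concrete, here is a rough sketch of what a class generated from the BytesWritable DDL above could look like, along with a minimal stand-in for the proposed Buffer class (item 2). This is not actual rcc output; the Buffer API, the signature() method (Runping's suggestion of a static method returning the IDL), and all other names here are assumptions for illustration.

```java
// Minimal stand-in for the proposed Buffer class (item 2); illustrative only.
class Buffer implements Comparable<Buffer> {
    private final byte[] bytes;

    Buffer(byte[] bytes) { this.bytes = bytes; }

    byte[] get() { return bytes; }

    // Lexicographic comparison of unsigned byte values.
    @Override
    public int compareTo(Buffer other) {
        int n = Math.min(bytes.length, other.bytes.length);
        for (int i = 0; i < n; i++) {
            int a = bytes[i] & 0xff;
            int b = other.bytes[i] & 0xff;
            if (a != b) return a - b;
        }
        return bytes.length - other.bytes.length;
    }
}

// Hypothetical sketch of a generated class under the proposed defaults:
// no 'm' prefix (item 3), CamelCase accessors (item 4), Comparable
// instead of WritableComparable (item 6), and Cloneable (item 9).
class BytesWritable implements Cloneable, Comparable<BytesWritable> {

    private Buffer value;   // item 3: 'value', not 'mvalue'

    public Buffer getValue() { return value; }                 // item 4
    public void setValue(Buffer value) { this.value = value; }

    @Override
    public int compareTo(BytesWritable other) {                // item 6
        return value.compareTo(other.value);
    }

    @Override
    public BytesWritable clone() {                             // item 9
        BytesWritable copy = new BytesWritable();
        copy.value = value;  // a real generator would deep-copy the buffer
        return copy;
    }

    // Runping Qi's suggestion: return the DDL this class was generated
    // from (method name is an assumption).
    public static String signature() {
        return "module org.apache.hadoop.io { record BytesWritable { buffer value; } }";
    }
}
```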
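One way to realize item 7 is to cache one archive per underlying stream instead of constructing a fresh one on every write() call. The sketch below uses a hypothetical OutputArchive stand-in (the real BinaryOutputArchive API is not shown here) and a weak-keyed cache; both the class names and the caching strategy are assumptions.

```java
import java.io.DataOutput;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for the runtime's binary output archive.
class OutputArchive {
    private final DataOutput out;
    OutputArchive(DataOutput out) { this.out = out; }
    DataOutput stream() { return out; }
}

// Item 7 sketch: reuse one archive per stream so generated write() methods
// avoid an allocation per call. WeakHashMap lets entries disappear once the
// stream itself is no longer referenced.
class ArchiveCache {
    private static final Map<DataOutput, OutputArchive> CACHE = new WeakHashMap<>();

    static synchronized OutputArchive outputFor(DataOutput out) {
        return CACHE.computeIfAbsent(out, OutputArchive::new);
    }
}
```

A generated write() would then call ArchiveCache.outputFor(stream) rather than new-ing an archive each time.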
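Item 11 amounts to emitting typed collections instead of raw ones. A small sketch of the difference, with illustrative field names (not actual translator output):

```java
import java.util.ArrayList;
import java.util.TreeMap;

// Item 11 sketch: generated code for DDL 'vector' and 'map' members could
// use Java generics, removing the casts that raw types force on callers.
class GenericsExample {
    // before (raw types):  private ArrayList names;  private TreeMap counts;
    // after (proposed):
    private ArrayList<String> names = new ArrayList<String>();
    private TreeMap<String, Long> counts = new TreeMap<String, Long>();

    void add(String name, long count) {
        names.add(name);          // no cast needed when reading back
        counts.put(name, count);
    }

    long countOf(String name) {
        Long c = counts.get(name);  // typed access, no (Long) cast
        return c == null ? 0L : c;
    }
}
```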