Gelesh created HADOOP-9168:
------------------------------

             Summary: The Naming and Inheritance for RecordReader, LineRecordReader, LineReader
                 Key: HADOOP-9168
                 URL: https://issues.apache.org/jira/browse/HADOOP-9168
             Project: Hadoop Common
          Issue Type: Improvement
          Components: util
   Affects Versions: 0.23.5, 2.0.2-alpha, 0.21.0
            Reporter: Gelesh
            Priority: Minor
             Fix For: site, hudson, 1.2.0, 0.23.2

I feel LineReader is not the correct name, since it reads up to a given delimiter, which is not necessarily a line ending. How about Text Record Reader? That sounds correct, but LineReader is not a RecordReader by inheritance. By functionality, though, it is the record reader.

Looking at it from a different angle: in general, an InputFormat has two main responsibilities:

1) Read a split.
2) Generate key and value pairs based on what was read from the split.

TextInputFormat has a RecordReader, which is extended by LineRecordReader, which in turn uses another class, LineReader. It is LineReader that actually reads the file; LineRecordReader only generates the key and value.

I would suggest the following:

1) Rename RecordReader to KeyValueGenerator.
2) Rename LineRecordReader to TextInputKeyValueGenerator.
3) Rename LineReader to delimitedTextReader.
4) Move the generic attributes of LineReader (such as start, pos, end, buffer, bufferBytes, etc.) into an abstract class called RecordReader, since they are all specific to reading the given input. The delimitedTextReader class could then extend RecordReader.

With these changes the names would make better sense.

We must also look into compatibility. The change might be unfit to deploy unless a new API is introduced.
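
To make the proposed split between "reading" and "key/value generation" concrete, here is a minimal standalone sketch in Java. It does not use the Hadoop API, and everything in it is an assumption made for illustration: the class bodies, constructors, and method names are hypothetical, the reader handles only a single-byte delimiter, reads one byte at a time instead of using buffer/bufferBytes, and treats `end` as the end of the input rather than handling split boundaries. The class name is capitalised as DelimitedTextReader to follow Java convention.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical base class: holds only the generic reading state (start, pos, end). */
abstract class RecordReader {
    protected final long start; // offset of the first byte to read
    protected final long end;   // offset just past the last byte to read
    protected long pos;         // current read offset

    protected RecordReader(long start, long end) {
        this.start = start;
        this.end = end;
        this.pos = start;
    }

    /** Returns the next raw record, or null when the input is exhausted. */
    abstract String readRecord() throws IOException;

    long getPos() {
        return pos;
    }
}

/** Hypothetical replacement for LineReader: reads text up to a single-byte delimiter. */
class DelimitedTextReader extends RecordReader {
    private final InputStream in;
    private final int delimiter;

    DelimitedTextReader(InputStream in, long start, long end, char delimiter) {
        super(start, end);
        this.in = in;
        this.delimiter = delimiter;
    }

    @Override
    String readRecord() throws IOException {
        ByteArrayOutputStream record = new ByteArrayOutputStream();
        boolean readAnything = false;
        int b;
        while (pos < end && (b = in.read()) != -1) {
            pos++;
            readAnything = true;
            if (b == delimiter) {
                break; // the delimiter is consumed but not part of the record
            }
            record.write(b);
        }
        return readAnything ? new String(record.toByteArray(), StandardCharsets.UTF_8) : null;
    }
}

/** Hypothetical replacement for the current RecordReader role: produces key/value pairs. */
interface KeyValueGenerator<K, V> {
    boolean nextKeyValue() throws IOException;
    K getCurrentKey();
    V getCurrentValue();
}

/** Hypothetical replacement for LineRecordReader: key = byte offset, value = record text. */
class TextInputKeyValueGenerator implements KeyValueGenerator<Long, String> {
    private final DelimitedTextReader reader;
    private Long key;
    private String value;

    TextInputKeyValueGenerator(DelimitedTextReader reader) {
        this.reader = reader;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        long offset = reader.getPos(); // offset where the next record begins
        String record = reader.readRecord();
        if (record == null) {
            key = null;
            value = null;
            return false;
        }
        key = offset;
        value = record;
        return true;
    }

    @Override
    public Long getCurrentKey() {
        return key;
    }

    @Override
    public String getCurrentValue() {
        return value;
    }
}
```

In this sketch the generator owns a reader rather than extending it, so the reading logic (delimiter handling, position tracking) can change without touching how keys and values are produced. Whether the real classes could be rearranged this way without breaking existing users is exactly the compatibility question raised above.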