[jira] Updated: (HADOOP-4226) LineReader::readLine cleanup

Chris Douglas (JIRA) Thu, 25 Sep 2008 21:36:37 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chris Douglas updated HADOOP-4226:
----------------------------------

    Status: Open  (was: Patch Available)

bq. OK. BTW, are you planning to switch LRR to use util.LineReader?
HADOOP-4269 fixed this; trunk runs the unit tests from TextInputFormat over 
o.a.h.util.LineReader, as intended. Sorry, the current patch no longer applies 
to trunk.

bq. Only as an optimization; for this little piece of code I think it's OK. It 
is (or going to be) heavily used, as you say.
*nod* Fair enough. I think it makes the code slightly harder to read for a 
nominal performance gain that the JIT compiler should cover, but it's not 
significant.

bq. Sorry, fixed. It was not obvious from looking at the code.
It's not obvious that it should work that way at all, but backwards 
compatibility is a big deal for this class.

* Unless there are new tests in TestLineReader, that class or the redundant 
tests in TestTextInputFormat should be removed. A test case verifying that 
streams ending in \r behave like those ending in \n would be a good addition to 
this patch. Sorry, "this" was ambiguous in my last comment.
* The default buffer size doesn't need to be public (resolved in trunk)
* CR and LF don't need to be constants; they should be private at least
* Instead of
{noformat}
+        if (bufferLength <= bufferPosn)
+          break; // EOF
{noformat}
It might make more sense to check {{bufferLength <= 0}}, to which bufferPosn is 
set
* Instead of incrementing:
{noformat}
+        if (prevCharCR) { //CR + notLF, we are at notLF
+          ++newlineLength;
{noformat}
Shouldn't this always set newLineLength = 1?
* If buffer ends in \r and the following segment starts with \r, it looks like 
this may not separate those lines.

> LineReader::readLine cleanup
> ----------------------------
>
>                 Key: HADOOP-4226
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4226
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Yuri Pradkin
>            Assignee: Yuri Pradkin
>            Priority: Minor
>         Attachments: HADOOP-4226.patch, HADOOP-4226.patch, HADOOP-4226.patch, 
> HADOOP-4226.patch
>
>
> I've been looking at HADOOP-4010 and realized that readLine is pretty 
> convoluted.  I changed the implementation which made it hopefully a little 
> easier to read/validate/understand.  
> I've had some problems testing it locally, so I'll submit it for Hudson to 
> test.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4226) LineReader::readLine cleanup

Reply via email to