[ https://issues.apache.org/jira/browse/HADOOP-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545712 ]
Sam Pullara commented on HADOOP-2285:
-------------------------------------
A buffered implementation, very similar to the one you might find in
BufferedReader (but operating on bytes instead of chars), would be the best
solution. The basic issue is that the current readLine implementation calls
read() once per byte rather than using the more efficient
read(byte[], int, int) call that is available and then scanning the array.
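
To make the suggestion concrete, here is a minimal sketch of a byte-oriented
line reader that refills an internal array with bulk reads and scans it for
line terminators. It is only an illustration, not the Hadoop code or a proposed
patch: the class name BufferedLineReader, the readAllLines helper, and the
choice to treat \n, \r, and \r\n as terminators (as BufferedReader does) are
all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual LineRecordReader implementation. */
class BufferedLineReader {
    private final InputStream in;
    private final byte[] buf;
    private int pos = 0;    // index of the next unread byte in buf
    private int limit = 0;  // number of valid bytes currently in buf

    BufferedLineReader(InputStream in, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.in = in;
        this.buf = new byte[bufferSize];
    }

    /** Refills buf with one bulk read; returns false at end of stream. */
    private boolean fill() throws IOException {
        int n = in.read(buf, 0, buf.length);
        pos = 0;
        limit = Math.max(n, 0);
        return n > 0;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    byte[] readLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawAny = false;
        while (true) {
            if (pos >= limit && !fill()) {
                // End of stream: return a final unterminated line, if any.
                return sawAny ? line.toByteArray() : null;
            }
            sawAny = true;
            int start = pos;
            // Scan the in-memory array instead of calling read() per byte.
            while (pos < limit && buf[pos] != '\n' && buf[pos] != '\r') {
                pos++;
            }
            line.write(buf, start, pos - start);
            if (pos < limit) {
                byte terminator = buf[pos++];
                // For "\r\n" the '\n' may sit in the next buffer, so refill if needed.
                if (terminator == '\r' && (pos < limit || fill()) && buf[pos] == '\n') {
                    pos++;
                }
                return line.toByteArray();
            }
            // Buffer exhausted without a terminator: loop to refill and continue.
        }
    }

    /** Convenience helper for demonstration: decodes every line as UTF-8. */
    static List<String> readAllLines(byte[] data, int bufferSize) {
        BufferedLineReader reader =
            new BufferedLineReader(new ByteArrayInputStream(data), bufferSize);
        List<String> lines = new ArrayList<>();
        try {
            byte[] line;
            while ((line = reader.readLine()) != null) {
                lines.add(new String(line, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

With a realistic buffer size (for example 64 KB) the underlying stream sees one
read call per buffer rather than one per byte. The tiny buffer sizes are only
useful for exercising the case where a line or a \r\n pair straddles a refill.
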
> TextInputFormat is slow compared to reading files.
> --------------------------------------------------
>
> Key: HADOOP-2285
> URL: https://issues.apache.org/jira/browse/HADOOP-2285
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.15.0
> Reporter: Owen O'Malley
> Fix For: 0.16.0
>
>
> The LineRecordReader reads from the source byte by byte, which seems to be
> half as fast as a readLine method defined directly on the memory buffer
> instead of on an InputStream.