[
https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536008
]
Raghu Angadi commented on HADOOP-2071:
--------------------------------------
bq. the readlimit argument for mark is not honored in these changes. If one
calls reset after more than readlimit bytes have been read after mark, that
reset is supposed to throw IOException.
If we want to keep that behavior, we can simply track how many bytes have been
read since mark() and throw an IOException from reset() once the count exceeds
readlimit. Actually, we could just throw the exception as soon as no record is
found within readlimit bytes, instead of reading on until there is a match or
EOF.
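For illustration only, a minimal sketch of that bookkeeping (the class and
names below are made up, not from the attached patch): a FilterInputStream
that counts bytes read since mark() and fails reset() once the count exceeds
the caller's readlimit.
{code:java}
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: counts bytes consumed since mark() and makes
// reset() fail once the count exceeds the readlimit the caller gave.
class ReadLimitEnforcingStream extends FilterInputStream {
  private long readLimit = -1;     // -1 means no mark outstanding
  private long readSinceMark = 0;  // bytes consumed since mark()

  ReadLimitEnforcingStream(InputStream in) {
    super(in);
  }

  @Override
  public synchronized void mark(int readlimit) {
    in.mark(readlimit);
    readLimit = readlimit;
    readSinceMark = 0;
  }

  @Override
  public int read() throws IOException {
    int b = in.read();
    if (b != -1) readSinceMark++;
    return b;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    int n = in.read(b, off, len);
    if (n > 0) readSinceMark += n;
    return n;
  }

  @Override
  public synchronized void reset() throws IOException {
    if (readLimit >= 0 && readSinceMark > readLimit) {
      throw new IOException("Mark invalidated: read " + readSinceMark
          + " bytes, readlimit was " + readLimit);
    }
    in.reset();
  }
}
{code}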
Lohit and I looked through the code, and it seems to seek backwards heavily
(pretty much once per record). Seeking back is quite inefficient in DFS: it
throws away the current buffers (both application and TCP) and in most cases
opens a new connection. The current patch does not make this situation any
worse. I wonder what the typical size of these records is...
One problem with using BufferedInputStream is that the current code calls
getPos() and seek() in many places, and those are specific to
FSDataInputStream. So keeping the positions consistent through a buffer would
need more changes.
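To make that extra bookkeeping concrete, here is a rough, hypothetical sketch
(not part of the patch): buffering leaves the underlying FSDataInputStream
positioned past the bytes the reader has actually consumed, so the logical
position has to be tracked separately and the buffer rebuilt on every seek().
{code:java}
import java.io.BufferedInputStream;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Hypothetical helper: base.getPos() would overshoot by however many
// bytes still sit in the buffer, so we keep our own position counter.
class BufferedPositionedStream {
  private final FSDataInputStream base;
  private BufferedInputStream buffered;
  private long pos;  // position of the next byte read() will return

  BufferedPositionedStream(FSDataInputStream base, long start)
      throws IOException {
    this.base = base;
    seek(start);
  }

  long getPos() {
    return pos;  // base.getPos() includes bytes buffered but unread
  }

  void seek(long target) throws IOException {
    base.seek(target);
    buffered = new BufferedInputStream(base);  // drop the stale buffer
    pos = target;
  }

  int read() throws IOException {
    int b = buffered.read();
    if (b != -1) pos++;
    return b;
  }
}
{code}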
> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in
> hadoop 0.14
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-2071
> URL: https://issues.apache.org/jira/browse/HADOOP-2071
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.14.3
> Reporter: lohit vijayarenu
> Assignee: lohit vijayarenu
> Attachments: HADOOP-2071-1.patch
>
>
> In Hadoop 0.14, streaming jobs that use -inputreader StreamXmlRecordReader
> throw:
> java.io.IOException: Mark/reset exception
> This looks to be related to HADOOP-2067
> (https://issues.apache.org/jira/browse/HADOOP-2067).
> <stack trace>
> Caused by: java.io.IOException: Mark/reset not supported
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353)
>     at java.io.FilterInputStream.reset(FilterInputStream.java:200)
>     at org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamXmlRecordReader.java:289)
>     at org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(StreamXmlRecordReader.java:118)
>     at org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(StreamXmlRecordReader.java:111)
>     at org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader.java:73)
>     at org.apache.hadoop.streaming.StreamXmlRecordReader.<init>(StreamXmlRecordReader.java:63)
> </stack trace>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.