On Tue, Aug 2, 2016 at 12:17 AM, Mr rty ff wrote:
>
> Hi, I have a few questions about the implementation of the input stream in S3.
> 1) public synchronized long getPos() throws IOException {
>      return (nextReadPos < 0) ? 0 : nextReadPos;
>    }
> Why does it return nextReadPos and not pos?
My understanding is:
seek() is implemented lazily. S3AInputStream keeps track of two
positions:
1. the current position in the underlying stream (pos)
2. the position where the next read will start (nextReadPos).
If seek() were eager rather than lazy, we could do the seeking as soon
as seek() is called. In that case, I think we would only need to keep
track of #1 (pos).
Instead, we keep track of where the next read() will start, and
do the actual seek lazily, only when it is needed.
getPos() is supposed to return the position of the next read(),
so nextReadPos is the correct value to return.
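A minimal sketch of the lazy-seek bookkeeping described above, in Java. This is not the actual S3AInputStream code: the class name LazySeekSketch, the in-memory byte array standing in for the S3 object, and the underlyingPos()/opens() helpers are hypothetical, and this simplified version just reopens the stream whenever the two positions differ.

```java
import java.io.ByteArrayInputStream;

/**
 * Hypothetical illustration of lazy seek (not the real S3AInputStream).
 * A byte array stands in for the S3 object, and "opening" it at an offset
 * stands in for a ranged GET request.
 */
class LazySeekSketch {
  private final byte[] object;
  private ByteArrayInputStream wrapped; // underlying stream; null until first read
  private long pos;                     // position of the underlying stream
  private long nextReadPos;             // public position: set by seek(), returned by getPos()
  private int opens;                    // number of times the object was (re)opened

  LazySeekSketch(byte[] object) {
    this.object = object;
  }

  /** Lazy: only records where the next read() should start. */
  public synchronized void seek(long targetPos) {
    if (targetPos < 0 || targetPos > object.length) {
      throw new IllegalArgumentException("Bad seek: " + targetPos);
    }
    nextReadPos = targetPos;
  }

  /** Where the next read() will start; may differ from pos. */
  public synchronized long getPos() {
    return (nextReadPos < 0) ? 0 : nextReadPos;
  }

  /** Performs the deferred seek: reopens at nextReadPos if the positions differ. */
  private void lazySeek() {
    if (wrapped == null || pos != nextReadPos) {
      int offset = (int) nextReadPos;
      wrapped = new ByteArrayInputStream(object, offset, object.length - offset);
      pos = nextReadPos;
      opens++;
    }
  }

  public synchronized int read() {
    lazySeek();
    int b = wrapped.read();
    if (b >= 0) {
      pos++;
      nextReadPos++;
    }
    return b;
  }

  long underlyingPos() { return pos; } // hypothetical debug helper
  int opens() { return opens; }        // hypothetical debug helper
}
```

After one read() followed by seek(3), getPos() already reports 3 while the underlying stream is still at position 1; only the next read() moves the stream.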
> In the member definition for pos:
> /**
>  * This is the public position; the one set in {@link #seek(long)}
>  * and returned in {@link #getPos()}.
>  */
> private long pos;
This is probably the source of your confusion. It looks like this comment
should be changed. I believe pos is the position of the underlying stream,
not the position of the next read. The two probably diverged when
lazy seek was implemented.
> 2) In seekInStream(), the last lines are:
>   // close the stream; if read the object will be opened at the new pos
>   closeStream("seekInStream()", this.requestedStreamLen);
>   pos = targetPos;
> Why do you need this line? Shouldn't pos be updated with the actual
> skipped value, as you did here:
>   if (skipped > 0) {
>     pos += skipped;
The skipped variable is not in scope at that point.
It is used to keep track of how far the underlying stream actually skipped.
The point of this logic is to balance performance between
(a) always reopening the stream at the newly seeked position, and
(b) just reading forward and discarding the unneeded bytes.
I believe (a) was found to be inefficient in some cases.
This code implements both approaches, depending on how far
forward the seek() goes. The code you are asking about is
case (a), where we reopen the stream on the next read().
In this case, we just store the desired position in pos, which
the next call to read() will use to open the stream at
offset pos (see the call to lazySeek()).
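The balance between (a) and (b) can be sketched roughly as follows. This is a hypothetical, simplified model rather than the real seekInStream(): the class name SeekBalanceSketch, the fixed readahead threshold used to choose (b), the fall-through to (a) when a skip comes up short, and the byte-array "object" are all assumptions for illustration. It shows that skipped lives only inside the forward-skip branch, and that after the stream is closed, pos = targetPos simply records where the next reopen should start.

```java
import java.io.ByteArrayInputStream;

/**
 * Hypothetical model of the two seek strategies (not the real S3AInputStream).
 * (a) close the stream and reopen it at the target on the next read;
 * (b) for short forward seeks, skip (read and discard) bytes in place.
 */
class SeekBalanceSketch {
  private final byte[] object;
  private final long readahead;         // assumed max forward distance for (b)
  private ByteArrayInputStream wrapped; // null when closed
  private long pos;                     // position of the underlying stream
  private long nextReadPos;             // public position, set by seek()
  private int opens;                    // number of (re)opens, for illustration

  SeekBalanceSketch(byte[] object, long readahead) {
    this.object = object;
    this.readahead = readahead;
  }

  public synchronized void seek(long targetPos) {
    if (targetPos < 0 || targetPos > object.length) {
      throw new IllegalArgumentException("Bad seek: " + targetPos);
    }
    nextReadPos = targetPos; // lazy: the stream is not touched yet
  }

  public synchronized long getPos() {
    return nextReadPos;
  }

  /** Moves an open stream to targetPos, or closes it so it can be reopened. */
  private void seekInStream(long targetPos) {
    if (wrapped == null) {
      return; // nothing open; lazySeek() will open at targetPos
    }
    long diff = targetPos - pos;
    if (diff == 0) {
      return;
    }
    if (diff > 0 && diff <= readahead) {
      // (b) short forward seek: discard bytes instead of reopening.
      long skipped = wrapped.skip(diff); // local to this block
      if (skipped > 0) {
        pos += skipped;
      }
      if (pos == targetPos) {
        return;
      }
      // in this sketch, a short skip falls through to (a)
    }
    // (a) close the stream; the next read reopens it at the new position.
    wrapped = null;
    pos = targetPos; // records the desired position; nothing was skipped here
  }

  private void reopen(long targetPos) {
    int offset = (int) targetPos;
    wrapped = new ByteArrayInputStream(object, offset, object.length - offset);
    pos = targetPos;
    opens++;
  }

  private void lazySeek(long targetPos) {
    seekInStream(targetPos);
    if (wrapped == null) {
      reopen(targetPos);
    }
  }

  public synchronized int read() {
    lazySeek(nextReadPos);
    int b = wrapped.read();
    if (b >= 0) {
      pos++;
      nextReadPos++;
    }
    return b;
  }

  int opens() { return opens; } // hypothetical debug helper
}
```

With a readahead of 16 bytes, a seek 9 bytes forward is served by skipping on the open stream, while a seek 69 bytes forward or any backward seek closes the stream and lets the next read() reopen it.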
-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org