Hi Chao,
As far as I know, if client B opens a file that is under construction,
the DFSInputStream gets a LocatedBlocks object, which contains a member
variable called "underConstruction" that marks the file as under
construction.
If the file is reopened, the client gets a different length. I think this
makes sense, because the file is no longer the old one but one with newly
appended data.

When HBase writes its edit log, new entries are appended to the end of the
WAL file; the writer does not reopen the HDFS file every second.
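
To make the behaviour under discussion concrete, here is a small toy model in
plain Java. It is not Hadoop code: it uses a local file, and the class names
SnapshotReader and ReopeningTailer are made up for illustration. The reader
captures the file length once at open time and reports EOF at that length,
which is my reading of what is described for DFSInputStream; the tailer
reopens at a remembered offset on every poll so that it picks up appended
data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

/** Toy model (hypothetical, not Hadoop): the visible length is fixed at open time. */
class SnapshotReader implements AutoCloseable {
    private final RandomAccessFile file;
    private final long visibleLength; // captured once at open, never refreshed
    private long pos;

    SnapshotReader(Path path, long startOffset) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r");
        this.visibleLength = file.length();
        this.pos = Math.min(startOffset, visibleLength);
        file.seek(pos);
    }

    /** Returns -1 once the cached length is reached, even if the file has grown. */
    int read(byte[] buf) throws IOException {
        if (pos >= visibleLength) {
            return -1;
        }
        int want = (int) Math.min(buf.length, visibleLength - pos);
        int n = file.read(buf, 0, want);
        if (n > 0) {
            pos += n;
        }
        return n;
    }

    long position() {
        return pos;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}

/** Polling reader (hypothetical): reopens at the remembered offset on each poll. */
class ReopeningTailer {
    private final Path path;
    private long offset; // how far we have read so far

    ReopeningTailer(Path path) {
        this.path = path;
    }

    /** Returns whatever was appended since the previous poll (possibly empty). */
    String poll() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (SnapshotReader reader = new SnapshotReader(path, offset)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = reader.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            offset = reader.position();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With the real client, the equivalent of a poll would presumably be
FileSystem.open(path) followed by seek(offset) on the returned
FSDataInputStream. Every such open goes to the NN, which is where the extra
load comes from.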


2013/12/27 Chao Shi <stepi...@live.com>

> Hi users,
>
> Suppose a client A opens /f and keeps appending data and then hflushing.
> Another client B opens this file for read. I found that B can only see the
> snapshot of data at the time it opens the file. (After B's opening, A may
> continue to write more data. B cannot see it unless it reopens the file.)
>
> Looking into the code, I think this is because DFSInputStream maintains a
> file length and simply reports EOF when we read beyond the file length. The
> file length is updated, and thus the client has a chance to see a longer
> file, when:
> 1) the file is opened
> 2) there are no live DNs to read from (correct? not very sure.)
>
> I think such behaviour is inconsistent. Clients may see a sudden change of
> file length. I guess a better behaviour is to always try to read beyond the
> known file length on the client side and let the DN return EOF if there is
> no more data. In this way, client B can continue to see what A wrote and
> hflushed.
>
> A real use case for this is HBase log replication. In the region server,
> there is a background thread that keeps polling for new HLog entries. It has
> to reopen the file every second. This may put pressure on the NN as the
> number of region servers grows.
>
> Please correct me if there is anything wrong.
>
> Thanks,
> Chao
>
