So I have done more digging on this subject... There is another problem if many files are kept open at the same time: once you read some data from an HDFS file by calling FSInputStream.read(byte[] buf, int off, int len), a tcp connection is set up between HdfsBroker and the DataNode that holds the file block, and this connection is kept until you read into another block of the file (64MB per block by default) or close the file entirely. There is a timeout on the server side, but I see no corresponding timeout on the client side. So you quickly end up with a lot of idle connections between the HdfsBroker and many DataNodes.
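To make the access pattern concrete, here is a minimal sketch of the kind of streaming read I mean, against the public FileSystem API (the class name and path are made up for illustration, and it assumes Hadoop is on the classpath and an HDFS is reachable):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StreamingReadSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/hypertable/example/cs0");  // hypothetical cell store path
        FSDataInputStream in = fs.open(path);
        byte[] buf = new byte[1];
        in.seek(0);
        // Even this 1-byte read sets up a tcp connection to the DataNode that
        // owns block 0, and the connection stays up until the stream crosses
        // into the next block or is closed.
        in.read(buf, 0, buf.length);
        // ... if the scanner goes idle here, the connection just sits there ...
        in.close();  // only now is the DataNode connection torn down
      }
    }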
What's even worse, no matter how many bytes the application wants to read, the HDFS client library always asks the chosen DataNode to send all the remaining bytes of the block. This means that if you read 1 byte from the beginning of a block, the DataNode actually gets a request to send the whole block, of which only the first few bytes are read. The consequences, if the client then reads nothing for a long while, are: 1) the kernel tcp send queue on the DataNode side and the tcp receive queue on the client side quickly fill up; 2) the DataNode Xceiver thread (there is a maximum of 256 by default) stays blocked. Eventually the Xceiver times out and closes the connection, but the FIN packet cannot reach the client side because the send and receive queues are still full. Here is what I observe on one node of our test cluster:

$ netstat -ntp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (w/o servers)
Proto Recv-Q  Send-Q  Local Address        Foreign Address      State        PID/Program name
tcp        0  121937  10.65.25.150:50010   10.65.25.150:38595   FIN_WAIT1    -
[...]
tcp    74672       0  10.65.25.150:38595   10.65.25.150:50010   ESTABLISHED  32667/java
[...]
(and hundreds of other connections in the same states)

Possible solutions without modifying the hadoop client library are: 1) open-read-close the file stream every time the cell store is accessed; 2) always use positioned read, read(long position, byte[] buf, int off, int len), instead, because pread doesn't keep the tcp connection to the DataNode.

Solution 1 is not scalable because every open() operation involves an interaction with the HDFS NameNode, which immediately becomes a bottleneck: in our test cluster the NameNode can only handle hundreds of parallel open() requests per second, with an average delay of 2-3ms. I haven't tested the performance of solution 2 yet; I will put up some numbers tomorrow.
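As a rough sketch of what solution 2 looks like against the same public API (again the class name, path and offset are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PreadSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/hypertable/example/cs0");  // hypothetical cell store path
        FSDataInputStream in = fs.open(path);  // the stream itself can stay cached
        byte[] buf = new byte[4096];
        // Positioned read: the client fetches just this range and does not
        // leave a streaming connection to the DataNode behind.
        int nread = in.read(128L * 1024, buf, 0, buf.length);
        System.out.println("pread returned " + nread + " bytes");
        in.close();
      }
    }

The open() still goes to the NameNode once, but after that each pread is self-contained, which as far as I can tell is the property we want for thousands of mostly idle cell stores.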
Donald

On Feb 25, 8:59 pm, "Liu Kejia (Donald)" <[email protected]> wrote:
> It turns out the hadoop-default.xml packaged in my custom
> hadoop-0.19.0-core.jar has set the "io.file.buffer.size" to 131072 (128KB),
> which means DfsBroker has to open a 128KB buffer for every open file. The
> official hadoop-0.19.0-core.jar sets this value to 4096, which is more
> reasonable for applications like Hypertable.
>
> Donald
>
> On Fri, Feb 20, 2009 at 11:55 AM, Liu Kejia (Donald) <[email protected]> wrote:
>
> > Caching might not work very well because keys are randomly generated,
> > resulting in bad locality...
> > Even if it's Java, hundreds of kilobytes per file object is still very big.
> > I'll profile HdfsBroker to see what exactly is using so much memory, and
> > post the results later.
>
> > Donald
>
> > On Fri, Feb 20, 2009 at 11:20 AM, Doug Judd <[email protected]> wrote:
>
> >> Hi Donald,
>
> >> Interesting. One possibility would be to have an open CellStore cache.
> >> Frequently accessed CellStores would remain open, while seldom used ones
> >> get closed. The effectiveness of this solution would depend on the
> >> workload. Do you think this might work for your use case?
>
> >> - Doug
>
> >> On Thu, Feb 19, 2009 at 7:09 PM, donald <[email protected]> wrote:
>
> >>> Hi all,
>
> >>> I recently ran into the problem that HdfsBroker throws an out of memory
> >>> exception because too many CellStore files in HDFS are kept open - I
> >>> have over 600 ranges per range server, with a maximum of 10 cell
> >>> stores per range; that's 6,000 open files at the same time, making
> >>> HdfsBroker take gigabytes of memory.
>
> >>> If we open the CellStore file on demand, i.e. when a scanner is
> >>> created on it, this problem is gone. However, random-read performance
> >>> may drop due to the overhead of opening a file in HDFS. Any better
> >>> solution?
>
> >>> Donald
