Try updating the JDK to 1.6; there is a bug in JDK 1.5's NIO implementation.
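For context: on Linux each java.nio.channels.Selector holds a pipe pair
plus an eventpoll descriptor, which is exactly what shows up in the lsof
output below. A tiny illustration (not your writer's code, just the
failure mode) of how selectors that are never closed or never reclaimed
eat up file handles:

import java.io.IOException;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

// Illustration only: every Selector.open() on Linux allocates a pipe pair
// plus an eventpoll descriptor. Selectors that are never closed (or not
// reclaimed, as with the JDK 1.5 epoll selector bug) eventually exhaust
// the per-process file-handle limit.
public class SelectorLeakDemo {
    public static void main(String[] args) throws IOException {
        List<Selector> leaked = new ArrayList<Selector>();
        while (true) {
            leaked.add(Selector.open());   // ~3 fds each, never closed
            System.out.println("open selectors: " + leaked.size());
            // calling close() on each selector would release those fds
        }
    }
}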
On 2008-09-26, at 7:29 PM, Goel, Ankur wrote:
Hi Folks,
We have developed a simple log writer in Java that is plugged into
Apache's custom log and writes log entries directly to our Hadoop
cluster (50 machines, quad core, each with 16 GB RAM and an 800 GB
hard disk; one machine as a dedicated NameNode, another machine as
JobTracker, and the rest as TaskTracker + DataNode).
There are around 8 Apache servers dumping logs into HDFS via our
writer. Everything was working fine, and we were getting around
15-20 MB of log data per hour from each server.
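(For illustration, a stripped-down sketch of the kind of piped-log
writer described above; the HDFS URI, paths, and class name are
placeholders, not our actual code.)

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Stripped-down sketch: Apache's piped CustomLog feeds each access-log
// line to this process on stdin, and the lines are streamed into a
// single HDFS file. All names below are placeholders.
public class HdfsLogWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://hadoop-server.com:9000"); // placeholder NameNode URI
        FileSystem fs = FileSystem.get(conf);

        Path out = new Path(args.length > 0 ? args[0] : "/logs/access.log");
        FSDataOutputStream stream = fs.create(out);   // file stays open while Apache runs
        PrintStream ps = new PrintStream(stream);

        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            ps.println(line);                         // one log entry per line
        }

        ps.close();
        fs.close();
    }
}

On the Apache side this would be hooked in with something like
CustomLog "|/usr/bin/java HdfsLogWriter /logs/access.log" combined
(again, a placeholder invocation).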
Recently we have been experiencing problems with 2-3 of our Apache
servers, where a file is opened by the log writer in HDFS for writing
but never receives any data.
Looking at the Apache error logs shows the following errors:
08/09/22 05:02:13 INFO ipc.Client: java.io.IOException: Too many open files
        at sun.nio.ch.IOUtil.initPipe(Native Method)
        at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
        at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:312)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:227)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:149)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at org.apache.hadoop.ipc.Client$Connection$1.read(Client.java:203)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:289)
...
...
This is followed by connection errors saying:
Retrying to connect to server: hadoop-server.com:9000. Already tried
'n' times.
(same as above) ...
and it keeps retrying constantly (the log writer is set up so that it
waits and retries).
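(Roughly this behaviour; a sketch with assumed names and intervals,
not the actual writer code:)

// Sketch of the wait-and-retry behaviour mentioned above; the interval,
// method names, and logging are assumptions.
public class RetryingWrite {
    private static final long RETRY_INTERVAL_MS = 60 * 1000L;  // assumed interval

    interface Sink { void write(String line) throws Exception; }

    static void writeWithRetry(Sink sink, String line) throws InterruptedException {
        while (true) {
            try {
                sink.write(line);
                return;
            } catch (Exception e) {
                System.err.println("write failed, retrying: " + e);
                Thread.sleep(RETRY_INTERVAL_MS);   // wait, then retry indefinitely
            }
        }
    }
}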
Doing an lsof on the log-writer Java process shows that it is stuck
holding a lot of pipe/eventpoll descriptors and eventually ran out of
file handles. Below is part of the lsof output:
lsof -p 2171
COMMAND PID  USER FD  TYPE DEVICE SIZE NODE     NAME
java    2171 root 20r FIFO 0,7         24090207 pipe
java    2171 root 21w FIFO 0,7         24090207 pipe
java    2171 root 22r      0,8    0    24090208 eventpoll
java    2171 root 23r FIFO 0,7         23323281 pipe
java    2171 root 24r FIFO 0,7         23331536 pipe
java    2171 root 25w FIFO 0,7         23306764 pipe
java    2171 root 26r      0,8    0    23306765 eventpoll
java    2171 root 27r FIFO 0,7         23262160 pipe
java    2171 root 28w FIFO 0,7         23262160 pipe
java    2171 root 29r      0,8    0    23262161 eventpoll
java    2171 root 30w FIFO 0,7         23299329 pipe
java    2171 root 31r      0,8    0    23299330 eventpoll
java    2171 root 32w FIFO 0,7         23331536 pipe
java    2171 root 33r FIFO 0,7         23268961 pipe
java    2171 root 34w FIFO 0,7         23268961 pipe
java    2171 root 35r      0,8    0    23268962 eventpoll
java    2171 root 36w FIFO 0,7         23314889 pipe
...
...
...
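For what it's worth, the descriptor count can also be watched from
inside the JVM on Linux by listing /proc/self/fd (a hypothetical
helper, not part of our writer), so the process could warn before it
actually hits the limit:

import java.io.File;

// Hypothetical Linux-only helper: count the descriptors this JVM holds
// by listing /proc/self/fd.
public class FdCount {
    public static int openFds() {
        File[] fds = new File("/proc/self/fd").listFiles();
        return (fds == null) ? -1 : fds.length;
    }

    public static void main(String[] args) {
        System.out.println("open file descriptors: " + openFds());
    }
}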
What in the DFS client (if anything) could have caused this? Could it
be something else?
Is it not ideal to use an HDFS writer to write logs directly from
Apache into HDFS?
Is Chukwa (the Hadoop log collection and analysis framework
contributed by Yahoo) a better fit for our case?
I would highly appreciate help on any or all of the above questions.
Thanks and Regards
-Ankur