Re: nagios to monitor hadoop datanodes!

2008-10-07 Thread
Hadoop already has JMX integrated; you can extend it to implement whatever monitoring you want. It requires modifying some code to add counters or similar instrumentation.
One thing you need to be aware of is that Hadoop does not include a JMXConnectorServer, so you need to start one JMXConnectorServer for every Hadoop process you want to monitor.
This is what we have done on Hadoop to monitor it. We have not checked out Nagios for Hadoop, so no word on Nagios.
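
As a rough sketch (the port and class name below are just placeholders, error handling omitted), starting a connector server that exposes a JVM's platform MBeanServer over RMI looks something like this; a client such as jconsole can then attach using the same service URL:

    import java.lang.management.ManagementFactory;
    import java.rmi.registry.LocateRegistry;
    import javax.management.MBeanServer;
    import javax.management.remote.JMXConnectorServer;
    import javax.management.remote.JMXConnectorServerFactory;
    import javax.management.remote.JMXServiceURL;

    // Rough sketch: start an RMI-based JMXConnectorServer exposing the
    // platform MBeanServer of the current JVM. One such server is needed
    // per Hadoop process to monitor. Port and class name are placeholders.
    public class HadoopJmxAgent {
        public static void start(int port) throws Exception {
            LocateRegistry.createRegistry(port);               // RMI registry for the connector
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:" + port + "/jmxrmi");
            JMXConnectorServer server =
                    JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
            server.start();                                    // now reachable over JMX/RMI
        }
    }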

Hope it helps.
On Oct 8, 2008, at 8:34 AM, Brian Bockelman wrote:


Hey Stefan,

Is there any documentation for getting JMX working in Hadoop?

Brian

On Oct 7, 2008, at 7:03 PM, Stefan Groschupf wrote:


Try JMX. There should also be a JMX-to-SNMP bridge available somewhere.
http://blogs.sun.com/jmxetc/entry/jmx_vs_snmp

~~~
101tec Inc., Menlo Park, California
web:  http://www.101tec.com
blog: http://www.find23.net



On Oct 6, 2008, at 10:05 AM, Gerardo Velez wrote:


Hi Everyone!


I would like to implement Nagios health monitoring of a Hadoop grid.

If any of you have experience here, do you have an approach or advice I could use?

At this time I've only been playing with the JSP pages that Hadoop has built in, so I'm not sure: would it be a good idea to have Nagios request info from these JSPs?
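
For instance, something along these lines is what I had in mind (host, port and page below are placeholders; dfshealth.jsp is just one of the built-in status pages I've been looking at), mapping the HTTP result onto Nagios exit codes (0 = OK, 2 = CRITICAL):

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Rough idea of a Nagios-style check: fetch a Hadoop status JSP over
    // HTTP and translate the result into Nagios exit codes.
    public class CheckHadoopJsp {
        public static void main(String[] args) {
            String url = args.length > 0 ? args[0]
                    : "http://namenode:50070/dfshealth.jsp";   // placeholder target
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
                conn.setConnectTimeout(5000);
                conn.setReadTimeout(5000);
                int status = conn.getResponseCode();
                if (status == 200) {
                    System.out.println("HADOOP OK - " + url + " returned 200");
                    System.exit(0);
                }
                System.out.println("HADOOP CRITICAL - " + url + " returned " + status);
                System.exit(2);
            } catch (Exception e) {
                System.out.println("HADOOP CRITICAL - cannot reach " + url + ": " + e.getMessage());
                System.exit(2);
            }
        }
    }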


Thanks in advance!


-- Gerardo








Re: IPC Client error | Too many files open

2008-10-07 Thread

Try updating the JDK to 1.6; there is a known NIO bug in JDK 1.5.
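For reference, each java.nio Selector on Linux allocates a wakeup pipe (two FIFO descriptors) plus an epoll descriptor, which is exactly the pipe/pipe/eventpoll pattern in the lsof output quoted below. A small standalone sketch (not Hadoop code, names are illustrative) that reproduces the leak when selectors are never closed:

    import java.io.IOException;
    import java.nio.channels.Selector;
    import java.util.ArrayList;
    import java.util.List;

    // Illustration only: every Selector.open() on Linux allocates a wakeup
    // pipe plus an epoll descriptor. Selectors that are never closed show up
    // in lsof as pipe/pipe/eventpoll entries, and the process eventually
    // hits "Too many open files".
    public class SelectorLeakDemo {
        public static void main(String[] args) throws Exception {
            List<Selector> leaked = new ArrayList<Selector>();
            try {
                for (int i = 0; i < 10000; i++) {
                    leaked.add(Selector.open());   // intentionally never closed
                }
            } catch (IOException e) {
                // Typically "Too many open files" once the fd limit is reached.
                System.out.println("Failed after " + leaked.size() + " selectors: " + e);
            }
            System.out.println("Run 'lsof -p <pid>' now to see the pipe/eventpoll entries.");
            Thread.sleep(60000);                   // keep the JVM alive so lsof can be run
        }
    }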
On Sep 26, 2008, at 7:29 PM, Goel, Ankur wrote:


Hi Folks,

We have developed a simple log writer in Java that is plugged into Apache's custom log and writes log entries directly to our Hadoop cluster (50 machines, quad core, each with 16 GB RAM and an 800 GB hard disk; one machine as a dedicated NameNode, another machine as JobTracker + TaskTracker + DataNode).

There are around 8 Apache servers dumping logs into HDFS via our writer.

Everything was working fine and we were getting around 15-20 MB of log data per hour from each server.



Recently we have been experiencing problems with 2-3 of our Apache servers, where a file is opened by the log writer in HDFS for writing but never receives any data.

Looking at the Apache error logs shows the following errors:

08/09/22 05:02:13 INFO ipc.Client: java.io.IOException: Too many open files
        at sun.nio.ch.IOUtil.initPipe(Native Method)
        at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
        at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:312)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:227)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:149)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at org.apache.hadoop.ipc.Client$Connection$1.read(Client.java:203)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:289)


...

...

Followed by connection errors saying:

Retrying to connect to server: hadoop-server.com:9000. Already tried
'n' times.

(same as above) ...



and it keeps retrying constantly (the log writer is set up so that it waits and retries).



Doing an lsof on the log-writer Java process shows that it is stuck with a lot of pipe/eventpoll descriptors and eventually ran out of file handles.

Below is part of the lsof output:



lsof -p 2171
COMMAND  PID USER   FD   TYPE DEVICE SIZE     NODE NAME
java    2171 root   20r  FIFO    0,7      24090207 pipe
java    2171 root   21w  FIFO    0,7      24090207 pipe
java    2171 root   22r          0,8    0 24090208 eventpoll
java    2171 root   23r  FIFO    0,7      23323281 pipe
java    2171 root   24r  FIFO    0,7      23331536 pipe
java    2171 root   25w  FIFO    0,7      23306764 pipe
java    2171 root   26r          0,8    0 23306765 eventpoll
java    2171 root   27r  FIFO    0,7      23262160 pipe
java    2171 root   28w  FIFO    0,7      23262160 pipe
java    2171 root   29r          0,8    0 23262161 eventpoll
java    2171 root   30w  FIFO    0,7      23299329 pipe
java    2171 root   31r          0,8    0 23299330 eventpoll
java    2171 root   32w  FIFO    0,7      23331536 pipe
java    2171 root   33r  FIFO    0,7      23268961 pipe
java    2171 root   34w  FIFO    0,7      23268961 pipe
java    2171 root   35r          0,8    0 23268962 eventpoll
java    2171 root   36w  FIFO    0,7      23314889 pipe


...

...

...

What in the DFS client (if anything) could have caused this? Could it be something else?

Is it not ideal to use an HDFS writer to write logs directly from Apache into HDFS?

Is Chukwa (the Hadoop log collection and analysis framework contributed by Yahoo) a better fit for our case?



I would highly appreciate help on any or all of the above questions.



Thanks and Regards

-Ankur