Thanks for the clarification, Eric!
-preetam

On Mon, Jul 4, 2011 at 11:38 AM, Eric Yang <[email protected]> wrote:

> Yes, they are the same issue.  The corresponding chunk should have a
> timestamp in front of INFO, but the hostname is being sent there instead.
> Two metrics got merged into one log entry during the flush.  Hence, the
> parser throws an exception for the malformed entry, which is captured by
> the framework, which then attempts to write the malformed-entry exception
> to HadoopMetricsInError.  HadoopMetricsInError is not a defined column
> family, so you are seeing this exception in the collector log.
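
For the archives, a rough and untested sketch of one possible workaround is to
add the missing column family so that these malformed-entry records have
somewhere to land. The table name below is an assumption taken from the row key
quoted further down in this thread; substitute whatever table your HBaseWriter
is actually configured to use (whether you want to keep such error records at
all is a separate question):

  // Hypothetical one-off admin snippet (HBase 0.90-style client API), not Chukwa code.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class AddInErrorFamily {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      String table = "HadoopMetrics";          // assumed table name; adjust to your schema
      String family = "HadoopMetricsInError";  // the family the failed put is targeting
      if (!admin.getTableDescriptor(table.getBytes()).hasFamily(family.getBytes())) {
        admin.disableTable(table);             // altering the schema requires disabling the table
        admin.addColumn(table, new HColumnDescriptor(family));
        admin.enableTable(table);
      }
    }
  }

Note that disabling the table briefly interrupts writes from the collector, so
this would be best done during a quiet window.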
>
> regards,
> Eric
>
> On Sun, Jul 3, 2011 at 10:28 PM, Preetam Patil <[email protected]>
> wrote:
> > Thanks for the prompt response!
> > Is this also the reason I'm seeing lots of errors of this type?
> > 2011-07-04 10:52:35,271 WARN btpool0-97 HBaseWriter -
> > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > Failed 1 action: NoSuchColumnFamilyException: 1 time, servers with issues:
> > chukwanode:49137,
> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1223)
> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1237)
> > at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:819)
> > at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:675)
> > at org.apache.hadoop.hbase.client.HTable.put(HTable.java:665)
> > at org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter.add(HBaseWriter.java:203)
> > at org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter.add(SocketTeeWriter.java:252)
> > at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.add(PipelineStageWriter.java:41)
> > at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.accept(ServletCollector.java:154)
> > From the debugger, the 'put' request is seen as:
> > row=clus1-datanode8-HadoopMetrics, families={(family=HadoopMetricsInError,
> > keyvalues=(clus1-datanode8-HadoopMetrics/HadoopMetricsInError:capp/1309756958715/Put/vlen=15,
> > clus1-datanode8-HadoopMetrics/HadoopMetricsInError:cchunkData/1309756958715/Put/vlen=585,
> > clus1-datanode8-HadoopMetrics/HadoopMetricsInError:cchunkException/1309756958715/Put/vlen=1977,
> > clus1-datanode8-HadoopMetrics/HadoopMetricsInError:csource/1309756958715/Put/vlen=15,
> > clus1-datanode8-HadoopMetrics/HadoopMetricsInError:ctags/1309756958715/Put/vlen=16)}]
> > I can see from the HBase schema that the HadoopMetricsInError column family
> > is not there in any of the tables. The corresponding chunk is:
> > clus1-datanode8:clus1-datanode8: INFO chukwa.metrics.rpc:
> > {"rpcAuthorizationSuccesses":9,"port":"58148","rpcAuthorizationFailures":0,"recordName":"metrics","ReceivedBytes":90903,"chukwa_timestamp":1309517476828,"RpcQueueTime_num_ops":174,"RpcProcessingTime_num_ops":174,"rpcAuthenticationFailures":0,"period":60,"RpcQueueTime_avg_time":0,"SentBytes":8177,"NumOpenConnections":0,"callQueueLen":0,"rpcAuthenticationSuccesses":0,"hostName":"clus1-datanode8","RpcProcessingTime_avg_time":0,"contextName":"rpc"}
> > Thanks,
> > -preetam
> > On Mon, Jul 4, 2011 at 10:34 AM, Eric Yang <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> This is actually not a chukwa error.  The Hadoop metrics context
> >> injects errors like this for jvm metrics when the system is under
> >> stress.  This issue should be addressed in Hadoop Metrics V2 in
> >> Hadoop v0.23.
> >>
> >> regards,
> >> Eric
> >>
> >> On Sun, Jul 3, 2011 at 9:40 PM, Preetam Patil <[email protected]>
> >> wrote:
> >> > Hi,
> >> > I'm noticing strange behavior in the chukwa collector that is causing
> >> > the parsers to fail. When a large number of events arrive at the
> >> > collector, the event entries get jumbled up (date missing, two entries
> >> > interleaved, two events passed as a single chunk, etc.). As a result,
> >> > the HadoopMetricsProcessor throws a NullPointerException because the
> >> > parser returns a null object for a chunk. I have noticed that
> >> > chukwa.metrics.jvm is involved in every case of this error, but the
> >> > error does not occur for every chukwa.metrics.jvm event that is received.
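
A small illustration of the "two events passed as a single chunk" case: the
chunk body ends up as two JSON records back to back, "{...}{...}". The helper
below is not Chukwa code, just a rough sketch of how such a merged body could
be split by brace depth; it assumes, as in the samples in this thread, that the
metric values contain no braces inside strings:

  import java.util.ArrayList;
  import java.util.List;

  public class MergedChunkSplitter {
    // Illustrative helper (not part of Chukwa): split a body like "{...}{...}"
    // into individual JSON records by tracking brace depth.
    public static List<String> splitConcatenatedJson(String body) {
      List<String> records = new ArrayList<String>();
      int depth = 0;
      int start = -1;
      for (int i = 0; i < body.length(); i++) {
        char c = body.charAt(i);
        if (c == '{') {
          if (depth == 0) {
            start = i;
          }
          depth++;
        } else if (c == '}') {
          depth--;
          if (depth == 0 && start >= 0) {
            records.add(body.substring(start, i + 1));
            start = -1;
          }
        }
      }
      return records;
    }

    public static void main(String[] args) {
      String merged = "{\"contextName\":\"jvm\",\"gcCount\":482}"
                    + "{\"contextName\":\"dfs\",\"blocks_read\":10}";
      for (String rec : splitConcatenatedJson(merged)) {
        System.out.println(rec);  // prints each record on its own line
      }
    }
  }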
> >> > Here's one example of the error (in collector.log):
> >> > 2011-07-04 09:42:18,617 WARN btpool0-18 HadoopMetricsProcessor - Wrong
> >> > format in HadoopMetricsProcessor [2011-07-01 16:29:14,860 2011-07-01
> >> > 16:29:14,862 INFOINFO  chukwa.metrics.jvm: chukwa.metrics.dfs:
> >> > {"memHeapCommittedM":24.6875,"recordName":"metrics","chukwa_timestamp":1309517954860,"logFatal":0,"processName":"DataNode","threadsWaiting":8,"threadsBlocked":0,"gcCount":482,"logError":8,"logWarn":9,"period":60,"memNonHeapCommittedM":23.1875,"memNonHeapUsedM":18.441505,"gcTimeMillis":9625,"sessionId":"","logInfo":332,"threadsNew":0,"memHeapUsedM":10.359726,"threadsTerminated":0,"hostName":"clus1-datanode6","threadsTimedWaiting":9,"maxMemoryM":888.9375,"threadsRunnable":8,"contextName":"jvm"}{"recordName":"datanode","chukwa_timestamp":1309517954862,"readBlockOp_num_ops":10,"writeBlockOp_avg_time":0,"block_verification_failures":0,"blockChecksumOp_num_ops":0,"blocks_read":10,"blocks_written":13,"copyBlockOp_avg_time":0,"heartBeats_num_ops":197707,"writes_from_remote_client":12,"replaceBlockOp_num_ops":0,"blockReports_avg_time":0,"blocks_replicated":0,"writeBlockOp_num_ops":13,"heartBeats_avg_time":7,"hostName":"clus1-datanode6","bytes_read":60797,"copyBlockOp_num_ops":0,"contextName":"dfs","blockReports_num_ops":167,"bytes_written":389862107,"replaceBlockOp_avg_time":0,"reads_from_remote_client":8,"period":60,"readBlockOp_avg_time":0,"reads_from_local_client":2,"blocks_verified":1,"writes_from_local_client":1,"sessionId":"","blockChecksumOp_avg_time":0,"blocks_removed":14}
> >> > ]
> >> > java.lang.NullPointerException
> >> > at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.HadoopMetricsProcessor.parse(HadoopMetricsProcessor.java:103)
> >> > at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:81)
> >> > at org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter.add(HBaseWriter.java:202)
> >> > at org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter.add(SocketTeeWriter.java:252)
> >> > at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.add(PipelineStageWriter.java:41)
> >> > at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.accept(ServletCollector.java:154)
> >> > at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.doPost(ServletCollector.java:203)
> >> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
> >> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> >> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
> >> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
> >> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> >> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
> >> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >> > at org.mortbay.jetty.Server.handle(Server.java:324)
> >> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
> >> > at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843)
> >> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647)
> >> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
> >> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
> >> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
> >> > at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450)
> >> > 2011-07-04 09:42:18,618 WARN btpool0-18 HBaseWriter -
> >> > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> >> > Failed 1 action: NoSuchColumnFamilyException: 1 time, servers with issues:
> >> > chukwanode:49137,
> >> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1223)
> >> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1237)
> >> > at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:819)
> >> > at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:675)
> >> > at org.apache.hadoop.hbase.client.HTable.put(HTable.java:665)
> >> > at org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter.add(HBaseWriter.java:203)
> >> > at org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter.add(SocketTeeWriter.java:252)
> >> > at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.add(PipelineStageWriter.java:41)
> >> > at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.accept(ServletCollector.java:154)
> >> > at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.doPost(ServletCollector.java:203)
> >> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
> >> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> >> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
> >> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
> >> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> >> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
> >> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >> > at org.mortbay.jetty.Server.handle(Server.java:324)
> >> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
> >> > at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843)
> >> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647)
> >> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
> >> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
> >> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
> >> > at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450)
> >> > Here's the corresponding entry in the sequenceFile written by the
> >> > SequenceFileWriter in HDFS:
> >> > ^@^@^F^R^@^@^@I<8a>^A0ó2^FÀ^MHadoopMetrics0
> >> > cluster="Clus1"/clus1-datanode6/clus1-datanode6<8e>^Ep^@^@^@^A^@^@^@^@^@^@^Ep^@^Oclus1-datanode6^@^P
> >> > cluster="Clus1"^@^Oclus1-datanode6^@^MHadoopMetrics^@^@^@^@^@^A^@^@^Eo2011-07-01
> >> > 16:29:14,860 2011-07-01 16:29:14,862 INFOINFO  chukwa.metrics.jvm:
> >> > chukwa.metrics.dfs:
> >> > {"memHeapCommittedM":24.6875,"recordName":"metrics","chukwa_timestamp":1309517954860,"logFatal":0,"processName":"DataNode","threadsWaiting":8,"threadsBlocked":0,"gcCount":482,"logError":8,"logWarn":9,"period":60,"memNonHeapCommittedM":23.1875,"memNonHeapUsedM":18.441505,"gcTimeMillis":9625,"sessionId":"","logInfo":332,"threadsNew":0,"memHeapUsedM":10.359726,"threadsTerminated":0,"hostName":"clus1-datanode6","threadsTimedWaiting":9,"maxMemoryM":888.9375,"threadsRunnable":8,"contextName":"jvm"}{"recordName":"datanode","chukwa_timestamp":1309517954862,"readBlockOp_num_ops":10,"writeBlockOp_avg_time":0,"block_verification_failures":0,"blockChecksumOp_num_ops":0,"blocks_read":10,"blocks_written":13,"copyBlockOp_avg_time":0,"heartBeats_num_ops":197707,"writes_from_remote_client":12,"replaceBlockOp_num_ops":0,"blockReports_avg_time":0,"blocks_replicated":0,"writeBlockOp_num_ops":13,"heartBeats_avg_time":7,"hostName":"clus1-datanode6","bytes_read":60797,"copyBlockOp_num_ops":0,"contextName":"dfs","blockReports_num_ops":167,"bytes_written":389862107,"replaceBlockOp_avg_time":0,"reads_from_remote_client":8,"period":60,"readBlockOp_avg_time":0,"reads_from_local_client":2,"blocks_verified":1,"writes_from_local_client":1,"sessionId":"","blockChecksumOp_avg_time":0,"blocks_removed":14}
> >> > Has anyone else observed this kind of error? To me, it seems to have
> >> > something to do with the 'events' LIST creation. Any fix or pointer to
> >> > fixing it is welcome :-)
> >> > Thanks,
> >> > -preetam
> >> >
> >
> >
>
