This looks like a bug. The last number should be in sync with the current file's size, but the UTF8 adaptor is still tailing the previous file (which rotated at 10487067). The file rotated, but the adaptor did not pick up the change, which means there is a bug in its handling of file rotation.
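The offset check described in this thread can be scripted. The Java sketch below is a hypothetical helper, not part of Chukwa: it takes one line of the agent's list output, treats the rightmost token as the adaptor's read offset (as Eric describes it), and flags the case where that offset points past the end of the file. The file size is hard-coded to the 1150872 bytes Ying reported; on a live agent it would come from java.io.File.length().

```java
/**
 * Hypothetical helper (not part of Chukwa). Checks one line of the agent's
 * "list" output against the size of the file being tailed, assuming the
 * rightmost token is the adaptor's current read offset.
 */
public class TailerOffsetCheck {

    /** Parses the rightmost whitespace-separated token as the read offset. */
    static long readOffset(String listLine) {
        String[] tokens = listLine.trim().split("\\s+");
        return Long.parseLong(tokens[tokens.length - 1]);
    }

    /** True when the offset points past the end of the file. */
    static boolean offsetBeyondFile(long offset, long fileSize) {
        return offset > fileSize;
    }

    public static void main(String[] args) {
        String line = "adaptor_2963225a90653a309cf779d4a1d815a3) "
                + "org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 "
                + "Gamelog 0 /var/log/gamelog 10487067";
        long offset = readOffset(line);
        // Size reported in the thread; on a live agent use new java.io.File("/var/log/gamelog").length().
        long fileSize = 1150872L;
        // prints "offset=10487067 size=1150872 beyondEnd=true"
        System.out.println("offset=" + offset + " size=" + fileSize
                + " beyondEnd=" + offsetBeyondFile(offset, fileSize));
    }
}
```

A single sample only covers the "bigger" half of the rule; to confirm the offset is also "not changing", run the check twice a few minutes apart, as Ying did with the list command.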
Please open a JIRA.

Thanks
regards,
Eric

On Tue, Jul 26, 2011 at 8:05 PM, Ying Tang <[email protected]> wrote:
> The log didn't rotate very rapidly.
>
> I can't reproduce the scenario now, but the chukwa agent log looks OK:
>
> 2011-07-27 10:57:38,967 INFO Timer-0 ChukwaAgent - writing checkpoint 1307083
> 2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - collected 1 chunks for post_745
> 2011-07-27 10:57:42,571 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post_745 to http://chukwacollector1.xingcloud.com:9095/ length = 1837
> 2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://chukwacollector1.xingcloud.com:9095/chukwa; response length 43
> 2011-07-27 10:57:42,574 INFO HTTP post thread ChukwaHttpSender - post_745 sent 0 chunks, got back 1 acks
>
> The output of "list" when I telnet to the agent on port 9093 is:
> adaptor_2963225a90653a309cf779d4a1d815a3) org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067
> After several minutes, the list is still:
> adaptor_2963225a90653a309cf779d4a1d815a3) org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8 Gamelog 0 /var/log/gamelog 10487067
>
> Is 10487067 the offset? The number hasn't changed, while the log file's size goes from 0 to 10M. The log file's size is now 1150872.
>
> On Wed, Jul 27, 2011 at 12:26 AM, Eric Yang <[email protected]> wrote:
>>
>> CharFileTailingAdaptorUTF8 should handle log rotation gracefully. Is the log rotating rapidly?
>> Run these commands on the chukwa agent:
>> telnet localhost 9093
>> list
>> This should show a list of the files being tailed; check the offset number of the log file you are tailing. The rightmost number should be smaller than the size of your log file. If it is bigger and not changing, it is most likely a bug that we haven't seen before.
>> It might be useful to turn on debug logging on the chukwa agent and see whether this can be reproduced, to nail down the root cause.
>>
>> Thanks
>> regards,
>> Eric
>>
>> On Jul 26, 2011, at 6:13 AM, Ying Tang wrote:
>>
>> Is it possible that when the log file reaches the size set in the log4j config file, log4j renames the log file and creates a new file with the same name, and at that moment the chukwa adaptor doesn't tail the log properly, so the chukwa agent can't collect the log any more?
>>
>> On Tue, Jul 26, 2011 at 2:07 PM, Ying Tang <[email protected]> wrote:
>>>
>>> The log file is a log4j log file, its size is 10M, and MaxBackupIndex is 1.
>>>
>>> On Tue, Jul 26, 2011 at 1:42 PM, Eric Yang <[email protected]> wrote:
>>>>
>>>> Can you run "ls -l" to show the size and date of the log files that you are streaming?
>>>>
>>>> regards,
>>>> Eric
>>>>
>>>> On Mon, Jul 25, 2011 at 7:36 PM, Ying Tang <[email protected]> wrote:
>>>>> The chukwa version is 0.4.0 and the adaptor is
>>>>>
>>>>> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8
>>>>>
>>>>> On Mon, Jul 25, 2011 at 11:50 PM, Eric Yang <[email protected]> wrote:
>>>>>>
>>>>>> Hi Ivy,
>>>>>>
>>>>>> When data is sent from the agent to the collector, the collector sends an acknowledgment that it received the chunks. At 00:03:28, 5 chunks were acknowledged. This means communication between the collector and the agent was working at that point in time. However, there is no activity after 00:04:28. It looks like the adaptor did not handle the log rotation properly close to midnight.
>>>>>> Which version of Chukwa and which adaptor are you using?
>>>>>>
>>>>>> regards,
>>>>>> Eric
>>>>>>
>>>>>> On Jul 25, 2011, at 12:40 AM, Ying Tang wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> In my cluster, I have two chukwa agents and one collector.
>>>>>>> At one point, both chukwa agents' logs showed:
>>>>>>> 2011-07-18 00:03:28,688 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 5
>>>>>>> 2011-07-18 00:04:28,697 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>>>>>> 2011-07-18 00:05:28,706 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>>>>>> 2011-07-18 00:06:28,714 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>>>>>> 2011-07-18 00:07:29,340 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 0
>>>>>>>
>>>>>>> And the collector's log:
>>>>>>> 2011-07-17 11:02:32,155 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>>>>> 2011-07-17 11:02:43,074 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>>>>>>> 2011-07-17 11:03:02,162 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>>>>> 2011-07-17 11:03:32,168 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>>>>> 2011-07-17 11:03:43,085 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>>>>>>> 2011-07-17 11:04:02,174 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>>>>> 2011-07-17 11:04:32,180 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>>>>> 2011-07-17 11:04:43,096 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
>>>>>>> 2011-07-17 11:05:02,185 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
>>>>>>>
>>>>>>> (The collector and the agents are in different time zones.)
>>>>>>> And the collector didn't collect any logs.
>>>>>>>
>>>>>>> What does "http chunks ACK'ed since last report: 0" mean?
>>>>>>> After this line first appeared, the chukwa port stayed open, but several days later both agents crashed without any exceptions.
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Ivy Tang
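The rotation scenario raised in this thread (log4j renaming /var/log/gamelog once it reaches 10M with MaxBackupIndex 1 and recreating it, while the adaptor keeps its old offset) can be reproduced in a few lines. The following Java sketch is hypothetical and is not how CharFileTailingAdaptorUTF8 is implemented: it simulates the rename-and-recreate step with java.nio.file in a temporary directory and shows one common detection heuristic, treating "current size smaller than the saved offset" as a rotation and restarting from offset 0. The names RotationAwareTailer, rotated and poll are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical tailer (not Chukwa code). Detects a size-based rotation by
 * noticing that the watched path is now shorter than the saved read offset.
 */
public class RotationAwareTailer {
    private final Path path;
    private long offset; // bytes already consumed from the current file

    RotationAwareTailer(Path path) {
        this.path = path;
    }

    /** Heuristic: a file shorter than what was already read must have been replaced or truncated. */
    static boolean rotated(long savedOffset, long currentSize) {
        return currentSize < savedOffset;
    }

    /** Returns the bytes appended since the last poll, decoded as UTF-8. */
    String poll() throws IOException {
        long size = Files.size(path);
        if (rotated(offset, size)) {
            offset = 0; // start over at the beginning of the new file
        }
        if (size == offset) {
            return "";
        }
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            raf.seek(offset);
            // Demo only: assumes the unread tail fits in an int-sized buffer.
            byte[] buf = new byte[(int) (size - offset)];
            raf.readFully(buf);
            offset = size;
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    long offset() {
        return offset;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rotation-demo");
        Path log = dir.resolve("gamelog");
        Files.write(log, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));

        RotationAwareTailer tailer = new RotationAwareTailer(log);
        System.out.print(tailer.poll()); // reads all 23 bytes

        // Simulate rotation with one backup: gamelog -> gamelog.1, then a fresh gamelog.
        Files.move(log, dir.resolve("gamelog.1"), StandardCopyOption.REPLACE_EXISTING);
        Files.write(log, "after rotation\n".getBytes(StandardCharsets.UTF_8));

        System.out.print(tailer.poll()); // new file is 15 bytes < 23, so it is read from offset 0
        System.out.println("offset=" + tailer.offset()); // prints "offset=15"
    }
}
```

The size heuristic has a blind spot: if the new file grows past the old offset before the tailer checks again, the shrink is never observed and data is skipped. Comparing a file identity such as BasicFileAttributes.fileKey() between polls (it may return null on some platforms) is a more robust alternative, again offered only as an assumption rather than as Chukwa's behavior.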
