hmm ... I am wondering if the trigger thread should just bail out, without resetting the trigger, if it can't get hold of the lock within 1 second. The next append or the next trigger should take care of rotating the files.
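Something like the rough sketch below is what I mean. It is not a patch against the real RollSink: the lock field and the rotate() helper are stand-ins for whatever guards the sink internally. The trigger thread gives up quietly when it can't get the write lock within a second and leaves the trigger pending, so a later append or trigger performs the roll instead:

// Rough sketch only, not the actual Flume RollSink code. The lock and the
// rotate() helper are stand-ins for whatever guards the sink internally.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RollTriggerSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void onTriggerFired() throws InterruptedException {
        // Bail out quietly if an append still holds the lock after 1s,
        // instead of failing and taking the logical node down with it.
        if (!lock.writeLock().tryLock(1, TimeUnit.SECONDS)) {
            // Do NOT reset the trigger here; it stays pending, so the
            // next append or the next trigger attempt rotates the files.
            return;
        }
        try {
            rotate(); // stand-in: close the current file, open a new one
        } finally {
            lock.writeLock().unlock();
        }
    }

    private void rotate() { /* close + reopen the underlying file */ }
}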
thanks
Prasad

On Wed, Oct 19, 2011 at 1:42 PM, Cameron Gandevia <cgande...@gmail.com> wrote:

> We recently modified the RollSink to hide our problem by giving it a few
> seconds to finish writing before rolling. We are going to test it out, and
> if it fixes our issue we will provide a patch later today.
>
> On Oct 19, 2011 1:27 PM, "AD" <straightfl...@gmail.com> wrote:
>
>> Yeah, I am using the HBase sink, so I guess it's possible something is
>> getting hung up there and causing the collector to die. The number of
>> file descriptors seems safely under the limit.
>>
>> On Wed, Oct 19, 2011 at 3:16 PM, Cameron Gandevia <cgande...@gmail.com> wrote:
>>
>>> We were seeing the same issue when our HDFS instance was overloaded and
>>> taking over a second to respond. I assume that if whatever backend we
>>> write to is down, the collector will die and will need to be restarted
>>> when the backend becomes available again? That doesn't seem very
>>> reliable.
>>>
>>> On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>>>
>>>> We saw this problem when it was taking more than 1 second to get a
>>>> response from writing to Cassandra (our back end). A single long
>>>> response will kill the collector. We had to revert to the version of
>>>> Flume that uses synchronization instead of read/write locking to get
>>>> around this.
>>>>
>>>> Ralph
>>>>
>>>> On Oct 18, 2011, at 1:55 PM, AD wrote:
>>>>
>>>> > Hello,
>>>> >
>>>> > My collector keeps dying with the following error. Is this a known
>>>> > issue? Any idea how to prevent it, or to find out what is causing
>>>> > it? Is format("%{nanos}:") an issue?
>>>> >
>>>> > 2011-10-17 23:16:33,957 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode flume1-18 exited with error: null
>>>> > java.lang.InterruptedException
>>>> >   at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>>>> >   at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>>>> >   at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>>>> >   at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>> >   at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>> >
>>>> > source: collectorSource("35853")
>>>> > sink: regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/: -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_ -;]+)\"\\s(hit|miss)\\s([0-9.]+)", "hbase_remote_host", "hbase_request_date", "hbase_request_method", "hbase_request_host", "hbase_request_url", "hbase_response_status", "hbase_response_bytes", "hbase_referrer", "hbase_user_agent", "hbase_cache_hitmiss", "hbase_origin_firstbyte") format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:") split(":", 0, "hbase_node") digest("MD5", "hbase_md5") collector(10000) { attr2hbase("apache_logs", "f1", "", "hbase_") }
>>>
>>> --
>>> Thanks
>>>
>>> Cameron Gandevia
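The "few seconds to finish writing before rolling" change Cameron describes at the top of the quoted thread would look roughly like the sketch below. This is not the actual patch: the class name, the 5-second budget, and the closeCurrentFile() helper are all assumptions. The idea is to retry the write-lock acquisition in short slices instead of giving up after a single 1-second tryLock, so a backend (HBase/HDFS/Cassandra) that stalls for a moment no longer kills the collector:

// Sketch of a grace-period close(), per the workaround described above.
// Not the actual patch: the names and the 5s budget are assumptions.
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GracefulRollSketch {
    private static final long CLOSE_BUDGET_MS = 5000; // assumed grace period

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void close() throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + CLOSE_BUDGET_MS;
        // Retry in short slices until the budget is spent, so a backend
        // that takes a little over a second to respond can still drain
        // in-flight appends before the roll.
        while (System.currentTimeMillis() < deadline) {
            if (lock.writeLock().tryLock(500, TimeUnit.MILLISECONDS)) {
                try {
                    closeCurrentFile(); // stand-in: flush and close the writer
                    return;
                } finally {
                    lock.writeLock().unlock();
                }
            }
        }
        throw new IOException("RollSink close timed out waiting for writers");
    }

    private void closeCurrentFile() throws IOException { /* flush + close */ }
}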