hmm ... I am wondering if the trigger thread should just bail out, without resetting the trigger, if it can't get hold of the lock within 1 second. The next append or the next trigger should take care of rotating the files.
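Something like the rough sketch below is what I mean. It is not a patch against the real RollSink: the lock field and the rotate() helper are stand-ins for whatever guards the sink internally. The trigger thread gives up quietly when it can't get the write lock within a second and leaves the trigger pending, so a later append or trigger performs the roll instead:

// Rough sketch only, not the actual Flume RollSink code. The lock and the
// rotate() helper are stand-ins for whatever guards the sink internally.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RollTriggerSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void onTriggerFired() throws InterruptedException {
        // Bail out quietly if an append still holds the lock after 1s,
        // instead of failing and taking the logical node down with it.
        if (!lock.writeLock().tryLock(1, TimeUnit.SECONDS)) {
            // Do NOT reset the trigger here; it stays pending, so the
            // next append or the next trigger attempt rotates the files.
            return;
        }
        try {
            rotate(); // stand-in: close the current file, open a new one
        } finally {
            lock.writeLock().unlock();
        }
    }

    private void rotate() { /* close + reopen the underlying file */ }
}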
thanks
Prasad

On Wed, Oct 19, 2011 at 1:42 PM, Cameron Gandevia <cgande...@gmail.com> wrote:

> We recently modified the RollSink to hide our problem by giving it a few
> seconds to finish writing before rolling. We are going to test it out, and
> if it fixes our issue we will provide a patch later today.
>
> On Oct 19, 2011 1:27 PM, "AD" <straightfl...@gmail.com> wrote:
>
>> Yeah, I am using the HBase sink, so I guess it's possible something is
>> getting hung up there and causing the collector to die. The number of
>> file descriptors seems safely under the limit.
>>
>> On Wed, Oct 19, 2011 at 3:16 PM, Cameron Gandevia <cgande...@gmail.com> wrote:
>>
>>> We were seeing the same issue when our HDFS instance was overloaded and
>>> taking over a second to respond. I assume that if whatever backend we
>>> write to is down, the collector will die and will need to be restarted
>>> when the backend becomes available again? That doesn't seem very
>>> reliable.
>>>
>>> On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>>>
>>>> We saw this problem when it was taking more than 1 second to get a
>>>> response from writing to Cassandra (our back end). A single long
>>>> response will kill the collector. We had to revert to the version of
>>>> Flume that uses synchronization instead of read/write locking to get
>>>> around this.
>>>>
>>>> Ralph
>>>>
>>>> On Oct 18, 2011, at 1:55 PM, AD wrote:
>>>>
>>>> > Hello,
>>>> >
>>>> > My collector keeps dying with the following error. Is this a known
>>>> > issue? Any idea how to prevent it, or to find out what is causing
>>>> > it? Is format("%{nanos}:") an issue?
>>>> >
>>>> > 2011-10-17 23:16:33,957 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode flume1-18 exited with error: null
>>>> > java.lang.InterruptedException
>>>> >   at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>>>> >   at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>>>> >   at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>>>> >   at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>> >   at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>>>> >
>>>> > source: collectorSource("35853")
>>>> > sink: regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/: -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_ -;]+)\"\\s(hit|miss)\\s([0-9.]+)", "hbase_remote_host", "hbase_request_date", "hbase_request_method", "hbase_request_host", "hbase_request_url", "hbase_response_status", "hbase_response_bytes", "hbase_referrer", "hbase_user_agent", "hbase_cache_hitmiss", "hbase_origin_firstbyte") format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:") split(":", 0, "hbase_node") digest("MD5", "hbase_md5") collector(10000) { attr2hbase("apache_logs", "f1", "", "hbase_") }
>>>
>>> --
>>> Thanks
>>>
>>> Cameron Gandevia
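The "few seconds to finish writing before rolling" change Cameron describes at the top of the quoted thread would look roughly like the sketch below. This is not the actual patch: the class name, the 5-second budget, and the closeCurrentFile() helper are all assumptions. The idea is to retry the write-lock acquisition in short slices instead of giving up after a single 1-second tryLock, so a backend (HBase/HDFS/Cassandra) that stalls for a moment no longer kills the collector:

// Sketch of a grace-period close(), per the workaround described above.
// Not the actual patch: the names and the 5s budget are assumptions.
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GracefulRollSketch {
    private static final long CLOSE_BUDGET_MS = 5000; // assumed grace period

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void close() throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + CLOSE_BUDGET_MS;
        // Retry in short slices until the budget is spent, so a backend
        // that takes a little over a second to respond can still drain
        // in-flight appends before the roll.
        while (System.currentTimeMillis() < deadline) {
            if (lock.writeLock().tryLock(500, TimeUnit.MILLISECONDS)) {
                try {
                    closeCurrentFile(); // stand-in: flush and close the writer
                    return;
                } finally {
                    lock.writeLock().unlock();
                }
            }
        }
        throw new IOException("RollSink close timed out waiting for writers");
    }

    private void closeCurrentFile() throws IOException { /* flush + close */ }
}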