Storm topology freezes and does not process tuples from Kafka

2017-07-12 Thread Sreeram
Hi,

I am observing that my Storm topology intermittently freezes and stops
processing tuples from Kafka. This happens frequently, and each freeze
lasts for 5 to 15 minutes. Nothing is written to any of the worker log
files during this time.

I am using Storm 1.0.2 and Kafka 0.9.0.

Any suggestions on how to resolve this?

Thanks,
Sreeram

The supervisor log at the time of the freeze looks like this:

2017-07-12 14:38:46.712 o.a.s.d.supervisor [INFO] d8958816-5bc8-449e-94e3-87ddbb2c3d02 still hasn't started
2017-07-12 14:38:47.212 o.a.s.d.supervisor [INFO] d8958816-5bc8-449e-94e3-87ddbb2c3d02 still hasn't started
2017-07-12 14:38:47.712 o.a.s.d.supervisor [INFO] d8958816-5bc8-449e-94e3-87ddbb2c3d02 still hasn't started
2017-07-12 14:38:48.213 o.a.s.d.supervisor [INFO] d8958816-5bc8-449e-94e3-87ddbb2c3d02 still hasn't started
2017-07-12 14:38:48.713 o.a.s.d.supervisor [INFO] d8958816-5bc8-449e-94e3-87ddbb2c3d02 still hasn't started
2017-07-12 14:38:49.213 o.a.s.d.supervisor [INFO] d8958816-5bc8-449e-94e3-87ddbb2c3d02 still hasn't started
2017-07-12 14:38:49.713 o.a.s.d.supervisor [INFO] d8958816-5bc8-449e-94e3-87ddbb2c3d02 still hasn't started
2017-07-12 14:38:50.214 o.a.s.d.supervisor [INFO] d8958816-5bc8-449e-94e3-87ddbb2c3d02 still hasn't started
2017-07-12 14:38:50.714 o.a.s.d.supervisor [INFO] d8958816-5bc8-449e-94e3-87ddbb2c3d02 still hasn't started


Thread stacks (sample)
Most of the worker threads during the freeze period look like one of the
two stack traces below.

Thread 104773: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=215 (Compiled frame)
 - java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferStack$SNode, boolean, long) @bci=160, line=460 (Compiled frame)
 - java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object, boolean, long) @bci=102, line=362 (Compiled frame)
 - java.util.concurrent.SynchronousQueue.poll(long, java.util.concurrent.TimeUnit) @bci=11, line=941 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=134, line=1066 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=26, line=1127 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)

Thread 147495: (state = IN_NATIVE)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=269 (Compiled frame)
 - sun.nio.ch.EPollSelectorImpl.doSelect(long) @bci=28, line=93 (Compiled frame)
 - sun.nio.ch.SelectorImpl.lockAndDoSelect(long) @bci=37, line=86 (Compiled frame)
 - sun.nio.ch.SelectorImpl.select(long) @bci=30, line=97 (Compiled frame)
 - org.apache.kafka.common.network.Selector.select(long) @bci=35, line=425 (Compiled frame)
 - org.apache.kafka.common.network.Selector.poll(long) @bci=81, line=254 (Compiled frame)
 - org.apache.kafka.clients.NetworkClient.poll(long, long) @bci=84, line=270 (Compiled frame)
 - org.apache.kafka.clients.producer.internals.Sender.run(long) @bci=343, line=216 (Compiled frame)
 - org.apache.kafka.clients.producer.internals.Sender.run() @bci=27, line=128 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
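
(In case it is useful to anyone trying to reproduce this: similar per-thread
stacks can also be logged from inside a worker JVM with java.lang.management.
A rough sketch only, as an illustration, and not the exact way the dumps above
were taken:)

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Prints the id, name, state and stack of every live thread in the
    // current JVM. It only sees the JVM it runs in, so it would have to be
    // invoked from inside the worker process (for example from a diagnostic
    // timer thread).
    public class InProcessThreadDump {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                System.out.println("Thread " + info.getThreadId() + " \""
                        + info.getThreadName() + "\" (state = "
                        + info.getThreadState() + ")");
                for (StackTraceElement frame : info.getStackTrace()) {
                    System.out.println(" - " + frame);
                }
                System.out.println();
            }
        }
    }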


Re: Storm topology freezes and does not process tuples from Kafka

2017-07-14 Thread Sreeram
Anyone?


Re: Storm topology freezes and does not process tuples from Kafka

2017-07-14 Thread P. Taylor Goetz
> The supervisor log at the time of the freeze looks like this:
>
> 2017-07-12 14:38:46.712 o.a.s.d.supervisor [INFO] d8958816-5bc8-449e-94e3-87ddbb2c3d02 still hasn't started


There are two situations where you would see those messages: when a topology is
first deployed, and when a worker has died and is being restarted.

I suspect the latter. Have you looked at the worker logs for any indication 
that the workers might be crashing and what might be causing it?

What components are involved in your topology?

-Taylor




Re: Storm topology freezes and does not process tuples from Kafka

2017-07-14 Thread Sreeram
Thank you, Taylor, for replying.

I checked the worker logs, and no messages are printed once the worker
goes into the freeze (I have the worker logs set at ERROR level).

Regarding the components, I have a Kafka spout in the topology and
bolts that write to HBase.

I have been rebalancing the topology with a wait time of 30 seconds. Is
it recommended to match it with topology.message.timeout.secs?
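
To give a clearer picture of the components, below is a simplified sketch of
how the topology is wired and configured. It is not the actual code: the
ZooKeeper address, topic, consumer group, parallelism numbers and the HBase
bolt are placeholders, and the spout is shown as the classic storm-kafka
KafkaSpout purely for illustration.

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.kafka.SpoutConfig;
    import org.apache.storm.kafka.StringScheme;
    import org.apache.storm.kafka.ZkHosts;
    import org.apache.storm.spout.SchemeAsMultiScheme;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Tuple;

    public class FreezingTopology {

        // Placeholder for our real bolts; the actual code writes each tuple to HBase.
        public static class HBaseWriterBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                // HBase put happens here in the real bolt
            }
            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // terminal bolt: declares no output fields
            }
        }

        public static void main(String[] args) throws Exception {
            // Kafka spout reading the input topic (ZooKeeper address, topic
            // and consumer group are placeholders)
            ZkHosts hosts = new ZkHosts("zkhost:2181");
            SpoutConfig spoutConfig =
                    new SpoutConfig(hosts, "input-topic", "/kafka-spout", "my-consumer-group");
            spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 4);
            builder.setBolt("hbase-writer", new HBaseWriterBolt(), 8)
                   .shuffleGrouping("kafka-spout");

            Config conf = new Config();
            conf.setNumWorkers(4);
            // topology.message.timeout.secs -- the value my 30 second
            // rebalance wait would be compared against
            conf.setMessageTimeoutSecs(60);

            StormSubmitter.submitTopology("freezing-topology", conf, builder.createTopology());
        }
    }

The rebalance I mentioned is issued roughly as "storm rebalance
freezing-topology -w 30", which is where the 30 second wait comes from.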

Please let me know if you need any specific info.

Thanks,
Sreeram



Re: Storm topology freezes and does not process tuples from Kafka

2017-07-16 Thread Ambud Sharma
Please check if you have orphan workers. Orphan workers happen when a
topology is redeployed within a short period of time and the old workers
haven't yet been cleaned up.

Check this by running ps aux | grep java, or grep for your specific jar
name if you have one.


Re: Storm topology freezes and does not process tuples from Kafka

2017-07-17 Thread J.R. Pauley
This is probably unrelated, but my 1.0.2 topology shuts itself down whenever
the console times out. That may be because I installed it in the /home/ubuntu
user directory; I am not certain whether that causes the problem. Unlike you I
am not using Kafka at all, but after any period of inactivity I return to find
supervisor, DRPC, and Nimbus shut down with no errors in the logs. When I
restart those processes, the topology workers resume. Since mine is only a
demo it is not a crucial issue, but it is quite annoying.
