Hey Zach,
Thanks for doing this.
You're right, it looks like the main thread is indeed blocked on
thread.join in BrokerProxy, as predicted:
"main" prio=10 tid=0x00007f71d800a000 nid=0xa in Object.wait()
[0x00007f71e1b9d000]
...
at org.apache.samza.system.kafka.BrokerProxy.stop(BrokerProxy.scala:284)
The BrokerProxy appears to be happily fetching messages from the broker,
still.
"SAMZA-BROKER-PROXY-BrokerProxy thread pointed at dev.banno.com:9192 for
...
at
org.apache.samza.system.kafka.DefaultFetchSimpleConsumer.fetch(DefaultFetch
SimpleConsumer.scala:48)
...
at
org.apache.samza.system.kafka.BrokerProxy.org$apache$samza$system$kafka$Bro
kerProxy$$fetchMessages(BrokerProxy.scala:172)
Looking at your logs (from previous email, granted, different execution),
I see that we get as far as SimpleConsumer.disconnect():
[DEBUG] [2015-01-14 17:19:12,596] [SAMZA-BROKER-PROXY-BrokerProxy ...]
o.a.s.s.k.DefaultFetchSimpleConsumer: Disconnecting from localhost:9192
This is invoked by BrokerProxy.scala:135. The only way this should get
invoked is if an exception is caught. Do you see this log line anywhere?
warn("Restarting consumer due to %s. Turn on debugging to get a full
stack trace." format exception)
I ran our TestStatefulTask test a few times, and verified that the
consumer CAN be shutdown:
[ThreadJob] INFO org.apache.samza.container.SamzaContainer - Shutting
down consumer multiplexer.
[ThreadJob] INFO org.apache.samza.system.kafka.BrokerProxy - Shutting
down BrokerProxy for localhost:59075
2015-01-15 08:59:26 DefaultFetchSimpleConsumer [WARN] Reconnect due to
socket error: null
[SAMZA-BROKER-PROXY-BrokerProxy thread pointed at ...] INFO
org.apache.samza.system.kafka.BrokerProxy - Got closed by interrupt
exception in broker proxy thread.
[SAMZA-BROKER-PROXY-BrokerProxy thread pointed at ...] INFO
org.apache.samza.system.kafka.BrokerProxy - Shutting down due to interrupt.
// consumer shutdown is complete here.. The container has moved on to
shutting down the producer
[ThreadJob] INFO org.apache.samza.container.SamzaContainer - Shutting
down producer multiplexer.
Cheers,
Chris
On 1/14/15 7:24 PM, "Zach Cox" <[email protected]> wrote:
>Hi Chris,
>
>I did a thread dump during the shutdown process, when I could tell it
>wasn't shutting down properly [1]. You can see this thread dump in the
>context of the other logging at shutdown.
>
>The main thread is indeed blocked on the Thread.join call in
>BrokerProxy.stop. The BrokerProxy thread looks like it's still consuming
>from Kafka?
>
>The thread dump was a bit tricky; this Samza container is running in a
>Docker container on boot2docker, which is Tiny Core Linux, i.e. no jstack.
>I never new this, but `sudo kill -3 [pid]` tells the jvm to do a thread
>dump to stdout [2] :)
>
>Thanks,
>Zach
>
>[1]
>https://gist.githubusercontent.com/zcox/bb47a61d4ae1acd54056/raw/83224d20a
>61de20bb47bcf916d92930a71cd97ad/gistfile1.txt
>
>[2]
>http://serverfault.com/questions/574745/jstack-alternative-for-linux-redha
>t-6
>
>
>On Wed, Jan 14, 2015 at 7:30 PM, Chris Riccomini <
>[email protected]> wrote:
>
>> Hey Zach,
>>
>> It sounds likely that the BrokerProxy thread is blocking or improperly
>> catching an interrupt exception.
>>
>> Can you take a thread dump? My guess is that you'll see that the main
>> thread is blocked on BrokerProxy.stop:
>>
>> thread.interrupt
>> thread.join
>>
>>
>> It'll likely be blocked on thread.join. If that's the case, I'd like to
>> see what the BrokerProxy thread is doing. This line indicates that the
>> thread seems to be shutting down:
>>
>> [DEBUG] [2015-01-14 17:19:12,596] [SAMZA-BROKER-PROXY-BrokerProxy thread
>> pointed at localhost:9192 for client
>> samza_consumer-twitter_message_separation-1-1421255914862-1]
>> o.a.s.s.k.DefaultFetchSimpleConsumer: Disconnecting from localhost:9192
>>
>> But if thread.join is blocking, then the shutdown never seems to
>>complete.
>> It'd be good to see where these threads are at.
>>
>>
>> Cheers,
>> Chris
>>
>> On 1/14/15 10:49 AM, "Zach Cox" <[email protected]> wrote:
>>
>> >Hi - related to the discussion yesterday about graceful shutdown [1],
>> >today
>> >I can't seem to get the SamzaContainer to actually shut down.
>>Yesterday I
>> >was seeing nice, fast, complete shutdown logs [2]. However, today the
>>last
>> >log line I see is related to shutting down BrokerProxy, then nothing
>>else
>> >happens until the container is finally just SIGKILLed by Docker [3].
>> >
>> >The consistency with which clean, fast shutdowns were happening
>>yesterday
>> >and now they are never happening today leads me to believe I've changed
>> >something, but I really can't find anything that would lead to this.
>>It's
>> >almost like there's some deadlock in BrokerProxy, but I would think
>>that
>> >would show up randomly - I've tried this about 100 times today, and
>>every
>> >time the logs are as shown in [3]. Today I also see 2 exceptions in the
>> >Kafka broker logs [4] but I don't know if those were occurring
>>yesterday
>> >when Samza was cleanly shutting down, or not.
>> >
>> >One thing that doesn't seem to be happening today is this log line from
>> >BrokerProxy: "Got interrupt exception in broker proxy thread."
>> >
>> >Has anyone seen this, or have any advice on what to try next?
>> >
>> >Thanks,
>> >Zach
>> >
>> >
>> >[1]
>>
>>>http://www.mail-archive.com/[email protected]/msg02246.html
>> >
>> >[2]
>> >
>>
>>https://gist.githubusercontent.com/zcox/6ec8910bd3f18e36c1a2/raw/666eae24
>>5
>> >11490bf51a66e56fd90c794ea6b6282/stdout
>> >
>> >[3]
>> >
>>
>>https://gist.githubusercontent.com/zcox/673a2ba607c566de7650/raw/d29ea8b9
>>9
>> >868da9cc4dc1db198315ee1d03bc694/gistfile1.txt
>> >
>> >[4]
>> >
>>
>>https://gist.githubusercontent.com/zcox/f08de55c5d5fe2d70cde/raw/c4be453b
>>5
>> >f6a57cfb5acb08508667f4b8ce8c2bd/gistfile1.txt
>>
>>