Re: More unexplained timeouts

2013-09-29 Thread Eric Wong
n...@auger.net wrote:
> We're still suffering from unexplained workers timing out.  We
> recently upgraded to the latest unicorn 4.6.3 (while still on REE
> 1.8.7) in the hopes that it would solve our issues.  Unfortunately,
> this seemed to exacerbate the problem, with timeouts happening more
> frequently, but that could be related to greater precision in timeouts
> in newer versions of unicorn.  (In our unicorn 3.6.2, a timeout set to
> 120s might not ACTUALLY timeout until 180s or more, thus allowing a
> bit more time for Ruby to finish whatever it was choking on.)

Yes, there were some fixes in 4.x to improve the timeout accuracy.

> We dropped the timeout down to 65s (to make sure it was triggered) and
> then tried to add greater logging (per
> http://permalink.gmane.org/gmane.comp.lang.ruby.unicorn.general/1269.)
> The START/FINISH approach confirms it's not an issue with our
> application code, ie:
> 
> HH:MM:SS- S/F[PID]- /PATH
> 15:21:01- START-25904- /pathA
> 15:21:01- FINISH-25904- /pathA
> 15:21:01- START-25904- /pathB
> 15:21:01- FINISH-25904- /pathB
> 15:21:01- START-25904- /pathC
> 15:21:01- FINISH-25904- /pathC
> worker=11 PID:25904 timeout (66s > 65s), killing
> reaped # worker=11
> 
> For each START we always get a corresponding FINISH and then the
> worker is killed.  Additionally, our nginx logs confirm that this last
> request was sent back to the client.  No 'upstream' errors in our
> nginx log, either.
> 
> When we tried the Thread sleep approach, nothing actually appeared in
> the logs.  I imagine this means that ruby or some C extension is
> misbehaving.

Sounds like it.  1.8 and old C extensions could easily lock up the
interpreter on blocking calls.

Another problem could be using new versions of C extensions that are no
longer tested under 1.8.  I admit I haven't tested recent versions of
unicorn/kgio/raindrops on 1.8 lately, either, but I'm _fairly_ sure they
still work since they haven't changed much.

> Unfortunately, it's been impossible for us to recreate this in
> development.  

Are you running any different gems/extensions in development vs
production?

> Thoughts?
> 
> RHEL 5.6
> REE 1.8.7 2011.12
> Unicorn 4.6.3
> 16 unicorn workers on 8 cores
> No swap activity, no peaks in load

What other gems/extensions do you use?
___
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying


More unexplained timeouts

2013-09-29 Thread nick
We're still suffering from unexplained workers timing out.  We recently 
upgraded to the latest unicorn 4.6.3 (while still on REE 1.8.7) in the hopes 
that it would solve our issues.  Unfortunately, this seemed to exacerbate the 
problem, with timeouts happening more frequently, but that could be related to 
greater precision in timeouts in newer versions of unicorn.  (In our unicorn 
3.6.2, a timeout set to 120s might not ACTUALLY timeout until 180s or more, 
thus allowing a bit more time for Ruby to finish whatever it was choking on.)

We dropped the timeout down to 65s (to make sure it was triggered) and then 
tried to add greater logging (per 
http://permalink.gmane.org/gmane.comp.lang.ruby.unicorn.general/1269.)  The 
START/FINISH approach confirms it's not an issue with our application code, ie:

HH:MM:SS- S/F[PID]- /PATH
15:21:01- START-25904- /pathA
15:21:01- FINISH-25904- /pathA
15:21:01- START-25904- /pathB
15:21:01- FINISH-25904- /pathB
15:21:01- START-25904- /pathC
15:21:01- FINISH-25904- /pathC
worker=11 PID:25904 timeout (66s > 65s), killing
reaped # worker=11

For each START we always get a corresponding FINISH and then the worker is 
killed.  Additionally, our nginx logs confirm that this last request was sent 
back to the client.  No 'upstream' errors in our nginx log, either.

When we tried the Thread sleep approach, nothing actually appeared in the logs. 
 I imagine this means that ruby or some C extension is misbehaving.

Unfortunately, it's been impossible for us to recreate this in development.  

Thoughts?

RHEL 5.6
REE 1.8.7 2011.12
Unicorn 4.6.3
16 unicorn workers on 8 cores
No swap activity, no peaks in load

Again, thanks for all your help!

-Nick

___
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying