So I've been trying to get rack-timeout installed, but it's causing consistent, though non-deterministic, application crashes from which my app won't recover. I've had to roll back.

I'm on Rails 3.0.1 on the Bamboo stack (Ruby 1.8.7). Per the README here (https://github.com/kch/rack-timeout), I add two lines to my Gemfile:

    gem "system_timer" if RUBY_VERSION < "1.9"
    gem "rack-timeout"

...and that causes the crash (after running peachy keen for a few minutes, sometimes even up to an hour or so). Comment them out, and we're all good. Have others had success using rack-timeout?
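(For anyone trying to reproduce: those two Gemfile lines are the only change I made. The README of that vintage also documents an optional class-level timeout setting, sketched below from memory -- the 10-second value is arbitrary, and I believe the default is 30 seconds.)

    # config/initializers/rack_timeout.rb -- optional; not part of my repro
    Rack::Timeout.timeout = 10  # seconds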
On Nov 7, 9:34 pm, Subramanya Sastry <sss.li...@gmail.com> wrote:
> Aha! Thanks for the explanation. That is very helpful. So it could
> be that just a single bad request pretty much times out all additional
> requests down the pipe. We've added rack-timeout already, and the next
> time we hit one such bad request, we'll know, with an Exceptional
> report!
>
> Subbu.
>
> On Sun, Nov 7, 2010 at 8:27 PM, daniel hoey <danielho...@gmail.com> wrote:
> > Just to follow up on my original post: we had one action that we knew
> > had a timeout problem, but we hadn't prioritized fixing it. We
> > eventually discovered that this action caused other requests to time
> > out. The understanding I got from talking to Heroku support is that
> > when a request comes in, it gets assigned to a dyno immediately; for
> > the purposes of Heroku timeouts, the request's 'start time' is now.
> > But if that dyno is currently busy processing some other request, the
> > new request will just wait. If 30 seconds pass and the first request
> > has not finished processing, both requests time out. Note also that
> > if the first request takes 29s and the second request takes 2s, the
> > second request will still time out.
> >
> > We ended up putting SystemTimer (http://systemtimer.rubyforge.org/)
> > timeouts around some of our actions and filters so that an exception
> > gets raised when something times out; rack-timeout looks like a
> > better way of doing this. We also used New Relic Silver to find the
> > actions that were the root cause of the problem.
> >
> > Basically, the moral of the story is that you have to make sure that
> > none of your actions ever times out.
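A minimal sketch of the SystemTimer wrapping daniel describes above -- the filter name, action list, and 15-second limit are all made up for illustration:

    # app/controllers/application_controller.rb
    require 'system_timer'

    class ApplicationController < ActionController::Base
      around_filter :enforce_time_limit, :only => [:index, :show]

      private

      # Raise Timeout::Error if the wrapped action runs longer than 15s,
      # so Exceptional/Hoptoad gets a backtrace instead of a silent hang.
      def enforce_time_limit
        SystemTimer.timeout_after(15) { yield }
      end
    end

One caveat: SystemTimer works by scheduling SIGALRM, so it can interfere with any other code that relies on alarm-based timers.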
> > On Nov 6, 4:31 am, Oren Teich <o...@heroku.com> wrote:
> > > I've seen a few people with weird timeouts where the app owner was
> > > able to find out that it was a bug in their own code: anything from
> > > a weird SQL query locking a table and hanging their process, to API
> > > requests, to other hard-to-track stuff.
> > >
> > > This gem (http://github.com/kch/rack-timeout) will time out your
> > > requests after a period you specify. The advantage is that you can
> > > set it to a short time, and Exceptional/Hoptoad should catch the
> > > timeout, giving you some indication in the backtrace of what's
> > > going on.
> > >
> > > Oren
> > >
> > > On Fri, Nov 5, 2010 at 9:06 AM, Subbu Sastry <sss.li...@gmail.com> wrote:
> > > > Has anyone found a reasonable solution to this problem yet? On
> > > > our app as well, we notice totally random timeout errors that
> > > > couldn't possibly be associated with db lookups -- sometimes
> > > > requests time out on pages that look up a row by primary key in a
> > > > table with 15 records. Favicon.ico timed out as well. The
> > > > timeouts seem arbitrary, and *always* get fixed by a server
> > > > restart (heroku restart). This has happened to us a few times
> > > > over the last week. And yes, as several of you have noted, there
> > > > are no exceptions raised (neither Exceptional nor New Relic sees
> > > > them).
> > > >
> > > > Given that we experienced timeouts with favicon.ico and with an
> > > > about page doing a single db lookup, and New Relic doesn't see
> > > > this at all, I suspect something higher up the Heroku stack is
> > > > timing out. It almost smells like a memory leak somewhere, which
> > > > would explain why an app restart fixes the problem. Now, the
> > > > question is whether the memory leak is in our app or somewhere
> > > > else (plugins, gems, interaction with the Heroku stack). I will
> > > > debug this, but wanted to see if someone else has found a
> > > > reasonable solution.
> > > >
> > > > Subbu.
> > > >
> > > > On Oct 6, 9:37 pm, mattsly <matt...@gmail.com> wrote:
> > > > > Just manually testing my app, I've seen a fair number of
> > > > > timeouts (maybe a dozen) but have not received any
> > > > > communication; I'm pretty sure I'd have no idea they occurred
> > > > > had I not personally witnessed the error page. I find this a
> > > > > borderline "ship blocker" as I consider migrating a
> > > > > ~500K-monthly-page-view app to Heroku, and I get very anxious
> > > > > thinking about lots of users seeing a funky error page with no
> > > > > way for me to be alerted or to know how prevalent the issue is.
> > > > >
> > > > > WRT the timeouts: maybe 1% of requests time out... and I still
> > > > > can't pin down why. I'm on a single dyno, with Koi, and < 5
> > > > > alpha testers on it "concurrently" (and the timeout errors are
> > > > > related to response, not concurrency), and these are extremely
> > > > > simple paging requests that, according to New Relic, return in
> > > > > ~100ms on average... and then all of a sudden... bam! -- a
> > > > > request timeout. And we're talking about essentially the exact
> > > > > same code path, except for a different :offset in the
> > > > > ActiveRecord find call. The complexity is nothing along the
> > > > > lines of the suggested timeout causes here:
> > > > > http://docs.heroku.com/performance#request-timeout
> > > > >
> > > > > Strangely, I just tried turning off all Varnish-level caching
> > > > > (which I hope to rely on heavily) to try to isolate the issue,
> > > > > and now perf seems *more* consistent and faster (no timeout
> > > > > yet). Could the timeouts be happening during lookup at the
> > > > > Varnish layer? My understanding is that this shouldn't be
> > > > > possible, as I think the dyno never even sees a request when
> > > > > there's a Varnish cache hit. So maybe Varnish caching is a red
> > > > > herring... but it does seem curious.
> > > > >
> > > > > Matt
> > > > >
> > > > > On Sep 24, 7:56 pm, John Norman <j...@7fff.com> wrote:
> > > > > > Well, you should get an e-mail if your app is generating
> > > > > > backlogs.
> > > > > >
> > > > > > I have one app that generated two in a whole week, and I
> > > > > > received at least two e-mails from Heroku suggesting that I
> > > > > > up the number of dynos.
> > > > > > On Fri, Sep 24, 2010 at 11:42 AM, mattsly <matt...@gmail.com> wrote:
> > > > > > > How are you finding the timeouts? Just manually? I was
> > > > > > > having timeout issues (which I now think I've solved -- see
> > > > > > > below) but am concerned that, once I flip my site public:
> > > > > > >
> > > > > > > a) There's no apparent native reporting/alerting for
> > > > > > > timeouts or "backlog too deep" errors if they do occur.
> > > > > > > b) There's no ability to render a custom (static) error
> > > > > > > page in that case.
> > > > > > >
> > > > > > > Re: reporting. When timeouts occur, am I mistaken in not
> > > > > > > seeing them reported anywhere? They don't seem to throw
> > > > > > > Exceptional or New Relic exceptions with the free version,
> > > > > > > and it's unclear to me whether they would with the
> > > > > > > (expensive -- $0.05/hr = $36/month for alerting?) "Silver"
> > > > > > > plan -- can anyone confirm that they in fact do?
> > > > > > >
> > > > > > > It seems like timeout / "backlog too deep"
> > > > > > > reporting/alerting should really be a built-in feature of
> > > > > > > Heroku, since these are core elements of the architecture,
> > > > > > > and such alerting (especially for backlog) helps you make a
> > > > > > > quick call about cranking the dyno count up or down and/or
> > > > > > > restarting an app to minimize adverse user effects... i.e.,
> > > > > > > really what this cloud and hosting-as-a-service thing is
> > > > > > > all about.
> > > > > > >
> > > > > > > I'm about to (I think) migrate a high-traffic site to
> > > > > > > Heroku. I *love* the idea of being able to focus on
> > > > > > > development and not sysadmin... but I have to say I'm
> > > > > > > getting a little anxious about quirks like this and what
> > > > > > > they might mean for my users.
> > > > > > >
> > > > > > > Matt
> > > > > > >
> > > > > > > (On a slightly related note: I've learned the hard way that
> > > > > > > Table.count is a great way to cause a timeout -- it looks
> > > > > > > like MySQL and PostgreSQL handle counts *way* differently,
> > > > > > > something to keep in mind if you're migrating from MySQL:
> > > > > > > http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#COUNT.28.2A.29)
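Since Matt's parenthetical bites a lot of people: on PostgreSQL an unqualified COUNT(*) has to walk the whole table (MVCC leaves it no shortcut), so the usual workarounds are a counter you maintain yourself or the planner's estimate. A hedged sketch, with hypothetical table and model names:

    # Option 1: approximate row count from PostgreSQL's own statistics
    approx = ActiveRecord::Base.connection.select_value(
      "SELECT reltuples::bigint FROM pg_class WHERE relname = 'widgets'")

    # Option 2: a counter cache kept up to date on create/destroy
    # (requires a comments_count integer column on the posts table)
    class Comment < ActiveRecord::Base
      belongs_to :post, :counter_cache => true
    end

With option 2, Post#comments_count is a plain column read instead of a COUNT(*).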
> > > > > > > On Sep 10, 3:45 am, daniel hoey <danielho...@gmail.com> wrote:
> > > > > > > > We go through short periods where we get frequent app
> > > > > > > > timeouts. The pages that time out are often very simple
> > > > > > > > and do not rely on external services or perform any
> > > > > > > > demanding database queries. We don't get any information
> > > > > > > > in our New Relic transaction traces for these requests
> > > > > > > > (we have for other timeouts in the past). Basically, we
> > > > > > > > can't get any information about what is going on, and we
> > > > > > > > only know about the problem if our users tell us. Has
> > > > > > > > anyone else experienced similar problems, or does anyone
> > > > > > > > have anything to suggest for investigating the root
> > > > > > > > cause?
> > > > > > > >
> > > > > > > > The last time that we are aware of this happening was
> > > > > > > > between 06:30 and 07:00 GMT on Sept 10.
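Since several posts in this thread boil down to "we can't see where the time went," here's a minimal, hypothetical Rack middleware for logging slow requests from inside the dyno (the class name and 5-second threshold are invented; adjust to taste):

    # lib/slow_request_logger.rb
    class SlowRequestLogger
      def initialize(app, threshold = 5.0)
        @app, @threshold = app, threshold
      end

      def call(env)
        start = Time.now
        status, headers, body = @app.call(env)
        elapsed = Time.now - start
        if elapsed > @threshold
          $stderr.puts "[slow] #{env['REQUEST_METHOD']} #{env['PATH_INFO']} took #{'%.1f' % elapsed}s"
        end
        [status, headers, body]
      end
    end

    # config/application.rb (Rails 3):
    #   config.middleware.use SlowRequestLogger

Note the limitation, per the queuing behaviour daniel described earlier in the thread: this only measures time spent in the dyno itself, so a request that times out while waiting behind another request will never show up here.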