Just to follow up on my original post: We had one action that we knew
had a timeout problem, but we hadn't prioritized fixing it. We
eventually discovered that this action caused other requests to time
out. The understanding I got from talking to Heroku support is that
when a request comes in, it gets assigned to a dyno immediately; for
the purposes of Heroku timeouts, that moment is the request's 'start
time'. But if that dyno is currently processing some other request,
the new request just waits. If 30 seconds pass and the first request
has not finished processing, then both requests time out. Note also
that if the first request takes 29s and the second request takes 2s,
then the second request will time out.
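
To make the arithmetic concrete, here is a toy sketch (our own
illustration, not Heroku code) of how that accounting works:

    ROUTER_TIMEOUT = 30  # seconds, per Heroku's documented limit

    # A request times out when its wait in the dyno's queue plus its
    # own processing time exceeds the router's limit, because the
    # clock starts at assignment, not when processing begins.
    def times_out?(queue_wait, processing)
      queue_wait + processing > ROUTER_TIMEOUT
    end

    puts times_out?(0, 29)   # => false: first request finishes at t=29s
    puts times_out?(29, 2)   # => true: second would finish at t=31s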

We ended up putting SystemTimer (http://systemtimer.rubyforge.org/)
timeouts around some of our actions and filters so that an exception
gets raised when something times out; rack-timeout looks like a better
way of doing this. We also used New Relic Silver to find the actions
that were the root cause of the problem.
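
For reference, the wrapping looked roughly like this (a sketch with a
made-up controller; SystemTimer.timeout_after is the gem's real API,
the rest is illustrative):

    require 'system_timer'

    class ReportsController < ApplicationController
      def show
        # Raise Timeout::Error if the block takes longer than 25s
        # (safely under Heroku's 30s limit), so our exception tracker
        # records a backtrace pointing at the slow spot.
        SystemTimer.timeout_after(25) do
          # ... the potentially slow work ...
        end
      end
    end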

Basically, the moral of the story is that you have to make sure that
none of your actions ever time out.

On Nov 6, 4:31 am, Oren Teich <o...@heroku.com> wrote:
> I've seen a few people with weird timeouts where the app owner was
> able to find out that it was a bug in their code: anything from a
> weird SQL query locking a table and hanging their process, to API
> requests, to other hard-to-track stuff.
>
> This gem (http://github.com/kch/rack-timeout) will time out your
> requests after a period you specify.  The advantage is that you can
> set it to a short time, and Exceptional/Hoptoad should catch the
> timeout, giving you some indication in the backtrace of what's going
> on.
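>
> For a plain Rack app the setup is roughly this (a sketch; check the
> gem's README for the exact API on your version of Rails):
>
>     # config.ru
>     require "rack/timeout"
>     use Rack::Timeout            # insert the timeout middleware
>     Rack::Timeout.timeout = 10   # seconds before it raises a timeout error
>     run MyApp.new                # MyApp stands in for your Rack app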
>
> Oren
>
> On Fri, Nov 5, 2010 at 9:06 AM, Subbu Sastry <sss.li...@gmail.com> wrote:
> > Has anyone found a reasonable solution to this problem yet?  On our
> > app as well, we notice totally random timeout errors that couldn't
> > possibly be associated with db lookups -- sometimes requests time out
> > on pages that look up a row by primary key in a table with 15 records.
> > Favicon.ico timed out as well.  The timeouts seem arbitrary, and
> > *always* get fixed by a server restart (heroku restart).  This has
> > happened to us a few times over the last week.  And yes, as several of
> > you have noted, no exceptions are raised (neither in Exceptional nor
> > New Relic).
>
> > I think, given that we experienced timeouts with favicon.ico and an
> > about page with a single db lookup, and that New Relic doesn't see
> > this at all, that something higher up the Heroku stack is timing
> > out.  It almost smells like a memory leak somewhere, which would
> > explain why an app restart fixes the problem.  Now, the question is
> > whether the memory leak is in our app or somewhere else (plugins,
> > gems, interaction with the Heroku stack)... I will debug this, but
> > wanted to see if someone else has found a reasonable solution.
>
> > Subbu.
>
> > On Oct 6, 9:37 pm, mattsly <matt...@gmail.com> wrote:
> >> Just manually testing my app, I've seen a fair number of timeouts
> >> (maybe a dozen) but have not received any communication.  I am pretty
> >> sure I'd have no idea they had occurred had I not personally witnessed
> >> the error page.  I find this a borderline "ship blocker" as I consider
> >> migrating a ~500K-monthly-page-view app to Heroku, and I get very
> >> anxious thinking about lots of users seeing a funky error page with no
> >> way of my being alerted or knowing how prevalent the issue is.
>
> >> WRT the timeouts, it's maybe 1% of requests that time out... and I
> >> still can't pin down why they're happening.  I'm on a single dyno,
> >> with Koi, and < 5 alpha testers on it "concurrently" (and the timeout
> >> errors are related to response... not concurrency...), and these are
> >> extremely simple paging requests that, according to New Relic, return
> >> in ~100ms on average... and then all of a sudden... bam! - a request
> >> timeout.  And we're talking about essentially the exact same code
> >> path, except a different :offset in the ActiveRecord find call.  The
> >> complexity is nothing along the lines of the suggested timeout causes
> >> here: http://docs.heroku.com/performance#request-timeout
>
> >> Strangely, I just tried turning off all Varnish-level caching (which I
> >> hope to rely on heavily) to try to isolate the issue, and now perf
> >> seems *more* consistent and faster (haven't seen a timeout yet).  Could
> >> it be that the timeouts are being caused during lookup at the Varnish
> >> layer?  My understanding is that this shouldn't be possible, as I
> >> think the dyno doesn't even see a request if a Varnish cache hit is
> >> found.  So maybe Varnish caching is a red herring... but it does seem
> >> curious.
>
> >> Matt
>
> >> On Sep 24, 7:56 pm, John Norman <j...@7fff.com> wrote:
>
> >> > Well, you should get an e-mail if your app is generating backlogs.
>
> >> > I have one app that did generate 2 in a whole week, and I received
> >> > at least two e-mails from Heroku suggesting that I up the number of
> >> > dynos.
>
> >> > On Fri, Sep 24, 2010 at 11:42 AM, mattsly <matt...@gmail.com> wrote:
> >> > > How are you finding the timeouts?  Just manually?  I was having
> >> > > timeout issues (that I now think I've solved - see below) but am
> >> > > concerned that, once I flip my site public:
>
> >> > > a) There's no apparent native reporting/alerting for timeouts or
> >> > > "backlog too deep" errors if they do occur, and
> >> > > b) no ability to render a custom (static) error page in that case.
>
> >> > > Re: reporting.  When timeouts occur, am I mistaken in not seeing
> >> > > them reported anywhere?  They don't seem to throw Exceptional or New
> >> > > Relic exceptions with the free version.  It's unclear to me whether
> >> > > they would with the (expensive - $.05/hr = $36/month for alerting?)
> >> > > "Silver" plan - can anyone confirm that they in fact do?
>
> >> > > It seems like timeout / backlog-too-deep reporting and alerting
> >> > > should really be a built-in feature of Heroku, since they are core
> >> > > elements of the architecture, and such alerting (especially backlog)
> >> > > helps you make a quick call about cranking the dyno count up/down
> >> > > and/or restarting an app to minimize adverse user effects... i.e.
> >> > > really what this cloud and hosting-as-a-service thing is all about.
>
> >> > > I'm about to (I think) migrate a high traffic site to Heroku. I *love*
> >> > > the idea of being able to focus on development and not sysadmin...but
> >> > > have to say that I am getting a little anxious about quirks like this
> >> > > and what it might mean for my users.
>
> >> > > Matt
>
> >> > > (On a slightly related note - I've learned the hard way that
> >> > > Table.count is a great way to cause a timeout - it looks like MySQL
> >> > > and PostgreSQL handle counts *way* differently... something to keep
> >> > > in mind if you're migrating from MySQL:
> >> > > http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#COUNT.28.2A.29)
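>
> >> > > One common mitigation (a sketch of ours, not from that page) is a
> >> > > Rails counter cache, so hot paths read a stored count instead of
> >> > > running COUNT(*) against PostgreSQL:
> >> > >
> >> > >     class Post < ActiveRecord::Base
> >> > >       has_many :comments
> >> > >     end
> >> > >
> >> > >     class Comment < ActiveRecord::Base
> >> > >       # Keeps posts.comments_count (an integer column you add)
> >> > >       # up to date on create and destroy.
> >> > >       belongs_to :post, :counter_cache => true
> >> > >     end
> >> > >
> >> > >     # post.comments.size now reads the cached column instead of
> >> > >     # issuing SELECT COUNT(*) FROM comments WHERE post_id = ...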
>
> >> > > On Sep 10, 3:45 am, daniel hoey <danielho...@gmail.com> wrote:
> >> > > > We go through short periods where we get frequent app timeouts.
> >> > > > The pages that time out are often very simple and do not rely on
> >> > > > external services or perform any demanding database queries.  We
> >> > > > don't get any information in our New Relic transaction traces for
> >> > > > these requests (we have for other timeouts in the past).  Basically
> >> > > > we can't get any information about what is going on, and only know
> >> > > > about the problem if our users tell us.  Has anyone else experienced
> >> > > > similar problems, or have anything to suggest in terms of
> >> > > > investigating the root cause?
>
> >> > > > The last time that we are aware of this happening was between 06:30
> >> > > > and 07:00 GMT on Sept 10.
