So I've been trying to get rack-timeout installed, but it's causing consistent, though non-deterministic, application crashes from which my app won't recover. I've had to roll back.

I'm on Rails 3.0.1 on the Bamboo stack (Ruby 1.8.7). Per the README here (https://github.com/kch/rack-timeout), I add two lines to my Gemfile:

    gem "system_timer" if RUBY_VERSION < "1.9"
    gem "rack-timeout"

...and that causes the crash (after running peachy keen for a few minutes, sometimes even up to an hour or so). Comment them out, and we're all good. Have others had success using rack-timeout?
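(For anyone trying to reproduce: those two Gemfile lines are the only change I made. The README of that vintage also documents an optional class-level timeout setting, sketched below from memory -- the 10-second value is arbitrary, and I believe the default is 30 seconds.)

    # config/initializers/rack_timeout.rb -- optional; not part of my repro
    Rack::Timeout.timeout = 10  # seconds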
On Nov 7, 9:34 pm, Subramanya Sastry <sss.li...@gmail.com> wrote:
> Aha! Thanks for the explanation. That is very helpful. So it could
> be that just a single bad request pretty much times out all additional
> requests down the pipe. We've added rack-timeout already, and the next
> time we hit one such bad request, we'll know, with an Exceptional
> report!
>
> Subbu.
>
> On Sun, Nov 7, 2010 at 8:27 PM, daniel hoey <danielho...@gmail.com> wrote:
> > Just to follow up on my original post: we had one action that we knew
> > had a timeout problem, but we hadn't prioritized fixing it. We
> > eventually discovered that this action caused other requests to time
> > out. The understanding I got from talking to Heroku support is that
> > when a request comes in, it gets assigned to a dyno immediately; for
> > the purposes of Heroku timeouts, the request's 'start time' is now.
> > But if that dyno is currently busy processing some other request, the
> > new request will just wait. If 30 seconds pass and the first request
> > has not finished processing, both requests time out. Note also that
> > if the first request takes 29s and the second request takes 2s, the
> > second request will still time out.
> >
> > We ended up putting SystemTimer (http://systemtimer.rubyforge.org/)
> > timeouts around some of our actions and filters so that an exception
> > gets raised when something times out; rack-timeout looks like a
> > better way of doing this. We also used New Relic Silver to find the
> > actions that were the root cause of the problem.
> >
> > Basically, the moral of the story is that you have to make sure that
> > none of your actions ever times out.
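A minimal sketch of the SystemTimer wrapping daniel describes above -- the filter name, action list, and 15-second limit are all made up for illustration:

    # app/controllers/application_controller.rb
    require 'system_timer'

    class ApplicationController < ActionController::Base
      around_filter :enforce_time_limit, :only => [:index, :show]

      private

      # Raise Timeout::Error if the wrapped action runs longer than 15s,
      # so Exceptional/Hoptoad gets a backtrace instead of a silent hang.
      def enforce_time_limit
        SystemTimer.timeout_after(15) { yield }
      end
    end

One caveat: SystemTimer works by scheduling SIGALRM, so it can interfere with any other code that relies on alarm-based timers.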
> > On Nov 6, 4:31 am, Oren Teich <o...@heroku.com> wrote:
> > > I've seen a few people with weird timeouts where the app owner was
> > > able to find out that it was a bug in their own code: anything from
> > > a weird SQL query locking a table and hanging their process, to API
> > > requests, to other hard-to-track stuff.
> > >
> > > This gem (http://github.com/kch/rack-timeout) will time out your
> > > requests after a period you specify. The advantage is that you can
> > > set it to a short time, and Exceptional/Hoptoad should catch the
> > > timeout, giving you some indication in the backtrace of what's
> > > going on.
> > >
> > > Oren
> > >
> > > On Fri, Nov 5, 2010 at 9:06 AM, Subbu Sastry <sss.li...@gmail.com> wrote:
> > > > Has anyone found a reasonable solution to this problem yet? On
> > > > our app as well, we notice totally random timeout errors that
> > > > couldn't possibly be associated with db lookups -- sometimes
> > > > requests time out on pages that look up a row by primary key in a
> > > > table with 15 records. Favicon.ico timed out as well. The
> > > > timeouts seem arbitrary, and *always* get fixed by a server
> > > > restart (heroku restart). This has happened to us a few times
> > > > over the last week. And yes, as several of you have noted, there
> > > > are no exceptions raised (neither Exceptional nor New Relic sees
> > > > them).
> > > >
> > > > Given that we experienced timeouts with favicon.ico and with an
> > > > about page doing a single db lookup, and New Relic doesn't see
> > > > this at all, I suspect something higher up the Heroku stack is
> > > > timing out. It almost smells like a memory leak somewhere, which
> > > > would explain why an app restart fixes the problem. Now, the
> > > > question is whether the memory leak is in our app or somewhere
> > > > else (plugins, gems, interaction with the Heroku stack). I will
> > > > debug this, but wanted to see if someone else has found a
> > > > reasonable solution.
> > > >
> > > > Subbu.
> > > >
> > > > On Oct 6, 9:37 pm, mattsly <matt...@gmail.com> wrote:
> > > > > Just manually testing my app, I've seen a fair number of
> > > > > timeouts (maybe a dozen) but have not received any
> > > > > communication; I'm pretty sure I'd have no idea they occurred
> > > > > had I not personally witnessed the error page. I find this a
> > > > > borderline "ship blocker" as I consider migrating a
> > > > > ~500K-monthly-page-view app to Heroku, and I get very anxious
> > > > > thinking about lots of users seeing a funky error page with no
> > > > > way for me to be alerted or to know how prevalent the issue is.
> > > > >
> > > > > WRT the timeouts: maybe 1% of requests time out... and I still
> > > > > can't pin down why. I'm on a single dyno, with Koi, and < 5
> > > > > alpha testers on it "concurrently" (and the timeout errors are
> > > > > related to response, not concurrency), and these are extremely
> > > > > simple paging requests that, according to New Relic, return in
> > > > > ~100ms on average... and then all of a sudden... bam! -- a
> > > > > request timeout. And we're talking about essentially the exact
> > > > > same code path, except for a different :offset in the
> > > > > ActiveRecord find call. The complexity is nothing along the
> > > > > lines of the suggested timeout causes here:
> > > > > http://docs.heroku.com/performance#request-timeout
> > > > >
> > > > > Strangely, I just tried turning off all Varnish-level caching
> > > > > (which I hope to rely on heavily) to try to isolate the issue,
> > > > > and now perf seems *more* consistent and faster (no timeout
> > > > > yet). Could the timeouts be happening during lookup at the
> > > > > Varnish layer? My understanding is that this shouldn't be
> > > > > possible, as I think the dyno never even sees a request when
> > > > > there's a Varnish cache hit. So maybe Varnish caching is a red
> > > > > herring... but it does seem curious.
> > > > >
> > > > > Matt
> > > > >
> > > > > On Sep 24, 7:56 pm, John Norman <j...@7fff.com> wrote:
> > > > > > Well, you should get an e-mail if your app is generating
> > > > > > backlogs.
> > > > > >
> > > > > > I have one app that generated two in a whole week, and I
> > > > > > received at least two e-mails from Heroku suggesting that I
> > > > > > up the number of dynos.
> > > > > > On Fri, Sep 24, 2010 at 11:42 AM, mattsly <matt...@gmail.com> wrote:
> > > > > > > How are you finding the timeouts? Just manually? I was
> > > > > > > having timeout issues (which I now think I've solved -- see
> > > > > > > below) but am concerned that, once I flip my site public:
> > > > > > >
> > > > > > > a) There's no apparent native reporting/alerting for
> > > > > > > timeouts or "backlog too deep" errors if they do occur.
> > > > > > > b) There's no ability to render a custom (static) error
> > > > > > > page in that case.
> > > > > > >
> > > > > > > Re: reporting. When timeouts occur, am I mistaken in not
> > > > > > > seeing them reported anywhere? They don't seem to throw
> > > > > > > Exceptional or New Relic exceptions with the free version,
> > > > > > > and it's unclear to me whether they would with the
> > > > > > > (expensive -- $0.05/hr = $36/month for alerting?) "Silver"
> > > > > > > plan -- can anyone confirm that they in fact do?
> > > > > > >
> > > > > > > It seems like timeout / "backlog too deep"
> > > > > > > reporting/alerting should really be a built-in feature of
> > > > > > > Heroku, since these are core elements of the architecture,
> > > > > > > and such alerting (especially for backlog) helps you make a
> > > > > > > quick call about cranking the dyno count up or down and/or
> > > > > > > restarting an app to minimize adverse user effects... i.e.,
> > > > > > > really what this cloud and hosting-as-a-service thing is
> > > > > > > all about.
> > > > > > >
> > > > > > > I'm about to (I think) migrate a high-traffic site to
> > > > > > > Heroku. I *love* the idea of being able to focus on
> > > > > > > development and not sysadmin... but I have to say I'm
> > > > > > > getting a little anxious about quirks like this and what
> > > > > > > they might mean for my users.
> > > > > > >
> > > > > > > Matt
> > > > > > >
> > > > > > > (On a slightly related note: I've learned the hard way that
> > > > > > > Table.count is a great way to cause a timeout -- it looks
> > > > > > > like MySQL and PostgreSQL handle counts *way* differently,
> > > > > > > something to keep in mind if you're migrating from MySQL:
> > > > > > > http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#COUNT.28.2A.29)
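Since Matt's parenthetical bites a lot of people: on PostgreSQL an unqualified COUNT(*) has to walk the whole table (MVCC leaves it no shortcut), so the usual workarounds are a counter you maintain yourself or the planner's estimate. A hedged sketch, with hypothetical table and model names:

    # Option 1: approximate row count from PostgreSQL's own statistics
    approx = ActiveRecord::Base.connection.select_value(
      "SELECT reltuples::bigint FROM pg_class WHERE relname = 'widgets'")

    # Option 2: a counter cache kept up to date on create/destroy
    # (requires a comments_count integer column on the posts table)
    class Comment < ActiveRecord::Base
      belongs_to :post, :counter_cache => true
    end

With option 2, Post#comments_count is a plain column read instead of a COUNT(*).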
> > > > > > > On Sep 10, 3:45 am, daniel hoey <danielho...@gmail.com> wrote:
> > > > > > > > We go through short periods where we get frequent app
> > > > > > > > timeouts. The pages that time out are often very simple
> > > > > > > > and do not rely on external services or perform any
> > > > > > > > demanding database queries. We don't get any information
> > > > > > > > in our New Relic transaction traces for these requests
> > > > > > > > (we have for other timeouts in the past). Basically, we
> > > > > > > > can't get any information about what is going on, and we
> > > > > > > > only know about the problem if our users tell us. Has
> > > > > > > > anyone else experienced similar problems, or does anyone
> > > > > > > > have anything to suggest for investigating the root
> > > > > > > > cause?
> > > > > > > >
> > > > > > > > The last time that we are aware of this happening was
> > > > > > > > between 06:30 and 07:00 GMT on Sept 10.
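Since several posts in this thread boil down to "we can't see where the time went," here's a minimal, hypothetical Rack middleware for logging slow requests from inside the dyno (the class name and 5-second threshold are invented; adjust to taste):

    # lib/slow_request_logger.rb
    class SlowRequestLogger
      def initialize(app, threshold = 5.0)
        @app, @threshold = app, threshold
      end

      def call(env)
        start = Time.now
        status, headers, body = @app.call(env)
        elapsed = Time.now - start
        if elapsed > @threshold
          $stderr.puts "[slow] #{env['REQUEST_METHOD']} #{env['PATH_INFO']} took #{'%.1f' % elapsed}s"
        end
        [status, headers, body]
      end
    end

    # config/application.rb (Rails 3):
    #   config.middleware.use SlowRequestLogger

Note the limitation, per the queuing behaviour daniel described earlier in the thread: this only measures time spent in the dyno itself, so a request that times out while waiting behind another request will never show up here.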