Re: Frequent HTTP 500 server errors from search API

2009-02-18 Thread Pete Warden

I'm hitting the same 500 problem with complex queries. It's occasional
enough that I can live with it, but the big issue for me is that I'm
using the JSON interface and appear to get back HTML in the error
content. This doesn't allow me to do any error handling, as the

Re: Frequent HTTP 500 server errors from search API

2009-02-18 Thread Matt Sanford

Hi Nigel,

The HTTP 500 and HTTP 502 are different errors. A 502 is returned
when there are no more mongrel processes available to handle your
request, the dreaded Fail Whale. We've been adding more hardware to
our search system to keep this from happening, but we ran into a bug
yesterday that threw tons of 502s.


The 500 is a different matter. From the queries you described I'm
guessing these are query timeouts. We've optimized the system for the
most common use, which is queries of 1-2 real words. Some queries take
too long and we kill them rather than let them back the entire system
up (causing 502s). When a query gets killed we return a 500. By the
time you retry, we've had some time to cache part of the information,
so retries succeed more often, but since there are multiple machines
that's not always the case. If you can send me (m...@twitter.com) some
real example queries that you're having trouble with, I can confirm
this, but it seems like the most likely reason. The most commonly
killed queries are complex combinations of operators, or queries with
multiple words that rarely ever appear together.
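
For anyone building a retry policy around the behaviour described above,
here is a minimal Python sketch. It is not anything Twitter ships. The
fetch callable, the attempt count and the delays are all assumptions made
for illustration. It treats both 500 (killed query) and 502 (no process
available) as retryable and backs off between attempts, since a retry may
land on a machine that has part of the result cached.

```python
import time


def search_with_retry(fetch, query, max_attempts=4, base_delay=1.0,
                      sleep=time.sleep):
    """Call fetch(query) until it stops returning 500/502.

    fetch is a hypothetical callable returning (status_code, body);
    plug in whatever HTTP client you already use. max_attempts and
    base_delay are illustrative values, not documented limits.
    """
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, body = fetch(query)
        if status not in (500, 502):
            return status, body
        # Back off exponentially before the next attempt (1s, 2s, 4s, ...),
        # but do not sleep after the final failure.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

The caller still has to check the final status, because a query that is
killed every time will come back as a 500 after the last attempt.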


As for the firehose, we'll be updating the FAQ page
(http://apiwiki.twitter.com/FAQ#Whenwillthefirehosebeready) as things
change. We've been working on this in parallel with OAuth and are
working on the final touches now.


Thanks;
  — Matt Sanford

On Feb 18, 2009, at 06:08 AM, nigel_spl...@yahoo.com wrote:



Hello,

We get frequent and seemingly-random HTTP 500 errors when calling the
search API.  Sometimes it's 500, sometimes 502.  It's definitely not
throttling, as we don't make that many calls and I know from reading
here that those errors have a specific message.

I've tried without success to find particular searches, times of day,
etc. that are more or less likely to fail.  Eventually I wrote a
little program that generates random three-letter nonsense words and
calls the search API with wget for each one.  Out of 100 different
search terms, usually there are 3-10 failures.  If I try the same
search terms again, sometimes the ones that failed will now work, and
vice versa.  This behavior is similar to what we see in our production
code, which is completely different but also fails randomly and fairly
frequently.

Any ideas?  We'd love to get a solution, since even with retry
policies the errors are frequent enough to really hinder our ability
to get search results.

On a related note, is there any news on availability of the content
firehose?  Last I remember reading, it was going to be made available
to "trusted partners" at the end of January or beginning of February.
Our ideal solution would be to have access to the firehose so we can
index and search the content on our side.

Thanks!




Frequent HTTP 500 server errors from search API

2009-02-18 Thread nigel_spl...@yahoo.com

Hello,

We get frequent and seemingly-random HTTP 500 errors when calling the
search API.  Sometimes it's 500, sometimes 502.  It's definitely not
throttling, as we don't make that many calls and I know from reading
here that those errors have a specific message.

I've tried without success to find particular searches, times of day,
etc. that are more or less likely to fail.  Eventually I wrote a
little program that generates random three-letter nonsense words and
calls the search API with wget for each one.  Out of 100 different
search terms, usually there are 3-10 failures.  If I try the same
search terms again, sometimes the ones that failed will now work, and
vice versa.  This behavior is similar to what we see in our production
code, which is completely different but also fails randomly and fairly
frequently.
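
The experiment described above could be sketched roughly as follows in
Python. This is not the original program, which used wget. The fetch
callable is a hypothetical stand-in for the actual HTTP call and is
assumed to return the status code for one search term.

```python
import random
import string


def random_words(count, rng):
    """Return `count` distinct three-letter nonsense words."""
    words = set()
    while len(words) < count:
        words.add("".join(rng.choice(string.ascii_lowercase)
                          for _ in range(3)))
    return sorted(words)


def failed_terms(words, fetch):
    """Return the words for which fetch(word) gave a 5xx status.

    fetch is a hypothetical callable that performs the search request
    and returns the HTTP status code as an int.
    """
    return [w for w in words if 500 <= fetch(w) <= 599]
```

Running failed_terms twice over the same word list and comparing the two
results shows whether the failing terms change between runs.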

Any ideas?  We'd love to get a solution, since even with retry
policies the errors are frequent enough to really hinder our ability
to get search results.

On a related note, is there any news on availability of the content
firehose?  Last I remember reading, it was going to be made available
to "trusted partners" at the end of January or beginning of February.
Our ideal solution would be to have access to the firehose so we can
index and search the content on our side.

Thanks!