[twitter-dev] Re: Twitter Search API - Questions Regarding Scaling Out

2011-05-16 Thread Corey Ballou
Thanks for the feedback Brian. Late response here, but I'd be more
than willing to provide you with more details regarding our
application in a private email. You should be receiving said email
shortly.

Regards,

Corey

On Apr 14, 1:12 pm, Brian Sutorius  wrote:
> While the Streaming API may not provide processed results to you in
> the way that search queries can (logical ORs, etc.), it's a more
> scalable solution for returning a lot of Tweets. Our search system can
> rate limit queries if they become too computationally expensive (in
> addition to the normal query limit), so continuing to add parameters
> to the query up front rather than doing this processing yourself may
> cause you to keep running into limits. Ultimately, circumventing the
> limits put in place by our APIs is not allowed by our API ToS, and
> building your architecture this way just to get around the defaults is
> something we strongly discourage. If you keep being rate limited, you
> should think about re-factoring your prioritization strategy.
>
> Can you go into a little more detail about what your application does?
> We might be able to guide you towards a mix of Streaming API and
> search queries that gets you what you need but stays within the rate
> limits.
>
> Brian Sutorius
> Twitter API Policy

[twitter-dev] Re: Twitter Search API - Questions Regarding Scaling Out

2011-04-14 Thread Brian Sutorius
While the Streaming API may not provide processed results to you in
the way that search queries can (logical ORs, etc.), it's a more
scalable solution for returning a lot of Tweets. Our search system can
rate limit queries if they become too computationally expensive (in
addition to the normal query limit), so continuing to add parameters
to the query up front rather than doing this processing yourself may
cause you to keep running into limits. Ultimately, circumventing the
limits put in place by our APIs is not allowed by our API ToS, and
building your architecture this way just to get around the defaults is
something we strongly discourage. If you keep being rate limited, you
should think about re-factoring your prioritization strategy.
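Here is a minimal sketch of what "doing this processing yourself" could look like. Instead of encoding every OR and AND in one expensive query, the client fetches a broader set of Tweets and evaluates the combined logic locally: any of several keywords AND inside a bounding box. The sketch assumes a simple Tweet layout, with a `text` string and an optional `coordinates` object holding a `[longitude, latitude]` pair. The `BoundingBox` type and the `matches` helper are hypothetical and are not part of any Twitter library.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical box given as west/south/east/north edges in degrees."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        # Simple range check; does not handle boxes that cross the antimeridian.
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def tweet_point(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) if the Tweet carries an exact point (assumed layout), else None."""
    coords = tweet.get("coordinates")
    if not coords:
        return None
    lon, lat = coords["coordinates"]
    return lon, lat


def matches(tweet: dict, any_terms: Sequence[str], box: Optional[BoundingBox]) -> bool:
    """(term1 OR term2 OR ...) AND (inside box), evaluated on the client."""
    text = tweet.get("text", "").lower()
    # Naive case-insensitive substring match for the OR part.
    if any_terms and not any(term.lower() in text for term in any_terms):
        return False
    if box is None:
        return True
    point = tweet_point(tweet)
    # Tweets without an exact point cannot satisfy the location AND.
    return point is not None and box.contains(*point)


boston = BoundingBox(west=-71.2, south=42.2, east=-70.9, north=42.45)
tweets = [
    {"text": "Great coffee downtown",
     "coordinates": {"type": "Point", "coordinates": [-71.06, 42.36]}},
    {"text": "Great coffee in Seattle",
     "coordinates": {"type": "Point", "coordinates": [-122.33, 47.61]}},
    {"text": "Tea time",
     "coordinates": {"type": "Point", "coordinates": [-71.06, 42.36]}},
]
hits = [t["text"] for t in tweets if matches(t, ["coffee", "espresso"], boston)]
print(hits)  # ['Great coffee downtown']
```

The keyword test is a plain substring check, so a real filter would likely need proper tokenization. The point is only that the expensive boolean combination moves off Twitter's search servers and into the client.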

Can you go into a little more detail about what your application does?
We might be able to guide you towards a mix of Streaming API and
search queries that gets you what you need but stays within the rate
limits.

Brian Sutorius
Twitter API Policy

On Apr 13, 10:28 am, Corey Ballou  wrote:
> I'm still looking for a community leader answer on this one.

[twitter-dev] Re: Twitter Search API - Questions Regarding Scaling Out

2011-04-13 Thread Corey Ballou
I'm still looking for a community leader answer on this one.

On Apr 11, 5:50 pm, Corey Ballou  wrote:
> Thanks for the reply, I appreciate it.

[twitter-dev] Re: Twitter Search API - Questions Regarding Scaling Out

2011-04-11 Thread Corey Ballou
Thanks for the reply, I appreciate it.

My concerns about the Streaming API are mainly the following:

* the logical OR applied when locations are combined with other filters
* firehose limitations
* the user’s location field is not used to filter tweets
* increased application complexity from parsing the resulting stream of
data back out into individual searches
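The last concern, splitting one combined stream back into individual searches, is mostly bookkeeping. One possible approach is an inverted index that maps each tracked keyword to the saved searches using it. Each incoming Tweet is then checked only against searches that could match, with the location treated as an AND condition rather than an OR. In this sketch, `SavedSearch`, `SearchRouter`, the whitespace tokenization and the Tweet layout are all assumptions for illustration. Tweets are plain dicts standing in for whatever a stream connection would deliver.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# (west, south, east, north) in degrees, longitude first: an illustrative convention.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class SavedSearch:
    """Hypothetical record for one stored search owned by one Twitter account."""
    search_id: str
    account_id: str
    keywords: frozenset          # the Tweet must contain at least one of these words (OR)
    box: Optional[Box] = None    # if set, the Tweet must also fall inside this box (AND)


class SearchRouter:
    """Routes Tweets from one merged stream to every saved search they satisfy."""

    def __init__(self, searches: List[SavedSearch]) -> None:
        # Inverted index: keyword -> searches that track it.
        self._by_keyword: Dict[str, List[SavedSearch]] = defaultdict(list)
        for search in searches:
            for word in search.keywords:
                self._by_keyword[word.lower()].append(search)

    def route(self, tweet: dict) -> Set[str]:
        """Return the ids of all saved searches this Tweet belongs to."""
        words = set(tweet.get("text", "").lower().split())  # naive whitespace tokenization
        point = (tweet.get("coordinates") or {}).get("coordinates")
        matched: Set[str] = set()
        for word in words:
            # Only searches sharing a keyword with the Tweet are examined.
            for search in self._by_keyword.get(word, []):
                if search.box is None:
                    matched.add(search.search_id)
                elif point is not None:
                    west, south, east, north = search.box
                    lon, lat = point
                    if west <= lon <= east and south <= lat <= north:
                        matched.add(search.search_id)
        return matched


router = SearchRouter([
    SavedSearch("s1", "acct-a", frozenset({"coffee"}), box=(-71.2, 42.2, -70.9, 42.45)),
    SavedSearch("s2", "acct-b", frozenset({"coffee", "tea"})),
])
tweet = {"text": "Coffee at noon",
         "coordinates": {"type": "Point", "coordinates": [-71.06, 42.36]}}
print(sorted(router.route(tweet)))  # ['s1', 's2']
```

Each search keeps its `account_id`, so routed Tweets can be stored per account and per search. This is the shape the local result store needs.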

I know that the Search API is not Twitter's preferred choice, but it
currently returns the best results for my application. It's also
worth noting that Search recently became much faster (roughly 3x,
according to the engineering blog post below), which should in theory
relax the strain on the API:

http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html

I'm mainly interested in knowing whether @twitterapi will allow me
to use the Search API in the manner I described above. I would be
willing to guarantee that the application's worker nodes handle 420
rate-limiting errors appropriately while still supporting multiple
Twitter accounts and searches.
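One common way for worker nodes to handle 420 rate-limiting errors appropriately is to honor a server-supplied wait time when one is present and otherwise back off exponentially. A throttled worker then slows down instead of retrying immediately. The sketch assumes a few things: each response exposes a status code and headers, a `Retry-After` header (if sent) is a number of seconds, and `fetch` is a hypothetical stand-in for the worker's actual HTTP call. The 60-second base and 30-minute cap are arbitrary illustrative values. Backing off like this is different from spreading requests across IP addresses, which Brian's reply in this thread says the API ToS does not allow.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# Assumed shape of one HTTP response: (status_code, headers, body).
FetchResult = Tuple[int, Mapping[str, str], Optional[str]]


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 60.0, cap: float = 1800.0) -> float:
    """Seconds to wait after the `attempt`-th (0-based) rate-limited response."""
    if retry_after is not None:
        try:
            # Prefer the server's own hint when it is present and numeric.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # malformed header: fall through to exponential backoff
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(fetch: Callable[[], FetchResult], max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> FetchResult:
    """Call `fetch` (at least once, at most `max_attempts` times) while it returns 420.

    Returns the first non-420 result, or the last 420 result when attempts run
    out, so the caller can push the search back onto its queue for later.
    """
    attempt = 0
    while True:
        result = fetch()
        status, headers, _ = result
        if status != 420 or attempt + 1 >= max_attempts:
            return result
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
        attempt += 1


# Simulated responses: two 420s, then success. `waits` records sleeps instead of sleeping.
responses = iter([(420, {"Retry-After": "30"}, None),
                  (420, {}, None),
                  (200, {}, '{"results": []}')])
waits = []
status, _, body = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(status, waits)  # 200 [30.0, 120.0]
```

Adding random jitter to each delay would keep many workers from retrying in lockstep. It is omitted here to keep the example deterministic.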

On Apr 11, 1:05 pm, "M. Edward (Ed) Borasky"  wrote:
> I don't see an answer here, but I'll tell you how *I* would go about
> implementing this:
>
> 1. Switch to the Streaming API. Using Search in an application puts a strain
> on Twitter's servers and makes it difficult for Twitter to manage capacity.
> That's why it's rate-limited and why the rate limits aren't publicly
> disclosed.
>
> 2. If your application is a desktop application, use User Streams. If it is
> a server, use User Streams on a desktop or the low-frequency free access to
> Streaming on a server to prototype and develop. Your target for a server
> will be Site Streams, but that's in closed beta at the moment IIRC.
>
> 3. *Concurrently with development*, your business development / sales /
> marketing / planning people, or yourself, if it's a one-person shop, should
> be negotiating with Twitter for access to Site Streams. I'm assuming an
> "agile" development methodology - customer-in-the-loop - and one of the
> parties that needs to be in the loop is Twitter for Site Streams. You simply
> *can't* build an at-scale Twitter application without direct business
> discussions with Twitter!
>
> On Mon, Apr 11, 2011 at 8:14 AM, Corey Ballou  wrote:
> > I tried speaking with Ryan Sarver directly, but he forwarded me
> > here to the community advocates for an answer. I believe this answer will
> > need to come top down from Twitter, as it's your rate limiting that
> > I'm most worried about.
>
> > I have a technical question for all of you regarding the Search
> > API, as I want to remain fully compliant. Currently, the old Search
> > API implementation (albeit slower) provides a fuller result set and
> > allows more flexibility in the types and combinations of searches
> > it supports. The way I have developed my application allows a
> > number of daemonized worker instances running on different IP
> > addresses to call the Search API on behalf of the stored
> > OAuth credentials to avoid rate limiting issues.
>
> > I had a conversation with the Pluggio developer in which he stated
> > Twitter had threatened to shut down his application if he didn't switch
> > to a different implementation of the Search API. The problem indicated
> > was that he was performing searches for multiple Twitter accounts,
> > which is exactly my use case. Site Streams does not make as much sense
> > for my application given the search queries I wish to perform and the
> > necessity for logical AND operations on geo-location.
>
> > Do you foresee any problems with my current method of using different
> > IP addresses to stay under the rate limit? I'm trying to stay in full
> > compliance with Twitter's TOS and would love to find the most
> > applicable and API friendly solution. I know headway is being made
> > with Twitter's new search implementation so I would like to stay ahead
> > of the curve and not get myself stuck in a box.
>
> > I still need a method for polling for new search results (say, every
> > 30 minutes, dependent upon the pricing plan) for non-logged in users.
>
> > Below is a scaled down representation of how I'm currently handling
> > searches to help you decide the best plan of action:
>
> > 1) Searches are performed on a rolling queue basis, say one search
> > every thirty minutes. There can be a finite number of searches per
> > Twitter user (say 5 searches per Twitter account). There can be any
> > number of Twitter accounts.
> > 2) Search results are stored locally for retrieval by a JavaScript
> > AJAX long-poller that checks every minute for changes.
> > 3) When a user visits the search results page and filters results, no
> > API calls to Twitter are made; only a local query is required.
>
> > Due to this process, the queue is constantly searching for the next
> > searches and mentions to perform. I foresee rate limiting concerns
> > cropping up with searches being performed for any num
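The rolling-queue process in the original message can be sketched as a priority queue keyed on each search's next due time. That process runs searches in rotation, allows at most five saved searches per Twitter account and any number of accounts, and refreshes results about every thirty minutes. This sketch reads "one search every thirty minutes" as "each search is refreshed every thirty minutes", which is an interpretation. It also computes the sustained request rate such a queue implies. That rate grows linearly with the number of accounts and is what runs into the rate limits discussed in this thread. `RollingQueue`, `required_rate_per_second`, the 30-minute interval and the 5-search cap are hypothetical names and illustrative values taken from the message's own examples.

```python
import heapq
from typing import Dict, List, Tuple

MAX_SEARCHES_PER_ACCOUNT = 5    # "say 5 searches per Twitter account"
REFRESH_INTERVAL_S = 30 * 60    # "say, every 30 minutes"; may depend on pricing plan


class RollingQueue:
    """Min-heap of (next_due_time, search_id); each search is re-queued after it runs."""

    def __init__(self, interval_s: float = REFRESH_INTERVAL_S) -> None:
        self.interval_s = interval_s
        self._heap: List[Tuple[float, str]] = []
        self._per_account: Dict[str, int] = {}

    def add_search(self, account_id: str, search_id: str, now: float) -> None:
        count = self._per_account.get(account_id, 0)
        if count >= MAX_SEARCHES_PER_ACCOUNT:
            raise ValueError(f"account {account_id} already has {count} searches")
        self._per_account[account_id] = count + 1
        heapq.heappush(self._heap, (now, search_id))  # new searches are due immediately

    def pop_due(self, now: float) -> List[str]:
        """Return the searches due at `now` and schedule each one's next run."""
        due: List[str] = []
        while self._heap and self._heap[0][0] <= now:
            _, search_id = heapq.heappop(self._heap)
            due.append(search_id)
        for search_id in due:
            heapq.heappush(self._heap, (now + self.interval_s, search_id))
        return due


def required_rate_per_second(accounts: int, searches_per_account: int,
                             interval_s: float = REFRESH_INTERVAL_S) -> float:
    """Sustained Search API calls per second needed to refresh every search once per interval."""
    return accounts * searches_per_account / interval_s


q = RollingQueue()
for n in range(3):
    q.add_search("acct-a", f"search-{n}", now=0.0)
print(q.pop_due(now=0.0))     # ['search-0', 'search-1', 'search-2']
print(q.pop_due(now=60.0))    # []
print(q.pop_due(now=1800.0))  # ['search-0', 'search-1', 'search-2']
print(round(required_rate_per_second(accounts=1000, searches_per_account=5), 2))  # 2.78
```

At these illustrative figures, 1,000 accounts with five searches each work out to about 2.8 Search API calls per second, around the clock. This is why the number of accounts, not just the cost of each query, drives the rate-limiting concern.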