[twitter-dev] Re: Possible Bug in Twitter Search API
Hi Brian,

My guess is that this is the same since_id/max_id pagination confusion we have always had. If you look at the next_page URL in our API you'll notice that it does not contain the since_id. If you are searching with since_id and requesting multiple pages, you need to manually stop paginating once you find an id lower than your original since_id. I know this is a pain, but there is a large performance gain in it on our back end. There was an update a few weeks ago [1] where I talked about this, and a warning message (twitter:warning in Atom and warning in JSON) was added to alert you to the fact that it had been removed. Does that sound like the cause of your issue?

Thanks;
– Matt Sanford / @mzsanford
Twitter Dev

[1] - http://groups.google.com/group/twitter-development-talk/browse_frm/thread/6e80cb6eec3a16d3?tvc=1

On May 15, 2009, at 7:50 AM, briantroy wrote:

I've noticed this before but always tried to deal with it as a bug on my side. It is, however, now clear to me that from time to time the Twitter Search API seems to ignore the since_id. We track FollowFriday by polling Twitter Search every so often (the process is throttled from 10 seconds to 180 seconds depending on how many results we get). This works great 90% of the time. But on high-volume days (Fridays) I've noticed we get a lot of multi-page responses, causing us to make far too many requests to the Twitter API (900/hour).

When attempting to figure out why we are making so many requests, I uncovered something very interesting. When we get a tweet we store it in our database. That database has a unique index on the customer id/tweet id. When we get multi-page responses from Twitter and iterate through each page, the VAST MAJORITY of the tweets violate this unique index. What does this mean? That we already have that tweet. Today, I turned on some additional debugging and saw that the tweets we were getting from Twitter Search were, in fact, prior to the since_id we sent.

This is causing us to POUND the API servers unnecessarily. There is, however, really nothing I can do about it on my end. Here is a snip of the log showing the failed inserts and the id we are working with. The last line shows you both the old max id and the new max id (after processing the tweets). As you can see, every tweet violates the unique constraint (27 is the customer id). You can also see that we've called the API for this one search 1016 times this hour... which is WAY, WAY too much (16.9 times per minute):

NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate entry '27-1806522797' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user, from_user_id, from_user, iso_language_code, profile_image_url, created_at, bulk_svc_id) values('#<b>followfriday</b> edubloggers @CoolCatTeacher @dwarlick @ewanmcintosh @willrich45 @larryferlazzo @suewaters', 1806522797, 0, '', 192010, 'WeAreTeachers', 'en', 'http://s3.amazonaws.com/twitter_production/profile_images/52716611/Picture_2_normal.png', 'Fri, 15 May 2009 14:41:51 +', 27)

NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate entry '27-1806522766' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user, from_user_id, from_user, iso_language_code, profile_image_url, created_at, bulk_svc_id) values('thx for the #<b>followfriday</b> love, @brokesocialite &amp; @silveroaklimo. Also thx to @diamondemory &amp; @bmichelle for the RTs of FF', 1806522766, 0, '', 1149953, 'lmdupont', 'en', 'http://s3.amazonaws.com/twitter_production/profile_images/188591402/lisaann_normal.jpg', 'Fri, 15 May 2009 14:41:51 +', 27)

NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate entry '27-1806522760' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user, from_user_id, from_user, iso_language_code, profile_image_url, created_at, bulk_svc_id) values('Thx! RT @dpbkmb: #<b>followfriday</b> @ifeelgod @americandream09 @DailyHappenings @MrMilestone @emgtay @Nurul54 @mexiabill @naturallyknotty', 1806522760, 0, '', 1303322, 'borgellaj', 'en', 'http://s3.amazonaws.com/twitter_production/profile_images/58399480/img017_normal.jpg', 'Fri, 15 May 2009 14:41:51 +', 27)

NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate entry '27-1806522759' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user, from_user_id, from_user, iso_language_code, profile_image_url, created_at, bulk_svc_id) values('Morning my tweets!!! <b>follow friday</b>! Dnt forget to RT me in need of followers LOL!', 1806522759, 0, '', 11790458, 'Dae_Marie', 'en', 'http://s3.amazonaws.com/twitter_production/profile_images/199283178/dae_bab_normal.jpg', 'Fri, 15 May 2009 14:41:50 +', 27)

NOTICE: 10:45:37 AM on Fri May 15th Tweet insert failed: Duplicate entry '27-1806522752' for key 2
SQL: insert into justsignal.tweets(text, tw_id, to_user_id, to_user, from_user_id, from_user, iso_language_code, profile_image_url,
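For what it's worth, the stop condition Matt describes can be sketched in a few lines. This is a hypothetical client-side helper (the function name and the fake page data below are illustrative, not part of Twitter's API): since next_page drops the since_id, the caller has to bail out of pagination itself as soon as a returned id is at or below the since_id it originally sent.

```python
def collect_new_tweets(pages, since_id):
    """pages: iterable of result pages, each a list of tweets (dicts
    with an 'id'), newest first as the Search API returns them.
    Stops paginating at the first id at or below since_id, because
    the server keeps serving older results on later pages."""
    new_tweets = []
    for page in pages:
        for tweet in page:
            if tweet["id"] <= since_id:
                # Everything from here on predates our since_id.
                return new_tweets
            new_tweets.append(tweet)
    return new_tweets

# Example with fabricated ids: since_id=100, page 3 drifts below it.
pages = [
    [{"id": 140}, {"id": 130}],
    [{"id": 120}, {"id": 101}],
    [{"id": 99}, {"id": 90}],   # stale page: ids <= since_id
]
print(len(collect_new_tweets(pages, 100)))  # 4 new tweets kept
```

In Brian's setup this check would replace the unique-index violations as the dedup mechanism: the loop never reaches tweets the database already has.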
[twitter-dev] Re: Possible Bug in Twitter Search API
Matt - I'll verify that is the issue (I assume I should have new results on page one AND page two; otherwise there is something else going on).

Brian

On May 15, 8:33 am, Matt Sanford m...@twitter.com wrote: ...
[twitter-dev] Re: Possible Bug in Twitter Search API
Matt - That took care of it... a minor change on my side with big resource savings. Where was the original announcement made that this had changed? (Wondering how I missed it.) Thanks!

Brian

On May 15, 8:33 am, Matt Sanford m...@twitter.com wrote: ...
[twitter-dev] Re: Possible Bug in Twitter Search API
Hi Brian,

This has always been the case; the thread I linked to earlier is where I made it more explicit. The behavior was always there, but it wasn't documented properly. The documentation was updated as well to try to help in the future.

Thanks;
– Matt Sanford / @mzsanford
Twitter Dev

On May 15, 2009, at 9:14 AM, briantroy wrote: ...
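Matt's earlier note about the warning field (warning in JSON, twitter:warning in Atom) suggests a second, belt-and-braces check a client can make before following next_page. The sketch below assumes the 2009-era search.twitter.com JSON shape with top-level "warning" and "next_page" keys; the helper name is mine:

```python
def should_stop_paging(response):
    """Stop following next_page if the Search API has flagged a
    warning (e.g. since_id was dropped from the paged query) or if
    there is no next_page link left to follow."""
    return "warning" in response or "next_page" not in response

# A response carrying a warning means the since_id is gone: stop.
resp = {"results": [], "warning": "since_id removed", "next_page": "?page=2&q=followfriday"}
print(should_stop_paging(resp))  # True
```

Combined with the id comparison against the original since_id, either signal ends pagination before the client re-fetches tweets it already stored.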