just to follow up, instagram has evidently stopped blocking/throttling app engine's IPs, or whatever else was happening here. i can now successfully fetch www.instagram.com profile and photo pages from a few different app engine apps.
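
in case the 429s come back for anyone: nick suggested exponential backoff earlier in the thread. here's a minimal sketch of that idea, not what any of my apps actually run. it uses python 3's urllib.request (the successor to urllib2); the function name, the injectable `opener`/`sleep` parameters, and the defaults are all made up for illustration. note that backoff only helps if the limit applies to your own traffic. if the limit is shared across an IP block, as i suspected here, it may not help much.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, opener=urllib.request.urlopen, sleep=time.sleep,
                       max_tries=5, base_delay=1.0):
    """Fetch url, retrying with exponential backoff while the server returns 429.

    opener and sleep are injectable (hypothetical parameters) so the retry
    logic can be exercised without touching the network.
    """
    for attempt in range(max_tries):
        try:
            return opener(url)
        except urllib.error.HTTPError as e:
            # Only retry on 429 Too Many Requests. Re-raise anything else,
            # and give up once the last attempt has failed.
            if e.code != 429 or attempt == max_tries - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

usage would be something like `fetch_with_backoff('https://www.instagram.com/snarfed/').read()`. a real version would probably also add jitter and honor a Retry-After header if the server sends one.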

On Fri, May 6, 2016 at 1:25 PM, 'Nick (Cloud Platform Support)' via Google App Engine <google-appengine@googlegroups.com> wrote:
> Thanks for the details! Hopefully this thread is useful to future users.
>
> On Friday, May 6, 2016 at 2:57:49 PM UTC-4, Ryan Barrett wrote:
>>
>> On Fri, May 6, 2016 at 9:47 AM, 'Nick (Cloud Platform Support)' via
>> Google App Engine <google-appengine@googlegroups.com> wrote:
>> > Hey Ryan,
>> >
>> > Glad to be of assistance, and I really want to get to the bottom of
>> > this. Reviewing the infrastructure used by UrlFetch, this absolutely
>> > does make sense, when we consider this tantalizing detail from the
>> > documentation:
>> >
>> >> The URL Fetch service uses an HTTP/1.1 compliant proxy to fetch the
>> >> result.
>>
>> yup. if urlfetch is behind a small or even medium sized set of VIPs or
>> IP blocks, and instagram rate limits their www based on IP (individual
>> or block), that's it. you could data mine urlfetch's logs and find the
>> offending app(s), if any, and play the abuse whack-a-mole game,
>> but...meh.
>>
>> > which sort of proxy configuration you're using, the type of proxy, the
>> > average request load, etc.?
>>
>> sure! it's dirt simple, just apache mod_proxy with these lines in
>> httpd.conf:
>>
>> SSLProxyEngine on
>> <Location "/instagram/">
>>   ProxyPass "https://www.instagram.com/"
>> </Location>
>>
>> my load is minuscule and pretty constant, roughly 1-2qm on average.
>> most of that is to profile URLs (evenly spread across ~500 users), the
>> rest to individual photo URLs like eg
>> https://www.instagram.com/p/BE4xLpmABFz/.
>>
>> i used to do 3-5x that much before i throttled down recently. i
>> haven't tried, but i expect i could ramp back up to that on the
>> reverse proxy and not get 429ed.
>>
>> > On Thursday, May 5, 2016 at 6:36:12 PM UTC-4, Ryan Barrett wrote:
>> >>
>> >> thanks for going above and beyond, nick! much appreciated.
>> >> i'm currently working around it by using a reverse proxy outside of app
>> >> engine, so that my requests are charged to a different IP and isolated
>> >> from other app engine apps. glad this info is here now for other
>> >> people too.
>> >>
>> >> On Thu, May 5, 2016 at 2:13 PM, 'Nick (Cloud Platform Support)' via
>> >> Google App Engine <google-appengine@googlegroups.com> wrote:
>> >> >
>> >> > After some extensive testing, I've determined that the 429 you're
>> >> > receiving is expected behaviour from instagram, and it does relate to
>> >> > a windowing average, although it may not be the same as that published
>> >> > in their documentation for APIs. After sending a few thousand requests
>> >> > in a span of ~15 seconds, I began to receive 429 responses, with some
>> >> > 200's intermixed.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Nick
>> >> > Cloud Platform Community Support
>> >> >
>> >> > On Wednesday, May 4, 2016 at 3:21:46 PM UTC-4, Ryan Barrett wrote:
>> >> >>
>> >> >> On Wed, May 4, 2016 at 12:02 PM, 'Nick (Cloud Platform Support)' via
>> >> >> Google App Engine <google-appengine@googlegroups.com> wrote:
>> >> >> > So, you're attempting merely to fetch http://www.instagram.com/,
>> >> >> > and you receive 429 on the first request, and you're not launching
>> >> >> > many other requests at the same time? It seems odd that a rate-limit
>> >> >> > response would come without a condition being reached requiring
>> >> >> > rate-limiting... Let me
>> >> >>
>> >> >> i'm actually fetching profile URLs, not the front page. eg `import
>> >> >> urllib2; urllib2.urlopen('https://www.instagram.com/kevin/')` in
>> >> >> https://shell-hrd.appspot.com/ gets 429ed even though i'm not
>> >> >> fetching that particular URL in any of my apps.
>> >> >>
>> >> >> it definitely seems odd, agreed.
>> >> >> i only suspect rate limiting/blocking at the IP level because i
>> >> >> exhausted the other obvious causes. i'd be happy to be proven wrong!
>> >> >>
>> >> >> > know what you think in your reply.
>> >> >> >
>> >> >> > Cheers,
>> >> >> >
>> >> >> > Nick
>> >> >> > Cloud Platform Community Support
>> >> >> >
>> >> >> > On Wednesday, May 4, 2016 at 1:09:35 PM UTC-4, Ryan Barrett wrote:
>> >> >> >>
>> >> >> >> thanks for the replies! i should have emphasized that this is for
>> >> >> >> www.instagram.com, not the API. API requests are working fine.
>> >> >> >>
>> >> >> >> you're right that IP blocking wouldn't usually be the first culprit
>> >> >> >> in general, especially for 429s. i tried from a few different apps,
>> >> >> >> though, including shell-hrd (log in my first post), which pretty
>> >> >> >> much never uses urlfetch otherwise based on its quota numbers, so i
>> >> >> >> doubt it's User-Agent blocking. i tried an entirely new
>> >> >> >> www.instagram.com URL and still got a 429, so it's probably not
>> >> >> >> specific URLs, at least due to my own traffic. and i can fetch the
>> >> >> >> same URL fine from my local machine. hence my IP suspicion.
>> >> >> >>
>> >> >> >> i've already worked around this, so it's not urgent. just figured
>> >> >> >> you all might want to know. thanks again!
>> >> >> >>
>> >> >> >> On Monday, May 2, 2016 at 10:52:11 AM UTC-7, Nick (Cloud Platform
>> >> >> >> Support) wrote:
>> >> >> >>>
>> >> >> >>> Hey Ryan,
>> >> >> >>>
>> >> >> >>> I'm unsure that this indicates that App Engine specifically is
>> >> >> >>> being rate-limited.
>> >> >> >>> It's likely that the 429 response is directly related to the
>> >> >> >>> frequency with which you're making requests, regardless of the
>> >> >> >>> origin of those requests. While not impossible, I suppose, it
>> >> >> >>> would be surprising if they were keeping track of App Engine IP
>> >> >> >>> ranges and applying a different rate-limit, and would require
>> >> >> >>> some thorough A/B testing to prove. So, I recommend just checking
>> >> >> >>> their documentation or, if the rate-limit is undocumented,
>> >> >> >>> benchmarking to attempt to determine it, and try to fly under it.
>> >> >> >>> Generally, exponential-backoff is a good tactic when dealing with
>> >> >> >>> rate-limiting.
>> >> >> >>>
>> >> >> >>> Sincerely,
>> >> >> >>>
>> >> >> >>> Nick
>> >> >> >>> Cloud Platform Community Support
>> >> >> >>>
>> >> >> >>> On Monday, May 2, 2016 at 11:57:15 AM UTC-4, Nickolas Daskalou
>> >> >> >>> wrote:
>> >> >> >>>>
>> >> >> >>>> Hi Ryan,
>> >> >> >>>>
>> >> >> >>>> It seems to be working fine for us (SocialPage.me).
>> >> >> >>>>
>> >> >> >>>> Are you accessing their API using separate access tokens for
>> >> >> >>>> each user?
>> >> >> >>>>
>> >> >> >>>> Nick
>> >> >> >>>>
>> >> >> >>>> On 2 May 2016 at 14:30, Ryan Barrett <goo...@ryanb.org> wrote:
>> >> >> >>>>>
>> >> >> >>>>> hi all! just FYI, it looks like Instagram is blocking/rate
>> >> >> >>>>> limiting App Engine's IPs from fetching www.instagram.com, both
>> >> >> >>>>> urlfetch and sockets, across apps. e.g.
>> >> >> >>>>> this session from https://shell-hrd.appspot.com/ :
>> >> >> >>>>>
>> >> >> >>>>> >>> urllib2.urlopen('https://www.instagram.com/snarfed/')
>> >> >> >>>>> Traceback (most recent call last):
>> >> >> >>>>>   ...
>> >> >> >>>>>   File "/base/data/home/runtimes/python/python_dist/lib/python2.5/urllib2.py", line 506, in http_error_default
>> >> >> >>>>>     raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
>> >> >> >>>>> HTTPError: HTTP Error 429: Unknown
>> >> >> >>>>>
>> >> >> >>>>> it's not 100% consistent - i occasionally see requests make it
>> >> >> >>>>> through - but the majority get 429ed.
>> >> >> >>>>>
>> >> >> >>>>> not holding my breath, but i figured you all might want to
>> >> >> >>>>> know, especially in case cloud support people have lines of
>> >> >> >>>>> communication open with instagram/facebook for this kind of
>> >> >> >>>>> thing.
>> >> >> >>>>>
>> >> >> >>>>> --
>> >> >> >>>>> You received this message because you are subscribed to the
>> >> >> >>>>> Google Groups "Google App Engine" group.
>> >> >> >>>>> To unsubscribe from this group and stop receiving emails from
>> >> >> >>>>> it, send an email to google-appengi...@googlegroups.com.
>> >> >> >>>>> To post to this group, send email to
>> >> >> >>>>> google-a...@googlegroups.com.
>> >> >> >>>>> Visit this group at
>> >> >> >>>>> https://groups.google.com/group/google-appengine.
>> >> >> >>>>> To view this discussion on the web visit
>> >> >> >>>>> https://groups.google.com/d/msgid/google-appengine/be7f6ead-fe34-45c4-9ee0-00956b5f89de%40googlegroups.com.
>> >> >> >>>>> For more options, visit https://groups.google.com/d/optout.

--
https://snarfed.org/

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/CA%2BcaGh_fFrC%3DF6xo-FJQVJqTwjruBbFi2kN4n3iSCHEJEJnWgw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.