An interesting update! Hard to say whether something on our side or theirs 
has changed, or if it's merely a matter of the average load coming from the 
UrlFetch proxies... If this rears its head again in future, I think it's 
worth reporting to the Public Issue Tracker 
<http://code.google.com/p/google-cloud-platform/issues/list>.

Cheers,

Nick
Cloud Platform Community Support

On Sunday, May 15, 2016 at 11:55:07 PM UTC-4, Ryan Barrett wrote:
>
> just to follow up, instagram has evidently stopped blocking/throttling 
> app engine's IPs, or whatever else was happening here. i can now 
> successfully fetch www.instagram.com profile and photo pages from a 
> few different app engine apps. 
>
> On Fri, May 6, 2016 at 1:25 PM, 'Nick (Cloud Platform Support)' via 
> Google App Engine <google-appengine@googlegroups.com> wrote: 
> > Thanks for the details! Hopefully this thread is useful to future users. 
> > 
> > 
> > On Friday, May 6, 2016 at 2:57:49 PM UTC-4, Ryan Barrett wrote: 
> >> 
> >> On Fri, May 6, 2016 at 9:47 AM, 'Nick (Cloud Platform Support)' via 
> >> Google App Engine <google-appengine@googlegroups.com> wrote: 
> >> > Hey Ryan, 
> >> > 
> >> > Glad to be of assistance, and I really want to get to the bottom of 
> >> > this. 
> >> > Reviewing the infrastructure used by UrlFetch, this absolutely does make
> >> > sense, when we consider this tantalizing detail from the documentation:
> >> > 
> >> >> The URL Fetch service uses an HTTP/1.1 compliant proxy to fetch the 
> >> >> result. 
> >> 
> >> yup. if urlfetch is behind a small or even medium sized set of VIPs or 
> >> IP blocks, and instagram rate limits their www based on IP (individual 
> >> or block), that's it. you could data mine urlfetch's logs and find the 
> >> offending app(s), if any, and play the abuse whack-a-mole game, 
> >> but...meh. 
> >> 
> >> > which sort of proxy configuration you're using, the type of proxy, the
> >> > average request load, etc.?
> >> 
> >> sure! it's dirt simple, just apache mod_proxy with these lines in 
> >> httpd.conf: 
> >> 
> >> SSLProxyEngine on 
> >> <Location "/instagram/"> 
> >>     ProxyPass "https://www.instagram.com/"
> >> </Location> 
> >> 
> >> my load is minuscule and pretty constant, roughly 1-2qm on average.
> >> most of that is to profile URLs (evenly spread across ~500 users), the
> >> rest to individual photo URLs, eg
> >> https://www.instagram.com/p/BE4xLpmABFz/.
> >> 
> >> i used to do 3-5x that much before i throttled down recently. i 
> >> haven't tried, but i expect i could ramp back up to that on the 
> >> reverse proxy and not get 429ed. 
> >> 
> >> 
> >> > On Thursday, May 5, 2016 at 6:36:12 PM UTC-4, Ryan Barrett wrote: 
> >> >> 
> >> >> thanks for going above and beyond, nick! much appreciated. i'm
> >> >> currently working around it by using a reverse proxy outside of app
> >> >> engine, so that my requests are charged to a different IP and isolated
> >> >> from other app engine apps. glad this info is here now for other
> >> >> people too.
> >> >> 
> >> >> 
> >> >> On Thu, May 5, 2016 at 2:13 PM, 'Nick (Cloud Platform Support)' via 
> >> >> Google App Engine <google-appengine@googlegroups.com> wrote: 
> >> >> > 
> >> >> > After some extensive testing, I've determined that the 429 you're
> >> >> > receiving is expected behaviour from instagram, and it does relate to a
> >> >> > windowing average, although it may not be the same as that published in
> >> >> > their documentation for APIs. After sending a few thousand requests in a
> >> >> > span of ~15 seconds, I began to receive 429 responses, with some 200's
> >> >> > intermixed.
> >> >> > 
> >> >> > Cheers, 
> >> >> > 
> >> >> > Nick 
> >> >> > Cloud Platform Community Support 
> >> >> > 
> >> >> > On Wednesday, May 4, 2016 at 3:21:46 PM UTC-4, Ryan Barrett wrote: 
> >> >> >> 
> >> >> >> On Wed, May 4, 2016 at 12:02 PM, 'Nick (Cloud Platform Support)' via
> >> >> >> Google App Engine <google-appengine@googlegroups.com> wrote:
> >> >> >> > So, you're attempting merely to fetch http://www.instagram.com/, and
> >> >> >> > you receive 429 on the first request, and you're not launching many
> >> >> >> > other requests at the same time? It seems odd that a rate-limit
> >> >> >> > response would come without a condition being reached requiring
> >> >> >> > rate-limiting... Let me
> >> >> >> 
> >> >> >> i'm actually fetching profile URLs, not the front page. eg
> >> >> >> `import urllib2; urllib2.urlopen('https://www.instagram.com/kevin/')`
> >> >> >> in https://shell-hrd.appspot.com/ gets 429ed even though i'm not
> >> >> >> fetching that particular URL in any of my apps.
> >> >> >>
> >> >> >> it definitely seems odd, agreed. i only suspect rate limiting/blocking
> >> >> >> at the IP level because i exhausted the other obvious causes. i'd be
> >> >> >> happy to be proven wrong!
> >> >> >> 
> >> >> >> 
> >> >> >> > know what you think in your reply. 
> >> >> >> > 
> >> >> >> > Cheers, 
> >> >> >> > 
> >> >> >> > Nick 
> >> >> >> > Cloud Platform Community Support 
> >> >> >> > 
> >> >> >> > On Wednesday, May 4, 2016 at 1:09:35 PM UTC-4, Ryan Barrett wrote:
> >> >> >> >>
> >> >> >> >> thanks for the replies! i should have emphasized that this is for
> >> >> >> >> www.instagram.com, not the API. API requests are working fine.
> >> >> >> >>
> >> >> >> >> you're right that IP blocking wouldn't usually be the first culprit in
> >> >> >> >> general, especially for 429s. i tried from a few different apps, though,
> >> >> >> >> including shell-hrd (log in my first post), which pretty much never uses
> >> >> >> >> urlfetch otherwise based on its quota numbers, so i doubt it's User-Agent
> >> >> >> >> blocking. i tried an entirely new www.instagram.com URL and still got a
> >> >> >> >> 429, so it's probably not specific URLs, at least due to my own traffic.
> >> >> >> >> and i can fetch the same URL fine from my local machine. hence my IP
> >> >> >> >> suspicion.
> >> >> >> >>
> >> >> >> >> i've already worked around this, so it's not urgent. just figured you
> >> >> >> >> all might want to know. thanks again!
> >> >> >> >> 
> >> >> >> >> On Monday, May 2, 2016 at 10:52:11 AM UTC-7, Nick (Cloud Platform
> >> >> >> >> Support) wrote:
> >> >> >> >>>
> >> >> >> >>> Hey Ryan,
> >> >> >> >>>
> >> >> >> >>> I'm unsure that this indicates that App Engine specifically is being
> >> >> >> >>> rate-limited. It's likely that the 429 response is directly related to
> >> >> >> >>> the frequency with which you're making requests, regardless of the
> >> >> >> >>> origin of those requests. While not impossible, I suppose, it would be
> >> >> >> >>> surprising if they were keeping track of App Engine IP ranges and
> >> >> >> >>> applying a different rate-limit, and would require some thorough A/B
> >> >> >> >>> testing to prove. So, I recommend just checking their documentation or,
> >> >> >> >>> if the rate-limit is undocumented, benchmarking to attempt to determine
> >> >> >> >>> it, and try to fly under it. Generally, exponential-backoff is a good
> >> >> >> >>> tactic when dealing with rate-limiting.
> >> >> >> >>> 
> >> >> >> >>> Sincerely, 
> >> >> >> >>> 
> >> >> >> >>> Nick 
> >> >> >> >>> Cloud Platform Community Support 
> >> >> >> >>> 
> >> >> >> >>> On Monday, May 2, 2016 at 11:57:15 AM UTC-4, Nickolas Daskalou
> >> >> >> >>> wrote:
> >> >> >> >>>>
> >> >> >> >>>> Hi Ryan,
> >> >> >> >>>>
> >> >> >> >>>> It seems to be working fine for us (SocialPage.me).
> >> >> >> >>>>
> >> >> >> >>>> Are you accessing their API using separate access tokens for each
> >> >> >> >>>> user?
> >> >> >> >>>>
> >> >> >> >>>> Nick
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> On 2 May 2016 at 14:30, Ryan Barrett <goo...@ryanb.org> wrote:
> >> >> >> >>>>>
> >> >> >> >>>>> hi all! just FYI, it looks like Instagram is blocking/rate limiting App
> >> >> >> >>>>> Engine's IPs from fetching www.instagram.com, both urlfetch and sockets,
> >> >> >> >>>>> across apps. e.g. this session from https://shell-hrd.appspot.com/ :
> >> >> >> >>>>> 
> >> >> >> >>>>> >>> urllib2.urlopen('https://www.instagram.com/snarfed/')
> >> >> >> >>>>> Traceback (most recent call last):
> >> >> >> >>>>> ...
> >> >> >> >>>>>   File "/base/data/home/runtimes/python/python_dist/lib/python2.5/urllib2.py", line 506, in http_error_default
> >> >> >> >>>>>     raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
> >> >> >> >>>>> HTTPError: HTTP Error 429: Unknown
> >> >> >> >>>>> 
> >> >> >> >>>>> it's not 100% consistent - i occasionally see requests make it through
> >> >> >> >>>>> - but the majority get 429ed.
> >> >> >> >>>>>
> >> >> >> >>>>> not holding my breath, but i figured you all might want to know,
> >> >> >> >>>>> especially in case cloud support people have lines of communication
> >> >> >> >>>>> open with instagram/facebook for this kind of thing.
> >> >> >> >>>>> 
> >> >> >> >>>>> --
> >> >> >> >>>>> You received this message because you are subscribed to the Google
> >> >> >> >>>>> Groups "Google App Engine" group.
> >> >> >> >>>>> To unsubscribe from this group and stop receiving emails from it, send
> >> >> >> >>>>> an email to google-appengi...@googlegroups.com.
> >> >> >> >>>>> To post to this group, send email to google-a...@googlegroups.com.
> >> >> >> >>>>> Visit this group at https://groups.google.com/group/google-appengine.
> >> >> >> >>>>> To view this discussion on the web visit
> >> >> >> >>>>> https://groups.google.com/d/msgid/google-appengine/be7f6ead-fe34-45c4-9ee0-00956b5f89de%40googlegroups.com.
> >> >> >> >>>>> For more options, visit https://groups.google.com/d/optout.
> >> >> >> >>>> 
> >> >> >> >>>> 
> >> >> >> 
> >> >> >> -- 
> >> >> >> https://snarfed.org/ 
> >> >> > 
> >> >> 
> >> >> -- 
> >> >> https://snarfed.org/ 
> >> > 
> >> 
> >> 
> >> 
> >> -- 
> >> https://snarfed.org/ 
> > 
>
>
>
> -- 
> https://snarfed.org/ 
>
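
For anyone landing on this thread later: the exponential-backoff tactic suggested earlier in the thread could look roughly like the sketch below. It is only an illustration, not code anyone in the thread actually ran. It assumes Python 3's urllib.request rather than the Python 2.5 urllib2 shown in the traceback, and the function name fetch_with_backoff and all of its parameters (fetch, max_attempts, base_delay, max_delay, sleep) are made up for the example.

```python
import random
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, fetch=None, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Fetch url, retrying with exponential backoff while the server answers 429.

    All parameter names and defaults are hypothetical choices for this
    sketch; none of them come from the thread or from Instagram's docs.
    """
    if fetch is None:
        # Default fetcher: a plain GET that returns the response body as bytes.
        fetch = lambda u: urllib.request.urlopen(u).read()
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as e:
            # Only retry rate-limit responses. Re-raise anything else,
            # and give up once the last attempt has failed.
            if e.code != 429 or attempt == max_attempts - 1:
                raise
        # Double the delay on each attempt (1s, 2s, 4s, ...), capped at
        # max_delay, plus random jitter so clients do not retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
```

Note that if the limit really is applied per shared IP, as suspected in this thread, backing off in one app may not be enough, since other apps behind the same UrlFetch proxies would count against the same window. In that case a reverse proxy outside App Engine, as described above, is the more reliable workaround.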

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-appengine+unsubscr...@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/98b89603-c816-4dd8-9f95-c95f9f2ecaec%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.