Re: [Distutils] PyPI index workaround

2016-07-19 Thread Михаил Голубев
Well, it isn't that bandwidth intensive: these recent changes only cause
extra (but tiny and potentially faster) requests to PyPI. We've also
decided to update package versions only once a day unless the user
explicitly forces a refresh of the package list. If PyPI is unavailable for
some reason, e.g. due to network problems, we simply don't show package
versions; it won't break any other IDE functionality.
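
A minimal sketch of the refresh policy described above, not PyCharm's actual implementation: look-ups are reused for a day unless a refresh is forced, and a network failure simply yields no version. The class name, the `fetch` callable and the injectable `clock` are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple

ONE_DAY_SECONDS = 24 * 60 * 60  # "only once a day", as stated in the message


class LatestVersionCache:
    """Caches latest-version look-ups and refreshes them at most once a day."""

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.time,
                 max_age: float = ONE_DAY_SECONDS) -> None:
        self._fetch = fetch      # hypothetical callable: package name -> latest version
        self._clock = clock      # injectable so the policy can be exercised without waiting
        self._max_age = max_age
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (version, fetched_at)

    def latest(self, name: str, force: bool = False) -> Optional[str]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and not force and now - cached[1] < self._max_age:
            return cached[0]  # still fresh: no request is made
        try:
            version = self._fetch(name)
        except OSError:
            # Network problems (urllib's URLError is an OSError subclass):
            # show no version rather than break anything else.
            return None
        self._entries[name] = (version, now)
        return version
```

Passing `force=True` models the user explicitly refreshing the package list.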

2016-07-15 1:45 GMT+02:00 Matt Bacchi :

> OT: I hope you're going to provide a setting to allow the user to disable
> this unnecessary and bandwidth intensive 'feature'?
>
> -Matt
>
> From: "Михаил Голубев" 
> To: Donald Stufft 
> Cc: Dmitry Trofimov ,
> distutils-sig@python.org
> Date: Wed, 13 Jul 2016 23:21:24 +0300
> Subject: Re: [Distutils] PyPI index workaround
> Right, sorry, that initial question wasn't clear about that.
>
> We need the latest versions only for installed packages. Nonetheless, as
> you noted, it's still several dozen consecutive requests to
> "/simple/" for each PyCharm session of every user.
>
> Can you handle that?
>
> ___
> Distutils-SIG maillist  -  Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>
>


-- 
Best regards
Mikhail Golubev
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI index workaround

2016-07-14 Thread Matt Bacchi
OT: I hope you're going to provide a setting to allow the user to disable
this unnecessary and bandwidth intensive 'feature'?

-Matt

From: "Михаил Голубев" 
To: Donald Stufft 
Cc: Dmitry Trofimov ,
distutils-sig@python.org
Date: Wed, 13 Jul 2016 23:21:24 +0300
Subject: Re: [Distutils] PyPI index workaround
Right, sorry, that initial question wasn't clear about that.

We need the latest versions only for installed packages. Nonetheless, as
you noted, it's still several dozen consecutive requests to
"/simple/" for each PyCharm session of every user.

Can you handle that?


Re: [Distutils] PyPI index workaround

2016-07-14 Thread Donald Stufft

> On Jul 14, 2016, at 5:30 AM, Михаил Голубев  wrote:
> 
> Ok, you convinced me that these extra requests from PyCharm won't cause you 
> any problems. Impressive stats, by the way :)
> 
> We will focus on migrating our packaging-related features to these new 
> endpoints; hopefully, it won't take long. Note, however, that we need to 
> prepare updates for already released versions of PyCharm. We'll let you know 
> as soon as everything is ready.
> 
> Ernest W. Durbin III suggested changing User-Agent, so that it's clear which 
> requests come from PyCharm. To me it seems a fair point.
> 
> A batch API, as mentioned by Steve Dower, would be very welcome anyway. Also, 
> the "/simple" index is still an HTML page. Honestly, it's a bit cumbersome that 
> this information can be obtained only by scraping HTML, while for everything 
> else there are a JSON REST API and XML-RPC.

Yeah, I plan on a new “next gen” API in Warehouse at some point that will be 
much cleaner overall and not require multiple different formats to use :). For 
the record, XML-RPC should be avoided where possible as well, since we can’t 
cache it in the CDN (it’s a POST request to the same URL for all 
routes, and the CDN can’t inspect the body of a POST request to determine the 
cache key). 
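
The caching point can be illustrated with a toy model. This is only a sketch of the idea and not how Fastly is configured; the function name is invented, and the URLs are assumed examples of the per-project JSON endpoint and the XML-RPC endpoint as they existed at the time.

```python
from typing import Optional


def cdn_cache_key(method: str, url: str) -> Optional[str]:
    """Toy model of a cache keyed on URL only; the request body is never inspected."""
    if method.upper() not in ("GET", "HEAD"):
        return None  # e.g. XML-RPC: every call is a POST to one URL, so nothing to key on
    return url


# Per-project JSON look-ups get distinct keys, so each can be cached separately.
json_keys = {cdn_cache_key("GET", f"https://pypi.python.org/pypi/{name}/json")
             for name in ("requests", "flask")}

# Two different XML-RPC calls look identical from the outside: same method, same URL.
xmlrpc_keys = {cdn_cache_key("POST", "https://pypi.python.org/pypi") for _ in range(2)}
```

In this model `json_keys` holds two separate cache keys, while `xmlrpc_keys` holds only `None`, i.e. nothing cacheable.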

> 
> Is anyone from PyPA attending EuroPython next week? We could discuss these 
> matters further there.

I’m not. I’m not sure if anyone else is.

—
Donald Stufft





Re: [Distutils] PyPI index workaround

2016-07-14 Thread Михаил Голубев
Ok, you convinced me that these extra requests from PyCharm won't cause you
any problems. Impressive stats, by the way :)

We will focus on migrating our packaging-related features to these new
endpoints; hopefully, it won't take long. Note, however, that we need to
prepare updates for already released versions of PyCharm. We'll let you
know as soon as everything is ready.

Ernest W. Durbin III suggested changing User-Agent, so that it's clear
which requests come from PyCharm. To me it seems a fair point.
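
As a rough illustration of the User-Agent suggestion, assuming a plain `urllib` client: the identifier string below is hypothetical, since the thread does not say what PyCharm would actually send.

```python
import urllib.request

# Hypothetical identifier, chosen only for this example.
USER_AGENT = "ExampleIDE/2016.2 (package-version-check)"


def build_request(url: str) -> urllib.request.Request:
    """Build a request that identifies the client, so the index operators can attribute traffic."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

The returned object can be passed to `urllib.request.urlopen` in place of a bare URL.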

A batch API, as mentioned by Steve Dower, would be very welcome anyway. Also,
the "/simple" index is still an HTML page. Honestly, it's a bit cumbersome that
this information can be obtained only by scraping HTML, while for everything
else there are a JSON REST API and XML-RPC.

Is anyone from PyPA attending EuroPython next week? We could discuss
these matters further there.







2016-07-13 23:54 GMT+03:00 Donald Stufft :

>
> On Jul 13, 2016, at 4:21 PM, Михаил Голубев  wrote:
>
> Can you handle that?
>
>
>
> Oh, and just to put things in scale in the past 30 days:
>
> * PyPI has served > 3 billion HTTP requests.
> * PyPI has served > 327TB of bandwidth.
> * The 95%tile for cache hit vs cache miss is 92%.
> * We regularly serve >1,000 concurrent requests -
> https://s.caremad.io/QDTlK0mRj7/
>
> —
> Donald Stufft
>
>
>
>


-- 
Best regards
Mikhail Golubev


Re: [Distutils] PyPI index workaround

2016-07-13 Thread Donald Stufft

> On Jul 13, 2016, at 4:21 PM, Михаил Голубев  wrote:
> 
> Can you handle that?


Oh, and just to put things in perspective, over the past 30 days:

* PyPI has served > 3 billion HTTP requests.
* PyPI has served > 327TB of bandwidth.
* The 95%tile for cache hit vs cache miss is 92%.
* We regularly serve >1,000 concurrent requests - 
https://s.caremad.io/QDTlK0mRj7/ 

—
Donald Stufft





Re: [Distutils] PyPI index workaround

2016-07-13 Thread Donald Stufft

> On Jul 13, 2016, at 4:21 PM, Михаил Голубев  wrote:
> 
> Right, sorry, that initial question wasn't clear about that. 
> 
> We need the latest versions only for installed packages. Nonetheless, as you 
> noted, it's still several dozen consecutive requests to 
> "/simple/" for each PyCharm session of every user. 
> 
> Can you handle that?


The short answer is yes.

The longer answer is that we have Fastly acting as a CDN in front of PyPI, and 
serving an item out of the cache in Fastly is essentially free for us in terms 
of resources (obviously Fastly needs to handle that load, but they’re well 
equipped to handle much larger loads than we are). Thus, the more cacheable a 
URL is (and the longer lived a particular cache item can be), the easier it is 
for us to scale that URL on PyPI.

The URL you’re currently using has a few downsides that prevent it from being 
cached effectively:

* The URL is a “UI” URL, so it includes information like current logged in user 
and thus we need to Vary: Cookie which means it’s less likely to be cached at 
all since each unique cookie header adds another response to be cached for that 
URL, and Fastly will only save ~200 responses per URL before it starts to evict 
some.

* Similarly to the above, since it’s a “UI” URL, people expect it to update 
fairly quickly. Because legacy PyPI wasn’t implemented with long-lived caching 
and purging on updates in mind, it was easier to simply give the cached object 
a short (5 minute IIRC) TTL rather than a long-lived TTL with purging (as we do 
for the “API” URLs).

* Responses that act as collections of projects need to be invalidated anytime 
something changes that may invalidate that collection. In an API that lists 
every project and the latest version, that means it needs to be invalidated 
anytime something releases a new version.
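
The effect of `Vary: Cookie` described in the first point can be sketched with a toy cache. This is an illustration only: the class is invented, the per-URL limit is the approximate figure quoted above, and least-recently-used eviction is an assumption rather than a statement about how Fastly evicts.

```python
from collections import OrderedDict
from typing import Callable, Dict

MAX_VARIANTS_PER_URL = 200  # the approximate per-URL limit quoted above


class VaryingCache:
    """Toy cache that optionally stores one response variant per Cookie header."""

    def __init__(self, vary_on_cookie: bool) -> None:
        self.vary_on_cookie = vary_on_cookie
        self.variants: Dict[str, "OrderedDict[str, str]"] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str, cookie: str, render: Callable[[], str]) -> str:
        per_url = self.variants.setdefault(url, OrderedDict())
        key = cookie if self.vary_on_cookie else ""
        if key in per_url:
            per_url.move_to_end(key)  # mark this variant as recently used
            self.hits += 1
            return per_url[key]
        self.misses += 1
        body = render()  # stands in for a request to the origin servers
        per_url[key] = body
        if len(per_url) > MAX_VARIANTS_PER_URL:
            per_url.popitem(last=False)  # evict the least recently used variant
        return body
```

With more distinct cookies than the limit allows, a cache that varies on the cookie keeps missing, whereas one that ignores the cookie renders the page once and serves every later request from cache.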

Compare that to looking at /simple/ and then either accessing /simple// or 
/pypi//json (all of which are cached for long periods of time and purged 
on demand).

* None of those are “UI” URLs, so they have long cache times and they do not 
Vary on Cookie.

* For /simple/ we don’t list any versions; we only list the projects themselves. 
This means that we only need to invalidate this page whenever a brand new 
project is added to PyPI or an existing project is completely deleted. This 
occurs far less often than a new release of an existing project.

* For /simple/ we don’t need to do any particularly heavy duty querying, it’s a 
simple select on an ~80k length table (versus a select on an 80k length table, 
with a join to a 500k length table) and is fairly quick to render.

* For /simple// and /pypi//json these are scoped to an individual 
project, so they can be cached for a very long time and only invalidated when 
that particular project releases, not when _any_ project releases. This means 
that the likelihood we can serve one of these out of cache is VERY high.

* For /simple// and /pypi//json our SQL queries are relatively quick 
because they don’t need to operate over the entire table, but only over the 
records for one single project.
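
The invalidation rules in the list above can be summarised as a small function. This is a sketch under assumptions: the event names are invented, and the per-project paths assume that the placeholder missing from the URLs above is the project name.

```python
from typing import Set


def urls_to_purge(event: str, project: str) -> Set[str]:
    """Return the cached paths that an event invalidates, per the rules above."""
    per_project = {f"/simple/{project}/", f"/pypi/{project}/json"}
    if event == "release":
        # A new release only touches that project's own pages.
        return per_project
    if event in ("project_created", "project_deleted"):
        # Only creating or deleting a project changes the list of names on /simple/.
        return per_project | {"/simple/"}
    raise ValueError(f"unknown event: {event!r}")
```

A page listing every project with its latest version would, by contrast, have to be purged on every `release` event for any project.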

Given all of the above, and the fact that listing every project and its 
latest version is *slow* and resource intensive, yes, it’s very likely that 
making the individual per-project requests will be far better for our ability 
to serve you, because those extra requests will almost certainly be served 
straight from the Fastly caches and never hit our origin servers at all.
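
For a client, the per-project approach might look like the following sketch. It assumes the per-project JSON endpoint at its present-day host (the thread itself used pypi.python.org) and reads the `info.version` field of the response; the function names and the injectable `fetch` parameter are illustrative choices.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Optional

# Present-day host; an assumption relative to the thread, which predates it.
JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_json(url: str) -> dict:
    """Fetch and decode one JSON document."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def latest_versions(names: Iterable[str],
                    fetch: Callable[[str], dict] = fetch_json) -> Dict[str, Optional[str]]:
    """Look up the latest version of each named project, one small request per project."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = fetch(JSON_URL.format(name=name))["info"]["version"]
        except (OSError, KeyError, ValueError):
            result[name] = None  # unreachable, unknown project, or unexpected payload
    return result
```

Calling `latest_versions(["requests", "flask"])` issues two requests, each of which the CDN can answer independently of any other project's releases.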

—
Donald Stufft





Re: [Distutils] PyPI index workaround

2016-07-13 Thread Steve Dower
I'm also interested (for the same support in Visual Studio) though we're 
unaffected by this change.

A batch API to get info for many packages would be great. Currently we scrape 
simple and then post JSON queries for individual packages.
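
Scraping the simple index can be done with the standard library alone. The sketch below is not the Visual Studio or PyCharm code; it assumes only that the page is a list of anchors whose text is the project name, and the class and function names are invented.

```python
from html.parser import HTMLParser
from typing import List


class SimpleIndexParser(HTMLParser):
    """Collects the text of every <a> element on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.projects: List[str] = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text inside an anchor is treated as a project name.
        if self._in_anchor and data.strip():
            self.projects.append(data.strip())


def parse_simple_index(html: str) -> List[str]:
    """Return the project names listed on a simple-index page."""
    parser = SimpleIndexParser()
    parser.feed(html)
    parser.close()
    return parser.projects
```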

Cheers,
Steve

Top-posted from my Windows Phone

-----Original Message-----
From: "Михаил Голубев" 
Sent: 7/13/2016 13:04
To: "Donald Stufft" 
Cc: "distutils-sig@python.org" 
Subject: Re: [Distutils] PyPI index workaround

I'm sorry, I should have posted my commentary here, not in the separate thread.
 
We have some issues with the suggested "/simple" endpoint. Despite the need to 
scrape the web page, the old endpoint allowed us to quickly find the latest 
versions of the packages hosted on PyPI. We did a single request on IDE startup 
and showed outdated installed packages in the settings later. The "/simple" 
index, however, contains only package names and links to the dedicated pages 
with their artifacts (not for each of them, though). It means that now we have 
to make tons of individual requests to find the latest published version for 
each installed package. Isn't that going to load the service even more?


So, yes, we're interested most in the latest version of a package. 


2016-07-13 21:57 GMT+03:00 Donald Stufft :



On Jul 13, 2016, at 2:43 PM, Dmitry Trofimov  
wrote:


Hi,


to have information about available packages, PyCharm IDE currently parses
the PyPI index page (https://pypi.python.org/pypi?%3Aaction=index).
As it is going to be deprecated soon, we are looking for a workaround.


What we need is, making one request, to get the name and the version of all 
PyPI packages. Then we cache this information in the IDE 
(https://github.com/JetBrains/intellij-community/blob/7e16c042a19767d5f548c84f88cc5edd5f9d1721/python/src/com/jetbrains/python/packaging/PyPIPackageUtil.java).


By name and version, do you mean the latest version?


—
Donald Stufft















-- 

Best regards
Mikhail Golubev


Re: [Distutils] PyPI index workaround

2016-07-13 Thread Михаил Голубев
Right, sorry, that initial question wasn't clear about that.

We need the latest versions only for installed packages. Nonetheless, as
you noted, it's still several dozen consecutive requests to
"/simple/" for each PyCharm session of every user.

Can you handle that?

2016-07-13 22:56 GMT+03:00 Donald Stufft :

>
> On Jul 13, 2016, at 3:40 PM, Dmitry Trofimov <
> dmitry.trofi...@jetbrains.com> wrote:
>
> Does that mean that PyPI index page will live for a while until the new
> API is implemented?
>
>
> Yes, though I’m looking at this right now.
>
> I do have a question here though. If I understand the dialog, this is to
> provide a way for people to upgrade packages they have installed, and to
> tell them if there is a newer version or not. So my question here is why do
> you need the latest version for *every* package instead of just the ones
> you have installed?
>
> If you narrow it down to just the ones that are installed, then the number
> of HTTP requests needed with the current APIs goes down from ~80,000 to
> likely <100 or even <50 in most cases.
>
> —
> Donald Stufft
>
>
>
>


-- 
Best regards
Mikhail Golubev


Re: [Distutils] PyPI index workaround

2016-07-13 Thread Михаил Голубев
I'm sorry, I should have posted my commentary here, not in the separate
thread.


> We have some issues with the suggested "/simple" endpoint. Despite the need
> to scrape the web page, the old endpoint allowed us to quickly find the latest
> versions of the packages hosted on PyPI. We did a single request on IDE
> startup and showed outdated installed packages in the settings later. The
> "/simple" index, however, contains only package names and links to the
> dedicated pages with their artifacts (not for each of them, though). It means
> that now we have to make tons of individual requests to find the latest
> published version for each installed package. Isn't that going to load the
> service even more?


So, yes, we're interested most in the latest version of a package.

2016-07-13 21:57 GMT+03:00 Donald Stufft :

>
> On Jul 13, 2016, at 2:43 PM, Dmitry Trofimov <
> dmitry.trofi...@jetbrains.com> wrote:
>
> Hi,
>
> to have information about available packages, PyCharm IDE currently parses
> the PyPI index page (https://pypi.python.org/pypi?%3Aaction=index).
> As it is going to be deprecated soon, we are looking for a workaround.
>
> What we need is, making one request, to get the name and the version of
> all PyPI packages. Then we cache this information in the IDE (
> https://github.com/JetBrains/intellij-community/blob/7e16c042a19767d5f548c84f88cc5edd5f9d1721/python/src/com/jetbrains/python/packaging/PyPIPackageUtil.java
> ).
>
>
> By name and version, do you mean the latest version?
>
> —
> Donald Stufft
>
>
>
>
>
>


-- 
Best regards
Mikhail Golubev


Re: [Distutils] PyPI index workaround

2016-07-13 Thread Donald Stufft

> On Jul 13, 2016, at 3:40 PM, Dmitry Trofimov  
> wrote:
> 
> Does that mean that PyPI index page will live for a while until the new API 
> is implemented? 

Yes, though I’m looking at this right now.

I do have a question here though. If I understand the dialog, this is to 
provide a way for people to upgrade packages they have installed, and to tell 
them if there is a newer version or not. So my question here is why do you need 
the latest version for *every* package instead of just the ones you have 
installed?

If you narrow it down to just the ones that are installed, then the number of 
HTTP requests needed with the current APIs goes down from ~80,000 to likely 
<100 or even <50 in most cases.
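
To get a feel for that number, the installed projects can be enumerated locally. This is a minimal sketch using the standard library's `importlib.metadata`, not what PyCharm does; it reports on the interpreter running the snippet, and the helper names are invented. Names are normalized as described in PEP 503 so that they match the index URLs.

```python
import re
from importlib import metadata
from typing import List


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_project_names() -> List[str]:
    """Return the normalized names of the distributions installed in this environment."""
    names = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip broken installs that carry no name
            names.add(normalize(name))
    return sorted(names)
```

`len(installed_project_names())` is then the number of per-project requests one version check would need.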

—
Donald Stufft





Re: [Distutils] PyPI index workaround

2016-07-13 Thread Dmitry Trofimov
>
>
> Ok, we don’t currently have an API like that (largely because nobody has
> come up with a use case that was pressing enough to need to devote
> resources to it). It was requested though, and is being tracked by
> https://github.com/pypa/warehouse/issues/347. This is likely enough to
> pull this issue onto my radar as a sooner-rather-than-later issue.


Does that mean that PyPI index page will live for a while until the new API
is implemented?

On Wed, Jul 13, 2016 at 9:25 PM, Donald Stufft  wrote:

>
> On Jul 13, 2016, at 3:12 PM, Михаил Голубев  wrote:
>
> I'm sorry, I should have posted my commentary here, not in the separate
> thread.
>
>
>> We have some issues with suggested "/simple" endpoint. Despite the need
>> to scrape the web page, old endpoint allowed us to quickly find latest
>> versions of the packages hosted on PyPI. We did a single request on IDE
>> startup and showed outdated installed packages in the settings later. Index
>> "/simple" however contains only package names and links to the dedicated
>> pages with their artifacts (not for each of them, though). It means that
>> now we have to make tons of individual requests to find the latest
>> published version for each installed package. Isn't it going to load the
>> service even worse?
>
>
> So, yes, we're interested most in the latest version of a package.
>
>
>
> Ok, we don’t currently have an API like that (largely because nobody has
> come up with a use case that was pressing enough to need to devote
> resources to it). It was requested though, and is being tracked by
> https://github.com/pypa/warehouse/issues/347. This is likely enough to
> pull this issue onto my radar as a sooner-rather-than-later issue.
>
>
> —
> Donald Stufft
>
>
>
>


-- 

Dmitry Trofimov
PyCharm Team Lead
JetBrains
http://www.jetbrains.com
The Drive To Develop


Re: [Distutils] PyPI index workaround

2016-07-13 Thread Donald Stufft

> On Jul 13, 2016, at 3:12 PM, Михаил Голубев  wrote:
> 
> I'm sorry, I should have posted my commentary here, not in the separate 
> thread.
>  
> We have some issues with suggested "/simple" endpoint. Despite the need to 
> scrape the web page, old endpoint allowed us to quickly find latest versions 
> of the packages hosted on PyPI. We did a single request on IDE startup and 
> showed outdated installed packages in the settings later. Index "/simple" 
> however contains only package names and links to the dedicated pages with 
> their artifacts (not for each of them, though). It means that now we have to 
> make tons of individual requests to find the latest published version for 
> each installed package. Isn't it going to load the service even worse?
> 
> So, yes, we're interested most in the latest version of a package. 
> 


Ok, we don’t currently have an API like that (largely because nobody has come 
up with a use case that was pressing enough to need to devote resources to it). 
It was requested though, and is being tracked by 
https://github.com/pypa/warehouse/issues/347. This is likely enough to pull 
this issue onto my radar as a sooner-rather-than-later issue.


—
Donald Stufft





Re: [Distutils] PyPI index workaround

2016-07-13 Thread Donald Stufft

> On Jul 13, 2016, at 2:43 PM, Dmitry Trofimov  
> wrote:
> 
> Hi,
> 
> to have information about available packages, PyCharm IDE currently parses
> the PyPI index page (https://pypi.python.org/pypi?%3Aaction=index).
> As it is going to be deprecated soon, we are looking for a workaround.
> 
> What we need is, making one request, to get the name and the version of all 
> PyPI packages. Then we cache this information in the IDE 
> (https://github.com/JetBrains/intellij-community/blob/7e16c042a19767d5f548c84f88cc5edd5f9d1721/python/src/com/jetbrains/python/packaging/PyPIPackageUtil.java).

By name and version, do you mean the latest version?

—
Donald Stufft





[Distutils] PyPI index workaround

2016-07-13 Thread Dmitry Trofimov
Hi,

To get information about available packages, the PyCharm IDE currently parses
the PyPI index page (https://pypi.python.org/pypi?%3Aaction=index).
As it is going to be deprecated soon, we are looking for a workaround.

What we need is to get, in a single request, the name and the version of all
PyPI packages. We then cache this information in the IDE
(https://github.com/JetBrains/intellij-community/blob/7e16c042a19767d5f548c84f88cc5edd5f9d1721/python/src/com/jetbrains/python/packaging/PyPIPackageUtil.java).

What official API could you advise us to look at?

Any hint is appreciated.


Best regards,
Dmitry