Re: [Nix-dev] Upcoming PyPi URL Scheme Change

2016-04-26 Thread Domen Kožar
This is now fixed in master; we should also backport it to 16.03.

Thanks to Freddy:
https://github.com/NixOS/nixpkgs/commit/d5e6a4494a2eb00e52b309fc7a196d84ff8625ec

___
nix-dev mailing list
nix-dev@lists.science.uu.nl
http://lists.science.uu.nl/mailman/listinfo/nix-dev


Re: [Nix-dev] Upcoming PyPi URL Scheme Change

2016-04-20 Thread Dario Bertini
I also started to write some code to automate discovery of python package 
dependencies.

Unfortunately I haven't had the chance to keep working on it. And some of the
formats are quite ambiguous (does the lack of a run_requires key mean that we
should look for the information somewhere else, or does this package have no
dependencies?)

I started by writing some code to munge the setup.py files, to extract some
information from them. Unfortunately it won't be able to work on every setup.py
(unless by using something like fuckit.py, ugh)... Also, due to some grammar
changes, it currently only works with Python 3.4.
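
A minimal sketch of reading dependency information out of a setup.py without
executing it, using only the standard `ast` module. This is not the pypi4all
code; the function name is hypothetical, and it only handles the case where
`install_requires` is written as a literal in the `setup(...)` call. It also
shows the ambiguity mentioned above: "no dependencies declared" and "could not
tell" have to be kept apart.

```python
import ast
from typing import List, Optional


def literal_install_requires(setup_py_source: str) -> Optional[List[str]]:
    """Return the literal install_requires of the first setup(...) call.

    Returns [] when setup() is called without install_requires, and None when
    there is no setup() call or the value is not a plain literal (for example
    a variable), in which case the information must come from somewhere else.
    """
    tree = ast.parse(setup_py_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches both `setup(...)` and `setuptools.setup(...)`.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        for kw in node.keywords:
            if kw.arg == "install_requires":
                try:
                    value = ast.literal_eval(kw.value)
                except ValueError:
                    return None  # computed at runtime, cannot be read statically
                return [value] if isinstance(value, str) else list(value)
        return []  # setup() found, nothing declared
    return None
```

Anything computed at runtime (reading requirements.txt, platform checks) falls
into the `None` case, which is why a purely static approach cannot cover every
setup.py.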

(I wanted to write it for Nix purposes, but the code that I wrote up to now is 
not nix-specific, and I thus chose the pypi4all name)

I'll try to add the few other incomplete changes that I have now, and add a
couple of tests...

It uses a little bit of the internal pip API, which is not stable (and requires
a recent enough version of pip+setuptools), but at least it means that it
shouldn't be affected by changes like the one in the subject.

You also don't want to be executing this on a trusted machine, since it'll
fetch stuff from PyPI that we can't know in advance to be safe.

https://github.com/berdario/pypi4all


-- 
Sent from mobile. Please excuse my brevity.


Re: [Nix-dev] Upcoming PyPi URL Scheme Change

2016-04-20 Thread Freddy Rietdijk
Thanks for the update.

There are indeed some things we can automate. Before, I experimented with
using one of the APIs to get out as much metadata as possible. We could
also use pypi2nix, which can give far more information, but requires
downloading all files.
Unfortunately, the old site still uses MD5, so I quit my effort using the
API. The new site (https://warehouse.python.org/) uses SHA256 though.

If this change in URL scheme is really going to happen I think we should
start using the API to find the correct file, version, hash, description
and license. Optionally, we should make it possible to run pypi2nix to
extract more, and more precise, information.
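
A rough sketch of what "using the API" could look like for the fields listed
above. It works on an already-downloaded JSON document and assumes a response
shape with an `info` object and a `releases` mapping from version to a list of
file entries (`filename`, `url`, `packagetype`, `digests`); that shape and the
function name are assumptions made for illustration, not something confirmed
in this thread.

```python
from typing import Any, Dict, Optional


def sdist_metadata(doc: Dict[str, Any], version: str) -> Optional[Dict[str, Any]]:
    """Pick the source tarball for `version` and collect the fields we need."""
    info = doc.get("info", {})
    for entry in doc.get("releases", {}).get(version, []):
        if entry.get("packagetype") == "sdist":
            return {
                "version": version,
                "file": entry["filename"],
                "url": entry["url"],
                # SHA256 is only present if the index provides it; else None.
                "sha256": entry.get("digests", {}).get("sha256"),
                "description": info.get("summary"),
                "license": info.get("license"),
            }
    return None  # no source distribution published for this version
```

The result still needs a human or a tool such as pypi2nix for anything the
index does not record, most notably the dependencies.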

See also https://github.com/NixOS/nixpkgs/issues/11587.



Re: [Nix-dev] Upcoming PyPi URL Scheme Change

2016-04-20 Thread Graham Christensen
Profpatsch  writes:

> Maybe it’s time to automate what we can? Similar to Hackage?

I'm not too familiar with what is done with Hackage, but I think what
we have for Python isn't a good approach. I have a meeting with dstufft
to try and come up with a better solution. It might be helpful to
understand what we have with Hackage to do a better job.

I know Domen has specific expertise here, and probably has some really
valuable feedback.

My problem with the current system is we have many arbitrarily versioned
python packages referencing each other in a haphazardly developed graph
of dependencies. Upgrading one package has a nasty cascading effect of
needing to upgrade each of the other ones depending on it. This has
stymied at least one of my attempts at contributing fixes.

For development dependencies, more fully automating it is probably the
best approach.

I think for applications, it would be more beneficial to take an
approach similar to Bundix, Npm2Nix, etc. The community's current tools
for Python (pypi2nix, pip2nix, others?) seem to work on some types of
packages, and sometimes not on others.

I have a prototype of an alternative method which leans harder on pip
to do the work than nix. Instead of building each python dependency in
its own derivation:

 1. it creates a fake PyPI mirror of all the dependent packages
 2. it installs all of the packages at once with
    `pip install -r requirements.txt`

This avoids issues like circular dependencies and other complexities of
how python packaging works, but is a much heavier-weight installation
mechanism. I'll have to test more before saying it is good or not.
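
One way the second step could look, as a sketch only: it assumes the "fake
mirror" is simply a local directory of already-fetched sdists and wheels, and
uses pip's `--no-index` and `--find-links` options so that nothing is fetched
from the real PyPI. The prototype described above may work differently; the
function names and the directory layout are assumptions.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def pip_install_command(mirror_dir: Path, requirements: Path) -> List[str]:
    """Build a pip invocation that resolves everything from a local directory."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                     # never contact the real index
        "--find-links", str(mirror_dir),  # look for sdists/wheels here instead
        "-r", str(requirements),
    ]


def install_all(mirror_dir: Path, requirements: Path) -> None:
    """Install every requirement in one pip run; raises if pip fails."""
    subprocess.run(pip_install_command(mirror_dir, requirements), check=True)
```

Because pip sees all requirements in a single run, it can sort out install
order and circular dependencies itself, at the cost of one large build step
instead of one derivation per package.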

Best,
Graham


Re: [Nix-dev] Upcoming PyPi URL Scheme Change

2016-04-20 Thread Profpatsch
On 16-04-20 11:41am, Graham Christensen wrote:
> I recently got word that PyPi is changing their URL scheme.
> 
> Old example:
> https://pypi.python.org/packages/source/a/ansible/ansible-1.8.2.tar.gz#md5=c2ac0e5a4c092dfa84c7e9e51cd45095
> 
> New example:
> https://pypi.python.org/packages/62/18/91f0e5059373e9b87588c2a1c3b4c3c08ee89e0443aa2017469a4cdae41c/SCRY-1.1.2-py2-none-any.whl#md5=a3c636c4e94df1f0644b6917a9c05e67

This is going to be a lot of work.

> Yet another option is to run a sort of "translator" service that can consume
> the PyPI JSON API and will output the URLs in whatever format best suits you.
> An example of this is pypi.debian.net (which I don't know where the code base
> for it is, but the proof of concept I wrote for it is at
> https://github.com/dstufft/pypi-debian). These translators are fairly simple:
> they take a URL, pull the project and filename out of it and then use the JSON
> API to figure out the "real" URL and then simply redirect to that.

Maybe it’s time to automate what we can? Similar to Hackage?
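
The core of such a translator, sketched as two pure functions. The first pulls
the project and filename out of an old-style URL; the second looks the filename
up in an already-fetched JSON API document. The document shape (a `releases`
mapping from version to a list of entries with `filename` and `url`) and both
function names are assumptions for illustration; this is not the pypi-debian
code.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


def parse_old_url(url: str) -> Tuple[str, str]:
    """Split /packages/{python version}/{name[0]}/{name}/{filename}."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 5 or parts[0] != "packages":
        raise ValueError("not an old-style PyPI file URL: %s" % url)
    _, _python_version, initial, project, filename = parts
    # New-style URLs also have five segments, so check the {name[0]} segment.
    if initial != project[:1]:
        raise ValueError("not an old-style PyPI file URL: %s" % url)
    return project, filename


def real_url(doc: Dict[str, Any], filename: str) -> Optional[str]:
    """Find the current download URL for `filename` in a JSON API document."""
    for files in doc.get("releases", {}).values():
        for entry in files:
            if entry.get("filename") == filename:
                return entry.get("url")
    return None  # file unknown to the index
```

A service built on this would fetch the JSON document for the parsed project,
call `real_url`, and answer with an HTTP redirect to the result.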

-- 
Proudly written in Mutt with Vim on NixOS.
Q: Why is this email five sentences or less?
A: http://five.sentenc.es
May take up to five days to read your message. If it’s urgent, call me.


[Nix-dev] Upcoming PyPi URL Scheme Change

2016-04-20 Thread Graham Christensen

Hello Nixers,

I recently got word that PyPi is changing their URL scheme.

Old example:
https://pypi.python.org/packages/source/a/ansible/ansible-1.8.2.tar.gz#md5=c2ac0e5a4c092dfa84c7e9e51cd45095

New example:
https://pypi.python.org/packages/62/18/91f0e5059373e9b87588c2a1c3b4c3c08ee89e0443aa2017469a4cdae41c/SCRY-1.1.2-py2-none-any.whl#md5=a3c636c4e94df1f0644b6917a9c05e67

This is just a heads-up for anyone who updates the next python package.

From Donald Stufft, of PyPA:

So, previously PyPI used URLs like:
/packages/{python version}/{name[0]}/{name}/{filename}

Now it uses:
/packages/{hash[:2]}/{hash[2:4]}/{hash[4:]}/{filename}
Where hash is blake2b(file_content, digest_size=32).hexdigest().lower()
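
As a rough illustration of the two layouts, using `hashlib.blake2b` from the
Python standard library (3.6 or later). The function names are hypothetical
and this is not PyPI's own code; it only restates the two path templates above.

```python
import hashlib


def old_style_path(python_version: str, name: str, filename: str) -> str:
    """Old layout: predictable from the project name alone."""
    return f"/packages/{python_version}/{name[0]}/{name}/{filename}"


def new_style_path(file_content: bytes, filename: str) -> str:
    """New layout: derived from the BLAKE2b digest of the file contents."""
    digest = hashlib.blake2b(file_content, digest_size=32).hexdigest().lower()
    return f"/packages/{digest[:2]}/{digest[2:4]}/{digest[4:]}/{filename}"
```

A 32-byte digest is 64 hex characters, so the three hash segments are 2, 2 and
60 characters long, which matches the new example URL above. The path can only
be computed by someone who already has the file, so it has to be looked up
rather than guessed.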

There are a few reasons for this:

* We generally do not allow people to delete a file and re-upload the same
  version again. However, the old layout generally means that we *can't* do
  that even if we wanted to because HTTP clients will use the URL as the key
  for a cache and thus it can never change (other than to be deleted).

* The file system is not transactional and isn't part of the database, which
  means we get put in a funny pickle where we have to decide if we persist the
  change to the file system *prior* to committing the transaction or *after*
  committing. Both ways have their ups and downs and neither solves all of the
  issues. In general, on upload we try to save the file prior to committing
  because once it's been committed downstream users will expect it to exist
  and if we haven't saved the file to disk yet it may not exist yet (and
  if saving fails, it may never exist).

  However, this raises a problem. We're currently using Amazon S3 to save
  files, which is an eventually consistent data store. A brand *new* file is
  (in the S3 region we're using) available immediately after writing; however,
  when writing a file that has already existed it can take some time for it to
  become consistent (reportedly up to hours). This leaves us in a sticky
  situation where someone can run this:

  setup.py sdist upload

  And have PyPI accept the upload, write it to S3 and then fail to commit the
  upload. Then when the user re-runs that we'll write the file to S3 again
  (however it will have changed contents because ``setup.py sdist`` is not
  deterministic) and then commit the database, succeeding this time. If this
  happens then in the period after the database commits but before Amazon S3
  has updated the file to the latest version (possibly taking hours) everyone
  is going to fail downloading/installing that file because the hash we're
  getting from Amazon S3 isn't going to match the hash that we have recorded
  in the PyPI database. To make this even more painful, we utilize download
  caching of the files pretty heavily and to do that we make the assumption
  that the contents at the URL will never change. So not only will it be
  broken in that window before Amazon S3 has become consistent, it will be
  persistently broken for anyone who attempted to install it until they go
  out of their way to delete their cache. By making the URL determined by the
  *contents* of the file, we make it so repeating the same upload with
  different contents will by definition end up with a different URL,
  sidestepping the entire problem.

* When a file gets deleted from PyPI we have to delete it from the backing
  store too because the URL is predictable and people attempt to short circuit
  the Simple Repository API and we want a file deletion to, by default, mean
  that people don't discover that version. However, this flies in the face of
  people who use the Simple Repository API (or the Web UI) to resolve a
  version and then want to bake the resolved URL into something with the
  expectation that it will not change or go away. This change allows us to
  simply stop deleting files, so that if someone bakes a file URL into
  something it can continue to work in perpetuity without people accidentally
  installing that through simple URL building in the end user software.


Now even though the specific location of the file has not been considered part
of our "API" nonetheless people have over time baked in assumptions about that
URL scheme in various things, and obviously this change will break those
things. So then how should someone deal with this change?

Well, the simplest (though perhaps not the least effort) is to remove whatever
assumptions have been made and replace them with the new URL structure. This
will fix things today, but it may or may not be the case that tomorrow the URL
structure changes again.

Another option is to disco