Re: [Distutils] Indexing modules in Python distributions

2017-02-09 Thread Jeremy Stanley
On 2017-02-08 18:14:38 +0000 (+0000), Thomas Kluyver wrote:
[...]
> What I'm proposing differs in that it would need to download files from
> PyPI - basically all of them, if we're thorough about it. I imagine
> that's going to involve a lot of data transfer. Do we know what order of
> magnitude we're talking about?
[...]

The crowd I run with uses https://pypi.org/project/bandersnatch/ to
maintain a full PyPI mirror for our project's distributed CI system,
and du reports the current aggregate size as 488 GiB. Also, if you
want to initialize a full mirror this way, plan for it to take
several days to populate.
-- 
Jeremy Stanley
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] GSoC 2017 - Working on pip

2017-02-09 Thread Donald Stufft
I’ve never done it before, but I’m happy to provide mentoring on this.

> On Feb 8, 2017, at 9:15 PM, Pradyun Gedam wrote:
> 
> Hello Everyone!
> 
> Ralf Gommers suggested that I put this proposal here on this list, for 
> feedback and for seeing if anyone would be willing to mentor me. So, here it 
> is.
> 
> -
> 
> My name is Pradyun Gedam. I'm currently a first-year student at VIT University
> in India.
> 
> I would like to apply for GSoC 2017 under PSF.
> 
> I currently have a project in mind - the "pip needs a dependency resolver" 
> issue [1]. I would like to take on this specific project but am willing to do 
> some other project as well.
> 
> For some background, around mid-2016, I started contributing to pip. The
> first issue I tackled was #59 [2] - a request for an upgrade command and an
> upgrade-all command that has been open for over 5.5 years. Over the months
> following that, I've had the opportunity to work with and understand
> multiple parts of pip's codebase while working on this issue and a few
> others. This search on GitHub issues [3] also provides a good summary of the
> work I've done on pip.
> 
> [1]: https://github.com/pypa/pip/issues/988
> 
> [2]: https://github.com/pypa/pip/issues/59 
> 
> [3]: https://github.com/pypa/pip/issues?q=author%3Apradyunsg 
> 
> 
> Eagerly-waiting-for-a-response-ly,
> Pradyun Gedam


—
Donald Stufft





Re: [Distutils] Indexing modules in Python distributions

2017-02-09 Thread Nick Coghlan
On 8 February 2017 at 19:14, Thomas Kluyver wrote:
> What I'm proposing differs in that it would need to download files from PyPI
> - basically all of them, if we're thorough about it. I imagine that's going
> to involve a lot of data transfer. Do we know what order of magnitude we're
> talking about? Is it so large that we should be thinking of running the
> scanner in the same data centre as the file storage?

Last time I asked Donald about doing things like this, he noted that a
full mirror is ~215 GiB. That was a year or two ago so I assume the
number has gone up since then, but it should still be in the same
order of magnitude.

From an ecosystem resilience point of view, there's also a lot to be
said for having copies of the full PyPI bulk artifact store in both
AWS S3 (which is where the production PyPI data lives) and in Azure :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] GSoC 2017 - Working on pip

2017-02-09 Thread Xavier Fernandez
That would be great news :)

On Thu, Feb 9, 2017 at 3:15 AM, Pradyun Gedam wrote:

> Hello Everyone!
>
> Ralf Gommers suggested that I put this proposal here on this list, for
> feedback and for seeing if anyone would be willing to mentor me. So, here
> it is.
>
> -
>
> My name is Pradyun Gedam. I'm currently a first-year student at VIT
> University in India.
>
> I would like to apply for GSoC 2017 under PSF.
>
> I currently have a project in mind - the "pip needs a dependency resolver"
> issue [1]. I would like to take on this specific project but am willing to
> do some other project as well.
>
> For some background, around mid-2016, I started contributing to pip. The
> first issue I tackled was #59 [2] - a request for an upgrade command and an
> upgrade-all command that has been open for over 5.5 years. Over the months
> following that, I've had the opportunity to work with and understand
> multiple parts of pip's codebase while working on this issue and a few
> others. This search on GitHub issues [3] also provides a good summary of
> the work I've done on pip.
>
> [1]: https://github.com/pypa/pip/issues/988
> [2]: https://github.com/pypa/pip/issues/59
> [3]: https://github.com/pypa/pip/issues?q=author%3Apradyunsg
>
> Eagerly-waiting-for-a-response-ly,
> Pradyun Gedam
>


Re: [Distutils] Indexing modules in Python distributions

2017-02-09 Thread Thomas Kluyver
On Wed, Feb 8, 2017, at 11:06 PM, Wes Turner wrote:

> So, IIUC, you're looking to emit
> ((URL, release, platform), namespaces_odict)
> for each new and all existing packages;
> by uncompressing every package and running every setup.py (hopefully
> in a container)?


Something like that, yes. For packages that publish wheels, we can
analyse those directly without needing to run setup.py. Of course there
are many packages with only sdists published.
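
As a rough illustration of analysing a wheel directly: a wheel is a zip
archive, so its importable top-level names can be read from the file
listing without executing any code. The sketch below is only one
possible approach and not the scanner discussed in this thread. The
helper names (top_level_modules, build_demo_wheel) and the demo archive
contents are made up for the example, and it deliberately ignores
top-level extension modules and anything installed from a .data
directory.

```python
import io
import zipfile


def top_level_modules(wheel_file):
    """Return the sorted top-level importable names found in a wheel archive."""
    names = set()
    with zipfile.ZipFile(wheel_file) as wheel:
        for path in wheel.namelist():
            first, _, rest = path.partition("/")
            # Skip metadata and data directories; they are not importable.
            if first.endswith((".dist-info", ".data")):
                continue
            if rest:
                # A file inside a directory: treat the directory as a package
                # only if it holds Python source or an extension module.
                if path.endswith((".py", ".so", ".pyd")):
                    names.add(first)
            elif first.endswith(".py"):
                # A single-file module at the archive root.
                names.add(first[:-3])
    return sorted(names)


def build_demo_wheel():
    """Build a tiny in-memory archive laid out like a wheel (hypothetical contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as wheel:
        wheel.writestr("demo_pkg/__init__.py", "")
        wheel.writestr("demo_pkg/core.py", "")
        wheel.writestr("demo_single.py", "")
        wheel.writestr("demo-1.0.dist-info/METADATA", "Name: demo\n")
    buf.seek(0)
    return buf


print(top_level_modules(build_demo_wheel()))
```

Sdist-only packages cannot be handled this way in general, since the
installed layout may only be known after running setup.py.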


> Could this (namespace extraction) be added to 'setup.py build' for
> the future?


Potentially. As I mentioned, there is a place in the metadata to put
this information - the 'Provides' field.  However, relying on package
uploaders would take a long time to build up decent coverage of the
available packages, so I'm inclined to focus on scanning PyPI, similar
to the tool Chris already showed.
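
For packages that do fill in the 'Provides' field, reading it back is
cheap, because the metadata file uses RFC 822 style headers that the
standard library email parser understands. This is a minimal sketch
under that assumption; the function name provided_modules and the
sample metadata are hypothetical, and no claim is made about how many
real packages on PyPI populate the field.

```python
from email.parser import Parser


def provided_modules(metadata_text):
    """Return every value of the 'Provides' header in a metadata document."""
    message = Parser().parsestr(metadata_text)
    # get_all returns the fallback (an empty list here) when the header is absent.
    return message.get_all("Provides", [])


# Hypothetical PKG-INFO style content, for illustration only.
SAMPLE_METADATA = """\
Metadata-Version: 1.1
Name: demo
Version: 1.0
Provides: demo_pkg
Provides: demo_single
"""

print(provided_modules(SAMPLE_METADATA))
```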


Thomas