[Distutils] GSoC 2017 - Working on pip

2017-02-08 Thread Pradyun Gedam
Hello Everyone!

Ralf Gommers suggested that I put this proposal here on this list, for
feedback and for seeing if anyone would be willing to mentor me. So, here
it is.

-

My name is Pradyun Gedam. I'm currently a first-year student at VIT
University in India.

I would like to apply for GSoC 2017 under PSF.

I currently have a project in mind - the "pip needs a dependency resolver"
issue [1]. I would like to take on this specific project but am willing to
do some other project as well.

For some background, I started contributing to pip around mid-2016. The
first issue I tackled was #59 [2] - a request for an upgrade command and an
upgrade-all command that has been open for over 5.5 years. Over the months
following that, I've had the opportunity to work with and understand
multiple parts of pip's codebase while working on this issue and a few
others. This search on GitHub issues [3] also provides a good summary of
the work I've done on pip.

[1]: https://github.com/pypa/pip/issues/988
[2]: https://github.com/pypa/pip/issues/59
[3]: https://github.com/pypa/pip/issues?q=author%3Apradyunsg

Eagerly-waiting-for-a-response-ly,
Pradyun Gedam
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Indexing modules in Python distributions

2017-02-08 Thread Wes Turner
On Wednesday, February 8, 2017, Thomas Kluyver wrote:

> Thanks Steve, Chris,
>
> On Tue, Feb 7, 2017, at 04:49 PM, Chris Wilcox wrote:
>
> I may be able to help jump-start this a bit and provide a platform for
> this to run on. I deployed a small service that scans PyPI to figure out
> statistics on Python 2 vs Python 3 support using PyPI Classifiers. The
> source is on GitHub: https://github.com/crwilcox/PyPI-Gatherer. It
> watches the PyPI updates feed and refreshes entries for packages as they
> show up as modified. It should be possible to add your lib, query, and add
> an additional row or two to the result. I am happy to work together on
> this. Also, the data is stored in an Azure Table Storage which has REST
> endpoints (and a Python SDK) that make getting the published data
> straightforward.
>
>
> I had a quick look through this, and it does look like it should provide a
> useful framework for scanning PyPI and updating the results. :-)
>
> What I'm proposing differs in that it would need to download files from
> PyPI - basically all of them, if we're thorough about it. I imagine that's
> going to involve a lot of data transfer. Do we know what order of magnitude
> we're talking about? Is it so large that we should be thinking of running
> the scanner in the same data centre as the file storage?
>


So, IIUC, you're looking to emit
((URL, release, platform), namespaces_odict)
for each new package and for all existing packages,
by uncompressing every package and running every setup.py (hopefully in a
container)?

https://github.com/python/pypi-salt/blob/master/provisioning/salt/roots/pillar/top.sls

https://github.com/python/pypi-salt/blob/master/provisioning/salt/roots/pillar/warehouse-deploys/warehouse-dev.sls

https://github.com/python/pypi-salt/blob/master/provisioning/salt/roots/salt/warehouse/web.sls

- https://github.com/pypa/warehouse/blob/master/warehouse/packaging/search.py
  - elasticsearch_dsl
- https://github.com/pypa/warehouse/blob/master/warehouse/packaging/models.py
  - SQLAlchemy
- https://github.com/pypa/warehouse/blob/master/warehouse/celery.py
  - celery

- https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/json.py
  - namespaces are useful metadata (worth adding to the spec)
- https://github.com/pypa/interoperability-peps/issues/31
  - JSONLD

- https://github.com/python/psf-salt/blob/master/pillar/prod/top.sls
- https://github.com/python/psf-salt/blob/master/pillar/prod/roles.sls

- One CI project (container FROM python: (debian)) per python package with
  additional metadata per project?
  - conda-forge solves for this case
- and then how to post the extra metadata (build artifact) back from
  the CI build and mark the task as done


Could this (namespace extraction) be added to 'setup.py build' for the
future?


>
> Thomas
>


Re: [Distutils] Indexing modules in Python distributions

2017-02-08 Thread Thomas Kluyver
Thanks Steve, Chris,



On Tue, Feb 7, 2017, at 04:49 PM, Chris Wilcox wrote:

> I may be able to help jump-start this a bit and provide a platform for
> this to run on. I deployed a small service that scans PyPI to figure
> out statistics on Python 2 vs Python 3 support using PyPI Classifiers.
> The source is on GitHub:  https://github.com/crwilcox/PyPI-Gatherer.
> It watches the PyPI updates feed and refreshes entries for packages as
> they show up as modified. It should be possible to add your lib,
> query, and add an additional row or two to the result. I am happy to
> work together on this. Also, the data is stored in an Azure Table
> Storage which has REST endpoints (and a Python SDK) that make getting
> the published data straightforward.


I had a quick look through this, and it does look like it should provide
a useful framework for scanning PyPI and updating the results. :-)


What I'm proposing differs in that it would need to download files from
PyPI - basically all of them, if we're thorough about it. I imagine
that's going to involve a lot of data transfer. Do we know what order of
magnitude we're talking about? Is it so large that we should be thinking
of running the scanner in the same data centre as the file storage?


Thomas