Re: [Distutils] Indexing modules in Python distributions

2017-02-07 Thread Chris Wilcox via Distutils-SIG
Thanks for cc-ing me, Steve.

I may be able to help jump-start this a bit and provide a platform for this to 
run on. I deployed a small service that scans PyPI to figure out statistics on 
Python 2 vs Python 3 support using PyPI classifiers. The source is on GitHub: 
https://github.com/crwilcox/PyPI-Gatherer. It watches the PyPI updates feed and 
refreshes entries for packages as they show up as modified. It should be 
possible to add your library, query it, and add an additional row or two to the 
result. I am happy to work together on this. Also, the data is stored in Azure 
Table Storage, which has REST endpoints (and a Python SDK) that make getting 
the published data straightforward.

Here is an example of using the data provided by the service. This is a Jupyter 
Notebook analysing Python 3 Adoption: 
https://notebooks.azure.com/chris/libraries/pypidataanalysis

Thanks.
Chris

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Indexing modules in Python distributions

2017-02-07 Thread Steve Dower
I'm interested, and potentially in a position to provide funded infrastructure 
for this (though perhaps not as soon as you'd like, since things can move 
slowly at my end).

My personal preference would be to download a full list. This is slow-moving 
data that will gzip nicely, and my uses (in an IDE) will require many tentative 
queries. I can also see value in a single-query API, but keep it simple - the 
value here is in the data, not the lookup.
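
As a rough illustration of the "download a full list" option, here is a
minimal sketch of a gzip-compressed index queried locally. The JSON layout
({import name: [distributions]}) and the function names are assumptions made
for this example; nothing in the thread fixes a format.

```python
import gzip
import json


def dump_index(index):
    """Serialise {import_name: [distribution, ...]} to gzip-compressed JSON."""
    return gzip.compress(json.dumps(index, sort_keys=True).encode("utf-8"))


def load_index(blob):
    """Inverse of dump_index: bytes as downloaded -> dict held in memory."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def lookup(index, module_name):
    """Return the distributions providing the top-level package of module_name."""
    return index.get(module_name.split(".")[0], [])


# Hypothetical two-entry index; a real one would be built by scanning PyPI.
index = load_index(dump_index({"zmq": ["pyzmq"], "yaml": ["PyYAML"]}))

# Many tentative queries are cheap once the list is cached locally.
for candidate in ["zmq", "zmq.green", "zm"]:
    print(candidate, lookup(index, candidate))
```

Because every lookup is a dictionary access, an IDE can afford speculative
queries on partial names without any network round trip.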

As far as updates go, most packaging systems should have some sort of release 
notification or update feed, so the work is likely going to be in hooking up to 
those and turning it into a scan task.
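
A small sketch of the "feed to scan task" step, under stated assumptions:
release events are taken to arrive as (name, version) pairs, which is a
hypothetical shape rather than the format of any particular feed, and the
version numbers below are illustrative only. Names are normalised as
described in PEP 503 so that repeated notifications collapse into one task.

```python
import re


def normalize(name):
    """Normalise a project name as PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pending_scans(events, already_scanned):
    """Collapse release events into an ordered, de-duplicated list of
    (name, version) pairs that have not been scanned yet."""
    queue = []
    seen = set(already_scanned)
    for name, version in events:
        key = (normalize(name), version)
        if key not in seen:
            seen.add(key)
            queue.append(key)
    return queue


# Illustrative events only; not real feed output.
events = [("PyZMQ", "16.0.2"), ("pyzmq", "16.0.2"), ("requests", "2.13.0")]
done = {("requests", "2.13.0")}
print(pending_scans(events, done))
```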

Cheers,
Steve



[Distutils] Indexing modules in Python distributions

2017-02-07 Thread Thomas Kluyver
For a variety of reasons, I would like to build an index of what
modules/packages are contained in which distributions ('packages') on
PyPI. For instance:

- Identifying requirements by static analysis of code: 'import zmq' ->
requires pyzmq
- Finding corresponding packages from different packaging systems: pyzmq
on PyPI corresponds to pyzmq in conda, and python[3]-zmq in Debian
repositories. This is an oversimplification, but importable module names
provide a common basis to compare packages. I'd like a tool that could
pick between different ways of installing a given module.
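
To make the first use case concrete, here is a minimal sketch using the
standard library's ast module. The IMPORT_TO_DIST mapping is a hand-written
stand-in for the proposed index, and the function names are made up for this
example.

```python
import ast

# Hypothetical stand-in for the index: import name -> PyPI distribution.
IMPORT_TO_DIST = {"zmq": "pyzmq"}


def top_level_imports(source):
    """Return the set of top-level module names imported by the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import, so it is not an external requirement
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def guess_requirements(source, index=IMPORT_TO_DIST):
    """Map imported names to distributions, falling back to the import name."""
    return sorted(index.get(name, name) for name in top_level_imports(source))


example = "import zmq\nfrom os import path\nfrom . import sibling\n"
print(guess_requirements(example))
```

Note that this sketch does not filter out standard library modules such as
os, and it cannot see imports performed dynamically at runtime.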

People often assume that the import name is the same as the name on
PyPI. This is true in the vast majority of cases, but there's no
requirement that they are the same, and there are cases where they're
not - pyzmq is one example.

The metadata field 'Provides' is, according to PEP 314, intended for
this purpose, but the standard packaging tools don't make it easy to
use, and consequently very few packages specify it.

I have started putting together a tool to index wheels. It reads a .whl
file, finds modules inside it, and tries to identify namespace packages.
It's still quite rough, but it worked with the wheels I tried.
https://github.com/takluyver/wheeldex
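
For readers unfamiliar with the idea, the following is a simplified sketch of
scanning a wheel with the standard library's zipfile module. It is not
wheeldex's implementation: it only collects top-level names and makes no
attempt to identify namespace packages. The demo archive is synthetic.

```python
import io
import zipfile

EXTENSION_SUFFIXES = (".so", ".pyd")


def modules_in_wheel(fileobj):
    """Return the sorted top-level importable names found in a wheel."""
    found = set()
    with zipfile.ZipFile(fileobj) as whl:
        for name in whl.namelist():
            parts = name.split("/")
            top = parts[0]
            # Skip the metadata and data directories of the wheel itself
            if top.endswith((".dist-info", ".data")):
                continue
            if len(parts) == 1:
                # Top-level file: a plain module or a compiled extension
                if top.endswith(".py"):
                    found.add(top[:-3])
                elif top.endswith(EXTENSION_SUFFIXES):
                    found.add(top.split(".")[0])
            elif name.endswith(".py") or name.endswith(EXTENSION_SUFFIXES):
                # Any code file below a directory marks that directory as a package
                found.add(top)
    return sorted(found)


def demo_wheel():
    """Build a tiny synthetic wheel-like archive in memory for the example."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as whl:
        whl.writestr("zmq/__init__.py", "")
        whl.writestr("zmq/backend/select.py", "")
        whl.writestr("single.py", "")
        whl.writestr("demo-1.0.dist-info/METADATA", "Name: demo\n")
    buf.seek(0)
    return buf


print(modules_in_wheel(demo_wheel()))
```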

Is this something that other people are interested in?

One thing I'm trying to work out at the moment is how the data would be
accessed: as a web service that tools can query online, or more like
Linux packaging, where tools download and cache a list to do lookups
locally. Or both? There's also, of course, the question of how the index
would be built and updated.

Thanks,
Thomas