Re: [Distutils] Maintaining a curated set of Python packages

2016-12-23 Thread Glyph Lefkowitz

> On Dec 22, 2016, at 11:15 PM, Nick Coghlan wrote:
> On 22 December 2016 at 09:08, Chris Barker wrote:
> And there are utilities that let you run a script in a given environment:
> (and maybe others)
>  (pip 
> Script Installer) creates a dedicated venv for the module and its 
> dependencies, and then adds symlinks from ~/.local/bin to any scripts 
> installed into the venv's bin directory. As Armin notes in the README, it's a 
> really nice way to handle utilities that happen to be written in Python and 
> published via PyPI, without having them impact any other aspect of your 
> system.

I just wanted to echo that this is a great tool, and it teaches really good 
habits (i.e. don't install your general-purpose python tools into 
project-specific virtual environments).

Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-22 Thread Nick Coghlan
On 22 December 2016 at 09:08, Chris Barker  wrote:

> And there are utilities that let you run a script in a given environment:
> (and maybe others)
(pip Script Installer) creates a
dedicated venv for the module and its dependencies, and then adds symlinks
from ~/.local/bin to any scripts installed into the venv's bin directory.
As Armin notes in the README, it's a really nice way to handle utilities
that happen to be written in Python and published via PyPI, without having
them impact any other aspect of your system.
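
The mechanism described above (one dedicated venv per tool, plus symlinks
into ~/.local/bin) can be sketched in a few lines of Python. This is only an
illustration of the idea, not the actual implementation of the pip Script
Installer: the names install_tool and link_scripts, the ~/.local/venvs
location, and the list of skipped file prefixes are assumptions made for the
example, and it assumes a POSIX layout with a bin directory.

```python
import os
import subprocess
import venv
from pathlib import Path

# Hypothetical choice: files with these prefixes belong to the venv itself.
SKIP_PREFIXES = ("python", "pip", "activate", "Activate")


def link_scripts(venv_bin, target_bin):
    """Symlink the tool's executables from venv_bin into target_bin."""
    venv_bin, target_bin = Path(venv_bin), Path(target_bin)
    target_bin.mkdir(parents=True, exist_ok=True)
    linked = []
    for script in sorted(venv_bin.iterdir()):
        if not script.is_file() or script.name.startswith(SKIP_PREFIXES):
            continue
        if not os.access(script, os.X_OK):
            continue  # not an executable script
        link = target_bin / script.name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale link from an earlier install
        link.symlink_to(script)
        linked.append(link)
    return linked


def install_tool(package, home=None):
    """Create a dedicated venv for one PyPI package and expose its scripts."""
    home = Path(home) if home is not None else Path.home()
    env_dir = home / ".local" / "venvs" / package  # assumed location
    venv.EnvBuilder(with_pip=True).create(env_dir)
    python = env_dir / "bin" / "python"
    subprocess.check_call([str(python), "-m", "pip", "install", package])
    return link_scripts(env_dir / "bin", home / ".local" / "bin")
```

Calling install_tool("some-tool") would need network access; link_scripts on
its own only touches the two directories it is given.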


Nick Coghlan   |   |   Brisbane, Australia

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-21 Thread Chris Barker
On Fri, Dec 16, 2016 at 5:51 AM, Daniel Holth  wrote:

> One possibility to consider is that virtualenv itself is a bad idea. Why
> should the Python interpreter executable, rather than the program being
> run, determine the set of packages that is available for import?

Well, way back when, some of us suggested that Python have package version
management built in to import:

import this_package>=2.1

or whatever.

At that time, the pyGTK and wxPython projects had done a roll-your-own
version of this. wxPython's was:

import wxversion
wxversion.select('2.3')
import wx

kinda kludgy, but it worked.
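
For comparison, a run-time version check of that roll-your-own kind could
look roughly like the sketch below. It is not wxversion's code and not a
proposal for import syntax; require and version_tuple are hypothetical
helpers, and the version parsing deliberately handles only plain dotted
numbers (anything after the first non-numeric part is ignored).

```python
from importlib import metadata


def version_tuple(text):
    """Turn '2.10.1' into (2, 10, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def require(dist_name, minimum):
    """Raise ImportError unless the installed distribution is >= minimum."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        raise ImportError("{} is not installed".format(dist_name)) from None
    if version_tuple(installed) < version_tuple(minimum):
        raise ImportError(
            "{} {} is older than required {}".format(dist_name, installed, minimum)
        )
    return installed
```

An application would call something like require("this_package", "2.1")
before its real imports. Note that this only checks what is installed; unlike
wxversion it cannot choose between several installed versions.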

However, Guido, among others, was pretty adamant that this was NOT Python's

Then, along came setuptools that kinda-sorta provided something like that,
and then virtualenv -- and the rest is history.

I now use conda, which provides environments that manage python itself,
other C libs, etc, and it works pretty well.

And there are utilities that let you run a script in a given environment:

(and maybe others)

So that does kinda pass the responsibility to the app.



Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-21 Thread Chris Barker
On Thu, Dec 15, 2016 at 8:29 PM, Glyph Lefkowitz wrote:

> At the beginning of your story you mentioned the GUI client - *that* is
> the missing piece ;).  I've been saying for years that we need a
> that lets you easily bootstrap all this stuff: walk you through installing
> C dev tools if your packages need them, present a GUI search interface to
> finding packages, present a normal "file->open" dialog for selecting a
> location for a new virtualenv, automatically pop open a terminal, launching
> a Jupyter notebook whose kernel is pointed at said environment...

Anaconda provides something like this -- personally, I'm a command line
geek, so I have no idea how much it covers or if it's any good. But it might
be worth a look if you're interested.



Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-16 Thread Nick Coghlan
On 17 December 2016 at 06:40, Glyph Lefkowitz wrote:

> On Dec 16, 2016, at 5:51 AM, Daniel Holth  wrote:
> One possibility to consider is that virtualenv itself is a bad idea. Why
> should the Python interpreter executable, rather than the program being
> run, determine the set of packages that is available for import? It is
> confusing and inconvenient to have to deal with environments at all. Yes,
> even if you are using a helper. Maybe there can be a better way to manage
> dependencies that is not completely disjoint from
> I can see why you'd say that, but I disagree.  I think the *name* "virtualenv"
> is really confusing, but the general idea of "it's the interpreter and not
> the app" is a very powerful concept because you can run a REPL (or a
> notebook, or a debugger, or a doc generator, or any other dev tool) in the
> same *context* as your application code, without actually loading or
> executing any specific thing from your application code.  Virtualenv also
> lets you easily control which Python version or interpreter (hello, pypy!)
> is being used in each context.

I'll also note that VSCode's Python plugin will find virtual environments
that are located inside the project directory by default. That approach of
"the virtualenv is inside the project directory" is probably a decent
pattern to recommend as a default, since it aligns with the way a lot of
IDEs (including VSCode itself) already work. When you use that model,
rather than being something you have to think about explicitly, the "Python
virtual environment" just becomes an implementation detail of how the IDE
manages your application dependencies, with the added bonus that *if you
want to*, you can re-use that environment independently of both the
application *and* the IDE.
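
As a rough illustration of the "virtualenv is inside the project directory"
pattern, a tool could create the environment on first use with the standard
library's venv module. The helper name ensure_project_env and the ".venv"
directory name are assumptions for this sketch, not something VSCode or any
particular IDE is known to require.

```python
import venv
from pathlib import Path


def ensure_project_env(project_dir, name=".venv"):
    """Create <project_dir>/<name> if it is missing and return its path."""
    env_dir = Path(project_dir) / name
    # pyvenv.cfg is written by venv, so its presence marks an existing env.
    if not (env_dir / "pyvenv.cfg").exists():
        # with_pip=False keeps the sketch offline-friendly; a real tool
        # would most likely want pip bootstrapped as well.
        venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir
```

Because the environment is an ordinary directory next to the code, it can
still be activated by hand from a terminal, independently of the IDE.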

And while I personally prefer to keep the notion of "project" and
"environment" more explicitly separate (i.e. I have an M:N mapping between
a collection of virtualenvs centrally managed with vex and the various
projects in my devel, fedoradevel and rhdevel folders, hence [1]), I
believe that level of complexity in a local dev setup isn't really normal
even for experienced programmers, let alone folks that are still early in
the learning process.



Nick Coghlan   |   |   Brisbane, Australia

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-16 Thread Greg Ewing

Glyph Lefkowitz wrote:

"New Project" could 
just create a requirements.txt and a for you, alongside a git 
repo and a virtualenv for that project.  Or, the UI could be geared 
towards setting up a tox.ini rather than a virtualenv, and run 
everything through tox so it's in an isolated environment with defined 

I'd be very interested in something like this. I'm not a
big fan of IDEs generally, but one feature I do appreciate
greatly is having a one-button "build" process that creates
a distributable app bundled with everything it needs, and
be assured it will work on someone else's machine.

That's currently rather difficult to do with Python in any but
the simplest cases, even for a single platform. Cross-platform
is even worse. +1 on providing some tools to make it easier.


Re: [Distutils] Maintaining a curated set of Python packages

2016-12-16 Thread Daniel Holth
On Fri, Dec 16, 2016 at 3:40 PM Glyph Lefkowitz wrote:

> On Dec 16, 2016, at 5:51 AM, Daniel Holth  wrote:
> I'm also a visual studio code fan. It is the first editor I've tried that
> feels lightweight like Vim but has the power of many plugins. That, and the
> text rendering is excellent.
> is a lovely GUI package manager.
> There's a lot to like here - no disrespect to the Stallion team - but it's
> worth remembering this lesson from Havoc Pennington:
> The *major* reason UI is important for this use-case - onboarding of new
> people to Python programming - is to give them *discoverability* on terms
> they're already familiar with.  That means that the first "UI" element has
> to be a cross-platform UI bundle.  Stallion is still a thing you have to
> install (and from what I can see, a thing you have to install into a
> virtualenv?)
> One possibility to consider is that virtualenv itself is a bad idea. Why
> should the Python interpreter executable, rather than the program being
> run, determine the set of packages that is available for import? It is
> confusing and inconvenient to have to deal with environments at all. Yes,
> even if you are using a helper. Maybe there can be a better way to manage
> dependencies that is not completely disjoint from
> I can see why you'd say that, but I disagree.  I think the *name* "virtualenv"
> is really confusing, but the general idea of "it's the interpreter and not
> the app" is a very powerful concept because you can run a REPL (or a
> notebook, or a debugger, or a doc generator, or any other dev tool) in the
> same *context* as your application code, without actually loading or
> executing any specific thing from your application code.  Virtualenv also
> lets you easily control which Python version or interpreter (hello, pypy!)
> is being used in each context.

My point is really that virtualenv both causes and solves problems. There's
jaraco's thing I was trying to remember which
is a creative alternative to virtualenv for some situations. I wish there
was a smoother path between virtualenv develop-against-pypi-libraries and
end-user application deployment.

Stallion has always looked cool but it hasn't been updated in a few years,
and I admit I never used it "for real". I don't know what it would take to
make a useful GUI package manager. Interactive dependency conflict
resolution graphs? Able to install into remote environments? Maybe the
command line version will always be easier to use.

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-16 Thread Glyph Lefkowitz

> On Dec 16, 2016, at 5:07 AM, Nick Coghlan  wrote:
> On 16 December 2016 at 20:57, Glyph Lefkowitz wrote:
> Anyhow, Xcode is far from perfect - many of the places it touches the UNIX 
> pipeline are extremely sharp edges you can easily impale yourself on (and 
> don't get me started about codesigning) - but it nevertheless points at a 
> different potential direction.  For example; why expose the concept of a 
> "virtual environment" directly at all?  "New Project" could just create a 
> requirements.txt and a for you, alongside a git repo and a 
> virtualenv for that project.  Or, the UI could be geared towards setting up a 
> tox.ini rather than a virtualenv, and run everything through tox so it's in 
> an isolated environment with defined requirements.  This is a best practice 
> anyway so why not make it easier to start early?
> This might all be way too much work, but I think it's important to remember 
> it's possible.
> Yeah, I think we agree more than we disagree here.

Quite.  But the devil's in the details :).

> The main thing is that one of the key ways newcomer-friendly environments 
> make themselves more approachable is to *constrain choice*.

I think framing this as "constraint" is a little misleading.  In a sense it is 
a constraint, but a better way to think of it is: provide a reasonable default. 
 Right now, the "default UI" that most users get is a bare bash prompt where 
commands like 'pip install' fail with an error for no reason they can discern.  
They can still choose to inject a different tool at any point in the process 
(after all, we're talking about frontends which create existing concepts like 
virtualenvs and package installations) if they so choose; they just get a 
default that does something - anything - useful.

> XCode usability benefits from being Apple-centric. Ditto for Visual Studio 
> and MS.
> Linux and Python, by contrast, were both born out of a DIY culture where 
> folks being free to choose their own tools was initially perceived solely as 
> a highly desirable feature, rather than as a potential barrier to entry for 
> newcomers.
> That means there's an argument to be made that something like YHat's Rodeo 
> [1] might be a better starting point for data analytics in Python than 
> jumping straight to Jupyter Notebook, and it's also why the Mu editor [2] 
> exists as a dedicated tool for folks learning Python by way of the micro:bit 
> project.
> [1] 
> [2] 

Minor point - nobody should use Mu yet, at least not on the mac: 

More significantly, I think any text editor will do just fine (as long as it's 
not Emacs or Vim) - I've had great success with 
, and even Notepad will do in 
a pinch.  There are already pretty good integration points where editors can be 
told to open specific files.  One of my frustrations with the educational 
ecosystem is the focus on the (quite difficult) problem of providing students 
with a fully integrated text editing / script running / debugging environment, 
rather than figuring out how to orchestrate and launch the quite powerful and 
sophisticated tools we already have.

>> However, the reason I brought up the Curse and Firefox GUI examples was to 
>> emphasise the problems they hide from the default rich client experience:
>> - their default focus is on managing one environment per device
> In the analogous Python tool, one could replace "per device" with "per 
> project" - and perhaps have a "default project" so something useful could 
> happen even before you've decided what you're doing...
> But we've immediately bumped the complexity level up in doing so, and it's a 
> level of complexity that many people initially spending all of their 
> development time on a single project may not need. 

I think we're underestimating potential programming students.  The idea of 
managing multiple documents is likely something they're familiar with from word 
processing apps.  If not, then fine - we can start them off with a default 

> I thought this thread was already interminable, I look forward to reading the 
> never-ending rest of it now that you've raised the grim spectre of the PyPI 
> user-ratings feature from the dead :).
> All the arguments against integrating user ratings into a service that's 
> focused on lowering barriers to publication still hold, so I'm really just 
> noting that that decision to create a friendlier publishing environment 
> *does* introduce some additional constraints elsewhere in the distribution 
> pipeline.
>> User-curated package sets strikes me as the _lowest_ priority feature 

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-16 Thread Glyph Lefkowitz

> On Dec 16, 2016, at 5:51 AM, Daniel Holth  wrote:
> I'm also a visual studio code fan. It is the first editor I've tried that 
> feels lightweight like Vim but has the power of many plugins. That, and the 
> text rendering is excellent.
> is a lovely GUI package manager.

There's a lot to like here - no disrespect to the Stallion team - but it's 
worth remembering this lesson from Havoc Pennington: 

The major reason UI is important for this use-case - onboarding of new people 
to Python programming - is to give them discoverability on terms they're 
already familiar with.  That means that the first "UI" element has to be a 
cross-platform UI bundle.  Stallion is still a thing you have to install (and 
from what I can see, a thing you have to install into a virtualenv?)

> One possibility to consider is that virtualenv itself is a bad idea. Why 
> should the Python interpreter executable, rather than the program being run, 
> determine the set of packages that is available for import? It is confusing 
> and inconvenient to have to deal with environments at all. Yes, even if you 
> are using a helper. Maybe there can be a better way to manage dependencies 
> that is not completely disjoint from

I can see why you'd say that, but I disagree.  I think the name "virtualenv" is 
really confusing, but the general idea of "it's the interpreter and not the 
app" is a very powerful concept because you can run a REPL (or a notebook, or a 
debugger, or a doc generator, or any other dev tool) in the same context as 
your application code, without actually loading or executing any specific thing 
from your application code.  Virtualenv also lets you easily control which 
Python version or interpreter (hello, pypy!) is being used in each context.



Re: [Distutils] Maintaining a curated set of Python packages

2016-12-16 Thread Brett Cannon
If people are serious about trying to prototype this stuff then the easiest
way might be coming up with shell scripts that do the prompting if it's
faster to iterate that way than doing a full-blown GUI. Now that Windows 10
has WSL/Bash it means for the first time all 3 major OSs have a common
shell people can work from. You could even go as far as making the shell
scripts be Cookiecutter templates such that people can experiment with
things being included/left out (e.g. an instructor wants to require Python
3.5, no git, and have people work from a virtual environment and so they
generate the shell script everyone is told to run to get things
going/verify the student's system is set up properly).
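
The "verify the student's system is set up properly" half of that idea could
be prototyped along these lines. It is written in Python rather than shell
purely for illustration; check_setup and its parameters are hypothetical, and
the defaults simply mirror the example above (Python 3.5, no git, working
from a virtual environment).

```python
import shutil
import sys


def check_setup(min_python=(3, 5), need_git=False, need_venv=True):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        wanted = ".".join(str(part) for part in min_python)
        problems.append("Python {} or newer is required".format(wanted))
    if need_git and shutil.which("git") is None:
        problems.append("git was not found on PATH")
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    if need_venv and sys.prefix == getattr(sys, "base_prefix", sys.prefix):
        problems.append("no virtual environment is active")
    return problems
```

An instructor's generated script could print each entry of check_setup() and
exit non-zero when the list is not empty. Environments made by older
virtualenv releases mark themselves differently, so the venv test here is an
approximation.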

On Fri, 16 Dec 2016 at 05:52 Daniel Holth  wrote:

> I'm also a visual studio code fan. It is the first editor I've tried that
> feels lightweight like Vim but has the power of many plugins. That, and the
> text rendering is excellent.
> is a lovely GUI package manager.
> One possibility to consider is that virtualenv itself is a bad idea. Why
> should the Python interpreter executable, rather than the program being
> run, determine the set of packages that is available for import? It is
> confusing and inconvenient to have to deal with environments at all. Yes,
> even if you are using a helper. Maybe there can be a better way to manage
> dependencies that is not completely disjoint from

That just sounds like node_modules/ and I personally don't want to go down
that route. If you view the interpreter as another component of an app then
the disconnect doesn't seem so nutty (at least in my head; at that point
it's just another /usr/local to me).


> On Fri, Dec 16, 2016 at 8:07 AM Nick Coghlan  wrote:
> On 16 December 2016 at 20:57, Glyph Lefkowitz 
> wrote:
> Anyhow, Xcode is far from perfect - many of the places it touches the UNIX
> pipeline are extremely sharp edges you can easily impale yourself on (and
> don't get me started about codesigning) - but it nevertheless points at a
> different potential direction.  For example; why expose the concept of a
> "virtual environment" directly at all?  "New Project" could just create a
> requirements.txt and a for you, alongside a git repo and a
> virtualenv for that project.  Or, the UI could be geared towards setting up
> a tox.ini rather than a virtualenv, and run everything through tox so it's
> in an isolated environment with defined requirements.  This is a best
> practice anyway so why not make it easier to start early?
> This might all be way too much work, but I think it's important to
> remember it's possible.
> Yeah, I think we agree more than we disagree here. The main thing is that
> one of the key ways newcomer-friendly environments make themselves more
> approachable is to *constrain choice*.
> XCode usability benefits from being Apple-centric. Ditto for Visual Studio
> and MS.
> Linux and Python, by contrast, were both born out of a DIY culture where
> folks being free to choose their own tools was initially perceived solely
> as a highly desirable feature, rather than as a potential barrier to entry
> for newcomers.
> That means there's an argument to be made that something like YHat's Rodeo
> [1] might be a better starting point for data analytics in Python than
> jumping straight to Jupyter Notebook, and it's also why the Mu editor [2]
> exists as a dedicated tool for folks learning Python by way of the
> micro:bit project.
> [1]
> [2]
> However, the reason I brought up the Curse and Firefox GUI examples was to
> emphasise the problems they hide from the default rich client experience:
> - their default focus is on managing one environment per device
> In the analogous Python tool, one could replace "per device" with "per
> project" - and perhaps have a "default project" so something useful could
> happen even before you've decided what you're doing...
> But we've immediately bumped the complexity level up in doing so, and it's
> a level of complexity that many people initially spending all of their
> development time on a single project may not need.
> I thought this thread was already interminable, I look forward to reading
> the never-ending rest of it now that you've raised the grim spectre of the
> PyPI user-ratings feature from the dead :).
> All the arguments against integrating user ratings into a service that's
> focused on lowering barriers to publication still hold, so I'm really just
> noting that that decision to create a friendlier publishing environment
> *does* introduce some additional constraints elsewhere in the distribution
> pipeline.
> User-curated package sets strikes me as the _lowest_ priority feature out
> of all of those, if we are ordering by priority to deliver a good user
> experience.  I know 

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-16 Thread Steve Dower
"Alternatively, I've recently started using Visual Studio Code as my editor for 
work ..."

FWIW, the long-term Python story in VSCode is currently (largely) one of my 
responsibilities, and that bootstrapping flow is exactly one of the pieces I 
really want to put in. Unfortunately, nobody has let me have any engineers to 
implement it yet :( (mostly for big-company-politics reasons, rather than 
hiring trouble)

Top-posted from my Windows Phone

-Original Message-
From: "Nick Coghlan" <>
Sent: 12/16/2016 5:08
To: "Glyph Lefkowitz" <>
Cc: "Barry Warsaw" <>; "DistUtils mailing list" 
Subject: Re: [Distutils] Maintaining a curated set of Python packages

On 16 December 2016 at 20:57, Glyph Lefkowitz <> wrote:

Anyhow, Xcode is far from perfect - many of the places it touches the UNIX 
pipeline are extremely sharp edges you can easily impale yourself on (and don't 
get me started about codesigning) - but it nevertheless points at a different 
potential direction.  For example; why expose the concept of a "virtual 
environment" directly at all?  "New Project" could just create a 
requirements.txt and a for you, alongside a git repo and a virtualenv 
for that project.  Or, the UI could be geared towards setting up a tox.ini 
rather than a virtualenv, and run everything through tox so it's in an isolated 
environment with defined requirements.  This is a best practice anyway so why 
not make it easier to start early?

This might all be way too much work, but I think it's important to remember 
it's possible.

Yeah, I think we agree more than we disagree here. The main thing is that one 
of the key ways newcomer-friendly environments make themselves more 
approachable is to *constrain choice*.

XCode usability benefits from being Apple-centric. Ditto for Visual Studio and
MS.
Linux and Python, by contrast, were both born out of a DIY culture where folks 
being free to choose their own tools was initially perceived solely as a highly 
desirable feature, rather than as a potential barrier to entry for newcomers.

That means there's an argument to be made that something like YHat's Rodeo [1] 
might be a better starting point for data analytics in Python than jumping 
straight to Jupyter Notebook, and it's also why the Mu editor [2] exists as a 
dedicated tool for folks learning Python by way of the micro:bit project.



However, the reason I brought up the Curse and Firefox GUI examples was to 
emphasise the problems they hide from the default rich client experience:

- their default focus is on managing one environment per device

In the analogous Python tool, one could replace "per device" with "per project" 
- and perhaps have a "default project" so something useful could happen even 
before you've decided what you're doing...

But we've immediately bumped the complexity level up in doing so, and it's a 
level of complexity that many people initially spending all of their 
development time on a single project may not need. 

I thought this thread was already interminable, I look forward to reading the 
never-ending rest of it now that you've raised the grim spectre of the PyPI 
user-ratings feature from the dead :).

All the arguments against integrating user ratings into a service that's 
focused on lowering barriers to publication still hold, so I'm really just 
noting that that decision to create a friendlier publishing environment *does* 
introduce some additional constraints elsewhere in the distribution pipeline.

User-curated package sets strikes me as the _lowest_ priority feature out of 
all of those, if we are ordering by priority to deliver a good user experience. 
 I know "steam curators" have been brought up before - but we're talking about 
adding curators (one of my least favorite features of Steam, for what it's 
worth) before we've added "install game" ;-).

In many educational contexts, adding "install game" without support for 
institutional curators of some kind is a complete non-starter (even if those 
curators are a collaborative community like a Linux distribution, there's still 
more accountability than software publishing sites like PyPI tend to provide).

I initially wanted to disagree when I read this, but I'm not actually sure what 
educational contexts you're talking about, and why "accountability" is
important?
Schools, mainly. Lots of administrators are still scared of the internet, so 
one of the attractions of things like Raspberry Pi is that the software updates 
come from Debian rather than directly from the software publishers.

Sometimes you can get away with "What the bureaucracy doesn't know won't
hurt it", but it's more convenient when teachers don't have to do that.

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-16 Thread Daniel Holth
I'm also a visual studio code fan. It is the first editor I've tried that
feels lightweight like Vim but has the power of many plugins. That, and the
text rendering is excellent. is a lovely GUI package manager.

One possibility to consider is that virtualenv itself is a bad idea. Why
should the Python interpreter executable, rather than the program being
run, determine the set of packages that is available for import? It is
confusing and inconvenient to have to deal with environments at all. Yes,
even if you are using a helper. Maybe there can be a better way to manage
dependencies that is not completely disjoint from

On Fri, Dec 16, 2016 at 8:07 AM Nick Coghlan  wrote:

> On 16 December 2016 at 20:57, Glyph Lefkowitz 
> wrote:
> Anyhow, Xcode is far from perfect - many of the places it touches the UNIX
> pipeline are extremely sharp edges you can easily impale yourself on (and
> don't get me started about codesigning) - but it nevertheless points at a
> different potential direction.  For example; why expose the concept of a
> "virtual environment" directly at all?  "New Project" could just create a
> requirements.txt and a for you, alongside a git repo and a
> virtualenv for that project.  Or, the UI could be geared towards setting up
> a tox.ini rather than a virtualenv, and run everything through tox so it's
> in an isolated environment with defined requirements.  This is a best
> practice anyway so why not make it easier to start early?
> This might all be way too much work, but I think it's important to
> remember it's possible.
> Yeah, I think we agree more than we disagree here. The main thing is that
> one of the key ways newcomer-friendly environments make themselves more
> approachable is to *constrain choice*.
> XCode usability benefits from being Apple-centric. Ditto for Visual Studio
> and MS.
> Linux and Python, by contrast, were both born out of a DIY culture where
> folks being free to choose their own tools was initially perceived solely
> as a highly desirable feature, rather than as a potential barrier to entry
> for newcomers.
> That means there's an argument to be made that something like YHat's Rodeo
> [1] might be a better starting point for data analytics in Python than
> jumping straight to Jupyter Notebook, and it's also why the Mu editor [2]
> exists as a dedicated tool for folks learning Python by way of the
> micro:bit project.
> [1]
> [2]
> However, the reason I brought up the Curse and Firefox GUI examples was to
> emphasise the problems they hide from the default rich client experience:
> - their default focus is on managing one environment per device
> In the analogous Python tool, one could replace "per device" with "per
> project" - and perhaps have a "default project" so something useful could
> happen even before you've decided what you're doing...
> But we've immediately bumped the complexity level up in doing so, and it's
> a level of complexity that many people initially spending all of their
> development time on a single project may not need.
> I thought this thread was already interminable, I look forward to reading
> the never-ending rest of it now that you've raised the grim spectre of the
> PyPI user-ratings feature from the dead :).
> All the arguments against integrating user ratings into a service that's
> focused on lowering barriers to publication still hold, so I'm really just
> noting that that decision to create a friendlier publishing environment
> *does* introduce some additional constraints elsewhere in the distribution
> pipeline.
> User-curated package sets strikes me as the _lowest_ priority feature out
> of all of those, if we are ordering by priority to deliver a good user
> experience.  I know "steam curators" have been brought up before - but
> we're talking about adding curators (one of my least favorite features of
> Steam, for what it's worth) before we've added "install game" ;-).
> In many educational contexts, adding "install game" without support for
> institutional curators of some kind is a complete non-starter (even if
> those curators are a collaborative community like a Linux distribution,
> there's still more accountability than software publishing sites like PyPI
> tend to provide).
> I initially wanted to disagree when I read this, but I'm not actually sure
> what educational contexts you're talking about, and why "accountability" is
> important?
> Schools, mainly. Lots of administrators are still scared of the internet,
> so one of the attractions of things like Raspberry Pi is that the software
> updates come from Debian rather than directly from the software publishers.
> Sometimes you can get away with "What the bureaucracy doesn't know won't
> hurt it", but it's more convenient when teachers don't have to do that.

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-16 Thread Nick Coghlan
On 16 December 2016 at 20:57, Glyph Lefkowitz wrote:

> Anyhow, Xcode is far from perfect - many of the places it touches the UNIX
> pipeline are extremely sharp edges you can easily impale yourself on (and
> don't get me started about codesigning) - but it nevertheless points at a
> different potential direction.  For example; why expose the concept of a
> "virtual environment" directly at all?  "New Project" could just create a
> requirements.txt and a for you, alongside a git repo and a
> virtualenv for that project.  Or, the UI could be geared towards setting up
> a tox.ini rather than a virtualenv, and run everything through tox so it's
> in an isolated environment with defined requirements.  This is a best
> practice anyway so why not make it easier to start early?
> This might all be way too much work, but I think it's important to
> remember it's possible.

Yeah, I think we agree more than we disagree here. The main thing is that
one of the key ways newcomer-friendly environments make themselves more
approachable is to *constrain choice*.

Xcode's usability benefits from being Apple-centric. Ditto for Visual Studio
and MS.

Linux and Python, by contrast, were both born out of a DIY culture where
folks being free to choose their own tools was initially perceived solely
as a highly desirable feature, rather than as a potential barrier to entry
for newcomers.

That means there's an argument to be made that something like YHat's Rodeo
[1] might be a better starting point for data analytics in Python than
jumping straight to Jupyter Notebook, and it's also why the Mu editor [2]
exists as a dedicated tool for folks learning Python by way of the
micro:bit project.


> However, the reason I brought up the Curse and Firefox GUI examples was to
> emphasise the problems they hide from the default rich client experience:
> - their default focus is on managing one environment per device
> In the analogous Python tool, one could replace "per device" with "per
> project" - and perhaps have a "default project" so something useful could
> happen even before you've decided what you're doing...

But we've immediately bumped the complexity level up in doing so, and it's
a level of complexity that many people initially spending all of their
development time on a single project may not need.

> I thought this thread was already interminable, I look forward to reading
> the never-ending rest of it now that you've raised the grim spectre of the
> PyPI user-ratings feature from the dead :).

All the arguments against integrating user ratings into a service that's
focused on lowering barriers to publication still hold, so I'm really just
noting that that decision to create a friendlier publishing environment
*does* introduce some additional constraints elsewhere in the distribution

>>> User-curated package sets strikes me as the _lowest_ priority feature out
>>> of all of those, if we are ordering by priority to deliver a good user
>>> experience.  I know "steam curators" have been brought up before - but
>>> we're talking about adding curators (one of my least favorite features of
>>> Steam, for what it's worth) before we've added "install game" ;-).
>> In many educational contexts, adding "install game" without support for
>> institutional curators of some kind is a complete non-starter (even if
>> those curators are a collaborative community like a Linux distribution,
>> there's still more accountability than software publishing sites like PyPI
>> tend to provide).
> I initially wanted to disagree when I read this, but I'm not actually sure
> what educational contexts you're talking about, and why "accountability" is
> important?

Schools, mainly. Lots of administrators are still scared of the internet,
so one of the attractions of things like Raspberry Pi is that the software
updates come from Debian rather than directly from the software publishers.

Sometimes you can get away with "What the bureaucracy doesn't know won't
hurt it", but it's more convenient when teachers don't have to do that.

> "beginner" is a direction, and not a fixed position; many people more
> "beginner" than the current audience could be well-served by a discoverable
> initial project-creation and REPL UI.  While I don't doubt that some
> backend pieces might help (although I still don't see how the one being
> discussed would), I also think that it would be very hard to say that the
> back-end is a *limiting factor* in UX improvement for the Python
> onboarding process; the front end could move quite a bit up the value chain
> without touching any of the various backends it would need to interact with.
> But of course, if I really wanted to make this point, I'd just write it;
> dstufft is certainly right that volunteer time is not fungible.  If I'm
> lucky, I'll have the time to do that at some point, since my efforts to

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-16 Thread Glyph Lefkowitz

> On Dec 15, 2016, at 9:23 PM, Nick Coghlan wrote:
> On 16 December 2016 at 14:29, Glyph Lefkowitz wrote:
>> On Dec 15, 2016, at 8:18 PM, Nick Coghlan wrote:
> At the beginning of your story you mentioned the GUI client - that is the 
> missing piece ;).  I've been saying for years that we need a that 
> lets you easily bootstrap all this stuff: walk you through installing C dev 
> tools if your packages need them, present a GUI search interface to finding 
> packages, present a normal "file->open" dialog for selecting a location for a 
> new virtualenv, automatically pop open a terminal, launching a Jupyter 
> notebook whose kernel is pointed at said environment...
> It isn't really, as we started looking at this for IDLE, and the entire 
> current UX is just fundamentally beginner hostile:
> - virtual environments are hard
> - requirements files are hard
> - knowing what packages are trustworthy and worth your time is hard
> - limiting students to a set of "known safe" packages is hard
> - components that assume command line use are hard
> They're especially hard if the only way to distribute a fix is to release an 
> entire new edition of CPython rather than having IDLE talk to a (preferably 
> configurable) backend cloud service for updated instructions.
> So there's a reason so many learning and even full development environments 
> are moving online - they let the service provider deal with all the hassles 
> of providing an appropriately configured environment, while the students can 
> focus on learning how to code, and the developers can focus on defining their 
> application logic.

None of what you're saying is wrong here, so I don't want to disagree.

But, I think this is just one perspective; i.e. moving to a cloud environment 
is one approach to providing a more circumscribed environment, but embracing 
endpoint sandboxing is another.  For example, learning how to use Xcode is a 
fundamentally different (and easier!) sort of experience than learning the 
traditional UNIX development pipeline, due in large part to the fact that it 
provides a unified, discoverable interface.  This is despite the fact that 
Xcode projects are actually substantially more complex than their UNIX-y 
equivalents, due to the high levels of coupling and complexity in the way that 
you have to interface with certain system services (signing with entitlements, 
bundle metadata, etc).

You still have to retrieve many resources from the cloud - simulators, 
documentation, SDKs - but the UI tells you that you need those things, and 
straightforwardly automates the process of getting them.  Everything else that 
goes into a development project is not "environment setup", but a part of the 
Xcode project itself.  Similarly, version control (a git repository) is nearly 
implicitly a part of the project. It's tricky to even create one without a VCS 
backing it any more.

Anyhow, Xcode is far from perfect - many of the places it touches the UNIX 
pipeline are extremely sharp edges you can easily impale yourself on (and don't 
get me started about codesigning) - but it nevertheless points at a different 
potential direction.  For example, why expose the concept of a "virtual 
environment" directly at all?  "New Project" could just create a 
requirements.txt and a for you, alongside a git repo and a virtualenv 
for that project.  Or, the UI could be geared towards setting up a tox.ini 
rather than a virtualenv, and run everything through tox so it's in an isolated 
environment with defined requirements.  This is a best practice anyway so why 
not make it easier to start early?
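
As a rough illustration of how little a "New Project" action would need to do,
here is a minimal sketch using only the standard library. The function name
new_project, the ".venv" directory name and the empty requirements.txt are
assumptions made for this example, not part of any existing tool; the git
repo and tox.ini steps are left out.

```python
import tempfile
import venv
from pathlib import Path


def new_project(root):
    """Create a project directory with a requirements.txt and its own venv.

    Hypothetical helper: the layout (requirements.txt next to a ".venv"
    directory) is an assumption for this sketch, not an existing tool.
    """
    project = Path(root)
    project.mkdir(parents=True, exist_ok=True)

    # Start with an empty requirements file that a UI could append to later.
    requirements = project / "requirements.txt"
    requirements.touch()

    # with_pip=False keeps the sketch fast and offline; a real tool would
    # most likely want pip available inside the environment.
    env_dir = project / ".venv"
    venv.create(env_dir, with_pip=False)
    return requirements, env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reqs, env = new_project(Path(tmp) / "demo")
        print(reqs.name, (env / "pyvenv.cfg").exists())
```

The point of the sketch is only that the user never has to be shown the
virtual environment as a separate concept; it is created as a side effect of
creating the project.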

This might all be way too much work, but I think it's important to remember 
it's possible.

> However, the reason I brought up the Curse and Firefox GUI examples was to 
> emphasise the problems they hide from the default rich client experience:
> - their default focus is on managing one environment per device

In the analogous Python tool, one could replace "per device" with "per project" 
- and perhaps have a "default project" so something useful could happen even 
before you've decided what you're doing...

> - they both may require environment restarts for changes to take effect

... one could just put a little blinking red light on any jupyter windows whose 
kernels need to be restarted :) ...

> - they both reference an at least somewhat moderated back end (by Curse in 
> the Curse client case, by Mozilla in the Firefox case)
> - they both incorporate popularity metrics and addon ratings into the client 
> experience

I thought this thread was already interminable, I look forward to reading the 
never-ending rest of it now that you've raised the grim spectre of the PyPI 
user-ratings feature from the dead :).

> Mobile app store clients also share those four 

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Wes Turner
On Saturday, December 10, 2016, Wes Turner  wrote:

> Here are some standardized (conda) package versions:
> jupyter/docker-stacks/blob/master/scipy-notebook/Dockerfile
IDK how they choose packages - what "criteria for inclusion" - for the
kaggle/docker-python Dockerfile:

(Ubuntu, Conda, *, TPOT)

> On Thursday, December 8, 2016, Wes Turner wrote:
>> On Thursday, December 8, 2016, Nick Coghlan wrote:
>>> Putting the conclusion first, I do see value in better publicising
>>> "Recommended libraries" based on some automated criteria like:
>>> - recommended in the standard library documentation
>>> - available via 1 or more cross-platform commercial Python redistributors
>>> - available via 1 or more Linux distro vendors
>>> - available via 1 or more web service development platforms
>> So these would be attributes tracked by a project maintainer and verified
>> by the known-good-set maintainer? Or?
>> (Again, here I reach for JSONLD. "count n" is only so useful; *which*
>> {[re]distros, platforms, heartfelt testimonials from incredible experts}
>> URLs )
>> - test coverage
>> - seclist contact info AND procedures
>> - more than one super admin maintainer
>> - what other criteria should/could/can we use to vet open source
>> libraries?
>>> That would be a potentially valuable service for folks new to the
>>> world of open source that are feeling somewhat overwhelmed by the
>>> sheer number of alternatives now available to them.
>>> However, I also think that would better fit in with the aims of an
>>> open source component tracking community like than it
>>> does a publisher-centric community like distutils-sig.
>> IDK if libraries are really in scope for stackshare. The up/down voting
>> feature is pretty cool.
>>> The further comments below are just a bit more background on why I
>>> feel the integration testing aspect of the suggestion isn't likely to
>>> be particularly beneficial :)
>> A catch-all for testing bits from application-specific integration test
>> suites could be useful (and would likely require at least docker-compose,
>> dox, kompose for working with actual data stores)
>>> On 9 December 2016 at 01:10, Barry Warsaw  wrote:
>>> > Still, there may be value in inter-Python package compatibility tests, but
>>> > it'll take serious engineering effort (i.e. $ and time), ongoing maintenance,
>>> > ongoing effort to fix problems, and tooling to gate installability of failing
>>> > packages (with overrides for downstreams which don't care or already expend
>>> > such effort).
>>> I think this is really the main issue, as both desktop and server
>>> environments are moving towards the integrated platform + isolated
>>> applications approach popularised by mobile devices.
>>> That means we end up with two very different variants of automated
>>> integration testing:
>>> - the application focused kind offered by the likes of and
>>> (i.e. monitor for dependency updates, submit PRs to trigger
>>> app level CI)
>>> - the platform focused kind employed by distro vendors (testing all
>>> the platform components work together, including the app isolation
>>> features)
>>> The first kind makes sense if you're building something that runs *on*
>>> platforms (Docker containers, Snappy or FlatPak apps, web services,
>>> mobile apps, etc).
>>> The second kind inevitably ends up intertwined with the component
>>> review and release engineering systems of the particular platform, so
>>> it becomes really hard to collaborate cross-platform outside the
>>> context of specific projects like OpenStack that provide clear
>>> definitions for "What components do we collectively depend on that we
>>> need to test together?" and "What does 'working' mean in the context
>>> of this project?".
>>> Accordingly, for an initiative like this to be successful, it would
>>> need to put some thought up front into the questions of:
>>> 1. Who are the intended beneficiaries of the proposal?
>>> 2. What problem does it address that will prompt them to contribute
>>> time and/or money to solving it?
>>> 3. What do we expect people to be able to *stop doing* if the project
>>> proves successful?
>>> For platform providers, a generic "stdlib++" project wouldn't really
>>> reduce the amount of integration testing we'd need to do ourselves (we
>>> already don't test arbitrary combinations of dependencies, just the
>>> ones we provide at any given point in time).
>>> For application and service developers, the approach of pinning
>>> dependencies to specific versions and treating updates like any other
>>> source code 

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Nick Coghlan
On 16 December 2016 at 14:29, Glyph Lefkowitz wrote:

> On Dec 15, 2016, at 8:18 PM, Nick Coghlan  wrote:
> At the beginning of your story you mentioned the GUI client - *that* is
> the missing piece ;).  I've been saying for years that we need a
> that lets you easily bootstrap all this stuff: walk you through installing
> C dev tools if your packages need them, present a GUI search interface to
> finding packages, present a normal "file->open" dialog for selecting a
> location for a new virtualenv, automatically pop open a terminal, launching
> a Jupyter notebook whose kernel is pointed at said environment...

It isn't really, as we started looking at this for IDLE, and the entire
current UX is just fundamentally beginner hostile:

- virtual environments are hard
- requirements files are hard
- knowing what packages are trustworthy and worth your time is hard
- limiting students to a set of "known safe" packages is hard
- components that assume command line use are hard

They're especially hard if the only way to distribute a fix is to release
an entire new edition of CPython rather than having IDLE talk to a
(preferably configurable) backend cloud service for updated instructions.

So there's a reason so many learning and even full development environments
are moving online - they let the service provider deal with all the hassles
of providing an appropriately configured environment, while the students
can focus on learning how to code, and the developers can focus on defining
their application logic.

However, the reason I brought up the Curse and Firefox GUI examples was to
emphasise the problems they hide from the default rich client experience:

- their default focus is on managing one environment per device
- they both may require environment restarts for changes to take effect
- they both reference an at least somewhat moderated back end (by Curse in
the Curse client case, by Mozilla in the Firefox case)
- they both incorporate popularity metrics and addon ratings into the
client experience

Mobile app store clients also share those four characteristics (where
"number of installations" and "star ratings" are critically important to
search rankings, but gaming the latter is mitigated by hiding the "Write a
review" feature if you haven't actually installed the app anywhere)

> User-curated package sets strikes me as the _lowest_ priority feature out
> of all of those, if we are ordering by priority to deliver a good user
> experience.  I know "steam curators" have been brought up before - but
> we're talking about adding curators (one of my least favorite features of
> Steam, for what it's worth) before we've added "install game" ;-).

In many educational contexts, adding "install game" without support for
institutional curators of some kind is a complete non-starter (even if
those curators are a collaborative community like a Linux distribution,
there's still more accountability than software publishing sites like PyPI
tend to provide).

> Users might even figure out this sort of stuff for themselves if they are
> given a discoverable API for things like search and installation of
> packages.

That sounds a bit like agreement that we're still missing some of the
backend pieces needed to make a beginner-friendly client really viable :)


Nick Coghlan   |   Brisbane, Australia

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Wes Turner
On Thursday, December 15, 2016, Nick Coghlan  wrote:

> On 16 December 2016 at 05:50, Paul Moore wrote:
> > On 15 December 2016 at 19:13, Wes Turner wrote:
> >>> Just to add my POV, I also find your posts unhelpful, Wes. There's not
> >>> enough information for me to evaluate what you say, and you offer no
> >>> actual solutions to what's being discussed.
> >>
> >>
> >> I could quote myself suggesting solutions in this thread, if you like?
> >
> > You offer lots of pointers to information. But that's different.
> Exactly. There are *lots* of information processing standards out
> there, and lots of things we *could* provide natively that simply
> aren't worth the hassle since folks that care can provide them as
> "after market addons" for the audiences that consider them relevant.
> For example, a few things that can matter to different audiences are:
> - SPDX (Software Package Data Exchange) identifiers for licenses
> - CPE (Common Platform Enumeration) and SWID (Software Identification)
> tags for published software
> - DOI (Digital Object Identifier) tags for citation purposes
> - Common Criteria certification for software supply chains

In RDFS, these are called properties.

It takes very little effort to add additional properties. If an
unqualified attribute is not listed in a JSON-LD @context, it can still be
added by specifying its full URI.
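
A minimal sketch of what that looks like in practice, written as a plain
Python dict. The package name, the "seclist" property and the example.org
URI are made up for illustration; only the "@context" keyword and the use of
a full URI as a key are JSON-LD conventions.

```python
import json

# Hypothetical package record, for illustration only.
record = {
    "@context": {
        # "name" is an unqualified attribute mapped to a URI by the context.
        "name": "http://schema.org/name",
    },
    "name": "example-package",
    # Not listed in the context, so it is spelled out as a full URI instead.
    "http://example.org/vocab#seclist": "security@example.org",
}


def unmapped_keys(doc):
    """Return keys that are neither JSON-LD keywords nor context terms."""
    terms = set(doc.get("@context", {}))
    return [key for key in doc if not key.startswith("@") and key not in terms]


if __name__ == "__main__":
    print(json.dumps(record, indent=2))
    print(unmapped_keys(record))
```

Whether a consumer does anything useful with the extra property is a separate
question; the sketch only shows that adding one does not require changing the
context.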

> I don't push for these upstream in distutils-sig not because I don't
> think they're important in general, but because I *don't think they're
> a priority for distutils-sig*. If you're teaching Python to school
> students, or teaching engineers and scientists how to better analyse
> their own data, or building a web service for yourself or your
> employer, these kinds of things simply don't matter.

#31 lists a number of advantages.
OTOMH, CVE security reports could be linked to the project/package URI (and
thus displayed along with the project detail page)

> The end users that care about them are well-positioned to tackle them
> on their own (or pay other organisations to do it for them), and
> because they span arbitrary publishing communities anyway, it doesn't
> really matter all that much if any given publishing community
> participates directly in the process (the only real beneficiaries are
> the intermediaries that actively blur the distinctions between the
> cooperative communities and the recalcitrant ones).

Linked Data minimizes

> > Anyway, let's just agree to differ - I can skip your mails if they
> > aren't helpful to me, and you don't need to bother about the fact that
> > you're not getting your points across to me.
> I consider it fairly important that we have a reasonably common
> understanding of the target userbase for direct consumption of PyPI
> data, and what we expect to be supplied as third party services. It's
> also important that we have a shared understanding of how to
> constructively frame proposals for change.

When I can afford the time, I'll again take a look at fixing the metadata
specification once and for all by (1) defining an @context for the existing
metadata, and (2) producing an additional pydist.jsonld TODO metadata
document (because the releases are currently keyed by version), and (3)
adding the model attribute and view to Warehouse.

> For the former, the Semantic Web, and folks that care about Semantic
> Web concepts like "Linked Data" in the abstract sense are not part of
> our primary audience. We don't go out of our way to make their lives
> difficult, but "it makes semantic analysis easier" also isn't a
> compelling rationale for change.

Unfortunately, you types are not well-versed in the problems that Linked
Data solves: it's all your data in your schema in your database; and URIs
are far less useful than RAM-local references (pointers).

See: BP-LD

> For the latter, some variants of constructive proposals look like:
> - "this kind of user has this kind of problem and this proposed
> solution will help mitigate it this way (and, by the way, here's an
> existing standard we can use)"
> - "this feature exists in , it's really
> valuable to users for , how about we offer it by
> default?"
> - "I wrote  for myself, and I think it would also help others
> for , can you help me make it more widely known and
> available?"

One could stuff additional metadata in # comments of a requirements.txt,
but that would be an ad-hoc parsing scheme with a SPOF tool dependency.

> They don't look like "Here's a bunch of technologies and organisations
> that exist on the internet that may in some way potentially be
> relevant to the management of a software distribution network", and
> nor does it look like "This data modeling standard exists, so we
> should use it, even though it doesn't actually simplify our lives or
> our users' lives in any way, and in fact makes them more complicated".


Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Glyph Lefkowitz

> On Dec 15, 2016, at 8:33 PM, Donald Stufft wrote:
>> On Dec 15, 2016, at 11:29 PM, Glyph Lefkowitz wrote:
>> User-curated package sets strikes me as the _lowest_ priority feature out of 
>> all of those
> I don’t think anyone in the PyPA is planning on working on this currently. It 
> was a possible idea that was spawned from this thread. However the nature of 
> volunteer OSS is that volunteer time is not fungible and if someone feels 
> particularly enthused about this idea they are free to pursue it.

I did very consciously choose my words there: "strikes me as", not "is" ;-)



Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Donald Stufft

> On Dec 15, 2016, at 11:29 PM, Glyph Lefkowitz  wrote:
> User-curated package sets strikes me as the _lowest_ priority feature out of 
> all of those

I don’t think anyone in the PyPA is planning on working on this currently. It 
was a possible idea that was spawned from this thread. However the nature of 
volunteer OSS is that volunteer time is not fungible and if someone feels 
particularly enthused about this idea they are free to pursue it.

Donald Stufft


Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Glyph Lefkowitz

> On Dec 15, 2016, at 8:18 PM, Nick Coghlan wrote:
> On 16 December 2016 at 07:14, Glyph Lefkowitz wrote:
>> On Dec 15, 2016, at 6:39 AM, Donald Stufft wrote:
>> Theoretically we could allow people to not just select packages, but also 
>> package specifiers for their “curated package set”, so instead of saying 
>> “requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we 
>> really wanted to get slick we could even provide a requirements.txt file 
>> format, and have people able to install the entire set by doing something 
>> like:
>> $ pip install -r 
> Can't people already do this by publishing a package that just depends on 
> their whole 'package set'?
> Technically, sure, but it adds a lot of overhead. The closest equivalent 
> right now would be maintaining a source control repo somewhere with various 
> requirements files in it.
> However, at an ecosystem level, that doesn't have the same user experience 
> impact. The idea of building this into PyPI itself would be to *reshape the 
> learning curve of how people learn about dependency management as they're 
> introduced to Python*.
> Going back to the CurseGaming example, I actually use the free version of 
> their client to manage the Warcraft addons on my gaming PC. The basic usage 
> model is really simple and (not coincidentally) very similar to the way the 
> Add-on manager works in Firefox and other GUI apps with integrated plugin 
> managers:
> - you have an "Installed" tab for the addons you have installed
> - when you start the client, it checks for updates for all your installed 
> addons and the out of date ones gain an "Update" button
> - there's a separate tab where you can search all the available addons and 
> install new ones
> I've never used any of Curse's other clients (like the Minecraft or Kerbal 
> Space Program ones), but I assume they operate in a similar way.
> The paid tier of the Curse Client, and the account sync feature of Firefox, 
> then offer the ability to synchronize your installed addons across machines. 
> (There are also a lot of similarities between this model and the way mobile 
> app stores work)
> A comparable UX for Python/PyPI/pip would focus less on the 
> library-and-application development cases (where the presence of source 
> control is assumed), and more on the ad hoc scripting and learning-to-program 
> use cases, where you're typically more interested in "--user" installations 
> and the question of which parts of the Python ecosystem are just an import 
> away than you are in reproducibility and maintainability.
> The ecosystem level learning curve then becomes:
> - did you know you can back up your list of user installed packages to PyPI?
> - did you know you can use PyPI to sync your user installs between systems?
> - did you know you can use PyPI to categorise your user installs and share 
> them with others?
> - OK, now it's time to start learning about version control, virtual 
> environments and automated testing
> It wouldn't necessarily make sense to bake this *directly* into Warehouse, 
> and the Mozilla folks responsible for Firefox Sync could no doubt offer real 
> world guidance on the infrastructure and support needed to operate a service 
> like that at scale, but the core concept of allowing package management to be 
> introduced independently of both version control and virtual environments 
> sounds potentially valuable to me.

Yeah, I think that this focus on curating packages on PyPI is reminiscent of 
the old yarn about looking for lost keys under the streetlight because it's 
dark everywhere else.  We're all familiar with web services and data formats, 
so we want to somehow have a data format or a web service be the answer to this 
problem.  But I don't believe that's where the problem is.

("this problem" being "let's make it easy and fun to a) bootstrap a common 
Python experimentation environment across multiple machines and b) _know that 
you have to do that_")

At the beginning of your story you mentioned the GUI client - that is the 
missing piece ;).  I've been saying for years that we need a that 
lets you easily bootstrap all this stuff: walk you through installing C dev 
tools if your packages need them, present a GUI search interface to finding 
packages, present a normal "file->open" dialog for selecting a location for a 
new virtualenv, automatically pop open a terminal, launching a Jupyter notebook 
whose kernel is pointed at said environment...

User-curated package sets strikes me as the _lowest_ priority feature out of 
all of those, if we are ordering by priority to deliver a good user experience. 
 I know "steam curators" have been 

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Nick Coghlan
On 16 December 2016 at 07:14, Glyph Lefkowitz wrote:

> On Dec 15, 2016, at 6:39 AM, Donald Stufft  wrote:
> Theoretically we could allow people to not just select packages, but also
> package specifiers for their “curated package set”, so instead of saying
> “requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we
> really wanted to get slick we could even provide a requirements.txt file
> format, and have people able to install the entire set by doing something
> like:
> $ pip install -r
> my-cool-set/requirements.txt
> Can't people already do this by publishing a package that just depends on
> their whole 'package set'?

Technically, sure, but it adds a lot of overhead. The closest equivalent
right now would be maintaining a source control repo somewhere with various
requirements files in it.
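
For reference, the two specifier styles quoted above behave differently:
"==" pins one exact release, while "~=" (the PEP 440 "compatible release"
operator) accepts later releases within the same series. The following is a
simplified sketch for plain dotted numeric versions only; the helper names
are hypothetical, and a real tool would rely on an existing implementation
rather than this hand-rolled one.

```python
def parse(version):
    """Turn '2.12.2' into (2, 12, 2). Only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def pad(parts, width):
    """Right-pad with zeros so that 2.12 compares equal to 2.12.0."""
    return parts + (0,) * (width - len(parts))


def matches(version, specifier):
    """Check a version against a single '==' or '~=' specifier (simplified)."""
    operator, wanted = specifier[:2], parse(specifier[2:])
    width = max(len(parse(version)), len(wanted))
    candidate = pad(parse(version), width)
    if operator == "==":
        return candidate == pad(wanted, width)
    if operator == "~=":
        if len(wanted) < 2:
            raise ValueError("~= needs at least two version components")
        # Everything but the last given component must match exactly,
        # and the candidate must not be older than the given version.
        prefix = wanted[:-1]
        return candidate[:len(prefix)] == prefix and candidate >= pad(wanted, width)
    raise ValueError(f"unsupported specifier: {specifier}")


if __name__ == "__main__":
    for candidate in ("2.11.1", "2.12.2", "2.13.0", "3.0.0"):
        print(candidate, matches(candidate, "~=2.12"), matches(candidate, "==2.12.2"))
```

So a curated set that says "requests~=2.12" keeps picking up new 2.x
releases, while "requests==2.12.2" stays frozen until the curator edits it.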

However, at an ecosystem level, that doesn't have the same user experience
impact. The idea of building this into PyPI itself would be to *reshape the
learning curve of how people learn about dependency management as they're
introduced to Python*.

Going back to the CurseGaming example, I actually use the free version of
their client to manage the Warcraft addons on my gaming PC. The basic usage
model is really simple and (not coincidentally) very similar to the way the
Add-on manager works in Firefox and other GUI apps with integrated plugin

- you have an "Installed" tab for the addons you have installed
- when you start the client, it checks for updates for all your installed
addons and the out of date ones gain an "Update" button
- there's a separate tab where you can search all the available addons and
install new ones

I've never used any of Curse's other clients (like the Minecraft or Kerbal
Space Program ones), but I assume they operate in a similar way.

The paid tier of the Curse Client, and the account sync feature of Firefox,
then offer the ability to synchronize your installed addons across
machines. (There are also a lot of similarities between this model and the
way mobile app stores work)

A comparable UX for Python/PyPI/pip would focus less on the
library-and-application development cases (where the presence of source
control is assumed), and more on the ad hoc scripting and
learning-to-program use cases, where you're typically more interested in
"--user" installations and the question of which parts of the Python
ecosystem are just an import away than you are in reproducibility and
maintainability.
The ecosystem level learning curve then becomes:

- did you know you can back up your list of user installed packages to PyPI?
- did you know you can use PyPI to sync your user installs between systems?
- did you know you can use PyPI to categorise your user installs and share
them with others?
- OK, now it's time to start learning about version control, virtual
environments and automated testing
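
The client-side half of the "back up your list of installed packages" step
can be sketched with the standard library alone. Note the assumptions: PyPI
offers no such backup or sync service today, so this only produces the list
that a hypothetical service would store, and it reports every distribution
visible to the running interpreter rather than only "--user" installs.

```python
from importlib import metadata


def pinned_lines(installed):
    """Format (name, version) pairs as sorted, pinned requirement lines."""
    lines = (f"{name}=={version}" for name, version in installed)
    return sorted(lines, key=str.lower)


def installed_distributions():
    """Yield (name, version) for each distribution this interpreter can see."""
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata is missing or broken
            yield name, dist.version


if __name__ == "__main__":
    # Print a requirements-style snapshot of the current environment.
    for line in pinned_lines(installed_distributions()):
        print(line)
```

The output is just a requirements-style text file, which is part of why the
later step of introducing requirements files and version control can build on
it rather than replace it.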

It wouldn't necessarily make sense to bake this *directly* into Warehouse,
and the Mozilla folks responsible for Firefox Sync could no doubt offer
real world guidance on the infrastructure and support needed to operate a
service like that at scale, but the core concept of allowing package
management to be introduced independently of both version control and
virtual environments sounds potentially valuable to me.


Nick Coghlan   |   Brisbane, Australia

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Nick Coghlan
On 16 December 2016 at 05:50, Paul Moore  wrote:
> On 15 December 2016 at 19:13, Wes Turner  wrote:
>>> Just to add my POV, I also find your posts unhelpful, Wes. There's not
>>> enough information for me to evaluate what you say, and you offer no
>>> actual solutions to what's being discussed.
>> I could quote myself suggesting solutions in this thread, if you like?
> You offer lots of pointers to information. But that's different.

Exactly. There are *lots* of information processing standards out
there, and lots of things we *could* provide natively that simply
aren't worth the hassle since folks that care can provide them as
"after market addons" for the audiences that consider them relevant.

For example, a few things that can matter to different audiences are:

- SPDX (Software Package Data Exchange) identifiers for licenses
- CPE (Common Platform Enumeration) and SWID (Software Identification)
tags for published software
- DOI (Digital Object Identifier) tags for citation purposes
- Common Criteria certification for software supply chains

I don't push for these upstream in distutils-sig not because I don't
think they're important in general, but because I *don't think they're
a priority for distutils-sig*. If you're teaching Python to school
students, or teaching engineers and scientists how to better analyse
their own data, or building a web service for yourself or your
employer, these kinds of things simply don't matter.

The end users that care about them are well-positioned to tackle them
on their own (or pay other organisations to do it for them), and
because they span arbitrary publishing communities anyway, it doesn't
really matter all that much if any given publishing community
participates directly in the process (the only real beneficiaries are
the intermediaries that actively blur the distinctions between the
cooperative communities and the recalcitrant ones).

> Anyway, let's just agree to differ - I can skip your mails if they
> aren't helpful to me, and you don't need to bother about the fact that
> you're not getting your points across to me.

I consider it fairly important that we have a reasonably common
understanding of the target userbase for direct consumption of PyPI
data, and what we expect to be supplied as third party services. It's
also important that we have a shared understanding of how to
constructively frame proposals for change.

For the former, the Semantic Web, and folks that care about Semantic
Web concepts like "Linked Data" in the abstract sense are not part of
our primary audience. We don't go out of our way to make their lives
difficult, but "it makes semantic analysis easier" also isn't a
compelling rationale for change.

For the latter, some variants of constructive proposals look like:

- "this kind of user has this kind of problem and this proposed
solution will help mitigate it this way (and, by the way, here's an
existing standard we can use)"
- "this feature exists in , it's really
valuable to users for , how about we offer it by
- "I wrote  for myself, and I think it would also help others
for , can you help me make it more widely known and

They don't look like "Here's a bunch of technologies and organisations
that exist on the internet that may in some way potentially be
relevant to the management of a software distribution network", and
nor does it look like "This data modeling standard exists, so we
should use it, even though it doesn't actually simplify our lives or
our users' lives in any way, and in fact makes them more complicated".

> Who knows, one day I
> might find the time to look into JSON-LD, at which point I may or may
> not understand why you think it's such a useful tool for solving all
> these problems (in spite of the fact that no-one else seems to think
> the same...)

I *have* looked at JSON-LD (based primarily on Wes's original
suggestions), both from the perspective of the Python packaging
ecosystem specifically, as well as my day job working on software
supply chain management.

My verdict was that for managing a dependency graph implementation, it
ends up in the category of technologies that qualify as "interesting,
but not helpful". In many ways, it's the urllib2 of data linking -
just as urllib2 gives you a URL handling framework which you can
configure to handle HTTP rather than just providing a HTTP-specific
interface the way requests does [1], JSON-LD gives you a data linking
framework, which you can then use to define links between your data,
rather than just linking the data directly in a domain-appropriate
fashion. Using a framework for the sake of using a framework rather
than out of a genuine engineering need doesn't tend to lead to good
software systems.

Wes seems to think that my perspective on this is born out of
ignorance, so repeatedly bringing it up may make me change my point of
view. However, our problems haven't changed, and the nature 

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Glyph Lefkowitz

> On Dec 15, 2016, at 6:39 AM, Donald Stufft  wrote:
>> On Dec 15, 2016, at 9:35 AM, Steve Dower > > wrote:
>> The "curated package sets" on PyPI idea sounds a bit like Steam's curator 
>> lists, which I like to think of as Twitter for game reviews. You can follow 
>> a curator to see their comments on particular games, and the most popular 
>> curators have their comments appear on the actual listings too.
>> Might be interesting to see how something like that worked for PyPI, though 
>> the initial investment is pretty high. (It doesn't solve the coherent bundle 
>> problem either, just the discovery of good libraries problem.)
> Theoretically we could allow people to not just select packages, but also 
> package specifiers for their “curated package set”, so instead of saying 
> “requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we 
> really wanted to get slick we could even provide a requirements.txt file 
> format, and have people able to install the entire set by doing something 
> like:
> $ pip install -r 

Can't people already do this by publishing a package that just depends on their 
whole 'package set'?
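
A rough sketch of what such a dependency-only package could look like is below. It renders the source of a setup.py whose only job is to declare install_requires; the helper name is hypothetical, and the set name "my-cool-set" is simply the placeholder used elsewhere in this thread.

```python
from textwrap import dedent


def render_setup_py(name, version, requirements):
    """Return setup.py source for a package that contains no code of its own.

    Hypothetical helper for illustration: installing the resulting package
    would pull in everything listed in install_requires.
    """
    template = dedent("""\
        from setuptools import setup

        setup(
            name={name!r},
            version={version!r},
            packages=[],  # nothing to import, only dependencies
            install_requires={requirements!r},
        )
        """)
    return template.format(
        name=name, version=version, requirements=list(requirements)
    )
```

Calling render_setup_py("my-cool-set", "1.0", ["requests~=2.12"]) and publishing the result would let "pip install my-cool-set" install the whole set, at the cost of occupying a project name on the index for each set.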


Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Paul Moore
On 15 December 2016 at 19:13, Wes Turner  wrote:
>> Just to add my POV, I also find your posts unhelpful, Wes. There's not
>> enough information for me to evaluate what you say, and you offer no
>> actual solutions to what's being discussed.
> I could quote myself suggesting solutions in this thread, if you like?

You offer lots of pointers to information. But that's different.

Anyway, let's just agree to differ - I can skip your mails if they
aren't helpful to me, and you don't need to bother about the fact that
you're not getting your points across to me. Who knows, one day I
might find the time to look into JSON-LD, at which point I may or may
not understand why you think it's such a useful tool for solving all
these problems (in spite of the fact that no-one else seems to think
the same...)

Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Wes Turner
On Thursday, December 15, 2016, Paul Moore  wrote:

> On 15 December 2016 at 15:58, Nick Coghlan  > wrote:
> > On 16 December 2016 at 01:38, Wes Turner  > wrote:
> >> On Thursday, December 15, 2016, Nick Coghlan  > wrote:
> >>> This answer hasn't changed the last dozen times you've brought up
> >>> JSON-LD. It isn't *going* to change. So please stop bringing it up.
> >>
> >>
> >> No, the problem is the same; and solving it (joining user-specific
> package
> >> metadata with central repository metadata on a common URI) with web
> >> standards is the best approach.
> >
> > Then do what Donald did with go develop your own PyPI
> > competitor that uses your new and better approach to prove your point.
> >
> > As things stand, you're just generating noise and link spam on the
> > list, and asking politely for you to stop it doesn't appear to be
> > working
> Just to add my POV, I also find your posts unhelpful, Wes. There's not
> enough information for me to evaluate what you say, and you offer no
> actual solutions to what's being discussed.

I could quote myself suggesting solutions in this thread, if you like?

> As Nick says, if you can demonstrate what you're suggesting via a
> prototype implementation, that might help. But endless posts of links
> to standards documents that I have neither the time nor the
> inclination to read are not useful.

I'm suggesting that the solution here is to create version-controlled
collections of resources with metadata.

With types, those are Collection s of CreativeWork s. (Or, e.g.
SoftwareApplicationCollection s of SoftwareApplication s)

With Django, one would create a ListView generic view for the given
resource type; and then define e.g. an for DRF (Django REST
Framework), and an for Haystack (Elasticsearch).

A Pyramid implementation would be similar; with additional tests and code
just like Donald.

[Links omitted because you can now copy/paste keywords into the search
engines yourself; because you're unable to scroll past citations to find
the content you're looking for.]

DWBP and BP-LD would be useful reading for Warehouse API developers seeking
to add additional functionality to support the aforementioned use cases
with maximally-compatible linked open data.

Have a good day.

> Paul
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Wes Turner
On Thursday, December 15, 2016, Nick Coghlan  wrote:

> On 16 December 2016 at 01:38, Wes Turner  > wrote:
> > On Thursday, December 15, 2016, Nick Coghlan  > wrote:
> >> This answer hasn't changed the last dozen times you've brought up
> >> JSON-LD. It isn't *going* to change. So please stop bringing it up.
> >
> >
> > No, the problem is the same; and solving it (joining user-specific
> package
> > metadata with central repository metadata on a common URI) with web
> > standards is the best approach.
> Then do what Donald did with go develop your own PyPI
> competitor that uses your new and better approach to prove your point.
> As things stand, you're just generating noise and link spam on the
> list, and asking politely for you to stop it doesn't appear to be
> working

I've intentionally included the URLs which support my position; none of
which do I have any commercial interest in. This is not link spam, this is
me explaining your problem to you in order to save time and money.

There are your:
- existing solutions that work today
- types (web standard types)
- use cases
- server load engineering challenges (web apps must cache)

Have a good day.

> Regards,
> Nick.
> --
> Nick Coghlan   |    |   Brisbane,
> Australia
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Paul Moore
On 15 December 2016 at 15:58, Nick Coghlan  wrote:
> On 16 December 2016 at 01:38, Wes Turner  wrote:
>> On Thursday, December 15, 2016, Nick Coghlan  wrote:
>>> This answer hasn't changed the last dozen times you've brought up
>>> JSON-LD. It isn't *going* to change. So please stop bringing it up.
>> No, the problem is the same; and solving it (joining user-specific package
>> metadata with central repository metadata on a common URI) with web
>> standards is the best approach.
> Then do what Donald did with go develop your own PyPI
> competitor that uses your new and better approach to prove your point.
> As things stand, you're just generating noise and link spam on the
> list, and asking politely for you to stop it doesn't appear to be
> working

Just to add my POV, I also find your posts unhelpful, Wes. There's not
enough information for me to evaluate what you say, and you offer no
actual solutions to what's being discussed.

As Nick says, if you can demonstrate what you're suggesting via a
prototype implementation, that might help. But endless posts of links
to standards documents that I have neither the time nor the
inclination to read are not useful.

Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Nick Coghlan
On 16 December 2016 at 01:38, Wes Turner  wrote:
> On Thursday, December 15, 2016, Nick Coghlan  wrote:
>> This answer hasn't changed the last dozen times you've brought up
>> JSON-LD. It isn't *going* to change. So please stop bringing it up.
> No, the problem is the same; and solving it (joining user-specific package
> metadata with central repository metadata on a common URI) with web
> standards is the best approach.

Then do what Donald did with go develop your own PyPI
competitor that uses your new and better approach to prove your point.

As things stand, you're just generating noise and link spam on the
list, and asking politely for you to stop it doesn't appear to be


Nick Coghlan   |   |   Brisbane, Australia
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Wes Turner
On Thursday, December 15, 2016, Wes Turner  wrote:

> On Thursday, December 15, 2016, Nick Coghlan  > wrote:
>> On 16 December 2016 at 00:39, Donald Stufft  wrote:
>> > Theoretically we could allow people to not just select packages, but
>> also
>> > package specifiers for their “curated package set”, so instead of saying
>> > “requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we
>> > really wanted to get slick we could even provide a requirements.txt file
>> > format, and have people able to install the entire set by doing
>> something
>> > like:
>> >
>> > $ pip install -r
>> >
>> CurseGaming provide addon managers for a variety of game addons
>> (Warcraft, Minecraft, etc), and the ability to define "AddOn Packs" is
>> one of the ways they make it useful to have an account on the site
>> even if you don't publish any addons of your own. Even if you don't
>> make them public, you can still use them to sync your addon sets
>> between different machines.
>> In the context of Python, where I can see this kind of thing being
>> potentially useful is for folks to manage package sets that aren't
>> necessarily coupled to any specific project, but match the way they
>> *personally* work.
>> - "These are the packages I like to have installed to --user"
>> - "These are the packages I use to start a CLI app"
>> - "These are the packages I use to start a web app"
>> - etc...
> Does a requirements.txt in a {git,} repo solve for this already?

Could you just generate a README.rst for a long_description from
requirements.txt (or, or pipfile),
store that in a {git,} repository,
and use the existing pypi release versioning and upload machinery?

Or would it be more useful to surface the other graph edges on project

- These additional queries would be less burdensome as AJAX requests to
JSON REST API views. Project pages could continue to load without waiting
for these additional cached edge bundles (and interactionStatistic(s)) to
  - "This SoftwarePackage is in the following n SoftwarePackageCollections,
with the following comments and reviews"
  - "This SoftwarePackage has received n UserLikes (through Warehouse)"

> A Collection contains (hasPart) CreativeWorks
> -
> -
> RDFa and JSONLD representations do parse as ordered lists.
> SoftwarePackageCollection
> SoftwareApplicationCollection
>> It also provides a way for people to vote on projects that's a little
>> more meaningful than stars - projects that appear in a lot of personal
>> stack definitions are likely to be generally valuable (the closest
>> equivalent to that today is mining code repositories like GitHub for
>> requirements.txt files and seeing what people are using that way).
> >
> D: CreativeWork
> - is now
> -
> (These are write-heavy features: they would change the database load of
> Warehouse)
>> So yeah, if folks interested in this were to add it to Warehouse (and
>> hence, I think it would definitely be a valuable enhancement
>> to the overall ecosystem. "What needs to be implemented in order to be
>> able to shut down the legacy service at" is the
>> *PSF's* focus, but that doesn't mean it needs to be everyone's focus.
>> Cheers,
>> Nick.
>> --
>> Nick Coghlan   |   |   Brisbane, Australia
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Wes Turner
On Thursday, December 15, 2016, Nick Coghlan  wrote:

> On 16 December 2016 at 00:57, Wes Turner  > wrote:
> > This would be a graph. JSONLD?
> > #PEP426JSONLD:
> > -
> > -
> >
> > With JSONLD, we could merge SoftwarePackage metadata with
> > SoftwarePackageCollection metadata (just throwing some types out there).
> Wes, JSON-LD is a metasystem used for descriptive analytics across
> mixed datasets, which *isn't a problem we have*. We have full
> authority over the data formats we care about, and the user needs that
> matter to distutils-sig are:
> - publishers of Python packages
> - consumers of Python packages
> - maintainers of the toolchain
> It would *absolutely* make sense for Semantic Web folks to get
> involved in the project (either directly, or by building
> a separate service backed by the data set) and seek to
> produce a global set of semantically linked data that spans not only
> dependencies within language ecosystems, but also dependencies between
> them. It *doesn't* make sense for every single language ecosystem to
> come up with its own unique spin on how to incorporate software
> packages into semantic web models, nor does it make sense to try to
> warp the Python packaging user experience to better meet the needs of
> taxonomists of knowledge.
> This answer hasn't changed the last dozen times you've brought up
> JSON-LD. It isn't *going* to change. So please stop bringing it up.

No, the problem is the same; and solving it (joining user-specific package
metadata with central repository metadata on a common URI) with web
standards is the best approach.

> Regards,
> Nick.
> --
> Nick Coghlan   |    |   Brisbane,
> Australia
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Wes Turner
On Thursday, December 15, 2016, Nick Coghlan  wrote:

> On 16 December 2016 at 00:39, Donald Stufft  > wrote:
> > Theoretically we could allow people to not just select packages, but also
> > package specifiers for their “curated package set”, so instead of saying
> > “requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we
> > really wanted to get slick we could even provide a requirements.txt file
> > format, and have people able to install the entire set by doing something
> > like:
> >
> > $ pip install -r
> >
> CurseGaming provide addon managers for a variety of game addons
> (Warcraft, Minecraft, etc), and the ability to define "AddOn Packs" is
> one of the ways they make it useful to have an account on the site
> even if you don't publish any addons of your own. Even if you don't
> make them public, you can still use them to sync your addon sets
> between different machines.
> In the context of Python, where I can see this kind of thing being
> potentially useful is for folks to manage package sets that aren't
> necessarily coupled to any specific project, but match the way they
> *personally* work.
> - "These are the packages I like to have installed to --user"
> - "These are the packages I use to start a CLI app"
> - "These are the packages I use to start a web app"
> - etc...

Does a requirements.txt in a {git,} repo solve for this already?

A Collection contains (hasPart) CreativeWorks


RDFa and JSONLD representations do parse as ordered lists.


> It also provides a way for people to vote on projects that's a little
> more meaningful than stars - projects that appear in a lot of personal
> stack definitions are likely to be generally valuable (the closest
> equivalent to that today is mining code repositories like GitHub for
> requirements.txt files and seeing what people are using that way).

D: CreativeWork
- is now

(These are write-heavy features: they would change the database load of

> So yeah, if folks interested in this were to add it to Warehouse (and
> hence, I think it would definitely be a valuable enhancement
> to the overall ecosystem. "What needs to be implemented in order to be
> able to shut down the legacy service at" is the
> *PSF's* focus, but that doesn't mean it needs to be everyone's focus.
> Cheers,
> Nick.
> --
> Nick Coghlan   |    |   Brisbane,
> Australia
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Nick Coghlan
On 16 December 2016 at 00:57, Wes Turner  wrote:
> This would be a graph. JSONLD?
> -
> -
> With JSONLD, we could merge SoftwarePackage metadata with
> SoftwarePackageCollection metadata (just throwing some types out there).

Wes, JSON-LD is a metasystem used for descriptive analytics across
mixed datasets, which *isn't a problem we have*. We have full
authority over the data formats we care about, and the user needs that
matter to distutils-sig are:

- publishers of Python packages
- consumers of Python packages
- maintainers of the toolchain

It would *absolutely* make sense for Semantic Web folks to get
involved in the project (either directly, or by building
a separate service backed by the data set) and seek to
produce a global set of semantically linked data that spans not only
dependencies within language ecosystems, but also dependencies between
them. It *doesn't* make sense for every single language ecosystem to
come up with its own unique spin on how to incorporate software
packages into semantic web models, nor does it make sense to try to
warp the Python packaging user experience to better meet the needs of
taxonomists of knowledge.

This answer hasn't changed the last dozen times you've brought up
JSON-LD. It isn't *going* to change. So please stop bringing it up.


Nick Coghlan   |   |   Brisbane, Australia
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Freddy Rietdijk
> Theoretically we could allow people to not just select packages, but also
package specifiers for their “curated package set”, so instead of saying
“requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we
really wanted to get slick we could even provide a requirements.txt file
format, and have people able to install the entire set by doing something
   $ pip install -r

What you want to be able to do is install a set of (abstract) packages
based on a curated set of (concrete) packages. Doing that you just need a
file with your concrete dependencies, and now we're back at pipfile and

On Thu, Dec 15, 2016 at 4:10 PM, Nick Coghlan  wrote:

> On 16 December 2016 at 00:39, Donald Stufft  wrote:
> > Theoretically we could allow people to not just select packages, but also
> > package specifiers for their “curated package set”, so instead of saying
> > “requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we
> > really wanted to get slick we could even provide a requirements.txt file
> > format, and have people able to install the entire set by doing something
> > like:
> >
> > $ pip install -r
> >
> CurseGaming provide addon managers for a variety of game addons
> (Warcraft, Minecraft, etc), and the ability to define "AddOn Packs" is
> one of the ways they make it useful to have an account on the site
> even if you don't publish any addons of your own. Even if you don't
> make them public, you can still use them to sync your addon sets
> between different machines.
> In the context of Python, where I can see this kind of thing being
> potentially useful is for folks to manage package sets that aren't
> necessarily coupled to any specific project, but match the way they
> *personally* work.
> - "These are the packages I like to have installed to --user"
> - "These are the packages I use to start a CLI app"
> - "These are the packages I use to start a web app"
> - etc...
> It also provides a way for people to vote on projects that's a little
> more meaningful than stars - projects that appear in a lot of personal
> stack definitions are likely to be generally valuable (the closest
> equivalent to that today is mining code repositories like GitHub for
> requirements.txt files and seeing what people are using that way).
> So yeah, if folks interested in this were to add it to Warehouse (and
> hence, I think it would definitely be a valuable enhancement
> to the overall ecosystem. "What needs to be implemented in order to be
> able to shut down the legacy service at" is the
> *PSF's* focus, but that doesn't mean it needs to be everyone's focus.
> Cheers,
> Nick.
> --
> Nick Coghlan   |   |   Brisbane, Australia
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Wes Turner
On Thursday, December 15, 2016, Wes Turner  wrote:

> On Thursday, December 15, 2016, Donald Stufft  > wrote:
>> On Dec 15, 2016, at 9:35 AM, Steve Dower  wrote:
>> The "curated package sets" on PyPI idea sounds a bit like Steam's curator
>> lists, which I like to think of as Twitter for game reviews. You can follow
>> a curator to see their comments on particular games, and the most popular
>> curators have their comments appear on the actual listings too.
>> Might be interesting to see how something like that worked for PyPI,
>> though the initial investment is pretty high. (It doesn't solve the
>> coherent bundle problem either, just the discovery of good libraries
>> problem.)
>> Theoretically we could allow people to not just select packages, but also
>> package specifiers for their “curated package set”, so instead of saying
>> “requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we
>> really wanted to get slick we could even provide a requirements.txt file
>> format, and have people able to install the entire set by doing something
>> like:
>> $ pip install -r
>> my-cool-set/requirements.txt
> With version control?
> $ pip install -r
> requirements.txt
> $ pip install -r
> requirements.txt
> This would be a graph. JSONLD?
> -
> -
> With JSONLD, we could merge SoftwarePackage metadata with
> SoftwarePackageCollection metadata (just throwing some types out there).
> A is a
> CreativeWork .
- as the canonical project

- There's almost certainly a transform of TOML to JSONLD (" TOMLLD ")
- There is a standardized transform of JSONLD to RDF
- YAMLLD is a stricter subset of YAML (because just OrderedDicts

>> —
>> Donald Stufft
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Nick Coghlan
On 16 December 2016 at 00:39, Donald Stufft  wrote:
> Theoretically we could allow people to not just select packages, but also
> package specifiers for their “curated package set”, so instead of saying
> “requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we
> really wanted to get slick we could even provide a requirements.txt file
> format, and have people able to install the entire set by doing something
> like:
> $ pip install -r

CurseGaming provide addon managers for a variety of game addons
(Warcraft, Minecraft, etc), and the ability to define "AddOn Packs" is
one of the ways they make it useful to have an account on the site
even if you don't publish any addons of your own. Even if you don't
make them public, you can still use them to sync your addon sets
between different machines.

In the context of Python, where I can see this kind of thing being
potentially useful is for folks to manage package sets that aren't
necessarily coupled to any specific project, but match the way they
*personally* work.

- "These are the packages I like to have installed to --user"
- "These are the packages I use to start a CLI app"
- "These are the packages I use to start a web app"
- etc...

It also provides a way for people to vote on projects that's a little
more meaningful than stars - projects that appear in a lot of personal
stack definitions are likely to be generally valuable (the closest
equivalent to that today is mining code repositories like GitHub for
requirements.txt files and seeing what people are using that way).
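
The "appears in a lot of personal stack definitions" signal can be sketched as a simple count over requirements-style text. This is only an illustration under simplifying assumptions: the function names are hypothetical, each set counts a project at most once, and option lines such as "-r other.txt" are skipped without being followed.

```python
import re
from collections import Counter

# A project name at the start of a requirement line, e.g. "requests~=2.12".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def project_names(requirements_text):
    """Return the normalised project names listed in one package set."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = _NAME.match(line)
        if match:
            # Treat runs of '-', '_' and '.' as equivalent, ignoring case.
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def popularity(package_sets):
    """Count how many package sets mention each project."""
    counts = Counter()
    for text in package_sets:
        counts.update(project_names(text))
    return counts
```

Feeding it one string per published set and calling most_common() on the result gives a crude "used in N stacks" ranking.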

So yeah, if folks interested in this were to add it to Warehouse (and
hence, I think it would definitely be a valuable enhancement
to the overall ecosystem. "What needs to be implemented in order to be
able to shut down the legacy service at" is the
*PSF's* focus, but that doesn't mean it needs to be everyone's focus.


Nick Coghlan   |   |   Brisbane, Australia
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Wes Turner
On Thursday, December 15, 2016, Donald Stufft  wrote:

> On Dec 15, 2016, at 9:35 AM, Steve Dower  > wrote:
> The "curated package sets" on PyPI idea sounds a bit like Steam's curator
> lists, which I like to think of as Twitter for game reviews. You can follow
> a curator to see their comments on particular games, and the most popular
> curators have their comments appear on the actual listings too.
> Might be interesting to see how something like that worked for PyPI,
> though the initial investment is pretty high. (It doesn't solve the
> coherent bundle problem either, just the discovery of good libraries
> problem.)
> Theoretically we could allow people to not just select packages, but also
> package specifiers for their “curated package set”, so instead of saying
> “requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we
> really wanted to get slick we could even provide a requirements.txt file
> format, and have people able to install the entire set by doing something
> like:
> $ pip install -r
> my-cool-set/requirements.txt

With version control?

$ pip install -r

$ pip install -r

This would be a graph. JSONLD?

With JSONLD, we could merge SoftwarePackage metadata with
SoftwarePackageCollection metadata (just throwing some types out there).

A is a

> —
> Donald Stufft
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Steve Dower
The "curated package sets" on PyPI idea sounds a bit like Steam's curator 
lists, which I like to think of as Twitter for game reviews. You can follow a 
curator to see their comments on particular games, and the most popular 
curators have their comments appear on the actual listings too.

Might be interesting to see how something like that worked for PyPI, though the 
initial investment is pretty high. (It doesn't solve the coherent bundle 
problem either, just the discovery of good libraries problem.)

Top-posted from my Windows Phone

-Original Message-
From: "Donald Stufft" <>
Sent: ‎12/‎15/‎2016 4:21
To: "Freddy Rietdijk" <>
Cc: "DistUtils mailing list" <>; "Barry Warsaw" 
Subject: Re: [Distutils] Maintaining a curated set of Python packages

On Dec 15, 2016, at 7:13 AM, Freddy Rietdijk <> wrote:

> Putting the conclusion first, I do see value in better publicising

> "Recommended libraries" based on some automated criteria like:

Yes, we should recommend third-party libraries in a trusted place like the 
documentation of CPython. The amount of packages that are available can be 
overwhelming. Yet, defining a set of packages that are recommended, and perhaps 
working together, is still far from defining an exact set of packages that are 
known to work together, something which I proposed here.

We could theoretically bake this into PyPI itself, though I’m not sure if that 
makes sense.

We could also probably bake something like “curated package sets” into PyPI 
where individual users (or groups of users) can create their own view of PyPI 
for people to use, while still relying on all of the infrastructure of PyPI. 
Although I’m not sure that makes any sense either.

Donald Stufft
Distutils-SIG maillist  -

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Donald Stufft

> On Dec 15, 2016, at 9:35 AM, Steve Dower  wrote:
> The "curated package sets" on PyPI idea sounds a bit like Steam's curator 
> lists, which I like to think of as Twitter for game reviews. You can follow a 
> curator to see their comments on particular games, and the most popular 
> curators have their comments appear on the actual listings too.
> Might be interesting to see how something like that worked for PyPI, though 
> the initial investment is pretty high. (It doesn't solve the coherent bundle 
> problem either, just the discovery of good libraries problem.)

Theoretically we could allow people to not just select packages, but also 
package specifiers for their “curated package set”, so instead of saying 
“requests”, you could say “requests~=2.12” or “requests==2.12.2”. If we really 
wanted to get slick we could even provide a requirements.txt file format, and 
have people able to install the entire set by doing something like:

$ pip install -r
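
To make the specifier idea a bit more concrete, here is a rough sketch of how
an entry such as "requests~=2.12" or "requests==2.12.2" in a curated set could
be checked against a candidate version. This is only an illustration and not
how pip evaluates specifiers: it handles plain dotted numeric versions and
just the two operators mentioned above, and the helper names (satisfies,
_release, _padded) are made up for the example.

```python
import re

# Matches "name~=1.2" or "name==1.2.3"; only plain dotted numeric versions.
_ENTRY = re.compile(r"^([A-Za-z0-9._-]+)\s*(~=|==)\s*(\d+(?:\.\d+)*)$")


def _release(version):
    """Turn '2.12.2' into (2, 12, 2)."""
    return tuple(int(part) for part in version.split("."))


def _padded(a, b):
    """Pad the shorter tuple with zeros so 2.12 and 2.12.0 compare equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(entry, candidate):
    """Return True if the version string `candidate` satisfies `entry`."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError("unsupported entry: %r" % entry)
    _name, operator, wanted = match.groups()
    want, have = _padded(_release(wanted), _release(candidate))
    if operator == "==":
        return have == want
    # "~=" (compatible release): at least the wanted version, and equal to it
    # in everything except the last component that was written in the entry.
    prefix_len = len(_release(wanted)) - 1
    if prefix_len < 1:
        raise ValueError("~= needs at least two version components")
    return have >= want and have[:prefix_len] == want[:prefix_len]
```

With this, satisfies("requests~=2.12", "2.13.0") is true while
satisfies("requests~=2.12", "3.0.0") is false, which matches what PEP 440
describes for compatible release clauses.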

Donald Stufft


Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Donald Stufft

> On Dec 15, 2016, at 7:13 AM, Freddy Rietdijk  wrote:
> > Putting the conclusion first, I do see value in better publicising
> > "Recommended libraries" based on some automated criteria like:
> Yes, we should recommend third-party libraries in a trusted place like the 
> documentation of CPython. The amount of packages that are available can be 
> overwhelming. Yet, defining a set of packages that are recommended, and 
> perhaps working together, is still far from defining an exact set of packages 
> that are known to work together, something which I proposed here.

We could theoretically bake this into PyPI itself, though I’m not sure if that 
makes sense.

We could also probably bake something like “curated package sets” into PyPI 
where individual users (or groups of users) can create their own view of PyPI 
for people to use, while still relying on all of the infrastructure of PyPI. 
Although I’m not sure that makes any sense either.

Donald Stufft


Re: [Distutils] Maintaining a curated set of Python packages

2016-12-15 Thread Freddy Rietdijk
It's interesting to read about how other distributions upgrade their
package sets. In Nixpkgs most packages are updated manually. Some
frameworks/languages provide their dependencies declaratively, in which case
it becomes 'straightforward' to include whole package sets, like in the
case of Haskell. Some expressions need to be overridden manually because
e.g. they require certain system libraries. It's manual work, but not that
much. This is what I would like to see for Python as well.

The Python packages we still update manually although we have tools to
automate most of it. The reason for not using those tools is because a) it
means evaluating or building parts of the packages to get the dependencies
and b) too often upstream pins versions of dependencies which turns out to
be entirely unnecessary, and would therefore prevent an upgrade. Even so,
we have over 1500 Python packages per interpreter version that according to
our CI seem to work together. We do only build on 3 architectures (i386,
amd64 and darwin/osx). Compatibility with the latter is sometimes an issue
because it's guessing what Apple has changed when releasing a new version.

> I'm not sure how useful it would be higher up the food chain, since those
contexts will be
> different enough to cause both false positives and false negatives.  And
> does often take quite a bit of focused engineering effort to monitor
> which don't promote (something we want to automate),

In my experience the manual work that typically needs to be done is a)
making available Python dependencies, b) unpinning versions when
unnecessary and report upstream, and c) making sure the package finds the
system libraries. Issues with packages that cannot be upgraded because of a
version of a system dependency I haven't yet encountered. In my proposal a)
and b) would be fixed by the curated package set.

> :
> - -> pip-compile -> requirements.txt (~pipfile.lock)

Yep, there are tools, which get the job done when developing and using a
set of packages. Now, you want to deploy your app. You can't use your
requirements.txt on a Linux distribution because they have a curated set of
packages which is typically different from your set (although maybe they do
provide tools to package the versions you need). But, you can choose to use
a second package manager, pip or conda. System dependencies? That's
something conda can somewhat take care of. Problem solved, you would say,
except now you have multiple package managers that you need.

> Practically, a developer would want a subset of the given known-good-set
(and then additional packages), so:
> - fork/copy requirements--MM-REV--.txt
> - #comment out unused deps
> - add '-r addl-requirements.txt'

See the link I shared earlier on how this is already done with Haskell and
stack.yaml and how it could be used with `pipfile`

> Putting the conclusion first, I do see value in better publicising
> "Recommended libraries" based on some automated criteria like:

Yes, we should recommend third-party libraries in a trusted place like the
documentation of CPython. The amount of packages that are available can be
overwhelming. Yet, defining a set of packages that are recommended, and
perhaps working together, is still far from defining an exact set of
packages that are known to work together, something which I proposed here.

> As pointed out by others, there are external groups doing "curating".
conda-forge is one such project, so I'll comment from that perspective

I haven't used conda in a long time, and conda-forge didn't exist back
then. I see versions are pinned, but versions of dependencies sometimes as
well. If I choose to install *all* the packages available via conda-forge,
will I get a fixed package set, or will the SAT-solver try to find a
working set (and possibly fail at it)? I hope it is the former, since if it
is the latter then it is not curated in how I meant it.

On Thu, Dec 15, 2016 at 6:22 AM, Nick Coghlan  wrote:

> On 15 December 2016 at 03:41, Chris Barker  wrote:
> [Barry wrote]
> >> Ubuntu has an elaborate automated system for testing some dimension of
> >> compatibility issues between packages, not just Python packages.  Debian
> >> has the same system but isn't gated on the results.
> >
> > This brings up the larger issue -- PyPI is inherently different than these
> > efforts -- PyPI has always been about each package author maintaining their
> > own package -- Linux distros and conda-forge, and ??? all have a small set
> > of core contributors that do the package maintenance.
> Fedora at least has just shy of 1900 people in the "packager" group,
> so I don't know that "small" is the right word in absolute terms :)
> However, relatively speaking, even a packager group that size is still
> an order of magnitude smaller than the 30k+ 

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-14 Thread Nick Coghlan
On 15 December 2016 at 03:41, Chris Barker  wrote:
[Barry wrote]
>> Ubuntu has an elaborate automated system for testing some dimension of
>> compatibility issues between packages, not just Python packages.  Debian
>> has the same system but isn't gated on the results.
> This brings up the larger issue -- PyPI is inherently different than these
> efforts -- PyPI has always been about each package author maintaining their
> own package -- Linux distros and conda-forge, and ??? all have a small set
> of core contributors that do the package maintenance.

Fedora at least has just shy of 1900 people in the "packager" group,
so I don't know that "small" is the right word in absolute terms :)

However, relatively speaking, even a packager group that size is still
an order of magnitude smaller than the 30k+ publishers on PyPI (which
is in turn an order of magnitude smaller than the 180k+ registered
PyPI accounts)

> This is a large
> effort, and would be insanely hard with the massive amount of stuff on
> PyPI
> In fact, I think the kinda-sorta curation that comes from individual
> communities is working remarkably well:
> the scipy community
> the django community
> ...

Exactly. Armin Ronacher and a few others have also started a new
umbrella group on GitHub, Pallets, collecting together some of the key
infrastructure projects in the Flask ecosystem:

Dell/EMC's John Mark Walker has a recent article about this
"downstream distribution" formation process on, where
it's an emergent phenomenon arising from the needs of people that are
consuming open source components to achieve some particular purpose
rather than working on them for their own sake:

It's a fairly different activity from pure upstream development -
where upstream is a matter of "design new kinds and versions of Lego
bricks" (e.g. the Linux kernel, gcc, CPython, PyPI projects),
downstream integration is more "define new Lego kits using the already
available bricks" (e.g. Debian, Fedora, conda-forge), while commercial
product and service development is "We already put the Lego kit
together for you, so you can just use it" (e.g. Ubuntu, RHEL, Amazon
Linux, ActivePython, Enthought Canopy,


Nick Coghlan   |   |   Brisbane, Australia

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-14 Thread Chris Barker - NOAA Federal
> I think it's unfair to describe these efforts as a "kludge";

I was specifically referring to using wheels to deliver C libs--
pip+wheel were not designed for that. But I don't mean to offend,
there has been a lot of great work done, and yes, the situation has
much improved as a result.

There are times when a kludge is called for!


Re: [Distutils] Maintaining a curated set of Python packages

2016-12-14 Thread Glyph Lefkowitz

> On Dec 14, 2016, at 9:41 AM, Chris Barker  wrote:
> As pointed out by others, there are external groups doing "curating". 
> conda-forge is one such project, so I'll comment from that perspective:
> It's difficult because the definition of compatibility is highly dependent on
> the consumer's environment.  For example, C extension compatibility will
> depend on the version of libraries available on the platform versions you care
> about. 
> Indeed -- which is why Anaconda and conda-forge are built on conda rather 
> than pip -- it is designed to handle these issues.
> However with the manylinux effort, and some efforts to kludge C libs into
> binary wheels, PyPI may just be able to handle more of these issues -- so
> curating may have its advantages.

I think it's unfair to describe these efforts as a "kludge"; many of the tools 
developed for manylinux1 et al. are actually pretty sophisticated tooling with 
a mature ecosystem approach to library bundling.  Personally I have noticed a 
_massive_ reduction in the support overhead involved in getting new users spun 
up in the present Python packaging ecosystem.  Due to the availability of 
cross-platform wheels, it's possible to do a LOT more python development 
without a C compiler than it used to be.


Re: [Distutils] Maintaining a curated set of Python packages

2016-12-14 Thread Chris Barker
As pointed out by others, there are external groups doing "curating".
conda-forge is one such project, so I'll comment from that perspective:

> It's difficult because the definition of compatibility is highly dependent on
> the consumer's environment.  For example, C extension compatibility will
> depend on the version of libraries available on the platform versions you
> care about.

Indeed -- which is why Anaconda and conda-forge are built on conda rather
than pip -- it is designed to handle these issues.

However with the manylinux effort, and some efforts to kludge C libs into
binary wheels, PyPI may just be able to handle more of these issues -- so
curating may have its advantages.

As it happens, I recently proposed a version system for conda-forge (much
like Anaconda) where you would select a given conda-forge version, and ALL
the packages in that version would be expected to work together. The idea
is that you don't want one package dependent on libjpeg6 while another
depends on libjpeg7, for instance.

But the community has decided that rather than try to version the whole
system, we instead rely on robust "pinning" -- i.e. version specific
dependencies -- package A depends on libjpeg7, not just "libjpeg". Then
there are tools that go through all the packages and check for incompatible
pinnings, and update and build new versions where required (or at least ping
the maintainers of a package that an update is needed)
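
As a rough illustration of the kind of check described above (this is not
conda-forge's actual tooling), a pinning checker could collect every pinned
dependency and report the ones that different packages pin to different
versions. The data layout and the name find_pin_conflicts are assumptions made
for the sake of the sketch:

```python
from collections import defaultdict


def find_pin_conflicts(pins):
    """Return the dependencies that are pinned to more than one version.

    `pins` maps package name -> {dependency name: pinned version}.
    The result maps dependency -> {pinned version: [packages using that pin]}.
    """
    seen = defaultdict(dict)
    for package, deps in sorted(pins.items()):
        for dep, version in deps.items():
            seen[dep].setdefault(version, []).append(package)
    return {dep: versions for dep, versions in seen.items() if len(versions) > 1}


# Hypothetical data, echoing the libjpeg example above.
example = {
    "package-a": {"libjpeg": "7"},
    "package-b": {"libjpeg": "6"},
    "package-c": {"libjpeg": "7", "zlib": "1.2"},
}
conflicts = find_pin_conflicts(example)
```

Here only libjpeg is reported, since package-b pins a different version than
package-a and package-c; zlib has a single pin and is left out.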

> Ubuntu has an elaborate automated system for testing some dimension of
> compatibility issues between packages, not just Python packages.  Debian
> has the same system but isn't gated on the results.

This brings up the larger issue -- PyPI is inherently different than these
efforts -- PyPI has always been about each package author maintaining their
own package -- Linux distros and conda-forge, and ??? all have a small set
of core contributors that do the package maintenance. This is a large
effort, and would be insanely hard with the massive amount of stuff on
PyPI.

In fact, I think the kinda-sorta curation that comes from individual
communities is working remarkably well:

the scipy community
the django community



Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-10 Thread Wes Turner
Here are some standardized (conda) package versions:

On Thursday, December 8, 2016, Wes Turner  wrote:

> On Thursday, December 8, 2016, Nick Coghlan wrote:
>> Putting the conclusion first, I do see value in better publicising
>> "Recommended libraries" based on some automated criteria like:
>> - recommended in the standard library documentation
>> - available via 1 or more cross-platform commercial Python redistributors
>> - available via 1 or more Linux distro vendors
>> - available via 1 or more web service development platforms
> So these would be attributes tracked by a project maintainer and verified
> by the known-good-set maintainer? Or?
> (Again, here I reach for JSONLD. "count n" is only so useful; *which*
> {[re]distros, platforms, heartfelt testimonials from incredible experts}
> URLs )
> - test coverage
> - seclist contact info AND procedures
> - more than one super admin maintainer
> - what other criteria should/could/can we use to vet open source libraries?
>> That would be a potentially valuable service for folks new to the
>> world of open source that are feeling somewhat overwhelmed by the
>> sheer number of alternatives now available to them.
>> However, I also think that would better fit in with the aims of an
>> open source component tracking community like than it
>> does a publisher-centric community like distutils-sig.
> IDK if libraries are really in scope for stackshare. The feature
> upcoming/down voting is pretty cool.
>> The further comments below are just a bit more background on why I
>> feel the integration testing aspect of the suggestion isn't likely to
>> be particularly beneficial :)
> A catch-all for testing bits from application-specific integration test
> suites could be useful (and would likely require at least docker-compose,
> dox, kompose for working with actual data stores)
>> On 9 December 2016 at 01:10, Barry Warsaw wrote:
>> > Still, there may be value in inter-Python package compatibility tests,
>> > but it'll take serious engineering effort (i.e. $ and time), ongoing
>> > maintenance, ongoing effort to fix problems, and tooling to gate
>> > installability of failing packages (with overrides for downstreams
>> > which don't care or already expend such effort).
>> I think this is really the main issue, as both desktop and server
>> environments are moving towards the integrated platform + isolated
>> applications approach popularised by mobile devices.
>> That means we end up with two very different variants of automated
>> integration testing:
>> - the application focused kind offered by the likes of and
>> (i.e. monitor for dependency updates, submit PRs to trigger
>> app level CI)
>> - the platform focused kind employed by distro vendors (testing all
>> the platform components work together, including the app isolation
>> features)
>> The first kind makes sense if you're building something that runs *on*
>> platforms (Docker containers, Snappy or FlatPak apps, web services,
>> mobile apps, etc).
>> The second kind inevitably ends up intertwined with the component
>> review and release engineering systems of the particular platform, so
>> it becomes really hard to collaborate cross-platform outside the
>> context of specific projects like OpenStack that provide clear
>> definitions for "What components do we collectively depend on that we
>> need to test together?" and "What does 'working' mean in the context
>> of this project?".
>> Accordingly, for an initiative like this to be successful, it would
>> need to put some thought up front into the questions of:
>> 1. Who are the intended beneficiaries of the proposal?
>> 2. What problem does it address that will prompt them to contribute
>> time and/or money to solving it?
>> 3. What do we expect people to be able to *stop doing* if the project
>> proves successful?
>> For platform providers, a generic "stdlib++" project wouldn't really
>> reduce the amount of integration testing we'd need to do ourselves (we
>> already don't test arbitrary combinations of dependencies, just the
>> ones we provide at any given point in time).
>> For application and service developers, the approach of pinning
>> dependencies to specific versions and treating updates like any other
>> source code change already works well in most cases.
>> That leaves library and framework developers, who currently tend to
>> adopt the policy of "for each version of Python that we support, we
>> test against the latest versions of our dependencies that were
>> available at the time we ran the test", leaving testing against older
>> versions to platform providers. If there's a key related framework

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-08 Thread Wes Turner
On Thursday, December 8, 2016, Nick Coghlan  wrote:

> Putting the conclusion first, I do see value in better publicising
> "Recommended libraries" based on some automated criteria like:
> - recommended in the standard library documentation
> - available via 1 or more cross-platform commercial Python redistributors
> - available via 1 or more Linux distro vendors
> - available via 1 or more web service development platforms
So these would be attributes tracked by a project maintainer and verified
by the known-good-set maintainer? Or?

(Again, here I reach for JSONLD. "count n" is only so useful; *which*
{[re]distros, platforms, heartfelt testimonials from incredible experts}
URLs )

- test coverage
- seclist contact info AND procedures
- more than one super admin maintainer
- what other criteria should/could/can we use to vet open source libraries?

> That would be a potentially valuable service for folks new to the
> world of open source that are feeling somewhat overwhelmed by the
> sheer number of alternatives now available to them.
> However, I also think that would better fit in with the aims of an
> open source component tracking community like than it
> does a publisher-centric community like distutils-sig.

IDK if libraries are really in scope for stackshare. The feature
upcoming/down voting is pretty cool.

> The further comments below are just a bit more background on why I
> feel the integration testing aspect of the suggestion isn't likely to
> be particularly beneficial :)

A catch-all for testing bits from application-specific integration test
suites could be useful (and would likely require at least docker-compose,
dox, kompose for working with actual data stores)

> On 9 December 2016 at 01:10, Barry Warsaw wrote:
> > Still, there may be value in inter-Python package compatibility tests,
> > but it'll take serious engineering effort (i.e. $ and time), ongoing
> > maintenance, ongoing effort to fix problems, and tooling to gate
> > installability of failing packages (with overrides for downstreams
> > which don't care or already expend such effort).
> I think this is really the main issue, as both desktop and server
> environments are moving towards the integrated platform + isolated
> applications approach popularised by mobile devices.
> That means we end up with two very different variants of automated
> integration testing:
> - the application focused kind offered by the likes of and
> (i.e. monitor for dependency updates, submit PRs to trigger
> app level CI)
> - the platform focused kind employed by distro vendors (testing all
> the platform components work together, including the app isolation
> features)
> The first kind makes sense if you're building something that runs *on*
> platforms (Docker containers, Snappy or FlatPak apps, web services,
> mobile apps, etc).
> The second kind inevitably ends up intertwined with the component
> review and release engineering systems of the particular platform, so
> it becomes really hard to collaborate cross-platform outside the
> context of specific projects like OpenStack that provide clear
> definitions for "What components do we collectively depend on that we
> need to test together?" and "What does 'working' mean in the context
> of this project?".
> Accordingly, for an initiative like this to be successful, it would
> need to put some thought up front into the questions of:
> 1. Who are the intended beneficiaries of the proposal?
> 2. What problem does it address that will prompt them to contribute
> time and/or money to solving it?
> 3. What do we expect people to be able to *stop doing* if the project
> proves successful?
> For platform providers, a generic "stdlib++" project wouldn't really
> reduce the amount of integration testing we'd need to do ourselves (we
> already don't test arbitrary combinations of dependencies, just the
> ones we provide at any given point in time).
> For application and service developers, the approach of pinning
> dependencies to specific versions and treating updates like any other
> source code change already works well in most cases.
> That leaves library and framework developers, who currently tend to
> adopt the policy of "for each version of Python that we support, we
> test against the latest versions of our dependencies that were
> available at the time we ran the test", leaving testing against older
> versions to platform providers. If there's a key related framework
> that also provides LTS versions (e.g. Django), then some folks may add
> that to their test matrix as well.
> In that context, "Only breaks backwards compatibility for compelling
> reasons" becomes a useful long term survival trait for libraries and
> frameworks, as gratuitous breakages are likely to lead to people
> migrating away from particularly unreliable 

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-08 Thread Nick Coghlan
Putting the conclusion first, I do see value in better publicising
"Recommended libraries" based on some automated criteria like:

- recommended in the standard library documentation
- available via 1 or more cross-platform commercial Python redistributors
- available via 1 or more Linux distro vendors
- available via 1 or more web service development platforms

That would be a potentially valuable service for folks new to the
world of open source that are feeling somewhat overwhelmed by the
sheer number of alternatives now available to them.

However, I also think that would better fit in with the aims of an
open source component tracking community like than it
does a publisher-centric community like distutils-sig.

The further comments below are just a bit more background on why I
feel the integration testing aspect of the suggestion isn't likely to
be particularly beneficial :)

On 9 December 2016 at 01:10, Barry Warsaw  wrote:
> Still, there may be value in inter-Python package compatibility tests, but
> it'll take serious engineering effort (i.e. $ and time), ongoing maintenance,
> ongoing effort to fix problems, and tooling to gate installability of failing
> packages (with overrides for downstreams which don't care or already expend
> such effort).

I think this is really the main issue, as both desktop and server
environments are moving towards the integrated platform + isolated
applications approach popularised by mobile devices.

That means we end up with two very different variants of automated
integration testing:

- the application focused kind offered by the likes of and
(i.e. monitor for dependency updates, submit PRs to trigger
app level CI)
- the platform focused kind employed by distro vendors (testing all
the platform components work together, including the app isolation
features)

The first kind makes sense if you're building something that runs *on*
platforms (Docker containers, Snappy or FlatPak apps, web services,
mobile apps, etc).

The second kind inevitably ends up intertwined with the component
review and release engineering systems of the particular platform, so
it becomes really hard to collaborate cross-platform outside the
context of specific projects like OpenStack that provide clear
definitions for "What components do we collectively depend on that we
need to test together?" and "What does 'working' mean in the context
of this project?".

Accordingly, for an initiative like this to be successful, it would
need to put some thought up front into the questions of:

1. Who are the intended beneficiaries of the proposal?
2. What problem does it address that will prompt them to contribute
time and/or money to solving it?
3. What do we expect people to be able to *stop doing* if the project
proves successful?

For platform providers, a generic "stdlib++" project wouldn't really
reduce the amount of integration testing we'd need to do ourselves (we
already don't test arbitrary combinations of dependencies, just the
ones we provide at any given point in time).

For application and service developers, the approach of pinning
dependencies to specific versions and treating updates like any other
source code change already works well in most cases.

That leaves library and framework developers, who currently tend to
adopt the policy of "for each version of Python that we support, we
test against the latest versions of our dependencies that were
available at the time we ran the test", leaving testing against older
versions to platform providers. If there's a key related framework
that also provides LTS versions (e.g. Django), then some folks may add
that to their test matrix as well.
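
The policy described above amounts to a small test matrix. A hedged sketch,
where the Python versions and the "latest"/"lts" track names are purely
illustrative and build_matrix is a hypothetical helper rather than part of any
CI tool:

```python
from itertools import product


def build_matrix(python_versions, framework_tracks):
    """Pair every supported Python version with every framework track."""
    return [
        {"python": python, "framework": track}
        for python, track in product(python_versions, framework_tracks)
    ]


# Two supported interpreters, tested against the newest framework release
# and against its long-term-support release.
matrix = build_matrix(["2.7", "3.5"], ["latest", "lts"])
for job in matrix:
    print("python %(python)s / framework %(framework)s" % job)
```

Each extra LTS line multiplies the number of jobs, which is part of why most
projects only add one when a key dependency actually publishes LTS releases.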

In that context, "Only breaks backwards compatibility for compelling
reasons" becomes a useful long term survival trait for libraries and
frameworks, as gratuitous breakages are likely to lead to people
migrating away from particularly unreliable dependencies.


Nick Coghlan   |   |   Brisbane, Australia

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-08 Thread Jeremy Stanley
On 2016-12-08 10:05:47 -0600 (-0600), Wes Turner wrote:
> - pbr does away with and install_requires in favor of just
> requirements.txt

It doesn't entirely "do away with" (it still relies on a
relatively minimal boilerplate which loads its setuptools
entrypoint), but does allow you to basically abstract away most
common configuration into declarative setup.cfg and requirements.txt
files (similar to some of the pyproject.toml use cases Donald et al
have for PEP 518).
Jeremy Stanley

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-08 Thread Wes Turner
On Thursday, December 8, 2016, Wes Turner  wrote:

> On Thursday, December 1, 2016, Freddy Rietdijk wrote:
>> Hi,
>> I would like to propose that, as a community, we jointly maintain a
>> curated set of Python packages that are known to work together. These
>> packages would receive security updates for some time and every couple of
>> months a new major release of the curated set comes available. The idea of
>> this is inspired by Haskell LTS, so maybe we should call this PyPI LTS?
>> So why a PyPI LTS?
>> PyPI makes available all versions of packages that were uploaded, and by
>> default installers like pip will try to use the latest available versions
>> of packages, unless told otherwise. With a requirements.txt file (or a
>> future pipfile.lock) and we can pin as much as we like our
>> requirements of respectively the environment and package requirements,
>> thereby making a more reproducible environment possible and also fixing the
>> API for developers. Pinning requirements is often a manual job, although
>> one could use pip freeze or other tools.
> :
> - -> pip-compile -> requirements.txt (~pipfile.lock)
> - I can't remember whether pip-compile includes the checksum in the
> compiled requirements.txt
> - Current recommendation: sha256
> - (These are obviously different for platform-specific wheels and bdists)
> - Unlike pip freeze, pip-chill includes only top-level deps
>> A common problem is when two packages in a certain environment require
>> different versions of a package. Having a curated set of packages,
>> developers could be encouraged to test against the latest stable and
>> nightly of the curated package set, thereby increasing compatibility
>> between different packages, something I think we all want.
>> Having a compatible set of packages is not only interesting for
>> developers, but also for downstream distributions. All distributions try to
>> find a set of packages that are working together and release them. This is
>> a lot of work, and I think it would be in everyone's benefit if we try to
>> solve this issue together.
> I think conda has already been mentioned.
> - environment.yml :
> environment-from-file
> - "A community led collection of recipes, build infrastructure and
> distributions for the conda package manager."
> - "AppVeyor, CircleCI and TravisCI"
>> A possible solution
>> Downstream, that is developers and distributions, will need a set of
>> packages that are known to work together. At minimum this would consist of,
>> per package, the name of the package and its version, but for
>> reproducibility I would propose adding the filename and hash as well.
>> Because there isn't any reliable method to extract the requirements of a
>> package, I propose also including `setup_requires`, `install_requires`, and
>> `tests_require` explicitly. That way, distributions can automatically build
>> recipes for the packages (although non-Python dependencies would still have
>> to be resolved by the distribution).
>> The package set would be released as lts--MM-REVISION, and developers
>> can choose to track a specific revision, but would typically be asked to
>> track only lts--MM which would resolve to the latest REVISION.
>> Because dependencies vary per Python language version, interpreter, and
>> operating system, we would have to have these sets for each combination and
>> therefore I propose having a source which evaluates to say a TOML/JSON file
>> per version/interpreter/OS.
>> How this source file should be written I don't know; while I think the
>> Nix expression language is an excellent choice for this, it is not possible
>> for everyone to use and therefore likely not an option.
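
As a sketch of the "track only the month, resolve to the latest REVISION"
idea in the quoted proposal: assuming release names of the form lts-2016-12-3
(year, month, revision; the exact naming scheme and the latest_revision helper
are my assumptions, not part of the proposal), resolution could look like
this:

```python
def latest_revision(track, releases):
    """Return the release in `releases` with the highest revision for `track`.

    `track` is e.g. "lts-2016-12"; releases are e.g. "lts-2016-12-3".
    """
    best = None
    for release in releases:
        prefix, _, revision = release.rpartition("-")
        if prefix != track or not revision.isdigit():
            continue
        # Compare revisions as integers so that 10 sorts after 2.
        if best is None or int(revision) > best[0]:
            best = (int(revision), release)
    if best is None:
        raise LookupError("no release found for %s" % track)
    return best[1]


# Hypothetical list of published package sets.
available = ["lts-2016-11-4", "lts-2016-12-1", "lts-2016-12-10", "lts-2016-12-2"]
resolved = latest_revision("lts-2016-12", available)
```

A developer tracking the month would then get the newest revision without
editing their configuration, while pinning a full name keeps things fixed.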
> YAML: environment.yml, meta.yaml
> -
> requirements-section
> -
> Could/would there be a package with an integration test suite in tests/?
> Practically, a developer would want a subset of the given known-good-set
> (and then additional packages), so:
> - fork/copy requirements--MM-REV--.txt
> - #comment out unused deps
> - add '-r addl-requirements.txt'
>> Open questions
>> There are still plenty of open questions.
>> - Who decides when a package is updated that would break dependents? This
>> is an issue all distributions face, so maybe we should involve them.
> IDK if e.g. can post to a mailing list?
> - "Stop wasting your time by manually keeping track of changelogs.

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-08 Thread Wes Turner
On Thursday, December 1, 2016, Freddy Rietdijk 

> Hi,
> I would like to propose that, as a community, we jointly maintain a
> curated set of Python packages that are known to work together. These
> packages would receive security updates for some time and every couple of
> months a new major release of the curated set becomes available. The idea of
> this is inspired by Haskell LTS, so maybe we should call this PyPI LTS?
> So why a PyPI LTS?
> PyPI makes available all versions of packages that were uploaded, and by
> default installers like pip will try to use the latest available versions
> of packages, unless told otherwise. With a requirements.txt file (or a
> future pipfile.lock) and we can pin as much as we like our
> requirements of respectively the environment and package requirements,
> thereby making a more reproducible environment possible and also fixing the
> API for developers. Pinning requirements is often a manual job, although
> one could use pip freeze or other tools.

- -> pip-compile -> requirements.txt (~pipfile.lock)
- I can't remember whether pip-compile includes the checksum in the
compiled requirements.txt

- Current recommendation: sha256
- (These are obviously different for platform-specific wheels and bdists)

- Unlike pip freeze, pip-chill includes only top-level deps
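
As a rough illustration of the sha256 pinning mentioned above, here is a
minimal Python sketch that hashes an artifact and formats a requirements
line in pip's `--hash=sha256:...` style. The helper names and the "demo"
package are hypothetical; they are not part of pip or pip-tools.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Return the hex SHA-256 digest of a binary file-like object."""
    digest = hashlib.sha256()
    # Read in chunks so large sdists or wheels are not loaded into memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def pinned_requirement(name, version, stream):
    """Format one requirements.txt line for pip's hash-checking mode."""
    return "{}=={} --hash=sha256:{}".format(name, version, sha256_of_stream(stream))


# Hypothetical artifact contents; a real caller would pass open(path, "rb").
line = pinned_requirement("demo", "1.0", io.BytesIO(b"abc"))
```

Since the hash covers one specific file, each platform-specific wheel or
bdist would need its own entry, which matches the note above.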

> A common problem is when two packages in a certain environment require
> different versions of a package. Having a curated set of packages,
> developers could be encouraged to test against the latest stable and
> nightly of the curated package set, thereby increasing compatibility
> between different packages, something I think we all want.
> Having a compatible set of packages is not only interesting for
> developers, but also for downstream distributions. All distributions try to
> find a set of packages that are working together and release them. This is
> a lot of work, and I think it would be in everyone's benefit if we try to
> solve this issue together.

I think conda has already been mentioned.

- environment.yml :

- "A community led collection of recipes, build infrastructure and
distributions for the conda package manager."
- "AppVeyor, CircleCI and TravisCI"

> A possible solution
> Downstream, that is developers and distributions, will need a set of
> packages that are known to work together. At minimum this would consist of,
> per package, the name of the package and its version, but for
> reproducibility I would propose adding the filename and hash as well.
> Because there isn't any reliable method to extract the requirements of a
> package, I propose also including `setup_requires`, `install_requires`, and
> `tests_require` explicitly. That way, distributions can automatically build
> recipes for the packages (although non-Python dependencies would still have
> to be resolved by the distribution).
> The package set would be released as lts--MM-REVISION, and developers
> can choose to track a specific revision, but would typically be asked to
> track only lts--MM which would resolve to the latest REVISION.
> Because dependencies vary per Python language version, interpreter, and
> operating system, we would have to have these sets for each combination and
> therefore I propose having a source which evaluates to say a TOML/JSON file
> per version/interpreter/OS.
> How this source file should be written I don't know; while I think the Nix
> expression language is an excellent choice for this, it is not possible for
> everyone to use and therefore likely not an option.

YAML: environment.yml, meta.yaml


Could/would there be a package with an integration test suite in tests/?

Practically, a developer would want a subset of the given known-good-set
(and then additional packages), so:

- fork/copy requirements--MM-REV--.txt
- #comment out unused deps
- add '-r addl-requirements.txt'
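
The three steps above could be automated along these lines. This is only a
sketch: it assumes the known-good set is a list of simple `name==version`
lines, and the function name, package names, and versions are made up for
illustration.

```python
def subset_requirements(known_good_lines, wanted, extra_file=None):
    """Comment out pins that are not wanted and optionally add an -r include."""
    wanted = {name.lower() for name in wanted}
    result = []
    for line in known_good_lines:
        stripped = line.strip()
        # Keep blank lines and existing comments untouched.
        if not stripped or stripped.startswith("#"):
            result.append(line)
            continue
        # Assumes simple "name==version" pins; extras and markers are not handled.
        name = stripped.split("==", 1)[0].strip().lower()
        result.append(line if name in wanted else "# " + line)
    if extra_file:
        result.append("-r " + extra_file)
    return result


# Hypothetical known-good set.
known_good = ["requests==2.12.3", "numpy==1.11.2", "# pinned by the curated set"]
subset = subset_requirements(known_good, ["requests"], "addl-requirements.txt")
```

Commenting out rather than deleting keeps the diff against the upstream
known-good file small when a new revision is merged in.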

> Open questions
> There are still plenty of open questions.
> - Who decides when a package is updated that would break dependents? This
> is an issue all distributions face, so maybe we should involve them.

IDK if e.g. can post to a mailing list?

- "Stop wasting your time by manually keeping track of changelogs. keeps your python projects secure by monitoring their
- Source:

> - How would this be integrated with pip / virtualenv / pipfile.lock /
> requirements.txt / 

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-08 Thread Barry Warsaw
On Dec 01, 2016, at 10:45 AM, Freddy Rietdijk wrote:

>Having a compatible set of packages is not only interesting for developers,
>but also for downstream distributions. All distributions try to find a set
>of packages that are working together and release them. This is a lot of
>work, and I think it would be in everyone's benefit if we try to solve this
>issue together.

It's an interesting but difficult problem at the level of PyPI.

It's difficult because the definition of compatibility is highly dependent on
the consumer's environment.  For example, C extension compatibility will
depend on the version of libraries available on the platform versions you care
about.  There are also dependents on Python libraries that you can't capture
only in PyPI, e.g. some distro-only package may depend on a particular Python
package API.  PyPI can't test any of this.  And some distros (e.g. Ubuntu) may
have multiple dimensions of consumability, e.g. the classic apt packages
vs. snaps, primary devel archive vs. stable distro versions vs. backports, etc.

Ubuntu has an elaborate automated system for testing some dimension of
compatibility issues between packages, not just Python packages.  Debian has
the same system but isn't gated on the results.  Individual distro packages
can include a set of tests that are run against the built version of the
package (as opposed to tests run at package build time).  When a new version
of that package is uploaded it ends up in a "proposed" pocket that is
generally not installed by users.  The proposed new version has its tests run,
and any package that depends on that package also has its tests run.  Only
when all those tests pass is the package automatically promoted to the release
pocket, and thus is installable by the general user population.  This does a
great job within the context of the distro of raising the quality of the
archive because obvious incompatibilities (i.e. those for which tests exist)
block such promotion.
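
The gating rule described above can be sketched roughly as follows. This is
only an illustration of the idea, not the actual Ubuntu tooling; the
function name and the data shapes are assumptions.

```python
def can_promote(package, reverse_deps, test_results):
    """Return True only if the package and all of its dependents pass tests.

    reverse_deps maps a package name to the names of packages that depend on it.
    test_results maps a package name to True (passed) or False (failed).
    A package with no recorded result is treated as failing, so it blocks promotion.
    """
    to_check = [package] + sorted(reverse_deps.get(package, ()))
    return all(test_results.get(name, False) for name in to_check)


# Hypothetical archive state: app2's tests break against the proposed libfoo.
reverse_deps = {"libfoo": {"app1", "app2"}}
results = {"libfoo": True, "app1": True, "app2": False}
blocked = not can_promote("libfoo", reverse_deps, results)
```

Here the proposed libfoo stays in the "proposed" pocket until app2's tests
pass or someone overrides the gate.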

(FWIW, the difference is that Debian doesn't block promotion of packages
failing their automated tests, so 1) it provides less value to Debian; 2) we
end up inheriting and seeing these problems in Ubuntu and so have to expend
downstream effort to fix the failures.)

All of this happens automatically within the context of the distro, on
multiple architectures, so it provides a lot of value, but I'm not sure how
useful it would be higher up the food chain, since those contexts will be
different enough to cause both false positives and false negatives.  And it
does often take quite a bit of focused engineering effort to monitor packages
which don't promote (something we want to automate), to actually fix the
problems wherever is most appropriate (and as far upstream as possible), and
to create meaningful tests of compatibility in the first place (except for
default tests such as installability).

Still, there may be value in inter-Python package compatibility tests, but
it'll take serious engineering effort (i.e. $ and time), ongoing maintenance,
ongoing effort to fix problems, and tooling to gate installability of failing
packages (with overrides for downstreams which don't care or already expend
such effort).



Re: [Distutils] Maintaining a curated set of Python packages

2016-12-02 Thread Nick Coghlan
On 3 December 2016 at 03:34, Freddy Rietdijk  wrote:
> On Fri, Dec 2, 2016 at 4:33 PM, Robert T. McGibbon 
> wrote:
>> Isn't this issue already solved by (and the raison d'être of) the multiple
>> third-party Python redistributors, like the various OS package maintainers,
>> Continuum's Anaconda, Enthought Canopy, ActiveState Python, WinPython, etc?
> My intention is not creating yet another distribution. Instead, I want to
> see if there is interest in the different distributions on sharing some of
> the burden of curating by bringing up this discussion and seeing what is
> needed. These distributions have their recipes that allow them to build
> their packages using their tooling. What I propose is having some of that
> data community managed so the distributions can use that along with their
> tooling to build the eventual packages.

There's definitely interest in more automated curation such that
publishing through PyPI means you get pre-built binary artifacts and
compatibility testing for popular platforms automatically, but the
hard part of that isn't really the technical aspects, it's developing
a robust funding and governance model for the related sustaining
engineering activities.

That upstream component level "yes it builds" and "yes it passes its
self-tests" data is then useful to redistributors, since it would make
it straightforward to filter out releases that don't even build or
pass their own tests even before they make it into a downstream review

> These are interesting issues you bring up here. What I seek is having a set
> that has per package a version, source, Python dependencies and build
> system. Other dependencies would be for now left out, unless someone has a
> good idea how to include those. Distributions can take this curated set and
> extend the data with their distribution specific things. For example, in Nix
> we could load such a set, map a function that builds the packages in the
> set, and override what is passed to the function when necessary (e.g. to add
> system dependencies, our patches, or how tests are invoked, and so on).

Something that could be useful on that front is to mine the stdlib
documentation for "seealso" references to third party libraries and
collect them into an automation-friendly reference API. The benefit of
that approach is that it:

- would be immediately useful in its own right as a "stdlib++" definition
- solves the scope problem (the problem tackled has to be common
enough to have a default solution in the standard library, but complex
enough that there are recommended alternatives)
- solves the governance problem (the approval process for new entries
is to get them referenced from the relevant stdlib module)
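
A very rough sketch of what such mining could look like, assuming the docs
are available as reStructuredText and that third-party references appear as
named external links inside `.. seealso::` blocks. The regular expressions
and the sample text are illustrative only and would miss other link styles.

```python
import re

# A "seealso" directive line followed by its indented (or blank) body lines.
SEEALSO = re.compile(
    r"^\.\. seealso::[^\n]*\n((?:[ \t]+[^\n]*\n|[ \t]*\n)*)", re.MULTILINE
)
# A named external hyperlink such as `Title <https://host/path>`_
LINK = re.compile(r"`([^`<]+?)\s*<(https?://[^>]+)>`_")


def seealso_links(rst_text):
    """Return (title, url) pairs found inside seealso blocks only."""
    links = []
    for block in SEEALSO.finditer(rst_text):
        links.extend(LINK.findall(block.group(1)))
    return links


# Made-up sample; the URLs are placeholders, not real documentation links.
sample = (
    "Intro with `Other <https://example.org/other>`_ outside any block.\n"
    "\n"
    ".. seealso::\n"
    "\n"
    "   `Requests package <https://example.org/requests>`_\n"
    "      A third-party HTTP library.\n"
    "\n"
    "Trailing paragraph.\n"
)
found = seealso_links(sample)
```

The resulting list of pairs is the kind of data an automation-friendly
reference could be generated from.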

> Responsiveness is indeed an interesting issue. If there's enough backing,
> then I imagine security issues will be resolved as fast as they are nowadays
> by the distributions backing the initiative.

Not necessarily, as many of those responsiveness guarantees rely on
the ability of redistributors to carry downstream patches, even before
there's a corresponding upstream security release. This is especially
so for upstream projects that follow an as-needed release model,
without much (if any) automation of their publication process.

>> If a curation community *isn't* doing any of those things, then it isn't
>> adding a lot of value beyond folks just doing DIY integration in their CI
>> system by pinning their dependencies to particular versions.
> I would imagine that distributions that would support this idea would have a
> CI tracking packages built using the curated set and the
> distribution-specific changes. When there's an issue they could fix it at
> their side, or if it is something that might belong in the curated set, they
> would report the issue. At some point, when they would freeze, they would
> pin to a certain .MM and API breakage should not occur.

Yeah, this is effectively what happens already, it's just not
particularly visible outside the individual redistributor pipelines.

> is a very interesting initiative. It seems they scan the
> contents of the archives and extract dependencies based on what is in the
> requirements files, which is often more than is actually needed for building
> and running the package. They would benefit from having a declarative style
> for the dependencies and build system, but that is another issue (PEP 517
> e.g.) than what I bring up here. We also have a tool that runs pip in a
> sandbox to determine the dependencies, and then provide us with an
> expression. It works, but it shouldn't be necessary.

Alas, with 94k+ based packages already in the wild, arbitrary
code execution for dependency metadata generation is going to be with
us for a while. That said, centralised services like
should lead to more folks being able to just use their already
collected dependency 

Re: [Distutils] Maintaining a curated set of Python packages

2016-12-02 Thread Freddy Rietdijk
On Fri, Dec 2, 2016 at 4:33 PM, Robert T. McGibbon 

> Isn't this issue already solved by (and the raison d'être of) the multiple
> third-party Python redistributors, like the various OS package maintainers,
> Continuum's Anaconda, Enthought Canopy, ActiveState Python, WinPython, etc?

My intention is not creating yet another distribution. Instead, I want to
see if there is interest in the different distributions on sharing some of
the burden of curating by bringing up this discussion and seeing what is
needed. These distributions have their recipes that allow them to build
their packages using their tooling. What I propose is having some of that
data community managed so the distributions can use that along with their
tooling to build the eventual packages.

On Fri, Dec 2, 2016 at 5:23 PM, Nick Coghlan  wrote:

> On 3 December 2016 at 01:33, Robert T. McGibbon 
> wrote:
>> Isn't this issue already solved by (and the raison d'être of) the
>> multiple third-party Python redistributors, like the various OS package
>> maintainers, Continuum's Anaconda, Enthought Canopy, ActiveState Python,
>> WinPython, etc?
> Yep. Once you start talking content curation, you're in a situation where:
> - you're providing an ongoing service that will always be needed (the
> specific packages will change, but the task won't)
> - you need to commit to a certain level of responsiveness for security
> issues
> - you'll generally need to span multiple language ecosystems, not just
> Python
> - exactly which packages are interesting will depend on the user audience
> you're targeting
> - the tolerance for API breakage will also vary based on the audience
> you're targeting
> - you'll often want to be able to carry patches that aren't present in the
> upstream components
> - you'll need to decide which target platforms you want to support

These are interesting issues you bring up here. What I seek is having a set
that has per package a version, source, Python dependencies and build
system. Other dependencies would be for now left out, unless someone has a
good idea how to include those. Distributions can take this curated set and
extend the data with their distribution specific things. For example, in
Nix we could load such a set, map a function that builds the packages in
the set, and override what is passed to the function when necessary (e.g.
to add system dependencies, our patches, or how tests are invoked, and so
on).
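
To make the "load, map, override" idea concrete, here is a small Python
sketch rather than a Nix expression. The JSON layout (a top-level "packages"
mapping with "version" and "build_system" keys), the function names, and the
example packages are all hypothetical; no such format has been agreed on.

```python
import json


def build_all(curated_json, build, overrides=None):
    """Apply a build function to every package in a curated set.

    overrides maps a package name to extra or replacement fields, which is
    where a distribution would add system dependencies, patches, and so on.
    """
    overrides = overrides or {}
    packages = json.loads(curated_json)["packages"]
    results = {}
    for name, spec in packages.items():
        merged = dict(spec)  # copy so the curated data itself is not mutated
        merged.update(overrides.get(name, {}))
        results[name] = build(name, merged)
    return results


def describe_build(name, spec):
    """Stand-in for a real distribution build step."""
    system_deps = ",".join(spec.get("system_deps", []))
    return "{}-{}+{}".format(name, spec["version"], system_deps)


curated = '{"packages": {"foo": {"version": "1.0", "build_system": "setuptools"}}}'
built = build_all(curated, describe_build, {"foo": {"system_deps": ["libxml2"]}})
```

The curated set stays distribution-neutral, and everything specific to one
distribution lives in the overrides mapping.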

Responsiveness is indeed an interesting issue. If there's enough backing,
then I imagine security issues will be resolved as fast as they are
nowadays by the distributions backing the initiative.

> If a curation community *isn't* doing any of those things, then it isn't
> adding a lot of value beyond folks just doing DIY integration in their CI
> system by pinning their dependencies to particular versions.

I would imagine that distributions that would support this idea would have
a CI tracking packages built using the curated set and the
distribution-specific changes. When there's an issue they could fix it at
their side, or if it is something that might belong in the curated set,
they would report the issue. At some point, when they would freeze, they
would pin to a certain .MM and API breakage should not occur.

> As far as the comments about determining dependencies goes, the way pip
> does it generally works fine, you just need a sandboxed environment to do
> the execution, and both redistributors and open source information
> providers like are actively working on automating that
> process (coping with packages as they exist on PyPI today, rather than
> relying on the upstream community to change anything about the way Python
> packaging works).
> is a very interesting initiative. It seems they scan the
contents of the archives and extract dependencies based on what is in the
requirements files, which is often more than is actually needed for
building and running the package. They would benefit from having a
declarative style for the dependencies and build system, but that is
another issue (PEP 517 e.g.) than what I bring up here. We also have a tool
that runs pip in a sandbox to determine the dependencies, and then provide
us with an expression. It works, but it shouldn't be necessary.


Re: [Distutils] Maintaining a curated set of Python packages

2016-12-02 Thread Robert T. McGibbon
Isn't this issue already solved by (and the raison d'être of) the multiple
third-party Python redistributors, like the various OS package maintainers,
Continuum's Anaconda, Enthought Canopy, ActiveState Python, WinPython, etc?


On Thu, Dec 1, 2016 at 4:45 AM, Freddy Rietdijk 

> Hi,
> I would like to propose that, as a community, we jointly maintain a
> curated set of Python packages that are known to work together. These
> packages would receive security updates for some time and every couple of
> months a new major release of the curated set becomes available. The idea of
> this is inspired by Haskell LTS, so maybe we should call this PyPI LTS?
> So why a PyPI LTS?
> PyPI makes available all versions of packages that were uploaded, and by
> default installers like pip will try to use the latest available versions
> of packages, unless told otherwise. With a requirements.txt file (or a
> future pipfile.lock) and we can pin as much as we like our
> requirements of respectively the environment and package requirements,
> thereby making a more reproducible environment possible and also fixing the
> API for developers. Pinning requirements is often a manual job, although
> one could use pip freeze or other tools.
> A common problem is when two packages in a certain environment require
> different versions of a package. Having a curated set of packages,
> developers could be encouraged to test against the latest stable and
> nightly of the curated package set, thereby increasing compatibility
> between different packages, something I think we all want.
> Having a compatible set of packages is not only interesting for
> developers, but also for downstream distributions. All distributions try to
> find a set of packages that are working together and release them. This is
> a lot of work, and I think it would be in everyone's benefit if we try to
> solve this issue together.
> A possible solution
> Downstream, that is developers and distributions, will need a set of
> packages that are known to work together. At minimum this would consist of,
> per package, the name of the package and its version, but for
> reproducibility I would propose adding the filename and hash as well.
> Because there isn't any reliable method to extract the requirements of a
> package, I propose also including `setup_requires`, `install_requires`, and
> `tests_require` explicitly. That way, distributions can automatically build
> recipes for the packages (although non-Python dependencies would still have
> to be resolved by the distribution).
> The package set would be released as lts--MM-REVISION, and developers
> can choose to track a specific revision, but would typically be asked to
> track only lts--MM which would resolve to the latest REVISION.
> Because dependencies vary per Python language version, interpreter, and
> operating system, we would have to have these sets for each combination and
> therefore I propose having a source which evaluates to say a TOML/JSON file
> per version/interpreter/OS.
> How this source file should be written I don't know; while I think the Nix
> expression language is an excellent choice for this, it is not possible for
> everyone to use and therefore likely not an option.
> Open questions
> There are still plenty of open questions.
> - Who decides when a package is updated that would break dependents? This
> is an issue all distributions face, so maybe we should involve them.
> - How would this be integrated with pip / virtualenv / pipfile.lock /
> requirements.txt / See e.g.
> pipfile/issues/10#issuecomment-262229620
> References to Haskell LTS
> Here are several links to some interesting documents on how Haskell LTS
> works.
> - A blog post describing what Haskell LTS is:
> https://www.fpcomplete.com/blog/2014/12/backporting-bug-fixes
> - Rules regarding uploading and breaking packages:
> fpco/stackage/blob/master/
> - The actual LTS files
> What do you think of this proposal? Would you be interested in this as
> developer, or packager?
> Freddy