Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread David Cournapeau
On Sat, May 16, 2015 at 11:36 PM, Justin Cappos jcap...@nyu.edu wrote:

 I am no expert, but I don't understand why backtracking algorithms would
 be faster than SAT, since they both potentially need to walk over the
 full set of possible solutions. It is hard to reason about the cost because
 the worst case is in theory growing exponentially in both cases.


 This is talked about a bit in this thread:
 https://github.com/pypa/pip/issues/988

 Each algorithm could be computationally more efficient.  Basically, *if
 there are no conflicts* backtracking will certainly win.  If there are a
 huge number of conflicts a SAT solver will certainly win.  It's not clear
 where the tipping point is between the two schemes.

 However, a better question is does the computational difference matter?
 If one is a microsecond faster than the other, I don't think anyone cares.
 However, from the OPIUM paper (listed off of that thread), it is clear that
 SAT solver resolution can be slow without optimizations to make them work
 more like backtracking resolvers.  From my experience backtracking
 resolvers are also slow when the conflict rate is high.


Pure SAT is fast enough in practice in my experience (concretely: solving
thousands of rules takes < 1 sec). It becomes more complicated once you need
to optimize the solution, especially when you have already installed
packages. This is unfortunately not as well discussed in the literature.
Pseudo-boolean SAT for optimization was argued to be too slow by the
0install people, but OTOH, this seems to be what's used in conda, so who
knows :)
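
(For concreteness, the encoding is roughly: one boolean variable per
(package, version) pair, plus "requires" and "at most one version" clauses.
A tiny sketch using the pycosat bindings -- the packages and versions below
are invented, and this ignores optimization entirely:)

    # Sketch: a tiny dependency problem as CNF clauses for a SAT solver.
    # Needs the third-party "pycosat" package; variables are positive
    # integers, a negative literal means "NOT installed".
    import pycosat

    A_10, B_10, B_20 = 1, 2, 3          # hypothetical: A-1.0, B-1.0, B-2.0

    clauses = [
        [A_10],                          # we want A-1.0 installed
        [-A_10, B_20],                   # A-1.0 requires B-2.0
        [-B_10, -B_20],                  # at most one version of B
    ]

    print(pycosat.solve(clauses))        # e.g. [1, -2, 3], or "UNSAT"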

If your SAT solver is in pure Python, you can choose a direction of the
search which is more meaningful. I believe this is what 0install does from
reading http://0install.net/solver.html, and what we have in our own SAT
solver code. I unfortunately cannot look at the 0install code myself, as it
is under the GPL and I am working on a BSD-licensed solver implementation. I
also do not know how they handle updates and already installed packages.


 This only considers computation cost though.  Other factors can become
 more expensive than computation.  For example, SAT solvers need all the
 rules to consider.  So a SAT solution needs to effectively download the
 full dependency graph before starting.  A backtracking dependency resolver
 can just download packages or dependency information as it considers them.
 The bandwidth cost for SAT solvers should be higher.


With a reasonable representation, I think you can make it small enough. To
give an idea, our index @ Enthought containing around 20k packages takes
~340 kb compressed w/ bz2 if you only keep the data required for dependency
handling (name, version and runtime dependencies), and that's using json,
an inefficient encoding, so I suspect encoding all of pypi may be only a few
MB to fetch, which is generally faster than doing tens of http requests.
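
(To make that concrete, a stripped-down index is really just something like
the following -- the package names, versions and field names here are
invented, but this is the whole trick: keep only name/version/runtime deps,
dump as json, compress with bz2:)

    # Sketch: build a dependency-only index and see how small it compresses.
    import bz2
    import json

    index = {
        "numpy": {"1.9.2": {"depends": []}},
        "matplotlib": {"1.4.3": {"depends": ["numpy >= 1.6", "pyparsing"]}},
        # ... one entry per (package, version) in the repository ...
    }

    raw = json.dumps(index, separators=(",", ":")).encode("utf-8")
    compressed = bz2.compress(raw)
    print(len(raw), len(compressed))   # the compressed blob is what clients fetch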

The libsolv people worked on a binary representation that may also be
worth looking at.


 P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy to
 talk more with you and/or Robert about minutiae that others may not care
 about.


Sure, I would be happy to. As I mentioned before, we have some code around
a SAT-based solver, but it is not ready yet, which is why we kept it
private (https://github.com/enthought/sat-solver). It handles well (== both
speed and quality-wise) the case where nothing is installed, but behaves
poorly when packages are already installed, and does not handle the update
case yet. The code is also very prototype-ish, but is not too complicated
to experiment with.

David
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


[Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread David Mertz
I've just started monitoring this SIG to get a sense of the issues and
status of things.  I've also just started working for Continuum Analytics.

Continuum has a great desire to make 'pip' work with conda packages.
Obviously, we love for users to choose the Anaconda Python distribution but
many will not for a variety of reasons (many good reasons).

However, we would like for users of other distros still to be able to
benefit from our creation of binary packages for many platforms in the
conda format.  As has been discussed in recent threads on dependency
solving, the way conda provides metadata apart from entire packages makes
much of that work easier.  But even aside from that, there are simply a
large number of well-tested packages (not only for Python, it is true, so
that's possibly a wrinkle in the task) we have generated in conda format.

It is true that right now, a user can in principle type:

  % pip install conda
  % conda install some_conda_package

But that creates two separate systems for tracking what's installed and
what dependencies are resolved; and many users will not want to convert
completely to conda after that step.

What would be better as a user experience would be to let users do this:

  % pip install --upgrade pip
  % pip install some_conda_package

Whether that second command ultimately downloads code from pypi.python.org
or from repo.continuum.io is probably less important from a user experience
perspective.  Continuum is very happy to upload all of our conda packages
to PyPI if this would improve this user experience.  Obviously, the idea
here would be that the user would be able to type 'pip list' and friends
afterward, and have knowledge of what was installed, even as conda packages.

I'm hoping members of the SIG can help me understand both the technical and
social obstacles that need to be overcome before this can happen.

Yours, David...
-- 
The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Daniel Holth
On May 16, 2015 11:22 AM, David Cournapeau courn...@gmail.com wrote:



 On Sat, May 16, 2015 at 11:36 PM, Justin Cappos jcap...@nyu.edu wrote:

 I am no expert, but I don't understand why backtracking algorithms
would be faster than SAT, since they both potentially need to walk over
the full set of possible solutions. It is hard to reason about the cost
because the worst case is in theory growing exponentially in both cases.


 This is talked about a bit in this thread:
https://github.com/pypa/pip/issues/988

 Each algorithm could be computationally more efficient.  Basically, *if
there are no conflicts* backtracking will certainly win.  If there are a
huge number of conflicts a SAT solver will certainly win.  It's not clear
where the tipping point is between the two schemes.

 However, a better question is does the computational difference matter?
If one is a microsecond faster than the other, I don't think anyone cares.
However, from the OPIUM paper (listed off of that thread), it is clear that
SAT solver resolution can be slow without optimizations to make them work
more like backtracking resolvers.  From my experience backtracking
resolvers are also slow when the conflict rate is high.


 Pure SAT is fast enough in practice in my experience (concretely: solving
thousands of rules takes < 1 sec). It becomes more complicated once you need
to optimize the solution, especially when you have already installed
packages. This is unfortunately not as well discussed in the literature.
Pseudo-boolean SAT for optimization was argued to be too slow by the
0install people, but OTOH, this seems to be what's used in conda, so who
knows :)

Where "optimizing" means something like "find a solution with the newest
possible releases of the required packages", not execution speed.

 If your SAT solver is in pure Python, you can choose a direction of the
search which is more meaningful. I believe this is what 0install does from
reading http://0install.net/solver.html, and what we have in our own SAT
solver code. I unfortunately cannot look at the 0install code myself as it
is under the GPL and am working on a BSD solver implementation. I also do
not know how they handle updates and already installed packages.


 This only considers computation cost though.  Other factors can become
more expensive than computation.  For example, SAT solvers need all the
rules to consider.  So a SAT solution needs to effectively download the
full dependency graph before starting.  A backtracking dependency resolver
can just download packages or dependency information as it considers them.
The bandwidth cost for SAT solvers should be higher.


 With a reasonable representation, I think you can make it small enough.
To give an idea, our index @ Enthought containing around 20k packages takes
~340 kb compressed w/ bz2 if you only keep the data required for dependency
handling (name, version and runtime dependencies), and that's using json,
an inefficient encoding, so I suspect encoding all of pypi may be only a few
MB to fetch, which is generally faster than doing tens of http requests.

 The libsolv people worked on a binary representation that may also be
worth looking at.


 P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy
to talk more with you and/or Robert about minutiae that others may not care
about.


 Sure, I would be happy to. As I mentioned before, we have some code
around a SAT-based solver, but it is not ready yet, which is why we kept it
private (https://github.com/enthought/sat-solver). It handles well (== both
speed and quality-wise) the case where nothing is installed, but behaves
poorly when packages are already installed, and does not handle the update
case yet. The code is also very prototype-ish, but is not too complicated
to experiment with.

 David


 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Paul Moore
On 16 May 2015 at 20:04, David Mertz dme...@continuum.io wrote:
 What would be better as a user experience would be to let users do this:

   % pip install --upgrade pip
   % pip install some_conda_package

 Whether that second command ultimately downloads code from pypi.python.org
 or from repo.continuum.io is probably less important from a user experience
 perspective.  Continuum is very happy to upload all of our conda packages to
 PyPI if this would improve this user experience.  Obviously, the idea here
 would be that the user would be able to type 'pip list' and friends
 afterward, and have knowledge of what was installed, even as conda packages.

 I'm hoping members of the SIG can help me understand both the technical and
 social obstacles that need to be overcome before this can happen.

My immediate thought is, what obstacles stand in the way of a conda
to wheel conversion utility? With such a utility, a wholesale
conversion of conda packages to wheels, along with hosting those
wheels somewhere (binstar? PyPI isn't immediately possible as only
package owners can upload files), would essentially give this
capability.

There presumably are issues with this approach (maybe technical, more
likely social) but it seems to me that understanding *why* this
approach doesn't work would be a good first step towards identifying
an actual solution.

Paul
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 12:04 PM, David Mertz dme...@continuum.io wrote:

 Continuum has a great desire to make 'pip' work with conda packages.
 Obviously, we love for users to choose the Anaconda Python distribution but
 many will not for a variety of reasons (many good reasons).


Hmm -- this strikes me as very, very tricky -- and of course, tied in to
the other thread I've been spending a bunch of time on...

However, we would like for users of other distros still to be able to
 benefit from our creation of binary packages for many platforms in the
 conda format.


Frankly, if you want your efforts at building binaries to get used outside
of Anaconda, then you should be building wheels in the first place. While
conda does more than pip + wheel can do -- I suppose you _could_ use wheels
for the things it can support..

But on to the technical issues:

conda python packages depend on other conda packages, and some of those
packages are not python packages at all. The common use case here is
non-python dynamic libs -- exactly the use case I've been going on about in
the other thread...

And conda installs those dynamic libs in a conda environment -- outside of
the python environment. So you can't really use a conda package without a
conda environment, and an installer that understands that environment (I
think conda install does some lib path re-naming, yes?), i.e. conda itself.
So I think that's kind of a dead end.

So what about the idea of a conda-package-to-wheel converter? conda
packages and wheels have a bit in common -- IIUC, they are both basically a
zip of all the files you need installed. But again the problem is those
dependencies on third party dynamic libs.
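
(A converter would at least be easy to prototype: as far as I understand the
format, a conda package is a .tar.bz2 with an info/ directory of metadata
next to the files to install, so you can quickly see which files would map
into a wheel and which wouldn't -- the filename below is made up:)

    # Sketch: peek inside a conda package and split its contents into
    # "site-packages files" (wheel-able) and everything else (the hard part).
    import json
    import tarfile

    with tarfile.open("some_conda_package-1.0-py27_0.tar.bz2", "r:bz2") as tar:
        meta = json.load(tar.extractfile("info/index.json"))  # name, version, depends
        names = tar.getnames()

    wheelable = [n for n in names if "site-packages/" in n]
    leftovers = [n for n in names
                 if "site-packages/" not in n and not n.startswith("info/")]
    print(meta["name"], meta["version"], meta.get("depends"))
    print(len(wheelable), "files could land in a wheel;", len(leftovers), "could not")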

So for that to work -- pip+wheel would have to grow a way to deal with
installing, managing and using dynamic libs. See the other thread for the
nightmare there...

And while I'd love to see this happen, perhaps an easier route would be for
conda_build to grow a static flag that will statically link stuff and get
to something already supported by pip, wheel, and pypi.

-Chris



 It is true that right now, a user can in principle type:

   % pip install conda
   % conda install some_conda_package

 But that creates two separate systems for tracking what's installed and
 what dependencies are resolved;


Indeed -- which is why some folks are working on making it easier to use
conda for everything... converting a wheel to a conda package is probably
easier than the other way around..

Funny -- just moments ago I wrote that it didn't seem that anyone other
than me was interested in extending pip+wheel to support this kind of thing
-- I guess I was wrong!

Great to see you and Continuum thinking about this.


-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Paul Moore
On 16 May 2015 at 20:04, Chris Barker chris.bar...@noaa.gov wrote:
 I was referring to the SetDllDirectory API. I don't think that gets picked
 up by other processes.

 from:

 https://msdn.microsoft.com/en-us/library/windows/desktop/ms686203%28v=vs.85%29.aspx

 It looks like you can add a path, at run time, that gets searched for dlls
 before the rest of the system locations. And this does not affect any other
 applications. But you'd need to make sure this got run before any of the
 affected packages were loaded -- which is probably what David meant by
 needing to control the python binary.

Ah, sorry - I misunderstood you. This might work, but as you say, the
DLL Path change would need to run before any imports needed it. Which
basically means it needs to be part of the Python interpreter startup.
It *could* be run as normal user code - you just have to ensure you
run it before any imports that need shared libraries. But that seems
very fragile to me. I'm not sure it's viable as a generic solution.
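
(For reference, the call itself is a one-liner via ctypes; the fragile part
is exactly what Paul describes -- guaranteeing it runs before anything
imports an extension that needs the shared DLLs. A Windows-only sketch, with
the directory name invented:)

    # Sketch: point the process-private DLL search path at a shared-DLL
    # directory *before* importing extension modules that need it.
    import ctypes
    import os
    import sys

    if sys.platform == "win32":
        dll_dir = os.path.join(sys.prefix, "SharedDLLs")   # hypothetical location
        # Unlike editing %PATH%, SetDllDirectory only affects this process.
        ctypes.windll.kernel32.SetDllDirectoryW(dll_dir)

    # import matplotlib  # only safe *after* the call above, if its extension
    #                    # modules link against DLLs living in dll_dir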

Paul
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Chris Barker
On Fri, May 15, 2015 at 11:35 PM, David Cournapeau courn...@gmail.com
wrote:

 On Sat, May 16, 2015 at 4:56 AM, Chris Barker chris.bar...@noaa.gov
 wrote:


 But in short -- I'm pretty sure there is a way, on all systems, to have a
 standard way to build extension modules, combined with a standard way to
 install shared libs, so that a lib can be shared among multiple packages.
 So the question remains:


 There is actually no way to do that on windows without modifying the
 interpreter somehow.


Darn.


 This was somehow discussed a bit at PyCon when talking about windows
 packaging:

  1. the simple way to share DLLs across extensions is to put them in the
 %PATH%, but that's horrible.


yes -- that has to be off the table, period.


 2. there are ways to put DLLs in a shared directory *not* in the %PATH%
 since at least windows XP SP2 and above, through the SetDllDirectory API.

 With 2., you still have the issue of DLL hell,


could you clarify a bit -- I thought that this could, at least, put a dir
on the search path that was specific to that python context. So it would
require cooperation among all the packages being used at once, but not get
tangled up with the rest of the system. but maybe I'm wrong here -- I have
no idea what the heck I'm doing with this!

which may be resolved through naming and activation contexts.


I guess that's what I mean by the above..


 I had a brief chat with Steve where he mentioned that this may be a
 solution, but he was not 100 % sure IIRC. The main drawback of this
 solution is that it won't work when inheriting virtual environments (as you
 can only set a single directory).


no relative paths here? or a path that can be set at run time? or maybe I'm
missing what inheriting virtual environments means...


 FWIW, we are about to deploy 2. @ Enthought (where we control the python
 interpreter, so it is much easier for us).


It'll be great to see how that works out, then. I take it this means that
for Canopy, you've decided that statically linking everything is NOT the
way to go. Which is a good data point to have.

Thanks for the update.

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Nick Coghlan
On 17 May 2015 at 00:36, Justin Cappos jcap...@nyu.edu wrote:
 This only considers computation cost though.  Other factors can become more
 expensive than computation.  For example, SAT solvers need all the rules to
 consider.  So a SAT solution needs to effectively download the full
 dependency graph before starting.  A backtracking dependency resolver can
 just download packages or dependency information as it considers them.

This is the defining consideration for pip at this point: a SAT solver
requires publication of static dependency metadata on PyPI, which is
dependent on both the Warehouse migration *and* the completion and
acceptance of PEP 426. Propagation out to PyPI caching proxies and
mirrors like devpi and the pulp-python plugin will then take even
longer.

A backtracking resolver doesn't have those gating dependencies, as it
can tolerate the current dynamic metadata model.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 4:13 AM, Paul Moore p.f.mo...@gmail.com wrote:

  Though it's a lot harder to provide a build environment than just the
  lib to link to... I'm going to have to think more about that...

 It seems to me that the end user doesn't really have a problem here
 (pip install matplotlib works fine for me using the existing wheel).


Sure -- but that's because Matthew Brett has done a lot of work to make
that happen.

 It's the package maintainers (who have to build the binaries) that
 have the issue because everyone ends up doing the same work over and
 over, building dependencies.


Exactly -- It would be nice if the ecosystem made that easier.


 So rather than trying to address the hard
 problem of dynamic linking, maybe a simpler solution is to set up a
 PyPI-like hosting solution for static libraries of C dependencies?

 It could be as simple as a github project that contained a directory
 for each dependency,


I started that here:

https://github.com/PythonCHB/mac-builds

but haven't kept it up. And Matthew Brett has done most of the work here:

https://github.com/MacPython

not sure how he's sharing the static libs -- but it could be done.

 With a setuptools build plugin you could even just specify your libraries
 in setup.py, and have the plugin download the lib files automatically at
 build time.


actually, that's a pretty cool idea! you'd need a place to host them --
GitHub is no longer hosting downloads, are they? though you could probably
use GitHub Pages.. (or something else)
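
(Something along these lines is how I read Paul's suggestion -- a build_ext
subclass that fetches a prebuilt static lib before compiling; the URL,
archive layout and library names are purely illustrative:)

    # Sketch of a setup.py whose build step downloads a prebuilt static
    # library before compiling the extension against it.
    import os
    import urllib.request
    import zipfile
    from setuptools import Extension, setup
    from setuptools.command.build_ext import build_ext

    LIB_URL = "https://example.org/staticlibs/libpng-win32.zip"   # hypothetical

    class fetch_then_build_ext(build_ext):
        def run(self):
            if not os.path.exists(os.path.join("libs", "libpng.lib")):
                os.makedirs("libs", exist_ok=True)
                archive, _ = urllib.request.urlretrieve(LIB_URL, "libs/libpng.zip")
                zipfile.ZipFile(archive).extractall("libs")
            build_ext.run(self)

    setup(
        name="mypkg",
        ext_modules=[Extension("mypkg._png", ["src/_png.c"],
                               include_dirs=["libs/include"],
                               library_dirs=["libs"], libraries=["png"])],
        cmdclass={"build_ext": fetch_then_build_ext},
    )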


 People add libraries to the
 archive simply by posting pull requests. Maybe the project maintainer
 maintains the actual binaries by running the builds separately and
 publishing them separately, or maybe PRs include binaries


or you use a CI system to build them. Something like this is being done by
a bunch of folks for conda/binstar:

https://github.com/ioos/conda-recipes

is just one example.

 PS The above is described as if it's single-platform, mostly because I
 only tend to think about these issues from a Windows POV, but it
 shouldn't be hard to extend it to multi-platform.


Indeed -- the MacWheels projects are, of course, single-platform, but could
be extended. Though at the end of the day, there isn't much to share
between building libs on different platforms (unless you are using a
cross-platform build tool -- which is why I was trying out gattai for my stuff).

The conda stuff is multi-platform, though, in fact, you have to write a
separate build script for each platform -- it doesn't really provide
anything to help with that part.

But while these efforts are moving towards removing the need for every
package maintainer to build the deps -- we are now duplicating the effort
of trying to remove duplication of effort :-) -- but maybe just waiting for
something to gain momentum and rise to the top is the answer.

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Justin Cappos

 I am no expert, but I don't understand why backtracking algorithms would
 be faster than SAT, since they both potentially need to walk over the
 full set of possible solutions. It is hard to reason about the cost because
 the worst case is in theory growing exponentially in both cases.


This is talked about a bit in this thread:
https://github.com/pypa/pip/issues/988

Each algorithm could be computationally more efficient.  Basically, *if
there are no conflicts* backtracking will certainly win.  If there are a
huge number of conflicts a SAT solver will certainly win.  It's not clear
where the tipping point is between the two schemes.

However, a better question is does the computational difference matter?  If
one is a microsecond faster than the other, I don't think anyone cares.
However, from the OPIUM paper (listed off of that thread), it is clear that
SAT solver resolution can be slow without optimizations to make them work
more like backtracking resolvers.  From my experience backtracking
resolvers are also slow when the conflict rate is high.

This only considers computation cost though.  Other factors can become more
expensive than computation.  For example, SAT solvers need all the rules to
consider.  So a SAT solution needs to effectively download the full
dependency graph before starting.  A backtracking dependency resolver can
just download packages or dependency information as it considers them.  The
bandwidth cost for SAT solvers should be higher.
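
(To make the contrast concrete: a backtracking resolver only needs a couple
of callbacks it can invoke lazily, so metadata is fetched one candidate
release at a time. A toy sketch, nothing like pip's real code:)

    # Toy backtracking resolver: dependency metadata is fetched lazily, one
    # candidate release at a time, instead of needing the full graph up front.
    def resolve(requirements, get_candidates, get_deps, chosen=None):
        """requirements: list of (name, constraint) where constraint(version)
        is a bool; get_candidates(name, constraint) yields versions newest
        first; get_deps(name, version) may hit the network when called."""
        chosen = dict(chosen or {})
        if not requirements:
            return chosen
        (name, constraint), rest = requirements[0], requirements[1:]
        if name in chosen:                       # already picked: check only
            if constraint(chosen[name]):
                return resolve(rest, get_candidates, get_deps, chosen)
            return None                          # conflict -> backtrack
        for version in get_candidates(name, constraint):
            deps = get_deps(name, version)       # lazy metadata fetch
            result = resolve(rest + deps, get_candidates, get_deps,
                             dict(chosen, **{name: version}))
            if result is not None:
                return result
        return None                              # no candidate works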

Thanks,
Justin
P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy to
talk more with you and/or Robert about minutiae that others may not care
about.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 10:12 AM, Nick Coghlan ncogh...@gmail.com wrote:

  Maybe, but it's a problem to be solved, and the Linux distros more or less
  solve it for us, but OS-X and Windows have no such system built in (OS-X
  does have Brew and macports)

 Windows 10 has Chocolatey and OneGet:

 * https://chocolatey.org/
 *
 http://blogs.msdn.com/b/garretts/archive/2015/01/27/oneget-and-the-windows-10-preview.aspx


cool -- though I don't think we want the official python to depend on a
third party system, and OneGet won't be available for most users for a
LONG time...

The fact that OS-X users have to choose between fink, macport, homebrew or
roll-your-own is a MAJOR source of pain for supporting the OS-X community.
More than one way to do it is not the goal.

conda and nix then fill the niche for language independent packaging
 at the user level rather than the system level.


yup -- conda is, indeed, pretty cool.

 I think there is a bit of fuzz here -- cPython, at least, uses the
 operating system provided C/C++ dynamic linking system -- it's not a
 totally independent thing.

 I'm specifically referring to the *declaration* of dependencies here.


sure -- that's my point about the current missing link -- setuptools,
pip, etc, can only declare python-package-level dependencies, not
binary-level dependencies.

My idea is to bundle up a shared lib in a python package -- then, if you
declare a dependency on that package, you've handled the dep issue. The
trick is that a particular binary wheel depends on that other binary wheel
-- rather than the whole package depending on it. (that is, on linux, it
would have no dependency, on OS-X it would -- but then only the wheel built
for a non-macports build, etc).

I think we could hack around this by monkey-patching the wheel after it is
built, so may be worth playing with to see how it works before proposing
any changes to the ecosystem.
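
(The monkey-patch itself is not much code -- roughly, rewrite the wheel's
METADATA to add a Requires-Dist on the hypothetical shared-lib wheel; a real
tool would also have to update the hash/size entry for METADATA in RECORD,
which this sketch skips:)

    # Sketch: inject an extra Requires-Dist into an already-built wheel, so a
    # binary wheel can declare a dependency on a "shared lib" wheel.  The
    # dist-info path and requirement name are invented; RECORD is left stale.
    import shutil
    import zipfile

    def add_requires_dist(wheel_path, requirement, dist_info):
        patched = wheel_path + ".patched"
        with zipfile.ZipFile(wheel_path) as src, \
             zipfile.ZipFile(patched, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                data = src.read(item.filename)
                if item.filename == dist_info + "/METADATA":
                    head, sep, body = data.partition(b"\n\n")
                    head += ("\nRequires-Dist: %s" % requirement).encode("utf-8")
                    data = head + sep + body
                dst.writestr(item, data)
        shutil.move(patched, wheel_path)

    # add_requires_dist("mypkg-1.4.3-cp27-none-win32.whl",
    #                   "libpng-bundle (>=1.6)", "mypkg-1.4.3.dist-info")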

 And if you are using something like conda you don't need pip
 or wheels anyway!

 Correct, just as if you're relying solely on Linux system packages,
 you don't need pip or wheels. Aside from the fact that conda is
 cross-platform, the main difference between the conda community and a
 Linux distro is in the *kind* of software we're likely to have already
 done the integration work for.


sure. but the cross-platform thing is BIG -- we NEED pip and wheel because
rpm, or deb, or ... are all platform and distro dependent -- we want a way
for package maintainers to support a broad audience without having to deal
with 12 different package systems.

The key to understanding the difference in the respective roles of pip
 and conda is realising that there are *two* basic distribution
 scenarios that we want to be able to cover (I go into this in more
 detail in
 https://www.python.org/dev/peps/pep-0426/#development-distribution-and-deployment-of-python-software
 ):


hmm -- sure, they are different, but is it impossible to support both with
one system?


 * software developer/publisher - software integrator/service operator
 (or data analyst)
 * software developer/publisher - software integrator - service
 operator (or data analyst)

...

 On the consumption side, though, the nature of the PyPA tooling as a
 platform-independent software publication toolchain means that if you
 want to consume the PyPA formats directly, you need to be prepared to
 do your own integration work.


Exactly! and while Linux system admins can do their own system integration
work, everyday users (and many Windows sys admins) can't, and we shouldn't
expect them to.

And, in fact, the PyPA tooling does support the more casual user much of
the time -- for example, I'm in the third quarter of a Python certification
class -- Intro, Web development, Advanced topics -- and only half way
through the third class have I run into any problems with sticking with the
PyPA tools.

(except for pychecker -- not being on PyPI :-( )

Many public web service developers are
 entirely happy with that deal, but most system administrators and data
 analysts trying to deal with components written in multiple
 programming languages aren't.


exactly -- but it's not because the audience is different in their role --
it's because different users need different python packages. The PyPA tools
support pure-python great -- and compiled extensions without deps pretty
well -- but there is a bit of gap with extensions that require other deps.

It's a 90% (95%) solution... It'd be nice to get it to a 99% solution.

Where it really gets ugly is when you need stuff that has nothing to do
with python -- say a Julia run-time, or ...

Anaconda is there to support that: their philosophy is that if you are
trying to do full-on data analysis with python, you are likely to need
stuff strickly beyond the python ecosystem -- your own Fortran code, numpy
(which requires LLVM), etc.

Maybe they are right -- but there is still a heck of a lot of stuff that
you can do and stay 

Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread David Cournapeau
On Sun, May 17, 2015 at 12:40 AM, Daniel Holth dho...@gmail.com wrote:


 On May 16, 2015 11:22 AM, David Cournapeau courn...@gmail.com wrote:
 
 
 
  On Sat, May 16, 2015 at 11:36 PM, Justin Cappos jcap...@nyu.edu wrote:
 
  I am no expert, but I don't understand why backtracking algorithms
 would be faster than SAT, since they both potentially need to walk over
 the full set of possible solutions. It is hard to reason about the cost
 because the worst case is in theory growing exponentially in both cases.
 
 
  This is talked about a bit in this thread:
 https://github.com/pypa/pip/issues/988
 
  Each algorithm could be computationally more efficient.  Basically, *if
 there are no conflicts* backtracking will certainly win.  If there are a
 huge number of conflicts a SAT solver will certainly win.  It's not clear
 where the tipping point is between the two schemes.
 
  However, a better question is does the computational difference
 matter?  If one is a microsecond faster than the other, I don't think
 anyone cares.  However, from the OPIUM paper (listed off of that thread),
 it is clear that SAT solver resolution can be slow without optimizations to
 make them work more like backtracking resolvers.  From my experience
 backtracking resolvers are also slow when the conflict rate is high.
 
 
  Pure SAT is fast enough in practice in my experience (concretely:
 solving thousands of rules takes < 1 sec). It becomes more complicated once
 you need to optimize the solution, especially when you have already
 installed packages. This is unfortunately not as well discussed in the
 literature. Pseudo-boolean SAT for optimization was argued to be too slow
 by the 0install people, but OTOH, this seems to be what's used in conda,
 so who knows :)

 Where "optimizing" means something like "find a solution with the newest
 possible releases of the required packages", not execution speed.


Indeed, it was not obvious in this context :) Though in theory,
optimization is more general. It could be optimizing w.r.t. a cost function
taking into account #packages, download size, minimal number of changes,
etc... This is where you want a pseudo-boolean SAT, which is what conda
uses I think.

0install, composer and I believe libsolv took a different route, and use
heuristics to find a reasonably good solution by picking the next
candidate. This requires access to the internals of the SAT solver though
(not a problem if you have a python implementation).
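
(In outline, what that looks like is just a decision hook: when the solver
has to guess, it doesn't branch on an arbitrary unassigned variable, but on
the preferred -- e.g. newest -- still-possible version of the first
undecided requirement. Schematic only, assuming a hand-rolled DPLL-style
solver that asks for the next literal to try:)

    # Schematic decision heuristic: branch on the literal for the newest
    # still-unassigned version of the highest-priority undecided package.
    def choose_literal(undecided, assignment, var_for):
        """undecided: [(package_name, [versions newest-first]), ...] in
        priority order; assignment: {variable: bool};
        var_for(name, version) -> SAT variable for that candidate."""
        for name, versions in undecided:
            for version in versions:             # newest first
                var = var_for(name, version)
                if var not in assignment:
                    return var                   # "try installing this one"
        return None                              # nothing left to decide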

David

   If your SAT solver is in pure Python, you can choose a direction of the
 search which is more meaningful. I believe this is what 0install does from
 reading http://0install.net/solver.html, and what we have in our own SAT
 solver code. I unfortunately cannot look at the 0install code myself as it
 is under the GPL and am working on a BSD solver implementation. I also do
 not know how they handle updates and already installed packages.
 
 
  This only considers computation cost though.  Other factors can become
 more expensive than computation.  For example, SAT solvers need all the
 rules to consider.  So a SAT solution needs to effectively download the
 full dependency graph before starting.  A backtracking dependency resolver
 can just download packages or dependency information as it considers them.
 The bandwidth cost for SAT solvers should be higher.
 
 
  With a reasonable representation, I think you can make it small enough.
 To give an idea, our index @ Enthought containing around 20k packages takes
 ~340 kb compressed w/ bz2 if you only keep the data required for dependency
 handling (name, version and runtime dependencies), and that's using json,
 an inefficient encoding, so I suspect encoding all of pypi may be only a few
 MB to fetch, which is generally faster than doing tens of http requests.
 
  The libsolv people worked on a binary representation that may also be
 worth looking at.
 
 
  P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy
 to talk more with you and/or Robert about minutiae that others may not care
 about.
 
 
  Sure, I would be happy to. As I mentioned before, we have some code
 around a SAT-based solver, but it is not ready yet, which is why we kept it
 private (https://github.com/enthought/sat-solver). It handles well (==
 both speed and quality-wise) the case where nothing is installed, but
 behaves poorly when packages are already installed, and does not handle the
 update case yet. The code is also very prototype-ish, but is not too
 complicated to experiment with.
 
  David
 
 
  ___
  Distutils-SIG maillist  -  Distutils-SIG@python.org
  https://mail.python.org/mailman/listinfo/distutils-sig
 

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Donald Stufft

 On May 16, 2015, at 1:24 PM, Nick Coghlan ncogh...@gmail.com wrote:
 
 On 17 May 2015 at 00:36, Justin Cappos jcap...@nyu.edu wrote:
 This only considers computation cost though.  Other factors can become more
 expensive than computation.  For example, SAT solvers need all the rules to
 consider.  So a SAT solution needs to effectively download the full
 dependency graph before starting.  A backtracking dependency resolver can
 just download packages or dependency information as it considers them.
 
 This is the defining consideration for pip at this point: a SAT solver
 requires publication of static dependency metadata on PyPI, which is
 dependent on both the Warehouse migration *and* the completion and
 acceptance of PEP 426. Propagation out to PyPI caching proxies and
 mirrors like devpi and the pulp-python plugin will then take even
 longer.
 
 A backtracking resolver doesn't have those gating dependencies, as it
 can tolerate the current dynamic metadata model.
 


Even when we have Warehouse and PEP 426, that only gives us that data going
forward, the 400k files that currently exist on PyPI still won’t have static
metadata. We could parse it out for Wheels but not for anything else. For the
foreseeable future any solution will need to be able to handle iteratively
finding constraints. Though I think a SAT solver can do it if it can handle
incremental solving or just by re-doing the SAT problem each time we discover
a new constraint.
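
(In outline, the "re-do the SAT problem each time" variant is just a loop;
a sketch only, with metadata fetching and clause encoding left as callbacks:)

    # Sketch of "re-solve whenever new constraints are discovered": solve with
    # what we know, download metadata for anything chosen that we haven't
    # inspected yet, turn it into clauses, and solve again until stable.
    def iterative_solve(initial_clauses, solve, fetch_new_clauses):
        """solve(clauses) -> candidate solution or None;
        fetch_new_clauses(solution) -> clauses learned from metadata of
        packages in `solution` not seen before (empty when fully explored)."""
        clauses = list(initial_clauses)
        while True:
            solution = solve(clauses)
            if solution is None:
                return None          # unsatisfiable with everything we know
            new = fetch_new_clauses(solution)
            if not new:
                return solution      # no unexplored metadata left: done
            clauses.extend(new)      # learned constraints; re-solve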


---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Paul Moore
On 16 May 2015 at 19:40, Chris Barker chris.bar...@noaa.gov wrote:
 With 2., you still have the issue of DLL hell,

 could you clarify a bit -- I thought that this could, at least, put a dir on
 the search path that was specific to that python context. So it would
 require cooperation among all the packages being used at once, but not get
 tangled up with the rest of the system. but maybe I'm wrong here -- I have
 no idea what the heck I'm doing with this!

Suppose Python adds C:\PythonXY\SharedDLLs to %PATH%. Suppose there's
a libpng.dll in there, for matplotlib. Everything works fine.

Then I install another non-Python application that uses libpng.dll,
and does so by putting libpng.dll alongside the executable (a common
way of making DLLs available with Windows applications). Also assume
that the application installer adds the application directory to the
*start* of PATH.

Now, Python extensions will use this 3rd party application's DLL
rather than the correct one. If it's ABI-incompatible, the Python
extension will crash. If it's ABI compatible, but behaves differently
(it could be a different version) there could be inconsistencies or
failures.

The problem is that while Python can add a DLL directory to PATH, it
cannot control what *else* is on PATH, or what has priority.

Paul
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 11:54 AM, Paul Moore p.f.mo...@gmail.com wrote:

  could you clarify a bit -- I thought that this could, at least, put a
 dir on
  the search path that was specific to that python context. So it would
  require cooperation among all the packages being used at once, but not
 get
  tangled up with the rest of the system. but maybe I'm wrong here -- I
 have
  no idea what the heck I'm doing with this!

 Suppose Python adds C:\PythonXY\SharedDLLs to %PATH%. Suppose there's
 a libpng.dll in there, for matplotlib.


I think we all agree that %PATH% is NOT the option! That is the key source
of dll hell on Windows.

I was referring to the SetDllDirectory API. I don't think that gets picked
up by other processes.

from:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686203%28v=vs.85%29.aspx

It looks like you can add a path, at run time, that gets searched for dlls
before the rest of the system locations. And this does not affect any other
applications. But you'd need to make sure this got run before any of the
affected packages were loaded -- which is probably what David meant by
needing to control the python binary.

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Sébastien Douche
On Fri, 15 May 2015, at 15:48, Donald Stufft wrote:
 Hey!

Hi Donald

 Ideally I hope people start to use ReadTheDocs instead of PyPI itself.

+1. Do you want to use the python.org domain (ex.
pypi.python.org/docs) or keep RTD on its own domain?


-- 
Sébastien Douche s...@nmeos.net
Twitter: @sdouche
http://douche.name
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Nick Coghlan
On 15 May 2015 at 04:01, Chris Barker chris.bar...@noaa.gov wrote:
  I'm confused -- you don't want a system to be able to install ONE version
  of a lib that various python packages can all link to? That's really the
  key use-case for me



 Are we talking about Python libraries accessed via Python APIs, or
 linking to external dependencies not written in Python (including
 linking directly to C libraries shipped with a Python library)?


 I, at least, am talking about the latter. for a concrete example: libpng,
 for instance, might be needed by PIL, wxPython, Matplotlib, and who knows
 what else. At this point, if you want to build a package of any of these,
 you need to statically link it into each of them, or distribute shared libs
 with each package -- if you are using them all together (which I do,
 anyway) you now have three copies of the same lib (but maybe different
 versions) all linked into your executable. Maybe there is no downside to
 that (I haven't had a problem yet), but it seems like a bad way to do it!

 It's the latter I consider to be out of scope for a language specific
 packaging system


 Maybe, but it's a problem to be solved, and the Linux distros more or less
 solve it for us, but OS-X and Windows have no such system built in (OS-X
 does have Brew and macports)

Windows 10 has Chocolatey and OneGet:

* https://chocolatey.org/
* 
http://blogs.msdn.com/b/garretts/archive/2015/01/27/oneget-and-the-windows-10-preview.aspx

conda and nix then fill the niche for language independent packaging
at the user level rather than the system level.

 - Python packaging dependencies are designed to
 describe inter-component dependencies based on the Python import
 system, not dependencies based on the operating system provided C/C++
 dynamic linking system.

  I think there is a bit of fuzz here -- cPython, at least, uses the
 operating system provided C/C++
 dynamic linking system -- it's not a totally independent thing.

I'm specifically referring to the *declaration* of dependencies here.
While CPython itself will use the dynamic linker to load extension
modules found via the import system, the loading of further
dynamically linked modules beyond that point is entirely opaque not
only to the interpreter runtime at module import time, but also to pip
at installation time.

 If folks are after the latter, than they want
 a language independent package system, like conda, nix, or the system
 package manager in a Linux distribution.

 And I am, indeed, focusing on conda lately for this reason -- but not all my
 users want to use a whole new system, they just want to pip install and
 have it work. And if you are using something like conda you don't need pip
 or wheels anyway!

Correct, just as if you're relying solely on Linux system packages,
you don't need pip or wheels. Aside from the fact that conda is
cross-platform, the main difference between the conda community and a
Linux distro is in the *kind* of software we're likely to have already
done the integration work for.

The key to understanding the difference in the respective roles of pip
and conda is realising that there are *two* basic distribution
scenarios that we want to be able to cover (I go into this in more
detail in 
https://www.python.org/dev/peps/pep-0426/#development-distribution-and-deployment-of-python-software):

* software developer/publisher - software integrator/service operator
(or data analyst)
* software developer/publisher - software integrator - service
operator (or data analyst)

Note the second line has 3 groups and 2 distribution arrows, while the
first line only has the 2 groups and a single distribution step.

pip and the other Python specific tools cover that initial
developer/publisher - integrator link for Python projects. This means
that Python developers only need to learn a single publishing
toolchain (the PyPA tooling) to get started, and they'll be able to
publish their software in a format that any integrator that supports
Python can consume (whether that's for direct consumption in a DIY
integration scenario, or to put through a redistributor's integration
processes).

On the consumption side, though, the nature of the PyPA tooling as a
platform-independent software publication toolchain means that if you
want to consume the PyPA formats directly, you need to be prepared to
do your own integration work. Many public web service developers are
entirely happy with that deal, but most system administrators and data
analysts trying to deal with components written in multiple
programming languages aren't.

That latter link, where the person or organisation handling the
software integration task is distinct from the person or organisation
running an operational service, or carrying out some data analysis,
are where the language independent redistributor tools like
Chocolatey, Nix, deb, rpm, conda, Docker, etc all come in - they let a
redistributor handle the integration task (or at least 

Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Donald Stufft

 On May 16, 2015, at 3:04 PM, David Mertz dme...@continuum.io wrote:
 
 I've just started monitoring this SIG to get a sense of the issues and status 
 of things.  I've also just started working for Continuum Analytics.
 
 Continuum has a great desire to make 'pip' work with conda packages.  
 Obviously, we love for users to choose the Anaconda Python distribution but 
 many will not for a variety of reasons (many good reasons).
 
 However, we would like for users of other distros still to be able to benefit 
 from our creation of binary packages for many platforms in the conda format.  
 As has been discussed in recent threads on dependency solving, the way conda 
 provides metadata apart from entire packages makes much of that work easier.  
 But even aside from that, there are simply a large number of well-tested 
 packages (not only for Python, it is true, so that's possibly a wrinkle in 
 the task) we have generated in conda format.
 
 It is true that right now, a user can in principle type:
 
   % pip install conda
   % conda install some_conda_package
 
 But that creates two separate systems for tracking what's installed and what 
 dependencies are resolved; and many users will not want to convert completely 
 to conda after that step.
 
 What would be better as a user experience would be to let users do this:
 
   % pip install --upgrade pip
   % pip install some_conda_package
 
 Whether that second command ultimately downloads code from pypi.python.org 
 or from repo.continuum.io is probably less important from a user experience 
 perspective.  Continuum is very happy to upload all of our conda packages to 
 PyPI if this would improve this user experience.  Obviously, the idea here 
 would be that the user would be able to type 'pip list' and friends 
 afterward, and have knowledge of what was installed, even as conda packages.
 
 I'm hoping members of the SIG can help me understand both the technical and 
 social obstacles that need to be overcome before this can happen.



As Paul mentioned, I’m not sure I see a major benefit to being able to ``pip 
install`` a conda package that doesn’t come with a lot of footguns, since any 
conda package either won’t be able to depend on things like Python or random C 
libraries or we’re going to have to just ignore those dependencies or what have 
you. I think a far more workable solution is one that translates a conda 
package to a Wheel.

Practically speaking the only real benefit that conda packages have over pip is 
the one benefit that simply teaching pip to install conda packages won’t 
provide - Namely that it supports things which aren’t Python packages. However 
I don’t think it’s likely that we’re going to be able to install R or erlang or 
whatever into a virtual environment (for instance), but maybe I’m wrong. There 
are a few other benefits, but that’s not anything that are inherent in the two 
different approaches, it’s just things that conda has that pip is planning on 
getting, it just hasn’t gotten them yet because either we have to convince 
people to publish our new formats (e.g. we can’t go out and create a wheel repo 
of common packages) or because we haven’t gotten to it yet because dealing with 
the crushing legacy of PyPI's ~400k packages is a significant slowdown factor.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Nick Coghlan
On 16 May 2015 at 11:52, Robert Collins robe...@robertcollins.net wrote:
 On 16 May 2015 at 13:45, Donald Stufft don...@stufft.io wrote:

 On May 15, 2015, at 9:22 PM, Robert Collins robe...@robertcollins.net 
 wrote:

 On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
 Why not start with pip at least being a simple fail-on-conflict resolver
 (vs the 1st found wins resolver it is now)...

 You'd backtrack for the sake of re-walking when new constraints are 
 found,
 but not for the purpose of solving conflicts.

 I know you're motivated to solve Openstack build issues, but many of the
 issues I've seen in the pip tracker, I think would be solved without the
 backtracking resolver you're trying to build.

  Well, I'm scratching the itch I have. If it's too hard to get something
 decent, sure I might back off in my goals, but I see no point aiming
 for something less than all the other language specific packaging
 systems out there have.


 So what makes the other language specific packaging systems different? As far
 as I know all of them have complete archives (e.g. they are like PyPI where 
 they
 have a lot of versions, not like Linux Distros). What can we learn from how 
 they
 solved this?

 NB; I have by no means finished low hanging heuristics and space
 trimming stuff :). I have some simple things in mind and am sure I'll
 end up with something 'good enough' for day to day use. The thing I'm
 worried about is the long term health of the approach.

Longer term, I think it makes sense to have the notion of active and
obsolete versions baked into PyPI's API and the web UI. This
wouldn't be baked into the package metadata itself (unlike the
proposed Obsoleted-By field for project renaming), but rather be a
dynamic reflection of whether or not *new* users should be looking at
the affected version, and whether or not it should be considered as a
candidate for dependency resolution when not specifically requested.
(This could also replace the current hidden versions feature, which
only hides things from the web UI, without having any impact on the
information published to automated tools through the programmatic API)

Tools that list outdated packages could also be simplified a bit, as
their first pass could just be to check the obsolescence markers on
installed packages, with the second pass being to check for newer
versions of those packages.

While the bare minimum would be to let project mantainers set the
obsolescence flag directly, we could also potentially offer projects
some automated obsolescence schemes, such as:

* single active released version, anything older is marked as obsolete
whenever a new (non pre-release) version is uploaded
* semantic versioning, with a given maximum number of active released
X versions (e.g. 2), but only the most recent (according to PEP 440)
released version with a given X.* is active, everything else is
obsolete
* CPython-style and date-based versioning, with a given maximum number
of active released X.Y versions (e.g. 2), but only the most recent
(according to PEP 440) released version with a given X.Y.* is active,
everything else is obsolete

Pre-release versions could also be automatically flagged as obsolete
by PyPI as soon as a newer version for the same release (including the
final release itself) was uploaded for the given package.
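
(As a sanity check on the second scheme above: with "at most 2 active major
versions, newest release per major line", the active set is easy to compute
from PEP 440 ordering -- a sketch using the `packaging` library:)

    # Sketch of the "semantic versioning" obsolescence scheme: keep only the
    # newest release in each of the 2 highest major-version lines; everything
    # else (and, separately, superseded pre-releases) is flagged obsolete.
    from collections import defaultdict
    from packaging.version import Version

    def active_versions(all_versions, max_major=2):
        by_major = defaultdict(list)
        for v in map(Version, all_versions):
            if not v.is_prerelease:              # pre-releases handled separately
                by_major[v.release[0]].append(v)
        majors = sorted(by_major)[-max_major:]   # the N most recent major lines
        return {str(max(by_major[m])) for m in majors}

    print(active_versions(["1.0", "1.1", "1.2", "2.0", "2.1", "3.0a1"]))
    # -> {'1.2', '2.1'}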

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Nick Coghlan
On 17 May 2015 06:19, Chris Barker chris.bar...@noaa.gov wrote:
 indeed -- but it does have a bunch of python-specific features... it was
built around the need to combine python with other systems.

 That makes it an interesting alternative to pip on the package
 *consumption* side for data analysts, but it isn't currently a good
 fit for any of pip's other use cases (e.g. one of the scenarios I'm
 personally most interested in is that pip is now part of the
 Fedora/RHEL/CentOS build pipeline for Python based RPM packages - we
 universally recommend using pip install in the %install phase over
 using setup.py install directly)


 hmm -- conda generally uses setup.py install in its build scripts. And
it doesn't use pip install because it wants to handle the downloading and
dependencies itself (in fact, turning OFF setuptools dependency handling is
an annoyance..)

 So I'm not sure why pip is needed here -- would it be THAT much harder to
build rpms of python packages if it didn't exist? (I do see why you
wouldn't want to use conda to build rpms..)

We switched to recommending pip to ensure that the Fedora (et al) build
toolchain can be updated to emit and handle newer Python metadata standards
just by upgrading pip. For example, it means that system installed packages
on modern Fedora installations should (at least in theory) provide full PEP
376 installation metadata with the installer reported as the system package
manager.

The conda folks (wastefully, in my view) are still attempting to compete
directly with pip upstream, instead of delegating to it from their build
scripts as an abstraction layer that helps hide the complexity of the
Python packaging ecosystem.

 But while _maybe_ if conda had been around 5 years earlier we could have
not bothered with wheel,

No, we couldn't, as conda doesn't work as well for system integrators.

 I'm not proposing that we drop it -- just that we push pip and wheel a
bit farther to broaden the supported user-base.

I can't stop you working on something I consider a deep rabbithole, but why
not just recommend the use of conda, and only publish sdists on PyPI? conda
needs more users and contributors seeking better integration with the PyPA
tooling, and minimising the non-productive competition.

The web development folks targeting Linux will generally be in a position
to build from source (caching the resulting wheel file, or perhaps an
entire container image).

Also, assuming Fedora's experiment with language specific repos goes well (
https://fedoraproject.org/wiki/Env_and_Stacks/Projects/LanguageSpecificRepositories),
we may see other distros replicating that model of handling the wheel
creation task on behalf of their users.

It's also worth noting that one of my key intended use cases for metadata
extensions is to publish platform specific external dependencies in the
upstream project metadata, which would get us one step closer to fully
automated repackaging into policy compliant redistributor packages.

 Binary wheels already work for Python packages that have been
 developed with cross-platform maintainability and deployability taken
 into account as key design considerations (including pure Python
 wheels, where the binary format just serves as an installation
 accelerator). That category just happens to exclude almost all
 research and data analysis software, because it excludes the libraries
 at the bottom of that stack


 It doesn't quite exclude those -- just makes it harder. And while
depending on Fortran, etc, is pretty unique to the data analysis stack,
stuff like libpng, libcurl, etc, etc, isn't -- non-system libs are not a
rare thing.

The rare thing is having two packages which are tightly coupled to the ABI
of a given external dependency. That's a generally bad idea because it
causes exactly these kinds of problems with independent distribution of
prebuilt components.

The existence of tight ABI coupling between components both gives the
scientific Python stack a lot of its power, *and* makes it almost as hard
to distribute in binary form as native GUI applications.

 It's also the case that when you *are* doing your own system
 integration, wheels are a powerful tool for caching builds,

 conda does this nicely as well  :-) I'm not trying to argue, at all,
that binary wheels are useless, just that they could be a bit more useful.

A PEP 426 metadata extension proposal for describing external binary
dependencies would certainly be a welcome addition. That's going to be a
common need for automated repackaging tools, even if we never find a
practical way to take advantage of it upstream.
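
To make that concrete, here is a purely hypothetical sketch of what such an
extension might carry in the PEP 426 JSON metadata, written as a Python dict;
the extension name "external_dependencies" and its fields are invented for
illustration and are not part of any accepted spec:

# Hypothetical sketch only: neither the extension name nor its keys are
# standardised anywhere; it just illustrates the kind of information an
# automated repackaging tool would need.
metadata_fragment = {
    "metadata_version": "2.0",
    "name": "matplotlib",
    "version": "1.4.3",
    "extensions": {
        "external_dependencies": {
            # external library names a redistributor could map onto
            # distro-specific package names (rpm, deb, conda, ...)
            "libraries": ["libpng >= 1.6"],
        }
    },
}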

  Ah -- here is a key point -- because of that, we DO support binary
packages

  on PyPi -- but only for Windows and OS X. I'm just suggesting we find
a way
  to extend that to packages that require a non-system non-python
dependency.

 At the point you're managing arbitrary external binary dependencies,
 you've lost all the constraints that let us get away with doing 

Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread David Cournapeau
On Sat, May 16, 2015 at 4:56 AM, Chris Barker chris.bar...@noaa.gov wrote:

 On Fri, May 15, 2015 at 1:49 AM, Paul Moore p.f.mo...@gmail.com wrote:

 On 14 May 2015 at 19:01, Chris Barker chris.bar...@noaa.gov wrote:
  Ah -- here is the issue -- but I think we HAVE pretty much got what we
 need
  here -- at least for Windows and OS-X. It depends what you mean by
  curated, but it seems we have a (de facto?) policy for PyPi: binary
 wheels
  should be compatible with the python.org builds. So while each package
 wheel
  is supplied by the package maintainer one way or another, rather than
 by a
  central entity, it is more or less curated -- or at least standardized.
 And
  if you are going to put a binary wheel up, you need to make sure it
 matches
  -- and that is less than trivial for packages that require a third party
  dependency -- but building the lib statically and then linking it in is
 not
  inherently easier than doing a dynamic link.

 I think the issue is that, if we have 5 different packages that depend
 on (say) libpng, and we're using dynamic builds, then how do those
 packages declare that they need access to libpng.dll?


 this is the missing link -- it is a binary build dependency, not a package
  dependency -- so not so much that matplotlib-1.4.3 depends on libpng.x.y,
 but that:



 matplotlib-1.4.3-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl

 depends on:

 libpng-x.y

 (all those binary parts will come from the platform)

 That's what's missing now.

 And on Windows,
 where does the user put libpng.dll so that it gets picked up?


 Well, here is the rub -- Windows dll hell really is hell -- but I think
 that if the DLL goes into the python dll search path (sorry, not on a
 Windows box where I can really check this out right now), it can work -- I
 know we have an in-house product that has multiple python modules sharing a
 single dll somehow



 And how
 does a non-expert user do this (put it in $DIRECTORY, update your
 PATH, blah blah blah doesn't work for the average user)?


 That's why we may need to update the tooling to handle this -- I'm not
 totally sure if the current wheel format can support this on Windows --
 though it can on OS-X.

 In particular, on Windows, note that the shared DLL must either be in
 the directory where the executable is located (which is fun when you
 have virtualenvs, embedded interpreters, etc), or on PATH (which has
 other implications - suppose I have an incompatible version of
 libpng.dll, from mingw, say, somewhere earlier on PATH).


 that would be dll hell, yes.


 The problem isn't so much defining a standard ABI that shared DLLs
 need - as you say, that's a more or less solved problem on Windows -
 it's managing how those shared DLLs are made available to Python
 extensions. And *that* is what Unix package managers do for you, and
 Windows doesn't have a good solution for (other than bundle all the
 dependent DLLs with the app, or suffer DLL hell).


 exactly -- but if we consider the python install to be the app, rather
 than an individual python bundle, then we _may_ be OK.

 PS For a fun exercise, it might be interesting to try breaking conda -


 Windows really is simply broken [1] in this regard -- so I'm quite sure
 you could break conda -- but it does seem to do a pretty good job of not
 being broken easily by common uses -- I can't say I know enough about
 Windows dll finding or conda to know how...

 Oh, and conda is actually broken in this regard on OS-X at this point --
 if you compile your own extension in an anaconda environment, it will find
 a shared lib at compile time that it won't find at run time. The conda
 install process fixes this, but that's a pain during development --
 i.e. you don't want to have to actually install the package with conda to
 run a test each time you rebuild the dll (or even change a bit of python
 code...)

 But in short -- I'm pretty sure there is a way, on all systems, to have a
 standard way to build extension modules, combined with a standard way to
 install shared libs, so that a lib can be shared among multiple packages.
 So the question remains:


There is actually no way to do that on windows without modifying the
interpreter somehow. This was discussed a bit at PyCon when talking
about windows packaging:

 1. the simple way to share DLLs across extensions is to put them in the
%PATH%, but that's horrible.
 2. there are ways to put DLLs in a shared directory *not* in the %PATH%
since at least windows XP SP2 and above, through the SetDllDirectory API.

With 2., you still have the issue of DLL hell, which may be resolved
through naming and activation contexts. I had a brief chat with Steve where
he mentioned that this may be a solution, but he was not 100 % sure IIRC.
The main drawback of this solution is that it won't work when inheriting
virtual environments (as you can only set a single directory).

FWIW, we are about to deploy 2. @ Enthought (where we control the python
interpreter, so it is much easier for us).
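
For illustration, a rough sketch of option 2 above driven from Python via
ctypes rather than an interpreter change; the "shared-dlls" directory is a
hypothetical location, not an agreed standard, and this inherits the
single-directory limitation just mentioned:

import os
import sys
import ctypes

def add_shared_dll_dir():
    # Hypothetical per-installation directory for shared DLLs.
    shared = os.path.join(sys.prefix, "shared-dlls")
    if os.name == "nt" and os.path.isdir(shared):
        # SetDllDirectoryW adds exactly one directory to this process's
        # DLL search path -- hence the virtualenv-inheritance problem.
        ctypes.windll.kernel32.SetDllDirectoryW(shared)

add_shared_dll_dir()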

Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Nick Coghlan
On 16 May 2015 at 04:34, Donald Stufft don...@stufft.io wrote:
 So I can’t speak for ReadTheDocs, but I believe that they are considering
 and/or are planning on offering arbitrary HTML uploads similarly to how
 you can upload documentation to PyPI. I don’t know if this will actually
 happen and what it would look like but I know they are thinking about it.

I've never tried it with ReadTheDocs, but in theory the ".. raw:: html"
docutils directive allows arbitrary HTML content to be embedded in a
reStructuredText page.
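
A quick way to check that claim with plain docutils (nothing
ReadTheDocs-specific; the snippet content is just an example):

# Render a reStructuredText snippet containing a ".. raw:: html" directive
# and confirm the HTML/JS passes straight through to the rendered output.
from docutils.core import publish_string

rst = """
Some ordinary reStructuredText.

.. raw:: html

   <script>console.log("arbitrary HTML ends up in the output")</script>
"""

html = publish_string(rst, writer_name="html")
print(b"console.log" in html)  # True: the raw HTML is embedded verbatim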

Regardless, "Can ReadTheDocs do X?" questions are better asked on
https://groups.google.com/forum/#!forum/read-the-docs, while both
GitHub and Atlassian (via BitBucket) offer free static HTML hosting.

In relation to the original question, +1 for attempting to phase out
PyPI's documentation hosting capability in favour of delegating to
RTFD or third party static HTML hosting. One possible option to
explore that minimises disruption for existing users might be to stop
offering it to *new* projects, while allowing existing projects to
continue uploading new versions of their documentation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Wes Turner
On May 16, 2015 4:55 AM, Nick Coghlan ncogh...@gmail.com wrote:

 On 16 May 2015 at 04:34, Donald Stufft don...@stufft.io wrote:
  So I can’t speak for ReadTheDocs, but I believe that they are
considering
  and/or are planning on offering arbitrary HTML uploads similarly to how
  you can upload documentation to PyPI. I don’t know if this will actually
  happen and what it would look like but I know they are thinking about
it.

 I've never tried it with ReadTheDocs, but in theory the .. raw::
 html docutils directive allows arbitrary HTML content to be embedded
 in a reStructuredText page.

 Regardless, Can ReadTheDocs do X? questions are better asked on
 https://groups.google.com/forum/#!forum/read-the-docs,

ReadTheDocs is hiring!

https://blog.readthedocs.com/read-the-docs-is-hiring/

 while both
 GitHub and Atlassian (via BitBucket) offer free static HTML hosting.

I just wrote a tool (pypi:pgs) for serving files over HTTP directly from
gh-pages branches that works in conjunction with pypi:ghp-import (
gh-pages; touch .nojekyll).

CloudFront DNS can sort of be used to add TLS/SSL to custom domains with
GitHub Pages (and probably BitBucket)


 In relation to the original question, +1 for attempting to phase out
 PyPI's documentation hosting capability in favour of delegating to
 RTFD or third party static HTML hosting. One possible option to
 explore that minimises disruption for existing users might be to stop
 offering it to *new* projects, while allowing existing projects to
 continue uploading new versions of their documentation.

- [ ] DOC: migration / alternatives guide


 Cheers,
 Nick.

 --
 Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Paul Moore
On 16 May 2015 at 07:35, David Cournapeau courn...@gmail.com wrote:
 But in short -- I'm pretty sure there is a way, on all systems, to have a
 standard way to build extension modules, combined with a standard way to
 install shared libs, so that a lib can be shared among multiple packages. So
 the question remains:

 There is actually no way to do that on windows without modifying the
 interpreter somehow. This was somehow discussed a bit at PyCon when talking
 about windows packaging:

  1. the simple way to share DLLs across extensions is to put them in the
 %PATH%, but that's horrible.
  2. there are ways to put DLLs in a shared directory *not* in the %PATH%
 since at least windows XP SP2 and above, through the SetDllDirectory API.

 With 2., you still have the issue of DLL hell, which may be resolved through
 naming and activation contexts. I had a brief chat with Steve where he
 mentioned that this may be a solution, but he was not 100 % sure IIRC. The
 main drawback of this solution is that it won't work when inheriting virtual
 environments (as you can only set a single directory).

 FWIW, we are about to deploy 2. @ Enthought (where we control the python
 interpreter, so it is much easier for us).

This is indeed precisely the issue. In general, Python code can run
with the executable being in many different places - there are the
standard installs, virtualenvs, and embedding scenarios to consider.
So putting DLLs alongside the executable, which is often how Windows
applications deal with this issue, is not a valid option (that's an
option David missed out above, but that's fine as it doesn't work :-))

Putting DLLs on %PATH% *does* cause problems, and pretty severe ones.
People who use ports of Unix tools, such as myself, hit this a lot -
at one point I got so frustrated with various incompatible versions of
libintl showing up on my PATH, all with the same name, that I went on
a spree of rebuilding all of the GNU tools without libintl support,
just to avoid the issue (and older versions of openssl were just as bad
with libeay, etc).

So, as David says, you pretty much have to use SetDllDirectory and
similar features to get a viable location for shared DLLs. I guess it
*may* be possible to call those APIs from a Python extension that you
load *before* using any shared DLLs, but that seems like a very
fragile solution. It's also possible for Python 3.6+ to add a new
shared DLLs location for such things, which the core interpreter
includes (either via SetDllDirectory or by the same mechanism that
adds C:\PythonXY\DLLs to the search path at the moment). But that
wouldn't help older versions.

So while I encourage Chris' enthusiasm in looking for a solution to
this issue, I'm not sure it's as easy as he's hoping.

Paul
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Donald Stufft
Ok, so unless someone comes out against this in the near future here are my
plans:

1. Implement the ability to delete documentation.

2. Implement the ability to add a (simple) redirect where we would essentially
   just send /project/(.*) to $REDIRECT_BASE/$1 (a rough sketch of what that
   would look like follows this list).

3. Implement the ability to point the documentation URL to something that isn't
   pythonhosted.org

4. Send an email out to all projects that are currently utilizing the hosted
   documentation telling them that it is going away, and giving them links to
   RTD and GithubPages and whatever bitbucket calls their service.

5. Disable Documentation Uploads to PyPI with an error message that tells
   people the service has been discontinued.
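
For item 2, a minimal sketch of the intended behaviour (Flask is used purely
for illustration -- this is not how PyPI would actually implement it, and the
per-project redirect base is a stand-in for whatever setting project owners
would configure):

# Illustrative only; REDIRECT_BASES stands in for a per-project setting.
from flask import Flask, redirect

app = Flask(__name__)

REDIRECT_BASES = {
    "example-project": "https://example-project.readthedocs.org",  # hypothetical
}

@app.route("/<project>/", defaults={"rest": ""})
@app.route("/<project>/<path:rest>")
def docs_redirect(project, rest):
    base = REDIRECT_BASES.get(project)
    if base is None:
        return ("No documentation redirect configured for this project", 404)
    # /project/(.*)  ->  $REDIRECT_BASE/$1
    return redirect("{0}/{1}".format(base, rest), code=302)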


In addition to the above steps, we'll maintain any documentation that doesn't
get deleted (and the above redirects) indefinitely. Serving static read only
documentation (other than deletes) is something that we can do without much
trouble or cost.

I think that this will cover all of the things that people in this thread have
brought up as well as providing a sane migration path to go from
pythonhosted.org documentation to wherever they choose to place their docs in
the future.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 4:16 PM, Donald Stufft don...@stufft.io wrote:

 On Sat, May 16, 2015 at 3:03 PM, Donald Stufft don...@stufft.io wrote:

 There are a few other benefits, but that’s not anything that are inherent
 in the two different approaches, it’s just things that conda has that pip
 is planning on getting,


 Huh? I'm confused -- didn't we just have a big thread about how pip+wheel
 probably ISN'T going to handle shared libs -- that those are exactly what
 conda packages do provide -- aside from R and Erlange, anyway :-)

 but it's not the packages in this case that we need -- it's the
 environment -- and I can't see how pip is going to provide a conda
 environment….


 I never said pip was going to provide an environment, I said the main
 benefit conda has over pip, which pip will most likely not get in any
 reasonable time frame, is that it handles things which are not Python
 packages.


well, I got a bit distracted by Erlang and R -- i.e. things that have
nothing to do with python packages.

libxml, on the other hand, is a lib that one might want to use with a
python package -- so a bit more apropos here.

But my confusion was about: things that conda has that pip is planning on
getting -- what are those things? Any of the stuff that conda has that's
really useful, like handling shared libs, pip is NOT getting -- yes?


 A shared library is not a Python package so I’m not sure what this message
 is even saying? ``pip install lxml-from-conda`` is just going to flat out
 break because pip won’t install the libxml2 shared library.


exactly -- if you're going to install a shared lib, you need somewhere to
put it -- and that's what a conda environment provides.

Trying not to go around in circles, but python _could_ provide a standard
place in which to put shared libs -- and then pip _could_ provide a way to
manage them. That would require dealing with that whole binary API problem,
so we probably won't do it. I'm not sure what the point of contention is
here:

I think it would be useful to have a way to manage shared libs solely for
python packages to use -- and it would be useful for that to be part of
the standard python ecosystem. Others may not think it would be useful
enough to be worth the pain in the neck it would be.

And that's what the nifty conda packages that Continuum (and others) have
built could provide -- shared libs that are built in a compatible way with
a given python binary. After all, pure python packages are no problem, and
compiled python packages without any dependencies are little problem. The
hard part is those darn third party libs.

conda also provides a way to manage all sorts of other stuff that has
nothing to do with python, but I'm guessing that's not what continuum
would like to contribute to pypi

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR        (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Donald Stufft

 On May 16, 2015, at 8:50 PM, Chris Barker chris.bar...@noaa.gov wrote:
 
 On Sat, May 16, 2015 at 4:16 PM, Donald Stufft don...@stufft.io 
 mailto:don...@stufft.io wrote:
 On Sat, May 16, 2015 at 3:03 PM, Donald Stufft don...@stufft.io 
 mailto:don...@stufft.io wrote:
 There are a few other benefits, but that’s not anything that are inherent in 
 the two different approaches, it’s just things that conda has that pip is 
 planning on getting,
 
 Huh? I'm confused -- didn't we just have a big thread about how pip+wheel 
 probably ISN'T going to handle shared libs -- that those are exactly what 
 conda packages do provide -- aside from R and Erlange, anyway :-)
 
 but it's not the packages in this case that we need -- it's the environment 
 -- and I can't see how pip is going to provide a conda environment….
 
 I never said pip was going to provide an environment, I said the main benefit 
 conda has over pip, which pip will most likely not get in any reasonable time 
 frame, is that it handles things which are not Python packages.
 
 well, I got a bit distraced by Erlang and R -- i.e. things that have nothing 
 to do with python packages.
 
 libxml, on the other hand, is a lib that one might want to use with a python 
 package -- so a bit more apropos here.
 
 But my confusion was about: things that conda has that pip is planning on 
 getting -- what are those things? Any of the stuff that conda has that 
 really useful like handling shared libs, pip is NOT getting -- yes?


The ability to resolve dependencies with static metadata is the major one that 
comes to my mind that’s specific to pip. The ability to have better build 
systems besides distutils/setuptools is a more ecosystem level one but that’s 
something we’ll get too.

As far as shared libs… beyond what’s already possible (sticking a shared lib
inside of a python project and having libraries load that .dll explicitly) it’s
not currently on the road map and may never be. I hesitate to say never because
it’s obviously a problem that needs to be solved, and if the Python ecosystem
solves it (specific to shared libraries, not whole runtimes or other languages
or what have you) then that would be a useful thing. I think we have lower
hanging fruit that we need to deal with before something like that can even
possibly be on the radar though (if we ever put it on the radar).
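
(For reference, a rough sketch of that "already possible" approach, as it
might appear in a package's __init__.py; the package layout and the
"libfoo.dll" name are hypothetical:)

import os
import ctypes

_pkg_dir = os.path.dirname(os.path.abspath(__file__))

# Loading the bundled DLL by absolute path sidesteps the normal Windows DLL
# search order, so nothing on %PATH% can shadow the vendored copy.
_libfoo = ctypes.CDLL(os.path.join(_pkg_dir, "libfoo.dll"))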

 
 A shared library is not a Python package so I’m not sure what this message is 
 even saying? ``pip install lxml-from-conda`` is just going to flat out break 
 because pip won’t install the libxml2 shared library.
 
 exactly -- if you're going to install a shared lib, you need somewhere to put 
 it -- and that's what a conda environment provides.
 
 Trying not to go around in circles, but python _could_ provide a standard 
 place in which to put shared libs -- and then pip _could_ provide a way to 
 manage them. That would require dealing with that whole binary API problem, 
 so we probably won't do it. I'm not sure what the point of contention is here:
 
 I think it would be useful to have a way to manage shared libs solely for 
 python packages to use -- and it would be useful to that way to be part of 
 the standard python ecosytem. Others may not think it would be useful enough 
 to be worth the pain in the neck it would be.
 
 And that's what the nifty conda packages continuum (and others) have built 
 could provide -- those shared libs that are built in a  compatible way with a 
 python binary. After all, pure python packages are no problem, compiled 
 python packages without any dependencies are little problem. The hard part is 
 those darn third party libs.
 
 conda also provides a way to mange all sorts of other stuff that has nothing 
 to do with python, but I'm guessing  that's not what continuum would like to 
 contribute to pypi….

I guess I’m confused about what the benefit of making pip able to install a conda
package would be. If Python adds someplace for shared libs to go then we could 
just add shared lib support to Wheels, it’s just another file type so that’s 
not a big deal. The hardest part is dealing with ABI compatibility. However, 
given the current state of things, what’s the benefit of being able to do ``pip 
install conda-lxml``? Either it’s going to flat out break or you’re going to 
have to do ``conda install libxml2`` first, and if you’re doing ``conda install 
libxml2`` first then why not just do ``conda install lxml``?

I view conda the same way I view apt-get, yum, Chocolatey, etc. It provides an
environment and you can install a Python package into that environment, but
pip shouldn’t know how to install a .deb or a .rpm or a conda package
because those packages rely on specifics of that environment in a way that
Python packages can’t.


---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



___
Distutils-SIG maillist 

Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Ben Finney
Donald Stufft don...@stufft.io writes:

 Ok, so unless someone comes out against this in the near future here are my
 plans:

 1. Implement the ability to delete documentation.

+1.

 2. Implement the ability to add a (simple) redirect where we would
 essentially just send /project/(.*) to $REDIRECT_BASE/$1.

 3. Implement the ability to point the documentation URL to something
 that isn't pythonhosted.org

Both of these turn PyPI into a vector for arbitrary content, including
(for example) illegal, misleading, or malicious content.

Automatic redirects actively expose the visitor to any malicious or
mistaken links set by the project owner.

If you want to allow the documentation to be at some arbitrary location
of the project owner's choice, then an explicit static link, which the
visitor must click on (similar to the project home page link) is best.

-- 
 \  “I find the whole business of religion profoundly interesting. |
  `\ But it does mystify me that otherwise intelligent people take |
_o__)it seriously.” —Douglas Adams |
Ben Finney

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Donald Stufft

 On May 16, 2015, at 9:31 PM, Ben Finney ben+pyt...@benfinney.id.au wrote:
 
 Donald Stufft don...@stufft.io writes:
 
 Ok, so unless someone comes out against this in the near future here are my
 plans:
 
 1. Implement the ability to delete documentation.
 
 +1.
 
 2. Implement the ability to add a (simple) redirect where we would
 essentially just send /project/(.*) to $REDIRECT_BASE/$1.
 
 3. Implement the ability to point the documentation URL to something
 that isn't pythonhosted.org
 
 Both of these turn PyPI into a vector for arbitrary content, including
 (for example) illegal, misleading, or malicious content.
 
 Automatic redirects actively expose the visitor to any malicious or
 mistaken links set by the project owner.
 
 If you want to allow the documentation to be at some arbitrary location
 of the project owner's choice, then an explicit static link, which the
 visitor must click on (similar to the project home page link) is best.
 

To be clear, the documentation isn’t hosted on PyPI, it’s hosted on
pythonhosted.org and we already allow people to upload arbitrary content to
that domain, which can include JS based redirects.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Donald Stufft

 On May 16, 2015, at 7:09 PM, Chris Barker chris.bar...@noaa.gov wrote:
 
 On Sat, May 16, 2015 at 3:03 PM, Donald Stufft don...@stufft.io 
 mailto:don...@stufft.io wrote:
 There are a few other benefits, but that’s not anything that are inherent in 
 the two different approaches, it’s just things that conda has that pip is 
 planning on getting,
 
 Huh? I'm confused -- didn't we just have a big thread about how pip+wheel 
 probably ISN'T going to handle shared libs -- that those are exactly what 
 conda packages do provide -- aside from R and Erlange, anyway :-)
 
 but it's not the packages in this case that we need -- it's the environment 
 -- and I can't see how pip is going to provide a conda environment….


I never said pip was going to provide an environment, I said the main benefit 
conda has over pip, which pip will most likely not get in any reasonable time 
frame, is that it handles things which are not Python packages. A shared 
library is not a Python package so I’m not sure what this message is even 
saying? ``pip install lxml-from-conda`` is just going to flat out break because 
pip won’t install the libxml2 shared library.


---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 3:03 PM, Donald Stufft don...@stufft.io wrote:

 There are a few other benefits, but that’s not anything that are inherent
 in the two different approaches, it’s just things that conda has that pip
 is planning on getting,


Huh? I'm confused -- didn't we just have a big thread about how pip+wheel
probably ISN'T going to handle shared libs -- that those are exactly what
conda packages do provide -- aside from R and Erlang, anyway :-)

but it's not the packages in this case that we need -- it's the environment
-- and I can't see how pip is going to provide a conda environment

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR        (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig