Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Nick Coghlan
On 17 May 2015 06:19, "Chris Barker"  wrote:
> indeed -- but it does have a bunch of python-specific featuresit was
built around the need to combine python with other systems.
>
>> That makes it an interesting alternative to pip on the package
>> *consumption* side for data analysts, but it isn't currently a good
>> fit for any of pip's other use cases (e.g. one of the scenarios I'm
>> personally most interested in is that pip is now part of the
>> Fedora/RHEL/CentOS build pipeline for Python based RPM packages - we
>> universally recommend using "pip install" in the %install phase over
>> using "setup.py install" directly)
>
>
> hmm -- conda generally uses "setup.py install" in its build scripts. And
it doesn't use pip install because it wants to handle the downloading and
dependencies itself (in fact, turning OFF setuptools dependency handling is
an annoyance..)
>
> So I'm not sure why pip is needed here -- would it be THAT much harder to
build rpms of python packages if it didn't exist? (I do see why you
wouldn't want to use conda to build rpms..)

We switched to recommending pip to ensure that the Fedora (et al) build
toolchain can be updated to emit & handle newer Python metadata standards
just by upgrading pip. For example, it means that system installed packages
on modern Fedora installations should (at least in theory) provide full PEP
376 installation metadata with the installer reported as the system package
manager.

The conda folks (wastefully, in my view) are still attempting to compete
directly with pip upstream, instead of delegating to it from their build
scripts as an abstraction layer that helps hide the complexity of the
Python packaging ecosystem.

> But while _maybe_ if conda had been around 5 years earlier we could have
not bothered with wheel,

No, we couldn't, as conda doesn't work as well for system integrators.

> I'm not proposing that we drop it -- just that we push pip and wheel a
bit farther to broaden the supported user-base.

I can't stop you working on something I consider a deep rabbithole, but why
not just recommend the use of conda, and only pubish sdists on PyPI? conda
needs more users and contributors seeking better integration with the PyPA
tooling, and minimising the non-productive competition.

The web development folks targeting Linux will generally be in a position
to build from source (caching the resulting wheel file, or perhaps an
entire container image).

Also, assuming Fedora's experiment with language specific repos goes well (
https://fedoraproject.org/wiki/Env_and_Stacks/Projects/LanguageSpecificRepositories),
we may see other distros replicating that model of handling the wheel
creation task on behalf of their users.

It's also worth noting that one of my key intended use cases for metadata
extensions is to publish platform specific external dependencies in the
upstream project metadata, which would get us one step closer to fully
automated repackaging into policy compliant redistributor packages.

>> Binary wheels already work for Python packages that have been
>> developed with cross-platform maintainability and deployability taken
>> into account as key design considerations (including pure Python
>> wheels, where the binary format just serves as an installation
>> accelerator). That category just happens to exclude almost all
>> research and data analysis software, because it excludes the libraries
>> at the bottom of that stack
>
>
> It doesn't quite exclude those -- just makes it harder. And while
depending on Fortran, etc, is pretty unique to the data analysis stack,
stuff like libpng, libcurl, etc, etc, isn't -- non-system libs are not a
rare thing.

The rare thing is having two packages which are tightly coupled to the ABI
of a given external dependency. That's a generally bad idea because it
causes exactly these kinds of problems with independent distribution of
prebuilt components.

The existence of tight ABI coupling between components both gives the
scientific Python stack a lot of its power, *and* makes it almost as hard
to distribute in binary form as native GUI applications.

>> It's also the case that when you *are* doing your own system
>> integration, wheels are a powerful tool for caching builds,

> conda does this nicely as well  :-) I"m not tlrying to argue, at all,
that binary wheels are useless, jsu that they could be a bit more useful.

A PEP 426 metadata extension proposal for describing external binary
dependencies would certainly be a welcome addition. That's going to be a
common need for automated repackaging tools, even if we never find a
practical way to take advantage of it upstream.

>> > Ah -- here is a key point -- because of that, we DO support binary
packages
>>
>> > on PyPi -- but only for Windows and OS-X.. I'm just suggesting we find
a way
>> > to extend that to pacakges that require a non-system non-python
dependency.
>>
>> At the point you're managing arbitrary external binary dependencies,
>> you've lost all 

Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Donald Stufft

> On May 16, 2015, at 9:31 PM, Ben Finney  wrote:
> 
> Donald Stufft  writes:
> 
>> Ok, so unless someone comes out against this in the near future here are my
>> plans:
>> 
>> 1. Implement the ability to delete documentation.
> 
> +1.
> 
>> 2. Implement the ability to add a (simple) redirect where we would
>> essentially just send //(.*) to $REDIRECT_BASE/$1.
>> 
>> 3. Implement the ability to point the documentation URL to something
>> that isn't pythonhosted.org
> 
> Both of these turn PyPI into a vector for arbitrary content, including
> (for example) illegal, misleading, or malicious content.
> 
> Automatic redirects actively expose the visitor to any malicious or
> mistaken links set by the project owner.
> 
> If you want to allow the documentation to be at some arbitrary location
> of the project owner's choice, then an explicit static link, which the
> visitor must click on (similar to the project home page link) is best.
> 

To be clear, the documentation isn’t hosted on PyPI, it’s hosted on
pythonhosted.org and we already allow people to upload arbitrary content to
that domain, which can include JS based redirects.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Ben Finney
Donald Stufft  writes:

> Ok, so unless someone comes out against this in the near future here are my
> plans:
>
> 1. Implement the ability to delete documentation.

+1.

> 2. Implement the ability to add a (simple) redirect where we would
> essentially just send //(.*) to $REDIRECT_BASE/$1.
>
> 3. Implement the ability to point the documentation URL to something
> that isn't pythonhosted.org

Both of these turn PyPI into a vector for arbitrary content, including
(for example) illegal, misleading, or malicious content.

Automatic redirects actively expose the visitor to any malicious or
mistaken links set by the project owner.

If you want to allow the documentation to be at some arbitrary location
of the project owner's choice, then an explicit static link, which the
visitor must click on (similar to the project home page link) is best.

-- 
 \  “I find the whole business of religion profoundly interesting. |
  `\ But it does mystify me that otherwise intelligent people take |
_o__)it seriously.” —Douglas Adams |
Ben Finney

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Donald Stufft

> On May 16, 2015, at 8:50 PM, Chris Barker  wrote:
> 
> On Sat, May 16, 2015 at 4:16 PM, Donald Stufft  > wrote:
>> On Sat, May 16, 2015 at 3:03 PM, Donald Stufft > > wrote:
>> There are a few other benefits, but that’s not anything that are inherent in 
>> the two different approaches, it’s just things that conda has that pip is 
>> planning on getting,
>> 
>> Huh? I"'m confused -- didn't we just have a big thread about how pip+wheel 
>> probably ISN'T going to handle shared libs -- that those are exactly what 
>> conda packages do provide -- aside from R and Erlange, anyway :-)
>> 
>> but it's not the packages in this case that we need -- it's the environment 
>> -- and I can't see how pip is going to provide a conda environment….
> 
> I never said pip was going to provide an environment, I said the main benefit 
> conda has over pip, which pip will most likely not get in any reasonable time 
> frame, is that it handles things which are not Python packages.
> 
> well, I got a bit distraced by Erlang and R -- i.e. things that have nothing 
> to do with python packages.
> 
> libxml, on the other hand, is a lib that one might want to use with a python 
> package -- so a bit more apropos here.
> 
> But my confusion was about: "things that conda has that pip is planning on 
> getting" -- what are those things? Any of the stuff that conda has that 
> really useful like handling shared libs, pip is NOT getting -- yes?


The ability to resolve dependencies with static metadata is the major one that 
comes to my mind that’s specific to pip. The ability to have better build 
systems besides distutils/setuptools is a more ecosystem level one but that’s 
something we’ll get too.

As far as shared libs… beyond what’s already possible (sticking a shared lib 
inside of a python project and having libraries load that .dll explicitly) it’s 
not currently on the road map and may never be. I hesitate to say never because 
it’s obviously a problem that needs solved and if the Python ecosystem solves 
it (specific to shared libraries, not whole runtimes or other languages or what 
have you) then that would be a useful thing. I think we have lower hanging 
fruit that we need to deal with before something like that is even possibly to 
be on the radar though (if we ever put it on the radar).

> 
> A shared library is not a Python package so I’m not sure what this message is 
> even saying? ``pip install lxml-from-conda`` is just going to flat out break 
> because pip won’t install the libxml2 shared library.
> 
> exactly -- if you're going to install a shared lib, you need somewhere to put 
> it -- and that's what a conda environment provides.
> 
> Trying not to go around in circles, but python _could_ provide a standard 
> place in which to put shared libs -- and then pip _could_ provide a way to 
> manage them. That would require dealing with that whole binary API problem, 
> so we probably won't do it. I'm not sure what the point of contention is here:
> 
> I think it would be useful to have a way to manage shared libs solely for 
> python packages to use -- and it would be useful to that way to be part of 
> the standard python ecosytem. Others may not think it would be useful enough 
> to be worth the pain in the neck it would be.
> 
> And that's what the nifty conda packages continuum (and others) have built 
> could provide -- those shared libs that are built in a  compatible way with a 
> python binary. After all, pure python packages are no problem, compiled 
> python packages without any dependencies are little problem. The hard part is 
> those darn third party libs.
> 
> conda also provides a way to mange all sorts of other stuff that has nothing 
> to do with python, but I'm guessing  that's not what continuum would like to 
> contribute to pypi….

I guess I’m confused what the benefit of making pip able to install a conda 
package would be. If Python adds someplace for shared libs to go then we could 
just add shared lib support to Wheels, it’s just another file type so that’s 
not a big deal. The hardest part is dealing with ABI compatibility. However, 
given the current state of things, what’s the benefit of being able to do ``pip 
install conda-lxml``? Either it’s going to flat out break or you’re going to 
have to do ``conda install libxml2`` first, and if you’re doing ``conda install 
libxml2`` first then why not just do ``conda install lxml``?

I view conda the same way I view apt-get, yum, Chocolatey, etc. It provides an 
environment and you can install a Python package into that environment, but 
that pip shouldn’t know how to install a .deb or a .rpm or a conda package 
because those packages rely on specifics to that environment and Python 
packages can’t.


---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Dist

Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 4:16 PM, Donald Stufft  wrote:

> On Sat, May 16, 2015 at 3:03 PM, Donald Stufft  wrote:
>
>> There are a few other benefits, but that’s not anything that are inherent
>> in the two different approaches, it’s just things that conda has that pip
>> is planning on getting,
>>
>
> Huh? I"'m confused -- didn't we just have a big thread about how pip+wheel
> probably ISN'T going to handle shared libs -- that those are exactly what
> conda packages do provide -- aside from R and Erlange, anyway :-)
>
> but it's not the packages in this case that we need -- it's the
> environment -- and I can't see how pip is going to provide a conda
> environment….
>
>
> I never said pip was going to provide an environment, I said the main
> benefit conda has over pip, which pip will most likely not get in any
> reasonable time frame, is that it handles things which are not Python
> packages.
>

well, I got a bit distraced by Erlang and R -- i.e. things that have
nothing to do with python packages.

libxml, on the other hand, is a lib that one might want to use with a
python package -- so a bit more apropos here.

But my confusion was about: "things that conda has that pip is planning on
getting" -- what are those things? Any of the stuff that conda has that
really useful like handling shared libs, pip is NOT getting -- yes?


> A shared library is not a Python package so I’m not sure what this message
> is even saying? ``pip install lxml-from-conda`` is just going to flat out
> break because pip won’t install the libxml2 shared library.
>

exactly -- if you're going to install a shared lib, you need somewhere to
put it -- and that's what a conda environment provides.

Trying not to go around in circles, but python _could_ provide a standard
place in which to put shared libs -- and then pip _could_ provide a way to
manage them. That would require dealing with that whole binary API problem,
so we probably won't do it. I'm not sure what the point of contention is
here:

I think it would be useful to have a way to manage shared libs solely for
python packages to use -- and it would be useful to that way to be part of
the standard python ecosytem. Others may not think it would be useful
enough to be worth the pain in the neck it would be.

And that's what the nifty conda packages continuum (and others) have built
could provide -- those shared libs that are built in a  compatible way with
a python binary. After all, pure python packages are no problem, compiled
python packages without any dependencies are little problem. The hard part
is those darn third party libs.

conda also provides a way to mange all sorts of other stuff that has
nothing to do with python, but I'm guessing  that's not what continuum
would like to contribute to pypi

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Donald Stufft
Ok, so unless someone comes out against this in the near future here are my
plans:

1. Implement the ability to delete documentation.

2. Implement the ability to add a (simple) redirect where we would essentially
   just send //(.*) to $REDIRECT_BASE/$1.

3. Implement the ability to point the documentation URL to something that isn't
   pythonhosted.org

4. Send an email out to all projects that are currently utilizing the hosted
   documentation telling that it is going away, and give them links to RTD and
   GithubPages and whatever bitbucket calls their service.

5. Disable Documentation Uploads to PyPI with an error message that tells
   people the service has been discontinued.


In addition to the above steps, we'll maintain any documentaton that doesn't
get deleted (and the above redirects) indefinitely. Serving static read only
documentation (other than deletes) is something that we can do without much
trouble or cost.

I think that this will cover all of the things that people in this thread have
brought up as well as providing a sane migration path to go from
pythonhosted.org documentation to wherever they choose to place their docs in
the future.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 3:03 PM, Donald Stufft  wrote:

> There are a few other benefits, but that’s not anything that are inherent
> in the two different approaches, it’s just things that conda has that pip
> is planning on getting,
>

Huh? I"'m confused -- didn't we just have a big thread about how pip+wheel
probably ISN'T going to handle shared libs -- that those are exactly what
conda packages do provide -- aside from R and Erlange, anyway :-)

but it's not the packages in this case that we need -- it's the environment
-- and I can't see how pip is going to provide a conda environment

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Donald Stufft

> On May 16, 2015, at 7:09 PM, Chris Barker  wrote:
> 
> On Sat, May 16, 2015 at 3:03 PM, Donald Stufft  > wrote:
> There are a few other benefits, but that’s not anything that are inherent in 
> the two different approaches, it’s just things that conda has that pip is 
> planning on getting,
> 
> Huh? I"'m confused -- didn't we just have a big thread about how pip+wheel 
> probably ISN'T going to handle shared libs -- that those are exactly what 
> conda packages do provide -- aside from R and Erlange, anyway :-)
> 
> but it's not the packages in this case that we need -- it's the environment 
> -- and I can't see how pip is going to provide a conda environment….


I never said pip was going to provide an environment, I said the main benefit 
conda has over pip, which pip will most likely not get in any reasonable time 
frame, is that it handles things which are not Python packages. A shared 
library is not a Python package so I’m not sure what this message is even 
saying? ``pip install lxml-from-conda`` is just going to flat out break because 
pip won’t install the libxml2 shared library.


---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Donald Stufft

> On May 16, 2015, at 3:04 PM, David Mertz  wrote:
> 
> I've just started monitoring this SIG to get a sense of the issues and status 
> of things.  I've also just started working for Continuum Analytics.
> 
> Continuum has a great desire to make 'pip' work with conda packages.  
> Obviously, we love for users to choose the Anaconda Python distribution but 
> many will not for a variety of reasons (many good reasons).
> 
> However, we would like for users of other distros still to be able to benefit 
> from our creation of binary packages for many platforms in the conda format.  
> As has been discussed in recent threads on dependency solving, the way conda 
> provides metadata apart from entire packages makes much of that work easier.  
> But even aside from that, there are simply a large number of well-tested 
> packages (not only for Python, it is true, so that's possibly a wrinkle in 
> the task) we have generated in conda format.
> 
> It is true that right now, a user can in principle type:
> 
>   % pip install conda
>   % conda install some_conda_package
> 
> But that creates two separate systems for tracking what's installed and what 
> dependencies are resolved; and many users will not want to convert completely 
> to conda after that step.
> 
> What would be better as a user experience would be to let users do this:
> 
>   % pip install --upgrade pip
>   % pip install some_conda_package
> 
> Whether that second command ultimately downloads code from pyip.python.org 
>  or from repo.continuum.io 
>  is probably less important for a user experience 
> perspective.  Continuum is very happy to upload all of our conda packages to 
> PyPI if this would improve this user experience.  Obviously, the idea here 
> would be that the user would be able to type 'pip list' and friends 
> afterward, and have knowledge of what was installed, even as conda packages.
> 
> I'm hoping members of the SIG can help me understand both the technical and 
> social obstacles that need to be overcome before this can happen.



As Paul mentioned, I’m not sure I see a major benefit to being able to ``pip 
install`` a conda package that doesn’t come with a lot of footguns, since any 
conda package either won’t be able to depend on things like Python or random C 
libraries or we’re going to have to just ignore those dependencies or what have 
you. I think a far more workable solution is one that translates a conda 
package to a Wheel.

Practically speaking the only real benefit that conda packages has over pip is 
the one benefit that simply teaching pip to install conda packages won’t 
provide - Namely that it supports things which aren’t Python packages. However 
I don’t think it’s likely that we’re going to be able to install R or erlang or 
whatever into a virtual environment (for instance), but maybe I’m wrong. There 
are a few other benefits, but that’s not anything that are inherent in the two 
different approaches, it’s just things that conda has that pip is planning on 
getting, it just hasn’t gotten them yet because either we have to convince 
people to publish our new formats (e.g. we can’t go out and create a wheel repo 
of common packages) or because we haven’t gotten to it yet because dealing with 
the crushing legacy of PyPI’s ~400k packages is significant slow down factor.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 12:04 PM, David Mertz  wrote:

> Continuum has a great desire to make 'pip' work with conda packages.
> Obviously, we love for users to choose the Anaconda Python distribution but
> many will not for a variety of reasons (many good reasons).
>

Hmm -- this strikes me as very, very , tricky -- and of course, tied in to
the other thread I've been spending a bunch of time on...

However, we would like for users of other distros still to be able to
> benefit from our creation of binary packages for many platforms in the
> conda format.
>

Frankly, if you want your efforts at building binaries to get used outside
of Anaconda, then you shoudl be building wheels in the first place. While
conda does more than pip + wheel can do -- I suppose you _could_ use wheels
for the things it can support..

But on to the technical issues:

conda python packages depend on other conda packages, and some of those
packages are not python packages at all. The common use case here are
non-python dynamic libs -- exactly the use case I've been going on in the
other thread about...

And conda installs those dynamic libs in a conda environment -- outside of
the python environment. So you can't really use a conda package without a
conda enviroment, and an installer that understands that environment (I
think conda install does some lib path re-naming, yes?), i.e. conda itself.
So I think that's kind of a dead end.

So what about the idea of a conda-package-to-wheel converter? conda
packages an wheels have a bit in common -- IIUC, they are both basically a
zip of all the files you need installed. But again the problem is those
dependencies on third party dynamic libs.

So far that to work -- pip+wheel would have to grow a way to deal with
installing, managing and using dynamic libs. See the other thread for the
nightmare there...

And while I'd love to see this happen, perhaps an easier route would be for
conda_build to grow a "static" flag that will statically link stuff and get
to somethign already supported by pip, wheel, and pypi.

-Chris


>
> It is true that right now, a user can in principle type:
>
>   % pip install conda
>   % conda install some_conda_package
>
> But that creates two separate systems for tracking what's installed and
> what dependencies are resolved;
>

Indeed -- which is why some folks are working on making it easier to use
conda for everythingconverting a wheel to a conda package is probably
easier than the other way around..

Funny -- just moments ago I wrote that it didn't seem that anyone other
than me was interested in extending pip_wheel to support this kind of thing
-- I guess I was wrong!

Great to see you and continuum thinking about this.


-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 10:12 AM, Nick Coghlan  wrote:

> > Maybe, but it's a problem to be solved, and the Linux distros more or
> less
> > solve it for us, but OS-X and Windows have no such system built in (OS-X
> > does have Brew and macports)
>
> Windows 10 has Chocalatey and OneGet:
>
> * https://chocolatey.org/
> *
> http://blogs.msdn.com/b/garretts/archive/2015/01/27/oneget-and-the-windows-10-preview.aspx
>

cool -- though I don't think we want the "official" python to depend on a
third party system, and one get won't be available for most users for a
LONG time...

The fact that OS-X users have to choose between fink, macport, homebrew or
roll-your-own is a MAJOR soruce of pain for supporting the OS-X community.
"More than one way to do it" is not the goal.

conda and nix then fill the niche for language independent packaging
> at the user level rather than the system level.
>

yup -- conda is, indeed, pretty cool.

   > I think there is a bit of fuzz here -- cPython, at least, uses the "the

> > operating system provided C/C++
> > dynamic linking system" -- it's not a totally independent thing.
>
> I'm specifically referring to the *declaration* of dependencies here.
>

sure -- that's my point about the current "missing link" -- setuptools,
pip, etc, can only declare python-package-level dependencies, not
binary-level dependencies.

My idea is to bundle up a shared lib in a python package -- then, if you
declare a dependency on that package, you've handles the dep issue. The
trick is that a particular binary wheel depends on that other binary wheel
-- rather than the whole package depending on it. (that is, on linux, it
would have no dependency, on OS-X it would -- but then only the wheel built
for a non-macports build, etc).

I think we could hack around this by monkey-patching the wheel after it is
built, so may be worth playing with to see how it works before proposing
any changes to the ecosystem.

> And if you are using something like conda you don't need pip
>
> or wheels anyway!
>
> Correct, just as if you're relying solely on Linux system packages,
> you don't need pip or wheels. Aside from the fact that conda is
> cross-platform, the main difference between the conda community and a
> Linux distro is in the *kind* of software we're likely to have already
> done the integration work for.
>

sure. but the cross-platform thing is BIG -- we NEED pip and wheel because
rpm, or deb, or ... are all platform and distro dependent -- we want a way
for package maintainers to support a broad audience without having to deal
with 12 different package systems.

The key to understanding the difference in the respective roles of pip
> and conda is realising that there are *two* basic distribution
> scenarios that we want to be able to cover (I go into this in more
> detail in
> https://www.python.org/dev/peps/pep-0426/#development-distribution-and-deployment-of-python-software
> ):
>

hmm -- sure, they are different, but is it impossible to support both with
one system?


> * software developer/publisher -> software integrator/service operator
> (or data analyst)
> * software developer/publisher -> software integrator -> service
> operator (or data analyst)
>
...

> On the consumption side, though, the nature of the PyPA tooling as a
> platform-independent software publication toolchain means that if you
> want to consume the PyPA formats directly, you need to be prepared to
> do your own integration work.


Exactly! and while Linux system admins can do their own system integration
work, everyday users (and many Windows sys admins) can't, and we shouldn't
expect them to.

And, in fact, the PyPA tooling does support the more casual user much of
the time -- for example, I'm in the third quarter of a Python certification
class -- Intro, Web development, Advanced topics -- and only half way
through the third class have I run into any problems with sticking with the
PyPA tools.

(except for pychecker -- not being on Pypi :-( )

Many public web service developers are
> entirely happy with that deal, but most system administrators and data
> analysts trying to deal with components written in multiple
> programming languages aren't.
>

exactly -- but it's not because the audience is different in their role --
it's because different users need different python packages. The PyPA tools
support pure-python great -- and compiled extensions without deps pretty
well -- but there is a bit of gap with extensions that require other deps.

It's a 90% (95%) solution... It'd be nice to get it to a 99% solution.

Where is really gets ugly is where you need stuff that has nothing to do
with python -- say a Julia run-time, or ...

Anaconda is there to support that: their philosophy is that if you are
trying to do full-on data analysis with python, you are likely to need
stuff strickly beyond the python ecosystem -- your own Fortran code, numpy
(which requires LLVM), etc.

Maybe they are right -- but there is still a heck 

Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Paul Moore
On 16 May 2015 at 20:04, Chris Barker  wrote:
> I was referring to the SetDllDirectory API. I don't think that gets picked
> up by other processes.
>
> from:
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms686203%28v=vs.85%29.aspx
>
> It looks like you can add a path, at run time, that gets searched for dlls
> before the rest of the system locations. And this does to effect any other
> applications. But you'd need to make sure this got run before any of the
> effected packages where loaded -- which is proabbly what David meant by
> needing to "control the python binary".

Ah, sorry - I misunderstood you. This might work, but as you say, the
DLL Path change would need to run before any imports needed it. Which
basically means it needs to be part of the Python interpreter startup.
It *could* be run as normal user code - you just have to ensure you
run it before any imports that need shared libraries. But that seems
very fragile to me. I'm not sure it's viable as a generic solution.

Paul
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread Paul Moore
On 16 May 2015 at 20:04, David Mertz  wrote:
> What would be better as a user experience would be to let users do this:
>
>   % pip install --upgrade pip
>   % pip install some_conda_package
>
> Whether that second command ultimately downloads code from pyip.python.org
> or from repo.continuum.io is probably less important for a user experience
> perspective.  Continuum is very happy to upload all of our conda packages to
> PyPI if this would improve this user experience.  Obviously, the idea here
> would be that the user would be able to type 'pip list' and friends
> afterward, and have knowledge of what was installed, even as conda packages.
>
> I'm hoping members of the SIG can help me understand both the technical and
> social obstacles that need to be overcome before this can happen.

My immediate thought is, what obstacles stand in the way of a "conda
to wheel" conversion utility? With such a utility, a wholesale
conversion of conda packages to wheels, along with hosting those
wheels somewhere (binstar? PyPI isn't immediately possible as only
package owners can upload files), would essentially give this
capability.

There presumably are issues with this approach (maybe technical, more
likely social) but it seems to me that understanding *why* this
approach doesn't work would be a good first step towards identifying
an actual solution.

Paul
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


[Distutils] Making pip and PyPI work with conda packages

2015-05-16 Thread David Mertz
I've just started monitoring this SIG to get a sense of the issues and
status of things.  I've also just started working for Continuum Analytics.

Continuum has a great desire to make 'pip' work with conda packages.
Obviously, we love for users to choose the Anaconda Python distribution but
many will not for a variety of reasons (many good reasons).

However, we would like for users of other distros still to be able to
benefit from our creation of binary packages for many platforms in the
conda format.  As has been discussed in recent threads on dependency
solving, the way conda provides metadata apart from entire packages makes
much of that work easier.  But even aside from that, there are simply a
large number of well-tested packages (not only for Python, it is true, so
that's possibly a wrinkle in the task) we have generated in conda format.

It is true that right now, a user can in principle type:

  % pip install conda
  % conda install some_conda_package

But that creates two separate systems for tracking what's installed and
what dependencies are resolved; and many users will not want to convert
completely to conda after that step.

What would be better as a user experience would be to let users do this:

  % pip install --upgrade pip
  % pip install some_conda_package

Whether that second command ultimately downloads code from pyip.python.org
or from repo.continuum.io is probably less important for a user experience
perspective.  Continuum is very happy to upload all of our conda packages
to PyPI if this would improve this user experience.  Obviously, the idea
here would be that the user would be able to type 'pip list' and friends
afterward, and have knowledge of what was installed, even as conda packages.

I'm hoping members of the SIG can help me understand both the technical and
social obstacles that need to be overcome before this can happen.

Yours, David...
-- 
The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 11:54 AM, Paul Moore  wrote:

> > could you clarify a bit -- I thought that this could, at least, put a
> dir on
> > the search path that was specific to that python context. So it would
> > require cooperation among all the packages being used at once, but not
> get
> > tangled up with the rest of the system. but maybe I'm wrong here -- I
> have
> > no idea what the heck I'm doing with this!
>
> Suppose Python adds C:\PythonXY\SharedDLLs to %PATH%. Suppose there's
> a libpng.dll in there, for matplotlib.
>

I think we all agree that %PATH% is NOT the option! Taht is the key source
od dll hell on Windows.

I was referring to the SetDllDirectory API. I don't think that gets picked
up by other processes.

from:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686203%28v=vs.85%29.aspx

It looks like you can add a path, at run time, that gets searched for dlls
before the rest of the system locations. And this does to effect any other
applications. But you'd need to make sure this got run before any of the
effected packages where loaded -- which is proabbly what David meant by
needing to "control the python binary".

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Chris Barker
On Sat, May 16, 2015 at 4:13 AM, Paul Moore  wrote:

> > Though it's a lot harder to provide a build environment than just the
> lib to
> > link too .. I"m going to have to think more about that...
>
> It seems to me that the end user doesn't really have a problem here
> ("pip install matplotlib" works fine for me using the existing wheel).
>

Sure -- but that's because Matthew Brett has done a lot of work to make
that happen.

> It's the package maintainers (who have to build the binaries) that
> have the issue because everyone ends up doing the same work over and
> over, building dependencies.


Exactly -- It would be nice if the ecosystem made that easier.


> So rather than trying to address the hard
> problem of dynamic linking, maybe a simpler solution is to set up a
> PyPI-like hosting solution for static libraries of C dependencies?
>
> It could be as simple as a github project that contained a directory
> for each dependency,


I started that here:

https://github.com/PythonCHB/mac-builds

but haven't kept it up. And Matthew Brett has done most of the work here:

https://github.com/MacPython

not sure how he's sharing the static libs -- but it could be done.

 With a setuptools build plugin you could even just
>
specify your libraries in setup.py, and have the plugin download the
> lib files automatically at build time.


actually, that's a pretty cool idea! you'd need  place to host them --
gitHbu is no longer hosting "downloads" are they? though you could probably
use github-pages.. (or somethign else)


> People add libraries to the
> archive simply by posting pull requests. Maybe the project maintainer
> maintains the actual binaries by running the builds separately and
> publishing them separately, or maybe PRs include binaries


or you use a CI system to build them. Something like this is being done by
a bunch of folks for conda/binstar:

https://github.com/ioos/conda-recipes

is just one example.

PS The above is described as if it's single-platform, mostly because I
>
only tend to think about these issues from a Windows POV, but it
> shouldn't be hard to extend it to multi-platform.
>

Indeed -- the MacWheels projects are, of course single platform, but could
be extended. though at the end of the day, there isn't much to share
between building libs on different platforms (unless you are using a
cross-platfrom build tool -- why I was trying out gattai for my stuff)

The conda stuff is multi-platform, though, in fact, you have to write a
separate build script for each platform -- it doesn't really provide
anything to help with that part.

But while these efforts are moving towards removing the need for every
pacakge maintainer to build the deps -- we are now duplicating the effort
of trying to remove duplication of effort :-) -- but maybe just waiting for
something to gain momentum and rise to the top is the answer.

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Paul Moore
On 16 May 2015 at 19:40, Chris Barker  wrote:
>> With 2., you still have the issue of DLL hell,
>
> could you clarify a bit -- I thought that this could, at least, put a dir on
> the search path that was specific to that python context. So it would
> require cooperation among all the packages being used at once, but not get
> tangled up with the rest of the system. but maybe I'm wrong here -- I have
> no idea what the heck I'm doing with this!

Suppose Python adds C:\PythonXY\SharedDLLs to %PATH%. Suppose there's
a libpng.dll in there, for matplotlib. Everything works fine.

Then I install another non-Python application that uses libpng.dll,
and does so by putting libpng.dll alongside the executable (a common
way of making DLLs available with Windows applications). Also assume
that the application installer adds the application directory to the
*start* of PATH.

Now, Python extensions will use this 3rd party application's DLL
rather than the correct one. If it's ABI-incompatible, the Python
extension will crash. If it's ABI compatible, but behaves differently
(it could be a different version) there could be inconsistencies or
failures.

The problem is that while Python can add a DLL directory to PATH, it
cannot control what *else* is on PATH, or what has priority.

Paul
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Chris Barker
On Fri, May 15, 2015 at 11:35 PM, David Cournapeau 
wrote:

> On Sat, May 16, 2015 at 4:56 AM, Chris Barker 
> wrote:
>
>>
>> But in short -- I'm pretty sure there is a way, on all systems, to have a
>> standard way to build extension modules, combined with a standard way to
>> install shared libs, so that a lib can be shared among multiple packages.
>> So the question remains:
>>
>
> There is actually no way to do that on windows without modifying the
> interpreter somehow.
>

Darn.


> This was somehow discussed a bit at PyCon when talking about windows
> packaging:
>
>  1. the simple way to share DLLs across extensions is to put them in the
> %PATH%, but that's horrible.
>

yes -- that has to be off the table, period.


> 2. there are ways to put DLLs in a shared directory *not* in the %PATH%
> since at least windows XP SP2 and above, through the SetDllDirectory API.
>
> With 2., you still have the issue of DLL hell,
>

could you clarify a bit -- I thought that this could, at least, put a dir
on the search path that was specific to that python context. So it would
require cooperation among all the packages being used at once, but not get
tangled up with the rest of the system. but maybe I'm wrong here -- I have
no idea what the heck I'm doing with this!

which may be resolved through naming and activation contexts.
>

I guess that's what I mean by the above..


> I had a brief chat with Steve where he mentioned that this may be a
> solution, but he was not 100 % sure IIRC. The main drawback of this
> solution is that it won't work when inheriting virtual environments (as you
> can only set a single directory).
>

no relative paths here? or path that can be set at run time? or maybe I"m
missing what "inheriting virtual environments" means...


> FWIW, we are about to deploy 2. @ Enthought (where we control the python
> interpreter, so it is much easier for
>
us).
>

It'll be great to see how that works out, then. I take that this means that
for Canopy, you've decided that statically linking everything is NOT The
way to go. Which is a good data point to have.

Thanks for the update.

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Donald Stufft

> On May 16, 2015, at 1:24 PM, Nick Coghlan  wrote:
> 
> On 17 May 2015 at 00:36, Justin Cappos  wrote:
>> This only considers computation cost though.  Other factors can become more
>> expensive than computation.  For example, SAT solvers need all the rules to
>> consider.  So a SAT solution needs to effectively download the full
>> dependency graph before starting.  A backtracking dependency resolver can
>> just download packages or dependency information as it considers them.
> 
> This is the defining consideration for pip at this point: a SAT solver
> requires publication of static dependency metadata on PyPI, which is
> dependent on both the Warehouse migration *and* the completion and
> acceptance of PEP 426. Propagation out to PyPI caching proxies and
> mirrors like devpi and the pulp-python plugin will then take even
> longer.
> 
> A backtracking resolver doesn't have those gating dependencies, as it
> can tolerate the current dynamic metadata model.
> 


Even when we have Warehouse and PEP 426, that only gives us that data going
forward, the 400k files that currently exist on PyPI still won’t have static
metadata. We could parse it out for Wheels but not for anything else. For the
foreseeable future any solution will need to be able to handle iteratively
finding constraints. Though I think a SAT solver can do it if it can handle
incremental solving or just by re-doing the SAT problem each time we discover
a new constraint.


---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Nick Coghlan
On 17 May 2015 at 00:36, Justin Cappos  wrote:
> This only considers computation cost though.  Other factors can become more
> expensive than computation.  For example, SAT solvers need all the rules to
> consider.  So a SAT solution needs to effectively download the full
> dependency graph before starting.  A backtracking dependency resolver can
> just download packages or dependency information as it considers them.

This is the defining consideration for pip at this point: a SAT solver
requires publication of static dependency metadata on PyPI, which is
dependent on both the Warehouse migration *and* the completion and
acceptance of PEP 426. Propagation out to PyPI caching proxies and
mirrors like devpi and the pulp-python plugin will then take even
longer.

A backtracking resolver doesn't have those gating dependencies, as it
can tolerate the current dynamic metadata model.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Nick Coghlan
On 15 May 2015 at 04:01, Chris Barker  wrote:
>> >> I'm confused -- you don't want a system to be able to install ONE
>> >> version
>> >> of a lib that various python packages can all link to? That's really
>> >> the
>> >> key use-case for me
>
>
>>
>> Are we talking about Python libraries accessed via Python APIs, or
>> linking to external dependencies not written in Python (including
>> linking directly to C libraries shipped with a Python library)?
>
>
> I, at least, am talking about the latter. for a concrete example: libpng,
> for instance, might be needed by PIL, wxPython, Matplotlib, and who knows
> what else. At this point, if you want to build a package of any of these,
> you need to statically link it into each of them, or distribute shared libs
> with each package -- if you ware using them all together (which I do,
> anyway) you now have three copies of the same lib (but maybe different
> versions) all linked into your executable. Maybe there is no downside to
> that (I haven't had a problem yet), but it seems like a bad way to do it!
>
>> It's the latter I consider to be out of scope for a language specific
>> packaging system
>
>
> Maybe, but it's a problem to be solved, and the Linux distros more or less
> solve it for us, but OS-X and Windows have no such system built in (OS-X
> does have Brew and macports)

Windows 10 has Chocalatey and OneGet:

* https://chocolatey.org/
* 
http://blogs.msdn.com/b/garretts/archive/2015/01/27/oneget-and-the-windows-10-preview.aspx

conda and nix then fill the niche for language independent packaging
at the user level rather than the system level.

>> - Python packaging dependencies are designed to
>> describe inter-component dependencies based on the Python import
>> system, not dependencies based on the operating system provided C/C++
>> dynamic linking system.
>
> I think there is a bit of fuzz here -- cPython, at least, uses the "the
> operating system provided C/C++
> dynamic linking system" -- it's not a totally independent thing.

I'm specifically referring to the *declaration* of dependencies here.
While CPython itself will use the dynamic linker to load extension
modules found via the import system, the loading of further
dynamically linked modules beyond that point is entirely opaque not
only to the interpreter runtime at module import time, but also to pip
at installation time.

>> If folks are after the latter, than they want
>> a language independent package system, like conda, nix, or the system
>> package manager in a Linux distribution.
>
> And I am, indeed, focusing on conda lately for this reason -- but not all my
> users want to use a whole new system, they just want to "pip install" and
> have it work. And if you are using something like conda you don't need pip
> or wheels anyway!

Correct, just as if you're relying solely on Linux system packages,
you don't need pip or wheels. Aside from the fact that conda is
cross-platform, the main difference between the conda community and a
Linux distro is in the *kind* of software we're likely to have already
done the integration work for.

The key to understanding the difference in the respective roles of pip
and conda is realising that there are *two* basic distribution
scenarios that we want to be able to cover (I go into this in more
detail in 
https://www.python.org/dev/peps/pep-0426/#development-distribution-and-deployment-of-python-software):

* software developer/publisher -> software integrator/service operator
(or data analyst)
* software developer/publisher -> software integrator -> service
operator (or data analyst)

Note the second line has 3 groups and 2 distribution arrows, while the
first line only has the 2 groups and a single distribution step.

pip and the other Python specific tools cover that initial
developer/publisher -> integrator link for Python projects. This means
that Python developers only need to learn a single publishing
toolchain (the PyPA tooling) to get started, and they'll be able to
publish their software in a format that any integrator that supports
Python can consume (whether that's for direct consumption in a DIY
integration scenario, or to put through a redistributor's integration
processes).

On the consumption side, though, the nature of the PyPA tooling as a
platform-independent software publication toolchain means that if you
want to consume the PyPA formats directly, you need to be prepared to
do your own integration work. Many public web service developers are
entirely happy with that deal, but most system administrators and data
analysts trying to deal with components written in multiple
programming languages aren't.

That latter link, where the person or organisation handling the
software integration task is distinct from the person or organisation
running an operational service, or carrying out some data analysis,
are where the language independent redistributor tools like
Chocolatey, Nix, deb, rpm, conda, Docker, etc all come in - they let a

Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread David Cournapeau
On Sun, May 17, 2015 at 12:40 AM, Daniel Holth  wrote:

>
> On May 16, 2015 11:22 AM, "David Cournapeau"  wrote:
> >
> >
> >
> > On Sat, May 16, 2015 at 11:36 PM, Justin Cappos  wrote:
> >>>
> >>> I am no expert, but I don't understand why backtracking algorithms
> would to be faster than SAT, since they both potentially need to walk over
> the full set of possible solutions. It is hard to reason about the cost
> because the worst case is in theory growing exponentially in both cases.
> >>
> >>
> >> This is talked about a bit in this thread:
> https://github.com/pypa/pip/issues/988
> >>
> >> Each algorithm could be computationally more efficient.  Basically, *if
> there are no conflicts* backtracking will certainly win.  If there are a
> huge number of conflicts a SAT solver will certainly win.  It's not clear
> where the tipping point is between the two schemes.
> >>
> >> However, a better question is does the computational difference
> matter?  If one is a microsecond faster than the other, I don't think
> anyone cares.  However, from the OPIUM paper (listed off of that thread),
> it is clear that SAT solver resolution can be slow without optimizations to
> make them work more like backtracking resolvers.  From my experience
> backtracking resolvers are also slow when the conflict rate is high.
> >
> >
> > Pure SAT is fast enough in practice in my experience (concretely:
> solving thousand of rules takes < 1 sec). It becomes more complicated as
> you need to optimize the solution, especially when you have already
> installed packages. This is unfortunately not as well discussed in the
> literature. Pseudo-boolean SAT for optimization was argued to be too slow
> by the 0 install people, but OTOH, this seems to be what's used in conda,
> so who knows :)
>
> Where optimizing means something like "find a solution with the newest
> possible releases of the required packages", not execution speed.
>

Indeed, it was not obvious in this context :) Though in theory,
optimization is more general. It could be optimizing w.r.t. a cost function
taking into account #packages, download size, minimal number of changes,
etc... This is where you want a pseudo-boolean SAT, which is what conda
uses I think.

0install, composer and I believe libsolv took a different route, and use
heuristics to find a reasonably good solution by picking the next
candidate. This requires access to the internals of the SAT solver though
(not a problem if you have a python implementation).

David

> > If you SAT solver is in pure python, you can choose a "direction" of the
> search which is more meaningful. I believe this is what 0install does from
> reading http://0install.net/solver.html, and what we have in our own SAT
> solver code. I unfortunately cannot look at the 0install code myself as it
> is under the GPL and am working on a BSD solver implementation. I also do
> not know how they handle updates and already installed packages.
> >
> >>
> >> This only considers computation cost though.  Other factors can become
> more expensive than computation.  For example, SAT solvers need all the
> rules to consider.  So a SAT solution needs to effectively download the
> full dependency graph before starting.  A backtracking dependency resolver
> can just download packages or dependency information as it considers them.
> The bandwidth cost for SAT solvers should be higher.
> >
> >
> > With a reasonable representation, I think you can make it small enough.
> To give an idea, our index @ Enthought containing around 20k packages takes
> ~340 kb compressed w/ bz2 if you only keep the data required for dependency
> handling (name, version and runtime dependencies), and that's using json,
> an inefficient encoding, so I suspect encoding all of pypi may be a few MB
> only fetch, which is generally faster that doing tens of http requests.
> >
> > The libsvolv people worked on a binary representation that may also be
> worth looking at.
> >
> >>
> >> P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy
> to talk more with you and/or Robert about minutiae that others may not care
> about.
> >
> >
> > Sure, I would be happy too. As I mentioned before, we have some code
> around a SAT-based solver, but it is not ready yet, which is why we kept it
> private (https://github.com/enthought/sat-solver). It handles well (==
> both speed and quality-wise) the case where nothing is installed, but
> behaves poorly when packages are already installed, and does not handle the
> update case yet. The code is also very prototype-ish, but is not too
> complicated to experimente with it.
> >
> > David
> >
> >
> > ___
> > Distutils-SIG maillist  -  Distutils-SIG@python.org
> > https://mail.python.org/mailman/listinfo/distutils-sig
> >
>
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Daniel Holth
On May 16, 2015 11:22 AM, "David Cournapeau"  wrote:
>
>
>
> On Sat, May 16, 2015 at 11:36 PM, Justin Cappos  wrote:
>>>
>>> I am no expert, but I don't understand why backtracking algorithms
would to be faster than SAT, since they both potentially need to walk over
the full set of possible solutions. It is hard to reason about the cost
because the worst case is in theory growing exponentially in both cases.
>>
>>
>> This is talked about a bit in this thread:
https://github.com/pypa/pip/issues/988
>>
>> Each algorithm could be computationally more efficient.  Basically, *if
there are no conflicts* backtracking will certainly win.  If there are a
huge number of conflicts a SAT solver will certainly win.  It's not clear
where the tipping point is between the two schemes.
>>
>> However, a better question is does the computational difference matter?
If one is a microsecond faster than the other, I don't think anyone cares.
However, from the OPIUM paper (listed off of that thread), it is clear that
SAT solver resolution can be slow without optimizations to make them work
more like backtracking resolvers.  From my experience backtracking
resolvers are also slow when the conflict rate is high.
>
>
> Pure SAT is fast enough in practice in my experience (concretely: solving
thousand of rules takes < 1 sec). It becomes more complicated as you need
to optimize the solution, especially when you have already installed
packages. This is unfortunately not as well discussed in the literature.
Pseudo-boolean SAT for optimization was argued to be too slow by the 0
install people, but OTOH, this seems to be what's used in conda, so who
knows :)

Where optimizing means something like "find a solution with the newest
possible releases of the required packages", not execution speed.

> If you SAT solver is in pure python, you can choose a "direction" of the
search which is more meaningful. I believe this is what 0install does from
reading http://0install.net/solver.html, and what we have in our own SAT
solver code. I unfortunately cannot look at the 0install code myself as it
is under the GPL and am working on a BSD solver implementation. I also do
not know how they handle updates and already installed packages.
>
>>
>> This only considers computation cost though.  Other factors can become
more expensive than computation.  For example, SAT solvers need all the
rules to consider.  So a SAT solution needs to effectively download the
full dependency graph before starting.  A backtracking dependency resolver
can just download packages or dependency information as it considers them.
The bandwidth cost for SAT solvers should be higher.
>
>
> With a reasonable representation, I think you can make it small enough.
To give an idea, our index @ Enthought containing around 20k packages takes
~340 kb compressed w/ bz2 if you only keep the data required for dependency
handling (name, version and runtime dependencies), and that's using json,
an inefficient encoding, so I suspect encoding all of pypi may be a few MB
only fetch, which is generally faster that doing tens of http requests.
>
> The libsvolv people worked on a binary representation that may also be
worth looking at.
>
>>
>> P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy
to talk more with you and/or Robert about minutiae that others may not care
about.
>
>
> Sure, I would be happy too. As I mentioned before, we have some code
around a SAT-based solver, but it is not ready yet, which is why we kept it
private (https://github.com/enthought/sat-solver). It handles well (== both
speed and quality-wise) the case where nothing is installed, but behaves
poorly when packages are already installed, and does not handle the update
case yet. The code is also very prototype-ish, but is not too complicated
to experimente with it.
>
> David
>
>
> ___
> Distutils-SIG maillist  -  Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread David Cournapeau
On Sat, May 16, 2015 at 11:36 PM, Justin Cappos  wrote:

> I am no expert, but I don't understand why backtracking algorithms would
>> to be faster than SAT, since they both potentially need to walk over the
>> full set of possible solutions. It is hard to reason about the cost because
>> the worst case is in theory growing exponentially in both cases.
>>
>
> This is talked about a bit in this thread:
> https://github.com/pypa/pip/issues/988
>
> Each algorithm could be computationally more efficient.  Basically, *if
> there are no conflicts* backtracking will certainly win.  If there are a
> huge number of conflicts a SAT solver will certainly win.  It's not clear
> where the tipping point is between the two schemes.
>
> However, a better question is does the computational difference matter?
> If one is a microsecond faster than the other, I don't think anyone cares.
> However, from the OPIUM paper (listed off of that thread), it is clear that
> SAT solver resolution can be slow without optimizations to make them work
> more like backtracking resolvers.  From my experience backtracking
> resolvers are also slow when the conflict rate is high.
>

Pure SAT is fast enough in practice in my experience (concretely: solving
thousand of rules takes < 1 sec). It becomes more complicated as you need
to optimize the solution, especially when you have already installed
packages. This is unfortunately not as well discussed in the literature.
Pseudo-boolean SAT for optimization was argued to be too slow by the 0
install people, but OTOH, this seems to be what's used in conda, so who
knows :)

If you SAT solver is in pure python, you can choose a "direction" of the
search which is more meaningful. I believe this is what 0install does from
reading http://0install.net/solver.html, and what we have in our own SAT
solver code. I unfortunately cannot look at the 0install code myself as it
is under the GPL and am working on a BSD solver implementation. I also do
not know how they handle updates and already installed packages.


> This only considers computation cost though.  Other factors can become
> more expensive than computation.  For example, SAT solvers need all the
> rules to consider.  So a SAT solution needs to effectively download the
> full dependency graph before starting.  A backtracking dependency resolver
> can just download packages or dependency information as it considers them.
> The bandwidth cost for SAT solvers should be higher.
>

With a reasonable representation, I think you can make it small enough. To
give an idea, our index @ Enthought containing around 20k packages takes
~340 kb compressed w/ bz2 if you only keep the data required for dependency
handling (name, version and runtime dependencies), and that's using json,
an inefficient encoding, so I suspect encoding all of pypi may be a few MB
only fetch, which is generally faster that doing tens of http requests.

The libsvolv people worked on a binary representation that may also be
worth looking at.


> P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy to
> talk more with you and/or Robert about minutiae that others may not care
> about.
>

Sure, I would be happy too. As I mentioned before, we have some code around
a SAT-based solver, but it is not ready yet, which is why we kept it
private (https://github.com/enthought/sat-solver). It handles well (== both
speed and quality-wise) the case where nothing is installed, but behaves
poorly when packages are already installed, and does not handle the update
case yet. The code is also very prototype-ish, but is not too complicated
to experimente with it.

David
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Justin Cappos
>
> I am no expert, but I don't understand why backtracking algorithms would
> to be faster than SAT, since they both potentially need to walk over the
> full set of possible solutions. It is hard to reason about the cost because
> the worst case is in theory growing exponentially in both cases.
>

This is talked about a bit in this thread:
https://github.com/pypa/pip/issues/988

Each algorithm could be computationally more efficient.  Basically, *if
there are no conflicts* backtracking will certainly win.  If there are a
huge number of conflicts a SAT solver will certainly win.  It's not clear
where the tipping point is between the two schemes.

However, a better question is does the computational difference matter?  If
one is a microsecond faster than the other, I don't think anyone cares.
However, from the OPIUM paper (listed off of that thread), it is clear that
SAT solver resolution can be slow without optimizations to make them work
more like backtracking resolvers.  From my experience backtracking
resolvers are also slow when the conflict rate is high.

This only considers computation cost though.  Other factors can become more
expensive than computation.  For example, SAT solvers need all the rules to
consider.  So a SAT solution needs to effectively download the full
dependency graph before starting.  A backtracking dependency resolver can
just download packages or dependency information as it considers them.  The
bandwidth cost for SAT solvers should be higher.

Thanks,
Justin
P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy to
talk more with you and/or Robert about minutiae that others may not care
about.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Sébastien Douche
On Fri, 15 May 2015, at 15:48, Donald Stufft wrote:
> Hey!

Hi Donald

> Ideally I hope people start to use ReadTheDocs instead of PyPI itself.

+1. Do you want to use the python.org domain (ex.
"pypi.python.org/docs") or keep RTD on it own domain?


-- 
Sébastien Douche 
Twitter: @sdouche
http://douche.name
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Nick Coghlan
On 16 May 2015 at 11:52, Robert Collins  wrote:
> On 16 May 2015 at 13:45, Donald Stufft  wrote:
>>
>>> On May 15, 2015, at 9:22 PM, Robert Collins  
>>> wrote:
>>>
>>> On 16 May 2015 at 11:08, Marcus Smith  wrote:
 Why not start with pip at least being a "simple" fail-on-conflict resolver
 (vs the "1st found wins" resolver it is now)...

 You'd "backtrack" for the sake of re-walking when new constraints are 
 found,
 but not for the purpose of solving conflicts.

 I know you're motivated to solve Openstack build issues, but many of the
 issues I've seen in the pip tracker, I think would be solved without the
 backtracking resolver you're trying to build.
>>>
>>> Well, I'm scratching the itch I have. If its too hard to get something
>>> decent, sure I might back off in my goals, but I see no point aiming
>>> for something less than all the other language specific packaging
>>> systems out there have.
>>
>>
>> So what makes the other language specific packaging systems different? As far
>> as I know all of them have complete archives (e.g. they are like PyPI where 
>> they
>> have a lot of versions, not like Linux Distros). What can we learn from how 
>> they
>> solved this?
>
> NB; I have by no means finished low hanging heuristics and space
> trimming stuff :). I have some simple things in mind and am sure I'll
> end up with something 'good enough' for day to day use. The thing I'm
> worried about is the long term health of the approach.

Longer term, I think it makes sense to have the notion of "active" and
"obsolete" versions baked into PyPI's API and the web UI. This
wouldn't be baked into the package metadata itself (unlike the
proposed "Obsoleted-By" field for project renaming), but rather be a
dynamic reflection of whether or not *new* users should be looking at
the affected version, and whether or not it should be considered as a
candidate for dependency resolution when not specifically requested.
(This could also replace the current "hidden versions" feature, which
only hides things from the web UI, without having any impact on the
information published to automated tools through the programmatic API)

Tools that list outdated packages could also be simplified a bit, as
their first pass could just be to check the obsolescence markers on
installed packages, with the second pass being to check for newer
versions of those packages.

While the bare minimum would be to let project mantainers set the
obsolescence flag directly, we could also potentially offer projects
some automated obsolescence schemes, such as:

* single active released version, anything older is marked as obsolete
whenever a new (non pre-release) version is uploaded
* semantic versioning, with a given maximum number of active released
X versions (e.g. 2), but only the most recent (according to PEP 440)
released version with a given X.* is active, everything else is
obsolete
* CPython-style and date-based versioning, with a given maximum number
of active released X.Y versions (e.g. 2), but only the most recent
(according to PEP 440) released version with a given X.Y.* is active,
everything else is obsolete

Pre-release versions could also be automatically flagged as obsolete
by PyPI as soon as a newer version for the same release (including the
final release itself) was uploaded for the given package.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Paul Moore
On 16 May 2015 at 06:45, Chris Barker  wrote:
>> > Personally, I'm on the fence here -- I really want newbies to be able to
>> > simply "pip install" as many packages as possible and get a good result
>> > when
>> > they do it.
>>
>> Static linking gives that on Windows FWIW. (And maybe also on OSX?)
>> This is a key point, though - the goal shouldn't be "use dynamic
>> linking" but rather "make the user experience as easy as possible". It
>> may even be that the best approach (dynamic or static) differs
>> depending on platform.
>
>
> true -- though we also have another problem -- that static linking solution
> is actually a big pain for package maintainers -- building and linking the
> dependencies the right way is a pain -- and now everyone that uses a given
> lib has to figure out how to do it. Giving folks a dynamic lib they can use
> would mie it easier for tehm to build their packages -- a nice benifit
> there.
>
> Though it's a lot harder to provide a build environment than just the lib to
> link too .. I"m going to have to think more about that...

It seems to me that the end user doesn't really have a problem here
("pip install matplotlib" works fine for me using the existing wheel).
It's the package maintainers (who have to build the binaries) that
have the issue because everyone ends up doing the same work over and
over, building dependencies. So rather than trying to address the hard
problem of dynamic linking, maybe a simpler solution is to set up a
PyPI-like hosting solution for static libraries of C dependencies?

It could be as simple as a github project that contained a directory
for each dependency, with scripts to build Python-compatible static
libraries, and probably built .lib files for the supported
architectures. With a setuptools build plugin you could even just
specify your libraries in setup.py, and have the plugin download the
lib files automatically at build time. People add libraries to the
archive simply by posting pull requests. Maybe the project maintainer
maintains the actual binaries by running the builds separately and
publishing them separately, or maybe PRs include binaries - either way
would work (although having the maintainer do it ensures a certain
level of QA that the build process is reproducible).

It could even include libraries that people need for embedding, rather
than extensions (I recently needed a version of libxpm compatible with
Python 3.5, for building a Python-enabled vim, for example).

The msys2 projects provides something very similar to this at
https://github.com/Alexpux/MINGW-packages which is a repository of
build scripts for various packages.

Paul

PS The above is described as if it's single-platform, mostly because I
only tend to think about these issues from a Windows POV, but it
shouldn't be hard to extend it to multi-platform.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

2015-05-16 Thread Paul Moore
On 16 May 2015 at 07:35, David Cournapeau  wrote:
>> But in short -- I'm pretty sure there is a way, on all systems, to have a
>> standard way to build extension modules, combined with a standard way to
>> install shared libs, so that a lib can be shared among multiple packages. So
>> the question remains:
>
> There is actually no way to do that on windows without modifying the
> interpreter somehow. This was somehow discussed a bit at PyCon when talking
> about windows packaging:
>
>  1. the simple way to share DLLs across extensions is to put them in the
> %PATH%, but that's horrible.
>  2. there are ways to put DLLs in a shared directory *not* in the %PATH%
> since at least windows XP SP2 and above, through the SetDllDirectory API.
>
> With 2., you still have the issue of DLL hell, which may be resolved through
> naming and activation contexts. I had a brief chat with Steve where he
> mentioned that this may be a solution, but he was not 100 % sure IIRC. The
> main drawback of this solution is that it won't work when inheriting virtual
> environments (as you can only set a single directory).
>
> FWIW, we are about to deploy 2. @ Enthought (where we control the python
> interpreter, so it is much easier for us).

This is indeed precisely the issue. In general, Python code can run
with "the executable" being in many different places - there are the
standard installs, virtualenvs, and embedding scenarios to consider.
So "put DLLs alongside the executable", which is often how Windows
applications deal with this issue, is not a valid option (that's an
option David missed out above, but that's fine as it doesn't work :-))

Putting DLLs on %PATH% *does* cause problems, and pretty severe ones.
People who use ports of Unix tools, such as myself, hit this a lot -
at one point I got so frustrated with various incompatible versions of
libintl showing up on my PATH, all with the same name, that I went on
a spree of rebuilding all of the GNU tools without libintl support,
just to avoid the issue (and older versions openssl were just as bad
with libeay, etc).

So, as David says, you pretty much have to use SetDllDirectory and
similar features to get a viable location for shared DLLs. I guess it
*may* be possible to call those APIs from a Python extension that you
load *before* using any shared DLLs, but that seems like a very
fragile solution. It's also possible for Python 3.6+ to add a new
"shared DLLs" location for such things, which the core interpreter
includes (either via SetDllDirectory or by the same mechanism that
adds C:\PythonXY\DLLs to the search path at the moment). But that
wouldn't help older versions.

So while I encourage Chris' enthusiasm in looking for a solution to
this issue, I'm not sure it's as easy as he's hoping.

Paul
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Wes Turner
On May 16, 2015 4:55 AM, "Nick Coghlan"  wrote:
>
> On 16 May 2015 at 04:34, Donald Stufft  wrote:
> > So I can’t speak for ReadTheDocs, but I believe that they are
considering
> > and/or are planning on offering arbitrary HTML uploads similarly to how
> > you can upload documentation to PyPI. I don’t know if this will actually
> > happen and what it would look like but I know they are thinking about
it.
>
> I've never tried it with ReadTheDocs, but in theory the ".. raw::
> html" docutils directive allows arbitrary HTML content to be embedded
> in a reStructuredText page.
>
> Regardless, "Can ReadTheDocs do X?" questions are better asked on
> https://groups.google.com/forum/#!forum/
read-the-docs
,

ReadTheDocs is hiring!

https://blog.readthedocs.com/read-the-docs-is-hiring/

> while both
> GitHub and Atlassian (via BitBucket) offer free static HTML hosting.

I just wrote a tool (pypi:pgs) for serving files over HTTP directly from
gh-pages branches that works in conjunction with pypi:ghp-import (>
gh-pages; touch .nojekyll).

CloudFront DNS can sort of be used to add TLS/SSL to custom domains with
GitHub Pages (and probably BitBucket)

>
> In relation to the original question, +1 for attempting to phase out
> PyPI's documentation hosting capability in favour of delegating to
> RTFD or third party static HTML hosting. One possible option to
> explore that minimises disruption for existing users might be to stop
> offering it to *new* projects, while allowing existing projects to
> continue uploading new versions of their documentation.

- [ ] DOC: migration / alternatives guide

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
> ___
> Distutils-SIG maillist  -  Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI and Uploading Documentation

2015-05-16 Thread Nick Coghlan
On 16 May 2015 at 04:34, Donald Stufft  wrote:
> So I can’t speak for ReadTheDocs, but I believe that they are considering
> and/or are planning on offering arbitrary HTML uploads similarly to how
> you can upload documentation to PyPI. I don’t know if this will actually
> happen and what it would look like but I know they are thinking about it.

I've never tried it with ReadTheDocs, but in theory the ".. raw::
html" docutils directive allows arbitrary HTML content to be embedded
in a reStructuredText page.

Regardless, "Can ReadTheDocs do X?" questions are better asked on
https://groups.google.com/forum/#!forum/read-the-docs, while both
GitHub and Atlassian (via BitBucket) offer free static HTML hosting.

In relation to the original question, +1 for attempting to phase out
PyPI's documentation hosting capability in favour of delegating to
RTFD or third party static HTML hosting. One possible option to
explore that minimises disruption for existing users might be to stop
offering it to *new* projects, while allowing existing projects to
continue uploading new versions of their documentation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig