Re: Speculative idea: incorporating venv into our Python application packaging advice

2017-05-23 Thread Charalampos Stratakis
Hi Nick,

Very interesting topic and certainly these ideas are worth exploring.

(Subscribing to this thread so I can revisit it more thoroughly in the future).

Regards,

Charalampos Stratakis
Associate Software Engineer
Python Maintenance Team, Red Hat


- Original Message -
From: "Nick Coghlan" <ncogh...@gmail.com>
To: "Fedora Python SIG" <python-devel@lists.fedoraproject.org>
Sent: Sunday, May 21, 2017 6:07:26 AM
Subject: Speculative idea: incorporating venv into our Python application 
packaging advice


Speculative idea: incorporating venv into our Python application packaging advice

2017-05-20 Thread Nick Coghlan
My day job currently involves working on a Python CLI (and potentially
a backing socket-activated service) that needs to run across
Fedora/RHEL/CentOS/SCLs, *without* accidentally exposing a Python
level API that we might inadvertently end up needing to support.

(Note: this CLI is not being, and will likely never be, proposed for
incorporation into Fedora itself - it's a tool to help migrate
applications between different operating system versions without doing
an in-place upgrade, so the update cycles need to be decoupled from
those of the operating system itself)

At the moment, we offer two different ways of installing it (shown as
shell commands after this list):

1. via pipsi, which uses the system Python, but has no access to
system level Python libraries
2. via RPM, which has access to system level Python libraries, exposes
the application's internal libraries for import by other applications
(which we don't really want to do) and also requires that *all*
dependencies be available as system packages
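
For concreteness, the two current paths look roughly like the commands
below; "myapp" is a placeholder, since the tool's actual package and RPM
names aren't given here:

    # Option 1: pipsi installs into a per-user virtual environment that
    # uses the system Python interpreter but cannot see system-level
    # Python libraries
    pipsi install myapp

    # Option 2: a distro package with full access to (and dependence on)
    # system-level Python packages
    sudo dnf install myapp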

Both approaches have significant downsides:

* the pipsi based approach is *too* decoupled from the host OS,
installing things into the virtual environment even when a perfectly
acceptable version is already installed and maintained as a system
package. It also means we can't benefit from distro level patches to
packages like requests, so the app is decoupled from the system
certificate store
* the RPM based approach isn't decoupled from the OS *enough*, so we
can't readily do things like selectively installing private copies of
newer versions of dependencies on RHEL/CentOS, while using the system
packages on Fedora. It also means the Python packages implementing the
application itself are globally available for import rather than only
being usable from within the application

While we haven't implemented it yet, the approach I'm considering to
tackle this problem [1] involves integrating creation of an
app-specific private virtual environment into the definition of the
application RPM, with the following details (roughly sketched as
spec-file fragments after this list):

* unlike pipsi, this virtual environment would be configured to allow
access to the system site packages, giving us the best of both worlds:
we'd use system packages if readily available, otherwise we'd stick
our own pinned dependency in the virtual env and treat it as part of
the application (and hence the app developers' responsibility to keep
up to date)
* we'd come up with some way of turning the Python level dependencies
into additional entries in the RPM's Sources list, and then turn those
into a local sdist index during the %prep phase. That way, we'd
support offline builds automatically, and be well positioned to have
pip autofill any gaps where system level dependencies didn't meet the
needs of the application
* we'd deliberately omit some of the packages injected into the
virtual environment from the resulting RPM (most notably: we'd either
remove pip, wheel, and setuptools, or else avoid installing them in
the first place)
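
To make the shape of that more concrete, here is a rough, untested
sketch of what the relevant spec-file fragments could look like. The
package name, the pinned dependency, and the exact paths are invented
for illustration and aren't taken from the LeApp prototype:

    # Illustrative fragments only - names, versions and paths are placeholders
    Name:           myapp
    Version:        1.0
    Release:        1%{?dist}
    # Regular application source, plus one Source entry per pinned Python sdist
    Source0:        myapp-%{version}.tar.gz
    Source1:        somepinneddep-2.3.tar.gz
    BuildRequires:  python3

    %prep
    %autosetup
    # Collect the bundled sdists into a local directory that pip can use
    # as an offline index via --find-links
    mkdir -p pydeps
    cp %{SOURCE1} pydeps/

    %install
    # App-private virtual environment that can still see system site-packages
    python3 -m venv --system-site-packages %{buildroot}%{_prefix}/lib/%{name}/venv
    # Offline install: pip only fills the gaps that system packages (visible
    # through --system-site-packages) don't already satisfy, using the
    # bundled sdists plus the application itself
    %{buildroot}%{_prefix}/lib/%{name}/venv/bin/pip install \
        --no-index --find-links pydeps .
    # Keep the installer tooling out of the shipped payload
    %{buildroot}%{_prefix}/lib/%{name}/venv/bin/pip uninstall -y pip setuptools wheel
    # (A real spec would also need to fix up the buildroot paths that venv
    # bakes into its scripts and pyvenv.cfg before packaging.)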

Where I think this idea crosses over into being a suitable topic for
the Fedora Python SIG relates to the current modularity initiatives
and various problems we've faced over the years around separating the
challenges of "provide an application that happens to be written in
Python" and "provide a supported Python API as part of the system
Python installation".

Some examples:

* the helper library for the "mock" CLI tool had to be renamed to
"mockbuild" to fix a conflict with the upstream "mock" testing library
* despite officially having no supported public API, people still
write "import pip" instead of running the pip CLI in a subprocess
* ditto for the yum CLI (and even for DNF, some non-trivial changes
were recently needed to better separate the "supported for third party
use with defined backwards compatibility guarantees" APIs from the "for
internal use by the DNF CLI and may change at any time" APIs)

All of those could have been avoided if the recommended structure for
"applications that happen to be written in Python" included a virtual
environment that isolated the "private to the application" Python
modules (including the application's own source code) from the
"intended for third party consumption" public APIs.

In the near term, my own focus is going to be on figuring out the
details of this structure specifically for LeApp, but I wanted to
raise the notion here early so I didn't go down any paths that would
later prove to be an absolute deal-breaker for updating the distro
level recommendations.

Cheers,
Nick.

[1] https://github.com/leapp-to/prototype/issues/126


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
python-devel mailing list -- python-devel@lists.fedoraproject.org
To unsubscribe send an email to python-devel-le...@lists.fedoraproject.org