[issue42738] subprocess: don't close all file descriptors by default (close_fds=False)

2021-12-17 Thread Cristian Rodriguez


Cristian Rodriguez  added the comment:

My 2 CLP.
On Linux/glibc you do not have to do this: there is a 
posix_spawn_file_actions_addclosefrom_np action, implemented using 
close_range(2), which gives you the fastest way to do this job without hassle.

For the non-posix_spawn case, iterate over /proc/self/fd and set the CLOEXEC 
flag using fcntl, if close_range is not available or returns a non-zero 
result.
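
The fallback described above can be sketched roughly like this (a hedged sketch, assuming Linux with /proc mounted; `set_cloexec_from` is a hypothetical helper name, not an existing API):

```python
import fcntl
import os

def set_cloexec_from(min_fd=3):
    """Fallback when close_range() is unavailable: walk /proc/self/fd
    and mark every FD (except stdio) close-on-exec."""
    for name in os.listdir("/proc/self/fd"):
        fd = int(name)
        if fd < min_fd:
            continue
        try:
            flags = fcntl.fcntl(fd, fcntl.F_GETFD)
            fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
        except OSError:
            # The directory FD used internally by listdir() may already
            # be closed by the time we get to it.
            pass
```

Setting CLOEXEC instead of closing is what makes this usable before exec in the parent: the FDs stay open locally but are not inherited by children.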

--
nosy: +judas.iscariote

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42738] subprocess: don't close all file descriptors by default (close_fds=False)

2021-10-27 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

Gregory P. Smith wrote:
> A higher level "best practices for launching child processes module" with 
> APIs reflecting explicit intents (performance vs security vs simplicity) 
> rather than requiring users to understand subprocess platform specific 
> details may be a good idea at this point (on PyPI I assume).

Interesting that you say that. subprocess was originally written with exactly 
this idea in mind - to create a module which deals with all the platform 
details, so that the user doesn't have to know about them: 
https://www.python.org/dev/peps/pep-0324/#motivation

On the topic itself: I believe Python should come with safe and usable 
defaults, but provide ways to enable alternative approaches which have more 
performance if the user so decides (by e.g. passing in a parameter). In such 
cases, the user then becomes responsible for understanding the implications.

Since there already is a parameter, expert users can already gain more 
performance by using it and then explicitly closing only the FDs the user knows 
can/should be closed in the new process.

Perhaps there's a middle ground: have subprocess only loop over the first 64k 
FDs by default rather than all possible ones.

--
nosy: +lemburg




[issue42738] subprocess: don't close all file descriptors by default (close_fds=False)

2021-10-27 Thread Alexey Izbyshev


Alexey Izbyshev  added the comment:

> As a concrete example, we have a (non-Python) build system and task runner 
> that orchestrates many tasks to run in parallel. Some of those tasks end up 
> invoking Python scripts that use subprocess.run() to run other programs. Our 
> task runner intentionally passes an inheritable file descriptor that is 
> unique to each task as a form of a keep-alive token; if the child processes 
> continue to pass inheritable file descriptors to their children, then we can 
> determine whether all of the processes spawned from a task have terminated by 
> checking whether the last open handle to that file descriptor has been 
> closed. This is particularly important when a process exits before its 
> children, sometimes uncleanly due to being force-killed by the system or by a 
> user.

I don't see how such a scheme could be usable in a general-purpose build system. 
Python is just one example of a program that closes file descriptors for child 
processes, but there may be arbitrarily many more. In general, there is no 
guarantee that a descriptor will be inherited across a process tree if it 
contains at least one process running code that you don't fully control. To 
properly control process trees, I'd suggest considering 
prctl(PR_SET_CHILD_SUBREAPER) or PID namespaces on Linux and job objects on 
Windows (I don't know about other OSes).
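
On Linux, the subreaper approach can be sketched via ctypes (a minimal sketch; the constant value is taken from <linux/prctl.h>, and `become_subreaper` is a hypothetical helper name):

```python
import ctypes

PR_SET_CHILD_SUBREAPER = 36  # from <linux/prctl.h>

def become_subreaper():
    """Mark this process as a child subreaper: orphaned descendants are
    reparented to us instead of to init, so we can reap the whole tree
    and reliably observe when every process under a task has exited."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(),
                      "prctl(PR_SET_CHILD_SUBREAPER) failed")
```

Unlike the inherited-FD token, this does not depend on intermediate processes cooperating with FD inheritance.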

> Regarding security, PEP 446 already makes it so that any files opened from 
> within a Python program are non-inheritable by default, which I agree is a 
> good default. One can make the argument that it's not Python's job to enforce 
> a security policy on file descriptors that a Python process has inherited 
> from a parent process, since Python cannot distinguish from descriptors that 
> were accidentally or intentionally inherited.

While I agree that such an argument could be made, closing FDs for children also 
affects FDs opened by C extensions, which are not subject to PEP 446. And this 
is important not only for security: for example, it's easy to create a deadlock 
if one process is blocked on a read() from a pipe, which it expects to be 
closed at a proper moment, but the pipe write end is leaked to some unrelated 
long-running process which keeps it alive indefinitely without being aware of 
it.

And again, even if subprocess didn't close FDs by default (or somehow closed 
only non-inherited FDs), how could this be used in a wider picture? Any 
third-party library (even written in Python) could still decide to close e.g. 
some range of descriptors, so one wouldn't be able to rely on FD inheritance in 
a general setting, just as one can't rely on inheritance of environment 
variables (works most of the time, but no promises). Traditional Unix things 
like inheritance across fork()/exec() and no O_CLOEXEC by default are 
convenient for quick hacks, but my impression is that such defaults are 
considered to be design bugs in the modern setting by many.

--




[issue42738] subprocess: don't close all file descriptors by default (close_fds=False)

2021-10-26 Thread Richard Xia


Richard Xia  added the comment:

I'd like to provide another, non-performance-related use case for changing the 
default value of Popen's close_fds parameter back to False.

In some scenarios, a (non-Python) parent process may want its descendant 
processes to inherit a particular file descriptor and for each descendant 
process to pass that file descriptor on to its own children. In this scenario, a 
Python program may just be an intermediate script that calls out to multiple 
subprocesses, and closing the inheritable file descriptors by default would 
interfere with the parent process's ability to pass on that file descriptor to 
descendants.

As a concrete example, we have a (non-Python) build system and task runner that 
orchestrates many tasks to run in parallel. Some of those tasks end up invoking 
Python scripts that use subprocess.run() to run other programs. Our task runner 
intentionally passes an inheritable file descriptor that is unique to each task 
as a form of a keep-alive token; if the child processes continue to pass 
inheritable file descriptors to their children, then we can determine whether 
all of the processes spawned from a task have terminated by checking whether 
the last open handle to that file descriptor has been closed. This is 
particularly important when a process exits before its children, sometimes 
uncleanly due to being force-killed by the system or by a user.

In our use case, Python's default value of close_fds=True interferes with our 
tracking scheme, since it prevents Python's subprocesses from inheriting that 
file descriptor, even though that file descriptor has intentionally been made 
inheritable.

While we are able to work around the issue by explicitly setting 
close_fds=False in as much of our Python code as possible, it's difficult to 
enforce this globally since we have many small Python scripts. We also have no 
control over any third party libraries that may possibly call Popen.
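
For the scripts one does control, the existing pass_fds parameter already expresses this intent without giving up FD hygiene for everything else; a minimal sketch (`run_with_keepalive` is a hypothetical helper name, and how the FD number is communicated by the task runner is left out):

```python
import os
import subprocess
import sys

def run_with_keepalive(cmd, keepalive_fd):
    """Forward only the keep-alive token FD to the child; every other
    FD is still closed, since providing pass_fds forces close_fds=True."""
    return subprocess.run(cmd, pass_fds=(keepalive_fd,))
```

subprocess keeps the FDs listed in pass_fds at the same numbers in the child and handles their inheritability, so the child sees the token at the FD number the parent knows about.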

Regarding security, PEP 446 already makes it so that any files opened from 
within a Python program are non-inheritable by default, which I agree is a good 
default. One can make the argument that it's not Python's job to enforce a 
security policy on file descriptors that a Python process has inherited from a 
parent process, since Python cannot distinguish from descriptors that were 
accidentally or intentionally inherited.

--
nosy: +richardxia




[issue42738] subprocess: don't close all file descriptors by default (close_fds=False)

2020-12-28 Thread Eryk Sun


Eryk Sun  added the comment:

> Python 3.7 added the support for the PROC_THREAD_ATTRIBUTE_HANDLE_LIST 
> in subprocess.STARTUPINFO: lpAttributeList['handle_list'] parameter.

The motivating reason to add support for the WinAPI handle list was to allow 
changing the default to close_fds=True regardless of the need to inherit 
standard handles. However, even when using the handle list, one still has to 
make each handle in the list inheritable. Thus concurrent calls to os.system() 
and os.spawn*() -- which are not implemented via subprocess.Popen(), but should 
be -- may leak the handles in the list. If the default is changed to 
close_fds=False, then by default concurrent Popen() calls may also leak 
temporarily inheritable handles when a handle list isn't used to constrain 
inheritance.

Background

Windows implicitly duplicates standard I/O handles from a parent process to a 
child process if they're both console applications and the child inherits the 
console session. However, subprocess.Popen() requires standard I/O inheritance 
to work consistently even without an inherited console session. It explicitly 
inherits standard handles in the STARTUPINFO record, which requires 
CreateProcessW to be called with bInheritHandles as TRUE. In 3.7+, Popen() also 
passes the standard-handle values in a PROC_THREAD_ATTRIBUTE_HANDLE_LIST 
attribute that constrains inheritance, but handles in the list still have to be 
made inheritable before calling CreateProcessW. Thus they may be leaked by 
concurrent CreateProcessW calls that inherit handles without a constraining 
handle list.

--
nosy: +eryksun




[issue42738] subprocess: don't close all file descriptors by default (close_fds=False)

2020-12-27 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

Note that vfork() support has been merged for 3.10 via bpo-35823, so 
posix_spawn() is less of a performance carrot than it used to be on Linux.  
vfork() also exists on macOS; that code could likely be enabled there after some 
investigation and testing.

Regardless, changing this default sounds difficult due to the variety of things 
depending on the existing behavior - potentially for security issues as you've 
noted - when running in a process with other file descriptors potentially not 
managed by Python (ie: extension modules) that don't explicitly use CLOEXEC.

The subprocess APIs are effectively evolving to become lower level over time as 
we continually find warts in them that need addressing but find defaults that 
cannot change due to existing uses.  A higher level "best practices for 
launching child processes module" with APIs reflecting explicit intents 
(performance vs security vs simplicity) rather than requiring users to 
understand subprocess platform specific details may be a good idea at this 
point (on PyPI I assume).

We changed posix close_fds default to True in 3.2 when Jeffrey and I wrote 
_posixsubprocess to better match the behavior most users actually want - 
undoing that doesn't feel right.

--
type:  -> performance




[issue42738] subprocess: don't close all file descriptors by default (close_fds=False)

2020-12-26 Thread Alexey Izbyshev


Alexey Izbyshev  added the comment:

> Using close_fds=False, subprocess can use posix_spawn() which is safer and 
> faster than fork+exec. For example, on Linux, the glibc implements it as a 
> function using vfork which is faster than fork if the parent allocated a lot 
> of memory. On macOS, posix_spawn() is even a syscall.

On Linux, unless you care specifically about users running Python 3.10+ on older 
kernels, implementing support for the close_range() syscall in subprocess would 
provide a better net benefit. This is because (a) the performance advantage of 
posix_spawn() is no longer relevant on Linux after bpo-35823, and (b) supporting 
close_range() would benefit even those users who still need close_fds=True.

--
nosy: +izbyshev




[issue42738] subprocess: don't close all file descriptors by default (close_fds=False)

2020-12-25 Thread STINNER Victor


New submission from STINNER Victor :

To make subprocess faster, I propose to no longer close all file descriptors by 
default in subprocess: change Popen close_fds parameter default to False 
(close_fds=False).

Using close_fds=False, subprocess can use posix_spawn() which is safer and 
faster than fork+exec. For example, on Linux, the glibc implements it as a 
function using vfork which is faster than fork if the parent allocated a lot of 
memory. On macOS, posix_spawn() is even a syscall.
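
For illustration, the fast path is already exposed at the Python level as os.posix_spawn(); a minimal sketch of the kind of call subprocess could take internally (`spawn_and_wait` is a hypothetical helper name):

```python
import os
import sys

def spawn_and_wait(argv):
    """Launch a child via the posix_spawn() family instead of fork+exec;
    with close_fds=False, subprocess can use this faster, safer path."""
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

The point is that posix_spawn() has no hook for closing arbitrary FDs in a loop between fork and exec, which is why close_fds=True currently blocks this path.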

The main drawback is that close_fds=False reopens a security vulnerability if a 
file descriptor is not marked as non-inheritable. PEP 446 "Make newly created 
file descriptors non-inheritable" was implemented in Python 3.4: Python should 
only create non-inheritable FDs, and if that is not the case, Python should be 
fixed. Sadly, third-party Python modules may not implement PEP 446. In that 
case, close_fds=True should be used explicitly, or these modules should be 
fixed. os.set_inheritable() can be used to make FDs non-inheritable.
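
The PEP 446 default and the explicit opt-in look like this (a minimal sketch):

```python
import os

r, w = os.pipe()             # PEP 446: both ends are non-inheritable by default
os.set_inheritable(w, True)  # explicit opt-in so a child can inherit this end
```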

close_fds=True has a cost on subprocess performance. When the maximum number of 
file descriptors is larger than 10,000 and Python has no way to list open file 
descriptors, calling close(fd) once per file descriptor can take several 
milliseconds. When I wrote PEP 446 (in 2013), on a FreeBSD buildbot with 
MAXFD=655,000, closing all FDs took 300 ms (see bpo-11284 "slow close file 
descriptors").

FreeBSD has been fixed recently by using closefrom() function which makes 
_posixsubprocess and os.closerange() faster.

In 2020, my Red Hat colleagues still report the issue in Linux containers 
using... Python 2.7, since Python 2.7 subprocess also closes all file 
descriptors in a loop (there was no code to list open file descriptors). The 
problem still exists in Python 3 if subprocess cannot open the /proc/self/fd/ 
directory, when the /proc pseudo-filesystem is not mounted (or access is 
blocked, e.g. by a sandbox). The problem is that some containers are created 
with a very high limit for the maximum number of FDs: 
os.sysconf("SC_OPEN_MAX") returns 1,048,576. Calling close() more than a 
million times is slow...

See also related issue bpo-38435 "Start the deprecation cycle for subprocess 
preexec_fn".

--

Notes about close_fds=True.

Python 3.9 can now use closefrom() function on FreeBSD: bpo-38061.

Linux 5.10 has a new close_range() syscall: https://lwn.net/Articles/789023/

Linux 5.11 (not released yet) will add a new CLOSE_RANGE_CLOEXEC flag to 
close_range(): https://lwn.net/Articles/837816/

--

History of the close_fds parameter default value.

In Python 2.7, subprocess didn't close all file descriptors by default: 
close_fds=False by default.

Dec 4, 2010: In Python 3.2 (bpo-7213, bpo-2320), subprocess.Popen started to 
emit a deprecating warning when close_fds was not specified explicitly (commit 
d23047b62c6f885def9020bd9b304110f9b9c52d):

+if close_fds is None:
+# Notification for http://bugs.python.org/issue7213 & issue2320
+warnings.warn(
+'The close_fds parameter was not specified.  Its default'
+' will change from False to True in a future Python'
+' version.  Most users should set it to True.  Please'
+' update your code explicitly set close_fds.',
+DeprecationWarning)

Dec 13 2010, bpo-7213: close_fds default value was changed to True on 
non-Windows platforms, and False on Windows (commit 
f5604853889bfbbf84b48311c63c0e775dff38cc). The implementation was adjusted in 
bpo-6559 (commit 8edd99d0852c45f70b6abc851e6b326d4250cd33) to use a new 
_PLATFORM_DEFAULT_CLOSE_FDS singleton object.

See issues:

* bpo-2320: Race condition in subprocess using stdin
* bpo-6559: add pass_fds parameter to subprocess.Popen()
* bpo-7213: subprocess leaks open file descriptors between Popen instances 
causing hangs

--

On Windows, there is also the question of handles (HANDLE type). Python 3.7 
added support for PROC_THREAD_ATTRIBUTE_HANDLE_LIST in subprocess.STARTUPINFO: 
the lpAttributeList['handle_list'] parameter. Fortunately, Windows has a much 
better default than Unix: all handles are created non-inheritable by default, 
so there is no need to explicitly close them.

--
components: Library (Lib)
messages: 383739
nosy: gregory.p.smith, vstinner
priority: normal
severity: normal
status: open
title: subprocess: don't close all file descriptors by default (close_fds=False)
versions: Python 3.10
