[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2022-03-13 Thread Marvin Poul


Marvin Poul  added the comment:

Here's the small patch.  Sadly I have no overview of which Linux kernel
versions are affected.  I guess technically you can call this "working around
a bug in specific Linux versions", but since it's a very minor change that
saves one syscall even on unaffected versions, I feel it's justified.  Let me
know if you'd like any modifications, however.

--
keywords: +patch
Added file: https://bugs.python.org/file50672/shutil.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2022-03-13 Thread Marvin Poul


Marvin Poul  added the comment:

I hope you don't mind me necro-posting, but I ran into this issue again and
have a small patch to solve it.

I attached an MWE that triggers the BlockingIOError reliably on ext4
filesystems on Linux 4.12.14 with Python 3.8.12.  Running it under strace -e
sendfile gives the following output:

# manually calling sendfile to check that it works
> sendfile(5, 4, [0] => [8388608], 8388608) = 8388608
# sendfile calls originating in shutil.copy
> sendfile(5, 4, [0] => [8388608], 8388608) = 8388608
> sendfile(5, 4, [8388608], 8388608)  = -1 EAGAIN (Resource temporarily unavailable)
> Shutil Failed!
> [Errno 11] Resource temporarily unavailable: '/cmmc/u/zora/scratch/sendfile_bug/tmpaqx2o4uj' -> '/cmmc/u/zora/scratch/sendfile_bug/tmpb8rzg8rg'
> +++ exited with 0 +++

This shows that the first call to sendfile actually copies the whole file and
the EAGAIN is only triggered on the second, unnecessary, call.  I have
confirmed with a small C program that it is triggered whenever sendfile's
offset + count exceeds the size of the file behind in_fd.  This is weird
behaviour on the kernel's side that seems to have changed in newer kernel
versions (the issue is not present e.g. on my 5.16.12 laptop).

Anyway, my patch avoids that second call by keeping track of the file size and
the bytes written so far.  It's against the current Python main branch, but if
I see correctly this part hasn't changed in years.  I have checked that the
error is not raised when the patch is applied.
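
For illustration, the idea of tracking the file size and the bytes written so far might look roughly like the sketch below. This is not the attached patch; the function name copy_with_sendfile and the default block size are assumptions made for the example, and it relies on os.sendfile accepting two regular files, which holds on Linux.

```python
import os


def copy_with_sendfile(src_path, dst_path, blocksize=8 * 1024 * 1024):
    """Copy src_path to dst_path via os.sendfile without requesting past EOF.

    Hypothetical helper: each call asks for at most the bytes that are still
    left, so offset + count never exceeds the source size seen at open time.
    """
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        infd, outfd = fsrc.fileno(), fdst.fileno()
        remaining = os.fstat(infd).st_size
        offset = 0
        while remaining > 0:
            count = min(blocksize, remaining)
            sent = os.sendfile(outfd, infd, offset, count)
            if sent == 0:
                # The source shrank after fstat(); nothing more to copy.
                break
            offset += sent
            remaining -= sent
    return offset
```

Because the loop ends as soon as the remaining byte count reaches zero, the extra sendfile call at end of file seen in the strace output is never issued.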

(I can only attach one file, so patch is attached in a new one.)

--
nosy: +pmrv
Added file: https://bugs.python.org/file50671/sendfile.py




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-05-10 Thread Giampaolo Rodola'


Change by Giampaolo Rodola' :


--
pull_requests: +24674
pull_request: https://github.com/python/cpython/pull/26024




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-05-10 Thread Giampaolo Rodola'


Giampaolo Rodola'  added the comment:

> The question seems to be whether it should be okay to raise
> _GiveupOnFastCopy after a partial (incomplete) copy has already occurred via
> sendfile.

I think it should not. For posterity: my rationale for introducing
_USE_CP_SENDFILE was to allow monkey patching for corner cases such as this one
(see also bpo-36610 / GH-13675), but to expose it as a private name because I
expected such cases to be rare and likely due to a broken underlying
implementation, as appears to be the case here. FWIW, I consider using
_USE_CP_SENDFILE in production code legitimate, and as such it should stay
private but never be removed.

--
versions: +Python 3.9




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-05-10 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

The logic for bailing out to a slow copy is currently:

https://github.com/python/cpython/blob/main/Lib/shutil.py#L158

That condition does not appear to be met in Alexei's test, which suggests that
either at least one sendfile call succeeded, and thus offset is non-zero, or
the lseek failed.

Run that test under pdb and walk through the code, or under strace to look at
the syscalls, and find out.

The question seems to be whether it should be okay to raise _GiveupOnFastCopy
after a partial (incomplete) copy has already occurred via sendfile.
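
As a rough paraphrase of the bail-out logic linked above, the decision can be sketched as a small pure function. The function name, its parameters, and the string return values are hypothetical, and the conditions reflect one reading of shutil.py around the time of this thread; other Python versions may differ.

```python
import errno


def classify_sendfile_error(err_no, offset, out_position):
    """Return "fallback" or "raise" for a failed os.sendfile call.

    Hypothetical paraphrase of the checks in shutil._fastcopy_sendfile:
    err_no is the errno of the OSError, offset is the number of bytes
    already sent, and out_position is the current position of the
    destination file descriptor.
    """
    if err_no == errno.ENOTSOCK:
        # sendfile cannot copy between regular files on this platform.
        return "fallback"
    if err_no == errno.ENOSPC:
        # The filesystem is full; a slower copy would fail as well.
        return "raise"
    if offset == 0 and out_position == 0:
        # First call failed and nothing was written: safe to fall back.
        return "fallback"
    # Some data may already have been copied, so the error is re-raised.
    return "raise"
```

Under this reading, an EAGAIN that arrives after a first successful sendfile call lands in the final branch, which matches the "raise err" frame in the reported tracebacks.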

--
nosy: +giampaolo.rodola




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-05-10 Thread Pablo Conesa


Pablo Conesa  added the comment:

So, is it OK not to raise _GiveupOnFastCopy(err) when the fast copy fails?

I can understand that the fast copy might fail, but then the give-up part
should happen, and it didn't.

Additionally, could _USE_CP_SENDFILE optionally be taken from an environment
variable, to disable the fast copy once we know it will fail?
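
Nothing like this exists in shutil itself, but an application could approximate the environment-variable idea on its own. In the sketch below the variable name MYAPP_DISABLE_SENDFILE and the function apply_sendfile_override are made up for illustration; only the private attribute shutil._USE_CP_SENDFILE comes from this thread.

```python
import os
import shutil


def apply_sendfile_override(environ=os.environ):
    """Disable shutil's sendfile fast path if the opt-out variable is set.

    MYAPP_DISABLE_SENDFILE is a hypothetical, application-defined variable;
    shutil does not read it on its own. Returns True if the fast path was
    switched off.
    """
    value = environ.get("MYAPP_DISABLE_SENDFILE", "")
    if value.strip().lower() in {"1", "true", "yes"}:
        # Private attribute: checked by shutil.copyfile at call time.
        shutil._USE_CP_SENDFILE = False
        return True
    return False
```

Calling apply_sendfile_override() once at application start-up, before any copy takes place, would be enough for this approach.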

--




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-05-06 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

I don't believe CPython should be working around a bug in specific Linux kernel
versions in the standard library unless it is extremely pernicious and is not
considered a bug, and thus will never be fixed, in the OS kernel.

As the sendfile system call appears to infinitely return one of EAGAIN, 
EALREADY, EWOULDBLOCK, or EINPROGRESS in this case, there isn't anything 
CPython could do.  A retry/backoff loop won't help.

This should be worked around at the application level by whatever means are 
appropriate.
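
One possible application-level workaround, sketched under the assumption that redoing the whole copy is acceptable, is to catch the BlockingIOError and repeat the copy with plain reads and writes. The helper name robust_copyfile and the buffer size are hypothetical.

```python
import shutil


def robust_copyfile(src, dst, bufsize=1024 * 1024):
    """Copy src to dst, retrying without the fast path on EAGAIN.

    Hypothetical helper: if shutil.copyfile raises BlockingIOError, the
    destination is reopened in "wb" mode, which truncates any partial data,
    and the file is copied again with ordinary read/write calls.
    """
    try:
        return shutil.copyfile(src, dst)
    except BlockingIOError:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            shutil.copyfileobj(fsrc, fdst, bufsize)
        return dst
```

Unlike a retry loop around sendfile, this does not call sendfile a second time, so it is not affected by the call failing indefinitely.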

--
nosy: +gregory.p.smith
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed
type: crash -> behavior




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-05-05 Thread PEAR


PEAR  added the comment:

Most probably related: https://www.ibm.com/support/pages/apar/IJ28891

--




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-04-27 Thread PEAR


Change by PEAR :


--
nosy: +PEAR




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-04-22 Thread Alexei Colin


Change by Alexei Colin :


--
versions: +Python 3.8




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-04-22 Thread Alexei Colin


Change by Alexei Colin :


--
versions: +Python 3.10 -Python 3.8




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-04-22 Thread Alexei Colin


Alexei Colin  added the comment:

Can confirm that this BlockingIOError happens on GPFS (alpine) on the Summit
supercomputer, tested with Python 3.8 and 3.10a7.

I found that it happens only for file sizes of 65536 bytes and above. Minimal
example:

This file size works:

$ rm -f srcfile dstfile && truncate --size 65535 srcfile && python3.10 -c "import shutil; shutil.copyfile(b'srcfile', b'dstfile')"

This file size (and larger) does not work:

$ rm -f srcfile dstfile && truncate --size 65536 srcfile && python3.10 -c "import shutil; shutil.copyfile(b'srcfile', b'dstfile')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/.../usr/lib/python3.10/shutil.py", line 265, in copyfile
    _fastcopy_sendfile(fsrc, fdst)
  File "/.../usr/lib/python3.10/shutil.py", line 162, in _fastcopy_sendfile
    raise err
  File "/.../usr/lib/python3.10/shutil.py", line 142, in _fastcopy_sendfile
    sent = os.sendfile(outfd, infd, offset, blocksize)
BlockingIOError: [Errno 11] Resource temporarily unavailable: b'srcfile' -> b'dstfile'

I tried patching shutil.py to retry the call on this EAGAIN, but subsequent
attempts fail with EAGAIN again indefinitely.

I also use OP's workaround: set _USE_CP_SENDFILE = False in shutil.py
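
The size threshold described above could also be probed from Python instead of the shell. The sketch below is an assumption-laden equivalent of the two commands: probe_copyfile and its directory parameter are hypothetical names, and the sparse source file is created with truncate(), much like `truncate --size`.

```python
import os
import shutil
import tempfile


def probe_copyfile(size, directory=None):
    """Copy a sparse file of `size` bytes and report how it went.

    Hypothetical helper: returns None if shutil.copyfile succeeds, otherwise
    the OSError it raised. `directory` selects the filesystem to test;
    by default the system temporary directory is used.
    """
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        src = os.path.join(tmp, "srcfile")
        dst = os.path.join(tmp, "dstfile")
        with open(src, "wb") as f:
            f.truncate(size)  # sparse file of the requested size
        try:
            shutil.copyfile(src, dst)
        except OSError as err:
            return err
        return None
```

Running it for 65535 and 65536 with `directory` pointing at the filesystem under suspicion would show whether the second size returns a BlockingIOError there; on an unaffected filesystem both calls are expected to return None.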

--
nosy: +alexeicolin




[issue43743] BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.

2021-04-06 Thread Pablo Conesa


New submission from Pablo Conesa :

Hi, one of our users reports that this has started happening on a GPFS
filesystem. Everything has been working fine on NTFS for many years.

I had a look at my shutil code, and I can see the try/except code trying to
fall back to the "slower" copyfileobj(fsrc, fdst).

But judging by the stack trace below, the "catch" is not happening.

Any idea how to fix this?

I guess something like:

import shutil
shutil._USE_CP_SENDFILE = False

should avoid the fast_copy attempt.
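
If switching the flag off for the whole process is too broad, the same idea can be scoped to individual copies. The context manager below is only a sketch: the name sendfile_disabled is hypothetical, and it relies on the private attribute shutil._USE_CP_SENDFILE being checked at call time, which may not cover other fast paths in newer Python versions.

```python
import contextlib
import shutil


@contextlib.contextmanager
def sendfile_disabled():
    """Temporarily turn off shutil's sendfile fast path.

    Hypothetical helper: saves the current value of the private flag,
    sets it to False for the duration of the block, then restores it.
    """
    previous = getattr(shutil, "_USE_CP_SENDFILE", False)
    shutil._USE_CP_SENDFILE = False
    try:
        yield
    finally:
        shutil._USE_CP_SENDFILE = previous
```

A copy would then be wrapped as `with sendfile_disabled(): shutil.copy(source, dest)`. Since the flag is module-global, this is not safe if other threads copy files at the same time.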



> Traceback (most recent call last):
>   File "/opt/pxsoft/scipion/v3/ubuntu20.04/scipion-em-esrf/esrf/workflow/esrf_launch_workflow.py", line 432, in <module>
>     project.scheduleProtocol(prot)
>   File "/opt/pxsoft/scipion/v3/ubuntu20.04/anaconda3/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/project/project.py", line 633, in scheduleProtocol
>     pwutils.path.copyFile(self.dbPath, protocol.getDbPath())
>   File "/opt/px/scipion/v3/ubuntu20.04/anaconda3/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/utils/path.py", line 247, in copyFile
>     shutil.copy(source, dest)
>   File "/opt/pxsoft/scipion/v3/ubuntu20.04/anaconda3/envs/.scipion3env/lib/python3.8/shutil.py", line 415, in copy
>     copyfile(src, dst, follow_symlinks=follow_symlinks)
>   File "/opt/pxsoft/scipion/v3/ubuntu20.04/anaconda3/envs/.scipion3env/lib/python3.8/shutil.py", line 272, in copyfile
>     _fastcopy_sendfile(fsrc, fdst)
>   File "/opt/pxsoft/scipion/v3/ubuntu20.04/anaconda3/envs/.scipion3env/lib/python3.8/shutil.py", line 169, in _fastcopy_sendfile
>     raise err
>   File "/opt/pxsoft/scipion/v3/ubuntu20.04/anaconda3/envs/.scipion3env/lib/python3.8/shutil.py", line 149, in _fastcopy_sendfile
>     sent = os.sendfile(outfd, infd, offset, blocksize)
> BlockingIOError: [Errno 11] Resource temporarily unavailable: 'project.sqlite' -> 'Runs/02_ProtImportMovies/logs/run.db'

--
components: IO
messages: 390297
nosy: p.conesa.mingo
priority: normal
severity: normal
status: open
title: BlockingIOError: [Errno 11] Resource temporarily unavailable: on GPFS.
type: crash
versions: Python 3.8
