[issue23649] tarfile not re-entrant for multi-threading

2019-06-10 Thread STINNER Victor


Change by STINNER Victor:


--
nosy:  -vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23649] tarfile not re-entrant for multi-threading

2019-06-09 Thread Jeffrey Kintscher


Change by Jeffrey Kintscher:


--
nosy: +Jeffrey.Kintscher




[issue23649] tarfile not re-entrant for multi-threading

2018-04-07 Thread Xavier de Gaye

Xavier de Gaye added the comment:

extract_from_pkgs() in the attached extract_from_packages.py script extracts
/etc files from the tar files in PKG_DIR into WORK_DIR using a
ThreadPoolExecutor. (When used to extract all the /etc files from the packages
that make up a whole ArchLinux system, a ThreadPoolExecutor halves the elapsed
time.) Running the script, which tests this function, fails randomly with the
same error as reported by Srdjan in msg237961.

Replacing ThreadPoolExecutor with ProcessPoolExecutor also fails randomly.

Using the safe_makedirs() context manager to enclose the statements that run
ThreadPoolExecutor fixes the problem.

Obviously this is not a thread-safety problem (it also occurs with
ProcessPoolExecutor) but a problem with the robustness of the tarfile module
under concurrent access. The problem is insidious in that it may never occur
in an application's test suite.

--
nosy: +xdegaye
Added file: https://bugs.python.org/file47523/extract_from_packages.py




[issue23649] tarfile not re-entrant for multi-threading

2015-03-16 Thread Lars Gustäbel

Lars Gustäbel added the comment:

I agree with David that there is no need for tarfile to be thread-safe. There 
is nothing to be gained from distributing one TarFile object among multiple 
threads because it operates on a single resource which has to be accessed 
sequentially anyway. So, it seems best to me if we leave it like it is and let 
the user add locks around it as she/he sees fit.
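
A minimal sketch of the user-side locking suggested here, under stated
assumptions: build_demo_archive() and extract_all_locked() are hypothetical
helper names, the archive is generated on the fly, and the filter argument is
passed only on Python versions whose tarfile module provides extraction
filters. It is not taken from any patch attached to this issue.

```python
import io
import os
import tarfile
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def build_demo_archive(path, count=8):
    """Write a small gzip-compressed archive with `count` text members."""
    with tarfile.open(path, "w:gz") as tar:
        for i in range(count):
            data = f"payload {i}\n".encode()
            info = tarfile.TarInfo(name=f"etc/file{i}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def extract_all_locked(archive_path, dest, workers=4):
    """Extract every member from worker threads, one extract() at a time."""
    lock = threading.Lock()  # guards the shared TarFile and its file pointer
    # Only pass filter= where the running Python supports extraction filters.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()

        def extract_one(member):
            with lock:  # seek + read + directory creation happen atomically
                tar.extract(member, dest, **extra)
            return member.name

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sorted(pool.map(extract_one, members))
```

Because every extract() call holds the lock, the extraction itself is
serialised; the lock buys correctness, not parallelism, which is consistent
with the point that the underlying resource has to be accessed sequentially
anyway.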

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23649
___



[issue23649] tarfile not re-entrant for multi-threading

2015-03-16 Thread STINNER Victor

STINNER Victor added the comment:

Lars Gustäbel added the comment:
> I agree with David that there is no need for tarfile to be thread-safe. There
> is nothing to be gained from distributing one TarFile object among multiple
> threads because it operates on a single resource which has to be accessed
> sequentially anyway. So, it seems best to me if we leave it like it is and
> let the user add locks around it as she/he sees fit.

In asyncio, it was a design choice to not be thread-safe, to allow
more optimizations and support multiple implementations of asyncio,
without this important constraint.

I recently modified the asyncio documentation to warn users, in each class,
that asyncio objects are *not* thread-safe, with an explanation of how to
use asyncio correctly with threads.

https://docs.python.org/dev/library/asyncio-eventloop.html#asyncio.BaseEventLoop
This class is not thread safe.
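
As an illustration of what using asyncio correctly with threads can look
like, here is a small sketch built on loop.call_soon_threadsafe(). The
function names main(), worker() and on_loop_thread() are hypothetical, and
the example is not quoted from the asyncio documentation.

```python
import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    done = asyncio.Event()
    results = []

    def on_loop_thread(value):
        # Runs in the event loop's thread, so touching asyncio objects is safe.
        results.append(value)
        done.set()

    def worker():
        # Runs in a separate thread: hand the callback over to the loop
        # instead of calling asyncio objects directly.
        loop.call_soon_threadsafe(on_loop_thread, 42)

    thread = threading.Thread(target=worker)
    thread.start()
    await done.wait()
    thread.join()
    return results
```

Running asyncio.run(main()) drives the loop until the worker thread's
callback has been executed on the loop's own thread.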

A similar change to the tarfile documentation is probably enough.

--




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

New submission from Srdjan Grubor:

When TarFile.extract() is called from multiple threads, the archive's read
pointer is not protected from simultaneous seeks, which causes various
convoluted bugs:

  some code
    self.archive_object.extract(member, extraction_path)
  File "/usr/lib/python3.4/tarfile.py", line 2019, in extract
    set_attrs=set_attrs)
  File "/usr/lib/python3.4/tarfile.py", line 2088, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/lib/python3.4/tarfile.py", line 2127, in makefile
    source.seek(tarinfo.offset_data)
  File "/usr/lib/python3.4/gzip.py", line 573, in seek
    self.read(1024)
  File "/usr/lib/python3.4/gzip.py", line 365, in read
    if not self._read(readsize):
  File "/usr/lib/python3.4/gzip.py", line 449, in _read
    self._read_eof()
  File "/usr/lib/python3.4/gzip.py", line 485, in _read_eof
    hex(self.crc)))
OSError: CRC check failed 0x1036a2e1 != 0x0

--
messages: 237960
nosy: sgnn7
priority: normal
severity: normal
status: open
title: tarfile not re-entrant for multi-threading
type: behavior
versions: Python 3.4




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Changes by Srdjan Grubor sg...@sgnn7.org:


--
type: behavior -> enhancement




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

Also, _extract_member() in tarfile.py is not thread-safe: the check for the
directory's existence can run while another thread is creating that same
directory, causing the code to error out.

  File "/usr/lib/python3.4/concurrent/futures/thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "./xdelta3-dir-patcher", line 499, in _apply_file_delta
    archive_object.expand(patch_file, staging_dir)
  File "./xdelta3-dir-patcher", line 284, in expand
    self.archive_object.extract(member, extraction_path)
  File "/usr/lib/python3.4/tarfile.py", line 2019, in extract
    set_attrs=set_attrs)
  File "/usr/lib/python3.4/tarfile.py", line 2080, in _extract_member
    os.makedirs(upperdirs)
  File "/usr/lib/python3.4/os.py", line 237, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/tmp/XDelta3DirPatcher_is0y4_5f/xdelta/updated folder'

Code causing problems:
2065     def _extract_member(self, tarinfo, targetpath, set_attrs=True):
...
2075         # Create all upper directories.
2076         upperdirs = os.path.dirname(targetpath)
2077         if upperdirs and not os.path.exists(upperdirs):
...
2080             os.makedirs(upperdirs)  # Fails since the dir might already
                                         # be created between lines 2077 and 2080
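
The window between the check and the creation can be reproduced
deterministically. The sketch below is only an illustration: check_then_create()
and demo_race() are hypothetical names, and the `between` hook stands in for
whatever another thread or process does after the existence check and before
the makedirs() call.

```python
import os
import tempfile


def check_then_create(path, between=None):
    """Mimic the exists()-then-makedirs() pattern quoted above.

    `between` is a hypothetical hook that simulates another thread running
    after the check and before the creation.
    """
    if not os.path.exists(path):
        if between is not None:
            between()
        os.makedirs(path)


def demo_race():
    """Return True if the simulated interleaving raises FileExistsError."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "updated folder")
        try:
            # The "other thread" creates the directory inside the window.
            check_then_create(target, between=lambda: os.mkdir(target))
        except FileExistsError:
            return True
        return False
```

Calling demo_race() shows the same FileExistsError as in the traceback,
without needing real threads to hit the timing window.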

--




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Changes by Srdjan Grubor sg...@sgnn7.org:


--
type: enhancement -> behavior




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

The tarfile multi-threading problem was fixed for me on the user side by using
threading.Lock(), so it might work to use a lock within the library. The
directory creation could probably be improved by wrapping the makedirs() call
in a try/except that ignores the exception if it is a FileExistsError. Code I
use elsewhere fixes this with:
import os

def _safe_makedirs(self, dir_path):
    try:
        os.makedirs(dir_path)
    # Concurrency problems need to be handled. If two threads create
    # the same dir, there might be a race between them checking and
    # doing makedirs so we handle that as gracefully as possible here.
    except FileExistsError:
        # Re-raise only if the existing path is not a directory.
        if not os.path.isdir(dir_path):
            raise

If I get time, I'll submit a patch but it seems like I probably won't for this.

--




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread R. David Murray

R. David Murray added the comment:

If you want to use an object that has state in more than one thread you 
generally have to put some locking around it.  Unless I'm missing something 
(which I might be) I don't think it is tarfile's responsibility to do this.

--
nosy: +r.david.murray




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@gmail.com:


--
nosy: +haypo, lars.gustaebel
versions: +Python 3.5




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

I don't know if that's true of core libraries. Why complicate things for end
users when those issues could be handled in the library itself and be
completely transparent to the devs? A simple RLock would cause almost no speed
degradation and would work as expected in both threaded and non-threaded
situations.

--
versions:  -Python 3.5




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

After some thinking, the makedirs problem should only need
makedirs(exist_ok=True).
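
A short sketch of that idea, assuming nothing beyond the standard library:
create_concurrently() is a hypothetical helper that lets several threads
create the same nested directory at once with os.makedirs(..., exist_ok=True)
and collects any OSError they hit.

```python
import os
import tempfile
import threading


def create_concurrently(path, n_threads=16):
    """Have many threads create the same directory; return any exceptions."""
    errors = []
    start = threading.Barrier(n_threads)  # release all threads at once

    def worker():
        start.wait()
        try:
            os.makedirs(path, exist_ok=True)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Note that exist_ok=True only tolerates an existing directory; if the path
already exists as a regular file, makedirs() still raises.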

--




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

The whole lib still needs the threading locks added but the patch submitted 
should fix things for people that do the locking from their code.

--




[issue23649] tarfile not re-entrant for multi-threading

2015-03-12 Thread Srdjan Grubor

Srdjan Grubor added the comment:

Patch for the multithreaded expansion of files and use of makedirs.

--
keywords: +patch
Added file: http://bugs.python.org/file38462/mutithreading_tarfile.patch
