New submission from Thomas <thger...@hhu.de>:

According to https://docs.python.org/3.5/whatsnew/changelog.html#id108 
(bpo-14099), reading from multiple ZipExtFiles at the same time should be 
thread-safe, but it is not.

I created a small example (test.py) in which two threads read files from the 
same ZipFile simultaneously; it crashes with a "Bad CRC-32" error. This is 
especially surprising because every file in the ZipFile contains only zero 
bytes, so all of them have the same CRC.
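A minimal stress-test sketch of the scenario described above. It is not the attached test.py, whose contents are not reproduced here; the helper names make_zero_zip, read_all and hammer are hypothetical. Several threads share one ZipFile and read every member, all of which contain only zero bytes. Whether any errors are collected depends on the Python version and on thread timing, so an empty result does not by itself show that the reads are safe.

```python
import io
import threading
import zipfile


def make_zero_zip(n_files=20, size=64 * 1024):
    # Build an in-memory archive whose members all contain only zero bytes.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(n_files):
            zf.writestr(f"file{i}.bin", bytes(size))
    buf.seek(0)
    return buf


def read_all(zf, names, errors):
    # Read every member through the SAME ZipFile object and record failures
    # such as zipfile.BadZipFile("Bad CRC-32 for file ...").
    for name in names:
        try:
            with zf.open(name) as f:
                while f.read(4096):
                    pass
        except Exception as exc:
            errors.append(exc)


def hammer(n_threads=2):
    # Start n_threads readers on one shared ZipFile and collect their errors.
    zf = zipfile.ZipFile(make_zero_zip())
    names = zf.namelist()
    errors = []
    threads = [threading.Thread(target=read_all, args=(zf, names, errors))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    zf.close()
    return errors


if __name__ == "__main__":
    errs = hammer()
    print(f"{len(errs)} error(s)", errs[:1])
```

With n_threads=1 there is no concurrency, so that run should always come back clean; only runs with two or more threads exercise the shared file handle.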

My use case is a ZipFile with 82,000 files. Creating a separate ZipFile from 
the same physical zip file for every thread is not a satisfactory workaround, 
because each one takes several seconds to open. Instead, I open the archive 
only once and clone the ZipFile object for each thread:

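import zipfile

# An empty archive is cheap to open, so it serves as a template.
with zipfile.ZipFile("/tmp/dummy.zip", "w") as dummy:
    pass

def clone_zipfile(z):
    z_cloned = zipfile.ZipFile("/tmp/dummy.zip")
    z_cloned.fp.close()  # release the handle to the dummy archive
    # Share the already-parsed index (name -> ZipInfo) of the big archive...
    z_cloned.NameToInfo = z.NameToInfo
    # ...but give the clone its own file handle and file position.
    z_cloned.fp = open(z.fp.name, "rb")
    return z_cloned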

For my use case this is a much better solution than locking. I am using 
multiple threads to finish the task faster, and a lock would serialize the 
reads and defeat that purpose.
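A sketch of how the clone_zipfile() hack might be used with a thread pool, assuming one clone per worker thread kept in threading.local. PerThreadZip and demo are hypothetical names, and the dummy archive is placed in a temporary directory instead of /tmp/dummy.zip. Each worker reads through its own file handle, so the threads never share a lock or a file position.

```python
import os
import tempfile
import threading
import zipfile
from concurrent.futures import ThreadPoolExecutor

# Tiny empty archive used as a cheap template (assumed location: a temp dir).
_DUMMY = os.path.join(tempfile.mkdtemp(), "dummy.zip")
with zipfile.ZipFile(_DUMMY, "w"):
    pass


def clone_zipfile(z):
    # Same hack as in the report: reuse the parsed index, open a private handle.
    z_cloned = zipfile.ZipFile(_DUMMY)
    z_cloned.fp.close()
    z_cloned.NameToInfo = z.NameToInfo
    z_cloned.fp = open(z.fp.name, "rb")
    return z_cloned


class PerThreadZip:
    """Hypothetical helper: lazily creates one clone per calling thread."""

    def __init__(self, z):
        self._z = z
        self._local = threading.local()
        self._clones = []
        self._lock = threading.Lock()  # guards only the bookkeeping list

    def read(self, name):
        zf = getattr(self._local, "zf", None)
        if zf is None:
            zf = self._local.zf = clone_zipfile(self._z)
            with self._lock:
                self._clones.append(zf)
        return zf.read(name)  # no shared state with other threads here

    def close(self):
        with self._lock:
            for zf in self._clones:
                zf.close()
            self._clones.clear()


def demo(n_members=8, workers=4):
    # Build a small on-disk archive, then read it from a thread pool.
    path = os.path.join(tempfile.mkdtemp(), "big.zip")
    with zipfile.ZipFile(path, "w") as zf:
        for i in range(n_members):
            zf.writestr(f"f{i}.bin", bytes(1000))
    with zipfile.ZipFile(path) as z:
        per_thread = PerThreadZip(z)
        try:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                data = list(pool.map(per_thread.read, z.namelist()))
        finally:
            per_thread.close()
    return data


if __name__ == "__main__":
    print(len(demo()), "members read")
```

The only lock left guards the list of clones, not the reads themselves. Like the original hack, this relies on ZipFile internals (NameToInfo, fp), so it may need adjusting for other Python versions.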

However, this cloning is somewhat of a dirty hack: it reopens the archive via 
z.fp.name, so it breaks when the ZipFile wraps a file-like object rather than 
a real file.
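A small illustration, assuming an in-memory archive: when a ZipFile is given a file-like object, it keeps that object as z.fp, and an io.BytesIO buffer has no name attribute, so the open(z.fp.name, "rb") step of the clone has nothing to reopen.

```python
import io
import zipfile

# Build a small archive entirely in memory (a file-like object, not a file).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.bin", bytes(10))

z = zipfile.ZipFile(buf)
# ZipFile keeps the caller's file-like object as z.fp ...
fp_is_buffer = z.fp is buf
# ... and BytesIO has no file name, so open(z.fp.name, "rb") would raise
# AttributeError instead of opening a second, independent handle.
has_name = hasattr(z.fp, "name")
z.close()

print(fp_is_buffer, has_name)  # True False
```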

Unfortunately, I do not have a solution for the general case.

----------
files: test.py
messages: 381090
nosy: Thomas
priority: normal
severity: normal
status: open
title: Reading ZipFile not thread-safe
versions: Python 3.7, Python 3.8
Added file: https://bugs.python.org/file49601/test.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42369>
_______________________________________