New submission from Thouis (Ray) Jones <tho...@gmail.com>:

On my system (OSX 10.6.8) using the python.org 32/64-bit build of 2.7.2, 
os.listdir() returns incomplete results in a threaded program: a few files 
are missing from the returned list.

First, my use case.  I work with large image-based datasets, often with 
hundreds of thousands of images.  The first step in processing is to locate all 
of these images and extract some basic information (size, channels, etc.).  To 
do this more efficiently on network filesystems, where listing directories and 
stat()ing files is often slow, I wrote a multithreaded analog to os.walk().  
While validating its results against unix 'find', I saw discrepancies in the 
number of files found.
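
For illustration, a multithreaded analog to os.walk() along these lines might 
look roughly like the sketch below.  This is not the code from my project; 
the name threaded_walk, the max_workers parameter and the use of 
concurrent.futures (which is not available in 2.7) are assumptions made for 
the example.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def threaded_walk(root, max_workers=8):
    """Return a sorted list of every non-directory path under root.

    Hypothetical helper: each directory is listed by a worker thread,
    and subdirectories are submitted back to the same pool.  Symlinks
    are reported as entries and never followed.
    """
    files = []
    lock = threading.Lock()      # protects files and pending
    pending = [1]                # directories submitted but not finished
    done = threading.Event()

    def visit(path):
        try:
            subdirs, found = [], []
            for name in os.listdir(path):
                full = os.path.join(path, name)
                if os.path.isdir(full) and not os.path.islink(full):
                    subdirs.append(full)
                else:
                    found.append(full)
            with lock:
                files.extend(found)
                pending[0] += len(subdirs)
            for sub in subdirs:
                pool.submit(visit, sub)
        except OSError:
            pass                 # unreadable or vanished directory: skip it
        finally:
            with lock:
                pending[0] -= 1
                if pending[0] == 0:
                    done.set()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pool.submit(visit, root)
        done.wait()              # wait until every directory is processed
    return sorted(files)
```

The length of the returned list can then be compared against the output of 
something like "find ROOT ! -type d | wc -l" to look for missing entries.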

My guess is that OSX's readdir() is not reentrant when dealing with SMB shares, 
even on different DIR pointers.  It's also possible that readdir() is not 
reentrant with lstat(), as some of my tests seemed to indicate this, but I need 
to run some more tests to be sure that's what I was actually seeing.
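
A minimal sketch of the kind of check that could expose the problem is below.  
It is an assumption about how to reproduce this, not the test I actually ran: 
the function name and its parameters are made up for the example, and it 
assumes the directory is not modified while the threads run.  Each 
os.listdir() call opens its own DIR pointer, so a nonzero result would be 
consistent with the guess above; a zero result does not prove the problem is 
absent.

```python
import os
import threading


def count_listdir_mismatches(path, n_threads=8, iterations=50):
    """Call os.listdir(path) from several threads and count the calls
    whose result differs from a single-threaded baseline listing."""
    baseline = set(os.listdir(path))   # taken before any thread starts
    mismatches = []
    lock = threading.Lock()            # protects mismatches

    def worker():
        for _ in range(iterations):
            seen = set(os.listdir(path))
            if seen != baseline:
                with lock:
                    # record which names went missing in this call
                    mismatches.append(baseline - seen)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(mismatches)
```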

In any case, there are three possible ways to fix this, I think.

- Remove the Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS around readdir() in 
posixmodule.c

- Put a mutex on readdir()

- Use readdir_r().  I've attached a potential patch for 2.7.2 for this solution.

I would prefer the second or third approach, as they preserve the ability to 
do other work while listing large directories.
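
Until a fix lands in posixmodule.c, the mutex idea can be approximated from 
Python code.  The sketch below is a user-level workaround, not the attached 
patch: locked_listdir and _listdir_lock are hypothetical names, and the lock 
covers the whole os.listdir() call rather than only readdir(), so it is 
coarser than a mutex in the C code would be.  It would also not help if 
readdir() turns out to conflict with lstat() calls made in other threads.

```python
import os
import threading

# One process-wide lock, so only one thread lists a directory at a time.
_listdir_lock = threading.Lock()


def locked_listdir(path):
    """Serialized wrapper around os.listdir().

    Other threads remain free to do unrelated work (for example stat()
    calls or image parsing) while one thread holds the lock.
    """
    with _listdir_lock:
        return os.listdir(path)
```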

By my reading of the Python 3.0 to 3.4 sources, this problem exists in those 
versions as well.

----------
components: Library (Lib)
files: py272_readdir_r.patch
keywords: patch
messages: 148737
nosy: thouis
priority: normal
severity: normal
status: open
title: readdir() in os.listdir not threadsafe on OSX 10.6.8
type: behavior
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file23832/py272_readdir_r.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13517>
_______________________________________