Re: [Full-disclosure] Re: readdir_r considered harmful

2005-11-08 Thread Casper . Dik

>In practice, you're correct. In theory, however, consider the  
>following code
>path.
>
>
>> THREAD 1  THREAD 2
>> ----
>> DIR *d1 = opendir(dir1);
>>   DIR *d2 = opendir(dir2);
>> dent1 = readdir(dir1);
>>   dent2 = readdir(dir2);
>> use(dent1);
>>
>
>In most implementations, dent1 != dent2. HOWEVER, there is no  
>guarantee that
>they will not both point to the same statically allocated buffer, and  
>some
>implementations may do so. For example, this is why ctime_r exists:  
>ctime
>returns a pointer to a statically allocated buffer, and hence is not  
>thread
>safe.

The standard actually guarantees that the static storage is
associated with the specific directory STREAM.  So a system on which
dent1 and dent2 point to the same buffer and reads from one stream
affect the buffer returned by reads from another stream are not
POSIX compliant.

See:

http://www.opengroup.org/onlinepubs/009695399/functions/readdir.html

"The pointer returned by readdir() points to data which may be
overwritten by another call to readdir() on the same directory
stream. This data is not overwritten by another call to readdir()
on a different directory stream."

But is also goes on to say:

"The readdir() function need not be reentrant. A function that is
not required to be reentrant is not required to be thread-safe."

which is the one thing I like POSIX to fix for thread safe implementations.

Casper


Re: [Full-disclosure] Re: readdir_r considered harmful

2005-11-08 Thread Andrew Miller
[EMAIL PROTECTED] wrote:
...

>Had they done so, we would never have had to use readdir_r() and progammers
>would not have introduced bugs in the (mis)use of pathconf, over allocating,
>etc.
>
>I would be interested in seeing any real-world use of readdir_r() in
>a context where readdir_r() is required (multiple threads reading from
>a single DIR *).
>  
>
Consider the following situation(I'm not sure if anyone actually does this):
1) You have a "spool" directory containing a large number of files, each
which represents a task to process.
2) You have a number of worker threads. Each worker thread reads a file
from the global DIR*, and then opens and reads the file for
processing(and then loops on 2).

Of course, you could always just put a mutex around every call to
readdir(), and copy the filename somewhere safe, or you could invent a
signalling system to ask one thread to do all the readdir()s. Whether
this makes sense depends on how much of readdir_r has to be spent inside
a global mutex/spinlock anyway, and how long the processing part takes
compared with the readdir() part.

Andrew



Re: [Full-disclosure] Re: readdir_r considered harmful

2005-11-07 Thread Casper . Dik


>On 11/6/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> I don't see how that is relevant; the typical use of readdir() is as follows:
>>
>> DIR *dirp = opendir(name);
>>
>> while ((dent = readdir(dirp)) != NULL) {
>> ...
>> }
>>
>> closedir(dirp);
>>
>> Nothing other threads do with readdir() on different dirp's will influence
>> what "dent" points to.
>
>The issue is multiple threads using the same DIR.

No, it isn't.  I certainly limited the scope of my contribution to
single threads reading from a DIR.

All the 80-odd uses of readdir_r() in the Solaris core source code,
all can (and should) be replaced with readdir().  All have a single
thread reading and reusing the same "struct dirent", so readdir()
could be used in POSIXly correct fashion if the following sentence
in the open group's manual page was not present:

"The readdir() function need not be reentrant. A function that is not
required to be reentrant is not required to be thread-safe."

I believe that this is an error in POSIX; when "threadedness" was added
the manual page could have been changed to indicate that a single
thread using the above idiom was safe.

Had they done so, we would never have had to use readdir_r() and progammers
would not have introduced bugs in the (mis)use of pathconf, over allocating,
etc.

I would be interested in seeing any real-world use of readdir_r() in
a context where readdir_r() is required (multiple threads reading from
a single DIR *).

Casper


Re: [Full-disclosure] Re: readdir_r considered harmful

2005-11-07 Thread Ulrich Drepper
On 11/6/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I don't see how that is relevant; the typical use of readdir() is as follows:
>
> DIR *dirp = opendir(name);
>
> while ((dent = readdir(dirp)) != NULL) {
> ...
> }
>
> closedir(dirp);
>
> Nothing other threads do with readdir() on different dirp's will influence
> what "dent" points to.

The issue is multiple threads using the same DIR.


Re: [Full-disclosure] Re: readdir_r considered harmful

2005-11-07 Thread Casper . Dik

>Then you never really understood the implementation, seems.  Of course
>all implementations keep the content of the directory as read with
>getdents or so in the DIR descriptor.  But it is usually not the case
>that the whole content fits into the buffer allocated.  One could, of
>course, resize the buffer to fit the content of the directory read,
>even if this means reserving hundreds or thousands of kBs.  But this
>is not how most implementations work.

I don't see how that is relevant; the typical use of readdir() is as follows:

DIR *dirp = opendir(name);

while ((dent = readdir(dirp)) != NULL) {
...
}

closedir(dirp);

Nothing other threads do with readdir() on different dirp's will influence
what "dent" points to.

I have *never* seen a program where multiple threads read from a single
dirp; and I can't image the use.

>Instead implementations keep work similar to every buffered file I/O
>operation.  But this means that buffer content is replaced.  If this
>happens and some thread uses readdir() instead of readdir_r(), the
>returned string pointer suddenly becomes invalid since it points to
>memory which has been replaced.

Yes, the next call to readdir() *on the same dirp* may change what
the previous call; but that's completely irrelevant for most uses
of readdir().

Of course, an application may want to save all readdir() return values,
but that is completely orthogonal to threads; there is no reason
why the POSIX *thread* specification includes readdir_r().

>Next time, before you make such comments, ask Don Cragun to explain
>things to you.

Next time before you mail, you might want to engage your brain.

There is NO reason for a thread-safe library to use readdir_r() over
readdir(), with common readdir() implementations.

Casper


Re: [Full-disclosure] Re: readdir_r considered harmful

2005-11-07 Thread Ulrich Drepper
On 11/5/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Why not:
>
> 4. Require the readdir() implementation to use state local to dirp.
>
> I've never understood the rationale behind readdir_r;

Then you never really understood the implementation, seems.  Of course
all implementations keep the content of the directory as read with
getdents or so in the DIR descriptor.  But it is usually not the case
that the whole content fits into the buffer allocated.  One could, of
course, resize the buffer to fit the content of the directory read,
even if this means reserving hundreds or thousands of kBs.  But this
is not how most implementations work.

Instead implementations keep work similar to every buffered file I/O
operation.  But this means that buffer content is replaced.  If this
happens and some thread uses readdir() instead of readdir_r(), the
returned string pointer suddenly becomes invalid since it points to
memory which has been replaced.

Next time, before you make such comments, ask Don Cragun to explain
things to you.


Re: [Full-disclosure] Re: readdir_r considered harmful

2005-11-07 Thread Andrew Farmer

On 06 Nov 05, at 01:00, [EMAIL PROTECTED] wrote:
Then you never really understood the implementation, seems.  Of  
course

all implementations keep the content of the directory as read with
getdents or so in the DIR descriptor.  But it is usually not the case
that the whole content fits into the buffer allocated.  One could, of
course, resize the buffer to fit the content of the directory read,
even if this means reserving hundreds or thousands of kBs.  But this
is not how most implementations work.



I don't see how that is relevant; the typical use of readdir() is  
as follows:


DIR *dirp = opendir(name);

while ((dent = readdir(dirp)) != NULL) {
...
}

closedir(dirp);

Nothing other threads do with readdir() on different dirp's will  
influence

what "dent" points to.

I have *never* seen a program where multiple threads read from a  
single

dirp; and I can't image the use.



In practice, you're correct. In theory, however, consider the  
following code

path.



THREAD 1  THREAD 2
----
DIR *d1 = opendir(dir1);
  DIR *d2 = opendir(dir2);
dent1 = readdir(dir1);
  dent2 = readdir(dir2);
use(dent1);



In most implementations, dent1 != dent2. HOWEVER, there is no  
guarantee that
they will not both point to the same statically allocated buffer, and  
some
implementations may do so. For example, this is why ctime_r exists:  
ctime
returns a pointer to a statically allocated buffer, and hence is not  
thread

safe.

You are correct, though, that the glibc implementation of readdir is
thread-safe, so readdir_r is unnecessary in all common situations.


PGP.sig
Description: This is a digitally signed message part