On Tue, Jun 04, 2013 at 09:57:00PM +0200, Oleg Nesterov wrote: > Will do... but so far I am confused. > > I do not see how they could race (I mean /proc/pid/task only). OK, OK, > the usage of ->f_pos in sys_getdents() looks "obviously wrong", but this > is another story? And "put f_pos in a local variable" can't help.
For one thing, a bunch of directories use generic_file_llseek(), which does *not* use ->i_mutex. For another, there's a very unpleasant problem with read(2) (failing) attempt racing with ->f_pos modifications in ->readdir(). Take a look at sys_read() and note that it is done with no serialization at all (not in the top level, that is) and that it puts the (unmodified by generic_read_dir()) value of pos back into file->f_pos as soon as vfs_read() passes -EISDIR (returned by generic_read_dir()) back to sys_read(). I.e. ->f_pos is silently reset back to the value it used to have on the entry to read(2). Despite foo_readdir() assumptions that it won't be changed behind its back. Reset itself wouldn't be a problem - if several threads mess with read() on the same opened file in parallel, you are not promised anything good about the resulting IO pointer position. The same applies here. However, many ->readdir() instances use file->f_pos as a variable they can use for internal needs and _that_ leads to very unpleasant races. The sane solution is to do what ->read()/->write()/etc. are doing - pass an address of local copy of ->f_pos, so they are able to use it without worrying about concurrent modifications of that value. That obviously solves all problems with generic_file_lseek(), etc., as well as this sys_read() shite. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/