Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
Bharata B Rao: > - The cache can grow arbitrarily large in size for big directories thereby > consuming lots of memory. Pruning individual cache entries is out of question > as entire cache is needed for subsequent readdirs for duplicate elimination. Additionally, the memory usage may be a problem too since your implementation calls kmalloc() for every names. > - Whenever _any_ directory that is part of the union gets > modified (addition/deletion of entries), the dirent cache of all the unions > which this directory is part of, needs to be purged and rebuilt. This is > expensive not only due to re-reads of dirents but also because > readdir(2)/getdents(2) needs to be synchronized with other operations > like mkdir/mknod/link/unlink etc. The cache in struct file doesn't need to be refreshed unless rewinddir() is issued. Also you can maintain the cache in every add/del entries, instead of discarding the cache entirely. > After all this, I am beginning to think if it would be better to delegate > this readdir and whiteout processing to userspace. Can this be better handled Yes, I had such idea once. And copy-up too. They can be done in userspace (while you need to be careful about the privilege). Anyway I agree with you. As I wrote before, this approach consumes a lot of memory and cpu (for comparing whiteouted names). Junjiro Okajima -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
On Thu, 2007-12-06 at 11:01 +0100, Jan Blunck wrote: > > Rather than give each _dirent_ an offset, could we give each sub-mount > > an offset? Let's say we have three members comprising a union mount > > directory. The first has 100 dirents, the second 200, and the third > > 10,000. When the first readdir is done, we populate the table like > > this: > > > > mount_offset[0] = 0; > > mount_offset[1] = 100; > > mount_offset[2] = 300; > > > > If someone seeks back to 150, then we subtrack the mount[1]'s offset > > (100), and realize that we want the 50th dirent from mount[1]. > > Yes, that is a nice idea and it is exactly what I have implemented in my patch > series. But you forgot one thing: directories are not flat files. The dentry > offset in a directory is a random cookie. Therefore it is not possible to have > a linear mapping without allocating memory. Is is truly a random cookie where the fs gets to put anything in there that it can fit in a long? Isn't that an advantage? Being a random cookie, we can encode anything we want in there. We could encode the "mount index" (or whatever you want to call it) in high or low bits and have the rest store the fs-specific offset. The only problem there would be running out of storage space in long. But, what do people expect when they have huge directories (or huge directory offsets)? Surely, if their fs is already pressing the fs's directory data types to the limits, they can't simply union mount a couple of those directories together and expect magic. Union mounts also can't be expected to compress these directory positions to fit. So, I think it's reasonable behavior to readdir() until the sum of the highest position seen on each mount would overflow the off_t that we have to store it in. -- Dave -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
On Thu, Dec 06, 2007 at 11:01:18AM +0100, Jan Blunck wrote: > On Wed, Dec 05, Dave Hansen wrote: > > > I think the key here is what kind of consistency we're trying to > > provide. If a directory is being changed underneath a reader, what > > kinds of guarantees do they get about the contents of their directory > > read? When do those guarantees start? Are there any at open() time? > > But we still want to be compliant to what POSIX defines. The problem isn't the > consistency of the readdir result but the seekdir/telldir interface. IMHO that > interface is totally broken: you need to be able to find every offset given by > telldir since the last open. The problem is that seekdir isn't able to return > errors. Otherwise you could just forbid seeking on union directories. Also, what kind of consistency is expected when a directory is open(2)ed and readdir(2) and lseek(2) are applied to it when the directory gets changed underneath the reader. From this: http://www.opengroup.org/onlinepubs/009695399/functions/lseek.html the behaviour/guarantees wasn't apparent to me. > > > Rather than give each _dirent_ an offset, could we give each sub-mount > > an offset? Let's say we have three members comprising a union mount > > directory. The first has 100 dirents, the second 200, and the third > > 10,000. When the first readdir is done, we populate the table like > > this: > > > > mount_offset[0] = 0; > > mount_offset[1] = 100; > > mount_offset[2] = 300; > > > > If someone seeks back to 150, then we subtrack the mount[1]'s offset > > (100), and realize that we want the 50th dirent from mount[1]. > > Yes, that is a nice idea and it is exactly what I have implemented in my patch > series. But you forgot one thing: directories are not flat files. The dentry > offset in a directory is a random cookie. Therefore it is not possible to have > a linear mapping without allocating memory. And I defined this linear behaviour on the cache of dirents we maintain in the approach I posted. And the main reason we maintain cache of dirents in memory is for duplicate elimination. > > > I don't know whether we're bound to this: > > > > http://www.opengroup.org/onlinepubs/007908775/xsh/readdir.html > > > > "If a file is removed from or added to the directory after the > > most recent call to opendir() or rewinddir(), whether a > > subsequent call to readdir() returns an entry for that file is > > unspecified." > > > > But that would seem to tell me that once you populate a table such as > > the one I've described and create it at open(dir) time, you don't > > actually ever need to update it. > > Yes, I'm using such a patch on our S390 buildservers to work around some > readdir/seek/rm problem with old glibc versions. It seems to work but on the > other hand this are really huge systems and I haven't run out of memory while > doing a readdir yet ;) > > The proper way to implement this would be to cache the offsets on a per inode > base. Otherwise the user could easily DoS this by opening a number of > directories and never close them. > You mean cache the offsets or dirents ? How would that solve the seek problem ? How would it enable you to define a seek behaviour for the entire union of directories ? Regards, Bharata. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
On Wed, Dec 05, Dave Hansen wrote: > I think the key here is what kind of consistency we're trying to > provide. If a directory is being changed underneath a reader, what > kinds of guarantees do they get about the contents of their directory > read? When do those guarantees start? Are there any at open() time? But we still want to be compliant to what POSIX defines. The problem isn't the consistency of the readdir result but the seekdir/telldir interface. IMHO that interface is totally broken: you need to be able to find every offset given by telldir since the last open. The problem is that seekdir isn't able to return errors. Otherwise you could just forbid seeking on union directories. > Rather than give each _dirent_ an offset, could we give each sub-mount > an offset? Let's say we have three members comprising a union mount > directory. The first has 100 dirents, the second 200, and the third > 10,000. When the first readdir is done, we populate the table like > this: > > mount_offset[0] = 0; > mount_offset[1] = 100; > mount_offset[2] = 300; > > If someone seeks back to 150, then we subtrack the mount[1]'s offset > (100), and realize that we want the 50th dirent from mount[1]. Yes, that is a nice idea and it is exactly what I have implemented in my patch series. But you forgot one thing: directories are not flat files. The dentry offset in a directory is a random cookie. Therefore it is not possible to have a linear mapping without allocating memory. > I don't know whether we're bound to this: > > http://www.opengroup.org/onlinepubs/007908775/xsh/readdir.html > > "If a file is removed from or added to the directory after the > most recent call to opendir() or rewinddir(), whether a > subsequent call to readdir() returns an entry for that file is > unspecified." > > But that would seem to tell me that once you populate a table such as > the one I've described and create it at open(dir) time, you don't > actually ever need to update it. Yes, I'm using such a patch on our S390 buildservers to work around some readdir/seek/rm problem with old glibc versions. It seems to work but on the other hand this are really huge systems and I haven't run out of memory while doing a readdir yet ;) The proper way to implement this would be to cache the offsets on a per inode base. Otherwise the user could easily DoS this by opening a number of directories and never close them. Regards, Jan -- Jan Blunck <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
On Wed, Dec 05, Dave Hansen wrote: I think the key here is what kind of consistency we're trying to provide. If a directory is being changed underneath a reader, what kinds of guarantees do they get about the contents of their directory read? When do those guarantees start? Are there any at open() time? But we still want to be compliant to what POSIX defines. The problem isn't the consistency of the readdir result but the seekdir/telldir interface. IMHO that interface is totally broken: you need to be able to find every offset given by telldir since the last open. The problem is that seekdir isn't able to return errors. Otherwise you could just forbid seeking on union directories. Rather than give each _dirent_ an offset, could we give each sub-mount an offset? Let's say we have three members comprising a union mount directory. The first has 100 dirents, the second 200, and the third 10,000. When the first readdir is done, we populate the table like this: mount_offset[0] = 0; mount_offset[1] = 100; mount_offset[2] = 300; If someone seeks back to 150, then we subtrack the mount[1]'s offset (100), and realize that we want the 50th dirent from mount[1]. Yes, that is a nice idea and it is exactly what I have implemented in my patch series. But you forgot one thing: directories are not flat files. The dentry offset in a directory is a random cookie. Therefore it is not possible to have a linear mapping without allocating memory. I don't know whether we're bound to this: http://www.opengroup.org/onlinepubs/007908775/xsh/readdir.html If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified. But that would seem to tell me that once you populate a table such as the one I've described and create it at open(dir) time, you don't actually ever need to update it. Yes, I'm using such a patch on our S390 buildservers to work around some readdir/seek/rm problem with old glibc versions. It seems to work but on the other hand this are really huge systems and I haven't run out of memory while doing a readdir yet ;) The proper way to implement this would be to cache the offsets on a per inode base. Otherwise the user could easily DoS this by opening a number of directories and never close them. Regards, Jan -- Jan Blunck [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
On Thu, Dec 06, 2007 at 11:01:18AM +0100, Jan Blunck wrote: On Wed, Dec 05, Dave Hansen wrote: I think the key here is what kind of consistency we're trying to provide. If a directory is being changed underneath a reader, what kinds of guarantees do they get about the contents of their directory read? When do those guarantees start? Are there any at open() time? But we still want to be compliant to what POSIX defines. The problem isn't the consistency of the readdir result but the seekdir/telldir interface. IMHO that interface is totally broken: you need to be able to find every offset given by telldir since the last open. The problem is that seekdir isn't able to return errors. Otherwise you could just forbid seeking on union directories. Also, what kind of consistency is expected when a directory is open(2)ed and readdir(2) and lseek(2) are applied to it when the directory gets changed underneath the reader. From this: http://www.opengroup.org/onlinepubs/009695399/functions/lseek.html the behaviour/guarantees wasn't apparent to me. Rather than give each _dirent_ an offset, could we give each sub-mount an offset? Let's say we have three members comprising a union mount directory. The first has 100 dirents, the second 200, and the third 10,000. When the first readdir is done, we populate the table like this: mount_offset[0] = 0; mount_offset[1] = 100; mount_offset[2] = 300; If someone seeks back to 150, then we subtrack the mount[1]'s offset (100), and realize that we want the 50th dirent from mount[1]. Yes, that is a nice idea and it is exactly what I have implemented in my patch series. But you forgot one thing: directories are not flat files. The dentry offset in a directory is a random cookie. Therefore it is not possible to have a linear mapping without allocating memory. And I defined this linear behaviour on the cache of dirents we maintain in the approach I posted. And the main reason we maintain cache of dirents in memory is for duplicate elimination. I don't know whether we're bound to this: http://www.opengroup.org/onlinepubs/007908775/xsh/readdir.html If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified. But that would seem to tell me that once you populate a table such as the one I've described and create it at open(dir) time, you don't actually ever need to update it. Yes, I'm using such a patch on our S390 buildservers to work around some readdir/seek/rm problem with old glibc versions. It seems to work but on the other hand this are really huge systems and I haven't run out of memory while doing a readdir yet ;) The proper way to implement this would be to cache the offsets on a per inode base. Otherwise the user could easily DoS this by opening a number of directories and never close them. You mean cache the offsets or dirents ? How would that solve the seek problem ? How would it enable you to define a seek behaviour for the entire union of directories ? Regards, Bharata. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
On Thu, 2007-12-06 at 11:01 +0100, Jan Blunck wrote: Rather than give each _dirent_ an offset, could we give each sub-mount an offset? Let's say we have three members comprising a union mount directory. The first has 100 dirents, the second 200, and the third 10,000. When the first readdir is done, we populate the table like this: mount_offset[0] = 0; mount_offset[1] = 100; mount_offset[2] = 300; If someone seeks back to 150, then we subtrack the mount[1]'s offset (100), and realize that we want the 50th dirent from mount[1]. Yes, that is a nice idea and it is exactly what I have implemented in my patch series. But you forgot one thing: directories are not flat files. The dentry offset in a directory is a random cookie. Therefore it is not possible to have a linear mapping without allocating memory. Is is truly a random cookie where the fs gets to put anything in there that it can fit in a long? Isn't that an advantage? Being a random cookie, we can encode anything we want in there. We could encode the mount index (or whatever you want to call it) in high or low bits and have the rest store the fs-specific offset. The only problem there would be running out of storage space in long. But, what do people expect when they have huge directories (or huge directory offsets)? Surely, if their fs is already pressing the fs's directory data types to the limits, they can't simply union mount a couple of those directories together and expect magic. Union mounts also can't be expected to compress these directory positions to fit. So, I think it's reasonable behavior to readdir() until the sum of the highest position seen on each mount would overflow the off_t that we have to store it in. -- Dave -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
Bharata B Rao: - The cache can grow arbitrarily large in size for big directories thereby consuming lots of memory. Pruning individual cache entries is out of question as entire cache is needed for subsequent readdirs for duplicate elimination. Additionally, the memory usage may be a problem too since your implementation calls kmalloc() for every names. - Whenever _any_ directory that is part of the union gets modified (addition/deletion of entries), the dirent cache of all the unions which this directory is part of, needs to be purged and rebuilt. This is expensive not only due to re-reads of dirents but also because readdir(2)/getdents(2) needs to be synchronized with other operations like mkdir/mknod/link/unlink etc. The cache in struct file doesn't need to be refreshed unless rewinddir() is issued. Also you can maintain the cache in every add/del entries, instead of discarding the cache entirely. After all this, I am beginning to think if it would be better to delegate this readdir and whiteout processing to userspace. Can this be better handled Yes, I had such idea once. And copy-up too. They can be done in userspace (while you need to be careful about the privilege). Anyway I agree with you. As I wrote before, this approach consumes a lot of memory and cpu (for comparing whiteouted names). Junjiro Okajima -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
On Wed, 2007-12-05 at 20:07 +0530, Bharata B Rao wrote: > In this approach, the cached dirents are given offsets in the form of > linearly increasing indices/cookies (like 0, 1, 2,...). This helps us to > uniformly define offsets across all the directories of the union > irrespective of the type of filesystem involved. Also this is needed to > define a seek behaviour on the union mounted directory. This cache is stored > as part of the struct file of the topmost directory of the union and will > remain as long as the directory is kept open. That's a little brute force, don't you think? Especially the memory exhaustion seems to really make this an undesirable approach. I think the key here is what kind of consistency we're trying to provide. If a directory is being changed underneath a reader, what kinds of guarantees do they get about the contents of their directory read? When do those guarantees start? Are there any at open() time? Rather than give each _dirent_ an offset, could we give each sub-mount an offset? Let's say we have three members comprising a union mount directory. The first has 100 dirents, the second 200, and the third 10,000. When the first readdir is done, we populate the table like this: mount_offset[0] = 0; mount_offset[1] = 100; mount_offset[2] = 300; If someone seeks back to 150, then we subtrack the mount[1]'s offset (100), and realize that we want the 50th dirent from mount[1]. I don't know whether we're bound to this: http://www.opengroup.org/onlinepubs/007908775/xsh/readdir.html "If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified." But that would seem to tell me that once you populate a table such as the one I've described and create it at open(dir) time, you don't actually ever need to update it. The storage for this is only comparable to the number of mounts that you have. One issue comes if we manage to overflow our data types with too many entries in too many submounts. But, I guess we can just truncate the directory in that case. -- Dave -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
Hi, In Union Mount, the merged view of directories of the union is obtained by enhancing readdir(2)/getdents(2) to read and merge the entries of all the directories by eliminating the duplicates. While we have tried a few approaches for this, none of them could perfectly solve all the problems. One of the challenges has been to provide a correct directory seek support for the union mounted directories. Sometime back when I posted an RFC (http://lkml.org/lkml/2007/9/7/22) on this problem, one of the suggestions I got was to have the dirents stored in a cache (which we already do for duplicate elimination) and define the directory seek behaviour on this cache constructed out of unioned directories. I set out to try this and the result is this set of patches. I am myself not impressed by the implementation complexity in these patches but posting them here only to get further comments and suggestions. Moreover I haven't debugged them thoroughly to uncover all the problems. While I don't expect anybody try these patches, for the completeness sake I have to mention that these apply on top of Jan Blunck's patches on 2.6.24-rc2-mm1 (ftp://ftp.suse.com/pub/people/jblunck/patches/). In this approach, the cached dirents are given offsets in the form of linearly increasing indices/cookies (like 0, 1, 2,...). This helps us to uniformly define offsets across all the directories of the union irrespective of the type of filesystem involved. Also this is needed to define a seek behaviour on the union mounted directory. This cache is stored as part of the struct file of the topmost directory of the union and will remain as long as the directory is kept open. While this approach works, it has the following downsides: - The cache can grow arbitrarily large in size for big directories thereby consuming lots of memory. Pruning individual cache entries is out of question as entire cache is needed for subsequent readdirs for duplicate elimination. - When an exising union gets overlaid by a new directory, there is a possibility of the cache getting duplicated for the new union, thereby wasting more space. - Whenever _any_ directory that is part of the union gets modified (addition/deletion of entries), the dirent cache of all the unions which this directory is part of, needs to be purged and rebuilt. This is expensive not only due to re-reads of dirents but also because readdir(2)/getdents(2) needs to be synchronized with other operations like mkdir/mknod/link/unlink etc. - Since lseek(2) of the unioned directory also works on the same dirent cache, it too needs to be invalidated when the directory gets modified. - Supporting SEEK_END of lseek(2) is expensive as it involves reading in all the dirents to know the EOF of the directory file. After all this, I am beginning to think if it would be better to delegate this readdir and whiteout processing to userspace. Can this be better handled by readdir(3) in glibc ? But even with this, I am not sure if correct seek behaviour (from lseek(2) or seekdir(3)) can be achieved in an easy manner. Any thoughts on this ? Earlier Erez Zodak had suggested that things will become easier if readdir state is stored as a disk file (http://lkml.org/lkml/2007/9/7/114). This approach simplifies directory seek support in Unionfs. But I am not sure if such an approach would go well with VFS based unification approach like Union Mount. Regards, Bharata. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
Hi, In Union Mount, the merged view of directories of the union is obtained by enhancing readdir(2)/getdents(2) to read and merge the entries of all the directories by eliminating the duplicates. While we have tried a few approaches for this, none of them could perfectly solve all the problems. One of the challenges has been to provide a correct directory seek support for the union mounted directories. Sometime back when I posted an RFC (http://lkml.org/lkml/2007/9/7/22) on this problem, one of the suggestions I got was to have the dirents stored in a cache (which we already do for duplicate elimination) and define the directory seek behaviour on this cache constructed out of unioned directories. I set out to try this and the result is this set of patches. I am myself not impressed by the implementation complexity in these patches but posting them here only to get further comments and suggestions. Moreover I haven't debugged them thoroughly to uncover all the problems. While I don't expect anybody try these patches, for the completeness sake I have to mention that these apply on top of Jan Blunck's patches on 2.6.24-rc2-mm1 (ftp://ftp.suse.com/pub/people/jblunck/patches/). In this approach, the cached dirents are given offsets in the form of linearly increasing indices/cookies (like 0, 1, 2,...). This helps us to uniformly define offsets across all the directories of the union irrespective of the type of filesystem involved. Also this is needed to define a seek behaviour on the union mounted directory. This cache is stored as part of the struct file of the topmost directory of the union and will remain as long as the directory is kept open. While this approach works, it has the following downsides: - The cache can grow arbitrarily large in size for big directories thereby consuming lots of memory. Pruning individual cache entries is out of question as entire cache is needed for subsequent readdirs for duplicate elimination. - When an exising union gets overlaid by a new directory, there is a possibility of the cache getting duplicated for the new union, thereby wasting more space. - Whenever _any_ directory that is part of the union gets modified (addition/deletion of entries), the dirent cache of all the unions which this directory is part of, needs to be purged and rebuilt. This is expensive not only due to re-reads of dirents but also because readdir(2)/getdents(2) needs to be synchronized with other operations like mkdir/mknod/link/unlink etc. - Since lseek(2) of the unioned directory also works on the same dirent cache, it too needs to be invalidated when the directory gets modified. - Supporting SEEK_END of lseek(2) is expensive as it involves reading in all the dirents to know the EOF of the directory file. After all this, I am beginning to think if it would be better to delegate this readdir and whiteout processing to userspace. Can this be better handled by readdir(3) in glibc ? But even with this, I am not sure if correct seek behaviour (from lseek(2) or seekdir(3)) can be achieved in an easy manner. Any thoughts on this ? Earlier Erez Zodak had suggested that things will become easier if readdir state is stored as a disk file (http://lkml.org/lkml/2007/9/7/114). This approach simplifies directory seek support in Unionfs. But I am not sure if such an approach would go well with VFS based unification approach like Union Mount. Regards, Bharata. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/5] Union Mount: A Directory listing approach with lseek support
On Wed, 2007-12-05 at 20:07 +0530, Bharata B Rao wrote: In this approach, the cached dirents are given offsets in the form of linearly increasing indices/cookies (like 0, 1, 2,...). This helps us to uniformly define offsets across all the directories of the union irrespective of the type of filesystem involved. Also this is needed to define a seek behaviour on the union mounted directory. This cache is stored as part of the struct file of the topmost directory of the union and will remain as long as the directory is kept open. That's a little brute force, don't you think? Especially the memory exhaustion seems to really make this an undesirable approach. I think the key here is what kind of consistency we're trying to provide. If a directory is being changed underneath a reader, what kinds of guarantees do they get about the contents of their directory read? When do those guarantees start? Are there any at open() time? Rather than give each _dirent_ an offset, could we give each sub-mount an offset? Let's say we have three members comprising a union mount directory. The first has 100 dirents, the second 200, and the third 10,000. When the first readdir is done, we populate the table like this: mount_offset[0] = 0; mount_offset[1] = 100; mount_offset[2] = 300; If someone seeks back to 150, then we subtrack the mount[1]'s offset (100), and realize that we want the 50th dirent from mount[1]. I don't know whether we're bound to this: http://www.opengroup.org/onlinepubs/007908775/xsh/readdir.html If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified. But that would seem to tell me that once you populate a table such as the one I've described and create it at open(dir) time, you don't actually ever need to update it. The storage for this is only comparable to the number of mounts that you have. One issue comes if we manage to overflow our data types with too many entries in too many submounts. But, I guess we can just truncate the directory in that case. -- Dave -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/