Re: Massive slowdown when re-querying large nfs dir
Andrew Morton wrote:
> > > I would suggest getting a 'tcpdump -s0' trace and seeing (with
> > > wireshark) what is different between the various cases.
> >
> > Thanks Neil for looking into this. Your suggestion has already been
> > answered in a previous post, where the difference has been attributed to
> > "ls -l" inducing lookup for the first try, which is fast, and getattr
> > for later tries, which is super-slow.
> >
> > Now it's easy to blame the userland rpc.nfs.V2 server for this, but
> > what's not clear is how come 2.4.31 handles getattr faster than 2.6.23?
>
> We broke 2.6? It'd be interesting to run the ls in an infinite loop on
> the client, then start poking at the server. Is the 2.6 server doing
> physical IO? Is the 2.6 server consuming more system time? etc. A basic
> `vmstat 1' trace for both 2.4 and 2.6 would be a starting point.
>
> Could be that there's some additional latency caused by networking
> changes, too. I expect the tcpdump/wireshark/etc traces would have
> sufficient resolution for us to be able to see that.

The problem turns out to be "tune2fs -O dir_index". Removing that feature
resolves the big slowdown. Does 2.4.31 support this feature?

Neil Brown wrote:
> Maybe an "strace -tt" of the nfs server might show some significant
> difference.
###
# ls -l <3K dir entry> (first try after mount inducing lookup) in ~3sec
# strace -tt rpc.nfsd
08:28:14.668557 time([1194499694]) = 1194499694
08:28:14.669420 alarm(5) = 2
08:28:14.669667 select(1024, [4 5], NULL, NULL, NULL) = 1 (in [4])
08:28:14.670142 recvfrom(4, "\275\3607{\0\0\0\0\0\0\0\2\0\1\206\243\0\0\0\2\0\0\0\4"..., 8800, 0, {sa_family=AF_INET, sin_port=htons(888), sin_addr=inet_addr("10.0.0.111")}, [16]) = 116
08:28:14.670554 time(NULL) = 1194499694
08:28:14.670711 time([1194499694]) = 1194499694
08:28:14.670875 lstat("/a/x", {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
08:28:14.671134 time([1194499694]) = 1194499694
08:28:14.671302 lstat("/a/x/3619", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
08:28:14.671530 time([1194499694]) = 1194499694
08:28:14.671701 alarm(2) = 5
08:28:14.671903 time([1194499694]) = 1194499694
08:28:14.672060 lstat("/a/x/3619", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
08:28:14.672305 time([1194499694]) = 1194499694
08:28:14.672508 sendto(4, "\275\3607{\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128, 0, {sa_family=AF_INET, sin_port=htons(888), sin_addr=inet_addr("10.0.0.111")}, 16) = 128
08:28:14.672909 time([1194499694]) = 1194499694
08:28:14.673869 alarm(5) = 2
08:28:14.674145 select(1024, [4 5], NULL, NULL, NULL) = 1 (in [4])
08:28:14.674589 recvfrom(4, "\276\3607{\0\0\0\0\0\0\0\2\0\1\206\243\0\0\0\2\0\0\0\4"..., 8800, 0, {sa_family=AF_INET, sin_port=htons(888), sin_addr=inet_addr("10.0.0.111")}, [16]) = 116
08:28:14.675003 time(NULL) = 1194499694
08:28:14.675160 time([1194499694]) = 1194499694
08:28:14.675321 lstat("/a/x", {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
08:28:14.675581 time([1194499694]) = 1194499694
08:28:14.675749 lstat("/a/x/3631", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
08:28:14.675979 time([1194499694]) = 1194499694
08:28:14.676150 alarm(2) = 5
08:28:14.676348 time([1194499694]) = 1194499694
08:28:14.676505 lstat("/a/x/3631", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
08:28:14.676746 time([1194499694]) = 1194499694
08:28:14.676952 sendto(4, "\276\3607{\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128, 0, {sa_family=AF_INET, sin_port=htons(888), sin_addr=inet_addr("10.0.0.111")}, 16) = 128

##
# ls -l <3K dir entry> (second try after mount inducing getattr) in ~11sec
# strace -tt rpc.nfsd
08:28:40.963668 time([1194499720]) = 1194499720
08:28:40.964525 alarm(5) = 2
08:28:40.964772 select(1024, [4 5], NULL, NULL, NULL) = 1 (in [4])
08:28:40.965215 recvfrom(4, ",\3747{\0\0\0\0\0\0\0\2\0\1\206\243\0\0\0\2\0\0\0\1\0\0"..., 8800, 0, {sa_family=AF_INET, sin_port=htons(888), sin_addr=inet_addr("10.0.0.111")}, [16]) = 108
08:28:40.965609 time(NULL) = 1194499720
08:28:40.965763 time([1194499720]) = 1194499720
08:28:40.965941 stat("/", {st_mode=S_IFDIR|0755, st_size=2048, ...}) = 0
08:28:40.966176 setfsuid(0) = 0
08:28:40.966329 stat("/", {st_mode=S_IFDIR|0755, st_size=2048, ...}) = 0
08:28:40.966539 stat("/", {st_mode=S_IFDIR|0755, st_size=2048, ...}) = 0
08:28:40.966748 open("/", O_RDONLY|O_NONBLOCK) = 0
08:28:40.966919 fcntl(0, F_SETFD, FD_CLOEXEC) = 0
08:28:40.967084 lseek(0, 0, SEEK_CUR) = 0
08:28:40.967240 getdents(0, /* 71 entries */, 3933) = 1220
08:28:40.968195 close(0) = 0
08:28:40.968351 stat("/a/", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
08:28:40.968583 stat("/a/",
Re: writeout stalls in current -git
On Wed, Nov 07, 2007 at 08:15:06AM +0100, Torsten Kaiser wrote:
> On 11/7/07, David Chinner <[EMAIL PROTECTED]> wrote:
> > Ok, so it's not synchronous writes that we are doing - we're just
> > submitting bio's tagged as WRITE_SYNC to get the I/O issued quickly.
> > The "synchronous" nature appears to be coming from higher level
> > locking when reclaiming inodes (on the flush lock). It appears that
> > inode write clustering is failing completely so we are writing the
> > same block multiple times i.e. once for each inode in the cluster we
> > have to write.
>
> Works for me. The only remaining stalls are sub second and look
> completely valid, considering the amount of files being removed.
>
> Tested-by: Torsten Kaiser <[EMAIL PROTECTED]>

Great - thanks for reporting the problem and testing the fix.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problem with accessing namespace_sem from LSM.
Hello.

Christoph Hellwig wrote:
> Same argument as with the AA folks: it does not have any business looking
> at the vfsmount. If you create a file it can and in many setups will
> show up in multiple vfsmounts, so making decisions based on the particular
> one this creat happens through is wrong and actually dangerous.

Thus TOMOYO 1.x doesn't use LSM hooks, and AppArmor for OpenSuSE 10.3 added
a "struct vfsmount" parameter to the VFS helper functions and LSM hooks.

Not all systems use bind mounts. There is likely only one vfsmount which
corresponds with a given dentry.

What does "dangerous" mean? Does it cause a crash?

Regards.
Re: Large SMBwriteX testing.
I have verified that it works for the case in which "min receivefile size"
is under 128K. When I set it to 25 and tried to read 148000, there were two
or three problems (reply_write_and_X in Samba is calling smb_len instead of
smb_len_large, and it is looking for "req->unread_bytes" incorrectly in a
few places in reply.c and fileio.c).

On Nov 2, 2007 6:43 PM, Jeremy Allison <[EMAIL PROTECTED]> wrote:
> Hi Steve,
>
> I've finished adding the ability for smbd to support up to
> 16MB writeX calls in the latest git 3.2 tree.
>
> To enable, set the parameter :
>
> min receivefile size = XXX
>
> where XXX is the smallest writeX you want to handle with recvfile.
>
> The linux kernel doesn't yet support zerocopy from network to
> file (ie. splice only works one way currently) so it's emulated
> in userspace (with a 128k staging buffer) for now.
>
> Also it must be an unsigned connection (for obvious reasons).
>
> Once you've set this param smbd will start reporting
> CIFS_UNIX_LARGE_WRITE_CAP on a SMB_QUERY_CIFS_UNIX_INFO:
> call and you should be good to go. You'll need to use
> a writeX call identical to Windows (14 wct with a 1 byte
> pad field) in order to trigger the new code.
>
> Let me know if you get the chance to test it and if
> it makes a speed difference for CIFSFS.
>
> Cheers,
>
> Jeremy.

--
Thanks,

Steve
Re: cramfs in big endian
On Wed, Nov 07, 2007 at 09:51:48PM +0100, Andi Drebes wrote:
> Hi!
>
> > I would suggest you to use squashfs instead of cramfs.
> > First, it's newer, it's better, it's actively developed, it doesn't have any
> > limits like the bad cramfs.
> I'm developing a new linux based firmware for my router which uses cramfs.
> Switching to squashfs still needs some time. Meanwhile, I have to work with
> cramfs. As the router uses the big endian format and as my machine works with
> the little endian format, I'm unable to mount the router's filesystem images.

Making cramfs endianness-independent shouldn't be much work. Take a look
at the helpers in fs/ufs/swab.h and use them for every on-disk access in
cramfs. Drop me a note if you need some help.
Re: Problem with accessing namespace_sem from LSM.
On Thu, Nov 08, 2007 at 07:04:23AM +0900, Tetsuo Handa wrote:
> The reason why I want to access namespace_sem inside security_inode_create()
> is that it doesn't receive "struct vfsmount" parameter.
> If "struct vfsmount" *were* passed to security_inode_create(),
> I have no need to access namespace_sem.

Same argument as with the AA folks: it does not have any business looking
at the vfsmount. If you create a file it can and in many setups will
show up in multiple vfsmounts, so making decisions based on the particular
one this creat happens through is wrong and actually dangerous.
Re: Problem with accessing namespace_sem from LSM.
Hello.

Christoph Hellwig wrote:
> > Isn't security_inode_create() a part of VFS internals?
> It's not. security_inode_create is part of the LSM infrastructure, and
> the actual methods are part of security modules and definitively not
> VFS internals.

The reason why I want to access namespace_sem inside security_inode_create()
is that it doesn't receive a "struct vfsmount" parameter. If "struct
vfsmount" *were* passed to security_inode_create(), I would have no need to
access namespace_sem.

And now, since calling down_read(&namespace_sem) causes a deadlock, I'm
looking for a solution. What you said ("I'd start looking for design bugs
in whatever code you have using it first.") sounds like "never try to
implement pathname-based access control at security_inode_create()", which
makes AppArmor (for OpenSuSE 10.1/10.2) and TOMOYO unable to apply access
control.

At first, I thought that this lockdep warning was a false positive, since
"struct inode" is allocated/freed dynamically. But the warning still
appears even after I disabled freeing memory at destroy_inode() in
fs/inode.c (so that the address of the locking object in "struct inode" is
never reused), so it is likely genuine.

Regards.
Re: Massive slowdown when re-querying large nfs dir - CORRECTION
On Thursday November 8, [EMAIL PROTECTED] wrote:
> > Not really a credible difference as the reported difference is between
> > two *clients* and the speed of getattr vs lookup would depend on the
> > *server*.

Sorry, my bad. I misread your original problem description. It would
appear to be a server difference.

Maybe an "strace -tt" of the nfs server might show some significant
difference.

NeilBrown
Re: Massive slowdown when re-querying large nfs dir
On Wednesday November 7, [EMAIL PROTECTED] wrote:
> Neil Brown wrote:
> > I would suggest getting a 'tcpdump -s0' trace and seeing (with
> > wireshark) what is different between the various cases.
>
> Thanks Neil for looking into this. Your suggestion has already been answered
> in a previous post, where the difference has been attributed to "ls -l"
> inducing lookup for the first try, which is fast, and getattr for later
> tries, which is super-slow.

Not really a credible difference as the reported difference is between
two *clients* and the speed of getattr vs lookup would depend on the
*server*.

> Now it's easy to blame the userland rpc.nfs.V2 server for this, but what's
> not clear is how come 2.4.31 handles getattr faster than 2.6.23?

I suspect a more detailed analysis of the traces is in order. I strongly
suspect you will see a difference between the two clients, and you have
only reported a difference between the first and second "ls -l" (unless I
missed some email).

It seems most likely that 2.6 is issuing substantially more GETATTR
requests than 2.4. There have certainly been reports of this in the past
and they have been either fixed or justified. This may be a new situation.
Or it may be that 2.4 was being fast by being incorrect in some way. Only
an analysis of the logs would tell.

Maybe you would like to post the (binary, using "-s 0") traces for both
2.4 and 2.6.

NeilBrown
Re: cramfs in big endian
Hi!

> I would suggest you to use squashfs instead of cramfs.
> First, it's newer, it's better, it's actively developed, it doesn't have any
> limits like the bad cramfs.

I'm developing a new linux based firmware for my router which uses cramfs.
Switching to squashfs still needs some time. Meanwhile, I have to work with
cramfs. As the router uses the big endian format and as my machine works
with the little endian format, I'm unable to mount the router's filesystem
images.

Andi
Re: [RFC] fs io with struct page instead of iovecs
On Wed, Nov 07, 2007 at 09:02:05AM -0800, Zach Brown wrote:
> Badari Pulavarty wrote:
> > On Tue, 2007-11-06 at 17:43 -0800, Zach Brown wrote:
> >> At the FS meeting at LCE there was some talk of doing O_DIRECT writes
> >> from the kernel with pages instead of with iovecs. T
> >
> > Why ? Whats the use case ?
>
> Well, I think there's a few:
>
> There are existing callers which hold a kmap() across ->write, which
> isn't great. ecryptfs() does this. That's mentioned in the patch
> series. Arguably loopback should be using this instead of copying some
> fs paths and trying to call aop methods directly.
>
> I seem to remember Christoph and David having stories of knfsd folks in
> SGI wanting to do O_DIRECT writes from knfsd? (If not, *I* kind of want
> to, after rolling some patches to align net rx descriptors :)).

The main reason is to remove the serialised writer problem when multiple
clients are writing to the one file. With XFS and direct I/O, we can have
multiple concurrent writers to the one file and have it scale, rather than
be limited to what a single cpu holding the i_mutex can do.

> Lustre shows us that there is a point at which you can't saturate your
> network and storage if your cpu is copying all the data.

Buy more CPUs ;)

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: Problem with accessing namespace_sem from LSM.
On Tue, Nov 06, 2007 at 11:52:40PM +0900, Tetsuo Handa wrote:
> Hello.
>
> Christoph Hellwig wrote:
> > Any code except VFS internals has no business using it at all and doesn't
> > do that in mainline either. I'd start looking for design bugs in whatever
> > code you have using it first.
> Isn't security_inode_create() a part of VFS internals?

It's not. security_inode_create is part of the LSM infrastructure, and
the actual methods are part of security modules and definitively not
VFS internals.
Re: Massive slowdown when re-querying large nfs dir
> On Wed, 7 Nov 2007 12:36:26 +0300 Al Boldi <[EMAIL PROTECTED]> wrote:
> Neil Brown wrote:
> > On Tuesday November 6, [EMAIL PROTECTED] wrote:
> > > > On Tue, 6 Nov 2007 14:28:11 +0300 Al Boldi <[EMAIL PROTECTED]> wrote:
> > > > Al Boldi wrote:
> > > > > There is a massive (3-18x) slowdown when re-querying a large nfs dir
> > > > > (2k+ entries) using a simple ls -l.
> > > > >
> > > > > On 2.6.23 client and server running userland rpc.nfs.V2:
> > > > > first try: time -p ls -l <2k+ entry dir> in ~2.5sec
> > > > > more tries: time -p ls -l <2k+ entry dir> in ~8sec
> > > > >
> > > > > first try: time -p ls -l <5k+ entry dir> in ~9sec
> > > > > more tries: time -p ls -l <5k+ entry dir> in ~180sec
> > > > >
> > > > > On 2.6.23 client and 2.4.31 server running userland rpc.nfs.V2:
> > > > > first try: time -p ls -l <2k+ entry dir> in ~2.5sec
> > > > > more tries: time -p ls -l <2k+ entry dir> in ~7sec
> > > > >
> > > > > first try: time -p ls -l <5k+ entry dir> in ~8sec
> > > > > more tries: time -p ls -l <5k+ entry dir> in ~43sec
> > > > >
> > > > > Remounting the nfs-dir on the client resets the problem.
> > > > >
> > > > > Any ideas?
> > > >
> > > > Ok, I played some more with this, and it turns out that nfsV3 is a lot
> > > > faster. But, this does not explain why the 2.4.31 kernel is still
> > > > over 4-times faster than 2.6.23.
> > > >
> > > > Can anybody explain what's going on?
> > >
> > > Sure, Neil can! ;)
>
> Thanks Andrew!
>
> > Nuh.
> > He said "userland rpc.nfs.Vx". I only do "kernel-land NFS". In these
> > days of high specialisation, each line of code is owned by a different
> > person, and finding the right person is hard.
> >
> > I would suggest getting a 'tcpdump -s0' trace and seeing (with
> > wireshark) what is different between the various cases.
>
> Thanks Neil for looking into this. Your suggestion has already been answered
> in a previous post, where the difference has been attributed to "ls -l"
> inducing lookup for the first try, which is fast, and getattr for later
> tries, which is super-slow.
>
> Now it's easy to blame the userland rpc.nfs.V2 server for this, but what's
> not clear is how come 2.4.31 handles getattr faster than 2.6.23?

We broke 2.6? It'd be interesting to run the ls in an infinite loop on
the client, then start poking at the server. Is the 2.6 server doing
physical IO? Is the 2.6 server consuming more system time? etc. A basic
`vmstat 1' trace for both 2.4 and 2.6 would be a starting point.

Could be that there's some additional latency caused by networking
changes, too. I expect the tcpdump/wireshark/etc traces would have
sufficient resolution for us to be able to see that.
Re: [RFC] fs io with struct page instead of iovecs
Badari Pulavarty wrote:
> On Tue, 2007-11-06 at 17:43 -0800, Zach Brown wrote:
>> At the FS meeting at LCE there was some talk of doing O_DIRECT writes
>> from the kernel with pages instead of with iovecs. T
>
> Why ? Whats the use case ?

Well, I think there's a few:

There are existing callers which hold a kmap() across ->write, which
isn't great. ecryptfs() does this. That's mentioned in the patch
series. Arguably loopback should be using this instead of copying some
fs paths and trying to call aop methods directly.

I seem to remember Christoph and David having stories of knfsd folks in
SGI wanting to do O_DIRECT writes from knfsd? (If not, *I* kind of want
to, after rolling some patches to align net rx descriptors :)).

Lustre shows us that there is a point at which you can't saturate your
network and storage if your cpu is copying all the data.

I'll be the first to admit that the community might not feel a pressing
need to address this for in-kernel file system writers, but the
observation remains.

- z
Re: [RFC] fs io with struct page instead of iovecs
On Tue, 2007-11-06 at 17:43 -0800, Zach Brown wrote:
> At the FS meeting at LCE there was some talk of doing O_DIRECT writes
> from the kernel with pages instead of with iovecs. T

Why ? Whats the use case ?

Thanks,

Badari
Re: [ANN] Squashfs 3.3 released
maximilian attems wrote:
> On Mon, Nov 05, 2007 at 11:13:14AM +, Phillip Lougher wrote:
>> The next stage after this release is to fix the one remaining blocking
>> issue (filesystem endianness), and then try to get Squashfs mainlined
>> into the Linux kernel again.
>
> that would be very cool!

Yes, it would be cool :) Five years is a long time to maintain something
out of tree, especially recently when there's been so many minor changes
to the VFS interface between kernel releases.

> with my hat as debian kernel maintainer i'd be very relieved to see it
> mainlined. i don't know of any major distro that doesn't ship it.

I don't know of any major distro that doesn't ship Squashfs either (except
arguably Slackware). Putting my other hat on (one of the Ubuntu kernel
maintainers), I don't think Squashfs has caused distros that many problems,
because it is an easy patch to apply (it doesn't touch that many kernel
files), but it is always good to minimise the differences from the stock
kernel.org kernel.

Phillip
Re: [ANN] Squashfs 3.3 released
Michael Tokarev wrote:
> A tiny bug[fix] I always forgot to send... In fs/squashfs/inode.c,
> constants TASK_UNINTERRUPTIBLE and TASK_INTERRUPTIBLE are used, but they
> sometimes aren't defined (declared in linux/sched.h):

Thanks - Squashfs gained a lot of #includes over time, many of which I
deemed unnecessary and removed in Squashfs 3.2. I obviously removed too
many. Fix applied to CVS.

Phillip
Re: migratepage failures on reiserfs
On Wed, 2007-11-07 at 14:56 +, Mel Gorman wrote:
> On (05/11/07 14:46), Christoph Lameter didst pronounce:
> > On Mon, 5 Nov 2007, Mel Gorman wrote:
> >
> > > The grow_dev_page() pages should be reclaimable even though migration
> > > is not supported for those pages? They were marked movable as it was
> > > useful for lumpy reclaim taking back pages for hugepage allocations and
> > > the like. Would it make sense for memory unremove to attempt migration
> > > first and reclaim second?
> >
> > Note that a page is still movable even if there is no file system method
> > for migration available. In that case the page needs to be cleaned before
> > it can be moved.
>
> Badari, do you know if the pages failed to migrate because they were
> dirty or because the filesystem simply had ownership of the pages and
> wouldn't let them go?

From the debug, it looks like all the buffers are clean and they have a
b_count == 1. So drop_buffers() fails to release the buffer.

Thanks,

Badari
Re: cramfs in big endian
> I'm currently trying to enable the cramfs to mount filesystems with a
> different endianness.

I would suggest you to use squashfs instead of cramfs.
First, it's newer, it's better, it's actively developed, it doesn't have
any limits like the bad cramfs. Moreover, it currently supports both
endians.

(Hurry up, as kernel people said in the past that squashfs should NEVER
EVER support multiple endians, so the feature will be dropped from
squashfs in order to get it into the mainline kernel more easily; if my
information is correct.)

Tomas M
Re: migratepage failures on reiserfs
On (05/11/07 14:46), Christoph Lameter didst pronounce:
> On Mon, 5 Nov 2007, Mel Gorman wrote:
>
> > The grow_dev_page() pages should be reclaimable even though migration
> > is not supported for those pages? They were marked movable as it was
> > useful for lumpy reclaim taking back pages for hugepage allocations and
> > the like. Would it make sense for memory unremove to attempt migration
> > first and reclaim second?
>
> Note that a page is still movable even if there is no file system method
> for migration available. In that case the page needs to be cleaned before
> it can be moved.

Badari, do you know if the pages failed to migrate because they were
dirty or because the filesystem simply had ownership of the pages and
wouldn't let them go?

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
Re: Massive slowdown when re-querying large nfs dir
Neil Brown wrote:
> On Tuesday November 6, [EMAIL PROTECTED] wrote:
> > > On Tue, 6 Nov 2007 14:28:11 +0300 Al Boldi <[EMAIL PROTECTED]> wrote:
> > > Al Boldi wrote:
> > > > There is a massive (3-18x) slowdown when re-querying a large nfs dir
> > > > (2k+ entries) using a simple ls -l.
> > > >
> > > > On 2.6.23 client and server running userland rpc.nfs.V2:
> > > > first try: time -p ls -l <2k+ entry dir> in ~2.5sec
> > > > more tries: time -p ls -l <2k+ entry dir> in ~8sec
> > > >
> > > > first try: time -p ls -l <5k+ entry dir> in ~9sec
> > > > more tries: time -p ls -l <5k+ entry dir> in ~180sec
> > > >
> > > > On 2.6.23 client and 2.4.31 server running userland rpc.nfs.V2:
> > > > first try: time -p ls -l <2k+ entry dir> in ~2.5sec
> > > > more tries: time -p ls -l <2k+ entry dir> in ~7sec
> > > >
> > > > first try: time -p ls -l <5k+ entry dir> in ~8sec
> > > > more tries: time -p ls -l <5k+ entry dir> in ~43sec
> > > >
> > > > Remounting the nfs-dir on the client resets the problem.
> > > >
> > > > Any ideas?
> > >
> > > Ok, I played some more with this, and it turns out that nfsV3 is a lot
> > > faster. But, this does not explain why the 2.4.31 kernel is still
> > > over 4-times faster than 2.6.23.
> > >
> > > Can anybody explain what's going on?
> >
> > Sure, Neil can! ;)

Thanks Andrew!

> Nuh.
> He said "userland rpc.nfs.Vx". I only do "kernel-land NFS". In these
> days of high specialisation, each line of code is owned by a different
> person, and finding the right person is hard.
>
> I would suggest getting a 'tcpdump -s0' trace and seeing (with
> wireshark) what is different between the various cases.

Thanks Neil for looking into this. Your suggestion has already been
answered in a previous post, where the difference has been attributed to
"ls -l" inducing lookup for the first try, which is fast, and getattr for
later tries, which is super-slow.

Now it's easy to blame the userland rpc.nfs.V2 server for this, but what's
not clear is how come 2.4.31 handles getattr faster than 2.6.23?

Thanks!

--
Al
Re: + embed-a-struct-path-into-struct-nameidata-instead-of-nd-dentrymnt.patch added to -mm tree
Junjiro Okajima, first of all thanks for the feedback on my union mount
patches.

On Tue, Nov 06, [EMAIL PROTECTED] wrote:
> Whiteouts in your code can be a serious memory pressure, since they are
> kept in dcache. I know the inode for whiteouts exists only one and it is
> shared, but dentries for whiteouts are not. They are created for each
> name and resident in dcache.
> I am afraid it can be a problem easily when you create and unlink a
> temporary file many times. Generally their filenames are unique.

The problem that you describe only exists with tmpfs as the topmost union
layer. In all other cases the whiteout dentries can be shrunk like the
dentries of other filetypes too. This is the price you have to pay for
using union mounts, because somewhere this information must be stored.
With ext3 or other disk-based filesystems the whiteouts are stored on disk
like normal files. Therefore the dentry cache can be shrunk and reread by
a lookup.

> Regarding to struct path in nameidata, I have no objection
> basically. But I think it is better to create macros for backward
> compatibility as struct file did.

In the case of f_dentry and f_mnt that was easy because you could use
macros for it. Still, people tend to be lazy and don't change their code
if you don't force them (or do it for them). Anyway, in nameidata we used
dentry and mnt as the field names, therefore it isn't possible to use
macros, except for stuff like ND2DENTRY(nd), which is even worse.