Re: zpool doesn't upgrade - Re: ZFS directory with a large number of files
On Tue, Aug 2, 2011 at 8:59 PM, Ronald Klop ronald-freeb...@klop.yi.org wrote:
> On Tue, 02 Aug 2011 12:55:43 +0200, seanr...@gmail.com wrote:
>> I think this zpool upgrade thing is weird.
>
> Can you try 'zpool upgrade -a'? Mine says:
>
> zpool get version zroot
> NAME   PROPERTY  VALUE  SOURCE
> zroot  version   28     default
>
> Mind the SOURCE=default vs. SOURCE=local. Is it possible you did 'zpool set version=15 tank' in the past? You can check that with 'zpool history'.
>
> NB: if you upgrade the boot pool, don't forget to upgrade the boot loader. (See UPDATING.)

% sudo zpool upgrade -a
Password:
This system is currently running ZFS pool version 15.
All pools are formatted using this version.

I checked zpool history and I never set the version explicitly. My 'world' is from the 8th of March; it's possible my tree is sufficiently old (my kernel was built on the 12th of June; I'm fairly sure it's from the same tree as the world, but it's also possible my kernel and userland have been out of sync for two months). I'll upgrade this machine sometime soon and see if that fixes the issue.

Sean
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: ZFS directory with a large number of files
On 8/2/11 9:39 AM, seanr...@gmail.com wrote:
> Hi there, I Googled around and checked the PRs and wasn't successful in finding any reports of what I'm seeing. I'm hoping someone here can help me debug what's going on.
>
> On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files. As you might imagine, a 'ls' of this directory took quite some time. The files were conveniently named with a timestamp in the filename (still images from a security camera, once per second) so I've since moved them all to timestamped directories (/MM/dd/hh/mm). What I found though was the original directory the images were in is still very slow to ls -- and it only has 1 file in it, another directory.

While not addressing your original question, which many people have already, I'll toss in the following: I do hope you've disabled access times on your ZFS dataset?

zfs set atime=off YOUR_DATASET/supercamera/captures
Re: zpool doesn't upgrade - Re: ZFS directory with a large number of files
On Tue, 09 Aug 2011 10:38:01 +0200, seanr...@gmail.com wrote:
> On Tue, Aug 2, 2011 at 8:59 PM, Ronald Klop ronald-freeb...@klop.yi.org wrote:
>> On Tue, 02 Aug 2011 12:55:43 +0200, seanr...@gmail.com wrote:
>>> I think this zpool upgrade thing is weird.
>>
>> Can you try 'zpool upgrade -a'? Mine says:
>>
>> zpool get version zroot
>> NAME   PROPERTY  VALUE  SOURCE
>> zroot  version   28     default
>>
>> Mind the SOURCE=default vs. SOURCE=local. Is it possible you did 'zpool set version=15 tank' in the past? You can check that with 'zpool history'.
>>
>> NB: if you upgrade the boot pool, don't forget to upgrade the boot loader. (See UPDATING.)
>
> % sudo zpool upgrade -a
> Password:
> This system is currently running ZFS pool version 15.
> All pools are formatted using this version.
>
> I checked zpool history and I never set the version explicitly. My 'world' is from the 8th of March; it's possible my tree is sufficiently old (my kernel was built on the 12th of June; I'm fairly sure it's from the same tree as the world, but it's also possible my kernel and userland have been out of sync for 2 months). I'll upgrade this machine sometime soon and see if that fixes the issue.
>
> Sean

You can set the property to 28 and upgrade after that.

zpool set version=28 zroot
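Put together, the upgrade sequence discussed above might look like the following session sketch. The pool name `zroot`, disk `ada0`, and freebsd-boot partition index 1 are assumptions for illustration; check UPDATING before touching the boot blocks, since boot code older than the pool version will fail to boot.

```
% sudo zpool set version=28 zroot     # or simply: sudo zpool upgrade zroot
% sudo zpool upgrade -a               # upgrade all imported pools
% sudo zpool get version zroot        # verify the new version stuck
# If zroot is the boot pool on GPT disks, reinstall the boot code too:
% sudo gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
```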
Re: ZFS directory with a large number of files
On 06.08.11 09:24, Gary Palmer wrote:
> It's been quite a while since I worked on the filesystem stuff in any detail but I believe, at least for UFS, it doesn't GC the directory, just truncate it if enough of the entries at the end are deleted to free up at least one fragment or block.

This was my point indeed. If you empty a directory or remove files from the end of the directory, it is truncated; this is not really a GC, but rather a shortcut. I guess the reason why it does not use GC is concurrency/locking. Or maybe the code was just not written yet. But with ZFS this should be much easier to implement. If it is the same in Solaris, then it has not been done so far... But then, the promise made by ZFS is to provide constant directory access times.

I am just wondering... does implementing such garbage collection merit a new ZFS filesystem version?

Daniel
Re: ZFS directory with a large number of files
On Aug 6, 2011, at 07:24, Gary Palmer wrote:
> On Fri, Aug 05, 2011 at 08:56:36PM -0700, Doug Barton wrote:
>> On 08/05/2011 20:38, Daniel O'Connor wrote:
>>> Ahh, but OP had moved these files away and performance was still poor.. _that_ is the bug.
>>
>> I'm no file system expert, but it seems to me the key questions are: how long does it take the system to recover from this condition, and if it's more than N $periods is that a problem? We can't stop users from doing wacky stuff, but the system should be robust in the face of this.
>
> It's been quite a while since I worked on the filesystem stuff in any detail, but I believe, at least for UFS, it doesn't GC the directory, just truncates it if enough of the entries at the end are deleted to free up at least one fragment or block. If you create N files and then a directory and move the N files into the directory, the directory entry will still be N+1 records into the directory, and the only way to recover is to recreate the directory that formerly contained the N files. It is theoretically possible to compact the directory, but since the code to do that wasn't written when I last worked with UFS, I suspect it's non-trivial. I don't know what ZFS does in this situation.

It sounds like it does something similar. I re-ran the experiment to see if I could narrow down the problem.

% mkdir foo
% cd foo
% for i in {1..1000}; do touch $i; done
% ls > list
% for file in $(cat list); do rm -f $file; done
% time ls
(slow!)
% rm -f list
% time ls
(slow!)

I would like to dig into this a bit more; I suppose it's probably a good enough reason to explore how DTrace works :)

Sean
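The experiment above is easy to package as a self-contained sketch for anyone who wants to reproduce it (1000 files are used here for speed; the original report involved roughly 2 million, and the scratch directory should of course live on the dataset under test):

```shell
#!/bin/sh
# Fill a directory with many files, delete them all, then list the
# now-empty directory. On an affected ZFS dataset the final ls stays
# slow even though the directory contains nothing.
set -e
dir=$(mktemp -d)   # scratch area; point this at the filesystem under test
mkdir "$dir/foo"
i=1
while [ "$i" -le 1000 ]; do
    touch "$dir/foo/$i"
    i=$((i + 1))
done
ls "$dir/foo" > "$dir/list"          # capture the file names
while read -r f; do
    rm -f "$dir/foo/$f"              # delete every file again
done < "$dir/list"
ls "$dir/foo"                        # time this ls on the affected dataset
rm -rf "$dir"
```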
Re: ZFS directory with a large number of files
On Sun, Aug 7, 2011 at 10:20 AM, Sean Rees seanr...@gmail.com wrote:
> On Aug 6, 2011, at 07:24, Gary Palmer wrote:
>> On Fri, Aug 05, 2011 at 08:56:36PM -0700, Doug Barton wrote:
>>> On 08/05/2011 20:38, Daniel O'Connor wrote:
>>>> Ahh, but OP had moved these files away and performance was still poor.. _that_ is the bug.
>>>
>>> I'm no file system expert, but it seems to me the key questions are: how long does it take the system to recover from this condition, and if it's more than N $periods is that a problem? We can't stop users from doing wacky stuff, but the system should be robust in the face of this.
>>
>> It's been quite a while since I worked on the filesystem stuff in any detail, but I believe, at least for UFS, it doesn't GC the directory, just truncates it if enough of the entries at the end are deleted to free up at least one fragment or block. If you create N files and then a directory and move the N files into the directory, the directory entry will still be N+1 records into the directory, and the only way to recover is to recreate the directory that formerly contained the N files. It is theoretically possible to compact the directory, but since the code to do that wasn't written when I last worked with UFS, I suspect it's non-trivial. I don't know what ZFS does in this situation.
>
> It sounds like it does something similar. I re-ran the experiment to see if I could narrow down the problem.
>
> % mkdir foo
> % cd foo
> % for i in {1..1000}; do touch $i; done

Self-pedant mode enabled: for i in {1..100} :)

I truncated the zeros in correcting the copy/paste from my shell :)

Sean
Re: ZFS directory with a large number of files
On Fri, Aug 05, 2011 at 08:56:36PM -0700, Doug Barton wrote:
> On 08/05/2011 20:38, Daniel O'Connor wrote:
>> Ahh, but OP had moved these files away and performance was still poor.. _that_ is the bug.
>
> I'm no file system expert, but it seems to me the key questions are: how long does it take the system to recover from this condition, and if it's more than N $periods is that a problem? We can't stop users from doing wacky stuff, but the system should be robust in the face of this.

It's been quite a while since I worked on the filesystem stuff in any detail, but I believe, at least for UFS, it doesn't GC the directory, just truncates it if enough of the entries at the end are deleted to free up at least one fragment or block. If you create N files and then a directory and move the N files into the directory, the directory entry will still be N+1 records into the directory, and the only way to recover is to recreate the directory that formerly contained the N files. It is theoretically possible to compact the directory, but since the code to do that wasn't written when I last worked with UFS, I suspect it's non-trivial. I don't know what ZFS does in this situation.

Gary
Re: ZFS directory with a large number of files
Daniel Kalchev dan...@digsys.bg wrote:
> On 02.08.11 12:46, Daniel O'Connor wrote:
>> I am pretty sure UFS does not have this problem. i.e. once you delete/move the files out of the directory its performance would be good again.
>
> UFS would be the classic example of poor performance if you do this.

Classic indeed. UFS dirhash has pretty much taken care of this a decade ago.

-- 
Christian "naddy" Weisgerber na...@mips.inka.de
Re: ZFS directory with a large number of files
Hi, all,

Am 05.08.2011 um 17:12 schrieb Christian Weisgerber:
> Daniel Kalchev dan...@digsys.bg wrote:
>> On 02.08.11 12:46, Daniel O'Connor wrote:
>>> I am pretty sure UFS does not have this problem. i.e. once you delete/move the files out of the directory its performance would be good again.
>>
>> UFS would be the classic example of poor performance if you do this.
>
> Classic indeed. UFS dirhash has pretty much taken care of this a decade ago.

While dirhash is quite an improvement, it is definitely no silver bullet. When I asked Kirk McKusick at last year's EuroBSDCon if having a six-figure number of files in a single directory was a clever idea (I had just had a customer who ran into that situation), he just smiled and shook his head.

The directory in question was the typo3temp/pics directory that TYPO3 uses to scale images that are part of the website, so they are handed to the browser in exactly the size they are supposed to be rendered at. The performance impact was quite heavy, because at some point requests started to pile up, PHP scripts did not finish in time, fcgi slots stayed used ... most of you will know that scenario. At some threshold a machine goes from "loaded, maybe a bit slow, but generally responsive" to "no f*ing way".

Best regards, Patrick
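A common way out of the six-figure flat-directory trap is to bucket files into a small fixed hierarchy derived from a hash of the name, so no single directory grows unboundedly. A minimal sketch; the `store` helper and the two-level 100x100 layout are illustrative choices, not anything TYPO3 or the thread prescribes:

```shell
#!/bin/sh
# Place each file into base/<d1>/<d2>/ where d1 and d2 are derived from
# a CRC of the file name, giving at most 100*100 buckets that each stay
# small regardless of the total file count.
set -e
store() {
    base=$1
    file=$2
    sum=$(printf '%s' "$(basename "$file")" | cksum | awk '{print $1}')
    d1=$((sum % 100))
    d2=$((sum / 100 % 100))
    mkdir -p "$base/$d1/$d2"
    mv "$file" "$base/$d1/$d2/"
}
work=$(mktemp -d)
touch "$work/example.jpg"          # dummy file for demonstration
store "$work/pics" "$work/example.jpg"
find "$work/pics" -type f          # shows the bucketed location
```

Lookups use the same hash, so the cost of finding a file stays constant as the collection grows.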
Re: ZFS directory with a large number of files
On 06/08/2011, at 5:17, Patrick M. Hausen wrote:
> Am 05.08.2011 um 17:12 schrieb Christian Weisgerber:
>> Daniel Kalchev dan...@digsys.bg wrote:
>>> On 02.08.11 12:46, Daniel O'Connor wrote:
>>>> I am pretty sure UFS does not have this problem. i.e. once you delete/move the files out of the directory its performance would be good again.
>>>
>>> UFS would be the classic example of poor performance if you do this.
>>
>> Classic indeed. UFS dirhash has pretty much taken care of this a decade ago.
>
> While dirhash is quite an improvement, it is definitely no silver bullet. When I asked Kirk McKusick at last year's EuroBSDCon if having a six-figure number of files in a single directory was a clever idea (I just had a customer who ran into that situation), he just smiled and shook his head.

Ahh, but OP had moved these files away and performance was still poor.. _that_ is the bug.

-- 
Daniel O'Connor
software and network engineer for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
Re: ZFS directory with a large number of files
On 08/05/2011 20:38, Daniel O'Connor wrote:
> Ahh, but OP had moved these files away and performance was still poor.. _that_ is the bug.

I'm no file system expert, but it seems to me the key questions are: how long does it take the system to recover from this condition, and if it's more than N $periods is that a problem? We can't stop users from doing wacky stuff, but the system should be robust in the face of this.

-- 
Nothin' ever doesn't change, but nothin' changes much. -- OK Go
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: ZFS directory with a large number of files
On 08/05/11 21:47, Patrick M. Hausen wrote:
> While dirhash is quite an improvement, it is definitely no silver bullet. When I asked Kirk McKusick at last year's EuroBSDCon if having a six-figure number of files in a single directory was a clever idea (I just had a customer who ran into that situation), he just smiled and shook his head.
>
> The directory in question was the typo3temp/pics directory that TYPO3 uses to scale images that are part of the website, so they are handed to the browser in exactly the size they are supposed to be rendered at. The performance impact was quite heavy, because at some point requests started to pile up, PHP scripts did not finish in time, fcgi slots stayed used ... most of you will know that scenario. At some threshold a machine goes from "loaded, maybe a bit slow, but generally responsive" to "no f*ing way".

I have a similar situation here, but with numerical simulation software, which dumps, for each timestep of the integration, a file of all integrated objects. Since the code is adopted and not very cleverly written in terms of doing its I/O, I have to deal with this. Performing dynamical high-resolution integrations of several hundred thousand objects over a time scale of 1 Ga produces a lot of files, even with a larger dump delta. Making the files bigger results in files that are hard to analyse, so it's a tradeoff. ZFS and UFS2 both perform badly in this situation, UFS2 even more so than ZFS, but ZFS is still a pain in the ass.

Oliver
Re: ZFS directory with a large number of files
On 2011-Aug-02 08:39:03 +0100, seanr...@gmail.com wrote:
> On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files. As you might imagine, a 'ls' of this directory took quite some time. The files were conveniently named with a timestamp in the filename (still images from a security camera, once per second) so I've since moved them all to timestamped directories (/MM/dd/hh/mm). What I found though was the original directory the images were in is still very slow to ls -- and it only has 1 file in it, another directory.

I've also seen this behaviour on Solaris 10 after cleaning out a directory with a large number of files (though not as pathological as your case). I tried creating and deleting entries in an unsuccessful effort to trigger directory compaction. I wound up moving the remaining contents into a new directory, deleting the original one and renaming the new directory. It would appear to be a garbage collection bug in ZFS.

On 2011-Aug-02 13:10:27 +0300, Daniel Kalchev dan...@digsys.bg wrote:
> On 02.08.11 12:46, Daniel O'Connor wrote:
>> I am pretty sure UFS does not have this problem. i.e. once you delete/move the files out of the directory its performance would be good again.
>
> UFS would be the classic example of poor performance if you do this.

Traditional UFS (including Solaris) behaves badly in this scenario, but 4.4BSD derivatives will release unused space at the end of a directory and have smarts to more efficiently skip unused entries at the start of a directory.

-- 
Peter Jeremy
Re: ZFS directory with a large number of files
Not an in-depth solution for ZFS, but maybe a solution for you:

mkdir images2
mv images/* images2
rmdir images

Ronald.

On Tue, 02 Aug 2011 09:39:03 +0200, seanr...@gmail.com wrote:
> Hi there, I Googled around and checked the PRs and wasn't successful in finding any reports of what I'm seeing. I'm hoping someone here can help me debug what's going on.
>
> On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files. As you might imagine, a 'ls' of this directory took quite some time. The files were conveniently named with a timestamp in the filename (still images from a security camera, once per second) so I've since moved them all to timestamped directories (/MM/dd/hh/mm). What I found though was the original directory the images were in is still very slow to ls -- and it only has 1 file in it, another directory.
>
> To clarify:
>
> % ls second
> [lots of time and many many files enumerated]
> % # rename files using rename script
> % ls second
> [wait ages]
> 2011 dead
> % mkdir second2
> % mv second/2011 second2
> % ls second2
> [fast!]
> 2011
> % ls second
> [still very slow]
> dead
> % time ls second
> dead/
> gls -F --color  0.00s user 1.56s system 0% cpu 3:09.61 total
>
> (timings are similar for /bin/ls)
>
> This data is stored on a striped ZFS pool (version 15, though the kernel reports version 28 is available but zpool upgrade seems to disagree), 2T in size. I've run zpool scrub with no effect. ZFS is busily driving the disks away; my iostat monitoring has all three drives in the zpool running at 40-60% busy for the duration of the ls (it was quiet before).
>
> I've attached truss to the ls process. It spends a lot of time here:
>
> fstatfs(0x5,0x7fffe0d0,0x800ad5548,0x7fffdfd8,0x0,0x0) = 0 (0x0)
>
> I'm thinking there's some old ZFS metadata that it's looking into, but I'm not sure how to best dig into this to understand what's going on under the hood. Can anyone perhaps point me the right direction on this?
>
> Thanks, Sean
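Ronald's directory swap works on any filesystem; here it is as a runnable sketch. The `images` name comes from his example, and the `2011` entry stands in for whatever remains in the slow directory:

```shell
#!/bin/sh
# Replace a formerly huge, now-slow directory with a fresh one: move the
# remaining contents out, remove the old directory, and rename the new
# one into place so paths stay the same.
set -e
base=$(mktemp -d)
mkdir "$base/images"
touch "$base/images/2011"            # stand-in for the remaining content
mkdir "$base/images2"
mv "$base/images"/* "$base/images2"/
rmdir "$base/images"                 # succeeds only if images is truly empty
mv "$base/images2" "$base/images"    # keep the original path
```

This sidesteps the slow directory entirely; nothing keeps referencing the bloated directory object, so it can be freed.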
Re: ZFS directory with a large number of files
On Tue, Aug 02, 2011 at 08:39:03AM +0100, seanr...@gmail.com wrote:
> On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files.

I'll keep this real simple: Why did you do this? I hope this was a stress test of some kind. If not:

This is the 2nd or 3rd mail in recent months from people saying "I decided to do something utterly stupid with my filesystem[1] and now I'm asking why performance sucks." Why can people not create proper directory tree layouts to avoid this problem regardless of what filesystem is used? I just don't get it.

[1]: Applies to any filesystem, not just ZFS. There was a UFS one a month or two ago too...

-- 
| Jeremy Chadwick                               jdc at parodius.com |
| Parodius Networking                      http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, US |
| Making life hard for others since 1977.              PGP 4BD6C0CB |
Re: ZFS directory with a large number of files
(inline)

On Tue, Aug 2, 2011 at 10:08 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:
> On Tue, Aug 02, 2011 at 08:39:03AM +0100, seanr...@gmail.com wrote:
>> On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files.
>
> I'll keep this real simple: Why did you do this? I hope this was a stress test of some kind. If not:

Not really, but it turned into one.

The camera I was using had the ability (rather handily) to upload a still image once per second via FTP to a server of my choosing. It didn't have the ability to organize them for me in a neat directory hierarchy. So on holidays I went for 3 weeks and came back to ~2M images in the same directory.

> This is the 2nd or 3rd mail in recent months from people saying "I decided to do something utterly stupid with my filesystem[1] and now I'm asking why performance sucks." Why can people not create proper directory tree layouts to avoid this problem regardless of what filesystem is used? I just don't get it.

I'm not sure it's utterly stupid; I didn't expect legendarily fast performance from 'ls' or anything else that enumerated the contents of the directory when all the files were there. Now that the files are neatly organized, I expected fstatfs() on the directory to become fast again. It isn't. I'd like to understand why (or maybe learn a new trick or two about inspecting ZFS...)

Sean
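Since the camera can't build a hierarchy itself, the receiving server can do it after the fact. A sketch that files flat stills into year/month/day/hour/minute directories, assuming names that start with a YYYYMMDDhhmmss timestamp; that naming scheme is a guess, so adjust the character offsets to the camera's actual convention:

```shell
#!/bin/sh
# Move flat, timestamp-named stills into Y/M/d/h/m subdirectories so no
# single directory ever holds more than one minute's worth of images.
set -e
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/20110802123456.jpg"     # dummy still for demonstration
for f in "$src"/*.jpg; do
    name=$(basename "$f")
    ts=${name%.jpg}
    y=$(printf '%s' "$ts" | cut -c1-4)
    mo=$(printf '%s' "$ts" | cut -c5-6)
    d=$(printf '%s' "$ts" | cut -c7-8)
    h=$(printf '%s' "$ts" | cut -c9-10)
    mi=$(printf '%s' "$ts" | cut -c11-12)
    mkdir -p "$dst/$y/$mo/$d/$h/$mi"
    mv "$f" "$dst/$y/$mo/$d/$h/$mi/"
done
```

Run periodically (say, from cron against the FTP upload directory), this caps every leaf directory at 60 files.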
Re: ZFS directory with a large number of files
On Tue, Aug 2, 2011 at 9:39 AM, seanr...@gmail.com wrote:
> On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files. As you might imagine, a 'ls' of this directory took quite some time.

What actually takes some time here isn't ZFS, but the sorting done by ls(1). Usually, running ls(1) with -f ("output is not sorted") speeds things up enormously.

> The files were conveniently named with a timestamp in the filename (still images from a security camera, once per second) so I've since moved them all to timestamped directories (/MM/dd/hh/mm). What I found though was the original directory the images were in is still very slow to ls -- and it only has 1 file in it, another directory.

That is strange... and shouldn't happen. According to the ZFS Performance Wiki [1], operations on ZFS file systems are supposed to be pretty efficient:

  Concurrent, constant-time directory operations. Large directories need constant-time operations (lookup, create, delete, etc.). Hot directories need concurrent operations. ZFS uses extensible hashing to solve this: block-based, amortized growth cost, short chains for constant-time ops, per-block locking for high concurrency. A caveat is that readdir returns entries in hash order.

  Directories are implemented via the ZFS Attribute Processor (ZAP), which can be used to store arbitrary name/value pairs. ZAP uses two algorithms, optimized for large lists (large directories) and small lists (attribute lists) respectively. The ZAP implementation is in zap.c and zap_leaf.c. Each directory is maintained as a table of pointers to constant-sized buckets holding a variable number of entries. Each directory record is 16k in size; when this block gets full, a new block of the next power-of-two size is allocated.

  A directory starts off as a microzap and is upgraded to a fatzap (via mzap_upgrade) if the size of a name exceeds MZAP_NAME_LEN (MZAP_ENT_LEN - 8 - 4 - 2, i.e. 50) or if the size of the microzap exceeds MZAP_MAX_BLKSZ (128k).

[1]: http://www.solarisinternals.com/wiki/index.php/ZFS_Performance

I don't know what's going on there, but someone with ZFS internals expertise may want to have a closer look.

> This data is stored on a striped ZFS pool (version 15, though the kernel reports version 28 is available but zpool upgrade seems to disagree), 2T in size. I've run zpool scrub with no effect. ZFS is busily driving the disks away; my iostat monitoring has all three drives in the zpool running at 40-60% busy for the duration of the ls (it was quiet before).
>
> I've attached truss to the ls process. It spends a lot of time here:
>
> fstatfs(0x5,0x7fffe0d0,0x800ad5548,0x7fffdfd8,0x0,0x0) = 0 (0x0)

That's a very good hint indeed!

> I'm thinking there's some old ZFS metadata that it's looking into, but I'm not sure how to best dig into this to understand what's going on under the hood. Can anyone perhaps point me the right direction on this?
>
> Thanks, Sean

Regards,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
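The -f tip is easy to try for yourself; a tiny sketch (the three file names are arbitrary, and note that GNU ls -f also implies -a, so dot entries may appear in its output):

```shell
#!/bin/sh
# Compare sorted and unsorted directory listings. With -f, ls skips
# sorting and returns entries in directory order, which is much cheaper
# in very large directories.
set -e
dir=$(mktemp -d)
cd "$dir"
touch c a b
ls        # sorted by name: a b c
ls -f     # directory order; far faster on huge directories
```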
Re: ZFS directory with a large number of files
On Tue, Aug 02, 2011 at 10:16:35AM +0100, seanr...@gmail.com wrote:
> On Tue, Aug 2, 2011 at 10:08 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:
>> On Tue, Aug 02, 2011 at 08:39:03AM +0100, seanr...@gmail.com wrote:
>>> On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files.
>>
>> I'll keep this real simple: Why did you do this? I hope this was a stress test of some kind. If not:
>
> Not really, but it turned into one. The camera I was using had the ability (rather handily) to upload a still image once per second via FTP to a server of my choosing. It didn't have the ability to organize them for me in a neat directory hierarchy. So on holidays I went for 3 weeks and came back to ~2M images in the same directory.

I equate this to the following conversation with a doctor: "I shoved 2 million cotton balls into my ear, and now my hearing is sub-par." "Why did you do this?" In this situation, the correct reply to the physician is: "because I wasn't thinking, doc."

I suppose the reason I'm being so brash is that people doing this always brings into question the thought process that went into the decision to do said thing (both on your part, as well as the engineers of your camera who thought such a feature would be a wise choice, especially with an interval of 1 picture per second[1]).

When I was being taught the ropes of system administration at Oregon State, the team of crotchety UNIX admins there made it quite clear that there were things you just Did Not Do(tm) to computer systems. Shoving thousands of files into a single directory with no hierarchy was one of them. Again I will point this out: it doesn't matter what filesystem is used; they all will suffer (just to different degrees) in this situation.

This is just one of those things where I have to say: Please Don't Do This(tm). At least this is an educational experience for you.

>> This is the 2nd or 3rd mail in recent months from people saying "I decided to do something utterly stupid with my filesystem[1] and now I'm asking why performance sucks." Why can people not create proper directory tree layouts to avoid this problem regardless of what filesystem is used? I just don't get it.
>
> I'm not sure it's utterly stupid; I didn't expect legendarily fast performance from 'ls' or anything else that enumerated the contents of the directory when all the files were there.

You shouldn't expect any kind of performance, at all, on any filesystem, when/if you do this. The effects can linger for quite some time. I can't come up with a good analogy at the moment.

> Now that the files are neatly organized, I expected fstatfs() on the directory to become fast again. It isn't. I'd like to understand why (or maybe learn a new trick or two about inspecting ZFS...)

Ronald's recommendation should address the problem, or at least diminish it.

[1]: I would also strongly advocate contacting your camera manufacturer and asking them to add some extremely simple/basic code for adding a directory hierarchy when the images are put on an FTP server. The amount of code is extremely nominal.

-- 
| Jeremy Chadwick                               jdc at parodius.com |
| Parodius Networking                      http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, US |
| Making life hard for others since 1977.              PGP 4BD6C0CB |
Re: ZFS directory with a large number of files
On 02/08/2011, at 19:12, Jeremy Chadwick wrote:
> When I was being taught the ropes of system administration at Oregon State, the team of crotchety UNIX admins there made it quite clear that there were things you just Did Not Do(tm) to computer systems. Shoving thousands of files into a single directory with no hierarchy was one of them.

Sounds like a terminal case of Stockholm syndrome ;) It might be avoidable by the user being nice to the computer, but come on.. the computer is supposed to do the tedious crap that humans don't like.

I am pretty sure UFS does not have this problem, i.e. once you delete/move the files out of the directory its performance would be good again.

If it is a limitation in ZFS it would be nice to know; perhaps it truly, really is a bug that can be avoided (or it's inherent in the way ZFS handles such things).

-- 
Daniel O'Connor
software and network engineer for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
Re: ZFS directory with a large number of files
On 02/08/2011, at 18:38, Jeremy Chadwick wrote: On Tue, Aug 02, 2011 at 08:39:03AM +0100, seanr...@gmail.com wrote: On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files. I'll keep this real simple: Why did you do this? I hope this was a stress test of some kind. If not: This is the 2nd or 3rd mail in recent months from people saying I decided to do something utterly stupid with my filesystem[1] and now I'm asking why performance sucks. Why can people not create proper directory tree layouts to avoid this problem regardless of what filesystem is used? I just don't get it. [1]: Applies to any filesystem, not just ZFS. There was a UFS one a month or two ago too… The problem is that he is being punished with shitty FS performance even though the directory structure is now non-silly. It sounds like the FS hasn't GC'd some (now unneeded) metadata.. -- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au The nice thing about standards is that there are so many of them to choose from. -- Andrew Tanenbaum GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
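For context, the restructuring Sean describes (moving the flat capture directory into a /MM/dd/hh/mm hierarchy) can be sketched roughly like this. The filename pattern `cam-YYYYMMDD-hhmmss.jpg` is an assumption for illustration; the camera's actual naming scheme wasn't posted.

```python
import os
import re
import shutil

# Assumed filename shape, e.g. "cam-20110715-093041.jpg" (hypothetical).
STAMP = re.compile(r"(\d{4})(\d{2})(\d{2})-(\d{2})(\d{2})(\d{2})")

def target_dir(filename):
    """Map a timestamped filename to a YYYY/MM/dd/hh/mm directory path."""
    m = STAMP.search(filename)
    if m is None:
        return None
    yyyy, mm, dd, hh, mi, _ss = m.groups()
    return os.path.join(yyyy, mm, dd, hh, mi)

def reorganize(src_dir, dst_root):
    """Move every matching file out of the flat directory into the hierarchy."""
    for name in os.listdir(src_dir):
        sub = target_dir(name)
        if sub is None:
            continue
        dest = os.path.join(dst_root, sub)
        os.makedirs(dest, exist_ok=True)
        shutil.move(os.path.join(src_dir, name), os.path.join(dest, name))
```

With one file per second, each leaf directory ends up holding at most 60 entries, which keeps every single lookup and listing cheap regardless of filesystem.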
Re: ZFS directory with a large number of files
On 02.08.11 12:46, Daniel O'Connor wrote: I am pretty sure UFS does not have this problem. i.e. once you delete/move the files out of the directory its performance would be good again. UFS would be the classic example of poor performance if you do this. If it is a limitation in ZFS it would be nice to know that, perhaps it truly, really is a bug that can be avoided (or it's inherent in the way ZFS handles such things) It is possible that there is not enough memory in ARC to cache that large directory. Other than that, perhaps in ZFS it would be easier to prune the unused directory entries than it is in UFS. It looks like this is not implemented. Another reason might be some FreeBSD specific implementation issue for fstatfs. In any case, the data available is not sufficient. More information would help, like how much RAM this system has, how much ARC uses, some ARC stats. What made me wonder is .. how exactly the kernel and zpool disagree on zpool version? What is the pool version in fact? Daniel
Re: ZFS directory with a large number of files
On Tue, Aug 2, 2011 at 11:10 AM, Daniel Kalchev dan...@digsys.bg wrote: If it is a limitation in ZFS it would be nice to know that, perhaps it truly, really is a bug that can be avoided (or it's inherent in the way ZFS handles such things) It is possible that there is not enough memory in ARC to cache that large directory. Other than that, perhaps in ZFS it would be easier to prune the unused directory entries, than it is in UFS. It looks like this is not implemented. Another reason might be some FreeBSD specific implementation issue for fstatfs. In any case, the data available is not sufficient. More information would help, like how much RAM this system has, how much ARC uses, some ARC stats. Which sysctl's would you like? I grabbed these to start: kstat.zfs.misc.arcstats.size: 118859656 kstat.zfs.misc.arcstats.hdr_size: 3764416 kstat.zfs.misc.arcstats.data_size: 53514240 kstat.zfs.misc.arcstats.other_size: 61581000 kstat.zfs.misc.arcstats.hits: 46762467 kstat.zfs.misc.arcstats.misses: 1607 The machine has 2GB of memory. What made me wonder is .. how exactly the kernel and zpool disagree on zpool version? What is the pool version in fact? % dmesg | grep ZFS ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present; to enable, add vfs.zfs.prefetch_disable=0 to /boot/loader.conf. ZFS filesystem version 5 ZFS storage pool version 28 % zpool get version tank NAME PROPERTY VALUE SOURCE tank version 15 local % zpool upgrade tank This system is currently running ZFS pool version 15. Pool 'tank' is already formatted using the current version. Sean
Re: ZFS directory with a large number of files
On Tue, Aug 02, 2011 at 11:55:43AM +0100, seanr...@gmail.com wrote: On Tue, Aug 2, 2011 at 11:10 AM, Daniel Kalchev dan...@digsys.bg wrote: If it is a limitation in ZFS it would be nice to know that, perhaps it truly, really is a bug that can be avoided (or it's inherent in the way ZFS handles such things) It is possible that there is not enough memory in ARC to cache that large directory. Other than that, perhaps in ZFS it would be easier to prune the unused directory entries, than it is in UFS. It looks like this is not implemented. Another reason might be some FreeBSD specific implementation issue for fstatfs. In any case, the data available is not sufficient. More information would help, like how much RAM this system has, how much ARC uses, some ARC stats. Which sysctl's would you like? Output from sysctl vfs.zfs kstat.zfs would be sufficient. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: ZFS directory with a large number of files
On Tue, Aug 2, 2011 at 12:07 PM, Jeremy Chadwick free...@jdc.parodius.com wrote: On Tue, Aug 02, 2011 at 11:55:43AM +0100, seanr...@gmail.com wrote: On Tue, Aug 2, 2011 at 11:10 AM, Daniel Kalchev dan...@digsys.bg wrote: If it is a limitation in ZFS it would be nice to know that, perhaps it truly, really is a bug that can be avoided (or it's inherent in the way ZFS handles such things) It is possible that there is not enough memory in ARC to cache that large directory. Other than that, perhaps in ZFS it would be easier to prune the unused directory entries, than it is in UFS. It looks like this is not implemented. Another reason might be some FreeBSD specific implementation issue for fstatfs. In any case, the data available is not sufficient. More information would help, like how much RAM this system has, how much ARC uses, some ARC stats. Which sysctl's would you like? Output from sysctl vfs.zfs kstat.zfs would be sufficient. Here we are: vfs.zfs.l2c_only_size: 0 vfs.zfs.mfu_ghost_data_lsize: 0 vfs.zfs.mfu_ghost_metadata_lsize: 26383360 vfs.zfs.mfu_ghost_size: 26383360 vfs.zfs.mfu_data_lsize: 0 vfs.zfs.mfu_metadata_lsize: 154112 vfs.zfs.mfu_size: 3944960 vfs.zfs.mru_ghost_data_lsize: 0 vfs.zfs.mru_ghost_metadata_lsize: 76250624 vfs.zfs.mru_ghost_size: 76250624 vfs.zfs.mru_data_lsize: 30208 vfs.zfs.mru_metadata_lsize: 16896 vfs.zfs.mru_size: 29353984 vfs.zfs.anon_data_lsize: 0 vfs.zfs.anon_metadata_lsize: 0 vfs.zfs.anon_size: 150016 vfs.zfs.l2arc_norw: 1 vfs.zfs.l2arc_feed_again: 1 vfs.zfs.l2arc_noprefetch: 1 vfs.zfs.l2arc_feed_min_ms: 200 vfs.zfs.l2arc_feed_secs: 1 vfs.zfs.l2arc_headroom: 2 vfs.zfs.l2arc_write_boost: 8388608 vfs.zfs.l2arc_write_max: 8388608 vfs.zfs.arc_meta_limit: 26214400 vfs.zfs.arc_meta_used: 108539456 vfs.zfs.arc_min: 33554432 vfs.zfs.arc_max: 104857600 vfs.zfs.dedup.prefetch: 1 vfs.zfs.mdcomp_disable: 0 vfs.zfs.write_limit_override: 0 vfs.zfs.write_limit_inflated: 6360993792 vfs.zfs.write_limit_max: 265041408 vfs.zfs.write_limit_min: 
33554432 vfs.zfs.write_limit_shift: 3 vfs.zfs.no_write_throttle: 0 vfs.zfs.zfetch.array_rd_sz: 1048576 vfs.zfs.zfetch.block_cap: 256 vfs.zfs.zfetch.min_sec_reap: 2 vfs.zfs.zfetch.max_streams: 8 vfs.zfs.prefetch_disable: 1 vfs.zfs.check_hostid: 1 vfs.zfs.recover: 0 vfs.zfs.txg.synctime_ms: 1000 vfs.zfs.txg.timeout: 5 vfs.zfs.scrub_limit: 10 vfs.zfs.vdev.cache.bshift: 16 vfs.zfs.vdev.cache.size: 10485760 vfs.zfs.vdev.cache.max: 16384 vfs.zfs.vdev.write_gap_limit: 4096 vfs.zfs.vdev.read_gap_limit: 32768 vfs.zfs.vdev.aggregation_limit: 131072 vfs.zfs.vdev.ramp_rate: 2 vfs.zfs.vdev.time_shift: 6 vfs.zfs.vdev.min_pending: 4 vfs.zfs.vdev.max_pending: 10 vfs.zfs.vdev.bio_flush_disable: 0 vfs.zfs.cache_flush_disable: 0 vfs.zfs.zil_replay_disable: 0 vfs.zfs.zio.use_uma: 0 vfs.zfs.version.zpl: 5 vfs.zfs.version.spa: 28 vfs.zfs.version.acl: 1 vfs.zfs.debug: 0 vfs.zfs.super_owner: 0 kstat.zfs.misc.xuio_stats.onloan_read_buf: 0 kstat.zfs.misc.xuio_stats.onloan_write_buf: 0 kstat.zfs.misc.xuio_stats.read_buf_copied: 0 kstat.zfs.misc.xuio_stats.read_buf_nocopy: 0 kstat.zfs.misc.xuio_stats.write_buf_copied: 0 kstat.zfs.misc.xuio_stats.write_buf_nocopy: 107064 kstat.zfs.misc.zfetchstats.hits: 0 kstat.zfs.misc.zfetchstats.misses: 0 kstat.zfs.misc.zfetchstats.colinear_hits: 0 kstat.zfs.misc.zfetchstats.colinear_misses: 0 kstat.zfs.misc.zfetchstats.stride_hits: 0 kstat.zfs.misc.zfetchstats.stride_misses: 0 kstat.zfs.misc.zfetchstats.reclaim_successes: 0 kstat.zfs.misc.zfetchstats.reclaim_failures: 0 kstat.zfs.misc.zfetchstats.streams_resets: 0 kstat.zfs.misc.zfetchstats.streams_noresets: 0 kstat.zfs.misc.zfetchstats.bogus_streams: 0 kstat.zfs.misc.arcstats.hits: 47091548 kstat.zfs.misc.arcstats.misses: 17064059 kstat.zfs.misc.arcstats.demand_data_hits: 15357194 kstat.zfs.misc.arcstats.demand_data_misses: 3077290 kstat.zfs.misc.arcstats.demand_metadata_hits: 31102404 kstat.zfs.misc.arcstats.demand_metadata_misses: 8692242 kstat.zfs.misc.arcstats.prefetch_data_hits: 0 
kstat.zfs.misc.arcstats.prefetch_data_misses: 0 kstat.zfs.misc.arcstats.prefetch_metadata_hits: 631950 kstat.zfs.misc.arcstats.prefetch_metadata_misses: 5294527 kstat.zfs.misc.arcstats.mru_hits: 27566971 kstat.zfs.misc.arcstats.mru_ghost_hits: 2179308 kstat.zfs.misc.arcstats.mfu_hits: 18950663 kstat.zfs.misc.arcstats.mfu_ghost_hits: 2714218 kstat.zfs.misc.arcstats.allocated: 19825272 kstat.zfs.misc.arcstats.deleted: 12619489 kstat.zfs.misc.arcstats.stolen: 9003539 kstat.zfs.misc.arcstats.recycle_miss: 10224598 kstat.zfs.misc.arcstats.mutex_miss: 1984 kstat.zfs.misc.arcstats.evict_skip: 216358592 kstat.zfs.misc.arcstats.evict_l2_cached: 0 kstat.zfs.misc.arcstats.evict_l2_eligible: 433025541120 kstat.zfs.misc.arcstats.evict_l2_ineligible: 87633796096 kstat.zfs.misc.arcstats.hash_elements: 15988 kstat.zfs.misc.arcstats.hash_elements_max: 43365 kstat.zfs.misc.arcstats.hash_collisions: 5599202 kstat.zfs.misc.arcstats.hash_chains: 3944
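Not a diagnosis, but two of the posted numbers stand out once you do the arithmetic: the overall ARC hit ratio is modest, and arc_meta_used is several times arc_meta_limit, which is consistent with Daniel's theory that a 2GB machine can't keep the metadata for a 2-million-entry directory cached.

```python
# Quick arithmetic on the arcstats Sean posted (values copied from the mail).
hits = 47091548
misses = 17064059
arc_meta_used = 108539456    # bytes
arc_meta_limit = 26214400    # bytes

hit_ratio = hits / (hits + misses)
meta_overshoot = arc_meta_used / arc_meta_limit

print(f"ARC hit ratio: {hit_ratio:.1%}")                 # prints 73.4%
print(f"metadata used vs. limit: {meta_overshoot:.1f}x")  # prints 4.1x
```

In other words the ARC is holding roughly four times more metadata than its configured limit, so directory metadata is constantly being evicted and re-read.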
Re: ZFS directory with a large number of files
On 02/08/2011 11:10, Daniel Kalchev wrote: Other than that, perhaps in ZFS it would be easier to prune the unused directory entries, than it is in UFS. It looks like this is not implemented. Remember that ZFS uses copy-on-write for all filesystem updates. Any change to a directory's contents means the whole directory data is rewritten. In which case, there's no benefit to retaining a large data structure with lots of empty slots (as happened on Unix FSes in the past). I'd expect, and I see in my (admittedly fairly cursory) testing, that ZFS directory data sizes update immediately whenever files are added or removed from the directory. Where this gets interesting is when the directory gets sufficiently large that the directory data is larger than the 128kB block size used by ZFS. As that takes many more files than any sensible person would put into one directory, it's possible there's a bug in handling such large structures which is only rarely tickled. But this is all speculation on my behalf, and I have no evidence to back it up. Cheers, Matthew -- Dr Matthew J Seaman MA, D.Phil. 7 Priory Courtyard, Flat 3, Ramsgate, Kent, CT11 9PW PGP: http://www.infracaninophile.co.uk/pgpkey JID: matt...@infracaninophile.co.uk
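A back-of-the-envelope version of Matthew's 128kB point. The per-entry byte cost used here is purely an illustrative assumption, not a statement about ZAP internals; it just shows that a 2-million-file directory is orders of magnitude past a single record.

```python
# Rough sketch: how many directory entries fit in one 128 KiB record,
# assuming (hypothetically) ~64 bytes of directory data per entry.
RECORDSIZE = 128 * 1024  # ZFS default max block size, in bytes

def entries_per_record(bytes_per_entry=64):
    return RECORDSIZE // bytes_per_entry

# Under that assumption a directory outgrows one record after ~2048 entries,
# so 2 million files would need on the order of a thousand records of
# directory data - well into the regime Matthew speculates is rarely tested.
```

Again, this is speculation quantified, not measurement; the real on-disk cost per entry depends on name length and ZAP layout.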
zpool doesn't upgrade - Re: ZFS directory with a large number of files
On Tue, 02 Aug 2011 12:55:43 +0200, seanr...@gmail.com seanr...@gmail.com wrote: On Tue, Aug 2, 2011 at 11:10 AM, Daniel Kalchev dan...@digsys.bg wrote: If it is a limitation in ZFS it would be nice to know that, perhaps it truly, really is a bug that can be avoided (or it's inherent in the way ZFS handles such things) It is possible that there is not enough memory in ARC to cache that large directory. Other than that, perhaps in ZFS it would be easier to prune the unused directory entries, than it is in UFS. It looks like this is not implemented. Another reason might be some FreeBSD specific implementation issue for fstatfs. In any case, the data available is not sufficient. More information would help, like how much RAM this system has, how much ARC uses, some ARC stats. Which sysctl's would you like? I grabbed these to start: kstat.zfs.misc.arcstats.size: 118859656 kstat.zfs.misc.arcstats.hdr_size: 3764416 kstat.zfs.misc.arcstats.data_size: 53514240 kstat.zfs.misc.arcstats.other_size: 61581000 kstat.zfs.misc.arcstats.hits: 46762467 kstat.zfs.misc.arcstats.misses: 1607 The machine has 2GB of memory. What made me wonder is .. how exactly the kernel and zpool disagree on zpool version? What is the pool version in fact? % dmesg | grep ZFS ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present; to enable, add vfs.zfs.prefetch_disable=0 to /boot/loader.conf. ZFS filesystem version 5 ZFS storage pool version 28 % zpool get version tank NAME PROPERTY VALUE SOURCE tank version 15 local % zpool upgrade tank This system is currently running ZFS pool version 15. Pool 'tank' is already formatted using the current version. Sean I think this zpool upgrade thing is weird. Can you try 'zpool upgrade -a'? Mine says: zpool get version zroot NAME PROPERTY VALUE SOURCE zroot version 28 default Mind the SOURCE=default vs. SOURCE=local. Is it possible you did 'zpool set version=15 tank' in the past? You can check that with 'zpool history'. 
NB: if you upgrade the boot pool, don't forget to upgrade the boot loader. (See UPDATING) Ronald.
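Ronald's SOURCE=default vs. SOURCE=local check is easy to automate. This little sketch just parses `zpool get version` output and flags pools whose version property was pinned by hand; the output format is assumed from the transcripts pasted in this thread.

```python
# Flag pools whose "version" property has SOURCE=local, i.e. was set
# explicitly at some point (e.g. via "zpool set version=15 tank").
def pinned_pools(zpool_get_output):
    pinned = []
    for line in zpool_get_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 4 and fields[1] == "version" and fields[3] == "local":
            pinned.append((fields[0], fields[2]))   # (pool name, version)
    return pinned

# Sample output shaped like the transcripts in this thread (hypothetical pools).
sample = """NAME  PROPERTY  VALUE  SOURCE
tank  version   15     local
zroot version   28     default"""

print(pinned_pools(sample))  # prints [('tank', '15')]
```

A pool listed here won't be touched by a plain `zpool upgrade`, which matches the symptom Sean reported: the kernel supports pool version 28 while the pool stays at 15.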