Re: zpool doesn't upgrade - Re: ZFS directory with a large number of files
On Tue, Aug 2, 2011 at 8:59 PM, Ronald Klop ronald-freeb...@klop.yi.org wrote:

> On Tue, 02 Aug 2011 12:55:43 +0200, seanr...@gmail.com wrote:
>> [...]
>
> I think this zpool upgrade thing is weird. Can you try 'zpool upgrade -a'?
> Mine says:
>
>   zpool get version zroot
>   NAME   PROPERTY  VALUE  SOURCE
>   zroot  version   28     default
>
> Mind the SOURCE=default vs. SOURCE=local. Is it possible you did
> 'zpool set version=15 tank' in the past? You can check that with
> 'zpool history'.
>
> NB: if you upgrade the boot pool, don't forget to upgrade the boot
> loader as well. (See UPDATING.)

  % sudo zpool upgrade -a
  Password:
  This system is currently running ZFS pool version 15.

  All pools are formatted using this version.

I checked zpool history and I never set the version explicitly.

My 'world' is from the 8th of March; it's possible my tree is sufficiently old. (My kernel was built on the 12th of June; I'm fairly sure it's from the same tree as the world, but it's also possible my kernel and userland have been out of sync for two months.)

I'll upgrade this machine sometime soon and see if that fixes the issue.

Sean
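For reference, the checks Ronald suggests can be run in one short sequence. This is a sketch, not part of the original thread: 'tank' is Sean's pool name from later messages, and the grep pattern is only illustrative.

  # Show the pool version and where the value came from
  # (SOURCE=local means someone set it explicitly).
  % zpool get version tank

  # Search the pool's command history for an explicit version set.
  % zpool history tank | grep 'set version'

  # Upgrade every imported pool to the newest version the kernel
  # supports. On a boot pool, reinstall the boot blocks afterwards
  # (see UPDATING) or the system may become unbootable.
  % sudo zpool upgrade -a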
Re: ZFS directory with a large number of files
On Sun, Aug 7, 2011 at 10:20 AM, Sean Rees seanr...@gmail.com wrote:

> On Aug 6, 2011, at 07:24, Gary Palmer wrote:
>
>> On Fri, Aug 05, 2011 at 08:56:36PM -0700, Doug Barton wrote:
>>
>>> On 08/05/2011 20:38, Daniel O'Connor wrote:
>>>
>>>> Ahh, but the OP had moved these files away and performance was
>>>> still poor.. _that_ is the bug.
>>>
>>> I'm no file system expert, but it seems to me the key questions are:
>>> how long does it take the system to recover from this condition, and
>>> if it's more than N $periods, is that a problem? We can't stop users
>>> from doing wacky stuff, but the system should be robust in the face
>>> of this.
>>
>> It's been quite a while since I worked on the filesystem stuff in any
>> detail, but I believe that UFS, at least, doesn't GC the directory; it
>> just truncates it if enough of the entries at the end are deleted to
>> free up at least one fragment or block. If you create N files and then
>> a directory, and move the N files into that directory, the directory's
>> entry will still be N+1 records into the parent directory, and the
>> only way to recover is to recreate the directory that formerly
>> contained the N files. It is theoretically possible to compact the
>> directory, but since the code to do that hadn't been written when I
>> last worked with UFS, I suspect it's non-trivial. I don't know what
>> ZFS does in this situation.
>
> It sounds like it does something similar.
>
> I re-ran the experiment to see if I could narrow down the problem.
>
>   % mkdir foo
>   % cd foo
>   % for i in {1..1000}; do touch $i; done

Self-pedant mode enabled: for i in {1..100} :)

I truncated the zeros in correcting the copy/paste from my shell :)

Sean
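Gary's description is easy to check empirically. Below is a minimal sketch, not from the thread: the paths and entry count are arbitrary, and the exact truncation behaviour varies by filesystem.

  # Fill a scratch directory with many entries (jot is FreeBSD-native;
  # xargs keeps us under the argument-length limit).
  % mkdir /tmp/bloat && cd /tmp/bloat
  % jot 100000 1 | xargs touch

  # Size of the directory file itself, before the move.
  % ls -lds /tmp/bloat

  # Move every entry into another directory, leaving the parent "empty".
  % mkdir /tmp/moved
  % find /tmp/bloat -type f -print0 | xargs -0 -J % mv % /tmp/moved/

  # The directory's own size typically does not shrink back, because
  # freed entries are not compacted; recreating the directory is the
  # reliable way to reclaim the space.
  % ls -lds /tmp/bloat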
ZFS directory with a large number of files
Hi there,

I Googled around and checked the PRs and wasn't successful in finding any reports of what I'm seeing. I'm hoping someone here can help me debug what's going on.

On my FreeBSD 8.2-S machine (built circa 12th June), I created a directory and populated it over the course of 3 weeks with about 2 million individual files. As you might imagine, an 'ls' of this directory took quite some time.

The files were conveniently named with a timestamp in the filename (still images from a security camera, once per second), so I've since moved them all to timestamped directories (/MM/dd/hh/mm).

What I found, though, is that the original directory the images were in is still very slow to ls -- and it only has 1 file in it, another directory. To clarify:

  % ls second
  [lots of time and many many files enumerated]
  % # rename files using rename script
  % ls second
  [wait ages]
  2011 dead
  % mkdir second2
  % mv second/2011 second2
  % ls second2
  [fast!]
  2011
  % ls second
  [still very slow]
  dead
  % time ls second
  dead/
  gls -F --color  0.00s user 1.56s system 0% cpu 3:09.61 total

(Timings are similar for /bin/ls.)

This data is stored on a striped ZFS pool (version 15, though the kernel reports version 28 is available; zpool upgrade seems to disagree), 2T in size. I've run zpool scrub with no effect.

ZFS is driving the disks hard: my iostat monitoring has all three drives in the zpool running at 40-60% busy for the duration of the ls (it was quiet before).

I've attached truss to the ls process. It spends a lot of time here:

  fstatfs(0x5,0x7fffe0d0,0x800ad5548,0x7fffdfd8,0x0,0x0) = 0 (0x0)

I'm thinking there's some old ZFS metadata that it's looking into, but I'm not sure how best to dig into this to understand what's going on under the hood. Can anyone point me in the right direction on this?

Thanks,

Sean
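One way to look under the hood here is zdb. A sketch, with caveats: 'tank/data' is a hypothetical dataset name, the object number shown is made up, and zdb output formats vary across ZFS versions. A directory that once held millions of entries may still be backed by a large on-disk ZAP spread over many blocks even after the entries are removed, which would be consistent with a slow, disk-bound ls.

  # The directory's inode number doubles as its ZFS object number.
  % ls -di second
  1234 second

  # Dump that object's on-disk state, including the directory ZAP.
  % zdb -ddddd tank/data 1234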
Re: ZFS directory with a large number of files
inline

On Tue, Aug 2, 2011 at 10:08 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:

> On Tue, Aug 02, 2011 at 08:39:03AM +0100, seanr...@gmail.com wrote:
>
>> On my FreeBSD 8.2-S machine (built circa 12th June), I created a
>> directory and populated it over the course of 3 weeks with about 2
>> million individual files.
>
> I'll keep this real simple: Why did you do this? I hope this was a
> stress test of some kind. If not:

Not really, but it turned into one.

The camera I was using had the ability (rather handily) to upload a still image once per second via FTP to a server of my choosing. It didn't have the ability to organize the images into a neat directory hierarchy for me. So I went on holidays for 3 weeks and came back to ~2M images in the same directory.

> This is the 2nd or 3rd mail in recent months from people saying "I
> decided to do something utterly stupid with my filesystem[1] and now
> I'm asking why performance sucks." Why can people not create proper
> directory tree layouts to avoid this problem regardless of what
> filesystem is used? I just don't get it.

I'm not sure it's utterly stupid; I didn't expect legendarily fast performance from 'ls' or anything else that enumerated the contents of the directory while all the files were there. Now that the files are neatly organized, I expected fstatfs() on the directory to become fast again. It isn't. I'd like to understand why (or maybe learn a new trick or two about inspecting ZFS...)

Sean
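For anyone facing the same pile of camera uploads, the kind of rename Sean describes can be done with plain sh. This is a sketch only: the thread never gives the real filename format, so names like img-20110715123045.jpg (an embedded YYYYMMDDhhmmss timestamp) are hypothetical.

  # Sort timestamp-named files into a YYYY/MM/dd/hh/mm hierarchy.
  for f in img-*.jpg; do
      [ -e "$f" ] || continue          # skip if the glob matched nothing
      ts=${f#img-}; ts=${ts%.jpg}      # strip prefix/suffix, keep timestamp
      y=$(echo "$ts" | cut -c1-4)
      mo=$(echo "$ts" | cut -c5-6)
      d=$(echo "$ts" | cut -c7-8)
      h=$(echo "$ts" | cut -c9-10)
      mi=$(echo "$ts" | cut -c11-12)
      mkdir -p "$y/$mo/$d/$h/$mi"
      mv "$f" "$y/$mo/$d/$h/$mi/"
  done

Note that, per the rest of the thread, moving the files out does not shrink the original directory; only recreating it does.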
Re: ZFS directory with a large number of files
On Tue, Aug 2, 2011 at 11:10 AM, Daniel Kalchev dan...@digsys.bg wrote:

>> If it is a limitation in ZFS it would be nice to know that; perhaps it
>> truly, really is a bug that can be avoided (or it's inherent in the way
>> ZFS handles such things).
>
> It is possible that there is not enough memory in the ARC to cache that
> large directory. Other than that, perhaps in ZFS it would be easier to
> prune the unused directory entries than it is in UFS. It looks like this
> is not implemented. Another reason might be some FreeBSD-specific
> implementation issue with fstatfs. In any case, the data available is
> not sufficient. More information would help, like how much RAM this
> system has, how much the ARC uses, some ARC stats.

Which sysctls would you like? I grabbed these to start:

  kstat.zfs.misc.arcstats.size: 118859656
  kstat.zfs.misc.arcstats.hdr_size: 3764416
  kstat.zfs.misc.arcstats.data_size: 53514240
  kstat.zfs.misc.arcstats.other_size: 61581000
  kstat.zfs.misc.arcstats.hits: 46762467
  kstat.zfs.misc.arcstats.misses: 1607

The machine has 2GB of memory.

> What made me wonder is... how exactly do the kernel and zpool disagree
> on the zpool version? What is the pool version, in fact?

  % dmesg | grep ZFS
  ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM
      is present; to enable, add vfs.zfs.prefetch_disable=0 to
      /boot/loader.conf.
  ZFS filesystem version 5
  ZFS storage pool version 28

  % zpool get version tank
  NAME  PROPERTY  VALUE  SOURCE
  tank  version   15     local

  % zpool upgrade tank
  This system is currently running ZFS pool version 15.

  Pool 'tank' is already formatted using the current version.

Sean
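To take the same snapshot on another machine, the counters are plain sysctls; a quick sketch (the awk arithmetic is just a convenience, not from the thread):

  # Grab the ARC size and hit/miss counters in one shot.
  % sysctl kstat.zfs.misc.arcstats.size \
           kstat.zfs.misc.arcstats.hits \
           kstat.zfs.misc.arcstats.misses

  # Rough overall ARC hit ratio from the two counters above.
  % sysctl -n kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses |
      awk 'NR==1{h=$1} NR==2{m=$1} END{printf "hit ratio: %.2f%%\n", 100*h/(h+m)}'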
Re: ZFS directory with a large number of files
On Tue, Aug 2, 2011 at 12:07 PM, Jeremy Chadwick free...@jdc.parodius.com wrote:

> On Tue, Aug 02, 2011 at 11:55:43AM +0100, seanr...@gmail.com wrote:
>
>> On Tue, Aug 2, 2011 at 11:10 AM, Daniel Kalchev dan...@digsys.bg wrote:
>>
>>> It is possible that there is not enough memory in the ARC to cache
>>> that large directory. [...] In any case, the data available is not
>>> sufficient. More information would help, like how much RAM this
>>> system has, how much the ARC uses, some ARC stats.
>>
>> Which sysctls would you like?
>
> Output from "sysctl vfs.zfs kstat.zfs" would be sufficient.

Here we are:

vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 0
vfs.zfs.mfu_ghost_metadata_lsize: 26383360
vfs.zfs.mfu_ghost_size: 26383360
vfs.zfs.mfu_data_lsize: 0
vfs.zfs.mfu_metadata_lsize: 154112
vfs.zfs.mfu_size: 3944960
vfs.zfs.mru_ghost_data_lsize: 0
vfs.zfs.mru_ghost_metadata_lsize: 76250624
vfs.zfs.mru_ghost_size: 76250624
vfs.zfs.mru_data_lsize: 30208
vfs.zfs.mru_metadata_lsize: 16896
vfs.zfs.mru_size: 29353984
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 150016
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.arc_meta_limit: 26214400
vfs.zfs.arc_meta_used: 108539456
vfs.zfs.arc_min: 33554432
vfs.zfs.arc_max: 104857600
vfs.zfs.dedup.prefetch: 1
vfs.zfs.mdcomp_disable: 0
vfs.zfs.write_limit_override: 0
vfs.zfs.write_limit_inflated: 6360993792
vfs.zfs.write_limit_max: 265041408
vfs.zfs.write_limit_min: 33554432
vfs.zfs.write_limit_shift: 3
vfs.zfs.no_write_throttle: 0
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.check_hostid: 1
vfs.zfs.recover: 0
vfs.zfs.txg.synctime_ms: 1000
vfs.zfs.txg.timeout: 5
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 10
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_replay_disable: 0
vfs.zfs.zio.use_uma: 0
vfs.zfs.version.zpl: 5
vfs.zfs.version.spa: 28
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
kstat.zfs.misc.xuio_stats.onloan_read_buf: 0
kstat.zfs.misc.xuio_stats.onloan_write_buf: 0
kstat.zfs.misc.xuio_stats.read_buf_copied: 0
kstat.zfs.misc.xuio_stats.read_buf_nocopy: 0
kstat.zfs.misc.xuio_stats.write_buf_copied: 0
kstat.zfs.misc.xuio_stats.write_buf_nocopy: 107064
kstat.zfs.misc.zfetchstats.hits: 0
kstat.zfs.misc.zfetchstats.misses: 0
kstat.zfs.misc.zfetchstats.colinear_hits: 0
kstat.zfs.misc.zfetchstats.colinear_misses: 0
kstat.zfs.misc.zfetchstats.stride_hits: 0
kstat.zfs.misc.zfetchstats.stride_misses: 0
kstat.zfs.misc.zfetchstats.reclaim_successes: 0
kstat.zfs.misc.zfetchstats.reclaim_failures: 0
kstat.zfs.misc.zfetchstats.streams_resets: 0
kstat.zfs.misc.zfetchstats.streams_noresets: 0
kstat.zfs.misc.zfetchstats.bogus_streams: 0
kstat.zfs.misc.arcstats.hits: 47091548
kstat.zfs.misc.arcstats.misses: 17064059
kstat.zfs.misc.arcstats.demand_data_hits: 15357194
kstat.zfs.misc.arcstats.demand_data_misses: 3077290
kstat.zfs.misc.arcstats.demand_metadata_hits: 31102404
kstat.zfs.misc.arcstats.demand_metadata_misses: 8692242
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 631950
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 5294527
kstat.zfs.misc.arcstats.mru_hits: 27566971
kstat.zfs.misc.arcstats.mru_ghost_hits: 2179308
kstat.zfs.misc.arcstats.mfu_hits: 18950663
kstat.zfs.misc.arcstats.mfu_ghost_hits: 2714218
kstat.zfs.misc.arcstats.allocated: 19825272
kstat.zfs.misc.arcstats.deleted: 12619489
kstat.zfs.misc.arcstats.stolen: 9003539
kstat.zfs.misc.arcstats.recycle_miss: 10224598
kstat.zfs.misc.arcstats.mutex_miss: 1984
kstat.zfs.misc.arcstats.evict_skip: 216358592
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_l2_eligible: 433025541120
kstat.zfs.misc.arcstats.evict_l2_ineligible: 87633796096
kstat.zfs.misc.arcstats.hash_elements: 15988
kstat.zfs.misc.arcstats.hash_elements_max: 43365
kstat.zfs.misc.arcstats.hash_collisions: 5599202
kstat.zfs.misc.arcstats.hash_chains: 3944
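Two of the values above can be compared directly, which bears on Daniel's not-enough-ARC hypothesis. A sketch only; whether arc_meta_used running well past arc_meta_limit explains the slowdown is speculation, not a conclusion anyone draws in the thread:

  # Report ARC metadata usage against its configured limit.
  % sysctl -n vfs.zfs.arc_meta_used vfs.zfs.arc_meta_limit |
      awk 'NR==1{u=$1} NR==2{l=$1} END{printf "arc_meta: %.0f MB used of %.0f MB limit\n", u/1048576, l/1048576}'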