Re: zpool doesn't upgrade - Re: ZFS directory with a large number of files

2011-08-09 Thread seanr...@gmail.com
On Tue, Aug 2, 2011 at 8:59 PM, Ronald Klop ronald-freeb...@klop.yi.org wrote:
 On Tue, 02 Aug 2011 12:55:43 +0200, seanr...@gmail.com seanr...@gmail.com
 wrote:
 I think this zpool upgrade thing is weird. Can you try 'zpool upgrade -a'?

 Mine says:
 zpool get version zroot
 NAME   PROPERTY  VALUE    SOURCE
 zroot  version   28       default

 Mind the SOURCE=default vs. SOURCE=local.
 Is it possible you did 'zpool set version=15 tank' in the past? You can
 check that with 'zpool history'.

 NB: if you upgrade the boot pool, don't forget to upgrade the boot loader.
 (See UPDATING)
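
(For anyone following along: checking the history for an explicit version
set, and, for a GPT layout, reinstalling the boot code, looks roughly like
this -- the pool name 'tank', disk 'ada0', and partition index 1 are
assumptions for the sketch, not details from this thread:)

% zpool history tank | grep 'set version'
% sudo gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0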

% sudo zpool upgrade -a
Password:
This system is currently running ZFS pool version 15.

All pools are formatted using this version.

I checked zpool history and I never set the version explicitly. My
'world' is from the 8th of March; it's possible my tree is
sufficiently old (my kernel was built on the 12th of June; I'm fairly
sure it's from the same tree as the world, but it's also possible my
kernel and userland have been out of sync for 2 months).

I'll upgrade this machine sometime soon and see if that fixes the issue.

Sean


Re: ZFS directory with a large number of files

2011-08-07 Thread seanr...@gmail.com
On Sun, Aug 7, 2011 at 10:20 AM, Sean Rees seanr...@gmail.com wrote:

 On Aug 6, 2011, at 07:24, Gary Palmer wrote:

 On Fri, Aug 05, 2011 at 08:56:36PM -0700, Doug Barton wrote:
 On 08/05/2011 20:38, Daniel O'Connor wrote:

 Ahh, but the OP had moved these files away and performance was still poor...
 _that_ is the bug.

 I'm no file system expert, but it seems to me the key questions are: how
 long does it take the system to recover from this condition, and if it's
 more than N $periods, is that a problem? We can't stop users from doing
 wacky stuff, but the system should be robust in the face of this.

 It's been quite a while since I worked on the filesystem stuff in any
 detail but I believe, at least for UFS, it doesn't GC the directory,
 just truncates it if enough of the entries at the end are deleted
 to free up at least one fragment or block.  If you create N files and
 then a directory and move the N files into the directory, the directory
 entry will still be N+1 records into the directory and the only way to
 recover is to recreate the directory that formerly contained the N
 files.  It is theoretically possible to compact the directory, but since
 the code to do that wasn't written when I last worked with UFS I suspect
 it's non-trivial.

 I don't know what ZFS does in this situation

 It sounds like it does something similar.

 I re-ran the experiment to see if I could narrow down the problem.

 % mkdir foo
 % cd foo && for i in {1..1000}; do touch $i; done

Self-pedant mode enabled:

for i in {1..100} :) I truncated the zeros when correcting the
copy/paste from my shell :)
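
Incidentally, Gary's point about directories not being compacted is easy
to see from the directory's own size. A quick sketch (all names made up,
and the exact numbers vary by filesystem):

% mkdir demo && cd demo && for i in {1..10000}; do touch $i; done
% ls -ld .    # note the directory's own size with all entries present
% mkdir sub && mv [0-9]* sub/
% ls -ld .    # typically still large: the old entries aren't compacted

(With millions of files you'd want find | xargs for the mv, to stay under
the argument-length limit.)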


Sean


ZFS directory with a large number of files

2011-08-02 Thread seanr...@gmail.com
Hi there,

I Googled around and checked the PRs and wasn't successful in finding
any reports of what I'm seeing. I'm hoping someone here can help me
debug what's going on.

On my FreeBSD 8.2-S machine (built circa 12th June), I created a
directory and populated it over the course of 3 weeks with about 2
million individual files. As you might imagine, an 'ls' of this
directory took quite some time.

The files were conveniently named with a timestamp in the filename
(still images from a security camera, once per second) so I've since
moved them all to timestamped directories (YYYY/MM/dd/hh/mm). What I
found, though, was that the original directory the images were in is still
very slow to ls -- and it only has one entry left in it: another directory.

To clarify:
% ls second
[lots of time and many many files enumerated]
% # rename files using rename script (sketched below)
% ls second
[wait ages]
2011 dead
% mkdir second2 && mv second/2011 second2
% ls second2
[fast!]
2011
% ls second
[still very slow]
dead
% time ls second
dead/
gls -F --color  0.00s user 1.56s system 0% cpu 3:09.61 total

(timings are similar for /bin/ls)
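
(A script along these lines would do the rename step shown above -- a sketch
only: the actual script isn't reproduced here, and the filename format
YYYYMMDDhhmmss.jpg is an assumption for illustration.)

#!/bin/sh
# Sort timestamp-named images into a YYYY/MM/dd/hh/mm hierarchy.
for f in *.jpg; do
  y=$(echo "$f" | cut -c1-4)
  mo=$(echo "$f" | cut -c5-6)
  d=$(echo "$f" | cut -c7-8)
  h=$(echo "$f" | cut -c9-10)
  mi=$(echo "$f" | cut -c11-12)
  mkdir -p "$y/$mo/$d/$h/$mi" && mv "$f" "$y/$mo/$d/$h/$mi/"
done

(Slow over ~2M files since it forks per file, but it only has to run once.)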

This data is stored on a striped ZFS pool, 2T in size (pool version 15,
though the kernel reports version 28 is available; zpool upgrade seems to
disagree). I've run zpool scrub with no effect. ZFS is driving the disks
hard; my iostat monitoring has all three drives in the zpool running at
40-60% busy for the duration of the ls (it was quiet before).
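
(Nothing exotic behind that monitoring; stock iostat will show it, with the
drive names below being placeholders rather than my actual devices:)

% iostat -x -w 1 ada0 ada1 ada2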

I've attached truss to the ls process. It spends a lot of time here:
fstatfs(0x5,0x7fffe0d0,0x800ad5548,0x7fffdfd8,0x0,0x0) = 0 (0x0)

I'm thinking there's some old ZFS metadata that it's looking into, but
I'm not sure how best to dig into this to understand what's going on
under the hood.
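
(The one tool I know of for going deeper is zdb. A sketch -- the dataset
name is a placeholder, and the object number comes from ls -i, since a
file's inode number doubles as its ZFS object id:)

% ls -id second
% sudo zdb -dddd tank/data <object-id>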

Can anyone perhaps point me in the right direction on this?

Thanks,

Sean


Re: ZFS directory with a large number of files

2011-08-02 Thread seanr...@gmail.com
inline

On Tue, Aug 2, 2011 at 10:08 AM, Jeremy Chadwick
free...@jdc.parodius.com wrote:
 On Tue, Aug 02, 2011 at 08:39:03AM +0100, seanr...@gmail.com wrote:
 On my FreeBSD 8.2-S machine (built circa 12th June), I created a
 directory and populated it over the course of 3 weeks with about 2
 million individual files.

 I'll keep this real simple:

 Why did you do this?

 I hope this was a stress test of some kind.  If not:

Not really, but it turned into one.

The camera I was using had the ability (rather handily) to upload a
still image once per second via FTP to a server of my choosing. It
didn't have the ability to organize them for me in a neat directory
hierarchy. So I went on holiday for 3 weeks and came back to ~2M
images in the same directory.


 This is the 2nd or 3rd mail in recent months from people saying "I
 decided to do something utterly stupid with my filesystem[1] and now I'm
 asking why performance sucks."

 Why can people not create proper directory tree layouts to avoid this
 problem regardless of what filesystem is used?  I just don't get it.


I'm not sure it's utterly stupid; I didn't expect legendarily fast
performance from 'ls' or anything else that enumerated the contents of
the directory when all the files were there. Now that the files are
neatly organized, I expected fstatfs() on the directory to become fast
again. It isn't. I'd like to understand why (or maybe learn a new
trick or two about inspecting ZFS...)


Sean


Re: ZFS directory with a large number of files

2011-08-02 Thread seanr...@gmail.com
On Tue, Aug 2, 2011 at 11:10 AM, Daniel Kalchev dan...@digsys.bg wrote:
 If it is a limitation in ZFS it would be nice to know that; perhaps it
 truly, really is a bug that can be avoided (or it's inherent in the way ZFS
 handles such things)

 It is possible that there is not enough memory in the ARC to cache that large
 directory.

 Other than that, perhaps in ZFS it would be easier to prune the unused
 directory entries than it is in UFS. It looks like this is not implemented.

 Another reason might be some FreeBSD specific implementation issue for
 fstatfs.

 In any case, the data available is not sufficient. More information would
 help, like how much RAM this system has, how much ARC uses, some ARC stats.

Which sysctls would you like?

I grabbed these to start:
kstat.zfs.misc.arcstats.size: 118859656
kstat.zfs.misc.arcstats.hdr_size: 3764416
kstat.zfs.misc.arcstats.data_size: 53514240
kstat.zfs.misc.arcstats.other_size: 61581000

kstat.zfs.misc.arcstats.hits: 46762467
kstat.zfs.misc.arcstats.misses: 1607

The machine has 2GB of memory.
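
(The metadata side of the ARC is visible the same way; both names are stock
sysctls on this system:)

% sysctl vfs.zfs.arc_meta_used vfs.zfs.arc_meta_limit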

 What made me wonder is... how exactly do the kernel and zpool disagree on the
 pool version? What is the pool version, in fact?

% dmesg | grep ZFS
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
to enable, add vfs.zfs.prefetch_disable=0 to /boot/loader.conf.
ZFS filesystem version 5
ZFS storage pool version 28

% zpool get version tank
NAME  PROPERTY  VALUESOURCE
tank  version   15   local

% zpool upgrade tank
This system is currently running ZFS pool version 15.

Pool 'tank' is already formatted using the current version.
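
(The userland/kernel split is easy to see side by side: the zpool binary
prints the highest version it supports, while the kernel's view is a
sysctl; both commands are stock:)

% zpool upgrade -v | head -2
% sysctl vfs.zfs.version.spa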


Sean


Re: ZFS directory with a large number of files

2011-08-02 Thread seanr...@gmail.com
On Tue, Aug 2, 2011 at 12:07 PM, Jeremy Chadwick
free...@jdc.parodius.com wrote:
 On Tue, Aug 02, 2011 at 11:55:43AM +0100, seanr...@gmail.com wrote:
 On Tue, Aug 2, 2011 at 11:10 AM, Daniel Kalchev dan...@digsys.bg wrote:
  If it is a limitation in ZFS it would be nice to know that; perhaps it
  truly, really is a bug that can be avoided (or it's inherent in the way
  ZFS handles such things)

  It is possible that there is not enough memory in the ARC to cache that
  large directory.

  Other than that, perhaps in ZFS it would be easier to prune the unused
  directory entries than it is in UFS. It looks like this is not implemented.

  Another reason might be some FreeBSD specific implementation issue for
  fstatfs.

  In any case, the data available is not sufficient. More information would
  help, like how much RAM this system has, how much ARC uses, some ARC stats.

 Which sysctls would you like?

 Output from 'sysctl vfs.zfs kstat.zfs' would be sufficient.

Here we are:


vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 0
vfs.zfs.mfu_ghost_metadata_lsize: 26383360
vfs.zfs.mfu_ghost_size: 26383360
vfs.zfs.mfu_data_lsize: 0
vfs.zfs.mfu_metadata_lsize: 154112
vfs.zfs.mfu_size: 3944960
vfs.zfs.mru_ghost_data_lsize: 0
vfs.zfs.mru_ghost_metadata_lsize: 76250624
vfs.zfs.mru_ghost_size: 76250624
vfs.zfs.mru_data_lsize: 30208
vfs.zfs.mru_metadata_lsize: 16896
vfs.zfs.mru_size: 29353984
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 150016
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.arc_meta_limit: 26214400
vfs.zfs.arc_meta_used: 108539456
vfs.zfs.arc_min: 33554432
vfs.zfs.arc_max: 104857600
vfs.zfs.dedup.prefetch: 1
vfs.zfs.mdcomp_disable: 0
vfs.zfs.write_limit_override: 0
vfs.zfs.write_limit_inflated: 6360993792
vfs.zfs.write_limit_max: 265041408
vfs.zfs.write_limit_min: 33554432
vfs.zfs.write_limit_shift: 3
vfs.zfs.no_write_throttle: 0
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.check_hostid: 1
vfs.zfs.recover: 0
vfs.zfs.txg.synctime_ms: 1000
vfs.zfs.txg.timeout: 5
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 10
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_replay_disable: 0
vfs.zfs.zio.use_uma: 0
vfs.zfs.version.zpl: 5
vfs.zfs.version.spa: 28
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
kstat.zfs.misc.xuio_stats.onloan_read_buf: 0
kstat.zfs.misc.xuio_stats.onloan_write_buf: 0
kstat.zfs.misc.xuio_stats.read_buf_copied: 0
kstat.zfs.misc.xuio_stats.read_buf_nocopy: 0
kstat.zfs.misc.xuio_stats.write_buf_copied: 0
kstat.zfs.misc.xuio_stats.write_buf_nocopy: 107064
kstat.zfs.misc.zfetchstats.hits: 0
kstat.zfs.misc.zfetchstats.misses: 0
kstat.zfs.misc.zfetchstats.colinear_hits: 0
kstat.zfs.misc.zfetchstats.colinear_misses: 0
kstat.zfs.misc.zfetchstats.stride_hits: 0
kstat.zfs.misc.zfetchstats.stride_misses: 0
kstat.zfs.misc.zfetchstats.reclaim_successes: 0
kstat.zfs.misc.zfetchstats.reclaim_failures: 0
kstat.zfs.misc.zfetchstats.streams_resets: 0
kstat.zfs.misc.zfetchstats.streams_noresets: 0
kstat.zfs.misc.zfetchstats.bogus_streams: 0
kstat.zfs.misc.arcstats.hits: 47091548
kstat.zfs.misc.arcstats.misses: 17064059
kstat.zfs.misc.arcstats.demand_data_hits: 15357194
kstat.zfs.misc.arcstats.demand_data_misses: 3077290
kstat.zfs.misc.arcstats.demand_metadata_hits: 31102404
kstat.zfs.misc.arcstats.demand_metadata_misses: 8692242
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 631950
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 5294527
kstat.zfs.misc.arcstats.mru_hits: 27566971
kstat.zfs.misc.arcstats.mru_ghost_hits: 2179308
kstat.zfs.misc.arcstats.mfu_hits: 18950663
kstat.zfs.misc.arcstats.mfu_ghost_hits: 2714218
kstat.zfs.misc.arcstats.allocated: 19825272
kstat.zfs.misc.arcstats.deleted: 12619489
kstat.zfs.misc.arcstats.stolen: 9003539
kstat.zfs.misc.arcstats.recycle_miss: 10224598
kstat.zfs.misc.arcstats.mutex_miss: 1984
kstat.zfs.misc.arcstats.evict_skip: 216358592
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_l2_eligible: 433025541120
kstat.zfs.misc.arcstats.evict_l2_ineligible: 87633796096
kstat.zfs.misc.arcstats.hash_elements: 15988
kstat.zfs.misc.arcstats.hash_elements_max: 43365
kstat.zfs.misc.arcstats.hash_collisions: 5599202
kstat.zfs.misc.arcstats.hash_chains: 3944