Re: ZFS txg implementation flaw
On Mon, Oct 28, 2013 at 02:56:17PM -0700, Xin Li wrote:
> >>> Semi-indirect. dtrace -n 'fbt:kernel:vm_object_terminate:entry
> >>> { @traces[stack()] = count(); }'
> >>>
> >>> After some (2-3) seconds
> >>>
> >>> kernel`vnode_destroy_vobject+0xb9
> >>> zfs.ko`zfs_freebsd_reclaim+0x2e
> >>> kernel`VOP_RECLAIM_APV+0x78
> >>> kernel`vgonel+0x134
> >>> kernel`vnlru_free+0x362
> >>> kernel`vnlru_proc+0x61e
> >>> kernel`fork_exit+0x11f
> >>> kernel`0x80cdbfde
> >>> 2490
> >
> > 0x80cdbfd0:  mov    %r12,%rdi
> > 0x80cdbfd3:  mov    %rbx,%rsi
> > 0x80cdbfd6:  mov    %rsp,%rdx
> > 0x80cdbfd9:  callq  0x808db560
> > 0x80cdbfde:  jmpq   0x80cdca80
> > 0x80cdbfe3:  nopw   0x0(%rax,%rax,1)
> > 0x80cdbfe9:  nopl   0x0(%rax)
> >
> >>> I don't have user-process-created threads, nor fork/exit.
> >>
> >> This has nothing to do with fork/exit but does suggest that you
> >> are running out of vnodes. What does sysctl -a | grep vnode say?
> >
> > kern.maxvnodes: 1095872
> > kern.minvnodes: 273968
> > vm.stats.vm.v_vnodepgsout: 0
> > vm.stats.vm.v_vnodepgsin: 62399
> > vm.stats.vm.v_vnodeout: 0
> > vm.stats.vm.v_vnodein: 10680
> > vfs.freevnodes: 275107
> > vfs.wantfreevnodes: 273968
> > vfs.numvnodes: 316321
> > debug.sizeof.vnode: 504
>
> Try setting vfs.wantfreevnodes to 547936 (double it).

Now fork_trampoline is gone, but I still see prcfr (and zfod/totfr too). We are currently at half of peak traffic, so I can't yet check the impact on IRQ handling.

kern.maxvnodes: 1095872
kern.minvnodes: 547936
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 63134
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 10836
vfs.freevnodes: 481873
vfs.wantfreevnodes: 547936
vfs.numvnodes: 517331
debug.sizeof.vnode: 504

Now dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }' gives:

kernel`vm_object_deallocate+0x520
kernel`vm_map_entry_deallocate+0x4c
kernel`vm_map_process_deferred+0x3d
kernel`sys_munmap+0x16c
kernel`amd64_syscall+0x5ea
kernel`0x80cdbd97
56

I think this is nginx memory management (allocation/deallocation).
Can I tune malloc so it does not release freed pages?

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
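[Editor's sketch on the malloc question: FreeBSD's libc allocator is jemalloc, which periodically gives freed ("dirty") pages back to the kernel; that give-back is what shows up here as munmap/vm_object churn. The `lg_dirty_mult` option below is documented for jemalloc 3.x; whether the base-system malloc of the reader's release honors it via MALLOC_CONF is an assumption to verify in malloc.conf(5)/malloc(3).]

```shell
# Hedged sketch: jemalloc purges dirty pages once they exceed
# active_pages / 2^lg_dirty_mult; raising lg_dirty_mult (or setting it
# to -1 to disable purging) keeps freed pages mapped, trading higher
# RSS for fewer page frees/zero-fill faults.
# MALLOC_CONF="lg_dirty_mult:-1" nginx    # per-process override (verify option name)

# Separately, the vnode tuning suggested above is just kern.minvnodes
# doubled:
minvnodes=273968
echo $(( minvnodes * 2 ))    # 547936
# sysctl vfs.wantfreevnodes=547936       # FreeBSD-only
```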
Re: ZFS txg implementation flaw
On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
> As I see, ZFS creates a separate thread for each txg write, and also
> for writing to L2ARC. As a result -- up to several thousand threads
> created and destroyed per second, and hundreds of thousands of page
> allocations, zeroings, mappings, unmappings and frees per second.
> Very high overhead.

How are you measuring the number of threads being created / destroyed? This claim seems erroneous given how the ZFS thread pool mechanism actually works (and yes, there are thread pools already).

It would be helpful to see both your measurement methodology and the workload you are using in your tests.

- Jordan
Re: ZFS txg implementation flaw
On 10/28/13 14:45, Slawa Olhovchenkov wrote:
> On Mon, Oct 28, 2013 at 02:38:30PM -0700, Xin Li wrote:
>
>> On 10/28/13 14:32, Slawa Olhovchenkov wrote:
>>> On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:
>>>
>>>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>>>>
>>>>> As I see, ZFS creates a separate thread for each txg write, and
>>>>> also for writing to L2ARC. As a result -- up to several thousand
>>>>> threads created and destroyed per second, and hundreds of
>>>>> thousands of page allocations, zeroings, mappings, unmappings
>>>>> and frees per second. Very high overhead.
>>>>
>>>> How are you measuring the number of threads being created /
>>>> destroyed? This claim seems erroneous given how the ZFS thread
>>>> pool mechanism actually works (and yes, there are thread pools
>>>> already). It would be helpful to see both your measurement
>>>> methodology and the workload you are using in your tests.
>>>
>>> Semi-indirect. dtrace -n 'fbt:kernel:vm_object_terminate:entry
>>> { @traces[stack()] = count(); }'
>>>
>>> After some (2-3) seconds
>>>
>>> kernel`vnode_destroy_vobject+0xb9
>>> zfs.ko`zfs_freebsd_reclaim+0x2e
>>> kernel`VOP_RECLAIM_APV+0x78
>>> kernel`vgonel+0x134
>>> kernel`vnlru_free+0x362
>>> kernel`vnlru_proc+0x61e
>>> kernel`fork_exit+0x11f
>>> kernel`0x80cdbfde
>>> 2490
>
> 0x80cdbfd0:  mov    %r12,%rdi
> 0x80cdbfd3:  mov    %rbx,%rsi
> 0x80cdbfd6:  mov    %rsp,%rdx
> 0x80cdbfd9:  callq  0x808db560
> 0x80cdbfde:  jmpq   0x80cdca80
> 0x80cdbfe3:  nopw   0x0(%rax,%rax,1)
> 0x80cdbfe9:  nopl   0x0(%rax)
>
>>> I don't have user-process-created threads, nor fork/exit.
>>
>> This has nothing to do with fork/exit but does suggest that you
>> are running out of vnodes. What does sysctl -a | grep vnode say?
>
> kern.maxvnodes: 1095872
> kern.minvnodes: 273968
> vm.stats.vm.v_vnodepgsout: 0
> vm.stats.vm.v_vnodepgsin: 62399
> vm.stats.vm.v_vnodeout: 0
> vm.stats.vm.v_vnodein: 10680
> vfs.freevnodes: 275107
> vfs.wantfreevnodes: 273968
> vfs.numvnodes: 316321
> debug.sizeof.vnode: 504

Try setting vfs.wantfreevnodes to 547936 (double it).

Cheers,
--
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve!  Live free or die
Re: ZFS txg implementation flaw
On Mon, Oct 28, 2013 at 02:38:30PM -0700, Xin Li wrote:
> On 10/28/13 14:32, Slawa Olhovchenkov wrote:
> > On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:
> >
> >> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
> >>
> >>> As I see, ZFS creates a separate thread for each txg write, and
> >>> also for writing to L2ARC. As a result -- up to several thousand
> >>> threads created and destroyed per second, and hundreds of
> >>> thousands of page allocations, zeroings, mappings, unmappings
> >>> and frees per second. Very high overhead.
> >>
> >> How are you measuring the number of threads being created /
> >> destroyed? This claim seems erroneous given how the ZFS thread
> >> pool mechanism actually works (and yes, there are thread pools
> >> already).
> >>
> >> It would be helpful to see both your measurement methodology and
> >> the workload you are using in your tests.
> >
> > Semi-indirect. dtrace -n 'fbt:kernel:vm_object_terminate:entry {
> > @traces[stack()] = count(); }'
> >
> > After some (2-3) seconds
> >
> > kernel`vnode_destroy_vobject+0xb9
> > zfs.ko`zfs_freebsd_reclaim+0x2e
> > kernel`VOP_RECLAIM_APV+0x78
> > kernel`vgonel+0x134
> > kernel`vnlru_free+0x362
> > kernel`vnlru_proc+0x61e
> > kernel`fork_exit+0x11f
> > kernel`0x80cdbfde
> > 2490

0x80cdbfd0:  mov    %r12,%rdi
0x80cdbfd3:  mov    %rbx,%rsi
0x80cdbfd6:  mov    %rsp,%rdx
0x80cdbfd9:  callq  0x808db560
0x80cdbfde:  jmpq   0x80cdca80
0x80cdbfe3:  nopw   0x0(%rax,%rax,1)
0x80cdbfe9:  nopl   0x0(%rax)

> > I don't have user-process-created threads, nor fork/exit.
>
> This has nothing to do with fork/exit but does suggest that you are
> running out of vnodes. What does sysctl -a | grep vnode say?
kern.maxvnodes: 1095872
kern.minvnodes: 273968
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 62399
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 10680
vfs.freevnodes: 275107
vfs.wantfreevnodes: 273968
vfs.numvnodes: 316321
debug.sizeof.vnode: 504
Re: ZFS txg implementation flaw
On 10/28/13 14:32, Slawa Olhovchenkov wrote:
> On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:
>
>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>>
>>> As I see, ZFS creates a separate thread for each txg write, and
>>> also for writing to L2ARC. As a result -- up to several thousand
>>> threads created and destroyed per second, and hundreds of
>>> thousands of page allocations, zeroings, mappings, unmappings and
>>> frees per second. Very high overhead.
>>
>> How are you measuring the number of threads being created /
>> destroyed? This claim seems erroneous given how the ZFS thread
>> pool mechanism actually works (and yes, there are thread pools
>> already).
>>
>> It would be helpful to see both your measurement methodology and
>> the workload you are using in your tests.
>
> Semi-indirect. dtrace -n 'fbt:kernel:vm_object_terminate:entry {
> @traces[stack()] = count(); }'
>
> After some (2-3) seconds
>
> kernel`vnode_destroy_vobject+0xb9
> zfs.ko`zfs_freebsd_reclaim+0x2e
> kernel`VOP_RECLAIM_APV+0x78
> kernel`vgonel+0x134
> kernel`vnlru_free+0x362
> kernel`vnlru_proc+0x61e
> kernel`fork_exit+0x11f
> kernel`0x80cdbfde
> 2490
>
> I don't have user-process-created threads, nor fork/exit.

This has nothing to do with fork/exit but does suggest that you are running out of vnodes. What does sysctl -a | grep vnode say?

Cheers,
--
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve!  Live free or die
Re: ZFS txg implementation flaw
On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:
> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>
>> As I see, ZFS creates a separate thread for each txg write, and also
>> for writing to L2ARC. As a result -- up to several thousand threads
>> created and destroyed per second, and hundreds of thousands of page
>> allocations, zeroings, mappings, unmappings and frees per second.
>> Very high overhead.
>
> How are you measuring the number of threads being created / destroyed?
> This claim seems erroneous given how the ZFS thread pool mechanism
> actually works (and yes, there are thread pools already).
>
> It would be helpful to see both your measurement methodology and the
> workload you are using in your tests.

Semi-indirect. dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'

After some (2-3) seconds:

kernel`vnode_destroy_vobject+0xb9
zfs.ko`zfs_freebsd_reclaim+0x2e
kernel`VOP_RECLAIM_APV+0x78
kernel`vgonel+0x134
kernel`vnlru_free+0x362
kernel`vnlru_proc+0x61e
kernel`fork_exit+0x11f
kernel`0x80cdbfde
2490

I don't have user-process-created threads, nor fork/exit.
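[Editor's sketch: the "thousands of threads per second" claim can be tested directly rather than inferred from vm_object_terminate. The probe names below (`kthread_add`, `thread_exit`) are an assumption about what the fbt provider exports on this kernel; list them first with `dtrace -l` before trusting the script.]

```shell
# Count kernel-thread creations per second, attributed to the creating
# stack, plus a plain exit counter. Verify probe availability first:
#   dtrace -l -n 'fbt::kthread_add:entry' -n 'fbt::thread_exit:entry'
dtrace -n '
  fbt::kthread_add:entry { @created[stack()] = count(); }
  fbt::thread_exit:entry { @exited = count(); }
  tick-1s { printa(@created); printa(@exited); trunc(@created); clear(@exited); }'
```

If ZFS really spawned a thread per txg, the txg sync path would dominate @created; with the existing taskq thread pools it should barely register.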
Re: ZFS txg implementation flaw
On Mon, Oct 28, 2013 at 04:51:02PM -0400, Allan Jude wrote:
> On 2013-10-28 16:48, Slawa Olhovchenkov wrote:
> > On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:
> >
> >> On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
> >>> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
> >>>
> >>>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
> >>>>
> >>>>> I may be wrong, but as I see it, ZFS creates a separate thread
> >>>>> for each txg write, and also for writing to L2ARC. As a result
> >>>>> -- up to several thousand threads created and destroyed per
> >>>>> second, and hundreds of thousands of page allocations,
> >>>>> zeroings, mappings, unmappings and frees per second. Very high
> >>>>> overhead.
> >>>>>
> >>>>> In systat -vmstat I see totfr up to 60, prcfr up to 20.
> >>>>>
> >>>>> Estimated overhead -- 30% of system time.
> >>>>>
> >>>>> Can anybody implement a thread and page pool for txg?
> >>>>
> >>>> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate
> >>>> this?
> >>>
> >>> vfs.zfs.txg.timeout: 5
> >>>
> >>> That only gives a 5x reduction (less in the real case, with
> >>> bursty writing), and more fragmentation on write, etc.
> >>
> >> From my understanding, increasing the timeout so you are doing
> >> fewer transaction groups would actually be the way to increase
> >> performance, at the cost of 'bursty' writing and the associated
> >> uneven latency.
> >
> > This (increasing the timeout) dramatically decreases read
> > performance through very high I/O bursts.
>
> It shouldn't affect read performance, except during the flush
> operations (every txg.timeout seconds)

Yes, I am talking about exactly that time.
> If you watch with 'gstat' or 'gstat -f ada.$' you should see the
> cycle: reading quickly, then every txg.timeout seconds (and for maybe
> longer) it flushes the entire transaction group (maybe 100s of MBs)
> to the disk; this high write load may make reads slow until it is
> finished.

Yes. And reads may be delayed for some seconds. That is unacceptable for my case.
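[Editor's sketch: the flush/starvation cycle described above can be watched directly. The `-f` filter regex and the `ada` device names are placeholders for whatever vdevs back the reader's pool.]

```shell
# Sample GEOM I/O statistics once per second, restricted to the pool's
# disks. During each txg flush, w/s and write kBps spike for several
# seconds while read latency (ms/r) climbs -- the read stall Slawa
# describes.
gstat -I 1s -f 'ada[0-9]+$'
```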
Re: ZFS txg implementation flaw
On 2013-10-28 16:48, Slawa Olhovchenkov wrote:
> On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:
>
>> On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
>>> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
>>>
>>>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>>>>
>>>>> I may be wrong, but as I see it, ZFS creates a separate thread
>>>>> for each txg write, and also for writing to L2ARC. As a result --
>>>>> up to several thousand threads created and destroyed per second,
>>>>> and hundreds of thousands of page allocations, zeroings,
>>>>> mappings, unmappings and frees per second. Very high overhead.
>>>>>
>>>>> In systat -vmstat I see totfr up to 60, prcfr up to 20.
>>>>>
>>>>> Estimated overhead -- 30% of system time.
>>>>>
>>>>> Can anybody implement a thread and page pool for txg?
>>>>
>>>> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate
>>>> this?
>>>
>>> vfs.zfs.txg.timeout: 5
>>>
>>> That only gives a 5x reduction (less in the real case, with bursty
>>> writing), and more fragmentation on write, etc.
>>
>> From my understanding, increasing the timeout so you are doing fewer
>> transaction groups would actually be the way to increase performance,
>> at the cost of 'bursty' writing and the associated uneven latency.
>
> This (increasing the timeout) dramatically decreases read performance
> through very high I/O bursts.

It shouldn't affect read performance, except during the flush operations (every txg.timeout seconds).

If you watch with 'gstat' or 'gstat -f ada.$' you should see the cycle: reading quickly, then every txg.timeout seconds (and for maybe longer) it flushes the entire transaction group (maybe 100s of MBs) to the disk; this high write load may make reads slow until it is finished.
Over the course of a full 60 seconds this should result in higher total read performance, although it will be uneven: slower during the write cycle.

--
Allan Jude
Re: ZFS txg implementation flaw
On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:
> On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
> > On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
> >
> >> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
> >>
> >>> I may be wrong, but as I see it, ZFS creates a separate thread
> >>> for each txg write, and also for writing to L2ARC. As a result --
> >>> up to several thousand threads created and destroyed per second,
> >>> and hundreds of thousands of page allocations, zeroings,
> >>> mappings, unmappings and frees per second. Very high overhead.
> >>>
> >>> In systat -vmstat I see totfr up to 60, prcfr up to 20.
> >>>
> >>> Estimated overhead -- 30% of system time.
> >>>
> >>> Can anybody implement a thread and page pool for txg?
> >>
> >> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate
> >> this?
> >
> > vfs.zfs.txg.timeout: 5
> >
> > That only gives a 5x reduction (less in the real case, with bursty
> > writing), and more fragmentation on write, etc.
>
> From my understanding, increasing the timeout so you are doing fewer
> transaction groups would actually be the way to increase performance,
> at the cost of 'bursty' writing and the associated uneven latency.

This (increasing the timeout) dramatically decreases read performance through very high I/O bursts.
Re: ZFS txg implementation flaw
On 2013-10-28 14:25, aurfalien wrote:
> On Oct 28, 2013, at 11:16 AM, Slawa Olhovchenkov wrote:
>
>> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
>>
>>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>>>
>>>> I may be wrong, but as I see it, ZFS creates a separate thread for
>>>> each txg write, and also for writing to L2ARC. As a result -- up
>>>> to several thousand threads created and destroyed per second, and
>>>> hundreds of thousands of page allocations, zeroings, mappings,
>>>> unmappings and frees per second. Very high overhead.
>>>>
>>>> In systat -vmstat I see totfr up to 60, prcfr up to 20.
>>>>
>>>> Estimated overhead -- 30% of system time.
>>>>
>>>> Can anybody implement a thread and page pool for txg?
>>>
>>> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate
>>> this?
>>
>> vfs.zfs.txg.timeout: 5
>>
>> That only gives a 5x reduction (less in the real case, with bursty
>> writing), and more fragmentation on write, etc.
>
> So leave it at the default, in other words.
>
> Good to know.
>
> - aurf

The default is the default for a reason, although the original default was 30.

--
Allan Jude
Re: ZFS txg implementation flaw
On Oct 28, 2013, at 11:16 AM, Slawa Olhovchenkov wrote:
> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
>
>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>>
>>> I may be wrong, but as I see it, ZFS creates a separate thread for
>>> each txg write, and also for writing to L2ARC. As a result -- up to
>>> several thousand threads created and destroyed per second, and
>>> hundreds of thousands of page allocations, zeroings, mappings,
>>> unmappings and frees per second. Very high overhead.
>>>
>>> In systat -vmstat I see totfr up to 60, prcfr up to 20.
>>>
>>> Estimated overhead -- 30% of system time.
>>>
>>> Can anybody implement a thread and page pool for txg?
>>
>> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
>
> vfs.zfs.txg.timeout: 5
>
> That only gives a 5x reduction (less in the real case, with bursty
> writing), and more fragmentation on write, etc.

So leave it at the default, in other words.

Good to know.

- aurf
Re: ZFS txg implementation flaw
On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
>
>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>>
>>> I may be wrong, but as I see it, ZFS creates a separate thread for
>>> each txg write, and also for writing to L2ARC. As a result -- up to
>>> several thousand threads created and destroyed per second, and
>>> hundreds of thousands of page allocations, zeroings, mappings,
>>> unmappings and frees per second. Very high overhead.
>>>
>>> In systat -vmstat I see totfr up to 60, prcfr up to 20.
>>>
>>> Estimated overhead -- 30% of system time.
>>>
>>> Can anybody implement a thread and page pool for txg?
>>
>> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
>
> vfs.zfs.txg.timeout: 5
>
> That only gives a 5x reduction (less in the real case, with bursty
> writing), and more fragmentation on write, etc.

From my understanding, increasing the timeout so you are doing fewer transaction groups would actually be the way to increase performance, at the cost of 'bursty' writing and the associated uneven latency.

--
Allan Jude
Re: ZFS txg implementation flaw
On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
> I may be wrong, but as I see it, ZFS creates a separate thread for
> each txg write, and also for writing to L2ARC. As a result -- up to
> several thousand threads created and destroyed per second, and
> hundreds of thousands of page allocations, zeroings, mappings,
> unmappings and frees per second. Very high overhead.
>
> In systat -vmstat I see totfr up to 60, prcfr up to 20.
>
> Estimated overhead -- 30% of system time.
>
> Can anybody implement a thread and page pool for txg?

Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?

- aurf
Re: ZFS txg implementation flaw
On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
>
> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>
>> I may be wrong, but as I see it, ZFS creates a separate thread for
>> each txg write, and also for writing to L2ARC. As a result -- up to
>> several thousand threads created and destroyed per second, and
>> hundreds of thousands of page allocations, zeroings, mappings,
>> unmappings and frees per second. Very high overhead.
>>
>> In systat -vmstat I see totfr up to 60, prcfr up to 20.
>>
>> Estimated overhead -- 30% of system time.
>>
>> Can anybody implement a thread and page pool for txg?
>
> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?

vfs.zfs.txg.timeout: 5

That only gives a 5x reduction (less in the real case, with bursty writing), and more fragmentation on write, etc.
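[Editor's sketch making the "only 5x" arithmetic concrete. The sysctl name is confirmed in the thread; the loader.conf spelling is an assumption to verify against the reader's release.]

```shell
# vfs.zfs.txg.timeout is the maximum number of seconds between forced
# txg syncs. At the default of 5, that bounds the sync rate at:
echo $(( 60 / 5 ))    # 12 txgs per minute
# Dropping it to 1 only multiplies the ceiling by 5 -- and under real
# write load txgs already close early, so the change buys even less:
echo $(( 60 / 1 ))    # 60 txgs per minute

# To experiment anyway (runtime / boot-time; FreeBSD-only):
# sysctl vfs.zfs.txg.timeout=1
# echo 'vfs.zfs.txg.timeout="1"' >> /boot/loader.conf
```

Raising the timeout instead trades fewer, larger flushes for the burstier writes discussed later in the thread.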
ZFS txg implementation flaw
I may be wrong, but as I see it, ZFS creates a separate thread for each txg write, and also for writing to L2ARC. As a result -- up to several thousand threads created and destroyed per second, and hundreds of thousands of page allocations, zeroings, mappings, unmappings and frees per second. Very high overhead.

In systat -vmstat I see totfr up to 60, prcfr up to 20.

Estimated overhead -- 30% of system time.

Can anybody implement a thread and page pool for txg?
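[Editor's sketch: the totfr/prcfr figures from systat -vmstat can also be sampled programmatically. The mapping of systat's display fields to the vm.stats.vm sysctls (v_tfree for totfr, v_pfree for prcfr, v_zfod for zfod) is an assumption to double-check against the systat source.]

```shell
# Compute a per-second page-free rate from two counter samples taken
# an interval apart. On FreeBSD the real samples would come from e.g.:
#   t0=$(sysctl -n vm.stats.vm.v_tfree); sleep 5; t1=$(sysctl -n vm.stats.vm.v_tfree)
# Example numbers standing in for two real samples:
t0=123400; t1=123700; interval=5
rate=$(( (t1 - t0) / interval ))
echo "${rate} pages freed/sec"    # 60 pages freed/sec
```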