Re: ZFS Boot Menu
06.10.2013 08:54, Teske, Devin wrote:
> On Sep 30, 2013, at 6:20 AM, Volodymyr Kostyrko wrote:
>> 29.09.2013 00:30, Teske, Devin wrote:
>>> Interested in feedback, but moreover I would like to see who is interested in tackling this with me? I can't do it alone... I at least need testers who will provide feedback and edge-case testing.
>> Sign me in. I'm not fluent with Forth, but testing something new is always fun.
> Cool; to start with, do you have virtual appliance software like VMware or VirtualBox? Experience with generating ZFS pools in said software?

VirtualBox/Qemu; Qemu is able to emulate boot to serial, for example. And yes, I have tried working with ZFS in VMs.

> I think that we may have something to test next month. Right now, we're working on the ability of bsdinstall(8) to provision Boot on ZFS as built-in functionality.

That sounds cool.

-- 
Sphinx of black quartz, judge my vow.
Re: ZFS Boot Menu
On Sep 30, 2013, at 6:20 AM, Volodymyr Kostyrko wrote:
> 29.09.2013 00:30, Teske, Devin wrote:
>> Interested in feedback, but moreover I would like to see who is interested in tackling this with me? I can't do it alone... I at least need testers who will provide feedback and edge-case testing.
> Sign me in. I'm not fluent with Forth, but testing something new is always fun.

Cool; to start with, do you have virtual appliance software like VMware or VirtualBox? Experience with generating ZFS pools in said software?

I think that we may have something to test next month. Right now, we're working on the ability of bsdinstall(8) to provision Boot on ZFS as built-in functionality. This feature (when it goes in; projected for 10.0-BETA1) will make testing the Forth enhancements easier (IMHO).

-- 
Devin
Re: ZFS Boot Menu
On 28.09.2013 23:30, Teske, Devin wrote:
> In my recent interview on bsdnow.tv, I was pinged on BEs in Forth. I'd like to revisit this. Back on Sept 20th, 2012, I posted some pics demonstrating exactly what the code that was in HEAD (at the time) was/is capable of. These three pictures (posted the same day) tell a story:
>
> 1. You boot to the menu: http://twitpic.com/b1eswi/full
> 2. You select option #5 to get here: http://twitpic.com/b1etyb/full
> 3. You select option #2 to get here: http://twitpic.com/b1ew47/full
>
> I've just (today) uploaded the /boot/menu.rc file(s) that I used to create those pictures: http://druidbsd.cvs.sf.net/viewvc/druidbsd/zfsbeastie/
>
> NB: There's a README file to go along with the files.
> HINT: diff -pu menu.rc.1.current-head menu.rc.2.cycler
> HINT: diff -pu menu.rc.1.current-head menu.rc.2.dynamic-submenu
>
> Interested in feedback, but moreover I would like to see who is interested in tackling this with me? I can't do it alone... I at least need testers who will provide feedback and edge-case testing.

Woohoo! Great! I am using ZFS boot environments with beadm, so I can test a bit.
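For anyone who wants to set up a similar test bed, the basic beadm workflow on a root-on-ZFS system looks roughly like the sketch below; the BE name is a made-up example, not taken from this thread:

# beadm create 10-test        # clone the active boot environment
# beadm list                  # show BEs, active/next-boot flags and space usage
# beadm activate 10-test      # boot into the new BE on the next reboot
# shutdown -r now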
Re: ZFS Boot Menu
29.09.2013 00:30, Teske, Devin wrote:
> Interested in feedback, but moreover I would like to see who is interested in tackling this with me? I can't do it alone... I at least need testers who will provide feedback and edge-case testing.

Sign me in. I'm not fluent with Forth, but testing something new is always fun.

-- 
Sphinx of black quartz, judge my vow.
ZFS Boot Menu
In my recent interview on bsdnow.tv, I was pinged on BEs in Forth. I'd like to revisit this. Back on Sept 20th, 2012, I posted some pics demonstrating exactly what the code that was in HEAD (at the time) was/is capable of. These three pictures (posted the same day) tell a story:

1. You boot to the menu: http://twitpic.com/b1eswi/full
2. You select option #5 to get here: http://twitpic.com/b1etyb/full
3. You select option #2 to get here: http://twitpic.com/b1ew47/full

I've just (today) uploaded the /boot/menu.rc file(s) that I used to create those pictures: http://druidbsd.cvs.sf.net/viewvc/druidbsd/zfsbeastie/

NB: There's a README file to go along with the files.
HINT: diff -pu menu.rc.1.current-head menu.rc.2.cycler
HINT: diff -pu menu.rc.1.current-head menu.rc.2.dynamic-submenu

Interested in feedback, but moreover I would like to see who is interested in tackling this with me? I can't do it alone... I at least need testers who will provide feedback and edge-case testing.

-- 
Devin
ZFS encryption property for FreeBSD 8.3
Hi,

I want to encrypt some disks on my server with the ZFS encryption property, but it is not available. Does anybody have experience with this?

http://docs.oracle.com/cd/E23824_01/html/821-1448/gkkih.html#scrolltoc
http://www.oracle.com/technetwork/articles/servers-storage-admin/manage-zfs-encryption-1715034.html

These are good explanations, but I get an error, and the output lists all supported properties:

[root@HP ~]# zpool status
  pool: output
 state: ONLINE
  scan: none requested
config:

        NAME      STATE   READ WRITE CKSUM
        output    ONLINE     0     0     0
          ad0s1e  ONLINE     0     0     0

errors: No known data errors

[root@HP ~]# zfs create -o encryption=on output/home
cannot create 'output/home': invalid property 'encryption'

[root@HP ~]# zfs get encryption
bad property list: invalid property 'encryption'
usage:
        get [-rHp] [-d max] [-o all | field[,...]] [-t type[,...]]
            [-s source[,...]] all | property[,...] [filesystem|volume|snapshot] ...

The following properties are supported:

PROPERTY              EDIT  INHERIT  VALUES
available             NO    NO       <size>
clones                NO    NO       <dataset>[,...]
compressratio         NO    NO       <1.00x or higher if compressed>
creation              NO    NO       <date>
defer_destroy         NO    NO       yes | no
mounted               NO    NO       yes | no
origin                NO    NO       <snapshot>
refcompressratio      NO    NO       <1.00x or higher if compressed>
referenced            NO    NO       <size>
type                  NO    NO       filesystem | volume | snapshot
used                  NO    NO       <size>
usedbychildren        NO    NO       <size>
usedbydataset         NO    NO       <size>
usedbyrefreservation  NO    NO       <size>
usedbysnapshots       NO    NO       <size>
userrefs              NO    NO       <count>
written               NO    NO       <size>
aclinherit            YES   YES      discard | noallow | restricted | passthrough | passthrough-x
aclmode               YES   YES      discard | groupmask | passthrough | restricted
atime                 YES   YES      on | off
canmount              YES   NO       on | off | noauto
casesensitivity       NO    YES      sensitive | insensitive | mixed
checksum              YES   YES      on | off | fletcher2 | fletcher4 | sha256
compression           YES   YES      on | off | lzjb | gzip | gzip-[1-9] | zle
copies                YES   YES      1 | 2 | 3
dedup                 YES   YES      on | off | verify | sha256[,verify]
devices               YES   YES      on | off
exec                  YES   YES      on | off
jailed                YES   YES      on | off
logbias               YES   YES      latency | throughput
mlslabel              YES   YES      <sensitivity label>
mountpoint            YES   YES      <path> | legacy | none
nbmand                YES   YES      on | off
normalization         NO    YES      none | formC | formD | formKC | formKD
primarycache          YES   YES      all | none | metadata
quota                 YES   NO       <size> | none
readonly              YES   YES      on | off
recordsize            YES   YES      512 to 128k, power of 2
refquota              YES   NO       <size> | none
refreservation        YES   NO       <size> | none
reservation           YES   NO       <size> | none
secondarycache        YES   YES      all | none | metadata
setuid                YES   YES      on | off
sharenfs              YES   YES      on | off | share(1M) options
sharesmb              YES   YES      on | off | sharemgr(1M) options
snapdir               YES   YES      hidden | visible
sync                  YES   YES      standard | always | disabled
utf8only              NO    YES      on | off
version               YES   NO       1 | 2 | 3 | 4 | 5 | current
volblocksize          NO    YES      512 to 128k, power of 2
volsize               YES   NO       <size>
vscan                 YES   YES      on | off
xattr                 YES   YES      on | off
userused@...          NO    NO       <size>
groupused@...         NO    NO       <size>
userquota@...         YES   NO       <size> | none
groupquota@...        YES   NO       <size> | none
written@<snap>        NO    NO       <size>

Sizes are specified in bytes with standard units such as K, M, G, etc.

User-defined properties can be specified by using a name containing a colon (:).

The {user|group}{used|quota}@ properties must be appended with a user or group specifier of one of these forms:
    POSIX name      (eg: matt)
    POSIX id        (eg: 126829)
    SMB name@domain (eg: matt@sun)
    SMB SID         (eg: S-1-234-567-89)

How can I use or add the encryption property on FreeBSD 8.3?
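For context: the encryption property arrived with Oracle's pool version 30, while FreeBSD 8.3 ships pool version 28, so the property is simply unknown to the tools there. A quick way to confirm what your pool and tools support (a sketch; the pool name is taken from the output above):

# zpool get version output       # pool version in use (28 on stock 8.3)
# zpool upgrade -v               # pool versions this zpool binary knows about
# zfs upgrade -v                 # filesystem versions this zfs binary knows about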
Re: ZFS encryption property for FreeBSD 8.3
On 03/09/2013 14:14, Emre Çamalan wrote:
> Hi, I want to encrypt some disks on my server with the ZFS encryption property but it is not available.

That would require ZFS v30. As far as I am aware, Oracle has not released that code under the CDDL.

From http://forums.freebsd.org/showthread.php?t=30036: you can use ZFS pools on GELI volumes, so that can be a good start. I haven't played with it myself.

-- 
Florent Peterschmitt
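A minimal sketch of that GELI-backed approach; the disk name ada1 and the options shown are assumptions, not taken from this thread:

# geli init -s 4096 /dev/ada1          # set up encryption, prompts for a passphrase
# geli attach /dev/ada1                # creates the decrypted provider /dev/ada1.eli
# zpool create output /dev/ada1.eli    # build the pool on top of the encrypted provider

After a reboot you attach the GELI provider again (manually or via the rc.conf GELI hooks) before importing the pool.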
Re: ZFS encryption property for FreeBSD 8.3
On Tue, Sep 3, 2013 at 6:22 AM, Florent Peterschmitt flor...@peterschmitt.fr wrote:
> On 03/09/2013 14:14, Emre Çamalan wrote:
>> Hi, I want to encrypt some disks on my server with the ZFS encryption property but it is not available.
> That would require ZFS v30. As far as I am aware, Oracle has not released the code under the CDDL.

Oracle's ZFS encryption is crap anyway. It works at the filesystem level, not the pool level, so a lot of metadata is in plaintext; I don't remember how much exactly. It's also highly vulnerable to watermarking attacks.

> From http://forums.freebsd.org/showthread.php?t=30036: you can use ZFS pools on GELI volumes, so that can be a good start. I haven't played with it myself.

GELI is full-disk encryption. It's far superior to ZFS encryption.
Re: ZFS encryption property for FreeBSD 8.3
On 03/09/2013 16:53, Alan Somers wrote:
> GELI is full-disk encryption. It's far superior to ZFS encryption.

Yup, but is there a possibility to encrypt a ZFS volume (not a whole pool) with a separate GELI partition?

Also, in-ZFS encryption would be a nice thing if it could work like LVM/LUKS, where each logical LVM volume can be encrypted or not and have its own crypto key.

I saw that Illumos has ZFS encryption in its TODO list.

-- 
Florent Peterschmitt
Re: ZFS encryption property for FreeBSD 8.3
On Tue, Sep 3, 2013 at 9:01 AM, Florent Peterschmitt flor...@peterschmitt.fr wrote:
> On 03/09/2013 16:53, Alan Somers wrote:
>> GELI is full-disk encryption. It's far superior to ZFS encryption.
> Yup, but is there a possibility to encrypt a ZFS volume (not a whole pool) with a separate GELI partition?

You mean encrypt a zvol with GELI and put a file system on that? I suppose that would work, but I bet that it would be slow.

> Also, in-ZFS encryption would be a nice thing if it could work like LVM/LUKS, where each logical LVM volume can be encrypted or not and have its own crypto key.

My understanding is that this is exactly how Oracle's ZFS encryption works. Each ZFS filesystem can have its own key, or be in plaintext. Every cryptosystem involves a tradeoff between security and convenience, and ZFS encryption goes fairly hard toward convenience. In particular, Oracle decided that encrypted files must be deduplicatable. A necessary result is that they are trivially vulnerable to watermarking attacks. https://blogs.oracle.com/darren/entry/zfs_encryption_what_is_on

> I saw that Illumos has ZFS encryption in its TODO list.
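A sketch of that zvol-plus-GELI layering, for anyone who wants to try it; the pool name, zvol name and size are placeholders:

# zfs create -V 10G tank/cryptvol                   # carve out a zvol
# geli init /dev/zvol/tank/cryptvol                 # set up encryption on the zvol
# geli attach /dev/zvol/tank/cryptvol
# newfs -U /dev/zvol/tank/cryptvol.eli              # UFS on the encrypted device
# mount /dev/zvol/tank/cryptvol.eli /mnt/crypt

Note the stacked write path (UFS on GELI on a ZFS zvol), which is the likely source of the slowness mentioned above.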
Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS
On Fri, 30 Aug 2013, Patrick wrote:
> On Fri, Aug 30, 2013 at 1:30 AM, Andriy Gapon a...@freebsd.org wrote:
>> I don't have an exact recollection of what is installed by freebsd-update - are *.symbols files installed?
> Doesn't look like it. I wonder if I can grab that from a distro site or somewhere?

It seems so:

marck@woozle:/pub/FreeBSD/releases/amd64/8.4-RELEASE/kernels grep -c symbol generic.mtree
636

So, get the kernels subdir from the release and extract the symbols from them:

cat generic.?? | tar tvjf - \*.symbols

-- 
Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS
On Sat, 31 Aug 2013, Dmitry Morozovsky wrote:
>>> I don't have an exact recollection of what is installed by freebsd-update - are *.symbols files installed?
>> Doesn't look like it. I wonder if I can grab that from a distro site or somewhere?
>
> It seems so:
>
> marck@woozle:/pub/FreeBSD/releases/amd64/8.4-RELEASE/kernels grep -c symbol generic.mtree
> 636
>
> So, get the kernels subdir from the release and extract the symbols from them:
>
> cat generic.?? | tar tvjf - \*.symbols

Ah, ``tar xvjf'' of course -- I did a test run.

-- 
Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
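Putting the two messages together, a rough sequence for getting symbols into kgdb; the mirror URL and the paths inside the archive are assumptions, so adjust to what the release distribution actually contains:

# mkdir /tmp/sym && cd /tmp/sym
# fetch ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/8.4-RELEASE/kernels/generic.aa
  (repeat for the remaining generic.?? pieces)
# cat generic.?? | tar xvjf - \*.symbols
# kgdb ./path/to/zfs.ko.symbols        # or kernel.symbols, whichever contains the symbol
(kgdb) list *vdev_mirror_child_select+0x67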
Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS
On Thu, Aug 29, 2013 at 2:32 PM, Andriy Gapon a...@freebsd.org wrote:
> on 29/08/2013 19:37 Patrick said the following:
>> I've got a system running on a VPS that I'm trying to upgrade from 8.2 to 8.4. It has a ZFS root. After booting the new kernel, I get:
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address = 0x40
>> fault code            = supervisor read data, page not present
>> instruction pointer   = 0x20:0x810d7691
>> stack pointer         = 0x28:0xff81ba60
>> frame pointer         = 0x28:0xff81ba90
>> code segment          = base 0x0, limit 0xf, type 0x1b
>>                       = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags      = interrupt enabled, resume, IOPL = 0
>> current process       = 1 (kernel)
>> trap number           = 12
>> panic: page fault
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0x8066cb96 at kdb_backtrace+0x66
>> #1 0x8063925e at panic+0x1ce
>> #2 0x809c21d0 at trap_fatal+0x290
>> #3 0x809c255e at trap_pfault+0x23e
>> #4 0x809c2a2e at trap+0x3ce
>> #5 0x809a9624 at calltrap+0x8
>> #6 0x810df517 at vdev_mirror_child_select+0x67
>
> If possible, please run 'kgdb /path/to/8.4/kernel' and then in kgdb do 'list *vdev_mirror_child_select+0x67'

H...

(kgdb) list *vdev_mirror_child_select+0x67
No symbol table is loaded.  Use the "file" command.

Do I need to build the kernel from source myself? This kernel is what freebsd-update installed during part 1 of the upgrade.

Patrick
Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS
on 30/08/2013 11:17 Patrick said the following:
> H...
>
> (kgdb) list *vdev_mirror_child_select+0x67
> No symbol table is loaded.  Use the "file" command.
>
> Do I need to build the kernel from source myself? This kernel is what freebsd-update installed during part 1 of the upgrade.

I don't have an exact recollection of what is installed by freebsd-update - are *.symbols files installed?

-- 
Andriy Gapon
Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS
On Fri, Aug 30, 2013 at 1:30 AM, Andriy Gapon a...@freebsd.org wrote:
> I don't have an exact recollection of what is installed by freebsd-update - are *.symbols files installed?

Doesn't look like it. I wonder if I can grab that from a distro site or somewhere?
Fatal trap 12 going from 8.2 to 8.4 with ZFS
I've got a system running on a VPS that I'm trying to upgrade from 8.2 to 8.4. It has a ZFS root. After booting the new kernel, I get:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x40
fault code            = supervisor read data, page not present
instruction pointer   = 0x20:0x810d7691
stack pointer         = 0x28:0xff81ba60
frame pointer         = 0x28:0xff81ba90
code segment          = base 0x0, limit 0xf, type 0x1b
                      = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags      = interrupt enabled, resume, IOPL = 0
current process       = 1 (kernel)
trap number           = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0x8066cb96 at kdb_backtrace+0x66
#1 0x8063925e at panic+0x1ce
#2 0x809c21d0 at trap_fatal+0x290
#3 0x809c255e at trap_pfault+0x23e
#4 0x809c2a2e at trap+0x3ce
#5 0x809a9624 at calltrap+0x8
#6 0x810df517 at vdev_mirror_child_select+0x67
#7 0x810dfacc at vdev_mirror_io_start+0x24c
#8 0x810f7c52 at zio_vdev_io_start+0x232
#9 0x810f76f3 at zio_execute+0xc3
#10 0x810f77ad at zio_wait+0x2d
#11 0x8108991e at arc_read+0x6ce
#12 0x8109d9d4 at dmu_objset_open_impl+0xd4
#13 0x810b4014 at dsl_pool_init+0x34
#14 0x810c7eea at spa_load+0x6aa
#15 0x810c90b2 at spa_load_best+0x52
#16 0x810cb0ca at spa_open_common+0x14a
#17 0x810a892d at dsl_dir_open_spa+0x2cd
Uptime: 3s
Cannot dump. Device not defined or unavailable.

I've booted back into the 8.2 kernel without any problems, but I'm wondering if anyone can suggest what I should try to get this working? I used freebsd-update to upgrade, and this was after the first freebsd-update install where it installs the kernel.

My /boot/loader.conf has:

zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"

Should I be going from 8.2 -> 8.3 -> 8.4?

Patrick
Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS
on 29/08/2013 19:37 Patrick said the following:
> I've got a system running on a VPS that I'm trying to upgrade from 8.2 to 8.4. It has a ZFS root. After booting the new kernel, I get:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x40
> fault code            = supervisor read data, page not present
> instruction pointer   = 0x20:0x810d7691
> stack pointer         = 0x28:0xff81ba60
> frame pointer         = 0x28:0xff81ba90
> code segment          = base 0x0, limit 0xf, type 0x1b
>                       = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags      = interrupt enabled, resume, IOPL = 0
> current process       = 1 (kernel)
> trap number           = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0x8066cb96 at kdb_backtrace+0x66
> #1 0x8063925e at panic+0x1ce
> #2 0x809c21d0 at trap_fatal+0x290
> #3 0x809c255e at trap_pfault+0x23e
> #4 0x809c2a2e at trap+0x3ce
> #5 0x809a9624 at calltrap+0x8
> #6 0x810df517 at vdev_mirror_child_select+0x67

If possible, please run 'kgdb /path/to/8.4/kernel' and then in kgdb do 'list *vdev_mirror_child_select+0x67'

> #7 0x810dfacc at vdev_mirror_io_start+0x24c
> #8 0x810f7c52 at zio_vdev_io_start+0x232
> #9 0x810f76f3 at zio_execute+0xc3
> #10 0x810f77ad at zio_wait+0x2d
> #11 0x8108991e at arc_read+0x6ce
> #12 0x8109d9d4 at dmu_objset_open_impl+0xd4
> #13 0x810b4014 at dsl_pool_init+0x34
> #14 0x810c7eea at spa_load+0x6aa
> #15 0x810c90b2 at spa_load_best+0x52
> #16 0x810cb0ca at spa_open_common+0x14a
> #17 0x810a892d at dsl_dir_open_spa+0x2cd
> Uptime: 3s
> Cannot dump. Device not defined or unavailable.
>
> I've booted back into the 8.2 kernel without any problems, but I'm wondering if anyone can suggest what I should try to get this working? I used freebsd-update to upgrade, and this was after the first freebsd-update install where it installs the kernel.

-- 
Andriy Gapon
Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem
On 12.07.2013 14:33, Volodymyr Kostyrko wrote:
> You can try to experiment with zpool hidden flags. Look at this command:
>
> zpool import -N -o readonly=on -f -R /pool pool
>
> It will try to import the pool in readonly mode so no data will be written to it. It also doesn't mount anything on import, so if any fs is damaged you have less chance of triggering a coredump.
>
> Also zpool import has a hidden -T switch that gives you the ability to select the transaction that you want to try to restore. You'll need a list of the available transactions though:
>
> zdb -ul vdev
>
> This one, when given a vdev, lists all uberblocks with their respective transaction ids. You can take the highest one (it's not the last one) and try to mount the pool with:
>
> zpool import -N -o readonly=on -f -R /pool -F -T transaction_id pool

I had good luck with ZFS recovery using the following approach:

1) Use zdb to identify a TXG for which the data structures are intact.
2) Select recovery mode by loading the ZFS KLD with vfs.zfs.recover=1 set in /boot/loader.conf.
3) Import the pool with the above -T option referring to a suitable TXG found with the help of zdb.

The zdb command to use is:

# zdb -AAA -L -t TXG -bcdmu POOL

(Both -AAA and -L reduce the amount of consistency checking performed. A pool (at TXG) that needs these options to allow zdb to succeed is damaged, but may still allow recovery of most or all files. Be sure to only import that pool R/O, or your data will probably be lost!)

A list of TXGs to try can be retrieved with "zdb -hh POOL". You may need to add -e to the list of zdb options, since the pool is exported / not currently mounted.

Regards,
Stefan
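Condensed into one sequence (pool name, vdev and TXG are placeholders; treat this as a sketch of the procedure described above, not a tested recipe):

# echo 'vfs.zfs.recover=1' >> /boot/loader.conf        # or set it at the loader prompt before zfs.ko loads
# zdb -e -hh tank                                      # pool history: candidate TXGs
# zdb -e -AAA -L -t 123456 -bcdmu tank                 # check that TXG 123456 looks intact
# zpool import -N -o readonly=on -f -R /mnt -F -T 123456 tank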
Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem
On 11.07.2013 17:43, Reid Linnemann wrote:
> So recently I was trying to transfer a root-on-ZFS zpool from one pair of disks to a single, larger disk. As I am wont to do, I botched the transfer up and decided to destroy the ZFS filesystems on the destination and start again. Naturally I was up late working on this, being sloppy and drowsy without any coffee, and lo and behold I issued my 'zfs destroy -R' and immediately realized after pressing [ENTER] that I had given it the source's zpool name. Oops.
>
> Fortunately I was able to interrupt the procedure with only /usr being destroyed from the pool, and I was able to send/receive the truly vital data in my /var partition to the new disk and re-deploy the base system to /usr on the new disk. The only thing I'm really missing at this point is all of the third-party software configuration I had in /usr/local/etc and my apache data in /usr/local/www.

You can try to experiment with zpool hidden flags. Look at this command:

zpool import -N -o readonly=on -f -R /pool pool

It will try to import the pool in readonly mode so no data will be written to it. It also doesn't mount anything on import, so if any fs is damaged you have less chance of triggering a coredump.

Also, zpool import has a hidden -T switch that gives you the ability to select the transaction that you want to try to restore. You'll need a list of the available transactions though:

zdb -ul vdev

This one, when given a vdev, lists all uberblocks with their respective transaction ids. You can take the highest one (it's not the last one) and try to mount the pool with:

zpool import -N -o readonly=on -f -R /pool -F -T transaction_id pool

Then check the available filesystems.

-- 
Sphinx of black quartz, judge my vow.
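In command form, the uberblock-based selection described above might look like this (device, pool name and TXG are placeholders):

# zdb -ul /dev/ada1p3                   # dump the labels' uberblocks; note each txg value
# zpool import -N -o readonly=on -f -R /pool -F -T 1234567 pool
# zfs list -r pool                      # see which datasets survived at that TXG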
Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem
Hey presto!

# zfs list
NAME            USED   AVAIL  REFER  MOUNTPOINT
bucket          485G   1.30T   549M  legacy
bucket/tmp       21K   1.30T    21K  legacy
bucket/usr      29.6G  1.30T  29.6G  /mnt/usr
bucket/var       455G  1.30T  17.7G  /mnt/var
bucket/var/srv   437G  1.30T   437G  /mnt/var/srv

There's my old bucket! Thanks much for the hidden -T argument, Volodymyr! Now I can get back the remainder of my missing configuration.

On Fri, Jul 12, 2013 at 7:33 AM, Volodymyr Kostyrko c.kw...@gmail.com wrote:
> On 11.07.2013 17:43, Reid Linnemann wrote:
>> So recently I was trying to transfer a root-on-ZFS zpool from one pair of disks to a single, larger disk. As I am wont to do, I botched the transfer up and decided to destroy the ZFS filesystems on the destination and start again. Naturally I was up late working on this, being sloppy and drowsy without any coffee, and lo and behold I issued my 'zfs destroy -R' and immediately realized after pressing [ENTER] that I had given it the source's zpool name. Oops.
>> Fortunately I was able to interrupt the procedure with only /usr being destroyed from the pool, and I was able to send/receive the truly vital data in my /var partition to the new disk and re-deploy the base system to /usr on the new disk. The only thing I'm really missing at this point is all of the third-party software configuration I had in /usr/local/etc and my apache data in /usr/local/www.
>
> You can try to experiment with zpool hidden flags. Look at this command:
>
> zpool import -N -o readonly=on -f -R /pool pool
>
> It will try to import the pool in readonly mode so no data will be written to it. It also doesn't mount anything on import, so if any fs is damaged you have less chance of triggering a coredump.
>
> Also zpool import has a hidden -T switch that gives you the ability to select the transaction that you want to try to restore. You'll need a list of the available transactions though:
>
> zdb -ul vdev
>
> This one, when given a vdev, lists all uberblocks with their respective transaction ids. You can take the highest one (it's not the last one) and try to mount the pool with:
>
> zpool import -N -o readonly=on -f -R /pool -F -T transaction_id pool
>
> Then check the available filesystems.
>
> -- 
> Sphinx of black quartz, judge my vow.
Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem
So recently I was trying to transfer a root-on-ZFS zpool from one pair of disks to a single, larger disk. As I am wont to do, I botched the transfer up and decided to destroy the ZFS filesystems on the destination and start again. Naturally I was up late working on this, being sloppy and drowsy without any coffee, and lo and behold I issued my 'zfs destroy -R' and immediately realized after pressing [ENTER] that I had given it the source's zpool name. Oops.

Fortunately I was able to interrupt the procedure with only /usr being destroyed from the pool, and I was able to send/receive the truly vital data in my /var partition to the new disk and re-deploy the base system to /usr on the new disk. The only thing I'm really missing at this point is all of the third-party software configuration I had in /usr/local/etc and my apache data in /usr/local/www.

After a few minutes on Google I came across this wonderful page:

http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script

where the author has published information about his python script, which locates the uberblocks on the raw disk, shows the user the most recent transaction IDs, prompts the user for a transaction ID to roll back to, and zeroes out all uberblocks beyond that point. Theoretically, I should be able to use this script to get back to the transaction prior to my dreaded 'zfs destroy -R', and then be able to recover the data I need (since no further writes have been done to the source disks).

First, I know there's a problem in the script on FreeBSD in which the grep pattern for the od output expects a single space between the output elements. I've attached a patch (zfs_revert-0.1.py.patch) that allows the output to be properly grepped on FreeBSD, so we can actually get to the transaction log.

But now we come to my current problem. When attempting to roll back with this script, it tries to dd zeroed bytes to offsets into the disk device (/dev/ada1p3 in my case) where the uberblocks are located. But even with kern.geom.debugflags set to 0x10 (and I am running this as root) I get 'Operation not permitted' when the script tries to zero out the unwanted transactions. I'm fairly certain this is because the geom is in use by the ZFS subsystem, as it is still recognized as a part of the original pool. I'm hesitant to zfs export the pool, as I don't know if that wipes the transaction history on the pool.

Does anyone have any ideas?

Thanks,
-Reid
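For reference, the knobs involved, written out as a sketch; the pool name is a placeholder and the script invocation is hypothetical, since its exact arguments aren't shown in this thread:

# zpool export pool                        # release the vdevs first (see the follow-ups below)
# sysctl kern.geom.debugflags=0x10         # allow writes to disk devices that are otherwise protected
# python zfs_revert-0.1.py ...             # roll back the uberblocks on /dev/ada1p3
# zpool import -N -o readonly=on -f pool   # then inspect the result read-only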
Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem
zpool export does not wipe the transaction history. It does, however, write new labels and some metadata, so there is a very slight chance that it might overwrite some of the blocks that you're trying to recover. But it's probably safe.

An alternative, much more complicated, solution would be to have ZFS open the device non-exclusively. This patch will do that. Caveat programmer: I haven't tested this patch in isolation.

Change 624068 by willa@willa_SpectraBSD on 2012/08/09 09:28:38

	Allow multiple opens of geoms used by vdev_geom.

	Also ignore the pool guid for spares when checking to decide whether
	it's ok to attach a vdev. This enables using hotspares to replace
	other devices, as well as using a given hotspare in multiple pools.

	We need to investigate alternative solutions in order to allow
	opening the geoms exclusive.

Affected files ...

... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#2 edit

Differences ...

==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#2 (text) ====

@@ -179,49 +179,23 @@
 		gp = g_new_geomf(zfs_vdev_class, "zfs::vdev");
 		gp->orphan = vdev_geom_orphan;
 		gp->attrchanged = vdev_geom_attrchanged;
-		cp = g_new_consumer(gp);
-		error = g_attach(cp, pp);
-		if (error != 0) {
-			printf("%s(%d): g_attach failed: %d\n", __func__,
-			    __LINE__, error);
-			g_wither_geom(gp, ENXIO);
-			return (NULL);
-		}
-		error = g_access(cp, 1, 0, 1);
-		if (error != 0) {
-			printf("%s(%d): g_access failed: %d\n", __func__,
-			    __LINE__, error);
-			g_wither_geom(gp, ENXIO);
-			return (NULL);
-		}
-		ZFS_LOG(1, "Created geom and consumer for %s.", pp->name);
-	} else {
-		/* Check if we are already connected to this provider. */
-		LIST_FOREACH(cp, &gp->consumer, consumer) {
-			if (cp->provider == pp) {
-				ZFS_LOG(1, "Provider %s already in use by ZFS. "
-				    "Failing attach.", pp->name);
-				return (NULL);
-			}
-		}
-		cp = g_new_consumer(gp);
-		error = g_attach(cp, pp);
-		if (error != 0) {
-			printf("%s(%d): g_attach failed: %d\n",
-			    __func__, __LINE__, error);
-			g_destroy_consumer(cp);
-			return (NULL);
-		}
-		error = g_access(cp, 1, 0, 1);
-		if (error != 0) {
-			printf("%s(%d): g_access failed: %d\n",
-			    __func__, __LINE__, error);
-			g_detach(cp);
-			g_destroy_consumer(cp);
-			return (NULL);
-		}
-		ZFS_LOG(1, "Created consumer for %s.", pp->name);
+	}
+	cp = g_new_consumer(gp);
+	error = g_attach(cp, pp);
+	if (error != 0) {
+		printf("%s(%d): g_attach failed: %d\n", __func__,
+		    __LINE__, error);
+		g_wither_geom(gp, ENXIO);
+		return (NULL);
+	}
+	error = g_access(cp, /*r*/1, /*w*/0, /*e*/0);
+	if (error != 0) {
+		printf("%s(%d): g_access failed: %d\n", __func__,
+		    __LINE__, error);
+		g_wither_geom(gp, ENXIO);
+		return (NULL);
 	}
+	ZFS_LOG(1, "Created consumer for %s.", pp->name);
 
 	cp->private = vd;
 	vd->vdev_tsd = cp;
@@ -251,7 +225,7 @@
 	cp->private = NULL;
 
 	gp = cp->geom;
-	g_access(cp, -1, 0, -1);
+	g_access(cp, -1, 0, 0);
 	/* Destroy consumer on last close. */
 	if (cp->acr == 0 && cp->ace == 0) {
 		ZFS_LOG(1, "Destroyed consumer to %s.", cp->provider->name);
@@ -384,6 +358,18 @@
 	    cp->provider->name);
 }
 
+static inline boolean_t
+vdev_attach_ok(vdev_t *vd, uint64_t pool_guid, uint64_t vdev_guid)
+{
+	boolean_t pool_ok;
+	boolean_t vdev_ok;
+
+	/* Spares can be assigned to multiple pools. */
+	pool_ok = vd->vdev_isspare || pool_guid == spa_guid(vd->vdev_spa);
+	vdev_ok = vdev_guid == vd->vdev_guid;
+	return (pool_ok && vdev_ok);
+}
+
 static struct g_consumer *
 vdev_geom_attach_by_guids(vdev_t *vd)
 {
@@ -420,8 +406,7 @@
 		g_topology_lock();
 		g_access(zcp, -1, 0, 0);
 		g_detach(zcp);
-		if (pguid != spa_guid(vd->vdev_spa) ||
-		    vguid != vd->vdev_guid
Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem
On Thu, Jul 11, 2013 at 9:04 AM, Alan Somers asom...@freebsd.org wrote:
> zpool export does not wipe the transaction history. It does, however, write new labels and some metadata, so there is a very slight chance that it might overwrite some of the blocks that you're trying to recover. But it's probably safe.
>
> An alternative, much more complicated, solution would be to have ZFS open the device non-exclusively. This patch will do that. Caveat programmer: I haven't tested this patch in isolation.

This change is quite a bit more than necessary, and probably wouldn't apply to FreeBSD given the other changes in the code. Really, to make non-exclusive opens you just have to change the g_access() calls in vdev_geom.c so the third argument is always 0. However, see below.

On Thu, Jul 11, 2013 at 8:43 AM, Reid Linnemann linnema...@gmail.com wrote:
> But now we come to my current problem. When attempting to roll back with this script, it tries to dd zeroed bytes to offsets into the disk device (/dev/ada1p3 in my case) where the uberblocks are located. But even with kern.geom.debugflags set to 0x10 (and I am running this as root) I get 'Operation not permitted' when the script tries to zero out the unwanted transactions. I'm fairly certain this is because the geom is in use by the ZFS subsystem, as it is still recognized as a part of the original pool. I'm hesitant to zfs export the pool, as I don't know if that wipes the transaction history on the pool. Does anyone have any ideas?

You do not have a choice. Changing the on-disk state does not mean the in-core state will update to match, and the pool could get into a really bad state if you try to modify the transactions on disk while it's online, since it may write additional transactions (which rely on state you're about to destroy) before you export.

Also, rolling back transactions in this manner assumes that the original blocks (that were COW'd) are still in their original state. If you're using TRIM or have a pretty full pool, the odds are not in your favor. It's a roll of the dice, in any case.

--Will.
Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem
Will,

Thanks, that makes sense. I know this is all a crap shoot, but I've really got nothing to lose at this point, so this is just a good opportunity to rummage around the internals of ZFS and learn a few things. I might even get lucky and recover some data!

On Thu, Jul 11, 2013 at 10:59 AM, Will Andrews w...@firepipe.net wrote:
> On Thu, Jul 11, 2013 at 9:04 AM, Alan Somers asom...@freebsd.org wrote:
>> zpool export does not wipe the transaction history. It does, however, write new labels and some metadata, so there is a very slight chance that it might overwrite some of the blocks that you're trying to recover. But it's probably safe.
>>
>> An alternative, much more complicated, solution would be to have ZFS open the device non-exclusively. This patch will do that. Caveat programmer: I haven't tested this patch in isolation.
>
> This change is quite a bit more than necessary, and probably wouldn't apply to FreeBSD given the other changes in the code. Really, to make non-exclusive opens you just have to change the g_access() calls in vdev_geom.c so the third argument is always 0. However, see below.
>
> On Thu, Jul 11, 2013 at 8:43 AM, Reid Linnemann linnema...@gmail.com wrote:
>> But now we come to my current problem. When attempting to roll back with this script, it tries to dd zeroed bytes to offsets into the disk device (/dev/ada1p3 in my case) where the uberblocks are located. But even with kern.geom.debugflags set to 0x10 (and I am running this as root) I get 'Operation not permitted' when the script tries to zero out the unwanted transactions. I'm fairly certain this is because the geom is in use by the ZFS subsystem, as it is still recognized as a part of the original pool. I'm hesitant to zfs export the pool, as I don't know if that wipes the transaction history on the pool. Does anyone have any ideas?
>
> You do not have a choice. Changing the on-disk state does not mean the in-core state will update to match, and the pool could get into a really bad state if you try to modify the transactions on disk while it's online, since it may write additional transactions (which rely on state you're about to destroy) before you export.
>
> Also, rolling back transactions in this manner assumes that the original blocks (that were COW'd) are still in their original state. If you're using TRIM or have a pretty full pool, the odds are not in your favor. It's a roll of the dice, in any case.
>
> --Will.
Make ZFS use the physical sector size when computing initial ashift
The attached patch causes ZFS to base the minimum transfer size for a new vdev on the GEOM provider's stripesize (physical sector size) rather than sectorsize (logical sector size), provided that stripesize is a power of two larger than sectorsize and smaller than or equal to VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick when creating ZFS pools on Advanced Format drives.

DES
-- 
Dag-Erling Smørgrav - d...@des.no

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(revision 253138)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(working copy)
@@ -578,6 +578,7 @@
 {
 	struct g_provider *pp;
 	struct g_consumer *cp;
+	u_int sectorsize;
 	size_t bufsize;
 	int error;
 
@@ -661,8 +662,21 @@
 
 	/*
 	 * Determine the device's minimum transfer size.
+	 *
+	 * This is a bit of a hack.  For performance reasons, we would
+	 * prefer to use the physical sector size (reported by GEOM as
+	 * stripesize) as minimum transfer size.  However, doing so
+	 * unconditionally would break existing vdevs.  Therefore, we
+	 * compute ashift based on stripesize when the vdev isn't already
+	 * part of a pool (vdev_asize == 0), and sectorsize otherwise.
 	 */
-	*ashift = highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;
+	if (vd->vdev_asize == 0 && pp->stripesize > pp->sectorsize &&
+	    ISP2(pp->stripesize) && pp->stripesize <= VDEV_PAD_SIZE) {
+		sectorsize = pp->stripesize;
+	} else {
+		sectorsize = pp->sectorsize;
+	}
+	*ashift = highbit(MAX(sectorsize, SPA_MINBLOCKSIZE)) - 1;
 
 	/*
 	 * Clear the nowritecache settings, so that on a vdev_reopen()
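For comparison, the gnop workaround this patch is meant to make unnecessary goes roughly like this (the disk name is a placeholder):

# gnop create -S 4096 /dev/ada0        # expose a 4K-sector view of the disk
# zpool create tank /dev/ada0.nop      # pool is created with ashift=12
# zpool export tank
# gnop destroy /dev/ada0.nop
# zpool import tank                    # ashift sticks with the vdev, so 4K alignment is kept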
Re: Make ZFS use the physical sector size when computing initial ashift
Hi DES, unfortunately you need quite a bit more than this to work compatibly. I've had a patch here that does just this for quite some time, but there's been some discussion on how we want additional control over this, so it hasn't been committed.

If others are interested, I've attached it (zzz-zfs-ashift-fix.patch), as it achieves what we needed here, so it may be of use for others too.

There's also a big discussion on illumos about this very subject ATM, so I'm monitoring that too. Hopefully a nice conclusion will come from that on how people want to proceed, and we'll be able to get a change in that works for everyone.

Regards
Steve

----- Original Message ----- 
From: Dag-Erling Smørgrav d...@des.no
To: freebsd...@freebsd.org; freebsd-hackers@freebsd.org
Cc: ivo...@freebsd.org
Sent: Wednesday, July 10, 2013 10:02 AM
Subject: Make ZFS use the physical sector size when computing initial ashift

> The attached patch causes ZFS to base the minimum transfer size for a new vdev on the GEOM provider's stripesize (physical sector size) rather than sectorsize (logical sector size), provided that stripesize is a power of two larger than sectorsize and smaller than or equal to VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick when creating ZFS pools on Advanced Format drives.
>
> DES
> -- 
> Dag-Erling Smørgrav - d...@des.no
Re: Make ZFS use the physical sector size when computing initial ashift
Steven Hartland kill...@multiplay.co.uk writes:
> Hi DES, unfortunately you need quite a bit more than this to work compatibly.

*chirp* *chirp* *chirp*

DES
-- 
Dag-Erling Smørgrav - d...@des.no
Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 11:25 AM, Steven Hartland wrote:
> If others are interested, I've attached it as it achieves what we needed here, so it may be of use for others too.
> There's also a big discussion on illumos about this very subject ATM, so I'm monitoring that too. Hopefully a nice conclusion will come from that on how people want to proceed, and we'll be able to get a change in that works for everyone.

Hmm. I wonder if the simplest approach would be the better. I mean, adding a flag to zpool.

At home I have a playground FreeBSD machine with a ZFS zmirror, and, you guessed it, I was careless when I purchased the components: I asked for two 1 TB drives and that I got, but different models, one of them advanced format and the other one classic.

I don't think it's that bad to create a pool on a classic disk using 4 KB blocks, and it's quite likely that replacement disks will be 4 KB in the near future. Also, if you use SSDs the situation is similar.

Borja.
Re: Make ZFS use the physical sector size when computing initial ashift
There's lots more to consider when looking for a way forward, not least of all that ashift isn't a zpool configuration option, it's per top-level vdev, plus the space implications of moving from 512b to 4k; see previous and current discussions on zfs-de...@freebsd.org and z...@lists.illumos.org for details.

Regards
Steve

----- Original Message ----- 
From: Borja Marcos bor...@sarenet.es
> On Jul 10, 2013, at 11:25 AM, Steven Hartland wrote:
>> If others are interested, I've attached it as it achieves what we needed here, so it may be of use for others too. There's also a big discussion on illumos about this very subject ATM, so I'm monitoring that too. Hopefully a nice conclusion will come from that on how people want to proceed, and we'll be able to get a change in that works for everyone.
>
> Hmm. I wonder if the simplest approach would be the better. I mean, adding a flag to zpool.
>
> At home I have a playground FreeBSD machine with a ZFS zmirror, and, you guessed it, I was careless when I purchased the components: I asked for two 1 TB drives and that I got, but different models, one of them advanced format and the other one classic.
>
> I don't think it's that bad to create a pool on a classic disk using 4 KB blocks, and it's quite likely that replacement disks will be 4 KB in the near future. Also, if you use SSDs the situation is similar.
Re: Make ZFS use the physical sector size when computing initial ashift
On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
> The attached patch causes ZFS to base the minimum transfer size for a new vdev on the GEOM provider's stripesize (physical sector size) rather than sectorsize (logical sector size), provided that stripesize is a power of two larger than sectorsize and smaller than or equal to VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick when creating ZFS pools on Advanced Format drives.

I think there are multiple versions of this (I also have one[1]), but the concern is that if one created a pool with ashift=9 and the new code now computes ashift=12, the pool becomes unimportable. So there needs to be a way to disable this behavior.

Another thing (not really related to the automatic detection) is that we need a way to manually override this setting from the command line when creating the pool; this is under active discussion on the Illumos mailing list right now.

[1] https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

Cheers,
-- 
Xin LI delp...@delphij.net  https://www.delphij.net/
FreeBSD - The Power to Serve!  Live free or die
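The manual override being asked for here later appeared in various forms; something along these lines, with the caveat that option names and availability differ between ZFS implementations and releases, so treat both lines as assumptions rather than guidance for 2013-era FreeBSD:

# sysctl vfs.zfs.min_auto_ashift=12      # FreeBSD's eventual sysctl governing ashift for new top-level vdevs
# zpool create -o ashift=12 tank ada0    # per-pool override as found in ZFS on Linux / OpenZFS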
Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 11:21 AM, Xin Li delp...@delphij.net wrote:
> On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
>> The attached patch causes ZFS to base the minimum transfer size for a new vdev on the GEOM provider's stripesize (physical sector size) rather than sectorsize (logical sector size), provided that stripesize is a power of two larger than sectorsize and smaller than or equal to VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick when creating ZFS pools on Advanced Format drives.
>
> I think there are multiple versions of this (I also have one[1]), but the concern is that if one creates a pool with ashift=9, and now ashift=12, the pool gets unimportable. So there needs to be a way to disable this behavior.
>
> Another thing (not really related to the automatic detection) is that we need a way to manually override this setting from the command line when creating the pool; this is under active discussion on the Illumos mailing list right now.
>
> [1] https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

I'm sure lots of folks have some solution to this. Here is an old version of what we use at Spectra:

http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff

The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment, since it explains the motivation behind this particular solution.

This is on my list of things to upstream in the next week or so, after I add logic to the userspace tools to report whether or not the TLVs (top-level vdevs) in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of the logical, physical, and configured allocation sizes. All of the other patches I've seen just treat physical as logical.

-- 
Justin
Re: Make ZFS use the physical sector size when computing initial ashift
----- Original Message ----- 
From: Xin Li
> On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
>> The attached patch causes ZFS to base the minimum transfer size for a new vdev on the GEOM provider's stripesize (physical sector size) rather than sectorsize (logical sector size), provided that stripesize is a power of two larger than sectorsize and smaller than or equal to VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick when creating ZFS pools on Advanced Format drives.
>
> I think there are multiple versions of this (I also have one[1]) but the concern is that if one creates a pool with ashift=9, and now ashift=12, the pool gets unimportable. So there needs to be a way to disable this behavior.

I've tested my patch in all configurations I can think of, including exported ashift=9 pools being imported, all with no issues. For your example, e.g.:

# Create a 4K pool (min_create_ashift=4K, dev=512)
test:src sysctl vfs.zfs.min_create_ashift
vfs.zfs.min_create_ashift: 12
test:src mdconfig -a -t swap -s 128m -S 512 -u 0
test:src zpool create mdpool md0
test:src zdb mdpool | grep ashift
        ashift: 12
        ashift: 12

# Create a 512b pool (min_create_ashift=512, dev=512)
test:src zpool destroy mdpool
test:src sysctl vfs.zfs.min_create_ashift=9
vfs.zfs.min_create_ashift: 12 -> 9
test:src zpool create mdpool md0
test:src zdb mdpool | grep ashift
        ashift: 9
        ashift: 9

# Import a 512b pool (min_create_ashift=4K, dev=512)
test:src zpool export mdpool
test:src sysctl vfs.zfs.min_create_ashift=12
vfs.zfs.min_create_ashift: 9 -> 12
test:src zpool import mdpool
test:src zdb mdpool | grep ashift
        ashift: 9
        ashift: 9

# Create a 4K pool (min_create_ashift=512, dev=4K)
test:src zpool destroy mdpool
test:src mdconfig -d -u 0
test:src mdconfig -a -t swap -s 128m -S 4096 -u 0
test:src sysctl vfs.zfs.min_create_ashift=9
vfs.zfs.min_create_ashift: 12 -> 9
test:src zpool create mdpool md0
test:src zdb mdpool | grep ashift
        ashift: 12
        ashift: 12

# Import a 4K pool (min_create_ashift=4K, dev=4K)
test:src zpool export mdpool
test:src sysctl vfs.zfs.min_create_ashift=12
vfs.zfs.min_create_ashift: 9 -> 12
test:src zpool import mdpool
test:src zdb mdpool | grep ashift
        ashift: 12
        ashift: 12

> Another thing (not really related to the automatic detection) is that we need a way to manually override this setting from the command line when creating the pool; this is under active discussion on the Illumos mailing list right now.
>
> [1] https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

Yep, it has been on my list for a while, based on previous discussions on zfs-devel@. I've not had any time recently, but I'm following the illumos thread to see what conclusions they come to.

Regards
Steve
Re: Make ZFS use the physical sector size when computing initial ashift
On 07/10/13 10:38, Justin T. Gibbs wrote:
> [snip]
> I'm sure lots of folks have some solution to this. Here is an old version of what we use at Spectra:
>
> http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
>
> The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment since it explains the motivation behind this particular solution.
>
> This is on my list of things to upstream in the next week or so after I add logic to the userspace tools to report whether or not the TLVs in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of logical, physical, and the configured allocation size. All of the other patches I've seen just treat physical as logical.

Yes, me too. Your version is superior.

Cheers,
-- 
Xin LI delp...@delphij.net  https://www.delphij.net/
FreeBSD - The Power to Serve!  Live free or die
Re: Make ZFS use the physical sector size when computing initial ashift
- Original Message - From: Justin T. Gibbs I'm sure lots of folks have some solution to this. Here is an old version of what we use at Spectra: http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment since it explains the motivation behind this particular solution. This is on my list of things to upstream in the next week or so after I add logic to the userspace tools to report whether or not the TLVs in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of logical, physical, and the configured allocation size. All of the other patches I've seen just treat physical as logical.
Reading through your patch it seems that your logical_ashift equates to the current ashift values, which for geom devices is based off sectorsize, and your physical_ashift is based on stripesize. This is almost identical to the approach I used, adding a desired ashift, which equates to your physical_ashift, alongside the standard ashift i.e. required aka logical_ashift value :) One issue I did spot in your patch is that you currently expose zfs_max_auto_ashift as a sysctl but don't clamp its value, which would cause problems should a user configure values > 13. If you're interested in the reason for this, it's explained in the comments in my version, which does a very similar thing with validation. Regards Steve This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
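The sectorsize/stripesize pair that GEOM reports (and that these patches map onto logical_ashift/physical_ashift) can be checked with diskinfo; the device name and sizes below are only illustrative, and the tail of the output is trimmed:

diskinfo -v ada0
ada0
        512             # sectorsize
        2000398934016   # mediasize in bytes
        3907029168      # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset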
Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 1:06 PM, Steven Hartland kill...@multiplay.co.uk wrote: - Original Message - From: Justin T. Gibbs I'm sure lots of folks have some solution to this. Here is an old version of what we use at Spectra: http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment since it explains the motivation behind this particular solution. This is on my list of things to upstream in the next week or so after I add logic to the userspace tools to report whether or not the TLVs in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of logical, physical, and the configured allocation size. All of the other patches I've seen just treat physical as logical. Reading through your patch it seems that your logical_ashift equates to the current ashift values which for geom devices is based off sectorsize and your physical_ashift is based stripesize. This is almost identical to the approach I used adding a desired ashift, which equates to your physical_ashift, along side the standard ashift i.e. required aka logical_ashift value :) Yes, the approaches are similar. Our current version records the logical access size in the vdev structure too, which might relate to the issue below. One issue I did spot in your patch is that you currently expose zfs_max_auto_ashift as a sysctl but don't clamp its value which would cause problems should a user configure values 13. I would expect the zio pipeline to simply insert an ashift aligned thunking buffer for these operations, but I haven't tried going past an ashift of 13 in my tests. If it is an issue, it seems the restriction should be based on logical access size, not optimal access size. -- Justin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Make ZFS use the physical sector size when computing initial ashift
- Original Message - From: Justin T. Gibbs On Jul 10, 2013, at 1:06 PM, Steven Hartland wrote: - Original Message - From: Justin T. Gibbs I'm sure lots of folks have some solution to this. Here is an old version of what we use at Spectra: http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment since it explains the motivation behind this particular solution. This is on my list of things to upstream in the next week or so after I add logic to the userspace tools to report whether or not the TLVs in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of logical, physical, and the configured allocation size. All of the other patches I've seen just treat physical as logical.
Reading through your patch it seems that your logical_ashift equates to the current ashift values, which for geom devices is based off sectorsize, and your physical_ashift is based on stripesize. This is almost identical to the approach I used, adding a desired ashift, which equates to your physical_ashift, alongside the standard ashift i.e. required aka logical_ashift value :)
Yes, the approaches are similar. Our current version records the logical access size in the vdev structure too, which might relate to the issue below.
One issue I did spot in your patch is that you currently expose zfs_max_auto_ashift as a sysctl but don't clamp its value, which would cause problems should a user configure values > 13.
I would expect the zio pipeline to simply insert an ashift aligned thunking buffer for these operations, but I haven't tried going past an ashift of 13 in my tests. If it is an issue, it seems the restriction should be based on logical access size, not optimal access size.
Yes, with your methodology you'll only see the issue if zfs_max_auto_ashift and physical_ashift are both > 13, but this can be the case for example on a RAID controller with a large stripesize. Looking back at my old patch, it too suffers from the same issue, along with the current code base, but that would only happen if the logical sector size resulted in an ashift > 13, which is going to be much less common ;-) Regards Steve This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 1:42 PM, Steven Hartland kill...@multiplay.co.uk wrote: - Original Message - From: Justin T. Gibbs On Jul 10, 2013, at 1:06 PM, Steven Hartland wrote: - Original Message - From: Justin T. Gibbs I'm sure lots of folks have some solution to this. Here is an old version of what we use at Spectra: http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment since it explains the motivation behind this particular solution. This is on my list of things to upstream in the next week or so after I add logic to the userspace tools to report whether or not the TLVs in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of logical, physical, and the configured allocation size. All of the other patches I've seen just treat physical as logical. Reading through your patch it seems that your logical_ashift equates to the current ashift values which for geom devices is based off sectorsize and your physical_ashift is based stripesize. This is almost identical to the approach I used adding a desired ashift, which equates to your physical_ashift, along side the standard ashift i.e. required aka logical_ashift value :) Yes, the approaches are similar. Our current version records the logical access size in the vdev structure too, which might relate to the issue below. One issue I did spot in your patch is that you currently expose zfs_max_auto_ashift as a sysctl but don't clamp its value which would cause problems should a user configure values 13. I would expect the zio pipeline to simply insert an ashift aligned thunking buffer for these operations, but I haven't tried going past an ashift of 13 in my tests. If it is an issue, it seems the restriction should be based on logical access size, not optimal access size. Yes with your methodology you'll only see the issue if zfs_max_auto_ashift and physical_ashift are both 13, but this can be the case for example on a RAID controller with large stripsize. I'm not sure I follow. logical_ashift is available in our latest code, as is the physical_ashift. But even without the logical_ashift, why doesn't the zio pipeline properly thunk zio_phys_read() access based on the configured ashift? -- Justin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Make ZFS use the physical sector size when computing initial ashift
- Original Message - From: Justin T. Gibbs ... One issue I did spot in your patch is that you currently expose zfs_max_auto_ashift as a sysctl but don't clamp its value, which would cause problems should a user configure values > 13.
I would expect the zio pipeline to simply insert an ashift aligned thunking buffer for these operations, but I haven't tried going past an ashift of 13 in my tests. If it is an issue, it seems the restriction should be based on logical access size, not optimal access size.
Yes, with your methodology you'll only see the issue if zfs_max_auto_ashift and physical_ashift are both > 13, but this can be the case for example on a RAID controller with a large stripesize.
I'm not sure I follow. logical_ashift is available in our latest code, as is the physical_ashift. But even without the logical_ashift, why doesn't the zio pipeline properly thunk zio_phys_read() access based on the configured ashift?
When I looked at it, which was a long time ago now so please excuse me if I'm a little rusty on the details, zio_phys_read() was working more by luck than judgement, as the offsets passed in were calculated from a valid start + increment based on the size of a structure within vdev_label_offset(), with no ashift logic applied that I could find. The result was that pools created with large ashifts were unstable when I tested. Regards Steve This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
here is my real world production example of users mail as well as documents.
/dev/mirror/home1.eli 2788 1545 1243 55% 1941057 20981181 8% /home
Not the same data, I imagine.
A mix. 90% mailboxes and user data (documents, pictures); the rest are some .tar.gz backups. At other places I have a similar situation: one or more gmirror sets, 1-3TB each depending on drives. For those who put 1000s of mailboxes there I recommend dovecot with the mdbox storage backend.
I was dealing with the actual byte counts ... that figure is going to be in whole blocks. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On 2013-01-23 21:22, Wojciech Puchar wrote: While RAID-Z is already a king of bad performance, I don't believe RAID-Z is any worse than RAID5. Do you have any actual measurements to back up your claim? it is clearly described even in ZFS papers. Both on reads and writes it gives single drive random I/O performance. With ZFS and RAID-Z the situation is a bit more complex. Let's assume a 5 disk raidz1 vdev with ashift=9 (512 byte sectors). A worst case scenario could happen if your random i/o workload was reading random files each of 2048 bytes. Each file read would require data from 4 disks (the 5th is parity and won't be read unless there are errors). However if files were 512 bytes or less then only one disk would be used. 1024 bytes - two disks, etc. So ZFS is probably not the best choice to store millions of small files if random access to whole files is the primary concern. But let's look at a different scenario - a PostgreSQL database. Here table data is split and stored in 1GB files. ZFS splits the file into 128KiB records (recordsize property). This record is then again split into 4 columns of 32768 bytes each. A 5th column is generated containing parity. Each column is then stored on a different disk. You could think of it as a regular RAID-5 with a stripe size of 32768 bytes. PostgreSQL uses 8192 byte pages that fit evenly both into the ZFS record size and the column size. Each page access requires only a single disk read. Random i/o performance here should be 5 times that of a single disk. For me the reliability ZFS offers is far more important than pure performance. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
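To make the arithmetic above concrete, for a 5-disk raidz1 with the default 128 KiB recordsize (a sketch only; adjust for your own vdev width and recordsize, and note that recordsize is a per-dataset property you can set with zfs set):

echo $((128 * 1024 / 4))   # data bytes per disk for one record -> 32768
echo $((32768 / 8192))     # 8 KiB PostgreSQL pages per 32 KiB column -> 4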
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
then stored on a different disk. You could think of it as a regular RAID-5 with stripe size of 32768 bytes. PostgreSQL uses 8192 byte pages that fit evenly both into ZFS record size and column size. Each page access requires only a single disk read. Random i/o performance here should be 5 times that of a single disk.
Think about writing 8192 byte pages randomly, and then doing a linear search over the table.
For me the reliability ZFS offers is far more important than pure performance.
Except it is on paper reliability. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
Wow! OK. It sounds like you (or someone like you) can answer some of my burning questions about ZFS. On Thu, Jan 24, 2013 at 8:12 AM, Adam Nowacki nowa...@platinum.linux.pl wrote: Let's assume a 5 disk raidz1 vdev with ashift=9 (512 byte sectors). A worst case scenario could happen if your random i/o workload was reading random files each of 2048 bytes. Each file read would require data from 4 disks (the 5th is parity and won't be read unless there are errors). However if files were 512 bytes or less then only one disk would be used. 1024 bytes - two disks, etc. So ZFS is probably not the best choice to store millions of small files if random access to whole files is the primary concern. But let's look at a different scenario - a PostgreSQL database. Here table data is split and stored in 1GB files. ZFS splits the file into 128KiB records (recordsize property). This record is then again split into 4 columns of 32768 bytes each. A 5th column is generated containing parity. Each column is then stored on a different disk. You could think of it as a regular RAID-5 with a stripe size of 32768 bytes.
Ok... so my question then would be... what of the small files? If I write several small files at once, does the transaction use a record, or does each file need to use a record? Additionally, if small files use sub-records, when you delete that file, does the sub-record get moved or just wasted (until the record is completely free)? I'm considering the difference, say, between cyrus imap (one file per message on ZFS, database files on a different ZFS filesystem) and dbmail imap (postgresql on ZFS). ... now I realize that PostgreSQL on ZFS has some special issues (but I don't have a choice here between ZFS and non-ZFS ... ZFS has already been chosen), but I'm also figuring that PostgreSQL on ZFS has some waste compared to cyrus IMAP on ZFS. So far in my research, Cyrus makes some compelling arguments that the common use case of most IMAP database files is full scan --- for which its database files are optimized and SQL-based files are not. I agree that some operations can be more efficient in a good SQL database, but full scan (as a most often used query) is not. Cyrus also makes sense to me as a collection of small files ... for which I expect ZFS to excel... including the ability to snapshot with impunity... but I am terribly curious how the files are handled in transactions. I'm actually (right now) running some filesize statistics (and I'll get back to the list, if asked), but I'd like to know how ZFS is going to store the arriving mail... :). ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
several small files at once, does the transaction use a record, or does each file need to use a record? Additionally, if small files use sub-records, when you delete that file, does the sub-record get moved or just wasted (until the record is completely free)? writes of small files are always good with ZFS. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On 2013-01-24 15:24, Wojciech Puchar wrote: For me the reliability ZFS offers is far more important than pure performance. Except it is on paper reliability. This on paper reliability in practice saved a 20TB pool. See one of my previous emails. Any other filesystem or hardware/software raid without per-disk checksums would have failed. Silent corruption of non-important files would be the best case, complete filesystem death by important metadata corruption as the worst case. I've been using ZFS for 3 years in many systems. Biggest one has 44 disks and 4 ZFS pools - this one survived SAS expander disconnects, a few kernel panics and countless power failures (UPS only holds for a few hours). So far I've not lost a single ZFS pool or any data stored. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On 2013-01-24 15:45, Zaphod Beeblebrox wrote: Ok... so my question then would be... what of the small files. If I write several small files at once, does the transaction use a record, or does each file need to use a record? Additionally, if small files use sub-records, when you delete that file, does the sub-record get moved or just wasted (until the record is completely free)? Each file is a fully self-contained object (together with full parity) all the way to the physical storage. A 1 byte file on RAID-Z2 pool will always use 3 disks, 3 sectors total for data alone. You can use du to verify - it reports physical size together with parity. Metadata like directory entry or file attributes is stored separately and shared with other files. For small files there may be a lot of wasted space. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
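A quick way to see that per-file parity overhead for yourself, as suggested above (a sketch; the dataset path is a placeholder, and the allocated size only shows up once the transaction group has been written out):

echo -n x > /tank/test/tiny   # 1 byte of data on a raidz2-backed dataset
ls -l /tank/test/tiny         # logical size: 1 byte
du /tank/test/tiny            # allocated size includes parity and padding, not 1 byte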
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
Ok... here's the existing data: There are 3,236,316 files summing to 97,500,008,691 bytes. That puts the average file at 30,127 bytes. But for the full breakdown:
512 : 7758
1024 : 139046
2048 : 1468904
4096 : 325375
8192 : 492399
16384 : 324728
32768 : 263210
65536 : 102407
131072 : 43046
262144 : 22259
524288 : 17136
1048576 : 13788
2097152 : 8279
4194304 : 4501
8388608 : 2317
16777216 : 1045
33554432 : 119
67108864 : 2
I produced that list with the output of ls -R's byte counts, sorted and then processed with:
(size=512; count=0; while read num; do count=$((count+1)); if [ $num -gt $size ]; then echo $size : $count; size=$((size*2)); count=0; fi; done; echo $size : $count) < imapfilesizelist
... now the new machine has two 2T disks in a ZFS mirror --- so I suppose it won't waste as much space as a RAID-Z ZFS --- in that files less than 512 bytes will take 512 bytes? By far the most common case is 2048 bytes ... so that would indicate that a RAID-Z larger than 5 disks would waste much space. Does that go to your recommendations on vdev size, then? To have an 8 or 9 disk vdev, you should be storing 4k files at the smallest? ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
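For what it's worth, the same power-of-two histogram can be produced without pre-sorting the byte counts, e.g. with find/stat/awk; the mail spool path is a placeholder:

find /var/spool/imap -type f -exec stat -f %z {} + |
awk '{ b = 512; while ($1 > b) b *= 2; bucket[b]++ }
     END { for (s in bucket) print s, ":", bucket[s] }' |
sort -n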
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
So far I've not lost a single ZFS pool or any data stored. so far my house wasn't robbed. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
There are 3,236,316 files summing to 97,500,008,691 bytes. That puts the average file at 30,127 bytes. But for the full breakdown: quite low. what do you store. here is my real world production example of users mail as well as documents.
/dev/mirror/home1.eli 2788 1545 1243 55% 1941057 20981181 8% /home
___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Thu, Jan 24, 2013 at 2:26 PM, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: There are 3,236,316 files summing to 97,500,008,691 bytes. That puts the average file at 30,127 bytes. But for the full breakdown: quite low. what do you store.
Apparently you're not really following this thread... just trolling? I had said that it was cyrus IMAP data (which, for reference, is one file per email message).
here is my real world production example of users mail as well as documents.
/dev/mirror/home1.eli 2788 1545 1243 55% 1941057 20981181 8% /home
Not the same data, I imagine. I was dealing with the actual byte counts ... that figure is going to be in whole blocks. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Jan 24, 2013, at 4:24 PM, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: Except it is on paper reliability. This on paper reliability saved my ass numerous times. For example I had one home NAS server machine with a flaky SATA controller that would not detect one of the four drives from time to time on reboot. This made my pool degraded several times, and even rebooting from a state where, say, disk4 had failed into a state where disk3 was failed did not corrupt any data. I don't think this is possible with any other open source FS, let alone hardware RAID, which would drop the whole array because of this. I have never ever personally lost any data on ZFS. Yes, the performance is another topic, and you must know what you are doing and what your usage pattern is, but from a reliability standpoint, to me ZFS looks more durable than anything else. P.S.: My home NAS is running freebsd-CURRENT with ZFS from the first version available. Several drives died, and twice the pool was expanded by replacing all drives one by one and resilvering; not a single byte lost. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
While RAID-Z is already a king of bad performance, I don't believe RAID-Z is any worse than RAID5. Do you have any actual measurements to back up your claim? it is clearly described even in ZFS papers. Both on reads and writes it gives single drive random I/O performance. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
This is because RAID-Z spreads each block out over all disks, whereas RAID5 (as it is typically configured) puts each block on only one disk. So to read a block from RAID-Z, all data disks must be involved, vs. for RAID5 only one disk needs to have its head moved. For other workloads (especially streaming reads/writes), there is no fundamental difference, though of course implementation quality may vary. streaming workload generally is always good. random I/O is what is important. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On 23 Jan 2013 20:23, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: While RAID-Z is already a king of bad performance, I don't believe RAID-Z is any worse than RAID5. Do you have any actual measurements to back up your claim? it is clearly described even in ZFS papers. Both on reads and writes it gives single drive random I/O performance. So we have to take your word for it? Provide a link if you're going to make assertions, or they're no more than your own opinion. Chris ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote: So we have to take your word for it? Provide a link if you're going to make assertions, or they're no more than your own opinion. I've heard this same thing -- every vdev == 1 drive in performance. I've never seen any proof/papers on it though. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Wed, Jan 23, 2013 at 12:22 PM, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: While RAID-Z is already a king of bad performance, I don't believe RAID-Z is any worse than RAID5. Do you have any actual measurements to back up your claim? it is clearly described even in ZFS papers. Both on reads and writes it gives single drive random I/O performance. For reads - true. For writes it probably behaves better than RAID5, as it does not have to go through read-modify-write for partial block updates. Search for RAID-5 write hole. If you need higher performance, build your pool out of multiple RAID-Z vdevs. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
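For the record, "multiple RAID-Z vdevs" just means listing more than one raidz group in the same pool, which ZFS then stripes across; device and pool names below are placeholders:

zpool create tank raidz1 da0 da1 da2 da3 da4 raidz1 da5 da6 da7 da8 da9
zpool status tank   # shows two raidz1 vdevs; random IOPS scale with the vdev count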
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Wed, Jan 23, 2013 at 1:09 PM, Mark Felder f...@feld.me wrote: On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote: So we have to take your word for it? Provide a link if you're going to make assertions, or they're no more than your own opinion. I've heard this same thing -- every vdev == 1 drive in performance. I've never seen any proof/papers on it though. 1 drive in performance only applies to the number of random i/o operations a vdev can perform. You still get increased throughput. I.e. a 5-drive RAIDZ will have 4x the bandwidth of the individual disks in the vdev, but would deliver only as many IOPS as the slowest drive, as a record would have to be read back from N-1 or N-2 drives in the vdev. It's the same for RAID5. IMHO for identical record/block size RAID5 has no advantage over RAID-Z for reads and does have a disadvantage when it comes to small writes. Never mind the lack of data integrity checks and other bells and whistles ZFS provides. --Artem ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
I've heard this same thing -- every vdev == 1 drive in performance. I've never seen any proof/papers on it though. read original ZFS papers. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
gives single drive random I/O performance.
For reads - true. For writes it probably behaves better than RAID5
yes, because as with reads it gives single drive performance. Small writes on RAID5 give lower than single disk performance.
If you need higher performance, build your pool out of multiple RAID-Z vdevs.
even if you need normal performance, use gmirror and UFS ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On 23 January 2013 21:24, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: I've heard this same thing -- every vdev == 1 drive in performance. I've never seen any proof/papers on it though. read original ZFS papers. No, you are making the assertion, provide a link. Chris ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
1 drive in performance only applies to number of random i/o operations vdev can perform. You still get increased throughput. I.e. 5-drive RAIDZ will have 4x bandwidth of individual disks in vdev, but unless your work is serving movies it doesn't matter. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote: So we have to take your word for it? Provide a link if you're going to make assertions, or they're no more than your own opinion. I've heard this same thing -- every vdev == 1 drive in performance. I've never seen any proof/papers on it though. The first Google answer for the search "raids performance": https://blogs.oracle.com/roch/entry/when_to_and_not_to Effectively, as a first approximation, an N-disk RAID-Z group will behave as a single device in terms of delivered random input IOPS. Thus a 10-disk group of devices each capable of 200-IOPS, will globally act as a 200-IOPS capable RAID-Z group. This is the price to pay to achieve proper data protection without the 2X block overhead associated with mirroring. -- Michel Talon ta...@lpthe.jussieu.fr
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Wed, Jan 23, 2013 at 1:25 PM, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: gives single drive random I/O performance. For reads - true. For writes it's probably behaves better than RAID5 yes, because as with reads it gives single drive performance. small writes on RAID5 gives lower than single disk performance. If you need higher performance, build your pool out of multiple RAID-Z vdevs. even you need normal performance use gmirror and UFS I've no objection. If it works for you -- go for it. For me personally ZFS performance is good enough, and data integrity verification is something that I'm willing to sacrifice some performance for. ZFS scrub gives me either warm and fuzzy feeling that everything is OK, or explicitly tells me that something bad happened *and* reconstructs the data if it's possible. Just my $0.02, --Artem ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Jan 23, 2013, at 11:09 PM, Mark Felder f...@feld.me wrote: On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote: So we have to take your word for it? Provide a link if you're going to make assertions, or they're no more than your own opinion. I've heard this same thing -- every vdev == 1 drive in performance. I've never seen any proof/papers on it though. ___ freebsd...@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to freebsd-fs-unsubscr...@freebsd.org Here is a blog post that describes why this is true for IOPS: http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On 23 Jan 2013 21:45, Michel Talon ta...@lpthe.jussieu.fr wrote: On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote: So we have to take your word for it? Provide a link if you're going to make assertions, or they're no more than your own opinion. I've heard this same thing -- every vdev == 1 drive in performance. I've never seen any proof/papers on it though. first google answer from request raids performance https://blogs.oracle.com/roch/entry/when_to_and_not_to Effectively, as a first approximation, an N-disk RAID-Z group will behave as a single device in terms of deliveredrandom input IOPS. Thus a 10-disk group of devices each capable of 200-IOPS, will globally act as a 200-IOPS capable RAID-Z group. This is the price to pay to achieve proper data protection without the 2X block overhead associated with mirroring. Thanks for the link, but I could have done that; I am attempting to explain to Wojciech that his habit of making bold assertions and arrogantly refusing to back them up makes for frustrating reading. Chris ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
associated with mirroring. Thanks for the link, but I could have done that; I am attempting to explain to Wojciech that his habit of making bold assertions and as you can see it is not a bold assertion; you just use something without even reading its docs, not to mention doing any more research. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
even if you need normal performance, use gmirror and UFS I've no objection. If it works for you -- go for it. Both work. For today's trend of solving everything by more hardware, ZFS may even have enough performance. But it is still dangerous for the reasons I explained, as well as promoting bad setups and layouts like making a single filesystem out of a large number of disks. This is bad no matter what filesystem and RAID setup you use, or even what OS. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On 01/23/13 14:27, Wojciech Puchar wrote: both works. For todays trend of solving everything by more hardware ZFS may even have enough performance. But still it is dangerous for a reasons i explained, as well as it promotes bad setups and layouts like making single filesystem out of large amount of disks. This is bad for no matter what filesystem and RAID setup you use, or even what OS. ZFS mirror performance is quite good (both random IO and sequential), and resilvers/scrubs are measured in an hour or less. You can always make pool out of these instead of RAIDZ if you can get away with less total available space. I think RAIDZ vs Gmirror is a bad comparison, you can use a ZFS mirror with all the ZFS features, plus N-way (not sure if gmirror does this). Regarding single large filesystems, there is an old saying about not putting all your eggs into one basket, even if it's a great basket :) Matt ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
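For the N-way mirror mentioned above, ZFS takes it natively; the device and pool names below are placeholders:

zpool create tank mirror da0 da1 da2   # three-way mirror
zpool attach tank da0 da3              # later grow it to a four-way mirror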
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On Mon, Jan 21, 2013 at 11:36 PM, Peter Jeremy pe...@rulingia.com wrote: On 2013-Jan-21 12:12:45 +0100, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: While RAID-Z is already a king of bad performance, I don't believe RAID-Z is any worse than RAID5. Do you have any actual measurements to back up your claim? Leaving aside anecdotal evidence (or actual measurements), RAID-Z is fundamentally slower than RAID4/5 *for random reads*. This is because RAID-Z spreads each block out over all disks, whereas RAID5 (as it is typically configured) puts each block on only one disk. So to read a block from RAID-Z, all data disks must be involved, vs. for RAID5 only one disk needs to have its head moved. For other workloads (especially streaming reads/writes), there is no fundamental difference, though of course implementation quality may vary. Even better - use UFS. To each their own. As a ZFS developer, it should come as no surprise that in my opinion and experience, the benefits of ZFS almost always outweigh this downside. --matt ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
Please don't misinterpret this post: ZFS's ability to recover from fairly catastrophic failures is pretty stellar, but I'm wondering if there can be
From my testing it is exactly the opposite. You have to see the difference between marketing and reality.
a little room for improvement. I use RAID pretty much everywhere. I don't like to loose data and disks are cheap. I have a fair amount of experience with all flavors ... and ZFS
Just like me. And because I want performance and - as you described - disks are cheap - I use RAID-1 (gmirror).
has become a go-to filesystem for most of my applications.
My applications don't tolerate low performance, overcomplexity or a high risk of data loss. That's why I use properly tuned UFS and gmirror, and prefer not to use gstripe but to have multiple filesystems.
One of the best recommendations I can give for ZFS is it's crash-recoverability.
Which is marketing, not truth. If you want bullet-proof recoverability, UFS beats everything I've ever seen. If you want FAST crash recovery, use softupdates+journal, available in FreeBSD 9.
As a counter example, if you have most hardware RAID going or a software whole-disk raid, after a crash it will generally declare one disk as good and the other disk as to be repaired ... after which a full surface scan of the affected disks --- reading one and writing the other --- ensues.
True, gmirror does it, but you can defer the mirror rebuild, which I use. I have a script that sends me a mail when gmirror is degraded, and I - after finding out the cause of the problem, and possibly replacing the disk - run the rebuild after work hours, so no slowdown is experienced.
ZFS is smart on this point: it will recover on reboot with a minimum amount of fuss. Even if you dislodge a drive ... so that it's missing the last 'n' transactions, ZFS seems to figure this out (which I thought was extra cudos).
Yes, this is marketing. Practice is somewhat different, as you discovered yourself.
MY PROBLEM comes from problems that scrub can fix. Let's talk, in specific, about my home array. It has 9x 1.5T and 8x 2T in a RAID-Z configuration (2 sets, obviously). While RAID-Z is already a king of bad performance,
I assume you mean two POOLS, not 2 RAID-Z sets. If you mixed 2 different RAID-Z pools you would spread the load unevenly and make performance even worse.
A full scrub of my drives weighs in at 36 hours or so.
Which is funny, as ZFS is marketed as doing this efficiently (like checking only used space). dd if=/dev/disk of=/dev/null bs=2m would take no more than a few hours, and you may do all disks in parallel.
vr2/cvs:0x1c1 Now ... this is just an example: after each scrub, the hex number was
Seems like scrub simply doesn't do its work right.
before the old error was cleared. Then this new error gets similarly cleared by the next scrub. It seems that if the scrub returned to this new found error after fixing the known errors, this could save whole new scrub runs from being required.
Even better - use UFS, for both bullet-proof recoverability and performance. If you need help with tuning you may ask me privately. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
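The kind of check-and-defer setup described above can be very small; a minimal sketch, assuming a mirror named home1 and a root mail alias (names, provider and schedule are placeholders):

#!/bin/sh
# run periodically from cron; mail a report if any gmirror volume is degraded
if gmirror status | grep -q DEGRADED; then
    gmirror status | mail -s "gmirror degraded on $(hostname)" root
fi
# deferring the rebuild: mark the mirror noautosync, then kick it off manually
#   gmirror configure -n home1
#   gmirror rebuild home1 ada1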
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On 2013-Jan-21 12:12:45 +0100, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: That's why i use properly tuned UFS, gmirror, and prefer not to use gstripe but have multiple filesystems When I started using ZFS, I didn't fully trust it so I had a gmirrored UFS root (including a full src tree). Over time, I found that gmirror plus UFS was giving me more problems than ZFS. In particular, I was seeing behaviour that suggested that the mirrors were out of sync, even though gmirror insisted they were in sync. Unfortunately, there is no way to get gmirror to verify the mirroring or to get UFS to check correctness of data or metadata (fsck can only check metadata consistency). I've since moved to a ZFS root. Which is marketing, not truth. If you want bullet-proof recoverability, UFS beats everything i've ever seen. I've seen the opposite. One big difference is that ZFS is designed to ensure it returns the data that was written to it whereas UFS just returns the bytes it finds where it thinks it wrote your data. One side effect of this is that ZFS is far fussier about hardware quality - since it checksums everything, it is likely to pick up glitches that UFS doesn't notice. If you want FAST crash recovery, use softupdates+journal, available in FreeBSD 9. I'll admit that I haven't used SU+J but one downside of SU+J is that it prevents the use of snapshots, which in turn prevents the (safe) use of dump(8) (which is the official tool for UFS backups) on live filesystems. of fuss. Even if you dislodge a drive ... so that it's missing the last 'n' transactions, ZFS seems to figure this out (which I thought was extra cudos). Yes this is marketing. practice is somehow different. as you discovered yourself. Most of the time this works as designed. It's possible there are bugs in the implementation. While RAID-Z is already a king of bad performance, I don't believe RAID-Z is any worse than RAID5. Do you have any actual measurements to back up your claim? i assume you mean two POOLS, not 2 RAID-Z sets. if you mixed 2 different RAID-Z pools you would spread load unevenly and make performance even worse. There's no real reason why you could't have 2 different vdevs in the same pool. A full scrub of my drives weighs in at 36 hours or so. which is funny as ZFS is marketed as doing this efficient (like checking only used space). It _does_ only check used space but it does so in logical order rather than physical order. For a fragmented pool, this means random accesses. Even better - use UFS. Then you'll never know that your data has been corrupted. For both bullet proof recoverability and performance. use ZFS. -- Peter Jeremy pgpo1y4DGw4Rb.pgp Description: PGP signature
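For completeness, the snapshot-based dump of a live UFS filesystem referred to above is just the -L flag; the dump level, flags and output path here are only illustrative:

dump -0auL -f /backup/root.dump /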
ZFS regimen: scrub, scrub, scrub and scrub again.
Please don't misinterpret this post: ZFS's ability to recover from fairly catastrophic failures is pretty stellar, but I'm wondering if there can be a little room for improvement. I use RAID pretty much everywhere. I don't like to loose data and disks are cheap. I have a fair amount of experience with all flavors ... and ZFS has become a go-to filesystem for most of my applications. One of the best recommendations I can give for ZFS is it's crash-recoverability. As a counter example, if you have most hardware RAID going or a software whole-disk raid, after a crash it will generally declare one disk as good and the other disk as to be repaired ... after which a full surface scan of the affected disks --- reading one and writing the other --- ensues. On my Windows desktop, the pair of 2T's take 3 or 4 hours to do this. A pair of green 2T's can take over 6. You don't loose any data, but you have severely reduced performance until it's repaired. The rub is that you know only one or two blocks could possibly even be different ... and that this is a highly unoptimized way of going about the problem. ZFS is smart on this point: it will recover on reboot with a minimum amount of fuss. Even if you dislodge a drive ... so that it's missing the last 'n' transactions, ZFS seems to figure this out (which I thought was extra cudos). MY PROBLEM comes from problems that scrub can fix. Let's talk, in specific, about my home array. It has 9x 1.5T and 8x 2T in a RAID-Z configuration (2 sets, obviously). The drives themselves are housed (4 each) in external drive bays with a single SATA connection for each. I think I have spoken of this here before. A full scrub of my drives weighs in at 36 hours or so. Now around Christmas, while moving some things, I managed to pull the plug on one cabinet of 4 drives. It was likely that the only active use of the filesystem was an automated cvs checkin (backup) given that the errors only appeared on the cvs directory. IN-THE-END, no data was lost, but I had to scrub 4 times to remove the complaints, which showed like this from zpool status -v errors: Permanent errors have been detected in the following files: vr2/cvs:0x1c1 Now ... this is just an example: after each scrub, the hex number was different. I also couldn't actually find the error on the cvs filesystem, as a side note. Not many files are stored there, and they all seemed to be present. MY TAKEAWAY from this is that 2 major improvements could be made to ZFS: 1) a pause for scrub... such that long scrubs could be paused during working hours. 2) going back over errors... during each scrub, the new error was found before the old error was cleared. Then this new error gets similarly cleared by the next scrub. It seems that if the scrub returned to this new found error after fixing the known errors, this could save whole new scrub runs from being required. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
Hi, On 01/20/13 23:26, Zaphod Beeblebrox wrote: 1) a pause for scrub... such that long scrubs could be paused during working hours. While not exactly pause, but isn't playing with scrub_delay works here? vfs.zfs.scrub_delay: Number of ticks to delay scrub Set this to a high value during working hours, and set back to its normal (or even below) value off working hours. (maybe resilver delay, or some other values should also be set, I haven't yet read the relevant code) ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
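One way to approximate that with the existing knob is a pair of cron jobs that raise vfs.zfs.scrub_delay during the day and drop it back at night; the times and values below are only illustrative (the stock default is typically 4 ticks):

# /etc/crontab format
0  8  * * *  root  sysctl vfs.zfs.scrub_delay=20
0 20  * * *  root  sysctl vfs.zfs.scrub_delay=4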
Re: iSCSI vs. SMB with ZFS.
On Mon, Dec 17, 2012 at 05:22:50PM -0500, Rick Macklem wrote: Zaphod Beeblebrox wrote: Does windows 7 support nfs v4, then? Is it expected (ie: is it worthwhile trying) that nfsv4 would perform at a similar speed to iSCSI? It would seem that this at least requires active directory (or this user name mapping ... which I remember being hard). As far as I know, there is no NFSv4 in Windows. I only made the comment (which I admit was a bit off topic), because the previous post had stated SMB or NFS, they're the same or something like that.) There was work on an NFSv4 client for Windows being done by CITI at the Univ. of Michigan funded by Microsoft research, but I have no idea if it was ever released. There appears to be an implementation of NFSV4 {client,server} for Windows available from OpenText (via their acquisition of Hummingbird). This would not be a free product. I have no experience with their NFSV4 stuff, so have no comments on the speed... -Fred Whiteside ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: iSCSI vs. SMB with ZFS.
On 12/12/2012 17:57, Zaphod Beeblebrox wrote: The performance of the iSCSI disk is about the same as the local disk for some operations --- faster for some, slower for others. The workstation has 12G of memory and it's my perception that iSCSI is heavily cached and that this enhances it's performance. The second launch of a game ... or the second switch into an area (ie: loading a specific piece of geometry again) is very fast. The performance on the SMB share is abysmal compared to the performance on the iSCSI share. At the very least, there seems to be little benifit to launching the same application twice --- which is most likely windows fault. Think about what you have there: With iSCSI you have a block device, which is seen on your workstation as a disk drive, on which it creates a local file system (NTFS), and does *everything* like it is using a local disk drive. This includes caching, access permission calculations, file locking, etc. With a network file system (either SMB or NFS, it doesn't matter), you need to ask the server for *each* of the following situations: * to ask the server if a file has been changed so the client can use cached data (if the protocol supports it) * to ask the server if a file (or a portion of a file) has been locked by another client This basically means that for almost every single IO, you need to ask the server for something, which involves network traffic and round-trip delays. (there are smarter network protocols, and even extensions to SMB and NFS, but they are not widely used) signature.asc Description: OpenPGP digital signature
Re: iSCSI vs. SMB with ZFS.
With a network file system (either SMB or NFS, it doesn't matter), you need to ask the server for *each* of the following situations: * to ask the server if a file has been changed so the client can use cached data (if the protocol supports it) * to ask the server if a file (or a portion of a file) has been locked by another client
Not really. If there is only one user of a file then Windows knows this, but it changes to the behaviour you described when there are more users. AND FINALLY the latter behaviour has failed to work properly since Windows XP (it worked fine with Windows 98). If you use programs that read/write the same shared files you may be sure data corruption will happen. You have to set
locking = yes
oplocks = no
level2 oplocks = no
to make it work properly, but even more slowly!
This basically means that for almost every single IO, you need to ask the server for something, which involves network traffic and round-trip delays.
Not that. The problem is that Windows does not use all free memory for caching as it does with a local or local (iSCSI) disk. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: iSCSI vs. SMB with ZFS.
Wojciech Puchar wrote: With a network file system (either SMB or NFS, it doesn't matter), you need to ask the server for *each* of the following situations: * to ask the server if a file has been changed so the client can use cached data (if the protocol supports it) * to ask the server if a file (or a portion of a file) has been locked by another client not really if there is only one user of file - then windows know this, but change to behaviour you described when there are more users. AND FINALLY the latter behaviour fails to work properly since windows XP (worked fine with windows 98). If you use programs that read/write share same files you may be sure data corruption would happen. you have to set locking = yes oplocks = no level2 oplocks = no to make it work properly but even more slow!. Btw, NFSv4 has delegations, which are essentially level2 oplocks. They can be enabled for a server if the volumes exported via NFSv4 are not being accessed locally (including Samba). For them to work, the nfscbd needs to be running on the client(s) and the clients must have IP addresses visible to the server for a callback TCP connection (no firewalls or NAT gateways). Even with delegations working, the client caching is limited to the buffer cache. I have an experimental patch that uses on-disk caching in the client for delegated files (I call it packrats), but it is not ready for production use. Now that I have the 4.1 client in place, I plan to get back to working on it. rick This basically means that for almost every single IO, you need to ask the server for something, which involves network traffic and round-trip delays. Not that. The problem is that windows do not use all free memory for caching as with local or local (iSCSI) disk. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: iSCSI vs. SMB with ZFS.
Does windows 7 support nfs v4, then? Is it expected (ie: is it worthwhile trying) that nfsv4 would perform at a similar speed to iSCSI? It would seem that this at least requires active directory (or this user name mapping ... which I remember being hard). ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: iSCSI vs. SMB with ZFS.
Zaphod Beeblebrox wrote: Does windows 7 support nfs v4, then? Is it expected (ie: is it worthwhile trying) that nfsv4 would perform at a similar speed to iSCSI? It would seem that this at least requires active directory (or this user name mapping ... which I remember being hard). As far as I know, there is no NFSv4 in Windows. I only made the comment (which I admit was a bit off topic), because the previous post had stated SMB or NFS, they're the same or something like that.) There was work on an NFSv4 client for Windows being done by CITI at the Univ. of Michigan funded by Microsoft research, but I have no idea if it was ever released. rick ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: iSCSI vs. SMB with ZFS.
On Mon, Dec 17, 2012 at 2:22 PM, Rick Macklem rmack...@uoguelph.ca wrote: Zaphod Beeblebrox wrote: Does Windows 7 support NFS v4, then? Is it expected (i.e., is it worthwhile trying) that NFSv4 would perform at a similar speed to iSCSI? It would seem that this at least requires Active Directory (or this user name mapping ... which I remember being hard). As far as I know, there is no NFSv4 in Windows. I only made the comment (which I admit was a bit off topic) because the previous post had stated "SMB or NFS, they're the same" (or something like that). There was work on an NFSv4 client for Windows being done by CITI at the Univ. of Michigan, funded by Microsoft Research, but I have no idea whether it was ever released. rick

http://www.citi.umich.edu/projects/nfsv4/ Projects: NFS Version 4 Open Source Reference Implementation We are developing an implementation of NFSv4 and NFSv4.1 for Linux http://www.citi.umich.edu/projects/nfsv4/windows/ http://www.citi.umich.edu/projects/nfsv4/windows/readme.html http://www.citi.umich.edu/projects/

Thank you very much. Mehmet Erol Sanliturk ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: iSCSI vs. SMB with ZFS.
You cannot compare file serving and block device serving. On Mon, 17 Dec 2012, Zaphod Beeblebrox wrote: Does Windows 7 support NFS v4, then? Is it expected (i.e., is it worthwhile trying) that NFSv4 would perform at a similar speed to iSCSI? It would seem that this at least requires Active Directory (or this user name mapping ... which I remember being hard). ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
iSCSI vs. SMB with ZFS.
So... I have two machines. My fileserver is a core-2-duo machine with FreeBSD-9.1-ish ZFS, istgt and Samba 3.6. My workstation is Windows 7 on an i7. Both have GigE and are connected directly via a managed switch with jumbo packets (specifically 9016) enabled. Both are using tagged VLAN packets to the switch (if that matters at all). Some time ago, I created a 2T iSCSI disk on ZFS to serve the Steam directory (games) on my C drive as it was growing rather large. I've been quite happy with this. The performance of the iSCSI disk is about the same as the local disk for some operations --- faster for some, slower for others. The workstation has 12G of memory and it's my perception that iSCSI is heavily cached and that this enhances its performance. The second launch of a game ... or the second switch into an area (i.e. loading a specific piece of geometry again) is very fast. But this is imperfect. The iSCSI disk reserves all of its space and the files on the disk are only accessible to the computer that mounts it. The most recent Steam update supported an easy way to put Steam folders on other disks and partitions. I created another Steam folder on an SMB share from the same server and proceeded to move one of my games there. The performance on the SMB share is abysmal compared to the performance on the iSCSI share. At the very least, there seems to be little benefit to launching the same application twice --- which is most likely Windows' fault. I haven't done any major amount of tuning on the SMB share lately, but the last time I cared, it was set up reasonably... with TCP_NODELAY and whatnot. I also notice that my copy of smbd runs with 1 thread (according to top) rather than the 11 threads that istgt uses. Does this breakdown of performance square with others' experiences? Will SMB always have significantly less performance than iSCSI coming from ZFS? ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
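For context, the kind of setup being compared here --- a ZFS zvol exported over iSCSI next to a Samba share on the same pool --- can be sketched roughly as follows. The pool name, volume size and istgt.conf fragment are illustrative guesses, not taken from the poster's actual configuration, and the exact istgt.conf layout varies by istgt version.

# a sparse 2T zvol to back the iSCSI target:
zfs create -s -V 2T -o volblocksize=8k tank/steam

# fragment of /usr/local/etc/istgt/istgt.conf (illustrative):
[LogicalUnit1]
  TargetName iqn.2012-12.org.example:steam
  Mapping PortalGroup1 InitiatorGroup1
  LUN0 Storage /dev/zvol/tank/steam Auto

# and a plain dataset exported through smb.conf for the SMB comparison:
zfs create tank/steam_smb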
Re: iSCSI vs. SMB with ZFS.
about the same as the local disk for some operations --- faster for some, slower for others. The workstation has 12G of memory and it's my perception that iSCSI is heavily cached and that this enhances its

Any REAL test means doing something that will not fit in cache.

But this is imperfect. The iSCSI disk reserves all of its space and the files on the disk are only accessible to the computer that mounts it.

It is even more imperfect than that: you lay one filesystem on top of another filesystem. Not only is performance degraded, you also don't get parallel access to the files on that disk from more than one machine.

The performance on the SMB share is abysmal compared to the performance on the iSCSI share. At the very least, there seems to be little benefit to launching the same application twice --- which is most likely Windows' fault.

That is the SMB protocol; sorry, it is stupid, and it doesn't make real use of the cache. This is how Windows file sharing works: fine if you just want to copy files, not fine if you work on them.

Will SMB always have significantly less performance than iSCSI coming

It depends on what you do, but yes, SMB is not efficient. I am happy with SMB where it only has to store users' or shared documents, and it is quite fast at large file copies etc., but terrible at randomly accessing files. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: iSCSI vs. SMB with ZFS.
As you describe it, your need for unshared data on a single workstation is on the order of a single large hard drive. Reducing the drive count on the file server by one and connecting that drive directly to the workstation is the best solution. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: iSCSI vs. SMB with ZFS.
On Wed, Dec 12, 2012 at 5:16 PM, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: about the same as the local disk for some operations --- faster for some, slower for others. The workstation has 12G of memory and it's my perception that iSCSI is heavily cached and that this enhances its

Any REAL test means doing something that will not fit in cache.

That's plainly not true at all, on its face. It depends on what you're testing. In this particular test, I'm looking at the performance of the components on a singular common task --- that of running a game. It's common to run a game more than once, and it's common to move from area to area in the game, loading, unloading and reloading the same data. My test is a valid comparison of the two modes of loading the game ... from iSCSI and from SMB. You could criticize me for several things --- I only tested two games, or I have unrealistically large and powerful hardware --- but really... consider what you are testing before you pontificate on test design. And even in the case where you want to look at the enterprise performance of a system, knowing both the cache performance and the disk performance is better than only knowing one or the other. Throughput is a combination of these features. Pure disk performance serves as a lower bound, but cache performance (especially on some of the ZFS systems people are creating these days ... with hundreds of gigs of RAM) is an equally valid statistic and optimization. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: iSCSI vs. SMB with ZFS.
common to move from area to area in the game, loading, unloading and reloading the same data. My test is a valid comparison of the two modes of loading the game ... from iSCSI and from SMB.

I don't know how Windows caches network shares (iSCSI is treated as local, not network). That is the main problem here, IMHO. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: iSCSI vs. SMB with ZFS.
-Original Message- From: Zaphod Beeblebrox Sent: Wednesday, December 12, 2012 6:57 PM To: FreeBSD Hackers Subject: iSCSI vs. SMB with ZFS. So... I have two machines. My fileserver is a core-2-duo machine with FreeBSD-9.1-ish ZFS, istgt and Samba 3.6. My workstation is Windows 7 on an i7. Both have GigE and are connected directly via a managed switch with jumbo packets (specifically 9016) enabled. Both are using tagged VLAN packets to the switch (if that matters at all).

My experience with Samba has been that it’s slow whatever one does to tweak it (probably just too Linux-centric code to start with, or whatever...). Just as another datapoint – have you tried NFS yet? Win7 has NFS available as an OS component, although it is not installed by default. -Reko ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
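For anyone wanting to try that comparison: the Client for NFS ships with the Ultimate and Enterprise editions of Windows 7 and is enabled under "Turn Windows features on or off" -> "Services for NFS". As far as I know it speaks NFSv3, not v4. A rough sketch of the two ends, with a made-up address and export path:

# /etc/exports on the FreeBSD server (illustrative; anonymous mapping shown):
/tank/share -mapall=nobody -network 192.168.1.0/24

# then, from a Windows 7 command prompt with Client for NFS installed:
mount -o anon \\192.168.1.10\tank\share Z:

The user name / UID mapping mentioned earlier in the thread is the part that gets painful once you stop using anonymous access.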
Zfs import issue
Hi, I am importing a ZFS snapshot to FreeBSD 9 from another host running FreeBSD 9. While the import is running, it locks the filesystem: df hangs and the filesystem is unusable. Once the import completes, the filesystem is back to normal and read/write works fine. The same does not happen on Solaris/OpenIndiana. # uname -an FreeBSD hostname 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 r...@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 ZFS ver: 28 Any inputs would be helpful. Is there any way to overcome this freeze? Regards, Ram ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
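For reference, the kind of transfer being described looks roughly like the following. Receiving with -u (leave the received filesystem unmounted) is sometimes suggested for this sort of hang, though whether it helps in this case is not established in the thread; the dataset and host names are made up.

# on the sending host:
zfs snapshot tank/data@migrate
zfs send tank/data@migrate | ssh freebsd9-host zfs receive -u -d backup

# -u : do not mount the filesystem after receiving it
# -d : recreate the sent dataset's path under the target pool/dataset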
Looking for testers / feedback for ZFS receive properties options
We encountered a problem receiving a full ZFS stream from a disk we had backed up. The receive was aborting because a quota was being exceeded, so I did some digging around and found that Oracle ZFS now has -x and -o options, as documented here: http://docs.oracle.com/cd/E23824_01/html/821-1462/zfs-1m.html Seems this has been raised as a feature request upstream: https://www.illumos.org/issues/2745 Anyway, being stuck with a backup we couldn't restore, I had a go at implementing these options and have a prototype up and running, which I'd like feedback on. This patch also adds a -l option which allows the streams received to be limited to those specified, another option which I think would be useful and seemed relatively painless to add. The initial version of the patch, which is based off 8.3-RELEASE, can be found here: http://blog.multiplay.co.uk/dropzone/freebsd/zfs-recv-properties.patch Any feedback appreciated. Regards Steve This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
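Going by the Oracle documentation the patch follows, usage would look roughly like this; the exact option handling in the FreeBSD patch may differ, and the pool and dataset names are illustrative only.

# restore a replication stream from a backup disk, dropping the quota that
# made the receive abort, and overriding compression on the received copy:
zfs send -R backupdisk/home@full | zfs receive -x quota -o compression=lz4 tank/home

# -x prop        : exclude the named property from the received dataset
# -o prop=value  : receive with the named property set to the given value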
Re: FreeBSD ZFS source
Oliver and Chris, thanks. On 3 Aug 2012, at 00:19, Oliver Pinter wrote: http://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/common/ http://svnweb.freebsd.org/base/head/cddl/contrib/opensolaris/lib/ On 8/2/12, Fredrik starkbe...@gmail.com wrote: Hello, Excuse me for this newb question, but exactly where are the current ZFS files located? I have been looking at the CVS on freebsd.org under /src/contrib/opensolaris/ but that does not seem to be the current ones. Is this correct? Regards ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
FreeBSD ZFS source
Hello, Excuse me for this newb question but exactly where are the current ZFS files located? I have been looking at the CVS on freebsd.org under /src/contrib/opensolaris/ but that does not seem to be the current ones. Is this correct? Regards___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: FreeBSD ZFS source
On Thu, Aug 02, 2012 at 22:48:50 +0200, Fredrik wrote: Hello, Excuse me for this newb question but exactly where are the current ZFS files located? I have been looking at the CVS on freebsd.org under /src/contrib/opensolaris/ but that does not seem to be the current ones. Is this correct?

$ find /usr/src -type d -iname zfs
/usr/src/cddl/contrib/opensolaris/cmd/zfs
/usr/src/cddl/sbin/zfs
/usr/src/lib/libprocstat/zfs
/usr/src/sys/boot/zfs
/usr/src/sys/cddl/boot/zfs
/usr/src/sys/cddl/contrib/opensolaris/common/zfs
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
/usr/src/sys/modules/zfs
/usr/src/tools/regression/zfs

Those are probably a good start. Some of them just contain a Makefile pointing you elsewhere in the tree, though. I might have missed something, and I'm sure someone will correct me if I have. -- Thanks and best regards, Chris Nehren
Re: FreeBSD ZFS source
http://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/common/ http://svnweb.freebsd.org/base/head/cddl/contrib/opensolaris/lib/ On 8/2/12, Fredrik starkbe...@gmail.com wrote: Hello, Excuse me for this newb question but exactly where are the current ZFS files located? I have been looking at the CVS on freebsd.org under /src/contrib/opensolaris/ but that does not seem to be the current ones. Is this correct? Regards___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Root on ZFS GPT and boot to ufs partition
The system was installed following http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot, with an additional freebsd-ufs partition (ada0p2).

uname -a
FreeBSD beastie.mydomain.local 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r229812: Mon Jan 9 19:08:10 MSK 2012 andrey@beastie.mydomain.local:/usr/obj/usr/src/sys/W_BOOK amd64

gpart show
=         34  625142381  ada0  GPT  (298G)
          34        128     1  freebsd-boot  (64k)
         162   26621952     2  freebsd-ufs   (12G)
    26622114    8388608     3  freebsd-swap  (4.0G)
    35010722  590131693     4  freebsd-zfs   (281G)

The boot code is the protective MBR (pmbr) plus the gptzfsboot loader. The old boot code offered an F1/F2/F3 menu; the new one does not :( Is there a way to boot the system from the freebsd-ufs partition (ada0p2)? ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Root on ZFS GPT and boot to ufs partition
Andrey Fesenko wrote: The system was installed following http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot, with an additional freebsd-ufs partition (ada0p2).

uname -a
FreeBSD beastie.mydomain.local 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r229812: Mon Jan 9 19:08:10 MSK 2012 andrey@beastie.mydomain.local:/usr/obj/usr/src/sys/W_BOOK amd64

gpart show
=         34  625142381  ada0  GPT  (298G)
          34        128     1  freebsd-boot  (64k)
         162   26621952     2  freebsd-ufs   (12G)
    26622114    8388608     3  freebsd-swap  (4.0G)
    35010722  590131693     4  freebsd-zfs   (281G)

The boot code is the protective MBR (pmbr) plus the gptzfsboot loader. The old boot code offered an F1/F2/F3 menu; the new one does not :( Is there a way to boot the system from the freebsd-ufs partition (ada0p2)?

`gpart set -a bootonce -i 2 ada0` should do. -- Sphinx of black quartz, judge my vow. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
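As a footnote on the attribute approach: bootonce is a one-shot flag while bootme is persistent, and both can be inspected and cleared with gpart. Whether the installed boot code actually honors these attributes depends on which stage-two loader is in the freebsd-boot partition (gptboot vs. gptzfsboot), which may explain the negative result in the follow-up below. A sketch of the mechanism, using the poster's partition numbers:

# boot ada0p2 on the next boot only:
gpart set -a bootonce -i 2 ada0

# or make ada0p2 the preferred boot partition until the flag is cleared:
gpart set -a bootme -i 2 ada0

# inspect the partition metadata (including any boot attributes) and clear the flag:
gpart list ada0
gpart unset -a bootme -i 2 ada0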
Re: Root on ZFS GPT and boot to ufs partition
On Mon, Jan 23, 2012 at 7:18 PM, Volodymyr Kostyrko c.kw...@gmail.com wrote: Andrey Fesenko wrote: The system was installed following http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot, with an additional freebsd-ufs partition (ada0p2).

uname -a
FreeBSD beastie.mydomain.local 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r229812: Mon Jan 9 19:08:10 MSK 2012 andrey@beastie.mydomain.local:/usr/obj/usr/src/sys/W_BOOK amd64

gpart show
=         34  625142381  ada0  GPT  (298G)
          34        128     1  freebsd-boot  (64k)
         162   26621952     2  freebsd-ufs   (12G)
    26622114    8388608     3  freebsd-swap  (4.0G)
    35010722  590131693     4  freebsd-zfs   (281G)

The boot code is the protective MBR (pmbr) plus the gptzfsboot loader. The old boot code offered an F1/F2/F3 menu; the new one does not :( Is there a way to boot the system from the freebsd-ufs partition (ada0p2)?

`gpart set -a bootonce -i 2 ada0` should do. -- Sphinx of black quartz, judge my vow.

# gpart set -a bootonce -i 2 ada0
bootonce set on ada0p2
# shutdown -r now

No, that does not work. After the reboot the system still comes up from freebsd-zfs (ada0p4). ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device
On 23/08/2011 03:23, Peter Jeremy wrote: On 2011-Aug-22 12:45:08 +0200, Ivan Voras ivo...@freebsd.org wrote: It would be suboptimal, but only for the slight waste of space that would have otherwise been reclaimed if the block or fragment size remained 512 or 2K. This waste of space is insignificant for the vast majority of users and there are no performance penalties, so it seems that switching to 4K sectors by default for all file systems would actually be a good idea.

This is heavily dependent on the size distribution. I can't quickly check for ZFS, but I've done some quick checks on UFS. The following are sizes in MB for my copies of the listed trees with different UFS frag sizes. These include directories but not indirect blocks:

   1b  512b 1024b 2048b 4096b
 4430  4511  4631  4875  5457  /usr/ncvs
 4910  5027  5181  5499  6133  Old FreeBSD SVN repo
  299   370   485   733  1252  /usr/ports checked out from CVS
  467   485   509   557   656  /usr/src 8-stable checkout from CVS

Note that the ports tree grew by 50% going from 1K to 2K frags and will grow by another 70% going to 4KB frags. Similar issues will be seen when you have lots of small files.

I agree, but there are at least two things going for making the increase anyway: 1) 2 TB drives cost $80 2) Where the space is really important, the person in charge usually knows it and can choose a non-default size like 512b fragments. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
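For concreteness, block and fragment size are chosen at newfs time, so the person in charge can indeed override the default. An illustrative invocation (device names are made up; newfs requires the block/fragment ratio to stay between 1 and 8):

# large blocks with 4k fragments, fine for filesystems dominated by big files:
newfs -b 32768 -f 4096 /dev/ada1p1

# smaller blocks and fragments for a filesystem full of tiny files (e.g. a ports tree):
newfs -b 16384 -f 2048 /dev/ada1p2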
Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device
On 23 August 2011 10:52, Ivan Voras ivo...@freebsd.org wrote: I agree but there are at least two things going for making the increase anyway: 1) 2 TB drives cost $80 2) Where the space is really important, the person in charge usually knows it and can choose a non-default size like 512b fragments. helpers like sysinstall should help with choosing the smaller blocks for smaller drives (especially SSD) Aled ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device
On 23/08/2011 11:59, Aled Morris wrote: On 23 August 2011 10:52, Ivan Voras ivo...@freebsd.org wrote: I agree but there are at least two things going for making the increase anyway: 1) 2 TB drives cost $80 2) Where the space is really important, the person in charge usually knows it and can choose a non-default size like 512b fragments. helpers like sysinstall should help with choosing the smaller blocks for smaller drives (especially SSD) Only via hints and help text. Too much magic in the installer leads to awkward choices :) (e.g. first you need to distinguish between a VM with a small drive, an SSD small drive, or a SAN small volume... it quickly turns into an AI-class problem). ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device
On 19/08/2011 14:21, Aled Morris wrote: On 19 August 2011 11:15, Tom Evans tevans...@googlemail.com wrote: On Thu, Aug 18, 2011 at 6:50 PM, Yuri y...@rawbw.com wrote: Some recent hard drives have logical sectors of 512 bytes when they actually have 4k physical sectors. ... Shouldn't the UFS and ZFS drivers be able to either read the right sector size from the underlying device or at least issue a warning? The device never reports the actual sector size, so unless FreeBSD keeps a database of 4k-sector hard drives that report as 512-byte-sector hard drives, there is nothing that can be done. At what point should we change the default in newfs/zfs to 4k? It has already been changed for UFS in 9. I guess formatting the filesystem for 4k sectors on a 512b drive would still work, but it would be suboptimal. What would the performance penalty be in reality? It would be suboptimal, but only for the slight waste of space that would have otherwise been reclaimed if the block or fragment size remained 512 or 2K. This waste of space is insignificant for the vast majority of users and there are no performance penalties, so it seems that switching to 4K sectors by default for all file systems would actually be a good idea. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
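On the ZFS side, the usual workaround for drives that report 512-byte sectors but really have 4k ones is the gnop trick at pool creation time. A sketch, with made-up device and pool names (the zdb invocation at the end is an assumption worth double-checking on your release):

# create a temporary 4K-sector view of the disk and build the pool on it:
gnop create -S 4096 /dev/ada0
zpool create tank /dev/ada0.nop

# the .nop device disappears on reboot, but ZFS keeps the ashift=12 it
# recorded when the vdev was created, so writes stay 4K-aligned:
zpool export tank
gnop destroy /dev/ada0.nop
zpool import tank

# verify the recorded ashift:
zdb -C tank | grep ashift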