Re: ZFS Boot Menu

2013-10-08 Thread Volodymyr Kostyrko

06.10.2013 08:54, Teske, Devin wrote:


On Sep 30, 2013, at 6:20 AM, Volodymyr Kostyrko wrote:


29.09.2013 00:30, Teske, Devin wrote:

Interested in feedback, but moreover I would like to see who is
interested in tackling this with me. I can't do it alone... I at least
need testers who will provide feedback and edge-case testing.


Count me in; I'm not fluent in Forth, but testing something new is always fun.



Cool; to start with, do you have virtualization software such as VMware or
VirtualBox? Any experience creating ZFS pools in such software?


VirtualBox and QEMU; QEMU can emulate booting to a serial console, for
example. And yes, I have worked with ZFS in VMs before.
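
For illustration, a minimal sketch of such a serial-console test boot under QEMU (the image name, memory size, and console settings below are assumptions, not something from this thread):

# Run a FreeBSD ZFS test image with the emulated serial port on stdio.
qemu-system-x86_64 -m 1024 -nographic -drive file=freebsd-zfs-test.img,format=raw

# Inside the guest, /boot/loader.conf needs the console directed to the
# serial port so loader and boot-menu output shows up on that line:
#   boot_serial="YES"
#   console="comconsole"

With -nographic, QEMU multiplexes the serial console onto the launching terminal, which makes it easy to capture or script boot-menu interaction.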



I think that we may have something to test next month.

Right now, we're working on the ability of bsdinstall(8) to provision Boot on
ZFS as built-in functionality.


That sounds cool.

--
Sphinx of black quartz, judge my vow.


Re: ZFS Boot Menu

2013-10-05 Thread Teske, Devin

On Sep 30, 2013, at 6:20 AM, Volodymyr Kostyrko wrote:

 29.09.2013 00:30, Teske, Devin wrote:
 Interested in feedback, but moreover I would like to see who is
 interested in tackling this with me. I can't do it alone... I at least
 need testers who will provide feedback and edge-case testing.
 
 Count me in; I'm not fluent in Forth, but testing something new is always fun.
 

Cool; to start with, do you have virtualization software such as VMware or
VirtualBox? Any experience creating ZFS pools in such software?

I think that we may have something to test next month.

Right now, we're working on the ability of bsdinstall(8) to provision Boot on
ZFS as built-in functionality.

This feature (when it's in; projected for 10.0-BETA1) will make testing the Forth
enhancements easier (IMHO).
-- 
Devin

_
The information contained in this message is proprietary and/or confidential. 
If you are not the intended recipient, please: (i) delete the message and all 
copies; (ii) do not disclose, distribute or use the message in any manner; and 
(iii) notify the sender immediately. In addition, please be aware that any 
message addressed to our domain is subject to archiving and review by persons 
other than the intended recipient. Thank you.


Re: ZFS Boot Menu

2013-09-30 Thread Lars Engels

On 28.09.2013 23:30, Teske, Devin wrote:

In my recent interview on bsdnow.tv, I was pinged on BEs in Forth.

I'd like to revisit this.

Back on Sept 20th, 2012, I posted some pictures demonstrating exactly what
the code that was in HEAD (at the time) was/is capable of.

These three pictures (posted the same day) tell a story:
1. You boot to the menu: http://twitpic.com/b1eswi/full
2. You select option #5 to get here: http://twitpic.com/b1etyb/full
3. You select option #2 to get here: http://twitpic.com/b1ew47/full

I've just (today) uploaded the /boot/menu.rc file(s) that I used to create
those pictures:

http://druidbsd.cvs.sf.net/viewvc/druidbsd/zfsbeastie/

NB: There's a README file to go along with the files.

HINT: diff -pu menu.rc.1.current-head menu.rc.2.cycler
HINT: diff -pu menu.rc.1.current-head menu.rc.2.dynamic-submenu

Interested in feedback, but moreover I would like to see who is
interested in tackling this with me. I can't do it alone... I at least
need testers who will provide feedback and edge-case testing.


Woohoo! Great! I am using ZFS boot environments with beadm, so I can 
test a bit.



Re: ZFS Boot Menu

2013-09-30 Thread Volodymyr Kostyrko

29.09.2013 00:30, Teske, Devin wrote:

Interested in feedback, but moreover I would like to see who is
interested in tackling this with me. I can't do it alone... I at least
need testers who will provide feedback and edge-case testing.


Count me in; I'm not fluent in Forth, but testing something new is
always fun.


--
Sphinx of black quartz, judge my vow.


ZFS Boot Menu

2013-09-28 Thread Teske, Devin
In my recent interview on bsdnow.tv, I was pinged on BEs in Forth.

I'd like to revisit this.

Back on Sept 20th, 2012, I posted some pictures demonstrating exactly what
the code that was in HEAD (at the time) was/is capable of.

These three pictures (posted the same day) tell a story:
1. You boot to the menu: http://twitpic.com/b1eswi/full
2. You select option #5 to get here: http://twitpic.com/b1etyb/full
3. You select option #2 to get here: http://twitpic.com/b1ew47/full

I've just (today) uploaded the /boot/menu.rc file(s) that I used to create
those pictures:

http://druidbsd.cvs.sf.net/viewvc/druidbsd/zfsbeastie/

NB: There's a README file to go along with the files.

HINT: diff -pu menu.rc.1.current-head menu.rc.2.cycler
HINT: diff -pu menu.rc.1.current-head menu.rc.2.dynamic-submenu

Interested in feedback, but moreover I would like to see who is
interested in tackling this with me. I can't do it alone... I at least
need testers who will provide feedback and edge-case testing.
-- 
Devin

_
The information contained in this message is proprietary and/or confidential. 
If you are not the intended recipient, please: (i) delete the message and all 
copies; (ii) do not disclose, distribute or use the message in any manner; and 
(iii) notify the sender immediately. In addition, please be aware that any 
message addressed to our domain is subject to archiving and review by persons 
other than the intended recipient. Thank you.


Zfs encryption property for freebsd 8.3

2013-09-03 Thread Emre Çamalan
Hi, 
I want to encrypt some disks on my server with the ZFS encryption property, but it
is not available.

Does anybody have experience with this?


http://docs.oracle.com/cd/E23824_01/html/821-1448/gkkih.html#scrolltoc
http://www.oracle.com/technetwork/articles/servers-storage-admin/manage-zfs-encryption-1715034.html

These are good explanations, but I got an error, and the output lists all supported properties:


[root@HP ~]# zpool status
  pool: output
 state: ONLINE
  scan: none requested
config:

NAMESTATE READ WRITE CKSUM
output  ONLINE   0 0 0
  ad0s1eONLINE   0 0 0

errors: No known data errors
[root@HP ~]# zfs create -o encryption=on output/home
cannot create 'output/home': invalid property 'encryption'
[root@HP ~]# zfs get encryption
bad property list: invalid property 'encryption'
usage:
get [-rHp] [-d max] [-o all | field[,...]] [-t type[,...]] [-s 
source[,...]]
all | property[,...] [filesystem|volume|snapshot] ...

The following properties are supported:

PROPERTY   EDIT  INHERIT   VALUES

availableNO   NO   size
clones   NO   NO   dataset[,...]
compressratioNO   NO   1.00x or higher if compressed
creation NO   NO   date
defer_destroyNO   NO   yes | no
mounted  NO   NO   yes | no
origin   NO   NO   snapshot
refcompressratio  NO   NO   1.00x or higher if compressed
referenced   NO   NO   size
type NO   NO   filesystem | volume | snapshot
used NO   NO   size
usedbychildren   NO   NO   size
usedbydatasetNO   NO   size
usedbyrefreservation  NO   NO   size
usedbysnapshots  NO   NO   size
userrefs NO   NO   count
written  NO   NO   size
aclinherit  YES  YES   discard | noallow | restricted | 
passthrough | passthrough-x
aclmode YES  YES   discard | groupmask | passthrough | 
restricted
atime   YES  YES   on | off
canmountYES   NO   on | off | noauto
casesensitivity  NO  YES   sensitive | insensitive | mixed
checksumYES  YES   on | off | fletcher2 | fletcher4 | sha256
compression YES  YES   on | off | lzjb | gzip | gzip-[1-9] | zle
copies  YES  YES   1 | 2 | 3
dedup   YES  YES   on | off | verify | sha256[,verify]
devices YES  YES   on | off
execYES  YES   on | off
jailed  YES  YES   on | off
logbias YES  YES   latency | throughput
mlslabelYES  YES   sensitivity label
mountpoint  YES  YES   path | legacy | none
nbmand  YES  YES   on | off
normalizationNO  YES   none | formC | formD | formKC | formKD
primarycacheYES  YES   all | none | metadata
quota   YES   NO   size | none
readonlyYES  YES   on | off
recordsize  YES  YES   512 to 128k, power of 2
refquotaYES   NO   size | none
refreservation  YES   NO   size | none
reservation YES   NO   size | none
secondarycache  YES  YES   all | none | metadata
setuid  YES  YES   on | off
sharenfsYES  YES   on | off | share(1M) options
sharesmbYES  YES   on | off | sharemgr(1M) options
snapdir YES  YES   hidden | visible
syncYES  YES   standard | always | disabled
utf8only NO  YES   on | off
version YES   NO   1 | 2 | 3 | 4 | 5 | current
volblocksize NO  YES   512 to 128k, power of 2
volsize YES   NO   size
vscan   YES  YES   on | off
xattr   YES  YES   on | off
userused@... NO   NO   size
groupused@...NO   NO   size
userquota@...   YES   NO   size | none
groupquota@...  YES   NO   size | none
written@snap   NO   NO   size

Sizes are specified in bytes with standard units such as K, M, G, etc.

User-defined properties can be specified by using a name containing a colon (:).

The {user|group}{used|quota}@ properties must be appended with
a user or group specifier of one of these forms:
POSIX name  (eg: matt)
POSIX id(eg: 126829)
SMB name@domain (eg: matt@sun)
SMB SID (eg: S-1-234-567-89)
[root@HP ~]# 
-

How can I use or add the encryption property on FreeBSD 8.3?

Re: Zfs encryption property for freebsd 8.3

2013-09-03 Thread Florent Peterschmitt
On 03/09/2013 14:14, Emre Çamalan wrote:
 Hi,
 I want to encrypt some disks on my server with the ZFS encryption property, but it
 is not available.

That would require ZFS v30. As far as I am aware Oracle has not
released the code under CDDL.

From http://forums.freebsd.org/showthread.php?t=30036

So you can use ZFS pools on GELI volumes; that can be a good start. I have not
played with it myself.
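
As a rough sketch of that approach (device name, sector size, and pool layout here are placeholders, not taken from this thread):

# One-time setup: initialize and attach a GELI provider, then build the pool on it.
geli init -s 4096 /dev/ada1p2      # prompts for a passphrase
geli attach /dev/ada1p2            # creates /dev/ada1p2.eli
zpool create output /dev/ada1p2.eli
zfs create output/home

# After a reboot, attach the GELI provider first, then import the pool.
geli attach /dev/ada1p2
zpool import output

Everything ZFS writes then goes through the encrypted provider, so you get encryption below ZFS rather than the per-dataset encryption property Oracle added.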

-- 
Florent Peterschmitt   | Please:
flor...@peterschmitt.fr|  * Avoid HTML/RTF in E-mail.
+33 (0)6 64 33 97 92   |  * Send PDF for documents.
http://florent.peterschmitt.fr |  * Trim your quotations. Really.
Proudly powered by Open Source | Thank you :)





Re: Zfs encryption property for freebsd 8.3

2013-09-03 Thread Alan Somers
On Tue, Sep 3, 2013 at 6:22 AM, Florent Peterschmitt
flor...@peterschmitt.fr wrote:
 On 03/09/2013 14:14, Emre Çamalan wrote:
 Hi,
 I want to encrypt some disks on my server with the ZFS encryption property, but it
 is not available.

 That would require ZFS v30. As far as I am aware Oracle has not
 released the code under CDDL.

Oracle's ZFS encryption is crap anyway.  It works at the filesystem
level, not the pool level, so a lot of metadata is in plaintext; I
don't remember how much exactly.  It's also highly vulnerable to
watermarking attacks.


 From http://forums.freebsd.org/showthread.php?t=30036

 So you can use ZFS pools on GELI volumes; that can be a good start. I have not
 played with it myself.

GELI is full-disk encryption.  It's far superior to ZFS encryption.


 --
 Florent Peterschmitt   | Please:
 flor...@peterschmitt.fr|  * Avoid HTML/RTF in E-mail.
 +33 (0)6 64 33 97 92   |  * Send PDF for documents.
 http://florent.peterschmitt.fr |  * Trim your quotations. Really.
 Proudly powered by Open Source | Thank you :)



Re: Zfs encryption property for freebsd 8.3

2013-09-03 Thread Florent Peterschmitt
On 03/09/2013 16:53, Alan Somers wrote:
 GELI is full-disk encryption.  It's far superior to ZFS encryption.

Yup, but is it possible to encrypt a ZFS volume (not a whole
pool) with a separate GELI layer?

Also, in-ZFS encryption would be a nice thing if it could work like
LVM/LUKS, where each logical volume can be encrypted or not and have
its own key.

I saw that Illumos has ZFS encryption on its TODO list.

-- 
Florent Peterschmitt   | Please:
flor...@peterschmitt.fr|  * Avoid HTML/RTF in E-mail.
+33 (0)6 64 33 97 92   |  * Send PDF for documents.
http://florent.peterschmitt.fr |  * Trim your quotations. Really.
Proudly powered by Open Source | Thank you :)





Re: Zfs encryption property for freebsd 8.3

2013-09-03 Thread Alan Somers
On Tue, Sep 3, 2013 at 9:01 AM, Florent Peterschmitt
flor...@peterschmitt.fr wrote:
 On 03/09/2013 16:53, Alan Somers wrote:
 GELI is full-disk encryption.  It's far superior to ZFS encryption.

 Yup, but is there a possibility to encrypt a ZFS volume (not a whole
 pool) with a separate GELI partition?

You mean encrypt a zvol with GELI and put a file system on that?  I
suppose that would work, but I bet that it would be slow.
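
A rough sketch of that layering, with made-up device and dataset names (not from this thread):

# Create a zvol, encrypt it with GELI, and put UFS on top of the encrypted layer.
zfs create -V 10g tank/cryptvol
geli init /dev/zvol/tank/cryptvol
geli attach /dev/zvol/tank/cryptvol        # yields /dev/zvol/tank/cryptvol.eli
newfs -U /dev/zvol/tank/cryptvol.eli
mount /dev/zvol/tank/cryptvol.eli /mnt/secret

The likely slowness comes from the stacking: every write passes through UFS, then GELI, then ZFS's own copy-on-write pipeline on the zvol.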


 Also, in-ZFS encryption would be a nice thing if it could work like an
 LVM/LUKS where each logical LVM volume can be encrypted or not and have
 its own crypt key.

My understanding is that this is exactly how Oracle's ZFS encryption
works.  Each ZFS filesystem can have its own key, or be in plaintext.
Every cryptosystem involves a tradeoff between security and
convenience, and ZFS encryption goes fairly hard toward convenience.
In particular, Oracle decided that encrypted files must be
deduplicatable.  A necessary result is that they are trivially
vulnerable to watermarking attacks.

https://blogs.oracle.com/darren/entry/zfs_encryption_what_is_on


 I saw that Illumos has ZFS encryption on its TODO list.

 --
 Florent Peterschmitt   | Please:
 flor...@peterschmitt.fr|  * Avoid HTML/RTF in E-mail.
 +33 (0)6 64 33 97 92   |  * Send PDF for documents.
 http://florent.peterschmitt.fr |  * Trim your quotations. Really.
 Proudly powered by Open Source | Thank you :)



Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS

2013-08-31 Thread Dmitry Morozovsky
On Fri, 30 Aug 2013, Patrick wrote:

 On Fri, Aug 30, 2013 at 1:30 AM, Andriy Gapon a...@freebsd.org wrote:
 
  I don't have an exact recollection of what is installed by freebsd-update - 
  are
  *.symbols files installed?
 
 Doesn't look like it. I wonder if I can grab that from a distro site
 or somewhere?

it seems so:

marck@woozle:/pub/FreeBSD/releases/amd64/8.4-RELEASE/kernels> grep -c symbol generic.mtree
636

So, get the kernels subdirectory from the release and extract the symbols from it:

cat generic.?? | tar tvjf - \*.symbols

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***



Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS

2013-08-31 Thread Dmitry Morozovsky
On Sat, 31 Aug 2013, Dmitry Morozovsky wrote:

   I don't have an exact recollection of what is installed by freebsd-update 
   - are
   *.symbols files installed?
  
  Doesn't look like it. I wonder if I can grab that from a distro site
  or somewhere?
 
 it seems so:
 
 marck@woozle:/pub/FreeBSD/releases/amd64/8.4-RELEASE/kernels> grep -c symbol generic.mtree
 636
 
 So, get kernels subdir from the release and extract symbols from them:
 
 cat generic.?? | tar tvjf - \*.symbols

ah, ``tar xvjf'' of course -- I had pasted my test run (which only listed the files).
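
For completeness, a sketch of the whole sequence (paths are illustrative, and the exact layout inside the split archive may differ between releases):

# In a scratch directory holding the generic.?? pieces copied from the
# release's kernels/ directory on a FreeBSD mirror:
cat generic.?? | tar xvjf - '*.symbols'
# Put the extracted .symbols files next to the installed kernel (or point
# kgdb directly at the extracted kernel.symbols), then repeat:
#   kgdb /boot/kernel/kernel
#   (kgdb) list *vdev_mirror_child_select+0x67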

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***



Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS

2013-08-30 Thread Patrick
On Thu, Aug 29, 2013 at 2:32 PM, Andriy Gapon a...@freebsd.org wrote:
 on 29/08/2013 19:37 Patrick said the following:
 I've got a system running on a VPS that I'm trying to upgrade from 8.2
 to 8.4. It has a ZFS root. After booting the new kernel, I get:

 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x40
 fault code  = supervisor read data, page not present
 instruction pointer = 0x20:0x810d7691
 stack pointer   = 0x28:0xff81ba60
 frame pointer   = 0x28:0xff81ba90
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 1 (kernel)
 trap number = 12
 panic: page fault
 cpuid = 0
 KDB: stack backtrace:
 #0 0x8066cb96 at kdb_backtrace+0x66
 #1 0x8063925e at panic+0x1ce
 #2 0x809c21d0 at trap_fatal+0x290
 #3 0x809c255e at trap_pfault+0x23e
 #4 0x809c2a2e at trap+0x3ce
 #5 0x809a9624 at calltrap+0x8
 #6 0x810df517 at vdev_mirror_child_select+0x67

 If possible, please run 'kgdb /path/to/8.4/kernel' and then in kgdb do 'list
 *vdev_mirror_child_select+0x67'

H...

(kgdb) list *vdev_mirror_child_select+0x67
No symbol table is loaded.  Use the file command.

Do I need to build the kernel from source myself? This kernel is what
freebsd-update installed during part 1 of the upgrade.

Patrick


Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS

2013-08-30 Thread Andriy Gapon
on 30/08/2013 11:17 Patrick said the following:
 H...
 
 (kgdb) list *vdev_mirror_child_select+0x67
 No symbol table is loaded.  Use the file command.
 
 Do I need to build the kernel from source myself? This kernel is what
 freebsd-update installed during part 1 of the upgrade.

I don't have an exact recollection of what is installed by freebsd-update - are
*.symbols files installed?

-- 
Andriy Gapon


Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS

2013-08-30 Thread Patrick
On Fri, Aug 30, 2013 at 1:30 AM, Andriy Gapon a...@freebsd.org wrote:

 I don't have an exact recollection of what is installed by freebsd-update - 
 are
 *.symbols files installed?

Doesn't look like it. I wonder if I can grab that from a distro site
or somewhere?


Fatal trap 12 going from 8.2 to 8.4 with ZFS

2013-08-29 Thread Patrick
I've got a system running on a VPS that I'm trying to upgrade from 8.2
to 8.4. It has a ZFS root. After booting the new kernel, I get:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x40
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x810d7691
stack pointer   = 0x28:0xff81ba60
frame pointer   = 0x28:0xff81ba90
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 1 (kernel)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0x8066cb96 at kdb_backtrace+0x66
#1 0x8063925e at panic+0x1ce
#2 0x809c21d0 at trap_fatal+0x290
#3 0x809c255e at trap_pfault+0x23e
#4 0x809c2a2e at trap+0x3ce
#5 0x809a9624 at calltrap+0x8
#6 0x810df517 at vdev_mirror_child_select+0x67
#7 0x810dfacc at vdev_mirror_io_start+0x24c
#8 0x810f7c52 at zio_vdev_io_start+0x232
#9 0x810f76f3 at zio_execute+0xc3
#10 0x810f77ad at zio_wait+0x2d
#11 0x8108991e at arc_read+0x6ce
#12 0x8109d9d4 at dmu_objset_open_impl+0xd4
#13 0x810b4014 at dsl_pool_init+0x34
#14 0x810c7eea at spa_load+0x6aa
#15 0x810c90b2 at spa_load_best+0x52
#16 0x810cb0ca at spa_open_common+0x14a
#17 0x810a892d at dsl_dir_open_spa+0x2cd
Uptime: 3s
Cannot dump. Device not defined or unavailable.

I've booted back into the 8.2 kernel without any problems, but I'm
wondering if anyone can suggest what I should try to get this working?
I used freebsd-update to upgrade, and this was after the first
freebsd-update install where it installs the kernel.

My /boot/loader.conf has:

zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"

Should I be going from 8.2 -> 8.3 -> 8.4?

Patrick


Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS

2013-08-29 Thread Andriy Gapon
on 29/08/2013 19:37 Patrick said the following:
 I've got a system running on a VPS that I'm trying to upgrade from 8.2
 to 8.4. It has a ZFS root. After booting the new kernel, I get:
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x40
 fault code  = supervisor read data, page not present
 instruction pointer = 0x20:0x810d7691
 stack pointer   = 0x28:0xff81ba60
 frame pointer   = 0x28:0xff81ba90
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 1 (kernel)
 trap number = 12
 panic: page fault
 cpuid = 0
 KDB: stack backtrace:
 #0 0x8066cb96 at kdb_backtrace+0x66
 #1 0x8063925e at panic+0x1ce
 #2 0x809c21d0 at trap_fatal+0x290
 #3 0x809c255e at trap_pfault+0x23e
 #4 0x809c2a2e at trap+0x3ce
 #5 0x809a9624 at calltrap+0x8
 #6 0x810df517 at vdev_mirror_child_select+0x67

If possible, please run 'kgdb /path/to/8.4/kernel' and then in kgdb do 'list
*vdev_mirror_child_select+0x67'

 #7 0x810dfacc at vdev_mirror_io_start+0x24c
 #8 0x810f7c52 at zio_vdev_io_start+0x232
 #9 0x810f76f3 at zio_execute+0xc3
 #10 0x810f77ad at zio_wait+0x2d
 #11 0x8108991e at arc_read+0x6ce
 #12 0x8109d9d4 at dmu_objset_open_impl+0xd4
 #13 0x810b4014 at dsl_pool_init+0x34
 #14 0x810c7eea at spa_load+0x6aa
 #15 0x810c90b2 at spa_load_best+0x52
 #16 0x810cb0ca at spa_open_common+0x14a
 #17 0x810a892d at dsl_dir_open_spa+0x2cd
 Uptime: 3s
 Cannot dump. Device not defined or unavailable.
 
 I've booted back into the 8.2 kernel without any problems, but I'm
 wondering if anyone can suggest what I should try to get this working?
 I used freebsd-update to upgrade, and this was after the first
 freebsd-update install where it installs the kernel.


-- 
Andriy Gapon


Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem

2013-07-14 Thread Stefan Esser
On 12.07.2013 14:33, Volodymyr Kostyrko wrote:
 You can try to experiment with zpool hidden flags. Look at this command:
 
 zpool import -N -o readonly=on -f -R /pool pool
 
 It will try to import the pool in read-only mode so no data gets written
 to it. It also doesn't mount anything on import, so if any filesystem is
 damaged you are less likely to trigger a coredump. zpool import also has a
 hidden -T switch that lets you select the transaction you want to try to
 restore to. You'll need a list of available transactions, though:
 
 zdb -ul vdev
 
 Given a vdev, this lists all uberblocks with their respective transaction
 ids. You can take the highest one (it's not necessarily the last one
 listed) and try to import the pool with:
 
 zpool import -N -o readonly=on -f -R /pool -F -T transaction_id pool

I had good luck with ZFS recovery with the following approach:

1) Use zdb to identify a TXG for which the data structures are intact

2) Select recovery mode by loading the ZFS KLD with vfs.zfs.recover=1
   set in /boot/loader.conf

3) Import the pool with the above -T option referring to a suitable TXG
   found with the help of zdb.

The zdb commands to use are:

# zdb -AAA -L -t TXG -bcdmu POOL

(Both -AAA and -L reduce the amount of consistency checking performed.
A pool that needs these options at a given TXG to allow zdb to succeed is
damaged, but may still allow recovery of most or all files. Be sure
to only import such a pool read-only, or your data will probably be lost!)

A list of TXGs to try can be retrieved with zdb -hh POOL.

You may need to add -e to the list of zdb options, since the pool is
exported (not currently mounted).
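
Condensed into one hedged example (the TXG number, device, and pool name below are placeholders):

# 1) List the uberblocks and their TXGs on a vdev:
zdb -ul /dev/ada1p3
# 2) Sanity-check the pool at a candidate TXG; -e because it is exported,
#    -AAA and -L to relax the consistency checks:
zdb -e -AAA -L -t 1234567 -bcdmu pool
# 3) With vfs.zfs.recover=1 set in /boot/loader.conf, import read-only at
#    that TXG without mounting anything:
zpool import -N -o readonly=on -f -R /pool -F -T 1234567 pool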

Regards, STefan


Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem

2013-07-12 Thread Volodymyr Kostyrko

On 11.07.2013 17:43, Reid Linnemann wrote:

So recently I was trying to transfer a root-on-ZFS zpool from one pair of
disks to a single, larger disk. As I am wont to do, I botched the transfer
up and decided to destroy the ZFS filesystems on the destination and start
again. Naturally I was up late working on this, being sloppy and drowsy
without any coffee, and lo and behold I issued my 'zfs destroy -R' and
immediately realized after pressing [ENTER] that I had given it the
source's zpool name. oops. Fortunately I was able to interrupt the
procedure with only /usr being destroyed from the pool and I was able to
send/receive the truly vital data in my /var partition to the new disk and
re-deploy the base system to /usr on the new disk. The only thing I'm
really missing at this point is all of the third-party software
configuration I had in /usr/local/etc and my apache data in /usr/local/www.


You can try to experiment with zpool hidden flags. Look at this command:

zpool import -N -o readonly=on -f -R /pool pool

It will try to import the pool in read-only mode so no data gets written
to it. It also doesn't mount anything on import, so if any filesystem is
damaged you are less likely to trigger a coredump. zpool import also has a
hidden -T switch that lets you select the transaction you want to try to
restore to. You'll need a list of available transactions, though:


zdb -ul vdev

Given a vdev, this lists all uberblocks with their respective transaction
ids. You can take the highest one (it's not necessarily the last one
listed) and try to import the pool with:


zpool import -N -o readonly=on -f -R /pool -F -T transaction_id pool

Then check available filesystems.

--
Sphinx of black quartz, judge my vow.

Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem

2013-07-12 Thread Reid Linnemann
Hey presto!

/ zfs list
NAME USED  AVAIL  REFER  MOUNTPOINT
bucket   485G  1.30T   549M  legacy
bucket/tmp21K  1.30T21K  legacy
bucket/usr  29.6G  1.30T  29.6G  /mnt/usr
bucket/var   455G  1.30T  17.7G  /mnt/var
bucket/var/srv   437G  1.30T   437G  /mnt/var/srv

There's my old bucket! Thanks much for the hidden -T argument, Volodymyr!
Now I can get back the remainder of my missing configuration.


On Fri, Jul 12, 2013 at 7:33 AM, Volodymyr Kostyrko c.kw...@gmail.comwrote:

 On 11.07.2013 17:43, Reid Linnemann wrote:

  So recently I was trying to transfer a root-on-ZFS zpool from one pair of
 disks to a single, larger disk. As I am wont to do, I botched the transfer
 up and decided to destroy the ZFS filesystems on the destination and start
 again. Naturally I was up late working on this, being sloppy and drowsy
 without any coffee, and lo and behold I issued my 'zfs destroy -R' and
 immediately realized after pressing [ENTER] that I had given it the
 source's zpool name. oops. Fortunately I was able to interrupt the
 procedure with only /usr being destroyed from the pool and I was able to
 send/receive the truly vital data in my /var partition to the new disk and
 re-deploy the base system to /usr on the new disk. The only thing I'm
 really missing at this point is all of the third-party software
 configuration I had in /usr/local/etc and my apache data in
 /usr/local/www.


 You can try to experiment with zpool hidden flags. Look at this command:

 zpool import -N -o readonly=on -f -R /pool pool

 It will try to import the pool in read-only mode so no data gets written to
 it. It also doesn't mount anything on import, so if any filesystem is damaged
 you are less likely to trigger a coredump. zpool import also has a hidden -T
 switch that lets you select the transaction you want to try to restore to.
 You'll need a list of available transactions, though:

 zdb -ul vdev

 Given a vdev, this lists all uberblocks with their respective transaction
 ids. You can take the highest one (it's not necessarily the last one listed)
 and try to import the pool with:

 zpool import -N -o readonly=on -f -R /pool -F -T transaction_id pool

 Then check available filesystems.

 --
 Sphinx of black quartz, judge my vow.


Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem

2013-07-11 Thread Reid Linnemann
So recently I was trying to transfer a root-on-ZFS zpool from one pair of
disks to a single, larger disk. As I am wont to do, I botched the transfer
up and decided to destroy the ZFS filesystems on the destination and start
again. Naturally I was up late working on this, being sloppy and drowsy
without any coffee, and lo and behold I issued my 'zfs destroy -R' and
immediately realized after pressing [ENTER] that I had given it the
source's zpool name. oops. Fortunately I was able to interrupt the
procedure with only /usr being destroyed from the pool and I was able to
send/receive the truly vital data in my /var partition to the new disk and
re-deploy the base system to /usr on the new disk. The only thing I'm
really missing at this point is all of the third-party software
configuration I had in /usr/local/etc and my apache data in /usr/local/www.

After a few minutes on Google I came across this wonderful page:

http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script

where the author has published information about his python script which
locates the uberblocks on the raw disk and shows the user the most recent
transaction IDs, prompts the user for a transaction ID to roll back to, and
zeroes out all uberblocks beyond that point. Theoretically, I should be
able to use this script to get back to the transaction prior to my dreaded
'zfs destroy -R', then be able to recover the data I need (since no further
writes have been done to the source disks).

First, I know there's a problem in the script on FreeBSD in which the grep
pattern for the od output expects a single space between the output
elements. I've attached a patch that allows the output to be properly
grepped in FreeBSD, so we can actually get to the transaction log.

But now we come to my current problem. When attempting to roll back with
this script, it tries to dd zeroed bytes to offsets into the disk device
(/dev/ada1p3 in my case) where the uberblocks are located. But even
with kern.geom.debugflags
set to 0x10 (and I am running this as root) I get 'Operation not permitted'
when the script tries to zero out the unwanted transactions. I'm fairly
certain this is because the geom is in use by the ZFS subsystem, as it is
still recognized as a part of the original pool. I'm hesitant to zfs export
the pool, as I don't know if that wipes the transaction history on the
pool. Does anyone have any ideas?

Thanks,
-Reid


zfs_revert-0.1.py.patch
Description: Binary data

Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem

2013-07-11 Thread Alan Somers
zpool export does not wipe the transaction history.  It does,
however, write new labels and some metadata, so there is a very slight
chance that it might overwrite some of the blocks that you're trying
to recover.  But it's probably safe.  An alternative, much more
complicated, solution would be to have ZFS open the device
non-exclusively.  This patch will do that.  Caveat programmer: I
haven't tested this patch in isolation.

Change 624068 by willa@willa_SpectraBSD on 2012/08/09 09:28:38

Allow multiple opens of geoms used by vdev_geom.
Also ignore the pool guid for spares when checking to decide whether
it's ok to attach a vdev.

This enables using hotspares to replace other devices, as well as
using a given hotspare in multiple pools.

We need to investigate alternative solutions in order to allow
opening the geoms exclusive.

Affected files ...

... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#2 edit

Differences ...

 
//SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c#2 (text)

@@ -179,49 +179,23 @@
gp = g_new_geomf(&zfs_vdev_class, "zfs::vdev");
gp->orphan = vdev_geom_orphan;
gp->attrchanged = vdev_geom_attrchanged;
-   cp = g_new_consumer(gp);
-   error = g_attach(cp, pp);
-   if (error != 0) {
-   printf("%s(%d): g_attach failed: %d\n", __func__,
-  __LINE__, error);
-   g_wither_geom(gp, ENXIO);
-   return (NULL);
-   }
-   error = g_access(cp, 1, 0, 1);
-   if (error != 0) {
-   printf("%s(%d): g_access failed: %d\n", __func__,
-  __LINE__, error);
-   g_wither_geom(gp, ENXIO);
-   return (NULL);
-   }
-   ZFS_LOG(1, "Created geom and consumer for %s.", pp->name);
-   } else {
-   /* Check if we are already connected to this provider. */
-   LIST_FOREACH(cp, &gp->consumer, consumer) {
-   if (cp->provider == pp) {
-   ZFS_LOG(1, "Provider %s already in use by ZFS. "
-   "Failing attach.", pp->name);
-   return (NULL);
-   }
-   }
-   cp = g_new_consumer(gp);
-   error = g_attach(cp, pp);
-   if (error != 0) {
-   printf("%s(%d): g_attach failed: %d\n",
-  __func__, __LINE__, error);
-   g_destroy_consumer(cp);
-   return (NULL);
-   }
-   error = g_access(cp, 1, 0, 1);
-   if (error != 0) {
-   printf("%s(%d): g_access failed: %d\n",
-  __func__, __LINE__, error);
-   g_detach(cp);
-   g_destroy_consumer(cp);
-   return (NULL);
-   }
-   ZFS_LOG(1, "Created consumer for %s.", pp->name);
+   }
+   cp = g_new_consumer(gp);
+   error = g_attach(cp, pp);
+   if (error != 0) {
+   printf("%s(%d): g_attach failed: %d\n", __func__,
+  __LINE__, error);
+   g_wither_geom(gp, ENXIO);
+   return (NULL);
+   }
+   error = g_access(cp, /*r*/1, /*w*/0, /*e*/0);
+   if (error != 0) {
+   printf("%s(%d): g_access failed: %d\n", __func__,
+  __LINE__, error);
+   g_wither_geom(gp, ENXIO);
+   return (NULL);
}
+   ZFS_LOG(1, "Created consumer for %s.", pp->name);

cp->private = vd;
vd->vdev_tsd = cp;
@@ -251,7 +225,7 @@
cp->private = NULL;

gp = cp->geom;
-   g_access(cp, -1, 0, -1);
+   g_access(cp, -1, 0, 0);
/* Destroy consumer on last close. */
if (cp->acr == 0 && cp->ace == 0) {
ZFS_LOG(1, "Destroyed consumer to %s.", cp->provider->name);
@@ -384,6 +358,18 @@
cp->provider->name);
 }

+static inline boolean_t
+vdev_attach_ok(vdev_t *vd, uint64_t pool_guid, uint64_t vdev_guid)
+{
+   boolean_t pool_ok;
+   boolean_t vdev_ok;
+
+   /* Spares can be assigned to multiple pools. */
+   pool_ok = vd->vdev_isspare || pool_guid == spa_guid(vd->vdev_spa);
+   vdev_ok = vdev_guid == vd->vdev_guid;
+   return (pool_ok && vdev_ok);
+}
+
 static struct g_consumer *
 vdev_geom_attach_by_guids(vdev_t *vd)
 {
@@ -420,8 +406,7 @@
g_topology_lock();
g_access(zcp, -1, 0, 0);
g_detach(zcp);
-   if (pguid != spa_guid(vd->vdev_spa) ||
-   vguid != vd->vdev_guid

Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem

2013-07-11 Thread Will Andrews
On Thu, Jul 11, 2013 at 9:04 AM, Alan Somers asom...@freebsd.org wrote:
 zpool export does not wipe the transaction history.  It does,
 however, write new labels and some metadata, so there is a very slight
 chance that it might overwrite some of the blocks that you're trying
 to recover.  But it's probably safe.  An alternative, much more
 complicated, solution would be to have ZFS open the device
 non-exclusively.  This patch will do that.  Caveat programmer: I
 haven't tested this patch in isolation.

This change is quite a bit more than necessary, and probably wouldn't
apply to FreeBSD given the other changes in the code.  Really, to make
non-exclusive opens you just have to change the g_access() calls in
vdev_geom.c so the third argument is always 0.

However, see below.

 On Thu, Jul 11, 2013 at 8:43 AM, Reid Linnemann linnema...@gmail.com wrote:
 But now we are to my current problem. When attempting to roll back with
 this script, it tries to dd zero'd bytes to offsets into the disk device
 (/dev/ada1p3 in my case) where the uberblocks are located. But even
 with kern.geom.debugflags
 set to 0x10 (and I am running this as root) I get 'Operation not permitted'
 when the script tries to zero out the unwanted transactions. I'm fairly
 certain this is because the geom is in use by the ZFS subsystem, as it is
 still recognized as a part of the original pool. I'm hesitant to zfs export
 the pool, as I don't know if that wipes the transaction history on the
 pool. Does anyone have any ideas?

You do not have a choice.  Changing the on-disk state does not mean
the in-core state will update to match, and the pool could get into a
really bad state if you try to modify the transactions on disk while
it's online, since it may write additional transactions (which rely on
state you're about to destroy), before you export.

Also, rolling back transactions in this manner assumes that the
original blocks (that were COW'd) are still in their original state.
If you're using TRIM or have a pretty full pool, the odds are not in
your favor.  It's a roll of the dice, in any case.

--Will.


Re: Attempting to roll back zfs transactions on a disk to recover a destroyed ZFS filesystem

2013-07-11 Thread Reid Linnemann
Will,

Thanks, that makes sense. I know this is all a crap shoot, but I've really
got nothing to lose at this point, so this is just a good opportunity to
rummage around the internals of ZFS and learn a few things. I might even
get lucky and recover some data!


On Thu, Jul 11, 2013 at 10:59 AM, Will Andrews w...@firepipe.net wrote:

 On Thu, Jul 11, 2013 at 9:04 AM, Alan Somers asom...@freebsd.org wrote:
  zpool export does not wipe the transaction history.  It does,
  however, write new labels and some metadata, so there is a very slight
  chance that it might overwrite some of the blocks that you're trying
  to recover.  But it's probably safe.  An alternative, much more
  complicated, solution would be to have ZFS open the device
  non-exclusively.  This patch will do that.  Caveat programmer: I
  haven't tested this patch in isolation.

 This change is quite a bit more than necessary, and probably wouldn't
 apply to FreeBSD given the other changes in the code.  Really, to make
 non-exclusive opens you just have to change the g_access() calls in
 vdev_geom.c so the third argument is always 0.

 However, see below.

  On Thu, Jul 11, 2013 at 8:43 AM, Reid Linnemann linnema...@gmail.com
 wrote:
  But now we are to my current problem. When attempting to roll back with
  this script, it tries to dd zero'd bytes to offsets into the disk device
  (/dev/ada1p3 in my case) where the uberblocks are located. But even
  with kern.geom.debugflags
  set to 0x10 (and I am running this as root) I get 'Operation not
 permitted'
  when the script tries to zero out the unwanted transactions. I'm fairly
  certain this is because the geom is in use by the ZFS subsystem, as it
 is
  still recognized as a part of the original pool. I'm hesitant to zfs
 export
  the pool, as I don't know if that wipes the transaction history on the
  pool. Does anyone have any ideas?

 You do not have a choice.  Changing the on-disk state does not mean
 the in-core state will update to match, and the pool could get into a
 really bad state if you try to modify the transactions on disk while
 it's online, since it may write additional transactions (which rely on
 state you're about to destroy), before you export.

 Also, rolling back transactions in this manner assumes that the
 original blocks (that were COW'd) are still in their original state.
 If you're using TRIM or have a pretty full pool, the odds are not in
 your favor.  It's a roll of the dice, in any case.

 --Will.


Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Dag-Erling Smørgrav
The attached patch causes ZFS to base the minimum transfer size for a
new vdev on the GEOM provider's stripesize (physical sector size) rather
than sectorsize (logical sector size), provided that stripesize is a
power of two larger than sectorsize and smaller than or equal to
VDEV_PAD_SIZE.  This should eliminate the need for ivoras@'s gnop trick
when creating ZFS pools on Advanced Format drives.
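
For context, the gnop workaround this is meant to replace looks roughly like the following (disk and pool names are just examples):

# Create a temporary nop provider that reports 4K sectors so that
# zpool create picks ashift=12:
gnop create -S 4096 /dev/ada0
zpool create tank /dev/ada0.nop
# ashift is recorded in the vdev label, so the nop layer can then go away:
zpool export tank
gnop destroy /dev/ada0.nop
zpool import tank
zdb tank | grep ashift    # should now report 12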

DES
-- 
Dag-Erling Smørgrav - d...@des.no

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(revision 253138)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(working copy)
@@ -578,6 +578,7 @@
 {
 	struct g_provider *pp;
 	struct g_consumer *cp;
+	u_int sectorsize;
 	size_t bufsize;
 	int error;
 
@@ -661,8 +662,21 @@
 
 	/*
 	 * Determine the device's minimum transfer size.
+	 *
+	 * This is a bit of a hack.  For performance reasons, we would
+	 * prefer to use the physical sector size (reported by GEOM as
+	 * stripesize) as minimum transfer size.  However, doing so
+	 * unconditionally would break existing vdevs.  Therefore, we
+	 * compute ashift based on stripesize when the vdev isn't already
+	 * part of a pool (vdev_asize == 0), and sectorsize otherwise.
 	 */
-	*ashift = highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;
+	if (vd->vdev_asize == 0 && pp->stripesize > pp->sectorsize &&
+	    ISP2(pp->stripesize) && pp->stripesize <= VDEV_PAD_SIZE) {
+		sectorsize = pp->stripesize;
+	} else {
+		sectorsize = pp->sectorsize;
+	}
+	*ashift = highbit(MAX(sectorsize, SPA_MINBLOCKSIZE)) - 1;
 
 	/*
 	 * Clear the nowritecache settings, so that on a vdev_reopen()

Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland

Hi DES, unfortunately you need quite a bit more than this for it to work compatibly.

I've had a patch here that does just this for quite some time, but there's been some
discussion on how we want additional control over this, so it's not been committed.

If others are interested I've attached it, as it achieves what we needed here and so
may also be of use to others.

There's also a big discussion on illumos about this very subject at the moment, so I'm
monitoring that too.

Hopefully a clear conclusion on how people want to proceed will come out of that, and
we'll be able to get in a change that works for everyone.

   Regards
   Steve
- Original Message - 
From: Dag-Erling Smørgrav d...@des.no

To: freebsd...@freebsd.org; freebsd-hackers@freebsd.org
Cc: ivo...@freebsd.org
Sent: Wednesday, July 10, 2013 10:02 AM
Subject: Make ZFS use the physical sector size when computing initial ashift


The attached patch causes ZFS to base the minimum transfer size for a
new vdev on the GEOM provider's stripesize (physical sector size) rather
than sectorsize (logical sector size), provided that stripesize is a
power of two larger than sectorsize and smaller than or equal to
VDEV_PAD_SIZE.  This should eliminate the need for ivoras@'s gnop trick
when creating ZFS pools on Advanced Format drives.

DES
--
Dag-Erling Smørgrav - d...@des.no













zzz-zfs-ashift-fix.patch
Description: Binary data

Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Dag-Erling Smørgrav
Steven Hartland kill...@multiplay.co.uk writes:
 Hi DES, unfortunately you need quite a bit more than this to work
 compatibly.

*chirp* *chirp* *chirp*

DES
-- 
Dag-Erling Smørgrav - d...@des.no

Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Borja Marcos

On Jul 10, 2013, at 11:25 AM, Steven Hartland wrote:

 If others are interested I've attached this as it achieves what we needed 
 here so
 may also be of use for others too.
 
 There's also a big discussion on illumos about this very subject ATM so I'm
 monitoring that too.
 
 Hopefully there will be a nice conclusion come from that how people want to
 proceed and we'll be able to get a change in that works for everyone.

Hmm. I wonder if the simplest approach would be the best: adding a
flag to zpool.

At home I have a playground FreeBSD machine with a ZFS mirror, and, you
guessed it, I was careless when I purchased the components: I asked for two
1 TB drives and that's what I got, but different models, one of them
Advanced Format and the other one classic.

I don't think it's that bad to create a pool on a classic disk using 4 KB 
blocks, and it's quite likely that
replacement disks will be 4 KB in the near future. 

Also, if you use SSDs the situation is similar.





Borja.



Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland

There's a lot more to consider when choosing a way forward, not least that
ashift isn't a pool-wide configuration option but is set per top-level vdev,
plus the space overhead of moving from 512b to 4k; see previous and current
discussions on zfs-devel@freebsd.org and z...@lists.illumos.org for details.

   Regards
   Steve

- Original Message - 
From: Borja Marcos bor...@sarenet.es


On Jul 10, 2013, at 11:25 AM, Steven Hartland wrote:


If others are interested I've attached this as it achieves what we needed here 
so
may also be of use for others too.

There's also a big discussion on illumos about this very subject ATM so I'm
monitoring that too.

Hopefully there will be a nice conclusion come from that how people want to
proceed and we'll be able to get a change in that works for everyone.


Hmm. I wonder if the simplest approach would be the best: adding a
flag to zpool.

At home I have a playground FreeBSD machine with a ZFS zmirror, and, you 
guessed it, I was
careless when I purchased the components, I asked for two 1 TB drives and 
that I got, but different
models, one of them advanced format and the other one classic.

I don't think it's that bad to create a pool on a classic disk using 4 KB 
blocks, and it's quite likely that
replacement disks will be 4 KB in the near future. 


Also, if you use SSDs the situation is similar.






Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Xin Li

On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
 The attached patch causes ZFS to base the minimum transfer size for
 a new vdev on the GEOM provider's stripesize (physical sector size)
 rather than sectorsize (logical sector size), provided that
 stripesize is a power of two larger than sectorsize and smaller
 than or equal to VDEV_PAD_SIZE.  This should eliminate the need for
 ivoras@'s gnop trick when creating ZFS pools on Advanced Format
 drives.

I think there are multiple versions of this (I also have one[1]), but
the concern is that if a pool was created with ashift=9 and ashift=12 is
now used instead, the pool becomes unimportable.  So there needs to be a
way to disable this behavior.

Another thing (not really related to the automatic detection) is that
we need a way to manually override this setting from the command line when
creating the pool; this is under active discussion on the Illumos mailing
list right now.

[1]
https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

Cheers,
- -- 
Xin LI delp...@delphij.nethttps://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die

Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Justin T. Gibbs
On Jul 10, 2013, at 11:21 AM, Xin Li delp...@delphij.net wrote:

 Signed PGP part
 On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
  The attached patch causes ZFS to base the minimum transfer size for
  a new vdev on the GEOM provider's stripesize (physical sector size)
  rather than sectorsize (logical sector size), provided that
  stripesize is a power of two larger than sectorsize and smaller
  than or equal to VDEV_PAD_SIZE.  This should eliminate the need for
  ivoras@'s gnop trick when creating ZFS pools on Advanced Format
  drives.
 
 I think there are multiple versions of this (I also have one[1]) but
 the concern is that if one creates a pool with ashift=9, and now
 ashift=12, the pool gets unimportable.  So there need a way to disable
 this behavior.
 
 Another thing (not really related to the automatic detection) is that
 we need a way to manually override this setting from command line when
 creating the pool, this is under active discussion at Illumos mailing
 list right now.
 
 [1]
 https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26
 
 Cheers,
 - -- 
 Xin LI delp...@delphij.nethttps://www.delphij.net/
 FreeBSD - The Power to Serve!   Live free or die
 

I'm sure lots of folks have some solution to this.  Here is an
old version of what we use at Spectra:

http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff

The above patch is missing some cleanup that was motivated by my
discussions with George Wilson about this change in April.  I'll
dig that up later tonight.  Even if you don't read the full diff,
please read the included checkin comment since it explains the
motivation behind this particular solution.

This is on my list of things to upstream in the next week or so after
I add logic to the userspace tools to report whether or not the
TLVs in a pool are using an optimal allocation size.  This is only
possible if you actually make ZFS fully aware of logical, physical,
and the configured allocation size.  All of the other patches I've seen
just treat physical as logical.

--
Justin





Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland


- Original Message - 
From: Xin Li 



On 07/10/13 02:02, Dag-Erling Smørgrav wrote:

The attached patch causes ZFS to base the minimum transfer size for
a new vdev on the GEOM provider's stripesize (physical sector size)
rather than sectorsize (logical sector size), provided that
stripesize is a power of two larger than sectorsize and smaller
than or equal to VDEV_PAD_SIZE.  This should eliminate the need for
ivoras@'s gnop trick when creating ZFS pools on Advanced Format
drives.


I think there are multiple versions of this (I also have one[1]) but
the concern is that if one creates a pool with ashift=9, and now
ashift=12, the pool gets unimportable.  So there need a way to disable
this behavior.


I've tested my patch in all the configurations I can think of, including importing
exported ashift=9 pools, all with no issues.

For your example e.g.

# Create a 4K pool (min_create_ashift=4K, dev=512)
test:src sysctl vfs.zfs.min_create_ashift
vfs.zfs.min_create_ashift: 12
test:src mdconfig -a -t swap -s 128m -S 512 -u 0
test:src zpool create mdpool md0
test:src zdb mdpool | grep ashift
   ashift: 12
   ashift: 12

# Create a 512b pool (min_create_ashift=512, dev=512)
test:src zpool destroy mdpool
test:src sysctl vfs.zfs.min_create_ashift=9
vfs.zfs.min_create_ashift: 12 -> 9
test:src zpool create mdpool md0 
test:src zdb mdpool | grep ashift

   ashift: 9
   ashift: 9

# Import a 512b pool (min_create_ashift=4K, dev=512)
test:src zpool export mdpool
test:src sysctl vfs.zfs.min_create_ashift=12
vfs.zfs.min_create_ashift: 9 -> 12
test:src zpool import mdpool
test:src zdb mdpool | grep ashift
   ashift: 9
   ashift: 9

# Create a 4K pool (min_create_ashift=512, dev=4K)
test:src zpool destroy mdpool
test:src mdconfig -d -u 0
test:src mdconfig -a -t swap -s 128m -S 4096 -u 0   
test:src sysctl vfs.zfs.min_create_ashift=9

vfs.zfs.min_create_ashift: 12 -> 9
test:src zpool create mdpool md0
test:src zdb mdpool | grep ashift
   ashift: 12
   ashift: 12

# Import a 4K pool (min_create_ashift=4K, dev=4K)
test:src zpool export mdpool
test:src sysctl vfs.zfs.min_create_ashift=12
vfs.zfs.min_create_ashift: 9 -> 12
test:src zpool import mdpool
test:src zdb mdpool | grep ashift
   ashift: 12
   ashift: 12


Another thing (not really related to the automatic detection) is that
we need a way to manually override this setting from command line when
creating the pool, this is under active discussion at Illumos mailing
list right now.

[1]
https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26


Yep, this has been on my list for a while, based on previous discussions on
zfs-devel@. I've not had any time recently but I'm following the illumos
thread to see what conclusions they come to.

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Xin Li
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

On 07/10/13 10:38, Justin T. Gibbs wrote:
[snip]
 I'm sure lots of folks have some solution to this.  Here is an 
 old version of what we use at Spectra:
 
 http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
 
 The above patch is missing some cleanup that was motivated by my 
 discussions with George Wilson about this change in April.  I'll 
 dig that up later tonight.  Even if you don't read the full diff, 
 please read the included checkin comment since it explains the 
 motivation behind this particular solution.
 
 This is on my list of things to upstream in the next week or so
 after I add logic to the userspace tools to report whether or not
 the TLVs in a pool are using an optimal allocation size.  This is
 only possible if you actually make ZFS fully aware of logical,
 physical, and the configured allocation size.  All of the other
 patches I've seen just treat physical as logical.

Yes, me too.  Your version is superior.

Cheers,
- -- 
Xin LI delp...@delphij.net  https://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die
-BEGIN PGP SIGNATURE-

iQEcBAEBCgAGBQJR3aQzAAoJEG80Jeu8UPuzHn8H/1ZpoTqAQ4+mgQOttOwXgBcr
2Fgh52ztW8fCEQSeIosxXKO06hP7HxFfTPvmeeWyjT8zIpSUSFV6G0NclebKDncP
huGFofvx3BKPRmfzZp4iZx1wWQUxSHTmv6ceDwvP7P8GJ0mON+SrZxmmwUjKrf7V
W9Sazl0p8e0nxSQykLyjjrkaBx5Iv+aUxu8Alomwy9BmpM8+gd2yutvzghW5L36L
0CvAtIMXdlc+eUdAqa/2rOk/nMOA9sfWVW0gkKYCZk6wvj2DMzjii05UechZ4Z+l
6nEU3UdVsbTX73CABZv4my4JAWc5Yk1s/cWrxtn68AfK8LMPFJCJcVXXOSckMWI=
=351W
-END PGP SIGNATURE-
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland
- Original Message - 
From: Justin T. Gibbs 

I'm sure lots of folks have some solution to this.  Here is an
old version of what we use at Spectra:

 http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff

The above patch is missing some cleanup that was motivated by my
discussions with George Wilson about this change in April.  I'll
dig that up later tonight.  Even if you don't read the full diff,
please read the included checkin comment since it explains the
motivation behind this particular solution.

This is on my list of things to upstream in the next week or so after
I add logic to the userspace tools to report whether or not the
TLVs in a pool are using an optimal allocation size.  This is only
possible if you actually make ZFS fully aware of logical, physical,
and the configured allocation size.  All of the other patches I've seen
just treat physical as logical.


Reading through your patch it seems that your logical_ashift equates to
the current ashift values which for geom devices is based off sectorsize
and your physical_ashift is based on stripesize.

This is almost identical to the approach I used adding a desired ashift,
which equates to your physical_ashift, along side the standard ashift
i.e. required aka logical_ashift value :)

One issue I did spot in your patch is that you currently expose
zfs_max_auto_ashift as a sysctl but don't clamp its value which would
cause problems should a user configure values > 13.

If you're interested in the reason for this, it's explained in the comments in
my version, which does a very similar thing with validation.


   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Justin T. Gibbs
On Jul 10, 2013, at 1:06 PM, Steven Hartland kill...@multiplay.co.uk wrote:

 - Original Message - From: Justin T. Gibbs 
 I'm sure lots of folks have some solution to this.  Here is an
 old version of what we use at Spectra:
 http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
 The above patch is missing some cleanup that was motivated by my
 discussions with George Wilson about this change in April.  I'll
 dig that up later tonight.  Even if you don't read the full diff,
 please read the included checkin comment since it explains the
 motivation behind this particular solution.
 
 This is on my list of things to upstream in the next week or so after
 I add logic to the userspace tools to report whether or not the
 TLVs in a pool are using an optimal allocation size.  This is only
 possible if you actually make ZFS fully aware of logical, physical,
 and the configured allocation size.  All of the other patches I've seen
 just treat physical as logical.
 
 Reading through your patch it seems that your logical_ashift equates to
 the current ashift values which for geom devices is based off sectorsize
 and your physical_ashift is based on stripesize.
 
 This is almost identical to the approach I used adding a desired ashift,
 which equates to your physical_ashift, along side the standard ashift
 i.e. required aka logical_ashift value :)

Yes, the approaches are similar.  Our current version records the logical
access size in the vdev structure too, which might relate to the issue
below.

 One issue I did spot in your patch is that you currently expose
 zfs_max_auto_ashift as a sysctl but don't clamp its value which would
 cause problems should a user configure values > 13.

I would expect the zio pipeline to simply insert an ashift aligned thunking
buffer for these operations, but I haven't tried going past an ashift of 13 in
my tests.  If it is an issue, it seems the restriction should be based on
logical access size, not optimal access size.

--
Justin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland


- Original Message - 
From: Justin T. Gibbs

On Jul 10, 2013, at 1:06 PM, Steven Hartland wrote:
- Original Message - From: Justin T. Gibbs 

I'm sure lots of folks have some solution to this.  Here is an
old version of what we use at Spectra:
http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
The above patch is missing some cleanup that was motivated by my
discussions with George Wilson about this change in April.  I'll
dig that up later tonight.  Even if you don't read the full diff,
please read the included checkin comment since it explains the
motivation behind this particular solution.

This is on my list of things to upstream in the next week or so after
I add logic to the userspace tools to report whether or not the
TLVs in a pool are using an optimal allocation size.  This is only
possible if you actually make ZFS fully aware of logical, physical,
and the configured allocation size.  All of the other patches I've seen
just treat physical as logical.


Reading through your patch it seems that your logical_ashift equates to
the current ashift values which for geom devices is based off sectorsize
and your physical_ashift is based on stripesize.

This is almost identical to the approach I used adding a desired ashift,
which equates to your physical_ashift, along side the standard ashift
i.e. required aka logical_ashift value :)


Yes, the approaches are similar.  Our current version records the logical
access size in the vdev structure too, which might relate to the issue
below.

 One issue I did spot in your patch is that you currently expose
 zfs_max_auto_ashift as a sysctl but don't clamp its value which would
 cause problems should a user configure values > 13.

I would expect the zio pipeline to simply insert an ashift aligned thunking
buffer for these operations, but I haven't tried going past an ashift of 13 in
my tests.  If it is an issue, it seems the restriction should be based on
logical access size, not optimal access size.


Yes with your methodology you'll only see the issue if zfs_max_auto_ashift
and physical_ashift are both > 13, but this can be the case for example
on a RAID controller with a large stripe size.

Looking back at my old patch it too suffers from the same issue along with
the current code base, but that would only happen if logical sector size
resulted in an ashift > 13, which is going to be much less common ;-)

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Justin T. Gibbs
On Jul 10, 2013, at 1:42 PM, Steven Hartland kill...@multiplay.co.uk wrote:

 
 - Original Message - From: Justin T. Gibbs
 On Jul 10, 2013, at 1:06 PM, Steven Hartland wrote:
 - Original Message - From: Justin T. Gibbs 
 I'm sure lots of folks have some solution to this.  Here is an
 old version of what we use at Spectra:
 http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
 The above patch is missing some cleanup that was motivated by my
 discussions with George Wilson about this change in April.  I'll
 dig that up later tonight.  Even if you don't read the full diff,
 please read the included checkin comment since it explains the
 motivation behind this particular solution.
 This is on my list of things to upstream in the next week or so after
 I add logic to the userspace tools to report whether or not the
 TLVs in a pool are using an optimal allocation size.  This is only
 possible if you actually make ZFS fully aware of logical, physical,
 and the configured allocation size.  All of the other patches I've seen
 just treat physical as logical.
 Reading through your patch it seems that your logical_ashift equates to
 the current ashift values which for geom devices is based off sectorsize
 and your physical_ashift is based on stripesize.
 This is almost identical to the approach I used adding a desired ashift,
 which equates to your physical_ashift, along side the standard ashift
 i.e. required aka logical_ashift value :)
 
 Yes, the approaches are similar.  Our current version records the logical
 access size in the vdev structure too, which might relate to the issue
 below.
 
  One issue I did spot in your patch is that you currently expose
  zfs_max_auto_ashift as a sysctl but don't clamp its value which would
  cause problems should a user configure values > 13.
 
 I would expect the zio pipeline to simply insert an ashift aligned thunking
 buffer for these operations, but I haven't tried going past an ashift of 13 
 in
 my tests.  If it is an issue, it seems the restriction should be based on
 logical access size, not optimal access size.
 
 Yes with your methodology you'll only see the issue if zfs_max_auto_ashift
 and physical_ashift are both > 13, but this can be the case for example
 on a RAID controller with a large stripe size.

I'm not sure I follow.  logical_ashift is available in our latest code, as is 
the
physical_ashift.  But even without the logical_ashift, why doesn't the zio
pipeline properly thunk zio_phys_read() access based on the configured ashift?

--
Justin

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland
- Original Message - 
From: Justin T. Gibbs

...

 One issue I did spot in your patch is that you currently expose
 zfs_max_auto_ashift as a sysctl but don't clamp its value which would
 cause problems should a user configure values > 13.

I would expect the zio pipeline to simply insert an ashift aligned thunking
buffer for these operations, but I haven't tried going past an ashift of 13 in
my tests.  If it is an issue, it seems the restriction should be based on
logical access size, not optimal access size.


Yes with your methodology you'll only see the issue if zfs_max_auto_ashift
and physical_ashift are both > 13, but this can be the case for example
on a RAID controller with a large stripe size.


I'm not sure I follow.  logical_ashift is available in our latest code, as is 
the
physical_ashift.  But even without the logical_ashift, why doesn't the zio
pipeline properly thunk zio_phys_read() access based on the configured ashift?


When I looked at it, which was a long time ago now so please excuse me if
I'm a little rusty on the details, zio_phys_read() was working more by luck
than judgement, as the offsets passed in were calculated from a valid start +
increment based on the size of a structure within vdev_label_offset(), with no
ashift logic applied that I could find.

The result was that pools created with large ashifts were unstable when I
tested.

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-25 Thread Wojciech Puchar


  here is my real world production example of users mail as well as 
documents.


  /dev/mirror/home1.eli      2788 1545  1243    55% 1941057 20981181    8%  
 /home


Not the same data, I imagine.


A mix. 90% mailboxes and user data (documents, pictures), the rest are some 
.tar.gz backups.


At other places I have a similar situation: one or more gmirror sets, 1-3TB 
each, depending on the drives.


For those who put 1000s of mailboxes I recommend dovecot with the mdbox 
storage backend.



  I was dealing with the actual byte counts ... that figure is going to
be in whole blocks.


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Adam Nowacki

On 2013-01-23 21:22, Wojciech Puchar wrote:

While RAID-Z is already a king of bad performance,


I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
measurements to back up your claim?


it is clearly described even in ZFS papers. Both on reads and writes it
gives single drive random I/O performance.


With ZFS and RAID-Z the situation is a bit more complex.

Lets assume 5 disk raidz1 vdev with ashift=9 (512 byte sectors).

A worst case scenario could happen if your random i/o workload was 
reading random files each of 2048 bytes. Each file read would require 
data from 4 disks (5th is parity and won't be read unless there are 
errors). However if files were 512 bytes or less then only one disk 
would be used. 1024 bytes - two disks, etc.


So ZFS is probably not the best choice to store millions of small files 
if random access to whole files is the primary concern.


But lets look at a different scenario - a PostgreSQL database. Here 
table data is split and stored in 1GB files. ZFS splits the file into 
128KiB records (recordsize property). This record is then again split 
into 4 columns each 32768 bytes. 5th column is generated containing 
parity. Each column is then stored on a different disk. You could think 
of it as a regular RAID-5 with stripe size of 32768 bytes.


PostgreSQL uses 8192 byte pages that fit evenly both into ZFS record 
size and column size. Each page access requires only a single disk read. 
Random i/o performance here should be 5 times that of a single disk.
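
To sanity-check the arithmetic above (the dataset name is a placeholder, and
the default 128KiB recordsize is assumed):

zfs get -H -o value recordsize tank/pgdata   # 128K by default
echo $((131072 / 4))                         # 4 data columns -> 32768 bytes each
# by the reasoning above, an 8192-byte PostgreSQL page fits inside one
# 32768-byte column, so a random page read touches a single data disk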


For me the reliability ZFS offers is far more important than pure 
performance.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Wojciech Puchar
then stored on a different disk. You could think of it as a regular RAID-5 
with stripe size of 32768 bytes.


PostgreSQL uses 8192 byte pages that fit evenly both into ZFS record size and 
column size. Each page access requires only a single disk read. Random i/o 
performance here should be 5 times that of a single disk.


think about writing 8192 byte pages randomly, and then doing a linear search
over the table.




For me the reliability ZFS offers is far more important than pure 
performance.

Except it is on paper reliability.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Zaphod Beeblebrox
Wow!  OK.  It sounds like you (or someone like you) can answer some of my
burning questions about ZFS.

On Thu, Jan 24, 2013 at 8:12 AM, Adam Nowacki nowa...@platinum.linux.pl wrote:


 Lets assume 5 disk raidz1 vdev with ashift=9 (512 byte sectors).

 A worst case scenario could happen if your random i/o workload was reading
 random files each of 2048 bytes. Each file read would require data from 4
 disks (5th is parity and won't be read unless there are errors). However if
 files were 512 bytes or less then only one disk would be used. 1024 bytes -
 two disks, etc.

 So ZFS is probably not the best choice to store millions of small files if
 random access to whole files is the primary concern.

 But lets look at a different scenario - a PostgreSQL database. Here table
 data is split and stored in 1GB files. ZFS splits the file into 128KiB
 records (recordsize property). This record is then again split into 4
 columns each 32768 bytes. 5th column is generated containing parity. Each
 column is then stored on a different disk. You could think of it as a
 regular RAID-5 with stripe size of 32768 bytes.


Ok... so my question then would be... what of the small files.  If I write
several small files at once, does the transaction use a record, or does
each file need to use a record?  Additionally, if small files use
sub-records, when you delete that file, does the sub-record get moved or
just wasted (until the record is completely free)?

I'm considering the difference, say, between cyrus imap (one file per
message ZFS, database files on different ZFS filesystem) and dbmail imap
(postgresql on ZFS).

... now I realize that PostgreSQL on ZFS has some special issues (but I
don't have a choice here between ZFS and non-ZFS ... ZFS has already been
chosen), but I'm also figuring that PostgreSQL on ZFS has some waste
compared to cyrus IMAP on ZFS.

So far in my research, Cyrus makes some compelling arguments that the
common use case of most IMAP database files is full scan --- for which it's
database files are optimized and SQL-based files are not.  I agree that
some operations can be more efficient in a good SQL database, but full scan
(as a most often used query) is not.

Cyrus also makes sense to me as a collection of small files ... for which I
expect ZFS to excel... including the ability to snapshot with impunity...
but I am terribly curious how the files are handled in transactions.

I'm actually (right now) running some filesize statistics (and I'll get
back to the list, if asked), but I'd like to know how ZFS is going to store
the arriving mail... :).
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Wojciech Puchar

several small files at once, does the transaction use a record, or does
each file need to use a record?  Additionally, if small files use
sub-records, when you delete that file, does the sub-record get moved or
just wasted (until the record is completely free)?


writes of small files are always good with ZFS.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Adam Nowacki

On 2013-01-24 15:24, Wojciech Puchar wrote:

For me the reliability ZFS offers is far more important than pure
performance.

Except it is on paper reliability.


This on paper reliability in practice saved a 20TB pool. See one of my 
previous emails. Any other filesystem or hardware/software raid without 
per-disk checksums would have failed. Silent corruption of non-important 
files would be the best case, complete filesystem death by important 
metadata corruption as the worst case.


I've been using ZFS for 3 years in many systems. Biggest one has 44 
disks and 4 ZFS pools - this one survived SAS expander disconnects, a 
few kernel panics and countless power failures (UPS only holds for a few 
hours).


So far I've not lost a single ZFS pool or any data stored.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Adam Nowacki

On 2013-01-24 15:45, Zaphod Beeblebrox wrote:

Ok... so my question then would be... what of the small files.  If I write
several small files at once, does the transaction use a record, or does
each file need to use a record?  Additionally, if small files use
sub-records, when you delete that file, does the sub-record get moved or
just wasted (until the record is completely free)?


Each file is a fully self-contained object (together with full parity) 
all the way to the physical storage. A 1 byte file on RAID-Z2 pool will 
always use 3 disks, 3 sectors total for data alone. You can use du to 
verify - it reports physical size together with parity. Metadata like 
directory entry or file attributes is stored separately and shared with 
other files. For small files there may be a lot of wasted space.
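
A rough way to see this on a raidz2 dataset (paths are placeholders, and the
exact numbers will depend on ashift and pool layout):

printf x > /tank/test/tiny   # a 1-byte file
du -A /tank/test/tiny        # apparent size
du /tank/test/tiny           # allocated size, including parity (3 sectors, as described above)
# the allocated figure should be noticeably larger than the 1 byte of data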


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Zaphod Beeblebrox
Ok... here's the existing data:

There are 3,236,316 files summing to 97,500,008,691 bytes.  That puts the
average file at 30,127 bytes.  But for the full breakdown:

512 : 7758
1024 : 139046
2048 : 1468904
4096 : 325375
8192 : 492399
16384 : 324728
32768 : 263210
65536 : 102407
131072 : 43046
262144 : 22259
524288 : 17136
1048576 : 13788
2097152 : 8279
4194304 : 4501
8388608 : 2317
16777216 : 1045
33554432 : 119
67108864 : 2

I produced that list with the output of ls -R's byte counts, sorted and
then processed with:

(size=512; count=0; while read num; do count=$[count+1]; if [ $num -gt $size ]; then echo
$size : $count; size=$[size*2]; count=0; fi; done) < imapfilesizelist

... now the new machine has two 2T disks in a ZFS mirror --- so I suppose
it won't waste as much space as a RAID-Z ZFS --- in that files less than
512 bytes will take 512 bytes?  By far the most common case is 2048 bytes
... so that would indicate that a RAID-Z larger than 5 disks would waste
much space.

Does that go to your recommendations on vdev size, then?   To have an 8 or 9
disk vdev, you should be storing at smallest 4k files?
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Wojciech Puchar

So far I've not lost a single ZFS pool or any data stored.

so far my house wasn't robbed.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Wojciech Puchar

There are 3,236,316 files summing to 97,500,008,691 bytes.  That puts the
average file at 30,127 bytes.  But for the full breakdown:


quite low. what do you store.

here is my real world production example of users mail as well as 
documents.



/dev/mirror/home1.eli  2788 1545  1243    55% 1941057 20981181    8%   /home


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Zaphod Beeblebrox
On Thu, Jan 24, 2013 at 2:26 PM, Wojciech Puchar 
woj...@wojtek.tensor.gdynia.pl wrote:

 There are 3,236,316 files summing to 97,500,008,691 bytes.  That puts the
 average file at 30,127 bytes.  But for the full breakdown:


 quite low. what do you store.


Apparently you're not really following this thread... just trolling?  I had
said that it was cyrus IMAP data (which, for reference, is one file per
email message).


 here is my real world production example of users mail as well as
 documents.


 /dev/mirror/home1.eli  2788 1545  1243    55% 1941057 20981181    8%
 /home


Not the same data, I imagine.  I was dealing with the actual byte counts
... that figure is going to be in whole blocks.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Nikolay Denev

On Jan 24, 2013, at 4:24 PM, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl 
wrote:
 
 Except it is on paper reliability.

This on paper reliability saved my ass numerous times.
For example, I had one home NAS server machine with a flaky SATA controller that 
would not detect one of the four drives from time to time on reboot.
This degraded my pool several times, and even rebooting from a state where, say, 
disk4 had failed into one where disk3 was the failed disk did not corrupt any data.
I don't think this is possible with any other open source FS, let alone 
hardware RAID that would drop the whole array because of this.
I have never ever personally lost any data on ZFS. Yes, the performance is 
another topic, and you must know what you are doing, and what is your
usage pattern, but from reliability standpoint, to me ZFS looks more durable 
than anything else.

P.S.: My home NAS is running freebsd-CURRENT with ZFS from the first version 
available. Several drives died, two times the pool was expanded
by replacing all drives one by one and resilvered, no single byte lost.


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar

While RAID-Z is already a king of bad performance,


I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
measurements to back up your claim?


it is clearly described even in ZFS papers. Both on reads and writes it 
gives single drive random I/O performance.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar

This is because RAID-Z spreads each block out over all disks, whereas RAID5
(as it is typically configured) puts each block on only one disk.  So to
read a block from RAID-Z, all data disks must be involved, vs. for RAID5
only one disk needs to have its head moved.

For other workloads (especially streaming reads/writes), there is no
fundamental difference, though of course implementation quality may vary.
streaming workload generally is always good. random I/O is what is 
important.


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Chris Rees
On 23 Jan 2013 20:23, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl
wrote:

 While RAID-Z is already a king of bad performance,


 I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
 measurements to back up your claim?


 it is clearly described even in ZFS papers. Both on reads and writes it
gives single drive random I/O performance.

So we have to take your word for it?

Provide a link if you're going to make assertions, or they're no more than
your own opinion.

Chris
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Mark Felder

On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote:



So we have to take your word for it?
Provide a link if you're going to make assertions, or they're no more than
your own opinion.


I've heard this same thing -- every vdev == 1 drive in performance. I've  
never seen any proof/papers on it though.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Artem Belevich
On Wed, Jan 23, 2013 at 12:22 PM, Wojciech Puchar
woj...@wojtek.tensor.gdynia.pl wrote:
 While RAID-Z is already a king of bad performance,


 I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
 measurements to back up your claim?


 it is clearly described even in ZFS papers. Both on reads and writes it
 gives single drive random I/O performance.

For reads - true. For writes it probably behaves better than RAID5
as it does not have to go through read-modify-write for partial block
updates. Search for RAID-5 write hole.
If you need higher performance, build your pool out of multiple RAID-Z vdevs.
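
For example, a pool built from two raidz vdevs (disk names are placeholders)
stripes across both groups, so random-read IOPS scale with the number of vdevs
rather than with the number of disks:

zpool create tank raidz da0 da1 da2 da3 da4 raidz da5 da6 da7 da8 da9
zpool status tank   # shows two separate raidz1 vdevs in the same pool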
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Artem Belevich
On Wed, Jan 23, 2013 at 1:09 PM, Mark Felder f...@feld.me wrote:
 On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote:


 So we have to take your word for it?
 Provide a link if you're going to make assertions, or they're no more than
 your own opinion.


 I've heard this same thing -- every vdev == 1 drive in performance. I've
 never seen any proof/papers on it though.

1 drive in performance only applies to the number of random i/o
operations a vdev can perform. You still get increased throughput, i.e. a
5-drive RAIDZ will have 4x the bandwidth of the individual disks in the vdev, but
would deliver only as many IOPS as the slowest drive, as a record would
have to be read back from N-1 or N-2 drives in the vdev. It's the same for
RAID5. IMHO for identical record/block size RAID5 has no advantage
over RAID-Z for reads and does have disadvantage when it comes to
small writes. Never mind lack of data integrity checks and other bells
and whistles ZFS provides.

--Artem
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar


I've heard this same thing -- every vdev == 1 drive in performance. I've 
never seen any proof/papers on it though.

read original ZFS papers.


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar

gives single drive random I/O performance.


For reads - true. For writes it's probably behaves better than RAID5


yes, because as with reads it gives single drive performance. Small writes
on RAID5 give lower than single disk performance.



If you need higher performance, build your pool out of multiple RAID-Z vdevs.

even if you need normal performance, use gmirror and UFS
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Chris Rees
On 23 January 2013 21:24, Wojciech Puchar
woj...@wojtek.tensor.gdynia.pl wrote:

 I've heard this same thing -- every vdev == 1 drive in performance. I've
 never seen any proof/papers on it though.

 read original ZFS papers.

No, you are making the assertion, provide a link.

Chris
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar

1 drive in performance only applies to number of random i/o
operations vdev can perform. You still get increased throughput. I.e.
5-drive RAIDZ will have 4x bandwidth of individual disks in vdev, but


unless your work is serving movies it doesn't matter.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Michel Talon
On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote:


 So we have to take your word for it?
 Provide a link if you're going to make assertions, or they're no more  
 than
 your own opinion.

I've heard this same thing -- every vdev == 1 drive in performance. I've  
never seen any proof/papers on it though.


first google answer from request raids performance
https://blogs.oracle.com/roch/entry/when_to_and_not_to

Effectively, as a first approximation, an N-disk RAID-Z group will
behave as a single device in terms of delivered random input
IOPS. Thus a 10-disk group of devices each capable of 200-IOPS, will
globally act as a 200-IOPS capable RAID-Z group.  This is the price to
pay to achieve proper data protection without the 2X block overhead
associated with mirroring.



--

Michel Talon
ta...@lpthe.jussieu.fr







smime.p7s
Description: S/MIME cryptographic signature


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Artem Belevich
On Wed, Jan 23, 2013 at 1:25 PM, Wojciech Puchar
woj...@wojtek.tensor.gdynia.pl wrote:
 gives single drive random I/O performance.


 For reads - true. For writes it's probably behaves better than RAID5


 yes, because as with reads it gives single drive performance. small writes
 on RAID5 gives lower than single disk performance.


 If you need higher performance, build your pool out of multiple RAID-Z
 vdevs.

 even if you need normal performance, use gmirror and UFS

I've no objection. If it works for you -- go for it.

For me personally ZFS performance is good enough, and data integrity
verification is something that I'm willing to sacrifice some
performance for. ZFS scrub gives me either warm and fuzzy feeling that
everything is OK, or explicitly tells me that something bad happened
*and* reconstructs the data if it's possible.

Just my $0.02,

--Artem
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Nikolay Denev

On Jan 23, 2013, at 11:09 PM, Mark Felder f...@feld.me wrote:

 On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote:
 
 
 So we have to take your word for it?
 Provide a link if you're going to make assertions, or they're no more than
 your own opinion.
 
 I've heard this same thing -- every vdev == 1 drive in performance. I've 
 never seen any proof/papers on it though.
 ___
 freebsd...@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-fs
 To unsubscribe, send any mail to freebsd-fs-unsubscr...@freebsd.org


Here is a blog post that describes why this is true for IOPS:

http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Chris Rees
On 23 Jan 2013 21:45, Michel Talon ta...@lpthe.jussieu.fr wrote:

 On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote:

 
  So we have to take your word for it?
  Provide a link if you're going to make assertions, or they're no more
  than
  your own opinion.

 I've heard this same thing -- every vdev == 1 drive in performance. I've
 never seen any proof/papers on it though.


 first google answer from request raids performance
 https://blogs.oracle.com/roch/entry/when_to_and_not_to

 Effectively, as a first approximation, an N-disk RAID-Z group will
 behave as a single device in terms of delivered random input
 IOPS. Thus a 10-disk group of devices each capable of 200-IOPS, will
 globally act as a 200-IOPS capable RAID-Z group.  This is the price to
 pay to achieve proper data protection without the 2X block overhead
 associated with mirroring.

Thanks for the link, but I could have done that;  I am attempting to
explain to Wojciech that his habit of making bold assertions and
arrogantly refusing to back them up makes for frustrating reading.

Chris
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar

associated with mirroring.


Thanks for the link, but I could have done that;  I am attempting to
explain to Wojciech that his habit of making bold assertions and
as you can see it is not a bold assertion; you just use something without
even reading its docs.

Not mentioning doing any more research.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar


even if you need normal performance, use gmirror and UFS


I've no objection. If it works for you -- go for it.


Both work. For today's trend of solving everything with more hardware, ZFS 
may even have enough performance.


But still it is dangerous for the reasons I explained, and it also 
promotes bad setups and layouts like making a single filesystem out of a large 
number of disks. This is bad no matter what filesystem and RAID setup 
you use, or even what OS.


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread matt
On 01/23/13 14:27, Wojciech Puchar wrote:


 both works. For todays trend of solving everything by more hardware
 ZFS may even have enough performance.

 But still it is dangerous for a reasons i explained, as well as it
 promotes bad setups and layouts like making single filesystem out of
 large amount of disks. This is bad for no matter what filesystem and
 RAID setup you use, or even what OS.


ZFS mirror performance is quite good (both random IO and sequential),
and resilvers/scrubs are measured in an hour or less. You can always
make a pool out of these instead of RAIDZ if you can get away with less
total available space.
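
For comparison, a pool of striped ZFS mirrors (disk names are placeholders)
keeps all of the ZFS features while letting random IOPS scale with the number
of mirror vdevs:

zpool create tank mirror da0 da1 mirror da2 da3
zpool add tank mirror da4 da5   # the pool can be grown later one mirror at a time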

I think RAIDZ vs Gmirror is a bad comparison, you can use a ZFS mirror
with all the ZFS features, plus N-way (not sure if gmirror does this).

Regarding single large filesystems, there is an old saying about not
putting all your eggs into one basket, even if it's a great basket :)

Matt


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-22 Thread Matthew Ahrens
On Mon, Jan 21, 2013 at 11:36 PM, Peter Jeremy pe...@rulingia.com wrote:
 On 2013-Jan-21 12:12:45 +0100, Wojciech Puchar 
woj...@wojtek.tensor.gdynia.pl wrote:
While RAID-Z is already a king of bad performance,

 I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
 measurements to back up your claim?

Leaving aside anecdotal evidence (or actual measurements), RAID-Z is
fundamentally slower than RAID4/5 *for random reads*.

This is because RAID-Z spreads each block out over all disks, whereas RAID5
(as it is typically configured) puts each block on only one disk.  So to
read a block from RAID-Z, all data disks must be involved, vs. for RAID5
only one disk needs to have its head moved.

For other workloads (especially streaming reads/writes), there is no
fundamental difference, though of course implementation quality may vary.

 Even better - use UFS.

To each their own.  As a ZFS developer, it should come as no surprise that
in my opinion and experience, the benefits of ZFS almost always outweigh
this downside.

--matt
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-21 Thread Wojciech Puchar

Please don't misinterpret this post: ZFS's ability to recover from fairly
catastrophic failures is pretty stellar, but I'm wondering if there can be


from my testing it is exactly the opposite. You have to see the difference 
between marketing and reality.



a little room for improvement.

I use RAID pretty much everywhere.  I don't like to lose data and disks
are cheap.  I have a fair amount of experience with all flavors ... and ZFS


just like me. And because i want performance and - as you described - 
disks are cheap - i use RAID-1 (gmirror).



has become a go-to filesystem for most of my applications.


My applications don't tolerate low performance, overcomplexity and 
high risk of data loss.


That's why i use properly tuned UFS, gmirror, and prefer not to use 
gstripe but have multiple filesystems



One of the best recommendations I can give for ZFS is it's
crash-recoverability.


Which is marketing, not truth. If you want bullet-proof recoverability, 
UFS beats everything i've ever seen.


If you want FAST crash recovery, use softupdates+journal, available in 
FreeBSD 9.



 As a counter example, if you have most hardware RAID
going or a software whole-disk raid, after a crash it will generally
declare one disk as good and the other disk as to be repaired ... after
which a full surface scan of the affected disks --- reading one and writing
the other --- ensues.


true. gmirror does it, but you can defer the mirror rebuild, which I use.
I have a script that sends me a mail when gmirror is degraded, and I - 
after finding out the cause of the problem, and possibly replacing the disk - run 
the rebuild after work hours, so no slowdown is experienced.
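
A minimal sketch of such a watcher, run from cron (mirror and disk names are
placeholders, and the exact gmirror output may differ between versions):

#!/bin/sh
# complain if any gmirror provider reports a DEGRADED state
if gmirror status | grep -q DEGRADED; then
    gmirror status | mail -s "gmirror degraded on $(hostname)" root
fi
# after work hours, once the failed disk has been replaced:
#   gmirror forget home1 && gmirror insert home1 /dev/ada1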



ZFS is smart on this point: it will recover on reboot with a minimum amount
of fuss.  Even if you dislodge a drive ... so that it's missing the last
'n' transactions, ZFS seems to figure this out (which I thought was extra
kudos).


Yes, this is marketing. Practice is somewhat different, as you discovered 
yourself.




MY PROBLEM comes from problems that scrub can fix.

Let's talk, in specific, about my home array.  It has 9x 1.5T and 8x 2T in
a RAID-Z configuration (2 sets, obviously).


While RAID-Z is already a king of bad performance, I assume 
you mean two POOLS, not 2 RAID-Z sets. If you mixed 2 different RAID-Z pools you would 
spread the load unevenly and make performance even worse.




A full scrub of my drives weighs in at 36 hours or so.


which is funny as ZFS is marketed as doing this efficiently (like checking 
only used space).


dd if=/dev/disk of=/dev/null bs=2m would take no more than a few hours. 
and you may do them all in parallel.



   vr2/cvs:0x1c1

Now ... this is just an example: after each scrub, the hex number was


seems like scrub simply does not do its work right.


before the old error was cleared.  Then this new error gets similarly
cleared by the next scrub.  It seems that if the scrub returned to this new
found error after fixing the known errors, this could save whole new
scrub runs from being required.


Even better - use UFS.
For both bullet proof recoverability and performance.
If you need help in tuning you may ask me privately.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-21 Thread Peter Jeremy
On 2013-Jan-21 12:12:45 +0100, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl 
wrote:
That's why i use properly tuned UFS, gmirror, and prefer not to use 
gstripe but have multiple filesystems

When I started using ZFS, I didn't fully trust it so I had a gmirrored
UFS root (including a full src tree).  Over time, I found that gmirror
plus UFS was giving me more problems than ZFS.  In particular, I was
seeing behaviour that suggested that the mirrors were out of sync,
even though gmirror insisted they were in sync.  Unfortunately, there
is no way to get gmirror to verify the mirroring or to get UFS to
check correctness of data or metadata (fsck can only check metadata
consistency).  I've since moved to a ZFS root.

Which is marketing, not truth. If you want bullet-proof recoverability, 
UFS beats everything i've ever seen.

I've seen the opposite.  One big difference is that ZFS is designed to
ensure it returns the data that was written to it whereas UFS just
returns the bytes it finds where it thinks it wrote your data.  One
side effect of this is that ZFS is far fussier about hardware quality
- since it checksums everything, it is likely to pick up glitches that
UFS doesn't notice.

If you want FAST crash recovery, use softupdates+journal, available in 
FreeBSD 9.

I'll admit that I haven't used SU+J but one downside of SU+J is that
it prevents the use of snapshots, which in turn prevents the (safe)
use of dump(8) (which is the official tool for UFS backups) on live
filesystems.
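
For reference, the live dump being referred to is dump's -L flag, which takes
a filesystem snapshot before dumping - exactly what SU+J's lack of snapshot
support gets in the way of (the output path here is only a placeholder):

dump -0Lauf /backup/root.dump /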

 of fuss.  Even if you dislodge a drive ... so that it's missing the last
 'n' transactions, ZFS seems to figure this out (which I thought was extra
 kudos).

Yes, this is marketing. Practice is somewhat different, as you discovered 
yourself.

Most of the time this works as designed.  It's possible there are bugs
in the implementation.

While RAID-Z is already a king of bad performance,

I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
measurements to back up your claim?

 i assume 
you mean two POOLS, not 2 RAID-Z sets. if you mixed 2 different RAID-Z pools 
you would 
spread load unevenly and make performance even worse.

There's no real reason why you could't have 2 different vdevs in the
same pool.

 A full scrub of my drives weighs in at 36 hours or so.

which is funny as ZFS is marketed as doing this efficient (like checking 
only used space).

It _does_ only check used space but it does so in logical order rather
than physical order.  For a fragmented pool, this means random accesses.

Even better - use UFS.

Then you'll never know that your data has been corrupted.

For both bullet proof recoverability and performance.
use ZFS.

-- 
Peter Jeremy


pgpo1y4DGw4Rb.pgp
Description: PGP signature


ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-20 Thread Zaphod Beeblebrox
Please don't misinterpret this post: ZFS's ability to recover from fairly
catastrophic failures is pretty stellar, but I'm wondering if there can be
a little room for improvement.

I use RAID pretty much everywhere.  I don't like to lose data and disks
are cheap.  I have a fair amount of experience with all flavors ... and ZFS
has become a go-to filesystem for most of my applications.

One of the best recommendations I can give for ZFS is it's
crash-recoverability.  As a counter example, if you have most hardware RAID
going or a software whole-disk raid, after a crash it will generally
declare one disk as good and the other disk as to be repaired ... after
which a full surface scan of the affected disks --- reading one and writing
the other --- ensues.  On my Windows desktop, the pair of 2T's take 3 or 4
hours to do this.  A pair of green 2T's can take over 6.  You don't lose
any data, but you have severely reduced performance until it's repaired.

The rub is that you know only one or two blocks could possibly even be
different ... and that this is a highly unoptimized way of going about the
problem.

ZFS is smart on this point: it will recover on reboot with a minimum amount
of fuss.  Even if you dislodge a drive ... so that it's missing the last
'n' transactions, ZFS seems to figure this out (which I thought was extra
kudos).

MY PROBLEM comes from problems that scrub can fix.

Let's talk, in specific, about my home array.  It has 9x 1.5T and 8x 2T in
a RAID-Z configuration (2 sets, obviously).  The drives themselves are
housed (4 each) in external drive bays with a single SATA connection for
each.  I think I have spoken of this here before.

A full scrub of my drives weighs in at 36 hours or so.

Now around Christmas, while moving some things, I managed to pull the plug
on one cabinet of 4 drives.  It was likely that the only active use of the
filesystem was an automated cvs checkin (backup) given that the errors only
appeared on the cvs directory.

IN-THE-END, no data was lost, but I had to scrub 4 times to remove the
complaints, which showed like this from zpool status -v

errors: Permanent errors have been detected in the following files:

vr2/cvs:0x1c1

Now ... this is just an example: after each scrub, the hex number was
different.  I also couldn't actually find the error on the cvs filesystem,
as a side note.  Not many files are stored there, and they all seemed to be
present.

MY TAKEAWAY from this is that 2 major improvements could be made to ZFS:

1) a pause for scrub... such that long scrubs could be paused during
working hours.

2) going back over errors... during each scrub, the new error was found
before the old error was cleared.  Then this new error gets similarly
cleared by the next scrub.  It seems that if the scrub returned to this new
found error after fixing the known errors, this could save whole new
scrub runs from being required.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-20 Thread Attila Nagy

Hi,

On 01/20/13 23:26, Zaphod Beeblebrox wrote:


1) a pause for scrub... such that long scrubs could be paused during
working hours.



While not exactly a pause, wouldn't playing with scrub_delay work here?

vfs.zfs.scrub_delay: Number of ticks to delay scrub

Set this to a high value during working hours, and set it back to its 
normal (or even lower) value outside working hours. (Maybe resilver delay, 
or some other values should also be set; I haven't yet read the relevant 
code.)
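
One way to automate that from /etc/crontab (the tick values here are only
examples; suitable numbers depend on the system):

# throttle scrub during the workday, restore the usual value at night
0 8  * * 1-5  root  /sbin/sysctl vfs.zfs.scrub_delay=20
0 18 * * 1-5  root  /sbin/sysctl vfs.zfs.scrub_delay=4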

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: iSCSI vs. SMB with ZFS.

2012-12-18 Thread Fred Whiteside
On Mon, Dec 17, 2012 at 05:22:50PM -0500, Rick Macklem wrote:
 Zaphod Beeblebrox wrote:
  Does windows 7 support nfs v4, then? Is it expected (ie: is it
  worthwhile
  trying) that nfsv4 would perform at a similar speed to iSCSI? It would
  seem that this at least requires active directory (or this user name
  mapping ... which I remember being hard).
 
 As far as I know, there is no NFSv4 in Windows. I only made the comment
 (which I admit was a bit off topic), because the previous post had stated
  SMB or NFS, they're the same or something like that.)
 
 There was work on an NFSv4 client for Windows being done by CITI at the
 Univ. of Michigan funded by Microsoft research, but I have no idea if it
 was ever released.

There appears to be an implementation of NFSV4 {client,server} for
Windows available from OpenText (via their acquisition of Hummingbird). This
would not be a free product. I have no experience with their NFSV4 stuff,
so have no comments on the speed...

-Fred Whiteside
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: iSCSI vs. SMB with ZFS.

2012-12-17 Thread Ivan Voras
On 12/12/2012 17:57, Zaphod Beeblebrox wrote:

 The performance of the iSCSI disk is
 about the same as the local disk for some operations --- faster for
 some, slower for others.  The workstation has 12G of memory and it's
 my perception that iSCSI is heavily cached and that this enhances its
 performance.  The second launch of a game ... or the second switch
 into an area (ie: loading a specific piece of geometry again) is very
 fast.

 The performance on the SMB share is abysmal compared to the
 performance on the iSCSI share.  At the very least, there seems to be
 little benifit to launching the same application twice --- which is
 most likely windows fault.

Think about what you have there:

With iSCSI you have a block device, which is seen on your workstation as
a disk drive, on which it creates a local file system (NTFS), and does
*everything* like it is using a local disk drive. This includes caching,
access permission calculations, file locking, etc.

With a network file system (either SMB or NFS, it doesn't matter), you
need to go to the server in *each* of the following situations:
* to ask the server if a file has been changed, so the client can use
cached data (if the protocol supports it)
* to ask the server if a file (or a portion of a file) has been locked
by another client

This basically means that for almost every single IO, you need to ask
the server for something, which involves network traffic and round-trip
delays.

(there are smarter network protocols, and even extensions to SMB and
NFS, but they are not widely used)





Re: iSCSI vs. SMB with ZFS.

2012-12-17 Thread Wojciech Puchar

With a network file system (either SMB or NFS, it doesn't matter), you
need to ask the server for *each* of the following situations:
* to ask the server if a file has been changed so the client can use
cached data (if the protocol supports it)
* to ask the server if a file (or a portion of a file) has been locked
by another client


Not really: if there is only one user of a file, then Windows knows this, but 
it changes to the behaviour you described when there are more users.


AND FINALLY, the latter behaviour has failed to work properly since Windows XP 
(it worked fine with Windows 98).  If you use programs that read/write the 
same shared files, you can be sure data corruption will happen.


You have to set
locking = yes
oplocks = no
level2 oplocks = no

to make it work properly, but then it is even slower.
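
For reference, those would normally live in smb.conf's [global] section (or per
share); a sketch only, parameter names as in Samba 3.x:

[global]
    locking = yes
    oplocks = no
    level2 oplocks = no
    # alternatively, disable oplocks only for the shared data files,
    # e.g. veto oplock files = /*.mdb/*.dbf/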


This basically means that for almost every single IO, you need to ask
the server for something, which involves network traffic and round-trip
delays.
Not that; the problem is that Windows does not use all free memory for 
caching, as it does with a local or local-looking (iSCSI) disk.





Re: iSCSI vs. SMB with ZFS.

2012-12-17 Thread Rick Macklem
Wojciech Puchar wrote:
  With a network file system (either SMB or NFS, it doesn't matter),
  you
  need to ask the server for *each* of the following situations:
  * to ask the server if a file has been changed so the client can
  use
  cached data (if the protocol supports it)
  * to ask the server if a file (or a portion of a file) has been
  locked
  by another client
 
 not really if there is only one user of file - then windows know this,
 but
 change to behaviour you described when there are more users.
 
 AND FINALLY the latter behaviour fails to work properly since windows
 XP
 (worked fine with windows 98). If you use programs that read/write
 share
 same files you may be sure data corruption would happen.
 
 you have to set
 locking = yes
 oplocks = no
 level2 oplocks = no
 
 to make it work properly but even more slow!.
 
Btw, NFSv4 has delegations, which are essentially level2 oplocks. They can
be enabled for a server if the volumes exported via NFSv4 are not being
accessed locally (including Samba). For them to work, the nfscbd needs to
be running on the client(s) and the clients must have IP addresses visible
to the server for a callback TCP connection (no firewalls or NAT gateways).

Even with delegations working, the client caching is limited to the buffer
cache.
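
For anyone wanting to experiment with that, a minimal sketch of the knobs
involved (the rc.conf names are standard; the delegation sysctl is from
memory, so double-check it before relying on it):

# server side, /etc/rc.conf
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
# and allow the server to hand out delegations:
#   sysctl vfs.nfsd.issue_delegations=1

# client side, /etc/rc.conf
nfs_client_enable="YES"
nfscbd_enable="YES"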

I have an experimental patch that uses on-disk caching in the client for
delegated files (I call it packrats), but it is not ready for production
use. Now that I have the 4.1 client in place, I plan to get back to working
on it.

rick

  This basically means that for almost every single IO, you need to
  ask
  the server for something, which involves network traffic and
  round-trip
  delays.
 Not that. The problem is that windows do not use all free memory for
 caching as with local or local (iSCSI) disk.
 
 


Re: iSCSI vs. SMB with ZFS.

2012-12-17 Thread Zaphod Beeblebrox
Does windows 7 support nfs v4, then?  Is it expected (ie: is it worthwhile
trying) that nfsv4 would perform at a similar speed to iSCSI?  It would
seem that this at least requires active directory (or this user name
mapping ... which I remember being hard).


Re: iSCSI vs. SMB with ZFS.

2012-12-17 Thread Rick Macklem
Zaphod Beeblebrox wrote:
 Does windows 7 support nfs v4, then? Is it expected (ie: is it
 worthwhile
 trying) that nfsv4 would perform at a similar speed to iSCSI? It would
 seem that this at least requires active directory (or this user name
 mapping ... which I remember being hard).

As far as I know, there is no NFSv4 in Windows. I only made the comment
(which I admit was a bit off topic) because the previous post had stated
"SMB or NFS, they're the same", or something like that.

There was work on an NFSv4 client for Windows being done by CITI at the
Univ. of Michigan funded by Microsoft research, but I have no idea if it
was ever released.

rick



Re: iSCSI vs. SMB with ZFS.

2012-12-17 Thread Mehmet Erol Sanliturk
On Mon, Dec 17, 2012 at 2:22 PM, Rick Macklem rmack...@uoguelph.ca wrote:

 Zaphod Beeblebrox wrote:
  Does windows 7 support nfs v4, then? Is it expected (ie: is it
  worthwhile
  trying) that nfsv4 would perform at a similar speed to iSCSI? It would
  seem that this at least requires active directory (or this user name
  mapping ... which I remember being hard).

 As far as I know, there is no NFSv4 in Windows. I only made the comment
 (which I admit was a bit off topic), because the previous post had stated
  SMB or NFS, they're the same or something like that.)

 There was work on an NFSv4 client for Windows being done by CITI at the
 Univ. of Michigan funded by Microsoft research, but I have no idea if it
 was ever released.

 rick

 


http://www.citi.umich.edu/projects/nfsv4/
Projects: NFS Version 4 Open Source Reference Implementation
We are developing an implementation of NFSv4 and NFSv4.1 for Linux


http://www.citi.umich.edu/projects/nfsv4/windows/
http://www.citi.umich.edu/projects/nfsv4/windows/readme.html


http://www.citi.umich.edu/projects/


Thank you very much.

Mehmet Erol Sanliturk



Re: iSCSI vs. SMB with ZFS.

2012-12-17 Thread Wojciech Puchar

you cannot compare file serving and block device serving.



On Mon, 17 Dec 2012, Zaphod Beeblebrox wrote:


Does windows 7 support nfs v4, then?  Is it expected (ie: is it worthwhile 
trying) that nfsv4 would perform at a similar speed to
iSCSI?  It would seem that this at least requires active directory (or this 
user name mapping ... which I remember being hard).



iSCSI vs. SMB with ZFS.

2012-12-12 Thread Zaphod Beeblebrox
So... I have two machines.  My Fileserver is a core-2-duo machine with
FreeBSD-9.1-ish ZFS, istgt and samba 3.6.  My workstation is windows 7
on an i7.  Both have GigE and are connected directly via a managed
switch with jumbo frames (specifically an MTU of 9016) enabled.  Both are using
tagged VLAN frames to the switch (if that matters at all).

Some time ago, I created a 2T iSCSI disk on ZFS to serve the Steam
directory (games) on my C drive as it was growing rather large.  I've
been quite happy with this.  The performance of the iSCSI disk is
about the same as the local disk for some operations --- faster for
some, slower for others.  The workstation has 12G of memory and it's
my perception that iSCSI is heavily cached and that this enhances its
performance.  The second launch of a game ... or the second switch
into an area (i.e. loading a specific piece of geometry again) is very
fast.

But this is imperfect.  The iSCSI disk reserves all of its space, and
the files on the disk are only accessible to the computer that mounts
it.
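
As an aside, the full reservation is a property of how the backing zvol was
created rather than of iSCSI itself; a sparse volume avoids it.  A sketch,
with the pool/volume names made up:

# ordinary zvol: the full 2T is reserved up front
zfs create -V 2T tank/steam

# sparse zvol: no reservation, blocks are allocated as they are written
# (watch pool free space, since the initiator can't see it shrinking)
zfs create -s -V 2T tank/steam

istgt then exports /dev/zvol/tank/steam as the LUN either way.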

The most recent Steam update supported an easy way to put steam
folders on other disks and partitions.  I created another Steam folder
on an SMB share from the same server and proceeded to move one of my
games there.

The performance on the SMB share is abysmal compared to the
performance on the iSCSI share.  At the very least, there seems to be
little benefit to launching the same application twice --- which is
most likely Windows' fault.

I haven't done any major amount of tuning on the SMB share lately, but
the last time I cared, it was set up reasonably... with TCP_NODELAY and
whatnot.  I also notice that my copy of smbd runs with 1 thread
(according to top) rather than the 11 threads that istgt uses.
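
For what it's worth, the Samba 3.x parameters people usually reach for with
this kind of workload look roughly like the following; a sketch only, and
whether they actually help here is exactly the open question:

[global]
    socket options = TCP_NODELAY SO_SNDBUF=131072 SO_RCVBUF=131072
    use sendfile = yes
    aio read size = 16384
    aio write size = 16384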

Does this breakdown of performance square with other's experiences?
Will SMB always have significantly less performance than iSCSI coming
from ZFS?


Re: iSCSI vs. SMB with ZFS.

2012-12-12 Thread Wojciech Puchar

about the same as the local disk for some operations --- faster for
some, slower for others.  The workstation has 12G of memory and it's
my perception that iSCSI is heavily cached and that this enhances it's


any REAL test means doing something that will not fit in cache.


But this is imperfect.  The iSCSI disk reserves all of it's space and
the files on the disk are only accessible to the computer that mounts
it.


It is even more imperfect: you lay out one filesystem over another 
filesystem.  Not only is performance degraded, but you also don't have 
parallel access to files on this disk.



The performance on the SMB share is abysmal compared to the
performance on the iSCSI share.  At the very least, there seems to be
little benifit to launching the same application twice --- which is
most likely windows fault.


This is the SMB protocol; sorry, it is stupid, and it doesn't make real use of 
the cache.  This is how Windows file sharing works: fine if you just want to copy 
files, not fine if you work on them.



Will SMB always have significantly less performance than iSCSI coming

Depends what you do, but yes, SMB is not efficient.

I am happy with SMB as it is enough for storing users' or shared documents,
and it is quite fast at large file copies etc.,

but terrible at random file access.




Re: iSCSI vs. SMB with ZFS.

2012-12-12 Thread Wojciech Puchar
As you show, your need for unshared data on a single workstation is on the 
order of a single large hard drive.


Reducing the drive count on the file server by one and connecting that one drive 
directly to the workstation is the best solution.



Re: iSCSI vs. SMB with ZFS.

2012-12-12 Thread Zaphod Beeblebrox
On Wed, Dec 12, 2012 at 5:16 PM, Wojciech Puchar
woj...@wojtek.tensor.gdynia.pl wrote:
 about the same as the local disk for some operations --- faster for
 some, slower for others.  The workstation has 12G of memory and it's
 my perception that iSCSI is heavily cached and that this enhances it's

 any REAL test means doing something that will not fit in cache.

That's plainly not true at all on its face.  It depends on what
you're testing.  In this particular test, I'm looking at the
performance of the components on a singular common task --- that of
running a game.  It's common to run a game more than once and it's
common to move from area to area in the game loading, unloading and
reloading the same data.  My test is a valid comparison of the two
modes of loading the game ... from iSCSI and from SMB.

You could criticize me for several things --- I only tested two games
or I have unrealistically large and powerful hardware, but really...
consider what you are testing before you pontificate on test design.

And even in the case where you want to look at the enterprise
performance of a system, knowing both the cache performance and the
disk performance is better than only knowing one or the other.  Throughput
is a combination of these features.  Pure disk performance serves as a
lower bound, but cache performance (especially on some of the ZFS
systems people are creating these days ... with 100's of gigs of RAM)
is an equally valid statistic and optimization.


Re: iSCSI vs. SMB with ZFS.

2012-12-12 Thread Wojciech Puchar

common to move from area to area in the game loading, unloading and
reloading the same data.  My test is a valid comparison of the two
modes of loading the game ... from iSCSI and from SMB.


I don't know how Windows caches network shares (iSCSI is treated as 
local, not network).  That is the main problem here, IMHO.



Re: iSCSI vs. SMB with ZFS.

2012-12-12 Thread Reko Turja
-Original Message- 
From: Zaphod Beeblebrox 
Sent: Wednesday, December 12, 2012 6:57 PM 
To: FreeBSD Hackers 
Subject: iSCSI vs. SMB with ZFS. 

 So... I have two machines.  My Fileserver is a core-2-duo machine with
 FreeBSD-9.1-ish ZFS, istgt and samba 3.6.  My workstation is windows 7
 on an i7.  Both have GigE and are connected directly via a managed
 switch with jumbo packets (specifically 9016) enabled.  Both are using
 tagged vlan packets to the switch (if that matters at all).

My experience with Samba has been that it's slow whatever one does to tweak it 
(probably just too Linux-centric code to start with, or whatever...).  Just as 
another datapoint: have you tried NFS yet?  Win7 has an NFS client available as an 
OS component, although it is not installed by default.

-Reko

Zfs import issue

2012-10-03 Thread Ram Chander
Hi,

 I am importing a ZFS snapshot into FreeBSD 9 from another host running
FreeBSD 9.  While the import is happening, it locks the filesystem: df hangs
and I am unable to use the filesystem.  Once the import completes, the filesystem
is back to normal and read/write works fine.  The same doesn't happen on
Solaris/OpenIndiana.

# uname -an
FreeBSD hostname 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30
UTC 2012 r...@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
amd64

Zfs ver: 28


Any input would be helpful.  Is there any way to overcome this freeze?


Regards,
Ram


Looking for testers / feedback for ZFS recieve properties options

2012-09-30 Thread Steven Hartland

We encountered a problem receiving a full ZFS stream from
a disk we had backed up.  The receive was aborting because the
quota was being exceeded, so I did some digging around and found
that Oracle ZFS now has -x and -o options for zfs receive,
as documented here:
http://docs.oracle.com/cd/E23824_01/html/821-1462/zfs-1m.html

Seems this has been raised as a feature request upstream:
https://www.illumos.org/issues/2745

Anyway, being stuck with a backup we couldn't restore, I had
a go at implementing these options and have a prototype up
and running, which I'd like feedback on.

This patch also adds a -l option, which allows the received streams to
be limited to those specified.  That is another option which I think
would be useful, and it seemed relatively painless to add.
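
For context, the kind of invocation this is aimed at looks roughly like the
following; the flag behaviour is modelled on the Oracle options linked above,
so treat the exact syntax as provisional and see the patch for the -l details:

# receive the backup, but drop the quota that made the receive abort
# and park it under an alternate mountpoint (names are made up)
zfs send -R tank/data@backup | \
    zfs receive -x quota -o mountpoint=/mnt/restore backup/data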

The initial version of the patch which is based off
8.3-RELEASE can be found here:
http://blog.multiplay.co.uk/dropzone/freebsd/zfs-recv-properties.patch

Any feedback appreciated

   Regards
   Steve




Re: FreeBSD ZFS source

2012-08-03 Thread Fredrik
Oliver and Chris, thanks. 

On 3 Aug 2012, at 00:19, Oliver Pinter wrote:

 http://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/common/
 http://svnweb.freebsd.org/base/head/cddl/contrib/opensolaris/lib/
 
 
 On 8/2/12, Fredrik starkbe...@gmail.com wrote:
 Hello,
 
 Excuse me for this newb question but exactly where are the current ZFS files
 located? I have been looking at the CVS on freebsd.org under
 /src/contrib/opensolaris/ but that does not seem to be the current ones. Is
 this correct?
 
 Regards
 



FreeBSD ZFS source

2012-08-02 Thread Fredrik
Hello,

Excuse me for this newbie question, but exactly where are the current ZFS files 
located?  I have been looking at the CVS on freebsd.org under 
/src/contrib/opensolaris/, but those do not seem to be the current ones.  Is 
this correct?

Regards


Re: FreeBSD ZFS source

2012-08-02 Thread Chris Nehren
On Thu, Aug 02, 2012 at 22:48:50 +0200 , Fredrik wrote:
 Hello,
 
 Excuse me for this newb question but exactly where are the current ZFS
 files located? I have been looking at the CVS on freebsd.org under
 /src/contrib/opensolaris/ but that does not seem to be the current
 ones. Is this correct?

$ find /usr/src -type d -iname zfs
/usr/src/cddl/contrib/opensolaris/cmd/zfs
/usr/src/cddl/sbin/zfs
/usr/src/lib/libprocstat/zfs
/usr/src/sys/boot/zfs
/usr/src/sys/cddl/boot/zfs
/usr/src/sys/cddl/contrib/opensolaris/common/zfs
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
/usr/src/sys/modules/zfs
/usr/src/tools/regression/zfs

Those are probably a good start. Some of them just contain a Makefile
pointing you elsewhere in the tree, though.

I might have missed something, and I'm sure someone will correct me if I
have.

-- 
Thanks and best regards,
Chris Nehren




Re: FreeBSD ZFS source

2012-08-02 Thread Oliver Pinter
http://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/common/
http://svnweb.freebsd.org/base/head/cddl/contrib/opensolaris/lib/


On 8/2/12, Fredrik starkbe...@gmail.com wrote:
 Hello,

 Excuse me for this newb question but exactly where are the current ZFS files
 located? I have been looking at the CVS on freebsd.org under
 /src/contrib/opensolaris/ but that does not seem to be the current ones. Is
 this correct?

 Regards



Root on ZFS GPT and boot to ufs partition

2012-01-23 Thread Andrey Fesenko
The system was installed following the manual at
http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot, with an additional
freebsd-ufs partition (ada0p2).

uname -a
FreeBSD beastie.mydomain.local 10.0-CURRENT FreeBSD 10.0-CURRENT #0
r229812: Mon Jan  9 19:08:10 MSK 2012
andrey@beastie.mydomain.local:/usr/obj/usr/src/sys/W_BOOK  amd64

gpart show
=>        34  625142381  ada0  GPT  (298G)
          34        128     1  freebsd-boot  (64k)
         162   26621952     2  freebsd-ufs  (12G)
    26622114    8388608     3  freebsd-swap  (4.0G)
    35010722  590131693     4  freebsd-zfs  (281G)

The boot code is the protective MBR (pmbr) plus the gptzfsboot loader.

The old boot loader offered F1, F2, F3 choices; the new one doesn't :(

Is there a way to boot the system from the freebsd-ufs partition (ada0p2)?


Re: Root on ZFS GPT and boot to ufs partition

2012-01-23 Thread Volodymyr Kostyrko

Andrey Fesenko wrote:

System install for manual http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot
only + freebsd-ufs (ada0p2)

uname -a

FreeBSD beastie.mydomain.local 10.0-CURRENT FreeBSD 10.0-CURRENT #0
r229812: Mon Jan  9 19:08:10 MSK 2012
andrey@beastie.mydomain.local:/usr/obj/usr/src/sys/W_BOOK  amd64

gpart show

=34  625142381  ada0  GPT  (298G)
  34128 1  freebsd-boot  (64k)
 162   26621952 2  freebsd-ufs  (12G)
266221148388608 3  freebsd-swap  (4.0G)
35010722  590131693 4  freebsd-zfs  (281G)
boot code MBR (pmbr) and gptzfsboot loader

In the old loader was F1,F2,F3 new no :(

Is there a way to boot system freebsd-ufs (ada0p2)


`gpart set -a bootonce -i 2 ada0` should do.

--
Sphinx of black quartz judge my vow.


Re: Root on ZFS GPT and boot to ufs partition

2012-01-23 Thread Andrey Fesenko
On Mon, Jan 23, 2012 at 7:18 PM, Volodymyr Kostyrko c.kw...@gmail.com wrote:
 Andrey Fesenko wrote:

 System install for manual http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot
 only + freebsd-ufs (ada0p2)

 uname -a

 FreeBSD beastie.mydomain.local 10.0-CURRENT FreeBSD 10.0-CURRENT #0
 r229812: Mon Jan  9 19:08:10 MSK 2012
 andrey@beastie.mydomain.local:/usr/obj/usr/src/sys/W_BOOK  amd64

 gpart show

 =        34  625142381  ada0  GPT  (298G)
          34        128     1  freebsd-boot  (64k)
         162   26621952     2  freebsd-ufs  (12G)
    26622114    8388608     3  freebsd-swap  (4.0G)
    35010722  590131693     4  freebsd-zfs  (281G)
 boot code MBR (pmbr) and gptzfsboot loader

 In the old loader was F1,F2,F3 new no :(

 Is there a way to boot system freebsd-ufs (ada0p2)


 `gpart set -a bootonce -i 2 ada0` should do.

 --
 Sphinx of black quartz judge my vow.

# gpart set -a bootonce -i 2 ada0
bootonce set on ada0p2
# shutdown -r now

No, that doesn't work.  After the reboot it still boots from freebsd-zfs (ada0p4).


Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device

2011-08-23 Thread Ivan Voras

On 23/08/2011 03:23, Peter Jeremy wrote:

On 2011-Aug-22 12:45:08 +0200, Ivan Voras ivo...@freebsd.org wrote:

It would be suboptimal but only for the slight waste of space that would
have otherwise been reclaimed if the block or fragment size remained 512
or 2K. This waste of space is insignificant for the vast majority of
users and there are no performance penalties, so it seems that switching
to 4K sectors by default for all file systems would actually be a good idea.


This is heavily dependent on the size distribution.  I can't quickly
check for ZFS but I've done some quick checks on UFS.  The following
are sizes in MB for my copies of the listed trees with different UFS
frag size.  These include directories but not indirect blocks:

    1b  512b  1024b  2048b  4096b
  4430  4511  4631   4875   5457  /usr/ncvs
  4910  5027  5181   5499   6133  Old FreeBSD SVN repo
   299   370   485    733   1252  /usr/ports checked out from CVS
   467   485   509    557    656  /usr/src 8-stable checkout from CVS

Note that the ports tree grew by 50% going from 1K to 2K frags and
will grow by another 70% going to 4KB frags.  Similar issues will
be seen when you have lots of small files.


I agree but there are at least two things going for making the increase 
anyway:


1) 2 TB drives cost $80
2) Where the space is really important, the person in charge usually 
knows it and can choose a non-default size like 512b fragments.





Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device

2011-08-23 Thread Aled Morris
On 23 August 2011 10:52, Ivan Voras ivo...@freebsd.org wrote:


 I agree but there are at least two things going for making the increase
 anyway:

 1) 2 TB drives cost $80
 2) Where the space is really important, the person in charge usually knows
 it and can choose a non-default size like 512b fragments.



Helpers like sysinstall should help with choosing smaller blocks for
smaller drives (and especially SSDs).

Aled


Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device

2011-08-23 Thread Ivan Voras

On 23/08/2011 11:59, Aled Morris wrote:

On 23 August 2011 10:52, Ivan Voras ivo...@freebsd.org wrote:



I agree but there are at least two things going for making the increase
anyway:

1) 2 TB drives cost $80
2) Where the space is really important, the person in charge usually knows
it and can choose a non-default size like 512b fragments.


helpers like sysinstall should help with choosing the smaller blocks for
smaller drives (especially SSD)


Only via hints and help text. Too much magic in the installer leads to 
awkward choices :)


(e.g. first you need to distinguish between a VM with a small drive, a 
small SSD, or a small SAN volume... it quickly turns into an 
AI-class problem).




Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device

2011-08-22 Thread Ivan Voras

On 19/08/2011 14:21, Aled Morris wrote:

On 19 August 2011 11:15, Tom Evans tevans...@googlemail.com wrote:


On Thu, Aug 18, 2011 at 6:50 PM, Yuriy...@rawbw.com  wrote:

Some of the latest hard drives have logical sectors of 512 bytes when they
actually have 4k physical sectors.



...

Shouldn't UFS and ZFS drivers be able to either read the right sector size
from the underlying device or at least issue a warning?


The device never reports the actual sector size, so unless FreeBSD
keeps a database of 4k sector hard drives that report as 512 byte
sector hard drives, there is nothing that can be done.


At what point should we change the default in newfs/zfs to 4k?


It is already changed for UFS in 9.


I guess formatting the filesystem for 4k sectors on a 512b drive would still
work but it would be suboptimal.  What would the performance penalty be in
reality?


It would be suboptimal but only for the slight waste of space that would 
have otherwise been reclaimed if the block or fragment size remained 512 
or 2K. This waste of space is insignificant for the vast majority of 
users and there are no performance penalties, so it seems that switching 
to 4K sectors by default for all file systems would actually be a good idea.
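
For reference, the usual ways to get a 4K-aligned filesystem today on a drive
that still reports 512-byte sectors (device names here are just examples; the
gnop step is the common trick for forcing ashift=12 on current ZFS):

# UFS: 32K blocks / 4K fragments (the defaults newfs uses on 9.x)
newfs -b 32768 -f 4096 /dev/ada0p2

# ZFS: create the pool through a 4K gnop provider, then re-import without it
gnop create -S 4096 /dev/ada0p4
zpool create tank /dev/ada0p4.nop
zpool export tank
gnop destroy /dev/ada0p4.nop
zpool import tank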




