> -----Original Message-----
> From: Fred Liu
> Sent: Monday, March 28, 2016 16:57
> To: 'Chris Siebenmann'; 'Richard Jahnel'
> Cc: 'omnios-discuss@lists.omniti.com'
> Subject: RE: [OmniOS-discuss] 4kn or 512e with ashift=12
>
>
> > -----Original Message-----
> > From: Fred Liu
> > Sent: Thursday, March 24, 2016 18:26
> > To: 'Chris Siebenmann'; Richard Jahnel
> > Cc: omnios-discuss@lists.omniti.com
> > Subject: RE: [OmniOS-discuss] 4kn or 512e with ashift=12
> >
> >
> > > -----Original Message-----
> > > From: Chris Siebenmann [mailto:c...@cs.toronto.edu]
> > > Sent: Wednesday, March 23, 2016 23:33
> > > To: Richard Jahnel
> > > Cc: Chris Siebenmann; Fred Liu; omnios-discuss@lists.omniti.com
> > > Subject: Re: [OmniOS-discuss] 4kn or 512e with ashift=12
> > >
> > > > It should be noted that using a 512e disk as a 512n disk subjects
> > > > you to a significant risk of silent corruption in the event of power
> > > > loss. Because 512e disks do a read-modify-write operation to modify a
> > > > 512-byte chunk of a 4K sector, ZFS won't know about the other
> > > > 7 corrupted 512e sectors in the event of a power loss during a
> > > > write operation. So when ZFS discards the incomplete txg on reboot, it
> > > > won't do anything about the other 7 512e sectors it doesn't know were
> > > > affected.
> > >
> > > This is true; under normal circumstances you do not want to use a
> > > 512e drive in an ashift=9 vdev. However, if you have a dead 512n
> > > drive and you have no remaining 512n spares, your choices are to run
> > > without redundancy, to wedge in a 512e drive and accept the
> > > potential problems on power failure (problems that can likely be
> > > fixed by scrubbing the pool afterwards), or to obtain enough additional
> > > drives (and perhaps server(s)) to entirely rebuild the pool on 512e
> > > drives with ashift=12.
> > >
> > > In this situation, running with a 512e drive and accepting the
> > > performance issues and potential exposure to power failures is
> > > basically the lesser evil. (I wish ZFS was willing to accept this,
> > > but it isn't.)
> > >
> > [Fred Liu]: I have a similar test here:
> >
> > [root@00-25-90-74-f5-04 ~]# zpool status
> >   pool: tank
> >  state: ONLINE
> >   scan: resilvered 187G in 21h9m with 0 errors on Thu Jan 15 08:05:16 2015
> > config:
> >
> >         NAME                       STATE     READ WRITE CKSUM
> >         tank                       ONLINE       0     0     0
> >           raidz2-0                 ONLINE       0     0     0
> >             c2t45d0                ONLINE       0     0     0
> >             c2t46d0                ONLINE       0     0     0
> >             c2t47d0                ONLINE       0     0     0
> >             c2t48d0                ONLINE       0     0     0
> >             c2t49d0                ONLINE       0     0     0
> >             c2t52d0                ONLINE       0     0     0
> >             c2t53d0                ONLINE       0     0     0
> >             c2t44d0                ONLINE       0     0     0
> >         spares
> >           c0t5000CCA6A0C791CBd0    AVAIL
> >
> > errors: No known data errors
> >
> >   pool: zones
> >  state: ONLINE
> >   scan: scrub repaired 0 in 2h45m with 0 errors on Tue Aug 12 20:24:30 2014
> > config:
> >
> >         NAME                       STATE     READ WRITE CKSUM
> >         zones                      ONLINE       0     0     0
> >           raidz2-0                 ONLINE       0     0     0
> >             c0t5000C500584AC07Bd0  ONLINE       0     0     0
> >             c0t5000C500584AC557d0  ONLINE       0     0     0
> >             c0t5000C500584ACB1Fd0  ONLINE       0     0     0
> >             c0t5000C500584AD7B3d0  ONLINE       0     0     0
> >             c0t5000C500584C30DBd0  ONLINE       0     0     0
> >             c0t5000C500586E54A3d0  ONLINE       0     0     0
> >             c0t5000C500586EF0CBd0  ONLINE       0     0     0
> >             c0t5000C50058426A0Fd0  ONLINE       0     0     0
> >         logs
> >           c4t0d0                   ONLINE       0     0     0
> >           c4t1d0                   ONLINE       0     0     0
> >         cache
> >           c0t55CD2E404BE9CB7Ed0    ONLINE       0     0     0
> >
> > errors: No known data errors
> >
> > [root@00-25-90-74-f5-04 ~]# format
> > Searching for disks...done
> >
> > AVAILABLE DISK SELECTIONS:
> >        0. c0t55CD2E404BE9CB7Ed0 <ATA-INTEL SSDSC2BW18-DC32-167.68GB>
> >           /scsi_vhci/disk@g55cd2e404be9cb7e
> >        1. c0t5000C500584AC07Bd0 <SEAGATE-ST91000640SS-0004-931.51GB>
> >           /scsi_vhci/disk@g5000c500584ac07b
> >        2. c0t5000C500584AC557d0 <SEAGATE-ST91000640SS-0004-931.51GB>
> >           /scsi_vhci/disk@g5000c500584ac557
> >        3. c0t5000C500584ACB1Fd0 <SEAGATE-ST91000640SS-0004-931.51GB>
> >           /scsi_vhci/disk@g5000c500584acb1f
> >        4. c0t5000C500584AD7B3d0 <SEAGATE-ST91000640SS-0004-931.51GB>
> >           /scsi_vhci/disk@g5000c500584ad7b3
> >        5. c0t5000C500584C30DBd0 <SEAGATE-ST91000640SS-0004-931.51GB>
> >           /scsi_vhci/disk@g5000c500584c30db
> >        6. c0t5000C500586E54A3d0 <SEAGATE-ST91000640SS-0004-931.51GB>
> >           /scsi_vhci/disk@g5000c500586e54a3
> >        7. c0t5000C500586EF0CBd0 <SEAGATE-ST91000640SS-0004-931.51GB>
> >           /scsi_vhci/disk@g5000c500586ef0cb
> >        8. c0t5000C50058426A0Fd0 <SEAGATE-ST91000640SS-0004-931.51GB>
> >           /scsi_vhci/disk@g5000c50058426a0f
> >        9. c0t5000CCA6A0C791CBd0 <ATA-Hitachi HTS54101-A480-931.51GB>
> >           /scsi_vhci/disk@g5000cca6a0c791cb
> >       10. c0t50000F0056425331d0 <ATA-SAMSUNG MMCRE28G-AS1Q-119.24GB>
> >           /scsi_vhci/disk@g50000f0056425331
> >       11. c2t44d0 <ATA-Hitachi HTS54101-A480-931.51GB>
> >           /pci@0,0/pci8086,1c10@1c/pci1000,3140@0/sd@2c,0
> >       12. c2t45d0 <ATA-Hitachi HTS54101-A480-931.51GB>
> >           /pci@0,0/pci8086,1c10@1c/pci1000,3140@0/sd@2d,0
> >       13. c2t46d0 <ATA-ST1000LM024 HN-M-0002-931.51GB>
> >           /pci@0,0/pci8086,1c10@1c/pci1000,3140@0/sd@2e,0
> >       14. c2t47d0 <ATA-ST1000LM024 HN-M-0002-931.51GB>
> >           /pci@0,0/pci8086,1c10@1c/pci1000,3140@0/sd@2f,0
> >       15. c2t48d0 <ATA-WDC WD10JPVT-08A-1A01-931.51GB>
> >           /pci@0,0/pci8086,1c10@1c/pci1000,3140@0/sd@30,0
> >       16. c2t49d0 <ATA-WDC WD10JPVT-75A-1A01-931.51GB>
> >           /pci@0,0/pci8086,1c10@1c/pci1000,3140@0/sd@31,0
> >       17. c2t52d0 <ATA-ST1000LM024 HN-M-0001-931.51GB>
> >           /pci@0,0/pci8086,1c10@1c/pci1000,3140@0/sd@34,0
> >       18. c2t53d0 <ATA-ST1000LM024 HN-M-0001-931.51GB>
> >           /pci@0,0/pci8086,1c10@1c/pci1000,3140@0/sd@35,0
> >       19. c4t0d0 <ATA-ANS9010_2NNN2NNN-_200-1.78GB>
> >           /pci@0,0/pci15d9,624@1f,2/disk@0,0
> >       20. c4t1d0 <ATA-ANS9010_2NNN2NNN-_200-1.78GB>
> >           /pci@0,0/pci15d9,624@1f,2/disk@1,0
> >
> > [root@00-25-90-74-f5-04 ~]# zpool replace tank c2t44d0 c0t5000CCA6A0C791CBd0
> > cannot replace c2t44d0 with c0t5000CCA6A0C791CBd0: devices have different sector alignment
> >
> > But in fact "c2t44d0" and "c0t5000CCA6A0C791CBd0" are the same model --
> > ATA-Hitachi HTS54101-A480-931.51GB. That is the HTS541010A9E680
> > (https://www.hgst.com/sites/default/files/resources/TS5K1000_ds.pdf),
> > which is a 512e HDD. The *only* difference is that "c2t44d0" is attached
> > to an LSI 1068 HBA and "c0t5000CCA6A0C791CBd0" is attached to an LSI 2308 HBA.
> >
> > [root@00-25-90-74-f5-04 ~]# zdb -l /dev/dsk/c2t44d0s0 | grep ashift
> >         ashift: 9
> >         ashift: 9
> >         ashift: 9
> >         ashift: 9
> >
> > [root@00-25-90-74-f5-04 ~]# zdb -l /dev/dsk/c0t5000CCA6A0C791CBd0s0 | grep ashift
> >         ashift: 12
> >         ashift: 12
> >         ashift: 12
> >         ashift: 12
> >
> > format> inq
> > Vendor:   ATA
> > Product:  Hitachi HTS54101
> > Revision: A480
> > format> q
> >
> > Adding "ATA Hitachi HTS54101", "physical-block-size:512", to sd.conf:
> >
> > [root@00-25-90-74-f5-04 ~]# update_drv -vf sd
> > Cannot unload module: sd
> > Will be unloaded upon reboot.
> > Forcing update of sd.conf.
> > sd.conf updated in the kernel.
> >
> > Rebooted the server, since "cfgadm -c unconfigure" can't work here.
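> >
> > For reference, the entry I was aiming for -- a sketch only, assuming the
> > usual sd-config-list syntax; as far as I know the vendor ID field has to be
> > padded with spaces to 8 characters and the VID/PID pair has to match the
> > inquiry data, otherwise the entry is silently ignored:
> >
> > # /kernel/drv/sd.conf -- make the 512e Hitachi present a 512-byte
> > # physical block size so it can be paired with the ashift=9 vdev
> > sd-config-list =
> >     "ATA     Hitachi HTS54101", "physical-block-size:512";
> >
> > # apply the change (disks that are already attached may still need a reboot)
> > update_drv -vf sd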
> >
> > [root@00-25-90-74-f5-04 ~]# zdb -l /dev/dsk/c0t5000CCA6A0C791CBd0s0 | grep ashift
> > [root@00-25-90-74-f5-04 ~]#
> >
> > No ashift in the output now.
> >
> > [root@00-25-90-74-f5-04 ~]# zdb -l /dev/dsk/c2t44d0s0 | grep ashift
> >         ashift: 9
> >         ashift: 9
> >         ashift: 9
> >         ashift: 9
> >
> > Same as before.
> >
> > [root@00-25-90-74-f5-04 ~]# zpool replace tank c2t44d0 c0t5000CCA6A0C791CBd0
> > cannot replace c2t44d0 with c0t5000CCA6A0C791CBd0: devices have different sector alignment
> >
> > Remove the spare:
> > [root@00-25-90-74-f5-04 ~]# zpool remove tank c0t5000CCA6A0C791CBd0
> > [root@00-25-90-74-f5-04 ~]#
> >
> > Add it back:
> > [root@00-25-90-74-f5-04 ~]# zpool add tank spare c0t5000CCA6A0C791CBd0
> > [root@00-25-90-74-f5-04 ~]#
> >
> > [root@00-25-90-74-f5-04 ~]# zpool status tank
> >   pool: tank
> >  state: ONLINE
> >   scan: resilvered 187G in 21h9m with 0 errors on Thu Jan 15 08:05:16 2015
> > config:
> >
> >         NAME                       STATE     READ WRITE CKSUM
> >         tank                       ONLINE       0     0     0
> >           raidz2-0                 ONLINE       0     0     0
> >             c2t45d0                ONLINE       0     0     0
> >             c2t46d0                ONLINE       0     0     0
> >             c2t47d0                ONLINE       0     0     0
> >             c2t48d0                ONLINE       0     0     0
> >             c2t49d0                ONLINE       0     0     0
> >             c2t52d0                ONLINE       0     0     0
> >             c2t53d0                ONLINE       0     0     0
> >             c2t44d0                ONLINE       0     0     0
> >         spares
> >           c0t5000CCA6A0C791CBd0    AVAIL
> >
> > errors: No known data errors
> >
> > Still not working:
> >
> > [root@00-25-90-74-f5-04 ~]# zpool replace tank c2t44d0 c0t5000CCA6A0C791CBd0
> > cannot replace c2t44d0 with c0t5000CCA6A0C791CBd0: devices have different sector alignment
> >
> > Maybe the sd.conf update is not correct.
>
> It looks like ZoL is a really helpful tool for overriding ashift when sd.conf
> doesn't work, even with the following errors:
>
> PANIC: blkptr at ffff8807e4b34000 has invalid CHECKSUM 10
> Showing stack for process 11419
> Pid: 11419, comm: z_zvol Tainted: P -- ------------ 2.6.32-573.3.1.el6.x86_64 #1
> Call Trace:
>  [<ffffffffa0472e9d>] ? spl_dumpstack+0x3d/0x40 [spl]
>  [<ffffffffa0472f2d>] ? vcmn_err+0x8d/0xf0 [spl]
>  [<ffffffff815391da>] ? schedule_timeout+0x19a/0x2e0
>  [<ffffffff81089c10>] ? process_timeout+0x0/0x10
>  [<ffffffff810a1697>] ? finish_wait+0x67/0x80
>  [<ffffffffa046e4bf>] ? spl_kmem_cache_alloc+0x38f/0x8c0 [spl]
>  [<ffffffffa0526e62>] ? zfs_panic_recover+0x52/0x60 [zfs]
>  [<ffffffffa04c7220>] ? arc_read_done+0x0/0x320 [zfs]
>  [<ffffffffa0577283>] ? zfs_blkptr_verify+0x83/0x420 [zfs]
>  [<ffffffff810a14b0>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffffa0578292>] ? zio_read+0x42/0x100 [zfs]
>  [<ffffffff81178cbd>] ? __kmalloc_node+0x4d/0x60
>  [<ffffffffa04c7220>] ? arc_read_done+0x0/0x320 [zfs]
>  [<ffffffffa04c9721>] ? arc_read+0x341/0xa70 [zfs]
>  [<ffffffffa04d1b34>] ? dbuf_prefetch+0x1f4/0x2e0 [zfs]
>  [<ffffffffa04d892a>] ? dmu_prefetch+0x1da/0x210 [zfs]
>  [<ffffffff8127e51d>] ? alloc_disk_node+0xad/0x110
>  [<ffffffffa0584ce7>] ? zvol_create_minor_impl+0x607/0x630 [zfs]
>  [<ffffffffa0585298>] ? zvol_create_minors_cb+0x88/0xf0 [zfs]
>  [<ffffffffa04dac36>] ? dmu_objset_find_impl+0x106/0x420 [zfs]
>  [<ffffffffa0585210>] ? zvol_create_minors_cb+0x0/0xf0 [zfs]
>  [<ffffffffa04dacfa>] ? dmu_objset_find_impl+0x1ca/0x420 [zfs]
>  [<ffffffffa0585210>] ? zvol_create_minors_cb+0x0/0xf0 [zfs]
>  [<ffffffffa04dacfa>] ? dmu_objset_find_impl+0x1ca/0x420 [zfs]
>  [<ffffffffa0585210>] ? zvol_create_minors_cb+0x0/0xf0 [zfs]
>  [<ffffffffa0585210>] ? zvol_create_minors_cb+0x0/0xf0 [zfs]
>  [<ffffffffa04dafa2>] ? dmu_objset_find+0x52/0x80 [zfs]
>  [<ffffffffa046dd26>] ? spl_kmem_alloc+0x96/0x1a0 [spl]
>  [<ffffffffa05850a2>] ? zvol_task_cb+0x392/0x3b0 [zfs]
>  [<ffffffffa0470ebf>] ? taskq_thread+0x25f/0x540 [spl]
>  [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
>  [<ffffffffa0470c60>] ? taskq_thread+0x0/0x540 [spl]
>  [<ffffffff810a101e>] ? kthread+0x9e/0xc0
>  [<ffffffff8100c28a>] ? child_rip+0xa/0x20
>  [<ffffffff810a0f80>] ? kthread+0x0/0xc0
>  [<ffffffff8100c280>] ? child_rip+0x0/0x20
> INFO: task z_zvol:11419 blocked for more than 120 seconds.
>       Tainted: P -- ------------ 2.6.32-573.3.1.el6.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> z_zvol        D 0000000000000003     0 11419      2 0x00000000
>  ffff8807e5daf610 0000000000000046 0000000000000000 0000000000000000
>  0000000000000000 ffff8807e5daf5e8 000000b57f764afb 0000000000020000
>  ffff8807e5daf5b0 0000000100074cde ffff8807e68bbad8 ffff8
>
> The pool can still be imported by ZoL and processed by zpool replace -o ashift=9:
>
> [root@livecd ~]# zpool status
>   pool: tank
>  state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Mon Mar 28 08:36:30 2016
>         1.95G scanned out of 1.47T at 1.76M/s, 242h30m to go
>         246M resilvered, 0.13% done
> config:
>
>         NAME                                              STATE     READ WRITE CKSUM
>         tank                                              ONLINE       0     0     0
>           raidz2-0                                        ONLINE       0     0     0
>             ata-Hitachi_HTS541010A9E680_J8400076GJU97C    ONLINE       0     0     0
>             ata-ST1000LM024_HN-M101MBB_S2R8J9BC502817     ONLINE       0     0     0
>             ata-ST1000LM024_HN-M101MBB_S2R8J9KC505621     ONLINE       0     0     0
>             ata-WDC_WD10JPVT-08A1YT2_WD-WXD1A4355927      ONLINE       0     0     0
>             ata-WDC_WD10JPVT-75A1YT0_WXP1EA2KFK12         ONLINE       0     0     0
>             ata-ST1000LM024_HN-M101MBB_S318J9AF191087     ONLINE       0     0     0
>             ata-ST1000LM024_HN-M101MBB_S318J9AF191090     ONLINE       0     0     0
>             spare-7                                       ONLINE       0     0     0
>               ata-Hitachi_HTS541010A9E680_J8400076GJ0KZD  ONLINE       0     0     0
>               sdw                                         ONLINE       0     0     0  (resilvering)
>         spares
>           sdw                                             INUSE     currently in use
>
> It can be a good remedy for old storage systems that still run 512n HDDs but
> have no 512n spares left.
>
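> Roughly, the recovery went something like the sketch below, run from the ZoL
> live environment (the names in angle brackets are placeholders for the vdev
> being replaced and the spare; -N keeps the datasets unmounted):
>
> # import the pool without mounting any datasets
> zpool import -f -N -d /dev/disk/by-id tank
>
> # force the replacement label to ashift=9 despite the 4K physical
> # sector size reported by the 512e spare
> zpool replace -o ashift=9 tank <old-disk> <new-disk>
>
> # hand the pool back to illumos (the resilver carries on after import there)
> zpool export tank
>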
Switching back to illumos:

[root@00-25-90-74-f5-04 ~]# zpool status tank
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Mar 28 20:36:30 2016
        2.29G scanned out of 1.47T at 1.63M/s, 261h27m to go
        288M resilvered, 0.15% done
config:

        NAME                         STATE     READ WRITE CKSUM
        tank                         ONLINE       0     0     0
          raidz2-0                   ONLINE       0     0     0
            c2t45d0                  ONLINE       0     0     0
            c2t46d0                  ONLINE       0     0     0
            c2t47d0                  ONLINE       0     0     0
            c2t48d0                  ONLINE       0     0     0
            c2t49d0                  ONLINE       0     0     0
            c2t52d0                  ONLINE       0     0     0
            c2t53d0                  ONLINE       0     0     0
            spare-7                  ONLINE       0     0     0
              c2t44d0                ONLINE       0     0     0
              c0t5000CCA6A0C791CBd0  ONLINE       0     0     0  (resilvering)
        spares
          c0t5000CCA6A0C791CBd0      INUSE     currently in use

errors: No known data errors

Thanks.

Fred
_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss