A few comments below... but in general, what you are doing represents quite a
few hours of work, more hours than I have to spend on email :-(

On Sep 29, 2012, at 9:55 AM, Edward Ned Harvey (openindiana) 
<[email protected]> wrote:

> I am beating my head against a wall, and beginning to wonder if this is my 
> fault, or the system's fault.  Because it seems to me like it SHOULD work.  I 
> am very confident now that I got it all set up right...
> 
> I wonder if this is one of those areas that was unstable at the time of the 
> latest open source release, and something I'm being bullied into paying 
> Oracle for...  Is it?  I'm planning to try Solaris 11 today as a trial, to 
> see if it solves the problem.  I'm not sure whether I hope for the best, or 
> hope for it to fail too.
> 
> I have two identical systems, call them host1 and host2, sitting side by 
> side, both running OpenIndiana 151a6.  They are connected to a LAN, and their 
> second interfaces are connected to each other via a crossover cable.  They 
> have 6 disks each.  The first 2 disks (disk0 and disk1) are configured as an 
> OS mirror.  The remaining 4 disks (disk2 - disk5) are shared via iSCSI over 
> the crossover interface.
> 
> Each machine connects to its 4 local disks via iSCSI at 127.0.0.1, and to the 
> 4 remote disks via iSCSI using the other machine's crossover IP address.  
> Thankfully, the device names are the same on both sides, so I can abbreviate 
> and simplify things for myself by doing something like this:
> 
> export H1D2=c5t600144F00800273F3337506659260001d0
> ...
> export H2D5=c5t600144F0080027238750506658910004d0
> 
> I create two pools.  Each pool is made of two mirror vdevs, each pairing a 
> local disk with its remote counterpart.
> sudo zpool create HAstorage1 mirror $H1D2 $H2D2 mirror $H1D3 $H2D3
> sudo zpool create HAstorage2 mirror $H1D4 $H2D4 mirror $H1D5 $H2D5
> 
> I shut down the VMs, which are running without any problem on the local 
> storage.  I use zfs send to migrate them onto HAstorage1 and HAstorage2, 
> tweak their config files, and import them back into VirtualBox.  Launch them.  
> Everything goes perfectly.
> 
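
For reference, the migration step described above is just the standard
snapshot/send/receive sequence; the dataset and VM names below are hypothetical:

  # snapshot the VM's dataset on the old local pool, then replicate it into the HA pool
  sudo zfs snapshot -r localpool/vm1@migrate
  sudo zfs send -R localpool/vm1@migrate | sudo zfs receive -d HAstorage1
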
> I leave the machines in production for a while.  I continually monitor 
> /var/adm/messages and zpool status on both systems.  No problems.  I beat the 
> hell out of everything, close and reopen VirtualBox, restart all the guest 
> VMs, no problem.  Export the pool and import it on the other system.  Bring 
> up the guest machines on the other system.  Everything is awesome.  Until I 
> reboot one of the hosts.
> 
> It doesn't seem to matter whether I use "reboot -p" or "init 6"; it seems to 
> happen *sometimes* in both situations.
> 
> The one thing that is consistent: during the reboot, both on the way down and 
> on the way up, I'll see console messages about being unable to access a SCSI 
> device.  Sometimes the machine will actually *fail* to reboot, and I'll have 
> to power-reset.
> 
> After reboot, sometimes the pool appears perfectly healthy, and sometimes it 
> appears degraded, where ... this is so odd ... sometimes the offline disk is 
> the remote disk, and sometimes it is the local disk.  In either case, I zpool 
> clear the offending device, the resilver takes place, and the pool looks 
> healthy again.
> 
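
For anyone following along, the recovery being described is the standard
clear-and-resilver sequence (pool and device variables as defined earlier in
the message):

  sudo zpool status HAstorage1        # one side of a mirror shows up FAULTED or UNAVAIL
  sudo zpool clear HAstorage1 $H2D2   # clear whichever device is flagged
  sudo zpool status HAstorage1        # the resilver runs, then the pool reports ONLINE
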
> Then I launch a VM (a Linux VM).  The guest gets as far as grub, loading the 
> kernel, and starting to run the startup scripts, and before the guest is 
> fully up, it starts choking, drops into read-only mode, and fails to boot.  I 
> know it's going to fail before it fails, because it runs dog slow.  
> Meanwhile, if I watch /var/adm/messages on the host, I again see SCSI errors 
> being thrown.  Oddly, zpool status still looks normal.  Generally, VirtualBox 
> will choke so badly I'll have to force-kill it...  Sometimes I can gracefully 
> kill VirtualBox.  Sometimes it's so bad I can't even do that, and I'm forced 
> to power-reset.
> 
> I would normally suspect a hardware, Ethernet, or local disk driver problem 
> for this type of symptom - but I feel that's pretty solidly eliminated by a 
> few tests:
> 
> 1- As long as I don't reboot the host, the guest VMs can stay in production 
> the whole time.  All the traffic is going across the network; the mirror 
> simply works.  I can export the pool on one system, import it on the other, 
> and launch the VMs on the other host (sketched below, after test 2).  No 
> problem.  I can "zfs send" these filesystems across the interface and receive 
> on the other side, no problem.  My Ethernet error count stays zero according 
> to netstat, on both sides.  So I really don't think it's an Ethernet hardware 
> or bad cable problem.
> 
> 2- Only during reboot do I suddenly get SCSI errors in the log.  It seems to 
> me this shouldn't happen.
> 
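
The manual failover referenced in test 1 is nothing more than (pool name as in
the commands above):

  # on the host giving up the pool
  sudo zpool export HAstorage1
  # on the host taking it over
  sudo zpool import HAstorage1
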
> Surprisingly to me:  During the minute when one system is down, the other 
> system still shows nothing in /var/adm/messages, and still shows the pool as 
> healthy.

Default sd timeouts are 60 seconds. This has been discussed
ad nauseam.
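
If you want to inspect or shorten that timeout, the usual knob is sd_io_time;
the value below is only an illustration, so test before relying on it:

  # current sd timeout, in seconds (run as root)
  echo "sd_io_time/D" | mdb -k

  # to shorten it persistently, add a line like this to /etc/system and reboot:
  # set sd:sd_io_time = 15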

>  At first I wondered if this meant I had screwed up the mapping and was 
> actually using all the local disks, but that's not the case.  It turns out 
> the host that's still up treats the failure as simple I/O errors, increasing 
> the error count for the unavailable devices.  

Yes, of course.

> Only after the count gets sufficiently high does the running system finally 
> mark the offending device as offline/unavailable ("error count too high").  
> When the other system comes back up again, annoyingly, it doesn't 
> automatically bring the unavailable device back online.  I have to "zpool 
> clear" the offending device.  As soon as I clear one device, the other one 
> automatically comes online too.

Works as designed.
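
You can watch the same diagnosis from the FMA side, which is where the
"error count too high" decision comes from:

  fmdump -eV | tail     # the I/O ereports that feed the diagnosis engine
  fmadm faulty          # the resulting fault, once the threshold is crossed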

> I would expect a little more plug-and-play-ish intelligence here.  When the 
> remote iSCSI device disappears, it seems to me, it should be handled more 
> like somebody yanking out an external USB disk.

iSCSI devices are not removable disks.

>  Rather than retrying and increasing the error count, the running system 
> should be alerted that this disk is going offline and handle that situation 
> more gracefully ...  Well, that's just my expectation anyway.  I only care 
> right now because it's causing problems.

Given the constraints of IP, how do you propose that an iSCSI initiator will 
discover
that a remote LU is down?

> For a sanity check, here is how I created the iSCSI devices:
> (Before any comments: I know now that I don't need to be so obsessive about 
> keeping track of which local device maps to which iSCSI device name, but this 
> should still be fine, and it is what I did originally.  I would probably do 
> it differently next time, with only one target per host machine, and simply 
> allow the LUNs to map themselves.)
> 
> (on both systems)
> sudo pkg install pkg:/network/iscsi/target
> sudo svcadm enable -s svc:/system/stmf
> sudo svcadm enable -s svc:/network/iscsi/target
> sudo iscsiadm modify discovery --static enable
> 
> (on host1)
> export DISKNUM=2
> sudo sbdadm create-lu /dev/rdsk/c4t${DISKNUM}d0

NB, in the bad old days, you needed to use sbdadm here. Today,
you can often do the same thing directly with stmfadm, where you
can also assign an alias, so you can make the names something
memorable, rather than relying on a spreadsheet.
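
For example, something along these lines; the alias string is arbitrary, and
you should check stmfadm(1M) on your release for the exact property syntax:

  # create the LU directly with stmfadm and give it a human-readable alias
  sudo stmfadm create-lu -p alias=host1-disk2 /dev/rdsk/c4t2d0
  sudo stmfadm list-lu -v    # the alias shows up alongside the GUID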

> export GUID=(whatever it said)
> sudo stmfadm create-tg disk${DISKNUM}
> sudo stmfadm add-view -t disk${DISKNUM} $GUID
> sudo itadm create-target
> export TARGET=(whatever it said)

Normally, we would create a target and (usually) a target group
before adding any views.

You might consider using --alias and setting it to something memorable,
rather than relying on a spreadsheet.
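
In other words, something closer to this ordering; the names are only examples,
and the placeholders stand for the IQN and GUID reported by the earlier commands
(see itadm(1M) for the exact alias option on your release):

  sudo itadm create-target -l host1-disk2      # give the target a memorable alias
  sudo stmfadm create-tg disk2
  sudo stmfadm offline-target <target IQN>
  sudo stmfadm add-tg-member -g disk2 <target IQN>
  sudo stmfadm online-target <target IQN>
  sudo stmfadm add-view -t disk2 <GUID>        # add the view only after target and TG exist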

> sudo stmfadm offline-target $TARGET
> sudo stmfadm add-tg-member -g disk${DISKNUM} $TARGET
> sudo stmfadm online-target $TARGET
> sudo iscsiadm add static-config ${TARGET},127.0.0.1
> sudo format -e
> (Make a note of the new device name, and hit Ctrl-C.  Keep a record in a 
> spreadsheet somewhere: "host1 disk2 = c5t600144F00800273F3337506659260001d0")
> 
> (Now on the other host)
> export TARGET=(whatever it said)
> sudo stmfadm online-target $TARGET
> sudo iscsiadm add static-config ${TARGET},192.168.7.7
> 
> Repeat all the above for each disk and for each host.  In the end, I can see 
> all 8 disks from both hosts, and I have a spreadsheet to remember which iSCSI 
> device name maps to which physical device.
> 

The reason you don't see commercial support for configs like this
is that proving the system will be stable in all cases is very,
very difficult. The interdependencies in this architecture make
the proofs non-trivial, to say the least.
 -- richard

--
illumos Day & ZFS Day, Oct 1-2, 2012, San Francisco
www.zfsday.com
[email protected]
+1-760-896-4422
