Hey all,

   So here is an update.  I tried multipath and it made no difference
for my problem.  I was able to get umount to work, but the hang just
moved over to the logout.  For example, here is what I do for error
recovery.

1.  Disconnect the iSCSI
2.  Wait for the timeout.
3.  Wait till all the iSCSI debug messages subside and the volume gets
remounted as readonly.
4.  Reconnect the iSCSI
5.  Execute a 'umount -lf' on the mount point <-- In the past I had
stopped my app from trying to write here; now I just umount.
6.  Execute a logout: iqn.2000-08.com.intransa:ivsms.dg1.storage1
--logout
-->  It ends up freezing here.
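
For reference, steps 5 and 6 can be written as a small script.  This is
only a sketch: the portal, target and mount point are copied from the
script quoted further down in this mail, and DRY_RUN, run and
umount_then_logout are hypothetical names made up here so the sequence
can be printed without touching a live session.

```shell
#!/bin/bash
# Sketch of steps 5 and 6. Values below come from the script quoted
# later in this thread; DRY_RUN, run and umount_then_logout are
# hypothetical helpers, not part of open-iscsi.

PORTAL="172.19.153.14:3260,0"
TARGET="iqn.2000-08.com.intransa:ivsms.dg1.storage1"
MOUNTPOINT="/media1_0"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 5: lazy, forced unmount. Step 6: log out of the target.
umount_then_logout() {
    run umount -lf "$MOUNTPOINT" || return 1
    run iscsiadm -m node -p "$PORTAL" -T "$TARGET" --logout
}
```

After sourcing the file, 'DRY_RUN=1 umount_then_logout' prints the two
commands in order.  Without DRY_RUN they are executed for real, and the
second one is the step that freezes.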

     I tried a bunch of stuff to get around this, but nothing worked
for me.  So I upgraded my kernel from 2.6.16 to 2.6.18 and all of a
sudden the problem was gone.  I ran about 50 tests and could not
reproduce it.  Now I'm wondering what could have changed from 2.6.16 to
2.6.18 that would remedy this problem.  I'd rather just patch 2.6.16
with whatever changes make this work, because it's hard for me to
upgrade to a new kernel without causing other problems.  Does anyone
out there know?  I'm going to look through the changelogs and see if
anything stands out, but if any of you know, I'd
appreciate your input.
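
One way to go through the changelogs, assuming a local git clone of the
kernel tree that has the v2.6.16 and v2.6.18 tags, is to list only the
commits under drivers/scsi that mention iSCSI.  This is a sketch, not a
complete search: the relevant change may well be in the filesystem or
block layer instead, and KERNEL_TREE, DRY_RUN, run and
list_iscsi_changes are hypothetical names used only for this example.

```shell
#!/bin/bash
# Sketch: list commits between two kernel tags that mention iSCSI.
# KERNEL_TREE is an assumed path to a local kernel git clone.

KERNEL_TREE="${KERNEL_TREE:-.}"
OLD_TAG="v2.6.16"
NEW_TAG="v2.6.18"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One line per commit, case-insensitive match on "iscsi" in the commit
# message, limited to files under drivers/scsi.
list_iscsi_changes() {
    run git -C "$KERNEL_TREE" log --oneline -i --grep=iscsi \
        "$OLD_TAG..$NEW_TAG" -- drivers/scsi
}
```

Dropping the '-- drivers/scsi' path limit or changing the --grep
pattern widens the search to other subsystems.
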
     Mike, you said in your response:

"The other thing is that if you get those errors and then you pull the
cable back in so IO can execute again, you are in a weird state. I am
not sure what the FS will do and supports at the point it has already
had some IO failed."

    Is this implying that the filesystems themselves are the culprit?

Thanks.

On Jun 23, 1:53 pm, Mike Christie <[EMAIL PROTECTED]> wrote:
> An Oneironaut wrote:
> >      I posted a little while back about this, but I still seem to be
> > having trouble with this issue.  Originally I tried to set up my iSCSI
> > connection so that it had a 24 day timeout period and the no-op timers
> > would be disabled.  However this timeout led to a variety of issues
> > including causing umount, reboot, and other commands to hang.
>
> Did you get IO errors before you tried those commands? If IO is still
> internally queued, you would want to run the iscsiadm command to logout
> which in this case would just fail everything. When the FS gets all the
> errors for the IO it had outstanding, I think you can then forcibly
> unmount it.
>
> The reboot command is going to hang if you have IO queued still. You
> need to do the logout command first (the iscsi init scripts should force
> a logout too).
>
>
>
> >      So in the end the long timeout proved to be too much trouble so I
> > moved back to the 120s timeout with noop timers enabled.  However even
> > this is causing me trouble.
> >      Currently I am using my iSCSI device to store video which means I
> > am sending a large amount of data over the network into my device at a
> > pretty high rate.  In my tests if I cut the connection sometimes
> > things will work out fine.  The connection gets cut and after 120s I
> > get a whole slew of iSCSI "queuing" errors and such and finally the
> > iSCSI device gets remounted as read only.  Once the error messages
> > stop if I stop all of my video archiving, reconnect the iSCSI device,
> > logout, umount the iSCSI, remount the iSCSI, log back in, and restart
> > my video archiver everything will work fine.
> >     However in other cases when I cut the connection the iSCSI debugs
> > won't be as numerous and it goes to read only mode almost
> > immediately.  When I try the above steps to recover, my system hangs
> > like before on the umount.  I used KDB to get a dump of what is going
> > on during the umount and will add it to this message.  It appears that
> > the umount process has been context switched out, waiting for IO to
> > complete.  The outstanding IO is the syncing of the buffers, which
> > never completes, so umount is never woken up.
> >     I've tried numerous things to get around this umount issue
> > including a variety of umount flags, the remount command, long delays
> > in my code and the kernel code.  But nothing has worked up to this
> > point.  I'm currently working on version 2.6.16 of the kernel with
> > open iSCSI version 2.0-865.9.  I am going to try the latest and
> > greatest to see if that helps at all.  I'm convinced that the current
> > problem has something to do with a quick change to 'ro' mode vs a
> > slower change to 'ro' mode after timeout.
>
> > If you guys have any advice or insight I'd appreciate the help.  I
> > posted the debug in the files section.  The filename is
> > umount_hang.rtf.  I've bolded the area where the umount gets called.
>
> Is this the script you are running? I did not see the bolded stuff.
>
> #!/bin/bash -x
> umount -f /media1_0
> rmdir /media1_0
> iscsiadm -m node -p 172.19.153.14:3260,0 -T iqn.2000-08.com.intransa:ivsms.dg1.storage1 --logout
> iscsiadm -m node -p 172.19.153.14:3260,0 -T iqn.2000-08.com.intransa:ivsms.dg1.storage1 -o delete
>
> If you run this command and IO is queued due to the replacement_timeout
> (if you pulled a cable then the initiator detected it and was trying to
> log back in) not expiring yet then the unmount is going to hang or fail
> or who knows. It is not going to do what you want though.
>
> If you are trying to force the unmount and do not care about data
> getting written then you can just do the logout command, then do the
> unmount command.
>
> The other thing is that if you get those errors and then you pull the
> cable back in so IO can execute again, you are in a weird state. I am
> not sure what the FS will do and supports at the point it has already had
> some IO failed.
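
A minimal sketch of the ordering suggested in the reply above: log out
first so the queued IO is failed, then unmount.  It assumes that data
not yet written can be discarded.  The portal, target and mount point
are the ones from the quoted script; DRY_RUN, run and
logout_then_umount are hypothetical names used only for this example.

```shell
#!/bin/bash
# Sketch: logout first, then unmount. Values below come from the
# script quoted in this thread; the helper names are hypothetical.

PORTAL="172.19.153.14:3260,0"
TARGET="iqn.2000-08.com.intransa:ivsms.dg1.storage1"
MOUNTPOINT="/media1_0"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Logging out fails the IO that is still queued, so the filesystem
# sees the errors before the unmount is attempted.
logout_then_umount() {
    run iscsiadm -m node -p "$PORTAL" -T "$TARGET" --logout || return 1
    run umount -f "$MOUNTPOINT"
}
```

'DRY_RUN=1 logout_then_umount' prints the two commands without running
them, which is a cheap way to check the order before trying it on a
live session.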