On 08/05/2009 11:33 AM, Mike Christie wrote:
> On 08/05/2009 11:26 AM, Mike Christie wrote:
>> On 08/05/2009 11:01 AM, Erez Zilber wrote:
>>> On Wed, Aug 5, 2009 at 6:19 PM, Mike Christie <micha...@cs.wisc.edu>
>>> wrote:
>>>> On 08/04/2009 01:12 PM, Erez Zilber wrote:
>>>>> On Tue, Aug 4, 2009 at 8:17 PM, Mike Christie <micha...@cs.wisc.edu>
>>>>> wrote:
>>>>>> Erez Zilber wrote:
>>>>>>> I'm running with open-iscsi.git HEAD + the check suspend bit patch +
>>>>>>> the wake xmit on error patch. If I disconnect the cable on the
>>>>>>> initiator side (even while not running IO), I see that after sending
>>>>>>> the signal, the iscsi_q_XX thread reaches 100% CPU. I ran it over
>>>>>>> several 1GB/10GB drivers and got the same results.
>>>>>>>
>>>>>>> If I remove the wake xmit on error patch, I don't see this behavior.
>>>>>>>
>>>>>> Shoot, I have been running the xmit wakeup and suspend bit patch here
>>>>>> fine. Let me do some more testing.
>>>>>>
>>>>>> Is this something you always hit? Could you send me the final patch you
>>>>>> ended up using?
>>>>> I see this every time. Note that I'm not running with
>>>>> linux-2.6-iscsi.git. I'm using the open-iscsi.git tree + the 2 patches
>>>>> that I took without any change (using git-show) from the
>>>>> linux-2.6-iscsi.git tree. Which tree did you test it on?
>>>>>
>>>>> I added some printks to the code and saw that the signal does get sent
>>>>> from iscsi_sw_tcp_conn_stop, but I never saw (rc == -EINTR || rc ==
>>>>> -EAGAIN) evaluate to true in iscsi_sw_tcp_xmit(), even when I ran IO
>>>>> on that session.
>>>>>
>>>> Does r in iscsi_sw_tcp_xmit_segment == 0?
>>>>
>>> No, it is never zero.
>>>
>>>> If not, I think you need a different patch. In one of the patch versions
>>>> iscsi_sw_tcp_xmit_segment could return -ENODATA (this is when I had a
>>>> check for suspend_tx in there). iscsi_sw_tcp_xmit did not check for this,
>>>> so I think we can loop.
>>>>
>>>> Could you try the attached patch? It was made over open-iscsi.git for
>>>> you. I dropped the suspend bit check in iscsi_sw_tcp_xmit_segment,
>>>> because it is not needed. If we end up blocking, the signal will wake us.
>>> I ran it and got the same 100% cpu usage. Did you try to run it on
>>> your machines with open-iscsi.git? Did you see different behavior?
>>>
>> I just ran it. Maybe I am looking for the wrong thing though.
>>
>> For your problem, when the signal is sent does the recovery go ok and we
>> end up reconnecting? But the problem is just that the xmit thread takes
>> up 100% of the cpu?
>>
>
>
> Ignore this. I see the problem now. I was thinking you did not
> reconnect. I see the cpu usage. Let me do some digging.
>

I found it. The problem is that we send the signal whether or not the
xmit thread is running. If it is not running, the workqueue code keeps
getting woken up to handle the signal, but because we have not called
queue_work, the workqueue code will not let the thread run. So we never
get to flush the signal until we reconnect and send down a login PDU
(the login PDU finally does a queue_work).
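
To make the spin concrete, here is a toy user-space model of the
behavior described above. It is a sketch, not the kernel code: every
name in it (ToyWorker, xmit_work, simulate, the "ran"/"spun"/"slept"
labels) is made up for illustration. It assumes only that an
interruptible sleep returns at once while a signal is pending, and that
the signal is flushed only from inside the queued work function.

```python
class ToyWorker:
    """Toy model of a worker thread that sleeps interruptibly between work items."""

    def __init__(self):
        self.queue = []              # pending work functions (hypothetical)
        self.signal_pending = False  # stands in for a pending signal on the thread
        self.spurious_wakeups = 0    # loop passes that did no work and could not sleep

    def queue_work(self, fn):
        self.queue.append(fn)

    def send_signal(self):
        self.signal_pending = True

    def flush_signal(self):
        self.signal_pending = False

    def step(self):
        """Run one pass of the worker loop and report what happened."""
        if self.queue:
            fn = self.queue.pop(0)
            fn(self)
            return "ran"
        if self.signal_pending:
            # Assumption: the sleep is interruptible, so it returns immediately
            # while a signal is pending, and nothing here clears the signal.
            self.spurious_wakeups += 1
            return "spun"
        return "slept"


def xmit_work(worker):
    # In this model the work function is the only place the signal is flushed.
    worker.flush_signal()


def simulate(xmit_running, passes=1000):
    """Send the signal unconditionally, loop, then model the reconnect."""
    w = ToyWorker()
    if xmit_running:
        w.queue_work(xmit_work)
    w.send_signal()          # sent whether or not the xmit work is queued
    for _ in range(passes):
        w.step()
    w.queue_work(xmit_work)  # the login PDU after reconnect queues work
    w.step()
    return w.spurious_wakeups, w.step()
```

In this model, simulate(False) wastes one pass per loop iteration until
the reconnect queues work, while simulate(True) wastes none. That points
at one possible direction: only send the signal when the xmit work is
actually queued or running.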
