from:"Erez Zilber"

Re: iSCSI disconnections after running iscsistart

2016-03-06 Thread Erez Zilber

Thanks Chris and Mike for clarifying this iscsistart/iscsid issue.

Erez

On Fri, Mar 4, 2016 at 2:47 AM, Mike Christie  wrote:
> On 03/03/2016 03:17 PM, Chris Leech wrote:
>> On Thu, Mar 03, 2016 at 10:16:45PM +0200, Erez Zilber wrote:
>>> Hi,
>>>
>>> I'm running iSCSI boot for RHEL & SUSE nodes. Sometimes, after
>>> iscsistart is called, errors on the iSCSI target side occur (e.g. a
>>> temporary network disconnection) and the iSCSI connection is dropped.
>>> In the boot log, it looks like this (on a RHEL node):
>>>
>>> connection1:0: detected conn error (1020)
>>>
>>> And then, of course, the node fails to boot.
>>>
>>> I would expect that iscsid would handle this and reconnect, but I
>>> don't see in the boot log that iscsid was started. I also took a look
>>> at /usr/share/dracut/modules.d/95iscsi/iscsiroot, but didn't find
>>> iscsid there.
>>
>> No. If the connection fails during boot, I don't think it's covered, as
>> you found out. Of course, if you're booting from iSCSI, a network
>> disruption while the iSCSI boot firmware or bootloader is running is
>> also probably fatal.
>>
>>> Is it possible to run iscsid as part of initrd, before iscsistart is 
>>> executed?
>>
>> Transitioning iscsid from the initrd is problematic. Processes started
>> in the initrd don't have the right view of the filesystem, aren't in the
>> proper security context, etc. If we kill the initrd-started process and
>> restart it, there's still a time gap.
>>
>> And iscsistart does not play well with a running iscsid.
>
> You do not really need both iscsistart and iscsiadm+iscsid. iscsiadm
> can log into iBFT and other firmware boot targets too. It just does not
> support passing in boot values the way you can with the dracut
> root=iscsi: method. Feel free to add that, since it might be easier. You
> could also just run iscsiadm to create a temporary record for the target
> you want and then run the iscsiadm login command for it.
>
> The only other feature iscsistart has is that it can start up
> networking. This is nice for a simple initramfs. Dracut-based ones, like
> those in RHEL and SLES, do not need that, since dracut handles the networking.

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

iSCSI disconnections after running iscsistart

2016-03-03 Thread Erez Zilber

Hi,

I'm running iSCSI boot for RHEL & SUSE nodes. Sometimes, after
iscsistart is called, errors on the iSCSI target side occur (e.g. a
temporary network disconnection) and the iSCSI connection is dropped.
In the boot log, it looks like this (on a RHEL node):

connection1:0: detected conn error (1020)

And then, of course, the node fails to boot.

I would expect that iscsid would handle this and reconnect, but I
don't see in the boot log that iscsid was started. I also took a look
at /usr/share/dracut/modules.d/95iscsi/iscsiroot, but didn't find
iscsid there.

Is it possible to run iscsid as part of initrd, before iscsistart is executed?

Thanks,
Erez


Re: [PATCH] decrease sndtmo

2010-03-01 Thread Erez Zilber

On Wed, Feb 3, 2010 at 11:51 PM, Mike Christie  wrote:
> On 02/03/2010 06:07 AM, Erez Zilber wrote:
>>
>> On Wed, Feb 3, 2010 at 11:30 AM, Mike Christie
>>  wrote:
>>>
>>> On 02/03/2010 01:50 AM, Erez Zilber wrote:
>>>>>
>>>>> It looks like I posted it at Red Hat and never got a response, and I
>>>>> probably then forgot about it and never asked upstream. Will send mail
>>>>> upstream now.
>>>>
>>>> Which list are you sending it to? I thought it was lkml, but didn't
>>>> find any discussion there.
>>>>
>>>
>>> I think I found a nicer solution. See the attached patch made over
>>> Linus's tree. I am just not sure if we are allowed to set the sk_err
>>> field; maybe it is supposed to be internal to the socket code. The
>>> patch seems to be working for me.
>>>
>>
>> Works great for me.
>>
>
> OK. I am going to post it to netdev today or tomorrow to make sure they
> are OK with how I am accessing the sock struct.
>

Mike,

Did you get any response from the netdev list?

Thanks,
Erez

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.

Re: 2.6.14-23_compat.patch & CentOS 5.4

2010-03-01 Thread Erez Zilber

On Tue, Feb 2, 2010 at 10:53 AM, Erez Zilber  wrote:
> On Tue, Feb 2, 2010 at 6:28 AM, Mike Christie  wrote:
>> On 02/01/2010 01:45 PM, Erez Zilber wrote:
>>>
>>> On Mon, Feb 1, 2010 at 9:29 PM, Mike Christie
>>>  wrote:
>>>>
>>>> On 02/01/2010 12:42 AM, Erez Zilber wrote:
>>>>>
>>>>> On Sun, Jan 31, 2010 at 8:24 PM, Rakesh Ranjan
>>>>>  wrote:
>>>>>>
>>>>>> On 01/31/2010 08:04 PM, Erez Zilber wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> When I build open-iscsi on CentOS 5.4, I get the following errors:
>>>>>>>
>>>>>>> In file included from /home1/erez.zilber/work/open-source/open-iscsi/kernel/scsi_transport_iscsi.h:30,
>>>>>>>                  from /home1/erez.zilber/work/open-source/open-iscsi/kernel/scsi_transport_iscsi.c:30:
>>>>>>> /home1/erez.zilber/work/open-source/open-iscsi/kernel/open_iscsi_compat.h:154: error: static declaration of ‘kernel_getsockname’ follows non-static declaration
>>>>>>> include/linux/net.h:221: error: previous declaration of ‘kernel_getsockname’ was here
>>>>>>> /home1/erez.zilber/work/open-source/open-iscsi/kernel/open_iscsi_compat.h:160: error: static declaration of ‘kernel_getpeername’ follows non-static declaration
>>>>>>> include/linux/net.h:223: error: previous declaration of ‘kernel_getpeername’ was here
>>>>>>>
>>>>>>> This is probably because of the following code in
>>>>>>> 2.6.14-23_compat.patch:
>>>>>>>
>>>>>>> +#ifdef RHEL_RELEASE_CODE
>>>>>>> +#if (RHEL_RELEASE_CODE < RHEL_RELEASE_VERSION(5, 4))
>>>>>>> +#define RHELC1 1
>>>>>>> +#endif
>>>>>>> +#endif
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>> +#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)) \
>>>>>>> +        && !(defined RHELC1)
>>>>>>> +static inline int kernel_getsockname(struct socket *sock, struct sockaddr *addr,
>>>>>>> +                       int *addrlen)
>>>>>>> +{
>>>>>>> +       return sock->ops->getname(sock, addr, addrlen, 0);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline int kernel_getpeername(struct socket *sock, struct sockaddr *addr,
>>>>>>> +                       int *addrlen)
>>>>>>> +{
>>>>>>> +       return sock->ops->getname(sock, addr, addrlen, 1);
>>>>>>> +}
>>>>>>> +#endif
>>>>>>>
>>>>>>> What does RHELC1 mean? What does it have to do with versions older
>>>>>>> than 5.4?
>>>>>>
>>>>>> Hi Erez,
>>>>>>
>>>>>> RHELC1 is for RHEL 5.{1,3}. IIRC, it defines some of the symbols
>>>>>> missing from RHEL 5.{1,3}, and apart from that it also provides some
>>>>>> build support for older SLES. As for the above errors, it seems we need
>>>>>> to put in a check for CentOS versions.
>>>>>>
>>>>>> Regards
>>>>>> Rakesh Ranjan
>>>>>>
>>>>>
>>>>> Hi Rakesh,
>>>>>
>>>>> CentOS & RHEL have the same kernels. What I'm asking is: what is the
>>>>> difference between RHEL 5.3 and 5.4 (or: what is the difference
>>>>> between CentOS 5.3 and 5.4)? It looks like getsockname & getpeername
>>>>> exist in both 5.3 & 5.4. Do you think that 5.4 should be handled
>>>>> differently?
>>>>>
>>>>
>>>> I do not think so. I think what happened is that only 5.3 was out when
>>>> Rakesh made the patch, so it was just a dumb case where I did not
>>>> update the patch when 5.4 came out.
>>>>
>>>
>>> So, it's a trivial patch, isn't it? Do you want me to send a patch or
>>> will you add it yourself? I can send a patch tomorrow.
>>>
>>
>> Go ahead and send it. I am almost done with the patch to fix up the other
>> problem you posted about.
>>
>
> I've attached two versions. One fixes only the < 5.5 case, and the other
> handles all RHEL versions < 6.0. I prefer the second one (assuming that
> there will be no API breakage before RHEL 6.0).
>
> Erez
>

Mike,

Any news about this patch?

Thanks,
Erez


Re: [PATCH] decrease sndtmo

2010-02-03 Thread Erez Zilber

On Wed, Feb 3, 2010 at 11:30 AM, Mike Christie  wrote:
> On 02/03/2010 01:50 AM, Erez Zilber wrote:
>>>
>>> It looks like I posted it at Red Hat and never got a response, and I
>>> probably then forgot about it and never asked upstream. Will send mail
>>> upstream now.
>>
>> Which list are you sending it to? I thought it was lkml, but didn't
>> find any discussion there.
>>
>
> I think I found a nicer solution. See the attached patch made over Linus's
> tree. I am just not sure if we are allowed to set the sk_err field; maybe
> it is supposed to be internal to the socket code. The patch seems to be
> working for me.
>

Works great for me.

Erez


Re: [PATCH] decrease sndtmo

2010-02-02 Thread Erez Zilber

> It looks like I posted it at Red Hat and never got a response, and I
> probably then forgot about it and never asked upstream. Will send mail
> upstream now.

Which list are you sending it to? I thought it was lkml, but didn't
find any discussion there.

Erez


Re: [PATCH] decrease sndtmo

2010-02-02 Thread Erez Zilber

On Tue, Feb 2, 2010 at 6:59 PM, Mike Christie  wrote:
> On 02/02/2010 09:25 AM, Erez Zilber wrote:
>>
>> On Thu, Aug 6, 2009 at 5:32 PM, Mike Christie
>>  wrote:
>>>
>>> On 08/06/2009 05:26 AM, Erez Zilber wrote:
>>>>
>>>> On Wed, Aug 5, 2009 at 10:22 PM, Mike Christie
>>>>  wrote:
>>>>>
>>>>> On 08/05/2009 12:34 PM, Erez Zilber wrote:
>>>>>>>
>>>>>>> I found it. The problem is that we send the signal whether or not
>>>>>>> the xmit thread is running. If it is not running, the workqueue code
>>>>>>> will keep getting woken up to handle the signal, but because we have
>>>>>>> not called queue_work, the workqueue code will not let the thread
>>>>>>> run, so we never get to flush the signal until we reconnect and send
>>>>>>> down a login pdu (the login pdu finally does a queue_work).
>>>>>>>
>>>>>> When you say "the xmit thread is running", I guess that you mean that
>>>>>> the xmit thread is busy with IO, right? Note that I said that this
>>>>>
>>>>> No. workqueue.c:worker_thread() is spinning. It is looping because
>>>>> there is a signal pending, but the iscsi work code which has the
>>>>> flush_signals is not getting run because there is no work queued.
>>>>>
>>>>> So you could add
>>>>>
>>>>> if (signal_pending(current))
>>>>>         flush_signals(current);
>>>>>
>>>>> to the worker_thread() "for" loop, and I think this will fix the problem.
>>>>>
>>>>
>>>> Looks like this solves the problem. I've added the following patch to
>>>> the CentOS 5.3 kernel (2.6.18-128.1.6.el5):
>>>>
>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>> index 8594efb..e148ed8 100644
>>>> --- a/kernel/workqueue.c
>>>> +++ b/kernel/workqueue.c
>>>> @@ -253,6 +253,9 @@ static int worker_thread(void *__cwq)
>>>>
>>>>          set_current_state(TASK_INTERRUPTIBLE);
>>>>          while (!kthread_should_stop()) {
>>>> +               if (signal_pending(current))
>>>> +                       flush_signals(current);
>>>> +
>>>>                  add_wait_queue(&cwq->more_work, &wait);
>>>>                  if (list_empty(&cwq->worklist))
>>>>                          schedule();
>>>>
>>>> I'm running with open-iscsi.git + 2 commits from linux-2.6-iscsi.git
>>>> (9c302cc45b70ecc4b606d65a445902381066061b and
>>>> 75be23dc40ba2f215779d5ba60fda9a762271bbe).
>>>>
>>>> Will you push it upstream and into the RHEL kernel?
>>>>
>>>
>>> I am not sure. I was thinking that switching from a workqueue to a
>>> thread is the right thing to do. The drawback is that the workqueue is
>>> nice when there are multiple sessions per host, as with bnx2i, cxgb3i
>>> and be_iscsi: I can just queue_work and pass the connection to send on.
>>> If I switch to a thread, I have to add my own code to do that.
>>>
>>> I am going to post a patch like yours to linux-kernel and see what
>>> people say is best. If it goes in, then I will port it to RHEL.
>>>
>>
>> Mike,
>>
>> We had this discussion a long time ago. I don't remember what
>> eventually happened with it. Did you push the workqueue patch to the
>> kernel? What about the suspend-and-wake patch?
>>
>
> It looks like I posted it at Red Hat and never got a response, and I
> probably then forgot about it and never asked upstream. Will send mail
> upstream now.

I encountered this problem about 6 months ago and found a workaround.
Now that I've moved to new (and faster) HW, I'm hitting it again and
again in scenarios with heavy I/O while the target machine is killed.

Erez


Re: [PATCH] decrease sndtmo

2010-02-02 Thread Erez Zilber

On Thu, Aug 6, 2009 at 5:32 PM, Mike Christie  wrote:
>
> On 08/06/2009 05:26 AM, Erez Zilber wrote:
>> On Wed, Aug 5, 2009 at 10:22 PM, Mike Christie  wrote:
>>> On 08/05/2009 12:34 PM, Erez Zilber wrote:
>>>>> I found it. The problem is that we send the signal whether or not the
>>>>> xmit thread is running. If it is not running, the workqueue code will
>>>>> keep getting woken up to handle the signal, but because we have not
>>>>> called queue_work, the workqueue code will not let the thread run, so
>>>>> we never get to flush the signal until we reconnect and send down a
>>>>> login pdu (the login pdu finally does a queue_work).
>>>>>
>>>> When you say "the xmit thread is running", I guess that you mean that
>>>> the xmit thread is busy with IO, right? Note that I said that this
>>> No. workqueue.c:worker_thread() is spinning. It is looping because there
>>> is a signal pending, but the iscsi work code which has the flush_signals
>>> is not getting run because there is no work queued.
>>>
>>> So you could add
>>>
>>> if (signal_pending(current))
>>>         flush_signals(current);
>>>
>>> to the worker_thread() "for" loop, and I think this will fix the problem.
>>>
>>
>> Looks like this solves the problem. I've added the following patch to
>> the CentOS 5.3 kernel (2.6.18-128.1.6.el5):
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 8594efb..e148ed8 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -253,6 +253,9 @@ static int worker_thread(void *__cwq)
>>
>>          set_current_state(TASK_INTERRUPTIBLE);
>>          while (!kthread_should_stop()) {
>> +               if (signal_pending(current))
>> +                       flush_signals(current);
>> +
>>                  add_wait_queue(&cwq->more_work, &wait);
>>                  if (list_empty(&cwq->worklist))
>>                          schedule();
>>
>> I'm running with open-iscsi.git + 2 commits from linux-2.6-iscsi.git
>> (9c302cc45b70ecc4b606d65a445902381066061b and
>> 75be23dc40ba2f215779d5ba60fda9a762271bbe).
>>
>> Will you push it upstream and into the RHEL kernel?
>>
>
> I am not sure. I was thinking that switching from a workqueue to a
> thread is the right thing to do. The drawback is that the workqueue is
> nice when there are multiple sessions per host, as with bnx2i, cxgb3i
> and be_iscsi: I can just queue_work and pass the connection to send on.
> If I switch to a thread, I have to add my own code to do that.
>
> I am going to post a patch like yours to linux-kernel and see what
> people say is best. If it goes in, then I will port it to RHEL.

Mike,

We had this discussion a long time ago. I don't remember what
eventually happened with it. Did you push the workqueue patch to the
kernel? What about the suspend-and-wake patch?

Thanks,
Erez


Re: 2.6.14-23_compat.patch & CentOS 5.4

2010-02-02 Thread Erez Zilber

On Tue, Feb 2, 2010 at 6:28 AM, Mike Christie  wrote:
> On 02/01/2010 01:45 PM, Erez Zilber wrote:
>>
>> On Mon, Feb 1, 2010 at 9:29 PM, Mike Christie
>>  wrote:
>>>
>>> On 02/01/2010 12:42 AM, Erez Zilber wrote:
>>>>
>>>> On Sun, Jan 31, 2010 at 8:24 PM, Rakesh Ranjan
>>>>  wrote:
>>>>>
>>>>> On 01/31/2010 08:04 PM, Erez Zilber wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> When I build open-iscsi on CentOS 5.4, I get the following errors:
>>>>>>
>>>>>> In file included from /home1/erez.zilber/work/open-source/open-iscsi/kernel/scsi_transport_iscsi.h:30,
>>>>>>                  from /home1/erez.zilber/work/open-source/open-iscsi/kernel/scsi_transport_iscsi.c:30:
>>>>>> /home1/erez.zilber/work/open-source/open-iscsi/kernel/open_iscsi_compat.h:154: error: static declaration of ‘kernel_getsockname’ follows non-static declaration
>>>>>> include/linux/net.h:221: error: previous declaration of ‘kernel_getsockname’ was here
>>>>>> /home1/erez.zilber/work/open-source/open-iscsi/kernel/open_iscsi_compat.h:160: error: static declaration of ‘kernel_getpeername’ follows non-static declaration
>>>>>> include/linux/net.h:223: error: previous declaration of ‘kernel_getpeername’ was here
>>>>>>
>>>>>> This is probably because of the following code in
>>>>>> 2.6.14-23_compat.patch:
>>>>>>
>>>>>> +#ifdef RHEL_RELEASE_CODE
>>>>>> +#if (RHEL_RELEASE_CODE < RHEL_RELEASE_VERSION(5, 4))
>>>>>> +#define RHELC1 1
>>>>>> +#endif
>>>>>> +#endif
>>>>>>
>>>>>> and
>>>>>>
>>>>>> +#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)) \
>>>>>> +        && !(defined RHELC1)
>>>>>> +static inline int kernel_getsockname(struct socket *sock, struct sockaddr *addr,
>>>>>> +                       int *addrlen)
>>>>>> +{
>>>>>> +       return sock->ops->getname(sock, addr, addrlen, 0);
>>>>>> +}
>>>>>> +
>>>>>> +static inline int kernel_getpeername(struct socket *sock, struct sockaddr *addr,
>>>>>> +                       int *addrlen)
>>>>>> +{
>>>>>> +       return sock->ops->getname(sock, addr, addrlen, 1);
>>>>>> +}
>>>>>> +#endif
>>>>>>
>>>>>> What does RHELC1 mean? What does it have to do with versions older
>>>>>> than 5.4?
>>>>>
>>>>> Hi Erez,
>>>>>
>>>>> RHELC1 is for RHEL 5.{1,3}. IIRC, it defines some of the symbols
>>>>> missing from RHEL 5.{1,3}, and apart from that it also provides some
>>>>> build support for older SLES. As for the above errors, it seems we need
>>>>> to put in a check for CentOS versions.
>>>>>
>>>>> Regards
>>>>> Rakesh Ranjan
>>>>>
>>>>
>>>> Hi Rakesh,
>>>>
>>>> CentOS & RHEL have the same kernels. What I'm asking is: what is the
>>>> difference between RHEL 5.3 and 5.4 (or: what is the difference
>>>> between CentOS 5.3 and 5.4)? It looks like getsockname & getpeername
>>>> exist in both 5.3 & 5.4. Do you think that 5.4 should be handled
>>>> differently?
>>>>
>>>
>>> I do not think so. I think what happened is that only 5.3 was out when
>>> Rakesh made the patch, so it was just a dumb case where I did not
>>> update the patch when 5.4 came out.
>>>
>>
>> So, it's a trivial patch, isn't it? Do you want me to send a patch or
>> will you add it yourself? I can send a patch tomorrow.
>>
>
> Go ahead and send

Re: Reset eh timer if cmd is really making progress

2010-02-01 Thread Erez Zilber

On Tue, Feb 2, 2010 at 6:47 AM, Mike Christie  wrote:
> On 02/01/2010 01:21 PM, Mike Christie wrote:
>>
>> On 02/01/2010 11:52 AM, Mike Christie wrote:
>>>
>>> On 02/01/2010 05:14 AM, Erez Zilber wrote:
>>>>
>>>> When iscsi_eh_cmd_timed_out gets called, we can ask scsi-ml to give us
>>>> more time if the cmd is making progress (i.e. if there was some data
>>>> transfer since the last timeout).
>>>>
>>>> The problem is that task->last_xfer & task->last_timeout are set to
>>>> the value of 'jiffies' when allocating the task. If the target machine
>>>> is already dead when we send the cmd, no progress will be made. Still,
>>>> when iscsi_eh_cmd_timed_out is called, it will think that data
>>>> was sent since the last timeout and reset the timer (and waste time).
>>>> To solve that, iscsi_eh_cmd_timed_out should also check if
>>>> there was any data transfer after the task was allocated.
>>>>
>>>
>>> I agree it is a problem with the code.
>>>
>>> The problem is that the check also handled the case where we are so
>>> backed up that we cannot even send a cmd/data within the cmd timeout.
>>> For that case, the check was giving it an extra cmd timeout's worth of
>>> seconds to get it sent. That code is not really good, though. It should
>>> probably just loop over all the cmds there and see if any cmds have made
>>> progress. If so, give the cmd more time; if not, fail.
>>>
>>> I was not sure, though, whether I should check if any cmds to the target
>>> made progress or only cmds to the same disk. It could be that just one
>>> disk went bad, so we might want to check per disk. However, this could
>>> be the first IO to the disk and it just got stuck behind a bunch of other
>>> IO to other disks, so in that case we want to check per target.
>>>
>>
>> Give me until tomorrow. I think I can cook up a patch. Before, when
>> deciding whether to check per dev or per target, I was mixed up with some
>> reordering stuff, but I think I have a patch that should work for both
>> of us.
>>
>
> I think the attached patch should do what we both want.
>
> Instead of always waiting an extra cmd timeout's worth of seconds, we will
> only get extra time if:
> 1. The command has made progress (changed the test to time_after).
> 2. Commands queued before the timed-out one have made transfers since we
> started the task or since it last timed out.
>
> Patch was made over scsi-misc.
>

Looks okay to me.

Erez


Re: 2.6.14-23_compat.patch & CentOS 5.4

2010-02-01 Thread Erez Zilber

On Mon, Feb 1, 2010 at 9:29 PM, Mike Christie  wrote:
> On 02/01/2010 12:42 AM, Erez Zilber wrote:
>>
>> On Sun, Jan 31, 2010 at 8:24 PM, Rakesh Ranjan  wrote:
>>>
>>> On 01/31/2010 08:04 PM, Erez Zilber wrote:
>>>>
>>>> Hi,
>>>>
>>>> When I build open-iscsi on CentOS 5.4, I get the following errors:
>>>>
>>>> In file included from /home1/erez.zilber/work/open-source/open-iscsi/kernel/scsi_transport_iscsi.h:30,
>>>>                  from /home1/erez.zilber/work/open-source/open-iscsi/kernel/scsi_transport_iscsi.c:30:
>>>> /home1/erez.zilber/work/open-source/open-iscsi/kernel/open_iscsi_compat.h:154: error: static declaration of ‘kernel_getsockname’ follows non-static declaration
>>>> include/linux/net.h:221: error: previous declaration of ‘kernel_getsockname’ was here
>>>> /home1/erez.zilber/work/open-source/open-iscsi/kernel/open_iscsi_compat.h:160: error: static declaration of ‘kernel_getpeername’ follows non-static declaration
>>>> include/linux/net.h:223: error: previous declaration of ‘kernel_getpeername’ was here
>>>>
>>>> This is probably because of the following code in
>>>> 2.6.14-23_compat.patch:
>>>>
>>>> +#ifdef RHEL_RELEASE_CODE
>>>> +#if (RHEL_RELEASE_CODE < RHEL_RELEASE_VERSION(5, 4))
>>>> +#define RHELC1 1
>>>> +#endif
>>>> +#endif
>>>>
>>>> and
>>>>
>>>> +#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)) \
>>>> +        && !(defined RHELC1)
>>>> +static inline int kernel_getsockname(struct socket *sock, struct sockaddr *addr,
>>>> +                       int *addrlen)
>>>> +{
>>>> +       return sock->ops->getname(sock, addr, addrlen, 0);
>>>> +}
>>>> +
>>>> +static inline int kernel_getpeername(struct socket *sock, struct sockaddr *addr,
>>>> +                       int *addrlen)
>>>> +{
>>>> +       return sock->ops->getname(sock, addr, addrlen, 1);
>>>> +}
>>>> +#endif
>>>>
>>>> What does RHELC1 mean? What does it have to do with versions older than
>>>> 5.4?
>>>
>>> Hi Erez,
>>>
>>> RHELC1 is for RHEL 5.{1,3}. IIRC, it defines some of the symbols
>>> missing from RHEL 5.{1,3}, and apart from that it also provides some
>>> build support for older SLES. As for the above errors, it seems we need
>>> to put in a check for CentOS versions.
>>>
>>> Regards
>>> Rakesh Ranjan
>>>
>>
>> Hi Rakesh,
>>
>> CentOS & RHEL have the same kernels. What I'm asking is: what is the
>> difference between RHEL 5.3 and 5.4 (or: what is the difference
>> between CentOS 5.3 and 5.4)? It looks like getsockname & getpeername
>> exist in both 5.3 & 5.4. Do you think that 5.4 should be handled
>> differently?
>>
>
> I do not think so. I think what happened is that only 5.3 was out when
> Rakesh made the patch, so it was just a dumb case where I did not update
> the patch when 5.4 came out.
>

So, it's a trivial patch, isn't it? Do you want me to send a patch or
will you add it yourself? I can send a patch tomorrow.

Erez


Reset eh timer if cmd is really making progress

2010-02-01 Thread Erez Zilber

When iscsi_eh_cmd_timed_out gets called, we can ask scsi-ml to give us
more time if the cmd is making progress (i.e. if there was some data
transfer since the last timeout).

The problem is that task->last_xfer & task->last_timeout are set to
the value of 'jiffies' when allocating the task. If the target machine
is already dead when we send the cmd, no progress will be made. Still,
because the two values are equal, the time_after_eq(last_xfer,
last_timeout) test passes, so when iscsi_eh_cmd_timed_out is called it
thinks that data was sent since the last timeout and resets the timer
(and wastes time). To solve that, iscsi_eh_cmd_timed_out should also
check if there was any data transfer after the task was allocated.

How about the following patch?

diff --git a/kernel/libiscsi.c b/kernel/libiscsi.c
index 90f3018..b727c10 100644
--- a/kernel/libiscsi.c
+++ b/kernel/libiscsi.c
@@ -1419,6 +1419,7 @@ static inline struct iscsi_task *iscsi_alloc_task(struct iscsi_conn *conn,
 	task->conn = conn;
 	task->sc = sc;
 	task->have_checked_conn = 0;
+	task->init_time = jiffies;
 	task->last_timeout = jiffies;
 	task->last_xfer = jiffies;
 	INIT_LIST_HEAD(&task->running);
@@ -1817,7 +1818,8 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
 	 * we can check if it is the task or connection when we send the
 	 * nop as a ping.
 	 */
-	if (time_after_eq(task->last_xfer, task->last_timeout)) {
+	if (time_after_eq(task->last_xfer, task->last_timeout) &&
+	    time_after(task->last_xfer, task->init_time)) {
 		ISCSI_DBG_EH(session, "Command making progress. Asking "
 			     "scsi-ml for more time to complete. "
 			     "Last data recv at %lu. Last timeout was at "
diff --git a/kernel/libiscsi.h b/kernel/libiscsi.h
index 1990243..4712ea9 100644
--- a/kernel/libiscsi.h
+++ b/kernel/libiscsi.h
@@ -126,6 +126,7 @@ struct iscsi_task {
 	struct iscsi_conn	*conn;		/* used connection*/
 
 	/* data processing tracking */
+	unsigned long		init_time;
 	unsigned long		last_xfer;
 	unsigned long		last_timeout;
 	int			have_checked_conn;


Re: 2.6.14-23_compat.patch & CentOS 5.4

2010-01-31 Thread Erez Zilber

On Sun, Jan 31, 2010 at 8:24 PM, Rakesh Ranjan  wrote:
> On 01/31/2010 08:04 PM, Erez Zilber wrote:
>>
>> Hi,
>>
>> When I build open-iscsi on CentOS 5.4, I get the following errors:
>>
>> In file included from /home1/erez.zilber/work/open-source/open-iscsi/kernel/scsi_transport_iscsi.h:30,
>>                  from /home1/erez.zilber/work/open-source/open-iscsi/kernel/scsi_transport_iscsi.c:30:
>> /home1/erez.zilber/work/open-source/open-iscsi/kernel/open_iscsi_compat.h:154: error: static declaration of ‘kernel_getsockname’ follows non-static declaration
>> include/linux/net.h:221: error: previous declaration of ‘kernel_getsockname’ was here
>> /home1/erez.zilber/work/open-source/open-iscsi/kernel/open_iscsi_compat.h:160: error: static declaration of ‘kernel_getpeername’ follows non-static declaration
>> include/linux/net.h:223: error: previous declaration of ‘kernel_getpeername’ was here
>>
>> This is probably because of the following code in 2.6.14-23_compat.patch:
>>
>> +#ifdef RHEL_RELEASE_CODE
>> +#if (RHEL_RELEASE_CODE<  RHEL_RELEASE_VERSION(5, 4))
>> +#define RHELC1 1
>> +#endif
>> +#endif
>>
>> and
>>
>> +#if (LINUX_VERSION_CODE<  KERNEL_VERSION(2,6,19)) \
>> +&&  !(defined RHELC1)
>> +static inline int kernel_getsockname(struct socket *sock, struct
>> sockaddr *addr,
>> +                       int *addrlen)
>> +{
>> +       return sock->ops->getname(sock, addr, addrlen, 0);
>> +}
>> +
>> +static inline int kernel_getpeername(struct socket *sock, struct
>> sockaddr *addr,
>> +                       int *addrlen)
>> +{
>> +       return sock->ops->getname(sock, addr, addrlen, 1);
>> +}
>> +#endif
>>
>> What does RHELC1 mean? What does it have to do with versions older than
>> 5.4?
>
> Hi Erez,
>
> RHELC1 is for RHEL5.{1,3}, IIRC it defines some of the missing symbols from
> RHEL5.{1,3} and apart from that it also provides some build support for
> older SLES. About the above errors, it seems we need to add a check for
> CentOS versions.
>
> Regards
> Rakesh Ranjan
>

Hi Rakesh,

CentOS & RHEL have the same kernels. What I'm asking is: what is the
difference between RHEL 5.3 and 5.4 (or: what is the difference
between CentOS 5.3 and 5.4). It looks like getsockname & getpeername
exist in both 5.3 & 5.4. Do you think that 5.4 should be handled
differently?

Thanks,
Erez


Re: [PATCH] Maintain a list of nop-out PDUs that almost timed out

2010-01-31 Thread Erez Zilber

On Fri, Jan 29, 2010 at 8:59 PM, Mike Christie  wrote:
> On 01/28/2010 09:25 AM, Erez Zilber wrote:
>>>
>>>
>>> +struct iscsi_noop_info {
>>> +       struct timeval tv;
>>>
>>>
>>> Can you pass this between userspace and the kernel safely? The fields in
>>> there are longs, and I thought those could be different sizes if you had
>>> something like 32-bit userspace and 64-bit kernels.
>>
>> What do you suggest we do here?
>
> I do not know. Am I even right? After I sent that, I was thinking syscalls
> must pass it around ok, so it must be ok or there must be some way to do it.
> I did not look into it any more though.

I must say that I don't know how to handle the scenario of 32-bit
userspace and 64-bit kernels.
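For reference, one common way around the problem Mike describes is to keep `long`-sized types such as `struct timeval` out of structures that cross the user/kernel boundary, and to use fixed-width fields instead. The sketch below assumes a hypothetical `struct iscsi_noop_info_fixed` (not part of the patch) that carries the same data as `struct iscsi_noop_info`. The explicit padding matters because i386 aligns `uint64_t` struct members to 4 bytes while x86_64 aligns them to 8, so without it the size would differ between the two:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ABI-stable variant of struct iscsi_noop_info. */
struct iscsi_noop_info_fixed {
	uint64_t tv_sec;            /* seconds part of the send time */
	uint32_t tv_usec;           /* microseconds part of the send time */
	uint32_t elapsed_time_msec; /* ping delay in milliseconds */
	uint16_t init;              /* nonzero once the slot has been filled */
	uint16_t pad[3];            /* pad to 24 bytes on every ABI */
};

/* Fail the build if the layout ever differs between 32- and 64-bit builds. */
_Static_assert(sizeof(struct iscsi_noop_info_fixed) == 24,
	       "unexpected size for struct iscsi_noop_info_fixed");
_Static_assert(offsetof(struct iscsi_noop_info_fixed, init) == 16,
	       "unexpected offset for init");
```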

>
>
>> I'm not very familiar with debugfs. What do you want to do with it?
>>
>
> Nothing. I was just saying it is an option. If you were also fine with the
> netlink interface you used, so am I, so no changes needed.
>

The following patch (built over the original patch) fixes the problems
that we discussed (except for the 32-64 problem). If you're okay with
it, I will send a single updated patch.

diff --git a/doc/iscsiadm.8 b/doc/iscsiadm.8
index 6f85cb5..9f60beb 100644
--- a/doc/iscsiadm.8
+++ b/doc/iscsiadm.8
@@ -126,14 +126,14 @@ operator.

 .TP
 \fB\-i\fR, \fB\-\-debuginfo=\fIdebug_info_op\fR
-Specifies a debug infomration operation \fIdebug_info_op\fR. \fIdebug_info_op\fR must be \fIdump\fR or \fIclear\fR.
+Specifies a debug information operation \fIdebug_info_op\fR. \fIdebug_info_op\fR must be \fIdump_noop\fR or \fIclear_noop\fR.
 .IP
 This option is valid only for the session mode.
 .IP
-\fIdump\fR prints a list of recent Nop-out PDUs that almost timed out with information about the time that the Nop-out PDU was sent and its delay.
+\fIdump_noop\fR prints a list of recent Nop-out PDUs that almost timed out with information about the time that the Nop-out PDU was sent and its delay.
 A Nop-out PDU (that did not time out) is defined as 'almost timed out' if its delay is greater than node.session.noop_threshold * node.conn[0].timeo.noop_out_timeout.
 .IP
-\fIclear\fR clears the list of recent Nop-out PDUs that almost timed out.
+\fIclear_noop\fR clears the list of recent Nop-out PDUs that almost timed out.

 .TP
 \fB\-o\fR, \fB\-\-op=\fIop\fR
diff --git a/include/iscsi_if.h b/include/iscsi_if.h
index 7b96692..0b7dfe8 100644
--- a/include/iscsi_if.h
+++ b/include/iscsi_if.h
@@ -328,9 +328,10 @@ enum iscsi_param {
ISCSI_PARAM_ISID,
ISCSI_PARAM_INITIATOR_NAME,

+   ISCSI_PARAM_TGT_RESET_TMO,
+
ISCSI_PARAM_NOOP_THRESHOLD,

-   ISCSI_PARAM_TGT_RESET_TMO,
/* must always be last */
ISCSI_PARAM_MAX,
 };
diff --git a/kernel/libiscsi.c b/kernel/libiscsi.c
index 5011139..3f10940 100644
--- a/kernel/libiscsi.c
+++ b/kernel/libiscsi.c
@@ -980,6 +980,15 @@ static void iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr)
}
 }

+static unsigned long iscsi_get_ping_delay(unsigned long last_ping)
+{
+   /* Check if there was a jiffies rollover */
+   if (jiffies < last_ping)
+   return ULONG_MAX - last_ping + jiffies;
+   else
+   return jiffies - last_ping;
+}
+
 static int iscsi_nop_out_rsp(struct iscsi_task *task,
 struct iscsi_nopin *nop, char *data, int datalen)
 {
@@ -998,8 +1007,7 @@ static int iscsi_nop_out_rsp(struct iscsi_task *task,
   data, datalen))
rc = ISCSI_ERR_CONN_FAILED;
} else {
-   ping_delay = jiffies - conn->last_ping;
-
+   ping_delay = iscsi_get_ping_delay(conn->last_ping);
/* Check if the ping has almost timed out */
if (ping_delay >=
(session->noop_threshold * (conn->ping_timeout * HZ)) /
diff --git a/usr/config.h b/usr/config.h
index 90063fe..0ac6ff5 100644
--- a/usr/config.h
+++ b/usr/config.h
@@ -164,11 +164,11 @@ typedef enum discovery_type {
DISCOVERY_TYPE_FW,
 } discovery_type_e;

-typedef enum noop_op_type {
-   NOOP_OP_NONE,
-   NOOP_OP_DUMP,
-   NOOP_OP_CLEAR,
-} noop_op_type_e;
+typedef enum debug_info_op_type {
+   DBG_INFO_OP_NONE,
+   DBG_INFO_OP_DUMP_NOOP,
+   DBG_INFO_OP_CLEAR_NOOP,
+} debug_info_op_type_e;

 typedef struct conn_rec {
iscsi_startup_e startup;
diff --git a/usr/iscsiadm.c b/usr/iscsiadm.c
index bcbaf27..4670118 100644
--- a/usr/iscsiadm.c
+++ b/usr/iscsiadm.c
@@ -95,7 +95,7 @@ static struct option const long_options[] =
{"help", no_argument, NULL, 'h'},
{NULL, 0, NULL, 0},
 };
-static char *short_options = "RlVhm:p:P:T:H:I:U:k:L:d:r:n:v:o:sSt:u:i:";
+static char *short_options = "RlVhm:p:P

2.6.14-23_compat.patch & CentOS 5.4

2010-01-31 Thread Erez Zilber

Hi,

When I build open-iscsi on CentOS 5.4, I get the following errors:

In file included from
/home1/erez.zilber/work/open-source/open-iscsi/kernel/scsi_transport_iscsi.h:30,
 from
/home1/erez.zilber/work/open-source/open-iscsi/kernel/scsi_transport_iscsi.c:30:
/home1/erez.zilber/work/open-source/open-iscsi/kernel/open_iscsi_compat.h:154:
error: static declaration of ‘kernel_getsockname’ follows non-static
declaration
include/linux/net.h:221: error: previous declaration of
‘kernel_getsockname’ was here
/home1/erez.zilber/work/open-source/open-iscsi/kernel/open_iscsi_compat.h:160:
error: static declaration of ‘kernel_getpeername’ follows non-static
declaration
include/linux/net.h:223: error: previous declaration of
‘kernel_getpeername’ was here

This is probably because of the following code in 2.6.14-23_compat.patch:

+#ifdef RHEL_RELEASE_CODE
+#if (RHEL_RELEASE_CODE < RHEL_RELEASE_VERSION(5, 4))
+#define RHELC1 1
+#endif
+#endif

and

+#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)) \
+   && !(defined RHELC1)
+static inline int kernel_getsockname(struct socket *sock, struct
sockaddr *addr,
+   int *addrlen)
+{
+   return sock->ops->getname(sock, addr, addrlen, 0);
+}
+
+static inline int kernel_getpeername(struct socket *sock, struct
sockaddr *addr,
+   int *addrlen)
+{
+   return sock->ops->getname(sock, addr, addrlen, 1);
+}
+#endif

What does RHELC1 mean? What does it have to do with versions older than 5.4?

Thanks,
Erez
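If you evaluate the two guards by hand for a 5.4 build, the error makes sense. RHELC1 is only defined below 5.4, so on 5.4 the compat header emits its own static kernel_getsockname()/kernel_getpeername(), even though (per the error) include/linux/net.h already declares them. The sketch below reproduces the preprocessor logic under the version values noted in its comments. `compat_defines_getname()` is a hypothetical probe, not part of the patch:

```c
/* Assumed definitions: KERNEL_VERSION matches <linux/version.h>, and
 * RHEL_RELEASE_VERSION is assumed to match the RHEL kernel headers. */
#define KERNEL_VERSION(a, b, c)    (((a) << 16) + ((b) << 8) + (c))
#define RHEL_RELEASE_VERSION(a, b) (((a) << 8) + (b))

/* What a CentOS/RHEL 5.4 build is assumed to see: a 2.6.18-based kernel. */
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 18)
#define RHEL_RELEASE_CODE  RHEL_RELEASE_VERSION(5, 4)

/* Guard copied from 2.6.14-23_compat.patch. */
#ifdef RHEL_RELEASE_CODE
#if (RHEL_RELEASE_CODE < RHEL_RELEASE_VERSION(5, 4))
#define RHELC1 1
#endif
#endif

/* Mirrors the condition around the static kernel_getsockname() and
 * kernel_getpeername() definitions in open_iscsi_compat.h. */
#if (LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 19)) && !(defined RHELC1)
static int compat_defines_getname(void) { return 1; }
#else
static int compat_defines_getname(void) { return 0; }
#endif
```

With these values `compat_defines_getname()` returns 1. If you change RHEL_RELEASE_CODE to RHEL_RELEASE_VERSION(5, 3), RHELC1 becomes defined and it returns 0.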


Re: [PATCH] Maintain a list of nop-out PDUs that almost timed out

2010-01-28 Thread Erez Zilber

On Tue, Jan 19, 2010 at 2:27 PM, Mike Christie  wrote:
> On 12/28/2009 05:10 AM, Erez Zilber wrote:
>>
>> diff --git a/usr/iscsiadm.c b/usr/iscsiadm.c
>> index 6188f43..ee4cfc4 100644
>> --- a/usr/iscsiadm.c
>> +++ b/usr/iscsiadm.c
>> @@ -94,7 +94,7 @@ static struct option const long_options[] =
>>         {"help", no_argument, NULL, 'h'},
>>         {NULL, 0, NULL, 0},
>>  };
>> -static char *short_options = "RlVhm:p:P:T:H:I:U:k:L:d:r:n:v:o:sSt:u:i:";
>> +static char *short_options = "RlVhm:p:P:T:H:I:U:k:L:d:r:n:v:o:sSt:ui:";
>>
>>  static void usage(int status)
>>  {
>>
>
> Sorry for the late reply, and thanks again for working on this.
>
>
> @@ -317,6 +328,8 @@ enum iscsi_param {
>        ISCSI_PARAM_ISID,
>        ISCSI_PARAM_INITIATOR_NAME,
>
> +       ISCSI_PARAM_NOOP_THRESHOLD,
> +
>        ISCSI_PARAM_TGT_RESET_TMO,
>
>
> ISCSI_PARAM_NOOP_THRESHOLD should be after ISCSI_PARAM_TGT_RESET_TMO. I
> think it got messed up when you updated a patch.

Done.

>
>
> +struct iscsi_noop_info {
> +       struct timeval tv;
>
>
> Can you pass this between userspace and the kernel safely? The fields in
> there are longs, and I thought those could be different sizes if you had
> something like 32-bit userspace and 64-bit kernels.

What do you suggest we do here?

>
>
>
> +       unsigned int elapsed_time_msec;
> +       short init;
> +};
>
>
>               ping_delay = jiffies - conn->last_ping;
> +
> +               /* Check if the ping has almost timed out */
> +               if (ping_delay >=
> +                   (session->noop_threshold * (conn->ping_timeout * HZ)) /
>
> Does this handle jiffies rollover ok? If jiffies rolled over and it is 1 but
> last_ping was ULONG_MAX, then ping_delay is going to be very large.

Fixed.
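On the rollover question, assume `ping_delay`, `jiffies` and `conn->last_ping` are all `unsigned long`, the usual type for jiffies values. In that case plain subtraction is already wrap-safe, because C defines unsigned arithmetic modulo ULONG_MAX + 1. By contrast, the explicit branch in the `iscsi_get_ping_delay()` helper from the 2010-01-31 follow-up comes out one tick short after a wrap. Below is a userspace sketch with hypothetical helper names:

```c
#include <limits.h>

/* 'now' stands in for jiffies and 'then' for conn->last_ping. Unsigned
 * arithmetic in C wraps modulo ULONG_MAX + 1, so a plain subtraction
 * already gives the elapsed tick count across a rollover. */
static unsigned long delay_plain(unsigned long now, unsigned long then)
{
	return now - then;
}

/* The explicit-rollover branch of iscsi_get_ping_delay(), for comparison.
 * It returns one tick less than the true delay after a wrap. */
static unsigned long delay_explicit(unsigned long now, unsigned long then)
{
	if (now < then)
		return ULONG_MAX - then + now;
	return now - then;
}
```

For example, with last_ping == ULONG_MAX and two ticks elapsed (ULONG_MAX, then 0, then 1), `delay_plain(1, ULONG_MAX)` is 2 while `delay_explicit(1, ULONG_MAX)` is 1.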

>
>
>
> +static noop_op_type_e
> +str_to_noop_op(char *str)
> +{
> +       noop_op_type_e op;
> +
> +       if (!strcmp("dump", str))
> +               op = NOOP_OP_DUMP;
> +       else if (!strcmp("clear", str))
> +               op = NOOP_OP_CLEAR;
> +       else
> +               op = NOOP_OP_NONE;
> +
> +       return op;
> +}
>
> Should we make these names clear that they reset/dump the noop data? If we
> added a scsi cmd timer option, then we probably want to be able to
> dump/reset one or the other. So rename dump to dump_noop and clear to
> clear_noop.

Fixed. We also need to replace str_to_noop_op with
str_to_debug_info_op (and change the enum accordingly).
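Here is a sketch of the renamed parser. It assumes the function keeps the structure of str_to_noop_op() quoted above and uses the debug_info_op_type_e enum from the follow-up patch's usr/config.h hunk. The actual str_to_debug_info_op() body is not shown in the thread:

```c
#include <string.h>

/* Enum as renamed in usr/config.h by the follow-up patch. */
typedef enum debug_info_op_type {
	DBG_INFO_OP_NONE,
	DBG_INFO_OP_DUMP_NOOP,
	DBG_INFO_OP_CLEAR_NOOP,
} debug_info_op_type_e;

/* Maps the -i/--debuginfo argument to an operation; anything else,
 * including the old "dump"/"clear" spellings, yields DBG_INFO_OP_NONE. */
static debug_info_op_type_e str_to_debug_info_op(const char *str)
{
	if (!strcmp("dump_noop", str))
		return DBG_INFO_OP_DUMP_NOOP;
	if (!strcmp("clear_noop", str))
		return DBG_INFO_OP_CLEAR_NOOP;
	return DBG_INFO_OP_NONE;
}
```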

>
>
>
> case MODE_NODE:
>        if ((rc = verify_mode_params(argc, argv, "RsPIdmlSonvupTUL",
>
> This line needs an 'i'. It looks like you added the node compat support but
> you need to enable it with the 'i'. I added it when I tested your patch and
> it worked fine, so just add it when you send the patch.
>
>

OK

> Also what about debugfs? Did you think you would want the flexibility? Is
> that too many interfaces (sysfs, netlink, bsg and debugfs)?

I'm not very familiar with debugfs. What do you want to do with it?

Erez


Re: Suggestion for new logging mechanism in open-iscsi

2010-01-14 Thread Erez Zilber

On Thu, Jan 14, 2010 at 1:04 PM, Erez Zilber  wrote:
> On Thu, Jan 14, 2010 at 10:44 AM, Ulrich Windl
>  wrote:
>> On 13 Jan 2010 at 18:56, Erez Zilber wrote:
>>
>> Hi!
>>
>> I wonder whether the #define could be replaced with an inline function.
>> The readability would be much better, and the code should be more or
>> less the same, unless you need special macro processing. For code size
>> we could even try to make it a "normal" routine (i.e. not inlined). The
>> performance might suffer a little bit, however.
>>
>> Also: With "-O3" gcc usually automatically inlines short "static"
>> (locally defined?) routines.
>>
>
> We can make it inline instead of using a macro. I used macros because
> this is what we had till now, and I didn't want to change that. The
> only problem with inline functions is that it's up to the compiler to
> decide if the function will be inlined. This may affect performance in
> some cases.
>
> Erez
>

One more thing: take a look at _iscsi_sw_tcp_log. If we have a large
number of contexts in the module (connection, session, error handling,
data transfer, socket, module etc.), the switch block may become quite
large. I don't know if the compiler will decide to inline such a
function.
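One possible hybrid along the lines Ulrich suggests is to keep the level-to-string switch in a normal static function and leave only the variadic formatting in a macro. Below is a userspace sketch. It assumes the ISCSI_LOG_* values are ordered from ERROR (least verbose) to TRACE (most verbose), which the `log_level > ISCSI_LOG_INFO` test in the proposed iscsi_log() macro suggests. printf() stands in for printk():

```c
#include <stdio.h>

/* Assumed ordering: larger values are more verbose. */
enum iscsi_log_level {
	ISCSI_LOG_ERROR,
	ISCSI_LOG_WARN,
	ISCSI_LOG_INFO,
	ISCSI_LOG_DEBUG,
	ISCSI_LOG_TRACE,
};

/* The level-to-string switch as an ordinary function: its size no longer
 * grows every call site, and the compiler is still free to inline it. */
static const char *iscsi_log_level_str(enum iscsi_log_level level)
{
	switch (level) {
	case ISCSI_LOG_ERROR: return "ERROR";
	case ISCSI_LOG_WARN:  return "WARN";
	case ISCSI_LOG_INFO:  return "INFO";
	case ISCSI_LOG_DEBUG: return "DEBUG";
	case ISCSI_LOG_TRACE: return "TRACE";
	}
	return "UNKNOWN";
}

/* Only the variadic part stays a macro (GNU "arg..." syntax, as in the
 * proposed macro); the level string is passed as a "%s" argument. */
#define iscsi_log(level, fmt, arg...) \
	printf("%s: " fmt, iscsi_log_level_str(level), ##arg)
```

For example, `iscsi_log(ISCSI_LOG_INFO, "conn %d up\n", 1)` prints `INFO: conn 1 up`. With this split, the size of the switch matters only once, in the function body, rather than at every logging call.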

Erez

Re: Suggestion for new logging mechanism in open-iscsi

2010-01-14 Thread Erez Zilber

On Thu, Jan 14, 2010 at 10:44 AM, Ulrich Windl
 wrote:
> On 13 Jan 2010 at 18:56, Erez Zilber wrote:
>
> Hi!
>
> I wonder whether the #define could be replaced with an inline function.
> The readability would be much better, and the code should be more or
> less the same, unless you need special macro processing. For code size
> we could even try to make it a "normal" routine (i.e. not inlined). The
> performance might suffer a little bit, however.
>
> Also: With "-O3" gcc usually automatically inlines short "static"
> (locally defined?) routines.
>

We can make it inline instead of using a macro. I used macros because
this is what we had till now, and I didn't want to change that. The
only problem with inline functions is that it's up to the compiler to
decide if the function will be inlined. This may affect performance in
some cases.

Erez

Re: Suggestion for new logging mechanism in open-iscsi

2010-01-13 Thread Erez Zilber

On Wed, Dec 16, 2009 at 5:31 PM, Erez Zilber  wrote:
> On Wed, Dec 2, 2009 at 10:55 AM, Erez Zilber  wrote:
>> I'd like to make some changes in the logging in open-iscsi. The
>> current status is as follows:
>>
>> kernel modules:
>>
>> * We use iscsi_cls_session_printk & iscsi_cls_conn_printk in
>> scsi_transport_iscsi.c. They are sometimes wrapped by macros (e.g.
>> ISCSI_DBG_TRANS_SESSION). These macros use KERN_INFO and are
>> controlled by module parameters.
>>
>> * We use iscsi_session_printk & iscsi_conn_printk for the rest of the
>> kernel code. These macros wrap iscsi_cls_session_printk &
>> iscsi_cls_conn_printk accordingly. They are sometimes wrapped by
>> macros (e.g. ISCSI_SW_TCP_DBG). These macros use KERN_INFO and are
>> controlled by module parameters.
>>
>> * We sometimes use printk calls.
>>
>> userspace:
>>
>> We use log_warning, log_error & log_debug. They depend on the logging
>> level that we use (0-8). if (log_level > level), the log is sent to
>> syslog with the appropriate log level (LOG_WARNING/LOG_ERR/LOG_DEBUG).
>>
>> My motivation: with the current logging mechanism, if an error occurs,
>> I'm unable to tell exactly what happened. The default logging level is
>> too low. Increasing it affects performance. Another problem is that
>> open-iscsi has too many logging mechanisms.
>>
>> I suggest that:
>> 1. For kernel modules, we will have 'events' (or any better name that
>> you suggest) like 'session', 'conn', 'eh', 'cmd' etc. For each event,
>> we will have a logging level. For example, the user may want to set
>> the 'conn' event to 'DEBUG'. It means that we will print all conn
>> related logs that are DEBUG and above (e.g. WARNING, ERROR).
>
> I suggest that each kernel module will have its own events. Each event
> will be represented by a module parameter (with some default value).
>
>> 2. For userspace code, we could do the same (i.e. have events and a
>> log level per event).
>
> Regarding the 'events' in userspace - we will have events A, B & C for
> iscsid and events D, E & F for iscsiadm. For each event, we will
> probably have a default logging level. The user may want to run with
> another logging level for each event. For iscsid, I suggest that we
> add this to iscsid.conf. For iscsiadm, the user will be able to do
> something like:
>
> iscsiadm -d some_level - this will set all events to 'some_level'
> iscsiadm -dE level_for_E -dF level_for_F - this will set the event 'E'
> to 'level_for_E' and the event 'F' to 'level_for_F'. The event 'D'
> will use the default logging level.
>
> Comments?
>
> Thanks,
> Erez
>
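The per-event levels from points 1 and 2 could be modelled as one threshold per event, where a message is printed if it is at least as severe as its event's threshold. Below is a minimal sketch. The event and level names are hypothetical (the proposal does not fix them), and so is the WARN default:

```c
#include <stdbool.h>

/* Hypothetical events and levels; lower level values are more severe. */
enum log_event { EV_SESSION, EV_CONN, EV_EH, EV_CMD, EV_MAX };
enum log_level { LVL_ERROR, LVL_WARN, LVL_INFO, LVL_DEBUG, LVL_TRACE };

/* One threshold per event, e.g. filled from module parameters (kernel)
 * or from iscsid.conf / per-event -d options (userspace). */
static enum log_level event_level[EV_MAX] = {
	[EV_SESSION] = LVL_WARN,
	[EV_CONN]    = LVL_WARN,
	[EV_EH]      = LVL_WARN,
	[EV_CMD]     = LVL_WARN,
};

/* "-d some_level": set every event to the same level. */
static void set_all_levels(enum log_level level)
{
	for (int ev = 0; ev < EV_MAX; ev++)
		event_level[ev] = level;
}

/* Setting 'conn' to DEBUG prints conn messages at DEBUG, INFO, WARN
 * and ERROR, but not TRACE. */
static bool should_log(enum log_event ev, enum log_level msg_level)
{
	return msg_level <= event_level[ev];
}
```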

I've started working on the new logging. I've started with iscsi_tcp.
Here's a glimpse of the general idea. If you have comments on the
general implementation, let's discuss them now. Later, it will be much
more difficult for me.

Here it is:

Added the following code in libiscsi.h:

#define iscsi_log(log_level, dev, dbg_fmt, arg...)  \
do {\
char *log_level_str;\
switch (log_level) {\
case ISCSI_LOG_ERROR:   \
log_level_str = "ERROR";\
break;  \
case ISCSI_LOG_WARN:\
log_level_str = "WARN"; \
break;  \
case ISCSI_LOG_INFO:\
log_level_str = "INFO"; \
break;  \
case ISCSI_LOG_DEBUG:   \
log_level_str = "DEBUG";\
break;  \
case ISCSI_LOG_TRACE:   \
log_level_str = "TRACE";\
break;  \
}   \
if (log_level > ISCSI_LOG_INFO) {   \

kernel oops in resched_task

2010-01-06 Thread Erez Zilber

Hi,

I got this oops while running open-iscsi on a CentOS 5.3 machine
(I don't know how to reproduce it). It crashes while trying to wake up
the workqueue after queuecommand was called. Has anyone seen
something similar?

Jan  2 22:24:27 172.16.9.55 RIP: 0010:[]
Jan  2 22:24:27 172.16.9.55  [] resched_task+0x12/0x65
Jan  2 22:24:27 172.16.9.55 RSP: 0018:81033f4e3948  EFLAGS: 00010097
Jan  2 22:24:27 172.16.9.55 RAX: 80420400 RBX: a7784ff72815 RCX: 81033f4e2000
Jan  2 22:24:27 172.16.9.55 RDX: 3f4e2048 RSI: 810302f0d0f8 RDI: 8103307e47a0
Jan  2 22:24:27 172.16.9.55 RBP: 81033f4e3948 R08: 810653e3fea0 R09: 0020
Jan  2 22:24:27 172.16.9.55 R10:  R11:  R12: 0007
Jan  2 22:24:27 172.16.9.55 R13: 81065d7990c0 R14: 0007 R15: 81000901c480
Jan  2 22:24:27 172.16.9.55 FS:  2aac40713940() GS:81010dae53c0() knlGS:
Jan  2 22:24:27 172.16.9.55 CS:  0010 DS:  ES:  CR0: 80050033
Jan  2 22:24:27 172.16.9.55 CR2: 00017aae7c80 CR3: 0005ea57f000 CR4: 06e0
Jan  2 22:24:27 172.16.9.55 Process kal (pid: 15472, threadinfo 81033f4e2000, task 8103307e47a0)
Jan  2 22:24:27 172.16.9.55 Stack:
Jan  2 22:24:27 172.16.9.55  81033f4e39e8
Jan  2 22:24:27 172.16.9.55  8004676f
Jan  2 22:24:27 172.16.9.55  307e47a0
Jan  2 22:24:27 172.16.9.55  0001
Jan  2 22:24:27 172.16.9.55
Jan  2 22:24:27 172.16.9.55  000700115288
Jan  2 22:24:27 172.16.9.55  0001
Jan  2 22:24:27 172.16.9.55  0007
Jan  2 22:24:27 172.16.9.55  
Jan  2 22:24:27 172.16.9.55
Jan  2 22:24:27 172.16.9.55  663135343732
Jan  2 22:24:27 172.16.9.55  80012e80
Jan  2 22:24:27 172.16.9.55  0010
Jan  2 22:24:27 172.16.9.55  
Jan  2 22:24:27 172.16.9.55
Jan  2 22:24:27 172.16.9.55 Call Trace:
Jan  2 22:24:27 172.16.9.55  [] try_to_wake_up+0x422/0x481
Jan  2 22:24:27 172.16.9.55  [] get_request+0x1a6/0x36d
Jan  2 22:24:27 172.16.9.55  [] autoremove_wake_function+0x9/0x2e
Jan  2 22:24:27 172.16.9.55  [] __wake_up_common+0x3e/0x68
Jan  2 22:24:27 172.16.9.55  [] __wake_up+0x38/0x4f
Jan  2 22:24:27 172.16.9.55  [] vprintk+0x2b2/0x317
Jan  2 22:24:27 172.16.9.55  [] generic_make_request+0x1d2/0x1e9
Jan  2 22:24:27 172.16.9.55  [] printk+0x52/0xbd
Jan  2 22:24:27 172.16.9.55  [] __queue_work+0x43/0x53
Jan  2 22:24:27 172.16.9.55  [] queue_work+0x4e/0x57
Jan  2 22:24:27 172.16.9.55  [] queue_work+0x4e/0x57
Jan  2 22:24:27 172.16.9.55  [] :libiscsi:iscsi_queuecommand+0x40e/0x434
Jan  2 22:24:27 172.16.9.55  [] :scsi_mod:scsi_done+0x0/0x18
Jan  2 22:24:27 172.16.9.55  [] __sched_text_start+0x78/0xbd5
Jan  2 22:24:27 172.16.9.55  [] elv_next_request+0x196/0x1a6
Jan  2 22:24:27 172.16.9.55  [] do_gettimeofday+0x40/0x8f
Jan  2 22:24:27 172.16.9.55  [] getnstimeofday+0x10/0x28
Jan  2 22:24:27 172.16.9.55  [] io_schedule+0x3f/0x67
Jan  2 22:24:27 172.16.9.55  [] __blockdev_direct_IO+0x8b0/0xa19
Jan  2 22:24:27 172.16.9.55  [] blkdev_direct_IO+0x32/0x37
Jan  2 22:24:27 172.16.9.55  [] blkdev_get_blocks+0x0/0x96
Jan  2 22:24:27 172.16.9.55  [] __generic_file_aio_read+0xb8/0x190
Jan  2 22:24:27 172.16.9.55  [] generic_file_read+0xac/0xc5
Jan  2 22:24:27 172.16.9.55  [] autoremove_wake_function+0x0/0x2e
Jan  2 22:24:27 172.16.9.55  [] thread_return+0x62/0xfe
Jan  2 22:24:27 172.16.9.55  [] audit_syscall_entry+0x16e/0x1a1
Jan  2 22:24:27 172.16.9.55  [] vfs_read+0xcb/0x171
Jan  2 22:24:27 172.16.9.55  [] sys_pread64+0x50/0x70
Jan  2 22:24:27 172.16.9.55  [] tracesys+0x71/0xe0
Jan  2 22:24:27 172.16.9.55  [] tracesys+0xd5/0xe0
Jan  2 22:24:27 172.16.9.55
Jan  2 22:24:27 172.16.9.55
Jan  2 22:24:27 172.16.9.55 Code: 48 8b 14 d5 40 7a 3d 80 48 03 42 08 8b 00 85 c0 7e 0a 0f 0b
Jan  2 22:24:27 172.16.9.55
Jan  2 22:24:27 172.16.9.55 RIP
Jan  2 22:24:27 172.16.9.55  [] resched_task+0x12/0x65
Jan  2 22:24:27 172.16.9.55  RSP 
Jan  2 22:24:28 172.16.9.55 BUG: warning at arch/x86_64/kernel/crash.c:148/nmi_shootdown_cpus() (Tainted: GF)
Jan  2 22:24:28 172.16.9.55
Jan  2 22:24:28 172.16.9.55 Call Trace:
Jan  2 22:24:28 172.16.9.55  [] machine_crash_shutdown+0xaa/0xf3
Jan  2 22:24:28 172.16.9.55  [] crash_kexec+0xcc/0xe8
Jan  2 22:24:28 172.16.9.55  [] resched_task+0x12/0x65
Jan  2 22:24:28 172.16.9.55  [] __die+0xf6/0xff
Jan  2 22:24:28 172.16.9.55  [] do_

Re: [PATCH] Maintain a list of nop-out PDUs that almost timed out

2009-12-28 Thread Erez Zilber

On Sun, Dec 20, 2009 at 3:57 PM, Erez Zilber  wrote:
> On Thu, Dec 17, 2009 at 5:01 AM, Mike Christie  wrote:
>> Erez Zilber wrote:
>>> On Tue, Dec 15, 2009 at 10:34 AM, Mike Christie  
>>> wrote:
>>>> Erez Zilber wrote:
>>>>> Maintain a list of nop-out PDUs that almost timed out.
>>>>> With this information, you can understand and debug the
>>>>> whole system: you can check your target and see what caused
>>>>> it to be so slow on that specific time, you can see if your
>>>>> network was very busy during that time etc.
>>>>>
>>>>> Signed-off-by: Erez Zilber 
>>>>>
>>>> Sorry for the late reply. Thanks for doing this work!
>>>>
>>>>
>>>> @@ -88,11 +89,12 @@ static struct option const long_options[] =
>>>>         {"killiscsid", required_argument, NULL, 'k'},
>>>>         {"debug", required_argument, NULL, 'd'},
>>>>         {"show", no_argument, NULL, 'S'},
>>>> +       {"noopinfo", required_argument, NULL, 'N'},
>>>>         {"version", no_argument, NULL, 'V'},
>>>>         {"help", no_argument, NULL, 'h'},
>>>>         {NULL, 0, NULL, 0},
>>>>  };
>>>>
>>>>
>>>> Do you think you want something more generic, like a get-debug-info type
>>>> of feature? Maybe in the future we could also show how many times a scsi
>>>> command has timed out or almost timed out (user can adjust scsi timer
>>>> then) or how many times the scsi eh has fired and how many times an
>>>> abort, lu or target reset has timed out?
>>>>
>>>> So maybe it would be like get stats, where the noop info is the
>>>> first stat and in the future we will have more?
>>>
>>> Sounds good. Let's change it to {"debuginfo", required_argument, NULL, 
>>> 'i'}. OK?
>>
>> Ok with me.
>>
>>
>>>>
>>>>
>>>>
>>>> @@ -994,8 +997,25 @@ static int iscsi_nop_out_rsp(struct iscsi_task *task,
>>>>                 if (iscsi_recv_pdu(conn->cls_conn, (struct iscsi_hdr *)nop,
>>>>                                    data, datalen))
>>>>                         rc = ISCSI_ERR_CONN_FAILED;
>>>> -       } else
>>>> +       } else {
>>>> +               ping_delay = jiffies - conn->last_ping;
>>>> +
>>>> +               /* Check if the ping has almost timed out */
>>>> +               if (ping_delay >=
>>>> +                   (session->noop_threshold * (conn->ping_timeout * HZ)) /
>>>> +                   100) {
>>>> +                       mutex_lock(&conn->noop_info_mutex);
>>>>
>>>>
>>>>
>>>> We cannot use a mutex here, because we run from a bottom half (softirq
>>>> for iscsi and a tasklet for iser), and we cannot sleep in a bh since
>>>> there is no process context.
>>>
>>> You're right. I'll replace the mutex with a spinlock.
>>>
>>>>
>>>> +                       idx = conn->noop_info_arr_idx;
>>>> +                       conn->noop_info_arr[idx].tv = conn->tmp_noop_tv;
>>>> +                       conn->noop_info_arr[idx].elapsed_time_msec =
>>>> ping_delay;
>>>>
>>>> I think ping_delay is just in jiffies, so you have to do some sort of
>>>> jiffies_to_msec call (I cannot remember the name of the helper but
>>>> there is one).
>>>
>>> I think it should be: conn->noop_info_arr[idx].elapsed_time_msec =
>>> ping_delay * HZ / 1000
>>
>>
>>
>> I think the function is jiffies_to_msecs()
>>
>>
>>
>>>
>>>>
>>>> +                       conn->noop_info_arr[idx].init = 1;
>>>> +                       conn->noop_info_arr_idx =
>>>> +                               (conn->noop_info_arr_idx + 1) %
>>>> NOOP_INFO_ARR_LEN;
>>>>
>>>>
>>>> I am not getting the reason for the division?
>>>
>>> Are you talking about "conn->noop_info_arr_idx =
>>> (conn->noop_info_arr_idx + 1) % NOOP_INFO_ARR_LEN"? It's a cyclic
>>> array.
>>>
>>
>> Ah, nevermind :)
>>
>> --
>>
>
> Attached with the fixes that we discussed. I tested it with a 2.6.18
> kernel. I was able to apply the following patches:
>
> * 2.6.14-23_compat.patch
> * 2.6.24_compat.patch
> * 2.6.26_compat.patch
> * 2.6.27_compat.patch
>
> What about the following patches?
> * 2.6.14-19_compat.patch
> * 2.6.16-suse.patch
> * 2.6.20-21_compat.patch
>
> Are they still maintained? If not, can we throw (at least some of) them away?
>
> Erez
>
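The pieces discussed above (the percentage threshold, the jiffies-to-milliseconds conversion and the cyclic index) fit together roughly as follows. This is a userspace sketch that assumes HZ = 250 and a 4-entry array; neither value appears in the thread. A local ticks_to_msecs() stands in for the kernel's jiffies_to_msecs(), and the spinlock the kernel version needs in bottom-half context is omitted:

```c
#include <stdbool.h>

#define HZ 250                /* assumption: tick rate for this sketch */
#define NOOP_INFO_ARR_LEN 4   /* assumption: real length not shown */

struct noop_record {
	unsigned int elapsed_time_msec;
	short init;
};

struct noop_ring {
	struct noop_record arr[NOOP_INFO_ARR_LEN];
	int idx;   /* next slot to fill; wraps to overwrite the oldest */
};

/* Userspace stand-in for jiffies_to_msecs(): ms = ticks * 1000 / HZ. */
static unsigned int ticks_to_msecs(unsigned long ticks)
{
	return (unsigned int)(ticks * 1000 / HZ);
}

/* Record the ping if its delay reached noop_threshold percent of
 * ping_timeout seconds, as in the patch's check. Returns true if stored. */
static bool record_if_slow(struct noop_ring *ring, unsigned long ping_delay,
			   unsigned int noop_threshold, unsigned int ping_timeout)
{
	if (ping_delay < (noop_threshold * (ping_timeout * HZ)) / 100)
		return false;
	ring->arr[ring->idx].elapsed_time_msec = ticks_to_msecs(ping_delay);
	ring->arr[ring->idx].init = 1;
	ring->idx = (ring->idx + 1) % NOOP_INFO_ARR_LEN;  /* cyclic array */
	return true;
}
```

Note the direction of the conversion. With HZ = 250, a 1000-tick delay is 1000 * 1000 / 250 = 4000 ms, whereas `ping_delay * HZ / 1000` would give 250.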

Found a small (and stupid) bug in my patch:

diff --git a/usr/iscsiadm.c b/usr/iscsiadm.c
index 6188f43..ee4cfc4 100644
--- a/usr/iscsiadm.c
+++ b/usr/iscsiadm.c
@@ -94,7 +94,7 @@ static struct option const long_options[] =
{"help", no_argument, NULL, 'h'},
{NULL, 0, NULL, 0},
 };
-static char *short_options = "RlVhm:p:P:T:H:I:U:k:L:d:r:n:v:o:sSt:u:i:";
+static char *short_options = "RlVhm:p:P:T:H:I:U:k:L:d:r:n:v:o:sSt:ui:";

 static void usage(int status)
 {
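The fix matters because, in a getopt() option string, a ':' after a letter means that option requires an argument. With "u:", a plain `iscsiadm -m node -u` (-u is the logout flag) fails to parse -u because nothing follows it. If another option follows, -u would swallow it as its argument instead. Below is a small sketch using POSIX getopt() and the two option strings from the diff. `scan()` and `struct scan_result` are hypothetical helpers:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* The option strings before and after the fix, as in the diff. */
static const char *const BUGGY = "RlVhm:p:P:T:H:I:U:k:L:d:r:n:v:o:sSt:u:i:";
static const char *const FIXED = "RlVhm:p:P:T:H:I:U:k:L:d:r:n:v:o:sSt:ui:";

struct scan_result {
	int saw_u;    /* getopt() returned 'u' */
	char *m_arg;  /* argument given to -m, or NULL */
};

static struct scan_result scan(const char *optstring, int argc, char *argv[])
{
	struct scan_result r = { 0, NULL };
	int c;

	optind = 1;  /* restart the scan so scan() can be called repeatedly */
	opterr = 0;  /* no diagnostics; a missing argument just returns '?' */
	while ((c = getopt(argc, argv, optstring)) != -1) {
		if (c == 'u')
			r.saw_u = 1;
		else if (c == 'm')
			r.m_arg = optarg;
	}
	return r;
}
```

If you parse {"iscsiadm", "-m", "node", "-u"} with FIXED, saw_u is set. With BUGGY, getopt() returns '?' for -u instead.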

Mike - I will send an updated patch after you review the rest of the patch.

Erez


Re: Help:insmod: error inserting ib_iser.ko

2009-12-21 Thread Erez Zilber

What is the output of 'iscsiadm -m session -P 3'? You should see something like:


Attached SCSI devices:

Host Number: 29 State: running
scsi29 Channel 00 Id 0 Lun: 0
Attached scsi disk sdb  State: running

If you have a session without an attached device, maybe your iSCSI
target exposes a target (e.g. iqn.something) but there is no device
attached to that target. Which target do you use?

Erez

On Mon, Dec 21, 2009 at 10:31 AM, Xintao Zhang
 wrote:
> Hi Erez
> I can see the target disk in dmesg -c, but there is no device in /dev/.
> Can you help me?
>
> 2009/12/21 Xintao Zhang 
>>
>> Hi Erez
>> I did it with your method, and I can see the target disk.
>> I want to know what tools I can use if I use the ib_iSER module, or how to
>> stop the error prompt from appearing on the screen.
>> Do you have a good idea?
>>
>> 2009/12/20 Erez Zilber 
>>>
>>> Please do the following:
>>> 1. Stop open-iscsi: make sure that 'lsmod | grep iscsi' & 'ps aux |
>>> grep iscsi' return nothing.
>>> 2. Make sure that no other version of open-iscsi is installed: run
>>> 'rpm -qa | grep iscsi'.
>>> 3. Run 'dmesg -c'.
>>> 4. Start open-iscsi: /etc/init.d/open-iscsi start
>>> 5. Send the output of 'lsmod | grep iscsi' & 'dmesg -c'.
>>>
>>> Erez
>>>
>>> On Thu, Dec 17, 2009 at 4:23 AM, Xintao Zhang
>>>  wrote:
>>> > I started open-iscsi with your method, but a problem appeared.
>>> > Using tail -f /var/log/messages, I can't find the iscsi device.
>>> > Can you help me?
>>> > 2009/12/17 Xintao Zhang 
>>> >>
>>> >> Thanks Erez
>>> >>
>>> >> 2009/12/16 Erez Zilber 
>>> >>>
>>> >>> I guess that you want to run open-iscsi over iscsi_tcp which is the
>>> >>> default open-iscsi transport. Assuming that all other modules were
>>> >>> loaded successfully (libiscsi, iscsi_tcp, scsi_transport_iscsi and
>>> >>> libiscsi_tcp (depends on the version of open-iscsi that you use)),
>>> >>> you
>>> >>> don't need to do anything. open-iscsi has multiple transports
>>> >>> (iscsi_tcp, ib_iser, bnx2i, cxgb3i). The error that you got means
>>> >>> that
>>> >>> the ib_iser transport could not be loaded, but you don't need iSER
>>> >>> anyway.
>>> >>>
>>> >>> Erez
>>> >>>
>>> >>> On Wed, Dec 16, 2009 at 11:47 AM, Xintao Zhang
>>> >>>  wrote:
>>> >>> > I don't want to use "iser".
>>> >>> > I want open-iscsi to run normally. Can you help me?
>>> >>> >
>>> >>> > 2009/12/16 Erez Zilber 
>>> >>> >>
>>> >>> >> On Tue, Dec 15, 2009 at 12:26 PM, DeepBlue
>>> >>> >> 
>>> >>> >> wrote:
>>> >>> >> > There is an error when I insert the ib_iser.ko module, as
>>> >>> >> > follows:
>>> >>> >> > 
>>> >>> >> > [r...@localhost init.d]# insmod
>>> >>> >> > /lib/modules/2.6.18-8.el5/kernel/
>>> >>> >> > drivers/infiniband/ulp/iser/ib_iser.ko
>>> >>> >> > insmod: error inserting
>>> >>> >> > '/lib/modules/2.6.18-8.el5/kernel/drivers/
>>> >>> >> > infiniband/ulp/iser/ib_iser.ko': -1 Unknown symbol in module
>>> >>> >> > 
>>> >>> >> > my linux kernel 2.6.18-8.el5
>>> >>> >> > who can help me?
>>> >>> >> >
>>> >>> >>
>>> >>> >> This is because your ib_iser module was built against open-iscsi
>>> >>> >> from
>>> >>> >> the redhat kernel (2.6.18-8.el5). You are currently running
>>> >>> >> open-iscsi
>>> >>> >> modules that you took from open-iscsi.org. If you run 'dmesg -c',
>>> >>> >> you'll see the list of symbols that it disagrees on.
>>

Re: Help:insmod: error inserting ib_iser.ko

2009-12-21 Thread Erez Zilber

You shouldn't care too much about the ib_iser error message. If you
don't want to see it, edit /etc/init.d/open-iscsi and remove the
modprobe line that loads ib_iser.

Erez

On Mon, Dec 21, 2009 at 10:19 AM, Xintao Zhang
 wrote:
> Hi Erez
> I did it with your method, and I can see the target disk.
> I want to know what tools I can use if I use the ib_iSER module, or how to
> stop the error prompt from appearing on the screen.
> Do you have a good idea?
>
> 2009/12/20 Erez Zilber 
>>
>> Please do the following:
>> 1. Stop open-iscsi: make sure that 'lsmod | grep iscsi' & 'ps aux |
>> grep iscsi' return nothing.
>> 2. Make sure that no other version of open-iscsi is installed: run
>> 'rpm -qa | grep iscsi'.
>> 3. Run 'dmesg -c'.
>> 4. Start open-iscsi: /etc/init.d/open-iscsi start
>> 5. Send the output of 'lsmod | grep iscsi' & 'dmesg -c'.
>>
>> Erez
>>
>> On Thu, Dec 17, 2009 at 4:23 AM, Xintao Zhang
>>  wrote:
>> > I started open-iscsi using your method, but there is a problem.
>> > When I run tail -f /var/log/messages, I can't find the iSCSI device.
>> > Can you help me?
>> > 2009/12/17 Xintao Zhang 
>> >>
>> >> Thanks Erez
>> >>
>> >> 2009/12/16 Erez Zilber 
>> >>>
>> >>> I guess that you want to run open-iscsi over iscsi_tcp, which is the
>> >>> default open-iscsi transport. Assuming that all other modules were
>> >>> loaded successfully (libiscsi, iscsi_tcp, scsi_transport_iscsi and
>> >>> libiscsi_tcp, depending on the version of open-iscsi that you use), you
>> >>> don't need to do anything. open-iscsi has multiple transports
>> >>> (iscsi_tcp, ib_iser, bnx2i, cxgb3i). The error that you got means that
>> >>> the ib_iser transport could not be loaded, but you don't need iSER
>> >>> anyway.
>> >>>
>> >>> Erez
>> >>>
>> >>> On Wed, Dec 16, 2009 at 11:47 AM, Xintao Zhang
>> >>>  wrote:
>> >>> > I don't want to use "iser".
>> >>> > I want open-iscsi to run normally. Can you help me?
>> >>> >
>> >>> > 2009/12/16 Erez Zilber 
>> >>> >>
>> >>> >> On Tue, Dec 15, 2009 at 12:26 PM, DeepBlue
>> >>> >> 
>> >>> >> wrote:
>> >>> >> > There is an error when I insert the ib_iser.ko module, as
>> >>> >> > follows:
>> >>> >> >
>> >>> >> > [r...@localhost init.d]# insmod /lib/modules/2.6.18-8.el5/kernel/drivers/infiniband/ulp/iser/ib_iser.ko
>> >>> >> > insmod: error inserting '/lib/modules/2.6.18-8.el5/kernel/drivers/infiniband/ulp/iser/ib_iser.ko': -1 Unknown symbol in module
>> >>> >> >
>> >>> >> > My Linux kernel is 2.6.18-8.el5.
>> >>> >> > Can anyone help me?
>> >>> >> >
>> >>> >>
>> >>> >> This is because your ib_iser module was built against the open-iscsi
>> >>> >> modules from the Red Hat kernel (2.6.18-8.el5), but you are currently
>> >>> >> running open-iscsi modules that you took from open-iscsi.org. If you
>> >>> >> run 'dmesg -c', you'll see the list of symbols that they disagree on.
>> >>> >>
>> >>> >> What can you do? Assuming that you want to use iSER, use the
>> >>> >> open-iscsi kernel modules that ship with the kernel together with the
>> >>> >> open-iscsi userspace tools from open-iscsi.org. Another (easier)
>> >>> >> option is to install OFED, which includes open-iscsi with iSER
>> >>> >> support.
>> >>> >>
>> >>> >> Erez
>> >>> >>
>> >>> >> --
>> >>> >>
>> >>> >> You received this message because you are subscribed to the Google
>> >>> >> Groups
>> >>> >> "open-iscsi" group.
>> >>> >> To post to this group, send email to open-is...@googlegroups.com.
>> >>> >> To unsubscribe from

Re: Help:insmod: error inserting ib_iser.ko

2009-12-20 Thread Erez Zilber

Please do the following:
1. Stop open-iscsi: make sure that 'lsmod | grep iscsi' & 'ps aux |
grep iscsi' return nothing.
2. Make sure that no other version of open-iscsi is installed: run
'rpm -qa | grep iscsi'.
3. Run 'dmesg -c'.
4. Start open-iscsi: /etc/init.d/open-iscsi start
5. Send the output of 'lsmod | grep iscsi' & 'dmesg -c'.

Erez

On Thu, Dec 17, 2009 at 4:23 AM, Xintao Zhang
 wrote:
> I started open-iscsi using your method, but there is a problem.
> When I run tail -f /var/log/messages, I can't find the iSCSI device.
> Can you help me?
> 2009/12/17 Xintao Zhang 
>>
>> Thanks Erez
>>
>> 2009/12/16 Erez Zilber 
>>>
>>> I guess that you want to run open-iscsi over iscsi_tcp, which is the
>>> default open-iscsi transport. Assuming that all other modules were
>>> loaded successfully (libiscsi, iscsi_tcp, scsi_transport_iscsi and
>>> libiscsi_tcp, depending on the version of open-iscsi that you use), you
>>> don't need to do anything. open-iscsi has multiple transports
>>> (iscsi_tcp, ib_iser, bnx2i, cxgb3i). The error that you got means that
>>> the ib_iser transport could not be loaded, but you don't need iSER
>>> anyway.
>>>
>>> Erez
>>>
>>> On Wed, Dec 16, 2009 at 11:47 AM, Xintao Zhang
>>>  wrote:
>>> > I don't want to use "iser".
>>> > I want open-iscsi to run normally. Can you help me?
>>> >
>>> > 2009/12/16 Erez Zilber 
>>> >>
>>> >> On Tue, Dec 15, 2009 at 12:26 PM, DeepBlue
>>> >> 
>>> >> wrote:
>>> >> > There is an error when I insert the ib_iser.ko module, as follows:
>>> >> >
>>> >> > [r...@localhost init.d]# insmod /lib/modules/2.6.18-8.el5/kernel/drivers/infiniband/ulp/iser/ib_iser.ko
>>> >> > insmod: error inserting '/lib/modules/2.6.18-8.el5/kernel/drivers/infiniband/ulp/iser/ib_iser.ko': -1 Unknown symbol in module
>>> >> >
>>> >> > My Linux kernel is 2.6.18-8.el5.
>>> >> > Can anyone help me?
>>> >> >
>>> >>
>>> >> This is because your ib_iser module was built against the open-iscsi
>>> >> modules from the Red Hat kernel (2.6.18-8.el5), but you are currently
>>> >> running open-iscsi modules that you took from open-iscsi.org. If you
>>> >> run 'dmesg -c', you'll see the list of symbols that they disagree on.
>>> >>
>>> >> What can you do? Assuming that you want to use iSER, use the open-iscsi
>>> >> kernel modules that ship with the kernel together with the open-iscsi
>>> >> userspace tools from open-iscsi.org. Another (easier) option is to
>>> >> install OFED, which includes open-iscsi with iSER support.
>>> >>
>>> >> Erez
>>> >>
>>> >
>>>
>>
>

Re: Suggestion for new logging mechanism in open-iscsi

2009-12-16 Thread Erez Zilber

On Wed, Dec 2, 2009 at 10:55 AM, Erez Zilber  wrote:
> I'd like to make some changes in the logging in open-iscsi. The
> current status is as follows:
>
> kernel modules:
>
> * We use iscsi_cls_session_printk & iscsi_cls_conn_printk in
> scsi_transport_iscsi.c. They are sometimes wrapped by macros (e.g.
> ISCSI_DBG_TRANS_SESSION). These macros use KERN_INFO and are
> controlled by module parameters.
>
> * We use iscsi_session_printk & iscsi_conn_printk for the rest of the
> kernel code. These macros wrap iscsi_cls_session_printk &
> iscsi_cls_conn_printk accordingly. They are sometimes wrapped by
> macros (e.g. ISCSI_SW_TCP_DBG). These macros use KERN_INFO and are
> controlled by module parameters.
>
> * We sometimes use printk calls.
>
> userspace:
>
> We use log_warning, log_error & log_debug. They depend on the logging
> level that we use (0-8). if (log_level > level), the log is sent to
> syslog with the appropriate log level (LOG_WARNING/LOG_ERR/LOG_DEBUG).
>
> My motivation: with the current logging mechanism, if an error occurs,
> I'm unable to tell exactly what happened. The default logging level is
> too low. Increasing it affects performance. Another problem is that
> open-iscsi has too many logging mechanisms.
>
> I suggest that:
> 1. For kernel modules, we will have 'events' (or any better name that
> you suggest) like 'session', 'conn', 'eh', 'cmd' etc. For each event,
> we will have a logging level. For example, the user may want to set
> the 'conn' event to 'DEBUG'. It means that we will print all conn
> related logs that are DEBUG and above (e.g. WARNING, ERROR).

I suggest that each kernel module have its own events, each represented
by a module parameter with some default value.

> 2. For userspace code, we could do the same (i.e. have events and a
> log level per event).

Regarding the 'events' in userspace - we will have events A, B & C for
iscsid and events D, E & F for iscsiadm. For each event, we will
probably have a default logging level. The user may want to run with
another logging level for each event. For iscsid, I suggest that we
add this to iscsid.conf. For iscsiadm, the user will be able to do
something like:

iscsiadm -d some_level - this will set all events to 'some_level'
iscsiadm -dE level_for_E -dF level_for_F - this will set the event 'E'
to 'level_for_E' and the event 'F' to 'level_for_F'. The event 'D'
will use the default logging level.
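To make the per-event idea concrete, here is a minimal userspace sketch of how "-d level" and a per-event override could be resolved. The event names (borrowed from the kernel list in the original proposal), the numeric levels and all function names are hypothetical; this is not existing iscsiadm or iscsid code, and a kernel version would read the per-event levels from module parameters instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical events; the proposal leaves the real list open. */
enum log_event { EV_SESSION, EV_CONN, EV_EH, EV_CMD, EV_COUNT };

/* Hypothetical levels: a larger value means more verbose. */
enum { LVL_ERROR = 1, LVL_WARNING = 2, LVL_INFO = 3, LVL_DEBUG = 4 };

#define DEFAULT_LEVEL LVL_WARNING

static const char *const event_names[EV_COUNT] = {
    "session", "conn", "eh", "cmd",
};
static int event_level[EV_COUNT];

/* Give every event its default level. */
void log_levels_reset(void)
{
    for (int i = 0; i < EV_COUNT; i++)
        event_level[i] = DEFAULT_LEVEL;
}

/* Like "-d level": set all events to the same level. */
void log_levels_set_all(int level)
{
    for (int i = 0; i < EV_COUNT; i++)
        event_level[i] = level;
}

/* Like a per-event override: set one event by name.
 * Returns 0 on success, -1 for an unknown event name. */
int log_level_set_event(const char *name, int level)
{
    for (int i = 0; i < EV_COUNT; i++) {
        if (strcmp(name, event_names[i]) == 0) {
            event_level[i] = level;
            return 0;
        }
    }
    return -1;
}

/* A message is printed if its level is at or below the event's setting,
 * so an event set to DEBUG also prints INFO, WARNING and ERROR messages. */
bool log_enabled(enum log_event ev, int level)
{
    assert((unsigned)ev < EV_COUNT);
    return level <= event_level[ev];
}
```

Events that were never overridden keep DEFAULT_LEVEL, which matches the behaviour described for event 'D' above.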

Comments?

Thanks,
Erez

Re: Help:insmod: error inserting ib_iser.ko

2009-12-16 Thread Erez Zilber

I guess that you want to run open-iscsi over iscsi_tcp, which is the
default open-iscsi transport. Assuming that all other modules were
loaded successfully (libiscsi, iscsi_tcp, scsi_transport_iscsi and
libiscsi_tcp, depending on the version of open-iscsi that you use), you
don't need to do anything. open-iscsi has multiple transports
(iscsi_tcp, ib_iser, bnx2i, cxgb3i). The error that you got means that
the ib_iser transport could not be loaded, but you don't need iSER
anyway.
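One way to confirm the situation described above (the iscsi_tcp stack is loaded, ib_iser is not needed) is to look at /proc/modules, which is what lsmod reads. The C sketch below is illustrative only: the module names come from this message, libiscsi_tcp is treated as optional because only some open-iscsi versions have it, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `name` is the first field of some line in `modules_text`,
 * which is expected to use the /proc/modules format ("name size ..."). */
bool module_listed(const char *modules_text, const char *name)
{
    size_t len;
    const char *line = modules_text;

    assert(name != NULL && *name != '\0');
    len = strlen(name);
    while (line != NULL && *line != '\0') {
        /* Require the separating space so "libiscsi" does not match
         * the "libiscsi_tcp" line. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return true;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;             /* move to the start of the next line */
    }
    return false;
}

/* Read /proc/modules into buf (NUL-terminated). Returns 0 on success. */
int read_proc_modules(char *buf, size_t cap)
{
    FILE *f;
    size_t n;

    if (cap == 0 || (f = fopen("/proc/modules", "r")) == NULL)
        return -1;
    n = fread(buf, 1, cap - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}

/* Print each missing module of the iscsi_tcp stack and return how many
 * are missing. libiscsi_tcp is only reported, not counted. */
int report_missing_tcp_modules(const char *modules_text)
{
    static const char *const required[] = {
        "scsi_transport_iscsi", "libiscsi", "iscsi_tcp",
    };
    int missing = 0;

    for (size_t i = 0; i < sizeof(required) / sizeof(required[0]); i++) {
        if (!module_listed(modules_text, required[i])) {
            printf("missing: %s\n", required[i]);
            missing++;
        }
    }
    if (!module_listed(modules_text, "libiscsi_tcp"))
        printf("note: libiscsi_tcp not loaded (not present in every version)\n");
    return missing;
}
```

A return value of 0 from report_missing_tcp_modules() means iscsi_tcp is usable even though ib_iser failed to load.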

Erez

On Wed, Dec 16, 2009 at 11:47 AM, Xintao Zhang
 wrote:
> I don't want to use "iser".
> I want open-iscsi to run normally. Can you help me?
>
> 2009/12/16 Erez Zilber 
>>
>> On Tue, Dec 15, 2009 at 12:26 PM, DeepBlue 
>> wrote:
>> > There is an error when I insert the ib_iser.ko module, as follows:
>> >
>> > [r...@localhost init.d]# insmod /lib/modules/2.6.18-8.el5/kernel/drivers/infiniband/ulp/iser/ib_iser.ko
>> > insmod: error inserting '/lib/modules/2.6.18-8.el5/kernel/drivers/infiniband/ulp/iser/ib_iser.ko': -1 Unknown symbol in module
>> >
>> > My Linux kernel is 2.6.18-8.el5.
>> > Can anyone help me?
>> >
>>
>> This is because your ib_iser module was built against the open-iscsi
>> modules from the Red Hat kernel (2.6.18-8.el5), but you are currently
>> running open-iscsi modules that you took from open-iscsi.org. If you run
>> 'dmesg -c', you'll see the list of symbols that they disagree on.
>>
>> What can you do? Assuming that you want to use iSER, use the open-iscsi
>> kernel modules that ship with the kernel together with the open-iscsi
>> userspace tools from open-iscsi.org. Another (easier) option is to
>> install OFED, which includes open-iscsi with iSER support.
>>
>> Erez
>>
>

Re: Help:insmod: error inserting ib_iser.ko

2009-12-15 Thread Erez Zilber

On Tue, Dec 15, 2009 at 12:26 PM, DeepBlue  wrote:
> There is an error when I insert the ib_iser.ko module, as follows:
>
> [r...@localhost init.d]# insmod /lib/modules/2.6.18-8.el5/kernel/drivers/infiniband/ulp/iser/ib_iser.ko
> insmod: error inserting '/lib/modules/2.6.18-8.el5/kernel/drivers/infiniband/ulp/iser/ib_iser.ko': -1 Unknown symbol in module
>
> My Linux kernel is 2.6.18-8.el5.
> Can anyone help me?
>

This is because your ib_iser module was built against the open-iscsi
modules from the Red Hat kernel (2.6.18-8.el5), but you are currently
running open-iscsi modules that you took from open-iscsi.org. If you run
'dmesg -c', you'll see the list of symbols that they disagree on.

What can you do? Assuming that you want to use iSER, use the open-iscsi
kernel modules that ship with the kernel together with the open-iscsi
userspace tools from open-iscsi.org. Another (easier) option is to
install OFED, which includes open-iscsi with iSER support.

Erez

Re: [PATCH] Maintain a list of nop-out PDUs that almost timed out

2009-12-15 Thread Erez Zilber

On Tue, Dec 15, 2009 at 10:34 AM, Mike Christie  wrote:
> Erez Zilber wrote:
>> Maintain a list of nop-out PDUs that almost timed out.
>> With this information, you can understand and debug the
>> whole system: you can check your target and see what caused
>> it to be so slow on that specific time, you can see if your
>> network was very busy during that time etc.
>>
>> Signed-off-by: Erez Zilber 
>>
>
> Sorry for the late reply. Thanks for doing this work!
>
>
> @@ -88,11 +89,12 @@ static struct option const long_options[] =
>         {"killiscsid", required_argument, NULL, 'k'},
>         {"debug", required_argument, NULL, 'd'},
>         {"show", no_argument, NULL, 'S'},
> +       {"noopinfo", required_argument, NULL, 'N'},
>         {"version", no_argument, NULL, 'V'},
>         {"help", no_argument, NULL, 'h'},
>         {NULL, 0, NULL, 0},
>  };
>
>
> Do you think you want something more generic like a get-debug-info type
> of feature? Maybe in the future we could also show how many times a scsi
> command has timed out or almost timed out (user can adjust scsi timer
> then) or how many times the scsi eh has fired and how many times an
> abort, lu or target reset has timed out?
>
> So maybe it would be like a get-stats command, where the noop info is the
> first stat and in the future we will have more?

Sounds good. Let's change it to {"debuginfo", required_argument, NULL, 'i'}. OK?
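A sketch of how the renamed option could be wired up with getopt_long(), following the shape of the long_options table in the patch. Only the "debug" and "help" entries are taken from the diff; the 'i' short option is the one proposed here, while parse_debuginfo() and the meaning of its numeric argument are assumptions.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical subset of the iscsiadm option table, with the generic
 * "debuginfo" option in place of "noopinfo". */
static struct option const long_options[] = {
    {"debug", required_argument, NULL, 'd'},
    {"debuginfo", required_argument, NULL, 'i'},
    {"help", no_argument, NULL, 'h'},
    {NULL, 0, NULL, 0},
};

/* Return the numeric argument of --debuginfo / -i, or -1 if absent.
 * What the number selects (e.g. a session) is left open here. */
int parse_debuginfo(int argc, char *argv[])
{
    int ch, value = -1;

    assert(argc >= 1);
    optind = 0;     /* reset getopt state so the function can be reused */
    opterr = 0;     /* stay quiet about options this sketch ignores */
    while ((ch = getopt_long(argc, argv, "d:i:h", long_options, NULL)) != -1) {
        if (ch == 'i')
            value = atoi(optarg);
    }
    return value;
}
```

A get-stats style command could then dispatch on the returned value, leaving room for the extra statistics Mike lists.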
>
>
>
>
> @@ -994,8 +997,25 @@ static int iscsi_nop_out_rsp(struct iscsi_task *task,
>                 if (iscsi_recv_pdu(conn->cls_conn, (struct iscsi_hdr *)nop,
>                                    data, datalen))
>                         rc = ISCSI_ERR_CONN_FAILED;
> -       } else
> +       } else {
> +               ping_delay = jiffies - conn->last_ping;
> +
> +               /* Check if the ping has almost timed out */
> +               if (ping_delay >=
> +                   (session->noop_threshold * (conn->ping_timeout * HZ)) /
> +                   100) {
> +                       mutex_lock(&conn->noop_info_mutex);
>
>
>
> We cannot use a mutex here, because we run from a bottom half (softirq
> for iscsi and a tasklet for iser), and we cannot sleep in a bh since
> there is no process context.

You're right. I'll replace the mutex with a spinlock.

>
>
> +                       idx = conn->noop_info_arr_idx;
> +                       conn->noop_info_arr[idx].tv = conn->tmp_noop_tv;
> +                       conn->noop_info_arr[idx].elapsed_time_msec = ping_delay;
>
> I think ping_delay is just in jiffies, so you have to do some sort of
> jiffies_to_msec call (I cannot remember the name of the helper but
> there is one).

I think it should be: conn->noop_info_arr[idx].elapsed_time_msec =
jiffies_to_msecs(ping_delay), i.e. ping_delay * 1000 / HZ.
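For reference, a jiffy lasts 1/HZ seconds, so converting a jiffies count to milliseconds multiplies by 1000 and divides by HZ; in the kernel this is what jiffies_to_msecs() from <linux/jiffies.h> does. The userspace model below (hypothetical names, HZ passed in explicitly because it is a kernel build option) shows the conversion together with the "almost timed out" test from the patch, where noop_threshold is a percentage of ping_timeout seconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of jiffies_to_msecs(): one jiffy is 1/hz seconds. */
unsigned long model_jiffies_to_msecs(unsigned long j, unsigned long hz)
{
    assert(hz > 0);
    return j * 1000UL / hz;
}

/* The patch's check: the reply took at least threshold_pct percent of
 * the ping timeout, which is given in seconds and scaled by hz. */
bool ping_almost_timed_out(unsigned long ping_delay_jiffies,
                           unsigned long ping_timeout_sec,
                           unsigned long threshold_pct, unsigned long hz)
{
    return ping_delay_jiffies >=
           (threshold_pct * (ping_timeout_sec * hz)) / 100;
}
```

Multiplying by HZ / 1000 instead would report 125 ms rather than 2000 ms for 500 jiffies at HZ=250.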

>
>
> +                       conn->noop_info_arr[idx].init = 1;
> +                       conn->noop_info_arr_idx =
> +                               (conn->noop_info_arr_idx + 1) % NOOP_INFO_ARR_LEN;
>
>
> I am not getting the reason for the division?

Are you talking about "conn->noop_info_arr_idx =
(conn->noop_info_arr_idx + 1) % NOOP_INFO_ARR_LEN"? It's a cyclic
array.
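The cyclic array can be modelled in userspace as follows. The field names mirror the patch (tv, elapsed_time_msec, init and a write index), but the struct timeval type for tv, the array length of 8 and the helper functions are assumptions, and the spinlock the kernel code needs around the update is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

#define NOOP_INFO_ARR_LEN 8     /* hypothetical length */

struct noop_info {
    struct timeval tv;                  /* when the nop-out was sent */
    unsigned long elapsed_time_msec;    /* how long the reply took */
    int init;                           /* slot holds a valid record */
};

struct noop_ring {
    struct noop_info arr[NOOP_INFO_ARR_LEN];
    size_t idx;                         /* next slot to write */
};

/* Record one ping; once full, the oldest entry is overwritten because
 * the modulo wraps idx from the last slot back to 0. */
void noop_ring_add(struct noop_ring *r, struct timeval tv,
                   unsigned long msec)
{
    r->arr[r->idx].tv = tv;
    r->arr[r->idx].elapsed_time_msec = msec;
    r->arr[r->idx].init = 1;
    r->idx = (r->idx + 1) % NOOP_INFO_ARR_LEN;
}

/* Number of valid records, at most NOOP_INFO_ARR_LEN. */
size_t noop_ring_count(const struct noop_ring *r)
{
    size_t n = 0;

    for (size_t i = 0; i < NOOP_INFO_ARR_LEN; i++)
        n += r->arr[i].init ? 1 : 0;
    return n;
}

/* The i-th oldest record (0 = oldest). */
const struct noop_info *noop_ring_get(const struct noop_ring *r, size_t i)
{
    size_t n = noop_ring_count(r);
    /* Before the array fills up the oldest record is in slot 0; after
     * that, the oldest is the one idx is about to overwrite. */
    size_t start = (n == NOOP_INFO_ARR_LEN) ? r->idx : 0;

    assert(i < n);
    return &r->arr[(start + i) % NOOP_INFO_ARR_LEN];
}
```

Because the slot idx points at holds the oldest record once the array has wrapped, a reader walking forward from idx sees the pings in the order they were recorded.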

>
>
> +                       mutex_unlock(&conn->noop_info_mutex);
> +               }
> +
>                 mod_timer(&conn->transport_timer, jiffies + conn->recv_timeout);
> +       }
>         iscsi_complete_task(task, ISCSI_TASK_COMPLETED);
>         return rc;
>  }
>
>
>
> @@ -2026,6 +2046,12 @@ static void iscsi_check_transport_timeouts(unsigned long data)
>         if (time_before_eq(last_recv + recv_timeout, jiffies)) {
>                 /* send a ping to try to provoke some traffic */
>                 ISCSI_DBG_CONN(conn, "Sending nopout as ping\n");
> +
> +               /* First, save the time of the ping for later use */
> +               mutex_lock(&conn->noop_info_mutex);
>
> Cannot use a mutex here either, because this is run from a timer. Timers
> are run from bhs too.

mutex -> spinlock.

Let me know if you're OK with my suggestions and I'll resend the patch.

Erez

Re: How to make a 2.6..*_compat.patch without sub-directory?

2009-12-14 Thread Erez Zilber

Use git diff --relative (when you're in the open-iscsi/kernel dir).

Erez

On Mon, Dec 14, 2009 at 8:25 PM, Yangkook Kim  wrote:
> I posted a similar message in another thread, but let me ask the same
> question with a different title.
>
> I want to make a kernel compat patch without the "kernel/" sub-directory.
> I used "git diff" to output the patch, but each file header in the
> resulting patch includes the "kernel/" sub-directory, like below.
>
> e.g
> diff --git a/kernel/libiscsi.c b/kernel/libiscsi.c
> index 0b810b6..6ffb49c 100644
> --- a/kernel/libiscsi.c
> +++ b/kernel/libiscsi.c
>
> git history shows me that other people have made a compat patch
> without "kernel/" sub-directory in headers.
>
> Can anybody tell me how to do this?
>
> Thanks,
> Kim
>

Re: Information about iSCSI pings that almost timed out

2009-12-08 Thread Erez Zilber

> Regarding the average delay of a ping request task - we need to have
> the average delay, but we're interested only in the average delay of
> pings that were sent lately (i.e. not pings that were sent a year
> ago). Am I right?
>
> I thought about having a cyclic array of delays in the kernel. It can
> hold the delays of the last X pings (e.g. X = 1000). Whenever the user
> runs 'iscsiadm -m session -s', this array will be sent to userspace
> and we can calc the average delay/standard deviation/whatever you want
> in userland.
>
> Comments?
>
> Erez
>
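Once the array of recent delays reaches userspace, the statistics are plain floating-point code. A possible sketch, assuming the delays arrive as an array of milliseconds (the transfer mechanism and all names here are hypothetical), using Welford's single-pass update for the mean and variance. The standard deviation is sqrt(variance) via sqrt() from <math.h>, which usually needs -lm at link time.

```c
#include <assert.h>
#include <stddef.h>

struct delay_stats {
    size_t n;
    double mean;              /* average delay, msec */
    double variance;          /* population variance, msec^2 */
    unsigned long min, max;   /* extremes, msec */
};

/* Summarise the first n delays copied from the kernel-side cyclic array,
 * so a partially filled array works too. */
struct delay_stats delay_stats_compute(const unsigned long *delays, size_t n)
{
    struct delay_stats s = {0};
    double m2 = 0.0;          /* running sum of squared deviations */

    s.n = n;
    if (n == 0)
        return s;
    assert(delays != NULL);
    s.min = s.max = delays[0];
    for (size_t i = 0; i < n; i++) {
        /* Welford's update: numerically stable, one pass. */
        double x = (double)delays[i];
        double delta = x - s.mean;

        s.mean += delta / (double)(i + 1);
        m2 += delta * (x - s.mean);
        if (delays[i] < s.min)
            s.min = delays[i];
        if (delays[i] > s.max)
            s.max = delays[i];
    }
    s.variance = m2 / (double)n;
    return s;
}
```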

Does anyone have comments on this? I'd like to start working on it and need
some feedback.

Thanks,
Erez

Re: Suggestion for new logging mechanism in open-iscsi

2009-12-07 Thread Erez Zilber

On Mon, Dec 7, 2009 at 7:24 PM, Mike Christie  wrote:
> Erez Zilber wrote:
>> I'd like to make some changes in the logging in open-iscsi. The
>> current status is as follows:
>>
>> kernel modules:
>>
>> * We use iscsi_cls_session_printk & iscsi_cls_conn_printk in
>> scsi_transport_iscsi.c. They are sometimes wrapped by macros (e.g.
>> ISCSI_DBG_TRANS_SESSION). These macros use KERN_INFO and are
>> controlled by module parameters.
>>
>> * We use iscsi_session_printk & iscsi_conn_printk for the rest of the
>> kernel code. These macros wrap iscsi_cls_session_printk &
>> iscsi_cls_conn_printk accordingly. They are sometimes wrapped by
>> macros (e.g. ISCSI_SW_TCP_DBG). These macros use KERN_INFO and are
>> controlled by module parameters.
>>
>> * We sometimes use printk calls.
>>
>> userspace:
>>
>> We use log_warning, log_error & log_debug. They depend on the logging
>> level that we use (0-8). if (log_level > level), the log is sent to
>> syslog with the appropriate log level (LOG_WARNING/LOG_ERR/LOG_DEBUG).
>>
>> My motivation: with the current logging mechanism, if an error occurs,
>> I'm unable to tell exactly what happened. The default logging level is
>> too low. Increasing it affects performance. Another problem is that
>> open-iscsi has too many logging mechanisms.
>>
>> I suggest that:
>> 1. For kernel modules, we will have 'events' (or any better name that
>> you suggest) like 'session', 'conn', 'eh', 'cmd' etc. For each event,
>> we will have a logging level. For example, the user may want to set
>> the 'conn' event to 'DEBUG'. It means that we will print all conn
>> related logs that are DEBUG and above (e.g. WARNING, ERROR).
>> 2. For userspace code, we could do the same (i.e. have events and a
>> log level per event).
>> 3. Userspace logging uses the 'daemon' facility. This should
>> definitely be the default, but we should allow the user to use another
>> facility. The motivation for doing so is that if we want to send all
>> iscsid logs to a separate file, we can set it to 'local2' for example
>> (instead of 'daemon').
>>
>
> Sorry for the late reply.
>
> This sounds nice.
>
> When you do this, could you also unify what gets printed to identify
> which object is logging the message. Currently the kernel prints a session or
> conn sysfs/bus id (session1 or connection1:2), but userspace prints
> whatever it wants. Sometimes it just prints out a log with nothing so
> you have no idea where it came from, and sometimes it prints a id that
> looks like a sysfs one.
>

Sure. The only thing that I don't know is how to get the
sessionX/connectionY string in userspace. Where is it stored?

Thanks,
Erez

Re: Suggestion for new logging mechanism in open-iscsi

2009-12-02 Thread Erez Zilber

On Wed, Dec 2, 2009 at 3:02 PM, Ulrich Windl
 wrote:
> On 2 Dec 2009 at 10:55, Erez Zilber wrote:
>
>> I'd like to make some changes in the logging in open-iscsi. The
>> current status is as follows:
>>
>> kernel modules:
>>
>> * We use iscsi_cls_session_printk & iscsi_cls_conn_printk in
>> scsi_transport_iscsi.c. They are sometimes wrapped by macros (e.g.
>> ISCSI_DBG_TRANS_SESSION). These macros use KERN_INFO and are
>> controlled by module parameters.
>>
>> * We use iscsi_session_printk & iscsi_conn_printk for the rest of the
>> kernel code. These macros wrap iscsi_cls_session_printk &
>> iscsi_cls_conn_printk accordingly. They are sometimes wrapped by
>> macros (e.g. ISCSI_SW_TCP_DBG). These macros use KERN_INFO and are
>> controlled by module parameters.
>>
>> * We sometimes use printk calls.
>>
>> userspace:
>>
>> We use log_warning, log_error & log_debug. They depend on the logging
>> level that we use (0-8). if (log_level > level), the log is sent to
>> syslog with the appropriate log level (LOG_WARNING/LOG_ERR/LOG_DEBUG).
>>
>> My motivation: with the current logging mechanism, if an error occurs,
>> I'm unable to tell exactly what happened. The default logging level is
>> too low. Increasing it affects performance. Another problem is that
>> open-iscsi has too many logging mechanisms.
>>
>> I suggest that:
>> 1. For kernel modules, we will have 'events' (or any better name that
>
> I'd call them "contexts" or "event sources".
>
>> you suggest) like 'session', 'conn', 'eh', 'cmd' etc. For each event,
>> we will have a logging level. For example, the user may want to set
>> the 'conn' event to 'DEBUG'. It means that we will print all conn
>> related logs that are DEBUG and above (e.g. WARNING, ERROR).
>> 2. For userspace code, we could do the same (i.e. have events and a
>> log level per event).
>> 3. Userspace logging uses the 'daemon' facility. This should
>> definitely be the default, but we should allow the user to use another
>> facility. The motivation for doing so is that if we want to send all
>> iscsid logs to a separate file, we can set it to 'local2' for example
>> (instead of 'daemon').
>>
>> Comments?
>
> What I'd wish: For most messages, especially if informational messages are
> enabled, it's not clear what severity a message has (i.e. debug, info,
> warning, error, fatal error). I wish the severity were obvious from the
> message.

I agree.

>
> I once wrote some code to do something like that. Showing some calling
> examples might make it obvious:
>
> message(MSG_TYPE_DEBUG, __LINE__, my_step, instance,
>        "min_pause=%ld\n",
>        min_pause);
>
> message(MSG_TYPE_ERROR, __LINE__, my_step, instance,
>        "Incompatible alert state: file %s, ID %06lx\n",
>        state_file, state.id);
>
> message(MSG_TYPE_INFO, __LINE__, my_step, instance,
>        "OS error (%s: %s)\n",
>        state_file, strerror(last_errno));
>
> message(MSG_TYPE_FATAL, __LINE__, my_step, instance,
>        "No memory for state file %s\n",
>        file);
>
> message(MSG_TYPE_WARN, __LINE__, my_step, instance,
>        "file \"%s\" exceeded size threshold\n",
>        msg_file);
>
> My "instance" is what you called "event"; "step" is a step in processing
> something (an integer), and __LINE__ is just handy if "instance" includes
> the source file.

Yes. It would be nice if we had the function + line number for each
log. I'm not sure that I understand the meaning of 'step'. Here's an
example (to make sure that I got you right): if you send a PDU, you
can print multiple messages while preparing & sending the PDU, each
with its own step number. For example:

"(1) allocating PDU"
"(2) init PDU"
"(3) sending PDU"
"(4) PDU was sent"

Is this what you mean?
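The combination discussed here (explicit severity, an event/instance name, function and line, and a step number) could be packaged as one helper plus a macro that fills in __func__ and __LINE__ automatically. This is a hypothetical sketch in the spirit of Ulrich's message() calls, not an existing open-iscsi interface; it formats into a caller-supplied buffer so the result is easy to inspect, where a real logger would hand the text to syslog or printk.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

enum msg_type { MSG_TYPE_DEBUG, MSG_TYPE_INFO, MSG_TYPE_WARN,
                MSG_TYPE_ERROR, MSG_TYPE_FATAL };

static const char *const msg_type_names[] = {
    "DEBUG", "INFO", "WARN", "ERROR", "FATAL",
};

/* Format "SEVERITY event func:line (step) text" into buf. Returns the
 * length the full message would have (snprintf-style), or < 0 on error. */
int format_message(char *buf, size_t cap, enum msg_type type,
                   const char *event, const char *func, int line, int step,
                   const char *fmt, ...)
{
    va_list ap;
    int head, body;
    size_t used;

    assert(buf != NULL && cap > 0);
    assert((unsigned)type <= MSG_TYPE_FATAL);
    head = snprintf(buf, cap, "%s %s %s:%d (%d) ",
                    msg_type_names[type], event, func, line, step);
    if (head < 0)
        return head;
    /* If the prefix was truncated, only the terminating NUL is left. */
    used = (size_t)head < cap ? (size_t)head : cap - 1;
    va_start(ap, fmt);
    body = vsnprintf(buf + used, cap - used, fmt, ap);
    va_end(ap);
    return body < 0 ? body : head + body;
}

/* Fill in the caller's function name and line number automatically. */
#define LOG_MSG(buf, cap, type, event, step, ...) \
    format_message((buf), (cap), (type), (event), __func__, __LINE__, \
                   (step), __VA_ARGS__)
```

With this shape, the PDU example above becomes four LOG_MSG calls with steps 1 to 4, and the severity is always the first word of the line.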

Erez

Suggestion for new logging mechanism in open-iscsi

2009-12-02 Thread Erez Zilber

I'd like to make some changes in the logging in open-iscsi. The
current status is as follows:

kernel modules:

* We use iscsi_cls_session_printk & iscsi_cls_conn_printk in
scsi_transport_iscsi.c. They are sometimes wrapped by macros (e.g.
ISCSI_DBG_TRANS_SESSION). These macros use KERN_INFO and are
controlled by module parameters.

* We use iscsi_session_printk & iscsi_conn_printk for the rest of the
kernel code. These macros wrap iscsi_cls_session_printk &
iscsi_cls_conn_printk accordingly. They are sometimes wrapped by
macros (e.g. ISCSI_SW_TCP_DBG). These macros use KERN_INFO and are
controlled by module parameters.

* We sometimes use printk calls.

userspace:

We use log_warning, log_error & log_debug. They depend on the logging
level that we use (0-8). if (log_level > level), the log is sent to
syslog with the appropriate log level (LOG_WARNING/LOG_ERR/LOG_DEBUG).

My motivation: with the current logging mechanism, if an error occurs,
I'm unable to tell exactly what happened. The default logging level is
too low. Increasing it affects performance. Another problem is that
open-iscsi has too many logging mechanisms.

I suggest that:
1. For kernel modules, we will have 'events' (or any better name that
you suggest) like 'session', 'conn', 'eh', 'cmd' etc. For each event,
we will have a logging level. For example, the user may want to set
the 'conn' event to 'DEBUG'. It means that we will print all conn
related logs that are DEBUG and above (e.g. WARNING, ERROR).
2. For userspace code, we could do the same (i.e. have events and a
log level per event).
3. Userspace logging uses the 'daemon' facility. This should
definitely be the default, but we should allow the user to use another
facility. The motivation for doing so is that if we want to send all
iscsid logs to a separate file, we can set it to 'local2' for example
(instead of 'daemon').
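Making the facility configurable mostly means turning a name from a config file into the LOG_* constant that openlog() takes. A minimal sketch, assuming a hypothetical setting whose value is a facility name; the function names are illustrative, and unknown names fall back to LOG_DAEMON, the current default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <syslog.h>

/* Map a facility name to the LOG_* value openlog() expects.
 * Returns -1 for names this sketch does not know. */
int syslog_facility_from_name(const char *name)
{
    static const struct {
        const char *name;
        int facility;
    } table[] = {
        {"daemon", LOG_DAEMON}, {"user", LOG_USER},
        {"local0", LOG_LOCAL0}, {"local1", LOG_LOCAL1},
        {"local2", LOG_LOCAL2}, {"local3", LOG_LOCAL3},
        {"local4", LOG_LOCAL4}, {"local5", LOG_LOCAL5},
        {"local6", LOG_LOCAL6}, {"local7", LOG_LOCAL7},
    };

    assert(name != NULL);
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(name, table[i].name) == 0)
            return table[i].facility;
    }
    return -1;
}

/* Open the log with the configured facility, defaulting to LOG_DAEMON. */
void open_log_with_facility(const char *ident, const char *facility_name)
{
    int facility = syslog_facility_from_name(facility_name);

    openlog(ident, LOG_PID, facility < 0 ? LOG_DAEMON : facility);
}
```

With iscsid logging to local2, a syslog selector such as "local2.* /var/log/iscsid.log" would route its messages to a separate file.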

Comments?

Thanks,
Erez

Re: [PATCH] Maintain a list of nop-out PDUs that almost timed out

2009-12-01 Thread Erez Zilber

On Tue, Dec 1, 2009 at 3:16 PM, Ulrich Windl
 wrote:
> On 1 Dec 2009 at 14:57, Erez Zilber wrote:
>
>> Maintain a list of nop-out PDUs that almost timed out.
>> With this information, you can understand and debug the
>> whole system: you can check your target and see what caused
>> it to be so slow on that specific time, you can see if your
>> network was very busy during that time etc.
>>
>
> Hi!
>
> Having studied TCP overload protection and flow control mechanisms recently, I
> wondered if a look at the TCP window sizes could be an indicator equivalent to
> timed-out nops. My idea is: Why implement something, if it's possibly already
> there for free.
>
> Regards,
> Ulrich
>

We've discussed and agreed on the idea of a list of nop-out PDUs that
almost timed out
(http://groups.google.com/group/open-iscsi/msg/02b93de35e80697c) and
here's the patch. I'm not familiar enough with the mechanism that you
describe, but the patch that I sent does the job with an easy-to-use
iscsiadm interface.
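For reference, on Linux the kernel's per-socket TCP statistics (smoothed RTT, congestion window, receive space) can be read with getsockopt(TCP_INFO), as sketched below. get_tcp_info() and print_tcp_info() are hypothetical helpers. One caveat, stated as an assumption rather than something tested here: the target's TCP stack acknowledges data independently of its iSCSI layer, so these numbers may look normal even when the target is slow to answer a nop-out.

```c
#define _DEFAULT_SOURCE 1        /* struct tcp_info in glibc's <netinet/tcp.h> */
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Linux-only: fetch the kernel's TCP statistics for a socket, such as the
 * one an iSCSI connection runs over. Returns 0 on success, -1 on error.
 */
int get_tcp_info(int fd, struct tcp_info *ti)
{
	socklen_t len = sizeof(*ti);

	assert(ti != NULL);
	return getsockopt(fd, IPPROTO_TCP, TCP_INFO, ti, &len) < 0 ? -1 : 0;
}

/* tcpi_rtt and tcpi_rttvar are in microseconds; snd_cwnd is in segments. */
void print_tcp_info(const struct tcp_info *ti)
{
	printf("rtt=%uus rttvar=%uus snd_cwnd=%u rcv_space=%u\n",
	       (unsigned)ti->tcpi_rtt, (unsigned)ti->tcpi_rttvar,
	       (unsigned)ti->tcpi_snd_cwnd, (unsigned)ti->tcpi_rcv_space);
}
```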

Erez


Re: [PATCH] Maintain a list of nop-out PDUs that almost timed out

2009-12-01 Thread Erez Zilber

On Tue, Dec 1, 2009 at 2:57 PM, Erez Zilber  wrote:
> On Sun, Nov 22, 2009 at 11:24 PM, Erez Zilber  wrote:
>> On Sun, Nov 22, 2009 at 10:44 PM, Mike Christie  wrote:
>>> Erez Zilber wrote:
>>>> On Thu, Nov 19, 2009 at 10:02 PM, Mike Christie wrote:
>>>>> Ulrich Windl wrote:
>>>>>> On 19 Nov 2009 at 11:07, Erez Zilber wrote:
>>>>>>
>>>>>>> On Thu, Nov 19, 2009 at 9:38 AM, Ulrich Windl wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Wouldn't it be more obvious to calculate the average delay to a ping request?
>>>>>>>> (Possibly exponential average as for the system loads) (min and Max would be good
>>>>>>>> as well, but standard deviation probably requires use of the FPU, so that's not
>>>>>>>> possible in kernel modules (AFAIK)).
>>>>>>> It's in userspace, so (almost) everything is possible. It's nice to
>>>>>>> have counters, average delay etc, but I want to be able to know
>>>>>>> exactly when bad things almost happened (i.e. timeout almost expired).
>>>>>>> Counters/average delay will not help me.
>>>>>> I thought you wanted to tune the timeouts. So if properly tuned, the kernel will log
>>>>>> when your measurements are unusual (i.e. timeout exceeded).
>>>>>>
>>>>> I think that is what I wanted. I think Erez wants something a little
>>>>> different, right Erez?
>>>>
>>>> I think that it would be nice if we had both:
>>>> 1. The average delay of a ping request.
>>>> 2. A list of ping requests that almost timed out with some helpful
>>>> info (when was the ping sent and how much time until we got a
>>>> response). With this information, you can understand and debug the
>>>> whole system: you can check your target and see what caused it to be
>>>> so slow on that specific time, you can see if your network was very
>>>> busy during that time etc.
>>>>
>>>
>>> I think this sounds good to me.
>>>
>>
>> Great. I will try to send a patch soon.
>>
>> Erez
>>
>
> Maintain a list of nop-out PDUs that almost timed out.
> With this information, you can understand and debug the
> whole system: you can check your target and see what caused
> it to be so slow on that specific time, you can see if your
> network was very busy during that time etc.
>
> Signed-off-by: Erez Zilber 
>

One comment - I've created a compat patch only for 2.6.14-23. If
everybody is pleased with this patch, I will make sure that all other
compat patches work and resend the patch.

Erez


Re: Information about iSCSI pings that almost timed out

2009-12-01 Thread Erez Zilber

On Sun, Nov 22, 2009 at 10:44 PM, Mike Christie  wrote:
> Erez Zilber wrote:
>> On Thu, Nov 19, 2009 at 10:02 PM, Mike Christie  wrote:
>>> Ulrich Windl wrote:
>>>> On 19 Nov 2009 at 11:07, Erez Zilber wrote:
>>>>
>>>>> On Thu, Nov 19, 2009 at 9:38 AM, Ulrich Windl wrote:
>>>>>> Hi!
>>>>>>
>>>>>> Wouldn't it be more obvious to calculate the average delay to a ping request?
>>>>>> (Possibly exponential average as for the system loads) (min and Max would be good
>>>>>> as well, but standard deviation probably requires use of the FPU, so that's not
>>>>>> possible in kernel modules (AFAIK)).
>>>>> It's in userspace, so (almost) everything is possible. It's nice to
>>>>> have counters, average delay etc, but I want to be able to know
>>>>> exactly when bad things almost happened (i.e. timeout almost expired).
>>>>> Counters/average delay will not help me.
>>>> I thought you wanted to tune the timeouts. So if properly tuned, the kernel will log
>>>> when your measurements are unusual (i.e. timeout exceeded).
>>>>
>>> I think that is what I wanted. I think Erez wants something a little
>>> different, right Erez?
>>
>> I think that it would be nice if we had both:
>> 1. The average delay of a ping request.
>> 2. A list of ping requests that almost timed out with some helpful
>> info (when was the ping sent and how much time until we got a
>> response). With this information, you can understand and debug the
>> whole system: you can check your target and see what caused it to be
>> so slow on that specific time, you can see if your network was very
>> busy during that time etc.
>>
>
> I think this sounds good to me.
>

Regarding the average delay of a ping request: we need the average
delay, but we're only interested in the average delay of pings that
were sent recently (i.e. not pings that were sent a year ago). Am I
right?

I thought about having a cyclic array of delays in the kernel. It can
hold the delays of the last X pings (e.g. X = 1000). Whenever the user
runs 'iscsiadm -m session -s', this array will be sent to userspace
and we can calculate the average delay/standard deviation/whatever you want
in userland.
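A rough sketch of that split, assuming X = 1000 and delays in milliseconds. struct ping_hist, ping_hist_add() and ping_hist_stats() are hypothetical names, and the kernel-side locking and the copy to userspace are left out. The userspace side computes the mean and the (population) variance; the standard deviation is then sqrt() of the variance (which needs -lm).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PING_HIST_LEN 1000   /* X in the proposal (e.g. X = 1000) */

/* Kernel-side history: the last PING_HIST_LEN ping delays in ms. */
struct ping_hist {
	uint32_t delay_ms[PING_HIST_LEN];
	size_t next;    /* slot to overwrite next */
	size_t count;   /* filled slots, saturates at PING_HIST_LEN */
};

/* Record one delay; once the array is full the oldest value is overwritten. */
void ping_hist_add(struct ping_hist *h, uint32_t delay_ms)
{
	h->delay_ms[h->next] = delay_ms;
	h->next = (h->next + 1) % PING_HIST_LEN;
	if (h->count < PING_HIST_LEN)
		h->count++;
}

/*
 * Userspace side, run on a copy of a zero-initialized array: mean and
 * population variance of the recorded delays. Slots 0..count-1 are the
 * filled ones, and order does not matter for these statistics.
 * Returns -1 if nothing has been recorded yet.
 */
int ping_hist_stats(const struct ping_hist *h, double *mean, double *var)
{
	double sum = 0.0, sq = 0.0;
	size_t i;

	assert(h != NULL && mean != NULL && var != NULL);
	if (h->count == 0)
		return -1;
	for (i = 0; i < h->count; i++)
		sum += h->delay_ms[i];
	*mean = sum / (double)h->count;
	for (i = 0; i < h->count; i++) {
		double d = (double)h->delay_ms[i] - *mean;
		sq += d * d;
	}
	*var = sq / (double)h->count;
	return 0;
}
```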

Comments?

Erez


Re: Information about iSCSI pings that almost timed out

2009-11-22 Thread Erez Zilber

On Sun, Nov 22, 2009 at 10:44 PM, Mike Christie  wrote:
> Erez Zilber wrote:
>> On Thu, Nov 19, 2009 at 10:02 PM, Mike Christie  wrote:
>>> Ulrich Windl wrote:
>>>> On 19 Nov 2009 at 11:07, Erez Zilber wrote:
>>>>
>>>>> On Thu, Nov 19, 2009 at 9:38 AM, Ulrich Windl wrote:
>>>>>> Hi!
>>>>>>
>>>>>> Wouldn't it be more obvious to calculate the average delay to a ping request?
>>>>>> (Possibly exponential average as for the system loads) (min and Max would be good
>>>>>> as well, but standard deviation probably requires use of the FPU, so that's not
>>>>>> possible in kernel modules (AFAIK)).
>>>>> It's in userspace, so (almost) everything is possible. It's nice to
>>>>> have counters, average delay etc, but I want to be able to know
>>>>> exactly when bad things almost happened (i.e. timeout almost expired).
>>>>> Counters/average delay will not help me.
>>>> I thought you wanted to tune the timeouts. So if properly tuned, the kernel will log
>>>> when your measurements are unusual (i.e. timeout exceeded).
>>>>
>>> I think that is what I wanted. I think Erez wants something a little
>>> different, right Erez?
>>
>> I think that it would be nice if we had both:
>> 1. The average delay of a ping request.
>> 2. A list of ping requests that almost timed out with some helpful
>> info (when was the ping sent and how much time until we got a
>> response). With this information, you can understand and debug the
>> whole system: you can check your target and see what caused it to be
>> so slow on that specific time, you can see if your network was very
>> busy during that time etc.
>>
>
> I think this sounds good to me.
>

Great. I will try to send a patch soon.

Erez


Re: Information about iSCSI pings that almost timed out

2009-11-22 Thread Erez Zilber

On Thu, Nov 19, 2009 at 10:02 PM, Mike Christie  wrote:
> Ulrich Windl wrote:
>> On 19 Nov 2009 at 11:07, Erez Zilber wrote:
>>
>>> On Thu, Nov 19, 2009 at 9:38 AM, Ulrich Windl wrote:
>>>> Hi!
>>>>
>>>> Wouldn't it be more obvious to calculate the average delay to a ping request?
>>>> (Possibly exponential average as for the system loads) (min and Max would be good
>>>> as well, but standard deviation probably requires use of the FPU, so that's not
>>>> possible in kernel modules (AFAIK)).
>>> It's in userspace, so (almost) everything is possible. It's nice to
>>> have counters, average delay etc, but I want to be able to know
>>> exactly when bad things almost happened (i.e. timeout almost expired).
>>> Counters/average delay will not help me.
>>
>> I thought you wanted to tune the timeouts. So if properly tuned, the kernel will log
>> when your measurements are unusual (i.e. timeout exceeded).
>>
>
> I think that is what I wanted. I think Erez wants something a little
> different, right Erez?

I think that it would be nice if we had both:
1. The average delay of a ping request.
2. A list of ping requests that almost timed out with some helpful
info (when was the ping sent and how much time until we got a
response). With this information, you can understand and debug the
whole system: you can check your target and see what caused it to be
so slow on that specific time, you can see if your network was very
busy during that time etc.

Erez


Re: Information about iSCSI pings that almost timed out

2009-11-19 Thread Erez Zilber

On Thu, Nov 19, 2009 at 9:38 AM, Ulrich Windl wrote:
> Hi!
>
> Wouldn't it be more obvious to calculate the average delay to a ping request?
> (Possibly exponential average as for the system loads) (min and Max would be good
> as well, but standard deviation probably requires use of the FPU, so that's not
> possible in kernel modules (AFAIK)).

It's in userspace, so (almost) everything is possible. It's nice to
have counters, average delay etc, but I want to be able to know
exactly when bad things almost happened (i.e. timeout almost expired).
Counters/average delay will not help me.

Erez


Re: Information about iSCSI pings that almost timed out

2009-11-18 Thread Erez Zilber

On Thu, Nov 19, 2009 at 2:49 AM, Mike Christie  wrote:
> Erez Zilber wrote:
>> open-iscsi sends nop-outs to the target. If the target responds quickly
>> enough, we don't get a timeout. I'd like to know (for internal debug
>> purposes) how many times the ping timer almost expired. This sounds
>> like a useful feature for other open-iscsi developers/users as well.
>>
>> I was thinking about adding the following mechanism:
>>
>> 1. Add an array of some length to store long nop-outs. Protect it with
>> some lock.
>> 2. If a nop-in (as a response to nop-out) was received after >= 0.7 *
>> ping_to (or 0.8 or whatever), add some info about it to the array
>> (when was the nop-out sent, how much time until we got a nop-in etc).
>> 3. The array should be used in a cyclic way - when it gets full,
>> overwrite the 1st entry.
>> 4. We can dump the info from the array from time to time or the user
>> may use iscsiadm to do that. When this is done, we can delete the
>> contents of the array.
>>
>
> What info did you want to store?

The following info:
1. The exact time when the nop-out was sent.
2. How much time until we got the nop-in.

>
>
> What about some perf counters for it? Add a counter for if a ping took
> a-b secs, b-c secs, and c-d secs.

This is problematic because:
1. You can set your ping timeout to X. According to your suggestion,
we will need to have the following counters:
a. < 0.2*X
b. 0.2*X - 0.4*X
c. 0.4*X - 0.6*X
d. 0.6*X - 0.8*X
e. 0.8*X - X
This means that the names of the counters will change according to
the value of X. You may have different values of X for different
connections, which makes this even more problematic.
2. If you have the array that I suggest instead of these counters and
you see that at 17:45:07 you had a long ping (that did not time out),
you can check what happened at that time in your target. This may be
very helpful.
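For comparison, one possible workaround for point 1 (not something proposed in this thread) is to define the buckets as fractions of each connection's own timeout, so the names a..e stay the same whatever X is. ping_bucket() is a hypothetical helper. It does nothing for point 2: a histogram still cannot tell you when the slow ping happened.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PING_BUCKETS 5   /* buckets a..e above, each 0.2*X wide */
static_assert(NR_PING_BUCKETS > 0, "need at least one bucket");

/*
 * Return the bucket index (0 = a ... 4 = e) for a ping delay, relative to
 * the connection's own ping timeout X. Delays of X or more are clamped
 * into the last bucket; a zero timeout also maps there.
 */
unsigned int ping_bucket(uint32_t delay_ms, uint32_t timeout_ms)
{
	uint64_t idx;

	if (timeout_ms == 0)
		return NR_PING_BUCKETS - 1;
	/* delay / (X / 5), computed as delay * 5 / X to stay in integers */
	idx = (uint64_t)delay_ms * NR_PING_BUCKETS / timeout_ms;
	return idx >= NR_PING_BUCKETS ? NR_PING_BUCKETS - 1 : (unsigned int)idx;
}
```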

Erez


Information about iSCSI pings that almost timed out

2009-11-18 Thread Erez Zilber

open-iscsi sends nop-outs to the target. If the target responds quickly
enough, we don't get a timeout. I'd like to know (for internal debug
purposes) how many times the ping timer almost expired. This sounds
like a useful feature for other open-iscsi developers/users as well.

I was thinking about adding the following mechanism:

1. Add an array of some length to store long nop-outs. Protect it with
some lock.
2. If a nop-in (as a response to nop-out) was received after >= 0.7 *
ping_to (or 0.8 or whatever), add some info about it to the array
(when was the nop-out sent, how much time until we got a nop-in etc).
3. The array should be used in a cyclic way - when it gets full,
overwrite the 1st entry.
4. We can dump the info from the array from time to time or the user
may use iscsiadm to do that. When this is done, we can delete the
contents of the array.
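A minimal sketch of steps 2-4, assuming a 16-slot array, millisecond timings and the 0.7 threshold. struct slow_ping, struct slow_ping_log, slow_ping_note() and slow_ping_drain() are hypothetical names. The lock from step 1 is deliberately left out: the caller would hold it around both functions.

```c
#define _POSIX_C_SOURCE 200809L  /* struct timespec */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define SLOW_PING_SLOTS 16       /* "an array of some length" */

/* What we keep per slow ping (step 2): send time and nop-in delay. */
struct slow_ping {
	struct timespec sent;        /* when the nop-out was sent */
	uint32_t rtt_ms;             /* time until the nop-in arrived */
};

struct slow_ping_log {
	struct slow_ping ent[SLOW_PING_SLOTS];
	size_t next;                 /* slot the next record goes into */
	size_t count;                /* valid records, <= SLOW_PING_SLOTS */
};

/*
 * Step 2: record the ping if the nop-in came after >= 0.7 * ping timeout.
 * rtt / timeout >= 7/10 is tested as rtt * 10 >= timeout * 7 (no FPU).
 * Step 3: the array is cyclic; when full, the oldest record is overwritten.
 * Returns 1 if the ping was recorded, 0 if it was fast enough.
 */
int slow_ping_note(struct slow_ping_log *spl, const struct timespec *sent,
		   uint32_t rtt_ms, uint32_t ping_timeout_ms)
{
	struct slow_ping *e;

	assert(spl != NULL && sent != NULL);
	if ((uint64_t)rtt_ms * 10 < (uint64_t)ping_timeout_ms * 7)
		return 0;
	e = &spl->ent[spl->next];
	e->sent = *sent;
	e->rtt_ms = rtt_ms;
	spl->next = (spl->next + 1) % SLOW_PING_SLOTS;
	if (spl->count < SLOW_PING_SLOTS)
		spl->count++;
	return 1;
}

/*
 * Step 4: copy the records out oldest-first (e.g. for iscsiadm) and
 * clear the log. 'out' must have room for SLOW_PING_SLOTS entries.
 */
size_t slow_ping_drain(struct slow_ping_log *spl, struct slow_ping *out)
{
	size_t first = (spl->next + SLOW_PING_SLOTS - spl->count) % SLOW_PING_SLOTS;
	size_t i, n = spl->count;

	for (i = 0; i < n; i++)
		out[i] = spl->ent[(first + i) % SLOW_PING_SLOTS];
	spl->next = 0;
	spl->count = 0;
	return n;
}
```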

Comments? Objections?

Erez


Re: [Patch 2/2] iscsiadm: checking return value of iscsid_req_wait() in iscsid_req_by_rec()

2009-11-15 Thread Erez Zilber


On Fri, Nov 13, 2009 at 9:00 PM, Yangkook Kim  wrote:
>>It looks like you used the git tree. What branch did you use? I am
>>asking because I could not find the code below.
>
> I actually didn't use git. I just used "diff -Naur file1 file2 > my patch"
> and added the "Signed-off-by" line myself.
>
> This is actually the very first time I've sent a patch, and I don't know
> the correct way to send a patch to a maintainer. Sorry for confusing you.
>
> Do I have to use git to make a patch?
>
> I actually included a modified patch of util.c that adds a check for
> MGMT_IPC_ERR_EXISTS, as I explained in my reply to your question in
> [Patch 1/2].
>
> If it would be better to make the patch using git, I will do so.
>
> Also, I would very much appreciate it if you could briefly tell me the
> correct way of sending a patch to you.
>

What you should do is:

1. Clone the open-iscsi.git tree. The shortest way to do that is:
a. Install git - it usually comes with your distro.
b. Run 'git clone
git://git.kernel.org/pub/scm/linux/kernel/git/mnc/open-iscsi.git'
c. Run 'cd open-iscsi'
2. Now, make some preparations:
a. git config user.name "Yangkook Kim"
b. git config user.email yangkook...@gmail.com
3. Make your code changes.
4. Commit your code:
a. Use 'git add' to add the files that you want to have in the commit.
b. After adding all files, you can run 'git diff --cached --color' to
review the staged changes.
c. Commit your code: 'git commit -s' (-s adds your Signed-off-by line).
d. Write a commit message.
5. Create a patch: use 'git format-patch -n commitish'. For example,
if commit 'A' was the HEAD when you cloned the tree and you then made
commits 'B', 'C' & 'D', 'git format-patch -n A' will create 3
patch files (for 'B', 'C' & 'D').
6. Check your patches - use checkpatch
(http://lxr.linux.no/#linux+v2.6.31/scripts/checkpatch.pl) to make
sure that your patch has the correct style etc. Run 'checkpatch.pl
--no-tree '.
7. Send your patch.

Good luck,
Erez


open-iscsi and multiqueue

2009-11-04 Thread Erez Zilber


Mike,

Can open-iscsi utilize the multiqueue feature
(http://lwn.net/Articles/289137/) in order to improve performance? If
yes, does it require any change in open-iscsi? Did you try to measure
the performance improvement?

Thanks,
Erez


Adding new logs to userspace code

2009-10-22 Thread Erez Zilber


Mike,

I was thinking about adding some debug prints in the userspace code. I
have some questions:
1. Is there any convention for the log level in log_debug?
2. What about adding log_info (i.e. between log_warning & log_debug)?
3. In the Red Hat init script, iscsid is started as 'daemon iscsid'. How
can I start it with parameters (e.g. iscsid -d8)? I can't just run
'daemon iscsid -d8'.

Thanks,
Erez


2 questions about log.c

2009-10-18 Thread Erez Zilber


I took a look at dolog() & log_flush(). Both use semop. If I
understood the semop man page correctly, a negative sem_op value
means 'down' (i.e. enter a critical section) and a positive sem_op
value means 'up' (i.e. leave the critical section). Based on that,
it looks to me like the syslog calls in dolog() & log_flush() print
incorrect information. Am I right?

Another (bigger) problem: from time to time, when I run 'iscsiadm -m
node -U all', it hangs and never returns. When I ran 'echo t >
/proc/sysrq-trigger', I got the following:

iscsidS  0  8441  1  8442 24234 (NOTLB)
Oct 15 14:46:29 b73 kernel:  81012e28dd28 0086  7fb18660
Oct 15 14:46:29 b73 kernel:  81012e28de48 000a 81003e3bd080 810143b85100
Oct 15 14:46:29 b73 kernel:  912e761c6b46 2973 81003e3bd268 0006800547f7
Oct 15 14:46:29 b73 kernel: Call Trace:
Oct 15 14:46:29 b73 kernel:  [] __next_cpu+0x19/0x28
Oct 15 14:46:29 b73 kernel:  [] find_busiest_group+0x20d/0x621
Oct 15 14:46:29 b73 kernel:  [] sys_semtimedop+0x627/0x720
Oct 15 14:46:29 b73 kernel:  [] thread_return+0x62/0xfe
Oct 15 14:46:29 b73 kernel:  [] lock_hrtimer_base+0x26/0x4c
Oct 15 14:46:29 b73 kernel:  [] hrtimer_try_to_cancel+0x4a/0x53
Oct 15 14:46:29 b73 kernel:  [] hrtimer_cancel+0xc/0x16
Oct 15 14:46:29 b73 kernel:  [] do_nanosleep+0x47/0x70
Oct 15 14:46:29 b73 kernel:  [] hrtimer_nanosleep+0x58/0x118
Oct 15 14:46:29 b73 kernel:  [] tracesys+0xd5/0xe0
Oct 15 14:46:29 b73 kernel:
Oct 15 14:46:29 b73 kernel: iscsidS 800627ba 0 8442  1 29016  8441 (NOTLB)
Oct 15 14:46:29 b73 kernel:  81012fa65d28 0086  8101a0282100
Oct 15 14:46:29 b73 kernel:  81012fa65e10 000a 81067683d040 81017b7fb080
Oct 15 14:46:29 b73 kernel:  912e761c6007 28ca 81067683d228 00018003d267
Oct 15 14:46:29 b73 kernel: Call Trace:
Oct 15 14:46:29 b73 kernel:  [] sys_semtimedop+0x627/0x720
Oct 15 14:46:29 b73 kernel:  [] inet_stream_connect+0x225/0x236
Oct 15 14:46:29 b73 kernel:  [] sock_getsockopt+0x326/0x348
Oct 15 14:46:29 b73 kernel:  [] lock_sock+0xa7/0xb2
Oct 15 14:46:29 b73 kernel:  [] sys_connect+0x7e/0xae
Oct 15 14:46:29 b73 kernel:  [] tracesys+0xd5/0xe0

It looks like both iscsid processes are waiting for a semaphore.

Later, when I ran strace, I got the following logs (because semop was
interrupted):

Oct 15 14:53:28 b73 iscsid: semop up failed 4
Oct 15 14:53:56 b73 iscsid: semop down failed
Oct 15 14:54:27 b73 iscsid: semop up failed 4
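For reference, a small sketch of the System V semaphore semantics described above: a negative sem_op is the 'down' (acquire) and a positive sem_op is the 'up' (release). It also retries when semop() fails with EINTR; errno 4 is EINTR on Linux, which would be consistent with the 'semop up failed 4' lines seen after strace attached. The helper names are hypothetical and this is not the actual log.c code.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* glibc leaves union semun for the caller to define (see semctl(2)). */
union semun {
	int val;
	struct semid_ds *buf;
	unsigned short *array;
};

/*
 * Apply 'delta' to semaphore 0 of the set. A negative delta is "down":
 * it blocks until the value can be decremented (enter the critical
 * section). A positive delta is "up" (leave the critical section).
 * If a signal interrupts semop() with EINTR, the call is retried.
 */
static int sem_change(int semid, short delta)
{
	struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = 0 };
	int rc;

	assert(semid >= 0);
	do {
		rc = semop(semid, &op, 1);
	} while (rc < 0 && errno == EINTR);
	return rc;
}

int sem_down(int semid) { return sem_change(semid, -1); }
int sem_up(int semid)   { return sem_change(semid, 1); }

/* Create a private one-semaphore set used as a mutex, initially 1 (free). */
int sem_create_mutex(void)
{
	union semun arg = { .val = 1 };
	int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

	if (semid >= 0 && semctl(semid, 0, SETVAL, arg) < 0) {
		semctl(semid, 0, IPC_RMID);
		return -1;
	}
	return semid;
}
```

Blindly retrying is only safe if the holder eventually releases the semaphore; if the other iscsid process never does an 'up', both processes would still wait forever, which looks like the hang in the sysrq output.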

BTW - why do we always have 2 iscsid processes?

Thanks,
Erez


Re: Strange syslog message from iscsid

2009-10-06 Thread Erez Zilber


On Tue, Oct 6, 2009 at 6:59 PM, Mike Christie  wrote:
>
> On 10/05/2009 05:44 AM, Erez Zilber wrote:
>> Sometimes, I see the following empty syslog message on my console:
>>
>> Message from syslogd@ at Mon Oct  5 12:43:21 2009 ...
>> kpc19 iscsid:
>>
>
> I have not seen it. Do you know if it is during login or logout or
> during error handling?

It happened to me during an error scenario, but I'm not 100% sure.

Erez


Strange syslog message from iscsid

2009-10-05 Thread Erez Zilber


Sometimes, I see the following empty syslog message on my console:

Message from syslogd@ at Mon Oct  5 12:43:21 2009 ...
kpc19 iscsid:

Where does it come from? Does anyone else see this?

Thanks,
Erez


Re: [PATCH] Fix compilation warnings in usr/kernel code

2009-09-08 Thread Erez Zilber


On Tue, Sep 8, 2009 at 9:01 PM, Mike Christie wrote:
>
> On 09/06/2009 02:11 AM, Erez Zilber wrote:
>> On Thu, Sep 3, 2009 at 8:02 PM, Mike Christie  wrote:
>>> On 09/03/2009 11:21 AM, Erez Zilber wrote:
>>>> Fix compilation warnings and modify the Makefiles to treat
>>>> warnings as errors.
>>>>
>>>> Signed-off-by: Erez Zilber
>>>>
>>>
>>> Thanks.
>>>
>>>
>>> I get this compilation error on fedora 10. We used to get a warning about 
>>> it not being initialized and upstream we did this patch. Is this just a bug 
>>> in my compiler?
>>>
>>>
>>> @@ -693,6 +692,7 @@ int iscsi_add_session(struct iscsi_cls_session *session, unsigned int target_id)
>>>                                                  "Too many iscsi targets. Max "
>>>                                                  "number of targets is %d.\n",
>>>                                                  ISCSI_MAX_TARGET - 1);
>>> +                       err = -EOVERFLOW;
>>>                         goto release_host;
>>>                 }
>>>         }
>>>
>>>
>>>   make
>>> make -C /lib/modules/2.6.29.4-167.fc11.x86_64/build M=`pwd` KBUILD_OUTPUT= V=0 modules
>>> make[1]: Entering directory `/usr/src/kernels/2.6.29.4-167.fc11.x86_64'
>>>   CC [M]  /home/mnc/kernel/iscsi/open-iscsi/devel/open-iscsi/kernel/scsi_transport_iscsi.o
>>> cc1: warnings being treated as errors
>>> /home/mnc/kernel/iscsi/open-iscsi/devel/open-iscsi/kernel/scsi_transport_iscsi.c: In function ‘iscsi_add_session’:
>>> /home/mnc/kernel/iscsi/open-iscsi/devel/open-iscsi/kernel/scsi_transport_iscsi.c:678: error: ‘err’ may be used uninitialized in this function
>>> make[2]: *** [/home/mnc/kernel/iscsi/open-iscsi/devel/open-iscsi/kernel/scsi_transport_iscsi.o] Error 1
>>> make[1]: *** [_module_/home/mnc/kernel/iscsi/open-iscsi/devel/open-iscsi/kernel] Error 2
>>> make[1]: Leaving directory `/usr/src/kernels/2.6.29.4-167.fc11.x86_64'
>>
>> Yeah, this looks strange because err should have been initialized in
>> the for loop (assuming that ISCSI_MAX_TARGET > 0). Maybe the compiler
>> doesn't check that (which may be considered a bug/feature ;) ). Do you
>> want to add this fix also to open-iscsi.git?
>>
>
> I messed up. I thought we already had the fix, so I was wondering why I
> get the compilation error with your patch. I mean I thought your patch
> added a new failure somehow. Ignore me :)
>

OK, so do you want to apply my patch as is, plus the required fix in
iscsi_add_session()?

Erez


Re: [PATCH] Fix compilation warnings in usr/kernel code

2009-09-06 Thread Erez Zilber


On Thu, Sep 3, 2009 at 8:02 PM, Mike Christie wrote:
>
> On 09/03/2009 11:21 AM, Erez Zilber wrote:
>> Fix compilation warnings and modify the Makefiles to treat
>> warnings as errors.
>>
>> Signed-off-by: Erez Zilber 
>>
>
>
> Thanks.
>
>
> I get this compilation error on fedora 10. We used to get a warning about it 
> not being initialized and upstream we did this patch. Is this just a bug in 
> my compiler?
>
>
> @@ -693,6 +692,7 @@ int iscsi_add_session(struct iscsi_cls_session *session, unsigned int target_id)
>                                                 "Too many iscsi targets. Max "
>                                                 "number of targets is %d.\n",
>                                                 ISCSI_MAX_TARGET - 1);
> +                       err = -EOVERFLOW;
>                        goto release_host;
>                }
>        }
>
>
>  make
> make -C /lib/modules/2.6.29.4-167.fc11.x86_64/build M=`pwd` KBUILD_OUTPUT= V=0 modules
> make[1]: Entering directory `/usr/src/kernels/2.6.29.4-167.fc11.x86_64'
>  CC [M]  /home/mnc/kernel/iscsi/open-iscsi/devel/open-iscsi/kernel/scsi_transport_iscsi.o
> cc1: warnings being treated as errors
> /home/mnc/kernel/iscsi/open-iscsi/devel/open-iscsi/kernel/scsi_transport_iscsi.c: In function ‘iscsi_add_session’:
> /home/mnc/kernel/iscsi/open-iscsi/devel/open-iscsi/kernel/scsi_transport_iscsi.c:678: error: ‘err’ may be used uninitialized in this function
> make[2]: *** [/home/mnc/kernel/iscsi/open-iscsi/devel/open-iscsi/kernel/scsi_transport_iscsi.o] Error 1
> make[1]: *** [_module_/home/mnc/kernel/iscsi/open-iscsi/devel/open-iscsi/kernel] Error 2
> make[1]: Leaving directory `/usr/src/kernels/2.6.29.4-167.fc11.x86_64'

Yeah, this looks strange because err should have been initialized in
the for loop (assuming that ISCSI_MAX_TARGET > 0). Maybe the compiler
doesn't check that (which may be considered a bug/feature ;) ). Do you
want to add this fix also to open-iscsi.git?

Erez


Re: [PATCH] Fix compilation warnings in usr/kernel code

2009-09-03 Thread Erez Zilber

On Thu, Sep 3, 2009 at 7:21 PM, Erez Zilber wrote:
> Fix compilation warnings and modify the Makefiles to treat
> warnings as errors.
>
> Signed-off-by: Erez Zilber 
>

2 comments about this patch:
1. I'm not familiar enough with the fwparam_ibft code, so I didn't add
the -Werror flag to the Makefile. There are still compilation warnings
in that code.
2. I was only able to test it on CentOS 5.3 & Kubuntu 9.04. If anyone
can test it on other kernels and submit fixes, that would be great.

Erez


[PATCH] Fix compilation warnings in usr/kernel code

2009-09-03 Thread Erez Zilber

Fix compilation warnings and modify the Makefiles to treat
warnings as errors.

Signed-off-by: Erez Zilber 


From 6c97d8d941fa9caaae42f01d686465d8b49f Mon Sep 17 00:00:00 2001
From: Erez Zilber 
Date: Thu, 3 Sep 2009 14:12:27 +0300
Subject: [PATCH] Fix compilation warnings in usr/kernel code

Fix compilation warnings and modify the Makefiles to treat
warnings as errors.

Signed-off-by: Erez Zilber 
---
 kernel/2.6.14-23_compat.patch    |  106 ++
 kernel/Makefile                  |    2 +-
 kernel/libiscsi.c                |    3 +-
 usr/Makefile                     |    2 +-
 usr/auth.c                       |   34 +---
 usr/iface.c                      |   38 --
 usr/iscsid.c                     |   22 +++
 usr/log.c                        |   34
 usr/mgmt_ipc.c                   |    9 +++-
 usr/strings.c                    |    2 +-
 usr/util.c                       |   26 --
 utils/Makefile                   |    2 +-
 utils/fwparam_ibft/fwparam_ppc.c |    7 ++-
 utils/sysdeps/Makefile           |    2 +-
 14 files changed, 157 insertions(+), 132 deletions(-)

diff --git a/kernel/2.6.14-23_compat.patch b/kernel/2.6.14-23_compat.patch
index ab233bb..4b02d2e 100644
--- a/kernel/2.6.14-23_compat.patch
+++ b/kernel/2.6.14-23_compat.patch
@@ -1,8 +1,8 @@
 diff --git a/iscsi_tcp.c b/iscsi_tcp.c
-index caa116c..71df5b9 100644
+index bce1594..c46888d 100644
 --- a/iscsi_tcp.c
 +++ b/iscsi_tcp.c
-@@ -456,11 +456,9 @@ static int iscsi_sw_tcp_pdu_init(struct iscsi_task *task,
+@@ -460,11 +460,9 @@ static int iscsi_sw_tcp_pdu_init(struct iscsi_task *task,
  	if (!task->sc)
  		iscsi_sw_tcp_send_linear_data_prep(conn, task->data, count);
  	else {
@@ -17,7 +17,7 @@ index caa116c..71df5b9 100644
  	}
  
  	if (err) {
-@@ -793,7 +791,11 @@ iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
+@@ -797,7 +795,11 @@ iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
  	shost->max_lun = iscsi_max_lun;
  	shost->max_id = 0;
  	shost->max_channel = 0;
@@ -29,7 +29,7 @@ index caa116c..71df5b9 100644
  
  	if (iscsi_host_add(shost, NULL))
  		goto free_host;
-@@ -832,12 +834,6 @@ static void iscsi_sw_tcp_session_destroy(struct iscsi_cls_session *cls_session)
+@@ -836,12 +838,6 @@ static void iscsi_sw_tcp_session_destroy(struct iscsi_cls_session *cls_session)
  	iscsi_host_free(shost);
  }
  
@@ -42,7 +42,7 @@ index caa116c..71df5b9 100644
  static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
  {
  	blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY);
-@@ -846,6 +842,9 @@ static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
+@@ -850,6 +846,9 @@ static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
  }
  
  static struct scsi_host_template iscsi_sw_tcp_sht = {
@@ -52,7 +52,7 @@ index caa116c..71df5b9 100644
  	.module			= THIS_MODULE,
  	.name			= "iSCSI Initiator over TCP/IP",
  	.queuecommand   = iscsi_queuecommand,
-@@ -856,9 +855,8 @@ static struct scsi_host_template iscsi_sw_tcp_sht = {
+@@ -860,9 +859,8 @@ static struct scsi_host_template iscsi_sw_tcp_sht = {
  	.cmd_per_lun		= ISCSI_DEF_CMD_PER_LUN,
  	.eh_abort_handler   = iscsi_eh_abort,
  	.eh_device_reset_handler= iscsi_eh_device_reset,
@@ -77,7 +77,7 @@ index f9a4044..ab20530 100644
  #include "libiscsi_tcp.h"
  
 diff --git a/libiscsi.c b/libiscsi.c
-index fe4b66e..6217f76 100644
+index 223a5b8..161c971 100644
 --- a/libiscsi.c
 +++ b/libiscsi.c
 @@ -24,7 +24,10 @@
@@ -91,7 +91,7 @@ index fe4b66e..6217f76 100644
  #include 
  #include 
  #include 
-@@ -60,6 +63,8 @@ MODULE_PARM_DESC(debug_libiscsi, "Turn on debugging for libiscsi module. "
+@@ -83,6 +86,8 @@ MODULE_PARM_DESC(debug_libiscsi_eh,
  	 __func__, ##arg);		\
  	} while (0);
  
@@ -100,7 +100,7 @@ index fe4b66e..6217f76 100644
  /* Serial Number Arithmetic, 32 bits, less than, RFC1982 */
  #define SNA32_CHECK 2147483648UL
  
-@@ -229,7 +234,7 @@ static int iscsi_prep_bidi_ahs(struct iscsi_task *task)
+@@ -252,7 +257,7 @@ static int iscsi_prep_bidi_ahs(struct iscsi_task *task)
  		  sizeof(rlen_ahdr->reserved));
  	rlen_ahdr->ahstype = ISCSI_AHSTYPE_RLENGTH;
  	rlen_ahdr->reserved = 0;
@@ -109,7 +109,7 @@ index fe4b66e..6217f76 100644
  
  	ISCSI_DBG_SESSION(task->conn->session,
  			  "bidi-in rlen_ahdr->read_length(%d) "
-@@ -300,7 +305,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task

Re: [PATCH] decrease sndtmo

2009-08-06 Thread Erez Zilber


On Wed, Aug 5, 2009 at 10:22 PM, Mike Christie wrote:
>
> On 08/05/2009 12:34 PM, Erez Zilber wrote:
>>> I found it. The problem is that we will send the signal if the xmit
>>> thread is running or not. If it is not running the workqueue code will
>>> keep getting woken up to handle the signal, but because we have not
>>> called queue_work the workqueue code will not let the thread run so we
>>> never get to flush the signal until we reconnect and send down a login
>>> pdu (the login pdu does a queue_work finally).
>>>
>>
>> When you say "the xmit thread is running", I guess that you mean that
>> the xmit thread is busy with IO, right? Note that I said that this
>
> No. workqueue.c:worker_thread() is spinning. It is looping because there
> is a signal pending, but the iscsi work code which has the flush_signals
> is not getting run because there is no work queued.
>
> So you could add a
>
> if (signal_pending(current))
>        flush_signals(current)
>
> to worker_thread() "for" loop and I think this will fix the problem.
>

Looks like this solves the problem. I've added the following patch to
the centos 5.3 kernel (2.6.18-128.1.6.el5):

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8594efb..e148ed8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -253,6 +253,9 @@ static int worker_thread(void *__cwq)

 	set_current_state(TASK_INTERRUPTIBLE);
 	while (!kthread_should_stop()) {
+		if (signal_pending(current))
+			flush_signals(current);
+
 		add_wait_queue(&cwq->more_work, &wait);
 		if (list_empty(&cwq->worklist))
 			schedule();
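Why the three added lines stop the spin can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: the struct and function names are hypothetical, "sleeping" is reduced to a boolean, and the only property modeled is that an interruptible sleep returns at once while a signal is pending.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model of the worker_thread() loop above; NOT kernel code.
 * All names are hypothetical.
 */
struct model_worker {
	bool signal_pending;  /* stands in for signal_pending(current) */
	int queued_work;      /* stands in for !list_empty(&cwq->worklist) */
	int spins;            /* iterations that tried to sleep but could not */
};

/* Models schedule() in TASK_INTERRUPTIBLE: returns true if we slept. */
static bool model_schedule(struct model_worker *w)
{
	if (w->signal_pending) {
		w->spins++;       /* woken immediately: busy loop */
		return false;
	}
	return true;
}

/* Models the iSCSI xmit work, the only code that flushed the signal. */
static void model_run_work(struct model_worker *w)
{
	w->signal_pending = false;
	w->queued_work--;
}

/*
 * Runs up to max_iters loop iterations and returns the iteration at which
 * the thread finally slept (max_iters means it never did). flush_in_loop
 * selects the patched loop.
 */
static int model_worker_thread(struct model_worker *w, bool flush_in_loop,
			       int max_iters)
{
	assert(max_iters >= 0);
	for (int i = 0; i < max_iters; i++) {
		if (flush_in_loop && w->signal_pending)
			w->signal_pending = false;  /* flush_signals(current) */
		if (w->queued_work > 0) {
			model_run_work(w);
			continue;
		}
		if (model_schedule(w))
			return i;
	}
	return max_iters;
}
```

In the unpatched model, the loop only stops spinning once some work is queued, which matches the observation that the login PDU's queue_work() finally lets the signal be flushed.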

I'm running with open-iscsi.git + 2 commits from linux-2.6-iscsi.git
(9c302cc45b70ecc4b606d65a445902381066061b &
75be23dc40ba2f215779d5ba60fda9a762271bbe).

Will you push it upstream & into the RHEL kernel?

Thanks,
Erez

Re: [PATCH] decrease sndtmo

2009-08-05 Thread Erez Zilber


On Wed, Aug 5, 2009 at 7:45 PM, Mike Christie wrote:
>
> On 08/05/2009 11:33 AM, Mike Christie wrote:
>> On 08/05/2009 11:26 AM, Mike Christie wrote:
>>> On 08/05/2009 11:01 AM, Erez Zilber wrote:
>>>> On Wed, Aug 5, 2009 at 6:19 PM, Mike Christie wrote:
>>>>> On 08/04/2009 01:12 PM, Erez Zilber wrote:
>>>>>> On Tue, Aug 4, 2009 at 8:17 PM, Mike Christie wrote:
>>>>>>> Erez Zilber wrote:
>>>>>>>> I'm running with open-iscsi.git HEAD + the check suspend bit patch +
>>>>>>>> the wake xmit on error patch. If I disconnect the cable on the
>>>>>>>> initiator side (even while not running IO), I see that after sending
>>>>>>>> the signal, the  iscsi_q_XX thread reaches 100% cpu. I ran it over
>>>>>>>> several 1GB/ 10 GB drivers and got the same results.
>>>>>>>>
>>>>>>>> If I remove the  wake xmit on error patch, I don't see this behavior.
>>>>>>>>
>>>>>>> Shoot, I have been running the xmit wakeup and suspend bit patch here
>>>>>>> fine. Let me do some more testing.
>>>>>>>
>>>>>>> Is this something you always hit? Could you send me the final patch you
>>>>>>> ended up using?
>>>>>> I see this every time. Note that I'm not running with
>>>>>> linux-2.6-iscsi.git. I'm using the open-iscsi.git tree + the 2 patches
>>>>>> that I took without any change (using git-show) from the
>>>>>> linux-2.6-iscsi.git tree. Which tree did you test it on?
>>>>>>
>>>>>> I added some printks to the code and saw that the signal does get sent
>>>>>> from iscsi_sw_tcp_conn_stop, but I didn't see that (rc == -EINTR || rc
>>>>>> == -EAGAIN) in  iscsi_sw_tcp_xmit (), even when I ran IO on that
>>>>>> session.
>>>>>>
>>>>> Does r in iscsi_sw_tcp_xmit_segment == 0?
>>>>>
>>>> No, it is never zero.
>>>>
>>>>> If not I think you need a different patch. In one of the patch versions
>>>>> iscsi_sw_tcp_xmit_segment could return -ENODATA (this is when I had a
>>>>> check for suspend_tx in there). iscsi_sw_tcp_xmit did not check this and
>>>>> so I think  we can loop.
>>>>>
>>>>> Could you try the attached patch. It was made over open-iscsi.git for
>>>>> you. I dropped the suspend bit check in iscsi_sw_tcp_xmit_segment,
>>>>> because it is not needed. If we end up blocking the signal will wake us.
>>>> I ran it and got the same 100% cpu usage. Did you try to run it on
>>>> your machines with open-iscsi.git? Did you see a different behavior?
>>>>
>>> I just ran it. Maybe I am looking for the wrong thing though.
>>>
>>> For your problem, when the signal is sent does the recovery go ok and we
>>> end up reconnecting? But the problem is just that the xmit thread takes
>>> up 100% of the cpu?
>>>
>>
>>
>> Ignore this. I see the problem now. I was thinking you did not
>> reconnect. I see the cpu usage. Let me do some digging.
>>
>
> I found it. The problem is that we will send the signal if the xmit
> thread is running or not. If it is not running the workqueue code will
> keep getting woken up to handle the signal, but because we have not
> called queue_work the workqueue code will not let the thread run so we
> never get to flush the signal until we reconnect and send down a login
> pdu (the login pdu does a queue_work finally).
>

When you say "the xmit thread is running", I guess that you mean that
the xmit thread is busy with IO, right? Note that I said that this
happens whether I'm running IO or everything's idle. Two more things that
I forgot to mention:

1. I didn't try to reconnect the cable (actually, I disabled the port
in the switch) and see if the problem goes away.
2. When I log out (while the port is still disconnected), everything
goes back to normal, but I guess that this is because the xmit thread
dies.

Erez

Re: [PATCH] decrease sndtmo

2009-08-05 Thread Erez Zilber


On Wed, Aug 5, 2009 at 6:19 PM, Mike Christie wrote:
> On 08/04/2009 01:12 PM, Erez Zilber wrote:
>> On Tue, Aug 4, 2009 at 8:17 PM, Mike Christie  wrote:
>>> Erez Zilber wrote:
>>>> I'm running with open-iscsi.git HEAD + the check suspend bit patch +
>>>> the wake xmit on error patch. If I disconnect the cable on the
>>>> initiator side (even while not running IO), I see that after sending
>>>> the signal, the  iscsi_q_XX thread reaches 100% cpu. I ran it over
>>>> several 1GB/ 10 GB drivers and got the same results.
>>>>
>>>> If I remove the  wake xmit on error patch, I don't see this behavior.
>>>>
>>> Shoot, I have been running the xmit wakeup and suspend bit patch here
>>> fine. Let me do some more testing.
>>>
>>> Is this something you always hit? Could you send me the final patch you
>>> ended up using?
>>
>> I see this every time. Note that I'm not running with
>> linux-2.6-iscsi.git. I'm using the open-iscsi.git tree + the 2 patches
>> that I took without any change (using git-show) from the
>> linux-2.6-iscsi.git tree. Which tree did you test it on?
>>
>> I added some printks to the code and saw that the signal does get sent
>> from iscsi_sw_tcp_conn_stop, but I didn't see that (rc == -EINTR || rc
>> == -EAGAIN) in  iscsi_sw_tcp_xmit (), even when I ran IO on that
>> session.
>>
>
> Does r in iscsi_sw_tcp_xmit_segment == 0?
>

No, it is never zero.

> If not I think you need a different patch. In one of the patch versions
> iscsi_sw_tcp_xmit_segment could return -ENODATA (this is when I had a
> check for suspend_tx in there). iscsi_sw_tcp_xmit did not check this and
> so I think  we can loop.
>
> Could you try the attached patch. It was made over open-iscsi.git for
> you. I dropped the suspend bit check in iscsi_sw_tcp_xmit_segment,
> because it is not needed. If we end up blocking the signal will wake us.

I ran it and got the same 100% cpu usage. Did you try to run it on
your machines with open-iscsi.git? Did you see a different behavior?

Erez

Re: [PATCH] decrease sndtmo

2009-08-04 Thread Erez Zilber


On Tue, Aug 4, 2009 at 8:17 PM, Mike Christie wrote:
>
> Erez Zilber wrote:
>>
>> I'm running with open-iscsi.git HEAD + the check suspend bit patch +
>> the wake xmit on error patch. If I disconnect the cable on the
>> initiator side (even while not running IO), I see that after sending
>> the signal, the  iscsi_q_XX thread reaches 100% cpu. I ran it over
>> several 1GB/ 10 GB drivers and got the same results.
>>
>> If I remove the  wake xmit on error patch, I don't see this behavior.
>>
>
> Shoot, I have been running the xmit wakeup and suspend bit patch here
> fine. Let me do some more testing.
>
> Is this something you always hit? Could you send me the final patch you
> ended up using?

I see this every time. Note that I'm not running with
linux-2.6-iscsi.git. I'm using the open-iscsi.git tree + the 2 patches
that I took without any change (using git-show) from the
linux-2.6-iscsi.git tree. Which tree did you test it on?

I added some printks to the code and saw that the signal does get sent
from iscsi_sw_tcp_conn_stop, but I didn't see that (rc == -EINTR || rc
== -EAGAIN) in  iscsi_sw_tcp_xmit (), even when I ran IO on that
session.

Erez

Re: [PATCH] decrease sndtmo

2009-08-04 Thread Erez Zilber


On Sat, Aug 1, 2009 at 6:34 AM, Mike Christie wrote:
> Mike Christie wrote:
>> On 07/31/2009 04:03 AM, Hannes Reinecke wrote:
>>> Mike Christie wrote:
>>>> tcp_sendpages/tcp_sendmsg can wait sndtmo seconds
>>>> if a connection goes bad. This then delays session
>>>> recovery, because that code must wait for the xmit
>>>> thread to flush. OTOH, if we did not wait at all
>>>> we are less efficient in the lock management
>>>> because we must reacquire the session lock every
>>>> time the network layer returns ENOBUFS and runs
>>>> iscsi_tcp's write_space callout.
>>>>
>>>> This tries to balance the two by reducing the
>>>> wait to 3 seconds from 15. If we have waited 3 secs
>>>> to send a pdu then perf is already taking a hit so
>>>> grabbing the session lock again is not going to make a
>>>> difference. And waiting up to 3 secs for the xmit thread
>>>> to flush and suspend is not that long (at least a lot better
>>>> than 15).
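From user space, the same kind of per-socket send timeout that the patch lowers can be set with the standard SO_SNDTIMEO socket option. The sketch below assumes Linux and uses an AF_UNIX socket pair instead of TCP so that it runs without a network; the function name is hypothetical. It fills the send buffer of a socket whose peer never reads and measures how long the final send() blocks before failing with EAGAIN, which is the kind of wait the xmit thread sits in when a connection goes bad.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/*
 * Fills a socket whose peer never reads. Returns the errno of the send()
 * that finally fails and stores how long that call blocked in *waited_ms.
 */
static int send_until_timeout(long timeout_ms, long *waited_ms)
{
	int sv[2];
	int err = 0;
	char buf[4096];
	struct timeval tv = {
		.tv_sec = timeout_ms / 1000,
		.tv_usec = (timeout_ms % 1000) * 1000,
	};

	assert(timeout_ms > 0);  /* a zero SO_SNDTIMEO means "wait forever" */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return errno;
	/* User-space counterpart of lowering the socket's send timeout. */
	if (setsockopt(sv[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) != 0) {
		err = errno;
		goto out;
	}
	memset(buf, 0, sizeof(buf));

	for (;;) {
		struct timespec t0, t1;
		ssize_t n;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		n = send(sv[0], buf, sizeof(buf), 0);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		if (n < 0) {
			err = errno;  /* EAGAIN/EWOULDBLOCK once the timeout expires */
			*waited_ms = (long)(t1.tv_sec - t0.tv_sec) * 1000 +
				     (t1.tv_nsec - t0.tv_nsec) / 1000000;
			break;
		}
	}
out:
	close(sv[0]);
	close(sv[1]);
	return err;
}
```

Calling send_until_timeout(100, &waited) should fail with EAGAIN after roughly 100 ms; with the old 15 s value the same call would block 150 times longer.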

>>> :-)
>>>
>>> Cool. I'm running with 1 sec here, but the principle is
>>> the same. Especially for a multipathed setup you really
>>> want this.
>>>
>>> Oh, what about making this setting dependent on the
>>> transport class timeout?
>>> Worst case sendpages/sendmsg will take up to 3 seconds
>>> now before it even will return an error.
>>> So having a transport class timeout lower than that
>>> is pointless as we have no means of terminating
>>> a call being stuck in sendpages/sendmsg and the
>>> transport class will always terminate the command.
>>>
>>> So we should either limit the transport class timeout
>>> to not being able to be set lower than 3 seconds or
>>> make this timeout set by the transport class timeout.
>>>
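Hannes's two alternatives can be sketched as a pair of small policy functions. This is a hypothetical illustration only: the constant and function names are made up, and "derive the send timeout from the transport-class timeout" is interpreted here as taking the smaller of the two. The 1 s floor in option 2 is an arbitrary assumption to avoid a zero timeout.

```c
#include <assert.h>

/* Hypothetical constant: the 3 s send timeout from the patch above. */
#define SW_TCP_SNDTMO_SECS 3

/*
 * Option 1: never let the transport-class timeout drop below the send
 * timeout, since a call stuck in sendpage/sendmsg cannot be cut short.
 */
static int clamp_transport_timeout(int requested_secs)
{
	return requested_secs < SW_TCP_SNDTMO_SECS ? SW_TCP_SNDTMO_SECS
						   : requested_secs;
}

/*
 * Option 2: derive the send timeout from the transport-class timeout so
 * the socket never waits longer than the transport class would.
 */
static int sndtmo_from_transport_timeout(int transport_secs)
{
	if (transport_secs < 1)
		return 1;  /* assumed floor, see above */
	return transport_secs < SW_TCP_SNDTMO_SECS ? transport_secs
						   : SW_TCP_SNDTMO_SECS;
}
```

Either way the invariant is the same: the transport class should never give up on a command before a blocked send can have returned.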
>>
>> Good point! Let me backout my patch, and do some more digging on why I
>> cannot just do
>>
>> signal(xmit thread)
>>
>> to wake it from sendpage/sendmsg right away.
>>
>> If I cannot get that to work, then I will send a patch to implement what
>> you describe.
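Signalling a thread so it leaves a blocking send at once has a direct user-space analog: a thread blocked in send() returns EINTR when a signal with a handler is delivered to it. The sketch below assumes Linux and POSIX threads, uses an AF_UNIX socket pair whose peer never reads, and its struct and function names are hypothetical. It illustrates the mechanism only; in the kernel the target is the iscsi_q_XX workqueue thread, not a pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

struct sender {
	int fd;
	atomic_int done;
	int err;  /* errno of the send() that ended the loop */
};

/* The handler only has to exist so the signal interrupts send(). */
static void on_wake(int sig)
{
	(void)sig;
}

/* Plays the xmit thread: keeps sending until send() fails. */
static void *sender_thread(void *arg)
{
	struct sender *s = arg;
	char buf[4096];

	memset(buf, 0, sizeof(buf));
	while (send(s->fd, buf, sizeof(buf), 0) >= 0)
		;
	s->err = errno;
	atomic_store(&s->done, 1);
	return NULL;
}

/* Returns the errno that made the blocked sender give up. */
static int wake_blocked_sender(void)
{
	int sv[2], rc;
	pthread_t tid;
	struct sigaction sa;
	struct sender s;
	struct timeval safety = { .tv_sec = 5 };  /* backstop: never hang */
	struct timespec tick = { .tv_nsec = 20 * 1000 * 1000 };

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_wake;  /* no SA_RESTART */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return errno;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return errno;
	setsockopt(sv[0], SOL_SOCKET, SO_SNDTIMEO, &safety, sizeof(safety));
	s.fd = sv[0];
	s.err = 0;
	atomic_init(&s.done, 0);
	rc = pthread_create(&tid, NULL, sender_thread, &s);
	if (rc != 0) {
		close(sv[0]);
		close(sv[1]);
		return rc;
	}

	/* Keep signalling until the sender reports a failed send(). */
	while (!atomic_load(&s.done)) {
		nanosleep(&tick, NULL);
		pthread_kill(tid, SIGUSR1);
	}
	pthread_join(tid, NULL);
	close(sv[0]);
	close(sv[1]);
	return s.err;
}
```

The 5 s SO_SNDTIMEO is only a backstop so the example cannot hang; if the signal works, wake_blocked_sender() returns EINTR long before that.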
>
> I got the signal stuff working. I am attaching the patch here. I put it
> in my iscsi branch, because it is built over some other patches I sent
> Erez in his logout takes ~50 secs thread.
>

I'm running with open-iscsi.git HEAD + the check suspend bit patch +
the wake xmit on error patch. If I disconnect the cable on the
initiator side (even while not running IO), I see that after sending
the signal, the  iscsi_q_XX thread reaches 100% cpu. I ran it over
several 1GB/ 10 GB drivers and got the same results.

If I remove the  wake xmit on error patch, I don't see this behavior.

Erez

Re: logout takes ~50 seconds while running heavy IO

2009-08-03 Thread Erez Zilber


On Mon, Aug 3, 2009 at 7:27 PM, Mike Christie wrote:
>
> On 08/03/2009 02:31 AM, Erez Zilber wrote:
>> On Sat, Aug 1, 2009 at 6:31 AM, Mike Christie  wrote:
>>> Mike Christie wrote:
>>>> Mike Christie wrote:
>>>>> Mike Christie wrote:
>>>>>> Mike Christie wrote:
>>>>>>> On 07/31/2009 07:43 AM, Erez Zilber wrote:
>>>>>>>> I thought that this patch just reduces the timeout from 15 to 3. Does
>>>>>>>> it also fix the 3 sndtmo periods or is it another patch? What was the
>>>>>>> Another patch. You should have it.
>>>>>>>
>>>>>>>> bug that caused the 3 sndtmo periods waiting?
>>>>>>>>
>>>>>>> I think I meant two.
>>>>>>>
>>>>>>> If iscsi_tcp_xmit_segment sent some data, then got EAGAIN, it would drop
>>>>>>> the EAGAIN. iscsi_xmit would then retry the operation so we would wait
>>>>>>> again.
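The double wait described above can be reproduced with a toy model of a stalled link. In this sketch (all names hypothetical, not the iscsi_tcp code) the peer accepts a fixed number of bytes and every later send costs one full sndtmo and fails with -EAGAIN. A segment sender that returns its partial byte count instead of the error makes the caller retry and wait a second time; propagating -EAGAIN stops after one wait.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_sock {
	int accept_bytes;  /* bytes the dead peer will still absorb */
	int sndtmo_waits;  /* full send timeouts spent blocking */
};

/* A send on a stalled link: succeeds until the buffer is full. */
static int model_send(struct model_sock *s, int len)
{
	int n;

	if (s->accept_bytes == 0) {
		s->sndtmo_waits++;  /* blocked for one whole sndtmo */
		return -EAGAIN;
	}
	n = len < s->accept_bytes ? len : s->accept_bytes;
	s->accept_bytes -= n;
	return n;
}

/* keep_err == false reproduces the bug: -EAGAIN after progress is dropped. */
static int model_xmit_segment(struct model_sock *s, int *left, bool keep_err)
{
	int copied = 0;

	while (*left > 0) {
		int r = model_send(s, *left);

		if (r < 0)
			return (copied > 0 && !keep_err) ? copied : r;
		copied += r;
		*left -= r;
	}
	return copied;
}

/* Caller: keeps retrying as long as no error is reported. */
static int model_xmit(struct model_sock *s, int len, bool keep_err)
{
	int left = len;

	assert(len >= 0);
	while (left > 0) {
		int rc = model_xmit_segment(s, &left, keep_err);

		if (rc < 0)
			return rc;
	}
	return 0;
}
```

With 100 bytes of buffer space and a 1000-byte PDU, the buggy path waits two sndtmo periods and the fixed path waits one.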
>>>>>>>
>>>>>> There is one more bug.
>>>>>>
>>>>>> Could you try the 15->3 sec sndtmo patch plus the attached patch?
>>>>>>
>>>>>> Another problem is that if we had multiple tasks on the cmd or requeue
>>>>>> lists, and iscsi_tcp returns an error, the write_space function can still
>>>>>> run and queue iscsi_data_xmit. If it was a legitimate problem and
>>>>>> iscsi_conn_failure was run but we raced and iscsi_data_xmit was run
>>>>>> first it could miss the suspend bit checks, and start trying to send
>>>>>> data again and hit another timeout.
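The fix for this race amounts to re-checking the suspend bit before each task rather than once per xmit pass. The sketch below is a hypothetical single-threaded model (names made up, C11 atomics standing in for the kernel bit operations): the race is simulated by letting the failure land after a chosen number of tasks have been sent, and the pass must stop before sending the next one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a connection and its suspend_tx bit. */
struct model_conn {
	atomic_bool suspend_tx;
	int pending;  /* tasks still on the cmd/requeue lists */
	int sent;
};

/* Failure path: set the bit first so any later xmit work sees it. */
static void model_conn_failure(struct model_conn *c)
{
	atomic_store(&c->suspend_tx, true);
}

/*
 * An xmit pass, e.g. one queued by write_space. fail_after simulates the
 * failure arriving after that many tasks (0 = no failure).
 */
static int model_data_xmit(struct model_conn *c, int fail_after)
{
	int n = 0;

	while (c->pending > 0) {
		if (atomic_load(&c->suspend_tx))
			return -1;  /* suspended: leave the rest queued */
		c->pending--;
		c->sent++;
		if (++n == fail_after)
			model_conn_failure(c);  /* failure lands mid-pass */
	}
	return 0;
}
```

Checking the bit only on entry would let the pass send all remaining tasks into a dead connection, each of which could hit another send timeout.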
>>>>>>
>>>>> Here is an updated patch that also fixes the problem for cxgb3i and
>>>>> iscsi_tcp.
>>>>>
>>>> Sorry. This one fixes a possible leak.
>>>>
>>> And here is a patch to signal the xmit thread. I am not sure what I was
>>> doing wrong before. For some reason the thread would not break out of
>>> the wait. With this patch it is working. It is built over the
>>> check-suspend2.patch.
>>>
>>
>> I've updated my tree to the open-iscsi.git head (commit
>> f10c7942ad0dd26388eed0b46c44bad429fce0ad) and applied the following 3
>> patches on top of it:
>> 1. iscsi_tcp-reduce-sk-sndtmo.patch
>> 2. check-suspend2
>> 3. wake-xmit-on-err - this one breaks because I applied
>> iscsi_tcp-reduce-sk-sndtmo.patch. I had to change the following hunk:
>>
>> @@ -304,7 +304,7 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
>>                   * is getting stopped. libiscsi will know so propogate err
>>                   * for it to do the right thing.
>>                   */
>> -               if (rc == -EAGAIN)
>> +               if (rc == -EAGAIN || rc == -EINTR || rc == -ENODATA)
>>                          return rc;
>>                  else if (rc<  0) {
>>                          rc = ISCSI_ERR_XMIT_FAILED;
>>
>> to:
>>
>> diff --git a/kernel/iscsi_tcp.c b/kernel/iscsi_tcp.c
>> index af02499..65492e4 100644
>> --- a/kernel/iscsi_tcp.c
>> +++ b/kernel/iscsi_tcp.c
>> @@ -283,7 +283,7 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
>>                   * is getting stopped. libiscsi will know so propogate err
>>                   * for it to do the right thing.
>>                   */
>> -               if (rc == -EAGAIN)
>> +               if (rc == -EAGAIN || rc == -EINTR || rc == -ENODATA)
>>                          return -ENOBUFS;
>>                  else if (rc<  0) {
>>                          rc = ISCSI_ERR_XMIT_FAILED;
>>
>> I'm not sure if this is what you meant. Anyway, I saw on another
>
> It was.
>
>> thread that you plan to modify iscsi_tcp-reduce-sk-sndtmo so it will
>> depend on the transport class timeout.
>
> We actually do not need to do that when we have wake-xmit-on-err,
> because that patch should wake the xmit thread right away and we do not
> have to worry about waiting in sendpage/sendmsg.
>

So, the sk sndtmo patch is not required anymore, right?

Thanks,
Erez

Re: logout takes ~50 seconds while running heavy IO

2009-08-03 Thread Erez Zilber


On Sat, Aug 1, 2009 at 6:31 AM, Mike Christie wrote:
> Mike Christie wrote:
>> Mike Christie wrote:
>>> Mike Christie wrote:
>>>> Mike Christie wrote:
>>>>> On 07/31/2009 07:43 AM, Erez Zilber wrote:
>>>>>> I thought that this patch just reduces the timeout from 15 to 3. Does
>>>>>> it also fix the 3 sndtmo periods or is it another patch? What was the
>>>>> Another patch. You should have it.
>>>>>
>>>>>> bug that caused the 3 sndtmo periods waiting?
>>>>>>
>>>>> I think I meant two.
>>>>>
>>>>> If iscsi_tcp_xmit_segment sent some data, then got EAGAIN, it would drop
>>>>> the EAGAIN. iscsi_xmit would then retry the operation so we would wait
>>>>> again.
>>>>>
>>>> There is one more bug.
>>>>
>>>> Could you try the 15->3 sec sndtmo patch plus the attached patch?
>>>>
>>>> Another problem is that if we had multiple tasks on the cmd or requeue
>>>> lists, and iscsi_tcp returns an error, the write_space function can still
>>>> run and queue iscsi_data_xmit. If it was a legitimate problem and
>>>> iscsi_conn_failure was run but we raced and iscsi_data_xmit was run
>>>> first it could miss the suspend bit checks, and start trying to send
>>>> data again and hit another timeout.
>>>>
>>> Here is an updated patch that also fixes the problem for cxgb3i and
>>> iscsi_tcp.
>>>
>>
>> Sorry. This one fixes a possible leak.
>>
>
> And here is a patch to signal the xmit thread. I am not sure what I was
> doing wrong before. For some reason the thread would not break out of
> the wait. With this patch it is working. It is built over the
> check-suspend2.patch.
>

I've updated my tree to the open-iscsi.git head (commit
f10c7942ad0dd26388eed0b46c44bad429fce0ad) and applied the following 3
patches on top of it:
1. iscsi_tcp-reduce-sk-sndtmo.patch
2. check-suspend2
3. wake-xmit-on-err - this one breaks because I applied
iscsi_tcp-reduce-sk-sndtmo.patch. I had to change the following hunk:

@@ -304,7 +304,7 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
 		 * is getting stopped. libiscsi will know so propogate err
 		 * for it to do the right thing.
 		 */
-		if (rc == -EAGAIN)
+		if (rc == -EAGAIN || rc == -EINTR || rc == -ENODATA)
 			return rc;
 		else if (rc < 0) {
 			rc = ISCSI_ERR_XMIT_FAILED;

to:

diff --git a/kernel/iscsi_tcp.c b/kernel/iscsi_tcp.c
index af02499..65492e4 100644
--- a/kernel/iscsi_tcp.c
+++ b/kernel/iscsi_tcp.c
@@ -283,7 +283,7 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
 		 * is getting stopped. libiscsi will know so propogate err
 		 * for it to do the right thing.
 		 */
-		if (rc == -EAGAIN)
+		if (rc == -EAGAIN || rc == -EINTR || rc == -ENODATA)
 			return -ENOBUFS;
 		else if (rc < 0) {
 			rc = ISCSI_ERR_XMIT_FAILED;

I'm not sure if this is what you meant. Anyway, I saw on another
thread that you plan to modify iscsi_tcp-reduce-sk-sndtmo so it will
depend on the transport class timeout.

After doing all that, logout completes quickly (a few seconds).
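The condition in the modified hunk can be restated as a small classification function, which makes it easier to see which return codes are treated as "stop and let libiscsi decide" and which as a hard transmit failure. This is a hedged sketch: the enum and function names are hypothetical, and it only mirrors the if/else in the hunk, not the rest of iscsi_sw_tcp_xmit().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for the outcomes of the if/else in the hunk. */
enum xmit_verdict {
	XMIT_PROGRESS,  /* rc >= 0: not covered by the hunk; treated as progress */
	XMIT_BACKOFF,   /* hunk returns -ENOBUFS; libiscsi decides what next */
	XMIT_FAILED,    /* hunk sets rc = ISCSI_ERR_XMIT_FAILED */
};

static enum xmit_verdict classify_xmit_rc(int rc)
{
	/* Send timeout (EAGAIN), woken by a signal (EINTR), suspended (ENODATA). */
	if (rc == -EAGAIN || rc == -EINTR || rc == -ENODATA)
		return XMIT_BACKOFF;
	/* Any other negative value is a real transmit failure. */
	if (rc < 0)
		return XMIT_FAILED;
	return XMIT_PROGRESS;
}
```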

Thanks,
Erez

Re: logout takes ~50 seconds while running heavy IO

2009-07-31 Thread Erez Zilber


On Fri, Jul 31, 2009 at 9:43 AM, Ulrich Windl wrote:
>
> On 30 Jul 2009 at 19:25, Erez Zilber wrote:
>
>>
>> Mike,
>>
>> I'm seeing some strange behavior that I don't completely understand:
>> during heavy IO, I disconnect the cable on the target machine and
>> after a few seconds, I'm trying to logout from that target on the
>> initiator side. Here's what I see:
>
> Hi,
>
> IMHO unplugging the cable is a bad way to test for network problems:
>
> 1) A Gb NIC typically takes up to 5 seconds to re-establish the link to a
> switch after re-plugging.
> 2) The kernel (udev) may trigger the execution of some scripts to set up the
> network after un-plug/re-plug (maybe even using DHCP)
> 3) The TCP stack may suspend I/O until some timeout
> ...then come the iSCSI issues...
>
> Maybe a more effective (and easier to manage) test would be adding a blackhole
> route. That's purely software and it just makes the packets go to nowhere.
>

I'm not trying to test network problems. I'm trying to test real life
problems: a computer is connected in the lab and suddenly someone
pulls out the cable accidentally or the switch fails etc.

Erez

Re: logout takes ~50 seconds while running heavy IO

2009-07-31 Thread Erez Zilber


On Thu, Jul 30, 2009 at 8:25 PM, Mike Christie wrote:
>
> Mike Christie wrote:
>> Erez Zilber wrote:
>>> Mike,
>>>
>>> I'm seeing some strange behavior that I don't completely understand:
>>> during heavy IO, I disconnect the cable on the target machine and
>>> after a few seconds, I'm trying to logout from that target on the
>>> initiator side. Here's what I see:
>>>
>>> t = 0: iscsi_eh_cmd_timed_out gets called for all commands that timed
>>> out. It resets the timer because we have a ping task &
>>> time_before_eq(conn->last_recv + (conn->recv_timeout * HZ), jiffies)
>>> is true.
>>>
>>> t = /sys/block/sdX/device/timeout: iscsi_eh_cmd_timed_out gets called
>>> again. The session state is ISCSI_STATE_IN_RECOVERY. This will be
>>> called again and again every /sys/block/sdX/device/timeout seconds.
>>> The function returns and resets the timer.
>>>
>>> t = 11: I'm using iscsiadm to logout from the bad session
>>>
>>> t = 60 sec: __iscsi_block_session is called. I guess that the call
>>> stack is something like: mgmt_ipc_session_logout ->
>>> session_logout_task -> session_conn_shutdown -> iscsi_sw_tcp_conn_stop
>>> -> iscsi_conn_stop -> iscsi_start_session_recovery ->
>>> iscsi_block_session -> __iscsi_block_session. I'm not sure if this was
>>> initiated by the logout that iscsiadm initiated 49 seconds ago.
>>>
>>> Now, I see that the commands that were in flight time out:
>>>
>>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
>>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e
>>> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100663048
>>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
>>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x0600
>>> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100662792
>>> Jul 30 18:26:46 be8 kernel: device-mapper: multipath: Failing path 8:32.
>>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
>>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e
>>> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 41943296
>>>
>>> t = 61: ctldev_handle calls iscsi_sched_conn_context with
>>> EV_CONN_ERROR. I'm not sure where this came from.
>>>
>>> What I don't understand is why __iscsi_block_session was called only
>>> after 60 seconds. Is it configurable? Anyway, here's the node info:
>>>
>>
>> block session is only called from iscsi_conn_stop.
>>
>> iscsi_conn_stop is called when there is a connection error like 1011.
>> The kernel will throw the error, then send a msg to userspace. Userspace
>> will then call ep_disconnect to have the kernel clean up the kernel
>> structs for the ep.  Then when that has completed, iscsid will call
>> iscsi_conn_stop.
>>
>> What could be delaying the block session is that the
>> iscsi_start_session_recovery -> iscsi_suspend_tx will wait for the xmit
>> thread to flush. Before we had a timeout of 15 seconds so we could be
>> waiting in there that long. I just modified that to 3 seconds in that
>> sndtmo patch (that is also in the linux-2.6-iscsi git tree), but what we
>> really want is a way to just signal that thread to wake right away.
>>
>
> Oh yeah, what version of open-iscsi were you using? Could you try the
> 871 release with that patch
> http://groups.google.com/group/open-iscsi/browse_thread/thread/0940a3217ea0c4ff

I'm running with ef0357c4728ebba1a4b91a7f6d69c729a5f9e6e3 more or
less. I will try this patch and let you know if it helps.

>
> In the older releases, we could end up waiting for 3 sndtmo periods so
> that would add 45 secs on there.
>

I thought that this patch just reduces the timeout from 15 to 3. Does
it also fix the 3 sndtmo periods or is it another patch? What was the
bug that caused the 3 sndtmo periods waiting?

Thanks,
Erez

Re: logout takes ~50 seconds while running heavy IO

2009-07-31 Thread Erez Zilber


On Thu, Jul 30, 2009 at 8:20 PM, Mike Christie wrote:
>
> Erez Zilber wrote:
>> Mike,
>>
>> I'm seeing some strange behavior that I don't completely understand:
>> during heavy IO, I disconnect the cable on the target machine and
>> after a few seconds, I'm trying to logout from that target on the
>> initiator side. Here's what I see:
>>
>> t = 0: iscsi_eh_cmd_timed_out gets called for all commands that timed
>> out. It resets the timer because we have a ping task &
>> time_before_eq(conn->last_recv + (conn->recv_timeout * HZ), jiffies)
>> is true.
>>
>> t = /sys/block/sdX/device/timeout: iscsi_eh_cmd_timed_out gets called
>> again. The session state is ISCSI_STATE_IN_RECOVERY. This will be
>> called again and again every /sys/block/sdX/device/timeout seconds.
>> The function returns and resets the timer.
>>
>> t = 11: I'm using iscsiadm to logout from the bad session
>>
>> t = 60 sec: __iscsi_block_session is called. I guess that the call
>> stack is something like: mgmt_ipc_session_logout ->
>> session_logout_task -> session_conn_shutdown -> iscsi_sw_tcp_conn_stop
>> -> iscsi_conn_stop -> iscsi_start_session_recovery ->
>> iscsi_block_session -> __iscsi_block_session. I'm not sure if this was
>> initiated by the logout that iscsiadm initiated 49 seconds ago.
>>
>> Now, I see that the commands that were in flight time out:
>>
>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e
>> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100663048
>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x0600
>> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100662792
>> Jul 30 18:26:46 be8 kernel: device-mapper: multipath: Failing path 8:32.
>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
>> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e
>> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 41943296
>>
>> t = 61: ctldev_handle calls iscsi_sched_conn_context with
>> EV_CONN_ERROR. I'm not sure where this came from.
>>
>> What I don't understand is why __iscsi_block_session was called only
>> after 60 seconds. Is it configurable? Anyway, here's the node info:
>>
>
> block session is only called from iscsi_conn_stop.
>
> iscsi_conn_stop is called when there is a connection error like 1011.
> The kernel will throw the error, then send a msg to userspace. Userspace
> will then call ep_disconnect to have the kernel clean up the kernel
> structs for the ep.  Then when that has completed, iscsid will call
> iscsi_conn_stop.
>
> What could be delaying the block session is that the
> iscsi_start_session_recovery -> iscsi_suspend_tx will wait for the xmit
> thread to flush. Before we had a timeout of 15 seconds so we could be
> waiting in there that long. I just modified that to 3 seconds in that
> sndtmo patch (that is also in the linux-2.6-iscsi git tree),

Sounds good. I will try this patch on my tree.


> but what we really want is a way to just signal that thread to wake right 
> away.
>

Is there a way to do that (signal the thread)?

> What also could be delaying it is the ping timeout detection. Do you see
> a ping timeout message in the log? How long after the cable pull is it?

I see a ping timeout message at t = 2. It looks like this (including
extra log messages that I added):

Jul 30 18:25:48 be8 kernel:  connection2:0: ping timeout of 5 secs expired, last rx 4295216248, last ping 4295221248, now 4295226248
Jul 30 18:25:48 be8 kernel: iscsi_check_transport_timeouts: calling iscsi_conn_failure
Jul 30 18:25:48 be8 kernel: iscsi_conn_failure: calling iscsi_conn_error_event
Jul 30 18:25:48 be8 kernel:  connection2:0: detected conn error (1011)
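As a quick cross-check, the three jiffies values in the ping timeout line can be converted to seconds. This is a small sketch that assumes the kernel was built with HZ=1000: the log does not state HZ, but a 5000-jiffy gap for a "ping timeout of 5 secs" is consistent with it. The helper names are made up for this illustration.

```c
/* Jiffies values copied from the "ping timeout" log line. They are larger
 * than 2^32, so use a 64-bit type. */
#define LAST_RX   4295216248ULL
#define LAST_PING 4295221248ULL
#define NOW       4295226248ULL

/* ASSUMPTION: the kernel uses HZ=1000 (not stated in the log). */
#define ASSUMED_HZ 1000ULL

/* Convert a jiffies delta to whole seconds for the assumed HZ. */
static unsigned long long jiffies_to_secs(unsigned long long delta)
{
    return delta / ASSUMED_HZ;
}

/* Idle time between the last received PDU and sending the nop-out
 * (expected to match noop_out_interval). */
unsigned long long idle_before_ping(void)
{
    return jiffies_to_secs(LAST_PING - LAST_RX);
}

/* How long the nop-out went unanswered (expected to match noop_out_timeout). */
unsigned long long ping_wait(void)
{
    return jiffies_to_secs(NOW - LAST_PING);
}

/* Total time from the last received PDU to the 1011 connection error. */
unsigned long long rx_to_error(void)
{
    return jiffies_to_secs(NOW - LAST_RX);
}
```

With noop_out_interval = 5 and noop_out_timeout = 5 (the values in the node record from the original report), this gives 5 + 5 = 10 seconds between the last received data and conn error 1011, which matches those settings.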

>  Is it close to the noop values in the iscsid.conf you set?

I'm using the default 5-second ping timeout. The command timeout is 7
seconds. Note that t = 0 is the time of the first error, not the exact
time when the cable was pulled.

> When the ping times out, we will throw the conn error 1011, which kicks off the
> recovery mentioned above.
>
>
>
> Block session can also be called as a result of the iscsiadm logout.
>
> If we have not detected a connection problem, then session_logout_task
> will call session_unbind, which will end up removing the target. If we
> have to send a sync cache for the device then the unbind would wait for
> th

logout takes ~50 seconds while running heavy IO

2009-07-30 Thread Erez Zilber


Mike,

I'm seeing some strange behavior that I don't completely understand:
during heavy IO, I disconnect the cable on the target machine and,
a few seconds later, try to log out from that target on the
initiator side. Here's what I see:

t = 0: iscsi_eh_cmd_timed_out gets called for all commands that timed
out. It resets the timer because we have a ping task and
time_before_eq(conn->last_recv + (conn->recv_timeout * HZ), jiffies)
is true.

t = /sys/block/sdX/device/timeout: iscsi_eh_cmd_timed_out gets called
again. The session state is ISCSI_STATE_IN_RECOVERY. From then on, it is
called every /sys/block/sdX/device/timeout seconds; each time, the
function returns and resets the timer.

t = 11: I use iscsiadm to log out from the bad session.

t = 60: __iscsi_block_session is called. I guess that the call
stack is something like: mgmt_ipc_session_logout ->
session_logout_task -> session_conn_shutdown -> iscsi_sw_tcp_conn_stop
-> iscsi_conn_stop -> iscsi_start_session_recovery ->
iscsi_block_session -> __iscsi_block_session. I'm not sure whether this
was triggered by the logout that I started with iscsiadm 49 seconds earlier.

Now, I see that the commands that were in flight time out:

Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e
Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100663048
Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x0600
Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100662792
Jul 30 18:26:46 be8 kernel: device-mapper: multipath: Failing path 8:32.
Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e
Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 41943296

t = 61: ctldev_handle calls iscsi_sched_conn_context with
EV_CONN_ERROR. I'm not sure where this came from.

What I don't understand is why __iscsi_block_session was called only
after 60 seconds. Is it configurable? Anyway, here's the node info:

node.name = iqn.2009-01.com.example
node.tpgt = 1
node.startup = manual
iface.hwaddress = default
iface.ipaddress = default
iface.iscsi_ifacename = default
iface.net_ifacename = default
iface.transport_name = tcp
iface.initiatorname = 
node.discovery_address = 172.22.0.20
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 1
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.auth.authmethod = None
node.session.auth.username = 
node.session.auth.password = 
node.session.auth.username_in = 
node.session.auth.password_in = 
node.session.timeo.replacement_timeout = 1
node.session.err_timeo.abort_timeout = 2
node.session.err_timeo.lu_reset_timeout = 2
node.session.err_timeo.host_reset_timeout = 1
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 172.22.0.20
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 2
node.conn[0].timeo.login_timeout = 2
node.conn[0].timeo.auth_timeout = 2
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
node.conn[0].iscsi.HeaderDigest = None
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No
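The t = 0 step above hinges on the kernel's wraparound-safe jiffies comparison. Below is a minimal userspace sketch of that idea, modeled on time_after_eq()/time_before_eq() from include/linux/jiffies.h but re-implemented here rather than taken from the kernel header; recv_timeout_expired() is a hypothetical helper that mirrors the quoted condition.

```c
#include <limits.h>   /* ULONG_MAX, handy when exercising wraparound */
#include <stdbool.h>

/* Same idea as the kernel's time_after_eq(a, b): subtract first, then look
 * at the sign, so the answer stays right when the counter wraps. Like the
 * kernel, this relies on two's-complement conversion to long. */
static bool sketch_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* time_before_eq(a, b) is time_after_eq(b, a). */
static bool sketch_time_before_eq(unsigned long a, unsigned long b)
{
    return sketch_time_after_eq(b, a);
}

/* HYPOTHETICAL helper mirroring the condition quoted in the timeline:
 * time_before_eq(last_recv + recv_timeout * HZ, jiffies). */
bool recv_timeout_expired(unsigned long last_recv,
                          unsigned long recv_timeout_secs,
                          unsigned long hz,
                          unsigned long now)
{
    return sketch_time_before_eq(last_recv + recv_timeout_secs * hz, now);
}
```

A naive comparison such as last_recv + timeout <= now reports a timeout too early when the deadline wraps past ULONG_MAX but now has not wrapped yet; taking the signed difference avoids that.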

Thanks,
Erez

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---

Re: [PATCH] Add logging to scsi_transport_iscsi.c

2009-07-30 Thread Erez Zilber


On Thu, Jul 30, 2009 at 6:37 AM, Mike Christie wrote:
> On 07/26/2009 08:48 AM, Erez Zilber wrote:
>> I've attached a new version. I hope it's better. Whenever possible,
>> there's a dbg statement before & after. For example, if we free the
>> conn object, I can't put a dbg call after it (because conn is already
>> NULL). If you still see specific things that need to be fixed, let me
>> know.
>>
>
> Thanks for the work on this.
>
> How about the attached.
> - I added a ":" between the function name and debug output.
> - Removed some extra newlines
> - Tried to add dbg statements at the top and end of functions that can
> take a long time or fail in odd ways because they call into the scsi
> layer like the scanning, blocking, target removal, etc. For functions
> like allocation, adding, destroying and freeing I tried to just add a
> dbg statement at the top or end of the function.
>
> The patch was made over the linux-2.6-iscsi tree iscsi branch.
>

Looks good.

Thanks,
Erez

[PATCH] Don't kill iscsid if logout from all nodes fail

2009-07-23 Thread Erez Zilber

If 'iscsiadm -m node --logoutall=all' fails when stopping
the open-iscsi service, we shouldn't kill iscsid.

This solves the following race:
1. A logout from a node is initiated by the user.
2. Before the logout completes, the user runs /etc/init.d/iscsi stop.
   The 'stop' method logs out from all nodes. When it tries to log out
   from the node that is already logging out (step #1), it fails
   because that logout is already in progress. Then, the 'stop' method
   kills iscsid.
3. The logout command from step #1 returns and notifies the (dead) daemon.

Now, running 'iscsiadm -m session' shows a session (which, actually, doesn't
exist anymore) and the iscsi service is down.

Signed-off-by: Erez Zilber 

On Wed, Jul 22, 2009 at 8:27 PM, Mike Christie wrote:
>
> On 07/22/2009 09:09 AM, Erez Zilber wrote:
>> Mike,
>>
>> I'm seeing from time to time the following scenario (although
>> currently I'm not able to reproduce it):
>>
>
> No need to replicate it. I know what you are referring to.
>
>> 1. A logout from a node is initiated by the user.
>> 2. Before the logout completes, the user runs /etc/init.d/iscsi stop.
>> The 'stop' method logs out from all nodes. When it tries to log out
>> from the node that is already logging out (step #1), it fails because
>> it is already logging out (see initiator.c::session_logout_task when
>> conn->logout_qtask != NULL). Then, the 'stop' method kills iscsid.
>> 3. The logout command from step #1 returns and notifies the (dead) daemon.
>>
>> I was thinking about something like this to solve that:
>>
>> diff --git a/etc/initd/initd.redhat b/etc/initd/initd.redhat
>> index d68f135..55d35ec 100644
>> --- a/etc/initd/initd.redhat
>> +++ b/etc/initd/initd.redhat
>> @@ -39,6 +39,11 @@ stop()
>>          echo -n $"Stopping iSCSI initiator service: "
>>          sync
>>          iscsiadm -m node --logoutall=all
>> +       RETVAL=$?
>> +       if [ $RETVAL -ne 0 ]; then
>> +               echo "Could not logout from all nodes, try again later"
>> +               return $RETVAL
>> +       fi
>>          killproc iscsid
>>          rm -f /var/run/iscsid.pid
>>          [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/open-iscsi
>> @@ -76,6 +81,10 @@ case "$1" in
>>                          ;;
>>          restart)
>>                          stop
>> +                       if [ $RETVAL -ne 0 ]; then
>> +                               echo "Stopping iSCSI initiator service failed, not starting"
>> +                               exit $RETVAL
>> +                       fi
>>                          start
>>                          ;;
>>          status)
>>
>> I'm not sure if there are other logout failure scenarios in which we
>> want to shutdown the open-iscsi service anyway. If not, I guess that
>> this fix (and a similar fix for SuSE & Debian) should do the work.
>>
>
> This looks ok to me. We could make iscsid more complex and send
> notifications to multiple iscsiadm requests, but I am thinking this does
> not come up very much in normal use, so I am ok with the simple fix in
> the patch.
>
> >
>

Don't kill iscsid if logout from all nodes fail

If 'iscsiadm -m node --logoutall=all' fails when stopping
the open-iscsi service, we shouldn't kill iscsid.

This solves the following race:
1. A logout from a node is initiated by the user.
2. Before the logout completes, the user runs /etc/init.d/iscsi stop.
   The 'stop' method logs out from all nodes. When it tries to log out
   from the node that is already logging out (step #1), it fails
   because that logout is already in progress. Then, the 'stop' method
   kills iscsid.
3. The logout command from step #1 returns and notifies the (dead) daemon.

Now, running 'iscsiadm -m session' shows a session (which, actually, doesn't
exist anymore) and the iscsi service is down.

Signed-off-by: Erez Zilber 
---
 etc/initd/initd.debian |   16 +++-
 etc/initd/initd.redhat |   13 +
 etc/initd/initd.suse   |5 +
 3 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/etc/in

Re: Race during logout

2009-07-23 Thread Erez Zilber


On Thu, Jul 23, 2009 at 9:15 AM, Ulrich Windl wrote:
>
> On 22 Jul 2009 at 17:09, Erez Zilber wrote:
>
>>         restart)
>>                         stop
>> +                       if [ $RETVAL -ne 0 ]; then
>> +                               echo "Stopping iSCSI initiator service failed, not starting"
>> +                               exit $RETVAL
>> +                       fi
>>                         start
>>                         ;;
>
> What will happen if the service is not running, the user is unsure and selects
> "restart" instead of "start"? Usually "restart" should be the safe variant of
> "start" IMHO.
>

I think it should work. If you run 'stop' when the service is down, it
returns '0':

[r...@kpc36 ~]# /etc/init.d/iscsi stop
Stopping iSCSI initiator service:  [  OK  ]
[r...@kpc36 ~]# echo $?
0
[r...@kpc36 ~]# /etc/init.d/iscsi stop
Stopping iSCSI initiator service:  [  OK  ]
[r...@kpc36 ~]# echo $?
0

So, restarting when the service is down should work fine.

Erez

Race during logout

2009-07-22 Thread Erez Zilber


Mike,

I'm seeing from time to time the following scenario (although
currently I'm not able to reproduce it):

1. A logout from a node is initiated by the user.
2. Before the logout completes, the user runs /etc/init.d/iscsi stop.
The 'stop' method logs out from all nodes. When it tries to log out
from the node that is already logging out (step #1), it fails because
it is already logging out (see initiator.c::session_logout_task when
conn->logout_qtask != NULL). Then, the 'stop' method kills iscsid.
3. The logout command from step #1 returns and notifies the (dead) daemon.

I was thinking about something like this to solve that:

diff --git a/etc/initd/initd.redhat b/etc/initd/initd.redhat
index d68f135..55d35ec 100644
--- a/etc/initd/initd.redhat
+++ b/etc/initd/initd.redhat
@@ -39,6 +39,11 @@ stop()
         echo -n $"Stopping iSCSI initiator service: "
         sync
         iscsiadm -m node --logoutall=all
+       RETVAL=$?
+       if [ $RETVAL -ne 0 ]; then
+               echo "Could not logout from all nodes, try again later"
+               return $RETVAL
+       fi
         killproc iscsid
         rm -f /var/run/iscsid.pid
         [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/open-iscsi
@@ -76,6 +81,10 @@ case "$1" in
                         ;;
         restart)
                         stop
+                       if [ $RETVAL -ne 0 ]; then
+                               echo "Stopping iSCSI initiator service failed, not starting"
+                               exit $RETVAL
+                       fi
                         start
                         ;;
         status)

I'm not sure if there are other logout failure scenarios in which we
want to shutdown the open-iscsi service anyway. If not, I guess that
this fix (and a similar fix for SuSE & Debian) should do the work.

Thanks,
Erez

[PATCH] Add logging to scsi_transport_iscsi.c

2009-07-22 Thread Erez Zilber

Logging for connections and sessions in the scsi_transport_iscsi module
is now controlled by module parameters.

Signed-off-by: Erez Zilber 

Add logging to scsi_transport_iscsi.c

Logging for connections and sessions in the scsi_transport_iscsi module
is now controlled by module parameters.

Signed-off-by: Erez Zilber 
---
 kernel/2.6.14-23_compat.patch |  105 -
 kernel/2.6.24_compat.patch|  101 +++
 kernel/scsi_transport_iscsi.c |   64 +
 3 files changed, 164 insertions(+), 106 deletions(-)

diff --git a/kernel/2.6.14-23_compat.patch b/kernel/2.6.14-23_compat.patch
index ab233bb..306ed2e 100644
--- a/kernel/2.6.14-23_compat.patch
+++ b/kernel/2.6.14-23_compat.patch
@@ -1,8 +1,8 @@
 diff --git a/iscsi_tcp.c b/iscsi_tcp.c
-index caa116c..71df5b9 100644
+index bce1594..c46888d 100644
 --- a/iscsi_tcp.c
 +++ b/iscsi_tcp.c
-@@ -456,11 +456,9 @@ static int iscsi_sw_tcp_pdu_init(struct iscsi_task *task,
+@@ -460,11 +460,9 @@ static int iscsi_sw_tcp_pdu_init(struct iscsi_task *task,
  	if (!task->sc)
  		iscsi_sw_tcp_send_linear_data_prep(conn, task->data, count);
  	else {
@@ -17,7 +17,7 @@ index caa116c..71df5b9 100644
  	}
  
  	if (err) {
-@@ -793,7 +791,11 @@ iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
+@@ -797,7 +795,11 @@ iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
  	shost->max_lun = iscsi_max_lun;
  	shost->max_id = 0;
  	shost->max_channel = 0;
@@ -29,7 +29,7 @@ index caa116c..71df5b9 100644
  
  	if (iscsi_host_add(shost, NULL))
  		goto free_host;
-@@ -832,12 +834,6 @@ static void iscsi_sw_tcp_session_destroy(struct iscsi_cls_session *cls_session)
+@@ -836,12 +838,6 @@ static void iscsi_sw_tcp_session_destroy(struct iscsi_cls_session *cls_session)
  	iscsi_host_free(shost);
  }
  
@@ -42,7 +42,7 @@ index caa116c..71df5b9 100644
  static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
  {
  	blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY);
-@@ -846,6 +842,9 @@ static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
+@@ -850,6 +846,9 @@ static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
  }
  
  static struct scsi_host_template iscsi_sw_tcp_sht = {
@@ -52,7 +52,7 @@ index caa116c..71df5b9 100644
  	.module			= THIS_MODULE,
  	.name			= "iSCSI Initiator over TCP/IP",
  	.queuecommand   = iscsi_queuecommand,
-@@ -856,9 +855,8 @@ static struct scsi_host_template iscsi_sw_tcp_sht = {
+@@ -860,9 +859,8 @@ static struct scsi_host_template iscsi_sw_tcp_sht = {
  	.cmd_per_lun		= ISCSI_DEF_CMD_PER_LUN,
  	.eh_abort_handler   = iscsi_eh_abort,
  	.eh_device_reset_handler= iscsi_eh_device_reset,
@@ -77,7 +77,7 @@ index f9a4044..ab20530 100644
  #include "libiscsi_tcp.h"
  
 diff --git a/libiscsi.c b/libiscsi.c
-index fe4b66e..6217f76 100644
+index 73c4231..a55dbbd 100644
 --- a/libiscsi.c
 +++ b/libiscsi.c
 @@ -24,7 +24,10 @@
@@ -91,7 +91,7 @@ index fe4b66e..6217f76 100644
  #include 
  #include 
  #include 
-@@ -60,6 +63,8 @@ MODULE_PARM_DESC(debug_libiscsi, "Turn on debugging for libiscsi module. "
+@@ -83,6 +86,8 @@ MODULE_PARM_DESC(debug_libiscsi_eh,
  	 __func__, ##arg);		\
  	} while (0);
  
@@ -100,7 +100,7 @@ index fe4b66e..6217f76 100644
  /* Serial Number Arithmetic, 32 bits, less than, RFC1982 */
  #define SNA32_CHECK 2147483648UL
  
-@@ -229,7 +234,7 @@ static int iscsi_prep_bidi_ahs(struct iscsi_task *task)
+@@ -252,7 +257,7 @@ static int iscsi_prep_bidi_ahs(struct iscsi_task *task)
  		  sizeof(rlen_ahdr->reserved));
  	rlen_ahdr->ahstype = ISCSI_AHSTYPE_RLENGTH;
  	rlen_ahdr->reserved = 0;
@@ -109,7 +109,7 @@ index fe4b66e..6217f76 100644
  
  	ISCSI_DBG_SESSION(task->conn->session,
  			  "bidi-in rlen_ahdr->read_length(%d) "
-@@ -300,7 +305,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
+@@ -323,7 +328,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
  			return rc;
  	}
  	if (sc->sc_data_direction == DMA_TO_DEVICE) {
@@ -118,7 +118,7 @@ index fe4b66e..6217f76 100644
  		struct iscsi_r2t_info *r2t = &task->unsol_r2t;
  
  		hdr->data_length = cpu_to_be32(out_len);
-@@ -346,7 +351,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
+@@ -369,7 +374,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
  	} else {
  		hdr->flags |= ISCSI_FLAG_CMD_FINAL;
  		zero_

Re: logout call doesn't return

2009-06-29 Thread Erez Zilber


On Tue, Jun 23, 2009 at 8:29 PM, Mike Christie wrote:
>
> Erez Zilber wrote:
>> Mike,
>>
>> I'm trying to debug a problem that we have with iscsiadm: I'm running
>> open-iscsi against multiple targets. At some point, I'm closing the
>> connection from one of the targets (i.e. on the target side). Then, I
>> try to logout from the initiator side, but something goes wrong. The
>> last thing that iscsiadm does is call recv from iscsid_response and it
>> doesn't return (at least not after 10 minutes). I also see that in the
>> kernel, __iscsi_unbind_session calls scsi_remove_target and doesn't
>> return. I guess that this causes iscsiadm to wait on the recv call.
>
> Yeah, iscsiadm will wait for the iscsid operations like the unbind to
> complete, and that can take a while.
>
> If you stop the target and then we start the session shutdown process
> while we still think the session is up (we have not got a tcp connection
> error or rst or any other indication that is bad like a nop timing out),
> then we are going to end up firing the iscsi or scsi eh.
>
> If you have IO running or if your LU requires a cache sync to be sent
> when shutting it down, then the worst case is that you have nops turned
> off, and for some reason the network layer does not return an error (just
> returns something we think is retryable like EAGAIN) when we try to do
> sendpage/sendmsg. This will result in the scsi commands timing out. Then
> the aborts and other tmfs will timeout, and then we will wait for
> replacement_timeout seconds to try and reconnect.
>
> If you have nops on or the net layer returns an error, it would be a
> little faster because you do not have to wait for scsi commands to time
> out. The nop will timeout after noop_timeout seconds, then we will wait
> for replacement_timeout seconds to reconnect. After that time we will
> fail everything.
>
> if you do not have IO running and your device does not require cache
> syncs, then it should be a lot shorter, but still may be a minute. The
> __iscsi_unbind_session/scsi_remove_target should complete quickly since
> they do not have to wait on IO and cache syncs to complete. We would
> just wait for the logout iscsi pdu to timeout.
>
>
> There is also a bug, where we retry the sending of data even though we
> know the connection is bad. This patch helps
> http://git.kernel.org/?p=linux/kernel/git/mnc/linux-2.6-iscsi.git;a=commit;h=b138adb2df49967bf0a035143f734d33c4263963
> but what we want is to be able to break from the sendpage/sendmsg wait. I
> am working on a patch, but have hit some problems (for some reason if I
> send a signal it does not break from the wait). This problem only adds
> maybe 30 seconds extra for the logout of a session, so I am not sure
> that is what you are hitting.
>
>
>
> So first check if your device needs a cache sync. You can check that by
> looking at /var/log/messages when the device is discovered. You will see
>  something like:
>
> kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
>
> If write cache is enabled then the scsi layer will send cache syncs.
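The write-cache check described here can be automated. Below is a minimal C sketch that assumes the sd discovery messages land in a syslog file such as /var/log/messages, use the "Write cache: enabled" wording shown above, and that each log line fits in the buffer; count_write_cache_enabled() is a hypothetical helper, not part of open-iscsi.

```c
#include <stdio.h>
#include <string.h>

/* Count kernel sd discovery lines that report an enabled write cache.
 * Per the explanation above, those are the devices the SCSI layer will
 * send cache syncs to when they are shut down.
 * ASSUMPTION: each log line fits in the 1024-byte buffer. */
int count_write_cache_enabled(FILE *log)
{
    char line[1024];
    int n = 0;

    while (fgets(line, sizeof line, log) != NULL) {
        if (strstr(line, "Write cache: enabled") != NULL)
            n++;
    }
    return n;
}
```

For example, open the log with fopen("/var/log/messages", "r") and pass the FILE pointer in; a non-zero result means at least one device will need a cache sync during logout.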
>
> Then check your replacement_timeout. If that is really long, then we
> might be hitting that.
>
>
>
>
>>
>> BTW - I'm not running with the latest code. My HEAD is commit
>> ef0357c4728ebba1a4b91a7f6d69c729a5f9e6e3. I don't know if any relevant
>> bug fixes were made lately.
>
>
>
> Just so you know, I normally work on linux-2.6-iscsi, which tracks
> upstream, then port to open-iscsi/kernel, so the newest kernel patches
> will be in there.

Eventually, it was caused by an internal bug that we had. After fixing
it, things look OK. Thanks for your help.

Erez

logout call doesn't return

2009-06-23 Thread Erez Zilber


Mike,

I'm trying to debug a problem that we have with iscsiadm: I'm running
open-iscsi against multiple targets. At some point, I'm closing the
connection from one of the targets (i.e. on the target side). Then, I
try to log out from the initiator side, but something goes wrong. The
last thing that iscsiadm does is call recv from iscsid_response, and it
doesn't return (at least not after 10 minutes). I also see that in the
kernel, __iscsi_unbind_session calls scsi_remove_target and doesn't
return. I guess that this causes iscsiadm to wait on the recv call.

BTW - I'm not running with the latest code. My HEAD is commit
ef0357c4728ebba1a4b91a7f6d69c729a5f9e6e3. I don't know if any relevant
bug fixes were made lately.

Thanks,
Erez

Re: RFC: do we need a new list for kernel patches

2009-06-13 Thread Erez Zilber


Having everything in a single list sounds like a better idea to me. If
we have 2 lists, many people will have to monitor both.

Erez

Re: Improving kernel logs in open-iscsi

2009-06-10 Thread Erez Zilber


On Mon, Jun 8, 2009 at 7:38 PM, Mike Christie wrote:
>
> Erez Zilber wrote:
>>> With that patch you can do both. module_params are exposed in sysfs and
>>> if you set IWUSR then you can write to it.
>>>
>>> echo 1 > /sys/module/libiscsi/debug_libiscsi
>>>
>>> to turn on
>>>
>>> echo 0 > /sys/module/libiscsi/debug_libiscsi
>>>
>>> to turn off.
>>
>> OK, thanks.
>>
>>>> We can have separate logging for stuff like login, scsi, error
>>>> handling, connection state etc.
>>>>
>>> Send a patch. Note that right now each module has its own logging param.
>>> So the higher level libiscsi stuff has one. The common iscsi over tcp
>>> module libiscsi_tcp has one, then iscsi_tcp has its own. You may want to
>>> merge them all into one or make it so you can turn it on/off at each
>>> layer/level.
>>
>> I've attached a patch that adds 2 more module parameters to libiscsi,
>> so now we have conn, session & eh events. Later, we can add more
>> logging events. If you're ok with this fix, I will send another patch
>> that fixes the broken compat patches. BTW - which compat patches are
>> still maintained?
>>
> Hey Erez,
>
> In iscsi_eh_target_reset
>
>         ISCSI_DBG_SESSION(session, "wait for relogin\n");
>         wait_event_interruptible(conn->ehwait,
>
>
> Should this be a ISCSI_DBG_EH?
>
>
> And in iscsi_exec_task_mgmt_fn:
>
>         add_timer(&conn->tmf_timer);
>         ISCSI_DBG_SESSION(session, "tmf set timeout\n");
>
> should this one be a EH?
>
> if so I can just manually change them when I merge the patch.
>

Yes, please make these 2 changes.
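The point of the split (one switch per category, so EH messages can be turned on without the session and connection noise) can be sketched in userspace. This is an illustrative analogue of the pattern, not libiscsi's actual macros: the globals stand in for the debug_libiscsi_conn, debug_libiscsi_session and debug_libiscsi_eh module parameters (which the kernel exposes under /sys/module/libiscsi/parameters/), and sketch_eh_target_reset() is a made-up function.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>   /* for callers that compare last_msg */

/* Stand-ins for the per-category module parameters (0 = off). */
int dbg_conn, dbg_session, dbg_eh;

/* Last emitted message and a running count, kept so behavior is checkable. */
char last_msg[256];
int msgs_emitted;

/* Prefix the message with the calling function's name (as the kernel
 * macros do with __func__) and print it to stderr. */
static void dbg_emit(const char *func, const char *fmt, ...)
{
    va_list ap;
    int off = snprintf(last_msg, sizeof last_msg, "%s: ", func);

    if (off < 0 || (size_t)off >= sizeof last_msg)
        return;
    va_start(ap, fmt);
    vsnprintf(last_msg + off, sizeof last_msg - (size_t)off, fmt, ap);
    va_end(ap);
    fputs(last_msg, stderr);
    msgs_emitted++;
}

/* One gate per category. */
#define DBG_CONN(...)    do { if (dbg_conn)    dbg_emit(__func__, __VA_ARGS__); } while (0)
#define DBG_SESSION(...) do { if (dbg_session) dbg_emit(__func__, __VA_ARGS__); } while (0)
#define DBG_EH(...)      do { if (dbg_eh)      dbg_emit(__func__, __VA_ARGS__); } while (0)

/* Made-up EH path: "wait for relogin" is tagged as EH (the change agreed
 * on above), so it follows the EH switch rather than the session one. */
void sketch_eh_target_reset(void)
{
    DBG_SESSION("entering target reset\n");
    DBG_EH("wait for relogin\n");
}
```

With only the EH switch on, the reset path prints "sketch_eh_target_reset: wait for relogin" and nothing from the session category.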

Thanks,
Erez

regarding "increase kernel thread nice"

2009-06-07 Thread Erez Zilber


Mike,

What performance gain was achieved by this patch? Did you also test IOPS?

http://git.kernel.org/?p=linux/kernel/git/mnc/open-iscsi.git;a=commitdiff;h=175a817ac84d3651d38bebb55d03d62f6e45e370

Thanks,
Erez

Re: Improving kernel logs in open-iscsi

2009-06-07 Thread Erez Zilber

On Thu, Jun 4, 2009 at 7:59 PM, Mike Christie wrote:
>
> Erez Zilber wrote:
>>> With that patch you can do both. module_params are exposed in sysfs and
>>> if you set IWUSR then you can write to it.
>>>
>>> echo 1 > /sys/module/libiscsi/debug_libiscsi
>>>
>>> to turn on
>>>
>>> echo 0 > /sys/module/libiscsi/debug_libiscsi
>>>
>>> to turn off.
>>
>> OK, thanks.
>>
>>>> We can have separate logging for stuff like login, scsi, error
>>>> handling, connection state etc.
>>>>
>>> Send a patch. Note that right now each module has its own logging param.
>>> So the higher level libiscsi stuff has one. The common iscsi over tcp
>>> module libiscsi_tcp has one, then iscsi_tcp has its own. You may want to
>>> merge them all into one or make it so you can turn it on/off at each
>>> layer/level.
>>
>> I've attached a patch that adds 2 more module parameters to libiscsi,
>> so now we have conn, session & eh events. Later, we can add more
>> logging events. If you're ok with this fix, I will send another patch
>
> Looks ok to me.
>
>> that fixes the broken compat patches. BTW - which compat patches are
>
> Ok thanks.
>
>> still maintained?
>>
>
> 2.6.14-23_compat.patch
> 2.6.24_compat.patch
> 2.6.26_compat.patch
> 2.6.27_compat.patch
>

Attached, with fixes to the compat patches.

Erez

[PATCH] Use more debug flags in libiscsi

Allow the user to control the debug logs in libiscsi. We will now
have a module param for connection, session & error handling. Also,
fix backport patches.

Signed-off-by: Erez Zilber 
---
 kernel/2.6.26_compat.patch |   10 ++--
 kernel/2.6.27_compat.patch |   13 ++
 kernel/libiscsi.c  |   93 +++-
 3 files changed, 67 insertions(+), 49 deletions(-)

diff --git a/kernel/2.6.26_compat.patch b/kernel/2.6.26_compat.patch
index 98e9d3e..641d2f6 100644
--- a/kernel/2.6.26_compat.patch
+++ b/kernel/2.6.26_compat.patch
@@ -1,5 +1,5 @@
 diff --git a/libiscsi.c b/libiscsi.c
-index 733fa00..5d88693 100644
+index 149d5eb..467abbf 100644
 --- a/libiscsi.c
 +++ b/libiscsi.c
 @@ -38,6 +38,8 @@
@@ -8,9 +8,9 @@ index 733fa00..5d88693 100644
  
 +#include "open_iscsi_compat.h"
 +
- static int iscsi_dbg_lib;
- module_param_named(debug_libiscsi, iscsi_dbg_lib, int, S_IRUGO | S_IWUSR);
- MODULE_PARM_DESC(debug_libiscsi, "Turn on debugging for libiscsi module. "
+ static int iscsi_dbg_lib_conn;
+ module_param_named(debug_libiscsi_conn, iscsi_dbg_lib_conn, int,
+ 		   S_IRUGO | S_IWUSR);
 diff --git a/open_iscsi_compat.h b/open_iscsi_compat.h
 new file mode 100644
 index 000..b977df5
@@ -46,7 +46,7 @@ index 000..b977df5
 +
 +#endif
 diff --git a/scsi_transport_iscsi.c b/scsi_transport_iscsi.c
-index c9e95e7..79bd57f 100644
+index a49a92c..c07535e 100644
 --- a/scsi_transport_iscsi.c
 +++ b/scsi_transport_iscsi.c
 @@ -30,6 +30,8 @@
diff --git a/kernel/2.6.27_compat.patch b/kernel/2.6.27_compat.patch
index 144f68b..5111232 100644
--- a/kernel/2.6.27_compat.patch
+++ b/kernel/2.6.27_compat.patch
@@ -1,5 +1,5 @@
 diff --git a/libiscsi.c b/libiscsi.c
-index 733fa00..5d88693 100644
+index 149d5eb..467abbf 100644
 --- a/libiscsi.c
 +++ b/libiscsi.c
 @@ -38,6 +38,8 @@
@@ -8,9 +8,9 @@ index 733fa00..5d88693 100644
  
 +#include "open_iscsi_compat.h"
 +
- static int iscsi_dbg_lib;
- module_param_named(debug_libiscsi, iscsi_dbg_lib, int, S_IRUGO | S_IWUSR);
- MODULE_PARM_DESC(debug_libiscsi, "Turn on debugging for libiscsi module. "
+ static int iscsi_dbg_lib_conn;
+ module_param_named(debug_libiscsi_conn, iscsi_dbg_lib_conn, int,
+		S_IRUGO | S_IWUSR);
 diff --git a/open_iscsi_compat.h b/open_iscsi_compat.h
 new file mode 100644
 index 000..b977df5
@@ -46,7 +46,7 @@ index 000..b977df5
 +
 +#endif
 diff --git a/scsi_transport_iscsi.c b/scsi_transport_iscsi.c
-index c9e95e7..38e14e5 100644
+index a49a92c..a9f480e 100644
 --- a/scsi_transport_iscsi.c
 +++ b/scsi_transport_iscsi.c
 @@ -30,6 +30,8 @@
@@ -58,6 +58,3 @@ index c9e95e7..38e14e5 100644
  #define ISCSI_SESSION_ATTRS 21
  #define ISCSI_CONN_ATTRS 13
  #define ISCSI_HOST_ATTRS 4
--- 
-1.5.2.1
-
diff --git a/kernel/libiscsi.c b/kernel/libiscsi.c
index fe4b66e..149d5eb 100644
--- a/kernel/libiscsi.c
+++ b/kernel/libiscsi.c
@@ -38,15 +38,30

[PATCH] Add Module.markers to .gitignore

2009-06-04 Thread Erez Zilber



--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---

From 509f8c8eecf4992e84eb1fe7467bdecdd0b3463a Mon Sep 17 00:00:00 2001
From: Erez Zilber 
Date: Thu, 4 Jun 2009 19:06:20 +0300
Subject: [PATCH 1/2] Add Module.markers to .gitignore

Signed-off-by: Erez Zilber 
---
 kernel/.gitignore |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/.gitignore b/kernel/.gitignore
index fd9ee8c..0b2be3c 100644
--- a/kernel/.gitignore
+++ b/kernel/.gitignore
@@ -8,3 +8,4 @@ has_*_patch
 *.mod.*
 open_iscsi_compat.h
 *.orig
+Module.markers
-- 
1.6.0.4

Re: Improving kernel logs in open-iscsi

2009-06-04 Thread Erez Zilber

> With that patch you can do both. module_params are exposed in sysfs and
> if you set IWUSR then you can write to it.
>
> echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi
>
> to turn on
>
> echo 0 > /sys/module/libiscsi/parameters/debug_libiscsi
>
> to turn off.

OK, thanks.
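As Mike notes, a parameter declared with S_IWUSR is writable through sysfs, under /sys/module/<module>/parameters/. A minimal C sketch of toggling one from a program, assuming the per-category names from the "Use more debug flags in libiscsi" patch (debug_libiscsi_conn etc.) and root privileges. The helpers param_path() and set_param() are hypothetical, and the `root` argument exists only so they can be pointed at a scratch directory instead of /sys:

```c
#include <stdio.h>

/* Build "<root>/module/<module>/parameters/<param>" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int param_path(char *buf, size_t len, const char *root,
		      const char *module, const char *param)
{
	int n = snprintf(buf, len, "%s/module/%s/parameters/%s",
			 root, module, param);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write an integer to a module parameter file. sysfs may only reject
 * a value when the buffered data is flushed, so fclose() is checked. */
static int set_param(const char *root, const char *module,
		     const char *param, int value)
{
	char path[256];
	FILE *f;
	int rc = 0;

	if (param_path(path, sizeof(path), root, module, param) < 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%d\n", value) < 0)
		rc = -1;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

For example, set_param("/sys", "libiscsi", "debug_libiscsi_conn", 1) does the same thing as "echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn".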

>
>>
>> We can have separate logging for stuff like login, scsi, error
>> handling, connection state etc.
>>
>
> Send a patch. Note that right now each module has its own logging param.
> So the higher level libiscsi stuff has one. The common iscsi over tcp
> module libiscsi_tcp has one, then iscsi_tcp has its own. You may want to
> merge them all into one or make it so you can turn it on/off at each
> layer/level.

I've attached a patch that adds two more module parameters to libiscsi,
so there are now separate flags for conn, session & eh events. Later, we
can add more logging events. If you're OK with this fix, I will send
another patch that fixes the broken compat patches. BTW, which compat
patches are still maintained?

Erez
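The per-category scheme in the patch boils down to one runtime flag per event class, checked before anything is printed. A userspace sketch of the same idea, assuming fprintf(stderr, ...) in place of iscsi_conn_printk()/iscsi_session_printk() and a hypothetical dbg_emitted counter so the behaviour can be checked. Like the kernel macros, it relies on the GNU "##" extension to drop the comma when no arguments follow the format string:

```c
#include <stdio.h>

/* Userspace stand-ins for debug_libiscsi_conn, _session and _eh. */
static int dbg_conn, dbg_session, dbg_eh;
/* Hypothetical counter of messages actually printed. */
static unsigned int dbg_emitted;

/* Print only if the category flag is set, prefixing the caller's name
 * as the libiscsi macros do. There is no semicolon after while (0), so
 * "if (x) DBG_EH(...); else ..." still parses as one statement. */
#define DBG(flag, fmt, ...)						\
	do {								\
		if (flag) {						\
			fprintf(stderr, "%s " fmt, __func__, ##__VA_ARGS__); \
			dbg_emitted++;					\
		}							\
	} while (0)

#define DBG_CONN(fmt, ...)	DBG(dbg_conn, fmt, ##__VA_ARGS__)
#define DBG_SESSION(fmt, ...)	DBG(dbg_session, fmt, ##__VA_ARGS__)
#define DBG_EH(fmt, ...)	DBG(dbg_eh, fmt, ##__VA_ARGS__)
```

Turning on only the eh flag then prints error-handling messages while connection and session messages stay silent.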


From 93034f7bbc0a6c9d6fb5b84984cbef831d15ac55 Mon Sep 17 00:00:00 2001
From: Erez Zilber 
Date: Thu, 4 Jun 2009 19:07:31 +0300
Subject: [PATCH 2/2] Use more debug flags in libiscsi

Allow the user to control the debug logs in libiscsi. We will now
have a module param for connection, session & error handling.

Signed-off-by: Erez Zilber 
---
 kernel/libiscsi.c |   93 
 1 files changed, 57 insertions(+), 36 deletions(-)

diff --git a/kernel/libiscsi.c b/kernel/libiscsi.c
index fe4b66e..149d5eb 100644
--- a/kernel/libiscsi.c
+++ b/kernel/libiscsi.c
@@ -38,15 +38,30 @@
 #include "scsi_transport_iscsi.h"
 #include "libiscsi.h"
 
-static int iscsi_dbg_lib;
-module_param_named(debug_libiscsi, iscsi_dbg_lib, int, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(debug_libiscsi, "Turn on debugging for libiscsi module. "
-		 "Set to 1 to turn on, and zero to turn off. Default "
-		 "is off.");
+static int iscsi_dbg_lib_conn;
+module_param_named(debug_libiscsi_conn, iscsi_dbg_lib_conn, int,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(debug_libiscsi_conn,
+		 "Turn on debugging for connections in libiscsi module. "
+		 "Set to 1 to turn on, and zero to turn off. Default is off.");
+
+static int iscsi_dbg_lib_session;
+module_param_named(debug_libiscsi_session, iscsi_dbg_lib_session, int,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(debug_libiscsi_session,
+		 "Turn on debugging for sessions in libiscsi module. "
+		 "Set to 1 to turn on, and zero to turn off. Default is off.");
+
+static int iscsi_dbg_lib_eh;
+module_param_named(debug_libiscsi_eh, iscsi_dbg_lib_eh, int,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(debug_libiscsi_eh,
+		 "Turn on debugging for error handling in libiscsi module. "
+		 "Set to 1 to turn on, and zero to turn off. Default is off.");
 
 #define ISCSI_DBG_CONN(_conn, dbg_fmt, arg...)			\
 	do {			\
-		if (iscsi_dbg_lib)\
+		if (iscsi_dbg_lib_conn)\
 			iscsi_conn_printk(KERN_INFO, _conn,	\
 	 "%s " dbg_fmt,	\
 	 __func__, ##arg);	\
@@ -54,7 +69,15 @@ MODULE_PARM_DESC(debug_libiscsi, "Turn on debugging for libiscsi module. "
 
 #define ISCSI_DBG_SESSION(_session, dbg_fmt, arg...)			\
 	do {\
-		if (iscsi_dbg_lib)	\
+		if (iscsi_dbg_lib_session)\
+			iscsi_session_printk(KERN_INFO, _session,	\
+	 "%s " dbg_fmt,		\
+	 __func__, ##arg);		\
+	} while (0);
+
+#define ISCSI_DBG_EH(_session, dbg_fmt, arg...)\
+	do {\
+		if (iscsi_dbg_lib_eh)	\
 			iscsi_session_printk(KERN_INFO, _session,	\
 	 "%s " dbg_fmt,		\
 	 __func__, ##arg);		\
@@ -1557,10 +1580,10 @@ int iscsi_eh_target_reset(struct scsi_cmnd *sc)
 	spin_lock_bh(&session->lock);
 	if (session->state == ISCSI_STATE_TERMINATE) {
 failed:
-		iscsi_session_printk(KERN_INFO, session,
- "failing target reset: Could not log "
- "back into target [age %d]\n",
- session->age);
+		ISCSI_DBG_EH(session,
+			 "failing target reset: Could not log back into "
+			 "target [age %d]\n",
+			 session->age);
 		spin_unlock_bh(&session->lock);
 		mutex_unlock(&session->eh_mutex);
 		return FAILED;
@@ -1584,10 +1607,10 @@ failed:
 
 	mutex_lock(&session->eh_mutex);

Re: Improving kernel logs in open-iscsi

2009-05-20 Thread Erez Zilber


On Mon, Mar 9, 2009 at 7:16 PM, Mike Christie  wrote:
>
> Erez Zilber wrote:
>> Currently, open-iscsi uses debug_scsi & debug_tcp for logging. This is
>> controlled by DEBUG_SCSI & DEBUG_TCP. The current method is
>> problematic because you can't enable/disable these logs without
>> recompiling.
>>
>> Before I start working on it, I'd like to discuss it and decide how to
>> do that. If we have something like /sys/class/iscsi_logging, will it
>> be good enough? We can also have logging levels (error, info, trace
>> etc).
>>
>
> It is changed upstream. See the linux-2.6-iscsi tree (iscsi branch).
> Each core iscsi module has its own logging. I was going to make it more
> finely grained, but I have never asked for just the login or just the
> pdu or scsi logging info and for userspace have always just asked for
> all of it. Also the iscsi kernel code is mostly for passthrough of pdus
> so there is not much to break up.
>
> However, maybe logging more error info, without turning on full
> debugging, would be nicer. You could extend the existing logging fields
> to do that. An admin might want more error info logged than what we do
> today.
>

Sorry for getting back to this issue so late.

Are you talking about this commit?

http://git.kernel.org/?p=linux/kernel/git/mnc/linux-2.6-iscsi.git;a=commitdiff;h=1b2c7af877f427a2b25583c9033616c9ebd30aed

I thought that instead of having a module param, it could be nice if
we could control logging from sysfs or something similar, so you can
change the logging level without restarting the driver.

We can have separate logging for stuff like login, scsi, error
handling, connection state etc.

Erez


Re: No abort is sent for a WRITE command that takes too long

2009-05-18 Thread Erez Zilber


On Mon, May 18, 2009 at 6:01 PM, Mike Christie  wrote:
>
> Erez Zilber wrote:
>> On Mon, May 18, 2009 at 4:36 PM, Mike Christie  wrote:
>>> Erez Zilber wrote:
>>>> I enabled open-iscsi logging + added some printk calls when the abort
>>>> handler returns.
>>>> Here's the log. I see that iscsi_eh_cmd_timed_out gets called, but
>>>> there's no abort.
>>>> May 17 11:00:06 kpc36 kernel:  session1: iscsi_eh_cmd_timed_out scsi cmd 8101e30efe40 timedout
>>>> May 17 11:00:06 kpc36 kernel:  session1: iscsi_eh_cmd_timed_out return timer reset
>>> As you can see in iscsi_eh_cmd_timed_out, if the session is down then
>>> there is no point in letting the scsi eh run since we have to relogin
>>> and restart commands so we would return reset timer which prevents the
>>> scsi eh from running.
>>
>> Makes sense. There's only one thing that I don't understand - when
>> does scsi-ml call  iscsi_eh_cmd_timed_out? I thought that if a cmd
>> times out, scsi-ml sends an abort.
>
>
> scsi-ml calls scsi_times_out (which calls iscsi_eh_cmd_timed_out) when
> the scsi command timer expires (timeout is value in
> /sys/block/sdX/device/timeout).
>
> If the eh_timed_out callout returns BLK_EH_NOT_HANDLED then the scsi eh
> will run and call the abort function.
>
> The transportt->eh_timed_out allows the class/lld to override the scsi
> eh if for example it is a transport problem. In that case an abort or
> lun reset would not help, because you cannot send the TMF since there is
> no access to the target/disk.
>
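The decision Mike describes can be summarised as a small function. This is a simplified, hypothetical model, not the libiscsi code: struct sess_state and its fields are invented for illustration, and the two return values mirror the block layer's BLK_EH_RESET_TIMER / BLK_EH_NOT_HANDLED results of that era:

```c
/* Mirrors BLK_EH_NOT_HANDLED / BLK_EH_RESET_TIMER (illustrative names). */
enum eh_timer_return { EH_NOT_HANDLED, EH_RESET_TIMER };

/* Hypothetical snapshot of the state the timeout handler looks at. */
struct sess_state {
	int logged_in;		/* session is fully logged in */
	int ping_outstanding;	/* a nop-out ping is checking the connection */
};

static enum eh_timer_return cmd_timed_out(const struct sess_state *s)
{
	/* Session down: relogin will restart the command anyway, so an
	 * abort cannot help and the scsi eh is kept from running. */
	if (!s->logged_in)
		return EH_RESET_TIMER;
	/* Still checking the connection: ask for more time. */
	if (s->ping_outstanding)
		return EH_RESET_TIMER;
	/* Otherwise let the scsi eh run, starting with the abort handler. */
	return EH_NOT_HANDLED;
}
```

Only the EH_NOT_HANDLED path leads to an abort; every EH_RESET_TIMER just re-arms the command timer.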

I understand. Thanks!

Erez


Re: No abort is sent for a WRITE command that takes too long

2009-05-18 Thread Erez Zilber


On Mon, May 18, 2009 at 4:36 PM, Mike Christie  wrote:
>
> Erez Zilber wrote:
>>
>> I enabled open-iscsi logging + added some printk calls when the abort
>> handler returns.
>> Here's the log. I see that iscsi_eh_cmd_timed_out gets called, but
>> there's no abort.
>
>> May 17 11:00:06 kpc36 kernel:  session1: iscsi_eh_cmd_timed_out scsi cmd 8101e30efe40 timedout
>> May 17 11:00:06 kpc36 kernel:  session1: iscsi_eh_cmd_timed_out return timer reset
>
> As you can see in iscsi_eh_cmd_timed_out, if the session is down then
> there is no point in letting the scsi eh run since we have to relogin
> and restart commands so we would return reset timer which prevents the
> scsi eh from running.

Makes sense. There's only one thing that I don't understand - when
does scsi-ml call  iscsi_eh_cmd_timed_out? I thought that if a cmd
times out, scsi-ml sends an abort.

>
> And then there is code in there to check if we are in the middle of
> checking the connection. If we are then we ask for some more time with
> the command, and that will prevent the scsi eh from running. This looks
> like it can be a problem because we would get a response to our nop which
> would update the last_recv field. If there was no progress being made
> for the scsi command we would still ask to reset the timer and we could
> end up in that loop forever since the scsi layer does not cap the number
> of times you can reset the timer. I will send a patch to fix that.
>
>
> However, that probably will not fix your problem.
>
>
> For your specific setup, it looks like we hit the
> iscsi_eh_cmd_timed_out, reset the scsi command timer because we are in
> the middle of checking the connection with the nop/ping, but then
> the nop/ping does not return in time and so we drop the session:
>
>   connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4526718494, last ping 4526723494, now 4526728494
>
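The numbers in that message show both timers firing exactly on schedule. A worked sketch, assuming HZ=1000 (consistent with 5 seconds corresponding to 5000 ticks in the log) and a simplified version of the checks rather than the libiscsi source: a ping goes out once the recv timeout passes with nothing received, and the connection is failed once the ping itself goes unanswered for the ping timeout.

```c
#include <stdint.h>

#define ASSUMED_HZ 1000u	/* assumption: 1000 ticks per second */

/* Nonzero if tick a is at or after tick b, tolerant of counter wrap
 * (same idea as the kernel's time_after_eq()). */
static int ticks_after_eq(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) >= 0;
}

/* Send a nop-out ping once recv_tmo seconds pass without any PDU. */
static int need_ping(uint64_t now, uint64_t last_rx, unsigned int recv_tmo)
{
	return ticks_after_eq(now, last_rx + (uint64_t)recv_tmo * ASSUMED_HZ);
}

/* Fail the connection once the ping is unanswered for ping_tmo seconds. */
static int ping_expired(uint64_t now, uint64_t last_ping, unsigned int ping_tmo)
{
	return ticks_after_eq(now, last_ping + (uint64_t)ping_tmo * ASSUMED_HZ);
}
```

With the logged values, need_ping(4526723494, 4526718494, 5) and ping_expired(4526728494, 4526723494, 5) both return 1: the ping went out 5 seconds after the last receive, and the session was dropped 5 seconds after that.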
> That is why on the target you see it clean up commands. On the
> initiator you can see us cleaning up:
>
> May 17 11:00:07 kpc36 kernel:  session1: iscsi_start_session_recovery blocking session
> May 17 11:00:07 kpc36 kernel:  session1: fail_scsi_tasks failing sc 8101e30efe40 itt 0x13 state 3
>
> And then later in the logs you will see us start the commands again when
> we are logged in again.
>
>
> So you probably need to continue replying to nops when the r2t is
> dropped. I will fix it on the initiator side to detect if we are not
> getting IO for a specific command and then let the scsi eh run.
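One way to implement the per-command check Mike proposes is to remember when data last moved for the command and only ask for more time if it has moved since the previous timeout. The sketch below is hypothetical (the struct, field names and tick units are assumptions, not the eventual upstream fix) and uses a wrap-safe comparison in the style of the kernel's time_after():

```c
#include <stdint.h>

enum eh_timer_return { EH_NOT_HANDLED, EH_RESET_TIMER };

/* Hypothetical per-command bookkeeping. */
struct task_progress {
	uint64_t last_xfer;	/* tick when data last moved for this command */
	uint64_t last_timeout;	/* tick of the previous timeout (or command start) */
};

/* Nonzero if tick a is later than tick b, tolerant of counter wrap. */
static int ticks_after(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0;
}

/* Called when the command timer fires at tick `now`. */
static enum eh_timer_return on_cmd_timeout(struct task_progress *t, uint64_t now)
{
	int progressed = ticks_after(t->last_xfer, t->last_timeout);

	t->last_timeout = now;
	/* Data moved since the last check: the target is still working on
	 * this command, so give it more time. Otherwise let the scsi eh run. */
	return progressed ? EH_RESET_TIMER : EH_NOT_HANDLED;
}
```

Because the decision depends on per-command progress rather than on connection-level traffic such as nop replies, a command that is stuck while the connection stays healthy eventually reaches the abort handler instead of looping on timer resets.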

The current behavior doesn't create a problem for me - instead of
getting an 'abort' for the cmd, the session gets dropped and the cmd
is cleaned up anyway. I was only wondering why it happens.

Thanks for the detailed explanation. It was helpful.

Erez


Re: No abort is sent for a WRITE command that takes too long

2009-05-17 Thread Erez Zilber


On Thu, May 14, 2009 at 6:25 PM, Mike Christie  wrote:
>
> Erez Zilber wrote:
>>> Could you try my linux-2.6-iscsi git tree and turn on debugging?
>>
>> Moving to another kernel will be harder for me, but I can add a printk
>> in the eh handler.
>>
>
> Can you recompile the libiscsi module with debugging?
> uncomment this:
>
> /* #define DEBUG_SCSI */
>
> in libiscsi.h
>
>
> You might also want to add some debugs in iscsi_eh_abort. We might be
> hitting one of those early return statements (some do not have
> printks/debug statements).
>
>
>>>
>>>> open-iscsi logs in again after 10 seconds and sends the command again.
>>> Do you see something about a target reset or host reset?
>>
>> No
>>
>>>> SCST also cleans up the session.
>>> If the initiator thinks the abort failed (actually it is more like we
>>> will return failed if we think it is possible that someone could still
>>> be accessing the commands buffers, because we do not want scsi-ml to
>>> start using them again) it would return FAILED to the scsi-eh which for
>>> us would end up running the host/target reset, which we just drop the
>>> session for.
>>>
>>> We discussed before we need to modify how we decided when to return
>>> failed and we need to send a target reset for the host/target reset
>>> handler because the target reset and session relogin have different
>>> clearing effects. Vlad also has concerns about the tcp/ip connection
>>> teardown and buildup.
>>>
>>>> Can anyone explain the reason for this behavior? I would expect that
>>> iser and iscsi_tcp both hooked in the same libiscsi.c eh code. You
>>> should know this :)
>>
>> I suspect that scsi-ml doesn't call the eh handler at all. Anyway, I
>> will add this printk and retest.
>>
>
> Yeah, that is what it is looking like. If the initiator dropped the
> session then you should see the conn error 1011 or a "target reset
> succeeded" or "host reset succeeded" message or a failure message if we
> did not log back in.
>
> >
>

I enabled open-iscsi logging + added some printk calls when the abort
handler returns.
Here's the log. I see that iscsi_eh_cmd_timed_out gets called, but
there's no abort.

May 17 10:59:30 kpc36 kernel:  connection1:0: iscsi_check_transport_timeouts Sending nopout as ping
May 17 10:59:30 kpc36 kernel:  connection1:0: iscsi_check_transport_timeouts Setting next tmo 4526696621
May 17 10:59:30 kpc36 kernel:  session1: iscsi_prep_mgmt_task mgmtpdu [op 0x0 hdr->itt 0x4009 datalen 0]
May 17 10:59:30 kpc36 kernel:  session1: __iscsi_complete_pdu [op 0x20 cid 0 itt 0x9 len 0]
May 17 10:59:30 kpc36 kernel:  session1: iscsi_complete_task complete task itt 0x9 state 3 sc
May 17 10:59:30 kpc36 kernel:  session1: iscsi_free_task freeing task itt 0x9 state 1 sc
May 17 10:59:30 kpc36 kernel:  connection1:0: iscsi_check_transport_timeouts Setting next tmo 4526696621
May 17 10:59:33 kpc36 kernel:  session1: iscsi_prep_scsi_cmd_pdu iscsi prep [read cid 0 sc 8101e5e1e6c0 cdb 0x12 itt 0xa len 36 bidi_len 0 cmdsn 81 win 32]
May 17 10:59:33 kpc36 kernel:  session1: __iscsi_complete_pdu [op 0x25 cid 0 itt 0xa len 0]
May 17 10:59:33 kpc36 kernel:  session1: iscsi_data_in_rsp data in with status done [sc 8101e5e1e6c0 res 0 itt 0xa]
May 17 10:59:33 kpc36 kernel:  session1: iscsi_complete_task complete task itt 0xa state 3 sc 8101e5e1e6c0
May 17 10:59:33 kpc36 kernel:  session1: iscsi_free_task freeing task itt 0xa state 1 sc 8101e5e1e6c0
May 17 10:59:33 kpc36 kernel:  session1: iscsi_prep_scsi_cmd_pdu iscsi prep [read cid 0 sc 8101dcca4080 cdb 0x25 itt 0xb len 8 bidi_len 0 cmdsn 82 win 32]
May 17 10:59:33 kpc36 kernel:  session1: __iscsi_complete_pdu [op 0x25 cid 0 itt 0xb len 0]
May 17 10:59:33 kpc36 kernel:  session1: iscsi_data_in_rsp data in with status done [sc 8101dcca4080 res 0 itt 0xb]
May 17 10:59:33 kpc36 kernel:  session1: iscsi_complete_task complete task itt 0xb state 3 sc 8101dcca4080
May 17 10:59:33 kpc36 kernel:  session1: iscsi_free_task freeing task itt 0xb state 1 sc 8101dcca4080
May 17 10:59:33 kpc36 kernel:  session1: iscsi_prep_scsi_cmd_pdu iscsi prep [read cid 0 sc 8101d8cd7980 cdb 0x28 itt 0xc len 512 bidi_len 0 cmdsn 83 win 32]
May 17 10:59:33 kpc36 kernel:  session1: __iscsi_complete_pdu [op 0x25 cid 0 itt 0xc len 0]
May 17 10:59:33 kpc36 kernel:  session1: iscsi_data_in_rsp data in with status done [sc 8101d8cd7980 res 0 itt 0xc]
May 17 10:59:33 kpc36 kernel:  session1: iscsi_complete_task complete task itt 0xc state 3 sc 8101d8cd7980
May 17 10:59:33 kpc36 kernel:  session1: iscsi_free_task freeing task itt 0xc stat

Re: No abort is sent for a WRITE command that takes too long

2009-05-14 Thread Erez Zilber


On Thu, May 14, 2009 at 5:55 PM, Mike Christie  wrote:
>
> Erez Zilber wrote:
>> Hi,
>>
>> I'm running a setup of open-iscsi & SCST. In order to test error
>> scenarios during a WRITE command, I've added a delay in SCST, so after
>> it receives the command, it doesn't send an R2T for 20 seconds. I also
>> modified the device timeout on the initiator side to 5 seconds.
>> However, I don't see an 'abort' for that command. Instead, I see that
>
> How are you looking? With ethereal? Did you see a TUR get sent before
> the session recovery?

I'm running Wireshark. I didn't see any TUR.

>
> Could you try my linux-2.6-iscsi git tree and turn on debugging?

Moving to another kernel will be harder for me, but I can add a printk
in the eh handler.

>
>
>> open-iscsi logs in again after 10 seconds and sends the command again.
>
> Do you see something about a target reset or host reset?

No

>
>> SCST also cleans up the session.
>
> If the initiator thinks the abort failed (actually it is more like we
> will return failed if we think it is possible that someone could still
> be accessing the commands buffers, because we do not want scsi-ml to
> start using them again) it would return FAILED to the scsi-eh which for
> us would end up running the host/target reset, which we just drop the
> session for.
>
> We discussed before we need to modify how we decided when to return
> failed and we need to send a target reset for the host/target reset
> handler because the target reset and session relogin have different
> clearing effects. Vlad also has concerns about the tcp/ip connection
> teardown and buildup.
>
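The escalation Mike refers to can be pictured as a chain in which each handler that returns FAILED hands the command to the next, more drastic one. A simplified, hypothetical model (the step names and run_escalation() are illustrative; the real SCSI EH has further steps, such as bus reset, that are omitted here), where the target/host reset stage stands for libiscsi dropping the session and logging back in:

```c
#include <stddef.h>

enum eh_rc { EH_SUCCESS, EH_FAILED };

/* One escalation stage: a name and a handler (illustrative). */
struct eh_step {
	const char *name;
	enum eh_rc (*fn)(void *ctx);
};

/* Run the steps in order and stop at the first one that succeeds.
 * Returns the index of that step, or -1 if every step failed (at
 * which point the device would be taken offline). */
static int run_escalation(const struct eh_step *steps, size_t n, void *ctx)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (steps[i].fn(ctx) == EH_SUCCESS)
			return (int)i;
	return -1;
}
```

With an abort and a device reset that both return FAILED, the chain stops at the target-reset step, which for iSCSI at the time meant a relogin. From the target's point of view, that looks like a session drop, not an abort.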
>>
>> Can anyone explain the reason for this behavior? I would expect that
>
> iser and iscsi_tcp both hooked in the same libiscsi.c eh code. You
> should know this :)

I suspect that scsi-ml doesn't call the eh handler at all. Anyway, I
will add this printk and retest.

Erez


Re: No abort is sent for a WRITE command that takes too long

2009-05-14 Thread Erez Zilber


On Thu, May 14, 2009 at 2:54 PM, Ulrich Windl
 wrote:
>
> On 14 May 2009 at 13:12, Erez Zilber wrote:
>
>>
>> Hi,
>>
>> I'm running a setup of open-iscsi & SCST. In order to test error
>> scenarios during a WRITE command, I've added a delay in SCST, so after
>> it receives the command, it doesn't send an R2T for 20 seconds. I also
>> modified the device timeout on the initiator side to 5 seconds.
>> However, I don't see an 'abort' for that command. Instead, I see that
>> open-iscsi logs in again after 10 seconds and sends the command again.
>> SCST also cleans up the session.
>>
>> Can anyone explain the reason for this behavior? I would expect that
>> scsi-ml on the initiator side will send an 'abort' after 5 seconds.
>
> IMHO 5s for a "target abort" is way too short: If your physical disk has some
> problems (like seek errors, verify errors, etc.) it can take very long for a
> write to complete. Usually one would prefer to see the data on disk over
> triggering some error. I don't know the specs, but I'd guess the suggested
> timeout is minutes, not seconds.

Maybe it wasn't clear, but the 5 seconds timeout is only for testing &
debugging.

Erez


No abort is sent for a WRITE command that takes too long

2009-05-14 Thread Erez Zilber


Hi,

I'm running a setup of open-iscsi & SCST. In order to test error
scenarios during a WRITE command, I've added a delay in SCST, so after
it receives the command, it doesn't send an R2T for 20 seconds. I also
modified the device timeout on the initiator side to 5 seconds.
However, I don't see an 'abort' for that command. Instead, I see that
open-iscsi logs in again after 10 seconds and sends the command again.
SCST also cleans up the session.

Can anyone explain the reason for this behavior? I would expect that
scsi-ml on the initiator side will send an 'abort' after 5 seconds.

Thanks,
Erez


Re: corruption in session list

2009-04-27 Thread Erez Zilber


On Mon, Apr 27, 2009 at 6:56 PM, Mike Christie  wrote:
>
> Erez Zilber wrote:
>> From time to time, I see errors like the following when I run
>> 'iscsiadm -m session':
>>
>> tcp: [2] []:-1,1 ��A�¹V���
>>
>> Is it a known bug? I don't know how to reproduce it. It just happens.
>>
>
> I have not seen it. Have you seen it in multiple versions of open-iscsi?

I'm 2 or 3 commits away from the HEAD. I haven't seen that with other versions.

> Do you actually have a session with sid 2 running or is it all garbage?

Will check that when it happens again.

Erez


corruption in session list

2009-04-27 Thread Erez Zilber


From time to time, I see errors like the following when I run
'iscsiadm -m session':

tcp: [2] []:-1,1 ��A�¹V���

Is it a known bug? I don't know how to reproduce it. It just happens.

Thanks,
Erez


Improving kernel logs in open-iscsi

2009-03-09 Thread Erez Zilber


Currently, open-iscsi uses debug_scsi & debug_tcp for logging. This is
controlled by DEBUG_SCSI & DEBUG_TCP. The current method is
problematic because you can't enable/disable these logs without
recompiling.

Before I start working on it, I'd like to discuss it and decide how to
do that. If we have something like /sys/class/iscsi_logging, will it
be good enough? We can also have logging levels (error, info, trace
etc).

Erez
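A runtime log level, as opposed to a compile-time DEBUG_SCSI/DEBUG_TCP switch, only needs a threshold that is compared on every call. A userspace sketch, assuming hypothetical level names and a cur_log_level variable that would in practice be exposed as a writable module parameter or sysfs attribute:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels, from most to least severe. */
enum ilog_level { ILOG_ERR, ILOG_INFO, ILOG_TRACE };

/* Current threshold; would be changeable at runtime (e.g. via sysfs). */
static enum ilog_level cur_log_level = ILOG_ERR;

/* Print the message if it is at least as severe as the threshold.
 * Returns 1 if printed, 0 if filtered out. */
static int log_msg(enum ilog_level lvl, const char *fmt, ...)
{
	va_list ap;

	if (lvl > cur_log_level)
		return 0;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	return 1;
}
```

Raising cur_log_level from ILOG_ERR to ILOG_TRACE enables the info and trace messages without recompiling or reloading anything.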


Re: Disable aggregation of requests

2009-02-19 Thread Erez Zilber


On Wed, Feb 18, 2009 at 10:09 PM, Boaz Harrosh  wrote:
>
> Mike Christie wrote:
>> Erez Zilber wrote:
>>> On Tue, Feb 17, 2009 at 11:35 PM, Mike Christie  wrote:
>>>> Erez Zilber wrote:
>>>>> Hi,
>>>>>
>>>>> I'm running a setup of open-iscsi connected to a target. When I run
>>>>> I/O from the initiator (e.g. using dd) with transaction size of 128kB,
>>>>> I sometimes see that 2 128kB requests are aggregated to a single 256kB
>>>>> request. This is rare, but it happens from time to time. Can I disable
>>>>> this feature? Who is responsible for that? Is it scsi-ml?
>>>>>
>>>> block layer.
>>>>
>>>> /sys/block/sdX/queue/max_sectors_kb
>>> Thanks, but this will limit the I/O size for all I/Os. What I forgot
>>> to mention is that sometimes I also send larger I/Os (e.g. 512kB).
>>> With the proposed solution, these large I/Os will be sent as multiple
>>> 128kB I/Os (and affect the performance). Isn't there a way to simply
>>> avoid this aggregation?
>>>
>>
>> Not that I know of when going through the block layer. I think you will
>> have to ask lkml.
>>
>> I think the only way to control it is the  bsg/sg/passthrough route
>> since that does not do merging. The other alternative is to just hack
>> the code to do what you want :)
>>
>
> You can select the no-op I/O elevator and you can also use direct IO
> like with sg_dd from the sg3_utils package
>

I'm using noop already, but that didn't help. I'll try to ask in lkml.

Thanks,
Erez


Re: Disable aggregation of requests

2009-02-18 Thread Erez Zilber


On Wed, Feb 18, 2009 at 10:36 AM, Ulrich Windl
 wrote:
>
> On 18 Feb 2009 at 8:44, Erez Zilber wrote:
>
>>
>> On Tue, Feb 17, 2009 at 11:35 PM, Mike Christie  wrote:
>> >
>> > Erez Zilber wrote:
>> >> Hi,
>> >>
>> >> I'm running a setup of open-iscsi connected to a target. When I run
>> >> I/O from the initiator (e.g. using dd) with transaction size of 128kB,
>> >> I sometimes see that 2 128kB requests are aggregated to a single 256kB
>> >> request. This is rare, but it happens from time to time. Can I disable
>> >> this feature? Who is responsible for that? Is it scsi-ml?
>> >>
>> >
>> > block layer.
>> >
>> > /sys/block/sdX/queue/max_sectors_kb
>>
>> Thanks, but this will limit the I/O size for all I/Os. What I forgot
>> to mention is that sometimes I also send larger I/Os (e.g. 512kB).
>> With the proposed solution, these large I/Os will be sent as multiple
>> 128kB I/Os (and affect the performance). Isn't there a way to simply
>> avoid this aggregation?
>
> Hi!
>
> May I curiously ask why you would want that at all? In my experience, the
> larger the requests, and the more requests are packed into one packet, the
> better the performance.
>

That is correct, but sometimes you allocate resources according to
some assumptions (e.g. command size not bigger than X). If those
assumptions break, bad things happen...

Erez


Re: Disable aggregation of requests

2009-02-17 Thread Erez Zilber


On Tue, Feb 17, 2009 at 11:35 PM, Mike Christie  wrote:
>
> Erez Zilber wrote:
>> Hi,
>>
>> I'm running a setup of open-iscsi connected to a target. When I run
>> I/O from the initiator (e.g. using dd) with transaction size of 128kB,
>> I sometimes see that 2 128kB requests are aggregated to a single 256kB
>> request. This is rare, but it happens from time to time. Can I disable
>> this feature? Who is responsible for that? Is it scsi-ml?
>>
>
> block layer.
>
> /sys/block/sdX/queue/max_sectors_kb
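Roughly, the block layer may back-merge a new request into a queued one when the new request starts at the sector where the queued one ends and the combined size still fits under the queue's size limit. That is why lowering max_sectors_kb to 128 stops two 128kB requests from becoming one 256kB request. A deliberately simplified, hypothetical model (the real merge code also checks segment counts, data direction, request flags and more):

```c
#include <stdint.h>

#define SECTOR_BYTES 512u

/* Hypothetical, minimal request descriptor. */
struct req {
	uint64_t sector;	/* first 512-byte sector */
	uint32_t nr_sectors;	/* length in sectors */
};

/* Nonzero if `next` could be appended to `rq` without exceeding
 * max_sectors_kb. Simplified: ignores segment and direction checks. */
static int can_back_merge(const struct req *rq, const struct req *next,
			  uint32_t max_sectors_kb)
{
	uint32_t max_sectors = max_sectors_kb * 1024u / SECTOR_BYTES;

	return rq->sector + rq->nr_sectors == next->sector &&
	       rq->nr_sectors + next->nr_sectors <= max_sectors;
}
```

Two adjacent 128kB requests are 256 sectors each, so they merge under a 512kB limit (512 <= 1024 sectors) but not under a 128kB one (512 > 256 sectors).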

Thanks, but this will limit the I/O size for all I/Os. What I forgot
to mention is that sometimes I also send larger I/Os (e.g. 512kB).
With the proposed solution, these large I/Os will be sent as multiple
128kB I/Os (and affect the performance). Isn't there a way to simply
avoid this aggregation?
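The cost of the max_sectors_kb workaround is simple arithmetic: with the limit in place, a larger I/O is split into ceil(size / max_sectors_kb) requests. A small sketch of that calculation (requests_for_io() is illustrative; it assumes sizes in kB and ignores alignment and segment limits):

```c
#include <stdint.h>

/* Number of requests an I/O of io_kb is split into when each request
 * is capped at max_sectors_kb (ceiling division; max_sectors_kb > 0). */
static uint32_t requests_for_io(uint32_t io_kb, uint32_t max_sectors_kb)
{
	return (io_kb + max_sectors_kb - 1) / max_sectors_kb;
}
```

For a 512kB I/O with the limit at 128, requests_for_io(512, 128) gives 4 commands instead of 1.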

Thanks,
Erez


Disable aggregation of requests

2009-02-17 Thread Erez Zilber


Hi,

I'm running a setup of open-iscsi connected to a target. When I run
I/O from the initiator (e.g. using dd) with transaction size of 128kB,
I sometimes see that 2 128kB requests are aggregated to a single 256kB
request. This is rare, but it happens from time to time. Can I disable
this feature? Who is responsible for that? Is it scsi-ml?

Thanks,
Erez


Re: iSER error, please help

2009-01-21 Thread Erez Zilber


On Wed, Jan 21, 2009 at 3:31 PM, Rumburak  wrote:
>
> Hello,
>
>  I am trying to configure iSCSI over iSER on two servers (one acting
> as target - with tgtd, and the other acting as initiator with
> open-iscsi). I already installed OFED and have RDMA working (tested
> with rping).
>  The tgtd is installed and configured (tested it by using the tcp
> transport), but when I try to use iSER as the transport I get some
> errors on kernel not being able to handle NULL pointer dereference

This is definitely a bug. Even if iSER can't start normally, it should
exit without a NULL pointer dereference.

> (memory management error??). It seems that the NIC does not support
> FMRs (I have no idea what that is...)

FMR = Fast Memory Registration. If your HCA doesn't support it, you
can't run iSER.

> and raises the error
> "iser_create_ib_conn_res:unable to alloc mem or create resource, err
> -38" which is related to FMR (as far as I could see in the code).
>   Since the NIC is a Chelsio S302, it supports iWARP RDMA, but I'm not
> sure if this is OK for iSER (or whether only InfiniBand is supported).
> Can someone please help me set up iSCSI over iSER (in
> case it works on iWARP)?

As far as I know, iSER over iWARP is not supported yet (please correct
me if I'm wrong).

Erez

Re: Is MaxOutstandingR2T hardcoded?

2009-01-08 Thread Erez Zilber


On Wed, Jan 7, 2009 at 8:34 PM, Mike Christie  wrote:
>
> Erez Zilber wrote:
>> I noticed that if I change the value of
>> node.session.iscsi.MaxOutstandingR2T to some value > 1 (and the target
>> also supports a value higher than 1), it is still negotiated to '1'. I
>> saw that the login PDU itself contains 'MaxOutstandingR2T = 1'.
>>
>> I took a look at the code and found this in 
>> login.c::add_params_normal_session:
>>
>> http://git.kernel.org/?p=linux/kernel/git/mnc/open-iscsi.git;a=blob;f=usr/login.c;h=02358703a423a1b09f578fd919e9245797a3c0b1;hb=HEAD#l802
>>
>> Is it hardcoded on purpose? If yes, why?
>>
>
> Because the code only supports one.

So maybe it would be a good idea to disallow commands like "iscsiadm -m node
--op update -n node.session.iscsi.MaxOutstandingR2T -v some_val".
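One way the suggestion above could look is a validation step that refuses to store any MaxOutstandingR2T value other than the one the login code actually sends. This is a hypothetical C helper (`node_update_allowed` is not an iscsiadm function), and it assumes the value arrives as the string passed to `-v`.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard for "iscsiadm -m node --op update": the login code
 * always offers MaxOutstandingR2T=1, so only that value is accepted here.
 * Other keys are passed through unchecked by this sketch. */
bool node_update_allowed(const char *name, const char *value)
{
    char *end;
    long v;

    if (strcmp(name, "node.session.iscsi.MaxOutstandingR2T") != 0)
        return true;

    v = strtol(value, &end, 10);
    if (end == value || *end != '\0')
        return false;  /* not a plain decimal number */
    return v == 1;     /* any other value would silently end up as 1 */
}
```

Rejecting the value at update time makes the limitation visible to the user instead of having the setting quietly ignored at login.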

Erez

Is MaxOutstandingR2T hardcoded?

2009-01-07 Thread Erez Zilber


I noticed that if I change the value of
node.session.iscsi.MaxOutstandingR2T to some value > 1 (and the target
also supports a value higher than 1), it is still negotiated to '1'. I
saw that the login PDU itself contains 'MaxOutstandingR2T = 1'.
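For reference, RFC 3720 defines MaxOutstandingR2T as a number from 1 to 65535 with a default of 1 and Minimum as its negotiation result function. So if the initiator offers 1 in its login PDU, the session ends up with 1 no matter what the target supports. A minimal C sketch of that rule (the function name is hypothetical, not open-iscsi code):

```c
/* Negotiated MaxOutstandingR2T under the RFC 3720 "Minimum" result
 * function. Returns 0 if either offer is outside the legal range
 * 1..65535 (this sketch simply signals invalid input that way). */
unsigned negotiate_max_outstanding_r2t(unsigned initiator_offer,
                                       unsigned target_offer)
{
    if (initiator_offer < 1 || initiator_offer > 65535 ||
        target_offer < 1 || target_offer > 65535)
        return 0;
    return initiator_offer < target_offer ? initiator_offer : target_offer;
}
```

With the hardcoded offer, `negotiate_max_outstanding_r2t(1, 8)` returns 1; only an initiator that actually offered, say, 4 to such a target would get 4.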

I took a look at the code and found this in login.c::add_params_normal_session:

http://git.kernel.org/?p=linux/kernel/git/mnc/open-iscsi.git;a=blob;f=usr/login.c;h=02358703a423a1b09f578fd919e9245797a3c0b1;hb=HEAD#l802

Is it hardcoded on purpose? If yes, why?

Thanks,
Erez

kernel oops in iscsi_tcp_recv

2008-12-17 Thread Erez Zilber


Mike,

I got a kernel oops while logging in from v870-1 to an iSCSI-SCST
target. It happens before 'iscsiadm -m node -L all' returns. Is this a
known bug? Here's the log:

Dec 17 19:56:21 172.16.4.12 Unable to handle kernel paging request
Dec 17 19:56:26 172.16.4.12  at 1fd8 RIP:
Dec 17 19:56:31 172.16.4.12  [] :iscsi_tcp:iscsi_tcp_recv+0xb9/0x498
Dec 17 19:56:36 172.16.4.12 PGD 406b54067
Dec 17 19:56:41 172.16.4.12 PUD 409354067
Dec 17 19:56:46 172.16.4.12 PMD 0
Dec 17 19:56:51 172.16.4.12
Dec 17 19:56:56 172.16.4.12 Oops:  [1]
Dec 17 19:57:01 172.16.4.12 SMP
Dec 17 19:57:06 172.16.4.12
Dec 17 19:57:11 172.16.4.12 last sysfs file: /block/sdc/removable
Dec 17 19:57:16 172.16.4.12 CPU 0

I will try to add more debug prints tomorrow, and see if I can give
more details.

Erez

[PATCH] rm unused variable in fw_entry.c

2008-10-22 Thread Erez Zilber

Signed-off-by: Erez Zilber <[EMAIL PROTECTED]>
---
 utils/fwparam_ibft/fw_entry.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

From d68940f71fa21b15d8767d78b26ffa143ae4 Mon Sep 17 00:00:00 2001
From: Erez Zilber <[EMAIL PROTECTED]>
Date: Wed, 22 Oct 2008 12:00:00 +0200
Subject: [PATCH] rm unused variable in fw_entry.c

Signed-off-by: Erez Zilber <[EMAIL PROTECTED]>
---
 utils/fwparam_ibft/fw_entry.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/utils/fwparam_ibft/fw_entry.c b/utils/fwparam_ibft/fw_entry.c
index 915bbb7..fbfa3dd 100644
--- a/utils/fwparam_ibft/fw_entry.c
+++ b/utils/fwparam_ibft/fw_entry.c
@@ -38,8 +38,6 @@ int fw_get_entry(struct boot_context *context, const char *filepath)
  */
 static void dump_mac(struct boot_context *context)
 {
-	int i;
-
 	if (!strlen(context->mac))
 		return;
 
-- 
1.5.6.1

[PATCH] Minor fixes in iscsi_discovery documentation

2008-10-22 Thread Erez Zilber

Some changes that were made in iscsi_discovery were
not reflected in the docs.

Signed-off-by: Erez Zilber <[EMAIL PROTECTED]>
---
 doc/iscsi_discovery.8 |   10 +-
 utils/iscsi_discovery |6 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

From 10aed58d75a8138e034c9f59864eeae3b093e511 Mon Sep 17 00:00:00 2001
From: Erez Zilber <[EMAIL PROTECTED]>
Date: Wed, 22 Oct 2008 11:45:24 +0200
Subject: [PATCH] Minor fixes in iscsi_discovery documentation

Some changes that were made in iscsi_discovery were
not reflected in the docs.

Signed-off-by: Erez Zilber <[EMAIL PROTECTED]>
---
 doc/iscsi_discovery.8 |   10 +-
 utils/iscsi_discovery |6 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/iscsi_discovery.8 b/doc/iscsi_discovery.8
index c46223b..9971b83 100644
--- a/doc/iscsi_discovery.8
+++ b/doc/iscsi_discovery.8
@@ -6,19 +6,19 @@
 
 .TH "iscsi_discovery" 8
 .SH NAME
-iscsi_discovery \- discover iscsi devices
+iscsi_discovery \- discover iSCSI targets
 .SH SYNOPSIS
 .B iscsi_discovery  [-p ] [-d] [-t  [-f]] [-m] [-l]
 
 .SH DESCRIPTION
 Perform send-targets discovery to the specified IP. If a discovery record
-is generated, try to login to the portal using iSER and TCP transports
+is generated, try to login to the portal using the preferred transport
 (-t flag specifies the requested transport type, TCP is the default).
 If login using a certain transport succeeds, mark the portal for automatic
 login (unless -m flag is used), and disconnect (unless -l flag is used).
 
-For iscsi discovery to work, open-iscsi services must be running. e.g. iscsid 
-should be up, and the iscsi modules loaded. This is best accomplished by the
+For iSCSI discovery to work, open-iscsi services must be running. i.e. iscsid 
+should be up, and the iSCSI modules loaded. This is best accomplished by the
 init.d startup script.
 
 .\" .SH OPTIONS
@@ -47,6 +47,6 @@ login - login to the new discovered nodes (defualt is false).
 .SH AUTHOR
 Written by Dan Bar Dov
 .SH "REPORTING BUGS"
-Report bugs to <[EMAIL PROTECTED]>.
+Report bugs to <[EMAIL PROTECTED]>.
 .SH COPYRIGHT
 Copyright \(co Voltaire Ltd. 2006.
diff --git a/utils/iscsi_discovery b/utils/iscsi_discovery
index 9f1e7cf..3c6edf3 100755
--- a/utils/iscsi_discovery
+++ b/utils/iscsi_discovery
@@ -20,17 +20,17 @@
 
 # iscsi_discovery:
 #* does a send-targets discovery to the given IP
-#* set the transport type to ISER
+#* set the transport type to the preferred transport (or tcp if -t flag is not used)
 #* tries to login
 #* if succeeds,
 #  o logout,
-#  o mark record autmatic
+#  o mark record automatic (unless -m flag is used)
 #* else
 #  o reset transport type to TCP
 #  o try to login
 #  o if succeeded
 #+ logout
-#+ mark record automatic
+#+ mark record automatic (unless -m flag is used)
 #
 
 usage()
-- 
1.5.6.1
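The header comment of iscsi_discovery describes a "try the preferred transport, then fall back to TCP" login sequence. Here is a rough C model of just that fallback decision, with the login attempt abstracted as a callback. `login_with_fallback`, `simulated_login` and the `enum transport` values are hypothetical names, and the script's logout and mark-automatic steps are left out.

```c
#include <stdbool.h>

enum transport { TRANSPORT_TCP, TRANSPORT_ISER };

/* Callback that attempts a login over transport t; true on success. */
typedef bool (*login_fn)(enum transport t, void *ctx);

/* Try the preferred transport first; if that fails and it was not TCP,
 * retry over TCP. On success, stores the transport used in *used. */
bool login_with_fallback(enum transport preferred, login_fn try_login,
                         void *ctx, enum transport *used)
{
    if (try_login(preferred, ctx)) {
        *used = preferred;
        return true;
    }
    if (preferred != TRANSPORT_TCP && try_login(TRANSPORT_TCP, ctx)) {
        *used = TRANSPORT_TCP;
        return true;
    }
    return false;
}

/* Simulated login for demonstration only: ctx points to a bitmask in
 * which bit t is set if transport t "works" on this portal. */
bool simulated_login(enum transport t, void *ctx)
{
    unsigned mask = *(const unsigned *)ctx;
    return (mask >> t) & 1u;
}
```

For instance, with a mask that only has the TCP bit set, asking for iSER first ends up logging in over TCP, which corresponds to the `else` branch of the comment.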

Future tags in the open-iscsi git tree

2008-10-02 Thread Erez Zilber


Mike,

I don't know if you read the whole discussion about lightweight and
annotated tags. Anyway, it looks like it would be better to use only
annotated tags in the future, because many tools (e.g.
git-describe) ignore lightweight tags by default.

Thanks,
Erez

Re: git-describe doesn't show the most recent tag

2008-09-28 Thread Erez Zilber


On Sun, Sep 28, 2008 at 5:39 PM, Pierre Habouzit <[EMAIL PROTECTED]> wrote:
> On Sun, Sep 28, 2008 at 02:29:21PM +0000, Erez Zilber wrote:
>> On Sun, Sep 28, 2008 at 4:55 PM, Pierre Habouzit <[EMAIL PROTECTED]> wrote:
>> > On Sun, Sep 28, 2008 at 01:48:42PM +0000, Erez Zilber wrote:
>> >> Why is this happening?
>> >
>> >   --tags
>> >   Instead of using only the annotated tags, use any tag found in
>> >   .git/refs/tags.
>> >
>>
>> I'm not sure that I understand the difference between tags and annotated 
>> tags.
>
>  A lightweight tag is just a reference; an annotated tag has a message
> associated with it. Usually lightweight tags are meant as local help,
> whereas annotated tags are the ones pushed to the repositories and that
> never change. That's why many tools ignore non-annotated tags by default
> unless you pass --tags to them.

Thanks for the explanation.
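Pierre's explanation can be modelled in a few lines of C. This is a simplified sketch that assumes a strictly linear history (real git-describe walks a commit graph) and uses made-up placeholder commits rather than the real open-iscsi history. It only shows why passing `--tags` can change which tag is picked. `struct commit`, `nearest_tag` and `toy_history` are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>

struct commit {
    const char *sha;  /* abbreviated object name (placeholder values) */
    const char *tag;  /* tag pointing at this commit, or NULL */
    bool annotated;   /* true for annotated tags, false for lightweight */
};

/* history[0] is HEAD and history[i] is i commits behind it.
 * Returns the distance to the nearest eligible tag, or -1 if none.
 * Lightweight tags are only eligible when include_lightweight is set,
 * which is what --tags toggles in this model. */
int nearest_tag(const struct commit *history, size_t n,
                bool include_lightweight)
{
    for (size_t i = 0; i < n; i++) {
        const struct commit *c = &history[i];
        if (c->tag != NULL && (c->annotated || include_lightweight))
            return (int)i;
    }
    return -1;
}

/* Toy history: HEAD carries a lightweight tag, and an annotated tag
 * sits two commits further back. */
const struct commit toy_history[] = {
    { "aaaaaaa", "2.0-869.2",   false },
    { "bbbbbbb", NULL,          false },
    { "ccccccc", "2.0-868-rc1", true  },
};
```

Without lightweight tags, the toy history resolves to the annotated 2.0-868-rc1 two commits back; with them, the lightweight 2.0-869.2 on HEAD wins. That is the same pattern the 2.0-869-bugfix branch showed with and without `--tags`, provided 2.0-869.2 is a lightweight tag there.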

>
>> Anyway, if I move to the master branch, I see the following tags:
>>
>> [EMAIL PROTECTED]:/tmp/open-iscsi.git]$ ls .git/refs/tags/
>> 2.0-868-rc1  2.0-869  2.0-869.1  2.0-869.2  2.0-869-rc2  2.0-869-rc3
>> 2.0-869-rc4  2.0-870-rc1
>> [EMAIL PROTECTED]:/tmp/open-iscsi.git]$ git-tag
>> 2.0-868-rc1
>> 2.0-869
>> 2.0-869-rc2
>> 2.0-869-rc3
>> 2.0-869-rc4
>> 2.0-869.1
>> 2.0-869.2
>> 2.0-870-rc1
>>
>> However:
>> [EMAIL PROTECTED]:/tmp/open-iscsi.git]$ git-describe --tags
>> 2.0-868-rc1-81-g31c9d42
>>
>> I was expecting to see 2.0-870-rc1 here.
>
>  That's because master is not at -rc1 exactly, but some commits
> afterwards. Please read the git-describe manpage fully, it's _really_
> well explained:
>
>   The command finds the most recent tag that is reachable from a commit.
>   If the tag points to the commit, then only the tag is shown. Otherwise,
>   it suffixes the tag name with the number of additional commits on top
>   of the tagged object and the abbreviated object name of the most recent
>   commit.
>
>
> Which means that your master is 81 commits ahead of the exact 2.0-868-rc1 tag,
> at sha1 31c9d42
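The output format described in that man-page excerpt can be written as a small C formatter (`format_describe` is a hypothetical name). The abbreviated object name is prefixed with `g`, as in the 2.0-868-rc1-81-g31c9d42 output above.

```c
#include <stdio.h>
#include <string.h>

/* Build a git-describe style name: just the tag when the tag points at
 * the commit itself, otherwise <tag>-<commits since tag>-g<abbrev sha>.
 * Returns the snprintf() result, or -1 on bad input. */
int format_describe(char *buf, size_t len, const char *tag,
                    unsigned distance, const char *abbrev)
{
    if (tag == NULL ||
        (distance > 0 && (abbrev == NULL || strlen(abbrev) == 0)))
        return -1;
    if (distance == 0)
        return snprintf(buf, len, "%s", tag);
    return snprintf(buf, len, "%s-%u-g%s", tag, distance, abbrev);
}
```

With a 64-byte buffer, `format_describe(buf, sizeof(buf), "2.0-868-rc1", 81, "31c9d42")` produces `2.0-868-rc1-81-g31c9d42`.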

I read that, but I still don't understand what happens in the open-iscsi tree:

[EMAIL PROTECTED]:/tmp/open-iscsi.git]$ cat .git/refs/tags/2.0-870-rc1
5e80c8167c112687ae7b30b1e40af6f03088c56c

HEAD is 12 commits ahead of the 2.0-870-rc1 tag. Therefore, I expected
to see something like 2.0-870-rc1-12-some_hash (not
2.0-868-rc1-81-g31c9d42).

Erez

Re: git-describe doesn't show the most recent tag

2008-09-28 Thread Erez Zilber


On Sun, Sep 28, 2008 at 4:55 PM, Pierre Habouzit <[EMAIL PROTECTED]> wrote:
> On Sun, Sep 28, 2008 at 01:48:42PM +0000, Erez Zilber wrote:
>> Why is this happening?
>
>   --tags
>   Instead of using only the annotated tags, use any tag found in
>   .git/refs/tags.
>

I'm not sure that I understand the difference between tags and annotated tags.

Anyway, if I move to the master branch, I see the following tags:

[EMAIL PROTECTED]:/tmp/open-iscsi.git]$ ls .git/refs/tags/
2.0-868-rc1  2.0-869  2.0-869.1  2.0-869.2  2.0-869-rc2  2.0-869-rc3
2.0-869-rc4  2.0-870-rc1
[EMAIL PROTECTED]:/tmp/open-iscsi.git]$ git-tag
2.0-868-rc1
2.0-869
2.0-869-rc2
2.0-869-rc3
2.0-869-rc4
2.0-869.1
2.0-869.2
2.0-870-rc1

However:
[EMAIL PROTECTED]:/tmp/open-iscsi.git]$ git-describe --tags
2.0-868-rc1-81-g31c9d42

I was expecting to see 2.0-870-rc1 here.

Erez

git-describe doesn't show the most recent tag

2008-09-28 Thread Erez Zilber


Hi,

I'm trying to run git-describe on the open-iscsi git tree
(git://git.kernel.org/pub/scm/linux/kernel/git/mnc/open-iscsi.git):

[EMAIL PROTECTED]:/tmp/open-iscsi.git]$ git-branch -a
* master
  origin/2.0-869-bugfix
  origin/HEAD
  origin/bnx2i
  origin/cxgb3i
  origin/master
[EMAIL PROTECTED]:/tmp/open-iscsi.git]$ git-describe
2.0-868-rc1-81-g31c9d42

However, there are newer tags than 2.0-868-rc1:
[EMAIL PROTECTED]:/tmp/open-iscsi.git]$ git-tag
2.0-868-rc1
2.0-869
2.0-869-rc2
2.0-869-rc3
2.0-869-rc4
2.0-869.1
2.0-869.2
2.0-870-rc1

From what I see in the man page, "git-describe - Show the most recent
tag that is reachable from a commit". In this repository, that doesn't
seem to be the case...

Now, I switch to the "2.0-869-bugfix" branch:
[EMAIL PROTECTED]:/tmp/open-iscsi.git]$ git-checkout -b 2.0-869-bugfix origin/2.0-869-bugfix
Branch 2.0-869-bugfix set up to track remote branch
refs/remotes/origin/2.0-869-bugfix.
Switched to a new branch "2.0-869-bugfix"

and run git-describe again:
[EMAIL PROTECTED]:/tmp/open-iscsi.git]$ git-describe
2.0-868-rc1-33-g81133dd

Only with the --tags flag do I get what I expected:
[EMAIL PROTECTED]:/tmp/open-iscsi.git]$ git-describe --tags
2.0-869.2

Why is this happening?

Thanks,
Erez

[PATCH] Add .gitignore files

2008-09-10 Thread Erez Zilber

Some files should not be tracked by git.

Signed-off-by: Erez Zilber <[EMAIL PROTECTED]>

Add .gitignore files

Some files should not be tracked by git.

Signed-off-by: Erez Zilber <[EMAIL PROTECTED]>
---
 .gitignore|1 +
 kernel/.gitignore |   10 ++
 usr/.gitignore|3 +++
 utils/.gitignore  |1 +
 4 files changed, 15 insertions(+), 0 deletions(-)
 create mode 100644 .gitignore
 create mode 100644 kernel/.gitignore
 create mode 100644 usr/.gitignore
 create mode 100644 utils/.gitignore

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..5761abc
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+*.o
diff --git a/kernel/.gitignore b/kernel/.gitignore
new file mode 100644
index 000..fd9ee8c
--- /dev/null
+++ b/kernel/.gitignore
@@ -0,0 +1,10 @@
+.*.ko.cmd
+.*.o.cmd
+.tmp_versions
+Module.symvers
+cur_patched
+has_*_patch
+*.ko
+*.mod.*
+open_iscsi_compat.h
+*.orig
diff --git a/usr/.gitignore b/usr/.gitignore
new file mode 100644
index 000..32000e2
--- /dev/null
+++ b/usr/.gitignore
@@ -0,0 +1,3 @@
+iscsiadm
+iscsid
+iscsistart
diff --git a/utils/.gitignore b/utils/.gitignore
new file mode 100644
index 000..7efe3fd
--- /dev/null
+++ b/utils/.gitignore
@@ -0,0 +1 @@
+iscsi-iname
-- 
1.5.6.1
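The new kernel/.gitignore patterns are shell-style globs. As a rough sketch, assuming each pattern is matched against a bare file name (git's real rules also handle directories, negation and slashes), POSIX `fnmatch()` shows which build products they cover. `matches_ignore` and `kernel_ignore` are illustrative names.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Patterns copied from kernel/.gitignore in the patch above. */
const char *const kernel_ignore[] = {
    ".*.ko.cmd", ".*.o.cmd", ".tmp_versions", "Module.symvers",
    "cur_patched", "has_*_patch", "*.ko", "*.mod.*",
    "open_iscsi_compat.h", "*.orig",
};

/* True if the bare file name matches any of the n glob patterns.
 * Simplification: no directory or negation handling. */
bool matches_ignore(const char *name, const char *const *patterns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (fnmatch(patterns[i], name, 0) == 0)
            return true;
    return false;
}
```

For example, `iscsi_tcp.ko` and `.iscsi_tcp.o.cmd` match, while a source file such as `iscsi_tcp.c` does not.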