Re: [RFC]: libiscsi patch to support cxgb3i on older RHEL-5/SLES-10

2009-11-03 Thread Rakesh Ranjan
Mike Christie wrote:
 Rakesh Ranjan wrote:
 Mike Christie wrote:
 Rakesh Ranjan wrote:
 Rakesh Ranjan wrote:
 Mike Christie wrote:
 On 09/01/2009 09:53 AM, Mike Christie wrote:
 On 09/01/2009 03:58 AM, Or Gerlitz wrote:
 Mike Christie wrote:
 Or, I am ccing you because some time ago Erez was working on
 support
 for older RHEL and SLES kernels for OFED. It looks like the patch
 below would not be useful to you because iser is supported in
 those
 kernels, but did you guys all need RHEL 4 and maybe SLES 9
 support too?
 Hi Mike, I'm used to work with patches which have a change log
 and are
 signed, where this patch lacks both, so I can't really
 understand what
 it is about, sorry.

 A signature is not going to help you understand that patch will
 it? :)

 I do not think a changelog will help either since it is the first
 version of a RFC patch.

   From the subject of the mail and the body it looks like Rakesh is
 trying to port libiscsi to older distro kernels (RHEL 5 and SLES 10
 based) so he can support cxgb3i on them.

 I am just asking you guys if you also need RHEL 4 and SLES 9
 support.

 You guys meaning, do you need iser and does Rakesh need cxgb3i?
 Hi Mike,

 Yes we do want to support cxgb3i on RHEL4/SLES9. I am sending the
 modified patch against current james tree's libiscsi part. This
 patch can replace existing 2.6.14-23_compat.patch.

 Hi Mike,

 Here is updated patch that fixes some MACROS to fix compilation
 issue on RHEL5.0 and SLES10.2


 Was the patch in this mail the final version?

 What was this for:

 +#if !(defined RHELC1) && !(defined SLEC1)
 struct delayed_work recovery_work;
 +#else
 +   struct work_struct recovery_work;
 +#endif



 And what was the reason for the ifdefs related to this for:

 +#if !(defined RHELC1) && !(defined SLEC1) && \
 +(LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,19))
 task->have_checked_conn = false;
 task->last_timeout = jiffies;
 task->last_xfer = jiffies;
 +#endif

 Hi Mike,

 These checks I have used to preserve the original 2.6.14-23 needed
 contents. Since we don't want to have separate for each different OS
 release, so I just put above part with these guards.

 
 Do you need to mess with the delayed_work though? In open_iscsi_compat.h
 we have compat code for this:
 
 +#if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,19)
 +struct delayed_work {
 +   struct work_struct work;
 +};
 
 
 and I thought this was working for RHEL kernels as well as kernel.org ones.
 
 
 My question for the second chunk was more why do you need to ifdef them
 at all? Those task fields will always be there won't they? Is it
 something that code is interacting with that is missing?
 
 Sorry for the late reply again.

Hi Mike,

These changes were carried over by mistake from the .870
2.6.14-19_compat.patch. I had also been thinking about adding RHEL4 and
SLES9 support in the same patch, but right now we don't have any plans to
support RHEL4/SLES9.

I am attaching the fixed patch. Please share your feedback on it.

Regards
Rakesh Ranjan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---

diff -Nuarp kernel-orig/iscsi_tcp.c kernel/iscsi_tcp.c
--- kernel-orig/iscsi_tcp.c	2009-11-03 15:55:10.0 +0530
+++ kernel/iscsi_tcp.c	2009-11-03 17:28:11.0 +0530
@@ -459,10 +459,9 @@ static int iscsi_sw_tcp_pdu_init(struct 
 	if (!task->sc)
 		iscsi_sw_tcp_send_linear_data_prep(conn, task->data, count);
 	else {
-		struct scsi_data_buffer *sdb = scsi_out(task->sc);
 
-		err = iscsi_sw_tcp_send_data_prep(conn, sdb->table.sgl,
-		  sdb->table.nents, offset,
+		err = iscsi_sw_tcp_send_data_prep(conn, scsi_sglist(task->sc),
+		  scsi_sg_count(task->sc), 

Re: [RFC] iscsi transport : add sgio pass-thru support

2009-11-03 Thread James Smart

I actually started the patch with this in mind - making a common layer.  I was 
able to commonize:  xx_bsg_destroy_job(), xx_bsg_jobdone(), xx_softirq_done(), 
a helper for the timeout function (chkjobdone()), xx_bsg_map_buffer(), 
xx_req_to_bsgjob(), and xx_bsg_goose_queue().

However, what I was finding was I was jumping through hoops with the data 
structures (whose header where, structures within structures, nested private 
areas, etc).  Additionally, I kept finding chunks of the code flow, which had 
parallels to the items in the common routines, that had to be left within the 
transport (e.g. rx path in transport, tx in common; or vice versa). If I 
can't encapsulate both sides of the code flow within the common code, I lose 
many of the advantages, so I ended up abandoning it under the guise of 
complexity==bad.

I can post some of the work to see if you have the same conclusion. Yes, I 
don't like the replication either.

-- james s



Mike Christie wrote:
> James Smart wrote:
>> This patch implements the same infrastructure as found in the FC transport
>> for sgio request/response handling.
>>
>> The patch creates (and exports to userland) a new header - scsi_bsg_iscsi.h
>>
>
> Sorry for the late reply. I am trying to sell my house and move.
>
> Based on your experience with fc bsg support, do you think there is some 
> common code? It looks like a lot of this is generic. I just started 
> looking at the fc bsg stuff again, so I am not sure ATM.
 




open-iscsi-2.0-871 on ubuntu 9.10

2009-11-03 Thread Stefan

Hello all,
I'm having trouble getting open-iscsi running on Ubuntu 9.10. I suspect that 
*.870 is not working correctly, because I can't get discovery to work.
So I want to try 871.

doing:
/software/open-iscsi-2.0-871# ls
COPYING  Changelog  Makefile  README  THANKS  debian  doc  etc  include  kernel  test  usr  utils
r...@monster:~/software/open-iscsi-2.0-871# make install
make -C kernel install_kernel
make[1]: Entering directory `/home/stefan/software/open-iscsi-2.0-871/kernel'
make[1]: *** No rule to make target `linux_2_6_31', needed by `kernel_check'.  Stop.
make[1]: Leaving directory `/home/stefan/software/open-iscsi-2.0-871/kernel'
make: *** [install_kernel] Error 2


I cannot find any information about what is missing or what I'm doing wrong.

Can someone help?

tia
stefan




Re: open-iscsi taking down servers?

2009-11-03 Thread Mike Christie

Morten W. Petersen wrote:
> Hi,
>
> we have some servers running open-iscsi 2.0.870~rc3-0.4 (Debian) on

Is this the open-iscsi that comes with Debian, or is it an open-iscsi.org 
release of 870-rc3?

> a couple of servers.
>
> One of these servers has had frequent unexplained reboots, and another
> I saw had a hard freeze due to the sdb/sdc device becoming
> unwritable/unreadable.
>
> Could open-iscsi be the culprit here? 

It could be. Is there anything in /var/log/messages when the reboot 
occurs? Do you see something about a conn error, a ping/nop timing out, 
or a host reset failing?

How do you know the disk is not read/writable?

Are you doing failover with the MD3000i target?

What values are you using for the noops and replacement_timeout?





open-iscsi taking down servers?

2009-11-03 Thread Morten W. Petersen

Hi,

we have some servers running open-iscsi 2.0.870~rc3-0.4 (Debian) on
a couple of servers.

One of these servers has had frequent unexplained reboots, and another
I saw had a hard freeze due to the sdb/sdc device becoming
unwritable/unreadable.

Could open-iscsi be the culprit here?  We've had a number of other
servers from the same vendor (not connected to a SAN, using open-iscsi)
and have not seen unexplained reboots like this before.

If so, is there hope that this could be fixed in a short amount of time, or
do we need to consider dumping (selling) the SAN and going for regular
hard drives instead?

The SAN is an MD3000i BTW.

Thanks,

Morten

-- 
Morten W. Petersen
Manager
Nidelven IT Ltd

Phone: +47 45 44 00 69
Email: mor...@nidelven-it.no





a deadlock (or corruption) bug in iscsid's logging

2009-11-03 Thread guy keren


Hi,

the logging code in open-iscsi uses a logarea structure in shared 
memory protected by a SysV semaphore (using semop system calls) - and 
also places the sembuf array structure used in the semop calls in this 
same shared-memory area (i.e. inside the logarea struct that is 
allocated in shared memory at function logarea_init).

as a result, both the iscsid logging process and the iscsid control 
process attempt to use this structure in a non-synchronized manner, 
which is racy and may result in either a deadlock or data corruption (we 
saw these deadlocks several times).

the relevant code of the logging process is in usr/log.c, function 
log_flush(). the relevant code of the control process is in the same 
file, function dolog().

the deadlock scenario:

1. the logging process has the semaphore held. the control process is
   doing some work.
2. the logging process is about to release the semaphore. it sets
   the sem_op parameter in the sembuf structure to '1'.
3. the control process now wants to add a logging record. it sets
   the sem_op parameter in the sembuf structure to '-1'.
4. the control process invokes semop and gets blocked (because the
   semaphore is held by the logging process).
5. the logging process invokes semop and also gets blocked for the
   same reason.

we're in deadlock.

to get a data corruption, we'll need a slightly different scheduling - 
i.e. that the process that wants to take the lock will update the sembuf 
struct first - and then the process releasing the lock would modify 
sem_op to '1' - and we'll have both processes increasing the semaphore's 
value instead of one increasing and one decreasing it - and thus the 
semaphore's value will later allow both processes to grab the semaphore 
at the same time.
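
the two interleavings described above can be replayed in a single process. what follows is only a sketch, not code from usr/log.c: the function name replay_shared_sembuf and the union semun_arg are made up for the illustration, a local variable stands in for the sembuf that lives in shared memory, and IPC_NOWAIT is used so that a call that would block forever returns EAGAIN instead and can be counted.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Linux leaves it to the caller to define the semctl() argument union.
 * The name semun_arg is an assumption made for this sketch. */
union semun_arg {
	int val;
	struct semid_ds *buf;
	unsigned short *array;
};

/*
 * Replay, inside one process, two writers sharing a single sembuf.
 * The semaphore starts at 0 ("held"). Both writers store their sem_op
 * before either one calls semop(), so both calls see the second value.
 * Returns the final semaphore value, or -1 on setup failure.
 * *blocked is set to the number of calls that would have blocked.
 */
int replay_shared_sembuf(short first_write, short second_write, int *blocked)
{
	struct sembuf shared_op;   /* stands in for the sembuf in shared memory */
	union semun_arg arg;
	int semid, i, value;

	*blocked = 0;
	semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	if (semid < 0)
		return -1;

	arg.val = 0;               /* the semaphore is currently held */
	if (semctl(semid, 0, SETVAL, arg) < 0) {
		semctl(semid, 0, IPC_RMID);
		return -1;
	}

	shared_op.sem_num = 0;
	shared_op.sem_flg = IPC_NOWAIT;
	shared_op.sem_op = first_write;    /* first process prepares its call */
	shared_op.sem_op = second_write;   /* second process overwrites it */

	for (i = 0; i < 2; i++) {          /* both processes now call semop() */
		if (semop(semid, &shared_op, 1) < 0 && errno == EAGAIN)
			(*blocked)++;
	}

	value = semctl(semid, 0, GETVAL);
	semctl(semid, 0, IPC_RMID);        /* remove the test semaphore */
	return value;
}
```

calling replay_shared_sembuf(1, -1, &blocked) replays the deadlock ordering (release written first, acquire written second), and replay_shared_sembuf(-1, 1, &blocked) replays the corruption ordering, where both calls end up incrementing the semaphore.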

solution:

to solve this, i have created a local variable on the stack of both 
processes, that is used with the semop calls. another possible solution 
is to move the sembuf structure to a global variable that is NOT placed 
in shared memory.
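
a minimal sketch of the first option (a sembuf on the stack of each caller) could look like the following. the names sem_apply, logarea_lock and logarea_unlock are hypothetical and chosen for the example only; they are not the names used in usr/log.c, and the retry on EINTR is an assumption about the desired behaviour rather than something taken from the existing code.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Apply one semaphore operation using a sembuf private to this call.
 * (hypothetical helper, not taken from usr/log.c) */
static int sem_apply(int semid, short op)
{
	struct sembuf local_op;    /* on the stack: the other process never sees it */

	local_op.sem_num = 0;
	local_op.sem_op = op;
	local_op.sem_flg = 0;

	/* Retry if a signal interrupts the wait; fail on any other error. */
	while (semop(semid, &local_op, 1) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return 0;
}

/* Hypothetical wrappers: take the lock (-1) and release it (+1). */
int logarea_lock(int semid)
{
	return sem_apply(semid, -1);
}

int logarea_unlock(int semid)
{
	return sem_apply(semid, 1);
}
```

since each call builds its own sembuf, the value one process writes into sem_op can no longer be overwritten by the other process between the store and the semop call. the global-variable option would give the same isolation, as long as the variable is an ordinary per-process global and not part of the shared logarea.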

before i send a patch - is there a preference either way? or some other way?

thanks,
--guy




Re: a deadlock (or corruption) bug in iscsid's logging

2009-11-03 Thread Ulrich Windl

On 3 Nov 2009 at 2:54, guy keren wrote:

 
 
> Hi,
>
> the logging code in open-iscsi uses a logarea structure in shared 
> memory protected by a SysV semaphore (using semop system calls) - and 
> also places the sembuf array structure used in the semop calls in this 
> same shared-memory area (i.e. inside the logarea struct that is 
> allocated in shared memory at function logarea_init).
>
> as a result, both the iscsid logging process and the iscsid control 
> process attempt to use this structure in a non-synchronized manner, 

The fact that the logging and the control process use the shared memory in an 
unsynchronized way seems unrelated to the fact that both structures are located 
in the same memory area, or I didn't understand your statement. For performance 
reasons it seems wise to locate the controlling semaphores close to the area 
being controlled.

> which is racy and may result in either a deadlock or data corruption (we 
> saw these deadlocks several times).
>
> the relevant code of the logging process is in usr/log.c, function 
> log_flush(). the relevant code of the control process is in the same 
> file, function dolog().
>
> the deadlock scenario:
>
> 1. the logging process has the semaphore held. the control process is
>    doing some work.
> 2. the logging process is about to release the semaphore. it sets 
>    the sem_op parameter in the sembuf structure to '1'.
> 3. the control process now wants to add a logging record. it sets 
>    the sem_op parameter in the sembuf structure to '-1'.

Ah, I understand: it is not the semaphore structure that is in shared memory, 
but the parameter structure passed to semop(). OK, that's bad. Those 
structures should probably be local (on the stack). I was confusing this with 
POSIX semaphores, where shared memory is required.

> 4. the control process invokes semop and gets blocked (because the 
>    semaphore is held by the logging process).
> 5. the logging process invokes semop and also gets blocked for the 
>    same reason.
>
> we're in deadlock.

Good spotting!

Ulrich

 
> to get a data corruption, we'll need a slightly different scheduling - 
> i.e. that the process that wants to take the lock will update the sembuf 
> struct first - and then the process releasing the lock would modify 
> sem_op to '1' - and we'll have both processes increasing the semaphore's 
> value instead of one increasing and one decreasing it - and thus the 
> semaphore's value will later allow both processes to grab the semaphore 
> at the same time.
>
> solution:
>
> to solve this, i have created a local variable on the stack of both 
> processes, that is used with the semop calls. another possible solution 
> is to move the sembuf structure to a global variable that is NOT placed 
> in shared memory.
>
> before i send a patch - is there a preference either way? or some other way?
>
> thanks,
> --guy
 
  


