Re: iSCSI tape drive problems

2008-06-18 Thread Sparqz

I must have really dark sunglasses on, because I've never seen an
iSCSI tape drive before?

Anyhow, I don't think your MTU (lack of jumbo frames) is the problem;
I can easily get 50MB/s sustained transfers over a gigabit network.
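
To put rough numbers on the MTU question, here is a sketch of how much of each gigabit Ethernet frame is actually TCP payload. It assumes IPv4 and TCP headers without options (20 + 20 bytes) and the usual per-frame wire overhead: a 14-byte Ethernet header, 4-byte FCS, 8-byte preamble and 12-byte inter-frame gap. TCP options such as timestamps would shave off a few more bytes. The 1460-byte payload for MTU 1500 matches the segment sizes in the tcpdump capture further down.

```python
# Rough payload efficiency of TCP over Ethernet for a given MTU.
# Assumptions: IPv4 (20 bytes) + TCP (20 bytes, no options), and per-frame
# wire overhead of 14 (header) + 4 (FCS) + 8 (preamble) + 12 (gap) bytes.
IP_TCP_HEADERS = 20 + 20
WIRE_OVERHEAD = 14 + 4 + 8 + 12


def payload_per_frame(mtu: int) -> int:
    """TCP payload bytes carried in one full-sized frame."""
    return mtu - IP_TCP_HEADERS


def efficiency(mtu: int) -> float:
    """Fraction of on-the-wire bytes that are TCP payload."""
    return payload_per_frame(mtu) / (mtu + WIRE_OVERHEAD)


def max_payload_mb_per_s(mtu: int, link_bits_per_s: float = 1e9) -> float:
    """Theoretical TCP payload throughput in MB/s (10**6 bytes)."""
    return link_bits_per_s / 8 * efficiency(mtu) / 1e6


if __name__ == "__main__":
    for mtu in (1500, 8500, 9000):
        print(f"MTU {mtu}: {payload_per_frame(mtu)} B payload/frame, "
              f"{efficiency(mtu):.1%} efficient, "
              f"~{max_payload_mb_per_s(mtu):.0f} MB/s on gigabit")
```

Under these assumptions, MTU 1500 still leaves a ceiling of roughly 119 MB/s on gigabit, compared with about 124 MB/s for jumbo frames. A 16MB/s transfer is nowhere near the limit set by frame size.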

First, I would confirm that the path between your Linux iSCSI server
and your tape drive is running at full wire speed (gigabit).

Assuming eth0 is the Ethernet port you use for iSCSI traffic, type
"ethtool eth0" and make sure you see:
  Speed: 1000Mb/s
  Duplex: Full
amongst the garbage that fills the screen.

Also, you can type "ifconfig eth0" and you should see something like:
  RX packets:62361505 errors:0 dropped:0 overruns:0 frame:0
  TX packets:110583510 errors:0 dropped:0 overruns:0 carrier:0
Make sure that there are no errors (or dropped packets, overruns or
framing errors).
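
If you have several hosts to check, you can scan these counters automatically. Below is a minimal Python sketch. It assumes the older net-tools ifconfig layout shown above; newer ifconfig builds print the counters differently. The function name is made up for illustration.

```python
import re

# Counters that should stay at zero on a healthy link.
ERROR_COUNTERS = {"errors", "dropped", "overruns", "frame", "carrier"}
COUNTER_RE = re.compile(r"(\w+):(\d+)")


def nonzero_errors(ifconfig_text: str) -> dict:
    """Return {"RX errors": n, ...} for every non-zero error counter.

    Expects the old net-tools layout, e.g.
    "RX packets:62361505 errors:0 dropped:0 overruns:0 frame:0".
    """
    problems = {}
    for line in ifconfig_text.splitlines():
        line = line.strip()
        if not line.startswith(("RX packets", "TX packets")):
            continue
        direction = line[:2]  # "RX" or "TX"
        for name, value in COUNTER_RE.findall(line):
            if name in ERROR_COUNTERS and int(value) > 0:
                problems[f"{direction} {name}"] = int(value)
    return problems


if __name__ == "__main__":
    sample = """
      RX packets:62361505 errors:0 dropped:0 overruns:0 frame:0
      TX packets:110583510 errors:0 dropped:0 overruns:0 carrier:0
    """
    print(nonzero_errors(sample) or "no errors")  # this sample is clean
```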

I wouldn't worry too much about the TCP checksum errors that
Wireshark shows. They are usually caused by checksum offloading: the
NIC fills in the checksum after the point where the packet is captured,
so outgoing packets show up with incorrect checksums.

Finally, can your local disk sustain 30MB/s? It might be worth
installing something like iozone or bonnie++ to make sure your disk
can be read from and written to at 30MB/s.
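
Before installing iozone or bonnie++, you can run a rough sequential-write check as a quick sanity test. This minimal sketch does not replace those tools. It writes a file in fixed-size chunks, calls fsync so the data has to reach the disk, and reports MiB/s. The sizes and the directory are placeholders.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(path: str, size_mb: int = 256,
                              block_kb: int = 1024) -> float:
    """Write about size_mb MiB of zeros to `path` in block_kb KiB chunks,
    fsync, and return the observed throughput in MiB/s.

    Rough sketch only: unless size_mb is well beyond the machine's RAM,
    caching will flatter the result, and iozone or bonnie++ also measure
    reads, rewrites and seeks.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data out to disk before timing stops
    elapsed = time.perf_counter() - start
    written_mib = blocks * len(block) / 2**20
    return written_mib / elapsed


if __name__ == "__main__":
    # Uses the system temp directory by default; pass dir="..." to
    # TemporaryDirectory to test a directory on the disk you care about.
    with tempfile.TemporaryDirectory() as d:
        rate = sequential_write_mb_per_s(os.path.join(d, "testfile"), 64)
        print(f"~{rate:.1f} MiB/s sequential write")
```

For meaningful numbers, the file size should exceed the machine's RAM. That is one reason full benchmark runs take so long on large-memory servers.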

Hope this helps!

On Jun 17, 6:27 am, Michael <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I've been using open-iscsi to set up an IBM Ultrium LTO-4 tape drive.
> I can connect and transfer files and everything, but the maximum read
> or write speed I can get is like 16MB/s by tweaking the block size.  I
> am on a gigabit network, which the tape drive supports.  The drive
> specifications rate it at 30MB/s so I'm off by 50%.  Does anyone know
> what I can do to get the speed up to scratch?  I'm pretty much a
> Linux/network newbie, so I have no idea if it's an open-iscsi tweak or a
> network tweak that I could possibly do.  Also, I checked and my network card
> only supports MTU=1500, if that matters.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Problems with Open-iSCSI and Infortrend A16E-G2130-4

2008-05-04 Thread Sparqz

Hi All,

I have two of these arrays, and I must warn you that this unit is
unstable unless you are running firmware 3.61G.61 or later.

I was using Jumbo frames early on, but then decided to switch them off
as some of our servers did not support jumbo frames.

Without jumbo frames I see performance of up to 80MB/s on a single
host, and I can easily saturate 2 gigabit links with multiple servers
streaming data from the Infortrend array.

When I was using jumbo frames, I didn't set the MTU to 9000; I think
it was closer to 8500...

Thanks,

Stuart.



Re: iSCSI Errors - SLES10 + Infortrend

2008-03-06 Thread Sparqz

Hi Hannes,

I guess it was only a matter of time before you found this thread (-:

I've had a look at the source through git and I must admit I'm fairly
lost - I was a qualified (embedded) software engineer at one stage,
but I've dropped that in favor of systems / network administration.
It's also the first time I've used git; I much prefer the software I
was using previously to manage code, as you could clearly see the
project milestones & versions.

If Novell / SuSE are prepared to support the latest open-iscsi code
(from git) then I'm happy to stick with what works.  I haven't been
able to crash / corrupt anything, and I've been thrashing it to the
limits of 1Gbit Ethernet.  I'll come back when I have problems with
10GbE!

Thanks all!

Stuart.

On Mar 6, 11:10 pm, Hannes Reinecke <[EMAIL PROTECTED]> wrote:
> Sparqz wrote:
> > FYI: If anyone is keen to have a look at my bugzilla submission at
> > SuSE
>
> >https://bugzilla.novell.com/show_bug.cgi?id=366492
>
> Careful what you say here; I'm listening in :-)
>
> No, seriously: If you could do a git bisect to find out which
> commit fixed this, I'd be more than happy.
> The last commit that went into SLES10 SP1 is
> b4a62d156e793115d973d6841e060b5a5e77e57c
> or svn r768.
>
> But then I've updated the open-iscsi for SLES10 SP2 to
> git latest, so we should be fine there.
> Or that's the hope, at least.
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke   zSeries & Storage
> [EMAIL PROTECTED]  +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)



Re: iSCSI Errors - SLES10 + Infortrend

2008-03-04 Thread Sparqz

Hi Mike,

> Oh yeah, could you also try
> http://www.open-iscsi.org/bits/open-iscsi-2.0-868-rc1.tar.gz
> this one fixes a perf problem we saw with some other targets.

open-iscsi-2.0-868-rc1 works fine; its performance is equal to
open-iscsi-2.0-865.15.

I'm fairly sure now that the performance drop is due to the server
hardware and not the code.

ecweb1:/iscsi/xfs/nobody # bonnie++ -u nobody (open-iscsi-2.0-865.15 - 2008-03-05 11:15am - retries=5, no errors)
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ecweb1           8G 45343  99 94899  27 45229  10 42284  94 93601   9 682.0   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   772   5     + +++  1023   6  1347  10     + +++   943   6
ecweb1,8G,45343,99,94899,27,45229,10,42284,94,93601,9,682.0,1,16,772,5,+,+++,1023,6,1347,10,+,+++,943,6

ecweb1:/iscsi/xfs/nobody # bonnie++ -u nobody (open-iscsi-2.0-868-rc1 - 2008-03-05 15:31pm - retries=5, no errors)
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ecweb1           8G 45183  99 93715  26 43997  10 43634  97 93738   9 698.7   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   763   5     + +++   977   6  1237   9     + +++   892   8
ecweb1,8G,45183,99,93715,26,43997,10,43634,97,93738,9,698.7,1,16,763,5,+,+++,977,6,1237,9,+,+++,892,8

(apologies if the line wrapping stuffs this up)
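
The comma-separated lines above are easier to compare programmatically than the wrapped tables. The sketch below assumes the 27-field layout these bonnie++ 1.03 CSV lines use. The field labels are descriptive names derived from the column headers, not identifiers defined by bonnie++. bonnie++ prints plus signs when a test finishes too quickly to measure, so the sketch maps those to None.

```python
# Descriptive labels for the 27 fields of a bonnie++ 1.03 CSV line,
# following the column headers of the human-readable table.
FIELDS = [
    "machine", "size",
    "seq_out_char_kps", "seq_out_char_cpu",
    "seq_out_block_kps", "seq_out_block_cpu",
    "seq_out_rewrite_kps", "seq_out_rewrite_cpu",
    "seq_in_char_kps", "seq_in_char_cpu",
    "seq_in_block_kps", "seq_in_block_cpu",
    "random_seeks_ps", "random_seeks_cpu",
    "files",
    "seq_create_ps", "seq_create_cpu",
    "seq_read_ps", "seq_read_cpu",
    "seq_delete_ps", "seq_delete_cpu",
    "ran_create_ps", "ran_create_cpu",
    "ran_read_ps", "ran_read_cpu",
    "ran_delete_ps", "ran_delete_cpu",
]


def parse_bonnie_csv(line: str) -> dict:
    """Map one CSV line to FIELDS; '+'/'+++' (too fast to measure) -> None."""
    values = line.strip().split(",")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    result = {}
    for name, raw in zip(FIELDS, values):
        if name in ("machine", "size"):
            result[name] = raw
        elif raw.strip("+") == "":
            result[name] = None
        else:
            result[name] = float(raw)
    return result


def percent_change(old: dict, new: dict, field: str) -> float:
    """Relative change of `field` from the old run to the new run, in %."""
    return (new[field] - old[field]) / old[field] * 100


if __name__ == "__main__":
    run_865 = parse_bonnie_csv(
        "ecweb1,8G,45343,99,94899,27,45229,10,42284,94,93601,9,682.0,1,"
        "16,772,5,+,+++,1023,6,1347,10,+,+++,943,6")
    run_868 = parse_bonnie_csv(
        "ecweb1,8G,45183,99,93715,26,43997,10,43634,97,93738,9,698.7,1,"
        "16,763,5,+,+++,977,6,1237,9,+,+++,892,8")
    for field in ("seq_out_block_kps", "seq_in_block_kps", "random_seeks_ps"):
        print(f"{field}: {percent_change(run_865, run_868, field):+.1f}%")
```

For the two runs above, this reports about -1.2% block write, +0.1% block read and +2.4% seeks. That is well within run-to-run noise and consistent with the two versions performing equally.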



Re: iSCSI Errors - SLES10 + Infortrend

2008-03-04 Thread Sparqz

FYI: If anyone is keen to have a look at my bugzilla submission at
SuSE:

https://bugzilla.novell.com/show_bug.cgi?id=366492




Re: iSCSI Errors - SLES10 + Infortrend

2008-03-04 Thread Sparqz

Hi Mike,

> Could you try this tarball
> http://kernel.org/pub/linux/kernel/people/mnc/open-iscsi/releases/ope...
>
> And try it with different IO schedulers by doing
> cat /sys/block/sdb/queue/scheduler
> echo $ONE_OF_THE_VALUES_FROM_THE_CAT > /sys/block/sdb/queue/scheduler
>
> Please be careful with the tarball. It is not stable, and just for a
> quick performance test.

This tarball doesn't compile ): It may be because I don't have a
complete build environment on this development server. I have
uploaded the error messages for you to scan through; let me know if
anything obvious stands out.
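
Mike's scheduler experiment in the quote above (read the scheduler file, then write one of the listed names back) can be scripted. This sketch assumes the sysfs layout from the quoted commands. The function names are hypothetical, and writing the file needs root.

```python
from pathlib import Path


def parse_scheduler(text: str) -> tuple:
    """Split the contents of a queue/scheduler file into (active, available).

    The kernel lists every available I/O scheduler and puts the active one
    in square brackets, e.g. "noop anticipatory deadline [cfq]".
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def set_scheduler(dev: str, name: str, sysfs: str = "/sys/block") -> None:
    """Select an I/O scheduler for block device `dev` (requires root).

    Does the same as the quoted echo command, but refuses names the kernel
    did not list as available.
    """
    path = Path(sysfs) / dev / "queue" / "scheduler"
    _, available = parse_scheduler(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} is not one of {available}")
    path.write_text(name)
```

A test loop would call set_scheduler("sdb", name) for each name returned by parse_scheduler, rerun bonnie++ after each switch, and compare the results.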

Thanks,

Stuart.



Re: iSCSI Errors - SLES10 + Infortrend

2008-03-04 Thread Sparqz

On Mar 5, 4:36 am, Mike Christie <[EMAIL PROTECTED]> wrote:
> Mike Christie wrote:
> >> Were there any default parameters changed from open-iscsi-2.0-707 and
> >> open-iscsi-2.0-865 ?
>
> > None that should affect you the way you are seeing. Could you try this
> > tarball
> > http://kernel.org/pub/linux/kernel/people/mnc/open-iscsi/releases/ope...
>
> > And try it with different IO schedulers by doing
> > cat /sys/block/sdb/queue/scheduler
> > echo $ONE_OF_THE_VALUES_FROM_THE_CAT > /sys/block/sdb/queue/scheduler
>
> > Please be careful with the tarball. It is not stable, and just for a
> > quick performance test.
>
> Oh yeah, could you also try
> http://www.open-iscsi.org/bits/open-iscsi-2.0-868-rc1.tar.gz
> this one fixes a perf problem we saw with some other targets.

Right, I'm talking with SuSE at the moment; they want me to try
changing /sys/block/sdX/device/retries (from 5 to 10). They also want
a tcpdump, but I've told them it will be gigabytes in size.
I gave them the tcpdump summary from above...
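
Here is a back-of-the-envelope estimate of why a capture of this traffic quickly reaches gigabytes. The assumptions are not from the thread: classic pcap format (24-byte file header, 16-byte header per packet), full-sized MTU 1500 frames, and 80 MB/s of payload, the single-host figure mentioned elsewhere in these posts. tcpdump's -s option sets the snapshot length, that is, how many bytes of each packet are kept.

```python
# Back-of-the-envelope size of a busy-link tcpdump capture (classic pcap).
PCAP_FILE_HEADER = 24     # bytes, written once per file
PCAP_RECORD_HEADER = 16   # bytes, written before every captured packet
FRAME_BYTES = 1514        # 14-byte Ethernet header + 1500-byte IP packet
PAYLOAD_BYTES = 1460      # TCP payload per full-sized frame (no TCP options)


def capture_bytes(payload_mb_per_s: float, seconds: float,
                  snaplen: int = 65535) -> float:
    """Estimated pcap file size for one direction of bulk data.

    Counts only full-sized data frames; the smaller ACKs flowing the
    other way would add a little more.
    """
    packets = payload_mb_per_s * 1e6 / PAYLOAD_BYTES * seconds
    per_packet = PCAP_RECORD_HEADER + min(snaplen, FRAME_BYTES)
    return PCAP_FILE_HEADER + packets * per_packet


if __name__ == "__main__":
    for snaplen in (65535, 128):
        size_gb = capture_bytes(80, 600, snaplen) / 1e9
        print(f"10 minutes at 80 MB/s, snaplen {snaplen}: ~{size_gb:.1f} GB")
```

Under these assumptions, ten minutes of traffic comes to roughly 50 GB with whole frames, and still about 4.7 GB when only the first 128 bytes of each packet are kept.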

Once I've finished experimenting with retries and sent the details back
to SuSE, I'll try your suggestions (-:

*phew* so busy... so much to do!

At the moment I'm using a development server for testing (it fails
just like the production servers do with the distro supplied open-
iscsi drivers), so I'm not worried if your tarball nukes the server.

Thanks again



iSCSI Errors - SLES10 + Infortrend

2008-03-02 Thread Sparqz

The default run for bonnie++ on a server with 32GB of memory takes
forever...!  But it looks like the performance is up on the production
server (Still SLES10 SP1) with the semi-stable release from the open-
iscsi website.

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
thor            63G 45924  98 109503  26 36807   8 46482  84 85767  11 353.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  9597  57     + +++  8192  38 10504  52     + +++  7936  44
thor,63G,45924,98,109503,26,36807,8,46482,84,85767,11,353.9,0,16,9597,57,+,+++,8192,38,10504,52,+,+++,7936,44

Is there a change-log I can hunt through to figure out perhaps where
the bug was and what fixed it?



iSCSI Errors - SLES10 + Infortrend

2008-03-02 Thread Sparqz

Hi Erik,

You've done it again...? You've changed my discussion subject from
"iSCSI Errors - SLES10 + Infortrend" to "iSCSI targets &
Replication".  If you are indeed going through the proper route to
create a new discussion then this behavior of Google Groups is
_really_ undesirable.

For the record, I'm quite calm thanks, just a bit miffed as to why
this is happening (or why anyone would think that this is a good
feature to regex a subject and overwrite the original subject) I-:

Perhaps the polite thing to do would be to alter one's subject so that
it doesn't overwrite existing subjects?

On Mar 2, 1:58 am, Erik Bussink <[EMAIL PROTECTED]> wrote:
> On Mon, 2008-02-25 at 12:41 -0800, Sparqz wrote:
> > Hi Erik,
>
> > Why have you hijacked my thread?  Please don't... start a new thread/
> > topic.
>
> Whoa, you need to calm down. For one, I had not read any of the posts in
> the iSCSI errors & SLES10 thread that were here recently, and I still
> haven't read them. I will later today.
>
> I just had a question, and opened a new discussion with a new title.
> So it seems we have 5 letters in our title threads that are common...
> iSCSI. Nice one mate.
>
> If this discussion overlaps other posts that were posted recently, you
> can just point it out.
>
> Erik



Re: iSCSI Errors - SLES10 + Infortrend

2008-02-25 Thread Sparqz

Here's some output from tcpdump. It shows a bunch of duplicate ACKs,
which I don't think are normal... unless anyone can tell me otherwise?

192.168.50.8 is the iSCSI target; 192.168.50.11 is the initiator.

11:14:45.391670 IP 192.168.50.8.3260 > 192.168.50.11.58035: P 3143284:3144176(892) ack 193 win 61440
11:14:45.391674 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3144176:3145636(1460) ack 193 win 61440
11:14:45.391676 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3145636 win 32767
11:14:45.391681 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3145636:3147096(1460) ack 193 win 61440
11:14:45.391686 IP 192.168.50.8.3260 > 192.168.50.11.58035: P 3147096:3148320(1224) ack 193 win 61440
11:14:45.391689 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3148320 win 32767
11:14:45.391694 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391699 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391702 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391706 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391710 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391713 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391716 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391719 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391722 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391725 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391763 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391766 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.391769 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 193 win 61440
11:14:45.398114 IP 192.168.50.11.58035 > 192.168.50.8.3260: P 193:241(48) ack 3148320 win 32767
11:14:45.398211 IP 192.168.50.8.3260 > 192.168.50.11.58035: . ack 241 win 61440
11:14:45.398636 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3148320:3149780(1460) ack 241 win 61440
11:14:45.398645 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3149780:3151240(1460) ack 241 win 61440
11:14:45.398649 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3151240 win 32767
11:14:45.398654 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3151240:3152464(1224) ack 241 win 61440
11:14:45.398658 IP 192.168.50.8.3260 > 192.168.50.11.58035: . 3152464:3153924(1460) ack 241 win 61440
11:14:45.398662 IP 192.168.50.11.58035 > 192.168.50.8.3260: . ack 3153924 win 32767
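
You can approximate Wireshark's duplicate-ACK flag from text output like the capture above. This simplified sketch assumes the old one-line-per-packet tcpdump format, not the line-wrapped form of this archive. A segment counts as a duplicate ACK when it carries no data, has no SYN/FIN/RST flag, and repeats the ack number and window of the previous segment sent in the same direction. Wireshark applies further checks, so treat the counts as approximate. The function name is hypothetical.

```python
import re

SEGMENT_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+)"
    r"(?: (?P<start>\d+):(?P<end>\d+)\((?P<length>\d+)\))?"
    r" ack (?P<ack>\d+) win (?P<win>\d+)"
)


def find_duplicate_acks(lines):
    """Yield (line_number, sender, ack) for each apparent duplicate ACK.

    Simplified rule: no payload, no SYN/FIN/RST flag, and the same ack
    number and window as the previous segment sent in that direction.
    """
    previous = {}  # (src, dst) -> (ack, win) of the last segment seen
    for number, line in enumerate(lines, 1):
        m = SEGMENT_RE.search(line)
        if m is None:
            continue
        direction = (m["src"], m["dst"])
        ack_win = (int(m["ack"]), int(m["win"]))
        no_data = m["length"] is None or int(m["length"]) == 0
        control = set(m["flags"]) & set("SFR")
        if no_data and not control and previous.get(direction) == ack_win:
            yield number, m["src"], ack_win[0]
        previous[direction] = ack_win
```

Fed the capture above, one packet per line, this flags all 13 bare "ack 193" packets from 192.168.50.8. Each one repeats the ack 193 / win 61440 of the target's preceding data segments.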





iSCSI Errors - SLES10 + Infortrend

2008-02-25 Thread Sparqz

Hi Erik,

Why have you hijacked my thread?  Please don't... start a new thread/
topic.

This thread is still active as I'm still having issues with my iSCSI
setup.

Thanks

Stuart.

On Feb 25, 12:01 pm, Erik Bussink <[EMAIL PROTECTED]> wrote:
> If I use iSCSI targets on a Linux server A (ietd for example), how would
> I make a replica of the iSCSI LUN on another iSCSI LUN (at block
> level) for safekeeping?
>
> Should I create a software RAID1 with the two iSCSI disks ? or create
> some custom scripts to create a RAID1 and break the RAID1 ?
>
> thanks for any pointers...
> Erik



Re: iSCSI Errors - SLES10 + Infortrend

2008-02-21 Thread Sparqz

Just a quick note - the supplier of the Infortrend gear wants to
replace all 32 1TB Seagate hard drives with 32 1TB Hitachi hard
drives...

?? Sounds like an expensive (and quite possibly pointless) exercise



Re: bnx2i driver for RHEL 4?

2008-02-21 Thread Sparqz

I would be keen to see RHEL4 support for bnx2i. I was under the
impression that the DL585-G2 has two onboard BCM5706 NICs which are
"Multi-purpose", with an iSCSI offload engine?

Do you have to pay a license fee to use HP's bnx2i, or is that only
when used under a Micro$oft operating system?

On Feb 22, 5:23 am, "Anil Veerabhadrappa" <[EMAIL PROTECTED]> wrote:
> bnx2i is supported on the RHEL5 and SUSE10u1 distros.
> We have explored a few options to add RHEL4 support but nothing
> has materialized yet.
>
> -Original Message-
> From: open-iscsi@googlegroups.com on behalf of extraspecialbitter
> Sent: Thu 2/21/2008 7:16 AM
> To: open-iscsi
> Subject: bnx2i driver for RHEL 4?
>
> I recently installed an HP NC373F "Multi-purpose" Gigabit NIC card in
> one of our lab's DL585-G2 servers running RHEL 4.0 (Update 4).  The
> kernel recognized the card on boot, and it was easily associated with
> the existing bnx2 driver.  The intent is to connect to a SAN device
> via iSCSI, something that seems to be provided by HP's bnx2i driver
> for RHEL 5.  My question is this: is there a backport of this driver
> available for RHEL 4?  Thanks in advance...



Re: iSCSI Errors - SLES10 + Infortrend

2008-02-21 Thread Sparqz



On Feb 22, 5:16 am, Mike Christie <[EMAIL PROTECTED]> wrote:
> Sparqz wrote:
> > /var/log/messages from sles10 i586 server:
>
> Was there anything else in the log about a nop-out or iscsi ping timing out?

No, what you see is what you get in this case...

>
> > kernel:  connection2:0: iscsi: detected conn error (1011)
> > iscsid: detected iSCSI connection 2:0 error (1011) state (3)
>
> This indicates that there was a generic connection error. It could be
> caused by a scsi command timing out and abort tasks to try and clean it
> up failing, or a nop-out timing out, or the network layer could have
> detected an error, or the target could have dropped the session/connection.
>
> In your case it looks like a scsi command timed (by default this is 60
> secs (see /sys/block/sdX/device/timeout)) out and this caused the scsi
> error handler to run. That failed, so we went to session level recovery
> and dropped the session.
>
> > iscsid: connect failed (113)
> > iscsid: connect failed (113)
> > iscsid: connect failed (113)
>
> The session/connection was dropped, and we tried to reconnect (we are at
> the stage just trying to do a socket connection). It failed with
> EHOSTUNREACH (No route to host).
>
> > kernel:  session2: iscsi: session recovery timed out after 120 secs
>
> We tried to reconnect for node.session.timeo.replacement_timeout (see
> iscsi.conf for details) seconds, but could not so we failed the scsi
> error handler (this put the devices in the offline state) and that
> started failing commands.
>
> > kernel: sd 2:0:0:0: scsi: Device offlined - not ready after error
> > recovery
>
> So that explains what happened at the iscsi/scsi layers, but I am not
> sure it is helpful about why it happened :( Normally when a scsi
> command times out and we cannot reconnect for 2 minutes, it is a driver
> bug or something bad happened on the target or network. I am not
> sure what is in SLES, so if you could try the userspace tools and kernel
> modules from upstream:
> http://www.open-iscsi.org/bits/open-iscsi-2.0-865.15.tar.gz
> http://www.open-iscsi.org/bits/open-iscsi-2.0-868-test1.tar.gz
> it would help. Before installing the above, make sure you uninstall the
> SLES iscsi tools (do a "whereis iscsid" to make sure the old tools are
> removed before using the new ones), because I am not sure what is in SLES.
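
The two numbers Mike decodes above can also be looked up programmatically. This sketch assumes the "connect failed (N)" log format quoted above and a key = value config file; in open-iscsi packages this is typically /etc/iscsi/iscsid.conf. Errno values are platform specific, and 113 is EHOSTUNREACH on Linux. The helper names are hypothetical.

```python
import errno
import os
import re

CONNECT_FAILED_RE = re.compile(r"connect failed \((\d+)\)")


def explain_connect_failure(log_line: str):
    """Map 'iscsid: connect failed (113)' to (symbolic name, message).

    Returns None if the line is not a connect failure. Errno numbers are
    platform specific; this decodes them for the machine it runs on.
    """
    m = CONNECT_FAILED_RE.search(log_line)
    if m is None:
        return None
    code = int(m.group(1))
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


def read_setting(conf_text: str, key: str):
    """Return the value of a 'key = value' line, ignoring '#' comments."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name == key:
            return value
    return None
```

On Linux, explain_connect_failure("iscsid: connect failed (113)") should give ("EHOSTUNREACH", "No route to host"). Calling read_setting() on the contents of the config file with the key node.session.timeo.replacement_timeout returns how long iscsid keeps trying to reconnect before failing I/O (120 seconds in the log above).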

Here are some version numbers to mull over from SLES10SP1 i586:

xyz:~ # modinfo libiscsi
filename:       /lib/modules/2.6.16.54-0.2.5-bigsmp/kernel/drivers/scsi/libiscsi.ko
license:        GPL
description:    iSCSI library functions
author:         Mike Christie
srcversion:     1B231B8F924EC0C453E2567
depends:        scsi_mod,scsi_transport_iscsi
supported:      yes
vermagic:       2.6.16.54-0.2.5-bigsmp SMP 586 REGPARM gcc-4.1

xyz:~ # modinfo scsi_transport_iscsi
filename:       /lib/modules/2.6.16.54-0.2.5-bigsmp/kernel/drivers/scsi/scsi_transport_iscsi.ko
version:        2.0-754
license:        GPL
description:    iSCSI Transport Interface
author:         Mike Christie <[EMAIL PROTECTED]>, Dmitry Yusupov <[EMAIL PROTECTED]>, Alex Aizman <[EMAIL PROTECTED]>
srcversion:     EFE28A663136BA264DC431B
depends:        scsi_mod
supported:      yes
vermagic:       2.6.16.54-0.2.5-bigsmp SMP 586 REGPARM gcc-4.1

xyz:~ # modinfo iscsi_tcp
filename:       /lib/modules/2.6.16.54-0.2.5-bigsmp/kernel/drivers/scsi/iscsi_tcp.ko
license:        GPL
description:    iSCSI/TCP data-path
author:         Dmitry Yusupov <[EMAIL PROTECTED]>, Alex Aizman <[EMAIL PROTECTED]>
srcversion:     64B40B87D6791C1AD7D6548
depends:        libiscsi,scsi_transport_iscsi,scsi_mod
supported:      yes
vermagic:       2.6.16.54-0.2.5-bigsmp SMP 586 REGPARM gcc-4.1
parm:           max_lun:uint

xyz:~ # rpm -qa | grep scsi
open-iscsi-2.0.707-0.24

xyz:~ # iscsid -v
iscsid version 2.0-754
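
To compare the versions above with the upstream tarballs Mike suggests, here is a small sort-key sketch. It assumes open-iscsi version strings of the forms seen in this thread (2.0-754, 2.0-865.15, the RPM-style 2.0.707, 2.0-868-rc1). It ignores suffixes such as -rc1, so a release candidate sorts equal to its final release. The helper names are hypothetical.

```python
import re

VERSION_RE = re.compile(r"(\d+)\.(\d+)[.-](\d+)(?:\.(\d+))?")


def iscsi_version_key(text: str) -> tuple:
    """Sort key for open-iscsi versions such as '2.0-754', '2.0-865.15'
    or the RPM-style '2.0.707'. Suffixes such as '-rc1' are ignored."""
    m = VERSION_RE.search(text)
    if m is None:
        raise ValueError(f"no open-iscsi version found in {text!r}")
    major, minor, build, patch = m.groups()
    return int(major), int(minor), int(build), int(patch or 0)


def modinfo_field(modinfo_text: str, field: str):
    """Return the value of `field` from modinfo output, or None."""
    for line in modinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == field:
            return value.strip()
    return None
```

With it, iscsi_version_key("2.0-754") < iscsi_version_key("2.0-865.15") is True. This confirms that the SLES10 SP1 kernel modules and tools are older than the upstream releases being suggested.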

These are production servers, so I can't really 'mess' around with
compiling source and loading drivers from outside the distribution's
packages (SLES).  I might see if I can assimilate some server hardware
and run tests on that.  I have people screaming at me for more space;
some people want 10 to 20 TB, and if I can't get open-iscsi to work
for us I'll either have to buy iSCSI HBAs or switch to the more
expensive FC route I-:



Re: iSCSI Errors - SLES10 + Infortrend

2008-02-21 Thread Sparqz

Hi Pasi,

> > My setup:
>
> > 1x HP DL585 - SLES10 x86_64
> > 1x HP DL585 - RHEL4 x86_64
> > 1x HP DL380 - SLES10 i586
>
> SLES10 or SLES10SP1 ?

SLES10SP1

>
> Have you tried installing and using the latest open-iscsi from open-iscsi.org?
>
> > 2x Cisco 2960G (gigabit) switches
>
> > 2x Infortrend A16E-G2130-4 with 16x 1TB disks each
>
> > The two Infortrend arrays have all their gigabit ethernet ports
> > plugged into one of the cisco switches, then we have 2 fibre
> > connections leading to the other cisco switch which has the three
> > servers plugged into it.  The network is completely isolated from our
> > other company networks.
>
> So you have only 2 gbit/sec of bandwidth between the Cisco switches?

That's correct.  I've never seen the two links saturated together, the
most I've seen is ~95% on the first link and ~50% on the second.

>
> How many ethernet ports do your iSCSI arrays have (plugged in to the
> switches)?

Each iSCSI array has four 1Gbit Ethernet ports, and all four ports are
connected on each array.

>
> How many ethernet ports each server is using / plugged in to the switch?

Each server has two 1Gbit ethernet ports - but only one port is used
on each server for iSCSI traffic, the other is for usual LAN traffic.

>
> > At first I thought it was a network problem, so we replaced our dodgy
> > Netgear switches with quality Cisco networking gear, but the problem
> > is the same, if anything it's worse because the Cisco switches
> > facilitate higher bandwidth (extra ~20mb/s) and the errors seem to be
> > more reliably producible.
>
> Do you see packet drops/errors in any of the ports? Check all ports in both
> switches.

No drops and no errors on any of the ports on the servers or on the
switches.  There's no way to tell what is happening on the iSCSI
arrays.

>
> > None of the linux ethernet statistics report any errors (ifconfig) and
> > the cisco switches don't report any packet errors either.  The
> > Infortrend arrays don't provide ethernet statistics.
>
> Check linux TCP statistics for tcp retransmits? netstat -s

Tcp:
    9787 active connections openings
    4964 passive connection openings
    8 failed connection attempts
    885 connection resets received
    33 connections established
    1903902036 segments received
    3106760297 segments send out
    2108006 segments retransmited
    0 bad segments received.
    1298 resets sent

Looks like there are...  any way to just pull the stats for eth1 ?
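
The retransmit count is easier to judge as a fraction of segments sent. This sketch pulls the two counters out of netstat -s output and accepts either spelling of each counter's label. These counters are system-wide totals since boot, so they cover every interface, not just eth1. The function name is hypothetical.

```python
import re


def tcp_retransmit_ratio(netstat_s: str) -> float:
    """Fraction of sent TCP segments that were retransmitted.

    Reads the 'segments send out' and 'segments retransmited' lines from
    `netstat -s` output (either spelling of each is accepted). The counters
    are system-wide totals since boot, not per interface.
    """
    sent = re.search(r"(\d+) segments sen[dt] out", netstat_s)
    retrans = re.search(r"(\d+) segments retransmit+ed", netstat_s)
    if sent is None or retrans is None:
        raise ValueError("TCP segment counters not found")
    return int(retrans.group(1)) / int(sent.group(1))
```

For the figures above (2108006 of 3106760297 segments), this works out to about 0.068% of segments retransmitted.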

>
> > Wireshark (ethereal) shows many errors - clusters of Duplicate ACKs,
> > and a few "previous segment lost".
>
> Are you using ethernet flow control? Check the switch settings, and server
> NIC settings.. and possible iSCSI array settings..

Someone replied outside of the forum, suggesting I turn on flow
control.  It's made things a lot faster, but I still see problems with
packets, and eventually iscsi errors.

>
> In a bigger IP-SAN setup with many servers and switches flow control might be
> needed to get a good performance and to prevent tcp retransmits from
> happening (=preventing the switch port buffers becoming full and packet drop
> happening).
>
> > Any help would be much appreciated !!!
>
> Btw have you tried with ext3? XFS is known to have problems with some setups
> and versions..

ext3 is worse in my experience. Because our partitions are 1, 2 and 5TB
in size, XFS works better for us, especially when a partition has to be
scanned for errors: fsck takes hours on large multi-terabyte arrays,
while xfs_check takes only a few minutes. That said, it could just be
the amount of IO that fsck.ext3 does that causes the iSCSI problems and
delays.

>
> I'm not familiar with Infotrend iSCSI arrays so can't comment much about
> them..

I get that a lot )-;

>
> -- Pasi



iSCSI Errors - SLES10 + Infortrend

2008-02-20 Thread Sparqz

Hi All,

I have a nasty problem with open-iscsi on SLES10 + an Infortrend iSCSI
array.

Basically, everything seems to go wrong as soon as the read/write load
becomes heavy. Network dumps suggest the problem is always there; it
just goes critical when the load is too heavy.

My setup:

1x HP DL585 - SLES10 x86_64
1x HP DL585 - RHEL4 x86_64
1x HP DL380 - SLES10 i586

2x Cisco 2960G (gigabit) switches

2x Infortrend A16E-G2130-4 with 16x 1TB disks each

The two Infortrend arrays have all their gigabit Ethernet ports
plugged into one of the Cisco switches, and two fibre connections lead
to the other Cisco switch, which has the three servers plugged into it.
The network is completely isolated from our other company networks.

At first I thought it was a network problem, so we replaced our dodgy
Netgear switches with quality Cisco networking gear, but the problem
is the same. If anything it's worse, because the Cisco switches
facilitate higher bandwidth (extra ~20mb/s) and the errors seem to be
more reliably reproducible.

None of the Linux Ethernet statistics (ifconfig) report any errors, and
the Cisco switches don't report any packet errors either.  The
Infortrend arrays don't provide Ethernet statistics.

Wireshark (ethereal) shows many errors - clusters of Duplicate ACKs,
and a few "previous segment lost".

dmesg output from the SLES10 x86_64 server:

sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 1945613312
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 3866132584
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 1565827648
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 429620296
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 429619272
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 429618248
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 429617224
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 429616200
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 429615176
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 429614152
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 429613128
Buffer I/O error on device dm-9, logical block 53701593
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701594
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701595
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701596
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701597
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701598
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701599
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701600
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701601
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701602
lost page write due to I/O error on dm-9
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 2972647720
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 2717078440
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 1566942880
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 1023998416
I/O error in filesystem ("dm-9") meta-data dev dm-9 block 0x3d08f850   ("xfs_trans_read_buf") error 5 buf count 8192
sd 11:0:0:4: SCSI error: return code = 0x0002
end_request: I/O error, dev sdi, sector 2048020038
I/O error in filesystem ("dm-9") meta-data dev dm-9 block 0x7a124cc6   ("xlog_iodone") error 5 buf count 1024
xfs_force_shutdown(dm-9,0x2) called from line 960 of file fs/xfs/xfs_log.c.  Return address = 0x882913aa
Filesystem "dm-9": Log I/O Error Detected.  Shutting down filesystem: dm-9
Please umount the filesystem, and rectify the problem(s)
xfs_force_shutdown(dm-9,0x1) called from line 424 of file fs/xfs/xfs_rw.c.  Return address = 0x882a5139
xfs_force_shutdown(dm-9,0x1) called from line 424 of file fs/xfs/xfs_rw.c.  Return address = 0x882a5139
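
To see how the failures above are spread across the LUN, this sketch tallies the end_request lines per device. It converts the reported sector numbers (512-byte block-layer sectors) into byte offsets. It assumes the 2.6.16-era message format shown above, and the function name is hypothetical.

```python
import re

END_REQUEST_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
SECTOR_BYTES = 512  # block-layer sectors are 512 bytes


def summarize_io_errors(dmesg_text: str) -> dict:
    """Per device: number of failed requests and the lowest/highest
    failing sector, converted to a byte offset into the device."""
    sectors = {}
    for dev, sector in END_REQUEST_RE.findall(dmesg_text):
        sectors.setdefault(dev, []).append(int(sector))
    return {
        dev: {
            "errors": len(found),
            "first_offset_bytes": min(found) * SECTOR_BYTES,
            "last_offset_bytes": max(found) * SECTOR_BYTES,
        }
        for dev, found in sectors.items()
    }
```

Run over the dmesg excerpt above, it should report 16 failed requests on sdi. These run from sector 429613128 (about 220 GB into the LUN) to sector 3866132584 (about 1.98 TB), so the errors are not confined to one small region.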

/var/log/messages from sles10 i586 server:

kernel:  connection2:0: iscsi: detected conn error (1011)
iscsid: detected iSCSI connection 2:0 error (1011) state (3)
iscsid: connect failed (113)
iscsid: connect failed (113)
iscsid: connect failed (113)
kernel:  session2: iscsi: session recovery timed out after 120 secs