Dave Watkins wrote:
This _may_ have been a wild goose chase from the beginning.

Are you able to reproduce on a 64-bit system with different hardware?


R.

The OF box I was using had a temporary 3ware RAID card (PCI-X) in it until the new 9650SE (PCI-E) cards became available. Those cards have now arrived, and it seems there are PCI-E issues on the Tyan board we are using. When both PCI-E slots on the board are populated, much worse things happen, such as lockups at POST or in the BIOS, depending on which slots the cards are physically in. I'm beginning to suspect we were seeing the same issue when stressing the PCI-E NIC, just not to the same degree, since PCI-E traffic was much lower at the time. Of note, these lockups don't occur unless both cards are installed. The new 3ware card or the Intel NIC on its own operates fine, apart from the error that started this thread, so I don't think the new 3ware card is causing the problem.

The only hole in this theory is that we still saw the error when I was testing with the on-board NICs, and both of those are connected to a PCI-X bus. In any case, Tyan are now aware of the issue, and I'm hoping to see a BIOS release that solves it in the near future.

________________________________

From: Rafiu Fakunle [mailto:[EMAIL PROTECTED]
Sent: Tue 11/28/2006 10:49 AM
To: Dave Watkins
Cc: [email protected]
Subject: Re: [OF-users] iSCSI bug?



Dave Watkins wrote:
When you say "local box", do you mean the Openfiler box
Yes please.
or the Windows
box? I have run local tests on the Windows box without issue but haven't
tried the Openfiler box. I will try disabling jumbo frames; I have already
tried without NAPI and flow control, with no success.

-----Original Message-----
From: Rafiu Fakunle [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 28 November 2006 10:27 a.m.
To: Dave Watkins
Cc: [email protected]
Subject: Re: [OF-users] iSCSI bug?

Dave Watkins wrote:
I'll start with a better description and then get on to working with the
new packages.

I'm running Iometer on the Windows Server 2003 x64 box against an iSCSI
volume using the MS iSCSI initiator (2.02).

Using Iometer with any of the 0% read, 0% random access specifications
returns the expected performance numbers, but stopping that test and
changing the access specification to any 100% read, 0% random test
generates the errors. Larger block sizes _seem_ to make it happen more
frequently, so I have created a 256k block size test for the above access
specifications. I have been using 16 and 64 outstanding I/Os, but anything
above zero seems to show the error.

Networking on both ends is via Intel e1000 cards, with jumbo frames and
flow control enabled. NAPI is also enabled on the Openfiler box. All other
network settings are default.
Try without all the tweaking and then enable them one by one.

Also have you successfully run the benchmarks on the local box without
going through iSCSI?


R.

Dave

-----Original Message-----
From: Rafiu Fakunle [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 28 November 2006 1:30 a.m.
To: Dave Watkins
Cc: [email protected]
Subject: Re: [OF-users] iSCSI bug?

Dave Watkins wrote:
Still the same with bonding both enabled and disabled, unfortunately.
First:

Try adding "nosoftlockup" to the GRUB boot options and then run the
benchmarks again.

Next:

http://www.openfiler.com/download/PACKAGES/iscsi_trgt-kernel-r78.ccs
http://www.openfiler.com/download/PACKAGES/iscsi_trgt-r78.ccs

(kernel and userland)

Install them the same way as before (with --replace-files).

Finally:

A bit more detail about your test setup (components, parameters,
triggers, etc.) would also be great.

R.
-----Original Message-----
From: Rafiu Fakunle [mailto:[EMAIL PROTECTED]
Sent: Monday, 27 November 2006 2:52 p.m.
To: Rafiu Fakunle
Cc: Dave Watkins; [email protected]
Subject: Re: [OF-users] iSCSI bug?

Rafiu Fakunle wrote:
Dave Watkins wrote:
OK, UP is fine. To be sure it wasn't the e1000 driver, I also tried using
only the Broadcom NICs. Under UP there is no error; under SMP the error
recurs even with e1000 not loaded and no bonding.

Hope this helps
Immensely. I'm just doing up a changeset for you.
http://www.openfiler.com/download/PACKAGES/iscsi_trgt-kernel-0.4.14.ccs
conary update iscsi_trgt-kernel-0.4.14.ccs --replace-files

Then test again with 2.6.17.14-0.3.smp.x86_64 (with and without bonding).
Thx,

R.
R.

Dave

-----Original Message-----
From: Rafiu Fakunle [mailto:[EMAIL PROTECTED]
Sent: Monday, 27 November 2006 1:15 p.m.
To: Dave Watkins
Cc: [email protected]
Subject: Re: [OF-users] iSCSI bug?

OK, and UP without trunking?

R.

Dave Watkins wrote:

With or without trunking, I seem to get the same problem.

Without trunking I got:
BUG: soft lockup detected on CPU#0!

Call Trace: <IRQ> <ffffffff8029f73c>{softlockup_tick+210}
       <ffffffff80289151>{update_process_times+66}
<ffffffff802713fe>{smp_local_timer_interrupt+35}
       <ffffffff80271463>{smp_apic_timer_interrupt+65}
<ffffffff8025f54c>{apic_timer_interrupt+132} <EOI>
       <ffffffff80224b87>{tcp_sendmsg+0}
<ffffffff80413bba>{inet_ioctl+0}
       <ffffffff88141216>{:iscsi_trgt:is_data_available+62}
       <ffffffff881419e7>{:iscsi_trgt:istd+1460}
<ffffffff80403ea6>{tcp_sendpage+0}
       <ffffffff8027fef6>{__wake_up_common+67}
<ffffffff8029131c>{keventd_create_kthread+0}
       <ffffffff88141433>{:iscsi_trgt:istd+0}
<ffffffff8029131c>{keventd_create_kthread+0}
       <ffffffff80231a7d>{kthread+200}
<ffffffff8025f8a2>{child_rip+8}
       <ffffffff8029131c>{keventd_create_kthread+0}
<ffffffff8027308f>{flat_send_IPI_mask+0}
       <ffffffff8027308f>{flat_send_IPI_mask+0}
<ffffffff8027308f>{flat_send_IPI_mask+0}
       <ffffffff802319b5>{kthread+0}
<ffffffff8025f89a>{child_rip+0}
BUG: soft lockup detected on CPU#0!

Call Trace: <IRQ> <ffffffff8029f73c>{softlockup_tick+210}
       <ffffffff80289151>{update_process_times+66}
<ffffffff802713fe>{smp_local_timer_interrupt+35}
       <ffffffff80271463>{smp_apic_timer_interrupt+65}
<ffffffff8025f54c>{apic_timer_interrupt+132} <EOI>
       <ffffffff80224b87>{tcp_sendmsg+0}
<ffffffff881411c0>{:iscsi_trgt:nthread_wakeup+35}
       <ffffffff881411b3>{:iscsi_trgt:nthread_wakeup+22}
<ffffffff8814219a>{:iscsi_trgt:istd+3431}
       <ffffffff80403ea6>{tcp_sendpage+0}
<ffffffff8027fef6>{__wake_up_common+67}
       <ffffffff8029131c>{keventd_create_kthread+0}
<ffffffff88141433>{:iscsi_trgt:istd+0}
       <ffffffff8029131c>{keventd_create_kthread+0}
<ffffffff80231a7d>{kthread+200}
       <ffffffff8025f8a2>{child_rip+8}
<ffffffff8029131c>{keventd_create_kthread+0}
       <ffffffff8027308f>{flat_send_IPI_mask+0}
<ffffffff8027308f>{flat_send_IPI_mask+0}
       <ffffffff8027308f>{flat_send_IPI_mask+0}
<ffffffff802319b5>{kthread+0}
       <ffffffff8025f89a>{child_rip+0}

After re-enabling trunking, I get:
BUG: soft lockup detected on CPU#0!

Call Trace: <IRQ> <ffffffff8029f73c>{softlockup_tick+210}
       <ffffffff80289151>{update_process_times+66}
<ffffffff802713fe>{smp_local_timer_interrupt+35}
       <ffffffff80271463>{smp_apic_timer_interrupt+65}
<ffffffff8025f54c>{apic_timer_interrupt+132} <EOI>
       <ffffffff80254356>{tcp_ioctl+0}
<ffffffff8020af50>{__might_sleep+30}
       <ffffffff802326d7>{lock_sock+28}
<ffffffff80263257>{_spin_lock_bh+9}
       <ffffffff8022fd23>{release_sock+15}
<ffffffff802543a2>{tcp_ioctl+76}
       <ffffffff80413c44>{inet_ioctl+138}
<ffffffff88141216>{:iscsi_trgt:is_data_available+62}
       <ffffffff8814125a>{:iscsi_trgt:do_recv+41}
<ffffffff8023081f>{qdisc_restart+24}
       <ffffffff8022eaa6>{dev_queue_xmit+510}
<ffffffff8807c266>{:bonding:bond_dev_queue_xmit+489}
       <ffffffff8023277e>{lock_sock+195}
<ffffffff8807fd96>{:bonding:bond_xmit_roundrobin+154}
       <ffffffff80232136>{__tcp_push_pending_frames+1367}
<ffffffff8022fd23>{release_sock+15}
       <ffffffff80225551>{tcp_sendmsg+2506}
<ffffffff80236f84>{do_sock_write+199}
       <ffffffff803dbac1>{sock_writev+220}
<ffffffff8025db21>{cache_alloc_refill+237}
       <ffffffff80220d80>{tcp_transmit_skb+1579}
<ffffffff80408067>{tcp_retransmit_skb+1352}
       <ffffffff80254356>{tcp_ioctl+0}
<ffffffff8024f5a4>{finish_wait+52}
       <ffffffff803e0d10>{sk_stream_wait_memory+458}
<ffffffff80291608>{autoremove_wake_function+0}
       <ffffffff80291608>{autoremove_wake_function+0}
<ffffffff8022fd23>{release_sock+15}
       <ffffffff80246a25>{try_to_wake_up+955}
<ffffffff88141609>{:iscsi_trgt:istd+470}
       <ffffffff80403ea6>{tcp_sendpage+0}
<ffffffff8027fef6>{__wake_up_common+67}
       <ffffffff8029131c>{keventd_create_kthread+0}
<ffffffff88141433>{:iscsi_trgt:istd+0}
       <ffffffff8029131c>{keventd_create_kthread+0}
<ffffffff80231a7d>{kthread+200}
       <ffffffff8025f8a2>{child_rip+8}
<ffffffff8029131c>{keventd_create_kthread+0}
       <ffffffff8027308f>{flat_send_IPI_mask+0}
<ffffffff8027308f>{flat_send_IPI_mask+0}
       <ffffffff8027308f>{flat_send_IPI_mask+0}
<ffffffff802319b5>{kthread+0}
       <ffffffff8025f89a>{child_rip+0}

Without trunking, though, write performance after this doesn't seem to be
affected (still about 80-90MB/sec rather than down at less than 10MB/sec).

-----Original Message-----
From: Rafiu Fakunle [mailto:[EMAIL PROTECTED]
Sent: Monday, 27 November 2006 12:27 p.m.
To: Dave Watkins
Cc: [email protected]
Subject: Re: [OF-users] iSCSI bug?

Dave Watkins wrote:
Sorry about that. I remembered as soon as I sent it that I hadn't
included the version. It's x86_64 version 2.2 (I did a conary updateall
from 2.1 beta). `uname -r` gives 2.6.17.14-0.3.smp.x86_64.

I'll try with a UP kernel, although it will take some time because I have
to rebuild the e1000 module from the UP kernel sources.
Try without the network trunking anyway in the meantime. Would be an
interesting test.

R.


I'll let you know if I can reproduce on the UP kernel.

I don't think it's related to that ticket, as those reports all involve
writes and they only see the problem on large files.

Dave

-----Original Message-----
From: Rafiu Fakunle [mailto:[EMAIL PROTECTED]
Sent: Monday, 27 November 2006 11:40 a.m.
To: Dave Watkins
Cc: [email protected]
Subject: Re: [OF-users] iSCSI bug?

Hi Dave,

Excellent test and bug report.

I wonder whether it may be related to this:

https://project.openfiler.com/tracker/ticket/435

Can you try to reproduce with a UP kernel pls.

Also I need the output of `uname -r`

Thx,

R.

FTR: this is running r58 from IET svn


Dave Watkins wrote:
Hi All

I think I've found a bug in the iSCSI target software during my
benchmarking/testing.

Some background on the hardware first, in case it's related:
Dual-core/dual Opteron with 2GB of RAM
3ware 8006 2-port RAID card for OS drives
3ware 9550SX card for data drives
Dual gigabit Broadcom on-board NICs teamed into bond0 (management)
Quad-port Intel PCI-E gigabit NIC with all 4 ports teamed into bond1 (main iSCSI data network)
4 x 250GB WD SATA HDDs in RAID5

Of note here is that I have had to replace the e1000 driver with the
latest from Intel to support the quad-port card.

I have made some volumes, mounted them on various Windows servers, and
have been using iobench to tune the performance of the system. When using
a read-only test pattern I see this:

BUG: soft lockup detected on CPU#0!

Call Trace: <IRQ> <ffffffff8029f73c>{softlockup_tick+210}
       <ffffffff80289151>{update_process_times+66}
<ffffffff802713fe>{smp_local_timer_interrupt+35}
       <ffffffff80271463>{smp_apic_timer_interrupt+65}
<ffffffff8025f54c>{apic_timer_interrupt+132} <EOI>
       <ffffffff88141486>{:iscsi_trgt:istd+83}
<ffffffff88141476>{:iscsi_trgt:istd+67}
       <ffffffff80403ea6>{tcp_sendpage+0}
<ffffffff8027fef6>{__wake_up_common+67}
       <ffffffff8029131c>{keventd_create_kthread+0}
<ffffffff88141433>{:iscsi_trgt:istd+0}
       <ffffffff8029131c>{keventd_create_kthread+0}
<ffffffff80231a7d>{kthread+200}
       <ffffffff8025f8a2>{child_rip+8}
<ffffffff8029131c>{keventd_create_kthread+0}
       <ffffffff802319b5>{kthread+0}
<ffffffff8025f89a>{child_rip+0}
BUG: soft lockup detected on CPU#0!

Call Trace: <IRQ> <ffffffff8029f73c>{softlockup_tick+210}
       <ffffffff80289151>{update_process_times+66}
<ffffffff802713fe>{smp_local_timer_interrupt+35}
       <ffffffff80271463>{smp_apic_timer_interrupt+65}
<ffffffff8025f54c>{apic_timer_interrupt+132} <EOI>
       <ffffffff802631ec>{_spin_unlock_irqrestore+8}
<ffffffff80246a25>{try_to_wake_up+955}
       <ffffffff881411cc>{:iscsi_trgt:nthread_wakeup+47}
<ffffffff8814219a>{:iscsi_trgt:istd+3431}
       <ffffffff80403ea6>{tcp_sendpage+0}
<ffffffff8027fef6>{__wake_up_common+67}
       <ffffffff8029131c>{keventd_create_kthread+0}
<ffffffff88141433>{:iscsi_trgt:istd+0}
       <ffffffff8029131c>{keventd_create_kthread+0}
<ffffffff80231a7d>{kthread+200}
       <ffffffff8025f8a2>{child_rip+8}
<ffffffff8029131c>{keventd_create_kthread+0}
       <ffffffff802319b5>{kthread+0}
<ffffffff8025f89a>{child_rip+0}

With write-only patterns this doesn't come up. After this happens, the
performance of the system dives (from about 110MB/sec of iSCSI throughput
to about 10MB/sec).

This is fairly reproducible here, so if you need any more information,
just ask.

Dave


------------------------------------------------------------------------
_______________________________________________
Openfiler-users mailing list
[email protected]
https://lists.openfiler.com/mailman/listinfo/openfiler-users