Re: lsi_scsi: error: Bad Status move errors with kvm-79

2009-01-03 Thread Anssi Kolehmainen
On Thu, Jan 01, 2009 at 05:34:33PM +0200, Anssi Kolehmainen wrote:
> Tried kvm-82 (2.6.28-rc7 kernel, kvm-82 modules) today and the same
> bug still exists. I thought everything was working somewhat fine but
> then Windows decided to shoot with shotgun at its own system files and I
> got to spend the day recovering and then rebuilding the vm :)
> 
> I'll set up another computer and see whether I can reproduce that on another
> system.

Well, this time on a brand new PC with 2.6.28 and kvm-82. Everything
started out working fine and I thought I wouldn't get any errors. First,
the Oracle DB installation resulted in only a few "windows detected
controller error" messages in the event log (but those don't cause any
problems). After that, the Weblogic installation caused Windows to BSOD
(but with no Bad Status move error). After a reboot I tried copying the
installation file to the guest drive and that caused 4 Bad Status moves
and a BSOD...

Tried without kvm modules and everything worked fine (except being
rather slow). Did the same thing with kvm modules loaded and it hit the
error.  Could this be some kind of timing issue?

http://kelvin.aketzu.net/kvm-2sec.log.bz2 contains about the last two
seconds of debugging output (with DEBUG_LSI and DEBUG_LSI_REG). I added
second.microsecond timestamps to keep better track of what is going on.
At the moment of the crash Windows was writing to disk at about 10 MB/s
(copying a file over a 100 Mbps LAN).
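
For anyone who wants to cut a large debug log down to its final moments in
the same way, here is a minimal sketch. It assumes every timestamped line
starts with "seconds.microseconds" followed by a space; the actual layout
of the log above is not shown in this thread, so that format and the
function names are assumptions.

```python
import re

# Assumed line format: "<seconds>.<6-digit microseconds> <message>"
TS_RE = re.compile(r"^(\d+)\.(\d{6})\s")


def parse_ts(line):
    """Return the timestamp in integer microseconds, or None if absent."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return int(m.group(1)) * 1_000_000 + int(m.group(2))


def tail_seconds(lines, seconds=2):
    """Keep the lines that fall within the final `seconds` of the log.

    Lines without a timestamp are kept if they follow the first line
    that is inside the window.
    """
    lines = list(lines)
    stamps = [parse_ts(line) for line in lines]
    known = [t for t in stamps if t is not None]
    if not known:
        return lines  # nothing to cut on, return the log unchanged
    cutoff = max(known) - seconds * 1_000_000
    for i, t in enumerate(stamps):
        if t is not None and t >= cutoff:
            return lines[i:]
    return []
```

Feeding it the decompressed log line by line and writing the result out
gives a file small enough to attach or diff.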

-- 
Anssi Kolehmainen
anssi.kolehmai...@iki.fi
040-5085390
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lsi_scsi: error: Bad Status move errors with kvm-79

2009-01-01 Thread Anssi Kolehmainen
On Mon, Dec 08, 2008 at 10:27:00AM -0600, Ryan Harper wrote:
> > > * Anssi Kolehmainen  [2008-12-04 08:51]:
> > > > I have a KVM environment with a linux-2.6.28-rc7 x86_64 (Xeon), kvm-79
> > > > host and a bunch of Win2K3 guests. Sometimes I get 'lsi_scsi: error: Bad
> > > > Status move' from kvm (qemu) and in the Windows event log "The device,
> > > > \Device\Scsi\sym_hi1, did not respond within the timeout period." These
> > > > errors come somewhat at random, usually at 10-30 second intervals when
> > > > there is enough disk usage in the guest (it seems that installing Bea
> > > > Weblogic or Oracle database is pretty good at causing these errors).
> 
> Using kvm-userspace.git (kvm-80 is equivalent w.r.t. the scsi level) I
> installed win2k3 sp2 with scsi as the main device and a qcow2 backed file.
> Downloaded, unzipped and installed the Weblogic server.  All with no
> issues at all.  I'll give it a run against raw devices, since that seems
> to be what you were using.  Any more details on how to reliably reproduce
> the issue will help me track down the bug.

Tried kvm-82 (2.6.28-rc7 kernel, kvm-82 modules) today and the same
bug still exists. I thought everything was working somewhat fine but
then Windows decided to shoot with shotgun at its own system files and I
got to spend the day recovering and then rebuilding the vm :)

I'll set up another computer and see whether I can reproduce that on another
system.

-- 
Anssi Kolehmainen
anssi.kolehmai...@iki.fi
040-5085390


Re: lsi_scsi: error: Bad Status move errors with kvm-79

2008-12-08 Thread Ryan Harper
* Anssi Kolehmainen <[EMAIL PROTECTED]> [2008-12-04 10:11]:
> On Thu, Dec 04, 2008 at 09:26:35AM -0600, Ryan Harper wrote:
> > * Anssi Kolehmainen <[EMAIL PROTECTED]> [2008-12-04 08:51]:
> > > Hi,
> > > 
> > > I have a KVM environment with a linux-2.6.28-rc7 x86_64 (Xeon), kvm-79
> > > host and a bunch of Win2K3 guests. Sometimes I get 'lsi_scsi: error: Bad
> > > Status move' from kvm (qemu) and in the Windows event log "The device,
> > > \Device\Scsi\sym_hi1, did not respond within the timeout period." These
> > > errors come somewhat at random, usually at 10-30 second intervals when
> > > there is enough disk usage in the guest (it seems that installing Bea
> > > Weblogic or Oracle database is pretty good at causing these errors).
> > > 
> > > Usually Windows is able to recover from these but sometimes (=too often)
> > > I get random delays and hangs. I have also gotten BSOD 0x77 (0x02,
> > > 0x00, 0x00, 0x5f4000) about once a day.
> > > 
> > > Any ideas how to debug / fix this problem?
> > 
> > Current KVM userspace has a bogus line in the scsi code relating to the
> > DBC register, which looks like it is what is tripping up the Bad Status,
> > or could be anyhow.  Try out with this patch applied to your qemu dir:
> > 
> > http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg00043.html
> 
> That line is not in kvm-79... And with that line added Windows doesn't
> even seem to start.

Yeah, I was hoping it was something simple, but you're right, 79 was
released before that bogus line made it into qemu.  It's now been
removed from qemu cvs, and kvm-userspace just sync'ed.

> 
> > You can also enable debugging in qemu/hw/lsi53c895a.c and in
> > qemu/hw/scsi-disk.c   Sending that output here would be helpful if we're
> > still tracking it.
> 
> Nice 160 MB log from just starting Windows and running the installer
> until BSOD. Only 1.7 MB with bzip2 compression:
> http://kelvin.aketzu.net/kvm-qemu.log.bz2

I scanned through that, but didn't see anything that jumped out at me.
Typically, I need a good run and a failing run to tell what's going on.
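
One way to line up a good run against a failing run is to strip the
numbers out of each debug line and list the message shapes that only show
up in the failing log. A minimal sketch, assuming plain-text logs with one
message per line; the function names are hypothetical and this is not
something either run in this thread was actually processed with.

```python
import re

# Hex values and decimal counters vary between runs, so mask them out.
HEX_OR_NUM = re.compile(r"0x[0-9a-fA-F]+|\d+")


def normalize(line):
    """Replace every number with '#' so only the message shape remains."""
    return HEX_OR_NUM.sub("#", line.strip())


def only_in_failing(good_lines, bad_lines):
    """Return the sorted message shapes seen in the failing run only."""
    good = {normalize(line) for line in good_lines}
    bad = {normalize(line) for line in bad_lines}
    return sorted(bad - good)
```

The output is only a starting point: a message that is unique to the
failing run is a hint about where to look, not proof of the cause.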

> > Are there free/downloadable copies of Bea or Oracle that I can use to
> > recreate?
> 
> Yeah, they are both available for free. You can get the Bea (now Oracle)
> Weblogic (application server) from:
> http://www.oracle.com/technology/software/products/ias/htdocs/wls_main.html
> The version we used is "Oracle WebLogic Server 10.0 MP1"
> 
> (I guess any big application which uncompresses to HD / installs lots of
> files might work.)

Using kvm-userspace.git (kvm-80 is equivalent w.r.t. the scsi level) I
installed win2k3 sp2 with scsi as the main device and a qcow2 backed file.
Downloaded, unzipped and installed the Weblogic server.  All with no
issues at all.  I'll give it a run against raw devices, since that seems
to be what you were using.  Any more details on how to reliably reproduce
the issue will help me track down the bug.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
[EMAIL PROTECTED]


Re: lsi_scsi: error: Bad Status move errors with kvm-79

2008-12-04 Thread Anssi Kolehmainen
On Thu, Dec 04, 2008 at 09:26:35AM -0600, Ryan Harper wrote:
> * Anssi Kolehmainen <[EMAIL PROTECTED]> [2008-12-04 08:51]:
> > Hi,
> > 
> > I have a KVM environment with a linux-2.6.28-rc7 x86_64 (Xeon), kvm-79
> > host and a bunch of Win2K3 guests. Sometimes I get 'lsi_scsi: error: Bad
> > Status move' from kvm (qemu) and in the Windows event log "The device,
> > \Device\Scsi\sym_hi1, did not respond within the timeout period." These
> > errors come somewhat at random, usually at 10-30 second intervals when
> > there is enough disk usage in the guest (it seems that installing Bea
> > Weblogic or Oracle database is pretty good at causing these errors).
> > 
> > Usually Windows is able to recover from these but sometimes (=too often)
> > I get random delays and hangs. I have also gotten BSOD 0x77 (0x02,
> > 0x00, 0x00, 0x5f4000) about once a day.
> > 
> > Any ideas how to debug / fix this problem?
> 
> Current KVM userspace has a bogus line in the scsi code relating to the
> DBC register, which looks like it is what is tripping up the Bad Status,
> or could be anyhow.  Try out with this patch applied to your qemu dir:
> 
> http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg00043.html

That line is not in kvm-79... And with that line added Windows doesn't
even seem to start.

> You can also enable debugging in qemu/hw/lsi53c895a.c and in
> qemu/hw/scsi-disk.c   Sending that output here would be helpful if we're
> still tracking it.

Nice 160 MB log from just starting Windows and running the installer
until BSOD. Only 1.7 MB with bzip2 compression:
http://kelvin.aketzu.net/kvm-qemu.log.bz2

(The "error:" line isn't there since it went to stderr. This time
Windows crashed, so the error should be somewhere near the end.)

> If you can recreate with the patch applied or on an older KVM that
> doesn't have that line in there, I'll try to reproduce. 

kvm-79 doesn't seem to have it.

> Are there free/downloadable copies of Bea or Oracle that I can use to
> recreate?

Yeah, they are both available for free. You can get the Bea (now Oracle)
Weblogic (application server) from:
http://www.oracle.com/technology/software/products/ias/htdocs/wls_main.html
The version we used is "Oracle WebLogic Server 10.0 MP1"

(I guess any big application which uncompresses to HD / installs lots of
files might work.)

-- 
Anssi Kolehmainen
[EMAIL PROTECTED]
040-5085390


Re: lsi_scsi: error: Bad Status move errors with kvm-79

2008-12-04 Thread Ryan Harper
* Anssi Kolehmainen <[EMAIL PROTECTED]> [2008-12-04 08:51]:
> Hi,
> 
> I have a KVM environment with a linux-2.6.28-rc7 x86_64 (Xeon), kvm-79
> host and a bunch of Win2K3 guests. Sometimes I get 'lsi_scsi: error: Bad
> Status move' from kvm (qemu) and in the Windows event log "The device,
> \Device\Scsi\sym_hi1, did not respond within the timeout period." These
> errors come somewhat at random, usually at 10-30 second intervals when
> there is enough disk usage in the guest (it seems that installing Bea
> Weblogic or Oracle database is pretty good at causing these errors).
> 
> Usually Windows is able to recover from these but sometimes (=too often)
> I get random delays and hangs. I have also gotten BSOD 0x77 (0x02,
> 0x00, 0x00, 0x5f4000) about once a day.
> 
> Any ideas how to debug / fix this problem?

Current KVM userspace has a bogus line in the scsi code relating to the
DBC register, which looks like it is what is tripping up the Bad Status,
or could be anyhow.  Try out with this patch applied to your qemu dir:

http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg00043.html

You can also try older KVM releases; kvm-76 at least doesn't have that
line present.  That might be easier than applying the patch.

You can also enable debugging in qemu/hw/lsi53c895a.c and in
qemu/hw/scsi-disk.c   Sending that output here would be helpful if we're
still tracking it.

If you can recreate with the patch applied or on an older KVM that
doesn't have that line in there, I'll try to reproduce.  Are there
free/downloadable copies of Bea or Oracle that I can use to recreate?


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
[EMAIL PROTECTED]


lsi_scsi: error: Bad Status move errors with kvm-79

2008-12-04 Thread Anssi Kolehmainen
Hi,

I have a KVM environment with a linux-2.6.28-rc7 x86_64 (Xeon), kvm-79
host and a bunch of Win2K3 guests. Sometimes I get 'lsi_scsi: error: Bad
Status move' from kvm (qemu) and in the Windows event log "The device,
\Device\Scsi\sym_hi1, did not respond within the timeout period." These
errors come somewhat at random, usually at 10-30 second intervals when
there is enough disk usage in the guest (it seems that installing Bea
Weblogic or Oracle database is pretty good at causing these errors).

Usually Windows is able to recover from these but sometimes (=too often)
I get random delays and hangs. I have also gotten BSOD 0x77 (0x02,
0x00, 0x00, 0x5f4000) about once a day.

Any ideas how to debug / fix this problem?


KVM startup command:
/usr/local/bin/qemu-system-x86_64 -name vm1
 -smp 1 -m 1024 -vnc :4 -k fi -serial mon:telnet::10004,server,nowait
 -daemonize -localtime -vga std -usb -usbdevice tablet 
 -net nic,macaddr=00:16:3e:00:00:4,model=e1000 -net tap,ifname=tap-vm1
 -pidfile /var/run/kvm/vm1.pid -boot c 
 -drive index=0,media=disk,if=scsi,boot=on,file=/dev/mapper/vg0-vm
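
To test the same guest against a different backing store (for example a
file instead of the LVM device), only the file= part of the -drive option
has to change. A minimal sketch of a launcher helper, assuming Python is
available on the host; the helper names and any path other than the one in
the command above are hypothetical, and only a subset of the options above
is reproduced.

```python
def drive_option(path, index=0):
    """Build the -drive value from the command above for a given backing path."""
    return f"index={index},media=disk,if=scsi,boot=on,file={path}"


def qemu_argv(name, drive_path, binary="/usr/local/bin/qemu-system-x86_64"):
    """Return an argument list with the core options of the command above."""
    return [
        binary,
        "-name", name,
        "-smp", "1",
        "-m", "1024",
        "-boot", "c",
        "-drive", drive_option(drive_path),
    ]


if __name__ == "__main__":
    # Print the command line instead of running it, so it can be reviewed.
    print(" ".join(qemu_argv("vm1", "/dev/mapper/vg0-vm")))
```

Keeping every other option identical between runs makes it easier to tell
whether the backing store has anything to do with the errors.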

-- 
Anssi Kolehmainen
[EMAIL PROTECTED]
040-5085390