Re: 3ware problems

2001-03-21 Thread rand

Doug Mike, thanks for all your help and the time you've invested in
Doug this! What can we do to assist?

Mike Do you have any coding experience?  I can't reproduce this here,
Mike but what I want to do is see what the command that's stuck on
Mike the busy queue looks like.

Sure, sounds like fun.

Mike If you can add another function like twe_printstate that invokes
Mike twe_print_request on each of the requests on the busy queue and
Mike let me know what they look like, that might give me some clues.

I'll do that today.

Mike (I'd send you diffs, but I'm snowed at work and quite ill just
Mike now 8(...)

Hope you feel better. 

I've never seen that smiley before; is that for projectile vomiting?

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message



Re: 3ware problems

2001-03-21 Thread rand

Mike If you can add another function like twe_printstate that invokes
Mike twe_print_request on each of the requests on the busy queue and
Mike let me know what they look like, that might give me some clues.
 
Doug OK, I haven't written the twe_printstate function yet, but I
Doug think I have the request. I got the filesystem wedged first, and
Doug then browsing the datastructures with DDB, I think I've found
Doug the busy queue. Here's the request:

Mike Cool, this works just as well. 8)

Doug db> call twe_print_request(0xc1529800)
Doug twe0: CMD: request_id 89  opcode READ  size 7  unit 0  host_id 0
Doug twe0:  status 0  flags 0x0  count 16  sgl_offset 3
Doug twe0:  lba 264703
Doug twe0:   0: 0xce4f000/4096
Doug twe0:   1: 0x2ab/4096
Doug twe0:  tr_command 0xc1529800/0x1749d800  tr_data 0xcb928000/0xce4f000,8192
Doug twe0:  tr_status 2  tr_flags 0x1  tr_complete 0xc011f170  tr_private 0

Mike Er.  This is bad; tr_status == 2 means that the command has been
Mike completed; it shouldn't still be on the busy queue.  Can you
Mike check to make sure you have the right queue here?
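
Mike's point can be read as an invariant: anything still linked on the busy queue should have a "busy" status, so an entry there with tr_status == 2 (complete) is suspect. A minimal userland sketch of that check using <sys/queue.h> follows. The struct, the function name, and every status value other than 2 are stand-ins for illustration, not the driver's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Illustrative status codes; only "2 == complete" comes from the thread. */
enum { REQ_SETUP = 0, REQ_BUSY = 1, REQ_COMPLETE = 2 };

/* Hypothetical stand-in for the driver's request structure. */
struct req {
    int              id;
    int              status;
    TAILQ_ENTRY(req) link;
};

TAILQ_HEAD(req_queue, req);

/*
 * Walk a busy queue and count the entries whose status says they have
 * already completed, i.e. entries that should no longer be linked here.
 */
int
count_stale_busy(struct req_queue *busy)
{
    struct req *r;
    int stale = 0;

    assert(busy != NULL);
    TAILQ_FOREACH(r, busy, link) {
        if (r->status == REQ_COMPLETE)
            stale++;
    }
    return stale;
}
```

A nonzero result would point at either the wrong queue or a request that was completed without being unlinked.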

I am not at all positive I've got the right queue. I *think* I do. I'm
trying to break it again now, and I'll use the code below to verify
the queue. I'm also going to hit the kernel core with gdb to see if I
can verify that. 

Doug I'm rebuilding the kernel now with the function twe_printstate,
Doug after I figured it out with the debugger. (This reminds me of a
Doug saying that has to do with horses and carriages, hmm.)

Mike Hrm.  It *should* be pretty easy; I'm sorry I confused you with
Mike the 'printstate' reference; you should be able to fix up
Mike twe_report to just dump the busy queue:

Mike   struct twe_request  *tr;
Mike ...

Mike   TAILQ_FOREACH(tr, TAILQ_FIRST(sc->twe_busy), tr_link)
Mike       twe_print_request(tr);

This doesn't compile for me. Every time I try to use 'sc->twe_busy' I
get a syntax error: invalid type argument of `->'

Here is what I'm using right now:

    s = splbio();
    for (i = 0; (sc = devclass_get_softc(twe_devclass, i)) != NULL; i++) {
        twe_print_controller(sc);
        printf("ready queue: %d entries\n", sc->twe_qstat[TWEQ_READY].q_length);
        TAILQ_FOREACH(tr, &sc->twe_ready, tr_link)
            twe_print_request(tr);
        printf("busy queue: %d entries\n", sc->twe_qstat[TWEQ_BUSY].q_length);
        TAILQ_FOREACH(tr, &sc->twe_busy, tr_link)
            twe_print_request(tr);
        printf("complete queue: %d entries\n", sc->twe_qstat[TWEQ_COMPLETE].q_length);
        TAILQ_FOREACH(tr, &sc->twe_complete, tr_link)
            twe_print_request(tr);
    }
    splx(s);

This compiles, and when I run it it doesn't crash!  :) In fact, it
says all the queues are empty.
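
For anyone following along, the compile problem comes down to what TAILQ_FOREACH expects as its second argument: a pointer to the queue head, not the head structure itself and not the first element. A small userland sketch with <sys/queue.h> shows the shape of a working walk. Here struct softc, struct req, and busy_queue_length are made-up stand-ins, not the twe driver's types.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical request type; tr_link plays the role of the list linkage. */
struct req {
    int              id;
    TAILQ_ENTRY(req) tr_link;
};

/* Hypothetical per-controller state holding one queue head. */
struct softc {
    TAILQ_HEAD(, req) busy;
};

/* Count the entries on the busy queue by walking it. */
int
busy_queue_length(struct softc *sc)
{
    struct req *tr;
    int n = 0;

    assert(sc != NULL);
    /* Pass the address of the head: the macro dereferences it with '->'. */
    TAILQ_FOREACH(tr, &sc->busy, tr_link)
        n++;
    return n;
}
```

Comparing a walked count like this with a separately maintained length counter is one way to cross-check that the queue being inspected is the one the statistics describe.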

Doug Oh, btw, it took over 3 million rows to get it stuck this
Doug time. Gotta love a test cycle of 6 hours or so.  Sigh.

Mike This is obviously a really weird case; possibly either an
Mike extremely narrow race, or some very borderline PCI issue.  One
Mike question I should have asked, but don't recall whether you
Mike answered; are you using an AMD K7 system by any chance?  We've
Mike seen some *very* weird behaviour with these controllers in some
Mike K7 systems.

Yes, it *is* really weird. I can only get it to break with MySQL. At
Mike Tancsa's suggestion, I tried lots of concurrent bonnies, and
also running a buildworld with a high -j value. I let both run for
about 12 hours each, with no failure. The only thing that'll kill it
is MySQL.  I'm confused.  :(

And nope, no K7 here. It's a SuperMicro P6DBU with dual 400MHz CPUs.

Mike Thanks again for your help here.

My pleasure. (Calling what we are doing 'help' is complimentary. If
this is help, what you are doing for us must be close to divine
intervention!  :))




Re: 3ware problems

2001-03-21 Thread rand

Mike Sorry, the above code is totally bogus; I'm kinda delirious
Mike (feverish) right now.

Mike Try

Mike   TAILQ_FOREACH(tr, &sc->twe_busy, tr_link)

Yup, that is what I half figured out, half guessed at. (Helps to have
other code to look through!)


Mike [...] there's a pattern of some sort involved, we just don't
Mike know what it is yet...

I do!

   (outside air temp / inside air temp) * day of the month % line voltage
     + wc -l /etc/motd - df -k /var/db/mysql

Oh, no. That's my IQ.   :)





Re: 3ware problems

2001-03-21 Thread Mike Tancsa

At 05:20 PM 3/21/2001 -0800, Mike Smith wrote:
This is obviously a really weird case; possibly either an extremely
narrow race, or some very borderline PCI issue.  One question I should
have asked, but don't recall whether you answered; are you using an AMD
K7 system by any chance?  We've seen some *very* weird behaviour with
these controllers in some K7 systems.


Just a shot in the dark, but could it be something to do with the hard drives
themselves? Are these not the same units that people have been having
problems with across the board, on various OSes and IDE controllers? E.g.,
see the thread
Subject: Re: 2GB limit on gzip?
on [EMAIL PROTECTED]
Different issue, but same model of drives (4 IBM 75GB DTLA).
I have been using only Quantum IDEs on all my boxes, save for a few 40 gig
Maxtors on part of my news spool.

 ---Mike

Mike Tancsa,  tel +1 519 651 3400
Network Administration,   [EMAIL PROTECTED]
Sentex Communications www.sentex.net
Cambridge, Ontario Canada www.sentex.net/mike





Re: 3ware problems

2001-03-18 Thread Mike Tancsa

At 11:54 AM 3/18/2001 -0600, [EMAIL PROTECTED] wrote:
When we bound the array this last time, we took all the defaults: A
64KB stripe size, disk write cache enabled. It takes a surprisingly
long time to initialize the array. What I did is to boot off the
4.3 floppies (no cdrom in this system) and go to the slice
(a.k.a. DOS partition) editor and write out the slice
information. This 'write' seems to cause the twe0 driver to
initialize the array, and then I go home to bed. When I wake up it
is done, and then I usually just reboot and restart the
installation.



The delay is normal.  When you set up anything other than a RAID0 array, the
card is actually doing work to your drives in the background. Grab the
array manager from
http://people.freebsd.org/~msmith/RAID/3ware/3dm-bsd.1.09.00.002.tar.gz
and it will notify you when it's done. You can also speed up the
initialization part a bit by setting it to a faster rebuild time.

 ---Mike

Mike Tancsa,  tel +1 519 651 3400
Network Administration,   [EMAIL PROTECTED]
Sentex Communications www.sentex.net
Cambridge, Ontario Canada www.sentex.net/mike





Re: 3ware problems

2001-03-18 Thread rand

Doug It takes a surprisingly long time to initialize the array.

Mike The delay is normal.  When you set up anything other than a RAID0
Mike array, the card is actually doing work to your drives in the
Mike background. Grab the array manager from
Mike http://people.freebsd.org/~msmith/RAID/3ware/3dm-bsd.1.09.00.002.tar.gz
Mike and it will notify you when it's done. You can also speed up the
Mike initialization part a bit by setting it to a faster rebuild
Mike time.

We finally did figure that out. The problem in this particular
circumstance with the 3dm utility is that the only controller in the box
is the 3ware 6400. So in order to run 3dm I need to have FreeBSD
installed, and installing FreeBSD at the same time the controller is
initializing the array is really slow.  :)

The first time I did this I thought something was broken when I
watched the newfs output those duplicate super block locations. It was
about 10 seconds between each block! After a search of the FreeBSD
lists I found a reference to initializing the array, and just waited.

On ttyv1 the kernel issues a message when the array initialization is
done, so I usually just wait for that.




Re: 3ware problems

2001-03-18 Thread Mike Smith

 The first time I did this I thought something was broken when I
 watched the newfs output those duplicate super block locations. It was
 about 10 seconds between each block! After a search of the FreeBSD
 lists I found a reference to initializing the array, and just waited.
 
 On ttyv1 the kernel issues a message when the array initialization is
 done, so I usually just wait for that.

Newfs is even more pessimal, due to the alignment fixup the driver has to 
perform.  However, the array is perfectly usable while it's being 
initialised.  Just.  Dog.  Slow. 8)

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E






Re: 3ware problems

2001-03-15 Thread Douglas K. Rand

 Drat.  There it is; you've got a command that looks like it's stuck in
 the adapter.

 try changing the value of TWE_Q_LENGTH in /sys/dev/twe/twereg.h to 100 and
 see if you can reproduce it.

Well, I just woke up and mysqld was stuck again in getblk, this time with a
TWE_Q_LENGTH of 100:

db> call twe_report
twe0: status   57007390CQEMPTY,UCREADY,RQEMPTY,
twe0:           current  max
twe0: free      0099     0100
twe0: ready
twe0: busy      0001     0100
twe0: complete           0011
twe0: bioq               0027
twe0: AEN queue head 1  tail 0
twed: total bio count in 1646323  out 1646322
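
The counters in that report appear to line up with a single stuck command: the bio count in minus the bio count out leaves one request outstanding, which matches the one entry shown on the busy queue. As plain arithmetic (the function name is only for illustration):

```c
#include <assert.h>

/* Requests handed to the driver that have not yet been returned. */
unsigned long
outstanding_bios(unsigned long bio_in, unsigned long bio_out)
{
    /* More completions than submissions would mean the counters are off. */
    assert(bio_in >= bio_out);
    return bio_in - bio_out;
}
```

With the numbers above, outstanding_bios(1646323UL, 1646322UL) yields 1.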







Re: 3ware problems

2001-03-15 Thread rand

** Mike Tancsa [EMAIL PROTECTED] on Thu, 15 Mar 2001 14:33:29 -0500
** in [Re: 3ware problems ] writes:

Mike I tried yesterday to stress the machine with 25 simultaneous
Mike bonnie -s 500 

Mike Although the machine was sluggish, it still worked.  Similarly,
Mike make -j12 buildworld worked. In the past when I saw a similar
Mike bug, I could reproduce it 100% of the time this way.

Earlier today I ran 30 concurrent "bonnie -s 500" and while things
were slow, no problems showed up. Right now I'm on my 7th "make -j16
buildworld" and it's working fine. After this buildworld finishes, I
think I'll start up a shell script to keep 20 concurrent bonnies
running overnight.
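
The "keep N copies running" idea can be sketched as a small fork/wait loop. This is the same idea written in C rather than as the shell script mentioned above, and it is not the script that was actually used. The names spawn and keep_running are hypothetical, and the loop stops after a fixed number of starts rather than running until morning.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start one child running argv; returns its pid, or -1 if fork failed. */
static pid_t
spawn(char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }
    return pid;
}

/*
 * Keep up to `slots` copies of argv running until `total` children have
 * been started, then wait for the remaining ones to finish.
 * Returns the number of children reaped.
 */
int
keep_running(int slots, int total, char *const argv[])
{
    int started = 0, running = 0, reaped = 0;

    assert(slots > 0 && total >= 0);
    while (started < total || running > 0) {
        /* Top up to the desired concurrency. */
        while (running < slots && started < total) {
            if (spawn(argv) < 0) {
                total = started;    /* stop launching, just drain */
                break;
            }
            started++;
            running++;
        }
        /* Block until any child exits; the next pass replaces it. */
        if (wait(NULL) > 0) {
            running--;
            reaped++;
        } else {
            break;                  /* wait failed or no children left */
        }
    }
    return reaped;
}
```

An overnight run along the lines described would pass 20 for slots, a large total, and an argv of { "bonnie", "-s", "500", NULL }.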

(The buildworlds are taking about 70 minutes to complete. The system
 is a dual PIII 400MHz with 384MB of RAM on a SuperMicro P6DBU. Not bad
 times.)

So far the only way I can get the problem to show up is banging on
MySQL for 3-12 hours.




Re: 3ware problems

2001-03-14 Thread Douglas K. Rand

 Drat.  There it is; you've got a command that looks like it's stuck in
 the adapter.

I'll go grab the can of WD-40.   :)

 I didn't see you respond to Mike T - are you using 64k or 128k stripes?

I didn't get his query until I had already started the mysqld trying to break
things. And now I'm at home, and while serial consoles are *really* great for
most things, I can't get at the 3ware BIOS from here. I didn't want to respond
until I had checked the BIOS.

I'm /fairly/ sure that I took the default 64K stripes, but one time in
rebinding the array I did change the stripe size.

 If the latter, try changing the value of TWE_Q_LENGTH in
 /sys/dev/twe/twereg.h to 100 and see if you can reproduce it.

Rebuilding the kernel as I type.

 I am worrying about firmware here at the moment

We are running the latest firmware as of about 10 days ago.

 Thanks for your patience.

Are you kidding?  Thanks for all the help. We really appreciate it.
Anything we can do to help, let us know.


