Re: 3ware problems
Mike> Er. This is bad; tr_status == 2 means that the command has been
Mike> completed; it shouldn't still be on the busy queue. Can you
Mike> check to make sure you have the right queue here?

Well, it looks like I had the wrong queue before. Blush. At least this time tr_status is 1. Not sure if that is good or bad though! :)

Here is the debug output:

db> call twe_printqueues
twe0: status 57007310
twe0: current max
twe0: free 0099 0100
twe0: ready
twe0: busy 0001 0100
twe0: complete 0009
twe0: bioq 0021
twe0: AEN queue head 1 tail 0
ready queue: 0 entries
busy queue: 1 entries
twe0: CMD: request_id 54 opcode size 11 unit 0 host_id 0
twe0: status 0 flags 0x0 count 32 sgl_offset 3
twe0: lba 10466
twe0: 0: 0xffc4000/4096
twe0: 1: 0x11f85000/4096
twe0: 2: 0x12d66000/4096
twe0: 3: 0x10e87000/4096
twe0: tr_command 0xc1520400/0x174f4400 tr_data 0xce0f4000/0xffc4000,16384
twe0: tr_status 1 tr_flags 0x2 tr_complete 0xc011f1b0 tr_private 0xc9260400
complete queue: 0 entries

This was generated with the code:

void
twe_printqueues(void)
{
    struct twe_softc *sc;
    struct twe_request *tr = NULL;
    int i, s;

    /* Block disk interrupts while walking the queues. */
    s = splbio();

    /* Visit every attached twe controller. */
    for (i = 0; (sc = devclass_get_softc(twe_devclass, i)) != NULL; i++) {
        twe_print_controller(sc);

        /* TAILQ_FOREACH takes a pointer to the queue head, hence the '&'. */
        printf("ready queue: %d entries\n", sc->twe_qstat[TWEQ_READY].q_length);
        TAILQ_FOREACH(tr, &sc->twe_ready, tr_link)
            twe_print_request(tr);

        printf("busy queue: %d entries\n", sc->twe_qstat[TWEQ_BUSY].q_length);
        TAILQ_FOREACH(tr, &sc->twe_busy, tr_link)
            twe_print_request(tr);

        printf("complete queue: %d entries\n", sc->twe_qstat[TWEQ_COMPLETE].q_length);
        TAILQ_FOREACH(tr, &sc->twe_complete, tr_link)
            twe_print_request(tr);
    }

    splx(s);
}

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-stable" in the body of the message
Re: 3ware problems
At 05:20 PM 3/21/2001 -0800, Mike Smith wrote:
>This is obviously a really weird case; possibly either an extremely
>narrow race, or some very borderline PCI issue. One question I should
>have asked, but don't recall whether you answered; are you using an AMD
>K7 system by any chance? We've seen some *very* weird behaviour with
>these controllers in some K7 systems.

Just a shot in the dark, but could it be something to do with the hard drives themselves? Are these not the same units that people have been having problems with across the board on various OSes and IDE controllers? E.g. see the thread

Subject: Re: 2GB limit on gzip?

on [EMAIL PROTECTED]

Different issue, but same model of drives (4 IBM 75GB DTLA). I have been using only Quantum IDEs on all my boxes save for a few 40 gig Maxtors on part of my news spool.

        ---Mike

Mike Tancsa, tel +1 519 651 3400
Network Administration, [EMAIL PROTECTED]
Sentex Communications www.sentex.net
Cambridge, Ontario Canada www.sentex.net/mike
Re: 3ware problems
Mike> Sorry, the above code is totally bogus; I'm kinda delirious
Mike> (feverish) right now.
Mike> Try
Mike> TAILQ_FOREACH(tr, &sc->twe_busy, tr_link)

Yup, that is what I half figured out, half guessed at. (Helps to have other code to look through!)

Mike> [...] there's a pattern of some sort involved, we just don't
Mike> know what it is yet...

I do!

(outside air temp / inside air temp) * day of the month % line voltage + wc -l /etc/motd - df -k /var/db/mysql

Oh, no. That's my IQ. :)
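The correction above comes down to how the queue(3) macros are called: `TAILQ_FOREACH` expects a pointer to the list head, so a head embedded by value in a structure is passed with `&`. The following is a minimal user-space sketch of that pattern, not the twe driver's code; `demo_request`, `demo_softc` and the `demo_*` functions are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a driver request; not the real twe_request. */
struct demo_request {
    int id;
    TAILQ_ENTRY(demo_request) link;     /* queue linkage */
};

/* Hypothetical stand-in for a softc: the queue head is embedded by value. */
struct demo_softc {
    TAILQ_HEAD(, demo_request) busy;
};

/* Initialise the embedded queue head. */
static void
demo_init(struct demo_softc *sc)
{
    TAILQ_INIT(&sc->busy);
}

/* Append a request to the busy queue. */
static void
demo_enqueue(struct demo_softc *sc, struct demo_request *r)
{
    assert(r != NULL);
    TAILQ_INSERT_TAIL(&sc->busy, r, link);
}

/* Walk the busy queue and count its entries. */
static int
demo_count_busy(struct demo_softc *sc)
{
    struct demo_request *r;
    int n = 0;

    /* The macro dereferences its head argument, so pass its address. */
    TAILQ_FOREACH(r, &sc->busy, link)
        n++;
    return n;
}
```

Writing `TAILQ_FOREACH(r, sc->busy, link)` here would apply `->` to a structure rather than a pointer, which is the kind of "invalid type argument of `->'" error reported in this thread.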
Re: 3ware problems
Mike> If you can add another function like twe_printstate that invokes
Mike> twe_print_request on each of the requests on the busy queue and
Mike> let me know what they look like, that might give me some clues.

Doug> OK, I haven't written the twe_printstate function yet, but I
Doug> think I have the request. I got the filesystem wedged first, and
Doug> then browsing the datastructures with DDB, I think I've found
Doug> the busy queue. Here's the request:

Mike> Cool, this works just as well. 8)

Doug> db> call twe_print_request(0xc1529800)
Doug> twe0: CMD: request_id 89 opcode size 7 unit 0 host_id 0
Doug> twe0: status 0 flags 0x0 count 16 sgl_offset 3
Doug> twe0: lba 264703
Doug> twe0: 0: 0xce4f000/4096
Doug> twe0: 1: 0x2ab/4096
Doug> twe0: tr_command 0xc1529800/0x1749d800 tr_data 0xcb928000/0xce4f000,8192
Doug> twe0: tr_status 2 tr_flags 0x1 tr_complete 0xc011f170 tr_private 0

Mike> Er. This is bad; tr_status == 2 means that the command has been
Mike> completed; it shouldn't still be on the busy queue. Can you
Mike> check to make sure you have the right queue here?

I am not at all positive I've got the right queue. I *think* I do. I'm trying to break it again now, and I'll use the code below to verify the queue. I'm also going to hit the kernel core with gdb to see if I can verify that.

Doug> I'm rebuilding the kernel now with the function twe_printstate,
Doug> after I figured it out with the debugger. (This reminds me of a
Doug> saying that has to do with horses and carriages, hmm.)

Mike> Hrm. It *should* be pretty easy; I'm sorry I confused you with
Mike> the 'printstate' reference; you should be able to fix up
Mike> twe_report to just dump the busy queue:

Mike> struct twe_request *tr;
Mike> ...
Mike> TAILQ_FOREACH(tr, TAILQ_FIRST(sc->twe_busy), tr_link)
Mike>     twe_print_request(tr);

This doesn't compile for me. Every time I try to use 'sc->twe_busy' I get a syntax error:

invalid type argument of `->'

Here is what I'm using right now:

    s = splbio();
    for (i = 0; (sc = devclass_get_softc(twe_devclass, i)) != NULL; i++) {
        twe_print_controller(sc);

        printf("ready queue: %d entries\n", sc->twe_qstat[TWEQ_READY].q_length);
        TAILQ_FOREACH(tr, &sc->twe_ready, tr_link)
            twe_print_request(tr);

        printf("busy queue: %d entries\n", sc->twe_qstat[TWEQ_BUSY].q_length);
        TAILQ_FOREACH(tr, &sc->twe_busy, tr_link)
            twe_print_request(tr);

        printf("complete queue: %d entries\n", sc->twe_qstat[TWEQ_COMPLETE].q_length);
        TAILQ_FOREACH(tr, &sc->twe_complete, tr_link)
            twe_print_request(tr);
    }
    splx(s);

This compiles, and when I run it it doesn't crash! :) In fact, it says all the queues are empty.

Doug> Oh, btw, it took over 3 million rows to get it stuck this
Doug> time. Gotta love a test cycle of 6 hours or so. Sigh.

Mike> This is obviously a really weird case; possibly either an
Mike> extremely narrow race, or some very borderline PCI issue. One
Mike> question I should have asked, but don't recall whether you
Mike> answered; are you using an AMD K7 system by any chance? We've
Mike> seen some *very* weird behaviour with these controllers in some
Mike> K7 systems.

Yes, it *is* really weird. I can only get it to break with MySQL. Following a suggestion from Mike Tancsa, I tried lots of concurrent bonnies, and also running a buildworld with a high -j value. I let both run for about 12 hours each, with no failure. The only thing that'll kill it is MySQL. I'm confused. :(

Nope. It's a SuperMicro P6DBU, with dual 400MHz CPUs.

Mike> Thanks again for your help here.

My pleasure. (Calling what we are doing 'help' is complimentary. If this is help, what you are doing for us must be close to divine intervention! :))
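The check Mike describes (a request whose tr_status is 2 should no longer sit on the busy queue) can be expressed as a small consistency scan. The sketch below is a user-space illustration under stated assumptions, not driver code: `demo_request`, `demo_queue`, `DEMO_STATUS_COMPLETE` and `demo_count_completed` are hypothetical names, and the only status value taken from the thread is 2 for "completed".

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* From the thread: tr_status == 2 means the command has completed. */
#define DEMO_STATUS_COMPLETE 2

/* Hypothetical stand-in for a driver request. */
struct demo_request {
    int request_id;
    int status;
    TAILQ_ENTRY(demo_request) link;     /* queue linkage */
};

/* Named head type so a queue can be passed around by pointer. */
TAILQ_HEAD(demo_queue, demo_request);

/*
 * Count requests on the busy queue that are already marked complete.
 * A non-zero result means either the wrong queue is being inspected
 * or a request was never moved off the busy queue.
 */
static int
demo_count_completed(struct demo_queue *busy)
{
    struct demo_request *tr;
    int n = 0;

    assert(busy != NULL);
    TAILQ_FOREACH(tr, busy, link) {
        if (tr->status == DEMO_STATUS_COMPLETE)
            n++;
    }
    return n;
}
```

Applied to the two requests printed in this thread, request 89 (tr_status 2) would be counted as suspicious and request 54 (tr_status 1) would not.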
Re: 3ware problems
Doug> Mike, thanks for all your help and the time you've invested in
Doug> this! What can we do to assist?

Mike> Do you have any coding experience? I can't reproduce this here,
Mike> but what I want to do is see what the command that's stuck on
Mike> the busy queue looks like.

Sure, sounds like fun.

Mike> If you can add another function like twe_printstate that invokes
Mike> twe_print_request on each of the requests on the busy queue and
Mike> let me know what they look like, that might give me some clues.

I'll do that today.

Mike> (I'd send you diffs, but I'm snowed at work and quite ill just
Mike> now 8(...)

Hope you feel better. I've never seen that smilie before; is that for projectile vomiting?
Re: 3ware problems
> The first time I did this I thought something was broken when I
> watched the newfs output those duplicate super block locations. It was
> about 10 seconds between each block! After a search of the FreeBSD
> lists I found a reference to initializing the array, and just waited.
>
> On ttyv1 the kernel issues a message when the array initialization is
> done, so I usually just wait for that.

Newfs is even more pessimal, due to the alignment fixup the driver has to perform. However, the array is perfectly usable while it's being initialised. Just. Dog. Slow. 8)

--
... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt]
V I C T O R Y N O T V E N G E A N C E
Re: 3ware problems
Doug> It takes a surprisingly long time to initialize the array.

Mike> The delay is normal. When you set up anything other than a RAID0
Mike> array, the card is actually doing work to your drives in the
Mike> background. Grab the array manager from
Mike> http://people.freebsd.org/~msmith/RAID/3ware/3dm-bsd.1.09.00.002.tar.gz
Mike> and it will notify you when it's done. You can also speed up the
Mike> initialization part a bit by setting it to a faster rebuild
Mike> time.

We finally did figure that out. The problem with the 3dm utility in this particular circumstance is that the only controller in the box is the 3ware 6400. So in order to run 3dm I need to have FreeBSD installed, and installing FreeBSD while the controller is initializing the array is really slow. :)

The first time I did this I thought something was broken when I watched the newfs output those duplicate super block locations. It was about 10 seconds between each block! After a search of the FreeBSD lists I found a reference to initializing the array, and just waited.

On ttyv1 the kernel issues a message when the array initialization is done, so I usually just wait for that.
Re: 3ware problems
At 11:54 AM 3/18/2001 -0600, [EMAIL PROTECTED] wrote:
>When we bound the array this last time, we took all the defaults: A
>64KB stripe size, disk write cache enabled. It takes a surprisingly
>long time to initialize the array. What I did is to boot off the
>4.3 floppies (no cdrom in this system) and go to the slice
>(a.k.a. DOS partition) editor and write out the slice
>information. This 'write' seems to cause the twe0 driver to
>initialize the array, and then I go home to bed. When I wake up it
>is done, and then I usually just reboot and restart the
>installation.

The delay is normal. When you set up anything other than a RAID0 array, the card is actually doing work to your drives in the background. Grab the array manager from

http://people.freebsd.org/~msmith/RAID/3ware/3dm-bsd.1.09.00.002.tar.gz

and it will notify you when it's done. You can also speed up the initialization part a bit by setting it to a faster rebuild time.

        ---Mike

Mike Tancsa, tel +1 519 651 3400
Network Administration, [EMAIL PROTECTED]
Sentex Communications www.sentex.net
Cambridge, Ontario Canada www.sentex.net/mike
Re: 3ware problems
** Mike Tancsa <[EMAIL PROTECTED]> on Thu, 15 Mar 2001 14:33:29 -0500
** in [Re: 3ware problems ] writes:

Mike> I tried yesterday to stress the machine with 25 simultaneous
Mike> bonnie -s 500 &
Mike> Although the machine was sluggish, it still worked. Similarly,
Mike> make -j12 buildworld worked. In the past when I saw a similar
Mike> bug, I could reproduce it 100% of the time this way.

Earlier today I ran 30 concurrent "bonnie -s 500" and while things were slow, no problems showed up. Right now I'm on my 7th "make -j16 buildworld" and it's working fine. After this buildworld finishes, I think I'll start up a shell script to keep 20 concurrent bonnies running overnight.

(The buildworlds are taking about 70 minutes to complete. The system is a dual PIII 400MHz with 384MB of RAM on a SuperMicro P6DBU. Not bad times.)

So far the only way I can get the problem to show up is banging on MySQL for 3-12 hours.
Re: 3ware problems
> Drat. There it is; you've got a command that looks like it's stuck in
> the adapter.
>
> try changing the value of TWE_Q_LENGTH in /sys/dev/twe/twereg.h to 100 and
> see if you can reproduce it.

Well, I just woke up and mysqld was stuck again in getblk, this time with a TWE_Q_LENGTH of 100:

db> call twe_report
twe0: status 57007390
twe0: current max
twe0: free 0099 0100
twe0: ready
twe0: busy 0001 0100
twe0: complete 0011
twe0: bioq 0027
twe0: AEN queue head 1 tail 0
twed: total bio count in 1646323 out 1646322
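One way to read the last line of that report is as simple arithmetic: if "in" counts bios handed to the driver and "out" counts bios it has finished (an assumption about what the counters mean, not something stated in the thread), then their difference is the number still outstanding. The helper below is a hypothetical sketch of that subtraction and is not part of the twe driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bios submitted but not yet completed,
 * assuming "in" counts submissions and "out" counts completions.
 */
static uint64_t
demo_outstanding_bios(uint64_t bio_in, uint64_t bio_out)
{
    /* Completions can never exceed submissions under this reading. */
    assert(bio_in >= bio_out);
    return bio_in - bio_out;
}
```

With the values from the report, `demo_outstanding_bios(1646323, 1646322)` yields 1, which lines up with the single entry shown in the busy column (0001).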
Re: 3ware problems
> Drat. There it is; you've got a command that looks like it's stuck in
> the adapter.

I'll go grab the can of WD-40. :)

> I didn't see you respond to Mike T - are you using 64k or 128k stripes?

I didn't get his query until I had already started mysqld trying to break things. And now I'm at home, and while serial consoles are *really* great for most things, I can't get at the 3ware BIOS from here. I didn't want to respond until I had checked the BIOS. I'm /fairly/ sure that I took the default 64K stripes, but one time in rebinding the array I did change the stripe size.

> If the latter, try changing the value of TWE_Q_LENGTH in
> /sys/dev/twe/twereg.h to 100 and see if you can reproduce it.

Rebuilding the kernel as I type.

> I am worrying about firmware here at the moment

We are running the latest firmware as of about 10 days ago.

> Thanks for your patience.

Are you kidding? Thanks for all the help. We really appreciate it. Anything we can do to help, let us know.