Re: Vinum deprecation for FreeBSD 14 - are there any remaining Vinum users?
On Thu, Apr 01, 2021 at 11:20:44AM +0300, Lev Serebryakov wrote:
| On 01.04.2021 2:39, Doug Ambrisko wrote:
|
| > | > I can only state that I use it only occasionally, and that when I do, I
| > | > have had no problems with it. I'm glad that it's there when I need it.
| > |
| > | Thanks for the reply. Can you comment on your use cases - in
| > | particular, did you use mirror, stripe, or raid5? If the first two
| > | then gmirror, gconcat, gstripe, and/or graid are suitable
| > | replacements.
| > |
| > | I'm not looking to deprecate it just because it's old, but because of
| > | a mismatch between user and developer expectations about its
| > | stability.
| >
| > It would be nice if graid got full support for RAID5 at least. I'm not sure
| > how much the others that are not fully supported (RAID4, RAID5, RAID5E,
| > RAID5EE, RAID5R, RAID6, RAIDMDF), according to the man page, are used.
| > I started to hack in full RAID5 support and try to avoid writes
| > if members didn't change. This limits our VROC support.
|
| My experience, as co-author and maintainer of `sysutil/graid5`,
| shows that it is a very non-trivial task. It contains many subtle
| problems.
|
| `graid5` still has some undiscovered problems, and I don't think it is
| worth fixing in 2021, when we have had ZFS for many years.

The only advantages I see in graid supporting RAID5 would be better
support for VROC, and that people like RAID5. I don't like RAID5 for
SSDs since it adds to write amplification, but people like it. RAID5
had terrible write performance in Linux with concurrent I/O. I wanted
to see if FreeBSD could do better.

Intel seems to be pushing VMD: we recently had a FreeBSD user who needed
newer VMD support because they couldn't turn it off in the BIOS. VMware
doesn't support VROC. We support it a bit, in that VMD allows graid to
access the drives and deal with the Intel metadata. It doesn't read the
info from the EFI runtime. So RAID 0, 1 and 10 should work. It would be
nice if someone could install FreeBSD on a working Linux config. No one
has asked for it, so it doesn't seem very important.

Thanks,

Doug A.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Vinum deprecation for FreeBSD 14 - are there any remaining Vinum users?
On Fri, Mar 26, 2021 at 10:22:53AM -0400, Ed Maste wrote:
| On Thu, 25 Mar 2021 at 15:09, Chris wrote:
| >
| > I can only state that I use it only occasionally, and that when I do, I
| > have had no problems with it. I'm glad that it's there when I need it.
|
| Thanks for the reply. Can you comment on your use cases - in
| particular, did you use mirror, stripe, or raid5? If the first two
| then gmirror, gconcat, gstripe, and/or graid are suitable
| replacements.
|
| I'm not looking to deprecate it just because it's old, but because of
| a mismatch between user and developer expectations about its
| stability.

It would be nice if graid got full support for RAID5 at least. I'm not
sure how much the other levels that are not fully supported (RAID4,
RAID5, RAID5E, RAID5EE, RAID5R, RAID6, RAIDMDF), according to the man
page, are used. I started to hack in full RAID5 support and to try to
avoid writes if members didn't change. The missing RAID5 support limits
our VROC support.

Thanks,

Doug A.
Re: Cisco 12G SAS RAID support (FreeBSD 12.1-RELEASE) ?
On Tue, Nov 05, 2019 at 09:44:36PM +0100, Miroslav Lachman wrote:
| Chris Ross wrote on 11/05/2019 21:19:
| > On Tue, Nov 05, 2019 at 08:20:15PM +0100, Miroslav Lachman wrote:
| >> Chris Ross wrote on 11/05/2019 19:34:
| >>> Hello. I have a Cisco UCS C220-M5 with a RAID controller. It calls itself
| >>> "Cisco 12G Modular Raid Controller with 2GB cache", PPID UCSC-RAID-M5.
| >>> Looking at the CIMC, it shows the PCI vendor/device ids 1000:0014, which
| >>> looks to be an LSI MegaRAID Tri-Mode SAS3516. It looks like this should
| >>> be supported by the mpr(4) driver, but it doesn't seem to recognize it
| >>> at boot time.
| >>
| >> Do you have mpr_load="YES" in loader.conf?
| >> Or for ISO booting you can manually load kernel modules at boot prompt.
| >
| > I dropped to boot prompt in ISO boot, and entered 'mpr_load="YES"'.
| >
| > I tried "load", but wasn't able to divine how to load the mpr module with
| > that. Is that needed, or should 'mpr_load="YES"' have accomplished the
| > desired result?
|
| mpr_load="YES" goes to /boot/loader.conf
|
| If you need to load mpr manually in boot prompt I am not sure if it
| should be:
|   load mpr
| or
|   load mpr.ko
| or full path
|   load /boot/kernel/mpr.ko

This should be a mrsas card and not an HBA! mrsas supports all current
UCS RAID cards ... and the next unreleased UCS system :-) You might
need the one in -current for that. I'm not sure what is in 12.1.

Doug A.
Re: Hangs with mrsas?
On Tue, Mar 22, 2016 at 04:09:48PM -0400, Garrett Wollman wrote:
| < said:
|
| > You could try:
| >   https://people.freebsd.org/~ambrisko/mrsas.patch
|
| I take it that the important part of this patch is changing the DMA
| tag and scatter/gather setup to allow 64-bit addresses? (Why would
| the original driver have been limited to 32-bit addresses? It's quite
| new hardware!)

Yes, primarily ... there are some other things, such as letting the OS
set things up, especially in the ioctl path, since user-land probably
won't set up a proper SG list for the kernel. The DMA address space for
the card was limited to 256K in 32-bit address space, so it didn't take
much to fragment it, and then things could fail or have to wait to get
memory. On initial boot things worked "okay", but after some run time
with our appliance (we run 64-bit) memory allocations would have issues.

We found this was made worse with RAID cards that didn't have cache. I
assume that without cache I/O operations take longer and so tie up
memory longer. With the same SW running on cards with cache we didn't
see these issues, so I assume the operations completed fast enough not
to hold onto memory for very long. With these changes our appliances
without RAID cache run faster and don't run into "strange" issues now.
We run in RAID 10 mode. The patch also adds RAID card event messages to
dmesg. On the plus side, this code exposed a VM bug in 9.2 for us!

There is still a bug: with a card without cache, if I send lots of
management commands quickly to reconfigure the RAID, the driver reports
that the firmware had an OCR issue and never recovers. If I put a
sleep 1 after each command then it is okay. I need to try this again
and dump the term log to see if the firmware will give me a clue. With
the cards that we are currently using, the RAID cache is an option, so
the only thing I'm changing is the HW and not the firmware. However,
the firmware seems to flip itself into a different device when I add or
remove cache.

Thanks,

Doug A.
Re: Installer on serial-console-only-embedded system
On Mon, Aug 12, 2013 at 01:53:15PM +, Teske, Devin wrote:
| sysinstall had the ability to allow you to muck with /etc/ttys before
| rebooting to your installed OS.
|
| This functionality is coming back slowly.
|
| In 9.2-R you will be able to (somehow) bow out of the installation process
| after it's complete (e.g., "Ctrl-C" ??) and then run bsdconfig -- invoking
| the "TTYs" module, giving you a chance to change the settings before you
| reboot from your newly installed system.
|
| Tighter Integration will follow in the years to come... but replacing a
| tool that had a 15-year run which did _all_ of this stuff, is/was not an
| overnight project. Rather, it's a journey!

I had also made changes to sysinstall so that, if it detected a boot
with -h, it made the /etc/ttys etc. changes automatically on the
installed system. It would be good to see this come back. I'm not sure
if Robert's official changes did that. It's fairly easy to check what
the console device is and then do the right thing.

Thanks,

Doug A.
Re: mfi panic on recursed on non-recursive mutex MFI I/O lock
On Fri, Nov 09, 2012 at 05:06:03PM -, Steven Hartland wrote:
|
| - Original Message -
| From: "Steven Hartland"
| ...
| >I've just had another panic, trace below, but it doesn't seem to be related
| >to my changes so I'd appreciate your feedback on them as they are for now.
| >
| >While the lock patch fixes the problems I've seen, it's not clear to me
| >why mfi_tbolt_reset is acquiring the lock and hence requiring
| >mfi_process_fw_state_chg_isr to jump through hoops to ensure locking
| >around queue manipulation is done correctly. Given what it's doing
| >(resetting the entire adapter) I wouldn't be surprised if it should
| >really be acquiring the config lock.
| >
| >Other things I've noticed / questions
| >* Should mfi_abort sleep even if its call to mfi_mapcmd fails?
| >* Should mfi_get_controller_info really ignore the error from mfi_mapcmd?
| >* Do these controllers not support non-512-byte requests? Currently
| >all syspd requests are done assuming 512 byte sectors which the disk may
| >not be. This will either reduce performance or potentially break totally
| >if the firmware isn't translating it under the surface correctly.
| >
| >Anyway the new panic manually transcribed is:-
| >panic: Bad link elm 0xff0069b0fc0 next->prev != elm
| >...
| >mfi_tbolt_get_cmd()
| >mfi_build_mpt_pass_thru()
| >mfi_tbolt_build_mpt_cmd()
| >mfi_tbolt_send_frame()
| >bus_dmamap_load()
| >mfi_mapcmd()
| >mfi_startio()
| >mfi_syspd_strategy()
| >g_disk_start()
| >g_io_schedule_down()
| >g_down_proc_body()
| >fork_exit()
| >fork_trampoline()
| >
| >Looks like mfi_cmd_tbolt_tqh has become corrupt somehow, but as far as I
| >can tell all manip is done using the TAILQ macros and under mfi_io_lock
| >so it's not obvious to me at this time why this is, any ideas?
|
| I've gone through looking for the possible cause of this and while there's
| nothing directly connected to the manip of this queue I've found and fixed
| quite a large number of additional problems which may have been indirectly
| causing this problem.
|
| The biggest change is to use mfi_max_cmds to limit the value stored in
| sc->mfi_max_fw_cmds as this is used extensively throughout the driver
| for allocation and range checks so having this inconsistently set opened up
| a large number of possible overrun errors.
|
| The new patch attached documents all the changes in detail.
|
| I've managed to do one test run so far which failed to reproduce any panics,
| so definitely moving in the right direction :)
|
| The machine has now been collected for repair by the supplier but I'm going
| to try and get them to put it online for more testing over the weekend.
|
| Given the failure rate so far if I can do another 4 runs with no panics I'd
| be happy that the majority of error conditions are working as expected.

Sounds like you have made some good progress. I looked at your prior
locking changes and they look good. I haven't had time to go through
the queue changes yet.

Thanks,

Doug A.
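[Editor's illustration] The point above about clamping the value stored
in sc->mfi_max_fw_cmds can be sketched in plain userland C. This is a
minimal sketch under stated assumptions, not the patch from the thread:
clamp_max_cmds() and cmd_index_valid() are hypothetical helper names,
and the numbers in the usage note are made up. The idea is that the
array allocation and every later range check use the same clamped
value, so they cannot disagree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the command count the driver will actually use,
 * i.e. the smaller of what the firmware reports and what the tunable allows.
 */
static uint32_t
clamp_max_cmds(uint32_t fw_reported, uint32_t tunable_max)
{
	assert(tunable_max > 0);	/* a zero limit would make every index invalid */
	return (fw_reported < tunable_max ? fw_reported : tunable_max);
}

/*
 * Hypothetical range check: valid only when the index is below the same
 * clamped value that sized the command array.
 */
static int
cmd_index_valid(uint32_t index, uint32_t max_fw_cmds)
{
	return (index < max_fw_cmds);
}
```

For example, with a firmware that reports 1008 commands and a tunable
of 128, the clamped value is 128, index 127 is accepted and index 128
is rejected.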
Re: mfi panic on recursed on non-recursive mutex MFI I/O lock
On Tue, Nov 06, 2012 at 12:09:42AM -, Steven Hartland wrote:
| Thanks Doug, actually just finished another test run with some more
| debugging in and I believe I've found the reason for the non-recursive
| lock and at least some of the queuing issues.
|
| The non-recursive lock is due to the mfi_tbolt_reset calling
| mfi_process_fw_state_chg_isr with mfi_io_lock held which in turn calls
| mfi_tbolt_init_MFI_queue which tries to acquire mfi_io_lock hence
| the problem.
|
| mfi-lock.txt attached I believe fixes this as well as what appears
| to be an invalid call to mtx_unlock(&sc->mfi_io_lock) in mfi_attach
| which never acquires the lock as far as I can see, possibly a cut and
| paste error.

I don't seem to see the attachment.

| The invalid queue problems seem to stem from the error cases of
| the calls to mfi_mapcmd, some of which call mfi_release_command which
| blindly sets cm_flags = 0 and then enqueues it on the free queue. Now
| depending on the flow of mfi_mapcmd and where the error occurs the
| command may or may not have been put on the busy queue which is going
| to cause problems.
|
| Going to investigate this further but that's what my current theory is.
|
| Your patch seems quite extensive, so if you could give me a brief run
| down on the changes that would be most appreciated.

I'll be doing that in the commit message, which should happen today.

| FYI, I'm aware that the cause of my underlying issues are some
| hardware issues (likely cable or backplane related) but it does mean
| I'm in the position to test these usually rare error cases, so wanting
| to make the most of it before we get the hardware swapped out.

That would be good. It is easier to debug things when the problem
actually shows up.

Thanks,

Doug A.
Re: Problem with IPMI KCS driver
On Thu, Oct 18, 2012 at 01:44:59PM +0400, Anton Yuzhaninov wrote:
| On 28.09.2012 16:48, John Baldwin wrote:
| >>> kcs_wait_for_obf() at kcs_wait_for_obf+0xb6 point to
| >>> /usr/src/sys/dev/ipmi/ipmi_kcs.c:94
| >>>
| >>> 91      while (ticks - start < MAX_TIMEOUT &&
| >>> 92          !(status & KCS_STATUS_OBF)) {
| >>> 93              DELAY(100);
| >>> 94              status = INB(sc, KCS_CTL_STS);
| >>> 95      }
| >Hummm. I'm a bit out of ideas then. Even the volatile change is a bug that
| >could have been confirmed (to see if volatile was preventing the compiler
| >from caching the value of 'ticks') by examining the assembly.
| >
| >Well, maybe this. This just avoids using 'ticks' altogether and depends on
| >DELAY(100) doing what it says:
|
| New patch also doesn't solve my problem.
|
| My guess was wrong. Loop in kcs_wait_for_obf() is not endless, at least
| with last patch.
| Whole function called in some loop, but because loop in kcs_wait_for_obf()
| takes much CPU time, backtrace always point to loop kcs_wait_for_obf().

Yep, the IPMI local interfaces are polled, so they use a lot of CPU:
once a command is submitted, the driver is pretty much always going to
be checking "are you done yet". We have local patches here that change
the DELAY into a tsleep when the system is running. That has the
drawback of making it a lot slower, but it uses far less CPU, so for us
it is a good trade-off.

One reason to put it into a loop is so things happen in order and are
not interrupted. I guess a different approach might be to take a "big"
lock around the entire submit-and-get-response code fragment. Then the
time would be charged to the application thread running in the kernel.

We also have local changes to allow it to run in polled mode without
the kernel thread when we are dumping a kernel backtrace into the IPMI
system event log. That's nice when the kernel core dump hasn't worked
on a remote machine but we can see the backtrace in the SEL.

| This problem need further investigation.

It might be good to instrument the code in ipmi.c where it sends a
command and then gets the status. If that actually looks okay then
maybe some application is doing something bad.

Doug A.
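[Editor's illustration] The quoted suggestion to avoid 'ticks' and rely
on the per-iteration delay amounts to bounding the poll loop by an
iteration count. The following is a userland sketch of that idea, not
the ipmi_kcs.c code: struct fake_kcs, fake_read_status() and
wait_for_obf() are hypothetical stand-ins, and the simulated device
simply reports "output buffer full" after a fixed number of status
reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KCS_STATUS_OBF	0x01	/* "output buffer full" bit, as named in the quoted loop */

/* Hypothetical stand-in for the BMC: OBF is set from the ready_after-th read on. */
struct fake_kcs {
	unsigned reads;
	unsigned ready_after;
};

static uint8_t
fake_read_status(struct fake_kcs *k)
{
	k->reads++;
	return (k->reads >= k->ready_after ? KCS_STATUS_OBF : 0);
}

/*
 * Poll the status register at most max_polls times. With a fixed delay per
 * iteration the total wait is bounded by max_polls * delay, without reading
 * any global clock. Returns true when OBF was seen, false on timeout.
 */
static bool
wait_for_obf(struct fake_kcs *k, unsigned max_polls)
{
	assert(max_polls > 0);
	for (unsigned i = 0; i < max_polls; i++) {
		if (fake_read_status(k) & KCS_STATUS_OBF)
			return (true);
		/* A real driver would DELAY(100) or sleep here. */
	}
	return (false);
}
```

Swapping the busy delay for a sleep at the marked spot is the trade-off
described in the reply: each command takes longer, but the CPU is free
for other work between polls.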
Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9
On Fri, Jul 27, 2012 at 10:51:43PM +0300, Andriy Gapon wrote:
| on 27/07/2012 17:33 Andrew Boyer said the following:
| >
| > On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:
| >
| >> For the time being I had to revert the following from my stable/9 tree.
| >> Otherwise I would get a kernel panic on shutdown from ipmi(4).
| >>
| >> http://svnweb.freebsd.org/base?view=revision&revision=237839
| >> http://svnweb.freebsd.org/base?view=revision&revision=221121
| >
| > On a somewhat related note: We noticed recently that you can't pet or disable
| > the IPMI hardware watchdog once SCHEDULER_STOPPED() is true. This means it
| > can fire unexpectedly while you're dumping core or rebooting, depending on
| > how long the timeout was on the pet before the panic. The ipmi driver will
| > need to process the command differently if the scheduler is stopped. I
| > haven't had time to look at a fix yet.
|
| Yeah, I noticed that unlike most (all?) other watchdog drivers where watchdog
| re-arming is a very basic operation like doing one I/O the IPMI watchdog does
| some more complex stuff which involves waiting on another thread. I think that
| this may be a little bit too much for a reliable watchdog driver. At least, as
| you note, this definitely won't work for the panic case where only one thread is
| left running. I guess that the driver should check for that case and do a
| direct operation instead of enqueueing a request and waiting for another thread
| to execute it.

I have some local hacks that allow KCS mode to run in a polled mode.
We do that so we can put kernel backtraces into the system event log.
Julian had code in FreeBSD to "pat" a watchdog during a core dump. We
have local code here to disable console muting when dropping into the
kernel debugger and to re-enable it on exit. It might be useful to tie
this into the watchdog: disable it when in the kernel debugger and
resume it on exit.

With my polling hack, I don't think I dealt with the case where there
was already a transaction in progress. SMIC could be done like KCS.
SSIF could be harder since it uses the i2c interface to talk to the HW,
which is more complicated.

Thanks,

Doug A.
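[Editor's illustration] The suggestion above, that the driver should do
a direct operation when only one thread is left running, is a dispatch
decision. The following is a toy userland sketch, not ipmi(4) code:
struct wd_sim and wd_pat() are hypothetical, and the two counters only
record which path a watchdog "pat" would take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two ways a watchdog request can be issued. */
struct wd_sim {
	bool scheduler_stopped;	/* true after a panic, when no worker thread can run */
	unsigned queued;	/* requests handed to the worker thread */
	unsigned polled;	/* requests driven to completion by the caller itself */
};

/*
 * Pat the watchdog: queue the request normally, but fall back to a direct,
 * polled transaction when the scheduler is stopped, because nothing would
 * ever pick the request off the queue.
 */
static void
wd_pat(struct wd_sim *wd)
{
	assert(wd != NULL);
	if (wd->scheduler_stopped)
		wd->polled++;
	else
		wd->queued++;
}
```

As noted in the reply, a real polled path would also have to cope with
a transaction that was already in progress when the panic happened;
this sketch does not model that.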
Re: LSI MegaRAID SAS 9240 with mfi driver?
Jan Mikkelsen writes:
| On 31/03/2012, at 1:14 AM, Doug Ambrisko wrote:
| > John Baldwin writes:
| > | On Friday, March 30, 2012 12:06:40 am Jan Mikkelsen wrote:
| > | > Hi,
| > | >
| ...
| > | >
| > | > I have a loan LSI MegaRAID SAS 9240-4i controller for testing.
| > | > The pciconf -lv output is:
| > | >
| > | > none3@pci0:1:0:0: class=0x010400 card=0x92411000 chip=0x00731000 rev=0x03 hdr=0x00
| > | >     vendor   = 'LSI Logic / Symbios Logic'
| > | >     device   = 'MegaRAID SAS 9240'
| > | >     class    = mass storage
| > | >     subclass = RAID
| > | >
| > | > I added this line to src/sys/dev/mfi/mfi_pci.c
| > | >
| > | > {0x1000, 0x0073, 0x, 0x, MFI_FLAGS_GEN2, "LSI MegaRAID SAS 9240"},
| > | >
| > | > It gave this result (tried with hw.mfi.msi set to 0 and to 1):
| > | >
| > | > mfi0: port 0xdc00-0xdcff mem 0xfe7bc000-0xfe7b,0xfe7c-0xfe7f irq 16 at device 0.0 on pci1
| > | > mfi0: Using MSI
| > | > mfi0: Megaraid SAS driver Ver 3.00
| > | > mfi0: Frame 0xff8000285000 timed out command 0x26C8040
| > | > mfi0: failed to send init command
| > | >
| > | > The firmware is package 20.10.1-0077, which is the latest on the LSI website.
| > | >
| > | > Is this path likely to work out? Any suggestions on where to go from here?
| > |
| > | You should try the updated mfi(4) driver that Doug (cc'd) is going to soon
| > | merge into HEAD. It syncs up with the mfi(4) driver on LSI's website which
| > | supports several cards that the current mfi(4) driver does not. (I'm not
| > | fully sure if the 9240 is in that group or not. Doug might know however.)
| >
| > Yes, this card is supported with the mfi(4) in projects/head_mfi. Looks
| > like we fixed a couple of last minute found bugs when trying to create a
| > RAID with mfiutil. This should be fixed now. I'm going to start the
| > merge to -current today. The version in head_mfi can run on older
| > versions of FreeBSD with the changes that Sean did.
|
| I have just imported the mfi(4) and mfiutil(8) into a 9.0-RELEASE tree to
| try this out.
|
| When booting up with two fresh drives attached, they show up as usable
| JBOD disks. However, I cannot use mfiutil to create anything with them.
| Every drive gives
|
|    "mfiutil: Drive n not available"

You might want to include the output of:
  mfiutil show drives
and then the command you are using to try to create the RAID.

| Is this expected behaviour? How can I create a raid1 volume using
| mfiutil and clean disks?

I'm not sure if mfiutil can switch disks from JBOD mode to RAID. I
don't see any reason why it shouldn't. It can't go from RAID to real
JBOD mode since it doesn't have code to support that.

| I tried using MegaCli from the LSI website (versions 8.02.16 and
| 8.02.21), but they can't even detect the controller. I know you
| said at some point that a very recent version of MegaCli was
| required. What version is necessary?

What was the syntax you used, since the usage is cryptic? I've never
seen a MegaCli that couldn't access the card. What I meant by a more
recent MegaCli is that earlier versions didn't have the JBOD commands.
I have an 8.00.46 that knows about JBOD.

| dmesg:
|
| mfi0: port 0xdc00-0xdcff mem 0xfe7bc000-0xfe7b,0xfe7c-0xfe7f irq 16 at device 0.0 on pci1
| mfi0: Using MSI
| mfi0: Megaraid SAS driver Ver 4.23
| mfi0: 7021 (387925223s/0x0020/info) - Shutdown command received from host
| mfi0: 7022 (boot + 4s/0x0020/info) - Firmware initialization started (PCI ID 0073/1000/9241/1000)
| mfi0: 7023 (boot + 4s/0x0020/info) - Firmware version 2.120.244-1482
| mfi0: 7024 (boot + 5s/0x0020/info) - Package version 20.10.1-0077
| mfi0: 7025 (boot + 5s/0x0020/info) - Board Revision 03A
| mfi0: 7026 (boot + 33s/0x0002/info) - Inserted: PD 32(e0xff/s1)
| mfisyspd0: on mfi0
| mfisyspd0: 1907729MB (3907029168 sectors) SYSPD volume
| mfisyspd0: SYSPD volume attached
| mfisyspd1: on mfi0
| mfisyspd1: 1907729MB (3907029168 sectors) SYSPD volume
| mfisyspd1: SYSPD volume attached

You are definitely in real JBOD mode, with the drives showing up as
/dev/mfisyspd0 and /dev/mfisyspd1. So you can access the drives through
those device nodes for some experiments if you want to.

Doug A.
Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
Alexander Motin writes:
| On 04/06/12 20:12, Doug Ambrisko wrote:
| > Alexander Motin writes:
| > | On 04/04/12 21:47, John Baldwin wrote:
| > |> On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
| > |>> John Baldwin writes:
| > |>> | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
| > |>> |> John Baldwin writes:
| > |>> |> | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| > |>> |> |> Doug Ambrisko writes:
| > |>> |> |> | John Baldwin writes:
| > |>> |> |> | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| > |>> |> |> | |> Sean Bruno writes:
| > |>> |> |> | |> | Noting a failure to attach to the onboard IPMI controller with this dell
| > |>> |> |> | |> | R815. Not sure what to start poking at and thought I'd throw this over
| > |>> |> |> | |> | here for comment.
| > |>> |> |> | |> |
| > |>> |> |> | |> | -bash-4.2$ dmesg |grep ipmi
| > |>> |> |> | |> | ipmi0: KCS mode found at io 0xca8 on acpi
| > |>> |> |> | |> | ipmi1: on isa0
| > |>> |> |> | |> | device_attach: ipmi1 attach returned 16
| > |>> |> |> | |> | ipmi1: on isa0
| > |>> |> |> | |> | device_attach: ipmi1 attach returned 16
| > |>> |> |> | |> | ipmi0: Timed out waiting for GET_DEVICE_ID
| > |>> |> |> | |>
| > |>> |> |> | |> I've run into this recently. A quick hack to fix it is:
| > |>> |> |> | |>
| > |>> |> |> | |> Index: ipmi.c
| > |>> |> |> | |>
| > [snip]
| > |>> | If you use "-ct" then you get a file you can feed into schedgraph.
| > |>> | However, just reading the log, it seems that IRQ 20 keeps preempting
| > |>> | the KCS worker thread preventing it from getting anything done. Also,
| > |>> | there seem to be a lot of threads on CPU 0's runqueue waiting for a
| > |>> | chance to run (load average of 12 or 13 the entire time). You can try
| > |>> | just bumping up the max timeout from 3 seconds to higher perhaps. Not
| > |>> | sure why IRQ 20 keeps firing though. It might be related to USB, so
| > |>> | you could try fiddling with USB options in the BIOS perhaps, or disabling
| > |>> | the USB drivers to see if that fixes IPMI.
| > |>>
| > |>> Tried without USB in kernel:
| > |>>   http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
| > |>
| > |> Hmm, it's still just running constantly (note that the idle thread is
| > |> _never_ scheduled). The lion's share of the time seems to be spent in
| > |> "xpt_thrd". Note that there are several places where nothing happens except
| > |> that "xpt_thrd" runs constantly (spinning) during 10's of statclock ticks. I
| > |> would maybe start debugging that to see what in the world it is doing. Maybe
| > |> it is polling some hardware down in xpt_action() (i.e., xpt_action() for a
| > |> single bus called down into a driver and it is just spinning using polling
| > |> instead of sleeping and waiting for an interrupt).
| > |
| > | "xpt_thrd" is a bus scanner thread. It is scheduled by CAM for every bus
| > | on attach and by controller driver on hot-plug events. For some
| > | controllers it may be quite CPU-hungry. For example, for legacy ATA
| > | controllers, where bus reset may take many seconds of hardware polling,
| > | while devices just spinning up. For ahci(4) it was improved about a year
| > | ago to not use polling when possible, but it still may loop for some
| > | time if controller is not responding on reset. What mfi(4), mentioned in
| > | log, does during scanning, I am not sure.
| >
| > I thought that mfi(4) could be an issue. There are some ata controllers
| > with nothing attached. I built a GENERIC with USB and mfi commented out
| > and then the timeout issue went away:
| >   ipmi0: KCS mode found at io 0xca8 on acpi
| >   ipmi1: on isa0
| >   device_attach: ipmi1 attach returned 16
| >   ipmi1: on isa0
| >   device_attach: ipmi1 attach returned 16
| >   ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
| >   ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
| >   ipmi0: DEBUG ipmi_complete_reques
Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
Alexander Motin writes:
| On 04/04/12 21:47, John Baldwin wrote:
| > On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
| >> John Baldwin writes:
| >> | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
| >> |> John Baldwin writes:
| >> |> | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| >> |> |> Doug Ambrisko writes:
| >> |> |> | John Baldwin writes:
| >> |> |> | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| >> |> |> | |> Sean Bruno writes:
| >> |> |> | |> | Noting a failure to attach to the onboard IPMI controller with this dell
| >> |> |> | |> | R815. Not sure what to start poking at and thought I'd throw this over
| >> |> |> | |> | here for comment.
| >> |> |> | |> |
| >> |> |> | |> | -bash-4.2$ dmesg |grep ipmi
| >> |> |> | |> | ipmi0: KCS mode found at io 0xca8 on acpi
| >> |> |> | |> | ipmi1: on isa0
| >> |> |> | |> | device_attach: ipmi1 attach returned 16
| >> |> |> | |> | ipmi1: on isa0
| >> |> |> | |> | device_attach: ipmi1 attach returned 16
| >> |> |> | |> | ipmi0: Timed out waiting for GET_DEVICE_ID
| >> |> |> | |>
| >> |> |> | |> I've run into this recently. A quick hack to fix it is:
| >> |> |> | |>
| >> |> |> | |> Index: ipmi.c
| >> |> |> | |> [snip]
| >> | If you use "-ct" then you get a file you can feed into schedgraph.
| >> | However, just reading the log, it seems that IRQ 20 keeps preempting
| >> | the KCS worker thread preventing it from getting anything done. Also,
| >> | there seem to be a lot of threads on CPU 0's runqueue waiting for a
| >> | chance to run (load average of 12 or 13 the entire time). You can try
| >> | just bumping up the max timeout from 3 seconds to higher perhaps. Not
| >> | sure why IRQ 20 keeps firing though. It might be related to USB, so
| >> | you could try fiddling with USB options in the BIOS perhaps, or disabling
| >> | the USB drivers to see if that fixes IPMI.
| >>
| >> Tried without USB in kernel:
| >>   http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
| >
| > Hmm, it's still just running constantly (note that the idle thread is
| > _never_ scheduled). The lion's share of the time seems to be spent in
| > "xpt_thrd". Note that there are several places where nothing happens except
| > that "xpt_thrd" runs constantly (spinning) during 10's of statclock ticks. I
| > would maybe start debugging that to see what in the world it is doing. Maybe
| > it is polling some hardware down in xpt_action() (i.e., xpt_action() for a
| > single bus called down into a driver and it is just spinning using polling
| > instead of sleeping and waiting for an interrupt).
|
| "xpt_thrd" is a bus scanner thread. It is scheduled by CAM for every bus
| on attach and by controller driver on hot-plug events. For some
| controllers it may be quite CPU-hungry. For example, for legacy ATA
| controllers, where bus reset may take many seconds of hardware polling,
| while devices just spinning up. For ahci(4) it was improved about a year
| ago to not use polling when possible, but it still may loop for some
| time if controller is not responding on reset. What mfi(4), mentioned in
| log, does during scanning, I am not sure.

I thought that mfi(4) could be an issue. There are some ata controllers
with nothing attached. I built a GENERIC with USB and mfi commented out
and then the timeout issue went away:
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1: on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1: on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 2272
  ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 2332
  ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0

Without mfi but with USB, it still had issues:
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1: on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1: on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
  ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
  ipm
Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
John Baldwin writes:
| On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
| > John Baldwin writes:
| > | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| > | > Doug Ambrisko writes:
| > | > | John Baldwin writes:
| > | > | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| > | > | | > Sean Bruno writes:
| > | > | | > | Noting a failure to attach to the onboard IPMI controller with this dell
| > | > | | > | R815. Not sure what to start poking at and thought I'd throw this over
| > | > | | > | here for comment.
| > | > | | > |
| > | > | | > | -bash-4.2$ dmesg |grep ipmi
| > | > | | > | ipmi0: KCS mode found at io 0xca8 on acpi
| > | > | | > | ipmi1: on isa0
| > | > | | > | device_attach: ipmi1 attach returned 16
| > | > | | > | ipmi1: on isa0
| > | > | | > | device_attach: ipmi1 attach returned 16
| > | > | | > | ipmi0: Timed out waiting for GET_DEVICE_ID
| > | > | | >
| > | > | | > I've run into this recently. A quick hack to fix it is:
| > | > | | >
| > | > | | > Index: ipmi.c
| > | > | | > ===
| > | > | | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
| > | > | | > retrieving revision 1.14
| > | > | | > diff -u -p -r1.14 ipmi.c
| > | > | | > --- ipmi.c	14 Apr 2011 07:14:22 -	1.14
| > | > | | > +++ ipmi.c	31 Mar 2012 19:18:35 -
| > | > | | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
| > | > | | >  	if (error == EWOULDBLOCK) {
| > | > | | >  		device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
| > | > | | >  		ipmi_free_request(req);
| > | > | | > -		return;
| > | > | | >  	} else if (error) {
| > | > | | >  		device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
| > | > | | >  		ipmi_free_request(req);
| > | > | | >
| > | > | | > The issue is that the wakeup doesn't actually wake up the msleep
| > | > | | > in ipmi_submit_driver_request. The error being reported is that
| > | > | | > the msleep timed out. This doesn't seem to be critical problem
| > | > | | > since after this things seemed to work. I saw this on 9.X.
| > | > | | > Haven't seen it on 8.2. Not sure about -current.
| > | > | | >
| > | > | | > It doesn't happen on all machines.
| > | > | |
| > | > | | Hmm, are you seeing the KCS thread manage the request but the wakeup() is
| > | > | | lost?
| > | > |
| > | > | It was a couple of weeks ago that I played with it. I put printf's
| > | > | around the msleep and wakeup. I saw the wakeup called but the sleep
| > | > | not get it. I can try the test again later today. Right now my main
| > | > | work machine is recovering from a power outage. This was with 9.0
| > | > | when I first saw it. This issue seems to only happen at boot time.
| > | > | If I kldload the module after the system is booted then it seems to work
| > | > | okay. The KCS part was working fine and got the data okay from the
| > | > | request. I haven't seen or heard any issues with 8.2.
| > | >
| > | > With -current I patched ipmi.c with:
| > | > Index: ipmi.c
| > | > ===
| > | > --- ipmi.c (revision 233806)
| > | > +++ ipmi.c (working copy)
| > | > @@ -523,7 +523,11 @@
| > | >  	 * waiter that we awaken.
| > | >  	 */
| > | >  	if (req->ir_owner == NULL)
| > | > +{
| > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup %d\n",__FUNCTION__,__LINE__,ticks);
| > | >  		wakeup(req);
| > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup %d\n",__FUNCTION__,__LINE__,ticks);
| > | > +}
| > | >  	else {
| > | >  		dev = req->ir_owner;
| > | >  		TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, ir_link);
| > | > @@ -543,7 +547,11 @@
| > | >  	IPMI_LOCK(sc);
| > | >  	error = sc->ipmi_enqueue_request(sc, req);
| > | >  	if (error == 0)
| > | > +{
| > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep %d\n",__FUNCTION__,__LINE__,ticks);
| > | >  		error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo);
| > | > +device_printf(s
Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
John Baldwin writes:
| On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| > [snip]
Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
Doug Ambrisko writes:
| John Baldwin writes:
| | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| | > Sean Bruno writes:
| | > | Noting a failure to attach to the onboard IPMI controller with this dell
| | > | R815. Not sure what to start poking at and thought I'd throw this over
| | > | here for comment.
| | > |
| | > | -bash-4.2$ dmesg |grep ipmi
| | > | ipmi0: KCS mode found at io 0xca8 on acpi
| | > | ipmi1: on isa0
| | > | device_attach: ipmi1 attach returned 16
| | > | ipmi1: on isa0
| | > | device_attach: ipmi1 attach returned 16
| | > | ipmi0: Timed out waiting for GET_DEVICE_ID
| | >
| | > I've run into this recently. A quick hack to fix it is:
| | >
| | > Index: ipmi.c
| | > ===
| | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
| | > retrieving revision 1.14
| | > diff -u -p -r1.14 ipmi.c
| | > --- ipmi.c 14 Apr 2011 07:14:22 - 1.14
| | > +++ ipmi.c 31 Mar 2012 19:18:35 -
| | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
| | >      if (error == EWOULDBLOCK) {
| | >          device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
| | >          ipmi_free_request(req);
| | > -        return;
| | >      } else if (error) {
| | >          device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
| | >          ipmi_free_request(req);
| | >
| | > The issue is that the wakeup doesn't actually wake up the msleep
| | > in ipmi_submit_driver_request. The error being reported is that
| | > the msleep timed out. This doesn't seem to be a critical problem,
| | > since after this things seemed to work. I saw this on 9.X.
| | > Haven't seen it on 8.2. Not sure about -current.
| | >
| | > It doesn't happen on all machines.
| |
| | Hmm, are you seeing the KCS thread manage the request but the wakeup() is
| | lost?
|
| It was a couple of weeks ago that I played with it. I put printf's
| around the msleep and wakeup. I saw the wakeup get called, but the sleep
| did not see it. I can try the test again later today. Right now my main
| work machine is recovering from a power outage. This was with 9.0
| when I first saw it. This issue seems to only happen at boot time.
| If I kldload the module after the system is booted then it seems to work
| okay. The KCS part was working fine and got the data okay from the
| request. I haven't seen or heard of any issues with 8.2.

With -current I patched ipmi.c with:

Index: ipmi.c
===
--- ipmi.c (revision 233806)
+++ ipmi.c (working copy)
@@ -523,7 +523,11 @@
      * waiter that we awaken.
      */
     if (req->ir_owner == NULL)
+{
+device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup %d\n",__FUNCTION__,__LINE__,ticks);
         wakeup(req);
+device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup %d\n",__FUNCTION__,__LINE__,ticks);
+}
     else {
         dev = req->ir_owner;
         TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, ir_link);
@@ -543,7 +547,11 @@
     IPMI_LOCK(sc);
     error = sc->ipmi_enqueue_request(sc, req);
     if (error == 0)
+{
+device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep %d\n",__FUNCTION__,__LINE__,ticks);
         error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo);
+device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep %d\n",__FUNCTION__,__LINE__,ticks);
+}
     if (error == 0)
         error = req->ir_error;
     IPMI_UNLOCK(sc);
@@ -695,8 +703,11 @@
     error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
     if (error == EWOULDBLOCK) {
         device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
+        printf("DJA\n");
+/*
         ipmi_free_request(req);
         return;
+*/
     } else if (error) {
         device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
         ipmi_free_request(req);

and get

# dmesg | grep ipmi
ipmi0: KCS mode found at io 0xca8 on acpi
ipmi1: on isa0
device_attach: ipmi1 attach returned 16
ipmi1: on isa0
device_attach: ipmi1 attach returned 16
ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6201
ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6263
ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 6323
ipmi0: Timed out waiting for GET_DEVICE_ID
ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 6503
ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6620
ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6
Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
John Baldwin writes:
| On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| > Sean Bruno writes:
| > | Noting a failure to attach to the onboard IPMI controller with this dell
| > | R815. Not sure what to start poking at and thought I'd throw this over
| > | here for comment.
| > |
| > | -bash-4.2$ dmesg |grep ipmi
| > | ipmi0: KCS mode found at io 0xca8 on acpi
| > | ipmi1: on isa0
| > | device_attach: ipmi1 attach returned 16
| > | ipmi1: on isa0
| > | device_attach: ipmi1 attach returned 16
| > | ipmi0: Timed out waiting for GET_DEVICE_ID
| >
| > I've run into this recently. A quick hack to fix it is:
| >
| > Index: ipmi.c
| > ===
| > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
| > retrieving revision 1.14
| > diff -u -p -r1.14 ipmi.c
| > --- ipmi.c 14 Apr 2011 07:14:22 - 1.14
| > +++ ipmi.c 31 Mar 2012 19:18:35 -
| > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
| >      if (error == EWOULDBLOCK) {
| >          device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
| >          ipmi_free_request(req);
| > -        return;
| >      } else if (error) {
| >          device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
| >          ipmi_free_request(req);
| >
| > The issue is that the wakeup doesn't actually wake up the msleep
| > in ipmi_submit_driver_request. The error being reported is that
| > the msleep timed out. This doesn't seem to be a critical problem,
| > since after this things seemed to work. I saw this on 9.X.
| > Haven't seen it on 8.2. Not sure about -current.
| >
| > It doesn't happen on all machines.
|
| Hmm, are you seeing the KCS thread manage the request but the wakeup() is
| lost?

It was a couple of weeks ago that I played with it. I put printf's
around the msleep and wakeup. I saw the wakeup get called, but the sleep
did not see it. I can try the test again later today. Right now my main
work machine is recovering from a power outage. This was with 9.0
when I first saw it. This issue seems to only happen at boot time.
If I kldload the module after the system is booted then it seems to work
okay. The KCS part was working fine and got the data okay from the
request. I haven't seen or heard of any issues with 8.2.

Thanks,

Doug A.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
Doug Ambrisko writes:
| Sean Bruno writes:
| | Noting a failure to attach to the onboard IPMI controller with this dell
| | R815. Not sure what to start poking at and thought I'd throw this over
| | here for comment.
| |
| | -bash-4.2$ dmesg |grep ipmi
| | ipmi0: KCS mode found at io 0xca8 on acpi
| | ipmi1: on isa0
| | device_attach: ipmi1 attach returned 16
| | ipmi1: on isa0
| | device_attach: ipmi1 attach returned 16
| | ipmi0: Timed out waiting for GET_DEVICE_ID
|
| I've run into this recently. A quick hack to fix it is:
|
| Index: ipmi.c
| ===
| RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
| retrieving revision 1.14
| diff -u -p -r1.14 ipmi.c
| --- ipmi.c 14 Apr 2011 07:14:22 - 1.14
| +++ ipmi.c 31 Mar 2012 19:18:35 -
| @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
|      if (error == EWOULDBLOCK) {
|          device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
|          ipmi_free_request(req);
| -        return;

Correction: get rid of the ipmi_free_request as well. If you kldload
the module then it doesn't have this issue. I've been doing that on
-current for a while, so I didn't notice the regression when it happened.

|      } else if (error) {
|          device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
|          ipmi_free_request(req);
|
| The issue is that the wakeup doesn't actually wake up the msleep
| in ipmi_submit_driver_request. The error being reported is that
| the msleep timed out. This doesn't seem to be a critical problem,
| since after this things seemed to work. I saw this on 9.X.
| Haven't seen it on 8.2. Not sure about -current.
|
| It doesn't happen on all machines.
|
| Doug A.
Re: [stable-ish 9] Dell R815 ipmi(4) attach failure
Sean Bruno writes:
| Noting a failure to attach to the onboard IPMI controller with this dell
| R815. Not sure what to start poking at and thought I'd throw this over
| here for comment.
|
| -bash-4.2$ dmesg |grep ipmi
| ipmi0: KCS mode found at io 0xca8 on acpi
| ipmi1: on isa0
| device_attach: ipmi1 attach returned 16
| ipmi1: on isa0
| device_attach: ipmi1 attach returned 16
| ipmi0: Timed out waiting for GET_DEVICE_ID

I've run into this recently. A quick hack to fix it is:

Index: ipmi.c
===
RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
retrieving revision 1.14
diff -u -p -r1.14 ipmi.c
--- ipmi.c 14 Apr 2011 07:14:22 - 1.14
+++ ipmi.c 31 Mar 2012 19:18:35 -
@@ -695,7 +695,6 @@ ipmi_startup(void *arg)
     if (error == EWOULDBLOCK) {
         device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
         ipmi_free_request(req);
-        return;
     } else if (error) {
         device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
         ipmi_free_request(req);

The issue is that the wakeup doesn't actually wake up the msleep
in ipmi_submit_driver_request. The error being reported is that
the msleep timed out. This doesn't seem to be a critical problem,
since after this things seemed to work. I saw this on 9.X.
Haven't seen it on 8.2. Not sure about -current.

It doesn't happen on all machines.

Doug A.
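For readers unfamiliar with the sleep/wakeup pattern discussed in this thread, here is a minimal userland sketch using POSIX threads. It is only an analogue, not the ipmi(4) code: the names `struct request`, `request_init`, `request_complete` and `request_wait` are hypothetical, and whether this matches the boot-time failure is not established in the thread. It shows the usual defensive idea of checking a completion flag under the lock, so that a completion which arrives before the waiter sleeps (or just as the timeout fires) is still honored.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical request object; not the driver's struct ipmi_request. */
struct request {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool            done;   /* set by the completer, checked by the waiter */
    int             error;  /* result reported by the completer */
};

void request_init(struct request *req)
{
    pthread_mutex_init(&req->lock, NULL);
    pthread_cond_init(&req->cv, NULL);
    req->done = false;
    req->error = 0;
}

/* Completer side: record the result, then signal, all under the lock. */
void request_complete(struct request *req, int error)
{
    pthread_mutex_lock(&req->lock);
    req->error = error;
    req->done = true;
    pthread_cond_signal(&req->cv);
    pthread_mutex_unlock(&req->lock);
}

/* Waiter side: returns the request's error, or ETIMEDOUT if it never completed. */
int request_wait(struct request *req, int timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&req->lock);
    /* Loop on the flag: handles spurious wakeups and early completion. */
    while (!req->done && rc == 0)
        rc = pthread_cond_timedwait(&req->cv, &req->lock, &deadline);
    /* A finished request wins over a timeout that fired at the same time. */
    if (req->done)
        rc = req->error;
    pthread_mutex_unlock(&req->lock);
    return rc;
}
```

With this shape, calling `request_complete()` before `request_wait()` still returns the request's result, because the waiter never sleeps once the flag is set.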
Re: LSI MegaRAID SAS 9240 with mfi driver?
Jan Mikkelsen writes:
| On 31/03/2012, at 9:21 AM, Doug Ambrisko wrote:
|
| > Jan Mikkelsen writes:
| > | I don't know what changes Sean did. Are they in 9.0-release, or do I
| > | need -stable after a certain point? I'm assuming I should be able to
| > | take src/sys/dev/mfi/... and src/usr.sbin/mfiutil/... from -current.
| >
| > It's in the SVN projects/head_mfi repo. You can browse it via the web at:
| > http://svnweb.freebsd.org/base/projects/head_mfi/
| >
| > It's not in -current yet. I'm working on that. I just did all the
| > merges to a local tree and eyed them over. Now doing a compile test,
| > then I can check it into -current.
|
| OK, will check it out.
|
| > | The performance is an interesting thing. The write performance I care
| > | about is ZFS raidz2 with 6 x JBOD disks (or 6 x single disk raid0) on
| > | this controller. The 9261 with a BBU performs well but obviously costs more.
| >
| > There will need to be clarification in the future. JBOD is not the
| > same as a single disk RAID. If I remember correctly from some testing
| > of JBOD versus single disk RAID, JBOD is slower. A single disk RAID is
| > faster since it can use the RAID cache. However, without the battery
| > you risk losing data on a power outage etc. Without the battery,
| > the performance of a JBOD and a single disk RAID should be about
| > the same.
| >
| > A real JBOD, as shown by LSI's firmware etc., shows up as /dev/mfisyspd
| > entries. JBOD by LSI is a newer thing.
|
| Ok, interesting. I was told by the distributor that the 9240 supports
| JBOD mode, but the 9261 doesn't. I'm interested to test it out with ZFS.

Correct, JBOD is not supported on all cards and, depending on how the
card ships, it may need to be enabled. Again, JBOD is not RAID on a
single disk. Also, to clarify: mfiutil create jbod creates a RAID volume
for each drive, which isn't the same definition of JBOD that LSI talks
about. They are 2 different animals. MegaCli can configure LSI JBODs:
it can enable the feature and create them. I'm not really sure what the
value of JBOD support is. I haven't seen any kind of performance gains.

| > | I can see the BBU being important for controller based raid5, but I'm
| > | hoping that ZFS with JBOD will still perform well. I'm ignorant at this
| > | point, so that's why I'm trying it out. Do you have any experience or
| > | expectations with a 9240 being used in a setup like that?
| >
| > The battery or NVRAM matters regardless of the RAID type being used,
| > since the cache, in NVRAM mode, says done whenever it has space in the
| > cache for the write. Eventually, it will hit the disk. Without the cache
| > working in this mode the write can't be acknowledged until the disk says
| > done. So performance suffers. With a single disk RAID you have been
| > using the cache.
|
| With RAID-5 it is important because a single update requires two writes
| and a failure in the window where one write has completed and one write
| has not could cause data corruption. I don't know whether the controller
| really handles this case.

That shouldn't be a problem, since the acknowledgement won't happen until
the writes are all done, and if any fail then the I/O should fail back to
the OS.

| I guess I'm hopeful that ZFS will perform the function performed by the
| NVRAM on the controller. I can see how the controller in isolation is
| clearly slower without a BBU because it has to expose the higher layers
| to the disk latency.

All ZFS should really be doing is adding another level of caching.
Without an NVRAM cache, you can't really get the performance gain.

| > Now you can force using the cache without NVRAM but you have to acknowledge
| > the risk of that.
|
| Yes, I understand the risk, and it is one I do not want to take. All
| the 9261s I have deployed have a BBU and go into write through mode if
| the battery has a problem.
|
| I think I need to test it in the context of ZFS and see how it works
| without controller NVRAM.

Well, then you can do the performance test of the 9240 on the 9261s by
disabling the battery and the cache! Feel free to do the test on the
9240. I can't see anything being faster without the NVRAM cache.

Doug A.
Re: LSI MegaRAID SAS 9240 with mfi driver?
Jan Mikkelsen writes:
| Hi,
|
| On 31/03/2012, at 1:14 AM, Doug Ambrisko wrote:
|
| > John Baldwin writes:
| > | On Friday, March 30, 2012 12:06:40 am Jan Mikkelsen wrote:
| > | ...
| > | > Is this path likely to work out? Any suggestions on where to go from here?
| > |
| > | You should try the updated mfi(4) driver that Doug (cc'd) is going to soon
| > | merge into HEAD. It syncs up with the mfi(4) driver on LSI's website which
| > | supports several cards that the current mfi(4) driver does not. (I'm not
| > | fully sure if the 9240 is in that group or not. Doug might know however.)
| >
| > Yes, this card is supported with the mfi(4) in projects/head_mfi. Looks
| > like we fixed a couple of last-minute bugs found when trying to create a
| > RAID with mfiutil. This should be fixed now. I'm going to start the
| > merge to -current today. The version in head_mfi can run on older
| > versions of FreeBSD with the changes that Sean did.
| >
| > Note that I wouldn't recommend the 9240 since it can't have a battery
| > option. NVRAM is the key to the speed of mfi(4) cards. However, that
| > won't stop us from supporting
|
| Thanks.
|
| I don't know what changes Sean did. Are they in 9.0-release, or do I
| need -stable after a certain point? I'm assuming I should be able to
| take src/sys/dev/mfi/... and src/usr.sbin/mfiutil/... from -current.

It's in the SVN projects/head_mfi repo. You can browse it via the web at:
http://svnweb.freebsd.org/base/projects/head_mfi/

It's not in -current yet. I'm working on that. I just did all the
merges to a local tree and eyed them over. Now doing a compile test,
then I can check it into -current.

| The performance is an interesting thing. The write performance I care
| about is ZFS raidz2 with 6 x JBOD disks (or 6 x single disk raid0) on
| this controller. The 9261 with a BBU performs well but obviously costs more.

There will need to be clarification in the future. JBOD is not the
same as a single disk RAID. If I remember correctly from some testing
of JBOD versus single disk RAID, JBOD is slower. A single disk RAID is
faster since it can use the RAID cache. However, without the battery
you risk losing data on a power outage etc. Without the battery, the
performance of a JBOD and a single disk RAID should be about the same.

A real JBOD, as shown by LSI's firmware etc., shows up as /dev/mfisyspd
entries. JBOD by LSI is a newer thing.

| I can see the BBU being important for controller based raid5, but I'm
| hoping that ZFS with JBOD will still perform well. I'm ignorant at this
| point, so that's why I'm trying it out. Do you have any experience or
| expectations with a 9240 being used in a setup like that?

The battery or NVRAM matters regardless of the RAID type being used,
since the cache, in NVRAM mode, says done whenever it has space in the
cache for the write. Eventually, it will hit the disk. Without the cache
working in this mode the write can't be acknowledged until the disk says
done. So performance suffers. With a single disk RAID you have been
using the cache.

Now you can force using the cache without NVRAM but you have to
acknowledge the risk of that.

Doug A.
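The acknowledgement behaviour described above can be put into a toy model. This is only a sketch under stated assumptions, not controller firmware or mfi(4) code: `enum cache_mode`, `ack_latency_us` and the latency figures are all hypothetical, and a full write-back cache is simply approximated as costing one disk write.

```c
#include <stdbool.h>

/* Hypothetical cache policy names for this illustration only. */
enum cache_mode { WRITE_THROUGH, WRITE_BACK };

/*
 * Latency (in microseconds) the host sees before a write is acknowledged.
 * Write-back with free cache space: "done" is reported once the data is in
 * the cache. Otherwise the host waits for the disk to say done.
 */
unsigned ack_latency_us(enum cache_mode mode, bool cache_has_space,
                        unsigned cache_us, unsigned disk_us)
{
    if (mode == WRITE_BACK && cache_has_space)
        return cache_us;
    return disk_us;
}
```

For example, with an assumed 50 us cache insert and an assumed 8000 us disk write, the model returns 50 for write-back with free space and 8000 in write-through mode, which is the gap the battery-backed cache is hiding.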
Re: LSI MegaRAID SAS 9240 with mfi driver?
John Baldwin writes:
| On Friday, March 30, 2012 12:06:40 am Jan Mikkelsen wrote:
| > Hi,
| >
| > I have a loan LSI MegaRAID SAS 9240-4i controller for testing.
| >
| > According to the LSI documentation, this device provides the MegaRAID
| > interface and the BIOS message mentions MFI. The LSI driver for this device
| > also lists support for the 9261 which I know is supported by mfi(4).
| > Based on all this, I was hopeful that mfi(4) would work with the 9240.
| >
| > The pciconf -lv output is:
| >
| > none3@pci0:1:0:0: class=0x010400 card=0x92411000 chip=0x00731000 rev=0x03 hdr=0x00
| >     vendor = 'LSI Logic / Symbios Logic'
| >     device = 'MegaRAID SAS 9240'
| >     class = mass storage
| >     subclass = RAID
| >
| > I added this line to src/sys/dev/mfi/mfi_pci.c
| >
| > {0x1000, 0x0073, 0x, 0x, MFI_FLAGS_GEN2, "LSI MegaRAID SAS 9240"},
| >
| > It gave this result (tried with hw.mfi.msi set to 0 and to 1):
| >
| > mfi0: port 0xdc00-0xdcff mem 0xfe7bc000-0xfe7b,0xfe7c-0xfe7f irq 16 at device 0.0 on pci1
| > mfi0: Using MSI
| > mfi0: Megaraid SAS driver Ver 3.00
| > mfi0: Frame 0xff8000285000 timed out command 0x26C8040
| > mfi0: failed to send init command
| >
| > The firmware is package 20.10.1-0077, which is the latest on the LSI website.
| >
| > Is this path likely to work out? Any suggestions on where to go from here?
|
| You should try the updated mfi(4) driver that Doug (cc'd) is going to soon
| merge into HEAD. It syncs up with the mfi(4) driver on LSI's website which
| supports several cards that the current mfi(4) driver does not. (I'm not
| fully sure if the 9240 is in that group or not. Doug might know however.)

Yes, this card is supported with the mfi(4) in projects/head_mfi. Looks
like we fixed a couple of last-minute bugs found when trying to create a
RAID with mfiutil. This should be fixed now. I'm going to start the
merge to -current today. The version in head_mfi can run on older
versions of FreeBSD with the changes that Sean did.

Note that I wouldn't recommend the 9240 since it can't have a battery
option. NVRAM is the key to the speed of mfi(4) cards. However, that
won't stop us from supporting it.

Doug A.
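As a side note on how the quoted pciconf output relates to the vendor and device values in the table entry: the chip= and card= fields pack two 16-bit IDs into one 32-bit value, with the vendor (or subvendor) in the low half and the device (or subdevice) in the high half. A small sketch follows; the helper names `pci_id_vendor` and `pci_id_device` are made up for this illustration and are not part of any FreeBSD API.

```c
#include <stdint.h>

/* Low 16 bits of a pciconf chip=/card= value: vendor (or subvendor) ID. */
uint16_t pci_id_vendor(uint32_t id)
{
    return (uint16_t)(id & 0xffff);
}

/* High 16 bits of a pciconf chip=/card= value: device (or subdevice) ID. */
uint16_t pci_id_device(uint32_t id)
{
    return (uint16_t)(id >> 16);
}
```

Applied to chip=0x00731000 from the report, this gives vendor 0x1000 and device 0x0073, the first two values in the line Jan added to mfi_pci.c.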
Re: MFC: graid(8) (RAID GEOM) support
Jeremy Chadwick writes:
| Sorry for the cross-post, but I thought both lists would want to know
| about this.
|
| Looks like mav@ just committed this ~17 hours ago:
| http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/geom/raid/g_raid.c
|
| Those who have historically wanted to use Intel MatrixRAID (now called
| Intel RST (Rapid Storage Technology)), but haven't due to the severe
| issues/risks with ataraid(4), will probably be very interested in
| this commit. I know I am!
|
| I plan on stress-testing the Intel support on a 2-disk system with
| RAID-1 enabled, and will document my experiences, procedures, etc...

We definitely want people to help test this out. It was designed from
the start to be robust and do recovery for RAID 1 which is our use. We
had previously hacked enhanced support into ataraid(4) and ata(4) for
use in-house.

Doug A.
Re: Enabling watchdog
Tom Evans writes:
| On Fri, May 14, 2010 at 3:15 PM, Jeremy Chadwick wrote:
| >
| > I'm a bit confused at this point, Doug. At what point did the OP state
| > he has IPMI support or IPMI cards in his system?
|
| He said he had a Dell PowerEdge 2950 - iirc these all have IPMI.

... and although a HW watchdog doesn't have to be in IPMI, I know for a
fact it is on the base config. of a Dell PE2950 and has been since the
PE2650. However, on the 2650 I saw false trips. It was one of the
reasons I wrote ipmi(4). Eventually, I need to get in sync with jhb to
add kernel back-trace support to it. I have some code at work to do it
but it needs some work to ensure it works in every case etc.

BTW, there is code/patches floating around to control the LCD on these
Dell machines via ipmitool and, on the r710, control attributes of the
LCD. Unfortunately the ipmitool folks haven't picked it up.

Doug A.
Re: Enabling watchdog
rihad writes:
| On 05/14/2010 04:13 AM, Doug Ambrisko wrote:
| > rihad writes:
| > | Hi, I'm thinking of enabling the watchdog on our Dell PowerEdge 2950 /
| > | FreeBSD 8.0 amd64, so that it reboots the machine in case of lockups.
| > | Right now it doesn't work:
| > |
| > | # watchdog
| > | watchdog: patting the dog: Operation not supported
| > | #
| > | Looking through the kernel configuration I found two relevant settings:
| > | In /sys/conf/NOTES:
| > | #
| > | # Add software watchdog routines.
| > | #
| > | options SW_WATCHDOG
| > |
| > | and in /sys/amd64/conf/NOTES:
| > | #
| > | # Watchdog routines.
| > | #
| > | options MP_WATCHDOG
| > |
| > | Which of them should I rebuild the kernel with? BTW, the existing kernel
| > | is built with the default "options SCHED_ULE" to make good use of
| > | multiple CPUs, does watchdog work with it?
| >
| > If no one has said yet, kldload ipmi then run watchdogd. ... or compile
| > it into the kernel. This will enable the IPMI HW watchdog. If it triggers,
| > it will appear in the IPMI SEL (ipmitool sel list).
|
| Thanks. So did I understand it right that I should first install
| sysutils/ipmitool, then start polling "ipmitool sel list" in a shell
| script from a cron job run once a minute, and reboot in case IPMI
| triggers? But if it's a kernel lockup, none of the user level code might
| run at all. Any way to fall back to a hard and fast kernel level machine
| reset?

Nope, when you load the ipmi driver it provides a HW watchdog via IPMI
and works with watchdogd. Now if you want to know if your machines
rebooted due to the watchdog then check the IPMI SEL for the watchdog
event.

Doug A.
Re: Enabling watchdog
rihad writes:
| Hi, I'm thinking of enabling the watchdog on our Dell PowerEdge 2950 /
| FreeBSD 8.0 amd64, so that it reboots the machine in case of lockups.
| Right now it doesn't work:
|
| # watchdog
| watchdog: patting the dog: Operation not supported
| #
| Looking through the kernel configuration I found two relevant settings:
| In /sys/conf/NOTES:
| #
| # Add software watchdog routines.
| #
| options SW_WATCHDOG
|
| and in /sys/amd64/conf/NOTES:
| #
| # Watchdog routines.
| #
| options MP_WATCHDOG
|
| Which of them should I rebuild the kernel with? BTW, the existing kernel
| is built with the default "options SCHED_ULE" to make good use of
| multiple CPUs, does watchdog work with it?

If no one has said yet, kldload ipmi then run watchdogd. ... or compile
it into the kernel. This will enable the IPMI HW watchdog. If it triggers,
it will appear in the IPMI SEL (ipmitool sel list).

Doug A.
Re: Netowrk Card BCE not working
Umar writes:
| Dear Members!
|
| I have recently Install FreeBSD 7.2 amd64 on DELL R610.
|
| After successfuly installation network cards are not working and i got error
|
| bce0: /usr/src/dev/bce/if_bce.c(1386): Unable to write CTX memory: cid_addr = 0x008, offset = 0x0080
|
| Would you please help me what should I do?

A new version of firmware should be coming out from Dell that should
resolve this issue. The firmware can be updated via DOS or Linux but not
from FreeBSD using Linux emulation (at least not yet) :-(

A short-term workaround is to find a diag utility for the NIC and then
disable the management function in all NICs.

Doug A.
Re: Monitoring tools for mfi0: ?
wsk writes:
| On Sun, Aug 09, 2009 at 08:04:35PM +0200, Václav Haisman wrote:
| > Hi,
|
| > I have a server with the "mfi0: " controller. Are there any
| > monitoring tool for this? I tried camcontrol but it doesn't even list the
| > device.
|
| Maybe sysutils/megacli does what you want?
|
| Roland
|
| some times. I got follow mesgs on my Dell PE R900. any ideas?
|
| mfi0: 1989 (303015600s/0x0020/info) - Patrol Read started
| mfi0: 2020 (303022694s/0x0020/info) - Patrol Read complete

This is normal. Patrol read scans for potential disk errors. If you
want more info in real-time then you can set hw.mfi.event_class="-2"
and get more detail, or use MegaCli to get the full event log.

| mfi0: COMMAND 0xff80005a7870 TIMEOUT AFTER 43 SECONDS
| mfi0: COMMAND 0xff80005a7ed0 TIMEOUT AFTER 58 SECONDS

This is usually okay; it says that some commands are taking a while.
If the command never completes then that is a problem. The RAID
controller can decide the order of completing commands, so it can take
some time.

| pciconf:
| m...@pci0:25:0:0: class=0x010400 card=0x1f0c1028 chip=0x00601000 rev=0x04 hdr=0x00
|     vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
|     device = 'SAS1078 PCI-X Fusion-MPT SAS'
|     class = mass storage
|     subclass = RAID

FWIW, that is a bad description since it claims it is the SAS card and
not the RAID.

Doug A.
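For anyone who wants to watch these console messages from a script or monitoring daemon, the event lines quoted above have a regular shape that `sscanf` can pick apart. The following is a sketch only: `struct mfi_event_line` and `parse_mfi_event` are hypothetical names, the field labels are my reading of the two sample lines rather than anything stated in the thread, and the hex field is stored without interpretation.

```c
#include <stdio.h>

/* Hypothetical container for one "mfiN: SEQ (TIMEs/0xHEX/CLASS) - TEXT" line. */
struct mfi_event_line {
    int           unit;            /* N in mfiN */
    unsigned      seq;             /* event sequence number */
    unsigned long timestamp;       /* seconds value printed before the 's' */
    unsigned      flags;           /* hex field, meaning not interpreted here */
    char          evt_class[16];   /* e.g. "info" */
    char          description[64]; /* e.g. "Patrol Read started" */
};

/* Returns 0 when all six fields were parsed, -1 otherwise. */
int parse_mfi_event(const char *line, struct mfi_event_line *ev)
{
    int n = sscanf(line, "mfi%d: %u (%lus/0x%x/%15[^)]) - %63[^\n]",
                   &ev->unit, &ev->seq, &ev->timestamp, &ev->flags,
                   ev->evt_class, ev->description);
    return n == 6 ? 0 : -1;
}
```

Lines that do not match the pattern, such as the COMMAND ... TIMEOUT messages, make the function return -1, so a caller can route them to a separate check.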
Re: System deadlock when using mksnap_ffs
Kostik Belousov writes:
| On Thu, Nov 13, 2008 at 02:45:14AM -0800, Jeremy Chadwick wrote:
[snip]
| > If he can press Control-T, it means SIGINFO can be sent to the
| > mksnap_ffs process, and the process responds with that information. So,
| > the system is not deadlocked -- meaning, I believe what he experiences
| > is what others experience (the system becomes completely unusable during
| > mksnap_ffs running, but DOES NOT hang or lock up, it just becomes so
| > god-awful slow that processes on the machine literally sit and spin for
| > minutes at a time).
|
| Unless NOKERNINFO is specified in the local flags in the controlling
| terminal termios, kernel prints one line summary as shown above. This is
| done from the tty discipline input handler (or whatever it is in new tty
| code). No process cooperation is required. On the other hand, actually
| delivering SIGINFO and getting output from the process-installed
| handler do require process to either executing usermode or sleeping
| interruptible.

Also note that "dead-lock" is not just a locking issue but can be WRT
other chains. For example, you hit the max buffer cache usage, so the
buffer daemon needs to flush things out, but it can't since it needs a
buffer to do that and can't get one until something has been flushed.
Things get really bad when the buffer daemon needs a buffer but can't
get one! In theory it can go and use "emergency space" reserved just
for it to get out of this situation, but if the buffer cache is
fragmented such that all available buffers are too small then the
buffer daemon is stuck on itself. Note that everything works except
for anything that touches the buffer cache, such as a program coming
off disk. A program in memory is okay.

To really get a good picture of this you need to look at the various
buffer cache variables via ddb (i.e. hi, low, running etc.). A while
back I wrote a debugging function to dump the state of things every
minute or so. There are various loops you can get into. So then you
start playing whack-a-mole. Usually due to the first bug you can't hit
the 2nd, 3rd and so on, adding to the fun. Unfortunately there isn't
one magic bullet. These are not new problems since we hit them in 4.X.
I did start to go over some of this issue with Tor but ran into
ENOTIME on my side :-(

Snap shots can take a very long time to make depending on the amount
of stuff they have to snap shot, and during that time everything has
to be effectively locked out of the file system or the snap shot will
be wrong. This then leads to a need for a good journaling fs that can
be used on "big" disks (big isn't that big anymore).

Doug A.
Re: System deadlock when using mksnap_ffs
Kostik Belousov writes:
| On Wed, Nov 12, 2008 at 07:49:28PM +, Tim Bishop wrote:
| > On Wed, Nov 12, 2008 at 05:58:26PM +, Tim Bishop wrote:
| > > I run the mksnap_ffs command to take the snapshot and some time later
| > > the system completely freezes up:
| > >
| > > paladin# cd /u2/.snap/
| > > paladin# mksnap_ffs /u2 test.1
| >
| > Someone (not named because they choose not to reply to the list) gave me
| > the following patch:
| >
| > --- sys/ufs/ffs/ffs_snapshot.c.orig Wed Mar 22 09:42:31 2006
| > +++ sys/ufs/ffs/ffs_snapshot.c Mon Nov 20 14:59:13 2006
| > @@ -282,6 +282,8 @@ restart:
| >              if (error)
| >                  goto out;
| >              bawrite(nbp);
| > +            if (cg % 10 == 0)
| > +                ffs_syncvnode(vp, MNT_WAIT);
| >          }
| >          /*
| >           * Copy all the cylinder group maps. Although the
| > @@ -303,6 +305,8 @@ restart:
| >                  goto out;
| >              error = cgaccount(cg, vp, nbp, 1);
| >              bawrite(nbp);
| > +            if (cg % 10 == 0)
| > +                ffs_syncvnode(vp, MNT_WAIT);
| >              if (error)
| >                  goto out;
| >          }
| >
| > With the description:
| >
| > "What can happen is on a big file system it will fill up the buffer
| > cache with I/O and then run out. When the buffer cache fills up then no
| > more disk I/O can happen :-( When you do a sync, it flushes that out to
| > disk so things don't hang."
| >
| > It seems to work too. But it seems more like a workaround than a fix?
|
| It looks hackish, but in fact it is not that wrong, and I even say that
| it provides reasonable workaround.
|
| The usual way to prevent wdrain deadlock is to issue bwillwrite() call
| before any vnode lock is taken. This is sufficient for most VFS syscalls
| that typically put dozen or less dirty buffers into delayed write
| queue.
|
| Snapshot creation does not call bwillwrite() at all, and then does a lot
| of async writes, completely saturating buffer cache with dirty buffers.
| bwillwrite cannot be called after the vnode is locked, and just forcing
| a sync for the embrionic snapshot vnode is good enough.
|
| The 10 counter is debatable, but debate shall be postponed until the patch
| goes into tree. I ask an anonymous submitter to commit it.

Thanks! I plan to commit it tomorrow since I sent it to Tim to test.
The 10 can be tuned but it has kept a bunch of machines at work up.
Glad people don't think it is too wrong :-) It probably could be made
a little more dynamic, but I wonder if that would show any real
performance difference, and it might risk more bugs.

Doug A.
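A minimal user-space sketch of why the periodic sync in the patch above helps. This is a toy model, not kernel code: `peak_dirty_buffers`, its parameters, and the assumption that nothing is written back until a forced sync are all made up for illustration. Each cylinder group queues one dirty buffer, and a sync empties the queue.

```c
#include <assert.h>

/*
 * Toy model only: NOT FreeBSD kernel code.
 * Each cylinder group queues one asynchronous write (one dirty buffer).
 * Assumption: nothing is written back unless a synchronous flush is
 * forced, which is the worst case described for snapshot creation.
 *
 * Returns the peak number of dirty buffers seen, or -1 if a write is
 * attempted while every buffer is already dirty (the modelled "wedge").
 * flush_every == 0 means "never flush", i.e. the unpatched behaviour.
 */
static int
peak_dirty_buffers(int ncg, int nbuf, int flush_every)
{
	int dirty = 0, peak = 0;

	assert(ncg >= 0 && nbuf > 0 && flush_every >= 0);

	for (int cg = 0; cg < ncg; cg++) {
		if (dirty == nbuf)
			return (-1);	/* no free buffer left for this write */
		dirty++;		/* stands in for bawrite(nbp) */
		if (dirty > peak)
			peak = dirty;
		/* Same shape as the patch: sync on every flush_every-th group. */
		if (flush_every != 0 && cg % flush_every == 0)
			dirty = 0;	/* stands in for ffs_syncvnode(vp, MNT_WAIT) */
	}
	return (peak);
}
```

In this model, 1000 cylinder groups against 64 buffers wedge without the flush, while flushing every 10 groups keeps the dirty count at 10 or less. A pool of only 8 buffers still wedges with an interval of 10, which is one way to see why the constant is described as tunable.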
Re: System deadlock when using mksnap_ffs
Jeremy Chadwick writes:
[snip]
| The rest of the below information is good -- but I'm confused about
| something: is there anyone out there who can use mksnap_ffs on a
| filesystem (/usr is a good test source) and NOT experience this
| deadlocking problem? Literally *every* FreeBSD box I have root access
| to suffers from this problem, so I'm a little baffled why we end-users
| need to keep providing debugging output when it should be easy as pie
| for a developer to do "dump -0 -L -a -f /path/fs.dump /usr" and watch
| their system wedge.

We can at work, but we have a bunch of other patches. There are a few
problems with the buffer cache:
  1) The buffer daemon can't use the space that is reserved for it,
     since to flush some stuff it needs to use more buffers.
  2) The buffer cache can get fragmented, which prevents the large I/O
     that the buffer daemon may need.
  3) Other issues ...

I have a fix for "1". It is pretty easy. I have a hack'ish fix for "2"
in that I make all requests use the max size so the cache can't get
fragmented, since there is no code to defrag and it isn't trivial to
defrag the memory. I have some fixes for some other issues, but there
were some review issues with them. I might just commit the fixes for
1 and 2. They make things better and there were no objections at the
time. We have the patches in shipping products.

I can try to do some experiments at work like you said, since I had
similar things working before and it is pretty easy to put in printf's
to see the issue.

Doug A.
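Problem "2" above can be pictured with a small sketch. It is a toy model under stated assumptions, not the FreeBSD buffer cache: the pool is an array of equal-sized units, a request needs a contiguous run of them, and the names `free_units`, `largest_free_run` and `can_allocate` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only: NOT the FreeBSD buffer cache.
 * The pool is an array of equal-sized units; used[i] != 0 means unit i
 * is allocated. A request needs `want` contiguous free units.
 */

/* Count all free units, contiguous or not. */
static size_t
free_units(const unsigned char *used, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (!used[i])
			count++;
	return (count);
}

/* Length of the longest run of contiguous free units. */
static size_t
largest_free_run(const unsigned char *used, size_t n)
{
	size_t best = 0, run = 0;

	for (size_t i = 0; i < n; i++) {
		run = used[i] ? 0 : run + 1;
		if (run > best)
			best = run;
	}
	return (best);
}

/* A request succeeds only if one free run is long enough. */
static int
can_allocate(const unsigned char *used, size_t n, size_t want)
{
	assert(used != NULL && want > 0);
	return (largest_free_run(used, n) >= want);
}
```

With the pattern used, free, used, free, ... over 8 units, four units are free but a 4-unit request fails. If every allocation is the maximum size, as in the hack described for "2", free space only ever appears in whole maximum-size pieces, so this situation cannot arise in the model.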
Re: Management interface for cards powered by the "mfi" driver?
Karl Denninger writes:
[snip]
| Ok, wiped the src tree, re-cvs'd out the RELENG_7, rebuild world and kernel
| and reinstalled (nice fast machine eh?)

Not needed since FreeBSD 6.2 if I recall right. Forget if I got it
in 6.1.

| Anyway, no change:
|
| dbms# uname -v
| FreeBSD 7.0-STABLE #1: Wed Jun 18 14:43:29 CDT 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
|
| dbms# megacli -adpCount
|
| Controller Count: 0.
|
| dbms# megacli -Cfgdsply -a0
|
| Failed to get ControllerId List.
| Failed to get CpController object.
|
| Still no joy
|
| dbms# kldstat
| Id Refs Address    Size     Name
|  1   17 0xc040     943140   kernel
|  2    1 0xc0d44000 6a2c4    acpi.ko
|  3    1 0xc5534000 7000     linprocfs.ko
|  4    3 0xc553b000 22000    linux.ko
|  5    1 0xc5585000 3000     linsysfs.ko
|  6    1 0xc7a34000 3000     daemon_saver.ko
|  7    1 0xc7c2d000 2000     mfi_linux.ko
|
| Says I got the proper KLDs loaded.
|
| dbms# mount
| /dev/mfid0s1a on / (ufs, local, soft-updates)
| devfs on /dev (devfs, local)
| /dev/mfid0s1e on /dbms (ufs, local, soft-updates)
| /dev/mfid0s1d on /usr (ufs, local, soft-updates)
| linprocfs on /usr/compat/linux/proc (linprocfs, local)
| linsysfs on /usr/compat/linux/sys (linsysfs, local)
|
| The two linux "look-sees" are there.
|
| So it looks like all the pre-reqs are there, but it still doesn't work.
|
| Here's the ID on the card and volume:
|
| mfi0: 524 (267116948s/0x0020/0) - Adapter ticks 267116948 elapsed 61s: Time established as 06/18/08 15:09:08; (61 seconds since power on)
| mfid0: on mfi0
| mfid0: 237464MB (486326272 sectors) RAID volume '' is optimal
|
| What am I missing?

What is the linux version sysctl set to? Also I think you need to make
sure mfi_linux.ko is loaded before linsysfs is mounted so you get the
emulation hooks. Verify that via:
  head /compat/linux/sys/class/scsi_host/*/proc_name
One of the results should say:
  megaraid_sas
or it won't think the card is there. The count is good to see if your
file system & linux version sysctl stuff is in the right state. Once
it detects the card, then the ioctl should work.

6-stable, 7-stable and -current all have the latest stuff to support
all of the ioctl stuff as Linux does for MegaCli. MegaCli does various
things to try to find the card in Linux that are really strange IMHO.
For FreeBSD it doesn't have to be that complicated. Unfortunately,
they have not released a FreeBSD MegaCli, which they could ...

Doug A.
Re: lpbb broken in 6.x?
Bruce M Simpson writes:
| Ian Smith wrote:
| > To finish off completely hijacking your thread :) does anyone know of
| > anything that can run a master/slave interface like pcf(4) which appears
| > to have been an ISA bus only device? I don't have C skills to write
| > one, though 400kHz master and slave routines in AVR asm were fun :)
| >
| > Later: after nearly losing this in a pine crash (don't ask), I've since
| > seen John's reply to your later message. Could it be that smbus or
| > something is also using iicbus rather than something messing with ppbus?
|
| Thanks for the hints. I don't have smbus in the kernel, nor do I have
| any other i2c device drivers loaded in the system.
|
| I stopped using smbus when it became pretty clear that it wasn't doing
| anything useful for me (it could never see CPU fan readouts or anything
| like that when I tried it on 3 different PIII era systems).

FWIW, this really isn't a fault of smbus. Some monitoring chips only
have an I/O type interface, others only have i2c, and some have both.
Then it depends on what the manufacturer connects and whether they
enable the i2c controller on the motherboard. I've used smbus on a
bunch of HW and not on others. It depends on what they do and how they
set up addressing.

At a prior company I set up an LCD display module to interface to the
MB i2c bus. I prototyped it by soldering wires onto a DIMM. Some MB's
have an i2c header on board and others have it routed to PCI slots.

Doug A.
Re: Dell PERC6?
Vlad GALU writes:
| On 1/17/08, Ferdinand Goldmann <[EMAIL PROTECTED]> wrote:
| > Hi!
| >
| > I am in the process of buying new Dell hardware, mainly the 2950 III.
| > According to various postings I found, the PERC6/i Controller _should_ work
| > with FreeBSD 6.3. Does anyone successfully use a 2950 III with PERC6/i
| > controller and can confirm this?
| >
| > Sorry if the question sounds stupid, but as I cannot find any references to
| > the PERC6 in either documentation or source code I am a bit confused, and I
| > wanted to make sure it works before shelling out my employers money. :-)
| >
| > Many thanks for any enlightenment on this subject,
| > kind regards,
| > Ferdinand
|
|    Don't know if this is useful to you, but I'm using 7.0 on the same
| Dell platform, and hence on the same controller, with very good
| results. I think the mfi(4) manpage should be updated too :)

It's been updated in -current. Yes, PERC6 support is in 6.3 & 7.0.

Thanks for the prompt,

Doug A.
Re: Don't buy AMD products (was Re: Xorg and ATI card query.)
Kip Macy writes:
| Please be very careful. The only real alternative (Intel comes and
| goes) is Nvidia whose driver is binary-only for i386 (no amd64
| support) and has a history for being notoriously buggy. I only buy ATI
| because of the problems I keep seeing people have with the Nvidia
| driver. I have a friend who has basically abandoned his dual-head
| Nvidia card due to recurring issues.

One thing that is a plus with nv is that X has some support for it,
whereas the newer ati cards have no support :-( I was a fan of ati
since it was easier to get support. Now I'm starting to lean towards
Nvidia :-(

Doug A.
Re: Xorg and ATI card query.
Daniel O'Connor writes:
| On Tuesday 13 March 2007 05:10, Yann Golanski wrote:
| > I have an ATI Radeon X1950 Sapphire and I am trying to get X/FreeBSD
| > working with it. My system is a clean install of FreBSD. I've managed to
| > get VESA to "work" but cannot get much more than that.
|
| There is no open source support for this card (alas). It's VESA or fglrx.
|
| > fglrx gives me an error at compile time since I do not have
| > /usr/X11R6/bin/moc installed.
|
| Is this using the FreeBSD port at http://www.fglrx-freebsd.com/index.php? If
| so you could just install moc which is part of qt.

FWIW, I just went through this exercise for my new laptop. Vesa
doesn't do 1920x1200 :-( and it isn't on amd64 :-( So I wanted to use
the Linux fglrx. The one that he has is too old to support my laptop.

Realize that while he does compile some misc. tools, they are not
needed for X to work. Really he is taking the Linux X drivers
(fglrx_drv.o & libfglrxdrm.a) and putting them into
/usr/X11R6/lib/modules/drivers. The caveat is that you need an old
enough version that doesn't link against Linux-specific things like
pthreads etc., which the newest ones do. Another caveat is that the
older versions were built against X.org 6.8, so you need an X.org of
that version. In 6.9 some structures changed, leading to a core
dump :-( I ended up building my own X.org 6.8, installing it, and then
installing the typical -current X stuff.

The next thing I'm going to work on is to get the 32bit X server to
run on a 64bit kernel so I can switch over to 64bit.

Doug A.
Re: running mksnap_ffs
Kris Kennaway writes:
| Thanks for clarifying. Hopefully you and Tor can get something
| committed soon!

I'm not sure about that. I have to see what has changed since then.
That was ... uhm a year ago when I dropped the ball. It's probably a
good task for me to look at in the context of -current again. I should
have disks to build a 1.5T file system to play with.

Doug A.
Re: running mksnap_ffs
Kris Kennaway writes:
| On Tue, Jan 16, 2007 at 09:26:47PM +0100, Willem Jan Withagen wrote:
| > Doug Ambrisko wrote:
| > >| > or things can get wedged. We have some other patches as well that might
| > >| > be required. As a hack on a local server we have been using snap shots
| > >| > to do a "hot" back-up of a data base each morning. This is based on
| > >| > 6.x.
| > >|
| > >| What do you mean by "get wedged"? Are you seeing a deadlock, and if
| > >| so then what are the details? When you say 6.x, do you mean
| > >| up-to-date RELENG_6? There were various snapshot deadlock fixes
| > >| committed over the past year including some in the past few months.
| > >
| > >The file-system would come to a stop, processes stuck on bio, snap-shots
| > >not finishing etc. This was caused by the system running out of usable
| > >buffers. The change forces them to be flushed every so often. This is
| > >independant of locking. 10 might be to aggresive. Some scaling of
| > >nbuf would probably be better.
| >
| > When I run mksnap_ffs it runs to the point where ANY access to the
| > filesystem gives that process a lockup.
|
| Yes, that is expected. Actually it begins when something accesses the
| directory in which the snapshot is being made, since that causes the
| parent directory to be locked...then something tries to access the
| parent directory, which eventually cascades back to the root.
|
| > Getting the file system back is only thru "hard reboot". Trying to do it
| > the gentle way locks the whole system.
|
| Or waiting until the snapshot operation finishes. You (still) haven't
| determined that it's actually hanging as opposed to just waiting for
| the snapshot operation to finish.

In my case it was easy to see that all the buffers were exhausted and
the system was churning, waiting for some to become available. Since
they were all used up it never recovered. By sync'ing the buffers they
got cleaned up and then the system never ran out. The snap shot was
then able to finish. I traced this problem in the debugger, where you
can see it happen.

There are other issues with the buffer daemon as well. We hit these
since we run with a relatively low nbuf. The buffers can get frag'ed
so badly that it can't flush things since it can't get a full-size
buffer. Another problem is that it can end up waiting on itself, since
the current code can't use its emergency space to flush stuff. You can
see this via ps etc. It's not a good thing if the buffer daemon is
waiting on itself :-( We have patches for this as well but they need
some more work. I was working with Tor on this but then I got swamped
at work with our 4.X -> 6.X and platform transition.

All I can say is that we don't suffer from these problems now :-)
I have printf's that log this stuff when some of these bugs are hit.
Now the system survives those lock-up points.

Doug A.
Re: Dell hardware raid 0 (sas5ir) or gmirror?
Joe Koberg writes:
| Josef Karthauser wrote:
| > On Mon, Jan 15, 2007 at 11:21:06AM +, Josef Karthauser wrote:
| >> I'm purchasing a new server, and was wondering what anyone thought
| >> about whether to pay extra for the SAS5IR card so I can RAID0 the
| >> two drives, or whether to just rely on gmirror. My worry about the
| >> former is that I can't seem to find management tools for
| >> controlling the hardware controller. What if one of the drives
| >> fails? How would I know?
| >
| > Of course I mean RAID1!
|
| I just bought two Dell PE-1950's to use as routers. They have LSI Logic
| PERC/5i's attached to 80GB SATA drives. I am pretty sure this is the
| same card used for SAS.
|
| One thing is for sure, the mfi(4) card and driver aren't shy! See below
| for examples of the kernel messages I get regularly. I am sure drive
| failure would be well noted.

FYI, you can silence it to your level of comfort via
hw.mfi.event_class in /boot/loader.conf. The values being:
  MFI_EVT_CLASS_DEBUG    = -2,
  MFI_EVT_CLASS_PROGRESS = -1,
  MFI_EVT_CLASS_INFO     =  0,
  MFI_EVT_CLASS_WARNING  =  1,
  MFI_EVT_CLASS_CRITICAL =  2,
  MFI_EVT_CLASS_FATAL    =  3,
  MFI_EVT_CLASS_DEAD     =  4
The new default is info, so it's a little quieter. I'd suggest some
care in going over info, since the event for a drive that failed will
come through but the one saying it is okay again will not. So if you
are waiting for that you won't know. Here, we like the debug and
progress stuff put into /var/log/messages. It makes support a lot
easier.

Doug A.
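A minimal sketch of how a threshold like hw.mfi.event_class could behave. The enum values are the ones listed in the message above. The helper `event_is_reported` is hypothetical and not a driver function, and it assumes that an event is reported when its class is at or above the configured value.

```c
#include <assert.h>

/* Values as listed in the message above. */
enum mfi_evt_class {
	MFI_EVT_CLASS_DEBUG    = -2,
	MFI_EVT_CLASS_PROGRESS = -1,
	MFI_EVT_CLASS_INFO     =  0,
	MFI_EVT_CLASS_WARNING  =  1,
	MFI_EVT_CLASS_CRITICAL =  2,
	MFI_EVT_CLASS_FATAL    =  3,
	MFI_EVT_CLASS_DEAD     =  4
};

/*
 * Hypothetical helper (not part of the mfi driver): assumes an event is
 * reported when its class is at or above the configured threshold.
 * Returns 1 if the event would be reported, 0 otherwise.
 */
static int
event_is_reported(enum mfi_evt_class event, int threshold)
{
	assert(threshold >= MFI_EVT_CLASS_DEBUG &&
	    threshold <= MFI_EVT_CLASS_DEAD);
	return ((int)event >= threshold);
}
```

Under that assumption, raising the threshold to warning hides every info-class event. If a drive failure were logged above info and its recovery at info (the message does not give the classes of specific events), the failure would show up and the recovery would not, which matches the caution about going over info.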
Re: running mksnap_ffs
Kris Kennaway writes:
| On Tue, Jan 16, 2007 at 10:13:57AM -0800, Doug Ambrisko wrote:
|
| > FWIW, with this patch I find making snap-shots a lot more reliable:
| >
| > --- sys/ufs/ffs/ffs_snapshot.c.orig Wed Mar 22 09:42:31 2006
| > +++ sys/ufs/ffs/ffs_snapshot.c Mon Nov 20 14:59:13 2006
| > @@ -282,6 +282,8 @@ restart:
| >              if (error)
| >                  goto out;
| >              bawrite(nbp);
| > +            if (cg % 10 == 0)
| > +                ffs_syncvnode(vp, MNT_WAIT);
| >          }
| >          /*
| >           * Copy all the cylinder group maps. Although the
| > @@ -303,6 +305,8 @@ restart:
| >                  goto out;
| >              error = cgaccount(cg, vp, nbp, 1);
| >              bawrite(nbp);
| > +            if (cg % 10 == 0)
| > +                ffs_syncvnode(vp, MNT_WAIT);
| >              if (error)
| >                  goto out;
| >          }
| >
| > or things can get wedged. We have some other patches as well that might
| > be required. As a hack on a local server we have been using snap shots
| > to do a "hot" back-up of a data base each morning. This is based on
| > 6.x.
|
| What do you mean by "get wedged"? Are you seeing a deadlock, and if
| so then what are the details? When you say 6.x, do you mean
| up-to-date RELENG_6? There were various snapshot deadlock fixes
| committed over the past year including some in the past few months.

The file-system would come to a stop, processes stuck on bio,
snap-shots not finishing etc. This was caused by the system running
out of usable buffers. The change forces them to be flushed every so
often. This is independent of locking. 10 might be too aggressive.
Some scaling by nbuf would probably be better.

Doug A.
Re: running mksnap_ffs
Scott Oertel writes:
| Kris Kennaway wrote:
| > On Tue, Jan 02, 2007 at 09:06:24PM +0100, Willem Jan Withagen wrote:
| >
| >> Hi,
| >>
| >> I got the following Filesystem:
| >> Filesystem    Size    Used   Avail Capacity  iused     ifree %iused
| >> /dev/da0a     1.3T    422G    823G    34%   565952 182833470    0%
| >>
| >> Running of a 3ware 9550, on a dual core Opteron 242 with 1Gb.
| >> The system is used as SMB/NFS server for my other systems here.
| >>
| >> I would like to make weekly snapshots, but manually running mksnap_ffs
| >> freezes access to the disk (I sort of expected that) but the process
| >> never terminates. So I let is sit overnight, but looking a gstat did not
| >> reveil any activity what so ever...
| >> The disk was not released, mksnap_ffs could not be terminated.
| >> And things resulted in me rebooting the system.
| >>
| >> So:
| >> - How long should I expect making a snapshot to take:
| >>    5, 15, 30min, 1, 2 hour or even more???
| >
| > Yes :) Snapshots were not designed for use in this way (they were
| > designed to support background fsck and allow faster system recovery
| > after power failure), so they don't scale as well as you might like on
| > large filesystems.
|
| If snapshots were designed to support background fsck, then why did they
| not make it more scalable? If you can't create a snapshot without the
| system locking up, that means fsck won't be able to either, making
| background fsck worthless for systems with large storage.

FWIW, with this patch I find making snap-shots a lot more reliable:

--- sys/ufs/ffs/ffs_snapshot.c.orig Wed Mar 22 09:42:31 2006
+++ sys/ufs/ffs/ffs_snapshot.c Mon Nov 20 14:59:13 2006
@@ -282,6 +282,8 @@ restart:
             if (error)
                 goto out;
             bawrite(nbp);
+            if (cg % 10 == 0)
+                ffs_syncvnode(vp, MNT_WAIT);
         }
         /*
          * Copy all the cylinder group maps. Although the
@@ -303,6 +305,8 @@ restart:
                 goto out;
             error = cgaccount(cg, vp, nbp, 1);
             bawrite(nbp);
+            if (cg % 10 == 0)
+                ffs_syncvnode(vp, MNT_WAIT);
             if (error)
                 goto out;
         }

or things can get wedged. We have some other patches as well that
might be required. As a hack on a local server we have been using snap
shots to do a "hot" back-up of a data base each morning. This is based
on 6.x.

Doug A.
Re: 6.x from i386 to amd64
Peter Jeremy writes:
| It would be nice to see the 32-bit emulation improved so that it is
| possible to build/run the i386 versions of ports on an amd64 system.
| This would be the best of both worlds. If I had any free time, I
| would even work on this myself.

I have this working well enough for everything that we build here. Our
new build machines are running the amd64 kernels but we build for
i386. After 6.2 is out I'll merge my uname/getosreldate changes to
-stable and create a stub script to set the environment variables. We
do some hacks to copy the host's ps, top, mount type things into a
compat directory so it runs the host's versions.

It seems a few people are interested in this and it seems to be
working well for us & myself. Maybe Kris can then convert his ports
clusters over to amd64 OS'es to build everything.

Doug A.
Re: Megacli fails to find SAS adapter
Sven Willenberger writes:
| On Tue, 2006-10-10 at 22:11 -0700, Doug Ambrisko wrote:
| > Sven Willenberger writes:
| > | FreeBSD 6.2-PRERELEASE #3: Tue Oct 10 13:58:29 EDT 2006
| > | LSi 8480e SAS Raid card
| Adding mfi_linux_enable="YES" to /boot/loader.conf did do the trick of
| having the device added to the system:
|
| # cat /compat/linux/sys/class/scsi_host/host*/proc_name
| (null)
| megaraid_sas
| (null)
|
| # sysctl compat.linux
| compat.linux.oss_version: 198144
| compat.linux.osrelease: 2.6.12
| compat.linux.osname: Linux
|
| Although the MegaCli utility no longer complains about not finding a
| controller, it sadly does nothing else either (except dump core on
| certain commands):
|
| # ./MegaCli -AdpAllinfo -a0

I usually start with that. It should work okay. Check your
/compat/linux/dev directory for stuff. It might have created null and
some other entries; look at the dates. Those nodes could be wrong. We
have an empty /compat/linux/dev directory.

| # ./MegaCli -AdpGetProp SpinupDriveCount -a0
|
| Segmentation fault (core dumped)
| # ./MegaCli -LDGetNum -a0
|
| Failed to get VD count on adapter -9993.
| # ./MegaCli -CfgFreeSpaceinfo -a0
|
| Failed to initialize RM
|
| and so on ... I am guessing this is an issue with the MegaCli software
| now; needless to say I certainly doubt that this will allow me to flash
| the card bios (or even it if *could*, I would be leery of the process).

If one doesn't work the rest probably won't. I'd be cautious about
flashing the card. It should work but I haven't tried it. If this is
your only card then you have a lot to risk! On prior cards, Adaptec
and LSI, if the flash failed then the card was toast. MegaCli has some
issues as well.

Doug A.
Re: Megacli fails to find SAS adapter
Sven Willenberger writes:
| FreeBSD 6.2-PRERELEASE #3: Tue Oct 10 13:58:29 EDT 2006
| LSi 8480e SAS Raid card
|
| mount:
| linprocfs on /compat/linux/proc (linprocfs, local)
| linsysfs on /compat/linux/sys (linsysfs, local)
| /dev/mfid0s1d on /usr/local/pgsql (ufs, local, noatime)
|
| dmesg:
| mfi0: 2025 - PCI 0x041000 0x04411 0x041000 0x041002: Firmware initialization started (PCI ID 0411/1000/1002/1000)
| mfi0: 2026 - Type 18: Firmware version 1.00.00-0074
| mfi0: 2027 - Battery temperature is normal
| mfi0: 2028 - Battery Present
| mfi0: 2029 - PD 39(e1/s255) event: Enclosure (SES) discovered on PD 27(e1/s255)
| mfi0: 2030 - PD 56(e2/s255) event: Enclosure (SES) discovered on PD 38(e2/s255)
| mfi0: 2031 - PD 39(e1/s255) event: Inserted: PD 27(e1/s255)
| mfi0: 2032 - Type 29: Inserted: PD 27(e1/s255) Info: enclPd=27, scsiType=d, portMap=10, sasAddr=50015b2180001839,
| mfi0: 2033 - PD 56(e2/s255) event: Inserted: PD 38(e2/s255)
|
| pkg_info:
| linux_base-fc-4_9
|
| I have downloaded the Megacli and, using rpm2cpio extracted
| MegaCli-1.01.09-0.i386.rpm into my home directory.
|
| ~/usr/sbin/MegaCli
| brandelf -t Linux usr/sbin/MegaCli
|
| cd usr/sbin
|
| # ./MegaCli -EncInfo -aALL
|
| ERROR:Could not detect controller.
| # ./MegaCli -CfgDsply -aALL
|
| ERROR:Could not detect controller.
|
| Do I actually need to set up the links in /compat/linux/sys for the SAS
| raid card? or should this rpm be installed into the /compat/linux
| directory? I need to upgrade the firmware on this card as for some
| reason the webbios will not let me configure a Raid10 array and the only
| way I can see to upgrade the fw is to use the megacli utility.

Make sure you have the Linux ioctl module loaded before linsysfs so it
can register the hooks. kldstat/kernel config will help. One sanity
check is to do:
  dhcp194:ambrisko 11] cat /compat/linux/sys/class/scsi_host/host*/proc_name
  megaraid_sas
  (null)
  dhcp194:ambrisko 12]
If you don't see megaraid_sas then it isn't going to work and the
linux mfi module is missing.

Also you need to set:
  sysctl compat.linux.osrelease=2.6.12
or things won't work well. This will probably break your fc-4_9 Linux
install until the updates to Linux emulation are merged (maybe they
have been but I don't think so). Since MegaCli is a static binary we
don't have linux base installed.

Doug A.
Re: Dell 1950 does not properly respond to reboot and shutdown -p
John Baldwin writes:
| On Tuesday 10 October 2006 08:54, Bill Moran wrote:
| > In response to Doug Ambrisko <[EMAIL PROTECTED]>:
| > > Bruno Ducrot writes:
| > > | On Wed, Oct 04, 2006 at 02:07:12PM -0400, Bill Moran wrote:
| > > | > In response to Bruno Ducrot <[EMAIL PROTECTED]>:
| > > | > > Hi,
| > > | > >
| > > | > > On Wed, Oct 04, 2006 at 12:28:35PM -0400, Bill Moran wrote:
| > > | > > >
| > > | > > > A reboot causes the OS to halt, but the hardware just sits there on the
| > > | > > > shutdown screen.
| > > | > > >
| > > | > > > A shutdown -p does the same.
| > > | > >
| > > | > > What exactly are the last few lines?
| > > | >
| > > | > (manually copied)
| > > | >
| > > | > ...
| > > | > All buffers synced.
| > > | > Uptime: 1m16s
| > > | >
| > > |
| > > | Thanks. Then this happen after print_uptime().
| > > |
| > > | I believe one of the drivers register a shutdown_final (or
| > > | shutdown_post_sync) event that hang your system. I think (though I
| > > | may be wrong) mfi may be that one.
| > > |
| > > | It would help if you can add some printf in dev/mfi/mfi.c into the
| > > | mfi_shutdown() function in order to check if that assumption
| > > | is correct.
| > >
| > > Some what related to this we have a local hack:
| > >
| > > --- sys/kern/subr_bus.c.orig Tue Jun 27 15:49:39 2006
| > > +++ sys/kern/subr_bus.c Tue Jun 27 15:49:51 2006
| > > @@ -2906,6 +2906,7 @@ bus_generic_shutdown(device_t dev)
| > >      device_t child;
| > >
| > >      TAILQ_FOREACH(child, &dev->children, link) {
| > > +        DELAY(1000);
| > >          device_shutdown(child);
| > >      }
| >
| > This patch seems to "fix" the problem. I'm going to replace it with
| > some printfs and see if I can determine which driver is actually
| > causing the problem (hopefully it's only one).
| >
| > Am I wrong in saying that the correct solution would be to identify the
| > driver that needs more time and implementing some sort of polling
| > mechanism to ensure the hardware is ready when the driver wants to
| > shut down?
|
| Well, first let's see which driver it is. :) You might be able to just
| remove the DELAY and add a printf and see which device is printed last.

I think it was in different ones. One of our configs has the base HW +
bge NIC, the other has base HW + 2 x 2 port em NICs. The more NICs,
the better the chance of a problem. I've removed the hack from our
kernel and I'm going to run the reboot cycle. I don't think a printf
will work, since I recall trying that and it "fixed" the problem, so I
put the DELAY in :-( It could be a generic problem on any system with
a sufficiently fast CPU to beat the HW at shutting down. I'm not sure
if his system is Dempsey or Woodcrest. We use Woodcrest and they are
really fast. Other machines might be "slow" enough that it's not a
problem! We haven't seen it on our older platforms with the same
kernel and similar HW configs.

Doug A.
Re: Dell 1950 does not properly respond to reboot and shutdown -p
Bruno Ducrot writes:
| On Wed, Oct 04, 2006 at 02:07:12PM -0400, Bill Moran wrote:
| > In response to Bruno Ducrot <[EMAIL PROTECTED]>:
| > > Hi,
| > >
| > > On Wed, Oct 04, 2006 at 12:28:35PM -0400, Bill Moran wrote:
| > > >
| > > > A reboot causes the OS to halt, but the hardware just sits there on the
| > > > shutdown screen.
| > > >
| > > > A shutdown -p does the same.
| > >
| > > What exactly are the last few lines?
| >
| > (manually copied)
| >
| > ...
| > All buffers synced.
| > Uptime: 1m16s
| >
|
| Thanks.  Then this happens after print_uptime().
|
| I believe one of the drivers registers a shutdown_final (or
| shutdown_post_sync) event that hangs your system.  I think (though I
| may be wrong) mfi may be that one.
|
| It would help if you can add some printf in dev/mfi/mfi.c into the
| mfi_shutdown() function in order to check if that assumption
| is correct.

Somewhat related to this, we have a local hack:

--- sys/kern/subr_bus.c.orig	Tue Jun 27 15:49:39 2006
+++ sys/kern/subr_bus.c	Tue Jun 27 15:49:51 2006
@@ -2906,6 +2906,7 @@ bus_generic_shutdown(device_t dev)
 	device_t child;
 
 	TAILQ_FOREACH(child, &dev->children, link) {
+		DELAY(1000);
 		device_shutdown(child);
 	}

It seems like we were tearing things down too fast, and resources were
stolen away from HW that wasn't totally shut down yet, or something.  I
think this was worse when things had shared interrupts, but I forget the
exact details.  It's been a long time since I put in the hack and moved
on to the next fire.  It seems the more HW we had in the machine, the
worse the problem was.  This is just a hack and not a fix.

Doug A.
Re: Comtrol Rocketport driver is severely hosed under 6.x-STABLE
Karl Denninger writes:
|
| There is a severe problem (or set of problems) with the Comtrol Rocketport
| driver under FreeBSD 6.x, to the point that the driver is basically unusable.
|
| The driver is returning duplicate input frames and otherwise misbehaving
| badly.  There were no problems under FreeBSD 5.x.
|
| Does anyone know what has changed in the tty subsystem between 5.x and 6.x,
| or, alternatively if there is no update on this, is there a KNOWN WORKING
| PROPERLY multiport serial board under 6.x?
|
| This has totally hosed a number of my field installations when they attempted
| to go from the 5.x operating environment to 6.x!
|
| Thanks in advance

Try this for 6.1 in /sys/dev/rp:

Index: rp.c
===
RCS file: /usr/local/cvsroot/freebsd/src/sys/dev/rp/rp.c,v
retrieving revision 1.67.2.1
diff -u -p -r1.67.2.1 rp.c
--- rp.c	8 Nov 2005 15:35:27 -	1.67.2.1
+++ rp.c	7 Sep 2006 18:19:44 -
@@ -37,15 +37,18 @@ __FBSDID("$FreeBSD: src/sys/dev/rp/rp.c,
 /*
  * rp.c - for RocketPort FreeBSD
  */
+#include
 
 #include "opt_compat.h"
 
 #include
+#include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -57,7 +60,7 @@ __FBSDID("$FreeBSD: src/sys/dev/rp/rp.c,
 #include
 #include
 
-static const char RocketPortVersion[] = "3.02";
+static const char RocketPortVersion[] = "1.0";
 
 static Byte_t RData[RDATASIZE] =
 {
@@ -116,6 +119,8 @@ Byte_t rp_sBitMapSetTbl[8] =
 	0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80
 };
 
+int next_unit_number = 0;
+int num_devices_found = 0;
 /***
 Function: sReadAiopID
 Purpose:  Read the AIOP idenfication number directly from an AIOP.
@@ -587,6 +592,9 @@ static void rp_do_receive(struct rp_port
 	unsigned	int	CharNStat;
 	int	ToRecv, wRecv, ch, ttynocopy;
 
+	if (tp->t_state & TS_TBLOCK)
+		return;
+
 	ToRecv = sGetRxCnt(cp);
 	if(ToRecv == 0)
 		return;
@@ -615,7 +623,7 @@ static void rp_do_receive(struct rp_port
 			CharNStat = rp_readch2(cp,sGetTxRxDataIO(cp));
 			ch = CharNStat & 0xff;
 
-			if((CharNStat & STMBREAK) || (CharNStat & STMFRAMEH))
+			if((CharNStat & STMBREAKH) || (CharNStat & STMFRAMEH))
 				ch |= TTY_FE;
 			else if (CharNStat & STMPARITYH)
 				ch |= TTY_PE;
@@ -645,6 +653,12 @@ static void rp_do_receive(struct rp_port
 		if ( ToRecv > RXFIFO_SIZE ) {
 			ToRecv = RXFIFO_SIZE;
 		}
+		if ((tp->t_rawq.c_cc + ToRecv > tp->t_ihiwat) &&
+		    ((tp->t_cflag & CRTS_IFLOW) ||
+		     (tp->t_iflag & IXOFF)) &&
+		    !(tp->t_state & TS_TBLOCK))
+			ttyblock(tp);
+
 		wRecv = ToRecv >> 1;
 		if ( wRecv ) {
 			rp_readmultich2(cp,sGetTxRxDataIO(cp),(u_int16_t *)rp->RxBuf,wRecv);
@@ -686,6 +700,7 @@ static void rp_handle_port(struct rp_por
 	IntMask = sGetChanIntID(cp);
 	IntMask = IntMask & rp->rp_intmask;
 	ChanStatus = sGetChanStatus(cp);
+
 	if(IntMask & RXF_TRIG)
 		if(!(tp->t_state & TS_TBLOCK) && (tp->t_state & TS_CARR_ON) && (tp->t_state & TS_ISOPEN)) {
 			rp_do_receive(rp, tp, cp, ChanStatus);
@@ -769,22 +784,23 @@ rp_attachcommon(CONTROLLER_T *ctlp, int
 
 	unit = device_get_unit(ctlp->dev);
 
-	printf("RocketPort%d (Version %s) %d ports.\n", unit,
-		RocketPortVersion, num_ports);
+	printf("RocketPort%d = %d ports.\n", unit, num_ports);
 	rp_num_ports[unit] = num_ports;
 	callout_handle_init(&rp_callout_handle);
 
 	ctlp->rp = rp = (struct rp_port *)
-		malloc(sizeof(struct rp_port) * num_ports, M_TTYS, M_NOWAIT | M_ZERO);
+		malloc(sizeof(struct rp_port) * (num_ports+1), M_TTYS, M_NOWAIT | M_ZERO);
 	if (rp == NULL) {
 		device_printf(ctlp->dev, "rp_attachcommon: Could not malloc rp_ports structures.\n");
 		retval = ENOMEM;
 		goto nogo;
 	}
-
+/*	else {
+		device_printf(ctlp->dev, "malloc'd rp_ports structures=%08x.\n", rp);
+	}*/
 	count = unit * 32;	/* board times max ports per card SG */
 
-	bzero(rp, sizeof(struct rp_port) * num_ports);
+	bzero(rp, sizeof(struct rp_port) * (num_ports+1));
 	oldspl = spltty();
 	rp_addr(unit) = rp;
 	splx(oldspl);
@@ -1016,9 +1032,10 @@ rpmodem(struct tty *tp, int sigon, int s
 	}
 	return (0);
 }
-
+#define B460800 460800
+#define B921600 921600
 static struc
Re: i386/100160: [mfid] Perc5i: additional symptomatic info on virtual disk detection issue
Jeffrey Williams writes:
| I don't know if anyone specifically is working on this, but just tried
| to install FreeBSD 6.1 from CD on a Dell 2950 with the PERC 5/i SAS
| controller.
|
| This server was originally configured with two hardware RAID virtual
| disks, the first was RAID 1 with two 36GB drives, and the second was
| RAID 5 with three 72 GB drives.
|
| Just like the original PR, the first was detected and identified in
| the installer volume setup as both mfid0 and mfid1.
|
| In order to try and work around the problem and just get the machine up
| and running, I tried deleting the RAID 1 virtual disk with the intention
| of installing everything to the RAID 5 virtual disk, however, with the
| first virtual disk removed, no drives were detected at all.
|
| Next I will be trying removing the physical drives originally used in the
| RAID 1 virtual disk, and re-initializing the RAID 5 array.  I will
| provide an update if successful.
|
| In the meantime if anybody else is aware of another workaround or fix
| for this, I'd appreciate hearing about it.  If a patch comes out soon, I
| will be happy to provide testing, but I have a small window as this
| server was being implemented as an emergency replacement for another server.

Upgrade the mfi driver to -stable.

Doug A.
Re: LSI/amr driver controller cache problem?
Patrick M. Hausen writes:
| Here's the preliminary results:
|
| - Controller cache policy: write through (megamgr or BIOS setup)

Write back should be okay with a battery.

Doug A.
Re: LSI/amr driver controller cache problem?
Patrick M. Hausen writes:
| > > Also, check the cache
| > > setting on the drives itself.  Maybe the drives are losing power or
| > > getting reset while data is in their cache.
| >
| > I'm starting to suspect something like this.  The controller's setting
| > for the individual drives' caches is "OFF".  But these (Seagate ST3500841NS)
| > would not be the first ATA/SATA drives to "lie" about their cache for
| > "performance".
|
| Seems like
|
| for i in 0 1 2 3 4 5
| do
| 	megarc -pSetCache -WCE0 -SaveCacheSetting -ch0 -id$i -a0
| done
|
| did the trick.  This is supposed to disable the physical drives'
| write cache and save this setting in the drives' NVRAM, if supported.
|
| I don't know why simply setting the WC to "off" in the controller's
| BIOS setup tool didn't have the same effect.  I'm keeping my fingers
| crossed ;-)
|
| Time to re-enable softupdates and do some more stress testing.
|
| Up to now the system survived two times "make installworld && reboot"
| after I changed the settings.
|
| Thanks to the guys keeping the amr driver up-to-date.  The Linux
| "megamgr" utility works just fine.  If I find the time, I'll make
| a port.

That would be great.  I'd discourage the idea of MegaMon though, since
it leaks shared memory and exits, unless LSI has finally fixed it.  So
monitoring is a pain.  I guess a watcher script would be okay, but it
has a nasty habit of reporting prior errors every time it starts :-(
We have a native local tool that works, but we can't redistribute it.

The mfi driver doesn't have these issues since the driver reports all
events directly.  However, MegaCli doesn't actually create or delete a
RAID (even with Linux).  I have patches in the wings that deal with
discovery while the system is up, but we need clearance on them.

Doug A.
Re: bce0: Error mapping mbuf into TX chain!
David (Controller AE) Christensen writes:
| Sorry, I've been out on vacation and just got back into town.  I'll MFC
| the patch within the next day or two.

I'll let you merge in the down/up fix that I put into -current.

Doug A.
Re: Dell PowerEdge 750 & 850 environmental monitoring
Arnold Cavazos Jr. writes:
| Does anybody have temperature and fan monitoring working on Dell
| PowerEdge 750's & 850's?  I have done my share of googling without much
| luck.

The PE850 should just work with ipmi(4) in 6.1-stable/-current and
ipmitool.  The PE750 will work with ipmi if you have the DRAC card.
It is possible to get thermal stuff on the PE750 via smbus, but that
is more complicated.

Doug A.
Re: em device hangs on ifconfig alias ...
Francisco Reyes writes:
| Atanas writes:
| > I have some newer machines with 2 Broadcom chips on-board. I plan to
| > give them a try at some point in the future, but I'm not sure how stable
| > the bge driver
|
| For us they have been a problem.  Primarily because it causes all kinds of
| freezing/crashes when having an IPMI board.  I believe it has performed ok in
| machines where we don't have an IPMI card.

Can you try:
	http://www.ambrisko.com/doug/bge_ipmi_3.patch
and see if that helps?  I need one minor tweak to it before I can
commit it.

Doug A.
Re: build fails on amd64 machine
Sean McNeil writes:
| I get the following:
|
| ===> ipmi (depend)
| make: don't know how to make ipmi.c. Stop
| *** Error code 2

That should be fixed.

Doug A.
Re: megamgr on 6.1?
Brian Szymanski writes:
| kldload amr_linux did the trick for me, thanks!

Good to hear.  You might want to look at megarc and megamonitor for
Linux.  Hopefully LSI updated megamonitor to fix the shared memory leak,
or it will exit in about 1/2 hour on FreeBSD since we don't allow as
much shared memory usage as Linux.  It leaks on Linux too; it just takes
longer to use it all up.

Doug A.
Re: megamgr on 6.1?
Boris Samorodov writes:
| On Tue, 9 May 2006 12:37:43 -0400 (EDT) Brian Szymanski wrote:
| > PS - mknod c 254 /compat/linux/dev/megadev0 (which is what the device is
| > under linux) doesn't help :(
|
| It's only my imho, use it with care:
|
| # cd /dev
| # ln -s amr0 megadev0

Nope, it needs to show up in devfs.  Making a node manually is going
to cause trouble.  If there isn't a /dev/megadev0 then you don't have
the amr_linux module loaded.  You can try to kldload it, but you might
have to compile it in statically.

Doug A.
Re: Issues with nullconsole in FreeBSD 6.0-p6
Jonas Bülow writes:
| I'm experiencing a really strange problem using nullconsole in FreeBSD
| 6.0-p6.  Briefly, what happens is that the use of nullconsole affects
| the behavior of the OS negatively, very negatively.
|
| There are two different setups with different kernel
| configurations.  They both have console set to nullconsole in
| loader.conf.
|
| In the first setup the machine reboots spontaneously somewhere during
| boot without leaving a hint of the reason.
|
| In the other setup there is a fsck process (fsck_4.2bsd) crashing with
| signal 8 (floating point exception) during boot.  The fsck is run on an
| auxiliary disk during startup.
|
| Both these problems go away if console is set to either vidconsole
| or comconsole in loader.conf.
|
| Adding DDB to the kernel configuration prevents the machine from
| continuously rebooting in the first setup.  Instead, it silently halts
| somewhere in the boot process.  Not easy telling where.  It seems to
| be somewhere late in the process.  Probably when running rc.d scripts,
| judging by the time before reboot compared to when using vidconsole
| or comconsole.
|
| I've tried to debug the problem.  I've not figured out how to remotely
| debug a kernel when using nullconsole.  The escape to debugger hot keys
| (Ctrl+Alt+Esc or Ctrl+SysReq) do not work when using
| nullconsole.  Therefore it is not possible to switch to remote mode.  Can
| DDB be forced to go directly into remote mode?
|
| I really understand it is impossible to give a simple answer or
| solution to my problems described above.  Well, if someone knows a
| solution I wouldn't mind sharing it.  What I really would like help
| with is some input on how to debug this further.  What to look for,
| things to try etc.

We don't seem to have that problem here with our enhanced consmute
stuff.  I noticed some strangeness in the newer consmute
implementation.  Our major change is to put it into a function, so that
if you break into the debugger or it panics you get that output.  Now
technically this wouldn't be a good idea given the original motivation,
but it works for us.  Bug me to remember to extract the patch for you
to try.

Doug A.
Re: Temperature monitoring in FreeBSD 4/5/6
Stephan Koenig writes:
| Does anyone know of an easy way to get temperature information out of
| a Dell PowerEdge 1550/1650/1750/1850/2650/2850 running FreeBSD4/5/6?
|
| Something that has a very simple CLI that just outputs the temperature
| without any formatting, or a library/sysctl, would be ideal.

For now, manually backport the ipmi device driver and then install the
latest ipmitool from ports.  Then you can run ipmitool via the local
interfaces.  The interfaces that are supported are SMIC and KCS.  SSIF
is in progress, and I'm dealing with some strange ACPI definitions that
put a hole in the address space of the HW :-(  I haven't really looked
at the BT interface yet.

Doug A.
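A minimal sketch of the "just the number" output asked for above, assuming ipmitool's `sdr type Temperature` listing prints pipe-separated rows whose last field looks like `23 degrees C`. The helper name `bare_temps` and the sample row in the comment are made up for illustration; check what your BMC actually reports before relying on it.

```shell
#!/bin/sh
# Print only the numeric temperature readings, one per line.
# Reads ipmitool-style rows on stdin.  Assumed row layout (hypothetical sample):
#   Ambient Temp     | 08h | ok  |  7.1 | 23 degrees C
# bare_temps is a hypothetical helper name, not part of ipmitool.
bare_temps() {
    awk -F'|' '
        # Keep only rows whose last field is "<number> degrees C";
        # rows such as "No Reading" are skipped.
        $NF ~ /^ *-?[0-9]+(\.[0-9]+)? degrees C *$/ {
            split($NF, f, " ")
            print f[1]
        }'
}

# On a machine with the ipmi driver loaded, usage would be:
#   ipmitool sdr type Temperature | bare_temps
```

Keeping the parsing in a function that reads stdin means it can be tried against saved output without touching the hardware.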
Re: Changing release version on source
Henri Hennebert writes:
| > Glenn Dawson wrote:
| >> At 12:06 PM 3/8/2006, you wrote:
| >>> Does anyone know how to change the release version of the source
| >>> code?  I have some brain dead software (Plesk) that insists on
| >>> FreeBSD 5.3, while it will work just fine on 5.5 and even 6.  I am
| >>> wondering if I can change the version of RELENG_5 code so that this
| >>> software will think it's 5.3-R and let me install.  I have tried
| >>> changing the variable in /usr/src/release/Makefile, but that seems
| >>> to have no effect.
| >>
| >> Take a look at sys/conf/newvers.sh
| >
| > Excellent, thanks!  I'm presuming I have to do a full build/install
| > world for this to take effect.  Do you think that anything may break
| > because of this manual change, even if I used RELENG_6 code?  I will not
| > be installing any ports.
|
| I would prefer to wrap /usr/bin/uname with a temporary custom version
| returning the desired values.

You can use the UNAME_ environment overrides that uname(1) honors
(UNAME_r for the release, for example).

Doug A.
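As a sketch of the override approach: FreeBSD's uname(1) lets an environment variable named UNAME_ plus a flag letter replace the corresponding field, so an installer can be started with a faked release and nothing needs rebuilding. The wrapper name `with_release` and the installer path are hypothetical, and whether Plesk actually asks uname(1) for the version is an assumption.

```shell
#!/bin/sh
# Run a command with the value reported by "uname -r" overridden
# through the environment.
# with_release is a hypothetical helper name, not part of FreeBSD.
with_release() {
    release=$1
    shift
    # UNAME_r is read by FreeBSD's uname(1); other systems ignore it.
    env UNAME_r="$release" "$@"
}

# Usage (the installer path is a placeholder):
#   with_release 5.3-RELEASE sh ./installer.sh
```

The override lives only in the environment of that one command, so the rest of the system keeps reporting its real version.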
Re: New 'amr' driver and linux MegaMGR
Danny Braniss writes:
| > Cristiano Deana writes:
| > | 2006/3/1, Paul Saab <[EMAIL PROTECTED]>:
| > | > works fine
| > |
| > | I got:
| > | Failed to open driver node /dev/megadev0
| >
| > Make sure you have amr_linux.  kldload amr_linux.ko.  Then you should
| > get a /dev/megadev0.  It also works in a static kernel.  You might
| > want to do an ls -l of /dev/megadev0.  This is only available
| > in FreeBSD 6.1 and -current; it is not in FreeBSD 6.0.  The changes
| > will drop into FreeBSD 6.0 though.
|
| i'm getting:
|
| Copyright (c) 1992-2006 The FreeBSD Project.
| Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
| 	The Regents of the University of California. All rights reserved.
| FreeBSD 6.1-PRERELEASE #3: Mon Feb 27 10:23:29 IST 2006
| ...
| module_register_init: MOD_LOAD (amr_linux, 0x805f1b00, 0) error 6
| ad4: 286168MB at ata2-master SATA150
| ad6: 286168MB at ata3-master SATA150
| ar0: 572072MB status: READY
| ar0: disk0 READY using ad4 at ata2-master
| ar0: disk1 READY using ad6 at ata3-master
|
| and no /dev/megadev0
|
| maybe because:
| [EMAIL PROTECTED]:31:2: class=0x01048f card=0x34518086 chip=0x25b08086 rev=0x02 hdr=0x00
|     vendor   = 'Intel Corporation'
|     device   = '6300ESB Serial ATA Controller (RAID mode)'
|     class    = mass storage
|     subclass = RAID

You don't have an LSI RAID controller.  Those are the built-in Intel
SATA ports in RAID mode, which is software RAID.  It is being detected
as ata disks and using ata-raid, so you can't use the LSI RAID tools.
If it were detected via the amr(4) driver then you could use the LSI
RAID tools.  Now it is using the LSI RAID meta-data.  "atacontrol" will
manage this, assuming Soren has the meta-data write support for their
format.

Doug A.
Re: LSI Megaraid (amr) performance woes
Sven Willenberger writes:
| On Wed, 2006-03-01 at 15:08 -0500, Mike Tancsa wrote:
| > At 02:10 PM 01/03/2006, Sven Willenberger wrote:
| >
| > >I cvsupped a 6.1 prerelease and found no performance improvements. I did
| > >some further tests and the performance issues seem very specific to the
| > >mirroring aspect of the raid:
| >
| >
| > I am not familiar with the LSI cards, but with older 3ware and the
| > ARECA cards, the raid sets when in any sort of redundancy mode must
| > initialize in the background before normal use.  Until that is
| > complete, performance is seriously slow.  Is the LSI doing that, and
| > perhaps just not telling you ?
|
| I had thought of this too so I disabled the rapid (background)
| initialization option and let the raids build to completion the slow
| way.  So unless it is still building even after it is done (or is doing
| some other odd processor-intensive crc checking or something) I don't
| think this is the source of the problem.

If you run the Linux MegaMon utilities, they will tell you if the
controller is running background tasks like this and show you the
progress.

Doug A.
Re: New 'amr' driver and linux MegaMGR
Cristiano Deana writes:
| 2006/3/1, Paul Saab <[EMAIL PROTECTED]>:
| > works fine
|
| I got:
| Failed to open driver node /dev/megadev0

Make sure you have amr_linux: kldload amr_linux.ko.  Then you should
get a /dev/megadev0.  It also works in a static kernel.  You might
want to do an ls -l of /dev/megadev0.  This is only available
in FreeBSD 6.1 and -current; it is not in FreeBSD 6.0.  The changes
will drop into FreeBSD 6.0 though.

Doug A.
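The check described above could be scripted roughly like this. It is only a sketch: the function name `have_megadev` is made up, and the device path is passed in as a parameter so the logic can be tried on any machine.

```shell
#!/bin/sh
# Return 0 and list the node if the given path is a character device;
# otherwise print a hint and return 1.
# have_megadev is a hypothetical helper name.
have_megadev() {
    node=${1:-/dev/megadev0}
    if [ -c "$node" ]; then
        ls -l "$node"
        return 0
    fi
    echo "$node missing: is amr_linux loaded (kldload amr_linux) or compiled in?" >&2
    return 1
}

# Usage on FreeBSD 6.1 or -current with an amr(4) controller:
#   kldload amr_linux
#   have_megadev /dev/megadev0
```

It deliberately does not try to create the node itself, since the device has to come from devfs once the module is present.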
Re: Disk I/O system hang on 5.4-RELEASE-p8 i386
Kris Kennaway writes:
| On Thu, Feb 23, 2006 at 04:44:46PM -0600, Greg Rivers wrote:
| > On Thu, 23 Feb 2006, Michael R. Wayne wrote:
| >
| > >Been fighting this for a while.  We have an older server, running
| > >5.4-RELEASE-p8 i386 and used primarily for email, which hangs every
| > >couple of weeks.  The hang seems to be in the disk I/O system; pings
| > >succeed, and I can continue get a login: prompt on the console until
| > >I enter a login at which the response stops.
| > >[snip]
| >
| > I think you're seeing the UFS deadlock I reported last November for
| > RELENG_6.  See the thread beginning at
| > http://lists.freebsd.org/pipermail/freebsd-stable/2005-November/019979.html
| >
| > I believe this issue has made it onto the show-stopper list for
| > 6.1-RELEASE and is being actively worked on.
|
| It's on the todo list, but I don't think it's being worked on yet.
| The main problem is that we need a way to reproduce it on command.
| I'd forgotten that snapshots are involved, so maybe it's just a matter
| of running lots of mksnap_ffs while I/O is in progress.

FWIW, I found a problem when creating snapshots in that it could
exhaust available buffers and wedge:

Index: ffs_snapshot.c
===
RCS file: /usr/local/cvsroot/freebsd/src/sys/ufs/ffs/ffs_snapshot.c,v
retrieving revision 1.112
diff -u -p -r1.112 ffs_snapshot.c
--- ffs_snapshot.c	9 Jan 2006 20:42:18 -	1.112
+++ ffs_snapshot.c	24 Feb 2006 23:02:19 -
@@ -336,6 +336,8 @@ restart:
 		if (error)
 			goto out;
 		bawrite(nbp);
+		if (cg % 10 == 0)
+			ffs_syncvnode(vp, MNT_WAIT);
 	}
 	/*
 	 * Copy all the cylinder group maps. Although the
@@ -357,6 +360,8 @@ restart:
 			goto out;
 		error = cgaccount(cg, vp, nbp, 1);
 		bawrite(nbp);
+		if (cg % 10 == 0)
+			ffs_syncvnode(vp, MNT_WAIT);
 		if (error)
 			goto out;
 	}

This fixed the problem for me.

Doug A.
Re: 6.0 on Dell 1850 with PERC4e/DC RAID?
Scott Mitchell writes:
| On Thu, Jan 12, 2006 at 04:41:17PM -0800, Doug Ambrisko wrote:
| > Scott Mitchell writes:
| > |
| > | That's a pity.  Maybe Doug was thinking of one of the aac(4) based PERC
| > | cards?  Still, something I can run out of cron to check the array status
| > | should be fine.
| >
| > Are you referring to this Doug?  The Linux ioctl shim requires one file
| > that hasn't been committed yet.  Scott L. & ps have it.  I may commit
| > it now that I'm back.  This lets all of the Dell/LSI Linux tools
| > run on FreeBSD including the firmware update tool.  The caveat is
| > that with the driver re-do it seems that certain things in the ioctl
| > path cause the firmware to lock up.  I haven't been around enough
| > to help with that problem.  I have a binary that locks it up pretty
| > quick.
|
| Hi Doug,
|
| I was actually referring to Doug White, who said:
|
| >From what I remember, you will receive status-change kernel messages when
| >disks disappear, rebuilds start, and so forth. So for most day-to-day
| >manipulation you should be fine.
|
| It wasn't clear if this applied to the amr(4)-based PERC cards or just the
| aac(4) ones.

Yes, that only applies to the aac based machines and not amr based
machines (i.e. Adaptec versus LSI).  With LSI you have to poll the
controller for RAID events, and that is not public.

| Sounds like the re-worked amr driver will be very much better, at least
| once a few more bugs have been ironed out of it.

Yes.

| > Most of the existing monitoring tools have bugs.  The Linux tools
| > tend to be better but the last copy of MegaMon leaked shared memory
| > then quit.  We have a tool at work but it is encumbered so we can't
| > give it out.
| >
| > | > I did find a program
| > | > posted to one of the freebsd lists called 'amrstat' that I run
| > | > nightly.  It produces this kind of output:
| > | >
| > | > Drive 0:68.24 GB, RAID1
| > | > optimal
| > | >
| > | > If it says "degraded" it is time to fix a drive.  You just fire up
| > | > the lsi megaraid tools and find out which drive it is.
| >
| > This is probably a fairly good scheme.  Caveat is that you can have
| > an "optimal" RAID that is broken :-(
|
| That's pretty sucky, but presumably not a FreeBSD-specific problem?
| Despite that, I'm reasonably hopeful that a scheme like this along with
| good backups (which we have) will be enough to avoid any major disasters.

It's not a FreeBSD specific problem.

| Is Dell's support any better if you tell them you're running RedHat?

We can sort-of run RedHat.  That is, we ran the Linux RAID binaries
from LSI & Dell with the Linux ioctl emulation layer I did on FreeBSD.
I netboot Linux sometimes to verify some things.

Doug A.
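The nightly cron check discussed above might look something like the following sketch. It assumes the state word ("optimal" or "degraded") appears in the status text exactly as in the sample quoted in the thread; the function name `raid_ok` and the mail command in the usage note are illustrative only.

```shell
#!/bin/sh
# Decide from status text on stdin whether any array needs attention.
# Exit status: 0 = "optimal" was seen and "degraded" was not, 1 = anything else
# (including empty input, e.g. when the status tool itself failed).
# raid_ok is a hypothetical helper name.
raid_ok() {
    awk '
        { line = tolower($0) }
        line ~ /optimal/  { seen = 1 }
        line ~ /degraded/ { bad = 1 }
        END { exit (bad || !seen) }
    '
}

# Possible cron usage:
#   amrstat | raid_ok || echo "RAID needs attention" | mail -s "RAID check" root
```

As noted in the thread, a controller can report "optimal" while a sector is still unreadable, so a check like this catches degraded arrays but is not proof that the volume is fully readable.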
Re: 6.0 on Dell 1850 with PERC4e/DC RAID?
Jung-uk Kim writes:
| On Friday 13 January 2006 11:59 am, Doug Ambrisko wrote:
| > Jung-uk Kim writes:
| > | On Thursday 12 January 2006 07:41 pm, Doug Ambrisko wrote:
| > | > Scott Mitchell writes:
| > | > | > I did find a program
| > | > | > posted to one of the freebsd lists called 'amrstat' that I
| > | > | > run nightly.  It produces this kind of output:
| > | > | >
| > | > | > Drive 0:68.24 GB, RAID1
| > | > | > optimal
| > | > | >
| > | > | > If it says "degraded" it is time to fix a drive.  You just
| > | > | > fire up the lsi megaraid tools and find out which drive it
| > | > | > is.
| > | >
| > | > This is probably a fairly good scheme.  Caveat is that you can
| > | > have an "optimal" RAID that is broken :-(
| > |
| > | That's lame.  Under what condition does it happen, do you know?
| >
| > Running RAID 10, a drive was swapped and the rebuild started on the
| > replacement drive.  The rebuild complained about the source drive
| > for the mirror rebuild having read errors that couldn't be
| > recovered.  It continued on and finished re-creating the mirror.
| > Then the RAID proceeded onto a background init, which it normally
| > does, and started failing that and re-starting the background init
| > over and over again.  The box changed the RAID from degraded to
| > optimal when the rebuild completed (with errors).  Doing a dd of the
| > entire RAID logical device returned an error at the bad sector
| > since it couldn't recover that.  The RAID controller reported an I/O
| > error and still left the RAID as optimal.
| >
| > We reported this and were told that's the way it is designed :-(
| > Probably the spec. is defined by whatever the RAID controller
| > happens to do versus what makes sense :-(
| >
| > So far this has only happened once.  Changing firmware did not
| > help.
|
| Similar thing happened to me once or twice (with RAID5) and I thought
| it was just a broken controller.  If the culprit was design, it IS
| really lame. :-(

I'd suggest whining to them.  To me "optimal" means "as far as I know
there are no problems with the RAID".  If enough customers whine they
might change their view!

Doug A.
Re: 6.0 on Dell 1850 with PERC4e/DC RAID?
Mike Tancsa writes:
| At 11:59 AM 13/01/2006, Doug Ambrisko wrote:
| >|
| >| That's lame.  Under what condition does it happen, do you know?
| >
| >Running RAID 10, a drive was swapped and the rebuild started on the
| >replacement drive.  The rebuild complained about the source drive
| >for the mirror rebuild having read errors that couldn't be recovered.
| >It continued on and finished re-creating the mirror.  Then the RAID
| >proceeded onto a background init, which it normally does, and started
| >failing that and re-starting the background init over and over again.
| >The box changed the RAID from degraded to optimal when the rebuild
| >completed (with errors).  Doing a dd of the entire RAID logical device
| >returned an error at the bad sector since it couldn't recover that.
| >The RAID controller reported an I/O error and still left the RAID as
| >optimal.
| >
| >We reported this and were told that's the way it is designed :-(
|
| Interesting timing as I ran into this sort of situation on the
| weekend on a 3ware drive in RAID1.  The card had complained for a week
| about read errors on drive 1.  We thought we would wait until the
| weekend maintenance window to swap it out.  Sadly, before that
| window, drive zero totally died a horrible death.  We popped in a new
| drive on port zero, started the rebuild, and it crapped out saying
| there was a read error on drive 1.  However, there is a check box
| that says continue the build, even with errors on the source drive.

With Adaptec we used to do a verify of each disk before a swap to
increase our chances of a successful disk swap.  Adaptec was a little
heavy-handed in that if you are running on the last disk of the mirror
and it has a read error, it will fail the drive.  If you have a RAID 10
then you lose 1/2 the file system :-(  I'd rather just get the read
error back to the OS than lose the entire drive.

| This setup seems to give you the best of both worlds.  We did a quick
| check of the resultant files compared to backups and only a couple
| were toasted.  (The box is going to be retired in a month, so if there
| is other hidden fs corruption if it holds out for another 3 weeks we
| dont care too much).  The correct approach would be to do a total
| restore of course, but this was good enough for us in this
| situation.  I guess the question is, is this RAID1 in a proper mirror
| given that there are hard errors on the drive on port 1 ?

That sounds like a good controller, assuming it says the RAID is still
degraded and not optimal.  I assume "optimal" means everything is fine
and it is safe to read the entire volume.

Doug A.
Re: 6.0 on Dell 1850 with PERC4e/DC RAID?
Jung-uk Kim writes:
| On Thursday 12 January 2006 07:41 pm, Doug Ambrisko wrote:
| > Scott Mitchell writes:
| > | > I did find a program
| > | > posted to one of the freebsd lists called 'amrstat' that I run
| > | > nightly.  It produces this kind of output:
| > | >
| > | > Drive 0:68.24 GB, RAID1
| > | > optimal
| > | >
| > | > If it says "degraded" it is time to fix a drive.  You just
| > | > fire up the lsi megaraid tools and find out which drive it is.
| >
| > This is probably a fairly good scheme.  Caveat is that you can have
| > an "optimal" RAID that is broken :-(
|
| That's lame.  Under what condition does it happen, do you know?

Running RAID 10, a drive was swapped and the rebuild started on the
replacement drive.  The rebuild complained about the source drive
for the mirror rebuild having read errors that couldn't be recovered.
It continued on and finished re-creating the mirror.  Then the RAID
proceeded onto a background init, which it normally does, and started
failing that and re-starting the background init over and over again.
The box changed the RAID from degraded to optimal when the rebuild
completed (with errors).  Doing a dd of the entire RAID logical device
returned an error at the bad sector since it couldn't recover that.
The RAID controller reported an I/O error and still left the RAID as
optimal.

We reported this and were told that's the way it is designed :-(
Probably the spec. is defined by whatever the RAID controller happens
to do versus what makes sense :-(

So far this has only happened once.  Changing firmware did not help.

Doug A.

PS. sorry for the null email before this.  Hit the wrong key.
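The dd read-through mentioned above can be wrapped in a small check. This is a sketch only: `full_read_ok` is a hypothetical helper name, the block size is an arbitrary choice, and the device name in the usage note is just an example of an amr(4) logical disk.

```shell
#!/bin/sh
# Read a device or file end to end and report whether every block
# could be read.  dd exits non-zero when it hits an unrecoverable
# read error, which is the case the array state does not show.
# full_read_ok is a hypothetical helper name.
full_read_ok() {
    if dd if="$1" of=/dev/null bs=65536 2>/dev/null; then
        echo "$1: read to the end without I/O errors"
    else
        echo "$1: read failed, do not trust the reported array state" >&2
        return 1
    fi
}

# Usage (example device name):
#   full_read_ok /dev/amrd0
```

A full read takes a long time on a large array and competes with normal I/O, so it fits better in a maintenance window than in a nightly job.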
Re: 6.0 on Dell 1850 with PERC4e/DC RAID?
Jung-uk Kim writes:
[ Charset euc-kr unsupported, skipping... ]
Re: 6.0 on Dell 1850 with PERC4e/DC RAID?
Scott Mitchell writes:
| On Fri, Jan 06, 2006 at 10:35:46AM -0500, Vivek Khera wrote:
| >
| > On Jan 5, 2006, at 5:41 PM, Scott Mitchell wrote:
| >
| > >I may be getting a new Dell PE1850 soon, to replace our ancient CVS server
| > >(still running 4-STABLE). The new machine will ideally run 6.0 and have a
| > >PERC4e/DC RAID card - the one with battery-backed cache. This is listed as
| >
| > I have an 1850 with the built-in PERC 4e/Si since all I needed was the
| > RAID1 mirror of the internal drives. It works extremely well, and
| > the speed is quite good.
|
| We'll only be mirroring the internal drives too for now - the 4e/DC seems
| to be the only RAID option on the 1850 with battery-backed cache, and
| doesn't cost much more for the extra peace-of-mind.
|
| > As for notices of when the drives go bad, under 4.x I've had disk
| > failures with the amr driver (different PERC cards) and not gotten
| > any such notices in the syslog that I recall.
|
| That's a pity. Maybe Doug was thinking of one of the aac(4) based PERC
| cards? Still, something I can run out of cron to check the array status
| should be fine.

Are you referring to this Doug? The Linux ioctl shim requires one file
that hasn't been committed yet. Scott L. & ps have it. I may commit it
now that I'm back. This lets all of the Dell/LSI Linux tools run on
FreeBSD, including the firmware update tool. The caveat is that, with the
driver re-do, it seems that certain things in the ioctl path cause the
firmware to lock up. I haven't been around enough to help with that
problem. I have a binary that locks it up pretty quickly.

Most of the existing monitoring tools have bugs. The Linux tools tend
to be better, but the last copy of MegaMon leaked shared memory and then
quit. We have a tool at work but it is encumbered so we can't give it out.

| > I did find a program
| > posted to one of the freebsd lists called 'amrstat' that I run
| > nightly. It produces this kind of output:
| >
| > Drive 0:68.24 GB, RAID1 io> optimal
| >
| > If it says "degraded" it is time to fix a drive. You just fire up
| > the lsi megaraid tools and find out which drive it is.

This is probably a fairly good scheme. Caveat is that you can have
an "optimal" RAID that is broken :-(

On another note, ipmi is pretty good for remotely monitoring these boxes,
and you can run the Dell SOL proxy tool for Linux on FreeBSD, then set up
the BIOS on the serial port and connect the serial port to the BMC/LAN.
FWIW, I've been working on an openipmi compatible driver. It basically
works for a bunch of programs that I've tested with, as long as they are
compiled with a correct ioctl file.

Doug A.
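The nightly amrstat check mentioned above could be wrapped roughly as follows. This is only a sketch: the function name check_raid_status is made up, it keys on nothing more than the words "optimal" and "degraded" shown in the sample output, and the real amrstat output may contain other states that it does not know about.

```shell
#!/bin/sh
# Read amrstat-style text on stdin and complain if any line says
# "degraded". check_raid_status is a hypothetical name for this sketch;
# the matching is based only on the sample output quoted in the thread.
check_raid_status() {
    if grep -i 'degraded' >/dev/null; then
        echo "RAID degraded: check the drives with the controller tools"
        return 1
    fi
    # As noted in the thread, "optimal" alone does not prove that
    # every sector of the volume can be read.
    echo "RAID reports optimal"
    return 0
}

# Example use from cron (how amrstat is invoked is an assumption):
#   amrstat | check_raid_status || echo "look at the RAID on this box"
```

Because of the broken-but-optimal case, a clean result here is a necessary check rather than a sufficient one.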
Re: scsi card recommendation
Rutger Bevaart writes:
| i've got about 15 Dell 1750, 1850 and 2850 boxes that use AMR-based SCSI
| RAID controllers. i can manage these perfectly using emoore's port of the
| amrcontrol and MEGAMGR tools, under 5.x only after adding the 4x-compat
| ports package.

Be very careful. Use of those utilities can result in random problems.
I've had to remove all usage of any of that stuff from our systems.
We've had other programs on the system core dump etc. :-(

Doug A.
Re: installing FreeBSD on partition of a SATA Intel 865 raid0 volume
Saulius Menkevicius writes:
| Doug White wrote:
| >On Fri, 25 Mar 2005, James Wood wrote:
| >>How do you setup FreeBSD on a partition of a raid0 volume? I downloaded
| >>FreeBSD 5.3, then made a 60 GB partition in my raid volume, and then went to
| >>boot from the CD. It did not see any raid volumes, it just sees two HDs.
| >
| >FreeBSD does not recognize the Adaptec HostRAID metadata so you will not
| >be able to use RAID volumes configured with the HostRAID BIOS. You can use
| >atacontrol to create FreeBSD software RAIDs, however.
|
| Actually there is an unofficial patch to support the RAID0 mode in
| ICH5-R.
| http://www.ambrisko.com/doug/ata/ contains the patch, and I used it
| without problems for half a year in
| an i865pe/ich5-r configuration with RAID0 disk setup. (That was an older
| version of the patch, though).

FYI, Saulius ported the Intel RAID meta data to 5.3 a while ago. I put
it up at:
	http://www.ambrisko.com/doug/ata/5.3-intel-raid-meta-data.patch

FYI2, I should have a newer version of my ata patches since I finally
figured out a bug I was having with an NFS root mount. The nfs_syncer
spin-loops on bp queues and expects an interrupt routine to break it
out on the side. Then soft updates tends to re-schedule work even when
there is no work to do. These aren't really part of the ata stuff but
are required to make it work better.

Lastly, I've made some more RAID robustness changes to deal with some
error conditions we've caused here in testing. I need to roll them out
of our dev. tree into my release tree. Then I should finally start
merging the HW support into the FreeBSD 4.X tree, now that I know the
spin-loop bugs I was seeing weren't something wacky in my code.

Doug A.
Re: VMware, AIO, what's up?
Josef Karthauser writes:
| On Fri, Mar 04, 2005 at 08:00:17PM -0800, Doug Ambrisko wrote:
| >
| > | I have aio.ko loaded and so it isn't that, and it worked a few days ago.
| > | I'm scratching my head and could really do with a clue stick. Will
| > | someone please throw me one?
| >
| > You need to run the
| > vmware-any-any-update
| > patch on the vmware binary to make it work with the new Linux libs.
| > You can google to find it:
| >
| > a21p% ./update vmware
| > Updating vmware ... VMware Workstation 2.0.4 (build-1142), now patched
| > a21p%
| >
|
| I've downloaded one from http://ftp.cvut.cz/vmware/, but it doesn't
| work:
|
| genius% /tmp/vmware-any-any-update89/update vmware
| /usr/local/lib/vmware/bin/vmware
| Updating /usr/local/lib/vmware/bin/vmware ... failed
| Cannot open /usr/local/lib/vmware/bin/vmware: m
| genius% /tmp/vmware-any-any-update89/update vmware /usr/local/lib/vmware/bin
| Updating /usr/local/lib/vmware/bin ... failed
| Cannot open /usr/local/lib/vmware/bin: m

Julian said its args. changed from mine:

jules# ./update vmware /usr/local/lib/vmware/bin/vmware
Updating /usr/local/lib/vmware/bin/vmware ... VMware Workstation 2.0.4 (build-1142), now patched
jules#

Doug A.
Re: VMware, AIO, what's up?
Josef Karthauser writes:
| I'm confused. VMWare (3) no longer works for me. I upgraded my base
| linux to base_linux_8 (from 6 I think) and now I get:
|
| VMware PANIC: (ide0:0) NOT_IMPLEMENTED F(831):712
| VMware PANIC: (VMX) AIO: NOT_IMPLEMENTED F(831):712
|
| I have aio.ko loaded and so it isn't that, and it worked a few days ago.
| I'm scratching my head and could really do with a clue stick. Will
| someone please throw me one?

You need to run the
	vmware-any-any-update
patch on the vmware binary to make it work with the new Linux libs.
You can google to find it:

a21p% ./update vmware
Updating vmware ... VMware Workstation 2.0.4 (build-1142), now patched
a21p%

Doug A.
Re: "ioctl DIOCSMBR: Operation not permitted" from "boot0cfg -s 1"
David Wolfskill writes:
| freebeast(5.4-P)[1] sudo boot0cfg -s 1 -v ad0
| Password:
| boot0cfg: /dev/ad0: ioctl DIOCSMBR: Operation not permitted
| freebeast(5.4-P)[2]

You might try:
	sysctl kern.geom.debugflags=16

Doug A.
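For context, geom(4) describes the 16 (0x10) bit of kern.geom.debugflags as allowing writes to providers that are already open, which is why it gets boot0cfg past the DIOCSMBR error on a disk that is in use. A rough sketch of wrapping the MBR update so the flag is only set for the duration follows. The helper names run and set_boot_slice are made up, and resetting the flag to 0 afterwards assumes 0 was the previous value. DRY_RUN defaults to 1 so the commands are only printed.

```shell
#!/bin/sh
# Sketch: set kern.geom.debugflags=16 only around the boot0cfg call.
# run and set_boot_slice are hypothetical helper names.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

set_boot_slice() {
    disk=$1
    slice=$2
    run sysctl kern.geom.debugflags=16
    run boot0cfg -s "$slice" -v "$disk"
    # Assumption: the flag was 0 before, so put it back to 0.
    run sysctl kern.geom.debugflags=0
}

# Example use, matching the command in the message above:
#   DRY_RUN=0; set_boot_slice ad0 1
```

Leaving the flag set permanently removes a safety net, so putting it back after the write seems the more cautious habit.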
Re: stable sata patch: panic at kernel boot (can't dump)
Dmitry Morozovsky writes:
| On Wed, 16 Feb 2005, Doug Ambrisko wrote:
|
| DA> I haven't tried using vinum with my patch set. That might be a problem.
| DA> I'm not sure if anyone has tried vinum with my patch set most people use
| DA> ata-raid if anything at all.
|
| I missed: the first machine I tried your patchset at uses vinum for all its
| life, and it works like a charm. I suppose for this case cmd649 is the problem
| but have no spare pci ATA controller to check...

That's good news and bad news. Good news that there isn't a problem with
interaction with vinum. I've been swamped at work so I haven't had a
chance to test vinum with the patch set. I recall that the cmd649 is a
problem controller in general (not specific to my patches). Does it work
okay without my patchset?

Thanks,

Doug A.
Re: stable sata patch: panic at kernel boot (can't dump)
Dmitry Morozovsky writes:
| On Wed, 16 Feb 2005, Doug Ambrisko wrote:
|
| DA> | trying to boot RELENG_4 kernel with your patches (sata_7) on our FTP I got
| DA> | kernel panic (page fault in kernel mode, pid 2, no dump possible). Hardware
| DA> | involved:
| DA> |
| DA> | [EMAIL PROTECTED]:~# grep ata /var/run/dmesg.boot
| DA> | atapci0: port 0xa000-0xa03f,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 mem 0xed10-0xed11 irq 11 at device 8.0 on pci0
| DA> | ata2: at 0x9000 on atapci0
| DA> | ata3: at 0x9800 on atapci0
| DA> | atapci1: port 0xb400-0xb40f,0xb000-0xb003,0xac00-0xac07,0xa800-0xa803,0xa400-0xa407 irq 10 at device 9.0 on pci0
| DA> | ata4: at 0xa400 on atapci1
| DA> | ata5: at 0xac00 on atapci1
| DA> | atapci2: port 0xbc00-0xbc0f at device 17.1 on pci0
| DA> | ata0: at 0x1f0 irq 14 on atapci2
| DA> | ata1: at 0x170 irq 15 on atapci2
| DA> | ad0: 238475MB [484521/16/63] at ata0-master UDMA100
| DA> | ad2: 114473MB [232581/16/63] at ata1-master UDMA100
| DA> | ad4: 76319MB [155061/16/63] at ata2-master UDMA66
| DA> | ad6: 76319MB [155061/16/63] at ata3-master UDMA66
| DA> | ad8: 57241MB [116301/16/63] at ata4-master UDMA100
| DA> |
| DA> | Kernel panicked just after sio0/sio1, where basic RELENG_4 starts ata channel
| DA> | probes. No serial console at the moment, alas.
| DA> |
| DA> | Unfortunately I can't bring this machine out of service for long time; however,
| DA> | we can survive occasional reboots/crashes. What other info can I provide to
| DA> | debug this?
| DA>
| DA> I'd like some clarification. Does the system boot sometimes and other times
| DA> it doesn't? Once the system is up does it stay up for a while? It doesn't
| DA> seem like you are using RAID. I have a couple more ata bug fixes that
| DA> I need to roll into another patchset. It fixes a bug in which DMA transfers
| DA> have not been cancelled when the controller is reset. I fixed another
| DA> panic situation in version 8 that happens on boot if you have a bad sector
| DA> at the beginning of the drive. I'd wait for version 9. I should be able
| DA> to get that out later today.
|
| Sorry to not being specific enough ;-)
|
| No, the system panics reliably, just after sio initializing (for me it seems
| ata drives probes phase). I did not use hardware RAID, I use vinum over these 5
| drives.
|
| Without the patchset system stays up for months acting as ftp/cvsupd/nfsd
| server without any single issue.

You are not using any SATA drives and don't have any SATA adapters,
correct? I haven't tried using vinum with my patch set. That might be a
problem. I'm not sure if anyone has tried vinum with my patch set; most
people use ata-raid if anything at all.

I'm not sure if I'll have time today to set up vinum to test with.

If you are not using SATA or ata-raid you will only see some minimal
advantages with this patch set.

Thanks,

Doug A.
Re: stable sata patch: panic at kernel boot (can't dump)
Dmitry Morozovsky writes:
| Dear Doug,
|
| trying to boot RELENG_4 kernel with your patches (sata_7) on our FTP I got
| kernel panic (page fault in kernel mode, pid 2, no dump possible). Hardware
| involved:
|
| [EMAIL PROTECTED]:~# grep ata /var/run/dmesg.boot
| atapci0: port 0xa000-0xa03f,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 mem 0xed10-0xed11 irq 11 at device 8.0 on pci0
| ata2: at 0x9000 on atapci0
| ata3: at 0x9800 on atapci0
| atapci1: port 0xb400-0xb40f,0xb000-0xb003,0xac00-0xac07,0xa800-0xa803,0xa400-0xa407 irq 10 at device 9.0 on pci0
| ata4: at 0xa400 on atapci1
| ata5: at 0xac00 on atapci1
| atapci2: port 0xbc00-0xbc0f at device 17.1 on pci0
| ata0: at 0x1f0 irq 14 on atapci2
| ata1: at 0x170 irq 15 on atapci2
| ad0: 238475MB [484521/16/63] at ata0-master UDMA100
| ad2: 114473MB [232581/16/63] at ata1-master UDMA100
| ad4: 76319MB [155061/16/63] at ata2-master UDMA66
| ad6: 76319MB [155061/16/63] at ata3-master UDMA66
| ad8: 57241MB [116301/16/63] at ata4-master UDMA100
|
| Kernel panicked just after sio0/sio1, where basic RELENG_4 starts ata channel
| probes. No serial console at the moment, alas.
|
| Unfortunately I can't bring this machine out of service for long time; however,
| we can survive occasional reboots/crashes. What other info can I provide to
| debug this?

I'd like some clarification. Does the system boot sometimes and other
times it doesn't? Once the system is up does it stay up for a while? It
doesn't seem like you are using RAID. I have a couple more ata bug fixes
that I need to roll into another patchset. It fixes a bug in which DMA
transfers have not been cancelled when the controller is reset. I fixed
another panic situation in version 8 that happens on boot if you have a
bad sector at the beginning of the drive. I'd wait for version 9. I
should be able to get that out later today.

Another thing that you might want to do is monitor dmesg for any ata/ad
errors while the system is running. Most panics happen some time after
the first error message. Also you could try looking at
/var/log/messages.

Thanks,

Doug A.
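Watching for ata/ad errors as suggested above can be reduced to a small filter. The sketch below is an assumption-heavy illustration: the function name ata_errors is made up, and the keyword list (error, timeout, fail) is a guess at what the driver prints rather than a list taken from the ata source.

```shell
#!/bin/sh
# Filter kernel log text on stdin down to lines from ataN/adN devices
# that mention an error-like keyword. ata_errors is a hypothetical
# name and the keyword list is a guess, not taken from the driver.
# The device name may follow a syslog prefix, so it is matched either
# at the start of the line or after whitespace.
ata_errors() {
    grep -E -i '(^|[[:space:]])(ata|ad)[0-9]+:.*(error|timeout|fail)'
}

# Example use:
#   dmesg | ata_errors
#   ata_errors < /var/log/messages
```

Normal probe lines such as the "ad0: 238475MB ..." entries in the dmesg output above do not match, so any output at all is worth a look.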
Re: MegaRAID 'Bad Slot' Kernel message and crash.
Tony Byrne writes:
| Basically, after some amount of uptime the kernel will emit a "amr0:
| Bad slot x completed" message and pretty soon after this the box goes into a
| partially unresponsive state forcing us to reboot it. So far the only
| thing triggering the problem is the nightly jobs, where the amount of
| IO is higher than during the day.
|
| Before deployment, we tested the box with 5.3-STABLE and managed to
| trigger the problem twice. This forced us to try 4.10-STABLE which
| was fine in testing and for a number of weeks after deployment.
| However, just before new year we saw our first Bad Slot and crash under
| 4.10. Since then it has happened 3 more times. We have upgraded the firmware to
| the latest version available from Intel, and if anything this has made
| the problem worse.
|
| The machine had 3 disks configured as a single RAID5 array. A fourth
| disk is configured as a hot-standby. The card is equipped with 128Mb
| of battery-backed cache. Write-back caching is enabled on the card.
| Read-ahead caching is enabled in non-adaptive mode.
|
| Is anyone else using a SRCU42X RAID card and seeing similar
| problems to ours? What about other cards supported by the amr driver?

We run RAID 10 across 4 drives at work on Dell PE2850's, which have amr
RAIDs, and no-one has reported this problem to me (which they would).
We run FreeBSD 4.10 & 5.3 on them. This is with and without our local
mods. We have the most experience with 4.10. Dell has their own firmware
version (at least enough to call it a PERC controller).

For now this is a "works for me".

Doug A.
Re: Promise TX2 SATA controllers
Jean-Francois Dockes writes:
| Just in case it may help someone (this information is not very easily
| accessible in the archives):
|
| - I have a Promise TX2 controller with a PCI ID of 0x3375105a . It works
|   for me in 4.10 by adding the new PCI ID everywhere that you'll find the
|   other/old one (0x3371105a) in the patch (see next paragraph) or kernel
|   source under dev/ata. Don't blame me if you lose your data, I will not
|   take responsibility, but this is weakly supported by the two
|   controllers appearing to be handled just the same in -current.

I added it to my local tree and it will be in the next patch set. I need
to add soft error recovery (ie. if one drive has a read error,
automatically recover from the other drive) and a little more graceful
addition of a failed drive back into the RAID. I also fixed a raid bug
in ar_rw which could lead to a panic on an I/O error.

Thanks,

Doug A.
Re: atacontrol Raid, cannot re-add member to array
Harald Schmalzbauer writes:
| I've never tried ataraid with "non-raid" controllers but I doubt that
| detach/attach would work. I asked Søren about the missing addspare in -stable
| but never got any answer.

I've added addspare and some other features in my 4.10 Release patches:
	http://www.ambrisko.com/doug/ata/ata_stable_sata_5.patch
You might want to give that a try.

Doug A.
Re: Backporting S-ATA driver SiI 3112a to FreeBSD-STABLE?
Dag-Erling Smørgrav writes:
| Doug Ambrisko <[EMAIL PROTECTED]> writes:
| > BTW a failure mode in the SATA spec. says freeze (ie. lock up the
| > system if you don't acknowledge SATA issues).
|
| Huh? Care to elaborate?

From the Serial ATA Spec. 1.0a, Section 11.1 for error handling:

  Error responses are generally classified into four categories
	Freeze
	Abort
	Retry
	Track/ignore
  The error handling responses described in this section are not
  comprehensive and are included to cover specific known error
  scenarios as well as to illustrate typical error control and
  recovery actions. This section is therefore descriptive and
  supplemental to the error reporting interface defined in section 10
  and implementations may vary in their internal error recovery and
  control actions.

  For the most severe error conditions in which state has been
  critically perturbed in a way that it is not recoverable, the
  appropriate error response is to freeze and rely on a reset or
  similar operation to restore all necessary state to return to
  normal operation.

I have seen freeze result in system lock-ups in which an NMI can't
break into the debugger etc. With the Promise SATA cards, if I have
interrupts enabled and the interrupt handler checks whether or not a
drive has left, then the system lock-ups go away. For the Intel 6300ESB
I need to poll the SATA serror register to look for SATA errors or the
system will lock up. The way I generate an error is to either power off
the SATA drive or pull the cable while the system is running. I'm
running with ata-raid. Unfortunately the Intel parts don't interrupt on
a SATA condition so I have to poll it. I need to add a timeout to check
for drives coming or going; then it should work close to how the
Promise controller works. The Intel 6300ESB does indeed have both
sstatus and serror registers even though they only claim the serror is
there. The sstatus register says whether or not a drive is there.

Most of the existing ATA code will just bang on the controller, and if
a SATA error happens it is ignored and eventually just doing an inb or
outb to the controller will lock up the system. I did initial
instrumentation down to the level of inb/outb. This wasn't a lot of fun
to debug since if I messed up I'd get a system lock-up. Granted, in
normal operation this isn't a problem, but during the failure recovery
that I was testing, system lock-ups are not good.

Now I really like the SATA stuff since it is pretty easy to implement
kernel support for hot plug like USB drives etc.

Doug A.
Re: Problem with dc-nics 10,11
Holger Kipp writes:
| I have a little problem with dc10, dc11. I use three quad dc cards,
| so far from dc0 up to dc8 with no problems.
|
| All (dc0 to dc11) are displayed correctly with pciconf and with ifconfig.
| The trouble is with dc10 and dc11 that they don't send any data out and
| also don't react to arp requests etc. - at least using tcpdump won't show
| anything coming in or going out.
| Monitoring from an external system, this is the same. According to the
| blinking lights on the switch in between (also tried a hub), pings from
| the other machine (or arp-requests if I don't use a permanent entry) etc
| are sent to the correct cable.
|
| As everything works from dc0 up to dc9, I'd suspect some sort of internal
| name mismatching (like counting devices hexadecimal (dca) versus decimal
| (dc10)).
|
| This is on an older system (4.6-STABLE). If someone had a similar problem
| and it is now fixed in 4.8-STABLE, please let me know. Couldn't find a PR
| for this...

Considering that I've had 4*4 cards in prior 4.X systems, my experience
is that you have a BIOS that stops allocating resources to the cards
after a while. I have run into that before, where the BIOS stops setting
up PCI devices after a certain number or doesn't traverse all bridges.
Doing a dmesg and looking at IRQ allocation is a good starting point.
It's probably bad.

Doug A.
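One way to start on the IRQ check suggested above is to list the dc attach lines that carry no "irq N" field at all. The sketch below assumes the attach lines follow the same "... irq N at device X.Y on pciZ" layout as the atapci lines quoted earlier in this archive. The function name dc_without_irq is made up, and a card that was given a wrong or conflicting IRQ would not be caught this way.

```shell
#!/bin/sh
# Print dcN attach lines from dmesg text on stdin that have no
# "irq N" field, i.e. NICs the BIOS may have left without an
# interrupt. dc_without_irq is a hypothetical name; the line layout
# is assumed to match other PCI attach lines in dmesg.
dc_without_irq() {
    grep -E '^dc[0-9]+: .* at device ' | grep -v -E ' irq [0-9]+ '
}

# Example use:
#   dmesg | dc_without_irq
#   dc_without_irq < /var/run/dmesg.boot
```

If dc10 and dc11 show up in the output while dc0 to dc9 do not, that would point at the BIOS resource allocation rather than at the driver.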
Re: Any support for Intel ICH Watchdog ?
Don Bowman writes:
| http://www.intel.com/design/chipsets/applnots/29227301.pdf
| describes what is needed to support the watchdog (so that
| stuck servers get unstuck :)

I have some code at:
	http://www.ambrisko.com/doug/watchdog/
This implements SW & HW watch dogs. If HW exists, it links in via sysctl
patches that let the SW watch dog control the HW watch dogs if they are
in the system. This was done to permit better watch dog timeouts than
HW alone, since some hardware is very limited in the time duration, so
the HW is used to "enforce" that the SW watch dog is still running. If
the SW watch dog stops updating the HW watch dog then the machine
reboots. The other advantage is that if the SW can provide the main
watch dog service then it can cause a panic to figure out what went
wrong.

It has support for the Intel TCO watchdog and the SIS630 chipset. This
is prototype code that works. A scheme to add in sub drivers needs to be
added. When FreeBSD decides how this should work then I'll probably redo
it in that sense. Caveat is that no real HW bounds checks are done for a
valid timeout.

The sysctl interface is nice in that you can kld{load,unload} the HW
part and leave the SW part working. It also allows the watch dogs to be
disabled when you enter the debugger etc.

| Is there any support for this in freebsd stable?

Yes it runs on -stable.

Doug A.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message
Re: Aironet 350
Andrew Thompson writes:
| I have a Cisco Aironet 350 wireless card which I am using in my FreeBSD
| laptop. It works well except for the monitor mode, if I type the
| following commands the laptop will reset itself (no kernel panic, goes
| straight to the post startup).
|
| < insert card >
| ancontrol -i an0 -M 3
| ifconfig an0 up

Well, I don't do it in that order, so maybe something is busted. Are you
running X when you do this? If you run X you are not likely to see a
panic. Try to do it just from syscons. Also make sure you have kernel
core dumps set up on your machine. None of that should matter though,
since it doesn't actually go into monitor mode until it is put in
promiscuous mode.

| I am running "FreeBSD 4.7-RC #0: Wed Sep 25 11:26:38 NZST 2002" on a
| Compaq evo n1000v and the card model is a AIR-PCM352. I have googled
| but not found anything, has anyone else come across this?

I haven't heard.

Doug A.
Re: Problems with D-Link DEF-580TX??
Kal Torak writes:
| Just wondering if anyone has been able to use the new DEF-580TX
| quad port card without any problems??

No ... doesn't seem possible, but you can make it happier with these
patches. I need to test under current and then I will commit them.
Seems like the chip has a fundamental problem in that it blocks any I/O
but its own under heavy RX load. Happens under Linux, Windows, FreeBSD
etc.

The Znyx (http://www.znyx.com/) cards seem to work fine under -stable.
I have the 32 and 64 bit versions of the dc(4) versions.

Doug A.

Index: sys/pci/if_ste.c
===
RCS file: /cvs/src/sys/pci/if_ste.c,v
retrieving revision 1.14.2.5
diff -u -r1.14.2.5 if_ste.c
--- sys/pci/if_ste.c	16 Dec 2001 15:46:08 -	1.14.2.5
+++ sys/pci/if_ste.c	3 Aug 2002 03:36:06 -
@@ -45,6 +45,7 @@
 #include
 #include
 #include
+#include
 #include
@@ -415,6 +416,7 @@
 {
 	struct ste_softc	*sc;
 	struct mii_data		*mii;
+	int			i;
 
 	sc = device_get_softc(dev);
 	mii = device_get_softc(sc->ste_miibus);
@@ -425,6 +427,15 @@
 		STE_CLRBIT2(sc, STE_MACCTL0, STE_MACCTL0_FULLDUPLEX);
 	}
 
+	STE_SETBIT4(sc, STE_ASICCTL,STE_ASICCTL_RX_RESET |
+		    STE_ASICCTL_TX_RESET);
+	for (i = 0; i < STE_TIMEOUT; i++) {
+		if (!(CSR_READ_4(sc, STE_ASICCTL) & STE_ASICCTL_RESET_BUSY))
+			break;
+	}
+	if (i == STE_TIMEOUT)
+		printf("ste%d: rx reset never completed\n", sc->ste_unit);
+
 	return;
 }
 
@@ -643,6 +654,9 @@
 			ste_stats_update(sc);
 		}
 
+		if (status & STE_ISR_LINKEVENT)
+			mii_pollstat(device_get_softc(sc->ste_miibus));
+
 		if (status & STE_ISR_HOSTERR) {
 			ste_reset(sc);
 			ste_init(sc);
@@ -669,17 +683,20 @@
 	struct mbuf		*m;
 	struct ifnet		*ifp;
 	struct ste_chain_onefrag	*cur_rx;
-	int			total_len = 0;
+	int			total_len = 0, count=0;
 	u_int32_t		rxstat;
 
 	ifp = &sc->arpcom.ac_if;
 
-again:
+	while((rxstat = sc->ste_cdata.ste_rx_head->ste_ptr->ste_status)
+	      & STE_RXSTAT_DMADONE) {
+		if ((STE_RX_LIST_CNT - count) < 3) {
+			break;
+		}
 
-	while((rxstat = sc->ste_cdata.ste_rx_head->ste_ptr->ste_status)) {
 		cur_rx = sc->ste_cdata.ste_rx_head;
 		sc->ste_cdata.ste_rx_head = cur_rx->ste_next;
-
+
 		/*
 		 * If an error occurs, update stats, clear the
 		 * status word and leave the mbuf cluster in place:
@@ -730,29 +747,9 @@
 		/* Remove header from mbuf and pass it on. */
 		m_adj(m, sizeof(struct ether_header));
 		ether_input(ifp, eh, m);
-	}
-
-	/*
-	 * Handle the 'end of channel' condition. When the upload
-	 * engine hits the end of the RX ring, it will stall. This
-	 * is our cue to flush the RX ring, reload the uplist pointer
-	 * register and unstall the engine.
-	 * XXX This is actually a little goofy. With the ThunderLAN
-	 * chip, you get an interrupt when the receiver hits the end
-	 * of the receive ring, which tells you exactly when you
-	 * you need to reload the ring pointer. Here we have to
-	 * fake it. I'm mad at myself for not being clever enough
-	 * to avoid the use of a goto here.
-	 */
-	if (CSR_READ_4(sc, STE_RX_DMALIST_PTR) == 0 ||
-		CSR_READ_4(sc, STE_DMACTL) & STE_DMACTL_RXDMA_STOPPED) {
-		STE_SETBIT4(sc, STE_DMACTL, STE_DMACTL_RXDMA_STALL);
-		ste_wait(sc);
-		CSR_WRITE_4(sc, STE_RX_DMALIST_PTR,
-			vtophys(&sc->ste_ldata->ste_rx_list[0]));
-		sc->ste_cdata.ste_rx_head = &sc->ste_cdata.ste_rx_chain[0];
-		STE_SETBIT4(sc, STE_DMACTL, STE_DMACTL_RXDMA_UNSTALL);
-		goto again;
+
+		cur_rx->ste_ptr->ste_status = 0;
+		count++;
 	}
 
 	return;
@@ -836,11 +833,9 @@
 	void			*xsc;
 {
 	struct ste_softc	*sc;
-	struct ste_stats	stats;
 	struct ifnet		*ifp;
 	struct mii_data		*mii;
-	int			i, s;
-	u_int8_t		*p;
+	int			s;
 
 	s = splimp();
 
@@ -848,24 +843,23 @@
 	ifp = &sc->arpcom.ac_if;
 	mii = device_get_softc(sc->ste_miibus);
 
-	p = (u_int8_t *)&stats;
-
-	for (i = 0; i < sizeof(stats); i++) {
-		*p = CSR_READ_1(sc, STE_STATS + i);
-		p++;
-	}
-
-	ifp->if_collisions += stats.ste_singl
Re: sis0: incorrect mac address
W. Desjardins writes:
| Hello,
|
| on 3 out of 4 servers just installed, I get this when looking at ifconfig:
|
| sis0: flags=8843 mtu 1500
| inet 66.28.74.109 netmask 0xffe0 broadcast 66.28.74.127
| inet6 fe80::d483:b781:285a:6ea1%sis0 prefixlen 64 scopeid 0x1
| ether 00:00:00:00:00:00
| NOTE:--->^
| media: Ethernet autoselect (100baseTX )
| status: active
|
| these machines are due for production, but without valid mac's, they cant
| talk to each other.
|
| systems are running 4.5 RELEASE with custom kernel (GENERIC had same
| results). The motherboard is an asus cusi-fx sis socket 370 with the sis
| 630e onboard fast ethernet chipset.
|
| I have 7 more of these exact same machines with most also running 4.5R
| fine and showing normal mac addresses. normally I run stable on all my
| machines, but I have been bringing them up to 4.5R to get them all in sync
| with each other since they are all identical.
|
| has anyone had any problems with the recent versions of this motherboard
| or am I looking at a few bad chipsets?

Well, you are dealing with an obsolete board. They may have built some
with the 630ET chipset, which is used on the ASUS TUSI motherboards.
Here is a patch that fixes 630ET support in -stable (already fixed in
-current). Note that the TUSI and CUSI boards look exactly the same
except for the voltage regulator. We have a bunch of the newer TUSI
boards here.

If this patch doesn't work, can you add a printf to dump the
"sc->sis_rev" value?

Thanks,

Doug A.
Index: if_sisreg.h
===
RCS file: /cvs/src/sys/pci/if_sisreg.h,v
retrieving revision 1.1.4.9
diff -u -r1.1.4.9 if_sisreg.h
--- if_sisreg.h	9 Feb 2002 23:02:40 -	1.1.4.9
+++ if_sisreg.h	19 Feb 2002 03:49:55 -
@@ -369,7 +369,7 @@
 #define SIS_REV_630E		0x0081
 #define SIS_REV_630S		0x0082
 #define SIS_REV_630EA1		0x0083
-#define SIS_REV_630ET		0x0083
+#define SIS_REV_630ET		0x0084
 #define SIS_REV_635		0x0090
 
 /*
Index: if_sis.c
===
RCS file: /cvs/src/sys/pci/if_sis.c,v
retrieving revision 1.13.4.19
diff -u -r1.13.4.19 if_sis.c
--- if_sis.c	9 Feb 2002 23:02:40 -	1.13.4.19
+++ if_sis.c	19 Feb 2002 03:49:55 -
@@ -919,11 +919,11 @@
 		 */
 		if (sc->sis_rev == SIS_REV_630S ||
 		    sc->sis_rev == SIS_REV_630E ||
-		    sc->sis_rev == SIS_REV_630EA1 ||
-		    sc->sis_rev == SIS_REV_630ET)
+		    sc->sis_rev == SIS_REV_630EA1)
 			sis_read_cmos(sc, dev, (caddr_t)&eaddr, 0x9, 6);
 
-		else if (sc->sis_rev == SIS_REV_635)
+		else if (sc->sis_rev == SIS_REV_635 ||
+			 sc->sis_rev == SIS_REV_630ET)
 			sis_read_mac(sc, dev, (caddr_t)&eaddr);
 		else
 #endif
@@ -937,13 +937,6 @@
 	 */
 	printf("sis%d: Ethernet address: %6D\n", unit, eaddr, ":");
 
-	/*
-	 * From the Linux driver:
-	 * 630ET : set the mii access mode as software-mode
-	 */
-	if (sc->sis_rev == SIS_REV_630ET)
-		SIS_SETBIT(sc, SIS_CSR, SIS_CSR_ACCESS_MODE);
-
 	sc->sis_unit = unit;
 	callout_handle_init(&sc->sis_stat_ch);
 	bcopy(eaddr, (char *)&sc->arpcom.ac_enaddr, ETHER_ADDR_LEN);