Re: Vinum deprecation for FreeBSD 14 - are there any remaining Vinum users?

2021-04-01 Thread Doug Ambrisko
On Thu, Apr 01, 2021 at 11:20:44AM +0300, Lev Serebryakov wrote:
| On 01.04.2021 2:39, Doug Ambrisko wrote:
| 
| > | > I can only state that I use it only occasionally, and that when I do, I
| > | > have had no problems with it. I'm glad that it's there when I need it.
| > |
| > | Thanks for the reply. Can you comment on your use cases - in
| > | particular, did you use mirror, stripe, or raid5? If the first two
| > | then gmirror, gconcat, gstripe, and/or graid are suitable
| > | replacements.
| > |
| > | I'm not looking to deprecate it just because it's old, but because of
| > | a mismatch between user and developer expectations about its
| > | stability.
| > 
| > It would be nice if graid got full support for RAID5, at least.  I'm not
| > sure how much the other levels that the man page lists as not fully
| > supported (RAID4, RAID5, RAID5E, RAID5EE, RAID5R, RAID6, RAIDMDF) are
| > used.  I started to hack in full RAID5 support and tried to avoid writes
| > if the members didn't change.  The lack of it limits our VROC support.
|   My experience, as co-author and maintainer of `sysutils/graid5`,
|   shows that it is a very non-trivial task. It contains many subtle
|   problems.
| 
|   `graid5` still has some undiscovered problems, and I don't think it is
|   worth fixing in 2021, when we have had ZFS for many years.

The only advantages I see in graid supporting RAID5 would be better support
for VROC, and that people like RAID5.  I don't like RAID5 for SSDs since it
adds to write amplification, but people like it.  RAID5 had terrible write
performance in Linux with concurrent I/O.  I wanted to see if FreeBSD could
do better.
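To make the write-amplification point concrete: RAID5 parity is the XOR of the data blocks in a stripe, so a write smaller than a full stripe has to read the old data and the old parity before it can write the new data and the new parity. The C sketch below shows only that arithmetic; the function names are made up for illustration and are not taken from graid or graid5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full-stripe parity: byte-wise XOR of every data block in the stripe. */
static void
raid5_full_parity(const uint8_t *const data[], size_t ndata, size_t len,
    uint8_t *parity)
{
	assert(ndata > 0);
	for (size_t i = 0; i < len; i++) {
		uint8_t p = 0;

		for (size_t d = 0; d < ndata; d++)
			p ^= data[d][i];
		parity[i] = p;
	}
}

/*
 * Small-write (read-modify-write) update.  The caller must already have
 * read old_data and the current parity from disk; afterwards it writes
 * new_data and the updated parity: two reads plus two writes for what
 * the user sees as a single block write.
 */
static void
raid5_rmw_parity(const uint8_t *old_data, const uint8_t *new_data,
    uint8_t *parity, size_t len)
{
	for (size_t i = 0; i < len; i++)
		parity[i] ^= old_data[i] ^ new_data[i];
}
```

A one-block write therefore costs two reads and two writes, while a full-stripe write can compute parity directly with raid5_full_parity() and skip the reads. On SSDs the extra parity write is what adds to the write amplification.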

Intel seems to be pushing VMD; we recently had a FreeBSD user who needed
newer VMD support because they couldn't turn it off in the BIOS.  VMware
doesn't support VROC.  We support it a bit, in that VMD allows graid to
access the drives and graid deals with the Intel metadata.  It doesn't read
the info from the EFI runtime.  So RAID 0, 1 and 10 should work.  It would
be nice if someone tried installing FreeBSD on a configuration that works
with Linux.  No-one has asked for it, so it doesn't seem very important.

Thanks,

Doug A.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Vinum deprecation for FreeBSD 14 - are there any remaining Vinum users?

2021-03-31 Thread Doug Ambrisko
On Fri, Mar 26, 2021 at 10:22:53AM -0400, Ed Maste wrote:
| On Thu, 25 Mar 2021 at 15:09, Chris  wrote:
| >
| > I can only state that I use it only occasionally, and that when I do, I
| > have had no problems with it. I'm glad that it's there when I need it.
| 
| Thanks for the reply. Can you comment on your use cases - in
| particular, did you use mirror, stripe, or raid5? If the first two
| then gmirror, gconcat, gstripe, and/or graid are suitable
| replacements.
| 
| I'm not looking to deprecate it just because it's old, but because of
| a mismatch between user and developer expectations about its
| stability.

It would be nice if graid got full support for RAID5, at least.  I'm not
sure how much the other levels that the man page lists as not fully
supported (RAID4, RAID5, RAID5E, RAID5EE, RAID5R, RAID6, RAIDMDF) are
used.  I started to hack in full RAID5 support and tried to avoid writes
if the members didn't change.  The lack of it limits our VROC support.

Thanks,

Doug A.


Re: Cisco 12G SAS RAID support (FreeBSD 12.1-RELEASE) ?

2019-11-08 Thread Doug Ambrisko
On Tue, Nov 05, 2019 at 09:44:36PM +0100, Miroslav Lachman wrote:
| Chris Ross wrote on 11/05/2019 21:19:
| > On Tue, Nov 05, 2019 at 08:20:15PM +0100, Miroslav Lachman wrote:
| >> Chris Ross wrote on 11/05/2019 19:34:
| >>> Hello.  I have a Cisco UCS C220-M5 with a RAID controller.  It calls itself
| >>> "Cisco 12G Modular Raid Controller with 2GB cache", PPID UCSC-RAID-M5.
| >>> Looking at the CIMC, it shows the PCI vendor/device ids 1000:0014, which
| >>> looks to be an LSI MegaRAID Tri-Mode SAS3516.  It looks like this should
| >>> be supported by the mpr(4) driver, but it doesn't seem to recognize it
| >>> at boot time.
| >>
| >> Do you have mpr_load="YES" in loader.conf?
| >> Or for ISO booting you can manually load kernel modules at boot prompt.
| > 
| > I dropped to boot prompt in ISO boot, and entered 'mpr_load="YES"'.
| > 
| > I tried "load", but wasn't able to divine how to load the mpr module with
| > that.  Is that needed, or should 'mpr_load="YES"' have accomplished the
| > desired result?
| 
| mpr_load="YES" goes to /boot/loader.conf
| 
| If you need to load mpr manually at the boot prompt, I am not sure if it
| should be:
| load mpr
| or
| load mpr.ko
| or the full path
| load /boot/kernel/mpr.ko

This should be an mrsas card and not an HBA!  mrsas supports all current
UCS RAID cards ... and the next unreleased UCS system :-)  You might need
the one in -current for that.  I'm not sure what is in 12.1.

Doug A.


Re: Hangs with mrsas?

2016-03-22 Thread Doug Ambrisko
On Tue, Mar 22, 2016 at 04:09:48PM -0400, Garrett Wollman wrote:
| < said:
| 
| > You could try:
| > https://people.freebsd.org/~ambrisko/mrsas.patch
| 
| I take it that the important part of this patch is changing the DMA
| tag and scatter/gather setup to allow 64-bit addresses?  (Why would
| the original driver have been limited to 32-bit addresses?  It's quite
| new hardware!)

Yes, primarily ... there are some other things, such as letting the OS set
things up, especially in the ioctl path, since userland probably won't
set up a proper SG list for the kernel.  The DMA address space for the
card was limited to 256K in 32-bit address space, so it didn't take
much to fragment it, and things could fail or have to wait to get
memory.  On initial boot things worked "okay", but after some run time
with our appliance (we run 64-bit) memory allocations would have issues.
We found this was made worse with RAID cards that didn't have cache.
I assume no cache makes I/O operations take longer and so ties
up memory longer.  With the same SW running on cards with cache we didn't
see these issues, so I assume those I/Os completed fast enough not to hold
onto memory for very long.  With these changes our appliances without
RAID cache run faster and don't run into "strange" issues any more.
We run in RAID 10 mode.
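A rough userland model of why the 32-bit limit hurts on a 64-bit appliance: a buffer that lies above 4 GB, or straddles that boundary, cannot be handed to the card directly and has to be bounced through limited low memory, which is where the waiting and failures come from. DMA_MAXADDR_32BIT and DMA_MAXADDR_64BIT are local stand-ins for busdma's BUS_SPACE_MAXADDR_32BIT and BUS_SPACE_MAXADDR, and the struct and functions are hypothetical; this is not the mrsas patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins modelled on busdma's address limits (assumption). */
#define DMA_MAXADDR_32BIT	UINT64_C(0xffffffff)
#define DMA_MAXADDR_64BIT	UINT64_MAX

struct dma_seg {
	uint64_t addr;
	uint64_t len;
};

/* True if the whole segment lies at or below the tag's address limit. */
static bool
seg_fits(const struct dma_seg *seg, uint64_t lowaddr)
{
	assert(seg->len > 0);
	uint64_t last = seg->addr + seg->len - 1;

	/* 'last >= addr' rejects segments that wrap around the address space. */
	return (last >= seg->addr && last <= lowaddr);
}

/*
 * Count the segments that would have to be bounced (copied into
 * scarce low memory) before the card could DMA to them.
 */
static size_t
segs_needing_bounce(const struct dma_seg *segs, size_t nsegs, uint64_t lowaddr)
{
	size_t n = 0;

	for (size_t i = 0; i < nsegs; i++)
		if (!seg_fits(&segs[i], lowaddr))
			n++;
	return (n);
}
```

With the 64-bit limit nothing needs bouncing, which is the effect of widening the DMA tag and scatter/gather setup as discussed above.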

It also adds RAID card event messages to dmesg.

On the plus side this code exposed a VM bug in 9.2 for us!

There is still a bug: with a card without cache, if I send lots of
management commands quickly to reconfigure the RAID, the driver
reports that the firmware had an OCR issue and never recovers.  If I put
a sleep 1 after each command then it is okay.  I need to
try this again and dump the term log to see if the firmware
will give me a clue.  With the cards that we are currently using,
the RAID cache is an option, so the only thing I'm changing is
the HW and not the firmware.  However, the firmware seems to flip
itself into a different device when I add or remove cache.

Thanks,

Doug A.


Re: Installer on serial-console-only-embedded system

2013-08-12 Thread Doug Ambrisko
On Mon, Aug 12, 2013 at 01:53:15PM +, Teske, Devin wrote:
| sysinstall had the ability to allow you to muck with /etc/ttys before
| rebooting to your installed OS.
|
| This functionality is coming back slowly.
|
| In 9.2-R you will be able to (somehow) bow out of the installation process
| after it's complete (e.g., "Ctrl-C" ??) and then run bsdconfig -- invoking
| the "TTYs" module, giving you a chance to change the settings before you
| reboot from your newly installed system.
|
| Tighter Integration will follow in the years to come... but replacing a
| tool that had a 15-year run which did _all_ of this stuff, is/was not an
| overnight project. Rather, it's a journey!

I had also made changes to sysinstall so that, if it detected a boot with
-h, it made the /etc/ttys etc. changes to the installed system
automatically.  It would be good to see this come back.  I'm not sure if
Robert's official changes did that.  It's fairly easy to check what the
console device is and then do the right thing.
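A minimal sketch of such a check, assuming the installer reads the kern.console sysctl (for example with sysctlbyname(3)) and that its value lists the active consoles, then a slash, then the available ones, e.g. "ttyu0,/ttyu0,ttyv0,". The function name is hypothetical; if it returns true, the installer would enable the matching ttyu line in the new system's /etc/ttys.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if a serial console (ttyuN) is among the active consoles.
 * Assumes the kern.console format "active,.../available,...", e.g.
 * "ttyu0,/ttyu0,ttyv0,"; only the part before '/' is examined.
 */
static bool
active_console_is_serial(const char *kern_console)
{
	assert(kern_console != NULL);

	const char *slash = strchr(kern_console, '/');
	const char *end = slash != NULL ? slash :
	    kern_console + strlen(kern_console);
	const char *p = kern_console;

	while (p < end) {
		/* Entries are separated by ','; the last one may lack it. */
		const char *comma = memchr(p, ',', (size_t)(end - p));
		size_t n = comma != NULL ? (size_t)(comma - p) : (size_t)(end - p);

		if (n >= 4 && strncmp(p, "ttyu", 4) == 0)
			return (true);
		if (comma == NULL)
			break;
		p = comma + 1;
	}
	return (false);
}
```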

Thanks,

Doug A.


Re: mfi panic on recursed on non-recursive mutex MFI I/O lock

2012-11-09 Thread Doug Ambrisko
On Fri, Nov 09, 2012 at 05:06:03PM -, Steven Hartland wrote:
| 
| - Original Message - 
| From: "Steven Hartland"
| ...
| >I've just had another panic, trace below, but it doesn't seem to be related
| >to my changes so I'd appreciate your feedback on them as they are for now.
| >
| >While the lock patch fixes the problems I've seen, it's not clear to me
| >why mfi_tbolt_reset is acquiring the lock and hence requiring
| >mfi_process_fw_state_chg_isr to jump through hoops to ensure locking
| >around queue manipulation is done correctly. Given what it's doing
| >(resetting the entire adapter) I wouldn't be surprised if it should
| >really be acquiring the config lock.
| >
| >Other things I've noticed / questions
| >* Should mfi_abort sleep even if its call to mfi_mapcmd fails?
| >* Should mfi_get_controller_info really ignore the error from mfi_mapcmd?
| >* Do these controllers not support non-512-byte requests? Currently
| >all syspd requests are done assuming 512-byte sectors, which the disk may
| >not have. This could reduce performance or potentially break completely
| >if the firmware isn't translating correctly under the surface.
| >
| >Anyway the new panic manually transcribed is:-
| >panic: Bad link elm 0xff0069b0fc0 next->prev != elm
| >...
| >mfi_tbolt_get_cmd()
| >mfi_build_mpt_pass_thru()
| >mfi_tbolt_build_mpt_cmd()
| >mfi_tbolt_send_frame()
| >bus_dmamap_load()
| >mfi_mapcmd()
| >mfi_startio()
| >mfi_syspd_strategy()
| >g_disk_start()
| >g_io_schedule_down()
| >g_down_proc_body()
| >fork_exit()
| >fork_trampoline()
| >
| >Looks like mfi_cmd_tbolt_tqh has become corrupt somehow, but as far as I
| >can tell all manipulation is done using the TAILQ macros and under
| >mfi_io_lock, so it's not obvious to me at this time why this is. Any ideas?
| 
| I've gone through looking for the possible cause of this, and while there's
| nothing directly connected to the manipulation of this queue, I've found and
| fixed quite a large number of additional problems which may have been
| indirectly causing this problem.
| 
| The biggest change is to use mfi_max_cmds to limit the value stored in
| sc->mfi_max_fw_cmds, as this is used extensively throughout the driver
| for allocation and range checks, so having it inconsistently set opened up
| a large number of possible overrun errors.
| 
| The new patch attached documents all the changes in detail.
| 
| I've managed to do one test run so far which failed to reproduce any panics,
| so definitely moving in the right direction :)
| 
| The machine has now been collected for repair by the supplier but I'm going
| to try and get them to put it online for more testing over the weekend.
| 
| Given the failure rate so far if I can do another 4 runs with no panics I'd
| be happy that the majority of error conditions are working as expected.

Sounds like you have made some good progress.  I looked at your prior locking
change and it looks good.  I haven't had time to go through the queue changes
yet.
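The mfi_max_cmds change described above boils down to clamping the firmware-reported command count once, so that every allocation and range check afterwards agrees on the same limit. A sketch of that pattern; only the field name mfi_max_fw_cmds comes from the thread, driver_limit stands in for mfi_max_cmds, and the struct and functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical softc; only the field name comes from the thread. */
struct ctrl_softc {
	int mfi_max_fw_cmds;	/* command count every other path relies on */
};

/*
 * Clamp the firmware-reported command count to the driver limit once,
 * at attach time, so later allocations and range checks all agree.
 */
static void
ctrl_set_max_cmds(struct ctrl_softc *sc, int fw_reported, int driver_limit)
{
	assert(fw_reported > 0 && driver_limit > 0);
	sc->mfi_max_fw_cmds = fw_reported < driver_limit ?
	    fw_reported : driver_limit;
}

/* Range check for a command index against the clamped limit. */
static bool
ctrl_cmd_index_valid(const struct ctrl_softc *sc, int idx)
{
	return (idx >= 0 && idx < sc->mfi_max_fw_cmds);
}
```

With hypothetical values of 1008 commands reported by the firmware and a limit of 128, mfi_max_fw_cmds ends up as 128 and index 128 is rejected by every check that uses it.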

Thanks,

Doug A.


Re: mfi panic on recursed on non-recursive mutex MFI I/O lock

2012-11-06 Thread Doug Ambrisko
On Tue, Nov 06, 2012 at 12:09:42AM -, Steven Hartland wrote:
| Thanks Doug, actually just finished another test run with some more
| debugging in and I believe I've found the reason for the non-recursive
| lock and at least some of the queuing issues.
| 
| The non-recursive lock is due to the mfi_tbolt_reset calling
| mfi_process_fw_state_chg_isr with mfi_io_lock held which in turn calls
| mfi_tbolt_init_MFI_queue which tries to acquire mfi_io_lock hence
| the problem.
| 
| mfi-lock.txt attached I believe fixes this as well as what appears
| to be an invalid call to mtx_unlock(&sc->mfi_io_lock) in mfi_attach,
| which never acquires the lock as far as I can see; possibly a cut and
| paste error.

I don't seem to see the attachment.
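Since the attachment didn't come through, this is not what mfi-lock.txt does, but a common way to untangle this kind of recursion is to split the callee into a _locked variant that asserts the caller already holds the lock, plus a wrapper for callers that don't. In the kernel the assertion would be mtx_assert(&sc->mfi_io_lock, MA_OWNED); here it is a single-threaded userland model with hypothetical names so it can run anywhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of a non-recursive mutex that checks ownership. */
struct model_mtx {
	bool held;
};

static void
model_mtx_lock(struct model_mtx *m)
{
	/* A real non-recursive mtx panics here if it is already held. */
	assert(!m->held && "recursed on non-recursive mutex");
	m->held = true;
}

static void
model_mtx_unlock(struct model_mtx *m)
{
	assert(m->held);
	m->held = false;
}

static int queue_inits;	/* counts queue (re)initialisations */

/* Callee that needs the lock: it asserts ownership instead of locking. */
static void
init_queue_locked(struct model_mtx *io_lock)
{
	assert(io_lock->held);	/* kernel: mtx_assert(..., MA_OWNED) */
	queue_inits++;
}

/* Wrapper for callers that do not already hold the lock. */
static void
init_queue(struct model_mtx *io_lock)
{
	model_mtx_lock(io_lock);
	init_queue_locked(io_lock);
	model_mtx_unlock(io_lock);
}

/* A reset path that already holds the lock calls the _locked variant. */
static void
reset_adapter(struct model_mtx *io_lock)
{
	model_mtx_lock(io_lock);
	/* Firmware state-change handling would go here. */
	init_queue_locked(io_lock);
	model_mtx_unlock(io_lock);
}
```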
 
| The invalid queue problems seem to stem from the error cases of
| the calls to mfi_mapcmd, some of which call mfi_release_command which
| blindly sets cm_flags = 0 and then enqueues it on the free queue. Now
| depending on the flow of mfi_mapcmd and where the error occurs the
| command may or may not have been put on the busy queue which is going
| to cause problems.
| 
| Going to investigate this further but that's what my current theory is.
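A sketch of the kind of bookkeeping that avoids this, assuming each command records in its flags which queue it is on: the release path unlinks the command from the busy queue before putting it on the free queue, instead of clearing the flags and inserting blindly. A TAILQ element linked into two lists at once can lead to the kind of "next->prev != elm" panic reported elsewhere in this thread. The flag and function names are hypothetical, not the ones in mfi(4).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical flags recording which queue a command is on. */
#define CMD_ON_FREEQ	0x01
#define CMD_ON_BUSYQ	0x02

struct cmd {
	int flags;
	TAILQ_ENTRY(cmd) link;	/* a command is on at most one queue */
};

TAILQ_HEAD(cmd_q, cmd);

struct cmd_queues {
	struct cmd_q freeq;
	struct cmd_q busyq;
};

static void
cmd_enqueue_busy(struct cmd_queues *q, struct cmd *cm)
{
	assert((cm->flags & (CMD_ON_FREEQ | CMD_ON_BUSYQ)) == 0);
	TAILQ_INSERT_TAIL(&q->busyq, cm, link);
	cm->flags |= CMD_ON_BUSYQ;
}

/*
 * Release a command: unlink it from the busy queue first if an error
 * path left it there, and only then reset its flags and put it on the
 * free queue.
 */
static void
cmd_release(struct cmd_queues *q, struct cmd *cm)
{
	if (cm->flags & CMD_ON_BUSYQ)
		TAILQ_REMOVE(&q->busyq, cm, link);
	assert((cm->flags & CMD_ON_FREEQ) == 0);	/* catches double release */
	cm->flags = CMD_ON_FREEQ;
	TAILQ_INSERT_TAIL(&q->freeq, cm, link);
}

static struct cmd *
cmd_dequeue_free(struct cmd_queues *q)
{
	struct cmd *cm = TAILQ_FIRST(&q->freeq);

	if (cm != NULL) {
		TAILQ_REMOVE(&q->freeq, cm, link);
		cm->flags &= ~CMD_ON_FREEQ;
	}
	return (cm);
}
```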
| 
| Your patch seems quite extensive, so if you could give me a brief
| rundown of the changes, that would be most appreciated.

I'll be doing that in the commit message, which should happen today.
 
| FYI, I'm aware that the cause of my underlying issues are some
| hardware issues (likely cable or backplane related), but it does mean
| I'm in a position to test these usually rare error cases, so I want
| to make the most of it before we get the hardware swapped out.

That would be good.  It makes it easier to debug things when the hardware
shows the problem.

Thanks,

Doug A.


Re: Problem with IPMI KCS driver

2012-10-18 Thread Doug Ambrisko
On Thu, Oct 18, 2012 at 01:44:59PM +0400, Anton Yuzhaninov wrote:
| On 28.09.2012 16:48, John Baldwin wrote:
| >>kcs_wait_for_obf() at kcs_wait_for_obf+0xb6 point to
| >>>  /usr/src/sys/dev/ipmi/ipmi_kcs.c:94
| >>>
| >>> 91 while (ticks - start < MAX_TIMEOUT &&
| >>> 92 !(status & KCS_STATUS_OBF)) {
| >>> 93 DELAY(100);
| >>> 94 status = INB(sc, KCS_CTL_STS);
| >>> 95 }
| >Hummm.  I'm a bit out of ideas then.  Even the volatile change is a bug that
| >could have been confirmed (to see if volatile was preventing the compiler
| >from caching the value of 'ticks') by examining the assembly.
| >
| >Well, maybe this.  This just avoids using 'ticks' altogether and depends on
| >DELAY(100) doing what it says:
| 
| The new patch also doesn't solve my problem.
| 
| My guess was wrong. The loop in kcs_wait_for_obf() is not endless, at least
| with the last patch.
| The whole function is called in some loop, but because the loop in
| kcs_wait_for_obf() takes a lot of CPU time, the backtrace always points to
| the loop in kcs_wait_for_obf().

Yep, the IPMI local interfaces are polled, so they use a lot of CPU and
it's pretty much always going to be checking "are you done yet?"
once a command is submitted.  We have local patches here that change
the DELAY into a tsleep when the system is running.  It has the bad
side effect of making it a lot slower, but it uses far less CPU, so for
us it is a good trade-off.  One reason to put it into a loop is
so things happen in order and are not interrupted.  I guess a different
approach might be to take a "big" lock around the entire code fragment
that submits the command and gets the response.  Then the time would be
charged to the application thread running in the kernel.

We also have local changes to allow it to run in polled mode without
the kernel thread when we are dumping a kernel backtrace into the
IPMI system event log.  That's nice when a kernel core dump hasn't
worked on a remote machine but we can still see the backtrace in the SEL.
 
| This problem needs further investigation.

It might be good to instrument the code in ipmi.c where it sends a
command and then gets the status.  If that actually looks okay, then
maybe some application is doing something bad.

Doug A.


Re: IPMI hardware watchdogs Re: dell r420/r320 stable/9

2012-07-31 Thread Doug Ambrisko
On Fri, Jul 27, 2012 at 10:51:43PM +0300, Andriy Gapon wrote:
| on 27/07/2012 17:33 Andrew Boyer said the following:
| > 
| > On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:
| > 
| >> For the time being I had to revert the following from my stable/9 tree. 
| >> Otherwise I would get a kernel panic on shutdown from ipmi(4).
| >> 
| >> http://svnweb.freebsd.org/base?view=revision&revision=237839 
| >> http://svnweb.freebsd.org/base?view=revision&revision=221121
| > 
| > On a somewhat related note: We noticed recently that you can't pet or disable
| > the IPMI hardware watchdog once SCHEDULER_STOPPED() is true.  This means it
| > can fire unexpectedly while you're dumping core or rebooting, depending on
| > how long the timeout was on the pet before the panic.  The ipmi driver will
| > need to process the command differently if the scheduler is stopped.  I
| > haven't had time to look at a fix yet.
| 
| Yeah, I noticed that unlike most (all?) other watchdog drivers, where watchdog
| re-arming is a very basic operation like doing one I/O, the IPMI watchdog does
| some more complex stuff which involves waiting on another thread.  I think that
| this may be a little bit too much for a reliable watchdog driver.  At least, as
| you note, this definitely won't work for the panic case where only one thread is
| left running.  I guess that the driver should check for that case and do a
| direct operation instead of enqueueing a request and waiting for another thread
| to execute it.

I have some local hacks that allow KCS mode to run in a polled mode.
We do that so we can put kernel backtraces into the system event
log.  Julian had code in FreeBSD to "pat" a watchdog during a core dump.
We have local code here to disable console muting when dropping into
the kernel debugger and re-enable it on exit.  It might
be useful to tie the watchdog into this: disable it when entering the
kernel debugger and resume it on exit.

With my polling hack, I don't think I dealt with the case where there
was already a transaction in progress.  SMIC could be done like KCS.
SSIF could be harder since it uses the I2C interface to talk to the
HW, which is more complicated.
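A sketch of the dispatch suggested above: once only one thread is left running, do the transaction inline by polling instead of queueing it for a worker thread that will never run. SCHEDULER_STOPPED() is modelled by a plain flag, and the structs, the fake polled transfer, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request and softc for the sketch. */
struct ipmi_req_model {
	int cmd;
	int result;
	bool done;
};

struct ipmi_sc_model {
	bool scheduler_stopped;		/* stands in for SCHEDULER_STOPPED() */
	struct ipmi_req_model *pending;	/* one-slot queue for the worker */
	int (*polled_xfer)(struct ipmi_req_model *);	/* direct, polled transaction */
};

/*
 * Returns true if the request (e.g. a watchdog pat) completed inline,
 * false if it was queued for the worker thread; a real driver would
 * then sleep until the worker wakes it.
 */
static bool
ipmi_model_submit(struct ipmi_sc_model *sc, struct ipmi_req_model *req)
{
	assert(req != NULL && !req->done);
	if (sc->scheduler_stopped) {
		/*
		 * Only this thread runs now, so nobody would service the
		 * queue: poll the interface directly.  Not handled here: a
		 * transaction the worker had already started.
		 */
		req->result = sc->polled_xfer(req);
		req->done = true;
		return (true);
	}
	sc->pending = req;
	return (false);
}

/* Fake polled transfer for trying the sketch out. */
static int
fake_polled_xfer(struct ipmi_req_model *req)
{
	return (req->cmd + 1);
}
```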

Thanks,

Doug A.


Re: LSI MegaRAID SAS 9240 with mfi driver?

2012-04-16 Thread Doug Ambrisko
Jan Mikkelsen writes:
| On 31/03/2012, at 1:14 AM, Doug Ambrisko wrote:
| > John Baldwin writes:
| > | On Friday, March 30, 2012 12:06:40 am Jan Mikkelsen wrote:
| > | > Hi,
| > | >
| ...
| > 
| > | > I have a loan LSI MegaRAID SAS 9240-4i controller for testing.
| > | > The pciconf -lv output is:
| > | > 
| > | > none3@pci0:1:0:0:   class=0x010400 card=0x92411000 chip=0x00731000 rev=0x03 hdr=0x00
| > | > vendor = 'LSI Logic / Symbios Logic'
| > | > device = 'MegaRAID SAS 9240'
| > | > class  = mass storage
| > | > subclass   = RAID
| > | > 
| > | > I added this line to src/sys/dev/mfi/mfi_pci.c
| > | > 
| > | > {0x1000, 0x0073, 0x, 0x, MFI_FLAGS_GEN2, "LSI MegaRAID SAS 9240"},
| > | > 
| > | > It gave this result (tried with hw.mfi.msi set to 0 and to 1):
| > | > 
| > | > mfi0:  port 0xdc00-0xdcff mem 0xfe7bc000-0xfe7b,0xfe7c-0xfe7f irq 16 at device 0.0 on pci1
| > | > mfi0: Using MSI
| > | > mfi0: Megaraid SAS driver Ver 3.00 
| > | > mfi0: Frame 0xff8000285000 timed out command 0x26C8040
| > | > mfi0: failed to send init command
| > | > 
| > | > The firmware is package 20.10.1-0077, which is the latest on the LSI website.
| > | > 
| > | > Is this path likely to work out? Any suggestions on where to go from here?
| > | 
| > | You should try the updated mfi(4) driver that Doug (cc'd) is going to soon
| > | merge into HEAD.  It syncs up with the mfi(4) driver on LSI's website which
| > | supports several cards that the current mfi(4) driver does not.  (I'm not
| > | fully sure if the 9240 is in that group or not.  Doug might know however.)
| > 
| > Yes, this card is supported with the mfi(4) in projects/head_mfi.  Looks
| > like we fixed a couple of last minute found bugs when trying to create a
| > RAID with mfiutil.  This should be fixed now.  I'm going to start the
| > merge to -current today.  The version in head_mfi can run on older
| > versions of FreeBSD with the changes that Sean did.
| 
| I have just imported the mfi(4) and mfiutil(8) into a 9.0-RELEASE tree to 
| try this out.
| 
| When booting up with two fresh drives attached, they show up as usable 
| JBOD disks. However, I cannot use mfiutil to create anything with them. 
| Every drive gives
| 
|"mfiutil: Drive n not available"

You might want to include the output of:
mfiutil show drives
and the command you are using to try to create the RAID.
 
| Is this expected behaviour? How can I create a raid1 volume using 
| mfiutil and clean disks?

I'm not sure if mfiutil can switch disks from JBOD mode to RAID.
I don't see any reason why it shouldn't.  It can't go from RAID to 
real JBOD mode since it doesn't have code to support that.
 
| I tried using MegaCli from the LSI website (versions 8.02.16 and
| 8.02.21), but they can't even detect the controller. I know you
| said at some point that a very recent version of MegaCli was 
| required. What version is necessary?

What syntax did you use?  The usage is cryptic.  I've never
seen a MegaCli that couldn't access the card.  What I meant by
a more recent MegaCli is that earlier versions didn't have the JBOD
commands in them.  I have an 8.00.46 that knows about JBOD.
 
| dmesg:
| 
| mfi0:  port 0xdc00-0xdcff mem 0xfe7bc000-0xfe7b,0xfe7c-0xfe7f irq 16 at device 0.0 on pci1
| mfi0: Using MSI
| mfi0: Megaraid SAS driver Ver 4.23 
| mfi0: 7021 (387925223s/0x0020/info) - Shutdown command received from host
| mfi0: 7022 (boot + 4s/0x0020/info) - Firmware initialization started (PCI ID 0073/1000/9241/1000)
| mfi0: 7023 (boot + 4s/0x0020/info) - Firmware version 2.120.244-1482
| mfi0: 7024 (boot + 5s/0x0020/info) - Package version 20.10.1-0077
| mfi0: 7025 (boot + 5s/0x0020/info) - Board Revision 03A
| mfi0: 7026 (boot + 33s/0x0002/info) - Inserted: PD 32(e0xff/s1)
| mfisyspd0:  on mfi0
| mfisyspd0: 1907729MB (3907029168 sectors) SYSPD volume
| mfisyspd0:  SYSPD volume attached
| mfisyspd1:  on mfi0
| mfisyspd1: 1907729MB (3907029168 sectors) SYSPD volume
| mfisyspd1:  SYSPD volume attached

You are definitely in real JBOD mode with each drive being /dev/mfisyspd0
and /dev/mfisyspd1.  So you can access the drives as those to do some
experiments with if you want to.

Doug A.


Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-04-06 Thread Doug Ambrisko
Alexander Motin writes:
| On 04/06/12 20:12, Doug Ambrisko wrote:
| > Alexander Motin writes:
| > | On 04/04/12 21:47, John Baldwin wrote:
| > |>  On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
| > |>>  John Baldwin writes:
| > |>>  | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
| > |>>  |>   John Baldwin writes:
| > |>>  |>   | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| > |>>  |>   |>   Doug Ambrisko writes:
| > |>>  |>   |>   | John Baldwin writes:
| > |>>  |>   |>   | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| > |>>  |>   |>   | |>   Sean Bruno writes:
| > |>>  |>   |>   | |>   | Noting a failure to attach to the onboard IPMI controller with this dell
| > |>>  |>   |>   | |>   | R815.  Not sure what to start poking at and thought I'd throw this over
| > |>>  |>   |>   | |>   | here for comment.
| > |>>  |>   |>   | |>   |
| > |>>  |>   |>   | |>   | -bash-4.2$ dmesg |grep ipmi
| > |>>  |>   |>   | |>   | ipmi0: KCS mode found at io 0xca8 on acpi
| > |>>  |>   |>   | |>   | ipmi1:   on isa0
| > |>>  |>   |>   | |>   | device_attach: ipmi1 attach returned 16
| > |>>  |>   |>   | |>   | ipmi1:   on isa0
| > |>>  |>   |>   | |>   | device_attach: ipmi1 attach returned 16
| > |>>  |>   |>   | |>   | ipmi0: Timed out waiting for GET_DEVICE_ID
| > |>>  |>   |>   | |>
| > |>>  |>   |>   | |>   I've run into this recently.  A quick hack to fix it is:
| > |>>  |>   |>   | |>
| > |>>  |>   |>   | |>   Index: ipmi.c
| > |>>  |>   |>   | |>
| > [snip]
| > |>>  | If you use "-ct" then you get a file you can feed into schedgraph.
| > |>>  | However, just reading the log, it seems that IRQ 20 keeps preempting
| > |>>  | the KCS worker thread preventing it from getting anything done.  Also,
| > |>>  | there seem to be a lot of threads on CPU 0's runqueue waiting for a
| > |>>  | chance to run (load average of 12 or 13 the entire time).  You can try
| > |>>  | just bumping up the max timeout from 3 seconds to higher perhaps.  Not
| > |>>  | sure why IRQ 20 keeps firing though.  It might be related to USB, so
| > |>>  | you could try fiddling with USB options in the BIOS perhaps, or disabling
| > |>>  | the USB drivers to see if that fixes IPMI.
| > |>>
| > |>>  Tried without USB in kernel:
| > |>> http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
| > |>
| > |>  Hmm, it's still just running constantly (note that the idle thread is
| > |>  _never_ scheduled).  The lion's share of the time seems to be spent in
| > |>  "xpt_thrd".  Note that there are several places where nothing happens except
| > |>  that "xpt_thrd" runs constantly (spinning) during 10's of statclock ticks.  I
| > |>  would maybe start debugging that to see what in the world it is doing.  Maybe
| > |>  it is polling some hardware down in xpt_action() (i.e., xpt_action() for a
| > |>  single bus called down into a driver and it is just spinning using polling
| > |>  instead of sleeping and waiting for an interrupt).
| > |
| > | "xpt_thrd" is a bus scanner thread. It is scheduled by CAM for every bus
| > | on attach and by controller driver on hot-plug events. For some
| > | controllers it may be quite CPU-hungry. For example, for legacy ATA
| > | controllers, where bus reset may take many seconds of hardware polling,
| > | while devices just spinning up. For ahci(4) it was improved about year
| > | ago to not use polling when possible, but it still may loop for some
| > | time if controller is not responding on reset. What mfi(4), mentioned in
| > | log, does during scanning, I am not sure.
| >
| > I thought that mfi(4) could be an issue.  There are some ata controllers
| > with nothing attached.  I built a GENERIC with USB and mfi commented out
| > and then the timeout issue went away:
| >ipmi0: KCS mode found at io 0xca8 on acpi
| >ipmi1:  on isa0
| >device_attach: ipmi1 attach returned 16
| >ipmi1:  on isa0
| >device_attach: ipmi1 attach returned 16
| >ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
| >ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
| >ipmi0: DEBUG ipmi_complete_reques

Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-04-06 Thread Doug Ambrisko
Alexander Motin writes:
| On 04/04/12 21:47, John Baldwin wrote:
| > On Wednesday, April 04, 2012 12:24:33 pm Doug Ambrisko wrote:
| >> John Baldwin writes:
| >> | On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
| >> |>  John Baldwin writes:
| >> |>  | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| >> |>  |>  Doug Ambrisko writes:
| >> |>  |>  | John Baldwin writes:
| >> |>  |>  | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| >> |>  |>  | |>  Sean Bruno writes:
| >> |>  |>  | |>  | Noting a failure to attach to the onboard IPMI controller with this dell
| >> |>  |>  | |>  | R815.  Not sure what to start poking at and thought I'd throw this over
| >> |>  |>  | |>  | here for comment.
| >> |>  |>  | |>  |
| >> |>  |>  | |>  | -bash-4.2$ dmesg |grep ipmi
| >> |>  |>  | |>  | ipmi0: KCS mode found at io 0xca8 on acpi
| >> |>  |>  | |>  | ipmi1:  on isa0
| >> |>  |>  | |>  | device_attach: ipmi1 attach returned 16
| >> |>  |>  | |>  | ipmi1:  on isa0
| >> |>  |>  | |>  | device_attach: ipmi1 attach returned 16
| >> |>  |>  | |>  | ipmi0: Timed out waiting for GET_DEVICE_ID
| >> |>  |>  | |>
| >> |>  |>  | |>  I've run into this recently.  A quick hack to fix it is:
| >> |>  |>  | |>
| >> |>  |>  | |>  Index: ipmi.c
| >> |>  |>  | |>
[snip]
| >> | If you use "-ct" then you get a file you can feed into schedgraph.
| >> | However, just reading the log, it seems that IRQ 20 keeps preempting
| >> | the KCS worker thread preventing it from getting anything done.  Also,
| >> | there seem to be a lot of threads on CPU 0's runqueue waiting for a
| >> | chance to run (load average of 12 or 13 the entire time).  You can try
| >> | just bumping up the max timeout from 3 seconds to higher perhaps.  Not
| >> | sure why IRQ 20 keeps firing though.  It might be related to USB, so
| >> | you could try fiddling with USB options in the BIOS perhaps, or disabling
| >> | the USB drivers to see if that fixes IPMI.
| >>
| >> Tried without USB in kernel:
| >>http://people.freebsd.org/~ambrisko/ipmi_ktr_dump_no_usb.txt
| >
| > Hmm, it's still just running constantly (note that the idle thread is
| > _never_ scheduled).  The lion's share of the time seems to be spent in
| > "xpt_thrd".  Note that there are several places where nothing happens except
| > that "xpt_thrd" runs constantly (spinning) during 10's of statclock ticks.  I
| > would maybe start debugging that to see what in the world it is doing.  Maybe
| > it is polling some hardware down in xpt_action() (i.e., xpt_action() for a
| > single bus called down into a driver and it is just spinning using polling
| > instead of sleeping and waiting for an interrupt).
| 
| "xpt_thrd" is a bus scanner thread. It is scheduled by CAM for every bus 
| on attach and by controller driver on hot-plug events. For some 
| controllers it may be quite CPU-hungry. For example, for legacy ATA 
| controllers, where bus reset may take many seconds of hardware polling, 
| while devices are just spinning up. For ahci(4) it was improved about a year
| ago to not use polling when possible, but it still may loop for some 
| time if controller is not responding on reset. What mfi(4), mentioned in 
| log, does during scanning, I am not sure.

I thought that mfi(4) could be an issue.  There are some ata controllers
with nothing attached.  I built a GENERIC with USB and mfi commented out
and then the timeout issue went away:
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 1
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 2211
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 2272
  ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 2332
  ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0

Without mfi but with USB, it had issues:
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 3137
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 3199
  ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 3259
  ipm

Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-04-04 Thread Doug Ambrisko
John Baldwin writes:
| On Tuesday, April 03, 2012 12:37:50 pm Doug Ambrisko wrote:
| > John Baldwin writes:
| > | On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| > | > Doug Ambrisko writes:
| > | > | John Baldwin writes:
| > | > | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| > | > | | > Sean Bruno writes:
| > | > | | > | Noting a failure to attach to the onboard IPMI controller with this dell
| > | > | | > | R815.  Not sure what to start poking at and thought I'd throw this over
| > | > | | > | here for comment.
| > | > | | > | 
| > | > | | > | -bash-4.2$ dmesg |grep ipmi
| > | > | | > | ipmi0: KCS mode found at io 0xca8 on acpi
| > | > | | > | ipmi1:  on isa0
| > | > | | > | device_attach: ipmi1 attach returned 16
| > | > | | > | ipmi1:  on isa0
| > | > | | > | device_attach: ipmi1 attach returned 16
| > | > | | > | ipmi0: Timed out waiting for GET_DEVICE_ID
| > | > | | > 
| > | > | | > I've run into this recently.  A quick hack to fix it is:
| > | > | | > 
| > | > | | > Index: ipmi.c
| > | > | | > ===
| > | > | | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
| > | > | | > retrieving revision 1.14
| > | > | | > diff -u -p -r1.14 ipmi.c
| > | > | | > --- ipmi.c  14 Apr 2011 07:14:22 -  1.14
| > | > | | > +++ ipmi.c  31 Mar 2012 19:18:35 -
| > | > | | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
| > | > | | >   if (error == EWOULDBLOCK) {
| > | > | | >   device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
| > | > | | >   ipmi_free_request(req);
| > | > | | > - return;
| > | > | | >   } else if (error) {
| > | > | | >   device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
| > | > | | >   ipmi_free_request(req);
| > | > | | > 
| > | > | | > The issue is that the wakeup doesn't actually wake up the msleep
| > | > | | > in ipmi_submit_driver_request.  The error being reported is that
| > | > | | > the msleep timed out.  This doesn't seem to be a critical problem
| > | > | | > since after this things seemed to work.  I saw this on 9.X.
| > | > | | > Haven't seen it on 8.2.  Not sure about -current.
| > | > | | > 
| > | > | | > It doesn't happen on all machines.
| > | > | | 
| > | > | | Hmm, are you seeing the KCS thread manage the request but the wakeup() is
| > | > | | lost?
| > | > | 
| > | > | It was a couple of weeks ago that I played with it.  I put printf's
| > | > | around the msleep and wakeup.  I saw the wakeup called but the sleep
| > | > | did not get it.  I can try the test again later today.  Right now my main
| > | > | work machine is recovering from a power outage.  This was with 9.0 
| > | > | when I first saw it.  This issue seems to only happen at boot time.
| > | > | If I kldload the module after the system is booted then it seems to work
| > | > | okay.  The KCS part was working fine and got the data okay from the
| > | > | request.  I haven't seen or heard any issues with 8.2.
| > | > 
| > | > With -current I patched ipmi.c with:
| > | > Index: ipmi.c
| > | > ===
| > | > --- ipmi.c  (revision 233806)
| > | > +++ ipmi.c  (working copy)
| > | > @@ -523,7 +523,11 @@
| > | >  * waiter that we awaken.
| > | >  */
| > | > if (req->ir_owner == NULL)
| > | > +{
| > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup %d\n",__FUNCTION__,__LINE__,ticks);
| > | > wakeup(req);
| > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup %d\n",__FUNCTION__,__LINE__,ticks);
| > | > +}
| > | > else {
| > | > dev = req->ir_owner;
| > | > TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, ir_link);
| > | > @@ -543,7 +547,11 @@
| > | > IPMI_LOCK(sc);
| > | > error = sc->ipmi_enqueue_request(sc, req);
| > | > if (error == 0)
| > | > +{
| > | > +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep %d\n",__FUNCTION__,__LINE__,ticks);
| > | > error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo);
| > | > +device_printf(s

Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-04-03 Thread Doug Ambrisko
John Baldwin writes:
| On Monday, April 02, 2012 7:27:13 pm Doug Ambrisko wrote:
| > Doug Ambrisko writes:
| > | John Baldwin writes:
| > | | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| > | | > Sean Bruno writes:
| > | | > | Noting a failure to attach to the onboard IPMI controller with this dell
| > | | > | R815.  Not sure what to start poking at and thought I'd throw this over
| > | | > | here for comment.
| > | | > | 
| > | | > | -bash-4.2$ dmesg |grep ipmi
| > | | > | ipmi0: KCS mode found at io 0xca8 on acpi
| > | | > | ipmi1:  on isa0
| > | | > | device_attach: ipmi1 attach returned 16
| > | | > | ipmi1:  on isa0
| > | | > | device_attach: ipmi1 attach returned 16
| > | | > | ipmi0: Timed out waiting for GET_DEVICE_ID
| > | | > 
| > | | > I've run into this recently.  A quick hack to fix it is:
| > | | > 
| > | | > Index: ipmi.c
| > | | > ===
| > | | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
| > | | > retrieving revision 1.14
| > | | > diff -u -p -r1.14 ipmi.c
| > | | > --- ipmi.c  14 Apr 2011 07:14:22 -  1.14
| > | | > +++ ipmi.c  31 Mar 2012 19:18:35 -
| > | | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
| > | | >   if (error == EWOULDBLOCK) {
| > | | >   device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
| > | | >   ipmi_free_request(req);
| > | | > - return;
| > | | >   } else if (error) {
| > | | >   device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
| > | | >   ipmi_free_request(req);
| > | | > 
| > | | > The issue is that the wakeup doesn't actually wake up the msleep
| > | | > in ipmi_submit_driver_request.  The error being reported is that
| > | | > the msleep timed out.  This doesn't seem to be a critical problem
| > | | > since after this things seemed to work.  I saw this on 9.X.
| > | | > Haven't seen it on 8.2.  Not sure about -current.
| > | | > 
| > | | > It doesn't happen on all machines.
| > | | 
| > | | Hmm, are you seeing the KCS thread manage the request but the wakeup() is
| > | | lost?
| > | 
| > | It was a couple of weeks ago that I played with it.  I put printf's
| > | around the msleep and wakeup.  I saw the wakeup called but the sleep
| > | did not get it.  I can try the test again later today.  Right now my main
| > | work machine is recovering from a power outage.  This was with 9.0 
| > | when I first saw it.  This issue seems to only happen at boot time.
| > | If I kldload the module after the system is booted then it seems to work 
| > | okay.  The KCS part was working fine and got the data okay from the
| > | request.  I haven't seen or heard any issues with 8.2.
| > 
| > With -current I patched ipmi.c with:
| > Index: ipmi.c
| > ===
| > --- ipmi.c  (revision 233806)
| > +++ ipmi.c  (working copy)
| > @@ -523,7 +523,11 @@
| >  * waiter that we awaken.
| >  */
| > if (req->ir_owner == NULL)
| > +{
| > +device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup %d\n",__FUNCTION__,__LINE__,ticks);
| > wakeup(req);
| > +device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup %d\n",__FUNCTION__,__LINE__,ticks);
| > +}
| > else {
| > dev = req->ir_owner;
| > TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, ir_link);
| > @@ -543,7 +547,11 @@
| > IPMI_LOCK(sc);
| > error = sc->ipmi_enqueue_request(sc, req);
| > if (error == 0)
| > +{
| > +device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep %d\n",__FUNCTION__,__LINE__,ticks);
| > error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo);
| > +device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep %d\n",__FUNCTION__,__LINE__,ticks);
| > +}
| > if (error == 0)
| > error = req->ir_error;
| > IPMI_UNLOCK(sc);
| > @@ -695,8 +703,11 @@
| > error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
| > if (error == EWOULDBLOCK) {
| > device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
| > +   printf("DJA\n");
| > +/*
| > ipmi_free_request(req);
| > return;
| > +*/
| > } else if (error) {
| > device_printf(dev, "Failed GET_DEVICE_ID: %d\n&quo

Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-04-02 Thread Doug Ambrisko
Doug Ambrisko writes:
| John Baldwin writes:
| | On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| | > Sean Bruno writes:
| | > | Noting a failure to attach to the onboard IPMI controller with this dell
| | > | R815.  Not sure what to start poking at and thought I'd throw this over
| | > | here for comment.
| | > | 
| | > | -bash-4.2$ dmesg |grep ipmi
| | > | ipmi0: KCS mode found at io 0xca8 on acpi
| | > | ipmi1:  on isa0
| | > | device_attach: ipmi1 attach returned 16
| | > | ipmi1:  on isa0
| | > | device_attach: ipmi1 attach returned 16
| | > | ipmi0: Timed out waiting for GET_DEVICE_ID
| | > 
| | > I've run into this recently.  A quick hack to fix it is:
| | > 
| | > Index: ipmi.c
| | > ===
| | > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
| | > retrieving revision 1.14
| | > diff -u -p -r1.14 ipmi.c
| | > --- ipmi.c  14 Apr 2011 07:14:22 -  1.14
| | > +++ ipmi.c  31 Mar 2012 19:18:35 -
| | > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
| | >   if (error == EWOULDBLOCK) {
| | >   device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
| | >   ipmi_free_request(req);
| | > - return;
| | >   } else if (error) {
| | >   device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
| | >   ipmi_free_request(req);
| | > 
| | > The issue is that the wakeup doesn't actually wake up the msleep
| | > in ipmi_submit_driver_request.  The error being reported is that
| | > the msleep timed out.  This doesn't seem to be a critical problem
| | > since after this things seemed to work.  I saw this on 9.X.
| | > Haven't seen it on 8.2.  Not sure about -current.
| | > 
| | > It doesn't happen on all machines.
| | 
| | Hmm, are you seeing the KCS thread manage the request but the wakeup() is 
| | lost?
| 
| It was a couple of weeks ago that I played with it.  I put printf's
| around the msleep and wakeup.  I saw the wakeup called but the sleep
| did not get it.  I can try the test again later today.  Right now my main
| work machine is recovering from a power outage.  This was with 9.0 
| when I first saw it.  This issue seems to only happen at boot time.
| If I kldload the module after the system is booted then it seems to work 
| okay.  The KCS part was working fine and got the data okay from the
| request.  I haven't seen or heard any issues with 8.2.

With -current I patched ipmi.c with:
Index: ipmi.c
===
--- ipmi.c  (revision 233806)
+++ ipmi.c  (working copy)
@@ -523,7 +523,11 @@
 * waiter that we awaken.
 */
if (req->ir_owner == NULL)
+{
+device_printf(sc->ipmi_dev, "DEBUG %s %d before wakeup %d\n",__FUNCTION__,__LINE__,ticks);
wakeup(req);
+device_printf(sc->ipmi_dev, "DEBUG %s %d after wakeup %d\n",__FUNCTION__,__LINE__,ticks);
+}
else {
dev = req->ir_owner;
TAILQ_INSERT_TAIL(&dev->ipmi_completed_requests, req, ir_link);
@@ -543,7 +547,11 @@
IPMI_LOCK(sc);
error = sc->ipmi_enqueue_request(sc, req);
if (error == 0)
+{
+device_printf(sc->ipmi_dev, "DEBUG %s %d before msleep %d\n",__FUNCTION__,__LINE__,ticks);
error = msleep(req, &sc->ipmi_lock, 0, "ipmireq", timo);
+device_printf(sc->ipmi_dev, "DEBUG %s %d after msleep %d\n",__FUNCTION__,__LINE__,ticks);
+}
if (error == 0)
error = req->ir_error;
IPMI_UNLOCK(sc);
@@ -695,8 +703,11 @@
error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
if (error == EWOULDBLOCK) {
device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
+   printf("DJA\n");
+/*
ipmi_free_request(req);
return;
+*/
} else if (error) {
device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
ipmi_free_request(req);

and I get:
  # dmesg | grep ipmi
  ipmi0: KCS mode found at io 0xca8 on acpi
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi1:  on isa0
  device_attach: ipmi1 attach returned 16
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 2
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6201
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6263
  ipmi0: DEBUG ipmi_submit_driver_request 553 after msleep 6323
  ipmi0: Timed out waiting for GET_DEVICE_ID
  ipmi0: IPMI device rev. 0, firmware rev. 1.61, version 2.0
  ipmi0: DEBUG ipmi_submit_driver_request 551 before msleep 6503
  ipmi0: DEBUG ipmi_complete_request 527 before wakeup 6620
  ipmi0: DEBUG ipmi_complete_request 529 after wakeup 6
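The tick stamps in this log can be compared against the sleep timeout with some simple arithmetic. The sketch below assumes kern.hz is 1000 and that the GET_DEVICE_ID request uses a timeout of 6 * hz ticks; neither value appears in the thread, so both are assumptions to check against sys/dev/ipmi and the running system. Under those assumptions the msleep() started at tick 2 would expire around tick 6002, roughly 200 ticks before wakeup() ran at tick 6201, which would fit the "Timed out waiting for GET_DEVICE_ID" message even though the wakeup was eventually issued, while the GENERIC run with USB and mfi removed (wakeup at tick 2211) falls well inside the same window. struct ipmi_trace and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMPTIONS, not from the thread: kern.hz = 1000 and a request
 * timeout of 6 * hz ticks.  Verify both before relying on this. */
#define ASSUMED_HZ       1000
#define ASSUMED_TIMEOUT  (6 * ASSUMED_HZ)

/* Tick stamps taken from the DEBUG device_printf() lines (hypothetical type). */
struct ipmi_trace {
    int before_msleep;   /* tick printed just before msleep() */
    int before_wakeup;   /* tick printed just before wakeup() */
};

/* Tick at which a sleep started at before_msleep would time out. */
static int msleep_deadline(const struct ipmi_trace *t, int timo)
{
    return t->before_msleep + timo;
}

/* True if wakeup() was reached before that deadline. */
static bool wakeup_in_time(const struct ipmi_trace *t, int timo)
{
    return t->before_wakeup < msleep_deadline(t, timo);
}
```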

Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-04-02 Thread Doug Ambrisko
John Baldwin writes:
| On Saturday, March 31, 2012 3:25:48 pm Doug Ambrisko wrote:
| > Sean Bruno writes:
| > | Noting a failure to attach to the onboard IPMI controller with this dell
| > | R815.  Not sure what to start poking at and thought I'd throw this over
| > | here for comment.
| > | 
| > | -bash-4.2$ dmesg |grep ipmi
| > | ipmi0: KCS mode found at io 0xca8 on acpi
| > | ipmi1:  on isa0
| > | device_attach: ipmi1 attach returned 16
| > | ipmi1:  on isa0
| > | device_attach: ipmi1 attach returned 16
| > | ipmi0: Timed out waiting for GET_DEVICE_ID
| > 
| > I've run into this recently.  A quick hack to fix it is:
| > 
| > Index: ipmi.c
| > ===
| > RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
| > retrieving revision 1.14
| > diff -u -p -r1.14 ipmi.c
| > --- ipmi.c  14 Apr 2011 07:14:22 -  1.14
| > +++ ipmi.c  31 Mar 2012 19:18:35 -
| > @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
| > if (error == EWOULDBLOCK) {
| > device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
| > ipmi_free_request(req);
| > -   return;
| > } else if (error) {
| > device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
| > ipmi_free_request(req);
| > 
| > The issue is that the wakeup doesn't actually wake up the msleep
| > in ipmi_submit_driver_request.  The error being reported is that
| > the msleep timed out.  This doesn't seem to be a critical problem
| > since after this things seemed to work.  I saw this on 9.X.
| > Haven't seen it on 8.2.  Not sure about -current.
| > 
| > It doesn't happen on all machines.
| 
| Hmm, are you seeing the KCS thread manage the request but the wakeup() is 
| lost?

It was a couple of weeks ago that I played with it.  I put printf's
around the msleep and wakeup.  I saw the wakeup called but the sleep
did not get it.  I can try the test again later today.  Right now my main
work machine is recovering from a power outage.  This was with 9.0 
when I first saw it.  This issue seems to only happen at boot time.
If I kldload the module after the system is booted then it seems to work 
okay.  The KCS part was working fine and got the data okay from the
request.  I haven't seen or heard any issues with 8.2.

Thanks,

Doug A.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-03-31 Thread Doug Ambrisko
Doug Ambrisko writes:
| Sean Bruno writes:
| | Noting a failure to attach to the onboard IPMI controller with this dell
| | R815.  Not sure what to start poking at and thought I'd throw this over
| | here for comment.
| | 
| | -bash-4.2$ dmesg |grep ipmi
| | ipmi0: KCS mode found at io 0xca8 on acpi
| | ipmi1:  on isa0
| | device_attach: ipmi1 attach returned 16
| | ipmi1:  on isa0
| | device_attach: ipmi1 attach returned 16
| | ipmi0: Timed out waiting for GET_DEVICE_ID
| 
| I've run into this recently.  A quick hack to fix it is:
| 
| Index: ipmi.c
| ===
| RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
| retrieving revision 1.14
| diff -u -p -r1.14 ipmi.c
| --- ipmi.c  14 Apr 2011 07:14:22 -  1.14
| +++ ipmi.c  31 Mar 2012 19:18:35 -
| @@ -695,7 +695,6 @@ ipmi_startup(void *arg)
|   if (error == EWOULDBLOCK) {
|   device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
|   ipmi_free_request(req);
| - return;

Correction: get rid of the ipmi_free_request() call as well.

If you kldload then it doesn't have this issue.  I've been doing that
on -current for a while so I didn't notice the regression when it happened.

|   } else if (error) {
|   device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
|   ipmi_free_request(req);
| 
| The issue is that the wakeup doesn't actually wake up the msleep
| in ipmi_submit_driver_request.  The error being reported is that
| the msleep timed out.  This doesn't seem to be a critical problem
| since after this things seemed to work.  I saw this on 9.X.
| Haven't seen it on 8.2.  Not sure about -current.
| 
| It doesn't happen on all machines.
| 
| Doug A.


Re: [stable-ish 9] Dell R815 ipmi(4) attach failure

2012-03-31 Thread Doug Ambrisko
Sean Bruno writes:
| Noting a failure to attach to the onboard IPMI controller with this dell
| R815.  Not sure what to start poking at and thought I'd throw this over
| here for comment.
| 
| -bash-4.2$ dmesg |grep ipmi
| ipmi0: KCS mode found at io 0xca8 on acpi
| ipmi1:  on isa0
| device_attach: ipmi1 attach returned 16
| ipmi1:  on isa0
| device_attach: ipmi1 attach returned 16
| ipmi0: Timed out waiting for GET_DEVICE_ID

I've run into this recently.  A quick hack to fix it is:

Index: ipmi.c
===
RCS file: /cvs/src/sys/dev/ipmi/ipmi.c,v
retrieving revision 1.14
diff -u -p -r1.14 ipmi.c
--- ipmi.c  14 Apr 2011 07:14:22 -  1.14
+++ ipmi.c  31 Mar 2012 19:18:35 -
@@ -695,7 +695,6 @@ ipmi_startup(void *arg)
if (error == EWOULDBLOCK) {
device_printf(dev, "Timed out waiting for GET_DEVICE_ID\n");
ipmi_free_request(req);
-   return;
} else if (error) {
device_printf(dev, "Failed GET_DEVICE_ID: %d\n", error);
ipmi_free_request(req);

The issue is that the wakeup doesn't actually wake up the msleep
in ipmi_submit_driver_request.  The error being reported is that
the msleep timed out.  This doesn't seem to be a critical problem
since after this things seemed to work.  I saw this on 9.X.
Haven't seen it on 8.2.  Not sure about -current.

It doesn't happen on all machines.
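A common defensive pattern for a sleep/wakeup handoff like this is to record completion in a flag protected by the same lock the sleeper holds, and have the sleeper decide success from that flag rather than from the sleep's return value alone. The sketch below is a userland analog using POSIX threads, not the ipmi(4) code: struct request, complete_request() and wait_for_request() are hypothetical stand-ins for the request, the wakeup() in the completion path and the msleep() in ipmi_submit_driver_request(). It does not establish what actually goes wrong at boot on these machines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a request shared by two threads. */
struct request {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool            done;    /* set by the completion side, under lock */
    int             error;   /* result recorded by the completion side */
};

/* Completion side: plays the role of wakeup(req) after the I/O finishes. */
static void *complete_request(void *arg)
{
    struct request *req = arg;
    struct timespec delay = { 0, 50 * 1000000L };   /* pretend I/O takes 50 ms */

    nanosleep(&delay, NULL);
    pthread_mutex_lock(&req->lock);
    req->done = true;            /* publish completion before signalling */
    req->error = 0;
    pthread_cond_broadcast(&req->cv);
    pthread_mutex_unlock(&req->lock);
    return NULL;
}

/*
 * Submit side: plays the role of msleep(req, &lock, ..., timo).  It loops
 * on 'done', so a signal sent before we start waiting is not lost and
 * spurious wakeups are ignored.  Returns the request's error, or
 * ETIMEDOUT if it never completed.
 */
static int wait_for_request(struct request *req, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0, result;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&req->lock);
    while (!req->done && rc == 0)
        rc = pthread_cond_timedwait(&req->cv, &req->lock, &deadline);
    /* Decide from the flag, not from rc: completion may race the timeout. */
    result = req->done ? req->error : ETIMEDOUT;
    pthread_mutex_unlock(&req->lock);
    return result;
}
```

If complete_request() runs to completion before wait_for_request() is called, the wait still returns 0 because done is checked before sleeping; a bare wait on the condition variable would miss that signal and sit until the timeout.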

Doug A.


Re: LSI MegaRAID SAS 9240 with mfi driver?

2012-03-30 Thread Doug Ambrisko
Jan Mikkelsen writes:
| On 31/03/2012, at 9:21 AM, Doug Ambrisko wrote:
| 
| > Jan Mikkelsen writes:
| > | I don't know what changes Sean did. Are they in 9.0-release, or do I 
| > | need -stable after a certain point? I'm assuming I should be able to 
| > | take src/sys/dev/mfi/... and src/usr.sbin/mfiutil/... from -current.
| > 
| > It's in the SVN projects/head_mfi repo.  You can browse it via the web at:
| > http://svnweb.freebsd.org/base/projects/head_mfi/
| > 
| > It's not in -current yet.  I'm working on that.  I just did all the
| > merges to a local tree and eyeballed them.  Now doing a compile test,
| > then I can check it into -current.
| 
| OK, will check it out.
| 
| > | The performance is an interesting thing. The write performance I care 
| > | about is ZFS raidz2 with 6 x JBOD disks (or 6 x single disk raid0) on 
| > | this controller. The 9261 with a BBU performs well but obviously costs more.
| > 
| > There will need to be clarification in the future.  JBOD is not the
| > same as a single-disk RAID.  If I remember correctly, when doing some
| > JBOD versus single-disk RAID testing, JBOD was slower.  A single-disk
| > RAID is faster since it can use the RAID cache.  However, without
| > the battery you risk losing data on a power outage etc.  Without the
| > battery, the performance of a JBOD and a single-disk RAID should be about
| > the same.
| > 
| > A real JBOD as shown by LSI's firmware etc. shows up as /dev/mfisyspd
| > entries.  JBOD by LSI is a newer thing.
| 
| Ok, interesting. I was told by the distributor that the 9240 supports 
| JBOD mode, but the 9261 doesn't. I'm interested to test it out with ZFS.

Correct, JBOD is not supported on all cards and, depending on how the
card ships, it may need to be enabled.  Again, JBOD is not RAID on a single
disk.  Also, to clarify, mfiutil create jbod creates a RAID volume for each
drive, which isn't the same definition of JBOD that LSI talks about.  They
are two different animals.  MegaCli can configure LSI JBODs: enable the
feature and create them.  I'm not really sure what the value of
JBOD support is.  I haven't seen any kind of performance gains.
 
| > | I can see the BBU being important for controller based raid5, but I'm 
| > | hoping that ZFS with JBOD will still perform well. I'm ignorant at this 
| > | point, so that's why I'm trying it out. Do you have any experience or 
| > | expectations with a 9240 being used in a setup like that?
| > 
| > The battery or NVRAM helps regardless of the RAID type being used, since
| > the cache in NVRAM mode says done whenever it has space in the cache for
| > the write.  Eventually, it will hit the disk.  Without the cache working in
| > this mode the write can't be acknowledged until the disk says done, so
| > performance suffers.  With a single-disk RAID you have been using the
| > cache.
| 
| With RAID-5 it is important because a single update requires two writes 
| and a failure in the window where one write has completed and one write 
| has not could cause data corruption. I don't know whether the controller 
| really handles this case.

That shouldn't be a problem since the acknowledgement won't happen until
the writes are all done, and if any fail then the I/O should fail back
to the OS.
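The two-write window Jan describes comes from the read-modify-write used for small RAID-5 updates: the new parity is old parity XOR old data XOR new data, and both the data block and the parity block must then be written. The sketch below is a simplified model with one byte per disk and hypothetical function names; it shows that if only the data write lands, the stored parity no longer matches the stripe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDATA 3   /* data disks per stripe in this illustration */

/* Full-stripe parity: XOR of all data blocks (one byte per disk here). */
static uint8_t stripe_parity(const uint8_t data[NDATA])
{
    uint8_t p = 0;
    for (size_t i = 0; i < NDATA; i++)
        p ^= data[i];
    return p;
}

/* Read-modify-write: new parity from old parity, old data and new data,
 * without reading the other data disks.  The caller must then write
 * both the new data block and this new parity block. */
static uint8_t rmw_parity(uint8_t old_parity, uint8_t old_data, uint8_t new_data)
{
    return (uint8_t)(old_parity ^ old_data ^ new_data);
}

/* The stripe is consistent when the stored parity matches the data. */
static int stripe_consistent(const uint8_t data[NDATA], uint8_t parity)
{
    return stripe_parity(data) == parity;
}
```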
 
| I guess I'm hopeful that ZFS will perform the function performed by the 
| NVRAM on the controller. I can see how the controller in isolation is 
| clearly slower without a BBU because it has to expose the higher layers 
| to the disk latency.

All ZFS should really be doing is adding another level of caching.
Without an NVRAM cache, you can't really get the performance gain.
 
| > Now you can force using the cache without NVRAM but you have to acknowledge
| > the risk of that.
| 
| Yes, I understand the risk, and it is one I do not want to take. All
| the 9261s I have deployed have a BBU and go into write through mode if 
| the battery has a problem.
| 
| I think I need to test it in the context of ZFS and see how it works 
| without controller NVRAM.

Well, then you can do the performance test of the 9240 on the 9261s
by disabling the battery and the cache!  Feel free to do the test on
the 9240.  I can't see anything being faster without the NVRAM cache.

Doug A.


Re: LSI MegaRAID SAS 9240 with mfi driver?

2012-03-30 Thread Doug Ambrisko
Jan Mikkelsen writes:
| Hi,
| 
| On 31/03/2012, at 1:14 AM, Doug Ambrisko wrote:
| 
| > John Baldwin writes:
| > | On Friday, March 30, 2012 12:06:40 am Jan Mikkelsen wrote:
| > | ...
| > | > Is this path likely to work out? Any suggestions on where to go from here?
| > | 
| > | You should try the updated mfi(4) driver that Doug (cc'd) is going to soon
| > | merge into HEAD.  It syncs up with the mfi(4) driver on LSI's website which
| > | supports several cards that the current mfi(4) driver does not.  (I'm not
| > | fully sure if the 9240 is in that group or not.  Doug might know however.)
| > 
| > Yes, this card is supported with the mfi(4) in projects/head_mfi.  Looks
| > like we fixed a couple of bugs found at the last minute when trying to
| > create a RAID with mfiutil.  This should be fixed now.  I'm going to start the
| > merge to -current today.  The version in head_mfi can run on older
| > versions of FreeBSD with the changes that Sean did.
| > 
| > Note that I wouldn't recommend the 9240 since it can't have a battery
| > option.  NVRAM is the key to the speed of mfi(4) cards.  However, that
| > won't stop us from supporting 
| 
| Thanks.
| 
| I don't know what changes Sean did. Are they in 9.0-release, or do I 
| need -stable after a certain point? I'm assuming I should be able to 
| take src/sys/dev/mfi/... and src/usr.sbin/mfiutil/... from -current.

It's in the SVN projects/head_mfi repo.  You can browse it via the web at:
http://svnweb.freebsd.org/base/projects/head_mfi/

It's not in -current yet.  I'm working on that.  I just did all the
merges to a local tree and eyeballed them.  Now doing a compile test,
then I can check it into -current.
 
| The performance is an interesting thing. The write performance I care 
| about is ZFS raidz2 with 6 x JBOD disks (or 6 x single disk raid0) on 
| this controller. The 9261 with a BBU performs well but obviously costs more.

There will need to be clarification in the future.  JBOD is not the
same as a single-disk RAID.  If I remember correctly, when doing some
JBOD versus single-disk RAID testing, JBOD was slower.  A single-disk
RAID is faster since it can use the RAID cache.  However, without
the battery you risk losing data on a power outage etc.  Without the
battery, the performance of a JBOD and a single-disk RAID should be about
the same.

A real JBOD as shown by LSI's firmware etc. shows up as /dev/mfisyspd
entries.  JBOD by LSI is a newer thing.
 
| I can see the BBU being important for controller based raid5, but I'm 
| hoping that ZFS with JBOD will still perform well. I'm ignorant at this 
| point, so that's why I'm trying it out. Do you have any experience or 
| expectations with a 9240 being used in a setup like that?

The battery or NVRAM helps regardless of the RAID type being used, since
the cache in NVRAM mode says done whenever it has space in the cache for
the write.  Eventually, it will hit the disk.  Without the cache working in
this mode the write can't be acknowledged until the disk says done, so
performance suffers.  With a single-disk RAID you have been using the
cache.

Now you can force using the cache without NVRAM but you have to acknowledge
the risk of that.
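A toy model of the acknowledgement behavior described above: in write-back mode the controller reports done as soon as it has cache space, otherwise the write waits for the disk. struct ctrl_model and the latencies used with it are hypothetical parameters chosen only to make the difference visible, not measurements of the 9240 or 9261, and the model does not simulate the cache draining to disk.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy controller model; all fields are hypothetical parameters. */
struct ctrl_model {
    bool write_back;     /* NVRAM/BBU present and write-back cache enabled */
    int  cache_free;     /* free cache slots, one write per slot */
    long cache_ack_us;   /* time to land a write in controller cache */
    long disk_ack_us;    /* time for the disk to report the write done */
};

/* Microseconds until the OS sees "done" for a single write. */
static long write_ack_latency(struct ctrl_model *c)
{
    if (c->write_back && c->cache_free > 0) {
        c->cache_free--;        /* absorbed by cache; hits the disk later */
        return c->cache_ack_us;
    }
    /* Write-through, or cache full: wait for the disk to say done. */
    return c->disk_ack_us;
}

/* Sum of acknowledgement latencies for nwrites back-to-back writes.
 * The cache is never drained in this model, so it only fills up. */
static long total_latency(struct ctrl_model *c, int nwrites)
{
    long total = 0;
    for (int i = 0; i < nwrites; i++)
        total += write_ack_latency(c);
    return total;
}
```

With two cache slots, three writes cost two cache acknowledgements plus one disk acknowledgement in write-back mode, and three disk acknowledgements in write-through mode.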
 
Doug A.


Re: LSI MegaRAID SAS 9240 with mfi driver?

2012-03-30 Thread Doug Ambrisko
John Baldwin writes:
| On Friday, March 30, 2012 12:06:40 am Jan Mikkelsen wrote:
| > Hi,
| > 
| > I have a loan LSI MegaRAID SAS 9240-4i controller for testing.
| > 
| > According to the LSI documentation, this device provides the MegaRAID 
| > interface and the BIOS message mentions MFI. The LSI driver for this device 
| > also lists support for the 9261 which I know is supported by mfi(4). 
| > Based on all this, I was hopeful that mfi(4) would work with the 9240.
| > 
| > The pciconf -lv output is:
| > 
| > none3@pci0:1:0:0:   class=0x010400 card=0x92411000 chip=0x00731000 rev=0x03 hdr=0x00
| > vendor = 'LSI Logic / Symbios Logic'
| > device = 'MegaRAID SAS 9240'
| > class  = mass storage
| > subclass   = RAID
| > 
| > I added this line to src/sys/dev/mfi/mfi_pci.c
| > 
| > {0x1000, 0x0073, 0x, 0x, MFI_FLAGS_GEN2, "LSI MegaRAID SAS 9240"},
| > 
| > It gave this result (tried with hw.mfi.msi set to 0 and to 1):
| > 
| > mfi0:  port 0xdc00-0xdcff mem 0xfe7bc000-0xfe7b,0xfe7c-0xfe7f irq 16 at device 0.0 on pci1
| > mfi0: Using MSI
| > mfi0: Megaraid SAS driver Ver 3.00 
| > mfi0: Frame 0xff8000285000 timed out command 0x26C8040
| > mfi0: failed to send init command
| > 
| > The firmware is package 20.10.1-0077, which is the latest on the LSI website.
| > 
| > Is this path likely to work out? Any suggestions on where to go from here?
| 
| You should try the updated mfi(4) driver that Doug (cc'd) is going to soon
| merge into HEAD.  It syncs up with the mfi(4) driver on LSI's website which
| supports several cards that the current mfi(4) driver does not.  (I'm not
| fully sure if the 9240 is in that group or not.  Doug might know however.)

Yes, this card is supported with the mfi(4) in projects/head_mfi.  Looks
like we fixed a couple of bugs found at the last minute when trying to
create a RAID with mfiutil.  This should be fixed now.  I'm going to start the
merge to -current today.  The version in head_mfi can run on older
versions of FreeBSD with the changes that Sean did.

Note that I wouldn't recommend the 9240 since it can't have a battery
option.  NVRAM is the key to the speed of mfi(4) cards.  However, that
won't stop us from supporting it.

Doug A.


Re: MFC: graid(8) (RAID GEOM) support

2011-06-21 Thread Doug Ambrisko
Jeremy Chadwick writes:
| Sorry for the cross-post, but I thought both lists would want to know
| about this.
| 
| Looks like mav@ just committed this ~17 hours ago:
| http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/geom/raid/g_raid.c
| 
| Those who have historically wanted to use Intel MatrixRAID (now called
| Intel RST (Rapid Storage Technology)), but haven't due to the severe
| issues/risks with ataraid(4), will probably be very interested in
| this commit.  I know I am!
| 
| I plan on stress-testing the Intel support on a 2-disk system with
| RAID-1 enabled, and will document my experiences, procedures, etc...

We definitely want people to help test this out.  It was designed from 
the start to be robust and do recovery for RAID 1, which is our use case.
We had previously hacked enhanced support into ataraid(4) and ata(4) for 
use in-house. 

Doug A.


Re: Enabling watchdog

2010-05-14 Thread Doug Ambrisko
Tom Evans writes:
| On Fri, May 14, 2010 at 3:15 PM, Jeremy Chadwick
|  wrote:
| >
| > I'm a bit confused at this point, Doug.  At what point did the OP state
| > he has IPMI support or IPMI cards in his system?
| 
| He said he had a Dell PowerEdge 2950 - iirc these all have IPMI.

... and although HW WD doesn't have to be in IPMI, I know for a fact
it is on the base config. of a Dell PE2950 and has been since the PE2650.
However, on the 2650 I saw false trips.  It was one of the reasons I wrote 
ipmi(4).  Eventually, I need to get in sync with jhb to add kernel 
back-trace support to it.  I have some code at work to do it but it needs 
some work to ensure it works in every case etc.

BTW, there are code/patches floating around to control the LCD on these
Dell machines via ipmitool, and on the R710 to control attributes of the LCD.
Unfortunately the ipmitool folks haven't picked it up.

Doug A.


Re: Enabling watchdog

2010-05-14 Thread Doug Ambrisko
rihad writes:
| On 05/14/2010 04:13 AM, Doug Ambrisko wrote:
| > rihad writes:
| > | Hi, I'm thinking of enabling the watchdog on our Dell PowerEdge 2950 /
| > | FreeBSD 8.0 amd64, so that it reboots the machine in case of lockups.
| > | Right now it doesn't work:
| > |
| > | # watchdog
| > | watchdog: patting the dog: Operation not supported
| > | #
| > | Looking through the kernel configuration I found two relevant settings:
| > | In /sys/conf/NOTES:
| > | #
| > | # Add software watchdog routines.
| > | #
| > | options SW_WATCHDOG
| > |
| > | and in /sys/amd64/conf/NOTES:
| > | #
| > | # Watchdog routines.
| > | #
| > | options MP_WATCHDOG
| > |
| > | Which of them should I rebuild the kernel with? BTW, the existing kernel
| > | is built with the default "options SCHED_ULE" to make good use of
| > | multiple CPUs, does watchdog work with it?
| >
| > If no one has said yet, kldload ipmi then run watchdogd.  ... or compile
| > it into the kernel.  This will enable the IPMI HW watchdog.  If it triggers,
| > it will appear in the IPMI SEL (ipmitool sel list).
| 
| Thanks. So did I understand it right that I should first install 
| sysutils/ipmitool, then start polling "ipmitool sel list" in a shell 
| script from a cron job run once a minute, and reboot in case IPMI 
| triggers? But if it's a kernel lockup, none of the user level code might 
| run at all. Any way to fall back to a hard and fast kernel level machine 
| reset?

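Nope.  When you load the ipmi driver it provides a HW watchdog via IPMI
that works with watchdogd, so there is no need to poll the SEL from cron:
if watchdogd stops patting the watchdog (e.g. because the kernel is locked
up), the hardware resets the machine on its own.  If you want to know
whether your machines rebooted due to the watchdog, check the IPMI SEL
for the watchdog event.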
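To see why no cron job is needed, here is a toy model in C of a hardware
watchdog.  It is a hypothetical sketch: the struct and function names are
made up, and it is not the ipmi(4) driver or the IPMI protocol.  The
countdown runs in the BMC, so a wedged kernel simply stops patting and the
timer expires on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a hardware watchdog (hypothetical; not ipmi(4) or BMC code).
 * The countdown lives in the BMC, so it keeps running even when the host
 * kernel is wedged and nothing in userland can execute. */
struct toy_watchdog {
    unsigned timeout;   /* seconds allowed between pats */
    unsigned remaining; /* seconds left before the BMC resets the host */
    bool fired;         /* latched once the countdown reaches zero */
};

/* Arm the timer, as watchdogd does when it starts. */
static void wd_arm(struct toy_watchdog *wd, unsigned timeout)
{
    assert(timeout > 0);
    wd->timeout = timeout;
    wd->remaining = timeout;
    wd->fired = false;
}

/* A pat from the host reloads the countdown. */
static void wd_pat(struct toy_watchdog *wd)
{
    if (!wd->fired)
        wd->remaining = wd->timeout;
}

/* One second of BMC time passes; no host cooperation is required. */
static void wd_tick(struct toy_watchdog *wd)
{
    if (wd->fired || wd->timeout == 0)
        return;                 /* already fired, or never armed */
    if (--wd->remaining == 0)
        wd->fired = true;       /* real hardware: reset the box, log to SEL */
}
```

Arming at 16 seconds, ticking 10, patting once, and then going 16 seconds
without another pat is what trips the reset; the 16-second value is an
arbitrary example.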

Doug A.


Re: Enabling watchdog

2010-05-13 Thread Doug Ambrisko
rihad writes:
| Hi, I'm thinking of enabling the watchdog on our Dell PowerEdge 2950 / 
| FreeBSD 8.0 amd64, so that it reboots the machine in case of lockups.
| Right now it doesn't work:
| 
| # watchdog
| watchdog: patting the dog: Operation not supported
| #
| Looking through the kernel configuration I found two relevant settings:
| In /sys/conf/NOTES:
| #
| # Add software watchdog routines.
| #
| options SW_WATCHDOG
| 
| and in /sys/amd64/conf/NOTES:
| #
| # Watchdog routines.
| #
| options MP_WATCHDOG
| 
| Which of them should I rebuild the kernel with? BTW, the existing kernel 
| is built with the default "options SCHED_ULE" to make good use of 
| multiple CPUs, does watchdog work with it?

If no one has said yet, kldload ipmi then run watchdogd.  ... or compile
it into the kernel.  This will enable the IPMI HW watchdog.  If it triggers,
it will appear in the IPMI SEL (ipmitool sel list).

Doug A.


Re: Network Card BCE not working

2009-08-20 Thread Doug Ambrisko
Umar writes:
| Dear Members!
| 
| I have recently installed FreeBSD 7.2 amd64 on a DELL R610.
| 
| After a successful installation the network cards are not working and I get this error:
| 
| bce0: /usr/src/dev/bce/if_bce.c(1386): Unable to write CTX memory: cid_addr = 0x008, offset = 0x0080
| 
| Could you please tell me what I should do?

A new version of firmware should be coming out from Dell that should
resolve this issue.  The firmware can be updated via DOS or Linux,
but not from FreeBSD using Linux emulation (at least not yet) :-(

A short-term solution is to find a diagnostic utility for the NIC and
use it to disable the management function in all NICs.

Doug A.


Re: Monitoring tools for mfi0: ?

2009-08-10 Thread Doug Ambrisko
wsk writes:
| On Sun, Aug 09, 2009 at 08:04:35PM +0200, Václav Haisman wrote:
| > Hi,
| 
| > I have a server with the "mfi0: " controller. Are there any
| > monitoring tools for this? I tried camcontrol but it doesn't even list the
| > device.
| 
| Maybe sysutils/megacli does what you want?
| 
| Roland
| 
| Sometimes I get the following messages on my Dell PE R900. Any ideas?
| 
| mfi0: 1989 (303015600s/0x0020/info) - Patrol Read started
| mfi0: 2020 (303022694s/0x0020/info) - Patrol Read complete

This is normal.  Patrol read scans for potential disk errors.
If you want more info in real time, you can set
hw.mfi.event_class="-2"
in /boot/loader.conf to get more detail, or use MegaCli to get the full
event log.

| mfi0: COMMAND 0xff80005a7870 TIMEOUT AFTER 43 SECONDS
| mfi0: COMMAND 0xff80005a7ed0 TIMEOUT AFTER 58 SECONDS

This is usually okay; it is saying that some commands are taking
a while.  If a command never completes, then that is a problem.
The RAID controller can decide the order in which it completes
commands, so some can take a while.
 
| pciconf:
| m...@pci0:25:0:0: class=0x010400 card=0x1f0c1028 chip=0x00601000 rev=0x04 hdr=0x00
| vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
| device = 'SAS1078 PCI-X Fusion-MPT SAS'
| class  = mass storage
| subclass   = RAID

FWIW, that is a bad description, since it claims to be the SAS card and
not the RAID controller.

Doug A.


Re: System deadlock when using mksnap_ffs

2008-11-13 Thread Doug Ambrisko
Kostik Belousov writes:
| On Thu, Nov 13, 2008 at 02:45:14AM -0800, Jeremy Chadwick wrote:
[snip]
| > If he can press Control-T, it means SIGINFO can be sent to the
| > mksnap_ffs process, and the process responds with that information.  So,
| > the system is not deadlocked -- meaning, I believe what he experiences
| > is what others experience (the system becomes completely unusable during
| > mksnap_ffs running, but DOES NOT hang or lock up, it just becomes so
| > god-awful slow that processes on the machine literally sit and spin for
| > minutes at a time).
| 
| Unless NOKERNINFO is specified in the local flags in the controlling
| terminal termios, kernel prints one line summary as shown above. This is
| done from the tty discipline input handler (or whatever it is in new tty
| code). No process cooperation is required. On the other hand, actually
| delivering SIGINFO and getting output from the process-installed
| handler do require process to either executing usermode or sleeping
| interruptible.

Also note that "deadlock" is not just a locking issue; it can involve
other dependency chains.  For example, the buffer cache hits its max
usage, so the buffer daemon needs to flush things out, but it can't,
since flushing needs a buffer, and it can't get one until some are
flushed.  Things get really bad when the buffer daemon needs a buffer but
can't get one!  In theory it can use "emergency space" reserved just for
it to get out of this situation, but if the buffer cache is fragmented
such that all available buffers are too small, then the buffer daemon
is stuck on itself.  Note that everything keeps working except anything
that touches the buffer cache, such as a program being loaded from disk.
A program already in memory is okay.
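The "emergency space" idea can be sketched as a toy buffer pool in C,
assuming a fixed number of buffers held back that only the flushing daemon
may take.  This is hypothetical and not the vfs_bio.c code; it only shows
why a reserve the daemon cannot use is no reserve at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy buffer pool (hypothetical; not vfs_bio.c).  Ordinary writers may
 * only drain the pool down to `reserve` free buffers; the flushing daemon
 * may dip into the reserve so that it can always make progress. */
struct toy_pool {
    unsigned free;     /* buffers currently available */
    unsigned reserve;  /* buffers held back for the flusher */
};

/* Try to take one buffer; false means the caller would have to sleep. */
static bool pool_get(struct toy_pool *p, bool is_flusher)
{
    unsigned floor = is_flusher ? 0 : p->reserve;

    if (p->free <= floor)
        return false;
    p->free--;
    return true;
}

/* A flushed buffer goes back to the pool. */
static void pool_put(struct toy_pool *p)
{
    p->free++;
}
```

With 4 free buffers and a reserve of 2, ordinary writers stop after 2
while the flusher can still get one; with no reserve, the flusher blocks
just like everyone else.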

To really get a good picture of this you need to look at the
various buffer cache variables via ddb (i.e. hi, low, running, etc.).
A while back I wrote a debugging function to dump the state of
things every minute or so.  There are various loops you can get into,
so then you start playing whack-a-mole.  Usually the first bug keeps
you from hitting the 2nd, 3rd and so on, adding to the fun.
Unfortunately there isn't one magic bullet.

These are not new problems; we hit them in 4.X.  I did start
to go over some of these issues with Tor but ran into ENOTIME on my
side :-(

Snapshots can take a very long time to make, depending on the amount
of data they have to capture, and during that time the file system
effectively has to lock everyone else out or the snapshot will be
wrong.  This then leads to a need for a good journaling fs that
can be used on "big" disks ("big" isn't that big anymore).

Doug A.


Re: System deadlock when using mksnap_ffs

2008-11-12 Thread Doug Ambrisko
Kostik Belousov writes:
| On Wed, Nov 12, 2008 at 07:49:28PM +, Tim Bishop wrote:
| > On Wed, Nov 12, 2008 at 05:58:26PM +, Tim Bishop wrote:
| > > I run the mksnap_ffs command to take the snapshot and some time later
| > > the system completely freezes up:
| > > 
| > > paladin# cd /u2/.snap/
| > > paladin# mksnap_ffs /u2 test.1
| > 
| > Someone (not named because they choose not to reply to the list) gave me
| > the following patch:
| > 
| > --- sys/ufs/ffs/ffs_snapshot.c.orig Wed Mar 22 09:42:31 2006
| > +++ sys/ufs/ffs/ffs_snapshot.c  Mon Nov 20 14:59:13 2006
| > @@ -282,6 +282,8 @@ restart:
| > if (error)
| > goto out;
| > bawrite(nbp);
| > +   if (cg % 10 == 0)
| > +   ffs_syncvnode(vp, MNT_WAIT);
| > }
| > /*
| >  * Copy all the cylinder group maps. Although the
| > @@ -303,6 +305,8 @@ restart:
| > goto out;
| > error = cgaccount(cg, vp, nbp, 1);
| > bawrite(nbp);
| > +   if (cg % 10 == 0)
| > +   ffs_syncvnode(vp, MNT_WAIT);
| > if (error)
| > goto out;
| > }
| > 
| > With the description:
| > 
| > "What can happen is on a big file system it will fill up the buffer
| > cache with I/O and then run out.  When the buffer cache fills up then no
| > more disk I/O can happen :-(  When you do a sync, it flushes that out to
| > disk so things don't hang."
| > 
| > It seems to work too. But it seems more like a workaround than a fix?
| 
| It looks hackish, but in fact it is not that wrong, and I even say that
| it provides reasonable workaround.
| 
| The usual way to prevent wdrain deadlock is to issue bwillwrite() call
| before any vnode lock is taken. This is sufficient for most VFS syscalls
| that typically put dozen or less dirty buffers into delayed write
| queue.
| 
| Snapshot creation does not call bwillwrite() at all, and then does a lot
| of async writes, completely saturating buffer cache with dirty buffers.
| bwillwrite cannot be called after the vnode is locked, and just forcing
| a sync for the embryonic snapshot vnode is good enough.
| 
| The 10 counter is debatable, but debate shall be postponed until the patch
| goes into tree. I ask an anonymous submitter to commit it. Thanks !

I plan to commit it tomorrow, since I sent it to Tim to test.  The 10 can
be tuned, but it has kept a bunch of machines at work up.  Glad people
don't think it is too wrong :-)  It could probably be made a little more
dynamic, but I wonder whether that would show any real performance
difference, and it might risk more bugs.
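A toy simulation in C of what the patch changes, assuming the worst case
in which none of the async bawrite() buffers complete on their own until
ffs_syncvnode(vp, MNT_WAIT) waits for them.  The function and struct names
are hypothetical; only the placement of the `cg % N == 0` test after each
write mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical; not ffs_snapshot.c): each cylinder group copied
 * into the snapshot leaves one more buffer outstanding in the cache.
 * Syncing the snapshot vnode waits for all of them, emptying the count. */
struct snap_result {
    unsigned peak;  /* most buffers outstanding at once */
    bool stalled;   /* true if the cache ran out before the snapshot ended */
};

static struct snap_result simulate_snapshot(unsigned ncg, unsigned nbuf,
                                            unsigned sync_every)
{
    struct snap_result r = { 0, false };
    unsigned outstanding = 0;

    for (unsigned cg = 0; cg < ncg; cg++) {
        outstanding++;                  /* bawrite(nbp): async, stays queued */
        if (outstanding > r.peak)
            r.peak = outstanding;
        if (outstanding > nbuf) {
            r.stalled = true;           /* no buffer left for this write */
            break;
        }
        if (sync_every != 0 && cg % sync_every == 0)
            outstanding = 0;            /* ffs_syncvnode(vp, MNT_WAIT) */
    }
    return r;
}
```

With 1000 cylinder groups and room for 100 buffers, never syncing runs out
at the 101st write, while syncing every 10 groups never has more than 10
outstanding.  The same interval of 10 still stalls a cache of only 8
buffers, which is why the constant is worth tuning.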

Doug A.


Re: System deadlock when using mksnap_ffs

2008-11-12 Thread Doug Ambrisko
Jeremy Chadwick writes:
[snip]
| The rest of the below information is good -- but I'm confused about
| something: is there anyone out there who can use mksnap_ffs on a
| filesystem (/usr is a good test source) and NOT experience this
| deadlocking problem?  Literally *every* FreeBSD box I have root access
| to suffers from this problem, so I'm a little baffled why we end-users
| need to keep providing debugging output when it should be easy as pie
| for a developer to do "dump -0 -L -a -f /path/fs.dump /usr" and watch
| their system wedge.

We can at work, but we have a bunch of other patches.  There are a
few problems with the buffer cache:
1)  The buffer daemon can't use the space that is reserved for it,
    yet to flush some stuff it needs to use more buffers.
2)  The buffer cache can get fragmented, preventing the large I/O
    that the buffer daemon may need.
3)  Other issues ...
I have a fix for "1"; it is pretty easy.  I have a hack'ish fix for "2"
in that I make all requests use the max size, so the cache can't get
fragmented; there is no code to defrag it and it isn't trivial to defrag
the memory.  I have fixes for some other issues, but there were some
review issues with them.  I might just commit the fixes for 1 and 2.
They make things better and there were no objections at the time.
We have the patches in shipping products.
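The fragmentation in item 2, and why the max-size hack avoids it, shows
up even in a toy first-fit allocator.  This is a hypothetical sketch (an
8-page space with 1- and 2-page requests), not the kernel's buffer space
management.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGES 8   /* size of the toy buffer space, in pages */

/* Toy first-fit allocator over a page map (hypothetical; not the real
 * buffer-cache KVA code). */
struct toy_space {
    bool used[TOY_PAGES];
};

static void mark(struct toy_space *s, int start, int npages, bool v)
{
    for (int i = start; i < start + npages; i++)
        s->used[i] = v;
}

/* Return the start of the first run of `npages` free pages, or -1. */
static int space_alloc(struct toy_space *s, int npages)
{
    for (int start = 0; start + npages <= TOY_PAGES; start++) {
        int n = 0;
        while (n < npages && !s->used[start + n])
            n++;
        if (n == npages) {
            mark(s, start, npages, true);
            return start;
        }
    }
    return -1;
}

static void space_free(struct toy_space *s, int start, int npages)
{
    mark(s, start, npages, false);
}

/* Total free pages, whether or not they are contiguous. */
static int space_free_pages(const struct toy_space *s)
{
    int n = 0;
    for (int i = 0; i < TOY_PAGES; i++)
        n += !s->used[i];
    return n;
}
```

Eight 1-page buffers with every other one freed leave 4 free pages but no
room for a 2-page request; if every buffer is allocated at the 2-page
maximum, any freed slot fits the next request.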

I can try to do some experiments at work like you said since I 
had similar things working before and it is pretty easy to put in
printf's to see the issue.

Doug A.


Re: Management interface for cards powered by the "mfi" driver?

2008-06-18 Thread Doug Ambrisko
Karl Denninger writes:
[snip]
| Ok, wiped the src tree, re-cvs'd out the RELENG_7, rebuild world and kernel
| and reinstalled (nice fast machine eh?)
 
Not needed since FreeBSD 6.2, if I recall correctly.  I forget whether I
got it into 6.1.

| Anyway, no change:
| 
| dbms# uname -v
| FreeBSD 7.0-STABLE #1: Wed Jun 18 14:43:29 CDT 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC 
| 
| dbms# megacli -adpCount
| 
| Controller Count: 0. 
| 
| dbms# megacli -Cfgdsply -a0
|  
| Failed to get ControllerId List.
| Failed to get CpController object.
| 
| Still no joy
| 
| dbms# kldstat
| Id Refs Address    Size     Name
|  1   17 0xc040     943140   kernel
|  2    1 0xc0d44000 6a2c4    acpi.ko
|  3    1 0xc5534000 7000     linprocfs.ko
|  4    3 0xc553b000 22000    linux.ko
|  5    1 0xc5585000 3000     linsysfs.ko
|  6    1 0xc7a34000 3000     daemon_saver.ko
|  7    1 0xc7c2d000 2000     mfi_linux.ko
| 
| Says I got the proper KLDs loaded.
|
| dbms# mount
| /dev/mfid0s1a on / (ufs, local, soft-updates)
| devfs on /dev (devfs, local)
| /dev/mfid0s1e on /dbms (ufs, local, soft-updates)
| /dev/mfid0s1d on /usr (ufs, local, soft-updates)
| linprocfs on /usr/compat/linux/proc (linprocfs, local)
| linsysfs on /usr/compat/linux/sys (linsysfs, local)
| 
| The two linux "look-sees" are there.
| 
| So it looks like all the pre-reqs are there, but it still doesn't work.
| 
| Here's the ID on the card and volume:
| 
| mfi0: 524 (267116948s/0x0020/0) - Adapter ticks 267116948 elapsed 61s: Time established as 06/18/08 15:09:08; (61 seconds since power on)
| mfid0:  on mfi0
| mfid0: 237464MB (486326272 sectors) RAID volume '' is optimal
| 
| What am I missing?

What is the Linux version sysctl set to?  Also, I think you need to make
sure mfi_linux.ko is loaded before linsysfs is mounted, so you get the
emulation hooks.  Verify that via:
head /compat/linux/sys/class/scsi_host/*/proc_name
One of the results should say:
megaraid_sas
or MegaCli won't think the card is there.
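The same check can be scripted.  Below is a minimal C sketch using POSIX
glob(3) that reports whether any proc_name file starts with megaraid_sas;
the function name is hypothetical, and only the path pattern and the
expected string come from the message above.

```c
#include <assert.h>
#include <glob.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if any file matching `pattern` has "megaraid_sas" as its
 * first line, i.e. the same check as eyeballing the head(1) output. */
static bool megaraid_sas_visible(const char *pattern)
{
    glob_t g = {0};
    bool found = false;

    if (glob(pattern, 0, NULL, &g) != 0) {
        globfree(&g);          /* no matches, or a glob error */
        return false;
    }
    for (size_t i = 0; i < g.gl_pathc && !found; i++) {
        char line[64];
        FILE *f = fopen(g.gl_pathv[i], "r");

        if (f == NULL)
            continue;
        if (fgets(line, sizeof line, f) != NULL) {
            line[strcspn(line, "\n")] = '\0';   /* drop the newline */
            found = strcmp(line, "megaraid_sas") == 0;
        }
        fclose(f);
    }
    globfree(&g);
    return found;
}
```

On the machine above it would be called as
megaraid_sas_visible("/compat/linux/sys/class/scsi_host/*/proc_name").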

The adapter count is a good way to see whether your file system & Linux
version sysctl stuff is in the right state.  Once MegaCli detects the card,
the ioctls should work.  6-stable, 7-stable and -current all have the
latest code to support all of the ioctls that Linux provides for MegaCli.
MegaCli does various things to try to find the card in Linux that are
really strange IMHO.  For FreeBSD it doesn't have to be that complicated.
Unfortunately, they have not released a FreeBSD MegaCli, which they
could ...

Doug A.


Re: lpbb broken in 6.x?

2008-03-25 Thread Doug Ambrisko
Bruce M Simpson writes:
| Ian Smith wrote:
| > To finish off completely hijacking your thread :) does anyone know of
| > anything that can run a master/slave interface like pcf(4) which appears
| > to have been an ISA bus only device?  I don't have C skills to write
| > one, though 400kHz master and slave routines in AVR asm were fun :)
| >
| > Later: after nearly losing this in a pine crash (don't ask), I've since
| > seen John's reply to your later message.  Could it be that smbus or
| > something is also using iicbus rather than something messing with ppbus? 
| 
| Thanks for the hints. I don't have smbus in the kernel, nor do I have 
| any other i2c device drivers loaded in the system.
| 
| I stopped using smbus when it became pretty clear that it wasn't doing 
| anything useful for me (it could never see CPU fan readouts or anything 
| like that when I tried it on 3 different PIII era systems).

FWIW, this really isn't a fault of smbus.  Some monitoring chips only
have an I/O-type interface, others only have i2c, and some have both.
Then it depends on what the manufacturer connects, and whether they
enable the i2c controller on the motherboard.  I've used smbus on a
bunch of HW and not on others.  It depends on what they do and how
they set up addressing.  At a prior company I set up an LCD display
module to interface to the MB i2c bus.  I prototyped it by soldering
wires onto a DIMM.  Some MBs have an i2c header on board and others
have it routed to PCI slots.

Doug A.


Re: Dell PERC6?

2008-01-17 Thread Doug Ambrisko
Vlad GALU writes:
| On 1/17/08, Ferdinand Goldmann <[EMAIL PROTECTED]> wrote:
| > Hi!
| >
| > I am in the process of buying new Dell hardware, mainly the 2950 III.
| > According to various postings I found, the PERC6/i Controller _should_ work
| > with FreeBSD 6.3. Does anyone successfully use a 2950 III with PERC6/i
| > controller and can confirm this?
| >
| > Sorry if the question sounds stupid, but as I cannot find any references to
| > the PERC6 in either documentation or source code I am a bit confused, and I
| > wanted to make sure it works before shelling out my employers money. :-)
| >
| > Many thanks for any enlightenment on this subject,
| > kind regards,
| > Ferdinand
|
| Don't know if this is useful to you, but I'm using 7.0 on the same
| Dell platform, and hence on the same controller, with very good
| results. I think the mfi(4) manpage should be updated too :)

It's been updated in -current.  Yes, PERC6 support is in 6.3 & 7.0.

Thanks for the prompt,

Doug A.


Re: Don't buy AMD products (was Re: Xorg and ATI card query.)

2007-03-13 Thread Doug Ambrisko
Kip Macy writes:
| Please be very careful. The only real alternative (Intel comes and
| goes) is Nvidia whose driver is binary-only for i386 (no amd64
| support) and has a history for being notoriously buggy. I only buy ATI
| because of the problems I keep seeing people have with the Nvidia
| driver. I have a friend who has basically abandoned his dual-head
| Nvidia card due to recurring issues.

One thing that is a plus with nv is that X has some support for it,
whereas the newer ATI cards have no support :-(  I was a fan of ATI
since it was easier to get support.  Now I'm starting to lean towards
Nvidia :-(

Doug A.


Re: Xorg and ATI card query.

2007-03-12 Thread Doug Ambrisko
Daniel O'Connor writes:
| On Tuesday 13 March 2007 05:10, Yann Golanski wrote:
| > I have an ATI Radeon X1950 Sapphire and I am trying to get X/FreeBSD
| > working with it.  My system is a clean install of FreeBSD.   I've managed to
| > get VESA to "work" but cannot get much more than that.
| 
| There is no open source support for this card (alas). It's VESA or fglrx.
| 
| > fglrx gives me an error at compile time since I do not have
| > /usr/X11R6/bin/moc installed.
| 
| Is this using the FreeBSD port at http://www.fglrx-freebsd.com/index.php? If 
| so you could just install moc which is part of qt.

FWIW, I just went through this exercise for my new laptop.  VESA doesn't
do 1920x1200 :-( and it isn't available on amd64 :-(  So I wanted to use
the Linux fglrx.  The one that he has is too old to support my laptop.
Realize that although he does compile some misc. tools, they are not
needed for X to work.  Really, he is taking the Linux X drivers
(fglrx_drv.o & libfglrxdrm.a) and putting them into
/usr/X11R6/lib/modules/drivers.  The caveat is that you need a version
old enough that it doesn't link against Linux-specific things like
pthreads etc., which the newest ones do.  Another caveat is that the
older versions were built against X.org 6.8, so you need an X.org of
that version.  In 6.9 some structures changed, leading to a core dump :-(

I ended up building and installing my own X.org 6.8, and then installing
the typical -current X stuff.  The next thing I'm going to work on is
getting the 32-bit X server to run on a 64-bit kernel so I can switch
over to 64-bit.

Doug A.


Re: running mksnap_ffs

2007-01-16 Thread Doug Ambrisko
Kris Kennaway writes:
| Thanks for clarifying.  Hopefully you and Tor can get something
| committed soon!

I'm not sure about that.  I have to see what has changed since then.
That was ... uhm a year ago when I dropped the ball.

It's probably a good task for me to look at in the context of -current
again.  I should have disks to build a 1.5T file system to play with.

Doug A.


Re: running mksnap_ffs

2007-01-16 Thread Doug Ambrisko
Kris Kennaway writes:
| On Tue, Jan 16, 2007 at 09:26:47PM +0100, Willem Jan Withagen wrote:
| > Doug Ambrisko wrote:
| > >| > or things can get wedged.  We have some other patches as well that might
| > >| > be required.  As a hack on a local server we have been using snap shots
| > >| > to do a "hot" back-up of a data base each morning.  This is based on
| > >| > 6.x.
| > >|
| > >| What do you mean by "get wedged"?  Are you seeing a deadlock, and if
| > >| so then what are the details?  When you say 6.x, do you mean
| > >| up-to-date RELENG_6?  There were various snapshot deadlock fixes
| > >| committed over the past year including some in the past few months.
| > >
| > >The file-system would come to a stop, processes stuck on bio, snap-shots
| > >not finishing etc.  This was caused by the system running out of usable
| > >buffers.  The change forces them to be flushed every so often.  This is
| > >independant of locking.  10 might be to aggresive.  Some scaling of
| > >nbuf would probably be better.
| > 
| > When I run mksnap_ffs it runs to the point where ANY access to the 
| > filesystem gives that process a lockup.
| 
| Yes, that is expected.  Actually it begins when something accesses the
| directory in which the snapshot is being made, since that causes the
| parent directory to be locked...then something tries to access the
| parent directory, which eventually cascades back to the root.
| 
| > Getting the file system back is only thru "hard reboot". Trying to do it 
| > the gentle way locks the whole system.
| 
| Or waiting until the snapshot operation finishes.  You (still) haven't
| determined that it's actually hanging as opposed to just waiting for
| the snapshot operation to finish.

In my case it was easy to see that all the buffers were exhausted and
the system was churning, waiting for some to become available.  Since
they were all used up, it never recovered.  By sync'ing the buffers they
got cleaned up, and then the system never ran out.  The snapshot was then
able to finish.  You can see this happen via the debugger; that is how I
traced this problem.  There are other issues with the buffer daemon as
well.  We hit these since we run with a relatively low nbuf.  The buffers
can get so fragmented that the daemon can't flush things, since it can't
get a full-size buffer.  Another problem is that it can end up waiting on
itself, since the current code can't use its emergency space to flush
stuff.  You can see this via ps etc.  It's not a good thing if the buffer
daemon is waiting on itself :-(

We have patches for this as well, but they need some more work.  I was
working with Tor on this, but then I got swamped at work with our 4.X -> 6.X
and platform transition.  All I can say is that we don't suffer from
these problems now :-)  I have printfs that log this stuff when some of
these bugs are hit.  Now the system survives those lock-up points.

Doug A.


Re: Dell hardware raid 0 (sas5ir) or gmirror?

2007-01-16 Thread Doug Ambrisko
Joe Koberg writes:
| Josef Karthauser wrote:
| > On Mon, Jan 15, 2007 at 11:21:06AM +, Josef Karthauser wrote:
| >> I'm purchasing a new server, and was wondering what anyone thought 
| >> about whether to pay extra for the SAS5IR card so I can RAID0 the 
| >> two drives, or whether to just rely on gmirror. My worry about the 
| >> former is that I can't seem to find management tools for 
| >> controlling the hardware controller. What if one of the drives 
| >> fails? How would I know?
| > 
| > Of course I mean RAID1!
| 
| I just bought two Dell PE-1950's to use as routers. They have LSI Logic 
| PERC/5i's attached to 80GB SATA drives.  I am pretty sure this is the 
| same card used for SAS.
| 
| One thing is for sure, the mfi(4) card and driver aren't shy!  See below 
| for examples of the kernel messages I get regularly.  I am sure drive 
| failure would be well noted.

FYI, you can silence it to your level of comfort via:
hw.mfi.event_class
in /boot/loader.conf.  The values are:
MFI_EVT_CLASS_DEBUG =    -2,
MFI_EVT_CLASS_PROGRESS = -1,
MFI_EVT_CLASS_INFO =      0,
MFI_EVT_CLASS_WARNING =   1,
MFI_EVT_CLASS_CRITICAL =  2,
MFI_EVT_CLASS_FATAL =     3,
MFI_EVT_CLASS_DEAD =      4
The new default is info, so it's a little quieter.  I'd suggest some
care in going above info, since a drive failure will come through, but
the message that it is okay again will not.  So if you are waiting for
that, you won't know.  Here, we like the debug and progress stuff put into
/var/log/messages.  It makes support a lot easier.
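A small C sketch of how the threshold behaves, assuming (as the ordering
of the values and the note about the quieter default suggest) that an
event is reported when its class is at or above hw.mfi.event_class.  The
enum values are copied from the list above; the helper names are
hypothetical and not part of mfi(4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Event classes as listed for hw.mfi.event_class (lower = more verbose). */
enum mfi_evt_class {
    MFI_EVT_CLASS_DEBUG    = -2,
    MFI_EVT_CLASS_PROGRESS = -1,
    MFI_EVT_CLASS_INFO     = 0,
    MFI_EVT_CLASS_WARNING  = 1,
    MFI_EVT_CLASS_CRITICAL = 2,
    MFI_EVT_CLASS_FATAL    = 3,
    MFI_EVT_CLASS_DEAD     = 4
};

/* Assumed semantics: report events at or above the threshold class. */
static bool event_reported(enum mfi_evt_class ev, int threshold)
{
    return (int)ev >= threshold;
}

/* Parse a loader.conf-style value such as "-2"; reject junk and values
 * outside the DEBUG..DEAD range. */
static bool parse_event_class(const char *s, int *out)
{
    char *end;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0' ||
        v < MFI_EVT_CLASS_DEBUG || v > MFI_EVT_CLASS_DEAD)
        return false;
    *out = (int)v;
    return true;
}
```

With the threshold at WARNING, a CRITICAL event is reported but an INFO
one is not, which matches the "okay again" gap described above if that
message is logged at info, as the note implies.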

Doug A.


Re: running mksnap_ffs

2007-01-16 Thread Doug Ambrisko
Kris Kennaway writes:
| On Tue, Jan 16, 2007 at 10:13:57AM -0800, Doug Ambrisko wrote:
|
| > FWIW, with this patch I find making snap-shots a lot more reliable:
| >
| > --- sys/ufs/ffs/ffs_snapshot.c.orig Wed Mar 22 09:42:31 2006
| > +++ sys/ufs/ffs/ffs_snapshot.c  Mon Nov 20 14:59:13 2006
| > @@ -282,6 +282,8 @@ restart:
| > if (error)
| > goto out;
| > bawrite(nbp);
| > +   if (cg % 10 == 0)
| > +   ffs_syncvnode(vp, MNT_WAIT);
| > }
| > /*
| >  * Copy all the cylinder group maps. Although the
| > @@ -303,6 +305,8 @@ restart:
| > goto out;
| > error = cgaccount(cg, vp, nbp, 1);
| > bawrite(nbp);
| > +   if (cg % 10 == 0)
| > +   ffs_syncvnode(vp, MNT_WAIT);
| > if (error)
| > goto out;
| > }
| >
| > or things can get wedged.  We have some other patches as well that might
| > be required.  As a hack on a local server we have been using snap shots
| > to do a "hot" back-up of a data base each morning.  This is based on
| > 6.x.
|
| What do you mean by "get wedged"?  Are you seeing a deadlock, and if
| so then what are the details?  When you say 6.x, do you mean
| up-to-date RELENG_6?  There were various snapshot deadlock fixes
| committed over the past year including some in the past few months.

The file system would come to a stop, with processes stuck on bio,
snapshots not finishing, etc.  This was caused by the system running out
of usable buffers.  The change forces them to be flushed every so often.
This is independent of locking.  10 might be too aggressive.  Some scaling
with nbuf would probably be better.
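One way to do the nbuf scaling, sketched in C: derive the sync interval
from nbuf instead of hard-coding 10.  The function name and the divisor
are hypothetical tuning choices, not anything committed; nbuf is the
kernel's buffer count.

```c
#include <assert.h>

/* Hypothetical policy (not committed code): sync the snapshot vnode after
 * about 1/divisor of nbuf buffers have been queued, rather than after a
 * fixed 10 cylinder groups.  The divisor is an arbitrary tuning knob. */
static unsigned snap_sync_interval(unsigned nbuf, unsigned divisor)
{
    assert(divisor > 0);
    unsigned interval = nbuf / divisor;
    return interval > 0 ? interval : 1;   /* at worst, sync every group */
}
```

The patch's test would then read
`if (cg % snap_sync_interval(nbuf, 8) == 0)`; a divisor of 8 caps the
outstanding snapshot writes at about an eighth of the cache, so a machine
with a low nbuf syncs more often and a large one less often.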

Doug A.


Re: running mksnap_ffs

2007-01-16 Thread Doug Ambrisko
Scott Oertel writes:
| Kris Kennaway wrote:
| > On Tue, Jan 02, 2007 at 09:06:24PM +0100, Willem Jan Withagen wrote:
| >   
| >> Hi,
| >>
| >> I got the following Filesystem:
| >> Filesystem  Size  Used  Avail Capacity   iused     ifree %iused
| >> /dev/da0a   1.3T  422G   823G    34%    565952 182833470    0%
| >>
| >> Running of a 3ware 9550, on a dual core Opteron 242 with 1Gb.
| >> The system is used as SMB/NFS server for my other systems here.
| >>
| >> I would like to make weekly snapshots, but manually running mksnap_ffs 
| >> freezes access to the disk (I sort of expected that) but the process 
| >> never terminates. So I let it sit overnight, but looking at gstat did not 
| >> reveal any activity whatsoever...
| >> The disk was not released, mksnap_ffs could not be terminated.
| >> And things resulted in me rebooting the system.
| >>
| >> So:
| >>  - How long should I expect making a snapshot to take:
| >>5, 15, 30min, 1, 2 hour or even more???
| >
| > Yes :) Snapshots were not designed for use in this way (they were
| > designed to support background fsck and allow faster system recovery
| > after power failure), so they don't scale as well as you might like on
| > large filesystems.
| 
| If snapshots were designed to support background fsck, then why did they 
| not make it more scalable? If you can't create a snapshot without the 
| system locking up, that means fsck won't be able to either, making 
| background fsck worthless for systems with large storage.

FWIW, with this patch I find making snap-shots a lot more reliable:

--- sys/ufs/ffs/ffs_snapshot.c.orig Wed Mar 22 09:42:31 2006
+++ sys/ufs/ffs/ffs_snapshot.c  Mon Nov 20 14:59:13 2006
@@ -282,6 +282,8 @@ restart:
if (error)
goto out;
bawrite(nbp);
+   if (cg % 10 == 0)
+   ffs_syncvnode(vp, MNT_WAIT);
}
/*
 * Copy all the cylinder group maps. Although the
@@ -303,6 +305,8 @@ restart:
goto out;
error = cgaccount(cg, vp, nbp, 1);
bawrite(nbp);
+   if (cg % 10 == 0)
+   ffs_syncvnode(vp, MNT_WAIT);
if (error)
goto out;
}

or things can get wedged.  We have some other patches as well that might
be required.  As a hack on a local server we have been using snap shots
to do a "hot" back-up of a data base each morning.  This is based on
6.x.

Doug A.


Re: 6.x from i386 to amd64

2006-11-01 Thread Doug Ambrisko
Peter Jeremy writes:
| It would be nice to see the 32-bit emulation improved so that it is
| possible to build/run the i386 versions of ports on an amd64 system.
| This would be the best of both worlds.  If I had any free time, I
| would even work on this myself.

I have this working well enough for everything that we build here.
Our new build machines are running the amd64 kernels but we build for
i386.

After 6.2 is out I'll merge my uname/getosreldate changes to -stable
and create a stub script to set the environment variables.  We
do some hacks to copy the host's ps, top, mount-type things into
a compat directory so that the host's versions are run.
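Current FreeBSD documents this kind of override in uname(1), which honors
environment variables of the form UNAME_<flag> (e.g. UNAME_m, UNAME_r).
The message doesn't say which variables the stub script sets, so the C
sketch below is a hypothetical illustration of the lookup pattern only:
prefer a non-empty override, else fall back to what the amd64 kernel
reports.

```c
#define _POSIX_C_SOURCE 200112L   /* for setenv()/unsetenv() in callers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the environment override `name` if it is set and non-empty,
 * otherwise the value the running kernel reports (hypothetical helper). */
static const char *env_override(const char *name, const char *kernel_value)
{
    const char *v = getenv(name);
    return (v != NULL && *v != '\0') ? v : kernel_value;
}
```

A stub script that exports UNAME_m=i386 before starting a build would make
such a lookup report i386 on an amd64 host.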

It seems a few people are interested in this and it seems to be
working well for us & myself.

Maybe Kris can then convert his ports clusters over to amd64 OS'es
to build everything.

Doug A.


Re: Megacli fails to find SAS adapter

2006-10-11 Thread Doug Ambrisko
Sven Willenberger writes:
| On Tue, 2006-10-10 at 22:11 -0700, Doug Ambrisko wrote:
| > Sven Willenberger writes:
| > | FreeBSD 6.2-PRERELEASE #3: Tue Oct 10 13:58:29 EDT 2006
| > | LSi 8480e SAS Raid card

| Adding mfi_linux_enable="YES" to /boot/loader.conf did do the trick of
| having the device added to the system:
| 
| # cat /compat/linux/sys/class/scsi_host/host*/proc_name
| (null)
| megaraid_sas
| (null)
| 
| # sysctl compat.linux
| compat.linux.oss_version: 198144
| compat.linux.osrelease: 2.6.12
| compat.linux.osname: Linux
| 
| Although the MegaCli utility no longer complains about not finding a
| controller, it sadly does nothing else either (except dump core on
| certain commands):
| 
| # ./MegaCli -AdpAllinfo -a0

I usually start with that.  It should work okay.  Check your
/compat/linux/dev directory for stuff.  It might have created
null and some other entries; look at the dates.  Those nodes could
be wrong.  We have an empty /compat/linux/dev directory.
 
| # ./MegaCli -AdpGetProp SpinupDriveCount -a0
| 
| Segmentation fault (core dumped)
| # ./MegaCli -LDGetNum -a0
| 
| Failed to get VD count on adapter -9993.
| # ./MegaCli -CfgFreeSpaceinfo -a0
| 
| Failed to initialize RM
| 
| and so on ... I am guessing this is an issue with the MegaCli software
| now; needless to say I certainly doubt that this will allow me to flash
| the card bios (or even if it *could*, I would be leery of the process).

If one command doesn't work, the rest probably won't either.  I would
be cautious about flashing the card.  It should work, but I haven't
tried it.  If this is your only card, you have a lot at risk!  On prior
Adaptec and LSI cards, a failed flash left the card toast.  MegaCli has
some issues as well.

Doug A.


Re: Megacli fails to find SAS adapter

2006-10-10 Thread Doug Ambrisko
Sven Willenberger writes:
| FreeBSD 6.2-PRERELEASE #3: Tue Oct 10 13:58:29 EDT 2006
| LSi 8480e SAS Raid card
| 
| mount:
| linprocfs on /compat/linux/proc (linprocfs, local)
| linsysfs on /compat/linux/sys (linsysfs, local)
| /dev/mfid0s1d on /usr/local/pgsql (ufs, local, noatime)
| 
| dmesg:
| mfi0: 2025 - PCI 0x041000 0x04411 0x041000 0x041002: Firmware initialization started (PCI ID 0411/1000/1002/1000)
| mfi0: 2026 - Type 18: Firmware version 1.00.00-0074
| mfi0: 2027 - Battery temperature is normal
| mfi0: 2028 - Battery Present
| mfi0: 2029 - PD 39(e1/s255) event: Enclosure (SES) discovered on PD 27(e1/s255)
| mfi0: 2030 - PD 56(e2/s255) event: Enclosure (SES) discovered on PD 38(e2/s255)
| mfi0: 2031 - PD 39(e1/s255) event: Inserted: PD 27(e1/s255)
| mfi0: 2032 - Type 29: Inserted: PD 27(e1/s255) Info: enclPd=27, scsiType=d, portMap=10, sasAddr=50015b2180001839,
| mfi0: 2033 - PD 56(e2/s255) event: Inserted: PD 38(e2/s255)
| 
| pkg_info:
| linux_base-fc-4_9
| 
| I have downloaded the Megacli and, using rpm2cpio extracted
| MegaCli-1.01.09-0.i386.rpm into my home directory.
| 
| ~/usr/sbin/MegaCli
| brandelf -t Linux usr/sbin/MegaCli
| 
| cd usr/sbin
| 
| # ./MegaCli -EncInfo -aALL
| 
| ERROR:Could not detect controller.
| # ./MegaCli -CfgDsply -aALL
| 
| ERROR:Could not detect controller.
| 
| Do I actually need to set up the links in /compat/linux/sys for the SAS
| raid card? or should this rpm be installed into the /compat/linux
| directory? I need to upgrade the firmware on this card as for some
| reason the webbios will not let me configure a Raid10 array and the only
| way I can see to upgrade the fw is to use the megacli utility.

Make sure the Linux ioctl module is loaded before linsysfs so that it
can register its hooks; kldstat(8) or your kernel config will tell you.
One sanity check is:
  dhcp194:ambrisko 11] cat /compat/linux/sys/class/scsi_host/host*/proc_name
  megaraid_sas
  (null)
  dhcp194:ambrisko 12] 

If you don't see megaraid_sas, then the Linux mfi module is missing and
it isn't going to work.  You also need to set:
  sysctl compat.linux.osrelease=2.6.12
or things won't work well.  This will probably break your fc-4_9 Linux
install until the updates to the Linux emulation are merged (maybe they
have been, but I don't think so).  Since MegaCli is a static binary, we
don't have a Linux base installed.
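The proc_name sanity check can also be scripted.  Below is a minimal C sketch, not part of any FreeBSD or LSI tool: `has_proc_name()` is a hypothetical helper that expands a glob such as `/compat/linux/sys/class/scsi_host/host*/proc_name` and reports whether the first line of any matching file is `megaraid_sas`.

```c
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: expand the glob `pattern` and return 1 if the
 * first line of any matching file equals `want`, else 0.  No match at
 * all (GLOB_NOMATCH) also yields 0.
 */
int
has_proc_name(const char *pattern, const char *want)
{
	glob_t g;
	size_t i;
	int found = 0;

	if (glob(pattern, 0, NULL, &g) != 0)
		return (0);
	for (i = 0; i < g.gl_pathc && !found; i++) {
		char line[64];
		FILE *fp = fopen(g.gl_pathv[i], "r");

		if (fp == NULL)
			continue;
		if (fgets(line, sizeof(line), fp) != NULL) {
			line[strcspn(line, "\n")] = '\0';	/* drop newline */
			found = (strcmp(line, want) == 0);
		}
		fclose(fp);
	}
	globfree(&g);
	return (found);
}
```

Called with the scsi_host glob and "megaraid_sas", it should return 1 under the same condition as the cat check: the Linux mfi shim has registered its host.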

Doug A.


Re: Dell 1950 does not properly respond to reboot and shutdown -p

2006-10-10 Thread Doug Ambrisko
John Baldwin writes:
| On Tuesday 10 October 2006 08:54, Bill Moran wrote:
| > In response to Doug Ambrisko <[EMAIL PROTECTED]>:
| > > Bruno Ducrot writes:
| > > | On Wed, Oct 04, 2006 at 02:07:12PM -0400, Bill Moran wrote:
| > > | > In response to Bruno Ducrot <[EMAIL PROTECTED]>:
| > > | > > Hi,
| > > | > > 
| > > | > > On Wed, Oct 04, 2006 at 12:28:35PM -0400, Bill Moran wrote:
| > > | > > > 
| > > | > > > A reboot causes the OS to halt, but the hardware just sits there on the
| > > | > > > shutdown screen.
| > > | > > > 
| > > | > > > A shutdown -p does the same.
| > > | > > 
| > > | > > What exactly are the last few lines?
| > > | > 
| > > | > (manually copied)
| > > | > 
| > > | > ...
| > > | > All buffers synced.
| > > | > Uptime: 1m16s
| > > | > 
| > > | 
| > > | Thanks.  Then this happen after print_uptime().
| > > | 
| > > | I believe one of the drivers register a shutdown_final (or
| > > | shutdown_post_sync) event that hang your system.  I think (though I
| > > | may be wrong) mfi may be that one.
| > > | 
| > > | It would help if you can add some printf in dev/mfi/mfi.c into the
| > > | mfi_shutdown() function in order to check if that assumption
| > > | is correct.
| > > 
| > > Some what related to this we have a local hack:
| > > 
| > > --- sys/kern/subr_bus.c.orig  Tue Jun 27 15:49:39 2006
| > > +++ sys/kern/subr_bus.c   Tue Jun 27 15:49:51 2006
| > > @@ -2906,6 +2906,7 @@ bus_generic_shutdown(device_t dev)
| > >   device_t child;
| > >  
| > >   TAILQ_FOREACH(child, &dev->children, link) {
| > > + DELAY(1000);
| > >   device_shutdown(child);
| > >   }
| > 
| > This patch seems to "fix" the problem.  I'm going to replace it with
| > some printfs and see if I can determine which driver is actually
| > causing the problem (hopefully it's only one).
| > 
| > Am I wrong in saying that the correct solution would be to identify the
| > driver that needs more time and implementing some sort of polling
| > mechanism to ensure the hardware is ready when the driver wants to
| > shut down?
| 
| Well, first let's see which driver it is. :)  You might be able to just
| remove the DELAY and add a printf and see which device is printed last.

I think it was in different drivers.  One of our configs has the base
hardware plus a bge NIC; the other has the base hardware plus two
dual-port em NICs.  The more NICs, the better the chance of a problem.

I've removed the hack from our kernel and I'm going to run the reboot
cycle test.  I don't think a printf will help, since I recall that
adding one "fixed" the problem, which is why I put the DELAY in :-(
It could be a generic problem on systems with a CPU fast enough to beat
the hardware at shutting down.  I'm not sure whether his system is
Dempsey or Woodcrest.  We use Woodcrest, and they are really fast.
Other machines might be "slow" enough that it's not a problem!  We
haven't seen it on our older platforms with the same kernel and
similar hardware configs.
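Bill's suggestion of polling instead of a fixed DELAY() might look roughly like the following.  This is a user-space C sketch under stated assumptions, not kernel code.  `ready` and `wait_us` are hypothetical callbacks standing in for a device status-register read and DELAY(9), and `struct sim_dev` is a made-up device that stays busy for a set number of polls.

```c
#include <stdbool.h>

/* Simulated device: reports busy for `busy_polls` checks, then ready. */
struct sim_dev {
	unsigned busy_polls;
	unsigned waited_us;	/* total time spent waiting */
};

bool
sim_ready(void *arg)
{
	struct sim_dev *d = arg;

	if (d->busy_polls > 0) {
		d->busy_polls--;
		return (false);
	}
	return (true);
}

void
sim_wait(void *arg, unsigned us)
{
	struct sim_dev *d = arg;

	d->waited_us += us;	/* a driver would call DELAY(us) here */
}

/*
 * Poll `ready` up to `max_polls` times, waiting `interval_us` between
 * checks.  Returns true once the device is ready, or false on timeout,
 * so the caller can report the stuck device instead of hanging.
 */
bool
poll_until_ready(bool (*ready)(void *), void (*wait_us)(void *, unsigned),
    void *arg, unsigned interval_us, unsigned max_polls)
{
	unsigned i;

	for (i = 0; i < max_polls; i++) {
		if (ready(arg))
			return (true);
		wait_us(arg, interval_us);
	}
	return (false);
}
```

Unlike an unconditional DELAY(1000) per child, this waits only as long as a device actually reports busy.  The bounded loop also keeps a wedged device from stalling shutdown forever.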

Doug A.


Re: Dell 1950 does not properly respond to reboot and shutdown -p

2006-10-04 Thread Doug Ambrisko
Bruno Ducrot writes:
| On Wed, Oct 04, 2006 at 02:07:12PM -0400, Bill Moran wrote:
| > In response to Bruno Ducrot <[EMAIL PROTECTED]>:
| > > Hi,
| > > 
| > > On Wed, Oct 04, 2006 at 12:28:35PM -0400, Bill Moran wrote:
| > > > 
| > > > A reboot causes the OS to halt, but the hardware just sits there on the
| > > > shutdown screen.
| > > > 
| > > > A shutdown -p does the same.
| > > 
| > > What exactly are the last few lines?
| > 
| > (manually copied)
| > 
| > ...
| > All buffers synced.
| > Uptime: 1m16s
| > 
| 
| Thanks.  Then this happen after print_uptime().
| 
| I believe one of the drivers register a shutdown_final (or
| shutdown_post_sync) event that hang your system.  I think (though I
| may be wrong) mfi may be that one.
| 
| It would help if you can add some printf in dev/mfi/mfi.c into the
| mfi_shutdown() function in order to check if that assumption
| is correct.

Some what related to this we have a local hack:

--- sys/kern/subr_bus.c.orig	Tue Jun 27 15:49:39 2006
+++ sys/kern/subr_bus.c	Tue Jun 27 15:49:51 2006
@@ -2906,6 +2906,7 @@ bus_generic_shutdown(device_t dev)
 	device_t child;
 
 	TAILQ_FOREACH(child, &dev->children, link) {
+		DELAY(1000);
 		device_shutdown(child);
 	}
 

It seemed like we were tearing things down too fast, and resources
were being pulled away from hardware that wasn't totally shut down
yet, or something like that.  I think this was worse when devices had
shared interrupts, but I forget the exact details.  It's been a long
time since I put in the hack and moved on to the next fire.  It seems
the more hardware we had in the machine, the worse the problem was.

This is just a hack and not a fix.

Doug A.


Re: Comtrol Rocketport driver is severely hosed under 6.x-STABLE

2006-09-07 Thread Doug Ambrisko
Karl Denninger writes:
| 
| There is a severe problem (or set of problems) with the Comtrol Rocketport
| driver under FreeBSD 6.x, to the point that the driver is basically unusable.
| 
| The driver is returning duplicate input frames and otherwise misbehaving
| badly.  There were no problems under FreeBSD 5.x.
| 
| Does anyone know what has changed in the tty subsystem between 5.x and 6.x,
| or, alternatively if there is no update on this, is there a KNOWN WORKING
| PROPERLY multiport serial board under 6.x?
| 
| This has totally hosed a number of my field installations when they attempted
| to go from the 5.x operating environment to 6.x!
| 
| Thanks in advance

Try this patch against 6.1 in /sys/dev/rp:

Index: rp.c
===
RCS file: /usr/local/cvsroot/freebsd/src/sys/dev/rp/rp.c,v
retrieving revision 1.67.2.1
diff -u -p -r1.67.2.1 rp.c
--- rp.c	8 Nov 2005 15:35:27 -   1.67.2.1
+++ rp.c	7 Sep 2006 18:19:44 -
@@ -37,15 +37,18 @@ __FBSDID("$FreeBSD: src/sys/dev/rp/rp.c,
 /* 
  * rp.c - for RocketPort FreeBSD
  */
+#include 
 
 #include "opt_compat.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -57,7 +60,7 @@ __FBSDID("$FreeBSD: src/sys/dev/rp/rp.c,
 #include 
 #include 
 
-static const char RocketPortVersion[] = "3.02";
+static const char RocketPortVersion[] = "1.0";
 
 static Byte_t RData[RDATASIZE] =
 {
@@ -116,6 +119,8 @@ Byte_t rp_sBitMapSetTbl[8] =
0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80
 };
 
+int next_unit_number = 0;
+int num_devices_found = 0;
 /***
 Function: sReadAiopID
 Purpose:  Read the AIOP idenfication number directly from an AIOP.
@@ -587,6 +592,9 @@ static void rp_do_receive(struct rp_port
unsignedint CharNStat;
int ToRecv, wRecv, ch, ttynocopy;
 
+   if (tp->t_state & TS_TBLOCK)
+   return;
+
ToRecv = sGetRxCnt(cp);
if(ToRecv == 0)
return;
@@ -615,7 +623,7 @@ static void rp_do_receive(struct rp_port
CharNStat = rp_readch2(cp,sGetTxRxDataIO(cp));
ch = CharNStat & 0xff;
 
-   if((CharNStat & STMBREAK) || (CharNStat & STMFRAMEH))
+   if((CharNStat & STMBREAKH) || (CharNStat & STMFRAMEH))
ch |= TTY_FE;
else if (CharNStat & STMPARITYH)
ch |= TTY_PE;
@@ -645,6 +653,12 @@ static void rp_do_receive(struct rp_port
if ( ToRecv > RXFIFO_SIZE ) {
ToRecv = RXFIFO_SIZE;
}
+   if ((tp->t_rawq.c_cc + ToRecv > tp->t_ihiwat) &&
+   ((tp->t_cflag & CRTS_IFLOW) ||
+(tp->t_iflag & IXOFF)) &&
+   !(tp->t_state & TS_TBLOCK))
+   ttyblock(tp);
+
wRecv = ToRecv >> 1;
if ( wRecv ) {

rp_readmultich2(cp,sGetTxRxDataIO(cp),(u_int16_t *)rp->RxBuf,wRecv);
@@ -686,6 +700,7 @@ static void rp_handle_port(struct rp_por
IntMask = sGetChanIntID(cp);
IntMask = IntMask & rp->rp_intmask;
ChanStatus = sGetChanStatus(cp);
+   
if(IntMask & RXF_TRIG)
if(!(tp->t_state & TS_TBLOCK) && (tp->t_state & TS_CARR_ON) && (tp->t_state & TS_ISOPEN)) {
rp_do_receive(rp, tp, cp, ChanStatus);
@@ -769,22 +784,23 @@ rp_attachcommon(CONTROLLER_T *ctlp, int 
 
unit = device_get_unit(ctlp->dev);
 
-   printf("RocketPort%d (Version %s) %d ports.\n", unit,
-   RocketPortVersion, num_ports);
+   printf("RocketPort%d = %d ports.\n", unit, num_ports);
rp_num_ports[unit] = num_ports;
callout_handle_init(&rp_callout_handle);
 
ctlp->rp = rp = (struct rp_port *)
-   malloc(sizeof(struct rp_port) * num_ports, M_TTYS, M_NOWAIT | M_ZERO);
+   malloc(sizeof(struct rp_port) * (num_ports+1), M_TTYS, M_NOWAIT | M_ZERO);
if (rp == NULL) {
device_printf(ctlp->dev, "rp_attachcommon: Could not malloc rp_ports structures.\n");
retval = ENOMEM;
goto nogo;
}
-
+/* else {
+   device_printf(ctlp->dev, "malloc'd rp_ports structures=%08x.\n", rp);
+   }*/
count = unit * 32;  /* board times max ports per card SG */
 
-   bzero(rp, sizeof(struct rp_port) * num_ports);
+   bzero(rp, sizeof(struct rp_port) * (num_ports+1));
oldspl = spltty();
rp_addr(unit) = rp;
splx(oldspl);
@@ -1016,9 +1032,10 @@ rpmodem(struct tty *tp, int sigon, int s
}
return (0);
 }
-
+#define B460800 460800
+#define B921600 921600
 static struc
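Two of the functional changes in this patch are easy to miss in the diff, so the C sketch below isolates them as plain functions.  The constant values are my assumption of the RocketPort layout, with the received character in the low byte and per-character status in the high byte.  Check them against rpreg.h.  The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed RocketPort status-word layout: received character in the low
 * byte, per-character status in the high byte.  Values are believed to
 * match rpreg.h but are not copied from it; verify before relying on them.
 */
#define STMBREAK	0x08	/* break bit, low-byte position */
#define STMFRAMEH	0x400	/* framing error, high byte */
#define STMBREAKH	0x800	/* break, high byte */

/* Old test: STMBREAK lands on a data bit of the received character. */
bool
old_frame_error(uint16_t charnstat)
{
	return ((charnstat & STMBREAK) || (charnstat & STMFRAMEH));
}

/* Patched test: look only at the high-byte status bits. */
bool
new_frame_error(uint16_t charnstat)
{
	return ((charnstat & STMBREAKH) || (charnstat & STMFRAMEH));
}

/*
 * Same shape as the check the patch adds before draining the FIFO:
 * block input if the incoming bytes would push the raw queue past the
 * high-water mark, input flow control (CRTS_IFLOW or IXOFF) is on, and
 * input is not already blocked (TS_TBLOCK).
 */
bool
should_block(int rawq_cc, int to_recv, int ihiwat, bool flow_ctl,
    bool already_blocked)
{
	return (rawq_cc + to_recv > ihiwat && flow_ctl && !already_blocked);
}
```

If those values are right, the old STMBREAK test would have matched bit 3 of the data byte itself.  Ordinary characters such as 'H' (0x48) would then be flagged with TTY_FE.  In the driver, a true result from the flow-control condition leads to ttyblock(tp).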

Re: i386/100160: [mfid] Perc5i: additional symptomatic info on virtual disk detection issue

2006-09-07 Thread Doug Ambrisko
Jeffrey Williams writes:
| I don't know if anyone specifically is working on this, but just tried 
| to install FreeBSD 6.1 from CD on a Dell 2950 with the PERC 5/i SAS 
| controller.
| 
| This server was originally configured with two hardware RAID virtual 
| disks, the first was RAID 1 with two 36GB drives, and the second was 
| RAID 5 with three 72 GB drives.
| 
| Just like the original PR, the first was detected and identified in the 
| the installer volume setup as both mfid0 and mfid1.
| 
| In order to try and work around the problem and just get the machine up 
| and running, I tried deleting the RAID 1 virtual disk with the intention 
| of installing everything to the RAID 5 virtual disk, however, with the 
| first virtual disk removed, no drives were detected at all.
| 
| Next I will be trying removing the physical drives originally used in the 
| RAID 1 virtual disk, and re-initializing the RAID 5 array.  I will 
| provide an update if successful.
| 
| In the meantime if anybody else is aware of another workaround or fix 
| for this, I appreciate hearing about it.  If a patch comes out soon, I 
| will be happy to provide testing, but I have a small window as this 
| server was being implemented as an emergency replacement for another server.

Upgrade the mfi driver to the version in -stable.

Doug A.


Re: LSI/amr driver controller cache problem?

2006-09-07 Thread Doug Ambrisko
Patrick M. Hausen writes:
| Here's the preliminary results:
| 
| - Controller cache policy: write through (megamgr or BIOS setup)

Write back should be okay with a battery.

Doug A.


Re: LSI/amr driver controller cache problem?

2006-09-07 Thread Doug Ambrisko
Patrick M. Hausen writes:
| > > Also, check the cache
| > > setting on the drives itself. Maybe the drives are losing power or
| > > getting reset while data is in their cache.
| > 
| > I'm starting to suspect something like this. The controller's setting
| > for the individual drives' caches is "OFF". But these (Seagate ST3500841NS)
| > would not be the first ATA/SATA drives to "lie" about their cache for
| > "performance".
| 
| Seems like
| 
|   for i in 0 1 2 3 4 5
|   do
|   megarc -pSetCache -WCE0 -SaveCacheSetting -ch0 -id$i -a0
|   done
| 
| did the trick. This is supposed to disable the physical drives'
| write cache and save this setting in the drives' NVRAM, if supported.
| 
| I don't know why simply setting the WC to "off" in the controller's
| BIOS setup tool didn't have the same effect. I'm keeping my fingers
| crossed ;-)
| 
| Time to re-enable softupdates and do some more stress testing.
| 
| Up to now the system survived two times "make installworld && reboot"
| after I changed the settings.
| 
| Thanks to the guys keeping the amr driver up-to-date. The Linux
| "megamgr" utility works just fine. If I find the time, I'll make
| a port.

That would be great.  I'd discourage relying on MegaMon, though, since
it leaks shared memory and exits, unless LSI has finally fixed it.
So monitoring is a pain.  I guess a watcher script that restarts it
would be okay, but MegaMon has a nasty habit of re-reporting prior
errors every time it starts :-(  We have a native local tool that
works, but we can't redistribute it.

The mfi driver doesn't have these issues since the driver reports all
events directly.  However, MegaCli doesn't actually create or delete a
RAID (even on Linux).  I have patches in the wings that deal with
discovery while the system is up, but we need clearance on them.
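If you do wrap MegaMon in a watcher that restarts it, one way to cope with the replayed errors is to remember which event lines have already been reported.  Here is a minimal C sketch.  `is_new_event()` is a hypothetical helper.  The fixed-size, in-memory table is a simplifying assumption and only survives as long as the watcher process itself.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SEEN	128	/* events remembered */
#define EVENT_LEN	256	/* longest event line compared */

static char seen[MAX_SEEN][EVENT_LEN];
static int nseen;

/*
 * Return true the first time an event line is seen and false for
 * repeats, e.g. errors MegaMon replays after being restarted.  Once the
 * table is full, new events are still reported but not remembered.
 */
bool
is_new_event(const char *line)
{
	int i;

	for (i = 0; i < nseen; i++)
		if (strncmp(seen[i], line, EVENT_LEN - 1) == 0)
			return (false);
	if (nseen < MAX_SEEN) {
		strncpy(seen[nseen], line, EVENT_LEN - 1);
		seen[nseen][EVENT_LEN - 1] = '\0';
		nseen++;
	}
	return (true);
}
```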

Doug A.


Re: bce0: Error mapping mbuf into TX chain!

2006-07-12 Thread Doug Ambrisko
David (Controller AE) Christensen writes:
| Sorry, I've been out on vacation and just got back into town.  I'll MFC
| the patch within the next day or two.

I'll let you merge in the down/up fix that I put into -current.

Doug A.


Re: Dell PowerEdge 750 & 850 environtmental monitoring

2006-07-12 Thread Doug Ambrisko
Arnold Cavazos Jr. writes:
| Does anybody have temperature and fan monitoring working on Dell 
| PowerEdge 750's & 850's?  I have done my share of googling without much 
| luck.

The PE850 should just work with ipmi(4) in 6.1-stable/-current and ipmitool.
The PE750 will work with ipmi if you have the DRAC card.  It is possible to
get thermal data on the PE750 via SMBus, but that is more complicated.

Doug A.


Re: em device hangs on ifconfig alias ...

2006-06-30 Thread Doug Ambrisko
Francisco Reyes writes:
| Atanas writes:
| > I have some newer machines with 2 Broadcom chips on-board. I plan to 
| > give them a try at some point in the future, but I'm not sure how stable 
| > the bge driver
| 
| For us they have been a problem. Primarily because it causes all kinds of 
| freezing/crashes when having an IPMI board. I believe it has performed ok in 
| machines where we don't have an IPMI card.

Can you try:
http://www.ambrisko.com/doug/bge_ipmi_3.patch 
and see if that helps.  I need one minor tweak to it before I can
commit it.

Doug A.


Re: build fails on amd64 machine

2006-06-15 Thread Doug Ambrisko
Sean McNeil writes:
| I get the following:
| 
| ===> ipmi (depend)
| make: don't know how to make ipmi.c. Stop
| *** Error code 2

That should be fixed.

Doug A.


Re: megamgr on 6.1?

2006-05-09 Thread Doug Ambrisko
Brian Szymanski writes:
| kldload amr_linux did the trick for me, thanks!

Good to hear.  You might want to look at megarc and megamonitor for Linux.
Hopefully LSI has updated megamonitor to fix the shared memory leak;
otherwise it will exit after about half an hour on FreeBSD, since we
don't allow as much shared memory usage as Linux does.  It leaks on
Linux too; it just takes longer to use it all up.

Doug A.


Re: megamgr on 6.1?

2006-05-09 Thread Doug Ambrisko
Boris Samorodov writes:
| On Tue, 9 May 2006 12:37:43 -0400 (EDT) Brian Szymanski wrote:
| > PS - mknod c 254 /compat/linux/dev/megadev0 (which is what the device is
| > under linux) doesn't help :(
| 
| It's only my IMHO, use it with care:
| 
| # cd /dev
| # ln -s amr0 megadev0

Nope, it needs to show up in devfs.  Making a node manually is going
to cause trouble.  If there isn't a /dev/megadev0, then you don't
have amr_linux loaded.  You can try kldload, but you might have to
compile it into a static kernel.

Doug A.


Re: Issues with nullconsole in FreeBSD 6.0-p6

2006-04-04 Thread Doug Ambrisko
Jonas Bülow writes:
| I'm experiencing a really strange problem using nullconsole in FreeBSD
| 6.0-p6. Briefly, what happens is that the use of nullconsole affects
| the behavior of the OS negatively, very negatively.
| 
| There are two different setups with different kernel
| configurations. They both have console set to nullconsole in
| loader.conf.
| 
| In the first setup the machine reboots spontaneously somewhere during
| boot without leaving a hint of the reason.
| 
| In the other setup there is a fsck process (fsck_4.2bsd) crashing with
| signal 8 (floating point exception) during boot. The fsck is run on an
| auxiliary disk during startup.
| 
| Both these problems goes away if console is set to either vidconsole
| or comconsole in loader.conf.
| 
| Adding DDB to the kernel configuration prevents the machine from
| continuously rebooting in the first setup. Instead, it silently halts
| somewhere in the boot process. Not easy telling where. It's seems to
| be somewhere late in the process. Probably when running rc.d scripts
| by observing the time before reboot compared to when using vidconsole
| or comconsole.
| 
| I've tried to debug the problem. I've not figured out how to remotely
| debug a kernel when using nullconsole. The escape to debugger hot keys
| (Ctrl+Alt+Esc or Ctrl+SysReq) does not work when using
| nullconsole. Therefore it is not possible to switch to remote mode. Can
| DDB be force to go directly into remote mode?
| 
| I really understand it is impossible to give a simple answer or
| solution to my problems described above. Well, if someone knows a
| solution I wouldn't mind sharing it. What I really would like help
| with is some input on how to debug this further. What to look for,
| things to try etc.

We don't seem to have that problem here with our enhanced consmute code.
I noticed some implementation strangeness in the newer consmute
implementation.  Our major change is to put it into a function, so if
you break into the debugger or the system panics, you still get the
console output.  Technically this goes against the original motivation,
but it works for us.

Bug me so I remember to extract the patch for you to try.

Doug A.


Re: Temperature monitoring in FreeBSD 4/5/6

2006-03-24 Thread Doug Ambrisko
Stephan Koenig writes:
| Does anyone know of an easy way to get temperature information out of
| a Dell PowerEdge 1550/1650/1750/1850/2650/2850 running FreeBSD4/5/6?
|
| Something that has a very simple CLI that just outputs the temperature
| without any formatting, or a library/sysctl, would be ideal.

For now, manually backport the ipmi device driver and then install
the latest ipmitool from ports.  Then you can run ipmitool via the
local interface.  The supported interfaces are SMIC and KCS.  SSIF
is in progress; I'm dealing with some strange ACPI definitions that
put a hole in the address space of the hardware :-(  I haven't really
looked at the BT interface yet.

Doug A.


Re: Changing release version on source

2006-03-14 Thread Doug Ambrisko
Henri Hennebert writes:
| > Glenn Dawson wrote:
| >> At 12:06 PM 3/8/2006, you wrote:
| >>> Does anyone know how to change the release version of the source   
| >>> code? I have some brain dead software (Plesk) that insists on   
| >>> FreeBSD 5.3, while it will work just fine on 5.5 and even 6. I am   
| >>> wondering i can change the version of RELENG_5 code so that this   
| >>> software will think its 5.3-R and let me install. I have tried   
| >>> changing the variable in /usr/src/release/Makefile, but that seems  
| >>>  to have no effect.
| >>
| >> Take a look at sys/conf/newvers.sh
| >
| > Excellent, thanks! I'm presuming i have to do a full build/install
| > world for this to take effect. Do you think that anything may break
| > because of this manual change, even if i used RELENG_6 code? I will not
| > be installing any ports.
| 
| I would prefer to wrap /usr/bin/uname with a temporary custom version  
| returning
| the desired values.

You can use the UNAME_ environment variable overrides instead (for
example, UNAME_r for the release string).
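FreeBSD's uname(1) documents that an environment variable named UNAME_ followed by a flag letter replaces the corresponding value.  So `env UNAME_r=5.3-RELEASE uname -r` should print 5.3-RELEASE.  Programs that read the version some other way, such as the kern.osrelease sysctl, will not see the override.  The C sketch below re-implements that lookup portably to show the logic.  `uname_field()` is a hypothetical name, not the libc function.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch of the UNAME_<flag> override lookup: if UNAME_<flag> is set
 * in the environment, return its value; otherwise return the real one.
 */
const char *
uname_field(char flag, const char *real_value)
{
	char name[sizeof("UNAME_x")];
	const char *override;

	/* e.g. flag 'r' -> "UNAME_r" */
	snprintf(name, sizeof(name), "UNAME_%c", flag);
	override = getenv(name);
	return (override != NULL ? override : real_value);
}
```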

Doug A.


Re: New 'amr' driver and linux MegaMGR

2006-03-02 Thread Doug Ambrisko
Danny Braniss writes:
| > Cristiano Deana writes:
| > | 2006/3/1, Paul Saab <[EMAIL PROTECTED]>:
| > | > works fine
| > | 
| > | I got:
| > |  Failed to open driver node /dev/megadev0
| > 
| > Make sure you have amr_linux.  kldload amr_linux.ko.  Then you should
| > get a /dev/megadev0.  It also works in a static kernel.  You might
| > want to do an ls -l of /dev/megadev0.  This is only available
| > in FreeBSD 6.1 and -current it is not in FreeBSD 6.0.  The changes
| > will drop into FreeBSD 6.0 though.
| 
| i'm getting:
| 
| Copyright (c) 1992-2006 The FreeBSD Project.
| Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
| The Regents of the University of California. All rights reserved.
| FreeBSD 6.1-PRERELEASE #3: Mon Feb 27 10:23:29 IST 2006
| ...
| module_register_init: MOD_LOAD (amr_linux, 0x805f1b00, 0) error 6
| ad4: 286168MB  at ata2-master SATA150
| ad6: 286168MB  at ata3-master SATA150
| ar0: 572072MB  status: READY
| ar0: disk0 READY using ad4 at ata2-master
| ar0: disk1 READY using ad6 at ata3-master
| 
| and no /dev/megadev0
| 
| maybe because:
| [EMAIL PROTECTED]:31:2:  class=0x01048f card=0x34518086 chip=0x25b08086 rev=0x02 hdr=0x00
| vendor   = 'Intel Corporation'
| device   = '6300ESB Serial ATA Controller (RAID mode)'
| class= mass storage
| subclass = RAID

You don't have an LSI RAID controller.  Those are the built-in Intel
SATA ports in RAID mode, which is software RAID.  The disks are being
detected as ata disks and are using ata-raid, so you can't use the LSI
RAID tools.  If the array were detected via the amr(4) driver, then you
could use the LSI RAID tools.

Note that it is using the LSI RAID metadata format.  "atacontrol" will
manage this, assuming Soren has metadata write support for that format.

Doug A.


Re: LSI Megaraid (amr) performance woes

2006-03-01 Thread Doug Ambrisko
Sven Willenberger writes:
| On Wed, 2006-03-01 at 15:08 -0500, Mike Tancsa wrote:
| > At 02:10 PM 01/03/2006, Sven Willenberger wrote:
| > 
| > >I cvsupped a 6.1 prerelease and found no performance improvements. I did
| > >some further tests and the performance issues seem very specific to the
| > >mirroring aspect of the raid:
| > 
| > 
| > I am not familiar with the LSI cards, but with older 3ware and the 
| > ARECA cards, the raid sets when in any sort of redundancy mode must 
| > initialize in the background before normal use.  Until that is 
| > complete, performance is seriously slow.  Is the LSI doing that, and 
| > perhaps just not telling you ?
| 
| I had thought of this too so I disabled the rapid (background)
| initialization option and let the raids build to completion the slow
| way. So unless it is still building even after it is done (or is doing
| some other odd processor-intensive crc checking or something) I don't
| think this is the source of the problem.

If you run the Linux MegaMon utilities, they will tell you whether the
controller is running background tasks like this and report their
progress.

Doug A.


Re: New 'amr' driver and linux MegaMGR

2006-03-01 Thread Doug Ambrisko
Cristiano Deana writes:
| 2006/3/1, Paul Saab <[EMAIL PROTECTED]>:
| > works fine
| 
| I got:
|  Failed to open driver node /dev/megadev0

Make sure you have amr_linux loaded (kldload amr_linux.ko); then you
should get a /dev/megadev0.  It also works compiled into a static
kernel.  You might want to do an ls -l of /dev/megadev0.  This is only
available in FreeBSD 6.1 and -current; it is not in FreeBSD 6.0.  The
changes will drop into FreeBSD 6.0, though.
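A scripted version of that ls -l check follows, as a small C sketch.  `is_char_device()` is a hypothetical helper that only confirms a character-device node exists at the path.  It cannot tell a devfs node created by amr_linux from one made by hand.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Rough equivalent of "ls -l /dev/megadev0": true only if the path
 * exists and is a character device node.
 */
bool
is_char_device(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return (false);	/* missing: amr_linux probably not loaded */
	return (S_ISCHR(sb.st_mode));
}
```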

Doug A.


Re: Disk I/O system hang on 5.4-RELEASE-p8 i386

2006-02-24 Thread Doug Ambrisko
Kris Kennaway writes:
| On Thu, Feb 23, 2006 at 04:44:46PM -0600, Greg Rivers wrote:
| > On Thu, 23 Feb 2006, Michael R. Wayne wrote:
| > 
| > >Been fighting this for a while.  We have an older server, running
| > >5.4-RELEASE-p8 i386 and used primarily for email, which hangs every
| > >couple of weeks.  The hang seems to be in the disk I/O system; pings
| > >succeed, and I can continue get a login: prompt on the console until
| > >I enter a login at which the response stops.
| > >[snip]
| > 
| > I think you're seeing the UFS deadlock I reported last November for 
| > RELENG_6.  See the thread beginning at 
| > http://lists.freebsd.org/pipermail/freebsd-stable/2005-November/019979.html
| > 
| > I believe this issue has made it onto the show-stopper list for 
| > 6.1-RELEASE and is being actively worked on.
| 
| It's on the todo list, but I don't think it's being worked on yet.
| The main problem is that we need a way to reproduce it on command.
| I'd forgotten that snapshots are involved, so maybe it's just a matter
| of running lots of mksnap_ffs while I/O is in progress.

FWIW, I found that creating snapshots could exhaust the available
buffers and wedge the system:

Index: ffs_snapshot.c
===
RCS file: /usr/local/cvsroot/freebsd/src/sys/ufs/ffs/ffs_snapshot.c,v
retrieving revision 1.112
diff -u -p -r1.112 ffs_snapshot.c
--- ffs_snapshot.c  9 Jan 2006 20:42:18 -   1.112
+++ ffs_snapshot.c  24 Feb 2006 23:02:19 -
@@ -336,6 +336,8 @@ restart:
if (error)
goto out;
bawrite(nbp);
+   if (cg % 10 == 0)
+   ffs_syncvnode(vp, MNT_WAIT);
}
/*
 * Copy all the cylinder group maps. Although the
@@ -357,6 +360,8 @@ restart:
goto out;
error = cgaccount(cg, vp, nbp, 1);
bawrite(nbp);
+   if (cg % 10 == 0)
+   ffs_syncvnode(vp, MNT_WAIT);
if (error)
goto out;
}

Fixed this problem for me.
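The idea behind the patch can be modeled with counters.  Each bawrite() leaves another buffer in flight, and the ffs_syncvnode(vp, MNT_WAIT) every tenth cylinder group waits for them, bounding how many are outstanding at once.  This C sketch is only a model under that assumption.  `peak_outstanding()` is hypothetical and does no I/O.

```c
/*
 * Model of the patch: one asynchronous write per cylinder group, and a
 * synchronous flush whenever cg % sync_every == 0 (sync_every <= 0
 * means never flush).  Returns the peak number of unflushed writes.
 */
int
peak_outstanding(int ncg, int sync_every)
{
	int cg, outstanding = 0, peak = 0;

	for (cg = 0; cg < ncg; cg++) {
		outstanding++;			/* bawrite(nbp) */
		if (outstanding > peak)
			peak = outstanding;
		if (sync_every > 0 && cg % sync_every == 0)
			outstanding = 0;	/* ffs_syncvnode(vp, MNT_WAIT) */
	}
	return (peak);
}
```

Without the periodic sync, the peak grows with the number of cylinder groups.  With it, the peak stays at about ten regardless of file system size.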

Doug A.


Re: 6.0 on Dell 1850 with PERC4e/DC RAID?

2006-01-14 Thread Doug Ambrisko
Scott Mitchell writes:
| On Thu, Jan 12, 2006 at 04:41:17PM -0800, Doug Ambrisko wrote:
| > Scott Mitchell writes:
| > | 
| > | That's a pity.  Maybe Doug was thinking of one of the aac(4) based PERC
| > | cards?  Still, something I can run out of cron to check the array status
| > | should be fine.
| > 
| > Are you referring to this Doug.  The Linux ioctl shim requires one file
| > that hasn't been committed yet.  Scott L. & ps have it.  I may commit
| > it now that I'm back.  This lets all of the Dell/LSI Linux tools 
| > run on FreeBSD including the firmware update tool.  The caveat is
| > that with the driver re-do it seems the certain things in the ioctl
| > path causes the firmware to lock-up.  I haven't been around enough
| > to help with that problem.  I have a binary that locks it up pretty
| > quick.
| 
| Hi Doug,
| 
| I was actually referring to Doug White, who said:
| 
| >From what I remember, you will receive status-change kernel messages when
| >disks disappear, rebuilds start, and so forth. So for most day-to-day
| >manipulation you should be fine.
| 
| It wasn't clear if this applied to the amr(4)-based PERC cards or just the
| aac(4) ones.

Yes, that only applies to the aac-based machines and not the amr-based
machines (i.e. Adaptec versus LSI).  With LSI you have to poll the
controller for RAID events, and that interface is not public.
 
| Sounds like the re-worked amr driver will be very much better, at least
| once a few more bugs have been ironed out of it.

Yes.
 
| > Most of the existing monitoring tools have bugs.  The Linux tools
| > tend to be better but the last copy of MegaMon leaked shared memory
| > then quit.  We have a tool at work but it is encumbered so we can't
| > give it out.
| >  
| > | > I did find a program   
| > | > posted to one of the freebsd lists called 'amrstat' that I run  
| > | > nightly.  It produces this kind of output:
| > | > 
| > | > Drive 0:68.24 GB, RAID1  optimal
| > | > 
| > | > If it says "degraded" it is time to fix a drive.   You just fire up  
| > | > the lsi megaraid tools and find out which drive it is.
| > 
| > This is probably a fairly good scheme.  Caveat is that you can have
| > an "optimal" RAID that is broken :-(
| 
| That's pretty sucky, but presumably not a FreeBSD-specific problem?
| Despite that, I'm reasonably hopeful that a scheme like this along with
| good backups (which we have) will be enough to avoid any major disasters.

It's not a FreeBSD specific problem.
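
A nightly cron check of the kind discussed above could look roughly like this C sketch. It is hypothetical and not part of amrstat: it assumes the report layout quoted earlier (a "Drive N:" entry followed by a status word), and the status words other than "optimal" and "degraded" are guesses. As noted above, a zero result only means the controller claims every drive is optimal; it cannot detect an "optimal" RAID that is actually broken.

```c
#include <stddef.h>
#include <string.h>

/* Status words assumed to appear in the report; index 0 is the only healthy one. */
static const char *const status_words[] = {
    "optimal", "degraded", "offline", "failed"
};
#define NSTATUS (sizeof(status_words) / sizeof(status_words[0]))

/*
 * Count logical drives whose status is anything other than "optimal".
 * Each "Drive " entry owns the text up to the next "Drive " entry; the
 * earliest status word in that span is taken as the drive's status.
 * Returns -1 if an entry has no recognisable status word.
 */
static int count_unhealthy(const char *report)
{
    int unhealthy = 0;
    const char *p = report;

    while ((p = strstr(p, "Drive ")) != NULL) {
        const char *next = strstr(p + 1, "Drive ");
        const char *first = NULL;
        size_t first_idx = 0;

        for (size_t i = 0; i < NSTATUS; i++) {
            const char *hit = strstr(p, status_words[i]);
            if (hit == NULL || (next != NULL && hit > next))
                continue;           /* absent, or belongs to a later drive */
            if (first == NULL || hit < first) {
                first = hit;
                first_idx = i;
            }
        }
        if (first == NULL)
            return -1;
        if (first_idx != 0)
            unhealthy++;
        p++;                        /* move past this entry's "Drive " */
    }
    return unhealthy;
}
```

Run from cron, a wrapper could capture the amrstat output, pass it to count_unhealthy(), and send mail whenever the result is non-zero (including -1 for output it cannot parse).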
 
| Is Dell's support any better if you tell them you're running RedHat?

We can sort of run RedHat.  That is, we have run the Linux RAID binaries
from LSI & Dell on FreeBSD with the Linux ioctl emulation layer I wrote.
I sometimes netboot Linux to verify things.

Doug A.


Re: 6.0 on Dell 1850 with PERC4e/DC RAID?

2006-01-13 Thread Doug Ambrisko
Jung-uk Kim writes:
| On Friday 13 January 2006 11:59 am, Doug Ambrisko wrote:
| > Jung-uk Kim writes:
| > | On Thursday 12 January 2006 07:41 pm, Doug Ambrisko wrote:
| > | > Scott Mitchell writes:
| > | > | > I did find a program
| > | > | > posted to one of the freebsd lists called 'amrstat' that I
| > | > | > run nightly.  It produces this kind of output:
| > | > | >
| > | > | > Drive 0:68.24 GB, RAID1
| > | > | >  optimal
| > | > | >
| > | > | > If it says "degraded" it is time to fix a drive.   You just
| > | > | > fire up the lsi megaraid tools and find out which drive it
| > | > | > is.
| > | >
| > | > This is probably a fairly good scheme.  Caveat is that you can
| > | > have an "optimal" RAID that is broken :-(
| > |
| > | That's lame.  Under what condition does it happen, do you know?
| >
| > Running RAID 10, a drive was swapped and the rebuild started on the
| > replacement drive.  The rebuild complained about the source drive
| > for the mirror rebuild having read errors that couldn't be
| > recovered. It continued on and finished re-creating the mirror. 
| > Then the RAID proceeded to a background init, which it normally
| > does, and started failing that and re-starting the background init
| > over and over again. The box changed the RAID from degraded to
| > optimal when the rebuild completed (with errors).  Doing a dd of the
| > entire RAID logical device returned an error at the bad sector
| > since it couldn't recover that. The RAID controller reported an I/O
| > error and still left the RAID as optimal.
| >
| > We reported this and were told that's the way it is designed :-(
| > Probably the spec. is defined by whatever the RAID controller
| > happens to do versus what makes sense :-(
| >
| > So far this has only happened once.  Changing firmware did not
| > help.
| 
| Similar thing happened to me once or twice (with RAID5) and I thought 
| it was just a broken controller.  If the culprit was design, it IS 
| really lame. :-(

I'd suggest whining to them.  To me "optimal" means "as far as I know
there are no problems with the RAID".  If enough customers whine they
might change their view!

Doug A.


Re: 6.0 on Dell 1850 with PERC4e/DC RAID?

2006-01-13 Thread Doug Ambrisko
Mike Tancsa writes:
| At 11:59 AM 13/01/2006, Doug Ambrisko wrote:
| >|
| >| That's lame.  Under what condition does it happen, do you know?
| >
| >Running RAID 10, a drive was swapped and the rebuild started on the
| >replacement drive.  The rebuild complained about the source drive
| >for the mirror rebuild having read errors that couldn't be recovered.
| >It continued on and finished re-creating the mirror.  Then the RAID
| >proceeded to a background init, which it normally does, and started
| >failing that and re-starting the background init over and over again.
| >The box changed the RAID from degraded to optimal when the rebuild
| >completed (with errors).  Doing a dd of the entire RAID logical device
| >returned an error at the bad sector since it couldn't recover that.
| >The RAID controller reported an I/O error and still left the RAID as
| >optimal.
| >
| >We reported this and were told that's the way it is designed :-(
| 
| Interesting timing as I ran into this sort of situation on the 
| weekend on a 3ware drive in RAID1. The card had complained for a week 
| about read errors on drive 1. We thought we would wait until the 
| weekend maintenance window to swap it out.  Sadly, before that 
| window, drive zero totally died a horrible death.  We popped in a new 
| drive on port zero, started the rebuild, and it crapped out saying 
| there was a read error on drive 1.  However, there is a check box 
| that says continue the build, even with errors on the source drive.

With Adaptec we used to verify each disk before a swap to improve
our chances of a successful disk swap.  Adaptec was a little
heavy-handed: if you are running on the last disk of the mirror and it
has a read error, the controller will fail the drive.  With RAID 10 you
then lose half the file system :-(  I'd rather just get the read error
back to the OS than lose the entire drive.
 
| This setup seems to give you the best of both worlds.  We did a quick 
| check of the resultant files compared to backups and only a couple 
| were toasted. (The box is going to be retired in a month, so if there 
| is other hidden fs corruption if it holds out for another 3 weeks we 
| don't care too much). The correct approach would be to do a total 
| restore of course, but this was good enough for us in this 
| situation.  I guess the question is, is this RAID1 in a proper mirror 
| given that there are hard errors on the drive on port 1 ?

That sounds like a good controller, assuming it still reports the RAID as
degraded rather than optimal.  I assume "optimal" means everything
is fine and the entire volume is safe to read.

Doug A.


Re: 6.0 on Dell 1850 with PERC4e/DC RAID?

2006-01-13 Thread Doug Ambrisko
Jung-uk Kim writes:
| On Thursday 12 January 2006 07:41 pm, Doug Ambrisko wrote:
| > Scott Mitchell writes:
| > | > I did find a program
| > | > posted to one of the freebsd lists called 'amrstat' that I run
| > | > nightly.  It produces this kind of output:
| > | >
| > | > Drive 0:68.24 GB, RAID1
| > | >  optimal
| > | >
| > | > If it says "degraded" it is time to fix a drive.   You just
| > | > fire up the lsi megaraid tools and find out which drive it is.
| >
| > This is probably a fairly good scheme.  Caveat is that you can have
| > an "optimal" RAID that is broken :-(
| 
| That's lame.  Under what condition does it happen, do you know?

Running RAID 10, a drive was swapped and the rebuild started on the
replacement drive.  The rebuild complained about the source drive
for the mirror rebuild having read errors that couldn't be recovered.
It continued on and finished re-creating the mirror.  Then the RAID
proceeded to a background init, which it normally does, and started
failing that and re-starting the background init over and over again.
The box changed the RAID from degraded to optimal when the rebuild
completed (with errors).  Doing a dd of the entire RAID logical device
returned an error at the bad sector since it couldn't recover that.
The RAID controller reported an I/O error and still left the RAID as
optimal.

We reported this and were told that's the way it is designed :-(
Probably the spec. is defined by whatever the RAID controller happens
to do versus what makes sense :-(

So far this has only happened once.  Changing firmware did not help.
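
The dd check described above amounts to reading the logical device end to end and noting the first offset that fails. A minimal sketch using POSIX pread() follows; read_verify() is a hypothetical helper, the 64 KB buffer limit is arbitrary, and on a raw disk device the chunk size should be a multiple of the sector size. Like dd without conv=noerror, it stops at the first bad read.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read `path` from start to end in `chunk`-byte pieces, roughly what
 * `dd if=<device> of=/dev/null bs=<chunk>` does.
 *
 * Returns 0 if everything was read, the errno of the failing read
 * otherwise (with its byte offset stored in *bad_off), or -1 if the
 * arguments are invalid or the path cannot be opened.  *total receives
 * the number of bytes read successfully.
 */
static int read_verify(const char *path, size_t chunk,
                       off_t *bad_off, off_t *total)
{
    static char buf[1 << 16];
    off_t off = 0;
    int fd, rc = 0;

    if (chunk == 0 || chunk > sizeof(buf))
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (;;) {
        ssize_t n = pread(fd, buf, chunk, off);
        if (n < 0) {
            rc = errno;        /* e.g. EIO on an unrecoverable sector */
            *bad_off = off;
            break;
        }
        if (n == 0)
            break;             /* end of device */
        off += n;
    }
    close(fd);
    *total = off;
    return rc;
}
```

A non-zero return with *bad_off set is exactly the situation described: the data is unreadable even though the controller still calls the RAID optimal.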

Doug A.

PS. Sorry for the null email before this.  I hit the wrong key.


Re: 6.0 on Dell 1850 with PERC4e/DC RAID?

2006-01-13 Thread Doug Ambrisko
Jung-uk Kim writes:
[ Charset euc-kr unsupported, skipping... ]


Re: 6.0 on Dell 1850 with PERC4e/DC RAID?

2006-01-12 Thread Doug Ambrisko
Scott Mitchell writes:
| On Fri, Jan 06, 2006 at 10:35:46AM -0500, Vivek Khera wrote:
| > 
| > On Jan 5, 2006, at 5:41 PM, Scott Mitchell wrote:
| > 
| > >I may be getting a new Dell PE1850 soon, to replace our ancient CVS
| > >server (still running 4-STABLE).  The new machine will ideally run 6.0
| > >and have a PERC4e/DC RAID card - the one with battery-backed cache.
| > >This is listed as
| > 
| > I have an 1850 with the built-in PERC 4e/Si since all I needed was the  
| > RAID1 mirror of the internal drives.  It works extremely well, and  
| > the speed is quite good.
| 
| We'll only be mirroring the internal drives too for now - the 4e/DC seems
| to be the only RAID option on the 1850 with battery-backed cache, and
| doesn't cost much more for the extra peace-of-mind.
| 
| > As for notices of when the drives go bad, under 4.x I've had disk  
| > failures with the amr driver (different PERC cards) and not gotten  
| > any such notices in the syslog that I recall.
| 
| That's a pity.  Maybe Doug was thinking of one of the aac(4) based PERC
| cards?  Still, something I can run out of cron to check the array status
| should be fine.

Are you referring to this Doug?  The Linux ioctl shim requires one file
that hasn't been committed yet.  Scott L. & ps have it.  I may commit
it now that I'm back.  This lets all of the Dell/LSI Linux tools 
run on FreeBSD including the firmware update tool.  The caveat is
that with the driver re-do it seems that certain things in the ioctl
path cause the firmware to lock up.  I haven't been around enough
to help with that problem.  I have a binary that locks it up pretty
quick.

Most of the existing monitoring tools have bugs.  The Linux tools
tend to be better but the last copy of MegaMon leaked shared memory
then quit.  We have a tool at work but it is encumbered so we can't
give it out.
 
| > I did find a program   
| > posted to one of the freebsd lists called 'amrstat' that I run  
| > nightly.  It produces this kind of output:
| > 
| > Drive 0:68.24 GB, RAID1  optimal
| > 
| > If it says "degraded" it is time to fix a drive.   You just fire up  
| > the lsi megaraid tools and find out which drive it is.

This is probably a fairly good scheme.  Caveat is that you can have
an "optimal" RAID that is broken :-(

On another note, IPMI is pretty good for remotely monitoring these boxes.
You can run the Dell SOL proxy tool for Linux on FreeBSD, then set up
the BIOS to use the serial port and connect the serial port to the BMC/LAN.

FWIW, I've been working on an OpenIPMI-compatible driver.  It basically
works with a bunch of programs that I've tested, as long as they are
compiled with the correct ioctl definitions.

Doug A.


Re: scsi card recommendation

2005-04-19 Thread Doug Ambrisko
Rutger Bevaart writes:
| i've got about 15 Dell 1750, 1850 and 2850 boxes that use AMR-based SCSI
| RAID controllers. i can manage these perfectly using emoore's port of the
| amrcontrol and MEGAMGR tools, under 5.x only after adding the 4x-compat
| ports package.

Be very careful.  Using those utilities can result in random
problems.  I've had to remove all of that stuff from our systems;
we've had other programs on the system dump core, etc. :-(

Doug A.


Re: installing FreeBSD on partition of a SATA Intel 865 raid0 volume

2005-04-07 Thread Doug Ambrisko
Saulius Menkevicius writes:
| Doug White wrote:
| >On Fri, 25 Mar 2005, James Wood wrote:
| >>How do you setup FreeBSD on a partition of a raid0 volume? I downloaded
| >>FreeBSD 5.3, then made a 60 GB partition in my raid volume, and then went to
| >>boot from the CD. It did not see any raid volumes, it just sees two HDs.
| >
| >FreeBSD does not recognize the Adaptec HostRAID metadata so you will not
| >be able to use RAID volumes configured with the HostRAID BIOS. You can use
| >atacontrol to create FreeBSD software RAIDs, however.
|  
| Actually there is an unofficial patch to support the RAID0 mode in
| ICH5-R.  http://www.ambrisko.com/doug/ata/ contains the patch, and I
| used it without problems for half a year in an i865pe/ich5-r
| configuration with RAID0 disk setup. (That was an older version of
| the patch, though).

FYI, Saulius ported the Intel RAID metadata to 5.3 a while ago.  I put
it up at:
http://www.ambrisko.com/doug/ata/5.3-intel-raid-meta-data.patch

FYI2, I should have a newer version of my ata patches soon, since I
finally figured out a bug I was having with an NFS root mount.  The
nfs_syncer spin-loops on bp queues and expects an interrupt routine to
break it out on the side.  In addition, soft updates tends to
re-schedule work even when there is no work to do.  These fixes aren't
really part of the ata stuff but are required to make it work better.
Lastly, I've made some more RAID robustness changes to deal with some
error conditions we've caused here in testing.

I need to roll them out of our dev. tree into my release tree.

Then I should finally get to merging the HW support into the
FreeBSD 4.X tree, now that I know the spin-loop bugs I was seeing
weren't something wacky in my code.

Doug A.


Re: VMware, AIO, what's up?

2005-03-05 Thread Doug Ambrisko
Josef Karthauser writes:
| On Fri, Mar 04, 2005 at 08:00:17PM -0800, Doug Ambrisko wrote:
| > 
| > | I have aio.ko loaded and so it isn't that, and it worked a few days ago.
| > | I'm scratching my head and could really do with a clue stick.  Will
| > | someone please throw me one?
| > 
| > You need to run the
| > vmware-any-any-update
| > patch on the vmware binary to make it work with the new Linux libs.
| > You can google to find it:
| > 
| >   a21p%  ./update vmware
| >   Updating vmware ... VMware Workstation 2.0.4 (build-1142), now patched
| >   a21p% 
| > 
| 
| I've downloaded one from http://ftp.cvut.cz/vmware/, but it doesn't
| work:
| 
| genius% /tmp/vmware-any-any-update89/update vmware
| /usr/local/lib/vmware/bin/vmware
| Updating /usr/local/lib/vmware/bin/vmware ... failed
| Cannot open /usr/local/lib/vmware/bin/vmware: m
| genius% /tmp/vmware-any-any-update89/update vmware /usr/local/lib/vmware/bin 
| Updating /usr/local/lib/vmware/bin ... failed   
| Cannot open /usr/local/lib/vmware/bin: m

Julian said its arguments changed from mine:
  jules# ./update vmware  /usr/local/lib/vmware/bin/vmware
  Updating /usr/local/lib/vmware/bin/vmware ... VMware Workstation 2.0.4 (build-1142), now patched
  jules#

Doug A.


Re: VMware, AIO, what's up?

2005-03-04 Thread Doug Ambrisko
Josef Karthauser writes:
| I'm confused.  VMWare (3) no longer works for me.  I upgraded my base
| linux to base_linux_8 (from 6 I think) and now I get:
| 
| VMware PANIC: (ide0:0) NOT_IMPLEMENTED F(831):712
| VMware PANIC: (VMX) AIO: NOT_IMPLEMENTED F(831):712
| 
| I have aio.ko loaded and so it isn't that, and it worked a few days ago.
| I'm scratching my head and could really do with a clue stick.  Will
| someone please throw me one?

You need to run the
vmware-any-any-update
patch on the vmware binary to make it work with the new Linux libs.
You can google to find it:

  a21p%  ./update vmware
  Updating vmware ... VMware Workstation 2.0.4 (build-1142), now patched
  a21p% 

Doug A.


Re: "ioctl DIOCSMBR: Operation not permitted" from "boot0cfg -s 1"

2005-03-02 Thread Doug Ambrisko
David Wolfskill writes:
| freebeast(5.4-P)[1] sudo boot0cfg -s 1 -v ad0
| Password:
| boot0cfg: /dev/ad0: ioctl DIOCSMBR: Operation not permitted
| freebeast(5.4-P)[2] 

You might try:
sysctl kern.geom.debugflags=16

Doug A.


Re: stable sata patch: panic at kernel boot (can't dump)

2005-02-28 Thread Doug Ambrisko
Dmitry Morozovsky writes:
| On Wed, 16 Feb 2005, Doug Ambrisko wrote:
| 
| DA> I haven't tried using vinum with my patch set.  That might be a problem.
| DA> I'm not sure if anyone has tried vinum with my patch set; most people use
| DA> ata-raid if anything at all.
| 
| I missed: the first machine I tried your patchset at uses vinum for all its
| life, and it works like a charm. I suppose for this case cmd649 is the
| problem but have no spare pci ATA controller to check...

That's good news and bad news.  The good news is that there isn't a
problem with the vinum interaction.  I've been swamped at work so I
haven't had a chance to test vinum with the patch set.

I recall that the cmd649 is a problem controller in general (not specific
to my patches).  Does it work okay without my patchset?

Thanks,

Doug A.


Re: stable sata patch: panic at kernel boot (can't dump)

2005-02-16 Thread Doug Ambrisko
Dmitry Morozovsky writes:
| On Wed, 16 Feb 2005, Doug Ambrisko wrote:
| 
| DA> | trying to boot RELENG_4 kernel with your patches (sata_7) on our FTP I got
| DA> | kernel panic (page fault in kernel mode, pid 2, no dump possible). Hardware
| DA> | involved:
| DA> | 
| DA> | [EMAIL PROTECTED]:~# grep ata /var/run/dmesg.boot 
| DA> | atapci0:  port 0xa000-0xa03f,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 mem 0xed10-0xed11 irq 11 at device 8.0 on pci0
| DA> | ata2: at 0x9000 on atapci0
| DA> | ata3: at 0x9800 on atapci0
| DA> | atapci1:  port 0xb400-0xb40f,0xb000-0xb003,0xac00-0xac07,0xa800-0xa803,0xa400-0xa407 irq 10 at device 9.0 on pci0
| DA> | ata4: at 0xa400 on atapci1
| DA> | ata5: at 0xac00 on atapci1
| DA> | atapci2:  port 0xbc00-0xbc0f at device 17.1 on pci0
| DA> | ata0: at 0x1f0 irq 14 on atapci2
| DA> | ata1: at 0x170 irq 15 on atapci2
| DA> | ad0: 238475MB  [484521/16/63] at ata0-master UDMA100
| DA> | ad2: 114473MB  [232581/16/63] at ata1-master UDMA100
| DA> | ad4: 76319MB  [155061/16/63] at ata2-master UDMA66
| DA> | ad6: 76319MB  [155061/16/63] at ata3-master UDMA66
| DA> | ad8: 57241MB  [116301/16/63] at ata4-master UDMA100
| DA> | 
| DA> | Kernel panicked just after sio0/sio1, where basic RELENG_4 starts ata channel
| DA> | probes. No serial console at the moment, alas.
| DA> | 
| DA> | Unfortunately I can't bring this machine out of service for long time; however,
| DA> | we can survive occasional reboots/crashes. What other info can I provide to
| DA> | debug this?
| DA> 
| DA> I'd like some clarification.  Does the system boot sometimes and other times
| DA> it doesn't?  Once the system is up, does it stay up for a while?  It seems
| DA> like you are not using RAID.  I have a couple more ata bug fixes that
| DA> I need to roll into another patchset.  One fixes a bug in which DMA transfers
| DA> were not cancelled when the controller is reset.  I fixed another
| DA> panic situation in version 8 that happens on boot if you have a bad sector
| DA> at the beginning of the drive.  I'd wait for version 9.  I should be able
| DA> to get that out later today.
| 
| Sorry to not being specific enough ;-)
| 
| No, the system panics reliably, just after sio initializing (for me it seems
| ata drives probes phase). I did not use hardware RAID, I use vinum over these
| 5 drives.
| 
| Without the patchset system stays up for months acting as ftp/cvsupd/nfsd 
| server without any single issue.

You are not using any SATA drives or SATA adapters, correct?
I haven't tried using vinum with my patch set.  That might be a problem.
I'm not sure if anyone has tried vinum with my patch set; most people use
ata-raid if anything at all.

I'm not sure if I'll have time today to set up vinum to test with.  If you
are not using SATA or ata-raid, you will only see minimal advantages
from this patch set.

Thanks,

Doug A.


Re: stable sata patch: panic at kernel boot (can't dump)

2005-02-16 Thread Doug Ambrisko
Dmitry Morozovsky writes:
| Dear Doug,
| 
| trying to boot RELENG_4 kernel with your patches (sata_7) on our FTP I got
| kernel panic (page fault in kernel mode, pid 2, no dump possible). Hardware
| involved:
| 
| [EMAIL PROTECTED]:~# grep ata /var/run/dmesg.boot 
| atapci0:  port 0xa000-0xa03f,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 mem 0xed10-0xed11 irq 11 at device 8.0 on pci0
| ata2: at 0x9000 on atapci0
| ata3: at 0x9800 on atapci0
| atapci1:  port 0xb400-0xb40f,0xb000-0xb003,0xac00-0xac07,0xa800-0xa803,0xa400-0xa407 irq 10 at device 9.0 on pci0
| ata4: at 0xa400 on atapci1
| ata5: at 0xac00 on atapci1
| atapci2:  port 0xbc00-0xbc0f at device 17.1 on pci0
| ata0: at 0x1f0 irq 14 on atapci2
| ata1: at 0x170 irq 15 on atapci2
| ad0: 238475MB  [484521/16/63] at ata0-master UDMA100
| ad2: 114473MB  [232581/16/63] at ata1-master UDMA100
| ad4: 76319MB  [155061/16/63] at ata2-master UDMA66
| ad6: 76319MB  [155061/16/63] at ata3-master UDMA66
| ad8: 57241MB  [116301/16/63] at ata4-master UDMA100
| 
| Kernel panicked just after sio0/sio1, where basic RELENG_4 starts ata channel
| probes. No serial console at the moment, alas.
| 
| Unfortunately I can't bring this machine out of service for long time; however,
| we can survive occasional reboots/crashes. What other info can I provide to
| debug this?

I'd like some clarification.  Does the system boot sometimes and other times
it doesn't?  Once the system is up, does it stay up for a while?  It seems
like you are not using RAID.  I have a couple more ata bug fixes that
I need to roll into another patchset.  One fixes a bug in which DMA transfers
were not cancelled when the controller is reset.  I fixed another
panic situation in version 8 that happens on boot if you have a bad sector
at the beginning of the drive.  I'd wait for version 9.  I should be able
to get that out later today.

Another thing you might want to do is monitor dmesg for any
ata/ad errors while the system is running.  Most panics happen some
time after the first error message.  You could also look at
/var/log/messages.
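
Scanning for those messages can be automated. The C sketch below is hypothetical: it assumes kernel messages begin with the device name (ataN: or adN:), optionally after a syslog prefix ending in "kernel: ", and it treats the words error, timeout and resetting as signs of trouble. The exact ata driver wording differs between releases, so the keyword list is an assumption to adjust.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive substring search. */
static bool contains_ci(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0)
        return true;
    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return true;
    }
    return false;
}

/*
 * Return true if a kernel message comes from an ata channel or ad disk
 * ("ata2: ...", "ad4: ...") and mentions an error, timeout or reset.
 * An optional syslog prefix ending in "kernel: " is skipped.
 */
static bool is_ata_problem(const char *line)
{
    const char *p = line;
    const char *k = strstr(line, "kernel: ");
    size_t plen;

    if (k != NULL)
        p = k + strlen("kernel: ");

    if (strncmp(p, "ata", 3) == 0)
        plen = 3;
    else if (strncmp(p, "ad", 2) == 0)
        plen = 2;
    else
        return false;

    p += plen;
    if (!isdigit((unsigned char)*p))
        return false;                 /* e.g. "atapci0:" is not a channel */
    while (isdigit((unsigned char)*p))
        p++;
    if (*p != ':')
        return false;

    return contains_ci(p, "error") || contains_ci(p, "timeout") ||
           contains_ci(p, "resetting");
}
```

Each line of dmesg output or /var/log/messages can be passed to is_ata_problem(); controller attach lines such as atapci0: and the ad0: size line are not flagged.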

Thanks,

Doug A.


Re: MegaRAID 'Bad Slot' Kernel message and crash.

2005-01-12 Thread Doug Ambrisko
Tony Byrne writes:
| Basically, after some amount of uptime the kernel will emit a "amr0:
| Bad slot x completed" message and pretty soon after this the box goes into a
| partially unresponsive state forcing us to reboot it.  So far the only
| thing triggering the problem is the nightly jobs, where the amount of
| IO is higher than during the day.
| 
| Before deployment, we tested the box with 5.3-STABLE and managed to
| trigger the problem twice.  This forced us to try 4.10-STABLE which
| was fine in testing and for a number of weeks after deployment.
| However, just before new year we saw our first Bad Slot and crash under
| 4.10.  Since then it has happened 3 more times.  We have upgraded the firmware to
| the latest version available from Intel, and if anything this has made
| the problem worse.
|
| The machine had 3 disks configured as a single RAID5 array.  A fourth
| disk is configured as a hot-standby.  The card is equipped with 128Mb
| of battery-backed cache.  Write-back caching is enabled on the card.
| Read-ahead caching is enabled in non-adaptive mode.
| 
| Is anyone else using a SRCU42X RAID card and seeing similar
| problems to ours?  What about other cards supported by the amr driver?

We run RAID 10 across 4 drives at work on Dell PE2850s, which have amr
RAID controllers, and no one has reported this problem to me (and they
do report problems).  We run FreeBSD 4.10 & 5.3 on them, with and without
our local mods.  We have the most experience with 4.10.  Dell has its own
firmware version (at least enough to call it a PERC controller).

For now this is a "works for me".

Doug A.


Re: Promise TX2 SATA controllers

2004-10-08 Thread Doug Ambrisko
Jean-Francois Dockes writes:
| Just in case it may help someone (this information is not very easily
| accessible in the archives):
| 
|  - I have a Promise TX2 controller with a PCI ID of 0x3375105a . It works
|for me in 4.10 by adding the new PCI ID everywhere that you'll find the
|other/old one (0x3371105a) in the patch (see next paragraph) or kernel
|source under dev/ata. Don't blame me if you lose your data, I will not
|take responsibility, but this is weakly supported by the two
|controllers appearing to be handled just the same in -current.

I added it to my local tree and it will be in the next patch set.  I still
need to add soft error recovery (i.e. if one drive has a read error,
automatically recover from the other drive) and a somewhat more graceful
way of adding a failed drive back into the RAID.  I also fixed a RAID bug
in ar_rw that could lead to a panic on an I/O error.
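
The soft error recovery mentioned above can be sketched as a RAID1 read that falls back to the other member and then rewrites the good copy over the failed sector. This is a hypothetical in-memory model, not the ar_rw code: struct disk, disk_read(), disk_write() and mirror_read() are invented for illustration, and clearing the bad flag on write stands in for the drive remapping the sector.

```c
#include <stdbool.h>
#include <string.h>

#define NSECT 8    /* sectors per simulated disk */
#define SECSZ 16   /* bytes per simulated sector */

/* Hypothetical in-memory stand-in for one mirror member. */
struct disk {
    unsigned char data[NSECT][SECSZ];
    bool bad[NSECT];              /* sectors that currently fail to read */
};

static bool disk_read(const struct disk *d, int s, unsigned char *out)
{
    if (d->bad[s])
        return false;             /* simulated unrecoverable read error */
    memcpy(out, d->data[s], SECSZ);
    return true;
}

static void disk_write(struct disk *d, int s, const unsigned char *in)
{
    memcpy(d->data[s], in, SECSZ);
    d->bad[s] = false;            /* assume the drive remaps the sector */
}

/*
 * RAID1 read with soft error recovery: try member `pref` first; on a
 * read error, read the other member and write that data back over the
 * failed sector.  Returns 0 on success, -1 if both copies are unreadable.
 */
static int mirror_read(struct disk m[2], int pref, int s, unsigned char *out)
{
    int alt = 1 - pref;

    if (disk_read(&m[pref], s, out))
        return 0;
    if (!disk_read(&m[alt], s, out))
        return -1;
    disk_write(&m[pref], s, out); /* repair the member that failed */
    return 0;
}
```

Only when both copies of a sector are unreadable does the error reach the caller, which matches the preference expressed earlier in this archive for returning a read error to the OS rather than failing a whole drive.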

Thanks,

Doug A.


Re: atacontrol Raid, cannot re-add member to array

2004-07-04 Thread Doug Ambrisko
Harald Schmalzbauer writes:
| I've never tried ataraid with "non-raid" controllers but I doubt that 
| detach/attach would work. I asked Søren about the missing addspare in -stable 
| but never got any answer.

I've added addspare and some other features to my 4.10 Release patches:
http://www.ambrisko.com/doug/ata/ata_stable_sata_5.patch

You might want to give that a try.

Doug A.


Re: Backporting S-ATA driver SiI 3112a to FreeBSD-STABLE?

2004-03-17 Thread Doug Ambrisko
Dag-Erling Smørgrav writes:
| Doug Ambrisko <[EMAIL PROTECTED]> writes:
| > BTW a failure mode in the SATA spec. says freeze (ie. lock up the
| > system if you don't acknowledge SATA issues).
| 
| Huh?  Care to elaborate?

From the Serial ATA Spec. 1.0a, section 11.1, on error handling:
  Error responses are generally classified into four categories
Freeze
Abort
Retry
Track/ignore
  The error handling responses described in this section are not 
  comprehensive and are included to cover specific known error scenarios 
  as well as to illustrate typical error control and recovery actions.
  This section is therefore descriptive and supplemental to the error 
  reporting interface defined in section 10 and implementations may vary 
  in their internal error recovery and control actions.  

  For the most severe error conditions in which state has been critically 
  perturbed in a way that it is not recoverable, the appropriate error 
  response is to freeze and rely on a reset or similar operation to restore 
  all necessary state to return to normal operation.

I have seen freeze result in system lock-ups in which even an NMI can't
break into the debugger.  With the Promise SATA cards, if I have
interrupts enabled and the interrupt handler checks whether or not a
drive has left, the system lock-ups go away.  For the Intel 6300ESB I
need to poll the SATA SError register to look for SATA errors, or the
system will lock up.  I generate an error by either powering off the
SATA drive or pulling the cable while the system is running.  I'm
running with ata-raid.  Unfortunately the Intel parts don't interrupt
on a SATA condition, so I have to poll.  Once I add a timeout to check
for drives coming or going, it should work close to how the Promise
controller works.  The Intel 6300ESB does indeed have both SStatus and
SError registers, even though Intel only claims SError is there.  The
SStatus register says whether or not a drive is present.
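
The polling described above can be sketched as decoding the SStatus register on each timeout and comparing the result with the previous poll. The field layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8, with DET equal to 3 meaning a device is present and PHY communication is established) comes from the Serial ATA specification; how the register is read on a particular controller is left out, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* SStatus fields as laid out in the Serial ATA specification. */
#define SSTATUS_DET(v)  ((v) & 0xfu)          /* device detection */
#define SSTATUS_SPD(v)  (((v) >> 4) & 0xfu)   /* negotiated speed */
#define SSTATUS_IPM(v)  (((v) >> 8) & 0xfu)   /* power management state */

#define DET_PRESENT_PHY_UP 0x3u   /* device present, PHY communication up */

enum hotplug_event { HP_NONE, HP_ATTACH, HP_DETACH };

static bool sata_device_present(uint32_t sstatus)
{
    return SSTATUS_DET(sstatus) == DET_PRESENT_PHY_UP;
}

/*
 * One polling step, meant to run from a periodic timeout: compare the
 * current SStatus value with the presence recorded at the previous poll
 * and report whether a drive arrived or left.
 */
static enum hotplug_event sata_poll(uint32_t sstatus, bool *was_present)
{
    bool now = sata_device_present(sstatus);
    enum hotplug_event ev = HP_NONE;

    if (now && !*was_present)
        ev = HP_ATTACH;
    else if (!now && *was_present)
        ev = HP_DETACH;
    *was_present = now;
    return ev;
}
```

A detach event would be the point at which a driver stops issuing commands to that channel, rather than continuing to bang on the controller until it locks up.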

Most of the existing ATA code just bangs on the controller; if a SATA
error happens it is ignored, and eventually even an inb or outb to the
controller will lock up the system.  I did the initial instrumentation
down to the inb/outb level.  This wasn't a lot of fun to debug, since
if I messed up I'd get a system lock-up.  Granted, in normal operation
this isn't a problem, but during the failure recovery I was testing,
system lock-ups are not good.

Now I really like SATA, since it makes it pretty easy to implement
kernel support for hot plug, like USB drives, etc.

Doug A.


Re: Problem with dc-nics 10,11

2003-08-01 Thread Doug Ambrisko
Holger Kipp writes:
| I have a little problem with dc10, dc11. I use three quad dc cards,
| so far from dc0 up to dc8 with no problems.
| 
| All (dc0 to dc11) are displayed correctly with pciconf and with ifconfig.
| The trouble is with dc10 and dc11 that they don't send any data out and
| also don't react to arp requests etc. - at least using tcpdump won't show
| anything coming in or going out.
| Monitoring from an external system, this is the same. According to the 
| blinking lights on the switch in between (also tried a hub), pings from
| the other machine (or arp-requests if I don't use a permanent entry) etc
| are sent to the correct cable.
| 
| As everything works from dc0 up to dc9, I'd suspect some sort of internal
| name mismatching (like counting devices hexadecimal (dca) versus decimal
| (dc10)).
| 
| This is on an older system (4.6-STABLE). If someone had a similar problem
| and it is now fixed in 4.8-STABLE, please let me know. Couldn't find a PR
| for this...

Considering that I've had four quad-port (4*4) cards in prior 4.X systems, my
experience is that you have a BIOS that stops allocating resources to the
cards after a while.  I've run into that before: the BIOS stops
setting up PCI devices after a certain number, or doesn't traverse
all bridges.

Running dmesg and looking at the IRQ allocation is a good starting point.
It's probably bad.
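A minimal sketch of that check, assuming dmesg attach lines of the usual form "dc10: <...> port 0x... irq N at device ... on pciN". As an assumption, it treats IRQ 0 or 255 (255 being the PCI "unknown / not connected" interrupt-line value) as a sign that the BIOS never routed an interrupt. All function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the number that follows " irq " in a
 * dmesg attach line, or -1 if the line has no irq field.
 */
static int
dmesg_line_irq(const char *line)
{
	const char *p = strstr(line, " irq ");
	char *end;
	long irq;

	if (p == NULL)
		return -1;
	p += strlen(" irq ");
	irq = strtol(p, &end, 10);
	if (end == p)
		return -1;	/* "irq" not followed by a number */
	return (int)irq;
}

/* Assumption: 0 or 255 means the BIOS never assigned an interrupt. */
static int
irq_looks_unassigned(int irq)
{
	return irq == 0 || irq == 255;
}

/*
 * True if the line belongs to a device whose name starts with prefix
 * (e.g. "dc") and reports an IRQ that looks unassigned.
 */
static int
line_has_bad_irq(const char *line, const char *prefix)
{
	int irq;

	if (strncmp(line, prefix, strlen(prefix)) != 0)
		return 0;
	irq = dmesg_line_irq(line);
	return irq >= 0 && irq_looks_unassigned(irq);
}
```

Feeding it each line of /var/run/dmesg.boot with the prefix "dc" would flag any dc unit the BIOS left without an interrupt. Units that share a valid IRQ are not flagged, since sharing is normal on PCI.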

Doug A.


Re: Any support for Intel ICH Watchdog ?

2002-11-10 Thread Doug Ambrisko
Don Bowman writes:
| http://www.intel.com/design/chipsets/applnots/29227301.pdf
| describes what is needed to support the watchdog (so that
| stuck servers get unstuck :)

I have some code at:
http://www.ambrisko.com/doug/watchdog/
It implements SW & HW watchdogs.  If HW exists, it links in via sysctl
patches that let the SW watchdog control any HW watchdogs in
the system.  This was done to permit longer watchdog timeouts than the
HW allows, since some hardware is very limited in timeout duration; the HW
watchdog is then used to "enforce" that the SW watchdog is still running.
If the SW watchdog stops updating the HW watchdog, the machine reboots.
The other advantage is that if the SW provides the main watchdog service,
it can cause a panic so you can figure out what went wrong.
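The SW/HW split described above can be modelled in a few lines. This is a hypothetical user-space sketch, not the code at the URL above. The SW watchdog has a long timeout that user code must refresh, and on every tick it re-arms the short HW timer only while the SW deadline has not passed. Once the SW watchdog expires it requests a panic and stops petting the HW timer, which then runs out and resets the machine. All names and the tick-based timing are assumptions.

```c
#include <stdbool.h>

/* Hypothetical state for a software watchdog backed by a hardware one. */
struct wd_state {
	unsigned now;         /* current time in ticks */
	unsigned sw_timeout;  /* long SW timeout, chosen by the user */
	unsigned sw_deadline; /* SW watchdog expires at this tick */
	unsigned hw_timeout;  /* short HW timeout (limited by the chip) */
	unsigned hw_deadline; /* HW watchdog resets the machine at this tick */
	bool sw_expired;
};

enum wd_event { WD_OK, WD_SW_PANIC, WD_HW_RESET };

static void
wd_init(struct wd_state *wd, unsigned sw_timeout, unsigned hw_timeout)
{
	wd->now = 0;
	wd->sw_timeout = sw_timeout;
	wd->sw_deadline = sw_timeout;
	wd->hw_timeout = hw_timeout;
	wd->hw_deadline = hw_timeout;
	wd->sw_expired = false;
}

/* User code (e.g. a daemon) refreshes the SW watchdog. */
static void
wd_user_pat(struct wd_state *wd)
{
	wd->sw_deadline = wd->now + wd->sw_timeout;
}

/*
 * Called once per tick from a kernel timer. While the SW watchdog is
 * alive it keeps re-arming the HW timer; once it expires it requests a
 * panic (to capture state) and stops petting the HW timer.
 */
static enum wd_event
wd_tick(struct wd_state *wd)
{
	wd->now++;
	if (!wd->sw_expired && wd->now >= wd->sw_deadline) {
		wd->sw_expired = true;
		return WD_SW_PANIC;
	}
	if (!wd->sw_expired) {
		wd->hw_deadline = wd->now + wd->hw_timeout;
		return WD_OK;
	}
	return wd->now >= wd->hw_deadline ? WD_HW_RESET : WD_OK;
}
```

With a SW timeout of 10 ticks and a HW timeout of 3, an unrefreshed model reports a panic at tick 10 and a hardware reset at tick 12, because the last HW re-arm happened at tick 9.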

It has support for the Intel TCO watchdog and SIS630 chipset.

This is prototype code that works.  A scheme for adding sub-drivers still
needs to be added.  When FreeBSD decides how this should work, I'll
probably redo it along those lines.

Caveat: no real HW bounds checks are done to validate the timeout.
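For illustration, a bounds check of the kind this caveat refers to could look like the following sketch. It converts a requested timeout in seconds into HW counter ticks and rejects values the counter cannot hold. The tick length and the minimum/maximum counter values are parameters because they vary by chipset. The structure and function names are hypothetical, and the numbers used in the checks below are illustrative only. Check the chipset datasheet for the real limits.

```c
#include <stdbool.h>

/* Hypothetical description of a HW watchdog counter's limits. */
struct hw_wd_limits {
	unsigned tick_ms;   /* duration of one counter tick, in ms */
	unsigned min_ticks; /* smallest value the counter accepts */
	unsigned max_ticks; /* largest value the counter can hold */
};

/*
 * Convert a timeout in seconds to counter ticks, rounding up so the
 * HW timer never fires earlier than requested. Returns false (and
 * leaves *ticks untouched) if the result is outside the HW range.
 */
static bool
hw_wd_timeout_to_ticks(const struct hw_wd_limits *lim, unsigned seconds,
    unsigned *ticks)
{
	unsigned long long ms, t;

	if (lim->tick_ms == 0)
		return false;
	ms = (unsigned long long)seconds * 1000;
	t = (ms + lim->tick_ms - 1) / lim->tick_ms;
	if (t < lim->min_ticks || t > lim->max_ticks)
		return false;
	*ticks = (unsigned)t;
	return true;
}
```

With a 600 ms tick and a 4..63 counter range, 30 s maps to 50 ticks, while 1 s and 60 s are rejected instead of silently programming a wrong value.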

The sysctl interface is nice in that you can kld{load,unload} the HW part
and leave the SW part working.  It also allows the watchdogs to be
disabled when you enter the debugger, etc.

| Is there any support for this in freebsd stable?

Yes it runs on -stable.

Doug A.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message



Re: Aironet 350

2002-09-25 Thread Doug Ambrisko

Andrew Thompson writes:
| I have a Cisco Aironet 350 wireless card which I am using in my FreeBSD
| laptop. It works well except for monitor mode: if I type the
| following commands the laptop will reset itself (no kernel panic, it goes
| straight to the POST at startup).
| 
| < insert card >
| ancontrol -i an0 -M 3
| ifconfig an0 up

Well, I don't do it in that order, so maybe something is busted.  Are you
running X when you do this?  If you run X you are not likely to see a panic;
try doing it just from syscons.  Also make sure you have kernel core
dumps set up on your machine.  None of that should matter, though, since
the card doesn't actually go into monitor mode until it is put in promiscuous mode.

| I am running "FreeBSD 4.7-RC #0: Wed Sep 25 11:26:38 NZST 2002" on a 
| Compaq evo n1000v and the card model is a AIR-PCM352.  I have googled 
| but not found anything, has anyone else come across this?

I haven't heard.

Doug A.




Re: Problems with D-Link DEF-580TX??

2002-08-02 Thread Doug Ambrisko

Kal Torak writes:
| Just wondering if anyone has been able to use the new DEF-580TX
| quad port card without any problems??

No ... that doesn't seem possible, but you can make it happier with these patches.
I need to test them under -current and then I will commit them.

It seems the chip has a fundamental problem: under heavy RX load it blocks
any I/O but its own.  This happens under Linux, Windows, FreeBSD, etc.
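The RX part of the patch changes ste_rxeof() from "loop until the status word is zero, then restart the DMA engine by hand" to "consume descriptors only while the DMA-done bit is set, clear each status word after use, and stop with two descriptors left unprocessed", presumably to bound the work done per interrupt. Below is a user-space model of that loop. The ring size, the DMADONE bit value and the structures are illustrative stand-ins, not the driver's actual definitions.

```c
#include <stdint.h>

#define RX_LIST_CNT    8           /* illustrative ring size */
#define RXSTAT_DMADONE 0x00008000u /* illustrative "DMA complete" bit */

struct rx_desc {
	uint32_t status;      /* written by the NIC when a frame lands */
	struct rx_desc *next;
};

struct rx_ring {
	struct rx_desc desc[RX_LIST_CNT];
	struct rx_desc *head; /* next descriptor the driver will look at */
};

static void
rx_ring_init(struct rx_ring *r)
{
	for (int i = 0; i < RX_LIST_CNT; i++) {
		r->desc[i].status = 0;
		r->desc[i].next = &r->desc[(i + 1) % RX_LIST_CNT];
	}
	r->head = &r->desc[0];
}

/*
 * Mirrors the patched loop: only consume descriptors the DMA engine has
 * finished, hand each one back by clearing its status, and process at
 * most RX_LIST_CNT - 2 descriptors per call. Returns the count processed.
 */
static int
rx_eof(struct rx_ring *r)
{
	int count = 0;

	while (r->head->status & RXSTAT_DMADONE) {
		if (RX_LIST_CNT - count < 3)
			break;
		struct rx_desc *cur = r->head;
		r->head = cur->next;
		/* The real driver passes the received frame up the stack here. */
		cur->status = 0;
		count++;
	}
	return count;
}
```

If every descriptor is marked done, one call processes RX_LIST_CNT - 2 of them and leaves the rest for the next interrupt.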

The Znyx (http://www.znyx.com/) cards seem to work fine under -stable.
I have both the 32- and 64-bit versions of their dc(4)-based cards.

Doug A.

Index: sys/pci/if_ste.c
===
RCS file: /cvs/src/sys/pci/if_ste.c,v
retrieving revision 1.14.2.5
diff -u -r1.14.2.5 if_ste.c
--- sys/pci/if_ste.c 16 Dec 2001 15:46:08 -  1.14.2.5
+++ sys/pci/if_ste.c 3 Aug 2002 03:36:06 -
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -415,6 +416,7 @@
 {
struct ste_softc*sc;
struct mii_data *mii;
+   int i;
 
sc = device_get_softc(dev);
mii = device_get_softc(sc->ste_miibus);
@@ -425,6 +427,15 @@
STE_CLRBIT2(sc, STE_MACCTL0, STE_MACCTL0_FULLDUPLEX);
}
 
+   STE_SETBIT4(sc, STE_ASICCTL,STE_ASICCTL_RX_RESET |
+   STE_ASICCTL_TX_RESET);
+   for (i = 0; i < STE_TIMEOUT; i++) {
+   if (!(CSR_READ_4(sc, STE_ASICCTL) & STE_ASICCTL_RESET_BUSY))
+   break;
+   }
+   if (i == STE_TIMEOUT)
+   printf("ste%d: rx reset never completed\n", sc->ste_unit);
+
return;
 }
  
@@ -643,6 +654,9 @@
ste_stats_update(sc);
}
 
+   if (status & STE_ISR_LINKEVENT)
+   mii_pollstat(device_get_softc(sc->ste_miibus));
+
if (status & STE_ISR_HOSTERR) {
ste_reset(sc);
ste_init(sc);
@@ -669,17 +683,20 @@
 struct mbuf*m;
 struct ifnet   *ifp;
struct ste_chain_onefrag*cur_rx;
-   int total_len = 0;
+   int total_len = 0, count=0;
u_int32_t   rxstat;
 
ifp = &sc->arpcom.ac_if;
 
-again:
+   while((rxstat = sc->ste_cdata.ste_rx_head->ste_ptr->ste_status)
+ & STE_RXSTAT_DMADONE) {
+   if ((STE_RX_LIST_CNT - count) < 3) {
+   break;
+   }
 
-   while((rxstat = sc->ste_cdata.ste_rx_head->ste_ptr->ste_status)) {
cur_rx = sc->ste_cdata.ste_rx_head;
sc->ste_cdata.ste_rx_head = cur_rx->ste_next;
-
+ 
/*
 * If an error occurs, update stats, clear the
 * status word and leave the mbuf cluster in place:
@@ -730,29 +747,9 @@
/* Remove header from mbuf and pass it on. */
m_adj(m, sizeof(struct ether_header));
ether_input(ifp, eh, m);
-   }
-
-   /*
-* Handle the 'end of channel' condition. When the upload
-* engine hits the end of the RX ring, it will stall. This
-* is our cue to flush the RX ring, reload the uplist pointer
-* register and unstall the engine.
-* XXX This is actually a little goofy. With the ThunderLAN
-* chip, you get an interrupt when the receiver hits the end
-* of the receive ring, which tells you exactly when you
-* you need to reload the ring pointer. Here we have to
-* fake it. I'm mad at myself for not being clever enough
-* to avoid the use of a goto here.
-*/
-   if (CSR_READ_4(sc, STE_RX_DMALIST_PTR) == 0 ||
-   CSR_READ_4(sc, STE_DMACTL) & STE_DMACTL_RXDMA_STOPPED) {
-   STE_SETBIT4(sc, STE_DMACTL, STE_DMACTL_RXDMA_STALL);
-   ste_wait(sc);
-   CSR_WRITE_4(sc, STE_RX_DMALIST_PTR,
-   vtophys(&sc->ste_ldata->ste_rx_list[0]));
-   sc->ste_cdata.ste_rx_head = &sc->ste_cdata.ste_rx_chain[0];
-   STE_SETBIT4(sc, STE_DMACTL, STE_DMACTL_RXDMA_UNSTALL);
-   goto again;
+   
+   cur_rx->ste_ptr->ste_status = 0;
+   count++;
}
 
return;
@@ -836,11 +833,9 @@
void*xsc;
 {
struct ste_softc*sc;
-   struct ste_stats stats;
struct ifnet*ifp;
struct mii_data *mii;
-   int i, s;
-   u_int8_t*p;
+   int s;
 
s = splimp();
 
@@ -848,24 +843,23 @@
ifp = &sc->arpcom.ac_if;
mii = device_get_softc(sc->ste_miibus);
 
-   p = (u_int8_t *)&stats;
-
-   for (i = 0; i < sizeof(stats); i++) {
-   *p = CSR_READ_1(sc, STE_STATS + i);
-   p++;
-   }
-
-   ifp->if_collisions += stats.ste_singl

Re: sis0: incorrect mac address

2002-02-18 Thread Doug Ambrisko

W. Desjardins writes:
| Hello,
| 
| On 3 out of 4 servers I just installed, I get this when looking at ifconfig:
| 
| sis0: flags=8843 mtu 1500
| inet 66.28.74.109 netmask 0xffe0 broadcast 66.28.74.127
| inet6 fe80::d483:b781:285a:6ea1%sis0 prefixlen 64 scopeid 0x1
| ether 00:00:00:00:00:00
|  NOTE:--->^
| media: Ethernet autoselect (100baseTX )
| status: active
| 
| These machines are due for production, but without valid MACs they can't
| talk to each other.
| 
| The systems are running 4.5-RELEASE with a custom kernel (GENERIC had the same
| results). The motherboard is an ASUS CUSI-FX SiS Socket 370 board with the SiS
| 630E onboard Fast Ethernet chipset.
| 
| I have 7 more of these exact same machines, most also running 4.5R
| fine and showing normal MAC addresses. Normally I run -stable on all my
| machines, but I have been bringing them up to 4.5R to get them all in sync
| with each other, since they are all identical.
| 
| Has anyone had any problems with the recent versions of this motherboard,
| or am I looking at a few bad chipsets?

Well, you are dealing with an obsolete board.  They may have built
some with a 630ET chipset, which is used on the ASUS TUSI motherboards.
Here is a patch that fixes 630ET support in -stable (already fixed
in -current).  Note the TUSI and CUSI boards look exactly the same
except for the voltage regulator.  We have a bunch of the newer TUSI
boards here.

If this patch doesn't work, can you add a printf to dump the
"sc->sis_rev" value?

Thanks,

Doug A.

Index: if_sisreg.h
===
RCS file: /cvs/src/sys/pci/if_sisreg.h,v
retrieving revision 1.1.4.9
diff -u -r1.1.4.9 if_sisreg.h
--- if_sisreg.h 9 Feb 2002 23:02:40 -   1.1.4.9
+++ if_sisreg.h 19 Feb 2002 03:49:55 -
@@ -369,7 +369,7 @@
 #define SIS_REV_630E   0x0081
 #define SIS_REV_630S   0x0082
 #define SIS_REV_630EA1 0x0083
-#define SIS_REV_630ET  0x0083
+#define SIS_REV_630ET  0x0084
 #define SIS_REV_635    0x0090
 
 /*
Index: if_sis.c
===
RCS file: /cvs/src/sys/pci/if_sis.c,v
retrieving revision 1.13.4.19
diff -u -r1.13.4.19 if_sis.c
--- if_sis.c 9 Feb 2002 23:02:40 -   1.13.4.19
+++ if_sis.c 19 Feb 2002 03:49:55 -
@@ -919,11 +919,11 @@
 */
if (sc->sis_rev == SIS_REV_630S ||
sc->sis_rev == SIS_REV_630E ||
-   sc->sis_rev == SIS_REV_630EA1 ||
-   sc->sis_rev == SIS_REV_630ET)
+   sc->sis_rev == SIS_REV_630EA1)
sis_read_cmos(sc, dev, (caddr_t)&eaddr, 0x9, 6);
 
-   else if (sc->sis_rev == SIS_REV_635)
+   else if (sc->sis_rev == SIS_REV_635 ||
+sc->sis_rev == SIS_REV_630ET)
sis_read_mac(sc, dev, (caddr_t)&eaddr);
else
 #endif
@@ -937,13 +937,6 @@
 */
printf("sis%d: Ethernet address: %6D\n", unit, eaddr, ":");
 
-   /*
-* From the Linux driver:
-* 630ET : set the mii access mode as software-mode
-*/
-   if (sc->sis_rev == SIS_REV_630ET)
-   SIS_SETBIT(sc, SIS_CSR, SIS_CSR_ACCESS_MODE);
-   
sc->sis_unit = unit;
callout_handle_init(&sc->sis_stat_ch);
bcopy(eaddr, (char *)&sc->arpcom.ac_enaddr, ETHER_ADDR_LEN);

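With the patch applied, the choice of where the station address comes from reduces to a switch on the chip revision. The sketch below restates that logic outside the driver so the revision values are easy to compare against the one printed from sc->sis_rev. The enum and function names are hypothetical. The revision constants are the ones from if_sisreg.h after the patch, with 630ET at 0x0084. Before the patch, 630ET shared 0x0083 with 630EA1, so a real 630ET chip plausibly fell through to the default path, which would fit the all-zero address.

```c
/* Revision IDs as defined in if_sisreg.h after the patch. */
#define SIS_REV_630E   0x0081
#define SIS_REV_630S   0x0082
#define SIS_REV_630EA1 0x0083
#define SIS_REV_630ET  0x0084
#define SIS_REV_635    0x0090

/* Hypothetical labels for where sis_attach() gets the station address. */
enum sis_mac_source {
	SIS_MAC_FROM_CMOS,    /* sis_read_cmos(sc, dev, eaddr, 0x9, 6) */
	SIS_MAC_FROM_CHIP,    /* sis_read_mac(sc, dev, eaddr) */
	SIS_MAC_FROM_DEFAULT, /* the code after #endif (not in the patch context) */
};

static enum sis_mac_source
sis_mac_source(unsigned int rev)
{
	switch (rev) {
	case SIS_REV_630S:
	case SIS_REV_630E:
	case SIS_REV_630EA1:
		return SIS_MAC_FROM_CMOS;
	case SIS_REV_635:
	case SIS_REV_630ET:
		return SIS_MAC_FROM_CHIP;
	default:
		return SIS_MAC_FROM_DEFAULT;
	}
}
```

A one-line printf("sis%d: rev 0x%x\n", unit, sc->sis_rev); in sis_attach(), next to the existing "Ethernet address" printf, would show which of these cases a given board hits.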