Re: Any chance of someone commiting the patch in bin/131861 ?

2010-05-14 Thread Pete French

> Postfix will re-write this as part of sanitization, so I had to revert
> to creating mbox files by hand. Anyway, could you please test the
> following patch with a wider variety of mails?

I've been testing your patch for a few weeks now as my main email
client, and I haven't encountered any problems - it also does fix
the reply issues I was originally having. Do you want to attach it
to the PR? After that maybe someone could commit it - I am pretty
certain it doesn't actually break any existing behaviour.

cheers,

-pete.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


ipv6_ifconfig__alias not working

2010-05-14 Thread Spil Oss
Hi,

I'm trying to set ipv6 aliases for my jails in my rc.conf but it
doesn't seem to work as advertised. I have a /48 range assigned to me
(for this example 2001:dead:beef) and am trying to assign ipv6
addresses to a jail. The jails will all have ipv6 addresses in the
2001:dead:beef:1 range.

From man rc.conf "Aliases should be set as ipv6_ifconfig__alias"

My bge0 config in /etc/rc.conf:
ifconfig_bge0="inet6 2001:dead:beef:::1/64 up"
ipv4_addrs_bge0="10.10.2.1/24 10.10.2.2/24 10.10.2.3/24 10.10.2.5/24
10.10.2.6/24"
ipv6_ifconfig_bge0_alias0="2001:dead:beef:1::5/64"
rtadvd_interfaces="wlan0 bge0"

Additional ipv6 config in /etc/rc.conf
ipv6_enable="YES"
ipv6_gateway_enable="YES"

The "2001:dead:beef:1::5/64" address is not assigned to bge0.
There must be some stupid mistake I'm making in my config. Is it
perhaps the ifconfig_bge0 line that screws up my config?

Kind regards,

Spil.


Re: ipv6_ifconfig__alias not working

2010-05-14 Thread Matthew Seaman
On 14/05/2010 10:07:23, Spil Oss wrote:

> I'm trying to set ipv6 aliases for my jails in my rc.conf but it
> doesn't seem to work as advertised. I have a /48 range assigned to me
> (for this example 2001:dead:beef) and am trying to assign ipv6
> addresses to a jail. The jails will all have ipv6 addresses in the
> 2001:dead:beef:1 range.
>
> From man rc.conf "Aliases should be set as ipv6_ifconfig__alias"
>
> My bge0 config in /etc/rc.conf:
> ifconfig_bge0="inet6 2001:dead:beef:::1/64 up"
> ipv4_addrs_bge0="10.10.2.1/24 10.10.2.2/24 10.10.2.3/24 10.10.2.5/24
> 10.10.2.6/24"
> ipv6_ifconfig_bge0_alias0="2001:dead:beef:1::5/64"
> rtadvd_interfaces="wlan0 bge0"
> 
> Additional ipv6 config in /etc/rc.conf
> ipv6_enable="YES"
> ipv6_gateway_enable="YES"
> 
> The "2001:dead:beef:1::5/64" address is not assigned to bge0.
> There must be some stupid mistake I'm making in my config. Is it
> perhaps the ifconfig_bge0 line that screws up my config?

Hmmm... for consistency's sake you should probably be using:

ipv6_ifconfig_bge0="2001:dead:beef:::1/64"
ipv6_ifconfig_bge0_alias0="2001:dead:beef:1::5/64"

or, to make things absolutely parallel to your IPv4 settings:

ipv6_addrs_bge0="2001:dead:beef:::1/64 2001:dead:beef:1::5/64"

Cheers,

Matthew

- -- 
Dr Matthew J Seaman MA, D.Phil.   7 Priory Courtyard
  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
  Kent, CT11 9PW


Re: ipv6_ifconfig__alias not working

2010-05-14 Thread Spil Oss
Thanks for the hints Matthew!

Cleaning up my config I found the culprit. I had copied
ipv6_network_interfaces="gif0"
from some guide, which of course defeated all my efforts to configure
ipv6 on the other interfaces.

The ipv6_addrs_ knob doesn't seem to work (this is 8.0-p2), and I
can't find any references to it in the subr files either. I saw that
there are quite a few changes in -head though.
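
For reference, a minimal rc.conf sketch of the setup that ends up working on 8.0. It assumes the primary address was meant to be 2001:dead:beef:1::1/64 (the triple-colon form quoted in this thread is not a valid IPv6 address), and it assumes the default value of ipv6_network_interfaces is "auto". Treat it as an illustration, not a tested configuration:

```sh
# /etc/rc.conf fragment (sketch, 8.0-era variable names)
ipv6_enable="YES"
ipv6_gateway_enable="YES"

# Leave this at its default instead of restricting IPv6 setup to gif0,
# otherwise bge0 never gets its IPv6 configuration applied.
#ipv6_network_interfaces="gif0"

# Primary address. ASSUMED value: the thread's "2001:dead:beef:::1" is malformed.
ipv6_ifconfig_bge0="2001:dead:beef:1::1/64"
# Alias address for the jail, as suggested earlier in the thread.
ipv6_ifconfig_bge0_alias0="2001:dead:beef:1::5/64"
```

The ipv6_addrs_ form is left out on purpose, since it did not work on 8.0-p2.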

Kind regards,

Spil.



Re: Enabling watchdog

2010-05-14 Thread John Baldwin

rihad wrote:
> On 05/14/2010 04:13 AM, Doug Ambrisko wrote:
>> rihad writes:
>> | Hi, I'm thinking of enabling the watchdog on our Dell PowerEdge 2950 /
>> | FreeBSD 8.0 amd64, so that it reboots the machine in case of lockups.
>> | Right now it doesn't work:
>> |
>> | # watchdog
>> | watchdog: patting the dog: Operation not supported
>> | #
>> | Looking through the kernel configuration I found two relevant settings:
>> | In /sys/conf/NOTES:
>> | #
>> | # Add software watchdog routines.
>> | #
>> | options SW_WATCHDOG
>> |
>> | and in /sys/amd64/conf/NOTES:
>> | #
>> | # Watchdog routines.
>> | #
>> | options MP_WATCHDOG
>> |
>> | Which of them should I rebuild the kernel with? BTW, the existing kernel
>> | is built with the default "options SCHED_ULE" to make good use of
>> | multiple CPUs, does watchdog work with it?
>>
>> If no one has said yet, kldload ipmi then run watchdogd.  ... or compile
>> it into the kernel.  This will enable the IPMI HW watchdog.  If it triggers,
>> it will appear in the IPMI SEL (ipmitool sel list).
>
> Thanks. So did I understand it right that I should first install
> sysutils/ipmitool, then start polling "ipmitool sel list" in a shell
> script from a cron job run once a minute, and reboot in case IPMI
> triggers? But if it's a kernel lockup, none of the user level code might
> run at all. Any way to fall back to a hard and fast kernel level machine
> reset?


No, watchdogd and the IPMI driver will manage the watchdog.  You can use 
'sel elist' after a reboot to see if the reboot was triggered via the 
watchdog.
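
As a rough sketch of that setup made persistent across reboots: the two knob names below are the usual loader.conf and rc.conf ones for the ipmi module and the base-system watchdogd script, which is an assumption on top of what was said in this thread.

```sh
# /boot/loader.conf: load the IPMI driver at boot
# (one-off equivalent as root: kldload ipmi)
ipmi_load="YES"

# /etc/rc.conf: start watchdogd at boot so it keeps patting the IPMI watchdog
# (one-off equivalent as root: watchdogd)
watchdogd_enable="YES"
```

No polling from cron is involved. If the kernel locks up, watchdogd stops patting the hardware timer and the BMC resets the machine on its own.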


--
John Baldwin



Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread John Baldwin

Terry Kennedy wrote:
> I'm reposting this over here at the suggestion of the Forums moderator.
> The original post is at http://forums.freebsd.org/showthread.php?t=14163
>
> Got an interesting crash just now (well, as interesting as a crash on a
> soon-to-be production system can be).
>
> This is 8-STABLE/amd64, last cvsup'd early in the morning of May 9th.
>
> The system didn't complete the crash dump, so it needed a manual reset to get
> it going again.
>
> The crash was a "page fault while in kernel mode" with the current process
> being the interrupt service routine for the bce0 GigE. Things progressed
> reasonably until partway through the dump, when the system locked up with a
> "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the
> same PID as reported in the main crash.

Hmm.  You could try changing the code to not do a nested panic in that
case.  You would update subr_turnstile.c to just return if panicstr is
not NULL rather than calling panic.  However, there is still a good
chance you will end up deadlocking in that case.  I have another patch I
can send you next week that prevents blocking on mutexes during a panic
which may also help.

> 3) Is there any way to rig the system to obtain more info if this happens
> again? Right now I'm using an embedded remote console server, but I could
> switch the system to a serial port if enabling the kernel debugger might help.
> But I think that the sleeping thread bit would happen even at the debugger
> prompt, wouldn't it?

Include DDB and enable the 'trace_on_panic' sysctl knob perhaps.

> I just booted the new kernel and tried this again, and got another crash. The
> message is identical to the first, except that the instruction pointer changed
> by 0x10 (presumably due to code differences between the old and new kernels)
> and it got 6MB further writing the crash dump.
>
> Since it seems I can reproduce this at will, I'll be glad to either perform
> additional information-gathering or give a developer access to the box for
> testing purposes.
>
> Is it possible to correlate the source line in the kernel with the instruction
> pointer in the panic?

If you are booted into the same kernel with the same modules loaded, you
can probably run 'kgdb' as root and do 'l *'.


--
John Baldwin



Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Terry Kennedy

>> The crash was a "page fault while in kernel mode" with the current process
>> being the interrupt service routine for the bce0 GigE. Things progressed
>> reasonably until partway through the dump, when the system locked up with a
>> "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the
>> same PID as reported in the main crash.
>
> Hmm.  You could try changing the code to not do a nested panic in that
> case.  You would update subr_turnstile.c to just return if panicstr is
> not NULL rather than calling panic.  However, there is still a good
> chance you will end up deadlocking in that case.  I have another patch I
> can send you next week that prevents blocking on mutexes during a panic
> which may also help.

 Ok, I'll be glad to try that.

>> 3) Is there any way to rig the system to obtain more info if this happens
>> again? Right now I'm using an embedded remote console server, but I could
>> switch the system to a serial port if enabling the kernel debugger might help.
>> But I think that the sleeping thread bit would happen even at the debugger
>> prompt, wouldn't it?
>
> Include DDB and enable the 'trace_on_panic' sysctl knob perhaps.

 Hmmm. Do you think it will get very far before the sleeping thread business
locks it up?

>> Is it possible to correlate the source line in the kernel with the instruction
>> pointer in the panic?
>
> If you are booted into the same kernel with the same modules loaded, you
> can probably run 'kgdb' as root and do 'l *'.


 I did that and discovered that the 0x20: prefix is probably unwanted:

(kgdb) l *0x20:0x801e3c06
A syntax error in expression, near `:0x801e3c06'.
(kgdb) l *0x801e3c06
0x801e3c06 is in bce_start_locked (/usr/src/sys/dev/bce/if_bce.c:6996).
6991            }
6992
6993            count++;
6994
6995            /* Send a copy of the frame to any BPF listeners. */
6996            ETHER_BPF_MTAP(ifp, m_head);
6997    }
6998
6999    /* Exit if no packets were dequeued. */
7000    if (count == 0) {
(kgdb) 
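
The syntax error above comes from the segment selector printed in front of the instruction pointer in the panic message. A small sketch of stripping it before pasting the value into kgdb (the function name is hypothetical, chosen here for illustration):

```sh
# Hypothetical helper: drop a leading "segment:" prefix from a panic
# instruction pointer, e.g. 0x20:0x801e3c06 -> 0x801e3c06.
# An address without a prefix is passed through unchanged.
strip_segment() {
    printf '%s\n' "${1##*:}"
}

# Build the kgdb list command for the address from the panic message.
echo "l *$(strip_segment 0x20:0x801e3c06)"
```

The echoed line is what gets typed at the (kgdb) prompt.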


 This kernel does have BPF compiled in, but I don't think it was in use at
the time. 


 Any further suggestions to look at (remember, this system is in another
state from me and all I have is remote access to the framebuffer - I'd have
to go there and set up a serial console to be able to talk to the debugger
if it crashes).

Thanks,
   Terry Kennedy http://www.tmk.com
   te...@tmk.com New York, NY USA


Re: regression in dc(4) from 7.2 to RELENG_8

2010-05-14 Thread Andriy Gapon
on 14/05/2010 09:42 Chris Buechler said the following:
> one of our users has reported a regression in dc(4) on RELENG_8, the
> cards work fine on 7.2 and previous versions, but no longer function at
> all with RELENG_8 as of about a week ago.
> http://forum.pfsense.org/index.php/topic,24964.msg129488.html#msg129488

Perhaps this might be a cardbus issue (or even a more general issue) rather
than a dc(4) issue.
But first please try this patch reversed:

--- a/sys/dev/dc/if_dc.c
+++ b/sys/dev/dc/if_dc.c
@@ -331,7 +331,6 @@ static driver_t dc_driver = {

 static devclass_t dc_devclass;

-DRIVER_MODULE(dc, cardbus, dc_driver, dc_devclass, 0, 0);
 DRIVER_MODULE(dc, pci, dc_driver, dc_devclass, 0, 0);
 DRIVER_MODULE(miibus, dc, miibus_driver, miibus_devclass, 0, 0);



> dmesg from it working, from 7.2:
> cbb0:  at device 11.0 on pci0
> cardbus0:  on cbb0
> pccard0: <16-bit PCCard bus> on cbb0
> cbb0: [ITHREAD]
> cbb1:  at device 11.1 on pci0
> cardbus1:  on cbb1
> pccard1: <16-bit PCCard bus> on cbb1
> cbb1: [ITHREAD]
> dc0:  port 0x1080-0x10ff mem
> 0x8800-0x880007ff,0x88001000-0x880017ff irq 11 at device 0.0 on
> cardbus0
> miibus1:  on dc0
> tdkphy0:  PHY 0 on miibus1
> tdkphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> dc0: Ethernet address: 00:xx:xx:xx:xx:56
> dc0: [ITHREAD]
> dc1:  port 0x1100-0x117f mem
> 0x88002000-0x880027ff,0x88003000-0x880037ff irq 11 at device 0.0 on
> cardbus1
> miibus2:  on dc1
> tdkphy1:  PHY 0 on miibus2
> tdkphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> dc1: Ethernet address: 00:xx:xx:xx:xx:66
> dc1: [ITHREAD]
> 
> Not working, RELENG_8:
> cbb0:  at device 11.0 on pci0
> cardbus0:  on cbb0
> pccard0: <16-bit PCCard bus> on cbb0
> cbb0: [FILTER]
> cbb1:  at device 11.1 on pci0
> cardbus1:  on cbb1
> pccard1: <16-bit PCCard bus> on cbb1
> cbb1: [FILTER]
> cardbus0: Unable to allocate resource to read CIS.
> cardbus0: Unable to allocate resources for CIS
> cardbus0: Unable to allocate resource to read CIS.
> cardbus0: Unable to allocate resources for CIS
> dc0:  port 0x1080-0x10ff mem
> 0x8800-0x880007ff,0x88001000-0x880017ff irq 11 at device 0.0 on
> cardbus0
> dc0: No station address in CIS!
> device_attach: dc0 attach returned 6
> cardbus1: Unable to allocate resource to read CIS.
> cardbus1: Unable to allocate resources for CIS
> cardbus1: Unable to allocate resource to read CIS.
> cardbus1: Unable to allocate resources for CIS
> dc1:  port 0x1080-0x10ff mem
> 0x88002000-0x880027ff,0x88003000-0x880037ff irq 11 at device 0.0 on
> cardbus1
> dc1: No station address in CIS!
> device_attach: dc1 attach returned 6
> 
> 
> We can apply patches to our builds for this person and others to test
> and confirm the fix, before it's committed into FreeBSD.
> 
> Chris
> 
> ___
> freebsd-...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
> 


-- 
Andriy Gapon


Re: Mount root error / New device numbering?

2010-05-14 Thread Fred Souza
On Fri, May 14, 2010 at 00:32, Fred Souza wrote:
> Good to know, I never really paid much attention to those details (I
> will from now on). Thank you a lot for the help, Jeremy. I will try
> your suggestions in the morning and post back to tell what did I find
> out.

Like I said, here are my findings:

Jeremy's pointers were very correct, the difference in numbering seems
to be just an ata(4) change. Manually changing entries in /etc/fstab
does fix it, and I found out that the kernel panic I was getting was
merely a simple detail I overlooked:

The 3rd-party nvidia driver had been compiled on -RELEASE and was
causing the kernel panics on -STABLE. Simply disabling its loading at
the boot loader prompt, then booting with /etc/fstab properly updated
and then reinstalling the nvidia-driver port (`portmaster
nvidia-driver`) fixed it. Just to be on the safe side, I also
reinstalled the other 3rd-party kernel module I use (fusefs-ntfs3g),
even though it wasn't giving me any errors.
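
The /etc/fstab edit can be sketched as a small sed filter. The device names used in the example (ad4, ad8) are purely illustrative assumptions; the actual old and new names depend on the controller and are not given in this thread, and the function name is hypothetical.

```sh
# Hypothetical helper: rewrite one device name to another in fstab text
# read from stdin. Matches only at the start of a line, and only when the
# name is followed by a non-digit, so "ad4" does not also match "ad40".
renumber_fstab() {
    old=$1
    new=$2
    sed "s|^/dev/${old}\([^0-9]\)|/dev/${new}\1|"
}

# Example: write to a copy first, review it, then replace /etc/fstab.
#   renumber_fstab ad4 ad8 < /etc/fstab > /tmp/fstab.new
```

Working on a copy keeps the original file intact if the guess about the new numbering turns out to be wrong.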

I did try the -STABLE snapshot image as Jeremy suggested, that's how I
figured he was right about the numbering difference being an ata(4)
change. I preferred to just manually change the previous install's
/etc/fstab, though (but maybe there was a better way of doing this
with the -STABLE snapshot DVD).

The interrupt storm on irq21 is still happening, and I'm going to work
on that next. Mounting any non-audio CD/DVD stops it, so I'll keep
doing that until I actually find a fix for the issue.

So thank you very much, Jeremy. Your pointers were very helpful in
fixing my problem.


Best regards,
Fred


RE: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Matthew Fleming
> > The crash was a "page fault while in kernel mode" with the current process
> > being the interrupt service routine for the bce0 GigE. Things progressed
> > reasonably until partway through the dump, when the system locked up with a
> > "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the
> > same PID as reported in the main crash.
>
> Hmm.  You could try changing the code to not do a nested panic in that
> case.  You would update subr_turnstile.c to just return if panicstr is
> not NULL rather than calling panic.  However, there is still a good
> chance you will end up deadlocking in that case.  I have another patch I
> can send you next week that prevents blocking on mutexes during a panic
> which may also help.

It would be instructive to know exactly why we were in turnstile(9) but it's
likely due to mtx contention.

AIX has some code at the beginning of all the locking operations to avoid
taking locks if we were running code out of kdb, though getting that worked out
was slightly tricky with our variant of mtx_assert(9).  I seem to recall there
was also some "lockbusting" code that forcibly reset all owned locks to have no
owner, at least in some paths.

Given that the system is single-cpu and should be single-threaded when dumping,
this seems to me to be something worth working through to get more reliable
dumps.  Except for mtx_assert(9) I can't think of a reason to take locks once we
start dumping or when in the debugger.

As an aside, with terribly corrupted locks I've seen double panics when the
attempt to print the lock name faulted in strlen(9) called for printf(9), due
to a bad lockname pointer.  We have been able to get enough info off these
crashes to debug them, but it's useful to remember that the system may be in a
very unstable state depending on why it panics.

Thanks,
matthew


Re: Enabling watchdog

2010-05-14 Thread Doug Ambrisko
rihad writes:
| On 05/14/2010 04:13 AM, Doug Ambrisko wrote:
| > rihad writes:
| > | Hi, I'm thinking of enabling the watchdog on our Dell PowerEdge 2950 /
| > | FreeBSD 8.0 amd64, so that it reboots the machine in case of lockups.
| > | Right now it doesn't work:
| > |
| > | # watchdog
| > | watchdog: patting the dog: Operation not supported
| > | #
| > | Looking through the kernel configuration I found two relevant settings:
| > | In /sys/conf/NOTES:
| > | #
| > | # Add software watchdog routines.
| > | #
| > | options SW_WATCHDOG
| > |
| > | and in /sys/amd64/conf/NOTES:
| > | #
| > | # Watchdog routines.
| > | #
| > | options MP_WATCHDOG
| > |
| > | Which of them should I rebuild the kernel with? BTW, the existing kernel
| > | is built with the default "options SCHED_ULE" to make good use of
| > | multiple CPUs, does watchdog work with it?
| >
| > If no one has said yet, kldload ipmi then run watchdogd.  ... or compile
| > it into the kernel.  This will enable the IPMI HW watchdog.  If it triggers,
| > it will appear in the IPMI SEL (ipmitool sel list).
| 
| Thanks. So did I understand it right that I should first install 
| sysutils/ipmitool, then start polling "ipmitool sel list" in a shell 
| script from a cron job run once a minute, and reboot in case IPMI 
| triggers? But if it's a kernel lockup, none of the user level code might 
| run at all. Any way to fall back to a hard and fast kernel level machine 
| reset?

Nope, when you load the ipmi driver it provides a HW watchdog via ipmi
and works with watchdogd.  Now if you want to know if your machines 
rebooted due to the watchdog then check the ipmi sel for the watchdog 
event.
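
Checking the SEL after a reboot can be scripted. The helper below is a hypothetical sketch: the function name is made up, and it assumes watchdog entries in the ipmitool SEL output contain the word "Watchdog", which may differ between BMC firmware versions.

```sh
# Hypothetical helper: read SEL text on stdin and print only the entries
# that mention the watchdog. Exit status is grep's: non-zero if none match.
sel_watchdog_events() {
    grep -i 'watchdog'
}

# On the host itself (needs sysutils/ipmitool and the ipmi driver loaded):
#   ipmitool sel elist | sel_watchdog_events
```

An empty result after an unexplained reboot suggests the watchdog was not what reset the machine.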

Doug A.


Re: Enabling watchdog

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 07:16:28AM -0700, Doug Ambrisko wrote:
> rihad writes:
> | On 05/14/2010 04:13 AM, Doug Ambrisko wrote:
> | > rihad writes:
> | > | Hi, I'm thinking of enabling the watchdog on our Dell PowerEdge 2950 /
> | > | FreeBSD 8.0 amd64, so that it reboots the machine in case of lockups.
> | > | Right now it doesn't work:
> | > |
> | > | # watchdog
> | > | watchdog: patting the dog: Operation not supported
> | > | #
> | > | Looking through the kernel configuration I found two relevant settings:
> | > | In /sys/conf/NOTES:
> | > | #
> | > | # Add software watchdog routines.
> | > | #
> | > | options SW_WATCHDOG
> | > |
> | > | and in /sys/amd64/conf/NOTES:
> | > | #
> | > | # Watchdog routines.
> | > | #
> | > | options MP_WATCHDOG
> | > |
> | > | Which of them should I rebuild the kernel with? BTW, the existing kernel
> | > | is built with the default "options SCHED_ULE" to make good use of
> | > | multiple CPUs, does watchdog work with it?
> | >
> | > If no one has said yet, kldload ipmi then run watchdogd.  ... or compile
> | > it into the kernel.  This will enable the IPMI HW watchdog.  If it triggers,
> | > it will appear in the IPMI SEL (ipmitool sel list).
> | 
> | Thanks. So did I understand it right that I should first install 
> | sysutils/ipmitool, then start polling "ipmitool sel list" in a shell 
> | script from a cron job run once a minute, and reboot in case IPMI 
> | triggers? But if it's a kernel lockup, none of the user level code might 
> | run at all. Any way to fall back to a hard and fast kernel level machine 
> | reset?
> 
> Nope, when you load the ipmi driver it provides a HW watchdog via ipmi
> and works with watchdogd.  Now if you want to know if your machines 
> rebooted due to the watchdog then check the ipmi sel for the watchdog 
> event.

I'm a bit confused at this point, Doug.  At what point did the OP state
he has IPMI support or IPMI cards in his system?

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



RE: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Terry Kennedy
> > Hmm.  You could try changing the code to not do a nested panic in that
> > case.  You would update subr_turnstile.c to just return if panicstr is
> > not NULL rather than calling panic.  However, there is still a good
> > chance you will end up deadlocking in that case.  I have another patch I
> > can send you next week that prevents blocking on mutexes during a panic
> > which may also help.
>
> It would be instructive to know exactly why we were in turnstile(9) but
> it's likely due to mtx contention.
>
> AIX has some code at the beginning of all the locking operations to avoid
> taking locks if we were running code out of kdb, though getting that worked
> out was slightly tricky with our variant of mtx_assert(9).  I seem to recall
> there was also some "lockbusting" code that forcibly reset all owned locks
> to have no owner, at least in some paths.
>
> Given that the system is single-cpu and should be single-threaded when
> dumping, this seems to me to be something worth working through to get
> more reliable dumps.  Except for mtx_assert(9) I can't think of a reason
> to take locks once we start dumping or when in the debugger.

  As an aside, this is a quad-core in one package CPU (an X3363). On both
this box and a similar one with an X5470, console messages continue to
print out after "the system has been halted - press any key to reboot" -
in particular, the shutdown makes a bunch of the "behind the scenes"
management stuff like the virtual keyboard and monitor appear. Plugging or
unplugging USB devices will go through the whole deal of detecting and
making their service available.

  I know the other CPUs are considered to still be running (hence the
"halting other CPUs" when you press a key to reboot), but this is the
first time I've seen device detection, attachment, etc. show up on the
console after a shutdown.

  Is this behavior to be expected, or is it as unexpected as it was to
me? Systems are Dell PowerEdge R300s, 8-STABLE amd64.

> As an aside, with terribly corrupted locks I've seen double panics when the
> attempt to print the lock name faulted in strlen(9) called for printf(9),
> due to a bad lockname pointer.  We have been able to get enough info off
> these crashes to debug them, but it's useful to remember that the system
> may be in a very unstable state depending on why it panics.

  True. In these crashes, the system is doing essentially nothing except
the one application (which, unfortunately, I don't have the source code
for). The second crash happened right after booting the system, logging in,
and firing off the application. It left an identical footprint (other than
the 0x10 byte offset due to a recompiled kernel) from the first one, where
the system had been up for 13+ hours.

  So, in this case I don't think there was a bunch of corruption piling up
which triggered the fault, but instead the one simple operation and right
away - splat!

  As I mentioned in the original posting, I'd be glad to give a developer
complete access to the system via the remote console (Dell DRAC 5 web
interface) and to the underlying FreeBSD if it'll help pin down the
problem.

  Another thing I could try (would take a couple days until I could get
someone to the site) would be to try this using a bge port instead of
the bce one. That might help pin it down to either something in the
bce-specific code path, or somewhere else in the stack.

Thanks,
Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA


Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 09:56:47AM -0400, Terry Kennedy wrote:
>   As an aside, this is a quad-core in one package CPU (an X3363). On both
> this box and a similar one with an X5470, console messages continue to
> print out after "the system has been halted - press any key to reboot" -
> in particular, the shutdown makes a bunch of the "behind the scenes"
> management stuff like the virtual keyboard and monitor appear. Plugging or
> unplugging USB devices will go through the whole deal of detecting and
> making their service available.
>
>   I know the other CPUs are considered to still be running (hence the
> "halting other CPUs" when you press a key to reboot), but this is the
> first time I've seen device detection, attachment, etc. show up on the
> console after a shutdown.
>
>   Is this behavior to be expected, or is it as unexpected as it was to
> me? Systems are Dell PowerEdge R300s, 8-STABLE amd64.

I've seen this behaviour before (on non-Dell hardware).  I'm under the
impression there's an interrupt handler that isn't being unloaded, and
that the driver framework within the kernel does not "unload" on
FreeBSD.  What exactly does FreeBSD do on a system halt?  I'm under the
impression the OS should be unloading its interrupt handlers and then
execute the HLT opcode on each processor/core.

I don't have a tendency to halt my Supermicro systems, but shutdown -r
now or shutdown -p now is pretty common.  I've noticed an overall
improvement with regards to the shutdown procedure and how long things
take during the final phases (after filesystems are unmounted, etc.)
with the below sysctl set (in /etc/sysctl.conf, but you can set it in
real-time via command-line).

hw.acpi.handle_reboot=1

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Enabling watchdog

2010-05-14 Thread Doug Ambrisko
Tom Evans writes:
| On Fri, May 14, 2010 at 3:15 PM, Jeremy Chadwick wrote:
| >
| > I'm a bit confused at this point, Doug.  At what point did the OP state
| > he has IPMI support or IPMI cards in his system?
| 
| He said he had a Dell PowerEdge 2950 - iirc these all have IPMI.

... and although HW WD doesn't have to be in IPMI, I know for a fact
it is on the base config. of a Dell PE2950 and has been since the PE2650.
However, on the 2650 I saw false trips.  It was one of the reasons I wrote 
ipmi(4).  Eventually, I need to get in sync with jhb to add kernel 
back-trace support to it.  I have some code at work to do it but it needs 
some work to ensure it works in every case etc.

BTW, there is code/patches floating around to control the LCD on these
Dell machines via ipmitool and on the r710 control attributes of the LCD.
Unfortunately the ipmitool folks haven't picked it up.

Doug A.


Re: Enabling watchdog

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 03:21:42PM +0100, Tom Evans wrote:
> On Fri, May 14, 2010 at 3:15 PM, Jeremy Chadwick
>  wrote:
> >
> > I'm a bit confused at this point, Doug.  At what point did the OP state
> > he has IPMI support or IPMI cards in his system?
> >
> 
> He said he had a Dell PowerEdge 2950 - iirc these all have IPMI.

Ah, thanks Tom!  I had no idea.  It surprised me when the conversation
turned from software watchdogs to IPMI hardware watchdogs.  :-)

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Any chance of someone commiting the patch in bin/131861 ?

2010-05-14 Thread Ulrich Spörlein
On Fri, 14.05.2010 at 10:17:23 +0100, Pete French wrote:
> 
> > Postfix will re-write this as part of sanitization, so I had to revert
> > to creating mbox files by hand. Anyway, could you please test the
> > following patch with a wider variety of mails?
> 
> I've been testing your patch for a few weeks now as my main email
> client, and I haven't encountered any problems - it also does fix
> the reply issues I was originally having. Do you want to attach it
> to the PR ? After that maybe someone could commit it - I am pretty
> certain it doesn't actually break any existing behaviour.
> 
> cheers,
> 
> -pete.

I'll try to get a review from a fellow FreeBSD dev who is more familiar
with our mail(1) history and then commit the change eventually.

Do you feel strongly about merging the fix to 8 or 7 or both?

Regards,
Uli


Re: Enabling watchdog

2010-05-14 Thread Tom Evans
On Fri, May 14, 2010 at 3:15 PM, Jeremy Chadwick
 wrote:
>
> I'm a bit confused at this point, Doug.  At what point did the OP state
> he has IPMI support or IPMI cards in his system?
>

He said he had a Dell PowerEdge 2950 - iirc these all have IPMI.

Cheers

Tom


Re: Any chance of someone commiting the patch in bin/131861 ?

2010-05-14 Thread Pete French
> Do you feel strongly about merging the fix to 8 or 7 or both?

Not really - it's such a small change that it would seem a shame
not to commit it to earlier releases, but then I used 7 throughout
its lifetime with the bug only being a minor annoyance.

-pete.


RE: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Matthew Fleming
>   As an aside, this is a quad-core in one package CPU (an X3363). On both
> this box and a similar one with an X5470, console messages continue to
> print out after "the system has been halted - press any key to reboot" -
> in particular, the shutdown makes a bunch of the "behind the scenes"
> management stuff like the virtual keyboard and monitor appear. Plugging or
> unplugging USB devices will go through the whole deal of detecting and
> making their service available.

Oops, you're right that other CPUs are running.

The stop_cpus() call is only made if kdb is entered.  doadump() is called out
of boot(), which comes later.  At Isilon we've been running with a patch that
does stop_cpus() pretty close to the front of panic(9).

As a design decision it seems reasonable to call stop_cpus() early in panic(9)
simply because most causes of a panic mean something unexpected happened, and
the sooner the other CPUs aren't running, the more likely it is that they don't
do more damage, leaving the system in a more useful state for dump or {g,d}db
analysis.  This should be done before dump or entering kdb.

I'm cc'ing -current@ since I would like a small discussion of moving the
stop_cpus() to earlier in panic.  If this change is agreeable I can roll up a
patch and test it on CURRENT.  I'm not sure yet how much of the other
panic-related changes we have made at Isilon would be required.

Thanks,
matthew


Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Matthew Jacob

Matthew Fleming wrote:

> >   As an aside, this is a quad-core in one package CPU (an X3363). On both
> > this box and a similar one with an X5470, console messages continue to
> > print out after "the system has been halted - press any key to reboot" -
> > in particular, the shutdown makes a bunch of the "behind the scenes"
> > management stuff like the virtual keyboard and monitor appear. Plugging or
> > unplugging USB devices will go through the whole deal of detecting and
> > making their service available.
>
> Oops, you're right that other CPUs are running.
>
> The stop_cpus() call is only made if kdb is entered.  doadump() is called
> out of boot(), which comes later.  At Isilon we've been running with a patch
> that does stop_cpus() pretty close to the front of panic(9).
>
> As a design decision it seems reasonable to call stop_cpus() early in
> panic(9) simply because most causes of a panic mean something unexpected
> happened, and the sooner the other CPUs aren't running, the more likely it
> is that they don't do more damage, leaving the system in a more useful state
> for dump or {g,d}db analysis.  This should be done before dump or entering
> kdb.
>
> I'm cc'ing -current@ since I would like a small discussion of moving the
> stop_cpus() to earlier in panic.  If this change is agreeable I can roll up
> a patch and test it on CURRENT.  I'm not sure yet how much of the other
> panic-related changes we have made at Isilon would be required.
  

Work along these lines has been done at Panasas. We were planning on
putting it back to the community. There turn out to be lots of edge cases
in changing this that we're still sorting through.


Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer

Hi list,

I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has an ICH9 SATA
controller on-board (I do not have the RAID controller).


The system has 2 disks in a gmirror setup. Every now and then, probably 
under some load, one of the disks gets read or write timeouts like:

May  5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
May  5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5). 
ad4[WRITE(offset=200404975104, length=16384)]
May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider ad4 
disconnected.


or:

May 13 14:41:26 aberdeen kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 
retry left) LBA=975513887


Sometimes the read/write succeeds after a few retries, but sometimes it 
does not, so geom_mirror throws the disk out of the mirror.
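
As a reading aid for the log lines above, the byte offset reported by
GEOM_MIRROR and the LBA reported in the READ_DMA48 timeout can be converted
into each other's units. This is only a sketch: it assumes 512-byte logical
sectors, and the function names are made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def offset_to_sectors(offset_bytes, length_bytes, sector_size=SECTOR_SIZE):
    """Return (first_sector, sector_count) for a byte-addressed request."""
    first, remainder = divmod(offset_bytes, sector_size)
    if remainder or length_bytes % sector_size:
        raise ValueError("request is not sector-aligned")
    return first, length_bytes // sector_size


def lba_to_offset(lba, sector_size=SECTOR_SIZE):
    """Return the byte offset at which a given LBA starts."""
    return lba * sector_size


# The failed WRITE from the GEOM_MIRROR message (offset and length in bytes).
print(offset_to_sectors(200404975104, 16384))

# The LBA from the ad6 READ_DMA48 timeout, expressed as a byte offset.
print(lba_to_offset(975513887))
```

Comparing the second result with the drive's capacity in bytes shows roughly
where on the disk the timed-out read was.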


Tonight ad6 was thrown out of the mirror and ad4 then gave actual read 
errors, resulting in a big mess :(


My question: does anyone have experience with FreeBSD on a Dell R300 or 
can anyone give me some help in trying to fix the timeouts?


I was told using AHCI could be better for SATA disks, but apparently 
(http://permalink.gmane.org/gmane.linux.kernel.pci/8267) the BIOS does 
not support turning that on, so that does not appear to be an option.


Thanks,
Pieter


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Adam Vande More
On Fri, May 14, 2010 at 12:42 PM, Pieter de Boer  wrote:

> I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has an ICH9 SATA
> controller on-board (I do not have the RAID controller).
>
> The system has 2 disks in a gmirror setup. Every now and then, probably
> under some load, one of the disks gets read or write timeouts like:
> May  5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
> May  5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
> May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5).
> ad4[WRITE(offset=200404975104, length=16384)]
> May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider ad4
> disconnected.
>

Have you tried replacing/checking the cables?  Does it always happen to ad4?
 Your drive could be dying, try swapping it out and see if the errors
persist.

-- 
Adam Vande More


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer

Adam Vande More wrote:

>> May  5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
>> May  5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
>> May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5).
>> ad4[WRITE(offset=200404975104, length=16384)]
>> May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider ad4
>> disconnected.
>
> Have you tried replacing/checking the cables?  Does it always happen to ad4?
> Your drive could be dying, try swapping it out and see if the errors
> persist.

It happens to both drives, and it also happened to both of the drives that I
replaced with these a month ago. I didn't replace the cables back then, but
they were correctly attached then and still are now. It would also be odd for
both cables to be broken at the same time.


--
Pieter


RE: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Terry Kennedy
> Oops, you're right that other CPUs are running.
>
> The stop_cpus() call is only made if kdb is entered.  doadump() is called
> out of boot() which comes later.  At Isilon we've been running with a patch
> that does stop_cpus() pretty close to the front of panic(9).

  This is interesting, and changing the behavior will probably allow the
crash dump for the original problem (repeatable crash in the bce driver)
to be analyzed.

  At the moment, I'm more interested in dealing with the original problem
of the crash in bce. Right now, I'm running this vendor's product under
Linux compatibility mode. The vendor is hard at work building a native
FreeBSD version of their product. One of two things is going to happen
here: 1) the crash doesn't happen in native mode due to different code
paths being taken, and I lose the ability to reproduce the crash when the
box goes into production, or 2) the crash continues to happen and the
vendor gets the impression FreeBSD is unstable and not worth supporting. I'd
like to avoid that.

  So, any ideas on how to troubleshoot the panic in bce?

Thanks,
Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA


Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 07:42:33PM +0200, Pieter de Boer wrote:
> Hi list,
> 
> I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has an ICH9
> SATA controller on-board (I do not have the RAID controller).
> 
> The system has 2 disks in a gmirror setup. Every now and then,
> probably under some load, one of the disks gets read or write
> timeouts like:
> May  5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
> May  5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
> May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed
> (error=5). ad4[WRITE(offset=200404975104, length=16384)]
> May  5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider
> ad4 disconnected.
> 
> or:
> 
> May 13 14:41:26 aberdeen kernel: ad6: TIMEOUT - READ_DMA48 retrying
> (1 retry left) LBA=975513887
> 
> Sometimes the read/write succeeds after a few retries, but sometimes
> it does not, so geom_mirror throws the disk out of the mirror.
> 
> Tonight ad6 was thrown out of the mirror and ad4 then gave actual
> read errors, resulting in a big mess :(
> 
> My question: does anyone have experience with FreeBSD on a Dell R300
> or can anyone give me some help in trying to fix the timeouts?

Could you please do the following:

- Provide output from "vmstat -i"

- Provide output from "dmesg | grep -i ata"

- Install ports/sysutils/smartmontools (5.40 or later) and provide
  full output from commands "smartctl -a /dev/ad4" and "smartctl -a
  /dev/ad6"

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Pieter de Boer



>> My question: does anyone have experience with FreeBSD on a Dell R300
>> or can anyone give me some help in trying to fix the timeouts?
>
> Could you please do the following:
>
> - Provide output from "vmstat -i"
>
> - Provide output from "dmesg | grep -i ata"
>
> - Install ports/sysutils/smartmontools (5.40 or later) and provide
>   full output from commands "smartctl -a /dev/ad4" and "smartctl -a
>   /dev/ad6"


The ad4 SMART output is showing errors, as this disk is indeed broken 
now. It wasn't before and it is a replacement of another disk that 
wasn't broken either. Grmbl, I now see reallocated sectors on ad6 as 
well, in the smartctl output. So both disks look wonky; although afaik 
that's not the main issue here.


I've attached the smartctl output as separate files. smartmontools 5.40 
does not appear to exist; I used 5.39.1, the latest port version.


I've also attached the vmstat -i and dmesg output.

--
Pieter
smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 8.0-RELEASE-p1 i386] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Black family
Device Model: WDC WD5001AALS-00L3B2
Serial Number:    WD-WCASYA964063
Firmware Version: 01.03B01
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri May 14 23:01:49 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command
                                        from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 241) Self-test routine in progress...
                                        10% of test remaining.
Total time to complete Offline
data collection:                (11160) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off
                                        support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 131) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3037) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       78
  3 Spin_Up_Time            0x0027   184   168   021    Pre-fail  Always       -       3791
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       992
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       827
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       990
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       989
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       992
194 Temperature_Celsius     0x0022   125   109   000    Old_age   Always       -       22
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always       -

Re: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 11:09:28PM +0200, Pieter de Boer wrote:
> The ad4 SMART output is showing errors, as this disk is indeed
> broken now. It wasn't before and it is a replacement of another disk
> that wasn't broken either. Grmbl, I now see reallocated sectors on
> ad6 as well, in the smartctl output. So both disks look wonky;
> although afaik that's not the main issue here.

Lots to say about all of this.

Focusing on drive ad4 (Western Digital):

The disk has 1 uncorrected sector (Attribute 198).  This means the drive
tried to remap it and was not successful.  This could have happened any
time during the lifetime of the drive.  There are no pending sector
reallocations (Attribute 197) (meaning there aren't others which are bad
which the drive is waiting to attempt remapping for), and there are no
remapped sectors (Attribute 5).  There have been no successful
reallocation attempts during the drive's lifetime (Attribute 196).

In general, I would say this is acceptable.  If Attribute 198 was
higher, or you had other pending sectors which needed to be remapped,
I'd say replace the disk.
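
Separately from the raw counts discussed above, smartctl itself only reports
an attribute as failed when its normalized VALUE is at or below THRESH. A
minimal sketch of that comparison, using the three Pre-fail rows from the ad4
attribute table (the helper names are hypothetical, not part of smartmontools):

```python
# (id, name, value, worst, thresh), copied from the ad4 attribute table.
PREFAIL_ATTRIBUTES = [
    (1, "Raw_Read_Error_Rate", 200, 200, 51),
    (3, "Spin_Up_Time", 184, 168, 21),
    (5, "Reallocated_Sector_Ct", 200, 200, 140),
]


def failing(attributes):
    """Names of attributes whose normalized value is at or below threshold."""
    return [name for _id, name, value, _worst, thresh in attributes
            if value <= thresh]


def margins(attributes):
    """How far each normalized value currently sits above its threshold."""
    return {name: value - thresh
            for _id, name, value, _worst, thresh in attributes}


print(failing(PREFAIL_ATTRIBUTES))
print(margins(PREFAIL_ATTRIBUTES))
```

None of the three is anywhere near its threshold, which is consistent with the
"PASSED" overall-health result; the raw values still need the kind of manual
reading done above.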

UDMA/CRC error count (Attribute 199) is zero.  That's good -- it means
that most likely cabling issues can be ruled out, since the attribute
tracks the number of communication errors between the controller and the
disk PCB.

Drive temperature looks good, so nothing to worry about there.

The drive itself has detected numerous error conditions in the SMART
error log during its lifetime -- a total of 48, but SMART only lists the
most recent 5.  The drive has been online for a total of 827 hours
(Attribute 9), which we can use to determine how recently the drive
experienced said errors.  Let's examine the first 3:

> Error 48 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours)
>   40 51 00 9d 84 0e e0  Error: UNC at LBA = 0x000e849d = 951453
>   c8 00 20 00 84 0e 00 00  00:45:18.204  READ DMA
> 
> Error 47 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours)
>   40 51 00 0c 9d 0e e0  Error: UNC at LBA = 0x000e9d0c = 957708
>   c8 00 80 00 9b 0e 00 00  00:03:08.605  READ DMA
> 
> Error 46 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours)
>   40 51 00 9d 84 0e e0  Error: UNC at LBA = 0x000e849d = 951453
>   c8 00 80 80 82 0e 00 00  00:03:05.176  READ DMA

Okay, it's probably safe to assume these are all signs of the
uncorrected sector.  When a drive attempts a LBA remap -- which in this
case it did, but failed -- it can spend quite a bit of time doing that;
in some cases minutes, not seconds.

The drive essentially "locks up" during this time (from the perspective
of the SATA controller) -- it's literally spending all of its time
trying to read and re-read the LBA/sector in different ways, hoping to
get the data out of it (and/or correct it with ECC) so that it can be
written to a spare block and then internally the bad LBA won't ever be
used again.  What the OS ends up seeing in this situation is disk
timeouts.  This is completely normal.

The WD Caviar Black drives have a useful feature called TLER -- it's
disabled by default, for reasons which I don't want to get into here --
which can force the drive to internally give up after X seconds (it's
user-selectable) when dealing with such remapping/errors.  The idea is
to keep the drive from being deemed dead from the OS/controller's point
of view.  I believe Seagate, Hitachi, or Samsung (I forget which) have
this feature as well, but it's not called TLER.

Anyway, so this is probably the cause of one detachment/timeout you've
seen FreeBSD report.  Let's move on to the 2 remaining errors:

> Error 45 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours)
>   40 51 08 20 47 6c e0  Error: UNC at LBA = 0x006c4720 = 7096096
>   c4 ff 08 ff 46 6c 00 00  00:01:09.459  READ MULTIPLE
> 
> Error 44 occurred at disk power-on lifetime: 817 hours (34 days + 1 hours)
>   40 51 08 21 8e 67 e0  Error: UNC at LBA = 0x00678e21 = 6786593
>   c4 ff 04 3f 2f 00 00 00  00:01:00.724  READ MULTIPLE

These two happened around the same time (within 10 seconds of one another).
I'm under the impression that these are *probably* the result of the
above uncorrected sector issue, but I'm not 100% certain.  Here's why I
think that:

- The errors occurred within the same hour mark (817) as the previous 3
  errors,
- The errors happened only 2 minutes prior to the preceding 3,
- The drive was in the process of executing READ MULTIPLE (cmd 0xc4),
  which tells the disk to read multiple logical sectors within 1 pass.
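
The timing argument in the list above can be checked from the timestamps in
the quoted error entries. The sketch below assumes all five timestamps belong
to the same power cycle, so that they can simply be subtracted; the helper
names are illustrative only.

```python
def to_seconds(stamp):
    """Convert an 'HH:MM:SS.mmm' timestamp from the error log to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Error number -> timestamp of the failing command, from the quoted entries.
STAMPS = {
    44: "00:01:00.724",
    45: "00:01:09.459",
    46: "00:03:05.176",
    47: "00:03:08.605",
    48: "00:45:18.204",
}


def gap(earlier, later):
    """Seconds elapsed between two logged errors."""
    return to_seconds(STAMPS[later]) - to_seconds(STAMPS[earlier])


print(f"44 -> 45: {gap(44, 45):.3f} s")
print(f"45 -> 46: {gap(45, 46):.3f} s")
print(f"47 -> 48: {gap(47, 48) / 60:.1f} min")
```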

The ATA-8 specification states that READ MULTIPLE is a PIO command.  I'm
not sure how/why FreeBSD would be submitting this to a disk unless the
communication protocol had been downgraded from DMA to PIO.

mav@ might have some insights on this, as well as how to decode some of
the SMART error data shown.  It looks like the 48-bit read input block
is written in reverse order (word 5 to word 0).
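
For the result registers, at least, the decoding can be sketched as follows.
This assumes the first hex line of each entry lists the registers in the order
ER ST SC SN CL CH DH, and it only covers the 28-bit case; the function name is
made up for illustration. The LBAs smartctl printed next to each entry serve
as a cross-check.

```python
def decode_lba28(register_line):
    """Decode a 28-bit LBA from seven result registers.

    Assumed order: ER ST SC SN CL CH DH (error, status, sector count,
    sector number, cylinder low, cylinder high, device/head).
    """
    er, st, sc, sn, cl, ch, dh = (int(byte, 16) for byte in register_line.split())
    # The low nibble of DH carries LBA bits 27..24; CH, CL and SN carry the rest.
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


ENTRIES = {
    48: "40 51 00 9d 84 0e e0",
    47: "40 51 00 0c 9d 0e e0",
    45: "40 51 08 20 47 6c e0",
    44: "40 51 08 21 8e 67 e0",
}

for number, registers in ENTRIES.items():
    lba = decode_lba28(registers)
    print(f"Error {number}: LBA = 0x{lba:08x} = {lba}")
```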

If you want to find out the exact L

RE: Read / write timeouts on SATA disks connected to ICH9

2010-05-14 Thread Terry Kennedy
On Fri May 14 22:42:38 UTC 2010, Jeremy Chadwick wrote:
> Finally, your vmstat -i output:
>
> > # vmstat -i
> > interrupt  total   rate
> > irq23: atapci0 371021299  10423
>
> Good to know there's no IRQ sharing going on, but what does worry me is
> the interrupt rate (10K interrupts/second).  That seems *extremely*
> high, but it also depends on what kind of disk I/O is happening on this
> system -- especially since you have 2 disks attached to the same
> controller.

I have a bunch of R300's here. From one that is using the on-board SATA
and 2 drives in a gmirror setup (very similar to the OP) after 18 hours
of uptime:

[0:2] speedtest:~> vmstat -i
interrupt                          total       rate
irq23: atapci0                    254116          3
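
To put the two vmstat outputs side by side, here is a small sketch that
assumes the "rate" column is simply the total divided by the uptime in
seconds (rounded down). The function names are illustrative only.

```python
def implied_uptime_hours(total, rate):
    """Uptime implied by an interrupt total and a per-second rate."""
    return total / rate / 3600


def average_rate(total, uptime_hours):
    """Average interrupts per second over the given uptime."""
    return total / (uptime_hours * 3600)


# OP's box: 371021299 interrupts at a reported rate of 10423 per second.
print(f"OP uptime:  {implied_uptime_hours(371021299, 10423):.1f} hours")

# This box: 254116 interrupts over about 18 hours of uptime.
print(f"local rate: {average_rate(254116, 18):.2f} per second")
```

Under that assumption the OP's counter covers only about ten hours of uptime,
so the two machines differ by more than three orders of magnitude in interrupt
rate, not just in total count.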

  I haven't specifically done any stress testing on this box, though I did
do a "make -j8 buildworld" during the initial gmirror synchronization. 8-}

  The drives are a pair of Dell-labeled 160GB "SAMSUNG HE161HJ 1AC01121"
that shipped with the box.

  I also have another R300 with Dell's "SAS 6/iR" card (a re-branded LSI
1068-something, seen as "mpt" by FreeBSD). While Dell only sells that as
part of a package deal with the hot-swap backplane and redundant power
supplies, there's no reason you couldn't pick one up on eBay and add it
yourself. You'll need some sort of breakout cable to get from the big
connector on the SAS 6 to individual SATA ports.

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA


[releng_7 tinderbox] failure on amd64/amd64

2010-05-14 Thread FreeBSD Tinderbox
TB --- 2010-05-15 02:06:57 - tinderbox 2.6 running on freebsd-stable.sentex.ca
TB --- 2010-05-15 02:06:57 - starting RELENG_7 tinderbox run for amd64/amd64
TB --- 2010-05-15 02:06:57 - cleaning the object tree
TB --- 2010-05-15 02:07:27 - cvsupping the source tree
TB --- 2010-05-15 02:07:27 - /usr/bin/csup -z -r 3 -g -L 1 -h localhost -s 
/tinderbox/RELENG_7/amd64/amd64/supfile
TB --- 2010-05-15 02:07:38 - building world
TB --- 2010-05-15 02:07:38 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-05-15 02:07:38 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-05-15 02:07:38 - TARGET=amd64
TB --- 2010-05-15 02:07:38 - TARGET_ARCH=amd64
TB --- 2010-05-15 02:07:38 - TZ=UTC
TB --- 2010-05-15 02:07:38 - __MAKE_CONF=/dev/null
TB --- 2010-05-15 02:07:38 - cd /src
TB --- 2010-05-15 02:07:38 - /usr/bin/make -B buildworld
>>> World build started on Sat May 15 02:07:39 UTC 2010
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> stage 5.1: building 32 bit shim libraries
>>> World build completed on Sat May 15 03:38:13 UTC 2010
TB --- 2010-05-15 03:38:13 - generating LINT kernel config
TB --- 2010-05-15 03:38:13 - cd /src/sys/amd64/conf
TB --- 2010-05-15 03:38:13 - /usr/bin/make -B LINT
TB --- 2010-05-15 03:38:13 - building LINT kernel
TB --- 2010-05-15 03:38:13 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-05-15 03:38:13 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-05-15 03:38:13 - TARGET=amd64
TB --- 2010-05-15 03:38:13 - TARGET_ARCH=amd64
TB --- 2010-05-15 03:38:13 - TZ=UTC
TB --- 2010-05-15 03:38:13 - __MAKE_CONF=/dev/null
TB --- 2010-05-15 03:38:13 - cd /src
TB --- 2010-05-15 03:38:13 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Sat May 15 03:38:13 UTC 2010
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3.1: making dependencies
[...]
===> em (depend)
@ -> /src/sys
machine -> /src/sys/amd64/include
awk -f @/tools/makeobjops.awk @/kern/device_if.m -h
awk -f @/tools/makeobjops.awk @/kern/bus_if.m -h
awk -f @/tools/makeobjops.awk @/dev/pci/pci_if.m -h
ln -sf /obj/amd64/src/sys/LINT/opt_inet.h opt_inet.h
make: don't know how to make if_lem.c. Stop
*** Error code 2

Stop in /src/sys/modules.
*** Error code 1

Stop in /obj/amd64/src/sys/LINT.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2010-05-15 03:40:22 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2010-05-15 03:40:22 - ERROR: failed to build lint kernel
TB --- 2010-05-15 03:40:22 - 4650.35 user 524.41 system 5605.26 real


http://tinderbox.freebsd.org/tinderbox-releng_7-RELENG_7-amd64-amd64.full


[releng_7 tinderbox] failure on i386/i386

2010-05-14 Thread FreeBSD Tinderbox
TB --- 2010-05-15 03:21:53 - tinderbox 2.6 running on freebsd-stable.sentex.ca
TB --- 2010-05-15 03:21:53 - starting RELENG_7 tinderbox run for i386/i386
TB --- 2010-05-15 03:21:53 - cleaning the object tree
TB --- 2010-05-15 03:22:18 - cvsupping the source tree
TB --- 2010-05-15 03:22:18 - /usr/bin/csup -z -r 3 -g -L 1 -h localhost -s 
/tinderbox/RELENG_7/i386/i386/supfile
TB --- 2010-05-15 03:22:30 - building world
TB --- 2010-05-15 03:22:30 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-05-15 03:22:30 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-05-15 03:22:30 - TARGET=i386
TB --- 2010-05-15 03:22:30 - TARGET_ARCH=i386
TB --- 2010-05-15 03:22:30 - TZ=UTC
TB --- 2010-05-15 03:22:30 - __MAKE_CONF=/dev/null
TB --- 2010-05-15 03:22:30 - cd /src
TB --- 2010-05-15 03:22:30 - /usr/bin/make -B buildworld
>>> World build started on Sat May 15 03:22:31 UTC 2010
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Sat May 15 04:26:42 UTC 2010
TB --- 2010-05-15 04:26:42 - generating LINT kernel config
TB --- 2010-05-15 04:26:42 - cd /src/sys/i386/conf
TB --- 2010-05-15 04:26:42 - /usr/bin/make -B LINT
TB --- 2010-05-15 04:26:42 - building LINT kernel
TB --- 2010-05-15 04:26:42 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-05-15 04:26:42 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-05-15 04:26:42 - TARGET=i386
TB --- 2010-05-15 04:26:42 - TARGET_ARCH=i386
TB --- 2010-05-15 04:26:42 - TZ=UTC
TB --- 2010-05-15 04:26:42 - __MAKE_CONF=/dev/null
TB --- 2010-05-15 04:26:42 - cd /src
TB --- 2010-05-15 04:26:42 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Sat May 15 04:26:42 UTC 2010
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3.1: making dependencies
[...]
===> em (depend)
@ -> /src/sys
machine -> /src/sys/i386/include
awk -f @/tools/makeobjops.awk @/kern/device_if.m -h
awk -f @/tools/makeobjops.awk @/kern/bus_if.m -h
awk -f @/tools/makeobjops.awk @/dev/pci/pci_if.m -h
ln -sf /obj/src/sys/LINT/opt_inet.h opt_inet.h
make: don't know how to make if_lem.c. Stop
*** Error code 2

Stop in /src/sys/modules.
*** Error code 1

Stop in /obj/src/sys/LINT.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2010-05-15 04:29:04 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2010-05-15 04:29:04 - ERROR: failed to build lint kernel
TB --- 2010-05-15 04:29:04 - 3362.72 user 358.23 system 4030.56 real


http://tinderbox.freebsd.org/tinderbox-releng_7-RELENG_7-i386-i386.full


[releng_7 tinderbox] failure on i386/pc98

2010-05-14 Thread FreeBSD Tinderbox
TB --- 2010-05-15 03:40:23 - tinderbox 2.6 running on freebsd-stable.sentex.ca
TB --- 2010-05-15 03:40:23 - starting RELENG_7 tinderbox run for i386/pc98
TB --- 2010-05-15 03:40:23 - cleaning the object tree
TB --- 2010-05-15 03:40:43 - cvsupping the source tree
TB --- 2010-05-15 03:40:43 - /usr/bin/csup -z -r 3 -g -L 1 -h localhost -s 
/tinderbox/RELENG_7/i386/pc98/supfile
TB --- 2010-05-15 03:40:52 - building world
TB --- 2010-05-15 03:40:52 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-05-15 03:40:52 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-05-15 03:40:52 - TARGET=pc98
TB --- 2010-05-15 03:40:52 - TARGET_ARCH=i386
TB --- 2010-05-15 03:40:52 - TZ=UTC
TB --- 2010-05-15 03:40:52 - __MAKE_CONF=/dev/null
TB --- 2010-05-15 03:40:52 - cd /src
TB --- 2010-05-15 03:40:52 - /usr/bin/make -B buildworld
>>> World build started on Sat May 15 03:40:54 UTC 2010
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Sat May 15 04:44:43 UTC 2010
TB --- 2010-05-15 04:44:43 - generating LINT kernel config
TB --- 2010-05-15 04:44:43 - cd /src/sys/pc98/conf
TB --- 2010-05-15 04:44:43 - /usr/bin/make -B LINT
TB --- 2010-05-15 04:44:43 - building LINT kernel
TB --- 2010-05-15 04:44:43 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-05-15 04:44:43 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-05-15 04:44:43 - TARGET=pc98
TB --- 2010-05-15 04:44:43 - TARGET_ARCH=i386
TB --- 2010-05-15 04:44:43 - TZ=UTC
TB --- 2010-05-15 04:44:43 - __MAKE_CONF=/dev/null
TB --- 2010-05-15 04:44:43 - cd /src
TB --- 2010-05-15 04:44:43 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Sat May 15 04:44:43 UTC 2010
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3.1: making dependencies
[...]
@ -> /src/sys
machine -> /src/sys/pc98/include
i386 -> /src/sys/i386/include
awk -f @/tools/makeobjops.awk @/kern/device_if.m -h
awk -f @/tools/makeobjops.awk @/kern/bus_if.m -h
awk -f @/tools/makeobjops.awk @/dev/pci/pci_if.m -h
ln -sf /obj/pc98/src/sys/LINT/opt_inet.h opt_inet.h
make: don't know how to make if_lem.c. Stop
*** Error code 2

Stop in /src/sys/modules.
*** Error code 1

Stop in /obj/pc98/src/sys/LINT.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2010-05-15 04:46:43 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2010-05-15 04:46:43 - ERROR: failed to build lint kernel
TB --- 2010-05-15 04:46:43 - 3309.74 user 364.98 system 3980.45 real


http://tinderbox.freebsd.org/tinderbox-releng_7-RELENG_7-i386-pc98.full


Re: [releng_7 tinderbox] failure on amd64/amd64

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 11:40:23PM -0400, FreeBSD Tinderbox wrote:
> ===> em (depend)
> @ -> /src/sys
> machine -> /src/sys/amd64/include
> awk -f @/tools/makeobjops.awk @/kern/device_if.m -h
> awk -f @/tools/makeobjops.awk @/kern/bus_if.m -h
> awk -f @/tools/makeobjops.awk @/dev/pci/pci_if.m -h
> ln -sf /obj/amd64/src/sys/LINT/opt_inet.h opt_inet.h
> make: don't know how to make if_lem.c. Stop
> *** Error code 2
> 
> Stop in /src/sys/modules.
> *** Error code 1
> 
> Stop in /obj/amd64/src/sys/LINT.
> *** Error code 1

Jack, did you break em(4) (or lem in this case) again?  :-)

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



[releng_7 tinderbox] failure on ia64/ia64

2010-05-14 Thread FreeBSD Tinderbox
TB --- 2010-05-15 04:29:04 - tinderbox 2.6 running on freebsd-stable.sentex.ca
TB --- 2010-05-15 04:29:04 - starting RELENG_7 tinderbox run for ia64/ia64
TB --- 2010-05-15 04:29:04 - cleaning the object tree
TB --- 2010-05-15 04:29:28 - cvsupping the source tree
TB --- 2010-05-15 04:29:28 - /usr/bin/csup -z -r 3 -g -L 1 -h localhost -s /tinderbox/RELENG_7/ia64/ia64/supfile
TB --- 2010-05-15 04:29:40 - building world
TB --- 2010-05-15 04:29:40 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-05-15 04:29:40 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-05-15 04:29:40 - TARGET=ia64
TB --- 2010-05-15 04:29:40 - TARGET_ARCH=ia64
TB --- 2010-05-15 04:29:40 - TZ=UTC
TB --- 2010-05-15 04:29:40 - __MAKE_CONF=/dev/null
TB --- 2010-05-15 04:29:40 - cd /src
TB --- 2010-05-15 04:29:40 - /usr/bin/make -B buildworld
>>> World build started on Sat May 15 04:29:41 UTC 2010
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
>>> World build completed on Sat May 15 05:55:48 UTC 2010
TB --- 2010-05-15 05:55:48 - generating LINT kernel config
TB --- 2010-05-15 05:55:48 - cd /src/sys/ia64/conf
TB --- 2010-05-15 05:55:48 - /usr/bin/make -B LINT
TB --- 2010-05-15 05:55:48 - building LINT kernel
TB --- 2010-05-15 05:55:48 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-05-15 05:55:48 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-05-15 05:55:48 - TARGET=ia64
TB --- 2010-05-15 05:55:48 - TARGET_ARCH=ia64
TB --- 2010-05-15 05:55:48 - TZ=UTC
TB --- 2010-05-15 05:55:48 - __MAKE_CONF=/dev/null
TB --- 2010-05-15 05:55:48 - cd /src
TB --- 2010-05-15 05:55:48 - /usr/bin/make -B buildkernel KERNCONF=LINT
>>> Kernel build for LINT started on Sat May 15 05:55:49 UTC 2010
>>> stage 1: configuring the kernel
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3.1: making dependencies
[...]
===> em (depend)
@ -> /src/sys
machine -> /src/sys/ia64/include
awk -f @/tools/makeobjops.awk @/kern/device_if.m -h
awk -f @/tools/makeobjops.awk @/kern/bus_if.m -h
awk -f @/tools/makeobjops.awk @/dev/pci/pci_if.m -h
ln -sf /obj/ia64/src/sys/LINT/opt_inet.h opt_inet.h
make: don't know how to make if_lem.c. Stop
*** Error code 2

Stop in /src/sys/modules.
*** Error code 1

Stop in /obj/ia64/src/sys/LINT.
*** Error code 1

Stop in /src.
*** Error code 1

Stop in /src.
TB --- 2010-05-15 05:57:23 - WARNING: /usr/bin/make returned exit code  1 
TB --- 2010-05-15 05:57:23 - ERROR: failed to build lint kernel
TB --- 2010-05-15 05:57:23 - 4602.17 user 363.85 system 5299.09 real


http://tinderbox.freebsd.org/tinderbox-releng_7-RELENG_7-ia64-ia64.full


Re: [releng_7 tinderbox] failure on amd64/amd64

2010-05-14 Thread Jack Vogel
DUH, forgot to add the file, lol.  Fix coming shortly.

Jack
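
For anyone triaging these reports: the failures in this thread all share the
same make(1) message, which names a source file the module Makefile lists but
the tree does not contain. A minimal sketch for pulling that line out of a
tinderbox log (or a quoted reply) follows. It is not part of tinderbox; the
function name find_missing_targets and the two regular expressions are
assumptions based only on the log excerpts shown in this thread.

```python
import re

# make's message for a target it has no rule or source file for,
# as it appears in the logs above.
MISSING_RE = re.compile(r"^make: don't know how to make (\S+?)\. Stop$")
# The "===> em (depend)" style banner naming the module being processed.
MODULE_RE = re.compile(r"^===> (\S+) \((\w+)\)$")


def find_missing_targets(log_text):
    """Return (module, target) pairs for each 'don't know how to make' line.

    module is None when no '===> name (stage)' banner preceded the error,
    which happens when the log excerpt was truncated with '[...]'.
    """
    results = []
    module = None
    for raw in log_text.splitlines():
        # Drop mail quoting prefixes ("> ", "> > ") so replies parse too.
        line = re.sub(r"^(?:>\s?)+", "", raw).strip()
        banner = MODULE_RE.match(line)
        if banner:
            module = banner.group(1)
            continue
        missing = MISSING_RE.match(line)
        if missing:
            results.append((module, missing.group(1)))
    return results


if __name__ == "__main__":
    sample = (
        "===> em (depend)\n"
        "ln -sf /obj/src/sys/LINT/opt_inet.h opt_inet.h\n"
        "make: don't know how to make if_lem.c. Stop\n"
        "*** Error code 2\n"
    )
    for module, target in find_missing_targets(sample):
        print(f"module={module} missing={target}")
```

A hit like this usually means a file was left out of a commit, which matches
the explanation given above.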


On Fri, May 14, 2010 at 9:54 PM, Jeremy Chadwick wrote:

> On Fri, May 14, 2010 at 11:40:23PM -0400, FreeBSD Tinderbox wrote:
> > ===> em (depend)
> > @ -> /src/sys
> > machine -> /src/sys/amd64/include
> > awk -f @/tools/makeobjops.awk @/kern/device_if.m -h
> > awk -f @/tools/makeobjops.awk @/kern/bus_if.m -h
> > awk -f @/tools/makeobjops.awk @/dev/pci/pci_if.m -h
> > ln -sf /obj/amd64/src/sys/LINT/opt_inet.h opt_inet.h
> > make: don't know how to make if_lem.c. Stop
> > *** Error code 2
> >
> > Stop in /src/sys/modules.
> > *** Error code 1
> >
> > Stop in /obj/amd64/src/sys/LINT.
> > *** Error code 1
>
> Jack, did you break em(4) (or lem in this case) again?  :-)
>
> --
> | Jeremy Chadwick   j...@parodius.com |
> | Parodius Networking   http://www.parodius.com/ |
> | UNIX Systems Administrator  Mountain View, CA, USA |
> | Making life hard for others since 1977.  PGP: 4BD6C0CB |
>
>