Re: Panic on ZFS startup after crash

2008-07-21 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2008 at 12:29:54AM +0200, Daniel Eriksson wrote:
 Pawel Jakub Dawidek wrote:
 
  Can you try this patch?
  
  http://people.freebsd.org/~pjd/patches/space_map.c.patch
 
 Now it panics (solaris assert) at line 431 in dmu.c. I'll try to get a
 backtrace in a day or two if it would help.

The backtrace won't help here. I'm afraid your pool's metadata is
corrupted in a way that ZFS can't handle. I saw warnings in your
first e-mail about ZFS not being able to replay the ZIL. Can you try
disabling the ZIL? Something like:

# zpool export name
# kldunload zfs
# kenv vfs.zfs.zil_disable=1
# kldload zfs
# zpool import name

Although I'm not sure if disabling the ZIL will prevent replaying a
previously prepared ZIL. If that doesn't help, I'm afraid the last
suggestion I can provide is to try the latest ZFS version (I can prepare
a patch for you in a few days).

The panic you're seeing is in the dmu_write() function. You could also try
to import the pool read-only, but I just tried doing so with the
'zpool import -o ro name' command and it mounted the file systems
read-write. Not sure why it doesn't work, but I'll try to fix it today.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Panic on ZFS startup after crash

2008-07-21 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2008 at 11:02:36AM +0200, Pawel Jakub Dawidek wrote:
 On Mon, Jul 21, 2008 at 12:29:54AM +0200, Daniel Eriksson wrote:
  Pawel Jakub Dawidek wrote:
  
   Can you try this patch?
   
 http://people.freebsd.org/~pjd/patches/space_map.c.patch
  
  Now it panics (solaris assert) at line 431 in dmu.c. I'll try to get a
  backtrace in a day or two if it would help.
 
 The backtrace won't help here. I'm afraid your pool's metadata is
 corrupted in a way that ZFS can't handle. I saw warnings in your
 first e-mail about ZFS not being able to replay the ZIL. Can you try
 disabling the ZIL? Something like:
 
   # zpool export name
   # kldunload zfs
   # kenv vfs.zfs.zil_disable=1
   # kldload zfs
   # zpool import name
 
 Although I'm not sure if disabling the ZIL will prevent replaying a
 previously prepared ZIL. If that doesn't help, I'm afraid the last
 suggestion I can provide is to try the latest ZFS version (I can prepare
 a patch for you in a few days).
 
 The panic you're seeing is in the dmu_write() function. You could also try
 to import the pool read-only, but I just tried doing so with the
 'zpool import -o ro name' command and it mounted the file systems
 read-write. Not sure why it doesn't work, but I'll try to fix it today.

I fixed the 'zpool import -o ro' problem in HEAD, but you can also patch
your 7.0 sources with this patch:

http://people.freebsd.org/~pjd/patches/opensolaris_vfs.c.2.patch

With this patch applied and the ZIL disabled, try:

# zpool import -o ro name

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ACPI regression on recent 7.0-STABLE: HPET stops working

2008-07-21 Thread Oleg V. Nauman

Quoting Oleg V. Nauman [EMAIL PROTECTED]:


Quoting Jeremy Chadwick [EMAIL PROTECTED]:


On Sat, Jul 19, 2008 at 10:03:15AM +0300, Oleg V. Nauman wrote:

It seems to be something was changed with ACPI support on 7.0-STABLE so
my next system upgrade ended with ACPI HPET not working anymore on my
ASUS A9Rp laptop.

Here is the part of /var/log/dmesg.today dated July 13:

FreeBSD 7.0-STABLE #65: Tue Jul  8 22:05:07 EEST 2008
   [EMAIL PROTECTED]:/usr/src/sys/i386/compile/oleg2
[..]
acpi0: A M I OEMRSDT on motherboard
acpi0: Overriding SCI Interrupt from IRQ 9 to IRQ 21
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, 77f0 (3) failed
Timecounter ACPI-safe frequency 3579545 Hz quality 850
acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
acpi_ec0: Embedded Controller: GPE 0x18 port 0x62,0x66 on acpi0
acpi_hpet0: High Precision Event Timer iomem
0xfed0-0xfed003ff on acpi0

Timecounter HPET frequency 14318180 Hz quality 900

Here is the fresh dmesg output info:

FreeBSD 7.0-STABLE #66: Tue Jul 15 22:11:27 EEST 2008
   [EMAIL PROTECTED]:/usr/src/sys/i386/compile/oleg2
[..]
acpi0: A M I OEMRSDT on motherboard
acpi0: Overriding SCI Interrupt from IRQ 9 to IRQ 21
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, 77f0 (3) failed
Timecounter ACPI-safe frequency 3579545 Hz quality 850
acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
[..]
acpi_hpet0: High Precision Event Timer iomem
0xfed0-0xfed003ff on acpi0

device_attach: acpi_hpet0 attach returned 12

And the part of actual sysctl kern.timecounter output:

kern.timecounter.choice: TSC(800) ACPI-safe(850) i8254(0) dummy(-100)
kern.timecounter.hardware: ACPI-safe


Seems okay here:

FreeBSD icarus.home.lan 7.0-STABLE FreeBSD 7.0-STABLE #0: Sat Jul   
12  10:53:08 PDT 2008
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/PDSMI_PLUS_amd64  amd64


acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
acpi_hpet0: High Precision Event Timer iomem   
0xfed0-0xfed003ff on acpi0

Timecounter i8254 frequency 1193182 Hz quality 0
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
Timecounter HPET frequency 14318180 Hz quality 900
Timecounters tick every 1.000 msec

kern.timecounter.choice: TSC(-100) HPET(900) ACPI-fast(1000)
i8254(0) dummy(-100)

kern.timecounter.hardware: ACPI-fast

You sure you haven't upgraded your BIOS or something and forgot to
re-enable HPET?


 No it was not upgraded.. Have no option to enable/disable HPET through
BIOS settings though


 I was unclear a bit or so. There are no ACPI related settings in my  
laptop's BIOS.


 Well, backing out revision 1.243.2.3 of /usr/src/sys/dev/acpica/acpi.c
(committed to RELENG_7 on July 10 by jhb) fixes this issue for me:


acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0
Timecounter HPET frequency 14318180 Hz quality 900

kern.timecounter.choice: TSC(800) HPET(900) ACPI-safe(850) i8254(0) dummy(-100)

kern.timecounter.hardware: HPET
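
A rough sketch of how the choice list above can be read, assuming the
kernel prefers the timecounter with the highest quality value. The helper
name best_timecounter is hypothetical and not a FreeBSD tool; it simply
picks the top entry from a kern.timecounter.choice style string.

```sh
#!/bin/sh
# Print the name of the timecounter with the highest quality from a
# kern.timecounter.choice style string, e.g.
# "TSC(800) HPET(900) ACPI-safe(850) i8254(0) dummy(-100)".
best_timecounter() {
    printf '%s\n' "$1" | tr ' ' '\n' | awk -F'[()]' '
        # Each non-empty line looks like NAME(QUALITY).
        NF >= 2 && $1 != "" {
            q = $2 + 0
            if (!seen || q > best) { best = q; name = $1; seen = 1 }
        }
        END { if (seen) print name }'
}

best_timecounter "TSC(800) HPET(900) ACPI-safe(850) i8254(0) dummy(-100)"
```

Fed the list from the failing kernel, TSC(800) ACPI-safe(850) i8254(0)
dummy(-100), the same helper prints ACPI-safe, which matches the
kern.timecounter.hardware value reported for that boot.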

 Hopefully this helps to understand what went wrong there.

Oleg









___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]





Re: ACPI regression on recent 7.0-STABLE: HPET stops working

2008-07-21 Thread Jeremy Chadwick
On Mon, Jul 21, 2008 at 01:07:52PM +0300, Oleg V. Nauman wrote:
  Well, backing out revision 1.243.2.3 of /usr/src/sys/dev/acpica/acpi.c
 (committed to RELENG_7 on July 10 by jhb) fixes this issue for me:

 acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0
 Timecounter HPET frequency 14318180 Hz quality 900

 kern.timecounter.choice: TSC(800) HPET(900) ACPI-safe(850) i8254(0)  
 dummy(-100)
 kern.timecounter.hardware: HPET

  Hopefully this helps to understand what went wrong there.

John, do you have any ideas WRT this regression?  HPET on this user's
system has the most granularity.

-- 
| Jeremy Chadwick   jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Multi-machine mirroring choices

2008-07-21 Thread Pete French
 The *big* issue I have right now is dealing with the slave machine going
 down. Once the master no longer has a connection to the ggated devices,
 all processes trying to use the device hang in D status. I have tried
 pkill'ing ggatec to no avail and ggatec destroy returns a message of
 gctl being busy. Trying to ggatec destroy -f panics the machine.

Oddly enough, this was the issue I had with iSCSI which made me move
to using ggated instead. On our machines I use '-t 10' as an argument to
ggatec, and this makes it time out once the connection has been down for
a certain amount of time. I am using gmirror on top, not ZFS, and this
handled the drive vanishing from the mirror quite happily. I haven't
tried it with ZFS, which may not like having the device suddenly disappear.

-pete.


RE: HP Pavilion dv2000 laptop wont boot off install cd

2008-07-21 Thread Kevin K
 On Thu, 17 Jul 2008 08:31:37 -0400
 Kevin K [EMAIL PROTECTED] wrote:
 
  For 7.0-RELEASE, it
  seemed to hang at Trying to mount root from ufs:/dev/md0.
 
 How long did you wait? If you didn't wait 10 or 15 minutes, please do.
 Various tests / probes take a long time to time out on some hardware.
 
 HTH
 --
 Regards,
 Torfinn Ingolfsen


I tried the 7.0-release-amd64 200807 snapshot and it booted (after holding
the spacebar @ /boot/loader.conf). It stopped at Trying to mount root from
ufs:/dev/md0. I waited about 30-45 minutes and it didn't continue from that
point -- the keyboard was also unresponsive.

Does anyone know if this is a known issue?





RE: Panic on ZFS startup after crash

2008-07-21 Thread Daniel Eriksson
Pawel Jakub Dawidek wrote:

 I'm afraid your pool's metadata is
 corrupted in a way that ZFS can't handle.

Yes, that's my conclusion also. It looks like the intent log is messed
up enough to trigger an assert while ZFS tries to parse/replay it.

 I saw warnings in your
 first e-mail about ZFS not being able to replay the ZIL. Can you try
 disabling the ZIL? Something like:

I've already tried this, and it made no difference. When the box crashed
ZIL was enabled, and for some reason garbage got written into the ZIL.
Now whenever ZFS tries to import the pool it sees a non-empty ZIL and
tries to parse/replay it.

Is there an easy way to trick ZFS into thinking the ZIL is empty?

 Although I'm not sure if disabling ZIL will prevent replaying 
 previously prepared ZIL.

It won't, unfortunately.

 If that won't help, I'm afraid the last suggestion I can
 provide is to try the latest ZFS version (I can prepare a 
 patch for you in a few days).

I could probably prepare a temporary install of 8-CURRENT on a spare
drive and boot from that if it's easier for you to make a patch against
CURRENT instead of RELENG_7_0.

 You could also try 
 to import a pool read-only, but I just tried doing so with
 'zpool import -o ro name' command and it mounted the file systems
 read-write. Not sure why it doesn't work, but I'll try to fix 
 it today.

I'll try that!

___
Daniel Eriksson (http://www.toomuchdata.com/)


Re: Panic on ZFS startup after crash

2008-07-21 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2008 at 03:49:24PM +0200, Daniel Eriksson wrote:
 Pawel Jakub Dawidek wrote:
 
  I'm afraid your pool's metadata is
  corrupted in a way that ZFS can't handle.
 
 Yes, that's my conclusion also. It looks like the intent log is messed
 up enough to trigger an assert while ZFS tries to parse/replay it.
 
  I saw warnings in your
  first e-mail about ZFS not being able to replay the ZIL. Can you try
  disabling the ZIL? Something like:
 
 I've already tried this, and it made no difference. When the box crashed
 ZIL was enabled, and for some reason garbage got written into the ZIL.
 Now whenever ZFS tries to import the pool it sees a non-empty ZIL and
 tries to parse/replay it.
 
 Is there an easy way to trick ZFS into thinking the ZIL is empty?

I'll check that.

  If that won't help, I'm afraid the last suggestion I can
  provide is to try the latest ZFS version (I can prepare a 
  patch for you in a few days).
 
 I could probably prepare a temporary install of 8-CURRENT on a spare
 drive and boot from that if it's easier for you to make a patch against
 CURRENT instead of RELENG_7_0.

The ZFS code in 7.0 is the same as in HEAD, so no worries.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Panic on ZFS startup after crash

2008-07-21 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2008 at 03:51:56PM +0200, Pawel Jakub Dawidek wrote:
 On Mon, Jul 21, 2008 at 03:49:24PM +0200, Daniel Eriksson wrote:
  Pawel Jakub Dawidek wrote:
  
   I'm afraid your pool's metadata is
   corrupted in a way that ZFS can't handle.
  
  Yes, that's my conclusion also. It looks like the intent log is messed
  up enough to trigger an assert while ZFS tries to parse/replay it.
  
   I saw warnings in your
   first e-mail about ZFS not being able to replay the ZIL. Can you try
   disabling the ZIL? Something like:
  
  I've already tried this, and it made no difference. When the box crashed
  ZIL was enabled, and for some reason garbage got written into the ZIL.
  Now whenever ZFS tries to import the pool it sees a non-empty ZIL and
  tries to parse/replay it.
  
  Is there an easy way to trick ZFS into thinking the ZIL is empty?
 
 I'll check that.

OK. We may try not replaying the ZIL, leaving it in place, and see what
happens. We can also try to destroy the ZIL without replaying it.

What we do from now on can mess up your pool even further, so you may
want to back up the entire disks first.

To skip replaying the ZIL you need to edit the
/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c file, find the
zil_replay() function and make the head of it look like this:

void
zil_replay(objset_t *os, void *arg, uint64_t *txgp,
    zil_replay_func_t *replay_func[TX_MAX_TYPE])
{
	zilog_t *zilog = dmu_objset_zil(os);
	const zil_header_t *zh = zilog->zl_header;
	zil_replay_arg_t zr;

	/* XXX: Try to skip the ZIL replay. */
	return;

	if (zil_empty(zilog)) {
		zil_destroy(zilog, B_TRUE);
		return;
	}
[...]

If that won't work, we can try to destroy the ZIL without replaying it:

void
zil_replay(objset_t *os, void *arg, uint64_t *txgp,
    zil_replay_func_t *replay_func[TX_MAX_TYPE])
{
	zilog_t *zilog = dmu_objset_zil(os);
	const zil_header_t *zh = zilog->zl_header;
	zil_replay_arg_t zr;

	/* XXX: Destroy the ZIL without replaying it. */
	zil_destroy(zilog, B_FALSE);
	return;

	if (zil_empty(zilog)) {
		zil_destroy(zilog, B_TRUE);
		return;
	}
[...]

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Multi-machine mirroring choices

2008-07-21 Thread Sven W



Pete French presumably uttered the following on 07/21/08 07:08:

The *big* issue I have right now is dealing with the slave machine going
down. Once the master no longer has a connection to the ggated devices,
all processes trying to use the device hang in D status. I have tried
pkill'ing ggatec to no avail and ggatec destroy returns a message of
gctl being busy. Trying to ggatec destroy -f panics the machine.


Oddly enough, this was the issue I had with iSCSI which made me move
to using ggated instead. On our machines I use '-t 10' as an argument to
ggatec, and this makes it time out once the connection has been down for
a certain amount of time. I am using gmirror on top, not ZFS, and this
handled the drive vanishing from the mirror quite happily. I haven't
tried it with ZFS, which may not like having the device suddenly disappear.

-pete.


What I have found is that the master machine will lock up if the slave disappears
during a large file transfer. I tested this by setting up a zpool mirror on the master
using a ggatec device from the slave. Then I:


pkill'ed ggated on the slave machine.

dd if=/dev/zero of=/data1/testfile2 bs=16k count=8192   [128MB] on the master

The dd command finished and the /var/log/messages showed I/O errors to the slave 
drive as expected. Messages also showed ggatec trying to reconnect every 10 seconds 
(ggatec was started with the -t 10 parameter).


Finally ZFS marked the drive unavailable, which then allowed me to run
ggatec destroy -u 0 without getting the ioctl(/dev/ggctl): Device busy error
message. (By the way, using ggatec destroy does not kill the ggatec process
started by ggatec create; I had to pkill ggatec to get that to stop. Bug?)


The above behavior would be acceptable for multi-machine mirroring as it would be 
scriptable.


The problem comes with large writes. I tried to repeat the above with

dd if=/dev/zero of=/data1/testfile2 bs=16k count=32768 [512MB]

which then locks zfs,  and ultimately the system itself. It seems once the write 
size/buffer is full, zfs is unable to fail/unavail the slave drive and the entire 
system becomes unresponsive (cannot even ssh into it).
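
For reference, the sizes quoted in brackets follow directly from bs times
count. A minimal sketch in shell arithmetic, where dd_mib is a hypothetical
helper name and the block size is given in KiB:

```sh
#!/bin/sh
# Total size in MiB written by dd for a block size given in KiB and a
# block count (bs=16k count=8192 -> 16 * 8192 / 1024).
dd_mib() {
    echo $(( $1 * $2 / 1024 ))
}

dd_mib 16 8192     # the run that completed: prints 128
dd_mib 16 32768    # the run that locked up: prints 512
```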


The bottom line is that without some type of timeout or time to fail (bad I/O to
fail?) zpool + ggate[cd] seems to be an unworkable solution. This is actually a
shame, as the recovery process of swapping from master to slave and back again was
so much cleaner and faster than using gmirror.



Sven


Re: HP Pavilion dv2000 laptop wont boot off install cd

2008-07-21 Thread Carlos A. M. dos Santos
On Mon, Jul 21, 2008 at 9:26 AM, Kevin K [EMAIL PROTECTED] wrote:
 On Thu, 17 Jul 2008 08:31:37 -0400
 Kevin K [EMAIL PROTECTED] wrote:

  For 7.0-RELEASE, it
  seemed to hang at Trying to mount root from ufs:/dev/md0.

 How long did you wait? If you didn't wait 10 or 15 minutes, please do.
 Various tests / probes take a long time to time out on some hardware.

 HTH
 --
 Regards,
 Torfinn Ingolfsen


 I tried the 7.0-release-amd64 200807 snapshot and it booted (after holding
 the spacebar @ /boot/loader.conf).

I have seen this on some HP notebooks. It seems that the CD drive does
not stabilize in time before booting, leading to some disk read
errors. The trick is to press F9 (or F12, depending on the notebook
model) to force the BIOS to show the boot device menu. Then, *after
the CD drive stops spinning*, choose the boot from CD/DVD option.

 It stopped at Trying to mount root from
 ufs:/dev/md0. I waited about 30-45 minutes and it didn't continue from that
 point -- the keyboard was also unresponsive.

 Does anyone know if this is a known issue?

Try disabling kbdmux before booting. Jump to the loader prompt and type:

set hint.kbdmux.0.disabled=1
boot -v

--
If you think things can't get worse it's probably only
because you lack sufficient imagination.


Re: FreeBSD 7.1 and BIND exploit

2008-07-21 Thread Doug Barton


Brett Glass wrote:
| Everyone:
|
| Will FreeBSD 7.1 be released in time to use it as an upgrade to
| close the BIND cache poisoning hole?

Brett, et al,

I'll make this simple for you. If you have a server that is running
BIND, update BIND now. If you need to use the ports, that's fine, just
do it now. Make sure that you are not specifying a port via any
query-source* options in named.conf, and that any firewall between
your named process and the outside world does keep-state on outgoing
UDP packets.
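
One way to check the first point, as a sketch only: the helper below flags
query-source or query-source-v6 lines that pin a numeric port. The name
fixed_query_port is made up, the grep pattern is a simplification that only
looks at single-line statements, and the path /etc/namedb/named.conf is an
assumption; adjust it to wherever your named.conf lives.

```sh
#!/bin/sh
# Succeeds (exit 0) if the given named.conf contains a query-source or
# query-source-v6 statement with a fixed numeric port, which defeats
# source port randomization. Lines that start with a comment marker,
# and statements using "port *", are not matched.
fixed_query_port() {
    grep -E -q '^[[:space:]]*query-source(-v6)?[[:space:]][^;#]*port[[:space:]]+[0-9]+' "$1"
}

conf=${1:-/etc/namedb/named.conf}
if [ -r "$conf" ] && fixed_query_port "$conf"; then
    echo "$conf: query-source pins a port, remove the port clause"
fi
```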

If you have a system with BIND installed (as it is by default) but you
are NOT running named, you don't need to worry about updating now, but
you should do it soonish just in case someone gets a wild hair and
starts up named on that box.

As for the meta-question, FreeBSD is currently operating on a
time-based release schedule, not a feature-based one. And to your
actual question, the answer is no.


hope this helps,

Doug

- --

~This .signature sanitized for your protection



Re: FreeBSD 7.1 and BIND exploit

2008-07-21 Thread Max Laier
On Monday 21 July 2008 21:14:22 Doug Barton wrote:
 Brett Glass wrote:
 | Everyone:
 |
 | Will FreeBSD 7.1 be released in time to use it as an upgrade to
 | close the BIND cache poisoning hole?

 Brett, et al,

 I'll make this simple for you. If you have a server that is running
 BIND, update BIND now. If you need to use the ports, that's fine, just
 do it now. Make sure that you are not specifying a port via any
 query-source* options in named.conf, and that any firewall between
 your named process and the outside world does keep-state on outgoing
 UDP packets.

... and that any NAT device employs at least a somewhat random port 
allocation mechanism - pf provides this.

 If you have a system with BIND installed (as it is by default) but you
 are NOT running named, you don't need to worry about updating now, but
 you should do it soonish just in case someone gets a wild hair and
 starts up named on that box.

 As for the meta-question, FreeBSD is currently operating on a
 time-based release schedule, not a feature-based one. And to your
 actual question, the answer is no.


 hope this helps,

 Doug



-- 
/\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News


Re: FreeBSD 7.1 and BIND exploit

2008-07-21 Thread Kevin Oberman
 From: Max Laier [EMAIL PROTECTED]
 Date: Mon, 21 Jul 2008 21:38:46 +0200
 Sender: [EMAIL PROTECTED]
 
 On Monday 21 July 2008 21:14:22 Doug Barton wrote:
  Brett Glass wrote:
  | Everyone:
  |
  | Will FreeBSD 7.1 be released in time to use it as an upgrade to
  | close the BIND cache poisoning hole?
 
  Brett, et al,
 
  I'll make this simple for you. If you have a server that is running
  BIND, update BIND now. If you need to use the ports, that's fine, just
  do it now. Make sure that you are not specifying a port via any
  query-source* options in named.conf, and that any firewall between
  your named process and the outside world does keep-state on outgoing
  UDP packets.
 
 ... and that any NAT device employs at least a somewhat random port 
 allocation mechanism - pf provides this.

And, if you are not sure how good a job it does (and I am not), you
should use the OARC test to check how well it works:
dig +short porttest.dns-oarc.net TXT

If the result is not GOOD, it's not good enough.

You can test a remote server by adding @remote-server to the dig
command. The server may be specified by name or IP address.

Don't forget that ANY server that caches data, including an end system
running a caching-only server, is vulnerable.
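
If you want to script that check, a minimal sketch follows. It only looks
for the word GOOD in the reply, as described above; porttest_ok is a
hypothetical helper name, and nothing is assumed about the rest of the TXT
record.

```sh
#!/bin/sh
# Succeeds only if the porttest reply text contains the rating GOOD.
porttest_ok() {
    case "$1" in
        *GOOD*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example use (needs network access, so it is left commented out):
#   porttest_ok "$(dig +short porttest.dns-oarc.net TXT)" || echo "resolver needs attention"
```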
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751




Re: ACPI regression on recent 7.0-STABLE: HPET stops working

2008-07-21 Thread John Baldwin
On Monday 21 July 2008 06:46:48 am Jeremy Chadwick wrote:
 On Mon, Jul 21, 2008 at 01:07:52PM +0300, Oleg V. Nauman wrote:
   Well, backing out revision 1.243.2.3 of /usr/src/sys/dev/acpica/acpi.c
  (committed to RELENG_7 on July 10 by jhb) fixes this issue for me:
 
  acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on 
acpi0
  Timecounter HPET frequency 14318180 Hz quality 900
 
  kern.timecounter.choice: TSC(800) HPET(900) ACPI-safe(850) i8254(0)  
  dummy(-100)
  kern.timecounter.hardware: HPET
 
   Hopefully this helps to understand what went wrong there.
 
 John, do you have any ideas WRT this regression?  HPET on this user's
 system has the most granularity.

ENOCONTEXT.  I will try to find the thread in my stable@ inbox, but right now 
I can't tell anything from just this e-mail.

-- 
John Baldwin


chipset causing locks.

2008-07-21 Thread Jo Rhett
Thanks for the note.  No, just a coincidence.   The chipset is a VIA  
ProSavageDDR KM266.


But thanks for bringing that up ;-)

FWIW, as others have speculated, enabling more logging from GEOM
produced nothing.  It does appear to be a hardware failure of some sort.


On Jul 18, 2008, at 11:29 PM, Peter Wemm wrote:
On Wed, Jul 16, 2008 at 2:42 PM, Jo Rhett [EMAIL PROTECTED] wrote:

On Fri, Jul 11, 2008 at 12:59:33AM -0700, Jo Rhett wrote:


Every time it is rebuilding ad0.   Every single boot in the last two
weeks.


On Jul 11, 2008, at 9:49 AM, Clifton Royston wrote:


That just means that it halted without a proper shutdown.  If it
crashes, the mirror isn't stopped properly, so it's marked dirty, so it
must rebuild it.  It is the precise analogy of finding all the file
systems dirty on boot and fscking them, following a crash.



Thanks for the clarification.  Dang, I hoped I was on to something.


This is really off on a tangent, but I thought I'd mention it on the
off-chance that it fit your problem.

Recently there have been grumblings about heat problems with certain
nvidia chipsets on consumer boards.  Apparently, there is some process
issue, if you believe trade rags like theinquirer.net etc.  Apparently
there is some issue with heat damage over time.  Consumer motherboards
with passive cooled (no fan) heat pipes etc seem to be particularly
vulnerable.  I use the word apparently because it is far from a
verified fact.

However, I've got two motherboards, one running freebsd, one running
windows, with nvidia chipsets.  Both used to be fine with onboard IDE
activity.  Both now use raid controllers so the IDE interfaces have
been idle for a good year or so.

Something came up and I had to use the IDE interfaces for a lot of
data transfer.  Suddenly, both machines are flakey.  The windows
machine blue screens under load.  My freebsd box just turns off
(motherboard appears to power off, but the power supply is on still).
The same happens when I use a linux boot disk, so I know it's not
FreeBSD's fault.

The common factor seems to be that the motherboards are now about a
year and a half old.  They both have the same nvidia south bridge that
theinquirer.net was trashing.   Both used to work fine, now have
problems with IDE.  And now I recalled the article and started
wondering...

Do you, by any wildly remote chance, have an nvidia based motherboard?

I believe the fault I'm seeing is the system asserting a fatal error
by doing a HT ECC flood to halt everything.

--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; KI6FJV

All of this is for nothing if we don't go to the stars - JMS/B5
If Java had true garbage collection, most programs would delete
themselves upon execution. -- Robert Sewell


--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source  
and other randomness





Re: Panic on ZFS startup after crash

2008-07-21 Thread Nenhum_de_Nos
 The ZFS code in 7.0 is the same as in HEAD, so no worries.

I'm trying ZFS myself in a small environment at home, and for that I
follow 7-STABLE. Is there no need to do that, based on the above
statement?

thanks,

matheus

-- 
We will call you cygnus,
The God of balance you shall be



Re: ACPI regression on recent 7.0-STABLE: HPET stops working

2008-07-21 Thread John Baldwin
On Monday 21 July 2008 06:07:52 am Oleg V. Nauman wrote:
 Quoting Oleg V. Nauman [EMAIL PROTECTED]:
 
  Quoting Jeremy Chadwick [EMAIL PROTECTED]:
 
  On Sat, Jul 19, 2008 at 10:03:15AM +0300, Oleg V. Nauman wrote:
  It seems to be something was changed with ACPI support on 7.0-STABLE so
  my next system upgrade ended with ACPI HPET not working anymore on my
  ASUS A9Rp laptop.
 
  Here is the part of /var/log/dmesg.today dated July 13:
 
  FreeBSD 7.0-STABLE #65: Tue Jul  8 22:05:07 EEST 2008
 [EMAIL PROTECTED]:/usr/src/sys/i386/compile/oleg2
  [..]
  acpi0: A M I OEMRSDT on motherboard
  acpi0: Overriding SCI Interrupt from IRQ 9 to IRQ 21
  acpi0: [ITHREAD]
  acpi0: Power Button (fixed)
  acpi0: reservation of 0, a (3) failed
  acpi0: reservation of 10, 77f0 (3) failed
  Timecounter ACPI-safe frequency 3579545 Hz quality 850
  acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
  acpi_ec0: Embedded Controller: GPE 0x18 port 0x62,0x66 on acpi0
  acpi_hpet0: High Precision Event Timer iomem
  0xfed0-0xfed003ff on acpi0
  Timecounter HPET frequency 14318180 Hz quality 900
 
  Here is the fresh dmesg output info:
 
  FreeBSD 7.0-STABLE #66: Tue Jul 15 22:11:27 EEST 2008
 [EMAIL PROTECTED]:/usr/src/sys/i386/compile/oleg2
  [..]
  acpi0: A M I OEMRSDT on motherboard
  acpi0: Overriding SCI Interrupt from IRQ 9 to IRQ 21
  acpi0: [ITHREAD]
  acpi0: Power Button (fixed)
  acpi0: reservation of 0, a (3) failed
  acpi0: reservation of 10, 77f0 (3) failed
  Timecounter ACPI-safe frequency 3579545 Hz quality 850
  acpi_timer0: 32-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
  [..]
  acpi_hpet0: High Precision Event Timer iomem
  0xfed0-0xfed003ff on acpi0
  device_attach: acpi_hpet0 attach returned 12
 
  And the part of actual sysctl kern.timecounter output:
 
  kern.timecounter.choice: TSC(800) ACPI-safe(850) i8254(0) 
dummy(-100)
  kern.timecounter.hardware: ACPI-safe
 
  Seems okay here:
 
  FreeBSD icarus.home.lan 7.0-STABLE FreeBSD 7.0-STABLE #0: Sat Jul   
  12  10:53:08 PDT 2008
  [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PDSMI_PLUS_amd64  amd64
 
  acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
  acpi_hpet0: High Precision Event Timer iomem   
  0xfed0-0xfed003ff on acpi0
  Timecounter i8254 frequency 1193182 Hz quality 0
  Timecounter ACPI-fast frequency 3579545 Hz quality 1000
  Timecounter HPET frequency 14318180 Hz quality 900
  Timecounters tick every 1.000 msec
 
  kern.timecounter.choice: TSC(-100) HPET(900) ACPI-fast(1000)
  i8254(0) dummy(-100)
  kern.timecounter.hardware: ACPI-fast
 
  You sure you haven't upgraded your BIOS or something and forgot to
  re-enable HPET?
 
   No it was not upgraded.. Have no option to enable/disable HPET through
  BIOS settings though
 
   I was unclear a bit or so. There are no ACPI related settings in my  
 laptop's BIOS.
 
   Well, backing out revision 1.243.2.3 of /usr/src/sys/dev/acpica/acpi.c
 (committed to RELENG_7 on July 10 by jhb) fixes this issue for me:
 
 acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on 
acpi0
 Timecounter HPET frequency 14318180 Hz quality 900
 
 kern.timecounter.choice: TSC(800) HPET(900) ACPI-safe(850) i8254(0)  
 dummy(-100)
 kern.timecounter.hardware: HPET
 
   Hopefully this helps to understand what went wrong there.

Ok, so the attempt to allocate the resource is failing for some reason.  Can 
you get output from 'devinfo -r' and 'devinfo -u'?

-- 
John Baldwin
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: FreeBSD 7.1 and BIND exploit

2008-07-21 Thread Brett Glass
At 02:24 PM 7/21/2008, Kevin Oberman wrote:

Don't forget that ANY server that caches data, including an end system
running a caching-only server, is vulnerable.

Actually, there is an exception to this. A forward-only cache/resolver is 
only as vulnerable as its forwarder(s). This gives a workaround for folks 
whose systems cannot easily be upgraded: point them at a trusted forwarder 
that's patched.

We're also looking at using dnscache from the djbdns package. It's really 
idiosyncratic, but seems to work well -- and if you're just doing a caching 
resolver you don't have to touch it once you get it configured.

Of course, all solutions that randomize ports are really just security by 
obscurity, because by shuffling ports you're hiding the way to poison your 
cache... a little.

--Brett Glass



Re: FreeBSD 7.1 and BIND exploit

2008-07-21 Thread Charles Sprickman

On Mon, 21 Jul 2008, Kevin Oberman wrote:


From: Max Laier [EMAIL PROTECTED]
Date: Mon, 21 Jul 2008 21:38:46 +0200
Sender: [EMAIL PROTECTED]

On Monday 21 July 2008 21:14:22 Doug Barton wrote:

Brett Glass wrote:
| Everyone:
|
| Will FreeBSD 7.1 be released in time to use it as an upgrade to
| close the BIND cache poisoning hole?

Brett, et al,

I'll make this simple for you. If you have a server that is running
BIND, update BIND now. If you need to use the ports, that's fine, just
do it now. Make sure that you are not specifying a port via any
query-source* options in named.conf, and that any firewall between
your named process and the outside world does keep-state on outgoing
UDP packets.


... and that any NAT device employs at least a somewhat random port
allocation mechanism - pf provides this.


And, if you are not sure how good a job it does (and I am not), you
should use the OARC test to check how well it works:
dig +short porttest.dns-oarc.net TXT

If the result is not GOOD, it's not good enough.


I was playing around with this a bit.  It seems like a patched server will 
give a standard deviation of more than 18,000.  If I make some queries 
behind a one-to-many NAT using pf, it falls to somewhere around 6,000 
(with a patched BIND - unpatched is pitiful).


PF is not *adding* any randomness to unpatched servers.  Since it has a 
(non-configurable?) range of ports it will grab when doing outbound NAT, 
the results are not as good as with no NAT intervention, but passable I 
suppose.


Of course in a 1:1 NAT setup it is transparent.

Charles


You can test a remote server by adding @remote-server to the dig
command. The server may be specified by name or IP address.

Don't forget that ANY server that caches data, including an end system
running a caching-only server, is vulnerable.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751




Re: FreeBSD 7.1 and BIND exploit

2008-07-21 Thread Max Laier
On Tuesday 22 July 2008 00:31:53 Charles Sprickman wrote:
 On Mon, 21 Jul 2008, Kevin Oberman wrote:
  From: Max Laier [EMAIL PROTECTED]
  Date: Mon, 21 Jul 2008 21:38:46 +0200
  Sender: [EMAIL PROTECTED]
 
  On Monday 21 July 2008 21:14:22 Doug Barton wrote:
  Brett Glass wrote:
  | Everyone:
  |
  | Will FreeBSD 7.1 be released in time to use it as an upgrade to
  | close the BIND cache poisoning hole?
 
  Brett, et al,
 
  I'll make this simple for you. If you have a server that is running
  BIND, update BIND now. If you need to use the ports, that's fine,
  just do it now. Make sure that you are not specifying a port via
  any query-source* options in named.conf, and that any firewall
  between your named process and the outside world does keep-state on
  outgoing UDP packets.
 
  ... and that any NAT device employs at least a somewhat random port
  allocation mechanism - pf provides this.
 
  And, if you are not sure how good a job it does (and I am not), you
  should use the OARC test to check how well it works:
  dig +short porttest.dns-oarc.net TXT
 
  If the result is not GOOD, it's not good enough.

 I was playing around with this a bit.  It seems like a patched server
 will give a standard deviation of more than 18,000.  If I make some
 queries behind a one-to-many NAT using pf, it falls to somewhere around
 6,000 (with a patched BIND - unpatched is pitiful).

While the standard deviation gives some *indication* of the randomness 
of the selection, it is not a real measure of its quality.
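
To make that point concrete, here is a small Python sketch (not the OARC 
test itself; the sample size of 26 and the two fixed ports are made-up 
values for illustration). It shows that ports drawn uniformly from 
1024-65535 have a standard deviation of roughly 18,600, and that a fully 
predictable sequence alternating between two ports can score just as high:

```python
import random
import statistics

# Port range from the discussion: everything above the reserved ports.
PORT_MIN, PORT_MAX = 1024, 65535


def uniform_stdev(lo, hi):
    """Standard deviation of a discrete uniform distribution on [lo, hi]."""
    n = hi - lo + 1
    return ((n * n - 1) / 12) ** 0.5


def random_ports(count, seed=0):
    """Source ports picked uniformly at random (what a patched resolver aims for)."""
    rng = random.Random(seed)
    return [rng.randint(PORT_MIN, PORT_MAX) for _ in range(count)]


def alternating_ports(count, low=14000, high=52000):
    """A trivially predictable sequence: low, high, low, high, ...

    The two port numbers are hypothetical, chosen only so the spread is wide.
    """
    return [low if i % 2 == 0 else high for i in range(count)]


if __name__ == "__main__":
    print("theoretical uniform stdev:", round(uniform_stdev(PORT_MIN, PORT_MAX)))
    print("random sample stdev:      ", round(statistics.pstdev(random_ports(26))))
    print("alternating ports stdev:  ", round(statistics.pstdev(alternating_ports(26))))
```

The alternating sequence uses only two ports, so an attacker can guess the 
next one every time, yet its standard deviation is comparable to that of 
the random sample. A high number is necessary, not sufficient.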

 PF is not *adding* any randomness to unpatched servers.  Since it has a
 (non-configurable?) range of ports it will grab when doing outbound
 NAT, the results are not as good as with no NAT intervention, but
 passable I suppose.

You can configure the range on a per-NAT-rule basis, e.g.:

 nat on $ext_if inet from ! ($ext_if) to any -> ($ext_if) port 1024:65535

but you have to take care so that you don't collide with the ephemeral 
port range of the host itself.

Obviously you can't do much about an unpatched BIND: UDP has no notion of 
a connection, so pf (or any NAT device, for that matter) has to keep the 
NAT binding open for some time, and a quick sequence of queries to the 
same server will be sent out through the same port.  So putting a NAT 
firewall in front of a DNS resolver is *NOT* a workaround!
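
A toy model of that behaviour in Python may help. This is a deliberately 
simplified sketch, not how pf's state table is actually implemented: the 
ToyNat class, the addresses and the fixed source port 53 for the unpatched 
resolver are all assumptions for illustration, and binding timeouts are 
not modelled at all.

```python
import random


class ToyNat:
    """Toy UDP NAT: one random external port per distinct inside flow."""

    def __init__(self, lo=1024, hi=65535, seed=0):
        self.rng = random.Random(seed)
        self.lo, self.hi = lo, hi
        self.bindings = {}  # (src_ip, src_port, dst_ip, dst_port) -> external port

    def translate(self, src_ip, src_port, dst_ip, dst_port=53):
        key = (src_ip, src_port, dst_ip, dst_port)
        # An existing binding is reused for as long as it stays open.
        if key not in self.bindings:
            self.bindings[key] = self.rng.randint(self.lo, self.hi)
        return self.bindings[key]


def unpatched_queries(nat, count, server="192.0.2.53"):
    """Resolver that always sends from the same source port (assumed 53 here)."""
    return [nat.translate("10.0.0.2", 53, server) for _ in range(count)]


def patched_queries(nat, count, server="192.0.2.53", seed=1):
    """Resolver that picks a fresh random source port for every query."""
    rng = random.Random(seed)
    return [
        nat.translate("10.0.0.2", rng.randint(1024, 65535), server)
        for _ in range(count)
    ]


if __name__ == "__main__":
    print("unpatched, distinct external ports:",
          len(set(unpatched_queries(ToyNat(), 20))))
    print("patched, distinct external ports:  ",
          len(set(patched_queries(ToyNat(), 20))))
```

In this model every query from the fixed-port resolver maps onto the same 
binding, so the NAT's random port choice happens only once and the 
attacker still faces a single port.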

 Of course in a 1:1 NAT setup it is transparent.

 Charles

  You can test a remote server by adding @remote-server to the dig
  command. The server may be specified by name or IP address.
 
  Don't forget that ANY server that caches data, including an end
  system running a caching-only server, is vulnerable.
  --
  R. Kevin Oberman, Network Engineer
  Energy Sciences Network (ESnet)
  Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
  E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634
  Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751



-- 
/\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News