Re: ndis with Intel 2915 wireless, getting "NDIS ERROR" messages and detach on boot

2005-05-03 Thread Carl Gustavsson
Darren Pilgrim wrote:
From: Darren Pilgrim
From: Carl Gustavsson <[EMAIL PROTECTED]> 
   

Have you tested the iwi-driver?
See: http://damien.bergamini.free.fr/ipw/iwi-freebsd.html
 

I haven't yet.  Reading that page has brought up another question.  On the
page it says 5-STABLE doesn't support WPA.  My wireless network uses WPA.
Is this still the case?  I know -stable is -stable, but WPA is something of a
show-stopper, if you ask me.
Fortunately, my neighborhood is a well-covered sea of open "linksys" and
"NETGEAR" APs in default configuration with which to test the driver.
   

I'm using driver version 1.3.4 and firmware version 2.2.  The driver appears
to attach just fine: 

iwi0:  mem 0xdfcfd000-0xdfcfdfff irq 5 at device 3.0 on pci3
iwi0: Ethernet address: 00:0e:35:f6:d6:5c
iwi0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
iwi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
iwi0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
However, when I attempt to associate and get an IP address, I get "iwi0:
fatal error" when running dhclient.  Setting debug.iwi=10 and
debug.ieee80211=10, I get the following debug output when I run the firmware
download and dhclient commands:
# iwicontrol iwi0 -d /root/if_iwi/firmware-2.2 -m bss
Firmware cached: boot 6464, ucode 16326, main 166952
# dhclient iwi0
INTR!0x0100
INTR!0x0100
Setting MAC address to 00:0e:35:f6:d6:5c
TX!CMD!11!6
INTR!0x0800
Configuring adapter
TX!CMD!6!20
INTR!0x0800
Setting power mode to 0
TX!CMD!17!4
INTR!0x0800
Setting RTS threshold to 2312
TX!CMD!15!4
INTR!0x0800
Setting .11bg supported rates (12)
TX!CMD!22!16
INTR!0x0800
Setting .11a supported rates (8)
TX!CMD!22!16
INTR!0x0800
Setting initialization vector to 693451133
TX!CMD!34!4
INTR!0x0800
Enabling adapter
TX!CMD!2!0
INTR!0x0800
ieee80211_next_scan: chan 56->60
Start scanning
TX!CMD!20!60
INTR!0x0800
INTR!0x0002
Notification (20)
INTR!0x0002
Scan channel (36)
INTR!0x0002
Scan channel (40)
INTR!0x0002
Scan channel (44)
INTR!0x0002
Scan channel (48)
INTR!0x0002
RX!DATA!68!52!58
ieee80211_recv_mgmt: new probe response on chan 52 (bss chan 52) "191" from 00:0c:db:81:5e:a8
ieee80211_recv_mgmt: caps 0x401 bintval 100 erp 0x0
ieee80211_recv_mgmt: country info 55 53 20 24 04 11 34 04 17 95 05 1e
INTR!0x0002
Notification (25)
Scan channel (52)
INTR!0x0002
RX!DATA!68!56!50
ieee80211_recv_mgmt: new probe response on chan 56 (bss chan 56) "191" from 00:0c:db:81:5e:4a
ieee80211_recv_mgmt: caps 0x401 bintval 100 erp 0x0
ieee80211_recv_mgmt: country info 55 53 20 24 04 11 34 04 17 95 05 1e
INTR!0x0002
Notification (25)
Scan channel (56)
INTR!0x0002
Scan channel (60)
INTR!0x0002
Scan channel (64)
INTR!0x0002
Scan channel (149)
INTR!0x0002
Scan channel (153)
INTR!0x0002
Scan channel (157)
INTR!0x4000
iwi0: fatal error
At which point dhclient stalls for several seconds before switching to its
background mode.  `ifconfig iwi0` then shows "status: no carrier" and no IP
address assignment.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
 

I use static IPs on my wireless network and it works pretty well (with a
2200BG card). I get some "iwi0: fatal error" messages, but it works anyway.
I don't know how to solve your problem.



Re: PostgreSQL in FreeBSD jails

2005-05-03 Thread Dag-Erling Smørgrav
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:
> 'k, I've been doing multiple since 7.2 on the same machine, all on the
> same port, all different IPs, all on 4.x servers ... have never had an
> issue with crashes (it's pretty much my most stable 4.x server) ...

It was never possible.  8.0 has a hack to detect and avoid shared
memory collisions, but I think it will still have problems with
semaphores.  I have no idea why it works (or seems to work) for you;
it never did for anyone else.

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: gmirror oddities

2005-05-03 Thread Craig Boston
On Tue, May 03, 2005 at 02:38:08PM -0500, Craig Boston wrote:
> I'm sorry that I don't remember who said it (I'll do some googling and
> follow up if I can find the reference), but one time this came up
> someone posted a very good idea which IMHO is a good enough solution to
> make the default.  That is, instead of hardcoding the provider name, put
> the provider *size* into the metadata.

Aha, it was in fact PJD:

http://lists.freebsd.org/pipermail/freebsd-geom/2005-February/000528.html


Re: gmirror oddities

2005-05-03 Thread Craig Boston
On Tue, May 03, 2005 at 11:34:06AM -0700, George Hartzell wrote:
> The fix is described in the fourth comment block of Ralf's doc, either
> make the slice a sector smaller than the disk device or hardcode the
> provider name.  I've been using the hardcoding approach, and it seems
> to work for me.

I'm sorry that I don't remember who said it (I'll do some googling and
follow up if I can find the reference), but one time this came up
someone posted a very good idea which IMHO is a good enough solution to
make the default.  That is, instead of hardcoding the provider name, put
the provider *size* into the metadata.

In theory, that would give the geom classes enough information to deduce
which provider to attach to in all normal cases.  The only catch is if
the size changes somehow after it's been labeled, but that is usually
the sign of something else wrong that will eventually bite you.  It
could simply revert back to the current behavior in that case.
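
A rough sketch of that selection rule, assuming the metadata records the size of the provider it was written to. The function name and the idea of seeing all candidates at once are made up for illustration; real GEOM classes taste providers one at a time, and this is not the gmirror on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick which tasted provider to attach to (hypothetical helper).
 *   recorded_size  size in bytes stored in the metadata at label time
 *   sizes[]        candidate provider sizes in bytes, in taste order
 * Returns an index into sizes[], or -1 if there are no candidates. */
int choose_provider(uint64_t recorded_size, const uint64_t *sizes, size_t n)
{
    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (sizes[i] == recorded_size)
            return (int)i;
    /* No size matches (for example, it changed after labeling):
     * fall back to the current behavior, the first provider tasted. */
    return n > 0 ? 0 : -1;
}
```

With a whole disk and a slice that end on the same sector, the two candidates still differ in size, so the recorded size picks out the slice even if the disk is tasted first.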

Craig


Re: gmirror oddities

2005-05-03 Thread Eirik Øverby
On 03-05-05 20:34, "George Hartzell" <[EMAIL PROTECTED]> wrote:

> 
> Eirik Øverby writes:
>> Hi!
>> 
>> I've been using gmirror for a while to safeguard my system disks. I have
>> taken the slice-based mirror approach, where I use, say, ad0s1 and ad2s1 as
>> providers.
>> On one of my servers, this seems to be impossible. I create the mirror using
>> ad2s1 first (to keep my system running while I do some of the work), and
>> then I re-initialize ad0s1 (making it exactly the size of ad2s1) before
>> using gmirror insert to add it to the mirror.
>> However, at this point - when doing a gmirror list - it turns out that it
>> never added ad0s1 as a provider, but ad0 itself! As a result, I now have a
>> load of slices (ad0a, ad0b, ad0d, ad0e, ad0f) instead of having the same
>> structure as I have on ad2s1. It's just like ad2s1, just without the "s1"
>> part.
>> 
>> I've tried "dd if=/dev/zero of=/dev/ad0 bs=65536" a couple of times, in case
>> some old provider metadata was stored there. I also have exactly the same
>> setup in another server, the only difference being that it behaves as
>> expected..
>> 
>> Am I doing something blatantly wrong here? This IS supposed to work, right?
>> I've even found a very nice description of how to do it at
>> http://people.freebsd.org/~rse/mirror/
>> confirming that what I'm doing is right.
>> 
>> I'm on 5.4-PRERELEASE, but this problem has been there since 5.3-p2 or
>> something, which was when I first tried this.
> 
> I bet you're getting bitten by a problem that bit me.  It's described
> in the fine print in http://people.freebsd.org/~rse/mirror/.
> 
> Gmirror saves its metadata on the last sector of its disk space.
> Since the slice (adXs1) and the disk device (adX) end at the same
> place on the disk, gmirror gets confused.  It tastes devices in a
> particular order, apparently devices first, then slices.  It finds the
> metadata when it tastes adX and goes ahead and uses it, even though it
> should be associating it w/ adXsY.  Hilarity ensues.
> 
> The fix is described in the fourth comment block of Ralf's doc, either
> make the slice a sector smaller than the disk device or hardcode the
> provider name.  I've been using the hardcoding approach, and it seems
> to work for me.

Same here, tried that immediately after my last post.
Apologies for the noise ;)

/Eirik




Experimental ttwwakeup() panic patch

2005-05-03 Thread Doug White
Hey folks,

I've taken a crack at working around the ttwwakeup() panic that's been
reported now and again.  My early analysis, based on debugging output from
rwatson, is that a defunct struct tty gets reused without cleaning out the
associated (stale) knote structures, and the ttwwakeup() at the end of
sioopen() jumps off into space when it finds them.
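
A toy model of the failure mode described above (a recycled structure that still carries an old list), not the contents of the patch. All names here are invented stand-ins; the real code deals with struct tty and knote lists inside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a kernel event note hanging off a tty. */
struct note {
    struct note *next;
};

/* Stand-in for struct tty: only the fields this sketch needs. */
struct tty_like {
    struct note *notes;  /* notes to wake when the tty becomes writable */
    int in_use;
};

/* Retire a tty without touching its note list; after this the list
 * may point at memory that now belongs to someone else. */
void tty_retire(struct tty_like *tp)
{
    tp->in_use = 0;
}

/* Hand a retired tty out again, dropping any leftover notes first so
 * a later wakeup cannot follow stale pointers. */
void tty_reuse(struct tty_like *tp)
{
    assert(!tp->in_use);
    tp->notes = NULL;
    tp->in_use = 1;
}

/* Walk the note list the way a wakeup would; returns notes visited. */
int tty_wakeup(const struct tty_like *tp)
{
    int visited = 0;
    for (const struct note *n = tp->notes; n != NULL; n = n->next)
        visited++;
    return visited;
}
```

Without the reset in tty_reuse(), the walk in tty_wakeup() would still follow whatever the previous owner left behind.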

This patch is against RELENG_5 but the logic should apply to -CURRENT,
although the patch likely won't as ttymalloc() is organized differently
there.

I did some basic testing on my UP box and didn't see any aberrant behavior
afterwards. However, I can't reproduce the panic in question, so if you're
good at triggering the panic, give this a spin.

http://people.freebsd.org/~dwhite/tty.c.20050503.patch

-- 
Doug White|  FreeBSD: The Power to Serve
[EMAIL PROTECTED]  |  www.FreeBSD.org


Re: gmirror oddities

2005-05-03 Thread George Hartzell

Eirik Øverby writes:
 > Hi!
 > 
 > I've been using gmirror for a while to safeguard my system disks. I have
 > taken the slice-based mirror approach, where I use, say, ad0s1 and ad2s1 as
 > providers.
 > On one of my servers, this seems to be impossible. I create the mirror using
 > ad2s1 first (to keep my system running while I do some of the work), and
 > then I re-initialize ad0s1 (making it exactly the size of ad2s1) before
 > using gmirror insert to add it to the mirror.
 > However, at this point - when doing a gmirror list - it turns out that it
 > never added ad0s1 as a provider, but ad0 itself! As a result, I now have a
 > load of slices (ad0a, ad0b, ad0d, ad0e, ad0f) instead of having the same
 > structure as I have on ad2s1. It's just like ad2s1, just without the "s1"
 > part.
 > 
 > I've tried "dd if=/dev/zero of=/dev/ad0 bs=65536" a couple of times, in case
 > some old provider metadata was stored there. I also have exactly the same
 > setup in another server, the only difference being that it behaves as
 > expected..
 > 
 > Am I doing something blatantly wrong here? This IS supposed to work, right?
 > I've even found a very nice description of how to do it at
 > http://people.freebsd.org/~rse/mirror/
 > confirming that what I'm doing is right.
 > 
 > I'm on 5.4-PRERELEASE, but this problem has been there since 5.3-p2 or
 > something, which was when I first tried this.

I bet you're getting bitten by a problem that bit me.  It's described
in the fine print in http://people.freebsd.org/~rse/mirror/.

Gmirror saves its metadata on the last sector of its disk space.
Since the slice (adXs1) and the disk device (adX) end at the same
place on the disk, gmirror gets confused.  It tastes devices in a
particular order, apparently devices first, then slices.  It finds the
metadata when it tastes adX and goes ahead and uses it, even though it
should be associating it w/ adXsY.  Hilarity ensues.

The fix is described in the fourth comment block of Ralf's doc, either
make the slice a sector smaller than the disk device or hardcode the
provider name.  I've been using the hardcoding approach, and it seems
to work for me.
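
The arithmetic behind the collision can be sketched as follows. This assumes 512-byte sectors and metadata kept in a provider's final sector, as described above; the struct and function names are made up for illustration and are not GEOM APIs.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u  /* assumed sector size */

/* Hypothetical description of a provider: where it starts on the
 * physical disk and how long it is, both in sectors. */
struct provider {
    uint64_t start;   /* first sector on the physical disk */
    uint64_t length;  /* size in sectors */
};

/* Absolute byte offset of the provider's last sector on the disk. */
uint64_t metadata_offset(const struct provider *p)
{
    assert(p->length > 0);
    return (p->start + p->length - 1) * (uint64_t)SECTOR_SIZE;
}

/* Nonzero when both providers would keep metadata in the same sector. */
int metadata_collides(const struct provider *a, const struct provider *b)
{
    return metadata_offset(a) == metadata_offset(b);
}
```

For a 1000-sector disk, a slice starting at sector 63 with 937 sectors ends on the disk's last sector and collides; shrinking it to 936 sectors moves its metadata one sector earlier and the ambiguity goes away.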

g.


Re: RC4 will not install using FTP

2005-05-03 Thread Ken Smith
On Tue, 2005-05-03 at 16:54 +0100, Pete French wrote:
> Just been trying to install a new machine with 5.4-RC4. I normally
> do this using the bootonly image and FTP. But doing an FTP install
> always gives me the error "/: write failed, filesystem full" when
> it fetches the first chunk and tries to write it out.
> 
> I get exactly the same problem doing an FTP install from the full
> disc1.iso, so it isn't a problem with the bootonly iso image.
> 
> Installing from the full CD works fine, however. Included below is
> the dmesg in case my hardware has something to do with this.

I haven't been able to reproduce this yet.  I've successfully done an
FTP install both from an initial floppy boot and from an initial disc1
CD boot.  So it's not a "totally generic" problem, it might have
something to do with the pathway you are following through sysinstall.

Was this a brand new install or were you installing over top of a
pre-existing install?  If the latter did you recycle the existing disk
partitions or have it wipe out everything and re-do them from scratch?

Thanks.

-- 
Ken Smith
- From there to here, from here to  |   [EMAIL PROTECTED]
  there, funny things are everywhere.   |
  - Theodore Geisel |





Re: panic on RELENG_5_4

2005-05-03 Thread Scott Long
Marcus Grando wrote:
Hi,
Panic on RELENG_5_4 (cvsup yesterday).
Waiting 5 seconds for SCSI devices to settle
panic: mutex Giant not owned at /usr/src/cam/cam_xpt.c:4825
cpuid = 0
KDB: enter: panic
[thread pid 35 tid 100031 ]
Stopped at  kdb_enter+0x2b: nop
db> where
Tracing pid 35 tid 100031 td 0xc22fb180
kdb_enter(c06e73f5) at kdb_enter+0x2b
panic(c06e68be,c06f7be6,c06c08da,12d9,c2608800) at panic+0x127
_mtx_assert(c074c660,1,c06c08da,12d9) at _mtx_assert+0x5c
xpt_done(c2608800,c23b6138,c23af000,c23af000,e4df8c54) at xpt_done+0x1d
amr_cam_complete_extcdb(c23b6138)at amr_cam_complete_extcdb+0c3
amr_complete(c23af000,0,54e3,0,d79200) at amr_complete+0x5e
amr_done(c23af000,c23af94c,0,c06dbc1d,1ae) at amr_done+0xb2
amr_pci_intr(c23af000) at amr_pci_intr+0x26
ithread_loop(c2306a00,e4df8d48,c2306a00,c0534f64,0) at ithread_loop+0x124
fork_exit(c0534f64,c2306a00,e4df8d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe4df8d7c, ebp = 0 ---
db>
If you need more information, please ask.
Regards

Could you try the patch at
http://people.freebsd.org/~scottl/amr_cam.5.4.diff
I'll make sure to get it into the release if it works.  It looks like I
had wanted to fix this exact problem in 6-current several months ago,
but chose to fix xpt_done() instead (something that I don't want to
merge back to 5-stable right now).
Scott


Re: FreeBSD 5.4 release status

2005-05-03 Thread Robert Watson
On Mon, 2 May 2005, Jonathan Noack wrote:
> Hi, I am a bit confused here. I have seen a post from someone using
> 5.4-STABLE; how is that possible if 5.4 isn't RELEASE yet? And good news
> on the bug fix.
>
> When RELENG_5_4 was branched, RELENG_5 went from 5.4-PRERELEASE to
> 5.4-STABLE to reflect that it is once again open for less-restricted
> development.  As 5.4 will be released via the RELENG_5_4 branch, it is
> normal and expected to have 5.4-STABLE around before 5.4-RELEASE.

Casey Schaufler (ex-SGI, previously ex-Sun) gave a great talk at a 
workshop I was at recently relating to closed and open source release 
processes.  One thing he pointed out that I found quite insightful is that 
what a development organization does is release source code and a build 
environment. Specifically, they release it to the release engineers, who 
then release a product, and may iterate some on the source product before 
turning it into the product.  We represent that in the FreeBSD world 
through a notion of release engineering branches: when the developer team 
is ready to generate a source product, we generate a branch.  It's done 
with the help of the release engineering team, but at some point the 
correlation between the source development branch and the release
engineering branch becomes lower and they are essentially independent.

Casey was careful to point out that what a release engineering team 
releases may be quite different from the development organization's 
product -- it may have custom patches, custom build changes, documentation 
(release notes), logos, and who knows what else.

While there's overlap in these processes, and there's no cut and dry 
hand-off as a result of some iteration on the release process, I think 
this is a useful world view, and it explains why branches become -STABLE, 
etc.  The development organization, once it's cut its source release to the
release engineering team, goes back to doing what it does: building 
software for its next release.  We used to make the handoff synchronous, 
and we found that caused a lot of heartache.  The loosely synchronous 
model we have now (where people are a bit restrained) appears to work 
better.

Robert N M Watson


Re: panic on RELENG_5_4

2005-05-03 Thread Scott Long
Marcus Grando wrote:
Hi,
Panic on RELENG_5_4 (cvsup yesterday).
Waiting 5 seconds for SCSI devices to settle
panic: mutex Giant not owned at /usr/src/cam/cam_xpt.c:4825
cpuid = 0
KDB: enter: panic
[thread pid 35 tid 100031 ]
Stopped at  kdb_enter+0x2b: nop
db> where
Tracing pid 35 tid 100031 td 0xc22fb180
kdb_enter(c06e73f5) at kdb_enter+0x2b
panic(c06e68be,c06f7be6,c06c08da,12d9,c2608800) at panic+0x127
_mtx_assert(c074c660,1,c06c08da,12d9) at _mtx_assert+0x5c
xpt_done(c2608800,c23b6138,c23af000,c23af000,e4df8c54) at xpt_done+0x1d
amr_cam_complete_extcdb(c23b6138)at amr_cam_complete_extcdb+0c3
amr_complete(c23af000,0,54e3,0,d79200) at amr_complete+0x5e
amr_done(c23af000,c23af94c,0,c06dbc1d,1ae) at amr_done+0xb2
amr_pci_intr(c23af000) at amr_pci_intr+0x26
ithread_loop(c2306a00,e4df8d48,c2306a00,c0534f64,0) at ithread_loop+0x124
fork_exit(c0534f64,c2306a00,e4df8d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe4df8d7c, ebp = 0 ---
db>
If you need more information, please ask.
Regards

I'll look at this, thanks.
Scott


Re: ATA mkIII suspend problem

2005-05-03 Thread Søren Schmidt
Andrew Heybey wrote:
I just tried RELENG_5 as of last week and the latest (April 13) ATA
mkIII patches from http://people.freebsd.org/~sos/ATA/ on my laptop. 
Unfortunately, it breaks suspend-to-RAM (S3).

History:
I first tried RELENG_5 on the laptop (a Toshiba Tecra M2V) in January
and suspend did not work (the laptop hung after reinitializing the ATA
controller). Then I tried the first release of ATA mkIII. That first
version of the new ATA code made suspend work, and I was happy.
Last week, I tried upgrading to the latest RELENG_5 and the newest ATA
mkIII code, and now after suspending the kernel panics when reiniting
the ATA device(s) in ata-all.c:ata_reinit(), about line 217:
    /* reinit the children and delete any that fails */
    if (!device_get_children(dev, &children, &nchildren)) {
        mtx_lock(&Giant);   /* newbus suckage it needs Giant */
        for (i = 0; i < nchildren; i++) {
            if (children[i] && device_is_attached(children[i]))
                if (ATA_REINIT(children[i])) {
                    if (ch->running->dev == children[i]) {
                        device_printf(ch->running->dev,
                                      "FAILURE - device detached\n");
                        ch->running->dev = NULL;
                        ch->running = NULL;
                    }
                    device_delete_child(dev, children[i]);
                }
        }
        free(children, M_TEMP);
        mtx_unlock(&Giant); /* newbus suckage dealt with, release Giant */
    }
The problem is that ch->running is NULL at this point.
Any suggestions on how to further debug or fix?
That has since been fixed in -current; just replace the line with:
if (ch->running && ch->running->dev == children[i]) {
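
The replacement line relies on C's short-circuit `&&`: the right-hand comparison is only evaluated when `ch->running` is non-NULL. A stand-alone sketch of the same guard, using stand-in types rather than the real ATA structures (the struct and function names below are assumptions for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types; only the fields used here. */
struct request { void *dev; };
struct channel { struct request *running; };

/* Returns 1 if the channel's running request belongs to `child`.
 * The NULL test runs first, so running->dev is never read through
 * a NULL pointer. */
int running_on(const struct channel *ch, const void *child)
{
    assert(ch != NULL);
    return ch->running != NULL && ch->running->dev == child;
}
```

An idle channel (running == NULL) simply yields 0 here, which matches the case that panicked on resume.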
--
-Søren


RC4 will not install using FTP

2005-05-03 Thread Pete French
Just been trying to install a new machine with 5.4-RC4. I normally
do this using the bootonly image and FTP. But doing an FTP install
always gives me the error "/: write failed, filesystem full" when
it fetches the first chunk and tries to write it out.

I get exactly the same problem doing an FTP install from the full
disc1.iso, so it isn't a problem with the bootonly iso image.

Installing from the full CD works fine, however. Included below is
the dmesg in case my hardware has something to do with this.

-pcf.

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RC4 #0: Mon May  2 00:08:36 UTC 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table: 
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.40GHz (2392.04-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff
  Hyperthreading: 2 logical CPUs
real memory  = 536301568 (511 MB)
avail memory = 515141632 (491 MB)
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard
ioapic2  irqs 48-71 on motherboard
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0:  on acpi0
acpi_button0:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib1:  mem 0xee00-0xefff at device 1.0 on pci0
pci1:  on pcib1
pcib2:  at device 2.0 on pci0
pci2:  on pcib2
pci2:  at device 28.0 (no driver attached)
pcib3:  at device 29.0 on pci2
pci3:  on pcib3
mpt0:  port 0xec00-0xecff mem 0xfe3c-0xfe3d,0xfe3e-0xfe3f irq 25 at device 12.0 on pci3
mpt1:  port 0xe800-0xe8ff mem 0xfe38-0xfe39,0xfe3a-0xfe3b irq 26 at device 12.1 on pci3
em0:  port 0xe4c0-0xe4ff mem 0xfe36-0xfe37 irq 24 at device 14.0 on pci3
em0: Ethernet address: 00:0b:db:c6:06:68
em0:  Speed:N/A  Duplex:N/A
pci2:  at device 30.0 (no driver attached)
pcib4:  at device 31.0 on pci2
pci4:  on pcib4
uhci0:  port 0xff80-0xff9f irq 16 at device 29.0 on pci0
usb0:  on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1:  port 0xff60-0xff7f irq 19 at device 29.1 on pci0
usb1:  on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2:  port 0xff40-0xff5f irq 18 at device 29.2 on pci0
usb2:  on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0:  at device 29.7 (no driver attached)
pcib5:  at device 30.0 on pci0
pci5:  on pcib5
pcib6:  at device 13.0 on pci5
pci6:  on pcib6
pci6:  at device 0.0 (no driver attached)
pci6:  at device 4.0 (no driver attached)
pci6:  at device 8.0 (no driver attached)
pci6:  at device 12.0 (no driver attached)
isab0:  at device 31.0 on pci0
isa0:  on isab0
atapci0:  port 0xffa0-0xffaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0:  at device 31.3 (no driver attached)
pci0:  at device 31.5 (no driver attached)
fdc0:  port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0:  port 0x64,0x60 irq 1 on acpi0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
psm0:  irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0:  port 0x778-0x77f,0x378-0x37f irq 7 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0:  on ppc0
plip0:  on ppbus0
lpt0:  on ppbus0
lpt0: Interrupt-driven port
ppi0:  on ppbus0
orm0:  at iomem 0xce000-0xc,0xcc800-0xcdfff,0xc8800-0xcc7ff,0xc-0xc87ff on isa0
pmtimer0 on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
Timecounter "TSC" frequency 2392044020 Hz quality 800
Timecounters tick every 10.000 msec
acd0: CDRW  at ata1-master PIO4
Waiting 15 seconds for SCSI devices to settle
da0 at mpt1 bus 0 target 0 lun 0
da0:  Fixed Direct Access SCSI-3 device
da0: 320.000MB/s transfers (160.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da0: 34732MB (71132959 512 byte sectors: 255H 63S/T 4427C)
Mounting root from ufs:/dev/da0s2a
em0: Link is up 100 Mbps Full Duplex


ATA mkIII suspend problem

2005-05-03 Thread Andrew Heybey
I just tried RELENG_5 as of last week and the latest (April 13) ATA
mkIII patches from http://people.freebsd.org/~sos/ATA/ on my laptop. 
Unfortunately, it breaks suspend-to-RAM (S3).

History:

I first tried RELENG_5 on the laptop (a Toshiba Tecra M2V) in January
and suspend did not work (the laptop hung after reinitializing the ATA
controller). Then I tried the first release of ATA mkIII. That first
version of the new ATA code made suspend work, and I was happy.

Last week, I tried upgrading to the latest RELENG_5 and the newest ATA
mkIII code, and now after suspending the kernel panics when reiniting
the ATA device(s) in ata-all.c:ata_reinit(), about line 217:

    /* reinit the children and delete any that fails */
    if (!device_get_children(dev, &children, &nchildren)) {
        mtx_lock(&Giant);   /* newbus suckage it needs Giant */
        for (i = 0; i < nchildren; i++) {
            if (children[i] && device_is_attached(children[i]))
                if (ATA_REINIT(children[i])) {
                    if (ch->running->dev == children[i]) {
                        device_printf(ch->running->dev,
                                      "FAILURE - device detached\n");
                        ch->running->dev = NULL;
                        ch->running = NULL;
                    }
                    device_delete_child(dev, children[i]);
                }
        }
        free(children, M_TEMP);
        mtx_unlock(&Giant); /* newbus suckage dealt with, release Giant */
    }

The problem is that ch->running is NULL at this point.

Any suggestions on how to further debug or fix?

thanks,
andrew


Re: PostgreSQL in FreeBSD jails

2005-05-03 Thread Marc G. Fournier
On Tue, 3 May 2005, [iso-8859-1] Dag-Erling Smørgrav wrote:
> PostgreSQL has always had this problem, both on 4.x and 5.x.  A hack was
> put in place last November to work around it, but it still exists, and
> while it may now be possible (with 8.0) for multiple postmasters to run
> on the same machine

'k, I've been doing multiple since 7.2 on the same machine, all on the
same port, all different IPs, all on 4.x servers ... have never had an
issue with crashes (it's pretty much my most stable 4.x server) ...

In fact:
# ps aux | grep postmaster | egrep -v "postmaster:" | sort +1 -n
scrappy 36569  0.0  0.0 14552  600  ??  IsJ  19Apr05   0:21.48 
/usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data -S (postgres)
scrappy 36675  0.0  0.0 258184 1052  ??  SsJ  19Apr05  14:10.24 
/usr/local/bin/postmaster -D /usr/local/pgsql/data -S (postgres)
scrappy 36865  0.0  0.0 16556  836  ??  IsJ  19Apr05   0:14.17 
/usr/local/bin/postmaster -D /usr/local/pgsql/data -S (postgres)
pgsql   37518  0.0  0.0 16400  396  ??  IsJ  19Apr05   0:04.02 
/usr/local/bin/postmaster (postgres)
pgsql   37815  0.0  0.0  8144  436  p9- IJ   19Apr05   0:14.62 
/usr/local/bin/postmaster (postgres)
pgsql   37962  0.0  0.0  8680  560  ??  IsJ  19Apr05   0:08.72 
/usr/local/bin/postmaster (postgres)
pgsql   38168  0.0  0.0 16400  452  ??  IsJ  19Apr05   0:37.69 
/usr/local/bin/postmaster (postgres)
pgsql   38316  0.0  0.0  7144  464  ??  IsJ  19Apr05   0:04.08 
/usr/local/bin/postmaster (postgres)
pgsql   38458  0.0  0.0  7208  380  ??  IsJ  19Apr05   0:04.01 
/usr/local/bin/postmaster (postgres)
pgsql   38596  0.0  0.0  6952  452  ??  IsJ  19Apr05   0:03.90 
/usr/local/bin/postmaster (postgres)
scrappy 38717  0.0  0.0  6952  436  ??  IsJ  19Apr05   0:03.98 
/usr/local/bin/postmaster (postgres)
pgsql   38868  0.0  0.0  8224  552  ??  SsJ  19Apr05   0:03.39 
/usr/local/bin/postmaster -D /usr/local/pgsql/data (postgres)
pgsql   38993  0.0  0.0  7912  584  ??  IsJ  19Apr05   0:06.41 
/usr/local/bin/postmaster (postgres)
pgsql   39126  0.0  0.0  7480  400  ??  IsJ  19Apr05   0:01.80 
/usr/local/bin/postmaster -D /usr/local/pgsql/data (postgres)
pgsql   87544  0.0  0.1  7948 3528  ??  IsJ  Sun08PM   0:00.78 
/usr/local/bin/postmaster -D /usr/local/pgsql/data (postgres)
# ipcs -a | fgrep -f /tmp/pids | sort +10 -n
m   327683 5432003 --rw--- scrappy 1001  scrappy 1001    7  10256384 36569 38717  8:40:46 11:51:28  8:37:57
m   131076 5432004 --rw--- scrappy 1001  scrappy 1001  100 257957888 36675 38717  8:40:46 11:54:16  8:38:04
m 10092549 5432005 --rw--- scrappy 1001  scrappy 1001   12  10362880 36865 38717  8:40:46 11:29:20  8:38:25
m   131080 5432007 --rw--- pgsql   pgsql pgsql   pgsql   1  10436608 37518 39126  8:41:20 11:53:08  8:39:18
m   131081 5432008 --rw--- pgsql   pgsql pgsql   pgsql   6   2449408 37815 39126  8:41:20 11:52:32  8:39:43
m   393226 5432009 --rw--- pgsql   pgsql pgsql   pgsql   9   2596864 37962 39126  8:41:20 11:50:25  8:39:55
m   131083 5432010 --rw--- pgsql   pgsql pgsql   pgsql   1  10436608 38168 39126  8:41:20 11:52:15  8:40:06
m  1048588 5432011 --rw--- pgsql   pgsql pgsql   pgsql   1   1024000 38316 39126  8:41:20 11:51:53  8:40:19
m   131085 5432012 --rw--- pgsql   pgsql pgsql   pgsql   1   1024000 38458 39126  8:41:20 11:50:28  8:40:29
m   131086 5432013 --rw--- pgsql   pgsql pgsql   pgsql   1    761856 38596 39126  8:41:20 11:53:02  8:40:38
m   131087 5432014 --rw--- scrappy 1001  scrappy 1001    1    761856 38717 38717  8:40:46 11:50:42  8:40:46
m   131088 5432015 --rw--- pgsql   pgsql pgsql   pgsql   2    811008 38868 39126  8:41:20  1:59:39  8:40:58
m  1507345 5432016 --rw--- pgsql   pgsql pgsql   pgsql   1    761856 38993 39126  8:41:20 11:50:37  8:41:07
m  1310738 5432017 --rw--- pgsql   pgsql pgsql   pgsql   2    811008 39126 39126  8:41:20  0:59:01  8:41:20
m   196615 5432001 --rw--- pgsql   pgsql pgsql   pgsql  11   1548288 87544 87544 20:32:30 11:50:56 20:32:30
So, unless I'm missing something here, each postmaster is acquiring its 
own ID, and the above servers consist of the following versions (all of 
which are built from ports):

   1 postgresql-7.2.4_2
   1 postgresql-7.4.1
   1 postgresql-7.4.1_1
   1 postgresql-7.4.2
   2 postgresql-7.4.5
   4 postgresql-7.4.6
   1 postgresql-devel-8.0.0,1
   1 postgresql-server-7.4.7
   1 postgresql-server-7.4.7_3
   1 postgresql-server-8.0.0
   1 postgresql-server-8.0.1_3
So, unless I'm missing something, 4.x did allow for running multiple 
PostgreSQL servers, on the same machine, in multiple jails, each with 
their own distinct shared memory segment ... or am I mis-reading the 
above?
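
The segment keys in the ipcs output all fall between 5432001 and 5432017, which would be consistent with each postmaster probing upward from port * 1000 until it finds an unused key. The thread does not state that scheme, so treat it as an assumption. The toy model below only simulates one key table shared by every jail; it makes no real IPC calls, and it says nothing about semaphores or about how a postmaster decides that a key is still in use.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEGMENTS 64

/* One table for the whole machine: this models jails that all share
 * the host's System V IPC key namespace. */
static long taken_keys[MAX_SEGMENTS];
static size_t taken_count;

static int key_in_use(long key)
{
    for (size_t i = 0; i < taken_count; i++)
        if (taken_keys[i] == key)
            return 1;
    return 0;
}

/* Claim the first free key above port * 1000 (assumed key scheme).
 * Returns the key, or -1 if the table is full. */
long claim_key(int port)
{
    assert(port > 0);
    if (taken_count == MAX_SEGMENTS)
        return -1;
    long key = (long)port * 1000 + 1;
    while (key_in_use(key))
        key++;
    taken_keys[taken_count++] = key;
    return key;
}
```

Two simulated postmasters on port 5432 end up with keys 5432001 and 5432002 as long as each one correctly sees the other's key as taken; that visibility check is the step the model takes for granted.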

> it is also still possible for malicious code in one jail to crash
> postmasters in other jails.

That one I can agree with, which is why all our

Re: kern/78824: race condition close()ing and read()ing the same socketpair on SMP.

2005-05-03 Thread Marc Olzheim
Is this going to be fixed before 5.4 ? It still breaks on today's
5.4-STABLE.

Marc
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Instant reboot FreeBSD 5.4-STABLE amd64

2005-05-03 Thread Marc Olzheim
On Tue, May 03, 2005 at 04:43:13PM +0200, Marc Olzheim wrote:
> FreeBSD 5.4-STABLE #12: Mon May  2 19:23:22 CEST 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/HAMMER
> 
Doing exactly the same on i386 SMP results in a hanging gdb and the
process exiting with signal 8.

blackmetal:~#ps uaxwlp 759   
USER     PID %CPU %MEM  VSZ  RSS TT STAT STARTED    TIME COMMAND      UID PPID CPU PRI NI MWCHAN
marcolz  759  0.0  0.1 5976 5440 p0 I+    4:45PM 0:00.07 gdb ./fpu5th 104  650   0   8  0 wait
blackmetal:~#

Marc




Instant reboot FreeBSD 5.4-STABLE amd64

2005-05-03 Thread Marc Olzheim
FreeBSD 5.4-STABLE #12: Mon May  2 19:23:22 CEST 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/HAMMER

hammer:~/src/hak/fpu>gdb ./fpu5th 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) break floor
Function "floor" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y

Breakpoint 1 (floor) pending.
(gdb) r
Starting program: /vwww.mnt/sources/srcimport/marcolz/hak/fpu/fpu5th 
Breakpoint 2 at 0x80063fdf0
Pending breakpoint "floor" resolved
load: 0.08  cmd: fpu5th 917 [running] 1.69u 0.01s 4% 1192k

Program received signal SIGINFO, Information request.
[Switching to Thread 1 (LWP 100167)]
0x000000080076558c in pthread_testcancel () from /usr/lib/libpthread.so.1
(gdb) disass floorl
No symbol "floorl" in current context.
(gdb) disass floorf
Dump of assembler code for function floorf:
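0x000000080063fd40 <floorf+0>:   movss  %xmm0,0xfffffffffffffffc(%rsp)
0x000000080063fd46 <floorf+6>:   mov    0xfffffffffffffffc(%rsp),%edx
0x000000080063fd4a <floorf+10>:  mov    %edx,%eax
0x000000080063fd4c <floorf+12>:  sar    $0x17,%eax
0x000000080063fd4f <floorf+15>:  and    $0xff,%eax
0x000000080063fd54 <floorf+20>:  lea    0xffffffffffffff81(%rax),%ecx
0x000000080063fd57 <floorf+23>:  cmp    $0x16,%ecx
0x000000080063fd5a <floorf+26>:  jg     0x80063fda1 <floorf+97>
0x000000080063fd5c <floorf+28>:  test   %ecx,%ecx
0x000000080063fd5e <floorf+30>:  js     0x80063fdb1 <floorf+113>
0x000000080063fd60 <floorf+32>:  mov    $0x7fffff,%esi
0x000000080063fd65 <floorf+37>:  movaps %xmm0,%xmm1
0x000000080063fd68 <floorf+40>:  sar    %cl,%esi
0x000000080063fd6a <floorf+42>:  test   %edx,%esi
0x000000080063fd6c <floorf+44>:  je     0x80063fd9d <floorf+93>
0x000000080063fd6e <floorf+46>:  addss  3226(%rip),%xmm0        # 0x800640a10 <_fini+152>
0x000000080063fd76 <floorf+54>:  ucomiss 3223(%rip),%xmm0        # 0x800640a14 <_fini+156>
0x000000080063fd7d <floorf+61>:  jbe    0x80063fd90 <floorf+80>
0x000000080063fd7f <floorf+63>:  test   %edx,%edx
0x000000080063fd81 <floorf+65>:  js     0x80063fdd9 <floorf+153>
---Type <return> to continue, or q <return> to quit---
0x000000080063fd83 <floorf+67>:  mov    %esi,%eax
0x000000080063fd85 <floorf+69>:  not    %eax
0x000000080063fd87 <floorf+71>:  and    %eax,%edx
0x000000080063fd89 <floorf+73>:  data16
0x000000080063fd8a <floorf+74>:  data16
0x000000080063fd8b <floorf+75>:  data16
0x000000080063fd8c <floorf+76>:  nop
0x000000080063fd8d <floorf+77>:  data16
0x000000080063fd8e <floorf+78>:  data16
0x000000080063fd8f <floorf+79>:  nop
0x000000080063fd90 <floorf+80>:  mov    %edx,0xfffffffffffffffc(%rsp)
0x000000080063fd94 <floorf+84>:  movss  0xfffffffffffffffc(%rsp),%xmm0
0x000000080063fd9a <floorf+90>:  movaps %xmm0,%xmm1
0x000000080063fd9d <floorf+93>:  movaps %xmm1,%xmm0
0x000000080063fda0 <floorf+96>:  retq
0x000000080063fda1 <floorf+97>:  add    $0xffffff80,%ecx
0x000000080063fda4 <floorf+100>: movaps %xmm0,%xmm1
0x000000080063fda7 <floorf+103>: jne    0x80063fd9d <floorf+93>
0x000000080063fda9 <floorf+105>: addss  %xmm0,%xmm1
0x000000080063fdad <floorf+109>: movaps %xmm1,%xmm0
0x000000080063fdb0 <floorf+112>: retq
0x000000080063fdb1 <floorf+113>: addss  3159(%rip),%xmm0        # 0x800640a10 <_fini+152>
---Type <return> to continue, or q <return> to quit---
0x000000080063fdb9 <floorf+121>: ucomiss 3156(%rip),%xmm0        # 0x800640a14 <_fini+156>
0x000000080063fdc0 <floorf+128>: jbe    0x80063fd90 <floorf+80>
0x000000080063fdc2 <floorf+130>: test   %edx,%edx
0x000000080063fdc4 <floorf+132>: js     0x80063fdca <floorf+138>
0x000000080063fdc6 <floorf+134>: xor    %edx,%edx
0x000000080063fdc8 <floorf+136>: jmp    0x80063fd90 <floorf+80>
0x000000080063fdca <floorf+138>: test   $0x7fffffff,%edx
0x000000080063fdd0 <floorf+144>: je     0x80063fd90 <floorf+80>
0x000000080063fdd2 <floorf+146>: mov    $0xbf800000,%edx
0x000000080063fdd7 <floorf+151>: jmp    0x80063fd90 <floorf+80>
0x000000080063fdd9 <floorf+153>: mov    $0x800000,%eax
0x000000080063fdde <floorf+158>: sar    %cl,%eax
0x000000080063fde0 <floorf+160>: add    %eax,%edx
0x000000080063fde2 <floorf+162>: jmp    0x80063fd83 <floorf+67>
0x000000080063fde4 <floorf+164>: nop
0x000000080063fde5 <floorf+165>: nop
0x000000080063fde6 <floorf+166>: nop
0x000000080063fde7 <floorf+167>: nop
0x000000080063fde8 <floorf+168>: nop
0x000000080063fde9 <floorf+169>: nop
0x000000080063fdea <floorf+170>: nop
0x000000080063fdeb <floorf+171>: nop
---Type <return> to continue, or q <return> to quit---
0x000000080063fdec <floorf+172>: nop
0x000000080063fded <floorf+173>: nop
0x000000080063fdee <floorf+174>: nop
0x000000080063fdef <floorf+175>: nop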
End of assembler dump.
(gdb) q
The program is running.  Exit anyway? (y or n) y


*reboot, no dump* :-((

I tried debugging the program I just mailed to threads@

Marc




Re: NFS client/buffer cache deadlock

2005-05-03 Thread Brian Fundakowski Feldman
On Tue, May 03, 2005 at 11:47:00AM +0200, Marc Olzheim wrote:
> On Wed, Apr 27, 2005 at 12:08:57PM -0400, Brian Fundakowski Feldman wrote:
> > Alright, this will do synchronous, instead of short, writes (also,
> > of course, not deadlock the system) if you are trying to use an
> > excessively large buffer size.
> > 
> > 
> > 
> 
> Will this be incorporated in time for 5.4?

It really needs someone else to review the code changes more than just
conceptually to make this kind of an adjustment before release.  It
is not truly an optimal solution, as fully synchronous writes are not
necessary; just limiting the "write window" size and requiring posted
transactions to complete before queueing up more is.  Doing that is
more error-prone, however, and would I think complicate things just to
optimize the speed of a rare case.
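
To make the distinction concrete: fully synchronous writes amount to a
window of one outstanding request, while the "write window" variant
allows a small bounded number.  The following is a rough Python model,
not the NFS client code; windowed_write, the chunk size and the window
size are illustrative assumptions only.

```python
from collections import deque


def windowed_write(total_bytes, chunk, window):
    """Model posting a large write in chunks with a bounded window.

    Returns (peak_bytes_outstanding, waits), where waits counts how
    often the writer had to let the oldest posted chunk complete
    before it could queue another one.
    """
    in_flight = deque()
    sent = peak = waits = 0
    while sent < total_bytes:
        if len(in_flight) == window:
            in_flight.popleft()      # oldest posted write completes
            waits += 1
        size = min(chunk, total_bytes - sent)
        in_flight.append(size)
        sent += size
        peak = max(peak, sum(in_flight))
    return peak, waits


print(windowed_write(81920, 8192, 1))   # synchronous: one chunk at a time
print(windowed_write(81920, 8192, 4))   # bounded window of four chunks
```

In this model both variants keep the amount of buffer space in use
bounded; the windowed one simply waits less often, which is the
optimization being weighed against the extra complexity.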

Still, there are probably a few who would object, in which case they
should do the work of optimizing that side case ;)  An actual
mount_nfs(8) configuration flag and documentation are still missing,
but those things are trivial.

(Forwarded on to -current as well, for additional eyes/testers.)

-- 
Brian Fundakowski Feldman   \'[ FreeBSD ]''\
  <> [EMAIL PROTECTED]   \  The Power to Serve! \
 Opinions expressed are my own.   \,,\


panic on RELENG_5_4

2005-05-03 Thread Marcus Grando
Hi,
I get the following panic on RELENG_5_4 (cvsup'ed yesterday):
Waiting 5 seconds for SCSI devices to settle
panic: mutex Giant not owned at /usr/src/cam/cam_xpt.c:4825
cpuid = 0
KDB: enter: panic
[thread pid 35 tid 100031 ]
Stopped at  kdb_enter+0x2b: nop
db> where
Tracing pid 35 tid 100031 td 0xc22fb180
kdb_enter(c06e73f5) at kdb_enter+0x2b
panic(c06e68be,c06f7be6,c06c08da,12d9,c2608800) at panic+0x127
_mtx_assert(c074c660,1,c06c08da,12d9) at _mtx_assert+0x5c
xpt_done(c2608800,c23b6138,c23af000,c23af000,e4df8c54) at xpt_done+0x1d
amr_cam_complete_extcdb(c23b6138) at amr_cam_complete_extcdb+0xc3
amr_complete(c23af000,0,54e3,0,d79200) at amr_complete+0x5e
amr_done(c23af000,c23af94c,0,c06dbc1d,1ae) at amr_done+0xb2
amr_pci_intr(c23af000) at amr_pci_intr+0x26
ithread_loop(c2306a00,e4df8d48,c2306a00,c0534f64,0) at ithread_loop+0x124
fork_exit(c0534f64,c2306a00,e4df8d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe4df8d7c, ebp = 0 ---
db>
If you need more information, please ask.
Regards
--
Marcus Grando
Grupos Internet S/A
marcus(at)corp.grupos.com.br


Re: MNT_USER?

2005-05-03 Thread Colin Percival
Danny Braniss wrote:
> BTW, this, the MNT_NOEXEC, uncovered, IMHO, a bug in libexec/rtld-elf/rtld.c
> where it's now checking for MNT_NOEXEC, but only if LD_LIBRARY_PATH is set!

This is not a bug.  Checking for MNT_NOEXEC adds a cost in performance, and
it is not necessary if LD_LIBRARY_PATH, LD_PRELOAD, and LD_LIBMAP* are not
set -- based on the assumption, that is, that no (sane) sysadmin would ever
put a MNT_NOEXEC-mounted filesystem into the default library path.

I agree that it's a bit counter-intuitive, but it's really just a case of
saving time by not checking for something which should Never Happen. :-)
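
The decision described above can be sketched as follows.  This is an
illustrative Python model of the logic, not the rtld source: may_load
and env_is_dangerous are hypothetical names, and is_noexec stands in
for whatever statfs(2)-style query reports the MNT_NOEXEC mount flag.
Treating "set" as "present in the environment" is also an assumption.

```python
DANGEROUS_NAMES = ("LD_LIBRARY_PATH", "LD_PRELOAD")
DANGEROUS_PREFIX = "LD_LIBMAP"


def env_is_dangerous(environ):
    """True if a variable that can redirect library loading is present."""
    return any(
        name in DANGEROUS_NAMES or name.startswith(DANGEROUS_PREFIX)
        for name in environ
    )


def may_load(path, environ, is_noexec):
    """Decide whether a shared object at path may be loaded."""
    if not env_is_dangerous(environ):
        return True               # fast path: no filesystem query at all
    return not is_noexec(path)    # slow path: honour the noexec flag


# Pretend every filesystem is mounted noexec.
always_noexec = lambda path: True
print(may_load("/mnt/lib/libfoo.so", {}, always_noexec))
print(may_load("/mnt/lib/libfoo.so", {"LD_PRELOAD": "x"}, always_noexec))
```

The first call succeeds without ever querying the mount flags, which is
the cost saving in question; the second is refused.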

Colin Percival
PS. Bravo to Ian for tracking down the bug in NFS -- I spent a while looking
for this, but got hopelessly lost.


Re: MNT_USER?

2005-05-03 Thread Danny Braniss
> In message <[EMAIL PROTECTED]>, Danny Braniss writes:
> >
> >after doing a mount_nfs as root (from the console or via su), statfs reports
> >that the MNT_USER flag is set! This is also true with 5.4.
> 
> It's a bug in the statfs reporting for NFS filesystems. It should
> be fixed now in -CURRENT (revision 1.174 of sys/nfsclient/nfs_vfsops.c).
> I'll merge this to -STABLE in a week or so.
> 
> Ian

Good, it did indeed fix the problem, and also a previous one that turned
on MNT_NOEXEC.

BTW, this, the MNT_NOEXEC, uncovered, IMHO, a bug in libexec/rtld-elf/rtld.c
where it's now checking for MNT_NOEXEC, but only if LD_LIBRARY_PATH is set!

danny






Re: NFS client/buffer cache deadlock

2005-05-03 Thread Marc Olzheim
On Wed, Apr 27, 2005 at 12:08:57PM -0400, Brian Fundakowski Feldman wrote:
> Alright, this will do synchronous, instead of short, writes (also,
> of course, not deadlock the system) if you are trying to use an
> excessively large buffer size.
> 
> 
> 

Will this be incorporated in time for 5.4?

Marc




Re: kernel: swap_pager: indefinite wait buffer - on 5.3-RELEASE-p5

2005-05-03 Thread Uwe Doering
Oliver Fromme wrote:
> Uwe Doering <[EMAIL PROTECTED]> wrote:
> > Oliver Fromme wrote:
> > > If they're really identical (i.e. the same size and same
> > > geometry), then you can use dd(1) for duplication, like
> > > this:
> > >
> > > # dd if=/dev/ad0 of=/dev/ad1 bs=64k conv=noerror,sync
> > >
> > > The "noerror,sync" part is important so the dd command will
> > > not stop when it hits any bad spots on the source drive and
> > > instead will fill the blocks with zeroes on the destination
> > > drive.  Since it's only the swap partition, you shouldn't
> > > lose any data.
> >
> > I would like to point out that the conclusion you're drawing in the last
> > sentence is invalid IMHO.
>
> I'm afraid I don't agree.
>
> > "indefinite wait buffer" messages at
> > apparently random block numbers just indicate that the pager was unable
> > to access the swap area (in its entirety!) when it wanted to.  It means
> > that the disk drive was either dead at that point in time or busy trying
> > to deal with a bad sector.
> >
> > This sector could have been anywhere on the disk.  It just kept the disk
> > drive busy for long enough that the pager started to complain.
>
> The OP specifically said that the swap_pager messages were
> the only kernel messages that he got.  That indicates that
> only the swap partition is affected, because otherwise
> there would have been other kernel messages indicating
> I/O errors from one of the filesystems on that disk.

Your assumption here is that the filesystem code would become impatient,
too.  This is not the case.  The swap pager has a timeout built in (20
seconds IIRC) after which it prints a warning message and continues
waiting, but there is nothing like this in the filesystem code.

If the disk drive is dead or busy trying to deal with a bad sector in a
filesystem, the kernel will wait silently and indefinitely until either
the disk drive succeeds in recovering the sector, or it fails to do so.
In the latter case the kernel would log an I/O error, but only when
it hears back from the disk drive and not any earlier, in contrast to
the swap pager.  That's why you often see only swap pager messages in
case of a dying disk drive.
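
The difference can be modelled roughly as below.  This is an
illustrative Python simulation, not kernel code: simulate_wait is a
hypothetical name, the 20-second interval is the value recalled above,
and repeating the warning at every timeout is an assumption about what
"continues waiting" means.

```python
def simulate_wait(answer_after, succeeds, warn_interval=None):
    """Return the (time, message) pairs logged while waiting for a drive.

    answer_after:  simulated seconds until the drive reports back.
    succeeds:      whether the drive recovered the sector.
    warn_interval: timeout in seconds for a swap-pager style wait,
                   or None for a silent filesystem-style wait.
    """
    log = []
    if warn_interval is not None:
        t = warn_interval
        while t < answer_after:
            log.append((t, "swap_pager: indefinite wait buffer"))
            t += warn_interval
    if not succeeds:
        log.append((answer_after, "I/O error"))
    return log


# A drive that spends 65 seconds on a bad sector and then recovers it.
print(simulate_wait(65, succeeds=True, warn_interval=20))  # swap pager
print(simulate_wait(65, succeeds=True))                    # filesystem
```

In this model a slow but eventually successful recovery leaves only
swap pager warnings in the log, wherever on the disk the bad sector is.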

I checked the kernel sources, but of course I could have missed the 
relevant lines.  In this case I would appreciate a pointer to the place 
at which the filesystem code generates a warning message comparable to 
that from the swap pager.

   Uwe
--
Uwe Doering |  EscapeBox - Managed On-Demand UNIX Servers
[EMAIL PROTECTED]  |  http://www.escapebox.net


gmirror oddities

2005-05-03 Thread Eirik Øverby
Hi!

I've been using gmirror for a while to safeguard my system disks. I have
taken the slice-based mirror approach, where I use, say, ad0s1 and ad2s1 as
providers.
On one of my servers, this seems to be impossible. I create the mirror using
ad2s1 first (to keep my system running while I do some of the work), and
then I re-initialize ad0s1 (making it exactly the size of ad2s1) before
using gmirror insert to add it to the mirror.
However, at this point - when doing a gmirror list - it turns out that it
never added ad0s1 as a provider, but ad0 itself! As a result, I now have a
load of slices (ad0a, ad0b, ad0d, ad0e, ad0f) instead of having the same
structure as I have on ad2s1. It's just like ad2s1, just without the "s1"
part.

I've tried "dd if=/dev/zero of=/dev/ad0 bs=65536" a couple of times, in case
some old provider metadata was stored there. I also have exactly the same
setup in another server, the only difference being that it behaves as
expected.

Am I doing something blatantly wrong here? This IS supposed to work, right?
I've even found a very nice description of how to do it at
http://people.freebsd.org/~rse/mirror/
confirming that what I'm doing is right.

I'm on 5.4-PRERELEASE, but this problem has been there since 5.3-p2 or
something, which was when I first tried this.

Anyone?

Thanks,
/Eirik

