kernel panic(?) trying to copy data off failed drive with dd

2006-06-11 Thread Greg Lane
G'day everyone,

I have a 250 GB SATA disk with one regular FreeBSD partition 
that is obviously in some distress. 

I can mount the partition, but any attempt to fsck it gives (I think) 
a kernel panic. I say I think because sometimes it hangs forever, 
sometimes it reboots.  One of the panic messages (at one point or 
another, sorry I can't be precise) was in "initiate_write_inodeblock_ufs2".
Errors on the console have been like "FAILURE read_dma timed out" etc. 
When it hangs, the console doesn't always give an error, it just hangs. 

I commented out the drive in fstab so I could boot and started to dd 
the entire disk to an image so that I can try and mount the image, fsck 
it and salvage what I can.  However, when I get to the bad parts of the 
disk the machine hangs.  I copied the first 160 GB or so, then it hung.  
I have since recopied the first part of the disk and am trying to get the 
last part. Recently I have been trying:

dd if=/dev/ad4s1d conv=noerror,sync bs=512 skip=333184000 of=data1-image-p2

I was hoping noerror would skip over the bad parts of the disk, but the 
machine invariably hangs. 
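
For reference, the behaviour hoped for from conv=noerror,sync can be sketched in user space roughly as follows. This is only an illustration under assumptions: the function name copy_skipping_errors and its parameters are hypothetical, and it can only help when the kernel actually returns a read error to the program. It does nothing if the driver hangs before the error ever reaches user space.

```python
import os


def copy_skipping_errors(src_path, dst_path, block_size=512, skip_blocks=0,
                         max_consecutive_errors=1024):
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    Returns the list of source block numbers that could not be read.
    """
    bad_blocks = []
    consecutive_errors = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as dst:
            block = skip_blocks
            while True:
                try:
                    data = os.pread(src, block_size, block * block_size)
                except OSError:
                    # Unreadable block: record it and write zeros instead, so
                    # later data keeps its relative offset in the image.
                    bad_blocks.append(block)
                    consecutive_errors += 1
                    if consecutive_errors >= max_consecutive_errors:
                        break  # give up rather than loop forever
                    data = b"\0" * block_size
                else:
                    consecutive_errors = 0
                    if not data:
                        break  # end of device or file
                dst.write(data)
                block += 1
    finally:
        os.close(src)
    return bad_blocks
```

With bs=512 and skip=333184000, as in the dd command above, the copy would start at byte 333184000 * 512 = 170,590,208,000 of the partition, which is about 159 GiB in.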

(1) Is there any way to stop it hanging on bad reads of the disk and just 
return an error, which noerror can skip over?  I am perfectly willing 
to patch the kernel or whatever it takes to get this working. While I 
have a backup, it is a month or so old. I can invest some time now to 
save having to regenerate the last month's work. (Fortunately this is 
the less active disk!  I have two other 250GB drives which would have 
been far worse to lose. They are now backed up to the present day!)

(2) Will dump/restore help me? Will dump skip over the errors 
any better?  Do I need an fsck'ed file system before I dump?

It seems to my uneducated eye that this is a read error that hangs the 
kernel before it ever gets passed to the user program (dd) so 
dump/restore will work no better.  But I don't really know. 

The machine is running 5.5 pre-release.  I can pull the disk and put it 
in a machine running 6-stable if that will help.  I could also install 
current on some box or another. Whatever will get the data back!! 

Advice please?!?

Cheers,
Greg
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Nonlinear repeated malloc/free performance

2006-06-11 Thread Mark Kirkwood
I just raised kern.maxdsize to 1G, but noticed that a test program that 
mallocs and frees in a loop (chunks of increasing size, 1M -> 1024M) takes 
about 6 times longer for 1024 iterations than it does for only 512 of them. 
Is this non-linearity expected?


I see from a profile that ifree is taking most of the time.

I guess this use-case is probably unusual (it's only a test program that 
I used to check I really can use the whole 1G...).


I'm running 6.1-STABLE from about 09 May. (prog and gprof attached).
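
The attached program is not preserved in this archive. A rough, hypothetical reconstruction of the kind of loop described above might look like the sketch below. It assumes the test simply calls the C library's malloc and free with steadily growing sizes and never touches the memory. Python with ctypes is used here only for illustration, and the name time_malloc_free is made up.

```python
import ctypes
import ctypes.util
import time

# Load the C library. If find_library returns None, CDLL(None) falls back to
# the symbols already loaded into the process (POSIX systems only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]

MIB = 1024 * 1024


def time_malloc_free(iterations, step=MIB):
    """Malloc and free chunks of step, 2*step, ..., iterations*step bytes.

    Returns (elapsed_seconds, failed_allocations).
    """
    failed = 0
    start = time.perf_counter()
    for i in range(1, iterations + 1):
        ptr = libc.malloc(i * step)
        if ptr is None:  # ctypes maps a NULL c_void_p result to None
            failed += 1
            continue
        libc.free(ptr)
    return time.perf_counter() - start, failed
```

Comparing time_malloc_free(512) with time_malloc_free(1024) gives the two figures discussed above. If the allocator were linear in the number of calls, the second should take roughly twice as long as the first.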

Cheers

Mark

Re: 6.1-stable hangs and LORs

2006-06-11 Thread Brad Waite

Max Laier wrote:

> From pf.conf(5):
> BUGS
>  Due to a lock order reversal (LOR) with the socket layer, the use of the
>  group and user filter parameter in conjunction with a Giant-free netstack
>  can result in a deadlock.  If you have to use group or user you must set
>  debug.mpsafenet to ``0'' from the loader(8), for the moment.  This
>  workaround will still produce the LOR, but Giant will protect from the
>  deadlock.



Thanks for the heads-up, Max.

I didn't think I was using 'user' or 'group' in my pf.conf, but the 
default ftp proxy entry indeed matches user ftp-proxy.  I'll try the 
mpsafenet=0 and we'll see if that fixes it.
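
A quick way to check a ruleset for the affected parameters is to search for "user" or "group" as whole words outside comments. The sketch below is a plain text search, not a pf parser, so it may also flag macros, labels or quoted strings that happen to contain those words. The function name rules_using_user_or_group is hypothetical.

```python
import re

# Match "user" or "group" only as whole words.
_KEYWORD = re.compile(r"\b(user|group)\b")


def rules_using_user_or_group(pf_conf_text):
    """Return (line_number, rule_text) pairs for lines mentioning user/group."""
    hits = []
    for number, line in enumerate(pf_conf_text.splitlines(), start=1):
        rule = line.split("#", 1)[0]  # ignore anything after a comment marker
        if _KEYWORD.search(rule):
            hits.append((number, rule.strip()))
    return hits
```

It can be run over the contents of pf.conf, for example with open("/etc/pf.conf").read() as the argument, assuming the ruleset lives at that path.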



Re: unmounting a filesystem safely that doesn't exist anymore

2006-06-11 Thread Peter Jeremy
On Sat, 2006-Jun-10 19:40:41 +0200, Björn König wrote:
>I made a mistake: I accidentally unplugged my digital camera before I 
>unmounted the filesystem. *doh* This happens very often, because I'm 
>very scatterbrained. =)

Your best solution may be to use mtools (ports/emulators/mtools) rather
than mounting the filesystem.

>changed ad hoc. I just want to know if somebody knows a workaround or 
>small trick that prevents the other filesystems from being unclean on 
>next boot-up.

The only way to do this is to have all the other filesystems mounted
read-only.  The "filesystem clean" flag is part of the superblock and
is cleared when a filesystem is mounted.  It will be set only if the
filesystem is cleanly unmounted.

-- 
Peter Jeremy


Re: 6.1-stable hangs and LORs

2006-06-11 Thread Max Laier
On Sunday 11 June 2006 23:46, Brad Waite wrote:
> Kris Kennaway wrote:
> > We need to know the LOR before anyone can tell what is going wrong ;-)
>
> Ask and you will receive:
>
> lock order reversal:
>   1st 0xc077a440 pf task mtx (pf task mtx) @ /usr/src/sys/contrib/pf/net/pf.c:6331
>   2nd 0xc07d3fac tcp (tcp) @ /usr/src/sys/contrib/pf/net/pf.c:2719
> KDB: stack backtrace:
> witness_checkorder(c07d3fac,9,c06fd2f7,a9f) at witness_checkorder+0x55c
> _mtx_lock_flags(c07d3fac,0,c06fd2f7,a9f,c07d3fac) at _mtx_lock_flags+0x40
> pf_socket_lookup(e35ccacc,e35ccad0,1,e35ccb8c,0) at pf_socket_lookup+0x103
> pf_test_tcp(e35ccb3c,e35ccb34,1,c4d3e400,c5027c00,14,c5032810,e35ccb8c,e35ccb40,e35ccb44,0,0) at pf_test_tcp+0x10d6
> pf_test(1,c4ba3c00,e35ccc2c,0,0) at pf_test+0xb77
> pf_check_in(0,e35ccc2c,c4ba3c00,1,0) at pf_check_in+0x37
> pfil_run_hooks(c07d3b60,e35c,c4ba3c00,1,0) at pfil_run_hooks+0xee
> ip_input(c5027c00,18,c07d3138,e35cccec,c05dcd63) at ip_input+0x1b2
> netisr_processqueue(c4b24500,c4b28000,0,e35ccd0c,c055877b) at netisr_processqueue+0xf
> swi_net(0,c4b28038,c4ad6d80,c0558590,c4ad520c) at swi_net+0x8b
> ithread_loop(c4ab68d0,e35ccd38,c4ab68d0,c0558590,0) at ithread_loop+0x1eb
> fork_exit(c0558590,c4ab68d0,e35ccd38) at fork_exit+0x7d
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xe35ccd6c, ebp = 0 ---

From pf.conf(5):
BUGS
 Due to a lock order reversal (LOR) with the socket layer, the use of the
 group and user filter parameter in conjunction with a Giant-free netstack
 can result in a deadlock.  If you have to use group or user you must set
 debug.mpsafenet to ``0'' from the loader(8), for the moment.  This
 workaround will still produce the LOR, but Giant will protect from the
 deadlock.

-- 
/"\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News




Re: 6.1-stable hangs and LORs

2006-06-11 Thread Brad Waite

Kris Kennaway wrote:

> We need to know the LOR before anyone can tell what is going wrong ;-)

Ask and you will receive:

lock order reversal:
 1st 0xc077a440 pf task mtx (pf task mtx) @ /usr/src/sys/contrib/pf/net/pf.c:6331
 2nd 0xc07d3fac tcp (tcp) @ /usr/src/sys/contrib/pf/net/pf.c:2719
KDB: stack backtrace:
witness_checkorder(c07d3fac,9,c06fd2f7,a9f) at witness_checkorder+0x55c
_mtx_lock_flags(c07d3fac,0,c06fd2f7,a9f,c07d3fac) at _mtx_lock_flags+0x40
pf_socket_lookup(e35ccacc,e35ccad0,1,e35ccb8c,0) at pf_socket_lookup+0x103
pf_test_tcp(e35ccb3c,e35ccb34,1,c4d3e400,c5027c00,14,c5032810,e35ccb8c,e35ccb40,e35ccb44,0,0) at pf_test_tcp+0x10d6
pf_test(1,c4ba3c00,e35ccc2c,0,0) at pf_test+0xb77
pf_check_in(0,e35ccc2c,c4ba3c00,1,0) at pf_check_in+0x37
pfil_run_hooks(c07d3b60,e35c,c4ba3c00,1,0) at pfil_run_hooks+0xee
ip_input(c5027c00,18,c07d3138,e35cccec,c05dcd63) at ip_input+0x1b2
netisr_processqueue(c4b24500,c4b28000,0,e35ccd0c,c055877b) at netisr_processqueue+0xf
swi_net(0,c4b28038,c4ad6d80,c0558590,c4ad520c) at swi_net+0x8b
ithread_loop(c4ab68d0,e35ccd38,c4ab68d0,c0558590,0) at ithread_loop+0x1eb
fork_exit(c0558590,c4ab68d0,e35ccd38) at fork_exit+0x7d
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe35ccd6c, ebp = 0 ---



Re: 6.1-stable hangs and LORs

2006-06-11 Thread Kris Kennaway
On Sun, Jun 11, 2006 at 12:09:05PM -0600, Brad Waite wrote:
> Hi guys,
> 
> I'm going to take another stab at getting some help.
> 
> For the last 6 months my FBSD gateway has been locking up every few 
> days, usually about once a week.  No panic, no reboots, just a hard lock 
> with no response on the console or over the net.
> 
> I've replaced literally every piece of hardware with the exception of 
> the case and power supply.  No change.
> 
> I've upgraded from 5.3- to 6.0- to 6.1-STABLE.  No change.
> 
> I've researched as much as I know how and still come up with hardly 
> anything.  I have turned on BREAK_TO_DEBUGGER, WITNESS and INVARIANTS 
> and the only indication I've gotten is a lock order reversal that's 
> *similar* to http://sources.zabbadoz.net/freebsd/lor/017.html.  The line 
> numbers in pf.c don't match up with LOR 017, but that's about all I can 
> tell.

We need to know the LOR before anyone can tell what is going wrong ;-)

Kris



Re: 6.1-stable hangs and LORs

2006-06-11 Thread Roland Smith
On Sun, Jun 11, 2006 at 12:09:05PM -0600, Brad Waite wrote:
> Hi guys,
> 
> I'm going to take another stab at getting some help.
> 
> For the last 6 months my FBSD gateway has been locking up every few 
> days, usually about once a week.  No panic, no reboots, just a hard lock 
> with no response on the console or over the net.
> 
> I've replaced literally every piece of hardware with the exception of 
> the case and power supply.  No change.

One of my machines had random lockup problems, and then the power supply
died. The problems were gone after it was replaced. You might want to
swap out the power supply. And test the RAM.

> I've upgraded from 5.3- to 6.0- to 6.1-STABLE.  No change.
> 
> I've researched as much as I know how and still come up with hardly 
> anything.  I have turned on BREAK_TO_DEBUGGER, WITNESS and INVARIANTS 
> and the only indication I've gotten is a lock order reversal that's 
> *similar* to http://sources.zabbadoz.net/freebsd/lor/017.html.  The line 
> numbers in pf.c don't match up with LOR 017, but that's about all I can 
> tell.
> 
> I'm reasonably certain the issue is with pf, since I have 3 other 
> non-gateway servers humming along with no problems.  The hardware is 
> nearly identical - their RAID cards are different, but I've tried 
> running my gateway on just a single SCSI drive and had the same lockup 
> issue.  Of course, the issue could be somewhere else, but I'm at a loss 
> as to how to find it.

Could you try swapping two machines? That would be the ultimate check if
it's hardware related.
 
> I'm running my console over serial so I can log anything that's 
> necessary.  I've been able to break to the debugger, but to be honest, I 
> don't know what to look for.  I've seen several posts on the lists about 
>  posting the output of debug commands, but I figured it to be in poor 
> taste to just dump my output here before someone asked.

Posting a stack backtrace (bt) might be a good start.

> I'm getting a lot of heat from the boss since our VoIP phones don't work 
> when the gateway locks up.

Sometimes POTS isn't so bad after all. ;-)

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)




6.1-stable hangs and LORs

2006-06-11 Thread Brad Waite

Hi guys,

I'm going to take another stab at getting some help.

For the last 6 months my FBSD gateway has been locking up every few 
days, usually about once a week.  No panic, no reboots, just a hard lock 
with no response on the console or over the net.


I've replaced literally every piece of hardware with the exception of 
the case and power supply.  No change.


I've upgraded from 5.3- to 6.0- to 6.1-STABLE.  No change.

I've researched as much as I know how and still come up with hardly 
anything.  I have turned on BREAK_TO_DEBUGGER, WITNESS and INVARIANTS 
and the only indication I've gotten is a lock order reversal that's 
*similar* to http://sources.zabbadoz.net/freebsd/lor/017.html.  The line 
numbers in pf.c don't match up with LOR 017, but that's about all I can 
tell.


I'm reasonably certain the issue is with pf, since I have 3 other 
non-gateway servers humming along with no problems.  The hardware is 
nearly identical - their RAID cards are different, but I've tried 
running my gateway on just a single SCSI drive and had the same lockup 
issue.  Of course, the issue could be somewhere else, but I'm at a loss 
as to how to find it.


I'm running my console over serial so I can log anything that's 
necessary.  I've been able to break to the debugger, but to be honest, I 
don't know what to look for.  I've seen several posts on the lists about 
 posting the output of debug commands, but I figured it to be in poor 
taste to just dump my output here before someone asked.


I'm getting a lot of heat from the boss since our VoIP phones don't work 
when the gateway locks up.


If someone can help identify and/or eliminate this issue, I'm more than 
happy to do everything I can to provide the necessary information.


Thanks.


Re: openoffice.org-2.0, portinstall and MAKE_ARGS

2006-06-11 Thread Andrey Melentyev
On 11 June 2006 at 11:46, Vlad GALU wrote:
> On 6/11/06, Vlad GALU <[EMAIL PROTECTED]> wrote:
> > On 6/10/06, Andrey Melentyev <[EMAIL PROTECTED]> wrote:
> > > Hi all!
> > > I've got a problem with building editors/openoffice-2.0 on my
> > > FreeBSD-6.1. I want to control the configure process via knobs, such as
> > > WITH_CUPS, WITH_KDE and some others. When I try to install OOo this
> > > way:
> > >
> > > portupgrade -Nvm "-DWITH_KDE -DWITH_CUPS -DWITH_CCACHE -DWITH_GPC
> > > -DWITHOUT_MOZILLA" editors/openoffice.org-2.0
> > >
> > > I see right messages about make flags:
> > >
> > > --->  Session started at: Sat, 10 Jun 2006 16:34:58 +0400
> > > --->  Fresh installation of editors/openoffice.org-2.0 started at: Sat,
> > > 10 Jun 2006 16:34:58 +0400
> > > --->  Installing 'openoffice.org-2.0.3rc5' from a port
> > > (editors/openoffice.org-2.0)
> > > --->  Build of editors/openoffice.org-2.0 started at: Sat, 10 Jun 2006
> > > 16:35:03 +0400
> > > --->  Building '/usr/ports/editors/openoffice.org-2.0' with make
> > > flags: -DWITH_KDE -DWITH_CUPS -DWITH_CCACHE -DWITH_GPC
> > > -DWITHOUT_MOZILLA
> > >
> > > But if I put those make flags into /usr/local/etc/pkgtools.conf, then I
> > > get no message about custom make flags, and if I look
> > > at /usr/ports/editors/openoffice.org-2.0/work/config_office/config.log,
> > > I see that my make flags are not working properly.
> > > My pkgtools.conf part:
> > >
> > >  MAKE_ARGS = {
> > > ...
> > > 'editors/openoffice.org-2.0' => [
> > >   '-DWITH_CUPS',
> > > '-DWITH_KDE',
> > >   'LOCALIZED_LANG=ru',
> > >   '-DWITH_GPC',
> > >   '-DWITH_CCACHE'
> > > ],
> > > ...
> > > }
> >
> >FWIW, I spotted the same problem today. portupgrade -N doesn't pick
> > up the MAKE_ARGS from pkgtools.conf.
>
> FYI, after updating portupgrade to 2.1.2_1,1, everything works
> correctly.

'portversion -v sysutils/portupgrade' says that portupgrade is already 
updated to version 2.1.2_1,1:
# portversion -v sysutils/portupgrade
portupgrade-2.1.2_1,1   =  up-to-date with port

But the symptoms with openoffice.org-2.0 are still the same. Some other ports 
(such as vim, xchat, firefox-i18n) work fine with MAKE_ARGS. Maybe the error 
is in port name in pkgtools.conf? I tried different ways of specifying port 
name: editors/openoffice.org-2.0, openoffice.org-2.0, openoffice.org, 
openoffice, editors/openoffice.org* and so on. No luck.

-- 
-wbr,
Andrey Melentyev
[EMAIL PROTECTED]
+7-904-644-91-66


Re: openoffice.org-2.0, portinstall and MAKE_ARGS

2006-06-11 Thread Vlad GALU

On 6/11/06, Vlad GALU <[EMAIL PROTECTED]> wrote:

On 6/10/06, Andrey Melentyev <[EMAIL PROTECTED]> wrote:
> Hi all!
> I've got a problem with building editors/openoffice-2.0 on my FreeBSD-6.1.
> I want to control the configure process via knobs, such as WITH_CUPS, WITH_KDE
> and some others. When I try to install OOo this way:
>
> portupgrade -Nvm "-DWITH_KDE -DWITH_CUPS -DWITH_CCACHE -DWITH_GPC -DWITHOUT_MOZILLA"
> editors/openoffice.org-2.0
>
> I see right messages about make flags:
>
> --->  Session started at: Sat, 10 Jun 2006 16:34:58 +0400
> --->  Fresh installation of editors/openoffice.org-2.0 started at: Sat, 10 Jun
> 2006 16:34:58 +0400
> --->  Installing 'openoffice.org-2.0.3rc5' from a port
> (editors/openoffice.org-2.0)
> --->  Build of editors/openoffice.org-2.0 started at: Sat, 10 Jun 2006
> 16:35:03 +0400
> --->  Building '/usr/ports/editors/openoffice.org-2.0' with make
> flags: -DWITH_KDE -DWITH_CUPS -DWITH_CCACHE -DWITH_GPC -DWITHOUT_MOZILLA
>
> But if I put those make flags into /usr/local/etc/pkgtools.conf, then I get no
> message about custom make flags, and if I look
> at /usr/ports/editors/openoffice.org-2.0/work/config_office/config.log, I see
> that my make flags are not working properly.
> My pkgtools.conf part:
>
>  MAKE_ARGS = {
> ...
> 'editors/openoffice.org-2.0' => [
>   '-DWITH_CUPS',
> '-DWITH_KDE',
>   'LOCALIZED_LANG=ru',
>   '-DWITH_GPC',
>   '-DWITH_CCACHE'
> ],
> ...
> }

   FWIW, I spotted the same problem today. portupgrade -N doesn't pick
up the MAKE_ARGS from pkgtools.conf.



   FYI, after updating portupgrade to 2.1.2_1,1, everything works correctly.

--
If it's there, and you can see it, it's real.
If it's not there, and you can see it, it's virtual.
If it's there, and you can't see it, it's transparent.
If it's not there, and you can't see it, you erased it.


Re: 6.1-RELEASE: WARNING - WRITE_DMA soft error

2006-06-11 Thread srwadleigh

Doug White wrote:

> On Mon, 5 Jun 2006, srwadleigh wrote:
>
> > I have 6.1-RELEASE installed on a Supermicro SuperServer 6014P-TR
> > with Supermicro motherboard: Super X6DHP-TG 
> > http://supermicro.com/products/motherboard/Xeon800/E7520/X6DHP-TG.cfm
> >
> > I have four SATA drives attached to the internal backplane which uses
> > the following controller. I am not using the onboard RAID features:
> >
> >    Marvell 88SX6081 4-port SATA Controller with 3rd-Party Adaptec
> >    AIC-8110(4x drive), RAID 0, 1, JBOD support
> >
> > The problem I am seeing occurs with the fourth drive, ad10, and appears
> > on all read/write operations:
> >
> > ad10: WARNING - WRITE_DMA48 soft error (ECC corrected) LBA=293046767
> > kernel:ad10: WARNING - WRITE_DMA soft error (ECC corrected) LBA=12393823
> > kernel:ad10: WARNING - READ_DMA soft error (ECC corrected) LBA=12480567
> >
> > This same warning message appears on 6.0-RELEASE and 6.1-STABLE
>
> I have problems with a similar chassis w/ SCSI and bay 3 throwing 
> spurious errors on a number of systems, so I think it's just poor 
> backplane design on SuperMicro's part. Try getting them to replace 
> your backplane board.

I am going to RMA the backplane and its composite SATA cable. I'll see 
if that makes a difference. Obviously not ideal, but how serious do you 
think these ECC warnings are in terms of data loss and system stability?


Thanks