On Fri, 13 Jul 2007, Marcos Laufer wrote:

> Otto,
> 
> I know the cables are all right; I'm using them with another hard drive.
> And the hard drive is new, but I will format it and check whether it
> shows any errors.
> I hope it is hardware related; I would get kind of scared otherwise.
> Do you need me to try anything else with this filesystem?

If possible (if it's not too large and the drive cooperates), I would
like a dd of the partition. I'm always very interested in having an
image of a filesystem on which fsck_ffs chokes.
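
For reference, a minimal sketch of how such an image could be taken with
dd. The helper name image_partition and the output path are only
illustrations, and the block size is an arbitrary choice. conv=noerror,sync
tells dd to keep going after a read error and to pad unreadable or short
blocks with zeros, so offsets in the image stay aligned with the partition.

```shell
#!/bin/sh
# Hypothetical helper: copy a raw partition into an image file.
# $1 = source device (e.g. /dev/rwd0e, the partition from this thread)
# $2 = destination image file (illustrative path, pick your own)
image_partition() {
    src=$1
    dst=$2
    # noerror: do not stop at read errors
    # sync:    pad each short or failed input block with zeros up to bs
    dd if="$src" of="$dst" bs=65536 conv=noerror,sync
}
```

It would be called as "image_partition /dev/rwd0e /some/other/disk/wd0e.img",
writing to a different, healthy disk. Because of the sync padding, the image
can end up slightly larger than the partition if its size is not a multiple
of the block size.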

        -Otto

> 
> Regards,
> Marcos
> 
> ----- Original Message ----- 
> From: "Otto Moerbeek" <[EMAIL PROTECTED]>
> To: "Marcos Laufer" <[EMAIL PROTECTED]>
> Cc: <misc@openbsd.org>
> Sent: Friday, July 13, 2007 4:46 PM
> Subject: Re: fsck Segmentation fault on 4.1
> 
> 
> On Fri, 13 Jul 2007, Marcos Laufer wrote:
> 
> > Otto,
> >
> > This is the error I get:
> > It starts booting and starts fsck, which fails on /dev/rwd0e and rwd0h.
> >
> > (Once, when it finished, I saw it say:)
> > fsck_ffs in free():  error: free_page: pointer to wrong page
> > fsck: /dev/rwd0h: Abort trap
> >
> > I rebooted many times and that message did not show up again.
> >
> >
> > I tried to fsck manually as you suggested, and I get:
> >
> > # ulimit -d unlimited
> > # fsck -y /dev/rwd0e
> >
> > INCONSISTENT CGSIZE=16384
> >
> > FIX? yes
> >
> > * * Last mounted on /usr
> > * * Phase 1- Check Blocks and Sizes
> > * * Phase 2 - Check pathnames
> > * * Phase 3 - Check Conectivity
> > * * Phase 4 - Check Reference Counts
> > * * Phase 5 - Check Cyl Groups
> >
> > CANNOT READ: BLK 64
> >
> > CONTINUE? yes
> >
> > fsck: /dev/rwd0e: Segmentation Fault
> 
> This is not an out of memory situation.
> 
> It looks like fsck_ffs has problems getting data from your disk,
> probably because of hardware failure or bad cabling.  Sometimes it
> detects it cannot read the data (the CANNOT READ: BLK 64 case), but it
> is possible it gets corrupted data in other cases.
> 
> Sadly, this can cause fsck_ffs to do the wrong thing and access wrong
> memory and corrupt its internal data. During the last year I've fixed
> some stuff in this area, but there still remain cases that can go
> wrong.
> 
> -Otto
> 
> 
> > # _
> >
> >
> > The dmesg is:
> >
> > OpenBSD 4.1-stable (GENERIC) #0: Mon May 14 14:02:47 ART 2007
> >     [EMAIL PROTECTED]:/u/system/src/sys/arch/i386/compile/GENERIC
> > cpu0: Intel(R) Pentium(R) 4 CPU 2.80GHz ("GenuineIntel" 686-class) 2.81 GHz
> > cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
> > real mem  = 1064857600 (1039900K)
> > avail mem = 964222976 (941624K)
> > using 4278 buffers containing 53366784 bytes (52116K) of memory
> > mainbus0 (root)
> > bios0 at mainbus0: AT/286+ BIOS, date 09/15/03, BIOS32 rev. 0 @ 0xfbbd0, SMBIOS rev. 2.2 @ 0xf0800 (39 entries)
> > bios0: MICRO-STAR INTL, CO.,LTD. MS-6743
> > apm0 at bios0: Power Management spec V1.2
> > apm0: AC on, battery charge unknown
> > apm0: flags 70102 dobusy 1 doidle 1
> > pcibios0 at bios0: rev 2.1 @ 0xf0000/0xdf84
> > pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdeb0/176 (9 entries)
> > pcibios0: PCI Exclusive IRQs: 3 4 5 7 10 11
> > pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371SB ISA" rev 0x00)
> > pcibios0: PCI bus #1 is the last bus
> > bios0: ROM list: 0xc0000/0xa600 0xcc000/0x1800
> > acpi at mainbus0 not configured
> > cpu0 at mainbus0
> > pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
> > pchb0 at pci0 dev 0 function 0 "Intel 82865G/PE/P CPU-I/0-1" rev 0x02
> > vga1 at pci0 dev 2 function 0 "Intel 82865G Video" rev 0x02: aperture at 0xf0000000, size 0x8000000
> > wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> > wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> > ppb0 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0xc2
> > pci1 at ppb0 bus 1
> > fxp0 at pci1 dev 8 function 0 "Intel PRO/100 VE" rev 0x02, i82562: irq 10, address 00:0c:76:b5:8a:85
> > inphy0 at fxp0 phy 1: i82562ET 10/100 PHY, rev. 0
> > ichpcib0 at pci0 dev 31 function 0 "Intel 82801EB/ER LPC" rev 0x02
> > pciide0 at pci0 dev 31 function 2 "Intel 82801EB SATA" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
> > wd0 at pciide0 channel 0 drive 1: <WDC WD800JD-00MSA1>
> > wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
> > wd0(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 5
> > atapiscsi0 at pciide0 channel 1 drive 1
> > scsibus0 at atapiscsi0: 2 targets
> > cd0 at scsibus0 targ 0 lun 0: <LITEON, CD-ROM LTN526D, 9S01> SCSI0 5/cdrom removable
> > cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
> > ichiic0 at pci0 dev 31 function 3 "Intel 82801EB/ER SMBus" rev 0x02: irq 4
> > iic0 at ichiic0
> > iic0: addr 0x2f 04=00 06=0a 07=00 0c=00 0d=07 0e=85 0f=00 10=c0 11=11 12=00 13=60 14=14 15=62 16=01 17=06
> > isa0 at ichpcib0
> > isadma0 at isa0
> > pckbc0 at isa0 port 0x60/5
> > pckbd0 at pckbc0 (kbd slot)
> > pckbc0: using irq 1 for kbd slot
> > wskbd0 at pckbd0: console keyboard, using wsdisplay0
> > pmsi0 at pckbc0 (aux slot)
> > pckbc0: using irq 12 for aux slot
> > wsmouse0 at pmsi0 mux 0
> > pcppi0 at isa0 port 0x61
> > midi0 at pcppi0: <PC speaker>
> > spkr0 at pcppi0
> > lm0 at isa0 port 0x290/8: W83627THF
> > npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
> > biomask ebfd netmask effd ttymask ffff
> > pctr: user-level cycle counter enabled
> > dkcsum: wd0 matches BIOS drive 0x80
> > root on wd0a
> > rootdev=0x0 rrootdev=0x300 rawdev=0x302
> >
> >
> > ----- Original Message ----- 
> > From: "Otto Moerbeek" <[EMAIL PROTECTED]>
> > To: "Marcos Laufer" <[EMAIL PROTECTED]>
> > Cc: <misc@openbsd.org>
> > Sent: Friday, July 13, 2007 3:38 PM
> > Subject: Re: fsck Segmentation fault on 4.1
> >
> >
> > On Fri, 13 Jul 2007, Marcos Laufer wrote:
> >
> > > Hello,
> > >
> > > I want to report a problem I experienced while testing OpenBSD 4.1.
> > > I installed it, increased VM_PHYSSEG_MAX to 16
> > > in /usr/src/sys/arch/i386/include/vmparam.h to make
> > > it work with this particular motherboard, and built a
> > > stable release.
> > > I installed a server with it, and it had been working fine as an MX
> > > for a few months until now.
> > > The machine crashed, with no error on the screen, and the keyboard
> > > did not respond. I rebooted, it started to fsck, and
> > > the fsck failed on /usr. So I ran fsck manually (fsck -y), but
> > > it crashes with a segmentation fault, so I can't mount it or
> > > start the server.
> > > I read in the archives that this could be caused by running out
> > > of swap, but I had made a 2 GB swap partition. Despite that,
> > > I added a 64 MB file as swap and tried fsck again, but no luck.
> > > This time it was easy for me to reinstall everything on a new hard disk,
> > > but I am keeping the old one because I would like to learn how to fix
> > > this. If anyone wants me to run some tests or has
> > > any ideas about what is going on, let me know.
> >
> > Start by showing the error message. A segmentation fault is something
> > different than running out of memory.
> >
> > If fsck segfaults, I need a proper error report.
> > See http://www.openbsd.org/report.html
> >
> > If fsck runs out of memory, increasing ulimit -d might help, like:
> >
> > # ulimit -d unlimited
> > # fsck ...
> >
> > That reminds me to cook a diff to do this automatically. With
> > filesystems getting larger and larger, more people will run into
> > out-of-mem situations.
> >
> > -Otto
