On Fri, 13 Jul 2007, Marcos Laufer wrote:

> Otto,
>
> I know the cables are all right; I'm using them with another hard drive.
> And the hard drive is new, but I will format it and check whether it
> shows any errors.
> I hope it is hardware related; I would get kind of scared otherwise.
> Do you need me to try anything else with this filesystem?
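Before reformatting, a read-only pass over the whole old drive can show whether the disk itself still returns read errors like the CANNOT READ reported in this thread. A minimal sketch, assuming the drive is still attached as wd0, so that /dev/rwd0c is the raw partition covering the whole disk on OpenBSD/i386; the helper name read_check is hypothetical and the 64k block size is an arbitrary choice.

```sh
#!/bin/sh
# read_check is a hypothetical helper: read SRC from start to end and
# throw the data away, so the only effect is to surface read errors.
# Without conv=noerror, dd stops at the first unreadable block, prints
# the error (and how far it got) on stderr, and exits non-zero.
read_check() {
    src=$1
    if dd if="$src" of=/dev/null bs=64k; then
        echo "$src: read OK"
    else
        echo "$src: read error" >&2
        return 1
    fi
}

# Example (as root); /dev/rwd0c as the whole old disk is an assumption:
# read_check /dev/rwd0c
```

Because it never writes to the device, this can be run before deciding whether the disk is worth formatting at all.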
If possible (if it's not too large and the drive cooperates), I would
like a dd of the partition. I'm always very interested in having an
image of a filesystem on which fsck_ffs chokes.

-Otto

>
> Regards,
> Marcos
>
> ----- Original Message -----
> From: "Otto Moerbeek" <[EMAIL PROTECTED]>
> To: "Marcos Laufer" <[EMAIL PROTECTED]>
> Cc: <misc@openbsd.org>
> Sent: Friday, July 13, 2007 4:46 PM
> Subject: Re: fsck Segmentation fault on 4.1
>
>
> On Fri, 13 Jul 2007, Marcos Laufer wrote:
>
> > Otto,
> >
> > This is the error I get:
> > It starts booting, and it starts fsck; it fails with /dev/rwd0e and rwd0h.
> >
> > (I could see once that when it finished it said:)
> > fsck_ffs in free(): error: free_page: pointer to wrong page
> > fsck: /dev/rwd0h: Abort trap
> >
> > I rebooted it many more times and that did not show up again.
> >
> >
> > I tried to fsck manually as you said, and I get:
> >
> > # ulimit -d unlimited
> > # fsck -y /dev/rwd0e
> >
> > INCONSISTENT CGSIZE=16384
> >
> > FIX? yes
> >
> > ** Last mounted on /usr
> > ** Phase 1 - Check Blocks and Sizes
> > ** Phase 2 - Check pathnames
> > ** Phase 3 - Check Connectivity
> > ** Phase 4 - Check Reference Counts
> > ** Phase 5 - Check Cyl Groups
> >
> > CANNOT READ: BLK 64
> >
> > CONTINUE? yes
> >
> > fsck: /dev/rwd0e: Segmentation Fault
>
> This is not an out of memory situation.
>
> It looks like fsck_ffs has problems getting data from your disk,
> probably because of hardware failure or bad cabling. Sometimes it
> detects it cannot read the data (the CANNOT READ: BLK 64 case), but it
> is possible it gets corrupted data in other cases.
>
> Sadly, this can cause fsck_ffs to do the wrong thing and access wrong
> memory and corrupt its internal data. During the last year I've fixed
> some stuff in this area, but there still remain cases that can go
> wrong.
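A minimal sketch of how the requested image could be taken from a partition that has unreadable blocks, assuming the affected partition is /dev/rwd0e and that the image is written to a different, healthy disk (the path /mnt/other/wd0e.img is hypothetical, as is the helper name image_partition). The conv=noerror,sync operands make dd carry on past read errors and pad each failed block with zeros, so the remaining data keeps its original offsets, which matters for a tool like fsck_ffs that locates metadata by block number.

```sh
#!/bin/sh
# image_partition is a hypothetical helper that copies SRC (a raw
# partition such as /dev/rwd0e) into the image file DST.
#   conv=noerror  keep reading after a read error instead of aborting
#   conv=sync     pad each short or failed input block with zeros, so
#                 every block keeps its original offset in the image
image_partition() {
    src=$1
    dst=$2
    bs=${3:-64k}    # block size; smaller values lose less data per bad sector
    dd if="$src" of="$dst" bs="$bs" conv=noerror,sync
}

# Example (as root, with the filesystem unmounted); paths are assumptions:
# image_partition /dev/rwd0e /mnt/other/wd0e.img
```

With conv=sync a read error zeroes a whole input block, so one bad sector costs 64 KB of data at the default block size; passing a smaller block size as the third argument (for example 512) loses less around a bad sector, at the cost of a much slower copy.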
>
> -Otto
>
> > # _
> >
> >
> > The dmesg is:
> >
> > OpenBSD 4.1-stable (GENERIC) #0: Mon May 14 14:02:47 ART 2007
> > [EMAIL PROTECTED]:/u/system/src/sys/arch/i386/compile/GENERIC
> > cpu0: Intel(R) Pentium(R) 4 CPU 2.80GHz ("GenuineIntel" 686-class) 2.81 GHz
> > cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
> > real mem = 1064857600 (1039900K)
> > avail mem = 964222976 (941624K)
> > using 4278 buffers containing 53366784 bytes (52116K) of memory
> > mainbus0 (root)
> > bios0 at mainbus0: AT/286+ BIOS, date 09/15/03, BIOS32 rev. 0 @ 0xfbbd0, SMBIOS rev. 2.2 @ 0xf0800 (39 entries)
> > bios0: MICRO-STAR INTL, CO.,LTD. MS-6743
> > apm0 at bios0: Power Management spec V1.2
> > apm0: AC on, battery charge unknown
> > apm0: flags 70102 dobusy 1 doidle 1
> > pcibios0 at bios0: rev 2.1 @ 0xf0000/0xdf84
> > pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdeb0/176 (9 entries)
> > pcibios0: PCI Exclusive IRQs: 3 4 5 7 10 11
> > pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371SB ISA" rev 0x00)
> > pcibios0: PCI bus #1 is the last bus
> > bios0: ROM list: 0xc0000/0xa600 0xcc000/0x1800
> > acpi at mainbus0 not configured
> > cpu0 at mainbus0
> > pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
> > pchb0 at pci0 dev 0 function 0 "Intel 82865G/PE/P CPU-I/0-1" rev 0x02
> > vga1 at pci0 dev 2 function 0 "Intel 82865G Video" rev 0x02: aperture at 0xf0000000, size 0x8000000
> > wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> > wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> > ppb0 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0xc2
> > pci1 at ppb0 bus 1
> > fxp0 at pci1 dev 8 function 0 "Intel PRO/100 VE" rev 0x02, i82562: irq 10, address 00:0c:76:b5:8a:85
> > inphy0 at fxp0 phy 1: i82562ET 10/100 PHY, rev. 0
> > ichpcib0 at pci0 dev 31 function 0 "Intel 82801EB/ER LPC" rev 0x02
> > pciide0 at pci0 dev 31 function 2 "Intel 82801EB SATA" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
> > wd0 at pciide0 channel 0 drive 1: <WDC WD800JD-00MSA1>
> > wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
> > wd0(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 5
> > atapiscsi0 at pciide0 channel 1 drive 1
> > scsibus0 at atapiscsi0: 2 targets
> > cd0 at scsibus0 targ 0 lun 0: <LITEON, CD-ROM LTN526D, 9S01> SCSI0 5/cdrom removable
> > cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
> > ichiic0 at pci0 dev 31 function 3 "Intel 82801EB/ER SMBus" rev 0x02: irq 4
> > iic0 at ichiic0
> > iic0: addr 0x2f 04=00 06=0a 07=00 0c=00 0d=07 0e=85 0f=00 10=c0 11=11 12=00 13=60 14=14 15=62 16=01 17=06
> > isa0 at ichpcib0
> > isadma0 at isa0
> > pckbc0 at isa0 port 0x60/5
> > pckbd0 at pckbc0 (kbd slot)
> > pckbc0: using irq 1 for kbd slot
> > wskbd0 at pckbd0: console keyboard, using wsdisplay0
> > pmsi0 at pckbc0 (aux slot)
> > pckbc0: using irq 12 for aux slot
> > wsmouse0 at pmsi0 mux 0
> > pcppi0 at isa0 port 0x61
> > midi0 at pcppi0: <PC speaker>
> > spkr0 at pcppi0
> > lm0 at isa0 port 0x290/8: W83627THF
> > npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
> > biomask ebfd netmask effd ttymask ffff
> > pctr: user-level cycle counter enabled
> > dkcsum: wd0 matches BIOS drive 0x80
> > root on wd0a
> > rootdev=0x0 rrootdev=0x300 rawdev=0x302
> >
> >
> > ----- Original Message -----
> > From: "Otto Moerbeek" <[EMAIL PROTECTED]>
> > To: "Marcos Laufer" <[EMAIL PROTECTED]>
> > Cc: <misc@openbsd.org>
> > Sent: Friday, July 13, 2007 3:38 PM
> > Subject: Re: fsck Segmentation fault on 4.1
> >
> >
> > On Fri, 13 Jul 2007, Marcos Laufer wrote:
> >
> > > Hello,
> > >
> > > I want to report a problem I experienced while testing OpenBSD 4.1.
> > > I've installed it, increased VM_PHYSSEG_MAX to 16
> > > in /usr/src/sys/arch/i386/include/vmparam.h to make
> > > it work with this particular motherboard, and made a
> > > stable release.
> > > I installed a server with it, and it had been working fine as an MX
> > > for a few months until now.
> > > The machine crashed with no error on the screen, and the keyboard
> > > did not respond. I rebooted, it started to fsck, and
> > > the fsck failed on /usr. So I ran fsck manually (fsck -y), but
> > > it crashes with a segmentation fault, so I can't mount it or
> > > start the server.
> > > I read in the archives that this was a problem caused by running out
> > > of swap, but I had made a 2GB swap partition. Despite that,
> > > I added a 64MB file as swap and tried fsck again, but no luck.
> > > This time it was easy for me to reinstall everything on a new hard disk,
> > > but I still have the old one because I would like to learn how to fix
> > > this. If anyone wants me to run some tests or has
> > > any ideas about what is going on, let me know.
> >
> > Start by showing the error message. A segmentation fault is something
> > different than running out of memory.
> >
> > If fsck segfaults, I need a proper error report.
> > See http://www.openbsd.org/report.html
> >
> > If fsck runs out of memory, increasing ulimit -d might help, like:
> >
> > # ulimit -d unlimited
> > # fsck ...
> >
> > That reminds me to cook a diff to do this automatically. With
> > filesystems getting larger and larger, more people will run into
> > out-of-mem situations.
> >
> > -Otto
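The ulimit -d advice can be scoped to a single fsck run. A minimal sketch, assuming a POSIX sh whose ulimit builtin accepts -H, -S and -d (ksh on OpenBSD, bash and dash do); run_with_max_datasize is a hypothetical name. It raises the soft data-segment limit only as far as the hard limit, which any user may do without privileges, whereas plain ulimit -d unlimited fails unless the hard limit is already unlimited or the shell runs as root.

```sh
#!/bin/sh
# run_with_max_datasize is a hypothetical helper: run a command with the
# soft data-segment size limit (ulimit -d) raised to the hard limit.
# The function body is a subshell, so the raised limit does not leak
# into the calling shell.
run_with_max_datasize() (
    hard=$(ulimit -H -d)          # e.g. "unlimited" or a size in kilobytes
    ulimit -S -d "$hard" || exit 1
    exec "$@"
)

# Example (as root, filesystem unmounted):
# run_with_max_datasize fsck -y /dev/rwd0e
```

As Otto notes, this only helps when fsck genuinely runs out of memory; it does nothing for a segmentation fault caused by corrupt data read from a failing disk.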