daily CVS update output
Updating src tree:
P src/distrib/sets/lists/tests/mi
P src/etc/Makefile
P src/share/mk/bsd.README
P src/sys/arch/amd64/conf/ALL
P src/sys/arch/evbarm/conf/GENERIC64
P src/sys/arch/i386/conf/ALL
P src/sys/arch/x86/x86/hyperv.c
P src/sys/dev/acpi/ehci_acpi.c
P src/sys/dev/acpi/files.acpi
U src/sys/dev/acpi/ohci_acpi.c
P src/sys/dev/hyperv/hypervvar.h
P src/sys/dev/hyperv/vmbus.c
P src/sys/dev/pci/if_wm.c
P src/sys/dev/usb/ehci.c
P src/sys/external/bsd/drm2/dist/drm/drm_atomic.c
P src/sys/external/bsd/drm2/dist/drm/i915/i915_active.c
P src/sys/external/bsd/drm2/dist/drm/radeon/radeon_ttm.c
P src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c
P src/sys/kern/subr_pool.c
P src/usr.bin/make/unit-tests/Makefile
U src/usr.bin/make/unit-tests/opt-version.exp
U src/usr.bin/make/unit-tests/opt-version.mk
Updating xsrc tree:
Killing core files:
Updating file list:
-rw-rw-r--  1 srcmastr  netbsd  41233745 Dec 24 03:03 ls-lRA.gz
Re: Filesystem corruption in current 9.99.92 (posix1eacl & log enabled FFSv2)
Hi Chuck,

On 23.12.21 18:12, Chuck Silvers wrote:
> a "cylinder group" is a metadata structure in FFS that describes the allocation state of a portion of the blocks and inodes of the file system and contains the inode records themselves. the header for this structure also contains a "magic number" field that is supposed to contain a certain constant value as a way to sanity-check that this metadata on disk was not overwritten with some completely unrelated contents. in your case, since the magic number field does not actually contain the value that it's supposed to contain, we know that the storage underneath the file system has gotten corrupted somehow. you'll want to track down how that happened, but that is separate from your immediate problem.
>
> this sounds like a bug I have seen before, where the extended attribute block for a file has been corrupted. please try the attached patch and see if this prevents the infinite loop.
>
> if that does prevent the infinite loop, then the file will probably appear not to have an ACL anymore, and I'm not sure what will happen if you try to set a new ACL on the file when it is in this state.
>
> for right now, the safest thing you can do will be to make a copy of the file without trying to preserve extended attributes (ie. do not use cp's "-p" option), then delete the original file, then move the copy of the file to have the original file's name, then you can change the new file's owner/group/mode/ACL to be what the original file had.

Thanks for the good explanation, which helped me a lot, and for the tip on how to break the infinite loop. I will definitely try that. In the meantime I have mounted the filesystem without the posix1eacls option. In this mode the "find /export" command runs cleanly, so your tip regarding the ACLs / extended attributes seems completely right. I am currently transferring the data to another filesystem so that I can compare it against the most recent backup.
Unfortunately this also means that I don't know at the moment whether I will get back the state I had before after remounting with the posix1eacls option. I hope so, though, because I would like to find out more about this. I'll get back to you as soon as I'm ready.

Thanks again
Matthias
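To make Chuck's explanation of the cylinder group magic number concrete, here is a minimal C sketch of the kind of sanity check fsck performs on each cylinder group header. It assumes that CG_MAGIC is 0x090255 and that cg_magic is the second 32-bit field of the on-disk struct cg, as in NetBSD's sys/ufs/ffs/fs.h; struct cg_header and cg_magic_ok() are hypothetical names, and byte-swapped (opposite-endian) file systems are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match CG_MAGIC in NetBSD's sys/ufs/ffs/fs.h. */
#define CG_MAGIC	0x090255

/* Hypothetical stand-in holding only the first two fields of struct cg. */
struct cg_header {
	int32_t	cg_firstfield;	/* historic cylinder group linked list */
	int32_t	cg_magic;	/* should always equal CG_MAGIC */
};

/*
 * Return 1 if buf starts with a plausible cylinder group header, 0 if it
 * is too short or the magic number does not match (the condition fsck
 * reports as "BAD MAGIC NUMBER"). Host byte order is assumed.
 */
static int
cg_magic_ok(const unsigned char *buf, size_t len)
{
	struct cg_header h;

	if (len < sizeof(h))
		return 0;
	memcpy(&h, buf, sizeof(h));	/* avoid unaligned access */
	return h.cg_magic == CG_MAGIC;
}
```

A block that was overwritten with unrelated contents, for example zeros, would fail this check even though nothing else about it looks wrong.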
Re: HEADS UP: Merging drm update
Acer Revo Box RN86 with DisplayPort connection:
http://bsd-hardware.info/?probe=6d184a1e62
dmesg: https://disk.yandex.ru/d/EbYO4kOkUMqhfw
Photos of screen:
https://disk.yandex.ru/i/2fFMSqX14LT9ig
https://disk.yandex.ru/i/S7Bj_2-62Mna3w
https://disk.yandex.ru/i/7o46UsqbxfFKVw

23.12.2021 22:22, Dmitrii Postolov wrote:
> Hi! Sorry for my bad English... I am a beginning user of NetBSD-Current. I own an Acer Revo Box RN86 (Intel 9400T CPU and UHD Graphics 630) with DisplayPort and HDMI connections. With the DisplayPort connection, NetBSD-Current with the new drm works on the Acer Revo RN86 (with some warnings) and graphics work. I also tested NetBSD-Current on an Acer Aspire XC-895 with an _HDMI_ connection and on an Intel NUC DN2820FYKH with an HDMI connection; in these cases the screen is blank on boot and the system stops. I think the problem is in HDMI mode with the NetBSD-Current drm. Firefox is currently building on NetBSD-Current, so I will send statistics later.
> --Dmitrii
Re: HEADS UP: Merging drm update
> Date: Fri, 24 Dec 2021 01:53:03 +0900
> From: Ryo ONODERA
>
> And with this patch, I have gotten the following dmesg:
> This has no bus_space_map and extent_alloc_subregion1...

OK, can you try the attached patch and see if it gives us any clues in dmesg? This prints a stack trace any time subr_extent.c writes to a struct extent_region and the region now covers the relevant space.

diff --git a/sys/kern/subr_extent.c b/sys/kern/subr_extent.c
index 05962829e9a9..e7a461c6f265 100644
--- a/sys/kern/subr_extent.c
+++ b/sys/kern/subr_extent.c
@@ -98,6 +98,22 @@ panic(a ...) printf(a)
 #define	KASSERT(exp)
 #endif
 
+#include
+#define	BADSTART	((unsigned long)0x63ec5018)
+#define	OPREGION_SIZE	(8 * 1024)
+#define	BADEND		(BADSTART + OPREGION_SIZE)
+static void
+dumpex(const struct extent_region *rp, const char *file, int line)
+{
+
+	if (BADEND < rp->er_start || rp->er_end <= BADSTART)
+		return;
+	printf("%s:%d extent_region @ %p [0x%lx, 0x%lx)\n", file, line, rp,
+	    rp->er_start, rp->er_end);
+	db_stacktrace();
+}
+#define	DUMPEX(rp)	dumpex(rp, __FILE__, __LINE__)
+
 static struct pool expool;
 
 /*
@@ -373,6 +389,7 @@ extent_insert_and_optimize(struct extent *ex, u_long start, u_long size,
 			 * We can coalesce.  Prepend us to the first region.
 			 */
 			LIST_FIRST(&ex->ex_regions)->er_start = start;
+			DUMPEX(LIST_FIRST(&ex->ex_regions));
 			extent_free_region_descriptor(ex, rp);
 			return;
 		}
@@ -383,6 +400,7 @@ extent_insert_and_optimize(struct extent *ex, u_long start, u_long size,
 		 */
 		rp->er_start = start;
 		rp->er_end = start + (size - 1);
+		DUMPEX(rp);
 		LIST_INSERT_HEAD(&ex->ex_regions, rp, er_link);
 		return;
 	}
@@ -402,6 +420,7 @@ extent_insert_and_optimize(struct extent *ex, u_long start, u_long size,
 		 * note of it.
 		 */
 		after->er_end = start + (size - 1);
+		DUMPEX(after);
 		appended = 1;
 	}
 
@@ -420,6 +439,7 @@ extent_insert_and_optimize(struct extent *ex, u_long start, u_long size,
 			 * Yup, we can free it up.
 			 */
 			after->er_end = LIST_NEXT(after, er_link)->er_end;
+			DUMPEX(after);
 			nextr = LIST_NEXT(after, er_link);
 			LIST_REMOVE(nextr, er_link);
 			extent_free_region_descriptor(ex, nextr);
@@ -428,6 +448,7 @@ extent_insert_and_optimize(struct extent *ex, u_long start, u_long size,
 			 * Nope, just prepend us to the next region.
 			 */
 			LIST_NEXT(after, er_link)->er_start = start;
+			DUMPEX(LIST_NEXT(after, er_link));
 		}
 
 		extent_free_region_descriptor(ex, rp);
@@ -452,6 +473,7 @@ extent_insert_and_optimize(struct extent *ex, u_long start, u_long size,
 	 */
 	rp->er_start = start;
 	rp->er_end = start + (size - 1);
+	DUMPEX(rp);
 	LIST_INSERT_AFTER(after, rp, er_link);
 }
 
@@ -1118,12 +1140,14 @@ extent_free(struct extent *ex, u_long start, u_long size, int flags)
 		/* Case 2. */
 		if ((start == rp->er_start) && (end < rp->er_end)) {
 			rp->er_start = (end + 1);
+			DUMPEX(rp);
 			goto done;
 		}
 
 		/* Case 3. */
 		if ((start > rp->er_start) && (end == rp->er_end)) {
 			rp->er_end = (start - 1);
+			DUMPEX(rp);
 			goto done;
 		}
 
@@ -1132,9 +1156,11 @@ extent_free(struct extent *ex, u_long start, u_long size, int flags)
 		/* Fill in new descriptor. */
 		nrp->er_start = end + 1;
 		nrp->er_end = rp->er_end;
+		DUMPEX(nrp);
 
 		/* Adjust current descriptor. */
 		rp->er_end = start - 1;
+		DUMPEX(rp);
 
 		/* Insert new descriptor after current. */
 		LIST_INSERT_AFTER(rp, nrp, er_link);
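The extent_free() hunks above add DUMPEX after each assignment in Cases 2 to 4. A simplified userland sketch of those cases follows, assuming regions are inclusive [start, end] ranges (er_end = start + (size - 1), as in the diff). struct region and region_free() are hypothetical, and Case 1, an exact match that removes the whole region, is assumed from subr_extent.c rather than shown in the patch.

```c
/* Hypothetical stand-in for struct extent_region: inclusive [start, end]. */
struct region {
	unsigned long start;
	unsigned long end;
};

/*
 * Free [start, end] from *rp, which must contain it
 * (rp->start <= start <= end <= rp->end). Returns how many regions
 * remain (0, 1 or 2); when 2, *split receives the upper piece, like
 * the new descriptor nrp in extent_free().
 */
static int
region_free(struct region *rp, unsigned long start, unsigned long end,
    struct region *split)
{
	/* Case 1: exact match, the whole region goes away. */
	if (start == rp->start && end == rp->end)
		return 0;
	/* Case 2: trim the front (writes er_start). */
	if (start == rp->start && end < rp->end) {
		rp->start = end + 1;
		return 1;
	}
	/* Case 3: trim the back (writes er_end). */
	if (start > rp->start && end == rp->end) {
		rp->end = start - 1;
		return 1;
	}
	/* Case 4: hole in the middle, split into two regions. */
	split->start = end + 1;
	split->end = rp->end;
	rp->end = start - 1;
	return 2;
}
```

Every branch except Case 1 rewrites a start or end field, which is exactly where the patch hooks DUMPEX.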
Re: HEADS UP: Merging drm update
Hi! Sorry for my bad English... I am a beginning user of NetBSD-Current. I own an Acer Revo Box RN86 (Intel 9400T CPU and UHD Graphics 630) with DisplayPort and HDMI connections. With the DisplayPort connection, NetBSD-Current with the new drm works on the Acer Revo RN86 (with some warnings) and graphics work. I also tested NetBSD-Current on an Acer Aspire XC-895 with an _HDMI_ connection and on an Intel NUC DN2820FYKH with an HDMI connection; in these cases the screen is blank on boot and the system stops. I think the problem is in HDMI mode with the NetBSD-Current drm. Firefox is currently building on NetBSD-Current, so I will send statistics later.

--Dmitrii
Re: Filesystem corruption in current 9.99.92 (posix1eacl & log enabled FFSv2)
On Thu, Dec 23, 2021 at 12:30:14PM +0100, Matthias Petermann wrote:
> Hello,
>
> for tracking down an FFS issue in current I would appreciate some advice.
> There is a NetBSD 9.99.92 Xen/PV VM (storage provided by file backed VND).
> The kernel is built from ~2012-11-27 CVS source. The root partition is a
> normal FFSv2 with WAPBL. In addition there is a data partition for which I
> have posix1eacls enabled (for samba network shares and sysvol).
>
> The data partition causes problems. Without the host being crashed or rudely
> shut down in the past, the filesystem seems to have become inconsistent. I
> first noticed this because the "find" of the daily cron job was still
> running late in the morning with 100% CPU load but no disk I/O ongoing.
>
> Then I took the filesystem offline for safety and forced a fsck. Errors were
> detected and solved:
>
> ```
> $ doas fsck -f NAME=export
> ** /dev/rdk3
> ** File system is already clean
> ** Last Mounted on /export
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> CG 31: PASS5: BAD MAGIC NUMBER
> ALTERNATE SUPERBLK(S) ARE INCORRECT
> SALVAGE? [yn]
>
> CG 31: PASS5: BAD MAGIC NUMBER
> ALTERNATE SUPERBLK(S) ARE INCORRECT
> SALVAGE? [yn] y
>
> SUMMARY INFORMATION BAD
> SALVAGE? [yn] y
>
> BLK(S) MISSING IN BIT MAPS
> SALVAGE? [yn] y
>
> CG 799: PASS5: BAD MAGIC NUMBER
> CG 801: PASS5: BAD MAGIC NUMBER
> CG 806: PASS5: BAD MAGIC NUMBER
> CG 823: PASS5: BAD MAGIC NUMBER
> CG 962: PASS5: BAD MAGIC NUMBER
> CG 966: PASS5: BAD MAGIC NUMBER
> 482470 files, 113827090 used, 67860178 free (3818 frags, 8482045 blocks,
> 0.0% fragmentation)
>
> * FILE SYSTEM WAS MODIFIED *
> ```
>
> I did not find too much information what this magic numbers of a cylinder
> group means and what could have caused them to be "bad" :-/

a "cylinder group" is a metadata structure in FFS that describes the allocation state of a portion of the blocks and inodes of the file system and contains the inode records themselves. the header for this structure also contains a "magic number" field that is supposed to contain a certain constant value as a way to sanity-check that this metadata on disk was not overwritten with some completely unrelated contents. in your case, since the magic number field does not actually contain the value that it's supposed to contain, we know that the storage underneath the file system has gotten corrupted somehow. you'll want to track down how that happened, but that is separate from your immediate problem.

> Anyway, a repeated fsck does not show further errors so I thought it should
> be fine. However, after mounting the FS to /export with
>
> ```
> $ find /export
> ```
>
> i can still trigger the above mentioned 100% CPU problem in a reproduce-able
> manner. Thereby find always hangs at the same directory entry.
>
> Does anyone have an idea how I can investigate this further? I have already
> done a ktrace on find, but in the state in question there seems to be no
> activity going on in find itself.
>
> Kind regards
> Matthias

this sounds like a bug I have seen before, where the extended attribute block for a file has been corrupted. please try the attached patch and see if this prevents the infinite loop.
if that does prevent the infinite loop, then the file will probably appear not to have an ACL anymore, and I'm not sure what will happen if you try to set a new ACL on the file when it is in this state.

for right now, the safest thing you can do will be to make a copy of the file without trying to preserve extended attributes (ie. do not use cp's "-p" option), then delete the original file, then move the copy of the file to have the original file's name, then you can change the new file's owner/group/mode/ACL to be what the original file had.

-Chuck


Index: sys/ufs/ffs/ffs_extattr.c
===
RCS file: /home/chs/netbsd/cvs/src/sys/ufs/ffs/ffs_extattr.c,v
retrieving revision 1.8
diff -u -p -r1.8 ffs_extattr.c
--- sys/ufs/ffs/ffs_extattr.c	14 Dec 2021 11:06:50 -	1.8
+++ sys/ufs/ffs/ffs_extattr.c	23 Dec 2021 16:52:18 -
@@ -393,6 +393,9 @@ ffs_findextattr(u_char *ptr, u_int lengt
 		/* make sure this entry is complete */
 		if (EXTATTR_NEXT(eap) > eaend)
 			break;
+		/* handle corrupted ea_length */
+		if (EXTATTR_NEXT(eap) < eap + 1)
+			break;
 		if (eap->ea_namespace != nspace || eap->ea_namelength != nlen ||
 		    memcmp(eap->ea_name, name, nlen) != 0)
 			continue;
@@ -857,6 +860,9 @@ ffs_listextattr(void *v)
 		/* make sure this entry is complete */
 		if (EXTATTR_NEXT(eap) > eaend)
 			break;
+		/* handle corrupted ea_length */
+		if
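The patch guards a record walk that advances by each entry's own length. A minimal sketch of why a corrupted ea_length hangs such a loop, using a hypothetical struct ea_hdr and ea_count() rather than the real struct extattr and EXTATTR_NEXT(): a length of 0 would leave the offset unchanged forever, so, like Chuck's added check, the walk stops at any record shorter than its own header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; the real struct extattr differs. */
struct ea_hdr {
	uint32_t ea_length;	/* total record length in bytes */
	uint8_t  ea_namespace;
	uint8_t  ea_namelength;
};

/*
 * Count the records in buf[0..len), stopping at the first one that is
 * truncated or has a corrupted length.
 */
static size_t
ea_count(const unsigned char *buf, size_t len)
{
	struct ea_hdr h;
	size_t off = 0, n = 0;

	while (off + sizeof(h) <= len) {
		memcpy(&h, buf + off, sizeof(h));	/* avoid unaligned access */
		/* make sure this entry is complete */
		if (h.ea_length > len - off)
			break;
		/* handle corrupted ea_length: 0 would never advance off */
		if (h.ea_length < sizeof(h))
			break;
		n++;
		off += h.ea_length;
	}
	return n;
}
```

Without the second check, a zero-filled tail after valid records would make this loop spin at 100% CPU with no I/O, which matches the symptom Matthias saw in find.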
Re: HEADS UP: Merging drm update
Hi,

Ryo ONODERA writes:

> Hi,
>
> Taylor R Campbell writes:
>
>>> Date: Thu, 23 Dec 2021 14:56:19 +
>>> From: Taylor R Campbell
>>>
>>> I'm wondering whether the two bus_space_maps in intel_opregion_setup
>>> overlap, and whether one needs to be a bus_space_subregion or
>>> something.
>>
>> Never mind, that's a red herring and obviously not what's happening
>> here.
>>
>> Maybe it's bus_space_alloc that's taking the address, not
>> bus_space_map or bus_space_reserve at all? Can you put the same
>> db_stacktrace treatment into extent_alloc_subregion1? Maybe every
>> time an extent_region is inserted into the list or every time anything
>> writes to er_start, print its address, start, end, and stack trace?
>
> I will add db_stacktrace() to extent_alloc_subregion1()
> not to extent_alloc_subregion().
> Please give me some minutes.

I have applied this patch:

Index: sys/kern/subr_extent.c
===
RCS file: /cvsroot/src/sys/kern/subr_extent.c,v
retrieving revision 1.89
diff -u -r1.89 subr_extent.c
--- sys/kern/subr_extent.c	15 Aug 2019 09:04:22 -	1.89
+++ sys/kern/subr_extent.c	23 Dec 2021 17:05:00 -
@@ -51,6 +51,8 @@
 
 #include
 
+#include
+
 #elif defined(_EXTENT_TESTING)
 
 /*
@@ -357,6 +359,12 @@
 extent_insert_and_optimize(struct extent *ex, u_long start, u_long size,
     int flags, struct extent_region *after, struct extent_region *rp)
 {
+#if 1
+	if (start <= 0x63ec5018 && 0x63ec5018 < start + size) {
+		printf("extent_insert_and_optimize: Found!!! start=0x%lx, size=0x%lx\n", start, size);
+		db_stacktrace();
+	}
+#endif
 	struct extent_region *nextr;
 	int appended = 0;
 
@@ -461,6 +469,13 @@
 int
 extent_alloc_region(struct extent *ex, u_long start, u_long size, int flags)
 {
+#if 1
+	printf("extent_alloc_region handles addr=0x%lx, size=0x%lx\n", start, size);
+	if (start <= 0x63ec5018 && 0x63ec5018 < start + size) {
+		printf("extent_alloc_region: Found!!! start=0x%lx, size=0x%lx\n", start, size);
+		db_stacktrace();
+	}
+#endif
 	struct extent_region *rp, *last, *myrp;
 	u_long end = start + (size - 1);
 	int error;
@@ -555,6 +570,7 @@
 		 * Check for a conflict.
 		 */
 		if (rp->er_end >= start) {
+			printf("extent_alloc_region: conflict: reserved: rp->er_start=0x%lx to rp->er_end=0x%lx, try: start=0x%lx to end=0x%lx\n", rp->er_start, rp->er_end, start, end);
 			/*
 			 * We conflict.  If we can (and want to) wait,
 			 * do so.
@@ -619,6 +635,13 @@
     u_long size, u_long alignment, u_long skew, u_long boundary, int flags,
     u_long *result)
 {
+#if 1
+	if (substart <= 0x63ec5018 && 0x63ec5018 < subend) {
+		printf("extent_alloc_subregion1: Found!!! substart=0x%lx, subend=0x%lx, size=0x%lx\n", substart, subend, size);
+		db_stacktrace();
+	}
+#endif
+
 	struct extent_region *rp, *myrp, *last, *bestlast;
 	u_long newstart, newend, exend, beststart, bestovh, ovh;
 	u_long dontcross;
@@ -992,7 +1015,6 @@
 extent_alloc_subregion(struct extent *ex, u_long start, u_long end,
     u_long size, u_long alignment, u_long boundary, int flags, u_long *result)
 {
-
 	return (extent_alloc_subregion1(ex, start, end, size, alignment,
 	    0, boundary, flags, result));
 }
Index: sys/external/bsd/drm2/dist/drm/i915/display/intel_opregion.c
===
RCS file: /cvsroot/src/sys/external/bsd/drm2/dist/drm/i915/display/intel_opregion.c,v
retrieving revision 1.4
diff -u -r1.4 intel_opregion.c
--- sys/external/bsd/drm2/dist/drm/i915/display/intel_opregion.c	19 Dec 2021 11:49:11 -	1.4
+++ sys/external/bsd/drm2/dist/drm/i915/display/intel_opregion.c	23 Dec 2021 17:05:01 -
@@ -927,6 +927,7 @@
 int
 intel_opregion_setup(struct drm_i915_private *dev_priv)
 {
+	printf("Enter intel_opregion_setup\n");
 	struct intel_opregion *opregion = &dev_priv->opregion;
 	struct pci_dev *pdev = dev_priv->drm.pdev;
 	u32 asls, mboxes;
@@ -953,11 +954,15 @@
 
 #ifdef __NetBSD__
 	opregion->bst = pdev->pd_pa.pa_memt;
+	printf("intel_opregion_setup: asls=0x%x\n", asls);
 	err = -bus_space_map(opregion->bst, asls, OPREGION_SIZE,
 	    BUS_SPACE_MAP_LINEAR|BUS_SPACE_MAP_CACHEABLE,
 	    &opregion->asls_bsh);
 	if (err) {
 		DRM_DEBUG_DRIVER("Failed to map opregion: %d\n", err);
+#if 1
+		panic("Failed to map opregion: %d\n", err);
+#endif
 		return err;
 	}
 	base = bus_space_vaddr(opregion->bst, opregion->asls_bsh);
@@ -1035,6 +1040,7 @@
 	}
 
 #ifdef __NetBSD__
+	printf("intel_opregion_set
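Ryo's instrumentation fires only when a range contains the first byte of the OpRegion (0x63ec5018), whereas Taylor's DUMPEX filter looks at the whole 8 KB window starting there. A minimal sketch of the difference, assuming (as in subr_extent.c) that a region's er_end is inclusive; contains_addr() and overlaps_window() are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>

/* Ryo-style test: does [start, start + size) contain the address addr? */
static bool
contains_addr(unsigned long start, unsigned long size, unsigned long addr)
{
	return start <= addr && addr < start + size;
}

/*
 * Overlap test: does the inclusive region [er_start, er_end] share any
 * byte with the half-open window [win_start, win_start + win_size)?
 */
static bool
overlaps_window(unsigned long er_start, unsigned long er_end,
    unsigned long win_start, unsigned long win_size)
{
	return er_start < win_start + win_size && er_end >= win_start;
}
```

A region such as [0x63ec6000, 0x63ec6fff] lies inside the OpRegion but does not contain 0x63ec5018, so only the overlap test would report it.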
Re: HEADS UP: Merging drm update
Hi,

Taylor R Campbell writes:

>> Date: Thu, 23 Dec 2021 14:56:19 +
>> From: Taylor R Campbell
>>
>> I'm wondering whether the two bus_space_maps in intel_opregion_setup
>> overlap, and whether one needs to be a bus_space_subregion or
>> something.
>
> Never mind, that's a red herring and obviously not what's happening
> here.
>
> Maybe it's bus_space_alloc that's taking the address, not
> bus_space_map or bus_space_reserve at all? Can you put the same
> db_stacktrace treatment into extent_alloc_subregion1? Maybe every
> time an extent_region is inserted into the list or every time anything
> writes to er_start, print its address, start, end, and stack trace?

I will add db_stacktrace() to extent_alloc_subregion1()
not to extent_alloc_subregion().
Please give me some minutes.

-- 
Ryo ONODERA // r...@tetera.org
PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB FD1B F404 27FA C7D1 15F3
Re: HEADS UP: Merging drm update
> Date: Thu, 23 Dec 2021 14:56:19 +
> From: Taylor R Campbell
>
> I'm wondering whether the two bus_space_maps in intel_opregion_setup
> overlap, and whether one needs to be a bus_space_subregion or
> something.

Never mind, that's a red herring and obviously not what's happening
here.

Maybe it's bus_space_alloc that's taking the address, not
bus_space_map or bus_space_reserve at all? Can you put the same
db_stacktrace treatment into extent_alloc_subregion1? Maybe every
time an extent_region is inserted into the list or every time anything
writes to er_start, print its address, start, end, and stack trace?
Re: HEADS UP: Merging drm update
> Date: Thu, 23 Dec 2021 20:53:00 +0900
> From: Ryo ONODERA
>
> I have added panic()s to extent_alloc_region and
> extent_insert_and_optimize.
> And the kernel with this change does not display anything at all.
> After bootloader, my LCD turns black and stays black forever.
> I have removed panic()s after db_stacktrace()s and added only
> db_stacktrace() to extent_alloc_region and extent_insert_and_optimize.
> And I have gotten the following dmesg.
> (I feel that this has no new information.)

Can you use pcictl to dump the pci config registers of the graphics device, and dump the content of whatever address is in register 0xfc? Something like:

    # pcictl pci0 dump -b 0 -d 2 -f 0
    PCI configuration registers:
    Common header:
    0x00: 0x01668086 0x0097 0x0309 0x
    Vendor Name: Intel (0x8086)
    Device Name: Ivy Bridge Integrated Graphics Device (0x0166)
    ...
    # pcictl pci0 read -b 0 -d 2 -f 0 0xfc
    daf4f018
    # dd if=/dev/mem iseek=$((0xdaf4f018)) bs=1 count=8192 | hexdump -C

Separately, if you can get dmesgs out -- can you print asls and rvda in intel_opregion_setup? I'm wondering whether the two bus_space_maps in intel_opregion_setup overlap, and whether one needs to be a bus_space_subregion or something.
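Once the dd output is saved to a file, a quick sanity check is whether the bytes at the ASLS address really start with an OpRegion header. The sketch below assumes the 16-byte "IntelGraphicsMem" signature that the i915 driver defines as OPREGION_SIGNATURE, and the 8 KB OPREGION_SIZE used in the patches in this thread; opregion_check_file() and the file name opregion.bin are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed OpRegion header signature (16 bytes, no NUL stored in memory). */
#define OPREGION_SIGNATURE	"IntelGraphicsMem"
#define OPREGION_SIZE		(8 * 1024)

/* Return 1 if buf begins with the OpRegion signature, 0 otherwise. */
static int
opregion_signature_ok(const unsigned char *buf, size_t len)
{
	const size_t siglen = sizeof(OPREGION_SIGNATURE) - 1;	/* 16 */

	return len >= siglen && memcmp(buf, OPREGION_SIGNATURE, siglen) == 0;
}

/*
 * Read up to OPREGION_SIZE bytes from path (for example, the output of
 * "dd if=/dev/mem ... > opregion.bin") and check the signature.
 * Returns 1 if found, 0 if not, -1 if the file cannot be opened.
 */
static int
opregion_check_file(const char *path)
{
	static unsigned char buf[OPREGION_SIZE];
	size_t n;
	FILE *fp;

	if ((fp = fopen(path, "rb")) == NULL)
		return -1;
	n = fread(buf, 1, sizeof(buf), fp);
	fclose(fp);
	return opregion_signature_ok(buf, n);
}
```

If the signature is missing, the address read from register 0xfc probably does not point at a valid OpRegion; if it is present, the problem more likely lies in how the range is reserved in the extent map.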
Re: ixg weirdness
On Wed, 22 Dec 2021, Patrick Welche wrote:

> On Wed, Dec 22, 2021 at 01:34:25PM +0100, Hauke Fath wrote:
>> On Wed, 22 Dec 2021 12:26:21 +, Patrick Welche wrote:
>>> The box in 53155 is Hauke's - also a Dell, but slightly different model.
>>
>> he@, not hauke@ -- no Dell boxes here.
>
> Sorry - Havard's!
>
> On the 51355 front, dholland asks if the 2 bnx hang issue is the same as 47229, and it looks like it. From the email threads quoted in 47229, the gist seems to be that the issue doesn't exist on /i386, just /amd64.

I reported something similar on an IBM x3550M3 back in 2019, too:

http://mail-index.netbsd.org/tech-net/2019/03/19/msg007302.html

--
Stephen
Filesystem corruption in current 9.99.92 (posix1eacl & log enabled FFSv2)
Hello,

to track down an FFS issue in current, I would appreciate some advice. There is a NetBSD 9.99.92 Xen/PV VM (storage provided by a file-backed VND). The kernel is built from ~2012-11-27 CVS source. The root partition is a normal FFSv2 with WAPBL. In addition, there is a data partition for which I have posix1eacls enabled (for Samba network shares and sysvol).

The data partition causes problems. Although the host has not crashed or been shut down uncleanly, the filesystem seems to have become inconsistent. I first noticed this because the "find" of the daily cron job was still running late in the morning with 100% CPU load but no disk I/O going on.

I then took the filesystem offline for safety and forced an fsck. Errors were detected and fixed:

```
$ doas fsck -f NAME=export
** /dev/rdk3
** File system is already clean
** Last Mounted on /export
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
CG 31: PASS5: BAD MAGIC NUMBER
ALTERNATE SUPERBLK(S) ARE INCORRECT
SALVAGE? [yn]

CG 31: PASS5: BAD MAGIC NUMBER
ALTERNATE SUPERBLK(S) ARE INCORRECT
SALVAGE? [yn] y

SUMMARY INFORMATION BAD
SALVAGE? [yn] y

BLK(S) MISSING IN BIT MAPS
SALVAGE? [yn] y

CG 799: PASS5: BAD MAGIC NUMBER
CG 801: PASS5: BAD MAGIC NUMBER
CG 806: PASS5: BAD MAGIC NUMBER
CG 823: PASS5: BAD MAGIC NUMBER
CG 962: PASS5: BAD MAGIC NUMBER
CG 966: PASS5: BAD MAGIC NUMBER
482470 files, 113827090 used, 67860178 free (3818 frags, 8482045 blocks, 0.0% fragmentation)

* FILE SYSTEM WAS MODIFIED *
```

I could not find much information about what the magic number of a cylinder group means or what could have caused it to be "bad" :-/

Anyway, a repeated fsck shows no further errors, so I thought it should be fine. However, after mounting the FS on /export, running

```
$ find /export
```

still reproducibly triggers the 100% CPU problem mentioned above. find always hangs at the same directory entry.

Does anyone have an idea how I can investigate this further? I have already run ktrace on find, but in the state in question there seems to be no activity going on in find itself.

Kind regards
Matthias