Automated report: NetBSD-current/i386 build success
The NetBSD-current/i386 build is working again.

The following commits were made between the last failed build and the successful build:

    2021.12.30.02.14.55 rillig src/usr.bin/make/cond.c,v 1.321
    2021.12.30.02.14.55 rillig src/usr.bin/make/unit-tests/cond-token-plain.exp,v 1.14
    2021.12.30.02.14.55 rillig src/usr.bin/make/unit-tests/cond-token-plain.mk,v 1.15
    2021.12.30.02.30.53 christos src/distrib/sets/sets.subr,v 1.199

Logs can be found at:

    http://releng.NetBSD.org/b5reports/i386/commits-2021.12.html#2021.12.30.02.30.53
daily CVS update output
Updating src tree:
P src/build.sh
P src/distrib/sets/sets.subr
P src/distrib/sets/lists/debug/mi
P src/etc/Makefile
P src/external/bsd/openldap/sbin/slapd/Makefile
P src/sys/conf/Makefile.kern.inc
P src/sys/external/bsd/drm2/dist/drm/drm_print.c
P src/sys/uvm/pmap/pmap_tlb.c
P src/usr.bin/make/cond.c
P src/usr.bin/make/make.h
P src/usr.bin/make/nonints.h
P src/usr.bin/make/parse.c
P src/usr.bin/make/var.c
P src/usr.bin/make/unit-tests/cond-token-plain.exp
P src/usr.bin/make/unit-tests/cond-token-plain.mk
Updating xsrc tree:
P xsrc/external/mit/xinit/dist/xinitrc.cpp
Killing core files:
Updating file list:
-rw-rw-r-- 1 srcmastr netbsd 41492879 Dec 30 03:03 ls-lRA.gz
Automated report: NetBSD-current/i386 build failure
This is an automatically generated notice of a NetBSD-current/i386 build failure.

The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host, using sources from CVS date 2021.12.29.22.22.13.

An extract from the build.sh output follows:

./usr/share/man/man3/SQLITE_SESSION_MONOLITHIC_STRMSIZE.3
./usr/share/man/man3/SQLITE_SESSION_XEN3PAE_DOM0_STRMSIZE.3
./usr/share/man/man3/SQLITE_SESSION_XEN3PAE_DOMU_STRMSIZE.3
./usr/share/man/man3/SQLITE_XEN3PAE_DOM0_SINGLETHREAD.3
./usr/share/man/man3/SQLITE_XEN3PAE_DOMU_SINGLETHREAD.3
end of 84 missing files ==
*** Failed target: checkflist
*** Failed commands:
    ${SETSCMD} ${.CURDIR}/checkflist ${MAKEFLIST_FLAGS} ${CHECKFLIST_FLAGS} ${METALOG.unpriv}
    => cd /tmp/build/2021.12.29.22.22.13-i386/src/distrib/sets && DESTDIR=/tmp/build/2021.12.29.22.22.13-i386/destdir MACHINE=i386 MACHINE_ARCH=i386 AWK=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbawk CKSUM=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbcksum DB=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbdb EGREP=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbgrep\ -E HOST_SH=/bin/sh MAKE=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbmake MKTEMP=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbmktemp MTREE=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbmtree PAX=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbpax COMPRESS_PROGRAM=gzip GZIP=-n XZ_OPT=-9 TAR_SUFF=tgz PKG_CREATE=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbpkg_create SED=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbsed TSORT=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbtsort\ -q /bin/sh /tmp/build/2021.12.29.22.22.13-i386/src/distrib/sets/checkflist -L base -M /tmp/build/2021.12.29.22.22.13-i386/destdir/METALOG.sanitised
*** [checkflist] Error code 1
nbmake[2]: stopped in /tmp/build/2021.12.29.22.22.13-i386/src/distrib/sets
1 error
nbmake[2]: stopped in /tmp/build/2021.12.29.22.22.13-i386/src/distrib/sets
nbmake[1]: stopped in /tmp/build/2021.12.29.22.22.13-i386/src
nbmake: stopped in /tmp/build/2021.12.29.22.22.13-i386/src
ERROR: Failed to make release

The following commits were made between the last successful build and the failed build:

    2021.12.29.22.22.12 christos src/build.sh,v 1.360
    2021.12.29.22.22.13 christos src/distrib/sets/lists/debug/mi,v 1.370
    2021.12.29.22.22.13 christos src/distrib/sets/sets.subr,v 1.198
    2021.12.29.22.22.13 christos src/etc/Makefile,v 1.456
    2021.12.29.22.22.13 christos src/sys/conf/Makefile.kern.inc,v 1.286

Logs can be found at:

    http://releng.NetBSD.org/b5reports/i386/commits-2021.12.html#2021.12.29.22.22.13
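Some background on the failing target: checkflist compares the set lists with what the build actually installed into DESTDIR. The failing command above passes the METALOG to it with -M. A "missing" file is one that appears in the set lists but was not installed. The C sketch below shows only that comparison, as a merge over two sorted name lists. report_missing() and its in-memory inputs are hypothetical and do not reflect how the real checkflist shell script is written.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort(3) comparator for an array of C strings. */
static int
cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Hypothetical helper: print every entry of `listed` (paths named in
 * the set lists) that is absent from `installed` (paths present in
 * DESTDIR), and return how many there were.  Both arrays are sorted
 * in place, then walked in a single merge-style pass.
 */
static size_t
report_missing(const char **listed, size_t nlisted,
    const char **installed, size_t ninstalled)
{
	size_t i = 0, j = 0, missing = 0;

	assert(listed != NULL && installed != NULL);
	qsort(listed, nlisted, sizeof(*listed), cmp_str);
	qsort(installed, ninstalled, sizeof(*installed), cmp_str);

	while (i < nlisted) {
		/* Past the end of `installed`: everything left is missing. */
		int c = (j < ninstalled) ? strcmp(listed[i], installed[j]) : -1;

		if (c < 0) {		/* listed but not installed */
			printf("%s\n", listed[i++]);
			missing++;
		} else if (c > 0) {	/* installed but not listed: not counted */
			j++;
		} else {		/* present in both */
			i++;
			j++;
		}
	}
	printf("end of %zu missing files\n", missing);
	return missing;
}
```

The function prints each missing path followed by a closing "end of N missing files" line, which loosely echoes the shape of the report above.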
Re: mpii panic on 9.99.92
Hi,

> On 29.12.2021 at 19:36, Michael van Elst wrote:
>
> os...@fessel.org ("os...@fessel.org") writes:
>
>> Hi there,
>> does anyone have a clue why this is happening more frequently on
>> 9.99.92?
>> If not, I probably should send-pr that.
>
> You should definitely.

OK. Will do.

> N.B. I'd rather suspect a Xen issue.

Probably not only. It happened on this machine before it ran Xen, only less often.

Cheers
Oskar
Re: Filesystem corruption in current 9.99.92 (posix1eacl & log enabled FFSv2)
On Wed, Dec 29, 2021 at 08:01:53PM +0100, Matthias Petermann wrote:
> Hello,
>
> On 27.12.21 06:20, Matthias Petermann wrote:
> > I did not try to move the file around as you recommended because I would
> > like to ask if there is anything I can do at this point to gather more
> > diagnostic data to help understand the root cause?
>
> in the meantime I migrated all files to a freshly created filesystem using
> the patched kernel and so "solved" the problem for now.
>
> The broken filesystem still exists, but I am now running out of space on the
> host (the filesystems are in sparse allocated VNDs). I would have to delete
> the broken filesystem in a timely manner, but would still like to run
> diagnostic steps on the root cause first, if any. Unfortunately, the
> filesystem is very large and remote, and I don't know how to reasonably
> isolate the affected portion to save space for further analysis. Are there
> any other reasonable steps I could do asap?

let me write something to allow you to extract the contents of the
corrupted extattr block, so that I can reproduce your exact situation
on a test machine. once we have a copy of the corrupted block then you
can reclaim the existing fs image. hopefully I'll have that later today.

> One more question I would have about the patch. It helped very well to avoid
> the freeze when working in such a corrupted filesystem. In this case, the
> filesystem behaves as you described - no ACL is applied or issued on the
> affected directory. When I try to set a new ACL on the affected directory,
> it seems to have no effect, but no error message appears. Would it make
> sense to include the patch with appropriate error logging in the official
> sources, so that when the problem occurs for which we do not know the cause
> at the moment, we will at least get some output (instead of the current
> behavior - the infinite loop)?
yea, the final patch will include printing a message on the console, and also include some way to restore the ability to set ACLs on the file without needing to recreate the file (let alone recreating the whole fs). that will take a little longer though. -Chuck
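The thread does not include Chuck's extraction tool. As a rough illustration of the kind of extraction he describes, the sketch below uses pread(2) to read one block out of a filesystem image so that it can be saved elsewhere before the image is reclaimed. read_block() is a hypothetical helper. It assumes the byte offset is simply blkno * bsize, so the caller must already know the corrupted block's address and size in matching units. For example, FFS disk addresses are counted in fragments, not full blocks.

```c
#define _XOPEN_SOURCE 700	/* for pread(2) under strict C modes */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>		/* open(2), used by callers */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read block number `blkno` of size `bsize` from
 * the image open on `fd` into `buf`.  Assumes offset = blkno * bsize.
 * Returns 0 on success, -1 on I/O error (errno set), or -2 if the
 * image ends before the whole block has been read.
 */
static int
read_block(int fd, uint64_t blkno, size_t bsize, void *buf)
{
	char *p = buf;
	off_t off = (off_t)(blkno * (uint64_t)bsize);
	size_t done = 0;

	assert(buf != NULL && bsize > 0);
	while (done < bsize) {
		ssize_t n = pread(fd, p + done, bsize - done, off + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			return -1;
		}
		if (n == 0)
			return -2;		/* image is shorter than expected */
		done += (size_t)n;
	}
	return 0;
}
```

A caller would open(2) the image file backing the vnd read-only, call read_block(), and write the buffer to a small file that can be copied off the host.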
Re: Filesystem corruption in current 9.99.92 (posix1eacl & log enabled FFSv2)
Hello,

On 27.12.21 06:20, Matthias Petermann wrote:
> I did not try to move the file around as you recommended because I would
> like to ask if there is anything I can do at this point to gather more
> diagnostic data to help understand the root cause?

In the meantime I migrated all files to a freshly created filesystem using the patched kernel and so "solved" the problem for now.

The broken filesystem still exists, but I am now running out of space on the host (the filesystems are in sparse allocated VNDs). I would have to delete the broken filesystem in a timely manner, but would still like to run diagnostic steps on the root cause first, if any. Unfortunately, the filesystem is very large and remote, and I don't know how to reasonably isolate the affected portion to save space for further analysis. Are there any other reasonable steps I could do asap?

One more question I would have about the patch. It helped very well to avoid the freeze when working in such a corrupted filesystem. In this case, the filesystem behaves as you described - no ACL is applied or issued on the affected directory. When I try to set a new ACL on the affected directory, it seems to have no effect, but no error message appears. Would it make sense to include the patch with appropriate error logging in the official sources, so that when the problem occurs for which we do not know the cause at the moment, we will at least get some output (instead of the current behavior - the infinite loop)?

Kind regards
Matthias
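Matthias asks whether the patch could log an error instead of looping forever. The thread does not say what made the loop infinite, so the following is only a generic sketch of the technique: when walking variable-length records, validate each length before advancing, and report corruption rather than spinning. The record layout (struct rec_hdr) and count_records() are hypothetical and are not the FFS extended-attribute format. fprintf(stderr) stands in for a kernel console message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical record header, for illustration only (NOT the on-disk
 * FFS extattr layout): each record starts with a 32-bit length that
 * covers the whole record, header included.
 */
struct rec_hdr {
	uint32_t len;
};

/*
 * Count the records in buf[0..size).  Each length is validated before
 * the walk advances, so a corrupt length is reported and the walk
 * stops instead of looping forever.  Returns the count, or -1.
 */
static int
count_records(const unsigned char *buf, size_t size)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL || size == 0);
	while (off < size) {
		struct rec_hdr h;

		if (size - off < sizeof(h)) {
			fprintf(stderr, "corrupt area: truncated header at offset %zu\n", off);
			return -1;
		}
		memcpy(&h, buf + off, sizeof(h));	/* avoid unaligned loads */
		/* len == 0 would never advance `off`; too large overruns. */
		if (h.len < sizeof(h) || h.len > size - off) {
			fprintf(stderr, "corrupt area: bad record length %lu at offset %zu\n",
			    (unsigned long)h.len, off);
			return -1;
		}
		off += h.len;
		n++;
	}
	return n;
}
```

Without the length check, a record whose length field is 0 would leave `off` unchanged and the loop would never terminate. With the check, count_records() logs the offset and returns -1.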
Re: mpii panic on 9.99.92
os...@fessel.org ("os...@fessel.org") writes:

>Hi there,
>does anyone have a clue why this is happening more frequently on
>9.99.92?
>If not, I probably should send-pr that.

You should definitely.

N.B. I'd rather suspect a Xen issue.
Re: HEADS UP: Merging drm update
Hi,

Taylor R Campbell writes:

>> Date: Tue, 28 Dec 2021 11:34:43 +0900
>> From: Ryo ONODERA
>>
>> intel_gt_pm_fini() at netbsd:intel_gt_pm_fini+0x18
>> intel_gt_init() at netbsd:intel_gt_init+0x6ad
>> i915_gem_init() at netbsd:i915_gem_init+0x14d
>> i915_driver_probe() at netbsd:i915_driver_probe+0x949
>> i915drmkms_attach_real() at netbsd:i915drmkms_attach_real+0x4c
>> config_mountroot_thread() at netbsd:config_mountroot_thread+0x60
>
> So intel_gt_init is failing on boot, and the driver has decided to
> give up -- and proximate cause of the crash is that one of the error
> branches is screwy, but while it would be nice to fix the error
> branches it's more important to find why we're reaching them in the
> first place.
>
> Can you get a line number for intel_gt_init+0x6ad, and can you also
> insert prints into every error branch of intel_gt_init to find out
> which one it is and how it fails? And maybe do that recursively in
> whichever branch does fail?

In sys/external/bsd/drm2/dist/drm/i915/gt/intel_gt.c: intel_gt_init(),
__engines_record_defaults(gt) failed and went to the err_gt label, then
the panic happened. "intel_gt_init+0x6ad" is err_uc_init's
intel_uc_fini(&gt->uc).

(snip)
	err = __engines_record_defaults(gt);
	if (err)
		goto err_gt;
(snip)
err_gt:
	__intel_gt_disable(gt);
	intel_uc_fini_hw(&gt->uc);
err_uc_init:
	intel_uc_fini(&gt->uc);
err_engines:
	intel_engines_release(gt);
	i915_vm_put(fetch_and_zero(&gt->vm));
err_pm:
	intel_gt_pm_fini(gt);
	intel_gt_fini_scratch(gt);
out_fw:
	if (err)
(snip)

And I have added some printfs to __engines_record_defaults() and the
other functions invoked from __engines_record_defaults(), as follows:

__engines_record_defaults
intel_gt_wait_for_idle
intel_gt_retire_requests_timeout
dma_fence_wait_timeout
i915_fence_wait (via *fence->ops->wait)
i915_request_wait

In i915_request_wait, DRM_SPIN_TIMED_WAIT_UNTIL sets timeout=0 and
i915_request_wait returns timeout=-ETIME.

#ifdef __NetBSD__
	spin_lock(rq->fence.lock);
#define	C	(i915_request_completed(rq) ? 1 :		\
	    (spin_unlock(rq->fence.lock),			\
		intel_engine_flush_submission(rq->engine),	\
		spin_lock(rq->fence.lock),			\
		i915_request_completed(rq)))
	if (flags & I915_WAIT_INTERRUPTIBLE) {
		DRM_SPIN_TIMED_WAIT_UNTIL(timeout, , rq->fence.lock,
		    timeout,
		    C);
	} else {
		DRM_SPIN_TIMED_WAIT_NOINTR_UNTIL(timeout, , rq->fence.lock,
		    timeout,
		    C);
	}
#undef	C
	if (timeout > 0) {	/* succeeded before timeout */
		KASSERT(i915_request_completed(rq));
		dma_fence_signal_locked(&rq->fence);
	} else if (timeout == 0) {	/* timed out */
		timeout = -ETIME;
	}
	spin_unlock(rq->fence.lock);
	DRM_DESTROY_WAITQUEUE();
#else

Thank you.

-- 
Ryo ONODERA // r...@tetera.org
PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB FD1B F404 27FA C7D1 15F3
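Ryo found that a failure jumping to err_gt ends up crashing in err_uc_init's intel_uc_fini(). That follows from the goto-based unwind style: each label undoes one setup step and then falls through into the labels below it. A self-contained C sketch of that shape follows. step(), undo(), and the recorded step names are hypothetical stand-ins, not the real i915 functions.

```c
#include <assert.h>
#include <string.h>	/* strcmp(), for checking the recorded order */

#define MAXUNDO	8

/* Cleanups run on the error path, in the order they ran. */
static const char *undone[MAXUNDO];
static int nundone;

/* Hypothetical cleanup: just record its name. */
static void
undo(const char *what)
{
	assert(nundone < MAXUNDO);
	undone[nundone++] = what;
}

/* Hypothetical setup step: fails (returns -1) only when i == fail_at. */
static int
step(int i, int fail_at)
{
	return i == fail_at ? -1 : 0;
}

/*
 * Same shape as the quoted intel_gt_init() unwind: each label undoes
 * one setup step and falls through to the labels below it.
 */
static int
init_sketch(int fail_at)
{
	int err;

	nundone = 0;
	if ((err = step(0, fail_at)) != 0)	/* e.g. engine setup */
		goto out;
	if ((err = step(1, fail_at)) != 0)	/* e.g. uc init */
		goto err_engines;
	if ((err = step(2, fail_at)) != 0)	/* e.g. hw init */
		goto err_uc_init;
	if ((err = step(3, fail_at)) != 0)	/* e.g. record defaults */
		goto err_gt;
	return 0;

err_gt:
	undo("uc_fini_hw");
err_uc_init:
	undo("uc_fini");
err_engines:
	undo("engines_release");
out:
	return err;
}
```

Consider fail_at = 3, the analogue of __engines_record_defaults() failing. The recorded cleanups are uc_fini_hw, uc_fini, and engines_release, in that order. So the code under err_uc_init runs even though the jump targeted err_gt.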
mpii panic on 9.99.92
Hi there,

Does anyone have a clue why this is happening more frequently on 9.99.92?
If not, I probably should send-pr that.

Any advice on an alternative rock-solid SATA controller for the DL380 G8 that does not rev up the fans?

Cheers
Oskar

[ 184523.7505073] panic: kernel diagnostic assertion "xs->resid == xs->datalen" failed: file "/hurz/src/sys/dev/pci/mpii.c", line 3207
[ 184523.7505073] cpu0: Begin traceback...
[ 184523.7505073] vpanic() at netbsd:vpanic+0x14a
[ 184523.7604867] kern_assert() at netbsd:kern_assert+0x4b
[ 184523.7604867] mpii_scsi_cmd_done() at netbsd:mpii_scsi_cmd_done+0x30b
[ 184523.7604867] mpii_intr() at netbsd:mpii_intr+0x21e
[ 184523.7604867] evtchn_do_event() at netbsd:evtchn_do_event+0x10d
[ 184523.7604867] do_hypervisor_callback() at netbsd:do_hypervisor_callback+0x167
[ 184523.7704848] Xhandle_hypervisor_callback() at netbsd:Xhandle_hypervisor_callback+0x19
[ 184523.7704848] --- interrupt ---
[ 184523.7704848] hypercall_page() at netbsd:hypercall_page+0x3aa
[ 184523.7704848] idle_loop() at netbsd:idle_loop+0x11f
[ 184523.7704848] cpu0: End traceback...
[ 184523.7704848] dumping to dev 168,9 (offset=33482590, size=0): not possible
[ 184523.7704848] rebooting...
(XEN) Hardware Dom0 shutdown: rebooting machine