Automated report: NetBSD-current/i386 build success

2021-12-29 Thread NetBSD Test Fixture
The NetBSD-current/i386 build is working again.

The following commits were made between the last failed build and the
successful build:

2021.12.30.02.14.55 rillig src/usr.bin/make/cond.c,v 1.321
2021.12.30.02.14.55 rillig src/usr.bin/make/unit-tests/cond-token-plain.exp,v 1.14
2021.12.30.02.14.55 rillig src/usr.bin/make/unit-tests/cond-token-plain.mk,v 1.15
2021.12.30.02.30.53 christos src/distrib/sets/sets.subr,v 1.199

Logs can be found at:


http://releng.NetBSD.org/b5reports/i386/commits-2021.12.html#2021.12.30.02.30.53


daily CVS update output

2021-12-29 Thread NetBSD source update


Updating src tree:
P src/build.sh
P src/distrib/sets/sets.subr
P src/distrib/sets/lists/debug/mi
P src/etc/Makefile
P src/external/bsd/openldap/sbin/slapd/Makefile
P src/sys/conf/Makefile.kern.inc
P src/sys/external/bsd/drm2/dist/drm/drm_print.c
P src/sys/uvm/pmap/pmap_tlb.c
P src/usr.bin/make/cond.c
P src/usr.bin/make/make.h
P src/usr.bin/make/nonints.h
P src/usr.bin/make/parse.c
P src/usr.bin/make/var.c
P src/usr.bin/make/unit-tests/cond-token-plain.exp
P src/usr.bin/make/unit-tests/cond-token-plain.mk

Updating xsrc tree:
P xsrc/external/mit/xinit/dist/xinitrc.cpp


Killing core files:




Updating file list:
-rw-rw-r--  1 srcmastr  netbsd  41492879 Dec 30 03:03 ls-lRA.gz


Automated report: NetBSD-current/i386 build failure

2021-12-29 Thread NetBSD Test Fixture
This is an automatically generated notice of a NetBSD-current/i386
build failure.

The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host,
using sources from CVS date 2021.12.29.22.22.13.

An extract from the build.sh output follows:

./usr/share/man/man3/SQLITE_SESSION_MONOLITHIC_STRMSIZE.3
./usr/share/man/man3/SQLITE_SESSION_XEN3PAE_DOM0_STRMSIZE.3
./usr/share/man/man3/SQLITE_SESSION_XEN3PAE_DOMU_STRMSIZE.3
./usr/share/man/man3/SQLITE_XEN3PAE_DOM0_SINGLETHREAD.3
./usr/share/man/man3/SQLITE_XEN3PAE_DOMU_SINGLETHREAD.3
  end of 84 missing files  ==
*** Failed target: checkflist
*** Failed commands:
${SETSCMD} ${.CURDIR}/checkflist  ${MAKEFLIST_FLAGS} ${CHECKFLIST_FLAGS} ${METALOG.unpriv}
=> cd /tmp/build/2021.12.29.22.22.13-i386/src/distrib/sets &&  
DESTDIR=/tmp/build/2021.12.29.22.22.13-i386/destdir  MACHINE=i386  
MACHINE_ARCH=i386  AWK=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbawk  
CKSUM=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbcksum  
DB=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbdb  
EGREP=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbgrep\ -E  HOST_SH=/bin/sh 
 MAKE=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbmake  
MKTEMP=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbmktemp  
MTREE=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbmtree  
PAX=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbpax  COMPRESS_PROGRAM=gzip  
GZIP=-n  XZ_OPT=-9  TAR_SUFF=tgz  
PKG_CREATE=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbpkg_create  
SED=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbsed  
TSORT=/tmp/build/2021.12.29.22.22.13-i386/tools/bin/nbtsort\ -q  /bin/sh 
/tmp/build/2021.12.29.22.22.13-i386/src/distrib/sets/checkflist  -L base  -M 
/tmp/build/2021.12.29.22.22.13-i386/destdir/METALOG.sanitised
*** [checkflist] Error code 1
nbmake[2]: stopped in /tmp/build/2021.12.29.22.22.13-i386/src/distrib/sets
1 error
nbmake[2]: stopped in /tmp/build/2021.12.29.22.22.13-i386/src/distrib/sets
nbmake[1]: stopped in /tmp/build/2021.12.29.22.22.13-i386/src
nbmake: stopped in /tmp/build/2021.12.29.22.22.13-i386/src
ERROR: Failed to make release

The following commits were made between the last successful build and
the failed build:

2021.12.29.22.22.12 christos src/build.sh,v 1.360
2021.12.29.22.22.13 christos src/distrib/sets/lists/debug/mi,v 1.370
2021.12.29.22.22.13 christos src/distrib/sets/sets.subr,v 1.198
2021.12.29.22.22.13 christos src/etc/Makefile,v 1.456
2021.12.29.22.22.13 christos src/sys/conf/Makefile.kern.inc,v 1.286

Logs can be found at:


http://releng.NetBSD.org/b5reports/i386/commits-2021.12.html#2021.12.29.22.22.13


Re: mpii panic on 9.99.92

2021-12-29 Thread Jan-Hinrich Fessel
Hej,

> Am 29.12.2021 um 19:36 schrieb Michael van Elst :
> 
> os...@fessel.org ("os...@fessel.org") writes:
> 
>> Hej there,
> 
>> does anyone have a clue why this is happening more frequently on
>> 9.99.92?
> 
>> If not, i probably should send-pr that.
> 
> 
> You should definitely.

OK.  Will do.
> N.B. I'd rather suspect a Xen issue.

Probably not only Xen.  It happened on this machine before it ran Xen, only
less often.

Cheers
Oskar



Re: Filesystem corruption in current 9.99.92 (posix1eacl & log enabled FFSv2)

2021-12-29 Thread Chuck Silvers
On Wed, Dec 29, 2021 at 08:01:53PM +0100, Matthias Petermann wrote:
> Hello,
> 
> On 27.12.21 06:20, Matthias Petermann wrote:
> > I did not try to move the file around as you recommended because I would
> > like to ask if there is anything I can do at this point to gather more
> > diagnostic data to help understand the root cause?
> 
> in the meantime I migrated all files to a freshly created filesystem using
> the patched kernel and so "solved" the problem for now.
> 
> The broken filesystem still exists, but I am now running out of space on the
> host (the filesystems are in sparse allocated VNDs). I would have to delete
> the broken filesystem in a timely manner, but would still like to run
> diagnostic steps on the root cause first, if any. Unfortunately, the
> filesystem is very large and remote, and I don't know how to reasonably
> isolate the affected portion to save space for further analysis. Are there
> any other reasonable steps I could do asap?

let me write something to allow you to extract the contents of the corrupted
extattr block, so that I can reproduce your exact situation on a test machine.
once we have a copy of the corrupted block then you can reclaim the existing
fs image.  hopefully I'll have that later today.

 
> One more question I would have about the patch. It helped very well to avoid
> the freeze when working in such a corrupted filesystem. In this case, the
> filesystem behaves as you described - no ACL is applied or issued on the
> affected directory. When I try to set a new ACL on the affected directory,
> it seems to have no effect, but no error message appears. Would it make
> sense to include the patch with appropriate error logging in the official
> sources, so that when the problem occurs for which we do not know the cause
> at the moment, we will at least get some output (instead of the current
> behavior - the infinite loop)?

yea, the final patch will include printing a message on the console, and
also include some way to restore the ability to set ACLs on the file
without needing to recreate the file (let alone recreating the whole fs).
that will take a little longer though.

-Chuck


Re: Filesystem corruption in current 9.99.92 (posix1eacl & log enabled FFSv2)

2021-12-29 Thread Matthias Petermann

Hello,

On 27.12.21 06:20, Matthias Petermann wrote:
> I did not try to move the file around as you recommended because I would
> like to ask if there is anything I can do at this point to gather more
> diagnostic data to help understand the root cause?


in the meantime I migrated all files to a freshly created filesystem 
using the patched kernel and so "solved" the problem for now.


The broken filesystem still exists, but I am now running out of space on 
the host (the filesystems are in sparse allocated VNDs). I would have to 
delete the broken filesystem in a timely manner, but would still like to 
run diagnostic steps on the root cause first, if any. Unfortunately, the 
filesystem is very large and remote, and I don't know how to reasonably 
isolate the affected portion to save space for further analysis. Are 
there any other reasonable steps I could do asap?


One more question I would have about the patch. It helped very well to 
avoid the freeze when working in such a corrupted filesystem. In this 
case, the filesystem behaves as you described - no ACL is applied or 
issued on the affected directory. When I try to set a new ACL on the 
affected directory, it seems to have no effect, but no error message 
appears. Would it make sense to include the patch with appropriate error 
logging in the official sources, so that when the problem occurs for 
which we do not know the cause at the moment, we will at least get some 
output (instead of the current behavior - the infinite loop)?


Kind regards
Matthias


Re: mpii panic on 9.99.92

2021-12-29 Thread Michael van Elst
os...@fessel.org ("os...@fessel.org") writes:

>Hej there,

>does anyone have a clue why this is happening more frequently on
>9.99.92?

>If not, i probably should send-pr that.


You should definitely.
N.B. I'd rather suspect a Xen issue.




Re: HEADS UP: Merging drm update

2021-12-29 Thread Ryo ONODERA
Hi,

Taylor R Campbell  writes:

>> Date: Tue, 28 Dec 2021 11:34:43 +0900
>> From: Ryo ONODERA 
>> 
>> intel_gt_pm_fini() at netbsd:intel_gt_pm_fini+0x18
>> intel_gt_init() at netbsd:intel_gt_init+0x6ad
>> i915_gem_init() at netbsd:i915_gem_init+0x14d
>> i915_driver_probe() at netbsd:i915_driver_probe+0x949
>> i915drmkms_attach_real() at netbsd:i915drmkms_attach_real+0x4c
>> config_mountroot_thread() at netbsd:config_mountroot_thread+0x60
>
> So intel_gt_init is failing on boot, and the driver has decided to
> give up -- and proximate cause of the crash is that one of the error
> branches is screwy, but while it would be nice to fix the error
> branches it's more important to find why we're reaching them in the
> first place.
>
> Can you get a line number for intel_gt_init+0x6ad, and can you also
> insert prints into every error branch of intel_gt_init to find out
> which one it is and how it fails?  And maybe do that recursively in
> whichever branch does fail?

In intel_gt_init() in sys/external/bsd/drm2/dist/drm/i915/gt/intel_gt.c,
__engines_record_defaults(gt) failed and jumped to the err_gt label,
and then the panic happened.
"intel_gt_init+0x6ad" is the intel_uc_fini(&gt->uc) call under err_uc_init.

(snip)
	err = __engines_record_defaults(gt);
	if (err)
		goto err_gt;
(snip)
err_gt:
	__intel_gt_disable(gt);
	intel_uc_fini_hw(&gt->uc);
err_uc_init:
	intel_uc_fini(&gt->uc);
err_engines:
	intel_engines_release(gt);
	i915_vm_put(fetch_and_zero(&gt->vm));
err_pm:
	intel_gt_pm_fini(gt);
	intel_gt_fini_scratch(gt);
out_fw:
	if (err)
(snip)


And I have added some printfs to __engines_record_defaults() and
the other functions invoked from __engines_record_defaults() as follows.

__engines_record_defaults
intel_gt_wait_for_idle
intel_gt_retire_requests_timeout
dma_fence_wait_timeout
i915_fence_wait (via *fence->ops->wait)
i915_request_wait

In i915_request_wait, DRM_SPIN_TIMED_WAIT_UNTIL sets timeout=0
and i915_request_wait returns timeout=-ETIME.

#ifdef __NetBSD__
	spin_lock(rq->fence.lock);
#define	C	(i915_request_completed(rq) ? 1 :			\
		    (spin_unlock(rq->fence.lock),			\
			intel_engine_flush_submission(rq->engine),	\
			spin_lock(rq->fence.lock),			\
			i915_request_completed(rq)))
	if (flags & I915_WAIT_INTERRUPTIBLE) {
		DRM_SPIN_TIMED_WAIT_UNTIL(timeout, ,
		    rq->fence.lock, timeout,
		    C);
	} else {
		DRM_SPIN_TIMED_WAIT_NOINTR_UNTIL(timeout, ,
		    rq->fence.lock, timeout,
		    C);
	}
#undef	C
	if (timeout > 0) {	/* succeeded before timeout */
		KASSERT(i915_request_completed(rq));
		dma_fence_signal_locked(&rq->fence);
	} else if (timeout == 0) {	/* timed out */
		timeout = -ETIME;
	}
	spin_unlock(rq->fence.lock);
	DRM_DESTROY_WAITQUEUE();
#else


Thank you.

-- 
Ryo ONODERA // r...@tetera.org
PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB  FD1B F404 27FA C7D1 15F3


mpii panic on 9.99.92

2021-12-29 Thread os...@fessel.org
Hej there,

does anyone have a clue why this is happening more frequently on 9.99.92?

If not, i probably should send-pr that.

Any advice on an alternative rock-solid SATA controller for the DL380 G8 that 
does not rev up the fans?

Cheers
Oskar

[ 184523.7505073] panic: kernel diagnostic assertion "xs->resid == xs->datalen" failed: file "/hurz/src/sys/dev/pci/mpii.c", line 3207
[ 184523.7505073] cpu0: Begin traceback...
[ 184523.7505073] vpanic() at netbsd:vpanic+0x14a
[ 184523.7604867] kern_assert() at netbsd:kern_assert+0x4b
[ 184523.7604867] mpii_scsi_cmd_done() at netbsd:mpii_scsi_cmd_done+0x30b
[ 184523.7604867] mpii_intr() at netbsd:mpii_intr+0x21e
[ 184523.7604867] evtchn_do_event() at netbsd:evtchn_do_event+0x10d
[ 184523.7604867] do_hypervisor_callback() at netbsd:do_hypervisor_callback+0x167
[ 184523.7704848] Xhandle_hypervisor_callback() at netbsd:Xhandle_hypervisor_callback+0x19
[ 184523.7704848] --- interrupt ---
[ 184523.7704848] hypercall_page() at netbsd:hypercall_page+0x3aa
[ 184523.7704848] idle_loop() at netbsd:idle_loop+0x11f
[ 184523.7704848] cpu0: End traceback...

[ 184523.7704848] dumping to dev 168,9 (offset=33482590, size=0): not possible
[ 184523.7704848] rebooting...
(XEN) Hardware Dom0 shutdown: rebooting machine