Re: intermittent bsdtar/jemalloc failures

2021-10-07 Thread Konstantin Belousov
On Thu, Oct 07, 2021 at 05:43:14PM -0400, Michael Butler wrote:
> On 10/7/21 16:52, Mark Johnston wrote:
> > On Thu, Oct 07, 2021 at 04:18:28PM -0400, Michael Butler via freebsd-current wrote:
> > > On 10/7/21 15:39, Konstantin Belousov wrote:
> > > > On Thu, Oct 07, 2021 at 03:28:44PM -0400, Michael Butler via freebsd-current wrote:
> > > > > While building a local release bundle, I sometimes get bsdtar failing (and
> > > > > dumping core) as follows below. Worse, as can be seen below, it doesn't stop
> > > > > the build unless I happen to notice and it yields an incomplete package.
> > > > >
> > > > > a usr/src/sys/netgraph/ng_checksum.h
> > > > > a usr/src/sys/netgraph/ng_message.h
> > > > > a usr/src/sys/netgraph/ng_echo.c
> > > > > a usr/src/sys/netgraph/ng_gif.h
> > > > > <jemalloc>: jemalloc_arena.c:747: Failed assertion: "nstime_compare(&decay->epoch, &time) <= 0"
> > > > > Abort trap (core dumped)
> > > > > sh /usr/src/release/scripts/make-manifest.sh *.txz > MANIFEST
> > > > >
> > > > > What causes this? Build machine is a 2x4-core Intel box with ZFS
> > > > > file-systems all around. I tried stopping NTPD temporarily but the failures
> > > > > persist .. sometimes :-(
> > > > >
> > > > > I've seen this at different points in the archiving process so it doesn't
> > > > > seem specific to building kernel.txz.
> > > > 
> > > > What timecounter do you use? Perhaps show the whole output from
> > > > sysctl kern.timecounter.
> > > 
> > > imb@vm01:/home/imb> sysctl kern.timecounter
> > > kern.timecounter.tsc_shift: 1
> > > kern.timecounter.smp_tsc_adjust: 0
> > > kern.timecounter.smp_tsc: 1
> > > kern.timecounter.invariant_tsc: 1
> > > kern.timecounter.fast_gettime: 1
> > > kern.timecounter.tick: 1
> > > kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000) dummy(-100)
> > > kern.timecounter.hardware: HPET
> > > kern.timecounter.alloweddeviation: 5
> > > kern.timecounter.timehands_count: 2
> > > kern.timecounter.stepwarnings: 0
> > > kern.timecounter.tc.ACPI-fast.quality: 900
> > > kern.timecounter.tc.ACPI-fast.frequency: 3579545
> > > kern.timecounter.tc.ACPI-fast.counter: 16124892
> > > kern.timecounter.tc.ACPI-fast.mask: 16777215
> > > kern.timecounter.tc.HPET.quality: 950
> > > kern.timecounter.tc.HPET.frequency: 14318180
> > > kern.timecounter.tc.HPET.counter: 1883995229
> > > kern.timecounter.tc.HPET.mask: 4294967295
> > > kern.timecounter.tc.i8254.quality: 0
> > > kern.timecounter.tc.i8254.frequency: 1193182
> > > kern.timecounter.tc.i8254.counter: 57
> > > kern.timecounter.tc.i8254.mask: 65535
> > > kern.timecounter.tc.TSC-low.quality: 1000
> > > kern.timecounter.tc.TSC-low.frequency: 1413153007
> > > kern.timecounter.tc.TSC-low.counter: 2352002295
> > > kern.timecounter.tc.TSC-low.mask: 4294967295
> > > 
> > > I overrode the default selection of counter-type as NTPD drifted so
> > > badly as to require stepping almost hourly :-(

If you return to TSC, does the problem go away?
Same question if you leave HPET on, but set fast_gettime to 0.
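
The two experiments above can be spelled out as sysctl commands. This is only a sketch: tc_experiment is a hypothetical helper that prints the commands instead of running them, and the counter name TSC-low is taken from the kern.timecounter.choice list quoted above. The kern.timecounter.hardware=HPET line reported in /etc/sysctl.conf would presumably reapply HPET at the next boot.

```sh
#!/bin/sh
# Hypothetical helper: print (do not run) the sysctl commands for one of
# the two experiments. Review the output, then run the commands as root.
tc_experiment() {
    case "$1" in
        tsc)
            # Experiment 1: go back to the TSC-based timecounter.
            echo "sysctl kern.timecounter.hardware=TSC-low"
            ;;
        hpet-slow)
            # Experiment 2: keep HPET but turn off fast_gettime.
            echo "sysctl kern.timecounter.hardware=HPET"
            echo "sysctl kern.timecounter.fast_gettime=0"
            ;;
        *)
            echo "usage: tc_experiment tsc|hpet-slow" >&2
            return 1
            ;;
    esac
}

tc_experiment tsc
tc_experiment hpet-slow
```

Running one experiment at a time, with a release build after each, should show whether the timecounter choice or the fast_gettime path is involved.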



Re: intermittent bsdtar/jemalloc failures

2021-10-07 Thread Michael Butler via freebsd-current

On 10/7/21 16:52, Mark Johnston wrote:
> On Thu, Oct 07, 2021 at 04:18:28PM -0400, Michael Butler via freebsd-current wrote:
> > On 10/7/21 15:39, Konstantin Belousov wrote:
> > > On Thu, Oct 07, 2021 at 03:28:44PM -0400, Michael Butler via freebsd-current wrote:
> > > > While building a local release bundle, I sometimes get bsdtar failing (and
> > > > dumping core) as follows below. Worse, as can be seen below, it doesn't stop
> > > > the build unless I happen to notice and it yields an incomplete package.
> > > >
> > > > a usr/src/sys/netgraph/ng_checksum.h
> > > > a usr/src/sys/netgraph/ng_message.h
> > > > a usr/src/sys/netgraph/ng_echo.c
> > > > a usr/src/sys/netgraph/ng_gif.h
> > > > <jemalloc>: jemalloc_arena.c:747: Failed assertion: "nstime_compare(&decay->epoch, &time) <= 0"
> > > > Abort trap (core dumped)
> > > > sh /usr/src/release/scripts/make-manifest.sh *.txz > MANIFEST
> > > >
> > > > What causes this? Build machine is a 2x4-core Intel box with ZFS
> > > > file-systems all around. I tried stopping NTPD temporarily but the failures
> > > > persist .. sometimes :-(
> > > >
> > > > I've seen this at different points in the archiving process so it doesn't
> > > > seem specific to building kernel.txz.
> > >
> > > What timecounter do you use? Perhaps show the whole output from
> > > sysctl kern.timecounter.
> >
> > imb@vm01:/home/imb> sysctl kern.timecounter
> > kern.timecounter.tsc_shift: 1
> > kern.timecounter.smp_tsc_adjust: 0
> > kern.timecounter.smp_tsc: 1
> > kern.timecounter.invariant_tsc: 1
> > kern.timecounter.fast_gettime: 1
> > kern.timecounter.tick: 1
> > kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000) dummy(-100)
> > kern.timecounter.hardware: HPET
> > kern.timecounter.alloweddeviation: 5
> > kern.timecounter.timehands_count: 2
> > kern.timecounter.stepwarnings: 0
> > kern.timecounter.tc.ACPI-fast.quality: 900
> > kern.timecounter.tc.ACPI-fast.frequency: 3579545
> > kern.timecounter.tc.ACPI-fast.counter: 16124892
> > kern.timecounter.tc.ACPI-fast.mask: 16777215
> > kern.timecounter.tc.HPET.quality: 950
> > kern.timecounter.tc.HPET.frequency: 14318180
> > kern.timecounter.tc.HPET.counter: 1883995229
> > kern.timecounter.tc.HPET.mask: 4294967295
> > kern.timecounter.tc.i8254.quality: 0
> > kern.timecounter.tc.i8254.frequency: 1193182
> > kern.timecounter.tc.i8254.counter: 57
> > kern.timecounter.tc.i8254.mask: 65535
> > kern.timecounter.tc.TSC-low.quality: 1000
> > kern.timecounter.tc.TSC-low.frequency: 1413153007
> > kern.timecounter.tc.TSC-low.counter: 2352002295
> > kern.timecounter.tc.TSC-low.mask: 4294967295
> >
> > I overrode the default selection of counter-type as NTPD drifted so
> > badly as to require stepping almost hourly :-(
>
> Could you show output from
>
> # kldload cpuctl
> # cpucontrol -i 0x15 /dev/cpuctl0
> # cpucontrol -i 0x16 /dev/cpuctl0
>
> as well as a copy of the dmesg after a boot?  I am looking at a similar
> problem currently.


root@vm01:/usr/home/imb # cpucontrol -i 0x15 /dev/cpuctl0
cpuid level 0x15: 0x07280202 0x 0x 0x0503
root@vm01:/usr/home/imb # cpucontrol -i 0x16 /dev/cpuctl0
cpuid level 0x16: 0x07280202 0x 0x 0x0503

This is a Dell-1950 1-U box with a SAS drive-box attached ..

root@vm01:/usr/home/imb # less /var/log/dmesg.today
---<<BOOT>>---
VERBOSE_SYSINIT: DDB not enabled, symbol lookups disabled.
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.0-CURRENT #256 main-42dfad2ef1: Sat Oct  2 09:41:36 EDT 2021
    r...@vm01.auburn.protected-networks.net:/usr/obj/usr/src/amd64.amd64/sys/VM01 amd64
FreeBSD clang version 12.0.1 (g...@github.com:llvm/llvm-project.git llvmorg-12.0.1-0-gfed41342a82f)
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU   E5440  @ 2.83GHz (2826.31-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x10676  Family=0x6  Model=0x17  Stepping=6
  Features=0xbfebfbff
  Features2=0xce3bd
  AMD Features=0x20100800
  AMD Features2=0x1
  VT-x: HLT,PAUSE
  TSC: P-state invariant, performance statistics
real memory  = 68719476736 (65536 MB)
avail memory = 65811677184 (62762 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s)
random: unblocking device.
Security policy loaded: MAC/ntpd (mac_ntpd)
ioapic0: MADT APIC ID 8 != hw id 0
ioapic0  irqs 0-23
Launching APs: 3 4 1 6 2 5 7
Timecounter "TSC-low" frequency 1413155409 Hz quality 1000
random: entropy device external interface
kbd1 at kbdmux0
vtvga0: 
smbios0:  at iomem 0xfcdf0-0xfce0e
smbios0: Version: 2.5, BCD Revision: 2.5
acpi0: 
acpi0: Power Button (fixed)
Firmware Error (ACPI): Could not resolve symbol [\134_SB._OSC.CDW1], AE_NOT_FOUND (20210930/psargs-503)
ACPI Error: Aborting method \134_SB._OSC due to previous error (AE_NOT_FOUND) (20210930/psparse-689)
apei0:  on acpi0
ipmi0:  port 0xca8,0xcac on acpi0
ipmi0: KCS mode found at io 0xca8 on acpi
cpu0:  on acpi0
atrtc0:  port 0x70-0x7f irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.00s
Event timer "RTC" frequency 32768 Hz quality 0
attim

Re: intermittent bsdtar/jemalloc failures

2021-10-07 Thread Mark Johnston
On Thu., Oct. 7, 2021, 17:11 Konstantin Belousov wrote:

> On Thu, Oct 07, 2021 at 04:52:52PM -0400, Mark Johnston wrote:
> > On Thu, Oct 07, 2021 at 04:18:28PM -0400, Michael Butler via freebsd-current wrote:
> > > On 10/7/21 15:39, Konstantin Belousov wrote:
> > > > On Thu, Oct 07, 2021 at 03:28:44PM -0400, Michael Butler via freebsd-current wrote:
> > > >> While building a local release bundle, I sometimes get bsdtar failing (and
> > > >> dumping core) as follows below. Worse, as can be seen below, it doesn't stop
> > > >> the build unless I happen to notice and it yields an incomplete package.
> > > >>
> > > >> a usr/src/sys/netgraph/ng_checksum.h
> > > >> a usr/src/sys/netgraph/ng_message.h
> > > >> a usr/src/sys/netgraph/ng_echo.c
> > > >> a usr/src/sys/netgraph/ng_gif.h
> > > >> <jemalloc>: jemalloc_arena.c:747: Failed assertion: "nstime_compare(&decay->epoch, &time) <= 0"
> > > >> Abort trap (core dumped)
> > > >> sh /usr/src/release/scripts/make-manifest.sh *.txz > MANIFEST
> > > >>
> > > >> What causes this? Build machine is a 2x4-core Intel box with ZFS
> > > >> file-systems all around. I tried stopping NTPD temporarily but the failures
> > > >> persist .. sometimes :-(
> > > >>
> > > >> I've seen this at different points in the archiving process so it doesn't
> > > >> seem specific to building kernel.txz.
> > > >
> > > > What timecounter do you use? Perhaps show the whole output from
> > > > sysctl kern.timecounter.
> > >
> > > imb@vm01:/home/imb> sysctl kern.timecounter
> > > kern.timecounter.tsc_shift: 1
> > > kern.timecounter.smp_tsc_adjust: 0
> > > kern.timecounter.smp_tsc: 1
> > > kern.timecounter.invariant_tsc: 1
> > > kern.timecounter.fast_gettime: 1
> > > kern.timecounter.tick: 1
> > > kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000) dummy(-100)
> > > kern.timecounter.hardware: HPET
> > > kern.timecounter.alloweddeviation: 5
> > > kern.timecounter.timehands_count: 2
> > > kern.timecounter.stepwarnings: 0
> > > kern.timecounter.tc.ACPI-fast.quality: 900
> > > kern.timecounter.tc.ACPI-fast.frequency: 3579545
> > > kern.timecounter.tc.ACPI-fast.counter: 16124892
> > > kern.timecounter.tc.ACPI-fast.mask: 16777215
> > > kern.timecounter.tc.HPET.quality: 950
> > > kern.timecounter.tc.HPET.frequency: 14318180
> > > kern.timecounter.tc.HPET.counter: 1883995229
> > > kern.timecounter.tc.HPET.mask: 4294967295
> > > kern.timecounter.tc.i8254.quality: 0
> > > kern.timecounter.tc.i8254.frequency: 1193182
> > > kern.timecounter.tc.i8254.counter: 57
> > > kern.timecounter.tc.i8254.mask: 65535
> > > kern.timecounter.tc.TSC-low.quality: 1000
> > > kern.timecounter.tc.TSC-low.frequency: 1413153007
> > > kern.timecounter.tc.TSC-low.counter: 2352002295
> > > kern.timecounter.tc.TSC-low.mask: 4294967295
> > >
> > > I overrode the default selection of counter-type as NTPD drifted so
> > > badly as to require stepping almost hourly :-(
> >
> > Could you show output from
> >
> > # kldload cpuctl
> > # cpucontrol -i 0x15 /dev/cpuctl0
> > # cpucontrol -i 0x16 /dev/cpuctl0
> >
> > as well as a copy of the dmesg after a boot?  I am looking at a similar
> > problem currently.
>
> Do you have the issue with jemalloc(3), or the problem with imprecise TSC
> frequency as reported by CPUID leaf?
>

Only the latter, but I did not try overriding the time counter selection.



Re: intermittent bsdtar/jemalloc failures

2021-10-07 Thread Konstantin Belousov
On Thu, Oct 07, 2021 at 04:52:52PM -0400, Mark Johnston wrote:
> On Thu, Oct 07, 2021 at 04:18:28PM -0400, Michael Butler via freebsd-current wrote:
> > On 10/7/21 15:39, Konstantin Belousov wrote:
> > > On Thu, Oct 07, 2021 at 03:28:44PM -0400, Michael Butler via freebsd-current wrote:
> > >> While building a local release bundle, I sometimes get bsdtar failing (and
> > >> dumping core) as follows below. Worse, as can be seen below, it doesn't stop
> > >> the build unless I happen to notice and it yields an incomplete package.
> > >>
> > >> a usr/src/sys/netgraph/ng_checksum.h
> > >> a usr/src/sys/netgraph/ng_message.h
> > >> a usr/src/sys/netgraph/ng_echo.c
> > >> a usr/src/sys/netgraph/ng_gif.h
> > >> <jemalloc>: jemalloc_arena.c:747: Failed assertion: "nstime_compare(&decay->epoch, &time) <= 0"
> > >> Abort trap (core dumped)
> > >> sh /usr/src/release/scripts/make-manifest.sh *.txz > MANIFEST
> > >>
> > >> What causes this? Build machine is a 2x4-core Intel box with ZFS
> > >> file-systems all around. I tried stopping NTPD temporarily but the failures
> > >> persist .. sometimes :-(
> > >>
> > >> I've seen this at different points in the archiving process so it doesn't
> > >> seem specific to building kernel.txz.
> > > 
> > > What timecounter do you use? Perhaps show the whole output from
> > > sysctl kern.timecounter.
> > 
> > imb@vm01:/home/imb> sysctl kern.timecounter
> > kern.timecounter.tsc_shift: 1
> > kern.timecounter.smp_tsc_adjust: 0
> > kern.timecounter.smp_tsc: 1
> > kern.timecounter.invariant_tsc: 1
> > kern.timecounter.fast_gettime: 1
> > kern.timecounter.tick: 1
> > kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000) dummy(-100)
> > kern.timecounter.hardware: HPET
> > kern.timecounter.alloweddeviation: 5
> > kern.timecounter.timehands_count: 2
> > kern.timecounter.stepwarnings: 0
> > kern.timecounter.tc.ACPI-fast.quality: 900
> > kern.timecounter.tc.ACPI-fast.frequency: 3579545
> > kern.timecounter.tc.ACPI-fast.counter: 16124892
> > kern.timecounter.tc.ACPI-fast.mask: 16777215
> > kern.timecounter.tc.HPET.quality: 950
> > kern.timecounter.tc.HPET.frequency: 14318180
> > kern.timecounter.tc.HPET.counter: 1883995229
> > kern.timecounter.tc.HPET.mask: 4294967295
> > kern.timecounter.tc.i8254.quality: 0
> > kern.timecounter.tc.i8254.frequency: 1193182
> > kern.timecounter.tc.i8254.counter: 57
> > kern.timecounter.tc.i8254.mask: 65535
> > kern.timecounter.tc.TSC-low.quality: 1000
> > kern.timecounter.tc.TSC-low.frequency: 1413153007
> > kern.timecounter.tc.TSC-low.counter: 2352002295
> > kern.timecounter.tc.TSC-low.mask: 4294967295
> > 
> > I overrode the default selection of counter-type as NTPD drifted so 
> > badly as to require stepping almost hourly :-(
> 
> Could you show output from
> 
> # kldload cpuctl
> # cpucontrol -i 0x15 /dev/cpuctl0
> # cpucontrol -i 0x16 /dev/cpuctl0
> 
> as well as a copy of the dmesg after a boot?  I am looking at a similar
> problem currently.

Do you have the issue with jemalloc(3), or the problem with imprecise TSC
frequency as reported by CPUID leaf?



Re: intermittent bsdtar/jemalloc failures

2021-10-07 Thread Mark Johnston
On Thu, Oct 07, 2021 at 04:18:28PM -0400, Michael Butler via freebsd-current wrote:
> On 10/7/21 15:39, Konstantin Belousov wrote:
> > On Thu, Oct 07, 2021 at 03:28:44PM -0400, Michael Butler via freebsd-current wrote:
> >> While building a local release bundle, I sometimes get bsdtar failing (and
> >> dumping core) as follows below. Worse, as can be seen below, it doesn't stop
> >> the build unless I happen to notice and it yields an incomplete package.
> >>
> >> a usr/src/sys/netgraph/ng_checksum.h
> >> a usr/src/sys/netgraph/ng_message.h
> >> a usr/src/sys/netgraph/ng_echo.c
> >> a usr/src/sys/netgraph/ng_gif.h
> >> <jemalloc>: jemalloc_arena.c:747: Failed assertion: "nstime_compare(&decay->epoch, &time) <= 0"
> >> Abort trap (core dumped)
> >> sh /usr/src/release/scripts/make-manifest.sh *.txz > MANIFEST
> >>
> >> What causes this? Build machine is a 2x4-core Intel box with ZFS
> >> file-systems all around. I tried stopping NTPD temporarily but the failures
> >> persist .. sometimes :-(
> >>
> >> I've seen this at different points in the archiving process so it doesn't
> >> seem specific to building kernel.txz.
> > 
> > What timecounter do you use? Perhaps show the whole output from
> > sysctl kern.timecounter.
> 
> imb@vm01:/home/imb> sysctl kern.timecounter
> kern.timecounter.tsc_shift: 1
> kern.timecounter.smp_tsc_adjust: 0
> kern.timecounter.smp_tsc: 1
> kern.timecounter.invariant_tsc: 1
> kern.timecounter.fast_gettime: 1
> kern.timecounter.tick: 1
> kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000) dummy(-100)
> kern.timecounter.hardware: HPET
> kern.timecounter.alloweddeviation: 5
> kern.timecounter.timehands_count: 2
> kern.timecounter.stepwarnings: 0
> kern.timecounter.tc.ACPI-fast.quality: 900
> kern.timecounter.tc.ACPI-fast.frequency: 3579545
> kern.timecounter.tc.ACPI-fast.counter: 16124892
> kern.timecounter.tc.ACPI-fast.mask: 16777215
> kern.timecounter.tc.HPET.quality: 950
> kern.timecounter.tc.HPET.frequency: 14318180
> kern.timecounter.tc.HPET.counter: 1883995229
> kern.timecounter.tc.HPET.mask: 4294967295
> kern.timecounter.tc.i8254.quality: 0
> kern.timecounter.tc.i8254.frequency: 1193182
> kern.timecounter.tc.i8254.counter: 57
> kern.timecounter.tc.i8254.mask: 65535
> kern.timecounter.tc.TSC-low.quality: 1000
> kern.timecounter.tc.TSC-low.frequency: 1413153007
> kern.timecounter.tc.TSC-low.counter: 2352002295
> kern.timecounter.tc.TSC-low.mask: 4294967295
> 
> I overrode the default selection of counter-type as NTPD drifted so 
> badly as to require stepping almost hourly :-(

Could you show output from

# kldload cpuctl
# cpucontrol -i 0x15 /dev/cpuctl0
# cpucontrol -i 0x16 /dev/cpuctl0

as well as a copy of the dmesg after a boot?  I am looking at a similar
problem currently.



Re: intermittent bsdtar/jemalloc failures

2021-10-07 Thread Michael Butler via freebsd-current

On 10/7/21 15:39, Konstantin Belousov wrote:
> On Thu, Oct 07, 2021 at 03:28:44PM -0400, Michael Butler via freebsd-current wrote:
> > While building a local release bundle, I sometimes get bsdtar failing (and
> > dumping core) as follows below. Worse, as can be seen below, it doesn't stop
> > the build unless I happen to notice and it yields an incomplete package.
> >
> > a usr/src/sys/netgraph/ng_checksum.h
> > a usr/src/sys/netgraph/ng_message.h
> > a usr/src/sys/netgraph/ng_echo.c
> > a usr/src/sys/netgraph/ng_gif.h
> > <jemalloc>: jemalloc_arena.c:747: Failed assertion: "nstime_compare(&decay->epoch, &time) <= 0"
> > Abort trap (core dumped)
> > sh /usr/src/release/scripts/make-manifest.sh *.txz > MANIFEST
> >
> > What causes this? Build machine is a 2x4-core Intel box with ZFS
> > file-systems all around. I tried stopping NTPD temporarily but the failures
> > persist .. sometimes :-(
> >
> > I've seen this at different points in the archiving process so it doesn't
> > seem specific to building kernel.txz.
>
> What timecounter do you use? Perhaps show the whole output from
> sysctl kern.timecounter.


imb@vm01:/home/imb> sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000) dummy(-100)
kern.timecounter.hardware: HPET
kern.timecounter.alloweddeviation: 5
kern.timecounter.timehands_count: 2
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 16124892
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.counter: 1883995229
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 57
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.TSC-low.quality: 1000
kern.timecounter.tc.TSC-low.frequency: 1413153007
kern.timecounter.tc.TSC-low.counter: 2352002295
kern.timecounter.tc.TSC-low.mask: 4294967295

I overrode the default selection of counter-type as NTPD drifted so 
badly as to require stepping almost hourly :-(


So .. I have this in /etc/sysctl.conf ..

kern.timecounter.hardware=HPET
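
To make the override explicit, here is a small sketch that reads the kern.timecounter.choice value from the output above and reports the counter with the highest quality next to the configured one. The helper name best_timecounter is made up for illustration, and the choice string is pasted in rather than read from sysctl so the sketch runs anywhere.

```sh
#!/bin/sh
# Hypothetical helper: print the name with the highest quality from a
# kern.timecounter.choice value such as "HPET(950) TSC-low(1000)".
best_timecounter() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        # Turn "name(quality)" into "quality name" so sort can order it.
        sed -n 's/^\(.*\)(\(-\{0,1\}[0-9][0-9]*\))$/\2 \1/p' |
        sort -n | tail -n 1 | cut -d' ' -f2
}

# Values copied from the sysctl output and /etc/sysctl.conf above.
choice='ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000) dummy(-100)'
configured='HPET'

best=$(best_timecounter "$choice")
if [ "$best" != "$configured" ]; then
    echo "configured $configured, highest quality is $best"
fi
```

On a live system the first value could come from `sysctl -n kern.timecounter.choice` instead of the pasted string.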

While I hope it wouldn't make a difference, I also have powerd enabled 
in /etc/rc.conf to (marginally) reduce the power-consumption when the 
machine is near-idle. sysctl -a | grep ^dev.cpu | grep freq shows ..


dev.cpu.7.freq_levels: 2834/103000 2333/9 2000/79000
dev.cpu.7.freq: 2834
dev.cpu.3.freq_levels: 2834/103000 2333/94000 2000/86000
dev.cpu.3.freq: 2834
dev.cpu.5.freq_levels: 2834/103000 2333/9 2000/79000
dev.cpu.5.freq: 2834
dev.cpu.1.freq_levels: 2834/103000 2333/94000 2000/86000
dev.cpu.1.freq: 2834
dev.cpu.6.freq_levels: 2834/103000 2333/9 2000/79000
dev.cpu.6.freq: 2834
dev.cpu.2.freq_levels: 2834/103000 2333/94000 2000/86000
dev.cpu.2.freq: 2834
dev.cpu.4.freq_levels: 2834/103000 2333/9 2000/79000
dev.cpu.4.freq: 2834
dev.cpu.0.freq_levels: 2834/103000 2333/94000 2000/86000
dev.cpu.0.freq: 2834

imb




Re: intermittent bsdtar/jemalloc failures

2021-10-07 Thread Konstantin Belousov
On Thu, Oct 07, 2021 at 03:28:44PM -0400, Michael Butler via freebsd-current wrote:
> While building a local release bundle, I sometimes get bsdtar failing (and
> dumping core) as follows below. Worse, as can be seen below, it doesn't stop
> the build unless I happen to notice and it yields an incomplete package.
> 
> a usr/src/sys/netgraph/ng_checksum.h
> a usr/src/sys/netgraph/ng_message.h
> a usr/src/sys/netgraph/ng_echo.c
> a usr/src/sys/netgraph/ng_gif.h
> <jemalloc>: jemalloc_arena.c:747: Failed assertion: "nstime_compare(&decay->epoch, &time) <= 0"
> Abort trap (core dumped)
> sh /usr/src/release/scripts/make-manifest.sh *.txz > MANIFEST
> 
> What causes this? Build machine is a 2x4-core Intel box with ZFS
> file-systems all around. I tried stopping NTPD temporarily but the failures
> persist .. sometimes :-(
> 
> I've seen this at different points in the archiving process so it doesn't
> seem specific to building kernel.txz.

What timecounter do you use? Perhaps show the whole output from
sysctl kern.timecounter.



intermittent bsdtar/jemalloc failures

2021-10-07 Thread Michael Butler via freebsd-current
While building a local release bundle, I sometimes get bsdtar failing
(and dumping core) as follows below. Worse, as can be seen below, it
doesn't stop the build unless I happen to notice and it yields an
incomplete package.

a usr/src/sys/netgraph/ng_checksum.h
a usr/src/sys/netgraph/ng_message.h
a usr/src/sys/netgraph/ng_echo.c
a usr/src/sys/netgraph/ng_gif.h
<jemalloc>: jemalloc_arena.c:747: Failed assertion: "nstime_compare(&decay->epoch, &time) <= 0"
Abort trap (core dumped)
sh /usr/src/release/scripts/make-manifest.sh *.txz > MANIFEST

What causes this? Build machine is a 2x4-core Intel box with ZFS
file-systems all around. I tried stopping NTPD temporarily but the
failures persist .. sometimes :-(

I've seen this at different points in the archiving process so it
doesn't seem specific to building kernel.txz.


Any thoughts?
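
As a stopgap for the build carrying on after the abort, a wrapper along these lines could check the archiver's exit status before the manifest step runs. This is only a sketch, not how the release scripts actually work: describe_status and run_or_stop are hypothetical names, the self-aborting subshell merely stands in for the failing bsdtar, and the 128+signal convention holds for most sh implementations but not all.

```sh
#!/bin/sh
# Describe an exit status as a POSIX shell reports it in $?.
describe_status() {
    if [ "$1" -eq 0 ]; then
        echo "ok"
    elif [ "$1" -gt 128 ]; then
        # Most shells report death by signal N as status 128+N.
        echo "killed by signal $(( $1 - 128 ))"
    else
        echo "exited with status $1"
    fi
}

# Hypothetical wrapper: run a command and report a failure to the caller.
run_or_stop() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "$1: $(describe_status "$status")" >&2
        return "$status"
    fi
}

# Stand-in for the failing archiver: a child shell that aborts itself
# (core dumps disabled so the demo leaves nothing behind).
run_or_stop sh -c 'ulimit -c 0; kill -ABRT $$' ||
    echo "stopping before make-manifest.sh"
```

SIGABRT is signal 6, so an aborted archiver would typically show up as status 134.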

imb




Re: Writing large build logs to NFS extremely slow?

2021-10-07 Thread Mehmet Erol Sanliturk
On Thu, Oct 7, 2021 at 5:17 AM Felix Palmen wrote:

> Hi all,
>
> I use a -CURRENT bhyve vm for testing port builds with poudriere. As
> this vm is only running when needed, but I want to always have access to
> the build logs, I use NFS to mount /usr/local/poudriere/data/logs from
> the host.
>
> I noticed some few ports take ridiculously long to build while barely
> using any CPU time at all. On a closer look, that's all ports producing
> a lot of compiler (warning) output, e.g. gcc, gnutls, gtk2, …
>
> So I assume appending to a large file via NFS gets slower and slower. Is
> there any mount option I could try to fix this? Right now I only have
> `nolockd`, I also tried `noncontigwr` which didn't change anything.
>
> Thinking about alternatives to NFS, are there any news for client-side
> 9p virtfs? I found  which
> still builds with a few minor adaptions, but trying to mount a 9p share
> freezes the machine.
>
> Would you suggest a different mailing list to ask?
>
> BR, Felix
>
> --
>  Dipl.-Inform. Felix Palmen ,.//..
>  {web}  http://palmen-it.de  {jabber} [see email]   ,//palmen-it.de
>  {pgp public key} http://palmen-it.de/pub.txt   //   """
>  {pgp fingerprint} A891 3D55 5F2E 3A74 3965 B997 3EF2 8B0A BC02 DA2A
>



I have run into such cases before, but I cannot remember which
parameters I used to solve the problem, because I no longer run the
FreeBSD server. A similar problem also occurs with the Linux NFS server.

The problem is mainly caused by the NFS configuration parameters.

If you go through the NFS configuration parameters one by one, I think
you will find which one is responsible.

In my opinion the relevant setting is "write directly to disk", i.e.,
"do not use the cache".

In the "write directly to disk" case, the client waits for the previous
write operation to complete before it writes a new record. This is useful
when a program terminates abruptly, because every record has already been
written to the disk file, at the cost of more time.

In the cached case, little time is consumed, but the most recently
written records are lost if the program terminates abruptly.

That is how I understand your question.
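
The trade-off described above can be tried locally with a small sketch. It is not an NFS configuration: append_records is a hypothetical helper, and sync(1) stands in for "write directly to disk" even though it flushes the whole system cache rather than one file.

```sh
#!/bin/sh
# Append COUNT records to FILE. Mode "each" flushes after every record
# (slow, little is lost on a crash); mode "end" flushes once at the end
# (fast, recent records can be lost on a crash).
append_records() {
    file=$1 count=$2 mode=$3
    i=1
    while [ "$i" -le "$count" ]; do
        echo "record $i" >> "$file"
        [ "$mode" = each ] && sync
        i=$((i + 1))
    done
    [ "$mode" = end ] && sync
    return 0
}

tmp=$(mktemp -d)
append_records "$tmp/each.log" 20 each
append_records "$tmp/end.log" 20 end

# Both modes produce the same file; only the time and the crash
# behaviour differ.
cmp -s "$tmp/each.log" "$tmp/end.log" && echo "same contents either way"
rm -rf "$tmp"
```

Wrapping each call in time(1) on the NFS-mounted log directory would give a rough idea of how much the per-record flush costs there.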


Mehmet Erol Sanliturk