On 13/04/2023 06:28, Mark Millard wrote:
From: Charlie Li <vishwin_at_freebsd.org> wrote on
Date: Wed, 12 Apr 2023 20:11:16 UTC :
Charlie Li wrote:
Mateusz Guzik wrote:
can you please test poudriere with
https://github.com/openzfs/zfs/pull/14739/files
After applying, on the md(4)-backed pool regardless of block_cloning,
the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
report back on poudriere results (no block_cloning).
As for poudriere, build failures are still rolling in. These are (and
have been) entirely random on every run. Some examples from this run:
lang/php81:
- post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
- consumers fail to build due to corrupted php.conf packaged
devel/ninja:
- phase: stage
- install -s -m 555
/wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
/wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
- consumers fail to build due to corrupted bin/ninja packaged
devel/netsurf-buildsystem:
- phase: stage
- mkdir -p
/wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
/wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
cp makefiles/$M
/wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
\
done
- graphics/libnsgif fails to build due to NUL characters in
Makefile.{clang,subdir}, causing nothing to link
Summary: I have problems building ports into packages
via poudriere-devel despite being fully updated/patched
(as of when I started the experiment) and despite never
having enabled block_cloning (still using openzfs-2.1-freebsd).
In other words, I can confirm other reports that have
been made.
The details follow.
[Written as I was working on setting up for the experiments
and then executing those experiments, adjusting as I went
along.]
I've run my own tests in a context that has never had a
zpool upgrade and that jumped from before the openzfs import to
after the existing commits that try to fix openzfs on
FreeBSD. I also report the sequence of activities that led up
to the testing.
By personal policy I keep my (non-temporary) pools compatible
with what the most recent ??.?-RELEASE supports, using
openzfs-2.1-freebsd for now. The pools involved below have
never had a zpool upgrade from where they started. (I've no
pools that have ever had a zpool upgrade.)
(I rarely use temporary pools; this investigation is one such case.
But I'm not testing block_cloning or anything new this time.)
I'll note that I use zfs for bectl, not for redundancy. So
my evidence is more limited in that respect.
The activities were done on a HoneyComb (16 Cortex-A72 cores).
The system has and supports ECC RAM; 64 GiBytes of RAM are
present.
I started by duplicating my normal zfs environment to an
external USB3 NVMe drive and adjusting the host name and such
to produce the below. (Non-debug, although I do not strip
symbols.) :
# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90
main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400082 1400082
I then did: git fetch, stash push ., merge --ff-only, stash apply . :
my normal procedure. I then also applied the patch from:
https://github.com/openzfs/zfs/pull/14739/files
Then I did: buildworld buildkernel, install them, and rebooted.
The result was:
# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91
main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023
root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400086 1400086
The later poudriere-devel based build of packages from ports is
based on:
# ~/fbsd-based-on-what-commit.sh -C /usr/ports
4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12:
Bump to 12.2.0.
Author: John Baldwin <j...@freebsd.org>
Commit: John Baldwin <j...@freebsd.org>
CommitDate: 2023-03-25 00:06:40 +0000
branch: main
merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
merge-base: CommitDate: 2023-03-25 00:06:40 +0000
n613214 (--first-parent --count for merge-base)
poudriere attempted to build 476 packages, starting
with pkg (in order to build the 56 that I explicitly
indicate that I want). It is my normal set of ports.
The form of building is biased toward allowing a high
load average compared to the number of hardware
threads (same as cores here): each builder is allowed
to use the full count of hardware threads. The build
used USE_TMPFS="data" instead of the USE_TMPFS=all I
normally use on the build machine involved.
The attempted builds produced some random errors. One kind
that is easy to interpret without further exploration is:
pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at
"'\x00\x00\x00\x00\x00\x00\x00\x00'": Expected W:(0-9A-Za-z)
A fair number of errors have this form: the build installs
a previously built package for use in the builder, but
later the builder cannot find some file from that
package's installation.
Another error reported was:
ld: error: /usr/local/lib/libblkid.a: unknown file type
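One way to look for this kind of damage in a staged or installed
tree is to scan files for runs of NUL bytes. The following is only
a sketch: the function names, the 8-byte threshold (taken from the
length of the NUL run in the parse error above), and the idea of
walking a whole tree are assumptions for illustration, not something
any of the reports used. Binaries legitimately contain NUL padding,
so hits are mainly meaningful for text files such as Makefiles or
config files.

```python
import os


def first_nul_run(data, min_run=8):
    """Return the offset of the first run of min_run NUL bytes, or -1."""
    return data.find(b"\x00" * min_run)


def scan_tree(root, min_run=8):
    """Return {path: offset} for regular files under root with a NUL run.

    Each file is read fully into memory, which is fine for a sketch
    but not for very large files.
    """
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                offset = first_nul_run(fh.read(), min_run)
            if offset != -1:
                hits[path] = offset
    return hits
```

Calling scan_tree() on a stage directory (the path is whatever the
builder used) would list candidate files and the offset where the
first NUL run starts.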
For reference:
[main-CA72-bulk_a-default] [2023-04-12_20h45m32s] [committing:] Queued: 476
Built: 252 Failed: 11 Skipped: 213 Ignored: 0 Fetched: 0 Tobuild: 0
Time: 00:37:52
I started another build that tried to build 224 packages:
the 11 failed and 213 skipped.
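The arithmetic relating the two runs can be checked mechanically
from the summary line. This is a minimal sketch: the counter names
come from the summary line above, while parse_counts and retry_size
are hypothetical helper names, and the regular expression is just
one way to pull the numbers out of that line.

```python
import re

# Counter names as they appear in the poudriere summary line above.
COUNTERS = ("Queued", "Built", "Failed", "Skipped",
            "Ignored", "Fetched", "Tobuild")
PATTERN = re.compile(r"\b(" + "|".join(COUNTERS) + r"):\s+(\d+)")


def parse_counts(summary):
    """Map each counter name found in the summary text to its integer."""
    return {name: int(value) for name, value in PATTERN.findall(summary)}


def retry_size(counts):
    """Packages left for a follow-up run: the failed plus the skipped."""
    return counts["Failed"] + counts["Skipped"]


summary = ("Queued: 476 Built: 252 Failed: 11 Skipped: 213 "
           "Ignored: 0 Fetched: 0 Tobuild: 0 Time: 00:37:52")
counts = parse_counts(summary)
print(retry_size(counts))                  # 224
print(counts["Queued"] - counts["Built"])  # 224
```

Both values agree with the 224 packages attempted in the second
build.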
Just 1 package that had failed before built this time:
[00:04:58] [09] [00:04:15] Finished databases/sqlite3@default |
sqlite3-3.41.0_1,1: Success
It seems to be the only one whose original failure was not
a complaint about missing/corrupted content of a package
installed for use in building. So it is an example
of randomly varying behavior.
That, in turn, allowed:
[00:04:58] [01] [00:00:00] Building security/nss | nss-3.89
to build but everything else failed or was skipped.
The sqlite3 vs. other failure difference suggests that writes
have random problems but later reads reliably see the problem
that resulted (before the content is deleted).
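The distinction drawn here (bad data written once versus reads
that vary) could be probed with something like the sketch below.
It copies a file, hashes the copy twice, and compares both hashes
with the source. All names are hypothetical, using cp(1) as the
default copier is an assumption made to mirror the failing install
and cp steps, and the second read may well be served from cache,
so a "stable-mismatch" result is only consistent with a bad write,
not proof of what is on disk.

```python
import hashlib
import subprocess


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cp_copy(src, dst):
    """Copy with cp(1), the tool used in the failing build steps."""
    subprocess.run(["cp", src, dst], check=True)


def classify_copies(src, dst_paths, copier=cp_copy):
    """Copy src to each destination and classify what two reads return."""
    expected = sha256_of(src)
    results = {}
    for dst in dst_paths:
        copier(src, dst)
        first = sha256_of(dst)
        second = sha256_of(dst)
        if first == expected and second == expected:
            results[dst] = "ok"
        elif first == second:
            # Wrong content, but every read sees the same wrong content.
            results[dst] = "stable-mismatch"
        else:
            results[dst] = "unstable-read"
    return results
```

Whether such a loop reproduces the problem on an affected system is
not known; the failures reported above were random from run to run.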
After the above:
# zpool status
pool: zroot
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
da0p8 ONLINE 0 0 0
errors: No known data errors
# zpool scrub zroot
# zpool status
pool: zroot
state: ONLINE
scan: scrub repaired 0B in 00:16:25 with 0 errors on Wed Apr 12 22:15:39 2023
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
da0p8 ONLINE 0 0 0
errors: No known data errors
===
Mark Millard
marklmi at yahoo.com
Hi,
I'm having a funny issue here and I'm wondering if it is related.
When building one of my ports I occasionally, but not always, end up
with a file full of zeros.
The build creates copies of crispy-setup and, every once in a while,
one of them is a blob of zeros (build output and listing below).
I'm running the recent ZFS update but I never upgraded my pool:
FreeBSD capeta 14.0-CURRENT FreeBSD 14.0-CURRENT #4
main-n262091-eed92455e600: Tue Apr 11 16:06:42 IST 2023
danilo@capeta:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64
cp crispy-setup crispy-doom-setup
--- crispy-heretic-setup ---
cp crispy-setup crispy-heretic-setup
--- crispy-hexen-setup ---
cp crispy-setup crispy-hexen-setup
--- crispy-strife-setup ---
cp crispy-setup crispy-strife-setup
$ ls -l work/stage/usr/local/bin/crispy-*-setup
-r-xr-xr-x 1 danilo wheel 923488 Apr 13 10:10
work/stage/usr/local/bin/crispy-doom-setup
-r-xr-xr-x 1 danilo wheel 923488 Apr 13 10:10
work/stage/usr/local/bin/crispy-heretic-setup
-r-xr-xr-x 1 danilo wheel 923488 Apr 13 10:10
work/stage/usr/local/bin/crispy-hexen-setup
-r-xr-xr-x 1 danilo wheel 923488 Apr 13 10:10
work/stage/usr/local/bin/crispy-strife-setup
$ file work/stage/usr/local/bin/crispy-*-setup
work/stage/usr/local/bin/crispy-doom-setup: ELF 64-bit LSB executable...
work/stage/usr/local/bin/crispy-heretic-setup: ELF 64-bit LSB executable...
work/stage/usr/local/bin/crispy-hexen-setup: data
work/stage/usr/local/bin/crispy-strife-setup: ELF 64-bit LSB executable...
$ hexdump work/stage/usr/local/bin/crispy-hexen-setup
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
00e1760
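Reading the hexdump output above: the single data line is all
zeros, the "*" means that line repeats, and the final offset
0x00e1760 is 923488 in decimal, which matches the size shown by
ls -l. So the whole file is zeros, not just part of it. A small
sketch of that interpretation follows; summarize_hexdump is a
hypothetical helper and it assumes the default hexdump(1) output
format shown above.

```python
def summarize_hexdump(text):
    """Return (size_in_bytes, all_zero) for default-format hexdump output.

    A line with only "*" means the previous data line repeats, and a
    line with a single field is the final offset (the file size).
    """
    size = None
    all_zero = True
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields == ["*"]:
            continue
        if len(fields) == 1:
            size = int(fields[0], 16)
        elif any(set(word) != {"0"} for word in fields[1:]):
            all_zero = False
    return size, all_zero


dump = """0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
00e1760
"""
print(summarize_hexdump(dump))  # (923488, True)
```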