Hi,

I've been encountering this failure off and on for a few weeks now, and I'd like to help fix it. In short, the test failures look non-deterministic to me. I think we should gather data and report the issue upstream, and maybe disable the offending tests in the meantime.
MariaDB failed for me earlier today with a different error than the ones observed in this bug report so far. My error was the following (when building mariadb 10.1.40 on an x86_64-linux system using Guix 9b2644c):

  Failure: Failed 1/1990 tests, 99.95% were successful.

  Failing test(s): tokudb_bugs.5733_innodb

  The log files in var/log may give you some hint of what went wrong.

  If you want to report this error, please read first the documentation
  at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

  558 tests were skipped, 169 by the test itself

I kept the failed build directory, but there is no "var" directory to be found there. I guess they meant system logs; I am not sure where such logs would go when emitted from within a derivation.

The MySQL website suggested running mysql-test-run.pl with the --force option, which I casually tried after invoking ". environment-variables" from the failed build directory. However, it promptly failed because it could not find 'my_safe_process'. Maybe I didn't have everything set up just so to run the tests manually.

Curiously, on a different x86_64-linux machine, using Guix commit 6c83c48 (which is only a few commits ahead of 9b2644c), I was able to build mariadb successfully, although I am not sure when I built it (running "guix build mariadb" currently results in quick success for me, so on this machine I probably built or substituted it some time ago). The derivation (without grafts) was identical to the one that failed to build on the other machine, which is strange because I would normally expect the same derivation to succeed on both machines. For the record, this was the derivation:

  $ guix build --no-grafts -d mariadb
  /gnu/store/9yw33r8r84qrsic7fiq0lqqkbzisv1cj-mariadb-10.1.40.drv

Perhaps these tests fail non-deterministically? Or perhaps they fail in a way that is specific to something not isolated from the build process by Guix, such as the kernel, the file system, or the hardware?
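
One way to tell these cases apart is to compare which tests fail across repeated builds. Below is a minimal shell sketch of such a tally, assuming the output of each build attempt has been saved to a file and contains the "Failing test(s):" summary line shown above. The function name tally_failures and the log file names are made up for illustration.

```shell
# Count how often each test name appears on "Failing test(s):" lines
# across the given log files.  tally_failures is a hypothetical helper;
# it assumes the failing test names follow the marker on the same line,
# separated by spaces, as in the summary quoted above.
tally_failures() {
    grep -h 'Failing test(s):' "$@" \
        | sed 's/.*Failing test(s): *//' \
        | tr -s ' ' '\n' \
        | grep -v '^$' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

For example, "tally_failures build-1.log build-2.log build-3.log" would print one line per failing test, prefixed by the number of logs in which it failed. A test that fails in every log on one machine but never on another would point towards the environment rather than pure flakiness.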
I tried to check the status of mariadb in Cuirass. However, I only found the following information:

  https://ci.guix.gnu.org/search?query=mariadb-10.1.40

For x86_64-linux, build 1304242 supposedly failed at 10 May 20:32 +0200 after about 3 hours of runtime:

  https://ci.guix.gnu.org/build/1304242/details

I say "supposedly failed" because I'm not sure why it failed. The build log seems to indicate no problems:

  https://ci.guix.gnu.org/build/1304242/log/raw

Has Cuirass tried to build mariadb since then? May 10th was a long time ago, and I am surprised there is not another build of it from master.

Mark H Weaver <m...@netris.org> writes:

> Mark H Weaver <m...@netris.org> writes:
>
>> The same build also failed twice in a row on my Thinkpad X200, and with
>> the same error each time, although it's a different error than happens
>> on hydra.gnunet.org. On my X200, I get this instead:
>>
>>> Failure: Failed 1/1091 tests, 99.91% were successful.
>>>
>>> Failing test(s): tokudb_bugs.mdev4533
>
> and it just failed a third time on my X200, again with the same error.

It seems like the tests may be flaky. The test failure I saw was different from yours. And in my case, I actually was able to build (or substitute) mariadb once. So maybe what we need to do is gather enough data to report the problem upstream, to enlist their help?

Platoxia <plato...@protonmail.com> writes:

> This problem persists and is preventing successful completion of guix system
> reconfigure for pre-1.0.0 systems (at least mine which is still at kernel
> 4.20), not only for those using mariadb but also for anyone using any of the
> 544 packages that depend on it; as per the command guix graph
> --type=reverse-package mariadb | grep -c label).
>
> This could, potentially, be fixed by simply adding this test to the list of
> disabled tests in the package definition:
>
> --- snip ---
>   (add-after 'unpack 'adjust-tests
>     (lambda _
>       (let ((disabled-tests
>              '(;; These fail because root@hostname == root@localhost in
>                ;; the build environment, causing a user count mismatch.
>                ;; See <https://jira.mariadb.org/browse/MDEV-7761>.
>                "main.join_cache"
>                "main.explain_non_select"
>                "main.stat_tables_innodb"
>                "roles.acl_statistics"
>
>                ;; This file contains a time bomb which makes it fail after
>                ;; 2030-12-31. See <https://bugs.gnu.org/34351> for details.
>                "main.mysqldump"
>
>                ;; XXX: Fails sporadically.
>                "innodb_fts.crash_recovery"
>
>                ;; FIXME: This test fails on i686:
>                ;; -myisampack: Can't create/write to file (Errcode: 17 "File exists")
>                ;; +myisampack: Can't create/write to file (Errcode: 17 "File exists)
>                ;; When running "myisampack --join=foo/t3 foo/t1 foo/t2"
>                ;; (all three tables must exist and be identical)
>                ;; in a loop it produces the same error around 1/240 times.
>                ;; montywi on #maria suggested removing the real_end check in
>                ;; "strings/my_vsnprintf.c" on line 503, yet it still does not
>                ;; reach the ending quote occasionally. Disable it for now.
>                "main.myisampack"
>                ;; FIXME: This test fails on armhf-linux:
>                "mroonga/storage.index_read_multiple_double"))
>
>             ;; This file contains a list of known-flaky tests for this
>             ;; release. Append our own items.
>             (unstable-tests (open-file "mysql-test/unstable-tests" "a")))
>         (for-each (lambda (test)
>                     (format unstable-tests "~a : ~a\n"
>                             test "Disabled in Guix"))
>                   disabled-tests)
>         (close-port unstable-tests)
> --- snip ---
>
> I say "potentially" because after getting this failure I happened to notice
> that approximately one and a half minutes after beginning the build of
> /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv the kernel
> throws this message: "traps: cmTC_35af5[27766] trap invalid opcode
> ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]".
>
> I have retested this several times and confirmed that this occurs each and
> every time mariadb-10.1.38.drv tries to build and in approximately the same
> amount of time after starting the build. I say approximately because the
> closest I could get to a timeframe on this kernel message in relation to the
> mariadb build is by sending the stdout from guix system reconfigure through
> logger so that it gets printed with a timestamp to the kernel messages
> terminal (alt-F12).
>
> Specifically, the message sequence is always as follows, without deviation
> (other than the cmTC_#), with no related messages in between; as per the
> command cat /dev/vcs12:
>
> --- snip ---
> May 9 16:36:35 localhost root cmd: guix system reconfigure: building /gnu/store/c46sn2yfllcfi86p8227wvvr1bxssgxj-mariadb-10.1.38.drv...
> May 9 16:38:08 localhost vmunix: [ 9169.050496] traps: cmTC_35af5[27766] trap invalid opcode ip:555555555174 sp:7fffffffcc90 error:0 in cmTC_35af5[555555555000+1000]
> --- snip ---
>
> I really suggest trying to simply add the tokudb_alter_table.hcad_all_add
> test to the package definition before trying to solve the overall problem,
> though. Maybe we can get this in for 1.0.1?
>
> I would be willing to do this myself and report the results here but I'm
> baffled at how to achieve this simple task. Perhaps someone could walk me
> through it?
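
For reference, the effect of the quoted phase on the source tree can be mimicked outside of Guix. This is only a rough shell sketch of what the Scheme code does: it appends one "NAME : Disabled in Guix" line per test to the unstable-tests file. The function name append_unstable_tests is made up; the file path and line format are taken from the quoted snippet. In the package definition itself, the change would presumably be one more string in the disabled-tests list.

```shell
# Append each given test name to an unstable-tests file, one
# "NAME : Disabled in Guix" line per test, mirroring the quoted phase.
# append_unstable_tests is a hypothetical helper, not part of Guix.
append_unstable_tests() {
    file=$1
    shift
    for test_name in "$@"; do
        printf '%s : %s\n' "$test_name" "Disabled in Guix" >> "$file"
    done
}
```

Run from an unpacked mariadb source tree, "append_unstable_tests mysql-test/unstable-tests tokudb_alter_table.hcad_all_add" would add the test mentioned above to the list, which makes it easy to inspect the resulting file before touching the package.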

I'm not sure about the kernel error. I haven't seen an error like that myself. But perhaps this is yet another test which is failing non-deterministically?

I think we need more data. It would be nice if we could build this repeatedly on Cuirass. When the build is 3 hours long, it is difficult to test it on my machine, and I often forget about it by the time it is done running.

If I get more time, I will try to dig in more. In the meantime, any thoughts about this would be welcome.

-- 
Chris