Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
At Fri, 16 Apr 2021 11:44:08 +0100, David Brownlee wrote:
Subject: Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
>
> On Fri, 16 Apr 2021 at 08:41, Greg A. Woods wrote:
> > What else is different?  What am I missing?  What could be different in
> > NetBSD current that could cause a FreeBSD domU to (mis)behave this way?
> > Could the fault still be in the FreeBSD drivers -- I don't see how as
> > the same root problem caused corruption in both HVM and PVH domUs.
>
> Random data collection thoughts:
>
> - Can you reproduce it on tiny partitions (to speed up testing)
> - If you newfs, shutdown the DOMU, then copy off the data from the
>   DOM0 does it pass FreeBSD fsck on a native boot
> - Alternatively if you newfs an image on a native FreeBSD box and copy
>   to the DOM0 does the DOMU fsck fail
> - Potentially based on results above - does it still happen with a
>   reboot between the newfs and fsck
> - Can you ktrace whichever of newfs or fsck to see exactly what its
>   writing (tiny *tiny* filesystem for the win here :)

So, the root filesystem is clean (from the factory, and verified by at
least NetBSD's fsck as OK), but when '-f' is used it is found to be
corrupt.

Unfortunately I don't have any real FreeBSD machines available (though I
could possibly get it installed on my MacBookPro again, but that's
probably a multi-day effort at this point).

However I've just found a way to reproduce the problem reliably, and
with a working comparison against a matching-sized memory disk.

First off attach a tiny 4MB LVM LV to FreeBSD -- that's the smallest LV
possible apparently:

dom0 # lvm lvs
  LV           VG       Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  build        scratch  -wi-a- 250.00g
  fbsd-test.0  scratch  -wi-a-  30.00g
  fbsd-test.1  scratch  -wi-a-  30.00g
  nbtest.pkg   vg0      -wi-a-  30.00g
  nbtest.root  vg0      -wi-a-  30.00g
  nbtest.swap  vg0      -wi-a-   8.00g
  nbtest.var   vg0      -wi-a-  10.00g
  tinytest     vg0      -wi-a-   4.00m

dom0 # xl block-attach fbsd-test format=raw, vdev=sdc, access=rw, target=/dev/mapper/vg0-tinytest

Now a run of the test on the FreeBSD domU (first showing the kernel
seeing the device attachment):

# xbd3: 4MB at device/vbd/2080 on xenbusb_front0
xbd3: attaching as da2
xbd3: features: flush
xbd3: synchronize cache commands enabled.
GEOM: new disk da2

# dd if=/dev/zero of=tinytest.fs count=8192
8192+0 records in
8192+0 records out
4194304 bytes transferred in 0.081106 secs (51713998 bytes/sec)
# mdconfig -a -t vnode -f tinytest.fs
md0
# newfs -o space -n md0
/dev/md0: 4.0MB (8192 sectors) block size 32768, fragment size 4096
        using 4 cylinder groups of 1.03MB, 33 blks, 256 inodes.
super-block backups (for fsck_ffs -b #) at:
 192, 2304, 4416, 6528
# newfs -o space -n da2
/dev/da2: 4.0MB (8192 sectors) block size 32768, fragment size 4096
        using 4 cylinder groups of 1.03MB, 33 blks, 256 inodes.
super-block backups (for fsck_ffs -b #) at:
 192, 2304, 4416, 6528

# dumpfs da2 >da2.dumpfs
# dumpfs md0 >md0.dumpfs
# diff md0.dumpfs da2.dumpfs
1,2c1,2
< magic 19540119 (UFS2) time Fri Apr 16 18:48:55 2021
< superblock location 65536 id [ 6079dc17 1006b3b4 ]
---
> magic 19540119 (UFS2) time Fri Apr 16 18:49:57 2021
> superblock location 65536 id [ 6079dc55 348e5947 ]
27c27
< magic 90255 tell 20000 time Fri Apr 16 18:48:55 2021
---
> magic 90255 tell 20000 time Fri Apr 16 18:49:57 2021
40c40
< magic 90255 tell 128000 time Fri Apr 16 18:48:55 2021
---
> magic 90255 tell 128000 time Fri Apr 16 18:49:57 2021
53c53
< magic 90255 tell 230000 time Fri Apr 16 18:48:55 2021
---
> magic 90255 tell 230000 time Fri Apr 16 18:49:57 2021
66c66
< magic 90255 tell 338000 time Fri Apr 16 18:48:55 2021
---
> magic 90255 tell 338000 time Fri Apr 16 18:49:57 2021

# fsck md0
** /dev/md0
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1 files, 1 used, 870 free (14 frags, 107 blocks, 1.6% fragmentation)

* FILE SYSTEM IS CLEAN *

# fsck da2
** /dev/da2
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
ROOT INODE UNALLOCATED
ALLOCATE? [yn] n

* FILE SYSTEM MARKED DIRTY *

So I ktraced the fsck_ufs run, and though I haven't looked at it with a
fine-tooth comb and the source open, the only thing that seems a wee bit
different about what fsck does is that it opens the device twice with
O_RDONLY, then shortly before it prints the first "** /dev/da2" line it
reopens it O_RDWR a third time, closes the second one, and calls dup()
on the third one so that it has the same FD# as the second open had.
Otherwise it does a
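For reference, that open sequence replayed as a standalone test program
(a minimal sketch of my own, not fsck's actual code; the device path is
just the scratch disk from above):

/* Replays the open sequence seen in fsck's ktrace: two O_RDONLY opens,
 * then an O_RDWR reopen that gets dup()'d back onto the second
 * descriptor's number.  A sketch only -- not fsck's code. */
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	const char *dev = "/dev/da2";		/* scratch test disk from above */
	char buf[32768];			/* one fs block */

	int fd1 = open(dev, O_RDONLY);
	if (fd1 == -1)
		err(1, "open #1");
	int fd2 = open(dev, O_RDONLY);
	if (fd2 == -1)
		err(1, "open #2");

	int fd3 = open(dev, O_RDWR);		/* the later read-write reopen */
	if (fd3 == -1)
		err(1, "open #3");

	close(fd2);				/* free up fd2's number ... */
	int fd4 = dup(fd3);			/* ... which dup() should now return */
	close(fd3);
	printf("fd2 was %d, dup(fd3) returned %d\n", fd2, fd4);

	/* Read the superblock area through the dup()'d descriptor so the
	 * result can be compared with a dom0-side read of the same LV. */
	ssize_t n = pread(fd4, buf, sizeof(buf), 65536);
	printf("pread at 65536 returned %zd\n", n);

	close(fd4);
	close(fd1);
	return 0;
}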
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
On Fri, 16 Apr 2021 at 08:41, Greg A. Woods wrote:
> What else is different?  What am I missing?  What could be different in
> NetBSD current that could cause a FreeBSD domU to (mis)behave this way?
> Could the fault still be in the FreeBSD drivers -- I don't see how as
> the same root problem caused corruption in both HVM and PVH domUs.

Random data collection thoughts:

- Can you reproduce it on tiny partitions (to speed up testing)
- If you newfs, shutdown the DOMU, then copy off the data from the
  DOM0 does it pass FreeBSD fsck on a native boot
- Alternatively if you newfs an image on a native FreeBSD box and copy
  to the DOM0 does the DOMU fsck fail
- Potentially based on results above - does it still happen with a
  reboot between the newfs and fsck
- Can you ktrace whichever of newfs or fsck to see exactly what its
  writing (tiny *tiny* filesystem for the win here :)

David
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
So I wrote a little awk script so that I could write 512-byte blocks
with varying values of bytes.  (Awk is the only decent programming
language on the FreeBSD mini-memstick.img which I could think of that
would do something close to what I wanted it to do.  I could have
combined awk+sh+dd and done things faster, but I had all day to let it
run while I worked on some small engine repairs.)

https://github.com/robohack/experiments/blob/master/tblocks.awk

I then used it to write 30GB to two different LVM LVs, each of identical
size, and each exported to the domU; one was written on the dom0 and the
other was written on the domU.  Then I ran a cmp of both drives on each
of the dom0 and the domU.

On the dom0 side there were no differences.  All 30GB of what was
written directly in the dom0 to one of the LVs was identical to what was
written in the FreeBSD domU to the other LV.  I.e. the FreeBSD domU side
seems to be writing reliably through to the disk.

The FreeBSD domU though is _really_ slow at reading with cmp (perhaps
not unexpectedly, given that it is using stdio to do the reads and only
managing 4KB requests, at a rate of just under 500 requests per second
on each disk).  I'm going to send this and go to bed before it finishes,
but I'm guessing it's about 2/3 of the way through (it has run for
nearly 11,000 seconds), and so far there are no differences from the
FreeBSD domU's point of view either.

Anyway, what the heck is FreeBSD newfs and/or fsck doing differently!?!?
They're both writing and reading the very same raw device(s) that I
wrote and read to/from with awk and cmp.

These awk/cmp tests did very sequential operations, and the data are
quite uniform and regular; whereas newfs/fsck write/read a much more
complex data structure using operations scattered about on the disk.
These tests are also writing then reading enough data to flush through
the buffer caches in each of the dom0 and domU several times over.  The
dom0 has only 4GB and the domU has 8GB, but Xen says it's only using
under 2GB.

What else is different?  What am I missing?  What could be different in
NetBSD current that could cause a FreeBSD domU to (mis)behave this way?
Could the fault still be in the FreeBSD drivers -- I don't see how as
the same root problem caused corruption in both HVM and PVH domUs.

--
Greg A. Woods
Kelowna, BC  +1 250 762-7675  RoboHack
Planix, Inc.  Avoncote Farms
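For reference, a rough C equivalent of that test (a sketch only: the
real test was the awk script linked above, and the fill pattern here is
illustrative, not necessarily what tblocks.awk writes):

/* Writes every 512-byte block of a device with a value derived from its
 * block number, or reads the device back and verifies that pattern.
 * Illustrative sketch of the test described above, not tblocks.awk. */
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLKSZ	512

int
main(int argc, char **argv)
{
	if (argc != 3)
		errx(1, "usage: %s write|verify /dev/rXXX", argv[0]);

	int writing = (strcmp(argv[1], "write") == 0);
	int fd = open(argv[2], writing ? O_WRONLY : O_RDONLY);
	if (fd == -1)
		err(1, "%s", argv[2]);

	unsigned char buf[BLKSZ], want[BLKSZ];
	for (uint64_t blk = 0;; blk++) {
		/* fill pattern: every byte of block N is (N % 251) */
		memset(want, (int)(blk % 251), sizeof(want));
		ssize_t n;
		if (writing) {
			n = write(fd, want, sizeof(want));
			if (n == 0 || (n == -1 && errno == ENOSPC))
				break;			/* end of device */
			if (n != BLKSZ)
				err(1, "write blk %ju", (uintmax_t)blk);
		} else {
			n = read(fd, buf, sizeof(buf));
			if (n == 0)
				break;			/* end of device */
			if (n != BLKSZ)
				err(1, "read blk %ju", (uintmax_t)blk);
			if (memcmp(buf, want, BLKSZ) != 0)
				printf("MISMATCH at block %ju\n", (uintmax_t)blk);
		}
	}
	close(fd);
	return 0;
}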
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
At Wed, 14 Apr 2021 19:53:47 +0200, Jaromír Doleček wrote:
Subject: Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
>
> You can test if this is the problem by disabling the feature in
> negotiation in NetBSD xbdback.c - comment out the code which sets
> feature-max-indirect-segments in xbdback_backend_changed(). With the
> feature disabled, FreeBSD DomU should not use indirect segments.

OK, first off the behaviour of the bug didn't change at all; but also
FreeBSD did not do quite what I expected, as I didn't read their code as
carefully as before.  It seems they don't directly report the maximum
number of indirect segments -- somehow they hide that part.

From the dom0 side things look as I think they should, i.e.
feature-max-indirect-segments no longer appears in xenstore:

# xl block-list fbsd-test
Vdev  BE  handle  state  evt-ch  ring-ref  BE-path
2048  0   2       4      47      8         /local/domain/0/backend/vbd/2/2048
2064  0   2       4      48      10        /local/domain/0/backend/vbd/2/2064
768   0   2       4      49      11        /local/domain/0/backend/vbd/2/768

# xenstore-ls /local/domain/0/backend/vbd/2
2048 = ""
 frontend = "/local/domain/2/device/vbd/2048"
 params = "/dev/mapper/scratch-fbsd--test.0"
 script = "/etc/xen/scripts/block"
 frontend-id = "2"
 online = "1"
 removable = "0"
 bootable = "1"
 state = "4"
 dev = "sda"
 type = "phy"
 mode = "w"
 device-type = "disk"
 discard-enable = "1"
 physical-device = "43266"
 sectors = "62914560"
 info = "0"
 sector-size = "512"
 feature-flush-cache = "1"
 hotplug-status = "connected"
2064 = ""
 frontend = "/local/domain/2/device/vbd/2064"
 params = "/dev/mapper/scratch-fbsd--test.1"
 script = "/etc/xen/scripts/block"
 frontend-id = "2"
 online = "1"
 removable = "0"
 bootable = "1"
 state = "4"
 dev = "sdb"
 type = "phy"
 mode = "w"
 device-type = "disk"
 discard-enable = "1"
 physical-device = "43267"
 hotplug-status = "connected"
 sectors = "62914560"
 info = "0"
 sector-size = "512"
 feature-flush-cache = "1"
768 = ""
 frontend = "/local/domain/2/device/vbd/768"
 params = "/build/images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img"
 script = "/etc/xen/scripts/block"
 frontend-id = "2"
 online = "1"
 removable = "0"
 bootable = "1"
 state = "4"
 dev = "hda"
 type = "phy"
 mode = "r"
 device-type = "disk"
 discard-enable = "0"
 vnd = "/dev/vnd0d"
 physical-device = "3587"
 sectors = "792576"
 info = "4"
 sector-size = "512"
 feature-flush-cache = "1"
 hotplug-status = "connected"

However FreeBSD now says:

# sysctl dev.xbd
dev.xbd.2.xenstore_peer_path: /local/domain/0/backend/vbd/2/768
dev.xbd.2.xenbus_peer_domid: 0
dev.xbd.2.xenbus_connection_state: Connected
dev.xbd.2.xenbus_dev_type: vbd
dev.xbd.2.xenstore_path: device/vbd/768
dev.xbd.2.features: flush
dev.xbd.2.ring_pages: 1
dev.xbd.2.max_request_size: 40960
dev.xbd.2.max_request_segments: 11
dev.xbd.2.max_requests: 32
dev.xbd.2.%parent: xenbusb_front0
dev.xbd.2.%pnpinfo:
dev.xbd.2.%location:
dev.xbd.2.%driver: xbd
dev.xbd.2.%desc: Virtual Block Device
dev.xbd.1.xenstore_peer_path: /local/domain/0/backend/vbd/2/2064
dev.xbd.1.xenbus_peer_domid: 0
dev.xbd.1.xenbus_connection_state: Connected
dev.xbd.1.xenbus_dev_type: vbd
dev.xbd.1.xenstore_path: device/vbd/2064
dev.xbd.1.features: flush
dev.xbd.1.ring_pages: 1
dev.xbd.1.max_request_size: 40960
dev.xbd.1.max_request_segments: 11
dev.xbd.1.max_requests: 32
dev.xbd.1.%parent: xenbusb_front0
dev.xbd.1.%pnpinfo:
dev.xbd.1.%location:
dev.xbd.1.%driver: xbd
dev.xbd.1.%desc: Virtual Block Device
dev.xbd.0.xenstore_peer_path: /local/domain/0/backend/vbd/2/2048
dev.xbd.0.xenbus_peer_domid: 0
dev.xbd.0.xenbus_connection_state: Connected
dev.xbd.0.xenbus_dev_type: vbd
dev.xbd.0.xenstore_path: device/vbd/2048
dev.xbd.0.features: flush
dev.xbd.0.ring_pages: 1
dev.xbd.0.max_request_size: 40960
dev.xbd.0.max_request_segments: 11
dev.xbd.0.max_requests: 32
dev.xbd.0.%parent: xenbusb_front0
dev.xbd.0.%pnpinfo:
dev.xbd.0.%location:
dev.xbd.0.%driver: xbd
dev.xbd.0.%desc: Virtual Block Device
dev.xbd.%parent:

For reference it said this previously (e.g. for dev.xbd.0):

dev.xbd.0.xenstore_peer_path: /local/domain/0/backend/vbd/2/2048
dev.xbd.0.xenbus_peer_domid: 0
dev.xbd.0.xenbus_connection_state: Connected
dev.xbd.0.xenbus_dev_type: vbd
dev.xbd.0.xenstore_path: device/vbd/2048
dev.xbd.0.fe
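Incidentally, the 11-segment / 40960-byte values above, and the
17-segment / 65536-byte values reported before the change (quoted in an
earlier message further down the thread), are consistent with the
frontend reserving one segment for a possible page offset, i.e.
max_request_size = (segments - 1) * PAGE_SIZE.  A quick check of that
assumption (mine, not taken from the FreeBSD source):

/* Sanity-check of the xbd request sizes quoted in this thread, under
 * the assumption (mine) that one segment is reserved for a
 * non-page-aligned start of the buffer. */
#include <stdio.h>

int
main(void)
{
	const unsigned page = 4096;
	const unsigned indirect_segs = 17;	/* with feature-max-indirect-segments */
	const unsigned ring_segs = 11;		/* plain ring request, as reported above */

	printf("%u segs -> %u bytes\n", indirect_segs, (indirect_segs - 1) * page);
	/* 17 segs -> 65536 bytes: the earlier max_request_size */
	printf("%u segs -> %u bytes\n", ring_segs, (ring_segs - 1) * page);
	/* 11 segs -> 40960 bytes: the max_request_size above */
	return 0;
}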
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
At Wed, 14 Apr 2021 19:53:47 +0200, Jaromír Doleček wrote:
Subject: Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
>
> You can test if this is the problem by disabling the feature in
> negotiation in NetBSD xbdback.c - comment out the code which sets
> feature-max-indirect-segments in xbdback_backend_changed(). With the
> feature disabled, FreeBSD DomU should not use indirect segments.

Ah, yes, thanks!  I should have thought of that.

That's especially useful since on the client side it's a read-only flag:

# sysctl -w hw.xbd.xbd_enable_indirect=0
sysctl: oid 'hw.xbd.xbd_enable_indirect' is a read only tunable
sysctl: Tunable values are set in /boot/loader.conf

Apparently in the Linux implementation the number of indirect segments
used by a domU can be tuned at boot time, and that appears to be done by
setting a driver option on the guest kernel command line.  When I first
read that it didn't make so much sense to me to be giving this kind of
control to the domU.

Perhaps it would be better to make this a tuneable in xl.cfg(5) such
that it can be tuned on a per-guest basis.  Then setting it to zero for
a given guest would not advertise the feature at all.

I've some other things to do before I can reboot -- I'll report as soon
as that's done.

--
Greg A. Woods
Kelowna, BC  +1 250 762-7675  RoboHack
Planix, Inc.  Avoncote Farms
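If that per-guest idea were pursued, one possible shape for the backend
side might be something like the following (purely a sketch of the idea:
the "max-indirect-segments" key, and an xl.cfg(5) disk option to write
it, do not exist today, and the symbol names are approximate):

	/* Hypothetical: let the toolstack (or an admin, via
	 * xenstore-write) put a per-vbd "max-indirect-segments" key in
	 * the backend's xenstore directory, and honour it before
	 * advertising the feature.  Zero would mean "don't advertise". */
	unsigned long max_ind = VBD_MAX_INDIRECT_SEGMENTS;	/* current hard-coded value */

	(void)xenbus_read_ul(xbt, xbusd->xbusd_path,
	    "max-indirect-segments", &max_ind, 10);

	if (max_ind > 0)
		error = xenbus_printf(xbt, xbusd->xbusd_path,
		    "feature-max-indirect-segments", "%lu", max_ind);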
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
On Wed, 14 Apr 2021 at 03:21, Greg A. Woods wrote:
> However their front-end code does detect it and seems to make use of it,
> and has done for some 6 years now according to "git blame" (with no
> recent fixes beyond fixing a memory leak on their end).  Here we see it
> live from FreeBSD's sysctl output, thus my concern that this feature may
> be the source of the problem:

You can test if this is the problem by disabling the feature in
negotiation in NetBSD xbdback.c - comment out the code which sets
feature-max-indirect-segments in xbdback_backend_changed(). With the
feature disabled, FreeBSD DomU should not use indirect segments.

Jaromir
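For anyone following along, the hunk in question looks roughly like this
(a sketch from memory of -current's xbdback_xenbus.c; the exact macro
name and error handling may differ, so check the file before editing):

	/* In xbdback_backend_changed(), inside the xenbus transaction
	 * that publishes the backend's properties.  Wrapping this write
	 * in "#if 0" stops the backend from advertising indirect-segment
	 * support, so the frontend falls back to plain ring requests. */
#if 0	/* XXX test: do not advertise indirect segments */
	error = xenbus_printf(xbt, xbusd->xbusd_path,
	    "feature-max-indirect-segments", "%u",
	    VBD_MAX_INDIRECT_SEGMENTS);		/* name approximate */
	if (error) {
		printf("xbdback: failed to write feature-max-indirect-segments: %d\n",
		    error);
		break;
	}
#endif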
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
At Tue, 13 Apr 2021 18:20:39 -0700, "Greg A. Woods" wrote:
Subject: Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
>
> So "17" seems an odd number, but it is apparently because of "Need to
> alloc one extra page to account for possible mapping offset".

Nope, changing that to 16 didn't make any difference.

--
Greg A. Woods
Kelowna, BC  +1 250 762-7675  RoboHack
Planix, Inc.  Avoncote Farms
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
At Sun, 11 Apr 2021 13:55:36 -0700, "Greg A. Woods" wrote:
Subject: Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
>
> Definitely writing to a FreeBSD domU filesystem, i.e. to a FreeBSD
> xbd(4) with a new filesystem created on it, is impossible.

So, having run out of "easy" ideas, and working under the assumption
that this must be a problem in the NetBSD-current dom0 (i.e. not likely
in Xen or the Xen tools), I've been scanning through changes, and this
one, so far, is one that would seem to me to have at least some tiny
possibility of being the root cause:

RCS file: /cvs/master/m-NetBSD/main/src/sys/arch/xen/xen/xbdback_xenbus.c,v
revision 1.86
date: 2020-04-21 06:56:18 -0700;  author: jdolecek;  state: Exp;  lines: +175 -47;  commitid: 26JkIx2V3sGnZf5C;

add support for indirect segments, which makes it possible to pass
up to MAXPHYS (implementation limit, interface allows more) using
single request

request using indirect segment requires 1 extra copy hypercall per
request, but saves 2 shared memory hypercalls (map_grant/unmap_grant),
so should be net performance boost due to less TLB flushing

this also effectively doubles disk queue size for xbd(4)

I don't see anything obviously glaringly wrong, and of course this is
working A-OK on my same machines with NetBSD-5 and NetBSD-current (and
originally somewhat older NetBSD-8.99) domUs.  However I'm really not
very familiar with this code and the specs for what it should be doing,
so I'm unlikely to be able to spot anything that's missing.

I did read the following, which mostly reminded me to look in xenstore's
db to see what feature-max-indirect-segments is set to by default:

https://xenproject.org/2013/08/07/indirect-descriptors-for-xen-pv-disks/

Here's what is stored for a file-backed device:

backend = ""
 vbd = ""
  3 = ""
   768 = ""
    frontend = "/local/domain/3/device/vbd/768"
    params = "/build/images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img"
    script = "/etc/xen/scripts/block"
    frontend-id = "3"
    online = "1"
    removable = "0"
    bootable = "1"
    state = "4"
    dev = "hda"
    type = "phy"
    mode = "r"
    device-type = "disk"
    discard-enable = "0"
    vnd = "/dev/vnd0d"
    physical-device = "3587"
    hotplug-status = "connected"
    sectors = "792576"
    info = "4"
    sector-size = "512"
    feature-flush-cache = "1"
    feature-max-indirect-segments = "17"

Here's what's stored for an LVM-LV backed vbd:

  162 = ""
   2048 = ""
    frontend = "/local/domain/162/device/vbd/2048"
    params = "/dev/mapper/vg1-fbsd--test.0"
    script = "/etc/xen/scripts/block"
    frontend-id = "162"
    online = "1"
    removable = "0"
    bootable = "1"
    state = "4"
    dev = "sda"
    type = "phy"
    mode = "r"
    device-type = "disk"
    discard-enable = "0"
    physical-device = "43285"
    hotplug-status = "connected"
    sectors = "83886080"
    info = "4"
    sector-size = "512"
    feature-flush-cache = "1"
    feature-max-indirect-segments = "17"

So "17" seems an odd number, but it is apparently because of "Need to
alloc one extra page to account for possible mapping offset".  It is
currently the maximum for indirect segments, and it's hard-coded.
(Linux apparently has a max of 256, and the Linux blkfront defaults to
only using 32.)  Maybe it should be "16", so matching max_request_size?

I did take a quick gander at the related code in FreeBSD (both the domU
code that's talking to this code in NetBSD, and the dom0 code that would
be used if dom0 was running FreeBSD), and besides seeing that it is
quite different, I also don't see anything obviously wrong or
incompatible there either.
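For what it's worth, the arithmetic behind the 17 (a quick sketch,
assuming the usual NetBSD/amd64 values of MAXPHYS = 64 KiB and
PAGE_SIZE = 4 KiB, per the commit message and the "extra page for
possible mapping offset" note above):

/* Why feature-max-indirect-segments comes out as 17: a MAXPHYS-sized
 * transfer needs MAXPHYS / PAGE_SIZE pages, plus one extra page in case
 * the buffer doesn't start on a page boundary. */
#include <stdio.h>

int
main(void)
{
	const unsigned maxphys = 64 * 1024;	/* assumed MAXPHYS */
	const unsigned pagesz  = 4096;		/* assumed PAGE_SIZE */

	unsigned segs = maxphys / pagesz + 1;	/* +1 for a possible offset */
	printf("max indirect segments = %u\n", segs);	/* prints 17 */
	return 0;
}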
(I do note that the FreeBSD equivalent to xbdback(4) has a major
advantage of being able to directly access files, i.e. without the need
for vnd(4).  Not quite as exciting as maybe full 9pfs mounts through to
domUs would be, but still pretty neat!)

FreeBSD's equivalent to xbdback(4) (i.e. sys/dev/xen/blkback/blkback.c)
doesn't seem to mention "feature-max-indirect-segments", so apparently
they don't offer it yet, though it does mention "feature-flush-cache".

However their front-end code does detect it and seems to make use of it,
and has done for some 6 years now according to "git blame" (with no
recent fixes beyond fixing a memory leak on their end).  Here we see it
live from FreeBSD's sysctl output, thus my concern that this feature may
be the source of the problem:

hw.xbd.xbd_enable_indirect: 1
dev.xbd.0.max_request_size: 65536
dev.xbd.0.max_request_segments: 17
dev.xbd.0.max_requests: 32

--
Greg A. Woods
Kelowna, BC  +1 250 762-7675  RoboHack
Planix, Inc.  Avoncote Farms
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
On Sun, 11 Apr 2021, Greg A. Woods wrote:

> Anyway this does something slightly different (and better, or worse) on
> the FreeBSD side, but still ends up with a corrupted filesystem, as
> seen from both sides, though maybe not so bad from NetBSD's point of
> view:
>
> # mount /dev/da1
> mount: /dev/da1: unknown special file or file system
> ...
> # umount /mnt
> # fsck /dev/da1
> ** /dev/da1
> ** Last Mounted on /mnt
> ** Phase 1 - Check Blocks and Sizes
> PARTIALLY TRUNCATED INODE I=325128
> SALVAGE? [yn] n
>
> PARTIALLY TRUNCATED INODE I=877864
> SALVAGE? [yn] n
>
> PARTIALLY TRUNCATED INODE I=877866
> SALVAGE? [yn] n
>
> PARTIALLY TRUNCATED INODE I=877879
> SALVAGE? [yn] ^C
>
> * FILE SYSTEM MARKED DIRTY *

Don't see that on a standard block device (USB stick):

$ sudo gpart add -t freebsd-ufs da0
da0p1 added
$ sudo newfs -O1 /dev/da0p1
/dev/da0p1: 496.0MB (1015728 sectors) block size 32768, fragment size 4096
        using 4 cylinder groups of 124.00MB, 3968 blks, 15872 inodes.
super-block backups (for fsck_ffs -b #) at:
 64, 254016, 507968, 761920
$ sudo fsck_ufs -EfrZz da0p1
** /dev/da0p1
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2 files, 2 used, 124907 free (27 frags, 15610 blocks, 0.0% fragmentation)

* FILE SYSTEM IS CLEAN *
$ sudo mount /dev/da0p1 /media
$ sudo umount /media
$

-RVP
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
On Sun, 11 Apr 2021, Greg A. Woods wrote:

> NetBSD can actually make some sense of this FreeBSD filesystem though:
>
> # fsck -n /dev/mapper/rscratch-fbsd--test.0
> ** /dev/mapper/rscratch-fbsd--test.0 (NO WRITE)
> Invalid quota magic number
> CONTINUE? yes
>
> ** File system is already clean
> ** Last Mounted on /mnt
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> SUMMARY INFORMATION BAD
> SALVAGE? no
>
> BLK(S) MISSING IN BIT MAPS
> SALVAGE? no
>
> ** Phase 6 - Check Quotas
> CLEAR SUPERBLOCK QUOTA FLAG? no
>
> 2 files, 2 used, 7612693 free (21 frags, 951584 blocks, 0.0% fragmentation)
>
> * UNRESOLVED INCONSISTENCIES REMAIN *
>
> I'm not sure if those problems are to be expected with a
> FreeBSD-created filesystem or not.  Probably the "Invalid quota magic
> number" is normal, but I'm not sure about the "BLK(s) MISSING IN BIT
> MAPS".  Have FreeBSD and NetBSD FFS diverged this much?  I won't try to
> mount it, especially not from the dom0.

I've run into this before.  This is a NetBSD vs. FreeBSD UFS issue.
Create a UFS v1 FS on FreeBSD if you want to write it on NetBSD:

newfs -O1 /dev/...

Otherwise, you get that "Invalid quota magic number" error because newfs
creates a UFSv2 FS on FreeBSD.

-RVP
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
At Sun, 11 Apr 2021 21:13:44 +0000 (UTC), RVP wrote:
Subject: Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
>
> I've run into this before.  This is a NetBSD vs. FreeBSD UFS issue.  Create
> a UFS v1 FS on FreeBSD if you want to write it on NetBSD:
>
> newfs -O1 /dev/...
>
> Otherwise, you get that "Invalid quota magic number" error because newfs
> creates a UFSv2 FS on FreeBSD.

While that wasn't exactly my goal (I just wanted to see from the NetBSD
side if the FreeBSD side was actually writing sensible things and not
missing anything or mixing anything up or corrupting anything), it could
indeed help me in doing that.

Anyway this does something slightly different (and better, or worse) on
the FreeBSD side, but still ends up with a corrupted filesystem, as seen
from both sides, though maybe not so bad from NetBSD's point of view:

xbd1: 30720MB at device/vbd/2064 on xenbusb_front0
xbd1: attaching as da1
xbd1: features: flush
xbd1: synchronize cache commands enabled.
GEOM: new disk da1

# newfs -O1 /dev/da1
/dev/da1: 30720.0MB (62914560 sectors) block size 32768, fragment size 4096
        using 121 cylinder groups of 254.00MB, 8128 blks, 32512 inodes.
super-block backups (for fsck_ffs -b #) at:
 64, 520256, 1040448, 1560640, 2080832, 2601024, 3121216, 3641408, 4161600,
 4681792, 5201984, 5722176, 6242368, 6762560, 7282752, 7802944, 8323136,
 8843328, 9363520, 9883712, 10403904, 10924096, 11444288, 11964480, 12484672,
 13004864, 13525056, 14045248, 14565440, 15085632, 15605824, 16126016,
 16646208, 17166400, 17686592, 18206784, 18726976, 19247168, 19767360,
 20287552, 20807744, 21327936, 21848128, 22368320, 22888512, 23408704,
 23928896, 24449088, 24969280, 25489472, 26009664, 26529856, 27050048,
 27570240, 28090432, 28610624, 29130816, 29651008, 30171200, 30691392,
 31211584, 31731776, 32251968, 32772160, 33292352, 33812544, 34332736,
 34852928, 35373120, 35893312, 36413504, 36933696, 37453888, 37974080,
 38494272, 39014464, 39534656, 40054848, 40575040, 41095232, 41615424,
 42135616, 42655808, 43176000, 43696192, 44216384, 44736576, 45256768,
 45776960, 46297152, 46817344, 47337536, 47857728, 48377920, 48898112,
 49418304, 49938496, 50458688, 50978880, 51499072, 52019264, 52539456,
 53059648, 53579840, 54100032, 54620224, 55140416, 55660608, 56180800,
 56700992, 57221184, 57741376, 58261568, 58781760, 59301952, 59822144,
 60342336, 60862528, 61382720, 61902912, 62423104

# mount /dev/da1
mount: /dev/da1: unknown special file or file system
# mount /dev/da1 /mnt
# df
Filesystem               512-blocks   Used    Avail Capacity  Mounted on
/dev/ufs/FreeBSD_Install     782968 737016   -16680   102%    /
devfs                             2      2        0   100%    /dev
tmpfs                         65536    616    64920     1%    /var
tmpfs                         40960      8    40952     0%    /tmp
/dev/da1                   61915512     16 56962256     0%    /mnt
# ls -l /mnt
total 8
drwxrwxr-x  2 root  operator  512 Apr 11 21:45 .snap
# cp /COPYRIGHT /mnt/
# ls -l /mnt/
total 24
drwxrwxr-x  2 root  operator   512 Apr 11 21:45 .snap
-r--r--r--  1 root  wheel     6177 Apr 11 21:46 COPYRIGHT
# pax -X -rw -pe / /mnt
# ls -l /mnt
total 1120
-rw-r--r--   2 root  wheel    1089 Oct 23 05:56 .cshrc
-rw-r--r--   2 root  wheel     470 Oct 23 05:56 .profile
drwxrwxr-x   2 root  operator  512 Apr 11 21:52 .snap
-r--r--r--   1 root  wheel    6177 Oct 23 05:56 COPYRIGHT
-r--r--r--   1 root  wheel    7226 Oct 23 05:57 ERRATA.HTML
-r--r--r--   1 root  wheel    3273 Oct 23 05:57 ERRATA.TXT
-r--r--r--   1 root  wheel  252351 Oct 23 05:57 HARDWARE.HTML
-r--r--r--   1 root  wheel  117568 Oct 23 05:57 HARDWARE.TXT
-r--r--r--   1 root  wheel   23882 Oct 23 05:57 README.HTML
-r--r--r--   1 root  wheel   14316 Oct 23 05:57 README.TXT
-r--r--r--   1 root  wheel   36431 Oct 23 05:57 RELNOTES.HTML
-r--r--r--   1 root  wheel   12343 Oct 23 05:57 RELNOTES.TXT
drwxr-xr-x   2 root  wheel    1024 Oct 23 05:55 bin
drwxr-xr-x   9 root  wheel    1536 Oct 23 05:57 boot
dr-xr-xr-x   2 root  wheel     512 Apr 11 19:02 dev
-r--r--r--   1 root  wheel    6985 Oct 23 05:57 docbook.css
drwxr-xr-x  25 root  wheel    2048 Oct 23 06:04 etc
drwxr-xr-x   5 root  wheel    1536 Oct 23 05:55 lib
drwxr-xr-x   3 root  wheel     512 Oct 23 05:55 libexec
drwxr-xr-x   2 root  wheel     512 Oct 23 05:55 media
drwxr-xr-x   2 root  wheel     512 Oct 23 05:57 mnt
drwxr-xr-x   2 root  wheel     512 Oct 23 05:55 net
dr-xr-xr-x   2 root  wheel     512 Oct 23 05:55 proc
drwxr-xr-x   2 root  wheel     512 Oct 23 05:55 rescue
drwxr-xr-x   2 root  wheel     512 Oct 23 05:56 root
drwxr-xr-x   2 root  wheel    2560 Oct 23 05:56 sbin
drwxrwxrwt   2 root  wheel     512 Apr 11 21:52 tmp
drwxr-xr-x  13 root  wheel     512 Oct 23 05:57 usr
drwxr-xr-x   2 root  wheel     512 Apr 11 19:04 var
# umount /mnt
# fsck /d
Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
At Sun, 11 Apr 2021 13:23:31 -0700, "Greg A. Woods" wrote:
Subject: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
>
> In fact it only seems to be fsck that complains, possibly along
> with any attempt to write to a filesystem, that causes problems.

Definitely writing to a FreeBSD domU filesystem, i.e. to a FreeBSD
xbd(4) with a new filesystem created on it, is impossible.

I was able to write 500MB of zeros to the LVM LV backed disk,
overwriting the copy of the .img file I had put there, and only see
500MB of zeros back on the NetBSD side, so writing directly to the raw
/dev/da1 on FreeBSD seems to write data without problem.

However then the following happens when I try to use a new FS there:

# newfs /dev/da1
/dev/da1: 30720.0MB (62914560 sectors) block size 32768, fragment size 4096
        using 50 cylinder groups of 626.09MB, 20035 blks, 80256 inodes.
super-block backups (for fsck_ffs -b #) at:
 192, 1282432, 2564672, 3846912, 5129152, 6411392, 7693632, 8975872,
 10258112, 11540352, 12822592, 14104832, 15387072, 16669312, 17951552,
 19233792, 20516032, 21798272, 23080512, 24362752, 25644992, 26927232,
 28209472, 29491712, 30773952, 32056192, 33338432, 34620672, 35902912,
 37185152, 38467392, 39749632, 41031872, 42314112, 43596352, 44878592,
 46160832, 47443072, 48725312, 50007552, 51289792, 52572032, 53854272,
 55136512, 56418752, 57700992, 58983232, 60265472, 61547712, 62829952

# mount /dev/da1 /mnt
# mount
/dev/ufs/FreeBSD_Install on / (ufs, local, noatime, read-only)
devfs on /dev (devfs, local, multilabel)
tmpfs on /var (tmpfs, local)
tmpfs on /tmp (tmpfs, local)
/dev/da1 on /mnt (ufs, local)
# df
Filesystem               512-blocks   Used    Avail Capacity  Mounted on
/dev/ufs/FreeBSD_Install     782968 737016   -16680   102%    /
devfs                             2      2        0   100%    /dev
tmpfs                         65536    608    64928     1%    /var
tmpfs                         40960      8    40952     0%    /tmp
/dev/da1                   60901560     16 56029424     0%    /mnt
# cp /COPYRIGHT /mnt
UFS /dev/da1 (/mnt) cylinder checksum failed: cg 0, cgp: 0xe66de1a4 != bp: 0xf433acbc
UFS /dev/da1 (/mnt) cylinder checksum failed: cg 1, cgp: 0x89ba8532 != bp: 0x3491fbd0
UFS /dev/da1 (/mnt) cylinder checksum failed: cg 3, cgp: 0xdeaf87a7 != bp: 0x3a071e86
UFS /dev/da1 (/mnt) cylinder checksum failed: cg 7, cgp: 0x7085828d != bp: 0xaaae0f19
UFS /dev/da1 (/mnt) cylinder checksum failed: cg 15, cgp: 0x293dfe28 != bp: 0xe2f25f8b
UFS /dev/da1 (/mnt) cylinder checksum failed: cg 31, cgp: 0x9a4d0762 != bp: 0x4119c6e
[[ and on and on ]]
UFS /dev/da1 (/mnt) cylinder checksum failed: cg 49, cgp: 0x931f84e5 != bp: 0xb48687df
/mnt: create/symlink failed, no inodes free
cp: /mnt/COPYRIGHT: No space left on device
# Apr 11 20:37:28  syslogd: last message repeated 4 times
Apr 11 20:37:59  kernel: pid 713 (cp), uid 0 inumber 2 on /mnt: out of inodes
# df -i
Filesystem               512-blocks   Used    Avail Capacity iused   ifree %iused  Mounted on
/dev/ufs/FreeBSD_Install     782968 737016   -16680   102%   12129     285   98%   /
devfs                             2      2        0   100%       0       0  100%   /dev
tmpfs                         65536    608    64928     1%      75  114613    0%   /var
tmpfs                         40960      8    40952     0%       6   71674    0%   /tmp
/dev/da1                   60901560     16 56029424     0%       2 4012796    0%   /mnt

NetBSD can actually make some sense of this FreeBSD filesystem though:

# fsck -n /dev/mapper/rscratch-fbsd--test.0
** /dev/mapper/rscratch-fbsd--test.0 (NO WRITE)
Invalid quota magic number
CONTINUE? yes

** File system is already clean
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
SUMMARY INFORMATION BAD
SALVAGE? no

BLK(S) MISSING IN BIT MAPS
SALVAGE? no

** Phase 6 - Check Quotas
CLEAR SUPERBLOCK QUOTA FLAG? no

2 files, 2 used, 7612693 free (21 frags, 951584 blocks, 0.0% fragmentation)

* UNRESOLVED INCONSISTENCIES REMAIN *

I'm not sure if those problems are to be expected with a FreeBSD-created
filesystem or not.  Probably the "Invalid quota magic number" is normal,
but I'm not sure about the "BLK(s) MISSING IN BIT MAPS".  Have FreeBSD
and NetBSD FFS diverged this much?  I won't try to mount it, especially
not from the dom0.

Dumpfs shows the following:

file system: /dev/mapper/rscratch-fbsd--test.0
format  FFSv2
endian  little-endian
location 65536  (-b 128)
magic   19540119        time    Sun Apr 11 13:46:15 2021
superblock location     65536   id      [ 60735d32 358197c4 ]
cylgrp  dynamic inodes  FFSv2   sblock  FFSv2   fslevel 5
nbfree  951584  ndir    2       nifree  4012796 nffree  21
ncg     50      size    7864320 blocks  7612695
bsize   32768   shift   15      mask    0x8000
fsize   4096    shift