On Thu, 30 Jul 2020 at 02:24, Chuck Silvers <c...@chuq.com> wrote:
>
> On Wed, Jul 29, 2020 at 06:13:03PM +0100, Chavdar Ivanov wrote:
> > On Wed, 29 Jul 2020 at 08:33, Matthias Petermann <m...@petermann-it.de>
> > wrote:
> > >
> > > Hello Chavdar,
> > >
> > > On 28.07.2020 at 18:48, Chavdar Ivanov wrote:
> > > > This being a place people are trying samba4 as a DC, I got a
> > > > repeatable panic on one of the systems I am trying it on, as follows:
> > > > ....
> > > > crash: _kvm_kvatop(0)
> > > > Crash version 9.99.69, image version 9.99.69.
> > > > Kernel compiled without options LOCKDEBUG.
> > > > System panicked: /: bad dir ino 657889 at offset 0: Bad dir (not
> > > > rounded), reclen=0x2e33, namlen=51, dirsiz=60 <= reclen=11827 <=
> > > > maxsize=512, flags=0x2005900, entryoffsetinblock=0, dirblksiz=512
> > > >
> > > > Backtrace from time of crash is available.
> > > > _KERNEL_OPT_NARCNET() at 0
> > > > _KERNEL_OPT_DDB_HISTORY_SIZE() at _KERNEL_OPT_DDB_HISTORY_SIZE
> > > > sys_reboot() at sys_reboot
> > > > vpanic() at vpanic+0x15b
> > > > snprintf() at snprintf
> > > > ufs_lookup() at ufs_lookup+0x518
> > > > VOP_LOOKUP() at VOP_LOOKUP+0x42
> > > > lookup_once() at lookup_once+0x1a1
> > > > namei_tryemulroot() at namei_tryemulroot+0xacf
> > > > namei() at namei+0x29
> > > > vn_open() at vn_open+0x9a
> > > > do_open() at do_open+0x112
> > > > do_sys_openat() at do_sys_openat+0x72
> > > > sys_open() at sys_open+0x24
> > > > syscall() at syscall+0x26e
> > > > --- syscall (number 5) ---
> > > > syscall+0x26e:
> > > > ....
> > > >
> > >
> > > that still looks like a file system inconsistency. Before the patch from
> > > Chuck I also had the case several times that a filesystem that was
> > > apparently repaired with fsck could no longer be trusted. After
> > > importing the patched kernel, to be on the safe side, I recreated all
> > > the file systems previously mounted with posix1eacls with newfs.
> >
> > Hard that one, as it was the root file system...
> > Anyway, a couple of fsck's seem to have sorted out this one.
>
> how exactly did you run fsck to fix this? the most reliable way is to boot
> the machine single-user, then run "fsck -fy ..." from the console shell,
> then run the same fsck command again to make sure that it says that
> everything is ok, then reboot.

Exactly. Buried deep in my fingertips' memory is that one should
fsck / in single user, twice, and reboot immediately without a 'sync'.
Perhaps some old SunOS manual...

>
> if you have done that and are still crashing due to corruption in your
> root file system, then we still have another bug in the kernel somewhere.

So it seems to me; the peculiarities here are that in both cases / is
a GPT slice and that I have 'log' as a mount option; it was suggested
'posix1eacls' should be used on its own.

> >
> > > Presumably fsck is not prepared for the kind of inconsistency, and only
> > > a newfs can restore a trustworthy initial state. What is the starting
> > > point for you? Has the file system been created after the patch, or has
> > > it only been treated with fsck so far?
> >
> > I think it may have been created before the patch to the filesystem
> > code, but before the second version of the samba4 package.
> >
> > >
> > > In any case, I would advise you - if you have not already done so - to
> > > use a separate partition or LVM volume for the sysvol with its own file
> > > system, and to mount only this with the posix1eacls option. It seems the
> > > ACL code still needs a lot of testing, so at least you can be sure that
> > > your root filesystem will not be affected.
> >
> > As this was running on a XCP-NG guest, I added a small 1GB disk to the
> > vm, created the filesystem (-O 2) and mounted it on /var/db/samba4.
> >
> > I removed the 'posix1eacls' options from the other existing
> > filesystems and left it only for the one mounted on /var/db/samba4 .
> > In this case, the provisioning fails with a message that the
> > filesystem does not support acls - so it perhaps checks the root
> > filesystem after all. I then re-added this option to /, newfs'd
> > /var/db/samba4, rebooted and retried the provisioning.
> > This resulted
> > in a panic similar to the one above, this time after perhaps 10 minutes
> > work of python8 doing database conversion from v1 to v2 - the third
> > database in the list. As this was seen on the console of the XCP-NG
> > guest, I took screenshots of the panic, in case someone is interested.
>
> yes, I'd like to see the screenshots please.

I'll mail them off-list, to avoid large-ish uuencoded bits polluting
the archives.

>
> -Chuck

Chavdar

--
----
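
For anyone decoding the panic string quoted at the top of this thread, here is a rough sketch of the kind of sanity check that produces it. It is not the kernel's code: the names `dirsiz` and `check_dirent`, the 8-byte header size (assumed from the classic UFS directory entry layout of 4-byte inode, 2-byte record length, 1-byte type, 1-byte name length) and the order of the checks are all assumptions made for illustration. Plugging in the values from the panic suggests why the entry was rejected: 0x2e33 is 11827, which is neither a multiple of 4 nor within the 512-byte directory block.

```python
# Illustrative only: a userland approximation of a UFS directory entry
# sanity check. Names and check order are assumptions, not NetBSD source.

DIRENT_HEADER = 8  # assumed: d_ino (4) + d_reclen (2) + d_type (1) + d_namlen (1)


def dirsiz(namlen):
    """Minimum record size for a name of namlen bytes plus NUL, padded to 4 bytes."""
    return DIRENT_HEADER + ((namlen + 1 + 3) & ~3)


def check_dirent(reclen, namlen, maxsize):
    """Return a list of problems found; an empty list means the entry looks sane."""
    problems = []
    if reclen & 0x3:
        problems.append("not rounded")  # record length must be a multiple of 4
    if reclen > maxsize:
        problems.append("too big")      # record must fit in the directory block
    if reclen < dirsiz(namlen):
        problems.append("too small")    # record must be able to hold the name
    return problems


if __name__ == "__main__":
    # Values taken from the panic message earlier in the thread.
    print(dirsiz(51))
    print(check_dirent(reclen=0x2E33, namlen=51, maxsize=512))
```

Under these assumptions `dirsiz(51)` comes out as 60, matching the `dirsiz=60` in the panic, and the record length fails both the rounding and the size check, which is consistent with a corrupted directory block rather than a slightly damaged entry.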