On 05/14/2013 05:59 AM, Adam Thorn wrote:
> Hi,
>
> I'm seeing regular tdb corruption; typical log messages are:
>
> tdb(/var/db/samba/sessionid.tdb): tdb_rec_read bad magic 0x42424242 at
> offset=672032
>
> tdb(/var/db/samba/connections.tdb): tdb_rec_read bad magic 0x0 at
> offset=1111638594
>
> tdb(/var/db/samba/locking.tdb): tdb_rec_read bad magic 0x42424242 at
> offset=1034396
>
> which then prevents fileserving from working properly (N.B. the bad
> magic is not limited to those three tdbs). At the moment I'm running
> Samba 3.6.6 on FreeBSD 9.0, but I've seen exactly the same behaviour
> with 3.6.9 and 3.6.13, and also the same behaviour on FreeBSD 9.1 as
> well. I also currently have the tdb-1.2.9,1 FreeBSD port installed at
> present, but have seen the same problem with tdb-1.2.11,1
>
> I found a few forum posts that suggested setting "use mmap=no" - I have
> tried that, but saw no change in behaviour.
>
> Restarting samba invariably clears the problem for a while: sometimes
> it's just a few hours before we get further bad magic messages,
> sometimes it's continued working fine for ~10 days or so, and pretty
> much everything in between. There is no obvious pattern of which tdbs
> are corrupting; I've seen pretty much all of them become corrupt over
> the last couple of months.
>
> The server has multiple IP addresses which samba listens on; first of
> all we just start smbd with
>
> [global]
> include = /data/config/samba/servers/%i
>
> and I've attached the result from running testparm on one of those
> included files. It's very very slightly redacted to hide IP addresses
> and group names. We have another similarly-configured server (FreeBSD
> 9.0, Samba 3.6.6) with the same pattern of "include a config file
> dependent on the IP address the client connects to", and that has been
> running smoothly with no problems at all for over a year.
>
> I don't think (but have not absolutely conclusively ruled out) that it's
> a hardware problem on the server itself; the samba service (and the
> associated IP addresses) is managed by heartbeat, so I've tried running
> samba on the two nominally identical servers in the HA cluster - I see
> the same problematic behaviour on both nodes.
>
> I've also attached the output of "smbd -b", in case that is informative.
>
> I'm kind of running out of ideas of what to try next; any and all advice
> will thus be gratefully received! It's been especially hard to diagnose
> because the corruption happens seemingly at random, and I've not been
> able to identify a definite action that leads to the errors. (Also,
> because it's a production server, I'm not keen to try to deliberately
> provoke errors..)
>
> Adam
>

What type of filesystem are you using, and do you have write barriers
enabled? I know that on Linux you should mount ext3/ext4 filesystems with
barrier=1 in order to prevent corruption of sam.ldb in cases of power loss.
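
Since you mention there is no obvious pattern in which tdbs go bad, it might
help to tally the "bad magic" messages per file over a longer stretch of logs.
Below is only a rough sketch: the regex assumes the log lines look exactly like
the three quoted above, and the names parse_bad_magic and count_by_file are
made up for illustration (they are not part of Samba or tdb).

```python
import re
from collections import Counter

# Matches lines such as:
#   tdb(/var/db/samba/sessionid.tdb): tdb_rec_read bad magic 0x42424242 at offset=672032
# \s+ before "offset" also tolerates the message being wrapped across lines.
BAD_MAGIC = re.compile(
    r"tdb\((?P<path>[^)]+)\): tdb_rec_read bad magic "
    r"(?P<magic>0x[0-9a-fA-F]+) at\s+offset=(?P<offset>\d+)"
)


def parse_bad_magic(log_text):
    """Return a list of (tdb path, magic value, offset) tuples found in log_text."""
    return [
        (m.group("path"), m.group("magic"), int(m.group("offset")))
        for m in BAD_MAGIC.finditer(log_text)
    ]


def count_by_file(records):
    """Count how many bad-magic records were seen for each tdb path."""
    return Counter(path for path, _magic, _offset in records)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python tally_bad_magic.py /path/to/smbd.log
    with open(sys.argv[1], errors="replace") as fh:
        records = parse_bad_magic(fh.read())
    for path, n in count_by_file(records).most_common():
        print(f"{n:6d}  {path}")
```

Running that against the smbd logs from both HA nodes would at least show
whether particular files or magic values dominate, without having to provoke
anything on the production server.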