Steve Baxter wrote:
We have a few users reporting Metakit database corruption. Having investigated, they are all opening a database that is residing on a remote volume - in particular one user is opening a database from Windows XP that resides on a Mac running 10.3.8 (the Mac uses SAMBA to share files to PC clients).
After doing some investigation, the problem seems to be with memory mapped files. If I turn off memory mapping for Windows, we get no problems.
Looking at the Metakit sources, I think I see why. MK maps the file, but then also uses fread() and fwrite() to access the file at the same time. This confuses me somewhat - I would have thought that data would have been modified in-place rather than using fwrite().
Files are mapped read-only. Writes only occur in unused areas of the file, i.e. where no reading takes place, so it does not matter whether the mmap is in sync or stale. After a write/commit, the map is dropped and re-mapped.
There is a known case which breaks at least some Win* setups: writing data to a file when that data comes from a mmap'ed memory area on the same file. The problem seems to be that the seek pointer can get confused, IOW it looks like some versions of Windows use file I/O underneath and operate off a single file pointer. The solution was to never write using a pointer coming from a mmap'ed area - an interim 4 KB buffer is used to avoid that. Look for "horrendous" in fileio.cpp for details.
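To illustrate the interim-buffer idea, here is a rough sketch. It is not the code from fileio.cpp; the function name WriteViaBounceBuffer is made up for this example, and only the 4 KB size is taken from the description above. Each chunk is copied into a local buffer first, so fwrite() is never handed a pointer into the mapped region.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper, not part of Metakit: writes `len` bytes starting at
// `src` to `fp`, staging every chunk in a local buffer so that fwrite()
// never receives a pointer that may point into a memory-mapped area.
bool WriteViaBounceBuffer(std::FILE* fp, const unsigned char* src, std::size_t len)
{
    unsigned char buf[4096];  // interim buffer (4 KB, as described above)

    while (len > 0) {
        // Copy at most one buffer's worth out of the (possibly mapped) source.
        std::size_t n = len < sizeof buf ? len : sizeof buf;
        std::memcpy(buf, src, n);

        // Write from the local copy, never from `src` directly.
        if (std::fwrite(buf, 1, n, fp) != n)
            return false;

        src += n;
        len -= n;
    }
    return true;
}
```

The extra memcpy costs a little, but it keeps the file I/O path and the mapping from ever touching the same memory during a write.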
Reading up on Windows file mapping, I find this comment in the docs for CreateFileMapping():
Creating a file mapping object creates the potential for mapping a view of the file, but does not map the view. The MapViewOfFile and MapViewOfFileEx functions map a view of a file into a process address space.
With one important exception, file views derived from a single file mapping object are coherent or identical at a specific time. If multiple processes have handles of the same file mapping object, they see a coherent view of the data when they map a view of the file.
The exception is related to remote files. Although CreateFileMapping works with remote files, it does not keep them coherent. For example, if two computers both map a file as writable, and both change the same page, each computer only sees its own writes to the page. When the data gets updated on the disk, it is not merged.
A mapped file and a file that is accessed by using the input and output (I/O) functions (ReadFile and WriteFile) are not necessarily coherent.
That last bit is the killer - Windows does not guarantee that files accessed by ReadFile and WriteFile (which I suspect fread() and fwrite() eventually call) are coherent with memory mapped on those files.
No, as I read this, MK does not have a problem with it. See above.
So, I can think of a couple of ways of solving this:
(a) Turn off file mapping for remote volumes.
(b) Modify MK so that writes inside the mapped area are done using memmove() (or whatever) rather than fwrite().
Switching to a r/w memory map in MK would make the system considerably more vulnerable to stray pointer bugs damaging a datafile.
This is not guaranteed to work on Unix either:
http://www.opengroup.org/onlinepubs/009695399/functions/mmap.html
The application must ensure correct synchronization when using mmap() in conjunction with any other file access method, such as read() and write(), standard input/output, and shmat().
Again, file handle and mmap being in sync is not a requirement for MK to work properly. Having said that, MK *does* assume that both can be used in parallel, just that changes from writes do not necessarily show up in the r/o mmap.
Has anyone come across this before? Any fixes?
I'd look at Samba config options on the Mac, maybe there are ways to turn off some optimizations and/or caching behaviors. Or maybe we need a good way to sync/flush on Windows? The logic is all in fileio.cpp, maybe there are open mode flags one could set (look for "_open"), or some sort of file sync system call to add to DataCommit (has to be Windows specific, I don't think there is a portable way to do this). The essential requirement is that when a new r/o mmap is set up, it really fully sees all writes done up to that point.
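As a starting point for the sync/flush idea, something along these lines might be worth trying. This is an untested sketch, not existing MK code: FlushToDisk is a made-up name, and whether a flush like this actually makes a fresh mapping see the writes on an SMB share is exactly the open question. It uses _commit() from the Microsoft C runtime on Windows and fsync() elsewhere.

```cpp
#include <stdio.h>
#ifdef _WIN32
#include <io.h>      // _commit
#else
#include <unistd.h>  // fsync
#endif

// Hypothetical helper, not part of Metakit: first push the stdio buffer to
// the OS, then ask the OS to write its cached data for this file out to
// the storage device. Returns true if both steps succeed.
bool FlushToDisk(FILE* fp)
{
    if (fflush(fp) != 0)
        return false;
#ifdef _WIN32
    return _commit(_fileno(fp)) == 0;
#else
    return fsync(fileno(fp)) == 0;
#endif
}
```

The place to call it would be at commit time, after the writes and before the file is mapped again, for instance from an overridden DataCommit.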
This is btw the reason - generally speaking - to avoid using r/w datafiles on file servers: the semantics are not well-defined enough (as far as I can tell) to guarantee that MK's stable-storage approach to transaction safety is enforced. I've long ago given up on locking, which is even hairier on remote file servers. On non-local data, the most robust way forward is a client/server approach. Depending on language, that may or may not be an easy option, of course.
You don't need to alter MK to make the above changes btw, you can derive your own class from c4_FileStrategy and override the relevant member functions, then pass an instance of it to the c4_Storage constructor instead.
-jcw
