On Wed, Feb 04 2015, Carl Worth wrote:

> First, when mapping the index file:
>
>     /* FIXME: We map this shared, which is a start, but we need to think about
>      * how to make it multi-process safe. */
>     cache->index = (unsigned char *)
>        mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>
> Then, when reading an entry from the cache file, (and potentially
> unlinking the old file being replaced):
>
>     /* FIXME: We'll need an fsync here and think about races... maybe even need
>      * an flock to avoid leaking files. Or maybe fsync, then read back and
>      * verify the entry is still ours, delete it if somebody else overwrote
>      * it. */
>
> So please follow up in replies if you have good ideas on how to best
> address these comments.
Kristian and I just had a good conversation about these issues. The key
insight is that the shader cache implementation uses this cache for two
distinct things:

1. Here's a SHA-1 corresponding to some GLSL source that I just
   compiled. Hold onto this hash, so that when I see this source again
   in the future, I can quickly know that this source compiles.

2. Here's a SHA-1 that I'm using to name a compiled and linked binary
   blob. Hold onto this blob so that I can retrieve it later with this
   same hash.

For case 1, we want quick lookups, and the mmapped index file provides
this nicely. But for case 2, the index-file approach has some problems:

* As Kristian mentions in his comment above, there is a race condition
  where two processes can update the same entry with different hashes,
  (and neither seeing the other's first). This will result in files
  leaking.

* As discussed elsewhere in the thread, a replacement policy based on a
  maximum number of cached entries isn't what we want.

And interestingly, the one big benefit of the index file, (fast lookups
for presence in the cache), is not needed at all for case 2. If we're
trying to load a binary from disk we can instead just open() the file
with the hash as its filename. If the open succeeds, proceed to load
the data. If not, we go off to compile-and-link, (which is expensive
enough that we need not worry about the cost of the open()).

So my plan now is to entirely separate the above two cases in the cache
API.

Case 1, ("have we seen this shader source?") will be implemented with
the index-file idea. This can be a fixed number of entries, (there's no
associated data so we don't need a size-based knob for configuration).
And without any unlink() being involved we don't have races to worry
about. [There is the potential for two processes to write to the same
entry at the same time and corrupt it, but we have a cryptographic
improbability that the resulting corrupted entry will ever be valid for
any future GLSL source.]
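To make case 1 concrete, here is a rough sketch of the fixed-size index
idea, (the slot-selection scheme and all names here are illustrative
assumptions on my part, not the actual patches):

```c
/* Sketch only: a fixed-entry table of raw SHA-1 values for case 1,
 * ("have we seen this shader source?").  In real use the table would
 * be an mmapped file shared between processes; here it is just a
 * struct so the logic is self-contained. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SHA1_SIZE   20
#define INDEX_SLOTS 4096   /* fixed entry count; no size knob needed */

struct index_table {
   unsigned char entries[INDEX_SLOTS][SHA1_SIZE];
};

static uint32_t
slot_for(const unsigned char sha1[SHA1_SIZE])
{
   uint32_t h;
   /* SHA-1 output is uniformly distributed, so any 4 bytes of the
    * hash make a fine slot index. */
   memcpy(&h, sha1, sizeof h);
   return h % INDEX_SLOTS;
}

static bool
index_contains(const struct index_table *t,
               const unsigned char sha1[SHA1_SIZE])
{
   return memcmp(t->entries[slot_for(sha1)], sha1, SHA1_SIZE) == 0;
}

static void
index_add(struct index_table *t, const unsigned char sha1[SHA1_SIZE])
{
   /* Plain overwrite: no unlink() is involved, so nothing can leak.
    * Two processes writing the same slot at once can tear the entry,
    * but a torn 20-byte value is cryptographically unlikely to ever
    * match a real SHA-1 of future GLSL source. */
   memcpy(t->entries[slot_for(sha1)], sha1, SHA1_SIZE);
}
```

A collision on a slot simply evicts the older hash, which only costs us
a redundant recompile later, never a wrong answer.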
Then, case 2, ("what's the linked binary blob for this hash?") will be
implemented by simply reading and writing blobs to files based on the
hash filenames, (without going through the index). And for this, I'll
implement a maximum size for the cache. I plan to do that at
cache_create() time by noticing a too-large cache and freeing up some
space by deleting a sufficient number of random entries.

I'll make this concrete in patches soon. As always, any comments are
welcome.

-Carl
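P.S. A minimal sketch of what the case-2 read/write path could look
like. The helper names are made up for illustration, and the
write-to-temp-then-rename() step is my own assumption about how to keep
concurrent readers from seeing a half-written blob; none of this is the
actual patch:

```c
/* Sketch only: case 2 skips the index entirely.  A blob lives in a
 * file named by its hex SHA-1; a successful open() is a cache hit and
 * a failed open() means "go compile and link". */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the cached blob, or NULL on a miss (caller then compiles). */
static void *
cache_get_blob(const char *dir, const char *sha1_hex, size_t *size_out)
{
   char path[1024];
   snprintf(path, sizeof path, "%s/%s", dir, sha1_hex);

   int fd = open(path, O_RDONLY);
   if (fd == -1)
      return NULL;   /* miss: open() is cheap next to a compile+link */

   off_t size = lseek(fd, 0, SEEK_END);
   void *blob = malloc(size);
   if (pread(fd, blob, size, 0) != size) {
      free(blob);
      close(fd);
      return NULL;
   }
   close(fd);

   *size_out = size;
   return blob;
}

static void
cache_put_blob(const char *dir, const char *sha1_hex,
               const void *blob, size_t size)
{
   char path[1024], tmp[1060];
   snprintf(path, sizeof path, "%s/%s", dir, sha1_hex);

   /* Write under a temporary name, then rename() into place, so a
    * concurrent reader sees either the whole blob or nothing. */
   snprintf(tmp, sizeof tmp, "%s.tmp.%d", path, getpid());
   int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
   if (fd == -1)
      return;
   if (write(fd, blob, size) != (ssize_t) size) {
      close(fd);
      unlink(tmp);
      return;
   }
   close(fd);
   rename(tmp, path);
}
```

Since the filename is the content's hash, two processes racing to store
the same blob just write identical bytes, so the last rename() winning
is harmless.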
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev