On Thu, Mar 17, 2016 at 9:43 PM, Johannes Schindelin
<johannes.schinde...@gmx.de> wrote:
> Hi Duy,
>
> On Thu, 17 Mar 2016, Duy Nguyen wrote:
>
>> On Thu, Mar 17, 2016 at 1:27 AM, Johannes Schindelin
>> <johannes.schinde...@gmx.de> wrote:
>> > I am much more concerned about concurrent accesses and the communication
>> > between the Git processes and the index-helper. Writing to the .pid file
>> > sounds very fragile to me, in particular when multiple processes can poke
>> > the index-helper in succession and some readers are unaware that the index
>> > is being refreshed.
>>
>> It's not that bad.
>
> Well, the way I read the code it is possible that:
>
> 1. Git process 1 starts, reading the index
> 2. Git process 2 starts, poking the index-helper
> 3. The index-helper updates the .pid file (why not set a bit in the shared
>    memory?) with a prefix "W"
> 4. Git process 2 reads the .pid file and waits for the "W" to go away
>    (what if index-helper is not fast enough to write the "W"?)
> 5. Git process 1 access the index, happily oblivious that it is being
>    updated and the data is in an inconsistent state

No, if process 1 reads the index file, then that file remains
consistent and unchanged the whole time. index-helper is not allowed
to touch that file at all. Process 2 gets the index content from shm
(cached by the index-helper) and verifies that it's good (using the
signature at the end of the shm). If watchman is used, process 2 can
also read the list of modified files from another shm, combine it with
the in-core index, then write it out the normal way. Only then can
process 1 (or process 3) see the new index content from the file.

>> We should have protection in place to deal with this and fall back to
>> reading directly from file when things get suspicious.
>
> I really want to prevent that. I know of use cases where the index weighs
> 300MB, and falling back to reading it directly *really* hurts.

For crying out loud, what do you store in that repo? What I have in
mind for all this work is indexes in the 10MB range, or maybe 50MB
max.

Very unscientifically, the git.git index is about 274kb and contains
~3000 entries, so 94 bytes per entry on average. With a 300MB index,
the extrapolated number of entries is about 3 million! At around 1
million index entries, I think it's time to just use a database as the
index.

>> But I agree that sending UNIX signals (or PostMessage) is not really
>> good communication.
>
> Yeah, I really would like two-way communication instead. Named pipes?
> They'd have the advantage that you could use the full path to the index as
> identifier.

Yep.

> The way I read the current code, we would actually create a different
> shared memory every time the index changes because its checksum is part of
> the shared memory's "path"...

Yep. shm objects are "immutable", pretty much like git objects. But
now that I think of it, I don't know how cheap or expensive the shm
creation operation is on Windows.
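
For what it's worth, the index-size extrapolation earlier in this message can be reproduced with a few lines of Python. This is only a rough sketch: it assumes "kb" and "MB" mean 1024-based units and that the average entry size stays the same as the index grows, and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate of index entry counts.
# Assumptions (not measured): 1024-based units, and the average
# bytes-per-entry of git.git carries over to a much larger index.
KIB = 1024
MIB = 1024 * KIB


def bytes_per_entry(index_bytes, entries):
    """Average on-disk size of one index entry."""
    return index_bytes / entries


def estimate_entries(index_bytes, per_entry):
    """Number of entries an index of the given size would hold."""
    return int(index_bytes / per_entry)


# Approximate git.git figures quoted in this message.
per_entry = bytes_per_entry(274 * KIB, 3000)

# Extrapolate to the 300MB index mentioned in the quoted text.
big_index_entries = estimate_entries(300 * MIB, per_entry)

print(f"average entry size: {per_entry:.1f} bytes")
print(f"estimated entries in a 300MB index: {big_index_entries}")
```

The result lands a little above 3 million entries, which is consistent with the "about 3 million" figure; real repositories will differ because entry size depends on path length.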

--
Duy