I hinted at this earlier [1]. It now passes the test suite, with a design that I'm happy with (thanks to Junio for a suggestion about the rename problem).

From the user's point of view, this reduces the amount of index data that has to be written down to the number of updated files. For example, my webkit index v4 is 14MB. With a fresh split, I only have to update an index of 200KB. Every file I touch will add about 80 bytes to that. As long as I don't touch every single tracked file in my worktree, I should not pay the penalty of writing a 14MB index file on every operation.

The read penalty is not addressed here, so I still pay the 14MB hashing cost. But that's an easy problem. We could cache the validated index in a daemon. Whenever git needs to load an index, it pokes the daemon. The daemon verifies that the on-disk index still has the same signature, then sends the in-mem index to git. When git updates the index, it pokes the daemon again to update the in-mem index. Next time git reads the index, it does not have to pay the I/O cost any more (actually it does, but the cost is hidden away when you do not have to read it yet).

The fourth patch is not really necessary. I started out with a different approach that needed that abstraction. But I think it's still a nice thing to keep. The real meat is in patches 0017 to 0025.

In essence, the new index is more like a journal, where the real index is put away unchanged. Doing this in other implementations should be easy (at least the reading part) and need only a small code change. The whole index format is retained. All you need is to read a new extension that contains two ewah-bitmaps and apply the changes to create the final index.

This is a preparation step for my untracked file cache. With writing (and later on reading) the index becoming cheap, I can start to put more things in there.
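
To make the "journal" idea above concrete, here is a rough sketch in Python of how a reader might combine the unchanged base index with the changes recorded in the split index. It is not the code from this series. The function name apply_split_index, the tuple layout of an entry, the use of plain Python sets in place of the two ewah-bitmaps, and the ordering of the split entries (replacements first, then additions) are all assumptions made for illustration.

```python
def apply_split_index(base, delete_bits, replace_bits, split_entries):
    """Build the final entry list from a base index plus recorded changes.

    base          -- list of (path, data) tuples sorted by path; the big,
                     unchanged index (hypothetical in-memory form)
    delete_bits   -- positions in `base` that were removed
                     (a set standing in for an ewah bitmap)
    replace_bits  -- positions in `base` whose entry was updated
                     (a set standing in for an ewah bitmap)
    split_entries -- assumed layout: one replacement per position in
                     replace_bits, in ascending position order, followed
                     by entries that are new
    """
    # Assumption of this sketch: a position is never both deleted and replaced.
    assert not (delete_bits & replace_bits)

    replacements = iter(split_entries[:len(replace_bits)])
    added = split_entries[len(replace_bits):]

    result = []
    for pos, entry in enumerate(base):
        if pos in delete_bits:
            continue                      # entry was removed
        if pos in replace_bits:
            entry = next(replacements)    # entry was updated
        result.append(entry)

    # New entries are merged in so the final list is sorted by path again.
    result.extend(added)
    result.sort(key=lambda e: e[0])
    return result


base = [("a.c", "v1"), ("b.c", "v1"), ("c.c", "v1")]
final = apply_split_index(
    base,
    delete_bits={1},                      # b.c deleted
    replace_bits={2},                     # c.c updated
    split_entries=[("c.c", "v2"), ("b.h", "v1")],
)
print(final)
```

The point of the sketch is that the base list is never modified: only the two position sets and the short list of changed entries need to be written out after an update.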

[1] http://thread.gmane.org/gmane.comp.version-control.git/246471/focus=247031

Nguyễn Thái Ngọc Duy (32):
  ewah: fix constness of ewah_read_mmap
  ewah: delete unused ewah_read_mmap_native declaration
  sequencer: do not update/refresh index if the lock cannot be held
  read-cache: new API write_locked_index instead of write_index/write_cache
  read-cache: relocate and unexport commit_locked_index()
  read-cache: store in-memory flags in the first 12 bits of ce_flags
  read-cache: be strict about "changed" in remove_marked_cache_entries()
  read-cache: be specific what part of the index has changed
  update-index: be specific what part of the index has changed
  resolve-undo: be specific what part of the index has changed
  unpack-trees: be specific what part of the index has changed
  cache-tree: mark istate->cache_changed on cache tree invalidation
  cache-tree: mark istate->cache_changed on cache tree update
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  read-cache: save index SHA-1 after reading
  read-cache: split-index mode
  read-cache: mark new entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark updated entries for split index
  split-index: the writing part
  split-index: the reading part
  split-index: do not invalidate cache-tree at read time
  split-index: strip pathname of on-disk replaced entries
  update-index: new options to enable/disable split index mode
  update-index --split-index: do not split if $GIT_DIR is read only
  rev-parse: add --shared-index-path to get shared index path
  read-tree: force split-index mode off on --index-output
  read-tree: note about dropping split-index mode or index version
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  t2104: make sure split index mode is off for the version test
  t1700: new tests for split-index mode

-- 
1.9.1.346.ga2b5940