[PATCH 00/32] Split index mode for very large indexes

Nguyễn Thái Ngọc Duy Mon, 28 Apr 2014 03:55:29 -0700

I hinted about it earlier [1]. It now passes the test suite and with a
design that I'm happy with (thanks to Junio for a suggestion about the
rename problem).


>From the user point of view, this reduces the writable size of index
down to the number of updated files. For example my webkit index v4 is
14MB. With a fresh split, I only have to update an index of 200KB.
Every file I touch will add about 80 bytes to that. As long as I don't
touch every single tracked file in my worktree, I should not pay
penalty for writing 14MB index file on every operation.

The read penalty is not addressed here, so I still pay 14MB hashing
cost. But that's an easy problem. We could cache the validated index
in a daemon. Whenever git needs to load an index, it pokes the daemon.
The daemon verifies that the on-disk index still has the same
signature, then sends the in-mem index to git. When git updates the
index, it pokes the daemon again to update in-mem index. Next time git
reads the index, it does not have to pay I/O cost any more (actually
it does but the cost is hidden away when you do not have to read it
yet).

The forth patch is not really necessary. I started out with a
different approach that needed that abstraction. But I think it's
still a nice thing to keep. The real meat starts from 0017 to 0025. In
essence, the new index is more like a journal, where the real index is
put away unchanged.

Doing this in other implementations should be easy (at least the
reading part) and with small code change. The whole index format is
retained. All you need is to read a new extension that contains two
ewah-bitmaps and apply the changes to create the final index.

This is a preparation step for my untracked file cache. With writing
(and later on reading) index becoming cheap, I can start to put more
things in there.

[1] http://thread.gmane.org/gmane.comp.version-control.git/246471/focus=247031

Nguyễn Thái Ngọc Duy (32):
  ewah: fix constness of ewah_read_mmap
  ewah: delete unused ewah_read_mmap_native declaration
  sequencer: do not update/refresh index if the lock cannot be held
  read-cache: new API write_locked_index instead of write_index/write_cache
  read-cache: relocate and unexport commit_locked_index()
  read-cache: store in-memory flags in the first 12 bits of ce_flags
  read-cache: be strict about "changed" in remove_marked_cache_entries()
  read-cache: be specific what part of the index has changed
  update-index: be specific what part of the index has changed
  resolve-undo: be specific what part of the index has changed
  unpack-trees: be specific what part of the index has changed
  cache-tree: mark istate->cache_changed on cache tree invalidation
  cache-tree: mark istate->cache_changed on cache tree update
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  read-cache: save index SHA-1 after reading
  read-cache: split-index mode
  read-cache: mark new entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark updated entries for split index
  split-index: the writing part
  split-index: the reading part
  split-index: do not invalidate cache-tree at read time
  split-index: strip pathname of on-disk replaced entries
  update-index: new options to enable/disable split index mode
  update-index --split-index: do not split if $GIT_DIR is read only
  rev-parse: add --shared-index-path to get shared index path
  read-tree: force split-index mode off on --index-output
  read-tree: note about dropping split-index mode or index version
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  t2104: make sure split index mode is off for the version test
  t1700: new tests for split-index mode
-- 
1.9.1.346.ga2b5940

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 00/32] Split index mode for very large indexes

Reply via email to