Goal
~~~~

This is a patch series about libifying `git apply` functionality, and
using this libified functionality in `git am`, so that no 'git apply'
process is spawn anymore. This makes `git am` significantly faster, so
`git rebase`, when it uses the am backend, is also significantly
faster.

Previous discussions and patches series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This has initially been discussed in the following thread:

  http://thread.gmane.org/gmane.comp.version-control.git/287236/

Then a RFC patch series was sent:

  http://thread.gmane.org/gmane.comp.version-control.git/288489/

Then v1 was sent:

  http://thread.gmane.org/gmane.comp.version-control.git/292324/

Highlevel view of the pqtches in the series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This new patch series is built on top of the above previous work.

  - Patches 1/94 to 47/94 are here from the RFC patch series.

They get rid of most of the global variables in builtin/apply.c by
moving the variables into a new 'struct apply_state', and they
refactor the code around a bit.

They haven't changed much since v1. There are mostly variables that
have been moved around inside 'struct apply_state' or inside
match_fragment(), new lines that have removed in 'struct apply_state'
and function parameters that have been renamed.

  - Patches 48/94 to 84/94 were new in v1.

They finish libifying the apply functionality that was in
builtin/apply.c and move it into apply.{c,h}. And they use this
libified functionality in git am so that it doesn't launch git apply
processes any more.

There are many changes in these patches, as there were a lot of great
suggestions from many reviewers to improve on v1:

      - Lockfile management changes as suggested by Junio.
      - Many memory leaks were fixed as suggested by Eric.
      - A patch was splitted into two as suggested by Duy.
      - Some file descriptors have been closed as suggested by Dscho.
      - Some early return were moved up as suggested by Eric.
      - Many commit messages were improved.

I am not very happy that I added some gotos, but it looked like a not
too bad way to make sure that the memory and file descriptor cleanups
were done before returning from some functions.

  - Patches 85/94 to 94/94 are new and can be considered RFC.

They implement a way to make the libified apply silent. It is a new
feature in the libified apply functionality.

I consider that the apply functionality is properly libified before
these patches, and that they should be in a separate series, but
unfortunately using the libified apply in "git am" unmasks the fact
that "git am", since it was a shell script, has been silencing the
apply functionality by "futzing with file descriptors". And
unfortunately this makes some reviewers unhappy.

The last patch 94/94 adds a --silent command line option to git apply.
This is not necessary, but some scripts could perhaps use it, and it
could make it easier to test the new silent feature.

By the way there are no tests yet for this new feature, and I am not
sure at all that "--silent" and "be_silent" are good names. Also I
will perhaps be asked to move or merge this part of this series, to
remove the "futzing with file descriptors" earlier in the former
patches. That's why this part of the series can be considered RFC.

General comments
~~~~~~~~~~~~~~~~

Sorry if this patch series is long. I can split it into two or more
series if it is prefered.

I can also send diffs between this version and the previous one, but
for now I'd rather not send them in this email, as it would make it
very long.

The benefits are not just related to not creating new processes. When
`git am` launched a `git apply` process, this new process had to read
the index from disk. Then after the `git apply`process had terminated,
`git am` dropped its index and read the index from disk to get the
index that had been modified by the `git apply`process. This was
inefficient and also prevented the split-index mechanism to provide
many performance benefits.

Using this series as rebase material, Duy explains it like this:

 > Without the series, the picture is not so surprising. We run git-apply
 > 80+ times, each consists of this sequence
 >
 > read index
 > write index (cache tree updates only)
 > read index again
 > optionally initialize name hash (when new entries are added, I guess)
 > read packed-refs
 > write index
 >
 > With this series, we run a single git-apply which does
 >
 > read index (and sharedindex too if in split-index mode)
 > initialize name hash
 > write index 80+ times

(See: 
http://thread.gmane.org/gmane.comp.version-control.git/292324/focus=292460)

Links
~~~~~

This patch series is available here:

https://github.com/chriscool/git/commits/libify-apply-use-in-am54

The previous version, v1, is available here:

https://github.com/chriscool/git/commits/libify-apply-use-in-am25 

Performance numbers
~~~~~~~~~~~~~~~~~~~

Only tests on Linux have been performed. It could be interesting to
test on other platforms especially Windows and perhaps OSX too.

  - Around mid April Ævar did a huge many-hundred commit rebase on the
    kernel with untracked cache.

command: git rebase --onto 1993b17 52bef0c 29dde7c

Vanilla "next" without split index:                1m54.953s
Vanilla "next" with split index:                   1m22.476s
This series on top of "next" without split index:  1m12.034s
This series on top of "next" with split index:     0m15.678s

Ævar used his Debian laptop with SSD.

  - Around mid April I tested rebasing 13 commits in Booking.com's
    monorepo on a Red Hat 6.5 server with split-index and
    GIT_TRACE_PERFORMANCE=1.

With Git v2.8.0, the rebase took 6.375888383 s, with the git am
command launched by the rebase command taking 3.705677431 s.

With this series on top of next, the rebase took 3.044529494 s, with
the git am command launched by the rebase command taking 0.583521168
s.


Christian Couder (94):
  builtin/apply: make gitdiff_verify_name() return void
  builtin/apply: avoid parameter shadowing 'p_value' global
  builtin/apply: avoid parameter shadowing 'linenr' global
  builtin/apply: avoid local variable shadowing 'len' parameter
  builtin/apply: extract line_by_line_fuzzy_match() from
    match_fragment()
  builtin/apply: move 'options' variable into cmd_apply()
  builtin/apply: move 'read_stdin' global into cmd_apply()
  builtin/apply: introduce 'struct apply_state' to start libifying
  builtin/apply: move 'state' init into init_apply_state()
  builtin/apply: move 'unidiff_zero' global into 'struct apply_state'
  builtin/apply: move 'check' global into 'struct apply_state'
  builtin/apply: move 'check_index' global into 'struct apply_state'
  builtin/apply: move 'apply_in_reverse' global into 'struct
    apply_state'
  builtin/apply: move 'apply_with_reject' global into 'struct
    apply_state'
  builtin/apply: move 'apply_verbosely' global into 'struct apply_state'
  builtin/apply: move 'update_index' global into 'struct apply_state'
  builtin/apply: move 'allow_overlap' global into 'struct apply_state'
  builtin/apply: move 'cached' global into 'struct apply_state'
  builtin/apply: move 'diffstat' global into 'struct apply_state'
  builtin/apply: move 'numstat' global into 'struct apply_state'
  builtin/apply: move 'summary' global into 'struct apply_state'
  builtin/apply: move 'threeway' global into 'struct apply_state'
  builtin/apply: move 'no_add' global into 'struct apply_state'
  builtin/apply: move 'unsafe_paths' global into 'struct apply_state'
  builtin/apply: move 'line_termination' global into 'struct
    apply_state'
  builtin/apply: move 'fake_ancestor' global into 'struct apply_state'
  builtin/apply: move 'p_context' global into 'struct apply_state'
  builtin/apply: move 'apply' global into 'struct apply_state'
  builtin/apply: move 'patch_input_file' global into 'struct
    apply_state'
  builtin/apply: move 'limit_by_name' global into 'struct apply_state'
  builtin/apply: move 'has_include' global into 'struct apply_state'
  builtin/apply: move 'p_value' global into 'struct apply_state'
  builtin/apply: move 'p_value_known' global into 'struct apply_state'
  builtin/apply: move 'root' global into 'struct apply_state'
  builtin/apply: move 'whitespace_error' global into 'struct
    apply_state'
  builtin/apply: move 'whitespace_option' into 'struct apply_state'
  builtin/apply: remove whitespace_option arg from
    set_default_whitespace_mode()
  builtin/apply: move 'squelch_whitespace_errors' into 'struct
    apply_state'
  builtin/apply: move 'applied_after_fixing_ws' into 'struct
    apply_state'
  builtin/apply: move 'ws_error_action' into 'struct apply_state'
  builtin/apply: move 'ws_ignore_action' into 'struct apply_state'
  builtin/apply: move 'max_change' and 'max_len' into 'struct
    apply_state'
  builtin/apply: move 'state_linenr' global into 'struct apply_state'
  builtin/apply: move 'fn_table' global into 'struct apply_state'
  builtin/apply: move 'symlink_changes' global into 'struct apply_state'
  builtin/apply: move 'state' check into check_apply_state()
  builtin/apply: move applying patches into apply_all_patches()
  builtin/apply: rename 'prefix_' parameter to 'prefix'
  builtin/apply: move 'lock_file' global into 'struct apply_state'
  builtin/apply: move 'newfd' global into 'struct apply_state'
  builtin/apply: make apply_patch() return -1 instead of die()ing
  builtin/apply: read_patch_file() return -1 instead of die()ing
  builtin/apply: make find_header() return -1 instead of die()ing
  builtin/apply: make parse_chunk() return a negative integer on error
  builtin/apply: make parse_single_patch() return -1 on error
  apply: move 'struct apply_state' to apply.h
  builtin/apply: make parse_whitespace_option() return -1 instead of
    die()ing
  builtin/apply: make parse_ignorewhitespace_option() return -1 instead
    of die()ing
  builtin/apply: move init_apply_state() to apply.c
  apply: make init_apply_state() return -1 instead of exit()ing
  builtin/apply: make check_apply_state() return -1 instead of die()ing
  builtin/apply: move check_apply_state() to apply.c
  builtin/apply: make apply_all_patches() return -1 on error
  builtin/apply: make parse_traditional_patch() return -1 on error
  builtin/apply: make gitdiff_*() return 1 at end of header
  builtin/apply: make gitdiff_*() return -1 on error
  builtin/apply: change die_on_unsafe_path() to check_unsafe_path()
  builtin/apply: make build_fake_ancestor() return -1 on error
  builtin/apply: make remove_file() return -1 on error
  builtin/apply: make add_conflicted_stages_file() return -1 on error
  builtin/apply: make add_index_file() return -1 on error
  builtin/apply: make create_file() return -1 on error
  builtin/apply: make write_out_one_result() return -1 on error
  builtin/apply: make write_out_results() return -1 on error
  builtin/apply: make try_create_file() return -1 on error
  builtin/apply: make create_one_file() return -1 on error
  builtin/apply: rename option parsing functions
  apply: rename and move opt constants to apply.h
  Move libified code from builtin/apply.c to apply.{c,h}
  apply: make some parsing functions static again
  run-command: make dup_devnull() non static
  apply: roll back index lock file in case of error
  environment: add set_index_file()
  builtin/am: use apply api in run_apply()
  write_or_die: use warning() instead of fprintf(stderr, ...)
  apply: add 'be_silent' variable to 'struct apply_state'
  apply: make 'be_silent' incomatible with 'apply_verbosely'
  apply: don't print on stdout when be_silent is set
  usage: add set_warn_routine()
  usage: add get_error_routine() and get_warn_routine()
  apply: change error_routine when be_silent is set
  am: use be_silent in 'struct apply_state' to shut up applying patches
  run-command: make dup_devnull() static again
  builtin/apply: add a cli option for be_silent

 Makefile                   |    1 +
 builtin/apply.c => apply.c | 1750 ++++++++--------
 apply.h                    |  146 ++
 builtin/am.c               |   91 +-
 builtin/apply.c            | 4762 +-------------------------------------------
 cache.h                    |    1 +
 environment.c              |   10 +
 git-compat-util.h          |    3 +
 run-command.c              |    2 +-
 t/t4012-diff-binary.sh     |    4 +-
 t/t4254-am-corrupt.sh      |    2 +-
 usage.c                    |   15 +
 write_or_die.c             |    6 +-
 13 files changed, 1324 insertions(+), 5469 deletions(-)
 copy builtin/apply.c => apply.c (73%)
 create mode 100644 apply.h
 rewrite builtin/apply.c (98%)

-- 
2.8.2.490.g3dabe57

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to