Re: [DVC] Incremental parsing of log output

Matthieu Moy Mon, 27 Aug 2007 04:41:28 -0700

Michael Olson <[EMAIL PROTECTED]> writes:

> Now that I've actually profiled both the original git-log function and
> the new xgit-log function, I see that indeed waiting for git output is
> not the big issue after all.


An interesting experiment:

In git's repository (11,000 revisions):

* (dvc-run-dvc-sync 'xgit '("log"))  runs in around 1 second,

* (dvc-run-dvc-async 'xgit '("log")) runs in around 20 seconds

* (start-process "git-log" (current-buffer) "git" "log") also runs in
  around 20 seconds (so dvc-run-dvc-async may add some overhead, but
  there is a real performance problem in start-process).

* (start-process "git-log" (current-buffer) "sh" "-c"
                 "git log > /tmp/log-output.txt")
  runs in less than a second. So, the performance problem in
  `start-process' is actually a problem with filling in the output
  buffer.

  After that, opening /tmp/log-output.txt in Emacs is almost
  instantaneous. So, the problem is the way Emacs fills the buffer
  incrementally, not the mere fact of putting the data in a buffer.
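
For anyone who wants to reproduce the asynchronous timings, here is a
minimal sketch. `my-time-async' is a hypothetical helper, not part of
DVC. It waits in a loop on `accept-process-output', which is not
exactly what an interactive session does, so the numbers are only
indicative.

```elisp
(defun my-time-async (program args)
  "Run PROGRAM with ARGS asynchronously, collecting output in a buffer.
Return the elapsed wall-clock time in seconds.  Hypothetical helper."
  (with-temp-buffer
    (let* ((start (float-time))
           ;; Output is inserted into the temporary buffer by Emacs'
           ;; default process filter, as in the experiment above.
           (proc (apply #'start-process "my-timing" (current-buffer)
                        program args)))
      ;; Let Emacs read the process output until the process is gone.
      (while (process-live-p proc)
        (accept-process-output proc 1))
      (- (float-time) start))))
```

For example, (my-time-async "git" '("log")) run from inside a git
working tree, compared with the same call using
"sh" "-c" "git log > /tmp/log-output.txt".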

At that point, I can imagine two solutions, and unfortunately, they're
partly orthogonal.

1) Forget about incremental parsing, and modify (dvc-run-dvc-async) to
   pipe the output to a temporary file, and open that temporary file
   in a buffer before calling the function provided with :finished
   and :error.

   That's relatively easy to implement: a few lines of code in DVC,
   nothing in the backends. The only problem is the existing code to
   prompt for a password, which uses process filters. I have no idea
   how to make this work here :-(.

2) Go for the incremental solution.
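
A rough sketch of what 1) might look like, outside of DVC. All the
`my-' names are hypothetical, this is not the actual
dvc-run-dvc-async code, and it assumes a POSIX "sh" is available. It
does nothing about the password prompt problem.

```elisp
(defun my-run-to-temp-file--sentinel (proc _event)
  "When PROC has ended, load its output file and call the callback."
  (when (memq (process-status proc) '(exit signal))
    (let ((out (process-get proc 'my-output-file))
          (callback (process-get proc 'my-finished)))
      (with-temp-buffer
        ;; Loading the whole file at once is the fast path.
        (insert-file-contents out)
        (delete-file out)
        (funcall callback (current-buffer)
                 (process-exit-status proc))))))

(defun my-run-to-temp-file (program args finished)
  "Run PROGRAM with ARGS, sending its standard output to a temp file.
When the process ends, call FINISHED with a buffer holding the output
and the exit status.  Return the process object."
  (let* ((out (make-temp-file "my-output-"))
         (command (concat (mapconcat #'shell-quote-argument
                                     (cons program args) " ")
                          " > " (shell-quote-argument out)))
         ;; No process buffer: nothing is inserted incrementally.
         (proc (start-process "my-run" nil "sh" "-c" command)))
    ;; Keep the state on the process itself, so no closure is needed.
    (process-put proc 'my-output-file out)
    (process-put proc 'my-finished finished)
    (set-process-sentinel proc #'my-run-to-temp-file--sentinel)
    proc))
```

The buffer passed to FINISHED is temporary and is killed when the
callback returns, so the callback has to do its parsing (or copy the
text) before returning.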

In the case of git, with the example of the git repository, solution
1) means getting the full log output in a few seconds (less than a
second to run git, and a few seconds for xgit-log-parse), and solution
2) means getting the first items immediately, but having Emacs continue
processing the output for a few more tens of seconds. In this case,
I wouldn't give a strong advantage to either solution.

Solution 2) clearly wins in two cases: huge projects (the Linux kernel
has ~5 times more revisions than git, and that's only since
2.6.something, and there are bigger projects than it ...), and
back-ends which allow remote logs (bzr), which can be slow because of
the network.

So, if it were me, I'd go for 1) because I'm lazy, but if you go for
2), that's even better ;-).

Now, the best would be to get the performance of 1), with the
incrementality of 2). A few random ideas for that:

* *Perhaps* this can be done by piping the output to a temporary file
  anyway, and open this file in a buffer. The incremental function
  would just `revert-buffer' before running. *But* I guess
  `revert-buffer' is O(file-size), which means the total cost is
  O(file-size * number of runs). If the number of runs is
  O(file-size), we have a quadratic cost.

  That can be worked around by having the incremental function run at
  increasing intervals. Run once, reschedule yourself in 0.1 second.
  Run again, reschedule yourself in 0.2 seconds, then 0.4, 0.8, ...
  The number of runs is then logarithmic in the total duration, and
  we're back to O(n * log(n)).

* In any case, if :filter is not provided, we can use the optimisation
  in 1) above. That will not benefit incremental functions, but that
  can be an appreciable performance boost for existing code. But that
  means solving the password prompt problem.
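
The doubling schedule from the first idea could be sketched like
this. Both functions are hypothetical names, not existing DVC code.
The first one only computes the delays, to show how few runs are
needed. The second one reschedules itself with `run-at-time'.

```elisp
(defun my-backoff-delays (initial total)
  "Return the list of doubling delays, starting at INITIAL seconds,
needed to cover a process that runs for TOTAL seconds."
  (let ((delay initial)
        (elapsed 0)
        (delays '()))
    (while (< elapsed total)
      (push delay delays)
      (setq elapsed (+ elapsed delay)
            delay (* 2 delay)))
    (nreverse delays)))

(defun my-poll-with-backoff (function delay)
  "Call FUNCTION now.  Unless it returns non-nil, call it again after
DELAY seconds, then after twice that, and so on."
  (unless (funcall function)
    (run-at-time delay nil #'my-poll-with-backoff
                 function (* 2 delay))))
```

With a first delay of 0.1 second, a process that runs for 20 seconds
is covered by 8 delays, so the O(file-size) refresh would happen only
a handful of times. FUNCTION would be the incremental parser, which
should return non-nil once the process has finished and everything
has been parsed.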

Any thoughts?

-- 
Matthieu

