Michael Olson <[EMAIL PROTECTED]> writes:
> Now that I've actually profiled both the original git-log function and
> the new xgit-log function, I see that indeed waiting for git output is
> not the big issue after all.
An interesting experiment, run in git's own repository (11,000 revisions):
* (dvc-run-dvc-sync 'xgit '("log")) runs in around 1 second,
* (dvc-run-dvc-async 'xgit '("log")) runs in around 20 seconds,
* (start-process "git-log" (current-buffer) "git" "log") also runs in
around 20 seconds (so dvc-run-dvc-async may add some overhead, but
there is a real performance problem in `start-process'),
* (start-process "git-log" (current-buffer) "sh" "-c" "git log > /tmp/log-output.txt")
runs in less than a second. So the performance problem in
`start-process' is actually a problem with filling in the output
buffer.
After that, opening /tmp/log-output.txt in Emacs is almost
instantaneous. So the problem is the way Emacs fills the buffer
incrementally, not the mere act of putting the data in the buffer.
At this point, I can imagine two solutions, and unfortunately they're
partly orthogonal.
1) Forget about incremental parsing, and modify `dvc-run-dvc-async' to
pipe the output to a temporary file, then open that temporary file
in a buffer before calling the :finished or :error function.
That's relatively easy to implement: a few lines of code in DVC,
nothing in the back-ends. The only problem is the existing code to
prompt for passwords, which uses process filters. I have no idea how
to make this work here :-(.
2) Go for the incremental solution.
In the case of git, taking the git repository as an example, solution
1) means getting the full log output in a few seconds (less than a
second to run git, and a few seconds for xgit-log-parse), while solution
2) means getting the first items immediately, but having Emacs continue
processing the output for a few tens of seconds more. In this case,
neither solution has a strong advantage.
Solution 2) clearly wins in two cases: huge projects (the Linux kernel
has ~5 times more revisions than git, and that's only since
2.6.something, and there are bigger projects than it...), and
back-ends that allow remote log (bzr), which can be slow because of
the network.
So, if it were me, I'd go for 1) because I'm lazy, but if you go for
2), that's even better ;-).
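For the record, solution 1) could look roughly like the Emacs Lisp sketch below. It is not DVC code: `my-dvc-run-to-temp-file' and `my-dvc--temp-file-sentinel' are hypothetical names, and the sketch only reuses the `sh -c "... > file"' trick from the experiment, with a sentinel that loads the file into a buffer in one go once the process has exited. It also sends stderr to the same file (an assumption, so that error messages end up in the buffer), and since no filter ever sees the output, it does nothing about the password prompt problem.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch of solution 1): let the shell write the output to a temporary
;; file, and only load it into a buffer once the process has exited.
;; Hypothetical helpers, not DVC's actual API.

(defun my-dvc--temp-file-sentinel (proc _event)
  "Load PROC's temporary output file and hand it to the stored callback."
  ;; Sentinels fire on every status change; act only once PROC is gone.
  (when (memq (process-status proc) '(exit signal))
    (let ((tmp (process-get proc 'dvc-temp-file))
          (buf (generate-new-buffer "*dvc-output*")))
      ;; One bulk read of the whole file instead of many small inserts.
      (with-current-buffer buf
        (insert-file-contents tmp))
      (delete-file tmp)
      (funcall (process-get proc 'dvc-finished)
               buf (process-exit-status proc)))))

(defun my-dvc-run-to-temp-file (program args finished)
  "Run PROGRAM with ARGS asynchronously, redirecting output to a temp file.
When the process exits, insert the file contents into a new buffer and
call FINISHED with that buffer and the process exit status."
  (let* ((tmp (make-temp-file "dvc-output-"))
         ;; Build "PROGRAM ARGS... > TMP 2>&1", quoting every word.
         (cmd (concat (mapconcat #'shell-quote-argument (cons program args) " ")
                      " > " (shell-quote-argument tmp) " 2>&1"))
         ;; No process buffer and no filter: Emacs never receives the
         ;; output incrementally, which is the whole point.
         (proc (start-process "dvc-temp-file" nil "sh" "-c" cmd)))
    ;; Keep the per-run state on the process object itself.
    (process-put proc 'dvc-temp-file tmp)
    (process-put proc 'dvc-finished finished)
    (set-process-sentinel proc #'my-dvc--temp-file-sentinel)
    proc))
```

Here FINISHED plays the role of both the :finished and :error functions; a real version would dispatch on the exit status it receives.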
Now, the best would be to get the performance of 1) with the
incrementality of 2). A few random ideas for that:
* *Perhaps* this can be done by piping the output to a temporary file
anyway, and opening this file in a buffer. The incremental function
would just call `revert-buffer' before running. *But* I guess
`revert-buffer' is O(file-size), which means the total cost is
O(file-size * number of runs). If the number of runs is
O(file-size), we have a quadratic cost.
That can be worked around by running the incremental function at
growing intervals: run once, reschedule yourself in 0.1 second; run
again, reschedule yourself in 0.2 seconds, then 0.4, 0.8, and so on.
The number of runs then grows only logarithmically, and we're back to
O(n * log(n)).
* In any case, if :filter is not provided, we can use the optimisation
in 1) above. That will not benefit incremental functions, but it
can be an appreciable performance boost for existing code. It does,
however, mean solving the password prompt problem.
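The doubling-interval idea from the first bullet could be sketched as follows, again in Emacs Lisp and again with a hypothetical helper name. REFRESH stands for whatever the incremental function does (for instance `revert-buffer' on the buffer visiting the temporary file, followed by re-parsing); the helper only calls it immediately, then at roughly 0.1, 0.3, 0.7, ... seconds, plus once more after the process has exited so the final output is never missed.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch of the "doubling delay" refresh schedule.
;; `my-dvc-refresh-with-backoff' is a hypothetical helper, not DVC's API.

(defun my-dvc-refresh-with-backoff (proc buffer refresh &optional delay)
  "Call REFRESH in BUFFER, then again after DELAY seconds while PROC runs.
DELAY defaults to 0.1 and doubles after every call, so a process that
runs for T seconds triggers only O(log T) refreshes."
  (let ((delay (or delay 0.1))
        ;; Sample the status *before* refreshing: if PROC was already
        ;; dead, this refresh sees the complete output and we can stop.
        (live (process-live-p proc)))
    (when (buffer-live-p buffer)
      (with-current-buffer buffer
        (funcall refresh))
      ;; Still running: schedule one more round with a doubled delay.
      (when live
        (run-at-time delay nil #'my-dvc-refresh-with-backoff
                     proc buffer refresh (* 2 delay))))))
```

Each refresh still costs O(file-size), but with doubling delays the number of refreshes is logarithmic in the run time, which is where the O(n * log(n)) bound comes from.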
Any thoughts?
--
Matthieu
_______________________________________________
Dvc-dev mailing list
[email protected]
https://mail.gna.org/listinfo/dvc-dev