On 12 May 2015 4:20 am, "Magnus Ihse Bursie" <[email protected]> wrote:
>
> Hi,
>
> When viewing large (> 1 MB) files with Meld, it takes a noticeable amount
> of time for the diff to show up. (My current example is a 2 MB file which,
> on my machine, takes ~8 seconds.)
>
> I have always naively assumed that it was the underlying diff algorithm
> that was slow, perhaps due to (my perceived) slowness of Python. This
> turned out to be completely incorrect. When I created a unit test to just
> read the files into arrays and feed them directly to MyersSequenceMatcher,
> it was so blazingly fast I couldn't even get a reliable measurement on it.
>
> So it's something else that causes the delays.
>
> I tried running 'python -m cProfile -s time bin/meld' but this only
> gives the following unhelpful result:
>
>          457930 function calls (454080 primitive calls) in 9.639 seconds
>
>    Ordered by: internal time
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    7.278    7.278    9.639    9.639 meld:19(<module>)
>       476    0.227    0.000    0.348    0.001 diffgrid.py:232(do_draw)
>       976    0.209    0.000    0.210    0.000 Gtk.py:676(insert)
>      6647    0.174    0.000    0.188    0.000 meldbuffer.py:57(do_apply_tag)
>        42    0.143    0.003    0.293    0.007 meldbuffer.py:212(__getitem__)
>         3    0.139    0.046    0.149    0.050 gnomeglade.py:39(__init__)
>        80    0.119    0.001    0.119    0.001 {method 'splitlines' of 'unicode' objects}
>      3540    0.098    0.000    0.321    0.000 diffgrid.py:182(child_allocate)
>         2    0.085    0.042    0.096    0.048 matchers.py:128(index_matching)
>      8496    0.080    0.000    0.140    0.000 diffgrid.py:168(get_child_prop_int)
>       236    0.078    0.000    0.114    0.000 diffgrid.py:214(_get_min_sizes)
> <cut>
>
> So the complete run took more than 9 seconds, of which 7 were spent in
> "meld:19". The rest of the calls contribute negligible time and are
> clearly not the culprit here. (Note that the timing includes a split second
> for me to close the Meld window after the diff has finished rendering.)

You can add a low priority sys.exit call to the meld task queue to help
automate what you're doing.
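
As a rough sketch of that idea: the helper below queues a one-shot quit
callback at low priority. It is written against a GLib-style idle scheduler
rather than Meld's own task queue, so the wiring shown in the trailing
comment (GLib.idle_add, app.quit, GLib.PRIORITY_LOW) is an assumption about
how you would hook it up, and schedule_quit is a hypothetical name. It also
assumes the work you care about runs at a higher priority than the quit.

```python
def schedule_quit(add_idle, quit_app, priority):
    """Queue a one-shot callback that quits once higher-priority work is done.

    add_idle is any GLib.idle_add-style function accepting a callback and a
    'priority' keyword; quit_app is a zero-argument callable.
    """
    def _quit_once():
        quit_app()
        # GLib convention: returning False removes the idle source,
        # so the callback runs only once.
        return False

    return add_idle(_quit_once, priority=priority)


# Hypothetical wiring with PyGObject (not taken from Meld's code):
#     from gi.repository import GLib
#     schedule_quit(GLib.idle_add, app.quit, GLib.PRIORITY_LOW)
```

With that in place the profiled run no longer includes the time spent
closing the window by hand.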

> I assume that "meld" refers to the bin/meld.py script. Line 19 is
> obviously bogus (that's the first non-comment line, an import statement). I
> guess the problem here is
>     status = meld.meldapp.app.run(sys.argv)
> and that cProfile and Gtk.Application do not play well together.

Yes. My guess is that you're seeing stuff dispatched from the GObject main
loop, though I have never figured out the rules for what ends up in the
Python profiling data.
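
One possible workaround is to profile the suspect callbacks directly
instead of the whole process. The sketch below uses only the standard
library (cProfile, pstats); the names profiled and report are made up for
illustration, and which Meld functions to decorate is left open. Whether
this captures everything dispatched from the main loop is not something I
can promise.

```python
import cProfile
import functools
import io
import pstats

# One shared profiler so that data from many callbacks accumulates.
_profiler = cProfile.Profile()
_active = False


def profiled(func):
    """Record profile data for each outermost call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        global _active
        if _active:
            # Already profiling further up the stack; don't re-enable.
            return func(*args, **kwargs)
        _active = True
        _profiler.enable()
        try:
            return func(*args, **kwargs)
        finally:
            _profiler.disable()
            _active = False
    return wrapper


def report(limit=10):
    """Return the accumulated stats as text, sorted by cumulative time."""
    stream = io.StringIO()
    stats = pstats.Stats(_profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()
```

Decorate a handful of candidate functions, run the comparison, and print
report() on exit (after at least one decorated call has run).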

> I have tried googling how to profile a Gtk.Application-based Python
> program, but ended up with no usable results.
>
> Has anyone here tried profiling Meld before? If so, how did you do it?

I've done a lot of profiling previously, but mostly pre-GTK3, and while
some stuff shows up in the profiles, much does not.

I'm going to suggest two likely sources of slowness from previous experience.

Firstly, inserting text into an editable text buffer validates the text as
UTF-8 on every insert, which can end up being slow for large files. You can
check the file loading code and make it unbuffered to see whether this is
what you're hitting. Also, make the buffer non-editable and see whether
that helps.
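
To measure this outside the profiler, something like the following sketch
could time a whole-file insert against chunked inserts. It takes the insert
function as a parameter, so it makes no claim about how Meld's loader
actually feeds the buffer; time_inserts and the chunk size are illustrative
only.

```python
import time


def time_inserts(insert, text, chunk_size=None):
    """Return the seconds spent passing text to insert().

    With chunk_size=None the text is inserted in one call; otherwise it is
    split into pieces of at most chunk_size characters. Splitting a str by
    index works on code points, so no UTF-8 sequence is cut in half.
    """
    if chunk_size is None:
        chunks = [text]
    else:
        chunks = [text[i:i + chunk_size]
                  for i in range(0, len(text), chunk_size)]

    start = time.perf_counter()
    for chunk in chunks:
        insert(chunk)
    return time.perf_counter() - start


# Hypothetical use with a Gtk.TextBuffer called buf:
#     time_inserts(lambda s: buf.insert(buf.get_end_iter(), s), text, 4096)
```

Comparing the two timings, with the view editable and then non-editable,
should show whether the insert path is where the seconds go.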

Secondly, the initial diff is typically fairly fast. The slowness usually
comes from the (often very many) inline highlighting comparisons, which are
threaded or multiprocessed, depending. You can play with making that
single-threaded (something I've toyed with as a comparison), which generally
improves the initial responsiveness at the cost of overall performance.
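
For a feel of that trade-off without touching Meld, here is a small
stand-alone sketch that runs many per-line comparisons either serially or
on a thread pool. It uses difflib.SequenceMatcher as a stand-in, not Meld's
matcher classes, and the function names and worker count are arbitrary.

```python
import difflib
import time
from concurrent.futures import ThreadPoolExecutor


def inline_opcodes(pair):
    """Compare two strings character by character and return the opcodes."""
    a, b = pair
    return difflib.SequenceMatcher(None, a, b).get_opcodes()


def run_serial(pairs):
    """Run every comparison in the calling thread."""
    return [inline_opcodes(pair) for pair in pairs]


def run_threaded(pairs, workers=4):
    """Run the comparisons on a thread pool, keeping the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(inline_opcodes, pairs))


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call to func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling timed(run_serial, pairs) and timed(run_threaded, pairs) on the
changed line pairs from a large file gives two numbers to compare; the
results themselves should be identical.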

As I said, I haven't looked at this much since the GTK3 port, so I'd be
interested to know if you find anything.

Cheers,
Kai
_______________________________________________
meld-list mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/meld-list
