Re: profiling latency in large org-mode buffers (under both main & org-fold feature)

Ihor Radchenko Sat, 26 Feb 2022 22:49:51 -0800

Max Nikulin <[email protected]> writes:

>> Max Nikulin writes:
>>> Actually I suspect that markers may have a similar problem during regexp
>>> searches. I am curious if it is possible to invoke a kind of "vacuum"
>>> (in SQL parlance). Folding all headings and resetting refile cache does
>>> not restore performance to the initial state at session startup. Maybe
>>> it is effect of incremental searches.
>> 
>> I doubt that markers have anything to do with regexp search itself
>> (directly). They should only come into play when editing text in buffer,
>> where their performance is also O(N_markers).
>
> I believed you confirmed my conclusion earlier:
>
> Ihor Radchenko. Re: [BUG] org-goto slows down org-set-property.
> Sun, 11 Jul 2021 19:49:08 +0800.
> https://list.orgmode.org/orgmode/87lf6dul3f.fsf@localhost/


I confirmed that invoking org-refile-get-targets slows down your nm-tst
looping over the headlines.

However, the issue is not with outline-next-heading there. Profiling
shows that the slowdown mostly happens in org-get-property-block.

I had looked into the regexp search C source and did not find anything
that could depend on the number of markers in the buffer.
After further analysis now (after your email), I found that I may be
wrong and regexp search might actually be affected.

Now, I did an extended profiling of what is happening using perf:

;; perf cpu with refile cache (using your previous code on my largest Org buffer)
    19.68%   [.] mark_object
     6.20%   [.] buf_bytepos_to_charpos
     5.66%   [.] re_match_2_internal
     5.33%   [.] exec_byte_code
     5.07%   [.] rpl_re_search_2
     3.09%   [.] Fmemq
     2.56%   [.] allocate_vectorlike
     1.86%   [.] sweep_vectors
     1.47%   [.] mark_objects
     1.45%   [.] pdumper_marked_p_impl

;; perf cpu without refile cache (removing getting refile targets from the code)
    18.79%   [.] mark_object
     8.23%   [.] re_match_2_internal
     5.88%   [.] rpl_re_search_2
     4.06%   [.] buf_bytepos_to_charpos
     3.06%   [.] Fmemq
     2.45%   [.] allocate_vectorlike
     1.63%   [.] exec_byte_code
     1.50%   [.] pdumper_marked_p_impl

The bottleneck appears to be buf_bytepos_to_charpos, called by the
BYTE_TO_CHAR macro, which, in turn, is used by set_search_regs.

buf_bytepos_to_charpos contains the following loop:

  for (tail = BUF_MARKERS (b); tail; tail = tail->next)
    {
      CONSIDER (tail->bytepos, tail->charpos);

      /* If we are down to a range of 50 chars,
         don't bother checking any other markers;
         scan the intervening chars directly now.  */
      if (best_above - bytepos < distance
          || bytepos - best_below < distance)
        break;
      else
        distance += BYTECHAR_DISTANCE_INCREMENT;
    }

I am not sure if I understand the code correctly, but the cost of that
loop clearly scales with the number of markers.
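
To illustrate the scaling, here is a minimal standalone sketch of the same
loop shape. It is not the Emacs code: struct toy_marker, toy_link_markers
and toy_scan_markers are hypothetical names, only byte positions are
tracked (no charpos), and the increment is passed in as a parameter rather
than taken from BYTECHAR_DISTANCE_INCREMENT. The function returns how many
markers were visited before the early exit fired.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer marker; the real Emacs marker
   structure carries more fields (charpos, buffer, ...).  */
struct toy_marker {
    ptrdiff_t bytepos;
    struct toy_marker *next;
};

/* Chain the first N elements of ARR into a singly linked list and
   return its head (NULL when N is 0).  */
struct toy_marker *toy_link_markers(struct toy_marker *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        arr[i].next = (i + 1 < n) ? &arr[i + 1] : NULL;
    return n ? &arr[0] : NULL;
}

/* Walk the marker list looking for known positions that bracket
   BYTEPOS.  The acceptable distance grows by INCREMENT after every
   marker that fails the check, as in the quoted loop.
   Returns the number of markers visited.  */
size_t toy_scan_markers(const struct toy_marker *head, ptrdiff_t bytepos,
                        ptrdiff_t buf_end, ptrdiff_t increment)
{
    ptrdiff_t best_below = 0;        /* buffer start is always known */
    ptrdiff_t best_above = buf_end;  /* buffer end is always known */
    ptrdiff_t distance = increment;
    size_t visited = 0;

    for (const struct toy_marker *tail = head; tail; tail = tail->next) {
        visited++;

        /* Tighten the bracket if this marker is closer to BYTEPOS.  */
        if (tail->bytepos <= bytepos && tail->bytepos > best_below)
            best_below = tail->bytepos;
        else if (tail->bytepos > bytepos && tail->bytepos < best_above)
            best_above = tail->bytepos;

        /* Close enough on either side: stop consulting markers.  */
        if (best_above - bytepos < distance
            || bytepos - best_below < distance)
            break;
        distance += increment;
    }
    return visited;
}
```

In this sketch, if every marker is far from the target position, the walk
visits the whole list, so 10x more markers means 10x more iterations per
call. A single marker close to the target ends the walk at once. The
growing distance also caps the walk in proportion to how far the target is
from the nearest known position, so the effect should be most visible in
large buffers.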

Finally, FYI: I plan to work on an alternative mechanism to access Org
headings, a generic Org query library. It will not use markers and will
implement ideas from org-ql. org-refile will eventually use that generic
library instead of the current mechanism.

Best,
Ihor
