On 07/05/2021 09:08, Ihor Radchenko wrote:
Maxim Nikulin writes:
Did you just replace gethash by avl-tree?

Yes

Likely my idea is based on a
wrong assumption. I hoped that having positions of headers it is
possible to avoid jumps (goto-char ...) preceded or followed by regexp
matching almost completely. Previous header for arbitrary initial
position can be found using binary search through structure obtained
during scan.

Sorry, I cannot follow what you mean. The call to goto-char in
org-up-heading-safe is required by function docstring - we need to move
point to the parent heading.

I am trying to minimize number of regexp searches. Mostly it is applied when information concerning multiple headings is required (agenda, refile targets). It unlikely will get some benefits during interactive calls related to single heading.

For a file having 3000 headings, scanning though it takes ~0.1sec to get the following properties: position, level, list of parents (with same properties). Note that no expensive operations are performed like cleaning up of heading title.

Having list of headings (and its length), it is possible to build a tree for binary search in linear time. It takes ~0.01sec.

Having the tree, it is possible to instantly find heading for *arbitrary* position in the buffer. Likely the next operation is goto to the heading or to it parent and parsing the line for more detailed properties. The latter is cacheable, structure for heading properties can be expanded.

Hash works only for fixed set of positions, to use hash it is necessary to find position of the heading at first. On the other hand, to take advantage of binary tree, more substantial modification of code is required.

Since there is no operations as insertion or deletion of nodes from tree, no complex code is required to implement balancing rotations. That is why I think that avl-tree is an overkill.

See the attachment for experimental (thus almost untested) code. Likely you will find code style quite ugly. I am not happy with 0.1 sec for a moderately large file. It is close to the limit for comfortable interactive operations.

+                           (re-search-backward
+                             (format "^\\*\\{1,%d\\} " level-up) nil t)
+                           (funcall outline-level))))

Unsure concerning the following optimization from the point of
readability and reliability in respect to future modifications. Outline
level can be derived from the length of matched string without the
funcall requiring extra regexp.

I am not sure here. outline-level also consults outline-heading-alist,
though I found no references to that variable in Org mode code.
Otherwise, outline-level already reuses match-data. There is no regexp
matching there.

Sorry. You are right. The function does what I proposed to write explicitly. For some reason I believed that outline-level calls something like looking-at. Maybe I checked it earlier and completely forgot.

#+begin_src elisp :results none

  ;; Heading properties
  (defun nm-heading-properties-new (position level parents)
    "Heading properties: (position . (level . parent))"
    (cons position (cons level parents)))

  (defun nm-heading-properties-level (props)
    (cadr props))

  (defun nm-heading-properties-pos (props)
    (car props))

  (defun nm-heading-properties-parents (props)
    (cddr props))

  (defun nm-heading-pos-lessp (value props)
    (< value (nm-heading-properties-pos props)))

  (defun nm-buffer-headings-reversed (buffer)
    (with-current-buffer buffer
      (save-restriction
        (save-excursion
          (widen)
          (goto-char (point-min))
          (let ((count 0)
                (headings ())
                (parents ()))
            (while (re-search-forward org-outline-regexp-bol nil t)
              (let* ((pos (match-beginning 0))
                     (level (- (match-end 0) pos 1)))
                (while (and parents (>= (nm-heading-properties-level (car 
parents)) level))
                  (pop parents))
                (setq count (1+ count))
                (let ((props (nm-heading-properties-new pos level parents)))
                  (push props headings)
                  (push props parents))))
            (and headings (cons headings count)))))))


  ;; binary search tree
  (defun nm-btree-new-node ()
    "((left right) . properties"
    (cons (cons nil nil) nil))

  (defun nm-btree-node-left (node)
    (caar node))

  (defun nm-btree-node-set-left (node child)
    (setcar (car node) child))

  (defun nm-btree-node-set-right (node child)
    (setcdr (car node) child))

  (defun nm-btree-node-right (node)
    (cdar node))

  (defun nm-btree-node-properties (node)
    (cdr node))

  (defun nm-btree-node-set-properties (node properties)
    (setcdr node properties))

  (defun nm-btree-from-reversed (scan-result)
    (and
     scan-result
     (let* ((key-properties-list (car scan-result))
            (length (cdr scan-result))
            (head (nm-btree-new-node))
            (queue (list (cons length head)))) ; list of (count . node)
       (while queue
         (let* ((item (pop queue))
                (count (car item))
                (node (cdr item)))
           (cond
            ((eq count 1) ; leaf or only single child
             (nm-btree-node-set-properties node (pop key-properties-list)))
            ((nm-btree-node-right node) ; right children completed
             (nm-btree-node-set-properties node (pop key-properties-list))
             (let ((left-node (nm-btree-new-node)))
               (nm-btree-node-set-left node left-node)
               (push (cons (1- count) left-node) queue)))
            (t (let* ((right-count (/ (car item) 2))
                      (right-node (nm-btree-new-node)))
                 (nm-btree-node-set-right node right-node)
                 (setcar item (- count right-count))
                 (push item queue)
                 (push (cons right-count right-node) queue))))))
       head)))

  (defun nm-btree-find-left (tree value &optional cmp)
    "Find last element not less than value"
    (let ((cmp (or cmp #'nm-heading-pos-lessp))
          (result nil))
      (while tree
        (setq tree (if (funcall cmp value (nm-btree-node-properties tree))
                       (nm-btree-node-left tree)
                     (setq result tree)
                     (nm-btree-node-right tree))))
      (nm-btree-node-properties result)))
#+end_src

#+begin_src elisp
  (byte-compile #'nm-buffer-headings-reversed)
  (byte-compile #'nm-btree-from-reversed)
  (byte-compile #'nm-btree-find-left)

  (let* ((buffer "notes.org")
         (scan-result (nm-buffer-headings-reversed buffer))
         (tree (nm-btree-from-reversed scan-result))
         (lim (with-current-buffer buffer
                (save-restriction
                  (widen)
                  (point-max)))))
    (list
     (append '("scan x10")
             (benchmark-run 10
               (nm-buffer-headings-reversed buffer)))
     (append '("btree x10")
             (benchmark-run 10
               (nm-btree-from-reversed scan-result)))
     (append '("scan+btree x10")
             (benchmark-run 10
               (let* ((scan-result1 (nm-buffer-headings-reversed buffer))
                      (tree1 (nm-btree-from-reversed scan-result1)))
                 tree1)))
     (append '("find random x10 000")
             (benchmark-run 10000
               (nm-btree-find-left tree (random lim))))
     (list "nodes" (cdr scan-result) "" "")))
#+end_src

#+RESULTS:
| scan x10            |   0.8611382689999999 | 0 |                 0.0 |
| btree x10           |  0.07705962400000001 | 1 | 0.05648322199999978 |
| scan+btree x10      |          0.940467238 | 1 | 0.05685373699999996 |
| find random x10 000 | 0.047712096999999995 | 0 |                 0.0 |
| nodes               |                 3413 |   |                     |

Without ~byte-compile~
| scan x10            |  1.2031535999999998 |  4 | 0.22845906700000018 |
| btree x10           |         0.498214241 |  6 | 0.34067464299999894 |
| scan+btree x10      |  1.7026304230000002 | 10 |  0.5686926149999998 |
| find random x10 000 | 0.08789912700000001 |  0 |                 0.0 |
| nodes               |                3413 |    |                     |

Reply via email to