dmg <[email protected]> writes:

> Every time a file is saved and flycheck triggers org-lint, the function
> org-lint-invalid-id-link calls org-id-update-id-locations, which scans all
> org files on disk to rebuild the ID location cache. In my case this means
> scanning 1382 files on every single save, causing noticeable sluggishness.
>
> Looking at the code in org-lint.el (function org-lint-invalid-id-link):
>
>   (let ((id-locations-updated nil))
>     (org-element-map ast 'link
>       (lambda (link)
>         (unless id-locations-updated
>           (org-id-update-id-locations nil t)
>           (setq id-locations-updated t))
>         ...
>           )))
>
> The id-locations-updated flag correctly prevents scanning more than once
> per lint run. However, since it is a local variable, it is reset to nil
> on every call -- meaning a full scan of all org files happens on every
> flycheck-triggered lint run, i.e., on every save of any org file
> containing at least one id: link.

That's on purpose.
The idea is that it is unlikely that the ids get updated *during* the
linting, but they can definitely change (including outside Emacs)
between org-lint runs.

> As far as I understand, org-id-locations is already an in-memory hash table
> cache mapping IDs to files, loaded from org-id-locations-file at startup. The
> function org-id-find-id-file, called immediately after in the same checker,
> already trusts this cache. So org-lint-invalid-id-link is ignoring the cache
> for the update step but relying on it for the lookup step.

org-id-find-id-file indeed trusts the cache. However, its main caller -
`org-id-find', falls back to `org-id-update-id-locations' when
org-id-find-id-file fails.

> I think there are two complementary improvements, and I would like to
> discuss a potential fix, in 2 parts:
>
> 1. In org-lint-invalid-id-link: only call org-id-update-id-locations if
>    org-id-locations is empty or not yet loaded, rather than unconditionally
>    on every lint run:
>
>      (unless id-locations-updated
>        (when (or (not org-id-locations)
>                  (zerop (hash-table-count org-id-locations)))
>          (org-id-update-id-locations nil t))
>        (setq id-locations-updated t))
>
>    This way the scan only happens once per session (when the cache is
>    genuinely empty), not on every save.

That could cause extra false-positives if files are changed on disk.

> 2. Keep org-id-locations incrementally up to date on save: when a file
>    is saved, scan just that file for IDs and call org-id-add-location
>    for each one. It would mean the cache is always fresh without ever
>    needing a full rescan during linting.

That would not work very well on large files.
Saving is something done much more frequently compared to looking up ids.

> I may be missing something -- perhaps there is a reason the full rescan is
> necessary before declaring a link invalid -- so we wanted to raise this here
> before proposing a patch.

We have a relevant optimization that tries not to re-parse everything
every time. Check out `org-id--locations-checksum'. However, in your
specific case, that checksum is changes upon save.
I logical improvement would be not treating change in *one* file as a
trigger for full re-parsing, but do it more granularly in
`org-id-update-id-locations', reusing cache from the files that did not
change.

-- 
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

Reply via email to