Re: [DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS)

2024-05-05 Thread Ihor Radchenko
Ihor Radchenko  writes:

>> The only additional consideration is that the compare function should be
>> configurable. If a user accesses the same files from Linux and macOS, it
>> may be really annoying to get a different order of entries in the agenda.
>> For most Linux users it is better to use the smarter
>> `string-collate-lessp'. Some care is required to sort entries obtained
>> from multiple buffers in a predictable environment (locale, case
>> conversion table).
>
> I agree. We can introduce a new customization -
> `org-string-sort-function' that will be used across Org mode to sort
> user text.

See the attached tentative patch.
I added a customization, made everything in Org obey it, and provided
some default options for MacOS users.
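
To make the difference between the sorting strategies concrete, here is
a minimal sketch (not part of the patch) that sorts one sample list in
three ways. The `my/' names and the sample data are hypothetical. The
result of the locale-aware variant is deliberately not stated, because
it depends on the system locale.

```elisp
;; Hypothetical sample data, for illustration only.
(defvar my/sample-headings '("apple" "Banana" "cherry"))

;; 1. Plain codepoint comparison: uppercase letters sort before
;;    lowercase ones, so "Banana" comes first.
(defvar my/sorted-plain
  (sort (copy-sequence my/sample-headings) #'string-lessp))
;; my/sorted-plain => ("Banana" "apple" "cherry")

;; 2. Case-insensitive comparison via `downcase'.
(defvar my/sorted-downcase
  (sort (copy-sequence my/sample-headings)
        (lambda (a b) (string-lessp (downcase a) (downcase b)))))
;; my/sorted-downcase => ("apple" "Banana" "cherry")

;; 3. Locale-aware comparison, ignoring case.  The order depends on
;;    the collation rules of the operating system.
(defvar my/sorted-collate
  (sort (copy-sequence my/sample-headings)
        (lambda (a b) (string-collate-lessp a b nil t))))
```

Variants 1 and 2 give the same order on every system, which is the
property that matters when the same files are used from several
machines.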

From dbc3929d8c7a26da3bf31fb52a651da68d1f733b Mon Sep 17 00:00:00 2001
Message-ID: 
From: Ihor Radchenko 
Date: Sun, 5 May 2024 14:37:52 +0300
Subject: [PATCH] org: New Org-wide custom option `org-sort-function'

* lisp/org-macs.el (org-sort-function): New customization defining how
Org mode should sort headlines, table lines, agenda lines, etc.
(org-string<):
(org-string<=):
(org-string>=):
(org-string>): Use the new customization.
(org-string<>): Add docstring.
(org-sort-function-downcase): New helper function to help users on
MacOS where `string-collate-lessp' is not reliable.
* lisp/oc-basic.el (org-cite-basic--field-less-p):
* lisp/org-agenda.el (org-cmp-category):
(org-cmp-alpha):
* lisp/org-list.el (org-sort-list):
* lisp/org-mouse.el (org-mouse-list-options-menu):
* lisp/org-table.el (org-table-sort-lines):
* lisp/org.el (org-tags-sort-function):
(org-sort-entries):
* lisp/ox-publish.el (org-publish-sitemap): Honor the new
customization.
* lisp/org-mouse.el (org-mouse-tag-menu):
(org-mouse-popup-global-menu):
* lisp/org-agenda.el (org-cmp-tag): Honor `org-tags-sort-function',
falling back to `org-string<' if not set.
* etc/ORG-NEWS (New option controlling how Org mode sorts things
~org-sort-function~): Announce the change.

This change aims to standardize the way Org mode performs sorting of
user data.  In particular, it addresses issues with oddities of string
collation rules on MacOS and tricky language environments like
Turkish.

Link: https://orgmode.org/list/87jzleptcs.fsf@localhost
---
 etc/ORG-NEWS   | 20 ++
 lisp/oc-basic.el   |  2 +-
 lisp/org-agenda.el | 12 -
 lisp/org-list.el   |  2 +-
 lisp/org-macs.el   | 66 +-
 lisp/org-mouse.el  | 13 +
 lisp/org-table.el  |  4 +--
 lisp/org.el        |  6 ++---
 lisp/ox-publish.el |  9 +++
 9 files changed, 98 insertions(+), 36 deletions(-)

diff --git a/etc/ORG-NEWS b/etc/ORG-NEWS
index 3c597db40..af88febb1 100644
--- a/etc/ORG-NEWS
+++ b/etc/ORG-NEWS
@@ -710,6 +710,26 @@ any more.  Run ~org-ctags-enable~ to setup hooks and advices:
 #+end_src
 
 ** New and changed options
+*** New option controlling how Org mode sorts things ~org-sort-function~
+
+Sorting of agenda items, tables, menus, headlines, etc. can now be
+controlled using a new custom option ~org-sort-function~.
+
+By default, Org mode sorts things according to the operating system
+language.  However, language sorting rules may or may not produce good
+results depending on the use case.  For example, multi-language
+documents may be sorted oddly when the sorting rules of the system
+language are applied to text written in a different language.  Also,
+some operating systems (e.g. MacOS) do not provide accurate string
+sorting rules.
+
+Org mode provides 4 possible values for ~org-sort-function~:
+1. (default) Sort using system language rules.
+2. Sort using dumb string comparison. It is the most reliable option.
+3. Sort case-insensitively, making use of UTF case conversion.  This
+   may work better for mixed-language documents and on MacOS.
+4. Custom function, if the above does not fit the needs.
+
 *** =ob-latex= now uses a new option ~org-babel-latex-process-alist~ to generate png output
 
 Previously, =ob-latex= used ~org-preview-latex-default-process~ from
diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el
index 8959bb065..6e3142fa1 100644
--- a/lisp/oc-basic.el
+++ b/lisp/oc-basic.el
@@ -680,7 +680,7 @@ (defun org-cite-basic--field-less-p (field info)
 INFO is the export state, as a property list."
   (and field
        (lambda (a b)
-         (string-collate-lessp
+         (org-string<
           (org-cite-basic--get-field field a info 'raw)
           (org-cite-basic--get-field field b info 'raw)
           nil t
diff --git a/lisp/org-agenda.el b/lisp/org-agenda.el
index 93c6acef2..05d2f94c0 100644
--- a/lisp/org-agenda.el
+++ b/lisp/org-agenda.el
@@ -7489,8 +7489,8 @@ (defsubst org-cmp-category (a b)
   "Compare the string values of categories of strings A and B."
   (let ((ca (or (get-text-property (1- (length a)) 'org-category a) ""))
 	(cb (or (get-text-property (1- (length b)) 'org-category b) "")))
-(cond ((string-lessp ca cb) -1)
-	  

[DISCUSSION] Sorting strings in Org mode vs. system locale (was: test-org-table/sort-lines: Failing test on macOS)

2024-04-03 Thread Ihor Radchenko
Max Nikulin  writes:

>> This sounds like something to be adapted to Emacs upstream.
>> I suggested to change `string-collate-lessp' fallback behaviour to use
>> `downcase' when IGNORE-CASE is non-nil. See my last message in
>> bug#59275.
>
> I do not share Eli's position "all or nothing". I prefer graceful 
> degradation and best result achievable with reasonable efforts.

> However either the reason is performance or correctness, both variants 
> are against modification of `string-collate-lessp'. I still think that 
> Org will benefit from a compatibility wrapper with `downcase'.

Unless we have user complaints with real-world use-cases, I am leaning
towards keeping things consistent with Emacs. Including Emacs-wide
fallback for `string-collate-lessp'. This will make our life easier.

Maintaining an Org-specific fallback (1) will cost maintenance time; (2)
may confuse users used to global Emacs behaviour; (3) has no clear
benefit beyond our theoretical discussion.

> The only additional consideration is that the compare function should be
> configurable. If a user accesses the same files from Linux and macOS, it
> may be really annoying to get a different order of entries in the agenda.
> For most Linux users it is better to use the smarter
> `string-collate-lessp'. Some care is required to sort entries obtained
> from multiple buffers in a predictable environment (locale, case
> conversion table).

I agree. We can introduce a new customization -
`org-string-sort-function' that will be used across Org mode to sort
user text.
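
A minimal sketch of what such a customization could look like, under
stated assumptions: every `my/' name below is hypothetical and stands in
for whatever Org would actually define, and the comparison function is
assumed to take the same arguments as `string-collate-lessp'
(A B &optional LOCALE IGNORE-CASE).

```elisp
;; Hypothetical stand-in for the proposed option.  Defaults to the
;; locale-aware comparison that Emacs provides.
(defvar my/string-sort-function #'string-collate-lessp
  "Function used to compare two strings when sorting user text.
Called with arguments A, B and optional LOCALE and IGNORE-CASE.")

;; Hypothetical locale-independent alternative: ignores LOCALE and
;; implements IGNORE-CASE through `downcase'.
(defun my/string-lessp-downcase (a b &optional _locale ignore-case)
  "Return non-nil if string A sorts before string B.
When IGNORE-CASE is non-nil, compare the downcased strings."
  (if ignore-case
      (string-lessp (downcase a) (downcase b))
    (string-lessp a b)))

;; Single entry point that all sorting code would call.
(defun my/string< (a b &optional locale ignore-case)
  "Compare A and B using `my/string-sort-function'."
  (funcall my/string-sort-function a b locale ignore-case))
```

With this layout a user who shares files between Linux and macOS could
bind `my/string-sort-function' to `my/string-lessp-downcase' and get the
same order on both systems, while everyone else keeps the locale-aware
default.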

It would be even better to allow a smart sort function that depends on
the document's #+LANGUAGE, but I do not see an easy way to implement such
a feature - `string-collate-lessp' does accept a LOCALE argument, but I
have no idea how to link #+LANGUAGE to a locale deterministically.
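
To illustrate where the difficulty lies, here is a rough sketch that
passes a LOCALE to `string-collate-lessp' based on a language code. The
`my/' names and the mapping table are hypothetical. The table is a
guess: one language can map to several locales, and a listed locale may
not be installed on a given system, which is exactly the part that is
not deterministic.

```elisp
;; Hypothetical, hand-written mapping from language codes to locale
;; names.  Nothing guarantees these locales exist on the system.
(defvar my/language-locale-alist
  '(("en" . "en_US.UTF-8")
    ("de" . "de_DE.UTF-8")
    ("tr" . "tr_TR.UTF-8"))
  "Alist mapping language codes to locale names.")

(defun my/locale-for-language (language)
  "Return the locale name guessed for LANGUAGE, or nil if unknown."
  (cdr (assoc language my/language-locale-alist)))

(defun my/string-lessp-for-language (a b language)
  "Return non-nil if A sorts before B under the rules for LANGUAGE.
Use locale-aware, case-insensitive collation when a locale is known
for LANGUAGE; otherwise fall back to plain `string-lessp'."
  (let ((locale (my/locale-for-language language)))
    (if locale
        (string-collate-lessp a b locale t)
      (string-lessp a b))))
```

Even with such a table the result still depends on which locales the
operating system provides, so two machines can disagree on the order
for the same document.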

-- 
Ihor Radchenko // yantar92,
Org mode contributor