Re: [PATCH v3] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-13 Thread Ihor Radchenko
Max Nikulin  writes:

>> I do not think that it is a good idea. We currently promise to use the
>> first function in the list to generate the IDs. Other functions are
>> mostly a fallback to catch the existing attach dirs created in the past
>> using different ID scheme.
>
> My idea is to add an extra check that filters out nil values. I do not 
> mind to signal a user error if first entry from 
> `org-attach-id-to-path-function-list' returns nil and paths returned by 
> other function do not exist.

This would be possible, in addition to the patch above.
Or do you mean the default ts/uuid format functions to return nil to
short IDs as well?

>> I guess that one option could be demanding a non-nil return value when
>> generating a new attach dir (without TRY-ALL argument) and wrapping
>> things into (ignore-errors ...) otherwise.
>
> I am against `ignore-errors' because it makes harder bug hunting using 
> debug on errors.

Agreed.

> I would prefer to force users to justify strategy for non-standard IDs 
> as early as possible. The goal is to avoid manual actions to change 
> directory structure when users have hundreds of folders. My experience 
> that users may easily face performance issues with ~1000 files per 
> directory. The reason is either lack of optimization for such case in 
> particular application or a bug introduced in a new version.

I am not sure if we can solve this kind of issue reliably. The only idea
is enforcing [1-9][0-9]\{5\} for ts IDs. And we cannot do much about IDs
prefixed with the same constant string.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92



Re: [PATCH v3] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-13 Thread Max Nikulin

On 13/08/2022 12:29, Ihor Radchenko wrote:

Max Nikulin writes:


I believe that interaction between `org-attach-dir-from-id' and members
of `org-attach-id-to-path-function-list' could be improved. If a too
short ID is passed to `org-attach-id-uuid-folder-format',
`org-attach-id-ts-folder-format', or a user-defined function, they may
return nil and `org-attach-dir-from-id' should try next element from the
list. If nothing succeeds then a user error should be signaled demanding
to implement a strategy to properly deal with peculiar IDs.


I do not think that it is a good idea. We currently promise to use the
first function in the list to generate the IDs. Other functions are
mostly a fallback to catch the existing attach dirs created in the past
using different ID scheme.


My idea is to add an extra check that filters out nil values. I do not 
mind to signal a user error if first entry from 
`org-attach-id-to-path-function-list' returns nil and paths returned by 
other function do not exist.



I guess that one option could be demanding a non-nil return value when
generating a new attach dir (without TRY-ALL argument) and wrapping
things into (ignore-errors ...) otherwise.


I am against `ignore-errors' because it makes harder bug hunting using 
debug on errors.


I would prefer to force users to justify strategy for non-standard IDs 
as early as possible. The goal is to avoid manual actions to change 
directory structure when users have hundreds of folders. My experience 
that users may easily face performance issues with ~1000 files per 
directory. The reason is either lack of optimization for such case in 
particular application or a bug introduced in a new version.





Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-13 Thread Max Nikulin

On 12/08/2022 23:08, Janek F wrote:

I use org-roam which creates links between org-files using the ID, and I
would like to be able to manually add such links to common pages even if
autocompletion doesn't work smoothly.


Unfortunately I have not tried org-roam yet. My suggestion is to add 
CUSTOM_ID property in addition to ID. I admit that CUSTOM_ID may be 
inconvenient if a heading is moved from one file to another. Are there 
more problems with CUSTOM_ID links?



Considering the other messages, let's not get lost in details, the
default function does not need to consider all edge cases. It simply
should not error out on ids in a different format.


Janek, what is your expectation concerning directory structure for 
attachments in the case of customized ID values?


You have not provided details concerning your configuration, but perhaps 
your issue may be solved by customizing 
`org-attach-id-to-path-function-list': keep 
`org-attach-id-uuid-folder-format' and drop 
`org-attach-id-ts-folder-format'. It should help unless you have single 
character IDs.


I do not think Org behavior is ideal in your case. On the other hand the 
code was written with assumption that either UUID or timestamp-based IDs 
are used. It is violated by handcrafted IDs. I do not know the full 
story, but I guess, `org-attach-id-ts-folder-format' is present in 
default configuration for backward compatibility reasons.


org-attach-id-ts-folder-format: Args out of range: "ftt", 0, 6

is an obscure error. However Org needs some hint from user how to handle 
such case.




Re: [PATCH v3] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-12 Thread Ihor Radchenko
Max Nikulin  writes:

> was a really bad one. It leads to a separate directory for each short 
> ID. However I still believe that the purpose of 
> `org-attach-id-ts-folder-format' is avoid concentration of huge number 
> of file in a single directory. Distribution of attachments over 
> subdirectories is not perfectly even but it still works reasonably well 
> for long-lasting projects while IDs follow assumed format and month is 
> included into directory names.

This is only true for uuid-style. Also, this is likely not the problem
we need to worry about. To be a problem, the number of attach dirs
should be really large, which is only relevant when thousands of attach
dirs are cramped into a single parent directory. Thousands of attach
dirs would probably correspond to tens of thousands of headings, which
is not something we commonly have and, honestly, Org will have much more
serious issues when the number of headings is that large.

If someone ends up complaining about too many attach dirs, we can always
ask to adjust the ID naming scheme.

The previous paragraph appears to be consistent with

> Back to the original problem. First of all, I believe that if a user 
> decided to use non-standard IDs then it is their responsibility to 
> customize `org-attach-id-to-path-function-list' and to provide a 
> function that implements alternative layout of attachment files.
> Depending on expected amount, they may be put into single directory, 
> spread using some hash function, or accordingly to project structure. So 
> today I am against default fallback directories especially in 
> `org-attach-id-ts-folder-format'.

---

> I believe that interaction between `org-attach-dir-from-id' and members 
> of `org-attach-id-to-path-function-list' could be improved. If a too 
> short ID is passed to `org-attach-id-uuid-folder-format', 
> `org-attach-id-ts-folder-format', or a user-defined function, they may 
> return nil and `org-attach-dir-from-id' should try next element from the 
> list. If nothing succeeds then a user error should be signaled demanding 
> to implement a strategy to properly deal with peculiar IDs.

I do not think that it is a good idea. We currently promise to use the
first function in the list to generate the IDs. Other functions are
mostly a fallback to catch the existing attach dirs created in the past
using different ID scheme.

I guess that one option could be demanding a non-nil return value when
generating a new attach dir (without TRY-ALL argument) and wrapping
things into (ignore-errors ...) otherwise.

> There should be no problem to put single character IDs into a common 
> directory (UUID case), but there are enough variants for 5 characters 
> long names to create delayed performance problem. The worst case is when 
> a user decides to use some common prefix, e.g. "id-" before UUID and all 
> files go to the same directory. On the other hand I do not think that 
> code should detect such cases.

We should not worry about this. It will be the user responsibility to
create the relevant custom function.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92



Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-12 Thread Ihor Radchenko
Janek F  writes:

> Considering the other messages, let's not get lost in details, the default 
> function does not need to consider all edge cases. It simply should not error 
> out on ids in a different format.

You are indeed right, but we may use multiple different approaches to
achieve this. Some are more simplistic, some are less. We are trying to
find a balance between less hassle for users and not overthinking
things.

The simplest approach would be wrapping the function calls into
(ignore-errors ...), which would avoid errors when we just try to check
the existence of the attachment dir. However, it may raise unexpected errors
for non-standard/copy-pasted IDs for users who did not customize this
area.

The approach I am proposing is handling unexpected IDs inside the
default functions and bringing errors in user-defined functions to the
attention of user - I feel that it will be more useful than ignoring
errors.

We could use yet another approach and use a global fallback if no
available function can generate an ID, but we may run into tricky cases
because the first function in the list has a special meaning - it _must_
generate a path from ID or the user may run into unexpected breakage in
some cases when custom IDs are being used.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92



Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-12 Thread Janek F
I use org-roam which creates links between org-files using the ID, and I would 
like to be able to manually add such links to common pages even if 
autocompletion doesn't work smoothly.

Considering the other messages, let's not get lost in details, the default 
function does not need to consider all edge cases. It simply should not error 
out on ids in a different format.

--- Original Message ---
Max Nikulin  schrieb am Freitag, 12. August 2022 um 17:35:


> On 21/07/2022 02:12, Janek F wrote:
>
> > However I tend to customize IDs for important files by hand,
> > causing any attempt to use org-attach on that file to fail
> > if the ID is shorter than six characters:
> >
> > org-attach-id-ts-folder-format: Args out of range: "ftt", 0, 6
> >
> > This method should be adjusted to handle non-ts-ids just as well,
> > as org-id-method does not dictate the format of existing ids.
>
>
> There are should be no such restriction for CUSTOM_ID links (referenced
> using #anchor). IDs used for attachments assume particular format and
> consistent layout of files. Janek, could you, please, provide some
> details concerning your use case. I am interested why you decided to
> avoid CUSTOM_ID for headings important to you.
>
> As I wrote yesterday, you can customize Org to use directories for
> attachments you like.



Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-12 Thread Max Nikulin

On 21/07/2022 02:12, Janek F wrote:


However I tend to customize IDs for important files by hand,
causing any attempt to use org-attach on that file to fail
if the ID is shorter than six characters:

     org-attach-id-ts-folder-format: Args out of range: "ftt", 0, 6

This method should be adjusted to handle non-ts-ids just as well,
as org-id-method does not dictate the format of existing ids.


There are should be no such restriction for CUSTOM_ID links (referenced 
using #anchor). IDs used for attachments assume particular format and 
consistent layout of files. Janek, could you, please, provide some 
details concerning your use case. I am interested why you decided to 
avoid CUSTOM_ID for headings important to you.


As I wrote yesterday, you can customize Org to use directories for 
attachments you like.





Re: [PATCH v3] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-11 Thread Max Nikulin

On 11/08/2022 11:19, Ihor Radchenko wrote:

Max Nikulin writes:


I slightly dislike the "___xx" compared to "__" because it will
create a proliferation of top-level folders as opposed to cramping the
non-standard IDs into a single "__" folder.


I believed that proliferation of folders is for purpose. Intermediate
directories allows to avoid excessive number of files in single
directory. ext4 with directory tree index usually is not the case, but
other filesystems may have rather poor performance when too much files
are stuffed into single folder. Some applications become really slow for
huge directories.


I was referring to TS style timestamp resolver here. It is designed to
group directories by creation time, not to distribute them homogeneously.


My bad, I have realizes that my idea of mapping

"x" -> "__x/x"
"xy" -> "_xy/xy"

was a really bad one. It leads to a separate directory for each short 
ID. However I still believe that the purpose of 
`org-attach-id-ts-folder-format' is avoid concentration of huge number 
of file in a single directory. Distribution of attachments over 
subdirectories is not perfectly even but it still works reasonably well 
for long-lasting projects while IDs follow assumed format and month is 
included into directory names.


Back to the original problem. First of all, I believe that if a user 
decided to use non-standard IDs then it is their responsibility to 
customize `org-attach-id-to-path-function-list' and to provide a 
function that implements alternative layout of attachment files. 
Depending on expected amount, they may be put into single directory, 
spread using some hash function, or accordingly to project structure. So 
today I am against default fallback directories especially in 
`org-attach-id-ts-folder-format'.


I believe that interaction between `org-attach-dir-from-id' and members 
of `org-attach-id-to-path-function-list' could be improved. If a too 
short ID is passed to `org-attach-id-uuid-folder-format', 
`org-attach-id-ts-folder-format', or a user-defined function, they may 
return nil and `org-attach-dir-from-id' should try next element from the 
list. If nothing succeeds then a user error should be signaled demanding 
to implement a strategy to properly deal with peculiar IDs.


There should be no problem to put single character IDs into a common 
directory (UUID case), but there are enough variants for 5 characters 
long names to create delayed performance problem. The worst case is when 
a user decides to use some common prefix, e.g. "id-" before UUID and all 
files go to the same directory. On the other hand I do not think that 
code should detect such cases.




Re: [PATCH v3] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-10 Thread Ihor Radchenko
Max Nikulin  writes:

>>> Please, do not use path components starting with dash, it is terrible
>>> for CLI tools. By the way, you promised underscores, not dashes.
>> 
>> Why?
>
> It is inconvenient in interactive sessions (in scripts appropriate 
> measures must be taken anyway), the following does not list content of 
> the "--" directory:
>
>  ls -l --

Thanks for the clarification.

>> I slightly dislike the "___xx" compared to "__" because it will
>> create a proliferation of top-level folders as opposed to cramping the
>> non-standard IDs into a single "__" folder.
>
> I believed that proliferation of folders is for purpose. Intermediate 
> directories allows to avoid excessive number of files in single 
> directory. ext4 with directory tree index usually is not the case, but 
> other filesystems may have rather poor performance when too much files 
> are stuffed into single folder. Some applications become really slow for 
> huge directories.

I was referring to TS style timestamp resolver here. It is designed to
group directories by creation time, not to distribute them homogeneously.

> On the other hand there are not so many distinct file names that may 
> fall into "__" directory, so perhaps it is OK.

Yes. Such short IDs are not supposed to be common and even if they are,
the maximum possible number of such IDs is relatively small. 


-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92



Re: [PATCH v3] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-10 Thread Max Nikulin

On 10/08/2022 20:23, Ihor Radchenko wrote:

Max Nikulin writes:


+the path."
+  (if (< (length id) 3)
+  (format "--/%s" id)


Please, do not use path components starting with dash, it is terrible
for CLI tools. By the way, you promised underscores, not dashes.


Why?


It is inconvenient in interactive sessions (in scripts appropriate 
measures must be taken anyway), the following does not list content of 
the "--" directory:


ls -l --


I slightly dislike the "___xx" compared to "__" because it will
create a proliferation of top-level folders as opposed to cramping the
non-standard IDs into a single "__" folder.


I believed that proliferation of folders is for purpose. Intermediate 
directories allows to avoid excessive number of files in single 
directory. ext4 with directory tree index usually is not the case, but 
other filesystems may have rather poor performance when too much files 
are stuffed into single folder. Some applications become really slow for 
huge directories.


On the other hand there are not so many distinct file names that may 
fall into "__" directory, so perhaps it is OK. Concerning "unknown" I am 
less sure.


I do not have strong opinion how to properly deal with short IDs.




[PATCH v3] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-10 Thread Ihor Radchenko
Max Nikulin  writes:

> On 10/08/2022 18:43, Ihor Radchenko wrote:
>> Ihor Radchenko writes:
>> 
>> I have updated the patch to use "__/id" when id is too short.
>> Any objections?
>> 
>> +When ID is too short (less than 3 chars), use its md5 hash to create
>
> Misleading docs, you do not use md5.

Oops. Will fix.

>> +the path."
>> +  (if (< (length id) 3)
>> +  (format "--/%s" id)
>
> Please, do not use path components starting with dash, it is terrible 
> for CLI tools. By the way, you promised underscores, not dashes.

Why?

I have no opinion about the possible dummy folder name, except that it
should fit the general pattern we already have "xx/..." or "MM/...".

>> +(format "%s/%s"
>> +(substring id 0 2)
>> +(substring id 2
>
> Ihor, I have not look into the code around, so my suggestions may have 
> no sense.
>
> Is it possible to pass empty string as ID to these functions? Should it 
> be explicitly checked?

It should not be possible under normal operation.

> What if ID contains "/" that can not be used in file name? Windows has 
> more forbidden characters.

Then, creating the attachment dir will fail in front of the user. I am
not sure if we really need to do much about this. At least, not until we
get a bug report.

If we really need to solve the edge cases with attach dir generation, we
may need to change the overall design in org-attach/org-id to something
more restrictive. I am not sure if it is worth the effort compared to
other directions where to improve Org.

> Do you expect any problem if here (and for timestamp-based ids) 
> directory component is just padded with some character (unsure what can 
> better than underscore) to required length?
>
> "x" -> "_x/x" or "__x/x"
> "xy" -> "xy/xy" or "_xy/xy"
> etc.
>
>  From my point of view it might be a bit better than "__" and "unknown".

Does it make sense?
Timestamp-based IDs are like 20220810T210121.478800 by default.
Then, the top-level folders will be like 202208/...
If the actual format is different, we cannot possibly expect the
timestamp to follow the same pattern, which is why I chose "unknown"
referring to unknown date.

On the other hand, if an ID has more than 6 characters, it will generate
nonsense top-level folder with or without the patch. We could go further
and match the ID against [1-9][0-9]\{5\} and put the folder into
"unknown/ID" instead, but that would be a breaking change.

In summary, I am more of less neutral towards this fallback format,
except that it would be nice to be able to recover the ID from the
directory path. Padding should be recoverable.

I slightly dislike the "___xx" compared to "__" because it will
create a proliferation of top-level folders as opposed to cramping the
non-standard IDs into a single "__" folder.

Attaching the updated patch.

>From 50d0f9de0acdf5d67b797476816cbeb40b19f554 Mon Sep 17 00:00:00 2001
Message-Id: <50d0f9de0acdf5d67b797476816cbeb40b19f554.1660137585.git.yanta...@gmail.com>
From: Ihor Radchenko 
Date: Sat, 23 Jul 2022 13:13:24 +0800
Subject: [PATCH v3] org-attach-dir-from-id: Do not rely on ID being over 6
 chars long

* lisp/org-attach.el (org-attach-id-uuid-folder-format): Fall back to
"__/ID" when the ID contains 2 chars or less and cannot be split into
the "xy/z" path.
(org-attach-id-ts-folder-format): Fall back to "__/ID" path format
when the ID contains less than 7 chars and cannot be split into the
"MM/rest" path.

Fixes https://orgmode.org/list/KC8PcypJapBpJQtJxM0kX5N7Z0THL2Lq6EQjBMzpw1-vgQf72egZ2JOIlTbPYiqAVD4MdSBhrhBZr2Ykf5DN1mocm1ANvvuKKZShlkgzKYM=@pm.me
---
 lisp/org-attach.el | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/lisp/org-attach.el b/lisp/org-attach.el
index fe49af6f3..0f5d5af82 100644
--- a/lisp/org-attach.el
+++ b/lisp/org-attach.el
@@ -159,19 +159,27 @@ (defcustom org-attach-archive-delete nil
 (defun org-attach-id-uuid-folder-format (id)
   "Translate an UUID ID into a folder-path.
 Default format for how Org translates ID properties to a path for
-attachments.  Useful if ID is generated with UUID."
-  (format "%s/%s"
-	  (substring id 0 2)
-	  (substring id 2)))
+attachments.  Useful if ID is generated with UUID.
+
+When ID is too short (less than 3 chars), return \"__/ID\"."
+  (if (< (length id) 3)
+  (format "__/%s" id)
+(format "%s/%s"
+	(substring id 0 2)
+	(substring id 2
 
 (defun org-attach-id-ts-folder-format (id)
   "Translate an ID based on a timestamp to a folder-path.
 Useful way of translation if ID is generated based on ISO8601
 timestamp.  Splits the attachment folder hierarchy into
-year-month, the rest."
-  (format "%s/%s"
-	  (substring id 0 6)
-	  (substring id 6)))
+year-month, the rest.
+
+When ID is too short (less than 7 chars), return \"__/ID\"."
+  (if (< (length id) 7)
+  (format "__/%s" id)
+(format "%s/%s"
+	(substring id 0 6)
+	(substring id 6
 
 (defcustom 

Re: [PATCH v2] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-10 Thread Max Nikulin

On 10/08/2022 18:43, Ihor Radchenko wrote:

Ihor Radchenko writes:

I have updated the patch to use "__/id" when id is too short.
Any objections?

+When ID is too short (less than 3 chars), use its md5 hash to create


Misleading docs, you do not use md5.


+the path."
+  (if (< (length id) 3)
+  (format "--/%s" id)


Please, do not use path components starting with dash, it is terrible 
for CLI tools. By the way, you promised underscores, not dashes.



+(format "%s/%s"
+   (substring id 0 2)
+   (substring id 2


Ihor, I have not look into the code around, so my suggestions may have 
no sense.


Is it possible to pass empty string as ID to these functions? Should it 
be explicitly checked?


What if ID contains "/" that can not be used in file name? Windows has 
more forbidden characters.


Do you expect any problem if here (and for timestamp-based ids) 
directory component is just padded with some character (unsure what can 
better than underscore) to required length?


"x" -> "_x/x" or "__x/x"
"xy" -> "xy/xy" or "_xy/xy"
etc.

From my point of view it might be a bit better than "__" and "unknown".




[PATCH v2] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-10 Thread Ihor Radchenko
Ihor Radchenko  writes:

>> Ihor, I am afraid of collisions due to short input to md5. Long hash 
>> value gives false impression of high entropy but actually there are not 
>> so many variants for 1 or 2 characters strings. Either there is no point 
>> in making long name from short id or random string of appropriate length 
>> should be used instead.
>
> Random is not an option. If we use random string, how will we get the
> right attachment path on a second call? This function must return the
> same value given the same input.
>
> Having said that, I do agree (taking into account the other comment)
> that md5 may not be a very good idea. Maybe simply something like
> "__/id". Ideally, we should be able to recover the original id from the
> path to attachment.

I have updated the patch to use "__/id" when id is too short.
Any objections?

>From 7ca5fe66c53a1399943a2e2d4a1675ca8cd14194 Mon Sep 17 00:00:00 2001
Message-Id: <7ca5fe66c53a1399943a2e2d4a1675ca8cd14194.1660131747.git.yanta...@gmail.com>
From: Ihor Radchenko 
Date: Sat, 23 Jul 2022 13:13:24 +0800
Subject: [PATCH v2] org-attach-dir-from-id: Do not rely on ID being over 6
 chars long

* lisp/org-attach.el (org-attach-id-uuid-folder-format): Fall back to
"--/ID" when the ID contains 2 chars or less and cannot be split into
the "xy/z" path.
(org-attach-id-ts-folder-format): Fall back to "unknown/ID" path
format when the ID contains less than 7 chars and cannot be split into
the "MM/rest" path.

Fixes https://orgmode.org/list/KC8PcypJapBpJQtJxM0kX5N7Z0THL2Lq6EQjBMzpw1-vgQf72egZ2JOIlTbPYiqAVD4MdSBhrhBZr2Ykf5DN1mocm1ANvvuKKZShlkgzKYM=@pm.me
---
 lisp/org-attach.el | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/lisp/org-attach.el b/lisp/org-attach.el
index fe49af6f3..bb7c787f7 100644
--- a/lisp/org-attach.el
+++ b/lisp/org-attach.el
@@ -159,19 +159,28 @@ (defcustom org-attach-archive-delete nil
 (defun org-attach-id-uuid-folder-format (id)
   "Translate an UUID ID into a folder-path.
 Default format for how Org translates ID properties to a path for
-attachments.  Useful if ID is generated with UUID."
-  (format "%s/%s"
-	  (substring id 0 2)
-	  (substring id 2)))
+attachments.  Useful if ID is generated with UUID.
+
+When ID is too short (less than 3 chars), use its md5 hash to create
+the path."
+  (if (< (length id) 3)
+  (format "--/%s" id)
+(format "%s/%s"
+	(substring id 0 2)
+	(substring id 2
 
 (defun org-attach-id-ts-folder-format (id)
   "Translate an ID based on a timestamp to a folder-path.
 Useful way of translation if ID is generated based on ISO8601
 timestamp.  Splits the attachment folder hierarchy into
-year-month, the rest."
-  (format "%s/%s"
-	  (substring id 0 6)
-	  (substring id 6)))
+year-month, the rest.
+
+When ID is too short (less than 7 chars), return \"unknown/ID\"."
+  (if (< (length id) 7)
+  (format "unknown/%s" id)
+(format "%s/%s"
+	(substring id 0 6)
+	(substring id 6
 
 (defcustom org-attach-id-to-path-function-list '(org-attach-id-uuid-folder-format
 		 org-attach-id-ts-folder-format)
-- 
2.35.1



-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92


Re: [PATCH] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-03 Thread Ihor Radchenko
Max Nikulin  writes:

> On 23/07/2022 12:22, Ihor Radchenko wrote:
>> Tentative patch is attached.
>
>> +  (if (< (length id) 3)
>> +  (org-attach-id-uuid-folder-format (md5 id))
>
> Ihor, I am afraid of collisions due to short input to md5. Long hash 
> value gives false impression of high entropy but actually there are not 
> so many variants for 1 or 2 characters strings. Either there is no point 
> in making long name from short id or random string of appropriate length 
> should be used instead.

Random is not an option. If we use random string, how will we get the
right attachment path on a second call? This function must return the
same value given the same input.

Having said that, I do agree (taking into account the other comment)
that md5 may not be a very good idea. Maybe simply something like
"__/id". Ideally, we should be able to recover the original id from the
path to attachment.

Best,
Ihor



Re: [PATCH] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-03 Thread Max Nikulin

On 23/07/2022 12:22, Ihor Radchenko wrote:

Tentative patch is attached.



+  (if (< (length id) 3)
+  (org-attach-id-uuid-folder-format (md5 id))


Ihor, I am afraid of collisions due to short input to md5. Long hash 
value gives false impression of high entropy but actually there are not 
so many variants for 1 or 2 characters strings. Either there is no point 
in making long name from short id or random string of appropriate length 
should be used instead.





Re: [PATCH] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-08-02 Thread Janek F
Just now had a closer look at your patch,
and I don't like the hashing.
My whole idea customizing ids is to make the file structure useful by itself, 
too.
This is how I patched it for me:

  (defun xf/org-attach-id-folder-format (id)
"Translate any ID into a folder-path."
(format "%s/%s"
(substring id 0 2)
(if (> (seq-length id) 2) (substring id 2) id))
)
  (setq org-attach-id-to-path-function-list '(xf/org-attach-id-folder-format))


--- Original Message ---
Ihor Radchenko  schrieb am Samstag, 23. Juli 2022 um 07:21:


> Janek F xer...@pm.me writes:
>
> > When setting org-id-method to 'ts or 'org,
> > org-attach seems to use org-attach-id-ts-folder-format
> > to create its hierarchy.
> >
> > However I tend to customize IDs for important files by hand,
> > causing any attempt to use org-attach on that file to fail
> > if the ID is shorter than six characters:
> >
> > org-attach-id-ts-folder-format: Args out of range: "ftt", 0, 6
> >
> > This method should be adjusted to handle non-ts-ids just as well,
> > as org-id-method does not dictate the format of existing ids.
>
>
> Thanks for reporting!
> Tentative patch is attached.
>
> I have added some fallbacks for the default attach folder formatters, so
> that they can work when the ID does not conform to specific format.
>
> This change is somewhat opinionated, so feel free to suggest alternative
> solutions/fallback options.
>
> Best,
> Ihor



[PATCH] Re: [BUG] org-attach-id-ts-folder-format fails on customized IDs [9.6 (9.6-??-2e9999783)]

2022-07-22 Thread Ihor Radchenko
Janek F  writes:

> When setting org-id-method to 'ts or 'org,
> org-attach seems to use org-attach-id-ts-folder-format
> to create its hierarchy.
>
> However I tend to customize IDs for important files by hand,
> causing any attempt to use org-attach on that file to fail
> if the ID is shorter than six characters:
>
>     org-attach-id-ts-folder-format: Args out of range: "ftt", 0, 6
>
> This method should be adjusted to handle non-ts-ids just as well,
> as org-id-method does not dictate the format of existing ids.

Thanks for reporting!
Tentative patch is attached.

I have added some fallbacks for the default attach folder formatters, so
that they can work when the ID does not conform to specific format.

This change is somewhat opinionated, so feel free to suggest alternative
solutions/fallback options.

Best,
Ihor

>From e004752ba39f0d328645a1f6053ad3ce3b06ac28 Mon Sep 17 00:00:00 2001
Message-Id: 
From: Ihor Radchenko 
Date: Sat, 23 Jul 2022 13:13:24 +0800
Subject: [PATCH] org-attach-dir-from-id: Do not rely on ID being over 6 chars
 long

* lisp/org-attach.el (org-attach-id-uuid-folder-format): Fall back to
using ID md5 hash when the ID contains 2 chars or less and cannot be
split into the "xy/z" path.
(org-attach-id-ts-folder-format): Fall back to "unknown/ID" path
format when the ID contains less than 7 chars and cannot be split into
the "MM/rest" path.

Fixes https://orgmode.org/list/KC8PcypJapBpJQtJxM0kX5N7Z0THL2Lq6EQjBMzpw1-vgQf72egZ2JOIlTbPYiqAVD4MdSBhrhBZr2Ykf5DN1mocm1ANvvuKKZShlkgzKYM=@pm.me
---
 lisp/org-attach.el | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/lisp/org-attach.el b/lisp/org-attach.el
index 36c21b702..7c72fd7ee 100644
--- a/lisp/org-attach.el
+++ b/lisp/org-attach.el
@@ -159,19 +159,28 @@ (defcustom org-attach-archive-delete nil
 (defun org-attach-id-uuid-folder-format (id)
   "Translate an UUID ID into a folder-path.
 Default format for how Org translates ID properties to a path for
-attachments.  Useful if ID is generated with UUID."
-  (format "%s/%s"
-	  (substring id 0 2)
-	  (substring id 2)))
+attachments.  Useful if ID is generated with UUID.
+
+When ID is too short (less than 3 chars), use its md5 hash to create
+the path."
+  (if (< (length id) 3)
+  (org-attach-id-uuid-folder-format (md5 id))
+(format "%s/%s"
+	(substring id 0 2)
+	(substring id 2
 
 (defun org-attach-id-ts-folder-format (id)
   "Translate an ID based on a timestamp to a folder-path.
 Useful way of translation if ID is generated based on ISO8601
 timestamp.  Splits the attachment folder hierarchy into
-year-month, the rest."
-  (format "%s/%s"
-	  (substring id 0 6)
-	  (substring id 6)))
+year-month, the rest.
+
+When ID is too short (less than 7 chars), return \"unknown/ID\"."
+  (if (< (length id) 7)
+  (format "unknown/%s" id)
+(format "%s/%s"
+	(substring id 0 6)
+	(substring id 6
 
 (defcustom org-attach-id-to-path-function-list '(org-attach-id-uuid-folder-format
 		 org-attach-id-ts-folder-format)
-- 
2.35.1