I didn't think of the niceties that databases offer so far, but that
looks useful indeed.
A database does not have to be PostgreSQL. It can be simple and
shareable. It can be an Emacs hash table. It can be an XML file. It can
be an Org file. It can be a SQLite file in your git repository. It does
not matter.
What matters is that there is a centralized place for the metadata. A
single source of truth. A place where you can store arbitrary
metadata—not just a limited set of fields that fit in a link syntax.
Inserting limited metadata is not arbitrary metadata. If you can only
fit a few fields in the link, you are not supporting arbitrary metadata.
You are supporting a limited set of fields that someone decided were
important.
Arbitrary metadata means any field, any value, any structure, any
relationship. That cannot fit in a link. That requires a data store.
That requires a database—whether it is PostgreSQL, SQLite, or an Org
file with properties.
Let me explain what kind of link-based workflow I have in mind before
saying what system is best suited.
My main focus is to make links reliable, in the sense that when you
follow a link, it always just works; in linkin-org, this is ensured
with a decentralized id system etc, but that's irrelevant for the
current matter (check out https://github.com/Judafa/linkin-org to know
more).
From the user's point of view, it should feel as if you merged org-mode
with dired as your links (with file type) are access points as reliable
as a file listed in dired.
That means that the principle I have explained fits your own philosophy
perfectly.
I am just not sure whether you are saying one thing but not following
it, or not seeing it.
The architectural solution for links that never break is to give them
unique IDs and to manage any change to a link centrally. That way your
Org links do not break. Say a website link is no longer valid: you could
edit the central place of links (call it a database), and then all the
Org links would point to an archived version of the link. The user does
not have to change each particular link in many Org files.
I do not know about you, but I had thousands of Org files, one for each
person. Imagine the problem of editing every single link in those files.
It is quite different when you use centralized links and just reference
them by UUID or ID.
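To make the principle concrete, here is a minimal sketch of a central link
table, assuming SQLite as the store. The table name `links`, the helper
functions, the `dblink:` link type and the example URLs are all made up
for illustration; they are not taken from any existing package.

```python
import sqlite3
import uuid

# Hypothetical schema: one row per link, keyed by a stable ID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (id TEXT PRIMARY KEY, target TEXT NOT NULL)")


def add_link(conn, target):
    """Register a target and return the stable ID that notes will cite."""
    link_id = str(uuid.uuid4())
    conn.execute("INSERT INTO links (id, target) VALUES (?, ?)", (link_id, target))
    return link_id


def resolve(conn, link_id):
    """Return the current target for an ID, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT target FROM links WHERE id = ?", (link_id,)
    ).fetchone()
    return row[0] if row else None


def retarget(conn, link_id, new_target):
    """Point an existing ID at a new target; the notes stay untouched."""
    conn.execute("UPDATE links SET target = ? WHERE id = ?", (new_target, link_id))


link_id = add_link(conn, "https://example.com/article")

# Many notes cite the same ID (the "dblink:" type is an assumption).
notes = [f"[[dblink:{link_id}][the article]]" for _ in range(3)]

# The site went away: edit one row instead of every note.
retarget(conn, link_id, "https://archive.example/article")
```

The notes only ever contain the ID, so a single `UPDATE` is enough to
repair every reference at once.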
For a link (with file type) to work, you just need two things: the link
and the file.
Makes sense, trivial, and that's meant to be this way.
Trivial for a small file and a single user who has time for thinking
without practicing.
For any user beyond that, even one well below a power user, the system
you are proposing is shallow.
Files can change. Is the link always going to work?
If files are indexed in the database and you move them in the database,
then the file system is forgotten, and links will always work that way
(for as long as the hard disk and computer work).
If files do not have metadata (indexing), then when a file is renamed or
moved, the link is gone and destroyed.
The solution to keep links working long term is to have an indexed list
of links and to reference them by their ID.
By extension, I see any third-party, centralized intermediary as a
weakness towards reliable links.
I don't know what you mean by third parties. I have not suggested
having a third party. Having a central place of links means that you
have to own and control your central place of links. That is not a
third party.
If you rely on third parties such as archive.org, whose hard disk and
indexing system you do not own, then you are more likely to lose
reliable links.
Correct me if I'm wrong, but in case the database is
lost/corrupted/non-updated, then the links may not work anymore.
Sure, but the same can be said for your files: if they are lost or
corrupted, you will not have reliable links. That is outside the
context of the principle explained.
And this would be the worst outcome: you're left with years of notes
full of links that are now useless.
Same thing if your hard disk gets corrupted.
Backing up the database is far simpler than backing up the file
system.
The issue of backing up your digital data is not really the subject,
is it?
Arbitrary data in links is fully solved on my side. I told you I have
unlimited information on links, truly arbitrary information, and I am
using the principle explained.
I'm primarily targeting a fully decentralized system, which fits best
with the philosophy of org mode imo.
I am trying to understand how a "decentralized system" fits with
arbitrary metadata, but okay, maybe you mean you wish to have each link
not centrally indexed, and so all information stored in the simple link.
Arbitrary data means unlimited fields, values, structures, and
relationships, so it cannot fit into a link, or a filename, or a plain
text file; it needs a data store.
This comes with a whole bunch of desirable niceties: you can
move/update your linked file anywhere (dropbox app on your phone,
whatever) without notifying a database, the link still works.
That relates to file synchronization, not a knowledge management
workflow.
The Dynamic Knowledge Repository was invented by Doug Engelbart, so you
should look into his work. He invented it for exactly the same purpose
you are describing. Maybe it is a lot to read?
But looking at it practically: if I have my database, and it is
accessible, then I can move my files anywhere I wish and the links will
still work.
I would say it may even combine neatly: one can use the id of a file as
a last resort to make sure the database cannot lose track of the file.
For those files over several bytes, I keep their index with hashes. So
re-indexing would find those files. That is another architectural
principle that you could use so that your links always stay the same,
even if a file is moved on the file system or renamed, as long as its
content is not changed.
And if you wish to change the file, then maybe such a change should go
through the index layer, so that the file is first recognized, changed,
and re-indexed after the change. That way you could arbitrarily change
files and have links still working as long as you use their IDs.
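As a rough illustration of the hash-based re-indexing described above,
the following sketch finds a moved and renamed file again by its content
hash. The index layout (a dict from link ID to hash and path), the
function names, and the use of SHA-256 are assumptions made for the
example, not a description of my actual setup.

```python
import hashlib
import os
import tempfile


def file_hash(path):
    """Return the SHA-256 hex digest of a file's content."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reindex(index, root):
    """Walk root and refresh the stored path of every file found by hash.

    index maps a link ID to {"hash": ..., "path": ...} (assumed layout).
    """
    found = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            found[file_hash(path)] = path
    for entry in index.values():
        if entry["hash"] in found:
            entry["path"] = found[entry["hash"]]
    return index


# Index one file under a stable ID.
root = tempfile.mkdtemp()
old_path = os.path.join(root, "report.txt")
with open(old_path, "w") as f:
    f.write("quarterly numbers")
index = {"file-1": {"hash": file_hash(old_path), "path": old_path}}

# Move and rename the file behind the index's back, then re-index.
os.mkdir(os.path.join(root, "archive"))
new_path = os.path.join(root, "archive", "renamed.txt")
os.rename(old_path, new_path)
reindex(index, root)
```

The ID `file-1` never changes; only the path stored behind it does. If
the content changes outside the index, the hash no longer matches, which
is why edits should go through the index layer.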
PS: Some more minor remarks:
-
Link creation or capturing should be instant. 1-3 seconds.
The link is created automatically, obviously.
- with a database, no easy way to share data afaict. Otoh if the
metadata is inside the link, then that's as simple as pasting the link,
putting the file in an attachment, and sending the email.
Sharing with a database is trivial. You export the object to a
self-contained format—JSON, XML, or even an Org property drawer—and send
it. The recipient imports it into their own database. The link remains
the same. The metadata remains intact. The relationships remain
preserved.
Alternatively, you can embed the metadata in the link at export time.
The database stores the metadata. The link is the identifier. When you
export for sharing, you expand the metadata into the link. When you
import, you extract it back into the database.
This gives you the best of both worlds: your internal system is
queryable, versioned, and scalable, and your shared links are
self-contained.
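The export and import round trip could look something like the sketch
below. It assumes the metadata is JSON-serializable and packs it into
the link as URL-safe base64; the `dblink:` prefix, the `?meta=` separator
and the function names are invented for this example.

```python
import base64
import json


def export_link(link_id, store):
    """Expand the stored metadata into a self-contained link string."""
    payload = json.dumps(store[link_id], sort_keys=True).encode("utf-8")
    packed = base64.urlsafe_b64encode(payload).decode("ascii")
    return f"dblink:{link_id}?meta={packed}"


def import_link(link, store):
    """Extract the metadata from a shared link back into a store."""
    head, _, packed = link.partition("?meta=")
    link_id = head[len("dblink:"):]
    store[link_id] = json.loads(base64.urlsafe_b64decode(packed))
    return link_id


# Sender side: metadata lives in the store, the link is only an ID.
sender = {
    "a1b2": {
        "title": "Meeting notes",
        "tags": ["org", "links"],
        "related": ["c3d4"],
    }
}
shared = export_link("a1b2", sender)

# Recipient side: the pasted link is unpacked into their own store.
recipient = {}
imported_id = import_link(shared, recipient)
```

Plain dicts stand in for the two databases here; the same two functions
would sit on top of whatever store each side actually uses.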
Your system cannot do the reverse. Once the metadata is embedded in the
link, you cannot extract it into a database without parsing every link.
You are stuck with embedded metadata forever.
Just as you would need to "attach the file" to share a link inside an
Org file, I also need to press a few buttons, and then I can export
whatever links I want and share them with people.
--
Jean Louis