On 1/7/26 01:24, raf wrote:
On Sat, Jan 03, 2026 at 10:05:16PM +0100, "Peter B." <[email protected]> 
wrote:

Hi!

I'm super happy to see that I'm not the only one anymore interested in
increased xattr, and therefore possible key/value usage and query
functionality!

😀️

**I'm writing this with personal interest**
Yes, we ALL have data to deal with, and I like to tag and find my kids'
photos and files - regardless which "app" or system-

**as well as my professional interest**
Working with small-to-medium-to-very-very-large digital heritage
collections.

(And therefore loads of meta+data wrangling clever-hacks-and-stunts for a
living 😉️)


So, my hopefully useful "few-cents" to this thread are here:

On 1/2/26 13:11, raf wrote:
On Wed, Dec 31, 2025 at 04:21:14PM +0100, Bernhard Voelker 
<[email protected]> wrote:

On 12/30/25 17:45, Peter B. wrote:
I'll also checkout Morgan's find-patch:
I'd really love to see xattr-support in the basic "most likely to be
present-on-any-box" tools.
The technical coding under the hood for reading xattr from the file system
is there in Morgan's patch for quite a while, indeed.

But before adding, I feel we need to discuss once again the interface
to the find(1) user:

- Do we only want to provide an option to search for files having xattrs?
      find -xattr

- Do we want a test option to search for a file having an xattr key matching
    a certain string (or eventually pattern)?
      find -xattr 'mykey'        # xattr key equals string
      find -xattr '*mykey*'      # xattr key matches pattern
    Or better explicitly mention in the option that we match for the xattr keys?
      find -xattr-key 'mykey'
      find -xattr-key '*mykey*'

- Do we want a test option to search for a file having an xattr value matching
    a certain string (or eventually pattern)?
      find -xattr-value 'myval'   # files with xattr value equals string
      find -xattr-value '*myval*' # ... or pattern

- Or search for files with xattr having a certain mixture of 'key=val'?
      find -xattr-match 'mykey=myval'      # search by key+val as strings
      find -xattr-match 'mykey=*someval*'  # search for key matching a val pattern
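
To make the interface options listed above a bit more concrete, here is a minimal Python sketch of how each proposed test could behave. It is not taken from Morgan's patch or from find itself: the function names are hypothetical, glob matching via `fnmatch` is an assumption, and `read_xattrs()` relies on `os.listxattr`/`os.getxattr`, which Python only provides on Linux.

```python
import fnmatch
import os


def read_xattrs(path):
    """Return {key: value_bytes} for one file (Linux-only calls)."""
    return {key: os.getxattr(path, key) for key in os.listxattr(path)}


def has_xattr(attrs):
    # Sketch of "-xattr": the file carries at least one xattr.
    return bool(attrs)


def key_matches(attrs, pattern):
    # Sketch of "-xattr-key PATTERN": any key matches the glob.
    return any(fnmatch.fnmatchcase(key, pattern) for key in attrs)


def value_matches(attrs, pattern):
    # Sketch of "-xattr-value PATTERN": any value matches the glob.
    # Values are bytes; decoding as UTF-8 is an assumption.
    return any(
        fnmatch.fnmatchcase(value.decode("utf-8", "replace"), pattern)
        for value in attrs.values()
    )


def pair_matches(attrs, spec):
    # Sketch of "-xattr-match 'key=valpattern'": split at the FIRST "=",
    # so a key that itself contains "=" cannot be expressed this way.
    key_pat, _, val_pat = spec.partition("=")
    return any(
        fnmatch.fnmatchcase(key, key_pat)
        and fnmatch.fnmatchcase(value.decode("utf-8", "replace"), val_pat)
        for key, value in attrs.items()
    )
```

The matching functions take a plain dict so they can be tried without a filesystem that supports xattrs, e.g. `key_matches({"user.mykey": b"myval"}, "*mykey*")`.
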
There are versions of find that (I think) use -xattr to
just identify the existence of EAs. I don't think
that's enough, but it might make sense to have -xattr
do that
+1
absolutely agree.
To both: useful and not-enough.


(for compatibility with those other versions of
find), but to also have -xattr-key and -xattr-value (or
-xattr-match for both).
+1 again.
It indeed makes perfect sense (and has use-cases!) where one wants to query
(key OR value) AND (key AND value).

It would be a lot more useful if that provided some RegEx (for matching wildcards/patterns).


The thing to watch out for if both the key and value
are combined, is that if you format it like "key=value"
you need to consider the case where the key includes
"=". There might be malicious EAs trying to be tricky.
True.
There will always be potential for malicious attempts, yet I would suggest
to apply basic "query-escaping", until stricter safeguards seem
necessary?


My rawhide program matches EAs formatted like "key:
value" but non-ascii bytes are encoded (like \x1b or \n
or \t etc.) and any ": " in the key itself is encoded
as "\x3a " to disappoint the creators of malicious EAs.
I've never seen an EA whose key contained ": " so I
don't think it'll bother anyone.
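
As an illustration of the "key: value" encoding described above, here is a rough Python sketch. It is not rawhide's actual code: the exact set of escaped bytes (everything outside printable ASCII, plus the backslash itself) and both function names are assumptions made for the example.

```python
def escape_bytes(data: bytes) -> str:
    """Render bytes as printable ASCII, escaping everything else."""
    out = []
    for b in data:
        if b == 0x5C:            # backslash, so escapes stay unambiguous
            out.append("\\\\")
        elif b == 0x0A:
            out.append("\\n")
        elif b == 0x09:
            out.append("\\t")
        elif 0x20 <= b < 0x7F:   # printable ASCII passes through
            out.append(chr(b))
        else:                    # control and non-ASCII bytes
            out.append(f"\\x{b:02x}")
    return "".join(out)


def encode_ea(key: str, value: bytes) -> str:
    """Encode one EA as a 'key: value' line ending in a newline."""
    enc_key = escape_bytes(key.encode("utf-8", "surrogateescape"))
    # Hide any ": " inside the key so that the first ": " on the
    # line always separates the key from the value.
    enc_key = enc_key.replace(": ", "\\x3a ")
    return f"{enc_key}: {escape_bytes(value)}\n"
```

With this, a tricky key such as `"user.a: b"` comes out as `user.a\x3a b`, so splitting a line at its first ": " still recovers the (encoded) key.
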
I think we're on the same page here :)
Yet, one may say "xattrs are not-yet-popular enough, but you'll see, once
they do..." - like with all formats and data-exchange protocols, I guess?
EAs/xattrs are used for various things by various
systems. (e.g. selinux on Linux, quarantine on macOS)
but user/app-specific usage probably isn't popular yet.
Although it can be. Some systems allow 64KiB of EA data.

Thanks for info! I can confirm that.
btw, app-specific usage: yt-dlp supports `--xattrs` 😉️

I've put some FAQ infos together regarding xattrs (in 2024/25):
https://github.com/ArkThis/AHAlodeck/blob/main/doc/xattr_faq.md

The FAQ also contains the per-filesystem xattr size constraints I was able to find (or test). ZFS can indeed do 64 KiB; I must admit I still want to test whether that limit applies per key and/or value, or in total. I've tested attaching a ~45 kB uuencoded JPG image as an xattr on ZFS and reading it back: the MD5 matched.

I love the fact that EAs are used for security purposes:
I can assume it means relying on them is enterprise-production-stable (at least regarding the kernel/filesystem side)?


> What if:
> - I search for "some term", but don't care if it's a key or value string?

Having -xattr-key and -xattr-value wouldn't help you in
that case. If you didn't care if it's a key or a value,
then you'd need to use both predicates in your search
(-xattr-key ... -o -xattr-value ...).

But the problem I was thinking of was that, if these
two predicates existed, then a user might reasonably
expect them to be connected to each other such that using
both of them would refer to the same extended
attribute, i.e. that -xattr-key ... -a -xattr-value ...
would match a file with an EA whose key matched the
-xattr-key and whose value matched the -xattr-value.
But it would almost certainly match a file with an EA
with the matching key and an EA with the matching
value, but they wouldn't have to be the same EA. They
could be different EAs. So I think it would be
difficult for a user to guess the correct behaviour.
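
The difference between the two readings described above can be shown with a small Python sketch. The function names and sample attributes are made up for illustration, and values are plain strings to keep it short.

```python
import fnmatch


def unlinked_match(attrs, key_pat, val_pat):
    # "-xattr-key K -a -xattr-value V" as two independent tests:
    # some EA has a matching key AND some (possibly other) EA has
    # a matching value.
    key_hit = any(fnmatch.fnmatchcase(k, key_pat) for k in attrs)
    val_hit = any(fnmatch.fnmatchcase(v, val_pat) for v in attrs.values())
    return key_hit and val_hit


def linked_match(attrs, key_pat, val_pat):
    # What a user might expect: one and the same EA has to match
    # both the key pattern and the value pattern.
    return any(
        fnmatch.fnmatchcase(k, key_pat) and fnmatch.fnmatchcase(v, val_pat)
        for k, v in attrs.items()
    )


attrs = {"user.artist": "Bach", "user.title": "Toccata"}
print(unlinked_match(attrs, "user.title", "Bach"))  # True
print(linked_match(attrs, "user.title", "Bach"))    # False
```

The file has no EA named "user.title" with the value "Bach", yet the unlinked reading reports a match because the key and the value are found in two different EAs.
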

I was thinking of defaulting to "linked" (=like `keep aspect ratio` by default),
and options to declare if my intention is linked/unlinked?

I have that use-case a lot: plain fulltext search in collections.
That use case is best served (I think) by a single
predicate that can match both the key and value encoded
in some way as I described earlier (either by
"key=val\n" or "key: val\n"). I chose ": " because
that's the format output by xattr on macOS (and
probably elsewhere).

Hm.
Does that mean you're matching against the output of `xattr` for searching, or just that you'd like to stay pattern-compatible with it?

That way, if you don't care whether your search
criteria applies to the key or the value, you can just
do -xattr-match 'something' and if you want to match a
key only you can do -regexxattr-match '^somekey: ' and
if you want to match a value only you can do
-regexxattr-match '^[^\n]+: [^\n]*someval', and to
match both the key and the value, you can do something
like -regexxattr-match '^somekey: [^\n]*someval'.
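
Here is a small Python sketch of the regex examples above, run against all EAs of one file encoded as "key: val" lines. The sample text and the helper name are made up, and it assumes that `^` is meant to match at the start of every encoded line (hence `re.MULTILINE`); rawhide's real matching may differ.

```python
import re

# All EAs of one (imaginary) file, encoded one per line.
encoded = "somekey: has someval inside\nother: plain\n"


def regex_xattr_match(text, pattern):
    # re.MULTILINE lets ^ anchor at the beginning of each EA line.
    return re.search(pattern, text, re.MULTILINE) is not None


# Key or value, don't care which:
print(regex_xattr_match(encoded, r"someval"))
# Key only:
print(regex_xattr_match(encoded, r"^somekey: "))
# Value only:
print(regex_xattr_match(encoded, r"^[^\n]+: [^\n]*someval"))
# Key and value of the same EA:
print(regex_xattr_match(encoded, r"^somekey: [^\n]*someval"))
```

Because `[^\n]*` cannot cross a line break, the last pattern only matches when the key and the value belong to the same EA; `^other: [^\n]*someval` does not match this sample.
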

Roger that.
Is there any overhead due to requiring regex for this, compared to non-pattern matching here?


The above assumes that all EAs are encoded as a single
piece of text with one line per EA that is matched
against. The choice of encoding and whether it's
encoded like this at all would affect the examples.
This sort of matching probably isn't possible with
globbing unless the ksh extensions are available.

I'd vote for just -xattr and -regexxattr (and maybe
-ixattr and -iregexxattr) and have it match text that
looks like "key1=val1\nkey2=val2\nkey3=val3\n" or
"key1: val1\nkey2: val2\nkey3: val3\n".
I like args names that explain what they are, but just for "style"
suggestion:

[quick bike-shedding]:
What if, you'd call it "--xattr" and "--xxattr" or "--xattRX/xattrx"? (like
the "RX" at the end is "RegEx" :P)

Just an idea...
I suggested names that I thought were like the existing predicate names.
I don't know what would be best.

Sorry, didn't know there were existing ones:
Then of course it makes sense to keep their names.


And I assume "ixattr" is "case Insensitive"?
That's right.

- Do we need support for -printf formats to print xattr keys and/or values?
    How? The % 1-character directives are almost all used, maybe begin with
    the reserved ones? (%{ %[ %(
    How to output several xattr keys or values?
    How to select which xattr value to print?
      find -xattr -printf '%p %{xattr-keys=hello*} %{xattr-valuekey=hello}'
There are plenty of conversion letters available!

See https://savannah.gnu.org/bugs/?64100
Having quickly read over that thread, I totally agree "IF those features be
added, IF possible, it'd be great if rawhide and find could stay
syntax-compatible".

Also, I think (If I understood correctly) that some common-library-printf
formatting syntax may save A LOT of re-occurring bash-or-python-style
"formatting-caller-works-on-my-machine" scripts, I guess?
But I'm far out on a limb here; I've not really read the thread, and I've
never used find-output formatting (as I didn't know it had any).
Only GNU find has it.

There's a lot of possibilities, and I don't want to introduce something
which contradicts the typical use cases, or is not extensible or not
maintainable.
The complexity - especially when it comes to -printf - is that files can have
several xattr while all other attributes (name, timestamps, permissions, size,
etc.) only exist once.
My rawhide program also has %j for JSON output. The
current version outputs EAs like %x but the next
version outputs them as a JSON object with key/value
pairs matching the EA names and values. It's still
encoded because EA values can be text or binary and
JSON can't represent binary without the user choosing
some encoding. So that's something to think about.
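
To illustrate the binary-in-JSON problem mentioned above, here is one possible approach sketched in Python. It is not rawhide's %j format: tagging each value with an "encoding" field and falling back to base64 for non-UTF-8 data are assumptions chosen for the example, as is the function name.

```python
import base64
import json


def xattrs_to_json(attrs):
    """Serialise {key: value_bytes} as a JSON object."""
    obj = {}
    for key, value in attrs.items():
        try:
            # Text values are stored as-is and marked as UTF-8.
            obj[key] = {"encoding": "utf-8", "value": value.decode("utf-8")}
        except UnicodeDecodeError:
            # JSON has no byte-string type, so binary values need
            # some explicit encoding; base64 is used here.
            obj[key] = {
                "encoding": "base64",
                "value": base64.b64encode(value).decode("ascii"),
            }
    return json.dumps(obj, sort_keys=True)


print(xattrs_to_json({"user.title": b"Toccata", "user.thumb": b"\xff\xd8\xff"}))
```

A consumer then checks the "encoding" field before using a value, which keeps text values readable while binary values survive the round trip.
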
If any output data is available in a machine readable format (like JSON),
it'd definitely improve the re-usability and interoperability between tools
(and scripts) written around find/rawhide, IMO (and experience).

I think I asked this kind of questions already some years ago, but there was
no input yet.
I personally don't have a use case to search for xattrs, so the usual pattern
with '-exec getfattr ...' works for me.
Anyone?
ME! Meee! :)
As I said, I'm doing devops for digital-and-physical GLAM-collections - and
a lot of metadata layout, storage and retrieval/usage.

For example:
After de-embedding (=copying as-is) all exiftool-readable metadata into
xattrs, being able to do some queries on those key/value pairs is AMAZING!

Combining that with RDF URIs for keys, combined with Wikibase-and-Wikidata
engine and data, I have many great use cases for xattrs (and find).
That sounds cool. A much more interesting use for EAs than
just selinux or quarantining. :-)

:)

I really like the fact that once I can rely more on xattrs use,
**any file-format ever is basically irrelevant from now on, for "all things metadata"**.
And this feels really good!

Like: Why must I choose another file-format, if I simply want to give "my file/data" annotation information?
(BMP vs PNG vs TIFF vs ... ".txt" regarding storing "their title"?)

A friend of mine wrote a small "updatedb/locate" variant with xattr-only support a few days ago for testing. He ran it on a mixed-format, real-world annotated music collection (~14,000 files) in which all exiftool metadata is stored as xattrs: 13917 files took 1.416 seconds to index.

The index.db is a simple CSV, with entries of the form: `"filename", "xattr-key", "xattr-value"`
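
A rough Python sketch of such an indexer, just to show the idea. This is not my friend's tool: the function names are hypothetical, `os.listxattr`/`os.getxattr` are Linux-only in Python, and decoding values as UTF-8 is an assumption.

```python
import csv
import io
import os


def iter_xattr_rows(root):
    """Yield (filename, xattr-key, xattr-value) for all files below root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                keys = os.listxattr(path)
            except OSError:
                continue  # unreadable file or no xattr support here
            for key in keys:
                value = os.getxattr(path, key).decode("utf-8", "replace")
                yield path, key, value


def write_index(rows, out):
    """Write the rows as fully quoted CSV, one xattr per line."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    for row in rows:
        writer.writerow(row)


# Dry run with a made-up row instead of a real directory tree:
buf = io.StringIO()
write_index([("a.flac", "user.title", "Toccata")], buf)
print(buf.getvalue(), end="")
```

For a real run one would do something like `write_index(iter_xattr_rows("/some/collection"), open("index.db", "w", newline=""))`, with the path being a placeholder.
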

Using xattrs already feels more streamlined, consistent and stable than any MAM/DAM metadata handling paradigm I've seen so far... ;)

I rediscovered xattrs, btw, when evaluating (FOSS) object storage implementations for large-scale data handling:

That was when I found out that S3 has a very limited size for tags (127/256 chars) and other limitations (<=4kB) for metadata, or is ASCII-only in popular implementations. See: [Minio issue #19576: "non ASCII characters in metadata"](https://github.com/minio/minio/discussions/19576)

...and pre-S3 Object Storages use the underlying filesystem's xattrs for their object-metadata! ;)


P.S. Finally, I wouldn't want to introduce too much and complex code for
0.25% use cases.  Eventually, the simple test to search for files having
xattrs is enough, and the application logic then extracts them with another
tool.
Is this about the code to handle xattrs (in `gnu-find`)?
There are libs for plain access to xattr key/value, but
are there any xattr-libs for higher-level functions (like queries, regex, etc)?
I think the actual searching would be done by what find is already doing
(glob, regex).

Roger.

Like what HaikuOS (probably?) does in their BeFS filesystem libraries?
A friend of mine (author of "https://sen-labs.org/": a semantic desktop
engine for Haiku) told me that BeFS has some database-like query functions
built-into their filesystem "xattrs".

I know any additional library-dependence is another "whole thing" to
maintain.
However, IMO being able to query key/value data "on the spot" is something
that will eventually replace the legacy "files-in-folders open/save dialogs
- and wildcards" as-we-know-it...?

Might be an incentive to invest in a common, and commonly supported
"attribute-handling" lib.
Just 2¢ of mine.

I just like interoperable and open tech-systems :)


I don't think it's necessarily too complex. The code in
rawhide to obtain EAs is 492 lines (335 excluding
blank lines, 278 excluding comment lines as well) for
Linux, macOS, Cygwin, Solaris, and FreeBSD (OpenBSD and
NetBSD don't have EAs). But of course, that's only the
start.
Including so many OSs already in your consideration(s), wouldn't that
already suffice for a common lib?
Ignore and correct me, if I'm totally on the wrong track here?
Thanks :)
rawhide is GPLv3+ so the code is available for a
library but if potential clients don't like my choices
for encoding the EA data it might not be suitable for
their needs. Encoding is always tricky to get right,
and maybe I didn't. And it's probably unwise to use a
third-party library until it has become popular enough
to be packaged on lots of systems. And GNU find
supports many more systems than rawhide does, and I
don't know which other systems support EAs. I only
support systems that I have real or virtual machines
for.

Thanks for those interesting insights!
Indeed, good point: no lib-dependency until "popular enough".

Kind regards, Peter
