On Sun, Jan 11, 2026 at 01:00:44AM +0100, "Peter B." <[email protected]> 
wrote:

> On 1/7/26 01:24, raf wrote:
> > On Sat, Jan 03, 2026 at 10:05:16PM +0100, "Peter B." 
> > <[email protected]> wrote:
> > 
> > > Hi!
> > > 
> > > I'm super happy to see that I'm not the only one anymore interested in
> > > increased xattr, and therefore possible key/value usage and query
> > > functionality!
> > > 
> > > 😀️
> > > 
> > > **I'm writing this with personal interest**
> > > Yes, we ALL have data to deal with, and I like to tag and find my kids'
> > > photos and files - regardless which "app" or system-
> > > 
> > > **as well as my professional interest**
> > > Working with small-to-medium-to-very-very-large digital heritage
> > > collections.
> > > 
> > > (And therefore loads of meta+data wrangling clever-hacks-and-stunts for a
> > > living 😉️)
> > > 
> > > 
> > > So, my hopefully useful "few-cents" to this thread are here:
> > > 
> > > On 1/2/26 13:11, raf wrote:
> > > > On Wed, Dec 31, 2025 at 04:21:14PM +0100, Bernhard Voelker 
> > > > <[email protected]> wrote:
> > > > 
> > > > > On 12/30/25 17:45, Peter B. wrote:
> > > > > > I'll also checkout Morgan's find-patch:
> > > > > > I'd really love to see xattr-support in the basic "most likely to be
> > > > > > present-on-any-box" tools.
> > > > > The technical coding under the hood for reading xattr from the file system
> > > > > is there in Morgan's patch for quite a while, indeed.
> > > > > 
> > > > > But before adding, I feel we need to discuss once again the interface
> > > > > to the find(1) user:
> > > > > 
> > > > > - Do we only want to provide an option to search for files having xattrs?
> > > > >       find -xattr
> > > > > 
> > > > > - Do we want a test option to search for a file having an xattr key matching
> > > > >     a certain string (or eventually pattern)?
> > > > >       find -xattr 'mykey'        # xattr key equals string
> > > > >       find -xattr '*mykey*'      # xattr key matches pattern
> > > > >     Or better explicitly mention in the option that we match for the xattr keys?
> > > > >       find -xattr-key 'mykey'
> > > > >       find -xattr-key '*mykey*'
> > > > > 
> > > > > - Do we want a test option to search for a file having an xattr value matching
> > > > >     a certain string (or eventually pattern)?
> > > > >       find -xattr-value 'myval'   # files with xattr value equals string
> > > > >       find -xattr-value '*myval*' # ... or pattern
> > > > > 
> > > > > - Or search for files with xattr having a certain mixture of 'key=val'?
> > > > >       find -xattr-match 'mykey=myval'      # search by key+val as strings
> > > > >       find -xattr-match 'mykey=*someval*'  # search for key matching a val pattern
> > > > There are versions of find that (I think) use -xattr to
> > > > just identify the existence of EAs. I don't think
> > > > that's enough, but it might make sense to have -xattr
> > > > do that
> > > +1
> > > absolutely agree.
> > > To both: useful and not-enough.
> > > 
> > > 
> > > > (for compatibility with those other versions of
> > > > find), but to also have -xattr-key and -xattr-value (or
> > > > -xattr-match for both).
> > > +1 again.
> > > It indeed makes perfect sense (and has use-cases!) where one wants to query
> > > (key OR value) AND (key AND value).
> > > 
> > > A lot useful if that provides some RegEx (for matching wildcards/patterns).
> > > 
> > > 
> > > > The thing to watch out for if both the key and value
> > > > are combined, is that if you format it like "key=value"
> > > > you need to consider the case where the key includes
> > > > "=". There might be malicious EAs trying to be tricky.
> > > True.
> > > There will always be potential for malicious attempts, yet I would suggest
> > > to apply basic "query-escaping", until more strict antitrust seems
> > > necessary?
> > > 
> > > 
> > > > My rawhide program matches EAs formatted like "key:
> > > > value" but non-ascii bytes are encoded (like \x1b or \n
> > > > or \t etc.) and any ": " in the key itself is encoded
> > > > as "\x3a " to disappoint the creators of malicious EAs.
> > > > I've never seen an EA whose key contained ": " so I
> > > > don't think it'll bother anyone.
> > > I think we're on the same page here :)
> > > Yet, one may say "xattrs are not-yet-popular enough, but you'll see, once
> > > they do..." - like with all formats and data-exchange protocols, I guess?
> > EAs/xattrs are used for various things by various
> > systems. (e.g. selinux on Linux, quarantine on macOS)
> > but user/app-specific usage probably isn't popular yet.
> > Although it can be. Some systems allow 64KiB of EA data.
> 
> Thanks for info! I can confirm that.
> btw, app-specific usage: yt-dlp supports `--xattrs` 😉️
> 
> I've put some FAQ infos together regarding xattrs (in 2024/25):
> https://github.com/ArkThis/AHAlodeck/blob/main/doc/xattr_faq.md
> 
> The FAQ also contains xattr size-constraints per filesystem I was able to
> find (or test it).
> ZFS can do 64kiB, indeed: I must admit I still want to test if PER "key
> and/or value" - or in total.
> I've done tests, attaching a ~45kB uuencoded JPG image as xattr on ZFS and
> back: MD5 match.
> 
> I love the fact that EAs are used for security purposes:
> I can assume it means relying on them is enterprise-production-stable (at
> least regarding the kernel/filesystem side)?

I think it's safe to assume that whatever the Linux kernel
and its file systems support is reliable.

> > > What if:
> > > - I search for "some term", but don't care if it's a key or value string?
> 
> > Having -xattr-key and -xattr-value wouldn't help you in
> > that case. If you didn't care if it's a key or a value,
> > then you'd need to use both predicates in your search
> > (-xattr-key ... -o -xattr-value ...).
> > 
> > But the problem I was thinking of was that, if these
> > two predicates existed, then a user might reasonably
> > expect them to be connected to each other such that using
> > both of them would refer to the same extended
> > attribute, i.e. that -xattr-key ... -a -xattr-value ...
> > would match a file with an EA whose key matched the
> > -xattr-key and whose value matched the -xattr-value.
> > But it would almost certainly match a file with an EA
> > with the matching key and an EA with the matching
> > value, but they wouldn't have to be the same EA. They
> > could be different EAs. So I think it would be
> > difficult for a user to guess the correct behaviour.
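
To illustrate the difference, here is a minimal Python sketch. It is only a model of the two possible semantics, not code from find or rawhide; the function names `unlinked` and `linked` and the sample EA names are hypothetical, and EAs are simplified to a dict of strings.

```python
from fnmatch import fnmatchcase

def unlinked(eas, key_pat, val_pat):
    """Two independent tests: some key matches AND some value matches."""
    return (any(fnmatchcase(k, key_pat) for k in eas)
            and any(fnmatchcase(v, val_pat) for v in eas.values()))

def linked(eas, key_pat, val_pat):
    """One combined test: a single EA must match both patterns."""
    return any(fnmatchcase(k, key_pat) and fnmatchcase(v, val_pat)
               for k, v in eas.items())

# A file with two EAs; "jazz" is the value of user.genre, not of user.artist.
eas = {"user.artist": "Bach", "user.genre": "jazz"}

surprising = unlinked(eas, "user.artist", "jazz")  # matches via two different EAs
expected = linked(eas, "user.artist", "jazz")      # no single EA matches both
```

With independent predicates the file is reported even though no EA has that key together with that value, which is the behaviour a user would probably not guess.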
> 
> I was thinking of defaulting to "linked" (=like `keep aspect ratio` by
> default),
> and options to declare if my intention is linked/unlinked?

But it can get messy. What if you want to search for
files with two specific EAs? Some of the
key/value-specific predicates would need to be linked
and others not. That's why I recommend not having them,
but rather creating a piece of multi-line text encoding
all of the EAs together and having a predicate that
matches against that text. It makes it possible to
perform all kinds of tests with a single predicate
as long as each EA represents a single "line" in the
encoded text (so any newline bytes in an EA key or
value must be encoded somehow e.g. \n or \x0a).
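
A rough Python sketch of that idea, purely as an illustration. It is not rawhide's actual code; the names `encode_bytes` and `encode_eas` are hypothetical, and escaping the backslash itself is an assumption of this sketch. Each EA becomes one "key: value" line, bytes outside printable ASCII are written as C-style escapes, and any ": " inside a key is written as "\x3a " so it can't fake a key/value boundary.

```python
import re

def encode_bytes(data: bytes, escape_sep: bool = False) -> str:
    """Render bytes as printable ASCII text using C-style escapes."""
    named = {0x0a: "\\n", 0x09: "\\t", 0x0d: "\\r", 0x5c: "\\\\"}
    out = []
    for b in data:
        if b in named:
            out.append(named[b])
        elif 0x20 <= b < 0x7f:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    text = "".join(out)
    if escape_sep:
        # Hide ": " inside a key so the first ": " on a line is the real separator.
        text = text.replace(": ", "\\x3a ")
    return text

def encode_eas(eas: dict) -> str:
    """One 'key: value' line per EA, sorted by key for stable output."""
    return "".join(
        f"{encode_bytes(k, escape_sep=True)}: {encode_bytes(v)}\n"
        for k, v in sorted(eas.items())
    )

eas = {
    b"user.title": b"Holiday 2025",
    b"user.note: x": b"line1\nline2",   # tricky key and a multi-line value
    b"user.blob": b"\x1b[0m",           # binary value
}
text = encode_eas(eas)

# Key and value are tied to the same EA because [^\n]* cannot cross a line.
same_ea = re.search(r"^user\.title: [^\n]*Holiday", text, re.MULTILINE)
other_ea = re.search(r"^user\.blob: [^\n]*Holiday", text, re.MULTILINE)
```

Here `same_ea` is a match and `other_ea` is `None`, so a single predicate over the encoded text covers the "linked" case without any extra options.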

> > > I have that use-case a lot: plain fulltext search in collections.
> > That use case is best served (I think) by a single
> > predicate that can match both the key and value encoded
> > in some way as I described earlier (either by
> > "key=val\n" or "key: val\n"). I chose ": " because
> > that's the format output by xattr on macOS (and
> > probably elsewhere).
> 
> Hm.
> Does that mean you're matching against the output of `xattr` for searching,
> or just that you'd like to stay pattern-compatible with it?

It's not using the xattr program. It's all in-process.
It's just constructing text vaguely similar to the
output of xattr because that's what I saw. But it's not
completely compatible with xattr output. It has its own
encoding that doesn't match what xattr does. I'm not
even sure what xattr outputs for binary values. It's
probably base 64 or something but I wanted a human
readable encoding that could be used for text and for binary
EA values (hence the C string escape sequences like \n
and \x1b).

> > That way, if you don't care whether your search
> > criteria applies to the key or the value, you can just
> > do -xattr-match 'something' and if you want to match a
> > key only you can do -regexxattr-match '^somekey: ' and
> > if you want to match a value only you can do
> > -regexxattr-match '^[^\n]+: [^\n]*someval', and to
> > match both the key and the value, you can do something
> > like -regexxattr-match '^somekey: [^\n]*someval'.
> 
> Roger that.
> Is there any overhead due to requiring regex for this, compared to
> non-pattern matching here?

Maybe. But there's always pattern matching. It's either
glob or regex. The globbing is provided either by the
system implementation of fnmatch(), or by an internal
implementation of fnmatch() (because there are weird
fnmatch bugs on many systems), and the regex is
provided by the pcre2 library. regex patterns can be
slow to match, but it depends on how you write them.
I've never noticed a problem.
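
For a feel of the two matching styles, here is a small Python sketch. Python's `fnmatch` and `re` modules stand in for the C fnmatch() and pcre2 that are actually used, so the details (and the speed) will differ; the sample text is made up.

```python
import re
from fnmatch import fnmatchcase

# Encoded EAs, one "key: value" line per EA.
text = "user.title: Holiday\nuser.genre: jazz\n"

# Glob: the pattern has to cover the whole text, so "anywhere" needs "*...*".
glob_anywhere = fnmatchcase(text, "*jazz*")
# Without a leading "*", the glob is tied to the start of the text, not of a line.
glob_key = fnmatchcase(text, "user.genre*")

# Regex: "^" with MULTILINE anchors at the start of any line.
# Compiling once and reusing the pattern keeps the per-file cost down.
key_re = re.compile(r"^user\.genre: ", re.MULTILINE)
regex_key = key_re.search(text) is not None
```

The glob finds "jazz" anywhere but can't easily say "a line starting with this key", while the regex can.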

> > The above assumes that all EAs are encoded as a single
> > piece of text with one line per EA that is matched
> > against. The choice of encoding and whether it's
> > encoded like this at all would affect the examples.
> > This sort of matching probably isn't possible with
> > globbing unless the ksh extensions are available.
> > 
> > > > I'd vote for just -xattr and -regexxattr (and maybe
> > > > -ixattr and -iregexxattr) and have it match text that
> > > > looks like "key1=val1\nkey2=val2\nkey3=val3\n" or
> > > > "key1: val1\nkey2: val2\nkey3: val3\n".
> > > I like args names that explain what they are, but just for "style"
> > > suggestion:
> > > 
> > > [quick bike-shedding]:
> > > What if, you'd call it "--xattr" and "--xxattr" or "--xattRX/xattrx"? (like
> > > the "RX" at the end is "RegEx" :P)
> > > 
> > > Just an idea...
> > I suggested names that I thought were like the existing predicate names.
> > I don't know what would be best.
> 
> Sorry, didn't know there were existing ones:
> Then of course it makes sense to keep their names.

I'm referring to the style of existing predicate names,
not to any existing xattr-related predicate names.
There aren't any of those.

> > > And I assume "ixattr" is "case Insensitive"?
> > That's right.
> > 
> > > > > - Do we need support for -printf formats to print xattr keys and/or values?
> > > > >     How? The % 1-character directives are almost all used, maybe begin with
> > > > >     the reserved ones? (%{ %[ %(
> > > > >     How to output several xattr keys or values?
> > > > >     How to select which xattr value to print?
> > > > >       find -xattr -printf '%p %{xattr-keys=hello*} %{xattr-valuekey=hello}
> > > > There are plenty of conversion letters available!
> > > > 
> > > > See https://savannah.gnu.org/bugs/?64100
> > > Having quickly read over that thread, I totally agree "IF those features be
> > > added, IF possible, it'd be great if rawhide and find could stay
> > > syntax-compatible".
> > > 
> > > Also, I think (If I understood correctly) that some common-library-printf
> > > formatting syntax may save A LOT of re-occurring bash-or-python-style
> > > "formatting-caller-works-on-my-machine" scripts, I guess?
> > > But I'm far out on a limb here; I've not really read the thread, and I've
> > > never used find-output formatting (as I didn't know it had any).
> > Only GNU find has it.
> > 
> > > > > There's a lot of possibilities, and I don't want to introduce something
> > > > > which contradicts the typical use cases, or is not extensible or not
> > > > > maintainable.
> > > > > The complexity - especially when it comes to -printf - is that files can have
> > > > > several xattr while all other attributes (name, timestamps, permissions, size,
> > > > > etc.) only exists once.
> > > > My rawhide program also has %j for JSON output. The
> > > > current version outputs EAs like %x but the next
> > > > version outputs them as a JSON object with key/value
> > > > pairs matching the EA names and values. It's still
> > > > encoded because EA values can be text or binary and
> > > > JSON can't represent binary without the user choosing
> > > > some encoding. So that's something to think about.
> > > If any output data is available in a machine readable format (like JSON),
> > > it'd definitely improve the re-usability and interoperability between tools
> > > (and scripts) written around find/rawhide, IMO (and experience).
> > > 
> > > > > I think I asked this kind of questions already some years ago, but there was
> > > > > no input yet.
> > > > > I personally don't have a use case to search for xattrs, so the usual pattern
> > > > > with '-exec getfattr ...' works for me.
> > > > > Anyone?
> > > ME! Meee! :)
> > > As I said, I'm doing devops for digital-and-physical GLAM-collections - and
> > > a lot of metadata layout, storage and retrieval/usage.
> > > 
> > > For example:
> > > After de-embedding (=copying as-is) all exiftool-readable metadata into
> > > xattrs, being able to do some queries on those key/value pairs is AMAZING!
> > > 
> > > Combining that with RDF URIs for keys, combined with Wikibase-and-Wikidata
> > > engine and data, I have many great use cases for xattrs (and find).
> > That sounds cool. A much more interesting use for EAs than
> > just selinux or quarantining. :-)
> 
> :)
> 
> I really like the fact that once I can rely more on xattrs use,
> **any file-format ever is basically irrelevant from now on, for "all things
> metadata"**.
> And this feels really good!

Just make sure that you copy files in a way that preserves EAs.
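
For example, with the usual tools that means something like `cp --preserve=xattr` or `rsync -X`. In a script, a minimal Python sketch could look like the one below. It relies on `shutil.copy2()`, which on Linux also copies extended attributes where possible; the helper names `read_xattrs` and `copy_keeping_xattrs` and the file names are hypothetical, and Python's `os.listxattr`/`os.getxattr` exist on Linux only, so elsewhere the check simply reports nothing.

```python
import os
import shutil
import tempfile

def read_xattrs(path):
    """Return {name: value} for a file, or {} where xattrs are unavailable."""
    if not hasattr(os, "listxattr"):   # os.*xattr functions are Linux-only
        return {}
    try:
        return {name: os.getxattr(path, name) for name in os.listxattr(path)}
    except OSError:
        return {}

def copy_keeping_xattrs(src, dst):
    """Copy data and metadata, then list EA names that did not arrive."""
    shutil.copy2(src, dst)   # copies EAs on Linux where the target allows it
    missing = set(read_xattrs(src)) - set(read_xattrs(dst))
    return sorted(missing)

# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "photo.jpg")
    dst = os.path.join(tmp, "photo-copy.jpg")
    with open(src, "wb") as f:
        f.write(b"example data")
    lost = copy_keeping_xattrs(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()
```

A non-empty `lost` list would mean the destination (or the copy method) dropped some EAs, which is worth checking before relying on them for metadata.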

> Like: Why must I choose another file-format, if I simply want to give "my
> file/data" annotation information?
> (BMP vs PNG vs TIFF vs ... ".txt" regarding storing "their title"?)
> 
> Friend of mine has written a small "updatedb/locate" version with xattr-only
> support a few days ago for testing:
> He ran it on a collection of mixed-format, real-world annotated music
> collection (~14.000 files) - which have all exiftool metadata as xattrs:
> 13917 files took 1.416 seconds to index.
> 
> The index.db entries are simple CSV, with: `"filename", "xattr-key",
> "xattr-value"`
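
An index in that three-column shape could be sketched in Python roughly as follows. This is not the friend's actual tool, which isn't shown in this thread; the names `index_rows` and `write_index`, the sample data, and the choice to quote every field are assumptions. The xattr reader is passed in as a function so the sketch runs anywhere; on Linux it could be built on `os.listxattr()` and `os.getxattr()`.

```python
import csv
import io

def index_rows(paths, get_xattrs):
    """Yield one (filename, xattr-key, xattr-value) row per EA."""
    for path in paths:
        for key, value in sorted(get_xattrs(path).items()):
            yield path, key, value

def write_index(rows, out):
    """Write the rows as CSV with every field quoted."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)

# Stand-in for real files: filename -> {xattr-key: xattr-value}.
fake_files = {
    "a.flac": {"user.artist": "Bach", "user.title": "Air"},
    "b.mp3": {},   # a file without EAs contributes no rows
}

buf = io.StringIO()
write_index(index_rows(fake_files, fake_files.get), buf)
index_text = buf.getvalue()
```

Binary EA values would need some text encoding before they go into a CSV like this.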
> 
> Using xattrs already feels more streamlined, consistent and stable than any
> MAM/DAM metadata handling paradigm I've seen so far... ;)
> 
> I've rediscovered xattrs btw, when evaluating object storage implementations
> (FOSS) for large-scale data handling:
> 
> When I found out that S3 has very limited size for tags (127/256 chars) and
> other limitations (<=4kB) for metadata, or ASCII-only in popular
> implementations. See: [Minio issue #19576: "non ASCII characters in
> metadata"](https://github.com/minio/minio/discussions/19576)
> 
> ...and pre-S3 Object Storages use the underlying filesystem's xattrs for
> their object-metadata! ;)

That's a piece of good luck, but if S3 doesn't support binary EAs, that's
a bit sad.

> > > > > P.S. Finally, I wouldn't want to introduce too much and complex code for
> > > > > 0.25% use cases.  Eventually, the simple test to search for files having
> > > > > xattrs is enough, and the application logic then extracts them with another
> > > > > tool.
> > > Is this about the code to handle xattrs (in `gnu-find`)?
> > > There are libs for plain access to xattr key/value, but is there any
> > > xattr-libs for higher-level functions (like queries, regex, etc)?
> > I think the actual searching would be done by what find is already doing
> > (glob, regex).
> 
> Roger.
> 
> > > Like what HaikuOS (probably?) does in their BeFS filesystem libraries?
> > > A friend of mine (author of "https://sen-labs.org/": a semantic desktop
> > > engine for Haiku) told me that BeFS has some database-like query functions
> > > built-into their filesystem "xattrs".
> > > 
> > > I know any additional library-dependence is another "whole thing" to
> > > maintain.
> > > However, IMO being able to query key/value data "on the spot" is something
> > > that will eventually replace the legacy "files-in-folders open/save dialogs
> > > - and wildcards" as-we-know-it...?
> > > 
> > > Might be an incentive to invest in a common, and commonly supported
> > > "attribute-handling" lib.
> > > Just 2¢ of mine.
> > > 
> > > I just like interoperable and open tech-systems :)
> > > 
> > > 
> > > > I don't think it's necessarily too complex. The code in
> > > > rawhide to obtain EAs is 492 lines (335 excluding
> > > > blank lines, 278 excluding comment lines as well) for
> > > > Linux, macOS, Cygwin, Solaris, and FreeBSD (OpenBSD and
> > > > NetBSD don't have EAs). But of course, that's only the
> > > > start.
> > > Including so many OSs already in your consideration(s), wouldn't that
> > > already suffice for a common lib?
> > > Ignore and correct me, if I'm totally on the wrong track here?
> > > Thanks :)
> > rawhide is GPLv3+ so the code is available for a
> > library but if potential clients don't like my choices
> > for encoding the EA data it might not be suitable for
> > their needs. Encoding is always tricky to get right,
> > and maybe I didn't. And it's probably unwise to use a
> > third-party library until it has become popular enough
> > to be packaged on lots of systems. And GNU find
> > supports many more systems than rawhide does, and I
> > don't know which other systems support EAs. I only
> > support systems that I have real or virtual machines
> > for.
> 
> Thanks for those interesting insights!
> Indeed, good point: no lib-dependency until "popular enough".
> 
> Kind regards, Peter

cheers,
raf

