On 2026-04-29 08:51, Radosław Korzeniewski wrote:
The idea of an "incremental accelerator" for common local filesystems is
great.
But the inotify API is not the best solution for this case IMVHO, YMMV. The
main issue is scalability.

True. IMHO, this is a quick and dirty solution which can be acceptable for
some use cases. I consider the reliability of a backup system too important,
and a failure that causes a file to be missed unforgivable.

You want to use "incremental accelerator" for all your high volume storage,
but inotify functionality does not scale.
The more directories you want to watch with inotify, the more resources it
requires. A tradeoff, sure.

Yes, it doesn't scale. Still, it can monitor millions of files.
For several million files, it will probably consume several hundred
megabytes (at least), and this will be unswappable memory.
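
For a rough feel of the numbers, the sketch below reads the inotify limits
from /proc/sys/fs/inotify and estimates the kernel memory pinned by a given
number of watches. The per-watch cost of about 1 KiB is an assumption (a
ballpark for 64-bit kernels, not a figure from this thread), and the helper
names are made up for illustration. Note that a watch is placed on a
directory or on a single file, so the watch count is not the same as the
file count.

```python
import os

# ASSUMPTION: rough per-watch kernel memory cost on 64-bit; not a measured value.
ASSUMED_BYTES_PER_WATCH = 1024


def read_inotify_limit(name, base="/proc/sys/fs/inotify"):
    """Return the integer limit stored in base/name, or None if unavailable."""
    try:
        with open(os.path.join(base, name)) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        # Not Linux, no inotify support, or unexpected file content.
        return None


def estimate_watch_memory_mib(watches, bytes_per_watch=ASSUMED_BYTES_PER_WATCH):
    """Estimate the kernel memory (in MiB) pinned by `watches` watches."""
    return watches * bytes_per_watch / (1024 * 1024)


if __name__ == "__main__":
    for name in ("max_user_watches", "max_queued_events"):
        print(name, read_inotify_limit(name))
    print(estimate_watch_memory_mib(1_000_000), "MiB for one million watches")
```

The real per-watch cost depends on the kernel version and architecture, so
treat the result as an order-of-magnitude estimate only.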

From what I remember (I read this is still the case), there is a much
worse problem with inotify - it doesn't throttle. When a queue overflow
occurs, events get dropped and for most use cases, this is a disaster.
There is no retry and no recovery from this.
The kernel does send a notification, but only about the overflow itself,
without detailed info about each lost event.
This means that events could slip through your fingers unnoticed.

Even the inotify(7) man page warns about inconsistencies due to race
conditions.

There are other potential issues as well. The man page says:
-----BEGIN-----
 If a filesystem is mounted on top of a monitored directory, no event is
 generated, and no events are generated for objects immediately under
 the new mount point. If the filesystem is subsequently unmounted,
 events will subsequently be generated for the directory and the objects
 it contains.
-----END-----
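
One possible way to notice this case, sketched below, is to remember the
device ID of every watched directory when the watch is registered and to
compare it again before trusting the collected events. This is only an
assumption about how one could work around the problem, not something
inotify offers, and the function names are hypothetical. It also won't
catch every case (for example a bind mount from the same filesystem keeps
the same device ID).

```python
import os


def snapshot_devices(paths):
    """Record the device ID (st_dev) of each watched directory."""
    return {path: os.stat(path).st_dev for path in paths}


def find_remounted(snapshot):
    """Return watched paths whose device ID changed or that vanished.

    A changed st_dev suggests that a filesystem was mounted on top of
    (or unmounted from) the directory since the snapshot was taken.
    """
    suspicious = []
    for path, dev in snapshot.items():
        try:
            if os.stat(path).st_dev != dev:
                suspicious.append(path)
        except OSError:
            # The directory is gone or unreadable; treat it as suspicious too.
            suspicious.append(path)
    return suspicious
```

Any path returned by find_remounted() would have to be rescanned in full,
since no events were delivered for it while the mount was in place.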


I did a prototype of this feature years ago (no coding LLM existed yet) and
it wasn't worth the effort. The architecture was different: when the fd
starts, it spawns a special thread and registers the required watches from
the config file, so an incremental job can ask this thread what changed.
It was a never-ending story of issues, race conditions, edge cases, etc.

Today there is a modern and scalable fanotify API (or FSEvents on macOS
or the USN Change Journal on Windows) which should be used instead, again
IMVHO, YMMV.

Fanotify might be more scalable and reliable but it can still lose events.
No guaranteed completeness - the stream of events you receive is not
guaranteed to represent all filesystem activity that actually happened.

Its man page fanotify(7) states: "The event queue can overflow. In this case,
events are lost."

And from what I understood, just like with inotify, you get a single overflow
notification, so you know that a loss occurred but you don't know what.
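
The practical consequence is that a consumer of such events can only use
them for an incremental backup while no overflow has been reported; after
an overflow, the only safe option is a full scan. Below is a minimal sketch
of that bookkeeping. The class and method names are hypothetical and the
code is independent of how the events are actually obtained (inotify,
fanotify or anything else).

```python
class ChangeTracker:
    """Collect changed paths and remember whether the list is complete."""

    def __init__(self):
        self._changed = set()
        self._complete = True

    def record(self, path):
        """Note that `path` was reported as changed."""
        self._changed.add(path)

    def mark_overflow(self):
        """Note that the kernel dropped events; the list is now unreliable."""
        self._complete = False

    def plan(self):
        """Return the sorted changed paths, or None if a full scan is needed."""
        if not self._complete:
            return None
        return sorted(self._changed)

    def reset(self):
        """Start a new tracking period, e.g. after a successful backup."""
        self._changed.clear()
        self._complete = True


tracker = ChangeTracker()
tracker.record("/data/b.txt")
tracker.record("/data/a.txt")
print(tracker.plan())      # the list of changed paths
tracker.mark_overflow()
print(tracker.plan())      # None: fall back to a full scan
```

A single overflow invalidates the whole tracking period, because there is
no way to tell which paths the lost events referred to.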

In short, there are many cases where fanotify is better than inotify but
it comes with some other limitations and design choices that should be
considered when using it.


Regards

--
Josip Deanovic

