Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-07-30 Thread Beata Michalska
On 07/22/2015 05:55 PM, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> Some comments below.
> 
> On Tuesday, June 16, 2015 03:09:30 PM Beata Michalska wrote:
>> Introduce configurable generic interface for file
>> system-wide event notifications, to provide file
>> systems with a common way of reporting any potential
>> issues as they emerge.
>>
>> The notifications are to be issued through generic
>> netlink interface by newly introduced multicast group.
>>
>> Threshold notifications have been included, allowing
>> triggering an event whenever the amount of free space drops
>> below a certain level - or levels to be more precise as two
>> of them are being supported: the lower and the upper range.
>> The notifications work both ways: once the threshold level
>> has been reached, an event shall be generated whenever
>> the number of available blocks goes up again re-activating
>> the threshold.
>>
>> The interface has been exposed through a vfs. Once mounted,
>> it serves as an entry point for the set-up where one can
>> register for particular file system events.
>>
>> Signed-off-by: Beata Michalska 
>> ---
>>  Documentation/filesystems/events.txt |  232 ++
>>  fs/Kconfig                           |    2 +
>>  fs/Makefile                          |    1 +
>>  fs/events/Kconfig                    |    7 +
>>  fs/events/Makefile                   |    5 +
>>  fs/events/fs_event.c                 |  809 ++
>>  fs/events/fs_event.h                 |   22 +
>>  fs/events/fs_event_netlink.c         |  104 +
>>  fs/namespace.c                       |    1 +
>>  include/linux/fs.h                   |    6 +-
>>  include/linux/fs_event.h             |   72 +++
>>  include/uapi/linux/Kbuild            |    1 +
>>  include/uapi/linux/fs_event.h        |   58 +++
>>  13 files changed, 1319 insertions(+), 1 deletion(-)
>>  create mode 100644 Documentation/filesystems/events.txt
>>  create mode 100644 fs/events/Kconfig
>>  create mode 100644 fs/events/Makefile
>>  create mode 100644 fs/events/fs_event.c
>>  create mode 100644 fs/events/fs_event.h
>>  create mode 100644 fs/events/fs_event_netlink.c
>>  create mode 100644 include/linux/fs_event.h
>>  create mode 100644 include/uapi/linux/fs_event.h
>>
>> diff --git a/Documentation/filesystems/events.txt b/Documentation/filesystems/events.txt
>> new file mode 100644
>> index 000..c2e6227
>> --- /dev/null
>> +++ b/Documentation/filesystems/events.txt
>> @@ -0,0 +1,232 @@
>> +
>> +Generic file system event notification interface
>> +
>> +Document created 23 April 2015 by Beata Michalska 
>> +
>> +1. The reason behind:
>> +=
>> +
>> +There are many corner cases when things might get messy with the filesystems.
>> +And it is not always obvious what and when went wrong. Sometimes you might
>> +get some subtle hints that there is something going on - but by the time
>> +you realise it, it might be too late as you are already out-of-space
>> +or the filesystem has been remounted as read-only (i.e.). The generic
>> +interface for the filesystem events fills the gap by providing a rather
>> +easy way of real-time notifications triggered whenever something interesting
>> +happens, allowing filesystems to report events in a common way, as they occur.
>> +
>> +2. How does it work:
>> +
>> +
>> +The interface itself has been exposed as fstrace-type Virtual File System,
>> +primarily to ease the process of setting up the configuration for the
>> +notifications. So for starters, it needs to get mounted (obviously):
>> +
>> +mount -t fstrace none /sys/fs/events
>> +
>> +This will unveil the single fstrace filesystem entry - the 'config' file,
>> +through which the notification are being set-up.
> 
> The patch creates a separate virtual filesystem for a single file;
> this is overkill IMHO and a new sysfs or debugfs entry should
> be sufficient.
> 
>> +
>> +Activating notifications for particular filesystem is as straightforward
>> +as writing into the 'config' file. Note that by default all events, despite
>> +the actual filesystem type, are being disregarded.
>> +
>> +Synopsis of config:
>> +--
>> +
>> +MOUNT EVENT_TYPE [L1] [L2]
> 
> OTOH, why not use the advantages of having a separate virtual
> filesystem and create separate directories for each mount point
> (+ maybe even extra parent directories for mount namespaces) and
> put separate entries for each event type in these directories?
> 
> This would also allow usage of the eventfd() notification interface
> on such files.
> 
> Please take a look at:
> 
> tools/cgroup/cgroup_event_listener.c
> 
> and
> 
> Documentation/cgroups/memcg_test.txt (point 9.10)
> 
> to see how much easier it is to observe memory usage thresholds
> on memory cgroups compared to available blocks on filesystems
> using fs events.
> 

I'll give it some thoughts as the solution you are proposing eliminates
some issues related with the generic netlink (mostly the one concerning the
network namespaces) though I'd rather avoid creating numerous entries
for each mount/mount namespace. I guess the 

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-07-22 Thread Bartlomiej Zolnierkiewicz

Hi,

Some comments below.

On Tuesday, June 16, 2015 03:09:30 PM Beata Michalska wrote:
> Introduce configurable generic interface for file
> system-wide event notifications, to provide file
> systems with a common way of reporting any potential
> issues as they emerge.
> 
> The notifications are to be issued through generic
> netlink interface by newly introduced multicast group.
> 
> Threshold notifications have been included, allowing
> triggering an event whenever the amount of free space drops
> below a certain level - or levels to be more precise as two
> of them are being supported: the lower and the upper range.
> The notifications work both ways: once the threshold level
> has been reached, an event shall be generated whenever
> the number of available blocks goes up again re-activating
> the threshold.
> 
> The interface has been exposed through a vfs. Once mounted,
> it serves as an entry point for the set-up where one can
> register for particular file system events.
> 
> Signed-off-by: Beata Michalska 
> ---
>  Documentation/filesystems/events.txt |  232 ++
>  fs/Kconfig                           |    2 +
>  fs/Makefile                          |    1 +
>  fs/events/Kconfig                    |    7 +
>  fs/events/Makefile                   |    5 +
>  fs/events/fs_event.c                 |  809 ++
>  fs/events/fs_event.h                 |   22 +
>  fs/events/fs_event_netlink.c         |  104 +
>  fs/namespace.c                       |    1 +
>  include/linux/fs.h                   |    6 +-
>  include/linux/fs_event.h             |   72 +++
>  include/uapi/linux/Kbuild            |    1 +
>  include/uapi/linux/fs_event.h        |   58 +++
>  13 files changed, 1319 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/filesystems/events.txt
>  create mode 100644 fs/events/Kconfig
>  create mode 100644 fs/events/Makefile
>  create mode 100644 fs/events/fs_event.c
>  create mode 100644 fs/events/fs_event.h
>  create mode 100644 fs/events/fs_event_netlink.c
>  create mode 100644 include/linux/fs_event.h
>  create mode 100644 include/uapi/linux/fs_event.h
> 
> diff --git a/Documentation/filesystems/events.txt b/Documentation/filesystems/events.txt
> new file mode 100644
> index 000..c2e6227
> --- /dev/null
> +++ b/Documentation/filesystems/events.txt
> @@ -0,0 +1,232 @@
> +
> + Generic file system event notification interface
> +
> +Document created 23 April 2015 by Beata Michalska 
> +
> +1. The reason behind:
> +=
> +
> +There are many corner cases when things might get messy with the filesystems.
> +And it is not always obvious what and when went wrong. Sometimes you might
> +get some subtle hints that there is something going on - but by the time
> +you realise it, it might be too late as you are already out-of-space
> +or the filesystem has been remounted as read-only (i.e.). The generic
> +interface for the filesystem events fills the gap by providing a rather
> +easy way of real-time notifications triggered whenever something interesting
> +happens, allowing filesystems to report events in a common way, as they occur.
> +
> +2. How does it work:
> +
> +
> +The interface itself has been exposed as fstrace-type Virtual File System,
> +primarily to ease the process of setting up the configuration for the
> +notifications. So for starters, it needs to get mounted (obviously):
> +
> + mount -t fstrace none /sys/fs/events
> +
> +This will unveil the single fstrace filesystem entry - the 'config' file,
> +through which the notification are being set-up.

The patch creates a separate virtual filesystem for a single file;
this is overkill IMHO and a new sysfs or debugfs entry should
be sufficient.

> +
> +Activating notifications for particular filesystem is as straightforward
> +as writing into the 'config' file. Note that by default all events, despite
> +the actual filesystem type, are being disregarded.
> +
> +Synopsis of config:
> +--
> +
> + MOUNT EVENT_TYPE [L1] [L2]

OTOH, why not use the advantages of having a separate virtual
filesystem and create separate directories for each mount point
(+ maybe even extra parent directories for mount namespaces) and
put separate entries for each event type in these directories?

This would also allow usage of the eventfd() notification interface
on such files.

Please take a look at:

tools/cgroup/cgroup_event_listener.c

and

Documentation/cgroups/memcg_test.txt (point 9.10)

to see how much easier it is to observe memory usage thresholds
on memory cgroups compared to available blocks on filesystems
using fs events.

Also, while at it, please add your example user-space code (posted
on request in some other mail) to tools/fs_events/ (preferably
in a separate patch).
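
For reference, a minimal sketch of the eventfd-based threshold registration
used by the cgroup v1 memory controller, which is the model being pointed to
above. It illustrates the existing memcg interface only, not the proposed
fs events one. The helper names (event_control_line, wait_for_threshold) are
made up for this illustration, and os.eventfd() assumes Python 3.10+ on Linux.

```python
import os


def event_control_line(event_fd: int, target_fd: int, threshold: int) -> str:
    """Format the registration string written to cgroup.event_control.

    cgroup v1 expects: "<event_fd> <fd of the watched file> <threshold>".
    """
    return f"{event_fd} {target_fd} {threshold}"


def wait_for_threshold(cgroup_dir: str, threshold_bytes: int) -> int:
    """Block until memory usage in cgroup_dir crosses threshold_bytes.

    Hypothetical helper: cgroup_dir must be a cgroup v1 memory cgroup
    directory. Returns the eventfd counter value read on wake-up.
    """
    efd = os.eventfd(0)  # Python 3.10+, Linux only
    cfd = os.open(os.path.join(cgroup_dir, "memory.usage_in_bytes"), os.O_RDONLY)
    try:
        # Register the threshold: the kernel signals efd when it is crossed.
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(event_control_line(efd, cfd, threshold_bytes))
        return os.eventfd_read(efd)  # blocks until the kernel signals
    finally:
        os.close(cfd)
        os.close(efd)
```

A per-mount, per-event-type file layout would let a listener follow the same
pattern: one eventfd per watched entry, and no netlink socket handling.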

> +
> + MOUNT  : the filesystem's mount point
> + EVENT_TYPE : event types - currently two of them are being supported:
> +
> +   * generic events ("G") covering most common warnings

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-26 Thread Beata Michalska
On 06/24/2015 06:26 PM, Steve French wrote:
> On Wed, Jun 24, 2015 at 10:31 AM, Beata Michalska
>  wrote:
>> On 06/24/2015 10:47 AM, Dmitry Monakhov wrote:
>>> Beata Michalska  writes:
>>>
 Introduce configurable generic interface for file
 system-wide event notifications, to provide file
 systems with a common way of reporting any potential
 issues as they emerge.

 The notifications are to be issued through generic
 netlink interface by newly introduced multicast group.

 Threshold notifications have been included, allowing
 triggering an event whenever the amount of free space drops
 below a certain level - or levels to be more precise as two
 of them are being supported: the lower and the upper range.
 The notifications work both ways: once the threshold level
 has been reached, an event shall be generated whenever
 the number of available blocks goes up again re-activating
 the threshold.

 The interface has been exposed through a vfs. Once mounted,
 it serves as an entry point for the set-up where one can
 register for particular file system events.

 Signed-off-by: Beata Michalska 
 ---
  Documentation/filesystems/events.txt |  232 ++
  fs/Kconfig                           |    2 +
  fs/Makefile                          |    1 +
  fs/events/Kconfig                    |    7 +
  fs/events/Makefile                   |    5 +
  fs/events/fs_event.c                 |  809 ++
  fs/events/fs_event.h                 |   22 +
  fs/events/fs_event_netlink.c         |  104 +
  fs/namespace.c                       |    1 +
  include/linux/fs.h                   |    6 +-
  include/linux/fs_event.h             |   72 +++
  include/uapi/linux/Kbuild            |    1 +
  include/uapi/linux/fs_event.h        |   58 +++
  13 files changed, 1319 insertions(+), 1 deletion(-)
  create mode 100644 Documentation/filesystems/events.txt
  create mode 100644 fs/events/Kconfig
  create mode 100644 fs/events/Makefile
  create mode 100644 fs/events/fs_event.c
  create mode 100644 fs/events/fs_event.h
  create mode 100644 fs/events/fs_event_netlink.c
  create mode 100644 include/linux/fs_event.h
  create mode 100644 include/uapi/linux/fs_event.h

 diff --git a/Documentation/filesystems/events.txt b/Documentation/filesystems/events.txt
 new file mode 100644
 index 000..c2e6227
 --- /dev/null
 +++ b/Documentation/filesystems/events.txt
 @@ -0,0 +1,232 @@
 +
 +Generic file system event notification interface
 +
 +Document created 23 April 2015 by Beata Michalska 
 
 +
 +1. The reason behind:
 +=
 +
 +There are many corner cases when things might get messy with the filesystems.
 +And it is not always obvious what and when went wrong. Sometimes you might
 +get some subtle hints that there is something going on - but by the time
 +you realise it, it might be too late as you are already out-of-space
 +or the filesystem has been remounted as read-only (i.e.). The generic
 +interface for the filesystem events fills the gap by providing a rather
 +easy way of real-time notifications triggered whenever something interesting
 +happens, allowing filesystems to report events in a common way, as they occur.
 +
 +2. How does it work:
 +
 +
 +The interface itself has been exposed as fstrace-type Virtual File System,
 +primarily to ease the process of setting up the configuration for the
 +notifications. So for starters, it needs to get mounted (obviously):
 +
 +mount -t fstrace none /sys/fs/events
 +
 +This will unveil the single fstrace filesystem entry - the 'config' file,
 +through which the notification are being set-up.
 +
 +Activating notifications for particular filesystem is as straightforward
 +as writing into the 'config' file. Note that by default all events, despite
 +the actual filesystem type, are being disregarded.
 +
 +Synopsis of config:
 +--
 +
 +MOUNT EVENT_TYPE [L1] [L2]
 +
 + MOUNT  : the filesystem's mount point
 + EVENT_TYPE : event types - currently two of them are being supported:
 +
 +  * generic events ("G") covering most common warnings
 +  and errors that might be reported by any filesystem;
 +  this option does not take any arguments;
 +
 +  * threshold notifications ("T") - events sent whenever
 +  the amount of available space drops below certain level;
 +  it is possible to specify two threshold levels though
 +  only one is required to properly setup the notifications;
 +  as those refer to the number of available blocks, the lower
 +  level [L1] needs to be higher than the upper one [L2]
 +
 +Sample request could look like the following:
 +
 + echo /sample/mount/point G T 71 50 > /sys/fs/events/config
 +
 +Multiple request might be specified provided they are separated with semicolon.
 +
 +The configuration itself might 
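
The request format quoted above (MOUNT EVENT_TYPE [L1] [L2]) can be sketched
as a small user-space helper. This is only an illustration under stated
assumptions: build_request and build_config are hypothetical names, the
semicolon is emitted without surrounding whitespace, and the only check is the
documented rule that L1 must be higher than L2.

```python
from typing import Optional, Sequence


def build_request(mount: str, generic: bool = False,
                  l1: Optional[int] = None, l2: Optional[int] = None) -> str:
    """Build one request in the documented MOUNT EVENT_TYPE [L1] [L2] form."""
    if l2 is not None and l1 is None:
        raise ValueError("L2 given without L1")
    # Levels count available blocks, so the first level must be the higher one.
    if l1 is not None and l2 is not None and l1 <= l2:
        raise ValueError("L1 must be higher than L2")
    parts = [mount]
    if generic:
        parts.append("G")  # generic events take no arguments
    if l1 is not None:
        parts += ["T", str(l1)]  # threshold events: one level is required
        if l2 is not None:
            parts.append(str(l2))
    if len(parts) == 1:
        raise ValueError("no event type requested")
    return " ".join(parts)


def build_config(requests: Sequence[str]) -> str:
    """Join several requests with semicolons (exact spacing is an assumption)."""
    return ";".join(requests)
```

Writing the string returned by build_request("/sample/mount/point",
generic=True, l1=71, l2=50) into /sys/fs/events/config would correspond to the
echo sample above.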

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-24 Thread Steve French
On Wed, Jun 24, 2015 at 10:31 AM, Beata Michalska
 wrote:
> On 06/24/2015 10:47 AM, Dmitry Monakhov wrote:
>> Beata Michalska  writes:
>>
>>> Introduce configurable generic interface for file
>>> system-wide event notifications, to provide file
>>> systems with a common way of reporting any potential
>>> issues as they emerge.
>>>
>>> The notifications are to be issued through generic
>>> netlink interface by newly introduced multicast group.
>>>
>>> Threshold notifications have been included, allowing
>>> triggering an event whenever the amount of free space drops
>>> below a certain level - or levels to be more precise as two
>>> of them are being supported: the lower and the upper range.
>>> The notifications work both ways: once the threshold level
>>> has been reached, an event shall be generated whenever
>>> the number of available blocks goes up again re-activating
>>> the threshold.
>>>
>>> The interface has been exposed through a vfs. Once mounted,
>>> it serves as an entry point for the set-up where one can
>>> register for particular file system events.
>>>
>>> Signed-off-by: Beata Michalska 
>>> ---
>>>  Documentation/filesystems/events.txt |  232 ++
>>>  fs/Kconfig                           |    2 +
>>>  fs/Makefile                          |    1 +
>>>  fs/events/Kconfig                    |    7 +
>>>  fs/events/Makefile                   |    5 +
>>>  fs/events/fs_event.c                 |  809 ++
>>>  fs/events/fs_event.h                 |   22 +
>>>  fs/events/fs_event_netlink.c         |  104 +
>>>  fs/namespace.c                       |    1 +
>>>  include/linux/fs.h                   |    6 +-
>>>  include/linux/fs_event.h             |   72 +++
>>>  include/uapi/linux/Kbuild            |    1 +
>>>  include/uapi/linux/fs_event.h        |   58 +++
>>>  13 files changed, 1319 insertions(+), 1 deletion(-)
>>>  create mode 100644 Documentation/filesystems/events.txt
>>>  create mode 100644 fs/events/Kconfig
>>>  create mode 100644 fs/events/Makefile
>>>  create mode 100644 fs/events/fs_event.c
>>>  create mode 100644 fs/events/fs_event.h
>>>  create mode 100644 fs/events/fs_event_netlink.c
>>>  create mode 100644 include/linux/fs_event.h
>>>  create mode 100644 include/uapi/linux/fs_event.h
>>>
>>> diff --git a/Documentation/filesystems/events.txt b/Documentation/filesystems/events.txt
>>> new file mode 100644
>>> index 000..c2e6227
>>> --- /dev/null
>>> +++ b/Documentation/filesystems/events.txt
>>> @@ -0,0 +1,232 @@
>>> +
>>> +Generic file system event notification interface
>>> +
>>> +Document created 23 April 2015 by Beata Michalska 
>>> +
>>> +1. The reason behind:
>>> +=
>>> +
>>> +There are many corner cases when things might get messy with the filesystems.
>>> +And it is not always obvious what and when went wrong. Sometimes you might
>>> +get some subtle hints that there is something going on - but by the time
>>> +you realise it, it might be too late as you are already out-of-space
>>> +or the filesystem has been remounted as read-only (i.e.). The generic
>>> +interface for the filesystem events fills the gap by providing a rather
>>> +easy way of real-time notifications triggered whenever something interesting
>>> +happens, allowing filesystems to report events in a common way, as they occur.
>>> +
>>> +2. How does it work:
>>> +
>>> +
>>> +The interface itself has been exposed as fstrace-type Virtual File System,
>>> +primarily to ease the process of setting up the configuration for the
>>> +notifications. So for starters, it needs to get mounted (obviously):
>>> +
>>> +mount -t fstrace none /sys/fs/events
>>> +
>>> +This will unveil the single fstrace filesystem entry - the 'config' file,
>>> +through which the notification are being set-up.
>>> +
>>> +Activating notifications for particular filesystem is as straightforward
>>> +as writing into the 'config' file. Note that by default all events, despite
>>> +the actual filesystem type, are being disregarded.
>>> +
>>> +Synopsis of config:
>>> +--
>>> +
>>> +MOUNT EVENT_TYPE [L1] [L2]
>>> +
>>> + MOUNT  : the filesystem's mount point
>>> + EVENT_TYPE : event types - currently two of them are being supported:
>>> +
>>> +  * generic events ("G") covering most common warnings
>>> +  and errors that might be reported by any filesystem;
>>> +  this option does not take any arguments;
>>> +
>>> +  * threshold notifications ("T") - events sent whenever
>>> +  the amount of available space drops below certain level;
>>> +  it is possible to specify two threshold levels though
>>> +  only one is required to properly setup the notifications;
>>> +  as those refer to the number of available blocks, the lower
>>> +  level [L1] needs to be higher than the upper one [L2]
>>> +
>>> +Sample request could look like the following:

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-24 Thread Beata Michalska
On 06/24/2015 10:47 AM, Dmitry Monakhov wrote:
> Beata Michalska  writes:
> 
>> Introduce configurable generic interface for file
>> system-wide event notifications, to provide file
>> systems with a common way of reporting any potential
>> issues as they emerge.
>>
>> The notifications are to be issued through generic
>> netlink interface by newly introduced multicast group.
>>
>> Threshold notifications have been included, allowing
>> triggering an event whenever the amount of free space drops
>> below a certain level - or levels to be more precise as two
>> of them are being supported: the lower and the upper range.
>> The notifications work both ways: once the threshold level
>> has been reached, an event shall be generated whenever
>> the number of available blocks goes up again re-activating
>> the threshold.
>>
>> The interface has been exposed through a vfs. Once mounted,
>> it serves as an entry point for the set-up where one can
>> register for particular file system events.
>>
>> Signed-off-by: Beata Michalska 
>> ---
>>  Documentation/filesystems/events.txt |  232 ++
>>  fs/Kconfig                           |    2 +
>>  fs/Makefile                          |    1 +
>>  fs/events/Kconfig                    |    7 +
>>  fs/events/Makefile                   |    5 +
>>  fs/events/fs_event.c                 |  809 ++
>>  fs/events/fs_event.h                 |   22 +
>>  fs/events/fs_event_netlink.c         |  104 +
>>  fs/namespace.c                       |    1 +
>>  include/linux/fs.h                   |    6 +-
>>  include/linux/fs_event.h             |   72 +++
>>  include/uapi/linux/Kbuild            |    1 +
>>  include/uapi/linux/fs_event.h        |   58 +++
>>  13 files changed, 1319 insertions(+), 1 deletion(-)
>>  create mode 100644 Documentation/filesystems/events.txt
>>  create mode 100644 fs/events/Kconfig
>>  create mode 100644 fs/events/Makefile
>>  create mode 100644 fs/events/fs_event.c
>>  create mode 100644 fs/events/fs_event.h
>>  create mode 100644 fs/events/fs_event_netlink.c
>>  create mode 100644 include/linux/fs_event.h
>>  create mode 100644 include/uapi/linux/fs_event.h
>>
>> diff --git a/Documentation/filesystems/events.txt b/Documentation/filesystems/events.txt
>> new file mode 100644
>> index 0000000..c2e6227
>> --- /dev/null
>> +++ b/Documentation/filesystems/events.txt
>> @@ -0,0 +1,232 @@
>> +
>> +Generic file system event notification interface
>> +
>> +Document created 23 April 2015 by Beata Michalska 
>> +
>> +1. The reason behind:
>> +=====================
>> +
>> +There are many corner cases when things might get messy with the filesystems.
>> +And it is not always obvious what and when went wrong. Sometimes you might
>> +get some subtle hints that there is something going on - but by the time
>> +you realise it, it might be too late as you are already out-of-space
>> +or the filesystem has been remounted as read-only (i.e.). The generic
>> +interface for the filesystem events fills the gap by providing a rather
>> +easy way of real-time notifications triggered whenever something interesting
>> +happens, allowing filesystems to report events in a common way, as they occur.
>> +
>> +2. How does it work:
>> +
>> +
>> +The interface itself has been exposed as fstrace-type Virtual File System,
>> +primarily to ease the process of setting up the configuration for the
>> +notifications. So for starters, it needs to get mounted (obviously):
>> +
>> +mount -t fstrace none /sys/fs/events
>> +
>> +This will unveil the single fstrace filesystem entry - the 'config' file,
>> +through which the notification are being set-up.
>> +
>> +Activating notifications for particular filesystem is as straightforward
>> +as writing into the 'config' file. Note that by default all events, despite
>> +the actual filesystem type, are being disregarded.
>> +
>> +Synopsis of config:
>> +-------------------
>> +
>> +MOUNT EVENT_TYPE [L1] [L2]
>> +
>> + MOUNT  : the filesystem's mount point
>> + EVENT_TYPE : event types - currently two of them are being supported:
>> +
>> +  * generic events ("G") covering most common warnings
>> +  and errors that might be reported by any filesystem;
>> +  this option does not take any arguments;
>> +
>> +  * threshold notifications ("T") - events sent whenever
>> +  the amount of available space drops below certain level;
>> +  it is possible to specify two threshold levels though
>> +  only one is required to properly setup the notifications;
>> +  as those refer to the number of available blocks, the lower
>> +  level [L1] needs to be higher than the upper one [L2]
>> +
>> +Sample request could look like the following:
>> +
>> + echo /sample/mount/point G T 71 50 > /sys/fs/events/config
>> +
>> +Multiple request might be specified provided they are separated with semicolon.
>> +
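
For illustration only, the request syntax quoted above (MOUNT EVENT_TYPE [L1] [L2]) can be checked with a small userspace parser. This is a sketch under stated assumptions, not the code from fs/events/fs_event.c: struct fs_req, next_tok() and parse_req() are hypothetical names, the levels are assumed to be plain decimal block counts, and only a single request (no semicolon-separated list) is handled.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one "MOUNT EVENT_TYPE [L1] [L2]" request. */
struct fs_req {
    char mount[256];
    int generic;           /* "G" was given */
    int threshold;         /* "T" was given */
    int nlevels;           /* how many threshold levels followed "T" */
    unsigned long l1, l2;  /* available-block levels, valid per nlevels */
};

/* Copy the next whitespace-delimited token into tok.
 * Returns 1 on success, 0 at end of input, -1 if the token does not fit. */
static int next_tok(const char **p, char *tok, size_t cap)
{
    size_t n;

    *p += strspn(*p, " \t\n");
    n = strcspn(*p, " \t\n");
    if (n == 0)
        return 0;
    if (n >= cap)
        return -1;
    memcpy(tok, *p, n);
    tok[n] = '\0';
    *p += n;
    return 1;
}

/* Parse a single request; returns 0 if it is well formed, -1 otherwise. */
int parse_req(const char *line, struct fs_req *r)
{
    char tok[256];
    int rc;

    memset(r, 0, sizeof(*r));
    if (next_tok(&line, r->mount, sizeof(r->mount)) != 1)
        return -1;

    while ((rc = next_tok(&line, tok, sizeof(tok))) == 1) {
        if (strcmp(tok, "G") == 0) {
            r->generic = 1;
        } else if (strcmp(tok, "T") == 0) {
            r->threshold = 1;
        } else if (r->threshold && r->nlevels < 2 &&
                   isdigit((unsigned char)tok[0])) {
            char *end;
            unsigned long v = strtoul(tok, &end, 10);

            if (*end != '\0')
                return -1;
            if (r->nlevels++ == 0)
                r->l1 = v;
            else
                r->l2 = v;
        } else {
            return -1;
        }
    }
    if (rc < 0 || (!r->generic && !r->threshold))
        return -1;
    if (r->threshold && r->nlevels == 0)
        return -1;          /* "T" needs at least one level */
    if (r->nlevels == 2 && r->l1 <= r->l2)
        return -1;          /* L1 must be higher than L2 */
    return 0;
}
```

With the sample request from the documentation, parse_req("/sample/mount/point G T 71 50", &r) accepts the line and fills in l1 = 71 and l2 = 50, while swapping the two levels is rejected because L1 has to be the higher value.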

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-24 Thread Dmitry Monakhov
Beata Michalska  writes:

> Introduce configurable generic interface for file
> system-wide event notifications, to provide file
> systems with a common way of reporting any potential
> issues as they emerge.
>
> The notifications are to be issued through generic
> netlink interface by newly introduced multicast group.
>
> Threshold notifications have been included, allowing
> triggering an event whenever the amount of free space drops
> below a certain level - or levels to be more precise as two
> of them are being supported: the lower and the upper range.
> The notifications work both ways: once the threshold level
> has been reached, an event shall be generated whenever
> the number of available blocks goes up again re-activating
> the threshold.
>
> The interface has been exposed through a vfs. Once mounted,
> it serves as an entry point for the set-up where one can
> register for particular file system events.
>
> Signed-off-by: Beata Michalska 
> ---
>  Documentation/filesystems/events.txt |  232 ++
>  fs/Kconfig                           |    2 +
>  fs/Makefile                          |    1 +
>  fs/events/Kconfig                    |    7 +
>  fs/events/Makefile                   |    5 +
>  fs/events/fs_event.c                 |  809 ++
>  fs/events/fs_event.h                 |   22 +
>  fs/events/fs_event_netlink.c         |  104 +
>  fs/namespace.c                       |    1 +
>  include/linux/fs.h                   |    6 +-
>  include/linux/fs_event.h             |   72 +++
>  include/uapi/linux/Kbuild            |    1 +
>  include/uapi/linux/fs_event.h        |   58 +++
>  13 files changed, 1319 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/filesystems/events.txt
>  create mode 100644 fs/events/Kconfig
>  create mode 100644 fs/events/Makefile
>  create mode 100644 fs/events/fs_event.c
>  create mode 100644 fs/events/fs_event.h
>  create mode 100644 fs/events/fs_event_netlink.c
>  create mode 100644 include/linux/fs_event.h
>  create mode 100644 include/uapi/linux/fs_event.h
>
> diff --git a/Documentation/filesystems/events.txt b/Documentation/filesystems/events.txt
> new file mode 100644
> index 0000000..c2e6227
> --- /dev/null
> +++ b/Documentation/filesystems/events.txt
> @@ -0,0 +1,232 @@
> +
> + Generic file system event notification interface
> +
> +Document created 23 April 2015 by Beata Michalska 
> +
> +1. The reason behind:
> +=====================
> +
> +There are many corner cases when things might get messy with the filesystems.
> +And it is not always obvious what and when went wrong. Sometimes you might
> +get some subtle hints that there is something going on - but by the time
> +you realise it, it might be too late as you are already out-of-space
> +or the filesystem has been remounted as read-only (i.e.). The generic
> +interface for the filesystem events fills the gap by providing a rather
> +easy way of real-time notifications triggered whenever something interesting
> +happens, allowing filesystems to report events in a common way, as they occur.
> +
> +2. How does it work:
> +
> +
> +The interface itself has been exposed as fstrace-type Virtual File System,
> +primarily to ease the process of setting up the configuration for the
> +notifications. So for starters, it needs to get mounted (obviously):
> +
> + mount -t fstrace none /sys/fs/events
> +
> +This will unveil the single fstrace filesystem entry - the 'config' file,
> +through which the notification are being set-up.
> +
> +Activating notifications for particular filesystem is as straightforward
> +as writing into the 'config' file. Note that by default all events, despite
> +the actual filesystem type, are being disregarded.
> +
> +Synopsis of config:
> +-------------------
> +
> + MOUNT EVENT_TYPE [L1] [L2]
> +
> + MOUNT  : the filesystem's mount point
> + EVENT_TYPE : event types - currently two of them are being supported:
> +
> +   * generic events ("G") covering most common warnings
> +   and errors that might be reported by any filesystem;
> +   this option does not take any arguments;
> +
> +   * threshold notifications ("T") - events sent whenever
> +   the amount of available space drops below certain level;
> +   it is possible to specify two threshold levels though
> +   only one is required to properly setup the notifications;
> +   as those refer to the number of available blocks, the lower
> +   level [L1] needs to be higher than the upper one [L2]
> +
> +Sample request could look like the following:
> +
> + echo /sample/mount/point G T 71 50 > /sys/fs/events/config
> +
> +Multiple request might be specified provided they are separated with semicolon.
> +
> +The configuration itself might be modified at any time. One can add/remove
> +particular event types for given filesystem, modify the threshold levels,
> 

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-24 Thread Steve French
On Wed, Jun 24, 2015 at 10:31 AM, Beata Michalska
b.michal...@samsung.com wrote:
 On 06/24/2015 10:47 AM, Dmitry Monakhov wrote:
 Beata Michalska b.michal...@samsung.com writes:

 Introduce configurable generic interface for file
 system-wide event notifications, to provide file
 systems with a common way of reporting any potential
 issues as they emerge.

 The notifications are to be issued through generic
 netlink interface by newly introduced multicast group.

 Threshold notifications have been included, allowing
 triggering an event whenever the amount of free space drops
 below a certain level - or levels to be more precise as two
 of them are being supported: the lower and the upper range.
 The notifications work both ways: once the threshold level
 has been reached, an event shall be generated whenever
 the number of available blocks goes up again re-activating
 the threshold.

 The interface has been exposed through a vfs. Once mounted,
 it serves as an entry point for the set-up where one can
 register for particular file system events.

 Signed-off-by: Beata Michalska b.michal...@samsung.com
 ---
  Documentation/filesystems/events.txt |  232 ++
  fs/Kconfig                           |    2 +
  fs/Makefile                          |    1 +
  fs/events/Kconfig                    |    7 +
  fs/events/Makefile                   |    5 +
  fs/events/fs_event.c                 |  809 ++
  fs/events/fs_event.h                 |   22 +
  fs/events/fs_event_netlink.c         |  104 +
  fs/namespace.c                       |    1 +
  include/linux/fs.h                   |    6 +-
  include/linux/fs_event.h             |   72 +++
  include/uapi/linux/Kbuild            |    1 +
  include/uapi/linux/fs_event.h        |   58 +++
  13 files changed, 1319 insertions(+), 1 deletion(-)
  create mode 100644 Documentation/filesystems/events.txt
  create mode 100644 fs/events/Kconfig
  create mode 100644 fs/events/Makefile
  create mode 100644 fs/events/fs_event.c
  create mode 100644 fs/events/fs_event.h
  create mode 100644 fs/events/fs_event_netlink.c
  create mode 100644 include/linux/fs_event.h
  create mode 100644 include/uapi/linux/fs_event.h

 diff --git a/Documentation/filesystems/events.txt b/Documentation/filesystems/events.txt
 new file mode 100644
 index 0000000..c2e6227
 --- /dev/null
 +++ b/Documentation/filesystems/events.txt
 @@ -0,0 +1,232 @@
 +
 +Generic file system event notification interface
 +
 +Document created 23 April 2015 by Beata Michalska b.michal...@samsung.com
 +
 +1. The reason behind:
 +=====================
 +
 +There are many corner cases when things might get messy with the filesystems.
 +And it is not always obvious what and when went wrong. Sometimes you might
 +get some subtle hints that there is something going on - but by the time
 +you realise it, it might be too late as you are already out-of-space
 +or the filesystem has been remounted as read-only (i.e.). The generic
 +interface for the filesystem events fills the gap by providing a rather
 +easy way of real-time notifications triggered whenever something interesting
 +happens, allowing filesystems to report events in a common way, as they occur.
 +
 +2. How does it work:
 +
 +
 +The interface itself has been exposed as fstrace-type Virtual File System,
 +primarily to ease the process of setting up the configuration for the
 +notifications. So for starters, it needs to get mounted (obviously):
 +
 +mount -t fstrace none /sys/fs/events
 +
 +This will unveil the single fstrace filesystem entry - the 'config' file,
 +through which the notification are being set-up.
 +
 +Activating notifications for particular filesystem is as straightforward
 +as writing into the 'config' file. Note that by default all events, despite
 +the actual filesystem type, are being disregarded.
 +
 +Synopsis of config:
 +-------------------
 +
 +MOUNT EVENT_TYPE [L1] [L2]
 +
 + MOUNT  : the filesystem's mount point
 + EVENT_TYPE : event types - currently two of them are being supported:
 +
 +  * generic events ("G") covering most common warnings
 +  and errors that might be reported by any filesystem;
 +  this option does not take any arguments;
 +
 +  * threshold notifications ("T") - events sent whenever
 +  the amount of available space drops below certain level;
 +  it is possible to specify two threshold levels though
 +  only one is required to properly setup the notifications;
 +  as those refer to the number of available blocks, the lower
 +  level [L1] needs to be higher than the upper one [L2]
 +
 +Sample request could look like the following:
 +
 + echo /sample/mount/point G T 71 50 > /sys/fs/events/config
 +
 +Multiple request might be specified provided they are separated with semicolon.
 +
 +The configuration itself might be modified at any time. One can add/remove
 

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-22 Thread Beata Michalska
On 06/20/2015 01:21 AM, Dave Chinner wrote:
> On Fri, Jun 19, 2015 at 07:28:11PM +0200, Beata Michalska wrote:
>> On 06/19/2015 02:03 AM, Dave Chinner wrote:
>>> On Thu, Jun 18, 2015 at 10:25:08AM +0200, Beata Michalska wrote:
 On 06/18/2015 01:06 AM, Dave Chinner wrote:
> On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
>> Introduce configurable generic interface for file
>> system-wide event notifications, to provide file
>> systems with a common way of reporting any potential
>> issues as they emerge.
>>
>> The notifications are to be issued through generic
>> netlink interface by newly introduced multicast group.
>>
>> Threshold notifications have been included, allowing
>> triggering an event whenever the amount of free space drops
>> below a certain level - or levels to be more precise as two
>> of them are being supported: the lower and the upper range.
>> The notifications work both ways: once the threshold level
>> has been reached, an event shall be generated whenever
>> the number of available blocks goes up again re-activating
>> the threshold.
>>
>> The interface has been exposed through a vfs. Once mounted,
>> it serves as an entry point for the set-up where one can
>> register for particular file system events.
>>
>> Signed-off-by: Beata Michalska 
>
> This has massive scalability problems:
>>> 
> Have you noticed that the filesystems have percpu counters for
> tracking global space usage? There's good reason for that - taking a
> spinlock in such a hot accounting path causes severe contention.
>>> 
> Then puts the entire netlink send path inside this spinlock, which
> includes memory allocation and all sorts of non-filesystem code
> paths. And it may be inside critical filesystem locks as well
>
> Apart from the serialisation problem of the locking, adding
> memory allocation and the network send path to filesystem code
> that is effectively considered "innermost" filesystem code is going
> to have all sorts of problems for various filesystems. In the XFS
> case, we simply cannot execute this sort of function in the places
> where we update global space accounting.
>
> As it is, I think the basic concept of separate tracking of free
> space is fundamentally flawed. What I think needs to be done is that
> filesystems need access to the thresholds for events, and then the
> filesystems call fs_event_send_thresh() themselves from appropriate
> contexts (ie. without compromising locking, scalability, memory
> allocation recursion constraints, etc).
>
> e.g. instead of tracking every change in free space, a filesystem
> might execute this once every few seconds from a workqueue:
>
>   event = fs_event_need_space_warning(sb, <fs_free_space>)
>   if (event)
>   fs_event_send_thresh(sb, event);
>
> User still gets warnings about space usage, but there's no runtime
> overhead or problems with lock/memory allocation contexts, etc.

 Having fs to keep a firm hand on thresholds limits would indeed be
 far more sane approach though that would require each fs to
 add support for that and handle most of it on their own. Avoiding
> this was the main rationale behind this rfc.
 If fs people agree to that, I'll be more than willing to drop this
 in favour of the per-fs tracking solution. 
 Personally, I hope they will.
>>>
>>> I was hoping that you'd think a little more about my suggestion and
>>> work out how to do background threshold event detection generically.
>>> I kind of left it as "an exercise for the reader" because it seems
>>> obvious to me.
>>>
>>> Hint: ->statfs allows you to get the total, free and used space
>>> from filesystems in a generic manner.
>>>
>>> Cheers,
>>>
>>> Dave.
>>>
>>
>> I haven't given up on that, so yes, I'm still working on a more suitable
>> generic solution.
>> Background detection is one of the options, though it needs some more thoughts.
>> Giving up the sync approach means less accuracy for the threshold notifications,
>> but I guess this could be fine-tuned to get an acceptable level.
> 
> Accuracy really doesn't matter for threshold notifications - by the
> time the event is delivered to userspace it can already be wrong.
> 
>> Another bump:
>> how this tuning is supposed to be done (additional config option maybe)? 
> 
> Why would you need to tune it at all? You can't *stop* the operation
> that is triggering the threshold, so a few seconds delay on delivery
> isn't going to make any difference to anyone
> 
> You're overthinking this massively. All this needs is a work item
> per superblock, and when the thresholds are turned on it queues a
> self-repeating delayed work that calls ->statfs, checks against the
> configured threshold, issues an event if necessary, and then queues
> 

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-22 Thread Beata Michalska
On 06/20/2015 01:21 AM, Dave Chinner wrote:
 On Fri, Jun 19, 2015 at 07:28:11PM +0200, Beata Michalska wrote:
 On 06/19/2015 02:03 AM, Dave Chinner wrote:
 On Thu, Jun 18, 2015 at 10:25:08AM +0200, Beata Michalska wrote:
 On 06/18/2015 01:06 AM, Dave Chinner wrote:
 On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
 Introduce configurable generic interface for file
 system-wide event notifications, to provide file
 systems with a common way of reporting any potential
 issues as they emerge.

 The notifications are to be issued through generic
 netlink interface by newly introduced multicast group.

 Threshold notifications have been included, allowing
 triggering an event whenever the amount of free space drops
 below a certain level - or levels to be more precise as two
 of them are being supported: the lower and the upper range.
 The notifications work both ways: once the threshold level
 has been reached, an event shall be generated whenever
 the number of available blocks goes up again re-activating
 the threshold.

 The interface has been exposed through a vfs. Once mounted,
 it serves as an entry point for the set-up where one can
 register for particular file system events.

 Signed-off-by: Beata Michalska b.michal...@samsung.com

 This has massive scalability problems:
 
 Have you noticed that the filesystems have percpu counters for
 tracking global space usage? There's good reason for that - taking a
 spinlock in such a hot accounting path causes severe contention.
 
 Then puts the entire netlink send path inside this spinlock, which
 includes memory allocation and all sorts of non-filesystem code
 paths. And it may be inside critical filesystem locks as well

 Apart from the serialisation problem of the locking, adding
 memory allocation and the network send path to filesystem code
 that is effectively considered innermost filesystem code is going
 to have all sorts of problems for various filesystems. In the XFS
 case, we simply cannot execute this sort of function in the places
 where we update global space accounting.

 As it is, I think the basic concept of separate tracking of free
 space is fundamentally flawed. What I think needs to be done is that
 filesystems need access to the thresholds for events, and then the
 filesystems call fs_event_send_thresh() themselves from appropriate
 contexts (ie. without compromising locking, scalability, memory
 allocation recursion constraints, etc).

 e.g. instead of tracking every change in free space, a filesystem
 might execute this once every few seconds from a workqueue:

   event = fs_event_need_space_warning(sb, fs_free_space)
   if (event)
   fs_event_send_thresh(sb, event);

 User still gets warnings about space usage, but there's no runtime
 overhead or problems with lock/memory allocation contexts, etc.

 Having fs to keep a firm hand on thresholds limits would indeed be
 far more sane approach though that would require each fs to
 add support for that and handle most of it on their own. Avoiding
 this was the main rationale behind this rfc.
 If fs people agree to that, I'll be more than willing to drop this
 in favour of the per-fs tracking solution. 
 Personally, I hope they will.

 I was hoping that you'd think a little more about my suggestion and
 work out how to do background threshold event detection generically.
 I kind of left it as an exercise for the reader because it seems
 obvious to me.

 Hint: ->statfs allows you to get the total, free and used space
 from filesystems in a generic manner.

 Cheers,

 Dave.


 I haven't given up on that, so yes, I'm still working on a more suitable
 generic solution.
 Background detection is one of the options, though it needs some more 
 thoughts.
 Giving up the sync approach means less accuracy for the threshold 
 notifications,
 but I guess this could be fine-tuned to get an acceptable level.
 
 Accuracy really doesn't matter for threshold notifications - by the
 time the event is delivered to userspace it can already be wrong.
 
 Another bump:
 how this tuning is supposed to be done (additional config option maybe)? 
 
 Why would you need to tune it at all? You can't *stop* the operation
 that is triggering the threshold, so a few seconds delay on delivery
 isn't going to make any difference to anyone
 
 You're overthinking this massively. All this needs is a work item
 per superblock, and when the thresholds are turned on it queues a
 self-repeating delayed work that calls ->statfs, checks against the
 configured threshold, issues an event if necessary, and then queues
 itself again to run next period. When the threshold is turned off,
 the work is cancelled.
 
 Another option: a kernel thread that runs periodically and just
 calls iterate_supers() with a function that checks the sb for
 threshold events, and if configured runs ->statfs and does the work,
 otherwise skips the sb. That avoids all the lifetime issues with
 using workqueues, you don't 

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-19 Thread Dave Chinner
On Fri, Jun 19, 2015 at 07:28:11PM +0200, Beata Michalska wrote:
> On 06/19/2015 02:03 AM, Dave Chinner wrote:
> > On Thu, Jun 18, 2015 at 10:25:08AM +0200, Beata Michalska wrote:
> >> On 06/18/2015 01:06 AM, Dave Chinner wrote:
> >>> On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
> >>>> Introduce configurable generic interface for file
> >>>> system-wide event notifications, to provide file
> >>>> systems with a common way of reporting any potential
> >>>> issues as they emerge.
> >>>>
> >>>> The notifications are to be issued through generic
> >>>> netlink interface by newly introduced multicast group.
> >>>>
> >>>> Threshold notifications have been included, allowing
> >>>> triggering an event whenever the amount of free space drops
> >>>> below a certain level - or levels to be more precise as two
> >>>> of them are being supported: the lower and the upper range.
> >>>> The notifications work both ways: once the threshold level
> >>>> has been reached, an event shall be generated whenever
> >>>> the number of available blocks goes up again re-activating
> >>>> the threshold.
> >>>>
> >>>> The interface has been exposed through a vfs. Once mounted,
> >>>> it serves as an entry point for the set-up where one can
> >>>> register for particular file system events.
> >>>>
> >>>> Signed-off-by: Beata Michalska
> >>>
> >>> This has massive scalability problems:
> > 
> >>> Have you noticed that the filesystems have percpu counters for
> >>> tracking global space usage? There's good reason for that - taking a
> >>> spinlock in such a hot accounting path causes severe contention.
> > 
> >>> Then puts the entire netlink send path inside this spinlock, which
> >>> includes memory allocation and all sorts of non-filesystem code
> >>> paths. And it may be inside critical filesystem locks as well
> >>>
> >>> Apart from the serialisation problem of the locking, adding
> >>> memory allocation and the network send path to filesystem code
> >>> that is effectively considered "innermost" filesystem code is going
> >>> to have all sorts of problems for various filesystems. In the XFS
> >>> case, we simply cannot execute this sort of function in the places
> >>> where we update global space accounting.
> >>>
> >>> As it is, I think the basic concept of separate tracking of free
> >>> space if fundamentally flawed. What I think needs to be done is that
> >>> filesystems need access to the thresholds for events, and then the
> >>> filesystems call fs_event_send_thresh() themselves from appropriate
> >>> contexts (ie. without compromising locking, scalability, memory
> >>> allocation recursion constraints, etc).
> >>>
> >>> e.g. instead of tracking every change in free space, a filesystem
> >>> might execute this once every few seconds from a workqueue:
> >>>
> >>>   event = fs_event_need_space_warning(sb, fs_free_space)
> >>>   if (event)
> >>>           fs_event_send_thresh(sb, event);
> >>>
> >>> User still gets warnings about space usage, but there's no runtime
> >>> overhead or problems with lock/memory allocation contexts, etc.
> >>
> >> Having fs to keep a firm hand on thresholds limits would indeed be
> >> far more sane approach though that would require each fs to
> >> add support for that and handle most of it on their own. Avoiding
> >> this was the main rationale behind this rfc.
> >> If fs people agree to that, I'll be more than willing to drop this
> >> in favour of the per-fs tracking solution. 
> >> Personally, I hope they will.
> > 
> > I was hoping that you'd think a little more about my suggestion and
> > work out how to do background threshold event detection generically.
> > I kind of left it as "an exercise for the reader" because it seems
> > obvious to me.
> > 
> > Hint: ->statfs allows you to get the total, free and used space
> > from filesystems in a generic manner.
> > 
> > Cheers,
> > 
> > Dave.
> > 
> 
> I haven't given up on that, so yes, I'm still working on a more suitable
> generic solution.
> Background detection is one of the options, though it needs some more 
> thoughts.
> Giving up the sync approach means less accuracy for the threshold 
> notifications,
> but I guess this could be fine-tuned to get an acceptable level.

Accuracy really doesn't matter for threshold notifications - by the
time the event is delivered to userspace it can already be wrong.

> Another bump:
> how this tuning is supposed to be done (additional config option maybe)? 

Why would you need to tune it at all? You can't *stop* the operation
that is triggering the threshold, so a few seconds delay on delivery
isn't going to make any difference to anyone

You're overthinking this massively. All this needs is a work item
per superblock, and when the thresholds are turned on it queues a
self-repeating delayed work that calls ->statfs, checks against the
configured threshold, issues an event if necessary, and then queues
itself again to run next period. When the threshold is turned off,
the work is cancelled.
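
For illustration only, the check that such a work item would run once per
period can be sketched in plain userspace C. Everything below is an
assumption made for the example: the names thresh_state, thresh_poll,
warn_limit and crit_limit are hypothetical and do not come from the patch,
and the caller is expected to supply the available block count itself (in
the kernel that would come from ->statfs). The sketch reports each downward
crossing once and re-arms a limit when free space climbs back over it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event codes, for illustration only. */
enum thresh_event {
    THRESH_NONE = 0,
    THRESH_WARN,    /* available blocks fell below warn_limit */
    THRESH_CRIT,    /* available blocks fell below crit_limit */
};

struct thresh_state {
    uint64_t warn_limit;    /* first limit, in available blocks */
    uint64_t crit_limit;    /* second limit, must be <= warn_limit */
    int warn_fired;         /* set once the warn event has been reported */
    int crit_fired;         /* set once the crit event has been reported */
};

void thresh_init(struct thresh_state *t, uint64_t warn, uint64_t crit)
{
    assert(t != 0 && crit <= warn);
    t->warn_limit = warn;
    t->crit_limit = crit;
    t->warn_fired = 0;
    t->crit_fired = 0;
}

/*
 * One polling step: takes the current number of available blocks and
 * returns the event to report for this period, if any.
 */
enum thresh_event thresh_poll(struct thresh_state *t, uint64_t avail)
{
    /* Re-arm every limit that free space has climbed back over. */
    if (avail >= t->warn_limit)
        t->warn_fired = 0;
    if (avail >= t->crit_limit)
        t->crit_fired = 0;

    /* Report the most severe limit that was newly crossed. */
    if (avail < t->crit_limit && !t->crit_fired) {
        t->crit_fired = 1;
        t->warn_fired = 1;  /* crossing crit implies warn was crossed too */
        return THRESH_CRIT;
    }
    if (avail < t->warn_limit && !t->warn_fired) {
        t->warn_fired = 1;
        return THRESH_WARN;
    }
    return THRESH_NONE;
}
```

A periodic worker would call thresh_poll() once per period and send a
notification only when the result is not THRESH_NONE, so the accounting
hot path is never touched.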

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-19 Thread Beata Michalska
On 06/19/2015 02:03 AM, Dave Chinner wrote:
> On Thu, Jun 18, 2015 at 10:25:08AM +0200, Beata Michalska wrote:
>> On 06/18/2015 01:06 AM, Dave Chinner wrote:
>>> On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
>>>> Introduce configurable generic interface for file
>>>> system-wide event notifications, to provide file
>>>> systems with a common way of reporting any potential
>>>> issues as they emerge.
>>>>
>>>> The notifications are to be issued through generic
>>>> netlink interface by newly introduced multicast group.
>>>>
>>>> Threshold notifications have been included, allowing
>>>> triggering an event whenever the amount of free space drops
>>>> below a certain level - or levels to be more precise as two
>>>> of them are being supported: the lower and the upper range.
>>>> The notifications work both ways: once the threshold level
>>>> has been reached, an event shall be generated whenever
>>>> the number of available blocks goes up again re-activating
>>>> the threshold.
>>>>
>>>> The interface has been exposed through a vfs. Once mounted,
>>>> it serves as an entry point for the set-up where one can
>>>> register for particular file system events.
>>>>
>>>> Signed-off-by: Beata Michalska b.michal...@samsung.com
>>>
>>> This has massive scalability problems:
>
>>> Have you noticed that the filesystems have percpu counters for
>>> tracking global space usage? There's good reason for that - taking a
>>> spinlock in such a hot accounting path causes severe contention.
>
>>> Then puts the entire netlink send path inside this spinlock, which
>>> includes memory allocation and all sorts of non-filesystem code
>>> paths. And it may be inside critical filesystem locks as well
>>>
>>> Apart from the serialisation problem of the locking, adding
>>> memory allocation and the network send path to filesystem code
>>> that is effectively considered "innermost" filesystem code is going
>>> to have all sorts of problems for various filesystems. In the XFS
>>> case, we simply cannot execute this sort of function in the places
>>> where we update global space accounting.
>>>
>>> As it is, I think the basic concept of separate tracking of free
>>> space if fundamentally flawed. What I think needs to be done is that
>>> filesystems need access to the thresholds for events, and then the
>>> filesystems call fs_event_send_thresh() themselves from appropriate
>>> contexts (ie. without compromising locking, scalability, memory
>>> allocation recursion constraints, etc).
>>>
>>> e.g. instead of tracking every change in free space, a filesystem
>>> might execute this once every few seconds from a workqueue:
>>>
>>>   event = fs_event_need_space_warning(sb, fs_free_space)
>>>   if (event)
>>>           fs_event_send_thresh(sb, event);
>>>
>>> User still gets warnings about space usage, but there's no runtime
>>> overhead or problems with lock/memory allocation contexts, etc.
>>
>> Having fs to keep a firm hand on thresholds limits would indeed be
>> far more sane approach though that would require each fs to
>> add support for that and handle most of it on their own. Avoiding
>> this was the main rationale behind this rfc.
>> If fs people agree to that, I'll be more than willing to drop this
>> in favour of the per-fs tracking solution.
>> Personally, I hope they will.
>
> I was hoping that you'd think a little more about my suggestion and
> work out how to do background threshold event detection generically.
> I kind of left it as "an exercise for the reader" because it seems
> obvious to me.
>
> Hint: ->statfs allows you to get the total, free and used space
> from filesystems in a generic manner.
>
> Cheers,
>
> Dave.
>

I haven't given up on that, so yes, I'm still working on a more suitable
generic solution.
Background detection is one of the options, though it needs some more thought.
Giving up the sync approach means less accuracy for the threshold notifications,
but I guess this could be fine-tuned to an acceptable level. Another bump:
how is this tuning supposed to be done (an additional config option maybe)?
The interface would have to keep it somehow sane - but what would 'sane' mean
in this case? Also, I'm not sure whether a single approach would serve all
the potentially supported file systems well, so this would have to be
properly adjusted (taking the threshold levels into consideration as well).
And still, it would require some form of synchronization with the tracked fs
so that this 'detection' is not performed unnecessarily (i.e. while the fs
remains frozen).

There is also an idea of using an interface resembling the stackable fs:
a transparent file system layered on top of the tracked one 
(solely for the tracking purposes). This would simplify handling the trace 
object's lifetime - no more list of registered traces.
It would also give a way of tracking (to some extent) the changes in the amount
of available space, which combined with tweaked background check could give
a solution with less performance overhead than the original one.
I'll try this one and see how it goes.

Thank you for your feedback so far - I really appreciate it.


Best Regards
Beata 




Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-18 Thread Dave Chinner
On Thu, Jun 18, 2015 at 10:25:08AM +0200, Beata Michalska wrote:
> On 06/18/2015 01:06 AM, Dave Chinner wrote:
> > On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
> >> Introduce configurable generic interface for file
> >> system-wide event notifications, to provide file
> >> systems with a common way of reporting any potential
> >> issues as they emerge.
> >>
> >> The notifications are to be issued through generic
> >> netlink interface by newly introduced multicast group.
> >>
> >> Threshold notifications have been included, allowing
> >> triggering an event whenever the amount of free space drops
> >> below a certain level - or levels to be more precise as two
> >> of them are being supported: the lower and the upper range.
> >> The notifications work both ways: once the threshold level
> >> has been reached, an event shall be generated whenever
> >> the number of available blocks goes up again re-activating
> >> the threshold.
> >>
> >> The interface has been exposed through a vfs. Once mounted,
> >> it serves as an entry point for the set-up where one can
> >> register for particular file system events.
> >>
> >> Signed-off-by: Beata Michalska 
> > 
> > This has massive scalability problems:

> > Have you noticed that the filesystems have percpu counters for
> > tracking global space usage? There's good reason for that - taking a
> > spinlock in such a hot accounting path causes severe contention.

> > Then puts the entire netlink send path inside this spinlock, which
> > includes memory allocation and all sorts of non-filesystem code
> > paths. And it may be inside critical filesystem locks as well
> > 
> > Apart from the serialisation problem of the locking, adding
> > memory allocation and the network send path to filesystem code
> > that is effectively considered "innermost" filesystem code is going
> > to have all sorts of problems for various filesystems. In the XFS
> > case, we simply cannot execute this sort of function in the places
> > where we update global space accounting.
> > 
> > As it is, I think the basic concept of separate tracking of free
> > space if fundamentally flawed. What I think needs to be done is that
> > filesystems need access to the thresholds for events, and then the
> > filesystems call fs_event_send_thresh() themselves from appropriate
> > contexts (ie. without compromising locking, scalability, memory
> > allocation recursion constraints, etc).
> > 
> > e.g. instead of tracking every change in free space, a filesystem
> > might execute this once every few seconds from a workqueue:
> > 
> >   event = fs_event_need_space_warning(sb, fs_free_space)
> >   if (event)
> >           fs_event_send_thresh(sb, event);
> > 
> > User still gets warnings about space usage, but there's no runtime
> > overhead or problems with lock/memory allocation contexts, etc.
> 
> Having fs to keep a firm hand on thresholds limits would indeed be
> far more sane approach though that would require each fs to
> add support for that and handle most of it on their own. Avoiding
> this was the main rationale behind this rfc.
> If fs people agree to that, I'll be more than willing to drop this
> in favour of the per-fs tracking solution. 
> Personally, I hope they will.

I was hoping that you'd think a little more about my suggestion and
work out how to do background threshold event detection generically.
I kind of left it as "an exercise for the reader" because it seems
obvious to me.

Hint: ->statfs allows you to get the total, free and used space
from filesystems in a generic manner.
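
As a userspace analogue (a sketch, not kernel code): statvfs(3) exposes to
userspace the same kind of totals that ->statfs provides inside the kernel,
so a generic free-space check needs nothing filesystem-specific. The helper
names free_percent and query_free_percent are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Percentage (0..100) of blocks available to unprivileged users. */
unsigned int free_percent(const struct statvfs *st)
{
    assert(st != 0);
    if (st->f_blocks == 0)      /* pseudo filesystems may report no blocks */
        return 0;
    return (unsigned int)(((uint64_t)st->f_bavail * 100) / st->f_blocks);
}

/*
 * Query the filesystem containing 'path'.
 * Returns 0 and stores the percentage in *pct, or -1 if statvfs() fails.
 */
int query_free_percent(const char *path, unsigned int *pct)
{
    struct statvfs st;

    assert(path != 0 && pct != 0);
    if (statvfs(path, &st) != 0)
        return -1;
    *pct = free_percent(&st);
    return 0;
}
```

The result is only a snapshot: by the time the caller looks at it, the
amount of free space may already have changed.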

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-18 Thread Beata Michalska
Hi,

On 06/18/2015 01:17 PM, Kinglong Mee wrote:
> On 6/16/2015 9:09 PM, Beata Michalska wrote:
>> Introduce configurable generic interface for file
>> system-wide event notifications, to provide file
>> systems with a common way of reporting any potential
>> issues as they emerge.
> ... snip ...
>> +
>> +Sample request could look like the following:
>> +
>> + echo /sample/mount/point G T 71 50 > /sys/fs/events/config
>> +
>> +Multiple request might be specified provided they are separated with semicolon.
>> +
>> +The configuration itself might be modified at any time. One can add/remove
>> +particular event types for given fielsystem, modify the threshold levels,
>> +and remove single or all entries from the 'config' file.
>> +
>> + - Adding new event type:
>> +
>> + $ echo MOUNT EVENT_TYPE > /sys/fs/events/config
>> +
>> +(Note that is is enough to provide the event type to be enabled without
> 
> Should be "Note that it is ... " here ?

Right
> 
>> +the already set ones.)
>> +
>> + - Removing event type:
>> +
>> + $ echo '!MOUNT EVENT_TYPE' > /sys/fs/events/config
>> +
>> + - Updating threshold limits:
>> +
>> + $ echo MOUNT T L1 L2 > /sys/fs/events/config
>> +
>> + - Removing single entry:
>> +
>> + $ echo '!MOUNT' > /sys/fs/events/config
>> +
>> + - Removing all entries:
>> +
>> + $ echo > /sys/fs/events/config
>> +
>> +Reading the file will list all registered entries with their current set-up
>> +along with some additional info like the filesystem type and the backing device
>> +name if available.
>> +
>> +Final, though a very important note on the configuration: when and if the
>> +actual events are being triggered falls way beyond the scope of the generic
>> +filesystem events interface. It is up to a particular filesystem
>> +implementation which events are to be supported - if any at all. So if
>> +given filesystem does not support the event notifications, an attempt to
>> +enable those through 'config' file will fail.
>> +
>> +
>> +3. The generic netlink interface support:
>> +=
>> +
>> +Whenever an event notification is triggered (by given filesystem) the current
>> +configuration is being validated to decide whether a userpsace notification
>> +should be launched. If there has been no request (in a mean of 'config' file
>> +entry) for given event, one will be silently disregarded. If, on the other
>> +hand, someone is 'watching' given filesystem for specific events, a generic
>> +netlink message will be sent. A dedicated multicast group has been provided
>> +solely for this purpose so in order to receive such notifications, one should
>> +subscribe to this new multicast group. As for now only the init network
>> +namespace is being supported.
>> +
>> +3.1 Message format
>> +
>> +The FS_NL_C_EVENT shall be stored within the generic netlink message header
>> +as the command field. The message payload will provide more detailed info:
>> +the backing device major and minor numbers, the event code and the id of
>> +the process which action led to the event occurrence. In case of threshold
>> +notifications, the current number of available blocks will be included
>> +in the payload as well.
>> +
>> +
>> + 0   1   2   3
>> + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>> ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> +|   NETLINK MESSAGE HEADER  |
>> ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> +|   GENERIC NETLINK MESSAGE HEADER  |
>> +|  (with FS_NL_C_EVENT as genlmsghdr cdm field) |
> 
> cmd, not cdm.

ditto
> 
>> ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> +| Optional user specific message header |
>> ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> +|  GENERIC MESSAGE PAYLOAD: |
>> ++---+
>> +| FS_NL_A_EVENT_ID  (NLA_U32)   |
>> ++---+
>> +| FS_NL_A_DEV_MAJOR (NLA_U32)   |
>> ++---+
>> +| FS_NL_A_DEV_MINOR (NLA_U32)   |
>> ++---+
>> +| FS_NL_A_CAUSED_ID (NLA_U32)   |
> 
> Should be NLA_U64 ? The following uses as, 
> 
> + if (nla_put_u64(skb, FS_NL_A_CAUSED_ID, pid_vnr(task_pid(current))))
> + return -EINVAL;
> 

Yes, or nla_put_u32 - either way my bad
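
To make the mismatch concrete, here is a small standalone sketch of how the
length carried in a netlink attribute depends on the payload size. It
mirrors the layout of struct nlattr (16-bit length, 16-bit type, payload
padded to a 4-byte boundary) but deliberately uses its own hypothetical
names (attr_hdr, attr_len, attr_space, attr_is_u32) instead of the kernel
headers, so treat it as an illustration rather than the actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the netlink attribute header: length first, then type. */
struct attr_hdr {
    uint16_t len;   /* header plus payload, padding not included */
    uint16_t type;
};

/* Value stored in the length field for a payload of the given size. */
size_t attr_len(size_t payload)
{
    return sizeof(struct attr_hdr) + payload;
}

/* Bytes the attribute occupies in the message, padded to 4 bytes. */
size_t attr_space(size_t payload)
{
    size_t len = attr_len(payload);

    return (len + 3) & ~(size_t)3;
}

/* What a receiver trusting the documented NLA_U32 type would check. */
int attr_is_u32(const struct attr_hdr *hdr)
{
    assert(hdr != NULL);
    return hdr->len == attr_len(sizeof(uint32_t));
}
```

A 32-bit payload gives a length of 8 while a 64-bit payload gives 12, so a
listener that validates the attribute against the documented u32 layout
would reject what a u64 put produces. Either the documentation or the put
call has to change so that both agree.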

> Also, I'd like FS_NL_A_CAUSED_PID than FS_NL_A_CAUSED_ID.

Alright
> 
>> +

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-18 Thread Kinglong Mee
On 6/16/2015 9:09 PM, Beata Michalska wrote:
> Introduce configurable generic interface for file
> system-wide event notifications, to provide file
> systems with a common way of reporting any potential
> issues as they emerge.
... snip ...
> +
> +Sample request could look like the following:
> +
> + echo /sample/mount/point G T 71 50 > /sys/fs/events/config
> +
> +Multiple request might be specified provided they are separated with semicolon.
> +
> +The configuration itself might be modified at any time. One can add/remove
> +particular event types for given fielsystem, modify the threshold levels,
> +and remove single or all entries from the 'config' file.
> +
> + - Adding new event type:
> +
> + $ echo MOUNT EVENT_TYPE > /sys/fs/events/config
> +
> +(Note that is is enough to provide the event type to be enabled without

Should be "Note that it is ... " here ?

> +the already set ones.)
> +
> + - Removing event type:
> +
> + $ echo '!MOUNT EVENT_TYPE' > /sys/fs/events/config
> +
> + - Updating threshold limits:
> +
> + $ echo MOUNT T L1 L2 > /sys/fs/events/config
> +
> + - Removing single entry:
> +
> + $ echo '!MOUNT' > /sys/fs/events/config
> +
> + - Removing all entries:
> +
> + $ echo > /sys/fs/events/config
> +
> +Reading the file will list all registered entries with their current set-up
> +along with some additional info like the filesystem type and the backing device
> +name if available.
> +
> +Final, though a very important note on the configuration: when and if the
> +actual events are being triggered falls way beyond the scope of the generic
> +filesystem events interface. It is up to a particular filesystem
> +implementation which events are to be supported - if any at all. So if
> +given filesystem does not support the event notifications, an attempt to
> +enable those through 'config' file will fail.
> +
> +
> +3. The generic netlink interface support:
> +=========================================
> +
> +Whenever an event notification is triggered (by given filesystem) the current
> +configuration is being validated to decide whether a userpsace notification
> +should be launched. If there has been no request (in a mean of 'config' file
> +entry) for given event, one will be silently disregarded. If, on the other
> +hand, someone is 'watching' given filesystem for specific events, a generic
> +netlink message will be sent. A dedicated multicast group has been provided
> +solely for this purpose so in order to receive such notifications, one should
> +subscribe to this new multicast group. As for now only the init network
> +namespace is being supported.
> +
> +3.1 Message format
> +
> +The FS_NL_C_EVENT shall be stored within the generic netlink message header
> +as the command field. The message payload will provide more detailed info:
> +the backing device major and minor numbers, the event code and the id of
> +the process which action led to the event occurrence. In case of threshold
> +notifications, the current number of available blocks will be included
> +in the payload as well.
> +
> +
> +  0                   1                   2                   3
> +  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + |                    NETLINK MESSAGE HEADER                     |
> + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + |                GENERIC NETLINK MESSAGE HEADER                 |
> + |         (with FS_NL_C_EVENT as genlmsghdr cdm field)          |

cmd, not cdm.

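> + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + |             Optional user specific message header             |
> + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + |                   GENERIC MESSAGE PAYLOAD:                    |
> + +---------------------------------------------------------------+
> + |                  FS_NL_A_EVENT_ID  (NLA_U32)                  |
> + +---------------------------------------------------------------+
> + |                  FS_NL_A_DEV_MAJOR (NLA_U32)                  |
> + +---------------------------------------------------------------+
> + |                  FS_NL_A_DEV_MINOR (NLA_U32)                  |
> + +---------------------------------------------------------------+
> + |                  FS_NL_A_CAUSED_ID (NLA_U32)                  |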

Should be NLA_U64 ? The following uses as, 

+   if (nla_put_u64(skb, FS_NL_A_CAUSED_ID, pid_vnr(task_pid(current))))
+   return -EINVAL;

Also, I'd like FS_NL_A_CAUSED_PID than FS_NL_A_CAUSED_ID.

> + +---------------------------------------------------------------+
> + |                    FS_NL_A_DATA (NLA_U64)                     |
> + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> +
> +
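> +The above figure is based on:
> + http://www.linuxfoundation.org/collaborate/workgroups/networking/generic_netlink_howto#Message_Format
> +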
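
For anyone trying to picture the receiving side of the payload described in 3.1, here is a minimal userspace sketch of walking such attributes. It is an illustration only: the numeric FS_NL_A_* values are placeholders (the real ones would come from the patch's include/uapi/linux/fs_event.h), struct fs_event_msg and the helper names are made up, and the buffer is built by hand rather than read from a netlink socket. Only struct nlattr, NLA_ALIGN, NLA_HDRLEN and NLA_TYPE_MASK come from <linux/netlink.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>	/* struct nlattr, NLA_ALIGN, NLA_HDRLEN, NLA_TYPE_MASK */

/* Placeholder attribute ids: assumptions for illustration only. */
enum {
	FS_NL_A_EVENT_ID = 1,
	FS_NL_A_DEV_MAJOR,
	FS_NL_A_DEV_MINOR,
	FS_NL_A_CAUSED_ID,
	FS_NL_A_DATA,
};

/* Hypothetical decoded form of one event message. */
struct fs_event_msg {
	uint32_t event_id, dev_major, dev_minor, caused_id;
	uint64_t data;
	int has_data;	/* FS_NL_A_DATA is only present for threshold events */
};

/* Append one attribute at offset off; buf must be zeroed and large enough. */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
		       const void *val, uint16_t len)
{
	struct nlattr nla;

	assert(len <= UINT16_MAX - NLA_HDRLEN);
	nla.nla_len = (uint16_t)(NLA_HDRLEN + len);
	nla.nla_type = type;
	memcpy(buf + off, &nla, sizeof(nla));
	memcpy(buf + off + NLA_HDRLEN, val, len);
	return off + NLA_ALIGN(nla.nla_len);
}

/* Copy a fixed-size payload, rejecting attributes of the wrong length. */
static int get_fixed(void *dst, size_t want, const unsigned char *p, size_t plen)
{
	if (plen != want)
		return -1;
	memcpy(dst, p, want);
	return 0;
}

/* Walk the attribute stream; returns 0 on success, -1 on a malformed one. */
static int parse_attrs(const unsigned char *buf, size_t len,
		       struct fs_event_msg *out)
{
	size_t off = 0;

	memset(out, 0, sizeof(*out));
	while (off < len && len - off >= (size_t)NLA_HDRLEN) {
		struct nlattr nla;
		const unsigned char *p;
		size_t plen;
		int rc = 0;

		memcpy(&nla, buf + off, sizeof(nla));
		if (nla.nla_len < NLA_HDRLEN || nla.nla_len > len - off)
			return -1;
		p = buf + off + NLA_HDRLEN;
		plen = (size_t)nla.nla_len - NLA_HDRLEN;

		switch (nla.nla_type & NLA_TYPE_MASK) {
		case FS_NL_A_EVENT_ID:
			rc = get_fixed(&out->event_id, 4, p, plen);
			break;
		case FS_NL_A_DEV_MAJOR:
			rc = get_fixed(&out->dev_major, 4, p, plen);
			break;
		case FS_NL_A_DEV_MINOR:
			rc = get_fixed(&out->dev_minor, 4, p, plen);
			break;
		case FS_NL_A_CAUSED_ID:
			rc = get_fixed(&out->caused_id, 4, p, plen);
			break;
		case FS_NL_A_DATA:
			rc = get_fixed(&out->data, 8, p, plen);
			out->has_data = (rc == 0);
			break;
		default:
			break;	/* unknown attributes are skipped */
		}
		if (rc)
			return -1;
		off += NLA_ALIGN(nla.nla_len);
	}
	return 0;
}
```

The length check in get_fixed() is where a u32/u64 mismatch for the "caused" attribute would surface on the receiving side, so the documented type and the nla_put_*() call need to agree.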

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-18 Thread Beata Michalska
Hi,

On 06/18/2015 01:06 AM, Dave Chinner wrote:
> On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
>> Introduce configurable generic interface for file
>> system-wide event notifications, to provide file
>> systems with a common way of reporting any potential
>> issues as they emerge.
>>
>> The notifications are to be issued through generic
>> netlink interface by newly introduced multicast group.
>>
>> Threshold notifications have been included, allowing
>> triggering an event whenever the amount of free space drops
>> below a certain level - or levels to be more precise as two
>> of them are being supported: the lower and the upper range.
>> The notifications work both ways: once the threshold level
>> has been reached, an event shall be generated whenever
>> the number of available blocks goes up again re-activating
>> the threshold.
>>
>> The interface has been exposed through a vfs. Once mounted,
>> it serves as an entry point for the set-up where one can
>> register for particular file system events.
>>
>> Signed-off-by: Beata Michalska 
> 
> This has massive scalability problems:
> 
>> + 4.3 Threshold notifications:
>> +
>> + #include <linux/fs_event.h>
>> + void fs_event_alloc_space(struct super_block *sb, u64 ncount);
>> + void fs_event_free_space(struct super_block *sb, u64 ncount);
>> +
>> + Each filesystme supporting the threshold notifications should call
>> + fs_event_alloc_space/fs_event_free_space respectively whenever the
>> + amount of available blocks changes.
>> + - sb: the filesystem's super block
>> + - ncount: number of blocks being acquired/released
> 
> ... here.
> 
>> + Note that to properly handle the threshold notifications the fs events
>> + interface needs to be kept up to date by the filesystems. Each should
>> + register fs_trace_operations to enable querying the current number of
>> + available blocks.
> 
> Have you noticed that the filesystems have percpu counters for
> tracking global space usage? There's good reason for that - taking a
> spinlock in such a hot accounting path causes severe contention.
> 
>> +static void fs_event_send(struct fs_trace_entry *en, unsigned int event_id)
>> +{
>> +size_t size = nla_total_size(sizeof(u32)) * 2 +
>> +  nla_total_size(sizeof(u64));
>> +
>> +fs_netlink_send_event(size, event_id, create_common_msg, en);
>> +}
>> +
>> +static void fs_event_send_thresh(struct fs_trace_entry *en,
>> +  unsigned int event_id)
>> +{
>> +size_t size = nla_total_size(sizeof(u32)) * 2 +
>> +  nla_total_size(sizeof(u64)) * 2;
>> +
>> +fs_netlink_send_event(size, event_id, create_thresh_msg, en);
>> +}
>> +
>> +void fs_event_notify(struct super_block *sb, unsigned int event_id)
>> +{
>> +struct fs_trace_entry *en;
>> +
>> +en = fs_trace_entry_get_rcu(sb);
>> +if (!en)
>> +return;
>> +
>> +spin_lock(&en->lock);
>> +if (atomic_read(&en->active) && (en->notify & FS_EVENT_GENERIC))
>> +fs_event_send(en, event_id);
>> +spin_unlock(&en->lock);
>> +fs_trace_entry_put(en);
>> +}
>> +EXPORT_SYMBOL(fs_event_notify);
>> +
>> +void fs_event_alloc_space(struct super_block *sb, u64 ncount)
>> +{
>> +struct fs_trace_entry *en;
>> +s64 count;
>> +
>> +en = fs_trace_entry_get_rcu(sb);
>> +if (!en)
>> +return;
> 
> Adds an atomic write to get the trace entry,
> 
>> +spin_lock(&en->lock);
> 
> a spin lock to lock the entry,
> 
> 
>> +if (!atomic_read(&en->active) || !(en->notify & FS_EVENT_THRESH))
>> +goto leave;
>> +/*
>> + * we shouldn't drop below 0 here,
>> + * unless there is a sync issue somewhere (?)
>> + */
>> +count = en->th.avail_space - ncount;
>> +en->th.avail_space = count < 0 ? 0 : count;
>> +
>> +if (en->th.avail_space > en->th.lrange)
>> +/* Not 'even' close - leave */
>> +goto leave;
>> +
>> +if (en->th.avail_space > en->th.urange) {
>> +/* Close enough - the lower range has been reached */
>> +if (!(en->th.state & THRESH_LR_BEYOND)) {
>> +/* Send notification */
>> +fs_event_send_thresh(en, FS_THR_LRBELOW);
>> +en->th.state &= ~THRESH_LR_BELOW;
>> +en->th.state |= THRESH_LR_BEYOND;
>> +}
>> +goto leave;
> 
> Then puts the entire netlink send path inside this spinlock, which
> includes memory allocation and all sorts of non-filesystem code
> paths. And it may be inside critical filesystem locks as well
> 
> Apart from the serialisation problem of the locking, adding
> memory allocation and the network send path to filesystem code
> that is effectively considered "innermost" filesystem code is going
> to have all sorts of problems for various filesystems. In the XFS
> case, we simply cannot execute this sort of function in the places
> where we update global space accounting.
> 
> As it is, I think the basic concept 

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-18 Thread Beata Michalska
Hi,

On 06/18/2015 01:17 PM, Kinglong Mee wrote:
> On 6/16/2015 9:09 PM, Beata Michalska wrote:
>> Introduce configurable generic interface for file
>> system-wide event notifications, to provide file
>> systems with a common way of reporting any potential
>> issues as they emerge.
> ... snip ...
>> +
>> +Sample request could look like the following:
>> +
>> + echo /sample/mount/point G T 71 50 > /sys/fs/events/config
>> +
>> +Multiple request might be specified provided they are separated with semicolon.
>> +
>> +The configuration itself might be modified at any time. One can add/remove
>> +particular event types for given fielsystem, modify the threshold levels,
>> +and remove single or all entries from the 'config' file.
>> +
>> + - Adding new event type:
>> +
>> + $ echo MOUNT EVENT_TYPE > /sys/fs/events/config
>> +
>> +(Note that is is enough to provide the event type to be enabled without
>
> Should be "Note that it is ... " here ?

Right

>> +the already set ones.)
>> +
>> + - Removing event type:
>> +
>> + $ echo '!MOUNT EVENT_TYPE' > /sys/fs/events/config
>> +
>> + - Updating threshold limits:
>> +
>> + $ echo MOUNT T L1 L2 > /sys/fs/events/config
>> +
>> + - Removing single entry:
>> +
>> + $ echo '!MOUNT' > /sys/fs/events/config
>> +
>> + - Removing all entries:
>> +
>> + $ echo > /sys/fs/events/config
>> +
>> +Reading the file will list all registered entries with their current set-up
>> +along with some additional info like the filesystem type and the backing device
>> +name if available.
>> +
>> +Final, though a very important note on the configuration: when and if the
>> +actual events are being triggered falls way beyond the scope of the generic
>> +filesystem events interface. It is up to a particular filesystem
>> +implementation which events are to be supported - if any at all. So if
>> +given filesystem does not support the event notifications, an attempt to
>> +enable those through 'config' file will fail.
>> +
>> +
>> +3. The generic netlink interface support:
>> +=========================================
>> +
>> +Whenever an event notification is triggered (by given filesystem) the current
>> +configuration is being validated to decide whether a userpsace notification
>> +should be launched. If there has been no request (in a mean of 'config' file
>> +entry) for given event, one will be silently disregarded. If, on the other
>> +hand, someone is 'watching' given filesystem for specific events, a generic
>> +netlink message will be sent. A dedicated multicast group has been provided
>> +solely for this purpose so in order to receive such notifications, one should
>> +subscribe to this new multicast group. As for now only the init network
>> +namespace is being supported.
>> +
>> +3.1 Message format
>> +
>> +The FS_NL_C_EVENT shall be stored within the generic netlink message header
>> +as the command field. The message payload will provide more detailed info:
>> +the backing device major and minor numbers, the event code and the id of
>> +the process which action led to the event occurrence. In case of threshold
>> +notifications, the current number of available blocks will be included
>> +in the payload as well.
>> +
>> +
>> +  0                   1                   2                   3
>> +  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>> + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + |                    NETLINK MESSAGE HEADER                     |
>> + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + |                GENERIC NETLINK MESSAGE HEADER                 |
>> + |         (with FS_NL_C_EVENT as genlmsghdr cdm field)          |
>
> cmd, not cdm.

ditto

>> + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + |             Optional user specific message header             |
>> + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + |                   GENERIC MESSAGE PAYLOAD:                    |
>> + +---------------------------------------------------------------+
>> + |                  FS_NL_A_EVENT_ID  (NLA_U32)                  |
>> + +---------------------------------------------------------------+
>> + |                  FS_NL_A_DEV_MAJOR (NLA_U32)                  |
>> + +---------------------------------------------------------------+
>> + |                  FS_NL_A_DEV_MINOR (NLA_U32)                  |
>> + +---------------------------------------------------------------+
>> + |                  FS_NL_A_CAUSED_ID (NLA_U32)                  |
>
> Should be NLA_U64 ? The following uses as,
>
> + if (nla_put_u64(skb, FS_NL_A_CAUSED_ID, pid_vnr(task_pid(current))))
> + return -EINVAL;
>

Yes, or nla_put_u32 - either way my bad

> Also, I'd like FS_NL_A_CAUSED_PID than FS_NL_A_CAUSED_ID.

Alright

>> + +---------------------------------------------------------------+
>> + |                    FS_NL_A_DATA (NLA_U64)                     |
>> + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> +
>> +
>> +The above figure is based on:
>> + http://www.linuxfoundation.org/collaborate/workgroups/networking/generic_netlink_howto#Message_Format
>

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-18 Thread Dave Chinner
On Thu, Jun 18, 2015 at 10:25:08AM +0200, Beata Michalska wrote:
> On 06/18/2015 01:06 AM, Dave Chinner wrote:
> > On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
> >> Introduce configurable generic interface for file
> >> system-wide event notifications, to provide file
> >> systems with a common way of reporting any potential
> >> issues as they emerge.
> >>
> >> The notifications are to be issued through generic
> >> netlink interface by newly introduced multicast group.
> >>
> >> Threshold notifications have been included, allowing
> >> triggering an event whenever the amount of free space drops
> >> below a certain level - or levels to be more precise as two
> >> of them are being supported: the lower and the upper range.
> >> The notifications work both ways: once the threshold level
> >> has been reached, an event shall be generated whenever
> >> the number of available blocks goes up again re-activating
> >> the threshold.
> >>
> >> The interface has been exposed through a vfs. Once mounted,
> >> it serves as an entry point for the set-up where one can
> >> register for particular file system events.
> >>
> >> Signed-off-by: Beata Michalska <b.michal...@samsung.com>
> >
> > This has massive scalability problems:

> > Have you noticed that the filesystems have percpu counters for
> > tracking global space usage? There's good reason for that - taking a
> > spinlock in such a hot accounting path causes severe contention.

> > Then puts the entire netlink send path inside this spinlock, which
> > includes memory allocation and all sorts of non-filesystem code
> > paths. And it may be inside critical filesystem locks as well
> >
> > Apart from the serialisation problem of the locking, adding
> > memory allocation and the network send path to filesystem code
> > that is effectively considered "innermost" filesystem code is going
> > to have all sorts of problems for various filesystems. In the XFS
> > case, we simply cannot execute this sort of function in the places
> > where we update global space accounting.
> >
> > As it is, I think the basic concept of separate tracking of free
> > space if fundamentally flawed. What I think needs to be done is that
> > filesystems need access to the thresholds for events, and then the
> > filesystems call fs_event_send_thresh() themselves from appropriate
> > contexts (ie. without compromising locking, scalability, memory
> > allocation recursion constraints, etc).
> >
> > e.g. instead of tracking every change in free space, a filesystem
> > might execute this once every few seconds from a workqueue:
> >
> > 	event = fs_event_need_space_warning(sb, fs_free_space)
> > 	if (event)
> > 		fs_event_send_thresh(sb, event);
> >
> > User still gets warnings about space usage, but there's no runtime
> > overhead or problems with lock/memory allocation contexts, etc.
>
> Having fs to keep a firm hand on thresholds limits would indeed be
> far more sane approach though that would require each fs to
> add support for that and handle most of it on their own. Avoiding
> this was the main rationale behind this rfc.
> If fs people agree to that, I'll be more than willing to drop this
> in favour of the per-fs tracking solution.
> Personally, I hope they will.

I was hoping that you'd think a little more about my suggestion and
work out how to do background threshold event detection generically.
I kind of left it as an exercise for the reader because it seems
obvious to me.

Hint: ->statfs allows you to get the total, free and used space
from filesystems in a generic manner.
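
To make the hint a little more concrete, here is a rough userspace sketch of sampling free space generically and raising an event only when the threshold state changes. It is an illustration, not code from the patch: statvfs(3) stands in for calling ->statfs inside the kernel, and struct thresh, the THR_* names and the three helpers are hypothetical. The ordering of the two levels (lrange is hit first, urange is the more severe one) follows the comparisons in the quoted fs_event_alloc_space().

```c
#include <assert.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical state names; the patch uses THRESH_* bit flags instead. */
enum thresh_state { THR_OK, THR_LOWER, THR_UPPER };

struct thresh {
	uint64_t lrange;	/* first warning level, in available blocks */
	uint64_t urange;	/* second, more severe level; urange <= lrange */
	enum thresh_state state;
};

/* Map a sampled number of available blocks onto one of the three states. */
static enum thresh_state thresh_classify(const struct thresh *t, uint64_t avail)
{
	assert(t->urange <= t->lrange);
	if (avail > t->lrange)
		return THR_OK;
	if (avail > t->urange)
		return THR_LOWER;
	return THR_UPPER;
}

/* Returns 1 (and records the new state) when an event is due, else 0. */
static int thresh_check(struct thresh *t, uint64_t avail)
{
	enum thresh_state now = thresh_classify(t, avail);

	if (now == t->state)
		return 0;
	t->state = now;
	return 1;
}

/*
 * Sample one mounted filesystem. Meant to be called periodically (the
 * kernel analogue would be a delayed work item), not on every allocation.
 * Returns -1 if the sample failed, otherwise the result of thresh_check().
 */
static int thresh_poll(struct thresh *t, const char *mountpoint)
{
	struct statvfs sv;

	if (statvfs(mountpoint, &sv) != 0)
		return -1;
	return thresh_check(t, (uint64_t)sv.f_bavail);
}
```

Because the comparison happens only at sampling time, the allocation and free paths are left untouched; the cost is that a short dip below a level between two samples goes unreported.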

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-17 Thread Dave Chinner
On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
> Introduce configurable generic interface for file
> system-wide event notifications, to provide file
> systems with a common way of reporting any potential
> issues as they emerge.
> 
> The notifications are to be issued through generic
> netlink interface by newly introduced multicast group.
> 
> Threshold notifications have been included, allowing
> triggering an event whenever the amount of free space drops
> below a certain level - or levels to be more precise as two
> of them are being supported: the lower and the upper range.
> The notifications work both ways: once the threshold level
> has been reached, an event shall be generated whenever
> the number of available blocks goes up again re-activating
> the threshold.
> 
> The interface has been exposed through a vfs. Once mounted,
> it serves as an entry point for the set-up where one can
> register for particular file system events.
> 
> Signed-off-by: Beata Michalska 

This has massive scalability problems:

> + 4.3 Threshold notifications:
> +
> + #include <linux/fs_event.h>
> + void fs_event_alloc_space(struct super_block *sb, u64 ncount);
> + void fs_event_free_space(struct super_block *sb, u64 ncount);
> +
> + Each filesystme supporting the threshold notifications should call
> + fs_event_alloc_space/fs_event_free_space respectively whenever the
> + amount of available blocks changes.
> + - sb: the filesystem's super block
> + - ncount: number of blocks being acquired/released

... here.

> + Note that to properly handle the threshold notifications the fs events
> + interface needs to be kept up to date by the filesystems. Each should
> + register fs_trace_operations to enable querying the current number of
> + available blocks.

Have you noticed that the filesystems have percpu counters for
tracking global space usage? There's good reason for that - taking a
spinlock in such a hot accounting path causes severe contention.
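
As a userspace analogy for why per-CPU counters keep this path cheap, the sketch below batches updates in a private delta and touches the shared counter only once per batch. It is an assumption-laden illustration, not the kernel's percpu_counter implementation: struct batched_counter, struct local_delta, BATCH and both helpers are invented names, and each local_delta is assumed to be owned by exactly one thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define BATCH 32	/* assumed batch size, chosen arbitrarily */
static_assert(BATCH > 0, "batch size must be positive");

struct batched_counter {
	_Atomic int64_t global;	/* shared, touched once per batch */
};

struct local_delta {
	int64_t delta;		/* private to one thread, no locking needed */
};

/* Accumulate locally; fold into the shared counter only at the batch limit. */
static void counter_add(struct batched_counter *c, struct local_delta *l,
			int64_t n)
{
	l->delta += n;
	if (l->delta >= BATCH || l->delta <= -BATCH) {
		atomic_fetch_add_explicit(&c->global, l->delta,
					  memory_order_relaxed);
		l->delta = 0;
	}
}

/* Cheap read: may be off by up to BATCH - 1 per outstanding local delta. */
static int64_t counter_read_approx(struct batched_counter *c)
{
	return atomic_load_explicit(&c->global, memory_order_relaxed);
}
```

A per-update spinlock plus threshold comparison, as in the quoted code, gives up exactly this property: every allocation would serialise on one shared cache line.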

> +static void fs_event_send(struct fs_trace_entry *en, unsigned int event_id)
> +{
> +	size_t size = nla_total_size(sizeof(u32)) * 2 +
> +		      nla_total_size(sizeof(u64));
> +
> +	fs_netlink_send_event(size, event_id, create_common_msg, en);
> +}
> +
> +static void fs_event_send_thresh(struct fs_trace_entry *en,
> +				 unsigned int event_id)
> +{
> +	size_t size = nla_total_size(sizeof(u32)) * 2 +
> +		      nla_total_size(sizeof(u64)) * 2;
> +
> +	fs_netlink_send_event(size, event_id, create_thresh_msg, en);
> +}
> +
> +void fs_event_notify(struct super_block *sb, unsigned int event_id)
> +{
> +	struct fs_trace_entry *en;
> +
> +	en = fs_trace_entry_get_rcu(sb);
> +	if (!en)
> +		return;
> +
> +	spin_lock(&en->lock);
> +	if (atomic_read(&en->active) && (en->notify & FS_EVENT_GENERIC))
> +		fs_event_send(en, event_id);
> +	spin_unlock(&en->lock);
> +	fs_trace_entry_put(en);
> +}
> +EXPORT_SYMBOL(fs_event_notify);
> +
> +void fs_event_alloc_space(struct super_block *sb, u64 ncount)
> +{
> +	struct fs_trace_entry *en;
> +	s64 count;
> +
> +	en = fs_trace_entry_get_rcu(sb);
> +	if (!en)
> +		return;

Adds an atomic write to get the trace entry,

> +	spin_lock(&en->lock);

a spin lock to lock the entry,


> +	if (!atomic_read(&en->active) || !(en->notify & FS_EVENT_THRESH))
> +		goto leave;
> +	/*
> +	 * we shouldn't drop below 0 here,
> +	 * unless there is a sync issue somewhere (?)
> +	 */
> +	count = en->th.avail_space - ncount;
> +	en->th.avail_space = count < 0 ? 0 : count;
> +
> +	if (en->th.avail_space > en->th.lrange)
> +		/* Not 'even' close - leave */
> +		goto leave;
> +
> +	if (en->th.avail_space > en->th.urange) {
> +		/* Close enough - the lower range has been reached */
> +		if (!(en->th.state & THRESH_LR_BEYOND)) {
> +			/* Send notification */
> +			fs_event_send_thresh(en, FS_THR_LRBELOW);
> +			en->th.state &= ~THRESH_LR_BELOW;
> +			en->th.state |= THRESH_LR_BEYOND;
> +		}
> +		goto leave;

Then puts the entire netlink send path inside this spinlock, which
includes memory allocation and all sorts of non-filesystem code
paths. And it may be inside critical filesystem locks as well

Apart from the serialisation problem of the locking, adding
memory allocation and the network send path to filesystem code
that is effectively considered "innermost" filesystem code is going
to have all sorts of problems for various filesystems. In the XFS
case, we simply cannot execute this sort of function in the places
where we update global space accounting.

As it is, I think the basic concept of separate tracking of free
space is fundamentally flawed. What I think needs to be done is that
filesystems need access to the thresholds for events, and then the

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-17 Thread Beata Michalska
Hi,

On 06/16/2015 06:21 PM, Al Viro wrote:
> On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
>> Introduce configurable generic interface for file
>> system-wide event notifications, to provide file
>> systems with a common way of reporting any potential
>> issues as they emerge.
>>
>> The notifications are to be issued through generic
>> netlink interface by newly introduced multicast group.
>>
>> Threshold notifications have been included, allowing
>> triggering an event whenever the amount of free space drops
>> below a certain level - or levels to be more precise as two
>> of them are being supported: the lower and the upper range.
>> The notifications work both ways: once the threshold level
>> has been reached, an event shall be generated whenever
>> the number of available blocks goes up again re-activating
>> the threshold.
>>
>> The interface has been exposed through a vfs. Once mounted,
>> it serves as an entry point for the set-up where one can
>> register for particular file system events.
> 
> Hmm...
> 
> 1) what happens if two processes write to that file at the same time,
> trying to create an entry for the same fs?  WARN_ON() and fail for one
> of them if they race?
>

There are some limits here - I admit. The entries in the config file
might be overwritten at any time - there is no support for multiple
config entries for the same mounted fs. This is mainly due to the threshold
notifications: handling potentially numerous threshold limits each time
the number of available blocks changes didn't seem like a good idea.
So this is more like a global config, resembling sysfs fs-related tune options.


> 2) what happens if fs is mounted more than once (e.g. in different
> namespaces, or bound at different mountpoints, or just plain mounted
> several times in different places) and we add an event for each?
> More specifically, what should happen when one of those gets unmounted?
> 

Each write to that file is being handled within the current namespace.
Setting up an entry for a mount point from a different mnt namespace
needs switching to that ns. As for bound mounts: the entry exists
until the mount point it has been registered with is detached.
The events can only be registered for one of the mount points,
as they are tied to the super block - so one cannot have a separate
config entry for each bound mount.


> 3) what's the meaning of ->active?  Is that "fs_drop_trace_entry() hadn't
> been called yet" flag?  Unless I'm misreading it, we can very well get
> explicit removal race with umount, resulting in cleanup_mnt() returning
> from fs_event_mount_dropped() before the first process (i.e. write
> asking to remove that entry) gets around to its deactivate_super(),
> ending up with umount(2) on a filesystem that isn't mounted anywhere
> else reporting success to userland before the actual fs shutdown, which
> is not a nice thing to do...
> 

'active' simply means that the entry for a given mounted fs is still
valid, in the sense that the events are still required: the entry in the
config file has not been removed. When the trace is being removed, its
'active' field gets invalidated to mark that the events for the related
fs are no longer needed. deactivate_super() should get called only once,
dropping the reference acquired while creating the entry
(fs_new_trace_entry).

While in fs_drop_trace_entry, the lock is held (in both cases: unmount and
explicit entry removal). fs_drop_trace_entry will silently skip all
the clean-up if the entry is inactive. I might be missing something here,
though. If so, I would really appreciate some more of your comments.
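
To make the intended "drop only once" behaviour concrete, here is a minimal
userspace sketch of it. It is only a model under stated assumptions, not the
patch code: struct trace_entry, trace_entry_init() and trace_entry_drop() are
hypothetical names, a pthread mutex stands in for the entry lock, and a plain
counter stands in for the superblock reference. It shows only that whichever
path gets there second skips the clean-up; it does not settle whether
umount(2) can return before the first path has released its reference, which
is the ordering questioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the patch's trace entry; not the real struct. */
struct trace_entry {
    pthread_mutex_t lock;
    bool active;    /* true until the entry has been dropped */
    int sb_refs;    /* models the superblock reference taken at creation */
};

static void trace_entry_init(struct trace_entry *en)
{
    pthread_mutex_init(&en->lock, NULL);
    en->active = true;
    en->sb_refs = 1;    /* reference acquired when the entry is created */
}

/*
 * Called from both the unmount path and the explicit-removal path.
 * Returns true only for the caller that actually performed the drop.
 */
static bool trace_entry_drop(struct trace_entry *en)
{
    bool dropped = false;

    pthread_mutex_lock(&en->lock);
    if (en->active) {
        en->active = false;
        en->sb_refs--;  /* models the single deactivate_super() call */
        dropped = true;
    }
    pthread_mutex_unlock(&en->lock);
    return dropped;
}
```

In this model the first caller releases the reference and every later caller
returns false without touching it.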

> 4) test in fs_event_mount_dropped() looks very odd - by that point we
> are absolutely guaranteed to have ->mnt_ns == NULL.  What's that supposed
> to do?
>

I totally missed the fact that the mnt namespace pointer is invalidated
during umount_tree - I cannot really explain how that happened. So thank you
for pointing that out.
This should simply be checking whether it's still valid. This verification is
needed in case the mount that is being detached is not the one the events have
been registered with, as they refer to the fs and not to a particular mount
point. This is the case with mnt namespaces: let's assume one registers for
events for a particular mounted fs in the init mnt namespace, and then a new
mnt namespace is created with the shared mount points being cloned, so the
same mount point exists in both namespaces. Now if this mount point gets
detached, either through umount or while the mnt namespace is being swept
out, the entry in the init mnt namespace should remain untouched. The same
applies the other way round.
 
> 
> Al, trying to figure out the lifetime rules in all of that...
> 

Best Regards
Beata
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-17 Thread Dave Chinner
On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
> Introduce configurable generic interface for file
> system-wide event notifications, to provide file
> systems with a common way of reporting any potential
> issues as they emerge.
>
> The notifications are to be issued through generic
> netlink interface by newly introduced multicast group.
>
> Threshold notifications have been included, allowing
> triggering an event whenever the amount of free space drops
> below a certain level - or levels to be more precise as two
> of them are being supported: the lower and the upper range.
> The notifications work both ways: once the threshold level
> has been reached, an event shall be generated whenever
> the number of available blocks goes up again re-activating
> the threshold.
>
> The interface has been exposed through a vfs. Once mounted,
> it serves as an entry point for the set-up where one can
> register for particular file system events.
>
> Signed-off-by: Beata Michalska <b.michal...@samsung.com>

This has massive scalability problems:

> + 4.3 Threshold notifications:
> +
> + #include <linux/fs_event.h>
> + void fs_event_alloc_space(struct super_block *sb, u64 ncount);
> + void fs_event_free_space(struct super_block *sb, u64 ncount);
> +
> + Each filesystem supporting the threshold notifications should call
> + fs_event_alloc_space/fs_event_free_space respectively whenever the
> + amount of available blocks changes.
> + - sb: the filesystem's super block
> + - ncount: number of blocks being acquired/released

... here.

> + Note that to properly handle the threshold notifications the fs events
> + interface needs to be kept up to date by the filesystems. Each should
> + register fs_trace_operations to enable querying the current number of
> + available blocks.

Have you noticed that the filesystems have percpu counters for
tracking global space usage? There's good reason for that - taking a
spinlock in such a hot accounting path causes severe contention.
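
For readers unfamiliar with the idea, the following is a rough userspace
sketch of batched counting, under stated assumptions. It is not the kernel's
percpu counter implementation: struct batched_counter, NR_SLOTS, BATCH and the
counter_*() helpers are hypothetical names, and an array slot stands in for
per-CPU storage. Each updater touches only its own slot and folds into the
shared total once the local delta reaches the batch size, so the shared
cacheline is written rarely.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_SLOTS 4   /* stands in for the number of CPUs (assumption) */
#define BATCH    32  /* local delta allowed before folding (assumption) */

struct batched_counter {
    atomic_llong global;        /* approximate shared total */
    long long local[NR_SLOTS];  /* per-slot deltas, touched by owner only */
};

static void counter_init(struct batched_counter *c, long long start)
{
    atomic_init(&c->global, start);
    for (int i = 0; i < NR_SLOTS; i++)
        c->local[i] = 0;
}

/* Fast path: update the caller's slot; fold when the batch is reached. */
static void counter_add(struct batched_counter *c, int slot, long long delta)
{
    long long v = c->local[slot] + delta;

    if (v >= BATCH || v <= -BATCH) {
        atomic_fetch_add(&c->global, v);
        v = 0;
    }
    c->local[slot] = v;
}

/* Cheap read of the shared total; may lag behind the per-slot deltas. */
static long long counter_read_approx(struct batched_counter *c)
{
    return atomic_load(&c->global);
}

/* Exact sum; only meaningful while no slot is being updated. */
static long long counter_sum(struct batched_counter *c)
{
    long long sum = atomic_load(&c->global);

    for (int i = 0; i < NR_SLOTS; i++)
        sum += c->local[i];
    return sum;
}
```

The trade-off is that the cheap read is only approximate, which is why a
per-update threshold check against an exact value does not fit this scheme.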

> +static void fs_event_send(struct fs_trace_entry *en, unsigned int event_id)
> +{
> +	size_t size = nla_total_size(sizeof(u32)) * 2 +
> +		      nla_total_size(sizeof(u64));
> +
> +	fs_netlink_send_event(size, event_id, create_common_msg, en);
> +}
> +
> +static void fs_event_send_thresh(struct fs_trace_entry *en,
> +				 unsigned int event_id)
> +{
> +	size_t size = nla_total_size(sizeof(u32)) * 2 +
> +		      nla_total_size(sizeof(u64)) * 2;
> +
> +	fs_netlink_send_event(size, event_id, create_thresh_msg, en);
> +}
> +
> +void fs_event_notify(struct super_block *sb, unsigned int event_id)
> +{
> +	struct fs_trace_entry *en;
> +
> +	en = fs_trace_entry_get_rcu(sb);
> +	if (!en)
> +		return;
> +
> +	spin_lock(&en->lock);
> +	if (atomic_read(&en->active) && (en->notify & FS_EVENT_GENERIC))
> +		fs_event_send(en, event_id);
> +	spin_unlock(&en->lock);
> +	fs_trace_entry_put(en);
> +}
> +EXPORT_SYMBOL(fs_event_notify);
> +
> +void fs_event_alloc_space(struct super_block *sb, u64 ncount)
> +{
> +	struct fs_trace_entry *en;
> +	s64 count;
> +
> +	en = fs_trace_entry_get_rcu(sb);
> +	if (!en)
> +		return;

Adds an atomic write to get the trace entry,

> +	spin_lock(&en->lock);

a spin lock to lock the entry,


> +	if (!atomic_read(&en->active) || !(en->notify & FS_EVENT_THRESH))
> +		goto leave;
> +	/*
> +	 * we shouldn't drop below 0 here,
> +	 * unless there is a sync issue somewhere (?)
> +	 */
> +	count = en->th.avail_space - ncount;
> +	en->th.avail_space = count < 0 ? 0 : count;
> +
> +	if (en->th.avail_space > en->th.lrange)
> +		/* Not 'even' close - leave */
> +		goto leave;
> +
> +	if (en->th.avail_space > en->th.urange) {
> +		/* Close enough - the lower range has been reached */
> +		if (!(en->th.state & THRESH_LR_BEYOND)) {
> +			/* Send notification */
> +			fs_event_send_thresh(en, FS_THR_LRBELOW);
> +			en->th.state &= ~THRESH_LR_BELOW;
> +			en->th.state |= THRESH_LR_BEYOND;
> +		}
> +		goto leave;

Then puts the entire netlink send path inside this spinlock, which
includes memory allocation and all sorts of non-filesystem code
paths. And it may be inside critical filesystem locks as well.

Apart from the serialisation problem of the locking, adding
memory allocation and the network send path to filesystem code
that is effectively considered innermost filesystem code is going
to have all sorts of problems for various filesystems. In the XFS
case, we simply cannot execute this sort of function in the places
where we update global space accounting.
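
As an illustration of just one part of this objection, the sketch below makes
the threshold decision under the lock and performs the send only after the
lock has been dropped. It is a userspace model under stated assumptions:
struct thresh, thresh_init(), thresh_alloc() and send_event() are hypothetical
names, a pthread mutex stands in for the entry spinlock, and a single limit
replaces the patch's lower/upper ranges. It does not remove the lock from the
accounting path, so the contention concern still stands.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-threshold model; the patch tracks two ranges. */
struct thresh {
    pthread_mutex_t lock;
    uint64_t avail;  /* tracked number of available blocks */
    uint64_t limit;  /* notify once avail drops to or below this */
    bool beyond;     /* set once the crossing has been reported */
};

static int sent_count;  /* counts calls to the stand-in send routine */

/* Stand-in for the slow path (allocation plus network send). */
static void send_event(void)
{
    sent_count++;
}

static void thresh_init(struct thresh *t, uint64_t avail, uint64_t limit)
{
    pthread_mutex_init(&t->lock, NULL);
    t->avail = avail;
    t->limit = limit;
    t->beyond = false;
}

static void thresh_alloc(struct thresh *t, uint64_t ncount)
{
    bool notify = false;

    pthread_mutex_lock(&t->lock);
    /* Clamp at zero instead of wrapping the unsigned counter. */
    t->avail = ncount > t->avail ? 0 : t->avail - ncount;
    if (t->avail <= t->limit && !t->beyond) {
        t->beyond = true;  /* report each crossing only once */
        notify = true;
    }
    pthread_mutex_unlock(&t->lock);

    /* The slow path runs with no lock held. */
    if (notify)
        send_event();
}
```

Only the state change happens inside the critical section; whether the caller
is in a context where sending is allowed at all remains a separate question.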

As it is, I think the basic concept of separate tracking of free
space is fundamentally flawed. What I think needs to be done is that
filesystems need access to the thresholds for events, and then the
filesystems call fs_event_send_thresh() themselves from appropriate
contexts 

Re: [RFC v3 1/4] fs: Add generic file system event notifications

2015-06-16 Thread Al Viro
On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
> Introduce configurable generic interface for file
> system-wide event notifications, to provide file
> systems with a common way of reporting any potential
> issues as they emerge.
> 
> The notifications are to be issued through generic
> netlink interface by newly introduced multicast group.
> 
> Threshold notifications have been included, allowing
> triggering an event whenever the amount of free space drops
> below a certain level - or levels to be more precise as two
> of them are being supported: the lower and the upper range.
> The notifications work both ways: once the threshold level
> has been reached, an event shall be generated whenever
> the number of available blocks goes up again re-activating
> the threshold.
> 
> The interface has been exposed through a vfs. Once mounted,
> it serves as an entry point for the set-up where one can
> register for particular file system events.

Hmm...

1) what happens if two processes write to that file at the same time,
trying to create an entry for the same fs?  WARN_ON() and fail for one
of them if they race?

2) what happens if fs is mounted more than once (e.g. in different
namespaces, or bound at different mountpoints, or just plain mounted
several times in different places) and we add an event for each?
More specifically, what should happen when one of those gets unmounted?

3) what's the meaning of ->active?  Is that "fs_drop_trace_entry() hadn't
been called yet" flag?  Unless I'm misreading it, we can very well get
explicit removal race with umount, resulting in cleanup_mnt() returning
from fs_event_mount_dropped() before the first process (i.e. write
asking to remove that entry) gets around to its deactivate_super(),
ending up with umount(2) on a filesystem that isn't mounted anywhere
else reporting success to userland before the actual fs shutdown, which
is not a nice thing to do...

4) test in fs_event_mount_dropped() looks very odd - by that point we
are absolutely guaranteed to have ->mnt_ns == NULL.  What's that supposed
to do?


Al, trying to figure out the lifetime rules in all of that...

