Re: [PATCH v3 2/7] documentation for stats_fs

2020-06-04 Thread Emanuele Giuseppe Esposito

Hi,


+
+The STATS_FS_HIDDEN attribute won't affect the aggregation, it will only
+block the creation of the files.


Why does HIDDEN block the creation of files?  instead of their visibility?


The file itself is used to allow the user to view the content of a 
value. In order to make it hidden, the framework just doesn't create the 
file.

The structure is still present and considered in statsfs, however.

Hidden in this case means not visible at all thus not created, not the 
hidden file concept of dotted files (".filename")





+
+Add values to parent and child (also here order doesn't matter)::
+
+struct kvm *base_ptr = kmalloc(..., sizeof(struct kvm));
+...
+stats_fs_source_add_values(child_source, kvm_stats, base_ptr, 0);
+stats_fs_source_add_values(parent_source, kvm_stats, NULL, 
STATS_FS_HIDDEN);
+
+``child_source`` will be a simple value, since it has a non-NULL base
+pointer, while ``parent_source`` will be an aggregate. During the adding
+phase, also values can optionally be marked as hidden, so that the folder
+and other values can be still shown.
+
+Of course the same ``struct stats_fs_value`` array can be also passed with a
+different base pointer, to represent the same value but in another instance
+of the kvm struct.
+
+Search:
+
+Fetch a value from the child source, returning the value
+pointed by ``(uint64_t *) base_ptr + kvm_stats[0].offset``::
+
+uint64_t ret_child, ret_parent;
+
+stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
+
+Fetch an aggregate value, searching all subsources of ``parent_source`` for
+the specified ``struct stats_fs_value``::
+
+stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
+
+assert(ret_child == ret_parent); // check expected result
+
+To make it more interesting, add another child::
+
+struct stats_fs_source child_source2 = stats_fs_source_create(0, 
"child2");
+
+stats_fs_source_add_subordinate(parent_source, child_source2);
+// now  the structure is parent -> child1
+//  -> child2


Is that the same as parent -> child1 -> child2
?  It could almost be read as
 parent -> child1
 parent -> child2


No the example in the documentation shows the relationship
parent -> child1 and
parent -> child2.
It's not the same as
parent -> child1 -> child2.
In order to do the latter, one would need to do:

stats_fs_source_add_subordinate(parent_source, child_source1);
stats_fs_source_add_subordinate(child_source1, child_source2);

Hope that this clarifies it.



Whichever it is, can you make it more explicit, please?



+
+struct kvm *other_base_ptr = kmalloc(..., sizeof(struct kvm));
+...
+stats_fs_source_add_values(child_source2, kvm_stats, other_base_ptr, 
0);
+
+Note that other_base_ptr points to another instance of kvm, so the struct
+stats_fs_value is the same but the address at which they point is not.
+
+Now get the aggregate value::
+
+uint64_t ret_child, ret_child2, ret_parent;
+
+stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
+stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
+stats_fs_source_get_value(child_source2, &kvm_stats[0], &ret_child2);
+
+assert((ret_child + ret_child2) == ret_parent);
+
+Cleanup::
+
+stats_fs_source_remove_subordinate(parent_source, child_source);
+stats_fs_source_revoke(child_source);
+stats_fs_source_put(child_source);
+
+stats_fs_source_remove_subordinate(parent_source, child_source2);
+stats_fs_source_revoke(child_source2);
+stats_fs_source_put(child_source2);
+
+stats_fs_source_put(parent_source);
+kfree(other_base_ptr);
+kfree(base_ptr);
+
+Calling stats_fs_source_revoke is very important, because it will ensure


stats_fs_source_revoke()


+that stats_fs will not access the data that were passed to
+stats_fs_source_add_value for this source.
+
+Because open files increase the reference count for a stats_fs_source, the
+source can end up living longer than the data that provides the values for
+the source.  Calling stats_fs_source_revoke just before the backing data


 stats_fs_source_revoke()


+is freed avoids accesses to freed data structures. The sources will return
+0.
+
+This is not needed for the parent_source, since it just contains
+aggregates that would be 0 anyways if no matching child value exist.
+
+API Documentation
+=
+
+.. kernel-doc:: include/linux/stats_fs.h
+   :export: fs/stats_fs/*.c
\ No newline at end of file


Please fix that. ^


Thanks for the documentation.



Thank you for the feedback,
Emanuele


Re: [PATCH v3 3/7] kunit: tests for stats_fs API

2020-05-27 Thread Emanuele Giuseppe Esposito




In order to run them, the kernel .config must set CONFIG_KUNIT=y
and a new .kunitconfig file must be created with CONFIG_STATS_FS=y
and CONFIG_STATS_FS_TEST=y



It looks like CONFIG_STATS_FS is built-in, but it exports
much of the functionality you are testing.  However could the
tests also be built as a module (i.e. make CONFIG_STATS_FS_TEST
a tristate variable)? To test this you'd need to specify
CONFIG_KUNIT=m and CONFIG_STATS_FS_TEST=m, and testing would
simply be a case of "modprobe"ing the stats fs module and collecting
results in /sys/kernel/debug/kunit/ (rather
than running kunit.py). Are you relying on unexported internals in
the the tests that would prevent building them as a module?



I haven't checked it yet, but tests should work as separate module.
I will look into it, thanks.

Emanuele



Re: [PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics

2020-05-27 Thread Emanuele Giuseppe Esposito





The file system is mounted on /sys/kernel/stats and would be already used
by kvm. Statsfs was initially introduced by Paolo Bonzini [1].


What's the direct motivation for this work? Moving KVM stats out of
debugfs?


There's many reasons: one of these is not using debugfs for statistics, 
but also (and mainly) to try and have a single tool that automatically 
takes care and displays them, instead of leaving each subsystem "on its 
own".


Sure, everyone gathers and processes stats in different ways, and the 
aim of this tool is to hopefully be extensible enough to cover all needs.

In my experience stats belong in the API used for creating/enumerating
objects, statsfs sounds like going in the exact opposite direction -
creating a parallel structure / hierarchy for exposing stats.


 I know

nothing about KVM but are you sure all the info that has to be exposed
will be stats?I don't understand, what do you mean here?




In case of networking we have the basic stats in sysfs, under the
netdevice's kobject. But since we're not using sysfs much any more
for config, new stats are added in netlink APIs. Again - same APIs
used for enumeration and config.


I don't really know a lot about the networking subsystem, and as it was 
pointed out in another email on patch 7 by Andrew, networking needs to 
atomically gather and display statistics in order to make them 
consistent, and currently this is not supported by stats_fs but could be 
added in future.


In addition, right now it won't work properly if the networking 
namespaces are enabled. That is another issue to take into 
consideration. That's also why I marked patch 7 as "not for merge"


Regarding the config, as I said the idea is to gather multiple 
subsystems' statistics, therefore there wouldn't be a single 
configuration method like in netlink.
For example in kvm there are file descriptors for configuration, and 
creating them requires no privilege, contrary to the network interfaces.


Thank you,
Emanuele



Re: [PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API

2020-05-26 Thread Emanuele Giuseppe Esposito



Hi Andrew


How do you atomically get and display a group of statistics?

If you look at how the netlink socket works, you will see code like:

 do {
 start = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
 rx_packets = cpu_stats->rx_packets;
 rx_bytes = cpu_stats->rx_bytes;

 } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, start));

It will ensure that rx_packets and rx_bytes are consistent with each
other. If the value of the sequence counter changes while inside the
loop, the loop so repeated until it does not change.

In general, hardware counters in NICs are the same.  You tell it to
take a snapshot of the statistics counters, and then read them all
back, to give a consistent view across all the statistics.

I've not looked at this new code in detail, but it looks like you have
one file per statistic, and assume each statistic is independent of
every other statistic. This independence can limit how you use the
values, particularly when debugging. The netlink interface we use does
not have this limitation.


You're right, statistics are treated independently so what you describe 
is currently not supported.


In KVM the utilization is more qualitative, so there isn't such problem.
But as long as the interface is based on file access, the possibility of 
snapshotting might not be useful; however, it could still be considered 
to be added later together with the binary access.


Jonathan, how is your metricfs handling this case?

Thank you,
Emanuele



[PATCH v3 7/7] [not for merge] netstats: example use of stats_fs API

2020-05-26 Thread Emanuele Giuseppe Esposito
Apply stats_fs on the networking statistics subsystem.

Currently it only works with disabled network namespace
(CONFIG_NET_NS=n), because multiple namespaces will have the same
device name under the same root source that will cause a conflict in
stats_fs.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/linux/netdevice.h |  2 ++
 net/Kconfig   |  1 +
 net/core/dev.c| 66 +++
 3 files changed, 69 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 130a668049ab..408c4e7b0e21 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -2117,6 +2118,7 @@ struct net_device {
unsignedwol_enabled:1;
 
struct list_headnet_notifier_list;
+   struct stats_fs_source  *stats_fs_src;
 
 #if IS_ENABLED(CONFIG_MACSEC)
/* MACsec management functions */
diff --git a/net/Kconfig b/net/Kconfig
index df8d8c9bd021..3441d5bb6107 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -8,6 +8,7 @@ menuconfig NET
select NLATTR
select GENERIC_NET_UTILS
select BPF
+   select STATS_FS_API
---help---
  Unless you really know what you are doing, you should say Y here.
  The reason is that some programs need kernel networking support even
diff --git a/net/core/dev.c b/net/core/dev.c
index 522288177bbd..3db48cd1a097 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -142,6 +142,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "net-sysfs.h"
 
@@ -150,6 +151,11 @@
 /* This should be increased if a protocol with a bigger head is added. */
 #define GRO_MAX_HEAD (MAX_HEADER + 128)
 
+#define NETDEV_STAT(str, m, ...)   
\
+   { str, offsetof(struct rtnl_link_stats64, m),   
\
+ &stats_fs_type_netdev_u64,
\
+ STATS_FS_SUM, ## __VA_ARGS__ }
+
 static DEFINE_SPINLOCK(ptype_lock);
 static DEFINE_SPINLOCK(offload_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
@@ -196,6 +202,53 @@ static DEFINE_READ_MOSTLY_HASHTABLE(napi_hash, 8);
 
 static seqcount_t devnet_rename_seq;
 
+static uint64_t stats_fs_get_netdev_u64(struct stats_fs_value *val,
+   void *base)
+{
+   struct net_device *netdev = (struct net_device *)base;
+   struct rtnl_link_stats64 net_stats;
+
+   dev_get_stats(netdev, &net_stats);
+
+   return stats_fs_get_u64(val, &net_stats);
+}
+
+static struct stats_fs_type stats_fs_type_netdev_u64 = {
+   .get = stats_fs_get_netdev_u64,
+   .clear = NULL,
+   .sign = false
+};
+
+static struct stats_fs_source *netdev_root;
+
+static struct stats_fs_value stats_fs_netdev_entries[] = {
+   NETDEV_STAT("rx_packets", rx_packets),
+   NETDEV_STAT("tx_packets", tx_packets),
+   NETDEV_STAT("rx_bytes", rx_bytes),
+   NETDEV_STAT("tx_bytes", tx_bytes),
+   NETDEV_STAT("rx_errors", rx_errors),
+   NETDEV_STAT("tx_errors", tx_errors),
+   NETDEV_STAT("rx_dropped", rx_dropped),
+   NETDEV_STAT("tx_dropped", tx_dropped),
+   NETDEV_STAT("multicast", multicast),
+   NETDEV_STAT("collisions", collisions),
+   NETDEV_STAT("rx_length_errors", rx_length_errors),
+   NETDEV_STAT("rx_over_errors", rx_over_errors),
+   NETDEV_STAT("rx_crc_errors", rx_crc_errors),
+   NETDEV_STAT("rx_frame_errors", rx_frame_errors),
+   NETDEV_STAT("rx_fifo_errors", rx_fifo_errors),
+   NETDEV_STAT("rx_missed_errors", rx_missed_errors),
+   NETDEV_STAT("tx_aborted_errors", tx_aborted_errors),
+   NETDEV_STAT("tx_carrier_errors", tx_carrier_errors),
+   NETDEV_STAT("tx_fifo_errors", tx_fifo_errors),
+   NETDEV_STAT("tx_heartbeat_errors", tx_heartbeat_errors),
+   NETDEV_STAT("tx_window_errors", tx_window_errors),
+   NETDEV_STAT("rx_compressed", rx_compressed),
+   NETDEV_STAT("tx_compressed", tx_compressed),
+   NETDEV_STAT("rx_nohandler", rx_nohandler),
+   { NULL }
+};
+
 static inline void dev_base_seq_inc(struct net *net)
 {
while (++net->dev_base_seq == 0)
@@ -8783,6 +8836,11 @@ static void rollback_registered_many(struct list_head 
*head)
ASSERT_RTNL();
 
list_for_each_entry_safe(dev, tmp, head, unreg_list) {
+   stats_fs_source_remove_subordinate(netdev_root,
+  dev->stats_fs_src);
+   stats_fs_source_revoke(dev->stats_fs_src);
+   stats_fs_source_put(dev->sta

[PATCH v3 6/7] [not for merge] kvm: example of stats_fs_value show function

2020-05-26 Thread Emanuele Giuseppe Esposito
Add an example of the show function using the mp_state value.

mp_state is an enum that represents the VCPU state,
so instead of displaying its integer representation,
the show function takes care of translating the integer into a
more meaningful string representation.

The VCPU status is shown in the kvm//vcpu/mp_state file

Signed-off-by: Emanuele Giuseppe Esposito 
---
 arch/x86/kvm/stats_fs.c | 54 +
 1 file changed, 54 insertions(+)

diff --git a/arch/x86/kvm/stats_fs.c b/arch/x86/kvm/stats_fs.c
index f6edebb9c559..902be18562da 100644
--- a/arch/x86/kvm/stats_fs.c
+++ b/arch/x86/kvm/stats_fs.c
@@ -39,11 +39,65 @@ struct stats_fs_value stats_fs_vcpu_arch_tsc_frac[] = {
{ NULL } /* base is &kvm_tsc_scaling_ratio_frac_bits */
 };
 
+char *stats_fs_vcpu_get_mpstate(uint64_t state)
+{
+   char *state_str;
+
+   state_str = kzalloc(20, GFP_KERNEL);
+   if (!state_str)
+   return ERR_PTR(-ENOMEM);
+
+   switch (state) {
+   case KVM_MP_STATE_RUNNABLE:
+   strcpy(state_str, "RUNNABLE");
+   break;
+   case KVM_MP_STATE_UNINITIALIZED:
+   strcpy(state_str, "UNINITIALIZED");
+   break;
+   case KVM_MP_STATE_INIT_RECEIVED:
+   strcpy(state_str, "INIT_RECEIVED");
+   break;
+   case KVM_MP_STATE_HALTED:
+   strcpy(state_str, "HALTED");
+   break;
+   case KVM_MP_STATE_SIPI_RECEIVED:
+   strcpy(state_str, "SIPI_RECEIVED");
+   break;
+   case KVM_MP_STATE_STOPPED:
+   strcpy(state_str, "STOPPED");
+   break;
+   case KVM_MP_STATE_CHECK_STOP:
+   strcpy(state_str, "CHECK_STOP");
+   break;
+   case KVM_MP_STATE_OPERATING:
+   strcpy(state_str, "OPERATING");
+   break;
+   case KVM_MP_STATE_LOAD:
+   strcpy(state_str, "LOAD");
+   break;
+   default:
+   strcpy(state_str, "UNRECOGNIZED");
+   break;
+   }
+
+   return state_str;
+}
+
+struct stats_fs_value stats_fs_vcpu_mp_state[] = {
+   VCPU_ARCH_STATS_FS("mp_state", kvm_vcpu_arch, mp_state,
+  .type = &stats_fs_type_u32,
+  .show = stats_fs_vcpu_get_mpstate),
+   { NULL }
+};
+
 void kvm_arch_create_vcpu_stats_fs(struct kvm_vcpu *vcpu)
 {
stats_fs_source_add_values(vcpu->stats_fs_src, stats_fs_vcpu_tsc_offset,
   &vcpu->arch, 0);
 
+   stats_fs_source_add_values(vcpu->stats_fs_src, stats_fs_vcpu_mp_state,
+  &vcpu->arch, 0);
+
if (lapic_in_kernel(vcpu))
stats_fs_source_add_values(vcpu->stats_fs_src,
   stats_fs_vcpu_arch_lapic_timer,
-- 
2.25.4



[PATCH v3 5/7] kvm_main: replace debugfs with stats_fs

2020-05-26 Thread Emanuele Giuseppe Esposito
Use stats_fs API instead of debugfs to create sources and add values.

This also requires to change all architecture files to replace the old
debugfs_entries with stats_fs_vcpu_entries and statsfs_vm_entries.

The files/folders name and organization is kept unchanged, and a symlink
in sys/kernel/debugfs/kvm is left for backward compatibility.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 arch/arm64/kvm/Kconfig  |   1 +
 arch/arm64/kvm/guest.c  |   2 +-
 arch/mips/kvm/Kconfig   |   1 +
 arch/mips/kvm/mips.c|   2 +-
 arch/powerpc/kvm/Kconfig|   1 +
 arch/powerpc/kvm/book3s.c   |  12 +-
 arch/powerpc/kvm/booke.c|   8 +-
 arch/s390/kvm/Kconfig   |   1 +
 arch/s390/kvm/kvm-s390.c|  16 +-
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/Kconfig|   1 +
 arch/x86/kvm/Makefile   |   2 +-
 arch/x86/kvm/debugfs.c  |  64 ---
 arch/x86/kvm/stats_fs.c |  60 ++
 arch/x86/kvm/x86.c  |  11 +-
 include/linux/kvm_host.h|  45 ++---
 virt/kvm/arm/arm.c  |   2 +-
 virt/kvm/kvm_main.c | 318 +---
 18 files changed, 161 insertions(+), 388 deletions(-)
 delete mode 100644 arch/x86/kvm/debugfs.c
 create mode 100644 arch/x86/kvm/stats_fs.c

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 449386d76441..f95f6d1c3610 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -23,6 +23,7 @@ config KVM
depends on OF
# for TASKSTATS/TASK_DELAY_ACCT:
depends on NET && MULTIUSER
+   select STATS_FS_API
select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select HAVE_KVM_CPU_RELAX_INTERCEPT
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 8417b200bec9..235ed44e4353 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,7 +29,7 @@
 
 #include "trace.h"
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("halt_successful_poll", halt_successful_poll),
VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index b91d145aa2d5..b19fbc5297b4 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -19,6 +19,7 @@ config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM
depends on MIPS_FP_SUPPORT
+   select STATS_FS_API
select EXPORT_UASM
select PREEMPT_NOTIFIERS
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index fdf1c14d9205..a47d21f35444 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -39,7 +39,7 @@
 #define VECTORSPACING 0x100/* for EI/VI mode */
 #endif
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("wait", wait_exits),
VCPU_STAT("cache", cache_exits),
VCPU_STAT("signal", signal_exits),
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 12885eda324e..6f0675edfe7c 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -19,6 +19,7 @@ if VIRTUALIZATION
 
 config KVM
bool
+   select STATS_FS_API
select PREEMPT_NOTIFIERS
select HAVE_KVM_EVENTFD
select HAVE_KVM_VCPU_ASYNC_IOCTL
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 37508a356f28..e3346b3087d0 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -38,7 +38,7 @@
 
 /* #define EXIT_DEBUG */
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("exits", sum_exits),
VCPU_STAT("mmio", mmio_exits),
VCPU_STAT("sig", signal_exits),
@@ -66,8 +66,14 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
VCPU_STAT("pthru_all", pthru_all),
VCPU_STAT("pthru_host", pthru_host),
VCPU_STAT("pthru_bad_aff", pthru_bad_aff),
-   VM_STAT("largepages_2M", num_2M_pages, .mode = 0444),
-   VM_STAT("largepages_1G", num_1G_pages, .mode = 0444),
+   { NULL }
+};
+
+struct stats_fs_value stats_fs_vm_entries[] = {
+   VM_STAT("largepages_2M", num_2M_pages,
+   .value_flag = STATS_FS_FLOATING_VALUE),
+   VM_STAT("largepages_1G", num_1G_pages,
+   .value_flag = STATS_FS_FLOATING_VALUE),
{ NULL }
 };
 
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c2984cb6dfa7..b14c07786cc8 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -35,7 +35,12 @@
 
 unsigned long kvmppc_b

[PATCH v3 4/7] stats_fs fs: virtual fs to show stats to the end-user

2020-05-26 Thread Emanuele Giuseppe Esposito
Add virtual fs that maps stats_fs sources with directories, and values
(simple or aggregates) to files.

Every time a file is read/cleared, the fs internally invokes the stats_fs
API to get/set the requested value.

Also introduce the optional show function in each value, that allows
to customize how the value is displayed inside a file. This could be
especially useful with enums.

fs/stats_fs/inode.cis pretty much similar to what is done in
fs/debugfs/inode.c, with the exception that the API is only
composed by stats_fs_create_file, stats_fs_create_dir and stats_fs_remove.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/stats_fs/Makefile   |   2 +-
 fs/stats_fs/inode.c| 461 +
 fs/stats_fs/internal.h |  15 ++
 fs/stats_fs/stats_fs.c |  92 +++-
 include/linux/stats_fs.h   |  18 ++
 include/uapi/linux/magic.h |   1 +
 tools/lib/api/fs/fs.c  |  21 ++
 7 files changed, 608 insertions(+), 2 deletions(-)
 create mode 100644 fs/stats_fs/inode.c

diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
index bc59a54d5721..19b7e13f6c3d 100644
--- a/fs/stats_fs/Makefile
+++ b/fs/stats_fs/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-stats_fs-objs  := stats_fs.o
+stats_fs-objs  := inode.o stats_fs.o
 stats_fs-tests-objs:= stats_fs-tests.o
 
 obj-$(CONFIG_STATS_FS) += stats_fs.o
diff --git a/fs/stats_fs/inode.c b/fs/stats_fs/inode.c
new file mode 100644
index ..eaa0a8bc7466
--- /dev/null
+++ b/fs/stats_fs/inode.c
@@ -0,0 +1,461 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  inode.c - part of stats_fs, a tiny little stats_fs file system
+ *
+ *  Copyright (C) 2020 Emanuele Giuseppe Esposito 
+ *  Copyright (C) 2020 Redhat
+ */
+#define pr_fmt(fmt)"stats_fs: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "internal.h"
+
+#define STATS_FS_DEFAULT_MODE  0700
+
+static struct simple_fs stats_fs;
+static bool stats_fs_registered;
+
+struct stats_fs_mount_opts {
+   kuid_t uid;
+   kgid_t gid;
+   umode_t mode;
+};
+
+enum {
+   Opt_uid,
+   Opt_gid,
+   Opt_mode,
+   Opt_err
+};
+
+static const match_table_t tokens = {
+   {Opt_uid, "uid=%u"},
+   {Opt_gid, "gid=%u"},
+   {Opt_mode, "mode=%o"},
+   {Opt_err, NULL}
+};
+
+struct stats_fs_fs_info {
+   struct stats_fs_mount_opts mount_opts;
+};
+
+static int stats_fs_parse_options(char *data, struct stats_fs_mount_opts *opts)
+{
+   substring_t args[MAX_OPT_ARGS];
+   int option;
+   int token;
+   kuid_t uid;
+   kgid_t gid;
+   char *p;
+
+   opts->mode = STATS_FS_DEFAULT_MODE;
+
+   while ((p = strsep(&data, ",")) != NULL) {
+   if (!*p)
+   continue;
+
+   token = match_token(p, tokens, args);
+   switch (token) {
+   case Opt_uid:
+   if (match_int(&args[0], &option))
+   return -EINVAL;
+   uid = make_kuid(current_user_ns(), option);
+   if (!uid_valid(uid))
+   return -EINVAL;
+   opts->uid = uid;
+   break;
+   case Opt_gid:
+   if (match_int(&args[0], &option))
+   return -EINVAL;
+   gid = make_kgid(current_user_ns(), option);
+   if (!gid_valid(gid))
+   return -EINVAL;
+   opts->gid = gid;
+   break;
+   case Opt_mode:
+   if (match_octal(&args[0], &option))
+   return -EINVAL;
+   opts->mode = option & S_IALLUGO;
+   break;
+   /*
+* We might like to report bad mount options here;
+* but traditionally stats_fs has ignored all mount options
+*/
+   }
+   }
+
+   return 0;
+}
+
+static int stats_fs_apply_options(struct super_block *sb)
+{
+   struct stats_fs_fs_info *fsi = sb->s_fs_info;
+   struct inode *inode = d_inode(sb->s_root);
+   struct stats_fs_mount_opts *opts = &fsi->mount_opts;
+
+   inode->i_mode &= ~S_IALLUGO;
+   inode->i_mode |= opts->mode;
+
+   inode->i_uid = opts->uid;
+   inode->i_gid = opts->gid;
+
+   return 0;
+}
+
+static int stats_fs_remount(struct super_block *sb, int *flags, char *data)
+{
+   int err;
+   struct stats_fs_fs_info *fsi = sb->s_fs_info;
+
+   sync_filesystem(sb);
+   err = stats_fs_parse_options(data, &fsi->mount_opts);
+   if (err)
+   goto fail;
+
+   stats_fs_apply_opti

[PATCH v3 3/7] kunit: tests for stats_fs API

2020-05-26 Thread Emanuele Giuseppe Esposito
Add kunit tests to extensively test the stats_fs API functionality.

In order to run them, the kernel .config must set CONFIG_KUNIT=y
and a new .kunitconfig file must be created with CONFIG_STATS_FS=y
and CONFIG_STATS_FS_TEST=y

Tests can be then started by running the following command from the root
directory of the linux kernel source tree:
./tools/testing/kunit/kunit.py run --timeout=30 --jobs=`nproc --all`

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/Kconfig   |6 +
 fs/stats_fs/Makefile |2 +
 fs/stats_fs/stats_fs-tests.c | 1097 ++
 3 files changed, 1105 insertions(+)
 create mode 100644 fs/stats_fs/stats_fs-tests.c

diff --git a/fs/Kconfig b/fs/Kconfig
index 684ad61129ab..02bbb0e4cdf7 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -227,6 +227,12 @@ config STATS_FS
  stats_fs is a virtual file system that provides counters and
  other statistics about the running kernel.
 
+config STATS_FS_TEST
+   bool "Tests for stats_fs"
+   depends on STATS_FS && KUNIT
+   help
+ tests for the stats_fs API.
+
 config STATS_FS_API
bool
imply STATS_FS
diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
index bd988daa4c39..bc59a54d5721 100644
--- a/fs/stats_fs/Makefile
+++ b/fs/stats_fs/Makefile
@@ -1,5 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 stats_fs-objs  := stats_fs.o
+stats_fs-tests-objs:= stats_fs-tests.o
 
 obj-$(CONFIG_STATS_FS) += stats_fs.o
 obj-$(CONFIG_STATS_FS_STUB) += stub.o
+obj-$(CONFIG_STATS_FS_TEST) += stats_fs-tests.o
diff --git a/fs/stats_fs/stats_fs-tests.c b/fs/stats_fs/stats_fs-tests.c
new file mode 100644
index ..bbac133d7fe7
--- /dev/null
+++ b/fs/stats_fs/stats_fs-tests.c
@@ -0,0 +1,1097 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include "internal.h"
+
+#define STATS_FS_STAT(el, x, ...)  
\
+   {  \
+   .name = #x, .offset = offsetof(struct container, el.x),\
+   ##__VA_ARGS__  \
+   }
+
+#define ARR_SIZE(el) ((int)(sizeof(el) / sizeof(struct stats_fs_value) - 1))
+
+struct test_values_struct {
+   uint64_t u64;
+   int32_t s32;
+   bool bo;
+   uint8_t u8;
+   int16_t s16;
+};
+
+struct container {
+   struct test_values_struct vals;
+};
+
+struct stats_fs_value test_values[6] = {
+   STATS_FS_STAT(vals, u64, .type = &stats_fs_type_u64,
+ .aggr_kind = STATS_FS_NONE,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+   STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+ .aggr_kind = STATS_FS_NONE),
+   STATS_FS_STAT(vals, bo, .type = &stats_fs_type_bool,
+ .aggr_kind = STATS_FS_NONE,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+   STATS_FS_STAT(vals, u8, .type = &stats_fs_type_u8,
+ .aggr_kind = STATS_FS_NONE),
+   STATS_FS_STAT(vals, s16, .type = &stats_fs_type_s16,
+ .aggr_kind = STATS_FS_NONE,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+   { NULL },
+};
+
+struct stats_fs_value test_aggr[4] = {
+   STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+ .aggr_kind = STATS_FS_MIN,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+   STATS_FS_STAT(vals, bo, .type = &stats_fs_type_bool,
+ .aggr_kind = STATS_FS_MAX,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+   STATS_FS_STAT(vals, u64, .type = &stats_fs_type_u64,
+ .aggr_kind = STATS_FS_SUM,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+   { NULL },
+};
+
+struct stats_fs_value test_same_name[3] = {
+   STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+ .aggr_kind = STATS_FS_NONE),
+   STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+ .aggr_kind = STATS_FS_MIN),
+   { NULL },
+};
+
+struct stats_fs_value test_all_aggr[6] = {
+   STATS_FS_STAT(vals, s32, .type = &stats_fs_type_s32,
+ .aggr_kind = STATS_FS_MIN),
+   STATS_FS_STAT(vals, bo, .type = &stats_fs_type_bool,
+ .aggr_kind = STATS_FS_COUNT_ZERO,
+ .value_flag = STATS_FS_FLOATING_VALUE),
+   STATS_FS_STAT(vals, u64, .type = &stats_fs_type_u64,
+ .aggr_kind = STATS_FS_SUM),
+   STATS_FS_STAT(vals, u8, .type = &stats_fs_type_u8,
+ .aggr_kind = STATS_FS_AVG,
+ .value_flag = S

[PATCH v3 2/7] documentation for stats_fs

2020-05-26 Thread Emanuele Giuseppe Esposito
Html docs for a complete documentation of the stats_fs API,
filesystem and usage.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 Documentation/filesystems/index.rst|   1 +
 Documentation/filesystems/stats_fs.rst | 222 +
 2 files changed, 223 insertions(+)
 create mode 100644 Documentation/filesystems/stats_fs.rst

diff --git a/Documentation/filesystems/index.rst 
b/Documentation/filesystems/index.rst
index e7b46dac7079..9a46fd851c6e 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -89,6 +89,7 @@ Documentation for filesystem implementations.
relay
romfs
squashfs
+   stats_fs
sysfs
sysv-fs
tmpfs
diff --git a/Documentation/filesystems/stats_fs.rst 
b/Documentation/filesystems/stats_fs.rst
new file mode 100644
index ..292c689ffb98
--- /dev/null
+++ b/Documentation/filesystems/stats_fs.rst
@@ -0,0 +1,222 @@
+
+Stats_FS
+
+
+Stats_fs is a synthetic ram-based virtual filesystem that takes care of
+gathering and displaying statistics for the Linux kernel subsystems.
+
+The motivation for stats_fs comes from the fact that there is no common
+way for Linux kernel subsystems to expose statistics to userspace shared
+throughout the Linux kernel; subsystems have to take care of gathering and
+displaying statistics by themselves, for example in the form of files in
+debugfs.
+
+Allowing each subsystem of the kernel to do so has two disadvantages.
+First, it will introduce redundant code. Second, debugfs is anyway not the
+right place for statistics (for example it is affected by lockdown).
+
+Stats_fs offers a generic and stable API, allowing any kind of
+directory/file organization and supporting multiple kind of aggregations
+(not only sum, but also average, max, min and count_zero) and data types
+(boolean, all unsigned/signed and custom types). The implementation takes
+care of gathering and displaying information at run time; users only need
+to specify the values to be included in each source. Optionally, users can
+also provide a display function for each value, that will take care of
+displaying the provided value in a custom format.
+
+Its main function is to display each statistics as a file in the desired
+folder hierarchy defined through the API. Stats_fs files can be read, and
+possibly cleared if their file mode allows it.
+
+Stats_fs is typically mounted with a command like::
+
+mount -t stats_fs stats_fs /sys/kernel/stats_fs
+
+(Or an equivalent /etc/fstab line).
+
+Stats_fs has two main components: the public API defined by
+include/linux/stats_fs.h, and the virtual file system in
+/sys/kernel/stats.
+
+The API has two main elements, values and sources. Kernel
+subsystems will create a source, add child
+sources/values/aggregates and register it to the root source (that on the
+virtual fs would be /sys/kernel/stats).
+
+The stats_fs API is defined in .
+
+Sources
+Sources are created via ``stats_fs_source_create()``, and each
+source becomes a directory in the file system. Sources form a
+parent-child relationship; root sources are added to the file
+system via ``stats_fs_source_register()``. Therefore each Linux
+subsystem will add its own entry to the root, filesystem similar
+to what it is done in debugfs. Every other source is added to or
+removed from a parent through the
+``stats_fs_source_add_subordinate()`` and
+``stats_fs_source_remove_subordinate()`` APIs. Once a source is
+created and added to the tree (via add_subordinate), it will be
+used to compute aggregate values in the parent source. A source
+can optionally be hidden from the filesystem but still considered
+in the aggregation operations if the corresponding flag is set
+during initialization.
+
+Values
+Values represent quantites that are gathered by the stats_fs user.
+Examples of values include the number of vm exits of a given kind,
+the amount of memory used by some data structure, the length of
+the longest hash table chain, or anything like that. Values are
+defined with the stats_fs_source_add_values function. Each value
+is defined by a ``struct stats_fs_value``; the same
+``stats_fs_value`` can be added to many different sources. A value
+can be considered "simple" if it fetches data from a user-provided
+location, or "aggregate" if it groups all values in the
+subordinate sources that include the same ``stats_fs_value``.
+Values by default are considered to be cumulative, meaning the
+value they represent never decreases, but can also be defined as
+floating if they exibith a different behavior. The main difference
+between these two is reflected into the file permission, since a
+floating value file does not allow the us

[PATCH v3 1/7] stats_fs API: create, add and remove stats_fs sources and values

2020-05-26 Thread Emanuele Giuseppe Esposito
Introduction to the stats_fs API, that allows to easily create, add
and remove stats_fs sources and values. The API allows to easily building
the statistics directory tree to automatically gather them for the linux
kernel. The main functionalities are: create a source, add child
sources/values/aggregates, register it to the root source (that on
the virtual fs would be /sys/kernel/stats), ad perform a search for
a value/aggregate.

Each source and value has an optional flag parameter:
in a value, it represent whether the statistic is cumulative or floating, in a
source whether it should be visible from the filesystem or not.
Defaults are respectively cumulative and visible.
Both flags fields are represented as an uint32_t to offer portability for
future flags.

Each value also takes a struct stats_fs_type pointer that defines
get and clear function for that stat, allowing custom
types handling. The API also  provides default get and clear types for
the supported standard types (stats_fs_type_*).

The API representation is only logical and will be backed up
by a virtual file system in patch 4.
Its usage will be shared between the stats_fs file system
and the end-users like kvm, the former calling it when it needs to
display and clear statistics, the latter to add values and sources.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 MAINTAINERS  |   7 +
 fs/Kconfig   |  14 +
 fs/Makefile  |   1 +
 fs/stats_fs/Makefile |   5 +
 fs/stats_fs/internal.h   |  19 ++
 fs/stats_fs/stats_fs.c   | 552 +++
 fs/stats_fs/stub.c   |  13 +
 include/linux/stats_fs.h | 363 +
 8 files changed, 974 insertions(+)
 create mode 100644 fs/stats_fs/Makefile
 create mode 100644 fs/stats_fs/internal.h
 create mode 100644 fs/stats_fs/stats_fs.c
 create mode 100644 fs/stats_fs/stub.c
 create mode 100644 include/linux/stats_fs.h

diff --git a/MAINTAINERS b/MAINTAINERS
index b816a453b10e..a8403d07cee5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5229,6 +5229,13 @@ F:   include/linux/debugfs.h
 F: include/linux/kobj*
 F: lib/kobj*
 
+STATS_FS
+M: Paolo Bonzini 
+R: Emanuele Giuseppe Esposito 
+S: Supported
+F: include/linux/stats_fs.h
+F: fs/stats_fs
+
 DRIVERS FOR ADAPTIVE VOLTAGE SCALING (AVS)
 M: Kevin Hilman 
 M: Nishanth Menon 
diff --git a/fs/Kconfig b/fs/Kconfig
index f08fbbfafd9a..684ad61129ab 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -221,6 +221,20 @@ config MEMFD_CREATE
 config ARCH_HAS_GIGANTIC_PAGE
bool
 
+config STATS_FS
+   bool "Statistics Filesystem"
+   help
+ stats_fs is a virtual file system that provides counters and
+ other statistics about the running kernel.
+
+config STATS_FS_API
+   bool
+   imply STATS_FS
+
+config STATS_FS_STUB
+   bool
+   default y if STATS_FS_API && !STATS_FS
+
 source "fs/configfs/Kconfig"
 source "fs/efivarfs/Kconfig"
 
diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..91558eca0cf7 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_BEFS_FS)   += befs/
 obj-$(CONFIG_HOSTFS)   += hostfs/
 obj-$(CONFIG_CACHEFILES)   += cachefiles/
 obj-$(CONFIG_DEBUG_FS) += debugfs/
+obj-$(CONFIG_STATS_FS) += stats_fs/
 obj-$(CONFIG_TRACING)  += tracefs/
 obj-$(CONFIG_OCFS2_FS) += ocfs2/
 obj-$(CONFIG_BTRFS_FS) += btrfs/
diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
new file mode 100644
index ..bd988daa4c39
--- /dev/null
+++ b/fs/stats_fs/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+stats_fs-objs  := stats_fs.o
+
+obj-$(CONFIG_STATS_FS) += stats_fs.o
+obj-$(CONFIG_STATS_FS_STUB) += stub.o
diff --git a/fs/stats_fs/internal.h b/fs/stats_fs/internal.h
new file mode 100644
index ..4993afbb1e45
--- /dev/null
+++ b/fs/stats_fs/internal.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _STATS_FS_INTERNAL_H_
+#define _STATS_FS_INTERNAL_H_
+
+#include 
+#include 
+#include 
+#include 
+
+/* values, grouped by base */
+struct stats_fs_value_source {
+   void *base_addr;
+   bool files_created;
+   uint32_t common_flags;
+   struct stats_fs_value *values;
+   struct list_head list_element;
+};
+
+#endif /* _STATS_FS_INTERNAL_H_ */
diff --git a/fs/stats_fs/stats_fs.c b/fs/stats_fs/stats_fs.c
new file mode 100644
index ..b76ee44f6dac
--- /dev/null
+++ b/fs/stats_fs/stats_fs.c
@@ -0,0 +1,552 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "internal.h"
+
+struct stats_fs_aggregate_value {
+   uint64_t sum, min, max;
+   uint32_t count, count_zero;
+};
+
+#define STATS_FS_DEFINE_TYPE_STRUCT(gtype, stype, si)  
\
+   

[PATCH v3 0/7] Statsfs: a new ram-based file system for Linux kernel statistics

2020-05-26 Thread Emanuele Giuseppe Esposito
example of the show function and
patch 7 another real-life example in the networking subsystem.

[1] 
https://lore.kernel.org/kvm/5d6cdcb1-d8ad-7ae6-7351-3544e2fa3...@redhat.com/?fbclid=IwAR18LHJ0PBcXcDaLzILFhHsl3qpT3z2vlG60RnqgbpGYhDv7L43n0ZXJY8M

Signed-off-by: Emanuele Giuseppe Esposito 

v2 -> v3 move kconfig entry in the pseudo filesystem menu, add
documentation, get/clear function for value types, show function,
floating/cumulative and hidden flags. Also added the netstat
example

Emanuele Giuseppe Esposito (7):
  stats_fs API: create, add and remove stats_fs sources and values
  documentation for stats_fs
  kunit: tests for stats_fs API
  stats_fs fs: virtual fs to show stats to the end-user
  kvm_main: replace debugfs with stats_fs
  [not for merge] kvm: example of stats_fs_value show function
  [not for merge] netstats: example use of stats_fs API

 Documentation/filesystems/index.rst|1 +
 Documentation/filesystems/stats_fs.rst |  222 +
 MAINTAINERS|7 +
 arch/arm64/kvm/Kconfig |1 +
 arch/arm64/kvm/guest.c |2 +-
 arch/mips/kvm/Kconfig  |1 +
 arch/mips/kvm/mips.c   |2 +-
 arch/powerpc/kvm/Kconfig   |1 +
 arch/powerpc/kvm/book3s.c  |   12 +-
 arch/powerpc/kvm/booke.c   |8 +-
 arch/s390/kvm/Kconfig  |1 +
 arch/s390/kvm/kvm-s390.c   |   16 +-
 arch/x86/include/asm/kvm_host.h|2 +-
 arch/x86/kvm/Kconfig   |1 +
 arch/x86/kvm/Makefile  |2 +-
 arch/x86/kvm/debugfs.c |   64 --
 arch/x86/kvm/stats_fs.c|  114 +++
 arch/x86/kvm/x86.c |   11 +-
 fs/Kconfig |   20 +
 fs/Makefile|1 +
 fs/stats_fs/Makefile   |7 +
 fs/stats_fs/inode.c|  461 ++
 fs/stats_fs/internal.h |   34 +
 fs/stats_fs/stats_fs-tests.c   | 1097 
 fs/stats_fs/stats_fs.c |  642 ++
 fs/stats_fs/stub.c |   13 +
 include/linux/kvm_host.h   |   45 +-
 include/linux/netdevice.h  |2 +
 include/linux/stats_fs.h   |  381 
 include/uapi/linux/magic.h |1 +
 net/Kconfig|1 +
 net/core/dev.c |   68 ++
 tools/lib/api/fs/fs.c  |   21 +
 virt/kvm/arm/arm.c |2 +-
 virt/kvm/kvm_main.c|  317 +--
 35 files changed, 3193 insertions(+), 388 deletions(-)
 create mode 100644 Documentation/filesystems/stats_fs.rst
 delete mode 100644 arch/x86/kvm/debugfs.c
 create mode 100644 arch/x86/kvm/stats_fs.c
 create mode 100644 fs/stats_fs/Makefile
 create mode 100644 fs/stats_fs/inode.c
 create mode 100644 fs/stats_fs/internal.h
 create mode 100644 fs/stats_fs/stats_fs-tests.c
 create mode 100644 fs/stats_fs/stats_fs.c
 create mode 100644 fs/stats_fs/stub.c
 create mode 100644 include/linux/stats_fs.h

-- 
2.25.4



Re: [PATCH v2 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics

2020-05-11 Thread Emanuele Giuseppe Esposito


On 5/8/20 11:44 AM, Paolo Bonzini wrote:
> So in general I'd say the sources/values model holds up.  We certainly
> want to:
> 
> - switch immediately to callbacks instead of the type constants (so that
> core statsfs code only does signed/unsigned)
> 
> - add a field to distinguish cumulative and floating properties (and use
> it to determine the default file mode)
> 
> - add a new argument to statsfs_create_source and statsfs_create_values
> that makes it not create directories and files respectively
> 
> - add a new API to look for a statsfs_value recursively in all the
> subordinate sources, and pass the source/value pair to a callback
> function; and reimplement recursive aggregation and clear in terms of
> this function.

Ok I will apply this, thank you for all the suggestions. 
I will post the v3 patchset in the next few weeks. 

In the meanwhile, I wrote the documentation you asked (even though it's 
going to change in v3), you can find it here:

https://github.com/esposem/linux/commit/dfa92f270f1aed73d5f3b7f12640b2a1635c711f

Thank you,
Emanuele



Re: [PATCH v2 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics

2020-05-05 Thread Emanuele Giuseppe Esposito




On 5/4/20 11:37 PM, David Rientjes wrote:

On Mon, 4 May 2020, Emanuele Giuseppe Esposito wrote:



In this patch series I introduce statsfs, a synthetic ram-based virtual
filesystem that takes care of gathering and displaying statistics for the
Linux kernel subsystems.



This is exciting, we have been looking in the same area recently.  Adding
Jonathan Adams .

In your diffstat, one thing I notice that is omitted: an update to
Documentation/* :)  Any chance of getting some proposed Documentation/
updates with structure of the fs, the per subsystem breakdown, and best
practices for managing the stats from the kernel level?


Yes, I will write some documentation. Thank you for the suggestion.



Values represent quantites that are gathered by the statsfs user. Examples
of values include the number of vm exits of a given kind, the amount of
memory used by some data structure, the length of the longest hash table
chain, or anything like that. Values are defined with the
statsfs_source_add_values function. Each value is defined by a struct
statsfs_value; the same statsfs_value can be added to many different
sources. A value can be considered "simple" if it fetches data from a
user-provided location, or "aggregate" if it groups all values in the
subordinates sources that include the same statsfs_value.



This seems like it could have a lot of overhead if we wanted to
periodically track the totality of subsystem stats as a form of telemetry
gathering from userspace.  To collect telemetry for 1,000 different stats,
do we need to issue lseek()+read() syscalls for each of them individually
(or, worse, open()+read()+close())?

Any thoughts on how that can be optimized?  A couple of ideas:

  - an interface that allows gathering of all stats for a particular
interface through a single file that would likely be encoded in binary
and the responsibility of userspace to disseminate, or

  - an interface that extends beyond this proposal and allows the reader to
specify which stats they are interested in collecting and then the
kernel will only provide these stats in a well formed structure and
also be binary encoded.


Are you thinking of another file, containing all the stats for the 
directory in binary format?



We've found that the one-file-per-stat method is pretty much a show
stopper from the performance view and we always must execute at least two
syscalls to obtain a single stat.

Since this is becoming a generic API (good!!), maybe we can discuss
possible ways to optimize gathering of stats in mass?


Sure, the idea of a binary format was considered from the beginning in 
[1], and it can be done either together with the current filesystem, or 
as a replacement via different mount options.


Thank you,
Emanuele


[1] 
https://lore.kernel.org/kvm/5d6cdcb1-d8ad-7ae6-7351-3544e2fa3...@redhat.com/?fbclid=IwAR18LHJ0PBcXcDaLzILFhHsl3qpT3z2vlG60RnqgbpGYhDv7L43n0ZXJY8M





Signed-off-by: Emanuele Giuseppe Esposito 

v1->v2 remove unnecessary list_foreach_safe loops, fix wrong indentation,
change statsfs in stats_fs

Emanuele Giuseppe Esposito (5):
   refcount, kref: add dec-and-test wrappers for rw_semaphores
   stats_fs API: create, add and remove stats_fs sources and values
   kunit: tests for stats_fs API
   stats_fs fs: virtual fs to show stats to the end-user
   kvm_main: replace debugfs with stats_fs

  MAINTAINERS |7 +
  arch/arm64/kvm/Kconfig  |1 +
  arch/arm64/kvm/guest.c  |2 +-
  arch/mips/kvm/Kconfig   |1 +
  arch/mips/kvm/mips.c|2 +-
  arch/powerpc/kvm/Kconfig|1 +
  arch/powerpc/kvm/book3s.c   |6 +-
  arch/powerpc/kvm/booke.c|8 +-
  arch/s390/kvm/Kconfig   |1 +
  arch/s390/kvm/kvm-s390.c|   16 +-
  arch/x86/include/asm/kvm_host.h |2 +-
  arch/x86/kvm/Kconfig|1 +
  arch/x86/kvm/Makefile   |2 +-
  arch/x86/kvm/debugfs.c  |   64 --
  arch/x86/kvm/stats_fs.c |   56 ++
  arch/x86/kvm/x86.c  |6 +-
  fs/Kconfig  |   12 +
  fs/Makefile |1 +
  fs/stats_fs/Makefile|6 +
  fs/stats_fs/inode.c |  337 ++
  fs/stats_fs/internal.h  |   35 +
  fs/stats_fs/stats_fs-tests.c| 1088 +++
  fs/stats_fs/stats_fs.c  |  773 ++
  include/linux/kref.h|   11 +
  include/linux/kvm_host.h|   39 +-
  include/linux/refcount.h|2 +
  include/linux/stats_fs.h|  304 +
  include/uapi/linux/magic.h  |1 +
  lib/refcount.c  |   32 +
  tools/lib/api/fs/fs.c   |   21 +
  virt/kvm/arm/arm.c  |2 +-
  virt/kvm/kvm_main.c |  314 ++---
  32 files changed, 2772 insertions(+), 382 deletions(-)
  delete mode 100644 arch/x86/kvm/debugfs.c
  create mode 10

[PATCH v2 5/5] kvm_main: replace debugfs with stats_fs

2020-05-04 Thread Emanuele Giuseppe Esposito
Use stats_fs API instead of debugfs to create sources and add values.

This also requires to change all architecture files to replace the old
debugfs_entries with stats_fs_vcpu_entries and statsfs_vm_entries.

The files/folders name and organization is kept unchanged, and a symlink
in sys/kernel/debugfs/kvm is left for backward compatibility.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 arch/arm64/kvm/Kconfig  |   1 +
 arch/arm64/kvm/guest.c  |   2 +-
 arch/mips/kvm/Kconfig   |   1 +
 arch/mips/kvm/mips.c|   2 +-
 arch/powerpc/kvm/Kconfig|   1 +
 arch/powerpc/kvm/book3s.c   |   6 +-
 arch/powerpc/kvm/booke.c|   8 +-
 arch/s390/kvm/Kconfig   |   1 +
 arch/s390/kvm/kvm-s390.c|  16 +-
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/Kconfig|   1 +
 arch/x86/kvm/Makefile   |   2 +-
 arch/x86/kvm/debugfs.c  |  64 ---
 arch/x86/kvm/stats_fs.c |  56 ++
 arch/x86/kvm/x86.c  |   6 +-
 include/linux/kvm_host.h|  39 +---
 virt/kvm/arm/arm.c  |   2 +-
 virt/kvm/kvm_main.c | 314 
 18 files changed, 142 insertions(+), 382 deletions(-)
 delete mode 100644 arch/x86/kvm/debugfs.c
 create mode 100644 arch/x86/kvm/stats_fs.c

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 449386d76441..8c125387b673 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -23,6 +23,7 @@ config KVM
depends on OF
# for TASKSTATS/TASK_DELAY_ACCT:
depends on NET && MULTIUSER
+   imply STATS_FS
select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select HAVE_KVM_CPU_RELAX_INTERCEPT
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 8417b200bec9..235ed44e4353 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,7 +29,7 @@
 
 #include "trace.h"
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("halt_successful_poll", halt_successful_poll),
VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index b91d145aa2d5..19d14e979e5f 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -19,6 +19,7 @@ config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM
depends on MIPS_FP_SUPPORT
+   imply STATS_FS
select EXPORT_UASM
select PREEMPT_NOTIFIERS
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index fdf1c14d9205..a47d21f35444 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -39,7 +39,7 @@
 #define VECTORSPACING 0x100/* for EI/VI mode */
 #endif
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("wait", wait_exits),
VCPU_STAT("cache", cache_exits),
VCPU_STAT("signal", signal_exits),
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 12885eda324e..feb5e110ebb0 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -19,6 +19,7 @@ if VIRTUALIZATION
 
 config KVM
bool
+   imply STATS_FS
select PREEMPT_NOTIFIERS
select HAVE_KVM_EVENTFD
select HAVE_KVM_VCPU_ASYNC_IOCTL
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 37508a356f28..76222ab148da 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -38,7 +38,7 @@
 
 /* #define EXIT_DEBUG */
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vcpu_entries[] = {
VCPU_STAT("exits", sum_exits),
VCPU_STAT("mmio", mmio_exits),
VCPU_STAT("sig", signal_exits),
@@ -66,6 +66,10 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
VCPU_STAT("pthru_all", pthru_all),
VCPU_STAT("pthru_host", pthru_host),
VCPU_STAT("pthru_bad_aff", pthru_bad_aff),
+   { NULL }
+};
+
+struct stats_fs_value stats_fs_vm_entries[] = {
VM_STAT("largepages_2M", num_2M_pages, .mode = 0444),
VM_STAT("largepages_1G", num_1G_pages, .mode = 0444),
{ NULL }
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c2984cb6dfa7..b14c07786cc8 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -35,7 +35,12 @@
 
 unsigned long kvmppc_booke_handlers;
 
-struct kvm_stats_debugfs_item debugfs_entries[] = {
+struct stats_fs_value stats_fs_vm_entries[] = {
+   VM_STAT("remote_tlb_flush", remote_tlb_flush),
+   { NULL }
+};
+
+struct stats_fs_value stats_fs_vcpu_entries[] = {

[PATCH v2 4/5] stats_fs fs: virtual fs to show stats to the end-user

2020-05-04 Thread Emanuele Giuseppe Esposito
Add virtual fs that maps stats_fs sources with directories, and values
(simple or aggregates) to files.

Every time a file is read/cleared, the fs internally invokes the stats_fs
API to get/set the requested value.

fs/stats_fs/inode.c is pretty much similar to what is done in
fs/debugfs/inode.c, with the exception that the API is only
composed by stats_fs_create_file, stats_fs_create_dir and stats_fs_remove.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/stats_fs/Makefile   |   2 +-
 fs/stats_fs/inode.c| 337 +
 fs/stats_fs/internal.h |  15 ++
 fs/stats_fs/stats_fs.c | 163 ++
 include/linux/stats_fs.h   |  15 ++
 include/uapi/linux/magic.h |   1 +
 tools/lib/api/fs/fs.c  |  21 +++
 7 files changed, 553 insertions(+), 1 deletion(-)
 create mode 100644 fs/stats_fs/inode.c

diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
index 9db130fac6b6..ac12c27545f6 100644
--- a/fs/stats_fs/Makefile
+++ b/fs/stats_fs/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-stats_fs-objs  := stats_fs.o
+stats_fs-objs  := inode.o stats_fs.o
 stats_fs-tests-objs:= stats_fs-tests.o
 
 obj-$(CONFIG_STATS_FS) += stats_fs.o
diff --git a/fs/stats_fs/inode.c b/fs/stats_fs/inode.c
new file mode 100644
index ..865ee91656ba
--- /dev/null
+++ b/fs/stats_fs/inode.c
@@ -0,0 +1,337 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  inode.c - part of stats_fs, a tiny little stats_fs file system
+ *
+ *  Copyright (C) 2020 Emanuele Giuseppe Esposito 
+ *  Copyright (C) 2020 Redhat
+ */
+#define pr_fmt(fmt)"stats_fs: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "internal.h"
+
+#define STATS_FS_DEFAULT_MODE  0700
+
+static struct simple_fs stats_fs;
+static bool stats_fs_registered;
+
+struct stats_fs_mount_opts {
+   kuid_t uid;
+   kgid_t gid;
+   umode_t mode;
+};
+
+enum {
+   Opt_uid,
+   Opt_gid,
+   Opt_mode,
+   Opt_err
+};
+
+static const match_table_t tokens = {
+   {Opt_uid, "uid=%u"},
+   {Opt_gid, "gid=%u"},
+   {Opt_mode, "mode=%o"},
+   {Opt_err, NULL}
+};
+
+struct stats_fs_fs_info {
+   struct stats_fs_mount_opts mount_opts;
+};
+
+static int stats_fs_parse_options(char *data, struct stats_fs_mount_opts *opts)
+{
+   substring_t args[MAX_OPT_ARGS];
+   int option;
+   int token;
+   kuid_t uid;
+   kgid_t gid;
+   char *p;
+
+   opts->mode = STATS_FS_DEFAULT_MODE;
+
+   while ((p = strsep(&data, ",")) != NULL) {
+   if (!*p)
+   continue;
+
+   token = match_token(p, tokens, args);
+   switch (token) {
+   case Opt_uid:
+   if (match_int(&args[0], &option))
+   return -EINVAL;
+   uid = make_kuid(current_user_ns(), option);
+   if (!uid_valid(uid))
+   return -EINVAL;
+   opts->uid = uid;
+   break;
+   case Opt_gid:
+   if (match_int(&args[0], &option))
+   return -EINVAL;
+   gid = make_kgid(current_user_ns(), option);
+   if (!gid_valid(gid))
+   return -EINVAL;
+   opts->gid = gid;
+   break;
+   case Opt_mode:
+   if (match_octal(&args[0], &option))
+   return -EINVAL;
+   opts->mode = option & S_IALLUGO;
+   break;
+   /*
+* We might like to report bad mount options here;
+* but traditionally stats_fs has ignored all mount options
+*/
+   }
+   }
+
+   return 0;
+}
+
+static int stats_fs_apply_options(struct super_block *sb)
+{
+   struct stats_fs_fs_info *fsi = sb->s_fs_info;
+   struct inode *inode = d_inode(sb->s_root);
+   struct stats_fs_mount_opts *opts = &fsi->mount_opts;
+
+   inode->i_mode &= ~S_IALLUGO;
+   inode->i_mode |= opts->mode;
+
+   inode->i_uid = opts->uid;
+   inode->i_gid = opts->gid;
+
+   return 0;
+}
+
+static int stats_fs_remount(struct super_block *sb, int *flags, char *data)
+{
+   int err;
+   struct stats_fs_fs_info *fsi = sb->s_fs_info;
+
+   sync_filesystem(sb);
+   err = stats_fs_parse_options(data, &fsi->mount_opts);
+   if (err)
+   goto fail;
+
+   stats_fs_apply_options(sb);
+
+fail:
+   return err;
+}
+
+static int stats_fs_show_options(struct seq_file *m, struct dentry *root)
+{
+   struct stat

[PATCH v2 3/5] kunit: tests for stats_fs API

2020-05-04 Thread Emanuele Giuseppe Esposito
Add kunit tests to extensively test the stats_fs API functionality.

In order to run them, the kernel .config must set CONFIG_KUNIT=y
and a new .kunitconfig file must be created with CONFIG_STATS_FS=y
and CONFIG_STATS_FS_TEST=y

Tests can be then started by running the following command from the root
directory of the linux kernel source tree:
./tools/testing/kunit/kunit.py run --timeout=30 --jobs=`nproc --all`

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/Kconfig   |6 +
 fs/stats_fs/Makefile |2 +
 fs/stats_fs/stats_fs-tests.c | 1088 ++
 3 files changed, 1096 insertions(+)
 create mode 100644 fs/stats_fs/stats_fs-tests.c

diff --git a/fs/Kconfig b/fs/Kconfig
index 1b0de0f19e96..0844e8defd22 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -334,4 +334,10 @@ config STATS_FS
  stats_fs is a virtual file system that provides counters and
  other statistics about the running kernel.
 
+config STATS_FS_TEST
+   bool "Tests for stats_fs"
+   depends on STATS_FS && KUNIT
+   help
+ tests for the stats_fs API.
+
 endmenu
diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
index 94fe52d590d5..9db130fac6b6 100644
--- a/fs/stats_fs/Makefile
+++ b/fs/stats_fs/Makefile
@@ -1,4 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 stats_fs-objs  := stats_fs.o
+stats_fs-tests-objs:= stats_fs-tests.o
 
 obj-$(CONFIG_STATS_FS) += stats_fs.o
+obj-$(CONFIG_STATS_FS_TEST) += stats_fs-tests.o
diff --git a/fs/stats_fs/stats_fs-tests.c b/fs/stats_fs/stats_fs-tests.c
new file mode 100644
index ..46c3fb510ee9
--- /dev/null
+++ b/fs/stats_fs/stats_fs-tests.c
@@ -0,0 +1,1088 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include "internal.h"
+
+#define STATS_FS_STAT(el, x, ...)  
\
+   {  \
+   .name = #x, .offset = offsetof(struct container, el.x),\
+   ##__VA_ARGS__  \
+   }
+
+#define ARR_SIZE(el) ((int)(sizeof(el) / sizeof(struct stats_fs_value) - 1))
+
+struct test_values_struct {
+   uint64_t u64;
+   int32_t s32;
+   bool bo;
+   uint8_t u8;
+   int16_t s16;
+};
+
+struct container {
+   struct test_values_struct vals;
+};
+
+struct stats_fs_value test_values[6] = {
+   STATS_FS_STAT(vals, u64, .type = STATS_FS_U64,
+ .aggr_kind = STATS_FS_NONE, .mode = 0),
+   STATS_FS_STAT(vals, s32, .type = STATS_FS_S32,
+ .aggr_kind = STATS_FS_NONE, .mode = 0),
+   STATS_FS_STAT(vals, bo, .type = STATS_FS_BOOL,
+ .aggr_kind = STATS_FS_NONE, .mode = 0),
+   STATS_FS_STAT(vals, u8, .type = STATS_FS_U8, .aggr_kind = STATS_FS_NONE,
+ .mode = 0),
+   STATS_FS_STAT(vals, s16, .type = STATS_FS_S16,
+ .aggr_kind = STATS_FS_NONE, .mode = 0),
+   { NULL },
+};
+
+struct stats_fs_value test_aggr[4] = {
+   STATS_FS_STAT(vals, s32, .type = STATS_FS_S32,
+ .aggr_kind = STATS_FS_MIN, .mode = 0),
+   STATS_FS_STAT(vals, bo, .type = STATS_FS_BOOL,
+ .aggr_kind = STATS_FS_MAX, .mode = 0),
+   STATS_FS_STAT(vals, u64, .type = STATS_FS_U64,
+ .aggr_kind = STATS_FS_SUM, .mode = 0),
+   { NULL },
+};
+
+struct stats_fs_value test_same_name[3] = {
+   STATS_FS_STAT(vals, s32, .type = STATS_FS_S32,
+ .aggr_kind = STATS_FS_NONE, .mode = 0),
+   STATS_FS_STAT(vals, s32, .type = STATS_FS_S32,
+ .aggr_kind = STATS_FS_MIN, .mode = 0),
+   { NULL },
+};
+
+struct stats_fs_value test_all_aggr[6] = {
+   STATS_FS_STAT(vals, s32, .type = STATS_FS_S32,
+ .aggr_kind = STATS_FS_MIN, .mode = 0),
+   STATS_FS_STAT(vals, bo, .type = STATS_FS_BOOL,
+ .aggr_kind = STATS_FS_COUNT_ZERO, .mode = 0),
+   STATS_FS_STAT(vals, u64, .type = STATS_FS_U64,
+ .aggr_kind = STATS_FS_SUM, .mode = 0),
+   STATS_FS_STAT(vals, u8, .type = STATS_FS_U8, .aggr_kind = STATS_FS_AVG,
+ .mode = 0),
+   STATS_FS_STAT(vals, s16, .type = STATS_FS_S16,
+ .aggr_kind = STATS_FS_MAX, .mode = 0),
+   { NULL },
+};
+
+#define def_u64 ((uint64_t)64)
+
+#define def_val_s32 ((int32_t)S32_MIN)
+#define def_val_bool ((bool)true)
+#define def_val_u8 ((uint8_t)127)
+#define def_val_s16 ((int16_t)1)
+
+#define def_val2_s32 ((int32_t)S16_MAX)
+#define def_val2_bool ((bool)false)
+#define def_val2_u8 ((uint8_t)255)
+#define def_val2_s16 ((int16_t)-2)
+
+struct container cont = {
+   .vals = {
+ 

[PATCH v2 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics

2020-05-04 Thread Emanuele Giuseppe Esposito
There is currently no common way for Linux kernel subsystems to expose
statistics to userspace shared throughout the Linux kernel; subsystems
have to take care of gathering and displaying statistics by themselves,
for example in the form of files in debugfs. For example KVM has its own
code section that takes care of this in virt/kvm/kvm_main.c, where it sets
up debugfs handlers for displaying values and aggregating them from
various subfolders to obtain information about the system state (i.e.
displaying the total number of exits, calculated by summing all exits of
all cpus of all running virtual machines).

Allowing each section of the kernel to do so has two disadvantages. First,
it will introduce redundant code. Second, debugfs is anyway not the right
place for statistics (for example it is affected by lockdown)

In this patch series I introduce statsfs, a synthetic ram-based virtual
filesystem that takes care of gathering and displaying statistics for the
Linux kernel subsystems.

The file system is mounted on /sys/kernel/stats and would be already used
by kvm. Statsfs was initially introduced by Paolo Bonzini [1].

Statsfs offers a generic and stable API, allowing any kind of
directory/file organization and supporting multiple kind of aggregations
(not only sum, but also average, max, min and count_zero) and data types
(all unsigned and signed types plus boolean). The implementation, which is
a generalization of KVM’s debugfs statistics code, takes care of gathering
and displaying information at run time; users only need to specify the
values to be included in each source.

Statsfs would also be a different mountpoint from debugfs, and would not
suffer from limited access due to the security lock down patches. Its main
function is to display each statistics as a file in the desired folder
hierarchy defined through the API. Statsfs files can be read, and possibly
cleared if their file mode allows it.

Statsfs has two main components: the public API defined by
include/linux/statsfs.h, and the virtual file system which should end up
in /sys/kernel/stats.

The API has two main elements, values and sources. Kernel subsystems like
KVM can use the API to create a source, add child
sources/values/aggregates and register it to the root source (that on the
virtual fs would be /sys/kernel/statsfs).

Sources are created via statsfs_source_create(), and each source becomes a
directory in the file system. Sources form a parent-child relationship;
root sources are added to the file system via statsfs_source_register().
Every other source is added to or removed from a parent through the
statsfs_source_add_subordinate and statsfs_source_remote_subordinate APIs.
Once a source is created and added to the tree (via add_subordinate), it
will be used to compute aggregate values in the parent source.

Values represent quantites that are gathered by the statsfs user. Examples
of values include the number of vm exits of a given kind, the amount of
memory used by some data structure, the length of the longest hash table
chain, or anything like that. Values are defined with the
statsfs_source_add_values function. Each value is defined by a struct
statsfs_value; the same statsfs_value can be added to many different
sources. A value can be considered "simple" if it fetches data from a
user-provided location, or "aggregate" if it groups all values in the
subordinates sources that include the same statsfs_value.

For more information, please consult the kerneldoc documentation in patch
2 and the sample uses in the kunit tests and in KVM.

This series of patches is based on my previous series "libfs: group and
simplify linux fs code" and the single patch sent to kvm "kvm_host: unify
VM_STAT and VCPU_STAT definitions in a single place". The former
simplifies code duplicated in debugfs and tracefs (from which statsfs is
based on), the latter groups all macros definition for statistics in kvm
in a single common file shared by all architectures.

Patch 1 adds a new refcount and kref destructor wrappers that take a
semaphore, as those are used later by statsfs. Patch 2 introduces the
statsfs API, patch 3 provides extensive tests that can also be used as
example on how to use the API and patch 4 adds the file system support.
Finally, patch 5 provides a real-life example of statsfs usage in KVM.

[1] 
https://lore.kernel.org/kvm/5d6cdcb1-d8ad-7ae6-7351-3544e2fa3...@redhat.com/?fbclid=IwAR18LHJ0PBcXcDaLzILFhHsl3qpT3z2vlG60RnqgbpGYhDv7L43n0ZXJY8M

Signed-off-by: Emanuele Giuseppe Esposito 

v1->v2 remove unnecessary list_foreach_safe loops, fix wrong indentation,
change statsfs in stats_fs

Emanuele Giuseppe Esposito (5):
  refcount, kref: add dec-and-test wrappers for rw_semaphores
  stats_fs API: create, add and remove stats_fs sources and values
  kunit: tests for stats_fs API
  stats_fs fs: virtual fs to show stats to the end-user
  kvm_main: replace debugfs with stats_fs

 MAINTAINERS  

[PATCH v2 2/5] stats_fs API: create, add and remove stats_fs sources and values

2020-05-04 Thread Emanuele Giuseppe Esposito
Introduction to the stats_fs API, that allows to easily create, add
and remove stats_fs sources and values. The API allows to easily building
the statistics directory tree to automatically gather them for the linux
kernel. The main functionalities are: create a source, add child
sources/values/aggregates, register it to the root source (that on
the virtual fs would be /sys/kernel/statsfs), ad perform a search for
a value/aggregate.

This allows creating any kind of source tree, making it more flexible
also to future readjustments.

The API representation is only logical and will be backed up
by a virtual file system in patch 4.
Its usage will be shared between the stats_fs file system
and the end-users like kvm, the former calling it when it needs to
display and clear statistics, the latter to add values and sources.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 MAINTAINERS  |   7 +
 fs/Kconfig   |   6 +
 fs/Makefile  |   1 +
 fs/stats_fs/Makefile |   4 +
 fs/stats_fs/internal.h   |  20 ++
 fs/stats_fs/stats_fs.c   | 610 +++
 include/linux/stats_fs.h | 289 +++
 7 files changed, 937 insertions(+)
 create mode 100644 fs/stats_fs/Makefile
 create mode 100644 fs/stats_fs/internal.h
 create mode 100644 fs/stats_fs/stats_fs.c
 create mode 100644 include/linux/stats_fs.h

diff --git a/MAINTAINERS b/MAINTAINERS
index b816a453b10e..a8403d07cee5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5229,6 +5229,13 @@ F:   include/linux/debugfs.h
 F: include/linux/kobj*
 F: lib/kobj*
 
+STATS_FS
+M: Paolo Bonzini 
+R: Emanuele Giuseppe Esposito 
+S: Supported
+F: include/linux/stats_fs.h
+F: fs/stats_fs
+
 DRIVERS FOR ADAPTIVE VOLTAGE SCALING (AVS)
 M: Kevin Hilman 
 M: Nishanth Menon 
diff --git a/fs/Kconfig b/fs/Kconfig
index f08fbbfafd9a..1b0de0f19e96 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -328,4 +328,10 @@ source "fs/unicode/Kconfig"
 config IO_WQ
bool
 
+config STATS_FS
+   bool "Statistics Filesystem"
+   help
+ stats_fs is a virtual file system that provides counters and
+ other statistics about the running kernel.
+
 endmenu
diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..91558eca0cf7 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_BEFS_FS)   += befs/
 obj-$(CONFIG_HOSTFS)   += hostfs/
 obj-$(CONFIG_CACHEFILES)   += cachefiles/
 obj-$(CONFIG_DEBUG_FS) += debugfs/
+obj-$(CONFIG_STATS_FS) += stats_fs/
 obj-$(CONFIG_TRACING)  += tracefs/
 obj-$(CONFIG_OCFS2_FS) += ocfs2/
 obj-$(CONFIG_BTRFS_FS) += btrfs/
diff --git a/fs/stats_fs/Makefile b/fs/stats_fs/Makefile
new file mode 100644
index ..94fe52d590d5
--- /dev/null
+++ b/fs/stats_fs/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+stats_fs-objs  := stats_fs.o
+
+obj-$(CONFIG_STATS_FS) += stats_fs.o
diff --git a/fs/stats_fs/internal.h b/fs/stats_fs/internal.h
new file mode 100644
index ..ddf262a60736
--- /dev/null
+++ b/fs/stats_fs/internal.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _STATS_FS_INTERNAL_H_
+#define _STATS_FS_INTERNAL_H_
+
+#include 
+#include 
+#include 
+#include 
+
+/* values, grouped by base */
+struct stats_fs_value_source {
+   void *base_addr;
+   bool files_created;
+   struct stats_fs_value *values;
+   struct list_head list_element;
+};
+
+int stats_fs_val_get_mode(struct stats_fs_value *val);
+
+#endif /* _STATS_FS_INTERNAL_H_ */
diff --git a/fs/stats_fs/stats_fs.c b/fs/stats_fs/stats_fs.c
new file mode 100644
index ..b63de12769e2
--- /dev/null
+++ b/fs/stats_fs/stats_fs.c
@@ -0,0 +1,610 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "internal.h"
+
+struct stats_fs_aggregate_value {
+   uint64_t sum, min, max;
+   uint32_t count, count_zero;
+};
+
+static int is_val_signed(struct stats_fs_value *val)
+{
+   return val->type & STATS_FS_SIGN;
+}
+
+int stats_fs_val_get_mode(struct stats_fs_value *val)
+{
+   return val->mode ? val->mode : 0644;
+}
+
+static struct stats_fs_value *find_value(struct stats_fs_value_source *src,
+struct stats_fs_value *val)
+{
+   struct stats_fs_value *entry;
+
+   for (entry = src->values; entry->name; entry++) {
+   if (entry == val)
+   return entry;
+   }
+   return NULL;
+}
+
+static struct stats_fs_value *
+search_value_in_source(struct stats_fs_source *src, struct stats_fs_value *arg,
+  struct stats_fs_value_source **val_src)
+{
+   struct stats_fs_value *entry;
+   struct stats_fs_value_source *src_entry;
+
+   list_for_each_entry (src_entry, &sr

[PATCH v2 1/5] refcount, kref: add dec-and-test wrappers for rw_semaphores

2020-05-04 Thread Emanuele Giuseppe Esposito
Similar to the existing functions that take a mutex or spinlock if and
only if a reference count is decremented to zero, these new function
take an rwsem for writing just before the refcount reaches 0 (and
call a user-provided function in the case of kref_put_rwsem).

These will be used for stats_fs_source data structures, which are
protected by an rw_semaphore to allow concurrent sysfs reads.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/linux/kref.h | 11 +++
 include/linux/refcount.h |  2 ++
 lib/refcount.c   | 32 
 3 files changed, 45 insertions(+)

diff --git a/include/linux/kref.h b/include/linux/kref.h
index d32e21a2538c..2dc935445f45 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -79,6 +79,17 @@ static inline int kref_put_mutex(struct kref *kref,
return 0;
 }
 
+static inline int kref_put_rwsem(struct kref *kref,
+void (*release)(struct kref *kref),
+struct rw_semaphore *rwsem)
+{
+   if (refcount_dec_and_down_write(&kref->refcount, rwsem)) {
+   release(kref);
+   return 1;
+   }
+   return 0;
+}
+
 static inline int kref_put_lock(struct kref *kref,
void (*release)(struct kref *kref),
spinlock_t *lock)
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 0e3ee25eb156..a9d5038aec9a 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -99,6 +99,7 @@
 #include 
 
 struct mutex;
+struct rw_semaphore;
 
 /**
  * struct refcount_t - variant of atomic_t specialized for reference counts
@@ -313,6 +314,7 @@ static inline void refcount_dec(refcount_t *r)
 extern __must_check bool refcount_dec_if_one(refcount_t *r);
 extern __must_check bool refcount_dec_not_one(refcount_t *r);
 extern __must_check bool refcount_dec_and_mutex_lock(refcount_t *r, struct 
mutex *lock);
+extern __must_check bool refcount_dec_and_down_write(refcount_t *r, struct 
rw_semaphore *rwsem);
 extern __must_check bool refcount_dec_and_lock(refcount_t *r, spinlock_t 
*lock);
 extern __must_check bool refcount_dec_and_lock_irqsave(refcount_t *r,
   spinlock_t *lock,
diff --git a/lib/refcount.c b/lib/refcount.c
index ebac8b7d15a7..03e113e1b43a 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -4,6 +4,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -94,6 +95,37 @@ bool refcount_dec_not_one(refcount_t *r)
 }
 EXPORT_SYMBOL(refcount_dec_not_one);
 
+/**
+ * refcount_dec_and_down_write - return holding rwsem for writing if able to 
decrement
+ *   refcount to 0
+ * @r: the refcount
+ * @lock: the mutex to be locked
+ *
+ * Similar to atomic_dec_and_mutex_lock(), it will WARN on underflow and fail
+ * to decrement when saturated at REFCOUNT_SATURATED.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides a control dependency such that free() must come after.
+ * See the comment on top.
+ *
+ * Return: true and hold rwsem for writing if able to decrement refcount to 0, 
false
+ * otherwise
+ */
+bool refcount_dec_and_down_write(refcount_t *r, struct rw_semaphore *lock)
+{
+   if (refcount_dec_not_one(r))
+   return false;
+
+   down_write(lock);
+   if (!refcount_dec_and_test(r)) {
+   up_write(lock);
+   return false;
+   }
+
+   return true;
+}
+EXPORT_SYMBOL(refcount_dec_and_down_write);
+
 /**
  * refcount_dec_and_mutex_lock - return holding mutex if able to decrement
  *   refcount to 0
-- 
2.25.2



[PATCH v2 7/7] tracefs: switch to simplefs inode creation API

2020-04-21 Thread Emanuele Giuseppe Esposito
There is no semantic change intended; the code in the libfs.c
functions in fact was derived from debugfs and tracefs code.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/tracefs/inode.c | 86 --
 1 file changed, 7 insertions(+), 79 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 370eb38ff1ad..bceaa4f45da2 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -308,57 +308,6 @@ static struct file_system_type trace_fs_type = {
 };
 MODULE_ALIAS_FS("tracefs");
 
-static struct dentry *start_creating(const char *name, struct dentry *parent)
-{
-   struct dentry *dentry;
-   int error;
-
-   pr_debug("tracefs: creating file '%s'\n",name);
-
-   error = simple_pin_fs(&tracefs, &trace_fs_type);
-   if (error)
-   return ERR_PTR(error);
-
-   /* If the parent is not specified, we create it in the root.
-* We need the root dentry to do this, which is in the super
-* block. A pointer to that is in the struct vfsmount that we
-* have around.
-*/
-   if (!parent)
-   parent = tracefs.mount->mnt_root;
-
-   inode_lock(parent->d_inode);
-   if (unlikely(IS_DEADDIR(parent->d_inode)))
-   dentry = ERR_PTR(-ENOENT);
-   else
-   dentry = lookup_one_len(name, parent, strlen(name));
-   if (!IS_ERR(dentry) && dentry->d_inode) {
-   dput(dentry);
-   dentry = ERR_PTR(-EEXIST);
-   }
-
-   if (IS_ERR(dentry)) {
-   inode_unlock(parent->d_inode);
-   simple_release_fs(&tracefs);
-   }
-
-   return dentry;
-}
-
-static struct dentry *failed_creating(struct dentry *dentry)
-{
-   inode_unlock(dentry->d_parent->d_inode);
-   dput(dentry);
-   simple_release_fs(&tracefs);
-   return NULL;
-}
-
-static struct dentry *end_creating(struct dentry *dentry)
-{
-   inode_unlock(dentry->d_parent->d_inode);
-   return dentry;
-}
-
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
@@ -395,49 +344,28 @@ struct dentry *tracefs_create_file(const char *name, 
umode_t mode,
if (security_locked_down(LOCKDOWN_TRACEFS))
return NULL;
 
-   if (!(mode & S_IFMT))
-   mode |= S_IFREG;
-   BUG_ON(!S_ISREG(mode));
-   dentry = start_creating(name, parent);
-
+   dentry = simplefs_create_file(&tracefs, &trace_fs_type,
+ name, mode, parent, data, &inode);
if (IS_ERR(dentry))
return NULL;
 
-   inode = tracefs_get_inode(dentry->d_sb);
-   if (unlikely(!inode))
-   return failed_creating(dentry);
-
-   inode->i_mode = mode;
inode->i_fop = fops ? fops : &tracefs_file_operations;
-   inode->i_private = data;
-   d_instantiate(dentry, inode);
-   fsnotify_create(dentry->d_parent->d_inode, dentry);
-   return end_creating(dentry);
+   return simplefs_finish_dentry(dentry, inode);
 }
 
 static struct dentry *__create_dir(const char *name, struct dentry *parent,
   const struct inode_operations *ops)
 {
-   struct dentry *dentry = start_creating(name, parent);
+   struct dentry *dentry;
struct inode *inode;
 
+   dentry = simplefs_create_dir(&tracefs, &trace_fs_type,
+name, 0755, parent, &inode);
if (IS_ERR(dentry))
return NULL;
 
-   inode = tracefs_get_inode(dentry->d_sb);
-   if (unlikely(!inode))
-   return failed_creating(dentry);
-
-   inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
inode->i_op = ops;
-   inode->i_fop = &simple_dir_operations;
-
-   /* directory inodes start off with i_nlink == 2 (for "." entry) */
-   inc_nlink(inode);
-   d_instantiate(dentry, inode);
-   inc_nlink(dentry->d_parent->d_inode);
-   fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
-   return end_creating(dentry);
+   return simplefs_finish_dentry(dentry, inode);
 }
 
 /**
-- 
2.25.2



[PATCH v2 6/7] debugfs: switch to simplefs inode creation API

2020-04-21 Thread Emanuele Giuseppe Esposito
The only difference, compared to the pre-existing code, is that symlink
creation now triggers fsnotify_create.  This was a bug in the debugfs
code, since for example vfs_symlink does call fsnotify_create.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/debugfs/inode.c | 144 +
 1 file changed, 15 insertions(+), 129 deletions(-)

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 5dbb74a23e7c..ccbeea9e5f6c 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -305,68 +305,6 @@ struct dentry *debugfs_lookup(const char *name, struct 
dentry *parent)
 }
 EXPORT_SYMBOL_GPL(debugfs_lookup);
 
-static struct dentry *start_creating(const char *name, struct dentry *parent)
-{
-   struct dentry *dentry;
-   int error;
-
-   pr_debug("creating file '%s'\n", name);
-
-   if (IS_ERR(parent))
-   return parent;
-
-   error = simple_pin_fs(&debugfs, &debug_fs_type);
-   if (error) {
-   pr_err("Unable to pin filesystem for file '%s'\n", name);
-   return ERR_PTR(error);
-   }
-
-   /* If the parent is not specified, we create it in the root.
-* We need the root dentry to do this, which is in the super
-* block. A pointer to that is in the struct vfsmount that we
-* have around.
-*/
-   if (!parent)
-   parent = debugfs.mount->mnt_root;
-
-   inode_lock(d_inode(parent));
-   if (unlikely(IS_DEADDIR(d_inode(parent
-   dentry = ERR_PTR(-ENOENT);
-   else
-   dentry = lookup_one_len(name, parent, strlen(name));
-   if (!IS_ERR(dentry) && d_really_is_positive(dentry)) {
-   if (d_is_dir(dentry))
-   pr_err("Directory '%s' with parent '%s' already 
present!\n",
-  name, parent->d_name.name);
-   else
-   pr_err("File '%s' in directory '%s' already present!\n",
-  name, parent->d_name.name);
-   dput(dentry);
-   dentry = ERR_PTR(-EEXIST);
-   }
-
-   if (IS_ERR(dentry)) {
-   inode_unlock(d_inode(parent));
-   simple_release_fs(&debugfs);
-   }
-
-   return dentry;
-}
-
-static struct dentry *failed_creating(struct dentry *dentry)
-{
-   inode_unlock(d_inode(dentry->d_parent));
-   dput(dentry);
-   simple_release_fs(&debugfs);
-   return ERR_PTR(-ENOMEM);
-}
-
-static struct dentry *end_creating(struct dentry *dentry)
-{
-   inode_unlock(d_inode(dentry->d_parent));
-   return dentry;
-}
-
 static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *proxy_fops,
@@ -375,32 +313,17 @@ static struct dentry *__debugfs_create_file(const char 
*name, umode_t mode,
struct dentry *dentry;
struct inode *inode;
 
-   if (!(mode & S_IFMT))
-   mode |= S_IFREG;
-   BUG_ON(!S_ISREG(mode));
-   dentry = start_creating(name, parent);
-
+   dentry = simplefs_create_file(&debugfs, &debug_fs_type,
+ name, mode, parent, data, &inode);
if (IS_ERR(dentry))
return dentry;
 
-   inode = debugfs_get_inode(dentry->d_sb);
-   if (unlikely(!inode)) {
-   pr_err("out of free dentries, can not create file '%s'\n",
-  name);
-   return failed_creating(dentry);
-   }
-
-   inode->i_mode = mode;
-   inode->i_private = data;
-
inode->i_op = &debugfs_file_inode_operations;
inode->i_fop = proxy_fops;
dentry->d_fsdata = (void *)((unsigned long)real_fops |
DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
 
-   d_instantiate(dentry, inode);
-   fsnotify_create(d_inode(dentry->d_parent), dentry);
-   return end_creating(dentry);
+   return simplefs_finish_dentry(dentry, inode);
 }
 
 /**
@@ -533,29 +456,16 @@ EXPORT_SYMBOL_GPL(debugfs_create_file_size);
  */
 struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
 {
-   struct dentry *dentry = start_creating(name, parent);
+   struct dentry *dentry;
struct inode *inode;
 
+   dentry = simplefs_create_dir(&debugfs, &debug_fs_type,
+name, 0755, parent, &inode);
if (IS_ERR(dentry))
return dentry;
 
-   inode = debugfs_get_inode(dentry->d_sb);
-   if (unlikely(!inode)) {
-   pr_err("out of free dentries, can not create directory '%s'\n",
-  name);
-   r

[PATCH v2 5/7] libfs: add file creation functions

2020-04-21 Thread Emanuele Giuseppe Esposito
A bunch of code is duplicated between debugfs and tracefs, unify it to the
libfs library.

The code is very similar, except that dentry and inode creation are unified
into a single function (unlike start_creating in debugfs and tracefs, which
only takes care of dentries).  This adds an output parameter to the
creation functions, but pushes all error recovery into fs/libfs.c.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/libfs.c | 226 +
 include/linux/fs.h |  18 
 2 files changed, 244 insertions(+)

diff --git a/fs/libfs.c b/fs/libfs.c
index 5c76e4c648dc..90b0c221d9a2 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -751,6 +751,232 @@ struct inode *simple_alloc_anon_inode(struct simple_fs 
*fs)
 }
 EXPORT_SYMBOL(simple_alloc_anon_inode);
 
+static struct dentry *failed_creating(struct simple_fs *fs, struct dentry 
*dentry)
+{
+   inode_unlock(d_inode(dentry->d_parent));
+   dput(dentry);
+   simple_release_fs(fs);
+   return ERR_PTR(-ENOMEM);
+}
+
+/**
+ * simplefs_create_dentry - creates a new dentry and inode
+ * @fs: a pointer to a struct simple_fs containing the reference counter
+ *  and vfs_mount pointer
+ * @type: the fs type
+ * @name: dentry name
+ * @parent: parent dentry. If this parameter is NULL,
+ *  then the dentry will be created in the root of the
+ *  filesystem.
+ * @inode: pointer that will contain a newly created inode
+ *
+ * This function returns a new dentry, or NULL on error.  On success, a
+ * new inode is created and stored into @inode.  Also note that the inode
+ * for the parent directory is locked by simplefs_create_dentry(),
+ * and will be unlocked by simple_finish_dentry().
+ **/
+struct dentry *simplefs_create_dentry(struct simple_fs *fs, struct 
file_system_type *type,
+ const char *name, struct dentry *parent,
+ struct inode **inode)
+{
+   struct dentry *dentry;
+   int error;
+
+   pr_debug("creating file '%s'\n", name);
+
+   if (IS_ERR(parent))
+   return parent;
+
+   error = simple_pin_fs(fs, type);
+   if (error) {
+   pr_err("Unable to pin filesystem for file '%s'\n", name);
+   return ERR_PTR(error);
+   }
+
+   /* If the parent is not specified, we create it in the root.
+* We need the root dentry to do this, which is in the super
+* block. A pointer to that is in the struct vfsmount that we
+* have around.
+*/
+   if (!parent)
+   parent = fs->mount->mnt_root;
+
+   inode_lock(d_inode(parent));
+   dentry = lookup_one_len(name, parent, strlen(name));
+   if (!IS_ERR(dentry) && d_really_is_positive(dentry)) {
+   if (d_is_dir(dentry))
+   pr_err("Directory '%s' with parent '%s' already 
present!\n",
+  name, parent->d_name.name);
+   else
+   pr_err("File '%s' in directory '%s' already present!\n",
+  name, parent->d_name.name);
+   dput(dentry);
+   dentry = ERR_PTR(-EEXIST);
+   }
+
+   if (IS_ERR(dentry)) {
+   inode_unlock(d_inode(parent));
+   simple_release_fs(fs);
+   }
+
+
+   if (IS_ERR(dentry))
+   return dentry;
+
+   *inode = new_inode_current_time(fs->mount->mnt_sb);
+   if (unlikely(!(*inode))) {
+   pr_err("out of free inodes, can not create file '%s'\n",
+  name);
+   return failed_creating(fs, dentry);
+   }
+
+   return dentry;
+}
+EXPORT_SYMBOL(simplefs_create_dentry);
+
+/**
+ * simplefs_create_file - creates a new file dentry and inode
+ * @fs: a pointer to a struct simple_fs containing the reference counter
+ *  and vfs_mount pointer
+ * @type: the fs type
+ * @name: file name
+ * @mode: file mode
+ * @parent: parent dentry. If this parameter is NULL,
+ *  then the file will be created in the root of the
+ *  filesystem.
+ * @data: what will the file contain
+ * @inode: pointer that will contain a newly created inode
+ *
+ * This function returns a new dentry, or NULL on error.  On success, a
+ * new inode is created and stored into @inode.  Also note that the inode
+ * for the parent directory is locked by simplefs_create_dentry(),
+ * and will be unlocked by simple_finish_dentry().
+ **/
+struct dentry *simplefs_create_file(struct simple_fs *fs, struct 
file_system_type *type,
+   const char *name, umode_t mode,
+   struct dentry *parent, void *data,
+   struct inode **inode)
+{
+   struct dentry *dentry;
+
+   WARN_ON((mode 

[PATCH v2 4/7] libfs: add alloc_anon_inode wrapper

2020-04-21 Thread Emanuele Giuseppe Esposito
libfs.c has many functions that are useful to implement dentry and inode
operations, but not many at the filesystem level. Start adding file
creation wrappers, the simplest returns an anonymous inode.

There is no functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 drivers/gpu/drm/drm_drv.c   |  2 +-
 drivers/misc/cxl/api.c  |  2 +-
 drivers/scsi/cxlflash/ocxl_hw.c |  2 +-
 fs/libfs.c  | 10 +-
 include/linux/fs.h  |  2 ++
 5 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index e29424d64874..1854f760ad39 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -539,7 +539,7 @@ static struct inode *drm_fs_inode_new(void)
return ERR_PTR(r);
}
 
-   inode = alloc_anon_inode(drm_fs.mount->mnt_sb);
+   inode = simple_alloc_anon_inode(&drm_fs);
if (IS_ERR(inode))
simple_release_fs(&drm_fs);
 
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 67e4808bce49..57672abb6223 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -72,7 +72,7 @@ static struct file *cxl_getfile(const char *name,
goto err_module;
}
 
-   inode = alloc_anon_inode(cxl_fs.mount->mnt_sb);
+   inode = simple_alloc_anon_inode(&cxl_fs);
if (IS_ERR(inode)) {
file = ERR_CAST(inode);
goto err_fs;
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index 7fa98dd4fa28..0e9f2ae7eebf 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -85,7 +85,7 @@ static struct file *ocxlflash_getfile(struct device *dev, 
const char *name,
goto err2;
}
 
-   inode = alloc_anon_inode(ocxlflash_fs.mount->mnt_sb);
+   inode = simple_alloc_anon_inode(&ocxlflash_fs);
if (IS_ERR(inode)) {
rc = PTR_ERR(inode);
dev_err(dev, "%s: alloc_anon_inode failed rc=%d\n",
diff --git a/fs/libfs.c b/fs/libfs.c
index 3fa0cd27ab06..5c76e4c648dc 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -741,7 +741,15 @@ void simple_release_fs(struct simple_fs *fs)
 }
 EXPORT_SYMBOL(simple_release_fs);
 
-
+/**
+ * simple_alloc_anon_inode - wrapper for alloc_anon_inode
+ * @fs: a pointer to a struct simple_fs containing a valid vfs_mount pointer
+ **/
+struct inode *simple_alloc_anon_inode(struct simple_fs *fs)
+{
+   return alloc_anon_inode(fs->mount->mnt_sb);
+}
+EXPORT_SYMBOL(simple_alloc_anon_inode);
 
 /**
  * simple_read_from_buffer - copy data from the buffer to user space
diff --git a/include/linux/fs.h b/include/linux/fs.h
index de2577df30ae..5e93de72118b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3373,6 +3373,8 @@ struct simple_fs {
 extern int simple_pin_fs(struct simple_fs *, struct file_system_type *);
 extern void simple_release_fs(struct simple_fs *);
 
+extern struct inode *simple_alloc_anon_inode(struct simple_fs *fs);
+
 extern ssize_t simple_read_from_buffer(void __user *to, size_t count,
loff_t *ppos, const void *from, size_t available);
 extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
-- 
2.25.2



[PATCH v2 3/7] libfs: introduce new_inode_current_time

2020-04-21 Thread Emanuele Giuseppe Esposito
It is a common special case for new_inode to initialize the
time to the current time and the inode to get_next_ino().
Introduce a core function that does it.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/libfs.c | 20 
 include/linux/fs.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/fs/libfs.c b/fs/libfs.c
index 54e07ae986ca..3fa0cd27ab06 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -594,6 +594,26 @@ int simple_write_end(struct file *file, struct 
address_space *mapping,
 }
 EXPORT_SYMBOL(simple_write_end);
 
+/**
+ * new_inode_current_time - create new inode by initializing the
+ * time to the current time and the inode to get_next_ino()
+ * @sb: pointer to super block of the file system
+ *
+ * Returns an inode pointer on success, NULL on failure.
+ */
+struct inode *new_inode_current_time(struct super_block *sb)
+{
+   struct inode *inode = new_inode(sb);
+
+   if (inode) {
+   inode->i_ino = get_next_ino();
+   inode->i_atime = inode->i_mtime =
+   inode->i_ctime = current_time(inode);
+   }
+   return inode;
+}
+EXPORT_SYMBOL(new_inode_current_time);
+
 /*
  * the inodes created here are not hashed. If you use iunique to generate
  * unique inode values later for this filesystem, then you must take care
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a3691c132b3a..de2577df30ae 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3088,6 +3088,7 @@ extern void clear_inode(struct inode *);
 extern void __destroy_inode(struct inode *);
 extern struct inode *new_inode_pseudo(struct super_block *sb);
 extern struct inode *new_inode(struct super_block *sb);
+extern struct inode *new_inode_current_time(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
 extern int should_remove_suid(struct dentry *);
 extern int file_remove_privs(struct file *);
-- 
2.25.2



[PATCH v2 2/7] libfs: wrap simple_pin_fs/simple_release_fs arguments in a struct

2020-04-21 Thread Emanuele Giuseppe Esposito
Simplify passing the count and mount to simple_pin_fs and
simple_release_fs by wrapping them in the simple_fs struct,
in preparation for adding more high level operations to
fs/libfs.c

There is no functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 drivers/gpu/drm/drm_drv.c   | 11 
 drivers/misc/cxl/api.c  | 13 +-
 drivers/scsi/cxlflash/ocxl_hw.c | 14 +-
 fs/binfmt_misc.c|  9 +++
 fs/configfs/mount.c | 10 +++-
 fs/debugfs/inode.c  | 22 
 fs/libfs.c  | 45 +
 fs/tracefs/inode.c  | 18 ++---
 include/linux/fs.h  | 10 ++--
 security/apparmor/apparmorfs.c  | 25 +-
 security/inode.c| 11 
 11 files changed, 103 insertions(+), 85 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 7b1a628d1f6e..e29424d64874 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -514,8 +514,7 @@ EXPORT_SYMBOL(drm_dev_unplug);
  * iput(), but this way you'd end up with a new vfsmount for each inode.
  */
 
-static int drm_fs_cnt;
-static struct vfsmount *drm_fs_mnt;
+static struct simple_fs drm_fs;
 
 static int drm_fs_init_fs_context(struct fs_context *fc)
 {
@@ -534,15 +533,15 @@ static struct inode *drm_fs_inode_new(void)
struct inode *inode;
int r;
 
-   r = simple_pin_fs(&drm_fs_type, &drm_fs_mnt, &drm_fs_cnt);
+   r = simple_pin_fs(&drm_fs, &drm_fs_type);
if (r < 0) {
DRM_ERROR("Cannot mount pseudo fs: %d\n", r);
return ERR_PTR(r);
}
 
-   inode = alloc_anon_inode(drm_fs_mnt->mnt_sb);
+   inode = alloc_anon_inode(drm_fs.mount->mnt_sb);
if (IS_ERR(inode))
-   simple_release_fs(&drm_fs_mnt, &drm_fs_cnt);
+   simple_release_fs(&drm_fs);
 
return inode;
 }
@@ -551,7 +550,7 @@ static void drm_fs_inode_free(struct inode *inode)
 {
if (inode) {
iput(inode);
-   simple_release_fs(&drm_fs_mnt, &drm_fs_cnt);
+   simple_release_fs(&drm_fs);
}
 }
 
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index b493de962153..67e4808bce49 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -31,8 +31,7 @@
 
 #define CXL_PSEUDO_FS_MAGIC0x1697697f
 
-static int cxl_fs_cnt;
-static struct vfsmount *cxl_vfs_mount;
+static struct simple_fs cxl_fs;
 
 static int cxl_fs_init_fs_context(struct fs_context *fc)
 {
@@ -50,7 +49,7 @@ static struct file_system_type cxl_fs_type = {
 void cxl_release_mapping(struct cxl_context *ctx)
 {
if (ctx->kernelapi && ctx->mapping)
-   simple_release_fs(&cxl_vfs_mount, &cxl_fs_cnt);
+   simple_release_fs(&cxl_fs);
 }
 
 static struct file *cxl_getfile(const char *name,
@@ -66,20 +65,20 @@ static struct file *cxl_getfile(const char *name,
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   rc = simple_pin_fs(&cxl_fs_type, &cxl_vfs_mount, &cxl_fs_cnt);
+   rc = simple_pin_fs(&cxl_fs, &cxl_fs_type);
if (rc < 0) {
pr_err("Cannot mount cxl pseudo filesystem: %d\n", rc);
file = ERR_PTR(rc);
goto err_module;
}
 
-   inode = alloc_anon_inode(cxl_vfs_mount->mnt_sb);
+   inode = alloc_anon_inode(cxl_fs.mount->mnt_sb);
if (IS_ERR(inode)) {
file = ERR_CAST(inode);
goto err_fs;
}
 
-   file = alloc_file_pseudo(inode, cxl_vfs_mount, name,
+   file = alloc_file_pseudo(inode, cxl_fs.mount, name,
 flags & (O_ACCMODE | O_NONBLOCK), fops);
if (IS_ERR(file))
goto err_inode;
@@ -91,7 +90,7 @@ static struct file *cxl_getfile(const char *name,
 err_inode:
iput(inode);
 err_fs:
-   simple_release_fs(&cxl_vfs_mount, &cxl_fs_cnt);
+   simple_release_fs(&cxl_fs);
 err_module:
module_put(fops->owner);
return file;
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index 7018cd802569..7fa98dd4fa28 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -29,8 +29,7 @@
 
 #define OCXLFLASH_FS_MAGIC  0x1697698f
 
-static int ocxlflash_fs_cnt;
-static struct vfsmount *ocxlflash_vfs_mount;
+static struct simple_fs ocxlflash_fs;
 
 static int ocxlflash_fs_init_fs_context(struct fs_context *fc)
 {
@@ -51,7 +50,7 @@ static struct file_system_type ocxlflash_fs_type = {
 static void ocxlflash_release_mapping(struct ocxlflash_context *ctx)
 {
if (ctx->mapping)
-   simple_release_fs(&ocxlflash_vfs_mount, &a

[PATCH v2 1/7] apparmor: just use vfs_kern_mount to make .null

2020-04-21 Thread Emanuele Giuseppe Esposito
aa_mk_null_file is using simple_pin_fs/simple_release_fs with local
variables as arguments, for what would amount to a simple
vfs_kern_mount/mntput pair if everything was inlined.  Just use
the normal filesystem API since the reference counting is not needed
here (it is a local variable and always 0 on entry and on exit).

There is no functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 security/apparmor/apparmorfs.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 280741fc0f5f..36f848734902 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -2525,14 +2525,15 @@ struct path aa_null;
 
 static int aa_mk_null_file(struct dentry *parent)
 {
-   struct vfsmount *mount = NULL;
+   struct file_system_type *type = parent->d_sb->s_type;
+   struct vfsmount *mount;
struct dentry *dentry;
struct inode *inode;
-   int count = 0;
-   int error = simple_pin_fs(parent->d_sb->s_type, &mount, &count);
+   int error;
 
-   if (error)
-   return error;
+   mount = vfs_kern_mount(type, SB_KERNMOUNT, type->name, NULL);
+   if (IS_ERR(mount))
+   return PTR_ERR(mount);
 
inode_lock(d_inode(parent));
dentry = lookup_one_len(NULL_FILE_NAME, parent, strlen(NULL_FILE_NAME));
@@ -2561,7 +2562,7 @@ static int aa_mk_null_file(struct dentry *parent)
dput(dentry);
 out:
inode_unlock(d_inode(parent));
-   simple_release_fs(&mount, &count);
+   mntput(mount);
return error;
 }
 
-- 
2.25.2



[PATCH v2 0/7] libfs: group and simplify linux fs code

2020-04-21 Thread Emanuele Giuseppe Esposito
libfs.c has many functions that are useful to implement dentry and inode
operations, but not many at the filesystem level.  As a result, code to
create files and inodes has a lot of duplication, to the point that
tracefs has copied several hundred lines from debugfs.

The main two libfs.c functions for filesystems are simple_pin_fs and
simple_release_fs, which hide a somewhat complicated locking sequence
that is needed to serialize vfs_kern_mount and mntget.  In this series,
my aim is to add functions that create dentries and inodes of various
kinds (either anonymous inodes, or directory/file/symlink).  These
functions take the code that was duplicated across debugfs and tracefs
and move it to libfs.c.

In order to limit the number of arguments to the new functions, the
series first creates a data type that is passed to both
simple_pin_fs/simple_release_fs and the new creation functions.  The new
struct, introduced in patch 2, simply groups the "mount" and "count"
arguments to simple_pin_fs and simple_release_fs.

Patches 1-4 are preparations to introduce the new simple_fs struct and
new functions that are useful in the remainder of the series.  Patch 5
introduces the dentry and inode creation functions.  Patch 6-7 can then
adopt them in debugfs and tracefs.

Signed-off-by: Emanuele Giuseppe Esposito 

v1->v2: rename simple_new_inode in new_inode_current_time,
more detailed explanations, put all common code in fs/libfs.c

Emanuele Giuseppe Esposito (7):
  apparmor: just use vfs_kern_mount to make .null
  libfs: wrap simple_pin_fs/simple_release_fs arguments in a struct
  libfs: introduce new_inode_current_time
  libfs: add alloc_anon_inode wrapper
  libfs: add file creation functions
  debugfs: switch to simplefs inode creation API
  tracefs: switch to simplefs inode creation API

 drivers/gpu/drm/drm_drv.c   |  11 +-
 drivers/misc/cxl/api.c  |  13 +-
 drivers/scsi/cxlflash/ocxl_hw.c |  14 +-
 fs/binfmt_misc.c|   9 +-
 fs/configfs/mount.c |  10 +-
 fs/debugfs/inode.c  | 158 +++--
 fs/libfs.c  | 299 ++--
 fs/tracefs/inode.c  |  96 ++
 include/linux/fs.h  |  31 +++-
 security/apparmor/apparmorfs.c  |  38 ++--
 security/inode.c|  11 +-
 11 files changed, 399 insertions(+), 291 deletions(-)

-- 
2.25.2



Re: [PATCH 2/8] fs: extract simple_pin/release_fs to separate files

2020-04-21 Thread Emanuele Giuseppe Esposito




On 4/21/20 1:19 PM, Frederic Barrat wrote:




diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 39eec9031487..a62795079d9c 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -19,6 +19,7 @@ config CXL
  select CXL_BASE
  select CXL_AFU_DRIVER_OPS
  select CXL_LIB
+    select SIMPLEFS
  default m
  help
    Select this option to enable driver support for IBM Coherent
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index b493de962153..0b8f8de7475a 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -9,6 +9,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
index 2d2266c1439e..ddd9245fff3d 100644
--- a/drivers/misc/ocxl/Kconfig
+++ b/drivers/misc/ocxl/Kconfig
@@ -12,6 +12,7 @@ config OCXL
  depends on PPC_POWERNV && PCI && EEH
  select OCXL_BASE
  select HOTPLUG_PCI_POWERNV
+    select SIMPLEFS



It's not clear to me the Kconfig updated is needed for the ocxl driver. 
I think it's only needed for the cxl driver.


I am going to get rid of the separate simplefs.c file and related 
Kconfig entry and put everything in fs/libfs.c, so this file (together 
with many others touched in this patch) won't be modified in v2.


Thanks,

Emanuele



Re: [PATCH 0/8] Simplefs: group and simplify linux fs code

2020-04-20 Thread Emanuele Giuseppe Esposito




On 4/16/20 8:59 AM, Luis Chamberlain wrote:

On Tue, Apr 14, 2020 at 02:42:54PM +0200, Emanuele Giuseppe Esposito wrote:

This series of patches introduce wrappers for functions,
arguments simplification in functions calls and most importantly
groups duplicated code in a single header, simplefs, to avoid redundancy
in the linux fs, especially debugfs and tracefs.


The general goal seems worthy, but here I don't see explained why hasn't
this gone through libfs, and what the intention was long term. For
instance, you added some other generalizations which you have found. It
was not clear that this was the goal here, to expand on these paths.

What if common code on fs is found which are not part of debugfs and
tracefs, how does one decide if to move to libfs or simplefs?


The idea of simplefs (that I will also explain better in the cover 
letter and commit messages) is that not only it groups common code, but 
also introduces a new struct simple_fs that simplifies parameter 
passing. This means all fs that use these functions and the struct 
should include linux/simplefs.h, while all common functions that take a 
simple_fs struct will be added in simplefs.c


Thank you for all the feedback, I will incorporate it and send a new 
patch series soon.


Emanuele



Re: [PATCH 1/8] apparmor: just use vfs_kern_mount to make .null

2020-04-20 Thread Emanuele Giuseppe Esposito




On 4/16/20 8:44 AM, Luis Chamberlain wrote:

On Tue, Apr 14, 2020 at 02:42:55PM +0200, Emanuele Giuseppe Esposito wrote:

aa_mk_null_file is using simple_pin_fs/simple_release_fs with local
variables as arguments, for what would amount to a simple
vfs_kern_mount/mntput pair if everything was inlined.  Just use
the normal filesystem API since the reference counting is not needed
here.


*Why* is refcounting not needed here?


The refcount is a local variable and is always 0 on entry and exit, so 
it is not necessary to have refcounting across function invocations, 
such as what simple_pin_fs and simple_release_fs provide.


Thank you,

Emanuele


Luis



Signed-off-by: Emanuele Giuseppe Esposito 
---
  security/apparmor/apparmorfs.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 280741fc0f5f..828bb1eb77ea 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -2525,14 +2525,14 @@ struct path aa_null;
  
  static int aa_mk_null_file(struct dentry *parent)

  {
-   struct vfsmount *mount = NULL;
+   struct file_system_type *type = parent->d_sb->s_type;
+   struct vfsmount *mount;
struct dentry *dentry;
struct inode *inode;
-   int count = 0;
-   int error = simple_pin_fs(parent->d_sb->s_type, &mount, &count);
  
-	if (error)

-   return error;
+   mount = vfs_kern_mount(type, SB_KERNMOUNT, type->name, NULL);
+   if (IS_ERR(mount))
+   return PTR_ERR(mount);
  
  	inode_lock(d_inode(parent));

dentry = lookup_one_len(NULL_FILE_NAME, parent, strlen(NULL_FILE_NAME));
@@ -2561,7 +2561,7 @@ static int aa_mk_null_file(struct dentry *parent)
dput(dentry);
  out:
inode_unlock(d_inode(parent));
-   simple_release_fs(&mount, &count);
+   mntput(mount);
return error;
  }
  
--

2.25.2







Re: [PATCH 4/8] fs: introduce simple_new_inode

2020-04-20 Thread Emanuele Giuseppe Esposito




On 4/14/20 3:01 PM, Greg Kroah-Hartman wrote:

On Tue, Apr 14, 2020 at 02:42:58PM +0200, Emanuele Giuseppe Esposito wrote:

It is a common special case for new_inode to initialize the
time to the current time and the inode to get_next_ino().
Introduce a core function that does it and use it throughout
Linux.


Shouldn't this just be called new_inode_current_time()?

How is anyone going to remember what simple_new_inode() does to the
inode structure?


I noticed that most functions in libfs.c are called "simple_*" when they 
do the right thing for the majority of simple use cases (e.g., 
simple_symlink_inode_operations or simple_dir_operations). I can 
certainly rename the function.


Thank you for all the feedback, I will incorporate it and send a new 
patch series soon.



Emanuele



--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -595,6 +595,18 @@ int simple_write_end(struct file *file, struct 
address_space *mapping,
  }
  EXPORT_SYMBOL(simple_write_end);
  
+struct inode *simple_new_inode(struct super_block *sb)

+{
+   struct inode *inode = new_inode(sb);
+   if (inode) {
+   inode->i_ino = get_next_ino();
+   inode->i_atime = inode->i_mtime =
+   inode->i_ctime = current_time(inode);
+   }
+   return inode;
+}
+EXPORT_SYMBOL(simple_new_inode);




Re: [PATCH 6/8] simplefs: add file creation functions

2020-04-20 Thread Emanuele Giuseppe Esposito




On 4/14/20 2:56 PM, Greg Kroah-Hartman wrote:

On Tue, Apr 14, 2020 at 02:43:00PM +0200, Emanuele Giuseppe Esposito wrote:

A bunch of code is duplicated between debugfs and tracefs, unify it to the
simplefs library.

The code is very similar, except that dentry and inode creation are unified
into a single function (unlike start_creating in debugfs and tracefs, which
only takes care of dentries).  This adds an output parameter to the creation
functions, but pushes all error recovery into fs/simplefs.c.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  fs/simplefs.c| 150 +++
  include/linux/simplefs.h |  19 +
  2 files changed, 169 insertions(+)


What's wrong with libfs, isn't that supposed to be for these types of
"common" filesystem interactions?

Why create a whole "new" fs for this?


I assume you meant a new file. These new functions are used only by a 
few filesystems, and I didn't want to include them in vmlinux 
unconditionally, so I introduced simplefs.c and CONFIG_SIMPLEFS instead 
of extending libfs.c. In this way only fs that need this code like 
debugfs and tracefs will load it.


Thank you,

Emanuele



Re: [PATCH] kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place

2020-04-14 Thread Emanuele Giuseppe Esposito




On 4/14/20 10:18 AM, Paolo Bonzini wrote:

On 13/04/20 23:34, Philippe Mathieu-Daudé wrote:

+#define VM_STAT(x, ...) offsetof(struct kvm, stat.x), KVM_STAT_VM, ## 
__VA_ARGS__
+#define VCPU_STAT(x, ...) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## 
__VA_ARGS__

I find this macro expanding into multiple fields odd... Maybe a matter
of taste. Sugggestion, have the macro define the full structure, as in
the arm64 arch:

#define VM_STAT(n, x, ...) { n, offsetof(struct kvm, stat.x),
KVM_STAT_VM, ## __VA_ARGS__ }

Ditto for VCPU_STAT().


Hi Philippe and Paolo,


Yes, that's a good idea.  Emanuele, can you switch it to this format?


Sure, I just submitted the v2 version.

Thanks,

Emanuele



[PATCH 8/8] tracefs: switch to simplefs inode creation API

2020-04-14 Thread Emanuele Giuseppe Esposito
There is no semantic change intended; the code in the simplefs.c functions in
fact was derived from debugfs and tracefs code.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/tracefs/inode.c | 86 --
 1 file changed, 7 insertions(+), 79 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index a30837a8e1d4..69e2215c797b 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -298,57 +298,6 @@ static struct file_system_type trace_fs_type = {
 };
 MODULE_ALIAS_FS("tracefs");
 
-static struct dentry *start_creating(const char *name, struct dentry *parent)
-{
-   struct dentry *dentry;
-   int error;
-
-   pr_debug("tracefs: creating file '%s'\n",name);
-
-   error = simple_pin_fs(&tracefs, &trace_fs_type);
-   if (error)
-   return ERR_PTR(error);
-
-   /* If the parent is not specified, we create it in the root.
-* We need the root dentry to do this, which is in the super
-* block. A pointer to that is in the struct vfsmount that we
-* have around.
-*/
-   if (!parent)
-   parent = tracefs.mount->mnt_root;
-
-   inode_lock(parent->d_inode);
-   if (unlikely(IS_DEADDIR(parent->d_inode)))
-   dentry = ERR_PTR(-ENOENT);
-   else
-   dentry = lookup_one_len(name, parent, strlen(name));
-   if (!IS_ERR(dentry) && dentry->d_inode) {
-   dput(dentry);
-   dentry = ERR_PTR(-EEXIST);
-   }
-
-   if (IS_ERR(dentry)) {
-   inode_unlock(parent->d_inode);
-   simple_release_fs(&tracefs);
-   }
-
-   return dentry;
-}
-
-static struct dentry *failed_creating(struct dentry *dentry)
-{
-   inode_unlock(dentry->d_parent->d_inode);
-   dput(dentry);
-   simple_release_fs(&tracefs);
-   return NULL;
-}
-
-static struct dentry *end_creating(struct dentry *dentry)
-{
-   inode_unlock(dentry->d_parent->d_inode);
-   return dentry;
-}
-
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
@@ -385,49 +334,28 @@ struct dentry *tracefs_create_file(const char *name, 
umode_t mode,
if (security_locked_down(LOCKDOWN_TRACEFS))
return NULL;
 
-   if (!(mode & S_IFMT))
-   mode |= S_IFREG;
-   BUG_ON(!S_ISREG(mode));
-   dentry = start_creating(name, parent);
-
+   dentry = simplefs_create_file(&tracefs, &trace_fs_type,
+ name, mode, parent, data, &inode);
if (IS_ERR(dentry))
return NULL;
 
-   inode = simple_new_inode(dentry->d_sb);
-   if (unlikely(!inode))
-   return failed_creating(dentry);
-
-   inode->i_mode = mode;
inode->i_fop = fops ? fops : &tracefs_file_operations;
-   inode->i_private = data;
-   d_instantiate(dentry, inode);
-   fsnotify_create(dentry->d_parent->d_inode, dentry);
-   return end_creating(dentry);
+   return simplefs_finish_dentry(dentry, inode);
 }
 
 static struct dentry *__create_dir(const char *name, struct dentry *parent,
   const struct inode_operations *ops)
 {
-   struct dentry *dentry = start_creating(name, parent);
+   struct dentry *dentry;
struct inode *inode;
 
+   dentry = simplefs_create_dir(&tracefs, &trace_fs_type,
+name, 0755, parent, &inode);
if (IS_ERR(dentry))
return NULL;
 
-   inode = simple_new_inode(dentry->d_sb);
-   if (unlikely(!inode))
-   return failed_creating(dentry);
-
-   inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
inode->i_op = ops;
-   inode->i_fop = &simple_dir_operations;
-
-   /* directory inodes start off with i_nlink == 2 (for "." entry) */
-   inc_nlink(inode);
-   d_instantiate(dentry, inode);
-   inc_nlink(dentry->d_parent->d_inode);
-   fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
-   return end_creating(dentry);
+   return simplefs_finish_dentry(dentry, inode);
 }
 
 /**
-- 
2.25.2



[PATCH 7/8] debugfs: switch to simplefs inode creation API

2020-04-14 Thread Emanuele Giuseppe Esposito
The only difference, compared to the pre-existing code, is that symlink
creation now triggers fsnotify_create.  This was a bug in the debugfs
code, since for example vfs_symlink does call fsnotify_create.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/debugfs/inode.c | 144 +
 1 file changed, 15 insertions(+), 129 deletions(-)

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 834b5872ca0d..7a2369373b85 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -294,68 +294,6 @@ struct dentry *debugfs_lookup(const char *name, struct 
dentry *parent)
 }
 EXPORT_SYMBOL_GPL(debugfs_lookup);
 
-static struct dentry *start_creating(const char *name, struct dentry *parent)
-{
-   struct dentry *dentry;
-   int error;
-
-   pr_debug("creating file '%s'\n", name);
-
-   if (IS_ERR(parent))
-   return parent;
-
-   error = simple_pin_fs(&debugfs, &debug_fs_type);
-   if (error) {
-   pr_err("Unable to pin filesystem for file '%s'\n", name);
-   return ERR_PTR(error);
-   }
-
-   /* If the parent is not specified, we create it in the root.
-* We need the root dentry to do this, which is in the super
-* block. A pointer to that is in the struct vfsmount that we
-* have around.
-*/
-   if (!parent)
-   parent = debugfs.mount->mnt_root;
-
-   inode_lock(d_inode(parent));
-   if (unlikely(IS_DEADDIR(d_inode(parent
-   dentry = ERR_PTR(-ENOENT);
-   else
-   dentry = lookup_one_len(name, parent, strlen(name));
-   if (!IS_ERR(dentry) && d_really_is_positive(dentry)) {
-   if (d_is_dir(dentry))
-   pr_err("Directory '%s' with parent '%s' already 
present!\n",
-  name, parent->d_name.name);
-   else
-   pr_err("File '%s' in directory '%s' already present!\n",
-  name, parent->d_name.name);
-   dput(dentry);
-   dentry = ERR_PTR(-EEXIST);
-   }
-
-   if (IS_ERR(dentry)) {
-   inode_unlock(d_inode(parent));
-   simple_release_fs(&debugfs);
-   }
-
-   return dentry;
-}
-
-static struct dentry *failed_creating(struct dentry *dentry)
-{
-   inode_unlock(d_inode(dentry->d_parent));
-   dput(dentry);
-   simple_release_fs(&debugfs);
-   return ERR_PTR(-ENOMEM);
-}
-
-static struct dentry *end_creating(struct dentry *dentry)
-{
-   inode_unlock(d_inode(dentry->d_parent));
-   return dentry;
-}
-
 static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
struct dentry *parent, void *data,
const struct file_operations *proxy_fops,
@@ -364,32 +302,17 @@ static struct dentry *__debugfs_create_file(const char 
*name, umode_t mode,
struct dentry *dentry;
struct inode *inode;
 
-   if (!(mode & S_IFMT))
-   mode |= S_IFREG;
-   BUG_ON(!S_ISREG(mode));
-   dentry = start_creating(name, parent);
-
+   dentry = simplefs_create_file(&debugfs, &debug_fs_type,
+ name, mode, parent, data, &inode);
if (IS_ERR(dentry))
return dentry;
 
-   inode = simple_new_inode(dentry->d_sb);
-   if (unlikely(!inode)) {
-   pr_err("out of free dentries, can not create file '%s'\n",
-  name);
-   return failed_creating(dentry);
-   }
-
-   inode->i_mode = mode;
-   inode->i_private = data;
-
inode->i_op = &debugfs_file_inode_operations;
inode->i_fop = proxy_fops;
dentry->d_fsdata = (void *)((unsigned long)real_fops |
DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
 
-   d_instantiate(dentry, inode);
-   fsnotify_create(d_inode(dentry->d_parent), dentry);
-   return end_creating(dentry);
+   return simplefs_finish_dentry(dentry, inode);
 }
 
 /**
@@ -522,29 +445,16 @@ EXPORT_SYMBOL_GPL(debugfs_create_file_size);
  */
 struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
 {
-   struct dentry *dentry = start_creating(name, parent);
+   struct dentry *dentry;
struct inode *inode;
 
+   dentry = simplefs_create_dir(&debugfs, &debug_fs_type,
+name, 0755, parent, &inode);
if (IS_ERR(dentry))
return dentry;
 
-   inode = simple_new_inode(dentry->d_sb);
-   if (unlikely(!inode)) {
-   pr_err("out of free dentries, can not create directory '%s'\n",
-  name);
-   r

[PATCH 6/8] simplefs: add file creation functions

2020-04-14 Thread Emanuele Giuseppe Esposito
A bunch of code is duplicated between debugfs and tracefs, unify it to the
simplefs library.

The code is very similar, except that dentry and inode creation are unified
into a single function (unlike start_creating in debugfs and tracefs, which
only takes care of dentries).  This adds an output parameter to the creation
functions, but pushes all error recovery into fs/simplefs.c.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 fs/simplefs.c| 150 +++
 include/linux/simplefs.h |  19 +
 2 files changed, 169 insertions(+)

diff --git a/fs/simplefs.c b/fs/simplefs.c
index c59eb8d996be..3e48a288beb3 100644
--- a/fs/simplefs.c
+++ b/fs/simplefs.c
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include 
 #include 
+#include 
+#include 
 
 static DEFINE_SPINLOCK(pin_fs_lock);
 
@@ -42,3 +44,151 @@ struct inode *simple_alloc_anon_inode(struct simple_fs *fs)
return alloc_anon_inode(fs->mount->mnt_sb);
 }
 EXPORT_SYMBOL(simple_alloc_anon_inode);
+
+static struct dentry *failed_creating(struct simple_fs *fs, struct dentry 
*dentry)
+{
+   inode_unlock(d_inode(dentry->d_parent));
+   dput(dentry);
+   simple_release_fs(fs);
+   return ERR_PTR(-ENOMEM);
+}
+
+struct dentry *simplefs_create_dentry(struct simple_fs *fs, struct 
file_system_type *type,
+ const char *name, struct dentry *parent,
+ struct inode **inode)
+{
+   struct dentry *dentry;
+   int error;
+
+   pr_debug("creating file '%s'\n", name);
+
+   if (IS_ERR(parent))
+   return parent;
+
+   error = simple_pin_fs(fs, type);
+   if (error) {
+   pr_err("Unable to pin filesystem for file '%s'\n", name);
+   return ERR_PTR(error);
+   }
+
+   /* If the parent is not specified, we create it in the root.
+* We need the root dentry to do this, which is in the super
+* block. A pointer to that is in the struct vfsmount that we
+* have around.
+*/
+   if (!parent)
+   parent = fs->mount->mnt_root;
+
+   inode_lock(d_inode(parent));
+   dentry = lookup_one_len(name, parent, strlen(name));
+   if (!IS_ERR(dentry) && d_really_is_positive(dentry)) {
+   if (d_is_dir(dentry))
+   pr_err("Directory '%s' with parent '%s' already 
present!\n",
+  name, parent->d_name.name);
+   else
+   pr_err("File '%s' in directory '%s' already present!\n",
+  name, parent->d_name.name);
+   dput(dentry);
+   dentry = ERR_PTR(-EEXIST);
+   }
+
+   if (IS_ERR(dentry)) {
+   inode_unlock(d_inode(parent));
+   simple_release_fs(fs);
+   }
+
+
+   if (IS_ERR(dentry))
+   return dentry;
+
+   *inode = simple_new_inode(fs->mount->mnt_sb);
+   if (unlikely(!(*inode))) {
+   pr_err("out of free inodes, can not create file '%s'\n",
+  name);
+   return failed_creating(fs, dentry);
+   }
+
+   return dentry;
+}
+EXPORT_SYMBOL(simplefs_create_dentry);
+
+struct dentry *simplefs_create_file(struct simple_fs *fs, struct 
file_system_type *type,
+   const char *name, umode_t mode,
+   struct dentry *parent, void *data,
+   struct inode **inode)
+{
+   struct dentry *dentry;
+
+   WARN_ON((mode & S_IFMT) && !S_ISREG(mode));
+   mode |= S_IFREG;
+
+   dentry = simplefs_create_dentry(fs, type, name, parent, inode);
+
+   if (IS_ERR(dentry))
+   return dentry;
+
+   (*inode)->i_mode = mode;
+   (*inode)->i_private = data;
+
+   return dentry;
+}
+EXPORT_SYMBOL(simplefs_create_file);
+
+struct dentry *simplefs_finish_dentry(struct dentry *dentry, struct inode 
*inode)
+{
+   d_instantiate(dentry, inode);
+   if (S_ISDIR(inode->i_mode)) {
+   inc_nlink(d_inode(dentry->d_parent));
+   fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
+   } else {
+   fsnotify_create(d_inode(dentry->d_parent), dentry);
+   }
+   inode_unlock(d_inode(dentry->d_parent));
+   return dentry;
+}
+EXPORT_SYMBOL(simplefs_finish_dentry);
+
+struct dentry *simplefs_create_dir(struct simple_fs *fs, struct 
file_system_type *type,
+  const char *name, umode_t mode, struct 
dentry *parent,
+  struct inode **inode)
+{
+   struct dentry *dentry;
+
+   WARN_ON((mode & S_IFMT) && !S_ISDIR(mode));
+   mode |= S_IFDIR;
+
+   dentry = si

[PATCH 5/8] simplefs: add alloc_anon_inode wrapper

2020-04-14 Thread Emanuele Giuseppe Esposito
Start adding file creation wrappers, the simplest returns an anonymous
inode.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 drivers/gpu/drm/drm_drv.c   | 2 +-
 drivers/misc/cxl/api.c  | 2 +-
 drivers/scsi/cxlflash/ocxl_hw.c | 2 +-
 fs/simplefs.c   | 6 ++
 include/linux/simplefs.h| 2 ++
 5 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index b4b357725be2..4e4ea1bf312c 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -539,7 +539,7 @@ static struct inode *drm_fs_inode_new(void)
return ERR_PTR(r);
}
 
-   inode = alloc_anon_inode(drm_fs.mount->mnt_sb);
+   inode = simple_alloc_anon_inode(&drm_fs);
if (IS_ERR(inode))
simple_release_fs(&drm_fs);
 
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 6c6566d8bc17..a3d2682eb3a7 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -73,7 +73,7 @@ static struct file *cxl_getfile(const char *name,
goto err_module;
}
 
-   inode = alloc_anon_inode(cxl_fs.mount->mnt_sb);
+   inode = simple_alloc_anon_inode(&cxl_fs);
if (IS_ERR(inode)) {
file = ERR_CAST(inode);
goto err_fs;
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index 23afde0c6c0e..770fdf186028 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -86,7 +86,7 @@ static struct file *ocxlflash_getfile(struct device *dev, 
const char *name,
goto err2;
}
 
-   inode = alloc_anon_inode(ocxlflash_fs.mount->mnt_sb);
+   inode = simple_alloc_anon_inode(&ocxlflash_fs);
if (IS_ERR(inode)) {
rc = PTR_ERR(inode);
dev_err(dev, "%s: alloc_anon_inode failed rc=%d\n",
diff --git a/fs/simplefs.c b/fs/simplefs.c
index 790d8beb9cc3..c59eb8d996be 100644
--- a/fs/simplefs.c
+++ b/fs/simplefs.c
@@ -36,3 +36,9 @@ void simple_release_fs(struct simple_fs *fs)
mntput(mnt);
 }
 EXPORT_SYMBOL(simple_release_fs);
+
+struct inode *simple_alloc_anon_inode(struct simple_fs *fs)
+{
+   return alloc_anon_inode(fs->mount->mnt_sb);
+}
+EXPORT_SYMBOL(simple_alloc_anon_inode);
diff --git a/include/linux/simplefs.h b/include/linux/simplefs.h
index 18010414a16f..c62ab526414e 100644
--- a/include/linux/simplefs.h
+++ b/include/linux/simplefs.h
@@ -12,4 +12,6 @@ struct simple_fs {
 extern int simple_pin_fs(struct simple_fs *, struct file_system_type *);
 extern void simple_release_fs(struct simple_fs *);
 
+extern struct inode *simple_alloc_anon_inode(struct simple_fs *fs);
+
 #endif
-- 
2.25.2



[PATCH 4/8] fs: introduce simple_new_inode

2020-04-14 Thread Emanuele Giuseppe Esposito
It is a common special case for new_inode to initialize the
time to the current time and the inode to get_next_ino().
Introduce a core function that does it and use it throughout
Linux.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 arch/powerpc/platforms/cell/spufs/inode.c |  4 +---
 arch/s390/hypfs/inode.c   |  4 +---
 drivers/infiniband/hw/qib/qib_fs.c|  6 +-
 drivers/misc/ibmasm/ibmasmfs.c|  8 +++-
 drivers/oprofile/oprofilefs.c |  8 +++-
 drivers/usb/gadget/function/f_fs.c|  8 +---
 fs/autofs/inode.c |  4 +---
 fs/binfmt_misc.c  | 16 ++--
 fs/debugfs/inode.c| 19 ---
 fs/efivarfs/inode.c   |  4 +---
 fs/fuse/control.c |  4 +---
 fs/hugetlbfs/inode.c  |  8 ++--
 fs/libfs.c| 12 
 fs/ocfs2/dlmfs/dlmfs.c|  8 ++--
 fs/proc/base.c|  4 +---
 fs/proc/proc_sysctl.c |  5 +
 fs/pstore/inode.c | 14 ++
 fs/ramfs/inode.c  |  4 +---
 fs/tracefs/inode.c| 14 ++
 include/linux/fs.h|  1 +
 ipc/mqueue.c  |  4 +---
 kernel/bpf/inode.c|  7 +--
 mm/shmem.c|  4 +---
 net/sunrpc/rpc_pipe.c |  4 +---
 security/apparmor/apparmorfs.c|  8 ++--
 security/inode.c  |  4 +---
 26 files changed, 50 insertions(+), 136 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/inode.c 
b/arch/powerpc/platforms/cell/spufs/inode.c
index 25390569e24c..5167b11d41ed 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -77,15 +77,13 @@ spufs_new_inode(struct super_block *sb, umode_t mode)
 {
struct inode *inode;
 
-   inode = new_inode(sb);
+   inode = simple_new_inode(sb);
if (!inode)
goto out;
 
-   inode->i_ino = get_next_ino();
inode->i_mode = mode;
inode->i_uid = current_fsuid();
inode->i_gid = current_fsgid();
-   inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 out:
return inode;
 }
diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index 5c97f48cea91..97d11561f35c 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -93,15 +93,13 @@ static void hypfs_delete_tree(struct dentry *root)
 
 static struct inode *hypfs_make_inode(struct super_block *sb, umode_t mode)
 {
-   struct inode *ret = new_inode(sb);
+   struct inode *ret = simple_new_inode(sb);
 
if (ret) {
struct hypfs_sb_info *hypfs_info = sb->s_fs_info;
-   ret->i_ino = get_next_ino();
ret->i_mode = mode;
ret->i_uid = hypfs_info->uid;
ret->i_gid = hypfs_info->gid;
-   ret->i_atime = ret->i_mtime = ret->i_ctime = current_time(ret);
if (S_ISDIR(mode))
set_nlink(ret, 2);
}
diff --git a/drivers/infiniband/hw/qib/qib_fs.c 
b/drivers/infiniband/hw/qib/qib_fs.c
index e336d778e076..d402c3b1c552 100644
--- a/drivers/infiniband/hw/qib/qib_fs.c
+++ b/drivers/infiniband/hw/qib/qib_fs.c
@@ -53,21 +53,17 @@ static int qibfs_mknod(struct inode *dir, struct dentry 
*dentry,
   void *data)
 {
int error;
-   struct inode *inode = new_inode(dir->i_sb);
+   struct inode *inode = simple_new_inode(dir->i_sb);
 
if (!inode) {
error = -EPERM;
goto bail;
}
 
-   inode->i_ino = get_next_ino();
inode->i_mode = mode;
inode->i_uid = GLOBAL_ROOT_UID;
inode->i_gid = GLOBAL_ROOT_GID;
inode->i_blocks = 0;
-   inode->i_atime = current_time(inode);
-   inode->i_mtime = inode->i_atime;
-   inode->i_ctime = inode->i_atime;
inode->i_private = data;
if (S_ISDIR(mode)) {
inode->i_op = &simple_dir_inode_operations;
diff --git a/drivers/misc/ibmasm/ibmasmfs.c b/drivers/misc/ibmasm/ibmasmfs.c
index 35fec1bf1b3d..72aa02505f45 100644
--- a/drivers/misc/ibmasm/ibmasmfs.c
+++ b/drivers/misc/ibmasm/ibmasmfs.c
@@ -134,13 +134,11 @@ static int ibmasmfs_fill_super(struct super_block *sb, 
struct fs_context *fc)
 
 static struct inode *ibmasmfs_make_inode(struct super_block *sb, int mode)
 {
-   struct inode *ret = new_inode(sb);
+   struct inode *ret = simple_new_inode(sb);
 
-   if (ret) {
-   ret->i_ino = get_next_ino();
+   if (ret)
ret->i_mode = mode;
-   re

[PATCH 3/8] fs: wrap simple_pin_fs/simple_release_fs arguments in a struct

2020-04-14 Thread Emanuele Giuseppe Esposito
Simplify passing the count and mount to simple_pin_fs and simple_release_fs,
in preparation for adding more high level operations to the simplefs API.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 drivers/gpu/drm/drm_drv.c   | 11 +--
 drivers/misc/cxl/api.c  | 13 ++---
 drivers/scsi/cxlflash/ocxl_hw.c | 14 ++
 fs/binfmt_misc.c|  9 -
 fs/configfs/mount.c | 10 --
 fs/debugfs/inode.c  | 22 ++
 fs/simplefs.c   | 20 ++--
 fs/tracefs/inode.c  | 18 --
 include/linux/simplefs.h|  9 +++--
 security/apparmor/apparmorfs.c  | 25 -
 security/inode.c| 11 +--
 11 files changed, 77 insertions(+), 85 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 187a61091b5c..b4b357725be2 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -514,8 +514,7 @@ EXPORT_SYMBOL(drm_dev_unplug);
  * iput(), but this way you'd end up with a new vfsmount for each inode.
  */
 
-static int drm_fs_cnt;
-static struct vfsmount *drm_fs_mnt;
+static struct simple_fs drm_fs;
 
 static int drm_fs_init_fs_context(struct fs_context *fc)
 {
@@ -534,15 +533,15 @@ static struct inode *drm_fs_inode_new(void)
struct inode *inode;
int r;
 
-   r = simple_pin_fs(&drm_fs_type, &drm_fs_mnt, &drm_fs_cnt);
+   r = simple_pin_fs(&drm_fs, &drm_fs_type);
if (r < 0) {
DRM_ERROR("Cannot mount pseudo fs: %d\n", r);
return ERR_PTR(r);
}
 
-   inode = alloc_anon_inode(drm_fs_mnt->mnt_sb);
+   inode = alloc_anon_inode(drm_fs.mount->mnt_sb);
if (IS_ERR(inode))
-   simple_release_fs(&drm_fs_mnt, &drm_fs_cnt);
+   simple_release_fs(&drm_fs);
 
return inode;
 }
@@ -551,7 +550,7 @@ static void drm_fs_inode_free(struct inode *inode)
 {
if (inode) {
iput(inode);
-   simple_release_fs(&drm_fs_mnt, &drm_fs_cnt);
+   simple_release_fs(&drm_fs);
}
 }
 
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 0b8f8de7475a..6c6566d8bc17 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -32,8 +32,7 @@
 
 #define CXL_PSEUDO_FS_MAGIC0x1697697f
 
-static int cxl_fs_cnt;
-static struct vfsmount *cxl_vfs_mount;
+static struct simple_fs cxl_fs;
 
 static int cxl_fs_init_fs_context(struct fs_context *fc)
 {
@@ -51,7 +50,7 @@ static struct file_system_type cxl_fs_type = {
 void cxl_release_mapping(struct cxl_context *ctx)
 {
if (ctx->kernelapi && ctx->mapping)
-   simple_release_fs(&cxl_vfs_mount, &cxl_fs_cnt);
+   simple_release_fs(&cxl_fs);
 }
 
 static struct file *cxl_getfile(const char *name,
@@ -67,20 +66,20 @@ static struct file *cxl_getfile(const char *name,
if (fops->owner && !try_module_get(fops->owner))
return ERR_PTR(-ENOENT);
 
-   rc = simple_pin_fs(&cxl_fs_type, &cxl_vfs_mount, &cxl_fs_cnt);
+   rc = simple_pin_fs(&cxl_fs, &cxl_fs_type);
if (rc < 0) {
pr_err("Cannot mount cxl pseudo filesystem: %d\n", rc);
file = ERR_PTR(rc);
goto err_module;
}
 
-   inode = alloc_anon_inode(cxl_vfs_mount->mnt_sb);
+   inode = alloc_anon_inode(cxl_fs.mount->mnt_sb);
if (IS_ERR(inode)) {
file = ERR_CAST(inode);
goto err_fs;
}
 
-   file = alloc_file_pseudo(inode, cxl_vfs_mount, name,
+   file = alloc_file_pseudo(inode, cxl_fs.mount, name,
 flags & (O_ACCMODE | O_NONBLOCK), fops);
if (IS_ERR(file))
goto err_inode;
@@ -92,7 +91,7 @@ static struct file *cxl_getfile(const char *name,
 err_inode:
iput(inode);
 err_fs:
-   simple_release_fs(&cxl_vfs_mount, &cxl_fs_cnt);
+   simple_release_fs(&cxl_fs);
 err_module:
module_put(fops->owner);
return file;
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index 429f55651090..23afde0c6c0e 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -30,8 +30,7 @@
 
 #define OCXLFLASH_FS_MAGIC  0x1697698f
 
-static int ocxlflash_fs_cnt;
-static struct vfsmount *ocxlflash_vfs_mount;
+static struct simple_fs ocxlflash_fs;
 
 static int ocxlflash_fs_init_fs_context(struct fs_context *fc)
 {
@@ -52,7 +51,7 @@ static struct file_system_type ocxlflash_fs_type = {
 static void ocxlflash_release_mapping(struct ocxlflash_context *ctx)
 {
if (ctx->mapping)
-   simple_release_fs(&ocxlflash_vfs_mount, &ocxlflash_fs_cnt);
+   simple_releas

[PATCH 2/8] fs: extract simple_pin/release_fs to separate files

2020-04-14 Thread Emanuele Giuseppe Esposito
We will augment this family of functions with inode management.  To avoid
littering include/linux/fs.h and fs/libfs.c, move them to a separate header,
with a Kconfig symbol to enable them.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 drivers/gpu/drm/Kconfig |  1 +
 drivers/gpu/drm/drm_drv.c   |  2 +-
 drivers/misc/cxl/Kconfig|  1 +
 drivers/misc/cxl/api.c  |  1 +
 drivers/misc/ocxl/Kconfig   |  1 +
 drivers/scsi/cxlflash/ocxl_hw.c |  1 +
 fs/Kconfig  |  3 +++
 fs/Kconfig.binfmt   |  1 +
 fs/Makefile |  1 +
 fs/binfmt_misc.c|  2 +-
 fs/configfs/Kconfig |  1 +
 fs/configfs/mount.c |  2 +-
 fs/debugfs/inode.c  |  2 +-
 fs/libfs.c  | 36 +--
 fs/simplefs.c   | 38 +
 fs/tracefs/inode.c  |  2 +-
 include/linux/fs.h  |  2 --
 include/linux/simplefs.h| 10 +
 lib/Kconfig.debug   | 16 --
 security/Kconfig|  1 +
 security/apparmor/apparmorfs.c  |  3 ++-
 security/inode.c|  2 +-
 22 files changed, 79 insertions(+), 50 deletions(-)
 create mode 100644 fs/simplefs.c
 create mode 100644 include/linux/simplefs.h

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 43594978958e..a6fe933de9da 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -14,6 +14,7 @@ menuconfig DRM
select I2C
select I2C_ALGOBIT
select DMA_SHARED_BUFFER
+   select SIMPLEFS
select SYNC_FILE
help
  Kernel-level support for the Direct Rendering Infrastructure (DRI)
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 7b1a628d1f6e..187a61091b5c 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -27,7 +27,7 @@
  */
 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig
index 39eec9031487..a62795079d9c 100644
--- a/drivers/misc/cxl/Kconfig
+++ b/drivers/misc/cxl/Kconfig
@@ -19,6 +19,7 @@ config CXL
select CXL_BASE
select CXL_AFU_DRIVER_OPS
select CXL_LIB
+   select SIMPLEFS
default m
help
  Select this option to enable driver support for IBM Coherent
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index b493de962153..0b8f8de7475a 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
index 2d2266c1439e..ddd9245fff3d 100644
--- a/drivers/misc/ocxl/Kconfig
+++ b/drivers/misc/ocxl/Kconfig
@@ -12,6 +12,7 @@ config OCXL
depends on PPC_POWERNV && PCI && EEH
select OCXL_BASE
select HOTPLUG_PCI_POWERNV
+   select SIMPLEFS
default m
help
  Select this option to enable the ocxl driver for Open
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index 7018cd802569..429f55651090 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/fs/Kconfig b/fs/Kconfig
index f08fbbfafd9a..a6795ae312a2 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -218,6 +218,9 @@ config HUGETLB_PAGE
 config MEMFD_CREATE
def_bool TMPFS || HUGETLBFS
 
+config SIMPLEFS
+   bool
+
 config ARCH_HAS_GIGANTIC_PAGE
bool
 
diff --git a/fs/Kconfig.binfmt b/fs/Kconfig.binfmt
index 62dc4f577ba1..af7b765de4d3 100644
--- a/fs/Kconfig.binfmt
+++ b/fs/Kconfig.binfmt
@@ -176,6 +176,7 @@ config BINFMT_EM86
 
 config BINFMT_MISC
tristate "Kernel support for MISC binaries"
+   select SIMPLEFS
---help---
  If you say Y here, it will be possible to plug wrapper-driven binary
  formats into the kernel. You will like this especially when you use
diff --git a/fs/Makefile b/fs/Makefile
index 2ce5112b02c8..c5c665984b9e 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_PROC_FS) += proc_namespace.o
 obj-y  += notify/
 obj-$(CONFIG_EPOLL)+= eventpoll.o
 obj-y  += anon_inodes.o
+obj-$(CONFIG_SIMPLEFS) += simplefs.o
 obj-$(CONFIG_SIGNALFD) += signalfd.o
 obj-$(CONFIG_TIMERFD)  += timerfd.o
 obj-$(CONFIG_EVENTFD)  += eventfd.o
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index cdb45829354d..c764110f5f0b 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -25,7 +25,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #include "internal.h"
diff --git a/fs/configfs/Kconfig b/fs/configfs/Kconfig
index 272b64456999..3b461e4e3989 10

[PATCH 1/8] apparmor: just use vfs_kern_mount to make .null

2020-04-14 Thread Emanuele Giuseppe Esposito
aa_mk_null_file is using simple_pin_fs/simple_release_fs with local
variables as arguments, for what would amount to a simple
vfs_kern_mount/mntput pair if everything was inlined.  Just use
the normal filesystem API since the reference counting is not needed
here.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 security/apparmor/apparmorfs.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 280741fc0f5f..828bb1eb77ea 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -2525,14 +2525,14 @@ struct path aa_null;
 
 static int aa_mk_null_file(struct dentry *parent)
 {
-   struct vfsmount *mount = NULL;
+   struct file_system_type *type = parent->d_sb->s_type;
+   struct vfsmount *mount;
struct dentry *dentry;
struct inode *inode;
-   int count = 0;
-   int error = simple_pin_fs(parent->d_sb->s_type, &mount, &count);
 
-   if (error)
-   return error;
+   mount = vfs_kern_mount(type, SB_KERNMOUNT, type->name, NULL);
+   if (IS_ERR(mount))
+   return PTR_ERR(mount);
 
inode_lock(d_inode(parent));
dentry = lookup_one_len(NULL_FILE_NAME, parent, strlen(NULL_FILE_NAME));
@@ -2561,7 +2561,7 @@ static int aa_mk_null_file(struct dentry *parent)
dput(dentry);
 out:
inode_unlock(d_inode(parent));
-   simple_release_fs(&mount, &count);
+   mntput(mount);
return error;
 }
 
-- 
2.25.2



[PATCH 0/8] Simplefs: group and simplify linux fs code

2020-04-14 Thread Emanuele Giuseppe Esposito
This series of patches introduce wrappers for functions,
arguments simplification in functions calls and most importantly
groups duplicated code in a single header, simplefs, to avoid redundancy
in the linux fs, especially debugfs and tracefs.

Signed-off-by: Emanuele Giuseppe Esposito 

Emanuele Giuseppe Esposito (8):
  apparmor: just use vfs_kern_mount to make .null
  fs: extract simple_pin/release_fs to separate files
  fs: wrap simple_pin_fs/simple_release_fs arguments in a struct
  fs: introduce simple_new_inode
  simplefs: add alloc_anon_inode wrapper
  simplefs: add file creation functions
  debugfs: switch to simplefs inode creation API
  tracefs: switch to simplefs inode creation API

 arch/powerpc/platforms/cell/spufs/inode.c |   4 +-
 arch/s390/hypfs/inode.c   |   4 +-
 drivers/gpu/drm/Kconfig   |   1 +
 drivers/gpu/drm/drm_drv.c |  13 +-
 drivers/infiniband/hw/qib/qib_fs.c|   6 +-
 drivers/misc/cxl/Kconfig  |   1 +
 drivers/misc/cxl/api.c|  14 +-
 drivers/misc/ibmasm/ibmasmfs.c|   8 +-
 drivers/misc/ocxl/Kconfig |   1 +
 drivers/oprofile/oprofilefs.c |   8 +-
 drivers/scsi/cxlflash/ocxl_hw.c   |  15 +-
 drivers/usb/gadget/function/f_fs.c|   8 +-
 fs/Kconfig|   3 +
 fs/Kconfig.binfmt |   1 +
 fs/Makefile   |   1 +
 fs/autofs/inode.c |   4 +-
 fs/binfmt_misc.c  |  27 +--
 fs/configfs/Kconfig   |   1 +
 fs/configfs/mount.c   |  12 +-
 fs/debugfs/inode.c| 171 +++
 fs/efivarfs/inode.c   |   4 +-
 fs/fuse/control.c |   4 +-
 fs/hugetlbfs/inode.c  |   8 +-
 fs/libfs.c|  48 ++
 fs/ocfs2/dlmfs/dlmfs.c|   8 +-
 fs/proc/base.c|   4 +-
 fs/proc/proc_sysctl.c |   5 +-
 fs/pstore/inode.c |  14 +-
 fs/ramfs/inode.c  |   4 +-
 fs/simplefs.c | 194 ++
 fs/tracefs/inode.c| 108 ++--
 include/linux/fs.h|   3 +-
 include/linux/simplefs.h  |  36 
 ipc/mqueue.c  |   4 +-
 kernel/bpf/inode.c|   7 +-
 lib/Kconfig.debug |  16 +-
 mm/shmem.c|   4 +-
 net/sunrpc/rpc_pipe.c |   4 +-
 security/Kconfig  |   1 +
 security/apparmor/apparmorfs.c|  48 +++---
 security/inode.c  |  17 +-
 41 files changed, 385 insertions(+), 459 deletions(-)
 create mode 100644 fs/simplefs.c
 create mode 100644 include/linux/simplefs.h

-- 
2.25.2



[PATCH] kvm_host: unify VM_STAT and VCPU_STAT definitions in a single place

2020-04-13 Thread Emanuele Giuseppe Esposito
The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple
files, each used by a different architecure to initialize the debugfs
entries for statistics. Since they all have the same purpose, they can be
unified in a single common definition in include/linux/kvm_host.h

Signed-off-by: Emanuele Giuseppe Esposito 
---
 arch/arm64/kvm/guest.c| 23 +++
 arch/mips/kvm/mips.c  | 61 +++
 arch/powerpc/kvm/book3s.c |  3 --
 arch/powerpc/kvm/booke.c  |  3 --
 arch/s390/kvm/kvm-s390.c  |  3 --
 arch/x86/kvm/x86.c|  3 --
 include/linux/kvm_host.h  |  3 ++
 7 files changed, 43 insertions(+), 56 deletions(-)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 23ebe51410f0..3e3aee8b37c0 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,20 +29,17 @@
 
 #include "trace.h"
 
-#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x), KVM_STAT_VM }
-#define VCPU_STAT(x) { #x, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU }
-
 struct kvm_stats_debugfs_item debugfs_entries[] = {
-   VCPU_STAT(halt_successful_poll),
-   VCPU_STAT(halt_attempted_poll),
-   VCPU_STAT(halt_poll_invalid),
-   VCPU_STAT(halt_wakeup),
-   VCPU_STAT(hvc_exit_stat),
-   VCPU_STAT(wfe_exit_stat),
-   VCPU_STAT(wfi_exit_stat),
-   VCPU_STAT(mmio_exit_user),
-   VCPU_STAT(mmio_exit_kernel),
-   VCPU_STAT(exits),
+   { "halt_successful_poll", VCPU_STAT(halt_successful_poll) },
+   { "halt_attempted_poll", VCPU_STAT(halt_attempted_poll) },
+   { "halt_poll_invalid", VCPU_STAT(halt_poll_invalid) },
+   { "halt_wakeup", VCPU_STAT(halt_wakeup) },
+   { "hvc_exit_stat", VCPU_STAT(hvc_exit_stat) },
+   { "wfe_exit_stat", VCPU_STAT(wfe_exit_stat) },
+   { "wfi_exit_stat", VCPU_STAT(wfi_exit_stat) },
+   { "mmio_exit_user", VCPU_STAT(mmio_exit_user) },
+   { "mmio_exit_kernel", VCPU_STAT(mmio_exit_kernel) },
+   { "exits", VCPU_STAT(exits) },
{ NULL }
 };
 
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 8f05dd0a0f4e..f14b93d02f02 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -39,40 +39,39 @@
 #define VECTORSPACING 0x100/* for EI/VI mode */
 #endif
 
-#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x)
 struct kvm_stats_debugfs_item debugfs_entries[] = {
-   { "wait", VCPU_STAT(wait_exits), KVM_STAT_VCPU },
-   { "cache",VCPU_STAT(cache_exits),KVM_STAT_VCPU },
-   { "signal",   VCPU_STAT(signal_exits),   KVM_STAT_VCPU },
-   { "interrupt",VCPU_STAT(int_exits),  KVM_STAT_VCPU },
-   { "cop_unusable", VCPU_STAT(cop_unusable_exits), KVM_STAT_VCPU },
-   { "tlbmod",   VCPU_STAT(tlbmod_exits),   KVM_STAT_VCPU },
-   { "tlbmiss_ld",   VCPU_STAT(tlbmiss_ld_exits),   KVM_STAT_VCPU },
-   { "tlbmiss_st",   VCPU_STAT(tlbmiss_st_exits),   KVM_STAT_VCPU },
-   { "addrerr_st",   VCPU_STAT(addrerr_st_exits),   KVM_STAT_VCPU },
-   { "addrerr_ld",   VCPU_STAT(addrerr_ld_exits),   KVM_STAT_VCPU },
-   { "syscall",  VCPU_STAT(syscall_exits),  KVM_STAT_VCPU },
-   { "resvd_inst",   VCPU_STAT(resvd_inst_exits),   KVM_STAT_VCPU },
-   { "break_inst",   VCPU_STAT(break_inst_exits),   KVM_STAT_VCPU },
-   { "trap_inst",VCPU_STAT(trap_inst_exits),KVM_STAT_VCPU },
-   { "msa_fpe",  VCPU_STAT(msa_fpe_exits),  KVM_STAT_VCPU },
-   { "fpe",  VCPU_STAT(fpe_exits),  KVM_STAT_VCPU },
-   { "msa_disabled", VCPU_STAT(msa_disabled_exits), KVM_STAT_VCPU },
-   { "flush_dcache", VCPU_STAT(flush_dcache_exits), KVM_STAT_VCPU },
+   { "wait", VCPU_STAT(wait_exits) },
+   { "cache",VCPU_STAT(cache_exits) },
+   { "signal",   VCPU_STAT(signal_exits) },
+   { "interrupt",VCPU_STAT(int_exits) },
+   { "cop_unusable", VCPU_STAT(cop_unusable_exits) },
+   { "tlbmod",   VCPU_STAT(tlbmod_exits) },
+   { "tlbmiss_ld",   VCPU_STAT(tlbmiss_ld_exits) },
+   { "tlbmiss_st",   VCPU_STAT(tlbmiss_st_exits) },
+   { "addrerr_st",   VCPU_STAT(addrerr_st_exits) },
+   { "addrerr_ld",   VCPU_STAT(addrerr_ld_exits) },
+   { "syscall",  VCPU_STAT(syscall_exits) },
+   { "resvd_inst",   VCPU_STAT(resvd_inst_exits) },
+   { "break_inst",   VCPU_STAT(break_inst_exits) },
+   { "trap_inst",VCPU_STAT(trap_inst_exits) },
+   { "msa_fpe",  VCPU_STAT(msa_fpe_exits) },
+   { "fpe"