On Thu, Nov 12, 2020 at 10:06 AM Miklos Szeredi <mszer...@redhat.com> wrote:
>
> On Fri, Nov 6, 2020 at 11:35 PM Vivek Goyal <vgo...@redhat.com> wrote:
> >
> > On Fri, Nov 06, 2020 at 08:33:50PM +0000, Venegas Munoz, Jose Carlos wrote:
> > > Hi Vivek,
> > >
> > > I have tested with Kata 1.12-alpha0; the results seem better for
> > > the fio config I am tracking.
> > >
> > > The fio config does randrw:
> > >
> > > fio --direct=1 --gtod_reduce=1 --name=test \
> > >     --filename=random_read_write.fio --bs=4k --iodepth=64 --size=200M \
> > >     --readwrite=randrw --rwmixread=75
> > >
> >
> > Hi Carlos,
> >
> > Thanks for the testing.
> >
> > So there are basically two conclusions from your tests.
> >
> > - for virtiofs, --thread-pool-size=0 performs better than both
> >   --thread-pool-size=1 and --thread-pool-size=64, by approximately
> >   35-40%.
> >
> > - virtio-9p is still approximately 30% better than virtiofs
> >   --thread-pool-size=0.
> >
> > As per my earlier analysis, this particular workload (mixed read and
> > write) performs badly with virtiofs because after every write we
> > invalidate attrs and the cache, so the next read ends up fetching attrs
> > again. I had posted patches to recover some of the performance.
> >
> > https://lore.kernel.org/linux-fsdevel/20200929185015.gg220...@redhat.com/
> >
> > But I got feedback to look into implementing file leases instead.
>
> Hmm, the FUSE_AUTO_INVAL_DATA feature is buggy; how about turning it
> off for now? 9p doesn't have it, so there is no point in enabling it
> for virtiofs by default.
>
> Also, I think some confusion comes from cache=auto being the default
> for virtiofs. I'm not sure what the default is for 9p, but comparing
> default to default will definitely not be apples to apples, since this
> mode does not exist in 9p.
>
> 9p:cache=none  <-> virtiofs:cache=none
> 9p:cache=loose <-> virtiofs:cache=always
>
> "9p:cache=mmap" and "virtiofs:cache=auto" have no match.
>
> Untested patch attached.
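As a small illustration, the cache-mode pairing quoted above can be written down as a lookup table, for example in a benchmark harness that wants to compare like with like. This is only a sketch: the struct and function names are hypothetical. The mode strings are just the ones listed above; they are not parsed from real mount or virtiofsd options.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: each entry pairs a 9p cache= mode with the virtiofs
 * cache mode that behaves most similarly, per the list quoted above. */
struct cache_mode_pair {
	const char *p9_mode;
	const char *virtiofs_mode;
};

static const struct cache_mode_pair cache_mode_pairs[] = {
	{ "none",  "none"   },
	{ "loose", "always" },
};

/* Hypothetical helper: return the comparable virtiofs mode for a 9p mode,
 * or NULL when there is no match (e.g. 9p "mmap"). */
static const char *virtiofs_equivalent(const char *p9_mode)
{
	size_t n = sizeof(cache_mode_pairs) / sizeof(cache_mode_pairs[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(cache_mode_pairs[i].p9_mode, p9_mode) == 0)
			return cache_mode_pairs[i].virtiofs_mode;
	}
	return NULL;
}
```

Here virtiofs_equivalent("loose") returns "always". virtiofs_equivalent("mmap") returns NULL because that mode has no virtiofs counterpart, just as virtiofs cache=auto has no 9p one.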

Another performance issue with virtiofs could be due to the strict
page writeback rules in fuse, which are meant to prevent misuse of
kernel memory by unprivileged processes. Since virtiofs isn't
subject to that limitation, these rules could be relaxed for it.

Attaching a patch that does one half of this. The other half is
getting rid of the page copying; that's a bit more involved, but
shouldn't be too difficult. It just needs the cached writeback
callbacks duplicated for virtiofs, doing away with the complex temp
page stuff.
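For scale, the "about 1M" figure in the fuse_bdi_init() comment can be checked with a simplified calculation. The sketch makes two assumptions:

- The global dirty threshold is total memory times dirty_ratio. The kernel actually uses dirtyable memory.
- bdi_set_max_ratio(sb->s_bdi, 1) caps the mount at 1% of that threshold.

The helper name is hypothetical. With BDI_CAP_STRICTLIMIT set, writers to the mount are roughly throttled against this small per-bdi cap even when the system as a whole is below its dirty limits.

```c
#include <stdint.h>

/*
 * Hypothetical helper, simplified model of the per-mount dirty cap:
 * global threshold = mem_bytes * dirty_ratio / 100 (the kernel really uses
 * dirtyable memory, not total RAM), and the bdi may use at most max_ratio
 * percent of that threshold.
 */
static uint64_t fuse_bdi_dirty_cap(uint64_t mem_bytes, unsigned int dirty_ratio,
				   unsigned int max_ratio)
{
	uint64_t global_thresh = mem_bytes * dirty_ratio / 100;

	return global_thresh * max_ratio / 100;
}
```

With 1 GiB and dirty_ratio=10, fuse_bdi_dirty_cap(1ULL << 30, 10, 1) gives 1073741 bytes, matching the comment's "about 1M". With relax_writeback set, the patch applies neither BDI_CAP_STRICTLIMIT nor the 1% max_ratio. The mount's share is then no longer limited to 1% of the threshold, which is 107374182 bytes in this example.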

Thanks,
Miklos
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index d414c787e362..92c92c482c57 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -502,6 +502,7 @@ struct fuse_fs_context {
 	bool no_force_umount:1;
 	bool legacy_opts_show:1;
 	bool dax:1;
+	bool relax_writeback:1;
 	unsigned int max_read;
 	unsigned int blksize;
 	const char *subtype;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 36ab05315828..029325ebd1b3 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1130,7 +1130,8 @@ void fuse_free_conn(struct fuse_conn *fc)
 }
 EXPORT_SYMBOL_GPL(fuse_free_conn);
 
-static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
+static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb,
+			 struct fuse_fs_context *ctx)
 {
 	int err;
 	char *suffix = "";
@@ -1151,21 +1152,24 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
 
 	/* fuse does it's own writeback accounting */
 	sb->s_bdi->capabilities &= ~BDI_CAP_WRITEBACK_ACCT;
-	sb->s_bdi->capabilities |= BDI_CAP_STRICTLIMIT;
 
-	/*
-	 * For a single fuse filesystem use max 1% of dirty +
-	 * writeback threshold.
-	 *
-	 * This gives about 1M of write buffer for memory maps on a
-	 * machine with 1G and 10% dirty_ratio, which should be more
-	 * than enough.
-	 *
-	 * Privileged users can raise it by writing to
-	 *
-	 *    /sys/class/bdi/<bdi>/max_ratio
-	 */
-	bdi_set_max_ratio(sb->s_bdi, 1);
+	if (!ctx->relax_writeback) {
+		sb->s_bdi->capabilities |= BDI_CAP_STRICTLIMIT;
+
+		/*
+		 * For a single fuse filesystem use max 1% of dirty +
+		 * writeback threshold.
+		 *
+		 * This gives about 1M of write buffer for memory maps on a
+		 * machine with 1G and 10% dirty_ratio, which should be more
+		 * than enough.
+		 *
+		 * Privileged users can raise it by writing to
+		 *
+		 *    /sys/class/bdi/<bdi>/max_ratio
+		 */
+		bdi_set_max_ratio(sb->s_bdi, 1);
+	}
 
 	return 0;
 }
@@ -1354,7 +1358,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 
 	fc->dev = sb->s_dev;
 	fm->sb = sb;
-	err = fuse_bdi_init(fc, sb);
+	err = fuse_bdi_init(fc, sb, ctx);
 	if (err)
 		goto err_dev_free;
 
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 8868ac31a3c0..efbe1697612e 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1288,6 +1288,7 @@ static inline void virtio_fs_ctx_set_defaults(struct fuse_fs_context *ctx)
 	ctx->destroy = true;
 	ctx->no_control = true;
 	ctx->no_force_umount = true;
+	ctx->relax_writeback = true;
 }
 
 static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
