Re: [Virtio-fs] [virtiofsd PATCH v4 4/4] virtiofsd: support per-file DAX in FUSE_LOOKUP

2021-08-17 Thread JeffleXu



On 8/18/21 3:00 AM, Dr. David Alan Gilbert wrote:
> * Jeffle Xu (jeffl...@linux.alibaba.com) wrote:
>> For passthrough, when the corresponding virtiofs in guest is mounted
>> with '-o dax=inode', advertise that the file is capable of per-file
>> DAX if the inode in the backend fs is marked with FS_DAX_FL flag.
>>
>> Signed-off-by: Jeffle Xu 
>> ---
>>  tools/virtiofsd/passthrough_ll.c | 43 
>>  1 file changed, 43 insertions(+)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c 
>> b/tools/virtiofsd/passthrough_ll.c
>> index 5b6228210f..4cbd904248 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -171,6 +171,7 @@ struct lo_data {
>>  int allow_direct_io;
>>  int announce_submounts;
>>  int perfile_dax_cap; /* capability of backend fs */
>> +bool perfile_dax; /* enable per-file DAX or not */
>>  bool use_statx;
>>  struct lo_inode root;
>>  GHashTable *inodes; /* protected by lo->mutex */
>> @@ -716,6 +717,10 @@ static void lo_init(void *userdata, struct 
>> fuse_conn_info *conn)
>>  
>>  if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
>>  conn->want |= FUSE_CAP_PERFILE_DAX;
>> +lo->perfile_dax = 1;
>> +}
>> +else {
>> +lo->perfile_dax = 0;
>>  }
>>  }
>>  
>> @@ -983,6 +988,41 @@ static int do_statx(struct lo_data *lo, int dirfd, 
>> const char *pathname,
>>  return 0;
>>  }
>>  
>> +/*
>> + * If the file is marked with FS_DAX_FL or FS_XFLAG_DAX, then DAX should be
>> + * enabled for this file.
>> + */
>> +static bool lo_should_enable_dax(struct lo_data *lo, struct lo_inode *dir,
>> + const char *name)
>> +{
>> +int res, fd;
>> +int ret = false;;
>> +unsigned int attr;
>> +struct fsxattr xattr;
>> +
>> +if (!lo->perfile_dax)
>> +return false;
>> +
>> +/* Open file without O_PATH, so that ioctl can be called. */
>> +fd = openat(dir->fd, name, O_NOFOLLOW);
>> +if (fd == -1)
>> +return false;
> 
> Doesn't that defeat the whole benefit of using O_PATH - i.e. that we
> might stumble into a /dev node or something else we're not allowed to
> open?

As far as I know, virtiofsd will pivot_root/chroot to the source
directory, and can only access files inside the source directory
specified by "-o source=". Then where do these unexpected files come
from? Besides, fd opened without O_PATH here is temporary and used for
FS_IOC_GETFLAGS/FS_IOC_FSGETXATTR ioctl only. It's closed when the
function returns.

> 
>> +if (lo->perfile_dax_cap == DAX_CAP_FLAGS) {
>> +res = ioctl(fd, FS_IOC_GETFLAGS, );
>> +if (!res && (attr & FS_DAX_FL))
>> +ret = true;
>> +}
>> +else if (lo->perfile_dax_cap == DAX_CAP_XATTR) {
>> +res = ioctl(fd, FS_IOC_FSGETXATTR, );
>> +if (!res && (xattr.fsx_xflags & FS_XFLAG_DAX))
>> +ret = true;
>> +}
> 
> This all looks pretty expensive for each lookup.

Yes. it can be somehow optimized if we can agree on the way of storing
the dax flag persistently.

-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT

2021-08-17 Thread JeffleXu



On 8/18/21 1:15 AM, Dr. David Alan Gilbert wrote:
> * Jeffle Xu (jeffl...@linux.alibaba.com) wrote:
>> In FUSE_INIT negotiating phase, server/client should advertise if it
>> supports per-file DAX.
>>
>> Once advertising support for per-file DAX feature, virtiofsd should
>> support storing FS_DAX_FL flag persistently passed by
>> FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
>> FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
>>
>> Currently only ext4/xfs since linux kernel v5.8 support storing
>> FS_DAX_FL flag persistently, and thus advertise support for per-file
>> DAX feature only when the backend fs type is ext4 and xfs.
> 
> I'm a little worried about the meaning of the flags we're storing and
> the fact we're storing them in the normal host DAX flags.
> 
> Doesn't this mean that we're using a single host flag to mean:
>   a) It can be mapped as DAX on the host if it was a real DAX device
>   b) We can map it as DAX inside the guest with virtiofs?

Yes the side effect is that the host file is also dax enabled if the
backend fs is built upon real nvdimm device.

The rationale here is that, fuse daemon shall be capable of *marking*
the file as dax capable *persistently*, so that it can be informed that
this file is capable of dax later.

I'm not sure if xattr (extent attribute) is a better option for this?


> 
> what happens when we're using usernamespaces for the guest?
> 
> Dave
> 
> 
>> Signed-off-by: Jeffle Xu 
>> ---
>>  tools/virtiofsd/fuse_common.h|  5 +
>>  tools/virtiofsd/fuse_lowlevel.c  |  6 ++
>>  tools/virtiofsd/passthrough_ll.c | 29 +
>>  3 files changed, 40 insertions(+)
>>
>> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
>> index 8a75729be9..ee6fc64c23 100644
>> --- a/tools/virtiofsd/fuse_common.h
>> +++ b/tools/virtiofsd/fuse_common.h
>> @@ -372,6 +372,11 @@ struct fuse_file_info {
>>   */
>>  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
>>  
>> +/**
>> + * Indicates support for per-file DAX.
>> + */
>> +#define FUSE_CAP_PERFILE_DAX (1 << 29)
>> +
>>  /**
>>   * Ioctl flags
>>   *
>> diff --git a/tools/virtiofsd/fuse_lowlevel.c 
>> b/tools/virtiofsd/fuse_lowlevel.c
>> index 50fc5c8d5a..04a4f17423 100644
>> --- a/tools/virtiofsd/fuse_lowlevel.c
>> +++ b/tools/virtiofsd/fuse_lowlevel.c
>> @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>>  if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
>>  se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
>>  }
>> +if (arg->flags & FUSE_PERFILE_DAX) {
>> +se->conn.capable |= FUSE_CAP_PERFILE_DAX;
>> +}
>>  #ifdef HAVE_SPLICE
>>  #ifdef HAVE_VMSPLICE
>>  se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
>> @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>>  if (se->conn.want & FUSE_CAP_POSIX_ACL) {
>>  outarg.flags |= FUSE_POSIX_ACL;
>>  }
>> +if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
>> +outarg.flags |= FUSE_PERFILE_DAX;
>> +}
>>  outarg.max_readahead = se->conn.max_readahead;
>>  outarg.max_write = se->conn.max_write;
>>  if (se->conn.max_background >= (1 << 16)) {
>> diff --git a/tools/virtiofsd/passthrough_ll.c 
>> b/tools/virtiofsd/passthrough_ll.c
>> index e170b17adb..5b6228210f 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -53,8 +53,10 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include "qemu/cutils.h"
>>  #include "passthrough_helpers.h"
>> @@ -136,6 +138,13 @@ enum {
>>  SANDBOX_CHROOT,
>>  };
>>  
>> +/* capability of storing DAX flag persistently */
>> +enum {
>> +DAX_CAP_NONE,  /* not supported */
>> +DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
>> +DAX_CAP_XATTR, /* stored in xflags 
>> (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
>> +};
>> +
>>  typedef struct xattr_map_entry {
>>  char *key;
>>  char *prepend;
>> @@ -161,6 +170,7 @@ struct lo_data {
>>  int readdirplus_clear;
>>  int allow_direct_io;
>>  int announce_submounts;
>> +int perfile_dax_cap; /* capability of backend fs */
>>  bool use_statx;
>>  struct lo_inode root;
>>  GHashTable *inodes; /* protected by lo->mutex */
>> @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct 
>> fuse_conn_info *conn)
>>  conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
>>  lo->killpriv_v2 = 0;
>>  }
>> +
>> +if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
>> +conn->want |= FUSE_CAP_PERFILE_DAX;
>> +}
>>  }
>>  
>>  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
>> @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct 
>> lo_inode *root)
>>  int fd, res;
>>  struct stat stat;
>>  uint64_t mnt_id;
>> +struct statfs statfs;
>>  
>> 

Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread JeffleXu



On 8/17/21 10:57 PM, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 09:22:53PM +0800, JeffleXu wrote:
>>
>>
>> On 8/17/21 8:39 PM, Vivek Goyal wrote:
>>> On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
 On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  wrote:
>
> This patchset adds support of per-file DAX for virtiofs, which is
> inspired by Ira Weiny's work on ext4[1] and xfs[2].

 Can you please explain the background of this change in detail?

 Why would an admin want to enable DAX for a particular virtiofs file
 and not for others?
>>>
>>> Initially I thought that they needed it because they are downloading
>>> files on the fly from server. So they don't want to enable dax on the file
>>> till file is completely downloaded. 
>>
>> Right, it's our initial requirement.
>>
>>
>>> But later I realized that they should
>>> be able to block in FUSE_SETUPMAPPING call and make sure associated
>>> file section has been downloaded before returning and solve the problem.
>>> So that can't be the primary reason.
>>
>> Saying we want to access 4KB of one file inside guest, if it goes
>> through FUSE request routine, then the fuse daemon only need to download
>> this 4KB from remote server. But if it goes through DAX, then the fuse
>> daemon need to download the whole DAX window (e.g., 2MB) from remote
>> server, so called amplification. Maybe we could decrease the DAX window
>> size, but it's a trade off.
> 
> Downloading 2MB chunk should not be a big issue (IMHO). 

Then the latency increases. Latency really matters in our use case.


> And if this
> turns out to be real concern, we could experiment with a smaller
> mapping granularity.
> 


-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread JeffleXu



On 8/17/21 10:54 PM, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 09:08:35PM +0800, JeffleXu wrote:
>>
>>
>> On 8/17/21 6:09 PM, Miklos Szeredi wrote:
>>> On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
>>>  wrote:

 * Miklos Szeredi (mik...@szeredi.hu) wrote:
> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  
> wrote:
>>
>> This patchset adds support of per-file DAX for virtiofs, which is
>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>
> Can you please explain the background of this change in detail?
>
> Why would an admin want to enable DAX for a particular virtiofs file
> and not for others?

 Where we're contending on virtiofs dax cache size it makes a lot of
 sense; it's quite expensive for us to map something into the cache
 (especially if we push something else out), so selectively DAXing files
 that are expected to be hot could help reduce cache churn.
>>>
>>> If this is a performance issue, it should be fixed in a way that
>>> doesn't require hand tuning like you suggest, I think.
>>>
>>> I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
>>> can help understand the virtiofs case as well.
>>>
>>
>> Some hints why ext4/xfs support per-file DAX can be found [1] and [2].
>>
>> "Boaz Harrosh wondered why someone might want to turn DAX off for a
>> persistent memory device. Hellwig said that the performance "could
>> suck"; Williams noted that the page cache could be useful for some
>> applications as well. Jan Kara pointed out that reads from persistent
>> memory are close to DRAM speed, but that writes are not; the page cache
>> could be helpful for frequent writes. Applications need to change to
>> fully take advantage of DAX, Williams said; part of the promise of
>> adding a flag is that users can do DAX on smaller granularities than a
>> full filesystem."
>>
>> In summary, page cache is preferable in some cases, and thus more fine
>> grained way of DAX control is needed.
> 
> In case of virtiofs, we are using page cache on host. So this probably
> is not a factor for us. Writes will go in page cache of host.
> 
>>
>>
>> As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
>> may compete for limited DAX window resource.
>>
>> Besides, supporting DAX for small files can be expensive. Small files
>> can consume DAX window resource rapidly, and if small files are accessed
>> only once, the cost of mmap/munmap on host can not be ignored.
> 
> W.r.r access pattern, same applies to large files also. So if a section
> of large file is accessed only once, it will consume dax window as well
> and will have to be reclaimed.
> 
> Dax in virtiofs provides speed gain only if map file once and access
> it multiple times. If that pattern does not hold true, then dax does
> not seem to provide speed gains and in fact might be slower than
> non-dax.
> 
> So if there is a pattern where we know some files are accessed repeatedly
> while others are not, then enabling/disabling dax selectively will make
> sense. Question is how many workloads really know that and how will
> you make that decision. Do you have any data to back that up.

There's no precise performance data yet. Empirically, small files used
to have worse performance with dax, while frequently accessed files
(such as .so libraries) behave better with dax.

> 
> W.r.t small file, is that a real concern. If that file is being accessed
> mutliple times, then we will still see the speed gain. Only down side
> is that there is little wastage of resources because our minimum dax
> mapping granularity is 2MB. I am wondering can we handle that by
> supporting other dax mapping granularities as well. say 256K and let
> users choose it.


-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH linux-next v3 0/6] vdpa: enable user to set mac, mtu

2021-08-17 Thread Parav Pandit via Virtualization



> From: Jason Wang 
> Sent: Wednesday, August 18, 2021 10:02 AM
> On Wed, Aug 18, 2021 at 11:15 AM Parav Pandit  wrote:
[..]
> > > I'm inclined to say vxlan is closer to a model to follow.
> > Ok. thanks for the feedback. We are using the model close to vxlan.
> > Lets resolve should we have it at creation time, post creation or both?
> > (a) Creation time
> > Pros:
> > - simpler single api for user
> > - eliminates needs of inventing stats reset in future series
> > Cons:
> > - inability to reuse the device with different config
> 
> This can be solved by destroying the instance and re-creating it with a
> different params?
> 
Yes, which is what I tried be say below.

> > - This may not be of great advantage, and it is probably fine to have 
> > creation time params
  ^ here.

> >
> > (b) post creation time:
> > Pros:
> > - able to reuse the device with different config for say different VM.
> > - will require stats reset in future once stats are implemented
> 
> Any reason for doing this other than re-creating the device?
> 
No. Only reason I can think of is, device reconfig may be faster than recreate.
But I weigh user simplicity more at the beginning and optimizations to bring 
later if required.

> > Cons:
> > - more commands for users to config a device, better to have the ability at
> create time.
> 
> We probably need to support post creation but it should be device specific.

True. Your below device resize is good example of it.

> 
> E.g we may support device resize for virtio-blk devices.
> 
> But it can be done on top I think.
I think so too.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH linux-next v3 0/6] vdpa: enable user to set mac, mtu

2021-08-17 Thread Jason Wang
On Wed, Aug 18, 2021 at 11:15 AM Parav Pandit  wrote:
>
>
>
> > From: Michael S. Tsirkin 
> > Sent: Tuesday, August 17, 2021 2:24 AM
> >
> > On Mon, Aug 09, 2021 at 09:51:49AM +, Parav Pandit wrote:
> > > > From: Michael S. Tsirkin 
> > > > Sent: Monday, August 9, 2021 3:10 PM
> > > >
> > > > On Fri, Aug 06, 2021 at 08:55:56AM +, Parav Pandit wrote:
> > > > >
> > > > >
> > > > > >
> > > > > > The point is to try and not reinvent a dedicated vpda interface
> > > > > > where a generic one exits.
> > > > > > E.g. for phy things such as mac speed etc, I think most people
> > > > > > are using ethtool things right?
> > > > >
> > > > > As you know vdpa is the backend device for the front-end netdevice
> > > > accessed by the ethtool.
> > > > > vdpa management tool here is composing the vdpa device.
> > > > >
> > > > > For example creator (hypervisor) of the vdpa devices knows that a
> > > > > guest VM is given 4 vcpus, So hypervisor creates a vdpa devices
> > > > > with config space layout as, max_virtqueue_pairs = 4.
> > > > > And the MAC address chosen by hypervisor in mac[6].
> > > > >
> > > > > Guest VM ethtool can still chose to use less number of channels.
> > > > >
> > > > > Typically,
> > > > > ethtool is for guest VM.
> > > > > vdpa device is in hypevisor.
> > > > >
> > > > > How can hypervisor compose a vdpa device without any tool?
> > > > > How can it tell ethtool, what is supported and what are the defaults?
> > > > >
> > > > > I must be misunderstanding your comment about ethtool.
> > > > > Can you please explain?
> > > >
> > > >
> > > > I am basically saying that we probably want to be able to change MAC
> > > > of a VDPA device on the host without desroying and recreating the
> > > > device as long as it's not in use.
> > > Ok. I understood your comment now.
> > > Yes, this was the objective which is why they are present as independent
> > config knob.
> > > Jason was suggesting to have them as creation only knobs, which requires
> > recreate.
> > >
> > > I don't have strong opinion for either method.
> > >
> > > Passing them at creation time is simpler for user.
> > > If user needs the ability to modify and reuse same device with different
> > config, extending such support in future like this patch should possible.
> > >
> > > So there are two questions to close.
> > > 1. Can we start with config params at vdpa device creation time?
> >
> > I'm not sure whether we need both but I'd like to see a full API and I 
> > think we
> > all agree host wants ability to tweak mac after device creation even if 
> > guest is
> > not allowed to change mac, right?
> >
> Yes.
> $ vdpa dev add name foo mgmtdev pci/:03:00.0 mac 00:11:22:33:44:55 maxvqs 
> 8 mtu 9000
>
> Above API if we do at creation time. It is likely simpler for user to pass 
> necessary params during creation time.
>
> > > 2. Is it ok to have these config params as individual fields at netlink 
> > > U->K
> > UAPI level?
> > > This is the method proposed in this patch series.
> > > (Similar to incrementally growing vxlan ip link command).
> > >
> > > Or
> > > They should be packed in a structure between U-> K and deal with
> > typecasting based on size and more?
> > > (Jason's input).
> >
> > I'm inclined to say vxlan is closer to a model to follow.
> Ok. thanks for the feedback. We are using the model close to vxlan.
> Lets resolve should we have it at creation time, post creation or both?
> (a) Creation time
> Pros:
> - simpler single api for user
> - eliminates needs of inventing stats reset in future series
> Cons:
> - inability to reuse the device with different config

This can be solved by destroying the instance and re-creating it with
a different params?

> - This may not be of great advantage, and it is probably fine to have 
> creation time params
>
> (b) post creation time:
> Pros:
> - able to reuse the device with different config for say different VM.
> - will require stats reset in future once stats are implemented

Any reason for doing this other than re-creating the device?

> Cons:
> - more commands for users to config a device, better to have the ability at 
> create time.

We probably need to support post creation but it should be device specific.

E.g we may support device resize for virtio-blk devices.

But it can be done on top I think.

Thanks

>
> >
> > --
> > MST
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread JeffleXu



On 8/17/21 10:08 PM, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 15:22, JeffleXu  wrote:
>>
>>
>>
>> On 8/17/21 8:39 PM, Vivek Goyal wrote:
>>> On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
 On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  wrote:
>
> This patchset adds support of per-file DAX for virtiofs, which is
> inspired by Ira Weiny's work on ext4[1] and xfs[2].

 Can you please explain the background of this change in detail?

 Why would an admin want to enable DAX for a particular virtiofs file
 and not for others?
>>>
>>> Initially I thought that they needed it because they are downloading
>>> files on the fly from server. So they don't want to enable dax on the file
>>> till file is completely downloaded.
>>
>> Right, it's our initial requirement.
>>
>>
>>> But later I realized that they should
>>> be able to block in FUSE_SETUPMAPPING call and make sure associated
>>> file section has been downloaded before returning and solve the problem.
>>> So that can't be the primary reason.
>>
>> Saying we want to access 4KB of one file inside guest, if it goes
>> through FUSE request routine, then the fuse daemon only need to download
>> this 4KB from remote server. But if it goes through DAX, then the fuse
>> daemon need to download the whole DAX window (e.g., 2MB) from remote
>> server, so called amplification. Maybe we could decrease the DAX window
>> size, but it's a trade off.
> 
> That could be achieved with a plain fuse filesystem on the host (which
> will get 4k READ requests for accesses to mapped area inside guest).
> Since this can be done selectively for files which are not yet
> downloaded, the extra layer wouldn't be a performance problem.

I'm not sure if I fully understand your idea. Then in this case, host
daemon only prepares 4KB while guest thinks that the whole DAX window
(e.g., 2MB) has been fully mapped. Then when guest really accesses the
remained part (2MB - 4KB), page fault is triggered, and now host daemon
is responsible for downloading the remained part?

> 
> Is there a reason why that wouldn't work?
> 
> Thanks,
> Miklos
> 

-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] i2c: virtio: Update i2c-adapter's of_node

2021-08-17 Thread Viresh Kumar
Set of-node of the adapter to the virtio device's of-node to enable
automatic parsing the of the I2C devices, if present in the DT.

Signed-off-by: Viresh Kumar 
---
 drivers/i2c/busses/i2c-virtio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
index d3e60d9cde10..2dde69cfb9aa 100644
--- a/drivers/i2c/busses/i2c-virtio.c
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -228,6 +228,7 @@ static int virtio_i2c_probe(struct virtio_device *vdev)
vi->adap.algo = _algorithm;
vi->adap.quirks = _i2c_quirks;
vi->adap.dev.parent = >dev;
+   vi->adap.dev.of_node = vdev->dev.of_node;
i2c_set_adapdata(>adap, vi);
 
/*
-- 
2.31.1.272.g89b43f80a514

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH linux-next v3 0/6] vdpa: enable user to set mac, mtu

2021-08-17 Thread Parav Pandit via Virtualization



> From: Michael S. Tsirkin 
> Sent: Tuesday, August 17, 2021 2:24 AM
> 
> On Mon, Aug 09, 2021 at 09:51:49AM +, Parav Pandit wrote:
> > > From: Michael S. Tsirkin 
> > > Sent: Monday, August 9, 2021 3:10 PM
> > >
> > > On Fri, Aug 06, 2021 at 08:55:56AM +, Parav Pandit wrote:
> > > >
> > > >
> > > > >
> > > > > The point is to try and not reinvent a dedicated vpda interface
> > > > > where a generic one exits.
> > > > > E.g. for phy things such as mac speed etc, I think most people
> > > > > are using ethtool things right?
> > > >
> > > > As you know vdpa is the backend device for the front-end netdevice
> > > accessed by the ethtool.
> > > > vdpa management tool here is composing the vdpa device.
> > > >
> > > > For example creator (hypervisor) of the vdpa devices knows that a
> > > > guest VM is given 4 vcpus, So hypervisor creates a vdpa devices
> > > > with config space layout as, max_virtqueue_pairs = 4.
> > > > And the MAC address chosen by hypervisor in mac[6].
> > > >
> > > > Guest VM ethtool can still chose to use less number of channels.
> > > >
> > > > Typically,
> > > > ethtool is for guest VM.
> > > > vdpa device is in hypevisor.
> > > >
> > > > How can hypervisor compose a vdpa device without any tool?
> > > > How can it tell ethtool, what is supported and what are the defaults?
> > > >
> > > > I must be misunderstanding your comment about ethtool.
> > > > Can you please explain?
> > >
> > >
> > > I am basically saying that we probably want to be able to change MAC
> > > of a VDPA device on the host without desroying and recreating the
> > > device as long as it's not in use.
> > Ok. I understood your comment now.
> > Yes, this was the objective which is why they are present as independent
> config knob.
> > Jason was suggesting to have them as creation only knobs, which requires
> recreate.
> >
> > I don't have strong opinion for either method.
> >
> > Passing them at creation time is simpler for user.
> > If user needs the ability to modify and reuse same device with different
> config, extending such support in future like this patch should possible.
> >
> > So there are two questions to close.
> > 1. Can we start with config params at vdpa device creation time?
> 
> I'm not sure whether we need both but I'd like to see a full API and I think 
> we
> all agree host wants ability to tweak mac after device creation even if guest 
> is
> not allowed to change mac, right?
>
Yes.
$ vdpa dev add name foo mgmtdev pci/:03:00.0 mac 00:11:22:33:44:55 maxvqs 8 
mtu 9000

Above API if we do at creation time. It is likely simpler for user to pass 
necessary params during creation time.
 
> > 2. Is it ok to have these config params as individual fields at netlink U->K
> UAPI level?
> > This is the method proposed in this patch series.
> > (Similar to incrementally growing vxlan ip link command).
> >
> > Or
> > They should be packed in a structure between U-> K and deal with
> typecasting based on size and more?
> > (Jason's input).
> 
> I'm inclined to say vxlan is closer to a model to follow.
Ok. thanks for the feedback. We are using the model close to vxlan.
Lets resolve should we have it at creation time, post creation or both?
(a) Creation time 
Pros: 
- simpler single api for user
- eliminates needs of inventing stats reset in future series
Cons:
- inability to reuse the device with different config
- This may not be of great advantage, and it is probably fine to have creation 
time params

(b) post creation time:
Pros:
- able to reuse the device with different config for say different VM.
- will require stats reset in future once stats are implemented
Cons:
- more commands for users to config a device, better to have the ability at 
create time.

> 
> --
> MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Patch v1 3/3] vdpa/mlx5: Add multiqueue support

2021-08-17 Thread Jason Wang
On Tue, Aug 17, 2021 at 4:57 PM Jason Wang  wrote:
>
> On Tue, Aug 17, 2021 at 2:01 PM Eli Cohen  wrote:
> >
> > On Tue, Aug 17, 2021 at 01:48:17PM +0800, Jason Wang wrote:
> > > On Tue, Aug 17, 2021 at 1:26 PM Eli Cohen  wrote:
> > > >
> > > > On Tue, Aug 17, 2021 at 12:19:55PM +0800, Jason Wang wrote:
> > > > >
> > > > > 在 2021/8/17 下午12:03, Parav Pandit 写道:
> > > > > > > From: Jason Wang 
> > > > > > > Sent: Tuesday, August 17, 2021 9:26 AM
> > > > > > >
> > > > > > > On Tue, Aug 17, 2021 at 3:37 AM Michael S. Tsirkin 
> > > > > > > 
> > > > > > > wrote:
> > > > > > > > On Mon, Aug 16, 2021 at 07:56:59PM +0300, Eli Cohen wrote:
> > > > > > > > > On Mon, Aug 16, 2021 at 01:58:06PM +0800, Jason Wang wrote:
> > > > > > > > > > 在 2021/8/16 下午1:47, Eli Cohen 写道:
> > > > > > > > > > > On Mon, Aug 16, 2021 at 12:16:14PM +0800, Jason Wang 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > 在 2021/8/12 下午5:50, Eli Cohen 写道:
> > > > > > > > > > > > > On Thu, Aug 12, 2021 at 03:04:35PM +0800, Jason Wang 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > 在 2021/8/12 下午3:01, Eli Cohen 写道:
> > > > > > > > > > > > > > > On Thu, Aug 12, 2021 at 02:47:06PM +0800, Jason 
> > > > > > > > > > > > > > > Wang wrote:
> > > > > > > > > > > > > > > > On Thu, Aug 12, 2021 at 12:55 PM Eli Cohen
> > > > > > >  wrote:
> > > > > > > > > > > > > > > > > On Thu, Aug 12, 2021 at 11:19:19AM +0800, 
> > > > > > > > > > > > > > > > > Jason Wang
> > > > > > > wrote:
> > > > > > > > > > > > > > > > > > 在 2021/8/11 下午7:04, Eli Cohen 写道:
> > > > > > > > > > > > > > > > > > > On Wed, Aug 11, 2021 at 04:37:44PM +0800, 
> > > > > > > > > > > > > > > > > > > Jason
> > > > > > > Wang wrote:
> > > > > > > > > > > > > > > > > > > > 在 2021/8/11 下午3:53, Eli Cohen 写道:
> > > > > > > > > > > > > > > > > > > > > > One thing need to solve for mq is 
> > > > > > > > > > > > > > > > > > > > > > that the:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > +static u16 ctrl_vq_idx(struct
> > > > > > > > > > > > > > > > > > > > > > > +mlx5_vdpa_dev *mvdev) {
> > > > > > > > > > > > > > > > > > > > > > > + return 2 *
> > > > > > > > > > > > > > > > > > > > > > > +mlx5_vdpa_max_qps(mvdev->max_vqs);
> > > > > > > > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > > > > > > We should handle the case when MQ is
> > > > > > > > > > > > > > > > > > > > > > supported by the device but not the 
> > > > > > > > > > > > > > > > > > > > > > driver.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > E.g in the case when we have 2 
> > > > > > > > > > > > > > > > > > > > > > queue pairs:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > When MQ is enabled, cvq is queue 4
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > When MQ is not enabled, cvq is 
> > > > > > > > > > > > > > > > > > > > > > queue 2
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > There's some issue with this. I get
> > > > > > > > > > > > > > > > > > > > > callbacks targeting specific 
> > > > > > > > > > > > > > > > > > > > > virtqueues before
> > > > > > > features negotiation has been completed.
> > > > > > > > > > > > > > > > > > > > > Specifically, I get set_vq_cb() 
> > > > > > > > > > > > > > > > > > > > > calls. At
> > > > > > > > > > > > > > > > > > > > > this point I must know the control vq 
> > > > > > > > > > > > > > > > > > > > > index.
> > > > > > > > > > > > > > > > > > > > So I think we need do both:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > 1) At one hand, it's a bug for the 
> > > > > > > > > > > > > > > > > > > > userspace
> > > > > > > > > > > > > > > > > > > > to use vq_index before feature is 
> > > > > > > > > > > > > > > > > > > > negotiated
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > (looks like a bug in my cvq series that 
> > > > > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > > call SET_VRING_CALL before feature is 
> > > > > > > > > > > > > > > > > > > > negotiate,
> > > > > > > which I will look).
> > > > > > > > > > > > > > > > > > > > 2) At the other hand, the driver should 
> > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > able to deal with that
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > All I can do is drop callbacks for VQs 
> > > > > > > > > > > > > > > > > > > before
> > > > > > > > > > > > > > > > > > > features negotation has been completed.
> > > > > > > > > > > > > > > > > > Or just leave queue index 0, 1 work.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Since it is not expected to be changed.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Right, will do.
> > > > > > > > > > > > > > > 

Re: [PATCH v15] i2c: virtio: add a virtio i2c frontend driver

2021-08-17 Thread Jie Deng



On 2021/8/18 4:22, Wolfram Sang wrote:


  Michael S. Tsirkin 
Okay, with rc6 being released, I won't wait for an immutable branch
anymore. I applied this now and we will see if there will be a merge
conflict. If so, it will be trivial to handle, I'd think. So:

Applied to for-next, thanks!



Thanks Wolfram!


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [virtiofsd PATCH v4 4/4] virtiofsd: support per-file DAX in FUSE_LOOKUP

2021-08-17 Thread Dr. David Alan Gilbert
* Jeffle Xu (jeffl...@linux.alibaba.com) wrote:
> For passthrough, when the corresponding virtiofs in guest is mounted
> with '-o dax=inode', advertise that the file is capable of per-file
> DAX if the inode in the backend fs is marked with FS_DAX_FL flag.
> 
> Signed-off-by: Jeffle Xu 
> ---
>  tools/virtiofsd/passthrough_ll.c | 43 
>  1 file changed, 43 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c 
> b/tools/virtiofsd/passthrough_ll.c
> index 5b6228210f..4cbd904248 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -171,6 +171,7 @@ struct lo_data {
>  int allow_direct_io;
>  int announce_submounts;
>  int perfile_dax_cap; /* capability of backend fs */
> +bool perfile_dax; /* enable per-file DAX or not */
>  bool use_statx;
>  struct lo_inode root;
>  GHashTable *inodes; /* protected by lo->mutex */
> @@ -716,6 +717,10 @@ static void lo_init(void *userdata, struct 
> fuse_conn_info *conn)
>  
>  if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
>  conn->want |= FUSE_CAP_PERFILE_DAX;
> + lo->perfile_dax = 1;
> +}
> +else {
> + lo->perfile_dax = 0;
>  }
>  }
>  
> @@ -983,6 +988,41 @@ static int do_statx(struct lo_data *lo, int dirfd, const 
> char *pathname,
>  return 0;
>  }
>  
> +/*
> + * If the file is marked with FS_DAX_FL or FS_XFLAG_DAX, then DAX should be
> + * enabled for this file.
> + */
> +static bool lo_should_enable_dax(struct lo_data *lo, struct lo_inode *dir,
> +  const char *name)
> +{
> +int res, fd;
> +int ret = false;;
> +unsigned int attr;
> +struct fsxattr xattr;
> +
> +if (!lo->perfile_dax)
> + return false;
> +
> +/* Open file without O_PATH, so that ioctl can be called. */
> +fd = openat(dir->fd, name, O_NOFOLLOW);
> +if (fd == -1)
> +return false;

Doesn't that defeat the whole benefit of using O_PATH - i.e. that we
might stumble into a /dev node or something else we're not allowed to
open?

> +if (lo->perfile_dax_cap == DAX_CAP_FLAGS) {
> +res = ioctl(fd, FS_IOC_GETFLAGS, );
> +if (!res && (attr & FS_DAX_FL))
> + ret = true;
> +}
> +else if (lo->perfile_dax_cap == DAX_CAP_XATTR) {
> + res = ioctl(fd, FS_IOC_FSGETXATTR, );
> + if (!res && (xattr.fsx_xflags & FS_XFLAG_DAX))
> + ret = true;
> +}

This all looks pretty expensive for each lookup.

Dave


> +close(fd);
> +return ret;
> +}
> +
>  /*
>   * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>   * called eventually to decrement nlookup again. If inodep is non-NULL, the
> @@ -1038,6 +1078,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t 
> parent, const char *name,
>  e->attr_flags |= FUSE_ATTR_SUBMOUNT;
>  }
>  
> +if (lo_should_enable_dax(lo, dir, name))
> + e->attr_flags |= FUSE_ATTR_DAX;
> +
>  inode = lo_find(lo, >attr, mnt_id);
>  if (inode) {
>  close(newfd);
> -- 
> 2.27.0
> 
> ___
> Virtio-fs mailing list
> virtio...@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v7 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-17 Thread kernel test robot
Hi Xianting,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tty/tty-testing]
[also build test WARNING on char-misc/char-misc-testing soc/for-next v5.14-rc6 
next-20210817]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Xianting-Tian/make-hvc-pass-dma-capable-memory-to-its-backend/20210817-212556
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git 
tty-testing
config: i386-randconfig-r021-20210816 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 
2c6448cdc2f68f8c28fd0bd9404182b81306e6e6)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/f12c3bee9f2413ed7643d858b40ce2337329fdae
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Xianting-Tian/make-hvc-pass-dma-capable-memory-to-its-backend/20210817-212556
git checkout f12c3bee9f2413ed7643d858b40ce2337329fdae
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   clang-14: warning: optimization flag '-falign-jumps=0' is not supported 
[-Wignored-optimization-argument]
   In file included from drivers/tty/hvc/hvc_console.c:15:
   In file included from include/linux/kbd_kern.h:5:
   In file included from include/linux/tty.h:5:
   In file included from include/linux/fs.h:6:
   In file included from include/linux/wait_bit.h:8:
   In file included from include/linux/wait.h:9:
   In file included from include/linux/spinlock.h:51:
   In file included from include/linux/preempt.h:78:
   In file included from arch/x86/include/asm/preempt.h:7:
   In file included from include/linux/thread_info.h:60:
   arch/x86/include/asm/thread_info.h:172:13: warning: calling 
'__builtin_frame_address' with a nonzero argument is unsafe [-Wframe-address]
   oldframe = __builtin_frame_address(1);
  ^~
   arch/x86/include/asm/thread_info.h:174:11: warning: calling 
'__builtin_frame_address' with a nonzero argument is unsafe [-Wframe-address]
   frame = __builtin_frame_address(2);
   ^~
>> drivers/tty/hvc/hvc_console.c:160:18: warning: address of array 'hp->c' will 
>> always evaluate to 'true' [-Wpointer-bool-conversion]
   if (!hp || !hp->c)
  ~^
   3 warnings generated.


vim +160 drivers/tty/hvc/hvc_console.c

   136  
   137  /*
   138   * Console APIs, NOT TTY.  These APIs are available immediately when
   139   * hvc_console_setup() finds adapters.
   140   */
   141  
   142  static void hvc_console_print(struct console *co, const char *b,
   143unsigned count)
   144  {
   145  char *c;
   146  unsigned i = 0, n = 0;
   147  int r, donecr = 0, index = co->index;
   148  unsigned long flags;
   149  struct hvc_struct *hp;
   150  
   151  /* Console access attempt outside of acceptable console range. 
*/
   152  if (index >= MAX_NR_HVC_CONSOLES)
   153  return;
   154  
   155  /* This console adapter was removed so it is not usable. */
   156  if (vtermnos[index] == -1)
   157  return;
   158  
   159  hp = cons_hvcs[index];
 > 160  if (!hp || !hp->c)
   161  return;
   162  
   163  c = hp->c;
   164  
   165  spin_lock_irqsave(>c_lock, flags);
   166  while (count > 0 || i > 0) {
   167  if (count > 0 && i < sizeof(c)) {
   168  if (b[n] == '\n' && !donecr) {
   169  c[i++] = '\r';
   170  donecr = 1;
   171  } else {
   172  c[i++] = b[n++];
   173  donecr = 0;
   174  --count;
   175  }
   176  } else {
   177  r = cons_ops[index]->put_chars(vtermnos[index], 
c, i);
   178  if (r <= 0) {
   179  /* throw away characters on error
   180   * but spin in case of -EAGAIN */
   181  if (r != -EAGAIN) {
   182   

Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT

2021-08-17 Thread Dr. David Alan Gilbert
* Jeffle Xu (jeffl...@linux.alibaba.com) wrote:
> In FUSE_INIT negotiating phase, server/client should advertise if it
> supports per-file DAX.
> 
> Once advertising support for per-file DAX feature, virtiofsd should
> support storing FS_DAX_FL flag persistently passed by
> FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
> FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
> 
> Currently only ext4/xfs since linux kernel v5.8 support storing
> FS_DAX_FL flag persistently, and thus advertise support for per-file
> DAX feature only when the backend fs type is ext4 and xfs.

I'm a little worried about the meaning of the flags we're storing and
the fact we're storing them in the normal host DAX flags.

Doesn't this mean that we're using a single host flag to mean:
  a) It can be mapped as DAX on the host if it was a real DAX device
  b) We can map it as DAX inside the guest with virtiofs?

what happens when we're using usernamespaces for the guest?

Dave


> Signed-off-by: Jeffle Xu 
> ---
>  tools/virtiofsd/fuse_common.h|  5 +
>  tools/virtiofsd/fuse_lowlevel.c  |  6 ++
>  tools/virtiofsd/passthrough_ll.c | 29 +
>  3 files changed, 40 insertions(+)
> 
> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> index 8a75729be9..ee6fc64c23 100644
> --- a/tools/virtiofsd/fuse_common.h
> +++ b/tools/virtiofsd/fuse_common.h
> @@ -372,6 +372,11 @@ struct fuse_file_info {
>   */
>  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
>  
> +/**
> + * Indicates support for per-file DAX.
> + */
> +#define FUSE_CAP_PERFILE_DAX (1 << 29)
> +
>  /**
>   * Ioctl flags
>   *
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 50fc5c8d5a..04a4f17423 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>  if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
>  se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
>  }
> +if (arg->flags & FUSE_PERFILE_DAX) {
> +se->conn.capable |= FUSE_CAP_PERFILE_DAX;
> +}
>  #ifdef HAVE_SPLICE
>  #ifdef HAVE_VMSPLICE
>  se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
> @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>  if (se->conn.want & FUSE_CAP_POSIX_ACL) {
>  outarg.flags |= FUSE_POSIX_ACL;
>  }
> +if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
> +outarg.flags |= FUSE_PERFILE_DAX;
> +}
>  outarg.max_readahead = se->conn.max_readahead;
>  outarg.max_write = se->conn.max_write;
>  if (se->conn.max_background >= (1 << 16)) {
> diff --git a/tools/virtiofsd/passthrough_ll.c 
> b/tools/virtiofsd/passthrough_ll.c
> index e170b17adb..5b6228210f 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -53,8 +53,10 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> +#include 
>  
>  #include "qemu/cutils.h"
>  #include "passthrough_helpers.h"
> @@ -136,6 +138,13 @@ enum {
>  SANDBOX_CHROOT,
>  };
>  
> +/* capability of storing DAX flag persistently */
> +enum {
> +DAX_CAP_NONE,  /* not supported */
> +DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
> +DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) 
> */
> +};
> +
>  typedef struct xattr_map_entry {
>  char *key;
>  char *prepend;
> @@ -161,6 +170,7 @@ struct lo_data {
>  int readdirplus_clear;
>  int allow_direct_io;
>  int announce_submounts;
> +int perfile_dax_cap; /* capability of backend fs */
>  bool use_statx;
>  struct lo_inode root;
>  GHashTable *inodes; /* protected by lo->mutex */
> @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct 
> fuse_conn_info *conn)
>  conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
>  lo->killpriv_v2 = 0;
>  }
> +
> +if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
> +conn->want |= FUSE_CAP_PERFILE_DAX;
> +}
>  }
>  
>  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct 
> lo_inode *root)
>  int fd, res;
>  struct stat stat;
>  uint64_t mnt_id;
> +struct statfs statfs;
>  
>  fd = open("/", O_PATH);
>  if (fd == -1) {
> @@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct 
> lo_inode *root)
>  root->posix_locks = g_hash_table_new_full(
>  g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
>  }
> +
> +/*
> + * Currently only ext4/xfs since linux kernel v5.8 support storing
> + * FS_DAX_FL flag persistently. Ext4 accesses this flag through
> + * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
> + * FS_IOC_FSG[S]ETXATTR ioctl.
> + 

Re: [PATCH v7 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-17 Thread Greg KH
On Tue, Aug 17, 2021 at 09:22:59PM +0800, Xianting Tian wrote:
> We tested the patch, it worked normally.

Who is "we"?

> Signed-off-by: Xianting Tian 

Like I said before, I need another developer from your company to review
and sign-off on this patch (and the other one), before I am willing to
look at it, based on the previous mistakes that have happened here.

thanks,

greg k-h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread Vivek Goyal
On Tue, Aug 17, 2021 at 04:11:14PM +0200, Miklos Szeredi wrote:

[..]
> > As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
> > may compete for limited DAX window resource.
> >
> > Besides, supporting DAX for small files can be expensive. Small files
> > can consume DAX window resource rapidly, and if small files are accessed
> > only once, the cost of mmap/munmap on host can not be ignored.
> 
> That's a good point.   Maybe we should disable DAX for file sizes much
> smaller than the chunk size?

This indeed seems like a valid concern. 2MB chunk size will consume
512 struct page entries. If an entry is 64 bytes in size, then that's
32K RAM used to access 4K bytes of file. Does not sound like good usage
of resources.

If we end up selectively disabling dax based on file size, two things
come to me mind.

- Will be good if it is users can opt-in for this behavior. There
  might be a class of users who always want to enable dax on all
  files.

- Secondly, we will have to figure out how to do it safely in the
  event of shared filesystem where file size can change suddenly.
  Will need to make sure change from dax to no-dax and vice-versa
  is safe w.r.t page cache and other paths.

Thanks
Vivek

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread Vivek Goyal
On Tue, Aug 17, 2021 at 09:22:53PM +0800, JeffleXu wrote:
> 
> 
> On 8/17/21 8:39 PM, Vivek Goyal wrote:
> > On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
> >> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  wrote:
> >>>
> >>> This patchset adds support of per-file DAX for virtiofs, which is
> >>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >>
> >> Can you please explain the background of this change in detail?
> >>
> >> Why would an admin want to enable DAX for a particular virtiofs file
> >> and not for others?
> > 
> > Initially I thought that they needed it because they are downloading
> > files on the fly from server. So they don't want to enable dax on the file
> > till file is completely downloaded. 
> 
> Right, it's our initial requirement.
> 
> 
> > But later I realized that they should
> > be able to block in FUSE_SETUPMAPPING call and make sure associated
> > file section has been downloaded before returning and solve the problem.
> > So that can't be the primary reason.
> 
> Saying we want to access 4KB of one file inside guest, if it goes
> through FUSE request routine, then the fuse daemon only need to download
> this 4KB from remote server. But if it goes through DAX, then the fuse
> daemon need to download the whole DAX window (e.g., 2MB) from remote
> server, so called amplification. Maybe we could decrease the DAX window
> size, but it's a trade off.

Downloading 2MB chunk should not be a big issue (IMHO). And if this
turns out to be real concern, we could experiment with a smaller
mapping granularity.

> 
> > 
> > Other reason mentioned I think was that only certain files benefit
> > from DAX. But not much details are there after that. It will be nice
> > to hear a more concrete use case and more details about this usage.
> > 
> 
> Apart from our internal requirement, more fine grained control for DAX
> shall be general and more flexible. Glad to hear more discussion from
> community.

Sure it will be more general and flexible. But there needs to be 1-2
good concrete use cases to justify additional complexity. And I don't
think that so far a good use case has come forward.

Thanks
Vivek

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread Vivek Goyal
On Tue, Aug 17, 2021 at 09:08:35PM +0800, JeffleXu wrote:
> 
> 
> On 8/17/21 6:09 PM, Miklos Szeredi wrote:
> > On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
> >  wrote:
> >>
> >> * Miklos Szeredi (mik...@szeredi.hu) wrote:
> >>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  
> >>> wrote:
> 
>  This patchset adds support of per-file DAX for virtiofs, which is
>  inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >>>
> >>> Can you please explain the background of this change in detail?
> >>>
> >>> Why would an admin want to enable DAX for a particular virtiofs file
> >>> and not for others?
> >>
> >> Where we're contending on virtiofs dax cache size it makes a lot of
> >> sense; it's quite expensive for us to map something into the cache
> >> (especially if we push something else out), so selectively DAXing files
> >> that are expected to be hot could help reduce cache churn.
> > 
> > If this is a performance issue, it should be fixed in a way that
> > doesn't require hand tuning like you suggest, I think.
> > 
> > I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
> > can help understand the virtiofs case as well.
> > 
> 
> Some hints why ext4/xfs support per-file DAX can be found [1] and [2].
> 
> "Boaz Harrosh wondered why someone might want to turn DAX off for a
> persistent memory device. Hellwig said that the performance "could
> suck"; Williams noted that the page cache could be useful for some
> applications as well. Jan Kara pointed out that reads from persistent
> memory are close to DRAM speed, but that writes are not; the page cache
> could be helpful for frequent writes. Applications need to change to
> fully take advantage of DAX, Williams said; part of the promise of
> adding a flag is that users can do DAX on smaller granularities than a
> full filesystem."
> 
> In summary, page cache is preferable in some cases, and thus more fine
> grained way of DAX control is needed.

In case of virtiofs, we are using page cache on host. So this probably
is not a factor for us. Writes will go in page cache of host.

> 
> 
> As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
> may compete for limited DAX window resource.
> 
> Besides, supporting DAX for small files can be expensive. Small files
> can consume DAX window resource rapidly, and if small files are accessed
> only once, the cost of mmap/munmap on host can not be ignored.

W.r.r access pattern, same applies to large files also. So if a section
of large file is accessed only once, it will consume dax window as well
and will have to be reclaimed.

Dax in virtiofs provides speed gain only if map file once and access
it multiple times. If that pattern does not hold true, then dax does
not seem to provide speed gains and in fact might be slower than
non-dax.

So if there is a pattern where we know some files are accessed repeatedly
while others are not, then enabling/disabling dax selectively will make
sense. Question is how many workloads really know that and how will
you make that decision. Do you have any data to back that up.

W.r.t small file, is that a real concern. If that file is being accessed
mutliple times, then we will still see the speed gain. Only down side
is that there is little wastage of resources because our minimum dax
mapping granularity is 2MB. I am wondering can we handle that by
supporting other dax mapping granularities as well. say 256K and let
users choose it.

Thanks
Vivek
> 
> 
> [1]
> https://lore.kernel.org/lkml/20200428002142.404144-1-ira.we...@intel.com/
> [2] https://lwn.net/Articles/787973/
> 
> -- 
> Thanks,
> Jeffle
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [PATCH v4 6/8] fuse: mark inode DONT_CACHE when per-file DAX indication changes

2021-08-17 Thread JeffleXu



On 8/17/21 6:26 PM, Dr. David Alan Gilbert wrote:
> * Jeffle Xu (jeffl...@linux.alibaba.com) wrote:
>> When the per-file DAX indication changes while the file is still
>> *opened*, it is quite complicated and maybe fragile to dynamically
>> change the DAX state.
>>
>> Hence mark the inode and corresponding dentries as DONE_CACHE once the
> 
>  ^^
> typo as DONT ?
> 

Thanks. I will fix it.

> 
>> per-file DAX indication changes, so that the inode instance will be
>> evicted and freed as soon as possible once the file is closed and the
>> last reference to the inode is put. And then when the file gets reopened
>> next time, the inode will reflect the new DAX state.
>>
>> In summary, when the per-file DAX indication changes for an *opened*
>> file, the state of the file won't be updated until this file is closed
>> and reopened later.
>>
>> Signed-off-by: Jeffle Xu 
>> ---
>>  fs/fuse/dax.c| 9 +
>>  fs/fuse/fuse_i.h | 1 +
>>  fs/fuse/inode.c  | 3 +++
>>  3 files changed, 13 insertions(+)
>>
>> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
>> index 30833f8d37dd..f7ede0be4e00 100644
>> --- a/fs/fuse/dax.c
>> +++ b/fs/fuse/dax.c
>> @@ -1364,6 +1364,15 @@ void fuse_dax_inode_init(struct inode *inode, 
>> unsigned int flags)
>>  inode->i_data.a_ops = _dax_file_aops;
>>  }
>>  
>> +void fuse_dax_dontcache(struct inode *inode, bool newdax)
>> +{
>> +struct fuse_conn *fc = get_fuse_conn(inode);
>> +
>> +if (fc->dax_mode == FUSE_DAX_INODE &&
>> +fc->perfile_dax && (!!IS_DAX(inode) != newdax))
>> +d_mark_dontcache(inode);
>> +}
>> +
>>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int 
>> map_alignment)
>>  {
>>  if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>> index 7b7b4c208af2..56fe1c4d2136 100644
>> --- a/fs/fuse/fuse_i.h
>> +++ b/fs/fuse/fuse_i.h
>> @@ -1260,6 +1260,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>>  void fuse_dax_inode_cleanup(struct inode *inode);
>> +void fuse_dax_dontcache(struct inode *inode, bool newdax);
>>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int 
>> map_alignment);
>>  void fuse_dax_cancel_work(struct fuse_conn *fc);
>>  
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index 8080f78befed..8c9774c6a210 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -269,6 +269,9 @@ void fuse_change_attributes(struct inode *inode, struct 
>> fuse_attr *attr,
>>  if (inval)
>>  invalidate_inode_pages2(inode->i_mapping);
>>  }
>> +
>> +if (IS_ENABLED(CONFIG_FUSE_DAX))
>> +fuse_dax_dontcache(inode, attr->flags & FUSE_ATTR_DAX);
>>  }
>>  
>>  static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
>> -- 
>> 2.27.0
>>
>> ___
>> Virtio-fs mailing list
>> virtio...@redhat.com
>> https://listman.redhat.com/mailman/listinfo/virtio-fs
>>

-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread JeffleXu



On 8/17/21 8:39 PM, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  wrote:
>>>
>>> This patchset adds support of per-file DAX for virtiofs, which is
>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>
>> Can you please explain the background of this change in detail?
>>
>> Why would an admin want to enable DAX for a particular virtiofs file
>> and not for others?
> 
> Initially I thought that they needed it because they are downloading
> files on the fly from server. So they don't want to enable dax on the file
> till file is completely downloaded. 

Right, it's our initial requirement.


> But later I realized that they should
> be able to block in FUSE_SETUPMAPPING call and make sure associated
> file section has been downloaded before returning and solve the problem.
> So that can't be the primary reason.

Saying we want to access 4KB of one file inside guest, if it goes
through FUSE request routine, then the fuse daemon only need to download
this 4KB from remote server. But if it goes through DAX, then the fuse
daemon need to download the whole DAX window (e.g., 2MB) from remote
server, so called amplification. Maybe we could decrease the DAX window
size, but it's a trade off.

> 
> Other reason mentioned I think was that only certain files benefit
> from DAX. But not much details are there after that. It will be nice
> to hear a more concrete use case and more details about this usage.
> 

Apart from our internal requirement, more fine grained control for DAX
shall be general and more flexible. Glad to hear more discussion from
community.


-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread JeffleXu



On 8/17/21 6:09 PM, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
>  wrote:
>>
>> * Miklos Szeredi (mik...@szeredi.hu) wrote:
>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  wrote:

 This patchset adds support of per-file DAX for virtiofs, which is
 inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>
>>> Can you please explain the background of this change in detail?
>>>
>>> Why would an admin want to enable DAX for a particular virtiofs file
>>> and not for others?
>>
>> Where we're contending on virtiofs dax cache size it makes a lot of
>> sense; it's quite expensive for us to map something into the cache
>> (especially if we push something else out), so selectively DAXing files
>> that are expected to be hot could help reduce cache churn.
> 
> If this is a performance issue, it should be fixed in a way that
> doesn't require hand tuning like you suggest, I think.
> 
> I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
> can help understand the virtiofs case as well.
> 

Some hints why ext4/xfs support per-file DAX can be found [1] and [2].

"Boaz Harrosh wondered why someone might want to turn DAX off for a
persistent memory device. Hellwig said that the performance "could
suck"; Williams noted that the page cache could be useful for some
applications as well. Jan Kara pointed out that reads from persistent
memory are close to DRAM speed, but that writes are not; the page cache
could be helpful for frequent writes. Applications need to change to
fully take advantage of DAX, Williams said; part of the promise of
adding a flag is that users can do DAX on smaller granularities than a
full filesystem."

In summary, page cache is preferable in some cases, and thus more fine
grained way of DAX control is needed.


As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
may compete for limited DAX window resource.

Besides, supporting DAX for small files can be expensive. Small files
can consume DAX window resource rapidly, and if small files are accessed
only once, the cost of mmap/munmap on host can not be ignored.


[1]
https://lore.kernel.org/lkml/20200428002142.404144-1-ira.we...@intel.com/
[2] https://lwn.net/Articles/787973/

-- 
Thanks,
Jeffle
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread Vivek Goyal
On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  wrote:
> >
> > This patchset adds support of per-file DAX for virtiofs, which is
> > inspired by Ira Weiny's work on ext4[1] and xfs[2].
> 
> Can you please explain the background of this change in detail?
> 
> Why would an admin want to enable DAX for a particular virtiofs file
> and not for others?

Initially I thought that they needed it because they are downloading
files on the fly from server. So they don't want to enable dax on the file
till file is completely downloaded. But later I realized that they should
be able to block in FUSE_SETUPMAPPING call and make sure associated
file section has been downloaded before returning and solve the problem.
So that can't be the primary reason.

Other reason mentioned I think was that only certain files benefit
from DAX. But not much details are there after that. It will be nice
to hear a more concrete use case and more details about this usage.

Thanks
Vivek

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread Vivek Goyal
On Tue, Aug 17, 2021 at 10:32:14AM +0100, Dr. David Alan Gilbert wrote:
> * Miklos Szeredi (mik...@szeredi.hu) wrote:
> > On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  wrote:
> > >
> > > This patchset adds support of per-file DAX for virtiofs, which is
> > > inspired by Ira Weiny's work on ext4[1] and xfs[2].
> > 
> > Can you please explain the background of this change in detail?
> > 
> > Why would an admin want to enable DAX for a particular virtiofs file
> > and not for others?
> 
> Where we're contending on virtiofs dax cache size it makes a lot of
> sense; it's quite expensive for us to map something into the cache
> (especially if we push something else out), so selectively DAXing files
> that are expected to be hot could help reduce cache churn.

In that case probaly we should just make DAX window larger. I assume
that selecting which files to turn DAX on, will itself will not be
a trivial. Not sure what heuristics are being deployed to determine
that. Will like to know more about it.

Vivek

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [kvm-unit-tests PATCH 0/6] Initial x86_64 UEFI support

2021-08-17 Thread Joerg Roedel
Hi Marc,

On Fri, Aug 13, 2021 at 11:44:39AM -0700, Marc Orr wrote:
> To date, we have _most_ x86 test cases (39/44) working under UEFI and
> we've also got some of the test cases to boot under SEV-ES, using the
> UEFI #VC handler.

While the EFI APP approach simplifies the implementation a lot, I don't
think it is the best path to SEV and TDX testing for a couple of
reasons:

1) It leaves the details of #VC/#VE handling and the SEV-ES
   specific communication channels (GHCB) under control of the
   firmware. So we can't reliably test those interfaces from an
   EFI APP.

2) Same for the memory validation/acceptance interface needed
   for SEV-SNP and TDX. Using an EFI APP leaves those under
   firmware control and we are not able to reliably test them.

3) The IDT also stays under control of the firmware in an EFI
   APP, otherwise the firmware couldn't provide a #VC handler.
   This makes it unreliable to test anything IDT or IRQ related.

4) Relying on the firmware #VC hanlder limits the tests to its
   abilities. Implementing a separate #VC handler routine for
   kvm-unit-tests is more work, but it makes test development
   much more flexible.

So it comes down to the fact that and EFI APP leaves control over
SEV/TDX specific hypervisor interfaces in the firmware, making it hard
and unreliable to test these interfaces from kvm-unit-tests. The stub
approach on the other side gives the tests full control over the VM,
allowing to test all aspects of the guest-host interface.

Regards,

Joerg
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread Dr. David Alan Gilbert
* Miklos Szeredi (mik...@szeredi.hu) wrote:
> On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
>  wrote:
> >
> > * Miklos Szeredi (mik...@szeredi.hu) wrote:
> > > On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  
> > > wrote:
> > > >
> > > > This patchset adds support of per-file DAX for virtiofs, which is
> > > > inspired by Ira Weiny's work on ext4[1] and xfs[2].
> > >
> > > Can you please explain the background of this change in detail?
> > >
> > > Why would an admin want to enable DAX for a particular virtiofs file
> > > and not for others?
> >
> > Where we're contending on virtiofs dax cache size it makes a lot of
> > sense; it's quite expensive for us to map something into the cache
> > (especially if we push something else out), so selectively DAXing files
> > that are expected to be hot could help reduce cache churn.
> 
> If this is a performance issue, it should be fixed in a way that
> doesn't require hand tuning like you suggest, I think.

I'd agree that would be nice; however:
  a) It looks like other filesystems already gave something admin
selectable
  b) Trying to write clever heuristics is only going to work in some
cases; being able to say 'DAX this directory' might work better in
practice.

> I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
> can help understand the virtiofs case as well.

Yep, I don't understand the case with real nvdimm hardware.

Dave

> Thanks,
> Miklos
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [PATCH v4 6/8] fuse: mark inode DONT_CACHE when per-file DAX indication changes

2021-08-17 Thread Dr. David Alan Gilbert
* Jeffle Xu (jeffl...@linux.alibaba.com) wrote:
> When the per-file DAX indication changes while the file is still
> *opened*, it is quite complicated and maybe fragile to dynamically
> change the DAX state.
> 
> Hence mark the inode and corresponding dentries as DONE_CACHE once the

 ^^
typo as DONT ?

Dave

> per-file DAX indication changes, so that the inode instance will be
> evicted and freed as soon as possible once the file is closed and the
> last reference to the inode is put. And then when the file gets reopened
> next time, the inode will reflect the new DAX state.
> 
> In summary, when the per-file DAX indication changes for an *opened*
> file, the state of the file won't be updated until this file is closed
> and reopened later.
> 
> Signed-off-by: Jeffle Xu 
> ---
>  fs/fuse/dax.c| 9 +
>  fs/fuse/fuse_i.h | 1 +
>  fs/fuse/inode.c  | 3 +++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 30833f8d37dd..f7ede0be4e00 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1364,6 +1364,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned 
> int flags)
>   inode->i_data.a_ops = _dax_file_aops;
>  }
>  
> +void fuse_dax_dontcache(struct inode *inode, bool newdax)
> +{
> + struct fuse_conn *fc = get_fuse_conn(inode);
> +
> + if (fc->dax_mode == FUSE_DAX_INODE &&
> + fc->perfile_dax && (!!IS_DAX(inode) != newdax))
> + d_mark_dontcache(inode);
> +}
> +
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int 
> map_alignment)
>  {
>   if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 7b7b4c208af2..56fe1c4d2136 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1260,6 +1260,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
> +void fuse_dax_dontcache(struct inode *inode, bool newdax);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int 
> map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 8080f78befed..8c9774c6a210 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -269,6 +269,9 @@ void fuse_change_attributes(struct inode *inode, struct 
> fuse_attr *attr,
>   if (inval)
>   invalidate_inode_pages2(inode->i_mapping);
>   }
> +
> + if (IS_ENABLED(CONFIG_FUSE_DAX))
> + fuse_dax_dontcache(inode, attr->flags & FUSE_ATTR_DAX);
>  }
>  
>  static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
> -- 
> 2.27.0
> 
> ___
> Virtio-fs mailing list
> virtio...@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX

2021-08-17 Thread Dr. David Alan Gilbert
* Miklos Szeredi (mik...@szeredi.hu) wrote:
> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu  wrote:
> >
> > This patchset adds support of per-file DAX for virtiofs, which is
> > inspired by Ira Weiny's work on ext4[1] and xfs[2].
> 
> Can you please explain the background of this change in detail?
> 
> Why would an admin want to enable DAX for a particular virtiofs file
> and not for others?

Where we're contending on virtiofs dax cache size it makes a lot of
sense; it's quite expensive for us to map something into the cache
(especially if we push something else out), so selectively DAXing files
that are expected to be hot could help reduce cache churn.

Dave

> Thanks,
> Miklos
> 
> ___
> Virtio-fs mailing list
> virtio...@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Patch v1 3/3] vdpa/mlx5: Add multiqueue support

2021-08-17 Thread Jason Wang
On Tue, Aug 17, 2021 at 2:01 PM Eli Cohen  wrote:
>
> On Tue, Aug 17, 2021 at 01:48:17PM +0800, Jason Wang wrote:
> > On Tue, Aug 17, 2021 at 1:26 PM Eli Cohen  wrote:
> > >
> > > On Tue, Aug 17, 2021 at 12:19:55PM +0800, Jason Wang wrote:
> > > >
> > > > 在 2021/8/17 下午12:03, Parav Pandit 写道:
> > > > > > From: Jason Wang 
> > > > > > Sent: Tuesday, August 17, 2021 9:26 AM
> > > > > >
> > > > > > On Tue, Aug 17, 2021 at 3:37 AM Michael S. Tsirkin 
> > > > > > wrote:
> > > > > > > On Mon, Aug 16, 2021 at 07:56:59PM +0300, Eli Cohen wrote:
> > > > > > > > On Mon, Aug 16, 2021 at 01:58:06PM +0800, Jason Wang wrote:
> > > > > > > > > 在 2021/8/16 下午1:47, Eli Cohen 写道:
> > > > > > > > > > On Mon, Aug 16, 2021 at 12:16:14PM +0800, Jason Wang wrote:
> > > > > > > > > > > 在 2021/8/12 下午5:50, Eli Cohen 写道:
> > > > > > > > > > > > On Thu, Aug 12, 2021 at 03:04:35PM +0800, Jason Wang 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > 在 2021/8/12 下午3:01, Eli Cohen 写道:
> > > > > > > > > > > > > > On Thu, Aug 12, 2021 at 02:47:06PM +0800, Jason 
> > > > > > > > > > > > > > Wang wrote:
> > > > > > > > > > > > > > > On Thu, Aug 12, 2021 at 12:55 PM Eli Cohen
> > > > > >  wrote:
> > > > > > > > > > > > > > > > On Thu, Aug 12, 2021 at 11:19:19AM +0800, Jason 
> > > > > > > > > > > > > > > > Wang
> > > > > > wrote:
> > > > > > > > > > > > > > > > > 在 2021/8/11 下午7:04, Eli Cohen 写道:
> > > > > > > > > > > > > > > > > > On Wed, Aug 11, 2021 at 04:37:44PM +0800, 
> > > > > > > > > > > > > > > > > > Jason
> > > > > > Wang wrote:
> > > > > > > > > > > > > > > > > > > 在 2021/8/11 下午3:53, Eli Cohen 写道:
> > > > > > > > > > > > > > > > > > > > > One thing need to solve for mq is 
> > > > > > > > > > > > > > > > > > > > > that the:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > +static u16 ctrl_vq_idx(struct
> > > > > > > > > > > > > > > > > > > > > > +mlx5_vdpa_dev *mvdev) {
> > > > > > > > > > > > > > > > > > > > > > + return 2 *
> > > > > > > > > > > > > > > > > > > > > > +mlx5_vdpa_max_qps(mvdev->max_vqs);
> > > > > > > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > > > > > We should handle the case when MQ is
> > > > > > > > > > > > > > > > > > > > > supported by the device but not the 
> > > > > > > > > > > > > > > > > > > > > driver.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > E.g in the case when we have 2 queue 
> > > > > > > > > > > > > > > > > > > > > pairs:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > When MQ is enabled, cvq is queue 4
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > When MQ is not enabled, cvq is queue 2
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > There's some issue with this. I get
> > > > > > > > > > > > > > > > > > > > callbacks targeting specific virtqueues 
> > > > > > > > > > > > > > > > > > > > before
> > > > > > features negotiation has been completed.
> > > > > > > > > > > > > > > > > > > > Specifically, I get set_vq_cb() calls. 
> > > > > > > > > > > > > > > > > > > > At
> > > > > > > > > > > > > > > > > > > > this point I must know the control vq 
> > > > > > > > > > > > > > > > > > > > index.
> > > > > > > > > > > > > > > > > > > So I think we need do both:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 1) At one hand, it's a bug for the 
> > > > > > > > > > > > > > > > > > > userspace
> > > > > > > > > > > > > > > > > > > to use vq_index before feature is 
> > > > > > > > > > > > > > > > > > > negotiated
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > (looks like a bug in my cvq series that 
> > > > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > call SET_VRING_CALL before feature is 
> > > > > > > > > > > > > > > > > > > negotiate,
> > > > > > which I will look).
> > > > > > > > > > > > > > > > > > > 2) At the other hand, the driver should be
> > > > > > > > > > > > > > > > > > > able to deal with that
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > All I can do is drop callbacks for VQs 
> > > > > > > > > > > > > > > > > > before
> > > > > > > > > > > > > > > > > > features negotation has been completed.
> > > > > > > > > > > > > > > > > Or just leave queue index 0, 1 work.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Since it is not expected to be changed.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Right, will do.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I think the CVQ index must not depend 
> > > > > > > > > > > > > > > > > > > > on the
> > > > > > > > > > > > > > > > > > > > negotiated features but rather depend 
> > > > > > > > > > > > > > > > > > > > of the
> > > > > > > > > > > > > > > > > > > > value the device driver provides in the 
> > > > > 

Re: [PATCH net] virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO

2021-08-17 Thread Jason Wang
On Tue, Aug 17, 2021 at 2:46 PM Michael S. Tsirkin  wrote:
>
> Patch is good. Suggest some tweaks to the commit log.
>
> On Tue, Aug 17, 2021 at 10:03:38AM +0800, Jason Wang wrote:
> > Commit a02e8964eaf92 ("virtio-net: ethtool configurable LRO") tries to
> > advertise LRO on behalf of the guest offloading features and allow the
>
> tries to advertise -> advertises
>
> that part actually works.
>
> allow -> allows
>
> on behalf of features is really weird. maybe "maps"?
>
> > administrator to enable and disable those features via ethtool.
> >
> > This may lead several issues:
>
> may lead->lead to
>
> >
> > - For the device that doesn't support control guest offloads, the
> >   "LRO" can't be disabled so we will get a warn in the
>
> warn -> warning
>
> >   dev_disable_lro()
>
> .. when turning off LRO or when enabling forwarding bridging etc.
>
> > - For the device that have the control guest offloads, the guest
>
> have the -> supports
>
> >   offloads were disabled in the case of bridge etc
>
> etc -> forwarding etc
>
> > which may slow down
>
> were -> are
>
> may slow -> slows
>
> >   the traffic.
> >
> > Fixing this by using NETIF_F_GRO_HW instead. Though the spec does not
> > guaranteed to be re-segmented as original explicitly now, we can add
>
> guaranteed -> guarantee
>
> > that to the spec
>
> I would add:
>
> Further, we never advertised LRO historically before a02e8964eaf92
> ("virtio-net: ethtool configurable LRO") and so bridged/forwarded
> configs effectively relied on virtio receive offloads being GRO.
>
>
>
>
> > and then we can catch the bad configuration and
> > setup.
>
> Don't know what does this part mean. How would we catch it?
> With a new flag? Let's say so.
>
> >
> > Fixes: a02e8964eaf92 ("virtio-net: ethtool configurable LRO")
> > Signed-off-by: Jason Wang 
>
>
>
> Proposed rewritten commit log:

Agree.

Post a new version.

Thanks

>
> ===
> [PATCH net] virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO
>
> Commit a02e8964eaf92 ("virtio-net: ethtool configurable LRO")
> maps LRO to virtio guest offloading features and allows the
> administrator to enable and disable those features via ethtool.
>
> This leads to several issues:
>
>
> - For a device that doesn't support control guest offloads, the "LRO"
>   can't be disabled triggering WARN in dev_disable_lro() when turning
>   off LRO or when enabling forwarding bridging etc.
>
> - For a device that supports control guest offloads, the guest
>   offloads are disabled in cases of bridging, forwarding etc
>   slowing down the traffic.
>
> Fix this by using NETIF_F_GRO_HW instead. Though the spec does not
> guarantee packets to be re-segmented as the original ones,
> we can add that to the spec, possibly with a flag for devices to
> differentiate between GRO and LRO.
>
> Further, we never advertised LRO historically before a02e8964eaf92
> ("virtio-net: ethtool configurable LRO") and so bridged/forwarded
> configs effectively always relied on virtio receive offloads behaving
> like GRO - thus even if this breaks any configs it is at least not
> a regression.
>
> Fixes: a02e8964eaf92 ("virtio-net: ethtool configurable LRO")
> Acked-by: Michael S. Tsirkin 
> Reported-by: Ivan 
> Tested-by: Ivan 
> Signed-off-by: Jason Wang 
>
> ===
>
>
> > ---
> >  drivers/net/virtio_net.c | 14 +++---
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 0416a7e00914..10c382b08bce 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -63,7 +63,7 @@ static const unsigned long guest_offloads[] = {
> >   VIRTIO_NET_F_GUEST_CSUM
> >  };
> >
> > -#define GUEST_OFFLOAD_LRO_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
> > +#define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
> >   (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
> >   (1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
> >   (1ULL << VIRTIO_NET_F_GUEST_UFO))
> > @@ -2481,7 +2481,7 @@ static int virtnet_xdp_set(struct net_device *dev, 
> > struct bpf_prog *prog,
> >   virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> >   virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
> >   virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
> > - NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
> > implementing LRO/CSUM, disable LRO/CSUM first");
> > + NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
> > implementing GRO_HW/CSUM, disable GRO_HW/CSUM first");
> >   return -EOPNOTSUPP;
> >   }
> >
> > @@ -2612,15 +2612,15 @@ static int virtnet_set_features(struct net_device 
> > *dev,
> >   u64 offloads;
> >   int err;
> >
> > - if ((dev->features ^ features) & NETIF_F_LRO) {
> > + if ((dev->features ^ features) & NETIF_F_GRO_HW) {
> >   if (vi->xdp_enabled)
> >

[PATCH V2 net] virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO

2021-08-17 Thread Jason Wang
Commit a02e8964eaf92 ("virtio-net: ethtool configurable LRO")
maps LRO to virtio guest offloading features and allows the
administrator to enable and disable those features via ethtool.

This leads to several issues:

- For a device that doesn't support control guest offloads, the "LRO"
  can't be disabled triggering WARN in dev_disable_lro() when turning
  off LRO or when enabling forwarding bridging etc.

- For a device that supports control guest offloads, the guest
  offloads are disabled in cases of bridging, forwarding etc slowing
  down the traffic.

Fix this by using NETIF_F_GRO_HW instead. Though the spec does not
guarantee packets to be re-segmented as the original ones,
we can add that to the spec, possibly with a flag for devices to
differentiate between GRO and LRO.

Further, we never advertised LRO historically before a02e8964eaf92
("virtio-net: ethtool configurable LRO") and so bridged/forwarded
configs effectively always relied on virtio receive offloads behaving
like GRO - thus even if this breaks any configs it is at least not
a regression.

Fixes: a02e8964eaf92 ("virtio-net: ethtool configurable LRO")
Acked-by: Michael S. Tsirkin 
Reported-by: Ivan 
Tested-by: Ivan 
Signed-off-by: Jason Wang 
---
Changes from V1:
- Tweak the commit log
---
 drivers/net/virtio_net.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0416a7e00914..10c382b08bce 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -63,7 +63,7 @@ static const unsigned long guest_offloads[] = {
VIRTIO_NET_F_GUEST_CSUM
 };
 
-#define GUEST_OFFLOAD_LRO_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
+#define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
(1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
(1ULL << VIRTIO_NET_F_GUEST_UFO))
@@ -2481,7 +2481,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct 
bpf_prog *prog,
virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
-   NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
implementing LRO/CSUM, disable LRO/CSUM first");
+   NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
implementing GRO_HW/CSUM, disable GRO_HW/CSUM first");
return -EOPNOTSUPP;
}
 
@@ -2612,15 +2612,15 @@ static int virtnet_set_features(struct net_device *dev,
u64 offloads;
int err;
 
-   if ((dev->features ^ features) & NETIF_F_LRO) {
+   if ((dev->features ^ features) & NETIF_F_GRO_HW) {
if (vi->xdp_enabled)
return -EBUSY;
 
-   if (features & NETIF_F_LRO)
+   if (features & NETIF_F_GRO_HW)
offloads = vi->guest_offloads_capable;
else
offloads = vi->guest_offloads_capable &
-  ~GUEST_OFFLOAD_LRO_MASK;
+  ~GUEST_OFFLOAD_GRO_HW_MASK;
 
err = virtnet_set_guest_offloads(vi, offloads);
if (err)
@@ -3100,9 +3100,9 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->features |= NETIF_F_RXCSUM;
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
-   dev->features |= NETIF_F_LRO;
+   dev->features |= NETIF_F_GRO_HW;
if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
-   dev->hw_features |= NETIF_F_LRO;
+   dev->hw_features |= NETIF_F_GRO_HW;
 
dev->vlan_features = dev->features;
 
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO

2021-08-17 Thread Michael S. Tsirkin
Patch is good. Suggest some tweaks to the commit log.

On Tue, Aug 17, 2021 at 10:03:38AM +0800, Jason Wang wrote:
> Commit a02e8964eaf92 ("virtio-net: ethtool configurable LRO") tries to
> advertise LRO on behalf of the guest offloading features and allow the

tries to advertise -> advertises

that part actually works.

allow -> allows

on behalf of features is really weird. maybe "maps"?

> administrator to enable and disable those features via ethtool.
> 
> This may lead several issues:

may lead->lead to

> 
> - For the device that doesn't support control guest offloads, the
>   "LRO" can't be disabled so we will get a warn in the

warn -> warning

>   dev_disable_lro()

.. when turning off LRO or when enabling forwarding bridging etc.

> - For the device that have the control guest offloads, the guest

have the -> supports

>   offloads were disabled in the case of bridge etc

etc -> forwarding etc

> which may slow down

were -> are

may slow -> slows

>   the traffic.
> 
> Fixing this by using NETIF_F_GRO_HW instead. Though the spec does not
> guaranteed to be re-segmented as original explicitly now, we can add

guaranteed -> guarantee

> that to the spec

I would add:

Further, we never advertised LRO historically before a02e8964eaf92
("virtio-net: ethtool configurable LRO") and so bridged/forwarded
configs effectively relied on virtio receive offloads being GRO.




> and then we can catch the bad configuration and
> setup.

Don't know what does this part mean. How would we catch it?
With a new flag? Let's say so.

> 
> Fixes: a02e8964eaf92 ("virtio-net: ethtool configurable LRO")
> Signed-off-by: Jason Wang 



Proposed rewritten commit log:

===
[PATCH net] virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO

Commit a02e8964eaf92 ("virtio-net: ethtool configurable LRO")
maps LRO to virtio guest offloading features and allows the
administrator to enable and disable those features via ethtool.
 
This leads to several issues:


- For a device that doesn't support control guest offloads, the "LRO"
  can't be disabled triggering WARN in dev_disable_lro() when turning
  off LRO or when enabling forwarding bridging etc.

- For a device that supports control guest offloads, the guest
  offloads are disabled in cases of bridging, forwarding etc
  slowing down the traffic.
 
Fix this by using NETIF_F_GRO_HW instead. Though the spec does not
guarantee packets to be re-segmented as the original ones,
we can add that to the spec, possibly with a flag for devices to
differentiate between GRO and LRO.

Further, we never advertised LRO historically before a02e8964eaf92
("virtio-net: ethtool configurable LRO") and so bridged/forwarded
configs effectively always relied on virtio receive offloads behaving
like GRO - thus even if this breaks any configs it is at least not
a regression.

Fixes: a02e8964eaf92 ("virtio-net: ethtool configurable LRO")
Acked-by: Michael S. Tsirkin 
Reported-by: Ivan 
Tested-by: Ivan 
Signed-off-by: Jason Wang 

===


> ---
>  drivers/net/virtio_net.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0416a7e00914..10c382b08bce 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -63,7 +63,7 @@ static const unsigned long guest_offloads[] = {
>   VIRTIO_NET_F_GUEST_CSUM
>  };
>  
> -#define GUEST_OFFLOAD_LRO_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
> +#define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
>   (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
>   (1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
>   (1ULL << VIRTIO_NET_F_GUEST_UFO))
> @@ -2481,7 +2481,7 @@ static int virtnet_xdp_set(struct net_device *dev, 
> struct bpf_prog *prog,
>   virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
>   virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
>   virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
> - NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
> implementing LRO/CSUM, disable LRO/CSUM first");
> + NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is 
> implementing GRO_HW/CSUM, disable GRO_HW/CSUM first");
>   return -EOPNOTSUPP;
>   }
>  
> @@ -2612,15 +2612,15 @@ static int virtnet_set_features(struct net_device 
> *dev,
>   u64 offloads;
>   int err;
>  
> - if ((dev->features ^ features) & NETIF_F_LRO) {
> + if ((dev->features ^ features) & NETIF_F_GRO_HW) {
>   if (vi->xdp_enabled)
>   return -EBUSY;
>  
> - if (features & NETIF_F_LRO)
> + if (features & NETIF_F_GRO_HW)
>   offloads = vi->guest_offloads_capable;
>   else
>   offloads = vi->guest_offloads_capable &
> -~GUEST_OFFLOAD_LRO_MASK;
> +