On Fri, Apr 12, 2013 at 07:08:42PM -0500, Scott Wood wrote:
> Currently, devices that are emulated inside KVM are configured in a
> hardcoded manner based on an assumption that any given architecture
> only has one way to do it.  If there's any need to access device state,
> it is done through inflexible one-purpose-only IOCTLs (e.g.
> KVM_GET/SET_LAPIC).  Defining new IOCTLs for every little thing is
> cumbersome and depletes a limited numberspace.
> 
> This API provides a mechanism to instantiate a device of a certain
> type, returning an ID that can be used to set/get attributes of the
> device.  Attributes may include configuration parameters (e.g.
> register base address), device state, operational commands, etc.  It
> is similar to the ONE_REG API, except that it acts on devices rather
> than vcpus.
> 
> Both device types and individual attributes can be tested without having
> to create the device or get/set the attribute, without the need for
> separately managing enumerated capabilities.
> 
> Signed-off-by: Scott Wood <scottw...@freescale.com>
> ---
> v4:
>  - Move some boilerplate back into generic code, as requested by Gleb.
>    File descriptor management and reference counting is no longer the
>    concern of the device implementation.
> 
>  - Don't hold kvm->lock during create.  The original reasons
>    for doing so have vanished as for as MPIC is concerned, and
>    this avoids needing to answer the question of whether to
>    hold the lock during destroy as well.
> 
>    Paul, you may need to acquire the lock yourself in kvm_create_xics()
>    to protect the -EEXIST check.
> 
> v3: remove some changes that were merged into this patch by accident,
> and fix the error documentation for KVM_CREATE_DEVICE.
> ---
>  Documentation/virtual/kvm/api.txt        |   70 ++++++++++++++++
>  Documentation/virtual/kvm/devices/README |    1 +
>  include/linux/kvm_host.h                 |   35 ++++++++
>  include/uapi/linux/kvm.h                 |   27 +++++++
>  virt/kvm/kvm_main.c                      |  129 
> ++++++++++++++++++++++++++++++
>  5 files changed, 262 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/devices/README
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 976eb65..d52f3f9 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2173,6 +2173,76 @@ header; first `n_valid' valid entries with contents 
> from the data
>  written, then `n_invalid' invalid entries, invalidating any previously
>  valid entries found.
>  
> +4.79 KVM_CREATE_DEVICE
> +
> +Capability: KVM_CAP_DEVICE_CTRL
> +Type: vm ioctl
> +Parameters: struct kvm_create_device (in/out)
> +Returns: 0 on success, -1 on error
> +Errors:
> +  ENODEV: The device type is unknown or unsupported
> +  EEXIST: Device already created, and this type of device may not
> +          be instantiated multiple times
> +
> +  Other error conditions may be defined by individual device types or
> +  have their standard meanings.
> +
> +Creates an emulated device in the kernel.  The file descriptor returned
> +in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR.
> +
> +If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the
> +device type is supported (not necessarily whether it can be created
> +in the current vm).
> +
> +Individual devices should not define flags.  Attributes should be used
> +for specifying any behavior that is not implied by the device type
> +number.
> +
> +struct kvm_create_device {
> +     __u32   type;   /* in: KVM_DEV_TYPE_xxx */
> +     __u32   fd;     /* out: device handle */
> +     __u32   flags;  /* in: KVM_CREATE_DEVICE_xxx */
> +};
Should we add __u32 padding here to make struct size multiple of u64?

> +
> +4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
> +
> +Capability: KVM_CAP_DEVICE_CTRL
> +Type: device ioctl
> +Parameters: struct kvm_device_attr
> +Returns: 0 on success, -1 on error
> +Errors:
> +  ENXIO:  The group or attribute is unknown/unsupported for this device
> +  EPERM:  The attribute cannot (currently) be accessed this way
> +          (e.g. read-only attribute, or attribute that only makes
> +          sense when the device is in a different state)
> +
> +  Other error conditions may be defined by individual device types.
> +
> +Gets/sets a specified piece of device configuration and/or state.  The
> +semantics are device-specific.  See individual device documentation in
> +the "devices" directory.  As with ONE_REG, the size of the data
> +transferred is defined by the particular attribute.
> +
> +struct kvm_device_attr {
> +     __u32   flags;          /* no flags currently defined */
> +     __u32   group;          /* device-defined */
> +     __u64   attr;           /* group-defined */
> +     __u64   addr;           /* userspace address of attr data */
> +};
> +
> +4.81 KVM_HAS_DEVICE_ATTR
> +
> +Capability: KVM_CAP_DEVICE_CTRL
> +Type: device ioctl
> +Parameters: struct kvm_device_attr
> +Returns: 0 on success, -1 on error
> +Errors:
> +  ENXIO:  The group or attribute is unknown/unsupported for this device
> +
> +Tests whether a device supports a particular attribute.  A successful
> +return indicates the attribute is implemented.  It does not necessarily
> +indicate that the attribute can be read or written in the device's
> +current state.  "addr" is ignored.
>  
>  4.77 KVM_ARM_VCPU_INIT
>  
> diff --git a/Documentation/virtual/kvm/devices/README 
> b/Documentation/virtual/kvm/devices/README
> new file mode 100644
> index 0000000..34a6983
> --- /dev/null
> +++ b/Documentation/virtual/kvm/devices/README
> @@ -0,0 +1 @@
> +This directory contains specific device bindings for KVM_CAP_DEVICE_CTRL.
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 20d77d2..8fce9bc 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1063,6 +1063,41 @@ static inline bool kvm_check_request(int req, struct 
> kvm_vcpu *vcpu)
>  
>  extern bool kvm_rebooting;
>  
> +struct kvm_device_ops;
> +
> +struct kvm_device {
> +     struct kvm_device_ops *ops;
> +     struct kvm *kvm;
> +     atomic_t users;
> +     void *private;
> +};
> +
> +/* create, destroy, and name are mandatory */
> +struct kvm_device_ops {
> +     const char *name;
> +     int (*create)(struct kvm_device *dev, u32 type);
> +
> +     /*
> +      * Destroy is responsible for freeing dev.
> +      *
> +      * Destroy may be called before or after destructors are called
> +      * on emulated I/O regions, depending on whether a reference is
> +      * held by a vcpu or other kvm component that gets destroyed
> +      * after the emulated I/O.
> +      */
> +     void (*destroy)(struct kvm_device *dev);
> +
> +     int (*set_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
> +     int (*get_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
> +     int (*has_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
> +     long (*ioctl)(struct kvm_device *dev, unsigned int ioctl,
> +                   unsigned long arg);
> +};
> +
> +void kvm_device_get(struct kvm_device *dev);
> +void kvm_device_put(struct kvm_device *dev);
> +struct kvm_device *kvm_device_from_filp(struct file *filp);
> +
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>  
>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 74d0ff3..20ce2d2 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_PPC_EPR 86
>  #define KVM_CAP_ARM_PSCI 87
>  #define KVM_CAP_ARM_SET_DEVICE_ADDR 88
> +#define KVM_CAP_DEVICE_CTRL 89
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -909,6 +910,32 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_ARM_SET_DEVICE_ADDR        _IOW(KVMIO,  0xab, struct 
> kvm_arm_device_addr)
>  
>  /*
> + * Device control API, available with KVM_CAP_DEVICE_CTRL
> + */
> +#define KVM_CREATE_DEVICE_TEST               1
> +
> +struct kvm_create_device {
> +     __u32   type;   /* in: KVM_DEV_TYPE_xxx */
> +     __u32   fd;     /* out: device handle */
> +     __u32   flags;  /* in: KVM_CREATE_DEVICE_xxx */
> +};
> +
> +struct kvm_device_attr {
> +     __u32   flags;          /* no flags currently defined */
> +     __u32   group;          /* device-defined */
> +     __u64   attr;           /* group-defined */
> +     __u64   addr;           /* userspace address of attr data */
> +};
Please move struct definitions and KVM_CREATE_DEVICE_TEST define out
from ioctl definition block.

> +
> +/* ioctl for vm fd */
> +#define KVM_CREATE_DEVICE      _IOWR(KVMIO,  0xe0, struct kvm_create_device)
> +
> +/* ioctls for fds returned by KVM_CREATE_DEVICE */
> +#define KVM_SET_DEVICE_ATTR    _IOW(KVMIO,  0xe1, struct kvm_device_attr)
> +#define KVM_GET_DEVICE_ATTR    _IOW(KVMIO,  0xe2, struct kvm_device_attr)
> +#define KVM_HAS_DEVICE_ATTR    _IOW(KVMIO,  0xe3, struct kvm_device_attr)
> +
> +/*
>   * ioctls for vcpu fds
>   */
>  #define KVM_RUN                   _IO(KVMIO,   0x80)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 5cc53c9..e2b18af 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2158,6 +2158,117 @@ out:
>  }
>  #endif
>  
> +static int kvm_device_ioctl_attr(struct kvm_device *dev,
> +                              int (*accessor)(struct kvm_device *dev,
> +                                              struct kvm_device_attr *attr),
> +                              unsigned long arg)
> +{
> +     struct kvm_device_attr attr;
> +
> +     if (!accessor)
> +             return -EPERM;
> +
> +     if (copy_from_user(&attr, (void __user *)arg, sizeof(attr)))
> +             return -EFAULT;
> +
> +     return accessor(dev, &attr);
> +}
> +
> +static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
> +                          unsigned long arg)
> +{
> +     struct kvm_device *dev = filp->private_data;
> +
> +     switch (ioctl) {
> +     case KVM_SET_DEVICE_ATTR:
> +             return kvm_device_ioctl_attr(dev, dev->ops->set_attr, arg);
> +     case KVM_GET_DEVICE_ATTR:
> +             return kvm_device_ioctl_attr(dev, dev->ops->get_attr, arg);
> +     case KVM_HAS_DEVICE_ATTR:
> +             return kvm_device_ioctl_attr(dev, dev->ops->has_attr, arg);
> +     default:
> +             if (dev->ops->ioctl)
> +                     return dev->ops->ioctl(dev, ioctl, arg);
> +
> +             return -ENOTTY;
> +     }
> +}
> +
> +void kvm_device_get(struct kvm_device *dev)
> +{
> +     atomic_inc(&dev->users);
> +}
> +
> +void kvm_device_put(struct kvm_device *dev)
> +{
> +     if (atomic_dec_and_test(&dev->users))
> +             dev->ops->destroy(dev);
> +}
> +
> +static int kvm_device_release(struct inode *inode, struct file *filp)
> +{
> +     struct kvm_device *dev = filp->private_data;
> +     struct kvm *kvm = dev->kvm;
> +
> +     kvm_device_put(dev);
> +     kvm_put_kvm(kvm);
We may put kvm only if users goes to zero, otherwise kvm can be
freed while something holds a reference to a device. Why not make
kvm_device_put() do it?

> +     return 0;
> +}
> +
> +static const struct file_operations kvm_device_fops = {
> +     .unlocked_ioctl = kvm_device_ioctl,
> +     .release = kvm_device_release,
> +};
> +
> +struct kvm_device *kvm_device_from_filp(struct file *filp)
> +{
> +     if (filp->f_op != &kvm_device_fops)
> +             return NULL;
> +
> +     return filp->private_data;
> +}
> +
> +static int kvm_ioctl_create_device(struct kvm *kvm,
> +                                struct kvm_create_device *cd)
> +{
> +     struct kvm_device_ops *ops = NULL;
> +     struct kvm_device *dev;
> +     bool test = cd->flags & KVM_CREATE_DEVICE_TEST;
> +     int ret;
> +
> +     switch (cd->type) {
> +     default:
> +             return -ENODEV;
> +     }
> +
> +     if (test)
> +             return 0;
> +
> +     dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> +     if (!dev)
> +             return -ENOMEM;
> +
> +     dev->ops = ops;
> +     dev->kvm = kvm;
> +     atomic_set(&dev->users, 1);
> +
> +     ret = ops->create(dev, cd->type);
> +     if (ret < 0) {
> +             kfree(dev);
> +             return ret;
> +     }
> +
> +     ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR);
> +     if (ret < 0) {
> +             ops->destroy(dev);
> +             return ret;
> +     }
> +
> +     kvm_get_kvm(kvm);
> +     cd->fd = ret;
> +     return 0;
> +}
> +
>  static long kvm_vm_ioctl(struct file *filp,
>                          unsigned int ioctl, unsigned long arg)
>  {
> @@ -2272,6 +2383,24 @@ static long kvm_vm_ioctl(struct file *filp,
>               break;
>       }
>  #endif
> +     case KVM_CREATE_DEVICE: {
> +             struct kvm_create_device cd;
> +
> +             r = -EFAULT;
> +             if (copy_from_user(&cd, argp, sizeof(cd)))
> +                     goto out;
> +
> +             r = kvm_ioctl_create_device(kvm, &cd);
> +             if (r)
> +                     goto out;
> +
> +             r = -EFAULT;
> +             if (copy_to_user(argp, &cd, sizeof(cd)))
> +                     goto out;
> +
> +             r = 0;
> +             break;
> +     }
>       default:
>               r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>               if (r == -ENOTTY)
> -- 
> 1.7.10.4
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
                        Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to