from:"Dan Williams"

Re: [PATCH v4 27/33] nvdimm acpi: save arg3 for NVDIMM device _DSM method

2015-10-19 Thread Dan Williams

On Mon, Oct 19, 2015 at 2:19 PM, Michael S. Tsirkin  wrote:
> On Mon, Oct 19, 2015 at 10:29:50AM -0700, Dan Williams wrote:
>> On Mon, Oct 19, 2015 at 12:09 AM, Michael S. Tsirkin  wrote:
>> > On Mon, Oct 19, 2015 at 12:04:48PM +0800, Xiao Guangrong wrote:
>> > I mean don't use ASL to comment C. It's not more readable.
>> > Describe why the code is the way it is. Use variables by preference,
>> > C does not have weird limitations like ASL so you don't need to call
>> > your variables "arg3". What does it hold?
>> >
>>
>> What it holds is function number specific.  It's similar to
>> SYSCALL_DEFINEx where the ASL code is there to marshal arguments from
>> the OS through ACPI to a BIOS routine.  See the definition of the
>> example _DSM functions here and the usages of "Arg3":
>> http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
>
> So it seems to say "3.1.1.1.1 Input (Arg3)".
> Is that right? So
>
> Aml *input = aml_arg(3); /* Input (Arg3) */
> or even
> Aml *input_arg3 =  aml_arg(3); /* Input (Arg3) */
>
> My point is we are not writing ASL. There is no reason
> to use cryptic names.
>

Ah, ok, sounds good to me.  ASL is already hard to read and we
shouldn't be using it as code commentary.
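To make the naming advice concrete, here is a purely illustrative sketch of the C code that consumes what the ASL forwards as `Arg3`: it names the value for what it holds and decodes it per function, which is the SYSCALL_DEFINEx-like marshalling described above. This is not QEMU's AML builder API and not the numbering from the Intel example document; `enum dsm_function`, `struct label_read_request` and `dsm_decode_input()` are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical function indices; a real _DSM defines its own numbering. */
enum dsm_function {
    DSM_FN_QUERY = 0,          /* expects an empty input package */
    DSM_FN_GET_LABEL_DATA = 1, /* expects a read offset and length */
};

/* Hypothetical decoded form of the input that the ASL passes as Arg3. */
struct label_read_request {
    uint32_t offset;
    uint32_t length;
};

/*
 * Decode the caller's input according to the function being invoked.
 * Returns 0 on success, -1 if the input does not fit that function.
 * Assumes the buffer is already in host byte order (ACPI data is
 * little-endian, as is x86).
 */
static int dsm_decode_input(enum dsm_function fn, const uint8_t *input,
                            size_t input_len, struct label_read_request *req)
{
    switch (fn) {
    case DSM_FN_QUERY:
        return input_len == 0 ? 0 : -1;
    case DSM_FN_GET_LABEL_DATA:
        assert(req != NULL);
        if (input == NULL || input_len != sizeof(*req))
            return -1;
        memcpy(req, input, sizeof(*req)); /* copy avoids unaligned access */
        return 0;
    }
    return -1; /* unknown function index */
}
```

The variable is called `input` (or `req` once decoded) rather than `arg3`, because its meaning only exists relative to the function index.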
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 2/2] block: enable dax for raw block devices

2015-10-19 Thread Dan Williams

On Sun, Oct 18, 2015 at 8:01 PM, Ross Zwisler wrote:
> On Fri, Oct 16, 2015 at 08:49:41PM -0400, Dan Williams wrote:
>> If an application wants exclusive access to all of the persistent memory
>> provided by an NVDIMM namespace it can use this raw-block-dax facility
>> to forgo establishing a filesystem.  This capability is targeted
>> primarily to hypervisors wanting to provision persistent memory for
>> guests.
>>
>> Cc: Jeff Moyer 
>> Cc: Christoph Hellwig 
>> Cc: Al Viro 
>> Cc: Andrew Morton 
>> Cc: Ross Zwisler 
>> Cc: Xiao Guangrong 
>> Signed-off-by: Dan Williams 
>> ---
>>
>> Only lightly tested so far, but it seems to work, is the shortest path to a
>> DAX mapping, and makes it easier to trigger the pmd_fault path (no
>> fs-block-allocator interactions).
>>
>>  fs/block_dev.c |   84 +++-
>>  1 file changed, 83 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 5277dd83d254..498b71455570 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -1687,13 +1687,95 @@ static const struct address_space_operations def_blk_aops = {
>>   .is_dirty_writeback = buffer_check_dirty_writeback,
>>  };
>>
>> +#ifdef CONFIG_FS_DAX
>> +static int blkdev_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>> +{
>> + struct inode *bd_inode = file_bd_inode(vma->vm_file);
>> + struct block_device *bdev = I_BDEV(bd_inode);
>> + int ret;
>> +
>> + mutex_lock(&bdev->bd_mutex);
>> + ret = __dax_fault(vma, vmf, blkdev_get_block, NULL);
>> + mutex_unlock(&bdev->bd_mutex);
>> +
>> + return ret;
>> +}
>
> This all looks very straightforward.  The one comment I have is that this
> code is missing the calls to sb_[start|end]_pagefault(), and to
> file_update_time() that are found in ext[24]/xfs and the generic fault code.
>
> The previous version of this code used the generic fault implementation, and
> was calling these functions via filemap_page_mkwrite().
>
> It is possible that they were omitted for a reason - does protection from
> filesystem freezing still make sense when talking with a raw block device?
> For example, if that block device *has* a mounted filesystem on it that is
> frozen, does sb_start_pagefault() protect against page faults on the raw
> device that try to make something writable?
>
> In any case, the presence of them in filemap_page_mkwrite() tells me that they
> at least aren't harmful, and I wanted to make sure they weren't needed before
> leaving them out.  If the omission was intentional, should we add a comment to
> explain why they are missing?

So, I left them out on purpose and labeled this "RFC" mainly because I
wasn't ready to assert that the new locking using bd_mutex did not
have bad interactions with those paths or the mmap path in general.

The access time and interactions with freezing are different for raw
block device files given that the inode for the per-instance
device-node file is separate from the inode for the block device
itself.  The device node file is typically on a tmpfs filesystem
mounted at /dev.  The access time of the device node doesn't have a
reliable correlation with the access time of the block-device due to
the fact that you can mknod() a new device node anywhere, on any
filesystem, and do I/O.

I'll consider adding them back if they do no harm, but "device special
files" are already sufficiently different than regular files that I'm
not sure it matters.  Either way I agree this could use more
commentary in the code.

Re: [PATCH v4 27/33] nvdimm acpi: save arg3 for NVDIMM device _DSM method

2015-10-19 Thread Dan Williams

On Mon, Oct 19, 2015 at 12:09 AM, Michael S. Tsirkin  wrote:
> On Mon, Oct 19, 2015 at 12:04:48PM +0800, Xiao Guangrong wrote:
> I mean don't use ASL to comment C. It's not more readable.
> Describe why the code is the way it is. Use variables by preference,
> C does not have weird limitations like ASL so you don't need to call
> your variables "arg3". What does it hold?
>

What it holds is function number specific.  It's similar to
SYSCALL_DEFINEx where the ASL code is there to marshal arguments from
the OS through ACPI to a BIOS routine.  See the definition of the
example _DSM functions here and the usages of "Arg3":
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

[RFC PATCH 2/2] block: enable dax for raw block devices

2015-10-16 Thread Dan Williams

If an application wants exclusive access to all of the persistent memory
provided by an NVDIMM namespace it can use this raw-block-dax facility
to forgo establishing a filesystem.  This capability is targeted
primarily to hypervisors wanting to provision persistent memory for
guests.

Cc: Jeff Moyer 
Cc: Christoph Hellwig 
Cc: Al Viro 
Cc: Andrew Morton 
Cc: Ross Zwisler 
Cc: Xiao Guangrong 
Signed-off-by: Dan Williams 
---

Only lightly tested so far, but it seems to work, is the shortest path to a
DAX mapping, and makes it easier to trigger the pmd_fault path (no
fs-block-allocator interactions).

 fs/block_dev.c |   84 +++-
 1 file changed, 83 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 5277dd83d254..498b71455570 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1687,13 +1687,95 @@ static const struct address_space_operations def_blk_aops = {
    .is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#ifdef CONFIG_FS_DAX
+static int blkdev_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   struct inode *bd_inode = file_bd_inode(vma->vm_file);
+   struct block_device *bdev = I_BDEV(bd_inode);
+   int ret;
+
+   mutex_lock(&bdev->bd_mutex);
+   ret = __dax_fault(vma, vmf, blkdev_get_block, NULL);
+   mutex_unlock(&bdev->bd_mutex);
+
+   return ret;
+}
+
+static int blkdev_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
+   pmd_t *pmd, unsigned int flags)
+{
+   struct inode *bd_inode = file_bd_inode(vma->vm_file);
+   struct block_device *bdev = I_BDEV(bd_inode);
+   int ret;
+
+   mutex_lock(&bdev->bd_mutex);
+   ret = __dax_pmd_fault(vma, addr, pmd, flags, blkdev_get_block, NULL);
+   mutex_unlock(&bdev->bd_mutex);
+
+   return ret;
+}
+
+static int blkdev_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   struct inode *bd_inode = file_bd_inode(vma->vm_file);
+   struct block_device *bdev = I_BDEV(bd_inode);
+   int ret;
+
+   mutex_lock(&bdev->bd_mutex);
+   ret = __dax_mkwrite(vma, vmf, blkdev_get_block, NULL);
+   mutex_unlock(&bdev->bd_mutex);
+
+   return ret;
+}
+
+static int blkdev_dax_pfn_mkwrite(struct vm_area_struct *vma,
+   struct vm_fault *vmf)
+{
+   struct inode *bd_inode = file_bd_inode(vma->vm_file);
+   struct block_device *bdev = I_BDEV(bd_inode);
+   int ret = VM_FAULT_NOPAGE;
+   loff_t size;
+
+   /* check that the faulting page hasn't raced with bdev resize */
+   mutex_lock(&bdev->bd_mutex);
+   size = (i_size_read(bd_inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+   if (vmf->pgoff >= size)
+       ret = VM_FAULT_SIGBUS;
+   mutex_unlock(&bdev->bd_mutex);
+
+   return ret;
+}
+
+static const struct vm_operations_struct blkdev_dax_vm_ops = {
+   .fault          = blkdev_dax_fault,
+   .pmd_fault      = blkdev_dax_pmd_fault,
+   .page_mkwrite   = blkdev_dax_mkwrite,
+   .pfn_mkwrite    = blkdev_dax_pfn_mkwrite,
+};
+
+static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct inode *bd_inode = file_bd_inode(file);
+
+   if (!IS_DAX(bd_inode))
+       return generic_file_mmap(file, vma);
+
+   file_accessed(file);
+   vma->vm_ops = &blkdev_dax_vm_ops;
+   vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
+   return 0;
+}
+#else
+#define blkdev_mmap generic_file_mmap
+#endif
+
 const struct file_operations def_blk_fops = {
    .open           = blkdev_open,
    .release        = blkdev_close,
    .llseek         = block_llseek,
    .read_iter      = blkdev_read_iter,
    .write_iter     = blkdev_write_iter,
-   .mmap           = generic_file_mmap,
+   .mmap           = blkdev_mmap,
    .fsync          = blkdev_fsync,
    .unlocked_ioctl = block_ioctl,
 #ifdef CONFIG_COMPAT

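The bounds check in `blkdev_dax_pfn_mkwrite()` converts the device size in bytes to a page count, rounding up, and treats a fault at or beyond that page index as out of range (`VM_FAULT_SIGBUS`). The userspace sketch below reproduces that arithmetic and shows how a consumer such as a hypervisor might map the whole device node. It assumes a 4 KiB page size; `PAGE_SHIFT_4K`, `pgoff_in_range()` and `map_whole_file()` are hypothetical names, and because it only relies on `lseek(2)` and `mmap(2)`, a regular file can stand in for a pmem block device when trying it out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SHIFT_4K 12
#define PAGE_SIZE_4K  (1ULL << PAGE_SHIFT_4K)

/* Same rounding as the patch: (size + PAGE_SIZE - 1) >> PAGE_SHIFT. */
static bool pgoff_in_range(uint64_t pgoff, uint64_t size_bytes)
{
    uint64_t pages = (size_bytes + PAGE_SIZE_4K - 1) >> PAGE_SHIFT_4K;

    return pgoff < pages; /* pgoff >= pages would get VM_FAULT_SIGBUS */
}

/*
 * Map the whole of an open file or device node, shared and writable
 * (fd must be opened O_RDWR). lseek(SEEK_END) supplies the length
 * because fstat() reports st_size == 0 for block device nodes.
 */
static void *map_whole_file(int fd, size_t *len_out)
{
    off_t end = lseek(fd, 0, SEEK_END);
    void *addr;

    if (end <= 0)
        return NULL;
    addr = mmap(NULL, (size_t)end, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        return NULL;
    *len_out = (size_t)end;
    return addr;
}
```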

[RFC PATCH 1/2] block: introduce file_bd_inode()

2015-10-16 Thread Dan Williams

Similar to the file_inode() helper, provide a helper to lookup the inode for a
raw block device itself.

Cc: Al Viro 
Signed-off-by: Dan Williams 
---
 fs/block_dev.c |   19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 073bb57adab1..5277dd83d254 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -147,11 +147,16 @@ blkdev_get_block(struct inode *inode, sector_t iblock,
return 0;
 }
 
+static struct inode *file_bd_inode(struct file *file)
+{
+   return file->f_mapping->host;
+}
+
 static ssize_t
 blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 {
struct file *file = iocb->ki_filp;
-   struct inode *inode = file->f_mapping->host;
+   struct inode *inode = file_bd_inode(file);
 
if (IS_DAX(inode))
return dax_do_io(iocb, inode, iter, offset, blkdev_get_block,
@@ -329,7 +334,7 @@ static int blkdev_write_end(struct file *file, struct address_space *mapping,
  */
 static loff_t block_llseek(struct file *file, loff_t offset, int whence)
 {
-   struct inode *bd_inode = file->f_mapping->host;
+   struct inode *bd_inode = file_bd_inode(file);
loff_t retval;
 
mutex_lock(&bd_inode->i_mutex);
@@ -340,7 +345,7 @@ static loff_t block_llseek(struct file *file, loff_t offset, int whence)

 int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 {
-   struct inode *bd_inode = filp->f_mapping->host;
+   struct inode *bd_inode = file_bd_inode(filp);
struct block_device *bdev = I_BDEV(bd_inode);
int error;

@@ -1579,14 +1584,14 @@ EXPORT_SYMBOL(blkdev_put);
 
 static int blkdev_close(struct inode * inode, struct file * filp)
 {
-   struct block_device *bdev = I_BDEV(filp->f_mapping->host);
+   struct block_device *bdev = I_BDEV(file_bd_inode(filp));
blkdev_put(bdev, filp->f_mode);
return 0;
 }
 
 static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 {
-   struct block_device *bdev = I_BDEV(file->f_mapping->host);
+   struct block_device *bdev = I_BDEV(file_bd_inode(file));
fmode_t mode = file->f_mode;
 
/*
@@ -1611,7 +1616,7 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
struct file *file = iocb->ki_filp;
-   struct inode *bd_inode = file->f_mapping->host;
+   struct inode *bd_inode = file_bd_inode(file);
loff_t size = i_size_read(bd_inode);
struct blk_plug plug;
ssize_t ret;
@@ -1643,7 +1648,7 @@ EXPORT_SYMBOL_GPL(blkdev_write_iter);
 ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
struct file *file = iocb->ki_filp;
-   struct inode *bd_inode = file->f_mapping->host;
+   struct inode *bd_inode = file_bd_inode(file);
loff_t size = i_size_read(bd_inode);
loff_t pos = iocb->ki_pos;
 
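Why the helper uses `file->f_mapping->host` rather than `file_inode(file)`: when a block device is opened through a device node, the `struct file` points at the node's own inode (for example on the tmpfs mounted at `/dev`), while `blkdev_open()` in kernels of this era redirects `f_mapping` to the block device's address space, whose `host` is the bdev inode. The sketch below models only that relationship; the structs are hypothetical stand-ins, far smaller than the real kernel definitions, and `model_blkdev_open()` is a made-up name.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel structures (hypothetical layouts). */
struct address_space;

struct inode {
    const char *name;                /* for illustration only */
    struct address_space *i_mapping;
};

struct address_space {
    struct inode *host;              /* inode that owns this page cache */
};

struct file {
    struct inode *f_inode;           /* inode of the path that was opened */
    struct address_space *f_mapping; /* page cache actually used for I/O */
};

static struct inode *file_inode(struct file *f)
{
    return f->f_inode;
}

/* Mirrors the helper in the patch: the inode behind the file's mapping. */
static struct inode *file_bd_inode(struct file *f)
{
    return f->f_mapping->host;
}

/* Model of the open path: the file keeps the node inode, I/O uses the bdev. */
static void model_blkdev_open(struct file *f, struct inode *node,
                              struct inode *bdev_inode)
{
    f->f_inode = node;
    f->f_mapping = bdev_inode->i_mapping;
}
```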


Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM

2015-10-14 Thread Dan Williams

On Tue, Oct 13, 2015 at 9:03 PM, Xiao Guangrong wrote:
>> Label-less DIMMs are tested as part of the unit test [1] and the
>> "memmap=nn!ss" kernel parameter that registers a persistent-memory
>> address range without a DIMM.  What error do you see when label
>> support is disabled?
>>
>> [1]: https://github.com/pmem/ndctl/blob/master/README.md
>>
>
> After reverting my commits on the NVDIMM driver, yeah, it works.
>
> Okay, I will drop the namespace part and make it label-less
> instead.
>
> Thank you, Dan!
>

Good to hear.  There are still cases where a guest would likely want
to submit a _DSM, like retrieving address range scrub results from the
ACPI0012 root device, so the ASL work is still needed.  However, I
think the bulk of the storage functionality can be had without
storing/retrieving labels in the guest.

Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Dan Williams

On Mon, Oct 12, 2015 at 10:49 PM, Xiao Guangrong wrote:
>
>
> On 10/13/2015 11:38 AM, Dan Williams wrote:
>>
>> On Mon, Oct 12, 2015 at 8:14 PM, Xiao Guangrong wrote:
>>>
>>> On 10/13/2015 12:36 AM, Dan Williams wrote:
>>>>
>>>> Static namespaces can be emitted without a label.  Linux needs this to
>>>> support existing "label-less" bare metal NVDIMMs.
>>>
>>>
>>>
>>> Is this Linux-specific? I did not see it documented in the
>>> spec...
>>
>>
>> I expect most NVDIMMs, especially existing ones available today, do
>> not have a label area.  This is not Linux specific and ACPI 6 does not
>> specify a label area, only the Intel DSM Interface Example.
>>
>
> Yup, label data is accessed via the DSM interface; the spec I mentioned
> is the Intel DSM Interface Example.
>
> However, IIRC the Linux NVDIMM driver refuses to use the device if there
> is no DSM GET_LABEL support; are you going to update it?

Label-less DIMMs are tested as part of the unit test [1] and the
"memmap=nn!ss" kernel parameter that registers a persistent-memory
address range without a DIMM.  What error do you see when label
support is disabled?

[1]: https://github.com/pmem/ndctl/blob/master/README.md
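For reference, `memmap=nn!ss` takes a size and a start address, each with an optional K/M/G suffix; for example `memmap=4G!12G` would mark the 4 GiB starting at 12 GiB as persistent memory (the values are only an example). The sketch below parses that syntax. `parse_size()` loosely imitates the kernel's `memparse()` but handles only the K/M/G suffixes, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix (a subset of memparse()). */
static bool parse_size(const char *s, uint64_t *out, const char **rest)
{
    char *end;
    uint64_t v = strtoull(s, &end, 0); /* base 0 accepts decimal and 0x hex */

    if (end == s)
        return false; /* no digits */
    switch (*end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        end++;
        break;
    default:
        break;
    }
    *out = v;
    *rest = end;
    return true;
}

/* Parse the value after "memmap=" in the "nn!ss" form; false on bad syntax. */
static bool parse_memmap_pmem(const char *arg, uint64_t *size, uint64_t *start)
{
    const char *p;

    if (!parse_size(arg, size, &p) || *p != '!')
        return false;
    if (!parse_size(p + 1, start, &p))
        return false;
    return *p == '\0';
}
```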

Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Dan Williams

On Mon, Oct 12, 2015 at 8:14 PM, Xiao Guangrong wrote:
> On 10/13/2015 12:36 AM, Dan Williams wrote:
>> Static namespaces can be emitted without a label.  Linux needs this to
>> support existing "label-less" bare metal NVDIMMs.
>
>
> Is this Linux-specific? I did not see it documented in the
> spec...

I expect most NVDIMMs, especially existing ones available today, do
not have a label area.  This is not Linux specific and ACPI 6 does not
specify a label area, only the Intel DSM Interface Example.

Re: [PATCH v3 23/32] nvdimm: build ACPI NFIT table

2015-10-12 Thread Dan Williams

On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrong wrote:
> NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
>
> Currently, we only support PMEM mode. Each device has 3 structures:
> - SPA structure, defines the PMEM region info
>
> - MEM DEV structure, it has the @handle which is used to associate the specified
>   ACPI NVDIMM device we will introduce in a later patch.
>   Also we can happily ignore the memory device's interleave, the real
>   nvdimm hardware access is hidden behind the host
>
> - DCR structure, it defines vendor ID used to associate specified vendor
>   nvdimm driver. Since we only implement PMEM mode this time, Command
>   window and Data window are not needed
>
> Signed-off-by: Xiao Guangrong 
> ---
>  hw/i386/acpi-build.c |   4 +
>  hw/mem/nvdimm/acpi.c | 209 ++-
>  hw/mem/nvdimm/internal.h |  13 +++
>  hw/mem/nvdimm/nvdimm.c   |  25 ++
>  include/hw/mem/nvdimm.h  |   2 +
>  5 files changed, 252 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 95e0c65..c637dc8 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1661,6 +1661,7 @@ static bool acpi_has_iommu(void)
>  static
>  void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>  {
> +PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
>  GArray *table_offsets;
>  unsigned facs, ssdt, dsdt, rsdt;
>  AcpiCpuInfo cpu;
> @@ -1742,6 +1743,9 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>  build_dmar_q35(tables_blob, tables->linker);
>  }
>
> +nvdimm_build_acpi_table(&pcms->nvdimm_memory, table_offsets, tables_blob,
> +tables->linker);
> +
>  /* Add tables supplied by user (if any) */
>  for (u = acpi_table_first(); u; u = acpi_table_next(u)) {
>  unsigned len = acpi_table_len(u);
> diff --git a/hw/mem/nvdimm/acpi.c b/hw/mem/nvdimm/acpi.c
> index b640874..62b1e02 100644
> --- a/hw/mem/nvdimm/acpi.c
> +++ b/hw/mem/nvdimm/acpi.c
> @@ -32,6 +32,46 @@
>  #include "hw/mem/nvdimm.h"
>  #include "internal.h"
>
> +static void nfit_spa_uuid_pm(uuid_le *uuid)
> +{
> +uuid_le uuid_pm = UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d,
> +  0x33, 0x18, 0xb7, 0x8c, 0xdb);
> +memcpy(uuid, &uuid_pm, sizeof(uuid_pm));
> +}
> +
> +enum {
> +NFIT_STRUCTURE_SPA = 0,
> +NFIT_STRUCTURE_MEMDEV = 1,
> +NFIT_STRUCTURE_IDT = 2,
> +NFIT_STRUCTURE_SMBIOS = 3,
> +NFIT_STRUCTURE_DCR = 4,
> +NFIT_STRUCTURE_BDW = 5,
> +NFIT_STRUCTURE_FLUSH = 6,
> +};
> +
> +enum {
> +EFI_MEMORY_UC = 0x1ULL,
> +EFI_MEMORY_WC = 0x2ULL,
> +EFI_MEMORY_WT = 0x4ULL,
> +EFI_MEMORY_WB = 0x8ULL,
> +EFI_MEMORY_UCE = 0x10ULL,
> +EFI_MEMORY_WP = 0x1000ULL,
> +EFI_MEMORY_RP = 0x2000ULL,
> +EFI_MEMORY_XP = 0x4000ULL,
> +EFI_MEMORY_NV = 0x8000ULL,
> +EFI_MEMORY_MORE_RELIABLE = 0x10000ULL,
> +};

Would it be worth including / copying the ACPICA header files directly?

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/acpi/actbl1.h
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/acpi/acuuid.h
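Whichever header the constants end up coming from, the NFIT body is a sequence of variable-length structures, each beginning with a 16-bit Type and a 16-bit Length (ACPI 6.0, section 5.2.25), so a consumer or a self-test can walk it generically. This is not ACPICA or QEMU code: `nfit_count_type()` is a hypothetical helper that expects a pointer to the first structure after the fixed NFIT header, and it assumes a little-endian host, since ACPI tables are little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Structure type values as listed in the quoted patch. */
enum {
    NFIT_STRUCTURE_SPA = 0,
    NFIT_STRUCTURE_MEMDEV = 1,
    NFIT_STRUCTURE_DCR = 4,
};

/* Common prefix of every NFIT structure. */
struct nfit_sub_header {
    uint16_t type;
    uint16_t length;
};

/*
 * Count structures of the given type in buf[0..len). Returns -1 if a
 * structure is truncated or reports a length too small to make progress.
 */
static int nfit_count_type(const uint8_t *buf, size_t len, uint16_t type)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        struct nfit_sub_header hdr;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr)); /* avoid unaligned loads */
        if (hdr.length < sizeof(hdr) || hdr.length > len - off)
            return -1;
        if (hdr.type == type)
            count++;
        off += hdr.length;
    }
    return count;
}
```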

Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Dan Williams

On Sun, Oct 11, 2015 at 9:33 PM, Xiao Guangrong wrote:
>
>
> On 10/11/2015 05:17 AM, Dan Williams wrote:
>>
>> On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrong wrote:
>> [..]
>>>
>>> == Test ==
>>> In host
>>> 1) create a memory-backed file, e.g. # dd if=/dev/zero of=/tmp/nvdimm bs=1G count=10
>>> 2) append "-object memory-backend-file,share,id=mem1,
>>> mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
>>> id=nv1" in QEMU command line
>>>
>>> In the guest, download the latest upstream kernel (4.2 merge window) and
>>> enable ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
>>> 1) insmod drivers/nvdimm/libnvdimm.ko
>>> 2) insmod drivers/acpi/nfit.ko
>>> 3) insmod drivers/nvdimm/nd_btt.ko
>>> 4) insmod drivers/nvdimm/nd_pmem.ko
>>> You can see the whole nvdimm device used as a single namespace and
>>> /dev/pmem0 appears. You can do whatever on /dev/pmem0 including DAX access.
>>>
>>> Currently the Linux NVDIMM driver does not support namespace operations on
>>> this kind of PMEM; apply the changes below to support dynamic namespaces:
>>>
>>> @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
>>>  continue;
>>>  }
>>>
>>> -   if (nfit_mem->bdw && nfit_mem->memdev_pmem)
>>> +   //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
>>> +   if (nfit_mem->memdev_pmem)
>>>  flags |= NDD_ALIASING;
>>
>>
>> This is just for testing purposes, right?  I expect guests can
>
>
> It's used to validate the NVDIMM _DSM method and static namespaces following
> the NVDIMM specs...

Static namespaces can be emitted without a label.  Linux needs this to
support existing "label-less" bare metal NVDIMMs.

>> sub-divide persistent memory capacity by partitioning the resulting
>> block device(s).
>
>
> I understand that it's a Linux design... Hmm, can the same expectation
> apply to PBLK?

BLK-mode is a bit different as those namespaces have both configurable
sector-size and an optional BTT.  It is possible to expect multiple
BLK namespaces in a given region with different settings.

Re: [PATCH v2 11/20] kvm: rename pfn_t to kvm_pfn_t

2015-10-12 Thread Dan Williams

On Mon, Oct 12, 2015 at 5:51 AM, Paolo Bonzini  wrote:
>
>
> On 10/10/2015 22:57, Dan Williams wrote:
>> On Sat, Oct 10, 2015 at 1:35 PM, Paolo Bonzini  wrote:
>>> On 10/10/2015 02:56, Dan Williams wrote:
>>>> The core has developed a need for a "pfn_t" type [1].  Move the existing
>>>> pfn_t in KVM to kvm_pfn_t [2].
>>>>
>>>> [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
>>>> [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
>>>
>>> Can you please change also the other types in include/linux/kvm_types.h?
>>
>> Hmm, all those seem kvm specific already.  I'd only prefix them with
>> kvm_ if they collided with a "core" type.
>
> But they are all related and the code becomes uglier if you only prefix
> one of them.  If you don't convert all of them, I will do it anyway as
> soon as this patch gets in.

Ok.

> Since it touches a lot of KVM files, we should synchronize in order to
> avoid conflicts and gnashing of teeth.  What tree is this patch going
> in?  You could provide me a commit SHA1 for this patch (well, its
> definitive version) based on Linus's tree (so that I can merge it in my
> tree as well), or I could commit it and provide the SHA1 to the
> maintainer of said tree.

The kvm_pfn_t conversion is only needed if the new pfn_t
infrastructure moves forward, and at this point it still needs some
review feedback.

How about this, care to send conversion patches for the rest? ...based on:


https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/log/?h=libnvdimm-pending

When/if the new pfn_t bits move forward I'll carry them in the same
pull request through the nvdimm.git tree.

Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-10 Thread Dan Williams

On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrong wrote:
[..]
> == Test ==
> In host
> 1) create a memory-backed file, e.g. # dd if=/dev/zero of=/tmp/nvdimm bs=1G count=10
> 2) append "-object memory-backend-file,share,id=mem1,
>mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
>id=nv1" in QEMU command line
>
> In the guest, download the latest upstream kernel (4.2 merge window) and enable
> ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
> 1) insmod drivers/nvdimm/libnvdimm.ko
> 2) insmod drivers/acpi/nfit.ko
> 3) insmod drivers/nvdimm/nd_btt.ko
> 4) insmod drivers/nvdimm/nd_pmem.ko
> You can see the whole nvdimm device used as a single namespace and /dev/pmem0
> appears. You can do whatever on /dev/pmem0 including DAX access.
>
> Currently the Linux NVDIMM driver does not support namespace operations on this
> kind of PMEM; apply the changes below to support dynamic namespaces:
>
> @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
> continue;
> }
>
> -   if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> +   //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> +   if (nfit_mem->memdev_pmem)
> flags |= NDD_ALIASING;

This is just for testing purposes, right?  I expect guests can
sub-divide persistent memory capacity by partitioning the resulting
block device(s).

Re: [RFC v2 2/4] net: enables interface option to skip IP

2014-02-25 Thread Dan Williams

On Mon, 2014-02-24 at 18:04 -0500, David Miller wrote:
> From: Dan Williams 
> Date: Mon, 24 Feb 2014 12:22:00 -0600
> 
> > In the future I expect more people will want to disable IPv4 as
> > they move to IPv6.
> 
> I definitely don't.
> 
> I've been lightly following this conversation and I have to say
> a few things.
> 
> disable_ipv6 was added because people wanted to make sure their
> machines didn't generate any ipv6 traffic because "ipv6 is not
> mature", "we don't have our firewalls configured to handle that
> kind of traffic" etc.
> 
> None of these things apply to ipv4.
> 
> And if you think people will go to ipv6 only, you are dreaming.
> 
> Name a provider of a major web site who will go to strictly only
> providing an ipv6 facing site?
> 
> Only an idiot who wanted to lose significant numbers of page views
> and traffic would do that, so ipv4 based connectivity will be
> universally necessary forever.
> 
> I think disable_ipv4 is absolutely a non-starter.

Also, disable_ipv4 signals *intent*, which is distinct from current
state.

Does an interface without an IPv4 address mean that the user wished it
not to have one?

Or does it mean that DHCP hasn't started yet (but is supposed to), or
failed, or something hasn't gotten around to assigning an address yet?

disable_ipv4 lets you distinguish between these two cases, the same way
disable_ipv6 does.
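The intent-versus-state distinction can be made concrete with a small model: the absence of an IPv4 address on its own is ambiguous, while an explicit per-interface flag is not. This is purely illustrative; `struct iface_v4`, its fields, `enum v4_status` and `classify_v4()` are hypothetical and do not correspond to any kernel or NetworkManager API.

```c
#include <assert.h>
#include <stdbool.h>

enum v4_status {
    V4_CONFIGURED,        /* at least one IPv4 address present */
    V4_DISABLED,          /* administrator asked for no IPv4 at all */
    V4_PENDING_OR_FAILED, /* no address: DHCP not started, running, or failed */
};

struct iface_v4 {
    bool disable_ipv4; /* hypothetical per-interface knob expressing intent */
    int n_addrs;       /* IPv4 addresses currently assigned */
};

static enum v4_status classify_v4(const struct iface_v4 *ifc)
{
    if (ifc->disable_ipv4)
        return V4_DISABLED;
    if (ifc->n_addrs > 0)
        return V4_CONFIGURED;
    /* Without the knob, "no address" cannot be told apart from "not yet". */
    return V4_PENDING_OR_FAILED;
}
```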

Dan



Re: [RFC v2 2/4] net: enables interface option to skip IP

2014-02-24 Thread Dan Williams

On Thu, 2014-02-20 at 12:31 -0800, Luis R. Rodriguez wrote:
> On Wed, Feb 19, 2014 at 4:56 PM, Dan Williams  wrote:
> > Note that there isn't yet a disable_ipv4 knob though, I was
> > perhaps-too-subtly trying to get you to send a patch for it, since I can
> > use it too :)
> 
> Sure, can you describe a little better the use case, as I could use
> that for the commit log. My only current use case was the xen-netback
> case but Zoltan has noted a few cases where an IPv4 or IPv6 address
> *could* be used on the backend interfaces (which I'll still poke as
> it's unclear to me why they have 'em).

My use-case would simply be to have an analogue for the disable_ipv6
case.  In the future I expect more people will want to disable IPv4 as
they move to IPv6.  If you don't have something like disable_ipv4, then
there's no way to ensure that some random program or something doesn't
set up IPv4 stuff that you don't want.

Same thing for IPv6; some people really don't want IPv6 enabled on an
interface no matter what; they don't want an IPv6LL address assigned,
they don't want kernel SLAAC, they want to ensure that *nothing*
IPv6-related gets done for that interface.  The same can be true for
IPv4, but we don't have a way of doing that right now.

Dan


Re: [RFC v2 2/4] net: enables interface option to skip IP

2014-02-19 Thread Dan Williams

On Thu, 2014-02-20 at 01:58 +0100, Hannes Frederic Sowa wrote:
> On Wed, Feb 19, 2014 at 06:56:17PM -0600, Dan Williams wrote:
> > Note that there isn't yet a disable_ipv4 knob though, I was
> > perhaps-too-subtly trying to get you to send a patch for it, since I can
> > use it too :)
> 
> Do you plan to implement
> <http://datatracker.ietf.org/doc/draft-ietf-sunset4-noipv4/>?
> 
> ;)

Well, not specifically, but with NetworkManager we do have a "disable
IPv4" method for IPv4, which now just doesn't do any kind of IPv4, but
obviously doesn't disable IPv4 entirely because that's not possible.  I
was only thinking that it would be nice to actually guarantee that IPv4
was disabled, just like disable_ipv6 does.

But we could certainly implement that draft if a patch shows up or if it
bubbled up the priority stack :)

Dan


Re: [RFC v2 2/4] net: enables interface option to skip IP

2014-02-19 Thread Dan Williams

On Wed, 2014-02-19 at 09:20 -0800, Luis R. Rodriguez wrote:
> On Wed, Feb 19, 2014 at 8:45 AM, Dan Williams  wrote:
> > On Tue, 2014-02-18 at 13:19 -0800, Luis R. Rodriguez wrote:
> >> On Mon, Feb 17, 2014 at 12:23 PM, Dan Williams  wrote:
> >> > On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
> >> >> From: "Luis R. Rodriguez" 
> >> >>
> >> >> Some interfaces do not need to have any IPv4 or IPv6
> >> >> addresses, so enable an option to specify this. One
> >> >> example where this is observed are virtualization
> >> >> backend interfaces which just use the net_device
> >> >> constructs to help with their respective frontends.
> >> >>
> >> >> This should optimize boot time and complexity on
> >> >> virtualization environments for each backend interface
> >> >> while also avoiding triggering SLAAC and DAD, which is
> >> >> simply pointless for these type of interfaces.
> >> >
> >> > Would it not be better/cleaner to use disable_ipv6 and then add a
> >> > disable_ipv4 sysctl, then use those with that interface?
> >>
> >> Sure, but note that both the disable_ipv6 and accept_dad sysctl
> >> parameters are global. ipv4 and ipv6 interfaces are created upon
> >> NETDEVICE_REGISTER, which will get triggered when a driver calls
> >> register_netdev(). The goal of this patch was to enable an early
> >> optimization for drivers that have no need ever for ipv4 or ipv6
> >> interfaces.
> >
> > Each interface gets override sysctls too though, eg:
> >
> > /proc/sys/net/ipv6/conf/enp0s25/disable_ipv6
> 
> I hadn't seen those, thanks!

Note that there isn't yet a disable_ipv4 knob though, I was
perhaps-too-subtly trying to get you to send a patch for it, since I can
use it too :)
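The per-interface override quoted in this message lives at `/proc/sys/net/ipv6/conf/<ifname>/disable_ipv6`, and, as noted further down the thread, it should be set before the interface is brought up. A minimal C sketch of setting it follows; the helper names are hypothetical, writing the file typically requires root, and there is no equivalent `disable_ipv4` file to write yet, which is the point of the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the per-interface sysctl path; returns 0 on success, -1 on error. */
static int ipv6_disable_path(char *buf, size_t buflen, const char *ifname)
{
    int n;

    if (strchr(ifname, '/') != NULL)
        return -1; /* an interface name must not escape the conf directory */
    n = snprintf(buf, buflen, "/proc/sys/net/ipv6/conf/%s/disable_ipv6",
                 ifname);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Write "1" to disable IPv6 on one interface; returns 0 on success. */
static int ipv6_disable(const char *ifname)
{
    char path[128];
    FILE *f;

    if (ipv6_disable_path(path, sizeof(path), ifname) != 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1; /* no such interface, or insufficient privileges */
    if (fputs("1\n", f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, `ipv6_disable("enp0s25")` before the interface goes IFF_UP.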

Dan

> > which is the one I meant; you're obviously right that the global ones
> > aren't what you want here.  But the specific ones should be suitable?
> 
> Under the approach Stephen mentioned by first ensuring the interface
> is down yes. There's one use case I can consider to still want the
> patch though, more on that below.
> 
> > If you set that on a per-interface basis, then you'll get EPERM or
> > something whenever you try to add IPv6 addresses or do IPv6 routing.
> 
> Neat, thanks.
> 
> >> Zoltan has noted some use cases for IPv4 or IPv6 addresses on
> >> backends, though, so this is no longer applicable as a
> >> requirement. The ipv4 sysctl, however, still seems like a reasonable
> >> approach to enable network optimizations in topologies where
> >> it's known we won't need them -- but we'd need to consider a much more
> >> granular solution, not just a global one as it is now for disable_ipv6, and
> >> we'd also have to figure out a clean way to do this that doesn't incur the
> >> cost of early address interface addition upon register_netdev().
> >>
> >> Given that we have a use case for ipv4 and ipv6 addresses on
> >> xen-netback, we no longer have an immediate use case for such early
> >> optimization primitives, so I'll drop this.
> >>
> >> > The IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
> >> > already doing.
> >>
> >> disable_ipv6 is global; the goal was to make this granular and skip
> >> the cost upon early boot, but it's been clarified we don't need this.
> >
> > Like Stephen says, you need to make sure you set them before IFF_UP, but
> > beyond that, wouldn't the interface-specific sysctls work?
> 
> Yeah, that'll do it, unless there is a measurable run-time benefit
> to never adding these in the first place. Consider a host with tons
> of guests; I'm not sure how many is 'a lot' these days. One would have to
> measure how much this reduces the time it takes to boot them
> up. As discussed in the other threads, though, there *are* some use cases
> for assigning IPv4 or IPv6 addresses to the backend interfaces:
> routing them (although it's unclear to me whether iptables can be used
> instead, Zoltan?). So at least for now there is no clear requirement to
> remove these interfaces or not have them at all. The boot-time cost
> savings should still be considered if this is ultimately desirable. I
> saw tons of timers and events that'd get triggered with any IPv4 or
> IPv6 interface lying around.
> 
>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [RFC v2 2/4] net: enables interface option to skip IP

2014-02-19 Thread Dan Williams

On Tue, 2014-02-18 at 13:19 -0800, Luis R. Rodriguez wrote:
> On Mon, Feb 17, 2014 at 12:23 PM, Dan Williams  wrote:
> > On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
> >> From: "Luis R. Rodriguez" 
> >>
> >> Some interfaces do not need to have any IPv4 or IPv6
> >> addresses, so enable an option to specify this. One
> >> example of this is virtualization
> >> backend interfaces, which just use the net_device
> >> constructs to help with their respective frontends.
> >>
> >> This should optimize boot time and complexity in
> >> virtualization environments for each backend interface
> >> while also avoiding triggering SLAAC and DAD, which is
> >> simply pointless for this type of interface.
> >
> > Would it not be better/cleaner to use disable_ipv6 and then add a
> > disable_ipv4 sysctl, then use those with that interface?
> 
> Sure, but note that both the disable_ipv6 and accept_dad sysctl
> parameters are global. ipv4 and ipv6 interfaces are created upon
> NETDEV_REGISTER, which will get triggered when a driver calls
> register_netdev(). The goal of this patch was to enable an early
> optimization for drivers that never have any need for ipv4 or ipv6
> interfaces.

Each interface gets override sysctls too though, eg:

/proc/sys/net/ipv6/conf/enp0s25/disable_ipv6

which is the one I meant; you're obviously right that the global ones
aren't what you want here.  But the specific ones should be suitable?
If you set that on a per-interface basis, then you'll get EPERM or
something whenever you try to add IPv6 addresses or do IPv6 routing.

> Zoltan has noted some use cases for IPv4 or IPv6 addresses on
> backends, though, so this is no longer applicable as a
> requirement. The ipv4 sysctl, however, still seems like a reasonable
> approach to enable network optimizations in topologies where
> it's known we won't need them -- but we'd need to consider a much more
> granular solution, not just a global one as it is now for disable_ipv6, and
> we'd also have to figure out a clean way to do this that doesn't incur the
> cost of early address interface addition upon register_netdev().
> 
> Given that we have a use case for ipv4 and ipv6 addresses on
> xen-netback, we no longer have an immediate use case for such early
> optimization primitives, so I'll drop this.
> 
> > The IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
> > already doing.
> 
> disable_ipv6 is global; the goal was to make this granular and skip
> the cost upon early boot, but it's been clarified we don't need this.

Like Stephen says, you need to make sure you set them before IFF_UP, but
beyond that, wouldn't the interface-specific sysctls work?

Dan


Re: [RFC v2 2/4] net: enables interface option to skip IP

2014-02-17 Thread Dan Williams

On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" 
> 
> Some interfaces do not need to have any IPv4 or IPv6
> addresses, so enable an option to specify this. One
> example of this is virtualization
> backend interfaces, which just use the net_device
> constructs to help with their respective frontends.
>
> This should optimize boot time and complexity in
> virtualization environments for each backend interface
> while also avoiding triggering SLAAC and DAD, which is
> simply pointless for this type of interface.

Would it not be better/cleaner to use disable_ipv6 and then add a
disable_ipv4 sysctl, then use those with that interface?  The
IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
already doing.

Dan

> Cc: "David S. Miller" 
> Cc: Alexey Kuznetsov
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Cc: net...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez 
> ---
>  include/uapi/linux/if.h | 1 +
>  net/ipv4/devinet.c      | 3 +++
>  net/ipv6/addrconf.c     | 6 ++++++
>  3 files changed, 10 insertions(+)
> 
> diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
> index 8d10382..566d856 100644
> --- a/include/uapi/linux/if.h
> +++ b/include/uapi/linux/if.h
> @@ -85,6 +85,7 @@
>  					 * change when it's running */
>  #define IFF_MACVLAN 0x20		/* Macvlan device */
>  #define IFF_BRIDGE_NON_ROOT 0x40	/* Don't consider for root bridge */
> +#define IFF_SKIP_IP	0x80		/* Skip IPv4, IPv6 */
>  
>  
>  #define IF_GET_IFACE	0x0001		/* for querying only */
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index a1b5bcb..8e9ef07 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -1342,6 +1342,9 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
>  
>  	ASSERT_RTNL();
>  
> +	if (dev->priv_flags & IFF_SKIP_IP)
> +		goto out;
> +
>  	if (!in_dev) {
>  		if (event == NETDEV_REGISTER) {
>  			in_dev = inetdev_init(dev);
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4b6b720..57f58e3 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -314,6 +314,9 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
>  
>  	ASSERT_RTNL();
>  
> +	if (dev->priv_flags & IFF_SKIP_IP)
> +		return NULL;
> +
>  	if (dev->mtu < IPV6_MIN_MTU)
>  		return NULL;
>  
> @@ -2749,6 +2752,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
>  	int run_pending = 0;
>  	int err;
>  
> +	if (dev->priv_flags & IFF_SKIP_IP)
> +		return NOTIFY_OK;
> +
>  	switch (event) {
>  	case NETDEV_REGISTER:
>  		if (!idev && dev->mtu >= IPV6_MIN_MTU) {
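The control flow of the quoted patch can be modeled in userspace as a small sketch. It assumes only the two fields the diff reads, priv_flags and mtu. struct model_netdev, the model_* functions and the MODEL_IFF_SKIP_IP bit are hypothetical stand-ins, not kernel code; the bit value is arbitrary. 1280 is the value of the kernel's IPV6_MIN_MTU.

```c
#include <stdbool.h>

#define MODEL_IPV6_MIN_MTU 1280		/* same value as the kernel's IPV6_MIN_MTU */
#define MODEL_IFF_SKIP_IP  0x1u		/* hypothetical bit, not the patch's value */

/* Hypothetical stand-in for the net_device fields the patch reads. */
struct model_netdev {
	unsigned int priv_flags;
	int mtu;
};

/* Mirrors the order of checks in the patched ipv6_add_dev(): the opt-out
 * flag is tested first, then the IPv6 minimum-MTU floor. */
bool model_wants_inet6_dev(const struct model_netdev *dev)
{
	if (dev->priv_flags & MODEL_IFF_SKIP_IP)
		return false;
	if (dev->mtu < MODEL_IPV6_MIN_MTU)
		return false;
	return true;
}

/* Mirrors the patched inetdev_event(): a flagged device bails out before
 * any in_device is created, whatever the event. */
bool model_wants_in_dev(const struct model_netdev *dev)
{
	return !(dev->priv_flags & MODEL_IFF_SKIP_IP);
}
```

The model shows the difference Luis describes. A flagged device never gets per-protocol state at all. The per-interface disable_ipv6 knob, by contrast, lives in the inet6_dev that registration has already created.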



Re: [PATCH RFC] tun: dma engine support

2010-10-13 Thread Dan Williams

[ added Maciej ]

On Mon, Oct 11, 2010 at 1:52 PM, Michael S. Tsirkin  wrote:
> Simple hack to use dma engine for tun RX.
> Only one skb in flight at the moment.
>
> Signed-off-by: Michael S. Tsirkin 
> ---
>
> I am still looking at handling multiple skbs, but
> sending this out for early flames and improvement suggestions.
>
> Loopback testing seems to show only minor performance gains:
> this is not really surprising, as the data is already hot in cache.
> Where I would expect this to help more is with incoming
> traffic from an external NIC. This still needs to be tested.

Actually, it is interesting that you did not see a performance loss,
because the DMA engine performs the transfer in memory, so the destination
will be cache-cold in the DMA case compared to a CPU copy.

[..]
> +int tun_dma_copybreak = 0x1;
> +module_param_named(dma_copybreak, tun_dma_copybreak, int, 0644);

What platform are you using for testing?  If this proves beneficial, we
may need to adjust this value to have a platform-specific default,
similar to how ioatdma sets tcp_dma_copybreak depending on the
hardware version.  You will notice that on Nehalem-class hardware
(drivers/dma/ioat/dma_v3.c) the default is set to 256K, effectively
disabling offload, as the overhead of managing the DMA engine
overshadows any offload advantage.
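The copybreak idea can be sketched as a simple threshold test. As a simplification, the sketch assumes offload is chosen only when a copy is strictly larger than the threshold. model_use_dma() and MODEL_NEHALEM_COPYBREAK are hypothetical names, not the tun or ioatdma code; the 256K value is the figure quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* 256K, the Nehalem-class default described above. */
#define MODEL_NEHALEM_COPYBREAK ((size_t)256 * 1024)

/* Hypothetical helper: choose between a CPU copy and a DMA-engine copy.
 * Copies at or below the threshold stay on the CPU, where the fixed cost
 * of submitting a descriptor and waiting for completion would dominate;
 * a CPU copy also leaves the destination hot in cache. */
bool model_use_dma(size_t len, size_t copybreak)
{
	return len > copybreak;
}
```

With a 256K threshold, neither a 1500-byte frame nor a 64K copy crosses the line. This is why such a default effectively disables offload for packet-sized transfers.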

--
Dan
