Re: [PATCH v2 9/9] iomap: Change calling convention for zeroing

2020-09-10 Thread Christoph Hellwig
On Fri, Sep 11, 2020 at 12:47:07AM +0100, Matthew Wilcox (Oracle) wrote:
> Pass the full length to iomap_zero() and dax_iomap_zero(), and have
> them return how many bytes they actually handled.  This is preparatory
> work for handling THP, although it looks like DAX could actually take
> advantage of it if there's a larger contiguous area.

Looks good,

Reviewed-by: Christoph Hellwig 
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH v2 6/9] iomap: Convert read_count to read_bytes_pending

2020-09-10 Thread Christoph Hellwig
On Fri, Sep 11, 2020 at 12:47:04AM +0100, Matthew Wilcox (Oracle) wrote:
> Instead of counting bio segments, count the number of bytes submitted.
> This insulates us from the block layer's definition of what a 'same page'
> is, which is not necessarily clear once THPs are involved.
> 
> Signed-off-by: Matthew Wilcox (Oracle) 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page

2020-09-10 Thread Christoph Hellwig
On Fri, Sep 11, 2020 at 12:47:03AM +0100, Matthew Wilcox (Oracle) wrote:
> Size the uptodate array dynamically to support larger pages in the
> page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> but with a 2MB maximum page size, we'd have to allocate more than 4kB
> per page.  Add a few debugging assertions.
> 
> Signed-off-by: Matthew Wilcox (Oracle) 
> Reviewed-by: Dave Chinner 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v3 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-10 Thread kernel test robot
Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on next-20200909]
[cannot apply to mmotm/master hnaz-linux-mm/master xen-tip/linux-next powerpc/next linus/master v5.9-rc4 v5.9-rc3 v5.9-rc2 v5.9-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/David-Hildenbrand/mm-memory_hotplug-selective-merging-of-system-ram-resources/20200910-171630
base:    7204eaa2c1f509066486f488c9dcb065d7484494
config: x86_64-randconfig-a016-20200909 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 0a5dc7effb191eff740e0e7ae7bd8e1f6bdb3ad9)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   WARNING: unmet direct dependencies detected for PHY_SAMSUNG_UFS
   Depends on OF && (ARCH_EXYNOS || COMPILE_TEST)
   Selected by
   - SCSI_UFS_EXYNOS && SCSI_LOWLEVEL && SCSI && SCSI_UFSHCD_PLATFORM && (ARCH_EXYNOS || COMPILE_TEST)
   In file included from arch/x86/kernel/asm-offsets.c:9:
   In file included from include/linux/crypto.h:20:
   In file included from include/linux/slab.h:15:
   In file included from include/linux/gfp.h:6:
   In file included from include/linux/mmzone.h:853:
>> include/linux/memory_hotplug.h:354:55: error: unknown type name 'mhp_t'
   extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
   ^
   include/linux/memory_hotplug.h:355:53: error: unknown type name 'mhp_t'
   extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
   ^
   include/linux/memory_hotplug.h:357:11: error: unknown type name 'mhp_t'
   mhp_t mhp_flags);
   ^
   include/linux/memory_hotplug.h:360:10: error: unknown type name 'mhp_t'
   mhp_t mhp_flags);
   ^
   4 errors generated.
   Makefile Module.symvers System.map arch block certs crypto drivers fs 
include init ipc kernel lib mm modules.builtin modules.builtin.modinfo 
modules.order net scripts security sound source tools usr virt vmlinux 
vmlinux.o vmlinux.symvers [scripts/Makefile.build:117: 
arch/x86/kernel/asm-offsets.s] Error 1
   Target '__build' not remade because of errors.
   Makefile Module.symvers System.map arch block certs crypto drivers fs 
include init ipc kernel lib mm modules.builtin modules.builtin.modinfo 
modules.order net scripts security sound source tools usr virt vmlinux 
vmlinux.o vmlinux.symvers [Makefile:1196: prepare0] Error 2
   Target 'prepare' not remade because of errors.
   make: Makefile Module.symvers System.map arch block certs crypto drivers fs 
include init ipc kernel lib mm modules.builtin modules.builtin.modinfo 
modules.order net scripts security sound source tools usr virt vmlinux 
vmlinux.o vmlinux.symvers [Makefile:185: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.

# https://github.com/0day-ci/linux/commit/d88270d1c0783a7f99f24a85692be90fd2ae0d7d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review David-Hildenbrand/mm-memory_hotplug-selective-merging-of-system-ram-resources/20200910-171630
git checkout d88270d1c0783a7f99f24a85692be90fd2ae0d7d
vim +/mhp_t +354 include/linux/memory_hotplug.h

   352  
   353  extern void __ref free_area_init_core_hotplug(int nid);
 > 354  extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH] dax: fix for do not print error message for non-persistent memory block device

2020-09-10 Thread Coly Li
On 2020/9/11 04:29, John Pittman wrote:
> But it should be moved prior to the two bdev_dax_pgoff() checks right?
>  Else a misaligned partition on a dax unsupported block device can
> print the below messages.
> 
> kernel: sda1: error: unaligned partition for dax
> kernel: sda2: error: unaligned partition for dax
> kernel: sda3: error: unaligned partition for dax
> 

Aha, yes you are right, I agree with you.

Coly Li


> Reviewed-by: John Pittman 
> 
> On Thu, Sep 3, 2020 at 12:12 PM Coly Li  wrote:
>>
>> On 2020/9/4 00:06, Ira Weiny wrote:
>>> On Thu, Sep 03, 2020 at 07:55:49PM +0800, Coly Li wrote:
 When calling __generic_fsdax_supported(), a dax-unsupported device may
 not have its dax_dev pointer set to NULL, e.g. when the dax-related code
 is not enabled by Kconfig.

 Therefore in __generic_fsdax_supported(), to check whether a device
 supports DAX or not, the following order should be performed,
 - If dax_dev pointer is NULL, it means the device driver explicitly
   announce it doesn't support DAX. Then it is OK to directly return
   false from __generic_fsdax_supported().
 - If dax_dev pointer is NOT NULL, it might be that the driver doesn't
   support DAX but did not explicitly initialize the related data structure.
   Then bdev_dax_supported() should be called for a further check.

 IMHO if a device driver doesn't explicitly set its dax_dev pointer to NULL,
 this is not a bug. Calling bdev_dax_supported() makes sure such devices are
 recognized as dax-unsupported eventually.

 This patch does the following change for the above purpose,
 -   if (!dax_dev && !bdev_dax_supported(bdev, blocksize)) {
 +   if (!dax_dev || !bdev_dax_supported(bdev, blocksize)) {


 Fixes: c2affe920b0e ("dax: do not print error message for non-persistent memory block device")
 Signed-off-by: Coly Li 
>>>
>>> I hate to do this because I realize this is a bug which people really need
>>> fixed.
>>>
>>> However, shouldn't we also check (!dax_dev || !bdev_dax_supported()) as the
>>> _first_ check in __generic_fsdax_supported()?
>>>
>>> It seems like the other pr_info's could also be called when DAX is not
>>> supported and we probably don't want them to be?
>>>
>>> Perhaps that should be a follow on patch though.  So...
>>
>> I am not author of c2affe920b0e, but I guess it was because
>> bdev_dax_supported() needed blocksize, so blocksize should pass previous
>> checks firstly to make sure bdev_dax_supported() has a correct blocksize
>> to check.
>>
>>>
>>> As a direct fix to c2affe920b0e
>>>
>>> Reviewed-by: Ira Weiny 
>>
>> Thanks.
>>
>> Coly Li
>>
[snipped]


[PATCH v2 2/9] fs: Introduce i_blocks_per_page

2020-09-10 Thread Matthew Wilcox (Oracle)
This helper is useful for both THPs and for supporting block size larger
than page size.  Convert all users that I could find (we have a few
different ways of writing this idiom, and I may have missed some).

Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/buffered-io.c  |  8 
 fs/jfs/jfs_metapage.c   |  2 +-
 fs/xfs/xfs_aops.c   |  2 +-
 include/linux/pagemap.h | 16 
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d81a9a86c5aa..330f86b825d7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -46,7 +46,7 @@ iomap_page_create(struct inode *inode, struct page *page)
 {
struct iomap_page *iop = to_iomap_page(page);
 
-   if (iop || i_blocksize(inode) == PAGE_SIZE)
+   if (iop || i_blocks_per_page(inode, page) <= 1)
return iop;
 
iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
@@ -147,7 +147,7 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
unsigned int i;
 
spin_lock_irqsave(&iop->uptodate_lock, flags);
-   for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
+   for (i = 0; i < i_blocks_per_page(inode, page); i++) {
if (i >= first && i <= last)
set_bit(i, iop->uptodate);
else if (!test_bit(i, iop->uptodate))
@@ -1077,7 +1077,7 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
mapping_set_error(inode->i_mapping, -EIO);
}
 
-   WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
+   WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
 
if (!iop || atomic_dec_and_test(&iop->write_count))
@@ -1373,7 +1373,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
int error = 0, count = 0, i;
LIST_HEAD(submit_list);
 
-   WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
+   WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
 
/*
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index a2f5338a5ea1..176580f54af9 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -473,7 +473,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
struct inode *inode = page->mapping->host;
struct bio *bio = NULL;
int block_offset;
-   int blocks_per_page = PAGE_SIZE >> inode->i_blkbits;
+   int blocks_per_page = i_blocks_per_page(inode, page);
sector_t page_start;/* address of page in fs blocks */
sector_t pblock;
int xlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index b35611882ff9..55d126d4e096 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -544,7 +544,7 @@ xfs_discard_page(
page, ip->i_ino, offset);
 
error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-   PAGE_SIZE / i_blocksize(inode));
+   i_blocks_per_page(inode, page));
if (error && !XFS_FORCED_SHUTDOWN(mp))
		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 50d2c39b47ab..f7f602040913 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -975,4 +975,20 @@ static inline int page_mkwrite_check_truncate(struct page *page,
return offset;
 }
 
+/**
+ * i_blocks_per_page - How many blocks fit in this page.
+ * @inode: The inode which contains the blocks.
+ * @page: The page (head page if the page is a THP).
+ *
+ * If the block size is larger than the size of this page, return zero.
+ *
+ * Context: The caller should hold a refcount on the page to prevent it
+ * from being split.
+ * Return: The number of filesystem blocks covered by this page.
+ */
+static inline
+unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
+{
+   return thp_size(page) >> inode->i_blkbits;
+}
 #endif /* _LINUX_PAGEMAP_H */
-- 
2.28.0


[PATCH v2 7/9] iomap: Convert write_count to write_bytes_pending

2020-09-10 Thread Matthew Wilcox (Oracle)
Instead of counting bio segments, count the number of bytes submitted.
This insulates us from the block layer's definition of what a 'same page'
is, which is not necessarily clear once THPs are involved.

Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Christoph Hellwig 
---
 fs/iomap/buffered-io.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1cf976a8e55c..64a5cb383f30 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -27,7 +27,7 @@
  */
 struct iomap_page {
atomic_tread_bytes_pending;
-   atomic_twrite_count;
+   atomic_twrite_bytes_pending;
spinlock_t  uptodate_lock;
unsigned long   uptodate[];
 };
@@ -73,7 +73,7 @@ iomap_page_release(struct page *page)
if (!iop)
return;
WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
-   WARN_ON_ONCE(atomic_read(&iop->write_count));
+   WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
PageUptodate(page));
kfree(iop);
@@ -1047,7 +1047,7 @@ EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
 
 static void
 iomap_finish_page_writeback(struct inode *inode, struct page *page,
-   int error)
+   int error, unsigned int len)
 {
struct iomap_page *iop = to_iomap_page(page);
 
@@ -1057,9 +1057,9 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
}
 
WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
-   WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
+   WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) <= 0);
 
-   if (!iop || atomic_dec_and_test(&iop->write_count))
+   if (!iop || atomic_sub_and_test(len, &iop->write_bytes_pending))
end_page_writeback(page);
 }
 
@@ -1093,7 +1093,8 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 
/* walk each page on bio, ending page IO on them */
bio_for_each_segment_all(bv, bio, iter_all)
-   iomap_finish_page_writeback(inode, bv->bv_page, error);
+   iomap_finish_page_writeback(inode, bv->bv_page, error,
+   bv->bv_len);
bio_put(bio);
}
/* The ioend has been freed by bio_put() */
@@ -1309,8 +1310,8 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
 
merged = __bio_try_merge_page(wpc->ioend->io_bio, page, len, poff,
&same_page);
-   if (iop && !same_page)
-   atomic_inc(&iop->write_count);
+   if (iop)
+   atomic_add(len, &iop->write_bytes_pending);
 
if (!merged) {
if (bio_full(wpc->ioend->io_bio, len)) {
@@ -1353,7 +1354,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
LIST_HEAD(submit_list);
 
WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
-   WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
+   WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
 
/*
 * Walk through the page to find areas to write back. If we run off the
-- 
2.28.0


[PATCH v2 5/9] iomap: Support arbitrarily many blocks per page

2020-09-10 Thread Matthew Wilcox (Oracle)
Size the uptodate array dynamically to support larger pages in the
page cache.  With a 64kB page, we're only saving 8 bytes per page today,
but with a 2MB maximum page size, we'd have to allocate more than 4kB
per page.  Add a few debugging assertions.

Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Dave Chinner 
---
 fs/iomap/buffered-io.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7fc0e02d27b0..9670c096b83e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -22,18 +22,25 @@
 #include "../internal.h"
 
 /*
- * Structure allocated for each page when block size < PAGE_SIZE to track
- * sub-page uptodate status and I/O completions.
+ * Structure allocated for each page or THP when block size < page size
+ * to track sub-page uptodate status and I/O completions.
  */
 struct iomap_page {
atomic_tread_count;
atomic_twrite_count;
spinlock_t  uptodate_lock;
-   DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+   unsigned long   uptodate[];
 };
 
 static inline struct iomap_page *to_iomap_page(struct page *page)
 {
+   /*
+* per-block data is stored in the head page.  Callers should
+* not be dealing with tail pages (and if they are, they can
+* call thp_head() first).
+*/
+   VM_BUG_ON_PGFLAGS(PageTail(page), page);
+
if (page_has_private(page))
return (struct iomap_page *)page_private(page);
return NULL;
@@ -45,11 +52,13 @@ static struct iomap_page *
 iomap_page_create(struct inode *inode, struct page *page)
 {
struct iomap_page *iop = to_iomap_page(page);
+   unsigned int nr_blocks = i_blocks_per_page(inode, page);
 
-   if (iop || i_blocks_per_page(inode, page) <= 1)
+   if (iop || nr_blocks <= 1)
return iop;
 
-   iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
+   iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
+   GFP_NOFS | __GFP_NOFAIL);
spin_lock_init(&iop->uptodate_lock);
attach_page_private(page, iop);
return iop;
@@ -59,11 +68,14 @@ static void
 iomap_page_release(struct page *page)
 {
struct iomap_page *iop = detach_page_private(page);
+   unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
 
if (!iop)
return;
WARN_ON_ONCE(atomic_read(&iop->read_count));
WARN_ON_ONCE(atomic_read(&iop->write_count));
+   WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
+   PageUptodate(page));
kfree(iop);
 }
 
-- 
2.28.0


[PATCH v2 4/9] iomap: Use bitmap ops to set uptodate bits

2020-09-10 Thread Matthew Wilcox (Oracle)
Now that the bitmap is protected by a spinlock, we can use the
more efficient bitmap ops instead of individual test/set bit ops.

Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/buffered-io.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 58a1fd83f2a4..7fc0e02d27b0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -134,19 +134,11 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
struct inode *inode = page->mapping->host;
unsigned first = off >> inode->i_blkbits;
unsigned last = (off + len - 1) >> inode->i_blkbits;
-   bool uptodate = true;
unsigned long flags;
-   unsigned int i;
 
spin_lock_irqsave(&iop->uptodate_lock, flags);
-   for (i = 0; i < i_blocks_per_page(inode, page); i++) {
-   if (i >= first && i <= last)
-   set_bit(i, iop->uptodate);
-   else if (!test_bit(i, iop->uptodate))
-   uptodate = false;
-   }
-
-   if (uptodate)
+   bitmap_set(iop->uptodate, first, last - first + 1);
+   if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
SetPageUptodate(page);
spin_unlock_irqrestore(&iop->uptodate_lock, flags);
 }
-- 
2.28.0


[PATCH v2 0/9] THP iomap patches for 5.10

2020-09-10 Thread Matthew Wilcox (Oracle)
These patches are carefully plucked from the THP series.  I would like
them to hit 5.10 to make the THP patchset merge easier.  Some of these
are just generic improvements that make sense on their own terms, but
the overall intent is to support THPs in iomap.

v2:
 - Move the call to flush_dcache_page (Christoph)
 - Clarify comments (Darrick)
 - Rename read_count to read_bytes_pending (Christoph)
 - Rename write_count to write_bytes_pending (Christoph)
 - Restructure iomap_readpage_actor() (Christoph)
 - Change return type of the zeroing functions from loff_t to s64

Matthew Wilcox (Oracle) (9):
  iomap: Fix misplaced page flushing
  fs: Introduce i_blocks_per_page
  iomap: Use kzalloc to allocate iomap_page
  iomap: Use bitmap ops to set uptodate bits
  iomap: Support arbitrarily many blocks per page
  iomap: Convert read_count to read_bytes_pending
  iomap: Convert write_count to write_bytes_pending
  iomap: Convert iomap_write_end types
  iomap: Change calling convention for zeroing

 fs/dax.c|  13 ++-
 fs/iomap/buffered-io.c  | 173 +---
 fs/jfs/jfs_metapage.c   |   2 +-
 fs/xfs/xfs_aops.c   |   2 +-
 include/linux/dax.h |   3 +-
 include/linux/pagemap.h |  16 
 6 files changed, 96 insertions(+), 113 deletions(-)

-- 
2.28.0


[PATCH v2 9/9] iomap: Change calling convention for zeroing

2020-09-10 Thread Matthew Wilcox (Oracle)
Pass the full length to iomap_zero() and dax_iomap_zero(), and have
them return how many bytes they actually handled.  This is preparatory
work for handling THP, although it looks like DAX could actually take
advantage of it if there's a larger contiguous area.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 fs/dax.c   | 13 ++---
 fs/iomap/buffered-io.c | 33 +++--
 include/linux/dax.h|  3 +--
 3 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 994ab66a9907..6ad346352a8c 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1037,18 +1037,18 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
return ret;
 }
 
-int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
-  struct iomap *iomap)
+s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
pgoff_t pgoff;
long rc, id;
void *kaddr;
bool page_aligned = false;
-
+   unsigned offset = offset_in_page(pos);
+   unsigned size = min_t(u64, PAGE_SIZE - offset, length);
 
if (IS_ALIGNED(sector << SECTOR_SHIFT, PAGE_SIZE) &&
-   IS_ALIGNED(size, PAGE_SIZE))
+   (size == PAGE_SIZE))
page_aligned = true;
 
rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
@@ -1058,8 +1058,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
id = dax_read_lock();
 
if (page_aligned)
-   rc = dax_zero_page_range(iomap->dax_dev, pgoff,
-size >> PAGE_SHIFT);
+   rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
else
rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
if (rc < 0) {
@@ -1072,7 +1071,7 @@ int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
dax_flush(iomap->dax_dev, kaddr + offset, size);
}
dax_read_unlock(id);
-   return 0;
+   return size;
 }
 
 static loff_t
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index cb25a7b70401..3e1eb40a73fd 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -898,11 +898,13 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 }
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
-static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
-   unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
+static s64 iomap_zero(struct inode *inode, loff_t pos, u64 length,
+   struct iomap *iomap, struct iomap *srcmap)
 {
struct page *page;
int status;
+   unsigned offset = offset_in_page(pos);
+   unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
 
status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
if (status)
@@ -914,38 +916,33 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
 }
 
-static loff_t
-iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
-   void *data, struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
+   loff_t length, void *data, struct iomap *iomap,
+   struct iomap *srcmap)
 {
bool *did_zero = data;
loff_t written = 0;
-   int status;
 
/* already zeroed?  we're done. */
if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
-   return count;
+   return length;
 
do {
-   unsigned offset, bytes;
-
-   offset = offset_in_page(pos);
-   bytes = min_t(loff_t, PAGE_SIZE - offset, count);
+   s64 bytes;
 
if (IS_DAX(inode))
-   status = dax_iomap_zero(pos, offset, bytes, iomap);
+   bytes = dax_iomap_zero(pos, length, iomap);
else
-   status = iomap_zero(inode, pos, offset, bytes, iomap,
-   srcmap);
-   if (status < 0)
-   return status;
+   bytes = iomap_zero(inode, pos, length, iomap, srcmap);
+   if (bytes < 0)
+   return bytes;
 
pos += bytes;
-   count -= bytes;
+   length -= bytes;
written += bytes;
if (did_zero)
*did_zero = true;
-   } while (count > 0);
+   } while (length > 0);
 
return written;
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 6904d4e0b2e0..951a851a0481 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -214,8 +214,7 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
 int dax_delete_mapping_entry(struct ad

[PATCH v2 1/9] iomap: Fix misplaced page flushing

2020-09-10 Thread Matthew Wilcox (Oracle)
If iomap_unshare_actor() unshares to an inline iomap, the page was
not being flushed.  block_write_end() and __iomap_write_end() already
contain flushes, so adding it to iomap_write_end_inline() seems like
the best place.  That means we can remove it from iomap_write_actor().

Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Dave Chinner 
Reviewed-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
---
 fs/iomap/buffered-io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 897ab9a26a74..d81a9a86c5aa 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -717,6 +717,7 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
WARN_ON_ONCE(!PageUptodate(page));
BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
+   flush_dcache_page(page);
addr = kmap_atomic(page);
memcpy(iomap->inline_data + pos, addr + pos, copied);
kunmap_atomic(addr);
@@ -810,8 +811,6 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
 
-   flush_dcache_page(page);
-
status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
srcmap);
if (unlikely(status < 0))
-- 
2.28.0


[PATCH v2 3/9] iomap: Use kzalloc to allocate iomap_page

2020-09-10 Thread Matthew Wilcox (Oracle)
We can skip most of the initialisation, although spinlocks still
need explicit initialisation as architectures may use a non-zero
value to indicate unlocked.  The comment is no longer useful as
attach_page_private() handles the refcount now.

Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/buffered-io.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 330f86b825d7..58a1fd83f2a4 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -49,16 +49,8 @@ iomap_page_create(struct inode *inode, struct page *page)
if (iop || i_blocks_per_page(inode, page) <= 1)
return iop;
 
-   iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
-   atomic_set(&iop->read_count, 0);
-   atomic_set(&iop->write_count, 0);
+   iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
spin_lock_init(&iop->uptodate_lock);
-   bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
-
-   /*
-* migrate_page_move_mapping() assumes that pages with private data have
-* their count elevated by 1.
-*/
attach_page_private(page, iop);
return iop;
 }
-- 
2.28.0


[PATCH v2 8/9] iomap: Convert iomap_write_end types

2020-09-10 Thread Matthew Wilcox (Oracle)
iomap_write_end cannot return an error, so switch it to return
size_t instead of int and remove the error checking from the callers.
Also convert the arguments to size_t from unsigned int, in case anyone
ever wants to support a page size larger than 2GB.

Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Christoph Hellwig 
---
 fs/iomap/buffered-io.c | 31 ---
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 64a5cb383f30..cb25a7b70401 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -663,9 +663,8 @@ iomap_set_page_dirty(struct page *page)
 }
 EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
 
-static int
-__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-   unsigned copied, struct page *page)
+static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
+   size_t copied, struct page *page)
 {
flush_dcache_page(page);
 
@@ -687,9 +686,8 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
return copied;
 }
 
-static int
-iomap_write_end_inline(struct inode *inode, struct page *page,
-   struct iomap *iomap, loff_t pos, unsigned copied)
+static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
+   struct iomap *iomap, loff_t pos, size_t copied)
 {
void *addr;
 
@@ -705,13 +703,14 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
return copied;
 }
 
-static int
-iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
-   struct page *page, struct iomap *iomap, struct iomap *srcmap)
+/* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
+static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
+   size_t copied, struct page *page, struct iomap *iomap,
+   struct iomap *srcmap)
 {
const struct iomap_page_ops *page_ops = iomap->page_ops;
loff_t old_size = inode->i_size;
-   int ret;
+   size_t ret;
 
if (srcmap->type == IOMAP_INLINE) {
ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
@@ -790,11 +789,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
 
-   status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
+   copied = iomap_write_end(inode, pos, bytes, copied, page, iomap,
srcmap);
-   if (unlikely(status < 0))
-   break;
-   copied = status;
 
cond_resched();
 
@@ -868,11 +864,8 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
srcmap);
-   if (unlikely(status <= 0)) {
-   if (WARN_ON_ONCE(status == 0))
-   return -EIO;
-   return status;
-   }
+   if (WARN_ON_ONCE(status == 0))
+   return -EIO;
 
cond_resched();
 
-- 
2.28.0


[PATCH v2 6/9] iomap: Convert read_count to read_bytes_pending

2020-09-10 Thread Matthew Wilcox (Oracle)
Instead of counting bio segments, count the number of bytes submitted.
This insulates us from the block layer's definition of what a 'same page'
is, which is not necessarily clear once THPs are involved.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 fs/iomap/buffered-io.c | 41 -
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9670c096b83e..1cf976a8e55c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -26,7 +26,7 @@
  * to track sub-page uptodate status and I/O completions.
  */
 struct iomap_page {
-   atomic_t        read_count;
+   atomic_t        read_bytes_pending;
    atomic_t        write_count;
spinlock_t  uptodate_lock;
unsigned long   uptodate[];
@@ -72,7 +72,7 @@ iomap_page_release(struct page *page)
 
if (!iop)
return;
-   WARN_ON_ONCE(atomic_read(&iop->read_count));
+   WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
WARN_ON_ONCE(atomic_read(&iop->write_count));
WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
PageUptodate(page));
@@ -167,13 +167,6 @@ iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
SetPageUptodate(page);
 }
 
-static void
-iomap_read_finish(struct iomap_page *iop, struct page *page)
-{
-   if (!iop || atomic_dec_and_test(&iop->read_count))
-   unlock_page(page);
-}
-
 static void
 iomap_read_page_end_io(struct bio_vec *bvec, int error)
 {
@@ -187,7 +180,8 @@ iomap_read_page_end_io(struct bio_vec *bvec, int error)
iomap_set_range_uptodate(page, bvec->bv_offset, bvec->bv_len);
}
 
-   iomap_read_finish(iop, page);
+   if (!iop || atomic_sub_and_test(bvec->bv_len, &iop->read_bytes_pending))
+   unlock_page(page);
 }
 
 static void
@@ -267,30 +261,19 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
}
 
ctx->cur_page_in_bio = true;
+   if (iop)
+   atomic_add(plen, &iop->read_bytes_pending);
 
-   /*
-* Try to merge into a previous segment if we can.
-*/
+   /* Try to merge into a previous segment if we can */
sector = iomap_sector(iomap, pos);
-   if (ctx->bio && bio_end_sector(ctx->bio) == sector)
+   if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
+   if (__bio_try_merge_page(ctx->bio, page, plen, poff,
+   &same_page))
+   goto done;
is_contig = true;
-
-   if (is_contig &&
-   __bio_try_merge_page(ctx->bio, page, plen, poff, &same_page)) {
-   if (!same_page && iop)
-   atomic_inc(&iop->read_count);
-   goto done;
}
 
-   /*
-* If we start a new segment we need to increase the read count, and we
-* need to do so before submitting any previous full bio to make sure
-* that we don't prematurely unlock the page.
-*/
-   if (iop)
-   atomic_inc(&iop->read_count);
-
-   if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
+   if (!is_contig || bio_full(ctx->bio, plen)) {
gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
gfp_t orig_gfp = gfp;
int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
-- 
2.28.0
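[Editor's sketch] The byte-based accounting above can be sketched in plain C11. struct page_state and its helpers are userspace stand-ins for struct iomap_page (the names and layout are assumptions of this sketch, not kernel API):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct iomap_page: the page is unlocked only
 * once every submitted byte has completed, independent of how the
 * block layer split or merged the underlying bio segments. */
struct page_state {
        atomic_size_t read_bytes_pending;
        bool locked;
};

static void submit_range(struct page_state *p, size_t plen)
{
        p->locked = true;
        atomic_fetch_add(&p->read_bytes_pending, plen);
}

/* Completion side; returns true once the page has been unlocked.
 * fetch_sub() returns the old value, so old == plen means this was
 * the last outstanding range (the atomic_sub_and_test() pattern). */
static bool complete_range(struct page_state *p, size_t plen)
{
        if (atomic_fetch_sub(&p->read_bytes_pending, plen) == plen)
                p->locked = false;
        return !p->locked;
}
```

Because completion is keyed on bytes rather than bio segments, it no longer matters how the block layer defines a "same page", which is the point of the patch.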


Re: [PATCH] dax: fix for do not print error message for non-persistent memory block device

2020-09-10 Thread John Pittman
But it should be moved prior to the two bdev_dax_pgoff() checks, right?
Else a misaligned partition on a dax-unsupported block device can
print the below messages.

kernel: sda1: error: unaligned partition for dax
kernel: sda2: error: unaligned partition for dax
kernel: sda3: error: unaligned partition for dax

Reviewed-by: John Pittman 

On Thu, Sep 3, 2020 at 12:12 PM Coly Li  wrote:
>
> On 2020/9/4 00:06, Ira Weiny wrote:
> > On Thu, Sep 03, 2020 at 07:55:49PM +0800, Coly Li wrote:
> >> When calling __generic_fsdax_supported(), a dax-unsupported device may
> >> not have dax_dev as NULL, e.g. the dax related code block is not enabled
> >> by Kconfig.
> >>
> >> Therefore in __generic_fsdax_supported(), to check whether a device
> >> supports DAX or not, the following order should be performed,
> >> - If dax_dev pointer is NULL, it means the device driver explicitly
> >>   announce it doesn't support DAX. Then it is OK to directly return
> >>   false from __generic_fsdax_supported().
> >> - If dax_dev pointer is NOT NULL, it might be because the driver doesn't
> >>   support DAX and not explicitly initialize related data structure. Then
> >>   bdev_dax_supported() should be called for further check.
> >>
> >> IMHO if a device driver doesn't explicitly set its dax_dev pointer to NULL,
> >> this is not a bug. Calling bdev_dax_supported() makes sure they can be
> >> recognized as dax-unsupported eventually.
> >>
> >> This patch does the following change for the above purpose,
> >> -   if (!dax_dev && !bdev_dax_supported(bdev, blocksize)) {
> >> +   if (!dax_dev || !bdev_dax_supported(bdev, blocksize)) {
> >>
> >>
> >> Fixes: c2affe920b0e ("dax: do not print error message for non-persistent memory block device")
> >> Signed-off-by: Coly Li 
> >
> > I hate to do this because I realize this is a bug which people really need
> > fixed.
> >
> > However, shouldn't we also check (!dax_dev || !bdev_dax_supported()) as the
> > _first_ check in __generic_fsdax_supported()?
> >
> > It seems like the other pr_info's could also be called when DAX is not
> > supported and we probably don't want them to be?
> >
> > Perhaps that should be a follow on patch though.  So...
>
> I am not the author of c2affe920b0e, but I guess it was because
> bdev_dax_supported() needed blocksize, so blocksize should pass the
> previous checks first to make sure bdev_dax_supported() has a correct
> blocksize to check.
>
> >
> > As a direct fix to c2affe920b0e
> >
> > Reviewed-by: Ira Weiny 
>
> Thanks.
>
> Coly Li
>
>
> >
> >> Cc: Adrian Huang 
> >> Cc: Ira Weiny 
> >> Cc: Jan Kara 
> >> Cc: Mike Snitzer 
> >> Cc: Pankaj Gupta 
> >> Cc: Vishal Verma 
> >> ---
> >>  drivers/dax/super.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> >> index 32642634c1bb..e5767c83ea23 100644
> >> --- a/drivers/dax/super.c
> >> +++ b/drivers/dax/super.c
> >> @@ -100,7 +100,7 @@ bool __generic_fsdax_supported(struct dax_device *dax_dev,
> >>  return false;
> >>  }
> >>
> >> -if (!dax_dev && !bdev_dax_supported(bdev, blocksize)) {
> >> +if (!dax_dev || !bdev_dax_supported(bdev, blocksize)) {
> >>  pr_debug("%s: error: dax unsupported by block device\n",
> >>  bdevname(bdev, buf));
> >>  return false;
> >> --
> >> 2.26.2
> >>
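[Editor's sketch] The change under discussion is the `&&` to `||` in the condition. A minimal sketch of why it matters, with rejected_old(), rejected_new() and probe_supported() as hypothetical stand-ins for the kernel checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* probe_supported() stands in for bdev_dax_supported(); a NULL
 * dax_dev models a driver that explicitly opted out of DAX. */
static bool probe_supported(bool device_can_dax)
{
        return device_can_dax;
}

/* Old condition (!dax_dev && !supported): rejects only when BOTH the
 * pointer is NULL AND the probe fails, so a device with a non-NULL
 * dax_dev that cannot do DAX slips through. */
static bool rejected_old(const void *dax_dev, bool device_can_dax)
{
        return dax_dev == NULL && !probe_supported(device_can_dax);
}

/* Fixed condition (!dax_dev || !supported): rejects when EITHER the
 * driver opted out OR the probe fails. */
static bool rejected_new(const void *dax_dev, bool device_can_dax)
{
        return dax_dev == NULL || !probe_supported(device_can_dax);
}
```

The short-circuit in the fixed condition also preserves the property that an explicit NULL dax_dev returns false without ever running the probe.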


Re: [PATCH] powerpc/papr_scm: Fix warning triggered by perf_stats_show()

2020-09-10 Thread Ira Weiny
On Thu, Sep 10, 2020 at 02:52:12PM +0530, Vaibhav Jain wrote:
> A warning is reported by the kernel in case perf_stats_show() returns
> an error code. The warning is of the form below:
> 
>  papr_scm ibm,persistent-memory:ibm,pmemory@4411:
> Failed to query performance stats, Err:-10
>  dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
>  fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count
> 
> On investigation it looks like the compiler is silently truncating the
> return value of drc_pmem_query_stats() from 'long' to 'int', since the
> variable used to store the return code, 'rc', is an 'int'. This
> truncated value is then returned as a 'ssize_t' from perf_stats_show()
> to dev_attr_show(), which treats it as a large unsigned number and
> triggers this warning.
> 
> To fix this we update the type of the variable 'rc' from 'int' to
> 'ssize_t', which prevents the compiler from truncating the return value
> of drc_pmem_query_stats() and returns the correct signed value from
> perf_stats_show().
> 
> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance stats from PHYP')
> Signed-off-by: Vaibhav Jain 
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index a88a707a608aa..9f00b61676ab9 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -785,7 +785,8 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  static ssize_t perf_stats_show(struct device *dev,
>  struct device_attribute *attr, char *buf)
>  {
> - int index, rc;
> + int index;
> + ssize_t rc;

I'm not sure this is really fixing everything here.

drc_pmem_query_stats() can return negative errno's.  Why are those not checked
somewhere in perf_stats_show()?

It seems like all this fix is handling is a > 0 return value: 'ret[0]' from
line 289 in papr_scm.c...  Or something?

Worse yet, drc_pmem_query_stats() is returning ssize_t, which is a signed value.
Therefore, it should not be returning -errno.  I'm surprised the static
checkers did not catch that.

I believe I caught similar errors with a patch series before which did not pay
attention to variable types.

Please audit this code for these types of errors and ensure you are really
doing the correct thing when using the sysfs interface.  I'm pretty sure bad
things will eventually happen (if they are not already) if you return some
really big number to the sysfs core from *_show().

Ira

>   struct seq_buf s;
>   struct papr_scm_perf_stat *stat;
>   struct papr_scm_perf_stats *stats;
> -- 
> 2.26.2
> 
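[Editor's sketch] The narrowing hazard described here can be reproduced in a few lines of userspace C. query_stats(), show_buggy() and show_fixed() are hypothetical stand-ins that only illustrate a 'long' being squeezed through an 'int' on an LP64 target; the real kernel code path differs:

```c
#include <assert.h>
#include <limits.h>
#include <sys/types.h>          /* ssize_t */

/* Hypothetical stat query: a negative errno on failure, or a byte
 * count that can exceed INT_MAX on success. */
static long query_stats(int fail)
{
        return fail ? -10L : 3L * 1024 * 1024 * 1024;   /* 3 GiB */
}

/* Buggy pattern: the wide return value passes through an 'int'. */
static ssize_t show_buggy(int fail)
{
        int rc = (int)query_stats(fail);        /* silent narrowing */

        return rc;      /* a large count comes out garbled */
}

/* Fixed pattern: keep ssize_t all the way through, as in the patch. */
static ssize_t show_fixed(int fail)
{
        ssize_t rc = query_stats(fail);

        return rc;
}
```

Small negative errnos happen to survive the round trip through 'int', which is why this class of bug tends to show up only once a value outside the 'int' range is returned.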




Re: [PATCH v2] powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute

2020-09-10 Thread Michael Ellerman
On Mon, 7 Sep 2020 16:35:40 +0530, Vaibhav Jain wrote:
> The newly introduced 'perf_stats' attribute uses the default access
> mode of 0444, letting non-root users access performance stats of an
> nvdimm and potentially forcing the kernel into issuing a large number
> of expensive HCALLs. Since the information exposed by this attribute
> cannot be cached, it's better to ward off access to this attribute
> from users who don't need these performance statistics.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute
  https://git.kernel.org/powerpc/c/0460534b532e5518c657c7d6492b9337d975eaa3

cheers


Re: [PATCH v3 5/7] virtio-mem: try to merge system ram resources

2020-09-10 Thread Pankaj Gupta
Reviewed-by: Pankaj Gupta 


Re: [PATCH v3 4/7] mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources

2020-09-10 Thread Pankaj Gupta
Looks good to me.

Reviewed-by: Pankaj Gupta 


Re: [PATCH v3 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-10 Thread Pankaj Gupta
> We soon want to pass flags, e.g., to mark added System RAM resources
> mergeable. Prepare for that.
>
> This patch is based on a similar patch by Oscar Salvador:
>
> https://lkml.kernel.org/r/20190625075227.15193-3-osalva...@suse.de
>
> Acked-by: Wei Liu 
> Reviewed-by: Juergen Gross  # Xen related part
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Dan Williams 
> Cc: Jason Gunthorpe 
> Cc: Pankaj Gupta 
> Cc: Baoquan He 
> Cc: Wei Yang 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: "Rafael J. Wysocki" 
> Cc: Len Brown 
> Cc: Greg Kroah-Hartman 
> Cc: Vishal Verma 
> Cc: Dave Jiang 
> Cc: "K. Y. Srinivasan" 
> Cc: Haiyang Zhang 
> Cc: Stephen Hemminger 
> Cc: Wei Liu 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> Cc: David Hildenbrand 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> Cc: Stefano Stabellini 
> Cc: "Oliver O'Halloran" 
> Cc: Pingfan Liu 
> Cc: Nathan Lynch 
> Cc: Libor Pechacek 
> Cc: Anton Blanchard 
> Cc: Leonardo Bras 
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-a...@vger.kernel.org
> Cc: linux-nvdimm@lists.01.org
> Cc: linux-hyp...@vger.kernel.org
> Cc: linux-s...@vger.kernel.org
> Cc: virtualizat...@lists.linux-foundation.org
> Cc: xen-de...@lists.xenproject.org
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/platforms/powernv/memtrace.c   |  2 +-
>  arch/powerpc/platforms/pseries/hotplug-memory.c |  2 +-
>  drivers/acpi/acpi_memhotplug.c  |  3 ++-
>  drivers/base/memory.c   |  3 ++-
>  drivers/dax/kmem.c  |  2 +-
>  drivers/hv/hv_balloon.c |  2 +-
>  drivers/s390/char/sclp_cmd.c|  2 +-
>  drivers/virtio/virtio_mem.c |  2 +-
>  drivers/xen/balloon.c   |  2 +-
>  include/linux/memory_hotplug.h  | 16 
>  mm/memory_hotplug.c | 14 +++---
>  11 files changed, 30 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index 13b369d2cc454..6828108486f83 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -224,7 +224,7 @@ static int memtrace_online(void)
> ent->mem = 0;
> }
>
> -   if (add_memory(ent->nid, ent->start, ent->size)) {
> +   if (add_memory(ent->nid, ent->start, ent->size, MHP_NONE)) {
> pr_err("Failed to add trace memory to node %d\n",
> ent->nid);
> ret += 1;
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 0ea976d1cac47..e1c9fa0d730f5 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -615,7 +615,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> nid = memory_add_physaddr_to_nid(lmb->base_addr);
>
> /* Add the memory */
> -   rc = __add_memory(nid, lmb->base_addr, block_sz);
> +   rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
> if (rc) {
> invalidate_lmb_associativity_index(lmb);
> return rc;
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index e294f44a78504..2067c3bc55763 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -207,7 +207,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
> if (node < 0)
> node = memory_add_physaddr_to_nid(info->start_addr);
>
> -   result = __add_memory(node, info->start_addr, info->length);
> +   result = __add_memory(node, info->start_addr, info->length,
> + MHP_NONE);
>
> /*
>  * If the memory block has been used by the kernel, add_memory()
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 4db3c660de831..b4c297dd04755 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -432,7 +432,8 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
>
> nid = memory_add_physaddr_to_nid(phys_addr);
> ret = __add_memory(nid, phys_addr,
> -  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
> +  MIN_MEMORY_BLOCK_SIZE * sections_per_block,
> +  MHP_NONE);
>
> if (ret)
> goto out;
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 7dcb2902e9b1b..896cb9444e727 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -95,7 +95,7 @@ int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>

[PATCH] powerpc/papr_scm: Fix warning triggered by perf_stats_show()

2020-09-10 Thread Vaibhav Jain
A warning is reported by the kernel in case perf_stats_show() returns
an error code. The warning is of the form below:

 papr_scm ibm,persistent-memory:ibm,pmemory@4411:
  Failed to query performance stats, Err:-10
 dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
 fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count

On investigation it looks like the compiler is silently truncating the
return value of drc_pmem_query_stats() from 'long' to 'int', since the
variable used to store the return code, 'rc', is an 'int'. This
truncated value is then returned as a 'ssize_t' from perf_stats_show()
to dev_attr_show(), which treats it as a large unsigned number and
triggers this warning.

To fix this we update the type of the variable 'rc' from 'int' to
'ssize_t', which prevents the compiler from truncating the return value
of drc_pmem_query_stats() and returns the correct signed value from
perf_stats_show().

Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance stats from PHYP')
Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index a88a707a608aa..9f00b61676ab9 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -785,7 +785,8 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
 static ssize_t perf_stats_show(struct device *dev,
   struct device_attribute *attr, char *buf)
 {
-   int index, rc;
+   int index;
+   ssize_t rc;
struct seq_buf s;
struct papr_scm_perf_stat *stat;
struct papr_scm_perf_stats *stats;
-- 
2.26.2


[PATCH v3 5/7] virtio-mem: try to merge system ram resources

2020-09-10 Thread David Hildenbrand
virtio-mem adds memory in memory block granularity, to be able to
remove it in the same granularity again later, and to grow slowly on
demand. This, however, results in quite a lot of resources when
adding a lot of memory. Resources are effectively stored in a list-based
tree. Having a lot of resources not only wastes memory, it also makes
traversing that tree more expensive, and makes /proc/iomem explode in
size (e.g., requiring kexec-tools to manually merge resources later
when trying to create a kdump header).

Before this patch, we get (/proc/iomem) when hotplugging 2G via virtio-mem
on x86-64:
[...]
1-13fff : System RAM
14000-33fff : virtio0
  14000-147ff : System RAM (virtio_mem)
  14800-14fff : System RAM (virtio_mem)
  15000-157ff : System RAM (virtio_mem)
  15800-15fff : System RAM (virtio_mem)
  16000-167ff : System RAM (virtio_mem)
  16800-16fff : System RAM (virtio_mem)
  17000-177ff : System RAM (virtio_mem)
  17800-17fff : System RAM (virtio_mem)
  18000-187ff : System RAM (virtio_mem)
  18800-18fff : System RAM (virtio_mem)
  19000-197ff : System RAM (virtio_mem)
  19800-19fff : System RAM (virtio_mem)
  1a000-1a7ff : System RAM (virtio_mem)
  1a800-1afff : System RAM (virtio_mem)
  1b000-1b7ff : System RAM (virtio_mem)
  1b800-1bfff : System RAM (virtio_mem)
328000-32 : PCI Bus :00

With this patch, we get (/proc/iomem):
[...]
fffc- : Reserved
1-13fff : System RAM
14000-33fff : virtio0
  14000-1bfff : System RAM (virtio_mem)
328000-32 : PCI Bus :00

Of course, with more hotplugged memory, it gets worse. When unplugging
memory blocks again, try_remove_memory() (via
offline_and_remove_memory()) will properly split the resource up again.

Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Michael S. Tsirkin 
Cc: Jason Wang 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 drivers/virtio/virtio_mem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index ed99e43354010..ba4de598f6636 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -424,7 +424,8 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 
dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
return add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
-vm->resource_name, MHP_NONE);
+vm->resource_name,
+MEMHP_MERGE_RESOURCE);
 }
 
 /*
-- 
2.26.2


[PATCH v3 6/7] xen/balloon: try to merge system ram resources

2020-09-10 Thread David Hildenbrand
Let's try to merge system ram resources we add, to minimize the number
of resources in /proc/iomem. We don't care about the boundaries of
individual chunks we added.

Reviewed-by: Juergen Gross 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Roger Pau Monné 
Cc: Julien Grall 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 drivers/xen/balloon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 9f40a294d398d..b57b2067ecbfb 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -331,7 +331,7 @@ static enum bp_state reserve_additional_memory(void)
mutex_unlock(&balloon_mutex);
/* add_memory_resource() requires the device_hotplug lock */
lock_device_hotplug();
-   rc = add_memory_resource(nid, resource, MHP_NONE);
+   rc = add_memory_resource(nid, resource, MEMHP_MERGE_RESOURCE);
unlock_device_hotplug();
mutex_lock(&balloon_mutex);
 
-- 
2.26.2


[PATCH v3 4/7] mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources

2020-09-10 Thread David Hildenbrand
Some add_memory*() users add memory in small, contiguous memory blocks.
Examples include virtio-mem, hyper-v balloon, and the XEN balloon.

This can quickly result in a lot of memory resources, whereby the actual
resource boundaries are not of interest (e.g., it might be relevant for
DIMMs, exposed via /proc/iomem to user space). We really want to merge
added resources in this scenario where possible.

Let's provide a flag (MEMHP_MERGE_RESOURCE) to specify that a resource
either created within add_memory*() or passed via add_memory_resource()
shall be marked mergeable and merged with applicable siblings.

To implement that, we need a kernel/resource interface to mark selected
System RAM resources mergeable (IORESOURCE_SYSRAM_MERGEABLE) and trigger
merging.

Note: We really want to merge after the whole operation succeeded, not
directly when adding a resource to the resource tree (it would break
add_memory_resource() and require splitting resources again when the
operation failed - e.g., due to -ENOMEM).

Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Ard Biesheuvel 
Cc: Thomas Gleixner 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Wei Liu 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Roger Pau Monné 
Cc: Julien Grall 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 include/linux/ioport.h |  4 +++
 include/linux/memory_hotplug.h |  7 
 kernel/resource.c  | 60 ++
 mm/memory_hotplug.c|  7 
 4 files changed, 78 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index d7620d7c941a0..7e61389dcb017 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -60,6 +60,7 @@ struct resource {
 
 /* IORESOURCE_SYSRAM specific bits. */
#define IORESOURCE_SYSRAM_DRIVER_MANAGED   0x0200 /* Always detected via a driver. */
+#define IORESOURCE_SYSRAM_MERGEABLE        0x0400 /* Resource can be merged. */
 
#define IORESOURCE_EXCLUSIVE   0x0800  /* Userland may not map this resource */
 
@@ -253,6 +254,9 @@ extern void __release_region(struct resource *, resource_size_t,
 extern void release_mem_region_adjustable(struct resource *, resource_size_t,
  resource_size_t);
 #endif
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern void merge_system_ram_resource(struct resource *res);
+#endif
 
 /* Wrappers for managed devices */
 struct device;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e53d1058f3443..869a59006cd8e 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -62,6 +62,13 @@ typedef int __bitwise mhp_t;
 
 /* No special request */
 #define MHP_NONE   ((__force mhp_t)0)
+/*
+ * Allow merging of the added System RAM resource with adjacent,
+ * mergeable resources. After a successful call to add_memory_resource()
+ * with this flag set, the resource pointer must no longer be used as it
+ * might be stale, or the resource might have changed.
+ */
+#define MEMHP_MERGE_RESOURCE   ((__force mhp_t)BIT(0))
 
 /*
  * Extended parameters for memory hotplug:
diff --git a/kernel/resource.c b/kernel/resource.c
index 36b3552210120..7a91b935f4c20 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1363,6 +1363,66 @@ void release_mem_region_adjustable(struct resource *parent,
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+static bool system_ram_resources_mergeable(struct resource *r1,
+  struct resource *r2)
+{
+   /* We assume either r1 or r2 is IORESOURCE_SYSRAM_MERGEABLE. */
+   return r1->flags == r2->flags && r1->end + 1 == r2->start &&
+  r1->name == r2->name && r1->desc == r2->desc &&
+  !r1->child && !r2->child;
+}
+
+/*
+ * merge_system_ram_resource - mark the System RAM resource mergeable and try to
+ * merge it with adjacent, mergeable resources
+ * @res: resource descriptor
+ *
+ * This interface is intended for memory hotplug, whereby lots of contiguous
+ * system ram resources are added (e.g., via add_memory*()) by a driver, and
+ * the actual resource boundaries are not of interest (e.g., it might be
+ * relevant for DIMMs). Only resources that are marked mergeable, that have the
+ * same parent, and that don't have any children are considered. All mergeable
+ * resources must be immutable during the request.
+ *
+ * Note:
+ * - The caller has to make sure that no pointers to resources that are
+ *   marked mergeable are used anymore after this call - the resource might
+ *   be freed and the pointer might be stale!
+ * - release_mem_region_adjustable() will split on demand on memory hotunplug
+ */
+void merge_system_ram_resource(struct resource *res)
+{
+   const unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+   struct
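[Editor's sketch] The merge predicate above reduces to: identical metadata, no children, and r1 ending exactly one byte before r2 starts. A minimal userspace sketch (struct res and try_merge() are illustrative, not the kernel's struct resource API):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for struct resource: inclusive [start, end],
 * plus the metadata the kernel predicate compares. */
struct res {
        unsigned long start, end;       /* inclusive, like struct resource */
        unsigned long flags;
        const char *name;
};

/* Mirror of system_ram_resources_mergeable(): same flags, same name
 * (compared by pointer, exactly as the kernel does), and r1 ending
 * one byte before r2 starts. */
static bool mergeable(const struct res *r1, const struct res *r2)
{
        return r1->flags == r2->flags && r1->end + 1 == r2->start &&
               r1->name == r2->name;
}

static bool try_merge(struct res *r1, const struct res *r2)
{
        if (!mergeable(r1, r2))
                return false;
        r1->end = r2->end;      /* absorb r2 into r1 */
        return true;
}
```

Comparing names by pointer identity mirrors the kernel check; it works because all chunks added by one driver share a single resource-name string (e.g. vm->resource_name in virtio-mem).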

[PATCH v3 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-10 Thread David Hildenbrand
We soon want to pass flags, e.g., to mark added System RAM resources
mergeable. Prepare for that.

This patch is based on a similar patch by Oscar Salvador:

https://lkml.kernel.org/r/20190625075227.15193-3-osalva...@suse.de

Acked-by: Wei Liu 
Reviewed-by: Juergen Gross  # Xen related part
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Jason Gunthorpe 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Greg Kroah-Hartman 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Wei Liu 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: David Hildenbrand 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: "Oliver O'Halloran" 
Cc: Pingfan Liu 
Cc: Nathan Lynch 
Cc: Libor Pechacek 
Cc: Anton Blanchard 
Cc: Leonardo Bras 
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-a...@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-hyp...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Signed-off-by: David Hildenbrand 
---
 arch/powerpc/platforms/powernv/memtrace.c   |  2 +-
 arch/powerpc/platforms/pseries/hotplug-memory.c |  2 +-
 drivers/acpi/acpi_memhotplug.c  |  3 ++-
 drivers/base/memory.c   |  3 ++-
 drivers/dax/kmem.c  |  2 +-
 drivers/hv/hv_balloon.c |  2 +-
 drivers/s390/char/sclp_cmd.c|  2 +-
 drivers/virtio/virtio_mem.c |  2 +-
 drivers/xen/balloon.c   |  2 +-
 include/linux/memory_hotplug.h  | 16 
 mm/memory_hotplug.c | 14 +++---
 11 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 13b369d2cc454..6828108486f83 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -224,7 +224,7 @@ static int memtrace_online(void)
ent->mem = 0;
}
 
-   if (add_memory(ent->nid, ent->start, ent->size)) {
+   if (add_memory(ent->nid, ent->start, ent->size, MHP_NONE)) {
pr_err("Failed to add trace memory to node %d\n",
ent->nid);
ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 0ea976d1cac47..e1c9fa0d730f5 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -615,7 +615,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
/* Add the memory */
-   rc = __add_memory(nid, lmb->base_addr, block_sz);
+   rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
if (rc) {
invalidate_lmb_associativity_index(lmb);
return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index e294f44a78504..2067c3bc55763 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -207,7 +207,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
-   result = __add_memory(node, info->start_addr, info->length);
+   result = __add_memory(node, info->start_addr, info->length,
+ MHP_NONE);
 
/*
 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 4db3c660de831..b4c297dd04755 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -432,7 +432,8 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
 
nid = memory_add_physaddr_to_nid(phys_addr);
ret = __add_memory(nid, phys_addr,
-  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+  MIN_MEMORY_BLOCK_SIZE * sections_per_block,
+  MHP_NONE);
 
if (ret)
goto out;
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 7dcb2902e9b1b..896cb9444e727 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -95,7 +95,7 @@ int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 * this as RAM automatically.
 */
rc = add_memory_driver_managed(numa_node, range.start,
-   range_len(&range), kmem_name);
+   range_len(&range
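[Editor's sketch] The calling-convention change itself, a dedicated flags type with an explicit NONE value and one bit per optional behavior, can be mimicked in a few lines. add_memory_sketch() and the FLAG_* names below are invented for illustration; they mirror mhp_t, MHP_NONE and MEMHP_MERGE_RESOURCE only in shape:

```c
#include <assert.h>

/* Invented flag type mirroring the shape of mhp_t: an explicit NONE
 * plus one bit per optional behavior the caller may request. */
typedef unsigned int flags_t;

#define FLAG_NONE               ((flags_t)0)
#define FLAG_MERGE_RESOURCE     ((flags_t)(1u << 0))

static int nr_added;
static int nr_merged;

/* Stand-in for add_memory(): every existing caller passes FLAG_NONE
 * and keeps its old behavior; opting in is a one-bit change. */
static int add_memory_sketch(unsigned long start, unsigned long size,
                             flags_t flags)
{
        (void)start;
        (void)size;
        nr_added++;
        if (flags & FLAG_MERGE_RESOURCE)
                nr_merged++;
        return 0;
}
```

Threading a flags argument through every caller up front is what lets later patches in the series add MEMHP_MERGE_RESOURCE as a one-line change per driver.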

[PATCH v3 7/7] hv_balloon: try to merge system ram resources

2020-09-10 Thread David Hildenbrand
Let's try to merge system ram resources we add, to minimize the number
of resources in /proc/iomem. We don't care about the boundaries of
individual chunks we added.

Reviewed-by: Wei Liu 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Wei Liu 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 drivers/hv/hv_balloon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 3c0d52e244520..b64d2efbefe71 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
ret = add_memory(nid, PFN_PHYS((start_pfn)),
-   (HA_CHUNK << PAGE_SHIFT), MHP_NONE);
+   (HA_CHUNK << PAGE_SHIFT), MEMHP_MERGE_RESOURCE);
 
if (ret) {
pr_err("hot_add memory failed error is %d\n", ret);
-- 
2.26.2
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


[PATCH v3 2/7] kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED

2020-09-10 Thread David Hildenbrand
IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
always set to 0 by hardware. This is far from beautiful (and confusing),
and the bit only applies to SYSRAM. So let's move it out of the
bus-specific (PnP) defined bits.

We'll add another SYSRAM specific bit soon. If we ever need more bits for
other purposes, we can steal some from "desc", or reshuffle/regroup what we
have.

Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Ard Biesheuvel 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Cc: Eric Biederman 
Cc: Thomas Gleixner 
Cc: Greg Kroah-Hartman 
Cc: ke...@lists.infradead.org
Signed-off-by: David Hildenbrand 
---
 include/linux/ioport.h | 4 +++-
 kernel/kexec_file.c| 2 +-
 mm/memory_hotplug.c| 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 52a91f5fa1a36..d7620d7c941a0 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -58,6 +58,9 @@ struct resource {
 #define IORESOURCE_EXT_TYPE_BITS 0x01000000	/* Resource extended types */
 #define IORESOURCE_SYSRAM	0x01000000	/* System RAM (modifier) */
 
+/* IORESOURCE_SYSRAM specific bits. */
+#define IORESOURCE_SYSRAM_DRIVER_MANAGED	0x02000000 /* Always detected via a driver. */
 
 #define IORESOURCE_EXCLUSIVE	0x08000000	/* Userland may not map this resource */
 
 #define IORESOURCE_DISABLED	0x10000000
@@ -103,7 +106,6 @@ struct resource {
 #define IORESOURCE_MEM_32BIT   (3<<3)
 #define IORESOURCE_MEM_SHADOWABLE  (1<<5)  /* dup: IORESOURCE_SHADOWABLE */
 #define IORESOURCE_MEM_EXPANSIONROM(1<<6)
-#define IORESOURCE_MEM_DRIVER_MANAGED  (1<<7)
 
 /* PnP I/O specific bits (IORESOURCE_BITS) */
 #define IORESOURCE_IO_16BIT_ADDR   (1<<0)
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index ca40bef75a616..dfeeed1aed084 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -520,7 +520,7 @@ static int locate_mem_hole_callback(struct resource *res, void *arg)
/* Returning 0 will take to next memory range */
 
/* Don't use memory that will be detected and handled by a driver. */
-   if (res->flags & IORESOURCE_MEM_DRIVER_MANAGED)
+   if (res->flags & IORESOURCE_SYSRAM_DRIVER_MANAGED)
return 0;
 
if (sz < kbuf->memsz)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4c47b68a9f4b5..8e1cd18b5cf14 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, u64 size,
unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 
if (strcmp(resource_name, "System RAM"))
-   flags |= IORESOURCE_MEM_DRIVER_MANAGED;
+   flags |= IORESOURCE_SYSRAM_DRIVER_MANAGED;
 
/*
 * Make sure value parsed from 'mem=' only restricts memory adding
@@ -1160,7 +1160,7 @@ EXPORT_SYMBOL_GPL(add_memory);
  *
  * For this memory, no entries in /sys/firmware/memmap ("raw firmware-provided
  * memory map") are created. Also, the created memory resource is flagged
- * with IORESOURCE_MEM_DRIVER_MANAGED, so in-kernel users can special-case
+ * with IORESOURCE_SYSRAM_DRIVER_MANAGED, so in-kernel users can special-case
  * this memory as well (esp., not place kexec images onto it).
  *
  * The resource_name (visible via /proc/iomem) has to have the format
-- 
2.26.2


[PATCH v3 0/7] mm/memory_hotplug: selective merging of system ram resources

2020-09-10 Thread David Hildenbrand
Some add_memory*() users add memory in small, contiguous memory blocks.
Examples include virtio-mem, hyper-v balloon, and the XEN balloon.

This can quickly result in a lot of memory resources, whereby the actual
resource boundaries are not of interest (e.g., it might be relevant for
DIMMs, exposed via /proc/iomem to user space). We really want to merge
added resources in this scenario where possible.

Resources are effectively stored in a list-based tree. Having a lot of
resources not only wastes memory, it also makes traversing that tree more
expensive, and makes /proc/iomem explode in size (e.g., requiring
kexec-tools to manually merge resources when creating a kdump header. The
current kexec-tools resource count limit does not allow for more than
~100GB of memory with a memory block size of 128MB on x86-64).

Let's allow selective merging of system ram resources by specifying a
new flag for add_memory*(). Patch #5 contains a /proc/iomem example. Only
tested with virtio-mem.

v2 -> v3:
- "mm/memory_hotplug: prepare passing flags to add_memory() and friends"
-- Use proper __bitwise type for flags
-- Use "MHP_NONE" for empty flags
- Rebased to latest -next, added rb's

v1 -> v2:
- I had another look at v1 after vacation and didn't like it - it felt like
  a hack. So I went forward and added a proper flag to add_memory*(), and
  introduce a clean (non-racy) way to mark System RAM resources mergeable.
- "kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED"
-- Clean that flag up, felt wrong in the PnP section
- "mm/memory_hotplug: prepare passing flags to add_memory() and friends"
-- Previously sent in other context - decided to keep Wei's ack
- "mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System
   RAM resources"
-- Cleaner approach to get the job done by using proper flags and only
   merging the single, specified resource
- "virtio-mem: try to merge system ram resources"
  "xen/balloon: try to merge system ram resources"
  "hv_balloon: try to merge system ram resources"
-- Use the new flag MEMHP_MERGE_RESOURCE, much cleaner

RFC -> v1:
- Switch from the rather generic "merge_child_mem_resources()", where a resource
  name has to be specified, to "merge_system_ram_resources()".
- Smaller comment/documentation/patch description changes/fixes

David Hildenbrand (7):
  kernel/resource: make release_mem_region_adjustable() never fail
  kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED
  mm/memory_hotplug: prepare passing flags to add_memory() and friends
  mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System
RAM resources
  virtio-mem: try to merge system ram resources
  xen/balloon: try to merge system ram resources
  hv_balloon: try to merge system ram resources

 arch/powerpc/platforms/powernv/memtrace.c |   2 +-
 .../platforms/pseries/hotplug-memory.c|   2 +-
 drivers/acpi/acpi_memhotplug.c|   3 +-
 drivers/base/memory.c |   3 +-
 drivers/dax/kmem.c|   2 +-
 drivers/hv/hv_balloon.c   |   2 +-
 drivers/s390/char/sclp_cmd.c  |   2 +-
 drivers/virtio/virtio_mem.c   |   3 +-
 drivers/xen/balloon.c |   2 +-
 include/linux/ioport.h|  12 +-
 include/linux/memory_hotplug.h|  23 +++-
 kernel/kexec_file.c   |   2 +-
 kernel/resource.c | 109 ++
 mm/memory_hotplug.c   |  47 +++-
 14 files changed, 146 insertions(+), 68 deletions(-)

-- 
2.26.2


[PATCH v3 1/7] kernel/resource: make release_mem_region_adjustable() never fail

2020-09-10 Thread David Hildenbrand
Let's make sure splitting a resource on memory hotunplug will never fail.
This will become more relevant once we merge selected System RAM
resources - then, we'll trigger that case more often on memory hotunplug.

In general, this function is already unlikely to fail. When we remove
memory, we free up quite a lot of metadata (memmap, page tables, memory
block device, etc.). The only reason it could really fail would be when
injecting allocation errors.

All other error cases inside release_mem_region_adjustable() seem to be
sanity checks if the function would be abused in different context -
let's add WARN_ON_ONCE() in these cases so we can catch them.

Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Ard Biesheuvel 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 include/linux/ioport.h |  4 ++--
 kernel/resource.c  | 49 --
 mm/memory_hotplug.c| 22 +--
 3 files changed, 31 insertions(+), 44 deletions(-)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6c2b06fe8beb7..52a91f5fa1a36 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -248,8 +248,8 @@ extern struct resource * __request_region(struct resource *,
 extern void __release_region(struct resource *, resource_size_t,
resource_size_t);
 #ifdef CONFIG_MEMORY_HOTREMOVE
-extern int release_mem_region_adjustable(struct resource *, resource_size_t,
-   resource_size_t);
+extern void release_mem_region_adjustable(struct resource *, resource_size_t,
+ resource_size_t);
 #endif
 
 /* Wrappers for managed devices */
diff --git a/kernel/resource.c b/kernel/resource.c
index f1175ce93a1d5..36b3552210120 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1258,21 +1258,28 @@ EXPORT_SYMBOL(__release_region);
  *   assumes that all children remain in the lower address entry for
  *   simplicity.  Enhance this logic when necessary.
  */
-int release_mem_region_adjustable(struct resource *parent,
- resource_size_t start, resource_size_t size)
+void release_mem_region_adjustable(struct resource *parent,
+  resource_size_t start, resource_size_t size)
 {
+   struct resource *new_res = NULL;
+   bool alloc_nofail = false;
struct resource **p;
struct resource *res;
-   struct resource *new_res;
resource_size_t end;
-   int ret = -EINVAL;
 
end = start + size - 1;
-   if ((start < parent->start) || (end > parent->end))
-   return ret;
+   if (WARN_ON_ONCE((start < parent->start) || (end > parent->end)))
+   return;
 
-   /* The alloc_resource() result gets checked later */
-   new_res = alloc_resource(GFP_KERNEL);
+   /*
+* We free up quite a lot of memory on memory hotunplug (esp., the memmap),
+* just before releasing the region. This is highly unlikely to
+* fail - let's play safe and make it never fail as the caller cannot
+* perform any error handling (e.g., trying to re-add memory will fail
+* similarly).
+*/
+retry:
+   new_res = alloc_resource(GFP_KERNEL | (alloc_nofail ? __GFP_NOFAIL : 0));
 
p = &parent->child;
write_lock(&resource_lock);
@@ -1298,7 +1305,6 @@ int release_mem_region_adjustable(struct resource *parent,
 * so if we are dealing with them, let us just back off here.
 */
if (!(res->flags & IORESOURCE_SYSRAM)) {
-   ret = 0;
break;
}
 
@@ -1315,20 +1321,23 @@ int release_mem_region_adjustable(struct resource *parent,
/* free the whole entry */
*p = res->sibling;
free_resource(res);
-   ret = 0;
} else if (res->start == start && res->end != end) {
/* adjust the start */
-   ret = __adjust_resource(res, end + 1,
-   res->end - end);
+   WARN_ON_ONCE(__adjust_resource(res, end + 1,
+  res->end - end));
} else if (res->start != start && res->end == end) {
/* adjust the end */
-   ret = __adjust_resource(res, res->start,
-   start - res->start);
+   WARN_ON_ONCE(__adjust_resource(res, res->start,
+  start - res->start));
} else {
-   /* split into two entries */
+   /* split into two entries - we need a new resource */
if (!new_res) {
-   


