Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default

2019-03-13 Thread Dan Williams
On Wed, Mar 13, 2019 at 8:45 PM Aneesh Kumar K.V
 wrote:
[..]
> >> Now w.r.t to failures, can device-dax do an opportunistic huge page
> >> usage?
> >
> > device-dax explicitly disclaims the ability to do opportunistic mappings.
> >
> >> I haven't looked at the device-dax details fully yet. Do we make the
> >> assumption of the mapping page size as a format w.r.t device-dax? Is that
> >> derived from nd_pfn->align value?
> >
> > Correct.
> >
> >>
> >> Here is what I am working on:
> >> 1) If the platform doesn't support huge page and if the device superblock
> >> indicated that it was created with huge page support, we fail the device
> >> init.
> >
> > Ok.
> >
> >> 2) Now if we are creating a new namespace without huge page support in
> >> the platform, then we force the align details to PAGE_SIZE. In such a
> >> configuration when handling dax fault even with THP enabled during
> >> the build, we should not try to use hugepage. This I think we can
> >> achieve by using TRANSPARENT_HUGEPAGE_DAX_FLAG.
> >
> > How is this dynamic property communicated to the guest?
>
> via device tree on powerpc. We have a device tree node indicating
> supported page sizes.

Ah, ok, yeah let's plumb that straight to the device-dax driver and
leave out the interaction / interpretation of the thp-enabled flags.

>
> >
> >>
> >> Also even if the user decided to not use THP, by
> >> echo "never" > transparent_hugepage/enabled , we should continue to map
> >> dax fault using huge page on platforms that can support huge pages.
> >>
> >> This still doesn't cover the details of a device-dax created with
> >> PAGE_SIZE align later booted with a kernel that can do hugepage dax. How
> >> should we handle that? That makes me think, this should be a VMA flag
> >> which got derived from device config? May be use VM_HUGEPAGE to indicate
> >> if device should use a hugepage mapping or not?
> >
> > device-dax configured with PAGE_SIZE always gets PAGE_SIZE mappings.
>
> Now what will be page size used for mapping vmemmap?

That's up to the architecture's vmemmap_populate() implementation.

> Architectures
> possibly will use PMD_SIZE mapping if supported for vmemmap. Now a
> device-dax with struct page in the device will have pfn reserve area aligned
> to PAGE_SIZE with the above example? We can't map that using
> PMD_SIZE page size?

IIUC, that's a different alignment. Currently that's handled by
padding the reservation area up to a section (128MB on x86) boundary,
but I'm working on patches to allow sub-section sized ranges to be
mapped.

Now, that said, I expect there may be bugs lurking in the
implementation if PAGE_SIZE changes from one boot to the next simply
because I've never tested that.

I think this also indicates that the section padding logic can't be
removed until all arch vmemmap_populate() implementations understand
the sub-section case.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default

2019-03-13 Thread Aneesh Kumar K.V
Dan Williams  writes:

> On Wed, Mar 6, 2019 at 1:18 AM Aneesh Kumar K.V
>  wrote:
>>
>> Dan Williams  writes:
>>
>> > On Thu, Feb 28, 2019 at 1:40 AM Oliver  wrote:
>> >>
>> >> On Thu, Feb 28, 2019 at 7:35 PM Aneesh Kumar K.V
>> >>  wrote:
>> >> >
>> >> > Add a flag to indicate the ability to do huge page dax mapping. On
>> >> > architectures like ppc64, the hypervisor can disable huge page support
>> >> > in the guest. In such a case, we should not enable huge page dax
>> >> > mapping. This patch adds a flag which the architecture code will
>> >> > update to indicate huge page dax mapping support.
>> >>
>> >> *groan*
>> >>
>> >> > Architectures mostly do transparent_hugepage_flag = 0; if they can't
>> >> > do hugepages. That also takes care of disabling dax hugepage mapping
>> >> > with this change.
>> >> >
>> >> > Without this patch we get the below error with kvm on ppc64.
>> >> >
>> >> > [  118.849975] lpar: Failed hash pte insert with error -4
>> >> >
>> >> > NOTE: The patch also uses
>> >> >
>> >> > echo never > /sys/kernel/mm/transparent_hugepage/enabled
>> >> > to disable dax huge page mapping.
>> >> >
>> >> > Signed-off-by: Aneesh Kumar K.V 
>> >> > ---
>> >> > TODO:
>> >> > * Add Fixes: tag
>> >> >
>> >> >  include/linux/huge_mm.h | 4 +++-
>> >> >  mm/huge_memory.c        | 4 ++++
>> >> >  2 files changed, 7 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> >> > index 381e872bfde0..01ad5258545e 100644
>> >> > --- a/include/linux/huge_mm.h
>> >> > +++ b/include/linux/huge_mm.h
>> >> > @@ -53,6 +53,7 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
>> >> > 			pud_t *pud, pfn_t pfn, bool write);
>> >> >  enum transparent_hugepage_flag {
>> >> >  	TRANSPARENT_HUGEPAGE_FLAG,
>> >> > +	TRANSPARENT_HUGEPAGE_DAX_FLAG,
>> >> >  	TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
>> >> >  	TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
>> >> >  	TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
>> >> > @@ -111,7 +112,8 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>> >> > 	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
>> >> > 		return true;
>> >> >
>> >> > -	if (vma_is_dax(vma))
>> >> > +	if (vma_is_dax(vma) &&
>> >> > +	    (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_DAX_FLAG)))
>> >> > 		return true;
>> >>
>> >> Forcing PTE sized faults should be fine for fsdax, but it'll break
>> >> devdax. The devdax driver requires the fault size be >= the namespace
>> >> alignment since devdax tries to guarantee hugepage mappings will be
>> >> used and PMD alignment is the default. We can probably have devdax
>> >> fall back to the largest size the hypervisor has made available, but
>> >> it does run contrary to the design. Ah well, I suppose it's better off
>> >> being degraded rather than unusable.
>> >
>> > Given this is an explicit setting I think device-dax should explicitly
>> > fail to enable in the presence of this flag to preserve the
>> > application visible behavior.
>> >
>> > I.e. if device-dax was enabled after this setting was made then I
>> > think future faults should fail as well.
>>
>> Not sure I understood that. Now we are disabling the ability to map
>> pages as huge pages. I am now considering that this should not be
>> user configurable. Ie, this is something that platform can use to avoid
>> dax forcing huge page mapping, but if the architecture can enable huge
>> dax mapping, we should always default to using that.
>
> No, that's an application visible behavior regression. The side effect
> of this setting is that all huge-page configured device-dax instances
> must be disabled.

So if the device was created with a nd_pfn->align value of PMD_SIZE, that is
an indication that we would map the pages in PMD_SIZE?

Ok, with that understanding: if the align value is not a supported
mapping size, we fail initializing the device.


>
>> Now w.r.t to failures, can device-dax do an opportunistic huge page
>> usage?
>
> device-dax explicitly disclaims the ability to do opportunistic mappings.
>
>> I haven't looked at the device-dax details fully yet. Do we make the
>> assumption of the mapping page size as a format w.r.t device-dax? Is that
>> derived from nd_pfn->align value?
>
> Correct.
>
>>
>> Here is what I am working on:
>> 1) If the platform doesn't support huge page and if the device superblock
>> indicated that it was created with huge page support, we fail the device
>> init.
>
> Ok.
>
>> 2) Now if we are creating a new namespace without huge page support in
>> the platform, then we force the align details to PAGE_SIZE. In such a
>> configuration when handling dax fault even with THP enabled during
>> the build, we should not try to use hugepage. This I think we can
> achieve by using TRANSPARENT_HUGEPAGE_DAX_FLAG.

Re: [mm PATCH v6 6/7] mm: Add reserved flag setting to set_page_links

2019-03-13 Thread Alexander Duyck
On Wed, 2019-03-13 at 09:33 -0700, Andrew Morton wrote:
> On Tue, 12 Mar 2019 15:50:36 -0700 Alexander Duyck 
>  wrote:
> 
> > On Tue, 2019-03-12 at 15:07 -0700, Andrew Morton wrote:
> > > On Wed, 5 Dec 2018 21:42:47 +0100 Michal Hocko  wrote:
> > > 
> > > > > I got your explanation. However Andrew had already applied the patches
> > > > > and I had some outstanding issues in them that needed to be addressed.
> > > > > So I thought it best to send out this set of patches with those fixes
> > > > > before the code in mm became too stale. I am still working on what to
> > > > > do about the Reserved bit, and plan to submit it as a follow-up set.
> > > > From my experience Andrew can drop patches between different versions of
> > > > the patchset. Things can change a lot while they are in mmotm and under
> > > > the discussion.
> > > 
> > > It's been a while and everyone has forgotten everything, so I'll drop
> > > this version of the patchset.
> > > 
> > 
> > As far as getting to the reserved bit I probably won't have the time in
> > the near future. If I were to resubmit the first 4 patches as a
> > standalone patch set would that be acceptable, or would they be held up
> > as well until the reserved bit issue is addressed?
> > 
> 
> Yes, I think that merging the first four will be OK.  As long as they
> don't add some bug which [5/5] corrects, which happens sometimes!
> 
> Please redo, retest and resend sometime?

I had gone through and tested with each patch applied individually when
I was performance testing them, and I am fairly certain there wasn't a
bug introduced between any two patches.

The issue that I recall Michal had was the fact that I was essentially
embedding the setting of the reserved page under several layers of
function calls, which would make it harder to remove. I started that
work at about patch 5 which is why I figured I would resend the first
4, and hold off on 5-7 until I can get the reserved bit removal for
hotplug done.

I can probably have the patches ready to go in a couple days. I'll send
updates once linux-next and mmotm with the patches dropped have been
posted.

Thanks.

- Alex



Re: [GIT PULL] Filesystem-DAX fixes for 5.1

2019-03-13 Thread pr-tracker-bot
The pull request you sent on Tue, 12 Mar 2019 16:20:52 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/fsdax-for-5.1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3bb0f28d84f3d4e3800ae57d6b1a931b3f88c1f8

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] libnvdimm updates for 5.1

2019-03-13 Thread pr-tracker-bot
The pull request you sent on Mon, 11 Mar 2019 14:54:47 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-5.1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5ea6718b1f1bb58825426e19a21cdba47075a954

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [mm PATCH v6 6/7] mm: Add reserved flag setting to set_page_links

2019-03-13 Thread Andrew Morton
On Tue, 12 Mar 2019 15:50:36 -0700 Alexander Duyck 
 wrote:

> On Tue, 2019-03-12 at 15:07 -0700, Andrew Morton wrote:
> > On Wed, 5 Dec 2018 21:42:47 +0100 Michal Hocko  wrote:
> > 
> > > > I got your explanation. However Andrew had already applied the patches
> > > > and I had some outstanding issues in them that needed to be addressed.
> > > > So I thought it best to send out this set of patches with those fixes
> > > > before the code in mm became too stale. I am still working on what to
> > > > do about the Reserved bit, and plan to submit it as a follow-up set.
> > > From my experience Andrew can drop patches between different versions of
> > > the patchset. Things can change a lot while they are in mmotm and under
> > > the discussion.
> > 
> > It's been a while and everyone has forgotten everything, so I'll drop
> > this version of the patchset.
> > 
> 
> As far as getting to the reserved bit I probably won't have the time in
> the near future. If I were to resubmit the first 4 patches as a
> standalone patch set would that be acceptable, or would they be held up
> as well until the reserved bit issue is addressed?
> 

Yes, I think that merging the first four will be OK.  As long as they
don't add some bug which [5/5] corrects, which happens sometimes!

Please redo, retest and resend sometime?


Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default

2019-03-13 Thread Dan Williams
On Wed, Mar 6, 2019 at 4:46 AM Aneesh Kumar K.V
 wrote:
>
> On 3/6/19 5:14 PM, Michal Suchánek wrote:
> > On Wed, 06 Mar 2019 14:47:33 +0530
> > "Aneesh Kumar K.V"  wrote:
> >
> >> Dan Williams  writes:
> >>
> >>> On Thu, Feb 28, 2019 at 1:40 AM Oliver  wrote:
> 
>  On Thu, Feb 28, 2019 at 7:35 PM Aneesh Kumar K.V
>   wrote:
> >
> >> Also even if the user decided to not use THP, by
> >> echo "never" > transparent_hugepage/enabled , we should continue to map
> >> dax fault using huge page on platforms that can support huge pages.
> >
> > Is this a good idea?
> >
> > This knob is there for a reason. In some situations having huge pages
> > can severely impact performance of the system (due to host-guest
> > interaction or whatever) and the ability to really turn off all THP
> > would be important in those cases, right?
> >
>
> My understanding was that this is not true for dax pages? These are not
> regular memory that got allocated. They are allocated out of /dev/dax/
> or /dev/pmem*. Do we have a reason not to use hugepages for mapping
> pages in that case?

The problem with the transparent_hugepage/enabled interface is that it
conflates performing compaction work to produce THP-pages with the
ability to map huge pages at all. The compaction is a nop for dax
because the memory is already statically allocated. If the
administrator does not want dax to consume huge TLB entries then don't
configure huge-page dax. If a hypervisor wants to force disable
huge-page-configured device-dax instances after the fact it seems we
need an explicit interface for that and not overload
transparent_hugepage/enabled.


Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default

2019-03-13 Thread Dan Williams
On Wed, Mar 6, 2019 at 1:18 AM Aneesh Kumar K.V
 wrote:
>
> Dan Williams  writes:
>
> > On Thu, Feb 28, 2019 at 1:40 AM Oliver  wrote:
> >>
> >> On Thu, Feb 28, 2019 at 7:35 PM Aneesh Kumar K.V
> >>  wrote:
> >> >
> >> > Add a flag to indicate the ability to do huge page dax mapping. On
> >> > architectures like ppc64, the hypervisor can disable huge page support
> >> > in the guest. In such a case, we should not enable huge page dax
> >> > mapping. This patch adds a flag which the architecture code will
> >> > update to indicate huge page dax mapping support.
> >>
> >> *groan*
> >>
> >> > Architectures mostly do transparent_hugepage_flag = 0; if they can't
> >> > do hugepages. That also takes care of disabling dax hugepage mapping
> >> > with this change.
> >> >
> >> > Without this patch we get the below error with kvm on ppc64.
> >> >
> >> > [  118.849975] lpar: Failed hash pte insert with error -4
> >> >
> >> > NOTE: The patch also uses
> >> >
> >> > echo never > /sys/kernel/mm/transparent_hugepage/enabled
> >> > to disable dax huge page mapping.
> >> >
> >> > Signed-off-by: Aneesh Kumar K.V 
> >> > ---
> >> > TODO:
> >> > * Add Fixes: tag
> >> >
> >> >  include/linux/huge_mm.h | 4 +++-
> >> >  mm/huge_memory.c        | 4 ++++
> >> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> > index 381e872bfde0..01ad5258545e 100644
> >> > --- a/include/linux/huge_mm.h
> >> > +++ b/include/linux/huge_mm.h
> >> > @@ -53,6 +53,7 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
> >> > 			pud_t *pud, pfn_t pfn, bool write);
> >> >  enum transparent_hugepage_flag {
> >> >  	TRANSPARENT_HUGEPAGE_FLAG,
> >> > +	TRANSPARENT_HUGEPAGE_DAX_FLAG,
> >> >  	TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
> >> >  	TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
> >> >  	TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
> >> > @@ -111,7 +112,8 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> >> > 	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> >> > 		return true;
> >> >
> >> > -	if (vma_is_dax(vma))
> >> > +	if (vma_is_dax(vma) &&
> >> > +	    (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_DAX_FLAG)))
> >> > 		return true;
> >>
> >> Forcing PTE sized faults should be fine for fsdax, but it'll break
> >> devdax. The devdax driver requires the fault size be >= the namespace
> >> alignment since devdax tries to guarantee hugepage mappings will be
> >> used and PMD alignment is the default. We can probably have devdax
> >> fall back to the largest size the hypervisor has made available, but
> >> it does run contrary to the design. Ah well, I suppose it's better off
> >> being degraded rather than unusable.
> >
> > Given this is an explicit setting I think device-dax should explicitly
> > fail to enable in the presence of this flag to preserve the
> > application visible behavior.
> >
> > I.e. if device-dax was enabled after this setting was made then I
> > think future faults should fail as well.
>
> Not sure I understood that. Now we are disabling the ability to map
> pages as huge pages. I am now considering that this should not be
> user configurable. Ie, this is something that platform can use to avoid
> dax forcing huge page mapping, but if the architecture can enable huge
> dax mapping, we should always default to using that.

No, that's an application visible behavior regression. The side effect
of this setting is that all huge-page configured device-dax instances
must be disabled.

> Now w.r.t to failures, can device-dax do an opportunistic huge page
> usage?

device-dax explicitly disclaims the ability to do opportunistic mappings.

> I haven't looked at the device-dax details fully yet. Do we make the
> assumption of the mapping page size as a format w.r.t device-dax? Is that
> derived from nd_pfn->align value?

Correct.

>
> Here is what I am working on:
> 1) If the platform doesn't support huge page and if the device superblock
> indicated that it was created with huge page support, we fail the device
> init.

Ok.

> 2) Now if we are creating a new namespace without huge page support in
> the platform, then we force the align details to PAGE_SIZE. In such a
> configuration when handling dax fault even with THP enabled during
> the build, we should not try to use hugepage. This I think we can
> achieve by using TRANSPARENT_HUGEPAGE_DAX_FLAG.

How is this dynamic property communicated to the guest?

>
> Also even if the user decided to not use THP, by
> echo "never" > transparent_hugepage/enabled , we should continue to map
> dax fault using huge page on platforms that can support huge pages.
>
> This still doesn't cover the details of a device-dax created with
> PAGE_SIZE align later booted with a kernel that can do hugepage dax. How
> should we handle that? That makes me think, this should be a VMA flag
> which got derived from device config? May be use VM_HUGEPAGE to indicate
> if device should use a hugepage mapping or not?

device-dax configured with PAGE_SIZE always gets PAGE_SIZE mappings.

Re: [PATCH v2] fs/dax: deposit pagetable even when installing zero page

2019-03-13 Thread Dan Williams
On Wed, Mar 13, 2019 at 2:58 AM Jan Kara  wrote:
>
> On Wed 13-03-19 10:17:17, Aneesh Kumar K.V wrote:
> >
> > Hi Dan/Andrew/Jan,
> >
> > "Aneesh Kumar K.V"  writes:
> >
> > > Architectures like ppc64 use the deposited page table to store hardware
> > > page table slot information. Make sure we deposit a page table when
> > > using zero page at the pmd level for hash.
> > >
> > > Without this we hit
> > >
> > > Unable to handle kernel paging request for data at address 0x
> > > Faulting instruction address: 0xc0082a74
> > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > 
> > >
> > > NIP [c0082a74] __hash_page_thp+0x224/0x5b0
> > > LR [c00829a4] __hash_page_thp+0x154/0x5b0
> > > Call Trace:
> > >  hash_page_mm+0x43c/0x740
> > >  do_hash_page+0x2c/0x3c
> > >  copy_from_iter_flushcache+0xa4/0x4a0
> > >  pmem_copy_from_iter+0x2c/0x50 [nd_pmem]
> > >  dax_copy_from_iter+0x40/0x70
> > >  dax_iomap_actor+0x134/0x360
> > >  iomap_apply+0xfc/0x1b0
> > >  dax_iomap_rw+0xac/0x130
> > >  ext4_file_write_iter+0x254/0x460 [ext4]
> > >  __vfs_write+0x120/0x1e0
> > >  vfs_write+0xd8/0x220
> > >  SyS_write+0x6c/0x110
> > >  system_call+0x3c/0x130
> > >
> > > Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
> > > Reviewed-by: Jan Kara 
> > > Signed-off-by: Aneesh Kumar K.V 
> >
> > Any suggestion on which tree this patch should go to? Also since this
> > fixes a kernel crash, we may want to get this to 5.1?
>
> I think this should go through Dan's tree...

I'll merge this and let it soak in -next for a week and then submit for 5.1-rc2.


Re: [PATCH v2] fs/dax: deposit pagetable even when installing zero page

2019-03-13 Thread Jan Kara
On Wed 13-03-19 10:17:17, Aneesh Kumar K.V wrote:
> 
> Hi Dan/Andrew/Jan,
> 
> "Aneesh Kumar K.V"  writes:
> 
> > Architectures like ppc64 use the deposited page table to store hardware
> > page table slot information. Make sure we deposit a page table when
> > using zero page at the pmd level for hash.
> >
> > Without this we hit
> >
> > Unable to handle kernel paging request for data at address 0x
> > Faulting instruction address: 0xc0082a74
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > 
> >
> > NIP [c0082a74] __hash_page_thp+0x224/0x5b0
> > LR [c00829a4] __hash_page_thp+0x154/0x5b0
> > Call Trace:
> >  hash_page_mm+0x43c/0x740
> >  do_hash_page+0x2c/0x3c
> >  copy_from_iter_flushcache+0xa4/0x4a0
> >  pmem_copy_from_iter+0x2c/0x50 [nd_pmem]
> >  dax_copy_from_iter+0x40/0x70
> >  dax_iomap_actor+0x134/0x360
> >  iomap_apply+0xfc/0x1b0
> >  dax_iomap_rw+0xac/0x130
> >  ext4_file_write_iter+0x254/0x460 [ext4]
> >  __vfs_write+0x120/0x1e0
> >  vfs_write+0xd8/0x220
> >  SyS_write+0x6c/0x110
> >  system_call+0x3c/0x130
> >
> > Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
> > Reviewed-by: Jan Kara 
> > Signed-off-by: Aneesh Kumar K.V 
> 
> Any suggestion on which tree this patch should go to? Also since this
> fixes a kernel crash, we may want to get this to 5.1?

I think this should go through Dan's tree...

Honza

> > ---
> > Changes from v1:
> > * Add reviewed-by:
> > * Add Fixes:
> >
> >  fs/dax.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 6959837cc465..01bfb2ac34f9 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -33,6 +33,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "internal.h"
> >  
> >  #define CREATE_TRACE_POINTS
> > @@ -1410,7 +1411,9 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
> >  {
> > struct address_space *mapping = vmf->vma->vm_file->f_mapping;
> > unsigned long pmd_addr = vmf->address & PMD_MASK;
> > +   struct vm_area_struct *vma = vmf->vma;
> > struct inode *inode = mapping->host;
> > +   pgtable_t pgtable = NULL;
> > struct page *zero_page;
> > spinlock_t *ptl;
> > pmd_t pmd_entry;
> > @@ -1425,12 +1428,22 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
> > *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn,
> > DAX_PMD | DAX_ZERO_PAGE, false);
> >  
> > +   if (arch_needs_pgtable_deposit()) {
> > +   pgtable = pte_alloc_one(vma->vm_mm);
> > +   if (!pgtable)
> > +   return VM_FAULT_OOM;
> > +   }
> > +
> > ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd);
> > if (!pmd_none(*(vmf->pmd))) {
> > spin_unlock(ptl);
> > goto fallback;
> > }
> >  
> > +   if (pgtable) {
> > +   pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
> > +   mm_inc_nr_ptes(vma->vm_mm);
> > +   }
> > pmd_entry = mk_pmd(zero_page, vmf->vma->vm_page_prot);
> > pmd_entry = pmd_mkhuge(pmd_entry);
> > set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
> > @@ -1439,6 +1452,8 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
> > return VM_FAULT_NOPAGE;
> >  
> >  fallback:
> > +   if (pgtable)
> > +   pte_free(vma->vm_mm, pgtable);
> > trace_dax_pmd_load_hole_fallback(inode, vmf, zero_page, *entry);
> > return VM_FAULT_FALLBACK;
> >  }
> > -- 
> > 2.20.1
> 
> -aneesh
> 
-- 
Jan Kara 
SUSE Labs, CR