Re: [External] Re: [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone

2018-05-15 Thread Dan Williams
On Tue, May 15, 2018 at 7:52 PM, Matthew Wilcox  wrote:
> On Wed, May 16, 2018 at 02:05:05AM +, Huaisheng HS1 Ye wrote:
>> > From: Matthew Wilcox [mailto:wi...@infradead.org]
>> > Sent: Wednesday, May 16, 2018 12:20 AM>
>> > > > > > Then there's the problem of reconnecting the page cache (which is
>> > > > > > pointed to by ephemeral data structures like inodes and dentries) 
>> > > > > > to
>> > > > > > the new inodes.
>> > > > > Yes, it is not easy.
>> > > >
>> > > > Right ... and until we have that ability, there's no point in this 
>> > > > patch.
>> > > We are focusing to realize this ability.
>> >
>> > But is it the right approach?  So far we have (I think) two parallel
>> > activities.  The first is for local storage, using DAX to store files
>> > directly on the pmem.  The second is a physical block cache for network
>> > filesystems (both NAS and SAN).  You seem to be wanting to supplant the
>> > second effort, but I think it's much harder to reconnect the logical cache
>> > (ie the page cache) than it is the physical cache (ie the block cache).
>>
>> Dear Matthew,
>>
>> Thanks for correcting my idea about cache lines.
>> But I have a question about that: assuming an NVDIMM works in pmem mode, even
>> if we use it as a physical block cache, like dm-cache, there is a potential
>> risk from this cache-line issue, because NVDIMMs are byte-addressable
>> storage, right?
>> If a system crash happens, the CPU may not have the opportunity to flush all
>> dirty data from its cache lines to the NVDIMM while copying the data pointed
>> to by bio_vec.bv_page to the NVDIMM.
>> I know there is BTT, which is used to guarantee sector atomicity in block
>> mode, but in pmem mode that will likely cause a mix of new and old data in
>> one page of the NVDIMM.
>> Correct me if anything is wrong.
>
> Right, we do have BTT.  I'm not sure how it's being used with the block
> cache ... but the principle is the same; write the new data to a new
> page and then update the metadata to point to the new page.
>
>> Another question: if we use NVDIMMs as a physical block cache for network
>> filesystems, does the industry have an existing implementation that bypasses
>> the page cache, similar to the DAX way, that is to say, directly storing data
>> to NVDIMMs from userspace rather than copying data from kernel-space memory
>> to NVDIMMs?
>
> The important part about DAX is that the kernel gets entirely out of the
> way and userspace takes care of handling flushing and synchronisation.
> I'm not sure how that works with the block cache; for a network
> filesystem, the filesystem needs to be in charge of deciding when and
> how to write the buffered data back to the storage.
>
> Dan, Vishal, perhaps you could jump in here; I'm not really sure where
> this effort has got to.

Which effort? I think we're saying that there is no such thing as a
DAX-capable block cache, and it is not clear that one makes sense.

We can certainly teach existing block caches some optimizations in the
presence of pmem, and perhaps that is sufficient.




Re: [External] Re: [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone

2018-05-15 Thread Matthew Wilcox
On Wed, May 16, 2018 at 02:05:05AM +, Huaisheng HS1 Ye wrote:
> > From: Matthew Wilcox [mailto:wi...@infradead.org]
> > Sent: Wednesday, May 16, 2018 12:20 AM> 
> > > > > > Then there's the problem of reconnecting the page cache (which is
> > > > > > pointed to by ephemeral data structures like inodes and dentries) to
> > > > > > the new inodes.
> > > > > Yes, it is not easy.
> > > >
> > > > Right ... and until we have that ability, there's no point in this 
> > > > patch.
> > > We are focusing to realize this ability.
> > 
> > But is it the right approach?  So far we have (I think) two parallel
> > activities.  The first is for local storage, using DAX to store files
> > directly on the pmem.  The second is a physical block cache for network
> > filesystems (both NAS and SAN).  You seem to be wanting to supplant the
> > second effort, but I think it's much harder to reconnect the logical cache
> > (ie the page cache) than it is the physical cache (ie the block cache).
> 
> Dear Matthew,
> 
> Thanks for correcting my idea about cache lines.
> But I have a question about that: assuming an NVDIMM works in pmem mode, even
> if we use it as a physical block cache, like dm-cache, there is a potential
> risk from this cache-line issue, because NVDIMMs are byte-addressable storage,
> right?
> If a system crash happens, the CPU may not have the opportunity to flush all
> dirty data from its cache lines to the NVDIMM while copying the data pointed
> to by bio_vec.bv_page to the NVDIMM.
> I know there is BTT, which is used to guarantee sector atomicity in block
> mode, but in pmem mode that will likely cause a mix of new and old data in one
> page of the NVDIMM.
> Correct me if anything is wrong.

Right, we do have BTT.  I'm not sure how it's being used with the block
cache ... but the principle is the same; write the new data to a new
page and then update the metadata to point to the new page.
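
To make that principle concrete, here is a minimal C sketch of the shadow-write
pattern. It is not the actual btt or dm-cache code: persist() stands in for
whatever cache-flush-plus-fence mechanism the platform provides (e.g. clwb
followed by sfence), and alloc_free_page() is a hypothetical free-block
allocator.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PG_SIZE 4096
#define NPAGES  1024

/* Placeholder: flush the given range's cache lines to media and fence
 * (e.g. clwb on each line followed by sfence). */
static void persist(const void *addr, size_t len) { (void)addr; (void)len; }

/* Hypothetical allocator handing out an unused physical page on the device. */
static uint64_t alloc_free_page(void) { return 1; }

struct remap_table {
	uint64_t entry[NPAGES];   /* logical page -> physical page, lives in pmem */
};

/* Write one logical page so that, after a crash, readers see either the
 * complete old page or the complete new page -- never a mix of the two. */
static void shadow_write(struct remap_table *map, char *pmem_base,
			 uint64_t lpage, const void *src)
{
	uint64_t new_ppage = alloc_free_page();
	char *dst = pmem_base + new_ppage * PG_SIZE;

	memcpy(dst, src, PG_SIZE);
	persist(dst, PG_SIZE);                  /* 1. make the new data durable   */

	map->entry[lpage] = new_ppage;          /* 2. 8-byte atomic pointer swing */
	persist(&map->entry[lpage], sizeof(map->entry[lpage]));
}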

> Another question: if we use NVDIMMs as a physical block cache for network
> filesystems, does the industry have an existing implementation that bypasses
> the page cache, similar to the DAX way, that is to say, directly storing data
> to NVDIMMs from userspace rather than copying data from kernel-space memory to
> NVDIMMs?

The important part about DAX is that the kernel gets entirely out of the
way and userspace takes care of handling flushing and synchronisation.
I'm not sure how that works with the block cache; for a network
filesystem, the filesystem needs to be in charge of deciding when and
how to write the buffered data back to the storage.
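
To illustrate what "the kernel gets entirely out of the way" looks like in
practice, here is a minimal userspace sketch of the DAX model. It assumes a
file on an fsdax-mounted filesystem and a CPU/compiler with clwb support
(build with -mclwb); the path is made up and all error handling is omitted.

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <immintrin.h>          /* _mm_clwb / _mm_sfence */

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

#define CL 64                   /* x86 cache-line size */

/* Flush a range of the DAX mapping and fence; no system call involved. */
static void persist(void *addr, size_t len)
{
	uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CL - 1);

	for (; p < (uintptr_t)addr + len; p += CL)
		_mm_clwb((void *)p);
	_mm_sfence();
}

int main(void)
{
	int fd = open("/mnt/pmem/data", O_RDWR);   /* a file on an fsdax mount */
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);

	memcpy(p, "hello", 6);   /* stores land directly in the pmem pages ...   */
	persist(p, 6);           /* ... and userspace alone makes them durable   */

	munmap(p, 4096);
	close(fd);
	return 0;
}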

Dan, Vishal, perhaps you could jump in here; I'm not really sure where
this effort has got to.


Re: [External] Re: [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone

2018-05-15 Thread Dan Williams
On Tue, May 15, 2018 at 7:05 PM, Huaisheng HS1 Ye  wrote:
>> From: Matthew Wilcox [mailto:wi...@infradead.org]
>> Sent: Wednesday, May 16, 2018 12:20 AM>
>> > > > > Then there's the problem of reconnecting the page cache (which is
>> > > > > pointed to by ephemeral data structures like inodes and dentries) to
>> > > > > the new inodes.
>> > > > Yes, it is not easy.
>> > >
>> > > Right ... and until we have that ability, there's no point in this patch.
>> > We are focusing to realize this ability.
>>
>> But is it the right approach?  So far we have (I think) two parallel
>> activities.  The first is for local storage, using DAX to store files
>> directly on the pmem.  The second is a physical block cache for network
>> filesystems (both NAS and SAN).  You seem to be wanting to supplant the
>> second effort, but I think it's much harder to reconnect the logical cache
>> (ie the page cache) than it is the physical cache (ie the block cache).
>
> Dear Matthew,
>
> Thanks for correcting my idea about cache lines.
> But I have a question about that: assuming an NVDIMM works in pmem mode, even
> if we use it as a physical block cache, like dm-cache, there is a potential
> risk from this cache-line issue, because NVDIMMs are byte-addressable storage,
> right?

No, there is no risk if the cache is designed properly. The pmem
driver will not report that the I/O is complete until the entire
payload of the data write has made it to persistent memory. The cache
driver will not report that the write succeeded until the pmem driver
completes the I/O. There is no risk from losing power while the pmem
driver is operating, because the cache will recover to its last
acknowledged stable state, i.e. it will roll back / undo the
incomplete write.
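
A hedged sketch of the ordering Dan describes; the structure and helper names
are illustrative, not dm-cache's real on-media format. The point is only that
the metadata commit is persisted after the payload is durable, so recovery can
simply ignore anything uncommitted.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative cache metadata entry; not dm-cache's real layout. */
struct cache_entry {
	uint64_t block;      /* which backing-device block is cached here          */
	bool     committed;  /* set (and persisted) only after the data is durable */
};

/* Placeholder: the pmem driver completes the I/O only once the whole
 * payload is durable on the NVDIMM. */
static void pmem_write_and_wait(const void *buf, uint64_t block)
{ (void)buf; (void)block; }

/* Placeholder: cache-line flush + fence of the metadata entry itself. */
static void persist(const void *addr, unsigned len) { (void)addr; (void)len; }

static void cache_write(struct cache_entry *e, const void *buf, uint64_t block)
{
	pmem_write_and_wait(buf, block);  /* 1. payload durable            */
	e->block = block;                 /* 2. then commit the metadata   */
	e->committed = true;
	persist(e, sizeof(*e));
	/* 3. only now acknowledge the write to the layer above */
}

/* Crash recovery: entries that never reached step 2 are ignored,
 * i.e. the incomplete write is rolled back. */
static bool entry_valid(const struct cache_entry *e)
{
	return e->committed;
}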

> If a system crash happens, the CPU may not have the opportunity to flush all
> dirty data from its cache lines to the NVDIMM while copying the data pointed
> to by bio_vec.bv_page to the NVDIMM.
> I know there is BTT, which is used to guarantee sector atomicity in block
> mode, but in pmem mode that will likely cause a mix of new and old data in one
> page of the NVDIMM.
> Correct me if anything is wrong.

dm-cache performs metadata management similar to the btt driver's, to
ensure safe forward progress of the cache state relative to power
loss or system crash.

> Another question: if we use NVDIMMs as a physical block cache for network
> filesystems, does the industry have an existing implementation that bypasses
> the page cache, similar to the DAX way, that is to say, directly storing data
> to NVDIMMs from userspace rather than copying data from kernel-space memory to
> NVDIMMs?

Any caching solution with associated metadata requires coordination
with the kernel, so it is not possible for the kernel to stay
completely out of the way. Especially when we're talking about a cache
in front of the network, there is not much room for DAX to offer
improved performance, because we need the kernel to take over all
write-persist operations to update cache metadata.

So, I'm still struggling to see why dm-cache is not a suitable
solution for this case. It seems suitable if it is updated to allow
direct dma-access to the pmem cache pages from the backing device
storage / networking driver.


RE: [PATCH v6 2/4] ndctl, monitor: add ndctl monitor daemon

2018-05-15 Thread Qi, Fuli
> -Original Message-
> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Sent: Wednesday, May 16, 2018 2:07 AM
> To: Qi, Fuli/斉 福利 
> Cc: linux-nvdimm 
> Subject: Re: [PATCH v6 2/4] ndctl, monitor: add ndctl monitor daemon
> 
> On Tue, May 15, 2018 at 1:32 AM, Qi, Fuli  wrote:
> [..]
> >> Actually, I don't see a need to have LOG_DESTINATION_FILE at all. Why
> >> not just
> >> do:
> >>
> >> ndctl monitor 2>file
> >>
> >> ...to redirect stderr to a file?
> >
> > In my understanding, this stderr redirection does not make sense when
> > ndctl monitor runs as a daemon, eg:
># ndctl monitor --logfile stderr --daemon 2>file
> What do you think?
> 
> True, and now that I look dnsmasq allows the same with its --log-facility 
> option. Ok,
> let's keep this feature.
> 
> I appreciate the continued effort and patience.
> 
Ok, I see.
Thank you very much.


Re: [Qemu-devel] [PATCH 3/3] nvdimm: platform capabilities command line option

2018-05-15 Thread Ross Zwisler
On Thu, May 10, 2018 at 03:28:48PM +0200, Igor Mammedov wrote:
> On Fri, 27 Apr 2018 15:53:14 -0600
> Ross Zwisler  wrote:
> 
> > Add a device command line option to allow the user to control the Platform
> > Capabilities Structure in the virtualized NFIT.
> > 
> > Signed-off-by: Ross Zwisler 
> > ---
> >  docs/nvdimm.txt | 22 ++
> >  hw/acpi/nvdimm.c| 29 +
> >  hw/mem/nvdimm.c | 28 
> >  include/hw/mem/nvdimm.h |  6 ++
> >  4 files changed, 81 insertions(+), 4 deletions(-)
> > 
> > diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> > index e903d8bb09..13a2c15b70 100644
> > --- a/docs/nvdimm.txt
> > +++ b/docs/nvdimm.txt
> > @@ -153,3 +153,25 @@ guest NVDIMM region mapping structure.  This unarmed 
> > flag indicates
> >  guest software that this vNVDIMM device contains a region that cannot
> >  accept persistent writes. In result, for example, the guest Linux
> >  NVDIMM driver, marks such vNVDIMM device as read-only.
> > +
> > +Platform Capabilities
> > +-
> > +
> > +ACPI 6.2 Errata A added support for a new Platform Capabilities Structure
> > +which allows the platform to communicate what features it supports related 
> > to
> > +NVDIMM data durability.  Users can provide a capabilities value to a guest 
> > via
> > +the optional "cap" device command line option:
> > +
> > + -device nvdimm,id=nvdimm1,memdev=mem1,cap=3
> > +
> > +As of ACPI 6.2 Errata A, the following values are valid for the bottom two
> > +bits:
> > +
> > +2 - Memory Controller Flush to NVDIMM Durability on Power Loss Capable.
> > +3 - CPU Cache Flush to NVDIMM Durability on Power Loss Capable.
> > +
> > +For a complete list of the flags available please consult the ACPI spec.
> > +
> > +These platform capabilities apply to the entire virtual platform, so it is
> > +recommended that only one "cap" device command option be given per virtual
> > +machine.  This value will apply to all NVDIMMs in the virtual platform.
> This looks like it should be a machine property instead of a per-device one;
> you can get rid of the static variable and the mismatch check, and of a weird
> nvdimm CLI option that implies that the option is per-device.

Yep, that's much better.  I have this implemented and ready to go.

> Also, an extra patch for make check that tests setting 'cap' would be
> nice (an extra testcase in tests/bios-tables-test.c)

Hmm...I've been looking at this, and it doesn't look like there is any
verification around a lot of the ACPI tables (NFIT, SRAT, etc).

I've verified my patch by interacting with a guest with various settings - is
this good enough, or do you really want me to test the value (which I think
would just be "do I get out what I put in at the command line") via the unit
test infrastructure?

Thank you for the review.


Re: Draft NVDIMM proposal

2018-05-15 Thread Dan Williams
On Tue, May 15, 2018 at 7:19 AM, George Dunlap  wrote:
> On 05/11/2018 05:33 PM, Dan Williams wrote:
>> [ adding linux-nvdimm ]
>>
>> Great write up! Some comments below...
>
> Thanks for the quick response!
>
> It seems I still have some fundamental misconceptions about what's going
> on, so I'd better start with that. :-)
>
> Here's the part that I'm having a hard time getting.
>
> If actual data on the NVDIMMs is a noun, and the act of writing is a
> verb, then the SPA and interleave sets are adverbs: they define *how*
> the write happens.  When the processor says, "Write to address X", the
> memory controller converts address X into a <DIMM, DIMM-local address>
> tuple to actually write the data.
>
> So, who decides what this SPA range and interleave set is?  Can the
> operating system change these interleave sets and mappings, or change
> data from PMEM to BLK, and is so, how?

The association of interleave sets to SPA ranges, and the delineation of
capacity between PMEM and BLK access modes, are currently out of scope for
ACPI. The BIOS reports the configuration to the OS via the NFIT, but the
configuration is currently written by vendor-specific tooling. Longer term
it would be great for this mechanism to become standardized and available
to the OS, but for now it requires platform-specific tooling to change the
DIMM interleave configuration.

> If you read through section 13.19 of the UEFI manual, it seems to imply
> that this is determined by the label area -- that each DIMM has a
> separate label area describing regions local to that DIMM; and that if
> you have 4 DIMMs you'll have 4 label areas, and each label area will
> have a label describing the DPA region on that DIMM which corresponds to
> the interleave set.  And somehow someone sets up the interleave sets and
> SPA based on what's written there.
>
> Which would mean that an operating system could change how the
> interleave sets work by rewriting the various labels on the DIMMs; for
> instance, changing a single 4-way set spanning the entirety of 4 DIMMs,
> to one 4-way set spanning half of 4 DIMMs, and 2 2-way sets spanning
> half of 2 DIMMs each.

If a DIMM supports both the PMEM and BLK mechanisms for accessing the
same DPA, then the label provides the disambiguation and tells the OS to
enforce one access mechanism per DPA at a time. Otherwise the OS has
no ability to affect the interleave-set configuration; it is all
initialized by platform BIOS/firmware before the OS boots.

>
> But then you say:
>
>> Unlike NVMe an NVDIMM itself has no concept of namespaces. Some DIMMs
>> provide a "label area" which is an out-of-band non-volatile memory
>> area where the OS can store whatever it likes. The UEFI 2.7
>> specification defines a data format for the definition of namespaces
>> on top of persistent memory ranges advertised to the OS via the ACPI
>> NFIT structure.
>
> OK, so that sounds like no, that's that what happens.  So where do the
> SPA range and interleave sets come from?
>
> Random guess: The BIOS / firmware makes it up.  Either it's hard-coded,
> or there's some menu in the BIOS you can use to change things around;
> but once it hits the operating system, that's it -- the mapping of SPA
> range onto interleave sets onto DIMMs is, from the operating system's
> point of view, fixed.

Correct.

> And so (here's another guess) -- when you're talking about namespaces
> and label areas, you're talking about namespaces stored *within a
> pre-existing SPA range*.  You use the same format as described in the
> UEFI spec, but ignore all the stuff about interleave sets and whatever,
> and use system physical addresses relative to the SPA range rather than
> DPAs.

Well, we don't ignore it, because we need to validate in the driver
that the interleave set configuration matches a checksum that we
generated when the namespace was first instantiated on the interleave
set. However, you are right: for accesses at run time, all we care
about is the SPA for PMEM accesses.
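
As a rough illustration of that validation, the check amounts to something
like the following; the per-DIMM identity fields and the hash here are
placeholders, not the UEFI-defined cookie algorithm or the real libnvdimm code.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: per-DIMM identity as the OS might collect it from
 * the NFIT. */
struct dimm_id {
	uint32_t vendor;
	uint32_t serial;
};

static uint64_t cookie_from_nfit(const struct dimm_id *dimms, unsigned n)
{
	uint64_t cookie = 0;
	unsigned i;

	for (i = 0; i < n; i++)          /* placeholder hash */
		cookie = cookie * 31 + dimms[i].vendor + dimms[i].serial;
	return cookie;
}

/* The label written at namespace creation recorded the cookie of the
 * interleave set it was created on. */
static bool labels_match_interleave_set(uint64_t label_cookie,
					const struct dimm_id *dimms, unsigned n)
{
	/* a mismatch means the labels are stale/corrupt or the set changed */
	return label_cookie == cookie_from_nfit(dimms, n);
}
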

>
> Is that right?
>
> But then there's things like this:
>
>> There is no obligation for an NVDIMM to provide a label area, and as
>> far as I know all NVDIMMs on the market today do not provide a label
>> area.
> [snip]
>> Linux supports "label-less" mode where it exposes
>> the raw capacity of a region in 1:1 mapped namespace without a label.
>> This is how Linux supports "legacy" NVDIMMs that do not support
>> labels.
>
> So are "all NVDIMMs on the market today" then classed as "legacy"
> NVDIMMs because they don't support labels?  And if labels are simply the
> NVDIMM equivalent of a partition table, then what does it mean to
> "support" or "not support" labels?

Yes, the term "legacy" has been thrown around for NVDIMMs that do not
support labels. The way this support is determined is whether the
platform publishes the _LSI, _LSR, and _LSW methods in ACPI (see:
6.5.10 NVDIMM Label Methods in ACPI 6.2a). I.e. each DIMM is
represented by an ACPI device object, and we query those 

Re: Draft NVDIMM proposal

2018-05-15 Thread Dan Williams
On Tue, May 15, 2018 at 5:26 AM, Jan Beulich  wrote:
 On 15.05.18 at 12:12,  wrote:
[..]
>> That is, each fsdax / devdax namespace has a superblock that, in part,
>> defines what parts are used for Linux and what parts are used for data.  Or
>> to put it a different way: Linux decides which parts of a namespace to use
>> for page structures, and writes it down in the metadata starting in the first
>> page of the namespace.
>
> And that metadata layout is agreed upon between all OS vendors?

The only agreed-upon metadata layouts across all OS vendors are the
ones that are specified in UEFI. We typically only need inter-OS and
UEFI compatibility for booting and other pre-OS accesses. For Linux,
"raw" and "sector" mode namespaces defined by namespace labels are
inter-OS compatible, while "fsdax", "devdax", and so-called
"label-less" configurations are not.


Re: use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description)

2018-05-15 Thread Dan Williams
On Mon, May 14, 2018 at 11:49 PM, Ingo Molnar  wrote:
>
> * Dan Williams  wrote:
>
>> On Mon, May 14, 2018 at 12:26 AM, Ingo Molnar  wrote:
>> >
>> > * Dan Williams  wrote:
>> >
>> >> Ingo, Thomas, Al, any concerns with this series?
>> >
>> > Yeah, so:
>> >
>> >"[PATCH v3 0/9] Series short description"
>> >
>> > ... isn't the catchiest of titles to capture my [all too easily distracted]
>> > attention! ;-)
>>
>> My bad! After that mistake it became a toss-up between more spam and
>> hoping the distraction would not throw you off.
>>
>> > I have marked it now for -tip processing. Linus was happy with this and 
>> > acked the
>> > approach, right?
>>
>> I think "happy" is a strong word when it comes to x86 machine check
>> handling. My interpretation is that he and Andy acquiesced that this
>> is about the best we can do with dax+mce as things stand today.
>
> So, how would you like to go about this series?
>
> To help move it forward I applied the first 5 commits to tip:x86/dax, on a
> vanilla v4.17-rc5 base, did some minor edits to the changelogs, tested it
> superficially (I don't have DAX so this essentially means build tests) and
> pushed out the result.

Thanks for that. Technically speaking, you do have dax, but setting up
our unit tests is currently not friction free, so I would not expect
you to go through that effort. Hopefully we can revive 0day running
our unit tests one of these days.

> Barring some later generic-x86 regression (unlikely) this looks good to me - 
> feel
> free to cross-pull that branch into your DAX/nvdimm tree.
>
> Or we could apply the remaining changes to -tip too - your call.

The remaining patches have developed a conflict with another topic
branch in the nvdimm tree, in particular "dax: introduce a
->copy_to_iter dax operation". I think the best course is for me to
rebase the remaining 4 on top of tip/x86/dax and carry the merge
conflict through the nvdimm tree.

> Thanks,
>
> Ingo

Thanks!


Re: [PATCH v6 2/4] ndctl, monitor: add ndctl monitor daemon

2018-05-15 Thread Dan Williams
On Tue, May 15, 2018 at 1:32 AM, Qi, Fuli  wrote:
[..]
>> Actually, I don't see a need to have LOG_DESTINATION_FILE at all. Why not 
>> just
>> do:
>>
>> ndctl monitor 2>file
>>
>> ...to redirect stderr to a file?
>
> In my understanding, this stderr redirection does not make sense when ndctl 
> monitor
> runs as a daemon, eg:
># ndctl monitor --logfile stderr --daemon 2>file
> What do you think?

True, and now that I look, dnsmasq allows the same with its
--log-facility option. Ok, let's keep this feature.

I appreciate the continued effort and patience.


Re: [External] Re: [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone

2018-05-15 Thread Matthew Wilcox
On Tue, May 15, 2018 at 04:07:28PM +, Huaisheng HS1 Ye wrote:
> > From: owner-linux...@kvack.org [mailto:owner-linux...@kvack.org] On Behalf 
> > Of Matthew
> > Wilcox
> > No.  In the current situation, the user knows that either the entire
> > page was written back from the pagecache or none of it was (at least
> > with a journalling filesystem).  With your proposal, we may have pages
> > splintered along cacheline boundaries, with a mix of old and new data.
> > This is completely unacceptable to most customers.
> 
> Dear Matthew,
> 
> Thanks for your great help; I really didn't consider this case.
> I want to make it a little bit clearer to me, so correct me if anything is
> wrong.
> 
> Is that to say this mix of old and new data in one page only has a chance to
> happen when the CPU fails to flush all dirty data from the LLC to the NVDIMM?
> But if an interrupt can be reported to the CPU, and the CPU successfully
> flushes all dirty data from its cache lines to the NVDIMM within the interrupt
> response function, this mix of old and new data can be avoided.

If you can keep the CPU and the memory (and all the buses between them)
alive for long enough after the power signal has been tripped, yes.
Talk to your hardware designers about what it will take to achieve this
:-) Be sure to ask about the number of retries which may be necessary
on the CPU interconnect to flush all data to an NV-DIMM attached to a
remote CPU.

> Current x86_64 CPUs use an N-way set-associative cache, and every cache line
> has 64 bytes.
> For a 4096-byte page, one page will be split into 64 (4096/64) cache lines. Is
> that right?

That's correct.

> > > > Then there's the problem of reconnecting the page cache (which is
> > > > pointed to by ephemeral data structures like inodes and dentries) to
> > > > the new inodes.
> > > Yes, it is not easy.
> > 
> > Right ... and until we have that ability, there's no point in this patch.
> We are focusing to realize this ability.

But is it the right approach?  So far we have (I think) two parallel
activities.  The first is for local storage, using DAX to store files
directly on the pmem.  The second is a physical block cache for network
filesystems (both NAS and SAN).  You seem to be wanting to supplant the
second effort, but I think it's much harder to reconnect the logical cache
(ie the page cache) than it is the physical cache (ie the block cache).



RE: [External] Re: [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone

2018-05-15 Thread Huaisheng HS1 Ye



> From: owner-linux...@kvack.org [mailto:owner-linux...@kvack.org] On Behalf Of 
> Matthew
> Wilcox
> Sent: Friday, May 11, 2018 12:28 AM
> On Wed, May 09, 2018 at 04:47:54AM +, Huaisheng HS1 Ye wrote:
> > > On Tue, May 08, 2018 at 02:59:40AM +, Huaisheng HS1 Ye wrote:
> > > > Currently in our mind, an ideal use scenario is that we put all page
> > > > caches into zone_nvm. Without any doubt, the page cache is an efficient
> > > > and common cache implementation, but it has the disadvantage that all
> > > > dirty data within it is at risk of being lost on power failure or system
> > > > crash. If we put all page caches into NVDIMMs, all dirty data will be
> > > > safe.
> > >
> > > That's a common misconception.  Some dirty data will still be in the
> > > CPU caches.  Are you planning on building servers which have enough
> > > capacitance to allow the CPU to flush all dirty data from LLC to NV-DIMM?
> > >
> > Sorry for not being clear.
> > For CPU caches, if there is a power failure, the NVDIMM has ADR to guarantee
> > that an interrupt will be reported to the CPU; an interrupt response
> > function should be responsible for flushing all dirty data to the NVDIMM.
> > If there is a system crash, perhaps the CPU won't have a chance to execute
> > this response.
> >
> > It is hard to make sure everything is safe; what we can do is just save the
> > dirty data which is already stored in the page cache, but not what is in the
> > CPU cache.
> > Is this an improvement over the current situation?
> 
> No.  In the current situation, the user knows that either the entire
> page was written back from the pagecache or none of it was (at least
> with a journalling filesystem).  With your proposal, we may have pages
> splintered along cacheline boundaries, with a mix of old and new data.
> This is completely unacceptable to most customers.

Dear Matthew,

Thanks for your great help; I really didn't consider this case.
I want to make it a little bit clearer to me, so correct me if anything is
wrong.

Is that to say this mix of old and new data in one page only has a chance to
happen when the CPU fails to flush all dirty data from the LLC to the NVDIMM?
But if an interrupt can be reported to the CPU, and the CPU successfully
flushes all dirty data from its cache lines to the NVDIMM within the interrupt
response function, this mix of old and new data can be avoided.

Current x86_64 CPUs use an N-way set-associative cache, and every cache line
has 64 bytes.
For a 4096-byte page, one page will be split into 64 (4096/64) cache lines. Is
that right?


> > > Then there's the problem of reconnecting the page cache (which is
> > > pointed to by ephemeral data structures like inodes and dentries) to
> > > the new inodes.
> > Yes, it is not easy.
> 
> Right ... and until we have that ability, there's no point in this patch.
We are focusing on realizing this ability.

Sincerely,
Huaisheng Ye






Re: Draft NVDIMM proposal

2018-05-15 Thread George Dunlap
On 05/11/2018 05:33 PM, Dan Williams wrote:
> [ adding linux-nvdimm ]
> 
> Great write up! Some comments below...

Thanks for the quick response!

It seems I still have some fundamental misconceptions about what's going
on, so I'd better start with that. :-)

Here's the part that I'm having a hard time getting.

If actual data on the NVDIMMs is a noun, and the act of writing is a
verb, then the SPA and interleave sets are adverbs: they define *how*
the write happens.  When the processor says, "Write to address X", the
memory controller converts address X into a <DIMM, DIMM-local address>
tuple to actually write the data.

So, who decides what this SPA range and interleave set is?  Can the
operating system change these interleave sets and mappings, or change
data from PMEM to BLK, and is so, how?

If you read through section 13.19 of the UEFI manual, it seems to imply
that this is determined by the label area -- that each DIMM has a
separate label area describing regions local to that DIMM; and that if
you have 4 DIMMs you'll have 4 label areas, and each label area will
have a label describing the DPA region on that DIMM which corresponds to
the interleave set.  And somehow someone sets up the interleave sets and
SPA based on what's written there.

Which would mean that an operating system could change how the
interleave sets work by rewriting the various labels on the DIMMs; for
instance, changing a single 4-way set spanning the entirety of 4 DIMMs,
to one 4-way set spanning half of 4 DIMMs, and 2 2-way sets spanning
half of 2 DIMMs each.

But then you say:

> Unlike NVMe an NVDIMM itself has no concept of namespaces. Some DIMMs
> provide a "label area" which is an out-of-band non-volatile memory
> area where the OS can store whatever it likes. The UEFI 2.7
> specification defines a data format for the definition of namespaces
> on top of persistent memory ranges advertised to the OS via the ACPI
> NFIT structure.

OK, so that sounds like no, that's that what happens.  So where do the
SPA range and interleave sets come from?

Random guess: The BIOS / firmware makes it up.  Either it's hard-coded,
or there's some menu in the BIOS you can use to change things around;
but once it hits the operating system, that's it -- the mapping of SPA
range onto interleave sets onto DIMMs is, from the operating system's
point of view, fixed.

And so (here's another guess) -- when you're talking about namespaces
and label areas, you're talking about namespaces stored *within a
pre-existing SPA range*.  You use the same format as described in the
UEFI spec, but ignore all the stuff about interleave sets and whatever,
and use system physical addresses relative to the SPA range rather than
DPAs.

Is that right?

But then there's things like this:

> There is no obligation for an NVDIMM to provide a label area, and as
> far as I know all NVDIMMs on the market today do not provide a label
> area.
[snip]
> Linux supports "label-less" mode where it exposes
> the raw capacity of a region in 1:1 mapped namespace without a label.
> This is how Linux supports "legacy" NVDIMMs that do not support
> labels.

So are "all NVDIMMs on the market today" then classed as "legacy"
NVDIMMs because they don't support labels?  And if labels are simply the
NVDIMM equivalent of a partition table, then what does it mean to
"support" or "not support" labels?

And then there's this:

> In any
> event we do the DIMM to SPA association first before reading labels.
> The OS calculates a so called "Interleave Set Cookie" from the NFIT
> information to compare against a similar value stored in the labels.
> This lets the OS determine that the Interleave Set composition has not
> changed from when the labels were initially written. An Interleave Set
> Cookie mismatch indicates the labels are stale, corrupted, or that the
> physical composition of the Interleave Set has changed.

So wait, the SPA and interleave sets can actually change?  And the
labels which the OS reads actually are per-DIMM, and do control somehow
how the DPA ranges of individual DIMMs are mapped into interleave sets
and exposed as SPAs?  (And perhaps, can be changed by the operating system?)

And:

> There are checksums in the Namespace definition to account label
> validity. Starting with ACPI 6.2 DSMs for labels are deprecated in
> favor of the new / named methods for label access _LSI, _LSR, and
> _LSW.

Does this mean the methods will use checksums to verify writes to the
label area, and refuse writes which create invalid labels?

If all of the above is true, then in what way can it be said that
"NVDIMM has no concept of namespaces", that an OS can "store whatever it
likes" in the label area, and that UEFI namespaces are "on top of
persistent memory ranges advertised to the OS via the ACPI NFIT structure"?

I'm sorry if this is obvious, but I am exactly as confused as I was
before I started writing this. :-)

This is all pretty foundational.  Xen can read static ACPI tables, but
it can't do AML.  So to do a 

Re: Draft NVDIMM proposal

2018-05-15 Thread George Dunlap


> On May 15, 2018, at 1:26 PM, Jan Beulich  wrote:
> 
 On 15.05.18 at 12:12,  wrote:
>>> On May 15, 2018, at 11:05 AM, Roger Pau Monne  wrote:
>>> On Fri, May 11, 2018 at 09:33:10AM -0700, Dan Williams wrote:
 [ adding linux-nvdimm ]
 
 Great write up! Some comments below...
 
 On Wed, May 9, 2018 at 10:35 AM, George Dunlap  
 wrote:
>> To use a namespace, an operating system needs at a minimum two pieces
>> of information: The UUID and/or Name of the namespace, and the SPA
>> range where that namespace is mapped; and ideally also the Type and
>> Abstraction Type to know how to interpret the data inside.
 
 Not necessarily, no. Linux supports "label-less" mode where it exposes
 the raw capacity of a region in 1:1 mapped namespace without a label.
 This is how Linux supports "legacy" NVDIMMs that do not support
 labels.
>>> 
>>> In that case, how does Linux know which area of the NVDIMM it should
>>> use to store the page structures?
>> 
>> The answer to that is right here:
>> 
>> `fsdax` and `devdax` mode are both designed to make it possible for
>> user processes to have direct mapping of NVRAM.  As such, both are
>> only suitable for PMEM namespaces (?).  Both also need to have kernel
>> page structures allocated for each page of NVRAM; this amounts to 64
>> bytes for every 4k of NVRAM.  Memory for these page structures can
>> either be allocated out of normal "system" memory, or inside the PMEM
>> namespace itself.
>> 
>> In both cases, an "info block", very similar to the BTT info block, is
>> written to the beginning of the namespace when created.  This info
>> block specifies whether the page structures come from system memory or
>> from the namespace itself.  If from the namespace itself, it contains
>> information about what parts of the namespace have been set aside for
>> Linux to use for this purpose.
>> 
>> That is, each fsdax / devdax namespace has a superblock that, in part, 
>> defines what parts are used for Linux and what parts are used for data.  Or 
>> to put it a different way: Linux decides which parts of a namespace to use 
>> for page structures, and writes it down in the metadata starting in the 
>> first 
>> page of the namespace.
> 
> And that metadata layout is agreed upon between all OS vendors?
> 
>> Linux has also defined "Type GUIDs" for these two types of namespace
>> to be stored in the namespace label, although these are not yet in the
>> ACPI spec.
 
 They never will be. One of the motivations for GUIDs is that an OS can
 define private ones without needing to go back and standardize them.
 Only GUIDs that are needed to inter-OS / pre-OS compatibility would
 need to be defined in ACPI, and there is no expectation that other
 OSes understand Linux's format for reserving page structure space.
>>> 
>>> Maybe it would be helpful to somehow mark those areas as
>>> "non-persistent" storage, so that other OSes know they can use this
>>> space for temporary data that doesn't need to survive across reboots?
>> 
>> In theory there’s no reason another OS couldn’t learn Linux’s format, 
>> discover where the blocks were, and use those blocks for its own purposes 
>> while Linux wasn’t running.
> 
> This looks to imply "no" to my question above, in which case I wonder how
> we would use (part of) the space when the "other" owner is e.g. Windows.

So in classic DOS partition tables, you have partition types; and various 
operating systems just sort of “claimed” numbers for themselves (e.g., NTFS, 
Linux Swap, ).  

But the DOS partition table number space is actually quite small.  So in 
namespaces, you have a similar concept, except that it’s called a “type GUID”, 
and it’s massively long — long enough anyone who wants to make a new type can 
simply generate one randomly and be pretty confident that nobody else is using 
that one.

So if the labels contain a TGUID you understand, you use it, just like you 
would a partition that you understand.  If it contains GUIDs you don’t 
understand, you’d better leave it alone.
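
A minimal sketch of that rule in C, with made-up GUID values and the label
reduced to the one relevant field:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A namespace label carries, among other fields, a 16-byte type GUID. */
struct ns_label {
	uint8_t type_guid[16];
	/* ... name, uuid, DPA range, checksum, etc. ... */
};

/* Hypothetical GUIDs this OS understands; the values here are made up. */
static const uint8_t known_tguids[][16] = {
	{ 0x11, 0x22, /* ... */ },    /* e.g. a raw/sector namespace format */
	{ 0x33, 0x44, /* ... */ },    /* e.g. this OS's own fsdax format    */
};

/* Claim only namespaces whose type GUID we recognise; leave the rest alone. */
static bool can_claim(const struct ns_label *l)
{
	size_t i;

	for (i = 0; i < sizeof(known_tguids) / sizeof(known_tguids[0]); i++)
		if (memcmp(l->type_guid, known_tguids[i], 16) == 0)
			return true;
	return false;
}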

 -George


Re: Draft NVDIMM proposal

2018-05-15 Thread Jan Beulich
>>> On 15.05.18 at 12:12,  wrote:
>> On May 15, 2018, at 11:05 AM, Roger Pau Monne  wrote:
>> On Fri, May 11, 2018 at 09:33:10AM -0700, Dan Williams wrote:
>>> [ adding linux-nvdimm ]
>>> 
>>> Great write up! Some comments below...
>>> 
>>> On Wed, May 9, 2018 at 10:35 AM, George Dunlap  
>>> wrote:
> To use a namespace, an operating system needs at a minimum two pieces
> of information: The UUID and/or Name of the namespace, and the SPA
> range where that namespace is mapped; and ideally also the Type and
> Abstraction Type to know how to interpret the data inside.
>>> 
>>> Not necessarily, no. Linux supports "label-less" mode where it exposes
>>> the raw capacity of a region in 1:1 mapped namespace without a label.
>>> This is how Linux supports "legacy" NVDIMMs that do not support
>>> labels.
>> 
>> In that case, how does Linux know which area of the NVDIMM it should
>> use to store the page structures?
> 
> The answer to that is right here:
> 
> `fsdax` and `devdax` mode are both designed to make it possible for
> user processes to have direct mapping of NVRAM.  As such, both are
> only suitable for PMEM namespaces (?).  Both also need to have kernel
> page structures allocated for each page of NVRAM; this amounts to 64
> bytes for every 4k of NVRAM.  Memory for these page structures can
> either be allocated out of normal "system" memory, or inside the PMEM
> namespace itself.
> 
> In both cases, an "info block", very similar to the BTT info block, is
> written to the beginning of the namespace when created.  This info
> block specifies whether the page structures come from system memory or
> from the namespace itself.  If from the namespace itself, it contains
> information about what parts of the namespace have been set aside for
> Linux to use for this purpose.
> 
> That is, each fsdax / devdax namespace has a superblock that, in part, 
> defines what parts are used for Linux and what parts are used for data.  Or 
> to put it a different way: Linux decides which parts of a namespace to use 
> for page structures, and writes it down in the metadata starting in the first 
> page of the namespace.

And that metadata layout is agreed upon between all OS vendors?

> Linux has also defined "Type GUIDs" for these two types of namespace
> to be stored in the namespace label, although these are not yet in the
> ACPI spec.
>>> 
>>> They never will be. One of the motivations for GUIDs is that an OS can
>>> define private ones without needing to go back and standardize them.
>>> Only GUIDs that are needed to inter-OS / pre-OS compatibility would
>>> need to be defined in ACPI, and there is no expectation that other
>>> OSes understand Linux's format for reserving page structure space.
>> 
>> Maybe it would be helpful to somehow mark those areas as
>> "non-persistent" storage, so that other OSes know they can use this
>> space for temporary data that doesn't need to survive across reboots?
> 
> In theory there’s no reason another OS couldn’t learn Linux’s format, 
> discover where the blocks were, and use those blocks for its own purposes 
> while Linux wasn’t running.

This looks to imply "no" to my question above, in which case I wonder how
we would use (part of) the space when the "other" owner is e.g. Windows.

Jan



Re: Draft NVDIMM proposal

2018-05-15 Thread George Dunlap


> On May 15, 2018, at 11:05 AM, Roger Pau Monne  wrote:
> 
> Just some replies/questions to some of the points raised below.
> 
> On Fri, May 11, 2018 at 09:33:10AM -0700, Dan Williams wrote:
>> [ adding linux-nvdimm ]
>> 
>> Great write up! Some comments below...
>> 
>> On Wed, May 9, 2018 at 10:35 AM, George Dunlap  
>> wrote:
 To use a namespace, an operating system needs at a minimum two pieces
 of information: The UUID and/or Name of the namespace, and the SPA
 range where that namespace is mapped; and ideally also the Type and
 Abstraction Type to know how to interpret the data inside.
>> 
>> Not necessarily, no. Linux supports "label-less" mode where it exposes
>> the raw capacity of a region in 1:1 mapped namespace without a label.
>> This is how Linux supports "legacy" NVDIMMs that do not support
>> labels.
> 
> In that case, how does Linux know which area of the NVDIMM it should
> use to store the page structures?

The answer to that is right here:

 `fsdax` and `devdax` mode are both designed to make it possible for
 user processes to have direct mapping of NVRAM.  As such, both are
 only suitable for PMEM namespaces (?).  Both also need to have kernel
 page structures allocated for each page of NVRAM; this amounts to 64
 bytes for every 4k of NVRAM.  Memory for these page structures can
 either be allocated out of normal "system" memory, or inside the PMEM
 namespace itself.
 
 In both cases, an "info block", very similar to the BTT info block, is
 written to the beginning of the namespace when created.  This info
 block specifies whether the page structures come from system memory or
 from the namespace itself.  If from the namespace itself, it contains
 information about what parts of the namespace have been set aside for
 Linux to use for this purpose.

That is, each fsdax / devdax namespace has a superblock that, in part, defines 
what parts are used for Linux and what parts are used for data.  Or to put it a 
different way: Linux decides which parts of a namespace to use for page 
structures, and writes it down in the metadata starting in the first page of 
the namespace.
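
For scale, the 64 bytes per 4 KiB mentioned above is 64/4096 = 1/64, or roughly 
1.6% overhead: 1 TiB of PMEM needs 1 TiB / 4 KiB = 268,435,456 page structures, 
i.e. 16 GiB at 64 bytes each, which helps explain why allocating them inside 
the namespace itself is attractive.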


 
 Linux has also defined "Type GUIDs" for these two types of namespace
 to be stored in the namespace label, although these are not yet in the
 ACPI spec.
>> 
>> They never will be. One of the motivations for GUIDs is that an OS can
>> define private ones without needing to go back and standardize them.
>> Only GUIDs that are needed to inter-OS / pre-OS compatibility would
>> need to be defined in ACPI, and there is no expectation that other
>> OSes understand Linux's format for reserving page structure space.
> 
> Maybe it would be helpful to somehow mark those areas as
> "non-persistent" storage, so that other OSes know they can use this
> space for temporary data that doesn't need to survive across reboots?

In theory there’s no reason another OS couldn’t learn Linux’s format, discover 
where the blocks were, and use those blocks for its own purposes while Linux 
wasn’t running.

But that won’t help Xen, as we want to use those blocks while Linux *is* 
running.

 -George



Re: Draft NVDIMM proposal

2018-05-15 Thread Roger Pau Monné
Just some replies/questions to some of the points raised below.

On Fri, May 11, 2018 at 09:33:10AM -0700, Dan Williams wrote:
> [ adding linux-nvdimm ]
> 
> Great write up! Some comments below...
> 
> On Wed, May 9, 2018 at 10:35 AM, George Dunlap  
> wrote:
> >> To use a namespace, an operating system needs at a minimum two pieces
> >> of information: The UUID and/or Name of the namespace, and the SPA
> >> range where that namespace is mapped; and ideally also the Type and
> >> Abstraction Type to know how to interpret the data inside.
> 
> Not necessarily, no. Linux supports "label-less" mode where it exposes
> the raw capacity of a region in 1:1 mapped namespace without a label.
> This is how Linux supports "legacy" NVDIMMs that do not support
> labels.

In that case, how does Linux know which area of the NVDIMM it should
use to store the page structures?

> >> `fsdax` and `devdax` mode are both designed to make it possible for
> >> user processes to have direct mapping of NVRAM.  As such, both are
> >> only suitable for PMEM namespaces (?).  Both also need to have kernel
> >> page structures allocated for each page of NVRAM; this amounts to 64
> >> bytes for every 4k of NVRAM.  Memory for these page structures can
> >> either be allocated out of normal "system" memory, or inside the PMEM
> >> namespace itself.
> >>
> >> In both cases, an "info block", very similar to the BTT info block, is
> >> written to the beginning of the namespace when created.  This info
> >> block specifies whether the page structures come from system memory or
> >> from the namespace itself.  If from the namespace itself, it contains
> >> information about what parts of the namespace have been set aside for
> >> Linux to use for this purpose.
> >>
> >> Linux has also defined "Type GUIDs" for these two types of namespace
> >> to be stored in the namespace label, although these are not yet in the
> >> ACPI spec.
> 
> They never will be. One of the motivations for GUIDs is that an OS can
> define private ones without needing to go back and standardize them.
> Only GUIDs that are needed to inter-OS / pre-OS compatibility would
> need to be defined in ACPI, and there is no expectation that other
> OSes understand Linux's format for reserving page structure space.

Maybe it would be helpful to somehow mark those areas as
"non-persistent" storage, so that other OSes know they can use this
space for temporary data that doesn't need to survive across reboots?

> >> # Proposed design / roadmap
> >>
> >> Initially, dom0 accesses the NVRAM as normal, using static ACPI tables
> >> and the DSM methods; mappings are treated by Xen during this phase as
> >> MMIO.
> >>
> >> Once dom0 is ready to pass parts of a namespace through to a guest, it
> >> makes a hypercall to tell Xen about the namespace.  It includes any
> >> regions of the namespace which Xen may use for 'scratch'; it also
> >> includes a flag to indicate whether this 'scratch' space may be used
> >> for frame tables from other namespaces.
> >>
> >> Frame tables are then created for this SPA range.  They will be
> >> allocated from, in this order: 1) designated 'scratch' range from
> >> within this namespace 2) designated 'scratch' range from other
> >> namespaces which has been marked as sharable 3) system RAM.
> >>
> >> Xen will either verify that dom0 has no existing mappings, or promote
> >> the mappings to full pages (taking appropriate reference counts for
> >> mappings).  Dom0 must ensure that this namespace is not unmapped,
> >> modified, or relocated until it asks Xen to unmap it.
> >>
> >> For Xen frame tables, to begin with, set aside a partition inside a
> >> namespace to be used by Xen.  Pass this in to Xen when activating the
> >> namespace; this could be either 2a or 3a from "Page structure
> >> allocation".  After that, we could decide which of the two more
> >> streamlined approaches (2b or 3b) to pursue.
> >>
> >> At this point, dom0 can pass parts of the mapped namespace into
> >> guests.  Unfortunately, passing files on a fsdax filesystem is
> >> probably not safe; but we can pass in full dev-dax or fsdax
> >> partitions.
> >>
> >> From a guest perspective, I propose we provide static NFIT only, no
> >> access to labels to begin with.  This can be generated in hvmloader
> >> and/or the toolstack acpi code.
> 
> I'm ignorant of Xen internals, but can you not reuse the existing QEMU
> emulation for labels and NFIT?

We only use QEMU for HVM guests, which would still leave PVH guests
without NVDIMM support. Ideally we would like to use the same solution
for both HVM and PVH, which means QEMU cannot be part of that
solution.

Thanks, Roger.


RE: [PATCH v6 2/4] ndctl, monitor: add ndctl monitor daemon

2018-05-15 Thread Qi, Fuli
> -Original Message-
> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Sent: Saturday, May 12, 2018 3:45 AM
> To: Qi, Fuli/斉 福利 
> Cc: linux-nvdimm 
> Subject: Re: [PATCH v6 2/4] ndctl, monitor: add ndctl monitor daemon
> 
> On Sun, May 6, 2018 at 10:09 PM, QI Fuli  wrote:
> > This patch adds the body file of ndctl monitor daemon.
> 
> This is too short. Let's copy your cover letter details into this patch since 
> the cover
> letter is thrown away, but the commit messages are preserved in git:

Thanks for your comments.
I will write more details.

> ---
> 
> ndctl monitor daemon, a tiny daemon to monitor the smart events of nvdimm
> DIMMs. Users can run the monitor as a one-shot command or as a daemon in the
> background by using the [--daemon] option. DIMMs to monitor can be selected
> with the [--dimm] [--bus] [--region] [--namespace] options; these options
> support multiple space-separated arguments. When a smart event fires, the
> monitor daemon will log the notifications, which include DIMM health status,
> to syslog or to a logfile chosen with the [--logfile=] option. The monitor can
> also output the notifications to stderr when it runs as a one-shot command by
> setting [--logfile=]. The notifications follow JSON format and can be consumed
> by log collectors like Fluentd. Users can change the configuration of the
> monitor by editing the default configuration file /etc/ndctl/monitor.conf or
> by using the [--config-file=] option to override the default configuration.
> 
> Users can start a monitor daemon by the following command:
>  # ndctl monitor --daemon --logfile /var/log/ndctl/monitor.log
> 
> Also, a monitor daemon can be started by systemd:
>  # systemctl start ndctl-monitor.service
> In this case, the monitor daemon follows the default configuration file
> /etc/ndctl/monitor.conf.
> 
> ---
> 
> However, now that I re-read this description, what about the other event types
> (beyond health) on other objects (beyond DIMMs)? This should behave like the
> 'list' command, where we have filter parameters for devices to monitor *and*
> event types for events to include:
> 
> dimm-events=""
> namespace-events=""
> region-events=""
> bus-events=""
> bus=""
> dimm=""
> region=""
> namespace=""
> 
> We don't need to support all of this in the first implementation, but see more
> comments below; I think there are some changes we can make to start down this
> path.
> 
> >
> > Signed-off-by: QI Fuli 
> > ---
> >  builtin.h |   1 +
> >  ndctl/Makefile.am |   3 +-
> >  ndctl/monitor.c   | 460
> ++
> >  ndctl/ndctl.c |   1 +
> >  4 files changed, 464 insertions(+), 1 deletion(-)  create mode 100644
> > ndctl/monitor.c
> >
> > diff --git a/builtin.h b/builtin.h
> > index d3cc723..675a6ce 100644
> > --- a/builtin.h
> > +++ b/builtin.h
> > @@ -39,6 +39,7 @@ int cmd_inject_error(int argc, const char **argv,
> > void *ctx);  int cmd_wait_scrub(int argc, const char **argv, void
> > *ctx);  int cmd_start_scrub(int argc, const char **argv, void *ctx);
> > int cmd_list(int argc, const char **argv, void *ctx);
> > +int cmd_monitor(int argc, const char **argv, void *ctx);
> >  #ifdef ENABLE_TEST
> >  int cmd_test(int argc, const char **argv, void *ctx);  #endif diff
> > --git a/ndctl/Makefile.am b/ndctl/Makefile.am index d22a379..7dbf223
> > 100644
> > --- a/ndctl/Makefile.am
> > +++ b/ndctl/Makefile.am
> > @@ -16,7 +16,8 @@ ndctl_SOURCES = ndctl.c \
> > util/json-smart.c \
> > util/json-firmware.c \
> > inject-error.c \
> > -   inject-smart.c
> > +   inject-smart.c \
> > +   monitor.c
> >
> >  if ENABLE_DESTRUCTIVE
> >  ndctl_SOURCES += ../test/blk_namespaces.c \ diff --git
> > a/ndctl/monitor.c b/ndctl/monitor.c new file mode 100644 index
> > 000..ab6e701
> > --- /dev/null
> > +++ b/ndctl/monitor.c
> > @@ -0,0 +1,460 @@
> > +/*
> > + * Copyright(c) 2018, FUJITSU LIMITED. All rights reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > +modify it
> > + * under the terms and conditions of the GNU Lesser General Public
> > +License,
> > + * version 2.1, as published by the Free Software Foundation.
> 
> Did you intend for this to be LGPL-2.1?
> 
> The licensing we have been doing to date is GPL-2.0 for the utility code 
> (i.e. ndctl/)
> especially because it copies from git which is GPL2.0. LGPL-2.1 is for the 
> library
> routines (i.e. ndctl/lib/). The intent is for applications to be able to use 
> the library
> without needing to share their application source, but improvements to the 
> utility
> are shared back with the project.

I will change it to GPL-2.0.

> > + *
> > + * This program is distributed in the hope it will be useful, but
> > + WITHOUT ANY
> > + * WARRANTY; without even the implied warranty of MERCHANTABILITY 

Re: use memcpy_mcsafe() for copy_to_iter() (was: Re: [PATCH v3 0/9] Series short description)

2018-05-15 Thread Ingo Molnar

* Dan Williams  wrote:

> On Mon, May 14, 2018 at 12:26 AM, Ingo Molnar  wrote:
> >
> > * Dan Williams  wrote:
> >
> >> Ingo, Thomas, Al, any concerns with this series?
> >
> > Yeah, so:
> >
> >"[PATCH v3 0/9] Series short description"
> >
> > ... isn't the catchiest of titles to capture my [all too easily distracted]
> > attention! ;-)
> 
> My bad! After that mistake it became a toss-up between more spam and
> hoping the distraction would not throw you off.
> 
> > I have marked it now for -tip processing. Linus was happy with this and 
> > acked the
> > approach, right?
> 
> I think "happy" is a strong word when it comes to x86 machine check
> handling. My interpretation is that he and Andy acquiesced that this
> is about the best we can do with dax+mce as things stand today.

So, how would you like to go about this series?

To help move it forward I applied the first 5 commits to tip:x86/dax, on a
vanilla v4.17-rc5 base, did some minor edits to the changelogs, tested it
superficially (I don't have DAX so this essentially means build tests) and
pushed out the result.

Barring some later generic-x86 regression (unlikely) this looks good to me - 
feel 
free to cross-pull that branch into your DAX/nvdimm tree.

Or we could apply the remaining changes to -tip too - your call.

Thanks,

Ingo