Re: RFC: prepare for struct scatterlist entries without page backing

2015-08-13 Thread Julian Calaby
Hi Christoph,

On Fri, Aug 14, 2015 at 12:35 AM, Christoph Hellwig wrote:
> On Thu, Aug 13, 2015 at 09:37:37AM +1000, Julian Calaby wrote:
>> I.e. ~90% of this patch set seems to be just mechanically dropping
>> BUG_ON()s and converting open coded stuff to use accessor functions
>> (which should be macros or get inlined, right?) - and the remaining
>> bit is not flushing if we don't have a physical page somewhere.
>
> Which is was 90%.  By lines changed most actually is the diffs for
> the cache flushing.

I was talking in terms of changes made, not lines changed: by my
recollection, about a third of the patches didn't touch flush calls
and most of the lines changed looked like refactoring so that making
the flush call conditional would be easier.

I guess it smelled like you were doing lots of distinct changes in a
single patch and I got my numbers wrong.

>> Would it make sense to split this patch set into a few bits: one to
>> drop all the useless BUG_ON()s, one to convert all the open coded
>> stuff to accessor functions, then another to do the actual page-less
>> sg stuff?
>
> Without the ifs the BUG_ON() actually are useful to assert we
> never feed the sort of physical addresses we can't otherwise support,
> so I don't think that part is doable.

My point is that there's a couple of patches that only remove
BUG_ON()s, which implies that for that particular driver it doesn't
matter if there's a physical page or not, so therefore that code is
purely "documentation".

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RFC: prepare for struct scatterlist entries without page backing

2015-08-13 Thread Boaz Harrosh
On 08/13/2015 05:40 PM, Christoph Hellwig wrote:
> On Wed, Aug 12, 2015 at 03:42:47PM +0300, Boaz Harrosh wrote:
>> The support I have suggested and submitted for zone-less sections.
>> (In my add_persistent_memory() patchset)
>>
>> Would work perfectly well and transparent for all such multimedia cases.
>> (All hacks removed). In fact I have loaded pmem (with-pages) on a VRAM
>> a few times and it is great easy fun. (I wanted to experiment with cached
>> memory over a pcie)
> 
> And everyone agree that it was both buggy and incomplete.
> 

What? No one ever said anything about bugs. This is the first I have ever
heard of it. I was always under the impression that no one had even tried it out.

I have been smoking these page-full NVDIMMs for more than a year, with RDMA to
peers and swap-out to disks. So it is not that bad, I would say.

> Dan has done a respin of the page backed nvdimm work with most of
> these comments addressed.
> 

I would love some comments. All I have got so far is silence. (And I do not
like Dan's patches; comments will come next week.)

> I have to say I hate both pfn-based I/O [1] and page backed nvdimms with
> passion, so we're looking into the lesser evil with an open mind.
> 
> [1] not the SGL part posted here, which I think is quite sane.  The bio
> side is much worse, though.
> 

What can I say? I like the page-backed NVDIMMs. And the long-term goal for me
is 2M pages. I hope we can sit down one day soon and you can explain to me
what's evil about it. I would really, really like to understand.

Thanks though
Boaz



Re: RFC: prepare for struct scatterlist entries without page backing

2015-08-13 Thread Christoph Hellwig
On Wed, Aug 12, 2015 at 03:42:47PM +0300, Boaz Harrosh wrote:
> The support I have suggested and submitted for zone-less sections.
> (In my add_persistent_memory() patchset)
>
> Would work perfectly well and transparent for all such multimedia cases.
> (All hacks removed). In fact I have loaded pmem (with-pages) on a VRAM
> a few times and it is great easy fun. (I wanted to experiment with cached
> memory over a pcie)

And everyone agreed that it was both buggy and incomplete.

Dan has done a respin of the page backed nvdimm work with most of
these comments addressed.

I have to say I hate both pfn-based I/O [1] and page backed nvdimms with
passion, so we're looking into the lesser evil with an open mind.

[1] not the SGL part posted here, which I think is quite sane.  The bio
side is much worse, though.


Re: RFC: prepare for struct scatterlist entries without page backing

2015-08-13 Thread Christoph Hellwig
On Thu, Aug 13, 2015 at 09:37:37AM +1000, Julian Calaby wrote:
> I.e. ~90% of this patch set seems to be just mechanically dropping
> BUG_ON()s and converting open coded stuff to use accessor functions
> (which should be macros or get inlined, right?) - and the remaining
> bit is not flushing if we don't have a physical page somewhere.

Which was the 90%.  By lines changed, most of it is actually the diffs for
the cache flushing.

> Would it make sense to split this patch set into a few bits: one to
> drop all the useless BUG_ON()s, one to convert all the open coded
> stuff to accessor functions, then another to do the actual page-less
> sg stuff?

Without the ifs, the BUG_ON()s are actually useful to assert that we
never feed in the sort of physical addresses we can't otherwise support,
so I don't think that part is doable.

A simple series to make more use of sg_phys and add sg_pfn might
still be useful, though.
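
To illustrate the kind of accessors such a series would lean on, here is a minimal user-space sketch. Everything in it is an assumption made for the example: the `toy_sg` type, the `toy_sg_pfn` and `toy_sg_phys` names, and the 4 KiB page size. It is not the kernel's struct scatterlist, and sg_pfn is only a helper proposed in this thread.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * Toy stand-in for a scatterlist entry. It stores a page frame number
 * directly instead of a struct page pointer, so an entry can describe
 * memory that has no page structure behind it.
 */
struct toy_sg {
    uint64_t pfn;     /* page frame number of the page holding the data */
    uint32_t offset;  /* byte offset of the data within that page */
    uint32_t length;  /* length of the data in bytes */
};

/* Hypothetical accessor: callers never open-code the field access. */
static inline uint64_t toy_sg_pfn(const struct toy_sg *sg)
{
    return sg->pfn;
}

/* Hypothetical accessor: physical address of the first data byte. */
static inline uint64_t toy_sg_phys(const struct toy_sg *sg)
{
    return (toy_sg_pfn(sg) << TOY_PAGE_SHIFT) + sg->offset;
}
```

With an entry of pfn 0x1234 and offset 0x10, `toy_sg_phys()` yields 0x1234010. If every DMA mapping implementation went through accessors like these, only the accessors would need to change when the backing representation changes.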


Re: RFC: prepare for struct scatterlist entries without page backing

2015-08-12 Thread Julian Calaby
Hi,

On Wed, Aug 12, 2015 at 10:42 PM, Boaz Harrosh wrote:
> On 08/12/2015 10:05 AM, Christoph Hellwig wrote:
>> It turns out most DMA mapping implementation can handle SGLs without
>> page structures with some fairly simple mechanical work.  Most of it
>> is just about consistently using sg_phys.  For implementations that
>> need to flush caches we need a new helper that skips these cache
>> flushes if a entry doesn't have a kernel virtual address.
>>
>> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
>> to be operate mostly on virtual addresses.  It's a fairly odd concept
>> that I don't fully grasp, so I'll need some help with those if we want
>> to bring this forward.
>>
>> Additional this series skips ARM entirely for now.  The reason is
>> that most arm implementations of the .map_sg operation just iterate
>> over all entries and call ->map_page for it, which means we'd need
>> to convert those to a ->map_pfn similar to Dan's previous approach.
>>
>
[snip]
>
> It is a bit of work but is worth while, and accelerating tremendously
> lots of workloads. Not like this abomination which only branches
> things more and more, and making things fatter and slower.

As a random guy reading a big bunch of patches on code I know almost
nothing about, parts of this comment really resonated with me:
overall, we seem to be adding a lot of if statements to code that
appears to be in a hot path.

I.e. ~90% of this patch set seems to be just mechanically dropping
BUG_ON()s and converting open coded stuff to use accessor functions
(which should be macros or get inlined, right?) - and the remaining
bit is not flushing if we don't have a physical page somewhere.

Would it make sense to split this patch set into a few bits: one to
drop all the useless BUG_ON()s, one to convert all the open coded
stuff to accessor functions, then another to do the actual page-less
sg stuff?
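
For concreteness, the "not flushing" part can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: `toy_sg`, `toy_flush_range`, `toy_map_sg` and the identity "mapping" are all made up for illustration, and none of it is taken from the actual patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy scatterlist entry; vaddr == NULL models "no kernel virtual address". */
struct toy_sg {
    void    *vaddr;
    uint64_t phys;
    uint32_t length;
};

/* Counts flushes so the effect of the conditional can be observed. */
static unsigned int flush_count;

/* Stand-in for an architecture cache flush; it only records the call. */
static void toy_flush_range(void *vaddr, uint32_t len)
{
    (void)vaddr;
    (void)len;
    flush_count++;
}

/*
 * Hypothetical map_sg loop: flush only the entries that have a kernel
 * virtual address, then hand back a (here: identity) DMA address.
 */
static size_t toy_map_sg(const struct toy_sg *sgl, size_t nents,
                         uint64_t *dma_out)
{
    for (size_t i = 0; i < nents; i++) {
        if (sgl[i].vaddr)  /* the extra branch under discussion */
            toy_flush_range(sgl[i].vaddr, sgl[i].length);
        dma_out[i] = sgl[i].phys;
    }
    return nents;
}
```

Mapping a two-entry list where only the first entry has a virtual address results in exactly one flush. The cost being debated is the single `if` per entry inside the loop.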

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/


Re: RFC: prepare for struct scatterlist entries without page backing

2015-08-12 Thread Grant Grundler
On Wed, Aug 12, 2015 at 10:00 AM, James Bottomley wrote:
> On Wed, 2015-08-12 at 09:05 +0200, Christoph Hellwig wrote:
...
>> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
>> to be operate mostly on virtual addresses.  It's a fairly odd concept
>> that I don't fully grasp, so I'll need some help with those if we want
>> to bring this forward.

James explained the primary function of IOMMUs on parisc (DMA-Cache
coherency) much better than I ever could.

Three more observations:
1) the IOMMU can be bypassed by 64-bit DMA devices on IA64.

2) IOMMU enables 32-bit DMA devices to reach > 32-bit physical memory
and thus avoid bounce buffers. parisc and older IA-64 have some
32-bit PCI devices - e.g. IDE boot HDD.

3) IOMMU acts as a proxy for IO devices by fetching cachelines of data
for PA-RISC systems whose memory controllers ONLY serve cacheline
sized transactions. i.e. 32-bit DMA results in the IOMMU fetching the
cacheline and updating just the 32 bits in a DMA cache coherent
fashion.

Bonus thought:
4) IOMMU can improve DMA performance in some cases using "hints"
provided by the OS (e.g. prefetching DMA data or using READ_CURRENT
bus transactions instead of normal memory fetches.)
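
Observation 3 can be pictured with a small user-space model. It is a sketch only: the 64-byte line size, the `memory` array, and the `mem_read_line`, `mem_write_line` and `iommu_dma_write32` names are assumptions for illustration and say nothing about how the real hardware is built.

```c
#include <stdint.h>
#include <string.h>

#define CACHELINE 64  /* assumption: 64-byte cachelines */

static uint8_t memory[4 * CACHELINE];
static unsigned int line_reads, line_writes;

/* The modelled memory controller only serves whole-cacheline transfers. */
static void mem_read_line(size_t line, uint8_t *out)
{
    memcpy(out, &memory[line * CACHELINE], CACHELINE);
    line_reads++;
}

static void mem_write_line(size_t line, const uint8_t *in)
{
    memcpy(&memory[line * CACHELINE], in, CACHELINE);
    line_writes++;
}

/*
 * The IOMMU as a proxy: a 32-bit device write becomes "fetch the line,
 * patch 4 bytes, write the line back". addr is assumed to be 4-byte
 * aligned, so the write never straddles two lines.
 */
static void iommu_dma_write32(size_t addr, uint32_t value)
{
    uint8_t line[CACHELINE];
    size_t idx = addr / CACHELINE;
    size_t off = addr % CACHELINE;

    mem_read_line(idx, line);
    memcpy(&line[off], &value, sizeof(value));
    mem_write_line(idx, line);
}
```

A single `iommu_dma_write32()` call costs one full-line read and one full-line write in this model, while the neighbouring bytes in the line keep their previous contents.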

cheers,
grant


Re: RFC: prepare for struct scatterlist entries without page backing

2015-08-12 Thread James Bottomley
On Wed, 2015-08-12 at 09:05 +0200, Christoph Hellwig wrote:
> Dan Williams started to look into addressing I/O to and from
> Persistent Memory in his series from June:
> 
>   http://thread.gmane.org/gmane.linux.kernel.cross-arch/27944
> 
> I've started looking into DMA mapping of these SGLs specifically instead
> of the map_pfn method in there.  In addition to supporting NVDIMM backed
> I/O I also suspect this would be highly useful for media drivers that
> go through nasty hoops to be able to DMA from/to their ioremapped regions,
> with vb2_dc_get_userptr in drivers/media/v4l2-core/videobuf2-dma-contig.c
> being a prime example for the unsafe hacks currently used.
> 
> It turns out most DMA mapping implementation can handle SGLs without
> page structures with some fairly simple mechanical work.  Most of it
> is just about consistently using sg_phys.  For implementations that
> need to flush caches we need a new helper that skips these cache
> flushes if a entry doesn't have a kernel virtual address.
> 
> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
> to be operate mostly on virtual addresses.  It's a fairly odd concept
> that I don't fully grasp, so I'll need some help with those if we want
> to bring this forward.

I can explain that.  I think this doesn't apply to ia64 because its
cache is PIPT, but on parisc, we have a VIPT cache.

On normal physically indexed architectures, when the iommu sees a DMA
transfer to/from physical memory, it also notifies the CPU to flush the
internal CPU caches of those lines.  This is usually an interlocking
step of the transfer to make sure the page is coherent before transfer
to/from the device (it's why the ia32 for instance is a coherent
architecture).  Because the system is physically indexed, there's no
need to worry about aliases.

On Virtually Indexed systems, like parisc, there is an aliasing problem.
The CCIO iommu unit (and all other iommu systems on parisc) have what's
called a local coherence index (LCI).  You program it as part of the
IOMMU page table and it tells the system which Virtual line in the cache
to flush as part of the IO transaction, thus still ensuring cache
coherence.  That's why we have to know the virtual as well as physical
addresses for the page.  The problem we have in Linux is that we have
two virtual addresses, which are often incoherent aliases: the user
virtual address and a kernel virtual address but we can only make the
page coherent with a single alias (only one LCI).  The way I/O on Linux
currently works is that get_user_pages actually flushes the user virtual
address, so that's expected to be coherent, so the address we program
into the LCI is the kernel virtual address.  Usually nothing in the
kernel has ever touched the page, so there's nothing to flush, but we do
it just in case.

In theory, for these non kernel page backed SG entries, we can make the
process more efficient by not flushing in gup and instead programming
the user virtual address into the local coherence index.  However,
simply zeroing the LCI will also work (except that poor VI zero line
will get flushed repeatedly, so it's probably best to pick a known
untouched line in the kernel).
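
The choice described above can be sketched in user-space C. This is a toy under loud assumptions: the index function (line number modulo a made-up cache size) does not reflect how real parisc hardware derives the coherence index, and `toy_coherence_index`, `toy_pick_lci` and the "spare line" parameter are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_LINE_SHIFT  6      /* assumption: 64-byte cache lines */
#define TOY_CACHE_LINES 1024u  /* assumption: number of indexable lines */

/* Toy index: which virtually indexed line a virtual address falls into. */
static uint32_t toy_coherence_index(uintptr_t vaddr)
{
    return (uint32_t)((vaddr >> TOY_LINE_SHIFT) % TOY_CACHE_LINES);
}

/*
 * Pick the index to program into the IOMMU page table entry:
 * - page-backed entry: use the kernel virtual address of the page;
 * - entry without a kernel mapping: fall back to a known untouched
 *   kernel line instead of index zero, so that no frequently used
 *   line gets flushed on every transfer.
 */
static uint32_t toy_pick_lci(const void *kernel_vaddr, const void *spare_line)
{
    const void *v = kernel_vaddr ? kernel_vaddr : spare_line;

    return toy_coherence_index((uintptr_t)v);
}
```

Programming the user virtual address instead, as suggested as the more efficient option, would correspond to passing that address in place of `kernel_vaddr`.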

James




Re: RFC: prepare for struct scatterlist entries without page backing

2015-08-12 Thread Boaz Harrosh
On 08/12/2015 10:05 AM, Christoph Hellwig wrote:
> Dan Williams started to look into addressing I/O to and from
> Persistent Memory in his series from June:
> 
>   http://thread.gmane.org/gmane.linux.kernel.cross-arch/27944
> 
> I've started looking into DMA mapping of these SGLs specifically instead
> of the map_pfn method in there.  In addition to supporting NVDIMM backed
> I/O I also suspect this would be highly useful for media drivers that
> go through nasty hoops to be able to DMA from/to their ioremapped regions,
> with vb2_dc_get_userptr in drivers/media/v4l2-core/videobuf2-dma-contig.c
> being a prime example for the unsafe hacks currently used.
> 

The support I have suggested and submitted for zone-less sections
(in my add_persistent_memory() patchset) would work perfectly well and
transparently for all such multimedia cases, with all hacks removed.
In fact, I have loaded pmem (with pages) on VRAM a few times and it is
great, easy fun. (I wanted to experiment with cached memory over PCIe.)

> It turns out most DMA mapping implementation can handle SGLs without
> page structures with some fairly simple mechanical work.  Most of it
> is just about consistently using sg_phys.  For implementations that
> need to flush caches we need a new helper that skips these cache
> flushes if a entry doesn't have a kernel virtual address.
> 
> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
> to be operate mostly on virtual addresses.  It's a fairly odd concept
> that I don't fully grasp, so I'll need some help with those if we want
> to bring this forward.
> 
> Additional this series skips ARM entirely for now.  The reason is
> that most arm implementations of the .map_sg operation just iterate
> over all entries and call ->map_page for it, which means we'd need
> to convert those to a ->map_pfn similar to Dan's previous approach.
> 

All this endless work for nothing more than uglifying the kernel, and
it will never end, when a real and fully working solution has been right
here for more than a year.

If you are really up for a deep audit and a mammoth testing effort,
why not do a worthier, order-of-magnitude smaller piece of work and support
2M and 1G variable-sized "pages"? All the virtual-vs-physical-vs-caching
just works.

Most of the core work is there. The block layer and lots of other subsystems
already support sending a single page pointer with bvec_offset and bvec_len
bigger than 4K. Other systems will need small fixes sprinkled around, but
nothing like this endless stream of patches to one subsystem after another.
And for what?
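
As a rough illustration of a single segment that is larger than 4K, here is a user-space sketch. The `toy_bvec` type and `toy_bvec_span` helper are hypothetical and are not the kernel's bio_vec; the 4 KiB base page size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u  /* assumption: 4 KiB base pages */

/* Toy segment: one page pointer plus an offset and a length. */
struct toy_bvec {
    void    *page;    /* opaque handle for the first page */
    uint64_t offset;  /* byte offset of the data from that page's start */
    uint64_t len;     /* length in bytes, may exceed one base page */
};

/* Number of 4 KiB base pages that the segment touches. */
static uint64_t toy_bvec_span(const struct toy_bvec *bv)
{
    if (bv->len == 0)
        return 0;
    return (bv->offset % TOY_PAGE_SIZE + bv->len + TOY_PAGE_SIZE - 1)
           / TOY_PAGE_SIZE;
}
```

A 2M segment starting at offset 0 spans 512 base pages but is still described by one page pointer, one offset and one length.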

The novelty of pages is the section object: the section is reached
from page*, from the virtual as well as the physical planes, and it is a
center that translates from all planes to all planes. You keep this concept
and only make 2M-page sections and 1G-page sections.

It is a bit of work but it is worthwhile, and it accelerates lots of
workloads tremendously. Not like this abomination, which only branches
things more and more, making things fatter and slower.

It all feels like a typhoon, the inertia of tons and tons of
man-hours of work in a huge wave. How will you ever stop such a
rushing mass? I'm trying to duck under, but surely it makes me sad.

Thanks
Boaz


