Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love

2018-04-28 Thread Matthew Wilcox
On Sat, Apr 28, 2018 at 09:46:52PM +0200, Julia Lawall wrote:
> FWIW, here is my semantic patch and the output - it reports on things that
> appear to be too small and things that it doesn't know about.
> 
> What are the relevant pci wrappers?  I didn't find them.

Basically all of the functions in include/linux/pci-dma-compat.h

> too small: drivers/gpu/drm/i915/i915_drv.c:1138: 30
> too small: drivers/net/wireless/broadcom/b43/dma.c:1068: 30
> unknown: sound/pci/ctxfi/cthw20k2.c:2033: DMA_BIT_MASK(dma_bits)
> unknown: sound/pci/ctxfi/cthw20k2.c:2034: DMA_BIT_MASK(dma_bits)

This one's good:

const unsigned int dma_bits = BITS_PER_LONG;

> unknown: drivers/scsi/megaraid/megaraid_sas_base.c:6036: consistent_mask

and this one:
consistent_mask = (instance->adapter_type == VENTURA_SERIES) ?
DMA_BIT_MASK(64) : DMA_BIT_MASK(32);

> unknown: drivers/net/wireless/ath/wil6210/txrx.c:200: 
> DMA_BIT_MASK(wil->dma_addr_size)

if (wil->dma_addr_size > 32)
dma_set_mask_and_coherent(dev,
  DMA_BIT_MASK(wil->dma_addr_size));

> unknown: drivers/net/ethernet/netronome/nfp/nfp_main.c:452: 
> DMA_BIT_MASK(NFP_NET_MAX_DMA_BITS)

drivers/net/ethernet/netronome/nfp/nfp_net.h:#define NFP_NET_MAX_DMA_BITS   
40

> unknown: drivers/gpu/host1x/dev.c:199: host->info->dma_mask

Looks safe ...

drivers/gpu/host1x/bus.c:   device->dev.coherent_dma_mask = 
host1x->dev->coherent_dma_mask;
drivers/gpu/host1x/bus.c:   device->dev.dma_mask = 
>dev.coherent_dma_mask;
drivers/gpu/host1x/dev.c:   .dma_mask = DMA_BIT_MASK(32),
drivers/gpu/host1x/dev.c:   .dma_mask = DMA_BIT_MASK(32),
drivers/gpu/host1x/dev.c:   .dma_mask = DMA_BIT_MASK(34),
drivers/gpu/host1x/dev.c:   .dma_mask = DMA_BIT_MASK(34),
drivers/gpu/host1x/dev.c:   .dma_mask = DMA_BIT_MASK(34),
drivers/gpu/host1x/dev.c:   dma_set_mask_and_coherent(host->dev, 
host->info->dma_mask);
drivers/gpu/host1x/dev.h:   u64 dma_mask; /* mask of addressable memory */

... but that reminds us that maybe some drivers aren't using dma_set_mask()
but rather touching dma_mask directly.

... 57 more to look at ...


Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love

2018-04-28 Thread Julia Lawall


On Sat, 28 Apr 2018, Luis R. Rodriguez wrote:

> On Sat, Apr 28, 2018 at 01:42:21AM -0700, Christoph Hellwig wrote:
> > On Fri, Apr 27, 2018 at 04:14:56PM +, Luis R. Rodriguez wrote:
> > > Do we have a list of users for x86 with a small DMA mask?
> > > Or, given that I'm not aware of a tool to be able to look
> > > for this in an easy way, would it be good to find out which
> > > x86 drivers do have a small mask?
> >
> > Basically you'll have to grep for calls to dma_set_mask/
> > dma_set_coherent_mask/dma_set_mask_and_coherent and their pci_*
> > wrappers with masks smaller 32-bit.  Some use numeric values,
> > some use DMA_BIT_MASK and various places uses local variables
> > or struct members to parse them, so finding them will be a bit
> > more work.  Nothing a coccinelle expert couldn't solve, though :)
>
> Thing is unless we have a specific flag used consistently I don't believe we
> can do this search with Coccinelle. ie, if we have local variables and based 
> on
> some series of variables things are set, this makes the grammatical expression
> difficult to express.  So Cocinelle is not designed for this purpose.
>
> But I believe smatch [0] is intended exactly for this sort of purpose, is that
> right Dan? I gave a cursory look and I think it'd take me significant time to
> get such hunt down.
>
> [0] https://lwn.net/Articles/691882/

FWIW, here is my semantic patch and the output - it reports on things that
appear to be too small and things that it doesn't know about.

What are the relevant pci wrappers?  I didn't find them.

julia

@initialize:ocaml@
@@

let clean s = String.concat "" (Str.split (Str.regexp " ") s)

let shorten s = List.nth (Str.split (Str.regexp "linux-next/") s) 1

@bad1 exists@
identifier i,x;
expression e;
position p;
@@

x = DMA_BIT_MASK(i)
...
\(dma_set_mask@p\|dma_set_coherent_mask@p\|dma_set_mask_and_coherent@p\)(e,x)

@bad2@
identifier i;
expression e;
position p;
@@

\(dma_set_mask@p\|dma_set_coherent_mask@p\|dma_set_mask_and_coherent@p\)
   (e,DMA_BIT_MASK(i))

@ok1 exists@
identifier x;
expression e;
constant c;
position p != bad1.p;
@@

x = \(DMA_BIT_MASK(c)\|0x\)
...
\(dma_set_mask@p\|dma_set_coherent_mask@p\|dma_set_mask_and_coherent@p\)(e,x)

@script:ocaml@
p << ok1.p;
c << ok1.c;
@@

let c = int_of_string c in
if c < 32
then
  let p = List.hd p in
  Printf.printf "too small: %s:%d: %d\n" (shorten p.file) p.line c

@ok2@
expression e;
constant c;
position p != bad2.p;
@@

\(dma_set_mask@p\|dma_set_coherent_mask@p\|dma_set_mask_and_coherent@p\)
   (e,\(DMA_BIT_MASK(c)\|0x\))

@script:ocaml@
p << ok2.p;
c << ok2.c;
@@

let c = int_of_string c in
if c < 32
then
  let p = List.hd p in
  Printf.printf "too small: %s:%d: %d\n" (shorten p.file) p.line c

@unk@
expression e,e1 != ATA_DMA_MASK;
position p != {ok1.p,ok2.p};
@@

\(dma_set_mask@p\|dma_set_coherent_mask@p\|dma_set_mask_and_coherent@p\)(e,e1)

@script:ocaml@
p << unk.p;
e1 << unk.e1;
@@

let p = List.hd p in
Printf.printf "unknown: %s:%d: %s\n" (shorten p.file) p.line (clean e1)

-

too small: drivers/gpu/drm/i915/i915_drv.c:1138: 30
too small: drivers/net/wireless/broadcom/b43/dma.c:1068: 30
unknown: sound/pci/ctxfi/cthw20k2.c:2033: DMA_BIT_MASK(dma_bits)
unknown: sound/pci/ctxfi/cthw20k2.c:2034: DMA_BIT_MASK(dma_bits)
unknown: drivers/scsi/megaraid/megaraid_sas_base.c:6036: consistent_mask
unknown: drivers/net/wireless/ath/wil6210/txrx.c:200: 
DMA_BIT_MASK(wil->dma_addr_size)
unknown: drivers/net/ethernet/netronome/nfp/nfp_main.c:452: 
DMA_BIT_MASK(NFP_NET_MAX_DMA_BITS)
unknown: drivers/gpu/host1x/dev.c:199: host->info->dma_mask
unknown: drivers/iommu/arm-smmu-v3.c:2691: DMA_BIT_MASK(smmu->oas)
too small: sound/pci/es1968.c:2692: 28
too small: sound/pci/es1968.c:2693: 28
too small: drivers/net/wireless/broadcom/b43legacy/dma.c:809: 30
unknown: drivers/virtio/virtio_mmio.c:573: DMA_BIT_MASK(32+PAGE_SHIFT)
unknown: drivers/ata/sata_nv.c:762: pp->adma_dma_mask
unknown: drivers/dma/mmp_pdma.c:1094: pdev->dev->coherent_dma_mask
too small: sound/pci/maestro3.c:2557: 28
too small: sound/pci/maestro3.c:2558: 28
too small: sound/pci/ice1712/ice1712.c:2533: 28
too small: sound/pci/ice1712/ice1712.c:2534: 28
unknown: drivers/net/wireless/ath/wil6210/pmc.c:132: 
DMA_BIT_MASK(wil->dma_addr_size)
unknown: drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c:313: 
DMA_BIT_MASK(tdev->func->iommu_bit)
unknown: drivers/net/ethernet/synopsys/dwc-xlgmac-common.c:96: 
DMA_BIT_MASK(pdata->hw_feat.dma_width)
too small: sound/pci/als4000.c:874: 24
too small: sound/pci/als4000.c:875: 24
unknown: drivers/hwtracing/coresight/coresight-tmc.c:335: DMA_BIT_MASK(dma_mask)
unknown: drivers/dma/xilinx/xilinx_dma.c:2634: DMA_BIT_MASK(addr_width)
too small: sound/pci/sonicvibes.c:1262: 24
too small: sound/pci/sonicvibes.c:1263: 24
too small: sound/pci/es1938.c:1600: 24
too small: sound/pci/es1938.c:1601: 24
unknown: drivers/crypto/ccree/cc_driver.c:260: dma_mask
unknown: sound/pci/hda/hda_intel.c:1888: 

Re: [PATCH] scsi_transport_sas: don't bounce highmem pages for the smp handler

2018-04-28 Thread Johannes Thumshirn
On Fri, Apr 27, 2018 at 07:24:18AM +0200, Christoph Hellwig wrote:
> Johannes,
> 
> can you take a look at this?  You are one of the few persons who cared
> about SMP passthrough in the recent past.

I'm sitting at the airport currently, but as soon as I'm back in the office
I'll have a look.

Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love

2018-04-28 Thread Luis R. Rodriguez
On Sat, Apr 28, 2018 at 01:42:21AM -0700, Christoph Hellwig wrote:
> On Fri, Apr 27, 2018 at 04:14:56PM +, Luis R. Rodriguez wrote:
> > Do we have a list of users for x86 with a small DMA mask?
> > Or, given that I'm not aware of a tool to be able to look
> > for this in an easy way, would it be good to find out which
> > x86 drivers do have a small mask?
> 
> Basically you'll have to grep for calls to dma_set_mask/
> dma_set_coherent_mask/dma_set_mask_and_coherent and their pci_*
> wrappers with masks smaller 32-bit.  Some use numeric values,
> some use DMA_BIT_MASK and various places uses local variables
> or struct members to parse them, so finding them will be a bit
> more work.  Nothing a coccinelle expert couldn't solve, though :)

Thing is unless we have a specific flag used consistently I don't believe we
can do this search with Coccinelle. ie, if we have local variables and based on
some series of variables things are set, this makes the grammatical expression
difficult to express.  So Cocinelle is not designed for this purpose.

But I believe smatch [0] is intended exactly for this sort of purpose, is that
right Dan? I gave a cursory look and I think it'd take me significant time to
get such hunt down.

[0] https://lwn.net/Articles/691882/

  Luis


Re: [PATCH 0/3] scsi: scsi-mq: don't hold host_busy in IO path

2018-04-28 Thread Ming Lei
On Fri, Apr 27, 2018 at 09:39:47AM -0600, Jens Axboe wrote:
> On 4/27/18 9:31 AM, Bart Van Assche wrote:
> > On Fri, 2018-04-20 at 14:57 +0800, Ming Lei wrote:
> >> This patches removes the expensive atomic opeation on host-wide counter
> >> of .host_busy for scsi-mq, and it is observed that IOPS can be increased by
> >> 15% with this change in IO test over scsi_debug.
> >>
> >>
> >> Ming Lei (3):
> >>   scsi: introduce scsi_host_busy()
> >>   scsi: read host_busy via scsi_host_busy()
> >>   scsi: avoid to hold host-wide counter of host_busy for scsi_mq
> >>
> >>  drivers/scsi/advansys.c   |  8 
> >>  drivers/scsi/hosts.c  | 32 
> >> +++
> >>  drivers/scsi/libsas/sas_scsi_host.c   |  4 ++--
> >>  drivers/scsi/megaraid/megaraid_sas_base.c |  2 +-
> >>  drivers/scsi/mpt3sas/mpt3sas_base.c   |  4 ++--
> >>  drivers/scsi/qlogicpti.c  |  2 +-
> >>  drivers/scsi/scsi.c   |  2 +-
> >>  drivers/scsi/scsi_error.c |  6 +++---
> >>  drivers/scsi/scsi_lib.c   | 23 --
> >>  drivers/scsi/scsi_sysfs.c |  2 +-
> >>  include/scsi/scsi_host.h  |  1 +
> >>  11 files changed, 65 insertions(+), 21 deletions(-)\
> > 
> > Hello Ming,
> > 
> > From the MAINTAINERS file:
> > 
> > SCSI SUBSYSTEM
> > M:  "James E.J. Bottomley" 
> > T:  git git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git
> > M:  "Martin K. Petersen" 
> > T:  git git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
> > L:  linux-scsi@vger.kernel.org
> > S:  Maintained
> > F:  Documentation/devicetree/bindings/scsi/
> > F:  drivers/scsi/
> > F:  include/scsi/
> > 
> > Hence my surprise when I saw that you sent this patch series to Jens instead
> > of James and Martin?
> 
> Martin and James are both on the CC as well. For what it's worth, the patch
> seems like a good approach to me. To handle the case that Hannes was concerned
> about (older drivers doing internal command issue), I would suggest that those
> drivers get instrumented to include a inc/dec of the host busy count for
> internal commands that bypass the normal tagging. That means the mq case needs
> to be
> 
> blk_mq_tagset_busy_iter(>tag_set, scsi_host_check_in_flight,
>   _flight);
> return in_flight.cnt + atomic_read(>host_busy);
> 
> The atomic read is basically free, once we get rid of the dirty of that
> variable on each IO.

Actually the internal command isn't submitted via normal IO path, then
it won't be completed via the normal completion path(soft_irq_done, or
.timeout), so handling internal command doesn't touch the scsi generic
counter of .host_busy.

I have talked with Hannes a bit about this at LSFMM, looks he agreed too.

Thanks,
Ming


Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love

2018-04-28 Thread Christoph Hellwig
On Fri, Apr 27, 2018 at 04:14:56PM +, Luis R. Rodriguez wrote:
> But curious, on a standard qemu x86_x64 KVM guest, which of the
> drivers do we know for certain *are* being used from the ones
> listed?

On a KVM guest probably none.  But not all the world is relatively
sane and standardized VMs unfortunately.

> > But even more importantly
> > we have plenty driver using it through dma_alloc_* and a small DMA
> > mask, and they are in use 
> 
> Do we have a list of users for x86 with a small DMA mask?
> Or, given that I'm not aware of a tool to be able to look
> for this in an easy way, would it be good to find out which
> x86 drivers do have a small mask?

Basically you'll have to grep for calls to dma_set_mask/
dma_set_coherent_mask/dma_set_mask_and_coherent and their pci_*
wrappers with masks smaller 32-bit.  Some use numeric values,
some use DMA_BIT_MASK and various places uses local variables
or struct members to parse them, so finding them will be a bit
more work.  Nothing a coccinell expert couldn't solve, though :)

> > - we actually had a 4.16 regression due to them.
> 
> Ah what commit was the culprit? Is that fixed already? If so what
> commit?

66bdb147 ("swiotlb: Use dma_direct_supported() for swiotlb_ops")

> > > SCSI is *severely* affected:
> > 
> > Not really.  We have unchecked_isa_dma to support about 4 drivers,
> 
> Ah very neat:
> 
>   * CONFIG_CHR_DEV_OSST - "SCSI OnStream SC-x0 tape support"
>   * CONFIG_SCSI_ADVANSYS - "AdvanSys SCSI support"
>   * CONFIG_SCSI_AHA1542 - "Adaptec AHA1542 support"
>   * CONFIG_SCSI_ESAS2R - "ATTO Technology's ExpressSAS RAID adapter driver"
> 
> > and less than a hand ful of drivers doing stupid things, which can
> > be fixed easily, and just need a volunteer.
> 
> Care to list what needs to be done? Can an eager beaver student do it?

Drop the drivers, as in my branch I prepared a while ago would be
easiest:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/unchecked_isa_dma

But unlike the other few aha1542 actually seems to have active users,
or at least had recently.  I'll need to send this out as a RFC, but
don't really expect it to fly.

If it doesn't we'll need to enhance swiotlb to support a ISA DMA pool
in addition to current 32-bit DMA pool, and also convert aha1542 to
use the DMA API.  Not really student material.

> > > That's the end of the review of all current explicit callers on x86.
> > > 
> > > # dma_alloc_coherent_gfp_flags() and dma_generic_alloc_coherent()
> > > 
> > > dma_alloc_coherent_gfp_flags() and dma_generic_alloc_coherent() set
> > > GFP_DMA if if (dma_mask <= DMA_BIT_MASK(24))
> > 
> > All that code is long gone and replaced with dma-direct.  Which still
> > uses GFP_DMA based on the dma mask, though - see above.
> 
> And that's mostly IOMMU code, on the alloc() dma_map_ops.

It is the dma mapping API, which translates the dma mask to the right
zone, and probably is the biggest user of ZONE_DMA in modern systems.

Currently there are still various arch and iommu specific
implementations of the allocator decisions, but I'm working to
consolidate them into common code.


Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love

2018-04-28 Thread Christoph Hellwig
On Fri, Apr 27, 2018 at 11:36:23AM -0500, Christopher Lameter wrote:
> On Fri, 27 Apr 2018, Matthew Wilcox wrote:
> 
> > Some devices have incredibly bogus hardware like 28 bit addressing
> > or 39 bit addressing.  We don't have a good way to allocate memory by
> > physical address other than than saying "GFP_DMA for anything less than
> > 32, GFP_DMA32 (or GFP_KERNEL on 32-bit) for anything less than 64 bit".
> >
> > Even CMA doesn't have a "cma_alloc_phys()".  Maybe that's the right place
> > to put such an allocation API.
> 
> The other way out of this would be to require a IOMMU?

Which on many systems doesn't exist.  And even if it exists might not
be usable.


Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love

2018-04-28 Thread Christoph Hellwig
On Fri, Apr 27, 2018 at 11:07:07AM -0500, Christopher Lameter wrote:
> Well it looks like what we are using it for is to force allocation from
> low physical memory if we fail to obtain proper memory through a normal
> channel.  The use of ZONE_DMA is only there for emergency purposes.
> I think we could subsitute ZONE_DMA32 on x87 without a problem.
> 
> Which means that ZONE_DMA has no purpose anymore.
> 
> Can we make ZONE_DMA on x86 refer to the low 32 bit physical addresses
> instead and remove ZONE_DMA32?

While < 32-bit allocations are much more common there are plenty
of requirements for < 24-bit or other weird masks still.


Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love

2018-04-28 Thread Christoph Hellwig
On Fri, Apr 27, 2018 at 09:18:43AM +0200, Michal Hocko wrote:
> > On Thu, Apr 26, 2018 at 09:54:06PM +, Luis R. Rodriguez wrote:
> > > In practice if you don't have a floppy device on x86, you don't need 
> > > ZONE_DMA,
> > 
> > I call BS on that, and you actually explain later why it it BS due
> > to some drivers using it more explicitly.  But even more importantly
> > we have plenty driver using it through dma_alloc_* and a small DMA
> > mask, and they are in use - we actually had a 4.16 regression due to
> > them.
> 
> Well, but do we need a zone for that purpose? The idea was to actually
> replace the zone by a CMA pool (at least on x86). With the current
> implementation of the CMA we would move the range [0-16M] pfn range into 
> zone_movable so it can be used and we would get rid of all of the
> overhead each zone brings (a bit in page flags, kmalloc caches and who
> knows what else)

That wasn't clear in the mail.  But if we have anothr way to allocate
<16MB memory we don't need ZONE_DMA for the floppy driver either, so the
above conclusion is still wrong.

> -- 
> Michal Hocko
> SUSE Labs
---end quoted text---


Re: [PATCH 3/3] scsi: avoid to hold host-wide counter of host_busy for scsi_mq

2018-04-28 Thread Ming Lei
On Fri, Apr 27, 2018 at 04:16:48PM +, Bart Van Assche wrote:
> On Fri, 2018-04-20 at 14:57 +0800, Ming Lei wrote:
> > +struct scsi_host_mq_in_flight {
> > +   int cnt;
> > +};
> > +
> > +static void scsi_host_check_in_flight(struct request *rq, void *data,
> > +   bool reserved)
> > +{
> > +   struct scsi_host_mq_in_flight *in_flight = data;
> > +
> > +   if (blk_mq_request_started(rq))
> > +   in_flight->cnt++;
> > +}
> > +
> >  /**
> >   * scsi_host_busy - Return the host busy counter
> >   * @shost: Pointer to Scsi_Host to inc.
> >   **/
> >  int scsi_host_busy(struct Scsi_Host *shost)
> >  {
> > -   return atomic_read(>host_busy);
> > +   struct scsi_host_mq_in_flight in_flight = {
> > +   .cnt = 0,
> > +   };
> > +
> > +   if (!shost->use_blk_mq)
> > +   return atomic_read(>host_busy);
> > +
> > +   blk_mq_tagset_busy_iter(>tag_set, scsi_host_check_in_flight,
> > +   _flight);
> > +   return in_flight.cnt;
> >  }
> >  EXPORT_SYMBOL(scsi_host_busy);
> 
> This patch introduces a subtle behavior change that has not been explained
> in the commit message. If a SCSI request gets requeued that results in a
> decrease of the .host_busy counter by scsi_device_unbusy() before the request
> is requeued and an increase of the host_busy counter when scsi_queue_rq() is
> called again. During that time such requests have the state MQ_RQ_COMPLETE and
> hence blk_mq_request_started() will return true and 
> scsi_host_check_in_flight()

No, __blk_mq_requeue_request() will change the rq state into MQ_RQ_IDLE,
so such issue you worried about, please look at scsi_mq_requeue_cmd(),
which calls blk_mq_requeue_request(), which puts driver tag and updates
rq's state to IDLE.

> will include these requests. In other words, this patch introduces a subtle
> behavior change that has not been explained in the commit message. Hence I'm
> doubt that this change is correct.

As I explained above, no such issue.


Thanks,
Ming


Re: [PATCH 2/3] scsi: read host_busy via scsi_host_busy()

2018-04-28 Thread Ming Lei
On Fri, Apr 27, 2018 at 03:51:46PM +, Bart Van Assche wrote:
> On Fri, 2018-04-20 at 14:57 +0800, Ming Lei wrote:
> >  show_host_busy(struct device *dev, struct device_attribute *attr, char 
> > *buf)
> >  {
> > struct Scsi_Host *shost = class_to_shost(dev);
> > -   return snprintf(buf, 20, "%d\n", atomic_read(>host_busy));
> > +   return snprintf(buf, 20, "%d\n", scsi_host_busy(shost));
> >  }
> >  static DEVICE_ATTR(host_busy, S_IRUGO, show_host_busy, NULL);
> 
> The ", 20" part is cargo-cult programming. Since you have to touch this code,
> please either use "sprintf(buf, ...)" or use "scnprintf(buf, PAGE_SIZE, ...)".

This patch is only to replace atomic_read(>host_busy) with
scsi_host_busy(shost) which returns 'int' too, so nothing related
with snprintf(buf, 20,..).

No mention the string with 20 length is enough to hold integer, so it
isn't needed too. 

So I don't see any reason to do that in this patch.

Thanks, 
Ming