Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Thu, May 03, 2018 at 02:03:38PM +0200, Michal Hocko wrote:
> On Sat 28-04-18 19:10:47, Matthew Wilcox wrote:
> > Another way we could approach this is to get rid of ZONE_DMA. Make GFP_DMA
> > a flag which doesn't map to a zone. Rather, it redirects to a separate
> > allocator. At boot, we hand all memory under 16MB to the DMA allocator. The
> > DMA allocator can have a shrinker which just hands back all the memory once
> > we're under memory pressure (if it's never had an allocation).
>
> Yeah, that was exactly the plan with the CMA allocator... We wouldn't
> need the shrinker because who cares about 16MB which is not usable
> anyway.

The CMA pool sounds fine. But please kill GFP_DMA off first / at the same
time. 95% of the users are either completely bogus or should be using the
DMA API, and the few others can use the new allocator directly.
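To make "should be using the DMA API" concrete, here is a minimal, hypothetical
conversion sketch: the device, FOO_BUF_SIZE and foo_probe() are invented, and
only the dma_* calls are real API.

/*
 * Hypothetical conversion sketch.  Instead of carving memory out of
 * ZONE_DMA with kmalloc(GFP_DMA), the driver states its addressing limit
 * once and lets the DMA layer find (and map) suitable memory.
 */
#include <linux/dma-mapping.h>
#include <linux/pci.h>

#define FOO_BUF_SIZE	4096	/* made-up buffer size */

static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	dma_addr_t buf_dma;
	void *buf;
	int ret;

	/* old style: buf = kmalloc(FOO_BUF_SIZE, GFP_KERNEL | GFP_DMA); */

	/* DMA API style: declare the hardware's 24-bit limit ... */
	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(24));
	if (ret)
		return ret;

	/* ... and let the allocator satisfy it, wherever that memory lives */
	buf = dma_alloc_coherent(&pdev->dev, FOO_BUF_SIZE, &buf_dma, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	/* programming the device with buf_dma elided */
	return 0;
}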
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Sat 28-04-18 19:10:47, Matthew Wilcox wrote:
> Another way we could approach this is to get rid of ZONE_DMA. Make GFP_DMA
> a flag which doesn't map to a zone. Rather, it redirects to a separate
> allocator. At boot, we hand all memory under 16MB to the DMA allocator. The
> DMA allocator can have a shrinker which just hands back all the memory once
> we're under memory pressure (if it's never had an allocation).

Yeah, that was exactly the plan with the CMA allocator... We wouldn't
need the shrinker because who cares about 16MB which is not usable
anyway.
--
Michal Hocko
SUSE Labs
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
Hi Luis,

On Thu, Apr 26, 2018 at 11:54 PM, Luis R. Rodriguez wrote:
> x86 implicit and explicit ZONE_DMA users
> ----------------------------------------
>
> We list below all x86 implicit and explicit ZONE_DMA users.
>
> # Explicit x86 users of GFP_DMA or __GFP_DMA
>
> * drivers/iio/common/ssp_sensors - wonder if enabling this on x86 was a
>   mistake. Note that this needs SPI and SPI needs HAS_IOMEM. I only see
>   HAS_IOMEM on s390 ? But I do think the Intel Minnowboard has SPI, but
>   doubt it has the ssp sensor stuff.
> * drivers/input/rmi4/rmi_spi.c - same SPI question
> * drivers/media/common/siano/ - make allyesconfig yields it enabled, but
>   not sure if this should ever be on x86
> * drivers/media/platform/sti/bdisp/ - likewise
> * drivers/media/platform/sti/hva/ - likewise
> * drivers/media/usb/gspca/ - likewise
> * drivers/mmc/host/wbsd.c - likewise
> * drivers/mtd/nand/gpmi-nand/ - likewise
> * drivers/net/can/spi/hi311x.c - likewise
> * drivers/net/can/spi/mcp251x.c - likewise
> * drivers/net/ethernet/agere/ - likewise
> * drivers/net/ethernet/neterion/vxge/ - likewise
> * drivers/net/ethernet/rocker/ - likewise
> * drivers/net/usb/kalmia.c - likewise
> * drivers/net/ethernet/neterion/vxge/ - likewise
> * drivers/spi/spi-pic32-sqi.c - likewise
> * drivers/spi/spi-sh-msiof.c - likewise

depends on ARCH_SHMOBILE || ARCH_RENESAS || COMPILE_TEST

> * drivers/spi/spi-ti-qspi.c - likewise

I haven't checked the others, but probably you want to disable COMPILE_TEST
to make more educated guesses about driver usage on x86.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like
that.
                                -- Linus Torvalds
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
Here are some improved results, also taking into account the pci functions.

julia

too small: drivers/gpu/drm/i915/i915_drv.c:1138: 30
too small: drivers/hwtracing/coresight/coresight-tmc.c:335: 0
too small: drivers/media/pci/sta2x11/sta2x11_vip.c:859: 29
too small: drivers/media/pci/sta2x11/sta2x11_vip.c:983: 26
too small: drivers/net/ethernet/broadcom/b44.c:2389: 30
too small: drivers/net/wan/wanxl.c:585: 28
too small: drivers/net/wan/wanxl.c:586: 28
too small: drivers/net/wireless/broadcom/b43/dma.c:1068: 30
too small: drivers/net/wireless/broadcom/b43legacy/dma.c:809: 30
too small: drivers/scsi/aacraid/commsup.c:1581: 31
too small: drivers/scsi/aacraid/linit.c:1651: 31
too small: drivers/usb/host/ehci-pci.c:127: 31
too small: sound/pci/ali5451/ali5451.c:2110: 31
too small: sound/pci/ali5451/ali5451.c:2111: 31
too small: sound/pci/als300.c:661: 28
too small: sound/pci/als300.c:662: 28
too small: sound/pci/als4000.c:874: 24
too small: sound/pci/als4000.c:875: 24
too small: sound/pci/azt3328.c:2421: 24
too small: sound/pci/azt3328.c:2422: 24
too small: sound/pci/emu10k1/emu10k1x.c:916: 28
too small: sound/pci/emu10k1/emu10k1x.c:917: 28
too small: sound/pci/es1938.c:1600: 24
too small: sound/pci/es1938.c:1601: 24
too small: sound/pci/es1968.c:2692: 28
too small: sound/pci/es1968.c:2693: 28
too small: sound/pci/ice1712/ice1712.c:2533: 28
too small: sound/pci/ice1712/ice1712.c:2534: 28
too small: sound/pci/maestro3.c:2557: 28
too small: sound/pci/maestro3.c:2558: 28
too small: sound/pci/sis7019.c:1328: 30
too small: sound/pci/sonicvibes.c:1262: 24
too small: sound/pci/sonicvibes.c:1263: 24
too small: sound/pci/trident/trident_main.c:3552: 30
too small: sound/pci/trident/trident_main.c:3553: 30
unknown: arch/x86/pci/sta2x11-fixup.c:169: STA2X11_AMBA_SIZE-1
unknown: arch/x86/pci/sta2x11-fixup.c:170: STA2X11_AMBA_SIZE-1
unknown: drivers/ata/sata_nv.c:762: pp->adma_dma_mask
unknown: drivers/char/agp/intel-gtt.c:1409: DMA_BIT_MASK(mask)
unknown: drivers/char/agp/intel-gtt.c:1413: DMA_BIT_MASK(mask)
unknown: drivers/crypto/ccree/cc_driver.c:260: dma_mask
unknown: drivers/dma/mmp_pdma.c:1094: pdev->dev->coherent_dma_mask
unknown: drivers/dma/pxa_dma.c:1375: op->dev.coherent_dma_mask
unknown: drivers/dma/xilinx/xilinx_dma.c:2634: DMA_BIT_MASK(addr_width)
unknown: drivers/gpu/drm/ati_pcigart.c:117: gart_info->table_mask
unknown: drivers/gpu/drm/msm/msm_drv.c:1132: ~0
unknown: drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c:313: DMA_BIT_MASK(tdev->func->iommu_bit)
unknown: drivers/gpu/host1x/dev.c:199: host->info->dma_mask
unknown: drivers/hwtracing/intel_th/core.c:379: parent->coherent_dma_mask
unknown: drivers/iommu/arm-smmu.c:1848: DMA_BIT_MASK(size)
unknown: drivers/media/pci/intel/ipu3/ipu3-cio2.c:1759: CIO2_DMA_MASK
unknown: drivers/media/platform/qcom/venus/core.c:186: core->res->dma_mask
unknown: drivers/message/fusion/mptbase.c:4599: ioc->dma_mask
unknown: drivers/message/fusion/mptbase.c:4600: ioc->dma_mask
unknown: drivers/net/ethernet/altera/altera_tse_main.c:1449: DMA_BIT_MASK(priv->dmaops->dmamask)
unknown: drivers/net/ethernet/altera/altera_tse_main.c:1450: DMA_BIT_MASK(priv->dmaops->dmamask)
unknown: drivers/net/ethernet/amazon/ena/ena_netdev.c:2455: DMA_BIT_MASK(dma_width)
unknown: drivers/net/ethernet/amazon/ena/ena_netdev.c:2461: DMA_BIT_MASK(dma_width)
unknown: drivers/net/ethernet/amd/pcnet32.c:1558: PCNET32_DMA_MASK
unknown: drivers/net/ethernet/amd/xgbe/xgbe-main.c:294: DMA_BIT_MASK(pdata->hw_feat.dma_width)
unknown: drivers/net/ethernet/broadcom/bnx2.c:8234: persist_dma_mask
unknown: drivers/net/ethernet/broadcom/tg3.c:17781: persist_dma_mask
unknown: drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c:315: old_mask
unknown: drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c:316: old_cmask
unknown: drivers/net/ethernet/sfc/efx.c:1298: dma_mask
unknown: drivers/net/ethernet/sfc/falcon/efx.c:1251: dma_mask
unknown: drivers/net/ethernet/synopsys/dwc-xlgmac-common.c:96: DMA_BIT_MASK(pdata->hw_feat.dma_width)
unknown: drivers/net/wireless/ath/wil6210/pcie_bus.c:299: DMA_BIT_MASK(dma_addr_size[i])
unknown: drivers/net/wireless/ath/wil6210/pmc.c:132: DMA_BIT_MASK(wil->dma_addr_size)
unknown: drivers/net/wireless/ath/wil6210/txrx.c:200: DMA_BIT_MASK(wil->dma_addr_size)
unknown: drivers/scsi/3w-.c:2260: TW_DMA_MASK
unknown: drivers/scsi/hptiop.c:1312: DMA_BIT_MASK(iop_ops->hw_dma_bit_mask)
unknown: drivers/scsi/megaraid/megaraid_sas_base.c:6036: consistent_mask
unknown: drivers/scsi/sym53c8xx_2/sym_glue.c:1315: DMA_DAC_MASK
unknown: drivers/usb/gadget/udc/bdc/bdc_pci.c:86: pci->dev.coherent_dma_mask
unknown: sound/pci/emu10k1/emu10k1_main.c:1910: emu->dma_mask

@initialize:ocaml@
@@

let clean s = String.concat "" (Str.split (Str.regexp " ") s)
let shorten s = List.nth (Str.split (Str.regexp "linux-next/") s) 1
let ios s =
  match Str.split_delim (Str.regexp "ULL") s with
    [n;""] -> int_of_string n
  | _ -> int_of_string s
let number x = try ignore(ios x); tr
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Sat, Apr 28, 2018 at 09:46:52PM +0200, Julia Lawall wrote:
> FWIW, here is my semantic patch and the output - it reports on things that
> appear to be too small and things that it doesn't know about.
>
> What are the relevant pci wrappers? I didn't find them.

Basically all of the functions in include/linux/pci-dma-compat.h

> too small: drivers/gpu/drm/i915/i915_drv.c:1138: 30
> too small: drivers/net/wireless/broadcom/b43/dma.c:1068: 30

> unknown: sound/pci/ctxfi/cthw20k2.c:2033: DMA_BIT_MASK(dma_bits)
> unknown: sound/pci/ctxfi/cthw20k2.c:2034: DMA_BIT_MASK(dma_bits)

This one's good:

	const unsigned int dma_bits = BITS_PER_LONG;

> unknown: drivers/scsi/megaraid/megaraid_sas_base.c:6036: consistent_mask

and this one:

	consistent_mask = (instance->adapter_type == VENTURA_SERIES) ?
		DMA_BIT_MASK(64) : DMA_BIT_MASK(32);

> unknown: drivers/net/wireless/ath/wil6210/txrx.c:200:
> DMA_BIT_MASK(wil->dma_addr_size)

	if (wil->dma_addr_size > 32)
		dma_set_mask_and_coherent(dev, DMA_BIT_MASK(wil->dma_addr_size));

> unknown: drivers/net/ethernet/netronome/nfp/nfp_main.c:452:
> DMA_BIT_MASK(NFP_NET_MAX_DMA_BITS)

drivers/net/ethernet/netronome/nfp/nfp_net.h:#define NFP_NET_MAX_DMA_BITS 40

> unknown: drivers/gpu/host1x/dev.c:199: host->info->dma_mask

Looks safe ...

drivers/gpu/host1x/bus.c:	device->dev.coherent_dma_mask = host1x->dev->coherent_dma_mask;
drivers/gpu/host1x/bus.c:	device->dev.dma_mask = &device->dev.coherent_dma_mask;
drivers/gpu/host1x/dev.c:	.dma_mask = DMA_BIT_MASK(32),
drivers/gpu/host1x/dev.c:	.dma_mask = DMA_BIT_MASK(32),
drivers/gpu/host1x/dev.c:	.dma_mask = DMA_BIT_MASK(34),
drivers/gpu/host1x/dev.c:	.dma_mask = DMA_BIT_MASK(34),
drivers/gpu/host1x/dev.c:	.dma_mask = DMA_BIT_MASK(34),
drivers/gpu/host1x/dev.c:	dma_set_mask_and_coherent(host->dev, host->info->dma_mask);
drivers/gpu/host1x/dev.h:	u64 dma_mask; /* mask of addressable memory */

... but that reminds us that maybe some drivers aren't using dma_set_mask()
but rather touching dma_mask directly.

... 57 more to look at ...
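To illustrate the pattern Matthew is pointing at -- not taken from host1x or
any driver named above, and the function names are invented:

#include <linux/dma-mapping.h>

/*
 * Assigning the masks directly skips the check that the platform/bus code
 * can actually satisfy them:
 */
static int bar_probe_bad(struct device *dev)
{
	dev->coherent_dma_mask = DMA_BIT_MASK(34);	/* nothing validates this */
	dev->dma_mask = &dev->coherent_dma_mask;
	return 0;
}

/* The checked variant, which is also what the semantic patch can match: */
static int bar_probe_good(struct device *dev)
{
	return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34));
}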
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Sat, 28 Apr 2018, Luis R. Rodriguez wrote:

> On Sat, Apr 28, 2018 at 01:42:21AM -0700, Christoph Hellwig wrote:
> > On Fri, Apr 27, 2018 at 04:14:56PM +, Luis R. Rodriguez wrote:
> > > Do we have a list of users for x86 with a small DMA mask?
> > > Or, given that I'm not aware of a tool to be able to look
> > > for this in an easy way, would it be good to find out which
> > > x86 drivers do have a small mask?
> >
> > Basically you'll have to grep for calls to dma_set_mask/
> > dma_set_coherent_mask/dma_set_mask_and_coherent and their pci_*
> > wrappers with masks smaller than 32-bit. Some use numeric values,
> > some use DMA_BIT_MASK and various places use local variables
> > or struct members to parse them, so finding them will be a bit
> > more work. Nothing a coccinelle expert couldn't solve, though :)
>
> Thing is unless we have a specific flag used consistently I don't believe we
> can do this search with Coccinelle. ie, if we have local variables and based
> on some series of variables things are set, this makes the grammatical
> expression difficult to express. So Coccinelle is not designed for this
> purpose.
>
> But I believe smatch [0] is intended exactly for this sort of purpose, is
> that right Dan? I gave a cursory look and I think it'd take me significant
> time to get such a hunt done.
>
> [0] https://lwn.net/Articles/691882/

FWIW, here is my semantic patch and the output - it reports on things that
appear to be too small and things that it doesn't know about.

What are the relevant pci wrappers? I didn't find them.

julia

@initialize:ocaml@
@@

let clean s = String.concat "" (Str.split (Str.regexp " ") s)
let shorten s = List.nth (Str.split (Str.regexp "linux-next/") s) 1

@bad1 exists@
identifier i,x;
expression e;
position p;
@@

x = DMA_BIT_MASK(i)
...
\(dma_set_mask@p\|dma_set_coherent_mask@p\|dma_set_mask_and_coherent@p\)(e,x)

@bad2@
identifier i;
expression e;
position p;
@@

\(dma_set_mask@p\|dma_set_coherent_mask@p\|dma_set_mask_and_coherent@p\)
 (e,DMA_BIT_MASK(i))

@ok1 exists@
identifier x;
expression e;
constant c;
position p != bad1.p;
@@

x = \(DMA_BIT_MASK(c)\|0x\)
...
\(dma_set_mask@p\|dma_set_coherent_mask@p\|dma_set_mask_and_coherent@p\)(e,x)

@script:ocaml@
p << ok1.p;
c << ok1.c;
@@

let c = int_of_string c in
if c < 32
then
  let p = List.hd p in
  Printf.printf "too small: %s:%d: %d\n" (shorten p.file) p.line c

@ok2@
expression e;
constant c;
position p != bad2.p;
@@

\(dma_set_mask@p\|dma_set_coherent_mask@p\|dma_set_mask_and_coherent@p\)
 (e,\(DMA_BIT_MASK(c)\|0x\))

@script:ocaml@
p << ok2.p;
c << ok2.c;
@@

let c = int_of_string c in
if c < 32
then
  let p = List.hd p in
  Printf.printf "too small: %s:%d: %d\n" (shorten p.file) p.line c

@unk@
expression e,e1 != ATA_DMA_MASK;
position p != {ok1.p,ok2.p};
@@

\(dma_set_mask@p\|dma_set_coherent_mask@p\|dma_set_mask_and_coherent@p\)(e,e1)

@script:ocaml@
p << unk.p;
e1 << unk.e1;
@@

let p = List.hd p in
Printf.printf "unknown: %s:%d: %s\n" (shorten p.file) p.line (clean e1)

---------------------------------------------------------------------------

too small: drivers/gpu/drm/i915/i915_drv.c:1138: 30
too small: drivers/net/wireless/broadcom/b43/dma.c:1068: 30
unknown: sound/pci/ctxfi/cthw20k2.c:2033: DMA_BIT_MASK(dma_bits)
unknown: sound/pci/ctxfi/cthw20k2.c:2034: DMA_BIT_MASK(dma_bits)
unknown: drivers/scsi/megaraid/megaraid_sas_base.c:6036: consistent_mask
unknown: drivers/net/wireless/ath/wil6210/txrx.c:200: DMA_BIT_MASK(wil->dma_addr_size)
unknown: drivers/net/ethernet/netronome/nfp/nfp_main.c:452: DMA_BIT_MASK(NFP_NET_MAX_DMA_BITS)
unknown: drivers/gpu/host1x/dev.c:199: host->info->dma_mask
unknown: drivers/iommu/arm-smmu-v3.c:2691: DMA_BIT_MASK(smmu->oas)
too small: sound/pci/es1968.c:2692: 28
too small: sound/pci/es1968.c:2693: 28
too small: drivers/net/wireless/broadcom/b43legacy/dma.c:809: 30
unknown: drivers/virtio/virtio_mmio.c:573: DMA_BIT_MASK(32+PAGE_SHIFT)
unknown: drivers/ata/sata_nv.c:762: pp->adma_dma_mask
unknown: drivers/dma/mmp_pdma.c:1094: pdev->dev->coherent_dma_mask
too small: sound/pci/maestro3.c:2557: 28
too small: sound/pci/maestro3.c:2558: 28
too small: sound/pci/ice1712/ice1712.c:2533: 28
too small: sound/pci/ice1712/ice1712.c:2534: 28
unknown: drivers/net/wireless/ath/wil6210/pmc.c:132: DMA_BIT_MASK(wil->dma_addr_size)
unknown: drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c:313: DMA_BIT_MASK(tdev->func->iommu_bit)
unknown: drivers/net/ethernet/synopsys/dwc-xlgmac-common.c:96: DMA_BIT_MASK(pdata->hw_feat.dma_width)
too small: sound/pci/als4000.c:874: 24
too small: sound/pci/als4000.c:875: 24
unknown: drivers/hwtracing/coresight/coresight-tmc.c:335: DMA_BIT_MASK(dma_mask)
unknown: drivers/dma/xilinx/xilinx_dma.c:2634: DMA_BIT_MASK(addr_width)
too small: sound/pci/sonicvibes.c:1262: 24
too small: sound/pci/sonicvibes.c:1263: 24
too small: sound/pci/es1938.c:1600: 24
too small: sound/pci/es1938.c:1601: 24
unknown: drivers/crypto/ccree/cc_driver.c:260: dma_mask
unknown: sound/pci/hda/hda_intel.c:1888: DMA_B
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Sat, Apr 28, 2018 at 01:42:21AM -0700, Christoph Hellwig wrote:
> On Fri, Apr 27, 2018 at 04:14:56PM +, Luis R. Rodriguez wrote:
> > Do we have a list of users for x86 with a small DMA mask?
> > Or, given that I'm not aware of a tool to be able to look
> > for this in an easy way, would it be good to find out which
> > x86 drivers do have a small mask?
>
> Basically you'll have to grep for calls to dma_set_mask/
> dma_set_coherent_mask/dma_set_mask_and_coherent and their pci_*
> wrappers with masks smaller than 32-bit. Some use numeric values,
> some use DMA_BIT_MASK and various places use local variables
> or struct members to parse them, so finding them will be a bit
> more work. Nothing a coccinelle expert couldn't solve, though :)

Thing is unless we have a specific flag used consistently I don't believe we
can do this search with Coccinelle. ie, if we have local variables and based
on some series of variables things are set, this makes the grammatical
expression difficult to express. So Coccinelle is not designed for this
purpose.

But I believe smatch [0] is intended exactly for this sort of purpose, is
that right Dan? I gave a cursory look and I think it'd take me significant
time to get such a hunt done.

[0] https://lwn.net/Articles/691882/

  Luis
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri, Apr 27, 2018 at 04:14:56PM +, Luis R. Rodriguez wrote:
> But curious, on a standard qemu x86_64 KVM guest, which of the
> drivers do we know for certain *are* being used from the ones
> listed?

On a KVM guest probably none. But not all the world is relatively sane,
standardized VMs, unfortunately.

> > But even more importantly
> > we have plenty of drivers using it through dma_alloc_* and a small DMA
> > mask, and they are in use
>
> Do we have a list of users for x86 with a small DMA mask?
> Or, given that I'm not aware of a tool to be able to look
> for this in an easy way, would it be good to find out which
> x86 drivers do have a small mask?

Basically you'll have to grep for calls to dma_set_mask/
dma_set_coherent_mask/dma_set_mask_and_coherent and their pci_*
wrappers with masks smaller than 32-bit. Some use numeric values,
some use DMA_BIT_MASK and various places use local variables
or struct members to parse them, so finding them will be a bit
more work. Nothing a coccinelle expert couldn't solve, though :)

> > - we actually had a 4.16 regression due to them.
>
> Ah what commit was the culprit? Is that fixed already? If so what
> commit?

66bdb147 ("swiotlb: Use dma_direct_supported() for swiotlb_ops")

> > > SCSI is *severely* affected:
> >
> > Not really. We have unchecked_isa_dma to support about 4 drivers,
>
> Ah very neat:
>
> * CONFIG_CHR_DEV_OSST - "SCSI OnStream SC-x0 tape support"
> * CONFIG_SCSI_ADVANSYS - "AdvanSys SCSI support"
> * CONFIG_SCSI_AHA1542 - "Adaptec AHA1542 support"
> * CONFIG_SCSI_ESAS2R - "ATTO Technology's ExpressSAS RAID adapter driver"
>
> > and less than a handful of drivers doing stupid things, which can
> > be fixed easily, and just need a volunteer.
>
> Care to list what needs to be done? Can an eager beaver student do it?

Drop the drivers, as in my branch I prepared a while ago would be easiest:

	http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/unchecked_isa_dma

But unlike the other few, aha1542 actually seems to have active users, or at
least had recently. I'll need to send this out as an RFC, but don't really
expect it to fly. If it doesn't we'll need to enhance swiotlb to support an
ISA DMA pool in addition to the current 32-bit DMA pool, and also convert
aha1542 to use the DMA API. Not really student material.

> > > That's the end of the review of all current explicit callers on x86.
> > >
> > > # dma_alloc_coherent_gfp_flags() and dma_generic_alloc_coherent()
> > >
> > > dma_alloc_coherent_gfp_flags() and dma_generic_alloc_coherent() set
> > > GFP_DMA if (dma_mask <= DMA_BIT_MASK(24))
> >
> > All that code is long gone and replaced with dma-direct. Which still
> > uses GFP_DMA based on the dma mask, though - see above.
>
> And that's mostly IOMMU code, on the alloc() dma_map_ops.

It is the dma mapping API, which translates the dma mask to the right
zone, and probably is the biggest user of ZONE_DMA in modern systems.
Currently there are still various arch and iommu specific implementations
of the allocator decisions, but I'm working to consolidate them into
common code.
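For reference, this is the shape of call such a grep or Coccinelle hunt is
after -- a hypothetical probe fragment where both the device and its 28-bit
limit are invented:

#include <linux/dma-mapping.h>
#include <linux/pci.h>

static int baz_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int dma_bits = 28;	/* device can only address the low 256MB */

	/* a sub-32-bit mask, hidden behind a local variable */
	if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(dma_bits)))
		return -EIO;	/* no fallback: the hardware has no other mode */

	return 0;
}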
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri, Apr 27, 2018 at 11:36:23AM -0500, Christopher Lameter wrote:
> On Fri, 27 Apr 2018, Matthew Wilcox wrote:
>
> > Some devices have incredibly bogus hardware like 28 bit addressing
> > or 39 bit addressing. We don't have a good way to allocate memory by
> > physical address other than saying "GFP_DMA for anything less than
> > 32, GFP_DMA32 (or GFP_KERNEL on 32-bit) for anything less than 64 bit".
> >
> > Even CMA doesn't have a "cma_alloc_phys()". Maybe that's the right place
> > to put such an allocation API.
>
> The other way out of this would be to require an IOMMU?

Which on many systems doesn't exist. And even if it exists it might not
be usable.
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri, Apr 27, 2018 at 11:07:07AM -0500, Christopher Lameter wrote:
> Well it looks like what we are using it for is to force allocation from
> low physical memory if we fail to obtain proper memory through a normal
> channel. The use of ZONE_DMA is only there for emergency purposes.
> I think we could substitute ZONE_DMA32 on x86 without a problem.
>
> Which means that ZONE_DMA has no purpose anymore.
>
> Can we make ZONE_DMA on x86 refer to the low 32 bit physical addresses
> instead and remove ZONE_DMA32?

While < 32-bit allocations are much more common, there are plenty of
requirements for < 24-bit or other weird masks still.
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri, Apr 27, 2018 at 09:18:43AM +0200, Michal Hocko wrote:
> > On Thu, Apr 26, 2018 at 09:54:06PM +, Luis R. Rodriguez wrote:
> > > In practice if you don't have a floppy device on x86, you don't need
> > > ZONE_DMA,
> >
> > I call BS on that, and you actually explain later why it is BS due
> > to some drivers using it more explicitly. But even more importantly
> > we have plenty of drivers using it through dma_alloc_* and a small DMA
> > mask, and they are in use - we actually had a 4.16 regression due to
> > them.
>
> Well, but do we need a zone for that purpose? The idea was to actually
> replace the zone by a CMA pool (at least on x86). With the current
> implementation of the CMA we would move the [0-16M] pfn range into
> zone_movable so it can be used and we would get rid of all of the
> overhead each zone brings (a bit in page flags, kmalloc caches and who
> knows what else)

That wasn't clear in the mail. But if we have another way to allocate
<16MB memory we don't need ZONE_DMA for the floppy driver either, so the
above conclusion is still wrong.

> --
> Michal Hocko
> SUSE Labs
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri 27-04-18 11:07:07, Christopher Lameter wrote:
> On Fri, 27 Apr 2018, Michal Hocko wrote:
>
> > On Thu 26-04-18 22:35:56, Christoph Hellwig wrote:
> > > On Thu, Apr 26, 2018 at 09:54:06PM +, Luis R. Rodriguez wrote:
> > > > In practice if you don't have a floppy device on x86, you don't need
> > > > ZONE_DMA,
> > >
> > > I call BS on that, and you actually explain later why it is BS due
> > > to some drivers using it more explicitly. But even more importantly
> > > we have plenty of drivers using it through dma_alloc_* and a small DMA
> > > mask, and they are in use - we actually had a 4.16 regression due to
> > > them.
> >
> > Well, but do we need a zone for that purpose? The idea was to actually
> > replace the zone by a CMA pool (at least on x86). With the current
> > implementation of the CMA we would move the [0-16M] pfn range into
> > zone_movable so it can be used and we would get rid of all of the
> > overhead each zone brings (a bit in page flags, kmalloc caches and who
> > knows what else)
>
> Well it looks like what we are using it for is to force allocation from
> low physical memory if we fail to obtain proper memory through a normal
> channel. The use of ZONE_DMA is only there for emergency purposes.
> I think we could substitute ZONE_DMA32 on x86 without a problem.
>
> Which means that ZONE_DMA has no purpose anymore.

We still need to make sure the low 16MB is available on request. And that
is what CMA can help with. We do not really seem to need the whole zone
infrastructure for that.

> Can we make ZONE_DMA on x86 refer to the low 32 bit physical addresses
> instead and remove ZONE_DMA32?

Why would that be an advantage? If anything I would rename ZONE_DMA32 to
ZONE_ADDR32 or something like that.

> That would actually improve the fallback because you have more memory for
> the old devices.

I do not really understand how that is related to removing ZONE_DMA. We are
really talking about the lowest 16MB...
--
Michal Hocko
SUSE Labs
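A minimal sketch of the "keep the low 16MB available via CMA" idea -- not
existing code; the function and area name are invented, and the
cma_declare_contiguous() signature shown is the roughly 4.17-era one, so
check the tree before relying on it:

#include <linux/cma.h>
#include <linux/sizes.h>

static struct cma *x86_low16m_cma;

/*
 * Reserve the [0, 16MB) physical range as a fixed CMA area early in boot
 * (from x86 setup code, before the page allocator owns the range) so
 * movable allocations can still use it while "low memory on request" is
 * served from it on demand.
 */
static int __init x86_reserve_low16m_cma(void)
{
	return cma_declare_contiguous(0, SZ_16M, SZ_16M, 0, 0, true,
				      "zone_dma", &x86_low16m_cma);
}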
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri, 27 Apr 2018, Matthew Wilcox wrote:

> Some devices have incredibly bogus hardware like 28 bit addressing
> or 39 bit addressing. We don't have a good way to allocate memory by
> physical address other than saying "GFP_DMA for anything less than
> 32, GFP_DMA32 (or GFP_KERNEL on 32-bit) for anything less than 64 bit".
>
> Even CMA doesn't have a "cma_alloc_phys()". Maybe that's the right place
> to put such an allocation API.

The other way out of this would be to require an IOMMU?
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri, Apr 27, 2018 at 04:14:56PM +, Luis R. Rodriguez wrote:
> > Not really. We have unchecked_isa_dma to support about 4 drivers,
>
> Ah very neat:
>
> * CONFIG_CHR_DEV_OSST - "SCSI OnStream SC-x0 tape support"

That's an upper level driver, like cdrom, disk and regular tapes.

> * CONFIG_SCSI_ADVANSYS - "AdvanSys SCSI support"

If we ditch support for the ISA boards, this can go away.

> * CONFIG_SCSI_AHA1542 - "Adaptec AHA1542 support"

Probably true.

> * CONFIG_SCSI_ESAS2R - "ATTO Technology's ExpressSAS RAID adapter driver"

That's being set to 0.

You missed BusLogic.c and gdth.c
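For context on the mechanism being discussed, a sketch of how an ISA-only
HBA advertises unchecked_isa_dma (the driver name is invented; the fields
are the real ones from kernels of this era): the SCSI and block layers then
bounce all of its I/O below 16MB (BLK_BOUNCE_ISA, i.e. ZONE_DMA) on its
behalf.

#include <scsi/scsi_host.h>

static struct scsi_host_template legacy_isa_tmpl = {
	.name			= "legacy-isa-hba",
	.unchecked_isa_dma	= 1,	/* controller can only DMA below 16MB */
	/* .queuecommand and friends elided */
};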
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri, Apr 27, 2018 at 11:07:07AM -0500, Christopher Lameter wrote:
> Well it looks like what we are using it for is to force allocation from
> low physical memory if we fail to obtain proper memory through a normal
> channel. The use of ZONE_DMA is only there for emergency purposes.
> I think we could substitute ZONE_DMA32 on x86 without a problem.
>
> Which means that ZONE_DMA has no purpose anymore.
>
> Can we make ZONE_DMA on x86 refer to the low 32 bit physical addresses
> instead and remove ZONE_DMA32?
>
> That would actually improve the fallback because you have more memory for
> the old devices.

Some devices have incredibly bogus hardware like 28 bit addressing
or 39 bit addressing. We don't have a good way to allocate memory by
physical address other than saying "GFP_DMA for anything less than
32, GFP_DMA32 (or GFP_KERNEL on 32-bit) for anything less than 64 bit".

Even CMA doesn't have a "cma_alloc_phys()". Maybe that's the right place
to put such an allocation API.
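To illustrate what such an API could look like -- purely hypothetical, as the
message above notes nothing like this exists today; the prototype and the
dma_cma/nr_pages names in the usage line are invented:

#include <linux/types.h>

struct cma;
struct page;

/*
 * Idea: ask CMA for pages that fit below an explicit physical address
 * limit, instead of overloading GFP_DMA/GFP_DMA32.
 */
struct page *cma_alloc_phys(struct cma *cma, size_t count,
			    phys_addr_t phys_limit, unsigned int align);

/* a driver with 28-bit addressing could then do something like: */
/*	page = cma_alloc_phys(dma_cma, nr_pages, DMA_BIT_MASK(28) + 1, 0); */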
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Fri, 27 Apr 2018, Michal Hocko wrote:

> On Thu 26-04-18 22:35:56, Christoph Hellwig wrote:
> > On Thu, Apr 26, 2018 at 09:54:06PM +, Luis R. Rodriguez wrote:
> > > In practice if you don't have a floppy device on x86, you don't need
> > > ZONE_DMA,
> >
> > I call BS on that, and you actually explain later why it is BS due
> > to some drivers using it more explicitly. But even more importantly
> > we have plenty of drivers using it through dma_alloc_* and a small DMA
> > mask, and they are in use - we actually had a 4.16 regression due to
> > them.
>
> Well, but do we need a zone for that purpose? The idea was to actually
> replace the zone by a CMA pool (at least on x86). With the current
> implementation of the CMA we would move the [0-16M] pfn range into
> zone_movable so it can be used and we would get rid of all of the
> overhead each zone brings (a bit in page flags, kmalloc caches and who
> knows what else)

Well it looks like what we are using it for is to force allocation from
low physical memory if we fail to obtain proper memory through a normal
channel. The use of ZONE_DMA is only there for emergency purposes.
I think we could substitute ZONE_DMA32 on x86 without a problem.

Which means that ZONE_DMA has no purpose anymore.

Can we make ZONE_DMA on x86 refer to the low 32 bit physical addresses
instead and remove ZONE_DMA32?

That would actually improve the fallback because you have more memory for
the old devices.
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Thu, Apr 26, 2018 at 10:35:56PM -0700, Christoph Hellwig wrote:
> On Thu, Apr 26, 2018 at 09:54:06PM +, Luis R. Rodriguez wrote:
> > In practice if you don't have a floppy device on x86, you don't need
> > ZONE_DMA,
>
> I call BS on that,

I did not explain, though, that it was not me who claimed this. The list
displayed below is the result of trying to confirm/deny this, and what could
be done, and also of evaluating if there is *any* gain in doing something
about it.

But curious, on a standard qemu x86_64 KVM guest, which of the drivers do we
know for certain *are* being used from the ones listed?

What about Xen guests, I wonder?

> and you actually explain later why it is BS due
> to some drivers using it more explicitly.

Or implicitly. The list I showed is the work to show that the users of
GFP_DMA on x86 are *much* more widespread than expected from the above
claim. I however did not also answer the above qemu x86_64 question, but it
would be good to know.

Note I stated that the claim was *in practice*.

> But even more importantly
> we have plenty of drivers using it through dma_alloc_* and a small DMA
> mask, and they are in use

Do we have a list of users for x86 with a small DMA mask?
Or, given that I'm not aware of a tool to be able to look
for this in an easy way, would it be good to find out which
x86 drivers do have a small mask?

> - we actually had a 4.16 regression due to them.

Ah what commit was the culprit? Is that fixed already? If so what
commit?

> > SCSI is *severely* affected:
>
> Not really. We have unchecked_isa_dma to support about 4 drivers,

Ah very neat:

* CONFIG_CHR_DEV_OSST - "SCSI OnStream SC-x0 tape support"
* CONFIG_SCSI_ADVANSYS - "AdvanSys SCSI support"
* CONFIG_SCSI_AHA1542 - "Adaptec AHA1542 support"
* CONFIG_SCSI_ESAS2R - "ATTO Technology's ExpressSAS RAID adapter driver"

> and less than a handful of drivers doing stupid things, which can
> be fixed easily, and just need a volunteer.

Care to list what needs to be done? Can an eager beaver student do it?

> > That's the end of the review of all current explicit callers on x86.
> >
> > # dma_alloc_coherent_gfp_flags() and dma_generic_alloc_coherent()
> >
> > dma_alloc_coherent_gfp_flags() and dma_generic_alloc_coherent() set
> > GFP_DMA if (dma_mask <= DMA_BIT_MASK(24))
>
> All that code is long gone and replaced with dma-direct. Which still
> uses GFP_DMA based on the dma mask, though - see above.

And that's mostly IOMMU code, on the alloc() dma_map_ops.

  Luis
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Thu 26-04-18 22:35:56, Christoph Hellwig wrote:
> On Thu, Apr 26, 2018 at 09:54:06PM +, Luis R. Rodriguez wrote:
> > In practice if you don't have a floppy device on x86, you don't need
> > ZONE_DMA,
>
> I call BS on that, and you actually explain later why it is BS due
> to some drivers using it more explicitly. But even more importantly
> we have plenty of drivers using it through dma_alloc_* and a small DMA
> mask, and they are in use - we actually had a 4.16 regression due to
> them.

Well, but do we need a zone for that purpose? The idea was to actually
replace the zone by a CMA pool (at least on x86). With the current
implementation of the CMA we would move the [0-16M] pfn range into
zone_movable so it can be used and we would get rid of all of the
overhead each zone brings (a bit in page flags, kmalloc caches and who
knows what else)
--
Michal Hocko
SUSE Labs
Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Thu, Apr 26, 2018 at 09:54:06PM +, Luis R. Rodriguez wrote:
> In practice if you don't have a floppy device on x86, you don't need ZONE_DMA,

I call BS on that, and you actually explain later why it is BS due
to some drivers using it more explicitly. But even more importantly
we have plenty of drivers using it through dma_alloc_* and a small DMA
mask, and they are in use - we actually had a 4.16 regression due to
them.

> SCSI is *severely* affected:

Not really. We have unchecked_isa_dma to support about 4 drivers,
and less than a handful of drivers doing stupid things, which can
be fixed easily, and just need a volunteer.

> That's the end of the review of all current explicit callers on x86.
>
> # dma_alloc_coherent_gfp_flags() and dma_generic_alloc_coherent()
>
> dma_alloc_coherent_gfp_flags() and dma_generic_alloc_coherent() set
> GFP_DMA if (dma_mask <= DMA_BIT_MASK(24))

All that code is long gone and replaced with dma-direct. Which still
uses GFP_DMA based on the dma mask, though - see above.
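For reference, this is roughly how dma-direct turns the mask into a zone
modifier -- a paraphrase under assumed cutoffs (24 and 32 bits), not a
verbatim copy of kernel code, and the helper name is invented:

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/*
 * The device's coherent mask, not the caller, decides whether the
 * allocation is steered into ZONE_DMA or ZONE_DMA32.
 */
static gfp_t dma_direct_gfp_for_mask(struct device *dev, gfp_t gfp)
{
	if (dev->coherent_dma_mask <= DMA_BIT_MASK(24))
		gfp |= GFP_DMA;
	else if (dev->coherent_dma_mask <= DMA_BIT_MASK(32))
		gfp |= GFP_DMA32;
	return gfp;
}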
Re: [Lsf-pc] [LSF/MM TOPIC NOTES] x86 ZONE_DMA love
On Thu, 2018-04-26 at 21:54 +, Luis R. Rodriguez wrote:
> Below are my notes on the ZONE_DMA discussion at LSF/MM 2018. There
> were some earlier discussions prior to my arrival to the session about
> moving ZONE_DMA around, if someone has notes on that please share too :)

We took notes during LSF/MM 2018. Not a whole lot on your topic, but
most of the MM and plenary topics have some notes.

https://etherpad.wikimedia.org/p/LSFMM2018

--
All Rights Reversed.
[LSF/MM TOPIC NOTES] x86 ZONE_DMA love
Below are my notes on the ZONE_DMA discussion at LSF/MM 2018. There were some
earlier discussions prior to my arrival to the session about moving ZONE_DMA
around, if someone has notes on that please share too :)

PS. I'm not subscribed to linux-mm

  Luis

Determining you don't need to support ZONE_DMA on x86 at run time
==================================================================

In practice if you don't have a floppy device on x86, you don't need
ZONE_DMA; in that case you don't need to support ZONE_DMA. However,
currently disabling it is only possible at compile time, and we won't know
for sure until boot time whether you have such a device.

If you don't need ZONE_DMA it means we would not have to deal with slab
allocators for it and special casings for it in a slew of places. In
particular even kmalloc() has a branch which is always run if
CONFIG_ZONE_DMA is enabled.

ZONE_DMA is needed for old devices that require lower addresses since it
allows allocations there more reliably. There should be more devices that
require this, not just floppy though.

Christoph Lameter added CONFIG_ZONE_DMA to disable ZONE_DMA at build time,
but most distributions enable it. If we could disable ZONE_DMA at run time
once we know we don't have any device present requiring it, we could get the
same benefit as compiling without CONFIG_ZONE_DMA, but at run time.

It used to be that disabling CONFIG_ZONE_DMA could help with performance; we
don't seem to have modern benchmarks of the possible gains from removing it.
Are the gains no longer expected to be significant? Very likely there are no
performance gains. The assumption then is that the main advantage of being
able to disable ZONE_DMA on x86 these days would be pure aesthetics, and
having x86 work more like other architectures with allocations. Use of
ZONE_DMA in drivers is also a good sign that these drivers are old, or may
be deprecated. Perhaps some of these on x86 should be moved to staging.

Note that some architectures rely on ZONE_DMA as well; the above notes only
apply to x86.

We can use certain kernel mechanisms to disable usage of certain x86 features
at run time. Below are a few options:

  * x86 binary patching
  * ACPI_SIG_FADT
  * static keys
  * compiler multiverse (at least the R&D gcc proof of concept is now complete)

Detecting legacy x86 devices with ACPI ACPI_SIG_FADT
----------------------------------------------------

We could expand on ACPI_SIG_FADT with more legacy devices. This mechanism was
used to help determine if certain legacy x86 devices are present or not with
paravirtualization. For instance:

  * ACPI_FADT_NO_VGA
  * ACPI_FADT_NO_CMOS_RTC

CONFIG_ZONE_DMA
---------------

Christoph Lameter added CONFIG_ZONE_DMA through commit 4b51d66989218
("[PATCH] optional ZONE_DMA: optional ZONE_DMA in the VM") merged on v2.6.21.

On x86 ZONE_DMA is defined as follows:

config ZONE_DMA
	bool "DMA memory allocation support" if EXPERT
	default y
	help
	  DMA memory allocation support allows devices with less than 32-bit
	  addressing to allocate within the first 16MB of address space.
	  Disable if no such devices will be used.

	  If unsure, say Y.

Most distributions enable CONFIG_ZONE_DMA.

Immediate impact of CONFIG_ZONE_DMA
-----------------------------------

CONFIG_ZONE_DMA implicates kmalloc() as follows:

struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
{
	...
#ifdef CONFIG_ZONE_DMA
	if (unlikely((flags & GFP_DMA)))
		return kmalloc_dma_caches[index];
#endif
	...
}
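The mechanisms listed earlier (FADT flags, static keys) could combine to
patch the kmalloc() branch just shown out at boot. A minimal sketch follows
-- nothing like this exists in the tree, the function is invented, and
whether ACPI_FADT_LEGACY_DEVICES (a real FADT boot flag) is a good enough
signal is exactly the open question above:

#include <linux/jump_label.h>
#include <linux/acpi.h>

DEFINE_STATIC_KEY_TRUE(zone_dma_needed);

static void __init zone_dma_detect(void)
{
	/* default stays on; only turn it off when firmware says no ISA/LPC devices */
	if (!acpi_disabled &&
	    !(acpi_gbl_FADT.boot_flags & ACPI_FADT_LEGACY_DEVICES))
		static_branch_disable(&zone_dma_needed);
}

/* kmalloc_slab() could then test the key in addition to the config option: */
/*	if (static_branch_unlikely(&zone_dma_needed) && (flags & GFP_DMA))  */
/*		return kmalloc_dma_caches[index];                           */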
ZONE_DMA users
==============

Turns out there are many more users of ZONE_DMA than expected, even on x86.

Explicit requirements for ZONE_DMA with gfp flags
-------------------------------------------------

All drivers which explicitly use any of these flags implicate use of
ZONE_DMA for allocations:

  * GFP_DMA
  * __GFP_DMA

Implicit ZONE_DMA users
-----------------------

There are a series of implicit users of ZONE_DMA which use helpers. These
are, with details documented further below:

  * blk_queue_bounce()
  * blk_queue_bounce_limit()
  * dma_alloc_coherent_gfp_flags()
  * dma_generic_alloc_coherent()
  * intel_alloc_coherent()
  * _regmap_raw_write()
  * mempool_alloc_pages_isa()

x86 implicit and explicit ZONE_DMA users
----------------------------------------

We list below all x86 implicit and explicit ZONE_DMA users.

# Explicit x86 users of GFP_DMA or __GFP_DMA

* drivers/iio/common/ssp_sensors - wonder if enabling this on x86 was a
  mistake. Note that this needs SPI and SPI needs HAS_IOMEM. I only see
  HAS_IOMEM on s390 ? But I do think the Intel Minnowboard has SPI, but
  doubt it has the ssp sensor stuff.
* drivers/input/rmi4/rmi_spi.c - same SPI question
* drivers/media/common/siano/ - make allyesconfig yields it enabled, but
  not sure if this should ever be on x86
* drivers/med