Re: [PATCH v1 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock

2018-09-22 Thread Balbir Singh
On Wed, Sep 19, 2018 at 09:35:07AM +0200, David Hildenbrand wrote:
> Am 19.09.18 um 03:22 schrieb Balbir Singh:
> > On Tue, Sep 18, 2018 at 01:48:16PM +0200, David Hildenbrand wrote:
> >> Reading through the code and studying how mem_hotplug_lock is to be used,
> >> I noticed that there are two places where we can end up calling
> >> device_online()/device_offline() - online_pages()/offline_pages() without
> >> the mem_hotplug_lock. And there are other places where we call
> >> device_online()/device_offline() without the device_hotplug_lock.
> >>
> >> While e.g.
> >>echo "online" > /sys/devices/system/memory/memory9/state
> >> is fine, e.g.
> >>echo 1 > /sys/devices/system/memory/memory9/online
> >> Will not take the mem_hotplug_lock. However the device_lock() and
> >> device_hotplug_lock.
> >>
> >> E.g. via memory_probe_store(), we can end up calling
> >> add_memory()->online_pages() without the device_hotplug_lock. So we can
> >> have concurrent callers in online_pages(). We e.g. touch in online_pages()
> >> basically unprotected zone->present_pages then.
> >>
> >> Looks like there is a longer history to that (see Patch #2 for details),
> >> and fixing it to work the way it was intended is not really possible. We
> >> would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
> >> sounds wrong.
> >>
> >> Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
> >> More details can be found in patch 3 and patch 6.
> >>
> >> I propose the general rules (documentation added in patch 6):
> >>
> >> 1. add_memory/add_memory_resource() must only be called with
> >>device_hotplug_lock.
> >> 2. remove_memory() must only be called with device_hotplug_lock. This is
> >>already documented and holds for all callers.
> >> 3. device_online()/device_offline() must only be called with
> >>device_hotplug_lock. This is already documented and true for now in core
> >>code. Other callers (related to memory hotplug) have to be fixed up.
> >> 4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
> >>online_pages/offline_pages.
> >>
> >> To me, this looks way cleaner than what we have right now (and easier to
> >> verify). And looking at the documentation of remove_memory, using
> >> lock_device_hotplug also for add_memory() feels natural.
> >>
> > 
> > That seems reasonable, but also implies that device_online() would hold
> > back add/remove memory, could you please also document what mode
> > read/write the locks need to be held? For example can the 
> > device_hotplug_lock
> > be held in read mode while add/remove memory via (mem_hotplug_lock) is held
> > in write mode?
> 
> device_hotplug_lock is an ordinary mutex. So no option there.
> 
> Only mem_hotplug_lock is a per CPU RW mutex. And as of now it only
> exists to not require get_online_mems()/put_online_mems() to take the
> device_hotplug_lock. Which is perfectly valid, because these users only
> care about memory (not any other devices) not suddenly vanish. And that
> RW lock makes things fast.
> 
> Any modifications (online/offline/add/remove) require the
> mem_hotplug_lock in write.
> 
> I can add some more details to documentation in patch #6.
> 
> "... we should always hold the mem_hotplug_lock (via
> mem_hotplug_begin/mem_hotplug_done) in write mode to serialize memory
> hotplug" ..."
> 
> "In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in
> read mode allows for a quite efficient get_online_mems/put_online_mems
> implementation, so code accessing memory can protect from that memory
> vanishing."
> 
> Would that work for you?

Yes, Thanks

Balbir Singh.


Re: [PATCH 4/4] dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops

2018-09-22 Thread Guenter Roeck
Hi,

On Mon, Aug 27, 2018 at 10:47:11AM +0200, Christoph Hellwig wrote:
> There is no reason to leave the per-device dma_ops around when
> deconfiguring a device, so move this code from arm64 into the
> common code.
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Robin Murphy 

This patch causes various ppc images to crash in -next due to NULL
DMA ops in dma_alloc_attrs().

Looking into the code, the macio driver tries to instantiate on
the :f0:0d.0 PCI address (the driver maps to all Apple PCI IDs)
and fails. This results in a call to arch_teardown_dma_ops(). Later,
the same device pointer is used to instantiate ohci-pci, which
crashes in probe because the dma_ops pointer has been cleared.

I don't claim to fully understand the code, but to me it looks like
the pci device is allocated and waiting for a driver to attach to.
See arch/powerpc/kernel/pci_of_scan.c:of_create_pci_dev().
If attaching a driver (here: macio) fails, the same device pointer
is then reused for the next matching driver until a match is found
and the device is successfully instantiated. Of course this fails
quite badly if the device pointer has been scrubbed after the first
failure.

I don't know if this is generic PCI behavior or ppc specific.
I am copying pci and ppc maintainers for additional input.

Either case, reverting the patch fixes the problem.

Guenter

---
ohci-pci :f0:0d.0: OHCI PCI host controller
ohci-pci :f0:0d.0: new USB bus registered, assigned bus number 1
[ cut here ]
kernel BUG at ./include/linux/dma-mapping.h:516!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PREEMPT SMP NR_CPUS=4 NUMA PowerMac
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW
4.19.0-rc4-next-20180921-dirty #1
NIP:  c086b824 LR: c086b5dc CTR: c086dd70
REGS: c0003d30f0b0 TRAP: 0700   Tainted: GW
(4.19.0-rc4-next-20180921-dirty)
MSR:  8002b032   CR: 28008842  XER: 
IRQMASK: 0 
GPR00: c086b5dc c0003d30f330 c1199900 c0003d3ce898 
GPR04: c115b798 c0003d8c3a48   
GPR08:  ff00  0020 
GPR12: 24008442 c1299000 c000f720  
GPR16:     
GPR20:     
GPR24:  c10452e8 c1045420 0080 
GPR28: 001c c1045408 c0003d3ce898 c0003d8c3a30 
NIP [c086b824] .ohci_init+0x564/0x570
LR [c086b5dc] .ohci_init+0x31c/0x570
Call Trace:
[c0003d30f330] [c086b5dc] .ohci_init+0x31c/0x570 (unreliable)
[c0003d30f3c0] [c086de58] .ohci_pci_reset+0xa8/0xb0
[c0003d30f440] [c08335ec] .usb_add_hcd+0x35c/0x9b0
[c0003d30f510] [c084ea90] .usb_hcd_pci_probe+0x320/0x510
[c0003d30f5c0] [c06c7df8] .local_pci_probe+0x68/0x140
[c0003d30f660] [c06c92a4] .pci_device_probe+0x144/0x210
[c0003d30f710] [c074cd48] .really_probe+0x2a8/0x3c0
[c0003d30f7b0] [c074d100] .driver_probe_device+0x80/0x170
[c0003d30f840] [c074d33c] .__driver_attach+0x14c/0x150
[c0003d30f8d0] [c0749c6c] .bus_for_each_dev+0xac/0x100
[c0003d30f970] [c074c334] .driver_attach+0x34/0x50
[c0003d30f9f0] [c074b9b8] .bus_add_driver+0x178/0x2f0
[c0003d30fa90] [c074e560] .driver_register+0x90/0x1a0
[c0003d30fb10] [c06c707c] .__pci_register_driver+0x6c/0x90
[c0003d30fba0] [c0f39f14] .ohci_pci_init+0x90/0xac
[c0003d30fc10] [c000f380] .do_one_initcall+0x70/0x2d0
[c0003d30fce0] [c0edfca4] .kernel_init_freeable+0x3b8/0x4b0
[c0003d30fdb0] [c000f744] .kernel_init+0x24/0x160
[c0003d30fe30] [c000b7a4] .ret_from_kernel_thread+0x58/0x74

---
# bad: [46c163a036b41a29b0ec3c475bf97515d755ff41] Add linux-next specific files 
for 20180921
# good: [7876320f88802b22d4e2daf7eb027dd14175a0f8] Linux 4.19-rc4
git bisect start 'HEAD' 'v4.19-rc4'
# bad: [03b5533c4d89cc558063a98fa4201a5d2b4eb1f7] Merge remote-tracking branch 
'crypto/master'
git bisect bad 03b5533c4d89cc558063a98fa4201a5d2b4eb1f7
# bad: [62c54071a46255d59e26e95528b80bf432796cb4] Merge remote-tracking branch 
'v9fs/9p-next'
git bisect bad 62c54071a46255d59e26e95528b80bf432796cb4
# bad: [1eee72bfcf0977daca74e9f902956adbb4f38847] Merge remote-tracking branch 
'realtek/for-next'
git bisect bad 1eee72bfcf0977daca74e9f902956adbb4f38847
# good: [30d3220045f49c707bbeec1d35423bd60488c433] Merge remote-tracking branch 
'scsi-fixes/fixes'
git bisect good 30d3220045f49c707bbeec1d35423bd60488c433
# bad: [a31a9772a2aa569dc279468da4be555b737e51f8] Merge remote-tracking branch 
'at91/at91-next'
git bisect bad a31a9772a2aa569dc279468da4be555b737e51f8
# bad: [3f52a0ad91857e78a5feed28327eafa11c44412c] Merge remote-tracki

Re: [PATCH v8 0/3] powerpc: Detection and scheduler optimization for POWER9 bigcore

2018-09-22 Thread Gautham R Shenoy
Hi Dave,

On Thu, Sep 20, 2018 at 11:04:54AM -0700, Dave Hansen wrote:
> On 09/20/2018 10:22 AM, Gautham R. Shenoy wrote:
> >-
> >|L1 Cache   |
> >--
> >|L2| | | |  |
> >|  |  0  |  2  |  4  |  6   |Small Core0
> >|C | | | |  |
> > Big|a --
> > Core   |c | | | |  |
> >|h |  1  |  3  |  5  |  7   | Small Core1
> >|e | | | |  |
> >-
> >   | L1 Cache   |
> >   --
> 
> The scheduler already knows about shared caches.  Could you elaborate on
> how this is different from the situation today where we have multiple
> cores sharing an L2/L3?

The issue is not so much about the threads in the core sharing L2
cache. But the two group of threads in the core, each of which has its
own L1-cache. This patchset (the second patch in the series) informs
the scheduler of this distinction by defining the SMT sched-domain
have groups correspond to the threads that share L1 cache. With this
the scheduler will treat a pair of threads {1,2} differently from
{1,3} when threads 1 and 3 share the L1 cache, while 1 and 2 don't.

The next sched-domain (CACHE domain) is defined as the group of
threads that share the L2 cache, which happens to be the entire big
core.

Without this patchset, the SMT domain would be defined as the group of
threads that share L2 cache. Thus, the scheduler would treat any two
threads in the big-core in the same way, resulting in run-to-run
variance when the software threads are placed on pair of threads
within the same L1-cache group or on separate ones.

> 
> Adding the new sysfs stuff seems like overkill if that's all that you
> are trying to do.
>

The sysfs attributes are to inform the users that we have a big-core
configuration comprising of two small cores, thereby allowing them to
make informed choices should they want to pin the tasks to the CPUs.

--
Thanks and Regards
gautham.



Re: [PATCH] lib/xz: Put CRC32_POLY_LE in xz_private.h

2018-09-22 Thread Meelis Roos
> This fixes a regression introduced by faa16bc404d72a5 ("lib: Use
> existing define with polynomial").
> 
> The cleanup added a dependency on include/linux, which broke the PowerPC
> boot wrapper/decompresser when KERNEL_XZ is enabled:
> 
>   BOOTCC  arch/powerpc/boot/decompress.o
>  In file included from arch/powerpc/boot/../../../lib/decompress_unxz.c:233,
>  from arch/powerpc/boot/decompress.c:42:
>  arch/powerpc/boot/../../../lib/xz/xz_crc32.c:18:10: fatal error:
>  linux/crc32poly.h: No such file or directory
>   #include 
>^~~
> 
> The powerpc decompresser is a hairy corner of the kernel. Even while building
> a 64-bit kernel it needs to build a 32-bit binary and therefore avoid 
> including
> files from include/linux.
> 
> This allows users of the xz library to avoid including headers from
> 'include/linux/' while still achieving the cleanup of the magic number.
> 
> Fixes: faa16bc404d72a5 ("lib: Use existing define with polynomial")
> Reported-by: Meelis Roos 
> Reported-by: kbuild test robot 
> Suggested-by: Christophe LEROY 
> Signed-off-by: Joel Stanley 

Tested-by: Meelis Roos 

-- 
Meelis Roos (mr...@linux.ee)