Re: BUG: KASAN: use-after-free in bt_for_each+0x1ea/0x29f

2018-04-04 Thread Jens Axboe
On 4/4/18 5:28 PM, Ming Lei wrote:
> Hi,
> 
> The following warning is observed once when running dbench on NVMe with
> the Linus tree (top commit is 642e7fd23353).
> 
> [full KASAN report snipped; see Ming Lei's original message below]

Re: 4.15.14 crash with iscsi target and dvd

2018-04-04 Thread Wakko Warner
Bart Van Assche wrote:
> On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I
> > > > mount /dev/sr1 and then do find -type f | xargs cat > /dev/null the
> > > > target crashes.  I'm using the builtin iscsi target with pscsi.  I can
> > > > burn from the initiator without problems.  I'll test other kernels
> > > > between 4.9 and 4.14.
> > > 
> > > So I've tested 4.x.y where x is one of 10 11 12 14 15 and y is the
> > > latest patch (except for 4.15, which was 1 behind).
> > > Each of these kernels crashes within seconds, or immediately, upon
> > > doing find -type f | xargs cat > /dev/null from the initiator.
> > 
> > I tried 4.10.0.  It doesn't completely lock up the system, but the device
> > that was used hangs.  So from the initiator it's /dev/sr1 and from the
> > target it's /dev/sr0.  Attempting to read /dev/sr0 after the oops causes
> > the process to hang in D state.
> 
> Hello Wakko,
> 
> Thank you for narrowing this down further. I think you have encountered
> a regression either in the block layer core or in the SCSI core. Unfortunately
> the number of changes between kernel versions v4.9 and v4.10 in these two
> subsystems is huge. I see two possible ways forward:
> - Either you perform a bisect to identify the patch that introduced this
>   regression. However, I'm not sure whether you are familiar with the bisect
>   process.
> - Or you identify the command that triggers this crash so that others
>   can reproduce this issue without needing access to your setup.
> 
> How about reproducing this crash with the below patch applied on top of
> kernel v4.15.x? The additional output sent by this patch to the system log
> should allow us to reproduce this issue by submitting the same SCSI command
> with sg_raw.

Sorry for not getting back in touch.  My internet was down.  I haven't tried
the patch yet.  I'll try to get to that tomorrow.  The system with the issue
is busy and I can't reboot it right now.


BUG: KASAN: use-after-free in bt_for_each+0x1ea/0x29f

2018-04-04 Thread Ming Lei
Hi,

The following warning is observed once when running dbench on NVMe with
the Linus tree (top commit is 642e7fd23353).

[ 1446.882043] ==================================================================
[ 1446.886884] BUG: KASAN: use-after-free in bt_for_each+0x1ea/0x29f
[ 1446.888045] Read of size 8 at addr 880055a60a00 by task dbench/13443
[ 1446.889660]
[ 1446.889892] CPU: 1 PID: 13443 Comm: dbench Not tainted 4.16.0_642e7fd23353_master+ #1
[ 1446.891007] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[ 1446.892290] Call Trace:
[ 1446.892641]  <IRQ>
[ 1446.892937]  dump_stack+0xf0/0x191
[ 1446.893600]  ? dma_direct_map_page+0x6f/0x6f
[ 1446.894425]  ? show_regs_print_info+0xa/0xa
[ 1446.895247]  ? ext4_writepages+0x196d/0x1e6d
[ 1446.896063]  ? do_writepages+0x57/0xa3
[ 1446.896810]  print_address_description+0x6e/0x23b
[ 1446.897882]  ? bt_for_each+0x1ea/0x29f
[ 1446.898693]  kasan_report+0x247/0x285
[ 1446.899484]  bt_for_each+0x1ea/0x29f
[ 1446.900233]  ? blk_mq_tagset_busy_iter+0xa3/0xa3
[ 1446.901190]  ? generic_file_buffered_read+0x14b1/0x14b1
[ 1446.903097]  ? blk_mq_hctx_mark_pending.isra.0+0x5c/0x5c
[ 1446.904418]  ? bio_free+0x64/0xaa
[ 1446.905113]  ? debug_lockdep_rcu_enabled+0x26/0x52
[ 1446.906332]  ? bio_put+0x7a/0x10e
[ 1446.906811]  ? debug_lockdep_rcu_enabled+0x26/0x52
[ 1446.907527]  ? blk_mq_hctx_mark_pending.isra.0+0x5c/0x5c
[ 1446.908334]  blk_mq_queue_tag_busy_iter+0xd0/0xde
[ 1446.909023]  blk_mq_in_flight+0xb4/0xdb
[ 1446.909619]  ? blk_mq_exit_hctx+0x190/0x190
[ 1446.910281]  ? ext4_end_bio+0x25d/0x2a1
[ 1446.911713]  part_in_flight+0xc0/0x2ac
[ 1446.912470]  ? ext4_put_io_end_defer+0x277/0x277
[ 1446.913465]  ? part_dec_in_flight+0x8f/0x8f
[ 1446.914375]  ? __lock_acquire+0x38/0x8e5
[ 1446.915182]  ? bio_endio+0x3d9/0x41c
[ 1446.915936]  ? __rcu_read_unlock+0x134/0x180
[ 1446.916796]  ? lock_acquire+0x2ba/0x32d
[ 1446.917570]  ? blk_account_io_done+0xea/0x572
[ 1446.918424]  part_round_stats+0x167/0x1a3
[ 1446.919188]  ? part_round_stats_single.isra.1+0xc7/0xc7
[ 1446.920187]  blk_account_io_done+0x34d/0x572
[ 1446.921056]  ? blk_update_bidi_request+0x8f/0x8f
[ 1446.921923]  ? blk_mq_run_hw_queue+0x13d/0x187
[ 1446.922803]  blk_mq_end_request+0x3f/0xbf
[ 1446.923631]  nvme_complete_rq+0x305/0x348 [nvme_core]
[ 1446.924612]  ? nvme_delete_ctrl_sync+0x5c/0x5c [nvme_core]
[ 1446.925696]  ? nvme_pci_complete_rq+0x1f6/0x20c [nvme]
[ 1446.926673]  ? kfree+0x21c/0x2ab
[ 1446.927317]  ? nvme_pci_complete_rq+0x1f6/0x20c [nvme]
[ 1446.928239]  __blk_mq_complete_request+0x391/0x3ee
[ 1446.928938]  ? blk_mq_free_request+0x479/0x479
[ 1446.929588]  ? rcu_read_lock_bh_held+0x3a/0x3a
[ 1446.930321]  ? enqueue_hrtimer+0x252/0x29a
[ 1446.930938]  ? do_raw_spin_lock+0xd8/0xd8
[ 1446.931532]  ? debug_lockdep_rcu_enabled+0x26/0x52
[ 1446.932425]  blk_mq_complete_request+0x10e/0x159
[ 1446.933341]  ? hctx_lock+0xe8/0xe8
[ 1446.933985]  ? lock_contended+0x680/0x680
[ 1446.934707]  ? lock_downgrade+0x338/0x338
[ 1446.935463]  nvme_process_cq+0x26a/0x34d [nvme]
[ 1446.936297]  ? nvme_init_hctx+0xa6/0xa6 [nvme]
[ 1446.937150]  nvme_irq+0x23/0x51 [nvme]
[ 1446.937864]  ? nvme_process_cq+0x34d/0x34d [nvme]
[ 1446.938713]  __handle_irq_event_percpu+0x29d/0x568
[ 1446.939516]  ? __irq_wake_thread+0x99/0x99
[ 1446.940241]  ? rcu_user_enter+0x72/0x72
[ 1446.940978]  ? do_timer+0x25/0x25
[ 1446.941650]  ? do_raw_spin_unlock+0x146/0x179
[ 1446.942514]  ? __lock_acquire+0x38/0x8e5
[ 1446.943305]  ? debug_lockdep_rcu_enabled+0x26/0x52
[ 1446.944242]  ? lock_acquire+0x32d/0x32d
[ 1446.944995]  ? lock_contended+0x680/0x680
[ 1446.945718]  handle_irq_event_percpu+0x7c/0xf7
[ 1446.946438]  ? __handle_irq_event_percpu+0x568/0x568
[ 1446.947124]  ? rcu_user_exit+0xa/0xa
[ 1446.947781]  handle_irq_event+0x53/0x83
[ 1446.948553]  handle_edge_irq+0x1f2/0x279
[ 1446.949397]  handle_irq+0x1d8/0x1e9
[ 1446.950094]  do_IRQ+0x90/0x12d
[ 1446.950750]  common_interrupt+0xf/0xf
[ 1446.951507]  </IRQ>
[ 1446.951953] RIP: 0010:__blk_mq_get_tag+0x201/0x22d
[ 1446.952894] RSP: 0018:880055b467a0 EFLAGS: 0246 ORIG_RAX: ffdc
[ 1446.954295] RAX:  RBX: 88005952f648 RCX: 
[ 1446.955641] RDX: 0259 RSI:  RDI: ed000ab68d06
[ 1446.956972] RBP: ed000ab68cf6 R08: 0007 R09: 
[ 1446.958356] R10: ed000a0ec0f2 R11: ed000a0ec0f1 R12: 88007f113978
[ 1446.959737] R13: 880055b46ce8 R14: dc00 R15: 880058bf60c0
[ 1446.961184]  ? modules_open+0x5e/0x5e
[ 1446.961922]  ? blk_mq_unique_tag+0xc5/0xc5
[ 1446.962748]  ? lock_acquire+0x32d/0x32d
[ 1446.963534]  ? __rcu_read_unlock+0x134/0x180
[ 1446.964393]  ? rcu_read_lock_bh_held+0x3a/0x3a
[ 1446.965282]  blk_mq_get_tag+0x1ad/0x67a
[ 1446.966079]  ? __blk_mq_tag_idle+0x44/0x44
[ 1446.966891]  ? wait_woken+0x13c/0x13c
[ 1446.967638]  ? debug_lockdep_rcu_enabled+0x26/0x52
[ 1446.968566]  ? lock_acquire+0x32d/0x32d
[ 

Re: [PATCH] blk-mq: order getting budget and driver tag

2018-04-04 Thread Jens Axboe
On 4/4/18 10:35 AM, Ming Lei wrote:
> This patch orders getting the budget and the driver tag by making sure the
> driver tag is acquired only after the budget has been obtained, which
> helps avoid the following race:
> 
> 1) before dispatching a request from the scheduler queue, one budget is
> acquired first, then a request is dequeued; call it request A.
> 
> 2) in another I/O path, request B from hctx->dispatch is being dispatched:
> its driver tag is acquired first, then blk_mq_dispatch_rq_list() tries to
> get a budget, but unfortunately the budget is held by request A.
> 
> 3) meanwhile blk_mq_dispatch_rq_list() is called to dispatch request A,
> and it tries to get a driver tag first, but unfortunately no driver tag
> is available because the driver tag is held by request B.
> 
> 4) neither I/O path can make progress, and an I/O stall results.
> 
> This issue can be observed when running dbench on USB storage.

Good catch, this can trigger on anything potentially, but of course more
likely with limited budget and/or tag space. Classic ABBA deadlock.

-- 
Jens Axboe



Re: [PATCH V3 4/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-04-04 Thread Thomas Gleixner
On Wed, 4 Apr 2018, Ming Lei wrote:
> On Wed, Apr 04, 2018 at 10:25:16AM +0200, Thomas Gleixner wrote:
> > In the example above:
> > 
> > > > >   irq 39, cpu list 0,4
> > > > >   irq 40, cpu list 1,6
> > > > >   irq 41, cpu list 2,5
> > > > >   irq 42, cpu list 3,7
> > 
> > and assumed that at driver init time only CPU 0-3 are online then the
> > hotplug of CPU 4-7 will not result in any interrupt delivered to CPU 4-7.
> 
> Indeed, and I just tested this case, and found that no interrupts are
> delivered to CPU 4-7.
> 
> In theory, the affinity has been assigned to these irq vectors and
> programmed into the interrupt controller, so I understand it should work.
> 
> Could you explain a bit why interrupts aren't delivered to CPU 4-7?

As I explained before:

"If the device is already in use when the offline CPUs get hot plugged, then
 the interrupts still stay on cpu 0-3 because the effective affinity of
 interrupts on X86 (and other architectures) is always a single CPU."

IOW. If you set the affinity mask so it contains more than one CPU then the
kernel selects a single CPU as target. The selected CPU must be online and
if there is more than one online CPU in the mask then the kernel picks the
one which has the least number of interrupts targeted at it. This selected
CPU target is programmed into the corresponding interrupt chip
(IOAPIC/MSI/MSIX) and it stays that way until the selected target CPU
goes offline or the affinity mask changes.

The reasons why we use single target delivery on X86 are:

   1) Not all X86 systems support multi target delivery

   2) If a system supports multi target delivery then the interrupt is
  preferably delivered to the CPU with the lowest APIC ID (which
  usually corresponds to the lowest CPU number) due to hardware magic
  and only a very small percentage of interrupts are delivered to the
  other CPUs in the multi target set. So the benefit is rather dubious
  and extensive performance testing did not show any significant
  difference.

   3) The management of multi targets on the software side is painful as
  the same low level vector number has to be allocated on all possible
  target CPUs. That's making a lot of things including hotplug more
  complex for very little - if at all - benefit.

So at some point we ripped out the multi target support on X86 and moved
everything to single target delivery mode.

Other architectures never supported multi target delivery either due to
hardware restrictions or for similar reasons why X86 dropped it. There
might be a few architectures which support it, but I have no overview at
the moment.

The information is in procfs

# cat /proc/irq/9/smp_affinity_list 
0-3
# cat /proc/irq/9/effective_affinity_list 
1

# cat /proc/irq/10/smp_affinity_list 
0-3
# cat /proc/irq/10/effective_affinity_list 
2

smp_affinity[_list] is the affinity which is set either by the kernel or by
writing to /proc/irq/$N/smp_affinity[_list]

effective_affinity[_list] is the affinity which is effective, i.e. the
single target CPU to which the interrupt is affine at this point.

As you can see in the above examples the target CPU is selected from the
given possible target set and the internal spreading of the low level x86
vector allocation code picks a CPU which has the lowest number of
interrupts targeted at it.

Let's assume for the example below

# cat /proc/irq/10/smp_affinity_list 
0-3
# cat /proc/irq/10/effective_affinity_list 
2

that CPU 3 was offline when the device was initialized. So there was no way
to select it and when CPU 3 comes online there is no reason to change the
affinity of that interrupt, at least not from the kernel POV. Actually we
don't even have a mechanism to do so automagically.

If I offline CPU 2 after onlining CPU 3 then the kernel has to move the
interrupt away from CPU 2, so it selects CPU 3 as it's the one with the
lowest number of interrupts targeted at it.

Now this is a bit different if you use affinity managed interrupts like
NVME and other devices do.

Many of these devices create one queue per possible CPU, so the spreading
is simple: one interrupt per possible CPU. Pretty boring.

When the device has less queues than possible CPUs, then stuff gets more
interesting. The queues and therefore the interrupts must be targeted at
multiple CPUs. There is some logic which spreads them over the numa nodes
and takes siblings into account when Hyperthreading is enabled.

In both cases the managed interrupts are handled over CPU soft
hotplug/unplug:

  1) If a CPU is soft unplugged and an interrupt is targeted at the CPU
 then the interrupt is either moved to a still online CPU in the
 affinity mask or if the outgoing CPU is the last one in the affinity
 mask it is shut down.

  2) If a CPU is soft plugged then the interrupts are scanned, and the ones
 which are managed and shut down are checked to see whether the affinity
 mask contains the upcoming CPU. If that's the 

[RFC PATCH 04/79] pipe: add inode field to struct pipe_inode_info

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Pipes are associated with a file and thus an inode. Store a pointer
back to the inode in struct pipe_inode_info; this will be used when
testing whether pages have been truncated.

Signed-off-by: Jérôme Glisse 
Cc: Eric Biggers 
Cc: Kees Cook 
Cc: Joe Lawrence 
Cc: Willy Tarreau 
Cc: Andrew Morton 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
Cc: Jeff Layton 
---
 fs/pipe.c | 2 ++
 fs/splice.c   | 1 +
 include/linux/pipe_fs_i.h | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index 7b1954caf388..41e115b0bde7 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -715,6 +715,7 @@ static struct inode * get_pipe_inode(void)
 
inode->i_pipe = pipe;
pipe->files = 2;
+   pipe->inode = inode;
pipe->readers = pipe->writers = 1;
inode->i_fop = &pipefifo_fops;
 
@@ -903,6 +904,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
pipe = alloc_pipe_info();
if (!pipe)
return -ENOMEM;
+   pipe->inode = inode;
pipe->files = 1;
spin_lock(&inode->i_lock);
if (unlikely(inode->i_pipe)) {
diff --git a/fs/splice.c b/fs/splice.c
index 39e2dc01ac12..acab52a7fe56 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -927,6 +927,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 * PIPE_READERS appropriately.
 */
pipe->readers = 1;
+   pipe->inode = file_inode(in);
 
current->splice_pipe = pipe;
}
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 5a3bb3b7c9ad..171aa78ebbf0 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -44,6 +44,7 @@ struct pipe_buffer {
  * @fasync_writers: writer side fasync
  * @bufs: the circular array of pipe buffers
  * @user: the user who created this pipe
+ * @inode: inode this pipe is associated to
  **/
 struct pipe_inode_info {
struct mutex mutex;
@@ -60,6 +61,7 @@ struct pipe_inode_info {
struct fasync_struct *fasync_writers;
struct pipe_buffer *bufs;
struct user_struct *user;
+   struct inode *inode;
 };
 
 /*
-- 
2.14.3



[RFC PATCH 06/79] mm/page: add helpers to dereference struct page index field

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Regroup all helpers that dereference the struct page.index field into one
place, and require the address_space (mapping) against which the caller is
looking up the index (offset, pgoff, ...).

Signed-off-by: Jérôme Glisse 
Cc: linux...@kvack.org
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 include/linux/mm-page.h | 136 
 include/linux/mm.h  |   5 ++
 2 files changed, 141 insertions(+)
 create mode 100644 include/linux/mm-page.h

diff --git a/include/linux/mm-page.h b/include/linux/mm-page.h
new file mode 100644
index ..2981db45eeef
--- /dev/null
+++ b/include/linux/mm-page.h
@@ -0,0 +1,136 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Authors: Jérôme Glisse 
+ */
+/*
+ * This header file regroups everything that deals with struct page and has no
+ * outside dependency except basic types header files.
+ */
+/* Protected against rogue include ... do not include this file directly */
+#ifdef DOT_NOT_INCLUDE___INSIDE_MM
+#ifndef MM_PAGE_H
+#define MM_PAGE_H
+
+/* External struct dependencies: */
+struct address_space;
+
+/* External function dependencies: */
+extern pgoff_t __page_file_index(struct page *page);
+
+
+/*
+ * _page_index() - return page index value (with special case for swap)
+ * @page: page struct pointer for which we want the index value
+ * @mapping: mapping against which we want the page index
+ * Returns: index value for the page in the given mapping
+ *
+ * The index value of a page is against a given mapping, and pages which belong
+ * to the swap cache need special handling. For a swap cache page what we want
+ * is the swap offset, which is stored encoded with other fields in
+ * page->private.
+ */
+static inline unsigned long _page_index(struct page *page,
+   struct address_space *mapping)
+{
+   if (unlikely(PageSwapCache(page)))
+   return __page_file_index(page);
+   return page->index;
+}
+
+/*
+ * _page_set_index() - set page index value against a given mapping
+ * @page: page struct pointer for which we want the index value
+ * @mapping: mapping against which we want the page index
+ * @index: index value to set
+ */
+static inline void _page_set_index(struct page *page,
+   struct address_space *mapping,
+   unsigned long index)
+{
+   page->index = index;
+}
+
+/*
+ * _page_to_index() - page index value against a given mapping
+ * @page: page struct pointer for which we want the index value
+ * @mapping: mapping against which we want the page index
+ * Returns: index value for the page in the given mapping
+ *
+ * The index value of a page is against a given mapping. THP pages need special
+ * handling, as the index is set in the head page; the final index value is
+ * the head page index plus the offset of the current page from the head page.
+ */
+static inline unsigned long _page_to_index(struct page *page,
+   struct address_space *mapping)
+{
+   unsigned long pgoff;
+
+   if (likely(!PageTransTail(page)))
+   return page->index;
+
+   /*
+*  We don't initialize ->index for tail pages: calculate based on
+*  head page
+*/
+   pgoff = compound_head(page)->index;
+   pgoff += page - compound_head(page);
+   return pgoff;
+}
+
+/*
+ * _page_to_pgoff() - page pgoff value against a given mapping
+ * @page: page struct pointer for which we want the index value
+ * @mapping: mapping against which we want the page index
+ * Returns: pgoff value for the page in the given mapping
+ *
+ * The pgoff value of a page is against a given mapping. Hugetlb pages need
+ * special handling, as they have page->index in units of the huge page size
+ * (PMD_SIZE or PUD_SIZE), not in PAGE_SIZE as other types of pages do.
+ *
+ * FIXME convert hugetlb to multi-order entries.
+ */
+static inline unsigned long _page_to_pgoff(struct page *page,
+   struct address_space *mapping)
+{
+   if (unlikely(PageHeadHuge(page)))
+   return page->index << compound_order(page);
+
+   return _page_to_index(page, mapping);
+}
+
+/*
+ * _page_offset() - page offset (in bytes) against a given mapping
+ * @page: page 

[RFC PATCH 05/79] mm/swap: add an helper to get address_space from swap_entry_t

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Each swap entry is associated with a file and thus an address_space.
That address_space is used for reading/writing to swap storage. This
patch adds a helper to get the address_space from a swap_entry_t.

Signed-off-by: Jérôme Glisse 
Cc: Michal Hocko 
Cc: Johannes Weiner 
Cc: Andrew Morton 
---
 include/linux/swap.h | 1 +
 mm/swapfile.c| 7 +++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a1a3f4ed94ce..e2155df84d77 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -475,6 +475,7 @@ extern int __swp_swapcount(swp_entry_t entry);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
 extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
+struct address_space *swap_entry_to_address_space(swp_entry_t swap);
 extern bool reuse_swap_page(struct page *, int *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index c7a33717d079..a913d4b45866 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3467,6 +3467,13 @@ struct swap_info_struct *swp_swap_info(swp_entry_t entry)
return swap_info[swp_type(entry)];
 }
 
+struct address_space *swap_entry_to_address_space(swp_entry_t swap)
+{
+   struct swap_info_struct *sis = swp_swap_info(swap);
+
+   return sis->swap_file->f_mapping;
+}
+
 struct swap_info_struct *page_swap_info(struct page *page)
 {
swp_entry_t entry = { .val = page_private(page) };
-- 
2.14.3



[RFC PATCH 00/79] Generic page write protection and a solution to page waitqueue

2018-04-04 Thread jglisse
From: Jérôme Glisse 

https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc

This is an RFC for LSF/MM discussions. It impacts the file subsystem,
the block subsystem and the mm subsystem. Hence it would benefit from
a cross sub-system discussion.

The patchset is not fully baked, so take it with a grain of salt. I use it
to illustrate the fact that it is doable, and now that I did it once I
believe I have a better and cleaner plan in my head on how to do this.
I intend to share and discuss it at LSF/MM (I still need to write it
down). That plan leads to quite different individual steps than this
patchset takes, and is also easier to split up into more manageable
pieces.

I also want to apologize for the size and number of patches (and i am
not even sending them all).

--
The Why ?

I have two objectives: duplicate memory read-only across nodes and/or
devices, and work around PCIe atomic limitations. More on each of those
objectives below. I also want to put forward that it can solve the page
wait list issue, i.e. having each page with its own wait list and thus
avoiding the long wait list traversal latency recently reported [1].

It does allow KSM for file-backed pages (truly generic KSM, even between
anonymous and file-backed pages). I am not sure how useful this can
be; this was not an objective I pursued, it is just a free side
effect (see below).

[1] https://groups.google.com/forum/#!topic/linux.kernel/Iit1P5BNyX8

--
Per page wait list, so long page_waitqueue() !

Not implemented in this RFC but below is the logic and pseudo code
at bottom of this email.

When there is contention on the struct page lock bit, the caller which
is trying to lock the page adds itself to a waitqueue. The issue
here is that multiple pages share the same wait queue, and on a large
system with a lot of RAM this means we can quickly get a long list
of waiters for different pages (or for the same page) on the same
list [1].

The present patchset virtually kills all places that need to access the
page->mapping field, and only a handful are left, namely for testing
page truncation and for vmscan. The former can be removed if we reuse
the PG_waiters flag for a new PG_truncate flag set on truncation; then
we can virtually kill all dereferences of page->mapping (this patchset
proves it is doable). NOTE THIS DOES NOT MEAN THAT MAPPING IS FREE TO
BE USED BY ANYONE TO STORE WHATEVER IN STRUCT PAGE. SORRY, NO!

What this means is that whenever a thread wants to spin on a page until
it can lock it, it can carefully replace the page->mapping with a waiter
struct for a wait list. Thus each page under contention will have its
own wait list.

The fact that there are not many places that dereference page->mapping
is important, because it means any dereference must now be done
with preemption disabled (inside an RCU read section) so that the waiter
can free the waiter struct without fear of hazard (the struct is on
the stack, like today). Pseudo code at the end of this mail.

The devil is in the details, but after long meditation and pondering on
this I believe this is a doable solution. Note it does not rely on
the write protection, nor does it technically need to kill all struct
page mapping dereferences. But the latter can really hurt performance if
they have to be done under the RCU read lock, with the corresponding grace
period needed before freeing the waiter struct.

--
KSM for everyone !

With generic write protection you can do KSM for file-backed pages too
(even if they have different offsets, mappings or buffer_heads). While I
believe page sharing for containers is already solved with overlayfs,
this might still be an interesting feature for some.

Oh, and crazy upon crazy, you can merge private anonymous pages and
file-backed pages together ... Probably totally useless, but cool like crazy.

--
KDM (Kernel Duplicate Memory)

Most kernel development, especially in the mm sub-system, is about how to
save resources, how to share as many of them as possible so that we
maximize their availability for all processes.

The objective here is slightly different. Some users favor performance and
already have properly sized systems (i.e. they have enough resources for
the task at hand). For performance it is sometimes better to use more
resources to improve other parameters of the performance equation.

This is especially true for big systems that either use several devices
or spread across several nodes, or both. For those, sharing memory
means peer-to-peer traffic. This can become a bottleneck and saturate
the interconnect between those peers.

If some data set under consideration is accessed read-only then we can
duplicate the memory backing it on multiple nodes/devices. Access is then
local to 

[RFC PATCH 07/79] mm/page: add helpers to find mapping give a page and buffer head

2018-04-04 Thread jglisse
From: Jérôme Glisse 

For now this simply uses the existing page_mapping() inline. Later it will
use the buffer head pointer as a key to look up the mapping for a
write-protected page.

Signed-off-by: Jérôme Glisse 
Cc: linux...@kvack.org
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 include/linux/mm-page.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/mm-page.h b/include/linux/mm-page.h
index 2981db45eeef..647a8a8cf9ba 100644
--- a/include/linux/mm-page.h
+++ b/include/linux/mm-page.h
@@ -132,5 +132,17 @@ static inline unsigned long _page_file_offset(struct page *page,
return page->index << PAGE_SHIFT;
 }
 
+/*
+ * fs_page_mapping_get_with_bh() - page mapping knowing buffer_head
+ * @page: page struct pointer for which we want the mapping
+ * @bh: buffer_head associated with the page for the mapping
+ * Returns: page mapping for the given buffer head
+ */
+static inline struct address_space *fs_page_mapping_get_with_bh(
+   struct page *page, struct buffer_head *bh)
+{
+   return page_mapping(page);
+}
+
 #endif /* MM_PAGE_H */
 #endif /* DOT_NOT_INCLUDE___INSIDE_MM */
-- 
2.14.3



[RFC PATCH 09/79] fs: add struct address_space to read_cache_page() callback argument

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add struct address_space to the callback arguments of read_cache_page()
and read_cache_pages(). Note this patch only adds arguments and modifies
callback function signatures; it does not make use of the new argument
and thus should be regression-free.

One step toward dropping reliance on page->mapping.

Signed-off-by: Jérôme Glisse 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
Cc: Jeff Layton 
---
 drivers/staging/lustre/lustre/mdc/mdc_request.c |  3 ++-
 fs/9p/vfs_addr.c| 13 -
 fs/afs/file.c   |  7 ---
 fs/afs/internal.h   |  2 +-
 fs/exofs/inode.c|  5 +++--
 fs/fuse/file.c  |  3 ++-
 fs/gfs2/aops.c  |  5 +++--
 fs/jffs2/file.c |  6 --
 fs/jffs2/fs.c   |  2 +-
 fs/jffs2/os-linux.h |  3 ++-
 fs/nfs/dir.c|  3 ++-
 fs/nfs/read.c   |  3 ++-
 fs/nfs/symlink.c|  6 --
 include/linux/pagemap.h |  8 ++--
 mm/filemap.c| 14 +++---
 mm/readahead.c  |  4 ++--
 16 files changed, 61 insertions(+), 26 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 03e55bca4ada..4814ef083824 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -1122,7 +1122,8 @@ struct readpage_param {
  * in PAGE_SIZE (if PAGE_SIZE greater than LU_PAGE_SIZE), and the
  * lu_dirpage for this integrated page will be adjusted.
  **/
-static int mdc_read_page_remote(void *data, struct page *page0)
+static int mdc_read_page_remote(void *data, struct address_space *mapping,
+   struct page *page0)
 {
struct readpage_param *rp = data;
struct page **page_pool;
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index e1cbdfdb7c68..61f70e63a525 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -99,6 +99,17 @@ static int v9fs_vfs_readpage(struct file *filp, struct page *page)
return v9fs_fid_readpage(filp->private_data, page);
 }
 
+/*
+ * This wrapper is needed to avoid forcing callback cast on read_cache_pages()
+ * and defeating compiler figuring out we are doing something wrong.
+ */
+static int v9fs_vfs_readpage_filler(void *data, struct address_space *mapping,
+   struct page *page)
+{
+   return v9fs_vfs_readpage(data, page);
+}
+
+
 /**
  * v9fs_vfs_readpages - read a set of pages from 9P
  *
@@ -122,7 +133,7 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping,
if (ret == 0)
return ret;
 
-   ret = read_cache_pages(mapping, pages, (void *)v9fs_vfs_readpage, filp);
+   ret = read_cache_pages(mapping, pages, v9fs_vfs_readpage_filler, filp);
p9_debug(P9_DEBUG_VFS, "  = %d\n", ret);
return ret;
 }
diff --git a/fs/afs/file.c b/fs/afs/file.c
index a39192ced99e..f457b0144946 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -247,7 +247,8 @@ int afs_fetch_data(struct afs_vnode *vnode, struct key *key, struct afs_read *de
 /*
  * read page from file, directory or symlink, given a key to use
  */
-int afs_page_filler(void *data, struct page *page)
+int afs_page_filler(void *data, struct address_space *mapping,
+   struct page *page)
 {
struct inode *inode = page->mapping->host;
struct afs_vnode *vnode = AFS_FS_I(inode);
@@ -373,14 +374,14 @@ static int afs_readpage(struct file *file, struct page *page)
if (file) {
key = afs_file_key(file);
ASSERT(key != NULL);
-   ret = afs_page_filler(key, page);
+   ret = afs_page_filler(key, page->mapping, page);
} else {
struct inode *inode = page->mapping->host;
key = afs_request_key(AFS_FS_S(inode->i_sb)->cell);
if (IS_ERR(key)) {
ret = PTR_ERR(key);
} else {
-   ret = afs_page_filler(key, page);
+   ret = afs_page_filler(key, page->mapping, page);
key_put(key);
}
}
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index f38d6a561a84..4c449145f668 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -656,7 +656,7 @@ extern void afs_put_wb_key(struct afs_wb_key *);
 extern int 

[RFC PATCH 08/79] mm/page: add helpers to find page mapping and private given a bio

2018-04-04 Thread jglisse
From: Jérôme Glisse 

When a page undergoes I/O it is associated with a unique bio, and thus
we can use the bio to look up other page fields which are relevant only
for the bio under consideration.

Note this only applies when the page is special, i.e. page->mapping is
pointing to some special structure which is not a valid struct
address_space.

Signed-off-by: Jérôme Glisse 
Cc: linux...@kvack.org
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 include/linux/mm-page.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/mm-page.h b/include/linux/mm-page.h
index 647a8a8cf9ba..6ec3ba19b1a4 100644
--- a/include/linux/mm-page.h
+++ b/include/linux/mm-page.h
@@ -24,6 +24,7 @@
 
 /* External struct dependencies: */
 struct address_space;
+struct bio;
 
 /* External function dependencies: */
 extern pgoff_t __page_file_index(struct page *page);
@@ -144,5 +145,13 @@ static inline struct address_space *fs_page_mapping_get_with_bh(
return page_mapping(page);
 }
 
+static inline void bio_page_mapping_and_private(struct page *page,
+   struct bio *bio, struct address_space **mappingp,
+   unsigned long *privatep)
+{
+   *mappingp = page->mapping;
+   *privatep = page_private(page);
+}
+
 #endif /* MM_PAGE_H */
 #endif /* DOT_NOT_INCLUDE___INSIDE_MM */
-- 
2.14.3



[RFC PATCH 22/79] fs: add struct inode to block_read_full_page() arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add struct inode to block_read_full_page(). Note this patch only adds
arguments and modifies call sites conservatively using page->mapping,
and thus the end result is as before this patch.

One step toward dropping reliance on page->mapping.

Signed-off-by: Jérôme Glisse 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
Cc: Jeff Layton 
---
 fs/adfs/inode.c | 2 +-
 fs/affs/file.c  | 2 +-
 fs/befs/linuxvfs.c  | 3 ++-
 fs/bfs/file.c   | 2 +-
 fs/block_dev.c  | 2 +-
 fs/buffer.c | 4 ++--
 fs/efs/inode.c  | 2 +-
 fs/ext4/readpage.c  | 3 ++-
 fs/freevxfs/vxfs_subr.c | 2 +-
 fs/hfs/inode.c  | 2 +-
 fs/hfsplus/inode.c  | 3 ++-
 fs/minix/inode.c| 2 +-
 fs/mpage.c  | 2 +-
 fs/ocfs2/aops.c | 3 ++-
 fs/ocfs2/refcounttree.c | 3 ++-
 fs/omfs/file.c  | 2 +-
 fs/qnx4/inode.c | 2 +-
 fs/reiserfs/inode.c | 3 ++-
 fs/sysv/itree.c | 2 +-
 fs/ufs/inode.c  | 3 ++-
 include/linux/buffer_head.h | 2 +-
 21 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/fs/adfs/inode.c b/fs/adfs/inode.c
index 1100d5da84d0..2270ab3d5392 100644
--- a/fs/adfs/inode.c
+++ b/fs/adfs/inode.c
@@ -45,7 +45,7 @@ static int adfs_writepage(struct address_space *mapping, struct page *page,
 static int adfs_readpage(struct file *file, struct address_space *mapping,
 struct page *page)
 {
-   return block_read_full_page(page, adfs_get_block);
+   return block_read_full_page(page->mapping->host, page, adfs_get_block);
 }
 
 static void adfs_write_failed(struct address_space *mapping, loff_t to)
diff --git a/fs/affs/file.c b/fs/affs/file.c
index 55ab72c1b228..136cb90f332f 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -379,7 +379,7 @@ static int affs_writepage(struct address_space *mapping, struct page *page,
 static int affs_readpage(struct file *file, struct address_space *mapping,
 struct page *page)
 {
-   return block_read_full_page(page, affs_get_block);
+   return block_read_full_page(page->mapping->host, page, affs_get_block);
 }
 
 static void affs_write_failed(struct address_space *mapping, loff_t to)
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index f6844b4ae77f..4436123674d3 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -112,7 +112,8 @@ static int
 befs_readpage(struct file *file, struct address_space *mapping,
  struct page *page)
 {
-   return block_read_full_page(page, befs_get_block);
+   return block_read_full_page(page->mapping->host, page,
+   befs_get_block);
 }
 
 static sector_t
diff --git a/fs/bfs/file.c b/fs/bfs/file.c
index 1c4593429f7d..b1255ee4cd75 100644
--- a/fs/bfs/file.c
+++ b/fs/bfs/file.c
@@ -160,7 +160,7 @@ static int bfs_writepage(struct address_space *mapping, struct page *page,
 static int bfs_readpage(struct file *file, struct address_space *mapping,
struct page *page)
 {
-   return block_read_full_page(page, bfs_get_block);
+   return block_read_full_page(page->mapping->host, page, bfs_get_block);
 }
 
 static void bfs_write_failed(struct address_space *mapping, loff_t to)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 2bf1b17aeff3..9ac6bf760272 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -571,7 +571,7 @@ static int blkdev_writepage(struct address_space *mapping, struct page *page,
 static int blkdev_readpage(struct file * file, struct address_space *mapping,
   struct page * page)
 {
-   return block_read_full_page(page, blkdev_get_block);
+   return block_read_full_page(page->mapping->host,page,blkdev_get_block);
 }
 
 static int blkdev_readpages(struct file *file, struct address_space *mapping,
diff --git a/fs/buffer.c b/fs/buffer.c
index 99818e876ad8..aa7d9be68581 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2231,9 +2231,9 @@ EXPORT_SYMBOL(block_is_partially_uptodate);
  * set/clear_buffer_uptodate() functions propagate buffer state into the
  * page struct once IO has completed.
  */
-int block_read_full_page(struct page *page, get_block_t *get_block)
+int block_read_full_page(struct inode *inode, struct page *page,
+get_block_t *get_block)
 {
-   struct inode *inode = page->mapping->host;
sector_t iblock, lblock;
struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
unsigned int blocksize, bbits;
diff --git a/fs/efs/inode.c b/fs/efs/inode.c
index 05aab4a5e8a1..a2f47227124e 100644
--- a/fs/efs/inode.c
+++ b/fs/efs/inode.c
@@ -16,7 +16,7 @@
 static int efs_readpage(struct file 

[RFC PATCH 24/79] fs: add struct inode to nobh_writepage() arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add struct inode to nobh_writepage(). Note this patch only adds
arguments and modifies call sites conservatively using page->mapping,
and thus the end result is as before this patch.

One step toward dropping reliance on page->mapping.

Signed-off-by: Jérôme Glisse 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
Cc: Jeff Layton 
---
 fs/buffer.c | 5 ++---
 fs/ext2/inode.c | 2 +-
 fs/gfs2/aops.c  | 3 ++-
 include/linux/buffer_head.h | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index aa7d9be68581..31298f4f0300 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2730,10 +2730,9 @@ EXPORT_SYMBOL(nobh_write_end);
  * that it tries to operate without attaching bufferheads to
  * the page.
  */
-int nobh_writepage(struct page *page, get_block_t *get_block,
-   struct writeback_control *wbc)
+int nobh_writepage(struct inode *inode, struct page *page,
+   get_block_t *get_block, struct writeback_control *wbc)
 {
-   struct inode * const inode = page->mapping->host;
loff_t i_size = i_size_read(inode);
const pgoff_t end_index = i_size >> PAGE_SHIFT;
unsigned offset;
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 37439d1e544c..11b3c3e7ea65 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -926,7 +926,7 @@ static int ext2_nobh_writepage(struct address_space *mapping,
struct page *page,
struct writeback_control *wbc)
 {
-   return nobh_writepage(page, ext2_get_block, wbc);
+   return nobh_writepage(page->mapping->host, page, ext2_get_block, wbc);
 }
 
 static sector_t ext2_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 8cfd4c7d884c..ff02313b86e6 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -142,7 +142,8 @@ static int gfs2_writepage(struct address_space *mapping, struct page *page,
if (ret <= 0)
return ret;
 
-   return nobh_writepage(page, gfs2_get_block_noalloc, wbc);
+   return nobh_writepage(page->mapping->host, page,
+ gfs2_get_block_noalloc, wbc);
 }
 
 /* This is the same as calling block_write_full_page, but it also
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index cab143668834..fb68a3358330 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -265,8 +265,8 @@ int nobh_write_end(struct file *, struct address_space *,
loff_t, unsigned, unsigned,
struct page *, void *);
 int nobh_truncate_page(struct address_space *, loff_t, get_block_t *);
-int nobh_writepage(struct page *page, get_block_t *get_block,
-struct writeback_control *wbc);
+int nobh_writepage(struct inode *inode, struct page *page,
+   get_block_t *get_block, struct writeback_control *wbc);
 
 void buffer_init(void);
 
-- 
2.14.3



[RFC PATCH 27/79] fs: add struct address_space to fscache_read*() callback arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add struct address_space to the fscache_read*() callback arguments.
Note this patch only adds arguments and modifies call sites
conservatively using page->mapping, and thus the end result is as
before this patch.

One step toward dropping reliance on page->mapping.

Signed-off-by: Jérôme Glisse 
Cc: David Howells 
Cc: linux-cach...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
Cc: Jeff Layton 
---
 fs/9p/cache.c |  4 +++-
 fs/afs/file.c |  4 +++-
 fs/ceph/cache.c   | 10 ++
 fs/cifs/fscache.c |  6 --
 fs/fscache/page.c |  1 +
 fs/nfs/fscache.c  |  4 +++-
 include/linux/fscache-cache.h |  2 +-
 include/linux/fscache.h   |  9 ++---
 8 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index 8185bfe4492f..3f122d35c54d 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -273,7 +273,8 @@ void __v9fs_fscache_invalidate_page(struct address_space *mapping,
}
 }
 
-static void v9fs_vfs_readpage_complete(struct page *page, void *data,
+static void v9fs_vfs_readpage_complete(struct address_space *mapping,
+  struct page *page, void *data,
   int error)
 {
if (!error)
@@ -299,6 +300,7 @@ int __v9fs_readpage_from_fscache(struct inode *inode, struct page *page)
return -ENOBUFS;
 
ret = fscache_read_or_alloc_page(v9inode->fscache,
+page->mapping,
 page,
 v9fs_vfs_readpage_complete,
 NULL,
diff --git a/fs/afs/file.c b/fs/afs/file.c
index f87e997b9df9..23ff51343dd3 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -203,7 +203,8 @@ void afs_put_read(struct afs_read *req)
 /*
  * deal with notification that a page was read from the cache
  */
-static void afs_file_readpage_read_complete(struct page *page,
+static void afs_file_readpage_read_complete(struct address_space *mapping,
+   struct page *page,
void *data,
int error)
 {
@@ -271,6 +272,7 @@ int afs_page_filler(void *data, struct address_space *mapping,
/* is it cached? */
 #ifdef CONFIG_AFS_FSCACHE
ret = fscache_read_or_alloc_page(vnode->cache,
+page->mapping,
 page,
 afs_file_readpage_read_complete,
 NULL,
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index a3ab265d3215..14438f1ed7e0 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -266,7 +266,9 @@ void ceph_fscache_file_set_cookie(struct inode *inode, 
struct file *filp)
}
 }
 
-static void ceph_readpage_from_fscache_complete(struct page *page, void *data, int error)
+static void ceph_readpage_from_fscache_complete(struct address_space *mapping,
+   struct page *page, void *data,
+   int error)
 {
if (!error)
SetPageUptodate(page);
@@ -293,9 +295,9 @@ int ceph_readpage_from_fscache(struct inode *inode, struct page *page)
if (!cache_valid(ci))
return -ENOBUFS;
 
-   ret = fscache_read_or_alloc_page(ci->fscache, page,
-ceph_readpage_from_fscache_complete, NULL,
-GFP_KERNEL);
+   ret = fscache_read_or_alloc_page(ci->fscache, page->mapping, page,
+ceph_readpage_from_fscache_complete,
+NULL, GFP_KERNEL);
 
switch (ret) {
case 0: /* Page found */
diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c
index 8d4b7bc8ae91..25f259a83fe0 100644
--- a/fs/cifs/fscache.c
+++ b/fs/cifs/fscache.c
@@ -140,7 +140,8 @@ int cifs_fscache_release_page(struct page *page, gfp_t gfp)
return 1;
 }
 
-static void cifs_readpage_from_fscache_complete(struct page *page, void *ctx,
+static void cifs_readpage_from_fscache_complete(struct address_space *mapping,
+   struct page *page, void *ctx,
int error)
 {
cifs_dbg(FYI, "%s: (0x%p/%d)\n", __func__, page, error);
@@ -158,7 +159,8 @@ int __cifs_readpage_from_fscache(struct inode *inode, struct page *page)
 
cifs_dbg(FYI, "%s: (fsc:%p, p:%p, i:0x%p\n",
 

[RFC PATCH 20/79] fs: add struct address_space to write_cache_pages() callback argument

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add struct address_space to the callback arguments of
write_cache_pages(). Note this patch only adds arguments and modifies
all callback function signatures; it does not make use of the new
argument and thus should be regression-free.

One step toward dropping reliance on page->mapping.

Signed-off-by: Jérôme Glisse 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
Cc: Jeff Layton 
---
 fs/exofs/inode.c  | 2 +-
 fs/ext4/inode.c   | 7 +++
 fs/fuse/file.c| 1 +
 fs/mpage.c| 6 +++---
 fs/nfs/write.c| 4 +++-
 fs/xfs/xfs_aops.c | 3 ++-
 include/linux/writeback.h | 4 ++--
 mm/page-writeback.c   | 9 -
 8 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 41f6b04cbfca..54d6b7dbd4e7 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -691,7 +691,7 @@ static int write_exec(struct page_collect *pcol)
  * previous segment and will start a new collection.
  * Eventually caller must submit the last segment if present.
  */
-static int writepage_strip(struct page *page,
+static int writepage_strip(struct page *page, struct address_space *mapping,
   struct writeback_control *wbc_unused, void *data)
 {
struct page_collect *pcol = data;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 96dcae1937c8..63bf0160c579 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2697,10 +2697,9 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
return err;
 }
 
-static int __writepage(struct page *page, struct writeback_control *wbc,
-  void *data)
+static int __writepage(struct page *page, struct address_space *mapping,
+  struct writeback_control *wbc, void *data)
 {
-   struct address_space *mapping = data;
int ret = ext4_writepage(mapping, page, wbc);
mapping_set_error(mapping, ret);
return ret;
@@ -2746,7 +2745,7 @@ static int ext4_writepages(struct address_space *mapping,
struct blk_plug plug;
 
	blk_start_plug(&plug);
-   ret = write_cache_pages(mapping, wbc, __writepage, mapping);
+   ret = write_cache_pages(mapping, wbc, __writepage, NULL);
	blk_finish_plug(&plug);
goto out_writepages;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 3c602632b33a..e0562d04d84f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1794,6 +1794,7 @@ static bool fuse_writepage_in_flight(struct fuse_req *new_req,
 }
 
 static int fuse_writepages_fill(struct page *page,
+   struct address_space *mapping,
struct writeback_control *wbc, void *_data)
 {
struct fuse_fill_wb_data *data = _data;
diff --git a/fs/mpage.c b/fs/mpage.c
index b03a82d5b908..d25f08f46090 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -479,8 +479,8 @@ void clean_page_buffers(struct page *page)
clean_buffers(page, ~0U);
 }
 
-static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
- void *data)
+static int __mpage_writepage(struct page *page, struct address_space *_mapping,
+struct writeback_control *wbc, void *data)
 {
struct mpage_data *mpd = data;
struct bio *bio = mpd->bio;
@@ -734,7 +734,7 @@ int mpage_writepage(struct page *page, get_block_t get_block,
.get_block = get_block,
.use_writepage = 0,
};
-   int ret = __mpage_writepage(page, wbc, &mpd);
+   int ret = __mpage_writepage(page, page->mapping, wbc, &mpd);
if (mpd.bio) {
int op_flags = (wbc->sync_mode == WB_SYNC_ALL ?
  REQ_SYNC : 0);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 1f7723eff542..ffab026b9632 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -693,7 +693,9 @@ int nfs_writepage(struct address_space *mapping, struct page *page,
return ret;
 }
 
-static int nfs_writepages_callback(struct page *page, struct writeback_control *wbc, void *data)
+static int nfs_writepages_callback(struct page *page,
+  struct address_space *mapping,
+  struct writeback_control *wbc, void *data)
 {
int ret;
 
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 981a2a4e00e5..00922a82ede6 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1060,6 +1060,7 @@ xfs_writepage_map(
 STATIC int
 xfs_do_writepage(
struct page *page,
+   struct address_space*mapping,
struct writeback_control *wbc,
void*data)
 {
@@ -1179,7 +1180,7 @@ xfs_vm_writepage(
};
int   

[RFC PATCH 26/79] fs: add struct address_space to mpage_readpage() arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add struct address_space to mpage_readpage(). Note this patch only adds
arguments and modifies call sites conservatively using page->mapping,
and thus the end result is as before this patch.

One step toward dropping reliance on page->mapping.

Signed-off-by: Jérôme Glisse 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
Cc: Jeff Layton 
---
 fs/ext2/inode.c   |  2 +-
 fs/fat/inode.c|  2 +-
 fs/gfs2/aops.c|  2 +-
 fs/hpfs/file.c|  2 +-
 fs/isofs/inode.c  |  2 +-
 fs/jfs/inode.c|  2 +-
 fs/mpage.c| 14 --
 fs/nilfs2/inode.c |  2 +-
 fs/qnx6/inode.c   |  2 +-
 fs/udf/inode.c|  2 +-
 fs/xfs/xfs_aops.c |  2 +-
 include/linux/mpage.h |  3 ++-
 12 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 11b3c3e7ea65..33873c0a4c14 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -872,7 +872,7 @@ static int ext2_writepage(struct address_space *mapping, struct page *page,
 static int ext2_readpage(struct file *file, struct address_space *mapping,
 struct page *page)
 {
-   return mpage_readpage(page, ext2_get_block);
+   return mpage_readpage(page->mapping, page, ext2_get_block);
 }
 
 static int
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 4b70dcbcd192..9e6bc6364468 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -197,7 +197,7 @@ static int fat_writepages(struct address_space *mapping,
 static int fat_readpage(struct file *file, struct address_space *mapping,
struct page *page)
 {
-   return mpage_readpage(page, fat_get_block);
+   return mpage_readpage(page->mapping, page, fat_get_block);
 }
 
 static int fat_readpages(struct file *file, struct address_space *mapping,
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index ff02313b86e6..b42775bba6a1 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -524,7 +524,7 @@ static int __gfs2_readpage(void *file, struct address_space *mapping,
error = stuffed_readpage(ip, page);
unlock_page(page);
} else {
-   error = mpage_readpage(page, gfs2_block_map);
+   error = mpage_readpage(page->mapping, page, gfs2_block_map);
}
 
	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
diff --git a/fs/hpfs/file.c b/fs/hpfs/file.c
index 3f2cc3fcee80..620dd9709a2c 100644
--- a/fs/hpfs/file.c
+++ b/fs/hpfs/file.c
@@ -118,7 +118,7 @@ static int hpfs_get_block(struct inode *inode, sector_t iblock, struct buffer_he
 static int hpfs_readpage(struct file *file, struct address_space *mapping,
 struct page *page)
 {
-   return mpage_readpage(page, hpfs_get_block);
+   return mpage_readpage(page->mapping, page, hpfs_get_block);
 }
 
 static int hpfs_writepage(struct address_space *mapping, struct page *page,
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 541d89e0621a..7d73b1036321 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -1171,7 +1171,7 @@ struct buffer_head *isofs_bread(struct inode *inode, sector_t block)
 static int isofs_readpage(struct file *file, struct address_space *mapping,
  struct page *page)
 {
-   return mpage_readpage(page, isofs_get_block);
+   return mpage_readpage(page->mapping, page, isofs_get_block);
 }
 
 static int isofs_readpages(struct file *file, struct address_space *mapping,
diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index be71214f4937..be6da161bc81 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -297,7 +297,7 @@ static int jfs_writepages(struct address_space *mapping,
 static int jfs_readpage(struct file *file, struct address_space *mapping,
struct page *page)
 {
-   return mpage_readpage(page, jfs_get_block);
+   return mpage_readpage(page->mapping, page, jfs_get_block);
 }
 
 static int jfs_readpages(struct file *file, struct address_space *mapping,
diff --git a/fs/mpage.c b/fs/mpage.c
index 8800bcde5f4e..52a6028e2066 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -143,12 +143,13 @@ map_buffer_to_page(struct inode *inode, struct page *page,
  * get_block() call.
  */
 static struct bio *
-do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
+do_mpage_readpage(struct bio *bio, struct address_space *mapping,
+   struct page *page, unsigned nr_pages,
sector_t *last_block_in_bio, struct buffer_head *map_bh,
unsigned long *first_logical_block, get_block_t get_block,
gfp_t gfp)
 {
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = mapping->host;
const unsigned blkbits = inode->i_blkbits;
const unsigned 

[RFC PATCH 30/79] fs/block: add struct address_space to __block_write_begin() arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add struct address_space to __block_write_begin() arguments.

One step toward dropping reliance on page->mapping.

--
@@
identifier M;
expression E1, E2, E3, E4;
@@
struct address_space *M;
...
-__block_write_begin(E1, E2, E3, E4)
+__block_write_begin(M, E1, E2, E3, E4)

@exists@
identifier M, F;
expression E1, E2, E3, E4;
@@
F(..., struct address_space *M, ...) {...
-__block_write_begin(E1, E2, E3, E4)
+__block_write_begin(M, E1, E2, E3, E4)
...}

@exists@
identifier I;
expression E1, E2, E3, E4, E5;
@@
struct inode *I;
...
-__block_write_begin(E1, E2, E3, E4)
+__block_write_begin(I->i_mapping, E1, E2, E3, E4)

@exists@
identifier I, F;
expression E1, E2, E3, E4;
@@
F(..., struct inode *I, ...) {...
-__block_write_begin(E1, E2, E3, E4)
+__block_write_begin(I->i_mapping, E1, E2, E3, E4)
...}

@exists@
identifier P;
expression E1, E2, E3, E4, E5;
@@
struct page *P;
...
-__block_write_begin(E1, E2, E3, E4)
+__block_write_begin(P->mapping, E1, E2, E3, E4)

@exists@
identifier P, F;
expression E1, E2, E3, E4;
@@
F(..., struct page *P, ...) {...
-__block_write_begin(E1, E2, E3, E4)
+__block_write_begin(P->mapping, E1, E2, E3, E4)
...}
--

Signed-off-by: Jérôme Glisse 
Cc: Jens Axboe 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/buffer.c | 10 +-
 fs/ext2/dir.c   |  3 ++-
 fs/ext4/inline.c|  7 ---
 fs/ext4/inode.c |  8 +---
 fs/gfs2/aops.c  |  2 +-
 fs/minix/inode.c|  3 ++-
 fs/nilfs2/dir.c |  3 ++-
 fs/ocfs2/file.c |  2 +-
 fs/reiserfs/inode.c |  8 +---
 fs/sysv/itree.c |  2 +-
 fs/ufs/inode.c  |  3 ++-
 include/linux/buffer_head.h |  4 ++--
 12 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8b2eb3dfb539..de16588d7f7f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2028,8 +2028,8 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
return err;
 }
 
-int __block_write_begin(struct page *page, loff_t pos, unsigned len,
-   get_block_t *get_block)
+int __block_write_begin(struct address_space *mapping, struct page *page,
+   loff_t pos, unsigned len, get_block_t *get_block)
 {
return __block_write_begin_int(page, pos, len, get_block, NULL);
 }
@@ -2090,7 +2090,7 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
if (!page)
return -ENOMEM;
 
-   status = __block_write_begin(page, pos, len, get_block);
+   status = __block_write_begin(mapping, page, pos, len, get_block);
if (unlikely(status)) {
unlock_page(page);
put_page(page);
@@ -2495,7 +2495,7 @@ int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
else
end = PAGE_SIZE;
 
-   ret = __block_write_begin(page, 0, end, get_block);
+   ret = __block_write_begin(inode->i_mapping, page, 0, end, get_block);
if (!ret)
ret = block_commit_write(page, 0, end);
 
@@ -2579,7 +2579,7 @@ int nobh_write_begin(struct address_space *mapping,
*fsdata = NULL;
 
if (page_has_buffers(page)) {
-   ret = __block_write_begin(page, pos, len, get_block);
+   ret = __block_write_begin(mapping, page, pos, len, get_block);
if (unlikely(ret))
goto out_release;
return ret;
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 3b8114def693..0d116d4e923c 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -453,7 +453,8 @@ ino_t ext2_inode_by_name(struct inode *dir, const struct qstr *child)
 
 static int ext2_prepare_chunk(struct page *page, loff_t pos, unsigned len)
 {
-   return __block_write_begin(page, pos, len, ext2_get_block);
+   return __block_write_begin(page->mapping, page, pos, len,
+  ext2_get_block);
 }
 
 /* Releases the page */
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 70cf4c7b268a..ffdbd443c67a 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -580,10 +580,11 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
goto out;
 
if (ext4_should_dioread_nolock(inode)) {
-   ret = __block_write_begin(page, from, to,
+   ret = __block_write_begin(mapping, page, from, to,
  ext4_get_block_unwritten);
} else
-   ret = __block_write_begin(page, 

[RFC PATCH 28/79] fs: introduce page_is_truncated() helper

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Simple helper to unify all truncation tests into one piece of logic.
This also unifies logic that was slightly different in various places.

Conversion done using the following coccinelle spatch on the fs and mm
directories:
-
@@
struct page * ppage;
@@
-!ppage->mapping
+page_is_truncated(ppage, mapping)

@@
struct page * ppage;
@@
-ppage->mapping != mapping
+page_is_truncated(ppage, mapping)

@@
struct page * ppage;
@@
-ppage->mapping != inode->i_mapping
+page_is_truncated(ppage, inode->i_mapping)
-

Followed by:
git checkout mm/migrate.c mm/huge_memory.c mm/memory-failure.c
git checkout mm/memcontrol.c fs/ext4/page-io.c fs/reiserfs/journal.c

Hand editing:
mm/memory.c do_page_mkwrite()
fs/splice.c splice_to_pipe()
fs/nfs/dir.c cache_page_release()
fs/xfs/xfs_aops.c xfs_check_page_type()
fs/xfs/xfs_aops.c xfs_vm_set_page_dirty()
fs/buffer.c mark_buffer_write_io_error()
fs/buffer.c page_cache_seek_hole_data()

Signed-off-by: Jérôme Glisse 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
Cc: Jeff Layton 

fixup! fs: introduce page_is_truncated() helper
---
 drivers/staging/lustre/lustre/llite/llite_mmap.c |  7 +--
 fs/9p/vfs_file.c |  2 +-
 fs/afs/write.c   |  2 +-
 fs/btrfs/extent_io.c |  4 ++--
 fs/btrfs/file.c  |  2 +-
 fs/btrfs/inode.c |  7 ---
 fs/btrfs/ioctl.c |  6 +++---
 fs/btrfs/scrub.c |  2 +-
 fs/buffer.c  |  8 
 fs/ceph/addr.c   |  6 +++---
 fs/cifs/file.c   |  2 +-
 fs/ext4/inode.c  | 10 +-
 fs/ext4/mballoc.c|  8 
 fs/f2fs/checkpoint.c |  4 ++--
 fs/f2fs/data.c   |  8 
 fs/f2fs/file.c   |  2 +-
 fs/f2fs/super.c  |  2 +-
 fs/fuse/file.c   |  2 +-
 fs/gfs2/aops.c   |  2 +-
 fs/gfs2/file.c   |  4 ++--
 fs/iomap.c   |  2 +-
 fs/nfs/dir.c |  2 +-
 fs/nilfs2/file.c |  2 +-
 fs/ocfs2/aops.c  |  2 +-
 fs/ocfs2/mmap.c  |  2 +-
 fs/splice.c  |  2 +-
 fs/ubifs/file.c  |  2 +-
 fs/xfs/xfs_aops.c|  8 +---
 include/linux/pagemap.h  | 16 
 mm/filemap.c | 12 ++--
 mm/memory.c  |  5 -
 mm/page-writeback.c  |  2 +-
 mm/truncate.c| 12 ++--
 33 files changed, 92 insertions(+), 67 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index c0533bd6f352..6a9d310a7bfd 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -191,7 +191,7 @@ static int ll_page_mkwrite0(struct vm_area_struct *vma, struct page *vmpage,
struct ll_inode_info *lli = ll_i2info(inode);
 
lock_page(vmpage);
-   if (!vmpage->mapping) {
+   if (page_is_truncated(vmpage, inode->i_mapping)) {
unlock_page(vmpage);
 
/* page was truncated and lock was cancelled, return
@@ -341,10 +341,13 @@ static int ll_fault(struct vm_fault *vmf)
LASSERT(!(result & VM_FAULT_LOCKED));
if (result == 0) {
struct page *vmpage = vmf->page;
+   struct address_space *mapping;
+
+   mapping = vmf->vma->vm_file ? vmf->vma->vm_file->f_mapping : NULL;
 
/* check if this page has been truncated */
lock_page(vmpage);
-   if (unlikely(!vmpage->mapping)) { /* unlucky */
+   if (unlikely(page_is_truncated(vmpage, mapping))) { /* unlucky */
unlock_page(vmpage);
put_page(vmpage);
vmf->page = NULL;
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 03c9e325bfbc..bf71ea1d7ff6 100644
--- 

[RFC PATCH 32/79] fs/block: do not rely on page->mapping get it from the context

2018-04-04 Thread jglisse
From: Jérôme Glisse 

This patch removes most dereferences of page->mapping and gets the
mapping from the call context (either already available in the function
or by adding it to the function arguments).

Signed-off-by: Jérôme Glisse 
Cc: Jens Axboe 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/block_dev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 502b6643bc74..dd9da97615e3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -564,14 +564,14 @@ EXPORT_SYMBOL(thaw_bdev);
 static int blkdev_writepage(struct address_space *mapping, struct page *page,
struct writeback_control *wbc)
 {
-   return block_write_full_page(page->mapping->host, page,
+   return block_write_full_page(mapping->host, page,
 blkdev_get_block, wbc);
 }
 
 static int blkdev_readpage(struct file * file, struct address_space *mapping,
   struct page * page)
 {
-   return block_read_full_page(page->mapping->host,page,blkdev_get_block);
+   return block_read_full_page(mapping->host,page,blkdev_get_block);
 }
 
 static int blkdev_readpages(struct file *file, struct address_space *mapping,
@@ -1941,7 +1941,7 @@ EXPORT_SYMBOL_GPL(blkdev_read_iter);
 static int blkdev_releasepage(struct address_space *mapping,
  struct page *page, gfp_t wait)
 {
-   struct super_block *super = BDEV_I(page->mapping->host)->bdev.bd_super;
+   struct super_block *super = BDEV_I(mapping->host)->bdev.bd_super;
 
if (super && super->s_op->bdev_try_to_free_page)
return super->s_op->bdev_try_to_free_page(super, page, wait);
-- 
2.14.3



[RFC PATCH 31/79] fs/block: add struct address_space to __block_write_begin_int() args

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add struct address_space to __block_write_begin_int() arguments.

One step toward dropping reliance on page->mapping.

--
@exists@
identifier M;
expression E1, E2, E3, E4, E5;
@@
struct address_space *M;
...
-__block_write_begin_int(E1, E2, E3, E4, E5)
+__block_write_begin_int(M, E1, E2, E3, E4, E5)

@exists@
identifier M, F;
expression E1, E2, E3, E4, E5;
@@
F(..., struct address_space *M, ...) {...
-__block_write_begin_int(E1, E2, E3, E4, E5)
+__block_write_begin_int(M, E1, E2, E3, E4, E5)
...}
--

Signed-off-by: Jérôme Glisse 
Cc: Jens Axboe 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/buffer.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index de16588d7f7f..c83878d0a4c0 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1943,8 +1943,9 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
}
 }
 
-int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
-   get_block_t *get_block, struct iomap *iomap)
+int __block_write_begin_int(struct address_space *mapping, struct page *page,
+   loff_t pos, unsigned len, get_block_t *get_block,
+   struct iomap *iomap)
 {
unsigned from = pos & (PAGE_SIZE - 1);
unsigned to = from + len;
@@ -2031,7 +2032,8 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
 int __block_write_begin(struct address_space *mapping, struct page *page,
loff_t pos, unsigned len, get_block_t *get_block)
 {
-   return __block_write_begin_int(page, pos, len, get_block, NULL);
+   return __block_write_begin_int(mapping, page, pos, len, get_block,
+  NULL);
 }
 EXPORT_SYMBOL(__block_write_begin);
 
-- 
2.14.3



[RFC PATCH 33/79] fs/journal: add struct super_block to jbd2_journal_forget() arguments.

2018-04-04 Thread jglisse
From: Jérôme Glisse 

For the holy crusade to stop relying on the struct page mapping field,
add struct super_block to the jbd2_journal_forget() arguments.

spatch --sp-file zemantic-010a.spatch --in-place --dir fs/
--
@exists@
expression E1, E2;
identifier I;
@@
struct super_block *I;
...
-jbd2_journal_forget(E1, E2)
+jbd2_journal_forget(E1, I, E2)

@exists@
expression E1, E2;
identifier F, I;
@@
F(..., struct super_block *I, ...) {
...
-jbd2_journal_forget(E1, E2)
+jbd2_journal_forget(E1, I, E2)
...
}

@exists@
expression E1, E2;
identifier I;
@@
struct block_device *I;
...
-jbd2_journal_forget(E1, E2)
+jbd2_journal_forget(E1, I->bd_super, E2)

@exists@
expression E1, E2;
identifier F, I;
@@
F(..., struct block_device *I, ...) {
...
-jbd2_journal_forget(E1, E2)
+jbd2_journal_forget(E1, I->bd_super, E2)
...
}

@exists@
expression E1, E2;
identifier I;
@@
struct inode *I;
...
-jbd2_journal_forget(E1, E2)
+jbd2_journal_forget(E1, I->i_sb, E2)

@exists@
expression E1, E2;
identifier F, I;
@@
F(..., struct inode *I, ...) {
...
-jbd2_journal_forget(E1, E2)
+jbd2_journal_forget(E1, I->i_sb, E2)
...
}
--

Signed-off-by: Jérôme Glisse 
Cc: "Theodore Ts'o" 
Cc: Jan Kara 
Cc: linux-e...@vger.kernel.org
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
---
 fs/ext4/ext4_jbd2.c   | 2 +-
 fs/jbd2/revoke.c  | 2 +-
 fs/jbd2/transaction.c | 3 ++-
 include/linux/jbd2.h  | 3 ++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 2d593201cf7a..0804d564b529 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -224,7 +224,7 @@ int __ext4_forget(const char *where, unsigned int line, handle_t *handle,
(!is_metadata && !ext4_should_journal_data(inode))) {
if (bh) {
BUFFER_TRACE(bh, "call jbd2_journal_forget");
-   err = jbd2_journal_forget(handle, bh);
+   err = jbd2_journal_forget(handle, inode->i_sb, bh);
if (err)
ext4_journal_abort_handle(where, line, __func__,
  bh, handle, err);
diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index 696ef15ec942..b6e2fd52acd6 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -381,7 +381,7 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
set_buffer_revokevalid(bh);
if (bh_in) {
BUFFER_TRACE(bh_in, "call jbd2_journal_forget");
-   jbd2_journal_forget(handle, bh_in);
+   jbd2_journal_forget(handle, bdev->bd_super, bh_in);
} else {
BUFFER_TRACE(bh, "call brelse");
__brelse(bh);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index ac311037d7a5..e8c50bb5822c 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1482,7 +1482,8 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
  * Allow this call even if the handle has aborted --- it may be part of
  * the caller's cleanup after an abort.
  */
-int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
+int jbd2_journal_forget (handle_t *handle, struct super_block *sb,
+struct buffer_head *bh)
 {
transaction_t *transaction = handle->h_transaction;
journal_t *journal;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index b708e5169d1d..d89749a179eb 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1358,7 +1358,8 @@ extern int jbd2_journal_get_undo_access(handle_t *, struct buffer_head *);
 void     jbd2_journal_set_triggers(struct buffer_head *,
                                    struct jbd2_buffer_trigger_type *type);
 extern int  jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *);
-extern int  jbd2_journal_forget (handle_t *, struct buffer_head *);
+extern int  jbd2_journal_forget (handle_t *, struct super_block *sb,
+   struct buffer_head *);
 extern void journal_sync_buffer (struct buffer_head *);
 extern int  jbd2_journal_invalidatepage(journal_t *,
struct page *, unsigned int, unsigned int);
-- 
2.14.3



[RFC PATCH 29/79] fs/block: add struct address_space to bdev_write_page() arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add struct address_space to bdev_write_page() arguments.

One step toward dropping reliance on page->mapping.

Signed-off-by: Jérôme Glisse 
Cc: Jens Axboe 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/block_dev.c | 4 +++-
 fs/mpage.c | 2 +-
 include/linux/blkdev.h | 5 +++--
 mm/page_io.c   | 7 ---
 4 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9ac6bf760272..502b6643bc74 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -678,6 +678,7 @@ EXPORT_SYMBOL_GPL(bdev_read_page);
  * bdev_write_page() - Start writing a page to a block device
  * @bdev: The device to write the page to
  * @sector: The offset on the device to write the page to (need not be aligned)
+ * @mapping: The address space the page belongs to
  * @page: The page to write
  * @wbc: The writeback_control for the write
  *
@@ -694,7 +695,8 @@ EXPORT_SYMBOL_GPL(bdev_read_page);
  * Return: negative errno if an error occurs, 0 if submission was successful.
  */
 int bdev_write_page(struct block_device *bdev, sector_t sector,
-   struct page *page, struct writeback_control *wbc)
+   struct address_space *mapping, struct page *page,
+   struct writeback_control *wbc)
 {
int result;
const struct block_device_operations *ops = bdev->bd_disk->fops;
diff --git a/fs/mpage.c b/fs/mpage.c
index 52a6028e2066..a75cea232f1a 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -619,7 +619,7 @@ static int __mpage_writepage(struct page *page, struct address_space *_mapping,
if (bio == NULL) {
if (first_unmapped == blocks_per_page) {
if (!bdev_write_page(bdev, blocks[0] << (blkbits - 9),
-   page, wbc))
+   mapping, page, wbc))
goto out;
}
bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ed63f3b69c12..0cf66b6993f4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -2053,8 +2053,9 @@ struct block_device_operations {
 extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int,
 unsigned long);
 extern int bdev_read_page(struct block_device *, sector_t, struct page *);
-extern int bdev_write_page(struct block_device *, sector_t, struct page *,
-   struct writeback_control *);
+extern int bdev_write_page(struct block_device *bdev, sector_t sector,
+   struct address_space *mapping, struct page *page,
+   struct writeback_control *wbc);
 
 #ifdef CONFIG_BLK_DEV_ZONED
 bool blk_req_needs_zone_write_lock(struct request *rq);
diff --git a/mm/page_io.c b/mm/page_io.c
index 402231dd1286..6e548b588490 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -282,12 +282,12 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
struct bio *bio;
int ret;
struct swap_info_struct *sis = page_swap_info(page);
+   struct file *swap_file = sis->swap_file;
+   struct address_space *mapping = swap_file->f_mapping;
 
VM_BUG_ON_PAGE(!PageSwapCache(page), page);
if (sis->flags & SWP_FILE) {
struct kiocb kiocb;
-   struct file *swap_file = sis->swap_file;
-   struct address_space *mapping = swap_file->f_mapping;
struct bio_vec bv = {
.bv_page = page,
.bv_len  = PAGE_SIZE,
@@ -325,7 +325,8 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
return ret;
}
 
-   ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc);
+   ret = bdev_write_page(sis->bdev, swap_page_sector(page),
+ mapping, page, wbc);
if (!ret) {
count_swpout_vm_event(page);
return 0;
-- 
2.14.3



[RFC PATCH 34/79] fs/journal: add struct super_block to jbd2_journal_revoke() arguments.

2018-04-04 Thread jglisse
From: Jérôme Glisse 

For the holy crusade to stop relying on the struct page mapping field,
add struct super_block to the jbd2_journal_revoke() arguments.

spatch --sp-file zemantic-011a.spatch --in-place --dir fs/
--
@exists@
expression E1, E2, E3;
identifier I;
@@
struct super_block *I;
...
-jbd2_journal_revoke(E1, E2, E3)
+jbd2_journal_revoke(E1, E2, I, E3)

@exists@
expression E1, E2, E3;
identifier F, I;
@@
F(..., struct super_block *I, ...) {
...
-jbd2_journal_revoke(E1, E2, E3)
+jbd2_journal_revoke(E1, E2, I, E3)
...
}

@exists@
expression E1, E2, E3;
identifier I;
@@
struct inode *I;
...
-jbd2_journal_revoke(E1, E2, E3)
+jbd2_journal_revoke(E1, E2, I->i_sb, E3)

@exists@
expression E1, E2, E3;
identifier F, I;
@@
F(..., struct inode *I, ...) {
...
-jbd2_journal_revoke(E1, E2, E3)
+jbd2_journal_revoke(E1, E2, I->i_sb, E3)
...
}
--

Signed-off-by: Jérôme Glisse 
Cc: "Theodore Ts'o" 
Cc: Jan Kara 
Cc: linux-e...@vger.kernel.org
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org

---
 fs/ext4/ext4_jbd2.c  | 2 +-
 fs/jbd2/revoke.c | 2 +-
 include/linux/jbd2.h | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 0804d564b529..5529badca994 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -237,7 +237,7 @@ int __ext4_forget(const char *where, unsigned int line, handle_t *handle,
 * data!=journal && (is_metadata || should_journal_data(inode))
 */
BUFFER_TRACE(bh, "call jbd2_journal_revoke");
-   err = jbd2_journal_revoke(handle, blocknr, bh);
+   err = jbd2_journal_revoke(handle, blocknr, inode->i_sb, bh);
if (err) {
ext4_journal_abort_handle(where, line, __func__,
  bh, handle, err);
diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index b6e2fd52acd6..71e690ad9d44 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -320,7 +320,7 @@ void jbd2_journal_destroy_revoke(journal_t *journal)
  */
 
 int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
-  struct buffer_head *bh_in)
+   struct super_block *sb, struct buffer_head *bh_in)
 {
struct buffer_head *bh = NULL;
journal_t *journal;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index d89749a179eb..c5133df80fd4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1450,7 +1450,8 @@ extern void jbd2_journal_destroy_revoke_caches(void);
 extern int     jbd2_journal_init_revoke_caches(void);
 
 extern void    jbd2_journal_destroy_revoke(journal_t *);
-extern int     jbd2_journal_revoke (handle_t *, unsigned long long, struct buffer_head *);
+extern int     jbd2_journal_revoke (handle_t *, unsigned long long,
+                       struct super_block *, struct buffer_head *);
 extern int     jbd2_journal_cancel_revoke(handle_t *, struct journal_head *);
 extern void    jbd2_journal_write_revoke_records(transaction_t *transaction,
                                                 struct list_head *log_bufs);
-- 
2.14.3



[RFC PATCH 38/79] fs/buffer: add first buffer flag for first buffer_head in a page

2018-04-04 Thread jglisse
From: Jérôme Glisse 

A common pattern in code is that we have a buffer_head and want to get
the first buffer_head in the buffer_head list for a page. Before this
patch it was simply done with page_buffers(bh->b_page).

This patch introduces a helper, bh_first_for_page(struct buffer_head *),
which can use a new flag (also introduced in this patch) to find the
first buffer_head struct for a given page.

This patch uses page_buffers(bh->b_page) for now, but a later patch can
update this helper to handle special pages differently and instead scan
the buffer_head list until a buffer_head with the first_for_page flag
set is found.
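The future lookup the last paragraph alludes to could walk the per-page circular list of buffer_heads. A rough sketch with stand-in types (in this patch, bh_first_for_page() still just calls page_buffers(bh->b_page); this is only an illustration of the flag-based scan):

```c
#include <stdbool.h>

/* Stand-in buffer_head with only the fields this sketch needs. In the
 * kernel, b_this_page links all buffer_heads of a page into a ring. */
struct buffer_head {
	struct buffer_head *b_this_page;
	bool first_for_page;	/* models the new BH_FirstForPage flag */
};

/*
 * Hypothetical future implementation: walk the circular b_this_page
 * list until the buffer_head carrying the first-for-page flag is found.
 * Assumes attach_page_buffers() set the flag on exactly one head, as
 * done in this patch.
 */
static struct buffer_head *bh_first_for_page_scan(struct buffer_head *bh)
{
	struct buffer_head *cur = bh;

	while (!cur->first_for_page)
		cur = cur->b_this_page;
	return cur;
}
```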

Signed-off-by: Jérôme Glisse 
Cc: Jens Axboe 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/buffer.c |  4 ++--
 include/linux/buffer_head.h | 18 ++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 422204701a3b..44beba15c38d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -276,7 +276,7 @@ static void end_buffer_async_read(struct address_space *mapping,
 * two buffer heads end IO at almost the same time and both
 * decide that the page is now completely done.
 */
-   first = page_buffers(page);
+   first = bh_first_for_page(bh);
local_irq_save(flags);
	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
clear_buffer_async_read(bh);
@@ -332,7 +332,7 @@ void end_buffer_async_write(struct address_space *mapping, struct page *page,
SetPageError(page);
}
 
-   first = page_buffers(page);
+   first = bh_first_for_page(bh);
local_irq_save(flags);
	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
 
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 7ae60f59f27e..22e79307c055 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -39,6 +39,12 @@ enum bh_state_bits {
BH_Prio,/* Buffer should be submitted with REQ_PRIO */
BH_Defer_Completion, /* Defer AIO completion to workqueue */
 
+   /*
+* First buffer_head for a page, i.e. page->private is pointing to this
+* buffer_head struct.
+*/
+   BH_FirstForPage,
+
BH_PrivateStart,/* not a state bit, but the first bit available
 * for private allocation by other entities
 */
@@ -135,6 +141,7 @@ BUFFER_FNS(Unwritten, unwritten)
 BUFFER_FNS(Meta, meta)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)
+BUFFER_FNS(FirstForPage, first_for_page)
 
 #define bh_offset(bh)  ((unsigned long)(bh)->b_data & ~PAGE_MASK)
 
@@ -278,11 +285,22 @@ void buffer_init(void);
  * inline definitions
  */
 
+/*
+ * bh_first_for_page - return first buffer_head for a page
+ * @bh: buffer_head for which we want the first buffer_head for same page
+ * Returns: first buffer_head within the same page as given buffer_head
+ */
+static inline struct buffer_head *bh_first_for_page(struct buffer_head *bh)
+{
+   return page_buffers(bh->b_page);
+}
+
 static inline void attach_page_buffers(struct page *page,
struct buffer_head *head)
 {
get_page(page);
SetPagePrivate(page);
+   set_buffer_first_for_page(head);
set_page_private(page, (unsigned long)head);
 }
 
-- 
2.14.3



[RFC PATCH 35/79] fs/buffer: add struct address_space and struct page to end_io callback

2018-04-04 Thread jglisse
From: Jérôme Glisse 

For the holy crusade to stop relying on the struct page mapping field,
add struct address_space and struct page to the end_io callback of
struct buffer_head. This way the callback has more context information
to find the matching page and mapping.
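The shape of the signature change, sketched with stand-in types (the real callback is the b_end_io member of struct buffer_head; the field name `uptodate` on the mock struct is invented for this illustration):

```c
#include <stddef.h>

/* Stand-in types; the real ones live in include/linux/fs.h and
 * include/linux/buffer_head.h. */
struct address_space { int dummy; };
struct page { int dummy; };

struct buffer_head {
	/* New callback shape: page and mapping arrive as arguments
	 * instead of being read back via bh->b_page->mapping. */
	void (*b_end_io)(struct address_space *mapping, struct page *page,
			 struct buffer_head *bh, int uptodate);
	int uptodate;	/* invented field, stands in for BH_Uptodate */
};

/* Sample handler with the new signature. */
static void sample_end_io(struct address_space *mapping, struct page *page,
			  struct buffer_head *bh, int uptodate)
{
	(void)mapping;	/* a real handler would use these for lookups */
	(void)page;
	bh->uptodate = uptodate;
}

/* The completion path now threads through the context it already holds. */
static void complete_bh(struct address_space *mapping, struct page *page,
			struct buffer_head *bh, int uptodate)
{
	bh->b_end_io(mapping, page, bh, uptodate);
}
```

This is why end_buffer_async_read()/end_buffer_async_write() in the diff below can drop their local `struct page *page` variable: the page comes in as a parameter.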

Signed-off-by: Jérôme Glisse 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Jens Axboe 
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 drivers/md/md-bitmap.c  |  3 ++-
 fs/btrfs/disk-io.c  |  3 ++-
 fs/buffer.c | 26 +-
 fs/ext4/ext4.h  |  3 ++-
 fs/ext4/ialloc.c|  3 ++-
 fs/gfs2/meta_io.c   |  2 +-
 fs/jbd2/commit.c|  3 ++-
 fs/ntfs/aops.c  |  9 ++---
 fs/reiserfs/journal.c   |  6 --
 include/linux/buffer_head.h | 12 
 10 files changed, 46 insertions(+), 24 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 239c7bb3929b..717e99eabce9 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -313,7 +313,8 @@ static void write_page(struct bitmap *bitmap, struct page *page, int wait)
bitmap_file_kick(bitmap);
 }
 
-static void end_bitmap_write(struct buffer_head *bh, int uptodate)
+static void end_bitmap_write(struct address_space *mapping, struct page *page,
+struct buffer_head *bh, int uptodate)
 {
struct bitmap *bitmap = bh->b_private;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a976ccc6036b..df789cfdebd7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3123,7 +3123,8 @@ int open_ctree(struct super_block *sb,
 }
 ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
 
-static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
+static void btrfs_end_buffer_write_sync(struct address_space *mapping,
+   struct page *page, struct buffer_head *bh, int uptodate)
 {
if (uptodate) {
set_buffer_uptodate(bh);
diff --git a/fs/buffer.c b/fs/buffer.c
index c83878d0a4c0..9f2c5e90b64d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -159,14 +159,16 @@ static void __end_buffer_read_notouch(struct buffer_head *bh, int uptodate)
  * Default synchronous end-of-IO handler..  Just mark it up-to-date and
  * unlock the buffer. This is what ll_rw_block uses too.
  */
-void end_buffer_read_sync(struct buffer_head *bh, int uptodate)
+void end_buffer_read_sync(struct address_space *mapping, struct page *page,
+ struct buffer_head *bh, int uptodate)
 {
__end_buffer_read_notouch(bh, uptodate);
put_bh(bh);
 }
 EXPORT_SYMBOL(end_buffer_read_sync);
 
-void end_buffer_write_sync(struct buffer_head *bh, int uptodate)
+void end_buffer_write_sync(struct address_space *mapping, struct page *page,
+  struct buffer_head *bh, int uptodate)
 {
if (uptodate) {
set_buffer_uptodate(bh);
@@ -250,12 +252,12 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
  * I/O completion handler for block_read_full_page() - pages
  * which come unlocked at the end of I/O.
  */
-static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
+static void end_buffer_async_read(struct address_space *mapping,
+   struct page *page, struct buffer_head *bh, int uptodate)
 {
unsigned long flags;
struct buffer_head *first;
struct buffer_head *tmp;
-   struct page *page;
int page_uptodate = 1;
 
BUG_ON(!buffer_async_read(bh));
@@ -311,12 +313,12 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
  * Completion handler for block_write_full_page() - pages which are unlocked
  * during I/O, and which have PageWriteback cleared upon I/O completion.
  */
-void end_buffer_async_write(struct buffer_head *bh, int uptodate)
+void end_buffer_async_write(struct address_space *mapping, struct page *page,
+   struct buffer_head *bh, int uptodate)
 {
unsigned long flags;
struct buffer_head *first;
struct buffer_head *tmp;
-   struct page *page;
 
BUG_ON(!buffer_async_write(bh));
 
@@ -2311,7 +2313,7 @@ int block_read_full_page(struct inode *inode, struct page *page,
for (i = 0; i < nr; i++) {
bh = arr[i];
if (buffer_uptodate(bh))
-   end_buffer_async_read(bh, 1);
+   end_buffer_async_read(inode->i_mapping, page, bh, 1);
else
submit_bh(REQ_OP_READ, 0, bh);
}
@@ -2517,7 +2519,8 @@ EXPORT_SYMBOL(block_page_mkwrite);
  * immediately, while under the page lock.  So it needs a special end_io
  * handler which does not touch the bh after 

[RFC PATCH 36/79] fs/buffer: add struct super_block to bforget() arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

For the holy crusade to stop relying on the struct page mapping field,
add struct super_block to the bforget() arguments.

spatch --sp-file zemantic-012a.spatch --in-place --dir fs/
--
@exists@
expression E1;
identifier I;
@@
struct super_block *I;
...
-bforget(E1)
+bforget(I, E1)

@exists@
expression E1;
identifier F, I;
@@
F(..., struct super_block *I, ...) {
...
-bforget(E1)
+bforget(I, E1)
...
}

@exists@
expression E1;
identifier I;
@@
struct inode *I;
...
-bforget(E1)
+bforget(I->i_sb, E1)

@exists@
expression E1;
identifier F, I;
@@
F(..., struct inode *I, ...) {
...
-bforget(E1)
+bforget(I->i_sb, E1)
...
}
--

Signed-off-by: Jérôme Glisse 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Jens Axboe 
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/bfs/file.c   | 2 +-
 fs/ext2/inode.c | 4 ++--
 fs/ext2/xattr.c | 4 ++--
 fs/ext4/ext4_jbd2.c | 2 +-
 fs/fat/dir.c| 4 ++--
 fs/jfs/resize.c | 2 +-
 fs/minix/itree_common.c | 6 +++---
 fs/reiserfs/journal.c   | 2 +-
 fs/reiserfs/resize.c| 2 +-
 fs/sysv/itree.c | 6 +++---
 fs/ufs/util.c   | 2 +-
 include/linux/buffer_head.h | 2 +-
 12 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/bfs/file.c b/fs/bfs/file.c
index b1255ee4cd75..6d66cc137bc3 100644
--- a/fs/bfs/file.c
+++ b/fs/bfs/file.c
@@ -41,7 +41,7 @@ static int bfs_move_block(unsigned long from, unsigned long to,
new = sb_getblk(sb, to);
memcpy(new->b_data, bh->b_data, bh->b_size);
mark_buffer_dirty(new);
-   bforget(bh);
+   bforget(sb, bh);
brelse(new);
return 0;
 }
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 33873c0a4c14..83ea6ad2cefa 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -536,7 +536,7 @@ static int ext2_alloc_branch(struct inode *inode,
 
 failed:
for (i = 1; i < n; i++)
-   bforget(branch[i].bh);
+   bforget(inode->i_sb, branch[i].bh);
for (i = 0; i < indirect_blks; i++)
ext2_free_blocks(inode, new_blocks[i], 1);
ext2_free_blocks(inode, new_blocks[i], num);
@@ -1167,7 +1167,7 @@ static void ext2_free_branches(struct inode *inode, __le32 *p, __le32 *q, int de
   (__le32*)bh->b_data,
   (__le32*)bh->b_data + addr_per_block,
   depth);
-   bforget(bh);
+   bforget(inode->i_sb, bh);
ext2_free_blocks(inode, nr, 1);
mark_inode_dirty(inode);
}
diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
index 62d9a659a8ff..c77edf9afbce 100644
--- a/fs/ext2/xattr.c
+++ b/fs/ext2/xattr.c
@@ -733,7 +733,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh,
/* We let our caller release old_bh, so we
 * need to duplicate the buffer before. */
get_bh(old_bh);
-   bforget(old_bh);
+   bforget(sb, old_bh);
} else {
/* Decrement the refcount only. */
		le32_add_cpu(&HDR(old_bh)->h_refcount, -1);
@@ -802,7 +802,7 @@ ext2_xattr_delete_inode(struct inode *inode)
  bh->b_blocknr);
ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1);
get_bh(bh);
-   bforget(bh);
+   bforget(inode->i_sb, bh);
unlock_buffer(bh);
} else {
		le32_add_cpu(&HDR(bh)->h_refcount, -1);
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 5529badca994..60fbf5336059 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -211,7 +211,7 @@ int __ext4_forget(const char *where, unsigned int line, handle_t *handle,
 
/* In the no journal case, we can just do a bforget and return */
if (!ext4_handle_valid(handle)) {
-   bforget(bh);
+   bforget(inode->i_sb, bh);
return 0;
}
 
diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index 8e100c3bf72c..b801f3d0220b 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -1126,7 +1126,7 @@ static int fat_zeroed_cluster(struct inode *dir, sector_t blknr, int nr_used,
 
 error:
for (i = 0; i < n; i++)
-   bforget(bhs[i]);
+   bforget(sb, bhs[i]);
return err;
 }
 
@@ -1266,7 +1266,7 @@ static int fat_add_new_entries(struct inode *dir, void *slots, int 

[RFC PATCH 37/79] fs/buffer: add struct super_block to __bforget() arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

For the holy crusade to stop relying on the struct page mapping field,
add struct super_block to the __bforget() arguments.

spatch --sp-file zemantic-013a.spatch --in-place --dir fs/
spatch --sp-file zemantic-013a.spatch --in-place --dir include/ --include-headers
--
@exists@
expression E1;
identifier I;
@@
struct super_block *I;
...
-__bforget(E1)
+__bforget(I, E1)

@exists@
expression E1;
identifier F, I;
@@
F(..., struct super_block *I, ...) {
...
-__bforget(E1)
+__bforget(I, E1)
...
}

@exists@
expression E1;
identifier I;
@@
struct inode *I;
...
-__bforget(E1)
+__bforget(I->i_sb, E1)

@exists@
expression E1;
identifier F, I;
@@
F(..., struct inode *I, ...) {
...
-__bforget(E1)
+__bforget(I->i_sb, E1)
...
}
--

Signed-off-by: Jérôme Glisse 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Jens Axboe 
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/buffer.c | 2 +-
 fs/jbd2/transaction.c   | 2 +-
 include/linux/buffer_head.h | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 9f2c5e90b64d..422204701a3b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1168,7 +1168,7 @@ EXPORT_SYMBOL(__brelse);
  * bforget() is like brelse(), except it discards any
  * potentially dirty data.
  */
-void __bforget(struct buffer_head *bh)
+void __bforget(struct super_block *sb, struct buffer_head *bh)
 {
clear_buffer_dirty(bh);
if (bh->b_assoc_map) {
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e8c50bb5822c..177616eb793c 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1560,7 +1560,7 @@ int jbd2_journal_forget (handle_t *handle, struct super_block *sb,
if (!buffer_jbd(bh)) {
			spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
-   __bforget(bh);
+   __bforget(sb, bh);
goto drop;
}
}
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 82faae102ba2..7ae60f59f27e 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -192,7 +192,7 @@ struct buffer_head *__find_get_block(struct block_device *bdev, sector_t block,
 struct buffer_head *__getblk_gfp(struct block_device *bdev, sector_t block,
  unsigned size, gfp_t gfp);
 void __brelse(struct buffer_head *);
-void __bforget(struct buffer_head *);
+void __bforget(struct super_block *, struct buffer_head *);
 void __breadahead(struct block_device *, sector_t block, unsigned int size);
 struct buffer_head *__bread_gfp(struct block_device *,
sector_t block, unsigned size, gfp_t gfp);
@@ -306,7 +306,7 @@ static inline void brelse(struct buffer_head *bh)
 static inline void bforget(struct super_block *sb, struct buffer_head *bh)
 {
if (bh)
-   __bforget(bh);
+   __bforget(sb, bh);
 }
 
 static inline struct buffer_head *
-- 
2.14.3



[RFC PATCH 50/79] fs: stop relying on mapping field of struct page, get it from context

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Holy grail: remove all usage of the mapping field of struct page inside
common fs code.

spatch --sp-file zemantic-015a.spatch --in-place fs/*.c
--
@exists@
struct page * P;
identifier I;
@@
struct address_space *I;
...
-P->mapping
+I

@exists@
identifier F, I;
struct page * P;
@@
F(..., struct address_space *I, ...) {
...
-P->mapping
+I
...
}

@@
@@
-mapping = mapping;

@@
@@
-struct address_space *mapping = _mapping;
--

Hand edit:
fs/mpage.c __mpage_writepage(): the coccinelle semantic patch is too hard ...

Signed-off-by: Jérôme Glisse 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Jens Axboe 
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/buffer.c | 11 +--
 fs/libfs.c  |  2 +-
 fs/mpage.c  |  9 -
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index b968ac0b65e8..39d8c7315b55 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -327,7 +327,7 @@ void end_buffer_async_write(struct address_space *mapping, struct page *page,
set_buffer_uptodate(bh);
} else {
buffer_io_error(bh, ", lost async page write");
-   mark_buffer_write_io_error(page->mapping, page, bh);
+   mark_buffer_write_io_error(mapping, page, bh);
clear_buffer_uptodate(bh);
SetPageError(page);
}
@@ -597,11 +597,10 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
  *
  * The caller must hold lock_page_memcg().
  */
-static void __set_page_dirty(struct page *page, struct address_space *_mapping,
+static void __set_page_dirty(struct page *page, struct address_space *mapping,
 int warn)
 {
unsigned long flags;
-   struct address_space *mapping = page_mapping(page);
 
spin_lock_irqsave(&mapping->tree_lock, flags);
if (page_is_truncated(page, mapping)) { /* Race with truncate? */
@@ -1954,7 +1953,7 @@ int __block_write_begin_int(struct address_space 
*mapping, struct page *page,
 {
unsigned from = pos & (PAGE_SIZE - 1);
unsigned to = from + len;
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = mapping->host;
unsigned block_start, block_end;
sector_t block;
int err = 0;
@@ -2456,7 +2455,7 @@ EXPORT_SYMBOL(cont_write_begin);
 int block_commit_write(struct address_space *mapping, struct page *page,
   unsigned from, unsigned to)
 {
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = mapping->host;
__block_commit_write(inode,page,from,to);
return 0;
 }
@@ -2705,7 +2704,7 @@ int nobh_write_end(struct file *file, struct 
address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
 {
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = mapping->host;
struct buffer_head *head = fsdata;
struct buffer_head *bh;
BUG_ON(fsdata != NULL && page_has_buffers(page));
diff --git a/fs/libfs.c b/fs/libfs.c
index ac76b269bbb7..585ef1f37d54 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -475,7 +475,7 @@ int simple_write_end(struct file *file, struct 
address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
 {
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = mapping->host;
loff_t last_pos = pos + copied;
 
/* zero the stale part of the page if we did a short copy */
diff --git a/fs/mpage.c b/fs/mpage.c
index 1eec9d0df23e..ecdef63f464e 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -231,7 +231,7 @@ do_mpage_readpage(struct bio *bio, struct address_space 
*mapping,
 * so readpage doesn't have to repeat the get_block call
 */
if (buffer_uptodate(map_bh)) {
-   map_buffer_to_page(page->mapping->host, page,
+   map_buffer_to_page(mapping->host, page,
   map_bh, page_block);
goto confused;
}
@@ -312,7 +312,7 @@ do_mpage_readpage(struct bio *bio, struct address_space 
*mapping,
if (bio)
bio = mpage_bio_submit(REQ_OP_READ, 0, bio);
if (!PageUptodate(page))
-   block_read_full_page(page->mapping->host, page, get_block);
+   block_read_full_page(mapping->host, page, get_block);
else
unlock_page(page);
goto out;
@@ -484,13 +484,12 @@ void clean_page_buffers(struct address_space *mapping, 
struct page 

[RFC PATCH 39/79] fs/buffer: add struct address_space to clean_page_buffers() arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add struct address_space to clean_page_buffers() arguments.

One step toward dropping reliance on page->mapping.

Signed-off-by: Jérôme Glisse 
Cc: Jens Axboe 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/block_dev.c  | 2 +-
 fs/mpage.c  | 9 +
 include/linux/buffer_head.h | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index dd9da97615e3..b653cd8fd1e3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -712,7 +712,7 @@ int bdev_write_page(struct block_device *bdev, sector_t 
sector,
if (result) {
end_page_writeback(page);
} else {
-   clean_page_buffers(page);
+   clean_page_buffers(mapping, page);
unlock_page(page);
}
blk_queue_exit(bdev->bd_queue);
diff --git a/fs/mpage.c b/fs/mpage.c
index a75cea232f1a..624995c333e0 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -447,7 +447,8 @@ struct mpage_data {
  * We have our BIO, so we can now mark the buffers clean.  Make
  * sure to only clean buffers which we know we'll be writing.
  */
-static void clean_buffers(struct page *page, unsigned first_unmapped)
+static void clean_buffers(struct address_space *mapping, struct page *page,
+ unsigned first_unmapped)
 {
unsigned buffer_counter = 0;
struct buffer_head *bh, *head;
@@ -477,9 +478,9 @@ static void clean_buffers(struct page *page, unsigned 
first_unmapped)
  * We don't need to calculate how many buffers are attached to the page,
  * we just need to specify a number larger than the maximum number of buffers.
  */
-void clean_page_buffers(struct page *page)
+void clean_page_buffers(struct address_space *mapping, struct page *page)
 {
-   clean_buffers(page, ~0U);
+   clean_buffers(mapping, page, ~0U);
 }
 
 static int __mpage_writepage(struct page *page, struct address_space *_mapping,
@@ -643,7 +644,7 @@ static int __mpage_writepage(struct page *page, struct 
address_space *_mapping,
goto alloc_new;
}
 
-   clean_buffers(page, first_unmapped);
+   clean_buffers(mapping, page, first_unmapped);
 
BUG_ON(PageWriteback(page));
set_page_writeback(page);
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 22e79307c055..f3baf88a251b 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -248,7 +248,7 @@ int generic_write_end(struct file *, struct address_space *,
loff_t, unsigned, unsigned,
struct page *, void *);
 void page_zero_new_buffers(struct page *page, unsigned from, unsigned to);
-void clean_page_buffers(struct page *page);
+void clean_page_buffers(struct address_space *mapping, struct page *page);
 int cont_write_begin(struct file *, struct address_space *, loff_t,
unsigned, unsigned, struct page **, void **,
get_block_t *, loff_t *);
-- 
2.14.3



[RFC PATCH 63/79] mm/page: convert page's index lookup to be against specific mapping

2018-04-04 Thread jglisse
From: Jérôme Glisse 

This patch switches mm to look up the page index or offset value
against a specific mapping. The page index value only has meaning
relative to a mapping.

Using coccinelle:
-
@@
struct page *P;
expression E;
@@
-P->index = E
+page_set_index(P, E)

@@
struct page *P;
@@
-P->index
+page_index(P)

@@
struct page *P;
@@
-page_index(P) << PAGE_SHIFT
+page_offset(P)

@@
expression E;
@@
-page_index(E)
+_page_index(E, mapping)

@@
expression E1, E2;
@@
-page_set_index(E1, E2)
+_page_set_index(E1, mapping, E2)

@@
expression E;
@@
-page_to_index(E)
+_page_to_index(E, mapping)

@@
expression E;
@@
-page_to_pgoff(E)
+_page_to_pgoff(E, mapping)

@@
expression E;
@@
-page_offset(E)
+_page_offset(E, mapping)

@@
expression E;
@@
-page_file_offset(E)
+_page_file_offset(E, mapping)
-

Signed-off-by: Jérôme Glisse 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: linux...@kvack.org
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
---
 mm/filemap.c| 26 ++
 mm/page-writeback.c | 16 +---
 mm/shmem.c  | 11 +++
 mm/truncate.c   | 11 ++-
 4 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 012a53964215..a41c7cfb6351 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -118,7 +118,8 @@ static int page_cache_tree_insert(struct address_space 
*mapping,
void **slot;
int error;
 
-   error = __radix_tree_create(&mapping->page_tree, page->index, 0,
+   error = __radix_tree_create(&mapping->page_tree,
+   _page_index(page, mapping), 0,
&node, &slot);
if (error)
return error;
@@ -155,7 +156,8 @@ static void page_cache_tree_delete(struct address_space 
*mapping,
struct radix_tree_node *node;
void **slot;
 
-   __radix_tree_lookup(&mapping->page_tree, page->index + i,
+   __radix_tree_lookup(&mapping->page_tree,
+   _page_index(page, mapping) + i,
&node, &slot);
 
VM_BUG_ON_PAGE(!node && nr != 1, page);
@@ -791,12 +793,12 @@ int replace_page_cache_page(struct page *old, struct page 
*new, gfp_t gfp_mask)
void (*freepage)(struct page *);
unsigned long flags;
 
-   pgoff_t offset = old->index;
+   pgoff_t offset = _page_index(old, mapping);
freepage = mapping->a_ops->freepage;
 
get_page(new);
new->mapping = mapping;
-   new->index = offset;
+   _page_set_index(new, mapping, offset);
 
spin_lock_irqsave(&mapping->tree_lock, flags);
__delete_from_page_cache(old, NULL);
@@ -850,7 +852,7 @@ static int __add_to_page_cache_locked(struct page *page,
 
get_page(page);
page->mapping = mapping;
-   page->index = offset;
+   _page_set_index(page, mapping, offset);
 
spin_lock_irq(&mapping->tree_lock);
error = page_cache_tree_insert(mapping, page, shadowp);
@@ -1500,7 +1502,7 @@ struct page *find_lock_entry(struct address_space 
*mapping, pgoff_t offset)
put_page(page);
goto repeat;
}
-   VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
+   VM_BUG_ON_PAGE(_page_to_pgoff(page, mapping) != offset, page);
}
return page;
 }
@@ -1559,7 +1561,7 @@ struct page *pagecache_get_page(struct address_space 
*mapping, pgoff_t offset,
put_page(page);
goto repeat;
}
-   VM_BUG_ON_PAGE(page->index != offset, page);
+   VM_BUG_ON_PAGE(_page_index(page, mapping) != offset, page);
}
 
if (page && (fgp_flags & FGP_ACCESSED))
@@ -1751,7 +1753,7 @@ unsigned find_get_pages_range(struct address_space 
*mapping, pgoff_t *start,
 
pages[ret] = page;
if (++ret == nr_pages) {
-   *start = pages[ret - 1]->index + 1;
+   *start = _page_index(pages[ret - 1], mapping) + 1;
goto out;
}
}
@@ -1837,7 +1839,7 @@ unsigned find_get_pages_contig(struct address_space 
*mapping, pgoff_t index,
 * otherwise we can get both false positives and false
 * negatives, which is just confusing to the caller.
 */
-   if (page->mapping == NULL || page_to_pgoff(page) != iter.index) {
+   if (page->mapping == NULL || _page_to_pgoff(page, mapping) != iter.index) {
put_page(page);
break;
}

[RFC PATCH 64/79] mm/buffer: use _page_has_buffers() instead of page_has_buffers()

2018-04-04 Thread jglisse
From: Jérôme Glisse 

The former needs the address_space in which the buffer_head is being
looked up.

--
@exists@
identifier M;
expression E;
@@
struct address_space *M;
...
-page_buffers(E)
+_page_buffers(E, M)

@exists@
identifier M, F;
expression E;
@@
F(..., struct address_space *M, ...) {...
-page_buffers(E)
+_page_buffers(E, M)
...}

@exists@
identifier M;
expression E;
@@
struct address_space *M;
...
-page_has_buffers(E)
+_page_has_buffers(E, M)

@exists@
identifier M, F;
expression E;
@@
F(..., struct address_space *M, ...) {...
-page_has_buffers(E)
+_page_has_buffers(E, M)
...}

@exists@
identifier I;
expression E;
@@
struct inode *I;
...
-page_buffers(E)
+_page_buffers(E, I->i_mapping)

@exists@
identifier I, F;
expression E;
@@
F(..., struct inode *I, ...) {...
-page_buffers(E)
+_page_buffers(E, I->i_mapping)
...}

@exists@
identifier I;
expression E;
@@
struct inode *I;
...
-page_has_buffers(E)
+_page_has_buffers(E, I->i_mapping)

@exists@
identifier I, F;
expression E;
@@
F(..., struct inode *I, ...) {...
-page_has_buffers(E)
+_page_has_buffers(E, I->i_mapping)
...}
--

Signed-off-by: Jérôme Glisse 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Jens Axboe 
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 mm/migrate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c2a613283fa2..e4b20ac6cf36 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -768,10 +768,10 @@ int buffer_migrate_page(struct address_space *mapping,
struct buffer_head *bh, *head;
int rc;
 
-   if (!page_has_buffers(page))
+   if (!_page_has_buffers(page, mapping))
return migrate_page(mapping, newpage, page, mode);
 
-   head = page_buffers(page);
+   head = _page_buffers(page, mapping);
 
rc = migrate_page_move_mapping(mapping, newpage, page, head, mode, 0);
 
-- 
2.14.3



[RFC PATCH 52/79] fs/buffer: use _page_has_buffers() instead of page_has_buffers()

2018-04-04 Thread jglisse
From: Jérôme Glisse 

The former needs the address_space in which the buffer_head is being
looked up.

--
@exists@
identifier M;
expression E;
@@
struct address_space *M;
...
-page_buffers(E)
+_page_buffers(E, M)

@exists@
identifier M, F;
expression E;
@@
F(..., struct address_space *M, ...) {...
-page_buffers(E)
+_page_buffers(E, M)
...}

@exists@
identifier M;
expression E;
@@
struct address_space *M;
...
-page_has_buffers(E)
+_page_has_buffers(E, M)

@exists@
identifier M, F;
expression E;
@@
F(..., struct address_space *M, ...) {...
-page_has_buffers(E)
+_page_has_buffers(E, M)
...}

@exists@
identifier I;
expression E;
@@
struct inode *I;
...
-page_buffers(E)
+_page_buffers(E, I->i_mapping)

@exists@
identifier I, F;
expression E;
@@
F(..., struct inode *I, ...) {...
-page_buffers(E)
+_page_buffers(E, I->i_mapping)
...}

@exists@
identifier I;
expression E;
@@
struct inode *I;
...
-page_has_buffers(E)
+_page_has_buffers(E, I->i_mapping)

@exists@
identifier I, F;
expression E;
@@
F(..., struct inode *I, ...) {...
-page_has_buffers(E)
+_page_has_buffers(E, I->i_mapping)
...}
--

Signed-off-by: Jérôme Glisse 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Jens Axboe 
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/buffer.c | 60 ++--
 fs/mpage.c  | 14 +++---
 2 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 3c424b7af5af..27b19c629308 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -89,13 +89,13 @@ void buffer_check_dirty_writeback(struct page *page,
 
BUG_ON(!PageLocked(page));
 
-   if (!page_has_buffers(page))
+   if (!_page_has_buffers(page, mapping))
return;
 
if (PageWriteback(page))
*writeback = true;
 
-   head = page_buffers(page);
+   head = _page_buffers(page, mapping);
bh = head;
do {
if (buffer_locked(bh))
@@ -211,9 +211,9 @@ __find_get_block_slow(struct block_device *bdev, sector_t 
block)
goto out;
 
spin_lock(&bd_mapping->private_lock);
-   if (!page_has_buffers(page))
+   if (!_page_has_buffers(page, bd_mapping))
goto out_unlock;
-   head = page_buffers(page);
+   head = _page_buffers(page, bd_mapping);
bh = head;
do {
if (!buffer_mapped(bh))
@@ -648,8 +648,8 @@ int __set_page_dirty_buffers(struct address_space *mapping,
return !TestSetPageDirty(page);
 
spin_lock(&mapping->private_lock);
-   if (page_has_buffers(page)) {
-   struct buffer_head *head = page_buffers(page);
+   if (_page_has_buffers(page, mapping)) {
+   struct buffer_head *head = _page_buffers(page, mapping);
struct buffer_head *bh = head;
 
do {
@@ -913,7 +913,7 @@ static sector_t
 init_page_buffers(struct address_space *buffer, struct page *page,
  struct block_device *bdev, sector_t block, int size)
 {
-   struct buffer_head *head = page_buffers(page);
+   struct buffer_head *head = _page_buffers(page, buffer);
struct buffer_head *bh = head;
int uptodate = PageUptodate(page);
sector_t end_block = blkdev_max_block(I_BDEV(bdev->bd_inode), size);
@@ -969,8 +969,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
 
BUG_ON(!PageLocked(page));
 
-   if (page_has_buffers(page)) {
-   bh = page_buffers(page);
+   if (_page_has_buffers(page, inode->i_mapping)) {
+   bh = _page_buffers(page, inode->i_mapping);
if (bh->b_size == size) {
end_block = init_page_buffers(inode->i_mapping, page,
bdev, (sector_t)index << sizebits,
@@ -1490,7 +1490,7 @@ void block_invalidatepage(struct address_space *mapping, 
struct page *page,
unsigned int stop = length + offset;
 
BUG_ON(!PageLocked(page));
-   if (!page_has_buffers(page))
+   if (!_page_has_buffers(page, mapping))
goto out;
 
/*
@@ -1498,7 +1498,7 @@ void block_invalidatepage(struct address_space *mapping, 
struct page *page,
 */
BUG_ON(stop > PAGE_SIZE || stop < length);
 
-   head = page_buffers(page);
+   head = _page_buffers(page, mapping);
bh = head;
do {
unsigned int next_off = curr_off + bh->b_size;
@@ -1605,7 +1605,7 @@ void clean_bdev_aliases(struct block_device *bdev, 
sector_t block, sector_t len)
for (i = 0; i < count; i++) {

[RFC PATCH 65/79] mm/swap: add struct swap_info_struct swap_readpage() arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Add a struct swap_info_struct argument to swap_readpage(). One step
toward dropping reliance on page->private during swap read back.

Signed-off-by: Jérôme Glisse 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 include/linux/swap.h |  6 --
 mm/memory.c  |  2 +-
 mm/page_io.c |  4 ++--
 mm/swap_state.c  | 12 
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2f6abe9652f6..90c26ec2997c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -383,7 +383,8 @@ extern void kswapd_stop(int nid);
 #include  /* for bio_end_io_t */
 
 /* linux/mm/page_io.c */
-extern int swap_readpage(struct page *page, bool do_poll);
+extern int swap_readpage(struct swap_info_struct *sis, struct page *page,
+bool do_poll);
 extern int swap_writepage(struct address_space *mapping, struct page *page,
  struct writeback_control *wbc);
 extern void end_swap_bio_write(struct bio *bio);
@@ -486,7 +487,8 @@ extern void exit_swap_address_space(unsigned int type);
 
 #else /* CONFIG_SWAP */
 
-static inline int swap_readpage(struct page *page, bool do_poll)
+static inline int swap_readpage(struct swap_info_struct *sis, struct page *page,
+   bool do_poll)
 {
return 0;
 }
diff --git a/mm/memory.c b/mm/memory.c
index 1311599a164b..6ffd76528e7b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2949,7 +2949,7 @@ int do_swap_page(struct vm_fault *vmf)
__SetPageSwapBacked(page);
set_page_private(page, entry.val);
lru_cache_add_anon(page);
-   swap_readpage(page, true);
+   swap_readpage(si, page, true);
}
} else {
if (vma_readahead)
diff --git a/mm/page_io.c b/mm/page_io.c
index 6e548b588490..f4e05c90c87e 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -349,11 +349,11 @@ int __swap_writepage(struct page *page, struct 
writeback_control *wbc,
return ret;
 }
 
-int swap_readpage(struct page *page, bool synchronous)
+int swap_readpage(struct swap_info_struct *sis, struct page *page,
+ bool synchronous)
 {
struct bio *bio;
int ret = 0;
-   struct swap_info_struct *sis = page_swap_info(page);
blk_qc_t qc;
struct gendisk *disk;
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 39ae7cfad90f..40a2437e3c34 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -466,8 +466,10 @@ struct page *read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
struct page *retpage = __read_swap_cache_async(entry, gfp_mask,
vma, addr, &page_was_allocated);
 
-   if (page_was_allocated)
-   swap_readpage(retpage, do_poll);
+   if (page_was_allocated) {
+   struct swap_info_struct *sis = swp_swap_info(entry);
+   swap_readpage(sis, retpage, do_poll);
+   }
 
return retpage;
 }
@@ -585,7 +587,8 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t 
gfp_mask,
if (!page)
continue;
if (page_allocated) {
-   swap_readpage(page, false);
+   struct swap_info_struct *sis = swp_swap_info(entry);
+   swap_readpage(sis, page, false);
if (offset != entry_offset &&
likely(!PageTransCompound(page))) {
SetPageReadahead(page);
@@ -748,7 +751,8 @@ struct page *do_swap_page_readahead(swp_entry_t fentry, 
gfp_t gfp_mask,
if (!page)
continue;
if (page_allocated) {
-   swap_readpage(page, false);
+   struct swap_info_struct *sis = swp_swap_info(entry);
+   swap_readpage(sis, page, false);
if (i != swap_ra->offset &&
likely(!PageTransCompound(page))) {
SetPageReadahead(page);
-- 
2.14.3



[RFC PATCH 69/79] fs/journal: add struct address_space to jbd2_journal_try_to_free_buffers() arguments

2018-04-04 Thread jglisse
From: Jérôme Glisse 

For the holy crusade to stop relying on the struct page mapping field, add
struct address_space to jbd2_journal_try_to_free_buffers() arguments.

<-
@@
type T1, T2, T3;
@@
int
-jbd2_journal_try_to_free_buffers(T1 journal, T2 page, T3 gfp_mask)
+jbd2_journal_try_to_free_buffers(T1 journal, struct address_space *mapping, T2 page, T3 gfp_mask)
{...}

@@
type T1, T2, T3;
@@
int
-jbd2_journal_try_to_free_buffers(T1, T2, T3)
+jbd2_journal_try_to_free_buffers(T1, struct address_space *, T2, T3)
;

@@
expression E1, E2, E3;
@@
-jbd2_journal_try_to_free_buffers(E1, E2, E3)
+jbd2_journal_try_to_free_buffers(E1, NULL, E2, E3)
->

Signed-off-by: Jérôme Glisse 
Cc: "Theodore Ts'o" 
Cc: Jan Kara 
Cc: linux-e...@vger.kernel.org
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
---
 fs/ext4/inode.c   | 3 ++-
 fs/ext4/super.c   | 4 ++--
 fs/jbd2/transaction.c | 3 ++-
 include/linux/jbd2.h  | 4 +++-
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1a44d9acde53..ef53a57d9768 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3413,7 +3413,8 @@ static int ext4_releasepage(struct address_space *mapping,
if (PageChecked(page))
return 0;
if (journal)
-   return jbd2_journal_try_to_free_buffers(journal, page, wait);
+   return jbd2_journal_try_to_free_buffers(journal, NULL, page,
+   wait);
else
return try_to_free_buffers(mapping, page);
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 8f98bc886569..cf2b74137fb2 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1138,8 +1138,8 @@ static int bdev_try_to_free_page(struct super_block *sb, 
struct page *page,
if (!_page_has_buffers(page, mapping))
return 0;
if (journal)
-   return jbd2_journal_try_to_free_buffers(journal, page,
-   wait & ~__GFP_DIRECT_RECLAIM);
+   return jbd2_journal_try_to_free_buffers(journal, NULL, page,
+   wait & ~__GFP_DIRECT_RECLAIM);
return try_to_free_buffers(mapping, page);
 }
 
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index bf673b33d436..6899e7b4036d 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1984,7 +1984,8 @@ __journal_try_to_free_buffer(journal_t *journal, struct 
buffer_head *bh)
  * Return 0 on failure, 1 on success
  */
 int jbd2_journal_try_to_free_buffers(journal_t *journal,
-   struct page *page, gfp_t gfp_mask)
+struct address_space *mapping,
+struct page *page, gfp_t gfp_mask)
 {
struct buffer_head *head;
struct buffer_head *bh;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index c5133df80fd4..658a0d2f758f 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1363,7 +1363,9 @@ extern int jbd2_journal_forget (handle_t *, 
struct super_block *sb,
 extern void journal_sync_buffer (struct buffer_head *);
 extern int  jbd2_journal_invalidatepage(journal_t *,
struct page *, unsigned int, unsigned int);
-extern int  jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
+extern int  jbd2_journal_try_to_free_buffers(journal_t *,
+   struct address_space *,
+   struct page *, gfp_t);
 extern int  jbd2_journal_stop(handle_t *);
 extern int  jbd2_journal_flush (journal_t *);
 extern void jbd2_journal_lock_updates (journal_t *);
-- 
2.14.3



[RFC PATCH 71/79] mm: add struct address_space to set_page_dirty()

2018-04-04 Thread jglisse
From: Jérôme Glisse 

For the holy crusade to stop relying on the struct page mapping field, add
struct address_space to set_page_dirty() arguments.

<-
@@
identifier I1;
type T1;
@@
int
-set_page_dirty(T1 I1)
+set_page_dirty(struct address_space *_mapping, T1 I1)
{...}

@@
type T1;
@@
int
-set_page_dirty(T1)
+set_page_dirty(struct address_space *, T1)
;

@@
identifier I1;
type T1;
@@
int
-set_page_dirty(T1 I1)
+set_page_dirty(struct address_space *, T1)
;

@@
expression E1;
@@
-set_page_dirty(E1)
+set_page_dirty(NULL, E1)
->

Signed-off-by: Jérôme Glisse 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  2 +-
 drivers/gpu/drm/drm_gem.c  |  2 +-
 drivers/gpu/drm/i915/i915_gem.c|  6 ++---
 drivers/gpu/drm/i915/i915_gem_fence_reg.c  |  2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c|  2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c|  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c   |  2 +-
 drivers/infiniband/core/umem_odp.c |  2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c|  2 +-
 drivers/mtd/devices/block2mtd.c|  4 +--
 drivers/platform/goldfish/goldfish_pipe.c  |  2 +-
 drivers/sbus/char/oradax.c |  2 +-
 drivers/staging/lustre/lustre/llite/rw26.c |  2 +-
 drivers/staging/lustre/lustre/llite/vvp_io.c   |  4 +--
 .../interface/vchiq_arm/vchiq_2835_arm.c   |  2 +-
 fs/9p/vfs_addr.c   |  2 +-
 fs/afs/write.c |  2 +-
 fs/btrfs/extent_io.c   |  2 +-
 fs/btrfs/file.c|  2 +-
 fs/btrfs/inode.c   |  6 ++---
 fs/btrfs/ioctl.c   |  2 +-
 fs/btrfs/relocation.c  |  2 +-
 fs/buffer.c|  6 ++---
 fs/ceph/addr.c |  4 +--
 fs/cifs/file.c |  4 +--
 fs/exofs/dir.c |  2 +-
 fs/exofs/inode.c   |  4 +--
 fs/f2fs/checkpoint.c   |  4 +--
 fs/f2fs/data.c |  6 ++---
 fs/f2fs/dir.c  | 10 
 fs/f2fs/file.c | 10 
 fs/f2fs/gc.c   |  6 ++---
 fs/f2fs/inline.c   | 18 ++---
 fs/f2fs/inode.c|  6 ++---
 fs/f2fs/node.c | 20 +++
 fs/f2fs/node.h |  2 +-
 fs/f2fs/recovery.c |  2 +-
 fs/f2fs/segment.c  | 12 -
 fs/f2fs/xattr.c|  6 ++---
 fs/fuse/file.c |  2 +-
 fs/gfs2/file.c |  2 +-
 fs/hfs/bnode.c | 12 -
 fs/hfs/btree.c |  6 ++---
 fs/hfsplus/bitmap.c|  8 +++---
 fs/hfsplus/bnode.c | 30 +++---
 fs/hfsplus/btree.c |  6 ++---
 fs/hfsplus/xattr.c |  2 +-
 fs/iomap.c |  2 +-
 fs/jfs/jfs_metapage.c  |  4 +--
 fs/libfs.c |  2 +-
 fs/nfs/direct.c|  2 +-
 fs/ntfs/attrib.c   |  8 +++---
 fs/ntfs/bitmap.c   |  4 +--
 fs/ntfs/file.c |  2 +-
 fs/ntfs/lcnalloc.c |  4 +--
 fs/ntfs/mft.c  |  4 +--
 fs/ntfs/usnjrnl.c  |  2 +-
 fs/udf/file.c  |  2 +-
 fs/ufs/inode.c |  2 +-
 include/linux/mm.h |  2 +-
 mm/filemap.c   |  2 +-
 mm/gup.c   |  2 +-
 mm/huge_memory.c   |  2 +-
 mm/hugetlb.c   |  2 +-
 mm/khugepaged.c   

[RFC PATCH 70/79] mm: add struct address_space to mark_buffer_dirty()

2018-04-04 Thread jglisse
From: Jérôme Glisse 

For the holy crusade to stop relying on the struct page mapping field, add
struct address_space to mark_buffer_dirty() arguments.

<-
@@
identifier I1;
type T1;
@@
void
-mark_buffer_dirty(T1 I1)
+mark_buffer_dirty(struct address_space *_mapping, T1 I1)
{...}

@@
type T1;
@@
void
-mark_buffer_dirty(T1)
+mark_buffer_dirty(struct address_space *, T1)
;

@@
identifier I1;
type T1;
@@
void
-mark_buffer_dirty(T1 I1)
+mark_buffer_dirty(struct address_space *, T1)
;

@@
expression E1;
@@
-mark_buffer_dirty(E1)
+mark_buffer_dirty(NULL, E1)
->

Signed-off-by: Jérôme Glisse 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/adfs/dir_f.c |  2 +-
 fs/affs/bitmap.c|  6 +++---
 fs/affs/super.c |  2 +-
 fs/bfs/file.c   |  2 +-
 fs/bfs/inode.c  |  4 ++--
 fs/buffer.c | 12 ++--
 fs/ext2/balloc.c|  6 +++---
 fs/ext2/ialloc.c|  8 
 fs/ext2/inode.c |  2 +-
 fs/ext2/super.c |  4 ++--
 fs/ext2/xattr.c |  8 
 fs/ext4/ext4_jbd2.c |  4 ++--
 fs/ext4/inode.c |  4 ++--
 fs/ext4/mmp.c   |  2 +-
 fs/ext4/resize.c|  2 +-
 fs/ext4/super.c |  2 +-
 fs/fat/inode.c  |  4 ++--
 fs/fat/misc.c   |  2 +-
 fs/gfs2/bmap.c  |  4 ++--
 fs/gfs2/lops.c  |  6 +++---
 fs/hfs/mdb.c| 10 +-
 fs/hpfs/anode.c | 34 +-
 fs/hpfs/buffer.c|  8 
 fs/hpfs/dnode.c |  4 ++--
 fs/hpfs/ea.c|  4 ++--
 fs/hpfs/inode.c |  2 +-
 fs/hpfs/namei.c | 10 +-
 fs/hpfs/super.c |  6 +++---
 fs/jbd2/recovery.c  |  2 +-
 fs/jbd2/transaction.c   |  2 +-
 fs/jfs/jfs_imap.c   |  2 +-
 fs/jfs/jfs_mount.c  |  2 +-
 fs/jfs/resize.c |  6 +++---
 fs/jfs/super.c  |  2 +-
 fs/minix/bitmap.c   | 10 +-
 fs/minix/inode.c| 12 ++--
 fs/nilfs2/alloc.c   | 12 ++--
 fs/nilfs2/btnode.c  |  4 ++--
 fs/nilfs2/btree.c   | 38 +++---
 fs/nilfs2/cpfile.c  | 24 
 fs/nilfs2/dat.c |  4 ++--
 fs/nilfs2/gcinode.c |  2 +-
 fs/nilfs2/ifile.c   |  4 ++--
 fs/nilfs2/inode.c   |  2 +-
 fs/nilfs2/ioctl.c   |  2 +-
 fs/nilfs2/mdt.c |  2 +-
 fs/nilfs2/segment.c |  4 ++--
 fs/nilfs2/sufile.c  | 26 +-
 fs/ntfs/file.c  |  8 
 fs/ntfs/super.c |  2 +-
 fs/ocfs2/alloc.c|  2 +-
 fs/ocfs2/aops.c |  4 ++--
 fs/ocfs2/inode.c|  2 +-
 fs/omfs/bitmap.c|  6 +++---
 fs/omfs/dir.c   |  8 
 fs/omfs/file.c  |  4 ++--
 fs/omfs/inode.c |  4 ++--
 fs/reiserfs/file.c  |  2 +-
 fs/reiserfs/inode.c |  4 ++--
 fs/reiserfs/journal.c   | 10 +-
 fs/reiserfs/resize.c|  2 +-
 fs/sysv/balloc.c|  2 +-
 fs/sysv/ialloc.c|  2 +-
 fs/sysv/inode.c |  8 
 fs/sysv/sysv.h  |  4 ++--
 fs/udf/balloc.c |  6 +++---
 fs/udf/inode.c  |  2 +-
 fs/udf/partition.c  |  4 ++--
 fs/udf/super.c  |  8 
 fs/ufs/balloc.c |  4 ++--
 fs/ufs/ialloc.c |  4 ++--
 fs/ufs/inode.c  |  8 
 fs/ufs/util.c   |  2 +-
 include/linux/buffer_head.h |  2 +-
 74 files changed, 220 insertions(+), 220 deletions(-)

diff --git a/fs/adfs/dir_f.c b/fs/adfs/dir_f.c
index 0fbfd0b04ae0..3d92f8d187bc 100644
--- a/fs/adfs/dir_f.c
+++ b/fs/adfs/dir_f.c
@@ -434,7 +434,7 @@ adfs_f_update(struct adfs_dir *dir, struct object_info *obj)
}
 #endif
for (i = dir->nr_buffers - 1; i >= 0; i--)
-   mark_buffer_dirty(dir->bh[i]);
+   mark_buffer_dirty(NULL, dir->bh[i]);
 
ret = 0;
 out:
diff --git a/fs/affs/bitmap.c b/fs/affs/bitmap.c
index 5ba9ef2742f6..59b352075505 100644
--- a/fs/affs/bitmap.c
+++ b/fs/affs/bitmap.c
@@ -79,7 +79,7 @@ affs_free_block(struct super_block *sb, u32 block)
tmp = be32_to_cpu(*(__be32 *)bh->b_data);
*(__be32 *)bh->b_data = cpu_to_be32(tmp - mask);
 
-   mark_buffer_dirty(bh);
+   mark_buffer_dirty(NULL, bh);
affs_mark_sb_dirty(sb);
bm->bm_free++;
 
@@ -223,7 +223,7 @@ 

[RFC PATCH 51/79] fs: stop relying on mapping field of struct page, get it from context

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Holy grail: remove all usage of the mapping field of struct page inside
common fs code. This is the manual conversion patch (only so much can be
done with coccinelle).

Signed-off-by: Jérôme Glisse 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Jens Axboe 
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 fs/buffer.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 39d8c7315b55..3c424b7af5af 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -570,7 +570,9 @@ void write_boundary_block(struct block_device *bdev,
 void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
 {
struct address_space *mapping = inode->i_mapping;
-   struct address_space *buffer_mapping = bh->b_page->mapping;
+   struct address_space *buffer_mapping;
+
+   buffer_mapping = fs_page_mapping_get_with_bh(bh->b_page, bh);
 
mark_buffer_dirty(bh);
if (!mapping->private_data) {
@@ -1138,10 +1140,13 @@ EXPORT_SYMBOL(mark_buffer_dirty);
 void mark_buffer_write_io_error(struct address_space *mapping,
struct page *page, struct buffer_head *bh)
 {
+   BUG_ON(page != bh->b_page);
+   BUG_ON(mapping != bh->b_page->mapping);
+
set_buffer_write_io_error(bh);
/* FIXME: do we need to set this in both places? */
-   if (bh->b_page && !page_is_truncated(bh->b_page, bh->b_page->mapping))
-   mapping_set_error(bh->b_page->mapping, -EIO);
+   if (bh->b_page && !page_is_truncated(page, mapping))
+   mapping_set_error(mapping, -EIO);
if (bh->b_assoc_map)
mapping_set_error(bh->b_assoc_map, -EIO);
 }
@@ -1172,7 +1177,10 @@ void __bforget(struct super_block *sb, struct 
buffer_head *bh)
 {
clear_buffer_dirty(bh);
if (bh->b_assoc_map) {
-   struct address_space *buffer_mapping = bh->b_page->mapping;
+   struct address_space *buffer_mapping;
+
+   buffer_mapping = sb->s_bdev->bd_inode->i_mapping;
+   BUG_ON(buffer_mapping != bh->b_page->mapping);
 
spin_lock(&buffer_mapping->private_lock);
list_del_init(>b_assoc_buffers);
@@ -1543,7 +1551,7 @@ void create_empty_buffers(struct address_space *mapping, 
struct page *page,
} while (bh);
tail->b_this_page = head;
 
-   spin_lock(&page->mapping->private_lock);
+   spin_lock(&mapping->private_lock);
if (PageUptodate(page) || PageDirty(page)) {
bh = head;
do {
@@ -1555,7 +1563,7 @@ void create_empty_buffers(struct address_space *mapping, 
struct page *page,
} while (bh != head);
}
attach_page_buffers(page, head);
-   spin_unlock(&page->mapping->private_lock);
+   spin_unlock(&mapping->private_lock);
 }
 EXPORT_SYMBOL(create_empty_buffers);
 
@@ -1833,7 +1841,7 @@ int __block_write_full_page(struct inode *inode, struct 
page *page,
} while ((bh = bh->b_this_page) != head);
SetPageError(page);
BUG_ON(PageWriteback(page));
-   mapping_set_error(page->mapping, err);
+   mapping_set_error(inode->i_mapping, err);
set_page_writeback(page);
do {
struct buffer_head *next = bh->b_this_page;
@@ -2541,7 +2549,7 @@ static void attach_nobh_buffers(struct address_space 
*mapping,
 
BUG_ON(!PageLocked(page));
 
-   spin_lock(&page->mapping->private_lock);
+   spin_lock(&mapping->private_lock);
+   spin_lock(>private_lock);
bh = head;
do {
if (PageDirty(page))
@@ -2551,7 +2559,7 @@ static void attach_nobh_buffers(struct address_space 
*mapping,
bh = bh->b_this_page;
} while (bh != head);
attach_page_buffers(page, head);
-   spin_unlock(&page->mapping->private_lock);
+   spin_unlock(&mapping->private_lock);
 }
 
 /*
-- 
2.14.3



[RFC PATCH 68/79] mm/vma_address: convert page's index lookup to be against specific mapping

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Pass down the mapping ...

Signed-off-by: Jérôme Glisse 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: linux...@kvack.org
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
---
 mm/internal.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/internal.h b/mm/internal.h
index e6bd35182dae..43e9ed27362f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -336,7 +336,9 @@ extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct 
vm_area_struct *vma);
 static inline unsigned long
 __vma_address(struct page *page, struct vm_area_struct *vma)
 {
-   pgoff_t pgoff = page_to_pgoff(page);
+   struct address_space *mapping = vma->vm_file ? vma->vm_file->f_mapping 
: NULL;
+
+   pgoff_t pgoff = _page_to_pgoff(page, mapping);
return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 }
 
-- 
2.14.3



[RFC PATCH 72/79] mm: add struct address_space to set_page_dirty_lock()

2018-04-04 Thread jglisse
From: Jérôme Glisse 

For the holy crusade to stop relying on struct page mapping field, add
struct address_space to set_page_dirty_lock() arguments.

<---------------------------------------------------------------------
@@
identifier I1;
type T1;
@@
int
-set_page_dirty_lock(T1 I1)
+set_page_dirty_lock(struct address_space *_mapping, T1 I1)
{...}

@@
type T1;
@@
int
-set_page_dirty_lock(T1)
+set_page_dirty_lock(struct address_space *, T1)
;

@@
identifier I1;
type T1;
@@
int
-set_page_dirty_lock(T1 I1)
+set_page_dirty_lock(struct address_space *, T1)
;

@@
expression E1;
@@
-set_page_dirty_lock(E1)
+set_page_dirty_lock(NULL, E1)
--------------------------------------------------------------------->

Signed-off-by: Jérôme Glisse 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 arch/cris/arch-v32/drivers/cryptocop.c| 2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c| 2 +-
 arch/powerpc/kvm/e500_mmu.c   | 3 ++-
 arch/s390/kvm/interrupt.c | 4 ++--
 arch/x86/kvm/svm.c| 2 +-
 block/bio.c   | 4 ++--
 drivers/gpu/drm/exynos/exynos_drm_g2d.c   | 2 +-
 drivers/infiniband/core/umem.c| 2 +-
 drivers/infiniband/hw/hfi1/user_pages.c   | 2 +-
 drivers/infiniband/hw/qib/qib_user_pages.c| 2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c  | 2 +-
 drivers/media/common/videobuf2/videobuf2-dma-contig.c | 2 +-
 drivers/media/common/videobuf2/videobuf2-dma-sg.c | 2 +-
 drivers/media/common/videobuf2/videobuf2-vmalloc.c| 2 +-
 drivers/misc/genwqe/card_utils.c  | 2 +-
 drivers/staging/lustre/lustre/llite/rw26.c| 2 +-
 drivers/vhost/vhost.c | 2 +-
 fs/block_dev.c| 2 +-
 fs/direct-io.c| 2 +-
 fs/fuse/dev.c | 2 +-
 fs/fuse/file.c| 2 +-
 include/linux/mm.h| 2 +-
 mm/memory.c   | 2 +-
 mm/page-writeback.c   | 2 +-
 mm/process_vm_access.c| 2 +-
 net/ceph/pagevec.c| 2 +-
 26 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/arch/cris/arch-v32/drivers/cryptocop.c 
b/arch/cris/arch-v32/drivers/cryptocop.c
index a3c353472a8c..5cb42555c90b 100644
--- a/arch/cris/arch-v32/drivers/cryptocop.c
+++ b/arch/cris/arch-v32/drivers/cryptocop.c
@@ -2930,7 +2930,7 @@ static int cryptocop_ioctl_process(struct inode *inode, 
struct file *filp, unsig
for (i = 0; i < nooutpages; i++){
int spdl_err;
/* Mark output pages dirty. */
-   spdl_err = set_page_dirty_lock(outpages[i]);
+   spdl_err = set_page_dirty_lock(NULL, outpages[i]);
DEBUG(if (spdl_err < 0)printk("cryptocop_ioctl_process: 
set_page_dirty_lock returned %d\n", spdl_err));
}
for (i = 0; i < nooutpages; i++){
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 5cb4e4687107..8daefabe650e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -482,7 +482,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
 
if (page) {
if (!ret && (pgflags & _PAGE_WRITE))
-   set_page_dirty_lock(page);
+   set_page_dirty_lock(NULL, page);
put_page(page);
}
 
diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index ddbf8f0284c0..364ee7a5b268 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -556,7 +556,8 @@ static void free_gtlb(struct kvmppc_vcpu_e500 *vcpu_e500)
  PAGE_SIZE)));
 
for (i = 0; i < vcpu_e500->num_shared_tlb_pages; i++) {
-   set_page_dirty_lock(vcpu_e500->shared_tlb_pages[i]);
+   set_page_dirty_lock(NULL,
+   vcpu_e500->shared_tlb_pages[i]);
put_page(vcpu_e500->shared_tlb_pages[i]);
}
 
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index b04616b57a94..6db8d4f5c74f 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2616,7 +2616,7 @@ static int adapter_indicators_set(struct kvm *kvm,
set_bit(bit, map);
idx = 

[RFC PATCH 79/79] mm/ksm: set page->mapping to page_ronly struct instead of stable_node.

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Set page->mapping to the page_ronly struct instead of stable_node
struct. There is no functional change as page_ronly is just a field
of stable_node.

Signed-off-by: Jérôme Glisse 
Cc: Andrea Arcangeli 
---
 mm/ksm.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 6085068fb8b3..52b0ae291d23 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "internal.h"
@@ -126,6 +127,7 @@ struct ksm_scan {
 
 /**
  * struct stable_node - node of the stable rbtree
+ * @ronly: Page read only struct wrapper (see include/linux/page_ronly.h).
  * @node: rb node of this ksm page in the stable tree
 * @head: (overlaying parent) &migrate_nodes indicates temporarily on that list
  * @hlist_dup: linked into the stable_node->hlist with a stable_node chain
@@ -137,6 +139,7 @@ struct ksm_scan {
  * @nid: NUMA node id of stable tree in which linked (may not match kpfn)
  */
 struct stable_node {
+   struct page_ronly ronly;
union {
struct rb_node node;/* when node of stable tree */
struct {/* when listed for migration */
@@ -318,13 +321,15 @@ static void __init ksm_slab_free(void)
 
 static inline struct stable_node *page_stable_node(struct page *page)
 {
-   return PageReadOnly(page) ? page_rmapping(page) : NULL;
+   struct page_ronly *ronly = page_ronly(page);
+
+   return ronly ? container_of(ronly, struct stable_node, ronly) : NULL;
 }
 
 static inline void set_page_stable_node(struct page *page,
struct stable_node *stable_node)
 {
-   page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_RONLY);
+   page_ronly_set(page, stable_node ? &stable_node->ronly : NULL);
 }
 
 static __always_inline bool is_stable_node_chain(struct stable_node *chain)
-- 
2.14.3



[RFC PATCH 75/79] mm/page_ronly: add page read only core structure and helpers.

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Page read only is a generic framework for page write protection.
It reuses the same mechanism as KSM by using the lower bit of the
page->mapping fields, and KSM is converted to use this generic
framework.

Signed-off-by: Jérôme Glisse 
Cc: Andrea Arcangeli 
---
 include/linux/page_ronly.h | 169 +
 1 file changed, 169 insertions(+)
 create mode 100644 include/linux/page_ronly.h

diff --git a/include/linux/page_ronly.h b/include/linux/page_ronly.h
new file mode 100644
index ..6312d4f015ea
--- /dev/null
+++ b/include/linux/page_ronly.h
@@ -0,0 +1,169 @@
+/*
+ * Copyright 2015 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Authors: Jérôme Glisse 
+ */
+/*
+ * Page read only generic wrapper. This is a common struct used to write
+ * protect a page by forbidding anyone from inserting a pte (page table
+ * entry) with the write flag set. It reuses the KSM mechanism (which uses
+ * the lower bit of the mapping field of struct page).
+ */
+#ifndef LINUX_PAGE_RONLY_H
+#define LINUX_PAGE_RONLY_H
+#ifdef CONFIG_PAGE_RONLY
+
+#include 
+#include 
+#include 
+#include 
+
+
+/* enum page_ronly_event - Events that trigger a call to unprotect().
+ *
+ * @PAGE_RONLY_SWAPIN: Page fault at an address with a swap entry pte.
+ * @PAGE_RONLY_WFAULT: Write page fault.
+ * @PAGE_RONLY_GUP: Get user page.
+ */
+enum page_ronly_event {
+   PAGE_RONLY_SWAPIN,
+   PAGE_RONLY_WFAULT,
+   PAGE_RONLY_GUP,
+};
+
+/* struct page_ronly_ops - Page read only operations.
+ *
+ * @unprotect: Callback to unprotect a page (mandatory).
+ * @rmap_walk: Callback to walk reverse mapping of a page (mandatory).
+ *
+ * Kernel users that want to use the page write protection mechanism have to
+ * provide a number of callbacks.
+ */
+struct page_ronly_ops {
+   struct page *(*unprotect)(struct page *page,
+ unsigned long addr,
+ struct vm_area_struct *vma,
+ enum page_ronly_event event);
+   int (*rmap_walk)(struct page *page, struct rmap_walk_control *rwc);
+};
+
+/* struct page_ronly - Replace page->mapping when a page is write protected.
+ *
+ * @ops: Pointer to page read only operations.
+ *
+ * Pages that are write protected have their page->mapping field pointing to
+ * this wrapper structure. It must be allocated by the page read only user and
+ * must be freed (if needed) inside the unprotect() callback.
+ */
+struct page_ronly {
+   const struct page_ronly_ops *ops;
+};
+
+
+/* page_ronly() - Return the page_ronly struct if any, or NULL.
+ *
+ * @page: The page whose page->mapping field to interpret.
+ */
+static inline struct page_ronly *page_ronly(struct page *page)
+{
+   return PageReadOnly(page) ? page_rmapping(page) : NULL;
+}
+
+/* page_ronly_set() - Replace page->mapping with ptr to page_ronly struct.
+ *
+ * @page: The page for which to replace the page->mapping field.
+ * @ronly: The page_ronly structure to set.
+ *
+ * Page must be locked.
+ */
+static inline void page_ronly_set(struct page *page, struct page_ronly *ronly)
+{
+   VM_BUG_ON_PAGE(!PageLocked(page), page);
+
+   page->mapping = (void *)ronly + (PAGE_MAPPING_ANON|PAGE_MAPPING_RONLY);
+}
+
+/* page_ronly_unprotect() - Unprotect a read only protected page.
+ *
+ * @page: The page to unprotect.
+ * @addr: Fault address that trigger the unprotect.
+ * @vma: The vma of the fault address.
+ * @event: Event which triggered the unprotect.
+ *
+ * Page must be locked and must be a read only page.
+ */
+static inline struct page *page_ronly_unprotect(struct page *page,
+   unsigned long addr,
+   struct vm_area_struct *vma,
+   enum page_ronly_event event)
+{
+   struct page_ronly *pageronly;
+
+   VM_BUG_ON_PAGE(!PageLocked(page), page);
+   /*
+* Rely on the page lock to protect against concurrent modifications
+* to that page's node of the stable tree.
+*/
+   VM_BUG_ON_PAGE(!PageReadOnly(page), page);
+   pageronly = page_ronly(page);
+   if (pageronly)
+   return pageronly->ops->unprotect(page, addr, vma, event);
+   /* Safest fallback. */
+   return page;
+}
+
+/* page_ronly_rmap_walk() - Walk all CPU page table mapping of a 

[RFC PATCH 76/79] mm/ksm: have ksm select PAGE_RONLY config.

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Signed-off-by: Jérôme Glisse 
Cc: Andrea Arcangeli 
---
 mm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index aeffb6e8dd21..6994a1fdf847 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -308,6 +308,7 @@ config MMU_NOTIFIER
 config KSM
bool "Enable KSM for page merging"
depends on MMU
+   select PAGE_RONLY
help
  Enable Kernel Samepage Merging: KSM periodically scans those areas
  of an application's address space that an app has advised may be
-- 
2.14.3



[RFC PATCH 73/79] mm: pass down struct address_space to set_page_dirty()

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Pass down struct address_space to set_page_dirty() everywhere it is
already available.

<---------------------------------------------------------------------
@exists@
expression E;
identifier F, M;
@@
F(..., struct address_space * M, ...) {
...
-set_page_dirty(NULL, E)
+set_page_dirty(M, E)
...
}

@exists@
expression E;
identifier M;
@@
struct address_space * M;
...
-set_page_dirty(NULL, E)
+set_page_dirty(M, E)

@exists@
expression E;
identifier F, I;
@@
F(..., struct inode * I, ...) {
...
-set_page_dirty(NULL, E)
+set_page_dirty(I->i_mapping, E)
...
}

@exists@
expression E;
identifier I;
@@
struct inode * I;
...
-set_page_dirty(NULL, E)
+set_page_dirty(I->i_mapping, E)
--------------------------------------------------------------------->

Signed-off-by: Jérôme Glisse 
CC: Andrew Morton 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
---
 mm/filemap.c|  2 +-
 mm/khugepaged.c |  2 +-
 mm/memory.c |  2 +-
 mm/page-writeback.c |  4 ++--
 mm/page_io.c|  4 ++--
 mm/shmem.c  | 18 +-
 mm/truncate.c   |  2 +-
 7 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index c1ee7431bc4d..a15c29350a6a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2717,7 +2717,7 @@ int filemap_page_mkwrite(struct vm_fault *vmf)
 * progress, we are guaranteed that writeback during freezing will
 * see the dirty page and writeprotect it again.
 */
-   set_page_dirty(NULL, page);
+   set_page_dirty(inode->i_mapping, page);
wait_for_stable_page(page);
 out:
sb_end_pagefault(inode->i_sb);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ccd5da4e855f..b9a968172fb9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1513,7 +1513,7 @@ static void collapse_shmem(struct mm_struct *mm,
retract_page_tables(mapping, start);
 
/* Everything is ready, let's unfreeze the new_page */
-   set_page_dirty(NULL, new_page);
+   set_page_dirty(mapping, new_page);
SetPageUptodate(new_page);
page_ref_unfreeze(new_page, HPAGE_PMD_NR);
mem_cgroup_commit_charge(new_page, memcg, false, true);
diff --git a/mm/memory.c b/mm/memory.c
index 20443ebf9c42..fbd80bb7a50a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2400,7 +2400,7 @@ static void fault_dirty_shared_page(struct vm_area_struct 
*vma,
bool dirtied;
bool page_mkwrite = vma->vm_ops && vma->vm_ops->page_mkwrite;
 
-   dirtied = set_page_dirty(NULL, page);
+   dirtied = set_page_dirty(mapping, page);
VM_BUG_ON_PAGE(PageAnon(page), page);
/*
 * Take a local copy of the address_space - page.mapping may be zeroed
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index eaa6c23ba752..59dc9a12efc7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2599,7 +2599,7 @@ int set_page_dirty_lock(struct address_space *_mapping, 
struct page *page)
int ret;
 
lock_page(page);
-   ret = set_page_dirty(NULL, page);
+   ret = set_page_dirty(_mapping, page);
unlock_page(page);
return ret;
 }
@@ -2693,7 +2693,7 @@ int clear_page_dirty_for_io(struct page *page)
 * threads doing their things.
 */
if (page_mkclean(page))
-   set_page_dirty(NULL, page);
+   set_page_dirty(mapping, page);
/*
 * We carefully synchronise fault handlers against
 * installing a dirty pte and marking the page dirty
diff --git a/mm/page_io.c b/mm/page_io.c
index 5afc8b8a6b97..fd3133cd50d4 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -329,7 +329,7 @@ int __swap_writepage(struct address_space *mapping, struct 
page *page,
 * the normal direct-to-bio case as it could
 * be temporary.
 */
-   set_page_dirty(NULL, page);
+   set_page_dirty(mapping, page);
ClearPageReclaim(page);
pr_err_ratelimited("Write error on dio swapfile (%llu)\n",
   page_file_offset(page));
@@ -348,7 +348,7 @@ int __swap_writepage(struct address_space *mapping, struct 
page *page,
ret = 0;
bio = get_swap_bio(GFP_NOIO, page, end_write_func);
if (bio == NULL) {
-   set_page_dirty(NULL, page);
+   set_page_dirty(mapping, page);
unlock_page(page);
ret = -ENOMEM;
goto out;
diff --git a/mm/shmem.c b/mm/shmem.c
index cb09fea4a9ce..eae03f684869 100644
--- 

[RFC PATCH 78/79] mm/ksm: rename PAGE_MAPPING_KSM to PAGE_MAPPING_RONLY

2018-04-04 Thread jglisse
From: Jérôme Glisse 

This just renames all KSM specific helpers to generic page read only
names. No functional change.

Signed-off-by: Jérôme Glisse 
Cc: Andrea Arcangeli 
---
 fs/proc/page.c |  2 +-
 include/linux/page-flags.h | 30 +-
 mm/ksm.c   | 12 ++--
 mm/memory-failure.c|  2 +-
 mm/memory.c|  2 +-
 mm/migrate.c   |  6 +++---
 mm/mprotect.c  |  2 +-
 mm/page_idle.c |  2 +-
 mm/rmap.c  | 10 +-
 mm/swapfile.c  |  2 +-
 10 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 1491918a33c3..00cc037758ef 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -110,7 +110,7 @@ u64 stable_page_flags(struct page *page)
u |= 1 << KPF_MMAP;
if (PageAnon(page))
u |= 1 << KPF_ANON;
-   if (PageKsm(page))
+   if (PageReadOnly(page))
u |= 1 << KPF_KSM;
 
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 50c2b8786831..0338fb5dde8d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -374,12 +374,12 @@ PAGEFLAG(Idle, idle, PF_ANY)
  * page->mapping points to its anon_vma, not to a struct address_space;
  * with the PAGE_MAPPING_ANON bit set to distinguish it.  See rmap.h.
  *
- * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
+ * On an anonymous page in a VM_MERGEABLE area, if CONFIG_RONLY is enabled,
  * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
  * bit; and then page->mapping points, not to an anon_vma, but to a private
- * structure which KSM associates with that merged page.  See ksm.h.
+ * structure which RONLY associates with that merged page.  See page-ronly.h.
  *
- * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
+ * PAGE_MAPPING_RONLY without PAGE_MAPPING_ANON is used for non-lru movable
  * page and then page->mapping points a struct address_space.
  *
  * Please note that, confusingly, "page_mapping" refers to the inode
@@ -388,7 +388,7 @@ PAGEFLAG(Idle, idle, PF_ANY)
  */
 #define PAGE_MAPPING_ANON  0x1
 #define PAGE_MAPPING_MOVABLE   0x2
-#define PAGE_MAPPING_KSM   (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
+#define PAGE_MAPPING_RONLY (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
 #define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
 
 static __always_inline int PageMappingFlags(struct page *page)
@@ -408,21 +408,25 @@ static __always_inline int __PageMovable(struct page 
*page)
PAGE_MAPPING_MOVABLE;
 }
 
-#ifdef CONFIG_KSM
-/*
- * A KSM page is one of those write-protected "shared pages" or "merged pages"
- * which KSM maps into multiple mms, wherever identical anonymous page content
- * is found in VM_MERGEABLE vmas.  It's a PageAnon page, pointing not to any
- * anon_vma, but to that page's node of the stable tree.
+#ifdef CONFIG_PAGE_RONLY
+/* PageReadOnly() - Returns true if page is read only, false otherwise.
+ *
+ * @page: Page under test.
+ *
+ * A read only page is a write-protected one. Currently only KSM write
+ * protects pages, as "shared pages" or "merged pages" which KSM maps
+ * into multiple mms, wherever identical anonymous page content is found in
+ * VM_MERGEABLE vmas.  It's a PageAnon page, pointing not to any anon_vma,
+ * but to that page's node of the stable tree.
  */
-static __always_inline int PageKsm(struct page *page)
+static __always_inline int PageReadOnly(struct page *page)
 {
page = compound_head(page);
return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
-   PAGE_MAPPING_KSM;
+   PAGE_MAPPING_RONLY;
 }
 #else
-TESTPAGEFLAG_FALSE(Ksm)
+TESTPAGEFLAG_FALSE(ReadOnly)
 #endif
 
 u64 stable_page_flags(struct page *page);
diff --git a/mm/ksm.c b/mm/ksm.c
index f9bd1251c288..6085068fb8b3 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -318,13 +318,13 @@ static void __init ksm_slab_free(void)
 
 static inline struct stable_node *page_stable_node(struct page *page)
 {
-   return PageKsm(page) ? page_rmapping(page) : NULL;
+   return PageReadOnly(page) ? page_rmapping(page) : NULL;
 }
 
 static inline void set_page_stable_node(struct page *page,
struct stable_node *stable_node)
 {
-   page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_KSM);
+   page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_RONLY);
 }
 
 static __always_inline bool is_stable_node_chain(struct stable_node *chain)
@@ -470,7 +470,7 @@ static int break_ksm(struct vm_area_struct *vma, unsigned 
long addr)
FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
if (IS_ERR_OR_NULL(page))

[RFC PATCH 77/79] mm/ksm: hide set_page_stable_node() and page_stable_node()

2018-04-04 Thread jglisse
From: Jérôme Glisse 

Hide these 2 functions as a preparatory step for generalizing KSM
write protection to other users. Moreover, those two helpers cannot
be used meaningfully outside ksm.c, as the struct they deal with
is defined inside ksm.c.

Signed-off-by: Jérôme Glisse 
Cc: Andrea Arcangeli 
---
 include/linux/ksm.h | 12 
 mm/ksm.c| 11 +++
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 44368b19b27e..83c664080798 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -15,7 +15,6 @@
 #include 
 #include 
 
-struct stable_node;
 struct mem_cgroup;
 
 #ifdef CONFIG_KSM
@@ -37,17 +36,6 @@ static inline void ksm_exit(struct mm_struct *mm)
__ksm_exit(mm);
 }
 
-static inline struct stable_node *page_stable_node(struct page *page)
-{
-   return PageKsm(page) ? page_rmapping(page) : NULL;
-}
-
-static inline void set_page_stable_node(struct page *page,
-   struct stable_node *stable_node)
-{
-   page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_KSM);
-}
-
 /*
  * When do_swap_page() first faults in from swap what used to be a KSM page,
  * no problem, it will be assigned to this vma's anon_vma; but thereafter,
diff --git a/mm/ksm.c b/mm/ksm.c
index 1c16a4309c1d..f9bd1251c288 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -316,6 +316,17 @@ static void __init ksm_slab_free(void)
mm_slot_cache = NULL;
 }
 
+static inline struct stable_node *page_stable_node(struct page *page)
+{
+   return PageKsm(page) ? page_rmapping(page) : NULL;
+}
+
+static inline void set_page_stable_node(struct page *page,
+   struct stable_node *stable_node)
+{
+   page->mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_KSM);
+}
+
 static __always_inline bool is_stable_node_chain(struct stable_node *chain)
 {
return chain->rmap_hlist_len == STABLE_NODE_CHAIN;
-- 
2.14.3



[RFC PATCH 74/79] mm/page_ronly: add config option for generic read only page framework.

2018-04-04 Thread jglisse
From: Jérôme Glisse 

It's really just a config option patch.

Signed-off-by: Jérôme Glisse 
Cc: Andrea Arcangeli 
---
 mm/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..aeffb6e8dd21 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -149,6 +149,9 @@ config NO_BOOTMEM
 config MEMORY_ISOLATION
bool
 
+config PAGE_RONLY
+   bool
+
 #
 # Only be set on architectures that have completely implemented memory hotplug
 # feature. If you are not sure, don't touch it.
-- 
2.14.3



[PATCH] blk-mq: order getting budget and driver tag

2018-04-04 Thread Ming Lei
This patch orders getting the budget and the driver tag by making sure the
driver tag is acquired only after the budget has been obtained; this helps
to avoid the following race:

1) before dispatching a request from the scheduler queue, get one budget
first, then dequeue a request; call it request A.

2) in another IO path, for dispatching request B, which comes from
hctx->dispatch, a driver tag is acquired, then blk_mq_dispatch_rq_list()
tries to get the budget, but unfortunately the budget is held by request A.

3) meanwhile blk_mq_dispatch_rq_list() is called for dispatching request
A, and tries to get a driver tag first, but unfortunately no driver tag is
available because the driver tag is held by request B.

4) neither IO path can make progress, and an IO stall results.

This issue can be observed when running dbench on USB storage.

This patch fixes the issue by always getting the budget before getting the
driver tag.

Cc: sta...@vger.kernel.org
Fixes: de1482974080ec9e ("blk-mq: introduce .get_budget and .put_budget in 
blk_mq_ops")
Cc: Christoph Hellwig 
Cc: Bart Van Assche 
Cc: Omar Sandoval 
Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16e83e6df404..90838e998f66 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1188,7 +1188,12 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list,
struct blk_mq_queue_data bd;
 
rq = list_first_entry(list, struct request, queuelist);
-   if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
+
+   hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
+   if (!got_budget && !blk_mq_get_dispatch_budget(hctx))
+   break;
+
+   if (!blk_mq_get_driver_tag(rq, NULL, false)) {
/*
 * The initial allocation attempt failed, so we need to
 * rerun the hardware queue when a tag is freed. The
@@ -1197,8 +1202,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list,
 * we'll re-run it below.
 */
if (!blk_mq_mark_tag_wait(&hctx, rq)) {
-   if (got_budget)
-   blk_mq_put_dispatch_budget(hctx);
+   blk_mq_put_dispatch_budget(hctx);
/*
 * For non-shared tags, the RESTART check
 * will suffice.
@@ -1209,11 +1213,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list,
}
}
 
-   if (!got_budget && !blk_mq_get_dispatch_budget(hctx)) {
-   blk_mq_put_driver_tag(rq);
-   break;
-   }
-
list_del_init(&rq->queuelist);
 
bd.rq = rq;
@@ -1812,11 +1811,11 @@ static blk_status_t __blk_mq_try_issue_directly(struct 
blk_mq_hw_ctx *hctx,
if (q->elevator && !bypass_insert)
goto insert;
 
-   if (!blk_mq_get_driver_tag(rq, NULL, false))
+   if (!blk_mq_get_dispatch_budget(hctx))
goto insert;
 
-   if (!blk_mq_get_dispatch_budget(hctx)) {
-   blk_mq_put_driver_tag(rq);
+   if (!blk_mq_get_driver_tag(rq, NULL, false)) {
+   blk_mq_put_dispatch_budget(hctx);
goto insert;
}
 
-- 
2.9.5



Re: [PATCH V3 4/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-04-04 Thread Ming Lei
On Wed, Apr 04, 2018 at 02:45:18PM +0200, Thomas Gleixner wrote:
> On Wed, 4 Apr 2018, Thomas Gleixner wrote:
> > I'm aware how that hw-queue stuff works. But that only works if the
> > spreading algorithm makes the interrupts affine to offline/not-present CPUs
> > when the block device is initialized.
> > 
> > In the example above:
> > 
> > > > >   irq 39, cpu list 0,4
> > > > >   irq 40, cpu list 1,6
> > > > >   irq 41, cpu list 2,5
> > > > >   irq 42, cpu list 3,7
> > 
> > and assumed that at driver init time only CPU 0-3 are online then the
> > hotplug of CPU 4-7 will not result in any interrupt delivered to CPU 4-7.
> > 
> > So the extra assignment to CPU 4-7 in the affinity mask has no effect
> > whatsoever and even if the spreading result is 'perfect' it just looks
> > perfect as it is not making any difference versus the original result:
> > 
> > > > >   irq 39, cpu list 0
> > > > >   irq 40, cpu list 1
> > > > >   irq 41, cpu list 2
> > > > >   irq 42, cpu list 3
> 
> And looking deeper into the changes, I think that the first spreading step
> has to use cpu_present_mask and not cpu_online_mask.
> 
> Assume the following scenario:
> 
> Machine with 8 present CPUs is booted, the 4 last CPUs are
> unplugged. Device with 4 queues is initialized.
> 
> The resulting spread is going to be exactly your example:
> 
>   irq 39, cpu list 0,4
>   irq 40, cpu list 1,6
>   irq 41, cpu list 2,5
>   irq 42, cpu list 3,7
> 
> Now the 4 offline CPUs are plugged in again. These CPUs won't ever get an
> interrupt as all interrupts stay on CPU 0-3 unless one of these CPUs is
> unplugged. Using cpu_present_mask the spread would be:
> 
>   irq 39, cpu list 0,1
>   irq 40, cpu list 2,3
>   irq 41, cpu list 4,5
>   irq 42, cpu list 6,7

Given that physical CPU hotplug isn't common, this way will make only irq 39
and irq 40 active most of the time, so a performance regression is caused,
just as Kashyap reported.

> 
> while on a machine where CPU 4-7 are NOT present, but advertised as
> possible the spread would be:
> 
>   irq 39, cpu list 0,4
>   irq 40, cpu list 1,6
>   irq 41, cpu list 2,5
>   irq 42, cpu list 3,7

I think this way is still better, since the performance regression can be
avoided, and there is at least one CPU covering each irq vector, which
in reality is often enough.

As I mentioned in another email, I still don't understand why interrupts
can't be delivered to CPU 4~7 after these CPUs become present & online.
Seems in theory, interrupts should be delivered to these CPUs since
affinity info has been programmed to interrupt controller already. 

Or do we still need CPU hotplug handler for device driver to tell device
the CPU hotplug change for delivering interrupts to new added CPUs?


Thanks,
Ming


Re: [PATCH V3 4/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-04-04 Thread Ming Lei
On Wed, Apr 04, 2018 at 10:25:16AM +0200, Thomas Gleixner wrote:
> On Wed, 4 Apr 2018, Ming Lei wrote:
> > On Tue, Apr 03, 2018 at 03:32:21PM +0200, Thomas Gleixner wrote:
> > > On Thu, 8 Mar 2018, Ming Lei wrote:
> > > > 1) before 84676c1f21 ("genirq/affinity: assign vectors to all possible 
> > > > CPUs")
> > > > irq 39, cpu list 0
> > > > irq 40, cpu list 1
> > > > irq 41, cpu list 2
> > > > irq 42, cpu list 3
> > > > 
> > > > 2) after 84676c1f21 ("genirq/affinity: assign vectors to all possible 
> > > > CPUs")
> > > > irq 39, cpu list 0-2
> > > > irq 40, cpu list 3-4,6
> > > > irq 41, cpu list 5
> > > > irq 42, cpu list 7
> > > > 
> > > > 3) after applying this patch against V4.15+:
> > > > irq 39, cpu list 0,4
> > > > irq 40, cpu list 1,6
> > > > irq 41, cpu list 2,5
> > > > irq 42, cpu list 3,7
> > > 
> > > That's more or less window dressing. If the device is already in use when
> > > the offline CPUs get hot plugged, then the interrupts still stay on cpu 
> > > 0-3
> > > because the effective affinity of interrupts on X86 (and other
> > > architectures) is always a single CPU.
> > > 
> > > So this only might move interrupts to the hotplugged CPUs when the device
> > > is initialized after CPU hotplug and the actual vector allocation moves an
> > > interrupt out to the higher numbered CPUs if they have less vectors
> > > allocated than the lower numbered ones.
> > 
> > It works for blk-mq devices, such as NVMe.
> > 
> > Now NVMe driver creates num_possible_cpus() hw queues, and each
> > hw queue is assigned one msix irq vector.
> > 
> > Storage is Client/Server model, that means the interrupt is only
> > delivered to CPU after one IO request is submitted to hw queue and
> > it is completed by this hw queue.
> > 
> > When CPUs is hotplugged, and there will be IO submitted from these
> > CPUs, then finally IOs complete and irq events are generated from
> > hw queues, and notify these submission CPU by IRQ finally.
> 
> I'm aware how that hw-queue stuff works. But that only works if the
> spreading algorithm makes the interrupts affine to offline/not-present CPUs
> when the block device is initialized.
> 
> In the example above:
> 
> > > > irq 39, cpu list 0,4
> > > > irq 40, cpu list 1,6
> > > > irq 41, cpu list 2,5
> > > > irq 42, cpu list 3,7
> 
> and assumed that at driver init time only CPU 0-3 are online then the
> hotplug of CPU 4-7 will not result in any interrupt delivered to CPU 4-7.

Indeed, and I just tested this case, and found that no interrupts are
delivered to CPU 4-7.

In theory, the affinity has been assigned to these irq vectors and
programmed into the interrupt controller, so I understand it should work.

Could you explain it a bit why interrupts aren't delivered to CPU 4-7?


Thanks,
Ming


Re: BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

2018-04-04 Thread Sagi Grimberg



On 03/30/2018 12:32 PM, Yi Zhang wrote:

Hello
I got this kernel BUG on 4.16.0-rc7. Here are the reproducer and log; let me
know if you need more info, thanks.

Reproducer:
1. setup target
#nvmetcli restore /etc/rdma.json
2. connect target on host
#nvme connect-all -t rdma -a $IP -s 4420
3. do fio background on host
#fio -filename=/dev/nvme0n1 -iodepth=1 -thread -rw=randwrite -ioengine=psync 
-bssplit=5k/10:9k/10:13k/10:17k/10:21k/10:25k/10:29k/10:33k/10:37k/10:41k/10 
-bs_unaligned -runtime=180 -size=-group_reporting -name=mytest -numjobs=60 &
4. offline cpu on host
#echo 0 > /sys/devices/system/cpu/cpu1/online
#echo 0 > /sys/devices/system/cpu/cpu2/online
#echo 0 > /sys/devices/system/cpu/cpu3/online
5. clear target
#nvmetcli clear
6. restore target
#nvmetcli restore /etc/rdma.json
7. check console log on host


Hi Yi,

Does this happen with this applied?
--
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
index 996167f1de18..b89da55e8aaa 100644
--- a/block/blk-mq-rdma.c
+++ b/block/blk-mq-rdma.c
@@ -35,6 +35,8 @@ int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
const struct cpumask *mask;
unsigned int queue, cpu;

+   goto fallback;
+
for (queue = 0; queue < set->nr_hw_queues; queue++) {
mask = ib_get_vector_affinity(dev, first_vec + queue);
if (!mask)
--


Re: [PATCH] [v2] rbd: avoid Wreturn-type warnings

2018-04-04 Thread Ilya Dryomov
On Wed, Apr 4, 2018 at 2:53 PM, Arnd Bergmann  wrote:
> A new set of warnings appeared in next-20180403 in some configurations
> when gcc cannot see that rbd_assert(0) leads to an unreachable code
> path:
>
> drivers/block/rbd.c: In function 'rbd_img_is_write':
> drivers/block/rbd.c:1397:1: error: control reaches end of non-void function 
> [-Werror=return-type]
> drivers/block/rbd.c: In function '__rbd_obj_handle_request':
> drivers/block/rbd.c:2499:1: error: control reaches end of non-void function 
> [-Werror=return-type]
> drivers/block/rbd.c: In function 'rbd_obj_handle_write':
> drivers/block/rbd.c:2471:1: error: control reaches end of non-void function 
> [-Werror=return-type]
>
> As the rbd_assert() here carries no extra information beyond the verbose
> BUG(), we can simply use BUG() directly in its place.  This is reliably
> detected as not returning on any architecture, since it doesn't depend
> on the unlikely() comparison that confused gcc.
>
> Fixes: 3da691bf4366 ("rbd: new request handling code")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/block/rbd.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 07dc5419bd63..5f7f4d4b78a8 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1392,7 +1392,7 @@ static bool rbd_img_is_write(struct rbd_img_request 
> *img_req)
> case OBJ_OP_DISCARD:
> return true;
> default:
> -   rbd_assert(0);
> +   BUG();
> }
>  }
>
> @@ -2466,7 +2466,7 @@ static bool rbd_obj_handle_write(struct rbd_obj_request 
> *obj_req)
> }
> return false;
> default:
> -   rbd_assert(0);
> +   BUG();
> }
>  }
>
> @@ -2494,7 +2494,7 @@ static bool __rbd_obj_handle_request(struct 
> rbd_obj_request *obj_req)
> }
> return false;
> default:
> -   rbd_assert(0);
> +   BUG();
> }
>  }
>

Applied.

Thanks,

Ilya


Re: [PATCH] rbd: add missing return statements

2018-04-04 Thread Arnd Bergmann
On Wed, Apr 4, 2018 at 1:04 PM, Ilya Dryomov  wrote:
> On Wed, Apr 4, 2018 at 11:49 AM, Arnd Bergmann  wrote:
>> A new set of warnings appeared in next-20180403 in some configurations
>> when gcc cannot see that rbd_assert(0) leads to an unreachable code
>> path:
>>
>> drivers/block/rbd.c: In function 'rbd_img_is_write':
>> drivers/block/rbd.c:1397:1: error: control reaches end of non-void function 
>> [-Werror=return-type]
>> drivers/block/rbd.c: In function '__rbd_obj_handle_request':
>> drivers/block/rbd.c:2499:1: error: control reaches end of non-void function 
>> [-Werror=return-type]
>> drivers/block/rbd.c: In function 'rbd_obj_handle_write':
>> drivers/block/rbd.c:2471:1: error: control reaches end of non-void function 
>> [-Werror=return-type]
>>
>> To work around this, we can add a return statement to each of these
>> cases. An alternative would be to remove the unlikely() annotation
>> in rbd_assert(), or to just use BUG()/BUG_ON() directly. This adds the
>> return statements, guessing what the most reasonable behavior
>> would be.
>
> Hi Arnd,
>
> I don't like these bogus return statements.  Let's go with explicit
> BUG/BUG_ON() instead.

Sounds good. Sent a v2 now.

   Arnd


[PATCH] [v2] rbd: avoid Wreturn-type warnings

2018-04-04 Thread Arnd Bergmann
A new set of warnings appeared in next-20180403 in some configurations
when gcc cannot see that rbd_assert(0) leads to an unreachable code
path:

drivers/block/rbd.c: In function 'rbd_img_is_write':
drivers/block/rbd.c:1397:1: error: control reaches end of non-void function 
[-Werror=return-type]
drivers/block/rbd.c: In function '__rbd_obj_handle_request':
drivers/block/rbd.c:2499:1: error: control reaches end of non-void function 
[-Werror=return-type]
drivers/block/rbd.c: In function 'rbd_obj_handle_write':
drivers/block/rbd.c:2471:1: error: control reaches end of non-void function 
[-Werror=return-type]

As the rbd_assert() here carries no extra information beyond the verbose
BUG(), we can simply use BUG() directly in its place.  This is reliably
detected as not returning on any architecture, since it doesn't depend
on the unlikely() comparison that confused gcc.

Fixes: 3da691bf4366 ("rbd: new request handling code")
Signed-off-by: Arnd Bergmann 
---
 drivers/block/rbd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 07dc5419bd63..5f7f4d4b78a8 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1392,7 +1392,7 @@ static bool rbd_img_is_write(struct rbd_img_request 
*img_req)
case OBJ_OP_DISCARD:
return true;
default:
-   rbd_assert(0);
+   BUG();
}
 }
 
@@ -2466,7 +2466,7 @@ static bool rbd_obj_handle_write(struct rbd_obj_request 
*obj_req)
}
return false;
default:
-   rbd_assert(0);
+   BUG();
}
 }
 
@@ -2494,7 +2494,7 @@ static bool __rbd_obj_handle_request(struct 
rbd_obj_request *obj_req)
}
return false;
default:
-   rbd_assert(0);
+   BUG();
}
 }
 
-- 
2.9.0



Re: [PATCH V3 4/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-04-04 Thread Thomas Gleixner
On Wed, 4 Apr 2018, Thomas Gleixner wrote:
> I'm aware how that hw-queue stuff works. But that only works if the
> spreading algorithm makes the interrupts affine to offline/not-present CPUs
> when the block device is initialized.
> 
> In the example above:
> 
> > > > irq 39, cpu list 0,4
> > > > irq 40, cpu list 1,6
> > > > irq 41, cpu list 2,5
> > > > irq 42, cpu list 3,7
> 
> and assumed that at driver init time only CPU 0-3 are online then the
> hotplug of CPU 4-7 will not result in any interrupt delivered to CPU 4-7.
> 
> So the extra assignment to CPU 4-7 in the affinity mask has no effect
> whatsoever and even if the spreading result is 'perfect' it just looks
> perfect as it is not making any difference versus the original result:
> 
> > > >   irq 39, cpu list 0
> > > >   irq 40, cpu list 1
> > > >   irq 41, cpu list 2
> > > >   irq 42, cpu list 3

And looking deeper into the changes, I think that the first spreading step
has to use cpu_present_mask and not cpu_online_mask.

Assume the following scenario:

A machine with 8 present CPUs is booted, and the last 4 CPUs are
unplugged. A device with 4 queues is then initialized.

The resulting spread is going to be exactly your example:

irq 39, cpu list 0,4
irq 40, cpu list 1,6
irq 41, cpu list 2,5
irq 42, cpu list 3,7

Now the 4 offline CPUs are plugged in again. These CPUs won't ever get an
interrupt as all interrupts stay on CPU 0-3 unless one of these CPUs is
unplugged. Using cpu_present_mask the spread would be:

irq 39, cpu list 0,1
irq 40, cpu list 2,3
irq 41, cpu list 4,5
irq 42, cpu list 6,7

while on a machine where CPU 4-7 are NOT present, but advertised as
possible, the spread would be:

irq 39, cpu list 0,4
irq 40, cpu list 1,6
irq 41, cpu list 2,5
irq 42, cpu list 3,7

Hmm?

Thanks,

tglx





Re: [PATCH] rbd: add missing return statements

2018-04-04 Thread Ilya Dryomov
On Wed, Apr 4, 2018 at 11:49 AM, Arnd Bergmann  wrote:
> A new set of warnings appeared in next-20180403 in some configurations
> when gcc cannot see that rbd_assert(0) leads to an unreachable code
> path:
>
> drivers/block/rbd.c: In function 'rbd_img_is_write':
> drivers/block/rbd.c:1397:1: error: control reaches end of non-void function 
> [-Werror=return-type]
> drivers/block/rbd.c: In function '__rbd_obj_handle_request':
> drivers/block/rbd.c:2499:1: error: control reaches end of non-void function 
> [-Werror=return-type]
> drivers/block/rbd.c: In function 'rbd_obj_handle_write':
> drivers/block/rbd.c:2471:1: error: control reaches end of non-void function 
> [-Werror=return-type]
>
> To work around this, we can add a return statement to each of these
> cases. An alternative would be to remove the unlikely() annotation
> in rbd_assert(), or to just use BUG()/BUG_ON() directly. This adds the
> return statements, guessing what the most reasonable behavior
> would be.

Hi Arnd,

I don't like these bogus return statements.  Let's go with explicit
BUG/BUG_ON() instead.

Thanks,

Ilya


[PATCH] rbd: add missing return statements

2018-04-04 Thread Arnd Bergmann
A new set of warnings appeared in next-20180403 in some configurations
when gcc cannot see that rbd_assert(0) leads to an unreachable code
path:

drivers/block/rbd.c: In function 'rbd_img_is_write':
drivers/block/rbd.c:1397:1: error: control reaches end of non-void function 
[-Werror=return-type]
drivers/block/rbd.c: In function '__rbd_obj_handle_request':
drivers/block/rbd.c:2499:1: error: control reaches end of non-void function 
[-Werror=return-type]
drivers/block/rbd.c: In function 'rbd_obj_handle_write':
drivers/block/rbd.c:2471:1: error: control reaches end of non-void function 
[-Werror=return-type]

To work around this, we can add a return statement to each of these
cases. An alternative would be to remove the unlikely() annotation
in rbd_assert(), or to just use BUG()/BUG_ON() directly. This adds the
return statements, guessing what the most reasonable behavior
would be.

Fixes: 3da691bf4366 ("rbd: new request handling code")
Signed-off-by: Arnd Bergmann 
---
 drivers/block/rbd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 07dc5419bd63..9445a71a9cd6 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1394,6 +1394,7 @@ static bool rbd_img_is_write(struct rbd_img_request 
*img_req)
default:
rbd_assert(0);
}
+   return false;
 }
 
 static void rbd_obj_handle_request(struct rbd_obj_request *obj_req);
@@ -2468,6 +2469,7 @@ static bool rbd_obj_handle_write(struct rbd_obj_request 
*obj_req)
default:
rbd_assert(0);
}
+   return true;
 }
 
 /*
@@ -2496,6 +2498,7 @@ static bool __rbd_obj_handle_request(struct 
rbd_obj_request *obj_req)
default:
rbd_assert(0);
}
+   return true;
 }
 
 static void rbd_obj_end_request(struct rbd_obj_request *obj_req)
-- 
2.9.0



Re: [PATCH V3 4/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-04-04 Thread Thomas Gleixner
On Wed, 4 Apr 2018, Ming Lei wrote:
> On Tue, Apr 03, 2018 at 03:32:21PM +0200, Thomas Gleixner wrote:
> > On Thu, 8 Mar 2018, Ming Lei wrote:
> > > 1) before 84676c1f21 ("genirq/affinity: assign vectors to all possible 
> > > CPUs")
> > >   irq 39, cpu list 0
> > >   irq 40, cpu list 1
> > >   irq 41, cpu list 2
> > >   irq 42, cpu list 3
> > > 
> > > 2) after 84676c1f21 ("genirq/affinity: assign vectors to all possible 
> > > CPUs")
> > >   irq 39, cpu list 0-2
> > >   irq 40, cpu list 3-4,6
> > >   irq 41, cpu list 5
> > >   irq 42, cpu list 7
> > > 
> > > 3) after applying this patch against V4.15+:
> > >   irq 39, cpu list 0,4
> > >   irq 40, cpu list 1,6
> > >   irq 41, cpu list 2,5
> > >   irq 42, cpu list 3,7
> > 
> > That's more or less window dressing. If the device is already in use when
> > the offline CPUs get hot plugged, then the interrupts still stay on cpu 0-3
> > because the effective affinity of interrupts on X86 (and other
> > architectures) is always a single CPU.
> > 
> > So this only might move interrupts to the hotplugged CPUs when the device
> > is initialized after CPU hotplug and the actual vector allocation moves an
> > interrupt out to the higher numbered CPUs if they have fewer vectors
> > allocated than the lower numbered ones.
> 
> It works for blk-mq devices, such as NVMe.
> 
> Now the NVMe driver creates num_possible_cpus() hw queues, and each
> hw queue is assigned one MSI-X irq vector.
> 
> Storage follows a client/server model: an interrupt is only delivered
> to a CPU after an IO request has been submitted to a hw queue and
> completed by that hw queue.
> 
> When CPUs are hotplugged and IO is submitted from them, the IOs
> eventually complete, the hw queues raise irq events, and the
> submitting CPUs are notified by IRQ.

I'm aware how that hw-queue stuff works. But that only works if the
spreading algorithm makes the interrupts affine to offline/not-present CPUs
when the block device is initialized.

In the example above:

> > >   irq 39, cpu list 0,4
> > >   irq 40, cpu list 1,6
> > >   irq 41, cpu list 2,5
> > >   irq 42, cpu list 3,7

and assumed that at driver init time only CPU 0-3 are online then the
hotplug of CPU 4-7 will not result in any interrupt delivered to CPU 4-7.

So the extra assignment to CPU 4-7 in the affinity mask has no effect
whatsoever and even if the spreading result is 'perfect' it just looks
perfect as it is not making any difference versus the original result:

> > >   irq 39, cpu list 0
> > >   irq 40, cpu list 1
> > >   irq 41, cpu list 2
> > >   irq 42, cpu list 3
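The single-CPU routing described above can be inspected on a live system via the effective-affinity files under /proc/irq. A sketch (the IRQ numbers are the ones from the example and are assumptions for any given machine):

```shell
# Print configured vs effective affinity for a set of IRQs. On x86 the
# effective list is a single CPU even when the configured mask has several.
show_irq_affinity() {
	dir=$1 irq=$2
	aff=$(cat "$dir/$irq/smp_affinity_list" 2>/dev/null) || return 1
	eff=$(cat "$dir/$irq/effective_affinity_list" 2>/dev/null)
	printf 'irq %s: affinity=%s effective=%s\n' "$irq" "$aff" "$eff"
}

for irq in 39 40 41 42; do
	show_irq_affinity /proc/irq "$irq" || true
done
```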

Thanks,

tglx




Re: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()

2018-04-04 Thread Christian Borntraeger


On 03/30/2018 04:53 AM, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 01:49:29PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/29/2018 01:43 PM, Ming Lei wrote:
>>> On Thu, Mar 29, 2018 at 12:49:55PM +0200, Christian Borntraeger wrote:


 On 03/29/2018 12:48 PM, Ming Lei wrote:
> On Thu, Mar 29, 2018 at 12:10:11PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 03/29/2018 11:40 AM, Ming Lei wrote:
>>> On Thu, Mar 29, 2018 at 11:09:08AM +0200, Christian Borntraeger wrote:


 On 03/29/2018 09:23 AM, Christian Borntraeger wrote:
>
>
> On 03/29/2018 04:00 AM, Ming Lei wrote:
>> On Wed, Mar 28, 2018 at 05:36:53PM +0200, Christian Borntraeger 
>> wrote:
>>>
>>>
>>> On 03/28/2018 05:26 PM, Ming Lei wrote:
 Hi Christian,

 On Wed, Mar 28, 2018 at 09:45:10AM +0200, Christian Borntraeger 
 wrote:
> FWIW, this patch does not fix the issue for me:
>
> ostname=? addr=? terminal=? res=success'
> [   21.454961] WARNING: CPU: 3 PID: 1882 at block/blk-mq.c:1410 
> __blk_mq_delay_run_hw_queue+0xbe/0xd8
> [   21.454968] Modules linked in: scsi_dh_rdac scsi_dh_emc 
> scsi_dh_alua dm_mirror dm_region_hash dm_log dm_multipath dm_mod 
> autofs4
> [   21.454984] CPU: 3 PID: 1882 Comm: dasdconf.sh Not tainted 
> 4.16.0-rc7+ #26
> [   21.454987] Hardware name: IBM 2964 NC9 704 (LPAR)
> [   21.454990] Krnl PSW : c0131ea3 3ea2f7bf 
> (__blk_mq_delay_run_hw_queue+0xbe/0xd8)
> [   21.454996]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 
> AS:3 CC:0 PM:0 RI:0 EA:3
> [   21.455005] Krnl GPRS: 013abb69a000 013a 
> 013ac6c0dc00 0001
> [   21.455008] 013abb69a710 
> 013a 0001b691fd98
> [   21.455011]0001b691fd98 013ace4775c8 
> 0001 
> [   21.455014]013ac6c0dc00 00b47238 
> 0001b691fc08 0001b691fbd0
> [   21.455032] Krnl Code: 0069c596: ebaff0a4  lmg 
> %r10,%r15,160(%r15)
>   0069c59c: c0f47a5e  brcl
> 15,68ba58
>  #0069c5a2: a7f40001  
> brc 15,69c5a4
>  >0069c5a6: e340f0c4  lg  
> %r4,192(%r15)
>   0069c5ac: ebaff0a4  lmg 
> %r10,%r15,160(%r15)
>   0069c5b2: 07f4  bcr 
> 15,%r4
>   0069c5b4: c0e5feea  brasl   
> %r14,69c388
>   0069c5ba: a7f4fff6  
> brc 15,69c5a6
> [   21.455067] Call Trace:
> [   21.455072] ([<0001b691fd98>] 0x1b691fd98)
> [   21.455079]  [<0069c692>] 
> blk_mq_run_hw_queue+0xba/0x100 
> [   21.455083]  [<0069c740>] 
> blk_mq_run_hw_queues+0x68/0x88 
> [   21.455089]  [<0069b956>] 
> __blk_mq_complete_request+0x11e/0x1d8 
> [   21.455091]  [<0069ba9c>] 
> blk_mq_complete_request+0x8c/0xc8 
> [   21.455103]  [<008aa250>] 
> dasd_block_tasklet+0x158/0x490 
> [   21.455110]  [<0014c742>] tasklet_hi_action+0x92/0x120 
> [   21.455118]  [<00a7cfc0>] __do_softirq+0x120/0x348 
> [   21.455122]  [<0014c212>] irq_exit+0xba/0xd0 
> [   21.455130]  [<0010bf92>] do_IRQ+0x8a/0xb8 
> [   21.455133]  [<00a7c298>] io_int_handler+0x130/0x298 
> [   21.455136] Last Breaking-Event-Address:
> [   21.455138]  [<0069c5a2>] 
> __blk_mq_delay_run_hw_queue+0xba/0xd8
> [   21.455140] ---[ end trace be43f99a5d1e553e ]---
> [   21.510046] dasdconf.sh Warning: 0.0.241e is already online, 
> not configuring

 Thinking about this issue further, I can't understand its root cause.

 FWIW, limiting CONFIG_NR_CPUS to 64 seems to make the problem go away.
>>>
>>> I think the following patch is needed, and this way aligns to the 
>>> mapping
>>> created via managed IRQ at least.
>>>
>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>> index