Re: [Devel] [PATCH rh7] fs: add __GFP_NORETRY in alloc_fdmem

2017-03-16 Thread Andrey Ryabinin


On 03/16/2017 06:03 PM, Konstantin Khorenko wrote:
> Andrey, please take a look.
> 
> All other patches from Anatoly are applied already, except this one.
> Is it worth applying this one as well?
> 

Yep,
Acked-by: Andrey Ryabinin 


> -- 
> Best regards,
> 
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
> 
> On 10/21/2016 02:42 PM, Anatoly Stepanov wrote:
>> This is a backport of upstream (vanilla) commit:
>> commit 96c7a2ff21501691587e1ae969b83cbec8b78e08
>>
>> Under certain conditions there might be a lot of
>> alloc_fdmem() invocations with order <= PAGE_ALLOC_COSTLY_ORDER.
>>
>> For example: httpd which is doing a lot of fork() calls.
>>
>> Real-life examples from our customers:
>>
>> [532506.773243] httpd   D 8803f5fecc20 0 939874   6606
>> [532506.773257] Call Trace:
>> [532506.773261]  [] schedule+0x29/0x70
>> [532506.773264]  [] schedule_timeout+0x175/0x2d0
>> [532506.773272]  [] ? internal_add_timer+0x70/0x70
>> [532506.773276]  [] io_schedule_timeout+0xae/0x130
>> [532506.773280]  [] wait_iff_congested+0x135/0x150
>> [532506.773284]  [] ? wake_up_atomic_t+0x30/0x30
>> [532506.773288]  [] shrink_inactive_list+0x65f/0x6c0
>> [532506.773292]  [] shrink_lruvec+0x395/0x800
>> [532506.773296]  [] shrink_zone+0xef/0x2d0
>> [532506.773300]  [] do_try_to_free_pages+0x170/0x530
>> [532506.773310]  [] try_to_free_pages+0xd5/0x160
>> [532506.773315]  [] __alloc_pages_nodemask+0x8ab/0xc10
>> [532506.773320]  [] alloc_pages_current+0xa9/0x170
>> [532506.773324]  [] kmalloc_order+0x18/0x50
>> [532506.773327]  [] kmalloc_order_trace+0x26/0xa0
>> [532506.773332]  [] __kmalloc+0x259/0x270
>> [532506.773337]  [] alloc_fdmem+0x20/0x50
>> [532506.773341]  [] alloc_fdtable+0x6c/0xe0
>> [532506.773344]  [] dup_fd+0x1f9/0x2d0
>> [532506.773354]  [] copy_process.part.30+0x87f/0x1510
>> [532506.773358]  [] do_fork+0xe1/0x320
>> [532506.773370]  [] SyS_clone+0x16/0x20
>> [532506.773376]  [] stub_clone+0x69/0x90
>> [532506.773380]  [] ? system_call_fastpath+0x16/0x1b
>>
>> [513890.005271] httpd   D 880425db7230 0 811718   6606
>> [513890.005279] Call Trace:
>> [513890.005282]  [] schedule+0x29/0x70
>> [513890.005284]  [] schedule_timeout+0x239/0x2d0
>> [513890.005292]  [] io_schedule_timeout+0xae/0x130
>> [513890.005296]  [] io_schedule+0x18/0x20
>> [513890.005298]  [] get_request+0x218/0x780
>> [513890.005303]  [] blk_queue_bio+0xc6/0x3a0
>> [513890.005309]  [] ? dm_make_request+0x119/0x170 [dm_mod]
>> [513890.005311]  [] generic_make_request+0xe2/0x130
>> [513890.005313]  [] submit_bio+0x77/0x1c0
>> [513890.005318]  [] __swap_writepage+0x1be/0x260
>> [513890.005337]  [] swap_writepage+0x39/0x80
>> [513890.005340]  [] shrink_page_list+0x4ad/0xa80
>> [513890.005343]  [] shrink_inactive_list+0x1fb/0x6c0
>> [513890.005345]  [] shrink_lruvec+0x395/0x800
>> [513890.005348]  [] shrink_zone+0xef/0x2d0
>> [513890.005350]  [] do_try_to_free_pages+0x170/0x530
>> [513890.005353]  [] try_to_free_pages+0xd5/0x160
>> [513890.005355]  [] __alloc_pages_nodemask+0x8ab/0xc10
>> [513890.005358]  [] alloc_pages_current+0xa9/0x170
>> [513890.005360]  [] kmalloc_order+0x18/0x50
>> [513890.005362]  [] kmalloc_order_trace+0x26/0xa0
>> [513890.005365]  [] __kmalloc+0x259/0x270
>> [513890.005367]  [] alloc_fdmem+0x20/0x50
>> [513890.005369]  [] alloc_fdtable+0x6c/0xe0
>> [513890.005371]  [] dup_fd+0x1f9/0x2d0
>> [513890.005376]  [] copy_process.part.30+0x87f/0x1510
>> [513890.005378]  [] do_fork+0xe1/0x320
>> [513890.005380]  [] SyS_clone+0x16/0x20
>> [513890.005382]  [] stub_clone+0x69/0x90
>>
>> We observed that sometimes kswapd cannot keep up with this, which
>> causes many direct reclaim attempts, which in turn:
>>
>> 1. Increase iowait time due to congestion_wait
>> 2. Increase the number of block requests per second due to
>> page swapping and writeback
>> 3. May induce OOMs
>>
>> So it is better not to try so hard to allocate a contiguous
>> area, and to fall back to vmalloc() as soon as possible.
>>
>> Signed-off-by: Anatoly Stepanov 
>> ---
>>  fs/file.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/file.c b/fs/file.c
>> index 366d9bb..3f65ba0 100644
>> --- a/fs/file.c
>> +++ b/fs/file.c
>> @@ -36,7 +36,7 @@ static void *alloc_fdmem(size_t size)
>>   * vmalloc() if the allocation size will be considered "large" by the VM.
>>   */
>>  if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>> -void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
>> +void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
>>  if (data != NULL)
>>  return data;
>>  }
>>
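For illustration, the allocation policy touched by this patch can be modelled in userspace C. This is a minimal sketch, not kernel code: fake_kmalloc_noretry(), fake_vmalloc() and sketch_alloc_fdmem() are hypothetical names, malloc() stands in for both kernel allocators, and 4 KiB pages with PAGE_ALLOC_COSTLY_ORDER = 3 are assumed, which gives a 32 KiB threshold. It shows the intended effect of __GFP_NORETRY: under fragmentation a small contiguous request gives up early instead of retrying reclaim, and the caller goes straight to the vmalloc() path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed values for illustration: 4 KiB pages and an order-3 "costly" limit. */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_COSTLY_ORDER 3

/* Hypothetical knobs: simulate fragmentation and count fallbacks. */
static bool contiguous_fails;
static int  used_fallback;

/* Stand-in for kmalloc(..., __GFP_NORETRY): fails immediately when the
 * simulated system is too fragmented, rather than looping in reclaim. */
static void *fake_kmalloc_noretry(size_t size)
{
	return contiguous_fails ? NULL : malloc(size);
}

/* Stand-in for vmalloc(): virtually contiguous memory is assumed to be
 * available even when large physically contiguous blocks are not. */
static void *fake_vmalloc(size_t size)
{
	used_fallback++;
	return malloc(size);
}

/* Same shape as alloc_fdmem(): try a contiguous allocation only for
 * sizes up to the costly-order threshold, then fall back. */
static void *sketch_alloc_fdmem(size_t size)
{
	assert(size > 0);
	if (size <= (SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER)) {
		void *data = fake_kmalloc_noretry(size);
		if (data != NULL)
			return data;
	}
	return fake_vmalloc(size);
}
```

Sizes above the threshold never attempt the contiguous path at all; the patch only changes how persistently the small-size attempt is retried.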
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH RHEL7 COMMIT] ms/fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem

2017-03-16 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.10.2.vz7.29.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.10.2.vz7.29.4
-->
commit 2d47a05314ed0fd03df75c419eeda00fab40ad2d
Author: Eric W. Biederman 
Date:   Thu Mar 16 19:21:00 2017 +0400

ms/fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem

This is a backport of upstream (vanilla) commit:
commit 96c7a2ff21501691587e1ae969b83cbec8b78e08 ("fs/file.c:fdtable: avoid
triggering OOMs from alloc_fdmem")

Under certain conditions there might be a lot of
alloc_fdmem() invocations with order <= PAGE_ALLOC_COSTLY_ORDER.

For example: httpd which is doing a lot of fork() calls.

Real-life examples from our customers:

[532506.773243] httpd   D 8803f5fecc20 0 939874   6606
[532506.773257] Call Trace:
[532506.773261]  [] schedule+0x29/0x70
[532506.773264]  [] schedule_timeout+0x175/0x2d0
[532506.773272]  [] ? internal_add_timer+0x70/0x70
[532506.773276]  [] io_schedule_timeout+0xae/0x130
[532506.773280]  [] wait_iff_congested+0x135/0x150
[532506.773284]  [] ? wake_up_atomic_t+0x30/0x30
[532506.773288]  [] shrink_inactive_list+0x65f/0x6c0
[532506.773292]  [] shrink_lruvec+0x395/0x800
[532506.773296]  [] shrink_zone+0xef/0x2d0
[532506.773300]  [] do_try_to_free_pages+0x170/0x530
[532506.773310]  [] try_to_free_pages+0xd5/0x160
[532506.773315]  [] __alloc_pages_nodemask+0x8ab/0xc10
[532506.773320]  [] alloc_pages_current+0xa9/0x170
[532506.773324]  [] kmalloc_order+0x18/0x50
[532506.773327]  [] kmalloc_order_trace+0x26/0xa0
[532506.773332]  [] __kmalloc+0x259/0x270
[532506.773337]  [] alloc_fdmem+0x20/0x50
[532506.773341]  [] alloc_fdtable+0x6c/0xe0
[532506.773344]  [] dup_fd+0x1f9/0x2d0
[532506.773354]  [] copy_process.part.30+0x87f/0x1510
[532506.773358]  [] do_fork+0xe1/0x320
[532506.773370]  [] SyS_clone+0x16/0x20
[532506.773376]  [] stub_clone+0x69/0x90
[532506.773380]  [] ? system_call_fastpath+0x16/0x1b

[513890.005271] httpd   D 880425db7230 0 811718   6606
[513890.005279] Call Trace:
[513890.005282]  [] schedule+0x29/0x70
[513890.005284]  [] schedule_timeout+0x239/0x2d0
[513890.005292]  [] io_schedule_timeout+0xae/0x130
[513890.005296]  [] io_schedule+0x18/0x20
[513890.005298]  [] get_request+0x218/0x780
[513890.005303]  [] blk_queue_bio+0xc6/0x3a0
[513890.005309]  [] ? dm_make_request+0x119/0x170 [dm_mod]
[513890.005311]  [] generic_make_request+0xe2/0x130
[513890.005313]  [] submit_bio+0x77/0x1c0
[513890.005318]  [] __swap_writepage+0x1be/0x260
[513890.005337]  [] swap_writepage+0x39/0x80
[513890.005340]  [] shrink_page_list+0x4ad/0xa80
[513890.005343]  [] shrink_inactive_list+0x1fb/0x6c0
[513890.005345]  [] shrink_lruvec+0x395/0x800
[513890.005348]  [] shrink_zone+0xef/0x2d0
[513890.005350]  [] do_try_to_free_pages+0x170/0x530
[513890.005353]  [] try_to_free_pages+0xd5/0x160
[513890.005355]  [] __alloc_pages_nodemask+0x8ab/0xc10
[513890.005358]  [] alloc_pages_current+0xa9/0x170
[513890.005360]  [] kmalloc_order+0x18/0x50
[513890.005362]  [] kmalloc_order_trace+0x26/0xa0
[513890.005365]  [] __kmalloc+0x259/0x270
[513890.005367]  [] alloc_fdmem+0x20/0x50
[513890.005369]  [] alloc_fdtable+0x6c/0xe0
[513890.005371]  [] dup_fd+0x1f9/0x2d0
[513890.005376]  [] copy_process.part.30+0x87f/0x1510
[513890.005378]  [] do_fork+0xe1/0x320
[513890.005380]  [] SyS_clone+0x16/0x20
[513890.005382]  [] stub_clone+0x69/0x90

We observed that sometimes kswapd cannot keep up with this, which
causes many direct reclaim attempts, which in turn:

1. Increase iowait time due to congestion_wait
2. Increase the number of block requests per second due to
page swapping and writeback
3. May induce OOMs

So it is better not to try so hard to allocate a contiguous
area, and to fall back to vmalloc() as soon as possible.

=
Original commit message:

fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem

Recently due to a spike in connections per second memcached on 3
separate boxes triggered the OOM killer from accept.  At the time the
OOM killer was triggered there was 4GB out of 36GB free in zone 1.  The
problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
hold a bitmap, and there was sufficient fragmentation that the largest
page available was 8KiB.
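The numbers in the paragraph above follow from page-order arithmetic. The sketch below assumes 4 KiB pages; size_to_order() is a hypothetical userspace re-implementation of the idea behind the kernel's get_order(), and buddy_can_satisfy() is a deliberately simplified check that ignores compaction and reclaim. It shows why 4GB of free memory does not help if none of it sits in a free block of at least order 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed 4 KiB pages */

/* Smallest order n such that (PAGE_SIZE << n) >= size. */
static unsigned int size_to_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Simplified buddy-allocator check: a request succeeds only if some
 * free block is at least as large as the requested order. Plenty of
 * free memory in smaller blocks does not satisfy it. */
static bool buddy_can_satisfy(size_t size, unsigned int largest_free_order)
{
	return size_to_order(size) <= largest_free_order;
}
```

With these assumptions a 32KiB bitmap needs an order-3 block, while an 8KiB largest free block is only order 1, so the request cannot be served from free lists alone.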

I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
but I do agree that order 3 allocations are very likely to succeed.

There are always pathologies where order > 0 allocations can fail when
there are copious amounts 

Re: [Devel] [PATCH rh7] fs: add __GFP_NORETRY in alloc_fdmem

2017-03-16 Thread Konstantin Khorenko

Andrey, please take a look.

All other patches from Anatoly are applied already, except this one.
Is it worth applying this one as well?

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 10/21/2016 02:42 PM, Anatoly Stepanov wrote:

This is a backport of upstream (vanilla) commit:
commit 96c7a2ff21501691587e1ae969b83cbec8b78e08

Under certain conditions there might be a lot of
alloc_fdmem() invocations with order <= PAGE_ALLOC_COSTLY_ORDER.

For example: httpd which is doing a lot of fork() calls.

Real-life examples from our customers:

[532506.773243] httpd   D 8803f5fecc20 0 939874   6606
[532506.773257] Call Trace:
[532506.773261]  [] schedule+0x29/0x70
[532506.773264]  [] schedule_timeout+0x175/0x2d0
[532506.773272]  [] ? internal_add_timer+0x70/0x70
[532506.773276]  [] io_schedule_timeout+0xae/0x130
[532506.773280]  [] wait_iff_congested+0x135/0x150
[532506.773284]  [] ? wake_up_atomic_t+0x30/0x30
[532506.773288]  [] shrink_inactive_list+0x65f/0x6c0
[532506.773292]  [] shrink_lruvec+0x395/0x800
[532506.773296]  [] shrink_zone+0xef/0x2d0
[532506.773300]  [] do_try_to_free_pages+0x170/0x530
[532506.773310]  [] try_to_free_pages+0xd5/0x160
[532506.773315]  [] __alloc_pages_nodemask+0x8ab/0xc10
[532506.773320]  [] alloc_pages_current+0xa9/0x170
[532506.773324]  [] kmalloc_order+0x18/0x50
[532506.773327]  [] kmalloc_order_trace+0x26/0xa0
[532506.773332]  [] __kmalloc+0x259/0x270
[532506.773337]  [] alloc_fdmem+0x20/0x50
[532506.773341]  [] alloc_fdtable+0x6c/0xe0
[532506.773344]  [] dup_fd+0x1f9/0x2d0
[532506.773354]  [] copy_process.part.30+0x87f/0x1510
[532506.773358]  [] do_fork+0xe1/0x320
[532506.773370]  [] SyS_clone+0x16/0x20
[532506.773376]  [] stub_clone+0x69/0x90
[532506.773380]  [] ? system_call_fastpath+0x16/0x1b

[513890.005271] httpd   D 880425db7230 0 811718   6606
[513890.005279] Call Trace:
[513890.005282]  [] schedule+0x29/0x70
[513890.005284]  [] schedule_timeout+0x239/0x2d0
[513890.005292]  [] io_schedule_timeout+0xae/0x130
[513890.005296]  [] io_schedule+0x18/0x20
[513890.005298]  [] get_request+0x218/0x780
[513890.005303]  [] blk_queue_bio+0xc6/0x3a0
[513890.005309]  [] ? dm_make_request+0x119/0x170 [dm_mod]
[513890.005311]  [] generic_make_request+0xe2/0x130
[513890.005313]  [] submit_bio+0x77/0x1c0
[513890.005318]  [] __swap_writepage+0x1be/0x260
[513890.005337]  [] swap_writepage+0x39/0x80
[513890.005340]  [] shrink_page_list+0x4ad/0xa80
[513890.005343]  [] shrink_inactive_list+0x1fb/0x6c0
[513890.005345]  [] shrink_lruvec+0x395/0x800
[513890.005348]  [] shrink_zone+0xef/0x2d0
[513890.005350]  [] do_try_to_free_pages+0x170/0x530
[513890.005353]  [] try_to_free_pages+0xd5/0x160
[513890.005355]  [] __alloc_pages_nodemask+0x8ab/0xc10
[513890.005358]  [] alloc_pages_current+0xa9/0x170
[513890.005360]  [] kmalloc_order+0x18/0x50
[513890.005362]  [] kmalloc_order_trace+0x26/0xa0
[513890.005365]  [] __kmalloc+0x259/0x270
[513890.005367]  [] alloc_fdmem+0x20/0x50
[513890.005369]  [] alloc_fdtable+0x6c/0xe0
[513890.005371]  [] dup_fd+0x1f9/0x2d0
[513890.005376]  [] copy_process.part.30+0x87f/0x1510
[513890.005378]  [] do_fork+0xe1/0x320
[513890.005380]  [] SyS_clone+0x16/0x20
[513890.005382]  [] stub_clone+0x69/0x90

We observed that sometimes kswapd cannot keep up with this, which
causes many direct reclaim attempts, which in turn:

1. Increase iowait time due to congestion_wait
2. Increase the number of block requests per second due to
page swapping and writeback
3. May induce OOMs

So it is better not to try so hard to allocate a contiguous
area, and to fall back to vmalloc() as soon as possible.

Signed-off-by: Anatoly Stepanov 
---
 fs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index 366d9bb..3f65ba0 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -36,7 +36,7 @@ static void *alloc_fdmem(size_t size)
 * vmalloc() if the allocation size will be considered "large" by the VM.
 */
if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-   void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
+   void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
if (data != NULL)
return data;
}




[Devel] WARNING at mm/slub.c

2017-03-16 Thread Denis Kirjanov
Hi guys,

with the kernel rh7-3.10.0-327.36.1.vz7.18.7 we're seeing the
following WARNING while running the LTP test suite:

[11796.576981] WARNING: at mm/slub.c:1252 slab_pre_alloc_hook.isra.42.part.43+0x15/0x17()

[11796.591008] Call Trace:
[11796.592065]  [] dump_stack+0x19/0x1b
[11796.593076]  [] warn_slowpath_common+0x70/0xb0
[11796.594228]  [] warn_slowpath_null+0x1a/0x20
[11796.595442]  [] slab_pre_alloc_hook.isra.42.part.43+0x15/0x17
[11796.596686]  [] kmem_cache_alloc_trace+0x58/0x230
[11796.597965]  [] ? kmapset_new+0x1e/0x50
[11796.599224]  [] kmapset_new+0x1e/0x50
[11796.600433]  [] __sysfs_add_one+0x4a/0xb0
[11796.601431]  [] sysfs_add_one+0x1b/0xd0
[11796.602451]  [] sysfs_add_file_mode+0xb7/0x100
[11796.603449]  [] sysfs_create_file+0x2a/0x30
[11796.604461]  [] kobject_add_internal+0x16c/0x2f0
[11796.605503]  [] kobject_add+0x75/0xd0
[11796.606627]  [] ? kmem_cache_alloc_trace+0x207/0x230
[11796.607655]  [] __link_block_group+0xe1/0x120 [btrfs]
[11796.608634]  [] btrfs_make_block_group+0x150/0x270 [btrfs]
[11796.609701]  [] __btrfs_alloc_chunk+0x67f/0x8a0 [btrfs]
[11796.610756]  [] btrfs_alloc_chunk+0x34/0x40 [btrfs]
[11796.611800]  [] do_chunk_alloc+0x23f/0x410 [btrfs]
[11796.612954]  [] btrfs_check_data_free_space+0xea/0x280 [btrfs]
[11796.614008]  [] __btrfs_buffered_write+0x151/0x5c0 [btrfs]
[11796.615153]  [] btrfs_file_aio_write+0x246/0x560 [btrfs]
[11796.616141]  [] ? __mem_cgroup_commit_charge+0x152/0x350
[11796.617220]  [] do_sync_write+0x90/0xe0
[11796.618253]  [] vfs_write+0xbd/0x1e0
[11796.619224]  [] SyS_write+0x7f/0xe0
[11796.620185]  [] system_call_fastpath+0x16/0x1b
[11796.621145] ---[ end trace 1437311f89b9e3c6 ]---

Thanks!


[Devel] [PATCH RHEL7 COMMIT] ms/jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path

2017-03-16 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.10.2.vz7.29.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.10.2.vz7.29.3
-->
commit 6729ff4cc8df023add27a85eb8d1f0ec3b834a76
Author: OGAWA Hirofumi 
Date:   Thu Mar 16 15:42:19 2017 +0400

ms/jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path

On the umount path, jbd2_journal_destroy() writes the latest transaction ID
(->j_tail_sequence) to be used at the next mount.

The bug is that ->j_tail_sequence does not hold the latest transaction ID
in some cases. So, at the next mount, there is a chance of conflicting with
remaining (not yet overwritten) transactions.

mount (id=10)
write transaction (id=11)
write transaction (id=12)
umount (id=10) <= the bug doesn't write latest ID

mount (id=10)
write transaction (id=11)
crash

mount
[recovery process]
transaction (id=11)
transaction (id=12) <= valid transaction ID, but old commit
   must not replay

As shown above, this bug can cause recovery failure or FS
corruption.
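The replay problem in the scenario above can be reproduced with a toy model. This is a sketch under simplifying assumptions, not jbd2's on-disk format or recovery code: struct toy_journal, write_transactions(), toy_recover() and remount_crash_recover() are hypothetical, the log is a four-slot array that always restarts at slot 0, and recovery simply replays consecutive slots whose tid matches the expected sequence.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_SLOTS 4

/* Toy journal: each slot remembers the tid last written there. */
struct toy_journal {
	unsigned int slot_tid[LOG_SLOTS]; /* 0 means "never written" */
	unsigned int sb_sequence;         /* first tid recovery expects */
};

/* Write `count` transactions starting at slot 0 (empty log after mount). */
static void write_transactions(struct toy_journal *j, unsigned int first_tid,
			       size_t count)
{
	for (size_t i = 0; i < count && i < LOG_SLOTS; i++)
		j->slot_tid[i] = first_tid + (unsigned int)i;
}

/* Replay consecutive slots while the stored tid equals the expected one;
 * stop at the first mismatch. Returns how many were replayed. */
static size_t toy_recover(const struct toy_journal *j)
{
	unsigned int expected = j->sb_sequence;
	size_t replayed = 0;

	while (replayed < LOG_SLOTS && j->slot_tid[replayed] == expected) {
		replayed++;
		expected++;
	}
	return replayed;
}

/* First mount writes tids 11 and 12; umount records `umount_tail`; the
 * second mount writes one transaction numbered `umount_tail`, then the
 * system crashes and recovery runs. */
static size_t remount_crash_recover(unsigned int umount_tail)
{
	struct toy_journal j = { {0}, 11 };

	write_transactions(&j, 11, 2);
	j.sb_sequence = umount_tail;
	write_transactions(&j, umount_tail, 1);
	return toy_recover(&j);
}
```

With the stale tail (11), recovery replays two transactions, including the old 12 that must not be replayed; with a tail past every tid already in the log (13), it stops after the single new transaction.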

So why doesn't ->j_tail_sequence point to the latest ID?

Because if checkpoint transactions were reclaimed by memory pressure
(i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated.
(Another case is when __jbd2_journal_clean_checkpoint_list() is called
with an empty transaction.)

So in the above cases, ->j_tail_sequence does not point to the latest
transaction ID on the umount path. Plus, REQ_FLUSH for the checkpoint is
not issued either.

So, to fix this problem with minimum changes, this patch updates
->j_tail_sequence and issues REQ_FLUSH. (With more complex changes,
some optimizations would be possible, for example avoiding unnecessary
REQ_FLUSH.)

BTW,

journal->j_tail_sequence =
++journal->j_transaction_sequence;

The increment of ->j_transaction_sequence seems to be unnecessary, but
ext3 does the same.

Signed-off-by: OGAWA Hirofumi 
Signed-off-by: Theodore Ts'o 
Cc: sta...@vger.kernel.org

ms commit: c0a2ad9 ("jbd2: fix FS corruption possibility in
jbd2_journal_destroy() on umount path")

Signed-off-by: Anatoly Stepanov 
---
 fs/jbd2/journal.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 6c54b78..868a923 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1406,11 +1406,12 @@ void jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
 /**
  * jbd2_mark_journal_empty() - Mark on disk journal as empty.
  * @journal: The journal to update.
+ * @write_op: With which operation should we write the journal sb
  *
  * Update a journal's dynamic superblock fields to show that journal is empty.
  * Write updated superblock to disk waiting for IO to complete.
  */
-static void jbd2_mark_journal_empty(journal_t *journal)
+static void jbd2_mark_journal_empty(journal_t *journal, int write_op)
 {
journal_superblock_t *sb = journal->j_superblock;
 
@@ -1428,7 +1429,7 @@ static void jbd2_mark_journal_empty(journal_t *journal)
sb->s_start = cpu_to_be32(0);
read_unlock(&journal->j_state_lock);
 
-   jbd2_write_superblock(journal, WRITE_FUA);
+   jbd2_write_superblock(journal, write_op);
 
/* Log is no longer empty */
write_lock(&journal->j_state_lock);
@@ -1704,7 +1705,13 @@ int jbd2_journal_destroy(journal_t *journal)
if (journal->j_sb_buffer) {
if (!is_journal_aborted(journal)) {
mutex_lock(&journal->j_checkpoint_mutex);
-   jbd2_mark_journal_empty(journal);
+
+   write_lock(&journal->j_state_lock);
+   journal->j_tail_sequence =
+   ++journal->j_transaction_sequence;
+   write_unlock(&journal->j_state_lock);
+
+   jbd2_mark_journal_empty(journal, WRITE_FLUSH_FUA);
mutex_unlock(&journal->j_checkpoint_mutex);
} else
err = -EIO;
@@ -1956,7 +1963,7 @@ int jbd2_journal_flush(journal_t *journal)
 * the magic code for a fully-recovered superblock.  Any future
 * commits of data to the journal will restore the current
 * s_start value. */
-   jbd2_mark_journal_empty(journal);
+   jbd2_mark_journal_empty(journal, WRITE_FUA);
mutex_unlock(&journal->j_checkpoint_mutex);
write_lock(&journal->j_state_lock);
J_ASSERT(!journal->j_running_transaction);
@@ -2001,7 +2008,7 @@ int jbd2_journal_wipe(journal_t *journal, int write)
if (write) {
/* Lock to make assertions happy... */

[Devel] [PATCH RHEL7 COMMIT] ms/jbd2: fix incorrect unlock on j_list_lock

2017-03-16 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-514.10.2.vz7.29.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.10.2.vz7.29.3
-->
commit 0672ec203b2c812f45694f2cc0a5121adf2ca4ec
Author: Taesoo Kim 
Date:   Thu Mar 16 15:42:20 2017 +0400

ms/jbd2: fix incorrect unlock on j_list_lock

When 'jh->b_transaction == transaction' (as asserted below)

  J_ASSERT_JH(jh, (jh->b_transaction == transaction || ...

'journal->j_list_lock' will be incorrectly unlocked, since
the lock is acquired only at the end of the if / else-if
branches (the else case is missing).
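The lock-balance bug can be illustrated with a counting stand-in for spin_lock()/spin_unlock(). This is a hypothetical model, not jbd2 code: toy_lock(), toy_unlock(), buggy_path() and fixed_path() are invented names, and the enum only mirrors the three cases in jbd2_journal_get_create_access() (no owning transaction, the committing transaction, the running transaction). The buggy version unlocks after the if / else-if chain, so the implicit else path releases a lock it never took; the fixed version pairs each lock with an unlock inside its own branch, as the patch does.

```c
#include <assert.h>

/* Counting lock: detects an unlock without a matching lock. */
static int list_lock_held;
static int bad_unlocks;

static void toy_lock(void) { list_lock_held++; }

static void toy_unlock(void)
{
	if (list_lock_held == 0)
		bad_unlocks++;      /* releasing a lock we do not hold */
	else
		list_lock_held--;
}

/* Mirrors the three cases: jh->b_transaction is NULL, the committing
 * transaction, or the running transaction (the implicit else). */
enum toy_case { OWNED_BY_NONE, OWNED_BY_COMMITTING, OWNED_BY_RUNNING };

static void buggy_path(enum toy_case c)
{
	if (c == OWNED_BY_NONE) {
		toy_lock();
	} else if (c == OWNED_BY_COMMITTING) {
		toy_lock();
	}
	toy_unlock();               /* wrong for OWNED_BY_RUNNING */
}

static void fixed_path(enum toy_case c)
{
	if (c == OWNED_BY_NONE) {
		toy_lock();
		toy_unlock();
	} else if (c == OWNED_BY_COMMITTING) {
		toy_lock();
		toy_unlock();
	}
}
```

Keeping each unlock in the same branch as its lock makes the pairing checkable locally, which is why the fix moves the unlock rather than adding a lock to the else case.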

Signed-off-by: Taesoo Kim 
Signed-off-by: Theodore Ts'o 
Reviewed-by: Andreas Dilger 
Fixes: 6e4862a5bb9d12be87e4ea5d9a60836ebed71d28
Cc: sta...@vger.kernel.org # 3.14+

ms commit: 559cce6 ("jbd2: fix incorrect unlock on j_list_lock")

Signed-off-by: Anatoly Stepanov 
---
 fs/jbd2/transaction.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index b249f40..ed52bf7 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1106,6 +1106,7 @@ int jbd2_journal_get_create_access(handle_t *handle, struct buffer_head *bh)
JBUFFER_TRACE(jh, "file as BJ_Reserved");
spin_lock(&journal->j_list_lock);
__jbd2_journal_file_buffer(jh, transaction, BJ_Reserved);
+   spin_unlock(&journal->j_list_lock);
} else if (jh->b_transaction == journal->j_committing_transaction) {
/* first access by this transaction */
jh->b_modified = 0;
@@ -1113,8 +1114,8 @@ int jbd2_journal_get_create_access(handle_t *handle, struct buffer_head *bh)
JBUFFER_TRACE(jh, "set next transaction");
spin_lock(&journal->j_list_lock);
jh->b_next_transaction = transaction;
+   spin_unlock(&journal->j_list_lock);
}
-   spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
 
/*