I remembered one thing I changed a few days ago: I switched the default
I/O scheduler from cfq to anticipatory. With the latter, it was
impossible to resync the software RAID1 array md3, as you can see in the
dmesg logs. I changed it back to the default and waited for the RAID to
finish syncing. After that I started the KVM guests again, but I still
get a lot of kernel messages, such as the hung-task reports in the dmesg
excerpt below.
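
Side note on the scheduler change: a minimal sketch of how the active
scheduler could be checked from a script. It assumes the usual sysfs
layout, where /sys/block/<device>/queue/scheduler lists the available
schedulers and marks the active one in square brackets. The function
names, the device name and the sample string are made up for
illustration and are not taken from this system.

```python
import re


def active_scheduler(sysfs_line):
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", sysfs_line)
    return match.group(1) if match else None


def read_active_scheduler(device):
    # Assumption: standard sysfs path; 'device' is a name such as "sda".
    with open("/sys/block/%s/queue/scheduler" % device) as f:
        return active_scheduler(f.read())


if __name__ == "__main__":
    # Hypothetical file content for a disk that is back on cfq.
    sample = "noop anticipatory deadline [cfq]"
    print(active_scheduler(sample))
```

Running this against each member disk of the array would confirm that
the switch back to cfq really took effect. The dmesg excerpt follows: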
[ 248.800024] Clocksource tsc unstable (delta = -270012333 ns)
[ 6720.520038] INFO: task flush-9:2:454 blocked for more than 120 seconds.
[ 6720.524331] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6720.530099] flush-9:2 D 0 454 2 0x
[ 6720.530109] 8801938578d0 0046 00015b80 00015b80
[ 6720.530118] 880193859ab0 880193857fd8 00015b80 8801938596f0
[ 6720.530126] 00015b80 880193857fd8 00015b80 880193859ab0
[ 6720.530134] Call Trace:
[ 6720.530151] [8116c730] ? sync_buffer+0x0/0x50
[ 6720.530161] [8153e697] io_schedule+0x47/0x70
[ 6720.530168] [8116c775] sync_buffer+0x45/0x50
[ 6720.530175] [8153ed9a] __wait_on_bit_lock+0x5a/0xc0
[ 6720.530182] [8116c730] ? sync_buffer+0x0/0x50
[ 6720.530189] [8116cb20] ? end_buffer_async_write+0x0/0x180
[ 6720.530196] [8153ee78] out_of_line_wait_on_bit_lock+0x78/0x90
[ 6720.530205] [81085340] ? wake_bit_function+0x0/0x40
[ 6720.530212] [8116c8f6] __lock_buffer+0x36/0x40
[ 6720.530219] [8116d644] __block_write_full_page+0x374/0x3a0
[ 6720.530227] [810f39e7] ? unlock_page+0x27/0x30
[ 6720.530234] [8116cb20] ? end_buffer_async_write+0x0/0x180
[ 6720.530241] [8116cb20] ? end_buffer_async_write+0x0/0x180
[ 6720.530249] [8116dfd0] block_write_full_page_endio+0xe0/0x120
[ 6720.530256] [8116cb20] ? end_buffer_async_write+0x0/0x180
[ 6720.530263] [8116e025] block_write_full_page+0x15/0x20
[ 6720.530271] [811b636d] ext3_ordered_writepage+0x1dd/0x200
[ 6720.530279] [810fb907] __writepage+0x17/0x40
[ 6720.530287] [810fcac7] write_cache_pages+0x227/0x4d0
[ 6720.530294] [810fb8f0] ? __writepage+0x0/0x40
[ 6720.530302] [810fcd94] generic_writepages+0x24/0x30
[ 6720.530309] [810fcdd5] do_writepages+0x35/0x40
[ 6720.530315] [81164b66] writeback_single_inode+0xf6/0x3d0
[ 6720.530322] [811657d0] writeback_inodes_wb+0x410/0x5e0
[ 6720.530328] [81165aaa] wb_writeback+0x10a/0x1d0
[ 6720.530335] [81077895] ? try_to_del_timer_sync+0x75/0xd0
[ 6720.530342] [8153eb7b] ? schedule_timeout+0x19b/0x300
[ 6720.530348] [81165ddc] wb_do_writeback+0x18c/0x1a0
[ 6720.530355] [81165e43] bdi_writeback_task+0x53/0xe0
[ 6720.530363] [8110e726] bdi_start_fn+0x86/0x100
[ 6720.530369] [8110e6a0] ? bdi_start_fn+0x0/0x100
[ 6720.530375] [81084f86] kthread+0x96/0xa0
[ 6720.530383] [810141ea] child_rip+0xa/0x20
[ 6720.530389] [81084ef0] ? kthread+0x0/0xa0
[ 6720.530395] [810141e0] ? child_rip+0x0/0x20
[ 6720.530400] INFO: task kjournald:459 blocked for more than 120 seconds.
[ 6720.534113] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6720.541964] kjournald D 0 459 2 0x
[ 6720.541973] 880193a2bc30 0046 00015b80 00015b80
[ 6720.541981] 8801938b9ab0 880193a2bfd8 00015b80 8801938b96f0
[ 6720.541995] 00015b80 880193a2bfd8 00015b80 8801938b9ab0
[ 6720.542014] Call Trace:
[ 6720.542026] [8116c730] ? sync_buffer+0x0/0x50
[ 6720.542037] [8153e697] io_schedule+0x47/0x70
[ 6720.542050] [8116c775] sync_buffer+0x45/0x50
[ 6720.542061] [8153eeef] __wait_on_bit+0x5f/0x90
[ 6720.542074] [8116b4f1] ? submit_bh+0x111/0x140
[ 6720.542086] [8116c730] ? sync_buffer+0x0/0x50
[ 6720.542097] [8153ef98] out_of_line_wait_on_bit+0x78/0x90
[ 6720.542110] [81085340] ? wake_bit_function+0x0/0x40
[ 6720.542122] [8116c726] __wait_on_buffer+0x26/0x30
[ 6720.542136] [81212e0b] journal_commit_transaction+0x86b/0xe90
[ 6720.542152] [810397a9] ? default_spin_lock_flags+0x9/0x10
[ 6720.542164] [81076e0c] ? lock_timer_base+0x3c/0x70
[ 6720.542175] [81077895] ? try_to_del_timer_sync+0x75/0xd0
[ 6720.542189] [812167dd] kjournald+0xed/0x250
[ 6720.542201] [81085300] ? autoremove_wake_function+0x0/0x40
[ 6720.542214] [812166f0] ? kjournald+0x0/0x250
[ 6720.542225] [81084f86] kthread+0x96/0xa0
[ 6720.542236] [810141ea] child_rip+0xa/0x20
[ 6720.542248] [81084ef0] ? kthread+0x0/0xa0
[ 6720.542259] [810141e0] ? child_rip+0x0/0x20
[ 6720.542280] INFO: task openvpn:1591 blocked for more than 120 seconds.
[ 6720.546980]