On 08/09/2012 07:49 AM, Mel Gorman wrote:
> Changelog since V2
> o Capture !MIGRATE_MOVABLE pages where possible
> o Document the treatment of MIGRATE_MOVABLE pages while capturing
> o Expand changelogs
>
> Changelog since V1
> o Dropped kswapd related patch, basically a no-op and regresses if fixed
>   (minchan)
> o Expanded changelogs a little
>
> Allocation success rates have been far lower since 3.4 due to commit
> [fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
> commit was introduced for good reasons and it was known in advance that
> the success rates would suffer but it was justified on the grounds that
> the high allocation success rates were achieved by aggressive reclaim.
> Success rates are expected to suffer even more in 3.6 due to commit
> [7db8889a: mm: have order > 0 compaction start off where it left] which
> testing has shown to severely reduce allocation success rates under load -
> to 0% in one case. There is a proposed change to that patch in this series
> and it would be ideal if Jim Schutt could retest the workload that led to
> commit [7db8889a: mm: have order > 0 compaction start off where it left].
On my first test of this patch series on top of 3.5, I ran into an
instance of what I think is the sort of thing that patch 4/5 was
fixing. Here's what vmstat had to say during that period:
----------
2012-08-09 11:58:04.107-06:00
vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
  r  b swpd    free buff    cache si so   bi      bo     in     cs us sy id wa st
 20 14    0  235884  576 38916072  0  0   12   17047    171    133  3  8 85  4  0
 18 17    0  220272  576 38955912  0  0   86 2131838 200142 162956 12 38 31 19  0
 17  9    0  244284  576 38955328  0  0   19 2179562 213775 167901 13 43 26 18  0
 27 15    0  223036  576 38952640  0  0   24 2202816 217996 158390 14 47 25 15  0
 17 16    0  233124  576 38959908  0  0    5 2268815 224647 165728 14 50 21 15  0
 16 13    0  225840  576 38995740  0  0   52 2253829 216797 160551 14 47 23 16  0
 22 13    0  260584  576 38982908  0  0   92 2196737 211694 140924 14 53 19 15  0
 16 10    0  235784  576 38917128  0  0   22 2157466 210022 137630 14 54 19 14  0
 12 13    0  214300  576 38923848  0  0   31 2187735 213862 142711 14 52 20 14  0
 25 12    0  219528  576 38919540  0  0   11 2066523 205256 142080 13 49 23 15  0
 26 14    0  229460  576 38913704  0  0   49 2108654 200692 135447 13 51 21 15  0
 11 11    0  220376  576 38862456  0  0   45 2136419 207493 146813 13 49 22 16  0
 36 12    0  229860  576 38869784  0  0    7 2163463 212223 151812 14 47 25 14  0
 16 13    0  238356  576 38891496  0  0   67 2251650 221728 154429 14 52 20 14  0
 65 15    0  211536  576 38922108  0  0   59 2237925 224237 156587 14 53 19 14  0
 24 13    0  585024  576 38634024  0  0   37 2240929 229040 148192 15 61 14 10  0
2012-08-09 11:59:04.714-06:00
vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
  r  b swpd    free buff    cache si so   bi      bo     in     cs us sy id wa st
 43  8    0  794392  576 38382316  0  0   11   20491    576    420  3 10 82  4  0
127  6    0  579328  576 38422156  0  0   21 2006775 205582 119660 12 70 11  7  0
 44  5    0  492860  576 38512360  0  0   46 1536525 173377  85320 10 78  7  4  0
218  9    0  585668  576 38271320  0  0   39 1257266 152869  64023  8 83  7  3  0
101  6    0  600168  576 38128104  0  0   10 1438705 160769  68374  9 84  5  3  0
 62  5    0  597004  576 38098972  0  0   93 1376841 154012  63912  8 82  7  4  0
 61 11    0  850396  576 37808772  0  0   46 1186816 145731  70453  7 78  9  6  0
124  7    0  437388  576 38126320  0  0   15 1208434 149736  57142  7 86  4  3  0
204 11    0 1105816  576 37309532  0  0   20 1327833 145979  52718  7 87  4  2  0
 29  8    0  751020  576 37360332  0  0    8 1405474 169916  61982  9 85  4  2  0
 38  7    0  626448  576 37333244  0  0   14 1328415 174665  74214  8 84  5  3  0
 23  5    0  650040  576 37134280  0  0   28 1351209 179220  71631  8 85  5  2  0
 40 10    0  610988  576 37054292  0  0  104 1272527 167530  73527  7 85  5  3  0
 79 22    0 2076836  576 35487340  0  0  750 1249934 175420  70124  7 88  3  2  0
 58  6    0  431068  576 36934140  0  0 1000 1366234 169675  72524  8 84  5  3  0
134  9    0  574692  576 36784980  0  0 1049 1305543 152507  62639  8 84  4  4  0
2012-08-09 12:00:09.137-06:00
vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
  r  b swpd    free buff    cache si so   bi      bo     in     cs us sy id wa st
163  8    0  464308  576 36791368  0  0   11   22210    866    536  3 13 79  4  0
207 14    0  917752  576 36181928  0  0  712 1345376 134598  47367  7 90  1  2  0
123 12    0  685516  576 36296148  0  0  429 1386615 158494  60077  8 84  5  3  0
123 12    0  598572  576 36333728  0  0 1107 1233281 147542  62351  7 84  5  4  0
622  7    0  660768  576 36118264  0  0  557 1345548 151394  59353  7 85  4  3  0
223 11    0  283960  576 36463868  0  0   46 1107160 121846  33006  6 93  1  1  0
104 14    0 3140508  576 33522616  0  0  299 1414709 160879  51422  9 89  1  1  0
100 11    0 1323036  576 35337740  0  0  429 1637733 175817  94471  9 73 10  8  0
 91 11    0  673320  576 35918084  0  0  562 1477100 157069  67951  8 83  5  4  0
 35 15    0 3486592  576 32983244  0  0  384 1574186 189023  82135  9 81  5  5  0
 51 16    0 1428108  576 34962112  0  0  394 1573231 160575  76632  9 76  9  7  0
 55  6    0  719548  576 35621284  0  0  425 1483962 160335  79991  8 74 10  7  0
 96  7    0 1226852  576 35062608  0  0  803 1531041 164923  70820  9 78  7  6  0
 97  8    0  862500  576 35332496  0  0  536 1177949 155969  80769  7 74 13  7  0
 23  5    0 6096372  576 30115776  0  0  367  919949 124993  81755  6 62 24  8  0
 13  5    0 7427860  576 28368292  0  0  399  915331 153895 102186  6 53 32  9  0
----------
And here's a perf report, captured and displayed with

  perf record -g -a sleep 10
  perf report --sort symbol --call-graph fractal,5

sometime during that period just after 12:00:09, when the run queue
was > 100.
----------
Processed 0 events and LOST 1175296!
Check IO/CPU overload!
# Events: 208K cycles
#
# Overhead  Symbol
# ........  ......
#
34.63% [k] _raw_spin_lock_irqsave
|
|--97.30%-- isolate_freepages
| compaction_alloc
| unmap_and_move
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_slowpath
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| do_page_fault
| page_fault
| |
| |--87.39%-- skb_copy_datagram_iovec
| | tcp_recvmsg
| | inet_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call
| | __recv
| | |
| | --100.00%-- (nil)
| |
| --12.61%-- memcpy
--2.70%-- [...]
14.31% [k] _raw_spin_lock_irq
|
|--98.08%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_slowpath
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| do_page_fault
| page_fault
| |
| |--83.93%-- skb_copy_datagram_iovec
| | tcp_recvmsg
| | inet_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call
| | __recv
| | |
| | --100.00%-- (nil)
| |
| --16.07%-- memcpy
--1.92%-- [...]
5.48% [k] isolate_freepages_block
|
|--99.96%-- isolate_freepages
| compaction_alloc
| unmap_and_move
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_slowpath
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| do_page_fault
| page_fault
| |
| |--86.01%-- skb_copy_datagram_iovec
| | tcp_recvmsg
| | inet_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call
| | __recv
| | |
| | --100.00%-- (nil)
| |
| --13.99%-- memcpy
--0.04%-- [...]
5.34% [.] ceph_crc32c_le
|
|--99.95%-- 0xb8057558d0065990
--0.05%-- [...]
----------
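Side note: the "Processed 0 events and LOST 1175296" warning at the top of
that report presumably means the perf mmap buffer was overflowing under this
load; if a cleaner profile would help, re-recording with a larger ring buffer
should drop fewer samples, e.g.:

  # assumes this perf version supports -m/--mmap-pages; 512 pages is a guess
  perf record -g -a -m 512 sleep 10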
If I understand what this is telling me, skb_copy_datagram_iovec is what
ends up triggering the calls to isolate_freepages_block,
isolate_migratepages_range, and isolate_freepages? That is, copying
received data into the userspace buffer faults on a not-yet-populated
anonymous page, the fault tries to allocate a transparent hugepage
(do_huge_pmd_anonymous_page), and that allocation falls into direct
compaction, where most of the time is spent spinning on those locks?
FWIW, I'm using a Chelsio T4 NIC in these hosts, with jumbo frames
and the Linux TCP stack (i.e., no stateful TCP offload).
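Presumably one way to confirm that the THP fault path is what drives all
this compaction would be to re-run the workload with THP defrag disabled
and see whether the _raw_spin_lock contention disappears, e.g. (assuming
the usual transparent_hugepage sysfs knob is present on this kernel):

  # assumes CONFIG_TRANSPARENT_HUGEPAGE and the standard sysfs layout
  cat /sys/kernel/mm/transparent_hugepage/defrag
  echo never > /sys/kernel/mm/transparent_hugepage/defrag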
-- Jim