While the memory allocation was stalled, perf top showed:

    Samples: 483K of event 'cycles:ppp', Event count (approx.): 52114089074
    Overhead  Shared Object  Symbol
      28.53%  [kernel]       [k] total_mapcount
      25.34%  [kernel]       [k] kvm_age_rmapp
      13.54%  [kernel]       [k] slot_rmap_walk_next
      11.24%  [kernel]       [k] kvm_handle_hva_range
       6.35%  [kernel]       [k] rmap_get_first
       3.69%  [kernel]       [k] __x86_indirect_thunk_r13
       1.33%  [kernel]       [k] __isolate_lru_page
       0.63%  [kernel]       [k] isolate_lru_pages.isra.58
       0.48%  [kernel]       [k] page_vma_mapped_walk
       0.40%  [kernel]       [k] __mod_node_page_state
       0.35%  [kernel]       [k] clear_page_erms
       0.31%  [kernel]       [k] shrink_page_list
       0.28%  [kernel]       [k] _find_next_bit
       0.27%  [kernel]       [k] putback_inactive_pages
       0.27%  [kernel]       [k] move_active_pages_to_lru
       0.27%  [kernel]       [k] inactive_list_is_low
       0.22%  [kernel]       [k] __mod_zone_page_state
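The profile above can be quantified by summing the Overhead column. The Python sketch below does that for two groups: the KVM x86 MMU reverse-map walk symbols and total_mapcount. Treating kvm_age_rmapp, slot_rmap_walk_next, kvm_handle_hva_range and rmap_get_first as one group is my own assumption. The helper names are hypothetical, and the rows are copied from the output above.

```python
import re

# Rows copied from the perf top output above (subset; the rest are below 1.5%).
PERF_TOP = """\
  28.53%  [kernel]  [k] total_mapcount
  25.34%  [kernel]  [k] kvm_age_rmapp
  13.54%  [kernel]  [k] slot_rmap_walk_next
  11.24%  [kernel]  [k] kvm_handle_hva_range
   6.35%  [kernel]  [k] rmap_get_first
   3.69%  [kernel]  [k] __x86_indirect_thunk_r13
   1.33%  [kernel]  [k] __isolate_lru_page
"""

# Assumed grouping (hypothetical): symbols from KVM's reverse-map (rmap) walk.
KVM_RMAP = {"kvm_age_rmapp", "slot_rmap_walk_next",
            "kvm_handle_hva_range", "rmap_get_first"}

# Matches "<pct>%  <shared object>  [k] <symbol>" rows.
ROW = re.compile(r"^\s*([\d.]+)%\s+\S+\s+\[k\]\s+(\S+)\s*$")


def parse_perf_top(text):
    """Return {symbol: overhead_percent} for every kernel-symbol row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m.group(2)] = float(m.group(1))
    return rows


def summarize(rows):
    """Return (KVM rmap-walk total, total_mapcount), rounded to 2 decimals."""
    kvm = sum(pct for sym, pct in rows.items() if sym in KVM_RMAP)
    mapcount = rows.get("total_mapcount", 0.0)
    return round(kvm, 2), round(mapcount, 2)


if __name__ == "__main__":
    kvm, mapcount = summarize(parse_perf_top(PERF_TOP))
    # prints: KVM rmap walk: 56.47%  total_mapcount: 28.53%  combined: 85.0%
    print(f"KVM rmap walk: {kvm}%  total_mapcount: {mapcount}%  combined: {round(kvm + mapcount, 2)}%")
```

On these numbers, roughly 85% of the sampled cycles fall in those five symbols. The remaining page-reclaim functions (shrink_page_list, isolate_lru_pages, …) account for only a few percent.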
numactl -H output while the memory allocation was stalled:

    root@gpu-compute028:~# numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2 4 6 8 10 12 14
    node 0 size: 64288 MB
    node 0 free: 55983 MB
    node 1 cpus: 1 3 5 7 9 11 13 15
    node 1 size: 64489 MB
    node 1 free: 63810 MB
    node distances:
    node   0   1
      0:  10  21
      1:  21  10

    root@gpu-compute028:~# numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2 4 6 8 10 12 14
    node 0 size: 64288 MB
    node 0 free: 366 MB
    node 1 cpus: 1 3 5 7 9 11 13 15
    node 1 size: 64489 MB
    node 1 free: 63782 MB
    node distances:
    node   0   1
      0:  10  21
      1:  21  10

    root@gpu-compute028:~# numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2 4 6 8 10 12 14
    node 0 size: 64288 MB
    node 0 free: 368 MB
    node 1 cpus: 1 3 5 7 9 11 13 15
    node 1 size: 64489 MB
    node 1 free: 63757 MB
    node distances:
    node   0   1
      0:  10  21
      1:  21  10

    root@gpu-compute028:~# numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2 4 6 8 10 12 14
    node 0 size: 64288 MB
    node 0 free: 368 MB
    node 1 cpus: 1 3 5 7 9 11 13 15
    node 1 size: 64489 MB
    node 1 free: 63744 MB
    node distances:
    node   0   1
      0:  10  21
      1:  21  10

    root@gpu-compute028:~# numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2 4 6 8 10 12 14
    node 0 size: 64288 MB
    node 0 free: 366 MB
    node 1 cpus: 1 3 5 7 9 11 13 15
    node 1 size: 64489 MB
    node 1 free: 63504 MB
    node distances:
    node   0   1
      0:  10  21
      1:  21  10

Then I killed the process.

--
https://bugs.launchpad.net/bugs/1808412

Title: 4.15.0 memory allocation issue

Status in linux package in Ubuntu: Confirmed

Bug description:

  My server is:
    - PowerEdge T630
    - 2x Intel(R) Xeon(R) CPU E5-2623 v4 @ 2.60GHz
    - 128G of RAM
    - 4x VGA compatible controller [0300]: NVIDIA Corporation GP102 [TITAN X] [10de:1b00] (rev a1)

  When starting a VM with 116G of RAM, 16 vCPUs and 4 PCI passthrough devices, memory allocation stops after about half of the memory has been allocated.

  After upgrading from kernel 4.13.0 to 4.15.0, starting a VM takes a long time.

  I tested these kernels:
    linux-image-4.13.0-37  not affected
    linux-image-4.13.0-45  not affected
    linux-image-4.15.0-34  affected
    linux-image-4.15.0-42  affected

  After disabling transparent_hugepage on 4.15, everything seems to work correctly:

    cat /proc/cmdline
    BOOT_IMAGE=/boot/vmlinuz-4.15.0-42-generic root=UUID=<some uuid> ro intel_iommu=on transparent_hugepage=never splash quiet vt.handoff=7
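The workaround is the transparent_hugepage=never boot parameter, so it helps to confirm which THP mode a host is actually running with. Two sources show this. /sys/kernel/mm/transparent_hugepage/enabled lists every mode and marks the active one in brackets, for example "always madvise [never]". /proc/cmdline shows whether the parameter was passed at boot. The minimal Python sketch below reads both; the helper names are hypothetical and are not part of any existing tool.

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")
CMDLINE = Path("/proc/cmdline")


def active_thp_mode(sysfs_text):
    """Return the bracketed (active) mode, e.g. 'always madvise [never]' -> 'never'."""
    for word in sysfs_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def cmdline_thp_setting(cmdline):
    """Return the value of transparent_hugepage= on the kernel command line, or None."""
    for arg in cmdline.split():
        if arg.startswith("transparent_hugepage="):
            return arg.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # Both files only exist on Linux; skip whichever one is missing.
    if THP_ENABLED.exists():
        print("active THP mode:", active_thp_mode(THP_ENABLED.read_text()))
    if CMDLINE.exists():
        print("transparent_hugepage= on cmdline:", cmdline_thp_setting(CMDLINE.read_text()))
```

The mode can also be changed at runtime, without rebooting, by writing to the same sysfs file as root (echo never > /sys/kernel/mm/transparent_hugepage/enabled). The boot parameter shown above applies the setting from boot onward.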