Dear John:
Thank you for your reply. The case you mentioned is a typical
performance regression issue; there's no need for the kernel to OOM-kill
any random process even in the worst case. But in my observations, the
iommu_iova slab can consume up to 40G of memory, and the kernel has to kill
my VM process to free memory (64G memory installed). So I don't think it's
relevant.
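(As an aside, here is the kind of quick check I use to put a number on that growth. A minimal sketch, assuming the standard /proc/slabinfo column layout of "name active_objs num_objs objsize objperslab pagesperslab ..."; the function name is mine, not from any tool:)

```python
def slab_bytes(slabinfo_text, cache_name):
    """Return the total bytes held by one slab cache, parsed from
    /proc/slabinfo text (columns: name, active_objs, num_objs,
    objsize, objperslab, pagesperslab, ...)."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if fields and fields[0] == cache_name:
            num_objs, objsize = int(fields[2]), int(fields[3])
            return num_objs * objsize  # bytes currently held by the cache
    return 0

# Usage (reading /proc/slabinfo typically needs root):
#   slab_bytes(open("/proc/slabinfo").read(), "iommu_iova")
```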
John Garry wrote on Sat, Apr 25, 2020 at 1:50 AM:
> On 24/04/2020 17:30, Robin Murphy wrote:
> > On 2020-04-24 2:20 pm, Bin wrote:
> >> Dear Robin:
> >> Thank you for your explanation. Now, I understand that this could be
> >> the NIC driver's fault, but how could I confirm it? Do I have to
> >> debug the driver myself?
> >
> > I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
> > memory about an order of magnitude faster than the IOVAs alone, but it
> > should shed some light on whether DMA API usage looks suspicious, and
> > dumping the mappings should help track down the responsible driver(s).
> > Although the debugfs code doesn't show the stacktrace of where each
> > mapping was made, I guess it would be fairly simple to tweak that for a
> > quick way to narrow down where to start looking in an offending driver.
> >
> > Robin.
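(A small sketch of the kind of tweak Robin suggests: once CONFIG_DMA_API_DEBUG is enabled and debugfs is mounted, the per-device mapping counts can be tallied from the dump file. This assumes each line of /sys/kernel/debug/dma-api/dump starts with the device name, which matches mainline dma-debug output but is worth verifying against the running kernel; the helper name is mine:)

```python
from collections import Counter

def mappings_per_device(dump_text):
    """Count outstanding DMA mappings per device from a snapshot of
    /sys/kernel/debug/dma-api/dump. Assumes the first whitespace-
    separated field of each line is the device name."""
    counts = Counter()
    for line in dump_text.splitlines():
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts

# Usage (CONFIG_DMA_API_DEBUG=y, debugfs mounted, run as root):
#   snap = open("/sys/kernel/debug/dma-api/dump").read()
#   print(mappings_per_device(snap).most_common(5))
```

A device whose count keeps climbing across snapshots is the natural place to start looking for missing dma_unmap_* calls.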
>
> Just mentioning this in case it's relevant - we found that a long-term
> aging throughput test causes the RB tree to grow very large (and, I
> assume, eat lots of memory):
>
>
> https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/
>
> John
>
> >
> >> Robin Murphy wrote on Fri, Apr 24, 2020 at 8:15 PM:
> >>
> >>> On 2020-04-24 1:06 pm, Bin wrote:
> >>>> I'm not familiar with the mmu stuff, so by "some driver leaking
> >>>> DMA mappings", is it possible that some other kernel module, like
> >>>> KVM or a NIC driver, leads to the leaking problem instead of the
> >>>> iommu module itself?
> >>>
> >>> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> >>> should, since I'd expect a lot of people to have noticed that. It's far
> >>> more likely that some driver is failing to call dma_unmap_* when it's
> >>> finished with a buffer - with the IOMMU disabled that would be a no-op
> >>> on x86 with a modern 64-bit-capable device, so such a latent bug could
> >>> have been easily overlooked.
> >>>
> >>> Robin.
> >>>
> Bin wrote on Fri, Apr 24, 2020 at 8:00 PM:
>
> > Well, that's the problem! I'm assuming the iommu kernel module is
> > leaking memory. But I don't know why and how.
> >
> > Do you have any idea about it? Or any further information is needed?
> >
> > Robin Murphy wrote on Fri, Apr 24, 2020 at 7:20 PM:
> >
> >> On 2020-04-24 1:40 am, Bin wrote:
> >>> Hello? anyone there?
> >>>
> >>> Bin wrote on Thu, Apr 23, 2020 at 5:14 PM:
> >>>
> Forgot to mention: I've already disabled slab merging, so this is
> what it is.
>
> Bin wrote on Thu, Apr 23, 2020 at 5:11 PM:
>
> > Hey, guys:
> >
> > I'm running a batch of CoreOS boxes, the lsb_release is:
> >
> > ```
> > # cat /etc/lsb-release
> > DISTRIB_ID="Container Linux by CoreOS"
> > DISTRIB_RELEASE=2303.3.0
> > DISTRIB_CODENAME="Rhyolite"
> > DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
> > ```
> >
> > ```
> > # uname -a
> > Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
> > x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
> > ```
> > Recently, I found my VMs constantly being killed due to OOM, and
> > after digging into the problem, I finally realized that the kernel is
> > leaking memory.
> >
> > Here's my slabinfo:
> >
> > Active / Total Objects (% used)    : 83818306 / 84191607 (99.6%)
> > Active / Total Slabs (% used)      : 1336293 / 1336293 (100.0%)
> > Active / Total Caches (% used)     : 152 / 217 (70.0%)
> > Active / Total Size (% used)       : 5828768.08K / 5996848.72K (97.2%)
> > Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K
> >
> > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> >
> > 80253888 80253888 100% 0.06K 1253967 64 5015868K iommu_iova
> >>
> >> Do you really have a peak demand of ~80 million simultaneous DMA
> >> buffers, or is some driver leaking DMA mappings?
> >>
> >> Robin.
> >>
> > 489472 489123 99% 0.03K 3824 128 15296K kmalloc-32
> >
> > 297444 271112 91% 0.19K 7082 42 56656K dentry
> >
> > 254400 252784 99% 0.06K 3975 64 15900K anon_vma_chain
> >
> > 222528 39255 17% 0.50K 6954 32 111264K kmalloc-512
>
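(A quick sanity check on the iommu_iova row quoted above: the columns are internally consistent. 1253967 slabs at 64 objects per slab gives 80253888 objects, and with the listed 0.06K object size being a rounded 64 bytes, that works out to exactly the 5015868K shown:)

```python
# Recompute the iommu_iova CACHE SIZE column from the other columns:
# SLABS x OBJ/SLAB = OBJS, then OBJS x 64 bytes (0.06K rounded) in KiB.
slabs, objs_per_slab, obj_bytes = 1253967, 64, 64
total_objs = slabs * objs_per_slab
cache_kib = total_objs * obj_bytes // 1024
print(total_objs, cache_kib)  # 80253888 objects, 5015868K (~4.8 GiB)
```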