On Fri, Dec 2, 2022 at 5:37 AM Maxime Coquelin
<maxime.coque...@redhat.com> wrote:
>
>
>
> On 12/2/22 11:09, David Marchand wrote:
> > On Wed, Nov 30, 2022 at 9:30 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>>>> Shouldn't this be 0x7f instead?
> >>>>>> 0x3f doesn't enable bit #6, which is responsible for dumping
> >>>>>> shared huge pages.  Or am I missing something?
> >>>>>
> >>>>> That's a good point, the hugepage may or may not be private. I'll send
> >>>>> in a new one.
> >>>>
> >>>> OK.  One thing to think about though is that we'll grab
> >>>> VM memory, I guess, in case we have vhost-user ports.
> >>>> So, the core dump size can become insanely huge.
> >>>>
> >>>> The downside of not having them is inability to inspect
> >>>> virtqueues and stuff in the dump.
> >>>
> >>> Did you consider madvise()?
> >>>
> >>>        MADV_DONTDUMP (since Linux 3.4)
> >>>               Exclude from a core dump those pages in the range
> >>>               specified by addr and length.  This is useful in
> >>>               applications that have large areas of memory that are
> >>>               known not to be useful in a core dump.  The effect of
> >>>               MADV_DONTDUMP takes precedence over the bit mask that
> >>>               is set via the /proc/[pid]/coredump_filter file (see
> >>>               core(5)).
> >>>
> >>>        MADV_DODUMP (since Linux 3.4)
> >>>               Undo the effect of an earlier MADV_DONTDUMP.
> >>
> >> I don't think OVS actually knows the location of the particular VM
> >> memory pages that we do not need.  And dumping virtqueues and stuff
> >> is, probably, the point of this patch (?).
> >>
> >> The vhost-user library might have a better idea of which particular
> >> parts of the memory the guest may use for virtqueues and buffers,
> >> but I'm not 100% sure.
> >
> > Yes, distinguishing hugepages of interest is a problem.
> >
> > Since v20.05, the DPDK memory allocator has taken care of excluding
> > (unused) hugepages from dumps.
> > So with this OVS patch, if we catch private and shared hugepages, the
> > "interesting" DPDK hugepages will get dumped, which is useful for
> > post-mortem debugging.
> >
> > Adding Maxime, who will have a better idea of what is possible for the
> > guest mapping part.
> >
> >
>
> I wonder if we could do a MADV_DONTDUMP on all the guest memory at mmap
> time; then there are two cases:
>    a. vIOMMU = OFF. In this case we could do MADV_DODUMP on the
> virtqueues' memory. Doing so, we would have the rings' memory, but not
> their buffers (except if they are located on the same hugepages).
>    b. vIOMMU = ON. In this case we could do MADV_DODUMP on IOTLB_UPDATE
> new entries and MADV_DONTDUMP on invalidated entries. Doing so, we
> would get both the vrings and the buffers the backend is allowed to
> access.
>
> I can prepare a PoC quickly if someone is willing to experiment.
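
For reference, the coredump_filter bit values at issue upthread, per
core(5). The echo at the end is purely illustrative of how the filter
is set from a shell; it is not part of the patch:

```shell
# coredump_filter bits (from core(5)):
#   bit 5 (0x20) = hugetlb private pages
#   bit 6 (0x40) = hugetlb shared pages
old=0x3f   # value from the original patch: bits 0-5 only
new=0x7f   # also dumps shared hugetlb pages (bit 6)
echo $(( new & ~old ))    # -> 64, i.e. bit 6 is the only difference
# Illustrative only: apply to the current process from a shell.
# echo 0x7f > /proc/self/coredump_filter
```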

A big motivation for this patch is that DPDK memory becomes a big black
hole in coredumps, preventing even netdev structures from being
enumerated.

I threw together a small PoC for this yesterday, marking only the vhost
mmaps as DONTDUMP. With a simple OVS configuration and one vhost port
attached to an 8 GB VM, the zstd-compressed coredump (including shared
huge pages) dropped from 486.4M to 19.8M. The resulting coredump was
also significantly more debuggable: all internal OVS data structures
became available, as well as many DPDK ones.

I'll look at cleaning things up and submitting to the DPDK mailing list.


Cheers,
M

>
> Regards,
> Maxime
>
>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev