On Fri, Dec 2, 2022 at 9:14 PM Mike Pattrick <m...@redhat.com> wrote:
>
> On Fri, Dec 2, 2022 at 1:40 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >
> > On 12/2/22 18:59, Mike Pattrick wrote:
> > > On Fri, Dec 2, 2022 at 11:59 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> > >>
> > >> On 12/2/22 11:36, Maxime Coquelin wrote:
> > >>>
> > >>>
> > >>> On 12/2/22 11:09, David Marchand wrote:
> > >>>> On Wed, Nov 30, 2022 at 9:30 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> > >>>>>>>>> Shouldn't this be 0x7f instead?
> > >>>>>>>>> 0x3f doesn't enable bit #6, which is responsible for dumping
> > >>>>>>>>> shared huge pages.  Or am I missing something?
> > >>>>>>>>
> > >>>>>>>> That's a good point, the hugepage may or may not be private. I'll send
> > >>>>>>>> in a new one.
> > >>>>>>>
> > >>>>>>> OK.  One thing to think about though is that we'll grab
> > >>>>>>> VM memory, I guess, in case we have vhost-user ports.
> > >>>>>>> So, the core dump size can become insanely huge.
> > >>>>>>>
> > >>>>>>> The downside of not having them is inability to inspect
> > >>>>>>> virtqueues and stuff in the dump.
> > >>>>>>
> > >>>>>> Did you consider madvise()?
> > >>>>>>
> > >>>>>>         MADV_DONTDUMP (since Linux 3.4)
> > >>>>>>                Exclude from a core dump those pages in the range
> > >>>>>>                specified by addr and length.  This is useful in
> > >>>>>>                applications that have large areas of memory that are
> > >>>>>>                known not to be useful in a core dump.  The effect of
> > >>>>>>                MADV_DONTDUMP takes precedence over the bit mask that
> > >>>>>>                is set via the /proc/[pid]/coredump_filter file (see
> > >>>>>>                core(5)).
> > >>>>>>
> > >>>>>>         MADV_DODUMP (since Linux 3.4)
> > >>>>>>                Undo the effect of an earlier MADV_DONTDUMP.
> > >>>>>
> > >>>>> I don't think OVS actually knows location of particular VM memory
> > >>>>> pages that we do not need.  And dumping virtqueues and stuff is,
> > >>>>> probably, the point of this patch (?).
> > >>>>>
> > >>>>> vhost-user library might have a better idea on which particular parts
> > >>>>> of the memory guest may use for virtqueues and buffers, but I'm not
> > >>>>> 100% sure.
> > >>>>
> > >>>> Yes, distinguishing hugepages of interest is a problem.
> > >>>>
> > >>>> Since v20.05, DPDK mem allocator takes care of excluding (unused)
> > >>>> hugepages from dump.
> > >>>> So with this OVS patch, if we catch private and shared hugepages,
> > >>>> "interesting" DPDK hugepages will get dumped, which is useful for
> > >>>> debugging post mortem.
> > >>>>
> > >>>> Adding Maxime, who will have a better idea of what is possible for the
> > >>>> guest mapping part.
> > >>>>
> > >>>>
> > >>>
> > >>> I wonder if we could do a MADV_DONTDUMP on all the guest memory at mmap
> > >>> time, then there are two cases:
> > >>>   a. vIOMMU = OFF. In this case we could do MADV_DODUMP on virtqueues
> > >>> memory. Doing so, we would have the rings memory, but not their buffers
> > >>> (except if they are located on same hugepages).
> > >>>   b. vIOMMU = ON. In this case we could do MADV_DODUMP on IOTLB_UPDATE
> > >>> new entries and MADV_DONTDUMP on invalidated entries. Doing so we will
> > >>> get both vrings and their buffers the backend is allowed to access.
> > >>
> > >> I guess, while DONTDUMP calls are mainly harmless, the explicit DODUMP
> > >> will override whatever user had in their global configuration.  Meaning
> > >> every DPDK application with vhost ports will start dumping some of the
> > >> guest pages with no actual ability to turn that off.
> > >
> > > I initially thought it would work that way, but the DODUMP flag just
> > > disables the DONTDUMP flag.
> > >
> > > https://github.com/torvalds/linux/blob/master/mm/madvise.c#L1055
> > > https://github.com/torvalds/linux/blob/master/fs/coredump.c#L1033
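
To make the proposal above concrete, here is a rough sketch of the
"DONTDUMP everything at mmap time, then DODUMP the parts we care about"
idea. It is only an illustration under assumptions: it uses Python's mmap
module on Linux with a plain anonymous mapping standing in for guest
memory, and the helper names (map_guest_region, expose_range, hide_range)
are made up here, not anything from the vhost library. As Mike points out,
MADV_DODUMP only clears the mark left by MADV_DONTDUMP, so the
coredump_filter mask still decides whether those pages end up in the dump.

```python
import mmap

PAGE = mmap.PAGESIZE


def _check_aligned(offset, length):
    # madvise() needs a page-aligned start; we also require an aligned
    # length here to keep the example unambiguous.
    if offset % PAGE or length % PAGE:
        raise ValueError("offset and length must be page-aligned")


def map_guest_region(length):
    """Map a stand-in for guest memory and mark all of it as don't-dump."""
    # Assumption: an anonymous mapping replaces the real guest hugepage mmap.
    region = mmap.mmap(-1, length)
    region.madvise(mmap.MADV_DONTDUMP)
    return region


def expose_range(region, offset, length):
    """Clear the don't-dump mark on a sub-range (e.g. a vring or a new
    IOTLB entry).  The coredump_filter mask still applies afterwards."""
    _check_aligned(offset, length)
    region.madvise(mmap.MADV_DODUMP, offset, length)


def hide_range(region, offset, length):
    """Mark a sub-range as don't-dump again (e.g. an invalidated entry)."""
    _check_aligned(offset, length)
    region.madvise(mmap.MADV_DONTDUMP, offset, length)
```

With vIOMMU off, expose_range() would be called once per virtqueue; with
vIOMMU on, expose_range()/hide_range() would follow IOTLB updates and
invalidations.
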
> >
> > Hmm, interesting.  Makes sense.
> >
> > Thanks for the pointers!
> >
> > So, it should still be 7f regardless in the coredump filter for OVS, right?
> > Do you plan to update the current patch or do you think we should omit
> > shared pages until support for MADV_DO/DONTDUMP is added to vhost library?
> >
> > Note that this will likely not be available in 22.11 as it's not a bug fix.
> > So, 23.11 at the earliest.
> >
> > Basically 2 options:
> >
> > 1. 0x3f and not having shared pages.  Flip to 0x7f with DPDK 23.11 next year.
> >    Pros: Smaller files
> >    Cons: Missing some of the virtqueue memory until [potentially] DPDK 23.11.

Mm, if someone still has some --socket-mem config, then I guess shared
hugepages will be in use in DPDK.


> >
> > 2. 0x7f today.
> >    Pros: All the memory is available.
> >    Cons: [Significantly] larger files until [potentially] DPDK 23.11.
> >
> > What do you think?  David, Maxime?
>
> I'd prefer 7f today. It's disabled by default, has zero impact on end
> users, makes setting up debugging environments more convenient, and on
> distributions with systemd the larger coredumps are managed somewhat
> automatically. The news item already warns about large coredumps.

I also prefer the latter option (0x7f today).
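
For reference, a small sketch of what the two masks select, going by the
bit table in core(5). The dictionary and function names are illustrative
only, and only bits 0 through 6 are listed since those are the ones
discussed in this thread.

```python
# coredump_filter bit meanings as documented in core(5), bits 0..6 only.
COREDUMP_FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
}


def decode_filter(mask):
    """Return the names of the mapping types selected by a filter mask."""
    return [
        name
        for bit, name in sorted(COREDUMP_FILTER_BITS.items())
        if mask & (1 << bit)
    ]


def added_by(new_mask, old_mask):
    """Return the mapping types that new_mask dumps and old_mask does not."""
    return decode_filter(new_mask & ~old_mask)


if __name__ == "__main__":
    print("0x3f:", decode_filter(0x3F))
    print("0x7f adds:", added_by(0x7F, 0x3F))
```

The only difference between the two values is bit 6, the shared huge
pages.
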


-- 
David Marchand

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev