OK. That gives me something to investigate. I'll see what I can find out.
You're running 64-bit kernel and userspace, x86-64?

On Tue, Mar 05, 2019 at 08:42:14PM +0400, Oleg Bondarev wrote:
> On Tue, Mar 5, 2019 at 7:26 PM Ben Pfaff <b...@ovn.org> wrote:
>
> > You're talking about the email where you dumped out a repeating sequence
> > from some blocks? That might be the root of the problem, if you can
> > provide some more context. I didn't see from the message where you
> > found the sequence (was it just at the beginning of each of the 4 MB
> > blocks you reported separately, or somewhere else), how many copies of
> > it, or if you were able to figure out how long each of the blocks was.
> > If you can provide that information I might be able to learn some
> > things.
> >
>
> Yes, those were beginnings of 0x4000000 size blocks reported by the script.
> I also checked 0x8000000 blocks reported and the content is the same.
> Examples of how those blocks end:
> - https://pastebin.com/D9M6T2BA
> - https://pastebin.com/gNT7XEGn
> - https://pastebin.com/fqy4XDbN
>
> So basically contents of the blocks are sequences of:
>
> *00000020: 0000 0000 0000 0000 6500 0000 0000 0000 ........e.......*
> *00000030: 0000 0000 0000 4014 0000 0000 0000 0000 ......@.........*
> *00000040: 0000 0000 0000 0000 fa16 3e2b c5d5 0000 ..........>+....*
> *00000050: 0000 0022 0000 0000 0000 0000 0000 4014 ..."..........@.*
> *00000060: 0000 0000 0000 0000 0000 0000 ffff ffff ................*
> *00000070: ffff ffff ffff 0000 0000 0fff 0000 0000 ................*
>
> following each other and sometimes separated by sequences like this:
>
> *00001040: 6861 6e64 6c65 7232 3537 0000 0000 0000 handler257......*
>
> I ran the scripts against several core dumps of several compute nodes
> with the issue and the picture is pretty much the same: 0x4000000 blocks
> and less 0x8000000 blocks.
> I checked the core dump from a compute node where OVS memory consumption
> was ok: no such block sizes reported.
>
> > On Tue, Mar 05, 2019 at 09:07:55AM +0400, Oleg Bondarev wrote:
> > > Hi Ben,
> > >
> > > I didn't have a chance to debug the scripts yet, but just in case you
> > > missed my last email with examples of repeatable blocks
> > > and sequences - do you think we still need to analyze further, will the
> > > scripts tell more about the heap?
> > >
> > > Thanks,
> > > Oleg
> > >
> > > On Thu, Feb 28, 2019 at 10:14 PM Ben Pfaff <b...@ovn.org> wrote:
> > >
> > > > On Tue, Feb 26, 2019 at 01:41:45PM +0400, Oleg Bondarev wrote:
> > > > > Hi,
> > > > >
> > > > > thanks for the scripts, so here's the output for a 24G core dump:
> > > > > https://pastebin.com/hWa3R9Fx
> > > > > there's 271 entries of 4MB - does it seem something we should take a
> > > > > closer look at?
> > > >
> > > > I think that this output really just indicates that the script failed.
> > > > It analyzed a lot of regions but didn't output anything useful. If it
> > > > had worked properly, it would have told us a lot about data blocks that
> > > > had been allocated and freed.
> > > >
> > > > The next step would have to be to debug the script. It definitely
> > > > worked for me before, because I have fixed at least 3 or 4 bugs based on
> > > > it, but it also definitely is a quick hack and not something that I can
> > > > stand behind. I'm not sure how to debug it at a distance. It has a
> > > > large comment that describes what it's trying to do. Maybe that would
> > > > help you, if you want to try to debug it yourself. I guess it's also
> > > > possible that glibc has changed its malloc implementation; if so, then
> > > > it would probably be necessary to start over and build a new script.