Hi, On 14.06.2017 18:39, Mathieu Desnoyers wrote: >> On 14.06.2017 17:12, Mathieu Desnoyers wrote: >>> Can you provide a copy of the metadata file ? And ideally the data >>> streams too ? This would give us a better idea of what is happening. >>> >>> Do you perform kernel or user-space tracing ? Do you trace huge >>> sequences of bytes within your own tracepoints ? >> I perform kernel traceing only, in this case limited to syscalls, >> sched*, block* and irq*. No user-space tracepoints. >> >> I didn't know the metadata file was plain text, I had a quick look into >> it and noticed corruption already, with random garbage data inserted all >> over the place. I'm surprised babeltrace didn't choke on the metadata >> already. > The lttng metadata is "packetized plain-text". What you see is plain-text in > a transport layer which is binary. This explains the "garbage" you see: > those are binary headers for packets. Use babeltrace -o ctf-metadata > to extract the text-only metadata (which is also valid metadata under CTF). > Both packetized and pure text metadata are allowed.
Upps, right, red herring then. I've read the overview of the CTF format now (great docs) and understand things a bit better. >> Any idea what can cause the corrupted trace? > Based on your babeltrace backtrace, the possible culprits would be the > events that have a sequence (variable-sized array): > > syscalls: select, poll, ppoll, pselect6, epoll_wait, epoll_pwait > > block_rq_issue, block_rq_insert, block_rq_complete, block_rq_requeue, > block_rq_abort. > > There are a few approaches to cornering the issue. You can try reproducing > on your workload/config by only enabling one of these events at a time. > Just knowing which event(s) is/are the culprit would be a good start. > > Another possibility would be to send us a trace reproducing the issue > with only those events enabled, which should not contain confidential > info about your system. I've added some debug statements to babeltrace now. The culprit in this particular case is the first block_rq_complete event, the __cmd_length field contains a large value (3040877592). __cmd_length is used as the length for the _cmd sequence, and then of course allocating space for that sequence fails. Any idea what can cause __cmd_length to be bogus? For reference, this is the event declaration: event { name = "block_rq_complete"; id = 4; stream_ fields := struct { integer { size = 32; align = 32; signed = 0; encoding = none; base = 10; } _dev; integer { size = 64; align = 64; signed = 0; encoding = none; base = 10; } _sector; integer { size = 32; align = 32; signed = 0; encoding = none; base = 10; } _nr_sector; integer { size = 32; align = 32; signed = 1; encoding = none; base = 10; } _errors; integer { size = 32; align = 32; signed = 0; encoding = none; base = 10; } _rwbs; integer { size = 32; align = 32; signed = 0; encoding = none; base = 10; } __cmd_length; integer { size = 8; align = 8; signed = 0; encoding = none; base = 16; } _cmd[ __cmd_length ]; }; }; Also, does anyone know which block_rq_ events are actually needed to display the IO load in TraceCompass? Regards, Thomas -- Thomas McGuire | thomas.mcgu...@kdab.com | Senior Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts
smime.p7s
Description: S/MIME Cryptographic Signature
_______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev