Hi Alexander,
On Thu, Dec 11, 2025 at 04:53:02PM +0000, Stephan, Alexander wrote:
> Dear list,
>
> I've recently opened an issue regarding a limitation of the "show proc"
> command, where the output is cut off after roughly 200 entries:
> https://github.com/haproxy/haproxy/issues/3204
> This has been fixed in
> https://github.com/haproxy/haproxy/commit/594408cd612b5d80b7120d8a4f157f928ed38710
> by William L., thanks again!
> I found that workers sharing the same timestamp could still be split
> across flushes: if a flush occurred mid-group and next_uptime advanced to
> that timestamp, resuming would include part of the group twice (duplicates).
> Setting ctx->next_uptime to a group's timestamp and then filtering with
> timestamp >= next_uptime re-includes that group when resuming, which causes
> duplicates unless you advance next_uptime past the group (or only update it
> on boundaries and ensure the group isn't split).
> To address this, I have included a bug fix that prevents these duplicate
> entries. It works by grouping the entries by timestamp: same-second workers
> are emitted contiguously.
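(For illustration, the problematic pattern described above looks roughly
like this; the names are assumed for the sketch, not taken from the
actual patch:

	list_for_each_entry(child, &proc_list, list) {
		if (child->timestamp < ctx->next_uptime)
			continue; /* filter: timestamp >= next_uptime */
		/* dump one entry, possibly yielding on a full buffer */
		...
		ctx->next_uptime = child->timestamp;
	}

After a flush, every worker whose timestamp equals the saved next_uptime
passes the ">=" filter again, so a same-second group that was split by
the flush gets partially re-emitted.)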
Thanks for your explanations and cleanups. I agree that overall this
looks cleaner after your patches, but I'll let William review them as
I'm not the best qualified in this area. However, I have one comment
about the issue you reported above. If it indeed happens that multiple
processes share the same timestamp (two reloads in the same second),
then to me it simply means that this timestamp is not the correct
enumeration key, since it is not a discriminator. And generally speaking,
timestamps are only usable as a restart point when used in batches, like
we do in the scheduler or like you're indeed doing in your approach, and
when they're guaranteed to be monotonic, which is not even the case here
if the system time changes between reloads. So then why not just save a
pointer to the child itself instead of the uptime, i.e. something along
the lines below:
	struct cli_showproc_ctx {
		int debug;
		struct mworker_proc *next_child;
	};
	...
	if (!ctx->next_child) {
		/* print header */
		...
		/* start at the first entry: list_for_each_entry_from()
		 * begins at the item itself, so pointing it at the head
		 * would end the loop immediately.
		 */
		ctx->next_child = LIST_ELEM(proc_list.n, typeof(*ctx->next_child), list);
	}
	list_for_each_entry_from(ctx->next_child, &proc_list, list) {
		/* dump one entry */
		child = ctx->next_child;
		...
	}
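For completeness, the resume logic would then follow the usual applet
yield pattern, i.e. the same loop with the yield spelled out, roughly
(a sketch from memory, untested; applet_putchk() returns -1 when the
output buffer is full, without consuming the chunk):

	list_for_each_entry_from(ctx->next_child, &proc_list, list) {
		child = ctx->next_child;
		chunk_printf(&trash, "..."); /* format one entry */
		if (applet_putchk(appctx, &trash) == -1)
			return 0; /* full buffer: yield, resume here */
	}
	return 1; /* dump complete */

Since the entry that didn't fit was never emitted, re-entering at the
same next_child simply rebuilds the chunk and retries it.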
Or can a process disappear from the list during the dump? (I don't know
if the list is adjusted when a process quits.) If so, then we'd need
another index. It *seems* to me that the number of reloads is strictly
monotonic, since we never have more than one process per reload, so we
could use the reload counter and start from the highest possible value
(see the sketch below). And if for any reason that still causes a
problem, we could just add an index to the structure. We're only
speaking about 4 bytes per process... that's even less than the extra
code needed to deal with the ambiguity of the timestamp.
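For example, something like this rough sketch (assuming each
mworker_proc carries its own distinct ->reloads counter; it makes no
assumption on the ordering of the list, at the cost of rescanning it,
which is harmless for a handful of processes):

	struct cli_showproc_ctx {
		int debug;
		int next_reload; /* dump entries below this, starts at INT_MAX */
	};
	...
	while (1) {
		struct mworker_proc *best = NULL;

		/* pick the not-yet-dumped entry with the highest reload count */
		list_for_each_entry(child, &proc_list, list) {
			if (child->reloads < ctx->next_reload &&
			    (!best || child->reloads > best->reloads))
				best = child;
		}
		if (!best)
			break; /* all entries dumped */
		/* dump <best> here, yielding on a full buffer */
		...
		ctx->next_reload = best->reloads;
	}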
> In theory, there is still a problem left with missing entries if there is a
> reload during the dump, which was also mentioned by William in the issue,
> but that is out of scope for now, I think.
William mentioned this to me, and there's indeed no simple solution to
this. The reason is simple: you never want a connected CLI to prevent
an old process from leaving, otherwise any monitoring process that
remains connected blocks your reloads. We could imagine disconnecting
only between commands, i.e. at the prompt, but then there's also the
case where an old process is actively waiting on a command (just think
about "show events -w" used to wait for new logs). So if my memory serves
me right, at the moment this is handled by not counting CLI sessions as
"jobs", i.e. important tasks that prevent the process from quitting.
Other approaches could be imagined, but here it's clearly visible that
waiting for new logs is an example of a situation where even reaching
the end of the log buffer is not a sufficient condition for quitting.
Some might consider that this, combined with jobs==0, could possibly
work, but honestly I don't know whether that's sufficient nor whether
it would always stop. You could for example imagine someone accessing
the CLI over TLS via a local TLS offloader, meaning that you suddenly
have one job per CLI connection, etc. So this is indeed a much more
complex topic that's outside the scope of just this command.
Cheers,
Willy