Re: POC: Extension for adding distributed tracing - pg_tracing

Heikki Linnakangas Fri, 02 Feb 2024 04:41:42 -0800

Hi Anthonin,

I'm only now catching up on this thread. Very exciting feature!

My first observation is that you were able to implement this purely asan extension, without any core changes. That's very cool, because:

- You don't need buy-in or blessing from anyone else. You can just putthis on github and people can immediately start using it.- As an extension, it's not tied to the PostgreSQL release cycle. Userscan immediately start using it with existing PostgreSQL versions.

There are benefits to being in contrib/ vs. an extension outside thePostgreSQL source tree, but there are significant downsides too.Out-of-tree, you get more flexibility, and you can develop much faster.I think this should live out-of-tree, at least until it's relativelystable. By stable, I mean "not changing much", not that it's bug-free.


# Core changes

That said, there are a lot of things that would make sense to integratein PostgreSQL itself. For example:

- You're relying on various existing hooks, but it might be better tohave the 'begin_span'/'end_span' calls directly in PostgreSQL sourcecode in the right places. There are more places where spans might benice, like when performing a large sort in tuplesort.c to name oneexample. Or even across every I/O; not sure how much overhead the spansincur and how much we'd be willing to accept. 'begin_span'/'end_span'could be new hooks that normally do nothing, but your extension wouldimplement them.

- Other extensions could include begin_span/end_span calls too. (It'scomplicated for an extension to call functions in another extension.)

- Extensions like postgres_fdw could get the tracing context andpropagate it to the remote system too.

- How to pass the tracing context from the client? You went withSQLCommenter and others proposed a GUC. A protocol extension would makesense too. I can see merits to all of those. It probably would be goodto support multiple mechanisms, but some might need core changes. Itmight be good to implement the mechanism for accepting trace context incore. Without the extension, we could still include the trace ID in thelogs, for example.


# Further ideas

Some ideas on cool extra features on top of this:

- SQL functions to begin/end a span. Could be handy if you havecomplicated PL/pgSQL functions, for example.

- Have spans over subtransactions.

- Include elog() messages in the traces. You might want to have a lowerelevel for what's included in traces; something like theclient_min_messages and log_min_messages settings.

- Include EXPLAIN plans in the traces.

# Comments on the implementation

There was discussion already on push vs pull model. Currently, youcollect the traces in memory / files, and require a client to poll them.A push model is more common in applications that support tracing. IfPostgres could connect directly to the OTEL collector, you'd need onefewer running component. You could keep the traces in backend-privatememory, no need to deal with shared memory and spilling to files.

Admittedly supporting the OTEL wire protocol is complicated unless youuse an existing library. Nevertheless, it might be a better tradeoff.OTEL/HTTP with JSON format seems just about feasible to implement fromscratch. Or bite the bullet and use some external library. If this livesas an extension, you have more freedom to rely on 3rd party libraries.

(If you publish this as an out-of-tree extension, then this is up toyou, of course, and doesn't need consensus here on pgsql-hackers)


# Suggested next steps with this

Here's how I'd suggest to proceed with this:

1. Publish this as an extension on github, as it is. I think you'll geta lot of immediate adoption.

2. Start a new pgsql-hackers thread on in-core changes that wouldbenefit the extension. Write patches for them. I'm thinking of thethings I listed in the Core changes section above, but maybe there areothers.



PS. Does any other DBMS implement this? Any lessons to be learned from them?

--
Heikki Linnakangas
Neon (https://neon.tech)

Re: POC: Extension for adding distributed tracing - pg_tracing

Reply via email to