Re: Extensibility of the PostgreSQL wire protocol

2021-03-04 Thread Jan Wieck

On 3/4/21 7:38 PM, Hannu Krosing wrote:

On Thu, Mar 4, 2021 at 9:55 PM Jan Wieck  wrote:

but the whole thing was developed that way from the beginning and
it is working. I don't have a definitive date when that code will be
presented. Kuntal or Prateek may be able to fill in more details.


Are you really fully replacing the main loop, or are you running a second
main loop in parallel in the same database server instance, perhaps as
a separate TDS_postmaster backend ?

Will the data still also be accessible "as postgres" via port 5432 when
TDS/SQLServer support is active ?


The individual backend (session) is running a different main loop. A 
libpq based client will still get the regular libpq and the original 
PostgresMain() behavior on port 5432. The default port for TDS is 1433 
and with everything in place I can connect to the same database on that 
port with Microsoft's SQLCMD.


The whole point of all this is to allow the postmaster to listen to more 
than just 5432 and have different communication protocols on those 
*additional* ports. Nothing is really *replaced*. The parts of the 
backend, that do actual socket communication, are just routed through 
function pointers so that an extension can change their behavior.



Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Re: Extensibility of the PostgreSQL wire protocol

2021-03-04 Thread Hannu Krosing
On Thu, Mar 4, 2021 at 9:55 PM Jan Wieck  wrote:
>
> Another possibility, and this is what is being used by the AWS team
> implementing the TDS protocol for Babelfish, is to completely replace
> the entire TCOP mainloop function PostgresMain().

I suspect this is the only reasonable way to do it for protocols which are
not very close to libpq.

> That is of course a
> rather drastic move and requires a lot more coding on the extension
> side,

Not necessarily - if the new protocol is close to existing one, then it is
copy/paste + some changes.

If it is radically different, then trying to fit it into the current
mainloop will
be even harder than writing from scratch.

And will very likely fail in the end anyway :)

> but the whole thing was developed that way from the beginning and
> it is working. I don't have a definitive date when that code will be
> presented. Kuntal or Prateek may be able to fill in more details.

Are you really fully replacing the main loop, or are you running a second
main loop in parallel in the same database server instance, perhaps as
a separate TDS_postmaster backend ?

Will the data still also be accessible "as postgres" via port 5432 when
TDS/SQLServer support is active ?




Re: Extensibility of the PostgreSQL wire protocol

2021-03-04 Thread Jan Wieck

On 3/3/21 2:43 PM, Peter Eisentraut wrote:


I think, the way the abstractions are chosen in this patch, it is still
very much tied to how the libpq protocol works.  For example, there is a
cancel key and a ready-for-query message.  Some of the details of the
simple and the extended query are exposed.  So you could create a
protocol that has a different way of encoding the payloads, as your
telnet example does, but I don't believe that you could implement a
competitor's protocol through this.  Unless you have that done and want
to show it?



Correct, a lot of what this patch does is to allow a developer of such 
protocol extension to just "extend" what the server side libpq does, by 
building a wrapper around the function they are interested in. That 
doesn't change the protocol, but rather allows additional functionality 
like the telemetry data gathering, Fabrizio was talking about.


The telnet_srv tutorial extension (which needs more documentation) is an 
example of how far one can go by replacing those funcitons, in that it 
actually implements a very different wire protocol. This one still fits 
into the regular libpq message flow.


Another possibility, and this is what is being used by the AWS team 
implementing the TDS protocol for Babelfish, is to completely replace 
the entire TCOP mainloop function PostgresMain(). That is of course a 
rather drastic move and requires a lot more coding on the extension 
side, but the whole thing was developed that way from the beginning and 
it is working. I don't have a definitive date when that code will be 
presented. Kuntal or Prateek may be able to fill in more details.



Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Re: Extensibility of the PostgreSQL wire protocol

2021-03-03 Thread Hannu Krosing
I have not looked at the actual patch, but does it allow you to set up
its own channels to listen to ?

For example if I'd want to set up a server to listen to incoming connections
over QUIC [1] - a protocol which create a connection over UDP and allows
clients to move to new IP addresses (among other things) then would the
current extensibility proposal cover this ?

Maybe a correct approach would be to just start up a separate
"postmaster" to listen to a different protocol ?

[1] https://en.wikipedia.org/wiki/QUIC

Cheers
Hannu




Re: Extensibility of the PostgreSQL wire protocol

2021-03-03 Thread Peter Eisentraut



I think, the way the abstractions are chosen in this patch, it is still 
very much tied to how the libpq protocol works.  For example, there is a 
cancel key and a ready-for-query message.  Some of the details of the 
simple and the extended query are exposed.  So you could create a 
protocol that has a different way of encoding the payloads, as your 
telnet example does, but I don't believe that you could implement a 
competitor's protocol through this.  Unless you have that done and want 
to show it?





Re: Extensibility of the PostgreSQL wire protocol

2021-02-24 Thread Jan Wieck

On 2/19/21 10:13 AM, Jan Wieck wrote:


Give the function, that postmaster is calling to accept a connection
when a server_fd is ready, a return code that it can use to tell
postmaster "forget about it, don't fork or do anything else with it".
This function is normally calling StreamConnection() before the
postmaster then forks the backend. But it could instead hand over the
socket to the pool background worker (I presume Jonah is transferring
them from process to process via UDP packet). The pool worker is then
launching the actual backends which receive a requesting client via the
same socket transfer to perform one or more transactions, then hand the
socket back to the pool worker.


The function in question, which is StreamConnection() and with this 
patch can be replaced with an extension funtion via the fn_accept 
pointer, already has that capability. If StreamConnection() or its 
replacement returns a NULL pointer, the postmaster just skips calling 
BackendStartup(). So everything is already in place for the above to work.



Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Re: Extensibility of the PostgreSQL wire protocol

2021-02-22 Thread Jan Wieck

On 2/22/21 1:01 PM, Tom Lane wrote:

Yeah, and as I pointed out somewhere upthread, trying to replace the
whole parser will just end in a maintenance nightmare.  The constructs
that the parser has to emit are complex, Postgres-specific, and
constantly evolving.  We are NOT going to promise any sort of cross
version compatibility for parse trees.


Absolutely agreed. We cannot promise that the parsetree generated in one 
version will work with the planner, optimizer and executor of the next. 
These types of projects will need to pay close attention and more 
importantly, develop their own regression test suites that detect when 
something has changed in core. That said, discussion about the parser 
hook should happen in the other thread.


I don't even expect that we can guarantee that the functions I am trying 
to allow to be redirected for the wire protocol will be stable forever. 
libpq V4 may need to change some of the call signatures, which has 
happened before. For example, the function to send the command 
completion message to the frontend (tcop/dest.c EndCommand()) changed 
from 12 to 13.



Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Re: Extensibility of the PostgreSQL wire protocol

2021-02-22 Thread Jonah H. Harris
On Mon, Feb 22, 2021 at 1:01 PM Tom Lane  wrote:

> Jan Wieck  writes:
> > As Jim Mlodgenski just posted in [0], having the ability to also extend
> > and/or replace the parser will give them the ability to do just that.
>
> Yeah, and as I pointed out somewhere upthread, trying to replace the
> whole parser will just end in a maintenance nightmare.  The constructs
> that the parser has to emit are complex, Postgres-specific, and
> constantly evolving.  We are NOT going to promise any sort of cross
> version compatibility for parse trees.
>

Wholeheartedly agreed. Core should only ever maintain the hooks, never
their usage. It's the responsibility of the extension author to maintain
their code just as it is to manage their use of all other hook usages. Yes,
it's sometimes a maintenance nightmare - but with great power comes great
responsibility... as is anything loaded directly into the process.

-- 
Jonah H. Harris


Re: Extensibility of the PostgreSQL wire protocol

2021-02-22 Thread Tom Lane
Jan Wieck  writes:
> On 2/10/21 1:10 PM, Tom Lane wrote:
>> What I'm actually more concerned about, in this whole line of development,
>> is the follow-on requests that will surely occur to kluge up Postgres
>> to make its behavior more like $whatever.  As in "well, now that we
>> can serve MySQL clients protocol-wise, can't we pretty please have a
>> mode that makes the parser act more like MySQL".

> Those requests will naturally follow. But I don't see it as the main 
> project's responsibility to satisfy them. It would be rather natural to 
> develop the two things together. The same developer or group of 
> developers, who are trying to connect a certain client, will want to 
> have other compatibility features.

> As Jim Mlodgenski just posted in [0], having the ability to also extend 
> and/or replace the parser will give them the ability to do just that.

Yeah, and as I pointed out somewhere upthread, trying to replace the
whole parser will just end in a maintenance nightmare.  The constructs
that the parser has to emit are complex, Postgres-specific, and
constantly evolving.  We are NOT going to promise any sort of cross
version compatibility for parse trees.

regards, tom lane




Re: Extensibility of the PostgreSQL wire protocol

2021-02-22 Thread Jan Wieck

On 2/10/21 1:10 PM, Tom Lane wrote:

What I'm actually more concerned about, in this whole line of development,
is the follow-on requests that will surely occur to kluge up Postgres
to make its behavior more like $whatever.  As in "well, now that we
can serve MySQL clients protocol-wise, can't we pretty please have a
mode that makes the parser act more like MySQL".


Those requests will naturally follow. But I don't see it as the main 
project's responsibility to satisfy them. It would be rather natural to 
develop the two things together. The same developer or group of 
developers, who are trying to connect a certain client, will want to 
have other compatibility features.


As Jim Mlodgenski just posted in [0], having the ability to also extend 
and/or replace the parser will give them the ability to do just that.



Regards, Jan

[0] 
https://www.postgresql.org/message-id/CAB_5SReoPJAPO26Z8+WN6ugfBb2UDc3c21rRz9=bzibmcap...@mail.gmail.com



--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Re: Extensibility of the PostgreSQL wire protocol

2021-02-22 Thread David Fetter
On Mon, Feb 22, 2021 at 07:34:51AM -0500, Dave Cramer wrote:
> On Fri, 19 Feb 2021 at 15:39, Álvaro Hernández  wrote:
> 
> > On 19/2/21 19:30, Jan Wieck wrote:
> > > [...]
> > >
> > > I also am not sure if building a connection pool into a
> > > background worker or postmaster is a good idea to begin with.
> > > One of the important features of a pool is to be able to suspend
> > > traffic and make the server completely idle to for example be
> > > able to restart the postmaster without forcibly disconnecting
> > > all clients. A pool built into a background worker cannot do
> > > that.
> 
> Yes, when did it become a good idea to put a connection pooler in
> the backend???

It became a great idea when we noticed just how large and
resource-intensive backends were, especially in light of applications'
broad tendency to assume that they're free. While I agree that that's
not a good assumption, it's one that's so common everywhere in
computing that we really need to face up to the fact that it's not
going away any time soon.

Decoupling the parts that serve requests from the parts that execute
queries also goes a long way toward things we've wanted for quite
awhile like admission control systems and/or seamless zero-downtime
upgrades.

Separately, as the folks at AWS and elsewhere have mentioned, being
able to pretend at some level to be a different RDBMS can only happen
if we respond to its wire protocol.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Extensibility of the PostgreSQL wire protocol

2021-02-22 Thread Jan Wieck

On 2/19/21 4:36 AM, Kuntal Ghosh wrote:

On Thu, Feb 18, 2021 at 9:32 PM Jan Wieck  wrote:



Few comments in the extension code (although experimental):

1. In telnet_srv.c,
+ static intpe_port;
..
+   DefineCustomIntVariable("telnet_srv.port",
+   "Telnet server port.",
+   NULL,
+   &pe_port,
+   pe_port,
+   1024,
+   65536,
+   PGC_POSTMASTER,
+   0,
+   NULL,
+   NULL,
+   NULL);

The variable pe_port should be initialized to a value which is > 1024
and < 65536. Otherwise, the following assert will fail,
TRAP: FailedAssertion("newval >= conf->min", File: "guc.c", Line:
5541, PID: 12100)

2. The function pq_putbytes shouldn't be used by anyone other than
old-style COPY out.
+   pq_putbytes(msg, strlen(msg));
Otherwise, the following assert will fail in the same function:
 /* Should only be called by old-style COPY OUT */
 Assert(DoingCopyOut);



Attached are an updated patch and telnet_srv addressing the above problems.


Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services


telnet_srv_2021-02-22_1.tgz
Description: application/compressed-tar


wire_proto_2021-02-22_1.diff.gz
Description: application/gzip


Re: Extensibility of the PostgreSQL wire protocol

2021-02-22 Thread Jan Wieck

On 2/22/21 7:34 AM, Dave Cramer wrote:

Yes, when did it become a good idea to put a connection pooler in the 
backend???


Dave Cramer
www.postgres.rocks


As Alvaro said, different pool purposes lead to different pool 
architecture and placement.


However, the changes proposed here, aiming at the ability to load 
modified or entirely different wire protocol handlers, do not limit such 
connection pooling. To the contrary.


Any connection pool, that wants to maintain more client connections than 
actual database backends, must know when it is appropriate to do so. 
Usually the right moment to break the current client-backend association 
is when the backend is outside a transaction block and waiting for the 
next client request. To do so pools cannot blindly shovel data back and 
forth. They need to scan one way or another for the backend's 'Z' 
message, sent in tcop/dest.c ReadyForQuery(), where the backend also 
reports the current transaction state. IOW the pool must follow the flow 
of libpq messages on all connections, message by message, row by row, 
just for the purpose of seeing that one, single bit. It is possible to 
transmit that information to the pool on a separate channel.



Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Re: Extensibility of the PostgreSQL wire protocol

2021-02-22 Thread Dave Cramer
On Fri, 19 Feb 2021 at 15:39, Álvaro Hernández  wrote:

>
>
> On 19/2/21 19:30, Jan Wieck wrote:
> > [...]
> >
> > I also am not sure if building a connection pool into a background
> > worker or postmaster is a good idea to begin with. One of the
> > important features of a pool is to be able to suspend traffic and make
> > the server completely idle to for example be able to restart the
> > postmaster without forcibly disconnecting all clients. A pool built
> > into a background worker cannot do that.
> >
> >
>



Yes, when did it become a good idea to put a connection pooler in the
backend???

Dave Cramer
www.postgres.rocks


Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Álvaro Hernández



On 19/2/21 19:30, Jan Wieck wrote:
> [...]
>
> I also am not sure if building a connection pool into a background
> worker or postmaster is a good idea to begin with. One of the
> important features of a pool is to be able to suspend traffic and make
> the server completely idle to for example be able to restart the
> postmaster without forcibly disconnecting all clients. A pool built
> into a background worker cannot do that.
>
>

    In my opinion, there are different reasons to use a connection pool,
that lead to different placements of that connection pool on the
architecture of the system. The ability of a pool to suspend (pause)
traffic and apply live re-configurations is a very important one to
implement high availability practices, transparent scaling, and others.
But these poolers belong to middleware layers (as in different processes
in different servers), where these pausing operations make complete sense.

    Connection poolers fronting the database have other specific
missions, namely to control the fan-in of connections to the database.
These connection poolers make sense being as close to the database as
possible (ideally: embedded) but don't need to perform pause operations
here.


    Álvaro


-- 

Alvaro Hernandez


---
OnGres






Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Álvaro Hernández



On 19/2/21 14:48, Heikki Linnakangas wrote:
> [...]
>
> I can see value in supporting different protocols. I don't like the
> approach discussed in this thread, however.
>
> For example, there has been discussion elsewhere about integrating
> connection pooling into the server itself. For that, you want to have
> a custom process that listens for incoming connections, and launches
> backends independently of the incoming connections. These hooks would
> not help with that.
>
> Similarly, if you want to integrate a web server into the database
> server, you probably also want some kind of connection pooling. A
> one-to-one relationship between HTTP connections and backend processes
> doesn't seem nice.

    While I'm far from an HTTP/2 expert and I may be wrong, from all I
know HTTP/2 allows to create full-duplex, multiplexed, asynchronous
channels. So multiple connections can be funneled through a single
connection. This decreases the need and aim for connection pooling. It
doesn't eliminate it completely, as you may have the channel busy if a
single tenant is sending a lot of data; and you could not have more than
one concurrent action from a single tenant. OTOH, given these HTTP/2
properties, a non-pooled HTTP/2 endpoint may provide already significant
benefits due to the multiplexing capabilities.

    So definitely we don't need to consider an HTTP endpoint would
entail a 1:1 mapping between connections and backend processes.


    Álvaro

-- 

Alvaro Hernandez


---
OnGres






Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Damir Simunic


> On 19 Feb 2021, at 19:30, Jan Wieck  wrote:
> 
> An "extended" libpq protocol could allow the pool to give clients a unique 
> ID. The protocol handler would then maintain maps with the SQL of prepared 
> statements and what the client thinks their prepared statement name is. 

Or, the connection pooler could support a different wire protocol that has some 
form of client cookies and could let the client hold on to an opaque token to 
present back with every query and use that to route to the right backend with a 
prepared statement for that client (or match the appropriate cached p statement 
from the cache), even across client disconnections.

> Most of that would of course be possible on the pool side itself. But the 
> internal structure of pgbouncer isn't suitable for that. It is very 
> lightweight and for long SQL queries may never have the complete 'P' message 
> in memory. It would also not have direct access to security related 
> information like the search path, which would require extra round trips 
> between the pool and the backend to retrieve it.

> 
> So while not suitable to create a built in pool by itself, loadable wire 
> protocols can definitely help with connection pooling.

I think loadable wire protocols will have a positive effect on developing more 
sophisticated connection poolers.

> I also am not sure if building a connection pool into a background worker or 
> postmaster is a good idea to begin with. One of the important features of a 
> pool is to be able to suspend traffic and make the server completely idle to 
> for example be able to restart the postmaster without forcibly disconnecting 
> all clients.

Agreed. Going even further, a connection pooler supporting a protocol like quic 
(where the notion of connection is decoupled from the actual socket connection) 
could help a lot with balancing load between servers and data centers, which 
also would not be convenient for the actual Postgres to do with present 
architecture. (And here, too, a pluggable wire protocol would help with keeping 
tabs on individual backends).

--
Damir



Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Jan Wieck

On 2/19/21 12:18 PM, Damir Simunic wrote:



On 19 Feb 2021, at 14:48, Heikki Linnakangas  wrote:

For example, there has been discussion elsewhere about integrating connection 
pooling into the server itself. For that, you want to have a custom process 
that listens for incoming connections, and launches backends independently of 
the incoming connections. These hooks would not help with that.



Not clear how the connection polling in the core is linked to discussing 
pluggable wire protocols.


It isn't per se. But there are things pluggable wire protocols can help 
with in regards to connection pooling. For example a connection pool 
like pgbouncer can be configured to switch client-backend association on 
a transaction level. It therefore scans the traffic for the in 
transaction state. This however only works if an application uses 
identical session states across all connections in a pool. The JDBC 
driver for example only really prepares PreparedStatements after a 
number of executions and then assigns a name based on a counter to them. 
So it is neither guaranteed that a certain backend has the same 
statements prepared, nor that they are named the same. Therefore JDBC 
based applications cannot use PreparedStatements through pgbouncer in 
transaction mode.


An "extended" libpq protocol could allow the pool to give clients a 
unique ID. The protocol handler would then maintain maps with the SQL of 
prepared statements and what the client thinks their prepared statement 
name is. So when a client sends a P packet, the protocol handler would 
lookup the mapping and see if it already has that statement prepared. 
Just add the mapping info or actually create a new statement entry in 
the maps. These maps are of course shared across backends. So if then 
another client sends bind+execute and the backend doesn't have a plan 
for that query, it would internally create one.


There are security implications here, so things like the search path 
might have to be part of the maps, but those are implementation details.


At the end this would allow a project like pgbouncer to create an 
extended version of libpq protocol that caters to the very special needs 
of that pool.


Most of that would of course be possible on the pool side itself. But 
the internal structure of pgbouncer isn't suitable for that. It is very 
lightweight and for long SQL queries may never have the complete 'P' 
message in memory. It would also not have direct access to security 
related information like the search path, which would require extra 
round trips between the pool and the backend to retrieve it.


So while not suitable to create a built in pool by itself, loadable wire 
protocols can definitely help with connection pooling.


I also am not sure if building a connection pool into a background 
worker or postmaster is a good idea to begin with. One of the important 
features of a pool is to be able to suspend traffic and make the server 
completely idle to for example be able to restart the postmaster without 
forcibly disconnecting all clients. A pool built into a background 
worker cannot do that.



Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Damir Simunic


> On 19 Feb 2021, at 14:48, Heikki Linnakangas  wrote:
> 
> For example, there has been discussion elsewhere about integrating connection 
> pooling into the server itself. For that, you want to have a custom process 
> that listens for incoming connections, and launches backends independently of 
> the incoming connections. These hooks would not help with that.
> 

Not clear how the connection polling in the core is linked to discussing 
pluggable wire protocols. 

> Similarly, if you want to integrate a web server into the database server, 
> you probably also want some kind of connection pooling. A one-to-one 
> relationship between HTTP connections and backend processes doesn't seem nice.
> 

HTTP/2 is just a protocol, not unlike fe/be that has a one-to-one relationship 
to backend processes as it stands. It shuttles data back and forth in 
query/response exchanges, and happens to be used by web servers and web 
browsers, among other things. My mentioning of it was simply an example I can 
speak of from experience, as opposed to speculating. Could have brought up any 
other wire protocol if I had experience with it, say MQTT.

To make it clear, “a pluggable wire protocol” as discussed here is a set of 
rules that defines how data is transmitted: what the requests and responses 
are, and how is the data laid out on the wire, what to do in case of error, 
etc. Nothing to do with a web server; why would one want to integrate it in the 
database, anyway?

The intended contribution to the discussion of big picture of pluggable wire 
protocols is that there are significant use cases where the protocol choice is 
restricted on the client side, and allowing a pluggable wire protocol on the 
server side brings tangible benefits in performance and architectural 
simplification. That’s all. The rest were supporting facts that hopefully can 
also serve as a counterpoint to “pluggable wire protocol is primarily useful to 
make Postgres pretend to be Mysql."

Protocol conversion HTTP/2<—>FE/BE on the connection pooler already brings a 
lot of the mentioned benefits, and I’m satisfied with it. Beyond that I’m 
simply supporting the idea of  pluggable protocols as experience so far allows 
me to see advantages that might sound theoretical to someone who never tried 
this scenario in production.

Glad to offer a couple of examples where I see potential for performance gains 
for having such a wire protocol pluggable in the core. Let me know if you want 
me to elaborate.

> Querying multiple databases over a single connection is not possible with the 
> approach taken here. 

Indeed, querying multiple databases over a single connection is something you 
need a proxy for and a different client protocol from fe/be. No need to mix 
that with the talk about pluggable wire protocol. 

My mentioning of it was in the sense “a lot of LoB backend code is nothing more 
than a bloated protocol converter that happens to also allow connecting to 
multiple databases from a single client connection => letting the client speak 
to the database [trough a proxy in this case] removed the bloated source of 
latency but kept the advantages.”

--
Damir





Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Jan Wieck

On 2/19/21 8:48 AM, Heikki Linnakangas wrote:

I can see value in supporting different protocols. I don't like the
approach discussed in this thread, however.

For example, there has been discussion elsewhere about integrating
connection pooling into the server itself. For that, you want to have a
custom process that listens for incoming connections, and launches
backends independently of the incoming connections. These hooks would
not help with that.


The two are not mutually exclusive. You are right that the current 
proposal would not help with that type of built in connection pool, but 
it may be extended to that.


Give the function, that postmaster is calling to accept a connection 
when a server_fd is ready, a return code that it can use to tell 
postmaster "forget about it, don't fork or do anything else with it". 
This function is normally calling StreamConnection() before the 
postmaster then forks the backend. But it could instead hand over the 
socket to the pool background worker (I presume Jonah is transferring 
them from process to process via UDP packet). The pool worker is then 
launching the actual backends which receive a requesting client via the 
same socket transfer to perform one or more transactions, then hand the 
socket back to the pool worker.


All of that would still require a protocol extension that has special 
messages for "here is a client socket for you" and "you can have that 
back".




I would recommend this approach: write a separate program that sits
between the client and PostgreSQL, speaking custom protocol to the
client, and libpq to the backend. And then move that program into a
background worker process.


That is a classic protocol converting proxy. It has been done in the 
past with not really good results, both performance wise as with respect 
to protocol completeness.



Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Jonah H. Harris
On Fri, Feb 19, 2021 at 8:48 AM Heikki Linnakangas  wrote:

> With the hooks that exist today, would it possible to write a background
> worker that listens on a port, instead of postmaster? Can you launch
> backends from a background worker? And communicate the backend processes
> using a shared memory message queue (see pqmq.c).
>

Yes. That's similar to how mine work: A background worker that acts as a
listener for the new protocol which then sets up a new dynamic background
worker on accept(), waits for its creation, passes the fd to the new
background worker, and sits in a while (!got_sigterm) loop checking the
socket for activity and running the protocol similar to postmaster. I
haven't looked at the latest connection pooling patches but, in general,
connection pooling is an abstract issue and should be usable for any type
of connection as, realistically, it's just an event loop and state problem
- it shouldn't be protocol specific.

I would recommend this approach: write a separate program that sits
> between the client and PostgreSQL, speaking custom protocol to the
> client, and libpq to the backend. And then move that program into a
> background worker process.
>

Doing protocol conversion between libpq and a different protocol works, but
is slow. My implementations were originally all proxies that worked outside
the database, then I moved them inside, then I replaced all the libpq code
with SPI-related calls.


> > In a recent case, letting the browser talk directly to the database
> > allowed me to get rid of a ~100k-sloc .net backend and all the
> > complexity and infrastructure that goes with
> > coding/testing/deploying/maintaining it, while keeping all the
> > positives: per-query compression/data conversion, querying multiple
> > databases over a single connection, session cookies, etc. Deployment
> > is trivial compared to what was before. Latency is down 2x-4x across
> > the board.
>
> Querying multiple databases over a single connection is not possible
> with the approach taken here. Not sure about the others things you listed.
>

Accessing multiple databases from the same backend is problematic overall -
I didn't solve that in my implementations either. IIRC, once a bgworker is
attached to a specific database, it's basically stuck with that database.

-- 
Jonah H. Harris


Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Heikki Linnakangas

On 19/02/2021 14:29, Damir Simunic wrote:



On 11 Feb 2021, at 16:06, Tom Lane  wrote:

Maybe there is some useful thing that can be accomplished here, but
we need to consider the bigger picture rather than believing
(without proof) that a few hook variables will be enough to do
anything.


Pluggable wire protocol is a game-changer on its own.

The bigger picture is that a right protocol choice enables
large-scale architectural simplifications for whole classes of
production applications.

For browser-based applications (lob, saas, e-commerce), having the
database server speak the browser protocol enables architectures
without backend application code. This in turn leads to significant
reductions of latency, complexity, and application development time.
And it’s not just lack of backend code: one also profits from all the
existing infrastructure like per-query compression/format choice,
browser connection management, sse, multiple streams, prioritization,
caching/cdns, etc.

Don’t know if you’d consider it as a proof, yet I am seeing 2x to 4x
latency reduction in production applications from protocol conversion
to http/2. My present solution is a simple connection pooler I built
on top of Nginx transforming the tcp stream as it passes through.


I can see value in supporting different protocols. I don't like the 
approach discussed in this thread, however.


For example, there has been discussion elsewhere about integrating 
connection pooling into the server itself. For that, you want to have a 
custom process that listens for incoming connections, and launches 
backends independently of the incoming connections. These hooks would 
not help with that.


Similarly, if you want to integrate a web server into the database 
server, you probably also want some kind of connection pooling. A 
one-to-one relationship between HTTP connections and backend processes 
doesn't seem nice.


With the hooks that exist today, would it possible to write a background 
worker that listens on a port, instead of postmaster? Can you launch 
backends from a background worker? And communicate the backend processes 
using a shared memory message queue (see pqmq.c).


I would recommend this approach: write a separate program that sits 
between the client and PostgreSQL, speaking custom protocol to the 
client, and libpq to the backend. And then move that program into a 
background worker process.



In a recent case, letting the browser talk directly to the database
allowed me to get rid of a ~100k-sloc .net backend and all the
complexity and infrastructure that goes with
coding/testing/deploying/maintaining it, while keeping all the
positives: per-query compression/data conversion, querying multiple
databases over a single connection, session cookies, etc. Deployment
is trivial compared to what was before. Latency is down 2x-4x across
the board.


Querying multiple databases over a single connection is not possible 
with the approach taken here. Not sure about the others things you listed.


- Heikki




Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Jan Wieck
Thank you Kuntal,

On Fri, Feb 19, 2021 at 4:36 AM Kuntal Ghosh 
wrote:

> On Thu, Feb 18, 2021 at 9:32 PM Jan Wieck  wrote:
>
>
> Few comments in the extension code (although experimental):
>
> 1. In telnet_srv.c,
> + static intpe_port;
> ..
> +   DefineCustomIntVariable("telnet_srv.port",
> +   "Telnet server
> port.",
> +   NULL,
> +   &pe_port,
> +   pe_port,
> +   1024,
> +   65536,
> +   PGC_POSTMASTER,
> +   0,
> +   NULL,
> +   NULL,
> +   NULL);
>
> The variable pe_port should be initialized to a value which is > 1024
> and < 65536. Otherwise, the following assert will fail,
> TRAP: FailedAssertion("newval >= conf->min", File: "guc.c", Line:
> 5541, PID: 12100)
>
>
Right, forgot to turn on Asserts.


> 2. The function pq_putbytes shouldn't be used by anyone other than
> old-style COPY out.
> +   pq_putbytes(msg, strlen(msg));
> Otherwise, the following assert will fail in the same function:
> /* Should only be called by old-style COPY OUT */
> Assert(DoingCopyOut);
>

I would argue that the Assert needs to be changed. It is obvious that the
Assert in place is meant to guard against direct usage of pg_putbytes() in
an attempt to force all code to use pq_putmessage() instead. This is good
when speaking libpq wire protocol since all messages there are prefixed
with a one byte message type. It does not apply to other protocols.

I propose to create another global boolean IsNonLibpqFrontend which the
protocol extension will set to true when accepting the connection and the
above then will change to

Assert(DoingCopyOut || IsNonLibpqFrontend);


Regards, Jan



>
> --
> Thanks & Regards,
> Kuntal Ghosh
> Amazon Web Services
>


-- 
Jan Wieck


Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Damir Simunic


> On 11 Feb 2021, at 16:06, Tom Lane  wrote:
> 
> Maybe there is some useful thing that can be accomplished here, but we
> need to consider the bigger picture rather than believing (without proof)
> that a few hook variables will be enough to do anything.
> 
>   regards, tom lane
> 

Pluggable wire protocol is a game-changer on its own. 

The bigger picture is that a right protocol choice enables large-scale 
architectural simplifications for whole classes of production applications.

For browser-based applications (lob, saas, e-commerce), having the database 
server speak the browser protocol enables architectures without backend 
application code. This in turn leads to significant reductions of latency, 
complexity, and application development time. And it’s not just lack of backend 
code: one also profits from all the existing infrastructure like per-query 
compression/format choice, browser connection management, sse, multiple 
streams, prioritization, caching/cdns, etc.

Don’t know if you’d consider it as a proof, yet I am seeing 2x to 4x latency 
reduction in production applications from protocol conversion to http/2. My 
present solution is a simple connection pooler I built on top of Nginx 
transforming the tcp stream as it passes through.

In a recent case, letting the browser talk directly to the database allowed me 
to get rid of a ~100k-sloc .net backend and all the complexity and 
infrastructure that goes with coding/testing/deploying/maintaining it, while 
keeping all the positives: per-query compression/data conversion, querying 
multiple databases over a single connection, session cookies, etc. Deployment 
is trivial compared to what was before. Latency is down 2x-4x across the board.

Having some production experience with this approach, I can see how 
http/2-speaking Postgres would further reduce latency, processing cost, and 
time-to-interaction for applications.

A similar case can be made for IoT where one would want to plug an 
iot-optimized protocol. Again, most of the benefit is possible with a 
protocol-converting proxy, but there are additional non-trivial performance 
gains to be had if the database server speaks the right protocol.

While not the only use cases, I’d venture a guess these represent a sizable 
chunk of what Postgres is used for today, and will be used even more for, so 
the positive impact of a pluggable protocol would be significant.

--
Damir



Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Kuntal Ghosh
On Thu, Feb 18, 2021 at 9:32 PM Jan Wieck  wrote:
>
And, here is how it looks with the following configuration:
telnet_srv.port = 1433
telnet_srv.listen_addresses = '*'

telnet localhost 1433


   master
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
PostgreSQL Telnet Interface

database name: postgres
username: kuntal
password: changeme
> select 1;
?column?

1

SELECT 1
> select 1/0;
Message: ERROR - division by zero

Few comments in the extension code (although experimental):

1. In telnet_srv.c,
+ static intpe_port;
..
+   DefineCustomIntVariable("telnet_srv.port",
+   "Telnet server port.",
+   NULL,
+   &pe_port,
+   pe_port,
+   1024,
+   65536,
+   PGC_POSTMASTER,
+   0,
+   NULL,
+   NULL,
+   NULL);

The variable pe_port should be initialized to a value which is > 1024
and < 65536. Otherwise, the following assert will fail,
TRAP: FailedAssertion("newval >= conf->min", File: "guc.c", Line:
5541, PID: 12100)

2. The function pq_putbytes shouldn't be used by anyone other than
old-style COPY out.
+   pq_putbytes(msg, strlen(msg));
Otherwise, the following assert will fail in the same function:
/* Should only be called by old-style COPY OUT */
Assert(DoingCopyOut);

-- 
Thanks & Regards,
Kuntal Ghosh
Amazon Web Services




Re: Extensibility of the PostgreSQL wire protocol

2021-02-18 Thread Jan Wieck
Attached are a first patch and a functioning extension that implements a
telnet protocol server.

The extension needs to be loaded via shared_preload_libraries and
configured for a port number and listen_addresses as follows:

shared_preload_libraries = 'telnet_srv'

telnet_srv.listen_addresses = '*'
telnet_srv.port = 54323

It is incomplete in that it doesn't address things like the COPY protocol.
But it is enough to give a more detailed idea of what this interface will
look like and what someone would do to implement their own protocol or
extend an existing one.

The overall idea here is to route all functions, that communicate with the
frontend, through function pointers that hang off of MyProcPort. Since we
are performing socket communication in them I believe one extra function
pointer indirection is unlikely to have significant performance impact.

Best Regards, Jan
On behalf of Amazon Web Services





On Sun, Feb 14, 2021 at 12:36 PM Dave Cramer 
wrote:

>
>
> On Thu, 11 Feb 2021 at 09:28, Robert Haas  wrote:
>
>> On Wed, Feb 10, 2021 at 2:04 PM Tom Lane  wrote:
>> > That is a spot-on definition of where I do NOT want to end up.  Hooks
>> > everywhere and enormous extensions that break anytime we change anything
>> > in the core.  It's not really clear that anybody is going to find that
>> > more maintainable than a straight fork, except to the extent that it
>> > enables the erstwhile forkers to shove some of their work onto the PG
>> > community.
>>
>> +1.
>>
>> Making the lexer and parser extensible seems desirable to me. It would
>> be beneficial not only for companies like EDB and Amazon that might
>> want to extend the grammar in various ways, but also for extension
>> authors. However, it's vastly harder than Jan's proposal to make the
>> wire protocol pluggable. The wire protocol is pretty well-isolated
>> from the rest of the system. As long as you can get queries out of the
>> packets the client sends and package up the results to send back, it's
>> all good.
>
>
> I would have to disagree that the wire protocol is well-isolated. Sending
> and receiving are not in a single file
> The codes are not even named constants so trying to find a specific one is
> difficult.
>
> Anything that would clean this up would be a benefit
>
>
> That being said, I'm not in favor of transferring maintenance work to
>> the community for this set of hooks any more than I am for something
>> on the parsing side. In general, I'm in favor of as much extensibility
>> as we can reasonably create, but with a complicated proposal like this
>> one, the community should expect to be able to get something out of
>> it. And so far what I hear Jan saying is that these hooks could in
>> theory be used for things other than Amazon's proprietary efforts and
>> those things could in theory bring benefits to the community, but
>> there are no actual plans to do anything with this that would benefit
>> anyone other than Amazon. Which seems to bring us right back to
>> expecting the community to maintain things for the benefit of
>> third-party forks.
>>
>
> if this proposal brought us the ability stream results that would be a
> huge plus!
>
> Dave Cramer
> www.postgres.rocks
>
>>
>>

-- 
Jan Wieck


wire_proto_2021-02-18_1.diff.gz
Description: application/gzip


telnet_srv_2021-02-18_1.tgz
Description: application/compressed-tar


Re: Extensibility of the PostgreSQL wire protocol

2021-02-14 Thread Dave Cramer
On Thu, 11 Feb 2021 at 09:28, Robert Haas  wrote:

> On Wed, Feb 10, 2021 at 2:04 PM Tom Lane  wrote:
> > That is a spot-on definition of where I do NOT want to end up.  Hooks
> > everywhere and enormous extensions that break anytime we change anything
> > in the core.  It's not really clear that anybody is going to find that
> > more maintainable than a straight fork, except to the extent that it
> > enables the erstwhile forkers to shove some of their work onto the PG
> > community.
>
> +1.
>
> Making the lexer and parser extensible seems desirable to me. It would
> be beneficial not only for companies like EDB and Amazon that might
> want to extend the grammar in various ways, but also for extension
> authors. However, it's vastly harder than Jan's proposal to make the
> wire protocol pluggable. The wire protocol is pretty well-isolated
> from the rest of the system. As long as you can get queries out of the
> packets the client sends and package up the results to send back, it's
> all good.


I would have to disagree that the wire protocol is well-isolated. Sending
and receiving are not in a single file
The codes are not even named constants so trying to find a specific one is
difficult.

Anything that would clean this up would be a benefit


That being said, I'm not in favor of transferring maintenance work to
> the community for this set of hooks any more than I am for something
> on the parsing side. In general, I'm in favor of as much extensibility
> as we can reasonably create, but with a complicated proposal like this
> one, the community should expect to be able to get something out of
> it. And so far what I hear Jan saying is that these hooks could in
> theory be used for things other than Amazon's proprietary efforts and
> those things could in theory bring benefits to the community, but
> there are no actual plans to do anything with this that would benefit
> anyone other than Amazon. Which seems to bring us right back to
> expecting the community to maintain things for the benefit of
> third-party forks.
>

if this proposal brought us the ability stream results that would be a huge
plus!

Dave Cramer
www.postgres.rocks

>
>


Re: Extensibility of the PostgreSQL wire protocol

2021-02-12 Thread Fabrízio de Royes Mello
On Thu, Feb 11, 2021 at 12:07 PM Tom Lane  wrote:
>
> Robert Haas  writes:
> > On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris 
wrote:
> >> As Jan said in his last email, they're not proposing all the different
> >> aspects needed. In fact, nothing has actually been proposed yet. This
> >> is an entirely philosophical debate. I don't even know what's being
> >> proposed at this point - I just know it *could* be useful. Let's just
> >> wait and see what is actually proposed before shooting it down, yes?
>
> > I don't think I'm trying to shoot anything down, because as I said, I
> > like extensibility and am generally in favor of it. Rather, I'm
> > expressing a concern which seems to me to be justified, based on what
> > was posted. I'm sorry that my tone seems to have aggravated you, but
> > it wasn't intended to do so.
>
> Likewise, the point I was trying to make is that a "pluggable wire
> protocol" is only a tiny part of what would be needed to have a credible
> MySQL, Oracle, or whatever clone.  There are large semantic differences
> from those products; there are maintenance issues arising from the fact
> that we whack structures like parse trees around all the time; and so on.
> Maybe there is some useful thing that can be accomplished here, but we
> need to consider the bigger picture rather than believing (without proof)
> that a few hook variables will be enough to do anything.
>

Just to don't miss the point, creating a compat protocol to mimic others
(TDS,
MySQL, etc) is just one use case.

There are other use cases to make wire protocol extensible, for example for
telemetry I can use some hooks to propagate context [1] and get more
detailed
tracing information about the negotiation between frontend and backend and
being able to implement a truly query tracing tool, for example.

Another use case is extending the current protocol to, for example, send
more
information about query execution on CommandComplete command instead of
just the number of affected rows.

About the HTTP protocol I think PG should have it, maybe pure HTTP (no
REST,
just HTTP) because it's the most interoperable. Performance can still be
very good
with HTTP2, and you have a huge ecosystem of tools and proxies (like Envoy)
that
would do wonders with this. You could safely query a db from a web page
(passing
through proxies that would do auth, TLS, etc). Or maybe a higher performing
gRPC
version (which is also HTTP2 and is amazing), but this makes it a bit more
difficult
to query from a web page. In either case, context propagation is already
built-in, and
in a standard way.

Regards,

[1] https://www.w3.org/TR/trace-context/

-- 
   Fabrízio de Royes Mello
   PostgreSQL Developer at OnGres Inc. - https://ongres.com


Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Joshua Drake
On Wed, Feb 10, 2021 at 11:04 AM Tom Lane  wrote:

> "Jonah H. Harris"  writes:
> > On Wed, Feb 10, 2021 at 1:10 PM Tom Lane  wrote:
> >> ...  If we start having
> >> modes for MySQL identifier quoting, Oracle outer join syntax, yadda
> >> yadda, it's going to be way more of a maintenance nightmare than some
> >> hook functions.  So if we accept any patch along this line, I want to
> >> drive a hard stake in the ground that the answer to that sort of thing
> >> will be NO.
>
> > Actually, a substantial amount can be done with hooks. For Oracle, which
> is
> > substantially harder than MySQL, I have a completely separate parser that
> > generates a PG-compatible parse tree packaged up as an extension. To
> handle
> > autonomous transactions, database links, hierarchical query conversion,
> > hints, and some execution-related items requires core changes.
>
> That is a spot-on definition of where I do NOT want to end up.  Hooks
> everywhere and enormous extensions that break anytime we change anything
> in the core.  It's not really clear that anybody is going to find that
> more maintainable than a straight fork, except to the extent that it
> enables the erstwhile forkers to shove some of their work onto the PG
> community.
>
> My feeling about this is if you want to use Oracle, go use Oracle.
> Don't ask PG to take on a ton of maintenance issues so you can have
> a frankenOracle.
>

PostgreSQL over the last decade spent a considerable amount of time
allowing it to become extensible outside of core. We are now useful in
workloads nobody would have considered in 2004 or 2008.

The more extensibility we add, the LESS we maintain. It is a lot easier to
maintain an API than it is an entire kernel. When I look at all the
interesting features coming from the ecosystem, they are all built on the
hooks that this community worked so hard to create. This idea is an
extension of that and a result of the community's success.

The more extensible we make PostgreSQL, the more the hacker community can
innovate without damaging the PostgreSQL reputation as a rock solid
database system.

Features like these only enable the entire community to innovate. Is the
real issue that the more extensible PostgreSQL is, the more boring it will
become?

JD



>
> regards, tom lane
>
>
>


Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Jim Mlodgenski
On Thu, Feb 11, 2021 at 10:29 AM Andrew Dunstan  wrote:

>
> On 2/11/21 10:06 AM, Tom Lane wrote:
> > Robert Haas  writes:
> >> On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris 
> wrote:
> >>> As Jan said in his last email, they're not proposing all the different
> >>> aspects needed. In fact, nothing has actually been proposed yet. This
> >>> is an entirely philosophical debate. I don't even know what's being
> >>> proposed at this point - I just know it *could* be useful. Let's just
> >>> wait and see what is actually proposed before shooting it down, yes?
> >> I don't think I'm trying to shoot anything down, because as I said, I
> >> like extensibility and am generally in favor of it. Rather, I'm
> >> expressing a concern which seems to me to be justified, based on what
> >> was posted. I'm sorry that my tone seems to have aggravated you, but
> >> it wasn't intended to do so.
> > Likewise, the point I was trying to make is that a "pluggable wire
> > protocol" is only a tiny part of what would be needed to have a credible
> > MySQL, Oracle, or whatever clone.  There are large semantic differences
> > from those products; there are maintenance issues arising from the fact
> > that we whack structures like parse trees around all the time; and so on.
> > Maybe there is some useful thing that can be accomplished here, but we
> > need to consider the bigger picture rather than believing (without proof)
> > that a few hook variables will be enough to do anything.
>
>
>
> Yeah. I think we'd need a fairly fully worked implementation to see
> where it goes. Is Amazon going to release (under TPL) its TDS
> implementation of this? That might go a long way to convincing me this
> is worth considering.
>
> Everything is planned to be released under the Apache 2.0 license so
people are free to do with it as they choose.


Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Andrew Dunstan


On 2/11/21 10:06 AM, Tom Lane wrote:
> Robert Haas  writes:
>> On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris  
>> wrote:
>>> As Jan said in his last email, they're not proposing all the different
>>> aspects needed. In fact, nothing has actually been proposed yet. This
>>> is an entirely philosophical debate. I don't even know what's being
>>> proposed at this point - I just know it *could* be useful. Let's just
>>> wait and see what is actually proposed before shooting it down, yes?
>> I don't think I'm trying to shoot anything down, because as I said, I
>> like extensibility and am generally in favor of it. Rather, I'm
>> expressing a concern which seems to me to be justified, based on what
>> was posted. I'm sorry that my tone seems to have aggravated you, but
>> it wasn't intended to do so.
> Likewise, the point I was trying to make is that a "pluggable wire
> protocol" is only a tiny part of what would be needed to have a credible
> MySQL, Oracle, or whatever clone.  There are large semantic differences
> from those products; there are maintenance issues arising from the fact
> that we whack structures like parse trees around all the time; and so on.
> Maybe there is some useful thing that can be accomplished here, but we
> need to consider the bigger picture rather than believing (without proof)
> that a few hook variables will be enough to do anything.



Yeah. I think we'd need a fairly fully worked implementation to see
where it goes. Is Amazon going to release (under TPL) its TDS
implementation of this? That might go a long way to convincing me this
is worth considering.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Tom Lane
Robert Haas  writes:
> On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris  
> wrote:
>> As Jan said in his last email, they're not proposing all the different
>> aspects needed. In fact, nothing has actually been proposed yet. This
>> is an entirely philosophical debate. I don't even know what's being
>> proposed at this point - I just know it *could* be useful. Let's just
>> wait and see what is actually proposed before shooting it down, yes?

> I don't think I'm trying to shoot anything down, because as I said, I
> like extensibility and am generally in favor of it. Rather, I'm
> expressing a concern which seems to me to be justified, based on what
> was posted. I'm sorry that my tone seems to have aggravated you, but
> it wasn't intended to do so.

Likewise, the point I was trying to make is that a "pluggable wire
protocol" is only a tiny part of what would be needed to have a credible
MySQL, Oracle, or whatever clone.  There are large semantic differences
from those products; there are maintenance issues arising from the fact
that we whack structures like parse trees around all the time; and so on.
Maybe there is some useful thing that can be accomplished here, but we
need to consider the bigger picture rather than believing (without proof)
that a few hook variables will be enough to do anything.

regards, tom lane




Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Robert Haas
On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris  wrote:
> I'm quite sure I said I'd open source my MySQL implementation, which allows 
> Postgres to appear to MySQL clients as a MySQL/MariaDB server. This is 
> neither proprietary nor Amazon-related and makes Postgres substantially more 
> useful for a large number of applications.

OK. There's stuff to think about there, too: do we want that in
contrib? Is it in good enough shape to be in contrib even if we did?
If it's not in contrib, how do we incorporate it into, say, the
buildfarm, so that we know if we break something? Is it actively
maintained and stable, so that if it needs adjustment for upstream
changes we can count on that getting addressed in a timely fashion? I
don't know the answers to these questions and am not trying to
prejudge, but I think they are important and relevant questions.

> As Jan said in his last email, they're not proposing all the different 
> aspects needed. In fact, nothing has actually been proposed yet. This is an 
> entirely philosophical debate. I don't even know what's being proposed at 
> this point - I just know it *could* be useful. Let's just wait and see what 
> is actually proposed before shooting it down, yes?

I don't think I'm trying to shoot anything down, because as I said, I
like extensibility and am generally in favor of it. Rather, I'm
expressing a concern which seems to me to be justified, based on what
was posted. I'm sorry that my tone seems to have aggravated you, but
it wasn't intended to do so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Jonah H. Harris
On Thu, Feb 11, 2021 at 9:28 AM Robert Haas  wrote:

> That being said, I'm not in favor of transferring maintenance work to
> the community for this set of hooks any more than I am for something
> on the parsing side. In general, I'm in favor of as much extensibility
> as we can reasonably create, but with a complicated proposal like this
> one, the community should expect to be able to get something out of
> it. And so far what I hear Jan saying is that these hooks could in
> theory be used for things other than Amazon's proprietary efforts and
> those things could in theory bring benefits to the community, but
> there are no actual plans to do anything with this that would benefit
> anyone other than Amazon. Which seems to bring us right back to
> expecting the community to maintain things for the benefit of
> third-party forks.
>

I'm quite sure I said I'd open source my MySQL implementation, which allows
Postgres to appear to MySQL clients as a MySQL/MariaDB server. This is
neither proprietary nor Amazon-related and makes Postgres substantially
more useful for a large number of applications.

As Jan said in his last email, they're not proposing all the different
aspects needed. In fact, nothing has actually been proposed yet. This is an
entirely philosophical debate. I don't even know what's being proposed at
this point - I just know it *could* be useful. Let's just wait and see what
is actually proposed before shooting it down, yes?

-- 
Jonah H. Harris


Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Robert Haas
On Wed, Feb 10, 2021 at 2:04 PM Tom Lane  wrote:
> That is a spot-on definition of where I do NOT want to end up.  Hooks
> everywhere and enormous extensions that break anytime we change anything
> in the core.  It's not really clear that anybody is going to find that
> more maintainable than a straight fork, except to the extent that it
> enables the erstwhile forkers to shove some of their work onto the PG
> community.

+1.

Making the lexer and parser extensible seems desirable to me. It would
be beneficial not only for companies like EDB and Amazon that might
want to extend the grammar in various ways, but also for extension
authors. However, it's vastly harder than Jan's proposal to make the
wire protocol pluggable. The wire protocol is pretty well-isolated
from the rest of the system. As long as you can get queries out of the
packets the client sends and package up the results to send back, it's
all good. The parser, on the other hand, is not at all well-isolated
from the rest of the system. There's a LOT of code that knows a whole
lot of stuff about the structure of parse trees, so your variant
parser can't produce parse trees for new kinds of DDL, or for new
query constructs. And if it parsed some completely different syntax
where, say, joins were not explicit, it would still have to figure out
how to represent them in a way that looked just like it came out of
the regular parser -- otherwise, parse analysis and query planning and
so forth are not going to work, unless you go and change a lot of
other code too, and I don't really have any idea how we could solve
that, even in theory. But that kind of thing just isn't a problem for
the proposal on this thread.

That being said, I'm not in favor of transferring maintenance work to
the community for this set of hooks any more than I am for something
on the parsing side. In general, I'm in favor of as much extensibility
as we can reasonably create, but with a complicated proposal like this
one, the community should expect to be able to get something out of
it. And so far what I hear Jan saying is that these hooks could in
theory be used for things other than Amazon's proprietary efforts and
those things could in theory bring benefits to the community, but
there are no actual plans to do anything with this that would benefit
anyone other than Amazon. Which seems to bring us right back to
expecting the community to maintain things for the benefit of
third-party forks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Extensibility of the PostgreSQL wire protocol

2021-02-10 Thread Jan Wieck
On Wed, Feb 10, 2021 at 11:43 AM Robert Haas  wrote:

> On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck  wrote:
> > Our current plan is to create a new set of API calls and hooks that
> allow to register additional wire protocols. The existing backend libpq
> implementation will be modified to register itself using the new API. This
> will serve as a proof of concept as well as ensure that the API definition
> is not slanted towards a specific protocol. It is also similar to the way
> table access methods and compression methods are added.
>
> If we're going to end up with an open source implementation of
> something useful in contrib or whatever, then I think this is fine.
> But, if not, then we're just making it easier for Amazon to do
> proprietary stuff without getting any benefit for the open-source
> project. In fact, in that case PostgreSQL would ensure have to somehow
> ensure that the hooks don't get broken without having any code that
> actually uses them, so not only would the project get no benefit, but
> it would actually incur a small tax. I wouldn't say that's an
> absolutely show-stopper, but it definitely isn't my first choice.
>

At this very moment there are several parts to this. One is the hooks to
make wire protocols into loadable modules, which is what this effort is
about. Another is the TDS protocol as it is being implemented for Babelfish
and third is the Babelfish extension itself. Both will require additional
hooks and APIs I am not going to address here. I consider them not material
to my effort.

As for making the wire protocol itself expandable I really see a lot of
potential outside of what Amazon wants here. And I would not be advertising
it if it would be for Babelfish alone. As I laid out, just the ability for
a third party to add additional messages for special connection pool
support would be enough to make it useful. There also have been discussions
in the JDBC subproject to combine certain messages into one single message.
Why not allow the JDBC project to develop their own, JDBC-optimized backend
side? Last but not least, what would be wrong with listening for MariaDB
clients?

I am planning on a follow up project to this, demoting libpq itself to just
another loadable protocol. Just the way procedural languages are all on the
same level because that is how I developed the loadable, procedural
language handler all those years ago.

Considering how spread out and quite frankly unorganized our wire protocol
handling is, this is not a small order.


Regards, Jan








>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


-- 
Jan Wieck


Re: Extensibility of the PostgreSQL wire protocol

2021-02-10 Thread Jonah H. Harris
On Wed, Feb 10, 2021 at 2:04 PM Tom Lane  wrote:

> That is a spot-on definition of where I do NOT want to end up.  Hooks
> everywhere and enormous extensions that break anytime we change anything
> in the core.  It's not really clear that anybody is going to find that
> more maintainable than a straight fork, except to the extent that it
> enables the erstwhile forkers to shove some of their work onto the PG
> community.
>

Given the work over the last few major releases to make several other
aspects of Postgres pluggable, how is implementing a pluggable protocol API
any different?

To me, this sounds more like a philosophical disagreement with how people
could potentially use Postgres than a technical one. My point is only that,
using current PG functionality, I could equally write a pluggable storage
interface for my Oracle and InnoDB data file readers/writers, which would
similarly allow for the creation of a Postgres franken-Oracle by extension
only.

I don't think anyone is asking for hooks for all the things I mentioned - a
pluggable transaction manager, for example, doesn't make much sense. But,
when it comes to having actually done this vs. posited about its
usefulness, I'd say it has some merit and doesn't really introduce that
much complexity or maintenance overhead to core - whether the extensions
still work properly is up to the extension authors... isn't that the whole
point of extensions?

-- 
Jonah H. Harris


Re: Extensibility of the PostgreSQL wire protocol

2021-02-10 Thread Tom Lane
"Jonah H. Harris"  writes:
> On Wed, Feb 10, 2021 at 1:10 PM Tom Lane  wrote:
>> ...  If we start having
>> modes for MySQL identifier quoting, Oracle outer join syntax, yadda
>> yadda, it's going to be way more of a maintenance nightmare than some
>> hook functions.  So if we accept any patch along this line, I want to
>> drive a hard stake in the ground that the answer to that sort of thing
>> will be NO.

> Actually, a substantial amount can be done with hooks. For Oracle, which is
> substantially harder than MySQL, I have a completely separate parser that
> generates a PG-compatible parse tree packaged up as an extension. To handle
> autonomous transactions, database links, hierarchical query conversion,
> hints, and some execution-related items requires core changes.

That is a spot-on definition of where I do NOT want to end up.  Hooks
everywhere and enormous extensions that break anytime we change anything
in the core.  It's not really clear that anybody is going to find that
more maintainable than a straight fork, except to the extent that it
enables the erstwhile forkers to shove some of their work onto the PG
community.

My feeling about this is if you want to use Oracle, go use Oracle.
Don't ask PG to take on a ton of maintenance issues so you can have
a frankenOracle.

regards, tom lane




Re: Extensibility of the PostgreSQL wire protocol

2021-02-10 Thread Jonah H. Harris
On Wed, Feb 10, 2021 at 1:10 PM Tom Lane  wrote:

> What I'm actually more concerned about, in this whole line of development,
> is the follow-on requests that will surely occur to kluge up Postgres
> to make its behavior more like $whatever.  As in "well, now that we
> can serve MySQL clients protocol-wise, can't we pretty please have a
> mode that makes the parser act more like MySQL".  If we start having
> modes for MySQL identifier quoting, Oracle outer join syntax, yadda
> yadda, it's going to be way more of a maintenance nightmare than some
> hook functions.  So if we accept any patch along this line, I want to
> drive a hard stake in the ground that the answer to that sort of thing
> will be NO.
>

Actually, a substantial amount can be done with hooks. For Oracle, which is
substantially harder than MySQL, I have a completely separate parser that
generates a PG-compatible parse tree packaged up as an extension. To handle
autonomous transactions, database links, hierarchical query conversion,
hints, and some execution-related items requires core changes. But, the
protocol and parsing can definitely be done with hooks. And, as was
mentioned previously, this isn't tied directly to emulating another
database - it would enable us to support an HTTP-ish interface directly in
the server as an extension as well. A lot of this can be done with
background worker extensions now, which is how my stuff was primarily
architected, but it's hacky when it comes to areas where the items Jan
discussed could clean things up and make them more pluggable.

Assuming we're going to keep to that, though, it seems like people
> doing this sort of thing will inevitably end up with a fork anyway.
> So maybe we should just not bother with the first step either.
>

Perhaps I'm misunderstanding you, but I wouldn't throw this entire idea out
(which enables a substantial addition of extensible functionality with a
limited set of touchpoints) on the premise of future objections.

-- 
Jonah H. Harris


Re: Extensibility of the PostgreSQL wire protocol

2021-02-10 Thread Tom Lane
Robert Haas  writes:
> If we're going to end up with an open source implementation of
> something useful in contrib or whatever, then I think this is fine.
> But, if not, then we're just making it easier for Amazon to do
> proprietary stuff without getting any benefit for the open-source
> project. In fact, in that case PostgreSQL would ensure have to somehow
> ensure that the hooks don't get broken without having any code that
> actually uses them, so not only would the project get no benefit, but
> it would actually incur a small tax. I wouldn't say that's an
> absolutely show-stopper, but it definitely isn't my first choice.

As others noted, a test module could be built to add some coverage here.

What I'm actually more concerned about, in this whole line of development,
is the follow-on requests that will surely occur to kluge up Postgres
to make its behavior more like $whatever.  As in "well, now that we
can serve MySQL clients protocol-wise, can't we pretty please have a
mode that makes the parser act more like MySQL".  If we start having
modes for MySQL identifier quoting, Oracle outer join syntax, yadda
yadda, it's going to be way more of a maintenance nightmare than some
hook functions.  So if we accept any patch along this line, I want to
drive a hard stake in the ground that the answer to that sort of thing
will be NO.

Assuming we're going to keep to that, though, it seems like people
doing this sort of thing will inevitably end up with a fork anyway.
So maybe we should just not bother with the first step either.

regards, tom lane




Re: Extensibility of the PostgreSQL wire protocol

2021-02-10 Thread Jonah H. Harris
On Wed, Feb 10, 2021 at 11:43 AM Robert Haas  wrote:

> On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck  wrote:
> > Our current plan is to create a new set of API calls and hooks that
> allow to register additional wire protocols. The existing backend libpq
> implementation will be modified to register itself using the new API. This
> will serve as a proof of concept as well as ensure that the API definition
> is not slanted towards a specific protocol. It is also similar to the way
> table access methods and compression methods are added.
>
> If we're going to end up with an open source implementation of
> something useful in contrib or whatever, then I think this is fine.
> But, if not, then we're just making it easier for Amazon to do
> proprietary stuff without getting any benefit for the open-source
> project. In fact, in that case PostgreSQL would ensure have to somehow
> ensure that the hooks don't get broken without having any code that
> actually uses them, so not only would the project get no benefit, but
> it would actually incur a small tax. I wouldn't say that's an
> absolutely show-stopper, but it definitely isn't my first choice.
>

Agreed on adding substantial hooks if they're not likely to be used. While
I haven't yet seen AWS' implementation or concrete proposal, given the
people involved, I assume it's fairly similar to how I implemented it.
Assuming that's correct and it doesn't require substantial redevelopment,
I'd certainly open-source my MySQL-compatible protocol and parser
implementation. From my perspective, it would be awesome if these could be
done as extensions.

While I'm not planning to open source it as of yet, for my
Oracle-compatible stuff, I don't think I'd be able to do anything other
than the protocol as an extension given the core-related changes similar to
what EDB has to do. I don't think there's any easy way to get around that.
But, for the protocol and any type of simple translation to Postgres'
dialect, I think that could easily be hook-based.

-- 
Jonah H. Harris


Re: Extensibility of the PostgreSQL wire protocol

2021-02-10 Thread Fabrízio de Royes Mello
On Wed, Feb 10, 2021 at 1:43 PM Robert Haas  wrote:
>
> On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck  wrote:
> > Our current plan is to create a new set of API calls and hooks that
allow to register additional wire protocols. The existing backend libpq
implementation will be modified to register itself using the new API. This
will serve as a proof of concept as well as ensure that the API definition
is not slanted towards a specific protocol. It is also similar to the way
table access methods and compression methods are added.
>
> If we're going to end up with an open source implementation of
> something useful in contrib or whatever, then I think this is fine.
> But, if not, then we're just making it easier for Amazon to do
> proprietary stuff without getting any benefit for the open-source
> project. In fact, in that case PostgreSQL would ensure have to somehow
> ensure that the hooks don't get broken without having any code that
> actually uses them, so not only would the project get no benefit, but
> it would actually incur a small tax. I wouldn't say that's an
> absolutely show-stopper, but it definitely isn't my first choice.

As far I understood Jan's proposal is to add enough hooks on PostgreSQL to
enable us to extend the wire protocol and add a contrib module as an
example (maybe TDS, HTTP or just adding new capabilities to current
implementation).

Regards,

-- 
   Fabrízio de Royes Mello
   PostgreSQL Developer at OnGres Inc. - https://ongres.com


Re: Extensibility of the PostgreSQL wire protocol

2021-02-10 Thread Robert Haas
On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck  wrote:
> Our current plan is to create a new set of API calls and hooks that allow to 
> register additional wire protocols. The existing backend libpq implementation 
> will be modified to register itself using the new API. This will serve as a 
> proof of concept as well as ensure that the API definition is not slanted 
> towards a specific protocol. It is also similar to the way table access 
> methods and compression methods are added.

If we're going to end up with an open source implementation of
something useful in contrib or whatever, then I think this is fine.
But, if not, then we're just making it easier for Amazon to do
proprietary stuff without getting any benefit for the open-source
project. In fact, in that case PostgreSQL would ensure have to somehow
ensure that the hooks don't get broken without having any code that
actually uses them, so not only would the project get no benefit, but
it would actually incur a small tax. I wouldn't say that's an
absolutely show-stopper, but it definitely isn't my first choice.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Extensibility of the PostgreSQL wire protocol

2021-01-25 Thread Jan Wieck
Hi Jonah,

On Mon, Jan 25, 2021 at 10:18 AM Jonah H. Harris 
wrote:

> On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck  wrote:
>
>> The following is a request for discussion and comments, not a refined
>> proposal accompanied by a working patch.
>>
>
> After implementing this three different ways inside the backend over the
> years, I landed on almost this identical approach for handling the MySQL,
> TDS, MongoDB, and Oracle protocols for NEXTGRES.
>

Could any of that be open sourced? It would be an excellent addition to add
one of those as example code.


Regards, Jan



>
> Initially, each was implemented as an background worker extension which
> had to handle its own networking, passing the fd off to new
> protocol-specific connections, etc. This worked, but duplicate a good
> amount of logic. It would be great to have a standard, loadable, way to add
> support for a new protocol.
>
> --
> Jonah H. Harris
>
>

-- 
Jan Wieck


Re: Extensibility of the PostgreSQL wire protocol

2021-01-25 Thread Jonah H. Harris
On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck  wrote:

> The following is a request for discussion and comments, not a refined
> proposal accompanied by a working patch.
>

After implementing this three different ways inside the backend over the
years, I landed on almost this identical approach for handling the MySQL,
TDS, MongoDB, and Oracle protocols for NEXTGRES.

Initially, each was implemented as an background worker extension which had
to handle its own networking, passing the fd off to new protocol-specific
connections, etc. This worked, but duplicate a good amount of logic. It
would be great to have a standard, loadable, way to add support for a new
protocol.

-- 
Jonah H. Harris