Re: Proposal: http2 wire format

2018-05-10 Thread Robert Haas
On Mon, Mar 26, 2018 at 7:51 PM, Craig Ringer  wrote:
> There's been no visible consideration of overheads and comparison with
> existing v3 protocol. Personally I'm fine with adding some protocol overhead
> in bytes terms; low latency links have the bandwidth not to care much
> compared to payload sizes etc. On high latency links it's all about the
> round trips, not message sizes. But I want to know what those overheads are,
> and why they're there.

I think that the overhead of any new protocol (or protocol version)
ought to be a major consideration.  Overhead includes, but is not
limited to, number of bytes sent over the wire.  It also includes how
fast we can parse that protocol; Andres's earlier comments on this
thread about Parse/Bind/Execute being slower than Query are on point.
If we implement a new protocol, we should measure how many QPS we can
push through it (for both prepared and unprepared queries).
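
As a rough sketch of the kind of measurement meant here, the harness
below counts how many calls of a query function complete per second.
The name measure_qps and the two stand-in callables are hypothetical;
a real comparison would swap in driver calls that issue a simple Query
and a Parse/Bind/Execute sequence against a live server.

```python
import time
from typing import Callable


def measure_qps(run_query: Callable[[], None], duration_s: float = 1.0) -> float:
    """Call run_query in a loop for roughly duration_s seconds; return calls/second."""
    count = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        run_query()
        count += 1
    return count / (time.perf_counter() - start)


if __name__ == "__main__":
    # Stand-ins only: replace with real driver calls before drawing conclusions.
    def unprepared() -> None:
        pass

    def prepared() -> None:
        pass

    print("unprepared QPS:", measure_qps(unprepared, 0.2))
    print("prepared QPS:", measure_qps(prepared, 0.2))
```

A single-threaded loop like this mostly measures per-query latency, so
a fuller benchmark would also vary the number of concurrent clients.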

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: http2 wire format

2018-03-29 Thread Andres Freund
On 2018-03-29 17:52:07 -0400, Peter Eisentraut wrote:
> On 3/29/18 14:20, Andres Freund wrote:
> > On 2018-03-28 20:34:13 -0400, Peter Eisentraut wrote:
> >> On 3/28/18 12:09, Andres Freund wrote:
> >>> Yea, not the most descriptive... Returning multiple different resultsets
> >>> from a function / procedure. Inability to do so is a serious limitation
> >>> of postgres in comparison to some other language with procedures.
> >>
> >> This is already possible as far as the protocol is concerned.
> > 
> > Huh, I don't see how?
> 
> See example here:
> https://www.postgresql.org/message-id/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
> 
> More simply, you can already do this with psql like this:
> 
> => SELECT * FROM tbl1\; SELECT * FROM tbl2;
> 
> This will ship multiple result sets.  psql chooses to only display the
> last one.  This is also discussed in the above thread.

I don't think this is the real deal. For one, it really isn't
transparent to the client where statement boundaries are. That matters a
great deal when using pipelining. I think you really need framing that
distinguishes client-initiated statements from multiple result sets
originating from the same statement.
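
To make the boundary problem concrete, here is a minimal sketch that
walks a v3 backend byte stream of the shape sent in reply to a single
simple Query containing two SELECTs. The helper names (frame,
iter_messages, boundaries) are made up for illustration, and the
RowDescription/DataRow bodies are zero-column placeholders rather than
captured traffic.

```python
import struct
from typing import Iterator, List, Tuple


def frame(msg_type: bytes, payload: bytes = b"") -> bytes:
    """Build one v3 message: type byte, int32 length (counts itself), payload."""
    return msg_type + struct.pack("!I", len(payload) + 4) + payload


def iter_messages(buf: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for every complete message in buf."""
    pos = 0
    while pos + 5 <= len(buf):
        (length,) = struct.unpack_from("!I", buf, pos + 1)
        end = pos + 1 + length
        if length < 4 or end > len(buf):
            break  # malformed or incomplete; stop rather than guess
        yield chr(buf[pos]), buf[pos + 5:end]
        pos = end


def boundaries(buf: bytes) -> Tuple[int, int]:
    """Count CommandComplete ('C') and ReadyForQuery ('Z') messages."""
    types: List[str] = [t for t, _ in iter_messages(buf)]
    return types.count("C"), types.count("Z")


# Shape of the reply to one simple Query carrying two SELECTs.
EMPTY = struct.pack("!H", 0)  # placeholder body: zero columns
stream = b"".join([
    frame(b"T", EMPTY), frame(b"D", EMPTY), frame(b"C", b"SELECT 1\x00"),
    frame(b"T", EMPTY), frame(b"D", EMPTY), frame(b"C", b"SELECT 1\x00"),
    frame(b"Z", b"I"),
])

if __name__ == "__main__":
    print(boundaries(stream))
```

The two result sets are delimited only by CommandComplete, and a single
ReadyForQuery closes the whole exchange, so nothing in the framing ties
a result set back to a particular client-side statement.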

Greetings,

Andres Freund



Re: Proposal: http2 wire format

2018-03-29 Thread Peter Eisentraut
On 3/29/18 14:20, Andres Freund wrote:
> On 2018-03-28 20:34:13 -0400, Peter Eisentraut wrote:
>> On 3/28/18 12:09, Andres Freund wrote:
>>> Yea, not the most descriptive... Returning multiple different resultsets
>>> from a function / procedure. Inability to do so is a serious limitation
>>> of postgres in comparison to some other language with procedures.
>>
>> This is already possible as far as the protocol is concerned.
> 
> Huh, I don't see how?

See example here:
https://www.postgresql.org/message-id/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com

More simply, you can already do this with psql like this:

=> SELECT * FROM tbl1\; SELECT * FROM tbl2;

This will ship multiple result sets.  psql chooses to only display the
last one.  This is also discussed in the above thread.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: http2 wire format

2018-03-29 Thread Andres Freund
On 2018-03-28 20:34:13 -0400, Peter Eisentraut wrote:
> On 3/28/18 12:09, Andres Freund wrote:
> > Yea, not the most descriptive... Returning multiple different resultsets
> > from a function / procedure. Inability to do so is a serious limitation
> > of postgres in comparison to some other language with procedures.
> 
> This is already possible as far as the protocol is concerned.

Huh, I don't see how?

Greetings,

Andres Freund



Re: Proposal: http2 wire format

2018-03-29 Thread Hannu Krosing
> > * room for other resultset formats later. Like Damir, I really want to add
> > protobuf or json serializations of result sets at some point, mainly so we
> > can return "entity graphs" in graph representation rather than left-join
> > projection.
>
> -1. I don't think this belongs in postgres.
>

Maybe the functionality does not belong in *core* postgres, but I sure
would like it to be possible to have an extension being able to do it.

A bit similar to what logical decoding plugins do now, just more flexible
in terms of protocol

Cheers
Hannu


Re: Proposal: http2 wire format

2018-03-28 Thread Peter Eisentraut
On 3/28/18 12:09, Andres Freund wrote:
> Yea, not the most descriptive... Returning multiple different resultsets
> from a function / procedure. Inability to do so is a serious limitation
> of postgres in comparison to some other language with procedures.

This is already possible as far as the protocol is concerned.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: http2 wire format

2018-03-28 Thread Andres Freund
Hi,

On 2018-03-28 09:59:34 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > A few random, very tired, points:
> 
> > - consolidated message for common tasks:
> >   - (bind, [describe?,] execute) to reduce overhead of prepared
> > statement execution (both in messages, as well as branches)
> >   - (anonymous parse, bind, describe, execute) to make it cheaper to
> > send statements with out-of-line parameters
> 
> I do not see a need for this; you can already send those combinations of
> messages in a single network packet if you have a mind to.

The simple protocol right now is *considerably* faster than the extended
protocol. The extended protocol sends more data overall, we do more
memory context resets, there's more switches between protocol messages
in both backend and client. All of those aren't free.

https://www.postgresql.org/message-id/12500.1470002232%40sss.pgh.pa.us
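
For the "sends more data overall" part, a small sketch that builds the
v3 frontend messages by hand and compares byte counts for one
parameterless statement. The builder function names are mine; the
layouts follow the documented v3 message formats, with unnamed
statement and portal and text format throughout.

```python
import struct
from typing import Optional, Sequence


def _msg(msg_type: bytes, payload: bytes) -> bytes:
    # Type byte, then int32 length that counts itself but not the type byte.
    return msg_type + struct.pack("!I", len(payload) + 4) + payload


def _cstr(text: str) -> bytes:
    return text.encode("utf-8") + b"\x00"


def query(sql: str) -> bytes:
    return _msg(b"Q", _cstr(sql))


def parse(stmt: str, sql: str, param_oids: Sequence[int] = ()) -> bytes:
    body = _cstr(stmt) + _cstr(sql) + struct.pack("!H", len(param_oids))
    body += b"".join(struct.pack("!I", oid) for oid in param_oids)
    return _msg(b"P", body)


def bind(portal: str, stmt: str, params: Sequence[Optional[bytes]] = ()) -> bytes:
    body = _cstr(portal) + _cstr(stmt)
    body += struct.pack("!H", 0)  # no parameter format codes: all text
    body += struct.pack("!H", len(params))
    for value in params:
        if value is None:
            body += struct.pack("!i", -1)  # NULL parameter
        else:
            body += struct.pack("!i", len(value)) + value
    body += struct.pack("!H", 0)  # no result format codes: all text
    return _msg(b"B", body)


def describe_portal(portal: str) -> bytes:
    return _msg(b"D", b"P" + _cstr(portal))


def execute(portal: str, max_rows: int = 0) -> bytes:
    return _msg(b"E", _cstr(portal) + struct.pack("!I", max_rows))


def sync() -> bytes:
    return _msg(b"S", b"")


SQL = "SELECT 1"
simple = query(SQL)
extended = parse("", SQL) + bind("", "") + describe_portal("") + execute("") + sync()

if __name__ == "__main__":
    print(len(simple), len(extended))
```

Byte count is only one of the costs listed above; the memory context
resets and message-type switches do not show up in a size comparison.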

I've previously wondered whether we can peek ahead in the stream and
recognize that we got a bind/describe/execute or
parse/bind/describe/execute and execute them all together if all the
necessary data is there. To avoid new protocol messages.
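
A minimal sketch of what that peek-ahead could look like, assuming the
input buffer is available as bytes. It only checks whether the buffer
starts with complete messages forming one of the two sequences named
above; fusable_prefix and the FUSABLE table are hypothetical names, not
anything that exists in the backend.

```python
import struct
from typing import List, Optional, Tuple

# The two sequences named in the text: Parse/Bind/Describe/Execute and
# Bind/Describe/Execute.
FUSABLE: Tuple[Tuple[str, ...], ...] = (("P", "B", "D", "E"), ("B", "D", "E"))


def frame(msg_type: bytes, payload: bytes = b"") -> bytes:
    """Build one v3 message: type byte, int32 length (counts itself), payload."""
    return msg_type + struct.pack("!I", len(payload) + 4) + payload


def complete_types(buf: bytes) -> List[str]:
    """Return the type letters of the messages fully present in buf."""
    types: List[str] = []
    pos = 0
    while pos + 5 <= len(buf):
        (length,) = struct.unpack_from("!I", buf, pos + 1)
        end = pos + 1 + length
        if length < 4 or end > len(buf):
            break  # the next message has not fully arrived yet
        types.append(chr(buf[pos]))
        pos = end
    return types


def fusable_prefix(buf: bytes) -> Optional[Tuple[str, ...]]:
    """Return the matching sequence if buf starts with one, else None."""
    types = tuple(complete_types(buf))
    for pattern in FUSABLE:
        if types[:len(pattern)] == pattern:
            return pattern
    return None


# Bind (unnamed portal/statement, no params), Describe portal, Execute.
bde = (
    frame(b"B", b"\x00\x00" + b"\x00\x00" * 3)
    + frame(b"D", b"P\x00")
    + frame(b"E", b"\x00" + b"\x00\x00\x00\x00")
)

if __name__ == "__main__":
    print(fusable_prefix(bde), fusable_prefix(bde[:-1]))
```

If the last message is still incomplete the function returns None, and
the caller would fall back to processing messages one at a time.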


> Your other points are sound, except I have no idea what this means:
> 
> > - nested table support

Yea, not the most descriptive... Returning multiple different resultsets
from a function / procedure. Inability to do so is a serious limitation
of postgres in comparison to some other language with procedures.


Greetings,

Andres Freund



Re: Proposal: http2 wire format

2018-03-28 Thread Andres Freund
Hi,

On 2018-03-28 16:29:37 +0800, Craig Ringer wrote:
> > - allow *streaming* of large datums
> 
> Yes, very much +1 there. That's already on the wiki. Yeah:
> 
> * Permit lazy fetches of large values, at least out-of-line TOASTED values
> http://www.postgresql.org/message-id/53ff0ef8@2ndquadrant.com

That's not necessarily the same though. What I think we need is the
ability to have "chunked" encoding with *optional* length for the
overall datum. And then the backend infrastructure to be able to send
*to the wire* partial datums.  Probably with some callback based
StringInfo like buffer.
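
To illustrate, here is a sketch of one possible chunked layout. It is
entirely hypothetical and not an existing or proposed PostgreSQL wire
format: an int64 total length (-1 when unknown), then (int32 length,
bytes) chunks, ended by a zero-length chunk.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Optional


def write_datum(out: BinaryIO, chunks: Iterable[bytes],
                total_len: Optional[int] = None) -> None:
    """Write the optional overall length, the chunks, and the terminator."""
    out.write(struct.pack("!q", -1 if total_len is None else total_len))
    for chunk in chunks:
        if chunk:  # skip empty chunks; length 0 is reserved as the terminator
            out.write(struct.pack("!I", len(chunk)))
            out.write(chunk)
    out.write(struct.pack("!I", 0))


def read_total_len(inp: BinaryIO) -> Optional[int]:
    """Read the overall-length header; None means the sender did not know it."""
    (total,) = struct.unpack("!q", inp.read(8))
    return None if total < 0 else total


def read_chunks(inp: BinaryIO) -> Iterator[bytes]:
    """Yield chunks one at a time until the zero-length terminator."""
    while True:
        (size,) = struct.unpack("!I", inp.read(4))
        if size == 0:
            return
        yield inp.read(size)


if __name__ == "__main__":
    wire = io.BytesIO()
    write_datum(wire, (b"hello ", b"world"))  # overall length not known up front
    wire.seek(0)
    print(read_total_len(wire), b"".join(read_chunks(wire)))
```

Because the sender never needs the full datum in memory, the chunks
argument can be a generator that produces partial output as it goes.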

> - nested table support
> >
> >
> Can you elaborate on that one?

Nested recordsets. E.g. a SRF or procedure returning multiple query results.


> * room for other resultset formats later. Like Damir, I really want to add
> protobuf or json serializations of result sets at some point, mainly so we
> can return "entity graphs" in graph representation rather than left-join
> projection.

-1. I don't think this belongs in postgres.

Greetings,

Andres Freund



Re: Proposal: http2 wire format

2018-03-28 Thread Tom Lane
Andres Freund  writes:
> A few random, very tired, points:

> - consolidated message for common tasks:
>   - (bind, [describe?,] execute) to reduce overhead of prepared
> statement execution (both in messages, as well as branches)
>   - (anonymous parse, bind, describe, execute) to make it cheaper to
> send statements with out-of-line parameters

I do not see a need for this; you can already send those combinations of
messages in a single network packet if you have a mind to.  Tatsuo-san's
point about making it easier to identify which response goes with which
message would improve life for people trying to send multiple messages
in advance of a response, though.

Your other points are sound, except I have no idea what this means:

> - nested table support

regards, tom lane



Re: Proposal: http2 wire format

2018-03-28 Thread Tatsuo Ishii
> A few random, very tired, points:
> 
> - consolidated message for common tasks:
>   - (bind, [describe?,] execute) to reduce overhead of prepared
> statement execution (both in messages, as well as branches)
>   - (anonymous parse, bind, describe, execute) to make it cheaper to
> send statements with out-of-line parameters
> - get rid of length limits of individual fields, probably w/ some variable
>   length encoding (simple 7 bit?)
> - allow *streaming* of large datums
> - type-level decisions about binary type transport, right now it's a lot
>   of effort (including potentially additional roundtrips), to get the
>   few important types to be transported in binary fashion. E.g. floating
>   points are really expensive to stringify, bytea as text gets a lot
>   bigger etc, but a lot of other types don't benefit a lot
> - annotate COMMIT, PREPARE TRANSACTION, COMMIT PREPARED with LSN of
>   associated WAL record
> - have a less insane cancellation handling
> - nested table support

I would like to have portal/statement name to be added to response
messages (i.e. parse complete, bind complete, close complete, and
command complete.). Currently it's not easy to recognize which
response corresponds to which message, which makes certain
applications such as Pgpool-II hard to implement and inefficient.
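
A simplified sketch of the bookkeeping this forces on a proxy today:
since the replies carry no name, the only way to pair them with
requests is to remember the order in which the requests were sent. The
class and method names are hypothetical, and the sketch ignores DataRow,
ErrorResponse, PortalSuspended and EmptyQueryResponse.

```python
from collections import deque
from typing import Deque, Tuple

# Reply that acknowledges each request type in the v3 extended protocol.
REPLY_FOR = {
    "Parse": "ParseComplete",
    "Bind": "BindComplete",
    "Close": "CloseComplete",
    "Execute": "CommandComplete",
}


class PendingRequests:
    """FIFO of in-flight requests, matched to replies purely by position."""

    def __init__(self) -> None:
        self._queue: Deque[Tuple[str, str]] = deque()

    def sent(self, kind: str, name: str) -> None:
        """Record a request (kind, statement-or-portal name) as in flight."""
        if kind not in REPLY_FOR:
            raise ValueError(f"untracked request type: {kind}")
        self._queue.append((kind, name))

    def received(self, reply: str) -> Tuple[str, str]:
        """Return the request the reply must belong to, by arrival order."""
        if not self._queue:
            raise RuntimeError(f"unexpected {reply}: nothing in flight")
        kind, name = self._queue[0]
        if REPLY_FOR[kind] != reply:
            raise RuntimeError(f"expected {REPLY_FOR[kind]}, got {reply}")
        self._queue.popleft()
        return kind, name


if __name__ == "__main__":
    pending = PendingRequests()
    pending.sent("Parse", "stmt1")
    pending.sent("Bind", "portal1")
    pending.sent("Execute", "portal1")
    for reply in ("ParseComplete", "BindComplete", "CommandComplete"):
        print(reply, "->", pending.received(reply))
```

With the name included in each reply, the queue and the ordering
assumption behind it would no longer be needed.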

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Proposal: http2 wire format

2018-03-28 Thread Craig Ringer
On 28 March 2018 at 16:02, Andres Freund  wrote:

> On 2018-03-26 22:44:09 +0200, Damir Simunic wrote:
> > > *NONE* of the interesting problems are solved by HTTP2. You *still*
> > > need a full blown protocol ontop of it. So no, this doesn't change that.
> >
> > If you had to nominate only one of those problems, which one would you
> > consider the most interesting?
>
> A few random, very tired, points:
>
> - consolidated message for common tasks:
>   - (bind, [describe?,] execute) to reduce overhead of prepared
> statement execution (both in messages, as well as branches)
>   - (anonymous parse, bind, describe, execute) to make it cheaper to
> send statements with out-of-line parameters
> - get rid of length limits of individual fields, probably w/ some variable
>   length encoding (simple 7 bit?)
>

In preparation for the eventually-inevitable 64-bit field sizes, yes.

This should be on the protocol todo wiki.


> - allow *streaming* of large datums


Yes, very much +1 there. That's already on the wiki. Yeah:

* Permit lazy fetches of large values, at least out-of-line TOASTED values
http://www.postgresql.org/message-id/53ff0ef8@2ndquadrant.com


> - type-level decisions about binary type transport, right now it's a lot
>   of effort (including potentially additional roundtrips), to get the
>   few important types to be transported in binary fashion. E.g. floating
>   points are really expensive to stringify, bytea as text gets a lot
>   bigger etc, but a lot of other types don't benefit a lot
>

Yeah, as distinct from now, where the client has to specify param-by-param,
and where libpq doesn't support mixing text and binary formats in result
sets at all.

Again, needs wiki. I'll add.

> - annotate COMMIT, PREPARE TRANSACTION, COMMIT PREPARED with LSN of
>   associated WAL record
>

Already on the wiki, as is the related job of sending the xid of a txn to
the client when one is assigned.


> - have a less insane cancellation handling
>

+100

> - nested table support
>

Can you elaborate on that one?


A few other points that come to mind for me are:

* labeled result sets (useful for stored procs, etc, as came up recently
with trying to figure out how to let stored procs have OUT params and
multiple result sets)

* room for other resultset formats later. Like Damir, I really want to add
protobuf or json serializations of result sets at some point, mainly so we
can return "entity graphs" in graph representation rather than left-join
projection.

* Robert Haas was talking about some issues relating to sync and the COPY
BOTH protocol a while ago, which we'd want to address.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proposal: http2 wire format

2018-03-28 Thread Craig Ringer
On 28 March 2018 at 00:42, Damir Simunic 
wrote:

> > I'm rapidly losing interest. Unless this goes back toward the concrete and
> > practical I think it's going nowhere.
>
> Your message is exactly what I was hoping for. Thanks for your guidance
> and support, really appreciate you.
>
> Let me now get busy and earn your continued interest and support.
>
>
I spent a lot of time reviewing what you wrote and proposed, looked over
your proof of concept, and offered feedback. Much of which you ignored.
I've been trying to help and took a fair bit of time to do so.

I've outlined what I think needs to happen to push this in a practical
direction.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proposal: http2 wire format

2018-03-28 Thread Andres Freund
On 2018-03-26 22:44:09 +0200, Damir Simunic wrote:
> > *NONE* of the interesting problems are solved by HTTP2. You *still*
> > need a full blown protocol ontop of it. So no, this doesn't change that.
>
> If you had to nominate only one of those problems, which one would you 
> consider the most interesting?

A few random, very tired, points:

- consolidated message for common tasks:
  - (bind, [describe?,] execute) to reduce overhead of prepared
statement execution (both in messages, as well as branches)
  - (anonymous parse, bind, describe, execute) to make it cheaper to
send statements with out-of-line parameters
- get rid of length limits of individual fields, probably w/ some variable
  length encoding (simple 7 bit?)
- allow *streaming* of large datums
- type-level decisions about binary type transport, right now it's a lot
  of effort (including potentially additional roundtrips), to get the
  few important types to be transported in binary fashion. E.g. floating
  points are really expensive to stringify, bytea as text gets a lot
  bigger etc, but a lot of other types don't benefit a lot
- annotate COMMIT, PREPARE TRANSACTION, COMMIT PREPARED with LSN of
  associated WAL record
- have a less insane cancellation handling
- nested table support
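
For the variable-length encoding item, a minimal sketch of one common
"simple 7 bit" scheme (LEB128-style: seven payload bits per byte, high
bit set on every byte except the last). This is an illustration of the
general technique, not a format anyone has proposed for the protocol.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, least-significant 7-bit group first."""
    if value < 0:
        raise ValueError("only non-negative values are supported")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # more groups follow
        else:
            out.append(group)         # last group: high bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


if __name__ == "__main__":
    for n in (0, 127, 128, 300, 2**32, 2**63):
        encoded = encode_varint(n)
        assert decode_varint(encoded) == (n, len(encoded))
        print(n, len(encoded), "bytes")
```

Short lengths cost one byte instead of four, while values beyond 32 bits
remain representable at the price of a data-dependent loop in the parser.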

Greetings,

Andres Freund



Re: Proposal: http2 wire format

2018-03-27 Thread Damir Simunic
> 
> 
> I'm rapidly losing interest. Unless this goes back toward the concrete and 
> practical I think it's going nowhere.


Your message is exactly what I was hoping for. Thanks for your guidance and 
support, really appreciate you. 

Let me now get busy and earn your continued interest and support. 


Damir
> 
> -- 
>  Craig Ringer   http://www.2ndQuadrant.com/ 
> 
>  PostgreSQL Development, 24x7 Support, Training & Services



Re: Proposal: http2 wire format

2018-03-26 Thread Craig Ringer
On 26 March 2018 at 22:56, Tom Lane  wrote:

> Damir Simunic  writes:
> >> On 26 Mar 2018, at 11:06, Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
> >>> If anyone finds the idea of Postgres speaking http2 appealing
>
> TBH, this sounds like a proposal to expend a whole lot of work (much of it
> outside the core server, and thus not under our control) in order to get
> from a state of affairs where there are things we'd like to do but can't
> because of protocol compatibility worries, to a different state of affairs
> where there are things we'd like to do but can't because of protocol
> compatibility worries.  Why would forcing our data into a protocol
> designed for a completely different purpose, and which we have no control
> over, be a step forward?  How would that address the fundamental issue of
> inertia in multiple chunks of software (ie, client libraries and
> applications as well as the server)?
>

I think the idea is that the protocol (supposedly) solves a lot of the
issues we now face, and has sufficient extensibility built in for future
use.

I'm not convinced. The v4 protocol TODO hasn't been addressed, nor has
support for handshake authentication models like SSPI and GSSAPI. There's been
no mention of query cancels, text encodings, or any of the other ongoing
pain points in the v3 protocol.
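
For reference on the cancel pain point, a sketch of the v3 CancelRequest
packet as documented for protocol 3.0. The client has to open a second
connection and send this instead of a startup packet, using the process
ID and secret key it received in BackendKeyData; the server then closes
that connection without replying. The function name is mine.

```python
import struct

CANCEL_REQUEST_CODE = (1234 << 16) | 5678  # 80877102


def cancel_request(backend_pid: int, secret_key: int) -> bytes:
    """Build the 16-byte packet: length, cancel code, backend PID, secret key."""
    return struct.pack("!IIII", 16, CANCEL_REQUEST_CODE, backend_pid, secret_key)


if __name__ == "__main__":
    packet = cancel_request(backend_pid=4242, secret_key=0x1234ABCD)
    print(len(packet), packet.hex())
```

Since no reply comes back, the client cannot tell whether the cancel
arrived in time or which statement it ended up hitting.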

I completely understand the desire to support a totally new model where Pg
accepts and internally dispatches requests to a separate set of executors,
which may or may not be 1:1 with session state. I think we all do. But
predicating a protocol change on that being possible is wholly impractical.
But it looks like the availability of something like that is just being
assumed.

I want to see concrete reasons why this meets our existing and future
needs, and those of client apps.

I want to see _EXAMPLES_ of how protocol exchanges would work. Show:

- connect
- authenticate
- establish session
- begin txn
- query
- result set
- query
- error midway through result set
- sync recovery
- rollback
- utility query
- resultset
- query
- query cancel


There's been no visible consideration of overheads and comparison with
existing v3 protocol. Personally I'm fine with adding some protocol
overhead in bytes terms; low latency links have the bandwidth not to care
much compared to payload sizes etc. On high latency links it's all about
the round trips, not message sizes. But I want to know what those overheads
are, and why they're there.
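
As a starting point for that comparison, a sketch of the fixed framing
cost alone: a v3 message has a 5-byte header, an HTTP/2 frame a 9-byte
one (24-bit length, type, flags, 31-bit stream id). The helper names are
hypothetical, and the sketch deliberately ignores HEADERS frames, HPACK,
flow control and TLS, which is where most of the real difference would
be.

```python
import struct

V3_HEADER_LEN = 5  # 1 type byte + 4-byte length
H2_HEADER_LEN = 9  # 3-byte length + type + flags + 4-byte stream id

H2_DATA = 0x0        # DATA frame type
H2_END_STREAM = 0x1  # END_STREAM flag


def v3_message(msg_type: bytes, payload: bytes) -> bytes:
    """v3 framing: type byte, int32 length (counts itself), payload."""
    return msg_type + struct.pack("!I", len(payload) + 4) + payload


def h2_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """HTTP/2 framing: 24-bit payload length, type, flags, stream id, payload."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload does not fit the 24-bit length field")
    header = struct.pack("!I", len(payload))[1:]         # keep the low 3 bytes
    header += bytes([frame_type, flags])
    header += struct.pack("!I", stream_id & 0x7FFFFFFF)  # reserved bit cleared
    return header + payload


payload = b"x" * 100
v3 = v3_message(b"D", payload)
h2 = h2_frame(H2_DATA, H2_END_STREAM, 1, payload)

if __name__ == "__main__":
    print(len(v3) - len(payload), len(h2) - len(payload))
```

Four extra bytes per frame is negligible next to payload sizes; what
still needs measuring is the per-request header block and the round
trips.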

I'm rapidly losing interest. Unless this goes back toward the concrete and
practical I think it's going nowhere.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proposal: http2 wire format

2018-03-26 Thread Stephen Frost
Greetings,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 26 March 2018 at 21:05, Damir Simunic 
> wrote:
> > The same goes for the ‘authorization’ header. Postgres does not support
> > Bearer token authorization today. But maybe you’ll be able to define a
> > function that knows how to deal with the token, and somehow signal to
> > Postgres that you want it to call this function when it sees such a header.
> > Or maybe someone wrote a plugin that does that, and you configure your
> > server to use it.
> 
> You've consistently ignored my comments re authentication and authorization.
> 
> How would a multi-step handshake authentication like GSSAPI or SSPI be
> implemented with HTTP2? Efficiently?

I've been trying to avoid this thread, but I'll throw in that the way
GSSAPI is handled on the web today is through SPNEGO:

https://en.wikipedia.org/wiki/SPNEGO

Would be great to get pgAdmin4 to work under a webserver which is
performing SPNEGO and Kerberos delegation to allow users who are
authenticated to the web server to let the web server proxy those
credentials to allow connecting to PG, and, independently, for
credentials to be able to be delegated to PG which can be used for
connections to other services via FDWs.

All of that is largely independent of http2, of course.

Thanks!

Stephen




Re: Proposal: http2 wire format

2018-03-26 Thread Craig Ringer
On 26 March 2018 at 21:05, Damir Simunic 
wrote:

> > On 26 Mar 2018, at 11:06, Vladimir Sitnikov wrote:
> >
> > Hi,
> >
> > >If anyone finds the idea of Postgres speaking http2 appealing
> >
> > HTTP/2 sounds interesting.
> > What do you think of https://grpc.io/ ?
> >
> > Have you evaluated it?
> > It does sound like a ready RPC on top of HTTP/2 with support for lots of
> > languages.
> >
> > The idea of reimplementing the protocol for multiple languages from
> > scratch does not sound too appealing.
>
> This proposal takes the stance that having HTTP2 wire protocol in place
> will enable wide experimentation  with and implementation of many new
> features and content types, but is not concerned with the specifics of
> those.
>
> ---
> Let me illustrate with an example how it would look if we already had
> HTTP2 as proposed.
>
> Let’s say you have a building automation device on your network that
> happens to speak grpc, and you decided to use Postgres to store published
> topics in the database.
>
> Your grpc-speaking device might connect to Postgres and issue a request
> like this:
>
> HEADERS (flags = END_HEADERS)
> :method = POST
> :scheme = http
> :path = /CreateTopic
> pg-database = Publisher
> content-type = application/grpc+proto
> grpc-encoding = gzip
> authorization = Bearer y235.wef315yfh138vh31hv93hv8h3v
>
> DATA (flags = END_STREAM)
> 
>
> (This is from grpc.io homepage; uppercase HEADERS and DATA are frame
> names from the HTTP2 specification).
>
> Postgres would take care of TLS negotiation, unpack the frames, decompress
> the headers (:method, :path, etc are transferred compressed with a lookup
> table) and copy the payload into memory and make it  all available to the
> backend. If this was the first request, it would start the backend for you
> as well.
>
> Postgres doesn’t know about grpc, so it would just conveniently return
> "406 Not Acceptable" to your client and close the stream (but not the
> connection). Still connected and authenticated, the device could retry the
> request with `content-type: application/json`, and if you somehow
> programmed a function that accepts json, the request would go through.
> (Let’s imagine we have some kind of mechanism to associate functions to
> requests and content types, maybe through some function attributes in the
> catalog).
>
>
This seems to have gone pretty pie-in-the-sky overnight. If I understand
correctly, what you're getting at is "eventually I'd like content
negotiation that lets us support alternate query representations and
response representations".

If so, me too. And HTTP2 has some features that are interesting there. But
it doesn't have a great deal to do with the immediate issues with v3, or
concrete benefits to uses that are already possible with v3.

Again, if your proposed protocol implementation adds significant overhead
it's probably a nonstarter.


> The same goes for the ‘authorization’ header. Postgres does not support
> Bearer token authorization today. But maybe you’ll be able to define a
> function that knows how to deal with the token, and somehow signal to
> Postgres that you want it to call this function when it sees such a header.
> Or maybe someone wrote a plugin that does that, and you configure your
> server to use it.
>

You've consistently ignored my comments re authentication and authorization.

How would a multi-step handshake authentication like GSSAPI or SSPI be
implemented with HTTP2? Efficiently?

You also mentioned Pg "starting a backend or using an existing one". Er,
no. You're assuming the presence of a connection pooler of sorts within Pg
itself. Many people want that, myself included, but it's a fairly tricky
problem with Pg's architecture, and definitely not something you should
assume with any new protocol proposal.

I'm increasingly convinced that you're pursuing your interesting use cases
and disregarding the need to solve the specific problems with the current
protocol and server architecture. You also seem to be handwaving away
impediments like the strongly tcp-session-based connection structure.
That's not going to fly.

IMO, you should really:

* Read
https://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes_.2F_v4_Protocol
and explain how this protocol does/doesn't address those items

* Explain how you see handshake based auth fitting into this. Remember that
we currently support strong authentication on cleartext protocols.

* Explain how query-cancels will work. Does the protocol help? Retaining
the current make-a-second-connection model is tolerable, but gross; a new
protocol should ideally address this.

* Explain how sync recovery will work when the data stream is interrupted
by a cancel or error, WITHOUT terminating the session

* Explain what a MINIMAL implementation delivers. Touching on extensibility
is good, but let's focus on what can be done soon.

* Explain how sessions will work across multiple request/response cycles.
You should assume that 1 session = 1 TCP connection f

Re: Proposal: http2 wire format

2018-03-26 Thread David G. Johnston
On Mon, Mar 26, 2018 at 1:05 PM, Damir Simunic  wrote:

> Would it be the only protocol supported? What if I wanted JSON or CSV
> returned, or just plain old Postgres v3 binary format, since I already have
> the parser written for it? Wouldn’t you need to first solve the problem of
> content negotiation?
>

Is content negotiation - beyond client/server character encoding -
something we want the server to be capable of performing?  My gut reaction
is no.

> Getting rid of having to write a framing parser in every client language?
>


How large a problem/benefit is this overall?  We are choosing between
standard-but-new versus specialized-but-widely-implemented.  While the v3
protocol is a sunk cost there must be considerable value in incrementing it
20% to get to a better place rather than starting over from scratch with a
general-purpose, and I suspect more verbose, protocol.

I admire the vision presented here but I do wonder whether it's asking
PostgreSQL to be more than it is reasonably capable of being?  Presently
the architectures I'm aware of have clients talk to middleware application
servers, running DB drivers, talking to PostgreSQL clusters.  This vision
wants to remove the middleware application server and allow clients to
directly communicate with the server in client-native protocols and formats
(http and json).  That adds a considerable amount of responsibility to
PostgreSQL that it does not presently have and, having observed the
community for a number of years now and seeing the responses on this
thread, is responsibility it probably should not be given.  Let those
concerns reside in the middleware under the control of developers -
potentially through frameworks such as PostGraphile [1] and the like.

Or a fork - one that can choose to operate a different and/or more frequent
release cycle than the annual one that PostgreSQL uses.

David J.

[1] https://github.com/graphile/postgraphile


Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> Currently it is implemented via different v3 messages (parse, bind, execute, 
> row description, data row, etc etc).
> 
> The claim is *any* implementation "on top of HTTP/2" would basically require 
> to implement those "parse, bind, execute, row data, etc" *messages*.

Why? Wouldn’t you be able to package that into a single request with query in 
the data frame and params as headers?

> Say you pick to use "/parse" url with SQL text in body instead of "parse 
> message". It does not make the whole thing "just HTTP/2". It just means 
> you've created "your own protocol on top of HTTP/2”.

It is new functionality, isn’t it? Of course you have to evolve protocol 
semantics for that. That’s the whole point! HTTP2 is just a nice substrate that 
comes with the way to negotiate capabilities and can separate the metadata from 
payload. Nothing revolutionary, but it lets you move forward without hurting 
existing applications. Isn’t that an upgrade from v3?

> 
> Clients would have to know the sequence of valid messages,
> clients would have to know if SQL should be present in body or in URL or in 
> form post data, etc, etc.
> 
> I believe Andres means exactly the same thing as he says
> 
> By the way: you cannot just "load balance" "parse/bind/exec" to different 
> backends, so the load balancer should be aware of meaning of those 
> "parse/bind/exec" messages. I believe that is one of the requirements Craig 
> meant by "Is practical to implement in connection pooler proxies”.

Why can’t I package this into a single request? Don’t modern web proxies deal 
with session affinity and stuff like that?

> 
> Andres>You *still* need a full blown protocol ontop of it. So no, this 
> doesn't change that
> 
> 
> Damir> Did you know that Go has HTTP2 support in the standard library? And so 
> does java, too?
> 
> Java has TCP implementation in the standard library.
> Does it help implementing v3 protocol?

It does. If Java only had IP, without TCP, would you be able to implement your 
driver? Yes, but you’d have to suffer longer.

> In the same way HTTP/2 "as a library" helps implementing v4. The problem is 
> it does not. Developer would have to somehow code the coding rules (e.g. 
> header names, body formats).
> HTTP/2 is just too low level.
> 

It’s just framing. But standard framing.

> 
> Damir>Why parse some obscure Postgres-specific binary data types when you can 
> have the database send you the data in the serialization format of your 
> client language?
> 
> From my own experience, automatic use of server-prepared statements (see 
> https://github.com/pgjdbc/pgjdbc/pull/319) did cut end-user response times
> of our business application in half.
> That is clients would have to know the way to use prepared statements in 
> order to get decent performance.
> If you agree with that, then "v3 parse message", "v3 bind message", "v3 
> execute message" is not that different from "HTTP/2 POST to /parse", "HTTP/2 
> POST to /bind", "HTTP/2 POST to /execute". It is still "obscure 
> PostgreSQL-specific HTTP/2 calls”.

What of having that in one single request?

> 
> Even if you disagree (really?) you would still have to know 
> PostgreSQL-specific way to encode SQL text and "number of rows returned" and 
> "wire formats for the columns" even for a single "HTTP POST 
> /just/execute/sql" kind of API. Even that is "a full blown protocol ontop of 
> HTTP2" (c) Andres.

What does your business app do with the data?



> 
> Vladimir



Re: Proposal: http2 wire format

2018-03-26 Thread Alvaro Hernandez



On 26/03/18 21:57, Damir Simunic wrote:


On 26 Mar 2018, at 15:42, Alvaro Hernandez wrote:




On 26/03/18 13:11, Damir Simunic wrote:
On 26 Mar 2018, at 11:13, Vladimir Sitnikov wrote:


Damir> * What are the criteria for getting this into the core?
Craig>Mine would be:

+1

There's a relevant list as well: 
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md 





This is a great addition to the list, thanks!

Damir



    Hi Damir.

    I'm interested in the idea. However, way before writing a PoC, 
IMVHO I'd rather write a detailed document including:


- A brief summary of the main features of HTTP2 and why it might be a 
good fit for PG (of course there's a lot of doc in the wild about 
HTTP/2, so just a summary of the main relevant features and an 
analysis of how it may fit Postgres).


- A more or less thorough description of how every feature in current 
PostgreSQL protocol would be implemented on HTTP/2.


- Similar to the above, but applied to the v4 TODO feature list.

- A section for connection poolers, and auth, as these are very
important topics.



    Hope this helps,

    Álvaro



Álvaro, it does help, thanks. This discussion is to inform such a 
document. But the topic is such that having a good PoC will move the 
discussion further much faster.


    I don't particularly agree with this. A PoC for a significant 
undertaking like this a) consumes time and effort and b) has to commit 
to directions that may later turn out to be completely wrong. A matter as 
important as this requires design, and coding without it seems to be 
the wrong direction in my opinion. I'm a bit skeptical about HTTP/2 being a 
good idea here, but I'm more than willing to study it in detail if I'm 
enlightened about it and the proposed advantages. As you may have seen, I 
contributed one of the "brain dumps" at 
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md 
So I'm on the side of looking at new versions/protocols. But a PoC will 
not help here at all, a design document will :)




Can you help me think about how HTTP2 would impact connection 
poolers? I don’t know much about those.


    For one, authentication. It needs to be possible to authenticate 
when there is a "man in the middle", which is far from trivial. Also, 
extra round trips may not be acceptable. There are challenges in session 
management, user management, whether to add balancing and failover 
here... If a new protocol is to be designed, I'd like us to think about 
these problems, even if we do not solve them all.



    Álvaro

--

Alvaro Hernandez


---
OnGres



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
Hi Andres,

> 
> At least I do *NOT* want many protocols in core. We've a hard enough
> time to keep up with integrating patches and maintenance to not just
> willy nilly integrate multiple new features with unclear lifetimes.

Admire your effort in applying all these patches—this commitfest thing looks 
frenetic now that I’m subscribed to the mailing list. Can only guess the effort 
required on the part of a few of you to study and triage everything. Respect.

Actually, I don’t advocate multiple protocols in core. But the exercise of 
considering one will help generalize the architecture enough to make all 
protocols pluggable. 

The most interesting part for me is working out content negotiation—I think 
being able to package data in new ways will be super-interesting.

> 
> *NONE* of the interesting problems are solved by HTTP2. You *still*
> need a full blown protocol ontop of it. So no, this doesn't change that.

If you had to nominate only one of those problems, which one would you consider 
the most interesting?


Thanks for chiming in, really appreciate your time,
Damir





Re: Proposal: http2 wire format

2018-03-26 Thread Vladimir Sitnikov
Damir>Wouldn’t that be protocol semantics? Framing is already taken care of
by the wire protocol.

Apparently I'm using the wrong word. I do mean protocol semantics.

Damir>But can you pull off grpc.. Would it be the only protocol supported?

Of course there will be lots of "older clients".
For instance, pgjdbc supported both v2 and v3 for quite a while.
Now it supports just v3.

Damir>Can you imagine the reaction and discussion if I came up with this?

I would +1 for that :)

Damir>how can I do something about the status quo of FEBE protocol that
would be defensible in front of the Postgres community?” What would be your
answer?

I would definitely check if frameworks like GRPC, Apache Thrift, and
similar are suitable.

Damir>What if I wanted JSON or CSV returned

That is not up to the protocol. It is not up to the backend.
It looks like a task for the application since CSV could mean lots of stuff
(e.g. commas vs tabs vs quotes etc). You don't want to integrate all that
stuff in core.

Damir>Wouldn’t you need to first solve the problem of content negotiation?

Currently content negotiation is done at the application level (e.g. in the bind
message). I don't think it makes sense to negotiate the content for each
and every protocol message. That sounds like over-engineering.


Damir>Wouldn’t HTTP2 framing still allow prepared statements and cursors?

It would still require implementing the parse/bind/execute "protocol messages"
all over again.

Damir>I’m just proposing a layer under it that gets rid of a lot of pain.

The thing is, it is not clear whose pain you are treating.
Application developers just use drivers, so they don't care if there's
HTTP/2 under the covers.
Driver developers have more-or-less tested v3 implementations, so they
don't have the pain of "implementing a v3 parser".
PostgreSQL core developers ..., well, you know.

Vladimir


Re: Proposal: http2 wire format

2018-03-26 Thread Vladimir Sitnikov
It could make sense to arrange a Google Hangouts conversation (or similar).
Hangouts allows recording sessions with up to 10 speakers and unlimited
listeners. The recording can be shared via YouTube.

Damir>Funny you agree with that—for someone having the experience of
writing a driver and having a long list of things that you find wrong and
frustrating

I've been contributing to pgjdbc since 2014, so I do know how "great" v3 is.
I wish there was a machine-readable protocol definition (e.g. Apache Thrift
and/or GRPC and/or ProtoBuf etc).

However, I fully agree with Tom that merely using HTTP/2 would just
wreak havoc with little gain. Just to make it clear: HTTP/2 does not
solve the "hard to parse", "hard to load-balance", "hard to proxy" v3 problems,
and it creates a huge problem of "let's reimplement an HTTP2 client for all
the clients/languages".

Of course HTTP/2 stream multiplexing could be extremely helpful, for
QueryCancel commands and/or for heartbeat messages. However, those things
alone do not justify the use of low-level HTTP/2.

Damir>Why parse some obscure Postgres-specific binary data types when you
can have the database send you the data in the serialization format of your
client language?

Suppose you want to use a prepared statement.
Then you need to perform these steps in sequence:
1) Parse SQL text and convert it to a handle
2) Bind parameter value to the handle
3) Execute the statement
4) Receive rows
5) Receive error if any

Currently it is implemented via different v3 messages (parse, bind,
execute, row description, data row, etc etc).

The claim is that *any* implementation "on top of HTTP/2" would basically
require implementing those "parse, bind, execute, row data, etc" *messages*.
Say you choose to use a "/parse" URL with SQL text in the body instead of a "parse
message". That does not make the whole thing "just HTTP/2". It just means
you've created "your own protocol on top of HTTP/2".

Clients would have to know the sequence of valid messages,
clients would have to know if SQL should be present in body or in URL or in
form post data, etc, etc.
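
To make the point about message ordering concrete, here is a minimal Python sketch, not taken from any driver, of the state tracking a client or server needs. The class `ExtendedQuerySession`, the state names, and the transition table are hypothetical simplifications of the five steps listed above. The rules are the same whether a step travels as a v3 message or as an HTTP/2 POST to a URL such as "/parse".

```python
# Hypothetical sketch: ordering rules for a prepared statement,
# independent of how each step is framed on the wire.

# Steps allowed in each state (simplified from the numbered list above).
TRANSITIONS = {
    "idle": {"parse"},
    "parsed": {"bind"},
    "bound": {"execute"},
    "executed": {"bind", "parse"},  # re-bind the same handle or start over
}

NEXT_STATE = {"parse": "parsed", "bind": "bound", "execute": "executed"}


class ProtocolError(Exception):
    """Raised when a step arrives out of order."""


class ExtendedQuerySession:
    def __init__(self):
        self.state = "idle"

    def send(self, step):
        # The same check applies to a v3 Parse message and to a POST
        # to "/parse": changing the transport does not remove the rule.
        if step not in TRANSITIONS[self.state]:
            raise ProtocolError(f"{step!r} not allowed in state {self.state!r}")
        self.state = NEXT_STATE[step]
        return self.state
```

Whatever carries the steps, this table (or something like it) has to exist on both ends, which is what makes it a protocol of its own.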

I believe Andres means exactly the same thing when he says:

Andres>You *still* need a full blown protocol ontop of it. So no, this
doesn't change that

By the way: you cannot just "load balance" "parse/bind/exec" to different
backends, so the load balancer has to be aware of the meaning of those
"parse/bind/exec" messages. I believe that is one of the requirements Craig
meant by "Is practical to implement in connection pooler proxies".
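
The load-balancing point above can be sketched in a few lines of Python. This is an illustration only: `StatementAwareBalancer` and its `route` method are hypothetical, and real poolers usually pin whole sessions or transactions rather than single statements. It shows why the proxy has to understand the messages instead of spreading them blindly.

```python
import itertools


class StatementAwareBalancer:
    """Hypothetical proxy that keeps bind/execute on the backend that parsed."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)
        self._owner = {}  # (client, statement name) -> backend

    def route(self, client, message_type, statement):
        key = (client, statement)
        if message_type == "parse":
            # A new statement may go to any backend; remember which one.
            self._owner[key] = next(self._cycle)
        elif key not in self._owner:
            # bind/execute for a statement that no backend has parsed.
            raise LookupError(f"unknown statement {statement!r}")
        return self._owner[key]
```

A balancer that ignored `message_type` could send a bind to a backend that never saw the matching parse.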


Damir> Did you know that Go has HTTP2 support in the standard library? And
so does Java?

Java has a TCP implementation in the standard library.
Does it help with implementing the v3 protocol?
HTTP/2 "as a library" helps with implementing v4 in the same way, which is to
say it does not. A developer would still have to encode the protocol rules
somehow (e.g. header names, body formats).
HTTP/2 is just too low level.


Damir>Why parse some obscure Postgres-specific binary data types when you
can have the database send you the data in the serialization format of your
client language?

From my own experience, automatic use of server-prepared statements (see
https://github.com/pgjdbc/pgjdbc/pull/319 ) cut end-user response times
of our business application in half.
That is, clients would have to know how to use prepared statements in
order to get decent performance.
If you agree with that, then "v3 parse message", "v3 bind message", "v3
execute message" is not that different from "HTTP/2 POST to /parse",
"HTTP/2 POST to /bind", "HTTP/2 POST to /execute". It is still "obscure
PostgreSQL-specific HTTP/2 calls".

Even if you disagree (really?) you would still have to know the
PostgreSQL-specific way to encode SQL text, "number of rows returned"
and "wire formats for the columns", even for a single "HTTP POST
/just/execute/sql" kind of API. Even that is "a full blown protocol
ontop of HTTP2" (c) Andres.

Vladimir


Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 18:09, Vladimir Sitnikov wrote:
> 
> Damir>Postgres doesn’t know about grpc, s
> 
> I'm afraid you are missing the point.
> I would say PostgreSQL doesn't know about HTTP/2.
> It is the same as "PostgreSQL doesn't know about grpc".
> 
> Here's a quote from your pg_h2 repo:
> >What we need is to really build a request object and correctly extract
> > the full payload and parameters from the request. For example,
> >maybe we want to implement a QUERY method, similar to POST or PUT,
> > and pass the query text as the body of the request, with parameters
> > in the query string or in the headers
> 
> It basically suggests to implement own framing on top of HTTP/2.

Wouldn’t that be protocol semantics? Framing is already taken care of by the 
wire protocol.

> 
> When I say GRPC, I mean "implement PostgreSQL-specific protocol via GRPC 
> messages".
> 
> Let's take current message formats: 
> https://www.postgresql.org/docs/current/static/protocol-message-formats.html
> If one defines those message formats via GRPC, then GRPC would autogenerate 
> parsers and serializers for lots of languages "for free".
> 
> For instance
> Query (F)
>  Byte1('Q') Identifies the message as a simple query.
>  Int32 Length of message contents in bytes, including self.
>  String The query string itself.
> 
> can be defined via GRPC as
> message Query {
>   string queryText = 1;
> }
> 
> This is trivial to read, trivial to write, trivial to maintain, and it 
> automatically generates parsers/generators for lots of languages.
> 

I agree with you 100% here. But can you pull off grpc without HTTP2 framing in 
place? Would it be the only protocol supported? What if I wanted JSON or CSV 
returned, or just plain old Postgres v3 binary format, since I already have the 
parser written for it? Wouldn’t you need to first solve the problem of content 
negotiation?

The HTTP2 proposal is pragmatically a much smaller chunk, and it’s already hard to 
explain. Can you imagine the reaction and discussion if I came up with this?

In fact, if you ask yourself the question “how can I do something about the 
status quo of FEBE protocol that would be defensible in front of the Postgres 
community?” What would be your answer? 

> 
> Parsing of the current v3 protocol has to be reimplemented for each and every 
> language, and it would be pain to implement parsing for v4.
> Are you going to create "http/2" clients for Java, C#, Ruby, Swift, Dart, 
> etc, etc?
> 
> I am not saying that a mere redefinition of v3 messages as GRPC would do the 
> trick. I am saying that you'd better consider frameworks that would enable 
> transparent implementation of client libraries.
> 
> Damir>and will talk to any HTTP2 conforming client
> 
> I do not see where you are heading.

Getting rid of having to write a framing parser in every client language?

> Is "curl as PostgreSQL client" one of the key objectives for you?

No, it’s just something that is available right now—the point is to demonstrate 
increased ability to get the data out, without having to write access code over 
and over, and then lug that whenever you install some data processing piece. 
Kind of the same motivation why you think grpc is it. I’m just proposing a 
layer under it that gets rid of a lot of pain.

> True clients (the ones that are used by the majority of applications) should 
> support things like "prepared statements", "data types", "cursors" (resultset 
> streaming), etc. I can hardly imagine a case when one would use "curl" and 
> operate with prepared statements.

Wouldn’t HTTP2 framing still allow prepared statements and cursors?

> I think psql is pretty good client, so I see no point in implementing HTTP/2 
> for a mere reason of using curl to fetch data from the DB.

> 
> Vladimir




Re: Proposal: http2 wire format

2018-03-26 Thread Andres Freund
Hi,

On 2018-03-26 20:36:09 +0200, Damir Simunic wrote:
> If so, I’m not suggesting we get rid of FEBE, but leave it as is and 
> complement it with a widely understood and supported protocol, that in fact 
> takes compatibility way more seriously than FEBE. Just leave v3 frozen. Seems 
> like ultimate backward compatibility, no? Or am I missing something?

Maintaining it forever both in postgres, and in various libraries.


> You likely know every possible use case for Postgres, which makes you
> believe that the status quo is the right way. Or maybe I didn’t flesh
> out my proposal enough for you to give it a chance. Either way, I just
> can’t figure out where would HTTP2 be the same as status quo or a step
> backward compared to FEBE. I can see you’re super-busy and dedicated,
> but if you can find the time to enlighten me beyond just waving the
> “compatibility” and “engineering” banners, I’d appreciate you
> endlessly.

Well, besides vague points you've not elaborated what this is actually
gaining us. From my POV, HTTP2 wouldn't solve any of the interesting
protocol issues since those are one layer above what HTTP2 provides.


> (Oh, it’s my data, too; presently held hostage to the v3 protocol).

What on earth. You can freely migrate off.


> You mention twice loss of control--what exactly is the fear? 

If there are issues with what the library does or how it does it - we
can't fix them ourselves.


> You know what? HTTP2 just might fix it. Getting a new protocol into
> the core will force enough adjustments to the code to open the door
> for the next protocol on the horizon: QUIC, which happens to be UDP
> based, and might just be the ticket. At a minimum it will get
> significantly more people thinking about the possibility of
> reattaching sessions and doing all kinds of other things. Allowing
> multiple protocols is not very different from allowing a multitude of
> pl implementations.

> Help me put HTTP2 in place, and I’ll bet you, within a few months someone 
> will come up with a patch for QUIC. And then someone else will remember your 
> paragraph above and say “hmm, let’s see…"

At least I do *NOT* want many protocols in core. We've a hard enough
time to keep up with integrating patches and maintenance to not just
willy nilly integrate multiple new features with unclear lifetimes.

> 
> > I realize that
> > webservers manage to have pretty lightweight sessions, but that's not a
> > property of the protocol they use, it's a property of their internal
> > architectures.  We can't get there without a massive rewrite of the PG
> > server --- one that would be largely independent of any particular way of
> > representing data on the wire, anyway.
> > 
> 
> A smart outsider might come along, look at an ultra-fast web server,
> then look at Postgres and think, “Hmm, both speak HTTP2, but one is
> blazing fast, the other slow.

Err. What does http2 or the v3 protocol have to do with any of this? The
performance issues v3 has are all above where http2 would be, so ...?


> There are three alternatives to the proposal: do nothing, make a few
> anemic changes to v3, or start a multiyear discussion on the design of
> the next protocol. And you’ll still converge to something like HTTP2
> or QUIC.

*NONE* of the interesting problems are solved by HTTP2. You *still*
need a full blown protocol ontop of it. So no, this doesn't change that.


Greetings,

Andres Freund



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 15:42, Alvaro Hernandez wrote:
> 
> 
> 
> On 26/03/18 13:11, Damir Simunic wrote:
>>> On 26 Mar 2018, at 11:13, Vladimir Sitnikov wrote:
>>> 
>>> Damir> * What are the criteria for getting this into the core?
>>> Craig>Mine would be: 
>>> 
>>> +1
>>> 
>>> There's a relevant list as well: 
>>> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md
>>>  
>>> 
>>>  
>>> 
>> 
>> This is a great addition to the list, thanks!
>> 
>> Damir
>> 
> 
> Hi Damir.
> 
> I'm interested in the idea. However, way before writing a PoC, IMVHO I'd 
> rather write a detailed document including:
> 
> - A brief summary of the main features of HTTP2 and why it might be a good 
> fit for PG (of course there's a lot of doc in the wild about HTTP/2, so just 
> a summary of the main relevant features and an analysis of how it may fit 
> Postgres).
> 
> - A more or less thorough description of how every feature in current 
> PostgreSQL protocol would be implemented on HTTP/2.
> 
> - Similar to the above, but applied to the v4 TODO feature list.
> 
> - A section on connection poolers and auth, as these are very important 
> topics.
> 
> 
> Hope this helps,
> 
> Álvaro
> 

Álvaro, it does help, thanks. This discussion is to inform such a document. But 
the topic is such that having a good PoC will move the discussion further much 
faster. 

Can you help me think about how HTTP2 would impact connection poolers? I 
don’t know much about those.
> -- 
> 
> Alvaro Hernandez
> 
> 
> ---
> OnGres



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 18:19, Vladimir Sitnikov wrote:
> 
> Tom>But starting from the assumption that HTTP2 solves our problems seems to 
> me to be "Here's a hammer.
> 
> Agree.

Funny you agree with that—for someone having the experience of writing a driver 
and having a long list of things that you find wrong and frustrating, one would 
expect you to look at how other protocols work, or at least consider that maybe 
the right way is to change something server side.

> 
> Just a side note: if v4 is ever invented I wish client language support
> is considered.
> It does take resources to implement message framing, and data parsing (e.g. 
> int, timestamp, struct, array, ...) for each language independently.

This is a strange statement about framing. Did you know that Go has HTTP2 
support in the standard library? And so does Java? 
https://github.com/http2/http2-spec/wiki/Implementations 


The part I hinted at in the example but did not get the message across is that 
I’m advocating the best possible client language support. The right way is to 
stop writing drivers and format the data server side. Why parse some obscure 
Postgres-specific binary data types when you can have the database send you the 
data in the serialization format of your client language? Or JSON or protobuf 
or whatever you want. What if your application has data patterns that would 
benefit from being sent over the wire in some specific columnar format? 
Wouldn’t it be cool if you could add that to the server and have all clients 
just work, without being locked in into a language because of its driver?

My point is that you go in steps. Put the foot in the door first, enable 
experimentation and then you’ll get to where you want to be.


> 
> Vladimir 



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 16:56, Tom Lane wrote:
> 
> Damir Simunic writes:
>>> On 26 Mar 2018, at 11:06, Vladimir Sitnikov wrote:
>>>> If anyone finds the idea of Postgres speaking http2 appealing
> 
> TBH, this sounds like a proposal to expend a whole lot of work (much of it
> outside the core server, and thus not under our control) in order to get
> from a state of affairs where there are things we'd like to do but can't
> because of protocol compatibility worries, to a different state of affairs
> where there are things we'd like to do but can't because of protocol
> compatibility worries.  

What do you mean by compatibility worries? Is it backward compatibility?

If so, I’m not suggesting we get rid of FEBE, but leave it as is and complement 
it with a widely understood and supported protocol, that in fact takes 
compatibility way more seriously than FEBE. Just leave v3 frozen. Seems like 
ultimate backward compatibility, no? Or am I missing something?

You likely know every possible use case for Postgres, which makes you believe 
that the status quo is the right way. Or maybe I didn’t flesh out my proposal 
enough for you to give it a chance. Either way, I just can’t figure out where 
would HTTP2 be the same as status quo or a step backward compared to FEBE. I 
can see you’re super-busy and dedicated, but if you can find the time to 
enlighten me beyond just waving the “compatibility” and “engineering” banners, 
I’d appreciate you endlessly.

> Why would forcing our data into a protocol
> designed for a completely different purpose, and which we have no control
> over, be a step forward?  

What purpose do you see HTTP2 being designed for that is completely different 
from FEBE? Not being cynical, genuinely want to learn. (Oh, it’s my data, too; 
presently held hostage to the v3 protocol).

You mention twice loss of control--what exactly is the fear? 

> How would that address the fundamental issue of
> inertia in multiple chunks of software (ie, client libraries and
> applications as well as the server)?
> 

Is this inertia as in "our TODO list is years old and nobody’s doing anything 
about it"? If so, I posit here that using HTTP2 as the v4 protocol will lead to 
a significant reduction of inertia, simply because we’re talking HTTP2 and 
not some new obscure thing we invented.

The psychological and social aspects are not to be underestimated. 

>> This proposal takes the stance that having HTTP2 wire protocol in place will 
>> enable wide experimentation  with and implementation of many new features 
>> and content types, but is not concerned with the specifics of those.
> 
> That reads to me as pie in the sky, and uninformed by any engineering
> reality.  As an example, it's not the protocol's fault that database
> server processes are expensive to spin up; changing to a different
> protocol will do nothing to make them more lightweight.  We've thought
> about various ways to amortize that cost, but they tend to fall foul of
> the fact that sessions are associated with TCP connections, which we can't
> transparently remake or reattach to a different endpoint process.  HTTP2
> is not going to fix that, because it's still TCP based.  

That reads to me as uninformed engineering reality. Just because you are 
encumbered with the worries of compatibility and stuck in the world of TCP, 
doesn’t mean it can’t be done. 

You know what? HTTP2 just might fix it. Getting a new protocol into the core 
will force enough adjustments to the code to open the door for the next 
protocol on the horizon: QUIC, which happens to be UDP based, and might just be 
the ticket. At a minimum it will get significantly more people thinking about 
the possibility of reattaching sessions and doing all kinds of other things. 
Allowing multiple protocols is not very different from allowing a multitude of 
pl implementations.

Help me put HTTP2 in place, and I’ll bet you, within a few months someone will 
come up with a patch for QUIC. And then someone else will remember your 
paragraph above and say “hmm, let’s see…"

> I realize that
> webservers manage to have pretty lightweight sessions, but that's not a
> property of the protocol they use, it's a property of their internal
> architectures.  We can't get there without a massive rewrite of the PG
> server --- one that would be largely independent of any particular way of
> representing data on the wire, anyway.
> 

A smart outsider might come along, look at an ultra-fast web server, then look 
at Postgres and think, “Hmm, both speak HTTP2, but one is blazing fast, the 
other slow. Can I learn anything from the former to apply to the latter? Maybe 
I'll add another type of a backend that serves only a very very narrow use 
case, but makes it blazing fast?” Pie in the sky? Maybe. But isn’t it how it 
works today: lots of smart people chipping away in small increments?

Let’s not underestimate the effect of possibilities on mobilizing minds. 
Innovation is fueled by t

Re: Proposal: http2 wire format

2018-03-26 Thread Alvaro Hernandez



On 26/03/18 13:11, Damir Simunic wrote:
On 26 Mar 2018, at 11:13, Vladimir Sitnikov wrote:


Damir> * What are the criteria for getting this into the core?
Craig>Mine would be:

+1

There's a relevant list as well: 
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md 





This is a great addition to the list, thanks!

Damir



    Hi Damir.

    I'm interested in the idea. However, way before writing a PoC, 
IMVHO I'd rather write a detailed document including:


- A brief summary of the main features of HTTP2 and why it might be a 
good fit for PG (of course there's a lot of doc in the wild about 
HTTP/2, so just a summary of the main relevant features and an analysis 
of how it may fit Postgres).


- A more or less thorough description of how every feature in current 
PostgreSQL protocol would be implemented on HTTP/2.


- Similar to the above, but applied to the v4 TODO feature list.

- A section on connection poolers and auth, as these are very 
important topics.



    Hope this helps,

    Álvaro

--

Alvaro Hernandez


---
OnGres



Re: Proposal: http2 wire format

2018-03-26 Thread Vladimir Sitnikov
Tom>But starting from the assumption that HTTP2 solves our problems seems
to me to be "Here's a hammer.

Agree.

Just a side note: if v4 is ever invented, I hope client language support
is considered.
It takes resources to implement message framing and data parsing (e.g.
int, timestamp, struct, array, ...) for each language independently.

Vladimir


Re: Proposal: http2 wire format

2018-03-26 Thread Vladimir Sitnikov
Damir>Postgres doesn’t know about grpc, s

I'm afraid you are missing the point.
I would say PostgreSQL doesn't know about HTTP/2.
It is the same as "PostgreSQL doesn't know about grpc".

Here's a quote from your pg_h2 repo:
>What we need is to really build a request object and correctly extract
> the full payload and parameters from the request. For example,
>maybe we want to implement a QUERY method, similar to POST or PUT,
> and pass the query text as the body of the request, with parameters
> in the query string or in the headers

It basically suggests implementing your own framing on top of HTTP/2.

When I say GRPC, I mean "implement PostgreSQL-specific protocol via GRPC
messages".

Let's take current message formats:
https://www.postgresql.org/docs/current/static/protocol-message-formats.html
If one defines those message formats via GRPC, then GRPC would autogenerate
parsers and serializers for lots of languages "for free".

For instance
Query (F)
 Byte1('Q') Identifies the message as a simple query.
 Int32 Length of message contents in bytes, including self.
 String The query string itself.

can be defined via GRPC as
message Query {
  string queryText = 1;
}

This is trivial to read, trivial to write, trivial to maintain, and it
automatically generates parsers/generators for lots of languages.
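
For comparison, this is roughly what hand-written framing for that single Query message looks like in one language. It is a minimal Python sketch based only on the three fields quoted above; the function names `encode_query` and `decode_query` are made up for illustration, and UTF-8 is an assumed client encoding.

```python
import struct


def encode_query(sql: str) -> bytes:
    """Build a v3 simple Query message: Byte1('Q'), Int32 length, String."""
    body = sql.encode("utf-8") + b"\x00"  # the String is null-terminated
    # The length counts itself (4 bytes) plus the body, but not the 'Q' tag.
    return b"Q" + struct.pack("!i", 4 + len(body)) + body


def decode_query(message: bytes) -> str:
    """Parse a message produced by encode_query and return the SQL text."""
    if message[:1] != b"Q":
        raise ValueError("not a Query message")
    (length,) = struct.unpack("!i", message[1:5])
    body = message[5:1 + length]
    if len(body) != length - 4 or not body.endswith(b"\x00"):
        raise ValueError("truncated or malformed Query message")
    return body[:-1].decode("utf-8")
```

Every driver repeats code of this kind for each message type, which is the cost a generated parser would remove.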


Parsing of the current v3 protocol has to be reimplemented for each and
every language, and it would be a pain to implement parsing for v4.
Are you going to create "http/2" clients for Java, C#, Ruby, Swift, Dart,
etc, etc?

I am not saying that a mere redefinition of v3 messages as GRPC would do
the trick. I am saying that you'd better consider frameworks that would
enable transparent implementation of client libraries.

Damir>and will talk to any HTTP2 conforming client

I do not see where you are heading.
Is "curl as PostgreSQL client" one of the key objectives for you?
True clients (the ones that are used by the majority of applications)
should support things like "prepared statements", "data types", "cursors"
(resultset streaming), etc. I can hardly imagine a case where one would use
"curl" and work with prepared statements.
I think psql is a pretty good client, so I see no point in implementing
HTTP/2 merely for the sake of using curl to fetch data from the DB.

Vladimir


Re: Proposal: http2 wire format

2018-03-26 Thread Tom Lane
Damir Simunic writes:
>> On 26 Mar 2018, at 11:06, Vladimir Sitnikov wrote:
>>> If anyone finds the idea of Postgres speaking http2 appealing

TBH, this sounds like a proposal to expend a whole lot of work (much of it
outside the core server, and thus not under our control) in order to get
from a state of affairs where there are things we'd like to do but can't
because of protocol compatibility worries, to a different state of affairs
where there are things we'd like to do but can't because of protocol
compatibility worries.  Why would forcing our data into a protocol
designed for a completely different purpose, and which we have no control
over, be a step forward?  How would that address the fundamental issue of
inertia in multiple chunks of software (ie, client libraries and
applications as well as the server)?

> This proposal takes the stance that having HTTP2 wire protocol in place will 
> enable wide experimentation  with and implementation of many new features and 
> content types, but is not concerned with the specifics of those.

That reads to me as pie in the sky, and uninformed by any engineering
reality.  As an example, it's not the protocol's fault that database
server processes are expensive to spin up; changing to a different
protocol will do nothing to make them more lightweight.  We've thought
about various ways to amortize that cost, but they tend to fall foul of
the fact that sessions are associated with TCP connections, which we can't
transparently remake or reattach to a different endpoint process.  HTTP2
is not going to fix that, because it's still TCP based.  I realize that
webservers manage to have pretty lightweight sessions, but that's not a
property of the protocol they use, it's a property of their internal
architectures.  We can't get there without a massive rewrite of the PG
server --- one that would be largely independent of any particular way of
representing data on the wire, anyway.

We've certainly got issues that can't be solved without protocol changes.
But starting from the assumption that HTTP2 solves our problems seems to
me to be "Here's a hammer.  I'm sure your problem must be a nail, because
all problems are nails".

regards, tom lane



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
> On 26 Mar 2018, at 11:06, Vladimir Sitnikov wrote:
> 
> Hi,
> 
> >If anyone finds the idea of Postgres speaking http2 appealing
> 
> HTTP/2 sounds interesting.
> What do you think of https://grpc.io/ ?
> 
> Have you evaluated it?
> It does sound like a ready RPC on top of HTTP/2 with support for lots of 
> languages.
> 
> The idea of reimplementing the protocol for multiple languages from scratch 
> does not sound too appealing.

This proposal takes the stance that having HTTP2 wire protocol in place will 
enable wide experimentation  with and implementation of many new features and 
content types, but is not concerned with the specifics of those.

---
Let me illustrate with an example how it would look if we already had HTTP2 as 
proposed.

Let’s say you have a building automation device on your network that happens to 
speak grpc, and you decided to use Postgres to store published topics in the 
database. 

Your grpc-speaking device might connect to Postgres and issue a request like 
this:

HEADERS (flags = END_HEADERS)
:method = POST
:scheme = http
:path = /CreateTopic
pg-database = Publisher
content-type = application/grpc+proto
grpc-encoding = gzip
authorization = Bearer y235.wef315yfh138vh31hv93hv8h3v

DATA (flags = END_STREAM)
<Length-Prefixed Message>

(This is from grpc.io homepage; uppercase HEADERS and DATA are frame names from 
the HTTP2 specification).

Postgres would take care of TLS negotiation, unpack the frames, decompress the 
headers (:method, :path, etc are transferred compressed with a lookup table) 
and copy the payload into memory and make it  all available to the backend. If 
this was the first request, it would start the backend for you as well.
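
To make the "lookup table" remark concrete, here is a small sketch of how 
header compression (HPACK) shrinks common headers. Only a handful of entries 
from the specification's static table are reproduced, and the function name is 
an assumption for illustration. A header that exactly matches a table entry is 
sent as a single byte: the high bit set, plus the table index.

```python
# A few entries from the HPACK static table (index values per the spec).
STATIC_TABLE = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":scheme", "http"): 6,
    (":scheme", "https"): 7,
}


def encode_indexed(name, value):
    """Return the one-byte indexed representation, or None if the
    (name, value) pair is not in the table and needs a literal encoding."""
    index = STATIC_TABLE.get((name, value))
    if index is None:
        return None
    return bytes([0x80 | index])
```

In the example request, `:method = POST` and `:scheme = http` each fit in one 
byte, while `:path = /CreateTopic` has no exact match and would need a literal 
encoding the first time it is sent.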

Postgres doesn’t know about grpc, so it would just conveniently return “406 Not 
Acceptable” to your client and close the stream (but not the connection). Still 
connected and authenticated, the device could retry the request with 
`content-type: application/json`, and if you somehow programmed a function that 
accepts json, the request would go through. (Let’s imagine we have some kind of 
mechanism to associate functions to requests and content types, maybe through 
some function attributes in the catalog). 
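
A rough sketch of that imagined mechanism, in Python for brevity. Everything 
here is hypothetical: the registry, the decorator, and the handler are 
stand-ins for whatever catalog attributes would eventually associate functions 
with paths and content types. It only shows the negotiation outcome described 
above: an unknown content type gets a 406, a registered one goes through.

```python
import json

# Hypothetical registry: (path, content_type) -> handler function.
HANDLERS = {}


def register(path, content_type):
    """Associate a handler with a request path and content type."""
    def decorator(fn):
        HANDLERS[(path, content_type)] = fn
        return fn
    return decorator


@register("/CreateTopic", "application/json")
def create_topic_json(body):
    topic = json.loads(body)
    # A real handler would call into SPI here; this one just echoes.
    return 200, json.dumps({"created": topic["name"]}).encode()


def dispatch(path, content_type, body):
    """Return (status, response_body) for one request on a stream."""
    handler = HANDLERS.get((path, content_type))
    if handler is None:
        return 406, b""  # close the stream, keep the connection
    return handler(body)
```

A plugin that understands grpc would, in this picture, simply register a 
handler for `application/grpc+proto` on the same path.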

Say that someone else took the time and programmed a plugin that knows how to 
talk grpc. Then the server would call that plugin for you, validate and insert 
the data in the right table, and return 200 OK or 204 or whatever is 
appropriate to return according to grpc protocol semantics. 

Obviously, someone has to implement a bunch of new code on the server side to 
ungzip, to interpret the content of the protobuf message and take action. But 
that someone doesn’t need to think of getting to all the metadata like 
compression type, payload format etc. Just somehow plug into the server at the 
right level, read the data and metadata from memory, and then call into SPI to 
do its thing. Similar to how application servers work today. (Or Postgres for 
that matter, except that it speaks FEBE and there’s no content type 
negotiation).

The same goes for the ‘authorization’ header. Postgres does not support Bearer 
token authorization today. But maybe you’ll be able to define a function that 
knows how to deal with the token, and somehow signal to Postgres that you want 
it to call this function when it sees such a header. Or maybe someone wrote a 
plugin that does that, and you configure your server to use it. 

Then when connecting to Postgres with the above request, it would start the 
backend and call the function/plugin for you to decide whether to authorize the 
request. (As a side note, subsequent requests within the same connection would 
have this header compressed on the wire; that’s also a HTTP2 feature).

---

That’s just one possible scenario. In this specific 
scenario, the benefit is that Postgres will give you content negotiation built 
in, and will talk to any HTTP2 conforming client. Like you said, you don’t want 
to reimplement the protocol over and over.

But whether that content is grpc or something else, that's for a future 
discussion. 

Current focus is really on getting the framing and extensibility in the core. 
Admittedly, I haven’t yet figured out how to code all the details, but I’m more 
and more clear on how this will work architecturally. Now it’s about putting lots 
of elbow grease into understanding the source, coding in C, and addressing all 
the issues to make sure the new protocol supports 100% of existing v3 
use cases. 

Beyond v3 use cases, top of mind are improvements like those you describe under 
the topic “Binary transfer” in your “v4 wanted features” doc (and most of the 
other stuff you mention).


Damir


> 
> Vladimir




Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
> On 26 Mar 2018, at 11:13, Vladimir Sitnikov wrote:
> 
> Damir> * What are the criteria for getting this into the core?
> Craig>Mine would be: 
> 
> +1
> 
> There's a relevant list as well: 
> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md
>  
> 
>  
> 

This is a great addition to the list, thanks!

Damir



Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
> On 26 Mar 2018, at 12:47, Craig Ringer  wrote:
> 
> On 26 March 2018 at 17:34, Damir Simunic  wrote:
>  
> 
> > As you move forward with the PoC, consider: even if you decide not to
> > become protocol-layer experts, you'll still need to become familiar
> > with application-layer security in HTTP.
> 
> Good point. Application layer security is indeed a concern.
> 
> h2 has provisions for security by design, and a significant amount of 
> research going into this on a large scale. Adopting h2 instead of inventing 
> our own v4 gets us all this research for free.
> 
> HTTP2, please, not "h2".
> 
> It looks like HTTP2 does use the term "h2" to mean "http2 over TLS", to 
> differentiate it from "h2c" which is HTTP2-over-cleartext.
> 
> IMO, you'd have to support both. Mandating TLS is going to be a non-starter 
> for sites that use loopback connections or virtual switches on VMs, VLAN 
> isolation, or other features to render traffic largely unsniffable. They 
> won't want to pay the price for crypto on all traffic. So this needs to be 
> "HTTP2 support" not "HTTP2/TLS (h2) support" anyway.

Makes sense; I’ll update all wording and function names, etc. No difference to 
the substance of this proposal. The same code path handles both h2 and h2c. TLS 
is optional, a matter of detecting the first byte of the request and taking the 
appropriate action. 

I think we can reliably and efficiently detect h2, h2c, and FEBE requests. Of 
course, the behavior needs to be configurable: which protocols to enable, and 
how to resolve the negotiation. In my mind this is self-evident.
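
As a sketch of what that detection could look like, assuming the server can 
peek at the first bytes of a new connection. The constants are the published 
ones (TLS handshake record type 0x16, the HTTP2 cleartext connection preface, 
FEBE protocol version 3.0 = 196608, SSLRequest code 80877103); the function and 
its return labels are made up for illustration. Note that h2 proper is only 
known after the TLS handshake selects it via ALPN, so the first byte can only 
say "TLS".

```python
import struct

H2C_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"  # cleartext, prior knowledge
FEBE_SSL_REQUEST = 80877103
FEBE_MAJOR_V3 = 3


def classify(first_bytes):
    """Guess the protocol from the first bytes a client sends."""
    if first_bytes[:1] == b"\x16":
        return "tls"  # h2 would then be negotiated through ALPN
    if first_bytes.startswith(H2C_PREFACE):
        return "h2c"
    if len(first_bytes) >= 8:
        # FEBE startup: int32 length, then int32 request/version code.
        _length, code = struct.unpack("!II", first_bytes[:8])
        if code == FEBE_SSL_REQUEST:
            return "febe-sslrequest"
        if code >> 16 == FEBE_MAJOR_V3:
            return "febe-v3"
    return "unknown"
```

The three cases do not collide on the first byte: a TLS record starts with 
0x16, the h2c preface with "P", and a FEBE startup packet with the high byte 
of its length, which is zero for any sane packet.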

> 
> Re Pg and security: By and large we don't invent our own security protocols. 
> We've adopted standard mechanisms like GSSAPI and SCRAM, and vendor ones like 
> SSPI. Some of the details of how they're implemented in the protocol are of 
> course protocol specific (and thus, opportunities for bugs/design mistakes), 
> of course.
> 
> But you will get _nowhere_ in making this a new default protocol if you just 
> try to treat those as outdated and uninteresting.
> 

Agreed: a new default protocol must cover 100% of existing use cases, _and_ 
add more compelling capabilities on top.

If anything I wrote made it appear contrary to that goal, it is purely because 
of my current focus on getting to a PoC. 

> In fact, part of extensibility considerations should be extensible 
> authentication.
> 
> Authentication and authorization (which any new protocol really should 
> separate) are crucial features, and there's no one-size-fits-all answer.
> 

I think that HTTP2 gets us much closer to that goal. My vision is to enable 
application-developer-defined authentication and/or authorization as well. This 
is something to research once the framing is in place.

> If you just assume, say, that everything happens over TLS with password auth 
> or x.509 client certs, you'll create a giant mess for all the sites that use 
> Kerberos or SSPI.
> 

100% agreed on everything you say, and thanks for taking the time to write this 
up. 

> 
> -- 
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services




Re: Proposal: http2 wire format

2018-03-26 Thread Craig Ringer
On 26 March 2018 at 17:34, Damir Simunic wrote:


>
> > As you move forward with the PoC, consider: even if you decide not to
> > become protocol-layer experts, you'll still need to become familiar
> > with application-layer security in HTTP.
>
> Good point. Application layer security is indeed a concern.
>
> h2 has provisions for security by design, and a significant amount of
> research going into this on a large scale. Adopting h2 instead of inventing
> our own v4 gets us all this research for free.
>

HTTP2, please, not "h2".

It looks like HTTP2 does use the term "h2" to mean "http2 over TLS", to
differentiate it from "h2c" which is HTTP2-over-cleartext.

IMO, you'd have to support both. Mandating TLS is going to be a non-starter
for sites that use loopback connections or virtual switches on VMs, VLAN
isolation, or other features to render traffic largely unsniffable. They
won't want to pay the price for crypto on all traffic. So this needs to be
"HTTP2 support" not "HTTP2/TLS (h2) support" anyway.

Re Pg and security: By and large we don't invent our own security
protocols. We've adopted standard mechanisms like GSSAPI and SCRAM, and
vendor ones like SSPI. Some of the details of how they're implemented in
the protocol are of course protocol specific (and thus, opportunities for
bugs/design mistakes), of course.

But you will get _nowhere_ in making this a new default protocol if you
just try to treat those as outdated and uninteresting.

In fact, part of extensibility considerations should be extensible
authentication.

Authentication and authorization (which any new protocol really should
separate) are crucial features, and there's no one-size-fits-all answer.

If you just assume, say, that everything happens over TLS with password
auth or x.509 client certs, you'll create a giant mess for all the sites
that use Kerberos or SSPI.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic

> On 26 Mar 2018, at 11:34, Craig Ringer  wrote:
> 
> On 26 March 2018 at 17:01, Damir Simunic wrote:
>  
> 
> > - Doesn't break new clients connecting to old servers
> >
> 
> Old server sends “Invalid startup packet” and closes the connection; client’s 
> TLS layer reports an error. Does that count as not breaking new clients?
> 
> 
> libpq would have to do something like it does now for ssl connections, 
> falling back to non-ssl, and offering a connection option to make it try the 
> v3 protocol immediately without bothering with v4.
>  
> > - No extra round trips for new client -> old server . I don't personally 
> > care about old client -> new server so much, but should be able to offer a 
> > pg_hba.conf option to ensure v3 proto only or otherwise prevent extra round 
> > trips in this case too.
> 
> Can we talk about this more, please?
> 
> As above. A newer libpq should not perform worse on an existing server than 
> an older libpq.

Wouldn’t newer libpq continue to support v3 as long as supported servers do? 
I’m confused by the “no extra round trips” part and the “pg_hba.conf option”. If 
I know I’m talking to the old server, I’ll just configure the client to talk 
febe v3 and not worry.

Anyway, I’ll document all the combinations to make it easier to discuss.

>  
>  
> 
> Check.
> 
> Extensibility is the essence of h2, we’re getting this for free.
> 
> 
> Please elaborate somewhat for people not already strongly familiar with HTTP2.
> 
> BTW, please stop saying "h2" when you mean HTTP2. It's really confusing, 
> because I keep thinking you are talking about H2, the database engine 
> (http://www.h2database.com/), which has 
> PostgreSQL protocol and syntax compatibility as well as its own wire protocol.

Haha, I didn’t know that! “h2” is the protocol identifier in the ALPN; in my mind, 
http2 has more of the web and http1 baggage that I’m trying to avoid here. But 
let’s stick to http2 and define it better.

>  
> > - Has a wireshark dissector
> 
> Check.
> 
> ... including understanding of the PostgreSQL bits that are payload within 
> the protocol.
> 
> Look at what the current dissector does - capture some packets.
>  
> 
> >
> > - Is practical to implement in connection pooler proxies like pgbouncer, 
> > pgpool
> 
> Something I’m planning to look into and address.
> 
> New connection poolers might become feasible, too: nginx, nghttpx, etc. (for 
> non-web related scenarios as well). Opting into h2 lets us benefit from a 
> much larger amount of time and resources being spent on improving things that 
> matter. Reverse proxies face the same architectural challenges as pg-only 
> connection poolers do.
> 
> 
> ... which is nice, but doesn't change the fact that a protocol revision that 
> completely and unfixably breaks existing tools much of the community relies 
> on won't go far.
>  
> > - Any libraries used are widespread enough that they're present in at least 
> > RHEL7 and Debian Stable. We *can't* just bundle extras in our sources, and 
> > packagers are unlikely to be at all happy packaging an extra lib or 
> > backport for us. They'll probably just disable the new protocol.
> 
> Check.
> 
> Let me see if I can make a table showing parallel availability of Postgres 
> and libnghttp versions on mainstream platforms. If there are any gaps, I’m 
> sure it is possible to lobby for inclusion of libnghttp where it matters. I 
> see Debian has it for wheezy, jessie, and sid, while pg10 is on sid and 
> buster.
> 
> 
> Good plan. But be clear that this is super experimental.
>  
> >
> > - No regressions for support of SASL / SCRAM, GSSAPI, TLS with X.509 client 
> > certs, various other auth methods.
> >
> 
> Check.
> 
> Adding new auth method keyword (“h2”) in pg_hba will give us a clean code 
> path to work with.
> 
> I think you missed the point there entirely.
> 
> HTTP2 isn't an authentication method. It's a wire protocol. It will be 
> necessary to support authentication methods including, but not limited to, 
> GSSAPI, SSPI (windows), SCRAM, etc *on any new protocol*.
> 
> If you propose a new protocol, to replace the v3 protocol, and it doesn't 
> support SSPI or SCRAM I rate your chances as about zero of getting serious 
> interest. You'll be back in extension-for-webdevs town.
>  

Great points. I need to be more clear on that. My main concern was how to 
bypass the v3 auth negotiation that is closely linked to existing methods. From 
PoC perspective, I didn’t want to touch that and was focusing on the fact that 
more can be done wrt authentication in the initial request packet. 

Let me spend some time on this and come up with a good way to cover everything.


> 
> > Now, a protocol that cannot satisfy these is IMO not a complete 
> > non-starter. It just has to be treated as an optional feature to help out 
> > webapps, with quite different design criteria as a result, and cannot be 
> > allowed to be as intrusive.

Re: Proposal: http2 wire format

2018-03-26 Thread Craig Ringer
On 26 March 2018 at 17:01, Damir Simunic wrote:


>
> > - Doesn't break new clients connecting to old servers
> >
>
> Old server sends “Invalid startup packet” and closes the connection;
> client’s TLS layer reports an error. Does that count as not breaking new
> clients?
>
>
libpq would have to do something like it does now for ssl connections,
falling back to non-ssl, and offering a connection option to make it try
the v3 protocol immediately without bothering with v4.
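
A sketch of the fallback behaviour described here, written in Python rather 
than libpq's C for brevity. The `protocol` option, the exception type, and the 
two connect callables are all hypothetical; nothing like this exists in libpq 
today. It is modelled loosely on how sslmode lets a client try SSL first and 
fall back.

```python
class ProtocolRejected(Exception):
    """Raised when the server refuses the new protocol, e.g. an old
    server answering with "Invalid startup packet"."""


def connect_with_fallback(connect_v4, connect_v3, protocol="auto"):
    """Try the new protocol first unless the user pinned one.

    protocol="v3"   go straight to v3, no wasted attempt
    protocol="v4"   new protocol only, fail if rejected
    protocol="auto" try the new protocol, fall back to v3
    """
    if protocol == "v3":
        return connect_v3()
    try:
        return connect_v4()
    except ProtocolRejected:
        if protocol == "v4":
            raise
        return connect_v3()
```

In "auto" mode against an old server the failed first attempt costs an extra 
connection setup, which is exactly the extra round trip the criteria list is 
worried about; the "v3" setting is the escape hatch.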


> > - No extra round trips for new client -> old server . I don't personally
> care about old client -> new server so much, but should be able to offer a
> pg_hba.conf option to ensure v3 proto only or otherwise prevent extra round
> trips in this case too.
>
> Can we talk about this more, please?
>

As above. A newer libpq should not perform worse on an existing server than
an older libpq.



>
> Check.
>
> Extensibility is the essence of h2, we’re getting this for free.
>
>
Please elaborate somewhat for people not already strongly familiar with
HTTP2.

BTW, please stop saying "h2" when you mean HTTP2. It's really confusing,
because I keep thinking you are talking about H2, the database engine (
http://www.h2database.com/), which has PostgreSQL protocol and syntax
compatibility as well as its own wire protocol.


> > - Has a wireshark dissector
>
> Check.
>

... including understanding of the PostgreSQL bits that are payload within
the protocol.

Look at what the current dissector does - capture some packets.


>
> >
> > - Is practical to implement in connection pooler proxies like pgbouncer,
> pgpool
>
> Something I’m planning to look into and address.
>
> New connection poolers might become feasible, too: nginx, nghttpx, etc.
> (for non-web related scenarios as well). Opting into h2 lets us benefit
> from a much larger amount of time and resources being spent on improving
> things that matter. Reverse proxies face the same architectural challenges
> as pg-only connection poolers do.
>
>
... which is nice, but doesn't change the fact that a protocol revision
that completely and unfixably breaks existing tools much of the community
relies on won't go far.


> > - Any libraries used are widespread enough that they're present in at
> least RHEL7 and Debian Stable. We *can't* just bundle extras in our
> sources, and packagers are unlikely to be at all happy packaging an extra
> lib or backport for us. They'll probably just disable the new protocol.
>
> Check.
>
> Let me see if I can make a table showing parallel availability of Postgres
> and libnghttp versions on mainstream platforms. If there are any gaps, I’m
> sure it is possible to lobby for inclusion of libnghttp where it matters. I
> see Debian has it for wheezy, jessie, and sid, while pg10 is on sid and
> buster.
>
>
Good plan. But be clear that this is super experimental.


> >
> > - No regressions for support of SASL / SCRAM, GSSAPI, TLS with X.509
> client certs, various other auth methods.
> >
>
> Check.
>
> Adding new auth method keyword (“h2”) in pg_hba will give us a clean code
> path to work with.
>

I think you missed the point there entirely.

HTTP2 isn't an authentication method. It's a wire protocol. It will be
necessary to support authentication methods including, but not limited to,
GSSAPI, SSPI (windows), SCRAM, etc *on any new protocol*.

If you propose a new protocol, to replace the v3 protocol, and it doesn't
support SSPI or SCRAM I rate your chances as about zero of getting serious
interest. You'll be back in extension-for-webdevs town.


>
> > Now, a protocol that cannot satisfy these is IMO not a complete
> non-starter. It just has to be treated as an optional feature to help out
> webapps, with quite different design criteria as a result, and cannot be
> allowed to be as intrusive. Where changes to core protocol logic paths are
> required it'd have to add plugin mechanisms/hooks instead of adding its own
> new logic directly.
>
> While web-related scenarios are the first thing that comes to mind when
> talking about h2, (and that should not be disregarded), this proposal looks
> at the bigger picture of future-proofing the protocol.
> Headers/data/trailers split, and feature/content negotiation are far
> bigger benefits than being web friendly.
>

You mentioned something about bundling queries in the startup packet.
That's cool if your queries don't need to adapt to server version etc,
which will often be the case. But doesn't that imply rather high backend
startup/shutdown costs?

There's a reason everyone with high rates of small simple queries uses
poolers right now.

Such a protocol would help poolers a lot, but not gain a great deal for the
core server without some kind of backend pooling, which is a huge separate
topic.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
Hi,

> On 26 Mar 2018, at 06:47, Jacob Champion  wrote:
> 
> On Sun, Mar 25, 2018 at 8:11 PM, Craig Ringer  wrote:
>> As others have noted, you'll want to find a way to handle this in the least
>> SSL-implementation-specific manner possible. IMO if it can't work with
>> OpenSSL, Windows's SSL implementation and OS X's SSL framework it's a
>> non-starter.
> 
> +1.
> 
>> While I'm a big fan of code reuse and using existing libraries, I understand
>> others' hesitance here. Look at what happened with ossp-uuid; that was
>> painful and it was just a contrib.
>> 
>> It's a difficult balance between NIH and maintaining a stable core.
> 
> For whatever it's worth, I think libnghttp2 is an excellent choice for
> an HTTP/2 implementation, even when taking into account the risks of
> NIH. It's a well-designed library with mature clients (Curl and Apache
> HTTP Server, among others), and it's authored by an HTTP/2 expert. (If
> you're seriously considering HTTP/2, then you seriously need to avoid
> not-invented-here syndrome. Don't roll your own unless you're
> interested in becoming HTTP/2 protocol-layer security experts in
> addition to SQL security experts.)
> 
Agreed.

> As you move forward with the PoC, consider: even if you decide not to
> become protocol-layer experts, you'll still need to become familiar
> with application-layer security in HTTP.

Good point. Application layer security is indeed a concern. 

h2 has provisions for security by design, and a significant amount of research 
going into this on a large scale. Adopting h2 instead of inventing our own v4 
gets us all this research for free.


> You'll need to decide whether
> the HTTP browser/server security model -- which is notoriously
> unintuitive for many -- works well for Postgres. In particular, you'll
> want to make sure that the new protocol doesn't put your browser-based
> users in danger (I'm thinking primarily about cross-site request
> forgeries here). Always remember that one of a web browser's core use
> cases is the execution of untrusted code…

Mentioning h2 does bring browsers to mind, but this proposal is not concerned 
with that. (Quick curl sketches are shown only because curl is an already 
available h2 client.) Present web-facing designs already deal with browsers and 
API clients; there will be no change to that. Existing Postgres deployment and 
security practices must remain unchanged whether we use v3 or h2. I don’t think 
anyone would want to expose Postgres to the open web without a connection 
pooler in front of it.

When you say “browser/server model,” presumably you have http1 in mind. h2 
does not have much in common with http1 on the wire. In fact, h2 is 
architecturally closer to febe than http1. Both h2 and febe deal with multiple 
request/response pairs over a single connection. Server initiated requests are 
covered through push_promise frames, and logical replication (being more of a 
subscription thing in my mind) is covered through stream multiplexing.

Let's keep the discussion focused on the wire protocol: the sooner we can get 
to stable h2 framing in the core, the sooner we’ll be able to experiment with 
new use cases and possibilities. Only then will it make sense to bring back 
this discussion about browsers, content negotiation, etc.


Thanks,
Damir



> --Jacob




Re: Proposal: http2 wire format

2018-03-26 Thread Vladimir Sitnikov
Damir> * What are the criteria for getting this into the core?
Craig>Mine would be:

+1

There's a relevant list as well:
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md


Vladimir


Re: Proposal: http2 wire format

2018-03-26 Thread Vladimir Sitnikov
Hi,

>If anyone finds the idea of Postgres speaking http2 appealing

HTTP/2 sounds interesting.
What do you think of https://grpc.io/ ?

Have you evaluated it?
It does sound like a ready RPC on top of HTTP/2 with support for lots of
languages.

The idea of reimplementing the protocol for multiple languages from scratch
does not sound too appealing.

Vladimir


Re: Proposal: http2 wire format

2018-03-26 Thread Damir Simunic
Hi,

> On 26 Mar 2018, at 05:11, Craig Ringer  wrote:
> 
> On 26 March 2018 at 06:00, Damir Simunic  wrote:
>  
> > - Overhead for all clients. It may be tiny, but it needs to be
> >  measured and that cost needs to be weighed against the benefits.
> >  Maybe a cache miss in the context of a network connection is
> >  negligible, but we do need to know.
> 
> Important point. If h2 is to be seriously considered, then it must be an 
> improvement in absolutely every aspect.
> 
> The core part of this proposal is that h2 is parallel to v3. Something one 
> can opt into by compiling `--with_http2`.
> 
> IMO, a new protocol intended to supersede an old one must be a core, 
> non-optional feature. It won't reach critical mass of adoption if people 
> can't reasonably rely on it being there. There'll still be a multi-year lead 
> time as versions that support it become widespread enough to interest 
> non-libpq-based driver authors.

Agreed, it should be in core.

>  
> My PoC strategy is to touch existing code as little as possible. Yet if the 
> ProcessStartupPacket can somehow return the consumed bytes back to the TLS 
> lib for negotiation, then there’s zero cost to protocol detection for v2/v3 
> clients and only h2 clients pay the price of the extra check.
> 
> As others have noted, you'll want to find a way to handle this in the least 
> SSL-implementation-specific manner possible. IMO if it can't work with 
> OpenSSL, Windows's SSL implementation and OS X's SSL framework it's a 
> non-starter.

Understood.

Everyone that matters supports ALPN: 
https://en.wikipedia.org/wiki/Application-Layer_Protocol_Negotiation#Support

From the PoC standpoint, it’s now a straightforward chore to make sure it is 
supported for all possible build choices.

> 
> > - Dependency on a new external library. Fortunately, it's MIT
> >  licensed, so it's PostgreSQL compatible, but what happens if it
> >  becomes unmaintained? This has happened a couple of times, and it
> >  causes overhead that needs to be taken into account.
> 
> I chose nghttp because it gave me a quick start, it’s well designed, a good 
> fit for this kind of work, and fortunately indeed, the license is compatible. 
> (Also, curl links to it as well, so am pretty confident it’ll be around). 
> Very possible that over time h2 parsing code migrates into pg codebase. There 
> are so many similarities to v3 architecture, we might find a way to 
> generalize both into a single codebase. Then h2 frame parser/state machine 
> becomes only a handful of .c files.
> 
> h2 is a standard; however you decide to parse it, your code will eventually 
> converge to a stable state in the same manner that febe v3 code did. Once we 
> master the protocol, I don’t think there’ll be much need to touch the framing 
> code. IOW even if we just import what we need, it won’t be a big issue.
> 
> While I'm a big fan of code reuse and using existing libraries, I understand 
> others' hesitance here. Look at what happened with ossp-uuid; that was 
> painful and it was just a contrib.
> 
> It's a difficult balance between NIH and maintaining a stable core.

Enough important projects depend on libnghttp that I don’t think it will go away 
any time soon. And http2 is big; as more and more tools want to talk that 
protocol they’ll turn to libnghttp, so the signs of any trouble will be 
visible very quickly.

>   
>  
> 
> * Is there merit in the idea of a completely new v4 protocol—one that freezes 
> the v3 and takes a new path?
> 
> Likely so... but it has to be pretty compelling IMO. And more importantly, 
> offer a smooth backwards- and forwards-compatible path.
>  
> 
> * What are the criteria for getting this into the core?
> 
> Mine would be: 
> 
> - No new/separate port required. Works on existing port.
> 
Check.

> - Doesn't break old clients connecting to new servers
> 
Check.

> - Doesn't break new clients connecting to old servers
> 

Old server sends “Invalid startup packet” and closes the connection; client’s 
TLS layer reports an error. Does that count as not breaking new clients? 

curl -v https://localhost:5432

...
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:5432
* stopped the pause stream!
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
localhost:5432

This applies to any TLS client (an h2-supporting libpq-fe will behave the same):
 
wget -v https://localhost:5432

Connecting to localhost|::1|:5432... connected.
Unable to establish SSL connection.


> - No extra round trips for new client -> old server . I don't personally care 
> about old client -> new server so much, but should be able to offer a 
> pg_hba.conf option to ensure v3 proto only or otherwise prevent extra round 
> trips in this case too.

Can we talk about this more, please?

> 
> - Offers significant, concrete benefits and solves the outstanding set of 
> issues with v3 comprehensively

This proposal aim

Re: Proposal: http2 wire format

2018-03-25 Thread Jacob Champion
On Sun, Mar 25, 2018 at 8:11 PM, Craig Ringer  wrote:
> As others have noted, you'll want to find a way to handle this in the least
> SSL-implementation-specific manner possible. IMO if it can't work with
> OpenSSL, Windows's SSL implementation and OS X's SSL framework it's a
> non-starter.

+1.

> While I'm a big fan of code reuse and using existing libraries, I understand
> others' hesitance here. Look at what happened with ossp-uuid; that was
> painful and it was just a contrib.
>
> It's a difficult balance between NIH and maintaining a stable core.

For whatever it's worth, I think libnghttp2 is an excellent choice for
an HTTP/2 implementation, even when taking into account the risks of
NIH. It's a well-designed library with mature clients (Curl and Apache
HTTP Server, among others), and it's authored by an HTTP/2 expert. (If
you're seriously considering HTTP/2, then you seriously need to avoid
not-invented-here syndrome. Don't roll your own unless you're
interested in becoming HTTP/2 protocol-layer security experts in
addition to SQL security experts.)

As you move forward with the PoC, consider: even if you decide not to
become protocol-layer experts, you'll still need to become familiar
with application-layer security in HTTP. You'll need to decide whether
the HTTP browser/server security model -- which is notoriously
unintuitive for many -- works well for Postgres. In particular, you'll
want to make sure that the new protocol doesn't put your browser-based
users in danger (I'm thinking primarily about cross-site request
forgeries here). Always remember that one of a web browser's core use
cases is the execution of untrusted code...

--Jacob



Re: Proposal: http2 wire format

2018-03-25 Thread Craig Ringer
On 26 March 2018 at 06:00, Damir Simunic wrote:


> > - Overhead for all clients. It may be tiny, but it needs to be
> >  measured and that cost needs to be weighed against the benefits.
> >  Maybe a cache miss in the context of a network connection is
> >  negligible, but we do need to know.
>
> Important point. If h2 is to be seriously considered, then it must be an
> improvement in absolutely every aspect.
>
> The core part of this proposal is that h2 is parallel to v3. Something one
> can opt into by compiling `--with_http2`.
>

IMO, a new protocol intended to supersede an old one must be a core,
non-optional feature. It won't reach critical mass of adoption if people
can't reasonably rely on it being there. There'll still be a multi-year
lead time as versions that support it become widespread enough to interest
non-libpq-based driver authors.


> My PoC strategy is to touch existing code as little as possible. Yet if
> the ProcessStartupPacket can somehow return the consumed bytes back to the
> TLS lib for negotiation, then there’s zero cost to protocol detection for
> v2/v3 clients and only h2 clients pay the price of the extra check.
>

As others have noted, you'll want to find a way to handle this in the least
SSL-implementation-specific manner possible. IMO if it can't work with
OpenSSL, Windows's SSL implementation and OS X's SSL framework it's a
non-starter.


>
> > - Dependency on a new external library. Fortunately, it's MIT
> >  licensed, so it's PostgreSQL compatible, but what happens if it
> >  becomes unmaintained? This has happened a couple of times, and it
> >  causes overhead that needs to be taken into account.
>
> I chose nghttp because it gave me a quick start, it’s well designed, a
> good fit for this kind of work, and fortunately indeed, the license is
> compatible. (Also, curl links to it as well, so am pretty confident it’ll
> be around). Very possible that over time h2 parsing code migrates into pg
> codebase. There are so many similarities to v3 architecture, we might find
> a way to generalize both into a single codebase. Then h2 frame parser/state
> machine becomes only a handful of .c files.
>
> h2 is a standard; however you decide to parse it, your code will
> eventually converge to a stable state in the same manner that febe v3 code
> did. Once we master the protocol, I don’t think there’ll be much need to
> touch the framing code. IOW even if we just import what we need, it won’t
> be a big issue.
>

While I'm a big fan of code reuse and using existing libraries, I
understand others' hesitance here. Look at what happened with ossp-uuid;
that was painful and it was just a contrib.

It's a difficult balance between NIH and maintaining a stable core.



>
> * Is there merit in the idea of a completely new v4 protocol—one that
> freezes the v3 and takes a new path?
>

Likely so... but it has to be pretty compelling IMO. And more importantly,
offer a smooth backwards- and forwards-compatible path.


>
> * What are the criteria for getting this into the core?
>

Mine would be:

- No new/separate port required. Works on existing port.

- Doesn't break old clients connecting to new servers

- Doesn't break new clients connecting to old servers

- No extra round trips for new client -> old server. I don't personally
care about old client -> new server so much, but should be able to offer a
pg_hba.conf option to ensure v3 proto only or otherwise prevent extra round
trips in this case too.

- Offers significant, concrete benefits and solves the outstanding set of
issues with v3 comprehensively

- Offers a really strong extensibility path for client-requested and
server-requested optional protocol features as well as protocol version
negotiation, with no extra round trips whenever possible.

- Has a wireshark dissector

- Is practical to implement in connection pooler proxies like pgbouncer,
pgpool

- Can be made wholly transparent to clients of libpq, i.e. no extra headers
or libraries to link

- Works on windows and osx too

- Any libraries used are widespread enough that they're present in at least
RHEL7 and Debian Stable. We *can't* just bundle extras in our sources, and
packagers are unlikely to be at all happy packaging an extra lib or
backport for us. They'll probably just disable the new protocol.

- No regressions for support of SASL / SCRAM, GSSAPI, TLS with X.509 client
certs, various other auth methods.

Now, a protocol that cannot satisfy these is IMO not a complete
non-starter. It just has to be treated as an optional feature to help out
webapps, with quite different design criteria as a result, and cannot be
allowed to be as intrusive. Where changes to core protocol logic paths are
required it'd have to add plugin mechanisms/hooks instead of adding its own
new logic directly.

Make sense?


>
> * Is it better to develop in an experimental fork until the architecture
> is stable and then patch onto the master, or are we supposed to keep
> proposing patches for inclusi

Re: Proposal: http2 wire format

2018-03-25 Thread Damir Simunic
> On 25 Mar 2018, at 19:42, David Fetter  wrote:
> 
> On Sat, Mar 24, 2018 at 06:52:47PM +0100, Damir Simunic wrote:
>> Hello hackers,
>> 
>> I’d like to propose the implementation of new wire protocol using http2 
>> framing. 
> 
> Welcome to the PostgreSQL community!  This is a very interesting idea.
> Please send a patch to this mailing list on this thread.
> 

Thanks David, very excited to be part of pgsql-hackers!

> In order to get and keep it on the radar, you should know about how
> development works in PostgreSQL.
> 
> http://wiki.postgresql.org/wiki/Development_information
> 
> In particular, please look at: 
> http://wiki.postgresql.org/wiki/Submitting_a_Patch
> 

To put it out front: my forte is product design, not C coding. (Also, I made a 
grammar error in the opening sentence: I’m not proposing “the implementation”, 
but “implementing h2 as new wire proto”)

I did study all of the resources you mentioned. And am voraciously reading up 
on Postgres internals, scouring its source, practicing C development, etc. 

My email is the result of the first advice under “Brand new features” in “So 
you want to be a developer?”.

> I notice that you patched 10. New features, and this is definitely
> one, go against git master.
>  

Let me figure out how to do that pronto. 10.2 tarball was easier to learn from 
as it was not a moving target. Whatever I did so far is not yet patch-worthy.

>> It appears to me that http2 solves many of the issues on the TODO
>> list under “Wire Protocol Changes / v4 Protocol,” without any
>> obvious downsides. 
> 
> Here are a few things to consider, at least from my perspective:
> 
> - Docs. Gotta have some: https://wiki.postgresql.org/wiki/Documentation_Tools

No worries about that—I love writing :)

> 
> - Testing. Gotta have some in src/test/regress in the source tree.

Before even getting to the patch stage, there will be a period of discussion 
about latency and other tradeoffs. Mandatory part of any conversation 
mentioning a wire protocol.

So the plan is to come up with a working prototype that we can plug into 
protocol testing tools and measure the heck out of it in context. Yet one more 
thing to figure out. BTW, are there any formal tests of that kind for v3 
protocol?

By that time I do hope to learn how to write code tests to put into 
src/test/regress.

> 
> - Tight coupling to OpenSSL, if that's actually what's happening.
>  We're actively trying to get away from this, so a TLS-neutral
>  implementation or at least one that's not specific to OpenSSL would
>  be good.

Didn’t know that. Will ifdef the openssl-dependent code. It’s not hard to 
implement ALPN nego to cover all viable libraries. Do you know what 
alternatives are being considered?
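
For illustration, a minimal Python sketch of what ALPN negotiation boils down to, independent of any particular TLS library. The client offers a list of length-prefixed protocol names (the wire format from RFC 7301), and the server picks the first of its own preferences that the client offered. "h2" is the registered ALPN identifier for HTTP/2 over TLS; "pgsql-v3" and all function names are made up for this example and are not part of PostgreSQL or the PoC.

```python
def encode_alpn(protocols):
    """Encode protocol names as an ALPN list: 1 length byte, then the name."""
    out = bytearray()
    for name in protocols:
        raw = name.encode("ascii")
        if not 1 <= len(raw) <= 255:
            raise ValueError("ALPN names must be 1..255 bytes long")
        out.append(len(raw))
        out += raw
    return bytes(out)


def decode_alpn(wire):
    """Decode an ALPN protocol list back into a list of names."""
    names = []
    pos = 0
    while pos < len(wire):
        n = wire[pos]
        pos += 1
        if n == 0 or pos + n > len(wire):
            raise ValueError("malformed ALPN protocol list")
        names.append(wire[pos:pos + n].decode("ascii"))
        pos += n
    return names


def select_protocol(server_prefs, client_offer_wire):
    """Return the first server preference the client offered, or None."""
    offered = decode_alpn(client_offer_wire)
    for name in server_prefs:
        if name in offered:
            return name
    return None


# "pgsql-v3" is a hypothetical identifier, used here only for illustration.
offer = encode_alpn(["h2", "pgsql-v3"])
chosen = select_protocol(["h2"], offer)
```

The selection logic itself is library-neutral; only the hook that hands the client's offer to this logic differs between TLS implementations. OpenSSL, for one, accepts the same length-prefixed format in SSL_CTX_set_alpn_protos.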

> 
> - Overhead for all clients. It may be tiny, but it needs to be
>  measured and that cost needs to be weighed against the benefits.
>  Maybe a cache miss in the context of a network connection is
>  negligible, but we do need to know.

Important point. If h2 is to be seriously considered, then it must be an 
improvement in absolutely every aspect. 

The core part of this proposal is that h2 is parallel to v3. Something one can 
opt into by compiling `--with_http2`. 

Even if h2 finds its way already into PG12, it's likely that the existing 
installed base would elect not to compile it in as there are no immediate 
benefits to them. The first wave of users will be web-facing apps. They already 
pay the penalty of conversion to/from v3, so in those scenarios the switch will 
be a gain.

Then again, if h2 becomes the new v4, then libpq-fe will support it, so we 
might find that the savings of one or two network round trips amply offset a 
one-byte socket peek, and everyone will eagerly upgrade. Who knows.

My PoC strategy is to touch existing code as little as possible. Yet if the 
ProcessStartupPacket can somehow return the consumed bytes back to the TLS lib 
for negotiation, then there’s zero cost to protocol detection for v2/v3 clients 
and only h2 clients pay the price of the extra check.
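
A minimal sketch of that detection step in Python, assuming an h2 client (curl, say) opens the connection with a TLS ClientHello directly, while v2/v3 clients open with a length-prefixed packet. The numeric constants are the documented protocol values; the function name and the returned labels are hypothetical, and this is not the PoC code.

```python
import struct

TLS_HANDSHAKE = 0x16          # TLS record content type of a ClientHello
SSL_REQUEST_CODE = 80877103   # (1234 << 16) | 5679, the v3 SSLRequest code


def classify_first_bytes(buf):
    """Guess the kind of client from the first bytes it sent."""
    if not buf:
        return "need-more-data"
    # A v2/v3 packet starts with a 4-byte big-endian length that is small,
    # so its first byte is 0x00 and cannot be mistaken for 0x16.
    if buf[0] == TLS_HANDSHAKE:
        return "direct-tls"
    if len(buf) < 8:
        return "need-more-data"
    length, code = struct.unpack("!II", buf[:8])
    if length == 8 and code == SSL_REQUEST_CODE:
        return "ssl-request"
    major = code >> 16        # protocol version is (major << 16) | minor
    if major in (2, 3):
        return "startup-v%d" % major
    return "unknown"          # e.g. cancel requests, or garbage
```

The point of the sketch is that the first branch needs only a single byte, so the extra check for existing clients is one comparison on data the server has to read anyway.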

> 
> - Dependency on a new external library. Fortunately, it's MIT
>  licensed, so it's PostgreSQL compatible, but what happens if it
>  becomes unmaintained? This has happened a couple of times, and it
>  causes overhead that needs to be taken into account.

I chose nghttp2 because it gave me a quick start, it’s well designed, a good fit 
for this kind of work, and fortunately indeed, the license is compatible. 
(Also, curl links to it as well, so I am pretty confident it’ll be around). Very 
possible that over time h2 parsing code migrates into pg codebase. There are so 
many similarities to v3 architecture, we might find a way to generalize both 
into a single codebase. Then h2 frame parser/state machine becomes only a 
handful of .c files. 

h2 is a standard; however you decide to parse it, your code will eventually 
converge to a stable state in the same manner that febe v3 code did. Once we 
master the protocol, I don’t t

Re: Proposal: http2 wire format

2018-03-25 Thread David Fetter
On Sat, Mar 24, 2018 at 06:52:47PM +0100, Damir Simunic wrote:
> Hello hackers,
> 
> I’d like to propose the implementation of new wire protocol using http2 
> framing. 

Welcome to the PostgreSQL community!  This is a very interesting idea.
Please send a patch to this mailing list on this thread.

In order to get and keep it on the radar, you should know about how
development works in PostgreSQL.

http://wiki.postgresql.org/wiki/Development_information

In particular, please look at: 
http://wiki.postgresql.org/wiki/Submitting_a_Patch

I notice that you patched 10. New features, and this is definitely
one, go against git master.

> It appears to me that http2 solves many of the issues on the TODO
> list under “Wire Protocol Changes / v4 Protocol,” without any
> obvious downsides. 

Here are a few things to consider, at least from my perspective:

- Docs. Gotta have some: https://wiki.postgresql.org/wiki/Documentation_Tools

- Testing. Gotta have some in src/test/regress in the source tree.

- Tight coupling to OpenSSL, if that's actually what's happening.
  We're actively trying to get away from this, so a TLS-neutral
  implementation or at least one that's not specific to OpenSSL would
  be good.

- Overhead for all clients. It may be tiny, but it needs to be
  measured and that cost needs to be weighed against the benefits.
  Maybe a cache miss in the context of a network connection is
  negligible, but we do need to know.

- Dependency on a new external library. Fortunately, it's MIT
  licensed, so it's PostgreSQL compatible, but what happens if it
  becomes unmaintained? This has happened a couple of times, and it
  causes overhead that needs to be taken into account.

> My hope is that this post leads to a conversation and gets a few
> people excited about the idea the way I am. Maybe even some of the
> GSoC students would take the implementation further?

The conversation has started.

Again, welcome, and thanks for jumping in!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Proposal: http2 wire format

2018-03-24 Thread Damir Simunic
Hello hackers,



I’d like to propose the implementation of new wire protocol using http2 
framing. 

It appears to me that http2 solves many of the issues on the TODO list under 
“Wire Protocol Changes / v4 Protocol,” without any obvious downsides. 

The implementation I have in mind has zero impact on existing clients. No 
changes to the format of existing v3 protocol. The new protocol works through a 
few small additions to postmaster.c to intercept TLS requests, and the rest in 
new source files, linked through PQcommMethods.

I’d like to emphasize that this proposal is emphatically NOT about “let’s 
handle REST in the database” or some such. It’s about upgrading the framing, 
where http2 offers many benefits: content negotiation, concurrent bidirectional 
streams, extensible frame types, metadata/data split into headers/trailers and 
data frames, flow control, etc. It’s at least as efficient as febe v3. A lot of 
research is going into it to make it even more efficient and latency friendly. 
The mechanisms it provides for content negotiation (and, with ALPN, protocol 
negotiation) offer us a future-friendly way to evolve without the burden of 
backward compatibility compromises.
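
To make the framing comparison concrete, here is a small Python sketch that parses the fixed 9-byte HTTP/2 frame header (RFC 7540, section 4.1) next to the 5-byte header of a regular v3 message. The function names are illustrative only. It shows the per-message header layout and nothing more; it says nothing about overall throughput, which would have to be measured.

```python
import struct

H2_HEADER_LEN = 9   # 24-bit length, 8-bit type, 8-bit flags, 32-bit stream word
V3_HEADER_LEN = 5   # 1-byte message type, 32-bit length


def parse_h2_frame_header(buf):
    """Return (payload_length, frame_type, flags, stream_id)."""
    if len(buf) < H2_HEADER_LEN:
        raise ValueError("need 9 bytes for an HTTP/2 frame header")
    hi, lo, ftype, flags, stream = struct.unpack("!BHBBI", buf[:H2_HEADER_LEN])
    length = (hi << 16) | lo                  # 24-bit big-endian length
    return length, ftype, flags, stream & 0x7FFFFFFF  # drop the reserved bit


def parse_v3_message_header(buf):
    """Return (message_type, payload_length) for a regular v3 message."""
    if len(buf) < V3_HEADER_LEN:
        raise ValueError("need 5 bytes for a v3 message header")
    mtype, length = struct.unpack("!cI", buf[:V3_HEADER_LEN])
    return mtype, length - 4                  # the length field counts itself


# A DATA frame (type 0x0) with END_STREAM (flag 0x1) on stream 1, 5-byte payload.
h2 = parse_h2_frame_header(bytes([0, 0, 5, 0x0, 0x1, 0, 0, 0, 1]))
# A v3 Query message carrying the 9-byte string "SELECT 1\0".
v3 = parse_v3_message_header(b"Q" + struct.pack("!I", 4 + 9))
```

The extra four bytes per frame buy the stream identifier and the flags field, which are what make concurrent streams and the headers/data split possible.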

Before writing this proposal, I set out to create a proof of concept. My goal 
for the PoC is to be able to connect to the server using an existing http2 
client and get json back:

curl -k https://localhost:5432/some_func \
--http2-prior-knowledge --tlsv1.2 \
-H 'pg-database: postgres' \
-H 'pg-user: web' \
-H 'authorization: ….' \
-H 'accept: application/json'

{ result: [ … ] }
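
One way to picture what the server has to do with such a request, sketched in Python under the assumption that each `pg-` header carries one of the parameters a v3 client would put in its StartupMessage. The mapping rule and the function names are hypothetical (the PoC may well do this differently); the packet layout is the documented v3 one.

```python
import struct

PROTOCOL_V3 = 196608   # 3 << 16, protocol version 3.0


def startup_params_from_headers(headers):
    """Map 'pg-<name>' request headers to startup parameters (assumed rule)."""
    params = {}
    for name, value in headers.items():
        name = name.lower()
        if name.startswith("pg-"):
            params[name[3:]] = value
    return params


def build_startup_message(params):
    """Build a v3 StartupMessage: length, version, key/value pairs, terminator."""
    body = struct.pack("!I", PROTOCOL_V3)
    for key, value in params.items():
        body += key.encode("utf-8") + b"\0" + value.encode("utf-8") + b"\0"
    body += b"\0"                                      # end of parameter list
    return struct.pack("!I", len(body) + 4) + body     # length includes itself


headers = {
    "pg-database": "postgres",
    "pg-user": "web",
    "accept": "application/json",
}
params = startup_params_from_headers(headers)
message = build_startup_message(params)
```

Headers without the `pg-` prefix, such as `accept`, stay out of the startup parameters and remain available for content negotiation.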

After spending a week getting up to speed with C, libpq internals, http2 
standard, libnghttp2 interface, etc., I’m fairly convinced that pg/http2 is 
feasible.

Sadly, my experience with C and Postgres internals is non-existent, and I am 
not yet able to finalize a live demo. The above curl request establishes the 
connection, receives the settings frame, and queries the database, but I’m still 
struggling with writing code to return the http2 response. At this stage, it’s 
purely an issue of mechanically writing the code; I think I have solved how it 
all works in principle.

If anyone finds the idea of Postgres speaking http2 appealing, I’d welcome 
guidance/mentoring/coding help (or just plain taking over). I put up a repo 
with the results so far and a longer writeup: https://github.com/dsimunic/pg_h2 

All changes I made to the codebase are in a single commit, so it should be easy 
to see what is happening. You’ll need libnghttp2 and openssl 1.0.2 or newer 
to compile.

My hope is that this post leads to a conversation and gets a few people excited 
about the idea the way I am. Maybe even some of the GSoC students would take 
the implementation further?


Damir