* Backend should pass its version number, database encoding, default
client encoding, and possibly other data (any ideas?) to the frontend during
startup, to avoid the need for explicit queries to get this info.  We could
also consider eliminating SET commands sent by libpq in favor of adding
variable settings to startup packet's PGOPTIONS field.  Ideally we could
get back to the point where a standard connection startup takes only one
packet in each direction.

This would handle the JDBC driver's needs (currently on startup the JDBC driver queries for the database encoding and version number, and sets the datestyle and autocommit parameters); a sketch of those extra round trips follows.
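For illustration, here is roughly the kind of startup round trip that parameter reporting would eliminate (a sketch only; the exact statements the driver issues differ, and the connection URL and credentials are made up):

    import java.sql.*;

    public class StartupQueries {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
            Statement st = con.createStatement();
            // Each of these is an extra round trip at connection startup
            // that server-reported parameters would make unnecessary.
            ResultSet rs = st.executeQuery(
                "select version(), getdatabaseencoding()");
            rs.next();
            System.out.println("server " + rs.getString(1)
                + ", encoding " + rs.getString(2));
            rs.close();
            st.execute("set datestyle = 'ISO'");
            st.close();
            con.close();
        }
    }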


One addition I would personally like to see (it comes up in my application code) is the ability to detect whether the server is big endian or little endian. When using binary cursors this is necessary in order to read int data. Currently I issue a 'select 1' statement (via a binary cursor) at connection startup to determine what format the server is using.
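To make that concrete, here is a sketch of the check and of how the answer gets used, assuming the four bytes of an int4 column arrive exactly as the server lays them out in memory:

    // Sketch: infer server byte order from the raw int4 bytes of a
    // binary-cursor "select 1", then decode int4 values accordingly.
    public class EndianCheck {
        public static void main(String[] args) {
            byte[] raw = {0, 0, 0, 1};          // bytes received for the value 1
            boolean bigEndian = (raw[3] == 1);  // 1 in the last byte => big endian
            System.out.println("server is "
                + (bigEndian ? "big" : "little") + " endian");
            System.out.println("decoded: " + decodeInt4(raw, bigEndian));
        }

        static int decodeInt4(byte[] b, boolean bigEndian) {
            if (bigEndian) {
                return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
                     | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
            }
            return ((b[3] & 0xff) << 24) | ((b[2] & 0xff) << 16)
                 | ((b[1] & 0xff) << 8) | (b[0] & 0xff);
        }
    }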


Other things I would like to see to help JDBC:


1) More information about the attributes selected in a query (I see there is an entire thread on this already), to minimize the work necessary to implement updatable result sets as defined by the JDBC spec; see the sketch below.
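For reference, this is the JDBC-spec behavior that needs the extra metadata; to make updateRow() work the driver must know which table and column each attribute came from (mytable is a made-up example):

    import java.sql.*;

    public class UpdatableRs {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
            Statement st = con.createStatement(
                ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);
            ResultSet rs = st.executeQuery("select id, name from mytable");
            while (rs.next()) {
                // The driver has to map this column back to mytable.name
                // to build the UPDATE -- exactly the metadata the query
                // response doesn't currently carry.
                rs.updateString("name", rs.getString("name").trim());
                rs.updateRow();
            }
            rs.close();
            st.close();
            con.close();
        }
    }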

2) Better support for domains. Currently the JDBC driver is broken with regard to domains (although no one has reported this yet): the driver treats a datatype that is a domain as an unknown/unsupported datatype. It would be great if the T (RowDescription) response included the 'base' datatype for a domain attribute, so that the driver would know which parsing routines to call to convert to/from the text representation the backend expects. The sketch below shows the catalog-lookup workaround this would replace.
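The workaround available today is an extra catalog query per unrecognized type OID, something along these lines (a sketch; the OID is made up, and pg_type.typbasetype is the column I believe holds the domain's base type):

    import java.sql.*;

    public class DomainBaseType {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
            // Given the type OID from the T (RowDescription) message,
            // look up the base type of the domain -- one more round
            // trip that the protocol could make unnecessary.
            PreparedStatement ps = con.prepareStatement(
                "select typbasetype from pg_catalog.pg_type where oid = ?");
            ps.setInt(1, 12345);  // made-up OID of a domain type
            ResultSet rs = ps.executeQuery();
            if (rs.next()) {
                System.out.println("base type oid: " + rs.getInt(1));
            }
            rs.close();
            ps.close();
            con.close();
        }
    }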

3) Protocol level support for CURSORs. It would be nice if cursor support were done at the protocol level and not as a SQL command. The current default behavior of returning all results from a query in the query response message is often a problem (it can easily lead to out-of-memory errors for poorly written queries), so it is desirable to use cursors. But with the current implementation in SQL, cursors are not the appropriate choice if a query is only going to return one or a few rows. The reason is that using a cursor requires a minimum of three SQL statements: DECLARE, FETCH, CLOSE. The JDBC driver issues the DECLARE and FETCH in one server call, but the CLOSE needs to be a second call. Thus for simple one-row selects (which in many cases are the majority of selects issued), using cursors requires two round trips to the server vs. one for the non-cursor case.
This leaves me with a problem in the JDBC driver: I can either use standard fast queries for single-row selects, which blow up with out-of-memory errors for large results, or I can use cursors and avoid large memory usage but hurt overall performance. What I have currently done is require that the developer call an extra method to turn on the use of cursors when they know a query is going to return a large number of rows, leaving the non-cursor case as the default. This works, but it requires developers who are writing code to interact with multiple different databases to code differently for the Postgres JDBC driver. And that is a problem, since one of the goals of JDBC is to be able to write code that works against multiple different databases.
So I would request the ability for the client to set a max-rows parameter for query results. If a query were to return more than the max number of rows, the client would be given a handle (essentially a cursor name) that it could use to fetch additional sets of rows. The sketch below shows the round trips the current SQL-level approach costs.
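To make the round-trip cost concrete, here is the dance the driver has to do today (a sketch; mytable is made up):

    import java.sql.*;

    public class CursorRoundTrips {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
            con.setAutoCommit(false);  // cursors need a transaction
            Statement st = con.createStatement();
            // The driver can batch these two into one server call...
            st.execute("declare c cursor for select id from mytable");
            ResultSet rs = st.executeQuery("fetch 100 from c");
            while (rs.next()) {
                // ... process rows ...
            }
            rs.close();
            // ...but the CLOSE is always a second round trip, which is
            // pure overhead when the query returned only one row.
            st.execute("close c");
            con.commit();
            st.close();
            con.close();
        }
    }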


4) Protocol level support for PREPARE. In JDBC and most other interfaces there is support for parameterized SQL. If you want to take advantage of the performance benefits of reusing parsed plans, you have to use the PREPARE SQL statement. My complaint about doing this at the SQL level vs. the protocol level is similar to the problem with cursors above. To use prepare you need to issue three SQL statements: PREPARE, EXECUTE, DEALLOCATE. If you know ahead of time that you are going to reuse a statement many times, doing PREPARE, EXECUTE, EXECUTE, ..., DEALLOCATE makes sense and can be a big win in performance. However, if you only ever execute the statement once, you need two round trips (one for the PREPARE plus EXECUTE and another for the DEALLOCATE) versus one round trip to execute the statement 'normally'. So using prepares for all parameterized SQL statements decreases performance. The current JDBC implementation therefore requires the user to issue a Postgres-specific call to turn on prepared statements for those cases the developer knows will be a performance win. But this requires coding differently for Postgres JDBC than for other databases.
So being better able to handle this in the protocol would be nice; the sketch below shows what the SQL-level dance looks like today.
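Here is that dance spelled out, assuming a server with SQL-level PREPARE (7.3 or later); the statement name and the users table are made up:

    import java.sql.*;

    public class SqlLevelPrepare {
        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
            Statement st = con.createStatement();
            // The PREPARE and first EXECUTE can be batched into one call...
            st.execute("prepare getuser (int) as "
                + "select name from users where id = $1");
            ResultSet rs = st.executeQuery("execute getuser (42)");
            if (rs.next()) {
                System.out.println(rs.getString(1));
            }
            rs.close();
            // Reusing the plan is where the win comes from:
            //   st.executeQuery("execute getuser (43)"); ...
            // ...but the DEALLOCATE is always an extra round trip.
            st.execute("deallocate getuser");
            st.close();
            con.close();
        }
    }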


5) Better support for "large values". Generally I recommend that users of JDBC use bytea to store large binary values, and I tell people to avoid using LOs (Large Objects). The reason is that LOs have two significant problems: 1) security - any user on the database can access all LOs, even though they may not be able to access the row that contains the LO reference; 2) cleanup - deleting the row containing the LO reference doesn't delete the LO itself, requiring extra code or triggers to make it behave like a regular value in a regular column. Bytea works OK for small to medium-sized values, but doesn't work for very large values, where by very large I mean over a few megabytes. The reason very large values are a problem is memory usage. There is no way to 'stream' bytea values from the server like you can with LOs, so the driver ends up storing the entire value in memory as it reads the result from the backend for a query. And if the query returns multiple rows, each with a large value, you quickly run out of memory.
So what I would like to see is the ability for the client to set a MAX VALUE size parameter. The server would send up to this amount of data for any column; if the value were longer than MAX VALUE, the server would respond with a handle that the client could use to get the rest of the value (in chunks of MAX VALUE) if it wanted to. This would allow the client to retrieve an entire result set, perhaps containing many large bytea values, without using a lot of memory up front, and then fetch the full values only when/if the application asked for them, streaming the result to the application without ever bringing the entire contents of a column into memory at once. (There are probably a number of different implementation possibilities, so take this one as a suggestion to explain what I would like to see, not necessarily how it should be implemented.)
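As an illustration of the access pattern I have in mind, here is roughly how a client can chunk an oversized value by hand today, assuming substring() works on bytea columns (mytable and blobcol are made up, and this costs one full query per chunk instead of reusing a server-side handle):

    import java.sql.*;

    public class ChunkedRead {
        static final int MAX_VALUE = 8192;  // made-up chunk size

        public static void main(String[] args) throws SQLException {
            Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password");
            PreparedStatement ps = con.prepareStatement(
                "select substring(blobcol from ? for ?) "
                + "from mytable where id = 1");
            // Pull the value one chunk at a time instead of all at once.
            for (int offset = 1; ; offset += MAX_VALUE) {
                ps.setInt(1, offset);
                ps.setInt(2, MAX_VALUE);
                ResultSet rs = ps.executeQuery();
                rs.next();
                byte[] chunk = rs.getBytes(1);
                rs.close();
                if (chunk == null || chunk.length == 0) break;
                // ... stream chunk to the application ...
                if (chunk.length < MAX_VALUE) break;
            }
            ps.close();
            con.close();
        }
    }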


6) Better over-the-wire support for bytea. The current escaping of binary data (a zero byte, for example, becomes the four characters \000) results in a significant expansion in the size of the data transmitted. It would be nice if bytea data didn't incur a 2-3x expansion (and all the CPU cycles to convert to/from the escaped format). This may not be a protocol issue, but IMHO the best way to fix this would be in the protocol.
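The expansion is easy to quantify; here is a sketch of the arithmetic for the current octal escaping (result-set direction; escaped literals in an INSERT are worse, because each backslash must be doubled again for the query string):

    // Estimate the on-the-wire size of the escaped text form of bytea.
    public class ByteaExpansion {
        public static void main(String[] args) {
            byte[] data = new byte[1024];
            new java.util.Random(0).nextBytes(data);  // random bytes, near worst case
            int wire = 0;
            for (byte b : data) {
                int v = b & 0xff;
                if (v == '\\') wire += 2;               // backslash -> \\
                else if (v < 32 || v >= 127) wire += 4; // non-printable -> \nnn
                else wire += 1;                         // printable sent as-is
            }
            System.out.println(data.length + " bytes -> about "
                + wire + " characters on the wire");
        }
    }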

thanks,
--Barry



