Re: [HACKERS] No merge sort?
On Thu, Mar 13, 2003 at 10:30:27PM -0500, Tom Lane wrote:
> The idea is you look at the index to make a list of main-table tuple
> positions you are interested in, which you represent compactly as a
> compressed bitmap.  (There is some finagling needed because PG actually
> uses block/line number rather than a pure tuple number to identify
> tuples, but you can fake it with a reasonably small amount of overhead.)
> Then you can combine multiple index inputs by ANDing or ORing bitmaps
> (the OR case applies to your example).  Finally, you traverse the heap,
> accessing the desired rows in heap-location order.  This loses in terms
> of producing presorted output --- but it often saves enough in I/O costs
> to more than justify doing the sort in memory.

And it loses bigtime in the case of LIMIT. If the unlimited query
returns 4,000 records and I only want 20, you're retrieving 200x too
much data from disk.

-- 
Taral <[EMAIL PROTECTED]>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me
Re: [HACKERS] Upgrading the backend's error-message infrastructure
"Christopher Kings-Lynne" <[EMAIL PROTECTED]> writes:
> Would it be possible to do a command line app?
>
> bash$ pg_error 1200D
> Severity: ERROR
> Message: Division by zero
> Detail:
> Hint: Modify statement to prevent zeros appearing in denominators.

You're assuming that there's a one-to-one mapping of error codes to
messages, which is not likely to be the case --- for example, all the
"can't happen" errors will probably get lumped together under a single
"internal error" error code.  You could provide a lookup of the
spec-defined meaning of each error code, maybe.

>> Is there any benefit to having this over just including an index of
>> error codes in the documentation?

> It's quick and easy, especially when there's thousands of error codes.

But there aren't.  I count about 130 SQLSTATEs defined by the spec.
Undoubtedly we'll make more for Postgres-specific errors, but not
hundreds more.  There's just not value to applications in distinguishing
errors at such a fine grain.

			regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] No merge sort?
Taral <[EMAIL PROTECTED]> writes:
> On Thu, Mar 13, 2003 at 04:28:34PM -0500, Tom Lane wrote:
>> Seems like a waste of effort to me.  I find this example less than
>> compelling --- the case that could be sped up is quite narrow,
>> and the potential performance gain not all that large.  (A sort
>> is a sort however you slice it, with O(N log N) runtime...)

> Actually, it's O(N) time.

Only if you assume a fixed number of input streams.

>> Also, the direction we'd likely be going in in future is to merge
>> multiple indexscans using bitmap techniques, so that the output
>> ordering of the scans couldn't be counted on anyway.

> I don't understand this. What do these bitmap techniques do?

The idea is you look at the index to make a list of main-table tuple
positions you are interested in, which you represent compactly as a
compressed bitmap.  (There is some finagling needed because PG actually
uses block/line number rather than a pure tuple number to identify
tuples, but you can fake it with a reasonably small amount of overhead.)
Then you can combine multiple index inputs by ANDing or ORing bitmaps
(the OR case applies to your example).  Finally, you traverse the heap,
accessing the desired rows in heap-location order.  This loses in terms
of producing presorted output --- but it often saves enough in I/O costs
to more than justify doing the sort in memory.

			regards, tom lane
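To make the bitmap idea concrete, here is a toy standalone sketch (not PostgreSQL code: the fixed-size uncompressed bitmap and the flat tuple-id scheme are deliberate simplifications of the compressed, block/line-number scheme described above). Each index scan sets bits for matching tuple positions; the bitmaps are then ANDed or ORed, and a final pass visits the surviving positions in heap-location order.

```c
#include <stdint.h>

/* Toy bitmap over a small, fixed heap: one bit per tuple position. */
#define NTUPLES 256
#define NWORDS  (NTUPLES / 64)

typedef struct { uint64_t words[NWORDS]; } TidBitmap;

static void bitmap_set(TidBitmap *b, int tid)
{
    b->words[tid / 64] |= UINT64_C(1) << (tid % 64);
}

static int bitmap_test(const TidBitmap *b, int tid)
{
    return (int) ((b->words[tid / 64] >> (tid % 64)) & 1);
}

/* OR: tuples matching either index scan (the IN-list / OR-qual case). */
static void bitmap_or(TidBitmap *dst, const TidBitmap *a, const TidBitmap *b)
{
    for (int i = 0; i < NWORDS; i++)
        dst->words[i] = a->words[i] | b->words[i];
}

/* AND: tuples matching both quals. */
static void bitmap_and(TidBitmap *dst, const TidBitmap *a, const TidBitmap *b)
{
    for (int i = 0; i < NWORDS; i++)
        dst->words[i] = a->words[i] & b->words[i];
}

/* Visit matching tuples in heap-location order; returns count visited.
 * Note the output is ordered by position, not by any index key --
 * which is exactly why presorted output is lost. */
static int bitmap_heap_scan(const TidBitmap *b, int *out)
{
    int n = 0;
    for (int tid = 0; tid < NTUPLES; tid++)
        if (bitmap_test(b, tid))
            out[n++] = tid;
    return n;
}
```

The heap pass reads each page at most once and in physical order, which is where the I/O savings over repeated unordered index probes come from.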
Re: [HACKERS] Upgrading the backend's error-message infrastructure
> On Thu, 2003-03-13 at 21:16, Christopher Kings-Lynne wrote:
>> Would it be possible to do a command line app?
>>
>> bash$ pg_error 1200D
>> Severity: ERROR
>> Message: Division by zero
>> Detail:
>> Hint: Modify statement to prevent zeros appearing in denominators.
>
> Is there any benefit to having this over just including an index of
> error codes in the documentation?

It's quick and easy, especially when there's thousands of error codes.
Ideally, the pg_error app and the error code documentation should be
automatically generated...

You could have a built-in function: pg_print_error(text) returns text,
then the pg_error command line program could just call that, plus the
user could check up errors from within postgresql as well...

Chris
Re: [HACKERS] Upgrading the backend's error-message infrastructure
On Thu, 2003-03-13 at 21:16, Christopher Kings-Lynne wrote:
> Would it be possible to do a command line app?
>
> bash$ pg_error 1200D
> Severity: ERROR
> Message: Division by zero
> Detail:
> Hint: Modify statement to prevent zeros appearing in denominators.

Is there any benefit to having this over just including an index of
error codes in the documentation?

Cheers,
Neil

-- 
Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Marc G. Fournier wrote:
> On Tue, 11 Mar 2003, Bruce Momjian wrote:
>> Six months would be June 1 beta, so maybe that is still a good target.
>
> We released v7.3 just before Dec 1st, so six months is May 1st, not
> June 1st ...

Six months is June 1 --- December (1), January-May (5) == 6.

-- 
Bruce Momjian                   | http://candle.pha.pa.us
[EMAIL PROTECTED]               | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
Re: [HACKERS] Upgrading the backend's error-message infrastructure
> Great work, Tom!
>
> While we're effectively changing every elog call site in the backend,
> would it also be a good idea to adopt a standard for the format of
> error messages? (e.g. capitalization, grammar, etc.)

I 100% agree with this - a style guide!

Chris
Re: [HACKERS] Upgrading the backend's error-message infrastructure
> Comments?

All the error stuff sounds really neat.  I volunteer for doing lots of
elog changes when the time comes.

Would it be possible to do a command line app?

bash$ pg_error 1200D
Severity: ERROR
Message: Division by zero
Detail:
Hint: Modify statement to prevent zeros appearing in denominators.

So people can look up errors offline (oracle-style)

Chris
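Such a tool could be little more than a generated lookup table. A minimal illustrative sketch follows; the SQLSTATE codes shown are real spec-defined values, but the table and function names are invented for this example, and a real tool would generate the table from the same source as the documentation, as suggested later in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generated table mapping SQLSTATE strings to the
 * spec-defined condition names.  Only a few illustrative entries. */
struct err_entry
{
    const char *code;       /* 5-character SQLSTATE */
    const char *condition;  /* spec-defined condition name */
};

static const struct err_entry err_table[] = {
    {"22012", "division_by_zero"},
    {"23505", "unique_violation"},
    {"42601", "syntax_error"},
};

/* Return the condition name for a SQLSTATE, or NULL if unknown. */
static const char *lookup_sqlstate(const char *code)
{
    for (size_t i = 0; i < sizeof(err_table) / sizeof(err_table[0]); i++)
        if (strcmp(err_table[i].code, code) == 0)
            return err_table[i].condition;
    return NULL;
}
```

A `pg_error` front end would just look up argv[1] and print the row; the same table could back a built-in SQL function.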
Re: [HACKERS] SQL99 ARRAY support proposal
> implode(text[], text) returns text - join array elements into a
> string using given string delimiter
>
> I'm open to opinions on implode() -- I only picked implode() because
> that's what it is called in PHP. Any suggestions?

It's also called 'join' in PHP...

Chris
Re: [HACKERS] No merge sort?
On Thu, Mar 13, 2003 at 04:28:34PM -0500, Tom Lane wrote:
> Seems like a waste of effort to me.  I find this example less than
> compelling --- the case that could be sped up is quite narrow,
> and the potential performance gain not all that large.  (A sort
> is a sort however you slice it, with O(N log N) runtime...)

Actually, it's O(N) time. The index can produce "time" sorted data for
each "id" in linear time, and the merge sort can merge them in linear
time. Also, the existing system insists on loading _all_ candidate rows
whereas this method can benefit from the limit.

If you don't want to code it, I will. I need it for the livejournal
mysql->postgresql transition. (No, mysql doesn't do it right either.)
But a few pointers to the right places to look in the code would be
helpful.

> Also, the direction we'd likely be going in in future is to merge
> multiple indexscans using bitmap techniques, so that the output
> ordering of the scans couldn't be counted on anyway.

I don't understand this. What do these bitmap techniques do?

-- 
Taral <[EMAIL PROTECTED]>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me
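The LIMIT argument can be illustrated with a toy standalone merge (not PostgreSQL code; function and parameter names are invented). Each per-id index stream arrives already sorted descending by time; merging k such streams costs O(k) per output row, so producing only `limit` rows is independent of how many rows the unlimited query would match. This is also why Tom's caveat matters: with k growing, each step costs O(k), or O(log k) with a heap, so the total is only linear for a fixed number of streams.

```c
#include <stddef.h>

/* Merge nstreams arrays, each already sorted descending, emitting at
 * most `limit` values in globally descending order.  Returns the number
 * of values written to out[].  Assumes nstreams <= 16. */
static int merge_desc_limit(const int *streams[], const int lens[],
                            int nstreams, int limit, int *out)
{
    int pos[16] = {0};          /* read cursor per stream */
    int n = 0;

    while (n < limit)
    {
        /* Pick the stream whose head element is largest. */
        int best = -1;
        for (int i = 0; i < nstreams; i++)
            if (pos[i] < lens[i] &&
                (best < 0 || streams[i][pos[i]] > streams[best][pos[best]]))
                best = i;
        if (best < 0)
            break;              /* all streams exhausted */
        out[n++] = streams[best][pos[best]++];
    }
    return n;
}
```

The key property is that the loop touches at most `limit` rows per stream plus the stream heads, never the full candidate set, which is the 200x saving claimed above for the 4,000-rows/LIMIT-20 case.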
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Tom Lane wrote:
> Barry Lind <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> See binary cursors ...
>>
>> Generally that is not an option.  It either requires users to code to
>> postgresql specific sql syntax, or requires the driver to do it
>> magically for them.
>
> Fair enough.  I don't see anything much wrong with a GUC option that
> says "send SELECT output in binary format".  This is not really a
> protocol issue since the ASCII and BINARY choices both exist at the
> protocol level --- there is nothing in the protocol saying binary data
> can only be returned by FETCH and not by SELECT.  The main problem
> with it in present releases is that binary data is
> architecture-dependent and so encouraging its general use seems like a
> really bad idea.  But if we manage to get send/receive conversion
> routines in there, most of that issue would go away.

That would be great.

>> The describe request is generally only done once even though you may
>> do multiple fetchs (unlike todays protocol which includes the
>> describe information on every fetch, even if you are fetching one row
>> at a time).
>
> I'm less than excited about changing that, because it breaks clients
> that don't want to remember past RowDescriptions (libpq being the
> front-line victim), and it guarantees loss-of-synchronization failures
> anytime the client misassociates rowdescription with query.  In
> exchange for that, we get what exactly?  Fetching one row at a time is
> *guaranteed* to be inefficient.  The correct response if that bothers
> you is to fetch multiple rows at a time, not to make a less robust
> protocol.

I don't feel strongly either way on this one, but IIRC the SQL standard
for cursors only specifies fetching one record at a time (at least that
is how MSSQL and DB2 implement it).  Thus portable code is likely to
only fetch one record at a time.  The current row description isn't too
big, but with the changes being suggested it might become so.

thanks,
--Barry
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Hiroshi Inoue <[EMAIL PROTECTED]> writes:
> Hmm as for PREPAREd statements, it seems much better to
> implement functions which returns fields info for the
> statement than relying on such a protocol level change.

Well, we're changing the protocol anyway for other purposes, so the
extra burden of a change here doesn't seem large.  I think it's more
useful to worry about what functionality we want than to worry about
whether it's changed...

			regards, tom lane
Re: [HACKERS] SQL99 ARRAY support proposal
Joe Conway <[EMAIL PROTECTED]> writes:
> I'm leaning toward implode() and explode() now anyway because split()
> uses a regex for the delimiter in PHP (and probably Perl), and I was
> not planning to get that fancy.

PHP isn't exactly an exemplar for great language design.  explode/implode
are terribly non-self-descriptive names.  Someone seeing them for the
first time wouldn't really have any clue what they did and would have
zero chance of guessing their names to find them in an index.

I would suggest join_str() and split_str() if "join" is too sensitive a
word for an sql language.

-- 
greg
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Tom Lane wrote:
> "Dave Page" <[EMAIL PROTECTED]> writes:
>> It's rumoured that Hiroshi Inoue once said:
>>> Does looking up by the catalog keys take no cost ?
>
>> Obviously there is cost, but doing a lookup only on demand, has got
>> to be cheaper in the long run than including the entire column
>> definition in the message whether it's wanted or not?
>
> More to the point, the cost is paid by applications that want the
> functionality, and not by those that don't.
>
> It'd probably be reasonable for client libraries to maintain a cache
> of column info, so that they only have to query the backend about a
> particular column ID once per connection.

Is it a kind of thing that the server forces the clients easily ?

Hmm as for PREPAREd statements, it seems much better to implement
functions which returns fields info for the statement than relying on
such a protocol level change.

regards,
Hiroshi Inoue
	http://www.geocities.jp/inocchichichi/psqlodbc/
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Barry Lind <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> See binary cursors ...

> Generally that is not an option.  It either requires users to code to
> postgresql specific sql syntax, or requires the driver to do it
> magically for them.

Fair enough.  I don't see anything much wrong with a GUC option that
says "send SELECT output in binary format".  This is not really a
protocol issue since the ASCII and BINARY choices both exist at the
protocol level --- there is nothing in the protocol saying binary data
can only be returned by FETCH and not by SELECT.  The main problem with
it in present releases is that binary data is architecture-dependent
and so encouraging its general use seems like a really bad idea.  But
if we manage to get send/receive conversion routines in there, most of
that issue would go away.

> The describe request is generally only done once even though you may
> do multiple fetchs (unlike todays protocol which includes the describe
> information on every fetch, even if you are fetching one row at a
> time).

I'm less than excited about changing that, because it breaks clients
that don't want to remember past RowDescriptions (libpq being the
front-line victim), and it guarantees loss-of-synchronization failures
anytime the client misassociates rowdescription with query.  In
exchange for that, we get what exactly?  Fetching one row at a time is
*guaranteed* to be inefficient.  The correct response if that bothers
you is to fetch multiple rows at a time, not to make a less robust
protocol.

			regards, tom lane
Re: [HACKERS] SQL99 ARRAY support proposal
Tom Lane wrote:
> Joe Conway <[EMAIL PROTECTED]> writes:
>> Hmmm -- I doubt that would fly, although I see it is specifically
>> allowed as a function name (func_name_keyword list). Anyone have
>> opinions on this either way?
>
> Good point --- it would work today, but any small tweak in the JOIN
> grammar might force us to reserve the keyword altogether.  It'd be
> safer to use a name that is not an SQL keyword ...

I'm leaning toward implode() and explode() now anyway because split()
uses a regex for the delimiter in PHP (and probably Perl), and I was
not planning to get that fancy.

Joe
Re: [HACKERS] SQL99 ARRAY support proposal
Joe Conway <[EMAIL PROTECTED]> writes:
>> In both Perl and Python, that type of function is called "join".

> Hmmm -- I doubt that would fly, although I see it is specifically
> allowed as a function name (func_name_keyword list). Anyone have
> opinions on this either way?

Good point --- it would work today, but any small tweak in the JOIN
grammar might force us to reserve the keyword altogether.  It'd be
safer to use a name that is not an SQL keyword ...

			regards, tom lane
Re: [HACKERS] [INTERFACES] Upgrading the backend's error-message infrastructure
Jean-Luc Lachance <[EMAIL PROTECTED]> writes:
> Why trade 5 characters for a 4 byte integer -- a saving of 1 byte?

It's more than that: in one case you have something on the order of a
"load immediate" instruction, whereas in the other case the code is
like "load pointer to global string", plus you need a 6-byte string
literal (maybe costing you 8 bytes depending on alignment
considerations).  Also, depending on your machine's approach to
addressing of global data, that "load pointer" thingy could be multiple
instructions.  So we're talking about at least six, possibly 8-12 bytes
per elog call --- and there are thousands of 'em in the backend.

Admittedly, it's a micro-optimization, but it seems worth doing since
it won't have any direct impact on code legibility.

			regards, tom lane
Re: [HACKERS] Upgrading the backend's error-message infrastructure
Neil Conway <[EMAIL PROTECTED]> writes:
> While we're effectively changing every elog call site in the backend,
> would it also be a good idea to adopt a standard for the format of
> error messages? (e.g. capitalization, grammar, etc.)

Yup.  I was planning to bring that up as a separate thread.  I think
Peter has already put some thought into it, but I couldn't find
anything in the archives...

> If we wanted to get fancy, we could make use of the glibc ability to
> generate a back trace programmatically:

Hmm ... maybe.  Certainly we all too often ask people to get this info
by hand ... too bad it only works in glibc though.

>> In gcc-compiled backends, the function name will be provided
>> automatically by errstart, but there will be some places where we
>> need the name to be available even in a non-gcc build.

> To be honest, I'd be sceptical whether there are enough platforms
> without *either* gcc or a C99 compiler that it's worthwhile worrying
> about them that much (all that is at stake is some backward
> compatibility, anyway).

I'm only planning to bother with the errfunction hack for messages that
I know are being specifically tested for by existing frontends.  ecpg
looks for "PerformPortalFetch" messages, for example.  If we don't keep
that name in the (old version of the) error message then we have a
compatibility problem.  But I do want to move away from having function
names in the primary error message text.

			regards, tom lane
Re: [HACKERS] Roadmap for FE/BE protocol redesign
> -----Original Message-----
> From: Dave Page
>
> > -----Original Message-----
> > From: Hiroshi Inoue [mailto:[EMAIL PROTECTED]
> > Sent: 13 March 2003 10:04
> > To: Dave Page
> > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED];
> >     [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Subject: Re: [HACKERS] Roadmap for FE/BE protocol redesign
> >
> > Dave Page wrote:
> > >
> > > It's rumoured that Hiroshi Inoue once said:
> > > >
> > > > Does looking up by the catalog keys take no cost ?
> > >
> > > Obviously there is cost, but doing a lookup only on demand, has
> > > got to be cheaper in the long run than including the entire column
> > > definition in the message whether it's wanted or not?
> >
> > So if there are 100 fields, should we ask the backend
> > the column name 100 times ?
>
> No, the column name is there already.

The column name isn't there.  If a field has its alias name the alias
is there.

regards,
Hiroshi Inoue
Re: [HACKERS] Upgrading the backend's error-message infrastructure
On Thu, 2003-03-13 at 15:51, Tom Lane wrote:
> After digging through our many past discussions of what to do with
> error messages, I have put together the following first-cut proposal.

Great work, Tom!

While we're effectively changing every elog call site in the backend,
would it also be a good idea to adopt a standard for the format of
error messages? (e.g. capitalization, grammar, etc.)

> extern int errmsg_internal(const char *fmt, ...);
>
> Like errmsg() except that the first parameter is not subject to
> gettext-ification.  My thought is that this would be used for internal
> can't-happen conditions; there's no need to make translators labor
> over translating stuff like "eval_const_expressions: unexpected boolop
> %d", nor even to make them think about whether they need to.

If we wanted to get fancy, we could make use of the glibc ability to
generate a back trace programmatically:

http://www.gnu.org/manual/glibc-2.2.5/html_node/Backtraces.html#Backtraces

> In gcc-compiled backends, the function name will be provided
> automatically by errstart, but there will be some places where we need
> the name to be available even in a non-gcc build.

To be honest, I'd be sceptical whether there are enough platforms
without *either* gcc or a C99 compiler that it's worthwhile worrying
about them that much (all that is at stake is some backward
compatibility, anyway).

Cheers,
Neil

-- 
Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC
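A minimal sketch of the glibc facility referenced above (glibc-specific, as Tom notes in his reply; `print_backtrace` is an invented helper name, not a proposed API). `backtrace()` captures the current return addresses and `backtrace_symbols_fd()` writes one symbolic line per frame:

```c
#include <execinfo.h>

/* Dump the current call stack to stderr, as an error reporter might.
 * backtrace_symbols_fd() writes directly to the fd without calling
 * malloc, so it is safer than backtrace_symbols() inside error paths. */
static void print_backtrace(void)
{
    void *frames[32];
    int nframes = backtrace(frames, 32);

    backtrace_symbols_fd(frames, nframes, 2);   /* fd 2 = stderr */
}
```

Note that getting useful function names generally requires linking with `-rdynamic` so the symbols are exported.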
Re: [HACKERS] Upgrading the backend's error-message infrastructure
--On Thursday, March 13, 2003 16:20:21 -0500 Tom Lane <[EMAIL PROTECTED]> wrote:
> Larry Rosenman <[EMAIL PROTECTED]> writes:
>> __FUNCTION__ or an equivalent is MANDATED by C99, and available on
>> UnixWare's native cc.
>> You might want to make a configure test for it.
>
> Right, __func__ is the C99 spelling.  I did have a configure test in
> mind here: __func__ or __FUNCTION__ or NULL is what would get compiled
> in.  One nice thing about this approach is that we need change only
> one place to adjust the set of behind-the-scenes error parameters.

Ok, you had said GCC only.  Please do use the configure test, and
__func__ if it's available.

Thanks,
LER

-- 
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 972-414-9812          E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
Re: [HACKERS] No merge sort?
Taral <[EMAIL PROTECTED]> writes:
> Do I need to code merge sort into postgresql?

Seems like a waste of effort to me.  I find this example less than
compelling --- the case that could be sped up is quite narrow,
and the potential performance gain not all that large.  (A sort
is a sort however you slice it, with O(N log N) runtime...)

Also, the direction we'd likely be going in in future is to merge
multiple indexscans using bitmap techniques, so that the output
ordering of the scans couldn't be counted on anyway.

			regards, tom lane
Re: [HACKERS] Upgrading the backend's error-message infrastructure
Larry Rosenman <[EMAIL PROTECTED]> writes:
> __FUNCTION__ or an equivalent is MANDATED by C99, and available on
> UnixWare's native cc.
> You might want to make a configure test for it.

Right, __func__ is the C99 spelling.  I did have a configure test in
mind here: __func__ or __FUNCTION__ or NULL is what would get compiled
in.  One nice thing about this approach is that we need change only one
place to adjust the set of behind-the-scenes error parameters.

			regards, tom lane
Re: [HACKERS] Upgrading the backend's error-message infrastructure
--On Thursday, March 13, 2003 15:51:00 -0500 Tom Lane <[EMAIL PROTECTED]> wrote:
> (__FUNCTION__ is only used if we are compiling in gcc).  errstart()
> pushes an empty entry onto an error-data-collection stack and fills in
> the behind-the-scenes file/line entries.  errmsg() and friends stash
> values into the top-level stack entry.  Finally errfinish() assembles
> and emits the completed message, then pops the stack.  By using a
> stack, we can be assured that things will work correctly if a message
> is logged by some subroutine called in the parameters to ereport (not
> too unlikely when you think about formatting functions like
> format_type_be()).

__FUNCTION__ or an equivalent is MANDATED by C99, and available on
UnixWare's native cc.

You might want to make a configure test for it.  I believe the __func__
is the C99 spelling (that's what's available on UnixWare):

$ cc -O -o testfunc testfunc.c
$ ./testfunc
function=main,file=testfunc.c,line=4
$ cat testfunc.c
#include <stdio.h>
int main(int argc,char **argv)
{
printf("function=%s,file=%s,line=%d\n",__func__,__FILE__,__LINE__);
}
$

-- 
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 972-414-9812          E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
[HACKERS] No merge sort?
I tried general, but no response. Anyone here can shed some light on
the issue? Do I need to code merge sort into postgresql?

----- Forwarded message from Taral <[EMAIL PROTECTED]> -----

From: Taral <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Date: Wed, 12 Mar 2003 17:54:35 -0600
Subject: [GENERAL] No merge sort?
Message-ID: <[EMAIL PROTECTED]>

I have a table "test" that looks like this:

CREATE TABLE test (
    id BIGINT,
    time INTEGER
);

There is an index:

CREATE INDEX idx ON test(id, time);

The table has been loaded with 2M rows, where time ranges sequentially
from 0 to 199 and id is random values from 0 to 4.

This query:

SELECT * FROM idx WHERE id IN (...) AND time > 198000 AND time < 199800
ORDER BY time DESC LIMIT 20;

has an EXPLAIN ANALYZE of:

Limit (cost=3635.28..3635.28 rows=20 width=12) (actual time=22.94..22.96 rows=14 loops=1)
  -> Sort (cost=3635.28..3635.28 rows=23 width=12) (actual time=22.93..22.93 rows=14 loops=1)
    -> Index Scan using idx, idx, ..., idx, idx on test (cost=0.00..3634.77 rows=23 width=12) (actual time=1.01..22.10 rows=14 loops=1)
Total runtime: 29.12 msec

This query:

SELECT * FROM idx WHERE id IN (...) AND time < 199800
ORDER BY time DESC LIMIT 20;

has an EXPLAIN ANALYZE of:

Limit (cost=14516.46..14516.46 rows=20 width=12) (actual time=1448.83..1448.86 rows=20 loops=1)
  -> Sort (cost=14516.46..14516.46 rows=2527 width=12) (actual time=1448.82..1448.83 rows=21 loops=1)
    -> Index Scan using idx, idx, ..., idx, idx on test (cost=0.00..14373.67 rows=2527 width=12) (actual time=0.14..1437.33 rows=2048 loops=1)
Total runtime: 1454.62 msec

Since the index will output 'time' sorted data for each 'id', why isn't
a merge sort being used here? A merge sort would reduce the execution
time back to 30 ms.

-- 
Taral <[EMAIL PROTECTED]>
This message is digitally signed. Please PGP encrypt mail to me.
"Most parents have better things to do with their time than take care of
their children." -- Me
Re: [HACKERS] SQL99 ARRAY support proposal
Hannu Krosing <[EMAIL PROTECTED]> writes:
> Tom Lane kirjutas N, 13.03.2003 kell 19:12:
>>> The standard spelling for that appears to be
>>> somearray || ARRAY[element]
>>> which also has the nice property that it is commutative.
>>
>> Sure ... but that just means that || is the operator name for the
>> underlying array_push function.  We still need a way to declare this
>> operation as a function.

> I think he meant that you just need to concat for two arrays, no need
> for single-element push/append.

Oh, I see.  But my point remains: unless you want to take || out of the
domain of operators and make it something hard-wired into the parser,
there has to be an underlying function with a matching signature.  So
all these problems come up anyway.

> contrib/intarray has the following functions (note that they use + for
> || above)

The reason that stuff is still contrib, and not mainstream, is we
didn't have a way to make the functions polymorphic.
One-datatype-at-a-time interface functions are not appealing,
especially not when they have to be hand-coded in C.  But with the
features discussed in this thread, we could make the intarray
functionality datatype-independent --- whereupon I for one would vote
to move it into the mainstream.

			regards, tom lane
Re: [HACKERS] SQL99 ARRAY support proposal
Tom Lane kirjutas N, 13.03.2003 kell 19:12:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
>> OK, let's look at these more closely:
>
>>> array_push(anyarray, anyelement) returns anyarray
>
>> The standard spelling for that appears to be
>> somearray || ARRAY[element]
>> which also has the nice property that it is commutative.
>
> Sure ... but that just means that || is the operator name for the
> underlying array_push function.  We still need a way to declare this
> operation as a function.

I think he meant that you just need to concat for two arrays, no need
for single-element push/append.

OTOH a separate push may be more efficient.

contrib/intarray has the following functions (note that they use + for
|| above):

OPERATIONS:

  int[] && int[]     - overlap - returns TRUE if arrays have at least
                       one common element
  int[] @ int[]      - contains - returns TRUE if left array contains
                       right array
  int[] ~ int[]      - contained - returns TRUE if left array is
                       contained in right array
  # int[]            - return the number of elements in array
  int[] + int        - push element to array (add to end of array)
  int[] + int[]      - merge of arrays (right array added to the end of
                       left one)
  int[] - int        - remove entries matched by right argument from
                       array
  int[] - int[]      - remove right array from left
  int[] | int        - returns intarray - union of arguments
  int[] | int[]      - returns intarray as a union of two arrays
  int[] & int[]      - returns intersection of arrays
  int[] @@ query_int - returns TRUE if array satisfies query (like
                       '1&(2|3)')
  query_int ~~ int[] - -/-

-
Hannu
[HACKERS] Upgrading the backend's error-message infrastructure
(Or, protocol upgrade phase 1...)

After digging through our many past discussions of what to do with
error messages, I have put together the following first-cut proposal.
Fire at will...

Objective
---------

The basic objective here is to divide error reports into multiple
fields, and in particular to include an "error code" field that gives
applications a stable value to test against when they're trying to
find out what went wrong.  (I am not spending much space in this
proposal on the question of exactly what set of error codes we ought
to have, but that comes soon.)  Peter Eisentraut argued cogently awhile
back that the error codes ought not be hard-wired to specific error
message texts, so this proposal treats them as separate entities.

Wire-protocol changes
---------------------

Error and Notice (maybe also Notify?) msgs will have this structure:

	E x string \0 x string \0 x string \0 \0

where the x's are single-character field identifiers.  A frontend
should simply ignore any unrecognized fields.  Initially defined
fields for Error and Notice are:

S	Severity --- the string is "ERROR", "FATAL", or "PANIC" (if E msg)
	or "WARNING", "NOTICE", "DEBUG", "INFO", or "LOG" (if N msg).
	(Should this string be localizable?  Probably, assuming that the
	E/N distinction is all the client library really cares about.)
C	Code --- SQLSTATE code for error (a 5-character string per SQL
	spec).  Not localizable.
M	Message --- the string is the primary error message (localized).
D	Detail --- secondary error message, carrying more detail about
	the problem (localized).
H	Hint --- a suggestion what to do about the error (localized).
P	Position --- the string is a decimal ASCII integer, indicating
	an error cursor position as an index into the original query
	string.  First character is index 1.  Q: measure index in bytes,
	or characters?  Latter seems preferable considering that an
	encoding conversion may have occurred.
F File --- file name of source-code location where error was reported (__FILE__) L Line # --- line number of source-code location (__LINE__) R Routine --- source code routine name reporting error (__func__ or __FUNCTION__) S,C,M fields will always appear (at least in Error messages; perhaps Notices might omit C?). The rest are optional. Why three textual message fields? 'M' should always appear, 'D' and 'H' are optional (and relatively rare). The convention is that the primary 'M' message should be accurate but terse (normally one line); if more info is needed than can reasonably fit on a line, use the detail message to carry additional lines. A "hint" is something that doesn't directly describe the error, but is a suggestion what to do to get around it. 'M' and 'D' should be factual, whereas 'H' may contain some guesswork, or advice that might not always apply. Client interfaces are expected to report 'M', but might suppress 'D' and/or 'H' depending on factors such as screen space. (Preferably they should have a verbose mode that shows all available info, though.) Error codes --- The SQL spec defines a set of 5-character status codes (called SQLSTATE values). We'll use these as the language-independent identifiers for error conditions. There is code space reserved by the spec for implementation-defined error conditions, which we'll surely need. Per spec, each of the five characters in a SQLSTATE code must be a digit '0'-'9' or an upper-case Latin letter 'A'-'Z'. So it's possible to fit a SQLSTATE code into a 32-bit integer with some simple encoding conventions. I propose that we use such a representation in the backend; that is, instead of passing around strings like "1200D" we pass around integers formed like ((('1' - '0') << 6) + '2' - '0') << 6 ... 
This should save a useful amount of space per elog call site, and it won't obscure the code noticeably since all the common values will be represented as macro names anyway, something like #define ERRCODE_DIVISION_BY_ZERO MAKE_SQLSTATE('2','2', '0','1','2') We need to do some legwork to figure out what set of implementation-defined error codes we want. It might make sense to look and see what other DBMSes are using. Backend source-code representation for extended error messages -- How do we generalize the elog() interface to cope with all this stuff? I don't think I want a function with a fixed parameter list --- some sort of open-ended API would be a lot more forward-looking. After some fooling around I've come up with the following proposal. A typical elog() call might be replaced by ereport(ERROR, ERRCODE_INTERNAL, errmsg("Big trouble with table %s", name), errhint("Bail out now, boss")); ERROR is the severity level, same as before, and ERRCO
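The 6-bits-per-character packing proposed above is easy to sketch. The following Python is an illustration of the encoding idea as described in the message, not of any actual backend macro:

```python
def make_sqlstate(s):
    """Pack a 5-character SQLSTATE (chars '0'-'9', 'A'-'Z') into one
    integer, six bits per character, as sketched in the proposal."""
    assert len(s) == 5
    code = 0
    for ch in s:
        code = (code << 6) | (ord(ch) - ord('0'))
    return code

def unpack_sqlstate(code):
    """Recover the 5-character SQLSTATE string from the packed integer."""
    chars = []
    for _ in range(5):
        chars.append(chr((code & 0x3F) + ord('0')))
        code >>= 6
    return ''.join(reversed(chars))
```

Since 'Z' - '0' is 42, each character fits in 6 bits, and five characters use only 30 bits, comfortably inside a 32-bit integer.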
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Tom Lane wrote:
> Barry Lind <[EMAIL PROTECTED]> writes:
>>> AFAICS the only context where this could make sense is binary
>>> transmission of parameters for a previously-prepared statement.  We do
>>> have all the pieces for that on the roadmap.
>> Actually it is the select of binary data that I was referring to.  Are
>> you suggesting that the over the wire format for bytea in a query
>> result will be binary (instead of the ascii encoded text format as it
>> currently exists)?
> See binary cursors ...

Generally that is not an option. It either requires users to code to postgresql-specific sql syntax, or requires the driver to do it magically for them. The latter runs into all the issues that I raised on cursor support.

In general the jdbc driver is expected to execute arbitrary sql statements any application might want to send it. The driver is handicapped because it doesn't really know anything about that sql statement (other than that it is a select vs. an update or delete). Specifically it doesn't know what tables or columns that SQL will access or how many rows a select will return. All of this knowledge is in the backend, and short of implementing a full sql parser in java this knowledge will never exist in the front end. Many of the things I put on my wish list for the protocol stem from this.

Where there are two ways to do something (use cursors or not, use prepared statements or not, use binary cursors or not) the driver either needs to a) choose one way and always use it, b) infer from the sql statement which way will be better, or c) require the user to tell us. The problem with a) is that it may not always be the correct choice. The problem with b) is that generally this isn't possible, and the problem with c) is that it requires the user to write code that isn't portable across different databases. I would like to simply do a) in all cases. But that means that one of the two options should always (or almost always) be the best choice.
So in the case of "use cursors or not", it would be nice if using cursors added little or no overhead such that they could always be used. In the case of "use prepared statements vs. not", it would be nice if prepared statements added little or no overhead so that they could always be used. And finally in the case of "use binary or regular cursors" it would be nice if binary cursors could always be used.

The Oracle SQLNet protocol supports most of this. Though it has been a few years since I worked with it, the Oracle protocol has many of the features I am looking for (and perhaps the reason I am looking for them is that I have seen them used there before). Essentially the Oracle protocol lets you do the following operations: open, parse, describe, bind, execute, fetch, close. A request from the client to the server specifies which operations it wants to perform on a sql statement. So a client could request all seven operations (which is essentially what the current postgres protocol does today). Or it could issue an open,parse call, which is essentially the same thing as the PREPARE sql statement, followed by a describe,bind,execute,fetch, which is similar to the EXECUTE and FETCH sql statements, and finally a close, which is similar to the CLOSE and DEALLOCATE sql. The describe request is generally done only once even though you may do multiple fetches (unlike today's protocol, which includes the describe information on every fetch, even if you are fetching one row at a time).

The Oracle approach gives the client complete flexibility to do a lot, without requiring that the client start parsing sql statements and doing things like appending DECLARE CURSOR or FETCH in order to reformat the application's sql statement into the postgresql way of doing things.

--Barry

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
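The batched-operation style Barry describes might be pictured as a set of operation flags the client ORs together per round trip. The following Python is purely illustrative; the flag names and request shape are invented for this sketch and are not Oracle's actual wire format:

```python
# Invented flag values for the seven operations named above.
OPEN, PARSE, DESCRIBE, BIND, EXECUTE, FETCH, CLOSE = (1 << i for i in range(7))

def request(ops, **fields):
    """One round trip: the client says which operations to perform."""
    return {"ops": ops, **fields}

# A PREPARE-like round trip:
prep = request(OPEN | PARSE, sql="select id from foo where x = ?")

# A subsequent execute: describe once, bind, execute, and fetch the
# first batch of rows -- all in a single round trip.
run = request(DESCRIBE | BIND | EXECUTE | FETCH, params=[42], rows=100)

# Later fetches need not repeat the describe step:
more = request(FETCH, rows=100)
```

The point of the batching is visible in the last request: once the result set has been described, subsequent fetches carry no redundant metadata.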
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Couldn't it be done optionally, so the clients that want the info pay the price and those that don't want it get the speed and lower bandwidth? Just a thought andrew - Original Message - From: "Tom Lane" <[EMAIL PROTECTED]> > "Zeugswetter Andreas SB SD" <[EMAIL PROTECTED]> writes: > > Also doesn't the planner/executor already have all needed info available ? > > Not directly, and not necessarily in the form that the client would want > it in (eg, converting type OID to type name isn't free). I don't care > to load either the backend or the protocol down with the responsibility > for offering every piece of column data that a client could possibly > want as part of RowDescription. > > Besides, elsewhere in this thread we were hearing about how > RowDescription is already too much overhead for some people ;-) > > To my mind, the argument in favor of this feature is essentially that > it saves ODBC/JDBC from needing to duplicate the backend's SQL parser; > which is a legitimate concern. But that doesn't translate to saying > that we should push functionality out of the clients and into the > backend when it wouldn't be in the backend otherwise. That's just > moving code around on the basis of some rather-shaky arguments about > performance. And what happens when your client wants something > different from the exact functionality that was pushed to the backend? > You're back to square one. > > regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] SQL99 ARRAY support proposal
On Thu, 2003-03-13 at 12:53, Joe Conway wrote:
> Peter Eisentraut wrote:
> >> array_pop(anyarray) returns anyelement
> > That appears to mean that you return somearray[0] and alter the array
> > as a side effect.  How do you plan to do that?
> I'll give you this one -- doesn't make sense.
>
> >> split(text, text) returns text[] - split string into array on delimiter
> There was a thread on this last September, and the consensus was that
> the function *should* be named split, in order to be consistent with
> the similar function existing in Perl and PHP (at least).
>
> >> implode(text[], text) returns text - join array elements into a
> >> string using given string delimiter
> I'm open to opinions on implode() -- I only picked implode() because
> that's what it is called in PHP.  Any suggestions?

I think implode() and explode() go together. split() and join() are a pair. Pick one ;)

-- Rod Taylor <[EMAIL PROTECTED]>
PGP Key: http://www.rbt.ca/rbtpub.asc

signature.asc
Description: This is a digitally signed message part
Re: [HACKERS] Roadmap for FE/BE protocol redesign
> -----Original Message-----
> From: Zeugswetter Andreas SB SD [mailto:[EMAIL PROTECTED]
> Sent: 13 March 2003 17:07
> To: Hiroshi Inoue; Dave Page
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: RE: [HACKERS] Roadmap for FE/BE protocol redesign
>
> If this were at the protocol level, the client could flag
> what info is wanted, and thus avoid all unwanted overhead.

But the client usually won't know at that point. For example, an ODBC app executes a query and reads the tuples returned. After that has occurred, the app calls SQLDescribeCol to describe a column in the resultset...

Regards, Dave.

smime.p7s
Description: S/MIME cryptographic signature
Re: [HACKERS] Problems with win32 patch/build
I wrote:
> I am working my way through getting the postgres win32 port (7.2.1)
> compiled under MSVC. I got through the patch and the compilation, but
> am having problems with the linker. Trying to link the backend, I get
> 43 linker errors like:
>
>    unresolved external symbol GUC_scanstr
>    unresolved external symbol ProcessConfigFile
>    unresolved external symbol num_columns_read
>    unresolved external symbol Int_yychar
>
> These symbols are listed in the postgres.def file. Removing them
> removes the errors. If I do so, I get a new set of errors:
>
> Linking...
>    Creating library ..\..\Debug/postgres.lib and object ..\..\Debug/postgres.exp
> postmaster.obj : error LNK2001: unresolved external symbol _ProcessConfigFile
> bootstrap.lib(bootstrap.obj) : error LNK2001: unresolved external symbol _ProcessConfigFile
> tcop.lib(postgres.obj) : error LNK2001: unresolved external symbol _ProcessConfigFile
> bootstrap.lib(bootstrap.obj) : error LNK2001: unresolved external symbol _Int_yyparse
> ..\..\Debug\backend.dll : fatal error LNK1120: 2 unresolved externals
> Error executing link.exe
>
> Also, when trying to compile ecpg, I get
>    flex: could not create '.\preproc\pgc.c'
>
> There were some weird things that went on during my first compile run
> that I fixed as I went and I'm afraid I broke something. Any
> suggestions?
>
> If I can get through this I'll start hitting patch #2 and start running
> tests versus 7.3.x source.

I think I figured it out. lex has a problem with the guc_file.l. I should be able to tackle that; sorry to pester.

Merlin

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Roadmap for FE/BE protocol redesign
> -----Original Message-----
> From: Hiroshi Inoue [mailto:[EMAIL PROTECTED]
> Sent: 13 March 2003 10:04
> To: Dave Page
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] Roadmap for FE/BE protocol redesign
>
> Dave Page wrote:
> >
> > It's rumoured that Hiroshi Inoue once said:
> > >
> > > Does looking up by the catalog keys take no cost ?
> >
> > Obviously there is cost, but doing a lookup only on demand has got to
> > be cheaper in the long run than including the entire column definition
> > in the message whether it's wanted or not?
>
> So if there are 100 fields, should we ask the backend
> the column name 100 times ?

No, the column name is there already. This is useful for additional info such as uniqueness, nullability, default value, constraints and so on. Surely you don't want to get the entire relevant bits of the system catalogues with every query unless we specifically request it?

Regards, Dave.

smime.p7s
Description: S/MIME cryptographic signature
[HACKERS] Problems with win32 patch/build
I am working my way through getting the postgres win32 port (7.2.1) compiled under MSVC. I got through the patch and the compilation, but am having problems with the linker. Trying to link the backend, I get 43 linker errors like:

   unresolved external symbol GUC_scanstr
   unresolved external symbol ProcessConfigFile
   unresolved external symbol num_columns_read
   unresolved external symbol Int_yychar

These symbols are listed in the postgres.def file. Removing them removes the errors. If I do so, I get a new set of errors:

Linking...
   Creating library ..\..\Debug/postgres.lib and object ..\..\Debug/postgres.exp
postmaster.obj : error LNK2001: unresolved external symbol _ProcessConfigFile
bootstrap.lib(bootstrap.obj) : error LNK2001: unresolved external symbol _ProcessConfigFile
tcop.lib(postgres.obj) : error LNK2001: unresolved external symbol _ProcessConfigFile
bootstrap.lib(bootstrap.obj) : error LNK2001: unresolved external symbol _Int_yyparse
..\..\Debug\backend.dll : fatal error LNK1120: 2 unresolved externals
Error executing link.exe

Also, when trying to compile ecpg, I get

   flex: could not create '.\preproc\pgc.c'

There were some weird things that went on during my first compile run that I fixed as I went and I'm afraid I broke something. Any suggestions?

If I can get through this I'll start hitting patch #2 and start running tests versus 7.3.x source.

Merlin

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] SQL99 ARRAY support proposal
Alvaro Herrera wrote: Yeah, and join is also consistent with the inverse function being called "split". IIRC the equivalent function in PHP is explode(). Actually it looks like PHP supports both explode() and split(), and their inverse functions implode() and join(). split() appears to split the string by regular expression, whereas explode() splits by a simple string separator. I was not intending to support regex in split(), so maybe the best choice of names is explode() and implode()? Joe ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
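For comparison, here is how the two splitting behaviors under discussion (and the join inverse) look in Python; the distinction Joe draws is a literal-string separator versus a regular-expression pattern:

```python
import re

# explode()-style: split on a literal delimiter string.
assert "a.b.c".split(".") == ["a", "b", "c"]

# split()-style in the regex sense: split on a pattern.
assert re.split(r"[,;]\s*", "a, b;c") == ["a", "b", "c"]

# implode()/join(): the inverse, concatenating with a delimiter.
assert ",".join(["a", "b", "c"]) == "a,b,c"
```

The literal-delimiter form avoids any surprises when the separator contains characters that are special in a regex (".", "|", and so on), which is presumably why the proposed SQL split() would not support regexes.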
Re: [HACKERS] SQL99 ARRAY support proposal
johnn wrote: On Thu, Mar 13, 2003 at 09:53:15AM -0800, Joe Conway wrote: implode(text[], text) returns text - join array elements into a string using given string delimiter In both Perl and Python, that type of function is called "join". Hmmm -- I doubt that would fly, although I see it is specifically allowed as a function name (func_name_keyword list). Anyone have opinions on this either way? Joe ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] SQL99 ARRAY support proposal
On Thu, Mar 13, 2003 at 12:00:46PM -0600, johnn wrote: > On Thu, Mar 13, 2003 at 09:53:15AM -0800, Joe Conway wrote: > > implode(text[], text) returns text - join array elements into a > > string using given string delimiter > > > > I'm open to opinions on implode() -- I only picked implode() because > > that's what it is called in PHP. Any suggestions? > > In both Perl and Python, that type of function is called "join". Yeah, and join is also consistent with the inverse function being called "split". IIRC the equivalent function in PHP is explode(). -- Alvaro Herrera () Si no sabes adonde vas, es muy probable que acabes en otra parte. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] SQL99 ARRAY support proposal
On Thu, Mar 13, 2003 at 09:53:15AM -0800, Joe Conway wrote: > implode(text[], text) returns text - join array elements into a > string using given string delimiter > > I'm open to opinions on implode() -- I only picked implode() because > that's what it is called in PHP. Any suggestions? In both Perl and Python, that type of function is called "join". -john ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] SQL99 ARRAY support proposal
Peter Eisentraut wrote:
>> array_pop(anyarray) returns anyelement
> That appears to mean that you return somearray[0] and alter the array
> as a side effect.  How do you plan to do that?

I'll give you this one -- doesn't make sense.

>> split(text, text) returns text[] - split string into array on delimiter

There was a thread on this last September, and the consensus was that the function *should* be named split, in order to be consistent with the similar function existing in Perl and PHP (at least).

>> implode(text[], text) returns text - join array elements into a
>> string using given string delimiter

I'm open to opinions on implode() -- I only picked implode() because that's what it is called in PHP. Any suggestions?

Joe

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Barry Lind <[EMAIL PROTECTED]> writes: >> AFAICS the only context where this could make sense is binary >> transmission of parameters for a previously-prepared statement. We do >> have all the pieces for that on the roadmap. >> > Actually it is the select of binary data that I was refering to. Are > you suggesting that the over the wire format for bytea in a query result > will be binary (instead of the ascii encoded text format as it currently > exists)? See binary cursors ... regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Roadmap for FE/BE protocol redesign
"Zeugswetter Andreas SB SD" <[EMAIL PROTECTED]> writes: > Also doesn't the planner/executor already have all needed info available ? Not directly, and not necessarily in the form that the client would want it in (eg, converting type OID to type name isn't free). I don't care to load either the backend or the protocol down with the responsibility for offering every piece of column data that a client could possibly want as part of RowDescription. Besides, elsewhere in this thread we were hearing about how RowDescription is already too much overhead for some people ;-) To my mind, the argument in favor of this feature is essentially that it saves ODBC/JDBC from needing to duplicate the backend's SQL parser; which is a legitimate concern. But that doesn't translate to saying that we should push functionality out of the clients and into the backend when it wouldn't be in the backend otherwise. That's just moving code around on the basis of some rather-shaky arguments about performance. And what happens when your client wants something different from the exact functionality that was pushed to the backend? You're back to square one. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] SQL99 ARRAY support proposal
Peter Eisentraut <[EMAIL PROTECTED]> writes: > OK, let's look at these more closely: >> array_push(anyarray, anyelement) returns anyarray > The standard spelling for that appears to be > somearray || ARRAY[element] > which also has the nice property that it is commutative. Sure ... but that just means that || is the operator name for the underlying array_push function. We still need a way to declare this operation as a function. >> array_pop(anyarray) returns anyelement > That appears to mean that you return somearray[0] and alter the array as a > side effect. How do you plan to do that? Yeah, I wasn't thinking very clearly there... >> array_subscript(anyarray, int) yields anyelement > That's just somearray[x], no? Yes. But the fact that we can now represent the semantics of [] as a function seems to me to indicate that we're on the right track in terms of generalizing the capabilities of functions. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] [SQL] What's wrong with this group by clause?
Manfred Koizar <[EMAIL PROTECTED]> writes:
> ISTM you have found a Postgres 7.3 bug.

Yeah. Actually, the planner bug has been there a long time, but it was only latent until the parser stopped suppressing duplicate GROUP BY items:

2002-08-18 14:46  tgl

	* src/backend/parser/parse_clause.c: Remove optimization whereby
	parser would make only one sort-list entry when two equal()
	targetlist items were to be added to an ORDER BY or DISTINCT list.
	Although indeed this would make sorting fractionally faster by
	sometimes saving a comparison, it confuses the heck out of later
	stages of processing, because it makes it look like the user wrote
	DISTINCT ON rather than DISTINCT.  Bug reported by [EMAIL PROTECTED]

7.3 patch is attached if you need it.

			regards, tom lane

*** src/backend/optimizer/plan/planner.c.orig	Wed Mar  5 13:38:26 2003
--- src/backend/optimizer/plan/planner.c	Thu Mar 13 11:21:16 2003
*** 1498,1510
  		 * are just dummies with no extra execution cost.)
  		 */
  		List	   *sort_tlist = new_unsorted_tlist(subplan->targetlist);
  		int			keyno = 0;
  		List	   *gl;

  		foreach(gl, groupClause)
  		{
  			GroupClause *grpcl = (GroupClause *) lfirst(gl);
! 			TargetEntry *te = nth(grpColIdx[keyno] - 1, sort_tlist);
  			Resdom	   *resdom = te->resdom;

  			/*
--- 1498,1511
  		 * are just dummies with no extra execution cost.)
  		 */
  		List	   *sort_tlist = new_unsorted_tlist(subplan->targetlist);
+ 		int			grpno = 0;
  		int			keyno = 0;
  		List	   *gl;

  		foreach(gl, groupClause)
  		{
  			GroupClause *grpcl = (GroupClause *) lfirst(gl);
! 			TargetEntry *te = nth(grpColIdx[grpno] - 1, sort_tlist);
  			Resdom	   *resdom = te->resdom;

  			/*
*** 1518,1523
--- 1519,1525
  			resdom->reskey = ++keyno;
  			resdom->reskeyop = grpcl->sortop;
  		}
+ 		grpno++;
  	}
  	Assert(keyno > 0);

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: [HACKERS] SQL99 ARRAY support proposal
OK, let's look at these more closely: > >> array_push(anyarray, anyelement) returns anyarray The standard spelling for that appears to be somearray || ARRAY[element] which also has the nice property that it is commutative. > >> array_pop(anyarray) returns anyelement That appears to mean that you return somearray[0] and alter the array as a side effect. How do you plan to do that? > >> array_subscript(anyarray, int) yields anyelement That's just somearray[x], no? > >> singleton_array(anyelement) returns anyarray That's ARRAY[element]. > >> split(text, text) returns text[] > >> - split string into array on delimiter > >> implode(text[], text) returns text > >> - join array elements into a string using given string delimiter I can live with these, but perhaps we should choose some not-so-generic names. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] [SQL] What's wrong with this group by clause?
> > On Tue, 4 Nov 2003 18:28:12 -0300, Franco Bruno Borghesi > <[EMAIL PROTECTED]> wrote: > >Below you can find a simplified example of a real case. > >I don't understand why I'm getting the "john" record twice. > > ISTM you have found a Postgres 7.3 bug. > > I get one john with > PostgreSQL 7.1.3 on i686-pc-cygwin, compiled by GCC 2.95.3-5 > and > PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.7.2.1 > > but two johns with > PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.7.2.1 > > >/*EXAMPLE*/ > >CREATE TABLE people > >( > > name TEXT > >); > >INSERT INTO people VALUES ('john'); > >INSERT INTO people VALUES ('john'); > >INSERT INTO people VALUES ('pete'); > >INSERT INTO people VALUES ('pete'); > >INSERT INTO people VALUES ('ernest'); > >INSERT INTO people VALUES ('john'); > > > >SELECT > > 0 AS field1, > > 0 AS field2, > > name > >FROM > > people > >GROUP BY > > field1, > > field2, > > name; > > > > field1 | field2 | name > >++ > > 0 | 0 | john > > 0 | 0 | pete > > 0 | 0 | ernest > > 0 | 0 | john > >(4 rows) > PostgreSQL 7.2.3 on hppa-hp-hpux10.20, compiled by GCC 2.95.2 SELECT 0 AS field1, 0 AS field2,name FROM people GROUP BY field1, field2, name; field1 | field2 | name ++ 0 | 0 | ernest 0 | 0 | john 0 | 0 | pete (3 rows) PostgreSQL 7.3.2 on hppa-hp-hpux10.20, compiled by GCC 2.95.2 SELECT 0 AS field1, 0 AS field2,name FROM people GROUP BY field1, field2, name; field1 | field2 | name ++ 0 | 0 | john 0 | 0 | pete 0 | 0 | john 0 | 0 | pete 0 | 0 | john 0 | 0 | ernest (6 rows) I doubt this is a bug in 7.3.2 but in prior versions. I've cross-checked how another DBMS (HP's ALLBASE) handles GROUP BY without an aggregate, and it acts like 7.3.2. Regards, Christoph ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Tom Lane wrote:
> Barry Lind <[EMAIL PROTECTED]> writes:
>> One addition I would personally like to see (it comes up in my apps
>> code) is the ability to detect whether the server is big endian or
>> little endian. When using binary cursors this is necessary in order to
>> read int data.
> Actually, my hope is to eliminate that business entirely by
> standardizing the on-the-wire representation for binary data; note the
> reference to send/receive routines in the original message.  For
> integer data this is simple enough: network byte order will be it.
> I'm not sure yet what to do about float data.

Great.

>> 2) Better support for domains. Currently the jdbc driver is broken
>> with regards to domains (although no one has reported this yet). The
>> driver will treat a datatype that is a domain as an unknown/unsupported
>> datatype. It would be great if the T response included the 'base'
>> datatype for a domain attribute so that the driver would know what
>> parsing routines to call to convert to/from the text representation the
>> backend expects.
> I'm unconvinced that we need do this in the protocol, as opposed to
> letting the client figure it out with metadata inquiries.  If we
> should, I'd be inclined to just replace the typeid field with the base
> typeid, and not mention the domain to the frontend at all.  Comments?

I don't have a strong opinion on this one. I can live with current functionality. It isn't too much work to look up the base type.

>> So I would request the ability of the client to set a max rows
>> parameter for query results. If a query were to return more than the
>> max number of rows, the client would be given a handle (essentially a
>> cursor name) that it could use to fetch additional sets of rows.
> How about simply erroring out if the query returns more than X rows?

This shouldn't be an error condition. I want to fetch all of the rows, I just don't want to have to buffer them all in memory. Consider the following example.
Select statement #1 is 'select id from foo', statement #2 is 'update bar set x = y where foo_id = ?'. The program logic issues statement #1 and then starts iterating through the results and then issues statement #2 for some of those results. If statement #1 returns a large number of rows, the program can run out of memory if all the rows from #1 need to be buffered in memory. What would be nice is if the protocol allowed getting some rows from #1, but not all, so that the connection could be used to issue some #2 statements.

>> 4) Protocol level support of PREPARE. In jdbc and most other
>> interfaces, there is support for parameterized SQL. If you want to
>> take advantage of the performance benefits of reusing parsed plans you
>> have to use the PREPARE SQL statement.
> This argument seems self-contradictory to me.  There is no such benefit
> unless you're going to re-use the statement many times.  Nor do I see
> how pushing PREPARE down to the protocol level will create any
> improvement in its performance.

There is a benefit if you do reuse the statement multiple times. The performance problem is the two round trips minimum to the server that are required. A protocol solution to this would be to allow the client to send multiple requests at one time to the server. But as I type that I realize that can already be done, by having multiple semicolon-separated SQL commands sent at once. So I probably have everything I need for this already. I can just queue up the 'deallocate' calls and piggyback them onto the next real call to the server.

>> So what I would like to see is the ability for the client to set a MAX
>> VALUE size parameter. The server would send up to this amount of data
>> for any column. If the value was longer than MAX VALUE, the server
>> would respond with a handle that the client could use to get the rest
>> of the value (in chunks of MAX VALUE) if it wanted to.
> I don't think I want to embed this in the protocol, either; especially
> not when we don't have even the beginnings of backend support for it.
> I think such a feature should be implemented and proven as callable
> functions first, and then we could think about pushing it down into the
> protocol.

That is fine.

>> 6) Better over the wire support for bytea. The current encoding of
>> binary data \000 results in a significant expansion in the size of
>> data transmitted. It would be nice if bytea data didn't result in 2 or
>> 3 times data expansion.
> AFAICS the only context where this could make sense is binary
> transmission of parameters for a previously-prepared statement.  We do
> have all the pieces for that on the roadmap.

Actually it is the select of binary data that I was referring to. Are you suggesting that the over the wire format for bytea in a query result will be binary (instead of the ascii encoded text format as it currently exists)?

>			regards, tom lane

I am looking forward to all of the protocol changes.

thanks,
--Barry

---(end of broadcast)---
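The partial-fetch behavior Barry is asking for -- stream a large result in batches instead of buffering every row in client memory -- is essentially what a generator over repeated FETCH calls gives a driver. This is an illustrative Python sketch; the fetch callable stands in for one cursor round trip:

```python
def rows_in_batches(fetch, batch_size=100):
    """Yield rows one at a time, pulling them from the server in
    batches; memory use is bounded by batch_size, not by the total
    result size."""
    while True:
        batch = fetch(batch_size)   # e.g. one FETCH <n> round trip
        if not batch:               # empty batch means the cursor is done
            return
        for row in batch:
            yield row
```

Between batches the connection is free, so the driver could interleave the update statements from Barry's example while iterating.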
Re: [HACKERS] Request for quote looking to pay for work
Thought I would change the title to see if anyone would respond? It wasn't clear but the intent was to support the following in the backend. On Tue, 2003-03-11 at 18:20, Dave Cramer wrote: > I have a customer porting an application from informix to postgres. They > require 2 things: > > 1) Cursors outside of transactions. > 2) For update cursors as well as where current of > > If anyone is interested in this work, please reply off list. -- Dave Cramer <[EMAIL PROTECTED]> Cramer Consulting ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Barry Lind wrote:
> 3) Protocol level support for CURSORs. It would be nice if cursor
> support was done at the protocol level and not as a SQL command.

I want to second this proposal. Currently I avoid using cursors in my programs since

a) they need much more logic and _string_concatenation_ to be handled
   transparently by a library (prepend the query with DECLARE X CURSOR
   FOR, then FETCH n FROM X, then CLOSE X). That's inefficient.
b) I have a really bad feeling about having the backend parse (FETCH
   FROM X) every time I ask for a (single) row
c) I hate that the backend retransmits column names etc. for every fetch
   I issue. This information is mostly unneeded but the backend cannot
   know better

Of course these issues can be addressed by using FETCH n (n>10), but this kludge is only needed because the FETCH protocol is so inefficient. Think about the number of bytes transferred for "select 2000 lines of integers" with and without declare/fetch/close.

Imagine a result set of 1 to 2 integers given back (depending on parameters) for an interactive program (e.g. browsing a customer list by initials). Prefer a cursor (much more constant overhead even for single results) or all in one (and wait longer for a first result)? I'd love to tell the backend to give back a "descriptor" for this query and use it efficiently to get data and/or metadata (see ODBC, JDBC, sqlda or dynamic sql). Perhaps it's most efficient to ask for N initial results (which are instantly returned).

Christof (who implemented dynamic sql for ecpg)

PS: perhaps this protocol infrastructure is also well suited to return large bytea values (return a descriptor). [Also proposed by Barry Lind.]

PPS: I'm perfectly fine with returning attrelid/attnum. Then the client can control how much effort is spent determining only the asked-for metadata.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Tom Lane <[EMAIL PROTECTED]> writes:
> Barry Lind <[EMAIL PROTECTED]> writes:
>
> > 4) Protocol level support of PREPARE.  In jdbc and most other
> > interfaces, there is support for parameterized SQL.  If you want to
> > take advantage of the performance benefits of reusing parsed plans
> > you have to use the PREPARE SQL statement.
>
> This argument seems self-contradictory to me.  There is no such
> benefit unless you're going to re-use the statement many times.  Nor
> do I see how pushing PREPARE down to the protocol level will create
> any improvement in its performance.

"you're going to re-use the statement many times" is true (or should be
true) for every statement on every web site and other OLTP system.
Even if the query appears on only a single web page and is executed
only once per page view, the nature of high-volume web sites is that
that page will be executed hundreds or thousands of times per minute.

This is why the Perl DBI, for example, has prepare_cached(), which
provides automatic caching of prepared handles.  With Oracle I was able
to use this exclusively on a large, high-volume web site to keep
thousands of prepared handles.  Every query was prepared only once per
Apache process.

There is a performance benefit to using placeholders and prepared
queries in that the plan doesn't need to be regenerated repeatedly.
Ideally, every query should either be a big DSS query, where the time
spent in the optimizer is irrelevant, or an OLTP transaction using
placeholders, where again the time spent in the optimizer is irrelevant
because it only needs to be run once.

This would allow the optimizer to grow in complexity.  For example, it
could explore both sides of the decision tree in places where now we
have heuristics to pick the probably better plan.  Postgres's optimizer
is pretty impressive currently, but the constant attention to avoiding
high-cost optimizations limits it.

There is also a security benefit.
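A prepare_cached()-style handle cache of the kind described above can
be sketched in a few lines.  This is a hypothetical driver wrapper for
illustration, not DBI's actual implementation; the point is only that
each distinct SQL string pays the parse/plan cost once per process:

```python
class Connection:
    """Hypothetical connection: prepare() stands in for the expensive
    server-side parse + plan step we want to pay only once per
    distinct SQL string."""

    def __init__(self):
        self.prepare_count = 0
        self._cache = {}

    def prepare(self, sql):
        # Expensive: parse + plan on the server.
        self.prepare_count += 1
        return ("handle", sql)

    def prepare_cached(self, sql):
        # Cheap after the first call: reuse the prepared handle, so a
        # query on a hot web page is planned once per process, not
        # once per page view.
        if sql not in self._cache:
            self._cache[sql] = self.prepare(sql)
        return self._cache[sql]


conn = Connection()
h1 = conn.prepare_cached("SELECT name FROM people WHERE id = ?")
h2 = conn.prepare_cached("SELECT name FROM people WHERE id = ?")
assert h1 is h2 and conn.prepare_count == 1
```

With protocol-level PREPARE, the cache key could map straight to a
named server-side statement instead of a client-side handle.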
The idea of mixing parameters into the queries, even at the driver
level, gives me the willies.  The database then has to parse them back
out of the query string.  If there's a bug in the driver, or any kind
of mismatch between the backend parser and the driver's quoting, then
there could be security holes.

--
greg
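A minimal sketch of that quoting-mismatch hazard, using a deliberately
buggy, hypothetical driver (not any real driver's code): the driver
doubles single quotes but ignores backslash escapes, so text that the
driver considers safely quoted can still break out of the string
literal when the backend treats backslash as an escape character.

```python
def naive_quote(value):
    # Hypothetical buggy driver quoting: doubles single quotes, but
    # forgets that the backend may also treat backslash as an escape
    # character inside string literals.
    return "'" + value.replace("'", "''") + "'"

def interpolate(template, value):
    # Driver-level parameter mixing: the parameter is spliced into the
    # SQL text, and the backend must parse it back out.
    return template.replace("?", naive_quote(value))

sql = interpolate("SELECT * FROM users WHERE name = ?", "o'brien")
# Harmless input round-trips fine:
#   SELECT * FROM users WHERE name = 'o''brien'

evil = interpolate("SELECT * FROM users WHERE name = ?",
                   "\\'; DROP TABLE users; --")
# Produces: SELECT * FROM users WHERE name = '\''; DROP TABLE users; --'
# A backslash-escaping backend reads '\'' as a complete string literal,
# leaving the attacker's statement outside it -- exactly the
# driver/backend mismatch described above.  Protocol-level parameters
# never re-enter the parser, so this mismatch cannot arise.
```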
Re: [HACKERS] [SQL] What's wrong with this group by clause?
[forwarding to -hackers]

On Tue, 4 Nov 2003 18:28:12 -0300, Franco Bruno Borghesi
<[EMAIL PROTECTED]> wrote:
> Below you can find a simplified example of a real case.
> I don't understand why I'm getting the "john" record twice.

ISTM you have found a Postgres 7.3 bug.  I get one john with
	PostgreSQL 7.1.3 on i686-pc-cygwin, compiled by GCC 2.95.3-5
and
	PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.7.2.1
but two johns with
	PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.7.2.1

> /* EXAMPLE */
> CREATE TABLE people
> (
>   name TEXT
> );
> INSERT INTO people VALUES ('john');
> INSERT INTO people VALUES ('john');
> INSERT INTO people VALUES ('pete');
> INSERT INTO people VALUES ('pete');
> INSERT INTO people VALUES ('ernest');
> INSERT INTO people VALUES ('john');
>
> SELECT
>     0 AS field1,
>     0 AS field2,
>     name
> FROM
>     people
> GROUP BY
>     field1,
>     field2,
>     name;
>
>  field1 | field2 | name
> --------+--------+--------
>       0 |      0 | john
>       0 |      0 | pete
>       0 |      0 | ernest
>       0 |      0 | john
> (4 rows)

Same for

SELECT 0 AS field1, 0 AS field2, name
FROM people
GROUP BY 1, 2, name;

Servus
 Manfred
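For reference, the expected result has only three rows.  A quick
Python simulation of the GROUP BY over the example data (this
illustrates the correct behaviour, not the 7.3 bug):

```python
# The six inserted rows from the example above.
rows = ["john", "john", "pete", "pete", "ernest", "john"]

# GROUP BY (field1, field2, name): both extra fields are the constant
# 0, so the grouping key is effectively just the name.
groups = {}
for name in rows:
    key = (0, 0, name)
    groups[key] = groups.get(key, 0) + 1

result = sorted(groups)  # three distinct groups, not four
assert len(result) == 3
assert {name for (_, _, name) in result} == {"john", "pete", "ernest"}
```

Any plan that emits "john" twice has split one group, which is what
the 7.3.2 output above shows.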
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Hiroshi Inoue wrote on Thu, 13.03.2003 at 12:03:
> Dave Page wrote:
> >
> > > Does looking up by the catalog keys take no cost ?
> >
> > Obviously there is cost, but doing a lookup only on demand has got
> > to be cheaper in the long run than including the entire column
> > definition in the message whether it's wanted or not?
>
> So if there are 100 fields, should we ask the backend the column name
> 100 times ?

Or once for all columns, if you prefer using IN.

> regards,
> Hiroshi Inoue
> http://www.geocities.jp/inocchichichi/psqlodbc/
Re: [HACKERS] Roadmap for FE/BE protocol redesign
> > Obviously there is cost, but doing a lookup only on demand has got
> > to be cheaper in the long run than including the entire column
> > definition in the message whether it's wanted or not?
>
> So if there are 100 fields, should we ask the backend the column name
> 100 times ?

No, you do a single select that returns 100 results...  Then you cache
them in your frontend...

Chris
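The single lookup Chris describes can be sketched as one query against
pg_attribute, keyed by the (attrelid, attnum) pairs the protocol would
return.  The helper below is illustrative (pg_attribute, attrelid,
attnum, attname, and atttypmod are real catalog names; the function
itself is hypothetical):

```python
def catalog_query(cols):
    """Build one SELECT that fetches metadata for every result column
    at once, given the (attrelid, attnum) keys from the wire protocol.
    With several tables involved, the two IN lists may over-fetch a
    few rows, which is harmless for a frontend cache."""
    relids = sorted({relid for relid, _ in cols})
    attnums = sorted({attnum for _, attnum in cols})
    return (
        "SELECT attrelid, attnum, attname, atttypmod "
        "FROM pg_attribute "
        "WHERE attrelid IN (%s) AND attnum IN (%s)"
        % (", ".join(map(str, relids)), ", ".join(map(str, attnums)))
    )

# 100 columns of one table -> still a single round trip; the frontend
# caches the rows so later result sets need no lookup at all.
q = catalog_query([(16384, n) for n in range(1, 101)])
```

(16384 is just an example OID; a real frontend would use the attrelid
values it received.)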
Re: [HACKERS] Roadmap for FE/BE protocol redesign
Dave Page wrote:
>
> It's rumoured that Hiroshi Inoue once said:
> > Tom Lane wrote:
> >>
> >> "Dave Page" <[EMAIL PROTECTED]> writes:
> >> > No, but with them we can avoid cluttering the wire protocol with
> >> > fields for all this, and the JDBC required data.  With 2 numeric
> >> > columns (attrelid, attnum), any application/interface can query
> >> > the system catalogs easily for whatever extra info they like.
> >>
> >> This is my feeling also.  We shouldn't try to guess in the
> >> protocol exactly what set of information will be wanted by a
> >> frontend; we should just provide the catalog keys needed to look
> >> up whatever is wanted.
> >
> > Does looking up by the catalog keys take no cost ?
>
> Obviously there is cost, but doing a lookup only on demand has got to
> be cheaper in the long run than including the entire column
> definition in the message whether it's wanted or not?

So if there are 100 fields, should we ask the backend the column name
100 times ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/
Re: [HACKERS] Roadmap for FE/BE protocol redesign
It's rumoured that Hiroshi Inoue once said:
> Tom Lane wrote:
>>
>> "Dave Page" <[EMAIL PROTECTED]> writes:
>> > No, but with them we can avoid cluttering the wire protocol with
>> > fields for all this, and the JDBC required data.  With 2 numeric
>> > columns (attrelid, attnum), any application/interface can query
>> > the system catalogs easily for whatever extra info they like.
>>
>> This is my feeling also.  We shouldn't try to guess in the protocol
>> exactly what set of information will be wanted by a frontend; we
>> should just provide the catalog keys needed to look up whatever is
>> wanted.
>
> Does looking up by the catalog keys take no cost ?

Obviously there is cost, but doing a lookup only on demand has got to
be cheaper in the long run than including the entire column definition
in the message whether it's wanted or not?

Regards, Dave.
Re: [HACKERS] bug in setval?
It's rumoured that Christopher Kings-Lynne once said:
> When I create a new table with a serial column, the first row
> defaults to inserting '1'.
>
> If I delete all the rows from the table and want to reset the
> sequence, I can't:
>
> ERROR:  users_health_types_type_id_seq.setval: value 0 is out of
> bounds (1,9223372036854775807)
>
> How do I set the sequence to have next value = 1?  Surely the bounds
> should begin at zero?

That's bugged me for ages as well.  I just never got round to asking
about it...

Regards, Dave.
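For what it's worth, the usual answer (assuming the three-argument
form of setval(), which takes an is_called flag) is to set the
sequence to its minimum with is_called = false, so the next nextval()
returns 1 rather than 2.  The toy model below is an illustration of
those semantics, not PostgreSQL's implementation:

```python
class Sequence:
    """Toy model of a PostgreSQL sequence, showing why
    SELECT setval('seq_name', 1, false);
    is the way to restart at 1: with is_called = false, the next
    nextval() returns last_value itself instead of last_value + 1."""

    def __init__(self, min_value=1):
        self.min_value = min_value
        self.last_value = min_value
        self.is_called = False

    def setval(self, value, is_called=True):
        if value < self.min_value:
            # The error from the original post: 0 is below the bound.
            raise ValueError("value %d is out of bounds" % value)
        self.last_value = value
        self.is_called = is_called

    def nextval(self):
        if self.is_called:
            self.last_value += 1
        self.is_called = True
        return self.last_value


seq = Sequence()
seq.setval(1, is_called=False)   # instead of the out-of-bounds setval(0)
assert seq.nextval() == 1        # sequence restarts at 1
assert seq.nextval() == 2
```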