Re: [HACKERS] Reducing data type space usage
Gregory Stark writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> There isn't if you want the type to also handle long strings.
>> But what if we restrict it to short strings?  See my message just now.
>
> Then it seems like it imposes a pretty hefty burden on the user.

But there are a lot of places where it wins:

 - single byte for a multi-state flag
 - hex representation of a hash (like SHA-1)
 - part numbers
 - lots of fields imported from legacy systems
 - ZIP/Postal codes

And for all of those you can decisively say at design time that 127
characters is an OK limit.

+1 for Bruce/Tom's idea.

Regards,
Paul Bort

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

       http://www.postgresql.org/docs/faq
Re: [HACKERS] The enormous s->childXids problem
Theo Schlossnagle <[EMAIL PROTECTED]> writes:
> I've tracked down the problem I mentioned earlier with my 4.5 million
> node linked list of s->childXids.  We use plperl to connect to Oracle
> over DBI.  The select function is dbi-link's remote_select.
> remote_select will perform the query and then for each row call
> return_next, which calls the SPI.xs stuff to do plperl_return_next,
> which is wrapped in a PG_TRY block.  I see the value of the try block to
> kick back sensible errors to perl, but creating childXids for every row
> of a setof seems wildly excessive.  What's the harm in simply not
> TRY'ing around there?

PG_TRY alone just sets up a longjmp handler.  It is often used in
conjunction with subtransactions, though, and I wouldn't be surprised if
that were happening here -- but that's not quite what's going on in this
spot.  If you look in plperl.c you'll see a number of places that do call
BeginInternalSubTransaction (and each one has a comment above it that
mentions "sub-transaction").  They use PG_TRY blocks to recover control so
they can abort the subtransaction and throw a perl error.

However, plperl_return_next is one of the few cases that *doesn't*.  I'm
not sure exactly by what logic it gets an exemption; I suppose the idea is
that there aren't very many SQL errors return_next can actually trigger.

Anyway, perhaps you're also calling one of the other functions like
plperl_spi_{exec,query,fetchrow,prepare,exec_prepared,query_prepared}?
I wouldn't expect so, given that you're actually doing Oracle queries.

> I ask with respect to its suitability as a general solution and its
> suitability for my acute issue (of a 5 million row setof returned from
> that).  Will it break anything?

If you don't have a subtransaction then you can't really recover from any
error.  There will be locks left over, memory allocated that isn't freed,
etc.  PG_TRY will recover control, but you're pretty much stuck rethrowing
the error eventually.  In other words, any errors will require rolling
back the whole transaction.

I'm also not sure what happens to perl when you longjmp out of the perl
interpreter instead of finishing execution normally.  Subtransactions and
PG_TRY are both relatively new, and Postgres has had plperl for a lot
longer, so I imagine there's a way to get it to work.  I don't remember
how it used to work, though.  Perhaps it set a flag and returned from the
perl interpreter normally, then rethrew the error from outside the
interpreter?  Or perhaps perl is OK with you longjmping out from inside it.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
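To make the scale of the problem concrete, here is a toy model -- plain Python, not PostgreSQL internals, with made-up class and method names -- of why wrapping each return_next row in its own subtransaction is expensive: every committed subtransaction's XID stays linked on the parent's childXids list until the top-level transaction ends.

```python
# Toy model of s->childXids growth: NOT PostgreSQL code, just an
# illustration of the linear memory cost described above.
class TopTransaction:
    def __init__(self):
        self.next_xid = 1
        self.child_xids = []      # stand-in for the s->childXids list

    def run_subtransaction(self):
        # A subtransaction gets a new XID; on commit that XID is kept on
        # the parent's child list (it cannot be discarded until the
        # top-level transaction commits or aborts).
        xid = self.next_xid
        self.next_xid += 1
        self.child_xids.append(xid)

top = TopTransaction()
rows_returned = 100_000           # one subtransaction per return_next row
for _ in range(rows_returned):
    top.run_subtransaction()

# The child list grows linearly with the number of rows returned, so a
# 4.5M-row set-returning function leaves a 4.5M-node list behind.
print(len(top.child_xids))
```

Omitting the per-row subtransaction (or batching rows under one subtransaction) is what keeps this list from growing with the result set.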
Re: [HACKERS] polite request about syntax
On Fri, 15 Sep 2006, Dave Page wrote:
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Ricardo Malafaia
>> Sent: 15 September 2006 16:35
>> To: Andrew Dunstan
>> Cc: pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] polite request about syntax
>>
>> my complaint is that, like i said, "timestamp with time zone" is no
>> good substitute for a simple "datetime".  Here, someone suggested a
>> CREATE DOMAIN to create an alias for it.  Why isn't it provided there
>> out-of-the-box by default?  So you have the SQL standard timestamp and
>> the industry standard datetime.
>
> Because adding everybody's idea of industry-standard typenames, function
> names, operators etc will lead to bloated system catalogs and insanity
> for people trying to understand what differences between objects there
> may or may not be.
>
> We follow the SQL standards.  If you need to provide compatibility types
> and functions to migrate from another product, then unlike many others
> we provide the capability for you to add them yourself.

I hate to comment on what is shaping up to be a bit of a tinderbox of a
thread, but I can't help myself.

When I was first dealing with postgres, I found it extremely annoying
that I had to type out "double precision" rather than just "double",
since every sane programming language (as well as Java) uses double.  I
eventually figured out that it was because double precision is the
standard name, but I don't like to type, and although I know I could use
float8, I am used to typing double.  I have found the same thing with the
type "timestamp without time zone".  The verbosity of type names seems
rather extreme.  But it is just not important enough to warrant me
creating a domain or doing anything about it; it just slightly irks me
every time I have to type them.

I have probably now written more on this than it deserves :)

-- 
Fertility is hereditary.  If your parents didn't have any children,
neither will you.
[HACKERS] The enormous s->childXids problem
I've tracked down the problem I mentioned earlier with my 4.5 million
node linked list of s->childXids.  We use plperl to connect to Oracle
over DBI.  The select function is dbi-link's remote_select.
remote_select will perform the query and then for each row call
return_next, which calls the SPI.xs stuff to do plperl_return_next,
which is wrapped in a PG_TRY block.  I see the value of the try block to
kick back sensible errors to perl, but creating childXids for every row
of a setof seems wildly excessive.  What's the harm in simply not
TRY'ing around there?

I ask with respect to its suitability as a general solution and its
suitability for my acute issue (of a 5 million row setof returned from
that).  Will it break anything?

Best regards,

Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
Re: [HACKERS] Reducing data type space usage
Tom Lane <[EMAIL PROTECTED]> writes:
> Gregory Stark <[EMAIL PROTECTED]> writes:
>> Tom said he didn't think there was enough code space and my own
>> experimentation was slowly leading me to agree, sadly.
>
> There isn't if you want the type to also handle long strings.
> But what if we restrict it to short strings?  See my message just now.

Then it seems like it imposes a pretty hefty burden on the user.  text
columns, for example, can never take advantage of it.  And there are
plenty of instances where 127 bytes would be just short enough to be
annoying even though 99% of the data would in fact be shorter.  Things
like "address" and "product name", for example.  The user would have to
decide that he'll never need a value over 127 bytes long, ever, in order
to get the benefit.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] [COMMITTERS] pgsql: sslinfo contrib module - information
Tom Lane wrote:
> [EMAIL PROTECTED] (Peter Eisentraut) writes:
>> sslinfo contrib module - information about current SSL certificate
>> Author: Victor Wagner <[EMAIL PROTECTED]>
>
> It was premature to add this: Bruce is still trying to get a copyright
> assignment out of the author.

Test of wrapping of subject, please ignore.

-- 
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] [COMMITTERS] pgsql: sslinfo contrib module -
OK, I just emailed the long subject line to my gmail account, and got the
full text:

  Re: [COMMITTERS] pgsql: sslinfo contrib module - information about current SSL

Interestingly, I looked at a copy of the email I sent to start the
truncated subject thread and see in my saved mbox file:

  From bruce  Sun Sep  3 23:57:49 2006
  Subject: Re: [HACKERS] [COMMITTERS] pgsql: sslinfo contrib module -
      information about current SSL
  In-Reply-To: <[EMAIL PROTECTED]>
  To: Tom Lane <[EMAIL PROTECTED]>

Note the newline in the subject.  What confuses me is how that newline
got in there if replying to other email didn't have that problem.  One
thing I did do is to copy text from another email into that email.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Bruce Momjian wrote:
>> Peter Eisentraut wrote:
>>> Tom Lane wrote:
>>>> Fixed --- I noticed it about the same time you did.  I'm surprised
>>>> Peter didn't get a Makefile right the first time though ...
>>>
>>> I'm surprised how crazy the contrib makefiles got while I wasn't
>>> looking. :)  I was glad to get something working at all.  The
>>> uniformity isn't that great at the moment and I'm unsure about the
>>> purpose of all that pgxs stuff there, but I think we've had that
>>> discussion and I'm not interested in reopening it.
>>
>> Full reply to long subject line for Alvaro.  Note that the subject is
>> a single line, not multiple lines.  The subject as sent is:
>>
>>   Subject: Re: [COMMITTERS] pgsql: sslinfo contrib module - information
>>   about current SSL
>
> OK, as a reply from the lists I see:
>
>   Subject: Re: [HACKERS] [COMMITTERS] pgsql: sslinfo contrib module - information
>
> which has removed "about current SSL".  Interestingly, in the mbox file
> I see later emails with the even shorter subject:
>
>   Subject: Re: [COMMITTERS] pgsql: sslinfo contrib module -
>
> I do see that short subject thread starting with my post.  Would
> someone else post with that subject and see what we get?  How do I see
> the subject as sendmail is sending it out?

-- 
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] [COMMITTERS] pgsql: sslinfo contrib module - information
Tom Lane wrote:
> [EMAIL PROTECTED] (Peter Eisentraut) writes:
>> sslinfo contrib module - information about current SSL certificate
>> Author: Victor Wagner <[EMAIL PROTECTED]>
>
> It was premature to add this: Bruce is still trying to get a copyright
> assignment out of the author.

Test of wrapping of subject.  Ignore.

-- 
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] [COMMITTERS] pgsql: sslinfo contrib module -
Bruce Momjian wrote:
> Peter Eisentraut wrote:
>> Tom Lane wrote:
>>> Fixed --- I noticed it about the same time you did.  I'm surprised
>>> Peter didn't get a Makefile right the first time though ...
>>
>> I'm surprised how crazy the contrib makefiles got while I wasn't
>> looking. :)  I was glad to get something working at all.  The
>> uniformity isn't that great at the moment and I'm unsure about the
>> purpose of all that pgxs stuff there, but I think we've had that
>> discussion and I'm not interested in reopening it.
>
> Full reply to long subject line for Alvaro.  Note that the subject is a
> single line, not multiple lines.  The subject as sent is:
>
>   Subject: Re: [COMMITTERS] pgsql: sslinfo contrib module - information
>   about current SSL

OK, as a reply from the lists I see:

  Subject: Re: [HACKERS] [COMMITTERS] pgsql: sslinfo contrib module - information

which has removed "about current SSL".  Interestingly, in the mbox file I
see later emails with the even shorter subject:

  Subject: Re: [COMMITTERS] pgsql: sslinfo contrib module -

I do see that short subject thread starting with my post.  Would someone
else post with that subject and see what we get?  How do I see the
subject as sendmail is sending it out?

-- 
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] [COMMITTERS] pgsql: sslinfo contrib module - information
Peter Eisentraut wrote:
> Tom Lane wrote:
>> Fixed --- I noticed it about the same time you did.  I'm surprised
>> Peter didn't get a Makefile right the first time though ...
>
> I'm surprised how crazy the contrib makefiles got while I wasn't
> looking. :)  I was glad to get something working at all.  The
> uniformity isn't that great at the moment and I'm unsure about the
> purpose of all that pgxs stuff there, but I think we've had that
> discussion and I'm not interested in reopening it.

Full reply to long subject line for Alvaro.  Note that the subject is a
single line, not multiple lines.  The subject as sent is:

  Subject: Re: [COMMITTERS] pgsql: sslinfo contrib module - information
  about current SSL

-- 
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [SPAM?] Re: [HACKERS] Optimize ORDER BY ... LIMIT
On Fri, Sep 15, 2006 at 10:06:16PM +0100, Gregory Stark wrote:
>> I'm curious, as I may be such an offender.  What alternatives exist?
>> ...
>> What alternatives to limit/offset exist?  If there are thousands or
>> more results, I have trouble with an idea that the entire results
>> should be queried, and cached, displaying only a fraction.
>
> If you have a unique index and instead of using OFFSET you pass along
> the last key of the previous page, then you can use a WHERE clause on
> the indexed column to go straight to the correct page rather than using
> OFFSET.  So for example if you're displaying bank transactions sorted
> by transaction_id, you have the "next page" button pass along
> "start_transaction_id=nnn", where nnn is the last transaction_id of the
> previous page.  Then on the next page you do a query with "WHERE
> transaction_id > ?" and pass that column.  You still use ORDER BY
> transaction_id and LIMIT.

I found benefits to doing things this way that were not related to
performance.  If the number of items leading up to your page changes,
remembering the offset can result in listing a completely different page
than you intended when paging forward or backwards.  On my pages, I
prefer to define one of the items as the item I am looking at, and page
seeking is always +/- 1 page from that item.

This means that I am pretty close to what you are suggesting -- except,
because I do this for functional reasons and not for performance reasons,
I am doing something worse.  I use COUNT(*) and WHERE as you describe
above to map this identifier to an offset, and then a second SELECT with
LIMIT/OFFSET to describe the object and those that follow on the page.
According to your suggestion, I think this means I should track the
identifier with the last offset, displaying the offset to the user for
information purposes only, not using it for any queries, and then use
WHERE and LIMIT?

I tried this out.  EXPLAIN ANALYZE tells me that for a random offset=200,
limit=20 case I tried, the simple change takes it from index scanning 207
rows to find 7 rows, to index scanning 7 rows to find 7 rows.  Sweet.
Unfortunately, the time to complete is unchanged, around 1.3 +/- 0.2
milliseconds.  Looks like my system has bigger bottlenecks. :-)

Thanks,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them
all and in the darkness bind them...

                           http://mark.mielke.cc/
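The keyset technique Gregory describes can be sketched end to end.  This rough illustration uses Python's built-in sqlite3 rather than PostgreSQL, and the table and column names are invented, but the SQL pattern is the same: remember the last transaction_id of the previous page and seek past it with WHERE, instead of discarding rows with OFFSET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (transaction_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO tx VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 101)])

PAGE_SIZE = 20

def page_after(last_id):
    # Seek directly to the page via the index on transaction_id,
    # rather than scanning and throwing away OFFSET rows.
    return conn.execute(
        "SELECT transaction_id, amount FROM tx"
        " WHERE transaction_id > ?"
        " ORDER BY transaction_id LIMIT ?",
        (last_id, PAGE_SIZE),
    ).fetchall()

first = page_after(0)                  # page 1: ids 1..20
second = page_after(first[-1][0])      # page 2: ids 21..40, no OFFSET used
print(first[-1][0], second[0][0])
```

The trade-off mark raises still applies: you can display an offset or page number to the user for orientation, but the queries themselves only ever use the remembered key.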
Re: [HACKERS] Emacs local vars at the tail of every file
Peter Eisentraut wrote:
> Gregory Stark wrote:
>> I don't know if this changes the calculus but apparently we've already
>> decided to go down the route of having Emacs local variables attached
>> to every file in the source directory.  We have things like this there:
>
> I delete them from every file I edit, but I haven't been interested
> enough to remove them all.  Feel free to do the legwork.

Done.

-- 
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] log_duration is redundant, no?
Tom Lane wrote:
> "Guillaume Smet" <[EMAIL PROTECTED]> writes:
>> If we consider that the prepare and the bind operations are important
>> (and I agree they can be), I wonder why we remove the output we have
>> when log_min_duration_statement is set to 0 (I'm thinking of the
>> parse: and bind: lines)?
>
> Well, we remove it for the execute: too if you have only log_duration
> on.  My view of this is that log_duration is meant to find out the
> total amount of time spent doing stuff, and you set
> log_min_duration_statement to whatever your threshold of pain is for
> the amount of time spent doing a single thing.  If it's less than
> log_min_duration_statement then you're saying you don't care about the
> details of that individual step, only the aggregate runtime.

It might make sense to log _what_ is going on, without telling all the
little details, for example

LOG:  parse duration: 0.250 ms
LOG:  bind duration: 0.057 ms
LOG:  execute my_query: SELECT * FROM shop WHERE $1 = $2
DETAIL:  parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets'

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Reducing data type space usage
Gregory Stark wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> Oh, OK, I had high bit meaning no header
>
> Just how annoying would it be if I pointed out I suggested precisely
> this a few days ago?
>
> Tom said he didn't think there was enough code space and my own
> experimentation was slowly leading me to agree, sadly.  It would be
> neat to get it to work though.

I didn't see the 7-bit thing.  Guess I missed it.  I was worried we
would not have enough bits to track a 1-gig field.

-- 
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Reducing data type space usage
Gregory Stark <[EMAIL PROTECTED]> writes:
> Tom said he didn't think there was enough code space and my own
> experimentation was slowly leading me to agree, sadly.

There isn't if you want the type to also handle long strings.
But what if we restrict it to short strings?  See my message just now.

			regards, tom lane
Re: [HACKERS] Reducing data type space usage
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Oh, OK, I had high bit meaning no header, but clear is better, so
> 00000001 is 0x01, and <null> is "".  But I see now that bytea does
> store nulls, so yea, we would be better using 10000001, and it is the
> same size as <null>.

I'm liking this idea more the more I think about it, because it'd
actually be far less painful to put into the system structure than the
other idea of fooling with varlena headers.  To review: Bruce is
proposing a var-length type structure with the properties

	first byte 0xxxxxxx	field length 1 byte, exactly that value
	first byte 1xxxxxxx	xxxxxxx data bytes follow

This can support *any* stored value from zero to 127 bytes long.  We can
imagine creating new datatypes "short varchar" and "short char", and
then having the parser silently substitute these types for varchar(N) or
char(N) whenever N <= 127 / max_encoding_length.  Add some appropriate
implicit casts to convert these to the normal varlena types for
computation, and away you go.  No breakage of any existing
datatype-specific code, just a few additions in places like
heap_form_tuple.

			regards, tom lane
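As a sanity check of the header arithmetic, here is a small sketch in Python -- illustrative only, not the eventual C layout, and it assumes the 7-bit length field does not count the header byte itself, which the thread leaves open:

```python
def encode_short(data: bytes) -> bytes:
    # 0xxxxxxx: the stored byte IS the 1-byte datum (high bit clear).
    # 1xxxxxxx: 1-byte header whose low 7 bits give the data length.
    if len(data) == 1 and data[0] < 0x80:
        return data                          # headerless single byte
    if len(data) > 127:
        raise ValueError("short format holds at most 127 data bytes")
    return bytes([0x80 | len(data)]) + data

def decode_short(stored: bytes) -> bytes:
    if stored[0] < 0x80:                     # high bit clear: datum itself
        return stored[:1]
    n = stored[0] & 0x7F                     # high bit set: length header
    return stored[1:1 + n]

assert encode_short(b"A") == b"A"            # one ASCII char: 1 byte total
assert encode_short(b"") == b"\x80"          # empty string: header only
assert encode_short(b"\xc3") == b"\x81\xc3"  # high-bit byte needs a header
assert decode_short(encode_short(b"hello")) == b"hello"
```

Note how this also covers the N <= 127 / max_encoding_length substitution rule: with 4-byte UTF-8, for instance, only varchar(N) for N <= 31 is guaranteed to fit in 127 bytes.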
Re: [HACKERS] Reducing data type space usage
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Oh, OK, I had high bit meaning no header

Just how annoying would it be if I pointed out I suggested precisely this
a few days ago?

Tom said he didn't think there was enough code space and my own
experimentation was slowly leading me to agree, sadly.  It would be neat
to get it to work though.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] Reducing data type space usage
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> No, it'll be a 1-byte header with length indicating that no bytes
>>> follow,
>>
>> Well, in my idea, 10000001 would be 0x01.  I was going to use the
>> remaining 7 bits for the 7-bit ascii value.
>
> Huh?  I thought you said 00000001 would be 0x01, that is, high bit
> clear means a single byte containing an ASCII character.  You could
> reverse that but it just seems to make things harder --- the byte
> isn't a correct data byte by itself, as it would be with the other
> convention.

Oh, OK, I had high bit meaning no header, but clear is better, so
00000001 is 0x01, and <null> is "".  But I see now that bytea does store
nulls, so yea, we would be better using 10000001, and it is the same
size as <null>.

-- 
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Reducing data type space usage
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> No, it'll be a 1-byte header with length indicating that no bytes
>> follow,
>
> Well, in my idea, 10000001 would be 0x01.  I was going to use the
> remaining 7 bits for the 7-bit ascii value.

Huh?  I thought you said 00000001 would be 0x01, that is, high bit clear
means a single byte containing an ASCII character.  You could reverse
that but it just seems to make things harder --- the byte isn't a
correct data byte by itself, as it would be with the other convention.

			regards, tom lane
Re: [HACKERS] Reducing data type space usage
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> FYI, we also need to figure out how to store a zero-length string.
>> That will probably be high-bit, and then all zero bits.  We don't
>> store a zero-byte in strings, so that should be unique for "".
>
> No, it'll be a 1-byte header with length indicating that no bytes
> follow, which likely will be 10000001 rather than 10000000 ... but
> in either case there is no ambiguity involved.

Well, in my idea, 10000001 would be 0x01.  I was going to use the
remaining 7 bits for the 7-bit ascii value.

-- 
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Reducing data type space usage
Bruce Momjian <[EMAIL PROTECTED]> writes:
> FYI, we also need to figure out how to store a zero-length string.
> That will probably be high-bit, and then all zero bits.  We don't store
> a zero-byte in strings, so that should be unique for "".

No, it'll be a 1-byte header with length indicating that no bytes follow,
which likely will be 10000001 rather than 10000000 ... but in either case
there is no ambiguity involved.

			regards, tom lane
Re: [HACKERS] Reducing data type space usage
Gregory Stark wrote:
> Case 2) Data types that are different sizes depending on the typmod but
>    are always the same size, which can be determined statically for a
>    given typmod.  In the case of an ASCII encoded database CHAR(n) fits
>    this category, and in any case we'll eventually have per-column
>    encoding.  NUMERIC(a,b) could also be made to fit this as well.
>
>    In cases like these we don't need *any* varlena header, if we could
>    arrange for the functions to have enough information to know how
>    large the data must be.

I thought about the CHAR(1) case some more.  Rather than restrict
single-byte storage to ASCII-encoded databases, I think there is a more
general solution.  First, I don't think any solution that assumes typmod
will be around to help determine the meaning of the column is going to
work.

I think what will work is to store a 1-character, 7-bit ASCII value in
one byte, by setting the high bit.  This will work for any database
encoding.  This is the zero-length-header case.  If the single character
has its high bit set, it will require a one-byte length header and then
the high-bit byte, and if it is multi-byte, perhaps more bytes.
Zero-length header will even work for a VARCHAR(8) field that stores one
7-bit ASCII character, because it isn't relying on the typmod.

FYI, we also need to figure out how to store a zero-length string.  That
will probably be high-bit, and then all zero bits.  We don't store a
zero-byte in strings, so that should be unique for "".

-- 
Bruce Momjian  [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] log_duration is redundant, no?
"Guillaume Smet" <[EMAIL PROTECTED]> writes:
> If we consider that the prepare and the bind operations are important
> (and I agree they can be), I wonder why we remove the output we have
> when log_min_duration_statement is set to 0 (I'm thinking of the
> parse: and bind: lines)?

Well, we remove it for the execute: too if you have only log_duration on.
My view of this is that log_duration is meant to find out the total
amount of time spent doing stuff, and you set log_min_duration_statement
to whatever your threshold of pain is for the amount of time spent doing
a single thing.  If it's less than log_min_duration_statement then you're
saying you don't care about the details of that individual step, only the
aggregate runtime.

log_statement has another goal entirely, which is to record *what* is
being done in a semantic sense, and so for that I think it makes sense to
log only actual executions (and not parse/bind leading up to 'em).

The only asymmetry in the thing is that if log_statement fired then we
suppress duplicate printing of the query in the later duration log
message (if any) for that query.  But that seems like the right thing if
you're at all concerned about log volume.

			regards, tom lane
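The division of labor Tom describes maps directly onto postgresql.conf settings; the threshold value below is purely illustrative:

```
log_duration = on                  # aggregate: how much time each step takes
log_min_duration_statement = 250   # pain threshold in ms: log statement text
                                   # only for steps slower than this
log_statement = 'none'             # semantic what-is-running logging, off here
```

With this combination, fast steps contribute only duration lines, while anything slower than the threshold is logged with its full statement text.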
Re: [HACKERS] question regarding regression tests
Thank you.

On Fri, 2006-09-15 at 17:41 -0400, Alvaro Herrera wrote:
> Gevik Babakhani wrote:
>> Folks,
>>
>> Could someone please provide information about how to create a correct
>> regression test?
>
> Some information you can find in the PGXS docs:
>
>     The scripts listed in the REGRESS variable are used for regression
>     testing of your module, just like "make installcheck" is used for
>     the main PostgreSQL server.  For this to work you need to have a
>     subdirectory named sql/ in your extension's directory, within which
>     you put one file for each group of tests you want to run.  The
>     files should have extension .sql, which should not be included in
>     the REGRESS list in the makefile.  For each test there should be a
>     file containing the expected result in a subdirectory named
>     expected/, with extension .out.  The tests are run by executing
>     "make installcheck", and the resulting output will be compared to
>     the expected files.  The differences will be written to the file
>     regression.diffs in "diff -c" format.  Note that trying to run a
>     test which is missing the expected file will be reported as
>     trouble, so make sure you have all expected files.
>
> Note that not all of this applies to the backend regression tests, but
> it should help.  For the backend, you list the tests in the
> serial_schedule and parallel_schedule files.
Re: [HACKERS] Reducing data type space usage
Gregory Stark wrote:
> Case 2)
>
> Solving this is quite difficult without introducing major performance
> problems or security holes.  The one approach we have that's practical
> right now is introducing special data types such as the oft-mentioned
> "char" data type.  "char" doesn't have quite the right semantics to use
> as a transparent substitute for CHAR, but we could define a CHAR(1)
> with exactly the right semantics and substitute it transparently in
> parser/analyze.c (btw having two files named analyze.c is pretty
> annoying).  We could do the same with NUMERIC(a,b) for sufficiently
> small values of a and b, with something like D'Arcy's CASH data type
> (which uses an integer internally).

Didn't we discuss a problem with using CHAR(n), specifically that the
number of bytes required to store n characters is variable?  I had
suggested making an ascii1 type, ascii2 type, etc.  Someone else seemed
to be saying that these should be called bytea1, bytea2, or perhaps with
the parenthesis bytea(1), bytea(2).  The point being that it is a fixed
number of bytes.

> The problem with defining lots of data types is that the number of
> casts and cross-data-type comparisons grows quadratically as the number
> of data types grows.  In theory we would save space by defining a
> CHAR(n) for whatever size n the user needs, but I can't really see
> anything other than CHAR(1) being worthwhile.  Similarly a 4-byte
> NUMERIC substitute like CASH (with full NUMERIC semantics though) and
> maybe a 2-byte and 8-byte substitute might be reasonable, but anything
> else would be pointless.

Wouldn't a 4-byte numeric be a "float4" and an 8-byte numeric be a
"float8"?  I'm not sure I see the difference.  As for a 2-byte floating
point number, I like the idea and will look for an IEEE specification
for how the bits are arranged, if any such IEEE spec exists.

mark
Re: [HACKERS] log_duration is redundant, no?
On 9/16/06, Tom Lane <[EMAIL PROTECTED]> wrote: Well, considering that the parse and bind may take longer than the execute, I hardly think we want to ignore them for log_duration purposes. And we agreed that if log_duration is on and log_min_duration_statement is not triggered, log_duration should print *only* duration. So I'm not sure what else you expected. I don't know exactly what I expected. I'm just surprised to have only the duration when log_statement is set to all. If we consider that the prepare and the bind operations are important (and I agree they can be), I wonder why we remove the output we have when log_min_duration_statement is set to 0 (I'm thinking of the parse: and bind: lines)? (sorry for the double post, I forgot to cc: the list) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] question regarding regression tests
Gevik Babakhani wrote: > Folks, > > Could someone please provide information about how to create a correct > regression test? You can find some information in the PGXS docs: The scripts listed in the REGRESS variable are used for regression testing of your module, just like make installcheck is used for the main PostgreSQL server. For this to work you need to have a subdirectory named sql/ in your extension's directory, within which you put one file for each group of tests you want to run. The files should have extension .sql, which should not be included in the REGRESS list in the makefile. For each test there should be a file containing the expected result in a subdirectory named expected/, with extension .out. The tests are run by executing make installcheck, and the resulting output will be compared to the expected files. The differences will be written to the file regression.diffs in diff -c format. Note that trying to run a test which is missing the expected file will be reported as trouble, so make sure you have all expected files. Note that not all of this applies to the backend regression tests, but it should help. For the backend, you list the tests in the serial_schedule and parallel_schedule files. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 6: explain analyze is your friend
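As a concrete illustration of the layout Alvaro describes, a minimal PGXS makefile might look like the sketch below. The module name myext and test name basic are invented for the example; this is a configuration sketch, not a tested build file.

```makefile
# Hypothetical extension makefile. Tests live in sql/, expected output
# in expected/, and REGRESS lists the test names without the .sql suffix.
MODULES = myext
REGRESS = basic          # "make installcheck" runs sql/basic.sql and
                         # diffs its output against expected/basic.out
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```

Any mismatch ends up in regression.diffs in diff -c format, as described above.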
Re: [HACKERS] log_duration is redundant, no?
"Guillaume Smet" <[EMAIL PROTECTED]> writes: > Is it normal that when I set log_duration to on and log_statement to > all, I have the following output when I prepare/bind/execute a > prepared statement using the protocol: > LOG: duration: 0.250 ms > LOG: duration: 0.057 ms > LOG: execute my_query: SELECT * FROM shop WHERE $1 = $2 > DETAIL: parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets' > I suppose the first line is the prepare and the second line is the > bind but I'm not sure it's the desired behaviour. Well, considering that the parse and bind may take longer than the execute, I hardly think we want to ignore them for log_duration purposes. And we agreed that if log_duration is on and log_min_duration_statement is not triggered, log_duration should print *only* duration. So I'm not sure what else you expected. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] log_duration is redundant, no?
On 9/8/06, Tom Lane <[EMAIL PROTECTED]> wrote: It's done already ... (Working on implementing the last changes you made in formatting in pgFouine) Is it normal that when I set log_duration to on and log_statement to all, I have the following output when I prepare/bind/execute a prepared statement using the protocol: LOG: duration: 0.250 ms LOG: duration: 0.057 ms LOG: execute my_query: SELECT * FROM shop WHERE $1 = $2 DETAIL: parameters: $1 = 'Clothes Clothes Clothes', $2 = 'Joe''s Widgets' I suppose the first line is the prepare and the second line is the bind but I'm not sure it's the desired behaviour. Any comment? -- Guillaume ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [COMMITTERS] pgsql: Sequences were not being shown due to the use of lowercase 's'
Bruce Momjian <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> What became of my objection that the test should be on USAGE privilege >> for the containing schema instead? > Was this addressed? Yes, we arrived at this: http://archives.postgresql.org/pgsql-committers/2006-09/msg00252.php which does what Greg wanted but without the kluges. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
"Rocco Altier" <[EMAIL PROTECTED]> writes: > With the patch attached this time... The proposed patch to Makefile.shlib makes me gag :-( ... lying to make about what's the purpose of a rule is seldom a good idea. Please try as attached instead. Also, I am *really* dubious about the change to ecpg/test/Makefile.regress --- if that's necessary then this whole exercise is wrong. regards, tom lane

*** src/Makefile.shlib.orig	Thu Apr 27 22:53:20 2006
--- src/Makefile.shlib	Fri Sep 15 18:11:30 2006
***************
*** 96,103 ****
  soname = lib$(NAME)$(DLSUFFIX).$(SO_MAJOR_VERSION)

  ifeq ($(PORTNAME), aix)
! shlib = lib$(NAME)$(DLSUFFIX)
! # SHLIB_LINK+= -lc
  endif

  ifeq ($(PORTNAME), darwin)
--- 96,103 ----
  soname = lib$(NAME)$(DLSUFFIX).$(SO_MAJOR_VERSION)

  ifeq ($(PORTNAME), aix)
! shlib = lib$(NAME)$(DLSUFFIX).$(SO_MAJOR_VERSION)
! haslibarule = yes
  endif

  ifeq ($(PORTNAME), darwin)
***************
*** 295,303 ****
  else # PORTNAME == aix

  # AIX case
! $(shlib): lib$(NAME).a
  	$(MKLDEXPORT) lib$(NAME).a > lib$(NAME)$(EXPSUFF)
! 	$(COMPILER) $(LDFLAGS_NO_L) $(LDFLAGS_SL) -o $@ $< -Wl,-bE:lib$(NAME)$(EXPSUFF) $(SHLIB_LINK)

  endif # PORTNAME == aix
--- 295,307 ----
  else # PORTNAME == aix

  # AIX case
! $(shlib) lib$(NAME).a: $(OBJS)
! 	$(LINK.static) lib$(NAME).a $^
! 	$(RANLIB) lib$(NAME).a
  	$(MKLDEXPORT) lib$(NAME).a > lib$(NAME)$(EXPSUFF)
! 	$(COMPILER) $(LDFLAGS_NO_L) $(LDFLAGS_SL) -o $(shlib) lib$(NAME).a -Wl,-bE:lib$(NAME)$(EXPSUFF) $(SHLIB_LINK)
! 	rm -f lib$(NAME).a
! 	$(AR) $(AROPT) lib$(NAME).a $(shlib)

  endif # PORTNAME == aix
***************
*** 350,355 ****
--- 354,360 ----
  ifeq ($(enable_shared), yes)
  install-lib-shared: $(shlib)
+ ifneq ($(PORTNAME), aix) # we don't install $(shlib) on AIX
  	$(INSTALL_SHLIB) $< '$(DESTDIR)$(libdir)/$(shlib)'
  ifneq ($(PORTNAME), cygwin)
  ifneq ($(PORTNAME), win32)
***************
*** 365,370 ****
--- 370,376 ----
  endif
  endif # not win32
  endif # not cygwin
+ endif # not aix

  endif # enable_shared

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [COMMITTERS] pgsql: Sequences were not being shown due to
Tom Lane wrote: > [EMAIL PROTECTED] (Bruce Momjian) writes: > > Sequences were not being shown due to the use of lowercase 's' instead > > of 'S', and the views were not checking for table visibility with > > regards to temporary tables and sequences. > > What became of my objection that the test should be on USAGE privilege > for the containing schema instead? Was this addressed? -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Optimize ORDER BY ... LIMIT
Alvaro Herrera <[EMAIL PROTECTED]> writes: > I don't know if this is the same thing you are talking about, but Oleg > talked to me on the conference about "partial sort", which AFAICS it's > about the same thing you are talking about. I think Teodor submitted a > patch to implement it, which was rejected because of not being general > enough. Oof, you have a long memory. Oleg does reference such a thing in his 2002 post that ended up resulting in the TODO item. I can't find the original patch but I doubt any patch against 7.1 is going to be all that helpful in understanding what to do today. I'm also confused how he only saw a factor of 6 improvement in reading the top 100 out of a million. I would expect much better. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Release notes
Alvaro Herrera wrote: > Bruce Momjian wrote: > > I have completed my first pass over the release notes and Tom has made > > some additions: > > > > http://momjian.postgresql.org/cgi-bin/pgrelease > > > > I will probably go over them again in a few hours, update them to > > current CVS, then move them into our SGML documentation by Monday. > > Oh, another typo: > > This changes the previous behavior where concatenation would adjust the > lower array dimmensions. > > It's "dimensions", single m. OK, fixed. I ran spellcheck again and didn't find anything new. > > Further below, it says: > > For example, '2006-05-24 21:11 Americas/New_York'::timestamptz. > > However, the example is invalid. The correct example should be > > For example, '2006-05-24 21:11 America/New_York'::timestamptz. Fixed. > Note these two entries: > > Add index information to /contrib/pgstattuple (ITAGAKI Takahiro) > > Add functions to /contrib/pgstattuple that show index statistics and > index page contents (Satoshi Nagayasu) > > They should probably be merged into one. Done. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] Update to msvc build sys
"Magnus Hagander" <[EMAIL PROTECTED]> writes: > Here's a patch that updates the msvc build system. It contains the > changes in Hiroshi-sans patch from about a week ago, so please apply > this patch instead to avoid conflicts. Changes summary: Applied, thanks. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Release notes
Bruce Momjian wrote: > I have completed my first pass over the release notes and Tom has made > some additions: > > http://momjian.postgresql.org/cgi-bin/pgrelease > > I will probably go over them again in a few hours, update them to > current CVS, then move them into our SGML documentation by Monday. Oh, another typo: This changes the previous behavior where concatenation would adjust the lower array dimmensions. It's "dimensions", single m. Further below, it says: For example, '2006-05-24 21:11 Americas/New_York'::timestamptz. However, the example is invalid. The correct example should be For example, '2006-05-24 21:11 America/New_York'::timestamptz. Note these two entries: Add index information to /contrib/pgstattuple (ITAGAKI Takahiro) Add functions to /contrib/pgstattuple that show index statistics and index page contents (Satoshi Nagayasu) They should probably be merged into one. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] pg_strcasecmp in fe-connect.c
"Magnus Hagander" <[EMAIL PROTECTED]> writes: > This patch fixes a couple of cases where we use strcasecmp() instead of > pg_strcasecmp() in fe_connect.c, coming from the LDAP client pathc. Applied. I found another instance in contrib/hstore, too. There are also some occurrences in pgbench.c, but I'm unsure that we need be paranoid about changing those. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] question regarding regression tests
Folks, Could someone please provide information about how to create a correct regression test? Regards, Gevik. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] LDAP function signature for MSVC
"Magnus Hagander" <[EMAIL PROTECTED]> writes: > This patch changes the function definition for ldap_start_tls_sA() on > win32 by removing the WINLDAPAPI. Applied. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] Tiny plpython fix
"Magnus Hagander" <[EMAIL PROTECTED]> writes: > Seems __vc_errcode was used during Visual C++ beta at some point, and is > now declared deprecated in the system headers. This patch renames our > use of it to __msvc_errcode, so we don't conflict anymore. If we need this change in plpython, why not also src/include/port/win32.h? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCHES] Include file in regress.c
"Magnus Hagander" <[EMAIL PROTECTED]> writes: > This patch adds a required include file to regress.c, required to get at > InvalidTransactionId. If that's needed, why isn't everybody else's build falling over too? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Release notes
Fixed. --- Guillaume Smet wrote: > On 9/15/06, Bruce Momjian <[EMAIL PROTECTED]> wrote: > > I have completed my first pass over the release notes and Tom has made > > some additions: > > > > http://momjian.postgresql.org/cgi-bin/pgrelease > > In Server changes, the first two lines are: > > # Improve performance of statistics monitoring, especially > pg_stat_activity (Tom) > # Improve performance of statistics monitoring (Tom) > > This is probably an error. > > -- > Guillaume > > ---(end of broadcast)--- > TIP 2: Don't 'kill -9' the postmaster -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] plpgsql, return can contains any expression
"Pavel Stehule" <[EMAIL PROTECTED]> writes: >> This patch doesn't seem to cope with cases where the supplied tuple has >> the wrong number of columns, and it doesn't look like it's being careful >> about dropped columns either. Also, that's a mighty bizarre-looking >> choice of cache memory context in coerce_to_tuple ... but then again, >> why are you bothering with a cache at all for temporary arrays? > I am sorry, Tom. But I don't understand. I can check number of columns, > of course, and I'll do it. What cache for temporary arrays do you mean? I thought that making coerce_to_tuple depend on estate->err_func was pretty bizarre, and that there was no need for any "cache" at all for arrays that need only live as long as the function runs. All you are saving here is a palloc/pfree cycle, which is not worth the obscurantism and risk of bugs (are you sure natts can never change?). No, the cache there is ugly. But I don't have an idea for a more effective implementation of it :-(. The first version of this patch was cleaner, and a little bit slower. This cache saves 10%. BTW, if you want this patch to make it into 8.2, it needs to be fixed and resubmitted *very* soon. I understand, but I am not able to work on it in the next four days. And I need help with it from Neil. It will be for 8.3. Thank you Pavel ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Optimize ORDER BY ... LIMIT
[EMAIL PROTECTED] writes: > On Fri, Sep 15, 2006 at 08:22:50PM +0100, Gregory Stark wrote: > > I'm curious, as I may be such an offender. What alternatives exist? > > I think you mean the concept of showing a page of information that > you can navigate forwards and backwards a page, or select a range. > > What alternatives to limit/offset exist? If there are thousands or > more results, I have trouble with an idea that the entire results > should be queried, and cached, displaying only a fraction. > > I think most or all of the times I do this, an index is available, > so perhaps that is why I don't find this issue problematic? If you have a unique index and instead of using OFFSET you pass along the last key of the previous page then you can use a WHERE clause on the indexed column to go straight to the correct page rather than using OFFSET. So for example if you're displaying bank transactions sorted by transaction_id you have the "next page" button pass along the "start_transaction_id=nnn" where nnn is the last transaction_id of the previous page. Then on the next page you do a query with "WHERE transaction_id > ?" and pass that column. You still use ORDER BY transaction_id and LIMIT. This has the upside that your query always takes the same amount of time. Using OFFSET means later pages take longer, possibly much longer, than earlier pages. Possibly so much longer that it causes enough i/o to bring down your web server etc. It does have lots of downsides as well. You cannot provide direct links to the later pages aside from the next page. It's difficult to provide a proper "previous page" button. etc. Early in the web's history these were reasonable trade-offs but nowadays it's hard to convince people that their $bazillion machine can't handle sorting a couple thousand records for each page view and they should sacrifice the user experience for that. So I've pretty much given up on that battle. > For implementation, I think something simple is best: [...] 
You just described using an insertion sort. Even if I went with insertion sort instead of heap sort I think it makes sense to do it inside tuplesort. > If X+Y tuples would take up too much memory, this plan should be > skipped, and the general routines used instead. And that's a big part of why... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
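Greg's "pass the last key instead of OFFSET" scheme from the message above can be sketched in a few lines. This uses an in-memory SQLite table via Python purely as a stand-in (the table and column names are invented), but the SQL shape is the same in PostgreSQL:

```python
import sqlite3

# Keyset pagination: each "next page" request carries the last
# transaction_id of the previous page instead of an OFFSET.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions"
             " (transaction_id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(i, i * 10) for i in range(1, 1001)])

PAGE_SIZE = 25

def next_page(last_seen_id):
    # Instead of OFFSET, restart the index scan just past the last key of
    # the previous page; the cost is the same for page 1 and page 40.
    cur = conn.execute(
        "SELECT transaction_id, amount FROM transactions"
        " WHERE transaction_id > ?"
        " ORDER BY transaction_id LIMIT ?",
        (last_seen_id, PAGE_SIZE))
    return cur.fetchall()

page1 = next_page(0)              # rows 1..25
page2 = next_page(page1[-1][0])   # rows 26..50, no OFFSET needed
print(page2[0][0], page2[-1][0])  # → 26 50
```

The downsides Greg lists still apply: you can only link to "next", and "previous" takes extra work (a reversed scan with `<`).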
Re: [HACKERS] Release notes
On 9/15/06, Bruce Momjian <[EMAIL PROTECTED]> wrote: I have completed my first pass over the release notes and Tom has made some additions: http://momjian.postgresql.org/cgi-bin/pgrelease In Server changes, the first two lines are: # Improve performance of statistics monitoring, especially pg_stat_activity (Tom) # Improve performance of statistics monitoring (Tom) This is probably an error. -- Guillaume ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: guc comment changes (was Re: [HACKERS] Getting a move on
OK, patch sent to the 8.3 hold queue for later work, open item removed. --- Tom Lane wrote: > Peter Eisentraut <[EMAIL PROTECTED]> writes: > > That does not mean that the patch is bad, and I certainly support the > > feature change. But I can't efficiently review the patch. If someone > > else wants to do it, go ahead. > > I've finally taken a close look at this patch, and I don't like it any > more than Peter does. The refactoring might or might not be good at its > core, but as presented it is horrid. As just one example, it replaces one > reasonably well-commented function with three misnamed, poorly commented > functions. In place of > > /* > ! * Sets option `name' to given value. The value should be a string > ! * which is going to be parsed and converted to the appropriate data > ! * type. The context and source parameters indicate in which context this > ! * function is being called so it can apply the access restrictions > ! * properly. > ! * > ! * If value is NULL, set the option to its default value. If the > ! * parameter changeVal is false then don't really set the option but do all > ! * the checks to see if it would work. > ! * > ! * If there is an error (non-existing option, invalid value) then an > ! * ereport(ERROR) is thrown *unless* this is called in a context where we > ! * don't want to ereport (currently, startup or SIGHUP config file reread). > ! * In that case we write a suitable error message via ereport(DEBUG) and > ! * return false. This is working around the deficiencies in the ereport > ! * mechanism, so don't blame me. In all other cases, the function > ! * returns true, including cases where the input is valid but we chose > ! * not to apply it because of context or source-priority considerations. > ! * > ! * See also SetConfigOption for an external interface. >*/ > ! bool > ! set_config_option(const char *name, const char *value, > ! GucContext context, GucSource source, > ! bool isLocal, bool changeVal) > > we find > > /* > ! 
* Try to parse value. Determine what is type and call related > ! * parsing function or if newval is equal to NULL, reset value > ! * to default or bootval. If the value parsed okay return true, > ! * else false. >*/ > ! static bool > ! parse_value(int elevel, const struct config_generic *record, > ! const char *value, GucSource *source, bool changeVal, > ! union config_var_value *retval) > > which doesn't tell you quite what the parameters do, but more > fundamentally is misnamed because one would expect "parse_value" > returning bool to merely check whether the value is syntactically > correct. Well, it doesn't do that: it applies the value too. > Another broken-out routine is > > ! /* > ! * Check if the option can be set at this time. See guc.h for the precise > ! * rules. > ! */ > ! static bool > ! checkContext(int elevel, struct config_generic *record, GucContext context) > > which is again a misleading description because it doesn't bother to > explain that control may not come back if the option is rejected > (depending on elevel). One might also think, given that description, > that the caller is supposed to emit an error message on false result. > Lastly we have > > + /* > + * Verify if option exists and value is valid. > + * It is primary used for validation of items in configuration file. > + */ > + bool > + verify_config_option(const char *name, const char *value, > + GucContext context, GucSource source, > + bool *isNewEqual, bool *isContextOK) > > which again is far south of my ideas for adequate documentation of a > function with a fairly complicated API. And guess what, this one has > side effects too, which it surely should not (and that leads directly > to a bug: GUC_IN_CONFFILE could remain set in a variable after a > parsing failure). > > It's possible that a refactoring along these lines could be an > improvement if it were well coded and well documented, but this patch > is not it. > > The comment-reversion part of the patch is not any better. 
It's poorly > factored (what the heck is guc-file.l doing patching up the source > settings after calling set_config_option?), which is surprising > considering the whole point of the refactoring was to support this. > And the handling of reversion to a PGC_S_ENV_VAR setting is just a kluge > involving duplicated code (what was that about refactoring again?). > > In short, whether or not it has any remaining undetected bugs, this > patch is a severe disimprovement from the standpoint of source code > quality, and I recommend rejecting it. > > regards, tom lane > > ---(end of broadcast)--- > TIP 3: Have you checked our extensive FAQ? > >
Re: [HACKERS] Optimize ORDER BY ... LIMIT
On Fri, Sep 15, 2006 at 08:22:50PM +0100, Gregory Stark wrote: > But just in case it's not clear for anyone the usual use case for > this is paging results on a web page. As much as I normally try to > convince people they don't want to do it that way they usually do > end up with it implemented using limit/offset. And Postgres > currently is absolutely *awful* at running those queries. I'm curious, as I may be such an offender. What alternatives exist? I think you mean the concept of showing a page of information that you can navigate forwards and backwards a page, or select a range. What alternatives to limit/offset exist? If there are thousands or more results, I have trouble with an idea that the entire results should be queried, and cached, displaying only a fraction. I think most or all of the times I do this, an index is available, so perhaps that is why I don't find this issue problematic? For implementation, I think something simple is best - limit X offset Y - which means keeping a set of X+Y tuples as follows:

1) If the set of X+Y tuples still has room, insert the row using a binary search that retains the ordering characteristics that would be had if limit/offset had not been used.

2) If the row is less than the X+Y'th tuple, remove the X+Y'th tuple from the set, and insert the row as per 1).

3) Otherwise, ignore the tuple.

At the end, you return from the set starting at Y+1, and ending at Y+X. If X+Y tuples would take up too much memory, this plan should be skipped, and the general routines used instead. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] http://mark.mielke.cc/ Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
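The three numbered steps can be sketched directly (a hypothetical Python helper, not PostgreSQL code — effectively a bounded insertion sort over at most X+Y tuples):

```python
import bisect

def limit_offset_scan(rows, limit, offset, key=lambda r: r):
    """Evaluate ORDER BY key LIMIT limit OFFSET offset while keeping at
    most limit+offset tuples in memory, following the steps above."""
    bound = limit + offset
    best = []                          # kept sorted ascending by key
    for row in rows:
        item = (key(row), row)
        if len(best) < bound:          # step 1: room left, binary insert
            bisect.insort(best, item)
        elif item < best[-1]:          # step 2: beats the bound-th tuple
            best.pop()
            bisect.insort(best, item)
        # step 3: otherwise ignore the tuple
    return [row for _, row in best[offset:offset + limit]]

print(limit_offset_scan([9, 1, 7, 3, 8, 2, 6], limit=2, offset=1))  # → [2, 3]
```

Each insertion into the sorted buffer is a binary search plus a shift, which is why the heap-based variant discussed elsewhere in this thread scales better for large X+Y.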
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of
I would like to see some comments about AIX linking so we don't need to relearn this in 1-2 years. --- Rocco Altier wrote: > With the patch attached this time... > > -rocco > > > -Original Message- > > From: Rocco Altier > > Sent: Friday, September 15, 2006 2:04 PM > > To: Rocco Altier; 'Tom Lane'; 'Albe Laurenz' > > Cc: 'pgsql-hackers@postgresql.org' > > Subject: RE: [PATCHES] [HACKERS] Linking on AIX (Was: Fix > > linking of OpenLDAP libraries ) > > > > > > > > From: Tom Lane [mailto:[EMAIL PROTECTED] > > > > Mmm ... what of "make check"'s temporary installation? We need > > > > to have the executables search in the temporary install's libdir, > > > > *before* looking in the configured --libdir (which could easily > > > > contain an incompatible back-version libpq ...) > > > > > > > > pg_regress normally tries to handle this by setting > > LD_LIBRARY_PATH > > > > ... does AIX use that or a similar symbol? > > > > > > > The "make check" was successful in my previous testing of the last > > > patch, so it appears that AIX was paying attention to > > LD_LIBRARY_PATH. > > > > > > I am testing the new version of the patch now, so will report back > > > shortly. > > > > > From testing the new patch, it did not work for the > > regression tests in the buildfarm. > > Not sure why it did work before. > > > > Anyhow, I have updated the patch to set LIBPATH (AIX's > > version of LD_LIBRARY_PATH), in pg_regress and ecpg's pg_regress. > > > > I have tested this with default config options > > (enable-shared, enable-rpath). I am starting to test the > > other methods as well, but wanted to get this out first. > > > > -rocco > > Content-Description: aix.link.regression.patch [ Attachment, skipping... ] > > ---(end of broadcast)--- > TIP 5: don't forget to increase your free space map settings -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup.
+ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] Release notes
I have completed my first pass over the release notes and Tom has made some additions: http://momjian.postgresql.org/cgi-bin/pgrelease I will probably go over them again in a few hours, update them to current CVS, then move them into our SGML documentation by Monday. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Optimize ORDER BY ... LIMIT
Martijn van Oosterhout writes: > On Fri, Sep 15, 2006 at 05:30:27PM +0100, Gregory Stark wrote: >> Also, because heap sort is slower than qsort (on average anyways) it makes >> sense to not bother with the heap until the number of tuples grows well >> beyond >> the limit or until it would otherwise spill to disk. > > The thought that comes to mind is to leave the sorting as is, but > change the code that writes to the tapes to stop writing once it hits > the limit. So each tape will never have more than N tuples, where N is > the limit. This would be fairly unobtrusive because none of the other > code actually needs to care. I'm sorry, I forgot to mention that part of my analysis. Once you reach disk any chance of optimising the limit case is pretty much out the window. You could trim some tuples from each tape but unless you're sorting truly stupendous amounts of data I doubt it would really help much. I think it only makes sense to look at the in-memory case. Instead of qsorting thousands of records or, worse, spilling millions of records to disk and doing an external sort only to use only the top 10 and throw the rest away, we throw tuples away before they accumulate in memory in the first place. >> Alternatively we could have Limit(Sort()), Unique(Sort()), and >> Limit(Unique(Sort())) be handled by new types of Sort nodes entirely and not >> introduce the Limit and Unique nodes at all. > > I don't think it's easy to merge Unique and Sort, mainly because the > fields you're doing the Unique on are probably not the fields you're > sorting on, so you're probably not saving much. On the contrary I think the vast majority of the time you have a Unique(Sort) it will be the same key because it will be caused by a SELECT DISTINCT. 
Now that I actually test it I see there are more nodes that could implement this (namely GroupAgg) so I'm thinking more and more about having an abstract way to pass information down through the nodes rather than handle just Limit/Sort specifically. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
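The limit-aware in-memory sort being discussed — tuplesort throwing tuples away before they accumulate, instead of sorting everything and discarding all but the top N — comes down to a size-N bounded heap. A Python stand-in for what a limit-aware tuplesort would do in C (heapq is a min-heap, so keys are negated to let the current worst of the best N sit at the root):

```python
import heapq

def top_n(rows, n, key=lambda r: r):
    """ORDER BY key LIMIT n using O(n) memory instead of sorting all rows."""
    heap = []                      # max-heap (negated keys) of the best n rows
    for row in rows:
        item = (-key(row), row)
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:       # row sorts before the current n-th row
            heapq.heapreplace(heap, item)
    return [row for _, row in sorted(heap, reverse=True)]

print(top_n([5, 3, 8, 1, 9, 2, 7], 3))  # → [1, 2, 3]
```

This also illustrates the trade-off raised earlier in the thread: each accepted row costs O(log n) heap maintenance, so switching to the heap only pays once the input is well beyond the limit.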
Re: [HACKERS] Release notes
Tom Lane wrote: > Neil Conway <[EMAIL PROTECTED]> writes: > > Bruce Momjian wrote: > >> How is maintaining another file on every commit going to go over? > > > Well, it would clearly not be on every commit: most commits don't > > warrant a mention in the release notes. If committers think that this > > burden is too much to bear, please speak up. > > Well, I'm willing to (and I think usually have) put release-note-grade > descriptions into commit log messages, but I'm not willing to add "edit > release.sgml" to the already long process, for two basic reasons: > > * it'd make release.sgml into a commit bottleneck --- if everyone is > doing it this way, everyone's local copy of the file would be constantly > out of date, and merge conflicts would be an everyday problem. > > * correct SGML markup is a PITA. > > If *someone else* wants to troll the commit logs every so often and make > entries into release.sgml, that's fine with me. But I don't have the > bandwidth. That is pretty much my objection, even though I have to spend the days to create release.sgml. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Optimize ORDER BY ... LIMIT
Gregory Stark wrote: > Tom Lane <[EMAIL PROTECTED]> writes: > > > I believe a better way to think about this would be as an aggregate that > > remembers the top N rows. > > Wouldn't such a thing just be a reimplementation of a tuplestore though? I > mean, it's storing tuples you feed it, sorting them, and spitting them back > out in sorted order. I don't know if this is the same thing you are talking about, but Oleg talked to me on the conference about "partial sort", which AFAICS it's about the same thing you are talking about. I think Teodor submitted a patch to implement it, which was rejected because of not being general enough. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Optimize ORDER BY ... LIMIT
Tom Lane <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: > >> I think this is pretty important to cover at some point because really _not_ >> doing this is just wrong. > > I can't get all *that* excited about it, since an index solves the > problem. Well I'm not all *that* excited about it either; it's just another plan, and there are an infinite number of possible plans out there we could implement for various corner cases. But just in case it's not clear for anyone, the usual use case for this is paging results on a web page. As much as I normally try to convince people they don't want to do it that way, they usually do end up with it implemented using limit/offset. And Postgres currently is absolutely *awful* at running those queries. Often the killer requirement that makes it infeasible to create an index is precisely that they want to be able to sort on any of a long list of possible keys. Creating dozens of indexes on every table isn't too appealing. And in any case the query is often a join where the data in the sort key isn't even all coming from the same table, or where you need to use other indexes to fetch the data prior to the sort. I won't discourage anyone from working on OLAP queries, and this is indeed a similar idea. I suspect the same functionality in tuplesort of being able to set a maximum number of tuples to keep will be useful there too. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Optimize ORDER BY ... LIMIT
On Fri, Sep 15, 2006 at 05:30:27PM +0100, Gregory Stark wrote: > Also, because heap sort is slower than qsort (on average anyways) it makes > sense to not bother with the heap until the number of tuples grows well beyond > the limit or until it would otherwise spill to disk. The thought that comes to mind is to leave the sorting as is, but change the code that writes to the tapes to stop writing once it hits the limit. So each tape will never have more than N tuples, where N is the limit. This would be fairly unobtrusive because none of the other code actually needs to care. > Alternatively we could have Limit(Sort()), Unique(Sort()), and > Limit(Unique(Sort())) be handled by new types of Sort nodes entirely and not > introduce the Limit and Unique nodes at all. I don't think it's easy to merge Unique and Sort, mainly because the fields you're doing the Unique on are probably not the fields you're sorting on, so you're probably not saving much. However, merging Unique/Distinct/GroupBy is another avenue that has been considered. In general LIMIT is not handled badly, because we don't execute further once we have the number of tuples. Only nodes that materialize are a problem, with Sort being the common one. > Or am I overthinking this and having some state nodes peek inside other state > nodes is normal? I don't think so. In general the parser and planner poke around quite a bit, but once the optimizer has been over it, the plan has to be static, for rescans, backward scans, executing stored plans, etc. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate.
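The tape-truncation idea above can be modelled outside the backend. The following Python sketch is purely illustrative (none of these names exist in tuplesort): each sorted run is cut to its first N tuples before merging, and since any tuple at position >= N within its own run already has N smaller tuples ahead of it on that run alone, the merged top N comes out unchanged.

```python
import heapq
from itertools import islice

def truncated_tape_merge(runs, limit):
    """Merge sorted runs ("tapes"), keeping at most `limit` tuples per run.

    A tuple at position >= limit within its run cannot appear in the
    first `limit` rows of the merged output, so truncating each run
    before the merge loses nothing.
    """
    truncated = (run[:limit] for run in runs)   # each tape cut to N tuples
    return list(islice(heapq.merge(*truncated), limit))
```

For example, merging three truncated runs with limit 2 yields the same first two rows as merging the full runs would.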
Re: [HACKERS] Optimize ORDER BY ... LIMIT
Tom Lane <[EMAIL PROTECTED]> writes: > I believe a better way to think about this would be as an aggregate that > remembers the top N rows. Wouldn't such a thing just be a reimplementation of a tuplestore though? I mean, it's storing tuples you feed it, sorting them, and spitting them back out in sorted order. What would you do if the set of tuples turned out to be larger than you expected and didn't fit in memory? Create a tuplesort and pass them on to it? I've already looked at tuplesort and the changes there are minimal. The hard part is what to do in the planner and executor to get the information to the tuplestore. Do we want the plan to look the way it does now, or use some new sort of node that consolidates the limit and the sort in the same place? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Reducing data type space usage
On Fri, Sep 15, 2006 at 06:50:37PM +0100, Gregory Stark wrote: > With a CHAR(1) and CASH style numeric substitute we won't have 25-100% > performance lost on the things that would fit in 1-4 bytes. And with the > variable sized varlena header we'll limit to 25% at worst and 1-2% usually the > performance drain due to wasted space on larger data. I wonder how much of the benefit will be eaten by alignment. I think it'd be great if we rearranged the fields in a tuple to minimize alignment padding, but that logical field order patch has been and gone and the issues haven't changed. There's also slack at the end of pages. > Doing better would require a complete solution to data types that can > understand how large they are based on their typmod. That would imply more > dramatic solutions like I mused about involving passing around structures that > contain the Datum as well as the attlen or atttypmod. Whatever the solution is here, the same logic will have to apply to extracting Datums out of tuples. If you want the 7th column in a tuple, you have to find the lengths of all the previous datums first. Good summary though, probably worth putting on the wiki so next time we don't have to search the archives. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate.
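To make the alignment question concrete, here is a small illustrative Python model of how padding accumulates between fields — a sketch of the general mechanism only, not the backend's actual att_align logic:

```python
def align(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)

def tuple_data_size(fields):
    """fields: (length, alignment) pairs in physical column order.

    Returns the size of the data area including padding bytes."""
    offset = 0
    for length, alignment in fields:
        offset = align(offset, alignment)   # padding inserted before the field
        offset += length
    return offset

# A 1-byte flag column followed by a 4-byte integer needs 3 padding
# bytes; storing the integer first needs none -- which is the point
# of the logical-vs-physical field order idea mentioned above.
```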
guc comment changes (was Re: [HACKERS] Getting a move on for 8.2 beta)
Peter Eisentraut <[EMAIL PROTECTED]> writes: > That does not mean that the patch is bad, and I certainly support the > feature change. But I can't efficiently review the patch. If someone > else wants to do it, go ahead. I've finally taken a close look at this patch, and I don't like it any more than Peter does. The refactoring might or might not be good at its core, but as presented it is horrid. As just one example, it replaces one reasonably well-commented function with three misnamed, poorly commented functions. In place of /* ! * Sets option `name' to given value. The value should be a string ! * which is going to be parsed and converted to the appropriate data ! * type. The context and source parameters indicate in which context this ! * function is being called so it can apply the access restrictions ! * properly. ! * ! * If value is NULL, set the option to its default value. If the ! * parameter changeVal is false then don't really set the option but do all ! * the checks to see if it would work. ! * ! * If there is an error (non-existing option, invalid value) then an ! * ereport(ERROR) is thrown *unless* this is called in a context where we ! * don't want to ereport (currently, startup or SIGHUP config file reread). ! * In that case we write a suitable error message via ereport(DEBUG) and ! * return false. This is working around the deficiencies in the ereport ! * mechanism, so don't blame me. In all other cases, the function ! * returns true, including cases where the input is valid but we chose ! * not to apply it because of context or source-priority considerations. ! * ! * See also SetConfigOption for an external interface. */ ! bool ! set_config_option(const char *name, const char *value, ! GucContext context, GucSource source, ! bool isLocal, bool changeVal) we find /* ! * Try to parse value. Determine what is type and call related ! * parsing function or if newval is equal to NULL, reset value ! * to default or bootval. 
If the value parsed okay return true, ! * else false. */ ! static bool ! parse_value(int elevel, const struct config_generic *record, ! const char *value, GucSource *source, bool changeVal, ! union config_var_value *retval) which doesn't tell you quite what the parameters do, but more fundamentally is misnamed because one would expect "parse_value" returning bool to merely check whether the value is syntactically correct. Well, it doesn't do that: it applies the value too. Another broken-out routine is ! /* ! * Check if the option can be set at this time. See guc.h for the precise ! * rules. ! */ ! static bool ! checkContext(int elevel, struct config_generic *record, GucContext context) which is again a misleading description because it doesn't bother to explain that control may not come back if the option is rejected (depending on elevel). One might also think, given that description, that the caller is supposed to emit an error message on false result. Lastly we have + /* + * Verify if option exists and value is valid. + * It is primary used for validation of items in configuration file. + */ + bool + verify_config_option(const char *name, const char *value, + GucContext context, GucSource source, + bool *isNewEqual, bool *isContextOK) which again is far south of my ideas for adequate documentation of a function with a fairly complicated API. And guess what, this one has side effects too, which it surely should not (and that leads directly to a bug: GUC_IN_CONFFILE could remain set in a variable after a parsing failure). It's possible that a refactoring along these lines could be an improvement if it were well coded and well documented, but this patch is not it. The comment-reversion part of the patch is not any better. It's poorly factored (what the heck is guc-file.l doing patching up the source settings after calling set_config_option?), which is surprising considering the whole point of the refactoring was to support this. 
And the handling of reversion to a PGC_S_ENV_VAR setting is just a kluge involving duplicated code (what was that about refactoring again?). In short, whether or not it has any remaining undetected bugs, this patch is a severe disimprovement from the standpoint of source code quality, and I recommend rejecting it. regards, tom lane
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
With the patch attached this time... -rocco > -Original Message- > From: Rocco Altier > Sent: Friday, September 15, 2006 2:04 PM > To: Rocco Altier; 'Tom Lane'; 'Albe Laurenz' > Cc: 'pgsql-hackers@postgresql.org' > Subject: RE: [PATCHES] [HACKERS] Linking on AIX (Was: Fix > linking of OpenLDAP libraries ) > > > > > From: Tom Lane [mailto:[EMAIL PROTECTED] > > > Mmm ... what of "make check"'s temporary installation? We need > > > to have the executables search in the temporary install's libdir, > > > *before* looking in the configured --libdir (which could easily > > > contain an incompatible back-version libpq ...) > > > > > > pg_regress normally tries to handle this by setting > LD_LIBRARY_PATH > > > ... does AIX use that or a similar symbol? > > > > > The "make check" was successful in my previous testing of the last > > patch, so it appears that AIX was paying attention to > LD_LIBRARY_PATH. > > > > I am testing the new version of the patch now, so will report back > > shortly. > > > From testing the new patch, it did not work for the > regression tests in the buildfarm. > Not sure why it did work before. > > Anyhow, I have updated the patch to set LIBPATH (AIX's > version of LD_LIBRARY_PATH), in pg_regress and ecpg's pg_regress. > > I have tested this with default config options > (enable-shared, enable-rpath). I am starting to test the > other methods as well, but wanted to get this out first. > > -rocco
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
> > From: Tom Lane [mailto:[EMAIL PROTECTED] > > Mmm ... what of "make check"'s temporary installation? We need > > to have the executables search in the temporary install's libdir, > > *before* looking in the configured --libdir (which could easily > > contain an incompatible back-version libpq ...) > > > > pg_regress normally tries to handle this by setting LD_LIBRARY_PATH > > ... does AIX use that or a similar symbol? > > > The "make check" was successful in my previous testing of the last > patch, so it appears that AIX was paying attention to LD_LIBRARY_PATH. > > I am testing the new version of the patch now, so will report back > shortly. > From testing the new patch, it did not work for the regression tests in the buildfarm. Not sure why it did work before. Anyhow, I have updated the patch to set LIBPATH (AIX's version of LD_LIBRARY_PATH), in pg_regress and ecpg's pg_regress. I have tested this with default config options (enable-shared, enable-rpath). I am starting to test the other methods as well, but wanted to get this out first. -rocco
[HACKERS] Reducing data type space usage
Following up on the recent discussion on list about wasted space in data representations, I want to summarise what we found and make some proposals. As I see it there are two cases: Case 1) Data types that are variable length but often quite small. This includes things like NUMERIC, which in common use will rarely be larger than 12-20 bytes, and often things like text. In cases like these we really only need 1 or sometimes 2 bytes of varlena header overhead, not 4 as we currently use. In fact we *never* need more than 2 bytes of varlena header on disk anyways with the standard configuration. Case 2) Data types whose size depends on the typmod but is always the same for a given typmod and can be determined statically. In the case of an ASCII-encoded database CHAR(n) fits this category, and in any case we'll eventually have per-column encoding. NUMERIC(a,b) could also be made to fit this as well. In cases like these we don't need *any* varlena header, if we can arrange for the functions to have enough information to know how large the data must be. Solutions proposed: Case 1) We've discussed the variable sized varlena headers and I think it's clear that that's the most realistic way to approach it. I don't think any other approaches were even suggested. Tom said he wanted a second varlena format for numeric that would have 2-byte alignment. But I think we could just say that we always use the 2-byte varlena header on data types with 2-byte alignment and the 4-byte header on data types with 4-byte alignment needs. Or heap_form_tuple could be even cleverer about it, but I'm not sure it's worth it. This limits the wasted space to 1-2% for most variable sized data that are 50 bytes long or more. But for very small data, such as the quite common cases where values are only 1-4 bytes, it still means a 25-100% performance drain. Case 2) Solving this is quite difficult without introducing major performance problems or security holes.
The one approach we have that's practical right now is introducing special data types such as the oft-mentioned "char" data type. "char" doesn't have quite the right semantics to use as a transparent substitute for CHAR, but we could define a CHAR(1) with exactly the right semantics and substitute it transparently in parser/analyze.c (btw having two files named analyze.c is pretty annoying). We could do the same with NUMERIC(a,b) for sufficiently small values of a and b with something like D'Arcy's CASH data type (which uses an integer internally). The problem with defining lots of data types is that the number of casts and cross-data-type comparisons grows quadratically as the number of data types grows. In theory we would save space by defining a CHAR(n) for whatever size n the user needs, but I can't really see anything other than CHAR(1) being worthwhile. Similarly a 4-byte NUMERIC substitute like CASH (with full NUMERIC semantics though) and maybe a 2-byte and 8-byte substitute might be reasonable, but anything else would be pointless. I see these two solutions as complementary. The variable varlena headers take care of the larger data and the special-purpose data types take care of the extremely small data. And it's pretty important to cover both cases; data that fits in 1-4 bytes is quite common. You often see databases with dozens of CHAR(1) flag columns or NUMERIC(10,2) currency columns. With a CHAR(1) and CASH style numeric substitute we won't have the 25-100% performance loss on the things that would fit in 1-4 bytes. And with the variable sized varlena header we'll limit the performance drain due to wasted space on larger data to 25% at worst and usually 1-2%. Doing better would require a complete solution to data types that can understand how large they are based on their typmod. That would imply more dramatic solutions like I mused about involving passing around structures that contain the Datum as well as the attlen or atttypmod.
The more I think about these ideas the more I think they may have merit but they would be awfully invasive and require more thought. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
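For concreteness, the 1-or-2-byte header scheme from Case 1 might encode lengths along these lines. This is a hypothetical illustration of the idea only, not the actual on-disk format being proposed:

```python
def encode_length(n):
    """Encode a datum length into a 1- or 2-byte header.

    Hypothetical scheme: lengths up to 127 fit in one byte with the
    high bit clear; lengths up to 2**15 - 1 take two bytes with the
    high bit of the first byte set as a flag.
    """
    if n < 0x80:
        return bytes([n])
    if n < 0x8000:
        return bytes([0x80 | (n >> 8), n & 0xFF])
    raise ValueError("length would need a full 4-byte header")

def decode_length(buf):
    """Return (length, header_size) read from the start of buf."""
    if buf[0] & 0x80 == 0:
        return buf[0], 1
    return ((buf[0] & 0x7F) << 8) | buf[1], 2
```

Under such a scheme a 50-byte value carries one header byte, about 2% overhead, which matches the "1-2% usually" figure above.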
Re: [HACKERS] regression test for uuid datatype
Gevik Babakhani wrote: I would like to create some regression tests for the uuid datatype. Should those also be included in the patch to review, or are the regression tests done by the committers? In the patch. cheers andrew
[HACKERS] regression test for uuid datatype
I would like to create some regression tests for the uuid datatype. Should those also be included in the patch to review, or are the regression tests done by the committers? Regards, Gevik.
Re: [HACKERS] [DOCS] New XML section for documentation
"Nikolay Samokhvalov" <[EMAIL PROTECTED]> writes: > On 8/26/06, Peter Eisentraut <[EMAIL PROTECTED]> wrote: >> "Valid" and "well-formed" have very specific distinct meanings in XML. >> (Note that "check" doesn't have any meaning there.) We will eventually >> want a method to verify both the validity and the well-formedness. >> >> I think that a function called xml_valid checks for well-formedness is >> an outright bug and needs to be fixed. > That's exactly what I'm talking about. xml_valid() is wrong name and > it may confuse people. > Bruce suggested to use overload to keep backward compat. - in other > words, 1-arg function for checking for well-formedness and 2-arg > function for validation process. That's bad too: ISTM the right answer is to add xml_is_well_formed() in this release and have xml_valid as an alias for it, with documentation explaining that xml_valid is deprecated and will be removed in the next release. Then we can add a proper validity-checking function too. Nikolay submitted a patch later http://archives.postgresql.org/pgsql-patches/2006-09/msg00123.php that does part of this and can easily be adapted to add the alias. His patch also adds an xpath_array() function --- what do people think about that? It's well past feature freeze ... now we've always been laxer about contrib than the core code, but still I'm inclined to say that that function should wait for 8.3. Comments? It's time to get a move on with resolving this. regards, tom lane
Re: [HACKERS] Optimize ORDER BY ... LIMIT
Tom Lane wrote: (unless we want to invent aggregates that can return SETOF?) Doesn't sound like a bad idea at all ... cheers andrew
Re: [HACKERS] Ding-dong, contrib is dead ...
Abhijit Menon-Sen <[EMAIL PROTECTED]> writes: > At 2006-09-05 16:35:49 -0400, [EMAIL PROTECTED] wrote: >> So basically I don't see the point of investing effort in a >> bug-compatible version of userlocks, when we can have something >> cleaner and suitable for the long run with not very much more >> effort. > Fine with me. Two questions: > - Where would the code live, if it were in core? > - Shall I hack up the API you suggested in your earlier message? Is this going anywhere? The days grow short ... regards, tom lane
Re: [HACKERS] Optimize ORDER BY ... LIMIT
Gregory Stark <[EMAIL PROTECTED]> writes: > I've been looking at doing the following TODO item: > Allow ORDER BY ... LIMIT # to select high/low value without sort or index > using a sequential scan for highest/lowest values > I think this is pretty important to cover at some point because really _not_ > doing this is just wrong. I can't get all *that* excited about it, since an index solves the problem. > The way I see to do this is to still use a Sort node and use a tuplesort but > to arrange to get the information of the maximum number of tuples needed to > the tuplesort so it can throw out tuples as it sorts. The implementation that was proposed in the earlier discussion did not involve hacking the sort code beyond recognition ;-). I believe a better way to think about this would be as an aggregate that remembers the top N rows. It can't quite be an aggregate as it stands (unless we want to invent aggregates that can return SETOF?) but I think there might be some useful overlap with the SQL2003 window-function concept. regards, tom lane
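The aggregate framing can be sketched as a transition function plus a final function. This is an illustrative Python model under invented names (topn_transfn/topn_finalfn are not real backend functions), using insertion into a small sorted list, which is fine when N is small:

```python
import bisect

def topn_transfn(state, value, n):
    """Transition function: fold one input row into the aggregate
    state, which holds at most the n smallest values seen so far."""
    bisect.insort(state, value)
    if len(state) > n:
        state.pop()            # discard the current largest
    return state

def topn_finalfn(state):
    """Final function: the retained rows, already in sorted order.
    As noted above, a real aggregate would need to return SETOF to
    emit these as separate rows."""
    return state
```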
Re: [HACKERS] New version of money type
On Fri, 15 Sep 2006 16:15:04 - Andrew - Supernews <[EMAIL PROTECTED]> wrote: > On 2006-09-15, "D'Arcy J.M. Cain" wrote: > >> Seems? Have you benchmarked it? > > > > Not rigourously but a few "ANALYZE EXPLAIN" statements bear out this > > observation. > > The overhead of EXPLAIN ANALYZE is so large that it completely swamps any > real difference. Hence my "not rigourously" comment. > First measure the benchmark overhead: > > select null::integer from generate_series(1,1000) s1, > generate_series(1,1000) s2; Time: 870.531 ms > Since output functions are strict, this does not call int4out at all, so > this measures the time taken to generate the million rows, output and discard > them. > > Then do the real tests: > > select 0::integer from generate_series(1,1000) s1, >generate_series(1,1000) s2; Time: 1410.690 ms > This calls int4out(0) a million times. (the input function is only called > once since it is a constant, and therefore handled during planning) > > select 0::numeric from generate_series(1,1000) s1, >generate_series(1,1000) s2; Time: 1256.539 ms Selecting "'0'::money" gives: Time: 1487.757 ms Bigint gives: Time: 1450.405 ms The extra processing over int and bigint is probably due to locale formatting. That's partially why I was wondering if the basic type should be doing that as opposed to doing it in app code. Also, I wonder if some of the techniques in numeric could be applied here. I haven't looked carefully at the numeric output code yet. In any case, I/O speed is probably not that important with this type. Internal calculations, in my experience, are much more critical. -- D'Arcy J.M. Cain | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.
Re: Fwd: [HACKERS] polite request about syntax
Ricardo Malafaia wrote: > On 9/15/06, Douglas McNaught <[EMAIL PROTECTED]> wrote: > >> What happens then when it sees something like a double variable > >> interpolation as in $$foobar? ;) > > > >Then you use $FOO$ (or something else that doesn't appear in your > >code) as the delimiter--you're not limited to just $$. > > clever. still, i don't believe such variety of syntax and even > multitude of language support would do well with most Windows shops. > but that's not really your fault... Well, there's always MS Access ... -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: Fwd: [HACKERS] polite request about syntax
Ricardo Malafaia wrote: And the $$ is indeed needed for allowing languages with different syntaxes. agreed. However, Tom, i could counter example your plperl example: realize that qq/end/ does not represent a matching "end"? What happens then when it sees something like a double variable interpolation as in $$foobar? ;) The delimiter does not have to be $$. It can be $any_unquoted_identifier_without_a_dollar_sign$. the lexer says: /* $foo$ style quotes ("dollar quoting") * The quoted string starts with $foo$ where "foo" is an optional string * in the form of an identifier, except that it may not contain "$", * and extends to the first occurrence of an identical string. * There is *no* processing of the quoted text. * */ dolq_start [A-Za-z\200-\377_] dolq_cont [A-Za-z\200-\377_0-9] dolqdelim \$({dolq_start}{dolq_cont}*)?\$ So for a plperl function you just use something like $func$ at each end. cheers andrew
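The dolqdelim rule quoted above can be approximated with a regular expression, which also shows why $$foobar is harmless: the body simply runs to the first *identical* delimiter. A rough Python sketch (simplified to ASCII tags; the real lexer also allows \200-\377 characters in the tag):

```python
import re

DOLLAR_QUOTE = re.compile(
    r"\$(?P<tag>[A-Za-z_][A-Za-z_0-9]*|)\$"   # $foo$ or bare $$
    r"(?P<body>.*?)"                          # no processing of the body
    r"\$(?P=tag)\$",                          # first identical delimiter ends it
    re.DOTALL,
)

def dollar_quoted_body(text):
    """Return the body of the first dollar-quoted string in text, or None."""
    m = DOLLAR_QUOTE.search(text)
    return m.group("body") if m else None
```

With a $func$ delimiter, a perl-ish $$foobar inside the body never terminates the quote, because only the literal sequence $func$ does.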
Re: Fwd: [HACKERS] polite request about syntax
On 9/15/06, Douglas McNaught <[EMAIL PROTECTED]> wrote: > What happens then when it sees something like a double variable > interpolation as in $$foobar? ;) Then you use $FOO$ (or something else that doesn't appear in your code) as the delimiter--you're not limited to just $$. clever. still, i don't believe such variety of syntax and even multitude of language support would do well with most Windows shops. but that's not really your fault...
[HACKERS] One of our own begins a new life
Hello, Yeah, this is a cross post and it is slightly off topic but IMHO this is important. Tomorrow one of our own, Devrim Gunduz is becoming a man. He is sucking it up, and committing to the cvs repo of project marriage. May the patches reviewers be kind to him! Congratz Devrim, have a good honeymoon and we look forward to having you back in a couple of weeks! Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Re: [HACKERS] New version of money type
On Fri, 15 Sep 2006 16:15:24 - Andrew - Supernews <[EMAIL PROTECTED]> wrote: > On 2006-09-15, Alvaro Herrera <[EMAIL PROTECTED]> wrote: > > Andrew - Supernews wrote: > >> Numbers from an actual benchmark: > >> > >> int4out(0) - 0.42us/call > >> numeric_out(0) - 0.32us/call > >> > >> int4out(10) - 0.67us/call > >> numeric_out(10) - 0.42us/call > > > > Is this really int4out, or is it int8out? > > int4out. int8out is slower. int8out is probably a better comparison since it is the same range. -- D'Arcy J.M. Cain | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.
Re: Fwd: [HACKERS] polite request about syntax
"Ricardo Malafaia" <[EMAIL PROTECTED]> writes: > What happens then when it sees something like a double variable > interpolation as in $$foobar? ;) Then you use $FOO$ (or something else that doesn't appear in your code) as the delimiter--you're not limited to just $$. -Doug
[HACKERS] Optimize ORDER BY ... LIMIT
I've been looking at doing the following TODO item: Allow ORDER BY ... LIMIT # to select high/low value without sort or index using a sequential scan for highest/lowest values Right now, if no index exists, ORDER BY ... LIMIT # requires we sort all values to return the high/low value. Instead, the idea is to do a sequential scan to find the high/low value, thus avoiding the sort. MIN/MAX already does this, but not for LIMIT > 1. I think this is pretty important to cover at some point because really _not_ doing this is just wrong. We're simply not supporting the correct plan for this type of query. Currently we're doing an O(n log n) plan when the right plan would usually be O(n). (As in, it's actually O(n log m) where m is usually small and not interesting.) The way I see to do this is to still use a Sort node and use a tuplesort, but to arrange to get the information of the maximum number of tuples needed to the tuplesort so it can throw out tuples as it sorts. My plan is to have tuplesort reuse the existing heap code it uses for tape merges, only keeping the memtuples array in a max-heap (instead of the min-heap it uses now -- that means having a tuplesortstate flag indicating which order and having the tuplesort_heap* functions check that flag). When it reaches the limit it can throw away either the new element or the top element on every insert. I considered using a simple insertion-sort instead but was worried that the performance would degrade as the limit clause grows large. I don't think that's a major use case but I don't like the idea of an O(n^2) algorithm lying in wait to ambush someone. Also, because heap sort is slower than qsort (on average anyways) it makes sense to not bother with the heap until the number of tuples grows well beyond the limit or until it would otherwise spill to disk. To actually get the information to the tuplesort, the information has to be fed down to the SortState from the LimitState somehow. This I'm not sure how to do.
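The bounded max-heap behaviour described above is easy to model outside tuplesort. An illustrative Python sketch (heapq only provides a min-heap, so keys are negated to simulate the max-heap; the function name is invented):

```python
import heapq

def bounded_sort(rows, limit):
    """Return the `limit` smallest rows in sorted order without ever
    holding more than `limit` rows: once the heap is full, each new
    row either replaces the current maximum or is thrown away on the
    spot -- O(n log m) work for n rows and limit m, versus
    O(n log n) for a full sort."""
    heap = []                       # max-heap via negated keys
    for row in rows:
        if len(heap) < limit:
            heapq.heappush(heap, -row)
        elif row < -heap[0]:        # beats the largest retained row
            heapq.heapreplace(heap, -row)
    return sorted(-r for r in heap)
```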
There isn't currently any abstract interface between nodes to pass information like this. The simple solution is that ExecLimit could just peek at its outerPlanState and if it's a SortState it can set some fields so the SortState can know to pass the information to the tuplesort. I've also considered a more abstract interface such as adding an ExecAdvise() function that would pass some sort of structure (a node?) down with the information. This seems like overkill for a single integer but I wonder if there would be other consumers of such an interface. The current eflags could be swallowed by this, though I don't see any particular advantage. More realistically a Unique node could also inform a Sort node that it can throw away duplicates as it sorts. A limit could even be passed *through* a unique node as long as the Sort understands how to handle the combination properly. In other areas, a Hash Aggregate can start throwing away elements once the number of elements in the hash grows to the limit. Alternatively we could have Limit(Sort()), Unique(Sort()), and Limit(Unique(Sort())) be handled by new types of Sort nodes entirely and not introduce the Limit and Unique nodes at all. I would worry about duplicated code in that case though; in particular it seems like there would be cases where we still want to use qsort rather than throw away unneeded tuples. But not throwing away unneeded tuples means reimplementing all of nodeLimit in nodeSort for those cases. And that doesn't help with other cases like Hash Aggregate. Or am I overthinking this and having some state nodes peek inside other state nodes is normal? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Fwd: [HACKERS] polite request about syntax
-- Forwarded message --
From: Ricardo Malafaia <[EMAIL PROTECTED]>
Date: Sep 15, 2006 1:28 PM
Subject: Re: [HACKERS] polite request about syntax
To: Tom Lane <[EMAIL PROTECTED]>

ok, guys. i guess i was a bit unfair. Timestamp is used everywhere indeed: Oracle, Firebird, you name it. Only MySQL followed M$ and added a confusing datetime and date to the mix. I hope, though, that the "timestamp with timezone" isn't really necessary. And the $$ is indeed needed for allowing languages with different syntaxes. agreed.

However, Tom, i could counter your plperl example: realize that qq/end/ does not represent a matching "end"? What happens then when it sees something like a double variable interpolation, as in $$foobar? ;)

Sorry for the rudeness, but i truly like PostgreSQL and was playing devil's advocate. and no, i'm not likely to have fun with Oracle... :P

cheers

-- 
http://slashdot.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] New version of money type
On 2006-09-15, Alvaro Herrera <[EMAIL PROTECTED]> wrote: > Andrew - Supernews wrote: >> Numbers from an actual benchmark: >> >> int4out(0) - 0.42us/call >> numeric_out(0) - 0.32us/call >> >> int4out(10) - 0.67us/call >> numeric_out(10) - 0.42us/call > > Is this really int4out, or is it int8out? int4out. int8out is slower. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] New version of money type
On 2006-09-15, "D'Arcy J.M. Cain" wrote:
>> Seems? Have you benchmarked it?
>
> Not rigorously, but a few "EXPLAIN ANALYZE" statements bear out this
> observation.

The overhead of EXPLAIN ANALYZE is so large that it completely swamps any real difference.

>> The point is that bigint is _not_ faster than numeric for I/O (in fact
>> even integer is not faster than numeric for output).
>>
>> Numbers from an actual benchmark:
>>
>> int4out(0) - 0.42us/call
>> numeric_out(0) - 0.32us/call
>>
>> int4out(10) - 0.67us/call
>> numeric_out(10) - 0.42us/call
>
> What benchmark is this?

Simple queries output to /dev/null. Use \timing in psql to get times.

First measure the benchmark overhead:

  select null::integer from generate_series(1,1000) s1, generate_series(1,1000) s2;

Since output functions are strict, this does not call int4out at all, so this measures the time taken to generate the million rows, output and discard them.

Then do the real tests:

  select 0::integer from generate_series(1,1000) s1, generate_series(1,1000) s2;

This calls int4out(0) a million times. (The input function is only called once since it is a constant, and therefore handled during planning.)

  select 0::numeric from generate_series(1,1000) s1, generate_series(1,1000) s2;

This calls numeric_out(0) a million times. And so on.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
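The overhead-subtraction methodology above carries over to any micro-benchmark. As a rough, purely illustrative Python analogue (the real measurement was the SQL shown above; the function name here is hypothetical): time an empty loop first, then time the loop with the call under test, and report the difference per call.

```python
import time

def us_per_call(fn, arg, n=200_000):
    # Baseline: the analogue of selecting null::integer, which never
    # reaches the strict output function -- pure harness overhead.
    t0 = time.perf_counter()
    for _ in range(n):
        pass
    overhead = time.perf_counter() - t0
    # Measurement: the analogue of selecting a constant, which calls the
    # function once per row.
    t0 = time.perf_counter()
    for _ in range(n):
        fn(arg)
    total = time.perf_counter() - t0
    return (total - overhead) / n * 1e6   # microseconds per call
```

As with the SQL version, the subtraction is what makes tiny per-call costs (fractions of a microsecond) visible at all.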
Re: [HACKERS] confusing comment in tqual.c
Gregory Stark <[EMAIL PROTECTED]> writes: > I'm reading the code in tqual.c and find that this comment doesn't seem to > match the code. CommandId always seems to be compared with >= or <= rather > than equality as the comment says. Yeah, you're right, the comment seems to be written on the assumption that it's not possible to see cmin or cmax > curcid ... but that is possible given sufficiently bizarre programming (eg, query fires a function or trigger that updates the table again while outer query is still scanning). The actual rule for "now" is that a change made in the current transaction is considered to have taken effect if its cmin or cmax is strictly less than the current command's CID. (Hmm ... actually, given the limited ways in which SnapshotNow is used, I guess it's possible that indeed this can never happen. The code is made to be parallel to similar tests in SatisfiesSnapshot, which definitely can see the sort of scenario mentioned above.) regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
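Tom's corrected rule can be written out as a small executable sketch. This is an illustrative Python model of the current-transaction branch only (the real test is C code in tqual.c, and this hypothetical function assumes the row was inserted by our own, still-open transaction): the insert counts only if its cmin is strictly less than the current command's CID, and a deletion by our own transaction hides the row only if its cmax is also before the current command.

```python
def own_row_visible(cmin, cmax, my_cid):
    # SnapshotNow, current-transaction branch:
    #   Cmin < my-command && (Xmax is null || Cmax >= my-command)
    if cmin >= my_cid:
        return False          # inserted by this or a later command: invisible
    if cmax is None:
        return True           # never deleted: visible
    return cmax >= my_cid     # deleted, but not by a command before this scan
```

Note how the >=/< comparisons (rather than the equality tests the old comment implied) are exactly what lets a row deleted by a *later* command of the same transaction remain visible to the current scan.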
Re: [HACKERS] New version of money type
Andrew - Supernews wrote: > Numbers from an actual benchmark: > > int4out(0) - 0.42us/call > numeric_out(0) - 0.32us/call > > int4out(10) - 0.67us/call > numeric_out(10) - 0.42us/call Is this really int4out, or is it int8out? -- Alvaro Herrera    http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] polite request about syntax
"Ricardo Malafaia" <[EMAIL PROTECTED]> writes: > my complaint is that, like i said, "timestamp with time zone" is no > good substitute for a simple "datetime". Here, someone suggested a > CREATE DOMAIN to create an alias for it. Why isn't it provided there > out-of-the-box by default? So you have the SQL standard timestamp and > the industry standard datetime. In which part of the industry is "datetime" industry standard? Last I heard, the SQL spec was the industry standard. > and, while $$ is a whole lot better than '', why do we really need > these? Why not a normal, simple, begin end block or {}? Doesn't work real well for arbitrary PL languages: you are effectively assuming that the main SQL parser can lex every language anyone might want to write a PL with. I think I need stray no further than plperl to provide a counterexample: should the SQL parser be expected to realize that qq/end/ does not represent a matching "end"? > and Tom, i don't really want a GUI: No, but it sounds like your co-workers do. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] polite request about syntax
The only person in denial is you. Here's a hard lesson about open source: bitching gets you nothing. YOU are not going to be taken seriously while all you do is complain. And if you must complain, make sure the politeness is in the words, not just the subject.

The only place timestamp is mentioned on that page is in the user-contributed part of the docs - the official docs do not contain it (one of the reasons I hate so-called interactive docs - we are held responsible for stuff that is not in our official docs as if it were). In any case, as Tom pointed out, making an alias for it is child's play.

If you think it would be easy to come up with a way of having function bodies that are not strings, then we will be pleased to listen to your constructive and detailed plan for getting there. But first I'd like to know how you intend to do that and at the same time allow for loadable PLs of which we might know nothing at the time Postgres is built. (Personally I think there's a case to be made for special-casing SQL and PLpgSQL function bodies so they don't have to be strings, but I'm not sure how many people would agree with that.)

have fun with oracle.

cheers

andrew

Ricardo Malafaia wrote: well, ain't that surprising to see so many open-source developers living in denial and suggesting RTFM rather than actually coping with the problem? are you to be taken seriously? As a C programmer, I'm in the same league as most of you guys, so while i can really contribute code and my talk is cheap, it's the best i can do: bug you with feature requests. So: On 9/15/06, Andrew Dunstan <[EMAIL PROTECTED]> wrote: Where is the mention of either of these on the CREATE FUNCTION page? http://www.postgresql.org/docs/8.1/interactive/sql-createfunction.html And anyway, what is your actual complaint? my complaint is that, like i said, "timestamp with time zone" is no good substitute for a simple "datetime". Here, someone suggested a CREATE DOMAIN to create an alias for it.
Why isn't it provided there out-of-the-box by default? So you have the SQL standard timestamp and the industry standard datetime. and, while $$ is a whole lot better than '', why do we really need these? Why not a normal, simple, begin end block or {}? People in the industry don't like hacks and the open-source world is full of them, though it's getting better. I think this is all valid criticism, but you wanna play deaf, that's up to you guys. cheers and Tom, i don't really want a GUI: psql's use of GNU readline autocompletion is far better than M$'s stupid Query Analyzer standard editor, and matching it up with vim gets better yet. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] polite request about syntax
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ricardo Malafaia
> Sent: 15 September 2006 16:35
> To: Andrew Dunstan
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] polite request about syntax
>
> my complaint is that, like i said, "timestamp with time zone" is no
> good substitute for a simple "datetime". Here, someone suggested a
> CREATE DOMAIN to create an alias for it. Why isn't it provided there
> out-of-the-box by default? So you have the SQL standard timestamp and
> the industry standard datetime.

Because adding everybody's idea of industry-standard typenames, function names, operators etc. will lead to bloated system catalogs and insanity for people trying to understand what differences between objects there may or may not be. We follow the SQL standards. If you need to provide compatibility types and functions to migrate from another product, then unlike many others we provide the capability for you to add them yourself.

> and, while $$ is a whole lot better than '', why do we really need
> these? Why not a normal, simple, begin end block or {}? People in
> the industry don't like hacks and the open-source world is full of it,
> though it's getting better.

Because the parser may have no knowledge of the syntax of the language being used, which may legitimately use begin-end or { } for some other purpose (possibly not in matched pairs). Dollar quoting gives you the flexibility to avoid any potential clash. If we only had one procedural language then I'm sure we could do away with dollar quoting, but there are a dozen or more out there and they're all different.

Regards, Dave.

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [HACKERS] Build v8.1.4 with VC++ 2005
Sweet! I'll try it as soon as it's available.

Xiaofeng Zhao
http://www.xzing.org
errare humanum est

> Subject: Re: [HACKERS] Build v8.1.4 with VC++ 2005
> Date: Fri, 15 Sep 2006 10:59:03 +0200
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> CC: pgsql-hackers@postgresql.org
>
> >> I tried to build postgresql from src using vc++ 2005 but was not able
> >> to go far before hitting errors related to inline function in the
> >> "wchar.c" file.
> >>
> >> The source file I downloaded is "postgresql-8.1.4.tar.gz". I tried to
> >> build it by running "nmake -f win32.mak" in the "src" directory.
> >>
> >> Can v8.1.4 be built using VC++ 2005? If so, what is the build
> >> procedure, and are any patches required?
> >
> > there is a patch in the queue which has a decent chance of making it
> > into 8.2. from the stock 8.1 sources it is impossible.
>
> It's been applied, so it'll be in 8.2.
>
> Currently it's broken though, because of some changes during commit.
> I'll post a patch to fix this soon.
>
> //Magnus
>
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend
Re: [HACKERS] polite request about syntax
On Fri, Sep 15, 2006 at 12:35:03PM -0300, Ricardo Malafaia wrote:
> On 9/15/06, Andrew Dunstan <[EMAIL PROTECTED]> wrote:
> >Where is the mention of either of these on the CREATE FUNCTION page?
> >http://www.postgresql.org/docs/8.1/interactive/sql-createfunction.html

Err, in the example? So you're not complaining about any text as such, and there's nothing actually wrong on that page. OK. Your complaint is that we follow the standard and MSSQL doesn't, and we should follow them instead. Eh? There are more databases than MS-SQL; we can't implement everybody's version of "timestamp". And in any case, why are they not the same?

> and, while $$ is a whole lot better than '', why do we really need
> these? Why not a normal, simple, begin end block or {}? People in
> the industry don't like hacks and the open-source world is full of it,
> though it's getting better.

Because PostgreSQL has a lot more languages. Putting a begin/end around perl code or {} around python code would just look wrong. $$ is unobtrusive and looks OK no matter what the language. Not to mention that it's actually parseable without knowing the language. Remember, we have to know where the code block begins and ends before we know what language it is; the LANGUAGE specifier comes after.

> I think this is all valid criticism, but you wanna play deaf, that's
> up to you guys. cheers

Deaf? You're the first person who has mentioned either of these issues. So I'm sorry if we don't run to implement them right away.

Have a nice day,
-- 
Martijn van Oosterhout   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
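The point that a dollar-quoted body is lexable before the LANGUAGE clause is ever seen can be shown with a toy scanner. This is a hypothetical Python sketch, not the real PostgreSQL lexer: the opening $tag$ alone determines the closing delimiter, so the body may freely contain begin/end, braces, or quotes from any PL.

```python
import re

def extract_dollar_quoted(sql):
    # Find the first opening dollar-quote tag: '$$' or '$sometag$'.
    m = re.search(r'\$([A-Za-z_][A-Za-z0-9_]*)?\$', sql)
    if not m:
        return None
    tag = m.group(0)                # the exact closing delimiter to look for
    start = m.end()
    end = sql.find(tag, start)      # no language knowledge needed to find it
    return sql[start:end] if end != -1 else None
```

For example, the perl-ish body survives intact even though it contains an unmatched "end", and a tagged quote can safely contain a plain $$ inside it:

```python
extract_dollar_quoted("CREATE FUNCTION f() RETURNS text AS $$ qq/end/ $$ LANGUAGE plperl")
extract_dollar_quoted("AS $body$ begin { $$ } end $body$ LANGUAGE plpgsql")
```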
Re: [HACKERS] New version of money type
On Fri, 15 Sep 2006 15:14:10 - Andrew - Supernews <[EMAIL PROTECTED]> wrote:
> On 2006-09-15, "D'Arcy J.M. Cain" wrote:
> > On Fri, 15 Sep 2006 10:17:55 -
> > Andrew - Supernews <[EMAIL PROTECTED]> wrote:
> >> Presumably the same speed as bigint, which is to say that while it is
> >> faster than numeric for calculation, it is (much) slower for input/output.
> >> (The difference in speed between bigint output and numeric output is
> >> measured in multiples, not in percentages.)
> >
> > I/O for money seems at least comparable to numeric, if not slightly
> > better.
>
> Seems? Have you benchmarked it?

Not rigorously, but a few "EXPLAIN ANALYZE" statements bear out this observation.

> The point is that bigint is _not_ faster than numeric for I/O (in fact
> even integer is not faster than numeric for output).
>
> Numbers from an actual benchmark:
>
> int4out(0) - 0.42us/call
> numeric_out(0) - 0.32us/call
>
> int4out(10) - 0.67us/call
> numeric_out(10) - 0.42us/call

What benchmark is this? Perhaps I can modify it to include my new implementation.

-- 
D'Arcy J.M. Cain | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] confusing comment in tqual.c
I'm reading the code in tqual.c and find that this comment doesn't seem to match the code. CommandId always seems to be compared with >= or <= rather than equality as the comment says. I'm not even sure I have these operators right, as the expression as written here is in a few places the converse of the case that the code actually tests. It's also pretty confusing. I'm not so much submitting this patch to try to correct it as to verify my understanding of the code.

*** tqual.c	14 Sep 2006 13:54:45 +0100	1.96
--- tqual.c	15 Sep 2006 15:50:41 +0100
***************
*** 254,269 ****
   * The satisfaction of "now" requires the following:
   *
   * ((Xmin == my-transaction &&		changed by the current transaction
! *   Cmin != my-command &&		but not by this command, and
   *  (Xmax is null ||			the row has not been deleted, or
   *   (Xmax == my-transaction &&	it was deleted by the current transaction
! *    Cmax != my-command)))		but not by this command,
   * ||					or
   *
   *  (Xmin is committed &&		the row was modified by a committed transaction, and
   *   (Xmax is null ||			the row has not been deleted, or
   *    (Xmax == my-transaction &&	the row is being deleted by this command, or
! *     Cmax == my-command) ||
   *    (Xmax is not committed &&	the row was deleted by another transaction
   *     Xmax != my-transaction		that has not been committed
   *
--- 254,269 ----
   * The satisfaction of "now" requires the following:
   *
   * ((Xmin == my-transaction &&		changed by the current transaction
! *   Cmin < my-command &&		by an earlier command than this scan, and
   *  (Xmax is null ||			the row has not been deleted, or
   *   (Xmax == my-transaction &&	it was deleted by the current transaction
! *    Cmax >= my-command)))		but not by a command before this scan
   * ||					or
   *
   *  (Xmin is committed &&		the row was modified by a committed transaction, and
   *   (Xmax is null ||			the row has not been deleted, or
   *    (Xmax == my-transaction &&	the row is being deleted by an earlier command
! *     Cmax >= my-command) ||
   *    (Xmax is not committed &&	the row was deleted by another transaction
   *     Xmax != my-transaction		that has not been committed
   *

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [HACKERS] polite request about syntax
well, ain't that surprising to see so many open-source developers living in denial and suggesting RTFM rather than actually coping with the problem? are you to be taken seriously? As a C programmer, I'm in the same league as most of you guys, so while i can really contribute code and my talk is cheap, it's the best i can do: bug you with feature requests. So:

On 9/15/06, Andrew Dunstan <[EMAIL PROTECTED]> wrote:
> Where is the mention of either of these on the CREATE FUNCTION page?
> http://www.postgresql.org/docs/8.1/interactive/sql-createfunction.html
> And anyway, what is your actual complaint?

my complaint is that, like i said, "timestamp with time zone" is no good substitute for a simple "datetime". Here, someone suggested a CREATE DOMAIN to create an alias for it. Why isn't it provided there out-of-the-box by default? So you have the SQL standard timestamp and the industry standard datetime.

and, while $$ is a whole lot better than '', why do we really need these? Why not a normal, simple, begin end block or {}? People in the industry don't like hacks and the open-source world is full of them, though it's getting better.

I think this is all valid criticism, but you wanna play deaf, that's up to you guys. cheers

and Tom, i don't really want a GUI: psql's use of GNU readline autocompletion is far better than M$'s stupid Query Analyzer standard editor, and matching it up with vim gets better yet.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
> From: Tom Lane [mailto:[EMAIL PROTECTED] > "Albe Laurenz" <[EMAIL PROTECTED]> writes: > > Up to now you have built against the static libpq.a > > I didn't add the right -blibpath to this patch that > > failed for you - the broken initdb is dynamically linked > > but does not know where to look for its shared library. > > > The patch I just submitted to pgsql-patches should take > > care of that. It makes the executables look in --libdir. > > Mmm ... what of "make check"'s temporary installation? We need > to have the executables search in the temporary install's libdir, > *before* looking in the configured --libdir (which could easily > contain an incompatible back-version libpq ...) > > pg_regress normally tries to handle this by setting LD_LIBRARY_PATH > ... does AIX use that or a similar symbol? > The "make check" was successful in my previous testing of the last patch, so it appears that AIX was paying attention to LD_LIBRARY_PATH. I am testing the new version of the patch now, so will report back shortly. Thanks, -rocco ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
"Albe Laurenz" <[EMAIL PROTECTED]> writes: > Up to now you have built against the static libpq.a > I didn't add the right -blibpath to this patch that > failed for you - the broken initdb is dynamically linked > but does not know where to look for its shared library. > The patch I just submitted to pgsql-patches should take > care of that. It makes the executables look in --libdir. Mmm ... what of "make check"'s temporary installation? We need to have the executables search in the temporary install's libdir, *before* looking in the configured --libdir (which could easily contain an incompatible back-version libpq ...) pg_regress normally tries to handle this by setting LD_LIBRARY_PATH ... does AIX use that or a similar symbol? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] New version of money type
On 2006-09-15, "D'Arcy J.M. Cain" wrote:
> On Fri, 15 Sep 2006 10:17:55 -
> Andrew - Supernews <[EMAIL PROTECTED]> wrote:
>> Presumably the same speed as bigint, which is to say that while it is
>> faster than numeric for calculation, it is (much) slower for input/output.
>> (The difference in speed between bigint output and numeric output is
>> measured in multiples, not in percentages.)
>
> I/O for money seems at least comparable to numeric, if not slightly
> better.

Seems? Have you benchmarked it?

> Other than that it has all the speed advantages of bigint, for
> basically the same reasons. It's basically bigint with modified input
> and output functions.

The point is that bigint is _not_ faster than numeric for I/O (in fact even integer is not faster than numeric for output).

Numbers from an actual benchmark:

int4out(0) - 0.42us/call
numeric_out(0) - 0.32us/call

int4out(10) - 0.67us/call
numeric_out(10) - 0.42us/call

For numbers at the top end of bigint's range, the speed difference is on the order of 4x (albeit on my 32-bit machine).

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
Rocco Altier wrote: > Here is the working one: > initdb needs: > /usr/lib/libc.a(shr.o) > /unix > /usr/lib/libcrypt.a(shr.o) > > Here is the broken one: > initdb needs: > ../../../src/interfaces/libpq/libpq.so > /usr/lib/libc.a(shr.o) > /usr/lib/librtl.a(shr.o) > /unix > /usr/lib/libcrypt.a(shr.o) > > When run it shows: > exec(): 0509-036 Cannot load program initdb because of the following > errors: > 0509-150 Dependent module libpq.so could not be loaded. > 0509-022 Cannot load module libpq.so. > 0509-026 System error: A file or directory in the > path name does > not exist. Yup, that's as expected :^) Up to now you have built against the static libpq.a I didn't add the right -blibpath to this patch that failed for you - the broken initdb is dynamically linked but does not know where to look for its shared library. The patch I just submitted to pgsql-patches should take care of that. It makes the executables look in --libdir. Yours, Laurenz Albe ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] New version of money type
On Thu, 14 Sep 2006 14:12:30 -0400 AgentM <[EMAIL PROTECTED]> wrote:
> If you force the locale into the money type, then the entire column
> must be of the same currency. That seems like an unnecessary
> limitation. Does your type support banker's rounding?

The whole point of money is to have a high-speed type suitable for accounting apps. I had an application that used money that we had to switch to numeric due to the size limitation. When we did, we saw a dramatic degradation in performance. The app was a gift card system that tracked card balances. A card might have hundreds of transactions, and one client might have millions of cards. We had to sum all of those transactions grouped by card. It would have been great to have been able to keep the original money type, but total sales broke the limit.

We use rint(), same as the previous version. I know that that isn't precisely banker's rounding. I think that those special rules would have to be handled in code. In that environment you would probably want to do that for auditing (code and otherwise) purposes.

-- 
D'Arcy J.M. Cain | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
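For reference, the round-half-to-even ("banker's") rule raised above looks like this. This is an illustrative Python sketch using Decimal (the function name is hypothetical); it is not the money type's actual C code, which as noted uses rint().

```python
from decimal import Decimal, ROUND_HALF_EVEN

def to_cents(amount_str):
    # Banker's rounding: exact halves round toward the even cent, so
    # runs of .xx5 amounts don't accumulate a systematic upward bias.
    # Decimal input avoids binary-float artifacts on values like "2.675".
    cents = Decimal(amount_str).quantize(Decimal("0.01"),
                                         rounding=ROUND_HALF_EVEN)
    return int(cents * 100)
```

For example, 1.005 rounds down to 1.00 (the 0 cent is even) while 1.015 rounds up to 1.02, which is the behaviour auditors usually expect from "banker's rounding".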
Re: [HACKERS] New version of money type
On Fri, 15 Sep 2006 10:17:55 - Andrew - Supernews <[EMAIL PROTECTED]> wrote:
> Presumably the same speed as bigint, which is to say that while it is
> faster than numeric for calculation, it is (much) slower for input/output.
> (The difference in speed between bigint output and numeric output is
> measured in multiples, not in percentages.)

I/O for money seems at least comparable to numeric, if not slightly better. Other than that it has all the speed advantages of bigint, for basically the same reasons. It's basically bigint with modified input and output functions.

-- 
D'Arcy J.M. Cain | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] plpgsql, return can contains any expression
"Pavel Stehule" <[EMAIL PROTECTED]> writes: >> This patch doesn't seem to cope with cases where the supplied tuple has >> the wrong number of columns, and it doesn't look like it's being careful >> about dropped columns either. Also, that's a mighty bizarre-looking >> choice of cache memory context in coerce_to_tuple ... but then again, >> why are you bothering with a cache at all for temporary arrays? > I am sorry, Tom. But I don't understand. I can check number of columns, > ofcourse and I'll do it. What cache for temporary arrays do you mean? I thought that making coerce_to_tuple depend on estate->err_func was pretty bizarre, and that there was no need for any "cache" at all for arrays that need only live as long as the function runs. All you are saving here is a palloc/pfree cycle, which is not worth the obscurantism and risk of bugs (are you sure natts can never change?). BTW, if you want this patch to make it into 8.2, it needs to be fixed and resubmitted *very* soon. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [HACKERS] Linking on AIX (Was: Fix linking of OpenLDAP libraries )
> I suspect that up to now the buildfarm had a static build of > PostgreSQL. What is the output of 'ldd initdb' when it builds > and runs correctly? > > Is libpq.so in a non-standard directory? If yes, one either > has to export LIBPATH in the environment or link with > -L/location/of/libpq for the executable to find it > (similar to RPATH in Linux). > Here is the working one: initdb needs: /usr/lib/libc.a(shr.o) /unix /usr/lib/libcrypt.a(shr.o) Here is the broken one: initdb needs: ../../../src/interfaces/libpq/libpq.so /usr/lib/libc.a(shr.o) /usr/lib/librtl.a(shr.o) /unix /usr/lib/libcrypt.a(shr.o) When run it shows: exec(): 0509-036 Cannot load program initdb because of the following errors: 0509-150 Dependent module libpq.so could not be loaded. 0509-022 Cannot load module libpq.so. 0509-026 System error: A file or directory in the path name does not exist. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Release notes
Gregory Stark <[EMAIL PROTECTED]> writes:
> Well we could make it "edit release.txt" which someone will fix up and turn
> into release.sgml later instead.
> I think if you put a big enough separator between entries, say two blank
> lines, two dashes, and two more blank lines, it wouldn't even cause merge
> conflicts if it failed -- it would just insert the new entry in the "wrong"
> place which wouldn't really matter.
> Or you could have a release-notes directory and create a small text file in
> there for each major patch.

Andrew had the correct perspective on this: if someone wants a different release note process, and is willing to expend their *own* cycles on it, go to it. If the intention is to try to force the existing committers to expend extra effort for a process change they do not particularly believe in, don't be surprised by a lack of cooperation.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] polite request about syntax
Ricardo Malafaia wrote:
> I've sent the comment below to the documentation page about CREATE
> FUNCTION, but it got (rightfully) rejected, since it doesn't really add
> to the discussion and is more of a request about syntax. So, here it
> goes: Sorry, but "datetime" vs "timestamp with time zone"?!

Where is the mention of either of these on the CREATE FUNCTION page? And anyway, what is your actual complaint? You complain about something but you expect us to divine what your actual beef is.

> And what about the whole function text between $$'s? Yes, better than
> the '' of some time ago, since we don't have to put string literals in
> the function text between 's! still...

still what? Same deal. You expect us to read your brainwaves to discover what is bugging you.

> Why are open-source developers so masochistic? I want to use PostgreSQL,
> Linux, Python and others in the firm I work at, but this way I can't
> really sell them the idea of moving on to something which is harder,
> clunkier and less supported than Oracle or M$SQL... Right now we're
> moving from M$SQL to Oracle and this is why I came here, to see how
> things are going. Well, i guess i'll try again in five years or so.
> Please, take this as constructive criticism, since i'm a proud
> open-source supporter... i would gladly use PostgreSQL at work, clunkier
> syntax or not, but it's otherwise difficult to sell it to my
> Windows-minded coworkers...

If you're a proud open source supporter, then contribute. Whining about things you don't like is not the same as support. Talk is cheap.

cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] polite request about syntax
"Ricardo Malafaia" <[EMAIL PROTECTED]> writes:
> Sorry, but "datetime" vs "timestamp with time zone"?! And what about
> the whole function text between $$'s? Yes, better than the '' of some
> time ago, since we don't have to put string literals in the function
> text between 's! still...

"timestamp with time zone" is required by the SQL standard. If you'd like to use "datetime" as an alias for it, a quick CREATE DOMAIN will do that for you. As for the $$ bit, do you have a constructive suggestion?

> Why are open-source developers so masochistic?

You seem to be confusing a database with a point-and-drool GUI. We have those too (see pgAdmin for instance, or three or four others), but raw psql is not designed to make Windows users happy.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] Release notes
Tom Lane <[EMAIL PROTECTED]> writes:
> Well, I'm willing to (and I think usually have) put release-note-grade
> descriptions into commit log messages, but I'm not willing to add "edit
> release.sgml" to the already long process, for two basic reasons:
>
> * it'd make release.sgml into a commit bottleneck --- if everyone is
> doing it this way, everyone's local copy of the file would be constantly
> out of date, and merge conflicts would be an everyday problem.
>
> * correct SGML markup is a PITA.
>
> If *someone else* wants to troll the commit logs every so often and make
> entries into release.sgml, that's fine with me. But I don't have the
> bandwidth.

Well, we could make it "edit release.txt" which someone will fix up and turn into release.sgml later instead.

I think if you put a big enough separator between entries, say two blank lines, two dashes, and two more blank lines, it wouldn't even cause merge conflicts if it failed -- it would just insert the new entry in the "wrong" place, which wouldn't really matter.

Or you could have a release-notes directory and create a small text file in there for each major patch.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] polite request about syntax
Please, take this as constructive criticism, since i'm a proud open-source supporter... i would gladly use PostgreSQL at work, clunkier syntax or not, but it's otherwise difficult to sell it to my Windows-minded coworkers... I would love to take this as constructive criticism, but you haven't provided any. You don't like $$, o.k. do you have a better solution? What is the problem with timestamp? Joshua D. Drake best regards ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster