Re: [HACKERS] Ranges for well-ordered types
Thanks to everyone for the feedback that I've received so far. It's clear that there's interest in this.

On Jun 12, 2006, at 3:22, Josh Berkus wrote: I do think Jim is right, though, in that we may want to look for portions of the functionality which are achievable in the context of one PostgreSQL version, unless you're going to be working full-time on this patch.

I definitely agree with implementing it in parts. I doubt it's possible, but perhaps a first bit might make it into 8.2 :)

In real-world calendaring applications, I *certainly* see the need for a successor function. However, that would require being able to define timestamps with a variable precision, e.g. TIMESTAMP('5 minutes'). This, by itself, would be a significant effort, yet useful ... maybe that's where to start?

As mentioned in an earlier email, I think calendaring applications in particular would benefit from timestamp precisions coarser than 1 second, e.g., TIMESTAMP('5 minutes') or TIMESTAMP('1 hour'). However, I think this is a thorny problem. To elaborate, I believe the precision has to be relative to some baseline. From 12:00, 30-minute precision would presumably allow 12:00, 12:30, 13:00, 13:30, and so on. Precision of '1 hour' would allow 12:00, 13:00, 14:00, and so on. But these are relative to the time zone they're in. While 12:00 in Tokyo (+9) would be a timestamp value with 1-hour precision, that same timestamp is 6:30 in Tehran (+3:30), if I got the math right. Is 6:30 a timestamp value with 1-hour precision? Because of this, I think timestamp precisions coarser than 1 second (i.e., coarser than timestamp(0)) require storing the time zone as part of the timestamp value.

Pushing this even further, would we allow arbitrary precision? For example, would 45-minute precision be allowed? In that case, I believe we'd need to go further than storing just the time zone with the timestamp value. The timestamp value would have to be relative to some baseline timestamp to be able to calculate whether or not the difference between any particular timestamp and the baseline timestamp is integral. Perhaps this could be accomplished using domains and some additional checking function? I'm not sure. It's enough to make me want to forget arbitrary precision and instead disallow any precision that does not divide evenly into the next larger time unit: any precision between 0 seconds and 1 minute would have to be a number of seconds that divides evenly into 60; between 1 hour and 1 day, the precision would have to be one of 1, 2, 3, 4, 6, 8, or 12 hours.

I've been able to discuss the issue of timestamp precision without bringing up successor functions or ranges at all, and indeed I think it's orthogonal to the range implementation. I think they're both concepts that should be included in PostgreSQL, but as for myself, I'm more interested in the range implementation than the timestamp precision issue.

By the way, anyone care to weigh in on what term we should use when discussing this? Josh has used PERIOD. Should we go with that for now?

A somewhat related issue: would we want any implementation to follow (at least part of) the not-yet-standard SQL/Temporal draft? Or would it be more desirable to steer clear of using any terms/syntax included in the draft, in an attempt to prevent any possible conflict with a future SQL spec?

You're probably going to have to give up on B-Tree indexes for PERIODs, and look towards GiST. For one thing, I would see UNIQUE in the context of a PERIOD defined as non-overlapping.
e.g.:

I think that a non-overlapping constraint goes above and beyond what UNIQUE requires. In my opinion, UNIQUE should test for equality, rather than non-overlap, as that keeps the meaning of UNIQUE consistent across all types and may actually be useful in some instances. I do think it would be convenient to have some sort of syntax that would provide a non-overlapping constraint rather than having to code up a constraint trigger every time you wanted to do this. As Martijn pointed out, once GiST can be used for a UNIQUE constraint, we should be able to define the non-overlapping constraint quite easily.

So this could be thought of as a third orthogonal issue for ranges, the first two being the range type constructor and timestamp precisions coarser than 1 second. Any one of these three could be done independently and improve PostgreSQL. In combination they are definitely a very nice package.

On Jun 13, 2006, at 13:25, Bruno Wolff III wrote: Date ranges are really closed-open as well (as finite sets of isolated points are both open and closed). The only oddity would be that the date used to indicate the open end of the range might not be what the user expects.

I think it's definitely a matter of interpretation. [2006-01-01, 2006-12-31] and [2006-01-01, 2007-01-01) both include the same days. Who's to say which is the real one?
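[Editor's note: to make the constraint-trigger workaround mentioned above concrete, here is a minimal sketch. The table, column, and trigger names are hypothetical, and the half-open [start, end) convention from the closed-open discussion is assumed. Unlike a GiST-backed constraint, a trigger like this is not safe against concurrent insertions unless the table is locked.]

    create table booking (
        id        serial primary key,
        room      integer not null,
        starts_at timestamp not null,
        ends_at   timestamp not null,
        check (starts_at < ends_at)        -- half-open: ends_at excluded
    );

    create function booking_no_overlap() returns trigger as $$
    begin
        -- look for any other booking of the same room whose half-open
        -- period intersects ours
        perform 1 from booking
         where room = new.room
           and id <> new.id
           and starts_at < new.ends_at
           and ends_at   > new.starts_at;
        if found then
            raise exception 'overlapping period for room %', new.room;
        end if;
        return new;
    end;
    $$ language plpgsql;

    create trigger booking_no_overlap
        after insert or update on booking
        for each row execute procedure booking_no_overlap();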
Re: [HACKERS] postgresql and process titles
On Tue, Jun 13, 2006 at 10:36:27PM -0400, Tom Lane wrote: * not clear whether the interrupt happens when waiting for I/O. I already mentioned that this would be a problem for EXPLAIN ANALYZE, but it might be no big deal for the other uses.

If we're going by UNIX signal semantics, in theory any interruptible function (as listed on the signal manpage) would be interrupted every N milliseconds to handle the interrupt. So yes, if possible the interrupt will be serviced anyway, even during I/O, by aborting and restarting the I/O continuously. Not terribly efficient.

With one big exception: sometimes I/O is non-interruptible (the good old D state). In this case the interrupts will happen but will simply be queued, and in fact will all be dropped except the last one (non-realtime signals are never stacked). The handler will probably be called the moment the process returns to user space.

Note: Do we want SIGPROF, SIGALRM or SIGVTALRM? There's apparently a distinction, see: http://en.wikipedia.org/wiki/SIGPROF

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
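[Editor's note: a minimal C sketch of the timer/signal distinction being discussed; the interval and tick count are arbitrary. ITIMER_PROF counts user+system CPU time and delivers SIGPROF, ITIMER_VIRTUAL counts user CPU time and delivers SIGVTALRM, and ITIMER_REAL counts wall-clock time and delivers SIGALRM. Whether a slow system call is restarted or fails with EINTR, as described above, depends on whether the handler is installed with SA_RESTART.]

    #include <signal.h>
    #include <stdio.h>
    #include <sys/time.h>

    static volatile sig_atomic_t ticks = 0;

    static void
    on_tick(int signo)
    {
        ticks++;                /* async-signal-safe: just count */
    }

    int
    main(void)
    {
        struct sigaction sa;
        struct itimerval it;

        sa.sa_handler = on_tick;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;        /* no SA_RESTART: syscalls can see EINTR */
        sigaction(SIGPROF, &sa, NULL);

        it.it_interval.tv_sec = 0;
        it.it_interval.tv_usec = 10000;   /* every 10 ms of CPU time */
        it.it_value = it.it_interval;
        setitimer(ITIMER_PROF, &it, NULL);

        while (ticks < 100)
            ;                   /* burn CPU so ITIMER_PROF advances */
        printf("got %d profiling ticks\n", (int) ticks);
        return 0;
    }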
Re: [HACKERS] CSV mode option for pg_dump
Tom Lane said:

Andrew Dunstan [EMAIL PROTECTED] writes: This is just nonsense. There is not the slightest reason that CSV data cannot be embedded in a text dump nor exist as the data members of a custom or tar dump with the corresponding COPY statements modified accordingly.

Well, the really *core* question here is whether we trust the stability of the CSV format definition (and code) enough to want to rely on it for data dump/restore purposes. I'm still a few years away from that, myself. AFAICT the raison d'etre of the CSV code is to emit whatever it takes to satisfy this, that, and the other broken Microsoft application. That's fine as an export tool, but as a dump/reload tool, nyet. If you put it in pg_dump you're just handing neophytes another foot-gun.

Well, I'm staggered. Really. The CSV definition and its lack of formality is a complete red herring in this, as are references to Microsoft. The real issue in what you say here is your suggestion that we might not be able to reproduce the input in some cases via a COPY CSV round trip. If that is so, it's a bug of our (probably my) making, and must be fixed. I assert that it is not so. In fact, all the tests I did during development and since were premised on recovering the input exactly. The only CSV option that breaks things in that respect is FORCE NOT NULL, and it is designed for data coming in from a non-Postgres source, so it's not relevant here. Even FORCE QUOTE won't break things, because it never quotes a null value, and the only semantic significance to us of quoting is the null-ness of the value.

If the code is broken then it should be discoverable by test or code analysis. There is no need to refer to any other application or standard. So if you or anyone think there is a case that will not reproduce the data exactly when the same CSV options are used for output and input, I challenge you or them to provide a single example. You say you're a few years away from trusting the code. Well, it's not so huge that it's beyond analysis, and I'll be happy to explain anything that puzzles you.

Perhaps more importantly, it has been in use now for 18 months. We discovered one problem with embedded line feeds very early in the 8.0 release cycle, and fixed it. After that I have not heard of a single problem. And I assure you this code is widely used. It probably isn't used much as a round-trip mechanism, probably in part because we haven't provided it as a pg_dump option. So maybe we have a chicken/egg scenario here. We do have some round-trip regression tests in the copy test, and those can be beefed up if necessary to increase your confidence level.

I'm happy to debate details, but general assertions of we can't trust this code don't seem worth much to me.

cheers

andrew
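[Editor's note: the kind of round-trip check being described could look like the following minimal sketch. The table and file names are made up, and COPY to a server-side file requires superuser. Both EXCEPT ALL queries should return zero rows if the round trip is exact, NULL-ness included.]

    create table rt_src (a text, b text);
    insert into rt_src values ('plain', 'with,comma');
    insert into rt_src values ('quote"inside', 'line
    break');
    insert into rt_src values (NULL, '');

    copy rt_src to '/tmp/rt.csv' with csv;

    create table rt_dst (a text, b text);
    copy rt_dst from '/tmp/rt.csv' with csv;

    -- both should return no rows
    select * from rt_src except all select * from rt_dst;
    select * from rt_dst except all select * from rt_src;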
Re: [HACKERS] CSV mode option for pg_dump
I agree with Andrew. If there are bugs in CSV, then we should fix them, not avoid giving users this usability improvement. I know I have analyzed the code and fixed a few problems with it in the past.

As far as pg_dump, I am thinking the most flexible solution would be to allow an arbitrary WITH clause to be added to COPY, like:

    pg_dump --copy-with CSV

This would allow not only CSV control, but specification of non-CSV options if desired.

--- Andrew Dunstan wrote: [...] I'm happy to debate details, but general assertions of we can't trust this code don't seem worth much to me.

-- Bruce Momjian http://candle.pha.pa.us
Re: [HACKERS] [PATCHES] PL/pgSQL: SELECT INTO EXACT
I did some work on your patch:

    ftp://candle.pha.pa.us/pub/postgresql/mypatches/first

I switched the name of the option flag to FIRST (already a reserved word), making the default behavior PL/SQL-compatible. I also added the proper exceptions to match PL/SQL. My Oracle 9 PL/SQL manual has this for SELECT INTO:

When you use a SELECT INTO statement without the BULK COLLECT clause, it should return only one row. If it returns more than one row, PL/SQL raises the predefined exception TOO_MANY_ROWS. However, if no rows are returned, PL/SQL raises NO_DATA_FOUND unless the SELECT statement called a SQL aggregate function such as AVG or SUM. (SQL aggregate functions always return a value or a null. So, a SELECT INTO statement that calls an aggregate function never raises NO_DATA_FOUND.)

The big problem is that a lot of applications use the SELECT INTO ... IF NOT FOUND test, and I don't see any good way to keep those applications working without being modified. The #option keyword seems as bad as just giving up on being PL/SQL-compatible and using the keyword STRICT (already a reserved word) when you want the PL/SQL functionality. I don't think a GUC is going to work, because it will affect all functions stored in the database, and there might be functions expecting different behaviors. Setting the GUC in the function that needs it also will not work, because it will spill into functions called by that function.

I think we set up SELECT INTO this way originally because we didn't have exceptions, but now that we have them, I don't see a clean way to move to the PL/SQL behavior. Perhaps STRICT is the best option. Comments?

--- Matt Miller wrote:

On Mon, 2005-08-08 at 17:18 -0400, Tom Lane wrote: Matt Miller [EMAIL PROTECTED] writes: On Fri, 2005-07-29 at 17:52 -0400, Tom Lane wrote: I dislike the choice of EXACT, too, as it (a) adds a new reserved word and (b) doesn't seem to convey quite what is happening anyway. Not sure about a better word though ... anyone?

I can attach a patch that supports [EXACT | NOEXACT].

Somehow, proposing two new reserved words instead of one doesn't seem very responsive to my gripe :-(.

My intention was to introduce the idea that the current behavior should be changed, and to then suggest a path that eventually eliminates all the new reserved words.

If you think that this should be a global option instead of a per-statement one, something like the (undocumented) #option hack might be a good way to specify it; that would give it per-function scope, which seems reasonable.

    create function myfn(...) returns ... as $$
    #option select_into_1_row
    declare ...
    $$ language plpgsql;

Thanks, I'll take a look at this.
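[Editor's note: for concreteness, a sketch of how the STRICT option under discussion would read in PL/pgSQL. The syntax and the condition names are the proposal, not something that exists at this point, and the accounts table is made up.]

    create function get_balance(p_id integer) returns numeric as $$
    declare
        v_balance numeric;
    begin
        -- error unless exactly one row comes back, as in PL/SQL
        select balance into strict v_balance
          from accounts where id = p_id;
        return v_balance;
    exception
        when no_data_found then
            raise exception 'no account %', p_id;
        when too_many_rows then
            raise exception 'account % is not unique', p_id;
    end;
    $$ language plpgsql;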
Re: [HACKERS] CSV mode option for pg_dump
Bruce Momjian pgman@candle.pha.pa.us writes: I agree with Andrew. If there are bugs in CSV, then we should fix them, not avoid giving users this usability improvement.

The case for it being a usability improvement seems very weak to me; no one has yet demonstrated an actual use-case where someone would pull CSV data out of pg_dump output instead of just dumping the table directly with COPY. Now the anti case is admittedly hypothetical: I'm supposing that we will eventually be bitten by portability problems with CSV-style dumps not being loadable into future versions. But given the weak nature of the pro case, I think we should be conservative and not take that risk.

regards, tom lane
Re: [HACKERS] CSV mode option for pg_dump
Tom Lane wrote: The case for it being a usability improvement seems very weak to me; no one has yet demonstrated an actual use-case where someone would pull CSV data out of pg_dump output instead of just dumping the table directly with COPY. [...] I think we should be conservative and not take that risk.

Well, I saw little request for COPY CSV here, but IRC users were reporting a lot of interest, and feedback from the release that added it showed it was a major feature, so just because we haven't seen a use-case here doesn't mean it doesn't exist. As was stated before, the use-case for this is by people we don't normally have contact with.

-- Bruce Momjian http://candle.pha.pa.us
Re: [HACKERS] CSV mode option for pg_dump
Bruce Momjian pgman@candle.pha.pa.us writes: As was stated before, the use-case for this is by people we don't normally have contact with.

Or who don't exist. Once again: give me a plausible use-case for pulling this data out of a pg_dump output file. You'd need data-massaging code anyway just to extract the data, so why not expect that code to convert to CSV or whatever other format you might want? If you can think of use-cases like this, why do you think the destination format is necessarily CSV and not something else? If it is something else, adding a CSV option to pg_dump makes it *harder*, not easier, to write that massaging code, because now it's got to cope with N dump formats, not one.

regards, tom lane
Re: [HACKERS] CSV mode option for pg_dump
Tom Lane wrote: Or who don't exist. Once again: give me a plausible use-case for pulling this data out of a pg_dump output file. [...] adding a CSV option to pg_dump makes it *harder*, not easier, to write that massaging code, because now it's got to cope with N dump formats, not one.

I don't have to think of a use case. I trusted the people who said we needed CSV, so I trust them again if they say doing pg_dump with CSV would be a good idea. Also, my suggestion of --copy-with would allow CSV and other format modifications with minimal code and complexity.

-- Bruce Momjian http://candle.pha.pa.us
Re: [HACKERS] postgresql and process titles
Martijn van Oosterhout kleptog@svana.org writes: With one big exception: sometimes I/O is non-interruptible (the good old D state). In this case the interrupts will happen but will simply be queued, and in fact will all be dropped except the last one (non-realtime signals are never stacked). The handler will probably be called the moment it returns to user-space.

If backends store their current status in shared memory, then a separate process entirely can receive the interrupts, scan through the shared-memory process states, and do the accounting. It would never be interrupting I/O, since that process would never be doing I/O. It could use real-time signals, reads on /dev/rtc, or whatever other methods exist for scheduling periodic events, since it would be doing nothing but handling these interrupts.

The neat thing about this is that it would be possible to look at a process from outside instead of having to hijack the application to get feedback. You could find information like total time spent in I/O versus CPU, aggregated across all queries from a single backend or for all backends.

The downside is that to handle things like EXPLAIN ANALYZE you would have to provide enough information about plans to this accounting process for it to store the information. Plans are currently purely local to the backend running them. Perhaps it would be enough to provide a unique id for the plan (a sequential id per backend would be enough) and a unique id for the plan node. The accounting process wouldn't know anything more about what that node represented, but the backend could later query it to fetch the information and associate it with the plan.

-- greg
Re: [HACKERS] CSV mode option for pg_dump
On Tue, Jun 13, 2006 at 09:14:48AM -0700, Joshua D. Drake wrote: I don't get it. If you can use psql then you already have csv support.

    psql -c 'COPY pg_class TO STDOUT WITH CSV' postgres > pg_class.csv

If your data looks like this:

    foo barbaz bing

you are o.k. - you have three columns, tab delimited. However, if your data looks like this:

    foo bar baz bing

you have a problem.

I'm not exactly sure what you're getting at here:

    postgres=# create temp table x (a text, b text, c text);
    CREATE TABLE
    postgres=# insert into x values ('blah', 'hello world', 'hmmm
    postgres'# ');
    INSERT 0 1
    postgres=# \copy x to stdout csv
    blah,hello world,"hmmm
    "

An alternative, although I don't know what kind of headaches it would cause, is to have a text delimiter as well as a field delimiter, e.g.:

PostgreSQL already quotes CSV fields as required, so I'm not sure what you're asking here...

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Re: [HACKERS] CSV mode option for pg_dump
On Wed, Jun 14, 2006 at 10:55:04AM -0400, Bruce Momjian wrote: Once again: give me a plausible use-case for pulling this data out of a pg_dump output file. You'd need data-massaging code anyway just to extract the data, so why not expect that code to convert to CSV or whatever other format you might want? If you can think of use-cases like this, why do you think the destination format is necessarily CSV and not something else? If it is something else, adding a CSV option to pg_dump makes it *harder*, not easier, to write that massaging code, because now it's got to cope with N dump formats, not one.

I don't have to think of a use case. I trusted the people who said we needed CSV, so I trust them again if they say doing pg_dump with CSV would be a good idea.

I think the point is that it's hard to imagine a use case for CSV support in pg_dump that isn't already better served by using psql. If you add it to pg_dump, people have to write special code to extract it anyway, so why can't they handle the tab-delimited format themselves? OTOH, if you use psql you get straightforward CSV with no garbage at the beginning or end - just straight, raw CSV:

    $ psql postgres -c '\copy pg_namespace to stdout csv'
    pg_toast,10,
    pg_temp_1,10,
    pg_catalog,10,{kleptog=UC/kleptog,=U/kleptog}
    public,10,{kleptog=UC/kleptog,=UC/kleptog}
    information_schema,10,{kleptog=UC/kleptog,=U/kleptog}

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Re: [HACKERS] CSV mode option for pg_dump
Andrew Dunstan [EMAIL PROTECTED] writes: The CSV definition and its lack of formality is a complete red herring in this, as are references to Microsoft.

I think then you've missed the real issue.

The real issue in what you say here is your suggestion that we might not be able to reproduce the input in some cases via a COPY CSV round trip.

No, that's not the suggestion. The question is: what do you do when the next version of Excel comes out and can't parse your CSV files any more? If you change your format to match, then you won't be able to parse old pg_dump files any more. You'll end up with an option for each CSV variant, and then how does the user know which option to use to read a given pg_dump archive? And before you ask: yes, virtually every version of Excel has changed its CSV file format.

I think the problem here is that pg_dump is serving double duty as a postgres-to-postgres tool, with some people using it as a kind of batch COPY. What benefit does it buy you over a script that runs COPY for each table, if that's what you want?

-- greg
Re: [HACKERS] CSV mode option for pg_dump
Bruce Momjian wrote: As was stated before, the use-case for this is by people we don't normally have contact with.

I do think we need a use case for what we do. The main use case seems to me to be where you are exporting a whole database, or most of it, with a very large number of tables, and it is convenient to have all the CSVs created for you rather than have to make them manually one at a time. You could get these out of, say, a tar format dump very easily.

I said near the beginning of this that a pgfoundry project to create a tool for this might be an alternative way to go. If that's the consensus then OK. I just bristle a bit at the suggestion that we might not get back what we started with from a CSV dump, because we can, AFAIK.

cheers andrew
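[Editor's note: for comparison, the "script that runs COPY for each table" alternative raised earlier in the thread can be as small as this sketch; the database name and output path are placeholders.]

    #!/bin/sh
    # dump every table in the public schema of database "mydb" to CSV
    for t in $(psql -At -c "select tablename from pg_tables
                            where schemaname = 'public'" mydb)
    do
        psql -c "\\copy $t to '/backup/$t.csv' csv" mydb
    done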
Re: [HACKERS] CSV mode option for pg_dump
Andrew Dunstan wrote: I do think we need a use case for what we do. The main use case seems to me to be where you are exporting a whole database, or most of it, with a very large number of tables, and it is convenient to have all the CSVs created for you rather than have to make them manually one at a time. [...] I just bristle a bit at the suggestion that we might not get back what we started with from a CSV dump, because we can, AFAIK.

For me, the use case would be: what format do I want a dump in if it is for long-term storage? Do I want it in a PostgreSQL-native format, or in a more universal format that can be loaded into PostgreSQL tomorrow, and perhaps loaded into some other database, with modification, ten years from now? I just had that issue on my home system for file system backups, going from cpio to ustar (POSIX.1-1988 / IEEE Std 1003.2), but it seems that POSIX.1-2001 would be best if my operating system supported it. (Perhaps cpio was better?)

Anyway, I never thought there would be a large demand for COPY CSV, but obviously there is, so I have concluded that other people's environments and skills are different enough from my own that I am willing to accept the idea there is a use case when I don't understand it. I will let the people who work in those environments make that decision.

-- Bruce Momjian http://candle.pha.pa.us
Re: [HACKERS] CSV mode option for pg_dump
Greg Stark wrote: And before you ask: yes, virtually every version of Excel has changed its CSV file format.

I'd be interested to hear of Excel variants we haven't catered for - our CSV parsing is some of the most flexible and complete around, IMNSHO. BTW, I suspect we won't see too much movement on this front in future, given the moves on the XML front.

cheers andrew
Re: [HACKERS] Question about Ctrl-C and less
Modified version of this patch applied by Tom.

--- Martijn van Oosterhout wrote:

On Sun, Oct 16, 2005 at 04:06:04PM -0400, Andrew Dunstan wrote: Martijn, let's see the patch. I assume it should be fairly small. If we could get it in early in the 8.2 cycle we would have plenty of time to bang on it. In principle this sounds reasonable to me, but psql can be broken quite easily - I know, as I've done it a couple of times ;-)

Very well, patch attached. It's quite simple, actually. However, there seems to be some push back from people suggesting that jumping back to the main loop before the pager has quit is not buggy behaviour. Assuming that a ^C will kill the pager is just folly.

Tom asked if we should be blocking SIGQUIT and SIGHUP too. Standard procedure for spawning external interactive processes includes blocking SIGQUIT too (see the system() manpage). Logically speaking, when users send an interrupt from the keyboard they expect to interrupt the currently active *interactive* process. Hence, once psql has spawned the pager, it should ignore such interrupts until control is returned (after pclose). So yes, I would suggest blocking SIGQUIT also, if only to prevent terminal corruption problems. Interactive programs like less trap SIGQUIT to restore the terminal settings on exit, but they exit anyway.

SIGHUP is different. It is sent by the system, not the user, indicating the terminal has disconnected. psql is not a daemon expecting to survive that, and hence should quit as normal, along with all other processes on that terminal.

If this isn't going to make 8.1 anyway, I'd probably go for a patch that got rid of the longjmp altogether (if people will accept that), fixing the memory leaks at the same time. At the moment I'm just looking for a consensus that it's a problem to be solved.

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
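[Editor's note: a minimal C sketch of the signal handling described above - not the actual psql patch. It ignores keyboard interrupts while the spawned pager owns the terminal and restores them after pclose(); the pager command and text are placeholders.]

    #include <signal.h>
    #include <stdio.h>

    static void
    run_pager(const char *pager, const char *text)
    {
        struct sigaction ign, oldint, oldquit;
        FILE *pipe;

        ign.sa_handler = SIG_IGN;
        sigemptyset(&ign.sa_mask);
        ign.sa_flags = 0;
        sigaction(SIGINT, &ign, &oldint);    /* ^C goes to the pager only */
        sigaction(SIGQUIT, &ign, &oldquit);  /* likewise ^\ */

        pipe = popen(pager, "w");
        if (pipe != NULL)
        {
            fputs(text, pipe);
            pclose(pipe);                    /* waits for the pager to exit */
        }

        sigaction(SIGINT, &oldint, NULL);    /* restore psql's own handlers */
        sigaction(SIGQUIT, &oldquit, NULL);
    }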
Re: [HACKERS] postgresql and process titles
Greg Stark [EMAIL PROTECTED] writes: If backends store their current status in shared memory, then a separate process entirely can receive the interrupts, scan through the shared-memory process states, and do the accounting.

This sounds good until you think about locking. It'd be quite impractical to implement anything as fine-grained as EXPLAIN ANALYZE this way, because of the overhead involved in taking and releasing spinlocks. It could be practical as a replacement for stats_command_string messages, though.

I'm not sure about replacing ps_status with this. I don't think there is a way for one process to set another's status (on most platforms anyway). You might argue that we could abandon ps_status reporting altogether if we had something better, but I'm unconvinced ...

regards, tom lane
Re: [HACKERS] Multi-byte and client side character encoding tests for
Where are we on this? Should we just add a URL to the full 1MB test files?

--- Ayush Parashar wrote:

Hi there, I am new to this list. I have made some additions to the multi-byte regression tests (./src/test/mb) to include regression tests for the copy command. This can be used to test the multi-byte extensions of PostgreSQL and client character encoding for the copy command. The test uses the following complete character sets: big5, gbk, uhc, gb18030, euc_cn, latin8, iso_8859_6, iso_8859_7, euc_kr, euc_jp. I am attaching only the modified README and mbregress.sh with the email. I have not included the various character sets, as they are nearly 1MB. Can you please comment on this? I can provide a patch for the same, which includes the character sets.

Thanks, Ayush

ps: sorry for the previous blank email, that was by mistake.
Re: [HACKERS] Multi-byte and client side character encoding tests for
Bruce Momjian pgman@candle.pha.pa.us writes: Where are we on this? Should we just add a URL to the full 1MB test files?

Adding an MB to the distribution for tests that hardly anyone needs to run is surely not a good idea. What about putting them on pgfoundry and then putting a link to that somewhere in the docs?

regards, tom lane
Re: [HACKERS] Multi-byte and client side character encoding tests for
Tom Lane wrote: Adding an MB to the distribution for tests that hardly anyone needs to run is surely not a good idea. What about putting them on pgfoundry and then putting a link to that somewhere in the docs?

Yea, that sounds good.

-- Bruce Momjian http://candle.pha.pa.us
Re: [HACKERS] On hardcoded type aliases and typmod for user types
Thread added to TODO.

--- Martijn van Oosterhout wrote:

Hi, as part of previous discussions about typmod for user types, Tom mentioned that you would need to make type and function names equivalent. As it turns out, if you refactor a few rules, you can actually make it work and manage them separately.

For this to work, the current col_name_keyword non-terminal has to be divided into a few more categories. The criterion is mostly whether a keyword is followed by a left parenthesis.

1. Some can be moved to unreserved words, mostly the types BIT, VARCHAR, BIGINT, etc., since they don't need special rules anymore.
2. Words that have special productions (e.g. CONVERT, COALESCE, etc.); these can still only be used as column names, not types or (user-defined) functions.
3. Words which can be column names and functions but not types. These never appear normally with a parenthesis, so they will be interpreted as a function if there is one (e.g. SETOF, NATIONAL, etc.).
4. Words that can be column names and types but not functions. These are artifacts of the grammar, due to the fact that VARYING is unreserved. These are BIT, NCHAR, CHAR_P and CHARACTER.

After this, you can have user-defined functions that take an arbitrary set of parameters. There'll need to be a bit of code to verify the arguments. It results in a net gain of 15 keywords for functions and about 30 for types.

My question is, should users be able to create types schema.int4 and schema.integer simultaneously? Currently it allows you, but it's not handled very well (\dT doesn't list both). Should this be allowed? Should the aliasing of DEC and DECIMAL to NUMERIC be done for user-defined types?

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
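[Editor's note: to illustrate why the parenthesis is the deciding criterion, the same tokens can be read either as a type with a typmod or as a function call once user types accept typmods; a hedged illustration.]

    -- here bit(3) must be parsed as a type name with a typmod:
    select cast('101' as bit(3));

    -- but in this position the identical tokens would be a call to a
    -- (possibly user-defined) function named "bit":
    select bit(3);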
Re: [HACKERS] Multi-byte and client side character encoding tests
Bruce Momjian wrote: Tom Lane wrote: [...] What about putting them on pgfoundry and then putting a link to that somewhere in the docs? Yea, that sounds good.

The running of extra tests outside our standard regression set will be one of the topics discussed during my session in Toronto. I don't mind having them stored elsewhere, although it will make things a bit more complex, but this use should be kept in mind (for example, I don't think pgfoundry has a CVSup server, although it probably could have easily enough).

cheers andrew
Re: [HACKERS] postgresql and process titles
Tom Lane [EMAIL PROTECTED] writes: Greg Stark [EMAIL PROTECTED] writes: If backends store their current status in shared memory, then a separate process entirely can receive the interrupts, scan through the shared-memory process states, and do the accounting.

This sounds good until you think about locking. It'd be quite impractical to implement anything as fine-grained as EXPLAIN ANALYZE this way, because of the overhead involved in taking and releasing spinlocks.

I'm not entirely convinced. The only other process that would be looking at the information would be the statistics accumulator, which would only be waking up every 100ms or so. There would be no contention with other backends reporting their info.

I was also thinking of reporting only basic information, like which of a small number of states the backend is in and which node of the plan is being executed. If those are just integers, then it might be possible to get away without locking at all, storing them as sig_atomic_t. I think there would still be a fair amount of cache coherency overhead, like we see with the tas instruction on Xeons, but only if that backend's monitor block is actually being touched by the statistics accumulator process, which would presumably be only when running an EXPLAIN ANALYZE or other tracing facility.

-- greg
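[Editor's note: a minimal sketch of the single-writer slot idea just described. The sizes, field names, and shared-memory plumbing are hypothetical - in a real backend the array would live in shared memory, not static storage - and each slot is padded to its own cache line to limit the coherency traffic mentioned above.]

    #include <signal.h>

    #define CACHE_LINE   128
    #define MAX_BACKENDS 1024

    typedef struct BackendStatusSlot
    {
        volatile sig_atomic_t state;     /* e.g. idle / running / waiting */
        volatile sig_atomic_t plan_id;   /* which plan this backend runs  */
        volatile sig_atomic_t node_id;   /* which plan node is executing  */
        char pad[CACHE_LINE - 3 * sizeof(sig_atomic_t)];
    } BackendStatusSlot;

    /* would be in shared memory; each backend writes only its own slot */
    static BackendStatusSlot slots[MAX_BACKENDS];

    /* called by a backend as it enters a new executor node: plain stores,
     * no lock, since there is exactly one writer per slot */
    static void
    report_node(int my_slot, int plan_id, int node_id)
    {
        slots[my_slot].plan_id = plan_id;
        slots[my_slot].node_id = node_id;
    }

    /* called by the accounting process every N ms: reads may be slightly
     * stale or torn across fields, which is acceptable for sampling */
    static void
    sample_all(void (*accumulate)(int slot, int plan, int node))
    {
        int i;

        for (i = 0; i < MAX_BACKENDS; i++)
            accumulate(i, slots[i].plan_id, slots[i].node_id);
    }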
Re: [HACKERS] [PATCHES] Work-in-progress referential action trigger
Added to TODO:

o Fix problem when cascading referential triggers make changes on cascaded tables, seeing the tables in an intermediate state
  http://archives.postgresql.org/pgsql-hackers/2005-09/msg00174.php

--- Stephan Szabo wrote:

[Hackers now seems more appropriate]

On Thu, 1 Sep 2005, Stephan Szabo wrote: On Tue, 23 Aug 2005, Stephan Szabo wrote: Here's my current work in progress for 8.1 devel related to fixing the timing issues with referential actions having their checks run on intermediate states. I've only put in a simple test that failed against 8.0 in the regression patch, and regression still passes for me. There's still an outstanding question of whether looping gives the correct result in the presence of explicit inserts and set constraints immediate in before triggers.

As Darcy noticed, the patch as given definitely still has problems with before triggers. I was able to construct a case that violates the constraint with an update in a before-delete trigger. I think this might be why the spec has the weird timing rules for before triggers on cascaded deletes, such that the deletions happen before the before triggers.

We have a similar problem for before triggers that update the rows that are being cascade-updated. The following seems to violate the constraint for me on 8.0.3:

    drop table pk cascade;
    drop table fk cascade;
    drop function fk_move();
    create table pk(a int primary key);
    create table fk(a int references pk on delete cascade on update cascade, b int);
    create function fk_move() returns trigger as '
    begin
        raise notice '' about to move for % '', old.b;
        update fk set b = b - 1 where b > old.b;
        return new;
    end;' language 'plpgsql';
    create trigger fkmovetrig before update on fk
        for each row execute procedure fk_move();
    insert into pk values(1);
    insert into pk values(2);
    insert into fk values(1,1);
    insert into fk values(1,2);
    insert into fk values(2,3);
    select * from pk;
    select * from fk;
    update pk set a = 3 where a = 1;
    select * from pk;
    select * from fk;

This gives me (3,1), (1,1) and (2,2) as the rows in fk, where the (1,1) row is invalid. This is obviously wrong, but the question is, what is the correct answer? Should the update in the before trigger, trying to change b on a row that no longer has a reference, have errored?

Well, the spec seems to get out of this simply. I read SQL2003's trigger execution information (specifically 14.27 GR5g*) to say that before triggers that call data-changing statements are invalid. We can't do that for compatibility reasons, but it would allow us to say that modifying a row in a before trigger that is also a row selected in the outer statement is an error for this update case. It'd presumably be an error for a normal delete as well, although I think it might be relaxable for cascaded deletes, because the spec seems to say that the before triggers for deletions caused by the cascade are actually run after the removals. I'm not sure whether we could easily differentiate this case from any other cases where the row was modified twice, either, yet.

---

* If TR is a BEFORE trigger and if, before the completion of the execution of an SQL procedure statement simply contained in TSS, an attempt is made to execute an SQL-data change statement or an SQL-invoked routine that possibly modifies SQL-data, then an exception condition is raised: prohibited statement encountered during trigger execution.
Re: [HACKERS] Proof of concept COLLATE support with patch
Thread added to TODO.detail.

--- Greg Stark wrote:

Tom Lane [EMAIL PROTECTED] writes: Greg Stark [EMAIL PROTECTED] writes: I still don't get where the hostility towards this functionality comes from. We're not really willing to say here is a piece of syntax REQUIRED BY THE SQL SPEC which we only support on some platforms. readline, O_DIRECT, and the like are a completely inappropriate analogy, because those are inherently platform-dependent (and not in the spec).

But that's not the case at all. The syntax can be supported everywhere; it would just be somewhat faster on some platforms than others. It's already reasonably fast on any platform that caches locale information, which includes glibc and presumably other free-software libcs. It would be slightly faster if there are _l functions. And much slower if the libc setlocale implementation is braindead. But there's nothing wrong with saying it's slow because your libc is slow - compile with this freely available library which has a better implementation. The programming syntax would still be exactly 100% the same.

The objection is fundamentally that a platform-specific implementation cannot be our long-term goal, and so expending effort on creating one seems like a diversion. If there were a plan put forward showing how this is just a useful way-station, and we could see how we'd later get rid of the glibc dependency without throwing away the work already done, then it would be a different story.

It's not like the actual calls to setlocale are going to be much code. One day, presumably, some variant of these _l functions will become entirely standard, in which case you're talking about potentially throwing away 50 lines of code. The bulk of the code is going to be parsing and implementing the actual syntax and behaviour of the SQL spec. And in any case I wouldn't expect it to ever get thrown away. There will be people compiling on RH9 or similar vintage systems for a long time.

-- greg
Re: [HACKERS] Locale implementation questions
Thread added to TODO.detail.

--- Tatsuo Ishii wrote:

3. Compiled locale files are large. One UTF-8 locale datafile can exceed a megabyte. Do we want the option of disabling it for small systems?

To avoid the problem, you could dynamically load the compiled tables. The charset conversion tables are handled in a similar way. Also, I think it's important to allow user-defined collate data. To implement the CREATE COLLATE syntax, we need to have that capability anyway.

4. Do we want the option of running system locales in parallel with the internal ones?

5. I think we're going to have to deal with the very real possibility that our locale database will not be as good as some of the system-provided ones. The question is how. This is quite unlike time zones, which are quite standardized, rarely change, and whose database is quite well maintained. Would people object to a configure option that selected:

    --with-locales=internal   (use pg database)
    --with-locales=system     (use system database for win32, glibc or MacOS X)
    --with-locales=none       (what we support now, which is neither)

I don't think it will be much of an issue to support this; all the functions take the same parameters and have almost the same names.

To be honest, I don't understand why we have to rely on (often broken) system locales. I don't think building our own locale data is too hard, and once we make it up, the maintenance cost will be very small, since it should not need to be changed regularly. Moreover, we could enjoy the benefit that PostgreSQL handles collations in a correct manner on any platform that PostgreSQL supports.

6. Locales for SQL_ASCII. Seems to me you have two options: either reject COLLATE altogether unless they specify a charset, or don't care and let users shoot themselves in the foot if they wish... BTW, this MacOS locale support seems to be new for 10.4.2 according to the CVS log info; can anyone confirm this? Anyway, I hope this post didn't bore too much. Locale support has been one of those things that has bugged me for a long time, and it would be nice if there could be some real movement.

Right. We Japanese (and probably Chinese too) have been bugged by the broken multibyte locales for a long time. Using the C locale helps us to a certain extent, but for Unicode we need correct locale data; otherwise the sorted output will be complete chaos.

-- Tatsuo Ishii, SRA OSS, Inc. Japan
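[Editor's note: for reference, the SQL-spec syntax the thread alludes to would let users define and use collations roughly like this sketch. It is entirely hypothetical here - no such command exists in PostgreSQL at this point, and the collation and table names are made up; only the COLLATE clause in the query comes from the SQL spec.]

    -- define a collation from a named locale (hypothetical syntax)
    CREATE COLLATION japanese_utf8 FOR CHARACTER SET UTF8
        FROM 'ja_JP.utf8';

    -- and use it explicitly in a query
    SELECT name FROM customers ORDER BY name COLLATE japanese_utf8;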
Re: [HACKERS] Alternative variable length structure
I assume the conclusion from this email thread is that though the idea is interesting, the complexity added would not be worth the saving of a few bytes.

--- ITAGAKI Takahiro wrote:

Hi Hackers, PostgreSQL can treat variable-length data flexibly, but it therefore consumes more space if we store short data. Headers of variable-length types use 4 bytes regardless of the data length. My idea is to change the header itself to variable-length. In order to reduce the size of short data, I wrote a patch to encode lengths into the first several bits of the structure. Also, the alignment of the types was changed to 'char' from 'int'. I know my patch is still insufficient; for example, the types cannot be TOASTed. But I guess this compression works well for short text. I'll appreciate any comments. Thanks.

The result of the patch:

    # create table txttbl (v1 text, v2 text, v3 text, v4 text);
    # create table strtbl (v1 string, v2 string, v3 string, v4 string);
    # insert into txttbl values('A', 'B', 'C', 'D');
    # insert into strtbl values('A', 'B', 'C', 'D');

    # select * from pgstattuple('txttbl');
    -[ RECORD 1 ]--+------
    table_len      | 8192
    tuple_count    | 1
    tuple_len      | 57    -- 28 + (5+3) + (5+3) + (5+3) + (5) ...

    # select * from pgstattuple('strtbl');
    -[ RECORD 1 ]--+------
    table_len      | 8192
    tuple_count    | 1
    tuple_len      | 36    -- 28 + 2 + 2 + 2 + 2 ...

--- ITAGAKI Takahiro, NTT Cyber Space Laboratories
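[Editor's note: a minimal C sketch of the variable-length header idea described above. The encoding is hypothetical, not the patch's actual format: if the high bit of the first byte is clear, that byte alone holds the length (0..127), otherwise a 4-byte header is used, so short values pay 1 byte instead of 4. Lengths are capped at 2^31-1.]

    #include <stddef.h>
    #include <stdint.h>

    /* write a header for datalen bytes of payload; returns header size */
    static size_t
    encode_varlen_header(uint8_t *buf, uint32_t datalen)
    {
        if (datalen <= 0x7F)
        {
            buf[0] = (uint8_t) datalen;           /* 1-byte header */
            return 1;
        }
        buf[0] = 0x80 | ((datalen >> 24) & 0x7F); /* 4-byte header, high bit set */
        buf[1] = (datalen >> 16) & 0xFF;
        buf[2] = (datalen >> 8) & 0xFF;
        buf[3] = datalen & 0xFF;
        return 4;
    }

    /* read the payload length back; *hdrlen gets the header size */
    static uint32_t
    decode_varlen_header(const uint8_t *buf, size_t *hdrlen)
    {
        if ((buf[0] & 0x80) == 0)
        {
            *hdrlen = 1;
            return buf[0];
        }
        *hdrlen = 4;
        return ((uint32_t) (buf[0] & 0x7F) << 24) |
               ((uint32_t) buf[1] << 16) |
               ((uint32_t) buf[2] << 8) |
                (uint32_t) buf[3];
    }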
Re: [HACKERS] 64-bit API for large objects
Thread added to TODO. As far as I can see, this thread never produced a patch that could be applied.

--- Tom Lane wrote:

Jeremy Drake [EMAIL PROTECTED] writes: On Fri, 23 Sep 2005, Tom Lane wrote: postgresql-fe.h defines a ton of stuff that has no business being visible to libpq's client applications. It's designed to be used by our *own* client-side code (psql and the like), but we have not made any attempt to keep it from defining stuff that would likely break other peoples' code. So does this mean that there is a different, more advanced and more likely to break random other code, client library where this call would fit better?

I've been thinking more about this and have come to these conclusions:

1. libpq-fe.h definitely cannot include postgres_fe.h; in fact, it has no business even defining a type named int64. That is way too likely to collide with symbols coming from elsewhere in a client compilation. I think what we need is to declare a type named pg_int64 and use that in the externally visible declarations. The most reasonable place to put the typedef is postgres_ext.h. This will mean making configure generate postgres_ext.h from a template postgres_ext.h.in, but that's no big deal.

2. We need a strategy for what to do when configure doesn't find a working int64 type. My inclination is to just not export the functions in that case. So normally, postgres_ext.h would contain something like

    #define HAVE_PG_INT64 1
    typedef long long int pg_int64;

but neither of these would appear if configure couldn't find a working type. In libpq-fe.h, we'd have

    #ifdef HAVE_PG_INT64
    extern pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
    extern pg_int64 lo_tell64(PGconn *conn, int fd);
    #endif

and similarly for all the code inside libpq. The reason this seems like a good idea is that client code could key off #ifdef HAVE_PG_INT64 to detect whether the lo64 functions are available; which is useful even if you don't care about machines without int64, because you still need to think about machines with pre-8.2 PG installations.

3. This is still not 100% bulletproof, as it doesn't address situations like building PG with gcc and then trying to compile client apps with a vendor cc that doesn't understand long long int. The compile would choke on the typedef even if you weren't trying to use large objects at all. I don't see any very nice way around that. It might be worth doing this in postgres_ext.h:

    #ifndef NO_PG_INT64
    #define HAVE_PG_INT64 1
    typedef long long int pg_int64;
    #endif

which would at least provide an escape hatch for such situations: define NO_PG_INT64 before trying to build.

regards, tom lane
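[Editor's note: on the client side, the guard described above would be used like this minimal sketch. It is hypothetical, since the lo64 functions are only proposed in this thread; SEEK_END comes from <stdio.h>, and lo_lseek is the existing 32-bit libpq call.]

    #include <stdio.h>
    #include "libpq-fe.h"

    /* seek to the end of a large object, 64-bit capable when available */
    static long long
    seek_to_end(PGconn *conn, int fd)
    {
    #ifdef HAVE_PG_INT64
        return (long long) lo_lseek64(conn, fd, (pg_int64) 0, SEEK_END);
    #else
        /* pre-8.2 libpq: fall back to the 32-bit call */
        return lo_lseek(conn, fd, 0, SEEK_END);
    #endif
    }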
Re: [HACKERS] postgresql and process titles
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: This sounds good until you think about locking. It'd be quite impractical to implement anything as fine-grained as EXPLAIN ANALYZE this way, because of the overhead involved in taking and releasing spinlocks.

I'm not entirely convinced. The only other process that would be looking at the information would be the statistics accumulator, which would only be waking up every 100ms or so. There would be no contention with other backends reporting their info.

The numbers I've been looking at lately say that heavy lock traffic is expensive, particularly on SMP machines, even with zero contention. Seems the cache coherency protocol costs a lot even when it's not doing anything...

regards, tom lane
Re: [HACKERS] Alternative variable length structure
On Wed, Jun 14, 2006 at 02:53:10PM -0400, Bruce Momjian wrote: I assume the conclusion from this email thread is that though the idea is interesting, the complexity added would not be worth the saving of a few bytes.

Anyone do any testing? I'm also wondering if this would be useful to allow fields larger than 1G.

--- ITAGAKI Takahiro wrote: Hi Hackers, PostgreSQL can treat variable-length data flexibly, but it therefore consumes more space if we store short data. Headers of variable-length types use 4 bytes regardless of the data length. My idea is to change the header itself to variable-length. [...]

-- Jim C. Nasby, Sr. Engineering Consultant, Pervasive Software
Re: [HACKERS] Alternative variable length structure
Jim C. Nasby wrote: Anyone do any testing? I'm also wondering if this would be useful to allow fields larger than 1G.

The submitter showed the pathological case where a single character was stored in a text field, and showed the reduced size (below). There were no performance numbers given. It seems like an edge case, especially since we have a "char" type that is a single byte.

--- ITAGAKI Takahiro wrote: [...]

    # select * from pgstattuple('txttbl');
    tuple_len      | 57    -- 28 + (5+3) + (5+3) + (5+3) + (5) ...
    # select * from pgstattuple('strtbl');
    tuple_len      | 36    -- 28 + 2 + 2 + 2 + 2 ...

-- Bruce Momjian http://candle.pha.pa.us
Re: [HACKERS] postgresql and process titles
On Wed, Jun 14, 2006 at 03:51:28PM -0400, Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: This sounds good until you think about locking. It'd be quite impractical to implement anything as fine-grained as EXPLAIN ANALYZE this way, because of the overhead involved in taking and releasing spinlocks. I'm not entirely convinced. The only other process that would be looking at the information would be the statistics accumulator, which would only be waking up every 100ms or so. There would be no contention with other backends reporting their info. The numbers I've been looking at lately say that heavy lock traffic is expensive, particularly on SMP machines, even with zero contention. Seems the cache coherency protocol costs a lot even when it's not doing anything... Are there any ways we could avoid the locking? One idea would be to keep something akin to a FIFO, where the backend would write records instead of updating/over-writing them, and the reader process would only read records where there was no risk that they were still being written. That would mean that the reader would need to stay at least one record behind the backend, but that's probably manageable. The downside is more memory usage, but I think we could limit that by also keeping track of what record the reader had last read. The backend would check that, and if the reader was more than X records behind, the backend would update the most recent record it had written, instead of writing a new one. That would place an effective limit on how much memory was consumed. But... I have no idea how exactly shared memory works, so maybe this plan is fundamentally broken. But hopefully there's some way to handle the locking problems, because a separate reader process does sound like an interesting possibility. -- Jim C. Nasby
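A minimal sketch of the lock-free scheme Jim describes (hypothetical code, not anything in the tree: it assumes one ring per backend with a single writer and a single reader, and a real implementation would additionally need memory barriers on weakly ordered CPUs):

    /* Single-producer/single-consumer ring in shared memory: the
     * backend only advances "head" after the record is fully written,
     * and the reader never reads past "head", so no lock is needed as
     * long as the index stores are atomic. */
    typedef struct ProgressRecord
    {
        int     node_id;        /* which plan node is executing */
        long    tuples_seen;
    } ProgressRecord;

    #define RING_SIZE 64

    typedef struct ProgressRing
    {
        volatile unsigned head;     /* written only by the backend */
        volatile unsigned tail;     /* written only by the reader  */
        ProgressRecord  slots[RING_SIZE];
    } ProgressRing;

    /* Backend side: drop the write if the reader is too far behind,
     * bounding memory use instead of blocking. */
    static void
    ring_put(ProgressRing *ring, const ProgressRecord *rec)
    {
        unsigned head = ring->head;

        if (head - ring->tail >= RING_SIZE)
            return;                         /* reader lagging: drop */
        ring->slots[head % RING_SIZE] = *rec;
        ring->head = head + 1;              /* publish after the write */
    }

    /* Reader side: consume everything published so far. */
    static int
    ring_get(ProgressRing *ring, ProgressRecord *rec)
    {
        unsigned tail = ring->tail;

        if (tail == ring->head)
            return 0;                       /* empty */
        *rec = ring->slots[tail % RING_SIZE];
        ring->tail = tail + 1;
        return 1;
    }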
Re: [HACKERS] postgresql and process titles
On Wed, Jun 14, 2006 at 03:21:55PM -0500, Jim C. Nasby wrote: One idea would be to keep something akin to a FIFO, where the backend would write records instead of updating/over-writing them, and the reader process would only read records where there was no risk that they were still being written. That would mean that the reader would need to stay at least one record behind the backend, but that's probably manageable. The problem with a FIFO is that the pointers controlling where the start/end are become the contention. The only thing I can think of is to have the backends only write atomic types and give each backend its own cache line. The profiler would simply wake up every N ms and read each value, accumulating them somewhere. No locking, and there would be a maximum of one cache-line bounce per backend per N ms. I'm not sure you can do better than that in this situation. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org
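Martijn's per-backend-cache-line idea, sketched in hypothetical form (the 128-byte line size, the fixed array bound, and the counter fields are illustrative assumptions):

    /* Hypothetical per-backend counter slots, one cache line each, so
     * backends never write to the same line; the profiler's periodic
     * read-only pass causes at most one line transfer per backend per
     * interval.  Each slot has exactly one writer, so plain stores
     * suffice and no lock is taken. */
    #define CACHE_LINE 128

    typedef struct BackendCounters
    {
        volatile long queries_started;
        volatile long tuples_processed;
        char          pad[CACHE_LINE - 2 * sizeof(long)];
    } BackendCounters;

    /* In reality this array would live in shared memory, sized by
     * MaxBackends; 100 is a stand-in. */
    static BackendCounters counters[100];

    /* Backend side: single-writer increment, no locking. */
    static void
    count_query(int backend_id)
    {
        counters[backend_id].queries_started++;
    }

    /* Profiler side, woken every N ms: read-only accumulation. */
    static long
    total_queries(int nbackends)
    {
        long total = 0;

        for (int i = 0; i < nbackends; i++)
            total += counters[i].queries_started;
        return total;
    }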
Re: [HACKERS] Alternative variable length structure
On Wed, Jun 14, 2006 at 04:21:34PM -0400, Bruce Momjian wrote: Jim C. Nasby wrote: On Wed, Jun 14, 2006 at 02:53:10PM -0400, Bruce Momjian wrote: I assume the conclusion from this email thread is that though the idea is interesting, the complexity added would not be worth the saving of a few bytes. Anyone do any testing? I'm also wondering if this would be useful to allow fields larger than 1G. The submitter showed the pathological case where a single char was stored in a text field, and showed the reduced size (below). There were no performance numbers given. It seems like an edge case, especially since we have a char type that is a single byte. Well, depending on how the patch works I could see it being valuable for tables that have a number of 'short' text fields, where short is less than 127 bytes. I've got some tables like that I can test on, at least to see the size difference. Not really sure what a valid performance test would be, though... I'm wondering if it would be worth trying to organize users to do testing of stuff like this. I'm sure there's lots of folks who know how to apply a patch and have test data that could benefit from patches like this. (I'm assuming this patch didn't place any substantial performance penalties into the backend...) -- Jim C. Nasby
Re: [HACKERS] postgresql and process titles
Jim C. Nasby [EMAIL PROTECTED] writes: The numbers I've been looking at lately say that heavy lock traffic is expensive, particularly on SMP machines, even with zero contention. Seems the cache coherency protocol costs a lot even when it's not doing anything... Are there any ways we could avoid the locking? Well, if all we want to do is reproduce the current functionality of EXPLAIN ANALYZE, all you need is a single sig_atomic_t int in which you store the address of the current node. When you're done with the query you ask the central statistics accumulator for the counts of how many times it saw each node address. But that's sort of skipping some steps. The reason tas is slow is that it needs to ensure cache coherency. Even just storing an int into shared memory when its cache line is owned by the processor running the stats accumulator will have the same overhead. That said, I don't think it would actually be as much. TAS has to ensure no other processor overwrites the data from this processor, whereas merely storing into a sig_atomic_t just has to ensure that you don't get a partially written integer. And it seems sort of silly to be doing this much work just to reproduce the current functionality, even if it's lower overhead. The big advantage of a separate process would be that it could do much more. But that would require pushing some representation of the plan out to the stats collector, and storing more information about the state than just what node you're at currently, such as whether you're in the midst of index lookups, sequential scans, operator and function execution, etc. -- greg
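Greg's single-store idea reduces to something like the following sketch (hypothetical; and, as Tom points out in a later message, sig_atomic_t is only guaranteed to be as wide as char, which limits how much can be published this way):

    /* Hypothetical "current node" publication: the executor stores
     * the id of the node it is entering with a single atomic store;
     * a sampling process reads it at intervals and accumulates a
     * histogram per node, with no locking anywhere. */
    #include <signal.h>

    static volatile sig_atomic_t current_node;  /* in shared memory */

    static void
    executor_enter_node(int node_id)
    {
        current_node = node_id;     /* one store, no lock */
    }

    /* Sampler, woken every few ms: */
    static void
    sample(long *hist)
    {
        hist[current_node]++;
    }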
Re: [HACKERS] postgresql and process titles
On Wed, Jun 14, 2006 at 10:37:59PM +0200, Martijn van Oosterhout wrote: One idea would be to keep something akin to a FIFO, where the backend would write records instead of updating/over-writing them, and the reader process would only read records where there was no risk that they were still being written. The problem with a FIFO is that the pointers controlling where the start/end are become the contention. Even if what's being written is a single byte/word and we're taking intentional steps to make sure that reading a pointer that's off by 1 isn't an issue? Is the issue that if CPU A writes a pointer and CPU B immediately tries to read it, B could get old data from its cache? The only thing I can think of is to have the backends only write atomic types and give each backend its own cache line. The profiler would simply wake up every N ms and read each value, accumulating them somewhere. Would everything we need fit in a cache line? No locking, and there would be a maximum of one cache-line bounce per backend per N ms. What scheduler timer resolution do modern OSes use? It's likely that on a system with more runnable processes than processors, each process will only run for N ms anyway... Speaking of which... don't the variables involved in scheduling mean we couldn't use sampling to calculate cost in EXPLAIN ANALYZE? Or is the assumption just that it will affect all nodes evenly? -- Jim C. Nasby
Re: [HACKERS] Alternative variable length structure
The code churn to do this is going to be quite significant, and perhaps a performance hit as well, so it has to be a big win. --- Jim C. Nasby wrote: Well, depending on how the patch works I could see it being valuable for tables that have a number of 'short' text fields, where short is less than 127 bytes. I've got some tables like that I can test on, at least to see the size difference. -- Bruce Momjian
Re: [HACKERS] [PATCHES] PL/pgSQL: SELECT INTO EXACT
I have updated the patch at: ftp://candle.pha.pa.us/pub/postgresql/mypatches/strict I re-did it to use STRICT for Oracle PL/SQL syntax. I don't think we are going to be able to do any better than that, even in future versions. I added documentation that should help too. --- Bruce Momjian wrote: I did some work on your patch: ftp://candle.pha.pa.us/pub/postgresql/mypatches/first I switched the name of the option flag to FIRST (already a reserved word), making the default behavior PL/SQL-compatible. I also added the proper exceptions to match PL/SQL. My Oracle 9 PL/SQL manual has this for SELECT INTO: When you use a SELECT INTO statement without the BULK COLLECT clause, it should return only one row. If it returns more than one row, PL/SQL raises the predefined exception TOO_MANY_ROWS. However, if no rows are returned, PL/SQL raises NO_DATA_FOUND unless the SELECT statement called a SQL aggregate function such as AVG or SUM. (SQL aggregate functions always return a value or a null. So, a SELECT INTO statement that calls an aggregate function never raises NO_DATA_FOUND.) The big problem is that a lot of applications use the SELECT INTO ... IF NOT FOUND test, and I don't see any good way to keep those applications working without being modified. The #option keyword seems as bad as just giving up on being PL/SQL compatible and using the keyword STRICT (already a reserved word) when you want PL/SQL functionality. I don't think a GUC is going to work because it will affect all functions stored in the database, and there might be functions expecting different behaviors. Setting the GUC in the function that needs it also will not work because it will spill into functions called by that function. I think we set up SELECT INTO this way originally because we didn't have exceptions, but now that we have them, I don't see a clean way to move to the PL/SQL behavior. Perhaps STRICT is the best option. Comments? --- Matt Miller wrote: On Mon, 2005-08-08 at 17:18 -0400, Tom Lane wrote: Matt Miller [EMAIL PROTECTED] writes: On Fri, 2005-07-29 at 17:52 -0400, Tom Lane wrote: I dislike the choice of EXACT, too, as it (a) adds a new reserved word and (b) doesn't seem to convey quite what is happening anyway. Not sure about a better word though ... anyone? I can attach a patch that supports [EXACT | NOEXACT]. Somehow, proposing two new reserved words instead of one doesn't seem very responsive to my gripe :-(. My intention was to introduce the idea that the current behavior should be changed, and to then suggest a path that eventually eliminates all the new reserved words. If you think that this should be a global option instead of a per-statement one, something like the (undocumented) #option hack might be a good way to specify it; that would give it per-function scope, which seems reasonable.

create function myfn(...) returns ... as $$
#option select_into_1_row
declare ...
$$ language plpgsql;

Thanks, I'll take a look at this. -- Bruce Momjian
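For reference, the STRICT behavior being converged on here is used like this in PL/pgSQL (a usage sketch; the table and function are invented for illustration):

    -- Hypothetical table for illustration
    CREATE TABLE accounts (id int PRIMARY KEY, balance numeric);

    CREATE FUNCTION get_balance(acct int) RETURNS numeric AS $$
    DECLARE
        bal numeric;
    BEGIN
        -- Plain SELECT INTO: no error on zero or many rows; the
        -- traditional idiom is to test FOUND afterward.
        SELECT balance INTO bal FROM accounts WHERE id = acct;
        IF NOT FOUND THEN
            RAISE NOTICE 'no such account';
        END IF;

        -- SELECT INTO STRICT: raises NO_DATA_FOUND if no row is
        -- returned and TOO_MANY_ROWS if more than one, matching the
        -- Oracle PL/SQL semantics quoted above.
        SELECT balance INTO STRICT bal FROM accounts WHERE id = acct;
        RETURN bal;
    END;
    $$ LANGUAGE plpgsql;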
Re: [HACKERS] postgresql and process titles
Greg Stark [EMAIL PROTECTED] writes: Well if all we want to do is reproduce the current functionality of EXPLAIN ANALYZE, all you need is a single sig_atomic_t int that you store the address of the current node in. Do I need to point out that sig_atomic_t can't portably be assumed to be wider than char? We do currently assume that TransactionId can be read/written atomically, but (a) that's an int not a pointer, and (b) the assumption is cruft that we really ought to get rid of. In any case, speculating about whether we can do something useful with atomic types ignores the main real problem the thread is about. Anybody remember process titles and current command strings? regards, tom lane
Re: [HACKERS] timezones to own config file
Joachim Wieland [EMAIL PROTECTED] writes: Getting a list of active timezones could be easily done with a system view but you might not want to promise its existence when you think about moving the whole thing to a system catalog later... A read-only view wouldn't be a bad idea, actually, for both the long-form TZ names and the abbreviations. It'd be easy to cons one up the same way as our other views based on functions. Doesn't even need to be in core, could be contrib, if there's doubts about its usefulness. regards, tom lane
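Such a function-based view could be as simple as the following sketch (the set-returning function name and its columns are illustrative assumptions, not an existing API):

    -- Hypothetical read-only view over a set-returning function that
    -- exposes the timezone abbreviation table
    CREATE VIEW pg_timezone_abbrevs AS
        SELECT abbrev, utc_offset, is_dst
        FROM pg_timezone_abbrevs();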
Re: [HACKERS] Multi-byte and client side character encoding
Bruce, Tom, Andrew, On 6/14/06 11:06 AM, Andrew Dunstan [EMAIL PROTECTED] wrote: The running of extra tests outside our standard regression set will be one of the topics discussed during my session in Toronto. I don't mind having them stored elsewhere, although it will make things a bit more complex, but this use should be kept in mind (for example, I don't think pgfoundry has a CVSup server, although it probably could have easily enough). Ayush and I (along with Brian Hagenbuch) also plan to be in Toronto, so we can help as needed. - Luke
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Added to TODO list. --- Simon Riggs wrote: On Wed, 2005-09-14 at 13:32 -0400, Tom Lane wrote: I wrote: Another thought came to mind: maybe the current data layout for LWLocks is bad. Right now, the spinlock that protects each LWLock data struct is itself part of the struct, and since the structs aren't large (circa 20 bytes), the whole thing is usually all in the same cache line. ... Maybe it'd be better to allocate the spinlocks off by themselves. Well, this is odd. I made up a patch to do this (attached) and found that it pretty much sucks. Still the 4-way Opteron, previous best (slock-no-cmpb and spin-delay-2):

    1 thread    31s
    2 threads   42s
    4 threads   51s
    8 threads  100s

with lwlock-separate added:

    1 thread    31s
    2 threads   52s
    4 threads  106s
    8 threads  213s

What I had expected to see was a penalty in the single-thread case (due to instructions added to the LWLock functions), but there isn't any. I did not expect to see a factor-of-2 penalty for four threads. I guess what this means is that there's no real problem with losing the cache line while manipulating the LWLock, which is what the patch was intended to prevent. Instead, we're paying for swapping two cache lines (the spinlock and the LWLock) across processors instead of just one line. But that should at worst be a 2x inflation of the time previously spent in LWLockAcquire/Release, which is surely not yet all of the application ;-). Why the heck is this so bad? (Just going back through the whole thread for completeness.) Should we expect that apparently minor changes in shared data structures might be costing equivalently huge penalties in SMP performance elsewhere? Yes. That's the advice from Intel and AMD; but we should add that there is potential for improving performance also. The possible problem we were trying to avoid here was false sharing of the cache line by lock requestors of locks whose spinlocks were adjacent in memory. Splitting the data structure was just one of the possible ways of avoiding that. The above shows that the possible solution described above didn't work, but there could be others. One thing we tried in February was padding out the statically defined locks with dummy lock definitions in the enum. This has the effect of ensuring that the most contested locks are very definitely in their own cache line and not shared with others. That showed a noticeable improvement in performance, probably because it costs very little to implement, even if the code would require some explanatory documentation. The lwlock structure in include/storage/lwlock.h looks like

typedef enum LWLockId
{
    BufMappingLock,
    BufFreelistLock,
    LockMgrLock,
    OidGenLock,
    XidGenLock,
    ProcArrayLock,
    SInvalLock,
    FreeSpaceLock,
    WALInsertLock,
    WALWriteLock,
    ...

Notice that the heavily contested locks (i.e. the first 3 and the WAL locks) are adjacent to at least one other heavily contested lock. So they are certain to be in the same cache line and therefore to cause false sharing (on all CPU types, not just Intel and AMD; ref: manufacturers' optimization handbooks). This could be replaced by...

typedef enum LWLockId
{
    BufMappingLock,
    BufMappingLock_Padding1, BufMappingLock_Padding2,
    BufMappingLock_Padding3, BufMappingLock_Padding4,
    BufMappingLock_Padding5, BufMappingLock_Padding6,
    BufMappingLock_Padding7,
    BufFreelistLock,
    BufFreelistLock_Padding1, BufFreelistLock_Padding2,
    BufFreelistLock_Padding3, BufFreelistLock_Padding4,
    BufFreelistLock_Padding5, BufFreelistLock_Padding6,
    BufFreelistLock_Padding7,
    LockMgrLock,
    LockMgrLock_Padding1, LockMgrLock_Padding2,
    LockMgrLock_Padding3, LockMgrLock_Padding4,
    LockMgrLock_Padding5, LockMgrLock_Padding6,
    LockMgrLock_Padding7,
    OidGenLock,
    XidGenLock,
    ProcArrayLock,
    SInvalLock,
    FreeSpaceLock,
    WALInsertLock,
    WALInsertLock_Padding1, WALInsertLock_Padding2,
    WALInsertLock_Padding3, WALInsertLock_Padding4,
    WALInsertLock_Padding5, WALInsertLock_Padding6,
    WALInsertLock_Padding7,
    WALWriteLock,
    WALWriteLock_Padding1, WALWriteLock_Padding2,
    WALWriteLock_Padding3, WALWriteLock_Padding4,
    WALWriteLock_Padding5, WALWriteLock_Padding6,
    WALWriteLock_Padding7,
    ...

where the number of padding locks is determined by how many lock structures fit within a 128-byte cache line. This isn't exactly elegant coding, but it provides a useful improvement on an 8-way SMP box when run on the 8.0 code base. OK, let's be brutal: this looks pretty darn stupid. But it does follow the CPU optimization handbook advice, and I did see a
Re: [HACKERS] libpq's pollution of application namespace
Thread added to TODO: o Properly mark all libpq-exported functions with PQ --- Tom Lane wrote: I find that libpq.so exports the following symbols that have neither PQ, pq, pg, nor lo_ as a prefix:

    EncryptMD5
    SockAddr_cidr_mask
    fe_getauthname
    fe_getauthsvc
    fe_sendauth
    fe_setauthsvc
    freeaddrinfo_all
    getaddrinfo_all
    getnameinfo_all
    md5_hash
    rangeSockAddr

md5_hash seems a particularly unforgivable intrusion on application namespace :-(. Any objection to fixing these things to be prefixed with pq or pg, which is the convention we usually follow for internal names that can't be static? Also, these functions strictly speaking violate application namespace, but given that PQ appears infix, they're probably OK:

    appendBinaryPQExpBuffer
    appendPQExpBuffer
    appendPQExpBufferChar
    appendPQExpBufferStr
    createPQExpBuffer
    destroyPQExpBuffer
    enlargePQExpBuffer
    initPQExpBuffer
    printfPQExpBuffer
    resetPQExpBuffer
    termPQExpBuffer

It'd be nicer if we could filter out all exported symbols that don't appear in exports.txt, but I don't know any portable way to do that. regards, tom lane -- Bruce Momjian
Re: [HACKERS] Multi-byte and client side character encoding
Luke, Ayush and I (along with Brian Hagenbuch) also plan to be in Toronto, so we can help as needed. You and Bruce might want to think about *registering* for the conference. -- Josh Berkus, PostgreSQL @ Sun, San Francisco
Re: [HACKERS] Multi-byte and client side character encoding
Josh Berkus wrote: Luke, Ayush and I (along with Brian Hagenbuch) also plan to be in Toronto, so we can help as needed. You and Bruce might want to think about *registering* for the conference. I thought I did already. I filled something out a few months ago. Anyway, it now says registration closed. -- Bruce Momjian
Re: [HACKERS] PQescapeIdentifier
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --- Christopher Kings-Lynne wrote: TODO item done for 8.2: * Add PQescapeIdentifier() to libpq Someone probably needs to check this :) Chris [ application/x-gzip is not supported, skipping... ] -- Bruce Momjian
Re: [HACKERS] postgresql and process titles
Jim C. Nasby [EMAIL PROTECTED] writes: Even if what's being written is a single byte/word and we're taking intentional steps to make sure that reading a pointer that's off by 1 isn't an issue? Is the issue that if CPU A writes a pointer and CPU B immediately tries to read it, B could get old data from its cache? Well, for applications like this you don't care whether you get the old data or the new data. You just want to be sure you don't get half of each. However: Tom Lane [EMAIL PROTECTED] writes: Do I need to point out that sig_atomic_t can't portably be assumed to be wider than char? I didn't actually realize that. That would mean EXPLAIN ANALYZE would be limited to counting times for 256 plan nodes and wouldn't really be much of an improvement over the existing infrastructure. In any case, speculating about whether we can do something useful with atomic types ignores the main real problem the thread is about. Anybody remember process titles and current command strings? Oops, I thought this was the ANALYZE patch thread. -- greg
Re: [HACKERS] libpq's pollution of application namespace
Bruce Momjian pgman@candle.pha.pa.us writes: Thread added to TODO: o Properly mark all libpq-exported functions with PQ Is that still relevant? I thought we'd done as much as we intended to do in that specific direction. What would make sense to work on is making the build step that strips not-officially-exported symbols work on more platforms. regards, tom lane
Re: [HACKERS] libpq's pollution of application namespace
On Wed, Jun 14, 2006 at 05:54:56PM -0400, Bruce Momjian wrote: Thread added to TODO: o Properly mark all libpq-exported functions with PQ I thought this was done already. At least, with a recent CVS I get this:

$ nm -D libpq.so --defined-only | grep -v 'PQ\|pq\|lo_\|pg_'
000171e0 D pgresStatus

Have a nice day, -- Martijn van Oosterhout kleptog@svana.org
Re: [HACKERS] Multi-byte and client side character encoding
Josh Berkus josh@agliodbs.com writes: You and Bruce might want to think about *registering* for the conference. Speaking of which, is a list of those registered posted anywhere? I looked for one the other day because I wanted to double-check that I was registered, but couldn't find anything on the website. regards, tom lane
Re: [HACKERS] The argument for reinstating --as-needed
This patch is too general. It targets all platforms, not just those that might be affected, and it tries to fix an operating system bug. Also, we removed the readline linking in the backend in 8.2, so that is a non-issue now. --- Martijn van Oosterhout wrote: As some of you may remember, a patch for adding --as-needed to the GNU linker command line was added [1] and subsequently removed [2] under the mistaken assumption that it was a linker bug [3]. It isn't. The bug is actually in readline, in that it doesn't declare its dependency on termcap/ncurses. Note: this bug was fixed back in 2002 in Debian (so no Debian systems are affected by this problem) but is still present in the latest Red Hat release and probably many other places. I propose we add a workaround for readline and add the flag back again. The benefits are obvious: from 228 to 87 DT_NEEDED records across a normal PostgreSQL installation, and the server will no longer depend on readline (always an odd point). The patch is attached: it basically adds a reference to termcap/ncurses directly so the psql binary links to it. psql is the only binary that uses readline, so this solves the problem. Just a quick note as to why it's not a bug in the linker. The --as-needed flag includes DT_NEEDED records only for libraries referenced by the objects. The issue is that ld removed termcap, needed by readline but not by psql directly. To see why this is not a bug, think about what the -l options are for. They are for resolving symbols not found in the objects you are linking. The linker *does not* look for the symbols needed by the shared libraries given. Static libraries (being sets of objects) yes; shared libraries no. Shared libraries have their own DT_NEEDED records to say what they depend on; they don't need to be checked again. On my Debian system where readline is configured correctly, ncurses is not mentioned on the link line and the linker doesn't look for it either. It is used at runtime though. Other issues: (a) won't this happen again with some other library? Well, unlikely. Obviously this can only affect libraries we list on our link line. Note, a library having this problem would fail autoconf tests also, so we'd know about it. We already have a hack in the autoconf stuff for readline; I just don't think people expected it to apply to Red Hat. (b) it wastes 4 bytes. Well yes, but I think the benefits outweigh the costs. I added it to input.c but it could be a separate file. (c) an autoconf test to test for this condition. Well, I thought about it, but the cost and effort of maintaining such a test is far higher than just forcing the reference. Have a nice day, [1] http://archives.postgresql.org/pgsql-committers/2005-05/msg00042.php [2] http://archives.postgresql.org/pgsql-hackers/2005-05/msg00488.php [3] https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=157126 -- Martijn van Oosterhout kleptog@svana.org [ Attachment, skipping... ] -- Bruce Momjian

Index: input.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/psql/input.c,v
retrieving revision 1.46
diff -u -r1.46 input.c
--- input.c	15 Oct 2005 02:49:40 -0000	1.46
+++ input.c	29 Oct 2005 16:57:48 -0000
@@ -34,6 +34,14 @@
 	hctl_ignoredups = 2,
 	hctl_ignoreboth = hctl_ignorespace | hctl_ignoredups
 };
+
+/* Work around a bug in some releases of readline. The shared lib
+ * doesn't always declare its dependency on termcap/ncurses/curses.
+ * This creates a reference to termcap so it gets pulled in, but it
+ * is never actually used... */
+
+extern int tputs();
+int (*__pg_never_used)() = tputs;
 #endif
Re: [HACKERS] libpq's pollution of application namespace
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Thread added to TODO: o Properly mark all libpq-exported functions with PQ Is that still relevant? I thought we'd done as much as we intended to do in that specific direction. What would make sense to work on Oh, OK. is making the build step that strips not-officially-exported symbols work on more platforms. Uh, well, I had some patches in the patch queue to go in that direction, but I thought the conclusion in that thread was that it wasn't worth it. -- Bruce Momjian
Re: [HACKERS] CSV mode option for pg_dump
I wrote: The main use case seems to me to be where you are exporting a whole database or most of it with a very large number of tables, and it is convenient to have all the CSVs created for you rather than have to make them manually one at a time. You could get these out of, say, a tar format dump very easily. I just noticed that the data members all have \. and some blank lines at the end, so wash that out. We now return you to normal -hacking cheers andrew
Re: [HACKERS] Multi-byte and client side character encoding
Tom, Speaking of which, is a list of those registered posted anywhere? I looked for one the other day because I wanted to double-check that I was registered, but couldn't find anything on the website. No, I think there would be some privacy issues with that. You're registered. You should have been getting conference-attendees e-mails, about 3 of them so far. -- Josh Berkus, PostgreSQL @ Sun, San Francisco
Re: [HACKERS] libpq's pollution of application namespace
Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: What would make sense to work on is making the build step that strips not-officially-exported symbols work on more platforms. Uh, well, I had some patches in the patch queue to go in that direction, but I thought the conclusion in that thread was that it wasn't worth it. We currently have coverage for Linux and Darwin. If we had coverage for the *BSDen I would figure it was close enough ... is it possible to do the *BSDen with a few more makefile lines, or is it too ugly? regards, tom lane
Re: [HACKERS] libpq's pollution of application namespace
On Wed, Jun 14, 2006 at 06:23:36PM -0400, Bruce Momjian wrote: is making the build step that strips not-officially-exported symbols work on more platforms. Uh, well, I had some patches in the patch queue to go in that direction, but I thought the conclusion in that thread was that it wasn't worth it. That's my recollection too. I had something that supported HPUX, for example, but it was decided not worth the effort (can't find it right now though...). Have a nice day, -- Martijn van Oosterhout kleptog@svana.org
Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion(!((itemid)->lp_flags
Added to TODO: * Consider increasing internal areas when shared buffers is increased http://archives.postgresql.org/pgsql-hackers/2005-10/msg01419.php --- Alvaro Herrera wrote: Jim C. Nasby wrote: Now that I've got a little better idea of what this code does, I've noticed something interesting... this issue is happening on an 8-way machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this greatly increase the odds of buffer conflicts? Bug aside, would it be better to set NUM_SLRU_BUFFERS higher for a larger number of CPUs? We had talked about increasing NUM_SLRU_BUFFERS depending on shared_buffers, but it didn't get done. Something to consider for 8.2 though. I think you could have better performance by increasing that setting, while at the same time diminishing the possibility that the race condition appears. I think you should also consider increasing PGPROC_MAX_CACHED_SUBXIDS (src/include/storage/proc.h), because that should decrease the chance that the subtrans area needs to be scanned. By how much, however, I wouldn't know -- it depends on the number of subtransactions you typically have; I guess you could activate the measuring code in procarray.c to get a figure. -- Alvaro Herrera -- Bruce Momjian
Re: [HACKERS] Fabian Pascal and RDBMS deficiencies in fully implementing
kleptog@svana.org (Martijn van Oosterhout) writes: On Tue, Jun 13, 2006 at 05:23:56PM -0400, Christopher Browne wrote: [3] http://www.intelligententerprise.com/010327/celko_online.jhtml;jsessionid=NDIHEWXGL4TNKQSNDBNSKHSCJUMEKJVN The sample problem in [3] is one that shows pretty nicely a significant SQL weakness; it's very painful to build SQL to do complex things surrounding cumulative statistics. I haven't managed to wrap my brain around them yet, but this seems like something that SQL WINDOW functions would be able to do. For each row, define the window frame to be all the preceding rows, do a SUM() and divide that over the total. Or perhaps the PERCENT_RANK() function does this already; not sure. Mind you, postgres doesn't support them yet, but it's interesting that it may be possible at all... Yes, you are exactly right; I have seen a couple of references to OVER and PARTITION BY which look as though they are the relevant SQL additions... http://blogs.ittoolbox.com/database/technology/archives/olap-sql-part-5-windowing-aggregates-8373 http://www.sqljunkies.com/HowTo/4E65FA2D-F1FE-4C29-BF4F-543AB384AFBB.scuk http://sqljunkies.com/Article/4E65FA2D-F1FE-4C29-BF4F-543AB384AFBB.scuk http://www.experts-exchange.com/Databases/Oracle/Q_21793507.html I'm not sure to what degree these are standardized, but they are available in some form or another in late-breaking versions of Oracle, DB2, and Microsoft SQL Server. I'm not quite sure how to frame this so as to produce something that should go on the TODO list, but it looks like there's a possible TODO here... -- Christopher Browne
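For concreteness, the cumulative-statistics query Martijn describes would look roughly like this in the SQL:2003 windowing syntax (not supported by PostgreSQL at the time of this thread; the table and column names are invented):

    -- Each row's share of the running total, plus its percentile rank.
    -- The default frame of "OVER (ORDER BY ...)" is all preceding rows
    -- through the current row, i.e. a cumulative aggregate.
    SELECT salesperson,
           amount,
           sum(amount) OVER (ORDER BY amount)
               / sum(amount) OVER () AS cumulative_fraction,
           percent_rank() OVER (ORDER BY amount) AS pct_rank
    FROM sales;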
Re: [HACKERS] Interval aggregate regression failure (expected seems
I assume no more progress has been made on this? --- Tom Lane wrote: I wrote: Michael Paesold [EMAIL PROTECTED] writes: I am definitely not going to use -march=pentium4 in any production system. Should I open a bug report with RedHat (gcc vendor)? Yeah, but they'll probably want a smaller test case than Postgres fails its regression tests :-( I have just confirmed that the problem still exists in FC4's current compiler (gcc 4.0.1, gcc-4.0.1-4.fc4), which probably will boost up the priority of the complaint quite a long way in Red Hat's eyes. I've also confirmed that the problem is in interval_div; you can reproduce the failure with select '41 years 1 mon 11 days'::interval / 10; which should give '4 years 1 mon 9 days 26:24:00', but when timestamp.o is compiled with -mcpu=pentium4 -march=pentium4, you get '4 years 1 mon 10 days 02:24:00'. --enable-integer-datetimes is not relevant because the interesting part is all double/integer arithmetic. Looking at this, though, I wonder if the pentium4 answer isn't more correct. If I'm doing the math by hand correctly, what we end up with is having to cascade 3/10ths of a month down into the days field, and since the conversion factor is supposed to be 30 days/month, that should be exactly 9 days. Plus the one full day from the 11/10 days gives 10 days. I think what is happening on all the non-Pentium platforms is that (3.0/10.0)*30.0 is producing something just a shade less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly due to rearrangement of the calculation. I think we can still file this as a compiler bug, because I'm pretty sure the C spec does not allow rearrangement of floating-point calculations ... but we might want to think about tweaking the code's roundoff behavior just a bit. An example that's a little easier to look at is select '41 years 1 mon'::interval / 10; I get '4 years 1 mon 9 days' with the pentium4 optimization, and '4 years 1 mon 8 days 24:00:00' without, and the former seems more correct... regards, tom lane -- Bruce Momjian
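The roundoff effect Tom describes can be seen in isolation with a toy program (an illustration of the arithmetic only, not the timestamp.c code):

    #include <stdio.h>
    #include <math.h>

    int
    main(void)
    {
        /* 0.3 is not exactly representable in binary floating point,
         * so whether (3.0/10.0)*30.0 lands exactly on 9.0 can depend
         * on whether intermediates are kept in x87 extended precision
         * or rounded to 64-bit doubles at each step (as with
         * -march=pentium4, which uses SSE2). */
        double days = (3.0 / 10.0) * 30.0;

        printf("days = %.17g\n", days);
        printf("truncated: %d  rounded: %d\n",
               (int) days, (int) rint(days));
        return 0;
    }

If the product comes out infinitesimally below 9.0, truncation to int yields 8, which is exactly the '8 days 24:00:00' result seen without the pentium4 flags; rounding instead of truncating would give 9 on both targets.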
[HACKERS] Increasing catcache size
Awhile back, there was a discussion about psql \d display being really slow in a database with 4000 tables: http://archives.postgresql.org/pgsql-hackers/2005-09/msg01085.php I looked into this some, and it seems that part of the problem is that the catalog caches are limited to hold no more than 5000 tuples in total (MAXCCTUPLES in catcache.c). When you're looking at thousands of tables, the tuples you want age out of cache immediately and so every SearchSysCache call has to actually go to disk (or at least shared buffers). The 5000 setting is basically Berkeley-era (when we got the code from Berkeley, the limit was 300 tuples per cache, which I changed to a system-wide limit some years ago). A reasonable guess is that we're talking about entries of 100 to 200 bytes each, making the maximum catcache space usage a megabyte or less (per backend, that is). Seems like this is mighty small on modern machines. We could make a GUC variable to control this, or just arbitrarily bump up the constant to 50000 or so. Is it worth another GUC? Not sure. There's another hardwired setting right next to this one, which is the number of hash buckets per catcache. Since the caches are likely to have significantly different sizes, that is looking a bit obsolete too. We could put per-cache numbers into the setup lists in syscache.c easily. Making the cache sizes dynamic a la dynahash.c is another possibility, but I'm unsure if it's worth the trouble. Thoughts? regards, tom lane
Re: [HACKERS] Increasing catcache size
I am thinking we should scale it based on max_fsm_relations. --- Tom Lane wrote: Awhile back, there was a discussion about psql \d display being really slow in a database with 4000 tables: http://archives.postgresql.org/pgsql-hackers/2005-09/msg01085.php I looked into this some, and it seems that part of the problem is that the catalog caches are limited to hold no more than 5000 tuples in total (MAXCCTUPLES in catcache.c). -- Bruce Momjian
Re: [HACKERS] Increasing catcache size
Bruce Momjian pgman@candle.pha.pa.us writes: I am thinking we should scale it based on max_fsm_relations. Hmm ... tables are not the only factor in the required catcache size, and max_fsm_relations tells more about the total installation size than the number of tables in your particular database. But it's one possible approach. I just thought of a more radical idea: do we need a limit on catcache size at all? On normal size databases I believe that we never hit 5000 entries at all (at least, last time I ran the CATCACHE_STATS code on the regression tests, we didn't get close to that). We don't have any comparable limit in the relcache and it doesn't seem to hurt us, even though a relcache entry is a pretty heavyweight object. If we didn't try to enforce a limit on catcache size, we could get rid of the catcache LRU lists entirely, which'd make for a nice savings in lookup overhead (the MoveToFront operations in catcache.c are a nontrivial part of SearchSysCache according to profiling I've done, so getting rid of one of the two would be nice). regards, tom lane
Re: [HACKERS] Increasing catcache size
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: I am thinking we should scale it based on max_fsm_relations. Hmm ... tables are not the only factor in the required catcache size, and max_fsm_relations tells more about the total installation size than the number of tables in your particular database. But it's one possible approach. I just thought of a more radical idea: do we need a limit on catcache size at all? Well, assuming you never access all those tables, you don't use lots of memory; and if you are accessing a lot of them, the memory for all your tables seems a minimal overhead. -- Bruce Momjian
[HACKERS] Re-thing PG_MODULE_MAGIC
Hey, I just noticed (the hard way) that in 8.2CVS, the PG_MODULE_MAGIC header is now *required* for all loadable modules. This includes non-pg modules, such as Solaris' libumem (performance improvement for malloc).

FATAL: incompatible library /usr/lib/libumem.so: missing magic block
HINT: Extension libraries are now required to use the PG_MODULE_MAGIC macro.

Should we really be requiring PG_MODULE_MAGIC for stuff that wasn't built with PG? This seems like a way to break a lot of people's software. -- Josh Berkus, PostgreSQL @ Sun, San Francisco
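For module authors, satisfying the check is a one-time addition to the extension's source; the usual version-portable form is:

    #include "postgres.h"
    #include "fmgr.h"

    /* Declares this module binary-compatible with the server it was
     * built against (8.2 and later); the #ifdef lets the same source
     * still compile against older trees that lack the macro. */
    #ifdef PG_MODULE_MAGIC
    PG_MODULE_MAGIC;
    #endif

That of course only helps for libraries built as PostgreSQL modules, which is exactly Josh's point about things like libumem.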
Re: [HACKERS] Increasing catcache size
Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: I just thought of a more radical idea: do we need a limit on catcache size at all? Well, assuming you never access all those tables, you don't use lots of memory; and if you are accessing a lot of them, the memory for all your tables seems a minimal overhead. I re-did the test of running the regression tests with CATCACHE_STATS enabled. The largest catcache population in any test was 1238 tuples, and most backends had 500 or less. I'm not sure whether you'd really want to consider the regression database as representative of small production databases, but granted that assumption, the current limit of 5000 tuples isn't limiting anything on small-to-middling databases. (Note we are counting tables and other cataloged objects, *not* volume of data stored --- so the regression database could easily be much bigger than many production DBs by this measure.) So I'm pretty strongly inclined to just dike out the limit. If you're running a database big enough to hit the existing limit, you can well afford to put more memory into the catcache. regards, tom lane
Re: [HACKERS] Re-thing PG_MODULE_MAGIC
Josh Berkus josh@agliodbs.com writes: I just noticed (the hard way) that in 8.2CVS, the PG_MODULE_MAGIC header is now *required* for all loadable modules. This includes non-pg modules, such as Solaris' libumem (performance improvement for malloc). What is libumem and why are you trying to load it as a dynamic module? Should we really be requiring PG_MODULE_MAGIC for stuff that wasn't built with PG? If we don't require it then there's precious little point in having it at all. But why don't you link libumem when building the postmaster, if you want it? regards, tom lane
Re: [HACKERS] Re-thing PG_MODULE_MAGIC
All, FATAL: incompatible library /usr/lib/libumem.so: missing magic block HINT: Extension libraries are now required to use the PG_MODULE_MAGIC macro. Should we really be requiring PG_MODULE_MAGIC for stuff that wasn't built with PG? This seems like a way to break a lot of people's software. Never mind, Neil C. just pointed out that we shouldn't be using preload_libraries for libumem anyway -- it's sloppy. However, I do think that we should brace ourselves for a slew of bug reports; if the Sun testing guys are misusing preload_libraries this way, they won't be the only ones. -- Josh Berkus, PostgreSQL @ Sun, San Francisco
Re: [HACKERS] Increasing catcache size
Tom Lane wrote: So I'm pretty strongly inclined to just dike out the limit. If you're running a database big enough to hit the existing limit, you can well afford to put more memory into the catcache. And if we get problem reports, we can fix it. -- Bruce Momjian
Re: [HACKERS] Re-thing PG_MODULE_MAGIC
Josh Berkus wrote: Never mind, Neil C. just pointed out that we shouldn't be using preload_libraries for libumem anyway -- it's sloppy. However, I do think that we should brace ourselves for a slew of bug reports; if the Sun testing guys are misusing preload_libraries this way, they won't be the only ones. Worth a comment in the config file? cheers andrew
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Bruce Momjian pgman@candle.pha.pa.us wrote: Added to TODO list. One thing we tried in February was padding out the statically defined locks with dummy lock definitions in the enum. This has the effect of ensuring that the most contested locks are very definitely in their own cache line and not shared with others. That showed a noticeable improvement in performance, probably because it costs very little to implement, even if the code would require some explanatory documentation. Has this been done? See the LWLOCK_PADDED_SIZE macro in the code. Regards, Qingqing
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Qingqing Zhou wrote: Bruce Momjian pgman@candle.pha.pa.us wrote: Added to TODO list. Has this been done? See the LWLOCK_PADDED_SIZE macro in the code. Oh, yes, thanks. I thought it had been, but I couldn't find anything in the area of the code where he proposed the patch. -- Bruce Momjian
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Qingqing Zhou [EMAIL PROTECTED] writes: One thing we tried in February was padding out the statically defined locks with dummy lock definitions in the enum. Has this been done? See the LWLOCK_PADDED_SIZE macro in code. Not really --- that patch was intended to ensure that LWLocks don't unnecessarily cross two cache lines. It doesn't ensure that two different LWLocks aren't sharing a cache line. You could do that by increasing LWLOCK_PADDED_SIZE to the cache line size for your hardware, if you know what that is. I think a more effective answer might be to twiddle the order of enum LWLockId items so that the most heavily used LWLocks aren't close together. Haven't looked into it though. regards, tom lane
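For context, LWLOCK_PADDED_SIZE works by embedding each lock in a padded union, conceptually like this (simplified from the actual lwlock.c; the LWLock struct contents are stand-ins):

    /* Each LWLock is embedded in a union padded to a power of two,
     * so array elements start on LWLOCK_PADDED_SIZE boundaries and a
     * single lock never straddles two cache lines. */
    #define LWLOCK_PADDED_SIZE 32   /* smallest 2^N holding an LWLock */

    typedef struct LWLock
    {
        int mutex;      /* stands in for the spinlock, hold counts,
                         * and wait queue of the real struct */
    } LWLock;

    typedef union LWLockPadded
    {
        LWLock  lock;
        char    pad[LWLOCK_PADDED_SIZE];
    } LWLockPadded;

Raising LWLOCK_PADDED_SIZE to the hardware line size (say 64 or 128) would give every lock a cache line to itself, as Tom says, at the cost of wasted shared memory for the many lightly used locks.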
Re: [HACKERS] Alternative variable length structure
On Wed, 2006-06-14 at 16:56 -0400, Bruce Momjian wrote: The code churn to do this is going to be quite significant Why do you say that? Although the original patch posted to implement this was incomplete, it certainly wasn't very large. -Neil
Re: [HACKERS] Alternative variable length structure
Bruce Momjian pgman@candle.pha.pa.us wrote: I assume the conclusion from this email thread is that though the idea is interesting, the complexity added would not be worth the saving of a few bytes. I have the same understanding of it. Besides the complexity, there is a trade-off between CPU and I/O resources. Also, there is another approach in http://archives.postgresql.org/pgsql-hackers/2005-12/msg00258.php , where a 2-byte header version of varlena is suggested. It is simpler, but it may double the intrinsic text types (shorttext, shortchar and shortvarchar). I assume it is worth the saving of a few bytes, in particular for indexes on VLDBs, but my patch is still incomplete and needs more work. --- ITAGAKI Takahiro NTT Open Source Software Center
Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches
Tom Lane [EMAIL PROTECTED] wrote: Not really --- that patch was intended to ensure that LWLocks don't unnecessarily cross two cache lines. It doesn't ensure that two different LWLocks aren't sharing a cache line. You could do that by increasing LWLOCK_PADDED_SIZE to the cache line size for your hardware, if you know what that is. Exactly, this is one way -- if we make LWLOCK_PADDED_SIZE big enough, we can ensure that each LWLock gets its own cache line. If so, maybe we should plug a check like LMBench into ./configure to detect the current cache line size. But that may be overkill -- a compromise is to pad only some of the LWLocks, not all of them (for example, the buffer content lock array). Regards, Qingqing
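As a rough illustration of the probe Qingqing suggests, a configure-time test program could ask the C library for the line size where it is exposed. This is only a sketch, not a proposed patch: _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension, so a real check would need an LMBench-style timing fallback on other platforms.

    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
    #ifdef _SC_LEVEL1_DCACHE_LINESIZE
        /* glibc-specific: ask the OS for the L1 data cache line size. */
        long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);

        if (line > 0)
        {
            printf("%ld\n", line);
            return 0;
        }
    #endif
        printf("64\n");          /* fall back to a common default */
        return 0;
    }

configure could capture the program's output and #define the pad size from it, falling back to a conservative default where no answer is available.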
Re: [HACKERS] Alternative variable length structure
Neil Conway [EMAIL PROTECTED] writes: On Wed, 2006-06-14 at 16:56 -0400, Bruce Momjian wrote: The code churn to do this is going to be quite significant. Why do you say that? Although the original patch posted to implement this was incomplete, it certainly wasn't very large. IIRC, the original idea would have forced code changes on every existing varlena datatype ... not just those in our source tree, but outside users' custom types as well. You'd need a really *serious* gain to justify that. The second idea was to leave existing varlena types as-is and invent a set of small text etc. datatypes alongside them. This is simply ugly (shades of Oracle's varchar2), and again there wasn't really an adequate case made for why we should burden users with the extra complexity. The variant I could have supported was making a 2-byte-header class of types and using it just for the few types where it would be sufficient (numeric and inet, basically). That would be transparent to everyone. Whether it's worth the trouble is not clear, given that it'd only help a few datatypes but would add cycles in critical inner loops in places like heap_deformtuple. regards, tom lane
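To make the trade-off concrete, here is an illustrative sketch of the two header layouts; these structs are simplified stand-ins, not the real PostgreSQL definitions.

    #include <stdint.h>

    /* The existing varlena layout: a 4-byte length word up front. */
    typedef struct varlena
    {
        int32_t vl_len;        /* total length in bytes, including header */
        char    vl_dat[1];     /* variable-size data follows */
    } varlena;

    /* Hypothetical 2-byte-header class for small types (numeric, inet). */
    typedef struct shortvarlena
    {
        uint16_t svl_len;      /* total length; caps a value at 64 KB */
        char     svl_dat[1];
    } shortvarlena;

The 2 bytes saved per value add up on narrow columns and their indexes, but every attribute-walking routine such as heap_deformtuple would need an extra branch to decide which header width it is looking at -- which is where the cycles in critical inner loops go.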
[HACKERS] Test request for Stats collector performance improvement
Would some people please run the attached test procedure and report back the results? I basically need to know whether the patch is an improvement on more platforms than just my own. Thanks --- Run this script and record the time reported: ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.script Modify postgresql.conf with stats_command_string = on and reload the server. Do SELECT * FROM pg_stat_activity; to verify the command string is enabled. You should see your query in the current_query column. Rerun stat.script and record the time. Apply this patch to CVS HEAD: ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.nobuffer Run stat.script again and record the time. Report via email your three times and your platform. If the patch worked, the first and third times will be similar, and the second time will be high. --- Bruce Momjian wrote: Tom Lane wrote: Michael Fuhr [EMAIL PROTECTED] writes: Further tests show that for this application the killer is stats_command_string, not stats_block_level or stats_row_level. I tried it with pgbench -c 10, and got these results: 41% reduction in TPS rate for stats_command_string, 9% reduction in TPS rate for stats_block/row_level (any combination). strace'ing a backend confirms my belief that stats_block/row_level send just one stats message per transaction (at least for the relatively small number of tables touched per transaction by pgbench). However, stats_command_string sends 14(!) --- there are seven commands per pgbench transaction, and each results in sending a command message and later an IDLE message. Given the rather lackadaisical way in which the stats collector makes the data available, it seems like the backends are being much too enthusiastic about posting their stats_command_string status immediately. Might be worth thinking about how to cut back the overhead by suppressing some of these messages. I did some research on this because the numbers Tom quotes indicate there is something wrong in the way we process stats_command_string statistics. I made a small test script:

    if [ ! -f /tmp/pgstat.sql ]
    then    i=0
            while [ $i -lt 10000 ]
            do
                    i=`expr $i + 1`
                    echo "SELECT 1;"
            done > /tmp/pgstat.sql
    fi
    time psql test < /tmp/pgstat.sql > /dev/null

This sends 10,000 SELECT 1 queries to the backend and reports the execution time. I found that without stats_command_string defined, it ran in 3.5 seconds. With stats_command_string defined, it took 5.5 seconds, meaning the command string is causing a 57% slowdown. That is way too much, considering that the SELECT 1 has to be sent from psql to the backend, parsed, optimized, and executed, and the result returned to psql, while stats_command_string only has to send a string to a backend collector. There is _no_ way that collector should take 57% of the time it takes to run the actual query. With the test program, I tried various options. The basic code we have sends a UDP packet to a statistics buffer process, which recv()'s the packet, puts it into a memory queue buffer, and writes it to a pipe() that is read by the statistics collector process, which processes the packet. I tried various ways of speeding up the buffer and collector processes. I found that if I put a pg_usleep(100) in the buffer process, the backend speed was good, but packets were lost. What I found worked well was to do multiple recv() calls in a loop. The previous code did a select(), then perhaps a recv() and pipe write() based on the results of the select(). This caused many small packets to be written to the pipe, and the pipe write overhead seems fairly large. The best fix I found was to loop over the recv() call at most 25 times, collecting a group of packets that can then be sent to the collector in one pipe write. The recv() socket is non-blocking, so a zero return indicates there are no more packets available. Patch attached. This change reduced the stats_command_string time from 5.5 to 3.9 seconds, which is closer to the 3.5 seconds with stats_command_string off. A second improvement I discovered is that the statistics collector is calling gettimeofday() for every packet received, so it can determine the timeout for the select() call that writes the flat file. I removed that behavior and instead used setitimer() to issue a SIGINT every 500ms, which was the original behavior. This eliminates the gettimeofday() call and makes the code cleaner. Second patch attached. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
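For readers following along, here is a simplified sketch of the batching idea in the first patch. The names (buffer_loop) and sizes (MSG_MAX, BATCH_MAX) are invented for illustration, and the sketch glosses over message framing, which the real code must preserve so the collector can split the batch back into packets.

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define MSG_MAX   1000     /* assumed upper bound on one stats message */
    #define BATCH_MAX 25       /* max packets gathered per pipe write */

    static void
    buffer_loop(int udp_sock, int pipe_fd)     /* hypothetical helper */
    {
        char    batch[BATCH_MAX * MSG_MAX];
        fd_set  rfds;
        size_t  used;
        ssize_t len;
        int     i;

        for (;;)
        {
            FD_ZERO(&rfds);
            FD_SET(udp_sock, &rfds);
            if (select(udp_sock + 1, &rfds, NULL, NULL, NULL) <= 0)
                continue;            /* interrupted; try again */

            used = 0;

            /* Socket is non-blocking: drain it, up to BATCH_MAX packets. */
            for (i = 0; i < BATCH_MAX; i++)
            {
                len = recv(udp_sock, batch + used, MSG_MAX, 0);
                if (len <= 0)        /* no more packets available */
                    break;
                used += (size_t) len;
            }

            /* One pipe write for the whole batch, not one per packet. */
            if (used > 0)
                (void) write(pipe_fd, batch, used);
        }
    }

The win comes entirely from the last step: pipe writes carry a fixed per-call cost, so forwarding 25 packets in a single write() amortizes that cost instead of paying it 25 times.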
[HACKERS] Are we still interested in the master-slave scan patch
I may have some free time in the near future to work on the master-slave scan idea. I've been able to support both SeqScan and IndexScan as-is. Are we still interested in getting it into 8.2? There are still some problems for which I am not quite sure of the solution. One is Xid assignment -- we need this to ensure that the master and slaves see the same snapshot. The other is the connection pool architecture: shall we let the postmaster manage the slaves, or let another process (say, a slave-master) handle them? Currently I am choosing the latter. Regards, Qingqing
[HACKERS] emu buildfarm failure
Buildfarm member emu is failing in some branches with

    checking libintl.h usability... yes
    checking libintl.h presence... no
    configure: WARNING: libintl.h: accepted by the compiler, rejected by the preprocessor!
    configure: WARNING: libintl.h: proceeding with the preprocessor's result
    checking for libintl.h... no
    configure: error: header file libintl.h is required for NLS

I think this is probably because of

    'config_env' => {
        'CFLAGS'  => '-I/usr/local/include',
        'CC'      => 'gcc',
        'LDFLAGS' => '-L/usr/local/lib/ -lcrypto -liconv'
    },

The config.log trace suggests that libintl.h is installed in /usr/local/include, because the working tests succeed when they have -I/usr/local/include while the failing tests don't have it. -I switches should logically go in CPPFLAGS, not CFLAGS. I'm not sure this explains emu's failure to build 7.4 and 8.0, because I don't know why 8.1 and HEAD wouldn't fail the same way ... but it looks wrong anyway. regards, tom lane
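If that diagnosis is right, the corrected buildfarm entry might look something like the following sketch (assuming the member keeps the same keys otherwise); the point is simply that the -I switch moves into CPPFLAGS:

    'config_env' => {
        'CC'       => 'gcc',
        'CPPFLAGS' => '-I/usr/local/include',
        'LDFLAGS'  => '-L/usr/local/lib/ -lcrypto -liconv'
    },

Putting the -I switch in CPPFLAGS makes it visible to the preprocessor tests as well as the compiler tests, which is exactly the disagreement the configure warning is complaining about.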
Re: [HACKERS] Are we still interested in the master-slave scan patch
QingQing, I may have some free time in the near future to work on the master-slave scan idea. I've been able to support both SeqScan and IndexScan as-is. Are we still interested in getting it into 8.2? I don't know about anyone else, but *I'm* still interested. There are still some problems for which I am not quite sure of the solution. One is Xid assignment -- we need this to ensure that the master and slaves see the same snapshot. The other is the connection pool architecture: shall we let the postmaster manage the slaves, or let another process (say, a slave-master) handle them? Currently I am choosing the latter. Hmmm. Why not the postmaster? -- Josh Berkus PostgreSQL @ Sun San Francisco