Re: [GENERAL] advocacy: drupal and PostgreSQL
In response to Tom Hart [EMAIL PROTECTED]: Joshua D. Drake wrote: Robert Treat wrote: There's been a big move in the php community to push people towards php5 (one of which was the EOL of php4), which has started to pay off. I'd guess that if they wanted to, they could switch to PDO with Drupal 7 and not hurt themselves too much. When I spoke with Dries about this issue, one of the big holdbacks wasn't PHP 4 but actually MySQL 3. With Drupal 6, MySQL 3 is no longer supported, so they can actually do some nicer stuff (like foreign keys), etc. I am sure that with PHP5 things will improve as well. Sincerely, Joshua D. Drake Let me just sneak in a quick rant here, from somebody who really doesn't matter. We run Drupal for our corporate intranet (currently being built) and we use PostgreSQL as the backend. Some of the modules and things don't work perfectly, but Drupal supported it and that made me happy enough to work with it. Now after reading this garbage, I'm extremely disappointed. Completely dropping PostgreSQL capability might not affect them too much in the long run, because a large portion of their user base is using MySQL, but it would send a message to those of us who believe in choice. I'm afraid that they're choosing the route of convenience over their users, and every time I think about it I want to go looking for replacements. I run my personal site on Drupal+PostgreSQL. If Drupal drops PG support, I'll switch the front end. I'm not switching the back end. I'm also planning a small enterprise that I was originally considering using Drupal for. I'm now more seriously considering Bricolage. However, read on ... It'd be easier to build Drupal to only run on MySQL, but then again it'd be easy to build PostgreSQL to only run on Linux and forget about the Windows users. 
I know it's not their duty to make Drupal work with PostgreSQL, but if they drop it like they're talking about, I'll be making a push here and to everyone I know who uses Drupal to switch to another system, whether they're running Postgres or not. If Drupal 6 absolutely doesn't support Postgres, then I'm dropping my Drupal 5 install on the spot. This is a cold move, Drupal, and you should be ashamed. I made a post on the drupal-devel list to this effect. I got chewed out for flaming Karoly ... who's obviously some big Drupal code guru. Frankly, every time this topic comes up, it's initiated by Karoly, and I've lost patience with the crap, so I unsubscribed. If I can get my life back in order, I'll re-subscribe some time in Feb, and hopefully start to do something productive, like contribute testing and patches. Sorry, I'll end the rant here. BTW, I'm a PHP developer who uses PostgreSQL almost exclusively, and I'm on this list as well as other Postgres lists constantly (even if as a reader most of the time). If they have this big of an issue, why not ask for help? If you read through the thread, it's just Karoly and a few other minor players in the Drupal community. Many people have stepped up and said, "I _do_ test on PostgreSQL, so what are you complaining about?" As best I can tell, Karoly writes patches, and when they don't work on PostgreSQL and therefore don't get committed right away, he starts this argument up on the Drupal lists yet again. The guy is a whiner who has a personal axe to grind and seems unable to accept that Drupal wants to run on more than just MySQL. If he loves MySQL so much, he should join a project that only supports MySQL and leave the Drupal people to their work. There's a LOT of effort in the Drupal community to build code abstractions that will make the system database-agnostic, and Karoly's constant whining is simply counterproductive. To a large degree, I think Karoly has blown the situation out of proportion. 
Look at how it affects _this_ list every time he starts bitching, for example. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] advocacy: drupal and PostgreSQL
In response to Tom Hart [EMAIL PROTECTED]: Bill Moran wrote: [snip] To a large degree, I think Karoly has blown the situation out of proportion. Look at how it affects _this_ list every time he starts bitching, for example. Is it just Karoly (chx) who has all these things to say about pg? He's just one person on the Drupal team. Has anybody else in the core team spoken out on this subject? Last time this came up (which was in Dec, I believe) a few other core members jumped in eventually and said, "No, we're keeping PG. The goal of Drupal is to be database agnostic, so dropping PG is counterproductive." To which Karoly responded that he didn't want to _drop_ PG support, he just wanted to drop PG support ... or something equally nonsensical. The guy sets my jerk alarms ringing like a 5-alarm fire. He doesn't play well with others, he constantly starts fights, and he threatens to take his ball and go home every time he loses. I don't care how much code he writes, I don't think he's worth the headache. Let's keep in mind as well that this doesn't only affect pg users but any other database that Drupal supports or plans on supporting. Drupal is pretty popular, and I expect there are a number of organizations that don't fit in their mold of the ideal Drupal user. As I said, I get the impression that most of the Drupal developers get this, and they have mentioned more than once that Drupal's design goal is to be database-agnostic. It just seems to be Karoly and a few people here and there that he's able to incite into riot. I'd almost consider trying to take Drupal and create a derivative product and build in the pg and Oracle and MSSQL, etc. support myself, but if the Drupal team really pulls a messed up move like this, I really don't want to have anything to do with them anymore. 
It's not that I'm that huge of a pg nut (I used MySQL for a while myself), but any team that can turn its back on that many of its users to make their lives a little easier isn't in it for the right reasons (the advancement of technology, computing as a science, etc.). I am literally astonished that they would even consider telling even 1% of their users, "Take off, you're too much work." How many Drupal+Postgres users are large corporations, or regular donors? What about code contributors? How many people are they looking at pissing off with a move like this? I'm upset with the community. The other core members need to stand up to Karoly and say, "Your opinions are not those of the community, and we'll ban you from the lists if you continue to start this fight over and over again." -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Sun acquires MySQL
In response to dvanatta [EMAIL PROTECTED]: What's up with 3 of the 7 being from Pennsylvania? What's the connection? Well, as everyone knows, Pennsylvania is a haven for brilliant people. In fact, simply living in Pennsylvania makes you smarter. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] large table vacuum issues
Ed L. [EMAIL PROTECTED] wrote: On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote: On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote: We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2 ... First of all, update your 8.1 install to 8.1.10. Failing to keep up with bug fixes is negligent. Who knows, you might be getting bitten by a bug that was fixed between 8.1.2 and 8.1.10. Could be. But like you said, who knows. In some environments, downtime for upgrading costs money (and more), too, sometimes even enough to make it negligent to take downtime to keep up with bug fixes (and of course, the new bugs), which may or may not be a factor at hand. Upgrades along the 8.1.x branch take something on the order of 5 minutes (if you're meticulous and serialize the process). If you haven't set yourself up so you can schedule 5 minutes of downtime once a month or so, then the negligence occurred much earlier than at the failure to upgrade. While the time required to restart a DB may be negligible, there are often upstream/downstream dependencies that greatly expand the actual downtime for the customer. Like what? The point of the double-dot branch is that upgrades don't affect dependencies. How much would downtime need to cost before you thought it negligent to upgrade immediately? It's a tradeoff, not well supported by simple pronouncements, one the customer and provider are best qualified to make. Not really. Unscheduled downtime is _always_ more expensive than scheduled downtime. Scheduled downtime isn't going to put you in breach of contract if you've got an uptime guarantee. If you're really in a situation where you need 100% uptime, then you're still negligent for not having something like Slony to allow you to switch production to another server so you can alternate maintenance between the two. This is something like the RAID 5 argument: no matter how you argue it, it's a bad idea. 
If you claim you can't afford to buy more hardware, then you made a mistake in pricing out your product to your client. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] [OT] Slony (initial) Replication - Slow
In response to Ow Mun Heng [EMAIL PROTECTED]: I'm just getting my feet wet with Slony, and during the setup of the slave I did a dump and restore of the master DB to the slave DB. However, during the startup of Slony, I noticed that it issues a truncate command to the (to-be) replicated table. Hence, this means that there's no need for me to do a dump/restore in the 1st place. Can someone confirm this? Confirmed. It's how Slony is designed to work. It _is_ taking a long time (for Slony) to do the \copy (~60GB in multiple tables being replicated, including (on the fly) index creation) 1) It only needs to be done once 2) You can remove the indexes from the replica and add them back in after the initial sync is complete. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
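Point 2 above can be sketched as follows; a minimal example assuming a replicated table named orders with a single secondary index (both names are hypothetical, not from the original thread):

```sql
-- On the replica, before Slony's initial COPY: drop secondary
-- indexes so the bulk load doesn't pay for row-by-row index
-- maintenance during the \copy.
DROP INDEX orders_created_idx;

-- ... let Slony complete the initial sync ...

-- Afterwards, rebuild the index in a single pass, which is far
-- faster than maintaining it incrementally during the load.
CREATE INDEX orders_created_idx ON orders (created_ts);
```

Note that indexes backing primary keys or unique constraints used by Slony itself should be left in place; only secondary indexes are safe candidates for this trick.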
Re: [GENERAL] Is there PHP mysql_real_escape_string for postgresql?
In response to Erik Jones [EMAIL PROTECTED]: On Dec 17, 2007, at 8:37 PM, [EMAIL PROTECTED] wrote: In php is there a postgresql version of mysql_real_escape_string() ? You have both pg_escape_string and pg_escape_bytea available. Is there a mysql_fake_escape_string()? Should PostgreSQL have a pg_pretend_to_escape_string() that effectively does nothing? -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
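As an aside not from the original exchange: besides the client-side pg_escape_string, PostgreSQL can also do the quoting server-side with quote_literal(), which is mainly useful when building dynamic SQL inside PL/pgSQL functions:

```sql
-- quote_literal() wraps the value in single quotes and doubles any
-- embedded quote characters, producing a string that is safe to
-- splice into dynamically constructed SQL:
SELECT quote_literal('O''Reilly');
-- returns: 'O''Reilly'
```

For application code, parameterized queries remain preferable to any form of manual escaping, since the value never passes through the SQL parser as text.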
Re: [GENERAL] multiple version installation in the same machine ????
In response to Josh Harrison [EMAIL PROTECTED]: Hi, I have Postgres version 7.4 and version 8.3 installed on my system. 7.4 uses port 5432 and 8.3 uses port 5433. I started 7.4 and the database is running fine. Now I started the database server in version 8.3 and it started fine. pg_ctl -D /export/home/josh/postgres8.3/pgsql/data start -l log8.3.log server starting -sh-3.00$ pg_ctl status pg_ctl: server is running (PID: 4408) /usr4/postgres8.3/bin/postgres -D /export/home/josh/postgres8.3/pgsql/data But when I type psql -l I get this error: -sh-3.00$ psql -l psql: FATAL: database postgres does not exist Why? Is it not possible to have multiple version installations on the same machine (on different ports)? PostgreSQL 7.4 doesn't install a postgres database by default. Try explicitly connecting to template1. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
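A minimal sketch of the workaround, assuming the 7.4 server is the one answering on the default port (connect from the shell with psql -p 5432 template1):

```sql
-- template1 always exists in every cluster, so connecting there
-- always works. Optionally, create the 'postgres' database that
-- newer clients expect by default, so a plain `psql` or `psql -l`
-- against the 7.4 server succeeds from then on:
CREATE DATABASE postgres;
```

The error arises because an 8.x psql defaults to connecting to a database named postgres, which a 7.4-initialized cluster never created.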
Re: [GENERAL] spreadsheet interface
In response to hjenkins [EMAIL PROTECTED]: Some people in my workplace are asking if there exists a GUI that will allow cut-and-paste of multiple cells directly from (and, preferably, directly to) spreadsheets. pgAdmin III, PGAccess, and TOra don't seem to. Any suggestions? To add on to Thomas' comment. You can also install OpenOffice.org with the pgsql ODBC driver and use the OOo spreadsheet to access data directly. I haven't done this, personally, so I can't vouch for how well it works. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] install problem
Hey, I am working on testing our Windows version of software which uses Postgres, on a machine with XP Pro and 512 MB of memory. The very first time I installed, it was fine except for the fact it was only listening on localhost (127.0.0.1) and I need it listening on its IP address. Since then, I have not been able to re-install. I have tried thoroughly cleaning the machine and registry. The issue always comes back to a 1920 "do you have permissions" error in the event viewer, and it always fails to start the service. I even wiped away my test box for our product and installed a fresh copy of Windows on top of my old version. Zip :-) Can someone tell me what I am doing wrong? We have an end-of-the-year rush and we have a number of people worldwide who would be using this once we can actually test it. Bill ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] postgres cannot access the server configuration file
Sebastien ARBOGAST [EMAIL PROTECTED] wrote: I'm trying to start postgreSQL server on my Macbook Pro. I've installed it using packages available here: http://www.kyngchaos.com/wiki/software:postgres But when I try to start up the server running sudo SystemStarter start PostgreSQL, I get the following message: postgres cannot access the server configuration file /usr/local/pgsql/data/postgresql.conf: No such file or directory And as a matter of fact, there is no such file in this directory. Do I have to create it manually? You need to run initdb to create the directory: http://www.postgresql.org/docs/8.3/static/app-initdb.html -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] postgres cannot access the server configuration file
Sebastien ARBOGAST [EMAIL PROTECTED] wrote: 2007/12/15, Bill Moran [EMAIL PROTECTED]: Sebastien ARBOGAST [EMAIL PROTECTED] wrote: I'm trying to start postgreSQL server on my Macbook Pro. I've installed it using packages available here: http://www.kyngchaos.com/wiki/software:postgres But when I try to start up the server running sudo SystemStarter start PostgreSQL, I get the following message: postgres cannot access the server configuration file /usr/local/pgsql/data/postgresql.conf: No such file or directory And as a matter of fact, there is no such file in this directory. Do I have to create it manually? You need to run initdb to create the directory: http://www.postgresql.org/docs/8.3/static/app-initdb.html The problem is that I need the password of the postgres user that has been created automatically for me. I've tried postgres but it doesn't seem to work. And since I can't see the user in my Preference Pane, I can't change his password. Please don't top-post. And please don't respond personally to email that was originated on the mailing list. I've returned pgsql-general@postgresql.org to the list of recipients. If you installed the software, you obviously have root access, so just change the postgres password to something you know. While I'm not familiar with the Mac OS installation procedure, I'd assume that account was created without login capability (probably without a password at all) which is good, sound security practice. Software shouldn't create users with known passwords. You can also use sudo to switch to the postgres user without needing the password for the postgres user (although it will probably ask you for the root password again -- not entirely sure how Mac OS is set up by default) This approach has become pretty much par for the course on modern POSIX systems. 
Try: sudo -u postgres initdb -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Need LIMIT and ORDER BY for UPDATE
In response to D. Dante Lorenso [EMAIL PROTECTED]: Bill Moran wrote: D. Dante Lorenso [EMAIL PROTECTED] wrote: All, I'd really like to have ORDER BY and LIMIT for UPDATE and DELETE commands. Is this possible? UPDATE invoice i SET reserve_ts = NOW() + '1 hour'::timestamp FROM account a WHERE a.acct_id = i.acct_id AND i.reserve_ts < NOW() AND a.status = 'A' AND i.is_paid IS FALSE ORDER BY i.create_ts ASC LIMIT 1 RETURNING invoice_id; This query would find JUST ONE invoice record which is not paid and reserve the right to operate on the row using the 'reserve_ts' column for all active accounts. The one row would be the oldest invoice matching the criteria. Only that one row would be updated and the invoice_id of the updated row (if any) would be returned. Running a query like this over and over would pop just one record off the queue and would guarantee an atomic reservation. While I'm not going to argue as to whether your suggestion would be a good idea or not, I will suggest you look at SELECT FOR UPDATE, which will allow you to do what you desire. UPDATE invoice SET reserve_ts = NOW() + '1 hour'::interval WHERE invoice_id = ( SELECT invoice_id FROM invoice i, account a WHERE a.acct_id = i.acct_id AND i.reserve_ts < NOW() AND a.status = 'A' AND i.is_paid IS FALSE ORDER BY i.create_ts ASC LIMIT 1 FOR UPDATE ) RETURNING invoice_id; Does this do the same thing while still remaining a single atomic query that will guarantee no race conditions during the inner select/update? ERROR: SELECT FOR UPDATE/SHARE is not allowed in subqueries Guess not. 
BEGIN; SELECT invoice_id FROM invoice i, account a WHERE a.acct_id = i.acct_id AND i.reserve_ts < NOW() AND a.status = 'A' AND i.is_paid IS FALSE ORDER BY i.create_ts ASC LIMIT 1 FOR UPDATE; UPDATE invoice SET reserve_ts = NOW() + '1 hour'::interval WHERE invoice_id = [previously selected value]; COMMIT; And before you start asking a lot of "won't this x or y" questions, please read the docs: http://www.postgresql.org/docs/8.1/static/sql-select.html#SQL-FOR-UPDATE-SHARE Then feel free to ask more questions. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Need LIMIT and ORDER BY for UPDATE
D. Dante Lorenso [EMAIL PROTECTED] wrote: All, I'd really like to have ORDER BY and LIMIT for UPDATE and DELETE commands. Is this possible? UPDATE invoice i SET reserve_ts = NOW() + '1 hour'::timestamp FROM account a WHERE a.acct_id = i.acct_id AND i.reserve_ts < NOW() AND a.status = 'A' AND i.is_paid IS FALSE ORDER BY i.create_ts ASC LIMIT 1 RETURNING invoice_id; This query would find JUST ONE invoice record which is not paid and reserve the right to operate on the row using the 'reserve_ts' column for all active accounts. The one row would be the oldest invoice matching the criteria. Only that one row would be updated and the invoice_id of the updated row (if any) would be returned. Running a query like this over and over would pop just one record off the queue and would guarantee an atomic reservation. While I'm not going to argue as to whether your suggestion would be a good idea or not, I will suggest you look at SELECT FOR UPDATE, which will allow you to do what you desire. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Hijack!
In response to Gregory Williamson [EMAIL PROTECTED]: -Original Message- From: [EMAIL PROTECTED] on behalf of Joshua D. Drake Sent: Tue 12/11/2007 9:43 AM To: [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Hijack! -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Tue, 11 Dec 2007 16:31:40 + Raymond O'Donnell [EMAIL PROTECTED] wrote: // Please note in particular the following points of netiquette: * Don't top-post, as it makes for confusing reading. * Don't start a new thread by replying to an old one, because [insert suitable technical explanation here]. Failure to observe the above may result in your question going unanswered. // O.k., this might be a bit snooty, but frankly it is almost 2008. If you are still a top poster, you obviously don't care enough about the content of the people you are replying to to avoid top-posting. However, I would also note that in the Windows world it is very common to top-post. I am constantly retraining very smart, just very ignorant, customers. * Not all mail clients deal well with inline/bottom quoting (manually added to lines here since my mail reader does not do so automatically -- imagine doing so for a complex quote!) I recommend finding a better mail program. There are lots out there. Being forced to use substandard software in this day and age is a crime. * Top posting is very common in companies with lots of BlackBerry (etc.) users, since they seem to see only the tops easily. Illicit drug use is very common in many areas. Denigration of women simply because they are women is common in many parts of the world. Crying "everyone else is doing it" is not a valid argument in my book. * My mail client *always* starts at the top of the message. For rapid/internal mails, top posting works better because the answer/most recent is always at the top. Complex messages do deserve in-posting, but it's not always easy, especially if you have to do it manually. Does your mail browser always start at the bottom? 
I always see the top of a message first. Simple threads work very well this way -- complicated ones collapse under top-posting. This is a tired, overused argument that has little value. * A lot of us have to use whatever the company provides as a mail server. Are you saying your mail server forces you to top post? That's a new one. Exchange sucks, but I'd rather not quit my job just because _you_ have a problem reading mail that does not conform to your expectations to a T. And there is a limit to how much time I want to spend manually formatting your mail to respond to it. There's a limit to the amount of time I'm willing to spend trying to make heads or tails of an incomprehensible email. I think I deleted over 100 emails last week after seeing how badly formatted they were, even though I probably had the expertise to offer helpful information. I don't complain about people top-posting because I don't like it. I complain because it makes it more difficult for me to help, and thus less likely that I will, and I know that other knowledgeable people feel the same way. I complain about top-posting because I know that the person is less likely to get helpful replies if they format their email poorly. Note that a lot of PostGIS mailing list posts are top-posted and the complaint rate is vanishingly small. Yet somehow business clanks on. Imagine that! And I can't even use Exchange/Outlook -- the web interface to Micro$oft really sucks. Again, you're asking a community to offer you free help in spite of the fact that your tools suck. I'm not saying nobody will do it; all I'm saying is that if you make it too difficult for people to help, they won't. * Try to see the world from a perspective other than your own (admittedly superior) one! Not everyone is so advanced. I do see it from other perspectives. 
I can still see it from the perspective of a Bill Moran from 10 years ago who got chewed out for top-posting, because I didn't know anything and didn't get very good help because I didn't formulate good questions. That's a Bill Moran who learned _because_ people pointed out what I was doing wrong. I'm trying to pass the favor on when I point out problems with folks' emails. I'm not trying to be an asshole -- that happens naturally. * Get a life Of course. How about: 1) I'll stop replying to emails that are formatted too badly to understand. 2) You accept that the rules of this community are no top posting, stop dragging this discussion out, and accept that top-posted emails won't be responded to. Since nobody smart will offer advice on how to better format emails any more, the newbies will remain ignorant and never learn. That's obviously the best thing we can do for the community. Actually, I'd rather just continue to politely point out the rules of the list to newbies and help the world become a better place.
Re: [GENERAL] top posting
In response to Collin Kidder [EMAIL PROTECTED]: Geoffrey wrote: Collin Kidder wrote: I have to suffer through dealing with people like the two of you quoted above. You can deal with people who'd like to top post. Anything else is just being a spoiled baby who can't deal with minor issues. If all the energy spent crying about top posting were used to fuel cities, none of us would be paying for power right now. Sorry to be so blunt, but it really irritates me when people cry like 4-year-olds about top posting. It's not that bad, get over it. If it's not brought to the attention of the masses, then it will simply grow, and it simply is not the way it's done on this list. Get used to it. Now who's doing the 4-year-old crying?? Yes, I'm bitching, crying, or whatever you'd like to call it. But you notice, I'm still attempting to follow the proper posting etiquette for this list. However, I do not see any actual valid reason that top posting cannot ever be acceptable, except that some people are way too stuck in a mental rut and refuse to allow for anything other than their way. This is called trolling. Whether or not you are doing it on purpose is irrelevant. The effect is still the same, even if you do it accidentally. The point has been brought up again and again and again: top posting makes it difficult for the veterans on this list to understand and respond to your email. As a result, in order to get the best possible response, DO NOT TOP POST. Somehow, you continue to bring this back around to how we hate top-posting and despise top-posters and whatever else it is you're saying. I'm not aware of _anyone_ ever being banned or anything horrible as a result of top-posting. The worst thing that happens is that busy people begin ignoring the thread, and this is what those of us who say "please don't top-post" are trying to avoid. If you want to turn it into some personal war or something, please don't do it on the list. 
-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Transaction problem
In response to x asasaxax [EMAIL PROTECTED]: It's just using a constraint, then? There's no problem if two sessions decrease the number and this number goes to less than or equal to zero? I'm programming with PHP. BEGIN; SELECT quantity FROM products WHERE productid=[productid] FOR UPDATE; [Check in PHP to ensure enough product exists for this purchase] UPDATE products SET quantity=[new quantity after purchase] WHERE productid=[productid]; [... any other table updates you need to do for this transaction ...] COMMIT WORK; SELECT ... FOR UPDATE will prevent other transactions from locking this row until this transaction completes. It guarantees that only 1 transaction can modify a particular row at a time. See the docs for more details: http://www.postgresql.org/docs/8.1/static/sql-select.html#SQL-FOR-UPDATE-SHARE http://www.postgresql.org/docs/8.1/static/explicit-locking.html 2007/12/3, Cesar Alvarez [EMAIL PROTECTED]: What are you programming with? Are you using Npgsql? Regards, Cesar Alvarez. Hi everyone, I would like to know how I can do a simple transaction for this situation: I have n products in a certain row of a table. When the user buys a product, the quantity of this product will be decreased. The user can only buy a product that has a quantity n > 0. This means that when the user sends the product confirmation to the system, the db will decrease the product quantity with a transaction if the number of products in stock is greater than zero. Does anyone know how I can do that with Postgres? Thanks a lot. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] PostgreSQL Beta4 released
In response to Joshua D. Drake [EMAIL PROTECTED]: Thanks to all the testing, feedback and bug reports the community has performed with the current betas, we now have our fourth beta of 8.4. I assume you meant 8.3. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Avoid huge performance loss on string concatenation
-12-04' and dok.kuupaev||dok.kellaaeg BETWEEN '2007-11-01' AND '2007-12-0423 59' so it runs fast ? Andrus. PostgreSQL 8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Killing a session on windows
Use select pg_cancel_backend(pid) instead -- we have to do this periodically when queries get timed out by the web server but Postgres doesn't notice / doesn't get notified... - Bill -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Howard Cole Sent: Thursday, November 29, 2007 5:55 AM To: 'PgSql General' Subject: [GENERAL] Killing a session on windows I have a database I want to drop on a windows server. Unfortunately I cannot restart postgres because it is running several live database. To kill the offending session, I tried select * from pg_stat_activity to find the PID of the session, and then tried to kill it with command line: taskkill /f /pid 1234 This appeared to kill the session, but postgres still thinks the session is live with the same process id. Using the SysInternals process explorer - there doesn't appear to be a process with the given ID. How can I get postgres to drop this session? Thanks Howard Cole www.selestial.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
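A sketch of that approach (assuming an 8.x-era pg_stat_activity, where the backend PID column is named procpid; the WHERE clause shown is a hypothetical filter for the stuck sessions):

```sql
-- Find the offending backends and ask each to cancel its current
-- query. pg_cancel_backend() signals the backend from inside the
-- server, so PostgreSQL's bookkeeping of its child processes stays
-- consistent -- unlike killing the process from the OS, which left
-- the phantom session described above.
SELECT pg_cancel_backend(procpid)
  FROM pg_stat_activity
 WHERE datname = 'db_to_drop';   -- hypothetical filter
```

On Windows especially, signaling through the server is the only safe option, since there is no equivalent of a targeted Unix signal that PostgreSQL will handle gracefully.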
Re: [GENERAL] Calculation for Max_FSM_pages : Any rules of thumb?
In response to Ow Mun Heng [EMAIL PROTECTED]: Even with the regular vacuuming, and even a vacuum full (on my test DB), I still see that perhaps something is wrong (from the below). (I got this gem from the mailing list archives.)

hmxmms=> SELECT c.relname, c.reltuples::bigint as rowcnt,
    pg_stat_get_tuples_inserted(c.oid) AS inserted,
    pg_stat_get_tuples_updated(c.oid) AS updated,
    pg_stat_get_tuples_deleted(c.oid) AS deleted
  FROM pg_class c
  WHERE c.relkind = 'r'::char
  GROUP BY c.oid, c.relname, c.reltuples
  HAVING pg_stat_get_tuples_updated(c.oid) + pg_stat_get_tuples_deleted(c.oid) > 1000
  ORDER BY pg_stat_get_tuples_updated(c.oid) + pg_stat_get_tuples_deleted(c.oid) DESC;

   relname    |  rowcnt  | inserted | updated | deleted
--------------+----------+----------+---------+----------
 tst_r        | 11971691 |        0 |       0 | 22390528
 pg_statistic |     1465 |      280 |    7716 |      153
 dr_ns        |  2305571 |     1959 |       0 |     1922
 pg_attribute |     3787 |     1403 |     184 |     1292

No matter how many times I vacuum/full, the deleted number still doesn't go down. Are you sure you're interpreting that number correctly? I took it to mean a counter of the number of delete operations since server start. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
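Bill's reading is right: the pg_stat_get_tuples_* functions return cumulative counters since the statistics were last reset, not live dead-row counts, so vacuuming cannot make them shrink. A hedged illustration (the second query assumes 8.3 or later, where pg_stat_user_tables gained an n_dead_tup column; the table name is from the output above):

```sql
-- Cumulative counter: only ever grows (until pg_stat_reset()),
-- regardless of how often the table is vacuumed.
SELECT pg_stat_get_tuples_deleted(c.oid) AS deletes_since_stats_reset
  FROM pg_class c
 WHERE c.relname = 'tst_r';

-- What vacuum actually reclaims is dead tuples; on 8.3+ this is
-- the number to watch instead:
SELECT relname, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'tst_r';
```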
Re: [GENERAL] Calculation for Max_FSM_pages : Any rules of thumb?
In response to Ow Mun Heng [EMAIL PROTECTED]: On Thu, 2007-11-08 at 12:02 -0500, Bill Moran wrote: Frequently, when people ask for help because they've exceeded max_fsm*, it's because they're not paying attention to their systems, and therefore the problem has been occurring for a while before it got so bad that they couldn't ignore it. As a result, a full vacuum is frequently a necessity. Folks who are monitoring their databases closely don't hit this problem nearly as often. How does one monitor it closely anyway? The warning comes when one does a vacuum verbose, and with autovacuum turned on, I don't even see it anywhere. 1) Run vacuum verbose from cron on a regular basis and have the output emailed to you. 2) Capture and graph (I use mrtg) various stats that would indicate to you that something is wrong. Some suggestions are graphing the output of pg_database_size() and various stuff captured from the pg_buffercache addon. I also graph transactions/second and other stats, but those are useful for detecting _other_ problems, unrelated to vacuuming. It's amazing to me how many people just throw up a database and expect it to just magically work forever. Actually, this isn't isolated to databases ... I've seen people with fileservers run around one day saying the fileserver is full, someone delete some files! If it's a fileserver, why aren't you monitoring disk usage so you see this coming? If it's a database server, you should be monitoring critical stats on it. Then you can throw out all those silly rules of thumb and use some actual data! -- Bill Moran http://www.potentialtech.com
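Bill's point 1) might look like this as a crontab fragment (database name, schedule, and address are all placeholders):

```
# Nightly at 03:00: vacuum with verbose output, mailed to the DBA.
# The free-space-map summary is in the last few lines of the output.
0 3 * * *  psql -d mydb -c 'VACUUM VERBOSE' 2>&1 | mail -s 'nightly vacuum report' dba@example.com
```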
Re: [GENERAL] Calculation for Max_FSM_pages : Any rules of thumb?
In response to Vivek Khera [EMAIL PROTECTED]: On Nov 1, 2007, at 8:51 PM, Ow Mun Heng wrote: Another question is, based on what I've read in the archives (in my laptop.. no inet conn @ work): since I've overrun my max_FSM, I'm basically screwed and will have to do a vacuum verbose FULL on the entire DB. Crap.. I've seen this repeated many times as well, and I can't think of a really good reason why this should be true. It's not inherently true, it's just likely. Once you increase max_fsm_pages, won't the very next regular vacuum find all the free space in pages and add them to the map anyway? Yes. I.e., you've not lost any free space once the next regular vacuum runs. At worst, you've got a slightly bloated table because you allocated more pages rather than re-using some, but is that worth a full vacuum? The situation you just described is the reason I recommend a full vacuum after such a situation has occurred. No, it's not required in all cases, but it's a lot easier to recommend than the research required to determine whether or not your table bloat is excessive enough to warrant it. If you can make the time to do the full vacuum, it's probably worth it, just for peace of mind. If it's difficult to schedule a full vacuum, then you need to carefully review various page usages to see if any individual tables are worth it and/or all kinds of careful consideration. As a result, I recommend a full vacuum, and if the person complains that they can't schedule it, _then_ I go into the details of how to figure out what else can/should be done. So I guess I'm recommending it to make my own life easier :) I don't think it will be unless you're *way* under the fsm pages needed and have been for a long time. Frequently, when people ask for help because they've exceeded max_fsm*, it's because they're not paying attention to their systems, and therefore the problem has been occurring for a while before it got so bad that they couldn't ignore it. 
As a result, a full vacuum is frequently a necessity. Folks who are monitoring their databases closely don't hit this problem nearly as often. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] INSERT performance deteriorates quickly during a large import
with Red Hat Enterprise Linux ES release 4 (Linux 2.6.9-42.0.3.ELsmp) on 2xDual Core AMD Opteron(tm) Processor 270 (4x2GHz logical CPUs) with 2GB RAM While 2G is better, that's still not a lot of RAM if you're trying to run 2 DB servers and a web server on a single system. If you haven't tuned PG to take advantage of it, then it won't help much anyway. Additionally, you've neglected to mention the disk subsystem on this machine as well. Is it running cheapo SATA drives because the price/gig is right? * both servers run in x86_64 mode, PostgreSQL footprint in memory stays relatively small, Of course it does, because you've told it not to use more than 8M of RAM. CPU usage maxes out on import, there is no resource starvation in any way You do realize that you're contradicting yourself here, right? The advice provided by others is good as well, so I won't repeat it. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: DB on a ramdisk (was Re: [GENERAL] Temporary, In-memory Postgres DB?)
In response to Gauthier, Dave [EMAIL PROTECTED]: One question I had earlier that I don't think got answered was how to undo an initdb. dropdb drops a DB, but how do I undo an initdb? rm -rf the data directory you passed to initdb. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Syntax error in a large COPY
In response to Thomas Kellerer [EMAIL PROTECTED]: Tom Lane, 07.11.2007 06:14: Thomas Kellerer [EMAIL PROTECTED] writes: If everyone simply top-posted, there would be no need for me to scroll down, just to find a two line answer below a forty line quote - which I personally find more irritating than top-posting. I think you're ignoring my basic point, which was that people shouldn't be quoting forty lines' worth in the first place. *Especially* not if they only have two lines to contribute. No, I did get your point. My point is: with top-posting I don't care how many lines were repeated because I don't have to scroll. Considering there is an RFC that recommends inline posting over top-posting (http://tools.ietf.org/html/rfc1855), and considering the fact that this topic has been beat to death on dozens of mailing lists and the predominant preference is _not_ for top-posting -- perhaps you should either follow the preferences of the group, or leave the group. But this horse has been beat to death before... Obviously not, as it keeps coming back to life. I guess it's an undead horse? -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Postgresql simple query performance question
In response to SHARMILA JOTHIRAJAH [EMAIL PROTECTED]: Hi We are in the process of testing for migration of our database from Oracle to Postgresql. I have a simple query: Select count(*) from foo This is asked a lot. The quick answer is that PostgreSQL's method of MVCC makes it impossible to make this query fast. Perhaps, someday, some brilliant developer will come up with an optimization, but that hasn't happened yet. There may be some tweaks you can make to your tuning, see inline below. However, if you really need a fast, accurate count of rows in that table, I recommend you create a trigger to track it. This table has 29384048 rows and is indexed on foo_id. The tables are vacuumed and the explain plan for postgresql is

QUERY PLAN
------------------------------------------------------------------
Aggregate (cost=1194020.60..1194020.61 rows=1 width=0) (actual time=68797.280..68797.280 rows=1 loops=1)
  -> Seq Scan on foo (cost=0.00..1120560.48 rows=29384048 width=0) (actual time=0.232..60657.948 rows=29384048 loops=1)
Total runtime: 68797.358 ms

The explain plan for oracle is

OPERATION            OBJECT      ACCESS_PREDICATES  FILTER_PREDICATES
SELECT STATEMENT ()  (null)      (null)             (null)
SORT (AGGREGATE)     (null)      (null)             (null)
INDEX (FULL SCAN)    foo_IDX_ID  (null)             (null)

Oracle uses the index for the count(*) query in this case. This query in Oracle takes only 5 sec and in postgresql it takes 1 min 10 sec. The same query in oracle without the index and a full table scan (like in postgresql) has the explain plan below, and it takes 34 sec: select /*+ full(foo1) */ count(*) from foo1

OPERATION            OBJECT  ACCESS_PREDICATES  FILTER_PREDICATES
SELECT STATEMENT ()  (null)  (null)             (null)
SORT (AGGREGATE)     (null)  (null)             (null)
TABLE ACCESS (FULL)  foo     (null)             (null)

In short the query Select count(*) from foo takes the following time: Postgresql - 1m 10 sec; Oracle (index scan) - 5 sec; Oracle (full table scan) - 34 sec. How can I speed up this query in postgresql? The other postgres settings are postgresql max_connections = 100 shared_buffers = 5 How much memory does this system have? 
What version of PostgreSQL are you using? If you're using an 8.X version and have more than 2G of RAM, this value is likely too low. Start with 1/4 the available RAM and tune from there. temp_buffers = 5000 work_mem = 16384 maintenance_work_mem = 262144 fsync = on wal_sync_method = fsync effective_cache_size = 30 random_page_cost = 4 cpu_tuple_cost = 0.01 cpu_index_tuple_cost = 0.001 cpu_operator_cost = 0.0025 Is there any tuning that needs to be done on the OS or database side? I had attached the iostat and vmstat results of postgresql -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
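A sketch of the trigger-maintained count Bill suggests (table and function names are illustrative; note the single counter row becomes an update hotspot under heavy concurrent writes):

```sql
CREATE TABLE foo_rowcount (cnt bigint NOT NULL);
INSERT INTO foo_rowcount SELECT count(*) FROM foo;  -- seed once

CREATE FUNCTION foo_count_trig() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE foo_rowcount SET cnt = cnt + 1;
  ELSE  -- DELETE
    UPDATE foo_rowcount SET cnt = cnt - 1;
  END IF;
  RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_count AFTER INSERT OR DELETE ON foo
  FOR EACH ROW EXECUTE PROCEDURE foo_count_trig();

-- Now this is a single-row read instead of a 29M-row scan:
SELECT cnt FROM foo_rowcount;
```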
Re: [GENERAL] Postgresql simple query performance question
In response to Reg Me Please [EMAIL PROTECTED]: I have no doubt you're right, Pavel. But why not? It could be a simple enhancement. It's not simple. Do some searches on the mailing lists and you will find discussion of why it's difficult to do. On Tuesday 06 November 2007 15:11:02 Pavel Stehule wrote: Hello PostgreSQL doesn't use an index for COUNT(*): http://www.varlena.com/GeneralBits/18.php http://sql-info.de/en/postgresql/postgres-gotchas.html#1_7 Regards Pavel Stehule On 06/11/2007, SHARMILA JOTHIRAJAH [EMAIL PROTECTED] wrote: Hi We are in the process of testing for migration of our database from Oracle to Postgresql. I have a simple query Select count(*) from foo This table has 29384048 rows and is indexed on foo_id In short the query Select count(*) from foo takes the following time: Postgresql - 1m 10 sec Oracle (index scan) - 5 sec Oracle (full table scan) - 34 sec How can I speed up this query in postgresql? 
The other postgres settings are postgresql max_connections = 100 shared_buffers = 5 temp_buffers = 5000 work_mem = 16384 maintenance_work_mem = 262144 fsync = on wal_sync_method = fsync effective_cache_size = 30 random_page_cost = 4 cpu_tuple_cost = 0.01 cpu_index_tuple_cost = 0.001 cpu_operator_cost = 0.0025 Is there any tuning that needs to be done on the OS or database side? I had attached the iostat and vmstat results of postgresql Thanks -- Reg me Please -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Postgresql simple query performance question
In response to André Volpato [EMAIL PROTECTED]: Richard Huxton wrote: Reg Me Please wrote: While I would not spend resources in fine tuning the count(*), I would spend some to understand why and how the other ones do it better. Just to be better. The problem is well understood, and there is extensive discussion in the mailing list archives. The basic problem is that with PG's implementation of MVCC the indexes don't have row visibility information. The simple solution of adding it to every index entry would increase index size substantially, imposing costs on every index access and update. There's a thread in -hackers called Visibility map thoughts that is looking at the situation again and if/how to implement visibility information in a compact form. Remember that you can always use serial fields to count a table, like: alter table foo add id serial; select id from foo order by id desc limit 1; This should return the same value as count(*), in a few msecs. I don't think so. What kind of accuracy do you have when rows are deleted? Also, sequences are not transactional, so rolled-back transactions will increment the sequence without actually adding rows. -- Bill Moran http://www.potentialtech.com
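Bill's two objections can be seen directly (a sketch; run against a scratch database):

```sql
CREATE TABLE t (id serial PRIMARY KEY);
INSERT INTO t DEFAULT VALUES;  -- id 1
INSERT INTO t DEFAULT VALUES;  -- id 2
DELETE FROM t WHERE id = 1;    -- count(*) is now 1, but max(id) is still 2

BEGIN;
INSERT INTO t DEFAULT VALUES;  -- consumes id 3 from the sequence
ROLLBACK;                      -- the row is gone, but the sequence stays at 3
-- The next successful insert gets id 4, widening the gap further.
```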
Re: [GENERAL] Calculation for Max_FSM_pages : Any rules of thumb?
Ow Mun Heng [EMAIL PROTECTED] wrote: I just ran a vacuum verbose on the entire DB and this came out: number of page slots needed (274144) exceeds max_fsm_pages (153600) Hence, I've changed the max to 400,000 (pulled it straight out of the air). How does one calculate the number needed anyway? It's not simple. Every update or delete creates a dead tuple that needs to be tracked by an fsm entry. So it depends on how frequently your database is changing in between vacuum runs. In my experience, the best bet is to do vacuum verbose on a regular basis and get a feel for what you need. Every database load is different. Another question is, based on what I've read in the archives (in my laptop.. no inet conn @ work): since I've overrun my max_FSM, I'm basically screwed and will have to do a vacuum verbose FULL on the entire DB. Crap.. You don't _need_ to. But it's generally a good idea to get table bloat reduced. (I was playing with pgfouine and then I found the above piece of advice) I'm planning to run vacuum verbose full tonight/over the weekend. (is this sane?) Thanks for the advice.. vacuum full is sane, if that's what you mean. The only problem is that it locks tables while working on them, so you have to take into account what other workload might be blocked while vacuum full is working, and how long vacuum full is liable to take. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] (Never?) Kill Postmaster?
In response to Stefan Schwarzer [EMAIL PROTECTED]: Hi there, I read dozens of times the TIP 2: Don't 'kill -9' the postmaster... Now, what am I supposed to do if I launched a query which takes ages, and which I want to interrupt? Thanks for any advice, Tracing through the other threads, this is obviously not the regular wayward query, but one that won't die by the normal methods. Unfortunately, I came across this recently, and the only solution I found was to do a pg_ctl restart -m i (Yes, I tried -m f first). Luckily, the db in question was such that the front ends didn't suffer horribly from this and reconnected, and that the database finished up its recovery in a timely manner. Hopefully, I can generate a reproducible example so I can file a bug, but haven't gotten that far with it yet. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Indexes Primary Keys (based on the same columns)
In response to Joshua D. Drake [EMAIL PROTECTED]: Ow Mun Heng wrote: I'm wondering if what I'm doing is redundant. I have a primary key on columns (A,B,C,D) and I've also defined an index based on the same columns (A,B,C,D) and sometimes in the query explain, I see the pkey being used for the scan instead of the index. So.. That made me think perhaps the additional index on the _same_ parameter is redundant. A primary key creates an index so having a second index with the same definition is redundant. Note the same definition. Since this is a multi-column index, there may be some advantage gained by having indexes defined slightly differently. I.e., your PK is (ABCD) but you have an additional index on (DCBA) Whether or not this is actually helpful depends on the nature of the queries you run. -- Bill Moran http://www.potentialtech.com
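A sketch of the non-redundant variant Bill describes (names are illustrative): the PK index serves queries that constrain a leading prefix of (a,b,c,d), while a reversed index serves queries that constrain d (or d,c) without a.

```sql
CREATE TABLE t (
  a int, b int, c int, d int,
  PRIMARY KEY (a, b, c, d)   -- implicitly creates a unique index on (a,b,c,d)
);
-- Useful only if some queries filter on d/c without a/b:
CREATE INDEX t_dcba_idx ON t (d, c, b, a);
-- An exact duplicate like the one below adds nothing but write overhead:
-- CREATE INDEX t_abcd_idx ON t (a, b, c, d);
```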
Re: [GENERAL] looking for some real world performance numbers
In response to Gregory Stark [EMAIL PROTECTED]: Tom Lane [EMAIL PROTECTED] writes: Thomas Kellerer [EMAIL PROTECTED] writes: Where else do they want to store relational data than in a RDBMS? Indeed. It seems like we can hardly answer the OP's question without asking compared to what? If they're afraid an RDBMS won't scale, what have they got in mind that they are so certain will scale? I suspect they're misapplying the lesson Google taught everyone. Namely that domain-specific solutions can provide much better performance than general-purpose software. Google might not use an RDBMS to store their search index (which doesn't need any of the ACID guarantees and needs all kinds of parallelism and lossy algorithms which SQL and RDBMSes in general don't excel at), but on the other hand I would be quite surprised if they stored their Adsense or other more normal use data structures in anything but a bog-standard SQL database. Google also has enough high-calibre people that they can probably re-invent the concept of an RDBMS if they want to. Yet they don't. I know a particular Googleite who's a PostgreSQL buff and is bummed that they use MySQL all over the place. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] keeping an index in memory
Rajarshi Guha [EMAIL PROTECTED] wrote: Hi, relating to my previous queries on doing spatial searches on 10M rows, it seems that most of my queries return within 2 minutes. Generally this is not too bad, though faster is always better. Interestingly, it appears that the CUBE index for the table in question is about 3GB (the table itself is about 14GB). Not knowing the details of the postgres internals, I assume that when a query tries to use the index, it will need to read a 3GB file. Is this a correct assumption? In such a situation, is there a way to keep the index in memory? My machine has 8GB installed and currently has about 7.4GB free RAM (64 bit linux 2.6.9) Free or cached/buffered? Your OS should be using most of that to buffer disk blocks. A side effect of the size of the index is that if I do a query that performs a seq scan (say using cube_distance) it takes longer than when an index is used, but not significantly longer. And this is on a 10M row table. What strategies do people follow when the index becomes very big? What version of PG are you using and what is your shared_buffers setting? With 8G of RAM, you should start with shared_buffers around 2 - 3G, if you're using a modern version of PG. With that much shared memory, a large portion of that index should stay in RAM, as long as it's being used often enough that PG doesn't swap it for other data. -- Bill Moran http://www.potentialtech.com
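As a postgresql.conf fragment, Bill's suggestion might look like this (values are starting points for an 8GB box, not tuned numbers; very old releases want these expressed in 8kB pages rather than with unit suffixes):

```
shared_buffers = 2GB          # PG's own cache; start around 1/4 of RAM and adjust
effective_cache_size = 6GB    # planner hint: how much the OS is likely caching for us
```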
Re: [GENERAL] looking for some real world performance numbers
snacktime [EMAIL PROTECTED] wrote: I'm working through the architecture design for a new product. We have a small group working on this. It's a web app that will be using ruby on rails. The challenge I'm running into is that the latest conventional wisdom seems to be that since obviously databases don't scale on the web, you should just not use them at all. Who are the people saying this? It doesn't sound very wise to me. Where are they proposing to put the data, if not in a database? That's what I'd like to know. I have a group of otherwise very bright people trying to convince me that a rdbms is not a good place to store relational data because eventually it won't scale. What is _their_ evidence? And of course we don't even have version 1 of our product out of the door. E.S. Raymond's The Art of UNIX Programming: Rule #15: Write a prototype before you optimize. Nothing is funnier than watching people try to performance-optimize software that hasn't even been written yet. Very few people are smart enough to know where the performance bottlenecks will be before they've coded anything. If they insist on doing it wrong, at least you'll have a good laugh. I'll admit we do have a very good chance of actually getting tons of traffic, but my position is to use a rdbms for relational data, and then if and when it won't scale any more, deal with it then. That's sane. So what would really help me is some real world numbers on how postgresql is doing in the wild under pressure. If anyone cares to throw some out I would really appreciate it. http://people.freebsd.org/~kris/scaling/ Lots of interesting graphs on that page ... most of them seem to indicate that RDBMSs scale rather nicely. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] a failover scenario
In response to Tomi N/A [EMAIL PROTECTED]: I am considering pgsql as the RDBMS in a project with the following constraints: - there's a master and reserve instance of the RDBMS on every remote location - there's a master and reserve instance of the RDBMS on a central location - the connections are DSL connections and therefore unreliable - all changes have to be propagated to all servers (multimaster replication) - if the connection between a remote location and the central location fails, the local server continues working and resynchronizes with the central server when the connection is restored - if any master fails, the reserve instance takes over and the rest of the system acts as though nothing happened The master/reserve instance is, from what I read, standard functionality, but I'm not so sure about the resynchronization part of a failed link...I imagine something like WAL shipping might be of use here, but it's just an uneducated guess. Does code exist to support this on pgsql or is it considered application specific functionality? Do other RDBMSs support similar functionality? I don't know of any system that will just hand you those capabilities. Every multi-master system I've ever heard of requires high-speed links between the masters, otherwise the synchronization is far too slow to be usable. I believe you could do what you want in the application. PostgreSQL 8.3 will have a native UUID type, which will help with managing conflicts between multiple masters. If you can define clear rules on how to manage conflicts, that can be done automatically. If the rules aren't so clear, you'll need an interface where a human can manage conflicts. With triggers and LISTEN/NOTIFY, you can put together an app that handles replicating data when tables experience changes. 
From there, you'll need to structure your schema so such an app can detect conflicts (create last_updated timestamps on all tables, and ensure that primary keys include a UUID or other mechanism to guarantee uniqueness) and design some sort of queue mechanism to ensure updates can wait while network problems are resolved. How much effort such a thing requires is dependent on how complex the data is. If it's a sales database (for example) it's not all that hard, since there aren't typical cases where two people are simultaneously updating the same record. I know, for example, that the PA gaming commission is putting something like this together for the race tracks. Each track has handheld devices that are used to record bets/payouts, etc. These devices can't be connected all the time, but a sync system is pretty easy because all they ever do is _add_ new records. Thus, you assign each handheld a unique device ID, and that's part of the primary key for each table, so there's no chance of conflict. Sounds like a fun and challenging project. I'm jealous. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
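A sketch of the schema conventions described above (table and column names are illustrative; the native uuid type arrives in 8.3, as noted earlier in the thread):

```sql
CREATE TABLE orders (
  site_id      integer     NOT NULL,               -- which location wrote the row
  order_uuid   uuid        NOT NULL,               -- globally unique across sites
  last_updated timestamptz NOT NULL DEFAULT now(), -- basis for conflict detection
  payload      text,                               -- stand-in for the real columns
  PRIMARY KEY (site_id, order_uuid)                -- no two sites can collide
);
```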
Re: [GENERAL] Possibilities of Fine Grained Access Control?
In response to Uden van Hout [EMAIL PROTECTED]: I need a solution for the following: With all data stored in the same schema, consumers may only retrieve and update data that is relevant to them. At the same time, users in our own company need to see all data. Is a solution similar to Oracle's Virtual Private Database possible with PostgreSQL, as this is precisely what we need? Not familiar with Oracle's solution, but ... Without knowing the details, it's difficult to be sure if PostgreSQL's native security meets your needs. You can assign read/write/create permissions to databases, schemas, tables, and other objects: http://www.postgresql.org/docs/8.2/static/sql-grant.html This falls short if you need permissions at the row or column level, which PG doesn't support natively (unless this has been added in 8.3 and I simply haven't seen the announcement). For that, the best approach I know of for you is Veil: http://veil.projects.postgresql.org/curdocs/index.html -- Bill Moran http://www.potentialtech.com
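For simple cases, something far short of Veil can approximate row filtering: hide the table behind a view keyed on current_user and grant access only to the view (a sketch; role and column names are illustrative, and plain views are not leak-proof against hostile functions):

```sql
CREATE TABLE accounts (owner name NOT NULL, balance numeric);

CREATE VIEW my_accounts AS
  SELECT * FROM accounts WHERE owner = current_user;

REVOKE ALL ON accounts FROM PUBLIC;           -- consumers can't touch the base table
GRANT SELECT ON my_accounts TO consumer_role; -- consumers see only their own rows
GRANT SELECT ON accounts TO staff_role;       -- in-house users see everything
```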
Re: [GENERAL] reporting tools
Geoffrey [EMAIL PROTECTED] wrote: Andrus wrote: I guess I should have noted that we will need to run this on Linux clients. Geoffrey, You can run FYIReporting engine in Linux using MONO ( www.go-mono.com ) Thanks, we're looking for something that will run natively on Linux. I read this, almost deleted it, read it again ... Just in case there's confusion, MONO + FYIReporting _is_ native on Linux. At least, as much so as Java on Linux is. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Need advice on keeping backup database up to date
In response to Matthew Wilson [EMAIL PROTECTED]: I have been using postgresql for my web application for a while now and it has been great. I want to set up a separate emergency failover server and database in a different data center. In the event that my primary data center becomes inaccessible, I want to update a DNS record and then redirect visitors to the backup data center. I am trying to figure out how I can keep the postgresql database in the backup data center as up to date as possible. The ideal solution would keep the backup database updated in real time as the primary database changes. I need advice about the best way to accomplish this. This sounds like a textbook case for Slony. Slony will not guarantee that your database is an up-to-the-second copy, but that's part of the beauty of it. The _only_ way you can guarantee that two databases in different datacenters are perfectly synchronized at all times is not to let an application move forward until it has received confirmation from both databases that a transaction has completed -- and doing that will absolutely kill performance. Slony will make a best effort. If traffic is low, it will keep the two within a fraction of a second of each other. If traffic gets busy, the backup will get behind, but when things slow down again, Slony will get them caught up. As long as your average database traffic does not exceed the available bandwidth, all will be well. Slony will also let you pick and choose which tables you want to replicate. This can optimize things, as it's not normally worthwhile to replicate things like session tables, and they usually eat up a lot of bandwidth. http://www.slony.info -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
-Original Message- From: Michael Glaesemann [mailto:[EMAIL PROTECTED] Sent: Friday, October 05, 2007 3:25 PM To: Bill Bartlett Cc: 'Andreas Kretschmer'; pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off (Makes me have to think twice about raising any _real_ issues though, like why my VACUUMs periodically keep getting into lock contentions with my JDBC connections and ultimately causing me to have to shut down Postgres w/ pg_ctl stop -m immediate, or how to properly handle postmaster errors that don't appear to be documented anywhere except in the source code... [No, it's not the absolutely most recent version of Postgres; No, I can't upgrade it.]) This isn't fair, Bill. Your original question concerned posts your email client has trouble processing, which isn't the primary topic of the list. You also knew that it was somewhat contentious, given that you've made comments such as Believe me, I'm not defending Outlook, so you probably weren't overly surprised at some of the responses you got. Also note that the responses you got were attempts to solve your problem: I don't see any that only belittled your choice of software. Most people didn't completely read my email and thus unfortunately completely missed the point, in many cases seemingly because they were too quick to jump on my use of Outlook as an email client (thus assuming I was just one of those terrible horrible know-nothing Windows users). It's obvious from 3 years of reading these forums that there is a Linux Postgres users good, Windows Postgres users bad bias by many (but not all) of the members rather than them realizing that 'we' are all Postgres users, 'those' other horrible people are MySQL / Oracle / etc users. I forgot that by posting about a header flag which could be traced back to a Microsoft product, I'd be stepping right smack into that muck. 
Rereading my original email, you can see that the problem I was trying to address was not my choice of email software but rather that several people in these Postgres listservs (like other people in other listservs) were intentionally misusing a specific header flag that is used by specific email programs (only the various Outlook-related ones, as far as I've been able to determine) to highlight messages for special handling. Granted that they were using it for joking purposes, but after a while, all jokes get old. (After spending 4 days in a row dealing with significant Postgres system crashes on several different servers, old just happened to rub me the wrong way whereas I normally just grit my teeth and ignore it.) After researching this flag further (after seeing the feedback I got on this forum), I've discovered that this type of misuse is frequently used, and even recommended on many Linux-oriented web sites as a means, to annoy Outlook-based users (as a means to annoy Windows users). As I mentioned above, I think in a forum such as this, where we need to all be Postgres users, I don't think it's appropriate to intentionally annoy any of our group. (That may not have been the intent, but after seeing the many Linux-oriented web sites and forums recommending its use for specifically this purpose, now it does annoy me even more.) (Just for the record, not that I should have to justify my background and biases [or hopefully lack thereof] to the group: I gave up fighting platform wars a LOOONG time ago, back when I used to try to get the corporate world to bring in Macs instead of Windows 3.x. Now I generally use the best tool for the job, or the tools I have to use when that's not an option. 
Insofar as systems and OS's, I am currently handling 140+ servers running a pretty much even split between Win2K3 and various versions of Linux (primarily several SuSE 9.x versions, a few Red Hats and at least one Debian), with VMware instances of 2 SuSE servers running inside this specific XP development desktop, managing pretty much everything remotely via SSH and bash (via Cygwin on the Windows servers). I may be using Windows on my desktop, but I don't think I'd put myself into the category of being merely one of those "terrible horrible know-nothing Windows users".)

If you *are* having issues with PostgreSQL, it would behoove you to at least post them and see what kind of response you get, rather than judge the list as a whole due to the response you got to an off-topic post. Many of the people on the lists have been here for years and have gotten lots of helpful advice, which is why they've stuck around, and there are many others who are happy to share their advice and experience. You never know: you might be pleasantly surprised.

I agree that the majority of the responses that I've seen over the last 3 years of reading several of the Postgres listservs were indeed attempts to help (including several replies that I've
[GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
Quick request to the group: we have several members who include bogus or humorous X-Message-Flag headers in their email messages. Could I request that you _please_ turn them off? Because they come through as flagged messages in Outlook, it throws off my email rules processing and the messages end up in the wrong groups. (With the volume of email these days, I definitely need all the assistance I can get from things like rules processing to attempt to stay on top of it.) (Yes, I understand people's personal preferences for not liking Windows or Outlook or Microsoft, but that isn't going to change the applications that I need to use for my day-to-day work. Feel free to continue posting your feelings in email signatures though -- never can tell if it will indeed influence a change ...) Much appreciated! - Bill ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of A. Kretschmer Sent: Friday, October 05, 2007 10:57 AM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off

On Fri, 05 Oct 2007 at 10:05:32 -0400, Bill Bartlett wrote the following: Quick request to the group: we have several members who include bogus or humorous X-Message-Flag headers in their email messages. Could I request that you _please_ turn them off? Because they come through as

Do you mean me?

Not specifically -- yours just happened to be the one that triggered it, but no, I see this in postings from other people as well. (Didn't mean to single you out though.)

flagged messages in Outlook, it throws off my email rules processing and the messages end up in the wrong groups. (With the volume of

This header is a special Outlook feature. If this header really makes problems in _this_ software, then I think _this_ software is broken. But hey, that's hardly news ;-)

See other reply -- I'm not complaining that Outlook is broken, but rather that the misuse of this flag slows down my workflow. The issue is that because this flag is useful when used properly (e.g., for mail rules like "take all messages flagged for followup and move them to specific other folders for different types of followup"), having the flag added to emails just for the sake of putting in a humorous message just gets in the way. (I keep needing to go fish the incorrectly-flagged messages back out of the various followup folders before I can completely follow the threads in the Postgres listservs.)

And yes: there is other email software available without these problems. Including on Windows.

Hmm ... I still use Pine on some of my really old Linux boxes -- does this count? (Useful for reading CRON output...)
(Yes, I understand people's personal preferences for not liking Windows or Outlook or Microsoft, but that isn't going to change the applications that I need to use for my day-to-day work. Feel free to continue Okay, i like this list and i like the people here. I will try to disable this special header line for this and only this list (okay, for all postgresql-lists). Let me try, i'm not sure how to disable this header-line only for [EMAIL PROTECTED] Please, don't beat me if this header are still in this message, the problem is work in progress... i'm not sure if i change the config properly. Yes, it's working properly now. MUCH thanks! Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: - Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
-Original Message- From: Michael Glaesemann [mailto:[EMAIL PROTECTED] Sent: Friday, October 05, 2007 10:45 AM To: Bill Bartlett Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off

On Oct 5, 2007, at 9:05, Bill Bartlett wrote: Quick request to the group: we have several members who include bogus or humorous X-Message-Flag headers in their email messages. Could I request that you _please_ turn them off?

In all practicality, a request like this is futile:

* Given the number of people on the list, you'll never get everybody to remove bogus X-Message-Flag headers
* Given it's an X- header, doesn't that mean the meaning of the value is implementation dependent? What's bogus wrt Outlook may not be wrt another mail system or client
* Doesn't this indicate that Outlook is broken (for some values of broken)?

Actually, no -- this is why I listed a specific X- header (X-Message-Flag) rather than simply saying "Hey, would everyone please turn off their X- headers." This specific X- header is designed to have Outlook flag a message and display an extra line of text with the flag comment above the email. Since this is generally only used to flag messages for followup actions, having messages come across with the header already embedded in them simply serves to add an extra distraction to the already-too-many bits into which I need to slice my time. (Not that I want more time for things like having to spend 3 hrs yesterday regenerating sequences after diagnosing a database crash [bad disk controller on an old SuSE box] and restoring from backup, but that's my real world...)

Because they come through as flagged messages in Outlook, it throws off my email rules processing and the messages end up in the wrong groups. (With the volume of email these days, I definitely need all the assistance I can get from things like rules processing to attempt to stay on top of it.)

I sympathize.
there *is* a lot of email traffic these days (and not just from the lists). But rather than request that others bend to your rules, I'd think a better solution would be to find (or develop) tools that do what you want. Whether that means better rule handling or better understanding of various headers, it sounds like Outlook isn't doing the job for you. Perhaps a hybrid approach would be helpful: use another email client for mailing lists and Outlook otherwise.

See above -- the problem isn't with Outlook as a mail client. Outlook is doing exactly what it's supposed to do when it sees this X- header: highlighting it and flagging the message for special handling. The issue is with the headers being used (or misused) as they are. Believe me, I'm not defending Outlook; however, that's what I (and many other people) use -- it's just a tool to get a job done. (For all you non-Outlook people out there, since you aren't seeing the messages anyway, most of them are generally humorous messages like 'Windows is not the answer. Windows is the question and the answer is no!' or 'Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds)'. Yes, they were funny 2 years ago when I first saw them, but now it's gotten a bit old -- sorry. [No, I'm not meaning to single out Andreas -- his just happened to be the first two that I saw in my "listserv items of useful Postgresql stuff to keep for reference" folder.])

Hope this gives you some ideas.

Michael Glaesemann grzm seespotcode net ---(end of broadcast)--- TIP 6: explain analyze is your friend
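For readers following along who have never seen one of these, the header under discussion is a single line in the message headers; Outlook renders its text as a follow-up banner above the message body. Using one of the joke values quoted above, it looks roughly like this:

```
X-Message-Flag: Windows is not the answer. Windows is the question and the answer is no!
```

Used "properly" (flagging a message for follow-up action), the text would instead be something a rule could act on, which is why bogus values end up misrouting Bill's mail.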
Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
-Original Message- From: Scott Marlowe [mailto:[EMAIL PROTECTED] Sent: Friday, October 05, 2007 12:30 PM To: Bill Bartlett Cc: A. Kretschmer; pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off

On 10/5/07, Bill Bartlett [EMAIL PROTECTED] wrote: Hmm ... I still use Pine on some of my really old Linux boxes -- does this count? (Useful for reading CRON output...)

If you need / want the familiar interface of pine on a modern linux box, look for cone. Note that there's also an open source implementation of pico called nano. Just FYI. I use both, quite a bit.

I just started using nano, although I still tend to use vi a lot more than I should. (Never got into the whole emacs thing tho.) I hadn't heard of cone, so I'll have to take a look for that. Thanks for the tip! (Long live the console!)

- Bill ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andreas Kretschmer Sent: Friday, October 05, 2007 12:55 PM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off

Bill Bartlett [EMAIL PROTECTED] wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of A. Kretschmer Sent: Friday, October 05, 2007 10:57 AM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Request: Anyone using bogus / humorous X-Message-Flag headers, could we please turn them off

On Fri, 05 Oct 2007 at 10:05:32 -0400, Bill Bartlett wrote the following: Quick request to the group: we have several members who include bogus or humorous X-Message-Flag headers in their email messages. Could I request that you _please_ turn them off? Because they come through as

Outlook is a PITA. As you can see, it breaks the lines in quotings. Hard to read, because many mail readers color the different quoting levels. Disgusting.

Agreed -- never said it wasn't. But putting time into changing to something else is time that I just won't have for the foreseeable future.

See other reply -- I'm not complaining that Outlook is broken but rather that the misuse of this flag slows down my workflow. The issue is that

I think you have many other problems with this crappy software. For instance, Outlook can't group messages by Message-IDs (threading). For mailing lists, IMHO a no-go.

Agreed -- but I'm not debating the merits of Outlook. It's a tool that does a job, and for all its faults, for what I need right now and for the very limited time I have available, it does the job.

And yes: there is other email software available without these problems. Including on Windows.

Hmm ... I still use Pine on some of my really old Linux boxes -- does this count? (Useful for reading CRON output...)

Pine isn't a solution; mutt is a solution, for instance.
KMail is a solution, and Thunderbird, both for Linux and Windows. (And any other software; I'm a mutt user.)

Again, for what I need and how I need to use it, pine is a perfectly good solution that works (although in looking at cone, that definitely looks like a much better solution). For the specific cases where I use pine, neither KMail nor Thunderbird is a workable solution at all, since I need to use a console-mode mail reader running on remote servers accessed via SSH, without being able to forward X, so anything graphical is out. (I always need to match an appropriate solution to the problem, and part of that is realizing that one single solution does not always fit every problem.)

Please, don't beat me if this header is still in this message, the problem is work in progress... i'm not sure if i changed the config properly.

Yes, it's working properly now. MUCH thanks!

No problem, but i'm not sure if this is the right way. YOU can't force all other people to omit things that make problems in your crappy software. And: YOU give up a nice feature in modern software...

Never mind -- it's not that big a deal, certainly not as much as everyone has turned it into. I assumed that this would be a simple request about a simple problem, but I didn't realize I'd have to spend so much time trying to justify to so many people on the listserv the tools I have to use to do my job. (btw: The header is back on in this reply, so there must be some other setting somewhere in your mail program that needs to be tweaked. But don't worry about trying to change anything else -- my effort involved in correcting the misdirected emails is probably less than your effort in trying to configure different headers for different groups, so I wouldn't want you to waste any more time on it.)

Anyway -- to those people still slogging through this thread who haven't moved on to something more productive: no more emails on this please; I'm done with this subject.
(Makes me have to think twice about raising any _real_ issues though, like why my VACUUMs periodically keep getting into lock contention with my JDBC connections and ultimately causing me to have to shut down Postgres w/ pg_ctl stop -m immediate, or how to properly handle postmaster errors that don't appear to be documented anywhere except in the source code... [No, it's not the absolutely most recent version of Postgres; no, I can't upgrade it.])

(Sorry, English isn't my native language and I know my English is bad.)

Andreas
-- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds)
If I was god, I would recompile penguin with --enable-fly. (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Re: [GENERAL] Large Result and Memory Limit
In response to Scott Marlowe [EMAIL PROTECTED]: On 10/4/07, Mike Ginsburg [EMAIL PROTECTED] wrote: export it. Memory limit is a major concern, but the query for one row returns a result set too large and PHP fails.

If this is true, and one single db row makes php exceed its memory limit just by returning it, you've done something very, very wrong in your design.

I work with Mike, and I'm not sure there's something very wrong in the design. The problem we're looking at is an export. This big TEXT field is nothing more than a staging area. The export itself can take several hours to run, so we background the process, and store the result in a big TEXT field (it ends up being a CSV). Later, the user can log in and download the exported file via the web interface. (But it's unlikely that anyone is going to click export, then wait 2 hours for their browser to refresh :) The process is: web interface - request export - get an estimate on completion time - come back later and check the status - download if complete. It's difficult (maybe impossible) to materialize the data on a schedule, since the user has a number of options for how to export the data (filters, etc.), so we chose to do it on-demand.

You'd probably be better served using either a plain text file system to store these things, or large objects in postgresql.

We have multiple web servers with a load balancer, so saving the result to a file doesn't really work. We could put shared storage in place, but I'm still not convinced that's the best fix for this. Large objects are one option we were considering. The problem is they don't work with Slony, and our redundancy relies on Slony. Granted, we might want to just make this a non-replicated table, since it's only a staging area anyway, but we're looking for a better solution.
My (specific) question is whether or not anyone has experimented with putting something like a CSV file in a table with one tuple for each row, and compared performance, etc to putting it in a large object? But if you're stuffing ~8 megs worth of csv text data into a single row* you're probably not using a very relational layout of your data. And you're losing all the advantages (checking your data for consistency and such) that a relational db could give you. Actually, we _are_ using lots of tables with lots of relations and foreign keys and triggers and stored procedures and all sorts of other stuff. That's why it takes multiple hours to flatten everything (is there a better term to describe the process of turning relational data into a single flat file?) * Note that I'm assuming a few things up there. 1: php uses about 2:1 memory to store data it's holding, roughly. If you're set to 16 Meg max, I'm assuming your return set is 8Meg or larger. I'm not sure, but that's probably correct. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
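The one-tuple-per-chunk idea Bill asks about can be prototyped outside the database: split the flattened CSV into fixed-size, ordered chunks (one per row of a hypothetical staging table keyed on a sequence number), and reassemble on download. A minimal sketch; the 8 kB chunk size and the function names are illustrative, not from the thread:

```python
def chunk_csv(data: str, chunk_size: int = 8192):
    """Split a large CSV export into (seq, chunk) pairs, one per
    staging-table row, preserving order via the sequence number."""
    return [(seq, data[off:off + chunk_size])
            for seq, off in enumerate(range(0, len(data), chunk_size))]

def reassemble(rows):
    """Recover the original export from rows fetched in any order,
    e.g. the result of a SELECT without an ORDER BY."""
    return "".join(chunk for _, chunk in sorted(rows))
```

Each (seq, chunk) pair would become one row in the staging table; streaming the chunks back in sequence order means neither PHP nor the driver ever has to hold the whole multi-megabyte export in memory at once, which was the original failure mode.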
Re: [GENERAL] more problems with count(*) on large table
In response to Mike Charnoky [EMAIL PROTECTED]: The db server is pretty beefy: 2x Xeon 3.20 GHz with 6G RAM. The io subsystem is a 550G 4-disk SATA 150 RAID 10 array connected via a 3ware 9500S-8 controller (Seagate Nearline ST3400632NS drives). Currently, shared_buffers is set to 50000 (nearly 400M).

The advice on 8.x systems has been to start with 1/4-1/3 of the available RAM on the system, and fine-tune from there. Unless there are other (non-postgresql) functions this machine serves, you should probably up shared_buffers to about 2G. From there, you may find that your workload benefits from even more, or possibly less, but 400M seems pretty small for a 6G system.

As for the data stored in this large table, there are 15 columns. Each row takes roughly 134 bytes to store, not counting the index. So, for one day's worth of data we are talking about 1.5G/day (1.8G with the index). That's about 11.5M rows/day. Although the data isn't stored exactly sequentially by the indexed time field, it is pretty close.

How much other data is this server pushing around? If there's only that one table in that one database, then something is wrong, as that whole thing should be in the filesystem cache all the time. Otherwise, you have to consider what other operations may be needing memory and moving those tables out of the way.

If it takes PG ~40 minutes to count(*) one day's worth of records, the avg throughput is 786k/s. Watching iostat during the count(*) operation, I see average read speeds in the range of 1100-1500k/s.

Could be a lot of fragmentation of that table. Keep in mind that if you're deleting records occasionally, that free space will get reused, which means an insert might not insert sequentially, it might go all over the table.

I guess I would expect postgres to perform a count(*) faster.
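The 1/4-to-1/3-of-RAM starting point mentioned above is easy to turn into concrete numbers for this box. A back-of-envelope sketch, assuming the default 8 kB block size (shared_buffers was specified in 8 kB pages in this era of postgresql.conf):

```python
RAM_BYTES = 6 * 1024**3   # the 6G server described above
PAGE_SIZE = 8192          # default PostgreSQL block size

low = int(RAM_BYTES * 0.25 // PAGE_SIZE)   # 1/4 of RAM, in pages
high = RAM_BYTES // 3 // PAGE_SIZE         # 1/3 of RAM, in pages
print(low, high)  # 196608 262144, i.e. 1.5G to 2G of shared_buffers
```

Against those numbers, the ~400M currently configured is well below the recommended starting range, which is why Bill suggests raising it to about 2G.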
When I run benchmarks on the machine with hdparm (with the db insert process running), I see the disk averages 80MB/sec for reads:

# hdparm -tT /dev/sdb1
/dev/sdb1:
 Timing cached reads:        3884 MB in 2.00 seconds = 1942.85 MB/sec
 Timing buffered disk reads:  248 MB in 3.01 seconds =   82.49 MB/sec

Maybe PG has to do a lot of random disk access? I'm running bonnie++ now to get more detailed disk performance info. As Tomasz pointed out maybe using CLUSTER would help, but this probably takes a long time to perform.

If you can spare the time, give it a try to see if it helps.

Again, the only other thing happening with the db: a separate process is inserting data into this table. I have checkpoint_segments set to 64 so that pg is not constantly thrashing the disk with writes. The transaction log is on a separate disk.

Mike

Bill Moran wrote: In response to Mike Charnoky [EMAIL PROTECTED]: This is strange... count(*) operations over a period of one day's worth of data now take ~1-2 minutes to run or ~40 minutes. It seems that the first time the data is queried it takes about 40 minutes. If I try the query again, it finishes in 1-2 minutes!

This sounds like a caching issue. My guess at what's happening is that other operations are pushing this data out of the shared_buffers, so when you run it, the system has to pull a bunch of tuples off the disk to check them. If you run it again immediately, the tuples are still in memory, and it runs very fast. If this is the case, you can speed up things by adding RAM/shared_buffers, or by moving to faster disks. The RAM solution is going to give you the biggest performance improvement. However, if there's enough other data on this system, you may have difficulty getting enough RAM to mitigate the problem, in which case, faster disks are going to be your best bet. How much RAM do you have, and how much of it is allocated to shared_buffers? What's your IO subsystem look like?
---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Bill Moran http://www.potentialtech.com
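Mike's random-access guess is supported by comparing the hdparm figure (82.49 MB/sec buffered sequential reads) with the iostat rates he reports during the count(*) (1100-1500 kB/sec). Rough arithmetic on those two numbers from the thread:

```python
sequential_mb_s = 82.49            # hdparm buffered disk read
observed_kb_s = (1100 + 1500) / 2  # midpoint of iostat reads during count(*)

# fraction of the disk's sequential bandwidth the scan actually achieves
fraction = (observed_kb_s / 1024) / sequential_mb_s
print(f"{fraction:.1%}")  # 1.5%
```

Achieving under 2% of sequential bandwidth is the classic signature of a seek-bound (random I/O) workload, which is also why CLUSTERing the table on the evtime index was suggested.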
Re: [GENERAL] more problems with count(*) on large table
In response to Mike Charnoky [EMAIL PROTECTED]: This is strange... count(*) operations over a period of one day's worth of data now take ~1-2 minutes to run or ~40 minutes. It seems that the first time the data is queried it takes about 40 minutes. If I try the query again, it finishes in 1-2 minutes! This sounds like a caching issue. My guess at what's happening is that other operations are pushing this data out of the shared_buffers, so when you run it, the system has to pull a bunch of tuples off the disk to check them. If you run it again immediately, the tuples are still in memory, and it runs very fast. If this is the case, you can speed up things by adding RAM/shared_buffers, or by moving to faster disks. The RAM solution is going to give you the biggest performance improvement. However, if there's enough other data on this system, you may have difficulty getting enough RAM to mitigate the problem, in which case, faster disks are going to be your best bet. How much RAM do you have, and how much of it is allocated to shared_buffers? What's your IO subsystem look like? Again, nothing else is happening on this db server except for a constant insertion into this table and a few others. I have done set statistics 100 for the evtime field in this table. Here is the output from EXPLAIN ANALYZE. This is the same query run back to back, first time takes 42 minutes, second time takes less than 2 minutes! 
mydb=# explain analyze select count(*) from prediction_accuracy where evtime between '2007-09-29' and '2007-09-30';

 Aggregate (cost=3.02..3.03 rows=1 width=0) (actual time=2549854.351..2549854.352 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy (cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892 rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with time zone))
 Total runtime: 2549854.411 ms
(4 rows)

Time: 2549943.506 ms

mydb=# explain analyze select count(*) from prediction_accuracy where evtime between '2007-09-29' and '2007-09-30';

 Aggregate (cost=3.02..3.03 rows=1 width=0) (actual time=111200.943..111200.944 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy (cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483 rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with time zone))
 Total runtime: 111201.000 ms
(4 rows)

Time: 111298.695 ms

Mike

Gregory Stark wrote: Mike Charnoky [EMAIL PROTECTED] writes: I altered the table in question, with set statistics 100 on the timestamp column, then ran analyze. This seemed to help somewhat. Now, queries don't seem to hang, but it still takes a long time to do the count:

* where evtime between '2007-09-26' and '2007-09-27' took 37 minutes to run (result was ~12 million)
* where evtime between '2007-09-25' and '2007-09-26' took 40 minutes to run (result was ~14 million)

Still stymied about the seemingly random performance, especially since I have seen this query execute in 2 minutes.

And the explain analyze for these? Are you still sure it's certain date ranges which are consistently problems and others are consistently fast? Or could it be something unrelated.
---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Bill Moran http://www.potentialtech.com
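The two EXPLAIN ANALYZE runs above quantify the caching effect Bill described: identical plan, identical row count, wildly different wall time. Using the figures from the output:

```python
cold_ms = 2549854.411   # first run: tuples pulled from disk
warm_ms = 111201.000    # second run: tuples still in memory
rows = 11_423_786       # rows scanned in both runs

speedup = cold_ms / warm_ms
rate = rows / (warm_ms / 1000)  # rows per second on the warm run
print(round(speedup, 1))  # 22.9
```

A ~23x difference between a cold and warm run of the same index scan is exactly the shared_buffers / filesystem-cache eviction story from earlier in the thread; even the warm run only manages on the order of 100k rows/sec.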
Re: [GENERAL] more problems with count(*) on large table
In response to Mike Charnoky [EMAIL PROTECTED]: The autovacuum is turned on. Since this is pg8.1, I don't know when the table was actually last vacuumed. I *did* run analyze on the table, though. Also, nothing has been deleted in this table... so vacuum should have no effect, right?

Updated rows also produce dead tuples that require vacuuming. If the table is insert only, you don't need vacuum.

-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] How unsubscrib this list ?
In response to Gerson Machado [EMAIL PROTECTED]: How unsubscrib this list ?

The information is in the mail headers:

List-Archive: http://archives.postgresql.org/pgsql-general
List-Help: mailto:[EMAIL PROTECTED]
List-ID: pgsql-general.postgresql.org
List-Owner: mailto:[EMAIL PROTECTED]
List-Post: mailto:pgsql-general@postgresql.org
List-Subscribe: mailto:[EMAIL PROTECTED]
List-Unsubscribe: mailto:[EMAIL PROTECTED]

-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Help tuning a large table off disk and into RAM
In response to James Williams [EMAIL PROTECTED]: I'm stuck trying to tune a big-ish postgres db and wondering if anyone has any pointers. I cannot get Postgres to make good use of plenty of available RAM and stop thrashing the disks. One main table: ~30 million rows, 20 columns, all integer, smallint or char(2). Most have an index. It's a table for holding webserver logs. The main table is all foreign key ids. Row size is ~100 bytes. The typical query is an aggregate over a large number of rows (~25%, say): SELECT COUNT(*), COUNT(DISTINCT user_id) FROM table WHERE epoch ... AND epoch ... AND country = ... The box has 4 x Opterons, 4Gb RAM, five 15k rpm disks, RAID 5. We wanted fast query/lookup. We know we can get fast disk IO. Running a typical query like the above seems to:

* hardly tax a single CPU
* leave plenty of RAM free
* thrash the disks about

The last is based mostly on the observation that another tiddly unrelated mysql db which normally runs fast grinds to a halt when we're querying the postgres db (and cpu, memory appear to have spare capacity). We've currently got these settings, and have tried doubling/halving them, restarted and benchmarked a test query. They don't appear to materially alter our query time:

shared_buffers = 128MB

shared_buffers = 1.5GB Unless you've got a lot of stuff other than PostgreSQL on this machine.

temp_buffers = 160MB
work_mem = 200MB
max_stack_depth = 7MB

These look reasonable, although I can't be sure without more details.

We're less concerned about insert speed. Typically 1 or 2 users, but want fast queries. Perhaps a little extreme, but I'm trying to find a way to express this in a way that Postgres understands: * Load this table, and one or two indexes (epoch, user_id) into RAM.

Give it enough shared_buffers and it will do that. You're estimating the size of your table @ 3G (try a pg_relation_size() on it to get an actual size). If you really want to get _all_ of it in all the time, you're probably going to need to add RAM to the machine.
With 8G, you could allocate about 3G to shared_buffers, but that would be ignoring the size of indexes. However, I think you'll be surprised how much performance improves with 1.5G of shared_buffers. You may not need any more. 128M is really forcing PG to work within limited space.

* All of the table, all of those indexes.
* Keep them there, but keep a disk based backup for integrity.
* Run all selects against the in RAM copy. Always.

This is what PG does if you allocate enough shared_buffers.

-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
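The 3G table-size estimate in the reply above follows directly from the figures James gave (~30 million rows at ~100 bytes each); pg_relation_size() would give the real number, but the envelope math is:

```python
rows = 30_000_000
row_bytes = 100   # all integer/smallint/char(2) columns, per the post

table_gb = rows * row_bytes / 1024**3
print(f"{table_gb:.2f} GB")  # 2.79 GB
```

Roughly 2.8G of heap before counting any of the 20 indexes, which is why 128M of shared_buffers cannot come close to pinning the working set, and why even 1.5G should be a dramatic improvement.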
Re: [GENERAL] Building Windows fat clients
I haven't used it in a year or so, although I've heard that it only keeps getting better / more mature / more complete. According to their Supported Platforms page ( http://www.mono-project.com/Supported_Platforms ), their current list of supported OS's is: * Linux * Mac OS X * Sun Solaris * BSD - OpenBSD, FreeBSD, NetBSD * Microsoft Windows One of these days I'm hoping to have some time to get back to using it (instead of just using Java for cross-platform work). - Bill -Original Message- From: Martin Gainty [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 20, 2000 11:00 AM To: Bill Bartlett; 'johnf'; pgsql-general@postgresql.org Subject: Re: [GENERAL] Building Windows fat clients Glad to hear that I knew Novell *was* working on a port to BSD But I havent heard about ports to other Nix platforms? M-- - Original Message - From: Bill Bartlett [EMAIL PROTECTED] To: 'Martin Gainty' [EMAIL PROTECTED]; 'johnf' [EMAIL PROTECTED]; pgsql-general@postgresql.org Sent: Wednesday, September 19, 2007 10:42 PM Subject: Re: [GENERAL] Building Windows fat clients -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Gainty Sent: Tuesday, September 19, 2000 5:58 PM To: johnf; pgsql-general@postgresql.org Subject: Re: [GENERAL] Building Windows fat clients Hello Guys Using C# means .NET framework will need to be installed and your webapp will only work with Microsoft OS Not entirely true. The Mono project ( www.mono-project.com ) has implemented a decent amount of the .NET Framework in a cross-platform environment, including much of ASP.NET. Be aware scripting languages such as PHP and Python will necessitate that you acquire all of the libraries for your web app.. 
As long as you stay mainstream you should be ok But if you have specific requirements for XYZ Db that nobody supports or protocols or device drivers that nobody has written you'll have to write the libraries yourself Martin-- - Original Message - From: johnf [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Wednesday, September 19, 2007 5:20 PM Subject: Re: [GENERAL] Building Windows fat clients On Wednesday 19 September 2007 10:19, Scott Ribe wrote: I'm asking this group because we tend to think alike wrt to data modeling and separation of concerns ;-) Any recommendations on ORM libraries for new Windows development? The last time I started anything from scratch was over 10 years ago, and the state of the art seemed to be to smash everything together into event handlers on GUI objects. Ugh. I pulled the M of the MVC out into separate coherent classes and implemented a *very* simple ORM, leaving the VC mostly conflated in the event handlers--which is not too bad since this app will never need to be cross-platform. So the dev tool was discontinued, some closed-source libraries are getting less and less compatible by the year, and we're going to rewrite. Where to start? It's a custom Windows-only app, only installed at one site. Using .NET would be fine. C# or C++ would be most-preferred language choices, although we could suck it up and use Java. I don't want to put VB on the table. Leaning toward Visual Studio .NET because I know it will be around (in whatever morphed form) for a while; but also considering Borland's supposedly revitalized C++ tools because I used C++ Builder with success back when MS C++ compilers were still awful. I should probably mention that the Windows apps, with the exception of one complicated explore customer's entire history here screen, are pretty simple; the complexity is in reports and stored procedures. Suggestions where to start? If you like python you might want to check www.dabodev.com. Dabo was designed to access data. 
-- John Fabiani ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] autovacuum
In response to Robert Fitzpatrick [EMAIL PROTECTED]: I have a large database used with our mail filter. The pg_dumpall results in about 3GB with this being the only database in the system besides templates and postgres. I do a vacuum every night after backup and it takes about an hour, is this normal for this size db? "Normal" is relative. If it's taking an hour to vacuum 3G, I would say that either your hardware is undersized/badly configured, or you're not vacuuming often enough. I also have autovacuum enabled and when it is running during the day, our mail queues will tend to fill up with slow response to the server. Should I have autovacuum on even if I am vacuuming the db every night? I'm not aware of any problems with autovacuum and scheduled vacuum working together. That doesn't mean you're vacuuming often enough, however. Switch your nightly vacuum to VACUUM VERBOSE and capture the output to see how much work it has to do. Are your fsm settings high enough? Let me know if you need more specifics. Just trying to get some feedback on if my vacuum is taking too long or if both are necessary...thanks for the help! How much RAM does the system have? What's your shared_buffers setting? What's your maintenance_work_mem set to? -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] autovacuum
In response to Robert Fitzpatrick [EMAIL PROTECTED]: On Thu, 2007-09-20 at 13:22 -0400, Bill Moran wrote: That doesn't mean you're vacuuming often enough, however. Switch your nightly vacuum to vacuum verbose and capture the output to see how much work it has to do. Are your fsm settings high enough? <snip> INFO: free space map contains 30078 pages in 41 relations DETAIL: A total of 30304 page slots are in use (including overhead). 30304 page slots are required to track all free space. Current limits are: 204800 page slots, 1000 relations, using 1265 kB. This was what I was most concerned about. If your FSM settings are too low, vacuum won't be able to fully do its job. But it looks like you're OK here. -- Bill Moran http://www.potentialtech.com
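The check Bill applies to the VACUUM VERBOSE output above is mechanical: the DETAIL lines report how many page slots and relations the free space map needs, and the "Current limits" line reports the configured maxima. A small sketch of that comparison (the function name is illustrative, not any PostgreSQL API):

```python
def fsm_ok(slots_required: int, max_fsm_pages: int,
           relations: int, max_fsm_relations: int) -> bool:
    """Return True if the configured FSM limits cover what vacuum reported.

    slots_required / relations come from the DETAIL lines of VACUUM VERBOSE;
    max_fsm_pages / max_fsm_relations from the "Current limits" line.
    """
    return slots_required <= max_fsm_pages and relations <= max_fsm_relations

# Figures from the report above: 30304 slots required, 41 relations,
# against limits of 204800 page slots and 1000 relations.
print(fsm_ok(30304, 204800, 41, 1000))  # True
```

When this returns False for either limit, vacuum cannot track all free space and tables will bloat regardless of how often you vacuum.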
Re: [GENERAL] autovacuum
In response to Robert Fitzpatrick [EMAIL PROTECTED]: On Thu, 2007-09-20 at 13:22 -0400, Bill Moran wrote: In response to Robert Fitzpatrick [EMAIL PROTECTED]: I have a large database used with our mail filter. The pg_dumpall results in about 3GB with this being the only database in the system besides templates and postgres. I do a vacuum every night after backup and it takes about an hour, is this normal for this size db? "Normal" is relative. If it's taking an hour to vacuum 3G, I would say that either your hardware is undersized/badly configured, or you're not vacuuming often enough. It is a dual P4 processor Supermicro server with 2GB of RAM, so I will need to go over the configuration then? I didn't think it should take so long... Why does everyone leave out the IO subsystem? It's almost as if many people don't realize that disks exist ... With 2G of RAM, and a DB that's about 3G, there's at least a G of database data _not_ in memory at any time. As a result, disk speed is important, and _could_ be part of your problem. You're not using RAID 5, are you? Let me know if you need more specifics. Just trying to get some feedback on if my vacuum is taking too long or if both are necessary...thanks for the help! How much RAM does the system have? What's your shared_buffers setting? What's your maintenance_work_mem set to? Yes, this is the first time I've had to do any tuning to pgsql, so I most likely need help in this area. This is 8.2.4 on a FreeBSD 6.2 server...here are those settings currently below. I also had to tweak BSD loader.conf to allow the changes to work... max_connections = 250 max_fsm_pages = 204800 shared_buffers = 128MB Unless this machine runs programs other than PostgreSQL, raise this to about 650MB. You might get better performance from even higher values. The rule of thumb is to allocate 1/4 - 1/3 of the available RAM to shared_buffers ... subtract the RAM that other programs are using first.
effective_cache_size = 256MB More like 1300MB (again, unless this machine is doing other things) work_mem = 64MB maintenance_work_mem = 256MB mx1# cat /etc/loader.conf kern.ipc.semmni=256 kern.ipc.semmns=512 kern.ipc.semmnu=256 mx1# cat /etc/sysctl.conf # tuning for PostgreSQL kern.ipc.shm_use_phys=1 kern.ipc.shmmax=1073741824 kern.ipc.shmall=262144 kern.ipc.semmsl=512 kern.ipc.semmap=256 If I don't have it listed above, then it is default settings for anything else. Watch the system during vacuum to see if it's blocking on IO or CPU. systat, vmstat, iostat, and top (use 'm' to switch views) are all good utilities to check on this. Another possibility is that autovac isn't configured correctly. Watch your PostgreSQL logs to see if it's running at all. If it is, turn up the logging level until it tells you which tables it's vacuuming. You may have to tweak the thresholds to make it more aggressive. -- Bill Moran http://www.potentialtech.com
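The sizing advice in this thread is simple arithmetic: shared_buffers at roughly 1/3 of the RAM left over after other programs, and effective_cache_size near the rest. A hedged sketch of that rule of thumb (the fractions and the ~100MB headroom are taken from the advice quoted above, not from any official formula):

```python
def suggest_settings(total_ram_mb: int, other_apps_mb: int = 0) -> dict:
    """Rough starting points per the rule of thumb in this thread:
    shared_buffers ~ 1/3 of RAM not used by other programs,
    effective_cache_size ~ most of what remains for the OS disk cache."""
    avail = total_ram_mb - other_apps_mb
    shared_buffers = avail // 3
    return {
        "shared_buffers_mb": shared_buffers,
        # leave ~100MB headroom for the OS itself
        "effective_cache_size_mb": avail - shared_buffers - 100,
    }

# A 2GB box dedicated to PostgreSQL, as in the report above:
print(suggest_settings(2048))
# {'shared_buffers_mb': 682, 'effective_cache_size_mb': 1266}
```

These are starting points to benchmark from, not final values; Bill's "about 650MB" and "more like 1300MB" figures above fall out of the same arithmetic.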
[GENERAL] pgcrypto: is an IV needed with pgp_sym_encrypt()?
I'm just starting with pgcrypto, and I'm curious whether it's needed/recommended to use an initialization vector/value (IV) with the pgp_sym_encrypt() function. The docs hint that an IV is used automatically ("2. Data is prefixed with block of random bytes. This is equal to using random IV."), but encrypting plain text that starts the same seems to result in initial common cipher text, so I'm not clear. So, I'm currently generating a substring of an MD5 hash of a few items and prepending that to the plain text I need to encrypt as the IV. Then when I decrypt I remove that prefix. BTW, this is for credit card storage, which is a business requirement. Besides following the PCI DSS and external audit procedures, the plan is to use pgcrypto (pgp_sym_encrypt() with AES-256) as part of a credit card storage server. The server and db are SSL only and the key is passed from the application and never stored anyplace (except in memcached on other servers during the session). The key is a user's plain text password plus an application-specific secret. So, each row has its own key. Passwords must be changed periodically, etc. I'd welcome any comments or recommendations from others that have implemented something similar. Thanks, -- Bill Moseley [EMAIL PROTECTED]
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Filip Rembiałkowski [EMAIL PROTECTED] wrote: 2007/9/18, Joshua D. Drake [EMAIL PROTECTED]: If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If you do not overrun your max_fsm_pages, yes vacuum analyze can deal with the issue. Are you sure? I have a situation where the above is not true. Postgres version 8.1.8, while VACUUM VERBOSE says: INFO: free space map contains 2329221 pages in 490 relations DETAIL: A total of 2345744 page slots are in use (including overhead). 2345744 page slots are required to track all free space. Current limits are: 1000 page slots, 1000 relations, using 58698 KB. ... and we have a constant problem with index bloat and need to REINDEX frequently. The database is very redundant and has quite a high data retention rate (it's an ecommerce site). I've been involved in a number of the discussions on this, and I think part of the confusion stems from the fact that "index bloat" is an ambiguous term. If the index gets large enough that it no longer fits in shared memory, and reindexing it will reduce its size to where it _will_ fit in shared memory, then the index _could_ be said to be bloated. However, an equally valid solution to that problem is to increase the amount of shared memory available (possibly by adding RAM). Unfortunately, folks like Phoenix are looking for yes/no answers, and with many of these questions, the _correct_ answer is "it depends on your workload." If you find that reindexing improves performance, then you should investigate further. Depending on the exact nature of the problem, there are many possible solutions, three that come to mind: * Add RAM/SHM * REINDEX on a regular schedule * (with newer version) reduce the fill factor and REINDEX -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
In response to Phoenix Kiula [EMAIL PROTECTED]: If you find that reindexing improves performance, then you should investigate further. Depending on the exact nature of the problem, there are many possible solutions, three that come to mind: * Add RAM/SHM Can I add SHM merely by managing the entry in sysctl.conf? My current values: kernel.shmmax = 536870912 kernel.shmall = 536870912 These values define the max allowed. They exist to keep poorly written applications from sucking up all the available memory. Setting them higher than is needed does not cause any problems, unless a greedy or poorly-written application grabs all that memory. My shared_buffers in postgresql.conf is 20000. From the website http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax should be sharedbuffer*8192, so I suppose my shmmax can be much lower than the above, but I raised it for performance. Am I wrong to do so? It's completely impossible to tell without knowing more about your physical hardware. The rule of thumb is 1/3 physical RAM to start, then adjust if more or less seems to help. That advice is for versions of PG >= 8. If you're still running a 7.X version, upgrade. How much RAM does this system have in it? Unless you have other applications running on this system using RAM, you should allocate more of it to shared_buffers. If 160M is 1/3 your RAM, you probably need to add more RAM. How big is your database? If it's possible to fit it all in shared_buffers, that will give you the best performance. * REINDEX on a regular schedule This is sadly not really feasible, because we need to offer a 100% availability website. REINDEX does not work concurrently so it is not really an option for us. My max_fsm_pages and max_fsm_relations are way above the numbers that come up after the VACUUM ANALYZE VERBOSE run. Hence my comment about "depending on your workload" and investigating the situation to determine the best solution.
But still, the autovacuum stuff seems like it is not working at all. Some related entries in the conf file: autovacuum = on autovacuum_vacuum_cost_delay = 20 vacuum_cost_delay = 20 autovacuum_naptime = 30 stats_start_collector = on stats_row_level = on autovacuum_vacuum_threshold = 80 autovacuum_analyze_threshold = 80 And yet, the db often slows down, at which point I manually log in and run a manual VACUUM ANALYZE and it seems fine for some more time. Sometimes, I also restart pgsql and that seems to help for a while. You don't mention *_scale_factor settings. Those are going to be important as well. Based on your symptoms, it sounds like autovacuum is not getting those tables vacuumed enough. I recommend raising the debug level and watching the logs to see if autovacuum is actually getting tables vacuumed. Consider lowering your *_scale_factor values if not. Or even reducing autovacuum_naptime. Another piece of advice on these forums is to watch vmstat 1, without actually specifying how to draw inferences from it. The output of it is coming up at decent rates, as follows:

procs ---memory-- ---swap-- -io --system-- cpu
 r b swpd free buff cache si so bi bo in cs us sy id wa
 1 0 29124 110760 108980 346773601 206 1400 4 2 1 85 12
 0 0 29124 110632 108980 346773600 0 0 1052 108 0 0 100 0
 2 0 29124 108840 108980 346773600 0 0 1112 299 1 1 98 0
 1 0 29124 109288 108980 346773600 0 0 1073 319 2 1 98 0

Explaining how to interpret the output of this command and determine what to do with it is not something easily done in a short paragraph. However, it looks like you've got a lot of RAM being used for the disk cache. That memory would probably be better used as shared_buffers, so I suggest you increase that value considerably. * (with newer version) reduce the fill factor and REINDEX I think some of my tables are updated very frequently so a smaller fill factor will be nice. How can I find the current fill factor on my tables?
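Bill refers to *_scale_factor settings that the config quoted above omits; for orientation, they live in postgresql.conf alongside the threshold settings already shown. A hypothetical fragment (values are illustrative only, not recommendations):

```ini
# Illustrative values only -- tune to your workload
autovacuum_vacuum_scale_factor = 0.2    # vacuum when dead rows exceed ~20% of a table
autovacuum_analyze_scale_factor = 0.1   # analyze when changed rows exceed ~10% of a table
autovacuum_naptime = 30                 # seconds between autovacuum checks (as set above)
```

Lowering the scale factors makes autovacuum trigger after proportionally fewer row changes, which is the knob Bill suggests turning if the logs show tables are not being vacuumed often enough.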
Also, is there some method or science to calculating a decent fill factor -- size of table, number of indexes, frequency of updates, and such? We have one major table which faces a lot of INSERTs and UPDATES in a day (up to 10,000) but many many more SELECTs (up to 10 million). I'm not sure how to find the current value, but a smaller fill factor on busy tables should lead to less fragmentation, thus more efficient indexes over time. Keep in mind that a smaller fill factor will also lead to larger indexes initially. -- Bill Moran http://www.potentialtech.com
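The tradeoff Bill describes can be put in rough numbers: an index built at fillfactor f starts about 100/f times larger than one packed at 100%, in exchange for free space that absorbs updates before pages split. A toy calculation of that proportionality (not PostgreSQL internals, just the ratio):

```python
def initial_index_size(size_at_100: float, fillfactor: int) -> float:
    """Approximate initial index size when pages are only `fillfactor`% full.

    The same entries spread over pages that are fillfactor% full need
    100/fillfactor times as many pages as fully packed pages.
    """
    if not 10 <= fillfactor <= 100:
        raise ValueError("fillfactor must be between 10 and 100")
    return size_at_100 * 100 / fillfactor

# A 600 MB index rebuilt at fillfactor 70 starts at roughly 857 MB:
print(round(initial_index_size(600, 70)))  # 857
```

So a very low fillfactor trades a substantially larger (and less cacheable) index up front against slower fragmentation later; where the break-even sits depends on the update-to-select ratio of the table.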
index fillfactor (was Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER)
In response to Phoenix Kiula [EMAIL PROTECTED]: Thanks for a very informative post! One question: I'm not sure how to find the current value, but a smaller fill factor on busy tables should lead to less fragmentation, thus more efficient indexes over time. Keep in mind that a smaller fill factor will also lead to larger indexes initially. What constitutes a small fill factor? Would 70 be good? Unfortunately, I can't say. I have not yet had the opportunity to experiment with different fillfactors, so I can only speak in vague estimations on this topic. I guess my current must have been the default, which the manual says is 100. I expect it's at the default, but the docs say that is 90%: http://www.postgresql.org/docs/8.2/static/sql-createindex.html Where did you see 100? Or did you mean a really small fill factor like 20? In this context, what is "packing" in the manual -- is that some kind of compression? Hopefully, someone more knowledgeable will chime in with some wise suggestions. Barring that, I can only suggest you experiment to find what works for your workload, but don't rule out the possibility that extremely low fillfactor values might work well for you. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] pgcrypto: is an IV needed with pgp_sym_encrypt()?
On Tue, Sep 18, 2007 at 11:26:17AM +0300, Marko Kreen wrote: A few bytes being the same is normal. Those are the PGP packet header, telling this is a symmetrically encrypted session key packet, with length X plus some more details. I see. So, you are saying there's no need to generate my own IV to prepend to the plain text before encrypting. If more than a few bytes are the same, and if the salt is not randomised, it _could_ be a sign of a problem. Either a pgcrypto bug or a failure to initialize the random generator. If you suspect a problem, please send me a few example encryptions with keys and your setup details (postgres version, openssl or not, OS version). No, it was only a few bytes that were similar, so the headers explain that. Besides following the PCI DSS and external audit procedures, the plan is to use pgcrypto (pgp_sym_encrypt() with AES-256) as part of a credit card storage server. The server and db are SSL only and the key is passed from the application and never stored anyplace (except in memcached on other servers during the session). The key is a user's plain text password plus an application-specific secret. So, each row has its own key. Passwords must be changed periodically, etc. I don't know the details of your setup, but I strongly suggest you look into using public-key crypto. That allows you to use separate keys for encryption and decryption. So on a webserver where users only input credit cards, you keep only public keys, so anybody cracking it won't be able to decrypt the data. I need to look at that more. But I've seen that suggested where one needs to decrypt the data at a later time. We don't have that need. Our plan was to never store any keys. Every user must log in to the application with a password. Their account passwords are only stored hashed on disk, so we don't know their passwords. The plan is to encrypt their plain-text password with a secret known by the application only and stored into memcached.
It's this plain-text password that will be sent to a separate server to encrypt (and decrypt) their credit card data when the user makes a transaction. We only need to store the credit card data to allow subsequent charges to their card on file -- and that only happens when a user logs in and processes a transaction. We don't have any way to decrypt the data without this password stored in the session. If someone hacks an application server they could pluck active users' passwords from memcached and also find the application's secret word. Then if they also hacked the credit card server they could decrypt the data using passwords they were able to sniff. See any glaring holes? Thanks for the help! -- Bill Moseley [EMAIL PROTECTED]
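The per-row key scheme described above (user password combined with an application secret, with no decryption key ever stored) can be sketched with a standard key-derivation function. This is a hypothetical illustration using PBKDF2 from Python's standard library, not what pgcrypto does internally; `APP_SECRET` and the salt are placeholder values:

```python
import hashlib

APP_SECRET = b"application-specific secret"  # hypothetical; lives only in app config

def derive_row_key(user_password: str, salt: bytes) -> bytes:
    """Derive a per-user encryption key from the user's plain-text password
    plus an application secret, so no usable key is ever stored on disk."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        user_password.encode() + APP_SECRET,
        salt,
        100_000,  # iteration count; tune to your hardware
    )

k1 = derive_row_key("hunter2", b"per-user-salt")
k2 = derive_row_key("hunter3", b"per-user-salt")
assert k1 != k2        # different passwords -> different keys
assert len(k1) == 32   # 256-bit key, a suitable size for AES-256
```

A proper KDF also makes brute-forcing sniffed material expensive; the design still inherits the weakness the poster notes, that anyone holding both the session password and the application secret can derive the key.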
Re: [GENERAL] constraints on two tables
In response to finecur [EMAIL PROTECTED]: On Sep 10, 9:55 pm, finecur [EMAIL PROTECTED] wrote: Hi Here is my table: Table School ( id integer, name text ); Table Department ( id integer, school_id integer references school(id), name text ); Table Course ( department_id integer references department(id), name text, course_number text ) I would like to make the course_number unique, but not in the course table, nor in department. I would like to make the course_number unique in the scope of School. So, you may see the same course_number in the Course table, but (school_id, course_number) should always be unique. How can I make the constraint? ALTER TABLE Course ADD PRIMARY KEY (school_id, course_number); and add the school_id column to the Course table. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Security Advances in Postgresql over other RDBMS
In response to Jasbinder Singh Bali [EMAIL PROTECTED]: Hi, The way postgres has the concept of host-based authentication, is this a step forward over other RDBMSes like SQL Server and Oracle? Host-based auth has been around a long time. It's just another layer. Some folks even consider HBA antiquated, which may be why MS doesn't use it. It's difficult to use if you've got a DHCP-driven network where IPs change a lot. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Markus Schiltknecht [EMAIL PROTECTED] wrote: Hi, Bill Moran wrote: While true, I feel those applications are the exception, not the rule. Most DBs these days are the blogs and the image galleries, etc. And those don't need or want the overhead associated with synchronous replication. Uhm.. do blogs and image galleries need replication at all? Ever read anything on how MySpace is laid out? The big ones need replication to handle the traffic. I'm thinking more of the business critical applications, where high availability is a real demand - and where your data *should* better be distributed among multiple data centers just to avoid a single point of failure. Agreed. Judged by importance, they're bigger. Judged by raw numbers, I suspect that they're less prominent than the blogs and image galleries. I guess it depends on your viewpoint. <rant>for most other stuff MySQL is good enough</rant> I hate when people say that. Any company that says that is of limited viability in my opinion. For one thing, saying it's good enough is setting yourself up for trouble when your company expands its requirements. It's synchronous for the reason you describe, but it's asynchronous because a query that has returned successfully is not _guaranteed_ to be committed everywhere yet. Seems like we're dealing with a limitation in the terminology :) Certainly! But sync and async replication are such well known and used terms... on the other hand, I certainly agree that in Postgres-R, the nodes do not process transactions synchronously, but asynchronously. Good point. Maybe it's really better to speak of eager and lazy replication, as in some literature (namely the initial Postgres-R paper of Bettina Kemme). This could potentially be a problem on (for example) a web application, where a particular user's experience may be load-balanced to another node at any time. Of course, you just have to write the application with that knowledge. IMO, such heavily dynamic load-balancing is rarely useful.
With application support, it's easily doable: let the first transaction on node A query the (global) transaction identifier and after connecting to the next node B, ask that to wait until that transaction has committed. It gets a little harder without application support: the load balancer would have to keep track of sessions and their last (writing) transaction. Again, thank you for pointing this out. Keep up the good work. I wish I had some spare cycles to work on Postgres-R -- it seems like a very interesting problem. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Export data to MS Excel
For quick/simple table format reports, you can just use psql to create the output in HTML format, then import that directly into Excel. For example, I use the following psql line to generate an HTML-format report of server IP information; this file can then be directly opened in Excel. (Excel 2002 and later will open and parse most HTML format files without even needing to import them -- just open the file.) psql -H -P tableattr='cellspacing=0 cellpadding=6' -P title='Server IP Information' -f get_server_ip_info.sql -o get_server_ip_info.html Alternatively, you can bury the formatting commands in the SQL file itself -- this is handy if the formatting is longer or doesn't change, or you want different formats or titles for different reports. For example: list_of_unsigned_images_report.sql: \pset format html \pset title 'List of Unsigned Images' \pset tableattr 'cellspacing=0 cellpadding=6' SELECT p.last_name, p.first_name, [blah blah blah]... - Bill -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Phoenix Kiula Sent: Saturday, September 01, 2007 5:24 AM To: Ashish Karalkar Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Export data to MS Excel On 01/09/07, Ashish Karalkar [EMAIL PROTECTED] wrote: Hello All, I want to export data from PostgreSQL tables to MS Excel. Is there any way? Sure, write SQL in a program (php, perl, jsp, asp) to dump the tables in HTML <table><tr><td> rows format. Then import that HTML page into Excel from Tools -> Data Sources. Once you save the xls file, you can always just refresh the data because it already points to your program. I guess there must be other ways, but this one works fine for me. Excel has a limit of 65,700 rows (or thereabouts) so it's not a huge deal for a db like pgsql.
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
chris smith [EMAIL PROTECTED] wrote: Ever read anything on how myspace is laid out? The big ones need replication to handle the traffic. Actually no. http://highscalability.com/livejournal-architecture Using MySQL replication only takes you so far. (Yeah, it's MySQL but the point is valid regardless.) You can't keep adding read slaves and scale. A lot use sharding now to keep scaling (limiting to X users/accounts per database system and just keep adding more database servers for the next X accounts). I got the impression that they hadn't moved _all_ of their DB needs to sharding. Just the ones that exceeded the scalability of replication, but they don't explicitly say, IIRC. -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
In response to Markus Schiltknecht [EMAIL PROTECTED]: Hi, Bill Moran wrote: First off, clustering is a word that is too vague to be useful, so I'll stop using it. There's multi-master replication, where every database is read-write, then there's master-slave replication, where only one server is read-write and the rest are read-only. You can add failover capabilities to master-slave replication. Then there's synchronous replication, where all servers are guaranteed to get updates at the same time. And asynchronous replication, where other servers may take a while to get updates. These descriptions aren't really specific to PostgreSQL -- every database replication system has to make design decisions about which approaches to support. Good explanation! Synchronous replication is only really used when two servers are right next to each other with a high-speed link (probably gigabit) between them. Why is that so? There's certainly very valuable data which would gain from an inter-continental database system. For money transfers, for example, I'd rather wait half a second for a round trip around the world, to make sure the RDBS does not 'loose' my money. While true, I feel those applications are the exception, not the rule. Most DBs these days are the blogs and the image galleries, etc. And those don't need or want the overhead associated with synchronous replication. PostgreSQL-R is in development, and targeted to allow multi-master, asynchronous replication without rewriting your application. As far as I know, it works, but it's still beta. Sorry, this is nitpicking, but for some reason (see current naming discussion on -advocacy :-) ), it's Postgres-R. Sorry. Additionally, Postgres-R is considered to be a *synchronous* replication system, because once you get your commit confirmation, your transaction is guaranteed to be deliverable and *committable* on all running nodes (i.e. it's durable and consistent). 
Or put it another way: asynchronous systems have to deal with conflicting, but already committed transactions - Postgres-R does not. I find that line fuzzy. It's synchronous for the reason you describe, but it's asynchronous because a query that has returned successfully is not _guaranteed_ to be committed everywhere yet. Seems like we're dealing with a limitation in the terminology :) Certainly, this is slightly less restrictive than saying that a transaction needs to be *committed* on all nodes, before confirming the commit to the client. But as long as a database session is tied to a node, this optimization does not alter any transactional semantics. And despite that limitation, which is mostly the case in reality anyway, I still consider this to be synchronous replication. This could potentially be a problem on (for example) a web application, where a particular user's experience may be load-balanced to another node at any time. Of course, you just have to write the application with that knowledge. [ To get a strictly synchronous system with Postgres-R, you'd have to delay read only transactions on a node which hasn't applied all remote transactions, yet. In most cases, that's unwanted. Instead, a consistent snapshot is enough, just as if the transaction started *before* the remote ones which still need to be applied. ] Agreed. BTW: does anyone know of a link that describes these high-level concepts? If not, I think I'll write this up formally and post it. Hm.. somewhen before 8.3 was released, we had lots of discussions on -docs about the high availability and replication section of the PostgreSQL documentation. I'd have liked to add these fundamental concepts, but Bruce - rightly - wanted to keep focused on existing solutions. And unfortunately, most existing solutions are async, single-master. So explaining all these wonderful theoretic concepts only to state that there are no real solutions would have been silly. 
Someone else posted a link, and the docs look pretty comprehensive at this point ... enough so that I'm not going to bother writing up my own explanation. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Can this function be declared IMMUTABLE?
In response to [EMAIL PROTECTED]: Hello, I have a question about whether I can safely declare a function IMMUTABLE. Citing the PostgreSQL documentation under Function Volatility Categories in the section on Extending SQL: It is generally unwise to select from database tables within an IMMUTABLE function at all, since the immutability will be broken if the table contents ever change. Well, I am considering a function that does select from a table, but the table contents change extremely infrequently (the table is practically a list of constants). Would it be safe to declare the function IMMUTABLE provided that the table itself is endowed with a trigger that will drop and recreate the function any time the table contents are modified? In this way, it seems that the database would gain the performance benefit of an immutable function for the long stretches of time in between changes to the table. Is this a table that will only change during upgrades/maintenance? If so, then IMMUTABLE is probably safe, as the table will change under controlled circumstances. The ultimate test is this: what happens if the function is declared IMMUTABLE and the data _does_ change? If the result of such a scenario is acceptable, then you can probably use IMMUTABLE. Another rule to take into account is the Law of Premature Optimization. The law states that trying to optimize too soon will cause pain. Have you determined that the extra performance gain that IMMUTABLE will give you is even necessary? If not, then start out with a more conservative approach and revisit the immutability question _if_ you see performance issues. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
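A minimal sketch of the scheme being discussed (the table and function names are made up for illustration; whether this is actually safe depends on how and when the table changes, as noted above):

```sql
-- A table of near-constants (hypothetical example).
CREATE TABLE unit_factors (
    unit   text    PRIMARY KEY,
    factor numeric NOT NULL
);

-- The function reads from the table, yet is declared IMMUTABLE anyway.
-- The poster's proposal is to DROP and re-CREATE this function from a
-- trigger on unit_factors whenever the table is modified, so that the
-- definition is "refreshed" on every change.
CREATE OR REPLACE FUNCTION to_base_units(numeric, text)
RETURNS numeric AS $$
    SELECT $1 * factor FROM unit_factors WHERE unit = $2;
$$ LANGUAGE sql IMMUTABLE;
```

The risk the documentation warns about remains: results the planner has already folded into a plan (or into an index expression) before the trigger fires can still reflect the old table contents.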
Re: [GENERAL] Turning off atime on PostgreSQL DB volumes
In response to Keaton Adams [EMAIL PROTECTED]: After reading several articles on the performance drag that Linux atime has on file systems we would like to mount our DB volumes with the noatime parameter to see just what type of a performance gain we will achieve. Does PostgreSQL use atime in any way when reading/writing data? If we turn off/disable atime on the DB volumes will that cause any type of issue at all with PostgreSQL 8.1 on Red Hat Enterprise Linux? I frequently run with noatime and have never noticed any problems. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
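For reference, noatime is just a mount option, so the change is confined to the filesystem layer; PostgreSQL itself never consults atime. A sketch (the device, mount point, and filesystem type are placeholders for your own setup):

```
# /etc/fstab -- example entry; device, mount point and fstype are placeholders
/dev/sdb1   /var/lib/pgsql   ext3   defaults,noatime   0   2
```

An already-mounted volume can usually be switched without a reboot via `mount -o remount,noatime /var/lib/pgsql`.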
Re: [GENERAL] EXPLAIN ANALYZE not considering primary and unique indices!
In response to Sanjay [EMAIL PROTECTED]: Hi All, Say I have a simple table WEBSITE(website_id int4 PRIMARY KEY, name VARCHAR(30)). While I try this: EXPLAIN ANALYZE SELECT * FROM WEBSITE WHERE website_id = 1 the output is:

Seq Scan on website (cost=0.00..1.31 rows=1 width=162) (actual time=0.047..0.051 rows=1 loops=1)
  Filter: (website_id = 1)
Total runtime: 0.102 ms

Wondering why it is not using the index, which would have been automatically created for the primary key. Because PG thinks the seq scan is faster than an index scan. Depending on other factors, it may be right. If there's only a few rows in the table, then a seq scan is going to be faster than scanning the index, only to grab most of the table into memory anyway. If it's wrong, it's either because your analyze data isn't up to date, or your tuning parameters don't match your hardware. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
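One way to check whether the planner's choice is actually right: a diagnostic session sketch (enable_seqscan is a real planner setting, but don't leave it off in production):

```sql
-- Make sure the statistics are current first.
ANALYZE website;

-- Temporarily discourage sequential scans so the planner is forced to
-- show the index-scan alternative, then compare cost and actual time
-- against the seq-scan plan.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM website WHERE website_id = 1;
RESET enable_seqscan;
```

If the index-scan plan comes out slower (as it usually will on a table this small), the seq scan was the right call all along.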
Re: [GENERAL] Geographic High-Availability/Replication
In response to Markus Schiltknecht [EMAIL PROTECTED]: Hi, Bill Moran wrote: I'm curious as to how Postgres-R would handle a situation where the constant throughput exceeded the processing speed of one of the nodes. Well, what do you expect to happen? This case is easily detectable, but I can only see two possible solutions: either stop the node which is too slow or stop accepting new transactions for a while. It appears as if I miscommunicated my point. I'm not expecting PostgreSQL-R to break the laws of physics or anything, I'm just curious how it reacts. This is the difference between software that will be really great one day, and software that is great now. Great now would mean the system would notice that it's too far behind and Do The Right Thing automatically. I'm not exactly sure what The Right Thing is, but my first guess would be to force the hopelessly slow node out of the cluster. I expect this would be non-trivial, as you'd have to have a way to ensure it was a problem isolated to a single (or few) nodes, and not just the whole cluster getting hit with unexpected traffic. This technique is not meant to allow nodes to lag behind several thousands of transactions - that should better be avoided. Rather it's meant to decrease the commit delay necessary for synchronous replication. Of course not, that's why the behaviour when that non-ideal situation occurs is so interesting. How does PostgreSQL-R fail? PostgreSQL fails wonderfully: A hardware crash will usually result in a system that can recover without operator intervention. In a system like PostgreSQL-R, the failure scenarios are more numerous, and probably more complicated. I can see your system working if it's just spike loads and the slow nodes can catch up during slow periods, but I'm wondering about the scenarios where an admin has underestimated the hardware requirements and one or more nodes is unable to keep up. 
Please keep in mind, that replication per se does not speed your database up, it rather adds a layer of reliability, which *costs* some performance. To increase the transactional throughput you would need to add partitioning to the mix. Or you could try to make use of the gained reliability and abandon WAL - you won't need that as long as at least one replica is running - that should increase the single node's throughput and therefore the cluster's throughput, too. I understand. I'm not asking it to do something it's not designed to. At least, I don't _think_ I am. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Geographic High-Availability/Replication
Markus Schiltknecht [EMAIL PROTECTED] wrote: Hi, Gregory Stark wrote: Only if your application is single-threaded. By single-threaded I don't refer to operating system threads but to the architecture. If you're processing a large batch file handling records one by one and waiting for each commit before proceeding then it's single threaded. If you have a hundred independent clients on separate connections doing separate things then each one of them could get 6tps. Which you have will depend on your application and your needs, it may not be something you can change. Correct. Plus, as in the implementation of Postgres-R, performance is *not* bound to the slowest node. Instead, every node can process transactions at its own speed. Slower nodes might then have to queue transactions from those until they catch up again. I'm curious as to how Postgres-R would handle a situation where the constant throughput exceeded the processing speed of one of the nodes. I can see your system working if it's just spike loads and the slow nodes can catch up during slow periods, but I'm wondering about the scenarios where an admin has underestimated the hardware requirements and one or more nodes is unable to keep up. Just musing, really. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Bigtime scaling of Postgresql (cluster and stuff I suppose)
Phoenix Kiula [EMAIL PROTECTED] wrote: We're moving from MySQL to PG, a move I am rather enjoying, but we're currently running both databases. As we web-enable our financial services in fifteen countries, I would like to recommend to the team that we move entirely to PG. In doing research on big installations of the two databases, I read this from a MySQL senior exec on Slashdot: Senior MySQL exec means this is a marketing blurb, which means it's exaggerated, lacking any honest assessment of challenges and difficulties, and possibly an outright lie. I've no doubt that MySQL can do clusters if you know what you're doing, but if you want the facts, you're going to have to look deeper than that obviously biased quote. I seem to remember a forum thread with someone having considerable difficulty with MySQL cluster, and actual MySQL employees jumping in to try to help and no solution ever found. Anyone have that link lying around? In any event, replication is a large and complex topic. To do it well takes research, planning, and know-how. Anyone who tells you their solution will just drop in and work is either lying or charging you a bunch of money for their consultants to investigate your scenario and set it up for you. First off, clustering is a word that is too vague to be useful, so I'll stop using it. There's multi-master replication, where every database is read-write, then there's master-slave replication, where only one server is read-write and the rest are read-only. You can add failover capabilities to master-slave replication. Then there's synchronous replication, where all servers are guaranteed to get updates at the same time. And asynchronous replication, where other servers may take a while to get updates. These descriptions aren't really specific to PostgreSQL -- every database replication system has to make design decisions about which approaches to support. PostgreSQL has some built-in features to allow synchronous multi-master database replication. 
Two-phase commit allows you to reliably commit transactions to multiple servers concurrently, but it requires support at the application level, which will require you to rewrite any existing applications. Pgcluster is multi-master synchronous replication, but I believe it's still in beta. Note that no synchronous replication system works well over geographically large distances. The time required for the masters to synchronize over (for example) the Internet kills performance to the point of uselessness. Again, this is not a PostgreSQL problem; MSSQL suffers the same problem. Synchronous replication is only really used when two servers are right next to each other with a high-speed link (probably gigabit) between them. PostgreSQL-R is in development, and targeted to allow multi-master, asynchronous replication without rewriting your application. As far as I know, it works, but it's still beta. pgpool supports multi-master synchronous replication as well as failover. Slony supports master-slave asynchronous replication and works _very_ well over long distances (such as from an east coast to a west coast datacenter). Once you've looked at your requirements, start looking at the tool that matches those requirements, and I think you'll find what you need. BTW: does anyone know of a link that describes these high-level concepts? If not, I think I'll write this up formally and post it. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
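To illustrate the application-level burden two-phase commit imposes, here is the SQL shape of it (the transaction identifier and table are made up for the example; real code must also track and resolve in-doubt transactions after a crash):

```sql
-- On each participating server, the application does the same work,
-- then prepares instead of committing:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
PREPARE TRANSACTION 'xfer-42';  -- now durable on disk, but in doubt

-- Only after *every* server has prepared successfully:
COMMIT PREPARED 'xfer-42';

-- If any server failed to prepare, the application must instead issue
-- ROLLBACK PREPARED 'xfer-42' on the servers that did prepare.
```

This coordination logic is exactly the part that has to be written into (or retrofitted onto) the application, which is why two-phase commit is not a drop-in replication solution.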
Re: [GENERAL] Apache + PHP + Postgres Interaction
In response to Hannes Dorbath [EMAIL PROTECTED]: On 24.08.2007 02:43, Bill Moran wrote: Hannes Dorbath [EMAIL PROTECTED] wrote: Bill Moran wrote: I guess I just feel that broken is a bit of a harsh term. If your expectations are for full-blown connection management from pconnect(), then you will be disappointed. If you take it for what it is: persistent connections, then those limitations would be expected. It's broken because persistent connections get randomly garbage collected where they should not. So broken in the sense of bugged. Expect connections to die for no reason, especially under load. It's funny that you should mention that, since I haven't seen that behaviour in 18 months of load testing over a dozen servers. Please reply to the list as well. Your reply to me did not have the list in the CC. How did you verify that? It will spawn a new connection silently, if the old got dropped. Did you really verify your logs, that you don't get more new connections than Apache spawns workers? This might not be noticeable for you, if you are running Apache. In a FCGI environment where you have a fixed amount of workers, you notice new connections, as there should not be any. As I stated in the other reply to an email that looked similar to this one -- I'm not sure I understand the behaviour you're trying to describe. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] PG Seg Faults Performing a Query
I'm a bit out of my depth with using these debugging tools and interpreting their results, but I think the problem is due to the output being just too big for interactive display. Using the same query with tighter limits in the WHERE clause works perfectly. When I changed the SQL script to write output into a table it worked with the same query using even looser limits in the WHERE clause. So sending output to a table instead of to the monitor when the queries produce a large amount of output is reliable, faster and doesn't tie up the machine. I tried using strace, but it produced so much output (which, unfortunately, I couldn't understand anyway) that I don't think this would do me any good. I don't want to bug the PostgreSQL list with a problem that's probably not a PostgreSQL one, but if someone here would be willing to help me track down this apparent popen or fork problem I'd appreciate it. However, I managed to get the results I needed, so we could also call this fixed via workaround. Thanks for the help, Tom and others! - Bill Thoen Tom Lane wrote: Bill Thoen [EMAIL PROTECTED] writes:

(gdb) bt
#0 0x003054264571 in fputc () from /lib64/libc.so.6
#1 0x0040dbc2 in print_aligned_text (title=0x0, headers=0x5665d0, cells=0x2f8fc010, footers=0x557c90, opt_align=0x557ef0 'l' repeats 18 times, rr, 'l' repeats 12 times, rl lll, opt_tuples_only=0 '\0', opt_numeric_locale=0 '\0', opt_border=1, encoding=8, fout=0x0) at print.c:448
#2 0x0040f0eb in printTable (title=0x0, headers=0x5665d0, cells=0x2f8fc010, footers=0x557c90, align=0x557ef0 'l' repeats 18 times, rr, 'l' repeats 12 times, rl lll, opt=0x7fff3e3be8c0, fout=0x3054442760, flog=0x0) at print.c:1551

OK, so the problem is that print_aligned_text is being passed fout = NULL. Since that wasn't what was passed to printTable, the conclusion must be that PageOutput() was called and returned NULL --- that is, that its popen() call failed. Obviously we should put in some sort of check for that. 
I can see three reasonable responses: either make psql abort entirely (akin to its out-of-memory behavior), or have it fall back to not using the pager, either silently or after printing an error message. Any thoughts which way to jump? Meanwhile, the question Bill needs to look into is why popen() is failing for him. I'm guessing it's a fork() failure at bottom, but why so consistent? strace'ing the psql run might provide some more info. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
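Since the pager is only involved for interactive output, the crash can be sidestepped inside psql itself while the fix is pending. A couple of options, using standard psql meta-commands (the output filename is just an example):

```
-- Turn the pager off for this session:
\pset pager off

-- Or redirect query results to a file instead of the screen:
\o /tmp/results.txt
SELECT a.* FROM compliance_2006 a, ers_regions b WHERE ...;
\o
```

Either way psql never calls popen() for a pager, which matches the poster's observation that writing output somewhere other than the terminal avoids the failure.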
Re: [GENERAL] Add Column BEFORE/AFTER another column
In response to Matthew [EMAIL PROTECTED]: Hi gang, In MySQL it is possible to add a column before/after another column. I have not been able to find such syntax in the Postgres manual. Does this ability exist? It does not. Is my only solution to create a new table with the new column, copy the data, delete the old table and rename the new one? The fact that you're asking this question is a hint that you're using unsafe coding practices that may bite you at some point. The order of columns is not guaranteed to be the same as when you do CREATE TABLE, it's just coincidence that they usually are. To get your columns in a specific order, specify the column names in that order in your SELECT statement. The SQL standard doesn't provide for any other way to guarantee column order, and neither does Postgres. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
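The usual workarounds, sketched with hypothetical table and column names: add the column normally, then pin the presentation order with an explicit column list, or with a view for code that insists on SELECT *:

```sql
ALTER TABLE customers ADD COLUMN middle_name text;

-- Column order is a presentation concern, so state it explicitly:
SELECT id, first_name, middle_name, last_name FROM customers;

-- Or wrap the preferred order in a view for legacy SELECT * queries:
CREATE VIEW customers_ordered AS
    SELECT id, first_name, middle_name, last_name FROM customers;
```

This avoids the copy/drop/rename dance entirely, and keeps working even if the physical column order ever differs from what CREATE TABLE specified.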
Re: [GENERAL] Apache + PHP + Postgres Interaction
In response to Joshua D. Drake [EMAIL PROTECTED]: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Max Zorloff wrote: Hello. I have a subject setup and a few questions. The first one is this. PHP establishes a connection to the Postgres database through pg_pconnect(). Don't use pconnect. Use pgbouncer or pgpool. Then it runs some query, then the script returns, leaving the persistent connection hanging. But the trouble is that in this case any query takes significantly more time to execute than in the case of one PHP script running the same query with different parameters for N times. How can I achieve the same performance in the first case? Persistent connections help but not enough - the queries are still 10 times slower than they would be on the 2nd time. Well you haven't given us any indication of data set or what you are trying to do. However, I can tell you, don't use pconnect, its broke ;) Broke? How do you figure? I'm not trying to argue the advantages of a connection pooler such as pgpool, but, in my tests, pconnect() does exactly what it's supposed to do: reuse existing connections. In our tests, we saw a 2x speed improvement over connect(). Again, I understand that pgpool will do even better ... Also, I'm curious as to whether he's timing the actual _query_ or the entire script execution. If you're running a script multiple times to get multiple queries, most of your time is going to be tied up in PHP's parsing and startup -- unless I misunderstood the question. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Apache + PHP + Postgres Interaction
In response to Josh Trutwin [EMAIL PROTECTED]: On Thu, 23 Aug 2007 13:29:46 -0400 Bill Moran [EMAIL PROTECTED] wrote: Well you haven't given us any indication of data set or what you are trying to do. However, I can tell you, don't use pconnect, its broke ;) Broke? How do you figure? I asked that question earlier this month - this thread has some interesting discussion on pconnect: http://archives.postgresql.org/pgsql-general/2007-08/msg00602.php Thanks to you and Erik for the link. Not sure how I missed that thread. I guess I just feel that broken is a bit of a harsh term. If your expectations are for full-blown connection management from pconnect(), then you will be disappointed. If you take it for what it is: persistent connections, then those limitations would be expected. *shrug* I'm just glad there aren't any unknown problems waiting to bite me ... -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Geographic High-Availability/Replication
Decibel! [EMAIL PROTECTED] wrote: On Aug 22, 2007, at 3:37 PM, Joshua D. Drake wrote: You can not do multi master cross continent reliably. I'm pretty sure that credit card processors and some other companies do it... it just costs a LOT to actually do it well. Isn't this sort of requirement the entire reason for 2-phase commit? -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Seeking datacenter PITR backup procedures [RESENDING]
Decibel! [EMAIL PROTECTED] wrote: On Aug 19, 2007, at 7:23 AM, Bill Moran wrote: Assumptions: a. After pg_stop_backup(), Pg immediately recycles log files and hence wal logs can be copied to backup. This is a clean start. I don't believe so. AFAIK, all pg_stop_backup() does is remove the marker that pg_start_backup() put in place to tell the recovery process when the filesystem backup started. I'm pretty certain that's not the case. For a PITR to ensure that data is back to a consistent state after a recovery, it has to replay all the transactions that took place between pg_start_backup and pg_stop_backup; so it needs to know when pg_stop_backup() was actually run. Sounds likely ... but I don't believe that forces any specific log cycling activity, like the OP suggested. Be nice if someone who knew for sure would chime in ;) By not backing up pg_xlog, you are going to be behind by however many transactions are in the most recent transaction log that has not yet been archived. Depending on how often your databases are updated, this is likely acceptable. If you need anything more timely than that, you'll probably want to implement Slony or some other replication system. Just keep in mind that Slony is *not* a backup solution (though you could possibly argue that its log shipping is). True. This rides the fine line of the difference between an HA setup and backup. Specifically: HA won't allow you to recover from user error. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
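For context, the base-backup procedure being discussed looks roughly like this on 8.x (the label, paths, and tar invocation are placeholders; pg_start_backup/pg_stop_backup are the real functions):

```sql
-- 1. Tell the server a filesystem backup is starting:
SELECT pg_start_backup('nightly');

-- 2. (Outside SQL.) Copy the data directory with your tool of choice,
--    e.g.  tar -czf /backup/base.tar.gz /var/lib/pgsql/data
--    pg_xlog can be excluded if WAL archiving is configured to cover it.

-- 3. Mark the end of the backup; WAL written between these two calls is
--    what makes the (inconsistent) filesystem copy recoverable:
SELECT pg_stop_backup();
```

The debate above is about what happens to the WAL segments at step 3; the one thing both posters agree on is that recovery must replay everything between the two calls.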
Re: [GENERAL] PG Seg Faults Performing a Query
On Tue, Aug 21, 2007 at 04:38:42PM -0500, Scott Marlowe wrote: On 8/21/07, Bill Thoen [EMAIL PROTECTED] wrote: How would you suggest I try to track down this problem? I run the following query: SELECT a.* FROM compliance_2006 a, ers_regions b WHERE a.fips_st_cd=b.fips_st AND a.fips_cnty_cd=b.fips_cou AND b.region =1 AND a.fips_st_cd='17' AND a.fips_cnty_cd='003'; and it works. But when I try this: SELECT a.* FROM compliance_2006 a, ers_regions b WHERE a.fips_st_cd=b.fips_st AND a.fips_cnty_cd=b.fips_cou AND b.region =1 AND a.fips_st_cd='17' ; psql dies with the message: Segmentation Fault. so the client psql is what's dying, right? In that case you likely are getting too big a result set for psql to handle at once. Try declaring a cursor to hold your query and fetching 100 or 1000 or so rows at a time. Just guessing. What's the exact text of the error message? The exact message was: Segmentation Fault. But the table compliance_2006 is very big (18 million plus records) so I'll try that cursor idea. But even so, an error like that makes me think that something's broken. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
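The cursor suggestion, sketched against the query from the post (the cursor name and fetch size are arbitrary; cursors must run inside a transaction block):

```sql
BEGIN;
DECLARE big_cur CURSOR FOR
    SELECT a.*
    FROM compliance_2006 a, ers_regions b
    WHERE a.fips_st_cd = b.fips_st
      AND a.fips_cnty_cd = b.fips_cou
      AND b.region = 1
      AND a.fips_st_cd = '17';

FETCH 1000 FROM big_cur;   -- repeat until no rows come back
CLOSE big_cur;
COMMIT;
```

This keeps psql's in-memory result set down to one batch at a time, instead of buffering the entire multi-million-row result before printing anything.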
Re: [GENERAL] PG Seg Faults Performing a Query
On Wed, Aug 22, 2007 at 09:46:21AM +1200, Andrej Ricnik-Bay wrote: On 8/22/07, Bill Thoen [EMAIL PROTECTED] wrote: How would you suggest I try to track down this problem? Any suggestions? postgres version? Operating system? Anything in the log(s)? PostgreSQL Version is 8.1.5, running on Linux (Fedora Core 6). The last few lines in the Serverlog are: LOG: unexpected EOF on client connection LOG: transaction ID wrap limit is 1073746500, limited by database postgres LOG: transaction ID wrap limit is 1073746500, limited by database postgres (I ran VACUUM FULL after it crashed to make sure there was no loose disk space floating around, so that last line was probably from that.) I assume that bit about the transaction wrap limit is informational and not related to this problem. My PostgreSQL is working great for small SQL queries, even from my large table (18 million records). But when I ask it to retrieve anything that takes it more than 10 minutes to assemble, it crashes with this Segmentation Fault error. I get so little feedback, and I'm still pretty unfamiliar with PostgreSQL, that I don't even know where to begin. This version of PostgreSQL was compiled from source with support for various other packages needed for GIS support, but the tables I'm trying to extract data from contain no GIS information. So I believe that this operation is plain PostgreSQL. Any help you can offer as to how I can track down what's wrong would be greatly appreciated. If I can't get this to work and can only use small tables in PG, then its usefulness to me will be pretty limited. - Bill Thoen ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] PG Seg Faults Performing a Query
As you requested, here's what bt in gdb reports:

(gdb) bt
#0 0x003054264571 in fputc () from /lib64/libc.so.6
#1 0x0040dbd2 in print_aligned_text ()
#2 0x0040f10b in printTable ()
#3 0x0041020b in printQuery ()
#4 0x00407906 in SendQuery ()
#5 0x00409153 in MainLoop ()
#6 0x0040b16e in main ()

Please tell me what it means if you can and if I can fix this problem. Thanks, - Bill Thoen Alvaro Herrera wrote: Martijn van Oosterhout escribió: That said, it would be nice if it returned an error instead of crashing. In my opinion it isn't just a matter of would be nice. It is a possible bug that should be investigated. A look at a stack trace from the crashing process would be the first place to start. In order to do that, please set ulimit -c unlimited and rerun the query under psql. That should produce a core file. Then run gdb psql core and inside gdb, execute bt. Please send that output our way. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] do you have an easy example of postgis and mapserver?
On Tue, Aug 21, 2007 at 12:44:49PM +0200, Ottavio Campana wrote: I'm sorry this mail is not very in topic, but I hope you can help me. Just so you know, perhaps a better list to contact with this is the MapServer mailing list at http://lists.umn.edu/archives/mapserver-users.html, or maybe the PostGIS list at http://www.postgis.org/mailman/listinfo/postgis-users. I'm trying to learn how postgis and mapserver work together, but I cannot understand nearly anything. I mean, I read the documentation of postgis and I think I understood it, but I cannot do anything useful with it. I think before you try to do something customized with python you should get familiar with the mapserver MAP file, which contains all the instructions to map features from various sources like shape files, MapInfo TAB files, PostGIS data sources, and so on. An example for mapping a layer from a PostGIS source in a MAP file looks like this:

LAYER
  NAME states
  TYPE POLYGON
  STATUS ON
  CONNECTION "user=gisuser password=*** dbname=us_data host=localhost port=5432"
  CONNECTIONTYPE POSTGIS
  DATA "the_geom from states"
  CLASSITEM 'name'
  PROJECTION
    "+proj=latlong +datum=WGS84"
  END
  CLASS
    NAME "US States"
    OUTLINECOLOR 0 0 0
    COLOR 255 255 196
  END
END

The key bits here related to PostGIS are the CONNECTION, where you specify all the info needed to get access to your PostgreSQL database, the CONNECTIONTYPE which you set to POSTGIS, and the DATA directive in which you supply a string that contains the SQL expression that pulls out the geometry features you want to map. If you do this with python, then you would first need to create an instance of a mapObj, and then create a layerObj and add it to the map object, setting the properties for the connection. However, my knowledge of this approach is still very sketchy. Documentation is spread around, but the two sites you should peruse are the MapServer docs at: http://mapserver.gis.umn.edu/docs and the PostGIS docs at: http://postgis.refractions.net/docs/. 
- Bill Thoen ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] PG Seg Faults Performing a Query
How would you suggest I try to track down this problem? I run the following query: SELECT a.* FROM compliance_2006 a, ers_regions b WHERE a.fips_st_cd=b.fips_st AND a.fips_cnty_cd=b.fips_cou AND b.region =1 AND a.fips_st_cd='17' AND a.fips_cnty_cd='003'; and it works. But when I try this: SELECT a.* FROM compliance_2006 a, ers_regions b WHERE a.fips_st_cd=b.fips_st AND a.fips_cnty_cd=b.fips_cou AND b.region =1 AND a.fips_st_cd='17' ; psql dies with the message: Segmentation Fault. Any suggestions? ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Searching for Duplicates and Hosed the System
Something is really screwy here. I tried what looked like a simpler task than I tried when I started this message thread. The only unusual issue here is that the table compliance_2006 is rather big (over 18 million records). The table ers_regions is pretty small (about 3100 records) and all the WHERE fields are indexed (except e.region). Here's the not-too-complicated SQL: SELECT a.* FROM compliance_2006 a, ers_regions e WHERE a.fips_st_cd=e.fips_st AND a.fips_cnty_cd=e.fips_cou AND e.region=1; I ran VACUUM ANALYZE just before I launched this and there were no other postgres jobs running. I'm the only user as well. I also ran EXPLAIN prior to the run and got this:

Nested Loop (cost=11.71..28800.34 rows=7219 width=584)
  -> Seq Scan on ers_regions e (cost=0.00..71.90 rows=16 width=28)
       Filter: (region = 1)
  -> Bitmap Heap Scan on compliance_2006 a (cost=11.71..1788.76 rows=451 width=584)
       Recheck Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
       -> Bitmap Index Scan on key_tract (cost=0.00..11.71 rows=451 width=0)
            Index Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
(7 rows)

So I launched it to see what would happen. This resulted in a very busy disk drive for about an hour, and one by one things started dying on my FC6 Linux machine. First the mouse disappeared, then the cursor in the terminal window where I was running psql vanished, the clock stopped, and the keyboard stopped responding. Meanwhile, the disk drive thrashed on. Finally the screen saver kicked in and shortly thereafter I logged back in only to see the word Killed sitting there on the last line and all my machine's services were running again. Just no data output. 
I looked in the log and saw this: LOG: transaction ID wrap limit is 1073746500, limited by database postgres LOG: transaction ID wrap limit is 1073746500, limited by database postgres LOG: could not send data to client: Broken pipe I guess this is what killed it, but does it make sense that such a simple request wrapped around the transaction limit? Is the bad behavior of my machine while running this query consistent with an error like this, or is the machine broken and that's what caused PostgreSQL to crater? What should I set the wrap limit to? What suggestions would you make for tracking down the exact problem and fixing it? Any help would be appreciated - Bill Thoen Tom Lane wrote: Bill Thoen [EMAIL PROTECTED] writes: Tom, here's the explain results: Does this help explain what went wrong? (And yes, I think there will be a *lot* of groups.) explain select count(*) as count,fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr from compliance_2006 group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr order by 1 desc;

QUERY PLAN
Sort (cost=15119390.46..15123902.54 rows=1804832 width=160)
  Sort Key: count(*)
  -> GroupAggregate (cost=13782933.29..14301822.43 rows=1804832 width=160)
       -> Sort (cost=13782933.29..13828054.08 rows=18048318 width=160)
            Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
            -> Seq Scan on compliance_2006 (cost=0.00..1039927.18 rows=18048318 width=160)
(6 rows)

Hmm ... no, actually, that shows the planner doing the right thing for lotsa groups: picking GroupAggregate instead of HashAggregate. The estimated number of groups is 1804832, which might or might not have much to do with reality but in any case seems enough to keep it away from HashAggregate. Do you have autovacuum running, or a scheduled cronjob that runs ANALYZE or VACUUM ANALYZE? 
The only theory I can think of at this point is that your database statistics are more correct now than they were when you had the problem. If you try the query again, does it behave more sanely? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
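For what it's worth, those "transaction ID wrap limit" lines are informational messages that VACUUM logs, not errors, and the wrap limit is not something you set by hand. A sketch of how to check how far each database actually is from wraparound (assuming an ordinary psql session):

```sql
-- age() gives the number of transactions since each database's
-- datfrozenxid was set; trouble only starts near ~2 billion.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```

A database that is vacuumed regularly stays far below the limit, so the "Killed" result above almost certainly came from the out-of-memory kill, not from wraparound.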
Re: [GENERAL] Seeking datacenter PITR backup procedures [RESENDING]
to restore some databases before you trust it. Getting PITR working effectively is tricky, no matter how many questions you ask of knowledgeable people. You're going to want to have first- hand experience going through the process. HTH. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
[GENERAL] Searching for Duplicates and Hosed the System
I'm new to PostgreSQL and I ran into a problem I don't want to repeat. I have a database with a little more than 18 million records that takes up about 3GB. I need to check to see if there are duplicate records, so I tried a command like this:

SELECT count(*) AS count, fld1, fld2, fld3, fld4
FROM MyTable
GROUP BY fld1, fld2, fld3, fld4
ORDER BY 1 DESC;

I knew this would take some time, but what I didn't expect was that about an hour into the select, my mouse and keyboard locked up and also I couldn't log in from another computer via SSH. This is a Linux machine running Fedora Core 6 and PostgreSQL is 8.1.4. There's about 50GB free on the disc too. I finally had to shut the power off and reboot to regain control of my computer (that wasn't a good idea, either, but eventually I got everything working again). Is this normal behavior by PG with large databases? Did I misconfigure something? Does anyone know what might be wrong? - Bill Thoen
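If the goal is only to see the duplicates, a HAVING clause keeps the output down to just the repeated key combinations (the sort over 18 million rows still happens, but the result set stays small). A sketch using the same placeholder column names:

```sql
-- List only the fld1..fld4 combinations that appear more than once.
SELECT fld1, fld2, fld3, fld4, count(*) AS dup_count
FROM MyTable
GROUP BY fld1, fld2, fld3, fld4
HAVING count(*) > 1
ORDER BY dup_count DESC;
```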
Re: [GENERAL] Searching for Duplicates and Hosed the System
Bill Thoen [EMAIL PROTECTED] wrote: I'm new to PostgreSQL and I ran into problem I don't want to repeat. I have a database with a little more than 18 million records that takes up about 3GB. I need to check to see if there are duplicate records, so I tried a command like this: SELECT count(*) AS count, fld1, fld2, fld3, fld4 FROM MyTable GROUP BY fld1, fld2, fld3, fld4 ORDER BY 1 DESC; I knew this would take some time, but what I didn't expect was that about an hour into the select, my mouse and keyboard locked up and also I couldn't log in from another computer via SSH. This is a Linux machine running Fedora Core 6 and PostgresQL is 8.1.4. There's about 50GB free on the disc too. I finally had to shut the power off and reboot to regain control of my computer (that wasn't good idea, either, but eventually I got everything working again.) Is this normal behavior by PG with large databases? No. Something is wrong. Did I misconfigure something? Does anyone know what might be wrong? Possibly, but I would be more inclined to guess that your hardware is faulty and you encountered a RAM error, or the CPU overheated or something along those lines. I'm not familiar with Linux systems hard-locking like that unless there is a hardware issue. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Searching for Duplicates and Hosed the System
Tom, here's the explain results: Does this help explain what went wrong? (And yes, I think there will be a *lot* of groups.)

explain select count(*) as count, fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
from compliance_2006
group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
order by 1 desc;

QUERY PLAN
Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
  Sort Key: count(*)
  ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
        ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
              Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
              ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18 rows=18048318 width=160)
(6 rows)

On Sun, Aug 19, 2007 at 01:19:51PM -0400, Tom Lane wrote: Bill Thoen [EMAIL PROTECTED] writes: I knew this would take some time, but what I didn't expect was that about an hour into the select, my mouse and keyboard locked up and also I couldn't log in from another computer via SSH. This is a Linux machine running Fedora Core 6 and PostgreSQL is 8.1.4. There's about 50GB free on the disc too. I finally had to shut the power off and reboot to regain control of my computer (that wasn't a good idea, either, but eventually I got everything working again). I've seen Fedora go nuts like that when it ran out of memory. Once it starts to swap heavily, performance goes into the tank; and once the kernel realizes it's in memory trouble, it starts to kill processes more or less at random. That might explain why ssh stopped working. One thing to do to make it more robust is to disable memory overcommit. I suspect also that configuring it with lots of swap space is counterproductive, because that just encourages the kernel to allow lots of swapping. I haven't actually experimented with that part though. 
As for why PG ran the system out of memory, I suspect that the planner drastically underestimated the number of groups to be created by your GROUP BY, and thought it could get away with a hash aggregation. We don't currently have any provision for spilling hash aggregation to disk, so if there's a very large number of groups the table just gets very big :-(. The planner is not supposed to choose hash agg if the estimated table size exceeds work_mem ... but if it had out-of-date statistics to work with it might have gotten the wrong answer. Have you ANALYZEd this table recently? What does EXPLAIN show as the estimated number of result rows? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
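A sketch of the check Tom is asking for, using the table name from the thread (ANALYZE refreshes the statistics, and the EXPLAIN then shows both the chosen aggregate strategy and the estimated group count):

```sql
ANALYZE compliance_2006;

EXPLAIN
SELECT count(*), fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
       field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
FROM compliance_2006
GROUP BY fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
         field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr;
-- GroupAggregate in the plan means sort-and-spill to disk;
-- HashAggregate means the planner expects all groups to fit in memory.
```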
Re: [GENERAL] Yet Another COUNT(*)...WHERE...question
In response to Rainer Bauer [EMAIL PROTECTED]: Trevor Talbot wrote: On 8/16/07, Rainer Bauer [EMAIL PROTECTED] wrote: But if you go to eBay, they always give you an accurate count. Even if the no. of items found is pretty large (example: http://search.ebay.com/new). And I'd bet money that they're using a full text search of some kind to get those results, which isn't remotely close to the same thing as a generic SELECT count(*). Without text search (but with a category restriction): http://collectibles.listings.ebay.com/_W0QQsacatZ1QQsocmdZListingItemList I only wanted to show a counter-example for a big site which uses pagination to display result sets and still reports accurate counts. Categories are still finite state: you can simply store a count for each category. Again, it's just a case of knowing your data and queries; it's not trying to solve a general infinite-possibilities situation. Consider this query with multiple WHERE conditions: http://search.ebay.com/ne-ol-an_W0QQfasiZ1QQfbdZ1QQfcdZ1QQfcidZ77QQfclZ3QQfmcZ1QQfrppZ50QQfsooZ1QQfsopZ1QQftidZ1QQpriceZ1QQsabdhiZ100QQsacurZ999QQsalicZQ2d15QQsaprchiZ5QQsatitleZQ28neQ2aQ2colQ2aQ2canQ2aQ29QQsojsZ0 My point is that whatever search criteria are involved, and however many items are found, eBay always returns the *accurate* number of items found. While I don't _want_ to argue with you ... I can't seem to help myself. How do you _know_ that's the exact number of items? There are 50 items on that page, the paginator at the bottom shows 97,686 pages, but there's no way (that I can find) to go to the _last_ page to ensure that said numbers are correct. It could simply be estimating the number of items and calculating the # of pages based on that. With 4 million items, being a few thousand off isn't anything anyone would notice. Before this drifts off: * I do know *why* count(*) is slow using Postgres. 
* I *think* that count(*) is fast on eBay because count is cheaper using Oracle (which eBay does: http://www.sun.com/customers/index.xml?c=ebay.xml). That could be possible, but it's still speculation at this point. If someone with Oracle-fu could say for sure one way or the other, that would be interesting ... Unless there's data on that sun.com page that provides more detail. It doesn't seem to be willing to load for me at this point ... * I realize that pagination for multi-million tuple results does not make sense. Then what is the point to this thread? Are we just shooting the breeze at this point? -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Yet Another COUNT(*)...WHERE...question
In response to Phoenix Kiula [EMAIL PROTECTED]: Yes, optimization. :) You don't need an exact count to tell someone that there's more data and they can go to it. In general, I agree. But my example of Amazon was only to illustrate the point about two queries and why they may be needed. I seem to see many more pages than you do, but in any case, Google and Amazon can afford to be less precise. Thanks for the suggestion of using EXPLAIN and parsing an approximation, but when you need to show a trader how many trades he has made, for instance, then approximation is not a possibility at all. Especially not if the numbers sway so wildly:

FIRSTDB=# explain select * from trades where t_id = 'kXjha';
QUERY PLAN
Bitmap Heap Scan on trades  (cost=15.77..1447.12 rows=374 width=224)
  Recheck Cond: ((t_id)::text = 'kXjha'::text)
  ->  Bitmap Index Scan on trades_tid_date  (cost=0.00..15.67 rows=374 width=0)
        Index Cond: ((t_id)::text = 'kXjha'::text)
(4 rows)

FIRSTDB=# select count(*) from trades where t_id = 'kXjha';
 count
-------
  3891
(1 row)

Could I do something so that the EXPLAIN showed up with slightly more close-to-accurate stats? The above query is just after a vacuum analyze! In the above case, you could probably materialize the data with a trigger that updates a counter in a separate table every time a new trade is added. This will give you 100% accurate results with _very_ fast response time. Part of the problem is that there's no one answer to your question; there are multiple approaches to solving it, depending on the details of the problem and the acceptable time/accuracy of the answers. Some basic approaches: 1) Materialize the data. MySQL actually does this automatically for you with MyISAM tables, which is why count(*) is so fast. But if you absolutely need fast, accurate counts, you can build your own triggers in PG. This is unlikely to be practical with all queries. 2) Estimate. 
The accuracy of estimates can vary wildly by query and how often the database is analyzed, etc. For something like "showing results 1 - 10 of about 50,000", estimates are great and fast, but for other cases, not acceptable. The good news is you can get a fast estimate from any query with no up-front work. 3) Accept that sometimes getting accurate answers is going to take time. Around here, we call it the "Orbitz technique", because when we discuss it, everyone thinks of the "please wait while I process your query" page you get from orbitz.com. You'd be surprised how willing your users are to wait, as long as they know they have to wait. 4) Throw more hardware at it. If you absolutely _must_have_ super-accurate results faster, then you may need to buy more RAM, faster disks and faster CPUs to accomplish it. 5) Come up with something revolutionary that nobody's ever thought of before. Good luck with this one. Of course, all of these ideas are only practical if you've already ensured that your system is properly tuned. Crappy values for shared_buffers and other tuning will lead you to waste time trying to redesign something that should work just fine, so verify all your configuration first. You may be able to get more acceptable estimates by increasing your statistics targets, for example. -- Bill Moran http://www.potentialtech.com
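For the trades example above, approach 1 might look roughly like this. It is only a sketch: trade_counts, bump_trade_count, and the trigger name are all invented for illustration, and a production version would need to handle two sessions inserting a brand-new t_id at the same time:

```sql
-- Counter table maintained alongside trades.
CREATE TABLE trade_counts (
    t_id text PRIMARY KEY,
    n    bigint NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION bump_trade_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE trade_counts SET n = n + 1 WHERE t_id = NEW.t_id;
        IF NOT FOUND THEN
            INSERT INTO trade_counts (t_id, n) VALUES (NEW.t_id, 1);
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE trade_counts SET n = n - 1 WHERE t_id = OLD.t_id;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trades_count_trg
    AFTER INSERT OR DELETE ON trades
    FOR EACH ROW EXECUTE PROCEDURE bump_trade_count();
```

After that, the trader's count is a primary-key lookup: SELECT n FROM trade_counts WHERE t_id = 'kXjha';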
Re: [GENERAL] Using PITR Backup and Recovery
In response to [EMAIL PROTECTED] [EMAIL PROTECTED]: We recently moved to PITR backup and recovery solution as defined in the documentation. Our basic setup executes the backup start command, and then takes a snapshot of the filesystem and backups wal files. However, we have database files ( not wal files ) that change while the system is in backup mode. This happens during every backup. Is this normal? Maybe. Not entirely sure I understand you're meaning, but ... My understanding is that pg_start_backup()'s purpose is to mark the database so it knows what the last complete transaction was before it was started. This doesn't prevent PostgreSQL from making changes to DB files, it just ensures that in the event of a restore, PG knows where to start as far as the data files and the WAL log are concerned. I'm curious as to why files would be changing if you made a filesystem snapshot, but that wouldn't be a problem with PostgreSQL, it would be a problem with the filesystem code. Or I could be misunderstanding what you mean. In any event, if database activity is occurring while the backup is running, PostgreSQL's data files will continue to change, but the archive of WAL logs will allow the system to recover from inconsistent changes during the recovery phase. I don't know if anyone's done extensive testing to know just how reliable PITR is, but it's worked every time for me. One caveat: you can't recover PITR data from an amd64 system to an i386 system :D PostgreSQL's data (and possibly the WAL logs as well) is architecture dependent. This can be a royal pain if you don't know about it and you have a mix of architectures. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
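For reference, the SQL side of the procedure being described is just two calls bracketing the filesystem copy; a minimal sketch (the label is an arbitrary placeholder):

```sql
-- Forces a checkpoint and marks the starting WAL location.
SELECT pg_start_backup('nightly_base_backup');

-- ... take the filesystem snapshot / copy the data directory here;
-- data files continuing to change during this window is expected ...

-- Marks the stop point and identifies the WAL segments the backup needs.
SELECT pg_stop_backup();
```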
Re: [GENERAL] Moving to postgresql and some ignorant questions
In response to Phoenix Kiula [EMAIL PROTECTED]: If you issue a BEGIN then nothing gets committed until you issue a COMMIT. If anything happens in the meantime then everything you've done since the BEGIN disappears. There are some cases where I would like to bunch queries into a transaction purely for speed purposes, but they're not interdependent for integrity. E.g., BEGIN TRANSACTION; UPDATE1; UPDATE2; UPDATE3; COMMIT; If UPDATE2 fails because it, say, violates a foreign key constraint, then so be it. I want UPDATE3 to go ahead. Is this not possible, or is there an option I can use to do these kind of independent-query transactions? That's not possible, by design. However, your application could keep track of which queries have run, and if one fails, start the transaction over without the failing query. But the rule is, if any query within the transaction errors, then all queries within the transaction are rolled back. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
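One refinement worth knowing about: since 8.0, SAVEPOINT offers a middle ground, keeping the speed of one big transaction while discarding only the statement that failed. A sketch using the placeholder statements from the example (the application still has to notice the error and issue the ROLLBACK TO SAVEPOINT itself):

```sql
BEGIN;
UPDATE ...;                 -- UPDATE1
SAVEPOINT sp;
UPDATE ...;                 -- UPDATE2: may violate a foreign key
-- if UPDATE2 errors, issue: ROLLBACK TO SAVEPOINT sp;
UPDATE ...;                 -- UPDATE3 proceeds either way
COMMIT;
```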
Re: [GENERAL] Problem Designing Index
In response to Alan J Batsford [EMAIL PROTECTED]: Hello, I'm doing some select statements on my table that look like:

SELECT * FROM table
WHERE (prod_num = '1234567' AND transaction_timestamp > '2007-07-18 21:29:57')
   OR prod_num > '1234567'
ORDER BY prod_num ASC, transaction_timestamp ASC LIMIT 1;

I've added two indices, one for prod_num and another for transaction_timestamp. This table has 151,000 rows and the above statement returns in less than a millisecond. If I change the above statement from '>' to '<' it takes 8 seconds to complete. Prod_num '1234567' is towards the end of the 151k rows. If I use a prod_num like '123' towards the front, the problem is reversed with '<' and '>'. I tried adding a third index that uses both prod_num and transaction_timestamp. The average performance at each end of the data for both '<' and '>' improved, but the problem wasn't resolved. Selects at the end of the data with '>' conditions (like the original statement) then become broken and take 500 ms to finish, which is unacceptable for the application. I did ANALYZE on the table with no effect. Is it possible to design an index that can account for all the scenarios? Thanks for any help you can provide. While it's difficult to be sure, I'm guessing you have either a hardware problem or a tuning problem -- but I don't think your indexes are a problem. Keep in mind that once PostgreSQL has determined which rows to return, it has to actually read all those rows off disk and send them to the client application. In my opinion, 8 seconds to read in over 100,000 rows isn't unreasonable (especially if those rows are wide). If 8 seconds is an unacceptable time, then you're liable to need hardware to fix it: more RAM to cache those rows, or faster disks, or both. However, this is just speculation. You didn't provide analyze output, table schema, hardware details, or configuration information ... so it's entirely possible that there is something else wrong. I'm just making an educated guess. 
-- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Database Select Slow
In response to .ep [EMAIL PROTECTED]: On Aug 10, 9:42 pm, [EMAIL PROTECTED] (A. Kretschmer) wrote: am Fri, dem 10.08.2007, um 17:46:11 +0800 mailte carter ck folgendes: Hi all, I am facing a performance issue here. Whenever I do a count(*) on a table that contains about 300K records, it takes a few minutes to complete. Whereas my other application which is counting 500K records takes less than 10 seconds to complete. I have indexed all the essential columns and still it does not improve the speed. Indexes don't help in this case; a 'select count(*)' forces a seq scan. Do you really need this information? An estimate for the number of rows can be found in the system catalog (reltuples in pg_class, see http://www.postgresql.org/docs/current/interactive/catalog-pg-class.html). Hi, what if I need to do a count with a WHERE condition? E.g.,

SELECT count(*) FROM customers
WHERE cust_id = 'georgebush'
  AND created_on > current_date - interval '1 week';

Can I get the info about this from somewhere in the pg system tables as well? Queries like these are very common in most applications, so I'm hoping I can avoid the sequential scans! Many thanks for any tips. If you only need an estimate, you can do an EXPLAIN of the query and grep out the row count. The accuracy of this will vary depending on the statistics, but it's very fast and works with a query of any complexity. If you need fast, accurate counts, your best bet is to set up triggers on your tables to maintain counts in a separate table. This can be rather complex to set up, and you take a performance hit during inserts and updates, but I don't know of any other way to do it. -- Bill Moran http://www.potentialtech.com
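The catalog lookup A. Kretschmer mentions looks like this; note that it is a whole-table estimate maintained by VACUUM/ANALYZE and cannot honor a WHERE clause:

```sql
-- Planner's current row-count estimate for the table.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'customers';
```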
Re: [GENERAL] Asynchronous Query processing
In response to Brendon Sablinski [EMAIL PROTECTED]: I have a need to do asynchronous query processing. I am a little confused by the order and frequency of calls. Can someone tell me if the below is correct? The code comments are littered with my questions.

PQsendQueryParams(pgconn, "SELECT ...", ...);
while(1) {
  epoll/poll/select PQsocket(pgconn) [read]
  // call this each time poll indicates data is ready?
  PQconsumeInput(pgconn);

What poll? PQconsumeInput()/PQisBusy() _is_ the poll. Also, you should check the return code. If you get 0, something has gone wrong.

  // If isBusy returns 1, just poll again? Will this indicate 0 when I have a full PGresult set?
  // Or, can it split the results of a single statement across multiple result sets?
  if(PQisBusy()) continue;

Is this just experimenting? Note that by simply staying within the loop, you basically wrap asynchronous access with synchronous code. In your real code, you should return to whatever else the program has to work on.

  // I know the docs say keep calling getResult until it returns NULL. In my
  // case, I am wondering if that is needed. I only send one statement at a
  // time to the server. The connection is not listening for notifies either. Do
  // I still need to call getResult multiple times? I saw many examples online
  // that DO NOT call it multiple times; all of them executed a single statement.

If you know for a fact that there are no more statements in processing, there's no need to call PQgetResult() any more. However, if you know for a fact that there could never be more than a single query in the pipeline, it's unlikely that your code is written to handle asynchronous processing.

  res = PQgetResult(pgconn);
  process_results(res);
}

thanks, brendon -- Bill Moran http://www.potentialtech.com
Re: [GENERAL] v8.2 ... command line interface on Windows
Couldn't this be offered pre-built as an alternative in the Windows installer or as a psql2.exe program? Seems like there might be more than a few PostgreSQL users running with US keyboard layouts under native Windows, and providing the prebuilt version along with the normal one might eliminate many of the postings like this that keep reappearing on these lists. - Bill -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Magnus Hagander Sent: Saturday, August 04, 2007 3:09 PM To: nac1967 Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] v8.2 ... command line interface on Windows nac1967 wrote: I am a PostgreSQL user, using the native windows version 8.2. I am new to PostgreSQL and am wondering if other users have suggestions regarding command line interfaces. I cannot for the life of me get readline to work on the Windows version. If you use IPython, for example, readline works fantastically with smart ctrl-P search of previous commands and tab completion. Does anyone know of a way either to get readline to work or another good command line interface? Readline only works with US keyboard layouts under native windows, that's why it's been turned off in the binary builds. You are left with the very limited commandline editing in the default windows command shell. If you are using only US keyboard layout, you can rebuild psql from source (using the MingW build system) with readline. You only need to rebuild psql - not libpq or the backend itself. //Magnus
Re: [GENERAL] v8.2 ... command line interface on Windows
We use the Cygwin version of the psql.exe program under Windows instead of the native psql.exe program, even though the rest of the PostgreSQL installation uses the standard native Windows versions of all the other PostgreSQL components (database, tools, etc.). (So before I get flamed, I want to clarify that the ONLY component of PostgreSQL that we use from Cygwin is psql.exe, and it's run from an alternate directory.) The main advantage that we get by using the Cygwin version of psql is that it runs in interactive mode regardless of how it is run, whereas the native psql program runs in non-interactive mode (showing almost no output, no prompts, no readline support, etc.) when run from most shell programs. We frequently run psql via a remote SSH connection or from the RXVT terminal program or from inside Emacs, and in all of these cases the native psql program runs in non-interactive mode whereas the Cygwin version of psql is fully functional. - Bill -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of nac1967 Sent: Saturday, August 04, 2007 3:01 PM To: pgsql-general@postgresql.org Subject: [GENERAL] v8.2 ... command line interface on Windows I am a PostgreSQL user, using the native windows version 8.2. I am new to PostgreSQL and am wondering if other users have suggestions regarding command line interfaces. I cannot for the life of me get readline to work on the Windows version. If you use IPython, for example, readline works fantastically with smart ctrl-P search of previous commands and tab completion. Does anyone know of a way either to get readline to work or another good command line interface? Thank you.
Re: [GENERAL] file-system snapshot under freebsd for backup
In response to ProAce [EMAIL PROTECTED]: I already put the data directory on hds san storage, but there is no snapshot license on it. Could I use mksnap_ffs under freebsd to make snapshot for backup ? Maybe. I'm confused by your question, but: * If the data directory is on an FFS2 volume on a FreeBSD machine, you can use mksnap_ffs to back it up, and that approach is actually recommended for PITR-type backups. * If the data is mounted via NFS or something similar and the filesystem is not FFS2, then snapshots are not available on FreeBSD. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] file-system snapshot under freebsd for backup
In response to ProAce [EMAIL PROTECTED]: I use SAN, not NAS. :) Because of some budget issues, I just have a basic SAN environment. For multipathing, I use geom_fox. For snapshots, I want to use mksnap_ffs, but I don't have any experience using mksnap_ffs to back up pgsql. Can anyone give me some advice? If you don't want to do PITR, read this: http://www.postgresql.org/docs/8.2/static/backup-file.html You can also use the mount command to make filesystem snapshots. Assuming your PG data directory is under the mount point /var/db:

mount -o snapshot -u /var/db/snapshot /var/db
mdconfig -a -t vnode -f /var/db/snapshot -u 4
mount -r /dev/md4 /mnt
[ ... do whatever you do to back up /mnt (which is the snapshot of /var/db) ... ]
umount /mnt
mdconfig -d -u 4
rm /var/db/snapshot

2007/8/2, Bill Moran [EMAIL PROTECTED]: Maybe. I'm confused by your question, but: * If the data directory is on an FFS2 volume on a FreeBSD machine, you can use mksnap_ffs to back it up, and that approach is actually recommended for PITR-type backups. * If the data is mounted via NFS or something similar and the filesystem is not FFS2, then snapshots are not available on FreeBSD. -- Bill Moran http://www.potentialtech.com -- Bill Moran http://www.potentialtech.com