[GENERAL] Query optimisation
The following query is taking a long time to execute, 500ms or so. It needs to be about 100ms or so in order to be acceptable. Can anyone spot any optimisations I could make to this query to bring the exec time down? Have I designed this query correctly? Is joining to the same table repeatedly like that the right thing to be doing? I'm pretty sure I have indexes on the right fields etc. Thanks to anyone who can spare a minute or three to look at this. Here's the query and the query plan:

SELECT images.imageid, images_sites.siteid, images.title, images.description,
       albums.albumid, albums.title AS albumtitle, albums.private AS album_private,
       images.entered, images.taken, images.private, images.comments,
       images.showcomments, images.shownames, images.commentlimit,
       images.commentlimit_user, images.trashed, images.deleted,
       imageid_file.fileid AS imageid_fileid, imageid_file.filename AS imageid_filename,
       imageid_file.size AS imageid_size, imageid_file.md5 AS imageid_md5,
       imageid_file.sha1 AS imageid_sha1,
       size120_file.fileid AS size120_fileid, size120_file.filename AS size120_filename,
       size240_file.fileid AS size240_fileid, size240_file.filename AS size240_filename,
       size420_file.fileid AS size420_fileid, size420_file.filename AS size420_filename,
       size600_file.fileid AS size600_fileid, size600_file.filename AS size600_filename,
       size800_file.fileid AS size800_fileid, size800_file.filename AS size800_filename,
       size1024_file.fileid AS size1024_fileid, size1024_file.filename AS size1024_filename,
       size130sq_file.fileid AS size130sq_fileid, size130sq_file.filename AS size130sq_filename,
       size240sq_file.fileid AS size240sq_fileid, size240sq_file.filename AS size240sq_filename,
       size420sq_file.fileid AS size420sq_fileid, size420sq_file.filename AS size420sq_filename
FROM images
JOIN files imageid_file ON (images.imageid = imageid_file.fileid)
JOIN files size120_file ON (images.size120 = size120_file.fileid)
JOIN files size240_file ON (images.size240 = size240_file.fileid)
JOIN files size420_file ON (images.size420 = size420_file.fileid)
JOIN files size600_file ON (images.size600 = size600_file.fileid)
JOIN files size800_file ON (images.size800 = size800_file.fileid)
JOIN files size1024_file ON (images.size1024 = size1024_file.fileid)
JOIN files size130sq_file ON (images.size130sq = size130sq_file.fileid)
JOIN files size240sq_file ON (images.size240sq = size240sq_file.fileid)
JOIN files size420sq_file ON (images.size420sq = size420sq_file.fileid)
JOIN images_sites ON (images_sites.imageid = images.imageid)
LEFT OUTER JOIN albums ON (images_sites.albumid = albums.albumid)
WHERE images_sites.siteid = 1
  AND images_sites.albumid = 6
  AND (albums.private IS NULL OR albums.private = 5)
  AND images.private = 5
ORDER BY images.entered;

QUERY PLAN
----------
Sort (cost=14045.35..14045.43 rows=34 width=404) (actual time=507.877..507.881 rows=11 loops=1)
  Sort Key: images.entered
  Sort Method: quicksort  Memory: 22kB
  -> Nested Loop (cost=13084.62..14044.48 rows=34 width=404) (actual time=507.409..507.814 rows=11 loops=1)
    -> Nested Loop (cost=13084.62..13986.47 rows=34 width=395) (actual time=507.399..507.724 rows=11 loops=1)
      -> Nested Loop (cost=13084.62..13928.46 rows=34 width=386) (actual time=507.389..507.642 rows=11 loops=1)
Re: [GENERAL] Query optimisation
As you can see, they are all the same table, just joined repeatedly under different aliases. The images table has several fields, each one referring to a different sized version of the image. It then has to join against the files table once per size to get the file that corresponds with that image version. - Naz.

Craig Ringer wrote:
Naz Gassiep wrote:
JOIN files imageid_file ON (images.imageid = imageid_file.fileid)
JOIN files size120_file ON (images.size120 = size120_file.fileid)
JOIN files size240_file ON (images.size240 = size240_file.fileid)
JOIN files size420_file ON (images.size420 = size420_file.fileid)
JOIN files size600_file ON (images.size600 = size600_file.fileid)
JOIN files size800_file ON (images.size800 = size800_file.fileid)
JOIN files size1024_file ON (images.size1024 = size1024_file.fileid)
JOIN files size130sq_file ON (images.size130sq = size130sq_file.fileid)
JOIN files size240sq_file ON (images.size240sq = size240sq_file.fileid)
JOIN files size420sq_file ON (images.size420sq = size420sq_file.fileid)
JOIN images_sites ON (images_sites.imageid = images.imageid)

That's a whole lot of joins. Do all those tables have the same fields? Can you unify them into one table with an additional field like "imagesize" and use an appropriate WHERE clause when looking up the table? -- Craig Ringer
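One restructuring worth sketching here: since every per-size column points into the same files table, the size columns on images could be replaced with a single image-to-file mapping table, turning ten self-joins of files into one join (with the app, or a pivot, arranging the rows per size). All names below (image_files, the size labels) are hypothetical, not from the original schema:

CREATE TABLE image_files (
    imageid  integer NOT NULL REFERENCES images,
    sizename text    NOT NULL,  -- e.g. 'original', '120', '240', '130sq'
    fileid   integer NOT NULL REFERENCES files,
    PRIMARY KEY (imageid, sizename)
);

-- one join instead of ten; returns one row per stored size
SELECT imf.sizename, f.fileid, f.filename
FROM image_files imf
JOIN files f ON (f.fileid = imf.fileid)
WHERE imf.imageid = 1;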
[GENERAL] Serial Data Type
I have just created a table using SELECT INTO; however, the PK was supposed to be a serial, and it is now a plain integer. To make it behave like a serial I just create the sequence and set the column default to be the nextval() of that sequence, right? Is there anything else I need to do? It will maintain the transactional behaviour of a serial-created default, right? I.e., it will not roll back sequence values when a transaction aborts, will it? Thanks, - Naz.
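For what it's worth, a minimal sketch of the conversion, assuming a table foo with an integer PK column id (names hypothetical). And yes, nextval() is deliberately non-transactional, so values are not rolled back on abort, exactly as with a column declared serial:

CREATE SEQUENCE foo_id_seq;
SELECT setval('foo_id_seq', COALESCE((SELECT max(id) FROM foo), 1));  -- start past existing rows
ALTER TABLE foo ALTER COLUMN id SET DEFAULT nextval('foo_id_seq');
ALTER SEQUENCE foo_id_seq OWNED BY foo.id;  -- 8.2+: mimics serial's dependency, so the sequence is dropped with the column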
[GENERAL] Locale / Encoding mismatch
I have just attempted to upgrade to 8.3.1 and I now get this error when trying to create a UTF8 DB:

[EMAIL PROTECTED]:~$ createdb twerl -E utf8
createdb: database creation failed: ERROR: encoding UTF8 does not match server's locale en_AU
DETAIL: The server's LC_CTYPE setting requires encoding LATIN1.

I have never bothered with the server's locale; all the locale handling in the app gets handled within the DB itself. I guess I now have to dig into the murky world of system locales and how they affect my app ecosystem. I don't suppose there is an easy solution to this, or a short explanation of why it now occurs?
Re: [GENERAL] Locale / Encoding mismatch
Short answer is: use en_AU.UTF-8 for your locale. If it doesn't exist you can create it using /etc/locale.gen (assuming you're running some kind of Linux).

I've just installed that locale on my system (Debian Sarge). However, I'm still getting the error. I only set the locale for that user; could it be that the locale needs to be set to UTF8 system wide? Here's the output:

[EMAIL PROTECTED]:~$ locale
LANG=en_AU.UTF8
LANGUAGE=en_AU.UTF8:en_US.UTF8:en_GB.UTF8:en.UTF8
LC_CTYPE=en_AU.UTF8
LC_NUMERIC=en_AU.UTF8
LC_TIME=en_AU.UTF8
LC_COLLATE=en_AU.UTF8
LC_MONETARY=en_AU.UTF8
LC_MESSAGES=en_AU.UTF8
LC_PAPER=en_AU.UTF8
LC_NAME=en_AU.UTF8
LC_ADDRESS=en_AU.UTF8
LC_TELEPHONE=en_AU.UTF8
LC_MEASUREMENT=en_AU.UTF8
LC_IDENTIFICATION=en_AU.UTF8
LC_ALL=
[EMAIL PROTECTED]:~$ createdb twerl -E utf8
createdb: database creation failed: ERROR: encoding UTF8 does not match server's locale en_AU
DETAIL: The server's LC_CTYPE setting requires encoding LATIN1.
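The server's locale is captured once, at initdb time, so changing the shell locale afterwards (even system wide) has no effect on an already-initialised cluster. The frozen values can be checked from SQL:

SHOW lc_ctype;    -- what initdb captured; plain 'en_AU' implies LATIN1
SHOW lc_collate;

If these still report en_AU, the only fix on 8.3 is to initdb a fresh cluster under en_AU.UTF-8 and reload from a dump.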
Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)
1) What type of names do you prefer?
a) old notation - createdb, createuser ...
b) new one with pg_ prefix - pg_createdb, pg_createuser ...
c) new one with pg prefix - pgcreatedb, pgcreateuser ...
d) remove them - psql is the solution
e) remove them - pgadmin is the solution

a)

2) How often do you use these tools?
a) every day (e.g. in my cron)
b) once per week
c) one time
d) never

b)

3) What name for initdb do you prefer?
a) initdb
b) pg_initdb
c) pg_init
d) pg_ctl -d dir init (replace initdb with new pg_ctl functionality)
e) What is initdb? My start/stop script does it automatically.

a)

4) How do you perform VACUUM?
a) vacuumdb - shell command
b) VACUUM - SQL command
c) autovacuum
d) What is vacuum?

b)

We're not seriously thinking of changing these, are we? Once a command set has been in use for as long a time as the PG command set has, any benefit that may be derived by new users with an aversion to documentation reading is vastly offset by the confusion that would result among long-time users whose scripts, tools and mental processes all have the old names hardcoded in. I can't imagine how there would be a nomenclature clash; if there is, then just take one of the tools out of the path, use symlinks, or put calling scripts in the path instead. These are suboptimal solutions, granted, but *any* naming scheme we change to will be subject to the possibility of naming clashes with another package with a similar name, unless we make the binaries have long, verbose names. I don't know about you, but I don't fancy having to type postgresqlclient dbname to start a DB session. I like psql dbname. So I ask again: we're not seriously thinking about this, are we? Regards, - Naz.
[GENERAL] Read/Write restriction mechanism
A tangentially PG-related question: in a PHP project I have several functions that I use for DB operations. I only want to allow one of them to write; all the others are for reading only. I was thinking that a way I could enforce this would be to check that the read-only ones only run queries where the first non-whitespace character is 'S'. This is not a security thing (user-defined queries are totally disallowed); it is just a "so developers don't forget" measure. Checking the first char like that seems awfully hackish to me, although I can't see any reason it wouldn't work. Does anyone have any better ideas? (Using DB-level perms is out, as it is the function usage I'm trying to control, not the connections.)
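A cleaner lever than string inspection might be PostgreSQL's own read-only transaction mode, which rejects writes at the server. A sketch (table name hypothetical):

-- wrap each read-only helper's work like this:
BEGIN TRANSACTION READ ONLY;
SELECT * FROM sometable;        -- fine
-- UPDATE sometable SET x = 1; -- would fail: cannot execute UPDATE in a read-only transaction
COMMIT;

-- or flip the default for the connection while in the read-only code path:
SET default_transaction_read_only = on;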
Re: [GENERAL] Hash Indexes
Why are hash indexes obviously best? In an ideal world with a good implementation maybe, but PostgreSQL b-trees are really quite good.

Because doing normal queries on a table where there are large text blocks is unlikely to be a good idea. E.g.:
SELECT * FROM table WHERE textcol = 'a 4kb block of text';

You could always do something like:
CREATE INDEX foo ON table((md5(textcol)));
Then it will get used in queries like:
SELECT * FROM table WHERE md5(textcol) = md5('text');

That's exactly what I was considering doing; however, there is always the chance of a hash collision. Yes, this is a very remote chance, but the ramifications of a collision under those circumstances are potentially catastrophic. Think of a user being delivered text that contains confidential and sensitive material as opposed to the latest memo about the cleaning of toilets. I would assume that hash indexes have inbuilt mechanisms for collision checking before returning a row as a match. Am I correct in this assumption? Best regards, - Naz.
[GENERAL] Hash Indexes
Hi there, I am creating functionality where blocks of text are stored in the DB and need to be searched for. No LIKE or pattern matching, just a plain old equality WHERE clause. Obviously, hash indexes would be best here; however, I've been warned away from PG's hash implementation. Would it be better to manually create a column storing the hashes (maintained with a CHECK (md5(textcol) = hashcol) type constraint) and generate the hashes in the app? - Naz.
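A sketch of the functional-index route mentioned in the reply above, with an equality recheck so a hash collision can never return the wrong row (table and column names hypothetical):

CREATE INDEX textblocks_md5_idx ON textblocks ((md5(body)));

SELECT *
FROM textblocks
WHERE md5(body) = md5('the exact block of text')
  AND body = 'the exact block of text';  -- recheck guards against md5 collisions

The index narrows the search to hash matches; the second condition makes correctness independent of the hash.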
[GENERAL] Query problem
Aside from the messy nomenclature, is anyone able to spot why the sum column from the first query is not returning 7, as the second query suggests it should? I know that this is probably simple, and that it's probably going to jump out at me the minute I hit Send, but if I don't hit Send, then I'll never see it, hehe.

twerl=# SELECT 'contactgroups', contactgroups.siteid, contactgroups.id, contactgroups.name, contactgroups.description, SUM(contacts.id), contactgroups.trashed FROM contactgroups LEFT OUTER JOIN contacts ON (contactgroups.id = contacts.groupid) WHERE contactgroups.trashed IS NOT NULL AND contactgroups.deleted IS NULL GROUP BY contactgroups.siteid, contactgroups.id, contactgroups.name, contactgroups.description, contactgroups.trashed;

   ?column?    | siteid | id |        name        | description | sum |            trashed
---------------+--------+----+--------------------+-------------+-----+-------------------------------
 contactgroups |      1 |  3 | Telechoice / Optus |             | 594 | 2007-10-01 20:08:51.449825+10
(1 row)

twerl=# select count(*) from contacts where groupid = 3;
 count
-------
     7
(1 row)

Thanks, - Naz.
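For later readers, the likely culprit: SUM(contacts.id) adds up the id values of the 7 matching contact rows (which happens to total 594), where a row count was wanted. The intended aggregate is presumably:

SELECT 'contactgroups', contactgroups.siteid, contactgroups.id,
       contactgroups.name, contactgroups.description,
       COUNT(contacts.id),  -- counts joined rows; NULLs from the outer join are ignored
       contactgroups.trashed
FROM contactgroups
LEFT OUTER JOIN contacts ON (contactgroups.id = contacts.groupid)
WHERE contactgroups.trashed IS NOT NULL
  AND contactgroups.deleted IS NULL
GROUP BY contactgroups.siteid, contactgroups.id, contactgroups.name,
         contactgroups.description, contactgroups.trashed;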
[GENERAL] Etc/% timezones
I just noticed that in the pg_timezone_names system view, the name and abbrev of the Etc/% timezones appear to be inverted with respect to their utc_offset value. I never noticed before, as I filter these zones out (among others) and do not use them in my app. I was just interested as to why the suggested abbrev field and utc_offset field are the opposite sign of each other. Is this a bug or is there a reason for this? I have included the output from the view with the relevant time zones below. E.g., I am in Melbourne, Australia, which I have always known as GMT+10 (ignoring DST). According to the below list, however, the timezone that has the matching utc_offset to mine is called Etc/GMT-10, which to me seems wrong. Were I presented with a list of the names, I'd select Etc/GMT+10 as my time zone, incorrectly telling the app that my timezone's utc_offset is -10:00:00. I'm pretty certain that this is a bug. Can anyone confirm? Regards, - Naz.

select * from pg_timezone_names where name ilike 'etc/%' order by utc_offset, abbrev, name;

     name      | abbrev | utc_offset | is_dst
---------------+--------+------------+--------
 Etc/GMT+12    | GMT+12 | -12:00:00  | f
 Etc/GMT+11    | GMT+11 | -11:00:00  | f
 Etc/GMT+10    | GMT+10 | -10:00:00  | f
 Etc/GMT+9     | GMT+9  | -09:00:00  | f
 Etc/GMT+8     | GMT+8  | -08:00:00  | f
 Etc/GMT+7     | GMT+7  | -07:00:00  | f
 Etc/GMT+6     | GMT+6  | -06:00:00  | f
 Etc/GMT+5     | GMT+5  | -05:00:00  | f
 Etc/GMT+4     | GMT+4  | -04:00:00  | f
 Etc/GMT+3     | GMT+3  | -03:00:00  | f
 Etc/GMT+2     | GMT+2  | -02:00:00  | f
 Etc/GMT+1     | GMT+1  | -01:00:00  | f
 Etc/GMT       | GMT    | 00:00:00   | f
 Etc/GMT+0     | GMT    | 00:00:00   | f
 Etc/GMT-0     | GMT    | 00:00:00   | f
 Etc/GMT0      | GMT    | 00:00:00   | f
 Etc/Greenwich | GMT    | 00:00:00   | f
 Etc/UCT       | UCT    | 00:00:00   | f
 Etc/UTC       | UTC    | 00:00:00   | f
 Etc/Universal | UTC    | 00:00:00   | f
 Etc/Zulu      | UTC    | 00:00:00   | f
 Etc/GMT-1     | GMT-1  | 01:00:00   | f
 Etc/GMT-2     | GMT-2  | 02:00:00   | f
 Etc/GMT-3     | GMT-3  | 03:00:00   | f
 Etc/GMT-4     | GMT-4  | 04:00:00   | f
 Etc/GMT-5     | GMT-5  | 05:00:00   | f
 Etc/GMT-6     | GMT-6  | 06:00:00   | f
 Etc/GMT-7     | GMT-7  | 07:00:00   | f
 Etc/GMT-8     | GMT-8  | 08:00:00   | f
 Etc/GMT-9     | GMT-9  | 09:00:00   | f
 Etc/GMT-10    | GMT-10 | 10:00:00   | f
 Etc/GMT-11    | GMT-11 | 11:00:00   | f
 Etc/GMT-12    | GMT-12 | 12:00:00   | f
 Etc/GMT-13    | GMT-13 | 13:00:00   | f
 Etc/GMT-14    | GMT-14 | 14:00:00   | f
(35 rows)
Re: [GENERAL] Etc/% timezones
I'm pretty certain that this is a bug. Can anyone confirm?

It is a bug -- in the SQL standard definition. The meaning of the sign is inverted w.r.t. the relevant POSIX (?) standard, AFAIU. Unsurprisingly, we're following the SQL standard here.

Wow. Seriously, wow. Good thing I'm filtering them out then, else confusion would ensue. Has anyone brought the stupidity of this to the attention of the SQL team? Is there any rationale behind this? I've been working with timezone stuff for the last few weeks and I'm really getting into it, so I'm just interested to know how it was concluded that this is a good idea.
[GENERAL] Accessing pg_timezone_names system view
I was wondering if there is any reason that accessing the system view pg_timezone_names is extremely slow relative to other queries. The following query:

SELECT * FROM pg_timezone_names;

executes in between 29ms and 32ms on my server. It takes about the same when I put a WHERE name = 'some/timezone' clause on it. To put this into perspective, on the pages that execute this, it accounts for something like 3/4 of my DB execution time. Here's a screenshot to show you what I'm talking about: http://www.mrnaz.com/dbetime.gif As you can see, the execution of that single fetch dwarfs all other processing loads. I've run this a few times, and the timings are always roughly the same. Is there a way for me to speed this up? Would I be better off loading these into a static table and querying that instead? It seems kinda purpose-defeating to do that, though. Perhaps this has been addressed in 8.3? I eagerly await. Regards, - Naz
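If the numbers don't improve, the static-table idea floated above is straightforward; the only cost is remembering to refresh it when the server's timezone data changes. Table name hypothetical; the view is slow because it is computed from the timezone database files on each scan rather than read from a stored table:

CREATE TABLE cached_timezone_names AS SELECT * FROM pg_timezone_names;
CREATE INDEX cached_timezone_names_name_idx ON cached_timezone_names (name);

-- refresh after a PostgreSQL or tzdata upgrade:
TRUNCATE cached_timezone_names;
INSERT INTO cached_timezone_names SELECT * FROM pg_timezone_names;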
[GENERAL] More Time Zone fun
I'm making (slow) progress in my timezone system, and I just noticed this little behavioral nugget, which surely is a bug. In the system view pg_timezone_names there are a few timezones that use leap seconds. An example which I tested is Asia/Riyadh87. When I attempt to SET TIME ZONE using this timezone, I get this:

ERROR: time zone Asia/Riyadh87 appears to use leap seconds
DETAIL: PostgreSQL does not support leap seconds.

Surely it's a bug to have timezones in the list that result in errors being thrown? Also, is there a reason that PostgreSQL does not support leap seconds? Regards, - Naz
Re: [GENERAL] More Time Zone fun
Just to further note an oddity: apparently PostgreSQL in fact *does* support leap seconds, since 7.4. After scouring the manual for info on this, I discovered on this page: http://www.postgresql.org/docs/8.2/static/release-7-4.html that Tom added support for 60-second values in the seconds component of the TIME/TIMESTAMP/INTERVAL data types, which, as noted by the changelog, is needed for support of leap seconds. So this error very well may be a behavioral bug. Should I post this to -hackers or -bugs? - Naz.
[GENERAL] User-Friendly TimeZone List
Hi all, I am still, after quite some time, wrangling over the time zone system in my app. I have sorted out all the internal handling; however, I am still uncertain as to the best way to get the user to select their time zone. I was thinking of having users just select their timezone from a list populated with the contents of the query:

select * from pg_timezone_names;

which would look something like this: http://mrnaz.com/tztest.html This however is problematic for a number of reasons:

1. The timezone list there isn't exactly user friendly; there are many Etc/* timezones there, as well as others that would be potentially confusing for users who are trying to select the timezone they are in.

2. If a timezone is removed from the list for whatever reason, then the system will be left with users who have selected a timezone that is no longer a valid choice in the list.

The ideal situation would be to maintain my own persistent list of timezones (the way Microsoft maintains their own user-friendly list that they use for Windows) that maps to the timezones embedded in the PG back end, but I haven't the resources to pull this off. Has anyone else worked on a mechanism to allow users to supply the timezone they are in, and if so, do you have any comments on how best to handle the apparently mutually exclusive problems of simplicity for users and accuracy in the back end? At the simple end of the scale, I could just have users select from a list going from GMT-12 to GMT+12. At the complex end of the scale, I'd just give them the output from the list, and any zones that are deprecated will result in the user reverting to UTC until they set a new timezone; a filter along the lines of the sketch below is a possible middle ground. Comments on this usability problem are appreciated, - Naz.
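As a starting point for trimming the list, a filter like this keeps the Region/City style names and drops the Etc zones (adjust to taste):

SELECT name
FROM pg_timezone_names
WHERE name LIKE '%/%'        -- keeps Area/Location style entries
  AND name NOT LIKE 'Etc/%'  -- drops the confusing Etc/GMT+n zones
ORDER BY name;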
Re: [GENERAL] User-Friendly TimeZone List
Do the views in 8.2, pg_timezone_abbrevs and pg_timezone_names, help at all?

They are where I am currently getting the authoritative list of timezones from. However, this list does not seem appropriate to expose users to directly. See my original post; I've explained it a little more there. Thanks, - Naz.
[GENERAL] TimestampTZ
When entering data into a timestamptz field, if no timezone is given, does it assume you've entered a UTC time, the time in the timezone set on the session with SET TIME ZONE, or the local system time? - Naz
[GENERAL] Persistent connections in PHP
Hi, does the connection pooling feature of PHP cause persistent connections to keep their properties between accesses? E.g., if a user takes a connection and sets a timezone on it using SET TIME ZONE, will the next user who happens to take that connection get it in the same state, or will it be reset to a pristine state as though it had just been opened? Also, what about temp tables? Will they be visible to the second user if the first user set some up? - Naz.
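Whatever the PHP layer does or doesn't clean up, on the PostgreSQL side session state (SET values, temp tables, prepared statements) lives as long as the backend does. On 8.3 and later there is a single server-side command the app could issue when it grabs a pooled connection:

DISCARD ALL;  -- 8.3+: resets SET parameters, drops temp tables, deallocates prepared statements

-- on older servers there is no single equivalent; RESET ALL covers parameters only:
RESET ALL;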
Re: [GENERAL] TimestampTZ
As clearly stated in the documentation: http://www.postgresql.org/docs/8.2/interactive/datatype-datetime.html#DATATYPE-TIMEZONES

Perhaps I'm thick, but I don't find that particular page to be clear on this at all. - Naz.
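For later readers, the behaviour is easy to demonstrate: a literal with no zone is interpreted in the session's TimeZone setting (which defaults from the server configuration if the client never sets it), then stored normalised. A quick sketch; the zone and output are illustrative:

SET TIME ZONE 'Australia/Melbourne';
SELECT '2008-01-01 12:00:00'::timestamptz;
-- 2008-01-01 12:00:00+11   (taken as Melbourne wall-clock time, not UTC)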
[GENERAL] Removing a schema
I'm trying to remove a schema and move all its tables to another schema. I've manually run ALTER TABLE on every table to move them; however, all the foreign keys still reference the old schema, and there are too many to fix by hand. Is there an easy way to update one of the system catalogs to do this? I want to change every reference to the old schema to point to the new one, and then drop the schema, but not drop any data. Thanks, - Naz.
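A sketch of the catalog-safe route, assuming schemas oldschema and newschema (names hypothetical). ALTER TABLE ... SET SCHEMA (available since 8.1) moves a table, and foreign keys follow automatically because constraints reference tables by OID rather than by name:

ALTER TABLE oldschema.sometable SET SCHEMA newschema;  -- repeat per table
DROP SCHEMA oldschema;  -- only succeeds once the schema is empty, so no data is lost silently

If the FKs only look wrong in a dump taken earlier, a fresh dump after the moves should show them pointing at the new schema.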
Re: [GENERAL] using Tsearch2 for chemical text
I think you might need to write a custom lexer to divide the strings into meaningful units. If there are subsections of these names that make sense to search for, then tsearch2 can certainly handle the mechanics of that, but I doubt that the standard rules will divide these names into lexemes usefully.

A custom lexer for tsearch2 that recognized chemistry-related lexical components (di-, tetra-, acetyl-, ethan-, -oic, -ane, -ene etc.) would increase *hugely* the out-of-the-box applicability of PostgreSQL to scientific applications. Perhaps such an effort could be coordinated with a physics-based lexer and a biology-related lexer, to provide a unified lexer with full scientific capabilities in the way that PostGIS provides unified geospatial capabilities. I don't know how best to bring such an effort about, but I do know that if such a thing were created it would be a boon for PostgreSQL, giving it a very significant leg up in terms of functionality, not to mention the great positive impact that the wide, free availability of such a tool would have on the scientific research community.
Re: [GENERAL] why postgresql over other RDBMS
Surely such a use case could, and more to the point *should*, be met using PITR? Regards, - Naz.

Alvaro Herrera wrote:
A.M. wrote:
On May 24, 2007, at 14:29, Wiebe Cazemier wrote:
On Thursday 24 May 2007 17:30, Alexander Staubo wrote:

[2] Nobody else has this, I believe, except possibly Ingres and NonStop SQL. This means you can do a begin transaction, then issue create table, alter table, etc. ad nauseam, and in the meantime concurrent transactions will just work. Beautiful for atomically upgrading a production server. Oracle, of course, commits after each DDL statement.

If this is such a rare feature, I'm very glad we chose PostgreSQL. I use it all the time and wouldn't know what to do without it. We circumvented Ruby on Rails' migrations and just implemented them in SQL. Writing migrations is a breeze this way, and you don't have to hassle with atomicity, or the pain when you discover the migration doesn't work on the production server.

Indeed. Wouldn't it be a cool feature to persist transaction states across connections, so that a new connection could get access to a sub-transaction state? That way, you could make your schema changes and test them with any number of test clients (which designate the state to connect with) and then commit when everything works. Unfortunately, the PostgreSQL architecture wouldn't lend itself well to this. Still, it seems like a basic extension of the notion of sub-transactions.

Hmm, doesn't this Just Work with two-phase commit?
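A minimal demonstration of the transactional DDL being praised here (table name made up):

BEGIN;
CREATE TABLE migration_test (id integer PRIMARY KEY);
ALTER TABLE migration_test ADD COLUMN note text;
-- any error above aborts the lot; nothing is ever half-applied
ROLLBACK;  -- or COMMIT; either way the DDL is atomic

SELECT * FROM migration_test;  -- ERROR: relation "migration_test" does not exist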
[GENERAL] Changing DB Encodings
Why have I been told that I need to re-initdb to change the char encoding? The manual says I can just createdb foodb -E utf8, so why would I need to dump/initdb/create/restore? Can't I just dump/create/restore? It'd save all the messing around with changing the data dirs etc.
Re: [GENERAL] Changing DB Encodings
Tom Lane wrote:
Are you clear on the difference between encoding and locale?

I confidently reply with "maybe".

You can make new databases with whatever encoding you say, but the server's lc_collate and lc_ctype are frozen at initdb, and it will not work well to select an encoding that is incompatible with the locale setting. In practice this means that you can only use multiple database encodings when you initdb'd in C locale; all other locale settings imply a particular encoding. Yes, this is pretty annoying. No, it's not easy to fix.

OK, I understand now, I think. How do I check what locale is currently set? I've successfully created a utf8 database; does the fact that I was able to create a DB with an encoding different from the one all the others use (SQL_ASCII) imply that my locale is set to C? Regards, - Naz.
Re: [GENERAL] Changing DB Encodings
Tom Lane wrote:
I've successfully created a utf8 database, does that imply that because I was able to create a DB with a different encoding to the ones all the others use (SQL_ASCII) that my locale is set to C?

No, that implies a lack of error checking.

Surely, then, that's a bug? Shouldn't postmaster check whether a DB is being created with an encoding that conflicts with the selected locale? lc_collate and lc_ctype both report C as the locale. Does that mean I can safely bulk load a pg_dump into it? When doing the dump, do I have to use --encoding=utf8 as a parameter? Please excuse me if I sound ignorant of all of this; I'm currently forced to address an issue that I don't yet feel I have a sufficient grasp of.
Re: [GENERAL] Proposed Feature
Yo, I'd have no problem with it being disabled by default. I ruthlessly pare my systray down and turn on the "hide inactive icons" function as well. But when I have PG running it is the sort of thing I'd want to be able to see at a glance, the same as with Apache. - Naz.

Magnus Hagander wrote:
Naz Gassiep wrote:
Hey, I'm sure that'd be greatly appreciated; most other major servers and DBs have a similar feature, and that's what the systray is for, i.e., viewing major user-installed services.

Don't forget that the tray (or taskbar notification area, as it's supposed to be called) is one of the most abused areas of the Windows GUI. I've seen installs where it takes up half the screen. So it's not *necessarily* a good thing, making it mandatory to have an easy way to turn it off. And if we don't add any actual *functionality*, it should absolutely not be enabled by default. Providing it as an option is never wrong, though. //Magnus
[GENERAL] Table Names
Is there a limit on the length of table names? Thanks, - Naz.
[GENERAL] Proposed Feature
I'm using PG on Windows for the first time (as of about 6 minutes ago). I was thinking that it would be great to have a system tray icon with a running indicator, kind of like the one Apache 2.x for Windows has, or even MSSQL. Perhaps the PG logo with a small white circle containing a red square or a green triangle, in the same fashion. Just a thought. - Naz.
Re: [GENERAL] Proposed Feature
Hi, it provides a very quick and easy status check (running or not) as well as right-click access to start/stop and any other basic admin features that may be appropriate. I'd be happy with seeing whether it was running, plus start/stop accessible via one right click. It's the same as Apache for Win32, MSSQL and many other server programs. - Naz.

Bruce Momjian wrote:
Naz Gassiep wrote:
I'm using PG on Windows for the first time (as of about 6 minutes ago). I was thinking that it would be great to have a system tray icon with a running indicator [...] Just a thought.

And what does the icon show or do?
Re: [GENERAL] Proposed Feature
Hey, I'm sure that'd be greatly appreciated; most other major servers and DBs have a similar feature, and that's what the systray is for, i.e., viewing major user-installed services. - Naz.

Tony Caduto wrote:
Bruce Momjian wrote:
Naz Gassiep wrote:
I'm using PG on Windows for the first time (as of about 6 minutes ago). I was thinking that it would be great to have a system tray icon with a running indicator [...] Just a thought.

And what does the icon show or do?

That would be pretty easy to do with Delphi. I could whip something up and donate it to the project with a BSD license. It could be a green play arrow if the service is running and a red one if it is not, then have a few right-click options to start/stop/restart the service. I was thinking the same thing a while back, but forgot about it. Later,
[GENERAL] Aggregates
Hi, if I have a table with users and a table with messages, is it possible to have a query that returns users.* as well as extra columns with the number of messages each user has posted and the date and time of their last message? At the moment I am using a subquery to do this, but it seems suboptimal. Is there a better way? - Naz.
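One common shape for this, sketched with hypothetical column names: aggregate the messages once in a derived table, then outer-join it so users with no messages still appear:

SELECT u.*, m.message_count, m.last_posted
FROM users u
LEFT OUTER JOIN (
    SELECT userid, COUNT(*) AS message_count, MAX(posted) AS last_posted
    FROM messages
    GROUP BY userid
) m ON (m.userid = u.userid);

Compared with a correlated subquery per column, the messages table is scanned once rather than once per user per column.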
[GENERAL] Surrogate VS natural keys
OK, so which is the correct way to do it? E.g., say I have a table with users, a table with clubs, and a table that links them. Each user can be in more than one club and each club has more than one member. Standard M:M relationship. Which link table is the right way to do it? This:

CREATE TABLE users_clubs (
    userid INTEGER NOT NULL REFERENCES users,
    clubid INTEGER NOT NULL REFERENCES clubs,
    PRIMARY KEY (userid, clubid)
);

Or this:

CREATE TABLE users_clubs (
    id SERIAL PRIMARY KEY,
    userid INTEGER NOT NULL REFERENCES users,
    clubid INTEGER NOT NULL REFERENCES clubs
);

I've always favored natural keys (the first option), as it just seems to me a more natural and semantic representation of the data; however, I often get app designers complaining about it being more complex or something. Comments?
[GENERAL] TimeZone List
I've been trying to sort out the answer to this question for a while now and have received different answers from different places. I'm looking for a definitive, non-volatile list of timezones for use in a web application. I can't use the OS's time zone list, as changing OSes may cause some listed timezones to disappear or change name, causing havoc for any entries in the DB referring to those zones. Is relying on zone.tab as a non-volatile list of timezones appropriate for this purpose? Or is there an ISO listing of the various time zones? I would prefer to use a list that matches the PostgreSQL DB's internal timezone strings, so they could easily be referenced without any further work in the app. Thanks - Naz.
Re: [GENERAL] TimeZone List
Alvaro Herrera wrote:

alvherre=# select * from pg_timezone_names ;
        name        | abbrev | utc_offset | is_dst
--------------------+--------+------------+--------
 Africa/Algiers     | CET    | 01:00:00   | f
 Africa/Luanda      | WAT    | 01:00:00   | f
 Africa/Porto-Novo  | WAT    | 01:00:00   | f
 Africa/Gaborone    | CAT    | 02:00:00   | f
 Africa/Ouagadougou | GMT    | 00:00:00   | f
 Africa/Bujumbura   | CAT    | 02:00:00   | f
 Africa/Douala      | WAT    | 01:00:00   | f
 Africa/Bangui      | WAT    | 01:00:00   | f
 Africa/Ndjamena    | WAT    | 01:00:00   | f
 Africa/Kinshasa    | WAT    | 01:00:00   | f
 Africa/Lubumbashi  | CAT    | 02:00:00   | f
 Africa/Brazzaville | WAT    | 01:00:00   | f
 Africa/Abidjan     | GMT    | 00:00:00   | f
 Africa/Djibouti    | EAT    | 03:00:00   | f
 Africa/Cairo       | EEST   | 03:00:00   | t
 Africa/Malabo      | WAT    | 01:00:00   | f
 Africa/Asmera      | EAT    | 03:00:00   | f
 ...

alvherre=# select * from pg_timezone_abbrevs ;
 abbrev | utc_offset | is_dst
--------+------------+--------
 ACSST  | 10:30:00   | t
 ACST   | -04:00:00  | t
 ACT    | -05:00:00  | f
 ADT    | -03:00:00  | t
 AESST  | 11:00:00   | t
 AEST   | 10:00:00   | f
 AFT    | 04:30:00   | f
 AKDT   | -08:00:00  | t
 AKST   | -09:00:00  | f
 ALMST  | 07:00:00   | t
 ...

Ok, that's kinda cool. But can I trust those names not to change from version to version? And how are conflicts in the abbreviations handled? (For example, EST is a valid time zone abbreviation in both the US and Australia.) Also, what's with the Etc/GMT+x timezones? I assume they are just fixed integer offset values? I'd probably filter them out in the app using WHERE substring(name FROM 0 FOR 3) <> 'Etc'; are there any caveats to doing this? Thanks, - Naz
Re: [GENERAL] TimeZone List
Tom Lane wrote:
Naz Gassiep <[EMAIL PROTECTED]> writes:
Ok, that's kinda cool. But can I trust those names to not change from version to version?

No, you can't. The reason there is no nonvolatile list of timezones is that timezones are subject to the whims of politicians, who can and do change them at the drop of a hat. Read the historical comments in the zic source files sometime...

As if war, famine, injustice and social inequality weren't enough, politicians also have to screw up time zone handling in my app. Damn politicians. - Naz.
[GENERAL] Integrity on large sites
I'm working on a project at the moment that is using MySQL, and people keep making assertions like this one: "*Really* big sites don't ever have referential integrity. Or in the few spots they do (like with financial transactions) it's implemented at the application level (via, say, optimistic locking), never the database level." This sounds to me like MySQL-ish. A large DB working with no RI would give me nightmares. Is it really true that large sites turn RI off to improve performance, or is that just a MySQL thing, where it gets turned off simply because MySQL allows you to turn it off to improve performance? Can you even turn RI off in PostgreSQL? Do Oracle, DB2 or MSSQL allow you to turn it off? Am I just being naive in thinking that everyone runs their DBs with RI in production? - Naz
[GENERAL] In theory question
This may be a question for -hackers, but I don't like disturbing them unnecessarily. I've been having a look at memcached. I would like to ask: is there any reason that, theoretically, a similar caching system could not be built right into the db serving daemon? I.e., the hash tables and libevent could sit on top of postmaster as an optional component, caching data on a per-query basis and only hitting the actual db in the event of a cache miss. Such a mechanism could a) be transparent to any and all APIs accessing the back end, thus instantly providing the benefits of caching to all apps transparently, and b) assist with replication by providing a way for a setup to have n serving daemons (effectively db caches) on different machines accessing n replicated DBs. Such a setup would be far easier to set up than n fully fledged DB servers, and would likely scale better anyway. Thoughts?
Re: [GENERAL] In theory question
Hannes Dorbath wrote:
I think this is close to what MySQL's query cache does. The question is whether this should be the job of the DBMS or of another layer. At least the pgmemcache author and I think that it's better done outside the DBMS. See http://people.FreeBSD.org/~seanc/pgmemcache/pgmemcache.pdf for the idea.

This is exactly what I was asking about. So my theoretical idea has already been implemented. Now if only *all* my ideas were done for me by the time I came up with them :)
Re: [GENERAL] In theory question
This is exactly what I was asking about. So my theoretical idea has already been implemented. Now if only *all* my ideas were done for me by the time I came up with them :)

Then you wouldn't be able to eventually patent them ;)

What an un-BSD-licensish thing to say :P
Re: [GENERAL] In theory question
I have always found MySQL's query cache to be utterly useless. Think about it this way: it only works for tables that seldom change, and it does not work for big tables (like the posts table of a forum) because the cache would have to be huge. So, the most frequent usage of MySQL's query cache is for dumb applications which use, for instance, PHP, store their configuration in MySQL, and reload it on each and every page with a SELECT * FROM configuration. In this case, you save the query time, but you don't save the roundtrip between PHP and the database, extracting query results, building objects from them, time spent in ORMs, etc. A much better solution is to do your own caching, for instance using shared memory in the application server, so that you cache native-language objects. You not only save the query time, but also all the time spent building those objects on every page load.

This was actually my original question. In my web app, I cache the config, user preferences and other static material in session vars, and only rerun the function that fetches them if the app ever changes them. If you are clever about db fetches in this way and store as much as possible in session vars (which is just like storing it in a memcached instance, really), is there much, or even any, benefit in running memcached?
[GENERAL] Update violating constraint
Hi, I'm trying to do an update on a table that has a unique constraint on a field. I need to update the table by setting field = field + 1; however, if the update does not touch the rows in a convenient order (from last to first), it will violate the index *during* the update, even though the table would be consistent after the update completes. So the update fails. How do I get around this without removing the constraint? - Naz.
Re: [GENERAL] Update violating constraint
Michael Glaesemann wrote:
On May 2, 2007, at 23:01, Naz Gassiep wrote:
I'm trying to do an update on a table that has a unique constraint on the field. I need to update the table by setting field = field + 1 [...] the update will cause a violation of the index *during* the update even though the table would be consistent after the update completes.

If the field's values are all positive, I generally do it in two steps:
update foo set field = -1 * (field + 1);
update foo set field = -1 * field where field < 0;

Another way to do it is to add and then remove a large offset:
update foo set field = 100000 + (field + 1);
update foo set field = field - 100000 where field > 100000;

Yes, in fact I already use option one in my handling of SQL trees, so I'm annoyed with myself for not figuring that out. I don't know why you'd ever use the second option, though, as it virtually guarantees problems at some random point in your DB's growth. Thanks muchly for that!
Re: [GENERAL] Storing blobs in PG DB
This doesn't answer your question, but I thought I'd throw my opinion in anyway. My personal view is that, in general, binary files have no place in databases. Filesystems are for files; databases are for data. My design choice is to store the files in a filesystem and use the database to hold metadata as well as a pointer to the file. If you *must* put files into the database, then you can do so, and PG will handle that many files of those sizes with ease. For all intents and purposes, PG can store an unlimited number of files. You're far more likely to run into walls in the form of limitations in your disk I/O system than limitations in what PG will handle. - Naz.

Nikolay Moskvichev wrote:
Hi All! Question is: how suitable is PG for storing about 2,000,000 binary files of 0.5-2.0 MB each? It is not planned to have a big number of clients or a lot of updates. Like a photo album on a local host.
Re: [GENERAL] Lifecycle of PostgreSQL releases
Tom Lane wrote:
Naz Gassiep <[EMAIL PROTECTED]> writes:
Joshua D. Drake wrote: Example discussion with customer: ... Finally, in the absence of security concerns or performance issues (and I mean the "we can't afford to buy better hardware" edge-of-the-envelope type issues) there is zero *need* to upgrade.

This line of argument ignores the fact that newer versions often contain fixes for data-loss-grade bugs. Now admittedly that is usually an argument for updating to x.y.z+1 rather than x.y+1, but I think it destroys any reasoning on the basis of "if it ain't broke".

Not when you consider that I did say "in the absence of security concerns". I consider the possibility that a bug can cause me to lose my data to be a security concern. If it's a cosmetic bug, or something that otherwise does not affect a feature I use, then upgrading is, as you say, very much a "wait for x.y+1" matter rather than upgrading minor releases, sometimes multiple times a month. It must be remembered that human error during an upgrade can result in downtime, which can cost money. Therefore it's a "foo risk vs bar risk" type balance. At least, that's how I see it.
[GENERAL] Bulk Data Entry
This problem is to do with bulk loading of data. I use the following scripts to take data from a live DB and put it into a testing DB with the current version of the schema:

# SCRIPT 1
pg_dump blogbog -a -D -f blogbog_data.sql
dropdb blogbogtemp
createdb blogbogtemp
psql blogbogtemp -f /www/htdocs/mrnaz.com/sql_tables.sql
pg_dump blogbogtemp -D -f blogbog_tables.sql
cp blogbog_tables.sql blogbog_constraints.sql

I edit the blogbog_tables.sql file to remove the constraints and the blogbog_constraints.sql file to remove the tables. I then run the following script:

# SCRIPT 2
dropdb blogbogdev
createdb blogbogdev
psql blogbogdev -f ./blogbog_tables.sql > ./blogbog_tables_inserted.log
psql blogbogdev -f ./blogbog_data.sql > ./blogbog_data_inserted.log
psql blogbogdev -f ./blogbog_constraints.sql > ./blogbog_constraints_applied.log

Somewhere in the insertion of the data (4th line of script 2) there is a failure, as no data appears in the blogbogdev database. This is likely due to a column in the live data somewhere that has been deprecated from the schema, causing an insert failure on a table, causing failure on all tables referring to it, and so on, cascading down the reference paths. I really would prefer psql to halt on error instead of just continuing to plow right ahead, but IIRC there was a discussion about this and it was decided that continuing was the best behavior. I have grepped the .log files that the script outputs for ERROR, but there are none. How would one go about finding where the error in an SQL script is?
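Two things worth knowing about psql here. First, error messages go to stderr, so "psql -f file.sql > file.log" produces a log that will never contain ERROR; appending 2>&1 to each line captures them. Second, psql does have a halt-on-error switch, the ON_ERROR_STOP variable, which can be set at the top of each .sql file:

\set ON_ERROR_STOP on
-- the SQL statements follow; the first error now stops the script,
-- and psql exits non-zero so the wrapper script can detect the failure

It can equally be passed on the command line with -v ON_ERROR_STOP=1.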
Re: [GENERAL] Lifecycle of PostgreSQL releases
Joshua D. Drake wrote:
Example discussion with customer:
Customer: CMD, should we update to 8.2.3?
CMD: Is there something in 8.2.3 that will benefit you?
Customer: We don't know.
CMD: Are you having problems with 8.1? (We try to push all customers to at least 8.1.)
Customer: No, it is just that 8.2 is the current release.
CMD: True, but 8.3 is due out in the summer and 8.3 is a standard cycle release.
Customer: Oh... o.k., let's wait.
CMD: I think that is probably prudent.

That's how it is with me. I upgraded to 8.1 from 7.4 because there was nothing in 8.0 that I *needed*, and performance was already more than sufficient on my ridiculous-overkill hardware. I recently upgraded from 8.1.x to 8.2.3 only because of the DST updates in Western Australia. I would not have otherwise. If it ain't broke, don't fix it. Furthermore, upgrading is inherently risky. There is always the chance of human-error-induced downtime, so doing it "just coz" is not a prudent policy. Finally, in the absence of security concerns or performance issues (and I mean the "we can't afford to buy better hardware" edge-of-the-envelope type issues) there is zero *need* to upgrade. Sure, it may be better to use a new and shiny version; however, I always favor a realistic and honest assessment of *needs* over *perceived needs*. All that being said, the older the version you are running, the higher the weight that should be attributed to the "upgrading is a good idea just coz" argument. After a point, upgrading is a good idea "just coz". I wouldn't recommend anyone continue to run 7.2.x merely because it was working for them. Just my 2c (adjusted for inflation).
Re: [GENERAL] Postgres Database size
I have been looking for such a function. Having just upgraded to 8.2, this function is a very welcome addition to my arsenal of tools. Many thanks! - Naz.

Reece Hart wrote:
On Sun, 2007-03-18 at 00:40 +0530, Mageshwaran wrote:
how to find the size of a particular database in postgres...

The old way was to use du or similar. Recent versions (I believe >= 8.1, but check the release notes to be sure) provide several useful functions for this:

pg_column_size
pg_database_size
pg_relation_size
pg_size_pretty
pg_tablespace_size
pg_total_relation_size

For example:

[EMAIL PROTECTED] select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;

           datname           | pg_size_pretty
-----------------------------+----------------
 postgres                    | 3905 kB
 csb                         | 113 GB
 template0                   | 3840 kB
 csb-dev                     | 124 GB
 csb-dev-snapshot-2007-03-08 | 123 GB
 csb_02_02_2007              | 121 GB
 template1                   | 3840 kB
[GENERAL] Design / Implementation problem
This is possibly not a DB-only problem; the solution may involve application logic as well. But PG users are the smartest bunch I know. Ass kissing aside, here are the details:

*** The Scenario ***

We are running a customer loyalty program whereby customers earn points for purchasing products. Each product has a value of points that are earned by purchasing it, and a value of points required to redeem it. In order to prevent customers from stockpiling points, we want to place an expiry date on points so that unused points expire and are lost if they are not redeemed within a certain period of time. This will be calculated on a FIFO basis, i.e., the oldest points will expire first. We will assume the expiry period is 12 months.

*** The Problem ***

Ascertaining which points to expire is fairly conceptually simple. At any given point in time, the points expired is simply the balance on hand at the start of the period, less redemptions in that period. If the redemptions are less than the balance at open, not all points that were available on that date were used, and the difference is the expiry. This could be done periodically, say, at the start of every month. However, there are a few problems with doing it periodically:

1. The runs are likely to be too large to be manageable. A DB with tens of thousands of customers and many hundreds of thousands or even millions of sales in the records tables will require several queries and some application calculation to compute. If it takes 2 seconds to compute each balance of a 20,000-strong customer base, that's over 11 hours of heavy lifting in the DB, which will likely result in severely degraded performance during those hours. This problem can only get worse as time goes on, and hardware upgrade requirements just to accommodate a 12-hour window once a month are the sign of an app not designed to scale well.

2. Calculating the balance on the fly would be more effective, as it is unlikely that many customers will check their balance on a regular basis. It is likely that only a small fraction of customers will check their balance in a given month, meaning that calculating it on the fly would both spread the load over time and reduce the total load, even if on-the-fly calculation results in significantly higher per-customer calculation time.

3. The app is a web app, and it would be preferable to contain business logic within the database itself or the current app codebase. Spreading application logic into an external mechanism such as cron or an external daemon would be undesirable unless there was no other way.

*** A Possible Solution ***

Calculating the balance on the fly can be done easily if it is done at the time the customer seeks to redeem a voucher. Expired points are only relevant at these times, and so the expired points would be calculated with an application function that did the following:

1. Get the balance as it was 12 months ago by taking total points earned, less redemptions and expiries, up to that date.
2. Subtract from it redemptions and expiries since then. The value obtained, if it is positive, is the value of points to expire.
3. Log the expiry entry, and then calculate the balance of points to the current date by subtracting total points redeemed and expired from total points earned.

This procedure has a few practical problems, however:

1. Customers, when they view their running total, will not be aware that some of the points included in it will have expired, as the expiry will only happen when the application attempts to log a redemption.
2. A customer may attempt to redeem a product they do not have enough points for, and be told at the last minute that they do not have enough points, leading to acrimony.

The solution is then to calculate it not only on redemptions, but also whenever the customer attempts to view their balance. This will ensure that expired points are never shown in the current balance of available points. This, however, means the calculation may be done many times by each customer in a single catalog browsing session.

*** The Question ***

Is there a way to design the DB schema, as well as the query, in such a manner that calculating the point balance on the fly is not an unfeasibly heavy calculation to be done at every page view? This problem does not appear to be solved comprehensively by anyone. When I log into my credit card company's web site to check my points, I get a "please come back in an hour, and your points will be calculated" message if I haven't logged in for over a week. So obviously they calculate the balance and put it in a table that acts as a cached balance. Emirates has a different solution: they do bi-annual runs, so points expire every March and September. Neither of these solutions appeals to me, and there must be A Better Way(tm).
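To make the arithmetic concrete, here is a minimal sketch of steps 1 and 2 as a single query, assuming a hypothetical ledger table points(custid, amount, entered) where earnings are positive rows and redemptions/expiries are negative rows:

SELECT GREATEST(0,
    COALESCE(SUM(CASE WHEN amount > 0
                       AND entered < now() - interval '12 months'
                      THEN amount ELSE 0 END), 0)   -- earned up to 12 months ago
  - COALESCE(SUM(CASE WHEN amount < 0
                      THEN -amount ELSE 0 END), 0)  -- all redemptions and expiries to date
) AS points_to_expire
FROM points
WHERE custid = 42;

This collapses "balance as at T, less consumption since T" into earned-before-T minus all-consumption-to-date, clamped at zero. With an index on (custid, entered) it is one index scan per customer, which is the kind of cost that can plausibly run on every page view.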
Re: [GENERAL] Design / Implementation problem
Here it is again with more sensible wrapping:

*** The Scenario ***

We are running a customer loyalty program whereby customers earn points for purchasing products. Each product has a value of points that are earned by purchasing it, and a value of points required to redeem it. In order to prevent customers from stockpiling points, we want to place an expiry date on points, so that unused points are lost if they are not redeemed within a certain period of time. Expiry will be calculated on a FIFO basis, i.e., the oldest points will expire first. We will assume the expiry period is 12 months.

*** The Problem ***

Ascertaining which points to expire is conceptually fairly simple. At any given point in time, the points expired is simply the balance on hand at the start of the period, less redemptions in that period. If redemptions are less than the opening balance, not all points that were available on that date were used, and the difference is the expiry. This can be done periodically, say, at the start of every month. However, there are a few problems with doing it periodically:

1. The runs are likely to be too large to be manageable. A DB with tens of thousands of customers and many hundreds of thousands or even millions of sales in the records tables will require several queries and some application calculation to compute. If it takes 2 seconds to compute each balance of a 20,000-strong customer base, that's over 11 hours of heavy lifting in the DB, which will likely result in severely degraded performance during those hours. This problem can only get worse as time goes on, and a hardware upgrade requirement just to accommodate a 12-hour window once a month is the sign of an app not designed to scale well.

2. Calculating the balance on the fly would be more effective, as it is unlikely that many customers will check their balance on a regular basis. It is likely that only a small fraction of customers will check their balance in a given month, meaning that calculating it on the fly would both spread the load over time and reduce the total load, even if on-the-fly calculation results in significantly higher per-customer calculation time.

3. The app is a web app, and it would be preferable to contain business logic within the database itself or the current app codebase. Spreading application logic into an external mechanism such as cron or an external daemon would be undesirable unless there was no other way.

*** A Possible Solution ***

Calculating the balance on the fly can be done easily if it is done at the time the customer seeks to redeem a voucher. Expired points are only relevant at these times, and so the expired points would be calculated with an application function that did the following:

1. Get the balance as it was 12 months ago by taking total points earned, less redemptions and expiries, up to that date.
2. Subtract from it redemptions and expiries since then. The value obtained, if it is positive, is the value of points to expire.
3. Log the expiry entry, and then calculate the balance of points to the current date by subtracting total points redeemed and expired from total points earned.

This procedure has a few practical problems, however:

1. Customers, when they view their running total, will not be aware that some of the points included in it will have expired, as the expiry will only happen when the application attempts to log a redemption.

2. A customer may attempt to redeem a product they do not have enough points for, and be told at the last minute that they do not have enough points, leading to acrimony.

The solution is then to calculate it not only on redemptions, but also whenever the customer attempts to view their balance. This will ensure that expired points are never shown in the current balance of available points. This, however, means the calculation may be done many times by each customer in a single catalog browsing session.

*** The Question ***

Is there a way to design the DB schema, as well as the query, in such a manner that calculating the point balance on the fly is not an unfeasibly heavy calculation to be done at every page view? This problem does not appear to be solved comprehensively by anyone. When I log into my credit card company's web site to check my points, I get a message along the lines of "please come back in an hour, and your points will be calculated" if I haven't logged in for over a week. So obviously they calculate the balance and put it in a table that acts as a cached balance. Emirates has a different solution: they do bi-annual runs, so points expire every March and September. Neither of these solutions appeals, and there must be A Better Way(tm). ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
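For illustration, steps 1 and 2 above can collapse into a single indexed query. The sketch below is not from the thread: the points_ledger table, its columns, and the customer id 42 are all invented for the example, and it assumes earns are stored as positive rows while redemptions and expiries are stored as negative rows.

-- Hypothetical single-ledger design: one row per earn, redemption, or expiry.
CREATE TABLE points_ledger (
    entryid serial PRIMARY KEY,
    custid  integer NOT NULL,
    entered timestamp with time zone NOT NULL DEFAULT now(),
    points  integer NOT NULL   -- positive = earn, negative = redeem/expire
);
CREATE INDEX points_ledger_cust_entered ON points_ledger (custid, entered);

-- Points to expire right now for one customer:
--   opening balance 12 months ago (all rows before the cutoff)
-- + redemptions/expiries since    (negative rows after the cutoff),
-- clamped at zero.
SELECT GREATEST(0,
         COALESCE(SUM(CASE WHEN entered <  now() - interval '12 months'
                           THEN points END), 0)
       + COALESCE(SUM(CASE WHEN entered >= now() - interval '12 months'
                            AND points < 0
                           THEN points END), 0)) AS points_to_expire
FROM points_ledger
WHERE custid = 42;

Because the index restricts the scan to one customer's rows, running this at every balance view stays cheap, and logging the result as another negative ledger row (step 3) keeps subsequent balance checks to a simple SUM over the same index.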
[GENERAL] Query Assistance
Is anyone able to tell me why, in the last column of the returned result set, the value calculated is always 0?

QUERY:

SELECT products.productid, products.cost, products.srp,
       CASE WHEN products.srp > 0
            THEN (products.srp - products.cost) * 100 / products.srp
            ELSE 0 END AS margin,
       products.type, products.gstexempt,
       productpointvalues.earnvalue, productpointvalues.redeemvalue,
       productpointvalues.earnvalue / productpointvalues.redeemvalue AS redemptionmargin
FROM categories, products
     LEFT OUTER JOIN productpointvalues USING (productid)
WHERE products.active IS TRUE
  AND products.catid = categories.catid
  AND products.catid = 2
ORDER BY products.name;

RESULT SET:

 productid | cost  |  srp   | margin | type | gstexempt | earnvalue | redeemvalue | redemptionmargin
-----------+-------+--------+--------+------+-----------+-----------+-------------+------------------
       716 |  8.60 |  10.00 |    14. | N    | f         |        50 |        1500 |                0
        15 | 87.00 | 100.00 |    13. | N    | f         |       500 |           1 |                0
        13 | 26.10 |  30.00 |    13. | N    | f         |       150 |        3000 |                0
      1189 |  0.00 |  40.00 |   100. | N    | f         |       200 |        4000 |                0
        14 | 43.50 |  50.00 |    13. | N    | f         |       250 |        5000 |                0

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Query Assistance
Indeed. Thanks for that! I keep getting bitten by that too, hehe. - Naz.

William Garrison wrote: My guess is that integer division is to blame: 50 divided by 1500 = 0.03, which rounds to zero. You probably have to cast them to real before doing the division.

Naz Gassiep wrote: Is anyone able to tell me why, in the last column of the returned result set, the value calculated is always 0?

QUERY:

SELECT products.productid, products.cost, products.srp,
       CASE WHEN products.srp > 0
            THEN (products.srp - products.cost) * 100 / products.srp
            ELSE 0 END AS margin,
       products.type, products.gstexempt,
       productpointvalues.earnvalue, productpointvalues.redeemvalue,
       productpointvalues.earnvalue / productpointvalues.redeemvalue AS redemptionmargin
FROM categories, products
     LEFT OUTER JOIN productpointvalues USING (productid)
WHERE products.active IS TRUE
  AND products.catid = categories.catid
  AND products.catid = 2
ORDER BY products.name;

RESULT SET:

 productid | cost  |  srp   | margin | type | gstexempt | earnvalue | redeemvalue | redemptionmargin
-----------+-------+--------+--------+------+-----------+-----------+-------------+------------------
       716 |  8.60 |  10.00 |    14. | N    | f         |        50 |        1500 |                0
        15 | 87.00 | 100.00 |    13. | N    | f         |       500 |           1 |                0
        13 | 26.10 |  30.00 |    13. | N    | f         |       150 |        3000 |                0
      1189 |  0.00 |  40.00 |   100. | N    | f         |       200 |        4000 |                0
        14 | 43.50 |  50.00 |    13. | N    | f         |       250 |        5000 |                0

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
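To make the fix concrete, here is a minimal sketch of the cast William suggests. The choice of numeric rather than real is mine, not from the thread; numeric keeps exact decimals.

SELECT 50 / 1500;           -- integer / integer: 0 (the fraction is truncated)
SELECT 50::numeric / 1500;  -- numeric division: 0.0333...

-- Applied to the column expression from the query above:
SELECT productpointvalues.earnvalue::numeric
       / productpointvalues.redeemvalue AS redemptionmargin
FROM productpointvalues;

Casting either operand is enough; once one side is numeric, the division is done in numeric arithmetic.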
[GENERAL] Query timing
Using EXPLAIN ANALYZE I can get the execution time of a query. Is there a command I can use to get the execution time without the planning information? I just need to time lots of queries that have complex plans, and it'd be easier if I didn't have pages and pages of planning info between tries. The queries themselves are one line each, but refer to views, which makes the plans pretty complex. Being able to see several timings in one terminal window would be easier. Thanks, - Naz. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Query timing
That's not quite as fast as I would like; it throws a few more steps into the development process. However, if there is no way around it, I will persevere with the method I have now. Thanks, - Naz.

Jorge Godoy wrote: Naz Gassiep [EMAIL PROTECTED] writes: Using EXPLAIN ANALYZE I can get the execution time of a query. Is there a command I can use to get the execution time without the planning information? I just need to time lots of queries that have complex plans, and it'd be easier if I didn't have pages and pages of planning info between tries. The queries themselves are one line each, but refer to views, which makes the plans pretty complex. Being able to see several timings in one terminal window would be easier. Write the queries in a file, run them with psql -f and grep for the desired lines only. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
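For the record, psql's \timing meta-command is another option: it prints a single "Time: ... ms" line after each statement and no plan at all, so several runs stay visible in one window. A sketch (the file and view names are made up):

-- queries.sql, run with:  psql -f queries.sql yourdb
\timing
SELECT count(*) FROM complex_view_a;
SELECT count(*) FROM complex_view_b;

One caveat: \timing measures the round trip as seen by the client, including network latency, whereas EXPLAIN ANALYZE reports time spent inside the server.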
Re: [GENERAL] Select retrieval slowdown after db drop/reload. Suggestions?
You have to run ANALYZE; on your db after a drop/reload to recollect the stats. In the test db, just run ANALYZE; and then see how fast it is. I'd guess that this is your issue. Regards, - Naz.

Andrew Edson wrote: I have a select statement, used in a Perl program, which is supposed to find all records related to those in one table which have a delete_dt field value of four years or older. This is the select statement:

SELECT t2.dist_id, t1.clnt_seq, t2.cntrct_seq, t2.cntrct_id, t3.aunit_seq,
       t1.person_seq, t1.addr_seq, t3.addr_seq,
       t4.frst_nm || ' ' || t4.lst_nm AS name, t5.addr_1, t6.acntdel_dt
FROM t1, t2, t3, t4, t5, t6
WHERE t1.clnt_seq = t2.clnt_seq
  AND t2.cntrct_seq = t3.cntrct_seq
  AND t3.aunit_seq = t6.aunit_seq
  AND t1.person_seq = t4.person_seq
  AND t3.addr_seq = t5.addr_seq
  AND t1.active_flg = 0
  AND t2.active_flg = 0
  AND t3.active_flg = 0
  AND t6.acntdel_dt < now() - '4 years'::interval
ORDER BY t2.cntrct_id ASC;

I'm working on a test box at the moment; the db I am using was made by dumping the production db and copying it over to the test box to be loaded into a newly-created db there. It took a while for me to get the original Perl program working, as I don't really understand Perl, but after I did so, I dropped the db and reloaded it again off of the original files, so I could try another test run and pay attention to what's happening. On the original load of the test db, the query above had a run time of roughly 3 to 3.5 minutes before giving results. Considering the size of the db it's searching through, I feel that's fairly reasonable, especially since that's about what the production db does on the same query. Now, after the drop/recreate, the test db is taking somewhat longer to give back its results; just shy of 7 minutes, if I've done the math correctly. (Timing results - Time: 417531.436 ms) I'm the only person working on this particular box at this point. This problem did not start until I reloaded the db from the original files. Obviously, I managed to miss something in the drop/reload process, but I have no clue what. I'm running a vacuum full analyze at the moment; if anyone has any other suggestions as to what I could do to solve this (admittedly minor) problem, I would be grateful to hear them. Thank you for your consideration.
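A minimal sketch of the sequence (the table name t6 is borrowed from the query above; any table from the reload works the same way):

-- After any dump/reload, repopulate the planner's statistics:
ANALYZE;              -- whole database
ANALYZE VERBOSE t6;   -- or one table at a time, with progress output

-- VACUUM ANALYZE; does the vacuum and the stats collection in one pass,
-- which is what the "vacuum full analyze" already under way will achieve.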
[GENERAL] max_fsm_pages
I just did a vacuum analyze and I got a message I've never seen before:

conwatchlive=# vacuum analyze;
NOTICE: number of page slots needed (27056) exceeds max_fsm_pages (2)
HINT: Consider increasing the configuration parameter max_fsm_pages to a value over 27056.
VACUUM
conwatchlive=#

What does this mean? I assume it has nothing to do with the Flying Spaghetti Monster. More generally, I am a novice at administering a PostgreSQL database; is there a list of tips somewhere that I can read to improve performance? ---(end of broadcast)--- TIP 6: explain analyze is your friend
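In case it helps anyone searching the archives: the free space map is where VACUUM records pages with reusable room, and when it is too small, free space gets forgotten and tables bloat. The setting lives in postgresql.conf and, because it sizes shared memory, only takes effect after a server restart. A sketch, where 40000 is just an example comfortably above the reported figure; these parameters exist only in 8.3 and earlier, as the FSM was reworked and made automatic in 8.4:

# postgresql.conf (PostgreSQL 8.3 and earlier)
max_fsm_pages = 40000     # must exceed the 27056 slots VACUUM reported
max_fsm_relations = 1000  # raise this too if you have very many tables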
[GENERAL] Ghost open transaction
I was performing a bunch of INSERTs, inside a transaction, into a table, users, that has a unique index on username. Partway through, my internet connection dropped. Once I had manhandled my DSL router back online, I went back into a console to redo the inserts. I found that after I did the first insert, it appeared to freeze. I thought that my net had dropped out again, but I was able to Ctrl+C the command, roll back, and do it again, with the same result. The previous connection is obviously still active, blocking my insert until the fate of the previous insert with that username is known, i.e., until the ghost connection rolls back or commits. How do I determine which connection is the ghost connection, and how do I tell PG to kill it? Also, is the timeout on open but inactive connections determined by an OS setting or a PG setting? - Naz. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
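One way to hunt the ghost down, sketched against the 8.x catalogs (the pid 12345 is hypothetical; on 8.x the columns are procpid and current_query, which later releases renamed to pid and query):

-- See every backend and what it is doing:
SELECT procpid, usename, current_query, query_start
FROM pg_stat_activity
ORDER BY query_start;

-- Ask the stuck backend to cancel its current statement:
SELECT pg_cancel_backend(12345);

-- If the backend is idle in an open transaction, cancelling the query
-- won't end the transaction; send the backend process a SIGTERM from
-- the shell instead:  kill 12345   (never kill -9)

As for the timeout question: detecting a dead peer is TCP keepalive behaviour, which is an OS setting, though from 8.1 on the server also exposes tcp_keepalives_idle and related parameters in postgresql.conf.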
Re: [GENERAL] vista
It's the folks who think that non-Windows-using developers should care about Vista that bug me. This is open-source code, people. Scratch your own itch.

The "scratch your own itch" line can only be pushed so far if it is being said by a developer who works on a project that desires to be taken seriously by professionals in industry. For minor features, yes, perhaps it could be argued that the core team could ignore certain issues and just wait for a patch. For something like Vista compatibility, if you want to be taken seriously by anyone who uses Windows (hands up anyone who knows a Windows user), "scratch your own itch" is not really going to cut it, IMHO. I'm used to getting that line when talking to obscure two-developer projects that have a userbase of half a dozen, but for a project like PostgreSQL, the "they tell you to do it yourself" brush is one we do NOT want to get tarred with. If we don't have the resources to cope with a Vista port immediately, then so be it. If it's low priority, so be it. However, let's not appear to deride as unnecessary that which we cannot immediately provide a solution to. That's small-time-project mentality. - Naz. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] vista
So what do you suggest? -core vote and order someone to do the work? Postgresql.org isn't a business and doesn't employ any developers - we only have the option of accepting patches from people/companies with itches.

I don't suggest any change to the structures in place; it's a purely PR point. Even in the absence of any progress on that item, a statement like "that's important and we acknowledge the need" sounds better to PHBs than "if you need it, submit a patch". Regards, - Naz. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] vista
So... If you're not a PostgreSQL Win32 port dev, and *don't know* what they're up to as far as Vista, why respond to the Q? Or why respond "fix it yourself" instead of "ask this guy", "nobody here will know yet", or "post your query on -ports or -hackers"?

Precisely. My point is not that people *should* care about Win32, or that some coercive system should be put into place to force devs onto RFEs, or even that we should commit to having something ready at all, ever. It's a point of apparent attitude. Responding with a useless answer is *worse* than simply ignoring the question. I am not a Win32 user (at least not on servers), but if I wanted to know whether Vista compatibility was being worked on, getting "if you want it, go do it" would be even less useful than an effort to convince me to run the DB on a *nix back end. I'd rather have someone tell me how and why to migrate to a better OS.

Frankly, I too could care less about PG on Vista. Longhorn isn't due until Vista SP1, so PG support has a long time to go before it's a real concern. But then I didn't try to answer the question.

Same here. It's a matter of the apparent attitude of the community to outsiders or new users. Even with something like future Win32 support, I'd rather see people being told "that issue is not important to our project because x, y and z" than "go do it yourself". Anyway, I seem to have kicked up a bit of a hornet's nest here, so I'll shut up now. - Naz. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[GENERAL] Changing schemas
Is there a way to change the schema that all objects are in? Essentially I want to move everything currently in the database into public rather than having the complex schemas that I have at the moment. They are unnecessary and the DB complexity is trivial, so using schema partitioning is more trouble than it is worth. Is there a way I can do this? I tried editing a dump, but that has proven to be a little difficult, as I am not a master of regex. If I can do it from within the DB before I do the dump that'd be easier. Is there a way to do this from within the DB? Thanks, - Naz. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
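For what it's worth, 8.1 and later can do this from within the DB, without touching a dump. A sketch, assuming a schema named myschema (the name is made up, and sequences, views, and other objects need the equivalent ALTER ... SET SCHEMA treatment):

-- Move a single table:
ALTER TABLE myschema.mytable SET SCHEMA public;

-- Generate the statement for every table in the schema,
-- then paste psql's output back in and run it:
SELECT 'ALTER TABLE myschema.' || quote_ident(tablename) || ' SET SCHEMA public;'
FROM pg_tables
WHERE schemaname = 'myschema';

Once everything has been moved, DROP SCHEMA myschema; removes the now-empty schema.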
[GENERAL] Atomicity?
I am getting an error that I think I understand, but that I didn't think should happen. Below is the output from psql that I am getting to trigger this error. If the violation of the constraint really is being caused WITHIN the query, doesn't that violate the principle of atomicity? I.e., operations and entities should be considered a single entire construct rather than a collection of smaller, discrete parts. Or do I have my understanding all wrong? In any case, how do I get around this problem? Regards, - Naz.

conwatch=# \d replies
                              Table "conwatch.replies"
  Column   |           Type           |                        Modifiers
-----------+--------------------------+-----------------------------------------------------------
 replyid   | integer                  | not null default nextval('replies_replyid_seq'::regclass)
 postid    | integer                  | not null
 lft       | smallint                 | not null
 rgt       | smallint                 | not null
 poster    | integer                  | not null
 posted    | timestamp with time zone | not null default now()
 title     | character varying(100)   | not null
 body      | text                     |
 anonymous | boolean                  | not null default false
Indexes:
    "replies_pkey" PRIMARY KEY, btree (replyid)
    "replies_lft_postid" UNIQUE, btree (lft, postid)
    "replies_rgt_postid" UNIQUE, btree (rgt, postid)
    "replies_lft_index" btree (lft)
    "replies_rgt_index" btree (rgt)
Foreign-key constraints:
    "replies_poster_fkey" FOREIGN KEY (poster) REFERENCES users(userid)
    "replies_postid_fkey" FOREIGN KEY (postid) REFERENCES posts(postid)

conwatch=# select replyid, postid, lft, rgt, title from replies where postid = 18 order by lft;
 replyid | postid | lft | rgt |        title
---------+--------+-----+-----+----------------------
      24 |     18 |   1 |  14 | Invisible root post.
      25 |     18 |   2 |   7 | Re: Pronto
      26 |     18 |   3 |   6 | Re: Pronto
      27 |     18 |   4 |   5 | Re: Pronto
      29 |     18 |   8 |  13 | Re: Pronto
      31 |     18 |   9 |  12 | Re: Pronto
      32 |     18 |  10 |  11 | Re: Pronto
(7 rows)

conwatch=# UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt >= 11;
ERROR: duplicate key violates unique constraint "replies_rgt_postid"
conwatch=# UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt = 14;
UPDATE 1
conwatch=# UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt = 13;
UPDATE 1
conwatch=# UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt = 12;
UPDATE 1
conwatch=# UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt = 11;
UPDATE 1
conwatch=#

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Atomicity?
Peter Eisentraut wrote: Naz Gassiep wrote: If the violation of the constraint really is being caused WITHIN the query, doesn't that violate the principle of atomicity? I.e., operations and entities should be considered a single entire construct rather than a collection of smaller, discrete parts.

The principle of atomicity merely says that transactions are either performed entirely or not at all. If the transaction is not performed, then there is no violation of atomicity.

conwatch=# UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt >= 11;
ERROR: duplicate key violates unique constraint "replies_rgt_postid"

This is a well-known deficiency in PostgreSQL. You will have to work around it somehow (by changing the query, the schema, or the index).

Do we have an ETA on fixing it? Or is it a long-term outstanding issue with no ETA as yet? Thanks for the reply, - Naz
Re: [GENERAL] Atomicity?
No, the subsequent UPDATEs were just there to show you they worked... I was only interested in the failed update, and why it failed. The DB was consistent before the query, and it would have been after the query, so I did not understand why the query failed, unless the query made the DB inconsistent at some point DURING its execution. This seems odd to me, as queries should not trigger errors like that if the DB is only out of consistency DURING execution, as long as it is consistent before and after. Regards, - Naz.

Joshua D. Drake wrote: Naz Gassiep wrote: I am getting an error that I think I understand, but that I didn't think should happen. [...] In any case, how do I get around this problem?

If you do not specify the beginning of a transaction, all statements are run within their own transaction, e.g. your example actually means:

begin;
UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt >= 11;
commit;
begin;
UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt = 14;
commit;

What you want is:

begin;
UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt = 14;
UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt = 13;
UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt = 12;
UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt = 11;
commit;

Joshua D. Drake ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Atomicity?
I would like more information on this deficiency and what causes it, so I know when to anticipate it. This resulted in a rather nasty bug which took me ages to track down. Is anyone able and willing to explain a little here, or should I ask on -hackers? Regards, - Naz.

Michael Glaesemann wrote: On Aug 29, 2006, at 4:46, Peter Eisentraut wrote: Naz Gassiep wrote: conwatch=# UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt >= 11; ERROR: duplicate key violates unique constraint "replies_rgt_postid"

This is a well-known deficiency in PostgreSQL. You will have to work around it somehow (by changing the query, the schema, or the index).

One such workaround is:

BEGIN;
UPDATE replies SET rgt = -1 * (rgt + 2) WHERE postid = 18 AND rgt >= 11;
UPDATE replies SET rgt = -1 * rgt WHERE rgt < 0;
COMMIT;

Michael Glaesemann grzm seespotcode net ---(end of broadcast)--- TIP 6: explain analyze is your friend
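For anyone finding this thread in the archives: the root cause is that a plain unique index is checked row by row as the UPDATE proceeds, not once at the end of the statement, so a transient duplicate mid-statement raises an error even when the final state would be consistent. PostgreSQL 9.0, long after this thread, added deferrable unique constraints, which move the check to end of statement or end of transaction. A sketch, with a hypothetical constraint name, assuming the old non-deferrable index has been dropped first:

-- Requires PostgreSQL 9.0 or later.
ALTER TABLE replies
    ADD CONSTRAINT replies_rgt_postid_def
    UNIQUE (rgt, postid) DEFERRABLE INITIALLY IMMEDIATE;

-- INITIALLY IMMEDIATE already checks at end of statement, so the
-- original bulk update succeeds as-is; deferring to commit also works:
BEGIN;
SET CONSTRAINTS replies_rgt_postid_def DEFERRED;
UPDATE replies SET rgt = rgt + 2 WHERE postid = 18 AND rgt >= 11;
COMMIT;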