Re: [GENERAL] Changing collate & ctype for an existing database
On 07/12/2017 11:25 PM, Tom Lane wrote:
> rihad writes:
>> What if only English letters are used in the textual indices (ascii
>> 0-127), would they still be impacted after the datctype & datcollate
>> "C" -> "en_US.UTF-8" change?
>
> Yes, as even minimal testing would have told you. C sort order is
> more case-sensitive, for instance.
>
> regards, tom lane

Btw, can we wrap the update of datcollate & datctype and the rebuilding of textual indices inside a transaction, with effectively zero downtime?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
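[Editor's sketch, with invented table names, of the mechanics being asked about. One caveat: each backend reads the database's collation fields at connection start, so an index rebuilt from the same session that issued the catalog UPDATE may still be built with the old collation. That makes this two steps from two connections, not one lock-free transaction — and REINDEX locks each table against writes while it runs, so downtime is not zero.]

```sql
-- Step 1 (superuser): flip the per-database locale fields.
UPDATE pg_database
   SET datcollate = 'en_US.UTF-8', datctype = 'en_US.UTF-8'
 WHERE datname = 'mydb';

-- Step 2, from a NEW connection to mydb: rebuild every index that
-- involves a collatable (text/varchar/char) column.
REINDEX TABLE my_table_with_text_columns;   -- repeat per affected table
```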
Re: [GENERAL] Changing collate & ctype for an existing database
On 07/12/2017 11:25 PM, Tom Lane wrote:
> rihad writes:
>> What if only English letters are used in the textual indices (ascii
>> 0-127), would they still be impacted after the datctype & datcollate
>> "C" -> "en_US.UTF-8" change?
>
> Yes, as even minimal testing would have told you. C sort order is
> more case-sensitive, for instance.
>
> regards, tom lane

Thanks. It would be great if initdb didn't assume an implicit encoding, to prevent such fundamental configuration mistakes in the future. More often than not, the collation/ctype settings of the ssh login session used to run initdb aren't what must be used to set up the cluster. It'd be great if initdb didn't go any further if not provided with an explicit encoding. The error message would require the user to think twice before proceeding, and to read up on the matter. Explicit is better than implicit, as the old saying goes :)
Re: [GENERAL] Changing collate & ctype for an existing database
On 07/12/2017 09:31 PM, Tom Lane wrote:
> rihad writes:
>> On 07/12/2017 01:54 PM, Albe Laurenz wrote:
>>> As you see, your index is still sorted according to the C collation
>>> and scanning it returns wrong results.
>> This ordering issue can certainly be classified as an inconsistency,
>> but nothing to lose sleep over. Is this all that is normally meant
>> when saying "index corruption"?
>
> Laurenz neglected to point out that if the index isn't sorted the way
> that the system assumes it is, then searches may fail to find values
> that are present (due to descending into the wrong subtree), and by
> the same token insertions may fail to enforce uniqueness. That's
> pretty corrupt in my book.
>
> regards, tom lane

What if only English letters are used in the textual indices (ascii 0-127), would they still be impacted after the datctype & datcollate "C" -> "en_US.UTF-8" change? The encoding has always been UTF8, btw.

    postgres=# \l
                      List of databases
     Name | Owner  | Encoding | Collate | Ctype | Access privileges
    ------+--------+----------+---------+-------+-------------------
     mydb | myuser | UTF8     | C       | C     |
Re: [GENERAL] Changing collate & ctype for an existing database
On 07/12/2017 09:31 PM, Tom Lane wrote:
> rihad writes:
>> On 07/12/2017 01:54 PM, Albe Laurenz wrote:
>>> As you see, your index is still sorted according to the C collation
>>> and scanning it returns wrong results.
>> This ordering issue can certainly be classified as an inconsistency,
>> but nothing to lose sleep over. Is this all that is normally meant
>> when saying "index corruption"?
>
> Laurenz neglected to point out that if the index isn't sorted the way
> that the system assumes it is, then searches may fail to find values
> that are present (due to descending into the wrong subtree), and by
> the same token insertions may fail to enforce uniqueness. That's
> pretty corrupt in my book.
>
> regards, tom lane

Wow. It sure is.
Re: [GENERAL] Changing collate & ctype for an existing database
On 07/12/2017 01:54 PM, Albe Laurenz wrote:
> rihad wrote:
>> Hi there. We have a working database that was unfortunately created
>> by initdb with the default ("C") collation & ctype. All other locale
>> specific settings have the value en_US.UTF-8 in postgresql.conf. The
>> database itself is multilingual and all its data is stored in UTF-8.
>> Sorting doesn't work correctly, though. To fix that, can I just do
>> this:
>>     update pg_database set datcollate='en_US.UTF-8',
>>     datctype='en_US.UTF-8' where datname='mydb';
>> This does seem to work on a testing copy of the database, i.e.
>> select lower('БлаБлаБла') now works correctly when connected to that
>> database. Is there still any chance of corrupting data by doing
>> this, or indexes stopping working etc? p.s.: postgres 9.6.3
>
> As explained, yes. Indexes on string columns will be corrupted.
> See this example:
>
> test=# CREATE DATABASE breakme LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
> test=# \c breakme
> breakme=# CREATE TABLE sort(id integer PRIMARY KEY, val text NOT NULL);
> breakme=# INSERT INTO sort VALUES (1, 'LITTLE'), (2, 'big'), (3, 'b-less');
> breakme=# CREATE INDEX ON sort(val);
> breakme=# SET enable_seqscan=off; -- force index use
> breakme=# SELECT * FROM sort ORDER BY val;
> ┌────┬────────┐
> │ id │  val   │
> ├────┼────────┤
> │  1 │ LITTLE │
> │  3 │ b-less │
> │  2 │ big    │
> └────┴────────┘
> (3 rows)
>
> breakme=# UPDATE pg_database SET datcollate='en_US.UTF-8',
>           datctype='en_US.UTF-8' WHERE datname='breakme';
> breakme=# \c breakme
> breakme=# SET enable_seqscan=off; -- force index use
> breakme=# SELECT * FROM sort ORDER BY val;
> ┌────┬────────┐
> │ id │  val   │
> ├────┼────────┤
> │  1 │ LITTLE │
> │  3 │ b-less │
> │  2 │ big    │
> └────┴────────┘
> (3 rows)
>
> breakme=# SET enable_seqscan=on; -- this and the following force a sequential scan
> breakme=# SET enable_bitmapscan=off;
> breakme=# SET enable_indexscan=off;
> breakme=# SET enable_indexonlyscan=off;
> breakme=# SELECT * FROM sort ORDER BY val; -- this returns the correct order
> ┌────┬────────┐
> │ id │  val   │
> ├────┼────────┤
> │  2 │ big    │
> │  3 │ b-less │
> │  1 │ LITTLE │
> └────┴────────┘
> (3 rows)
>
> As you see, your index is still sorted according to the C collation
> and scanning it returns wrong results.
>
> Yours,
> Laurenz Albe

This ordering issue can certainly be classified as an inconsistency, but nothing to lose sleep over. Is this all that is normally meant when saying "index corruption"? What about updating or deleting the wrong row addressed by the textual index that hasn't been rebuilt after the datcollate/datctype change, complete table/database corruption, or other scary night-time stories of this kind? Possible?
[GENERAL] Changing collate & ctype for an existing database
Hi there. We have a working database that was unfortunately created by initdb with the default ("C") collation & ctype. All other locale-specific settings have the value en_US.UTF-8 in postgresql.conf. The database itself is multilingual and all its data is stored in UTF-8. Sorting doesn't work correctly, though. To fix that, can I just do this:

    update pg_database set datcollate='en_US.UTF-8',
    datctype='en_US.UTF-8' where datname='mydb';

This does seem to work on a testing copy of the database, i.e. select lower('БлаБлаБла') now works correctly when connected to that database. Is there still any chance of corrupting data by doing this, or of indexes stopping working, etc.?

p.s.: postgres 9.6.3

Thanks.
Re: [GENERAL] Changing collate & ctype for an existing database
On 07/11/2017 12:45 AM, Tom Lane wrote:
> rihad writes:
>> On 07/10/2017 11:07 PM, Tom Lane wrote:
>>> ... which that isn't. I'd suggest checking for indexes that might
>>> need to be rebuilt with this query borrowed from the regression
>>> tests:
>> I ran the query on our production database. Zero results.
>
> Really? You have no indexes on textual columns? That seems surprising.

Oops, of course we do, around 10-15 per db. I was initially connected to the postgres database when I ran the query; I thought the query you gave me was global by looking at it. So, deciding NOT to reindex all of them risks the corruption of their relevant tables? It could be easier to simply drop and restore the db, albeit with some downtime. Thank you so much for your help.

>> Do I have the green light to set datcollate='en_US.UTF-8',
>> datctype='en_US.UTF-8'?
>
> Well, I'd double-check that result, but I suppose you can always
> reindex later if you find you missed something.
>
> regards, tom lane
Re: [GENERAL] Changing collate & ctype for an existing database
On 07/10/2017 11:07 PM, Tom Lane wrote:
> rihad writes:
>> On 07/10/2017 08:42 PM, Tom Lane wrote:
>>> No, your indexes on text/char/varchar columns will be corrupted
>>> (because their sort order will now be wrong). If you can reindex
>>> them before doing anything more with the database, you'd be ok ...
>>> I think. Testing on a scratch copy of the database would be a good
>>> idea, if this is valuable data.
>> Thank you, Tom. But can I still do it for the template1 database?
>>     update pg_database set datcollate='en_US.UTF-8',
>>     datctype='en_US.UTF-8' where datname='template1';
>
> It'd be safe to do it on template0, and also on template1 as long as
> that has only the original contents ...
>
>> It's empty, only hosting a few extensions.
>
> ... which that isn't. I'd suggest checking for indexes that might
> need to be rebuilt with this query borrowed from the regression
> tests:
>
> SELECT indexrelid::regclass, indrelid::regclass, iclass, icoll
> FROM (SELECT indexrelid, indrelid,
>              unnest(indclass) as iclass, unnest(indcollation) as icoll
>       FROM pg_index) ss
> WHERE icoll != 0 AND
>       iclass != (SELECT oid FROM pg_opclass
>                  WHERE opcname = 'text_pattern_ops'
>                    AND opcmethod = (SELECT oid FROM pg_am
>                                     WHERE amname = 'btree'));

I ran the query on our production database. Zero results. Do I have the green light to set datcollate='en_US.UTF-8', datctype='en_US.UTF-8' for all our working databases? :) Or for template0 & template1 only?
Re: [GENERAL] Changing collate & ctype for an existing database
On 07/10/2017 08:42 PM, Tom Lane wrote:
> rihad writes:
>> Hi there. We have a working database that was unfortunately created
>> by initdb with the default ("C") collation & ctype. All other locale
>> specific settings have the value en_US.UTF-8 in postgresql.conf. The
>> database itself is multilingual and all its data is stored in UTF-8.
>> Sorting doesn't work correctly, though. To fix that, can I just do
>> this:
>>     update pg_database set datcollate='en_US.UTF-8',
>>     datctype='en_US.UTF-8' where datname='mydb';
>
> No, your indexes on text/char/varchar columns will be corrupted
> (because their sort order will now be wrong). If you can reindex them
> before doing anything more with the database, you'd be ok ... I
> think. Testing on a scratch copy of the database would be a good
> idea, if this is valuable data.
>
> regards, tom lane

Thank you, Tom. But can I still do it for the template1 database?

    update pg_database set datcollate='en_US.UTF-8',
    datctype='en_US.UTF-8' where datname='template1';

It's empty, only hosting a few extensions. Now I can't even create a database having a different collation:

    $ createdb -O myuser --locale='en_US.UTF-8' mydb
    createdb: database creation failed: ERROR:  new collation (en_US.UTF-8) is incompatible with the collation of the template database (C)
    HINT:  Use the same collation as in the template database, or use template0 as template.
Re: [GENERAL] Fractions of seconds in timestamps
On 04/24/2012 07:51 PM, rihad wrote:
> As PostgreSQL stores timestamps with a fractional part, does it mean
> that WHERE f BETWEEN '2012-04-23 00:00:00' AND '2012-04-23 23:59:59'
> might miss records with values of f equal to 23:59:59.1234 or so?

Answering my own question: it depends on how the timestamp was defined at table creation time:

    time, timestamp, and interval accept an optional precision value p
    which specifies the number of fractional digits retained in the
    seconds field. By default, there is no explicit bound on precision.
    The allowed range of p is from 0 to 6 for the timestamp and
    interval types.

As the columns in question were explicitly created as timestamp(0) in the database, they store no fractional part, so no matter at which moment during the 59th second the column is updated, it's always exactly the 59th.
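[Editor's note: for columns that do keep fractional seconds, the usual fix is a half-open range (f >= day AND f < next_day) rather than BETWEEN. A small illustrative sketch of why BETWEEN misses such rows:]

```python
from datetime import datetime, timedelta

# A timestamp with fractional seconds, just inside April 23rd:
f = datetime(2012, 4, 23, 23, 59, 59, 123400)

day_start = datetime(2012, 4, 23)
day_end_inclusive = datetime(2012, 4, 23, 23, 59, 59)
next_day = day_start + timedelta(days=1)

# BETWEEN '... 00:00:00' AND '... 23:59:59' translates to:
in_between = day_start <= f <= day_end_inclusive   # misses the row
# The half-open form catches it:
in_half_open = day_start <= f < next_day

print(in_between, in_half_open)  # False True
```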
[GENERAL] Fractions of seconds in timestamps
As PostgreSQL stores timestamps with a fractional part, does it mean that WHERE f BETWEEN '2012-04-23 00:00:00' AND '2012-04-23 23:59:59' might miss records with values of f equal to 23:59:59.1234 or so?
[GENERAL] Are pg_xlog/* files necessary for PITR?
Hi, I'm backing up the entire server directory from time to time. The pg_xlog/ directory containing WAL files is pretty heavy (wal_level=archive). Can I exclude it from the regular tar archive?

    #!/bin/sh
    renice 20 $$ 2>/dev/null
    pgsql -U pgsql -q -c "CHECKPOINT" postgres # speed up pg_start_backup()
    pgsql -U pgsql -q -c "select pg_start_backup('sol')" postgres
    tar -cjf - /db 2>/dev/null | ssh -q -i ~pgsql/.ssh/id_rsa -p 2022 -c blowfish dbarchive@10.0.0.1 'cat > db.tbz'
    pgsql -U pgsql -q -c "select pg_stop_backup()" postgres
    sleep 60 # wait for new WAL backups to appear
    echo 'ssh -q dbarchive@10.0.0.1 ./post-backup.sh' | su -m pgsql

I want to change the tar invocation to be:

    tar -cjf --exclude 'db/pg_xlog/*' ...

Will there be enough data in case of recovery? (May God forbid... )))
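[Editor's note on the proposed command line: with "-cjf", tar takes the very next argument as the archive name, so "--exclude" must not directly follow "-f". A throwaway demo (invented paths, nothing from the real cluster) showing --exclude keeping WAL segments out of the archive:]

```shell
#!/bin/sh
# Build a fake data directory, archive it while excluding pg_xlog
# contents, then list what was actually stored.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/db/pg_xlog" "$tmp/db/base"
echo wal  > "$tmp/db/pg_xlog/000000010000000000000001"
echo data > "$tmp/db/base/12345"

# --exclude goes before the member paths; -f takes the archive name.
(cd "$tmp" && tar -cj --exclude='db/pg_xlog/*' -f db.tbz db)

listing=$(tar -tjf "$tmp/db.tbz")
echo "$listing"   # db/base/12345 is present; the WAL segment is not
```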
Re: [GENERAL] Named advisory locks
On 04/06/2011 12:20 AM, Vick Khera wrote:
> On Tue, Apr 5, 2011 at 2:49 PM, rihad <ri...@mail.ru> wrote:
>> Can't do that, because I'm simply using some table's serial value as
>> the lock ID, which is itself a bigint.
>
> So you assigned the entire namespace to the other purpose

Seems to be the programmer's bad planning :( Better programmers have invented refactoring ;-)
Re: [GENERAL] Named advisory locks
> On Tue, Apr 5, 2011 at 10:35 AM, rihad wrote:
>> No, what I meant was that we're already using ints for a different
>> purpose in another app on the same server, so I cannot safely reuse
>> them. Aren't advisory lock IDs unique across the whole server? The
>> sole purpose of the string ID is to be able to supply an initial
>> namespace prefix ("foo.NNN") so NNN wouldn't clash in different
>> subsystems of the app. MySQL is pretty convenient in this regard.
>> Now I think it would be easier for me to work around this Postgres
>> limitation by simply LOCKing on some table (maybe one created
>> specifically as something to lock on to) instead of using
>> pg_advisory_lock explicitly.
>
> so if you have a namespace problem, solve that. the range of integers
> is quite large. just assign a range to each application so they don't
> clash.

Can't do that, because I'm simply using some table's serial value as the lock ID, which is itself a bigint. The workaround of LOCKing on a table looks fine to me.
Re: [GENERAL] Named advisory locks
On 04/05/2011 08:29 PM, Ben Chobot wrote:
> On Apr 5, 2011, at 7:35 AM, rihad wrote:
>> No, what I meant was that we're already using ints for a different
>> purpose in another app on the same server, so I cannot safely reuse
>> them. Aren't advisory lock IDs unique across the whole server? The
>> sole purpose of the string ID is to be able to supply an initial
>> namespace prefix ("foo.NNN") so NNN wouldn't clash in different
>> subsystems of the app. MySQL is pretty convenient in this regard.
>> Now I think it would be easier for me to work around this Postgres
>> limitation by simply LOCKing on some table (maybe one created
>> specifically as something to lock on to) instead of using
>> pg_advisory_lock explicitly.
>
> Simply locking tables might be easy, but probably won't be optimal.
> Why are you using advisory locks at all? They certainly have their
> place, but they can also be an overused crutch, especially for people
> less familiar with MVCC.

We're using advisory locks to limit access to an external shared resource.
Re: [GENERAL] Named advisory locks
On 5/04/2011 5:42 PM, rihad wrote:
>>> Hi, all. I'm looking for a way to lock on an arbitrary string, just
>>> how MySQL allows doing with GET_LOCK('foo.123') /
>>> RELEASE_LOCK('foo.123'). I know that at least Postgres 8.3 has
>>> pg_advisory_lock() / pg_advisory_unlock() but they seem to accept
>>> integer values only, and we're already using integer values
>>> elsewhere.
>>
>> Already using _string_ values elsewhere?
>
> No, what I meant was that we're already using ints for a different
> purpose in another app on the same server, so I cannot safely reuse
> them. Aren't advisory lock IDs unique across the whole server? The
> sole purpose of the string ID is to be able to supply an initial
> namespace prefix ("foo.NNN") so NNN wouldn't clash in different
> subsystems of the app. MySQL is pretty convenient in this regard.
> Now I think it would be easier for me to work around this Postgres
> limitation by simply LOCKing on some table (maybe one created
> specifically as something to lock on to) instead of using
> pg_advisory_lock explicitly.

Alas, I don't know of any way to use string-based advisory locks directly. You could store a mapping of lock strings to allocated ints in your app or in the DB. Alternately, you could maybe use the full 64 bits of the single-argument form to pack in the initial chars of the lock ID strings, if they're short. If you can cheat and require that lock identifiers contain only the "base 64" characters - or even less - you can pack 10 or more characters into the 64 bits rather than the 8 chars you'd get with one byte per char. Of course, you can't do that if your strings are in any way user-supplied or user-visible, because you can't support non-ascii charsets when doing ugly things like that.

--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
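[Editor's note: another workaround, not proposed in the thread, is to hash the lock-name string down to a bigint key on the client side. The helper below is an illustrative sketch (name invented); hash collisions between distinct lock names are theoretically possible, though vanishingly unlikely with a cryptographic hash.]

```python
import hashlib

def advisory_key(name: str) -> int:
    """Map an arbitrary lock-name string to a signed 64-bit integer
    suitable for passing to pg_advisory_lock(bigint).  Deterministic,
    so cooperating processes derive the same key from the same name."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Every backend that wants the "foo.123" lock computes the same key:
key = advisory_key("foo.123")
```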
[GENERAL] Named advisory locks
Hi, all. I'm looking for a way to lock on an arbitrary string, just how MySQL allows doing with GET_LOCK('foo.123') / RELEASE_LOCK('foo.123'). I know that at least Postgres 8.3 has pg_advisory_lock() / pg_advisory_unlock() but they seem to accept integer values only, and we're already using integer values elsewhere.
Re: [GENERAL] UPDATE ... RETURNING atomicity
On 05/24/2010 01:29 AM, Grzegorz Jaśkiewicz wrote:
> don't lock tables explicitly. That's a killer for (concurrent)
> performance. Just write queries properly, and use the appropriate
> transaction level. And you are sorted.

Read Committed is fine, as long as I restart the UPDATE query when it RETURNs nothing. The specifics of our app allow retrying the said query a few times, and if it still did not get the id (like during the improbable total ID exhaustion), then pass through; this is considered a tolerable soft error. I suspect retrying just a single query is less expensive than retrying a failed serializable transaction, which is more heavy-weight in nature (and in practice). BTW, regarding your comment on avoiding explicit LOCKs: in one place which wasn't speed-sensitive I had to use the strictest LOCK mode, because otherwise deadlocks occurred from time to time.
Re: [GENERAL] UPDATE ... RETURNING atomicity
On 05/23/2010 08:19 PM, Tom Lane wrote:
> Grzegorz Jaśkiewicz writes:
>> find in docs part that talks about transaction isolation levels, and
>> translate it to your problem.
>
> Yes, please read the fine manual:
> http://www.postgresql.org/docs/8.4/static/mvcc.html
>
> What I think will happen in your example is that all concurrent
> executions will locate the same row-to-be-updated. The first one to
> get to the row "wins" and updates the row. All the rest will fail,
> either updating no rows (if not serializable) or throwing an error
> (if serializable).

OK, thank you both. I had hoped that UPDATE would take a table-level lock before running the inner select. But then I read that the type of locking done by UPDATE never conflicts with other such locks, so the queries would still run concurrently. We're running the default Read Committed mode. It's no problem for me to rewrite the Perl DBI query to check the return value and loop until it does get something. Which would have better performance: that, or an explicit LOCK on the table before the UPDATE ... SELECT? The transaction is committed shortly after, with no other queries in between. Thank you.
Re: [GENERAL] UPDATE ... RETURNING atomicity
On 05/23/2010 03:15 PM, Grzegorz Jaśkiewicz wrote:
> every single query in postgresql runs as a transaction; on top of it,
> some are atomic, like when you use a RETURNING statement. This is
> because postgresql doesn't actually have to select these rows as a
> separate query.

Please note the cooperation of the UPDATE and the inner sub-SELECT query, which was my point.
[GENERAL] UPDATE ... RETURNING atomicity
Hello,

In this query:

    UPDATE foo SET allocated_to=? WHERE id=
        (SELECT MIN(id) FROM foo WHERE allocated_to IS NULL)
    AND allocated_to IS NULL RETURNING id

Is it guaranteed in any way that there will only be one id allocated and returned, even if multiple clients are executing this query concurrently? Or is there a possibility that some other client executing this query (or another query modifying allocated_to) might set allocated_to to non-NULL and commit right after the inner select finds it as NULL, so the outer "AND allocated_to IS NULL" will no longer be true, and the outer query will return nothing?

Thanks.
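[Editor's note: a retry-free variant of this allocation pattern became possible with FOR UPDATE SKIP LOCKED, which arrived in a much later PostgreSQL release (9.5) than the one in this thread. A sketch, using the foo table from the question:]

```sql
-- SKIP LOCKED lets each concurrent caller claim a different
-- unallocated row instead of blocking on, or double-claiming,
-- the same MIN(id) row.
UPDATE foo
   SET allocated_to = $1
 WHERE id = (SELECT id
               FROM foo
              WHERE allocated_to IS NULL
              ORDER BY id
              LIMIT 1
                FOR UPDATE SKIP LOCKED)
RETURNING id;
```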
Re: [GENERAL] Partial foreign keys
rihad wrote:
> Due to lack of support for partial (conditional) multi-column foreign
> keys in 8.3, can before-triggers be used to implement them in terms
> of data consistency and speed?

Let me clarify the question in semi-pseudo-SQL:

    table foo {
        bar_id int not null;
        baz_id int not null;
        flag bool;
        key (bar_id, baz_id) references (bar.id, bar.baz_id);
    }

    table bar {
        id int primary key;
        baz_id int not null;
    }

I want the effects of the above foo.key in every sense, but only for entries having foo.flag=true. So I think I'll write before-statement triggers to do just that instead of the key. But is data consistency still guaranteed, as the foreign key in foo would otherwise do? What if, after the first query's trigger checked to see that the (foo.bar_id, foo.baz_id) multikey exists in bar, another query modifies bar in between, and the first query ends up with the wrong assumption? A similar problem goes for table bar's trigger checking that nothing in foo still refers to the old column tuple.
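[Editor's sketch (function and trigger names invented) of the usual answer to the race being asked about: the checking trigger can lock the referenced row with SELECT ... FOR SHARE, so a concurrent delete or update of that bar row blocks until the inserting transaction commits. This covers the foo side only; a mirror-image trigger on bar is still needed for the reverse direction.]

```sql
CREATE FUNCTION foo_check_ref() RETURNS trigger AS $$
BEGIN
    IF NEW.flag THEN
        -- FOR SHARE locks the matching bar row, closing the window
        -- between the check and commit.
        PERFORM 1 FROM bar
         WHERE id = NEW.bar_id AND baz_id = NEW.baz_id
           FOR SHARE;
        IF NOT FOUND THEN
            RAISE EXCEPTION 'foo(%, %) has no matching row in bar',
                NEW.bar_id, NEW.baz_id;
        END IF;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_check_ref BEFORE INSERT OR UPDATE ON foo
    FOR EACH ROW EXECUTE PROCEDURE foo_check_ref();
```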
[GENERAL] Partial foreign keys
Hello, Due to lack of support for partial (conditional) multi-column foreign keys in 8.3, can before-triggers be used to implement them in terms of data consistency and speed? Thanks.
Re: [GENERAL] index speed and failed expectations?
Adam Rich wrote:
>> This query from the console:
>>     select * from stats order by start_time;
>> takes 8 seconds before starting its output. Am I wrong in assuming
>> that the index on start_time should make ORDER BY orders of
>> magnitude faster? Or is this already fast enough? Or should I max up
>> some memory (buffer) setting to achieve greater speeds? Not that the
>> speed is crucial, just curious.
>
> Postgresql won't use the index for queries like this. Due to the MVCC
> implementation, the index does not contain all necessary information
> and would therefore be slower than using the table data alone. (What
> postgresql lacks is a first_row/all_rows hint like oracle.) However,
> if you limit the number of rows enough, you might force it to use an
> index:
>     select * from stats order by start_time limit 1000;

Thanks! Since LIMIT/OFFSET is the typical usage pattern for a paginated data set accessed from the Web (which is my case), it immediately becomes a non-issue.
[GENERAL] index speed and failed expectations?
sol=> \d stats;
              Table "public.stats"
   Column   |              Type              | Modifiers
------------+--------------------------------+-----------
 id         | integer                        | not null
 start_time | timestamp(0) without time zone | not null
 ...
Indexes:
    "stats_start_time_idx" btree (start_time)
    "stats_id_key" btree (id)

There are roughly half a million rows. This query from the console:

    select * from stats order by start_time;

takes 8 seconds before starting its output. Am I wrong in assuming that the index on start_time should make ORDER BY orders of magnitude faster? Or is this already fast enough? Or should I max up some memory (buffer) setting to achieve greater speeds? Not that the speed is crucial, just curious. TIA.
Re: [GENERAL] deadlock
> Scott Marlowe wrote:
>> Sure, but you have to trap that all the time. The solution using a
>> cycling sequence keeps you from ever seeing that (unless you managed
>> to check out all 9,999 other values while still getting the current
>> one). No locking needed, dozens of updaters running concurrently and
>> no need to track update errors.
>
> Yep, that does sound like it'd be nicer, at least if locks are
> becoming free at a reasonable rate (ie you don't have to step through
> most of the table to find a free lock). I was working on the probably
> mistaken assumption that the OP wanted the "next" / "first" available
> slot, not any free slot. If there are very few free locks at any
> given time, I have the feeling the sequence approach could spend a
> lot of time just scanning through the table looking for free entries.
> Then again, using an aggregate subquery is far from free either, and
> it's a whole lot nicer to just repeat one statement until it succeeds
> rather than retrying the whole transaction if it conflicts with
> another (which will happen often if there's really high demand for
> locks). In fact, both transactions trying to grab the lowest free
> lock is practically a recipe for serialization failures, making it
> even less attractive. With only two concurrent connections it'd work
> OK if one used min() and the other used max() ... but add another
> couple and you're in trouble. The serial based approach sounds a fair
> bit better.

Serial access to the firewall is what I'm now emulating with "LOCK bw_pool" as the first statement in the transaction. AFAIK Postgres' serializable transactions give you decent parallelism at the price of expecting you to retry them due to serialization errors, and thus cannot be relied upon for doing actions on a shared external resource (like manipulating the firewall) after having LOCKed some table. It's a pity there's no way to let go of the lock as soon as possible, only implicitly at the end of the transaction.
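[Editor's sketch of the cycling-sequence approach discussed above, with an invented sequence name; the thread's bw_pool holds bw_id 1..10000. Each updater draws candidate slot ids from a wrapping sequence and tries to claim that slot, so no table lock is needed:]

```sql
CREATE SEQUENCE bw_pool_seq MINVALUE 1 MAXVALUE 10000 CYCLE;

-- Retry this single statement until it returns a row; different
-- concurrent callers draw different candidate ids, so they rarely
-- contend for the same slot.
UPDATE bw_pool
   SET user_id = $1
 WHERE bw_id = (SELECT nextval('bw_pool_seq'))
   AND user_id IS NULL
RETURNING bw_id;
```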
Re: [GENERAL] deadlock
> rihad wrote:
>> Given this type query:
>>     UPDATE bw_pool SET user_id=? WHERE bw_id=
>>         (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
>>     RETURNING bw_id
>
> Can you use a SERIALIZABLE transaction and avoid the explicit lock?

Not really. Since LOCKing bw_pool backs up later firewall manipulation (of which there's one), I'm not really prepared to restart transactions due to deadlocks. It's easier for me to prevent deadlocks altogether by carefully stacking queries according to the level of lock restrictiveness, albeit at the price that the whole transaction will be single-threaded, even parts of it that don't need it. I was indeed willing to exclusively lock only as little code as possible (fine-grained locking), but neglected the importance of the locking-type order.
Re: [GENERAL] deadlock
> rihad wrote:
>> Hi, I've come across a strange deadlock that I need your help with.
>> There are two copies of the same Perl daemon running on a 2-cpu box.
>> The program is pretty simple (because I wrote it :)) so I can trace
>> its pathway fairly well: in it, there's a single "LOCK table foo"
>> occurring part way through a transaction that sometimes ends up as
>> this:
>>     DETAIL: Process 91376 waits for AccessExclusiveLock on relation
>>     16488 of database 16386; blocked by process 92387. Process 92387
>>     waits for AccessExclusiveLock on relation 16488 of database
>>     16386; blocked by process 91376.
>
> If there are only two processes, and each is waiting for an ACCESS
> EXCLUSIVE lock on the same relation and being blocked by the other
> one, then presumably both have weaker locks that conflict with ACCESS
> EXCLUSIVE on that relation. Process 1 can't proceed with the ACCESS
> EXCLUSIVE lock because process 2 has a lesser lock on the table.
> Process 2 can't proceed with the ACCESS EXCLUSIVE lock because
> process 1 has a lesser lock on the table. Deadlock. I don't see any
> other way the situation could arise, but I'm *very* far from an
> expert.

Indeed, there is one SELECT and, conditionally, one UPDATE on the table before the exclusive LOCK. I've re-read the manual, particularly this line: "One should also ensure that the first lock acquired on an object in a transaction is the highest mode that will be needed for that object." Since the SELECT & UPDATE come before the LOCK on bw_pool, the bug is obvious. Sadly I can't use any other locking, as I need exclusive access to the OS's firewall after getting bw_id. I had moved the LOCK further inside the transaction to better mimic fine-grained locking, so one solution is to move it back to the beginning. Thank you!
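[Editor's sketch of the fix arrived at above: take the strongest lock the transaction will ever need on bw_pool before any statement acquires a weaker one. Statement bodies are abbreviated placeholders; only bw_pool and its columns come from the thread.]

```sql
BEGIN;
-- Acquire the strongest lock FIRST, before any SELECT/UPDATE takes
-- a weaker lock on bw_pool and sets up a lock-upgrade deadlock.
LOCK TABLE bw_pool IN ACCESS EXCLUSIVE MODE;

SELECT ...;   -- the earlier read of bw_pool now runs under the big lock
UPDATE ...;   -- the conditional update, likewise

UPDATE bw_pool SET user_id = $1
 WHERE bw_id = (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL)
RETURNING bw_id;
-- ... manipulate the firewall while still holding the lock ...
COMMIT;
```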
Re: [GENERAL] choosing the right locking mode
Scott Marlowe wrote: On Thu, Apr 3, 2008 at 10:44 AM, rihad <[EMAIL PROTECTED]> wrote: Given this type of query: UPDATE bw_pool SET user_id=? WHERE bw_id= (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL) RETURNING bw_id The idea is to "single-threadedly" get at the next available empty slot, no matter how many such queries run in parallel. So far I've been semi-successfully using LOCK TABLE bw_pool before the UPDATE, but it deadlocks sometimes. Maybe I could use some less restrictive locking mode and prevent possible collisions at the same time? So, is there some reason a sequence won't work here? bw_pool is pre-filled with 10 thousand rows of increasing bw_id, each of which is either set (user_id IS NOT NULL) or empty (user_id IS NULL). The state of each can change any time. If you've got a requirement for a no-gap id field, there are other, less locky-ish ways to do it. Locking the table doesn't scale, and that's likely the problem you're seeing. There's a shared resource backed by bw_pool that I absolutely need single-threaded access to, despite multiple cpus, hence an all-exclusive lock (or?..)
[GENERAL] choosing the right locking mode
Given this type of query: UPDATE bw_pool SET user_id=? WHERE bw_id= (SELECT MIN(bw_id) FROM bw_pool WHERE user_id IS NULL) RETURNING bw_id The idea is to "single-threadedly" get at the next available empty slot, no matter how many such queries run in parallel. So far I've been semi-successfully using LOCK TABLE bw_pool before the UPDATE, but it deadlocks sometimes. Maybe I could use some less restrictive locking mode and prevent possible collisions at the same time? Thanks.
[GENERAL] deadlock
Hi, I've come across a strange deadlock that I need your help with. There are two copies of the same Perl daemon running on a 2 cpu box. The program is pretty simple (because I wrote it :)) so I can trace its pathway fairly well: in it, there's a single "LOCK table foo" occurring part way through a transaction that sometimes ends up as this: DETAIL: Process 91376 waits for AccessExclusiveLock on relation 16488 of database 16386; blocked by process 92387. Process 92387 waits for AccessExclusiveLock on relation 16488 of database 16386; blocked by process 91376. After the exclusive lock, there is also exactly one SELECT, and then one UPDATE query involving table foo, among others, doing their usual implicit locking on it. I've read in the manuals that it's okay to stack locks this way as long as the more restrictive locks precede less restrictive ones. Mind you, there may be many requests per second, and some of them can and will happen at the same wall clock time due to 2 cpus at work. Can locking break under these circumstances? I'd rather opt for an educated solution to this, than having to check and restart the query. PostgreSQL 8.3.1 FreeBSD 7.0 p5-DBI-1.60.1 p5-DBD-Pg-1.49 Thanks for any tips.
[GENERAL] help optimizing query
Hi all, The situation: there are users in one table, and their access statistics in the other. Now I want to find users whose last access time was more than one month ago. As I've only had to write quite simple queries involving no sub-selects so far, I'd like to ask your opinion if this one scales at all or not. SELECT u.login,last_use_time FROM users u JOIN (SELECT user_id, MAX(stop_time) AS last_use_time FROM stats GROUP BY user_id) AS s ON (u.id=s.user_id) WHERE status='3' AND next_plan_id IS NULL AND last_use_time < now() - interval '1 month' ORDER BY last_use_time; It seems to do the job, but how good is it in the long run? Any way I could tweak it? Thanks. ---(end of broadcast)--- TIP 6: explain analyze is your friend
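A sketch (not from the thread; the index name is made up) of how one might help and inspect this query: a composite index on stats (user_id, stop_time) supports the per-user MAX, and EXPLAIN ANALYZE shows whether the aggregate subquery dominates the runtime:

```sql
CREATE INDEX stats_user_stop_idx ON stats (user_id, stop_time);

EXPLAIN ANALYZE
SELECT u.login, s.last_use_time
FROM users u
JOIN (SELECT user_id, MAX(stop_time) AS last_use_time
      FROM stats GROUP BY user_id) AS s ON u.id = s.user_id
WHERE u.status = '3'
  AND u.next_plan_id IS NULL
  AND s.last_use_time < now() - interval '1 month'
ORDER BY s.last_use_time;
```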
[GENERAL] Upgrading from 8.3RC2 to release
Hi, Should the usual dump/restore cycle be performed during the upgrade on FreeBSD? Any minor backward-incompatible changes one should be aware of? Thank you for the best work.
Re: [GENERAL] Question for Postgres 8.3
If you want to support multiple encodings, the only safe locale choice is (and always has been) C. I should be ashamed for asking this, but would someone care to tell me how encoding differs from locale? My postgresql FreeBSD rcNG script reads: postgresql_initdb_flags=${postgresql_initdb_flags:-"--encoding=utf-8 --lc-collate=C"} As I understand it collation is part of locale, but encoding is "something else"? Thanks.
Re: [GENERAL] need to dump/restore data for 8.3beta2 -> 8.3RC1 upgrade?
Hi, all! Subj, as is. Any other pitfalls I should be aware of? Please check the release notes for this information. Could you please be more specific? I can't find the steps to jump from 8.3beta2 to 8.3rc1, only from 8.2 to 8.3. http://www.postgresql.org/docs/8.3/static/release-8-3.html
[GENERAL] need to dump/restore data for 8.3beta2 -> 8.3RC1 upgrade?
Hi, all! Subj, as is. Any other pitfalls I should be aware of? Thanks.
Re: [GENERAL] logging arguments to prepared statements?
Ted Byers wrote: --- rihad <[EMAIL PROTECTED]> wrote: Dec 18 15:49:41 myhost postgres[29832]: [35-1] ERROR: 23505: duplicate key value violates unique constraint "foo_key" Dec 18 15:49:41 myhost postgres[29832]: [35-4] INSERT INTO foo Dec 18 15:49:41 myhost postgres[29832]: [35-5] (a,b,c) Dec 18 15:49:41 myhost postgres[29832]: [35-7] VALUES ($1,$2,$3) Dec 18 15:49:41 myhost postgres[29832]: [35-8] And that's it, leaving me wondering which value triggered the error. Any Why? It seems simple enough. You have a table called foo, with at least three columns: a, b, and c. And you have a violation of your unique constraint. If it isn't that simple, you have left out useful information. You did not say, for example, which of your columns, if any, are involved in your unique constraint. If the answer to that is none, then you need to show how the constraint is defined. Which of the three columns are involved in a unique constraint? If none of the columns you use are involved in a unique constraint, there must be other columns that are, and that would imply that there is either a problem with your prepared statement, ignoring certain columns that can't be ignored, or a problem with how you set up the default values for another column that is involved in a unique constraint; or the table has grown so big that it is impossible to add a new record without violating the existing unique constraint (unlikely as that is in most cases, especially during development). I could see creating a before insert trigger that stores the values to be inserted in a log table with a timestamp, but I don't see the profit in that. Doesn't such an error generate a SQL exception to your client? I was wondering if there was a way to see the _values_ themselves in case of errors, as is possible with log_statement=all, without turning it on. Apparently there isn't. Thanks anyway. 
If so, the client code will know immediately what insert attempt failed, and therefore what values are involved in the problem. Using JDBC, for example, all of the JDBC functions that execute a prepared statement (or any other SQL) will throw a java.sql.SQLException. One therefore knows immediately when there is a problem of the sort you describe, and so you can determine quickly what the values were that resulted in your error. If need be, that could be stored in your application's log. If one needed full audit functionality, one could create the tables to store the details of every SQL statement, including who is responsible for the statement and a timestamp. But if you don't need to support that kind of detailed audit, why bother when there are easier ways to address your issue? HTH Ted
[GENERAL] logging arguments to prepared statements?
Dec 18 15:49:41 myhost postgres[29832]: [35-1] ERROR: 23505: duplicate key value violates unique constraint "foo_key" Dec 18 15:49:41 myhost postgres[29832]: [35-4] INSERT INTO foo Dec 18 15:49:41 myhost postgres[29832]: [35-5] (a,b,c) Dec 18 15:49:41 myhost postgres[29832]: [35-7] VALUES ($1,$2,$3) Dec 18 15:49:41 myhost postgres[29832]: [35-8] And that's it, leaving me wondering which value triggered the error. Any way to tweak postgres to include the values too, without setting log_statement=all? changed log settings: log_destination = 'syslog' log_error_verbosity = verbose log_min_error_statement = notice log_checkpoints = on log_connections = on log_disconnections = on log_lock_waits = on log_statement = 'none' log_temp_files = 0 log_autovacuum_min_duration = 250 other log_* settings kept as default (commented). PostgreSQL 8.3-beta2 (FreeBSD port is lagging behind a bit). Thanks.
[GENERAL] increasing checkpoint_timeout?
http://www.postgresql.org/docs/8.3/static/wal-configuration.html Is it right that checkpoint_timeout means the amount of time up to which you agree to lose data in the event of a power crash? What if I set it to 1 hour (and bump checkpoint_segments accordingly), does it mean that I'm willing to lose up to 1 hour of data? I'm thinking about increasing checkpoint_timeout to mitigate the full_page_writes bloat. BTW how are transactions WAL logged? Do the logs include data too? In this case, am I right that the effects of full_page_writes=on serve as a starting data page on top of which to replay transactions when doing crash recovery?
[GENERAL] TIMESTAMP difference
Hi, is there a way to get the difference in hours between two timestamps? The HH{1,}:MM:SS format will do.

foo=> select timestamp '20071211 00:00' - timestamp '20071210 00:01';
 ?column?
----------
 23:59:00
(1 row)

foo=> select timestamp '20071211 00:00' - timestamp '20071209 01:00';
    ?column?
----------------
 1 day 23:00:00
(1 row)

Any way to make it return "47:00:00" instead? select interval '47:00:00' is still a legal interval as far as postgresql goes. 8.3-beta2 (can't get to beta4: freebsd ports are yet frozen :( Thanks.
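One way to get a total-hours figure (a sketch, not from the thread) is to go through EXTRACT(EPOCH ...), which turns the interval into seconds:

```sql
-- Divide the interval's epoch seconds by 3600 to get total hours (47 here),
-- instead of the days-plus-hours rendering of the bare interval.
SELECT EXTRACT(EPOCH FROM (timestamp '2007-12-11 00:00'
                         - timestamp '2007-12-09 01:00')) / 3600 AS hours;
```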
[GENERAL] Conservative postgresql.conf made by initdb?
In postgresql.conf generated by initdb shared_buffers is set to 32MB even though there was more available (see below; also ipcs shows postgres is (and will be) the only shared memory user). Is this enough or maybe it's less than ok? I don't know. What I do know is that MySQL 4.0.x uses 500-550 mb RAM for similar access patterns. Also, max_connections preset to 40 is more than ok for our case. Is max_fsm_pages = 204800 OK ? Should I increase it if I expect 15-20 gb disk usage by db? FreeBSD 7.0-BETA3 PostgreSQL 8.3 beta2 sysctl settings at the time initdb ran more or less resembled the recommendations given here: http://www.postgresql.org/docs/8.3/static/kernel-resources.html#SYSVIPC namely: kern.ipc.shmmax: 134217728 kern.ipc.shmmin: 1 kern.ipc.shmall: 32768 kern.ipc.shmseg: 128 kern.ipc.shmmni: 192 kern.ipc.semmns: 60 kern.ipc.semmsl: 60 kern.ipc.semmap: 256 kern.ipc.semvmx: 32767 Maybe I should increase shmall/shmmax and rerun initdb to maximize performance of expectedly 25-30 concurrent connections, which are "persistent" and hence many of which are idle at the low hours? Thanks.
[GENERAL] full_page_writes = off?
Hi, would full_page_writes=off increase the risk of inconsistency or db corruption in 8.3 and FreeBSD 7? fsync = on; Definitely "on", as single power outage after three years is guaranteed to make your life interesting otherwise. synchronous_commit = off; Can be "off" in my case as I'm not doing any "external" actions based on transaction's success or failure (if I understand the docs correctly). So I don't care at which moment the evil strikes as long as the db is consistent. full_page_writes = off; AFAIK when this is "on" it means synchronous WAL writing with less impact, as it occurs once per checkpoint, but it's still synchronous. Not sure at all about this one. How would FreeBSD 7's UFS survive the power crash etc. with this set to "off"? OTOH, does "on" play well with synchronous_commit=off? Meaning, will the transaction holder get success immediately on commit, still guaranteeing consistency? Thanks for any insights or clarifying missed points.
[GENERAL] FK index q'n
Given this table: CREATE TABLE foo ( id integer primary key, bar_id integer references bar (id) ); and provided that bar.id is itself a PK, do I still need to create an index on bar_id if often doing queries like: SELECT MIN(id) FROM foo WHERE bar_id IS NULL; Table foo will contain a static number of rows (from 2,000 to 10,000 -- yet undecided) only doing SELECT & UPDATE. Thanks.
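Since the query only ever looks at the empty slots, a partial index is one option worth considering (a sketch, not from the thread; the index name is made up):

```sql
-- Indexes only rows where bar_id IS NULL, so
-- SELECT MIN(id) FROM foo WHERE bar_id IS NULL
-- can be answered from a small index no matter how large foo grows
-- or how the remaining rows are indexed.
CREATE INDEX foo_free_slots ON foo (id) WHERE bar_id IS NULL;
```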
Re: [GENERAL] any way for ORDER BY x to imply NULLS FIRST in 8.3?
Scott Marlowe wrote: On Nov 9, 2007 5:17 AM, rihad <[EMAIL PROTECTED]> wrote: On Wednesday 07 November 2007 13:54:32 rihad wrote: May I, as an outsider, comment? :) I really think of ASC NULLS FIRST (and DESC NULLS LAST) as the way to go. Imagine a last_login column that sorts users that have not logged in as the most recently logged in, which is not very intuitive. I vote for sort_nulls_first defaulting to false in order not to break bc. But then, when ordering by login date, you should use COALESCE and infinity for them (http://www.postgresql.org/docs/8.2/interactive/datatype-datetime.html). It's not an easy thing to do with for example Propel 1.2 ORM (written in PHP): $criteria->addDescendingOrderByColumn(myPeer::LAST_LOGIN); // no place to shove database-specific attributes in. Why not create an updatable view that orders the way you want it to? I've already thought about this, but... there aren't yet truly updatable views in PostgreSQL, but rather query rewriting a.k.a. text-substitution; 1) it isn't of paramount importance in my current case. It just wouldn't be bad to set sort_nulls_first=true, if it existed. If you want to know the full story, prepare for some OT :-) Because of the way Symfony/Propel does its "object hydration" has already forced me to write views in postgres to minimize the amount of data fetched: otherwise Propel is happy to fetch full-records from db, and all its FK-related objects, too (the lazyLoad misfeature is a two-sided gun: it pretends it fetches only this many columns for each row, but if you later access further columns each one will cost you a separate database hit), all of which is unacceptable for e.g. displaying an HTML table of N items with pagination etc. Symfony/Propel is also quite happy to hydrate the full object from db just for save()'ing it back to db right away (on a form POST for a record update, for example), which makes my brain hurt, so I went to the trouble of avoiding the pre-hydration, too. 
This all resulted in much effort not directly related to the business logic of my app, but rather on overriding Symfony's way of doing everyday web-programming tasks (like form validation, results listing, editing). Now I don't really want to work around further design inefficiencies of Symfony/Propel by trying updatable views. Really frustrating. Easier to just forgo any web-framework and write quality code yourself instead, like phpclasses.org's Manuel Lemos once said in his article... That said, Symfony 1.1-DEV/Doctrine begin to look promising.
Re: [GENERAL] strange infinite loop in plpgsql
rihad <[EMAIL PROTECTED]> writes: LOOP SELECT date+1 INTO day FROM days WHERE date=day OR EXTRACT(dow FROM day) IN (0,6); EXIT WHEN NOT FOUND; timeout := timeout + 86400; END LOOP; If the EXTRACT condition is true, then the SELECT will always succeed. Isn't the new "day" re-evaluated on every loop iteration? I'm totally confused. It's morning again over at our place, so... of course it's not! I stupidly did a SELECT date+1 instead of SELECT day+1, resulting in an infinite loop. A hard to spot bug, especially when your eyes are half closed. Good thing is that the solution came to me while I was still in bed a minute ago :-) But your idea later on of separating EXTRACT outside the disk-touching code might be a nice cpu-memory trade-off that I hadn't thought of (or wasn't aware of). It turns out any SQL "thing" can go in an IF or WHILE condition, etc.? I'll go read the docs more attentively.
Re: [GENERAL] strange infinite loop in plpgsql
Tom Lane wrote: rihad <[EMAIL PROTECTED]> writes: LOOP SELECT date+1 INTO day FROM days WHERE date=day OR EXTRACT(dow FROM day) IN (0,6); EXIT WHEN NOT FOUND; timeout := timeout + 86400; END LOOP; If the EXTRACT condition is true, then the SELECT will always succeed. Isn't the new "day" re-evaluated on every loop iteration? I'm totally confused. This code will get even more whacko once you have more than one row in "days", because it'll pick a random one of the rows in that case (in practice, the physically first one). I think you need something more like

LOOP
    IF EXTRACT(dow FROM day) IN (0,6) THEN
        -- don't bother to consult table on weekends
        day := day + 1;
    ELSE
        SELECT date+1 INTO day FROM days WHERE date=day;
        EXIT WHEN NOT FOUND;
    END IF;
    timeout := timeout + 86400;
END LOOP;

BTW, you forgot to initialize "timeout". Sorry, I hand-cooked this fast from the working code. I guess it defaults to NULL instead of "random bits", which of course wouldn't save me either, but the real (somewhat bigger) code eventually does RETURN LEAST(timeout, expiration_timeout); skipping any nulls.
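Putting the thread's fixes together, the whole function would look something like this (a sketch assembled from the corrections above, not code from the original posts):

```sql
CREATE OR REPLACE FUNCTION foo() RETURNS int AS $$
DECLARE
    timeout int := 0;                  -- was left uninitialized in the original
    day date := current_date + 1;
BEGIN
    LOOP
        IF EXTRACT(dow FROM day) IN (0, 6) THEN
            day := day + 1;            -- weekend: skip without touching the table
        ELSE
            -- with date = day in the WHERE clause, date+1 equals day+1
            SELECT date + 1 INTO day FROM days WHERE date = day;
            EXIT WHEN NOT FOUND;
        END IF;
        timeout := timeout + 86400;
    END LOOP;
    RETURN timeout;
END;
$$ LANGUAGE plpgsql;
```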
[GENERAL] strange infinite loop in plpgsql
I've been reading the online docs, but... code like this somehow ends up in an indefinite loop: CREATE OR REPLACE FUNCTION foo() RETURNS int AS $$ DECLARE timeout int; day date; BEGIN day := current_date + 1; LOOP SELECT date+1 INTO day FROM days WHERE date=day OR EXTRACT(dow FROM day) IN (0,6); EXIT WHEN NOT FOUND; timeout := timeout + 86400; END LOOP; END; $$ LANGUAGE plpgsql; It's Saturday at our place, and the "days" table has only one record for tomorrow's date. I hope it's been a very very long day for me :-) Thank you for your help.
Re: [GENERAL] any way for ORDER BY x to imply NULLS FIRST in 8.3?
On Wednesday 07 November 2007 13:54:32 rihad wrote: May I, as an outsider, comment? :) I really think of ASC NULLS FIRST (and DESC NULLS LAST) as the way to go. Imagine a last_login column that sorts users that have not logged in as the most recently logged in, which is not very intuitive. I vote for sort_nulls_first defaulting to false in order not to break bc. But then, when ordering by login date, you should use COALESCE and infinity for them (http://www.postgresql.org/docs/8.2/interactive/datatype-datetime.html). It's not an easy thing to do with for example Propel 1.2 ORM (written in PHP): $criteria->addDescendingOrderByColumn(myPeer::LAST_LOGIN); // no place to shove database-specific attributes in. which was my main point.
Re: [GENERAL] prepared statements suboptimal?
Tom Lane wrote: rihad <[EMAIL PROTECTED]> writes: I don't understand why postgres couldn't plan this: SELECT foo.e, foo.f FROM foo WHERE pk=$1 AND b=$2 AND status='1' AND c <= $3; to be later executed any slower than SELECT foo.e, foo.f FROM foo WHERE pk='abcabc' AND b='13' AND status='1' AND c <= '2007-11-20 13:14:15'; The reason is that without knowing the parameter values, the planner has to pick a "generic" plan that will hopefully not be too awful regardless of what the actual values end up being. When it has the actual values it can make much tighter estimates of the number of matching rows, and possibly choose a much better but special-purpose plan. As an example, if the available indexes are on b and c then the best query plan for the first case is probably bitmap indexscan on b. But in the second case, the planner might be able to determine (by consulting the ANALYZE stats) that there are many rows matching b='13' but very few rows with c <= '2007-11-20 13:14:15', so for those specific parameter values an indexscan on c would be better. It would be folly to choose that as the generic plan, though, since on the average a one-sided inequality on c could be expected to not be very selective at all. Aha, thanks for a thorough explanation. Now I understand that while looking for a way to fulfill the query postgres will try hard to pick the one requiring the least number of row visits. I've skimmed over my queries: almost all of them make use of the primary key as the first thing in the WHERE clause (say, a username, which is the only pk in the table): shouldn't that be enough for postgres to *always* decide to scan the pk's index (since a query on a pk always returns either one or zero results)? Same question for any number of joins where bar.id or baz.id is always a PK: select ... from foo JOIN bar ON(foo.bar_id=bar.id) JOIN baz ON(foo.baz_id=baz.id) WHERE asd=? AND dsa=?;
Re: [GENERAL] any way for ORDER BY x to imply NULLS FIRST in 8.3?
Simon Riggs wrote: On Wed, 2007-11-07 at 16:05 +0100, Martijn van Oosterhout wrote: On Wed, Nov 07, 2007 at 02:37:41PM +, Simon Riggs wrote: Editing an application, you would be required to add the words NULLS FIRST to every single ORDER BY and every single CREATE INDEX in an application. If we know that is what people would do, why not have one parameter to do this for them? I find it hard to believe that every single query in an application depends on the ordering of NULLs. In fact, I don't think I've even written a query that depended on a particular way of sorting NULLs. Is it really that big a deal? True, but how would you know for certain? You'd need to examine each query to be able to tell, which would take even longer. Or would you not bother, catch a few errors in test and then wait for the application to break in random ways when a NULL is added later? I guess that's what most people do, if they do convert. I'd like to remove one difficult barrier to Postgres adoption. We just need some opinions from people who *haven't* converted to Postgres, which I admit is difficult cos they're not listening. May I, as an outsider, comment? :) I really think of ASC NULLS FIRST (and DESC NULLS LAST) as the way to go. Imagine a last_login column that sorts users that have not logged in as the most recently logged in, which is not very intuitive. I vote for sort_nulls_first defaulting to false in order not to break bc.
Re: [GENERAL] prepared statements suboptimal?
rihad wrote: Hi, I'm planning to use prepared statements of indefinite lifetime in a daemon that will execute same statements rather frequently in reply to client requests. This link: http://www.postgresql.org/docs/8.3/static/sql-prepare.html has a note on performance: In some situations, the query plan produced for a prepared statement will be inferior to the query plan that would have been chosen if the statement had been submitted and executed normally. This is because when the statement is planned and the planner attempts to determine the optimal query plan, the actual values of any parameters specified in the statement are unavailable. PostgreSQL collects statistics on the distribution of data in the table, and can use constant values in a statement to make guesses about the likely result of executing the statement. Since this data is unavailable when planning prepared statements with parameters, the chosen plan might be suboptimal. I don't understand why postgres couldn't plan this: SELECT foo.e, foo.f FROM foo WHERE pk=$1 AND b=$2 AND status='1' AND c <= $3; to be later executed any slower than SELECT foo.e, foo.f FROM foo WHERE pk='abcabc' AND b='13' AND status='1' AND c <= '2007-11-20 13:14:15'; Can I help it make more educated guesses? In what scenarios could prepared statements turn around and bite me, being slower than simple queries? Is this a real problem in practice? Should I "refresh" prepared statements from time to time? If so, how? Only by deallocating them and preparing anew? Any knob to tweak for that? Okay, enough questions :) Thank you for any insights. From http://www.postgresql.org/docs/8.3/static/protocol-flow.html I just read that "This possible penalty is avoided when using the unnamed statement, since it is not planned until actual parameter values are available." 
Since I'm using Perl's DBI/pg, in postmaster's logs I can see that DBI's prepare() seems to be using named prepared statements: Nov 7 15:57:46 sol postgres[1685]: [2-1] LOG: execute dbdpg_1: Nov 7 15:57:46 sol postgres[1685]: [2-2] SELECT ... is there any way to tell it to use unnamed prepared statements? I understand this is not a strictly PostgreSQL question so sorry if I'm off the topic.
[GENERAL] prepared statements suboptimal?
Hi, I'm planning to use prepared statements of indefinite lifetime in a daemon that will execute same statements rather frequently in reply to client requests. This link: http://www.postgresql.org/docs/8.3/static/sql-prepare.html has a note on performance: In some situations, the query plan produced for a prepared statement will be inferior to the query plan that would have been chosen if the statement had been submitted and executed normally. This is because when the statement is planned and the planner attempts to determine the optimal query plan, the actual values of any parameters specified in the statement are unavailable. PostgreSQL collects statistics on the distribution of data in the table, and can use constant values in a statement to make guesses about the likely result of executing the statement. Since this data is unavailable when planning prepared statements with parameters, the chosen plan might be suboptimal. I don't understand why postgres couldn't plan this: SELECT foo.e, foo.f FROM foo WHERE pk=$1 AND b=$2 AND status='1' AND c <= $3; to be later executed any slower than SELECT foo.e, foo.f FROM foo WHERE pk='abcabc' AND b='13' AND status='1' AND c <= '2007-11-20 13:14:15'; Can I help it make more educated guesses? In what scenarios could prepared statements turn around and bite me, being slower than simple queries? Is this a real problem in practice? Should I "refresh" prepared statements from time to time? If so, how? Only by deallocating them and preparing anew? Any knob to tweak for that? Okay, enough questions :) Thank you for any insights.
[GENERAL] any way for ORDER BY x to imply NULLS FIRST in 8.3?
Is there any way to "hardcode" the NULL handling in an index (as per http://www.postgresql.org/docs/8.3/static/indexes-ordering.html) so that SELECT * FROM t ORDER BY foo automatically implies NULLS FIRST (and, similarly so that SELECT * FROM t ORDER BY foo DESC automatically implies NULLS LAST)? Thing is, I'm using PHP Symfony/Propel to generate their SQL and have no easy way to influence how they do so. Thanks.
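As discussed elsewhere in this archive, there is no setting that flips the default NULL ordering; the explicit 8.3 syntax is the only handle. A sketch using the question's table and column names (the index name is made up):

```sql
-- Build the index with the NULL placement you want:
CREATE INDEX t_foo_idx ON t (foo NULLS FIRST);

-- Matches the index ordering exactly:
SELECT * FROM t ORDER BY foo NULLS FIRST;

-- The same index scanned backward:
SELECT * FROM t ORDER BY foo DESC NULLS LAST;
```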
Re: [GENERAL] Would an index benefit select ... order by?
You mean Postgres wouldn't *always* use created_at's index with such access patterns on a big table (even if one exists): select * from foo order by created_at desc; No, it wouldn't necessarily, and that's a good thing. A full-table indexscan can often be slower than a sort because of inefficient disk access patterns. The planner will estimate the cost of each possibility and pick the one that looks cheaper. What if it's really a limited select: select * from foo order by created_at desc limit ; because this is what I meant initially (sorry), would Postgres always use index to get at sorted created_at values, so I don't *have* to create the index? I think maintaining the index has its own penalty so in my upcoming project I'm evaluating the option of skipping defining one entirely unless absolutely necessary.
Re: [GENERAL] Would an index benefit select ... order by?
Should an index be used on a created_at timestamp column if you know you will be using "ORDER BY created_at ASC|DESC" from time to time? Yes. Thanks. This is stated explicitly in 8.3 docs (as opposed to 8.2) http://www.postgresql.org/docs/8.3/static/indexes-ordering.html And you should use EXPLAIN. You mean Postgres wouldn't *always* use created_at's index with such access patterns on a big table (even if one exists): select * from foo order by created_at desc; ? Mind you the distribution of created_at values is going to be as different as the time is (i.e. almost as many different values as there are rows in the table).
[GENERAL] index usage in joins q'n
http://www.postgresql.org/docs/8.2/interactive/indexes-intro.html states that "Indexes can moreover be used in join searches. Thus, an index defined on a column that is part of a join condition can significantly speed up queries with joins." Does this mean that a condition like "WHERE ... [AND] lhs.a=rhs.b [AND] ..." where rhs.b is already unique-indexed, also requires (non-unique) index on lhs.a for maximal join speed? Otherwise why would they want to say that? Thanks.
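In practice, yes: if only rhs.b is indexed, the planner can drive indexed lookups in just one direction of the join. A sketch (made-up names following the question's lhs/rhs):

```sql
-- Lets the planner probe lhs by the join key too -- e.g. a nested loop
-- driven from rhs -- instead of having to seq-scan lhs for each match.
CREATE INDEX lhs_a_idx ON lhs (a);
```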
[GENERAL] Would an index benefit select ... order by?
Hi, Should an index be used on a created_at timestamp column if you know you will be using "ORDER BY created_at ASC|DESC" from time to time? Thanks.
[GENERAL] 8.3b1 in production?
Hi, Does anyone have an idea how risky it is to start using 8.3b1 in production, with the intention of upgrading to the release (or a newer beta) as soon as it becomes available? Risky compared to running a release, that is. Beta -> release upgrades might be less tricky than 8.2 -> 8.3. Thank you.
[GENERAL] initdb: file "/usr/local/share/postgresql/snowball_create.sql" does not exist
Hi, After installing Postgresql 8.3 beta1 on a fresh FreeBSD 6.2 (port databases/postgresql83-server) initdb gives error: # /usr/local/etc/rc.d/postgresql initdb initdb: file "/usr/local/share/postgresql/snowball_create.sql" does not exist This means you have a corrupted installation or identified the wrong directory with the invocation option -L. Is this expected at this time, or maybe a faulty port? Thanks.
Re: [GENERAL] 3 tables join update
Richard Broersma Jr wrote: --- rihad <[EMAIL PROTECTED]> wrote: UPDATE Foo foo SET foo.baz_id=baz.id FROM Baz baz LEFT JOIN Bar bar ON (foo.bar_id=bar.id) WHERE bar.id IS NULL; This query cannot work. Basically, you are trying to set foo.baz_id = baz.id for records in foo that do not yet exist. Doing this is impossible. Sorry, the query fell victim to my attempt to simplify it, so I forgot the foo=bar join. Here's a 2-table join suffering from the same problem: I want the update only when bar.common_field IS NULL: UPDATE Foo foo SET ... FROM LEFT JOIN Bar bar USING(common_field) WHERE blah='blah' AND bar.common_field IS NULL; ERROR: syntax error at or near "JOIN" I know I'm misusing UPDATE ... FROM, because I don't really want Bar's values to go into Foo; I'm only using them for a conditional (and, I hope, atomic) update.
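One common way to express "update only when the left join finds no match" without pulling Bar into the FROM list is an anti-join via NOT EXISTS. A sketch, with the SET target hypothetical since the original elides it:

```sql
-- Update only those Foo rows that have no matching Bar row.
-- The planner normally turns NOT EXISTS into an efficient anti-join.
UPDATE Foo
SET somecol = 'newval'          -- hypothetical assignment
WHERE blah = 'blah'
  AND NOT EXISTS (SELECT 1
                  FROM Bar
                  WHERE Bar.common_field = Foo.common_field);
```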
Re: [GENERAL] 3 tables join update
Richard Broersma Jr wrote: --- rihad <[EMAIL PROTECTED]> wrote: UPDATE Foo foo SET foo.baz_id=baz.id FROM Baz baz LEFT JOIN Bar bar ON (foo.bar_id=bar.id) WHERE bar.id IS NULL; This query cannot work. I know. So how do I do it efficiently? Thanks.
[GENERAL] 3 tables join update
Hi, Say I want to update table Foo with data in table Baz iff a left join between Foo and yet another table Bar finds no match. UPDATE Foo foo LEFT JOIN Bar bar ON (foo.bar_id=bar.id) SET foo.baz_id=baz.id FROM Baz baz WHERE bar.id IS NULL; ERROR: syntax error at or near "LEFT" UPDATE Foo foo SET foo.baz_id=baz.id FROM Baz baz LEFT JOIN Bar bar ON (foo.bar_id=bar.id) WHERE bar.id IS NULL; ERROR: invalid reference to FROM-clause entry for table "foo" HINT: There is an entry for table "foo", but it cannot be referenced from this part of the query. Is it possible to rewrite this so that it does what I want in a single query? Important: performance matters. Thanks.
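PostgreSQL's UPDATE ... FROM can't left-join against the update target itself, but the "no match in Bar" test can move into a NOT EXISTS. This is only a sketch: the join condition between Foo and Baz is an assumption, since the original query doesn't show one:

```sql
-- Set foo.baz_id from Baz for the Foo rows that have no matching Bar row.
UPDATE Foo
SET baz_id = Baz.id
FROM Baz
WHERE Foo.some_key = Baz.some_key     -- hypothetical Foo/Baz join condition
  AND NOT EXISTS (SELECT 1 FROM Bar WHERE Bar.id = Foo.bar_id);
```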
Re: [GENERAL] strange TIME behaviour
Michael Fuhr wrote: On Sat, Sep 15, 2007 at 04:45:02PM +0500, rihad wrote: Can someone please explain to me why these two give different results? The idea is to get the number of seconds past 00:00:00, so the second one is obviously correct. They're both correct. foo=> select extract(epoch from current_time); date_part -- 42023.026348 (1 row) current_time is a time with time zone; the above query returns the number of seconds since 00:00:00 UTC. foo=> select extract(epoch from cast(current_time as time)); date_part -- 60030.824587 (1 row) By casting current_time to time without time zone you're now getting the number of seconds since 00:00:00 in your local time zone. I'm reading this right now: http://www.postgresql.org/docs/8.2/interactive/datatype-datetime.html "time with time zone" is not recommended. I'm still unsure whether the time zone issue matters at all when comparing timestamps (greater/less/etc.), or when adding intervals to stored dates, like registration_time + interval '2 months';
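For comparisons, the zone only matters if you mix "with" and "without time zone" values; for interval arithmetic, timestamptz performs day/month math in the session time zone, so DST transitions are accounted for. A small sketch, with a hypothetical session zone:

```sql
-- Day/month arithmetic on timestamptz is done in this zone,
-- so adding '1 day' across a DST change still lands on the same
-- wall-clock time the next day.
SET timezone = 'America/New_York';
SELECT TIMESTAMPTZ '2007-03-10 12:00' + INTERVAL '1 day';
SELECT TIMESTAMPTZ '2007-07-15 12:00' + INTERVAL '2 months';
```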
Re: [GENERAL] strange TIME behaviour
Michael Fuhr wrote: On Sat, Sep 15, 2007 at 04:45:02PM +0500, rihad wrote: Can someone please explain to me why these two give different results? The idea is to get the number of seconds past 00:00:00, so the second one is obviously correct. They're both correct. foo=> select extract(epoch from current_time); date_part -- 42023.026348 (1 row) current_time is a time with time zone; the above query returns the number of seconds since 00:00:00 UTC. foo=> select extract(epoch from cast(current_time as time)); date_part -- 60030.824587 (1 row) By casting current_time to time without time zone you're now getting the number of seconds since 00:00:00 in your local time zone. PostgreSQL seems to default to "time without time zone" when declaring columns in the table schema. Since all my times and timestamps are in the local time zone, and I'm *only* dealing with local times, should I be using "time with time zone" instead? When would it make a difference? Only when comparing/subtracting? Is "with time zone" not the default because it's slower? Thanks.
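For what it's worth, the docs' warning applies to time with time zone specifically: it stores a fixed UTC offset, not a named zone, which makes it of limited use. For absolute points in time, timestamp with time zone is the usual choice. A hypothetical schema sketch:

```sql
CREATE TABLE sessions (
    login_at timestamptz NOT NULL DEFAULT now(),  -- an absolute point in time
    opens_at time                                 -- a wall-clock time of day
);
```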
[GENERAL] strange TIME behaviour
Can someone please explain to me why these two give different results? The idea is to get the number of seconds past 00:00:00, so the second one is obviously correct. foo=> select extract(epoch from current_time); date_part -- 42023.026348 (1 row) foo=> select extract(epoch from cast(current_time as time)); date_part -- 60030.824587 (1 row) Isn't current_time already a time? Why is the cast necessary? Thanks.
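The cast is needed because current_time is time *with* time zone, so its epoch is computed relative to UTC; localtime gives the zone-free value directly. A sketch of the equivalence:

```sql
-- These two should agree: casting current_time to plain time keeps the
-- local wall-clock value, which is what localtime returns directly.
SELECT EXTRACT(EPOCH FROM localtime),
       EXTRACT(EPOCH FROM CAST(current_time AS time));
```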
[GENERAL] getting min/max of two values
How can I efficiently return the minimum/maximum of two given expressions? Like SELECT MYMIN(a+b-c,d+e*f). Thanks.
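PostgreSQL provides exactly this as the built-in LEAST and GREATEST functions, which take any number of arguments:

```sql
-- LEAST/GREATEST ignore NULL arguments and return NULL only if *all*
-- arguments are NULL; table t and columns a..f are hypothetical.
SELECT LEAST(a + b - c, d + e * f)    AS min_val,
       GREATEST(a + b - c, d + e * f) AS max_val
FROM t;
```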
[GENERAL] Inserting a timestamp in a "timestamp" column.
Hi all, I have a column declared as "timestamp without time zone" that I vainly want to insert a raw Unix timestamp into (i.e. in the format returned by Perl's or PHP's time()). I know of SQL NOW(), but I want to insert a "cooked" timestamp from the outside most efficiently. How? Thanks.
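A Unix epoch can be converted in the INSERT itself with to_timestamp(double precision), casting to drop the zone for a "without time zone" column; the table and column names here are hypothetical:

```sql
-- to_timestamp() yields timestamptz; the ::timestamp cast converts it
-- to the session's local time without a zone.
INSERT INTO events (created_at)
VALUES (to_timestamp(1190000000)::timestamp);
```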
[GENERAL] cast time interval to seconds
Hi, I have two columns start_time & stop_time declared as "TIME". I'd like to compute the difference between the two times in seconds, all in db: SELECT (CAST(stop_time AS SECONDS) + 86400 - CAST(start_time AS SECONDS)) % 86400; Unfortunately AS SECONDS causes parse error. Any hints? Thanks.
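Subtracting two time values yields an interval, and extract(epoch from ...) turns that into seconds, which gives the intended modulo arithmetic; a sketch with a hypothetical table:

```sql
-- The + 86400 ... % 86400 keeps the result positive when the span
-- wraps past midnight (stop_time earlier than start_time).
SELECT (EXTRACT(EPOCH FROM stop_time - start_time)::int + 86400) % 86400
       AS duration_seconds
FROM shifts;
```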
[GENERAL] serial grows on failed requests
Hi, my table is defined as: CREATE TABLE users ( id integer NOT NULL, ... ); CREATE SEQUENCE users_id_seq INCREMENT BY 1 NO MAXVALUE NO MINVALUE CACHE 1; ALTER SEQUENCE users_id_seq OWNED BY users.id; Although it's just a more verbose way to say create table users (id serial primary key); :) When I do an insert that fails (an FK violation, illegal value, etc.), users.id grows nonetheless... This is unacceptable for my current needs. Any way to prevent that while still maintaining ease of use? Using PostgreSQL 8.2.4 Thanks.
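Sequence gaps on failed inserts are by design: nextval() never rolls back, because making it transactional would force concurrent inserters to serialize on the sequence. A sketch of the behavior:

```sql
CREATE TABLE users (id serial PRIMARY KEY, name text NOT NULL);

BEGIN;
INSERT INTO users (name) VALUES (NULL);   -- fails: NOT NULL violation
ROLLBACK;

-- The nextval() consumed by the failed insert is not reused,
-- so this row gets id = 2, leaving a gap at 1.
INSERT INTO users (name) VALUES ('alice');
```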