Re: [GENERAL] Server/Client Encoding Errors
APseudoUtopia wrote:
> I'm having some problems when inserting special characters into a column. Here's the table:
>
>                 Table "public.users_history_ip"
>    Column   |            Type             | Modifiers
> ------------+-----------------------------+---------------------------------------------------------------
>  id         | bigint                      | not null default nextval('users_history_ip_id_seq'::regclass)
>  userid     | integer                     | not null
>  ip         | inet                        | not null
>  hostname   | character varying(512)      | not null
>  geoip_info | character varying(512)      | not null
>  start_time | timestamp without time zone | not null
>  last_seen  | timestamp without time zone | not null
>  type       | ip_history_type             | not null
> Indexes:
>     "users_history_ip_pkey" PRIMARY KEY, btree (id)
> Foreign-key constraints:
>     "users_history_ip_userid_fkey" FOREIGN KEY (userid) REFERENCES users_main(id) ON DELETE CASCADE
>
> I'm trying to insert information into the geoip_info column. Here's some of the information that I'm trying to insert, and the errors:
>
> 'Portugal, 09, Vila Real De Santo António'
> ERROR: invalid byte sequence for encoding UTF8: 0xf36e696f
> 'Norway, 08, Ålesund'
> ERROR: invalid byte sequence for encoding UTF8: 0xc56c
> 'Portugal, 04, Vila Nova De Famalicão'
> ERROR: invalid byte sequence for encoding UTF8: 0xe36f2c
>
> The locale on the server is C and the encoding is UTF8. I thought the UTF8 encoding would allow characters like this? Why is it disallowing them? Note: the GeoIP info is generated automatically by a module, so I am unable to determine in advance exactly what characters will be returned.

The UTF8 encoding does allow you to store the characters ó, Å and ã, but you have to encode them correctly. Judging from the error messages, you have set your client_encoding to UTF8 but are feeding it data that is encoded in LATIN1 or WIN1252. If you feed the client LATIN1 data, set client_encoding to LATIN1 so that PostgreSQL can correctly convert the characters to UTF-8.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
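Laurenz's advice can be tried directly in a session. The INSERT below is only an illustration -- the row values and the 'login' label for the ip_history_type column are made up, not from the original post:

```sql
-- Tell PostgreSQL what encoding the client is actually sending;
-- the server then converts it to the database's UTF8 on input.
SET client_encoding TO 'LATIN1';

-- A LATIN1-encoded 'Ålesund' now converts cleanly.
INSERT INTO users_history_ip
       (userid, ip, hostname, geoip_info, start_time, last_seen, type)
VALUES (1, '10.0.0.1', 'example.host', 'Norway, 08, Ålesund',
        now(), now(), 'login');

-- Verify both sides of the conversion:
SHOW server_encoding;   -- should report UTF8
SHOW client_encoding;   -- should report LATIN1
```

libpq-based clients can also set this via the PGCLIENTENCODING environment variable instead of an explicit SET.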
Re: [GENERAL] synchronous_commit=off doesn't always return immediately
Alvaro Herrera wrote:
> This is why Tom was suggesting you to increase wal_buffers. Did you try that?

Thanks for the explanation. I will try increasing wal_buffers. Unfortunately this is on a system I can't restart for the next few days.

Tomer

--
View this message in context: http://www.nabble.com/synchronous_commit%3Doff-doesn%27t-always-return-immediately-tp24621119p24675696.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
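For reference, wal_buffers is set in postgresql.conf and only takes effect after a server restart -- which is why Tomer has to wait. The value below is illustrative, not a recommendation from the thread:

```
# postgresql.conf -- value is illustrative
wal_buffers = 1MB        # the default on these releases is only 64kB
```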
Re: [GENERAL] Very slow joins
>> postgres collect all necessary stats. Maybe an implicit analyze is necessary?

Should be: explicit analyze.

>> BUT I found the real cause of my problem - the fk2 field from my example had not only an index, but it was also a foreign key to another table.
> That seems unlikely to be the cause.

It's just what I saw. First I tried with all the FKs but had to break off the queries because they took too long. Then I dropped the FK and the query ran in a couple of minutes.

Thanks,
MS
Re: [GENERAL] Very slow joins
> What first post? The only thing I can find is a reference in a message by you from yesterday, to a two-year-old post that you claim is about the same problem. Though it's possible that it is the same problem, you don't provide any data to back that up.

Strange - you can see the full thread here:
http://groups.google.pl/group/pgsql.general/browse_thread/thread/6c4ea8356327276c
I post via Google, maybe that's the cause?

> The message you referred to was about a one-of-a-kind problem with communications to the client and had nothing to do with performance on the server; is that indeed what you're seeing? In that case you should check your network infrastructure for problems.

No, I actually meant that the CPU usage was low during my query. On the other hand, IO wait was very high, so the low CPU usage was kind of normal.

> Usually server performance problems are due to problems with tuning parameters or outdated statistics. Those issues can usually be solved easily.

Well, maybe - I started to use postgres only recently, so maybe I'm making some obvious mistakes. My database was filled incrementally by a shell script - I don't know if that alone lets postgres collect all necessary stats. Maybe an implicit analyze is necessary? I also tried to vacuum/vacuum full my tables before running my queries, but it took too long so I had to break it off.

> Without posting an EXPLAIN ANALYSE people here can only guess what your problem is.

I'm posting another explain analyze below. I've run my query with explain analyze, but forgot to save it :( I think I won't be able to run my queries again, because they took around 30-60 minutes and almost killed my server (it was almost completely unresponsive during the query, because of 90% IO wait). BUT I found the real cause of my problem - the fk2 field from my example had not only an index, but it was also a foreign key to another table.

> That seems unlikely to be the cause.
> From the above it seems much more likely that you're suffering from a bad query plan instead, but you don't provide any details.

I just tried to create a test with similar data - here is what it looks like (it should work if you just paste it into some test db):

-- generate test tables + data
drop table if exists article, keyword, article_keyword, tochange, sums cascade;

CREATE TABLE article (
    id serial NOT NULL PRIMARY KEY,
    content varchar(255) NULL,
    ip inet NULL,
    has_comments bool not null
);

CREATE TABLE keyword (
    id serial NOT NULL PRIMARY KEY,
    keyword varchar(40) NOT NULL UNIQUE,
    articles integer NOT NULL
);

CREATE TABLE article_keyword (
    id serial NOT NULL PRIMARY KEY,
    article_id integer NOT NULL REFERENCES article (id) DEFERRABLE INITIALLY DEFERRED,
    keyword_id integer NOT NULL REFERENCES keyword (id) DEFERRABLE INITIALLY DEFERRED,
    votes_yes integer NOT NULL,
    votes_no integer NOT NULL
);

CREATE INDEX article_keyword_keyword_id ON article_keyword (keyword_id);

insert into article(content, ip, has_comments)
values ('some article', '123.121.121.223', true);

insert into keyword
select nextval('keyword_id_seq'), md5(to_char(i, '9')), 0
from generate_series(1,200) as i;

insert into article_keyword
select nextval('article_keyword_id_seq'), 1, k.id, 0, 0
from generate_series(1,200) as i
join keyword k on k.keyword = md5(to_char(i, '9'))
join generate_series(1,5) as times on true;

create table tochange (
    fromid int not null primary key,
    toid int not null
);

insert into tochange
select k1.id, k2.id
from generate_series(1,20) as i
join keyword k1 on k1.keyword = md5(to_char(i, '9'))
join keyword k2 on k2.keyword = md5(to_char(i+20, '9'));

create table sums (
    id int not null primary key,
    sum int
);

-- now my queries:

-- replace fromid's with toid's
update article_keyword
set keyword_id = tc.toid
from tochange tc
where keyword_id = tc.fromid;

-- delete unused keywords
delete from article_keyword
where id in (
    select k.id
    from keyword k
    left join article_keyword ak on k.id = ak.keyword_id
    where ak.keyword_id is null
);

-- recalculate sums - in how many articles is a keyword used?
insert into sums
select keyword_id, count(*)
from article_keyword
group by keyword_id;

update keyword k
set articles = s.sum
from sums s
where k.id = s.id;

The problem is that I can't reproduce this slow behaviour with this test case. :( The tables are almost identical - only the article table is bigger in reality (it has around a million rows). When I run explain on the first update from the test case, it prints this now:

 Merge Join  (cost=5.14..53436.13 rows=3636710 width=26)
   Merge Cond: (tc.fromid = article_keyword.keyword_id)
   ->  Index Scan using tochange_pkey on tochange tc  (cost=0.00..2830.26 rows=10 width=8)
   ->  Index Scan using article_keyword_keyword_id on article_keyword
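As the thread points out, an EXPLAIN ANALYZE (which, unlike plain EXPLAIN, actually runs the statement) is what would show where the time goes. Wrapping it in a transaction that is rolled back lets you measure the update from the test case above without keeping its effects:

```sql
BEGIN;
EXPLAIN ANALYZE
UPDATE article_keyword
SET    keyword_id = tc.toid
FROM   tochange tc
WHERE  keyword_id = tc.fromid;
ROLLBACK;  -- discard the update; the timing output has already been printed
```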
[GENERAL] Postgres and multiple updates in one statement
Hi there,

We have a function which runs a set of update clauses, and we are considering putting all the update clauses into one statement. I would like to understand how postgres handles multiple updates. If we were to send 5 update statements in one sql statement to the db, would it:

1. do 5 passes on the table, on each pass retrieving the id and then updating the row, or
2. do 1 pass to retrieve the 5 primary keys, then update all rows in parallel?

Apart from saving on the overhead of having to open up 5 separate connections, what are the benefits to passing in multiple updates in one statement?

Many thanks,
Jamie
Re: [GENERAL] Postgres and multiple updates in one statement
On 27/07/2009 09:10, Jamie Lawrence-Jenner wrote:
> Apart from saving on the overhead of having to open up 5 separate connections, what are the benefits to passing in multiple updates in one statement?

If you do them all within one transaction -

  begin;
  update ...
  update ...
  ...
  commit;

- then you save on the overhead associated with beginning and committing a transaction for each update.

Ray.

--
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
r...@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
Re: [GENERAL] Postgres and multiple updates in one statement
On Mon, Jul 27, 2009 at 2:10 AM, Jamie Lawrence-Jenner jamie.jen...@autovhc.co.uk wrote:
> Hi there
> We have a function which runs a set of update clauses and we are considering putting all the update clauses into one statement. I would like to understand how postgres handles multiple updates. If we were to send 5 update statements in one sql statement to the db would it:
> Do 5 passes on the table, on each pass, retrieve the id then update the row
> Do 1 pass to retrieve the 5 primary keys, then update all rows in parallel

It would do 5 passes. Better to have one update statement, to reduce bloat.
Re: [GENERAL] Postgres and multiple updates in one statement
Hello,

On 27/07/09 15:07, Raymond O'Donnell wrote:
> On 27/07/2009 09:10, Jamie Lawrence-Jenner wrote:
>> Apart from saving on the overhead of having to open up 5 separate connections, what are the benefits to passing in multiple updates in one statement?
> If you do them all within one transaction - begin; update ... commit; - then you save on the overhead associated with beginning and committing a transaction for each update.

In addition to the transaction approach suggested by Raymond O'Donnell, I would add that performance will depend on the FROM and WHERE clauses of the original UPDATE statements. If the multiple UPDATE statements follow a quite similar pattern (i.e. similar FROM (optional) and WHERE clauses), it might be clearer (in the source code) and faster (for the database engine) to merge them into a single UPDATE statement. Otherwise (i.e. UPDATE statements with little in common in their FROM and/or WHERE clauses), the single-transaction approach suggested by Raymond O'Donnell is probably more appropriate. I could advise further on merging the UPDATE statements if some pieces of the original statements could be given--without compromising confidential data.

Regards.
--
nha / Lyon / France.
Re: [GENERAL] Postgres and multiple updates in one statement
On Mon, Jul 27, 2009 at 07:31:37AM -0600, Scott Marlowe wrote:
> On Mon, Jul 27, 2009 at 2:10 AM, Jamie Lawrence-Jenner jamie.jen...@autovhc.co.uk wrote:
>> Do 1 pass to retrieve the 5 primary keys, then update all rows in parallel
> It would do 5 passes. Better to have one update statement to reduce bloat.

You could possibly use UNION or UNION ALL to consolidate your 5 passes into one pass. You could also possibly use UPDATE ... FROM to avoid having to return the primary keys at all, and get the whole thing done in one query.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
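Joshua's UPDATE ... FROM idea can be sketched like this. The table name and values are illustrative, not from the thread, and VALUES lists in FROM require PostgreSQL 8.2 or later:

```sql
-- One statement, one pass: join the target table against an inline list.
UPDATE mytable AS t
SET    col1 = v.col1,
       col2 = v.col2
FROM  (VALUES (1, 'x1', 'y1'),
              (2, 'x2', 'y2'),
              (3, 'x3', 'y3')) AS v(pkid, col1, col2)
WHERE  t.pkid = v.pkid;
```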
[GENERAL] Calculating the difference between timetz values
Hello,

I was looking for a way to get the difference (interval) between 2 timetz values, i.e.:

postgres=# select '2:45+7'::timetz - '2:44+2'::timetz;
ERROR:  operator does not exist: time with time zone - time with time zone
LINE 1: select '2:45+7'::timetz - '2:44+2'::timetz;

I'd expect the result of the above to be the interval 05:01:00. Is there any function or operator that calculates the difference correctly? I've found a way to cast a timetz to time without TZ and subtract the resulting time values, but that's not a correct solution for the problem above, due to the loss of all TZ information.

Regards,
--
Alexey Klyukin http://www.CommandPrompt.com
The PostgreSQL Company - Command Prompt, Inc.
Re: [GENERAL] Postgres and multiple updates in one statement
Hi There,

Our update statements are as follows:

Update table set col1=x,col2=y where pkid=1;
Update table set col1=x,col2=y where pkid=2;
Update table set col1=x,col2=y where pkid=3;

Very simple and straightforward. Sometimes there could be as many as 50 update statements to process.

Many thanks,
Jamie

-----Original Message-----
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of nha
Sent: 27 July 2009 14:35
To: Jamie Lawrence-Jenner
Cc: PgSQL-general
Subject: Re: [GENERAL] Postgres and multiple updates in one statement

[snip]
Re: [GENERAL] Postgres and multiple updates in one statement
Hello again,

On 27/07/09 16:48, Jamie Lawrence-Jenner wrote:
> Our update statements are as follows:
> Update table set col1=x,col2=y where pkid=1;
> Update table set col1=x,col2=y where pkid=2;
> Update table set col1=x,col2=y where pkid=3;
> Very simple and straight forward. Sometimes there could be as many as 50 update statements to process.

Thanks for the examples. Comparing EXPLAIN ANALYZE output from the two approaches (multiple statements vs. one statement) should help you choose the faster one. For the one-statement approach, the rewritten query could be:

UPDATE yTable SET col1 = x, col2 = y WHERE pkID IN (1, 2, 3);

Lists of pkIDs are better expressed as an enumeration in the WHERE clause. More generally, I would recommend determining the pkID list before building the UPDATE statement(s), if possible. That should save runtime.

With regards.
--
nha / Lyon / France.
Re: [GENERAL] Postgres and multiple updates in one statement
On Mon, Jul 27, 2009 at 8:48 AM, Jamie Lawrence-Jenner jamie.jen...@autovhc.co.uk wrote:
> Hi There
> Our update statements are as follows
> Update table set col1=x,col2=y where pkid=1;
> Update table set col1=x,col2=y where pkid=2;
> Update table set col1=x,col2=y where pkid=3;
> Very simple and straight forward. Sometimes there could be as many as 50 update statements to process.

Might as well fire them each as separate statements inside one transaction, since pkid is unique and non-repeated in the updates.
Re: [GENERAL] Postgres and multiple updates in one statement
Scott Marlowe scott.marl...@gmail.com writes:
> On Mon, Jul 27, 2009 at 8:48 AM, Jamie Lawrence-Jenner jamie.jen...@autovhc.co.uk wrote:
>> Our update statements are as follows
>> Update table set col1=x,col2=y where pkid=1;
>> Update table set col1=x,col2=y where pkid=2;
>> Update table set col1=x,col2=y where pkid=3;
>> Very simple and straight forward. Sometimes there could be as many as 50 update statements to process.
> Might as well fire them each as separate statements inside one transaction, since pkid is unique and non-repeated in the updates.

If they're all exactly the same pattern like that, it might be worth the trouble to set up a prepared statement.

            regards, tom lane
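A prepared statement for this pattern might look like the following sketch (the table and statement names are illustrative); the statement is planned once and then reused by each EXECUTE:

```sql
PREPARE upd(int, text, text) AS
    UPDATE mytable SET col1 = $2, col2 = $3 WHERE pkid = $1;

EXECUTE upd(1, 'x', 'y');
EXECUTE upd(2, 'x', 'y');
EXECUTE upd(3, 'x', 'y');

DEALLOCATE upd;  -- optional; prepared statements disappear at session end anyway
```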
[GENERAL] Running vacuumdb -a taking too long
We are at a point in the size and utilization of a set of our Postgres 8.1.17 databases where a "vacuumdb -a -z -U postgres" is still running after 50 hours, and we have to kill it off because the additional I/O is causing queries to stack up. We have archived off as much data as possible (100+ GB) and we are looking at moving to a better hardware configuration to support the increased system activity, but for now I need to know if there is a way to avoid transaction ID wraparound if we don't issue a "vacuumdb -a" to update the datfrozenxid value.

This is a logging-type database where data is initially loaded and then rolled up into weekly/monthly tables. Many of the tables in the DB don't last for more than seven days, and after a week the majority of the historical tables are static and stay around until they are finally dropped off (based on a retention period), so re-running a vacuum on these older static tables doesn't really gain much, since the data never changes.

I have read (and re-read) the documentation and am a bit confused about exactly what needs to happen here:

  "Since periodic VACUUM runs are needed anyway for the reasons described earlier, it's unlikely that any table would not be vacuumed for as long as a billion transactions. But to help administrators ensure this constraint is met, VACUUM stores transaction ID statistics in the system table pg_database. In particular, the datfrozenxid column of a database's pg_database row is updated at the completion of any database-wide VACUUM operation (i.e., VACUUM that does not name a specific table). The value stored in this field is the freeze cutoff XID that was used by that VACUUM command. All normal XIDs older than this cutoff XID are guaranteed to have been replaced by FrozenXID within that database. A convenient way to examine this information is to execute the query

      SELECT datname, age(datfrozenxid) FROM pg_database;

  The age column measures the number of transactions from the cutoff XID to the current transaction's XID."

So if, after a table is no longer added to (becomes static), I run a VACUUM against it, the table wouldn't need to be vacuumed again, since the tuples and their related transaction IDs never change?

Is there a way to set up a vacuum scheme on a table-by-table basis that accomplishes the same goal as an all-in-one "vacuumdb -a" run, one that I can spread out over time instead of relying on the completion of a single "vacuumdb -a" command?
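A table-by-table schedule can spread the I/O out with vacuumdb -t (the table and database names below are illustrative). Note, though, that on 8.1 the quoted documentation applies literally: only a database-wide VACUUM updates datfrozenxid, so per-table runs ease the load but do not by themselves advance the wraparound bookkeeping:

```
# Vacuum and analyze individual tables in separate maintenance windows:
vacuumdb -z -U postgres -t daily_log_20090720 mydb
vacuumdb -z -U postgres -t weekly_rollup mydb

# Watch the wraparound distance while doing so:
psql -U postgres -d mydb -c "SELECT datname, age(datfrozenxid) FROM pg_database;"
```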
Re: [GENERAL] Calculating the difference between timetz values
On Jul 27, 2009, at 10:54 , Alexey Klyukin wrote:
> Hello,
> I was looking for a way to get the difference (interval) between 2 timetz values, i.e.:

I don't have a solution, but am curious what your use case is for timetz (as opposed to timestamptz).

Michael Glaesemann
grzm seespotcode net
[GENERAL] Run procedure at startup
Hello,

Is there a way to run a pl/pgsql function automatically at server startup?
Is there a way to run a pl/pgsql function with an infinite loop, as a daemon?
Is there a way to start a pl/pgsql function that would persist after the user session has closed?
Is there a way for an unprivileged user to delegate a task (allowed by a superuser) to a superuser?

Thanks in advance.
Re: [GENERAL] Run procedure at startup
Saleem EDAH-TALLY wrote:
> Is there a way to run a pl/pgsql automatically at server startup ?

In your postgres startup script, launch a session with `psql ... -c "some sql commands"` or `psql ... -f somescript.sql`.

> Is there a way to run a pl/pgsql function with an infinite loop as a daemon ?

Functions are called from within a transaction. If you did this, that transaction would never end, and this would prevent VACUUM from cleaning up any freed tuples newer than the start of that transaction. Not good. You could, however, have a system daemon that periodically invokes a plpgsql function.

> Is there a way to start a pl/pgsql function that would persist after the user session has closed ?

No.

> Is there a way for an unprivileged user to delegate a task (allowed by superuser) to a superuser ?

Someone else will have to chime in here.
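A minimal sketch of the startup-script approach; the path, database name, and function name here are all hypothetical:

```
# In the init script, after the server is confirmed up:
psql -U postgres -d mydb -c "SELECT my_startup_proc();"

# ...or run a whole file of SQL:
psql -U postgres -d mydb -f /usr/local/pgsql/startup.sql
```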
Re: [GENERAL] Run procedure at startup
Hello,

2009/7/27 Saleem EDAH-TALLY nm...@netcourrier.com:
> Is there a way to run a pl/pgsql automatically at server startup ?

No - but you can modify the startup scripts.

> Is there a way to run a pl/pgsql function with an infinite loop as a daemon ?

An infinite loop is possible, but you probably need orafce or something similar for an infinite loop to be usable:
http://www.postgres.cz/index.php/Oracle_functionality_%28en%29#DBMS_ALERT
http://www.postgres.cz/index.php/Oracle_functionality_%28en%29#DBMS_PIPE

> Is there a way to start a pl/pgsql function that would persist after the user session has closed ?

Yes - with some pooling software like pgpool, bucardo and others.

> Is there a way for an unprivileged user to delegate a task (allowed by superuser) to a superuser ?

Yes - look at the SECURITY DEFINER flag:
http://www.postgresql.org/docs/8.4/static/sql-createfunction.html

regards
Pavel Stehule
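The SECURITY DEFINER route Pavel points to might look like this sketch (the function body and role name are made up). The key point is that the function runs with its owner's privileges, not the caller's:

```sql
-- Created by a superuser:
CREATE FUNCTION delegated_task() RETURNS void AS $$
BEGIN
    -- privileged work goes here (body is illustrative)
    RAISE NOTICE 'running with the owner''s rights';
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- Allow an unprivileged role to invoke it:
GRANT EXECUTE ON FUNCTION delegated_task() TO some_user;
```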
Re: [GENERAL] Calculating the difference between timetz values
On Jul 27, 2009, at 6:52 PM, Michael Glaesemann wrote:
> On Jul 27, 2009, at 10:54 , Alexey Klyukin wrote:
>> I was looking for a way to get the difference (interval) between 2 timetz values, i.e.:
> I don't have a solution, but am curious what your use case is for timetz (as opposed to timestamptz).

I'm writing a custom trigger function that has to compare values of time* types and take some action depending on the result.

Answering my own question: Alvaro proposed a solution with extract(epoch from tz_value); the result is in seconds:

postgres=# select extract(epoch from '00:00:00+0'::timetz) - extract(epoch from '2:00:00+2'::timetz);
 ?column?
----------
        0
(1 row)

--
Alexey Klyukin http://www.CommandPrompt.com
The PostgreSQL Company - Command Prompt, Inc.
[GENERAL] combining db's- importing primary keys
I would like to combine multiple databases (same schema) into one master db. Does anyone know how I can reconcile all my primary and foreign keys to maintain referential integrity?

Many thanks,
Bob
Re: [GENERAL] Postgres and multiple updates in one statement
On Mon, 2009-07-27 at 11:50 -0400, Tom Lane wrote:
> Scott Marlowe scott.marl...@gmail.com writes:
>> On Mon, Jul 27, 2009 at 8:48 AM, Jamie Lawrence-Jenner jamie.jen...@autovhc.co.uk wrote:
>>> Our update statements are as follows
>>> Update table set col1=x,col2=y where pkid=1;
>>> Update table set col1=x,col2=y where pkid=2;
>>> Update table set col1=x,col2=y where pkid=3;
>>> Very simple and straight forward. Sometimes there could be as many as 50 update statements to process.
>> Might as well fire them each as separate statements inside one transaction, since pkid is unique and non-repeated in the updates.
> If they're all exactly the same pattern like that, it might be worth the trouble to set up a prepared statement.
>
>             regards, tom lane

Seems like an opportunity for the use of a function.

Joshua D. Drake

--
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
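A function along the lines Joshua suggests might look like this sketch (table, column, and function names are illustrative); it batches all the ids into a single UPDATE:

```sql
CREATE FUNCTION update_batch(ids int[], new_col1 text, new_col2 text)
RETURNS void AS $$
BEGIN
    UPDATE mytable
    SET    col1 = new_col1, col2 = new_col2
    WHERE  pkid = ANY (ids);
END;
$$ LANGUAGE plpgsql;

-- All 50 updates become one call:
SELECT update_batch(ARRAY[1, 2, 3], 'x', 'y');
```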
Re: [GENERAL] Calculating the difference between timetz values
Alexey Klyukin al...@commandprompt.com writes:
> On Jul 27, 2009, at 6:52 PM, Michael Glaesemann wrote:
>> I don't have a solution, but am curious what your use case is for timetz (as opposed to timestamptz).
> I'm writing a custom trigger function that has to compare values of time* types and make some actions depending on a result.

It's still fairly unclear why you think that comparing timetz values is a useful activity. Is 23:32 earlier or later than 00:32? How can you tell whether it's the same day or different days? Adding timezones into that doesn't make it better.

Our documentation deprecates timetz as a poorly-defined datatype, and I've never seen a reason to argue with that judgment. I'd suggest taking a very hard look at why you're not using timestamptz instead.

            regards, tom lane
Re: [GENERAL] Relational Algebra and Aggregate Functions
On Sun, 2009-07-26 at 15:36 -0400, Robert James wrote:
> I'm working on improving my background database theory, to aid in practice. I've found learning relational algebra to be very helpful. One thing which relational algebra doesn't cover is aggregate functions. Can anyone recommend any papers or web pages which provide some good theoretical background for aggregate functions?

When it comes to relational theory, C.J. Date is a good author. An Introduction to Database Systems covers pretty much everything.

There's a formal definition of a relational algebra (including SUMMARIZE, which is the authors' version of an aggregate operator) defined with only two operators here: http://thethirdmanifesto.com/ (look for Appendix A). Appendix A is not easy to understand without some basic familiarity with the authors' other works, though.

Regards,
Jeff Davis
Re: [GENERAL] Running vacuumdb -a taking too long
In response to Keaton Adams kad...@mxlogic.com:

> We are to a point in size and utilization of a set of our Postgres 8.1.17 databases that a vacuumdb -a -z -U postgres is still running after 50 hours and we have to kill it off because the additional i/o is causing queries to stack up.
> [snip]
> Is there a way to set up a vacuum scheme on a table-by-table basis to accomplish the same goal as an all-in-one vacuumdb -a run that I can spread out over time instead of relying on the completion of a single vacuumdb -a command?

I'm not quite sure what the best answer is to your problem, but since nobody else has suggested anything, here are my thoughts.

First off, can you allocate more maintenance_work_mem? Perhaps that can speed up vacuum enough.

Secondly, if you VACUUM FREEZE those static tables, it will guarantee that you'll never lose data from them, even if you hit XID wraparound. If you VACUUM FREEZE the tables one at a time, perhaps you can avoid the huge performance hit.

Third, while your argument about tables not needing to be vacuumed again seems logical, it's simply not how PG functions. Since the XID is database-wide, it may affect any table. Of course, the FREEZE process will protect tables from this. This is improved in newer versions of Postgres, so an upgrade should help.

Finally, are you really in danger of hitting the wraparound? If you run the query SELECT datname, age(datfrozenxid) FROM pg_database; (as suggested in the docs) once a day for a few days, does it seem like you're using up XIDs fast enough to be a danger? If you've got new hardware coming soon anyway, perhaps you have enough time to not worry about it on the current hardware?

Hope this helps.
-- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] For production: 8.4 or 8.3?
Just looking for experiences of people. Are people already using 8.4 in serious live hosting environments? Thanks. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Video available for PGDay SJC '09
Greetings, The video recordings of the sessions for PG Day SJC '09 are now available: Version 8.4: Easier to Administer than Ever / Josh Berkus / PostgreSQL Experts http://media.postgresql.org/pgday-sjc-09/pgday-sjc-09-easier.mov Rapid Upgrades with pg_Migrator / Bruce Momjian / EnterpriseDB http://media.postgresql.org/pgday-sjc-09/pgday-sjc-09-migrator.mov Check Please! What Your Postgres Databases Wishes You Would Monitor / Robert Treat / OmniTI http://media.postgresql.org/pgday-sjc-09/pgday-sjc-09-monitoring.mov PostgreSQL Pitfalls / Jeff Davis / Truviso http://media.postgresql.org/pgday-sjc-09/pgday-sjc-09-pitfalls.mov What works with Postgres: The Open Geo Data Interoperabilty Overview / Brian Hamlin / OSGeo Foundation http://media.postgresql.org/pgday-sjc-09/pgday-sjc-09-postgis.mov Very Large Databases and PostgreSQL: Issues and Solutions / John Cieslewicz / Aster Data http://media.postgresql.org/pgday-sjc-09/pgday-sjc-09-petabytes.mov pgGearman: A distributed worker queue for PostgreSQL / Brian Aker, Eric Day / Gearman Project http://media.postgresql.org/pgday-sjc-09/pgday-sjc-09-gearman.mov Lightning Talks http://media.postgresql.org/pgday-sjc-09/pgday-sjc-09-lightning.mov Many thanks to Steve Crawford for the audio system and assistance with setup, tear-down, and coordination during the event. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] For production: 8.4 or 8.3?
On Tue, 2009-07-28 at 03:51 +0800, Phoenix Kiula wrote: Are people already using 8.4 in serious live hosting environments? Not yet. There are lots of (important) fixes in CVS which are waiting for 8.4.1. For production, I'd wait for a while. -- Devrim GÜNDÜZ, RHCE Command Prompt - http://www.CommandPrompt.com devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org signature.asc Description: This is a digitally signed message part
Re: [GENERAL] Running vacuumdb -a taking too long
We are upgrading to 8.3.7 in September, if that helps the situation at all. I just want to make sure I thoroughly understand how these two statements work together: The price is this maintenance requirement: every table in the database must be vacuumed at least once every billion transactions. For each database in the cluster, PostgreSQL keeps track of the time of the last database-wide VACUUM. When any database approaches the billion-transaction danger level, the system begins to emit warning messages. If nothing is done, it will eventually shut down normal operations until appropriate manual maintenance is done. Let's say I have two sets of tables in the production DB, one set is always around and the other set rolls forward in time using CREATE/DROP table (_MMDD), with the oldest table being 14 days old. I never issue the actual vacuumdb -a command against the entire cluster, but I do a DB wide VACUUM; against databases such as postgres and template1 once a week. I issue a VACUUM table once a day against the tables that always exist in the database (they do receive inserts/updates/deletes daily) because they are small enough to vacuum without a big performance hit. This includes everything that is shown as a real table in pg_tables except those that match the filter of '%_MMDD' to avoid the rolling table set. The large day tables (rolling forward in time) I never vacuum because I just COPY into them, and if there is a failed COPY operation or two I don't really care about the wasted space, due to the fact that the table has a limited lifespan anyway. So these tables are always rolling forward and being dropped in a 14 day window. So my questions are: 1. Will the Postgres cluster eventually shut down because I never do a true database-wide VACUUM using a vacuumdb -a command on all of the databases, even though I vacuum the tables in the production database that have a lifespan of greater than 14 days? 2. 
Would I ever be at risk of losing data in a table that is only around for a two week (14 day period) if I never do this database wide VACUUM on the actual production DB? Thanks again for your response. On 7/27/09 1:41 PM, Bill Moran wmo...@potentialtech.com wrote: In response to Keaton Adams kad...@mxlogic.com: We are to a point in size and utilization of a set of our Postgres 8.1.17 databases that a vacuumdb -a -z -U postgres is still running after 50 hours and we have to kill it off because the additional i/o is causing queries to stack up. We have archived off as much data as possible (100+ GB) and we are looking at moving to a better hardware configuration to support the increased system activity, but for now I need to know if there is a way to avoid transaction ID wraparound if we don't issue a vacuumdb -a to update the datfrozenxid value. This is a logging type database where data is initially loaded and then rolled up into weekly/monthly tables. Many of the tables in the DB don't last for more than seven days, and after a week the majority of the historical tables are static and stay around until they are finally dropped off (based on a retention period), so re-running a vacuum on these older static tables doesn't really gain much since the data never changes. I have read (and re-read) the documentation and am a bit confused on exactly what needs to happen here: Since periodic VACUUM runs are needed anyway for the reasons described earlier, it's unlikely that any table would not be vacuumed for as long as a billion transactions. But to help administrators ensure this constraint is met, VACUUM stores transaction ID statistics in the system table pg_database. In particular, the datfrozenxid column of a database's pg_database row is updated at the completion of any database-wide VACUUM operation (i.e., VACUUM that does not name a specific table). The value stored in this field is the freeze cutoff XID that was used by that VACUUM command. 
All normal XIDs older than this cutoff XID are guaranteed to have been replaced by FrozenXID within that database. A convenient way to examine this information is to execute the query SELECT datname, age(datfrozenxid) FROM pg_database; The age column measures the number of transactions from the cutoff XID to the current transaction's XID. So if, after a table is no longer added to (becomes static), I run a VACUUM against it, the table wouldn't need to be vacuumed again since the tuples and their related transaction ID's never change? Is there a way to set up a vacuum scheme on a table-by-table basis to accomplish the same goal as an all-in-one vacuumdb -a run that I can spread out over time instead of relying on the completion of a single vacuumdb -a command? I'm not quite sure what the best answer is to your problem, but since nobody else has suggested anything, here are my thoughts. First off, can you allocate more
Re: [GENERAL] For production: 8.4 or 8.3?
On Mon, Jul 27, 2009 at 12:51 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote: Just looking for experiences of people. Are people already using 8.4 in serious live hosting environments? Thanks. Wait.. 8.3 is running fine and dandy. Lots of decent sized changes in 8.4 with awaiting fixes. So wait. And those that have multiple TB's of data, weee another dump and restore upgrade (pt!) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] For production: 8.4 or 8.3?
Tory M Blue wrote on 27.07.2009 22:45: And those that have multiple TB's of data, weee another dump and restore upgrade (pt!) Isn't that what pg_migrator is for? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] For production: 8.4 or 8.3?
On Mon, Jul 27, 2009 at 4:45 PM, Tory M Blue tmb...@gmail.com wrote: And those that have multiple TB's of data, weee another dump and restore upgrade (pt!) pg_migrator doesn't need to dump - restore, it can do an in-place upgrade of the datafiles for you. http://archives.postgresql.org/pgsql-committers/2009-06/msg00031.php --Scott
Re: [GENERAL] For production: 8.4 or 8.3?
On Mon, Jul 27, 2009 at 2:48 PM, Thomas Kellerer spam_ea...@gmx.net wrote: Tory M Blue wrote on 27.07.2009 22:45: And those that have multiple TB's of data, weee another dump and restore upgrade (pt!) Isn't that what pg_migrator is for? I use slony for such things, downtime = zero (ok a few seconds) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] For production: 8.4 or 8.3?
On Mon, 2009-07-27 at 22:48 +0200, Thomas Kellerer wrote: Tory M Blue wrote on 27.07.2009 22:45: And those that have multiple TB's of data, weee another dump and restore upgrade (pt!) Isn't that what pg_migrator is for? It depends, 8.3 and 8.4 are not compatible by default (because of --integer-datetimes). So, yeah if you are running Debian/Ubuntu but if you are running Cent/RH with the defaults, pg_migrator isn't going to work unless you compile Pg from source. Joshua D. Drake -- PostgreSQL - XMPP: jdr...@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Running vacuumdb -a taking too long
On Mon, Jul 27, 2009 at 02:21:02PM -0600, Keaton Adams wrote: We are upgrading to 8.3.7 in September, if that helps the situation at all. This is good. Since 8.2 VACUUM age is done per table instead of per database. This should solve most of your problems. So my questions are: 1. Will the Postgres cluster eventually shut down because I never do a true database-wide VACUUM using a vacuumdb -a command on all of the databases, even though I vacuum the tables in the production database that have a lifespan of greater than 14 days? On older versions you need to do a database-wide vacuum (note this is not vacuumdb -a) once every billion transactions. Did you take the advice in the email you responded to with respect to speeding up vacuum? And using SELECT datname, age(datfrozenxid) FROM pg_database; to determine if it's an actual problem (just post the results if you can't interpret them). 2. Would I ever be at risk of losing data in a table that is only around for a two week (14 day period) if I never do this database wide VACUUM on the actual production DB? You won't lose data, but you need to do a DB wide (not cluster-wide) vacuum to advance the wraparound counter... Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [GENERAL] Running vacuumdb -a taking too long
This is good. Since 8.2 VACUUM age is done per table instead of per database. This should solve most of your problems. On older versions you need to do a database-wide vacuum (note this is not vacuumdb -a) once every billion transactions. You won't lose data, but you need to do a DB wide (not cluster-wide) vacuum to advance the wraparound counter... That answered the question exactly as I needed to hear it. Did you take the advice in the email you responded to with respect to speeding up vacuum? Great suggestion and one that I will try out over the next couple of days. And using SELECT datname, age(datfrozenxid) FROM pg_database; to determine if it's an actual problem (just post the results if you can't interpret them).

postgres=# SELECT datname, age(datfrozenxid) FROM pg_database;
  datname   |    age
------------+------------
 postgres   | 1073741878
 listenerdb | 1074114794
 template1  | 1073908727
 template0  |   30121699
(4 rows)

Thanks again. On 7/27/09 3:10 PM, Martijn van Oosterhout klep...@svana.org wrote: On Mon, Jul 27, 2009 at 02:21:02PM -0600, Keaton Adams wrote: We are upgrading to 8.3.7 in September, if that helps the situation at all. This is good. Since 8.2 VACUUM age is done per table instead of per database. This should solve most of your problems. So my questions are: 1. Will the Postgres cluster eventually shut down because I never do a true database-wide VACUUM using a vacuumdb -a command on all of the databases, even though I vacuum the tables in the production database that have a lifespan of greater than 14 days? On older versions you need to do a database-wide vacuum (note this is not vacuumdb -a) once every billion transactions. Did you take the advice in the email you responded to with respect to speeding up vacuum? And using SELECT datname, age(datfrozenxid) FROM pg_database; to determine if it's an actual problem (just post the results if you can't interpret them). 2. 
Would I ever be at risk of losing data in a table that is only around for a two week (14 day period) if I never do this database wide VACUUM on the actual production DB? You won't lose data, but you need to do a DB wide (not cluster-wide) vacuum to advance the wraparound counter... Have a nice day, -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
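To advance the wraparound counter on 8.1, the vacuum has to be database-wide: issued with no table name, while connected to the database in question. A sketch, using the database name from the posted output:

```sql
-- While connected to listenerdb (not vacuumdb -a across the cluster):
VACUUM;  -- no table named, so datfrozenxid is updated on completion

-- Afterwards the age for that database should drop back down:
SELECT datname, age(datfrozenxid) FROM pg_database;
```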
[GENERAL] C Function Question
Greetings: Does anyone know if a function written in C and linked into the backend in a shared library with a statically declared structure, maintain that data for the life of the backend process such that, when the function is called again, the structure data is intact? Thanks for any insight anyone can give... -- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] For production: 8.4 or 8.3?
Joshua D. Drake j...@commandprompt.com writes: It depends, 8.3 and 8.4 are not compatible by default (because of --integer-datetimes). So, yeah if you are running Debian/Ubuntu but if you are running Cent/RH with the defaults, pg_migrator isn't going to work unless you compile Pg from source. Oh? You think RH/Cent is going to change that default now? Think again. Of course the real question is whether you trust pg_migrator to not eat your data. Those who are afraid to trust 8.4.0 will probably not care to trust pg_migrator for a few versions yet either... regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] For production: 8.4 or 8.3?
On Mon, 2009-07-27 at 19:16 -0400, Tom Lane wrote: Joshua D. Drake j...@commandprompt.com writes: It depends, 8.3 and 8.4 are not compatible by default (because of --integer-datetimes). So, yeah if you are running Debian/Ubuntu but if you are running Cent/RH with the defaults, pg_migrator isn't going to work unless you compile Pg from source. Oh? You think RH/Cent is going to change that default now? Think again. I thought they would get around to changing it now. That is a shame because RH really can't be used as a production PostgreSQL server (if date based data is important) unless you recompile or install the --integer-datetime rpms. Joshua D. Drake -- PostgreSQL - XMPP: jdr...@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] For production: 8.4 or 8.3?
Joshua D. Drake j...@commandprompt.com writes: On Mon, 2009-07-27 at 19:16 -0400, Tom Lane wrote: Oh? You think RH/Cent is going to change that default now? Think again. I thought they would get around to changing it now. They is me, and it's not changing. I'm not blowing a chance at in-place upgrade to switch the integer-timestamp default. because RH really can't be used as a production PostgreSQL server (if date based data is important) I have open bugs about the lack of in-place upgrade. I have never once heard a customer complain about FP timestamps. So your position is nonsense. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] general question on two-partition table
Hi. We have a table with 30 M records that is growing by about 100 K records per day. The experimentalists, whose data are in the table, have decided that they will focus on the records for which the value of one field, rbscore, is greater than a cut-off. However, they want to continue to store all of the data - even the records for which rbscore is less than the cutoff - in the database. For the current table, there are about 400 K (out of 30 M) records that meet the 'good' criteria. Each record in the table has about 40 fields, and the experimentalists do in fact write queries that use many of those fields, some more than others. (They are building a model and have not pinned down exactly which fields are more important than others, so that's why they store and query by so many fields.) If they are going to spend 95% of their time querying the records that meet the 'good' criteria, what are the good strategies for ensuring good performance for those queries? (1) Should I partition the table into two partitions based on the value of rbscore? (2) Should I create two separate tables? Are (1) and (2) more or less equivalent in terms of performance? I think that partitioning the table is a more flexible option (i.e., what if the cutoff value changes, no need to change the name of the table being queried, etc.), but would (2) give better performance given that 95% of their queries are for rbscores greater than a threshold value? Can you suggest other strategies? Thank you, Janet -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Building from source vs RPMs
I'm moving from a long time in BSD-land to using Linux. I've always been in the habit of building PostgreSQL from the source tarballs. On Linux, is there an advantage either way to using the RPMs as opposed to building from source? Thanks! -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] A question about the permissions
I am trying to monitor replication lag using zabbix. I have written a simple script in ruby to get the lag; it goes like this:

require 'date'
require 'yaml'

y = YAML.load `/usr/lib/postgresql/8.3/bin/pg_controldata /var/lib/postgresql/8.3/main`
last_checkpoint = DateTime.parse(y['Time of latest checkpoint'])
hours, mins, secs, fractions = Date::day_fraction_to_time(DateTime.now - last_checkpoint)
puts hours * 60 * 60 + mins * 60 + secs

When I try to run this script as the zabbix user (or any user other than postgres or root) I get the error: pg_controldata: could not open file /var/lib/postgresql/8.3/main/global/pg_control for reading: Permission denied This is because everything under the 8.3 directory is readable by the postgres user only (no group permissions). Is there any harm in giving the postgres group the same rights as the postgres user and adding the zabbix user to the postgres group? What is the rationale for not giving the group any permissions at all? Thanks. BTW I think it's really cool that the output of pg_controldata is in a YAML-parseable format. I am pretty sure that's by accident but don't change it :) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] For production: 8.4 or 8.3?
On Mon, 2009-07-27 at 19:44 -0400, Tom Lane wrote: because RH really can't be used as a production PostgreSQL server (if date based data is important) I have open bugs about the lack of in-place upgrade. I have never once heard a customer complain about FP timestamps. So your position is nonsense. Most customers wouldn't even understand the problem. We have systems we have to custom maintain due to PostgreSQL having ghost data because of the floating point based timestamp storage. The problem is very simple. If you run on RH by default you have an opportunity for data that will disappear in a practical sense. You know this is true. My response is not nonsense. The data is still there but it is floating point based and thus, inexact. The where clause that you expect to retrieve the data, may not. Joshua D. Drake -- PostgreSQL - XMPP: jdr...@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] general question on two-partition table
On Mon, Jul 27, 2009 at 7:52 PM, Janet Jacobsen jsjacob...@lbl.gov wrote: Can you suggest other strategies? Something that might be easier to play with is to create a (or several, to speed up other queries) functional index on the comparison between rbscore and the cutoff. It won't buy you anything on seq scans, but if most of the queries are done on small subsets of the tuples which meet the good criteria, it could be a big win that's very easy to implement. -- - David T. Wilson david.t.wil...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] general question on two-partition table
On Tue, Jul 28, 2009 at 1:08 AM, David Wilson david.t.wil...@gmail.com wrote: On Mon, Jul 27, 2009 at 7:52 PM, Janet Jacobsen jsjacob...@lbl.gov wrote: Can you suggest other strategies? Something that might be easier to play with is to create a (or several, to speed up other queries) functional index on the comparison between rbscore and the cutoff. I think it would be even more interesting to have partial indexes -- ie specified with WHERE rbscore > cutoff. I'm actually wondering if partitioning is really what you want. You might prefer to just keep two entirely separate tables. One that has all the data and one that has a second copy of the desirable subset. Kind of like a materialized view of a simple query with the where clause of rbscore > cutoff. -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] general question on two-partition table
On Mon, Jul 27, 2009 at 8:24 PM, Greg Stark gsst...@mit.edu wrote: I think it would be even more interesting to have partial indexes -- ie specified with WHERE rbscore > cutoff. Yes- that's what I actually meant. Word got scrambled between brain and fingers... -- - David T. Wilson david.t.wil...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
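Concretely, the partial-index suggestion looks like this (the table name and the 0.5 cutoff are illustrative; only the rbscore column comes from the original post):

```sql
-- Index only the ~400 K 'good' rows out of 30 M; queries must repeat
-- the WHERE condition (or a stricter one) for the planner to use it:
CREATE INDEX good_rbscore_idx ON observations (rbscore)
    WHERE rbscore > 0.5;
```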
[GENERAL] Clients disconnect but query still runs
Hi. I noticed that when clients (both psql and pgAdmin) disconnect or cancel, queries are often still running on the server. A few questions: 1) Is there a way to reconnect and get the results? 2) Is there a way to tell postgres to automatically stop all queries when the client who queried them disconnects? 3) Is there a way to see all queries whose clients have disconnected? 4) And finally: Why is this the behavior? Doesn't this keep some very long queries running which drain performance but don't seem to benefit anyone?
Re: [GENERAL] general question on two-partition table
* Janet Jacobsen (jsjacob...@lbl.gov) wrote: If they are going to spend 95% of their time querying the records that meet the 'good' criteria, what are the good strategies for ensuring good performance for those queries? (1) Should I partition the table into two partitions based on the value of rbscore? (2) Should I create two separate tables? Are (1) and (2) more or less equivalent in terms of performance? It's not clear to me what you plan here... How would you handle (2) for the users? Would you construct a view across them, or expect them to query the right table(s)?

Options, as I see them, in order of 'best-to-worst' wrt user-friendliness and performance, I believe, are:
1- Partitioning (with CHECK constraints and constraint_exclusion)
2- View across two tables (with appropriate WHERE clauses)
3- Functional index (as suggested by someone else)
4- Separate tables (users have to figure out how to use them)
5- Single table with everything

My recommendation would be #1, followed by #2. Be sure to look up how to do partitioning by using inheritance in PG, and, if you need to, look at how to implement a trigger to handle inserts on the parent table. Make sure you create your CHECK() constraints correctly, and that you have constraint_exclusion enabled, and that it *works*. I think that partitioning the table is a more flexible option (i.e., what if the cutoff value changes, no need to change the name of the table being queried, etc.), but would (2) give better performance given that 95% of their queries are for rbscores greater than a threshold value? If you have your partitioning set up correctly, I don't believe having actual separate tables would be that much of a performance gain for queries. It would help some with inserts, though if you know which table to insert into, you could just insert into that child directly rather than inserting into the parent and relying on a trigger. Enjoy, Stephen signature.asc Description: Digital signature
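A minimal sketch of option 1, in the style of the inheritance-partitioning examples in the docs (the table name, column list, and 0.5 cutoff are all illustrative):

```sql
CREATE TABLE observations (id bigint, rbscore real);

-- Children with non-overlapping CHECK constraints, so
-- constraint_exclusion can skip the irrelevant partition:
CREATE TABLE observations_good (CHECK (rbscore >= 0.5)) INHERITS (observations);
CREATE TABLE observations_rest (CHECK (rbscore <  0.5)) INHERITS (observations);

-- Route inserts on the parent into the right child:
CREATE OR REPLACE FUNCTION observations_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.rbscore >= 0.5 THEN
        INSERT INTO observations_good VALUES (NEW.*);
    ELSE
        INSERT INTO observations_rest VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- the row has already gone to a child
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER observations_route BEFORE INSERT ON observations
    FOR EACH ROW EXECUTE PROCEDURE observations_insert();

SET constraint_exclusion = on;  -- needed for the planner to prune partitions
```

With this in place, a query with WHERE rbscore >= 0.5 scans only observations_good, and the cutoff lives in the CHECK constraints rather than in table names.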
Re: [GENERAL] Relational Algebra and Aggregate Functions
Thanks for all the good replies (both on and off list). It seems the consensus is for me to read Christopher Date. I found two relevant Date books: 1) Introduction to Database Systems http://www.amazon.com/Introduction-Database-Systems-Kannan-Swamynathan/dp/B001BVYKY4/ref=sr_1_5?ie=UTF8s=booksqid=1248742811sr=1-5 and 2) Database in Depth: Relational Theory for Practitioners http://www.amazon.com/Database-Depth-Relational-Theory-Practitioners/dp/0596100124/ref=sr_1_7?ie=UTF8s=booksqid=1248742811sr=1-7 Any recommendations as to which? From the titles, I'd be inclined towards the second, but not if the first is better. One thing I'm not interested in is polemics against SQL and lamentations on how ignorant all practitioners are. On Mon, Jul 27, 2009 at 2:45 PM, Jeff Davis pg...@j-davis.com wrote: On Sun, 2009-07-26 at 15:36 -0400, Robert James wrote: I'm working on improving my background database theory, to aid in practice. I've found learning relational algebra to be very helpful. One thing which relational algebra doesn't cover is aggregate functions. Can anyone recommend any papers or web pages which provide some good theoretical background for aggregate functions? When it comes to relational theory, C.J. Date is a good author. An Introduction To Database Systems covers pretty much everything. There's a formal definition of a relational algebra (including SUMMARIZE, which is the authors' version of an aggregate operator) defined with only two operators here: http://thethirdmanifesto.com/ (look for Appendix A) Although Appendix A is not easy to understand without some basic familiarity with the authors' other works. Regards, Jeff Davis
Re: [GENERAL] Relational Algebra and Aggregate Functions
On Mon, 2009-07-27 at 21:05 -0400, Robert James wrote: 1) Introduction to Database Systems http://www.amazon.com/Introduction-Database-Systems-Kannan-Swamynathan/dp/B001BVYKY4/ref=sr_1_5?ie=UTF8s=booksqid=1248742811sr=1-5 and 2) Database in Depth: Relational Theory for Practitioners http://www.amazon.com/Database-Depth-Relational-Theory-Practitioners/dp/0596100124/ref=sr_1_7?ie=UTF8s=booksqid=1248742811sr=1-7 I recommend #2. It's shorter and easier to read than An Introduction to Database Systems, and I think it will answer your question about relational theory and aggregates (see SUMMARIZE). Appendix A is a part of Databases, Types, and The Relational Model: The Third Manifesto. That's interesting, but it's not an easy read, it's describing a system in formal detail. Regards, Jeff Davis -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Clients disconnect but query still runs
Robert James srobertja...@gmail.com writes: Hi. I noticed that when clients (both psql and pgAdmin) disconnect or cancel, queries are often still running on the server. A few questions: 1) Is there a way to reconnect and get the results? No. 2) Is there a way to tell postgres to automatically stop all queries when the client who queried them disconnects? No. 3) Is there a way to see all queries whose clients have disconnected? No. 4) And finally: Why is this the behavior? It's not easy to tell whether a client has disconnected (particularly if the network stack is unhelpful, which is depressingly often true). Postgres will cancel a query if it gets told that the connection's been dropped, but it will only discover this when an attempt to output to the client fails. It does not spend cycles looking aside to see if the connection has dropped when it is doing something that doesn't involve output to the client. If your client code is polite enough to send a cancel request before disconnecting, that should terminate the query reasonably promptly. But just yanking the plug doesn't do that. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
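If queries do get orphaned, they can at least be found and cancelled by hand from another session (the pid is illustrative; the column names are as of 8.3):

```sql
-- See what's running and from which client:
SELECT procpid, usename, current_query FROM pg_stat_activity;

-- Cancel the running query of a given backend:
SELECT pg_cancel_backend(12345);
```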
Re: [GENERAL] A question about the permissions
Tim Uckun timuc...@gmail.com writes: What is the rationale for not giving the group any permissions at all? On lots of systems, giving group permissions is nearly as bad as giving world permissions (eg, all the users might be in a users group). So we don't do it by default. If you want to poke holes in the security of your own installation, go right ahead. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Postgres 8.2 database recovery Could not create relation Invalid Argument
Novice here :). I have PostgreSQL 8.2 installed on a single board computer running Windows XP Embedded on a Compact Flash drive - 2 databases with no more than 2000 rows. After 10 power cycles spaced 6 mins apart, I noticed the postgres.exe processes no longer running. I located the log file (see below) and it appears postgres attempts to recover but fails to do so. Any suggestions? 2009-07-28 04:31:01 LOG: database system was interrupted at 2009-07-28 04:27:54 GMT Daylight Time 2009-07-28 04:31:01 LOG: checkpoint record is at 0/7CBF58 2009-07-28 04:31:01 LOG: redo record is at 0/7CBF58; undo record is at 0/0; shutdown TRUE 2009-07-28 04:31:01 LOG: next transaction ID: 0/12290; next OID: 17183 2009-07-28 04:31:01 LOG: next MultiXactId: 1; next MultiXactOffset: 0 2009-07-28 04:31:01 LOG: database system was not properly shut down; automatic recovery in progress 2009-07-28 04:31:02 LOG: redo starts at 0/7CBFA8 2009-07-28 04:31:02 FATAL: could not create relation 1663/16403/16586: Invalid argument 2009-07-28 04:31:02 CONTEXT: xlog redo update: rel 1663/16403/16586; tid 35/32; new 35/33 2009-07-28 04:31:02 LOG: startup process (PID 1484) exited with exit code 1 2009-07-28 04:31:02 LOG: aborting startup due to startup process failure 2009-07-28 04:31:02 LOG: logger shutting down Thanks in advance for any insights. Jus