Re: [GENERAL] Weird performance issue with custom function with a for loop.
On 01/31/2011 08:48 PM, Tom Lane wrote:
> Nicos Panayides <ni...@magneta.com.cy> writes:
>> the following function takes forever to execute as is. I see 'Inserting
>> original actions in temporary table' and nothing after that. If I replace
>> orig_user_id in the FOR loop with 1811 (the same orig_user_id passed as
>> the function parameter) it returns immediately and correctly (the table
>> has indices so it's very fast).
>
> It seems likely that you're getting a different plan for the generic case
> because that user id isn't representative of the overall average for the
> column. You could investigate by explaining a parameterized query:
>
>   PREPARE foo (bigint) AS
>     SELECT ... WHERE game_round_actions.user_id = $1 ... ;
>   EXPLAIN EXECUTE foo(1811);
>
> (To really see exactly what's happening, you'd probably need to
> parameterize for each of the plpgsql variables used in the query; I'm
> suspicious that the BETWEEN might be contributing to the issue as well.)
> Possibly increasing the stats target for the user id column would help,
> but it's hard to be sure without knowing what its distribution is like.
>
> regards, tom lane

I tried the prepared statement with both $1 and 1811 for user_id, and here are the plans I got:

  Sort  (cost=51704688.71..51704689.50 rows=314 width=57)
    Sort Key: game_round_actions.action_time, game_round_actions.action_id
    ->  Nested Loop  (cost=0.00..51704675.69 rows=314 width=57)
          ->  Seq Scan on game_round_actions  (cost=0.00..51702078.26 rows=314 width=53)
                Filter: ((action_time >= $2) AND (action_time <= $3) AND (sub_action_id = 0) AND (user_id = $1))
          ->  Index Scan using PK_game_table on game_tables  (cost=0.00..8.26 rows=1 width=12)
                Index Cond: (game_tables.table_id = game_round_actions.table_id)
                Filter: (game_tables.game_type_id = ANY ($4))

  Sort  (cost=226660.58..226661.33 rows=300 width=57)
    Sort Key: game_round_actions.action_time, game_round_actions.action_id
    ->  Nested Loop  (cost=0.00..226648.24 rows=300 width=57)
          ->  Index Scan using i_session on game_round_actions  (cost=0.00..224166.97 rows=300 width=53)
                Index Cond: ((action_time >= $2) AND (action_time <= $3))
                Filter: (user_id = 1811)
          ->  Index Scan using PK_game_table on game_tables  (cost=0.00..8.26 rows=1 width=12)
                Index Cond: (game_tables.table_id = game_round_actions.table_id)
                Filter: (game_tables.game_type_id = ANY ($4))

Here's the table definition:

  CREATE TABLE game_round_actions
  (
    table_id bigint NOT NULL,
    round_id integer NOT NULL,
    action_id integer NOT NULL,
    seat_id integer NOT NULL,
    action_desc character varying(20) NOT NULL,
    action_area character varying(100),
    amount numeric(16,6),
    action_value character varying(100),
    action_time timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP AT TIME ZONE 'UTC',
    user_id bigint,
    sub_action_id integer NOT NULL, -- Sub action id is 0 for the root actions, > 0 for generated actions.
    CONSTRAINT PK_game_round_actions PRIMARY KEY (table_id, round_id, action_id, sub_action_id),
    CONSTRAINT fk_game_round_actions_round FOREIGN KEY (table_id, round_id)
        REFERENCES game_rounds (table_id, round_id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE CASCADE,
    CONSTRAINT fk_game_round_actions_user FOREIGN KEY (table_id, user_id)
        REFERENCES game_table_users (table_id, user_id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
  )
  WITH (
    OIDS=FALSE
  );

  CREATE INDEX i_session ON game_round_actions USING btree (action_time)
    WHERE user_id > 0 AND sub_action_id = 0;

The table contains 1 655 528 000 rows (estimated) and there are about 1 unique user_ids. The data spans about 2 years.

Shouldn't postgres realise that in both cases user_id is compared against a constant value and choose the same plan? How do I increase the stats target for the column?

--
Regards,

Nicos Panayides
IT Manager
Magneta Technologies Ltd
Tel: +357 22721919, 22317400
Fax: +357 22721917
Web: http://www.magneta.eu
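For reference, the per-column statistics target asked about here can be raised with ALTER TABLE followed by a fresh ANALYZE. A minimal sketch against the table from this thread (the target value 1000 is only an illustrative choice, not a recommendation from the thread):

  -- raise the sampling detail for user_id, then re-gather statistics
  ALTER TABLE game_round_actions ALTER COLUMN user_id SET STATISTICS 1000;
  ANALYZE game_round_actions;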
[GENERAL] Select for update with offset interferes with concurrent transactions
Hello all,

I am in the process of migrating a system from Postgresql 8.3 to 9.0, and have run into a problem with the task queue system I am using.

The task queue controls the allocation of tasks between about 1000 processes working in parallel, and is essentially a table of

  record_id (unique)
  project_id
  task_description_id
  state (idle, started, finished)

Each project currently has about 2 million entries. My plan is to increase that significantly over the next few months.

To avoid having the processes trample each other's queries (the first attempt was to select the first matching entries of the table, which caused one process to block all other transactions), one of the steps I took was to select a set of idle rows at a random offset into the table from the project, mark them for update, then update each record's state as started:

  SELECT record_id FROM queue
  WHERE project_id = my_project AND state = idle
  LIMIT n OFFSET i FOR UPDATE

At present n is 100-150, i is a random value in the range 0-1. There is, intentionally, no ordering specified, since that would just slow down the query, and is not necessary. For reference, the above query is sent through Django's cursor.execute() call in a manual transaction block.

What I've discovered when using Postgres 9.0 is that the processes are now blocking every other query into this table, apparently reducing the task processing speed by at least a factor of 10, and increasing the load on the server by a similar factor, compared to when Postgres 8.3 was used. The problem is apparent just after starting, with only 50-100 processes active (startup is staggered). Reducing n (and looping), or increasing the i range, did not work.

The reason seems to be this new part of http://www.postgresql.org/docs/9.0/static/sql-select.html (towards the end of the FOR UPDATE section):

  If a LIMIT is used, locking stops once enough rows have been returned to
  satisfy the limit (but note that rows skipped over by OFFSET will get
  locked). Similarly, if FOR UPDATE or FOR SHARE is used in a cursor's
  query, only rows actually fetched or stepped past by the cursor will be
  locked.

I can't find similar text in the 8.3 or 8.4 documentation.

AFAICT, and assuming I have not misunderstood this part of the documentation, this means that if one of my processing nodes selects a block of 100 entries at offset 8000 in the resulting table, then every other node will be blocked while the block is being processed: not just the nodes that would have selected the rows in the range 0 to 7999, but also those at offsets >= 8100, because they cannot gain access to the rows. Also, using FOR SHARE does not seem to solve the problem.

IMO, as a database non-expert, locking rows that were not returned as a result of the query is a bug. As an example, if a query selects the X last items in the matching rows, that is equivalent to locking the table, or the relevant part of it, even if the requester has no intention to modify those other rows.

Is there any way to avoid this problem? Or do I have to add a random batch_id field to the queue table in order to separate the processes' queries so that they do not block each other (as frequently)? Is it possible to disable the source code causing this (that is, reverting the patch that introduced the problem, or changing a configuration switch)?

--
Sincerely,
Yngve N. Pettersen
Senior Developer                  Email: yn...@opera.com
Opera Software ASA                http://www.opera.com/
Phone: +47 23 69 32 60            Fax: +47 23 69 24 01
Re: [GENERAL] cast problem in Postgresql 9.0.1
On Monday, January 31, 2011 10:14:29 pm AI Rumman wrote:
> I migrated my DB from Postgresql 8.1 to Postgresql 9.0.1. I have a table
> testtab:
>
>   \d testtab
>     id int,
>     hours varchar
>
> When I execute the following:
>   select sum(hours) from testtab
> I get a cast error.

Try:
  select sum(hours::int) from testtab;

> Then, I created the following IMPLICIT CAST functions in my DB:
>
>   CREATE FUNCTION pg_catalog.integer(varchar) RETURNS int4 STRICT IMMUTABLE
>     LANGUAGE SQL AS 'SELECT int4in(varcharout($1));';
>   CREATE CAST (varchar AS integer)
>     WITH FUNCTION pg_catalog.integer(varchar) AS IMPLICIT;
>
>   CREATE FUNCTION pg_catalog.smallint(varchar) RETURNS smallint STRICT IMMUTABLE
>     LANGUAGE SQL AS 'SELECT int2in(varcharout($1));';
>   CREATE CAST (varchar AS smallint)
>     WITH FUNCTION pg_catalog.smallint(varchar) AS IMPLICIT;
>
> Now, the above query works, but SELECT COALESCE(hours,0) from testtab
> failed. Any idea why?

--
Adrian Klaver
adrian.kla...@gmail.com
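For comparison, the conventional fix avoids the custom implicit casts entirely and casts at the point of use; a hedged sketch, assuming hours always holds integer-formatted text:

  -- explicit casts keep type resolution unambiguous
  SELECT sum(hours::int) FROM testtab;
  SELECT COALESCE(hours::int, 0) FROM testtab;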
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
On 2/1/2011 6:32 AM, Yngve Nysaeter Pettersen wrote:
> Hello all,
>
> I am in the process of migrating a system from Postgresql 8.3 to 9.0, and
> have run into a problem with the task queue system I am using.
<snip>
> To avoid having the processes trample each other's queries, one of the
> steps I took was to select a set of idle rows at a random offset into the
> table from the project, mark them for update, then update each record's
> state as started:
>
>   SELECT record_id FROM queue
>   WHERE project_id = my_project AND state = idle
>   LIMIT n OFFSET i FOR UPDATE
<snip>
> Is there any way to avoid this problem? Or do I have to add a random
> batch_id field to the queue table in order to separate the processes'
> queries so that they do not block each other (as frequently)?

So, if I understand correctly, you:

  q = SELECT record_id FROM queue
      WHERE project_id = my_project AND state = idle
      LIMIT n OFFSET i FOR UPDATE
  while not q.eof
    update queue set state = started where record_id = x;
    process record_id
    update queue set state = finished where record_id = x;
    q.next;

Might I suggest an alternative:

  q = update queue set state = started
      WHERE project_id = my_project AND state = idle
      LIMIT n OFFSET i
      RETURNING record_id;
  idlist = @q;
  commit;
  foreach x in idlist
    process record_id
    begin
    update queue set state = finished where record_id = x;
    commit;

Forgive the part perl part python pseudocode. Oh, and I've never done this, no idea if it actually works. :-)

-Andy
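Since PostgreSQL's UPDATE has no LIMIT/OFFSET clause, a runnable version of this idea would have to route the row selection through a subquery. A hedged sketch, with the batch size 100 as an arbitrary stand-in for n, and state = 'idle' repeated in the outer WHERE so that rows another session claims in the meantime are skipped on the re-check rather than double-claimed:

  UPDATE queue
     SET state = 'started'
   WHERE state = 'idle'
     AND record_id IN (SELECT record_id
                         FROM queue
                        WHERE project_id = my_project
                          AND state = 'idle'
                        LIMIT 100)
  RETURNING record_id;

Under contention this can return fewer rows than requested, so callers should be prepared to loop.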
[GENERAL] Cluster table and order information
Hello,

From the documentation of CLUSTER table (http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I understand that clustering can be achieved by re-creating the table like this:

  CREATE TABLE newtable AS SELECT * FROM table ORDER BY columnlist;

My question is: if I upload with COPY a datafile which is already correctly sorted, can I inform postgres of such order, so that no clustering is necessary after the import? In other words, how can I tell postgres that my file is ordered by this and that column?

Many thanks!

Dario

--
Dr. Dario Beraldi
Institute of Evolutionary Biology
University of Edinburgh
West Mains Road
Edinburgh EH9 3JT
Scotland, UK

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
[GENERAL] Using the copy protocol, is there a way to be notified of a potential error before sending?
Is the copy protocol (aka PQputCopyData and PQputCopyEnd) designed to send data for many minutes, and is there a way to be notified of a potential error (for example an invalid value in one column) before sending PQputCopyEnd? Or do I have to send my data in small chunks (for example a batch of 1 rows), issue a PQputCopyEnd, check for errors, and continue with the next chunk?

I tried the command \copy ... from stdin in psql and it looks like psql has to read the entire input before returning an error, even if the invalid value is in one of the first sent rows.

Thanks for your help and advice.

Regards,

Nicolas Grilly
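One way to keep a long transfer in a single transaction while still localizing failures is to split the stream into per-chunk COPY statements, each guarded by a savepoint, so a bad value aborts only its own chunk. A hedged sketch of the idea (the table name and chunk boundaries are illustrative, and each COPY still only reports an error once that chunk's data has been sent):

  BEGIN;
  SAVEPOINT chunk_1;
  COPY items FROM STDIN;   -- send the first batch of rows, then end this COPY
  -- on failure: ROLLBACK TO SAVEPOINT chunk_1, log the bad chunk, continue
  SAVEPOINT chunk_2;
  COPY items FROM STDIN;   -- next batch
  COMMIT;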
[GENERAL] Book recommendation?
As a result of my recent encounter with table bloat and other tuning issues I've been running into, I'm looking for a good resource for improving my tuning skills. My sysadmin ran into the following book:

  PostgreSQL 9.0 High Performance, by Gregory Smith, ISBN 184951030X
  http://amzn.com/184951030X

which covers versions 8.1 through 9. Any opinions on this book? Other suggestions?

Thank you,
Herouth
Re: [GENERAL] Cluster table and order information
On 2/1/2011 9:08 AM, Dario Beraldi wrote:
> From the documentation of CLUSTER table
> (http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I understand
> that clustering can be achieved by re-creating the table like this:
>
>   CREATE TABLE newtable AS SELECT * FROM table ORDER BY columnlist;
>
> My question is: if I upload with COPY a datafile which is already
> correctly sorted, can I inform postgres of such order, so that no
> clustering is necessary after the import?

The planner has no knowledge of cluster. Meaning PG will query a clustered and unclustered table exactly the same way. A table is not marked or anything as clustered. And in fact, during usage of a table it'll become unclustered.

Clustering is only useful when you are going to read multiple records in the same order as an index. It turns more random seeks into more sequential reads. If your COPY loads data in indexed order, then just don't run the cluster.

-Andy
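If the point is merely to record which index the data is ordered by, without rewriting rows that were loaded pre-sorted, that mark can be set directly; a hedged sketch with illustrative names:

  CREATE INDEX mytable_sort_idx ON mytable (col1, col2);
  ALTER TABLE mytable CLUSTER ON mytable_sort_idx; -- records the index; no table rewrite
  ANALYZE mytable; -- refreshes the correlation statistics the planner actually uses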
Re: [GENERAL] Book recommendation?
> As a result of my recent encounter with table bloat and other tuning
> issues I've been running into, I'm looking for a good resource for
> improving my tuning skills. My sysadmin ran into the following book:
>
>   PostgreSQL 9.0 High Performance, by Gregory Smith, ISBN 184951030X
>   http://amzn.com/184951030X
>
> Which covers versions 8.1 through 9. Any opinions on this book? Other
> suggestions?

100% positive - go and buy it. It's very thorough, it covers a lot of associated topics (hw, filesystem, ...) and it's fresh. There were several reviews at planet.postgresql.org and all of them were very positive.

regards
Tomas
Re: [GENERAL] Book recommendation?
On 02/01/11 08:55, Herouth Maoz wrote:
> As a result of my recent encounter with table bloat and other tuning
> issues I've been running into, I'm looking for a good resource for
> improving my tuning skills. My sysadmin ran into the following book:
>
>   PostgreSQL 9.0 High Performance, by Gregory Smith, ISBN 184951030X
>   http://amzn.com/184951030X
>
> Which covers versions 8.1 through 9. Any opinions on this book? Other
> suggestions?

I bought this when it first came out. I bought the pdf version directly from Packt, bundled with the PostgreSQL 9 Admin Cookbook, for an amazingly low price.

I doubt you'll find a better book resource for recent versions of PostgreSQL. Both are well and clearly written, and cover a lot of ground in great detail. Greg Smith has helped me and countless others on this and the Performance list and he is truly an authoritative resource.

Jeff
Re: [GENERAL] PD_ALL_VISIBLE flag was incorrectly set in relation pg_statistic
On Mon, Jan 31, 2011 at 2:55 PM, bricklen <brick...@gmail.com> wrote:
> We just had a slew of the following messages in our log. How concerned
> should I be at this point? I have no idea what triggered it.
>
> Version: PostgreSQL 9.0.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc
> (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
>
> 2011-01-31 14:16:13 PST [32767]: [1-1] (user=) (rhost=) WARNING:
> PD_ALL_VISIBLE flag was incorrectly set in relation pg_statistic page 22
> ...

Are there any other details I can provide to help diagnose this issue? It is a production server, so I'd like to rule out the likelihood of corruption etc. if possible. There were about 3400 entries related to that message yesterday, referencing two tables (pg_statistic and a user table). The messages haven't reoccurred, and I've run a couple of cluster-wide vacuum analyzes to see if I could shake anything else out.

Cheers,

Bricklen
[GENERAL] Problem with encode () and hmac() in pgcrypto
Hi,

I am trying to compute a hash string with 172 bytes. I have an OpenSSL RSA private key stored in the postgresql db and a string with some text. I want to sign that string with the private key and the SHA1 algorithm and encode the result in the base64 format. I am writing it inside the pg database in a function that performs other stuff.

String to sign:

  v_em_crt_conc = '2011-01-31;2011-02-01T13:33:38;100036;684.40; '

Private RSA key (this key was generated in OpenSSL with the command "~$ openssl genrsa -out abc.pem 1024" and then the content copied to v_Private):

  v_Private = 'MIICXAIBAAKBgQCfaFpbOjsz0fKygWc9zdvu1XjOSJEZJ9XwlAyayxt2A57OUjxJ
  GJZBwZDLWx+rYga2B04v5MigY9q/TfrSzbQZpmPf6hC/U36y7pgPce1ijosl7/on
  4UejIlLAcqJgkoJVrkaRp0NuX5hcmFiR2z8b+ypHtF3t/JOHkz+gUAmc7wIDAQAB
  AoGBAIuNcbz356hgHpaDjvyeYfJe1FnTUaOyKitpkPda5HmWhcqeV8SuT6zVZouB
  BOKm+LUXBC4Nnk473N8px3IRP57rCaLwFdQCrfVMieAkdVPoLIryofo81jF4bbOo
  yUJ1E901lCbSW8bnPhrWz1zFVWBUHoik2aWIiETs8v7HD8RBAkEA0YKX0SngYo6J
  5BX8en9kG0cpE/oXrTOP5cTXuyOK/bG2pikVXfozSIZx3rbszAtAjSH72Mi/djLo
  WsbEKL0KEQJBAMLHpLOssg06Hyj6XYGi8l6Yhxy2Vsv6qtwbxZ3soysuZrcmxgNm
  ek1wVyDsuTKRPQYcFgyi2fhaucXPKgzNNv8CQAIqDDFv8k5yVIBTrdECIeGaQZVg
  PwBeCP2BpAzd2CC4xj8/K2ZWIFN+eAyp7RZKg3cxw61AnaG18uqFJR+anFECQB8b
  yadUpp9MMBvYWMTSpIkNLU8dpCEx6MX8vYQqfijwYUFRNQDoBhKGJbRBepaj8a4f
  IDpmbA+pVdBM9PdUczsCQGrwaHYQw0NhUz2WIGvog8HUbkyAPLiWLpwWvwpjoAFs
  LcoEyU/UxuNTwGSL3o5aYCo1RFCZH0sBnhd4S0DH0JI='

The FUNCTION:

  encode(hmac(v_em_crt_conc, v_Private, 'sha1'), 'base64');

The result:

  h6CpmrP1QCE/Mp3xn3utUEPtftg=

This hash has 28 chars.

When I use OpenSSL on the command line like this:

  ~$ echo 2011-01-31;2011-02-01T13:33:38;100036;684.40; | openssl dgst -sha1 -sign abc.pem | openssl enc -base64 -A

the result is:

  nKfxnt31+kk/RnKihJ0jKufq+nZvmPjVauGo8+tqJ1Y/ah/mAu4jSS1wnzU+wRygZ4CLIV9DGSs9bxBc4r9e71C8s9B5ms6Kpggmc12kdmqVHBRO28bPWb/YLCej59gZFFkvcCudweNAT4qHvVqWsOtFCf9kE4q92UIv1JcwSDU=

This hash has 172 chars.

Does someone know where my problem is? Is there another way to implement this, or is it simply impossible?

Thanks in advance,
Luis
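A note on why the two outputs cannot match: hmac(..., 'sha1') computes a keyed SHA-1 digest, which is always 20 bytes (28 characters in base64), while openssl dgst -sha1 -sign produces an RSA signature as long as the key modulus, 128 bytes for a 1024-bit key (172 characters in base64). A quick hedged check of the digest length, with a placeholder message and key:

  SELECT length(hmac('some message', 'some-key', 'sha1'));                   -- 20 (bytes)
  SELECT length(encode(hmac('some message', 'some-key', 'sha1'), 'base64')); -- 28 (chars)

pgcrypto does not expose an RSA signing function, so producing this particular signature inside the database would need something beyond hmac(), such as a procedural-language function calling out to OpenSSL.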
Re: [GENERAL] Cluster table and order information
On 2/1/2011 10:17 AM, Dario Beraldi wrote:
> Quoting Andy Colson <a...@squeakycode.net>:
<snip>
> Thanks very much Andy, this clarifies my doubts. I was misled by the docs
> saying "When a table is clustered, PostgreSQL remembers which index it was
> clustered by", which made me think that the order information is stored
> somewhere.

The next sentence clears it up: "The form CLUSTER table_name reclusters the table using the same index as before."

-Andy
Re: [GENERAL] Weird performance issue with custom function with a for loop.
Nicos Panayides <ni...@magneta.com.cy> writes:
> On 01/31/2011 08:48 PM, Tom Lane wrote:
>> It seems likely that you're getting a different plan for the generic case
>> because that user id isn't representative of the overall average for the
>> column.

> I tried the prepared statement with both $1 and 1811 for user_id and
> here's the plans I got:

[ bad ]
>   ->  Seq Scan on game_round_actions  (cost=0.00..51702078.26 rows=314 width=53)
>         Filter: ((action_time >= $2) AND (action_time <= $3) AND (sub_action_id = 0) AND (user_id = $1))

[ good ]
>   ->  Index Scan using i_session on game_round_actions  (cost=0.00..224166.97 rows=300 width=53)
>         Index Cond: ((action_time >= $2) AND (action_time <= $3))
>         Filter: (user_id = 1811)

So the question is why it won't use that index in the parameterized case ...

> CREATE INDEX i_session ON game_round_actions USING btree (action_time)
>   WHERE user_id > 0 AND sub_action_id = 0;

... and the answer is that it can't prove user_id > 0 when it doesn't know the value of the parameter equated to user_id, so it cannot build a plan that relies on using that partial index. (IOW, if it did use the index, it would get the wrong answer if $1 happened to be zero.)

I don't know the reason you had for making the index partial in the first place, but maybe you should reconsider that. Another possibility is to explicitly include user_id > 0 in the query conditions, if you're certain that the passed-in value is never zero.

regards, tom lane
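Concretely, the second possibility just means adding a qual that provably implies the partial index's predicate; a hedged sketch of the query shape inside the function, with v_start and v_end as placeholder plpgsql variables:

  SELECT ...
    FROM game_round_actions
   WHERE user_id = orig_user_id
     AND user_id > 0              -- lets the planner match the partial index
     AND sub_action_id = 0
     AND action_time BETWEEN v_start AND v_end;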
Re: [GENERAL] Cluster table and order information
Quoting Andy Colson <a...@squeakycode.net>:
> On 2/1/2011 9:08 AM, Dario Beraldi wrote:
<snip>
> The planner has no knowledge of cluster. Meaning PG will query a clustered
> and unclustered table exactly the same way. A table is not marked or
> anything as clustered. And in fact, during usage of a table it'll become
> unclustered.
>
> Clustering is only useful when you are going to read multiple records in
> the same order as an index. It turns more random seeks into more
> sequential reads. If your COPY loads data in indexed order, then just
> don't run the cluster.

Thanks very much Andy, this clarifies my doubts. I was misled by the docs saying "When a table is clustered, PostgreSQL remembers which index it was clustered by", which made me think that the order information is stored somewhere.

All the best
Dario

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
Hi,

Thanks for the quick answer, Andy.

On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <a...@squeakycode.net> wrote:
<snip>
> So, if I understand correctly, you:
>
>   q = SELECT record_id FROM queue
>       WHERE project_id = my_project AND state = idle
>       LIMIT n OFFSET i FOR UPDATE
>   while not q.eof
>     update queue set state = started where record_id = x;
>     process record_id
>     update queue set state = finished where record_id = x;
>     q.next;

Almost: the update to "started" is done for all selected elements first, releasing the lock, then the items are processed one at a time, marking each finished as they complete. (Each processing step can take minutes, so keeping a lock the whole time is not an option.)

> Might I suggest an alternative:
>
>   q = update queue set state = started
>       WHERE project_id = my_project AND state = idle
>       LIMIT n OFFSET i
>       RETURNING record_id;
>   idlist = @q;
>   commit;
>   foreach x in idlist
>     process record_id
>     begin
>     update queue set state = finished where record_id = x;
>     commit;
>
> Forgive the part perl part python pseudocode. Oh, and I've never done
> this, no idea if it actually works. :-)

Thanks for that suggestion, I'll take a look at it. While I hadn't caught on to the RETURNING part, I had been wondering if a single-step UPDATE might be a solution.

One concern I have is how concurrent updates will affect the returned list (or if they will just be skipped, as SELECT would in normal transaction mode, if I understood correctly), or whether it might return with an error code (I know that the normal update return value is the number of updated items, just not sure if that applies for RETURNING).

Although, I will note that this process (if it works) will, sort of, make FOR UPDATE redundant. Or, if it doesn't, the current lock policy might cause issues for concurrent updates in the use cases where FOR UPDATE is relevant.

--
Sincerely,
Yngve N. Pettersen
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
On 2/1/2011 10:59 AM, Yngve Nysaeter Pettersen wrote:
<snip>
> Thanks for that suggestion, I'll take a look at it. While I hadn't caught
> on to the RETURNING part, I had been wondering if a single-step UPDATE
> might be a solution.
>
> One concern I have is how concurrent updates will affect the returned
> list (or if they will just be skipped, as SELECT would in normal
> transaction mode, if I understood correctly), or whether it might return
> with an error code.
>
> Although, I will note that this process (if it works) will, sort of, make
> FOR UPDATE redundant. Or, if it doesn't, the current lock policy might
> cause issues for concurrent updates in the use cases where FOR UPDATE is
> relevant.

Yeah, I'd wondered the same thing. It could be two updates hitting the same row will deadlock, or maybe not, I'm not sure. But I think it's the same as with the select: if you happen to have two limits that hit the same range, you're in trouble.

I think the random limit thing is a race condition itself. Whenever you have multiple processes hitting the same rows you're going to run into problems. Have you thought of using a sequence instead of a random limit? Each process could get the next 100 record_ids via a sequence, then there would be much less chance of deadlock.

-Andy
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
Yngve Nysaeter Pettersen <yn...@opera.com> writes:
> To avoid having the processes trample each other's queries (the first
> attempt was to select the first matching entries of the table, which
> caused one to block all other transactions), one of the steps I took was
> to select a set of idle rows at a random offset into the table from the
> project, mark them for update, then update each record's state as started:
>
>   SELECT record_id FROM queue
>   WHERE project_id = my_project AND state = idle
>   LIMIT n OFFSET i FOR UPDATE
>
> At present n is 100-150, i is a random value in the range 0-1. There is,
> intentionally, no ordering specified, since that would just slow down the
> query, and is not necessary.

This seems like a pretty bad design. There are recognized ways to solve this problem with more predictability and much less chance of different processes blocking each other. In particular, this query seems to be based on some untenable assumptions about the physical row order being stable.

> What I've discovered when using Postgres 9.0 is that the processes are
> now blocking every other query into this table,

In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which means that rows skipped over by OFFSET still get locked, which means that different sessions executing this query are now practically certain to block each other, rather than just likely to block each other. This was an intentional change to improve the predictability of FOR UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the predictability of the behavior for you, just not in the direction you'd like :-(

regards, tom lane
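One pattern often recommended for this kind of queue (and available on 9.0) uses advisory locks, so a claimer simply skips rows some other session already holds instead of blocking on them; a hedged sketch against the queue table from this thread, with the batch size and unlock step left schematic:

  SELECT record_id
    FROM queue
   WHERE project_id = my_project
     AND state = 'idle'
     AND pg_try_advisory_lock(record_id)
   LIMIT 100;
  -- mark the returned rows 'started', commit, then release each lock
  -- with pg_advisory_unlock(record_id)

One caveat: the locking function can be evaluated for rows the query ends up not returning, so stray locks are possible; calling pg_advisory_unlock_all() when a worker finishes its batch cleans those up.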
Re: [GENERAL] Book recommendation?
On Tue, Feb 1, 2011 at 11:14 AM, Jeff Ross <jr...@wykids.org> wrote:
> I doubt you'll find a better book resource for recent versions of
> PostgreSQL. Both are well and clearly written, and cover a lot of ground
> in great detail. Greg Smith has helped me and countless others on this and
> the Performance list and he is truly an authoritative resource.

I've hired Greg for performance tuning work, and I must say he *really* knows what he's doing. I've read some chapters in the book and he seems to get across his knowledge very well into those pages.
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
On Tue, 01 Feb 2011 18:18:17 +0100, Tom Lane <t...@sss.pgh.pa.us> wrote:
> This seems like a pretty bad design.

Well, I don't claim to be a database expert ;). While there might be better ways, the current one has worked OK in the year since it was implemented.

> There are recognized ways to solve this problem with more predictability
> and much less chance of different processes blocking each other.

I'd appreciate it if you could provide a couple of pointers.

> In particular, this query seems to be based on some untenable assumptions
> about the physical row order being stable.

No, it does not assume that the row order is stable; I don't really care about the order of the elements, since the actual order of task execution depends much more significantly on other variables, and the actual order isn't important at all (although further design changes might impose some limited category grouping on the queue, that would still not make the ordering important within the group).

> In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which
> means that rows skipped over by OFFSET still get locked, which means that
> different sessions executing this query are now practically certain to
> block each other, rather than just likely to block each other. This was
> an intentional change to improve the predictability of FOR UPDATE's
> interactions with LIMIT/OFFSET, and indeed it's improved the
> predictability of the behavior for you, just not in the direction you'd
> like :-(

That might be, but is it necessary to continue locking (which is what it sounds like to me) the elements that are not used in the final response past completing the query? What happens now, if I understand it correctly, is that if a

  select foo from bar order by whatever limit 1 offset tablelen-1 for update

is performed, the effective operation is also "LOCK bar", not just a row lock on item tablelen-1 in that table. Was that the intention? (And yes, I am aware that ordering might be used to reverse that sequence so offset 0 can be used, but wouldn't that just as much block the query for offset 1?)

--
Sincerely,
Yngve N. Pettersen
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <a...@squeakycode.net> wrote:
<snip>
> I think the random limit thing is a race condition itself. Whenever you
> have multiple processes hitting the same rows you're going to run into
> problems. Have you thought of using a sequence instead of a random limit?
> Each process could get the next 100 record_ids via a sequence, then there
> would be much less chance of deadlock.

How would that work, in case you would like to provide an example? I am not really familiar with sequences, as I have only seen them used for the id field in Django generated tables.

In case it is relevant, the processes do not (currently, at least) have a unique ID, though they have a local sequence number for the machine they are running on.

--
Sincerely,
Yngve N. Pettersen
Re: [GENERAL] Book recommendation?
hero...@unicell.co.il (Herouth Maoz) writes:
> As a result of my recent encounter with table bloat and other tuning
> issues I've been running into, I'm looking for a good resource for
> improving my tuning skills. My sysadmin ran into the following book:
>
>   PostgreSQL 9.0 High Performance, by Gregory Smith, ISBN 184951030X
>   http://amzn.com/184951030X
>
> Which covers versions 8.1 through 9. Any opinions on this book? Other
> suggestions?

It's the best thing in that vein that is available, and is rather good.

The only thing that's *somewhat* comparable is _PostgreSQL_ by Korry Douglas and Susan Douglas, which, alas, is now 7 years old, and thus rather dated. It was the one reference that had substantive material on query planning, though that's likely somewhat less relevant to you.

You should certainly take a look at Greg Smith's book.

--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://linuxdatabases.info/info/finances.html
"what would we do without C? we would have PASAL, BASI, OBOL, and Ommon Lisp." -- #Erik
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
On 2/1/2011 12:10 PM, Yngve Nysaeter Pettersen wrote:
> On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <a...@squeakycode.net> wrote:
>> I think the random limit thing is a race condition itself. Whenever you
>> have multiple processes hitting the same rows you're going to run into
>> problems. Have you thought of using a sequence instead of a random limit?
>> Each process could get the next 100 record_ids via a sequence, then there
>> would be much less chance of deadlock.
>
> How would that work, in case you would like to provide an example? I am
> not really familiar with sequences, as I have only seen them used for the
> id field in Django generated tables.

I have a really simple q table I use:

  create table q (id integer not null, msg integer, primary key(id));
  create sequence q_add;
  create sequence q_read;

I insert via q_add:

  andy=# insert into q(id, msg) values(nextval('q_add'), 20);
  INSERT 0 1
  andy=# insert into q(id, msg) values(nextval('q_add'), 4);
  INSERT 0 1
  andy=# select * from q;
   id | msg
  ----+-----
    1 |  20
    2 |   4
  (2 rows)

Then I run multiple batch procs which get their next job like:

  andy=# select msg from q where id = (select nextval('q_read'));
   msg
  -----
    20
  (1 row)

  andy=# select msg from q where id = (select nextval('q_read'));
   msg
  -----
     4
  (1 row)

It works for me because I can empty the q table, reset the q_add and q_read sequences and start over clean. Not sure if it would work for your setup.

-Andy
Re: [GENERAL] Weird performance issue with custom function with a for loop.
On 02/01/2011 06:52 PM, Tom Lane wrote:
> Nicos Panayides <ni...@magneta.com.cy> writes:
<snip>
> ... and the answer is that it can't prove user_id > 0 when it doesn't
> know the value of the parameter equated to user_id, so it cannot build a
> plan that relies on using that partial index. (IOW, if it did use the
> index, it would get the wrong answer if $1 happened to be zero.)
>
> I don't know the reason you had for making the index partial in the first
> place, but maybe you should reconsider that. Another possibility is to
> explicitly include user_id > 0 in the query conditions, if you're certain
> that the passed-in value is never zero.

I added an additional user_id > 0 and performance is normal. I didn't know that the database does not use the value of the parameter to choose a plan. It makes sense now that you have explained it.

The reason we use a partial index is that the table is huge and we are really interested only in those rows. Actually the condition should have used user_id IS NOT NULL instead of > 0. Unfortunately the index takes quite a few hours to build, so we implemented the workaround you suggested for now. If the condition were user_id IS NOT NULL AND sub_action_id = 0, would postgres always choose the index, since user_id = NULL will always be false?

Thank you very much for the help!

--
Regards,

Nicos Panayides
IT Manager
Magneta Technologies Ltd
Tel: +357 22721919, 22317400
Fax: +357 22721917
Web: http://www.magneta.eu
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
Thanks Andy,

On Tue, 01 Feb 2011 19:29:08 +0100, Andy Colson <a...@squeakycode.net> wrote:
<snip>
> It works for me because I can empty the q table, reset the q_add and
> q_read sequences and start over clean. Not sure if it would work for your
> setup.

I see how that would work (it is essentially how Django assigns row ids).

My current setup can have multiple runs configured at a time (and have had several dozen queued, in one case), with varying priorities on each run, and they might, at least theoretically, be configured in parallel (even the individual runs are set up in parallel), meaning the ids would not be sequential (a sequence is used for the id field in each row of the table), unless they could somehow be allocated for each individual run/project (multiple sequence objects, one for each run, might be an option, but I don't like that possibility). And as I mentioned elsewhere in the thread, I might make the queuing a bit more complex, which might make this system even more complicated.

So, AFAICT, I am afraid it would not work in the general case for my project :( . However, it might be useful in somebody else's project :) .

--
Sincerely,
Yngve N. Pettersen
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
On 2/1/2011 12:51 PM, Yngve Nysaeter Pettersen wrote:
<snip>
> So, AFAICT, I am afraid it would not work in the general case for my
> project :( . However, it might be useful in somebody else's project :) .

No, I didn't think it would work for you; yours looks much more complicated than mine.

Just out of curiosity, have you looked at PgQ?

-Andy
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
Hmm... May I ask how this looks in detail? If e.g. I do

  select * from myeshop offset 100 limit 20

and I have 1000 rows, which rows will be locked?

a) 0 to 120, or
b) all rows?

Kind regards,
Radek

On Tuesday 01 February 2011 18:18:17, Tom Lane <t...@sss.pgh.pa.us> wrote:
> In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which
> means that rows skipped over by OFFSET still get locked, which means that
> different sessions executing this query are now practically certain to
> block each other, rather than just likely to block each other. This was
> an intentional change to improve the predictability of FOR UPDATE's
> interactions with LIMIT/OFFSET, and indeed it's improved the
> predictability of the behavior for you, just not in the direction you'd
> like :-(
>
> regards, tom lane
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
On Tue, 01 Feb 2011 20:04:31 +0100, Andy Colson <a...@squeakycode.net> wrote:
> No, I didn't think it would work for you; yours looks much more
> complicated than mine.
>
> Just out of curiosity, have you looked at PgQ?

I did look around for some queuing systems a year ago. I am not sure if that one crossed my path, but I didn't find any that I thought would work for me, which might just be due to the fact that I had just started with database programming (which was also the reason I chose a framework like Django for most of it; the FOR UPDATE SQL is one of less than 10 locations where I use raw SQL in my system, because Django could not provide the functionality) and I just did not realize that it could help me.

Regarding PgQ, based on a quick skimming I am not sure how it would fit in my case. This may be because the tutorial leaves (IMO) a bit too much up in the air regarding how the system it is working in is organized, at least for a relative beginner such as myself, and also does not show how a similar alternative system would look. A small complete example showing all the tables involved, the client(s), the server(s), and the operations performed might have helped.

--
Sincerely,
Yngve N. Pettersen
Re: [GENERAL] Book recommendation?
On Tue, Feb 1, 2011 at 8:56 AM, Herouth Maoz <hero...@unicell.co.il> wrote:
> My sysadmin ran into the following book:
>
>   PostgreSQL 9.0 High Performance, by Gregory Smith, ISBN 184951030X
>   http://amzn.com/184951030X

Highly recommended. Also take a look at the PG admin cookbook from Packt.
[GENERAL] yum repo problem
I have an older CentOS 4.8 server that is running 8.4.2 and wanted to update it. It was installed via the older yum repo, pre the reorgs...

  # rpm -qa | grep postgres
  postgresql-server-8.4.2-1PGDG.rhel4
  postgresql-jdbc-8.3.605-1PGDG.rhel4
  postgresql-libs-8.4.2-1PGDG.rhel4
  postgresql-8.4.2-1PGDG.rhel4
  compat-postgresql-libs-4-1PGDG.rhel4
  postgresql-contrib-8.4.2-1PGDG.rhel4

I fetched the updated yum.pgrpms.org repo file, but when I try to update postgresql-server, I'm getting...

  # rpm -ivh http://yum.pgrpms.org/reporpms/8.4/pgdg-centos-8.4-2.noarch.rpm
  # yum update postgresql-server
  Setting up Update Process
  Setting up repositories
  Reading repository metadata in from local files
  Resolving Dependencies
  --> Populating transaction set with selected packages. Please wait.
  ---> Package postgresql-server.i386 0:8.4.4-2PGDG.el4 set to be updated
  --> Running transaction check
  --> Processing Dependency: postgresql = 8.4.4-2PGDG.el4 for package: postgresql-server
  --> Restarting Dependency Resolution with new changes.
  --> Populating transaction set with selected packages. Please wait.
  ---> Package postgresql.i386 0:8.4.4-2PGDG.el4 set to be updated
  --> Running transaction check
  --> Processing Dependency: postgresql-libs = 8.4.4-2PGDG.el4 for package: postgresql
  --> Processing Dependency: postgresql = 8.4.2 for package: postgresql-contrib
  --> Restarting Dependency Resolution with new changes.
  --> Populating transaction set with selected packages. Please wait.
  ---> Package postgresql-libs.i386 0:8.4.4-2PGDG.el4 set to be updated
  ---> Package postgresql-contrib.i386 0:8.4.4-2PGDG.el4 set to be updated
  --> Running transaction check

  Dependencies Resolved

  =============================================================================
   Package              Arch   Version            Repository   Size
  =============================================================================
  Updating:
   postgresql-server    i386   8.4.4-2PGDG.el4    pgdg84       4.5 M
  Updating for dependencies:
   postgresql           i386   8.4.4-2PGDG.el4    pgdg84       1.3 M
   postgresql-contrib   i386   8.4.4-2PGDG.el4    pgdg84       376 k
   postgresql-libs      i386   8.4.4-2PGDG.el4    pgdg84       179 k

  Transaction Summary
  =============================================================================
  Install    0 Package(s)
  Update     4 Package(s)
  Remove     0 Package(s)

  Total download size: 6.4 M
  Is this ok [y/N]: y
  Downloading Packages:
  warning: rpmts_HdrFromFdno: V3 DSA signature: NOKEY, key ID e8e345b8
  Public key for postgresql-8.4.4-2PGDG.el4.i386.rpm is not installed
  Retrieving GPG key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-PGDG
  GPG key at file:///etc/pki/rpm-gpg/RPM-GPG-KEY-PGDG (0x442DF0F8) is already installed
  The GPG keys listed for the "PostgreSQL 8.4 4 - i386" repository are
  already installed but they are not correct for this package.
  Check that the correct key URLs are configured for this repository.

  # more /etc/pki/rpm-gpg/RPM-GPG-KEY-PGDG
  -----BEGIN PGP PUBLIC KEY BLOCK-----
  Version: GnuPG v1.4.7 (GNU/Linux)

  mQGiBEeD8koRBACC1VBRsUwGr9gxFFRho9kZpdRUjBJoPhkeOTvp9LzkdAQMFngr
  BFi6N0ov1kCX7LLwBmDG+JPR7N+XcH9YR1coSHpLVg+JNy2kFDd4zAyWxJafjZ3a
  9zFg9Yx+0va1BJ2t4zVcmKS4aOfbgQ5KwIOWUujalQW5Y+Fw39Gn86qjbwCg5dIo
  tkM0l19h2sx50D027pV5aPsD/2c9pfcFTbMhB0CcKS836GH1qY+NCAdUwPs646ee
  Ex/k9Uy4qMwhl3HuCGGGa+N6Plyon7V0TzZuRGp/1742dE8IO+I/KLy2L1d1Fxrn
  XOTBZd8qe6nBwh12OMcKrsPBVBxn+iSkaG3ULsgOtx+HHLfa1/p22L5+GzGdxizr
  peBuA/90cCp+lYcEwdYaRoFVR501yDOTmmzBc1DrsyWP79QMEGzMqa393G0VnqXt
  L4pGmunq66Agw2EhPcIt3pDYiCmEt/obdVtSJH6BtmSDB/zYhbE8u3vLP3jfFDa9
  KXxgtYj0NvuUVoRmxSKm8jtfmj1L7zoKNz3jl+Ba3L0WxIv4+bRBUG9zdGdyZVNR
  TCBSUE0gQnVpbGRpbmcgUHJvamVjdCA8cGdzcWxycG1zLWhhY2tlcnNAcGdmb3Vu
  ZHJ5Lm9yZz6IYAQTEQIAIAUCR4PySgIbIwYLCQgHAwIEFQIIAwQWAgMBAh4BAheA
  AAoJEB8W0uFELfD4jnkAoMqd6ZwwsgYHZ3hP9vt+DJt1uDW7AKDbRwP8ESKFhwdJ
  8m91RPBeJW/tMLkCDQRHg/JKEAgA64+ZXgcERPYfZYo4p+yMTJAAa9aqnE3U4Ni6
  ZMB57GPuEy8NfbNya+HiftO8hoozmJdcI6XFyRBCDUVCdZ8SE+PJdOx2FFqZVIu6
  dKnr8ykhgLpNNEFDG3boK9UfLj/5lYQ3Y550Iym1QKOgyrJYeAp6sZ+Nx2PavsP3
  nMFCSD67BqAbcLCVQN7a2dAUXfEbfXJjPHXTbo1/kxtzE+KCRTLdXEbSEe3nHO04
  K/EgTBjeBUOxnciH5RylJ2oGy/v4xr9ed7R1jJtshsDKMdWApwoLlCBJ63jg/4T/
  z/OtXmu4AvmWaJxaTl7fPf2GqSqqb6jLCrQAH7AIhXr9V0zPZwADBQgAlpptNQHl
  u7euIdIujFwwcxyQGfee6BG+3zaNSEHMVQMuc6bxuvYmgM9r7aki/b0YMfjJBk8v
  OJ3Eh1vDH/woJi2iJ13vQ21ot+1JP3fMd6NPR8/qEeDnmVXu7QAtlkmSKI9Rdnjz
  FFSUJrQPHnKsH4V4uvAM+njwYD+VFiwlBPTKNeL8cdBb4tPN2cdVJzoAp57wkZAN
  VA2tKxNsTJKBi8wukaLWX8+yPHiWCNWItvyB4WCEp/rZKG4A868NM5sZQMAabpLd
  l4fTiGu68OYgK9qUPZvhEAL2C1jPDVHPkLm+ZsD+90Pe66w9vB00cxXuHLzm8Pad
  GaCXCY8h3xi6VIhJBBgRAgAJBQJHg/JKAhsMAAoJEB8W0uFELfD4K4cAoJ4yug8y
  1U0cZEiF5W25HDzMTtaDAKCaM1m3Cbd+AZ0NGWNg/VvIX9MsPA==
  =au6K
  -----END PGP PUBLIC KEY BLOCK-----

So where do I go from here??!?
Re: [GENERAL] Some Problems - Shall I reinstall the DB?
Regarding the log message, I get the following error message in my application for the connection problem, and there are NO error messages in the PG log file. It is the same with DbVisualizer - no error messages in the PG log file. ~~~ Caused by: org.postgresql.util.PSQLException: The connection attempt failed. at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:150) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66) at org.postgresql.jdbc2.AbstractJdbc2Connection.init(AbstractJdbc2Connection.java:125) at org.postgresql.jdbc3.AbstractJdbc3Connection.init(AbstractJdbc3Connection.java:30) at org.postgresql.jdbc3g.AbstractJdbc3gConnection.init(AbstractJdbc3gConnection.java:22) at org.postgresql.jdbc4.AbstractJdbc4Connection.init(AbstractJdbc4Connection.java:30) at org.postgresql.jdbc4.Jdbc4Connection.init(Jdbc4Connection.java:24) at org.postgresql.Driver.makeConnection(Driver.java:393) at org.postgresql.Driver.connect(Driver.java:267) at org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38) at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:294) at org.apache.commons.dbcp.BasicDataSource.validateConnectionFactory(BasicDataSource.java:1247) at org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1221) ... 52 more Caused by: java.io.EOFException at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:261) at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:269) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:108) ... 64 more ~~~ When I try to access PG from the PC command prompt with the psql command, I get the following error message: psql: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. This time an error message shows up in the PG log file: FATAL: could not reattach to shared memory (key=5432001, addr=011D): Invalid argument As far as I can tell, the PG log file doesn't tell the whole story. The version of PG is 8.0. - Original Message - From: Adrian Klaver adrian.kla...@gmail.com To: vwu98...@lycos.com Cc: pgsql-general pgsql-general@postgresql.org Sent: Monday, January 31, 2011 01:30:09 PM GMT -08:00 US/Canada Pacific Subject: Re: [GENERAL] Some Problems - Shall I reinstall the DB? On 01/31/2011 01:18 PM, vwu98...@lycos.com wrote: The DB connection problem has nothing to do with Hibernate. As I have said, the problem can be solved by restarting the DB. This problem occurs with my application as well as DbVisualizer. So now we know it is a generic connection problem. Is there information in the Postgres logs that might be helpful? And as I have mentioned, the id error doesn't occur with other tables. Again, is there any information generated in the logs when you try to connect to that table? Another thing to check is whether the max id in the table is greater than the last value for the sequence associated with that table. Could be id(s) got inserted without using the sequence. At this time reinstalling the DB would seem to be excessive. There is no guarantee that the problem won't just appear again without knowing what it is in the first place. -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Text search parser's treatment of URLs and emails
I have added this as a TODO: * Improve handling of plus signs in email address user names, and perhaps improve URL parsing * http://archives.postgresql.org/pgsql-hackers/2010-10/msg00772.php --- Thom Brown wrote: Hi, I noticed that if I run this: SELECT alias, description, token FROM ts_debug('http://www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary'); I get: alias | description | token --+---+- protocol | Protocol head | http:// url | URL | www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary host | Host | www.postgresql.org:2345 url_path | URL path | /directory/page.html?version=9.1&build=alpha1#summary (4 rows) It could be me being picky, but I don't regard parameters or page fragments as part of the URL path. Ideally, I'd sort of expect: alias | description | token --+---+- protocol | Protocol head | http:// url | URL | www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary host | Host | www.postgresql.org port | Port | 2345 url_path | URL path | /directory/page.html query_string | Query string | version=9.1&build=alpha1 fragment | Page fragment | summary (7 rows) ... of course that's if there was support for query strings and page fragments, which there isn't. But if changes were made to support my definition of a URL path, they'd have to be considered breaking changes. But my main gripe is with the name url_path. Also: SELECT alias, description, token FROM ts_debug('myname+prior...@gmail.com'); Yields: alias | description | token ---+-+ asciiword | Word, all ASCII | myname blank | Space symbols | + email | Email address | prior...@gmail.com (3 rows) The entire string I entered is a valid email address, and isn't totally uncommon. Shouldn't such email address styles be taken into account? The example above incorrectly identifies the email address, since the real destination address would most likely be myn...@gmail.com. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Book recommendation?
Herouth Maoz wrote: My sysadmin ran into the following book: PostgreSQL 9.0 High Performance, by Gregory Smith, ISBN 184951030X http://amzn.com/184951030X That guy's a troublemaker, but I guess he writes OK. There are three customer reviews at http://www.amazon.com/PostgreSQL-High-Performance-Gregory-Smith/product-reviews/184951030X/ and two longer ones at: http://people.planetpostgresql.org/andrew/index.php?/archives/130-Buy-this-book,-now..html http://www.postgresonline.com/journal/archives/192-postgresql9highperformance.html And here's some free samples: https://www.packtpub.com/article/postgresql-9-reliable-controller-disk-setup https://www.packtpub.com/article/postgresql-9-balancing-hardware-spending https://www.packtpub.com/article/server-configuration-tuning-postgresql https://www.packtpub.com/article/unix-monitoring-tool-for-postgresql https://www.packtpub.com/article/postgresql-tips-tricks https://www.packtpub.com/sites/default/files/0301OS-Chapter-2-Database-Hardware.pdf -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] yum repo problem
On 02/01/11 12:07 PM, John R Pierce wrote: warning: rpmts_HdrFromFdno: V3 DSA signature: NOKEY, key ID e8e345b8 Public key for postgresql-8.4.4-2PGDG.el4.i386.rpm is not installed Retrieving GPG key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-PGDG GPG key at file:///etc/pki/rpm-gpg/RPM-GPG-KEY-PGDG (0x442DF0F8) is already installed ... k, I did a yum clean all and it's working now. odd. before it was trying to install... postgresql-server i386 8.4.4-2PGDG.el4 pgdg84 and now it's installed... postgresql-server i386 8.4.6-1PGDG.rhel4 pgdg84 not sure I understand, but not sure it matters, problem solved. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Issues with generate_series using integer boundaries
On 1 February 2011 01:05, Tom Lane t...@sss.pgh.pa.us wrote: Thom Brown t...@linux.com writes: I've noticed that if I try to use generate_series to include the upper boundary of int4, it never returns: I'll bet it's testing currval against the bound without considering the possibility that incrementing currval caused an overflow wraparound. We fixed a similar problem years ago in plpgsql FOR-loops... Yes, you're right. Internally, the current value is checked against the finish. If it hasn't yet passed it, the current value is increased by the step. When it reaches the upper bound, since it hasn't yet exceeded the finish, it proceeds to increment it again, resulting in the iterator wrapping past the upper bound to become the lower bound. This then keeps it looping from the lower bound upward, so the current value stays well below the end. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
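Until a fixed build is available, one workaround is to force the int8 variant of generate_series, so the internal counter has headroom above the int4 maximum. A sketch, assuming you can change the call site:

  -- hangs on affected versions: SELECT x FROM generate_series(2147483643, 2147483647) AS x;
  SELECT x::int AS x
  FROM generate_series(2147483643::bigint, 2147483647::bigint) AS x;
  -- the bigint counter can step past 2147483647 without wrapping,
  -- so the loop terminates normally after returning five rows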
Re: [GENERAL] Some Problems - Shall I reinstall the DB?
On Tuesday, February 01, 2011 12:03:20 pm vwu98...@lycos.com wrote: When I try to access PG from the PC command prompt with the psql command, I get the following error message: psql: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. This time an error message shows up in the PG log file: FATAL: could not reattach to shared memory (key=5432001, addr=011D): Invalid argument As far as I can tell, the PG log file doesn't tell the whole story. The version of PG is 8.0. I am going to go out on a limb here and say you are running the server on Windows, correct? Before we go any further: the 8.0.x series is past end of life, especially for the Windows version. The problem you report above (FATAL: ...) is a known Windows version problem. It was fixed in later versions, looks like from 8.2.18 on. I do not run Postgres on Windows, if that is indeed where your server is running, so I am at the limit of what I can help you with. -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] cast problem in Postgresql 9.0.1
On 1 Feb 2011, at 7:14, AI Rumman wrote: I migrated my DB from Postgresql 8.1 to Postgresql 9.0.1. That's quite a big step up. You skipped 8.2, 8.3 and 8.4 - all major releases. My advice: Test very thoroughly for more differences in behaviour. One thing to start looking at right away is whether your config parameters still make sense, in case you're re-using your old config. Some changed names I think, and there are some new ones you might want to change. I have a table testtab \d testtab id int, hours varchar That seems an odd choice for a datatype. What are you trying to accomplish by making it varchar? And no, of course you can't sum varchars; what kind of output would you expect from that? That an older version of Postgres didn't throw an error was probably a bug. Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Some Problems - Shall I reinstall the DB?
vwu98...@lycos.com writes: When I try to access PG from the PC command prompt with the psql command, I get the following error message: psql: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. This time an error message shows up in the PG log file: FATAL: could not reattach to shared memory (key=5432001, addr=011D): Invalid argument If you're running PG on Windows, this is a known problem that was fixed some time ago ... As far as I can tell, the PG log file doesn't tell the whole story. The version of PG is 8.0. ... but not *that* long ago. 8.0.x on Windows was desupported more than three years ago, and as of last year is out of support on all platforms. Please update to a current version. http://www.postgresql.org/about/news.865 regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Problem with encode () and hmac() in pgcrypto
On Tue, Feb 1, 2011 at 5:36 PM, hlcborg h.luis.card...@gmail.com wrote: encode(hmac(v_em_crt_conc, v_Private,'sha1'),'base64'); HMAC - key-dependent SHA1 The Result: h6CpmrP1QCE/Mp3xn3utUEPtftg= This hash has 28 chars When I use OpenSSL in command line like this: ~$ echo 2011-01-31;2011-02-01T13:33:38;100036;684.40; | openssl dgst -sha1 -sign abc.pem | openssl enc -base64 -A Plain SHA1, which is signed with an RSA signature. The Result is: nKfxnt31+kk/RnKihJ0jKufq+nZvmPjVauGo8+tqJ1Y/ah/mAu4jSS1wnzU+wRygZ4CLIV9DGSs9bxBc4r9e71C8s9B5ms6Kpggmc12kdmqVHBRO28bPWb/YLCej59gZFFkvcCudweNAT4qHvVqWsOtFCf9kE4q92UIv1JcwSDU= This hash has 172 chars Does someone know where my problem is? Is there another way to implement it? Or is it simply impossible?? These two operations are not equivalent. -- marko -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] cast problem in Postgresql 9.0.1
On Tue, 2011-02-01 at 12:14 +0600, AI Rumman wrote: I migrated my DB from Postgresql 8.1 to Postgresql 9.0.1. I have a table testtab \d testtab id int, hours varchar When I execute the following: select sum(hours) from testtab I get cast error. In 8.3, implicit casts were removed. You can't sum text. You need to change the data type to a proper numerical type. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
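Concretely, assuming the column really does hold only numbers (table and column names taken from the original post), either cast per query or change the type once:

  -- per-query cast:
  SELECT SUM(hours::numeric) FROM testtab;

  -- or fix the column for good; this fails if any row contains a non-numeric string:
  ALTER TABLE testtab ALTER COLUMN hours TYPE numeric USING hours::numeric;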
[GENERAL] troubleshooting Postgresql on Windows
Hi I recently read some of your Windows docs on the wiki. They mentioned that some people experience problems with PostgreSQL and firewalls and/or anti-virus programs on Windows. I know that a software firewall is usually not a problem. I have used a couple of different software firewalls at different times and they usually prompt you to ask if you want to allow this program to access the internet. Hardware firewalls are a different story however. That's why I disabled my ADSL router's firewall and am using a software firewall instead. As far as anti-virus programs go, according to the wiki you can disable scanning of certain folders or executable files. What sort of problems do firewalls and anti-virus programs cause? Do they cause the installation of Postgres to crash? How will you know if a problem is caused by a firewall and/or anti-virus program? Will Postgres work as an embedded database on Windows? Won't I have lots of support calls from users not being able to install Postgres on Windows? Is it possible to write my own installation routine for Postgres under Windows? Where would I find such info? I think I did read somewhere on the web that the one-click installer does allow you to do an unattended installation? Thanks H.F.
Re: [GENERAL] Book recommendation?
On 1 Feb 2011, at 21:15, Greg Smith wrote: Herouth Maoz wrote: My sysadmin ran into the following book: PostgreSQL 9.0 High Performance, by Gregory Smith, ISBN 184951030X http://amzn.com/184951030X That guy's a troublemaker, but I guess he writes OK. It must be tough to be reminded of him every time you look into a mirror. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] cast problem in Postgresql 9.0.1
Joshua D. Drake j...@commandprompt.com writes: On Tue, 2011-02-01 at 12:14 +0600, AI Rumman wrote: I migrated my DB from Postgresql 8.1 to Postgresql 9.0.1. I have a table testtab \d testtab id int, hours varchar When I execute the following: select sum(hours) from testtab I get cast error. In 8.3, implicit casts were removed. You can't sum text. You couldn't do it in previous releases, either. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Issues with generate_series using integer boundaries
On 1 Feb 2011, at 21:26, Thom Brown wrote: On 1 February 2011 01:05, Tom Lane t...@sss.pgh.pa.us wrote: Thom Brown t...@linux.com writes: I've noticed that if I try to use generate_series to include the upper boundary of int4, it never returns: I'll bet it's testing currval against the bound without considering the possibility that incrementing currval caused an overflow wraparound. We fixed a similar problem years ago in plpgsql FOR-loops... Yes, you're right. Internally, the current value is checked against the finish. If it hasn't yet passed it, the current value is increased by the step. When it reaches the upper bound, since it hasn't yet exceeded the finish, it proceeds to increment it again, resulting in the iterator wrapping past the upper bound to become the lower bound. This then keeps it looping from the lower bound upward, so the current value stays well below the end. That could actually be used as a feature to create a repeating series. A bit more control would be useful though :P Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] zero_damaged_pages doesn't work
David Boreham wrote: On 9/27/2010 4:53 PM, Tom Lane wrote: The reason it tells you that data will be destroyed is that that could very well happen. Re-parsing this, I think there was a mis-communication : I'm not at all suggesting that the doc should _not_ say that data will be corrupted. I'm suggesting that in addition to what it currently says, it also should say that the on-disk data won't be changed by the page zeroing mode. In my searching I found countless people over the past few years who had been similarly confused into believing that it would write back the zeroed page to disk. Based on this discussion from September, I have applied the attached documentation patch to clarify that zero_damaged_pages are not forced to disk, and when to set this parameter off again. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 3a0f755..141430c 100644 *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** LOG: CleanUpLock: deleting: lock(0xb7ac *** 6059,6073 para Detection of a damaged page header normally causes productnamePostgreSQL/ to report an error, aborting the current ! command. Setting varnamezero_damaged_pages/ to on causes ! the system to instead report a warning, zero out the damaged page, ! and continue processing. This behavior emphasiswill destroy data/, ! namely all the rows on the damaged page. But it allows you to get past the error and retrieve rows from any undamaged pages that might ! be present in the table. So it is useful for recovering data if corruption has occurred due to a hardware or software error. You should generally not set this on until you have given up hope of recovering ! data from the damaged pages of a table. The default setting is literaloff/, and it can only be changed by a superuser. /para --- 6059,6075 para Detection of a damaged page header normally causes productnamePostgreSQL/ to report an error, aborting the current ! transaction. Setting varnamezero_damaged_pages/ to on causes ! the system to instead report a warning, zero out the damaged ! page in memory, and continue processing. This behavior emphasiswill destroy data/, ! namely all the rows on the damaged page. However, it does allow you to get past the error and retrieve rows from any undamaged pages that might ! be present in the table. It is useful for recovering data if corruption has occurred due to a hardware or software error. You should generally not set this on until you have given up hope of recovering ! data from the damaged pages of a table. Zerod-out pages are not ! forced to disk so it is recommended to recreate the table or ! the index before turning this parameter off again. The default setting is literaloff/, and it can only be changed by a superuser. /para -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
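To make the documented behavior concrete, the recovery sequence the patch describes looks roughly like this (a sketch; 'damaged_table' and 'salvaged' are hypothetical names):

  SET zero_damaged_pages = on;   -- superuser only, per-session
  CREATE TABLE salvaged AS SELECT * FROM damaged_table;
  -- rows on damaged pages are gone; those pages were zeroed in memory only
  SET zero_damaged_pages = off;
  -- 'salvaged' now holds a clean copy, while the original table still has
  -- the damaged pages on disk, since the zeroed pages are never written back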
[GENERAL] Windows to Linux PostgreSQL Migration
Hello, I have just done a multi-part migration and am having difficulty viewing binary content stored in a bytea field; specifically I stored a PDF in the existing setup and now receive a File does not begin with '%PDF-' message when I go to pull it up from the new setup. The specific migration is as follows: Existing: Windows 2000 (32bit); PostgreSQL 8.2 New: Ubuntu 10.10 (64bit); PostgreSQL 9.0 To migrate the data I did the following: Pg_dump (v 8.3) from the existing Windows (32) Machine into a dump file on an NTFS file system. Pg_restore (v 9.0) from a different Windows (64) machine from the dump file into the running Linux (64) server. Using the same application software and JRE (Tomcat container) I attempted to view the files as normal (using Firefox on a Windows Machine) and the File does not begin with '%PDF-' message appears. I can think of some possible variations and causes for this but figured before I go running for the cliff I'd see if anyone can at least point me in the right direction. Thank You David Johnston
Re: [GENERAL] Windows to Linux PostgreSQL Migration
On 2/1/2011 3:44 PM, David Johnston wrote: Hello, I have just done a multi-part migration and am having difficulty viewing binary content stored in a “bytea” field; specifically I stored a PDF in the existing setup and now receive a “File does not begin with ‘%PDF-‘” message when I go to pull it up from the new setup. The specific migration is as follows: Existing: Windows 2000 (32bit); PostgreSQL 8.2 New: Ubuntu 10.10 (64bit); PostgreSQL 9.0 To migrate the data I did the following: Pg_dump (v 8.3) from the existing Windows (32) Machine into a dump file on an NTFS file system. Pg_restore (v 9.0) from a different Windows (64) machine from the dump file into the running Linux (64) server. Using the same application software and JRE (Tomcat container) I attempted to view the files as normal (using Firefox on a Windows Machine) and the “File does not begin with ‘%PDF-‘” message appears. I can think of some possible variations and causes for this but figured before I go running for the cliff I’d see if anyone can at least point me in the right direction. Thank You David Johnston Not sure if it'll help, but you could try the pg_dump from version 9 to do the backup. -Andy -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Windows to Linux PostgreSQL Migration
Andy, thank you for the quick response but I figured out the problem. I recalled reading something regarding the bytea data type in the 9.0 migration/release notes and decided to look into that. It turns out that changing the default output format to hex from escape is the cause of the issue. I restarted the server with bytea_output = escape and I am once again able to view my PDF files. The notes are very vague as to WHY this change was made, but at least getting the original behavior back isn't that difficult. I am just curious whether (or even how) I would change my Java code (JDBC) to use the new default hex format that appears to be preferred over the escape format. Dave -Original Message- From: Andy Colson [mailto:a...@squeakycode.net] Sent: Tuesday, February 01, 2011 4:59 PM To: David Johnston Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Windows to Linux PostgreSQL Migration On 2/1/2011 3:44 PM, David Johnston wrote: Hello, I have just done a multi-part migration and am having difficulty viewing binary content stored in a bytea field; specifically I stored a PDF in the existing setup and now receive a File does not begin with '%PDF-' message when I go to pull it up from the new setup. The specific migration is as follows: Existing: Windows 2000 (32bit); PostgreSQL 8.2 New: Ubuntu 10.10 (64bit); PostgreSQL 9.0 To migrate the data I did the following: Pg_dump (v 8.3) from the existing Windows (32) Machine into a dump file on an NTFS file system. Pg_restore (v 9.0) from a different Windows (64) machine from the dump file into the running Linux (64) server. Using the same application software and JRE (Tomcat container) I attempted to view the files as normal (using Firefox on a Windows Machine) and the File does not begin with '%PDF-' message appears. I can think of some possible variations and causes for this but figured before I go running for the cliff I'd see if anyone can at least point me in the right direction. Thank You David Johnston Not sure if it'll help, but you could try the pg_dump from version 9 to do the backup. -Andy -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
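For the record, the setting doesn't strictly require a server restart; it can be scoped per database or per session (the database name here is a placeholder):

  ALTER DATABASE mydb SET bytea_output = 'escape';  -- applies to new sessions
  SET bytea_output = 'escape';                      -- or just for this session

On the JDBC side, my understanding (worth verifying against the driver changelog) is that the 9.0-era JDBC drivers recognize the hex format on their own, so upgrading the driver should let you keep the new default.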
Re: [GENERAL] Documentation enhancement
Thomas Kellerer wrote: Hi, I would like to suggest to enhance the documentation of the CREATE VIEW statement. I think the fact that a SELECT * is internally stored as the expanded column list (valid at the time when the view was created) should be documented together with the CREATE VIEW statement. Especially because the example does use SELECT * to create the view. Agreed. The attached, applied patch documents this behavior. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/ref/create_view.sgml b/doc/src/sgml/ref/create_view.sgml index dd15507..417f8c3 100644 *** a/doc/src/sgml/ref/create_view.sgml --- b/doc/src/sgml/ref/create_view.sgml *** CREATE VIEW comedies AS *** 173,178 --- 173,182 FROM films WHERE kind = 'Comedy'; /programlisting +This will create a view containing the columns that are in the +literalfilm/ table at the time of view creation. Though +literal*/ was used to create the view, columns added later to +the table will not be part of the view. /para /refsect1 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
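A quick demonstration of the behavior the new paragraph documents (hypothetical table and view names):

  CREATE TABLE films_demo (id int, kind text);
  CREATE VIEW comedies_demo AS SELECT * FROM films_demo;  -- * expands to (id, kind) right here
  ALTER TABLE films_demo ADD COLUMN rating text;
  SELECT * FROM comedies_demo;  -- still returns only id and kind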
Re: [GENERAL] Issues with generate_series using integer boundaries
On 1 February 2011 21:32, Alban Hertroys dal...@solfertje.student.utwente.nl wrote: On 1 Feb 2011, at 21:26, Thom Brown wrote: On 1 February 2011 01:05, Tom Lane t...@sss.pgh.pa.us wrote: Thom Brown t...@linux.com writes: I've noticed that if I try to use generate_series to include the upper boundary of int4, it never returns: I'll bet it's testing currval bound without considering the possibility that incrementing currval caused an overflow wraparound. We fixed a similar problem years ago in plpgsql FOR-loops... Yes, you're right. Internally, the current value is checked against the finish. If it hasn't yet passed it, the current value is increased by the step. When it reaches the upper bound, since it hasn't yet exceeded the finish, it proceeds to increment it again, resulting in the iterator wrapping past the upper bound to become the lower bound. This then keeps it looping from the lower bound upward, so the current value stays well below the end. That could actually be used as a feature to create a repeating series. A bit more control would be useful though :P I don't quite understand why the code works. As I see it, it always returns a set with values 1 higher than the corresponding result. So requesting 1 to 5 actually returns 2 to 6 internally, but somehow it correctly shows 1 to 5 in the query output. If there were no such discrepancy, the upper-bound/lower-bound problem wouldn't exist, so not sure how those output values result in the correct query result values. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Problem with encode () and hmac() in pgcrypto
Hi Marko, OK, These two operations are not equivalent. But... Can I have this operation done in a stored procedure inside the database? Plain SHA1, which is signed with an RSA signature, and in the end encoded to base64? I was looking in the pgcrypto functions, and I haven't found any that I could use, I think... Maybe I am not looking for the right one. I need to use plain SHA1 signed with an RSA signature and encoded into base64 due to a new law related to digital signatures on invoices in my country. -- Luis -- View this message in context: http://postgresql.1045698.n5.nabble.com/Problem-with-encode-and-hmac-in-pgcrypto-tp3366420p3367159.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
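As far as I can tell, pgcrypto has no RSA signing function at all, so only the digest and base64 steps can happen in plain SQL; the signature itself has to be produced in the application, or in an untrusted procedural language that can call a crypto library. A sketch of the part pgcrypto can do:

  -- plain (unkeyed) SHA-1 digest, base64-encoded:
  SELECT encode(digest('2011-01-31;2011-02-01T13:33:38;100036;684.40;', 'sha1'), 'base64');
  -- hmac() computes a keyed MAC instead, and neither function performs
  -- the RSA step of: openssl dgst -sha1 -sign abc.pem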
Re: [GENERAL] yum repo problem
On Tue, 2011-02-01 at 12:25 -0800, John R Pierce wrote: k, I did a yum clean all and it's working now. odd. before it was trying to install... postgresql-server i386 8.4.4-2PGDG.el4 pgdg84 and now it's installed... postgresql-server i386 8.4.6-1PGDG.rhel4 pgdg84 not sure I understand, but not sure it matters, problem solved. I have seen similar issues with old versions of yum, like the one you are using now :( Glad that it is solved now. -- Devrim GÜNDÜZ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
Re: [GENERAL] Select for update with offset interferes with concurrent transactions
If random sampling is desirable, would the following construct limit locking only to the sampled rows? SELECT id FROM tasktable WHERE id IN (SELECT random_id_sample()) FOR UPDATE The random_id_sample would supply a configurable group of IDs off of tasktable which the FOR UPDATE would then lock I guess the issue remains that random_id_sample() would still end up blocking if any of the rows it wants to return are already locked. I too am using this basic protocol of maintaining state info within the database and sending every query against it. As I ponder this more it really seems as if moving some of this logic into the application layer would possibly make more sense in Yngve's situation (or at least something to consider). Continue to use the database as a persistence mechanism but code the dispatching of tasks in the application layer, and then as each task is dispatched you simply do an UPDATE table SET state = 'dispatch' WHERE id = 'ID' and a similar UPDATE when the task is returned completed. This somewhat presumes you still only ever hand off one task at a time. If you are indeed handing off tasks in batches then it would make sense to have a batch table and operate at the batch level instead of individual tasks - assigning tasks to a given batch via some standard mechanism. Either way, if you truly want parallel processing then you need to create the parallel paths that can operate without clobbering each other, and thus each path needs to have its own pool of tasks, since as soon as you have a shared resource the only true way to make sure it is only allocated once is to serialize access to it. An alternative method would be to allow multiple dispatches but have a write-once method that is called and sets an immutable handler_id, and then when the processing begins only the handler with the matching id would be allowed to perform the actual processing. I say the above with certainty but at the moment I am using and fairly happy with my limited serialization - especially since I have specific sub-properties that I can use to limit how many records are locked AND also because the locking time is very short (I cap around 20 or so active tasks to dispatch - and only infrequently at that) so my experience and insight into high-demand situations is limited. Dave -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Tuesday, February 01, 2011 12:18 PM To: Yngve Nysaeter Pettersen Cc: pgsql-general@postgresql.org Subject: Re: Select for update with offset interferes with concurrent transactions Yngve Nysaeter Pettersen yn...@opera.com writes: To avoid having the processes trample each other's queries (the first attempt was to select the first matching entries of the table, which caused one to block all other transactions), one of the steps I took was to select a set of idle rows at a random offset into the table from the project, mark them for update, then update each record's state as started. SELECT record_id FROM queue WHERE project_id = my_project AND state = idle LIMIT n OFFSET i FOR UPDATE At present n is 100-150, i is a random value in the range 0-1. There is, intentionally, no ordering specified, since that would just slow down the query, and is not necessary. This seems like a pretty bad design. There are recognized ways to solve this problem with more predictability and much less chance of different processes blocking each other. In particular, this query seems to be based on some untenable assumptions about the physical row order being stable.
What I've discovered when using Postgres 9.0 is that the processes are now blocking every other query into this table. In 9.0, LIMIT/OFFSET processing is done after FOR UPDATE locking, which means that rows skipped over by OFFSET still get locked, which means that different sessions executing this query are now practically certain to block each other, rather than just likely to block each other. This was an intentional change to improve the predictability of FOR UPDATE's interactions with LIMIT/OFFSET, and indeed it's improved the predictability of the behavior for you, just not in the direction you'd like :-( regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
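One of the recognized patterns Tom alludes to is to claim rows with session-scoped advisory locks instead of FOR UPDATE, so that competing workers skip locked rows rather than queue behind them. A sketch, with the caveat that the executor may evaluate the lock function for rows beyond the LIMIT, so a worker should release any advisory locks on rows it did not actually claim:

  SELECT record_id
  FROM queue
  WHERE project_id = 42            -- placeholder project id
    AND state = 'idle'
    AND pg_try_advisory_lock(record_id)
  LIMIT 100;
  -- mark the returned rows 'started', process them, then release each lock:
  -- SELECT pg_advisory_unlock(record_id);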
[GENERAL] Importing/Appending to Existing Table
I have an existing table with 15,492 rows and want to add additional rows from a .csv file. If I use 'COPY tablename from filename with delimiter as : csv quote as ' ' will this overwrite existing rows in the table or append rows? Rich -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On Mon, Jan 31, 2011 at 7:52 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Jan 31, 2011 at 5:54 PM, Aleksey Tsalolikhin atsaloli.t...@gmail.com wrote: Situation: Disk usage on production server root filesystem is at 68% utilization (80 GB used), on DR is at 51% (56 GB used). We use SlonyII-1.2.x to keep the DR up to date. I would like to account for the 24 GB difference. This is likely free space in your database. Some of it is completely normal and actually improves performance. Too much and your db is bloated and things starting taking too long. Thanks, Scott! Bucardo's check_postgres.pl --action bloat complains about one table, 1 GB wasted. So the other tables must be OK. So what about my DR, which doesn't have this same 20+ GB of free space. Will it acquire it once it goes into production? Will performance be impacted as it acquires the free space? Should I even be concerned about the difference in disk usage or is it normal and expected? How do I find out how much actual data I have in my database, minus the free space? Is there some built-in way to report this, or do I need to run SELECT * FROM * and look at the byte count of the output? Thanks, -at -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
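On the 'how much actual data' question: no need for SELECT *. The built-in size functions report the allocated size (live rows plus free space), which combined with check_postgres's bloat estimate gets you close to a live-data figure:

  SELECT pg_size_pretty(pg_database_size(current_database()));
  -- largest tables, indexes and TOAST included:
  SELECT relname, pg_size_pretty(pg_total_relation_size(oid))
  FROM pg_class
  WHERE relkind = 'r'
  ORDER BY pg_total_relation_size(oid) DESC
  LIMIT 10;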
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On 1 February 2011 03:52, Scott Marlowe scott.marl...@gmail.com wrote: You can reclaim that space by doing a cluster or vacuum full on the subject table. Yes, but this is a fairly bad idea, particularly prior to PG 9.0 . 9.0 has a new vacuum full implementation that makes it not so bad - it just rewrites the entire table. VACUUM FULL will take exclusive locks on tables being vacuumed. It also causes index bloat. You should be very careful about using it on a production system. I'm not sure why you'd advocate CLUSTER as a way to reclaim disk space. I wouldn't increase index fill factor as an optimisation, unless you had the unusual situation of having very static data in the table. -- Regards, Peter Geoghegan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On Tue, Feb 1, 2011 at 8:13 PM, Peter Geoghegan peter.geoghega...@gmail.com wrote: On 1 February 2011 03:52, Scott Marlowe scott.marl...@gmail.com wrote: You can reclaim that space by doing a cluster or vacuum full on the subject table. Yes, but this is a fairly bad idea, particularly prior to PG 9.0 . 9.0 has a new vacuum full implementation that makes it not so bad - it just rewrites the entire table. VACUUM FULL will take exclusive locks on tables being vacuumed. It also causes index bloat. You should be very careful about using it on a production system. I know these things. I'm pretty sure it's even in the docs by now. I'm not sure why you'd advocate CLUSTER as a way to reclaim disk space. Because it can reclaim disk space? http://www.postgresql.org/docs/8.4/static/sql-cluster.html I wouldn't increase index fill factor as an optimisation, unless you had the unusual situation of having very static data in the table. That makes no sense whatsoever. You decrease fill factor (not increase btw) so there will be some space for future updates. If he's getting bloat it may well help quite a bit to have a lower than 100% fill factor. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
More info here. http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On Tue, Feb 1, 2011 at 7:29 PM, Aleksey Tsalolikhin atsaloli.t...@gmail.com wrote: Bucardo's check_postgres.pl --action bloat complains about one table, 1 GB wasted. So the other tables must be OK. Pretty sure that unless you give it more args, the default for bloat check is to list the first bloated table and stop. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] postgresql 9 on windows 2000
Hi, Does PostgreSQL run on Windows 2000? I've tried to install postgresql-9.0.3-1 on Windows 2000 but I couldn't connect to the default local server. If not, which version can I use for Windows 2000? Thanks,
[GENERAL] Tablespaces Issue in Postgres
Dear all, A few days ago I attached a tablespace to one of my databases. Say /hdd-1/database_name is the path for the tablespace. When I attached it, it had 100 GB of free space. Now my database size has grown to 90 GB. I want to prepare for my upcoming space problem. How can I handle this? Would anyone please guide me on how to overcome it? Highly Thankful Adarsh Sharma -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
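When the volume backing a tablespace is filling up, the usual way out is to create another tablespace on a disk with free space and move the largest objects over. Note that each move rewrites the object and holds an exclusive lock while it runs. A sketch, with hypothetical paths and names:

  CREATE TABLESPACE hdd2 LOCATION '/hdd-2/database_name';
  ALTER TABLE big_table SET TABLESPACE hdd2;
  ALTER INDEX big_table_pkey SET TABLESPACE hdd2;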
Re: [GENERAL] Importing/Appending to Existing Table
On 2 February 2011 03:28, Rich Shepard rshep...@appl-ecosys.com wrote: I have an existing table with 15,492 rows and want to add additional rows from a .csv file. If I use 'COPY tablename from filename with delimiter as : csv quote as ' ' will this overwrite existing rows in the table or append rows? No, it won't overwrite; it will append rows. Rich -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sergey Konoplev Blog: http://gray-hemp.blogspot.com / Linkedin: http://ru.linkedin.com/in/grayhemp / JID/GTalk: gray...@gmail.com / Skype: gray-hemp -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
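If some of the incoming rows might duplicate ones already loaded, a common pattern is to COPY into a staging table first and insert only the new keys (a sketch; 'id' stands in for whatever the real key column is):

  CREATE TEMP TABLE staging (LIKE tablename);
  COPY staging FROM '/path/to/file.csv' WITH DELIMITER ':' CSV;
  INSERT INTO tablename
  SELECT s.* FROM staging s
  WHERE NOT EXISTS (SELECT 1 FROM tablename t WHERE t.id = s.id);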
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
Bucardo's check_postgres.pl --action bloat complains about one table, 1 GB wasted. So the other tables must be OK. Pretty sure that unless you give it more args, the default for bloat check is to list the first bloated table and stop. No, it will show all tables over the given threshold. However, the statement 'the other tables must be OK' is definitely not a given, as the bloat calculation used by check_postgres is a very rough one. - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201102020206 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general