Re: [GENERAL] timestamp with time zone tutorial
Hey Tom,

I was trying to use 'US/Pacific-New' as my long, unabbreviated timezone and it wasn't working. I thought Postgres wasn't accepting the unabbreviated, geopolitical, daylight-savings-time time zones. It turns out the server I was on (not my own box) didn't have that file in the '/usr/share/zoneinfo/US' directory. My bad.

I just have to read more on how to get a timestamp out relative to a different time zone than the one it went in with. I'll find it.

Dennis Gearon

--- On Sun, 7/19/09, Tom Lane t...@sss.pgh.pa.us wrote:

From: Tom Lane t...@sss.pgh.pa.us
Subject: Re: [GENERAL] timestamp with time zone tutorial
To: Dennis Gearon gear...@sbcglobal.net
Cc: pgsql-general@postgresql.org
Date: Sunday, July 19, 2009, 8:00 PM

Dennis Gearon gear...@sbcglobal.net writes:
> What I want is to be able to insert into my project's database times given
> by anybody anywhere on the planet (the SUBMITTER), adding the appropriate
> timezone in the INSERT statement so that it is stored 'GMT/UTC' neutral.
> I believe this is the way Postgres does it, storing times in GMT.

So just do it. What do you need to_timestamp for? The basic timestamptz input function is far more likely to get it right for input that's not in a precise predetermined format.
regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] First query very slow. Solutions: memory, or settings, or SQL?
On Monday 20 July 2009 06:45:40 Phoenix Kiula wrote:

> explain analyze select * from sites where user_id = 'phoenix' order by id desc limit 10;
>
>                                       QUERY PLAN
> ------------------------------------------------------------------------------------
>  Limit  (cost=344.85..344.87 rows=10 width=262) (actual time=5879.069..5879.167 rows=10 loops=1)
>    ->  Sort  (cost=344.85..345.66 rows=323 width=262) (actual time=5879.060..5879.093 rows=10 loops=1)
>          Sort Key: id
>          ->  Index Scan using new_idx_sites_userid on sites  (cost=0.00..331.39 rows=323 width=262) (actual time=44.408..5867.557 rows=2178 loops=1)
>                Index Cond: ((user_id)::text = 'phoenix'::text)
>  Total runtime: 5879.414 ms
> (6 rows)

The row estimate for the index scan is off (323 estimated vs. 2178 actual). Try reanalyzing, or increase the statistics target.
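A hedged sketch of what "reanalyzing, or increase the statistics target" could look like in practice; the value 1000 is purely illustrative (the default target was 10 up through 8.3, 100 in 8.4):

```sql
-- Recompute planner statistics for the table:
ANALYZE sites;

-- If the row estimate is still far off, collect a larger sample for the
-- skewed column, then re-analyze:
ALTER TABLE sites ALTER COLUMN user_id SET STATISTICS 1000;
ANALYZE sites;

-- Re-check whether the estimated row count now matches the actual one:
EXPLAIN ANALYZE
SELECT * FROM sites WHERE user_id = 'phoenix' ORDER BY id DESC LIMIT 10;
```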
Re: [GENERAL] timestamp with time zone tutorial
> I just have to read more on how to get it out relative to a different
> time zone than it went in. I'll find it.

Sounds like a job for SELECT ... AT TIME ZONE ...;

Karsten
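A minimal sketch of Karsten's suggestion (the table and column names here are made up for illustration):

```sql
-- timestamptz values are stored normalized to UTC; AT TIME ZONE renders
-- one in whatever zone the reader wants, independent of the input zone.
SELECT created_at AT TIME ZONE 'US/Pacific'    AS pacific_time,
       created_at AT TIME ZONE 'Europe/Berlin' AS berlin_time
FROM events;
```

Note that applying AT TIME ZONE to a timestamptz yields a plain timestamp, i.e. the wall-clock reading in the requested zone.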
Re: [GENERAL] Best practices for moving UTF8 databases
Phoenix Kiula wrote:
> Really, PG absolutely needs a way to upgrade the database without so
> much data related downtime and all these silly woes. Several competing
> database systems are a cinch to upgrade.

I'd call it data corruption, not a silly woe. I know that Oracle, for example, would not make that much fuss about your data: it would be imported without even a warning, and depending on your encoding settings the bad bytes would either be imported as-is or tacitly changed to inverted (or normal) question marks.

It's basically a design choice that PostgreSQL made: we think that an error is preferable to clandestinely modifying the user's data or accepting input that cannot possibly make any sense when it is retrieved at a future time.

> Anyway this is the annoying error I see as always:
> ERROR: invalid byte sequence for encoding UTF8: 0x80
> I think my old DB is all utf8. If there are a few characters that are
> not, how can I work with this? I've done everything I can to take care
> of the encoding and such. This code was used to initdb:
> initdb --locale=en_US.UTF-8 --encoding=UTF8
> Locale environment variables are all en_US.UTF-8 too.

0x80 makes me think of the following: the data originate from a Windows system, where 0x80 is a Euro sign. Somehow these were imported into PostgreSQL without the appropriate translation into UTF-8 (how, I do not know).

I wonder: why do you spend so much time complaining instead of simply locating the buggy data and fixing them? This does not incur any downtime (you can fix the data in the old database before migrating), and it will definitely enhance the fun your users have with your database (if they actually see Euros where they should be).

Yours,
Laurenz Albe
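One possible way to act on Laurenz's "locate the buggy data" advice before migrating, assuming the old database uses the permissive SQL_ASCII encoding and the bad bytes are Windows-1252 Euro signs (table and column names are hypothetical):

```sql
-- Find rows whose text contains the raw 0x80 byte.  convert_to() yields
-- the raw bytes of the string; decode('80', 'hex') is the single byte 0x80.
SELECT ctid, title
FROM articles
WHERE position(decode('80', 'hex') IN convert_to(title, 'SQL_ASCII')) > 0;
```

Rows found this way can then be corrected by hand (or with replace()) in the old database, after which the dump should load cleanly into the UTF8 cluster.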
Re: [GENERAL] Working around spurious unique constraint errors due to SERIALIZABLE bug
Craig Ringer wrote:
>> The drawback is that some of the side effects of the INSERT occur
>> before the constraint check fails, so it seems to me that I still
>> need to perform the select.
>
> If you really can't afford the INSERT side effects and can't redesign
> your code to be tolerant of them, you can always lock the table before
> an INSERT.

I wonder what the side effects of an INSERT could be when the error it causes is caught in a plpgsql exception block.

What happens behind the scenes when an exception is caught is that the transaction is rolled back to a savepoint that was set at the beginning of the block. So all changes to the database that were caused as side effects of the INSERT, for example by triggers, will be rolled back as well.

The only side effects that would remain could be caused by things that go outside the limits of the database, e.g. modifying files on the database server or performing trigger-based replication to another database. Everything that is not under MVCC control, for short. Is that the problem here?

On another line: the original question asked was how can I tell an error that is caused by incomplete isolation from another error?

If you have a code segment like

    SELECT count(id) INTO i2 FROM a WHERE id = i;
    IF i2 = 0 THEN
        INSERT INTO a (id) VALUES (i);
    END IF;

then you can be certain that any unique_violation thrown here must be a serialization problem (if the only unique constraint is on id).

Yours,
Laurenz Albe
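Laurenz's test-then-insert fragment, embedded in a complete PL/pgSQL block that also traps the unique_violation raised when a concurrent transaction wins the race — a sketch only; the table `a` and its single unique constraint on `id` come from his example:

```sql
CREATE OR REPLACE FUNCTION insert_if_absent(i integer) RETURNS void AS
$$
DECLARE
    i2 integer;
BEGIN
    SELECT count(id) INTO i2 FROM a WHERE id = i;
    IF i2 = 0 THEN
        INSERT INTO a (id) VALUES (i);
    END IF;
EXCEPTION
    WHEN unique_violation THEN
        -- With a single unique constraint on id, reaching this handler
        -- means a concurrent transaction inserted the same id between our
        -- SELECT and INSERT: a serialization conflict, not a data error.
        -- Callers may simply retry.
        NULL;
END;
$$ LANGUAGE plpgsql;
```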
Re: [GENERAL] Working around spurious unique constraint errors due to SERIALIZABLE bug
* Craig Ringer:

> The test program, attached, demonstrates what I should've known in the
> first place. In SERIALIZABLE isolation, the above is *guaranteed* to
> fail every time there's a conflict, because concurrent transactions
> cannot see changes committed by the others. So is a SELECT test then
> separate INSERT, by the way.

Yes, I forgot to mention that you can't use SERIALIZABLE if you use this approach.

> Given that, it seems to me you'll have to rely on Pg's internal
> lower-level synchronization around unique indexes. Try the insert and
> see if it fails, then ROLLBACK TO SAVEPOINT (or use a PL/PgSQL
> exception block). As you noted, this does mean that certain
> side-effects may occur, including:
>
> - advancement of sequences due to nextval(...) calls
>
> - triggers that've done work that can't be rolled back, eg dblink
>   calls, external file writes, inter-process communication etc

It's also the cost of producing the input data for the INSERT.

> (You might want to use the two-argument form of the advisory locking
> calls if your IDs are INTEGER size not INT8, and use the table oid for
> the first argument.)

Locking on a hash value could also be an option (it's how concurrent hash tables are sometimes implemented).

> Also: Is this really a phantom read? Your issue is not that you read a
> record that then vanishes or no longer matches your filter criteria;
> rather, it's that a record is created that matches your criteria after
> you tested for it.

It's the INSERT which performs the phantom read.

> And is SQL's definition of serializability really different from the
> textbook one? Certainly that wouldn't be possible if the concurrent
> transactions were actually executed serially, but does the standard
> actually require that this be the case? If it does, then compliant
> implementations would have to do predicate locking. Ouch. Does anybody
> do that?

You don't need predicate locking here. You just have to lock on the gap in the index you touched. I think some implementations do this (InnoDB calls it next-key locking).

--
Florian Weimer <fwei...@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99
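Craig's parenthetical about the two-argument advisory-lock form could look like the sketch below; the retry and transaction plumbing are omitted, and the id value is arbitrary:

```sql
-- Serialize inserts for one integer id without locking the whole table.
-- The two-argument form takes two int4 keys; using the table's OID as
-- the first key keeps ids of different tables from colliding.
SELECT pg_advisory_lock('a'::regclass::oid::int4, 42);

INSERT INTO a (id) VALUES (42);

SELECT pg_advisory_unlock('a'::regclass::oid::int4, 42);
```

Advisory locks are held until explicitly released (or until the session ends), so a caller must be careful to unlock on every code path.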
Re: [GENERAL] [EDIT] Timestamp indicies not being used!
On 19 Jul 2009, at 12:39, Pedro Doria Meunier wrote:

> Hash: SHA1
>
> Hi All,
> I *really* need some help with this one...
> I have a table with ~18M rows with a 'timestamp with time zone' column.
> It's indexed thus:
>
> CREATE INDEX my_table_timestamp_idx
>   ON my_table
>   USING btree (zulu_timestamp);
>
> Whenever I issue a command like:
>
> SELECT speed, digital_input_1, digital_input_2, digital_input_3,
>   digital_input_4, priority FROM my_table
>   WHERE id='123456789012345' AND zulu_timestamp < '2009-07-10 15:24:45+01'
>   ORDER BY zulu_timestamp DESC LIMIT 1
>
> it takes FOREVER in instances where there's only 1 row or 0 rows in the
> table [EDIT]for a date equal to the one being compared[/EDIT]
>
> The column id is also indexed. This is the query plan:
>
> Limit  (cost=0.00..83.88 rows=1 width=20) (actual time=810784.212..810784.212 rows=0 loops=1)
>   ->  Index Scan Backward using my_table_timestamp_idx on my_table  (cost=0.00..3706639.95 rows=44192 width=20) (actual time=810784.210..810784.210 rows=0 loops=1)
>         Index Cond: (zulu_timestamp < '2009-07-10 15:24:45+01'::timestamp with time zone)
>         Filter: (id = '123456789012345'::bpchar)
> Total runtime: 810808.298 ms

From this plan it appears the planner statistics aren't up to date, or the statistics size on the timestamp column is too small, as the expected number of rows (44192) doesn't match the actual number (0) at all. Some experimenting with ANALYSE and column statistics should tell whether this is indeed the problem. That said, statistics are most useful for common cases; they're usually not very accurate for exceptions, so playing around with those may not give the desired results.

What happens in the above query plan is that the planner scans a large part of the rows referred to by the timestamp index (namely all those before the specified timestamp) to find any rows matching the id. There are a few things you could do about that:

1) You could specify a lower boundary for the timestamps. The way you're going about it, the longer your application runs the more rows will match your zulu_timestamp < '2009-07-10 15:24:45+01' expression. It seems likely that you know that the timestamp is at least in 2009-10, for example, reducing the matching rows by a lot once your application has been running for several months.

2) You could define a multi-column index instead of two separate indexes. Which column should be first depends on which column you query on most frequently, but I expect it to be (id, zulu_timestamp). With such an index the matching rows are known to be in the index, and thus looking them up should be a lot faster.

Finally, sending every message as urgent is not going to help you. It's like this: http://en.wikipedia.org/wiki/The_boy_who_cried_wolf

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
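Alban's two suggestions, sketched against the table from the original post (the '2009-07-01' lower bound is an illustrative cutoff, not a recommendation):

```sql
-- (2) One multi-column index serves this query directly: the scan can
--     jump straight to the rows for this id in timestamp order.
CREATE INDEX my_table_id_timestamp_idx
    ON my_table (id, zulu_timestamp);

-- (1) A lower bound on the timestamp keeps the scanned range small even
--     as the table keeps growing:
SELECT speed, digital_input_1, digital_input_2, digital_input_3,
       digital_input_4, priority
FROM my_table
WHERE id = '123456789012345'
  AND zulu_timestamp <  '2009-07-10 15:24:45+01'
  AND zulu_timestamp >= '2009-07-01 00:00:00+01'
ORDER BY zulu_timestamp DESC
LIMIT 1;
```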
Re: [GENERAL] [EDIT] Timestamp indicies not being used!
Thank you Alban for your considerations.

The issue has already been marked as [SOLVED] following Sam's suggestion.

And FYI the issue *was* urgent and the wolf *was* biting my leg! :]

BR,
Pedro Doria Meunier
GSM: +351 96 17 20 188
Skype: pdoriam

Alban Hertroys wrote:
> [full quote of Alban's earlier message snipped]
Re: [GENERAL] Best practices for moving UTF8 databases
On Mon, Jul 20, 2009 at 10:32:15AM +0800, Phoenix Kiula wrote:
> Thanks Martin. I tried searching through the archives and could only
> come up with something like this:
> http://docs.moodle.org/en/UTF-8_PostgreSQL
> But this only has the usual iconv stuff suggested. Could you pls
> suggest some specific scripts or URLs? Or let me know the keywords I
> should look for?

I was thinking of this one:

http://www.mail-archive.com/pgsql-hack...@postgresql.org/msg68541.html

You can use it to find the bogus strings in your existing database and fix them. It's not really fast, but maybe it'll help.

Have a nice day,
--
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/

Please line up in a tree and maintain the heap invariant while boarding.
Thank you for flying nlogn airlines.
Re: [GENERAL] Working around spurious unique constraint errors due to SERIALIZABLE bug
* Albe Laurenz:

> The original question asked was: how can I tell an error that is
> caused by incomplete isolation from another error? If you have a code
> segment like
>
>     SELECT count(id) INTO i2 FROM a WHERE id = i;
>     IF i2 = 0 THEN
>         INSERT INTO a (id) VALUES (i);
>     END IF;
>
> then you can be certain that any unique_violation thrown here must be
> a serialization problem (if the only unique constraint is on id).

I want to put this into a library, so I'd like something foolproof. Right now, user code sets a flag which basically says that the wrapper should retry the transaction a few times if a unique_violation is detected, but I'd like to get rid of that because it's one thing less the programmer needs to worry about.

--
Florian Weimer <fwei...@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99
Re: [GENERAL] Full text search in PostgreSQL 8.4
Hello,

I recently upgraded to version 8.4 and now full text search with the russian configuration is not working:

template1=# create database test encoding='win1251';
test=# create table test (test varchar(255));
test=# insert into test values ('тест');
test=# select * from test where to_tsvector('russian', test) @@ to_tsquery('russian', 'тест');

In version 8.3 I get a result:

 test
------
 тест
(1 запись)

In 8.4 I get this notice and 0 rows, whatever the table values are:

NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored
 test
------
(0 rows)

Why might it not be working in 8.4?

Thanks

Hi Konstantin,

I ran your tests with 8.3 and 8.4 and get the expected result:

postgres=# \c test
psql (8.4.0)
You are now connected to database "test".
test=# create table test (test varchar(255));
CREATE TABLE
test=# insert into test values ('тест');
INSERT 0 1
test=# select * from test where to_tsvector('russian', test) @@ to_tsquery('russian', 'тест');
 test
------
 тест
(1 row)

I have a clean installation - meaning the dictionaries are not edited. After inserting тест into the russian.stop file, I can reproduce your case:

test=# select * from test where to_tsvector('russian', test) @@ to_tsquery('russian', 'тест');
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored
 test
------
(0 rows)

If you had given the column test the data type tsvector, probably no value would have been inserted. Just try it with another column and see if you can insert тест into that column. This is just an idea ...

Cheers
Andy
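A quick way to check Andy's stop-word theory on your own installation is to inspect how the text-search machinery classifies the word (the NOTICE in the original output is the giveaway):

```sql
-- If 'тест' is listed in russian.stop, to_tsvector drops it and the
-- result is an empty tsvector:
SELECT to_tsvector('russian', 'тест');

-- ts_debug shows which dictionary consumed the token and what lexemes
-- (if any) it produced:
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('russian', 'тест');
```

An empty `lexemes` array for the token confirms it is being treated as a stop word by the configured dictionary.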
[GENERAL] Re: PostgreSQL Databse Migration to the Latest Version and Help for Database Replication.
On Jul 20, 9:29 am, dmag...@gmail.com (Chris) wrote:
> I have tried RubyRep and Bucardo, but none of these have a stable rpm
> release for the CentOS version of Linux.

Just to mention this: rubyrep comes with a bundled package.

http://www.rubyrep.org/installation.html

Just unzip and it will work*. No need to hunt down libraries or anything like that.

*: rubyrep requires JDK >= 1.5. But that's it.

Best Regards,
Arndt
Re: [GENERAL] Understanding sequential versus index scans.
Yes, I had done UNION. UNION ALL achieves the expected plan and speed! Thank you!

BTW, this is interesting, because there are only about 5 or 6 rows max returned from both queries - but I guess the planner expects more, and hence changes the plan to remove duplicates.

On Sun, Jul 19, 2009 at 9:05 PM, Scott Marlowe scott.marl...@gmail.com wrote:
> On Sun, Jul 19, 2009 at 6:10 PM, Robert James srobertja...@gmail.com wrote:
>> UNION was better, but still 5 times as slow as either query done
>> individually. set enable_seqscan=off didn't help at all - it was
>> totally ignored. Is there anything else I can do?
>
> Did you try union, or union all?
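The difference, sketched: UNION implies a duplicate-elimination step (a sort or hash) over the combined result, while UNION ALL does not — which is why UNION ALL matched the speed of the individual queries here. The table and column names below are placeholders:

```sql
-- UNION adds a Unique/Sort (or HashAggregate) node over both inputs:
SELECT id FROM t WHERE a = 1
UNION
SELECT id FROM t WHERE b = 2;

-- UNION ALL simply concatenates the two result sets, duplicates retained:
SELECT id FROM t WHERE a = 1
UNION ALL
SELECT id FROM t WHERE b = 2;
```

Running EXPLAIN on each form shows the extra node in the UNION plan.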
[GENERAL] Documentation Improvement suggestions
Two small suggestions that might make it easier for newcomers to take advantage of the wonderful database:

1. Googling Postgres docs produces links for all different versions. This is because incoming links are to different versions. Besides being confusing, it pushes the pages lower in Google and makes it harder to find them. Could the website offer a link to the 'current' version, whichever it is? Eg instead of just:

http://www.postgresql.org/docs/8.1/static/creating-cluster.html

have:

http://www.postgresql.org/docs/current/static/creating-cluster.html

which would keep all incoming links pointed to the current page.

2. The 'SQL' in 'PostgreSQL' is hard to say and type. Everyone drops it (even this list!). Why not change the official name? Again, it would make googling and naming things easier.
Re: [GENERAL] timestamp with time zone tutorial
On Sunday 19 July 2009 10:59:24 pm Dennis Gearon wrote:
> Hey Tom, I was trying to use 'US/Pacific-New' as my long, unabbreviated
> timezone and it wasn't working. [...] I just have to read more on how to
> get it out relative to a different time zone than it went in. I'll find
> it. Dennis Gearon

For me it is easier to think of the time zones as a formatting option rather than a data storage attribute. The simple explanation is that it always goes in as UTC. The more complicated explanation follows.

The tz data types are stored as UTC. The time zone information is used on input to make the correct offset from the specified tz to UTC. On output the procedure is reversed. Since the data is stored as UTC, you get to choose whatever time zone you want it displayed in by using the appropriate setting or function.

The issue is whether it is important to know the tz that was used for the input offset. That would require a separate field. You can search the archives for the many discussions that have occurred on this point.

--
Adrian Klaver
akla...@comcast.net
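Adrian's "formatting option" view can be demonstrated in a few lines of psql; this is a sketch, and the literal values are arbitrary:

```sql
CREATE TEMP TABLE ts_demo (t timestamptz);

-- The input carries an explicit offset; storage is normalized to UTC.
INSERT INTO ts_demo VALUES ('2009-07-19 20:00:00-07');

-- The same stored instant, rendered under two different display zones:
SET timezone = 'UTC';
SELECT t FROM ts_demo;          -- 2009-07-20 03:00:00+00

SET timezone = 'US/Pacific';
SELECT t FROM ts_demo;          -- 2009-07-19 20:00:00-07
```

Nothing about the stored value changes between the two SELECTs; only the session's display zone does.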
[GENERAL] Understanding INNER JOIN versus IN subquery
I have two queries which should be equivalent. The planner plans them differently, although they both take about the same time. Can someone explain why?

select word from dict where word in
  (select substr('moon', 0, generate_series(3, length('moon'))));

select * from dict inner join
  (select substr('moon', 0, generate_series(3, length('moon')))) as m
  on dict.word = m.substr;

Is one preferred?
Re: [GENERAL] Documentation Improvement suggestions
On Mon, 20 Jul 2009 09:34:00 -0400 Robert James srobertja...@gmail.com wrote:

> Could the website offer a link to the 'current' version, whichever it
> is? Eg instead of just:
> http://www.postgresql.org/docs/8.1/static/creating-cluster.html
> have:
> http://www.postgresql.org/docs/current/static/creating-cluster.html
> which would keep all incoming links pointed to the current page.

Did you actually try the link you proposed? There is a page behind it... and it actually links to the 8.4 docs.

People still need the old docs, so you can't just make docs for previous versions unavailable... and once they are available, they will be indexed. Unfortunately, "current" doesn't mean "most used".

Actually, what I'd enjoy is a link to the interactive version, and maybe aggregating all the comments of previous versions into the new version (indicating which versions they come from).

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] Understanding sequential versus index scans.
On Mon, Jul 20, 2009 at 2:22 PM, Robert James <srobertja...@gmail.com> wrote:
> BTW, this is interesting, because there are only about 5 or 6 rows max
> returned from both queries - but I guess the planner expects more and
> hence changes the plan to remove duplicates.

If you sent the plans for the various attempts we might be able to explain what's going on.

--
greg
http://mit.edu/~gsstark/resume.pdf
Re: [GENERAL] Understanding INNER JOIN versus IN subquery
On Mon, Jul 20, 2009 at 2:37 PM, Robert James <srobertja...@gmail.com> wrote:
> I have two queries which should be equivalent. The Planner plans them
> differently, although they are both about the same time. Can someone
> explain why?

Uhm, please post the two plans and the server version. I know you've posted them before, but that was in another thread that I don't have handy.

The short answer is that the server doesn't know that the subquery is definitely going to produce distinct results, so it doesn't know the inner join won't produce duplicates. So it can't tell that these queries might be equivalent.

--
greg
http://mit.edu/~gsstark/resume.pdf
Re: [GENERAL] Understanding INNER JOIN versus IN subquery
On Mon, Jul 20, 2009 at 2:37 PM, Robert James <srobertja...@gmail.com> wrote:
> I have two queries which should be equivalent. The Planner plans them
> differently, although they are both about the same time. Can someone
> explain why?
>
> select word from dict where word in
>   (select substr('moon', 0, generate_series(3, length('moon'))));
>
> select * from dict inner join
>   (select substr('moon', 0, generate_series(3, length('moon')))) as m
>   on dict.word = m.substr;
>
> Is one preferred?

It is hard to say from your example. But in my general experience, I have noticed a similar thing many times. Needless to say, I personally prefer to choose JOINs instead of IN(), because those tend to be faster on PostgreSQL.
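One way to see Greg's point from earlier in the thread in action: if the subquery is forced to be provably duplicate-free, the join form becomes set-equivalent to the IN form. A sketch using the queries from this thread:

```sql
-- IN is a semi-join: each dict row appears at most once even if the
-- subquery yields duplicate prefixes.
SELECT word FROM dict
WHERE word IN (SELECT substr('moon', 0, generate_series(3, length('moon'))));

-- A plain join could multiply rows on duplicate prefixes; adding DISTINCT
-- removes that possibility.
SELECT dict.*
FROM dict
INNER JOIN (SELECT DISTINCT substr('moon', 0,
                                   generate_series(3, length('moon')))) AS m(substr)
        ON dict.word = m.substr;
```

Whether the planner then produces identical plans is version-dependent; comparing EXPLAIN output for both forms is the way to check.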
Re: [GENERAL] Documentation Improvement suggestions
Robert James escribió:
> Two small suggestions that might make it easier for newcomers to take
> advantage of the wonderful database:
>
> 1. Googling Postgres docs produces links for all different versions.
> [...] Could the website offer a link to the 'current' version,
> whichever it is? [...] which would keep all incoming links pointed to
> the current page.

Maybe we should offer a robots.txt file that told crawlers to only index the current version of the docs, not the version-specific ones.

> 2. The 'SQL' in 'PostgreSQL' is hard to say and type. Everyone drops
> it (even this list!). Why not change the official name? Again, it
> would make googling and naming things easier.

This is a taboo topic which has created the largest holy wars I've seen in this project. Please don't raise it. If you're interested, search the archives.

--
Alvaro Herrera                  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
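For what it's worth, Alvaro's robots.txt idea might look something like this — a sketch only, assuming versioned docs live under /docs/<version>/ (the real site's URL layout would need checking):

```
User-agent: *
Disallow: /docs/7.4/
Disallow: /docs/8.0/
Disallow: /docs/8.1/
Disallow: /docs/8.2/
Disallow: /docs/8.3/
# Leave /docs/current/ (and the newest release's pages) crawlable.
```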
Re: [GENERAL] Documentation Improvement suggestions
On Mon, Jul 20, 2009 at 5:14 PM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
> Maybe we should offer a robots.txt file that told crawlers to only
> index the current version of the docs, not the version-specific ones.

I don't see this as a good idea; if you want to google only a specific version, you can use the ''site'' expression to filter your results:

site:http://www.postgresql.org/docs/8.4/ foo
Re: [GENERAL] Documentation Improvement suggestions
On Mon, Jul 20, 2009 at 11:14:27AM -0400, Alvaro Herrera wrote:
> Robert James escribió:
>> Could the website offer a link to the 'current' version, whichever it
>> is? Eg instead of just:
>> http://www.postgresql.org/docs/8.1/static/creating-cluster.html
>> have:
>> http://www.postgresql.org/docs/current/static/creating-cluster.html
>> which would keep all incoming links pointed to the current page.
>
> Maybe we should offer a robots.txt file that told crawlers to only
> index the current version of the docs, not the version-specific ones.

I know it's not easy, but a nice option to me would be if the 8.1 docs page linked to the equivalent page in the other versions. That would avoid the need to manually edit the URL after a google search.

Oh, and +10 for the "Up" link at the top of the page also.

Have a nice day,
--
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/

Please line up in a tree and maintain the heap invariant while boarding.
Thank you for flying nlogn airlines.
Re: [GENERAL] Checkpoint Tuning Question
On Mon, Jul 13, 2009 at 3:53 PM, Dan Armbrust daniel.armbrust.l...@gmail.com wrote: So this thought leads to a couple of other things Dan could test. First, see if turning off full_page_writes makes the hiccup go away. If so, we know the problem is in this area (though still not exactly which reason); if not we need another idea. That's not a good permanent fix though, since it reduces crash safety. The other knobs to experiment with are synchronous_commit and wal_sync_method. If the stalls are due to commits waiting for additional xlog to get written, then async commit should stop them. I'm not sure if changing wal_sync_method can help, but it'd be worth experimenting with. regards, tom lane All of my testing to date has been done with synchronous_commit=off. I just tried setting full_page_writes=off - and like magic, the entire hiccup went away. Thanks, Dan I haven't done any other testing for this issue since discovering that setting full_page_writes=off makes my performance hiccup go away. I'm about to move on to some different tests and reset this setup that I was using to test out this issue - is there anything else that anyone would like to have gathered before I axe this system? I can redo it again later too, it will just take a fair bit longer for me to reconfigure the test environment. Thanks, Dan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
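For reference, the knobs discussed in this thread are ordinary postgresql.conf settings. A sketch of the diagnostic configuration Dan describes (not anything posted verbatim in the thread); note that turning off full_page_writes trades away crash safety, so per Tom this is for experimentation only, not a recommended permanent fix:

```
# postgresql.conf -- diagnostic settings from this thread
synchronous_commit = off      # Dan's tests already ran with this
full_page_writes = off        # made the hiccup go away; reduces crash safety
#wal_sync_method = fdatasync  # another knob Tom suggested experimenting with
```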
[GENERAL] xml to string, ascii x utf8 conversion
Hello, I'm having a problem with conversion while using the function query_to_xml. query_to_xml('select Nazev as TITLE, Datum as DATE, Autor_Akce as meta rel=''action_author'', gave me

<table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<TITLE>Test</TITLE>
<DATE>2009-07-20</DATE>
<meta_x0020_rel_x003D__x0027_action_author_x0027_>test</meta_x0020_rel_x003D__x0027_action_author_x0027_>

How can I transcode this well-generated XML (ASCII) to a string (UTF8) without the escape characters? I found nothing on Google... Regards, Radek Novotny -- www.publicstream.cz - we have built a product for live broadcasts (online streaming) and a video archive in FLASH format. Bc. Radek Novotný, managing director, Mediawork group s.r.o. tel.: +420 724 020 361 email: radek.novo...@mediawork.cz http://www.mediawork.cz
Re: [GENERAL] suggestion: log_statement = sample
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 In my experience, I've found that enabling full logging for a short time (perhaps a few hours) gathers enough data to run through tools like pgFouine and find problem areas. It is not possible for us. Logging millions of statements take too much time. Too much time to do what? Where is the bottleneck? But you dont run the real use cases with automated tests. There so many factors involved in real time: caching, concurrency, data, peaktime, deadlocks, doubleclicks, robots etc. that you just can't reproduce it on a development system without lots of effort. Agreed. Additionally, we make sure our production systems have enough hardware behind them that we can add additional tasks without it affecting production use. that's nice, but not everybody can afford it. Of course i would love to log every statement. But do you really log every statement in production? I guess not. We have clients that do, and we always recommend it for all of our clients. It can get very expensive (hint: ship them off with syslog over an internal network to a dedicated box), but being able to see exactly what happened in your database when something goes wrong after the fact is invaluable. Plus we can then run tools like pgsi (http://bucardo.org/pgsi/) to get nice statistics and find weak spots in the application code. Again: for my use case it makes sense to have a log_sample feature. Perhaps, but I don't think you've quite overcome the 'log everything' counter-argument. I don't think log_sample is necessarily a bad idea, but I doubt it is going to happen soon, and certainly not in time to put on your production system in the next year or so. Perhaps you can attack this from another angle, such as using some sort of filter or script to grab every X lines from the logs and discard the rest while in full logging mode. That would have the advantage of being fully backwards-compatible and usable today. 
- -- Greg Sabino Mullane g...@turnstep.com End Point Corporation PGP Key: 0x14964AC8 200907201256 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
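Greg's closing suggestion (grab every X lines from the logs while in full logging mode and discard the rest) can be sketched as a one-line post-processing filter. The sampling interval and the file names below are placeholders, not anything from the thread:

```shell
# Keep every 100th line of a full-statement log, discard the rest.
# N=100 and the file names are hypothetical placeholders.
awk 'NR % 100 == 1' postgresql-full.log > postgresql-sampled.log
```

This keeps full logging's backwards compatibility: the server still logs everything, and the sampling happens offline, so nothing needs to change in PostgreSQL itself.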
Re: [GENERAL] suggestion: log_statement = sample
In response to Greg Sabino Mullane g...@turnstep.com: -BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 In my experience, I've found that enabling full logging for a short time (perhaps a few hours) gathers enough data to run through tools like pgFouine and find problem areas. It is not possible for us. Logging millions of statements take too much time. This is a ridiculous statement. In actual practice, full query logging is 1/50 the amount of disk I/O as the actual database activity. If your systems are so stressed that they can't handle another 2% increase, then you've got bigger problems lurking. Have you benchmarked the load it creates under your workload? But you dont run the real use cases with automated tests. There so many factors involved in real time: caching, concurrency, data, peaktime, deadlocks, doubleclicks, robots etc. that you just can't reproduce it on a development system without lots of effort. Logging a sample query every arbitrary number of queries isn't a real use case either, thus your counter argument makes no sense. I suggested the test system because it's a good compromise. Additionally, we make sure our production systems have enough hardware behind them that we can add additional tasks without it affecting production use. that's nice, but not everybody can afford it. Of course i would love to log every statement. But do you really log every statement in production? I guess not. Try reading my responses instead of guessing. We run full query logging on occasion to get a sample of real usage on our production systems. As a result, we know that our servers can handle the load. We're working on the infrastructure to manage the amount of data so we can do it all the time (we don't currently have enough disk space). Overall, it seems like you've decided that you want this feature and nothing else will do. If that's the case, then just go ahead and write it. 
-- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] suggestion: log_statement = sample
On Mon, 2009-07-20 at 13:24 -0400, Bill Moran wrote: In response to Greg Sabino Mullane g...@turnstep.com: -BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 In my experience, I've found that enabling full logging for a short time (perhaps a few hours) gathers enough data to run through tools like pgFouine and find problem areas. It is not possible for us. Logging millions of statements take too much time. This is a ridiculous statement. No it isn't. In actual practice, full query logging is 1/50 the amount of disk I/O as the actual database activity. If your systems are so stressed that they can't handle another 2% increase, then you've got bigger problems lurking. It depends on the system. I have seen even big systems take a huge hit by full logging due to transactional velocity. Joshua D. Drake -- PostgreSQL - XMPP: jdr...@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] suggestion: log_statement = sample
On Jul 20, 2009, at 10:24 AM, Bill Moran wrote: In response to Greg Sabino Mullane g...@turnstep.com: -BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 In my experience, I've found that enabling full logging for a short time (perhaps a few hours) gathers enough data to run through tools like pgFouine and find problem areas. It is not possible for us. Logging millions of statements take too much time. This is a ridiculous statement. No, it isn't. In actual practice, full query logging is 1/50 the amount of disk I/O as the actual database activity. If your systems are so stressed that they can't handle another 2% increase, then you've got bigger problems lurking. That's really not true. Ever, probably, but certainly in my situation. A lot of my inserts are large text fields (10k - 1M). Many of those are never read again. If I want to log all statements or slow statements then a lot of what's logged is those large inserts. There's currently no way to log the statement without logging all the data inserted, so the log traffic produced by each insert is huge. Logging via syslog on Solaris I've had reports of that slowing the machine down to unusability. (I'm fairly sure I know why, and I suspect you can guess too, but this is on customer machines). The hit due to logging can be huge, even on fairly overpowered systems. There are a lot of limitations in the current logging system when it comes to capturing data for performance analysis, and that's one of them. There's certainly significant room for improvement in the logging system - some of that can be added externally, but some of it really needs to be done within the core code. Cheers, Steve -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] suggestion: log_statement = sample
In response to Joshua D. Drake j...@commandprompt.com: On Mon, 2009-07-20 at 13:24 -0400, Bill Moran wrote: In response to Greg Sabino Mullane g...@turnstep.com: -BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 In my experience, I've found that enabling full logging for a short time (perhaps a few hours) gathers enough data to run through tools like pgFouine and find problem areas. It is not possible for us. Logging millions of statements take too much time. This is a ridiculous statement. No it isn't. In actual practice, full query logging is 1/50 the amount of disk I/O as the actual database activity. If your systems are so stressed that they can't handle another 2% increase, then you've got bigger problems lurking. It depends on the system. I have seen even big systems take a huge hit by full logging due to transactional velocity. Perhaps I'm just in a foul mood today, but I feel like people are picking my emails apart to make me the bad guy. Note that you trimmed out a key part of what I said here: Have you benchmarked the load it creates under your workload? Perhaps I should have explicitly said, There are likely some cases where this statement isn't true, so you should benchmark your specific load case, but you don't mention that you've done that. Or, perhaps I should just abstain from posting to mailing lists when I'm in a foul mood ... -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] suggestion: log_statement = sample
On Mon, 2009-07-20 at 13:46 -0400, Bill Moran wrote: In response to Joshua D. Drake j...@commandprompt.com: It depends on the system. I have seen even big systems take a huge hit by full logging due to transactional velocity. Perhaps I'm just in a foul mood today, but I feel like people are picking Possibly. :) my emails apart to make me the bad guy. Note that you trimmed out a key part of what I said here: Right. I didn't have issue with the rest of your email, just with the idea that what the person said was ridiculous. Have you benchmarked the load it creates under your workload? Which is certainly a valid question. Perhaps I should have explicitly said, There are likely some cases where this statement isn't true, so you should benchmark your specific load case, but you don't mention that you've done that. Or, perhaps I should just abstain from posting to mailing lists when I'm in a foul mood ... It has certainly gotten me in trouble more than once. There is a reason they say, Don't go JD. Joshua D. Drake -- PostgreSQL - XMPP: jdr...@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Log timings on Windows 64
Hey all, I'm looking at Windows, and a couple of quick Googles didn't give me anything. If I set my log_min_duration_statement below 16 ms, I get one of 2 values for my pgbench runs (pretty quick statements): 0, or 16 ms (sometimes 15.999). If I benchmark some other way (via my app) things look the same. If I set my log_line_prefix to %m vs. %t, then the time increments in 16 ms chunks. I'm guessing this is an OS timer-granularity limitation on Windows. Does anybody else have experience here (or know of a previous bug)? Thanks --Scott
Re: [GENERAL] best practice transitioning from one datatype to another
I'm stumped -- at least for an easy way to do this. When referencing the uniqueidentifier data type in PostgreSQL 8.1 (now uuid, because of an UPDATE to the pg_ tables) in function definitions in schemas not in the search path, one must reference the data type as public.uniqueidentifier (ahem, public.uuid). This was done because the query planner couldn't with 100% certainty determine type equality (i.e. for foreign key constraints, comparisons in WHERE clauses...), so it decided to use sequential scans where index scans would have been most appropriate. When I reload to PostgreSQL 8.4, it doesn't understand public.uuid. So, the only way I can think of to get those functions back into the database is to use a restore list, and keep them from being inserted in the first place. Then, one-at-a-time, re-create them manually. There must be a better way, though! I'm sure this is only the second of several more hurdles to overcome before I'm finished with the transition. Your wisdom will be appreciated! CG From: Tom Lane t...@sss.pgh.pa.us To: CG cgg...@yahoo.com Cc: pgsql-general@postgresql.org Sent: Wednesday, July 15, 2009 7:05:47 PM Subject: Re: [GENERAL] best practice transitioning from one datatype to another CG cgg...@yahoo.com writes: While transitioning from 8.1 to 8.4, I need to transition to the internal UUID type in place of the contrib/uniqueidentifier module. I've built the database around uniqueidentifier, so nearly every table has one column of that data type. It's going to be tedious to ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE uuid; ...repeat 600 times... I'll also have to drop and reload the views and the rules on tables. It'll be tedious even if the tables have no data in them. Can anyone recommend a better/faster way to make the transition? Couldn't you rename the type to uuid in the 8.1 database before you dump? regards, tom lane
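Tom's rename-before-dump suggestion would look roughly like the sketch below. This is an assumption, not tested against 8.1: it presumes ALTER TYPE ... RENAME TO is accepted on the 8.1 side, and "olddb" is a placeholder database name:

```shell
# Hypothetical sketch: rename the contrib type in the 8.1 database first,
# so the dump references "uuid" and resolves to 8.4's built-in type on restore.
psql olddb -c "ALTER TYPE public.uniqueidentifier RENAME TO uuid;"
pg_dump olddb > olddb.sql   # dump now mentions uuid instead of uniqueidentifier
```

The remaining wrinkle the poster describes (functions that say public.uuid explicitly) would still need the schema qualification removed, since 8.4's uuid lives in pg_catalog, not public.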
[GENERAL] Server Backup: pg_dump vs pg_dumpall
Hey, I'm writing a backup script. Right now, I only have one database on my postgresql server. I'm deciding if I should use pg_dump or pg_dumpall when backing up the server. As far as I can tell, pg_dumpall cannot compress the dumps automatically and it only dumps data in the standard SQL text file format. This means that I would not be able to use pg_restore to selectively restore the database, correct? What is *NOT* included in a pg_dump that IS in pg_dumpall (Other than all the databases)? Things like user-defined functions and datatypes? Roles? Views? I was leaning towards pg_dumpall, but then I realized that it only dumps in the standard SQL text file format, and it cannot be compressed automatically. Thanks for any advice.
Help needed for reading postgres log : RE: [GENERAL] Concurrency issue under very heavy loads
I would like some help in reading the postgres logs. Here is a snippet of the log. Auto commit seems to be set to false. But still the logs shows CommitTransactionCommand in debug mode. The same order number is given for multiple clients. Please see CommitTransactionCommand below for both select ...for update and update... SQLs and let me know if I am reading correctly that auto commit is actually effective. Thanks Raji - 2009-07-17 14:10:59 4736 970134 DEBUG: parse unnamed: SELECT nextEntityID FROM tableEntityID WHERE entityType = $1 FOR UPDATE 2009-07-17 14:10:59 4736 970134 STATEMENT: SELECT nextEntityID FROM tableEntityID WHERE entityType = $1 FOR UPDATE 2009-07-17 14:10:59 4736 970134 DEBUG: StartTransactionCommand 2009-07-17 14:10:59 4736 970134 STATEMENT: SELECT nextEntityID FROM tableEntityID WHERE entityType = $1 FOR UPDATE 2009-07-17 14:10:59 4736 970134 DEBUG: parse tree: 2009-07-17 14:10:59 4736 970134 DETAIL: {QUERY :commandType 1 :querySource 0 :canSetTag true :utilityStmt :resultRelation 0 :into :intoOptions :intoOnCommit 0 :intoTableSpaceName :hasAggs false :hasSubLinks false :rtable ( {RTE :alias :eref {ALIAS :aliasname tableentityid :colnames (entitytype nextentityid) } :rtekind 0 :relid 16420 :inh true :inFromCl true :requiredPerms 6 :checkAsUser 0 } ) :jointree {FROMEXPR :fromlist ( {RANGETBLREF :rtindex 1 } ) :quals {OPEXPR :opno 98 :opfuncid 0 :opresulttype 16 :opretset false :args ( {RELABELTYPE :arg {VAR :varno 1 :varattno 1 :vartype 1043 :vartypmod 68 :varlevelsup 0 :varnoold 1 :varoattno 1 } :resulttype 25 :resulttypmod -1 :relabelformat 2 } {RELABELTYPE :arg {PARAM :paramkind 0 :paramid 1 :paramtype 1043 } :resulttype 25 :resulttypmod -1 :relabelformat 2 } ) } } :targetList ( {TARGETENTRY :expr {VAR :varno 1 :varattno 2 :vartype 1043 :vartypmod 132 :varlevelsup 0 :varnoold 1 :varoattno 2 } :resno 1 :resname nextentityid :ressortgroupref 0 :resorigtbl 16420 :resorigcol 2 :resjunk false } ) :returningList :groupClause :havingQual 
:distinctClause :sortClause :limitOffset :limitCount :rowMarks ( {ROWMARKCLAUSE :rti 1 :forUpdate true :noWait false } ) :setOperations :resultRelations :returningLists } 2009-07-17 14:10:59 4736 970134 STATEMENT: SELECT nextEntityID FROM tableEntityID WHERE entityType = $1 FOR UPDATE 2009-07-17 14:10:59 4736 970134 DEBUG: rewritten parse tree: 2009-07-17 14:10:59 4736 970134 DETAIL: ( {QUERY :commandType 1 :querySource 0 :canSetTag true :utilityStmt :resultRelation 0 :into :intoOptions :intoOnCommit 0 :intoTableSpaceName :hasAggs false :hasSubLinks false :rtable ( {RTE :alias :eref {ALIAS :aliasname tableentityid :colnames (entitytype nextentityid) } :rtekind 0 :relid 16420 :inh true :inFromCl true :requiredPerms 6 :checkAsUser 0 } ) :jointree {FROMEXPR :fromlist ( {RANGETBLREF :rtindex 1 } ) :quals {OPEXPR :opno 98 :opfuncid 0 :opresulttype 16 :opretset false :args ( {RELABELTYPE :arg {VAR :varno 1 :varattno 1 :vartype 1043 :vartypmod 68 :varlevelsup 0 :varnoold 1 :varoattno 1 } :resulttype 25 :resulttypmod -1 :relabelformat 2 } {RELABELTYPE :arg {PARAM :paramkind 0 :paramid 1 :paramtype 1043 } :resulttype 25 :resulttypmod -1 :relabelformat 2 } ) } } :targetList ( {TARGETENTRY
Re: [GENERAL] Server Backup: pg_dump vs pg_dumpall
Hi -- I'm writing a backup script. Right now, I only have one database on my postgresql server. I'm deciding if I should use pg_dump or pg_dumpall when backing up the server. As far as I can tell, pg_dumpall cannot compress the dumps automatically and it only dumps data in the standard SQL text file format. This means that I would not be able to use pg_restore to selectively restore the database, correct? What is *NOT* included in a pg_dump that IS in pg_dumpall (Other than all the databases)? Things like user-defined functions and datatypes? Roles? Views? The pg_dumpall gets users and groups that are not dumped in the single database versions; I used to use pg_dump on each of several databases on a given server and then pg_dumpall with suitable parameters to get only those bits and nothing else. User defined stuff, views etc. are all in a given database so they will be in the pg_dump. HTH, Greg Williamson -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] killing processes
What's the generally accepted method for killing processes that went 'all wacky' in postgres? I think I've seen in this group that kill -INT would be the way to go. I'm playing around with different options for a median function. This one got out of hand and was taking too long, so I wanted to kill it: test=# select array_median(array(select t1 from test2 order by 1)); ^CCancel request sent It just sits there; it's been trying to die for half an hour. At the OS level it's taking up 100% of the CPU. I tried kill -INT pid but that didn't help. It's not updating anything, and I'm the only one in the database. Fortunately it's not production, so I don't really care. But if it was production, what would be the method to kill it? (I know about kill -9; I'm assuming that == bad.) If this were production, I'd need to end the process, force a rollback (if necessary) and get my CPU back, so just waiting for it to die really isn't an option... (PostgreSQL 8.3.5, linux/SLES11) Thanks Dave -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Server Backup: pg_dump vs pg_dumpall
On Mon, Jul 20, 2009 at 4:23 PM, APseudoUtopia apseudouto...@gmail.com wrote: Hey, I'm writing a backup script. Right now, I only have one database on my postgresql server. I'm deciding if I should use pg_dump or pg_dumpall when backing up the server. As far as I can tell, pg_dumpall cannot compress the dumps automatically and it only dumps data in the standard SQL text file format. This means that I would not be able to use pg_restore to selectively restore the database, correct? What is *NOT* included in a pg_dump that IS in pg_dumpall (Other than all the databases)? Things like user-defined functions and datatypes? Roles? Views? I was leaning towards pg_dumpall, but then I realized that it only dumps in the standard SQL text file format, and it cannot be compressed automatically. I use pg_dump on databases and pg_dumpall --globals-only to get the user accounts and such. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Concurrency issue under very heavy loads
Here is a snippet of the postgres log. Auto commit seems to be set to false. But still the logs shows CommitTransactionCommand in debug mode. The same order number is given for multiple clients. Please see CommitTransactionCommand below for both select ...for update and update... SQLs and let me know if I am reading correctly that auto commit is actually effective. Thanks Raji - 2009-07-17 14:10:59 4736 970134 DEBUG: parse unnamed: SELECT nextEntityID FROM tableEntityID WHERE entityType = $1 FOR UPDATE 2009-07-17 14:10:59 4736 970134 STATEMENT: SELECT nextEntityID FROM tableEntityID WHERE entityType = $1 FOR UPDATE 2009-07-17 14:10:59 4736 970134 DEBUG: StartTransactionCommand 2009-07-17 14:10:59 4736 970134 STATEMENT: SELECT nextEntityID FROM tableEntityID WHERE entityType = $1 FOR UPDATE 2009-07-17 14:10:59 4736 970134 DEBUG: parse tree: 2009-07-17 14:10:59 4736 970134 DETAIL: {QUERY :commandType 1 :querySource 0 :canSetTag true :utilityStmt :resultRelation 0 :into :intoOptions :intoOnCommit 0 :intoTableSpaceName :hasAggs false :hasSubLinks false :rtable ( {RTE :alias :eref {ALIAS :aliasname tableentityid :colnames (entitytype nextentityid) } :rtekind 0 :relid 16420 :inh true :inFromCl true :requiredPerms 6 :checkAsUser 0 } ) :jointree {FROMEXPR :fromlist ( {RANGETBLREF :rtindex 1 } ) :quals {OPEXPR :opno 98 :opfuncid 0 :opresulttype 16 :opretset false :args ( {RELABELTYPE :arg {VAR :varno 1 :varattno 1 :vartype 1043 :vartypmod 68 :varlevelsup 0 :varnoold 1 :varoattno 1 } :resulttype 25 :resulttypmod -1 :relabelformat 2 } {RELABELTYPE :arg {PARAM :paramkind 0 :paramid 1 :paramtype 1043 } :resulttype 25 :resulttypmod -1 :relabelformat 2 } ) } } :targetList ( {TARGETENTRY :expr {VAR :varno 1 :varattno 2 :vartype 1043 :vartypmod 132 :varlevelsup 0 :varnoold 1 :varoattno 2 } :resno 1 :resname nextentityid :ressortgroupref 0 :resorigtbl 16420 :resorigcol 2 :resjunk false } ) :returningList :groupClause :havingQual :distinctClause :sortClause :limitOffset :limitCount 
:rowMarks ( {ROWMARKCLAUSE :rti 1 :forUpdate true :noWait false } ) :setOperations :resultRelations :returningLists } 2009-07-17 14:10:59 4736 970134 STATEMENT: SELECT nextEntityID FROM tableEntityID WHERE entityType = $1 FOR UPDATE 2009-07-17 14:10:59 4736 970134 DEBUG: rewritten parse tree: 2009-07-17 14:10:59 4736 970134 DETAIL: ( {QUERY :commandType 1 :querySource 0 :canSetTag true :utilityStmt :resultRelation 0 :into :intoOptions :intoOnCommit 0 :intoTableSpaceName :hasAggs false :hasSubLinks false :rtable ( {RTE :alias :eref {ALIAS :aliasname tableentityid :colnames (entitytype nextentityid) } :rtekind 0 :relid 16420 :inh true :inFromCl true :requiredPerms 6 :checkAsUser 0 } ) :jointree {FROMEXPR :fromlist ( {RANGETBLREF :rtindex 1 } ) :quals {OPEXPR :opno 98 :opfuncid 0 :opresulttype 16 :opretset false :args ( {RELABELTYPE :arg {VAR :varno 1 :varattno 1 :vartype 1043 :vartypmod 68 :varlevelsup 0 :varnoold 1 :varoattno 1 } :resulttype 25 :resulttypmod -1 :relabelformat 2 } {RELABELTYPE :arg {PARAM :paramkind 0 :paramid 1 :paramtype 1043 } :resulttype 25 :resulttypmod -1 :relabelformat 2 } ) } } :targetList ( {TARGETENTRY :expr {VAR :varno
Re: [GENERAL] Server Backup: pg_dump vs pg_dumpall
APseudoUtopia wrote: I'm writing a backup script. Right now, I only have one database on my postgresql server. I'm deciding if I should use pg_dump or pg_dumpall when backing up the server. As far as I can tell, pg_dumpall cannot compress the dumps automatically and it only dumps data in the standard SQL text file format. This means that I would not be able to use pg_restore to selectively restore the database, correct? What is *NOT* included in a pg_dump that IS in pg_dumpall (Other than all the databases)? Things like user-defined functions and datatypes? Roles? Views? I was leaning towards pg_dumpall, but then I realized that it only dumps in the standard SQL text file format, and it cannot be compressed automatically. Things that span all databases in your cluster (like your roles) do not get saved with pg_dump. So yes, if you only have one database, pg_dumpall is probably what you should be using. You can always pipe the output of pg_dumpall to your compression program of choice. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] killing processes
On Mon, Jul 20, 2009 at 4:44 PM, David Kerr d...@mr-paradox.net wrote: What's the generally accepted method for killing processes that went 'all wacky' in postgres? I think I've seen in this group that kill -INT would be the way to go. I'm playing around with different options for a median function. This one got out of hand and was taking too long, so I wanted to kill it: test=# select array_median(array(select t1 from test2 order by 1)); ^CCancel request sent It just sits there; it's been trying to die for half an hour. At the OS level it's taking up 100% of the CPU. I tried kill -INT pid but that didn't help. It's not updating anything, and I'm the only one in the database. Fortunately it's not production, so I don't really care. But if it was production, what would be the method to kill it? (I know about kill -9; I'm assuming that == bad.) If this were production, I'd need to end the process, force a rollback (if necessary) and get my CPU back, so just waiting for it to die really isn't an option... (PostgreSQL 8.3.5, linux/SLES11) What's most likely happening is that it's stuck in a tight loop that doesn't check for interrupts, so it just keeps on going. You can kill -9 a process. It'll cause the postmaster to kill all backends and flush the buffers if I remember correctly. Yeah, not the ideal solution in production but on a non-prod machine it's an ok way to get out of these issues. And even in production, it's often much faster to kill -9 a single process than to wait for it to finish. I think there might be choices other than -9 here, but I can't recall them off the top of my head. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Server Backup: pg_dump vs pg_dumpall
APseudoUtopia wrote: Hey, I'm writing a backup script. Right now, I only have one database on my postgresql server. I'm deciding if I should use pg_dump or pg_dumpall when backing up the server. As far as I can tell, pg_dumpall cannot compress the dumps automatically and it only dumps data in the standard SQL text file format. This means that I would not be able to use pg_restore to selectively restore the database, correct? What is *NOT* included in a pg_dump that IS in pg_dumpall (Other than all the databases)? Things like user-defined functions and datatypes? Roles? Views? Views are part of a database, so they are included, and so are user-defined functions. What's not included is, primarily, roles/users. I was leaning towards pg_dumpall, but then I realized that it only dumps in the standard SQL text file format, and it cannot be compressed automatically. pg_dumpall | gzip > dumpfile.sql.gz -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
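The division of labor described in this thread can be sketched as a tiny backup script. The database name and output paths are placeholders; this needs a running server, so it is an illustration rather than something runnable here:

```shell
# Per-database dump in custom format: compressed automatically, and usable
# with pg_restore for selective restores (which plain SQL dumps are not).
pg_dump -Fc mydb -f mydb.dump

# Cluster-wide objects (roles/users) that pg_dump skips; pipe through gzip
# since pg_dumpall only emits plain SQL text.
pg_dumpall --globals-only | gzip > globals.sql.gz
```

Restoring is then pg_restore for the database and psql for the uncompressed globals script.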
Re: [GENERAL] killing processes
On Mon, Jul 20, 2009 at 6:48 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Jul 20, 2009 at 4:44 PM, David Kerr d...@mr-paradox.net wrote: What's the generally accepted method for killing processes that went 'all wacky' in postgres? I think I've seen in this group that kill -INT would be the way to go. I'm playing around with different options for a median function. This one got out of hand and was taking too long, so I wanted to kill it: test=# select array_median(array(select t1 from test2 order by 1)); ^CCancel request sent It just sits there; it's been trying to die for half an hour. At the OS level it's taking up 100% of the CPU. I tried kill -INT pid but that didn't help. It's not updating anything, and I'm the only one in the database. Fortunately it's not production, so I don't really care. But if it was production, what would be the method to kill it? (I know about kill -9; I'm assuming that == bad.) If this were production, I'd need to end the process, force a rollback (if necessary) and get my CPU back, so just waiting for it to die really isn't an option... (PostgreSQL 8.3.5, linux/SLES11) What's most likely happening is that it's stuck in a tight loop that doesn't check for interrupts, so it just keeps on going. You can kill -9 a process. It'll cause the postmaster to kill all backends and flush the buffers if I remember correctly. That's right -- and if there is any case where built-in functions, SQL, or plpgsql functions get stuck in such a way that a kill -9 is required, it should be reported so that it can be fixed. Trust me, you should avoid kill -9 if at all possible.
Backend C functions you write should check for interrupts at appropriate places, via CHECK_FOR_INTERRUPTS(), so you can respond to cancel requests. For a good example of how this is done, take a look at ITAGAKI's proposed enhancement of dblink and the attached patch here: http://www.nabble.com/query-cancel-issues-in-contrib-dblink-td24214133.html merlin
Re: [GENERAL] killing processes
Scott Marlowe scott.marl...@gmail.com writes: On Mon, Jul 20, 2009 at 4:44 PM, David Kerrd...@mr-paradox.net wrote: I'm playing around with different options for a median function. this one got out of hand and was taking too long, so i wanted to kill it: test=# select array_median(array(select t1 from test2 order by 1)); ^CCancel request sent It just sits there, it's been trying to die for 1/2 an hour. What's most likely happening is that it's stuck in a tight loop that doesn't check for interrupts, so it just keeps on going. Yeah, that was my first thought. What is array_median()? If it's custom C code, you probably forgot to put CHECK_FOR_INTERRUPTS in suitable place(s). You can kill -9 a process. It'll cause the postmaster to kill all backends and flush the buffers if I remember correctly. Yeah, not the ideal solution in production but on a non-prod machine it's an ok way to get out of these issues. And even in production, it's often much faster to kill -9 a single process than to wait for it to finish. I think there might be choices other than -9 here, but I can't recall them off the top of my head. Kill -9 on the postmaster is disrecommended for a number of reasons, but kill -9 on an individual backend is quite predictable. What it will do is cause *all* of your open sessions to get aborted. If you can tolerate that then do it. If you can't tolerate that, you can try a SIGTERM, but I suspect that if the thing is too hosed to notice a cancel (a/k/a SIGINT) then it won't notice SIGTERM either. SIGTERM on a single backend isn't as safe as the other options anyhow. There used to be bugs in that code path (still ARE known bugs, if you're not running a pretty recent PG release) and I wouldn't trust it totally. One thing you should definitely not try is SIGQUIT. That will kill the process all right (unless it's wedged a lot worse than I think) but it will not do anything about cleaning up its shared memory state, and that almost certainly will cause problems later. 
Better to find a time when it's okay to use SIGKILL. regards, tom lane
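[Editor's note: the signals discussed above can also be sent from SQL, without shell access to the server, which maps onto the same advice. A sketch using the standard admin functions (pg_cancel_backend sends SIGINT; pg_terminate_backend, added in 8.4, sends SIGTERM); the PID 12345 is a placeholder, and on 8.3 the activity view columns are procpid/current_query:

-- Find the stuck backend's PID
SELECT procpid, current_query FROM pg_stat_activity;

-- Equivalent of kill -INT <pid>: ask the backend to cancel its current query
SELECT pg_cancel_backend(12345);

-- Equivalent of kill -TERM <pid> (8.4 and later), subject to the same
-- caveats about the SIGTERM code path discussed above
SELECT pg_terminate_backend(12345);

Neither of these will help a backend stuck in a loop that never calls CHECK_FOR_INTERRUPTS(), which is exactly the failure mode in this thread.]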
Re: [GENERAL] killing processes
On Mon, Jul 20, 2009 at 07:18:07PM -0400, Merlin Moncure wrote: - On Mon, Jul 20, 2009 at 6:48 PM, Scott Marlowe scott.marl...@gmail.com wrote: - On Mon, Jul 20, 2009 at 4:44 PM, David Kerr d...@mr-paradox.net wrote: - What's most likely happening is that it's stuck in a tight loop that - doesn't check for interrupts, so it just keeps on going. - - You can kill -9 a process. It'll cause the postmaster to kill all - backends and flush the buffers if I remember correctly. - - That's right -- and if there is any case where built-in functions, - sql, or plpgsql functions get stuck in such a way that a kill -9 is - required, it should be reported so that it can be fixed. Trust me, - you should avoid kill -9 if at all possible. Backend C functions you - write should check interrupts at appropriate places so you can respond - to cancels appropriately with CHECK_FOR_INTERRUPTS(); - - For a good example of how this is done, take a look at ITAGAKI's proposed - enhancement of dblink and the attached patch here: - - http://www.nabble.com/query-cancel-issues-in-contrib-dblink-td24214133.html Thanks guys. I wonder if this is a case for a bug; of course, it's probably not reproducible. But I don't see any coded loop or way for me to insert a signal check.
(I'm not much of a programmer) the function was just:

CREATE OR REPLACE FUNCTION array_median(anyarray)
  RETURNS anyelement AS
$$
  SELECT CASE
           WHEN array_upper($1,1) = 0 THEN null
           WHEN mod(array_upper($1,1),2) = 1 THEN
             asorted[ceiling(array_upper(asorted,1)/2.0)]
           ELSE
             ((asorted[ceiling(array_upper(asorted,1)/2.0)] +
               asorted[ceiling(array_upper(asorted,1)/2.0)+1])/2.0)
         END
  FROM (SELECT ARRAY(SELECT ($1)[n]
                     FROM generate_series(1, array_upper($1, 1)) AS n
                     WHERE ($1)[n] IS NOT NULL
                     ORDER BY ($1)[n]) AS asorted) AS foo;
$$ LANGUAGE 'sql' IMMUTABLE;

(got it from here - thanks to the author =) ) http://www.postgresonline.com/journal/index.php?/archives/67-Build-Median-Aggregate-Function-in-SQL.html Thanks Dave
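[Editor's note: one way to narrow down where the runtime goes is to time the function on synthetic arrays of increasing size, rather than on the real table. A sketch using psql's \timing and generate_series, assuming only the array_median function above:

\timing on
-- Time the median on synthetic arrays of growing size. If runtime grows
-- much faster than linearly with array length, the per-element work inside
-- the function (repeated subscripting and sorting) is the likely culprit.
SELECT array_median(ARRAY(SELECT generate_series(1, 1000)));
SELECT array_median(ARRAY(SELECT generate_series(1, 10000)));
SELECT array_median(ARRAY(SELECT generate_series(1, 100000)));

This also gives a reproducible test case to attach to a bug report if a size is found where cancel requests stop being honored.]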
Re: [GENERAL] Server Backup: pg_dump vs pg_dumpall
John R Pierce wrote: APseudoUtopia wrote: ... I was leaning towards pg_dumpall, but then I realized that it only dumps in the standard SQL text file format, and it cannot be compressed automatically. pg_dumpall | gzip > dumpfile.sql.gz That deals with compression. But if you want to use the new parallel-restore feature in 8.4's pg_restore, which can be *way* faster if you have multiple cores available, then you will need to back up using pg_dump with the custom format (-Fc). Cheers, Steve
Re: Help needed for reading postgres log : RE: [GENERAL] Concurrency issue under very heavy loads
Raji Sridar (raji) wrote: I would like some help in reading the postgres logs. Here is a snippet of the log. Auto commit seems to be set to false, but the logs still show CommitTransactionCommand in debug mode. The same order number is given to multiple clients. Please see CommitTransactionCommand below for both the "select ... for update" and "update ..." SQLs and let me know if I am reading correctly that auto commit is actually in effect. CommitTransactionCommand is an internal function that has nothing to do with a SQL-level COMMIT. If there were a true transaction commit you'd see a debug entry saying CommitTransaction. You seem to be barking up the wrong tree here. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
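[Editor's note: on the underlying symptom (the same order number handed to multiple clients), the usual fix is to make the read-increment-write sequence an explicit transaction, so the row lock taken by FOR UPDATE is held until COMMIT. A minimal sketch with a hypothetical counter table -- the table and column names here are invented for illustration:

-- Hypothetical schema: one row per named counter
-- CREATE TABLE order_counter (name text PRIMARY KEY, next_value integer NOT NULL);

BEGIN;
-- The row lock is held until COMMIT, so concurrent clients serialize here
SELECT next_value FROM order_counter WHERE name = 'order' FOR UPDATE;
UPDATE order_counter SET next_value = next_value + 1 WHERE name = 'order';
COMMIT;

If the client driver is actually autocommitting each statement, the FOR UPDATE lock is released as soon as the SELECT's implicit transaction ends, and two clients can read the same value -- which would reproduce the duplicate-number behavior described above.]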
Re: [GENERAL] First query very slow. Solutions: memory, or settings, or SQL?
On Mon, Jul 20, 2009 at 2:04 PM, Peter Eisentraut pete...@gmx.net wrote: On Monday 20 July 2009 06:45:40 Phoenix Kiula wrote: explain analyze select * from sites where user_id = 'phoenix' order by id desc limit 10;

QUERY PLAN
------------------------------------------------------------------
Limit (cost=344.85..344.87 rows=10 width=262) (actual time=5879.069..5879.167 rows=10 loops=1)
  -> Sort (cost=344.85..345.66 rows=323 width=262) (actual time=5879.060..5879.093 rows=10 loops=1)
       Sort Key: id
       -> Index Scan using new_idx_sites_userid on sites (cost=0.00..331.39 rows=323 width=262) (actual time=44.408..5867.557 rows=2178 loops=1)
            Index Cond: ((user_id)::text = 'phoenix'::text)
Total runtime: 5879.414 ms
(6 rows)

The row estimate for the index scan is off. Try reanalyzing, or increase the statistics target. How did you tell that the row estimate is off? I mean, which numbers? Also, my statistics are already set pretty high. On USER_ID they are at 100. Which columns should I increase the stats on, those in the WHERE clause or those in the SELECT bit? The USER_ID and the USER_KNOWN, which are in the WHERE clause, both have statistics of 100! And USER_KNOWN is just a binary value (0 or 1), so I wonder what purpose stats would serve?
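[Editor's note: "the row estimate is off" refers to the index scan line, where the planner estimated rows=323 but the actual count was rows=2178 -- nearly 7x low. The per-column knob Peter mentions is set with ALTER TABLE and takes effect at the next ANALYZE; a sketch using the table and column names from the plan above:

-- Raise the statistics target for the column in the WHERE clause;
-- the planner's row estimate for the index scan comes from user_id's stats.
ALTER TABLE sites ALTER COLUMN user_id SET STATISTICS 1000;
ANALYZE sites;

The columns in the SELECT list don't matter for this; only columns used in WHERE, JOIN, and ORDER BY conditions feed the estimates.]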
Re: [GENERAL] Best practices for moving UTF8 databases
0x80 makes me think of the following: The data originate from a Windows system, where 0x80 is a Euro sign. Somehow these were imported into PostgreSQL without the appropriate translation into UTF-8 (how, I do not know). I wonder: why do you spend so much time complaining instead of simply locating the buggy data and fixing them? I'd love to fix them. But if I do a search for SELECT * FROM xyz WHERE col like '%0x80%' it doesn't work. How should I search for these characters? Thanks much
[GENERAL] commercial adaptation of postgres
I once talked to a company that made a custom version of Postgres. It split tables up by columns and also by rows, and had some other custom features. It was enormously faster, from what I gathered. I could have sworn the name began with the letter 'T', but maybe not. I don't see anything like that on the commercial page of the postgres site. Does anyone know what is out there in enterprise commercially modified postgres servers? (on 64-bit machines, preferably) Dennis Gearon
Re: [GENERAL] commercial adaptation of postgres
On Jul 20, 2009, at 6:56 PM, Dennis Gearon wrote: I once talked to a company that made a custom version of Postgres. It split tables up by columns and also by rows, and had some other custom features. It was enormously faster, from what I gathered. I could have sworn the name began with the letter 'T', but maybe not. I don't see anything like that on the commercial page of the postgres site. Truviso? http://truviso.com/
Re: [GENERAL] commercial adaptation of postgres
On Jul 20, 2009, at 8:56 PM, Dennis Gearon wrote: I once talked to a company that made a custom version of Postgres. It split tables up by columns and also by rows, and had some other custom features. It was enormously faster, from what I gathered. I could have sworn the name began with the letter 'T', but maybe not. I don't see anything like that on the commercial page of the postgres site. Does anyone know what is out there in enterprise commercially modified postgres servers? (on 64-bit machines, preferably) Yahoo did that; they were planning to open source it. That's all I know.
Re: [GENERAL] killing processes
David Kerr d...@mr-paradox.net writes: But, I don't see any coded loop or way for me to insert a signal check. (I'm not much of a programmer) the function was just:

CREATE OR REPLACE FUNCTION array_median(anyarray)
  RETURNS anyelement AS
$$
  SELECT CASE
           WHEN array_upper($1,1) = 0 THEN null
           WHEN mod(array_upper($1,1),2) = 1 THEN
             asorted[ceiling(array_upper(asorted,1)/2.0)]
           ELSE
             ((asorted[ceiling(array_upper(asorted,1)/2.0)] +
               asorted[ceiling(array_upper(asorted,1)/2.0)+1])/2.0)
         END
  FROM (SELECT ARRAY(SELECT ($1)[n]
                     FROM generate_series(1, array_upper($1, 1)) AS n
                     WHERE ($1)[n] IS NOT NULL
                     ORDER BY ($1)[n]) AS asorted) AS foo;
$$ LANGUAGE 'sql' IMMUTABLE;

Huh. How big an array were you trying to invoke it on? regards, tom lane
Re: [GENERAL] memory leak occur when disconnect database
Because of a three-day break, my response is late. Valgrind is a great tool, but you must learn how to identify false positives and tell the difference between a leak that matters (say 1kb allocated and not freed in a loop that runs once per second) and a leak that doesn't. I see the memory leak not only in Valgrind, but also in the output of the top command. At first I thought, based on Valgrind, that the leak occurred when disconnecting from the database, so I wrote a test sample that just connects to and disconnects from the database in an infinite loop to check it. After running for 1 hour, 7 MB of memory had leaked.
Re: [GENERAL] memory leak occur when disconnect database
Because of a three-day break, my response is late. 8.1.8 is pretty old. Also you'll have better luck getting help if you actually include the output from Valgrind. The output from Valgrind was not saved; I will run the test again and capture the result. PS: the memory leak shows up not only in Valgrind, but also in the periodic output of the top command.