Re: [GENERAL] invalid byte sequence for encoding UTF8: 0xf1612220
2011/5/12 Craig Ringer cr...@postnewspapers.com.au:

On 05/11/2011 03:16 PM, AI Rumman wrote:

I am trying to migrate a database from Postgresql 8.2 to Postgresql 8.3 and getting the following error:
pg_restore: [archiver (db)] Error from TOC entry 2764; 0 29708702 TABLE DATA originaldata postgres
pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding UTF8: 0xf1612220
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding.
CONTEXT: COPY wi_originaldata, line 3592
I took a dump from the 8.2 server and then tried to restore it at 8.3. Both client_encoding and server_encoding are UTF8 on both servers.

Newer versions of Pg got better at catching bad Unicode. While this helps prevent bad data getting into the database, it's a right pain if you're moving data over from an older version with less strict checks.

I don't know of any way to relax the checks for the purpose of importing dumps. You'll need to fix your dump files before loading them (by finding the faulty text and fixing it) or fix it in the origin database before migrating the data. Neither approach is nice or easy, but nobody has yet stepped up to write a unicode verifier tool that checks old databases' text fields against stricter rules...

The following two articles have SQL functions and documentation you may find useful:

http://tapoueh.org/articles/blog/_Getting_out_of_SQL_ASCII,_part_1.html
http://tapoueh.org/articles/blog/_Getting_out_of_SQL_ASCII,_part_2.html

--
Craig Ringer

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
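For reference, one common way to clean a dump before loading it is to strip the invalid byte sequences with iconv. This is only a sketch: it assumes a plain-text (-Fp) dump, the host and database names are placeholders, and note that -c silently drops the offending bytes, so the diff should be reviewed rather than trusted blindly.

    pg_dump -Fp -h old82host -U postgres mydb > mydb.sql
    iconv -f UTF-8 -t UTF-8 -c < mydb.sql > mydb.clean.sql   # -c drops bytes that are not valid UTF-8
    diff mydb.sql mydb.clean.sql | less                      # inspect what was removed
    psql -h new83host -U postgres -d mydb -f mydb.clean.sql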
Re: [GENERAL] Read Committed transaction with long query
Durumdara wrote:

Two tables:
Main
Lookup

The query is:
select Main.*, Lookup.Name left join Lookup on (Main.Type_ID = Lookup.ID)

That's not correct SQL, but I think I understand what you mean.

Lookup:
ID Name
1 Value1
2 Value2
3 Value3

Many records are in the Main table (for example 1 million). What happens in this case (C = connection):

C1.) begin read committed
C1.) starting this query
C1.) query running
C2.) begin read committed
C2.) update Lookup set Name = New2 where ID = 2
C2.) commit
C1.) query running
C1.) query finished

Is it possible that the first joins (before C2's modifications) contain Value2 at the beginning of the query and New2 at the end of the query? So is it possible that the long query contains an inconsistent state because of C2's change? For example mixing Value2 and New2?

No, this is not possible.

See http://www.postgresql.org/docs/current/static/transaction-iso.html#XACT-READ-COMMITTED :

When a transaction uses this [read committed] isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] vacuumdb with cronjob needs password since 9.0?
Hi list,

i always vacuumed my postgresql automatically with crontab, because autovacuum is not suitable for my applications. With version 8.2 it works perfectly for me with this command line:

00 02 * * * postgres /usr/bin/vacuumdb -d gis -z

But not with 9.0, because vacuumdb now wants to have the password to connect to the db. I did not find any option to send the password on the command line of vacuumdb!?

Cheers, Andreas

--
Dipl. Geoökologe Andreas Laggner
Institut für Agrarrelevante Klimaforschung (AK) des vTI
Arbeitsgruppe Emissionsinventare
Johann Heinrich von Thünen-Institut (vTI), Bundesforschungsinstitut für Ländliche Räume, Wald und Fischerei
Institute of Agricultural Climate Research (AK) of the vTI
Johann Heinrich von Thünen-Institute (vTI), Federal Research Institute for Rural Areas, Forestry and Fisheries
Bundesallee 50
D-38116 Braunschweig
Tel.: (+49) (0)531 596 2636
Fax : (+49) (0)531 596 2645
E-mail: andreas.lagg...@vti.bund.de
Homepage: http://www.vti.bund.de

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Read Committed transaction with long query
Hi!

2011/5/12 Albe Laurenz laurenz.a...@wien.gv.at:

Durumdara wrote: Two tables: Main Lookup The query is: select Main.*, Lookup.Name left join Lookup on (Main.Type_ID = Lookup.ID)

That's not correct SQL, but I think I understand what you mean.

Sorry, the FROM is missing here... :-(

Lookup: ID Name 1 Value1 2 Value2 3 Value3 Many records are in the Main table (for example 1 million). What happens in this case (C = connection): C1.) begin read committed C1.) starting this query C1.) query running C2.) begin read committed C2.) update Lookup set Name = New2 where ID = 2 C2.) commit C1.) query running C1.) query finished Is it possible that the first joins (before C2's modifications) contain Value2 at the beginning of the query and New2 at the end of the query? So is it possible that the long query contains an inconsistent state because of C2's change? For example mixing Value2 and New2?

No, this is not possible.

Thanks! Great!

See http://www.postgresql.org/docs/current/static/transaction-iso.html#XACT-READ-COMMITTED : When a transaction uses this [read committed] isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.

Does "query" mean "statement" here? For example, if I have more than one statement in one query, do they run separately? Can they see the modifications?

Query text (or stored procedure body):
insert into ... ; + update ...; + select ...

Are they handled as one unit, or are they handled one by one? AutoCommit = False!

Thanks:
dd

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuumdb with cronjob needs password since 9.0?
Andreas Laggner andreas.lagg...@vti.bund.de writes:

Hi list, i always vacuumed my postgresql automatically with crontab, because autovacuum is not suitable for my applications. With version 8.2 it works perfectly for me with this command line: 00 02 * * * postgres /usr/bin/vacuumdb -d gis -z But not with 9.0, because vacuumdb now wants to have the password to connect to the db. I did not find any option to send the password on the command line of vacuumdb!?

A password on the command line is a bad habit anyway, and especially so for a possibly long-running job like vacuumdb.

Have a look at setting the password in the .pgpass file for the invoking user.

If you insist on doing it on the command line, try:

PGPASSWORD=foo vacuumdb ...

HTH

--
Jerry Sievers
Postgres DBA/Development Consulting
e: gsiever...@comcast.net
p: 305.321.1144

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuumdb with cronjob needs password since 9.0?
On Thu, May 12, 2011 at 10:56:20AM +0200, Andreas Laggner wrote:

Hi list, i always vacuumed my postgresql automatically with crontab, because autovacuum is not suitable for my applications. With version 8.2 it works perfectly for me with this command line: 00 02 * * * postgres /usr/bin/vacuumdb -d gis -z But not with 9.0, because vacuumdb now wants to have the password to connect to the db.

The version has nothing to do with it. You must have changed pg_hba.conf - most likely you changed trust authentication for local connections to something like md5 or password. It works exactly the same way in 8.2 as in 9.0 - if the connection uses trust authentication, a password is not necessary.

i did not find any options to send the password with the command line in vacuumdb!?

The best options are to either use a .pgpass file (described here: http://www.postgresql.org/docs/9.0/static/libpq-pgpass.html) or to set up the system in such a way that the postgres user can log in locally without a password, using ident authentication. Be warned though that ident, when not well configured, is a common source of problems - described for example here: http://www.depesz.com/index.php/2007/10/04/ident/

Best regards,
depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
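For reference, a minimal sketch of the .pgpass approach depesz describes; the database name and cron line come from the thread, the password is obviously a placeholder, and the file must live in the home directory of the user the cron job runs as:

    # ~postgres/.pgpass  (must be chmod 0600, or libpq ignores it)
    # format: hostname:port:database:username:password
    localhost:5432:gis:postgres:secret

    # crontab entry stays exactly as before; libpq picks up the password itself
    00 02 * * * postgres /usr/bin/vacuumdb -d gis -z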
[GENERAL] How to handle bogus nulls from ActiveRecord
It is required for application data verification filters that default values for table columns are known to ActiveRecord when creating a new row. So ActiveRecord obtains the default values from the tables dynamically and assigns them to their appropriate column attributes. The problem we encounter arises because ActiveRecord then uses those column assignments when inserting a row, even if the column is not otherwise referenced.

I am developing a web application using the Ruby on Rails framework with PostgreSQL as the back-end store. In one of our tables we have a column called expected_by which is a time-stamp. It is set to NOT NULL DEFAULT 'INFINITY'. However, Ruby has no concept of infinity, and whatever the PostgreSQL adapter is returning for it ActiveRecord receives as nil, which is converted to NULL.

So, the real fix to this is to alter the persistence class so that columns with default values are not explicitly set to those values on insert. This is unlikely to happen in the short term and will take some time to be integrated into the framework even when it is completed, if ever. To solve this for the moment, what I think I require is a trigger on expected_at which tests for NULL on insert and converts it to infinity. The other alternative is to simply set the default to some valid, but unreachable, date like 9999-12-31.

I would like other opinions about how to best handle this situation, and observations on what other significant concerns I may not be aware of but should provide for.

--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte Lyne Limited            http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada L8E 3C3

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
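For reference, a minimal sketch of the trigger option mentioned above. The table name (conveyances) is an assumption; the column name follows the post (expected_by), which mixes expected_by and expected_at. A BEFORE ROW trigger runs before the NOT NULL check, so the constraint can stay in place:

    -- sketch only: table name is hypothetical
    CREATE OR REPLACE FUNCTION default_expected_by() RETURNS trigger AS $$
    BEGIN
        IF NEW.expected_by IS NULL THEN
            NEW.expected_by := 'infinity'::timestamp;   -- replace the bogus NULL sent by ActiveRecord
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER expected_by_default
        BEFORE INSERT OR UPDATE ON conveyances
        FOR EACH ROW EXECUTE PROCEDURE default_expected_by();

Note that this silently "repairs" every NULL, so it partly defeats the point of using NOT NULL to catch application errors, as the original poster observes later in the thread.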
Re: [GENERAL] auto-reconnect: temp schemas, sequences, transactions
Hi, Just to sum things up: On Wednesday 04 May 2011 19:21:42 Tom Lane wrote: Well, I think it's foolish to imagine that a client library should try to do transparent reconnection: it's somewhere between difficult and impossible to keep track of all the server-side state that the application might be relying on, above and beyond the immediate problem of an unfinished transaction. After sleeping on it - I now agree 100%. (A simple example would be savepoints... The idea to try to create trans in error was silly, I must say.) It's almost always better to punt the problem back to the application, and let it decide whether to try again or just curl up and die. Yes. I dug into it a bit more and I have found the magic place where the library which I'm using did a silent reconnection in the background. Now I think this is the place which is wrong - if connection is not re-established applications have a chance to notice that something went wrong and react appropriately (do a proper clean-up, or reconnect, or abort etc.). If you have server restarts occurring often enough that this seems useful to work on, then I submit that you have problems you ought to be fixing on the server side instead. Agreed. For your information, it does not happen that often, but when it did (once in two years...) was scary enough to trigger an investigation. Tom, thank you very much for your help! Best, ~Marek -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Custom Data Type size - too big overhead?
Hi,

we are trying to minimize data storage size as much as possible. We'd like to replace the BOX and POINT datatypes with our own. We don't need double precision - a 4-byte integer would be totally fine. I tried the following experiment, in which a custom data type vPointInt of 4-byte integers takes 28 bytes. Why is that?

Create Type vPointInt AS( a integer, b integer )

CREATE TABLE rt( id int, pt vPointInt );

insert into rt values(1, (4,4) );

select *, (pg_dump(pt)).* from rt limit 15;

1;(4,4);vpointint;30712;28;000,000,000,002,000,000,000,023,000,000,000,004,000,000,000,004,000,000,000,023,000,000,000,004,000,000,000,004;\000\000\000\002\000\000\000\027\000\000\000\004\000\000\000\004\000\000\000\027\000\000\000\004\000\000\000\004

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Custom-Data-Type-size-too-big-overhead-tp4389681p4389681.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
On 11.05.11 17:04, t...@fuzzy.cz t...@fuzzy.cz wrote:

We had exactly the same problem and persistent connection solved it.

First testing with persistent connections seems to work like a charm. Will do some thorough testing and watch the memory load. Hopefully, I will not trip over some sort of pitfall. Google seems to be full of people who have problems with persistent connections.

Big thanks for your advice.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text search to_tsquery performance with ispell dictionary
On 11.05.11 17:31, Tom Lane t...@sss.pgh.pa.us wrote: You really, really, really need to fix whatever is preventing you from using pooling. Opening a database connection to run one query is just horridly inefficient. Very true. I did not mean that anything actually prevents us from using pooling. We just have no idea, how it will interfere with our productive pgcluster setup. I imagine the evaluation, testing and verification of pooling systems in combination with our setup to be quite tedious. Of course, I don't open a connection for each query. One is opened for each service call. The services are designed to start an own process for every call. Such a process - for now - needs an own connection. It usually handles dozens of queries. Until now, we never ran into performance problems. The time consumed by DB operations is usually negligible compared to the rest. First tests with a simple persistent connection setup seem to work fine and solve the performance issue. I tend to put some thorough testing on this setup and see if I step into a pitfall. Big kudos to you and this list. You were a great help, as always. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] vacuumdb with cronjob needs password since 9.0? SOLVED
thank you depesz, your help was very useful! Am 12.05.2011 13:19, schrieb hubert depesz lubaczewski: On Thu, May 12, 2011 at 10:56:20AM +0200, Andreas Laggner wrote: Hi list, i always vaccumed my postgresql automatically with crontab, because autovacuum is not suitable for my applications. With version 8.2 it works perfect for me with this command line: 00 02 * * *postgres /usr/bin/vacuumdb -d gis -z But not with 9.0, because vacuumdb now wants to have the password to connect to the db. version has nothing to do with it. You had to change pg_hba.conf - most likely you changes trust authentication for local connections to something like md5 or password. it works exactly the same way in 8.2 as in 9.0 - if connection has trust authenticator - password is not necessary. i did not find any options to send the password with the command line in vacuumdb!? best options are to either use .pgpass file (described here: http://www.postgresql.org/docs/9.0/static/libpq-pgpass.html) or setup system in such way that postgres user can login locally without password, using ident authenticator. be warned though that ident, when not well configured, is a common source of problems - described for example here: http://www.depesz.com/index.php/2007/10/04/ident/ Best regards, depesz -- Dipl. Geoökologe Andreas Laggner Institut für Agrarrelevante Klimaforschung (AK) des vTI Arbeitsgruppe Emissionsinventare Johann Heinrich von Thünen-Institut (vTI), Bundesforschungsinstitut für Ländliche Räume, Wald und Fischerei Institute of Agricultural Climate Research (AK) of the vTI Johann Heinrich von Thünen-Institute (vTI), Federal Research Institute for Rural Areas, Forestry and Fisheries Bundesallee 50 D-38116 Braunschweig Tel.: (+49) (0)531 596 2636 Fax : (+49) (0)531 596 2645 E-mail: andreas.lagg...@vti.bund.de Homepage: http://www.vti.bund.de -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Regexp match not working.. (SQL help)
On Wed, May 11, 2011 at 11:18 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:

I have a text column in a table, which I want to search through -- seeking the occurrence of about 300 small strings in it.

Let's say the table is like this:

table1 ( id bigint primary key, mytext text, mydate timestamp without time zone );

I am using this SQL:

SELECT id FROM table1 WHERE mytext ~* E'sub1|sub2|sub3|sub4...' LIMIT 10;

This is basically working, but some of the mytext rows being returned do not contain any of these substrings. Am I doing the POSIX regexp wrongly? This same thing works when I try it in PHP with preg_match, but not in Postgresql. I have tried several variations too:

WHERE mytext ~* E'(sub1)(sub2)(sub3)(sub4)...'

None of this is working. I cannot seem to keep out the results that do NOT contain any of those strings. Appreciate any pointers! Thanks!

My bad. I figured out that the pipe should only separate the strings to be searched. I had one stray pipe at the end:

SELECT id FROM table1 WHERE mytext ~* E'sub1|sub2|sub3|subXY|' LIMIT 10;

This meant that it was matching, well, basically anything. Sorry.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
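In other words, the trailing pipe creates an empty alternative, and an empty pattern matches every row. Dropping it gives the intended query:

    -- no trailing '|': only rows containing at least one of the substrings match
    SELECT id
      FROM table1
     WHERE mytext ~* E'sub1|sub2|sub3|subXY'
     LIMIT 10;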
[GENERAL] Massive delete from a live production DB
Hi

Been reading some old threads (pre 9.x versions) and it seems that the consensus is to avoid doing massive deletes from a table, as it'll create so much unrecoverable space/gaps that a vacuum full would be needed, etc. Instead, we might as well do a dump/restore. Faster, cleaner.

This is all well and good, but what about a situation where the database is in production and cannot be brought down for this operation, or even for a cluster?

Any ideas on what I could do without losing all the live updates? I need to get rid of about 11% of a 150-million-row database, with each row being nearly 1 to 5 KB in size...

Thanks! Version is 9.0.4.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Read Committed transaction with long query
Durumdara wrote:

C1.) begin read committed C1.) starting this query C1.) query running C2.) begin read committed C2.) update Lookup set Name = New2 where ID = 2 C2.) commit C1.) query running C1.) query finished Is it possible that the first joins (before C2's modifications) contain Value2 at the beginning of the query and New2 at the end of the query? So is it possible that the long query contains an inconsistent state because of C2's change? For example mixing Value2 and New2?

No, this is not possible. See http://www.postgresql.org/docs/current/static/transaction-iso.html#XACT-READ-COMMITTED : When a transaction uses this [read committed] isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.

Does "query" mean "statement" here? For example, if I have more than one statement in one query, do they run separately? Can they see the modifications? Query text (or stored procedure body): insert into ... ; + update ...; + select ... Are they handled as one unit, or are they handled one by one? AutoCommit = False!

"Query" is usually used as a synonym for "SQL statement", but the term is probably not exactly defined. To be more precise, in this case one could say "a single reading SQL statement".

So if you have several consecutive statements, each one may see different data. This is the case no matter whether all statements run in one transaction or not.

If you want several statements to see exactly the same data (a snapshot of the database), you have to pack them into one transaction and use isolation level REPEATABLE READ.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
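A minimal illustration of that last point, using the table names from the thread. Under REPEATABLE READ the snapshot is taken by the first statement in the transaction (not by BEGIN itself), and every later statement in the same transaction sees that same snapshot:

    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

    SELECT count(*) FROM Main;              -- snapshot is established here

    SELECT m.*, l.Name
      FROM Main m
      LEFT JOIN Lookup l ON m.Type_ID = l.ID;  -- sees exactly the same data,
                                               -- regardless of concurrent commits

    COMMIT;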
Re: [GENERAL] Massive delete from a live production DB
On 12.05.2011 16:38, Phoenix Kiula wrote:

On Thu, May 12, 2011 at 10:33 PM, Eric Ndengang eric.ndengang_fo...@affinitas.de wrote:

On 12.05.2011 16:23, Phoenix Kiula wrote:

Hi Been reading some old threads (pre 9.x versions) and it seems that the consensus is to avoid doing massive deletes from a table, as it'll create so much unrecoverable space/gaps that a vacuum full would be needed, etc. Instead, we might as well do a dump/restore. Faster, cleaner. This is all well and good, but what about a situation where the database is in production and cannot be brought down for this operation, or even for a cluster? Any ideas on what I could do without losing all the live updates? I need to get rid of about 11% of a 150-million-row database, with each row being nearly 1 to 5 KB in size... Thanks! Version is 9.0.4.

Hey, try to use pg_reorg -- http://reorg.projects.postgresql.org but the table must have a primary key. regards

Thanks Eric. I do have a primary key. I am on version 9.0.4. Will pg_reorg work with this version too? The example on that website mentions 8.3. Also, is it a fast process that does not consume too much resource? This DB is behind a very high traffic website, so I cannot have a CLUSTER alternative like pg_reorg making my DB very slow concurrently. How does one install the patch easily on CentOS (Linux) 64 bit? Thanks!

Hi,

/* I am on version 9.0.4. Will pg_reorg work with this version too? The example on that website mentions 8.3. */
I used to use pg_reorg on version 8.4.8, and according to the documentation it will also work with the 9.0 version.

/* How does one install the patch easily on CentOS (Linux) 64 bit? */
You can easily install it as a contrib. Just read the installation guide or the man page.

/* Also, is it a fast process that does not consume too much resource? This DB is behind a very high traffic website, so I cannot have a CLUSTER alternative like pg_reorg making my DB very slow concurrently. */
Yes, it's a fast process that is neither time nor resource consuming. The reorganization of a table with about 60 million rows can take less than 8 minutes without higher CPU cost.

cheers

--
Eric Ndengang
Datenbankadministrator
Affinitas GmbH | Kohlfurter Straße 41/43 | 10999 Berlin | Germany
email: eric.ndengang_fo...@affinitas.de | tel: +49.(0)30. 991 949 5 0 | www.edarling.de
Geschäftsführer: Lukas Brosseder, David Khalil, Kai Rieke, Christian Vollmann
Eingetragen beim Amtsgericht Berlin, HRB 115958
Real People: www.edarling.de/echte-paare
Real Love: www.youtube.de/edarling
Real Science: www.edarling.org

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Massive delete from a live production DB
In response to Phoenix Kiula phoenix.ki...@gmail.com:

Hi Been reading some old threads (pre 9.x versions) and it seems that the consensus is to avoid doing massive deletes from a table, as it'll create so much unrecoverable space/gaps that a vacuum full would be needed, etc. Instead, we might as well do a dump/restore. Faster, cleaner. This is all well and good, but what about a situation where the database is in production and cannot be brought down for this operation, or even for a cluster? Any ideas on what I could do without losing all the live updates? I need to get rid of about 11% of a 150-million-row database, with each row being nearly 1 to 5 KB in size...

Have you considered the following process (a sketch is shown after the list):

1) SELECT the rows you want to keep into a new table (time-consuming)
2) Start outage
3) Pull over any new rows that might have been added between 1 and 2
4) Drop the old table
5) Rename the new table to the old name
6) Any other steps required to make the new table exactly like the old one (i.e. foreign keys, serials, etc.)
7) End outage window

Because steps 3 - 6 are very fast, your outage window is very short. Not a perfect, 0 downtime solution, but possibly helpful.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
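A rough SQL sketch of that copy-and-swap process; table names and the keep condition are placeholders, and as step 6 above notes, foreign keys, views and sequence ownership still need to be re-pointed by hand:

    -- step 1 (slow, runs while the site is live)
    CREATE TABLE mytable_new (LIKE mytable INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
    INSERT INTO mytable_new SELECT * FROM mytable WHERE keep_condition;

    -- outage starts: copy any rows added since step 1, then swap the tables
    BEGIN;
    ALTER TABLE mytable     RENAME TO mytable_old;
    ALTER TABLE mytable_new RENAME TO mytable;
    COMMIT;

    -- after verifying, reclaim the space
    DROP TABLE mytable_old;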
Re: [GENERAL] Massive delete from a live production DB
On Thu, May 12, 2011 at 17:23, Phoenix Kiula phoenix.ki...@gmail.com wrote:

Been reading some old threads (pre 9.x versions) and it seems that the consensus is to avoid doing massive deletes from a table as it'll create so much unrecoverable space/gaps that vacuum full would be needed. Etc.

Just running DELETE with normal autovacuum won't *shrink* the physical table, but the freed-up space will be made available for future inserts/updates. No problem there. Fragmentation of newly inserted records is still a potential issue.

It's true that on pre-8.4 PostgreSQL versions you could run into dead space that couldn't be re-used if you had a badly tuned FSM. I presume this is why VACUUM FULL was recommended -- but this advice no longer applies to 8.4 or 9.0.

Instead, we might as well do a dump/restore. Faster, cleaner. Any ideas on what I could do without losing all the live updates? I need to get rid of about 11% of a 150 million rows of database, with each row being nearly 1 to 5 KB in size...

For deleting 11%, a dump and restore of 150 million records and hundreds of gigabytes doesn't seem worth it. If it was closer to 50%, then I'd consider it.

Regards,
Marti

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Massive delete from a live production DB
[snip]

You can easily install it as a contrib. Just read the installation guide or the man page.

Thanks Eric. How though? The instructions here -- http://reorg.projects.postgresql.org/pg_reorg.html -- are woefully incomplete.

I have a standard PG install on a WHM/Cpanel type server. I know the path to pgsql. I can download pg_reorg.1.1.5.tar.gz into this folder and untar it. Then what? A make and make install does not work -- the usual ./configure stuff is not available. Sorry, I need more detailed steps.

I googled and found this: http://www.postgresql.org/docs/9.0/static/contrib.html

But the recommended steps:

gmake
gmake install

...don't work either. Here's what I see:

[mydomain] src cd pg_reorg-1.1.5
[mydomain] pg_reorg-1.1.5 gmake
Makefile:13: ../../src/Makefile.global: No such file or directory
gmake: *** No rule to make target `../../src/Makefile.global'. Stop.
[mydomain] pg_reorg-1.1.5

What am I missing?

PS. If pg_reorg is such a useful contribution, why can't it be included with PG? Seems like a very useful tool anyway! No?

Thanks.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Postgredac Dump
Hi Can anyone point me to an example of how to use the Postgresdac Dump component?? Bob
Re: [GENERAL] How to handle bogus nulls from ActiveRecord
-----Original Message-----
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of James B. Byrne
Sent: Thursday, May 12, 2011 9:12 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] How to handle bogus nulls from ActiveRecord

To solve this for the moment, what I think I require is a trigger on expected_at which tests for NULL on insert and converts it to infinity. The other alternative is to simply set the default to some valid, but unreachable, date like 9999-12-31.

Not a huge fan of "Infinity" as a value... but that just may be lack of experience.

I'd probably remove the NOT NULL constraint on expected_at and deal with tri-value logic; or also include a boolean (is_expected) and form queries like

NOT is_expected OR (is_expected AND expected_at op timestamp)

is_expected could be a calculated value in a view to make things somewhat easier; otherwise you'd need a table constraint to ensure a non-null expected has a true is_expected (see the sketch after this message).

Without more info as to how you use expected_at, other advice is difficult, but can you use a meaningful value (say now()+'30 days'::interval) for the default?

David J.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
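One possible reading of that suggestion, as a sketch only; the table name is hypothetical and the column names follow the thread:

    -- allow NULL in expected_at, add an explicit flag instead
    ALTER TABLE conveyances ALTER COLUMN expected_at DROP NOT NULL;
    ALTER TABLE conveyances ADD COLUMN is_expected boolean NOT NULL DEFAULT false;

    -- a non-null estimate must be marked as expected
    ALTER TABLE conveyances ADD CONSTRAINT expected_flag_consistent
        CHECK (expected_at IS NULL OR is_expected);

    -- queries then read like the expression in the message above
    SELECT *
      FROM conveyances
     WHERE NOT is_expected
        OR (is_expected AND expected_at < now());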
Re: [GENERAL] How to handle bogus nulls from ActiveRecord
On Thu, May 12, 2011 12:40, David Johnston wrote:

Not a huge fan of "Infinity" as a value... but that just may be lack of experience. I'd probably remove the NOT NULL constraint on expected_at and deal with tri-value logic; or also include a boolean (is_expected) and form queries like

Well, actually, the reason for the NOT NULL constraint is to catch application errors exactly like this one. Removing it is not contemplated.

I had no idea that AR actually 'copied' and used default values on columns that were not referenced in the application code until I encountered this. And had it gone undetected, this would have been a major problem later on. As it was, our tests brought it to our attention quite early, which is why we can contemplate several solutions.

Without more info as to how you use expected_at, other advice is difficult, but can you use a meaningful value (say now()+'30 days'::interval) for the default?

The column expected_by contains an estimated time of arrival for a particular conveyance. When a row is initialized this value is unknown some of the time. The expected_by value is reset to the arrived_at value on UPDATE if and only if expected_by is greater than arrived_at. Conveyances that have +infinite expected_by time-stamps are considered pending. At some point, conveyance rows that are never going to arrive are otherwise flagged. On the other hand, rows with overdue expected_by values are given somewhat more attention, to put it mildly.

So, we either fix the problem with AR, possibly by moving to the Sequel ORM for this case, although I have not yet received an answer as to whether it does any better; or we trap and override NULL values with infinity in a trigger; or we choose for the default value a fixed date far, far into the future.

+Infinity was chosen as a default to avoid the complexities of dealing with NULL logic in SELECTs. I suppose that the simplest solution is to go with a date of 9999-12-31 and treat that value like infinity.

--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte Lyne Limited            http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada L8E 3C3

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] pg_dump on Hot standby : clarification on how to
I would just like to get some clarification from the list on how to do a pg_dump on the slave in the face of "canceling statement due to conflict with recovery".

The following link seems to indicate that if I start an idle transaction on the master I should be able to do the pg_dump, but I tried this in psql on the master (START TRANSACTION), and was still unable to do a pg_dump on the slave at the same time. Is there something special about using dblink that would make this all work?

http://postgresql.1045698.n5.nabble.com/Hot-Standby-ERROR-canceling-statement-due-to-conflict-with-recovery-td3402417.html

One solution is to begin idle transactions on the master by using e.g. dblink from the *standby* to the master before you start *pg_dump* on the *standby* and end them after *pg_dump* (or whatever) is finished.
Re: [GENERAL] Postgredac Dump
On Thu, May 12, 2011 at 10:02 PM, Bob Pawley rjpaw...@shaw.ca wrote:

Hi Can anyone point me to an example of how to use the Postgresdac Dump component??

The URL below gives more detailed information about the PostgresDAC dump component:

http://www.microolap.com/products/connectivity/postgresdac/help/TPSQLDump/Methods/DumpToFile.htm

--Raghu Ram
Re: [GENERAL] Massive delete from a live production DB
On Thu, May 12, 2011 at 8:23 AM, Phoenix Kiula phoenix.ki...@gmail.com wrote:

Hi Been reading some old threads (pre 9.x versions) and it seems that the consensus is to avoid doing massive deletes from a table as it'll create so much unrecoverable space/gaps that vacuum full would be needed. Etc. Any ideas on what I could do without losing all the live updates? I need to get rid of about 11% of a 150 million rows of database, with each row being nearly 1 to 5 KB in size...

11% is not a big deal, as the space will get re-used for future updates and inserts.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] insert order question
Hi: Does... insert into mytbl (col1) values ('a'),('b'),('c'); ... insert records 'a','b','c' in that order while... insert into mytbl (col1) values ('c'),('b'),('a'); ... insert the records in the opposite order? The order matters because there are triggers on the table which will react differently depending on what's already in the table. Thanks in Advance !
Re: [GENERAL] How to handle bogus nulls from ActiveRecord
The column expected_by contains an estimated time of arrival for a particular conveyance. When a row is initialized this value is unknown some of the time. The expected_by value is reset to the arrived_at value on UPDATE if and only if expected_by is greater than arrived_at. Conveyances that have +infinite expected_by time-stamps are considered pending. At some point conveyance rows that are never going to arrive are otherwise flagged. On the other hand, rows with overdue expected_by values are given somewhat more attention, to put it mildly. So, we either fix the problem with AR, possibly by moving to the Sequel ORM for this case, although I have not yet received an answer as to whether it does any better; or we trap and override NULL values with infinity in a trigger; or we choose for the default value a fixed date far, far into the future. +Infinity was chosen as a default to avoid the complexities of dealing with NULL logic in SELECTs. I suppose that the simplest solution is to go with a date of 9999-12-31 and treat that value like infinity.

The "just make it work" solution has many merits - I would also probably just use 9999-12-31 as a close approximation for +infinity; which itself is just there because you are avoiding "estimate is unknown".

Why bother updating the expected_by value once the conveyance is no longer pending? Do you not really care if something arrived early? Even if you do not currently, it seems a waste to throw out the data when you can readily get the same result as needed (CASE WHEN expected_by > arrived_at THEN arrived_at ELSE expected_by END) without giving up the ability to calculate early-ness. It would make more sense to set expected = arrived if and only if expected = 'Infinity'.

Still, it would at least seem reasonable to guess a reasonable expected date if one is not otherwise provided - possibly with a flag indicating that it is a true guesstimate instead of an estimate.

David J.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Massive delete from a live production DB
On 12.5.2011 17:37, Phoenix Kiula wrote:

[mydomain] src cd pg_reorg-1.1.5
[mydomain] pg_reorg-1.1.5 gmake
Makefile:13: ../../src/Makefile.global: No such file or directory
gmake: *** No rule to make target `../../src/Makefile.global'. Stop.
[mydomain] pg_reorg-1.1.5

What am I missing?

Do you have the source or just a binary package? To compile such a contrib package you need a properly configured source tree. If you do have the sources already available, you need to run configure (because that's what produces src/Makefile.global) and then build the contrib module again.

If you don't have the sources, you can download the distribution at postgresql.org. But maybe there's a src package for your distro.

PS. If pg_reorg is such a useful contribution, why can't it be included with PG? Seems like a very useful tool anyway! No?

There's a lot of packages that might be included in the default install, but that'd bring a lot of responsibility to maintain them.

regards
Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
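An alternative worth trying, assuming pg_reorg's Makefile has the usual USE_PGXS block (most out-of-tree contrib modules do): build against the installed server via PGXS instead of inside a configured source tree. This is a sketch; it assumes pg_config from the matching PostgreSQL version (e.g. the postgresql90-devel package on CentOS) is on the PATH:

    tar xzf pg_reorg-1.1.5.tar.gz
    cd pg_reorg-1.1.5
    gmake USE_PGXS=1          # uses `pg_config --pgxs` instead of ../../src/Makefile.global
    gmake USE_PGXS=1 install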
Re: [GENERAL] insert order question
See here: http://www.postgresql.org/docs/9.0/static/sql-values.html

Implied is that the supplied data set will be returned in the same order as written, unless an ORDER BY is used to re-order the listing prior to it being spit out the other end.

1, 3, 2 => VALUES => 1, 3, 2
1, 3, 2 => VALUES ORDER ASC => 1, 2, 3

The only time you end up with ordering issues is the FROM 'physical table' case, because there is no defined order for how those records are stored; but when you explicitly list a set of data, that explicit order is maintained as long as possible.

David J.

From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Gauthier, Dave
Sent: Thursday, May 12, 2011 3:37 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] insert order question

Hi:

Does...
insert into mytbl (col1) values ('a'),('b'),('c');
... insert records 'a','b','c' in that order while...
insert into mytbl (col1) values ('c'),('b'),('a');
... insert the records in the opposite order?

The order matters because there are triggers on the table which will react differently depending on what's already in the table.

Thanks in Advance!
Re: [GENERAL] insert order question
Gauthier, Dave dave.gauth...@intel.com writes: Does... insert into mytbl (col1) values ('a'),('b'),('c'); ... insert records 'a','b','c' in that order while... insert into mytbl (col1) values ('c'),('b'),('a'); ... insert the records in the opposite order? I believe so, but it seems unwise to hard-wire a dependency on that into your application, since this is only an implementation artifact and not anything guaranteed by the standard. If you need the inserts to occur in a specific order, issue them as separate commands ... you're not going to save all that much by having them be one command. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to handle bogus nulls from ActiveRecord
On Thu, May 12, 2011 15:51, David Johnston wrote:

+Infinity was chosen as a default to avoid the complexities of dealing with NULL logic in SELECTs. I suppose that the simplest solution is to go with a date of 9999-12-31 and treat that value like infinity.

The "just make it work" solution has many merits - I would also probably just use 9999-12-31 as a close approximation for +infinity; which itself is just there because you are avoiding "estimate is unknown". Why bother updating the expected_by value once the conveyance is no longer pending? Do you not really care if something arrived early? Even if you do not currently, it seems a waste to throw out the data when you can readily get the same result as needed (CASE WHEN expected_by > arrived_at THEN arrived_at ELSE expected_by END) without giving up the ability to calculate

The main reason to update expected_by is that sometimes the conveyance arrives without the expected_by ever being set. Leaving the expected_by value at infinity, or 9999-12-31, or NULL, complicates other parts of the system.

However, leaving untouched expected_by values that are less than the infinite value is doable and is a better approach.

--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte Lyne Limited            http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada L8E 3C3

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] insert order question
Ya, I'm sort of coming to that conclusion because of a different consideration. I'm worried about whether or not the triggers will be fired immediately after each record is inserted, or once at the end, or something else. Just too risky. I'm going to go with the discrete insert statements in the order I desire.

Thanks

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Thursday, May 12, 2011 4:06 PM
To: Gauthier, Dave
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] insert order question

Gauthier, Dave dave.gauth...@intel.com writes:

Does... insert into mytbl (col1) values ('a'),('b'),('c'); ... insert records 'a','b','c' in that order while... insert into mytbl (col1) values ('c'),('b'),('a'); ... insert the records in the opposite order?

I believe so, but it seems unwise to hard-wire a dependency on that into your application, since this is only an implementation artifact and not anything guaranteed by the standard. If you need the inserts to occur in a specific order, issue them as separate commands ... you're not going to save all that much by having them be one command.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Massive delete from a live production DB
On 12.5.2011 16:23, Phoenix Kiula wrote:

Hi Been reading some old threads (pre 9.x versions) and it seems that the consensus is to avoid doing massive deletes from a table, as it'll create so much unrecoverable space/gaps that a vacuum full would be needed, etc. Instead, we might as well do a dump/restore. Faster, cleaner. This is all well and good, but what about a situation where the database is in production and cannot be brought down for this operation or even a cluster? Any ideas on what I could do without losing all the live updates? I need to get rid of about 11% of a 150 million rows of database, with each row being nearly 1 to 5 KB in size... Thanks! Version is 9.0.4.

One of the possible recipes in such a case is usually partitioning. If you can divide the data so that a delete is equal to a drop of a partition, then you don't need to worry about vacuum etc.

But partitioning has its own problems - you can't reference the partitioned table using foreign keys, the query plans often are not as efficient as with a non-partitioned table, etc.

regards
Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
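For illustration, a minimal inheritance-based partitioning sketch (9.0 has no declarative partitioning, so partitions are child tables with CHECK constraints). Table names and the partition key are made up; routing inserts to the right child is done either by inserting into the child directly or via a trigger on the parent:

    CREATE TABLE clicks (
        id         bigint,
        created_at timestamp,
        payload    text
    );

    CREATE TABLE clicks_2011_04 (
        CHECK (created_at >= DATE '2011-04-01' AND created_at < DATE '2011-05-01')
    ) INHERITS (clicks);

    -- with constraint_exclusion = partition, queries on clicks skip irrelevant children;
    -- purging a whole month is then instant and leaves no dead tuples behind:
    DROP TABLE clicks_2011_04;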
Re: [GENERAL] pg_dump on Hot standby : clarification on how to
On Thu, May 12, 2011 at 11:26:38AM -0700, bubba postgres wrote: I would just like to get some clarification from the list on how to do a pg_dump on the slave in the face of canceling statement due to conflict with recovery. The following links seem to indicate that If I start an idle transaction on the master I should be able to do the pg_dump, but I tried this in psql on the master start transaction, and was still unable to do a pg_dump on the slave at the same time. Is there something special about using dblink that would make this all work? Could you define what you mean by unable to do pg_dump on the slave? I don't see why dblink would be the special thing. I think what you want is to hold a transaction open on the master so that the WAL can't get recycled. At least, that's what I understood from the post. I haven't actually tried it yet, but to me it sounded like it ought to work. A -- Andrew Sullivan a...@crankycanuck.ca -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
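A hedged sketch of the dblink trick from the linked thread, run on the *standby* before starting pg_dump there. The connection string is a placeholder, the dblink contrib module must be installed, and the transaction should take a snapshot (hence the txid_current() call) so that vacuum on the master keeps the old row versions the standby still needs:

    SELECT dblink_connect('master_conn', 'host=master.example.com dbname=mydb user=postgres');
    SELECT dblink_exec('master_conn', 'BEGIN');
    SELECT * FROM dblink('master_conn', 'SELECT txid_current()') AS t(xid bigint);

    -- ... run pg_dump against the standby now ...

    SELECT dblink_exec('master_conn', 'COMMIT');
    SELECT dblink_disconnect('master_conn');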
Re: [GENERAL] How to handle bogus nulls from ActiveRecord
David suggested using a guesstimate default date along with a boolean to indicate when you're using guesstimates. I think this is a solid approach, but if the default expected_by idea doesn't work for you, a boolean would still make this a lot easier on the Rails side.

It sounds like you're using a setup for Heroku, so I checked the PostgreSQL 8.3 manual -- a boolean is 1 byte (http://www.postgresql.org/docs/8.3/interactive/datatype-boolean.html). If this isn't for Heroku, other PostgreSQL versions probably implement booleans the same way. Your database size should go up by # records * 1 byte + indexing overhead. Though I don't know how many records you're working with, this seems relatively cheap given that it will make your code more readable (if expected_date_estimated?). It should also simplify any remaining code you have to write, as you won't have to think about writing elaborate if or case statements to determine if expected_by was explicitly set.

On Thu, May 12, 2011 at 1:06 PM, James B. Byrne byrn...@harte-lyne.ca wrote:

On Thu, May 12, 2011 15:51, David Johnston wrote: +Infinity was chosen as a default to avoid the complexities of dealing with NULL logic in SELECTs. I suppose that the simplest solution is to go with a date of 9999-12-31 and treat that value like infinity. The "just make it work" solution has many merits - I would also probably just use 9999-12-31 as a close approximation for +infinity; which itself is just there because you are avoiding "estimate is unknown". Why bother updating the expected_by value once the conveyance is no longer pending? Do you not really care if something arrived early? Even if you do not currently, it seems a waste to throw out the data when you can readily get the same result as needed (CASE WHEN expected_by > arrived_at THEN arrived_at ELSE expected_by END) without giving up the ability to calculate

The main reason to update expected_by is that sometimes the conveyance arrives without the expected_by ever being set. Leaving the expected_by value at infinity, or 9999-12-31, or NULL, complicates other parts of the system. However, leaving untouched expected_by values that are less than the infinite value is doable and is a better approach.

--
*** E-Mail is NOT a SECURE channel ***
James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte Lyne Limited            http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada L8E 3C3

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sharing data between databases
On 05/12/2011 12:04 PM, Tim Uckun wrote:

or carefully structure your dblink joins so they can perform efficiently, possibly using temp tables as a sort of materialized view.

According to the documentation, unless you are writing procedural code with cursors, when you touch the dblink view it will pull the entire table/recordset over.

that's really all that the fancier database engines do behind the scenes... and even then, distributed joins can be painful.

I am not sure what they do, but I have done this kind of thing in SQL Server without any problems and with almost no performance penalty if the two databases were on the same instance.

On the same instance? Yes, that's a bit different. Many database engines manage multiple databases that're really just namespaces within a single storage engine. I don't know if that's how SQL Server does things, but it's certainly how MySQL does, for example, and people are often confused by the way they can't SELECT from tables on another database in Pg.

Unfortunately, Pg's design doesn't make it easy for a single backend to have multiple databases open at once. Inter-database communication even within a single Pg instance (cluster) requires multiple backends.

I sometimes think it'd be nice if Pg offered the ability to translate schema to databases, so it runs with a single database and multiple schema, and you connect to a schema, MySQL style. It'd help people who want to use multiple databases on a machine and query between them, though of course it'd do nothing for people who want to do inter-machine or inter-instance queries.

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sharing data between databases
I sometimes think it'd be nice if Pg offered the ability to translate schema to databases, so it runs with a single database and multiple schema, and you connect to a schema, MySQL style. It'd help people who want to use multiple databases on a machine and query between them, though of course it'd do nothing for people who want to do inter-machine or inter-instance queries. That's an interesting idea. Since I am building this app from scratch I suppose I could create different schemas for different applications instead of using different databases. I wonder how rails and active record can deal with that. I'll take a look and see. I am presuming of course that one can query across schemas. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sharing data between databases
On 05/12/11 8:38 PM, Tim Uckun wrote: I sometimes think it'd be nice if Pg offered the ability to translate schema to databases, so it runs with a single database and multiple schema, and you connect to a schema, MySQL style. It'd help people who want to use multiple databases on a machine and query between them, though of course it'd do nothing for people who want to do inter-machine or inter-instance queries. That's an interesting idea. Since I am building this app from scratch I suppose I could create different schemas for different applications instead of using different databases. I wonder how rails and active record can deal with that. I'll take a look and see. I am presuming of course that one can query across schemas. most certainly. just prefix any objects or fields with schemaname.objectname. if you don't specify the schemaname it looks in the SEARCH_PATH, which defaults to $user,public -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
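To make that concrete, a small illustration with made-up schema and table names: cross-schema queries just use qualified names, or a search_path set to cover both schemas.

    CREATE SCHEMA app_one;
    CREATE SCHEMA app_two;

    CREATE TABLE app_one.users  (id serial PRIMARY KEY, name text);
    CREATE TABLE app_two.events (id serial PRIMARY KEY, user_id int, created_at timestamptz);

    -- qualify the names explicitly...
    SELECT u.name, e.created_at
      FROM app_one.users  u
      JOIN app_two.events e ON e.user_id = u.id;

    -- ...or rely on the search path
    SET search_path = app_one, app_two, public;
    SELECT name FROM users;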
Re: [GENERAL] Sharing data between databases
On 05/13/2011 11:38 AM, Tim Uckun wrote: That's an interesting idea. Since I am building this app from scratch I suppose I could create different schemas for different applications instead of using different databases. I wonder how rails and active record can deal with that. I'll take a look and see. I am presuming of course that one can query across schemas. Yep, no problem at all with that, schema just let you categorize tables/functions/etc into namespaces. -- Craig Ringer -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general