Re: [GENERAL] any way for a transaction to see inserts done earlier in the transaction?
On Apr 16, 2014, at 4:27 PM, Susan Cassidy susan.cass...@decisionsciencescorp.com wrote:

> Is there any way to let a transaction see the inserts that were done
> earlier in the transaction? I want to insert a row, then later use it
> within the same transaction. If not, I will have to commit after each
> insert, and I don't want to do that until all the rows are added, if I
> can possibly avoid it.

Did you try it? This is already how it works, unless I misunderstand your question…

postgres=# create temporary table foo (i integer primary key);
CREATE TABLE
postgres=# begin;
BEGIN
postgres=# insert into foo values(1);
INSERT 0 1
postgres=# select * from foo;
 i
---
 1
(1 row)
postgres=# commit;
COMMIT

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] any way for a transaction to see inserts done earlier in the transaction?
On Wed, Apr 16, 2014 at 4:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Susan Cassidy susan.cass...@decisionsciencescorp.com writes:
>> Is there any way to let a transaction see the inserts that were done
>> earlier in the transaction?
>
> It works that way automatically, as long as you're talking about
> separate statements within one transaction.
>
> regards, tom lane

On Apr 16, 2014, at 4:53 PM, Susan Cassidy susan.cass...@decisionsciencescorp.com wrote:

> Well, it isn't working for me right now. It can't see a row that was
> inserted earlier in the transaction. It is a new primary key, and when
> I SELECT it, it isn't found.

Can you share the code that does not work with us? Preferably as a small self-contained example.
Re: [GENERAL] Is it safe to stop postgres in between pg_start_backup and pg_stop_backup?
On Apr 2, 2014, at 3:08 PM, Jacob Scott jacob.sc...@gmail.com wrote:

> Hi,

Hello there ;)

> Does upgrading a disk being used by postgres (9.1, on Ubuntu) with the
> following process sound safe?
>
> • pg_start_backup
> • Take a filesystem snapshot (of a volume containing postgres data but
>   not pg_xlog)
> • Bring a new higher performing disk online from snapshot
> • pg_ctl stop
> • switch disks (umount/remount at same mountpoint)
> • pg_ctl start
> • pg_stop_backup
>
> This seems like an odd use case because pg_start_backup is designed for
> performing on-line backups, but I think it will give me minimum downtime.

At the very least you need to move your pg_stop_backup earlier in the process. Online backups do not survive server shutdowns; any backup in progress at shutdown is aborted.

• pg_start_backup
• Take a filesystem snapshot (of a volume containing postgres data but not pg_xlog)
• pg_stop_backup
• pg_ctl stop
• Bring a new higher performing disk online from snapshot
• switch disks (umount/remount at same mountpoint)
• pg_ctl start

Assuming you ensure that your archived xlogs are available to the new instance just as they were to the old, I believe this should work. But run it on a test instance first! It sounds like an odd use case but really it’s no different from “the data is gone, restore from backup” — just that you intentionally trashed the data by switching disks :)

Another option you could consider is rsync. I have often transferred databases by running rsync concurrently with the database to get a “dirty backup” of it. Then once the server is shut down you run a cleanup rsync, which is much faster than the initial run, to ensure that the destination disk is consistent and up to date. This way your downtime is limited to how long it takes rsync to compare the fs trees and fix the inconsistencies.

Good luck!
Re: [GENERAL] Trimming transaction logs after extended WAL archive failures
On Mar 27, 2014, at 5:29 AM, Michael Paquier michael.paqu...@gmail.com wrote:

> On Thu, Mar 27, 2014 at 1:42 AM, Steven Schlansker ste...@likeness.com wrote:
>> On Mar 25, 2014, at 7:58 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:
>>> Yea, vacuum just marks space as available for reuse; it does not
>>> actually free space.
>>
>> I even knew that. Funny what you'll forget when the system is down and
>> you're in a panic. This is actually something that has bit me on more
>> than one occasion -- if you accidentally temporarily use too much space,
>> it is *very* hard to back out of the situation. It seems that the only
>> ways to actually release space to the system are VACUUM FULL, CLUSTER,
>> or to DROP objects. None of these can be executed without severe
>> disruption to a running database. A cluster operation on any of our
>> tables that are large enough to matter can easily run through the night.
>
> Yep, depending on your application needs you could actually avoid any
> periodic VACUUM FULL-like operations that need an exclusive lock on the
> objects they are cleaning by making autovacuum more aggressive. This
> keeps your house cleaner by taking out the garbage at a higher frequency.

Yes, although this will not save you in a couple of notable cases. We have run into this at least by:

* Accidentally leaving transactions open for days
* A runaway process inserting data until the disk fills

So yes, we should autovacuum more, but it is not a total solution.
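For reference, "more aggressive" autovacuum is mostly a postgresql.conf exercise; the values below are illustrative starting points, not recommendations -- tune against your own workload:

```
autovacuum_naptime = 15s               # look for work more often (default 1min)
autovacuum_vacuum_scale_factor = 0.05  # trigger at 5% dead rows (default 0.2)
autovacuum_vacuum_cost_delay = 10ms    # let workers do more work per second (default 20ms)
autovacuum_max_workers = 6             # clean more tables in parallel (default 3)
```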
Re: [GENERAL] Trimming transaction logs after extended WAL archive failures
On Mar 25, 2014, at 7:58 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

> On 03/25/2014 04:52 PM, Steven Schlansker wrote:
>
> Some more questions, what happens when things begin to dawn on me :)
>
> You said the disk filled up entirely with log files, yet currently the
> number (size) of logs is growing.

It’s holding stable now. I tried to vacuum to clean up some space, which turned out to generate more pg_xlog activity than it saved in space, and (I assume) the archiver fell behind and that was the source of the growing log. There haven’t been any new segments since I stopped doing that.

> Yea, vacuum just marks space as available for reuse; it does not
> actually free space.

I even knew that. Funny what you’ll forget when the system is down and you’re in a panic. This is actually something that has bit me on more than one occasion — if you accidentally temporarily use too much space, it is *very* hard to back out of the situation. It seems that the only ways to actually release space to the system are VACUUM FULL, CLUSTER, or to DROP objects. None of these can be executed without severe disruption to a running database. A cluster operation on any of our tables that are large enough to matter can easily run through the night.

I can only keep wishing for a CLUSTER CONCURRENTLY or VACUUM FULL CONCURRENTLY that can run without a temporary copy of the entire table...
Re: [GENERAL] Trimming transaction logs after extended WAL archive failures
On Mar 26, 2014, at 9:04 AM, Jeff Janes jeff.ja...@gmail.com wrote:

> On Tue, Mar 25, 2014 at 6:33 PM, Jeff Janes jeff.ja...@gmail.com wrote:
>> On Tuesday, March 25, 2014, Steven Schlansker ste...@likeness.com wrote:
>>> Hi everyone, I have a Postgres 9.3.3 database machine. Due to some
>>> intelligent work on the part of someone who shall remain nameless, the
>>> WAL archive command included a '> /dev/null 2>&1' which masked archive
>>> failures until the disk entirely filled with 400GB of pg_xlog entries.
>>
>> PostgreSQL itself should be logging the failures to the server log,
>> regardless of whether the archive commands log them themselves.
>>
>>> I have fixed the archive command and can see WAL segments being shipped
>>> off of the server, however the xlog remains at a stable size and is not
>>> shrinking. In fact, it's still growing at a (much slower) rate.
>>
>> The leading edge of the log files should be archived as soon as they
>> fill up, and recycled/deleted two checkpoints later. The trailing edge
>> should be archived upon checkpoints and then recycled or deleted. I
>> think there is a throttle on how many off the trailing edge are archived
>> each checkpoint. So issue a bunch of CHECKPOINT commands for a while and
>> see if that clears it up.

Indeed, forcing a bunch of CHECKPOINTs started to get things moving again.

> Actually my description is rather garbled, mixing up what I saw when
> wal_keep_segments was lowered, not when recovering from a long-lasting
> archive failure. Nevertheless, checkpoints are what provoke the removal
> of excessive WAL files. Are you logging checkpoints? What do they say?
> Also, what is in pg_xlog/archive_status?

I do log checkpoints, but most of them recycle and don’t remove:

Mar 26 16:09:36 prd-db1a postgres[29161]: [221-1] db=,user= LOG: checkpoint complete: wrote 177293 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 56 recycled; write=539.838 s, sync=0.049 s, total=539.909 s; sync files=342, longest=0.015 s, average=0.000 s

That said, after letting the db run / checkpoint / archive overnight, the xlog did indeed start to slowly shrink. The pace at which it is shrinking is somewhat unsatisfying, but at least we are making progress now! I guess if I had just been patient I could have saved some mailing list traffic. But patience is hard when your production database system is running at 0% free disk :)

Thanks everyone for the help; if the log continues to shrink, I should be out of the woods now.

Best,
Steven
[GENERAL] Trimming transaction logs after extended WAL archive failures
Hi everyone,

I have a Postgres 9.3.3 database machine. Due to some intelligent work on the part of someone who shall remain nameless, the WAL archive command included a '> /dev/null 2>&1' which masked archive failures until the disk entirely filled with 400GB of pg_xlog entries.

I have fixed the archive command and can see WAL segments being shipped off of the server, however the xlog remains at a stable size and is not shrinking. In fact, it’s still growing at a (much slower) rate.

I’ve seen references to people just deleting “old” segment files or using pg_resetxlog to fix this situation, however I already know that the response from the mailing list will be “that’s insane, don’t do that”. So what is the correct solution to pursue here? The steady state of the machine should have enough space, I just need to reclaim some of it...

Thanks for any guidance!
Steven
Re: [GENERAL] Trimming transaction logs after extended WAL archive failures
On Mar 25, 2014, at 3:52 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

> On 03/25/2014 01:56 PM, Steven Schlansker wrote:
>> Hi everyone, I have a Postgres 9.3.3 database machine. Due to some
>> intelligent work on the part of someone who shall remain nameless, the
>> WAL archive command included a '> /dev/null 2>&1' which masked archive
>> failures until the disk entirely filled with 400GB of pg_xlog entries.
>> I have fixed the archive command and can see WAL segments being shipped
>> off of the server, however the xlog remains at a stable size and is not
>> shrinking. In fact, it's still growing at a (much slower) rate.
>
> So what is wal_keep_segments set at in postgresql.conf?

5000. There are currently about 18000 WAL segments in pg_xlog.
Re: [GENERAL] Trimming transaction logs after extended WAL archive failures
On Mar 25, 2014, at 4:02 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

> On 03/25/2014 03:54 PM, Steven Schlansker wrote:
>> On Mar 25, 2014, at 3:52 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:
>>> So what is wal_keep_segments set at in postgresql.conf?
>>
>> 5000. There are currently about 18000 WAL segments in pg_xlog.
>
> I guess what I should have also asked previously is what exactly are you
> doing, are you streaming as well as archiving?

Yes, we have both enabled. Here’s some hopefully relevant configuration stanzas and information:

checkpoint_segments = 1024         # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min         # range 30s-1h
checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 8min          # 0 disables

archive_mode = on                  # allows archiving to be done
archive_command = 'rsync -q %p pgbac...@d0028.nessops.net:./wal/prd-db1a/%f'
                                   # command to use to archive a logfile segment
archive_timeout = 0                # force a logfile segment switch after this
                                   # number of seconds; 0 disables

# These settings are ignored on a standby server
max_wal_senders = 10               # max number of walsender processes
wal_keep_segments = 5000           # in logfile segments, 16MB each; 0 disables
vacuum_defer_cleanup_age = 0       # number of xacts by which cleanup is delayed
wal_sender_timeout = 60s           # in milliseconds; 0 disables
synchronous_standby_names = ''     # standby servers that provide sync rep
                                   # comma-separated list of application_name
                                   # from standby(s); '*' = all

# - Standby Servers -
# These settings are ignored on a master server
hot_standby = on                   # "on" allows queries during recovery
max_standby_archive_delay = 30s    # max delay before canceling queries
                                   # when reading WAL from archive;
                                   # -1 allows indefinite delay
max_standby_streaming_delay = 30s  # max delay before canceling queries
                                   # when reading streaming WAL;
                                   # -1 allows indefinite delay
wal_receiver_status_interval = 10s # send replies at least this often
                                   # 0 disables
hot_standby_feedback = off         # send info from standby to prevent
                                   # query conflicts

[root@prd-db1a data]# ls -1 pg_xlog | wc -l
20168

I have verified that WAL segments are being archived to the archive destination, and that the slave is connected and receiving segments.

Thanks!
Re: [GENERAL] Trimming transaction logs after extended WAL archive failures
On Mar 25, 2014, at 4:45 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

> On 03/25/2014 04:17 PM, Steven Schlansker wrote:
>> Yes, we have both enabled. [snip] I have verified that WAL segments are
>> being archived to the archive destination, and that the slave is
>> connected and receiving segments.
>
> Some more questions, what happens when things begin to dawn on me :)
>
> You said the disk filled up entirely with log files, yet currently the
> number (size) of logs is growing.

It’s holding stable now. I tried to vacuum to clean up some space, which turned out to generate more pg_xlog activity than it saved in space, and (I assume) the archiver fell behind and that was the source of the growing log. There haven’t been any new segments since I stopped doing that.

> So did you grow the disk, move the logs or find some way to reduce the
> number?

I used tune2fs to use some of the “reserved” filesystem space temporarily. I was too scared to move log segments away; this is a production database.

> What happened to the server when the disk filled up?

PostgreSQL PANICed due to failed writes:

Mar 25 22:46:41 prd-db1a postgres[18995]: [12-1] db=checkin,user=postgres PANIC: could not write to file "pg_xlog/xlogtemp.18995": No space left on device

> In other words do the log entries at the time show it recovered
> gracefully?

The database is currently up and running, although I do not have much time until it fails again; there are only a few precious GB free.

> If not what did you do to get it running again?

tune2fs, and restarted postgres.

> The concern being that the server is actually fully recovered.

I believe it is. Our production site is back up and running seemingly normally, and the postgres log has no obvious complaining.
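For anyone in the same bind: the tune2fs trick here is lowering ext2/3/4's reserved-blocks percentage, which frees space the filesystem normally holds back for root. A session sketch (the device name is illustrative; use the volume holding your data directory):

```
[root@prd-db1a ~]# tune2fs -m 1 /dev/sda1   # reserve 1% for root instead of the default 5%
[root@prd-db1a ~]# tune2fs -m 5 /dev/sda1   # put the reserve back once recovered
```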
Re: [GENERAL] [JDBC] Cannot insert to 'path' field using EclipseLink
On Mar 12, 2014, at 10:12 AM, Daryl Foster daryl.fos...@oncenter.com wrote:

> java.lang.ClassCastException: org.postgresql.geometric.PGpath cannot be
> cast to org.postgresql.geometric.PGpath

That's a sure sign of ClassLoader confusion. Make sure there is only one copy of the driver jar, in either your application or the JBoss container, but never both.

On Wed, Mar 12, 2014 at 7:37 AM, Daryl Foster daryl.fos...@oncenter.com wrote:

> I have a java app running in JBoss that uses EclipseLink to persist to a
> Postgres database. I've added a field with a 'path' datatype to one of
> the tables but I keep getting the following exception when I try to
> insert data:
>
> org.postgresql.util.PSQLException: Can't infer the SQL type to use for
> an instance of org.postgresql.geometric.PGpath. Use setObject() with an
> explicit Types value to specify the type to use.
>
> Here's the table definition:
>
> CREATE TABLE schema.table_name
> (
>   item_id   uuid NOT NULL,
>   item_path path NOT NULL
> )
>
> The java entity represents the item_path field as a List<Point>, and
> I'm using a converter to map from the List<Point> to a PGpath object:
>
> import org.eclipse.persistence.mappings.DatabaseMapping;
> import org.eclipse.persistence.mappings.converters.Converter;
> import org.eclipse.persistence.sessions.Session;
> import org.postgresql.geometric.PGpath;
> import java.awt.Point;
> import java.util.ArrayList;
> import java.util.List;
> import static java.sql.Types.OTHER;
>
> public class PgPathConverter implements Converter
> {
>     @Override
>     public boolean isMutable ()
>     {
>         return false;
>     }
>
>     @Override
>     public List<Point> convertDataValueToObjectValue (Object value, Session session)
>     {
>         // Code that converts PGpath to List<Point>
>     }
>
>     @Override
>     public PGpath convertObjectValueToDataValue (Object value, Session session)
>     {
>         // Code that converts List<Point> to PGpath
>     }
>
>     @Override
>     public void initialize (DatabaseMapping mapping, Session session)
>     {
>         mapping.getField ().setSqlType (OTHER);
>     }
> }
>
> The entity class is defined as follows:
>
> @Entity
> @Table (
>     name = "table_name",
>     schema = "schema"
> )
> @Converter (
>     name = "path",
>     converterClass = PgPathConverter.class
> )
> public class TableName
> {
>     public TableName () {}
>
>     private static final long serialVersionUID = 1L;
>
>     @Column (name = "item_path")
>     @Convert ("path")
>     private List<Point> m_ItemPath;
>
>     @Id
>     @Column (
>         name = "item_id",
>         unique = true,
>         nullable = false
>     )
>     private UUID m_ItemId;
>
>     public UUID getItemId () { return m_ItemId; }
>     public List<Point> getItemPath () { return m_ItemPath; }
>     public void setItemId (UUID itemId) { m_ItemId = itemId; }
>     public void setItemPath (List<Point> itemPath) { m_ItemPath = itemPath; }
> }
>
> Finally, here's the exception I get when I call
> EntityManager.persist (entity):
>
> 18:10:33,789 ERROR [org.jboss.as.ejb3] (http-/0.0.0.0:8080-1)
> javax.ejb.EJBTransactionRolledbackException: Exception [EclipseLink-4002]
> (Eclipse Persistence Services - 2.5.0.v20130507-3faac2b):
> org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: org.postgresql.util.PSQLException: Can't infer the
> SQL type to use for an instance of org.postgresql.geometric.PGpath. Use
> setObject() with an explicit Types value to specify the type to use.
> Error Code: 0
> Call: INSERT INTO schema.table_name (item_id, item_path) VALUES (?, ?)
>     bind = [2 parameters bound]
>
> 18:10:33,789 ERROR [org.jboss.as.ejb3.invocation] (http-/0.0.0.0:8080-1)
> JBAS014134: EJB Invocation failed on component TableNameRepository for
> method public void com.mycompany.myproject.data.Repository.flush() throws
> javax.persistence.TransactionRequiredException,javax.persistence.PersistenceException:
> javax.ejb.EJBTransactionRolledbackException: Exception [EclipseLink-4002]
> (Eclipse Persistence Services - 2.5.0.v20130507-3faac2b):
> org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: org.postgresql.util.PSQLException: Can't infer the
> SQL type to use for an instance of org.postgresql.geometric.PGpath. Use
> setObject() with an explicit Types value to specify the type to use.
> Error Code: 0
> Call: INSERT INTO schema.table_name (item_id, item_path) VALUES (?, ?)
>     bind = [2 parameters bound]
>     at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleInCallerTx(CMTTxInterceptor.java:138)
[GENERAL] Timing for release with fix for Nov2013ReplicationIssue
Hi everyone,

I’ve seen murmuring on the list regarding https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue

Is there an ETA on a release with the bug fix for this? I’m putting off building from source because I prefer to use the pgdg RPM packages, but if we don’t get a release soon it might force my hand :)

Thanks,
Steven
Re: [GENERAL] Timing for release with fix for Nov2013ReplicationIssue
On Dec 3, 2013, at 2:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Steven Schlansker stevenschlans...@gmail.com writes:
>> I’ve seen murmuring on the list regarding
>> https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue
>> Is there an ETA on a release with the bug fix for this? I’m putting off
>> building from source because I prefer to use the pgdg RPM packages, but
>> if we don’t get a release soon it might force my hand :)
>
> http://www.postgresql.org/message-id/21902.1385241...@sss.pgh.pa.us

Thanks!
Re: [GENERAL] Blowfish Encrypted String
On Sep 26, 2013, at 10:55 AM, Craig Boyd craigbo...@gmail.com wrote:

> Dmitriy,
>
> Thank you very much for the reply! Right... got the data type... but how
> do I actually insert a binary string into the column? What does the SQL
> look like? For the moment assume I have the following bit of binary:
>
> 4PO„âÔ™ä²
>
> What does the SQL look like to insert that into the bytea column?
> (FYI... most of my binary data would be only slightly bigger than what I
> just pasted here. I am not storing Word documents or images, just some
> encrypted strings of text that are not more than 100 characters long and
> typically about 16 to 20.)

http://www.postgresql.org/docs/devel/static/datatype-binary.html

Most client libraries will have functionality to do this nicely (e.g. sending a Java byte[] to a Postgres bytea); it sucks somewhat to do it by hand in SQL.

> On Thu, Sep 26, 2013 at 11:46 AM, Dmitriy Igrishin dmit...@gmail.com wrote:
>> 2013/9/26 Craig Boyd craigbo...@gmail.com
>>> Hello All, I have a string in a program that I have encrypted using
>>> Blowfish and I am now trying to figure out the best way to store that
>>> in PostgreSQL so that I can store it and retrieve it later for
>>> decryption. I have searched around and have not found some good
>>> examples of how to go about this. So my first question is: Can someone
>>> point me to a tutorial or posting that shows how one might do that?
>>> Failing that: What data type should I use to store this?
>>
>> I believe that you should use the bytea datatype.
>>
>>> What does the SQL look like to INSERT/SELECT the field?
>>
>> Just like any other INSERT/SELECT query.
>>
>> --
>> // Dmitriy.
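If you do end up writing the literal by hand, the hex bytea input format (the default output since 9.0) is the least painful route; a session sketch with an invented table and value:

```sql
CREATE TABLE secrets (id serial PRIMARY KEY, blob bytea NOT NULL);

-- hex format: '\x' followed by two hex digits per byte
INSERT INTO secrets (blob) VALUES ('\xdeadbeef'::bytea);

-- encode()/decode() convert between bytea and textual representations
SELECT encode(blob, 'hex') FROM secrets;
```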
Re: [GENERAL] JDBC driver for Postgres 9.3
On Sep 26, 2013, at 6:35 AM, Kohler Manuel manuel.koh...@bsse.ethz.ch wrote:

> Hi, we are developing a Java based software with Postgres as a DB.
> Could someone tell me if there will be a JDBC driver for 9.3 out soon,
> or is it safe and recommended to use the latest JDBC driver available?
> Currently we are using: PostgreSQL 9.2-1000 JDBC4 (build 1000)

I believe the 9.3 JDBC driver is coming soon, but the latest driver works just fine (we use it in production).
Re: [GENERAL] Trying to create DB / user to import some data
On Sep 26, 2013, at 10:28 PM, mdr monosij.for...@gmail.com wrote:

> create user import_dbms_user with password 'import_dbms';
> create database import_dbms_db;
> grant all privileg
>
> However when I try to run psql from the command line:
>
> psql -h localhost -U import_dbms_user -W
>
> I enter the password when prompted:
>
> Password for user import_dbms_user:
> psql: FATAL: database "import_dbms_user" does not exist
>
> But I get the error as above.

By default, psql will try to use the username as the database name if one is not specified. Since your database name is different from the user name, this does not work. You could either name both your user and db import_dbms and have the default work, or specify the database explicitly with -d import_dbms_db.

Hope that helps.
Re: [GENERAL] Deduplication and transaction isolation level
On Sep 25, 2013, at 6:04 AM, Merlin Moncure mmonc...@gmail.com wrote:

> On Tue, Sep 24, 2013 at 10:19 PM, François Beausoleil franc...@teksol.info wrote:
>> Hi all! I import many, many rows of data into a table, from three or
>> more computers, 4 times per hour. I have a primary key, and the query I
>> use to import the data is supposed to dedup before inserting, but I
>> still get primary key violations. The import process is:
>>
>> * Load CSV data into temp table
>> * INSERT INTO dest SELECT DISTINCT (pkey) FROM temp WHERE NOT EXISTS(temp.pkey = dest.pkey)
>>
>> I assumed (erroneously) that this would guarantee no duplicate data
>> could make it into the database. The primary key violations are proving
>> me wrong.
>
> Right. Transactions A and B are interleaved: they both run the same
> check against the same id at the same time. Both checks pass because
> neither transaction is committed. This problem is not solvable by
> adjusting the isolation level.

Are you sure that this is the case? It is my understanding that since 9.1 with SSI (https://wiki.postgresql.org/wiki/SSI), if you set the transaction isolation level to SERIALIZABLE this problem is solved, as the insert will take a predicate lock and the other insert cannot succeed. We use this to detect / resolve concurrent inserts that violate primary keys and it works great.

However, in this case it probably doesn't help the OP because the cost of restarting the entire import is likely too high. Typical solutions might be to:

A. Lock the table while inserting
B. Retry the transaction following an error
C. Import the records to a staging table, then do the deduplication check when moving from the staging table
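Options A and C combined might look roughly like the sketch below; the table and column names (dest, staging, pkey, payload) are invented for illustration:

```sql
BEGIN;
-- SHARE ROW EXCLUSIVE conflicts with itself, so two importers cannot
-- both pass the NOT EXISTS check for the same key at the same time.
LOCK TABLE dest IN SHARE ROW EXCLUSIVE MODE;

INSERT INTO dest (pkey, payload)
SELECT DISTINCT ON (pkey) pkey, payload
FROM staging s
WHERE NOT EXISTS (SELECT 1 FROM dest d WHERE d.pkey = s.pkey);

COMMIT;
```

The lock serializes concurrent imports against each other while still allowing plain readers through, which is usually an acceptable trade for a batch load.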
Re: [GENERAL] ZFS snapshots - to use pg_start_backup() and pg_stop_backup() or no?
On Sep 11, 2013, at 4:29 PM, Gregory Haase haa...@onefreevoice.com wrote:

> I was trying to figure out how to get the following syntax to work:
>
> echo "select pg_start_backup('zfs_snapshot'); \\! zfs snapshot zroot/zpgsql@test; \\ select pg_stop_backup();" | psql postgres

I do:

psql -c "select pg_start_backup('whatever');"
zfs snapshot pool/fs@sn
psql -c "select pg_stop_backup();"

That way there is no need to shell out from psql :)

> The above command successfully starts the backup and creates the
> snapshot, but then fails to stop the backup. I've tried various
> combinations of \ and \\ here with different whitespace and I just
> can't seem to find a combination that works. I don't understand the
> proper use of \\ (described as the separator metacommand).

Keep in mind that echo "\\" will actually only echo '\', because \ is a shell escape as well...

> However, in my research, I noted that a bunch of people seem to just not
> even bother with pg_start_backup/pg_stop_backup and I guess aren't that
> worried about the crash recovery process if they need to perform a
> restore. I also note the omission of the start/stop backup functions
> from the File System Level Backup page:
> http://www.postgresql.org/docs/9.2/static/backup-file.html
>
> Are pg_start_backup() and pg_stop_backup() even necessary?

If all of your Postgres files are part of *the same* consistent snapshot (i.e. are on one FS that gets snapshotted), then the start/stop backup should not be necessary. Restoring it will just look like recovering from a server crash. pg_start_backup is used when you do not have filesystem snapshotting available, and is described in detail on the next manual page: http://www.postgresql.org/docs/9.2/static/continuous-archiving.html
Re: [GENERAL] Best Postgresql books
PostgreSQL 9.0 High Performance: http://www.amazon.com/books/dp/184951030X

On Jul 18, 2013, at 3:11 PM, Pedro Costa pedrocostaa...@sapo.pt wrote:

> Hi guys,
>
> Can anyone tell me the best books about PostgreSQL? Especially about
> tuning and query performance.
>
> Thanks
>
> Sent from my Samsung Note tablet
Re: [GENERAL] backend hangs at sendto() and can't be terminated
On Jul 8, 2013, at 6:48 AM, Jov am...@amutu.com wrote: netstat shows nothing about the socket of the process, so I think the TCP timeout took effect. So it is really weird. Jov blog: http://amutu.com/blog 2013/7/8 Tom Lane t...@sss.pgh.pa.us Merlin Moncure mmonc...@gmail.com writes: On Mon, Jul 8, 2013 at 4:56 AM, Jov am...@amutu.com wrote: my first post already tried pg_terminate_backend but failed: pg_terminate_backend returned t but the backend is still there. possibly a kernel problem? The backend will keep trying to send data until the kernel informs it the connection is lost. (Anything else would be a bad idea.) So the real question here is why it's taking so long for the TCP stack to decide that the client is gone. I'm wondering what exactly you did to kill the psql session. Most ordinary ways of killing a process should result in closure of whatever connections it had open. If you'd lost network connectivity to the client, a TCP timeout on the order of an hour wouldn't be surprising. (If you feel this is too long, you can fool with the TCP keepalive parameters.) But it seems unlikely that that's what's happening here.

Interestingly enough, I am seeing what may (or then again, may not) be a related problem. I have a backend process stuck in a recvfrom --

[root@prd-db2a ~]# strace -tv -p 24402
Process 24402 attached - interrupt to quit
00:02:00 recvfrom(10,

postgres 24402 0.0 10.7 3505628 2639032 ? Ss Jul01 0:21 postgres: event event 10.29.62.21(39485) idle

It is a psql process that I launched from the command line 10 days ago:

steven 24401 0.0 0.0 166824 2532 pts/2 T Jul01 0:00 psql -U event -h prd-db2a.nessops.net -c delete from event where event_id in (select event_id from event where payload is null limit 10);

event=# select pid,application_name,backend_start,waiting,state,query from pg_stat_activity where pid=24402;
  pid  | application_name |         backend_start         | waiting | state |                      query
-------+------------------+-------------------------------+---------+-------+--------------------------------------------------
 24402 | psql             | 2013-07-01 21:03:27.417039+00 | f       | idle  | delete from event where event_id in (select event_id from event where payload is null limit 10);
(1 row)

I invoked it with -c, which supposedly makes it exit when the single command is finished. Many similar queries have been run, and I'd say they run for a half hour on average. But the process has been alive for 10 days now, not blocked, but just idle. It is connected over TCP from the local box (although not through the loopback interface, through a 10.x interface). How does this make any sense? The command seems to be immune to pg_cancel_backend, but pg_terminate_backend did manage to kill it. If this problem is not related, please tell me to shove off, and I will not pollute this thread further. But I hope some of this information is useful. I am running PG 9.2.4, CentOS kernel 2.6.32.360. Best, Steven -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
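Tom's parenthetical about TCP keepalive parameters can be acted on from the PostgreSQL side as well as via kernel sysctls. A hedged sketch; the values below are illustrative, not recommendations:

```sql
-- Probe an idle connection after 60s, re-probe every 10s, and declare it
-- dead after 5 unanswered probes (where the OS supports per-socket keepalives).
-- These can also be set globally in postgresql.conf.
SET tcp_keepalives_idle = 60;
SET tcp_keepalives_interval = 10;
SET tcp_keepalives_count = 5;
```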
Re: [GENERAL] unique constraint violations
On Jun 26, 2013, at 11:04 AM, pg noob pgn...@gmail.com wrote: Hi all, There are some places in our application where unique constraint violations are difficult to avoid due to multithreading. What we've done in most places to handle this is to retry in a loop. Generally it starts by checking if a value already exists, if not - try to insert it, which may cause a unique violation, and if a constraint violation occurs, retrying the process again. The trouble that we have with this approach is twofold. First, it causes errors to show up in the Postgres log about unique constraint violations. These errors are misleading to our support folks because they look like errors but in fact they are handled (usually) by retries in the code and don't cause any real problems. We'd like these to be handled without causing errors to show up in the logs. We have solved this problem by leveraging the new SSI feature in Postgres 9.1 (http://wiki.postgresql.org/wiki/SSI) By running your insert / update inside of a serializable transaction, instead of getting unique failures, you get serialization failures. These are expected and we just retry them. Much less confusing log clutter, and no need to differentiate between expected and unexpected unique violations. Second, in some cases we've done a lot of work on a transaction before hitting a unique constraint violation. If this happens the entire transaction gets rolled back and all the work leading up to the constraint violation has to be redone. As a work around for this, I have proposed the following function which will execute an arbitrary query (with no results returned) and catch a unique constraint violation without causing errors to show up in the postgres log or the transaction to get rolled back. Now what the code does is to call this function with the query to execute. If a unique constraint violation happens, the caller gets a non-zero return code back and this can be handled but the DB transaction does not get rolled back. 
The approach we took wouldn't solve this problem for you. Serialization failures also require restarting the transaction. You could also consider savepoints as a more lightweight way of undoing the bad updates -- http://www.postgresql.org/docs/9.2/interactive/sql-savepoint.html Sounds like you might have a slightly different use case and our approach won't help you much, but I figured I'd mention it. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
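The savepoint route mentioned above can be sketched like this (table and values are hypothetical); only the failed statement's work is discarded, not the whole transaction:

```sql
BEGIN;
-- ... lots of expensive work here ...
SAVEPOINT before_insert;
INSERT INTO widgets (id, name) VALUES (42, 'sprocket');
-- On a unique_violation the application issues:
--   ROLLBACK TO SAVEPOINT before_insert;
-- and can handle the conflict and continue; the earlier work is preserved.
COMMIT;
```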
[GENERAL] Index over only uncommon values in table
Hi everyone,

I assume this is not easy with standard PG, but I wanted to double check. I have a column that has a very uneven distribution of values. ~95% of the values will be the same, with a long tail of another few dozen values. I want to have an index over this value. Queries that select the most common value will not use the index, because it is an overwhelming percentage of the table. This means that ~95% of the disk space and IOPS to maintain the index is wasted.

I cannot use a hardcoded partial index because:
1) The common value is not known at schema definition time, and may change (very slowly) over time.
2) JDBC uses prepared statements for everything, and the value to be selected is not known at statement prepare time, so any partial indices are ignored (this is a really really obnoxious behavior and makes partial indices almost useless combined with prepared statements, sadly…)

The table size is expected to approach the 0.5 billion row mark within the next few months, hence my eagerness to save even seemingly small amounts of per-row costs. Curious if anyone has a good way to approach this problem.

Thanks, Steven -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
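For concreteness, the hardcoded partial index being ruled out would look something like this (table, column, and the common value are hypothetical):

```sql
-- Index only the rare rows; the dominant value never enters the index.
-- As noted above, this breaks down when the common value changes over time.
CREATE INDEX event_rare_version_idx ON event (version)
    WHERE version <> 7;
```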
Re: [GENERAL] Index over only uncommon values in table
On Jun 18, 2013, at 12:23 PM, John R Pierce pie...@hogranch.com wrote: On 6/18/2013 12:17 PM, Steven Schlansker wrote: 1) The common value is not known at schema definition time, and may change (very slowly) over time. how could a value that's constant in 95% of the rows change, unless you added 20 times more rows with a new value (and for a big portion of the time, no value would meet your 95% criteria).

The table is a denormalized version of some packed data. The packed data is constant, but the extractor code changes over time. The value in question is an extractor version used to create this row.

There is a periodic job that attempts to find batches of rows that have fields extracted by an old version of the extractor. These rows are re-extracted from the packed data. So, most of the time the vast majority of rows will have CURRENT_VERSION as their version, and a small percentage of rows will have a previous version. The job will select rows where extracted_version != CURRENT_VERSION. If this query is not indexed, even doing a periodic check whether any rows exist takes an absurd amount of time.

At some point, the code changes, and CURRENT_VERSION gets incremented. Rows then slowly (over a period of days / weeks) get upgraded to the new current version, in batches of thousands. This is what I mean by a very slowly changing mostly-constant value.

Hope that makes sense, Steven -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
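The periodic probe described above amounts to something like the following (table name, column name, and version literal are hypothetical):

```sql
-- Find a batch of rows still carrying an old extractor version.
SELECT id
  FROM event
 WHERE extracted_version IS DISTINCT FROM 12  -- 12 = CURRENT_VERSION today
 LIMIT 5000;
```

Without an index covering the rare versions, this degenerates to a full scan even when it returns nothing.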
Re: [GENERAL] Index over only uncommon values in table
On Jun 18, 2013, at 1:49 PM, David Johnston pol...@yahoo.com wrote: Steven Schlansker-3 wrote At some point, the code changes, and CURRENT_VERSION gets incremented. Rows then slowly (over a period of days / weeks) get upgraded to the new current version, in batches of thousands. This is what I mean by a very slowly changing mostly-constant value. This seems insane without knowing the details. This seems like it would be more of a cache invalidation problem. What percentage of your rows are being updated multiple times without ever being queried for other reasons?

I am open to suggestions of how to do it better. The problem I face is that doing any sort of update in one big go -- whether it be by ALTER TABLE statements or large UPDATE queries -- is all but unworkable. It takes days or weeks depending on what the update is, so any locking causes the entire system to grind to a halt. And there is nothing more depressing than losing 5 days of work on a huge UPDATE because something hiccuped. Hence, allowing outdated versions in the table, which then over time get upgraded in reasonably-sized batches.

I was going to say that table partitioning (INHERITS) seems like a possibility; then I thought maybe not; now I'm back to suggesting you consider it. Every version of the extractor would get its own table. To upgrade you remove the record from the older table and add it to the newer one. Maybe even consider calling these version_upgraded to distinguish them from records originally inserted using the newest version. Or have original version as the partition key and a second current version field that varies. Not sure how the planner would be able to use constraint exclusion to limit the scanning though…

Interesting idea. I have been trying to avoid making code changes require schema changes as well -- it is very nice to not have to make schema changes for every code deployment. The code may get changed multiple times in the same day, if I am busy hacking on it.
Having to muck around with table inheritance and changing partition definitions on code deployments seems unpleasant. Perhaps I am overestimating the work involved, but I am very much trying to keep the deployment process as brain-dead-simple as possible. Thanks for the input. Steven -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
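For reference, the INHERITS layout David is suggesting would look roughly like this (table and version numbers are hypothetical):

```sql
-- One child table per extractor version; the CHECK constraints are what
-- let the planner skip irrelevant children via constraint exclusion.
CREATE TABLE event_v11 (CHECK (extractor_version = 11)) INHERITS (event);
CREATE TABLE event_v12 (CHECK (extractor_version = 12)) INHERITS (event);
-- "Upgrading" a row then means deleting it from event_v11 and
-- re-inserting the re-extracted row into event_v12.
```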
Re: [GENERAL] Index over only uncommon values in table
On Jun 18, 2013, at 2:29 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Jun 18, 2013 at 12:17 PM, Steven Schlansker ste...@likeness.com wrote: Hi everyone, I assume this is not easy with standard PG but I wanted to double check. I have a column that has a very uneven distribution of values. ~95% of the values will be the same, with a long tail of another few dozen values. I want to have an index over this value. Queries that select the most common value will not use the index, because it is an overwhelming percentage of the table. This means that ~95% of the disk space and IOPS to maintain the index is wasted. I cannot use a hardcoded partial index because: 1) The common value is not known at schema definition time, and may change (very slowly) over time. I think this is going to turn into a game of whack-a-mole. There is going to have to be some transition period during which all or most of the rows need to be indexed. So that amount of space is going to have to be available, and given that it is available, what advantage is there to using it some of the time and not using it some of the time? You can't feasibly use it for something else during the off periods, because then you will run into emergency out-of-space situations.

This is a good point. I could define it further to e.g. only take values which make up <10% of the table, for example, which may solve this problem. But from the responses I've been getting it sounds like there's nothing standard that will just solve my problem, and it's not worth the effort to me to cook up something intelligent right now.

2) JDBC uses prepared statements for everything, and the value to be selected is not known at statement prepare time, so any partial indices are ignored (this is a really really obnoxious behavior and makes partial indices almost useless combined with prepared statements, sadly…) What version are you using? This has been improved in 9.2.0.

Thank goodness!
We are in the process of migrating to the 9.2 branch now, so I am thrilled to get this. I'm sorry I didn't mention the version earlier, funny how the simplest things escape your mind when you're trying to make a good mailing list post... Thanks again for all the input. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] upsert functionality
On May 15, 2013, at 11:52 PM, Thomas Kellerer spam_ea...@gmx.net wrote: Sajeev Mayandi, 16.05.2013 07:01: Hi, Our company is planning to move to PostgreSQL. We were initially using Sybase, where upsert functionality was available using an insert-on-existing-update clause. I know there are multiple ways to fix this using a RULE or a separate function in PostgreSQL, but I would like to know which version of PostgreSQL has support for upsert planned using an official syntax. I have PostgreSQL 9.2, which does not have this feature; if it's planned in the near future, I would rather wait to migrate to PostgreSQL. You can use writeable CTEs for this purpose. There are several examples out there:

http://www.xzilla.net/blog/2011/Mar/Upserting-via-Writeable-CTE.html
http://www.depesz.com/2011/03/16/waiting-for-9-1-writable-cte/
http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/
http://stackoverflow.com/a/8702291/330315

One thing I didn't see mentioned in two of the links -- they mention race conditions, where multiple writers can still cause the faked UPSERT to fail. This can be avoided using SERIALIZABLE transactions, now that Postgres has SSI. http://wiki.postgresql.org/wiki/SSI I can vouch that we use writable CTEs and SERIALIZABLE to implement UPSERT in production with no issues thus far. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
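A minimal writable-CTE upsert of the kind those links describe, combined with the SERIALIZABLE guard mentioned above (table and values are hypothetical):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
WITH updated AS (
    UPDATE counters SET hits = hits + 1
     WHERE name = 'home'
 RETURNING name
)
INSERT INTO counters (name, hits)
SELECT 'home', 1
 WHERE NOT EXISTS (SELECT 1 FROM updated);
COMMIT;
-- Under SSI, a concurrent race surfaces as a serialization failure
-- (SQLSTATE 40001) rather than a duplicate-key error; the application retries.
```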
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On May 10, 2013, at 7:14 AM, Matt Brock m...@mattbrock.co.uk wrote: Hello. We're intending to deploy PostgreSQL on Linux with SSD drives which would be in a RAID 1 configuration with Hardware RAID. My first question is essentially: are there any issues we need to be aware of when running PostgreSQL 9 on CentOS 6 on a server with SSD drives in a Hardware RAID 1 configuration? Will there be any compatibility problems (seems unlikely)? Should we consider alternative configurations as being more effective for getting better use out of the hardware? The second question is: are there any SSD-specific issues to be aware of when tuning PostgreSQL to make the best use of this hardware and software? A couple of things I noticed with a similar-ish setup: * Some forms of RAID / LVM break the kernel's automatic disk tuning mechanism. In particular, there is a rotational tunable that often does not get set right. You might end up tweaking read ahead and friends as well. http://www.mjmwired.net/kernel/Documentation/block/queue-sysfs.txt#112 * The default Postgres configuration is awful for a SSD backed database. You really need to futz with checkpoints to get acceptable throughput. The PostgreSQL 9.0 High Performance book is fantastic and is what I used to great success. * The default Linux virtual memory configuration is awful for this configuration. Briefly, it will accept a ton of incoming data, and then go through an awful stall as soon as it calls fsync() to write all that data to disk. We had multi-second delays all the way through to the application because of this. We had to change the zone_reclaim_mode and the dirty buffer limits. http://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com I am not sure that these numbers will end up being anywhere near what works for you, but these are my notes from tuning a 4xMLC SSD RAID-10. I haven't proven that this is optimal, but it was way better than the defaults. 
We ended up with the following list of changes:

* Change IO scheduler to noop
* Mount DB volume with nobarrier, noatime
* Turn blockdev readahead to 16MiB
* Turn sdb's rotational tuneable to 0

PostgreSQL configuration changes:

synchronous_commit = off
effective_io_concurrency = 4
checkpoint_segments = 1024
checkpoint_timeout = 10min
checkpoint_warning = 8min
shared_buffers = 32gb
temp_buffers = 128mb
work_mem = 512mb
maintenance_work_mem = 1gb

Linux sysctls:

vm.swappiness = 0
vm.zone_reclaim_mode = 0
vm.dirty_bytes = 134217728
vm.dirty_background_bytes = 1048576

Hope that helps, Steven -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On May 10, 2013, at 11:38 AM, Merlin Moncure mmonc...@gmail.com wrote: PostgreSQL configuration changes: synchronous_commit = off that's good info, but it should be noted that synchronous_commit trades a risk of some data loss (but not nearly as much risk as volatile storage) for a big increase in commit performance. Yes, that is a choice we consciously made. If our DB server crashes losing the last few ms worth of transactions is an acceptable loss to us. But that may not be OK for everyone :-) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID
On May 10, 2013, at 11:35 AM, Lonni J Friedman netll...@gmail.com wrote: I am not sure that these numbers will end up being anywhere near what works for you, but these are my notes from tuning a 4xMLC SSD RAID-10. I haven't proven that this is optimal, but it was way better than the defaults. We ended up with the following list of changes:

* Change IO scheduler to noop
* Mount DB volume with nobarrier, noatime
* Turn blockdev readahead to 16MiB
* Turn sdb's rotational tuneable to 0

PostgreSQL configuration changes:

synchronous_commit = off
effective_io_concurrency = 4
checkpoint_segments = 1024
checkpoint_timeout = 10min
checkpoint_warning = 8min
shared_buffers = 32gb
temp_buffers = 128mb
work_mem = 512mb
maintenance_work_mem = 1gb

Linux sysctls:

vm.swappiness = 0
vm.zone_reclaim_mode = 0
vm.dirty_bytes = 134217728
vm.dirty_background_bytes = 1048576

Can you provide more details about your setup, including:

* What kind of filesystem are you using?
* Linux distro and/or kernel version
* hardware (RAM, CPU cores, etc.)
* database usage patterns (% writes, growth, etc.)

Yes, as long as you promise not to just use my configuration without doing proper testing on your own system, even if it seems similar!

Linux version 2.6.32.225 (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)) #2 SMP Thu Mar 29 16:43:20 EDT 2012
DMI: Supermicro X8DTN/X8DTN, BIOS 2.1c 10/28/2011
CPU0: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz stepping 02
Total of 24 processors activated (140796.98 BogoMIPS). (2 sockets x 6 cores x 2 hyperthreads)
96GB ECC RAM
Filesystem is ext4 on LVM on hardware RAID 1+0, Adaptec 5405

Database is very much read heavy, but there is a base load of writes and bursts of much larger writes. I don't have specifics regarding how it breaks down. The database is about 400GB and is growing moderately, maybe a few GB/day. More of the write traffic is re-writes rather than new writes.
Hope that helps, Steven -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Simple SQL INSERT to avoid duplication failed: why?
On May 1, 2013, at 9:36 AM, Carlo Stonebanks stonec.regis...@sympatico.ca wrote: I have to ask myself, is it more likely that I have discovered some PG anomaly in 9.0 that no one has ever noticed, or that the client has accidentally launched the process twice and doesn't know it? Given my (admittedly limited) experience with the PostgreSQL project and dealing with clients, the latter ;-) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Simple SQL INSERT to avoid duplication failed: why?
On Apr 30, 2013, at 4:00 PM, Carlo Stonebanks stonec.regis...@sympatico.ca wrote: Hi Tom, There's nothing obviously wrong with that, which means the issue is in something you didn't show us. Care to assemble a self-contained example? Unfortunately, it happens erratically and very, very rarely, so I can't give you something that will fail. I expected an occasional failure and there is a try-catch to handle it; I was just surprised when the client told me there were no other apps running against this table. I just wanted to make sure the logic was correct and that I wasn't doing something stupid, or that there is some known SQL or PG behaviour that would explain this. The only way I can see this happening is that an insert into acache_mdx_logic_address_validation sneaks in before the insert and after the NOT EXISTS... SELECT. And for that to occur, the client must be mistaken and something else MUST be running and inserting into acache_mdx_logic_address_validation. Would you agree, or is there anything else to consider?

INSERT INTO mdx_lib.acache_mdx_logic_address_validation (address, postal_code, address_id)
SELECT '306 station 22 1 2 st' AS address, '29482' AS postal_code, 100165016 AS address_id
WHERE NOT EXISTS (
    SELECT 1 FROM mdx_lib.acache_mdx_logic_address_validation
    WHERE address = '306 station 22 1 2 st' AND postal_code = '29482'
)

I know you said that it was the only active application at the time, but I figured I'd share my experience anyway… We do a somewhat more involved version of this to provide fake UPSERT functionality, and this failure mode happened more often than we'd guessed it would (due to concurrent updates). However, new in 9.1 is SSI -- http://wiki.postgresql.org/wiki/Serializable -- which means that if you run this transaction at isolation level SERIALIZABLE you will get serialization failures instead of duplicate key exceptions, which makes it easy to retry until success. So now we run any code that looks like this at SERIALIZABLE and are very happy with it.
Not sure if that'll help, but hopefully! Best, Steven -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: April 25, 2013 5:06 PM To: Carlo Stonebanks Cc: pgsql-general@postgresql.org Subject: Re: Simple SQL INSERT to avoid duplication failed: why? Carlo Stonebanks stonec.regis...@sympatico.ca writes: Ok, I tried to be clever and I wrote code to avoid inserting duplicate data. The calling function has a try-catch to recover from this, but I am curious as to why it failed: There's nothing obviously wrong with that, which means the issue is in something you didn't show us. Care to assemble a self-contained example? regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
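Concretely, the pattern Steven describes is to wrap the guarded INSERT from the post in a serializable transaction and retry on failure, e.g.:

```sql
-- Same guarded INSERT, but run under SSI; on SQLSTATE 40001
-- (serialization_failure) the application retries the whole transaction.
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO mdx_lib.acache_mdx_logic_address_validation (address, postal_code, address_id)
SELECT '306 station 22 1 2 st', '29482', 100165016
WHERE NOT EXISTS (
    SELECT 1 FROM mdx_lib.acache_mdx_logic_address_validation
     WHERE address = '306 station 22 1 2 st' AND postal_code = '29482'
);
COMMIT;
```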
[GENERAL] Using an index for IS DISTINCT FROM queries
Hi everyone,

I have a large table (~150M rows) that keeps a version field. At any given time, it is expected that the vast majority of the rows are on the current version, but some may be missing. To figure out all the missing or outdated values, I run a query along the lines of

SELECT id FROM source_table LEFT OUTER JOIN dest_table WHERE version IS DISTINCT FROM current-version

However, this query always selects a sequential scan and hash of both tables, which is *very* slow. The statistics reflect that the planner knows that current-version is overwhelmingly common:

 schemaname | tablename |     attname      | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
------------+-----------+------------------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------
 public     | event     | unpacker_version | f         |         0 |         4 |          1 | {1}              | {1}               |                  |           1

but it doesn't help:

event=# explain select count(1) from event where unpacker_version is distinct from 1;
                            QUERY PLAN
------------------------------------------------------------------
 Aggregate  (cost=10658841.93..10658841.94 rows=1 width=0)
   ->  Seq Scan on event  (cost=0.00..10658841.93 rows=1 width=0)
         Filter: (unpacker_version IS DISTINCT FROM 1)
(3 rows)

I can by hand force the planner to consider the obvious solution (find values below, above, and null) but it is ugly:

event=# explain select count(1) from event where unpacker_version < 1 or unpacker_version > 1 or unpacker_version is null;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Aggregate  (cost=139.14..139.15 rows=1 width=0)
   ->  Bitmap Heap Scan on event  (cost=135.13..139.14 rows=1 width=0)
         Recheck Cond: ((unpacker_version < 1) OR (unpacker_version > 1) OR (unpacker_version IS NULL))
         ->  BitmapOr  (cost=135.13..135.13 rows=1 width=0)
               ->  Bitmap Index Scan on event_unpacker_version_idx  (cost=0.00..45.04 rows=1 width=0)
                     Index Cond: (unpacker_version < 1)
               ->  Bitmap Index Scan on event_unpacker_version_idx  (cost=0.00..45.04 rows=1 width=0)
                     Index Cond: (unpacker_version > 1)
               ->  Bitmap Index Scan on event_unpacker_version_idx  (cost=0.00..45.04 rows=1 width=0)
                     Index Cond: (unpacker_version IS NULL)
(10 rows)

but it sucks to do such tuning by munging the query. Is there some case where these are not equivalent? If they are equivalent, would this be a reasonable case for the planner to consider and optimize by itself?

Thanks, Steven -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_restore from split files
On Apr 22, 2013, at 12:47 PM, akp geek akpg...@gmail.com wrote: pg_dump dbname -n schemaname -t table_name -Fc | split -b 500m - table.dump.

Since you split the files outside of the Postgres world, you have to combine them again. Roughly,

cat table.dump.* > table.dump.combined
pg_restore --usual-arguments table.dump.combined

On Mon, Apr 22, 2013 at 3:41 PM, Igor Neyman iney...@perceptron.com wrote: How exactly did you create "split" dump? Igor Neyman From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of akp geek Sent: Monday, April 22, 2013 3:36 PM To: pgsql-general Subject: [GENERAL] pg_restore from split files Hi All - I have created a dump of a big table into 5 split files. What is the procedure to restore them using pg_restore? I am getting the following error: pg_restore: [custom archiver] could not read from input file: end of file pg_restore: *** aborted because of error Appreciate your help. Regards -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Queries seldomly take 4s while normally take 1ms?
On Apr 9, 2013, at 11:25 AM, Scott Marlowe scott.marl...@gmail.com wrote: One of the most common causes I've seen for this is Linux's vm.*dirty* settings getting in the way. Like so many Linux kernel optimizations this one looks good on paper but gives at best middling improvements, with occasional IO storms that block everything else. On big-memory machines doing a lot of write IO I just set these to 0. I also tend to turn off swap, as it's known to get in the way as well. Settings for /etc/sysctl.conf:

vm.dirty_background_ratio = 0
vm.dirty_ratio = 0

I'll +1 on the "you have to tune your Linux install" advice. I found the PostgreSQL 9.0 High Performance book to be worth its weight in gold. A few days spent with the book and research on mailing lists improved our PostgreSQL performance multiple times over, and responsiveness under load by orders of magnitude. http://www.amazon.com/PostgreSQL-High-Performance-Gregory-Smith/dp/184951030X

On Tue, Apr 9, 2013 at 3:41 AM, Christian Hammers c...@lathspell.de wrote: Hello, I have a setup with one master and two slaves which are used by a closed source application. The database is asked the same query, a stored procedure, with different parameters about 4 million times per day, at a peak rate of 150 times per second, using 10 parallel connections. The slaves are decent Dell servers with 64GB RAM, dual hexacore CPUs and RAID10. Usually this stored procedure takes 1ms, as it basically just does two selects against a GIST index for a prefix_range type. Rarely though, about 1-3 times per day, one of these queries takes up to 4000ms! All those queries also run in 1ms when executed manually some time later. Queries with similar parameters (which I suppose use the same area of the index) also continue to run fast during that time. Queries with different parameters which are running in parallel on different threads take 1ms, too, so it's not a general load problem.
Cronjobs and other applications seem quiet during that time; there is no peak in any of our monitoring graphs. Automatic vacuum/analyze log entries on the master are not near the timestamps in question. So my problem seems to be neither the query itself nor the way I indexed my data, but what could it be? Some strange effect of streaming replication or cache invalidation? Apologies for not giving you a reproducible problem, but maybe you still have some ideas. I'm just curious, as I've never seen such an effect during my MySQL years :-) The queries contain obvious customer data, so I'm reluctant to give examples, but again I doubt that an explain plan will help if only 1 out of 4E6 queries takes too long. bye, -christian- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- To understand recursion, one must first understand recursion. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Testing Technique when using a DB
On Mar 12, 2013, at 8:09 PM, Joe Van Dyk j...@tanga.com wrote: On Mar 12, 2013, at 8:42 AM, Perry Smith pedz...@gmail.com wrote: The other thought is perhaps there is a snapshot type concept. I don't see it in the list of SQL commands. A snapshot would do exactly what it sounds like. It would take a snapshot and save it somehow. Then a restore-to-snapshot would restore the DB back to that state. This would be super super super awesome, but it doesn't exist as far as I know. This would be a permanent snapshot that could be easily and quickly restored. I wonder if it would be possible to make an extension that made this easy to do. For what it's worth, I have something very similar to that for writing JUnit tests in Java. While that might not be much help to the original poster's Ruby environment, the code is available as open source for anyone who thinks this is an interesting idea. https://github.com/NessComputing/components-ness-pg We got tired of maintaining a Postgres instance on every development box and trying to make sure they had reasonably close versions, so I hacked this together. It's fairly specific to the infrastructure we use, but if there is any interest I could spruce it up a bit, write some documentation, and make it more stand-alone. Short description:
* Drop your SQL files in the classpath somewhere (src/test/resources for Maven folk)
* Provides an embedded PostgreSQL instance that you can spin up and down on demand -- builds via shell script, auto-detects platform
* The SQL files are loaded into a Postgres template database; cloning databases from the template can then be done relatively cheaply and quickly
* A fresh database is prepared in a background thread so one is always immediately available to test cases
* Provides a handy JUnit @Rule so you can mix it in to test cases easily
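The template-database trick described above can be sketched in plain SQL. All names here are hypothetical; the key point is that `CREATE DATABASE ... TEMPLATE` performs a cheap file-level copy (and requires that nobody is connected to the template at the time):

```sql
-- Load the schema and fixtures once into a template database
-- (run your SQL files against test_template after creating it).
CREATE DATABASE test_template;

-- Optionally mark it as a template and forbid direct connections,
-- so it cannot be modified accidentally between test runs.
UPDATE pg_database
   SET datistemplate = true, datallowconn = false
 WHERE datname = 'test_template';

-- Each test run then clones it; this is much faster than
-- replaying the SQL files from scratch.
CREATE DATABASE test_run_1 TEMPLATE test_template;
```

After a test run, dropping `test_run_1` and cloning a fresh copy gives the "restore to snapshot" behavior the original poster asked about, at the granularity of whole databases.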
Re: [GENERAL] best config
On Feb 6, 2013, at 8:14 AM, Roberto Scattini roberto.scatt...@gmail.com wrote: hi list, we have two new dell poweredge r720. based on recommendations from this list we have configured the five disks in raid10 + 1 hot spare. You might mention a bit more about how your drives are configured. 5 drives in a RAID1+0 sounds odd to me. now we are looking for advice on the postgres installation for our setup. we have two databases: one for a lot of small apps, and one for one big app with a lot of data and a lot of usage. we want to use streaming replication to have a functional copy of the databases in case of a failure. one idea is to have one database running on each server, and then have another instance of the other database running via streaming replication (i mean, crossed replication). the other idea is to have both databases running on one server and back everything up on the other with streaming replication. which alternative would you use? I would not introduce the complexity of having each server be master for half of the data unless you can show that this improves some metric you care a lot about. Any failure or maintenance event will revert you back to the common configuration -- back to having both masters on one system -- until you do another promotion back to the cross-wired setup. Extra work without a proposed gain. Plus then you can get away with half as many Postgres installs to maintain.
Re: [GENERAL] best config
On Feb 6, 2013, at 9:55 AM, Roberto Scattini roberto.scatt...@gmail.com wrote: hi steven, we have two new dell poweredge r720. based on recommendations from this list we have configured the five disks in raid10 + 1 hot spare. You might mention a bit more about how your drives are configured. 5 drives in a RAID1+0 sounds odd to me. i mean, 4 disks in raid10, plus one disk as hot spare. also, it wasn't this list where this setup was recommended, it was in debian-user. That makes a lot more sense. Nothing wrong with that setup :-) now we are looking for advice on the postgres installation for our setup. we have two databases: one for a lot of small apps, and one for one big app with a lot of data and a lot of usage. we want to use streaming replication to have a functional copy of the databases in case of a failure. one idea is to have one database running on each server, and then have another instance of the other database running via streaming replication (i mean, crossed replication). the other idea is to have both databases running on one server and back everything up on the other with streaming replication. which alternative would you use? I would not introduce the complexity of having each server be master for half of the data unless you can show that this improves some metric you care a lot about. Any failure or maintenance event will revert you back to the common configuration -- back to having both masters on one system -- until you do another promotion back to the cross-wired setup. Extra work without a proposed gain. Plus then you can get away with half as many Postgres installs to maintain. ok. we chose this crossed-replication config because a heavy query on one of the databases wouldn't affect the performance of the other. Both of your servers need to be powerful enough to handle the whole load; otherwise your replication setup will not continue to function acceptably when one of the servers is offline due to a crash or maintenance. 
I don't think there is anything necessarily wrong with your proposal; I am just pointing out that simplicity is better than complexity unless you can prove (say, in a test environment) that your application performs enough better to justify the administrative cost of this cross-wired setup.
[GENERAL] Using partial index in combination with prepared statement parameters
Hi, It's been covered a few times in the past, http://www.postgresql.org/message-id/banlktimft4ohqkb6y7m4wqrffpwnutp...@mail.gmail.com http://postgresql.1045698.n5.nabble.com/partial-indexes-not-used-on-parameterized-queries-td2121027.html but in a nutshell, partial indices do not play nicely with prepared statements, because whether the index is applicable cannot be known at query plan time. I am curious whether there has been any development on this? I have looked around but don't see any activity (whether Simon's "I have a patch but don't hold your breath" or otherwise). I ask largely because we use prepared statements everywhere, and this makes it very hard to use partial indices, which would offer us significant performance gains. Does anyone know of any acceptable workaround? Is there continued interest in improving the PostgreSQL behavior in this case? Thanks! Steven Schlansker
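A minimal sketch of the problem being described (table and index names are made up for illustration): the planner cannot prove at PREPARE time that an arbitrary parameter value will satisfy the index predicate, so a generic plan cannot use the partial index.

```sql
-- Hypothetical table with a partial index covering only the "hot" rows.
CREATE TABLE orders (id bigint, status text, total numeric);
CREATE INDEX orders_open_idx ON orders (id) WHERE status = 'open';

-- With the literal inline, the planner can match the predicate
-- and is free to use orders_open_idx:
EXPLAIN SELECT * FROM orders WHERE status = 'open' AND id = 42;

-- A prepared statement must plan for any possible $1, and
-- "status = $1" cannot be proven to imply "status = 'open'",
-- so the partial index is unusable in the generic plan:
PREPARE get_orders(text, bigint) AS
  SELECT * FROM orders WHERE status = $1 AND id = $2;
EXPLAIN EXECUTE get_orders('open', 42);
```

(Later PostgreSQL releases can replan prepared statements with the actual parameter values as "custom plans," which mitigates this; the behavior described in this thread is that of the releases current at the time.)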
Re: [GENERAL] Combine Date and Time Columns to Timestamp
On Jan 18, 2013, at 4:26 PM, Rich Shepard rshep...@appl-ecosys.com wrote: On Fri, 18 Jan 2013, Adrian Klaver wrote: How are they stored, as date and time type, strings, other? Adrian, ISO date and time. A sample of the data would help also. Example: 2012-10-29 | 10:19 | 2012-10-30 | 09:40

steven=# create temporary table date_test (d varchar, t varchar);
CREATE TABLE
steven=# insert into date_test values('2010-08-23', '8:04:33');
INSERT 0 1
steven=# select d::date + t::interval from date_test;
      ?column?
---------------------
 2010-08-23 08:04:33
(1 row)
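In case it helps, casting the time column to `time` rather than `interval` should work equally well here, since adding a time to a date yields a timestamp in PostgreSQL. A sketch against the same date_test table as above:

```sql
-- date + time => timestamp, so this is equivalent to the
-- d::date + t::interval form shown in the session above.
SELECT d::date + t::time AS ts FROM date_test;
```

The `time` cast also rejects values that are not valid times of day (e.g. '25:00'), which the `interval` cast would silently accept.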
Re: [GENERAL] Noticed something odd with pgbench
On Nov 16, 2012, at 11:59 AM, Richard Huxton d...@archonet.com wrote: On 16/11/12 19:35, Shaun Thomas wrote: Hey guys, So, we have a pretty beefy system that runs dual X5675's with 72GB of RAM. After our recent upgrade to 9.1, things have been... odd. I managed to track it down to one setting: shared_buffers = 8GB Does this match what you're seeing? http://frosty-postgres.blogspot.co.uk/2012/08/postgresql-numa-and-zone-reclaim-mode.html (Slightly OT from the OP's question, sorry) Would this be worth referencing in the PostgreSQL documentation? I feel like I've read a lot of the documentation on Postgres tuning but this is news to me. And surprise surprise I'm running an affected system too! Maybe it deserves a more prominent warning, perhaps at http://www.postgresql.org/docs/9.2/interactive/performance-tips.html
Re: [GENERAL] Confirming \timing output
On Aug 23, 2012, at 11:13 AM, Gauthier, Dave dave.gauth...@intel.com wrote: With \timing set on, I run an update statement and it reports Time: 0.524 ms Is that really 0.524 ms? As in 524 nanoseconds? 0.524ms = 524000ns Perhaps you meant microseconds? 0.524ms = 524us If all your data happens to be in RAM cache, simple queries can execute very fast! Unless you have a reason to believe it's wrong, I would trust it to be accurate :-) Also, is this wallclock time or some sort of indication of how much cpu it took? Thanks for any answers ! \timing measures wall time. There's a more detailed discussion of the difference between this and e.g. EXPLAIN ANALYZE here: http://postgresql.1045698.n5.nabble.com/What-does-timing-measure-td4289329.html
Re: [GENERAL] Ignore hash indices on replicas
On Aug 19, 2012, at 8:01 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Tue, Jul 10, 2012 at 1:09 AM, Steven Schlansker ste...@likeness.com wrote: I'm using Postgres hash indices on a streaming replica master. As is documented, hash indices are not logged, so the replica does not have access to them. I understand that the current wisdom is don't use hash indices, but (unfortunately?) I have benchmarks that show that our particular application is faster by quite a bit when a hash index is available. You could use a slony slave and have different indexes etc between master and slave but it's more complex to setup, maintain and monitor for most people. Thanks for the suggestion, but we finally have replication working in a way we understand / like and I don't really consider this a viable option. The built-in replication has been treating us very well.
Re: [GENERAL] Ignore hash indices on replicas
On Aug 19, 2012, at 2:37 PM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote: I understand that the current wisdom is don't use hash indices, but (unfortunately?) I have benchmarks that show that our particular application is faster by quite a bit when a hash index is available. Can you publish the results somewhere? It might provoke some interest. I might be able to spend some time looking at making this public, but the general parameters are: 122M rows, lookup key is a UUID type. Lookups are ~1000 random keys at a time (as in, a giant SELECT * FROM table WHERE key IN (?,?,?,?,…)). I assume that fixing the hash index logging issue hasn't been a priority due to low interest / technical limitations, but I'm curious about a stopgap measure -- can we somehow configure Postgres to ignore hash indices on a replica, using other b-tree indices or even a sequential scan? I know I can do this on a per-connection basis by disabling various index lookup methods, but it'd be nice if it just ignored invalid indices on its own. This might work for you: http://sigaev.ru/git/gitweb.cgi?p=plantuner.git;a=blob;hb=HEAD;f=README.plantuner Thanks for the link; that looks interesting. It is a bit unfortunate that I would have to find and exclude indices manually, but very doable...
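For reference, the per-connection workaround mentioned above is a blunt instrument: PostgreSQL has no hash-index-specific switch, so the closest session-level knobs discourage index scans across the board. A sketch (these are real planner settings, but note they are costing hints, not hard prohibitions):

```sql
-- On the replica's connections, steer the planner away from
-- index and bitmap scans entirely; it will fall back to
-- sequential scans even where a (broken) hash index exists.
SET enable_indexscan = off;
SET enable_bitmapscan = off;
```

This penalizes b-tree lookups too, which is exactly why something like plantuner, which can exclude specific named indices, is the more surgical option.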
[GENERAL] Postgres will not compile on Mac 10.8 with contrib/uuid-ossp
It is not possible to compile Postgres contrib/uuid-ossp on the newest release of Mac OS X, 10.8. The specific compile error:

make -C uuid-ossp install
/bin/sh ../../config/install-sh -c -d '/usr/local/Cellar/postgresql/9.1.3/lib'
/usr/bin/clang -Os -w -pipe -march=native -Qunused-arguments -I/usr/local/Cellar/ossp-uuid/1.6.2/include -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -I. -I. -I../../src/include -I/usr/local/Cellar/readline/6.2.2/include -I/usr/include/libxml2 -I/usr/include/libxml2 -c -o uuid-ossp.o uuid-ossp.c
In file included from uuid-ossp.c:27:
In file included from /usr/local/Cellar/ossp-uuid/1.6.2/include/uuid.h:38:
/usr/include/unistd.h:689:26: error: expected identifier
    int gethostuuid(uuid_t, const struct timespec *) __OSX_AVAILABLE_STARTING(__MAC_10_5, __IPHONE_2_0);
In file included from uuid-ossp.c:27:
/usr/local/Cellar/ossp-uuid/1.6.2/include/uuid.h:94:24: error: typedef redefinition with different types ('struct uuid_st' vs '__darwin_uuid_t' (aka 'unsigned char [16]'))
    typedef struct uuid_st uuid_t;
/usr/include/uuid/uuid.h:42:25: note: previous definition is here
    typedef __darwin_uuid_t uuid_t;
In file included from uuid-ossp.c:27:
/usr/local/Cellar/ossp-uuid/1.6.2/include/uuid.h:107:22: error: conflicting types for 'uuid_compare'
    extern uuid_rc_t uuid_compare (const uuid_t *_uuid, const uuid_t *_uuid2, int *_result);
/usr/include/uuid/uuid.h:59:5: note: previous declaration is here
    int uuid_compare(const uuid_t uu1, const uuid_t uu2);
3 errors generated.
make[2]: *** [uuid-ossp.o] Error 1
make[1]: *** [install-uuid-ossp-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....

Full log available here (not mine, but I have the exact same problem): https://gist.github.com/2287209 There is a workaround (disable uuid-ossp), but obviously this does not help those of us who require UUID support. 
There are a number of bugs open: https://github.com/mxcl/homebrew/issues/13639 https://trac.macports.org/ticket/35153 and a proposed fix, which does fix the problem for me: "This is caused by the inclusion of the system uuid.h in pwd.h. Just add the line #define _XOPEN_SOURCE at the first line of uuid-ossp.c, and it should compile fine." I am not sure what the correct fix is, but adding the above definition does fix the issue.
Re: [GENERAL] Postgres will not compile on Mac 10.8 with contrib/uuid-ossp
On Jul 30, 2012, at 7:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Steven Schlansker ste...@likeness.com writes: It is not possible to compile Postgres contrib/uuid-ossp on the newest release of Mac OS X, 10.8 This looks like some variant of the same issue that OSSP's uuid package has had on Macs all along, to wit an unreliable method for avoiding conflicts with any OS-supplied uuid type. Back when we last looked at this, http://archives.postgresql.org/pgsql-hackers/2007-11/msg00551.php we concluded that what we had to do was not include unistd.h ahead of uuid.h, and as far as I can tell we're still not doing that; for me, contrib/uuid-ossp still builds fine on Lion with uuid 1.6.2. Looks right to me. So assuming you're using a stock copy of 1.6.2, (I am) that leaves unistd.h as the only variable in the equation that could have changed. It's not obvious from here exactly how it changed, but in any case this fight is ultimately between OSSP uuid and OS X; there's not a lot Postgres can (or should) do to fix it. It's a bit distressing that OSSP doesn't seem to have made any new uuid releases since 2008, but maybe if you prod them they'll do something about this. regards, tom lane I've sent a message upstream. It seems that their bug tracker and forum are either nonexistent or so broken that they might as well be. So I will work on the assumption that this bug won't get fixed upstream… Would it be reasonable to include the _XOPEN_SOURCE define in the contrib module? It at least fixes this issue and hopefully wouldn't cause any ill effects. There's a patch at https://github.com/stevenschlansker/homebrew/commit/d86e17cbcc5d287d7a393a6754aa8f94b995c5ea It's not terribly extensively tested, but it does compile, and I would be very surprised if it caused problems.
Re: [GENERAL] Replication/cloning: rsync vs modification dates?
I think it's pretty easy to show that timestamp+size isn't good enough to do this 100% reliably. Imagine that your timestamps have a millisecond resolution. (I assume this will vary based on OS / filesystem, but the point remains the same no matter what the resolution is.) You can have multiple writes occur in the same quantized instant. If the prior rsync just happened to catch the first write (at T+0.1ms) in that instant but not the second (which happened at T+0.4ms), the second may not be transferred -- but the modification time is the same for the two writes. All that said, I think the chances of this actually happening are vanishingly small. I personally use rsync without checksums and have had no problems. On Jul 16, 2012, at 2:42 PM, Chris Angelico wrote: On Tue, Jul 17, 2012 at 4:35 AM, Sergey Konoplev sergey.konop...@postgresql-consulting.com wrote: On Mon, Jul 16, 2012 at 8:01 PM, Chris Angelico ros...@gmail.com wrote: On Tue, Jul 17, 2012 at 1:58 AM, Michael Nolan htf...@gmail.com wrote: As I understand the docs for rsync, it will use both mod time and file size if told not to do checksums. I wonder if it is correct in general to use mtime and size to perform these checks from the point of view of PostgreSQL. If it works with the current version, then is there a guarantee that it will work with future versions? That was my exact question. Ideally, I'd like to hear from someone who works with the Postgres internals, but the question may not even be possible to answer. ChrisA
[GENERAL] Ignore hash indices on replicas
I'm using Postgres hash indices on a streaming replica master. As is documented, hash indices are not logged, so the replica does not have access to them. I understand that the current wisdom is don't use hash indices, but (unfortunately?) I have benchmarks that show that our particular application is faster by quite a bit when a hash index is available. I assume that fixing the hash index logging issue hasn't been a priority due to low interest / technical limitations, but I'm curious about a stopgap measure -- can we somehow configure Postgres to ignore hash indices on a replica, using other b-tree indices or even a sequential scan? I know I can do this on a per-connection basis by disabling various index lookup methods, but it'd be nice if it just ignored invalid indices on its own. I've not seen much reference to this problem around, but I do apologize if I've missed it in the manual or if it is extremely obvious how to do this :) Thanks, Steven
Re: [GENERAL] Suboptimal query plan fixed by replacing OR with UNION
On Jul 6, 2012, at 9:24 PM, Gurjeet Singh wrote: On Thu, Jul 5, 2012 at 7:16 PM, Steven Schlansker ste...@likeness.com wrote: On Jul 5, 2012, at 3:51 PM, Tom Lane wrote: Steven Schlansker ste...@likeness.com writes: Why is using an OR so awful here? Because the OR stops it from being a join (it possibly needs to return some rows that are not in the semijoin of the two tables). Why does it pick a sequential scan? Is this an optimizer bug No. It can't transform OR into a UNION because the results might not be the same. I assume you don't care about removal of duplicates, or have some reason to know that there won't be any ... but the planner doesn't know that. Thanks for the insight here. It still seems unfortunate that it picks a sequential scan -- but if there really is no more efficient way to do this, I will just rewrite the query. It might not be applicable to this case (because of the use of ANY in the second branch of the OR clause), but some databases provide a feature called OR-optimization, where the optimizer breaks up the query at OR clause boundaries and uses the UNION ALL operator to join the resulting queries, just like you did. The optimizer does need to add additional AND clauses to some of the branches to make sure the result set is not affected. That sounds like a great optimization for Postgres, but unfortunately it's far outside of my skill set / time to contribute, so I'd have to wait for a real PG dev to get to it :) Just a thought. -- Gurjeet Singh EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: [GENERAL] Suboptimal query plan fixed by replacing OR with UNION
On Jul 5, 2012, at 6:35 PM, Jasen Betts wrote: I note you've decided to rewrite this query as a union SELECT * FROM account WHERE user_id in (SELECT user_id FROM account WHERE id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}')) OR id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'); I notice both arrays (used with = ANY) have the exact same content. If this is always true, you can use a CTE here for the id = ANY(...) query and reference the CTE on both sides of the union. Thanks for the idea! I'll be sure to incorporate that. Doesn't fix the unfortunate behavior with OR, though.

WITH i AS (
  SELECT * FROM account
   WHERE id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}')
)
SELECT * FROM i
UNION DISTINCT
SELECT account.* FROM account JOIN i ON i.user_id = account.user_id;

-- ⚂⚃ 100% natural
[GENERAL] Suboptimal query plan fixed by replacing OR with UNION
) Index Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
  -> Bitmap Heap Scan on account (cost=37.20..3188.55 rows=803 width=160)
     Recheck Cond: (user_id = public.account.user_id)
     -> Bitmap Index Scan on account_user_id_idx (cost=0.00..37.00 rows=803 width=0)
        Index Cond: (user_id = public.account.user_id)
(10 rows)

ness_user=# explain SELECT * FROM account WHERE id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}');
QUERY PLAN
-----------------------------------------------------------------
Bitmap Heap Scan on account (cost=17.56..29.58 rows=3 width=160)
  Recheck Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
  -> Bitmap Index Scan on account_id_user_id_idx (cost=0.00..17.56 rows=3 width=0)
     Index Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
(4 rows)

(where "reasonable" is defined as "not a sequential scan") Upon seeing this -- I had a crazy idea. What if I just paste them together with a UNION DISTINCT? 
ness_user=# explain SELECT * FROM account WHERE
ness_user-# user_id in (SELECT user_id FROM account WHERE id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}')) UNION DISTINCT
ness_user-# SELECT * FROM account WHERE
ness_user-# id = ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}');
QUERY PLAN
-----------------------------------------------------------------
HashAggregate (cost=3342.22..3366.35 rows=2413 width=160)
  -> Append (cost=66.79..3281.90 rows=2413 width=160)
     -> Nested Loop (cost=66.79..3228.18 rows=2410 width=160)
        -> HashAggregate (cost=29.59..29.60 rows=1 width=16)
           -> Bitmap Heap Scan on account (cost=17.56..29.58 rows=3 width=16)
              Recheck Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
              -> Bitmap Index Scan on account_id_user_id_idx (cost=0.00..17.56 rows=3 width=0)
                 Index Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
        -> Bitmap Heap Scan on account (cost=37.20..3188.55 rows=803 width=160)
           Recheck Cond: (user_id = public.account.user_id)
           -> Bitmap Index Scan on account_user_id_idx (cost=0.00..37.00 rows=803 width=0)
              Index Cond: (user_id = public.account.user_id)
     -> Bitmap Heap Scan on account (cost=17.56..29.58 rows=3 width=160)
        Recheck Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
        -> Bitmap Index Scan on account_id_user_id_idx (cost=0.00..17.56 rows=3 width=0)
           Index Cond: (id = ANY ('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
(16 rows)

Wow! Changing the query from using an OR clause to a UNION DISTINCT with two SELECTs reduced the cost from 1379485.60 to 3366.35! And the gains are realized when you actually execute the query. Why is using an OR so awful here? Why does it pick a sequential scan? Is this an optimizer bug, or have I missed something in my queries? 
Thanks much for any advice, Steven Schlansker
Re: [GENERAL] Suboptimal query plan fixed by replacing OR with UNION
On Jul 5, 2012, at 3:51 PM, Tom Lane wrote: Steven Schlansker ste...@likeness.com writes: Why is using an OR so awful here? Because the OR stops it from being a join (it possibly needs to return some rows that are not in the semijoin of the two tables). Why does it pick a sequential scan? Is this an optimizer bug No. It can't transform OR into a UNION because the results might not be the same. I assume you don't care about removal of duplicates, or have some reason to know that there won't be any ... but the planner doesn't know that. Thanks for the insight here. It still seems unfortunate that it picks a sequential scan -- but if there really is no more efficient way to do this, I will just rewrite the query. Steven