Re: [GENERAL] any way for a transaction to see inserts done earlier in the transaction?

2014-04-16 Thread Steven Schlansker

On Apr 16, 2014, at 4:27 PM, Susan Cassidy 
susan.cass...@decisionsciencescorp.com wrote:

 Is there any way to let a transaction see the inserts that were done 
 earlier in the transaction?  I want to insert a row, then later use it within 
 the same transaction.
 
 If not, I will have to commit after each insert, and I don't want to do that 
 until all the rows are added, if I can possibly avoid it. 

Did you try it?  This is already how it works, unless I misunderstand your 
question…

postgres=# create temporary table foo (i integer primary key);
CREATE TABLE
postgres=# begin;
BEGIN
postgres=# insert into foo values(1);
INSERT 0 1
postgres=# select * from foo;
 i 
---
 1
(1 row)

postgres=# commit;
COMMIT




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] any way for a transaction to see inserts done earlier in the transaction?

2014-04-16 Thread Steven Schlansker


 On Wed, Apr 16, 2014 at 4:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Susan Cassidy susan.cass...@decisionsciencescorp.com writes:
  Is there any way to let a transaction see the inserts that were done
  earlier in the transaction?
 
 It works that way automatically, as long as you're talking about separate
 statements within one transaction.
 
 regards, tom lane

 On Apr 16, 2014, at 4:53 PM, Susan Cassidy 
 susan.cass...@decisionsciencescorp.com wrote:
 Well, it isn't working for me right now.  It can't see a row that was 
 inserted earlier in the transaction.  It is a new primary key, and when I 
 SELECT it, it isn't found.  
 

Can you share with us the code that does not work?  Preferably as a small 
self-contained example.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is it safe to stop postgres in between pg_start_backup and pg_stop_backup?

2014-04-03 Thread Steven Schlansker

On Apr 2, 2014, at 3:08 PM, Jacob Scott jacob.sc...@gmail.com wrote:

 Hi, 

Hello there ;)

 
 
 Does upgrading a disk being used by postgres (9.1, on Ubuntu) with the 
 following process sound safe?
   • pg_start_backup
   • Take a filesystem snapshot (of a volume containing postgres data but 
 not pg_xlog)
   • Bring a new higher performing disk online from snapshot
   • pg_ctl stop
   • switch disks (umount/remount at same mountpoint)
   • pg_ctl start
   • pg_stop_backup
 This seems like an odd use case because pg_start_backup is designed for 
 performing on-line backups, but I think it will give me minimum downtime.

At the very least you need to move your pg_stop_backup earlier in the process.  
Online backups do not survive server shutdowns; any backup in progress at 
shutdown is aborted.

• pg_start_backup
• Take a filesystem snapshot (of a volume containing postgres data but 
not pg_xlog)
• pg_stop_backup
• pg_ctl stop
• Bring a new higher performing disk online from snapshot
• switch disks (umount/remount at same mountpoint)
• pg_ctl start

Assuming you ensure that your archived xlogs are available to the new instance 
just as they were to the old, I believe this should work.  But run it on a test 
instance first!
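
For what it's worth, the database side of that sequence is just the two backup 
functions around the snapshot.  A minimal sketch from psql (the label is 
arbitrary; the snapshot, disk swap, and pg_ctl steps happen outside the 
database):

SELECT pg_start_backup('disk_swap');
-- take the filesystem snapshot of the data volume here, then:
SELECT pg_stop_backup();
-- only after pg_stop_backup returns: pg_ctl stop, swap disks, pg_ctl start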

It sounds like an odd use case but really it’s no different from “the data is 
gone, restore from backup” — just that you intentionally trashed the data by 
switching disks :)

Another option you could consider is rsync.  I have often transferred databases 
by running rsync concurrently with the database to get a “dirty backup” of it.  
Then once the server is shut down you run a cleanup rsync, which is much faster 
than the initial run to ensure that the destination disk is consistent and up 
to date.  This way your downtime is limited to how long it takes rsync to 
compare fs trees / fix the inconsistencies.

Good luck!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trimming transaction logs after extended WAL archive failures

2014-03-27 Thread Steven Schlansker

On Mar 27, 2014, at 5:29 AM, Michael Paquier michael.paqu...@gmail.com wrote:

 On Thu, Mar 27, 2014 at 1:42 AM, Steven Schlansker ste...@likeness.com 
 wrote:
 
 On Mar 25, 2014, at 7:58 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:
 
 Yea, vacuum just marks space as available for reuse it does not actually 
 free space.
 
 
 I even knew that.  Funny what you'll forget when the system is down and 
 you're in a panic.
 
 This is actually something that has bit me on more than one occasion -- if 
 you accidentally temporarily use too much space, it is *very* hard to back 
 out of the situation.  It seems that the only way to actually release space 
 to the system are VACUUM FULL, CLUSTER, or to DROP objects.  None of these 
 can be executed without severe disruption to a running database. A cluster 
 operation on any of our tables that are large enough to matter can easily 
 run through the night.
 Yep, depending on your application needs you could actually avoid any
 periodic VACUUM FULL-like operations that need an exclusive lock on
 the objects it is cleaning by making autovacuum more aggressive. This
 makes your house cleaner by dropping the garbage at a higher
 frequency.

Yes, although this will not save you in a couple of notable cases.  We have run 
into this at least by:

* Accidentally leaving transactions open for days
* Runaway process inserting data until the disk fills

So yes we should autovacuum more, but it is not a total solution.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trimming transaction logs after extended WAL archive failures

2014-03-26 Thread Steven Schlansker

On Mar 25, 2014, at 7:58 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

 On 03/25/2014 04:52 PM, Steven Schlansker wrote:
 
 
 Some more questions, what happens when things begin to dawn on me:)
 
 You said the disk filled up entirely with log files yet currently the 
 number(size) of logs is growing.
 
 It’s holding stable now.  I tried to vacuum up to clean some space which 
 turned out to generate more pg_xlog activity than it saved space, and (I 
 assume) the archiver fell behind and that was the source of the growing log. 
  There haven’t been any new segments since I stopped doing that.
 
 Yea, vacuum just marks space as available for reuse it does not actually free 
 space.
 

I even knew that.  Funny what you’ll forget when the system is down and you’re 
in a panic.

This is actually something that has bit me on more than one occasion — if you 
accidentally temporarily use too much space, it is *very* hard to back out of 
the situation.  It seems that the only way to actually release space to the 
system are VACUUM FULL, CLUSTER, or to DROP objects.  None of these can be 
executed without severe disruption to a running database.  A cluster operation 
on any of our tables that are large enough to matter can easily run through the 
night.

I can only keep wishing for a CLUSTER CONCURRENTLY or VACUUM FULL CONCURRENTLY 
that can run without a temporary copy of the entire table...



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trimming transaction logs after extended WAL archive failures

2014-03-26 Thread Steven Schlansker

On Mar 26, 2014, at 9:04 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Tue, Mar 25, 2014 at 6:33 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Tuesday, March 25, 2014, Steven Schlansker ste...@likeness.com wrote:
 Hi everyone,
 
 I have a Postgres 9.3.3 database machine.  Due to some intelligent work on 
 the part of someone who shall remain nameless, the WAL archive command 
 included a ‘> /dev/null 2>&1’ which masked archive failures until the disk 
 entirely filled with 400GB of pg_xlog entries.
 
 PostgreSQL itself should be logging failures to the server log, regardless of 
 whether those failures log themselves.
 
 
 I have fixed the archive command and can see WAL segments being shipped off 
 of the server, however the xlog remains at a stable size and is not 
 shrinking.  In fact, it’s still growing at a (much slower) rate.
 
 The leading edge of the log files should be archived as soon as they fill up, 
 and recycled/deleted two checkpoints later.  The trailing edge should be 
 archived upon checkpoints and then recycled or deleted.  I think there is a 
 throttle on how many off the trailing edge are archived each checkpoint.  So 
 issues a bunch of  CHECKPOINT; commands for a while and see if that clears 
 it up.

Indeed, forcing a bunch of CHECKPOINTS started to get things moving again.
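
For reference, this is roughly what I was running to nudge things along, from 
psql as a superuser (the segment-count query is just a convenience I cooked up, 
nothing official):

CHECKPOINT;
-- count the WAL segment files still sitting in pg_xlog
SELECT count(*)
FROM pg_ls_dir('pg_xlog') AS t(name)
WHERE name ~ '^[0-9A-F]{24}$';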

 
 Actually my description is rather garbled, mixing up what I saw when 
 wal_keep_segments was lowered, not when recovering from a long lasting 
 archive failure.  Nevertheless, checkpoints are what provoke the removal of 
 excessive WAL files.  Are you logging checkpoints?  What do they say?  Also, 
 what is in pg_xlog/archive_status ?
  

I do log checkpoints, but most of them recycle and don’t remove:
Mar 26 16:09:36 prd-db1a postgres[29161]: [221-1] db=,user= LOG:  checkpoint 
complete: wrote 177293 buffers (4.2%); 0 transaction log file(s) added, 0 
removed, 56 recycled; write=539.838 s, sync=0.049 s, total=539.909 s; sync 
files=342, longest=0.015 s, average=0.000 s

That said, after letting the db run / checkpoint / archive overnight, the xlog 
did indeed start to slowly shrink.  The pace at which it is shrinking is 
somewhat unsatisfying, but at least we are making progress now!

I guess if I had just been patient I could have saved some mailing list 
traffic.  But patience is hard when your production database system is running 
at 0% free disk :)

Thanks everyone for the help, if the log continues to shrink, I should be out 
of the woods now.

Best,
Steven



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Trimming transaction logs after extended WAL archive failures

2014-03-25 Thread Steven Schlansker
Hi everyone,

I have a Postgres 9.3.3 database machine.  Due to some intelligent work on the 
part of someone who shall remain nameless, the WAL archive command included a 
‘> /dev/null 2>&1’ which masked archive failures until the disk entirely filled 
with 400GB of pg_xlog entries.

I have fixed the archive command and can see WAL segments being shipped off of 
the server, however the xlog remains at a stable size and is not shrinking.  In 
fact, it’s still growing at a (much slower) rate.

I’ve seen references to people just deleting “old” segment files or using 
pg_resetxlog to fix this situation, however I already know that the response 
from the mailing list will be “that’s insane, don’t do that”.

So what is the correct solution to pursue here?  The steady state of the 
machine should have enough space, I just need to reclaim some of it...  Thanks 
for any guidance!

Steven



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trimming transaction logs after extended WAL archive failures

2014-03-25 Thread Steven Schlansker

On Mar 25, 2014, at 3:52 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

 On 03/25/2014 01:56 PM, Steven Schlansker wrote:
 Hi everyone,
 
 I have a Postgres 9.3.3 database machine.  Due to some intelligent work on 
 the part of someone who shall remain nameless, the WAL archive command 
 included a ‘> /dev/null 2>&1’ which masked archive failures until the disk 
 entirely filled with 400GB of pg_xlog entries.
 
 I have fixed the archive command and can see WAL segments being shipped off 
 of the server, however the xlog remains at a stable size and is not 
 shrinking.  In fact, it’s still growing at a (much slower) rate.
 
 So what is wal_keep_segments set at in postgresql.conf?
 

5000.  There are currently about 18000 WAL segments in pg_xlog.

 
 I’ve seen references to people just deleting “old” segment files or using 
 pg_resetxlog to fix this situation, however I already know that the response 
 from the mailing list will be “that’s insane, don’t do that”.
 
 So what is the correct solution to pursue here?  The steady state of the 
 machine should have enough space, I just need to reclaim some of it...  
 Thanks for any guidance!
 
 Steven
 
 
 
 
 
 -- 
 Adrian Klaver
 adrian.kla...@aklaver.com



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trimming transaction logs after extended WAL archive failures

2014-03-25 Thread Steven Schlansker

On Mar 25, 2014, at 4:02 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

 On 03/25/2014 03:54 PM, Steven Schlansker wrote:
 
 On Mar 25, 2014, at 3:52 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:
 
 On 03/25/2014 01:56 PM, Steven Schlansker wrote:
 Hi everyone,
 
 I have a Postgres 9.3.3 database machine.  Due to some intelligent work on 
 the part of someone who shall remain nameless, the WAL archive command 
 included a ‘> /dev/null 2>&1’ which masked archive failures until the disk 
 entirely filled with 400GB of pg_xlog entries.
 
 I have fixed the archive command and can see WAL segments being shipped 
 off of the server, however the xlog remains at a stable size and is not 
 shrinking.  In fact, it’s still growing at a (much slower) rate.
 
 So what is wal_keep_segments set at in postgresql.conf?
 
 
 5000.  There are currently about 18000 WAL segments in pg_xlog.
 
 I guess what I should have also asked previously is what exactly are you 
 doing, are you streaming as well as archiving?

Yes, we have both enabled.  Here are some hopefully relevant configuration 
stanzas and information:

checkpoint_segments = 1024    # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min  # range 30s-1h
checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 8min   # 0 disables


archive_mode = on # allows archiving to be done
archive_command = 'rsync -q %p pgbac...@d0028.nessops.net:./wal/prd-db1a/%f'
   # command to use to archive a logfile segment
archive_timeout = 0    # force a logfile segment switch after this
# number of seconds; 0 disables


# These settings are ignored on a standby server

max_wal_senders = 10# max number of walsender processes
wal_keep_segments = 5000  # in logfile segments, 16MB each; 0 disables
vacuum_defer_cleanup_age = 0   # number of xacts by which cleanup is delayed

wal_sender_timeout = 60s  # in milliseconds; 0 disables
synchronous_standby_names = '' # standby servers that provide sync rep
# comma-separated list of application_name
# from standby(s); '*' = all

# - Standby Servers -

# These settings are ignored on a master server

hot_standby = on  # on allows queries during recovery
max_standby_archive_delay = 30s# max delay before canceling queries
# when reading WAL from archive;
# -1 allows indefinite delay

max_standby_streaming_delay = 30s  # max delay before canceling queries
# when reading streaming WAL;
# -1 allows indefinite delay
wal_receiver_status_interval = 10s # send replies at least this often
# 0 disables
hot_standby_feedback = off # send info from standby to prevent
# query conflicts

[root@prd-db1a data]# ls -1 pg_xlog | wc -l
20168

I have verified that WAL segments are being archived to the archive 
destination, and that the slave is connected and receiving segments.

Thanks!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trimming transaction logs after extended WAL archive failures

2014-03-25 Thread Steven Schlansker

On Mar 25, 2014, at 4:45 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:

 On 03/25/2014 04:17 PM, Steven Schlansker wrote:
 
 On Mar 25, 2014, at 4:02 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:
 
 On 03/25/2014 03:54 PM, Steven Schlansker wrote:
 
 On Mar 25, 2014, at 3:52 PM, Adrian Klaver adrian.kla...@aklaver.com 
 wrote:
 
 On 03/25/2014 01:56 PM, Steven Schlansker wrote:
 Hi everyone,
 
 I have a Postgres 9.3.3 database machine.  Due to some intelligent work 
 on the part of someone who shall remain nameless, the WAL archive 
 command included a ‘> /dev/null 2>&1’ which masked archive failures 
 until the disk entirely filled with 400GB of pg_xlog entries.
 
 I have fixed the archive command and can see WAL segments being shipped 
 off of the server, however the xlog remains at a stable size and is not 
 shrinking.  In fact, it’s still growing at a (much slower) rate.
 
 So what is wal_keep_segments set at in postgresql.conf?
 
 
 5000.  There are currently about 18000 WAL segments in pg_xlog.
 
 I guess what I should have also asked previously is what exactly are you 
 doing, are you streaming as well as archiving?
 
 Yes, we have both enabled.  Here’s some hopefully relevant configuration 
 stanzas and information:
 
 
 
 I have verified that WAL segments are being archived to the archive 
 destination, and that the slave is connected and receiving segments.
 
 Some more questions, what happens when things begin to dawn on me:)
 
 You said the disk filled up entirely with log files yet currently the 
 number(size) of logs is growing.

It’s holding stable now.  I tried to vacuum up to clean some space which turned 
out to generate more pg_xlog activity than it saved space, and (I assume) the 
archiver fell behind and that was the source of the growing log.  There haven’t 
been any new segments since I stopped doing that.

 
 So did you grow the disk, move the logs or find some way to reduce the number?

I used tune2fs to use some of the “reserved” filesystem space temporarily.  I 
was too scared to move log segments away, this is a production database.

 
 What happened to the server when the disk filled up?

PostgreSQL PANICed due to failed writes:
Mar 25 22:46:41 prd-db1a postgres[18995]: [12-1] db=checkin,user=postgres 
PANIC:  could not write to file "pg_xlog/xlogtemp.18995": No space left on 
device

 In other words do the log entries at the time show it recovered gracefully?

The database is currently up and running, although I do not have much time 
until it fails again; there are only a few precious GB free.

 If not what did you do to get it running again?
 

tune2fs and restarted postgres

 The concern being that the server is actually fully recovered.

I believe it is.  Our production site is back up and running seemingly 
normally, and the postgres log shows no obvious complaints.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [JDBC] Cannot insert to 'path' field using EclipseLink

2014-03-12 Thread Steven Schlansker


 On Mar 12, 2014, at 10:12 AM, Daryl Foster daryl.fos...@oncenter.com wrote:
 
 java.lang.ClassCastException: org.postgresql.geometric.PGpath cannot be cast 
 to org.postgresql.geometric.PGpath
 

That's a sure sign of ClassLoader confusion. Make sure there is only one copy 
of the driver jar, either in your application or in the JBoss container, but never both. 

 
 On Wed, Mar 12, 2014 at 7:37 AM, Daryl Foster daryl.fos...@oncenter.com 
 wrote:
 I have a java app running in JBoss that uses EclipseLink to persist to a 
 Postgres database. I've added a field with a 'path' datatype to one of the 
 tables but I keep getting the following exception when I try to insert data:
 
 org.postgresql.util.PSQLException: Can't infer the SQL type to use for an 
 instance of org.postgresql.geometric.PGpath. Use setObject() with an 
 explicit Types value to specify the type to use.
 
 Here's the table definition:
 
 CREATE TABLE schema.table_name
 (
 item_id    uuid NOT NULL,
 item_path  path NOT NULL
 )
 
 The java entity is representing the item_path field as a List<Point> object, 
 and I'm using a converter to map from the List<Point> object to a PGpath 
 object:
 
 import org.eclipse.persistence.mappings.DatabaseMapping;
 import org.eclipse.persistence.mappings.converters.Converter;
 import org.eclipse.persistence.sessions.Session;
 import org.postgresql.geometric.PGpath;
 import java.awt.Point;
 import java.util.ArrayList;
 import java.util.List;
 import static java.sql.Types.OTHER;
 
 public class PgPathConverter implements Converter
 {
 @Override
 public boolean isMutable ()
 {
 return false;
 }
 
 @Override
 public List<Point> convertDataValueToObjectValue (Object value, 
 Session session)
 {
 // Code that converts PGpath to List<Point>
 }
 
 @Override
 public PGpath convertObjectValueToDataValue (Object value, Session 
 session)
 {
 // Code that converts List<Point> to PGpath
 }
 
 @Override
 public void initialize (DatabaseMapping mapping, Session session)
 {
 mapping.getField ().setSqlType (OTHER);
 }
 }
 
 The entity class is defined as follows:
 
 @Entity
 @Table (
 name   = "table_name",
 schema = "schema"
 )
 @Converter (
 name   = "path",
 converterClass = PgPathConverter.class
 )
 public class TableName
 {
 public TableName () {}
 private static final long serialVersionUID = 1L;
 
 @Column (name = "item_path")
 @Convert ("path")
 private List<Point> m_ItemPath;
 
 @Id
 @Column (
 name = "item_id",
 unique   = true,
 nullable = false
 )
 private UUID m_ItemId;
 
 public UUID getItemId ()
 {
 return m_ItemId;
 }
 
 public List<Point> getItemPath ()
 {
 return m_ItemPath;
 }
 
 public void setItemId (UUID itemId)
 {
 m_ItemId = itemId;
 }
 
 public void setItemPath (List<Point> itemPath)
 {
  m_ItemPath = itemPath;
 }
 }
 
 Finally, here's the exception I get when I call `EntityManager.persist 
 (entity)`:
 
 18:10:33,789 ERROR [org.jboss.as.ejb3] (http-/0.0.0.0:8080-1) 
 javax.ejb.EJBTransactionRolledbackException: Exception [EclipseLink-4002] 
 (Eclipse Persistence Services - 2.5.0.v20130507-3faac2b): 
 org.eclipse.persistence.exceptions.DatabaseException
 Internal Exception: org.postgresql.util.PSQLException: Can't infer the 
 SQL type to use for an instance of org.postgresql.geometric.PGpath. Use 
 setObject() with an explicit Types value to specify the type to use.
 Error Code: 0
 Call: INSERT INTO schema.table_name (item_id, item_path) VALUES (?, ?)
  bind = [2 parameters bound]
 18:10:33,789 ERROR [org.jboss.as.ejb3.invocation] (http-/0.0.0.0:8080-1) 
 JBAS014134: EJB Invocation failed on component TableNameRepository for 
 method public void com.mycompany.myproject.data.Repository.flush() throws 
 javax.persistence.TransactionRequiredException,javax.persistence.PersistenceException:
  javax.ejb.EJBTransactionRolledbackException: Exception [EclipseLink-4002] 
 (Eclipse Persistence Services - 2.5.0.v20130507-3faac2b): 
 org.eclipse.persistence.exceptions.DatabaseException
 Internal Exception: org.postgresql.util.PSQLException: Can't infer the 
 SQL type to use for an instance of org.postgresql.geometric.PGpath. Use 
 setObject() with an explicit Types value to specify the type to use.
 Error Code: 0
 Call: INSERT INTO schema.table_name (item_id, item_path) VALUES (?, ?)
  bind = [2 parameters bound]
 at 
 org.jboss.as.ejb3.tx.CMTTxInterceptor.handleInCallerTx(CMTTxInterceptor.java:138)
  

[GENERAL] Timing for release with fix for Nov2013ReplicationIssue

2013-12-03 Thread Steven Schlansker
Hi everyone,

I’ve seen murmuring on the list regarding 
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue

Is there an ETA on a release with the bug fix for this?  I’m putting off 
building from source because I prefer to use the pgdg RPM packages, but if we 
don’t get a release soon it might force my hand :)

Thanks,
Steven



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Timing for release with fix for Nov2013ReplicationIssue

2013-12-03 Thread Steven Schlansker

On Dec 3, 2013, at 2:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Steven Schlansker stevenschlans...@gmail.com writes:
 I’ve seen murmuring on the list regarding 
 https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue
 
 Is there an ETA on a release with the bug fix for this?  I’m putting off 
 building from source because I prefer to use the pgdg RPM packages, but if 
 we don’t get a release soon it might force my hand :)
 
 http://www.postgresql.org/message-id/21902.1385241...@sss.pgh.pa.us

Thanks!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Blowfish Encrypted String

2013-09-26 Thread Steven Schlansker

On Sep 26, 2013, at 10:55 AM, Craig Boyd craigbo...@gmail.com wrote:

 Dmitriy,
 
 Thank you very much for the reply!
 
 Right...got the data type...but how do I actually insert a binary string 
 into the column?  What does the SQL look like?
 For the moment assume I have the following bit of binary: 4PO„âÔ™ä²
 
 What does the SQL look like to insert that into the bytea column?  
 (FYI...most of my binary data would be only slightly bigger than what I just 
 pasted here.  I am not storing Word documents or images, just some encrypted 
 strings of text that are not more than 100 characters long and typically 
 about 16 to 20.)
 

http://www.postgresql.org/docs/devel/static/datatype-binary.html

Most client libraries will have functionality to do this nicely (e.g. sending a 
Java byte[] to a Postgres bytea); it sucks somewhat to do it by hand in SQL.
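
For completeness, doing it by hand looks something like this, with an invented 
table; the '\x...' hex bytea input format needs 9.0 or newer:

CREATE TABLE secrets (id serial PRIMARY KEY, ciphertext bytea NOT NULL);
INSERT INTO secrets (ciphertext) VALUES ('\x34504f84e2d499e4b2'::bytea);
SELECT id, encode(ciphertext, 'hex') AS ciphertext_hex FROM secrets;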

 Thanks,
 
 Craig Boyd
 
 
 On Thu, Sep 26, 2013 at 11:46 AM, Dmitriy Igrishin dmit...@gmail.com wrote:
 
 
 
 2013/9/26 Craig Boyd craigbo...@gmail.com
 Hello All,
 
 I have a string in a program that I have encrypted using Blowfish and I am 
 now trying to figure out the best way to store that in PostgreSQL so that I 
 can store it and retrieve it later for decryption.  I have searched around 
 and have not found some good examples of how to go about this.
 
 So my first questions is: Can someone point me to a tutorial or posting that 
 shows how one might do that?
 
 Failing that: 
 What data type should I use to store this?
 I believe that you should use bytea datatype. 
 What does the SQL look like to INSERT/SELECT the field?
 Just like any other INSERT/SELECT query.
 
 
 -- 
 // Dmitriy.
 
 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JDBC driver for Postgres 9.3

2013-09-26 Thread Steven Schlansker
On Sep 26, 2013, at 6:35 AM, Kohler  Manuel manuel.koh...@bsse.ethz.ch 
wrote:

 Hi,
 we are developing a Java based software with Postgres as a DB.
 Could someone tell me if there will be a JDBC driver for 9.3 out soon or
 is it safe and recommended to use the latest JDBC driver available?
 Currently we are using:
 PostgreSQL 9.2-1000 JDBC4 (build 1000)
 

I believe the 9.3 JDBC driver is coming soon, but the latest driver works just 
fine (we use it in production)



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trying to create DB / user to import some data

2013-09-26 Thread Steven Schlansker

On Sep 26, 2013, at 10:28 PM, mdr monosij.for...@gmail.com wrote:
 
 create user import_dbms_user with password 'import_dbms';
 create database import_dbms_db;
 grant all privileg

 However when I try to run psql from the command line:
 psql -h localhost -U import_dbms_user -W
 I enter the password when prompted:
 Password for user import_dbms_user: 
 psql: FATAL:  database "import_dbms_user" does not exist
 
 But I get the error as above.

By default, psql will try to use the username as the database name if it is not 
specified.
Since your database name is different from the user, this does not work.

You could either name both your user and db import_dbms and have the default 
work, or specify the database explicitly with -d import_dbms_db

Hope that helps.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Deduplication and transaction isolation level

2013-09-25 Thread Steven Schlansker

On Sep 25, 2013, at 6:04 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Tue, Sep 24, 2013 at 10:19 PM, François Beausoleil
 franc...@teksol.info wrote:
 Hi all!
 
 I import many, many rows of data into a table, from three or more computers, 
 4 times per hour. I have a primary key, and the query I use to import the 
 data is supposed to dedup before inserting, but I still get primary key 
 violations.
 
 The import process is:
 
 * Load CSV data into temp table
 * INSERT INTO dest SELECT DISTINCT (pkey) FROM temp WHERE NOT 
 EXISTS(temp.pkey = dest.pkey)
 
 I assumed (erroneously) that this would guarantee no duplicate data could 
 make it into the database. The primary key violations are proving me wrong.
 
 Right.  Transaction A and B are interleaved: they both run the same
 check against the same id at the same time.  Both checks pass because
 neither transaction is committed.  This problem is not solvable by
 adjusting the isolation level.

Are you sure that this is the case?  It is my understanding that since 9.1 with 
SSI (https://wiki.postgresql.org/wiki/SSI) if you set the transaction isolation 
level to SERIALIZABLE, this problem is solved, as the insert will take a 
predicate lock and the other insert cannot succeed.

We use this to detect / resolve concurrent inserts that violate primary keys 
and it works great.
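
A rough sketch of what I mean, with invented table and column names rather 
than the OP's schema:

BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO dest (pkey, payload)
SELECT DISTINCT ON (t.pkey) t.pkey, t.payload
FROM temp_import t
WHERE NOT EXISTS (SELECT 1 FROM dest d WHERE d.pkey = t.pkey)
ORDER BY t.pkey;
COMMIT;
-- the losing concurrent transaction gets a serialization failure
-- (SQLSTATE 40001) instead, and the client simply retries it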

However in this case it probably doesn't help the OP because the cost of 
restarting the entire import is likely too high.

 
 Typical solutions might be to:
 A. Lock the table while inserting
 B. Retry the transaction following an error.
 C. Import the records to a staging table, then copy the do the
 deduplication check when moving from the staging table
 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ZFS snapshots - to use pg_start_backup() and pg_stop_backup() or no?

2013-09-11 Thread Steven Schlansker

On Sep 11, 2013, at 4:29 PM, Gregory Haase haa...@onefreevoice.com wrote:

 I was trying to figure out how to get the following syntax to work:
 
 echo "select pg_start_backup('zfs_snapshot'); \\! zfs snapshot 
 zroot/zpgsql@test; \\ select pg_stop_backup();" | psql postgres

I do:

psql -c "select pg_start_backup('whatever');" && zfs snapshot pool/fs@sn && 
psql -c "select pg_stop_backup();"

That way no need to shell out from psql :) 

 
 The above command successfully starts the backup and creates the snapshot but 
 then fails to stop the backup. I've tried various combinations of \ and \\ 
 here with different whitespace and I just can't seem to find a combination 
 that works. I don't understand the proper use of \\ (described as the 
 separator metacommand).

Keep in mind that echo \\  will actually only echo '\' because \ is a shell 
escape as well...

 
 However, in my research, I noted that a bunch of people seem to just not even 
 bother with pg_start_backup/pg_stop_backup and I guess aren't that worried 
 about the crash recovery process if they need to perform a restore. I also 
 find the omission of the start/stop backup functions from the File System 
 Level Backup page: http://www.postgresql.org/docs/9.2/static/backup-file.html
 
 Is the pg_start_backup() and pg_stop_backup() even necessary?
 

If all of your Postgres files are part of *the same* consistent snapshot (i.e. 
are on one FS that gets snapshotted), then the start/stop backup should not be 
necessary.  It will just look like a server crash instead.

pg_start_backup is used when you do not have filesystem snapshotting available, 
and is described in detail on the next manual page:

http://www.postgresql.org/docs/9.2/static/continuous-archiving.html



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best Postgresql books

2013-07-18 Thread Steven Schlansker
PostgreSQL 9.0 High Performance: http://www.amazon.com/books/dp/184951030X

On Jul 18, 2013, at 3:11 PM, Pedro Costa pedrocostaa...@sapo.pt wrote:

 Hi guys, 
 
 Can anyone tell me the best books about postgresql? Specialy about tunning 
 and querys performances
 Thanks
 
 
 
 Enviado do meu tablet Samsung Note



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] backend hangs at sendto() and can't be terminated

2013-07-11 Thread Steven Schlansker
On Jul 8, 2013, at 6:48 AM, Jov am...@amutu.com wrote:

 netstat shows nothing about the socket of the process, so I think the TCP 
 timeout took effect.  So it is really weird.
 
 Jov
 blog: http:amutu.com/blog
 
 
 2013/7/8 Tom Lane t...@sss.pgh.pa.us
 Merlin Moncure mmonc...@gmail.com writes:
 On Mon, Jul 8, 2013 at 4:56 AM, Jov am...@amutu.com wrote:
 my first post already try the pg_terminate_backend but failed:
 pg_terminate_backend return t but the backend still there.
 
 possibly a kernel problem?
 
 The backend will keep trying to send data until the kernel informs it
 the connection is lost.  (Anything else would be a bad idea.)  So the
 real question here is why it's taking so long for the TCP stack to
 decide that the client is gone.  I'm wondering what exactly you did
 to kill the psql session.  Most ordinary ways of killing a process
 should result in closure of whatever connections it had open.
 
 If you'd lost network connectivity to the client, a TCP timeout on the
 order of an hour wouldn't be surprising.  (If you feel this is too long,
 you can fool with the TCP keepalive parameters.)  But it seems unlikely
 that that's what's happening here.

Interestingly enough, I am seeing what may (or then again, may not) be a 
related problem.

I have a backend process stuck in a recvfrom --
[root@prd-db2a ~]# strace -tv -p 24402
Process 24402 attached - interrupt to quit
00:02:00 recvfrom(10, 

postgres 24402  0.0 10.7 3505628 2639032 ? Ss   Jul01   0:21 postgres: 
event event 10.29.62.21(39485) idle

It is a psql process that I launched from the command line 10 days ago:

steven   24401  0.0  0.0 166824  2532 pts/2TJul01   0:00 psql -U event 
-h prd-db2a.nessops.net -c delete from event where event_id in (select event_id 
from event where payload is null limit 10);

event=# select pid,application_name,backend_start,waiting,state,query from 
pg_stat_activity where pid=24402;
  pid  | application_name |         backend_start         | waiting | state |                                               query
-------+------------------+-------------------------------+---------+-------+----------------------------------------------------------------------------------------------------
 24402 | psql             | 2013-07-01 21:03:27.417039+00 | f       | idle  | delete from event where event_id in (select event_id from event where payload is null limit 10);
(1 row)

I invoked it with -c, which supposedly makes it exit when the single command is 
finished.  Many similar queries have been run, and I'd say they run for a half 
hour on average.

But the process has been alive for 10 days now, not blocked, but just idle.  It 
is connected over TCP from the local box (although not through the loopback 
interface, but through a 10.x interface).  How does this make any sense?

The command seems to be immune to pg_cancel_backend, but pg_terminate_backend 
did manage to kill it.
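
Specifically, using the pid from the ps output above, it went something like:

SELECT pg_cancel_backend(24402);     -- had no visible effect
SELECT pg_terminate_backend(24402);  -- this one actually got rid of it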

If this problem is not related, please tell me to shove off, and I will not 
pollute this thread further.  But I hope some of this information is useful.

I am running PG 9.2.4, CentOS kernel 2.6.32.360.

Best,
Steven



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] unique constraint violations

2013-06-26 Thread Steven Schlansker

On Jun 26, 2013, at 11:04 AM, pg noob pgn...@gmail.com wrote:

 
 Hi all,
 
 There are some places in our application where unique constraint violations 
 are difficult to avoid due to multithreading.
 What we've done in most places to handle this is to retry in a loop.
 
 Generally it starts by checking if a value already exists, if not - try to 
 insert it, which may cause a unique violation, and if a constraint violation 
 occurs, retrying the process again.
 
 The trouble that we have with this approach is twofold.
 First, it causes errors to show up in the Postgres log about unique 
 constraint violations.  These errors are misleading to our support folks 
 because they look like errors but in fact they are handled (usually) by 
 retries in the code and don't cause any real problems.  We'd like these to be 
 handled without causing errors to show up in the logs.
 

We have solved this problem by leveraging the new SSI feature in Postgres 9.1 
(http://wiki.postgresql.org/wiki/SSI)

By running your insert / update inside of a serializable transaction, instead 
of getting unique failures, you get serialization failures.  These are 
expected and we just retry them.  Much less confusing log clutter, and no 
need to differentiate between expected and unexpected unique violations.

 Second, in some cases we've done a lot of work on a transaction before 
 hitting a unique constraint violation.
 If this happens the entire transaction gets rolled back and all the work 
 leading up to the constraint violation has to be redone.
 
 As a work around for this, I have proposed the following function which will 
 execute an arbitrary query (with no results returned) and catch a unique 
 constraint violation without causing errors to show up in the postgres log or 
 the transaction to get rolled back.  Now what the code does is to call this 
 function with the query to execute.  If a unique constraint violation 
 happens, the caller gets a non-zero return code back and this can be handled 
 but the DB transaction does not get rolled back.
 

The approach we took wouldn't solve this problem for you.  Serialization 
failures also require restarting the transaction.


You could also consider savepoints as a more lightweight way of undoing the 
bad updates -- 
http://www.postgresql.org/docs/9.2/interactive/sql-savepoint.html
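
A minimal sketch of that pattern, with an invented table and values:

BEGIN;
-- ... lots of expensive work ...
SAVEPOINT before_risky_insert;
INSERT INTO widgets (id, name) VALUES (42, 'foo');
-- suppose that raised unique_violation; undo just the insert:
ROLLBACK TO SAVEPOINT before_risky_insert;
-- the earlier work in the transaction is still intact, so finish up normally
COMMIT;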


Sounds like you might have a slightly different use case and our approach won't 
help you much, but I figured I'd mention it.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Index over only uncommon values in table

2013-06-18 Thread Steven Schlansker
Hi everyone,

I assume this is not easy with standard PG but I wanted to double check.

I have a column that has a very uneven distribution of values.  ~95% of the 
values will be the same, with a long tail of another few dozen values.

I want to have an index over this value.  Queries that select the most common 
value will not use the index, because it is an overwhelming percentage of the 
table.  This means that ~95% of the disk space and IOPS to maintain the index 
is wasted.

I cannot use a hardcoded partial index because:
1) The common value is not known at schema definition time, and may change 
(very slowly) over time.
2) JDBC uses prepared statements for everything, and the value to be selected 
is not known at statement prepare time, so any partial indices are ignored 
(this is a really really obnoxious behavior and makes partial indices almost 
useless combined with prepared statements, sadly…)
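
For concreteness, the kind of hardcoded partial index I mean looks roughly like 
this (table, column, and version value are made up for illustration):

-- only index rows that are *not* at the current common value
CREATE INDEX CONCURRENTLY needs_reextract_idx
    ON extracted_data (extractor_version)
    WHERE extractor_version <> 7;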

The table size is expected to approach the 0.5 billion row mark within the next 
few months, hence my eagerness to save even seemingly small amounts of per-row 
costs.

Curious if anyone has a good way to approach this problem.
Thanks,
Steven



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index over only uncommon values in table

2013-06-18 Thread Steven Schlansker

On Jun 18, 2013, at 12:23 PM, John R Pierce pie...@hogranch.com wrote:

 On 6/18/2013 12:17 PM, Steven Schlansker wrote:
 1) The common value is not known at schema definition time, and may change 
 (very slowly) over time.
 
 how could a value that's constant in 95% of the rows change, unless you added 
 20 times more rows with a new value (and for a big portion of the time, no 
 value would meet your 95% criteria).

The table is a denormalized version of some packed data.  The packed data is 
constant, but the extractor code changes over time.  The value in question is 
the extractor version used to create this row.

There is a periodic job that attempts to find batches of rows that have fields 
extracted by an old version of the extractor.  These rows are re-extracted from 
the packed data.

So, most of the time the vast majority of rows will have CURRENT_VERSION as 
their version, and a small percentage of rows will have a previous version.  
The job will select rows where extracted_version != CURRENT_VERSION.  If this 
query is not indexed, even doing a periodic check if any rows exist takes an 
absurd amount of time.

At some point, the code changes, and CURRENT_VERSION gets incremented.  Rows 
then slowly (over a period of days / weeks) get upgraded to the new current 
version, in batches of thousands.


This is what I mean by a very slowly changing mostly-constant value.

Hope that makes sense,
Steven




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index over only uncommon values in table

2013-06-18 Thread Steven Schlansker

On Jun 18, 2013, at 1:49 PM, David Johnston pol...@yahoo.com wrote:

 Steven Schlansker-3 wrote
 At some point, the code changes, and CURRENT_VERSION gets incremented. 
 Rows then slowly (over a period of days / weeks) get upgraded to the new
 current version, in batches of thousands.
 
 This is what I mean by a very slowly changing mostly-constant value.
 
 This seems insane without knowing the details.  This seems like it would be
 more of a cache invalidation problem.  What percentage of your rows are
 being updated multiple times without ever being queried for other reasons?

I am open to suggestions of how to do it better.  The problem I face is that
doing any sort of updates in one big go -- whether it be by ALTER TABLE
statements or large UPDATE queries -- is all but unworkable.  It takes days
or weeks depending on what the update is, so any locking causes the entire
system to grind to a halt.  And there is nothing more depressing than losing
5 days of work on a huge UPDATE because something hiccuped.

Hence, allowing outdated versions in the table, which then over time get 
upgraded
in reasonably-sized batches.

 
 I was going to say that table partitioning (INHERITS) seems like a
 possibility; then I thought maybe not; now I'm back to suggesting you
 consider it.
 
 Every version of the extractor would get its own table.  To upgrade you
 remove the record from the older table and add it to the newer one.  Maybe
 even consider calling the these version_upgraded to distinguish them from
 records originally insert using the newest version.  Or have original
 version as the partition key and a second current version field that
 varies.  Not sure how the planner would be able to use constraint exclusion
 to limiting the scanning though…
 

Interesting idea.  I have been trying to avoid making code changes require
schema changes as well -- it is very nice to not have to make schema changes 
for every
code deployment.  The code may get changed multiple times in the same day, if I 
am
busy hacking on it.  Having to muck around with table inheritance and changing 
partition
definitions on code deployments seems unpleasant.  Perhaps I am overestimating 
the work
involved, but I am very much trying to keep the deployment process as 
brain-dead-simple 
as possible.

Thanks for the input.
Steven



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index over only uncommon values in table

2013-06-18 Thread Steven Schlansker

On Jun 18, 2013, at 2:29 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Tue, Jun 18, 2013 at 12:17 PM, Steven Schlansker ste...@likeness.com 
 wrote:
 Hi everyone,
 
 I assume this is not easy with standard PG but I wanted to double check.
 
 I have a column that has a very uneven distribution of values.  ~95% of the 
 values will be the same, with some long tail of another few dozens of values.
 
 I want to have an index over this value.  Queries that select the most common 
 value will not use the index, because it is a overwhelming percentage of the 
 table.  This means that ~95% of the disk space and IOPS to maintain the index 
 is wasted.
 
 I cannot use a hardcoded partial index because:
 1) The common value is not known at schema definition time, and may change 
 (very slowly) over time.
 
 
 I think this is going to turn into a game of whack-a-mole.  There is going to 
 have to be some transition period during which all or most of the rows need 
 to be indexed.  So that amount of space is going to have to be available, and 
 given that it is available, what advantage is there to using it some of the 
 time and not using it some of the time? You can't feasibly use it for 
 something else during the off periods, because then you will run into 
 emergency out of space situations.

This is a good point.

I could define it further to only take values which make up, say, < 10% of the 
table, which may solve this problem.

But from the responses I've been getting it sounds like there's nothing 
standard that will just solve my problem and it's not worth the effort to me 
to cook up something intelligent right now.

  
  
 2) JDBC uses prepared statements for everything, and the value to be selected 
 is not known at statement prepare time, so any partial indices are ignored 
 (this is a really really obnoxious behavior and makes partial indices almost 
 useless combined with prepared statements, sadly…)
 
 
 What version are you using?  This has been improved in 9.2.0.

Thank goodness!  We are in the process of migrating to the 9.2 branch now, so I 
am thrilled to get this.  I'm sorry I didn't mention the version earlier, funny 
how the simplest things escape your mind when you're trying to make a good 
mailing list post...

Thanks again for all the input.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] upsert functionality

2013-05-16 Thread Steven Schlansker

On May 15, 2013, at 11:52 PM, Thomas Kellerer spam_ea...@gmx.net wrote:

 Sajeev Mayandi, 16.05.2013 07:01:
 Hi,
 
 Our company is planning to move to postreSQL. We were initially using
 sybase where upsert functionality was available using insert on
 existing update clause.  I know there multiple ways to fix this
 using RULE or separate function in postgresql.  But I would like to
 know which version of postgresql has support for upsert planned using
 an official syntax. I have postgresql 9.2 which does not have this
 feature, if its planned in near future, I would rather wait to
 migrate to PostgreSQL.
 
 
 You can use writeable CTEs for this purpose.
 
 There are several examples out there:
 
 http://www.xzilla.net/blog/2011/Mar/Upserting-via-Writeable-CTE.html
 http://www.depesz.com/2011/03/16/waiting-for-9-1-writable-cte/
 http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/
 
 http://stackoverflow.com/a/8702291/330315

One thing I didn't see mentioned in two of the links -- they mention race
conditions, where multiple writers can still cause the faked UPSERT to fail.

This can be avoided using SERIALIZABLE transactions, now that Postgres has
SSI.  http://wiki.postgresql.org/wiki/SSI

I can vouch that we use writable CTEs and SERIALIZABLE to implement UPSERT
in production with no issues thus far.
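
Roughly what that looks like, with invented names; it runs inside a 
SERIALIZABLE transaction and is retried on serialization failure:

WITH updated AS (
    UPDATE counters
       SET value = 42
     WHERE key = 'example'
 RETURNING key
)
INSERT INTO counters (key, value)
SELECT 'example', 42
WHERE NOT EXISTS (SELECT 1 FROM updated);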



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID

2013-05-10 Thread Steven Schlansker

On May 10, 2013, at 7:14 AM, Matt Brock m...@mattbrock.co.uk wrote:

 Hello.
 
 We're intending to deploy PostgreSQL on Linux with SSD drives which would be 
 in a RAID 1 configuration with Hardware RAID.
 
 My first question is essentially: are there any issues we need to be aware of 
 when running PostgreSQL 9 on CentOS 6 on a server with SSD drives in a 
 Hardware RAID 1 configuration? Will there be any compatibility problems 
 (seems unlikely)? Should we consider alternative configurations as being more 
 effective for getting better use out of the hardware?
 
 The second question is: are there any SSD-specific issues to be aware of when 
 tuning PostgreSQL to make the best use of this hardware and software?
 

A couple of things I noticed with a similar-ish setup:

* Some forms of RAID / LVM break the kernel's automatic disk tuning mechanism.  
In particular, there is a rotational tunable that often does not get set 
right.  You might end up tweaking read ahead and friends as well.
http://www.mjmwired.net/kernel/Documentation/block/queue-sysfs.txt#112

* The default Postgres configuration is awful for a SSD backed database.  You 
really need to futz with checkpoints to get acceptable throughput.
The PostgreSQL 9.0 High Performance book is fantastic and is what I used to 
great success.

* The default Linux virtual memory configuration is awful for this 
configuration.  Briefly, it will accept a ton of incoming data, and then go 
through an awful stall as soon as it calls fsync() to write all that data to 
disk.  We had multi-second delays all the way through to the application 
because of this.  We had to change the zone_reclaim_mode and the dirty buffer 
limits.
http://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com



I am not sure that these numbers will end up being anywhere near what works for 
you, but these are my notes from tuning a 4xMLC SSD RAID-10.  I haven't proven 
that this is optimal, but it was way better than the defaults.  We ended up 
with the following list of changes:

* Change IO scheduler to noop
* Mount DB volume with nobarrier, noatime
* Turn blockdev readahead to 16MiB
* Turn sdb's rotational tuneable to 0

PostgreSQL configuration changes:
synchronous_commit = off
effective_io_concurrency = 4
checkpoint_segments = 1024
checkpoint_timeout = 10min
checkpoint_warning = 8min
shared_buffers = 32gb
temp_buffers = 128mb
work_mem = 512mb
maintenance_work_mem = 1gb

Linux sysctls:
vm.swappiness = 0
vm.zone_reclaim_mode = 0
vm.dirty_bytes = 134217728
vm.dirty_background_bytes = 1048576

Hope that helps,
Steven



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID

2013-05-10 Thread Steven Schlansker

On May 10, 2013, at 11:38 AM, Merlin Moncure mmonc...@gmail.com wrote:
 
 PostgreSQL configuration changes:
 synchronous_commit = off
 
 
 that's good info, but it should be noted that synchronous_commit
 trades a risk of some data loss (but not nearly as much risk as
 volatile storage) for a big increase in commit performance.

Yes, that is a choice we consciously made.  If our DB server crashes losing the 
last few ms worth of transactions is an acceptable loss to us.  But that may 
not be OK for everyone :-)




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Deploying PostgreSQL on CentOS with SSD and Hardware RAID

2013-05-10 Thread Steven Schlansker

On May 10, 2013, at 11:35 AM, Lonni J Friedman netll...@gmail.com wrote:
 
 I am not sure that these numbers will end up being anywhere near what works 
 for you, but these are my notes from tuning a 4xMLC SSD RAID-10.  I haven't 
 proven that this is optimal, but it was way better than the defaults.  We 
 ended up with the following list of changes:
 
 * Change IO scheduler to noop
 * Mount DB volume with nobarrier, noatime
 * Turn blockdev readahead to 16MiB
 * Turn sdb's rotational tuneable to 0
 
 PostgreSQL configuration changes:
 synchronous_commit = off
 effective_io_concurrency = 4
 checkpoint_segments = 1024
 checkpoint_timeout = 10min
 checkpoint_warning = 8min
 shared_buffers = 32gb
 temp_buffers = 128mb
 work_mem = 512mb
 maintenance_work_mem = 1gb
 
 Linux sysctls:
 vm.swappiness = 0
 vm.zone_reclaim_mode = 0
 vm.dirty_bytes = 134217728
 vm.dirty_background_bytes = 1048576
 
 Can you provide more details about your setup, including:
 * What kind of filesystem are you using?
 * Linux distro and/or kernel version
 * hardware (RAM, CPU cores etc)
 * database usage patterns (% writes, growth, etc)

Yes, as long as you promise not to just use my configuration without doing 
proper testing on your own system, even if it seems similar!

Linux version 2.6.32.225 (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) 
#2 SMP Thu Mar 29 16:43:20 EDT 2012
DMI: Supermicro X8DTN/X8DTN, BIOS 2.1c   10/28/2011
CPU0: Intel(R) Xeon(R) CPU   X5670  @ 2.93GHz stepping 02
Total of 24 processors activated (140796.98 BogoMIPS).  (2 sockets x 2 
hyperthreads x 6 cores)
96GB ECC RAM

Filesystem is ext4 on LVM on hardware RAID 1+0 Adaptec 5405

Database is very much read heavy, but there is a base load of writes and bursts 
of much larger writes.  I don't have specifics regarding how it breaks down.  
The database is about 400GB and is growing moderately, maybe a few GB/day.  
More of the write traffic is re-writes of existing data rather than new writes.

Hope that helps,
Steven



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Simple SQL INSERT to avoid duplication failed: why?

2013-05-01 Thread Steven Schlansker

On May 1, 2013, at 9:36 AM, Carlo Stonebanks stonec.regis...@sympatico.ca 
wrote:
 I have to ask myself, is it more likely that I have discovered some PG
 anomaly in 9.0 that no one has ever noticed, or that the client has
 accidentally launched the process twice and doesn't know it?

Given my (admittedly limited) experience with the PostgreSQL project and 
dealing with clients, the latter ;-)




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Simple SQL INSERT to avoid duplication failed: why?

2013-04-30 Thread Steven Schlansker

On Apr 30, 2013, at 4:00 PM, Carlo Stonebanks stonec.regis...@sympatico.ca 
wrote:

 Hi Tom,
 
 There's nothing obviously wrong with that, which means the issue is in
 something you didn't show us.  Care to assemble a self-contained example?
 
 
 Unfortunately, it happens erratically and very, very rarely so I can't give
 you something that will fail. I expected an occasional failure and there is
 a try-catch to handle it, I was just surprised when the client told me there
 was no other apps running against this table. I just wanted to make sure the
 logic was correct and that I wasn't doing something stupid or there is some
 known SQL or PG behaviour that would explain this.
 
 The only way I can see this happening is that an
 acache_mdx_logic_address_validation sneaks in before the insert and after
 the NOT EXISTS... SELECT. And for that to occur, the client must be mistaken
 and something else MUST be running and inserting into
 acache_mdx_logic_address_validation. 
 
 Would you agree, or is there anything else to consider?
 
 INSERT INTO 
mdx_lib.acache_mdx_logic_address_validation 
( 
   address, 
   postal_code, 
   address_id 
  ) 
 SELECT 
  '306 station 22 1 2 st' AS address, 
  '29482' AS postal_code, 
  100165016 AS address_id 
   WHERE 
  NOT EXISTS 
  ( SELECT 
 1 
  FROM 
 mdx_lib.acache_mdx_logic_address_validation 
  WHERE 
 address = '306 station 22 1 2 st' 
 AND postal_code = '29482' 
  ) 
 

I know you said that it was the only active application at the time, but I
figured I'd share my experience anyway…

We do a somewhat more involved version of this to provide fake UPSERT 
functionality,
and this failure mode happened more often than we'd guessed it would (due to 
concurrent updates).

However, new in 9.1 is SSI -- http://wiki.postgresql.org/wiki/Serializable
which means that if you run this transaction at isolation level SERIALIZABLE 
you will get
serialization failures instead of duplicate key exceptions, which makes it easy 
to retry until success.

So now we run any code that looks like this at SERIALIZABLE and are very happy 
with it.
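
As a rough sketch (reusing Carlo's table and literals from the quoted query; the 
retry loop itself lives in application code), the pattern is just the same 
INSERT ... WHERE NOT EXISTS wrapped in a SERIALIZABLE transaction and retried on 
serialization failure:

BEGIN ISOLATION LEVEL SERIALIZABLE;

INSERT INTO mdx_lib.acache_mdx_logic_address_validation
    (address, postal_code, address_id)
SELECT '306 station 22 1 2 st', '29482', 100165016
 WHERE NOT EXISTS
   ( SELECT 1
       FROM mdx_lib.acache_mdx_logic_address_validation
      WHERE address = '306 station 22 1 2 st'
        AND postal_code = '29482' );

COMMIT;
-- If the INSERT or the COMMIT fails with SQLSTATE 40001 (serialization_failure),
-- ROLLBACK and rerun the whole transaction; on a retry the row will either be
-- inserted or the NOT EXISTS will see it and insert nothing.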

Not sure if that'll help, but hopefully!

Best,
Steven

 
 -Original Message-
 From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
 Sent: April 25, 2013 5:06 PM
 To: Carlo Stonebanks
 Cc: pgsql-general@postgresql.org
 Subject: Re: Simple SQL INSERT to avoid duplication failed: why?
 
 Carlo Stonebanks stonec.regis...@sympatico.ca writes:
 Ok, I tried to be clever and I wrote code to avoid inserting duplicate
 data.
 The calling function has a try-catch to recover from this, but I am 
 curious as to why it failed:
 
 There's nothing obviously wrong with that, which means the issue is in
 something you didn't show us.  Care to assemble a self-contained example?
 
   regards, tom lane
 
 
 
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Using an index for IS DISTINCT FROM queries

2013-04-22 Thread Steven Schlansker
Hi everyone,
I have a large table (~150M rows) that keeps a version field.  At any given 
time, it is expected that the vast majority of the rows are on the current 
version, but some may be missing.

To figure out all the missing or outdated values, I run a query along the 
lines of 

SELECT id FROM source_table LEFT OUTER JOIN dest_table WHERE version IS 
DISTINCT FROM <current-version>

However, this query always chooses a sequential scan and hash join of both 
tables, which is *very* slow.

The statistics reflect that the planner knows that <current-version> is 
overwhelmingly common:

-[ RECORD 1 ]-----+-----------------
schemaname        | public
tablename         | event
attname           | unpacker_version
inherited         | f
null_frac         | 0
avg_width         | 4
n_distinct        | 1
most_common_vals  | {1}
most_common_freqs | {1}
histogram_bounds  | 
correlation       | 1

but it doesn't help:

event=> explain select count(1) from event where unpacker_version is distinct from 1;
                             QUERY PLAN
---------------------------------------------------------------------
 Aggregate  (cost=10658841.93..10658841.94 rows=1 width=0)
   ->  Seq Scan on event  (cost=0.00..10658841.93 rows=1 width=0)
         Filter: (unpacker_version IS DISTINCT FROM 1)
(3 rows)


I can by hand force the planner to consider the obvious solution (find values 
below, above, and null) but it is ugly:

event=> explain select count(1) from event where unpacker_version < 1 or unpacker_version > 1 or unpacker_version is null;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=139.14..139.15 rows=1 width=0)
   ->  Bitmap Heap Scan on event  (cost=135.13..139.14 rows=1 width=0)
         Recheck Cond: ((unpacker_version < 1) OR (unpacker_version > 1) OR (unpacker_version IS NULL))
         ->  BitmapOr  (cost=135.13..135.13 rows=1 width=0)
               ->  Bitmap Index Scan on event_unpacker_version_idx  (cost=0.00..45.04 rows=1 width=0)
                     Index Cond: (unpacker_version < 1)
               ->  Bitmap Index Scan on event_unpacker_version_idx  (cost=0.00..45.04 rows=1 width=0)
                     Index Cond: (unpacker_version > 1)
               ->  Bitmap Index Scan on event_unpacker_version_idx  (cost=0.00..45.04 rows=1 width=0)
                     Index Cond: (unpacker_version IS NULL)
(10 rows)

but it sucks to do this kind of tuning by munging the query.  Is there some 
case where the two forms are not equivalent?  If they are equivalent, would 
this be a reasonable case for the planner to consider and optimize by itself?

Thanks,
Steven

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_restore from split files

2013-04-22 Thread Steven Schlansker

On Apr 22, 2013, at 12:47 PM, akp geek akpg...@gmail.com wrote:

 pg_dump dbname -n schemaname -t table_name -Fc | split -b 500m -t table.dump

Since you split the files outside of the Postgres world, you have to combine 
them again.  Roughly,

cat table.dump.* > table.dump.combined
pg_restore --usual-arguments table.dump.combined


 On Mon, Apr 22, 2013 at 3:41 PM, Igor Neyman iney...@perceptron.com wrote:
 How exactly did you create “split” dump?
 
  
 
 Igor Neyman
 
  
 
 From: pgsql-general-ow...@postgresql.org 
 [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of akp geek
 Sent: Monday, April 22, 2013 3:36 PM
 To: pgsql-general
 Subject: [GENERAL] pg_restore from split files
 
  
 
 Hi All -
 
  
 
 I have created a dump of a big table into 5 split files.  What is the 
 procedure to restore them using pg_dump.
 
  
 
 I am getting the following error
 
  
 
 pg_restore: [custom archiver] could not read from input file: end of file
 
 pg_restore: *** aborted because of error
 
  
 
 Appreciate your help.
 
  
 
 Regards
 
 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Queries seldomly take 4s while normally take 1ms?

2013-04-09 Thread Steven Schlansker

On Apr 9, 2013, at 11:25 AM, Scott Marlowe scott.marl...@gmail.com wrote:

 One of the most common causes I've seen for this is linux's vm.*dirty* 
 settings to get in the way. Like so many linux kernel optimizations this 
 one looks good on paper but gives at best middling improvements with 
 occasional io storms that block everything else.  On big mem machines doing a 
 lot of writing IO I just set these to 0. Also tend to turn off swap as well 
 as it's known to get in the way as well.
 
 settings for /etc/sysctl.conf
 vm.dirty_background_ratio = 0
 vm.dirty_ratio = 0
 

I'll +1 on the you have to tune your Linux install advice.

I found the PostgreSQL 9.0 High Performance book to be worth its weight in 
gold.  A few days spent with the book and research on mailing lists improved 
our PostgreSQL performance multiple times over, and responsiveness under load 
by orders of magnitude.

http://www.amazon.com/PostgreSQL-High-Performance-Gregory-Smith/dp/184951030X

 
 
 On Tue, Apr 9, 2013 at 3:41 AM, Christian Hammers c...@lathspell.de wrote:
 Hello
 
 I have a setup with one master and two slaves which are used by a closed
 source application. The database is asked the same query, a stored procedure,
with different parameters about 4 million times per day at a peak rate of
 150 times per second using 10 parallel connections. The slaves are decent
 Dell servers with 64GB RAM with dual hexacore CPUs and RAID10.
 
 Usually this stored procedure takes 1ms as it basically just does two
 selects against a GIST index for a prefix_range type. Seldomly though,
 about 1-3 times per day, one of these queries takes up to 4000ms!
 All those queries also runs in 1ms when executed manually some times later.
 Queries with similar parameters (who I supposed to use the same area of the
 index) also continues to run fast during that time. Queries with different
 paramers which are running parallel on different threads take 1ms, too,
 so it's not a general load problem.
 
 Cronjobs and other applications seem quiet during that time, there is
 no peak in any of our monitoring graphs. Automatic vacuum/analyze log
 entries on the master are not near the timestamps in question.
 
 So my problem seems not the query itself nor the way I indexed my data
 but what could it be? Some strange effects with streaming replication
 or cache invalidation?
 
 Apologies for not giving you reproducible problem but maybe you
 still have some ideas as I'm just curious as I've never seem such an
 effect during my MySQL years :-) The queries contain obvious customer
 data so I'm reluctant to give examples but again I doubt that
 an explain plan will help if only 1 out of 4E6 queries takes too long.
 
 bye,
 
 -christian-
 
 
 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general
 
 
 
 -- 
 To understand recursion, one must first understand recursion.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Testing Technique when using a DB

2013-03-13 Thread Steven Schlansker

On Mar 12, 2013, at 8:09 PM, Joe Van Dyk j...@tanga.com wrote:

 On Mar 12, 2013, at 8:42 AM, Perry Smith pedz...@gmail.com wrote:
 
 
 The other thought is perhaps there is a snap shot type concept.  I don't 
 see it in the list of SQL commands.  A snap shot would do exactly what it 
 sounds like.  It would take a snap shot and save it somehow.  Then a 
 restore to snap shot would restore the DB back to that state. 
 
 This would be super super super awesome, but it doesn't exist as far as I 
 know. This would be a permanent snapshot that could be easily and quickly 
 restored. 
 
 I wonder if it would be possible to make an extension that made this easy to 
 do. 

For what it's worth, I have something very similar to that to write JUnit tests 
in Java.
While that might not be much help to the original poster's Ruby environment, 
the code is
available as open source for anyone who thinks this is an interesting idea.

https://github.com/NessComputing/components-ness-pg

We got tired of maintaining a Postgres instance on every development box and 
trying to make sure they had reasonably close versions, so I hacked this 
together.

It's fairly specific to the infrastructure we use but if there is any interest 
I could spruce it up a bit, write some documentation, and make it more 
stand-alone.

Short description:

* Drop your SQL files in the class path somewhere (src/test/resources for Maven 
folk)
* Provides an embedded PostgreSQL instance that you can spin up and down on 
demand -- builds via shell script, auto detects platform
* The SQL files are loaded into a Postgres template database, and then cloning 
the databases can be done relatively cheaply and quickly (see the sketch after 
this list).
* A fresh database is prepared in a thread in the background so one is always 
immediately available to test cases
* Provides a handy JUnit @Rule so you can mix it in to test cases easily
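
The cheap cloning mentioned in the list above is just PostgreSQL's template 
mechanism; roughly (database names here are made up):

-- done once, after the SQL files have been loaded into the template:
CREATE DATABASE test_template;

-- per test: clone the template, run the test, throw the clone away
-- (the template must have no active connections while it is being copied)
CREATE DATABASE test_run_1 TEMPLATE test_template;
DROP DATABASE test_run_1;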




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] best config

2013-02-06 Thread Steven Schlansker

On Feb 6, 2013, at 8:14 AM, Roberto Scattini roberto.scatt...@gmail.com wrote:

 hi list,
 
 we have two new dell poweredge r720. based on recommendations from this list 
 we have configured the five disks in raid10 + 1 hot spare.

You might mention a bit more about how your drives are configured.  5 drives in 
a RAID1+0 sounds odd to me.

 
 now we are looking for advice in the postgres installation for our setup.
 
 we have two databases. one for a lot of small apps and one for one big app 
 with a lot of data and a lot of usage.
 we want to use streaming replication to have a functional copy of databases 
 in a failure.
 
 one of the ideas is to have one database running on each server, and then 
 have another instance of the other database running in streaming replication 
 (i mean, crossed replications).
 
 the other idea is to have both databases running in one server and backup 
 everything in the other with streaming replication.
 
 which alternative would you use?

I would not introduce the complexity of having each server be master for half 
of the data unless you can show that this improves some metric you care a lot 
about.  Any failure or maintenance event will revert you to the common 
configuration -- both masters on one system -- until you do another promotion 
back to the cross-wired setup.  That is extra work without a demonstrated gain.

Plus then you can get away with half as many Postgres installs to maintain.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] best config

2013-02-06 Thread Steven Schlansker

On Feb 6, 2013, at 9:55 AM, Roberto Scattini roberto.scatt...@gmail.com wrote:

 
 hi steven,
 
  we have two new dell poweredge r720. based on recommendations from this 
  list we have configured the five disks in raid10 + 1 hot spare.
 
 You might mention a bit more about how your drives are configured.  5 drives 
 in a RAID1+0 sounds odd to me.
 
 
 i mean, 4 disks in raid10, plus one disk as hot spare.
 
 also, it wasn't this list that recommended this setup, it was debian-user.
  

That makes a lot more sense.  Nothing wrong with that setup :-)

 
  now we are looking for advice in the postgres installation for our setup.
 
  we have two databases. one for a lot of small apps and one for one big app 
  with a lot of data and a lot of usage.
  we want to use streaming replication to have a functional copy of databases 
  in a failure.
 
  one of the ideas is to have one database running on each server, and then 
  have another instance of the other database running in streaming 
  replication (i mean, crossed replications).
 
  the other idea is to have both databases running in one server and backup 
  everything in the other with streaming replication.
 
  which alternative would you use?
 
 I would not introduce the complexity of having each server be master for half 
 of the data unless you can show that this improves some metric you care a lot 
 about.  Any failure or maintenance event will revert you back to the common 
 configuration -- back to having both masters on one system -- until you do 
 another promotion back to the cross wired setup.  Extra work without a 
 proposed gain.
 
 Plus then you can get away with half as many Postgres installs to maintain.
 
 ok. we thought of this crossed-replication config so that one heavy query in 
 one of the databases wouldn't affect the performance of the other. 

Both of your servers need to be powerful enough to handle the whole load, 
otherwise your replication setup will not continue to function acceptably when 
one of the servers is offline due to a crash or maintenance.

I don't think there is anything necessarily wrong with your proposal; I am just 
pointing out that simplicity is better than complexity unless you can prove 
(say, in a test environment) that your application actually performs enough 
better to justify the administrative cost of this cross-wired setup.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Using partial index in combination with prepared statement parameters

2013-02-03 Thread Steven Schlansker
Hi,

It's been covered a few times in the past,
http://www.postgresql.org/message-id/banlktimft4ohqkb6y7m4wqrffpwnutp...@mail.gmail.com
http://postgresql.1045698.n5.nabble.com/partial-indexes-not-used-on-parameterized-queries-td2121027.html

but in a nutshell, partial indices do not play nicely with prepared statements: 
when the index predicate depends on a statement parameter, the planner cannot 
tell at plan time whether the predicate is satisfied, so it will not use the index.

I am curious if there is any development on this?  I have looked around but 
don't see any activity (whether Simon's "I have a patch but don't hold your 
breath" or otherwise).

I ask largely because we use prepared statements everywhere and this makes it 
very hard
to use partial indices, which would offer us significant performance gains.
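
To make the mismatch concrete, here is a minimal sketch (table, column, and 
index names are made up for illustration):

CREATE INDEX orders_pending_idx ON orders (id) WHERE status = 'pending';

-- A literal predicate lets the planner prove the index predicate is implied,
-- so the partial index is usable:
SELECT id FROM orders WHERE status = 'pending';

-- In a prepared statement the predicate depends on a parameter, so a generic
-- plan cannot assume $1 = 'pending' and the partial index goes unused:
PREPARE pending_orders (text) AS SELECT id FROM orders WHERE status = $1;
EXECUTE pending_orders('pending');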

Does anyone know of any acceptable workaround?  Is there continued interest in 
maybe improving the PostgreSQL behavior in this case?

Thanks!
Steven Schlansker



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Combine Date and Time Columns to Timestamp

2013-01-18 Thread Steven Schlansker

On Jan 18, 2013, at 4:26 PM, Rich Shepard rshep...@appl-ecosys.com wrote:

 On Fri, 18 Jan 2013, Adrian Klaver wrote:
 
 How are they stored, as date and time type, strings, other?
 
 Adrian,
 
  ISO date and time.
 
 A sample of the data would help also.
 
  Example:  2012-10-29  |  10:19   | 2012-10-30  | 09:40


steven=# create temporary table date_test (d varchar, t varchar);
CREATE TABLE
steven=# insert into date_test values('2010-08-23', '8:04:33');
INSERT 0 1
steven=# select d::date + t::interval from date_test;
  ?column?   
-
 2010-08-23 08:04:33
(1 row)
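
(For what it's worth, if the columns are stored as actual date and time types 
rather than text, plain addition already yields a timestamp:)

steven=# select date '2012-10-29' + time '10:19';
      ?column?       
---------------------
 2012-10-29 10:19:00
(1 row)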



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Noticed something odd with pgbench

2012-11-16 Thread Steven Schlansker

On Nov 16, 2012, at 11:59 AM, Richard Huxton d...@archonet.com wrote:

 On 16/11/12 19:35, Shaun Thomas wrote:
 Hey guys,
 
 So, we have a pretty beefy system that runs dual X5675's with 72GB of RAM. 
 After our recent upgrade to 9.1, things have been... odd. I managed to track 
 it down to one setting:
 
 shared_buffers = 8GB
 
 Does this match what you're seeing?
 
 http://frosty-postgres.blogspot.co.uk/2012/08/postgresql-numa-and-zone-reclaim-mode.html
 


(Slightly OT from the OP's question, sorry)

Would this be worth referencing in the PostgreSQL documentation?  I feel like 
I've read a lot of the documentation on Postgres tuning but this is news to me. 
 And surprise surprise I'm running an affected system too!  Maybe it deserves a 
more prominent warning, perhaps at 
http://www.postgresql.org/docs/9.2/interactive/performance-tips.html




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Confirming \timing output

2012-08-23 Thread Steven Schlansker

On Aug 23, 2012, at 11:13 AM, Gauthier, Dave dave.gauth...@intel.com wrote:

 With \timing set on, I run an update statement and it reports
  
 Time: 0.524 ms
  
 Is that really 0.524 ms?  As in 524 nanoseconds?

0.524ms = 524000ns

Perhaps you meant microseconds?

0.524ms = 524us

If all your data happens to be in RAM cache, simple queries can execute very 
fast!  Unless you have a reason to believe it's wrong, I would trust it to be 
accurate :-)

  
 Also, is this wallclock time or some sort of indication of how much cpu it 
 took?
  
 Thanks for any answers !
  


\timing measures wall time.  There's a more detailed discussion of the 
difference between this and e.g. EXPLAIN ANALYZE here:

http://postgresql.1045698.n5.nabble.com/What-does-timing-measure-td4289329.html
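
(A related note: if you want to see where the time goes inside the statement 
itself, EXPLAIN (ANALYZE, BUFFERS) reports the measured run time and whether the 
pages came from shared buffers.  Just remember that it really executes the 
statement, so wrap an UPDATE in a transaction you can roll back.  A sketch with 
a made-up table:)

begin;
explain (analyze, buffers) update foo set val = val + 1 where id = 1;
rollback;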





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ignore hash indices on replicas

2012-08-20 Thread Steven Schlansker

On Aug 19, 2012, at 8:01 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Tue, Jul 10, 2012 at 1:09 AM, Steven Schlansker ste...@likeness.com 
 wrote:
 I'm using Postgres hash indices on a streaming replica master.
 As is documented, hash indices are not logged, so the replica does not have 
 access to them.
 
 I understand that the current wisdom is don't use hash indices, but 
 (unfortunately?) I have benchmarks that
 show that our particular application is faster by quite a bit when a hash 
 index is available.
 
 You could use a slony slave and have different indexes etc between
 master and slave but it's more complex to setup, maintain and monitor
 for most people.

Thanks for the suggestion, but we finally have replication working in a way we 
understand / like and I don't really consider this a viable option.  The 
built-in replication has been treating us very well.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Ignore hash indices on replicas

2012-08-20 Thread Steven Schlansker

On Aug 19, 2012, at 2:37 PM, Jeff Davis pg...@j-davis.com wrote:

 On Tue, 2012-07-10 at 00:09 -0700, Steven Schlansker wrote:
 I understand that the current wisdom is don't use hash indices, but
 (unfortunately?) I have benchmarks that
 show that our particular application is faster by quite a bit when a
 hash index is available.
 
 Can you publish the results somewhere? It might provoke some interest.

I might be able to spend some time looking at making this public, but the 
general parameters are:

122M rows, lookup key is a UUID type.  Lookups are ~1000 random keys at a time 
(as in, a giant SELECT * FROM table WHERE key IN (?,?,?,?,…))
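
Roughly, the shape of what was benchmarked looks like this (names are made up; 
only the index definition and the lookup pattern matter):

CREATE INDEX account_id_hash_idx ON account USING hash (id);

-- the benchmarked lookup, ~1000 random uuids per statement:
SELECT * FROM account WHERE id = ANY ($1);   -- $1 is a uuid[] of ~1000 keys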

 
 I assume that fixing the hash index logging issue hasn't been a
 priority due to low interest / technical limitations, but I'm curious
 for a stopgap measure -- can we somehow configure Postgres to ignore
 hash indices on a replica, using other b-tree indices or even a
 sequential scan?  I know I can do this on a per-connection basis by
 disabling various index lookup methods, but it'd be nice if it just
 ignored invalid indices on its own.
 
 This might work for you:
 
 http://sigaev.ru/git/gitweb.cgi?p=plantuner.git;a=blob;hb=HEAD;f=README.plantuner

Thanks for the link; that looks interesting.  It is a bit unfortunate that I 
would have to find and exclude indices manually, but very doable...



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Postgres will not compile on Mac 10.8 with contrib/uuid-ossp

2012-07-30 Thread Steven Schlansker
It is not possible to compile Postgres contrib/uuid-ossp on the newest release 
of Mac OS X, 10.8

The specific compile error: 

make -C uuid-ossp install

/bin/sh ../../config/install-sh -c -d '/usr/local/Cellar/postgresql/9.1.3/lib'
/usr/bin/clang -Os -w -pipe -march=native -Qunused-arguments 
-I/usr/local/Cellar/ossp-uuid/1.6.2/include -Wall -Wmissing-prototypes 
-Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard  -I. -I. 
-I../../src/include -I/usr/local/Cellar/readline/6.2.2/include 
-I/usr/include/libxml2 -I/usr/include/libxml2   -c -o uuid-ossp.o uuid-ossp.c

In file included from uuid-ossp.c:27:
In file included from /usr/local/Cellar/ossp-uuid/1.6.2/include/uuid.h:38:
/usr/include/unistd.h:689:26: error: expected identifier
int  gethostuuid(uuid_t, const struct timespec *) 
__OSX_AVAILABLE_STARTING(__MAC_10_5, __IPHONE_2_0);
 ^
In file included from uuid-ossp.c:27:
/usr/local/Cellar/ossp-uuid/1.6.2/include/uuid.h:94:24: error: typedef 
redefinition with different types ('struct uuid_st' vs '__darwin_uuid_t' (aka 
'unsigned char [16]'))
typedef struct uuid_st uuid_t;
   ^
/usr/include/uuid/uuid.h:42:25: note: previous definition is here
typedef __darwin_uuid_t uuid_t;
^
In file included from uuid-ossp.c:27:
/usr/local/Cellar/ossp-uuid/1.6.2/include/uuid.h:107:22: error: conflicting 
types for 'uuid_compare'
extern uuid_rc_t uuid_compare  (const uuid_t  *_uuid, const uuid_t *_uuid2, 
int *_result);
 ^
/usr/include/uuid/uuid.h:59:5: note: previous declaration is here
int uuid_compare(const uuid_t uu1, const uuid_t uu2);
^
3 errors generated.
make[2]: *** [uuid-ossp.o] Error 1
make[1]: *** [install-uuid-ossp-recurse] Error 2
make[1]: *** Waiting for unfinished jobs….


Full log available here (not mine, but I have the exact same problem):
https://gist.github.com/2287209

There is a workaround (disable uuid-ossp) but obviously this does not help 
those of us who require UUID support.

There are a number of bugs open:
https://github.com/mxcl/homebrew/issues/13639
https://trac.macports.org/ticket/35153

and a proposed fix which does fix the problem for me:

 This is caused by the inclusion of the system uuid.h in pwd.h. Just add the 
 line :
 
 #define _XOPEN_SOURCE
 
 at the first line of uuid-ossp.c, and it should compile fine.


I am not sure what the correct fix is, but adding the above definition does 
fix the issue.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres will not compile on Mac 10.8 with contrib/uuid-ossp

2012-07-30 Thread Steven Schlansker

On Jul 30, 2012, at 7:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Steven Schlansker ste...@likeness.com writes:
 It is not possible to compile Postgres contrib/uuid-ossp on the newest 
 release of Mac OS X, 10.8
 
 This looks like some variant of the same issue that OSSP's uuid
 package has had on Macs all along, to wit an unreliable method for
 avoiding conflicts with any OS-supplied uuid type.  Back when we
 last looked at this,
 http://archives.postgresql.org/pgsql-hackers/2007-11/msg00551.php
 we concluded that what we had to do was not include unistd.h
 ahead of uuid.h, and as far as I can tell we're still not doing that;
 for me, contrib/uuid-ossp still builds fine on Lion with uuid 1.6.2.

Looks right to me.

 
 So assuming you're using a stock copy of 1.6.2,

(I am)

 that leaves unistd.h
 as the only variable in the equation that could have changed.  It's
 not obvious from here exactly how it changed, but in any case this
 fight is ultimately between OSSP uuid and OS X; there's not a lot
 Postgres can (or should) do to fix it.  It's a bit distressing that
 OSSP doesn't seem to have made any new uuid releases since 2008,
 but maybe if you prod them they'll do something about this.
 
   regards, tom lane

I've sent a message upstream.  It seems that both their bug tracker and forum 
are either
nonexistent or so broken that they seem so.  So I will work on the assumption 
that this bug
won't get fixed upstream…

Would it be reasonable to include the _XOPEN_SOURCE define in the contrib 
module?
It at least fixes this issue and hopefully wouldn't cause any ill effects.

There's a patch at 
https://github.com/stevenschlansker/homebrew/commit/d86e17cbcc5d287d7a393a6754aa8f94b995c5ea

It's not extensively tested, but it does compile and I would be very surprised 
if it caused problems.
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Replication/cloning: rsync vs modification dates?

2012-07-16 Thread Steven Schlansker
I think it's pretty easy to show that timestamp+size isn't good enough to do 
this 100% reliably.

Imagine that your timestamps have a millisecond resolution.  I assume this will 
vary based on OS / filesystem, but the point remains the same no matter what the 
actual resolution is.

You can have multiple writes occur in the same quantized instant.

If the prior rsync just happened to catch the first write (at T+0.1ms) in that 
instant but not the second (which happened at T+0.4ms), the second may not be 
transferred.  The modification time is the same for the two writes, and since 
PostgreSQL overwrites its data files in place the file size typically does not 
change either, so a later mtime+size check sees nothing to copy.

All that said, I think the chances of this actually happening are vanishingly 
small.  I personally use rsync without checksums and have had no problems.

On Jul 16, 2012, at 2:42 PM, Chris Angelico wrote:

 On Tue, Jul 17, 2012 at 4:35 AM, Sergey Konoplev
 sergey.konop...@postgresql-consulting.com wrote:
 On Mon, Jul 16, 2012 at 8:01 PM, Chris Angelico ros...@gmail.com wrote:
 On Tue, Jul 17, 2012 at 1:58 AM, Michael Nolan htf...@gmail.com wrote:
 As I understand the docs for rsync, it will use both mod time and file size
 if told not to do checksums.
 
 I wonder if it is correct in general to use mtime and size to perform
 these checks from the point of view of PostgreSQL.
 
 If it works with the current version then is there a guaranty that it
 will work with the future versions?
 
 That was my exact question. Ideally, I'd like to hear from someone who
 works with the Postgres internals, but the question may not even be
 possible to answer.
 
 ChrisA
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Ignore hash indices on replicas

2012-07-10 Thread Steven Schlansker
I'm using Postgres hash indices on a streaming replica master.
As is documented, hash indices are not logged, so the replica does not have 
access to them.

I understand that the current wisdom is don't use hash indices, but 
(unfortunately?) I have benchmarks that
show that our particular application is faster by quite a bit when a hash index 
is available.

I assume that fixing the hash index logging issue hasn't been a priority due to 
low interest / technical limitations, but I'm curious for a stopgap measure -- 
can we somehow configure Postgres to ignore hash indices on a replica, using 
other b-tree indices or even a sequential scan?  I know I can do this on a 
per-connection basis by disabling various index lookup methods, but it'd be 
nice if it just ignored invalid indices on its own.
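
(For reference, the per-connection workaround amounts to turning off the 
planner's index scan types, which unfortunately also takes the b-tree indexes 
out of play -- a sketch:)

-- in a session on the replica; this disables *all* index scans, not just hash
SET enable_indexscan = off;
SET enable_bitmapscan = off;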

I've not seen much reference to this problem around, but I do apologize if I've 
missed it in the manual or it is extremely obvious how you do this :)

Thanks,
Steven


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Suboptimal query plan fixed by replacing OR with UNION

2012-07-09 Thread Steven Schlansker
On Jul 6, 2012, at 9:24 PM, Gurjeet Singh wrote:

 On Thu, Jul 5, 2012 at 7:16 PM, Steven Schlansker ste...@likeness.com wrote:
 
 On Jul 5, 2012, at 3:51 PM, Tom Lane wrote:
 
  Steven Schlansker ste...@likeness.com writes:
  Why is using an OR so awful here?
 
  Because the OR stops it from being a join (it possibly needs to return
  some rows that are not in the semijoin of the two tables).
 
  Why does it pick a sequential scan?  Is this an optimizer bug
 
  No.  It can't transform OR into a UNION because the results might not
  be the same.  I assume you don't care about removal of duplicates, or
  have some reason to know that there won't be any ... but the planner
  doesn't know that.
 
 
 Thanks for the insight here.  It still seems unfortunate that it picks a
 sequential scan -- but if there really is no more efficient way to do this,
 I will just rewrite the query.
 
 It might not be applicable to this case (because of the use of ANY in second 
 branch of OR clause), but some databases provide a feature called 
 OR-Optimization, where the optimizer breaks up the query at OR clause 
 boundaries and uses UNION ALL operator to join the resulting queries, just 
 like you did. Optimizer does need to add additional AND clauses to some of 
 the branches to make sure the result set is not affected.
 

That sounds like a great optimization for Postgres, but unfortunately it's far 
outside of my skill set / time to contribute, so I'd have to wait for a real 
PG dev to get to it :)

 Just a thought.
 -- 
 Gurjeet Singh
 EnterpriseDB Corporation
 The Enterprise PostgreSQL Company
 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Suboptimal query plan fixed by replacing OR with UNION

2012-07-06 Thread Steven Schlansker

On Jul 5, 2012, at 6:35 PM, Jasen Betts wrote:

 I note you've decided to rewrite this query as a union 
 
 SELECT * FROM account
  WHERE user_id in 
(SELECT user_id FROM account 
  WHERE id = 
 ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'))
  OR
id = 
 ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}');
 
 I notice both arrays (used with = ANY) have the exact same content,
 
 if this is always true you can use a CTE here for the ID=ANY(...)
 query and reference the CTE on both sides of the union.
 

Thanks for the idea!  I'll be sure to incorporate that.  Doesn't fix the 
unfortunate behavior with OR, though.

 WITH i as (
 SELECT * FROM account WHERE id = 
 ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}')
 )
 SELECT 
 * from i
 UNION DISTINCT 
 SELECT
 account.* from account join i on i.user_id = account.userid ;
 
 -- 
 ⚂⚃ 100% natural
 
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Suboptimal query plan fixed by replacing OR with UNION

2012-07-05 Thread Steven Schlansker
)
 Index Cond: (id = ANY 
('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
   -  Bitmap Heap Scan on account  (cost=37.20..3188.55 rows=803 width=160)
 Recheck Cond: (user_id = public.account.user_id)
 -  Bitmap Index Scan on account_user_id_idx  (cost=0.00..37.00 
rows=803 width=0)
   Index Cond: (user_id = public.account.user_id)
(10 rows)

ness_user=# explain SELECT * FROM account WHERE id = 
ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}');
 QUERY 
PLAN  
-
 Bitmap Heap Scan on account  (cost=17.56..29.58 rows=3 width=160)
   Recheck Cond: (id = ANY 
('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
   -  Bitmap Index Scan on account_id_user_id_idx  (cost=0.00..17.56 rows=3 
width=0)
 Index Cond: (id = ANY 
('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
(4 rows)

(where "reasonable" is defined as "not a sequential scan")

Upon seeing this -- I had a crazy idea.  What if I just paste them together 
with a UNION DISTINCT?

ness_user=# explain SELECT * FROM account WHERE 
ness_user-# user_id in (SELECT user_id FROM account WHERE id = 
ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'))
 UNION DISTINCT
ness_user-# SELECT * FROM account WHERE
ness_user-# id = 
ANY('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}');

 QUERY PLAN 
 
-
 HashAggregate  (cost=3342.22..3366.35 rows=2413 width=160)
   -  Append  (cost=66.79..3281.90 rows=2413 width=160)
 -  Nested Loop  (cost=66.79..3228.18 rows=2410 width=160)
   -  HashAggregate  (cost=29.59..29.60 rows=1 width=16)
 -  Bitmap Heap Scan on account  (cost=17.56..29.58 rows=3 
width=16)
   Recheck Cond: (id = ANY 
('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
   -  Bitmap Index Scan on account_id_user_id_idx  
(cost=0.00..17.56 rows=3 width=0)
 Index Cond: (id = ANY 
('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
   -  Bitmap Heap Scan on account  (cost=37.20..3188.55 rows=803 
width=160)
 Recheck Cond: (user_id = public.account.user_id)
 -  Bitmap Index Scan on account_user_id_idx  
(cost=0.00..37.00 rows=803 width=0)
   Index Cond: (user_id = public.account.user_id)
 -  Bitmap Heap Scan on account  (cost=17.56..29.58 rows=3 width=160)
   Recheck Cond: (id = ANY 
('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
   -  Bitmap Index Scan on account_id_user_id_idx  
(cost=0.00..17.56 rows=3 width=0)
 Index Cond: (id = ANY 
('{-02f6-379d-c000-00026810,-0320-b467-c000-00026810,-000d-cefb-c000-00026810}'::uuid[]))
(16 rows)


Wow!  Changing the query from using an OR clause to a UNION DISTINCT with two 
SELECTs reduced the cost from 1379485.60 to 3366.35!  And the gains are 
realized when you actually execute the query.

Why is using an OR so awful here?  Why does it pick a sequential scan?  Is this 
an optimizer bug or have I missed something in my queries?

Thanks much for any advice,
Steven Schlansker


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Suboptimal query plan fixed by replacing OR with UNION

2012-07-05 Thread Steven Schlansker

On Jul 5, 2012, at 3:51 PM, Tom Lane wrote:

 Steven Schlansker ste...@likeness.com writes:
 Why is using an OR so awful here?
 
 Because the OR stops it from being a join (it possibly needs to return
 some rows that are not in the semijoin of the two tables).
 
 Why does it pick a sequential scan?  Is this an optimizer bug
 
 No.  It can't transform OR into a UNION because the results might not
 be the same.  I assume you don't care about removal of duplicates, or
 have some reason to know that there won't be any ... but the planner
 doesn't know that.
 

Thanks for the insight here.  It still seems unfortunate that it picks a
sequential scan -- but if there really is no more efficient way to do this,
I will just rewrite the query.

Steven


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general