Re: [GENERAL] Transaction ID Wraparound Monitoring

2015-08-03 Thread William Dunn
Hello Jan,

I think your calculation is slightly off because, per the docs, when
PostgreSQL comes within 1 million transactions of the age at which an actual
wraparound occurs it will go into safety shutdown mode. Thus the calculation
should be ((2^32)-1)/2-1000000 rather than just ((2^32)-1)/2 as I think you
are using.

When I first started building out my group's PostgreSQL monitoring solution
I too found the wording of transaction freeze to be a bit difficult to
understand. For my team's internal documentation I have summarized it as
follows; I hope this is clearer:

...normal XIDs are compared using modulo-2^32 arithmetic, which means that
~(2^32-1)/2 transactions appear in the future and ~(2^32-1)/2 transactions
appear in the past.

This [Transaction ID freeze] behavior of autovacuum is primarily dependent
on the settings autovacuum_freeze_table_age and autovacuum_freeze_max_age,
which are set as database defaults but can also be specified on a per table
basis (as storage parameters in CREATE TABLE or ALTER TABLE)

   - When the age of a table's oldest unfrozen transaction ID (its
   relfrozenxid) reaches autovacuum_freeze_table_age, the next autovacuum
   performed on that table will be a vacuum freeze
      - PostgreSQL implicitly caps autovacuum_freeze_table_age at
      0.95 * autovacuum_freeze_max_age
   - When a table reaches autovacuum_freeze_max_age, PostgreSQL will force
   an autovacuum freeze on that table, even if the table would not otherwise
   be autovacuumed or autovacuum is disabled
      - PostgreSQL implicitly caps autovacuum_freeze_max_age at 2 billion
      (2,000,000,000)
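
For example, the per-table storage parameters mentioned above can be set with
ALTER TABLE (the table name and value here are only illustrative):

ALTER TABLE some_table SET (autovacuum_freeze_max_age = 500000000);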

The actual age at which a wraparound occurs is ((2^32)/2)-1. When a PostgreSQL
database comes within 1 million of this age ((2^32)/2-1-1000000) the database
will go into "safety shutdown mode" and no longer accept commands, including
VACUUM commands, and your only recovery option is to stop the server and use
a single-user backend (where shutdown mode is not enforced) to execute VACUUM.
This should, obviously, be avoided at all costs.
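
As a quick sanity check of the arithmetic, you can run the following in psql
(the values shown in the comments are exact):

SELECT trunc(((2^32)/2)-1)         AS wraparound_age,       -- 2147483647
       trunc(((2^32)/2)-1-1000000) AS safety_shutdown_age;  -- 2146483647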

References:

   - http://www.PostgreSQL.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
   - http://www.PostgreSQL.org/docs/current/static/runtime-config-client.html#GUC-VACUUM-FREEZE-TABLE-AGE
   - http://www.PostgreSQL.org/docs/current/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-FREEZE-MAX-AGE


Based on the above explanation we consider the following to be the most
correct check for how close you are to an actual wraparound freeze:

CAST(age(relfrozenxid) AS real) / CAST(trunc(((2^32)/2)-1-1000000) AS real)
    AS perc_until_wraparound_server_freeze


(Note that we do this at the table level rather than the database level
like you did, though, so that we have the information we need to tune the
settings for individual tables.)
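
For reference, here is a minimal sketch of that check wrapped as a complete
table-level query (the relkind filter and the ordering are additions for
illustration, not part of the original check):

SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       CAST(age(c.relfrozenxid) AS real)
         / CAST(trunc(((2^32)/2)-1-1000000) AS real)
         AS perc_until_wraparound_server_freeze
  FROM pg_class c
 WHERE c.relkind = 'r'   -- ordinary tables only, for simplicity
 ORDER BY perc_until_wraparound_server_freeze DESC;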

However it is better to set autovacuum_freeze_max_age well below that value
and monitor that instead. Autovacuum should always do a vacuum freeze for a
table that has exceeded max freeze age, and if you are monitoring for that
you should avoid a wraparound freeze:

CAST(age(relfrozenxid) AS real) / CAST(least(autovacuum_freeze_max_age, 2000000000) AS real)
    AS perc_until_freeze_max_age


And ensure that value does not exceed 100%. It is important to note that max
freeze age can be set on a per-table basis, so to get the true
autovacuum_freeze_max_age of a table (or the real max of the database) you
would need to check the reloptions field of pg_class for that table and, only
if there is no value specified for '%autovacuum_freeze_max_age%', use
current_setting('autovacuum_freeze_max_age')
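
A rough sketch of how that per-table lookup could be combined into one query,
using pg_options_to_table() to unpack reloptions (the relkind filter and
ordering are illustrative assumptions, not our exact monitoring query):

SELECT c.oid::regclass AS table_name,
       CAST(age(c.relfrozenxid) AS real)
         / CAST(least(coalesce(
                   -- per-table reloptions override, if any
                   (SELECT option_value::real
                      FROM pg_options_to_table(c.reloptions)
                     WHERE option_name = 'autovacuum_freeze_max_age'),
                   -- otherwise fall back to the global setting
                   current_setting('autovacuum_freeze_max_age')::real),
                 2000000000) AS real)
         AS perc_until_freeze_max_age
  FROM pg_class c
 WHERE c.relkind = 'r'
 ORDER BY perc_until_freeze_max_age DESC;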


Let me know if I missed something or can clarify it further.

*Will J. Dunn*
*willjdunn.com *

On Thu, Jul 30, 2015 at 11:44 AM, Adrian Klaver 
wrote:

> On 07/30/2015 08:41 AM, Jan Keirse wrote:
>
>> On Thu, Jul 30, 2015 at 2:56 PM, Adrian Klaver
>>  wrote:
>>
>>> On 07/30/2015 02:55 AM, Jan Keirse wrote:
>>>

 Hello,

 we have some very write heavy databases and I have our monitoring
 system watch the transaction age of my databases to be alerted before
 we get into problems in case autovacuum can't keep up to avoid
 transaction ID wraparound.

 The query I am executing is this:
 SELECT max(age(pg_database.datfrozenxid)) / 2147483648.0 * 100.0 AS
 "Percentage of transaction ID's used" FROM pg_database;

 My believe was that if this reaches 100 the database will stop
 accepting writes and one must vacuum. I have set alerts on 50 and 90,
 the result is around 9 so my believe was autovacuum is working fine
 for my workload.
 I often see autovacuum kicking in to prevent XID Wraparround, I
 thought that was just to be on the safe side and vacuum well before
 it's too late.

 However today I saw this post:


 http://blog.getsentry.com/2015/07/23/transaction-id-wraparound-in-postgres.html

 The following line has me worried:
 ... that dat

Re: [GENERAL] Setting up HA postgresql

2015-07-23 Thread William Dunn
Thanks so much for lending your expertise Tatsuo.

Aviel I have found and linked below the relevant section of the official
pgpool-II docs on using pgpool-II for failover. It is available starting in
pgpool-II version 3.1:
http://www.pgpool.net/docs/latest/pgpool-en.html#stream

*Will J. Dunn*

On Tue, Jul 21, 2015 at 10:07 PM, Tatsuo Ishii  wrote:

> > But it appears that the fail condition for "watchdog" is the failure of a
> > pgpool-II instance. In the configuration described in the wiki you would
> > put a pgpool-II instance on each Postgres node, and if one of the
> pgpool-II
> > instances fails it executes a script (which can create the trigger file
> to
> > promote the standby to master). However, if the fail condition for
> watchdog
> > is a failure of the pgpool-II instance what happens if the DBMS has
> > availability issues but the pgpool-II process on that server is ok? The
> > fail condition would never be triggered, right?
>
> No. The condition (PostgreSQL failure) will be detected by one of
> pgpool-II (not watchdog process) and the pgpool-II will disconnect the
> failed PostgreSQL, then execute failover script (if defined) which
> will usually promote standby PostgreSQL (if failed PostgreSQL was
> primary) or just ignore the broken PostgreSQL if it was standby.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > *Will J. Dunn*
> > *willjdunn.com <http://willjdunn.com>*
> >
> > On Tue, Jul 21, 2015 at 4:27 PM, Joshua D. Drake 
> > wrote:
> >
> >>
> >> On 07/21/2015 01:21 PM, William Dunn wrote:
> >>
> >>> That's pretty cool! But the intended use of watchdog is so you can have
> >>> multiple pgpool-II instances and failover among them
> >>> (http://www.pgpool.net/docs/latest/pgpool-en.html#watchdog) rather
> than
> >>> failure of Postgres. In the configuration described in that wiki what
> >>> happens when the DBMS goes down but pgpool-II is fine? The watchdog
> >>> appears to be monitoring the pgpool-II process, not the
> >>> postgres/postmaster process.
> >>>
> >>
> >> From what I see, it has the ability to execute a command/script based
> on a
> >> failed condition which could include but not be limited to a command to
> >> fail over a standby.
> >>
> >>
> >> JD
> >>
> >> --
> >> Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
> >> PostgreSQL Centered full stack support, consulting and development.
> >> Announcing "I'm offended" is basically telling the world you can't
> >> control your own emotions, so everyone else should do it for you.
> >>
>


Re: [GENERAL] Setting up HA postgresql

2015-07-21 Thread William Dunn
Maybe Linux-HA, which you recommended, is the more promising open source
option:
http://www.linux-ha.org/wiki/Main_Page

The Postgres resource agent appears to monitor the instance by executing
'SELECT now();', which is typically the recommended check. Though, I do not
know Linux-HA well enough to know of any limitations or whether it should
be recommended:
http://linux-ha.org/doc/man-pages/re-ra-pgsql.html

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Tue, Jul 21, 2015 at 5:35 PM, William Dunn  wrote:

> On Tue, Jul 21, 2015 at 5:12 PM, Joshua D. Drake 
>  wrote:
>>
>>
>> "When backend node status changes by failover etc., watchdog notifies the
>> information to other pgpool-IIs and synchronizes them. When online recovery
>> occurs, watchdog restricts client connections to other pgpool-IIs for
>> avoiding inconsistency between backends."
>
>
> You're taking that out of context, though. That is coming from the doc
> http://www.pgpool.net/docs/latest/pgpool-en.html#watchdog, which is
> describing how Watchdog coordinates fault of *pgpool-II* backend
> processes and coordinates the pgpool-II instances with each other. It is
> for maintaining availability of pgpool-II and monitoring for failure of
> pgpool-II backends, not Postgres/postmaster.
>
> *Will J. Dunn*
> *willjdunn.com <http://willjdunn.com>*
>
> On Tue, Jul 21, 2015 at 5:12 PM, Joshua D. Drake 
> wrote:
>
>>
>> On 07/21/2015 01:37 PM, William Dunn wrote:
>>
>>> But it appears that the fail condition for "watchdog" is the failure of
>>> a pgpool-II instance. In the configuration described in the wiki you
>>> would put a pgpool-II instance on each Postgres node, and if one of the
>>> pgpool-II instances fails it executes a script (which can create the
>>> trigger file to promote the standby to master). However, if the fail
>>> condition for watchdog is a failure of the pgpool-II instance what
>>> happens if the DBMS has availability issues but the pgpool-II process on
>>> that server is ok? The fail condition would never be triggered, right?
>>>
>>
>> What I am going off of is:
>>
>> "When backend node status changes by failover etc., watchdog notifies the
>> information to other pgpool-IIs and synchronizes them. When online recovery
>> occurs, watchdog restricts client connections to other pgpool-IIs for
>> avoiding inconsistency between backends."
>>
>> A backend node status would change if say the master went down, at that
>> point pgpool can be instructed either automatically or manually to failover
>> (depending on configuration) and allow the slave to failover.
>>
>> Sincerely,
>>
>>
>> JD
>>
>> --
>> Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
>> PostgreSQL Centered full stack support, consulting and development.
>> Announcing "I'm offended" is basically telling the world you can't
>> control your own emotions, so everyone else should do it for you.
>>
>
>


Re: [GENERAL] Setting up HA postgresql

2015-07-21 Thread William Dunn
On Tue, Jul 21, 2015 at 5:12 PM, Joshua D. Drake 
 wrote:
>
>
> "When backend node status changes by failover etc., watchdog notifies the
> information to other pgpool-IIs and synchronizes them. When online recovery
> occurs, watchdog restricts client connections to other pgpool-IIs for
> avoiding inconsistency between backends."


You're taking that out of context, though. That is coming from the doc
http://www.pgpool.net/docs/latest/pgpool-en.html#watchdog, which is
describing how Watchdog coordinates fault of *pgpool-II* backend processes
and coordinates the pgpool-II instances with each other. It is for
maintaining availability of pgpool-II and monitoring for failure of
pgpool-II backends, not Postgres/postmaster.

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Tue, Jul 21, 2015 at 5:12 PM, Joshua D. Drake 
wrote:

>
> On 07/21/2015 01:37 PM, William Dunn wrote:
>
>> But it appears that the fail condition for "watchdog" is the failure of
>> a pgpool-II instance. In the configuration described in the wiki you
>> would put a pgpool-II instance on each Postgres node, and if one of the
>> pgpool-II instances fails it executes a script (which can create the
>> trigger file to promote the standby to master). However, if the fail
>> condition for watchdog is a failure of the pgpool-II instance what
>> happens if the DBMS has availability issues but the pgpool-II process on
>> that server is ok? The fail condition would never be triggered, right?
>>
>
> What I am going off of is:
>
> "When backend node status changes by failover etc., watchdog notifies the
> information to other pgpool-IIs and synchronizes them. When online recovery
> occurs, watchdog restricts client connections to other pgpool-IIs for
> avoiding inconsistency between backends."
>
> A backend node status would change if say the master went down, at that
> point pgpool can be instructed either automatically or manually to failover
> (depending on configuration) and allow the slave to failover.
>
> Sincerely,
>
>
> JD
>
> --
> Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
> Announcing "I'm offended" is basically telling the world you can't
> control your own emotions, so everyone else should do it for you.
>


Re: [GENERAL] Setting up HA postgresql

2015-07-21 Thread William Dunn
But it appears that the fail condition for "watchdog" is the failure of a
pgpool-II instance. In the configuration described in the wiki you would
put a pgpool-II instance on each Postgres node, and if one of the pgpool-II
instances fails it executes a script (which can create the trigger file to
promote the standby to master). However, if the fail condition for watchdog
is a failure of the pgpool-II instance what happens if the DBMS has
availability issues but the pgpool-II process on that server is ok? The
fail condition would never be triggered, right?

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Tue, Jul 21, 2015 at 4:27 PM, Joshua D. Drake 
wrote:

>
> On 07/21/2015 01:21 PM, William Dunn wrote:
>
>> That's pretty cool! But the intended use of watchdog is so you can have
>> multiple pgpool-II instances and failover among them
>> (http://www.pgpool.net/docs/latest/pgpool-en.html#watchdog) rather than
>> failure of Postgres. In the configuration described in that wiki what
>> happens when the DBMS goes down but pgpool-II is fine? The watchdog
>> appears to be monitoring the pgpool-II process, not the
>> postgres/postmaster process.
>>
>
> From what I see, it has the ability to execute a command/script based on a
> failed condition which could include but not be limited to a command to
> fail over a standby.
>
>
> JD
>
> --
> Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
> Announcing "I'm offended" is basically telling the world you can't
> control your own emotions, so everyone else should do it for you.
>


Re: [GENERAL] Setting up HA postgresql

2015-07-21 Thread William Dunn
That's pretty cool! But the intended use of watchdog is so you can have
multiple pgpool-II instances and failover among them (
http://www.pgpool.net/docs/latest/pgpool-en.html#watchdog) rather than
failure of Postgres. In the configuration described in that wiki what
happens when the DBMS goes down but pgpool-II is fine? The watchdog appears
to be monitoring the pgpool-II process, not the postgres/postmaster process.

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Tue, Jul 21, 2015 at 3:31 PM, Joshua D. Drake 
wrote:

>
> On 07/21/2015 11:04 AM, William Dunn wrote:
>
>> If you dig deeper into pgpool-II you will find that it does not have
>> failover logic. Its intention is to pool connections and distribute
>> query load among replicas, but it cannot differentiate node failure from
>> network partition and cannot promote a standby to master in the case of
>> failure.
>>
>
> If you dig deeper, I think you would find your statement is no longer
> accurate:
>
>
> http://www.pgpool.net/pgpool-web/contrib_docs/watchdog_master_slave_3.3/en.html
>
>
> JD
>
> --
> Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
> Announcing "I'm offended" is basically telling the world you can't
> control your own emotions, so everyone else should do it for you.
>


Re: [GENERAL] Promoting 1 of 2 slaves

2015-07-21 Thread William Dunn
As far as I am aware, you would have two options depending on your
configuration:

   1. Change the primary_conninfo value in the second standby's
   recovery.conf to point to the standby that has been promoted to master.
   However, I think this would require that instance to be restarted for the
   configuration change to take effect
   2. Modify the DNS entries so that the host value in primary_conninfo
   points to the new master rather than the old master

Does anyone know of another way?

*Will J. Dunn*
*willjdunn.com *

On Fri, Jul 17, 2015 at 9:43 AM, Joseph Kregloh 
wrote:

> Hello,
>
> I am getting ready to take my master server out of service today and
> promote the slave to take it's place. In my current setup I have one master
> server which ships WAL files to two separate slave servers.
>
> My plan is to promote one of those slaves. How do I get the slave that I
> promoted to start replicating to the other slave the moment I trigger it's
> promotion?
>
> Thanks,
> -Joseph Kregloh
>


Re: [GENERAL] Setting up HA postgresql

2015-07-21 Thread William Dunn
Implementing failover logic is quite simple, and probably even preferable to
a pre-built solution because you can implement it in a way that integrates
well with your architecture.

The basic logic is as follows:

   - On the standby server you configure a "trigger" file. When you create a
   trigger file with the name you specify, the standby will promote itself to
   master.
   - You have your queries sent to the master.
   - If your application starts failing to connect to the master, you have
   logic evaluate why.
   - When the master goes down you should:
      - Generate the trigger file to promote the standby to master
      - Modify your DNS entries to have your applications point to the new
      master
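
As a rough illustration of the trigger-file mechanism described above (the
file name and connection details are placeholders, not recommendations):

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replicator'
trigger_file = '/tmp/postgresql.trigger'

# shell command to promote the standby once the master is confirmed down
touch /tmp/postgresql.trigger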


*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Tue, Jul 21, 2015 at 12:20 PM, Aviel Buskila  wrote:

> hey will,
> Is there any open-source tool instead of developing the fail-over logic by
> myself?
>
> 2015-07-21 18:34 GMT+03:00 William Dunn :
>
>> Hello Aviel,
>>
>> On Tue, Jul 21, 2015 at 3:56 AM, Aviel Buskila  wrote:
>>>
>>> How can I set a highly available postgresql in a share-nothing
>>> architecture?
>>>
>>
>> I suggest you review the official documentation on high-availability
>> configurations linked below:
>> http://www.postgresql.org/docs/current/static/high-availability.html
>>
>> The most common configuration is to use PostgreSQL's built in
>> master/standby streaming replication. However you will need to develop any
>> fail-over logic you need yourself or use a third party tool such as
>> EnterpriseDB's Failover Manager.
>>
>> *Will J. Dunn*
>> *willjdunn.com <http://willjdunn.com>*
>>
>> On Tue, Jul 21, 2015 at 3:56 AM, Aviel Buskila  wrote:
>>
>>> Hey,
>>> How can I set a highly available postgresql in a share-nothing
>>> architecture?
>>>
>>>
>>>
>>> Best regards,
>>> Aviel B.
>>>
>>
>>


Re: [GENERAL] Setting up HA postgresql

2015-07-21 Thread William Dunn
If you dig deeper into pgpool-II you will find that it does not have
failover logic. Its intention is to pool connections and distribute query
load among replicas, but it cannot differentiate node failure from network
partition and cannot promote a standby to master in the case of failure.

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Tue, Jul 21, 2015 at 1:55 PM, Aviel Buskila  wrote:

> Can you link me up to a good tutorial using pgpool-II?
>
> 2015-07-21 20:02 GMT+03:00 Joshua D. Drake :
>
>>
>> On 07/21/2015 08:34 AM, William Dunn wrote:
>>
>>> Hello Aviel,
>>>
>>> On Tue, Jul 21, 2015 at 3:56 AM, Aviel Buskila >> <mailto:avie...@gmail.com>> wrote:
>>>
>>> How can I set a highly available postgresql in a share-nothing
>>> architecture?
>>>
>>> I suggest you review the official documentation on high-availability
>>> configurations linked below:
>>> http://www.postgresql.org/docs/current/static/high-availability.html
>>>
>>> The most common configuration is to use PostgreSQL's built in
>>> master/standby streaming replication. However you will need to develop
>>> any fail-over logic you need yourself or use a third party tool such as
>>> EnterpriseDB's Failover Manager.
>>>
>>
>> Or use already available open source tools such as Pgpool-II or Linux-HA.
>>
>> Sincerely,
>> JD
>>
>> --
>> Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
>> PostgreSQL Centered full stack support, consulting and development.
>> Announcing "I'm offended" is basically telling the world you can't
>> control your own emotions, so everyone else should do it for you.
>>
>
>


Re: [GENERAL] Setting up HA postgresql

2015-07-21 Thread William Dunn
Hello Aviel,

On Tue, Jul 21, 2015 at 3:56 AM, Aviel Buskila  wrote:
>
> How can I set a highly available postgresql in a share-nothing
> architecture?
>

I suggest you review the official documentation on high-availability
configurations linked below:
http://www.postgresql.org/docs/current/static/high-availability.html

The most common configuration is to use PostgreSQL's built in
master/standby streaming replication. However you will need to develop any
fail-over logic you need yourself or use a third party tool such as
EnterpriseDB's Failover Manager.

*Will J. Dunn*
*willjdunn.com *

On Tue, Jul 21, 2015 at 3:56 AM, Aviel Buskila  wrote:

> Hey,
> How can I set a highly available postgresql in a share-nothing
> architecture?
>
>
>
> Best regards,
> Aviel B.
>


Re: [GENERAL] Index Only Scan vs Cache

2015-07-14 Thread William Dunn
On Tue, Jul 14, 2015 at 2:19 PM, Marc Mamin  wrote:

>
> Any rule of the thumb with which number of pages per relation it is worth
> to start indexing ?


The code for the monitoring tool check_postgres uses table size larger than
5.12kb as a rule of thumb, expecting that for tables smaller than 5.12kb
the planner may choose a sequential scan instead because the table is so
small. check_postgres was written a long time ago though, so someone who is
more familiar with the optimizer may be able to provide a better estimate.

*Will J. Dunn*
*willjdunn.com *

On Tue, Jul 14, 2015 at 2:19 PM, Marc Mamin  wrote:

>
> >On 7/9/2015 12:41 PM, Tom Lane wrote:
> >> Andy Colson  writes:
> >>> My question is:  Will PG cache only the index (assuming it can always
> do
> >>> an Index Only Scan), or will it cache the table as well?
>
> I'm not sure that indexes on tiny tables are useful.
> They raise the options to consider by the query planner, which has its
> small cost too.
> I'd be interested on other opinions on this.
> Any rule of the thumb with which number of pages per relation it is worth
> to start indexing ?
>
> And still another question: I've have tiny static tables too, that never
> got analyzed.
> Can this fool the query planner in a negative way ?
>
> regards,
>
> Marc Mamin
>
> >> The table blocks would fall out of cache if they're never touched.
> >>
> >>regards, tom lane
> >>
> >>
> >
> >Sweet!  Thanks Tom.
> >
> >
> >--
> >Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> >To make changes to your subscription:
> >http://www.postgresql.org/mailpref/pgsql-general
> >
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Oracle to PostgreSQL Migration - Need Information

2015-07-08 Thread William Dunn
On Wed, Jul 8, 2015 at 5:20 PM, Joshua D. Drake 
 wrote:

>
> On 07/08/2015 12:47 PM, John McKown wrote:
>
>
>> ​Why are they converting?
>>
>> Would EnterpriseDB (a commercial version of PostgreSQL which has
>> extensions to make it a "drop in" replacement for Oracle) be a
>> possibility?
>> http://www.enterprisedb.com/solutions/oracle-compatibility-technology
>>
>
> Because EDB is expensive. Why go from one closed source solution to
> another when you can go to the best Open Source database and forgo all of
> that?
>
> Yes, EDB has some nice tools (no denying that) but porting from Oracle to
> PostgreSQL proper is not difficult in the least (although time consuming).


I would suggest refraining from dismissing EnterpriseDB's PostgreSQL Advanced
Server like that. It is not free like the community version of Postgres but
the cost pays for developer time spent adding the additional features which
make it capable of being a drop-in replacement of Oracle. For an
organization migrating off of Oracle paying for that would make a lot of
sense because of the developer time (and cost) saved by porting to that
rather than the additional effort of migration to community Postgres. It
also has the additional benefit of providing all the programming features
that their app developers have become used to when working with Oracle.

The EnterpriseDB developers are some of the most active contributors to the
community version of Postgres and the advancements made in the community
version are included in EnterpriseDB Advanced Server as well.

*Will J. Dunn*
*willjdunn.com *

On Wed, Jul 8, 2015 at 5:20 PM, Joshua D. Drake 
wrote:

>
> On 07/08/2015 12:47 PM, John McKown wrote:
>
>
>> ​Why are they converting?
>>
>> Would EnterpriseDB (a commercial version of PostgreSQL which has
>> extensions to make it a "drop in" replacement for Oracle) be a
>> possibility?
>> http://www.enterprisedb.com/solutions/oracle-compatibility-technology
>>
>
> Because EDB is expensive. Why go from one closed source solution to
> another when you can go to the best Open Source database and forgo all of
> that?
>
> Yes, EDB has some nice tools (no denying that) but porting from Oracle to
> PostgreSQL proper is not difficult in the least (although time consuming).
>
> Sincerely,
>
> JD
>
>
> --
> Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
> Announcing "I'm offended" is basically telling the world you can't
> control your own emotions, so everyone else should do it for you.
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] very slow queries and ineffective vacuum

2015-06-30 Thread William Dunn
Jerry,

On Tue, Jun 30, 2015 at 5:31 PM, Jerry Sievers 
 wrote:

>
> foodb/postgres
> =# \d pg_stat_activity|pg_prepared_xacts
> View "pg_catalog.pg_prepared_xacts"
>Column|   Type   | Modifiers
> -+--+---
>  transaction | xid  |
>  gid | text |
>  prepared| timestamp with time zone |
>  owner   | name |
>  database| name |
>
>View "pg_catalog.pg_stat_activity"
>   Column  |   Type   | Modifiers
> --+--+---
>  datid| oid  |
>  datname  | name |
>  pid  | integer  |
>  usesysid | oid  |
>  usename  | name |
>  application_name | text |
>  client_addr  | inet |
>  client_hostname  | text |
>  client_port  | integer  |
>  backend_start| timestamp with time zone |
>  xact_start   | timestamp with time zone |
>  query_start  | timestamp with time zone |
>  state_change | timestamp with time zone |
>  waiting  | boolean  |
>  state| text |
>  query| text |
>
> foodb/postgres
> =#


What exactly are you trying to tell us? If you want to provide someone
details about one of the system views, it is probably better to link them to
the official documentation, which lists not only the view's fields and their
datatypes but also their meaning, what they will be in their specific
Postgres version, and any additional notes the community deemed useful

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Tue, Jun 30, 2015 at 5:31 PM, Jerry Sievers 
wrote:

> William Dunn  writes:
>
> > Sorry I meant to say, "To track transactions that have been left idle
> but not committed or rolled back you would..."
> > Typo
>
>
> foodb/postgres
> =# \d pg_stat_activity|pg_prepared_xacts
> View "pg_catalog.pg_prepared_xacts"
>Column|   Type   | Modifiers
> -+--+---
>  transaction | xid  |
>  gid | text |
>  prepared| timestamp with time zone |
>  owner   | name |
>  database| name |
>
>View "pg_catalog.pg_stat_activity"
>   Column  |   Type   | Modifiers
> --+--+---
>  datid| oid  |
>  datname  | name |
>  pid  | integer  |
>  usesysid | oid  |
>  usename  | name |
>  application_name | text |
>  client_addr  | inet |
>  client_hostname  | text |
>  client_port  | integer  |
>  backend_start| timestamp with time zone |
>  xact_start   | timestamp with time zone |
>  query_start  | timestamp with time zone |
>  state_change | timestamp with time zone |
>  waiting  | boolean  |
>  state| text |
>  query| text |
>
> foodb/postgres
> =#
>
>
> >
> > Will J. Dunn
> > willjdunn.com
> >
> > On Tue, Jun 30, 2015 at 4:33 PM, William Dunn  wrote:
> >
> > On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
> lukasz.wro...@motorolasolutions.com> wrote:
> >
> > Perhaps I'm missing some indexes on the tables (creating them on
> the columns on which the where clause was used in the long queries seemed
> to halve their
> > times). Also how can I monitor my transactions and if they are
> closed properly?
> >
> > To track transactions that have not been left idle but not committed
> or rolled back you would:
> >
> > 1) Set track_activities true in the config (doc:
> http://www.postgresql.org/docs/current/static/runtime-config-statistics.html#GUC-TRACK-ACTIVITIES
> )
> > 2) Query the pg_stat_activity view for connections where state =
> 'idle in transaction' (doc:
> http://www.postgresql.org/docs/current/static/monitoring-stats.html#
> > PG-STAT-ACTIVITY-VIEW)
> >
> > As you would suspect, transactions that have

Re: [GENERAL] very slow queries and ineffective vacuum

2015-06-30 Thread William Dunn
Sorry I meant to say, "To track transactions that *have been* left idle but
not committed or rolled back you would..."
Typo

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Tue, Jun 30, 2015 at 4:33 PM, William Dunn  wrote:

> On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
> lukasz.wro...@motorolasolutions.com> wrote:
>>
>>
>> Perhaps I'm missing some indexes on the tables (creating them on the
>> columns on which the where clause was used in the long queries seemed to
>> halve their times). Also how can I monitor my transactions and if they are
>> closed properly?
>>
>
> To track transactions that have not been left idle but not committed or
> rolled back you would:
>
> 1) Set track_activities true in the config (doc:
> http://www.postgresql.org/docs/current/static/runtime-config-statistics.html#GUC-TRACK-ACTIVITIES
> )
> 2) Query the pg_stat_activity view for connections where state = 'idle in
> transaction' (doc:
> http://www.postgresql.org/docs/current/static/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW
> )
>
> As you would suspect, transactions that have been left "idle in
> transaction" prevent vacuum from removing old tuples (because they are
> still in scope for that transaction)
>
> *Will J. Dunn*
> *willjdunn.com <http://willjdunn.com>*
>
> On Tue, Jun 30, 2015 at 4:27 PM, William Dunn  wrote:
>
>> Hello Lukasz,
>>
>> On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
>> lukasz.wro...@motorolasolutions.com> wrote:
>>
>>>
>>> There doesn't seem to be any issues with disk space, memory or CPU, as
>>> neither of those is even 50% used (as per df and top).
>>>
>>
>> Are you using the default PostgreSQL configuration settings, or have you
>> custom tuned them? The default settings are targeted for wide compatibility
>> and are not optimized for performance. If PostgreSQL is performing badly
>> and using a small amount of system resources it is likely some tuning is
>> needed. See docs:
>> http://www.postgresql.org/docs/current/static/runtime-config.html
>>
>>
>> On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
>> lukasz.wro...@motorolasolutions.com> wrote:
>>
>>>
>>> For whatever reason there is also no data in pg_stat* tables.
>>>
>>
>> You can also turn on tracking (for statistics views) by enabling
>> statistics collection in the config
>> http://www.postgresql.org/docs/current/static/runtime-config-statistics.html
>>
>> *Will J. Dunn*
>> *willjdunn.com <http://willjdunn.com>*
>>
>> On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
>> lukasz.wro...@motorolasolutions.com> wrote:
>>
>>> Hello.
>>>
>>> I have multiple problems with my database, the biggest of which is how
>>> to find out what is actually wrong.
>>>
>>> First of all I have a 9.3 postgres database that is running for about a
>>> month. Right now the queries on that database are running very slowly
>>> (select with a simple "where" on a non-indexed column on a table with about
>>> 5000 records takes 1,5s, a complicated hibernate select with 7 joins on
>>> tables of about 5000 records takes about 15s, insert or update on a table
>>> with 35000 records takes up to 20 mins).
>>>
>>> The tables and indexes on those tables are bloated to the point where
>>> this query: https://wiki.postgresql.org/wiki/Show_database_bloat shows
>>> wasted bytes in hundreds of MB.
>>>
>>> For whatever reason there is also no data in pg_stat* tables.
>>>
>>> So due to the long query times, there are multiple errors in my
>>> application logs like "No free connection available" or "Could not
>>> synchronize database state with session", or "Failed to rollback
>>> transaction" and the application fails to start in the required time.
>>>
>>> The only thing that helps fix the situation seems to be vacuum full of
>>> the entire database. Regular vacuum doesn't even lower the dead tuples
>>> count (which appear by the thousands during application launching). Reindex
>>> of all the indexes in the database didn't help as well. All autovacuum
>>> parameters are default.
>>>
>>> There doesn't seem to be any issues with disk space, memory or CPU, as
>>> neither of those is even 50% used (as per df and top).
>>>
>>> Is there any good tool that will monitor the queries and generate a
>>> report with useful information on what might be the problem? I tried
>>> pg_badger, but all I got were specific queries and their times, but the
>>> long query times are just one of the symptoms of what's wrong with the
>>> database, not the cause.
>>>
>>> Perhaps I'm missing some indexes on the tables (creating them on the
>>> columns on which the where clause was used in the long queries seemed to
>>> halve their times). Also how can I monitor my transactions and if they are
>>> closed properly?
>>>
>>> I will be grateful for any help and if you need more details I can
>>> provide them if possible.
>>>
>>> Best regards.
>>> Lukasz
>>>
>>
>>
>


Re: [GENERAL] very slow queries and ineffective vacuum

2015-06-30 Thread William Dunn
On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
lukasz.wro...@motorolasolutions.com> wrote:
>
>
> Perhaps I'm missing some indexes on the tables (creating them on the
> columns on which the where clause was used in the long queries seemed to
> halve their times). Also how can I monitor my transactions and if they are
> closed properly?
>

To track transactions that have not been left idle but not committed or
rolled back you would:

1) Set track_activities true in the config (doc:
http://www.postgresql.org/docs/current/static/runtime-config-statistics.html#GUC-TRACK-ACTIVITIES
)
2) Query the pg_stat_activity view for connections where state = 'idle in
transaction' (doc:
http://www.postgresql.org/docs/current/static/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW
)

As you would suspect, transactions that have been left "idle in
transaction" prevent vacuum from removing old tuples (because they are
still in scope for that transaction)
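
A minimal example of such a check (the column list is just illustrative):

SELECT pid,
       usename,
       xact_start,
       state_change,
       query
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
 ORDER BY xact_start;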

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Tue, Jun 30, 2015 at 4:27 PM, William Dunn  wrote:

> Hello Lukasz,
>
> On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
> lukasz.wro...@motorolasolutions.com> wrote:
>
>>
>> There doesn't seem to be any issues with disk space, memory or CPU, as
>> neither of those is even 50% used (as per df and top).
>>
>
> Are you using the default PostgreSQL configuration settings, or have you
> custom tuned them? The default settings are targeted for wide compatibility
> and are not optimized for performance. If PostgreSQL is performing badly
> and using a small amount of system resources it is likely some tuning is
> needed. See docs:
> http://www.postgresql.org/docs/current/static/runtime-config.html
>
>
> On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
> lukasz.wro...@motorolasolutions.com> wrote:
>
>>
>> For whatever reason there is also no data in pg_stat* tables.
>>
>
> You can also turn on tracking (for statistics views) by enabling
> statistics collection in the config
> http://www.postgresql.org/docs/current/static/runtime-config-statistics.html
>
> *Will J. Dunn*
> *willjdunn.com <http://willjdunn.com>*
>
> On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
> lukasz.wro...@motorolasolutions.com> wrote:
>
>> Hello.
>>
>> I have multiple problems with my database, the biggest of which is how to
>> find out what is actually wrong.
>>
>> First of all I have a 9.3 postgres database that is running for about a
>> month. Right now the queries on that database are running very slowly
>> (select with a simple "where" on a non-indexed column on a table with about
>> 5000 records takes 1,5s, a complicated hibernate select with 7 joins on
>> tables of about 5000 records takes about 15s, insert or update on a table
>> with 35000 records takes up to 20 mins).
>>
>> The tables and indexes on those tables are bloated to the point where
>> this query: https://wiki.postgresql.org/wiki/Show_database_bloat shows
>> wasted bytes in hundreds of MB.
>>
>> For whatever reason there is also no data in pg_stat* tables.
>>
>> So due to the long query times, there are multiple errors in my
>> application logs like "No free connection available" or "Could not
>> synchronize database state with session", or "Failed to rollback
>> transaction" and the application fails to start in the required time.
>>
>> The only thing that helps fix the situation seems to be vacuum full of
>> the entire database. Regular vacuum doesn't even lower the dead tuples
>> count (which appear by the thousands during application launching). Reindex
>> of all the indexes in the database didn't help as well. All autovacuum
>> parameters are default.
>>
>> There doesn't seem to be any issues with disk space, memory or CPU, as
>> neither of those is even 50% used (as per df and top).
>>
>> Is there any good tool that will monitor the queries and generate a
>> report with useful information on what might be the problem? I tried
>> pg_badger, but all I got were specific queries and their times, but the
>> long query times are just one of the symptoms of what's wrong with the
>> database, not the cause.
>>
>> Perhaps I'm missing some indexes on the tables (creating them on the
>> columns on which the where clause was used in the long queries seemed to
>> halve their times). Also how can I monitor my transactions and if they are
>> closed properly?
>>
>> I will be grateful for any help and if you need more details I can
>> provide them if possible.
>>
>> Best regards.
>> Lukasz
>>
>
>


Re: [GENERAL] very slow queries and ineffective vacuum

2015-06-30 Thread William Dunn
Hello Lukasz,

On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
lukasz.wro...@motorolasolutions.com> wrote:

>
> There doesn't seem to be any issues with disk space, memory or CPU, as
> neither of those is even 50% used (as per df and top).
>

Are you using the default PostgreSQL configuration settings, or have you
custom tuned them? The default settings are targeted for wide compatibility
and are not optimized for performance. If PostgreSQL is performing badly
and using a small amount of system resources it is likely some tuning is
needed. See docs:
http://www.postgresql.org/docs/current/static/runtime-config.html


On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
lukasz.wro...@motorolasolutions.com> wrote:

>
> For whatever reason there is also no data in pg_stat* tables.
>

You can also turn on tracking (for statistics views) by enabling statistics
collection in the config
http://www.postgresql.org/docs/current/static/runtime-config-statistics.html

*Will J. Dunn*
*willjdunn.com *

On Tue, Jun 30, 2015 at 8:57 AM, Lukasz Wrobel <
lukasz.wro...@motorolasolutions.com> wrote:

> Hello.
>
> I have multiple problems with my database, the biggest of which is how to
> find out what is actually wrong.
>
> First of all I have a 9.3 postgres database that is running for about a
> month. Right now the queries on that database are running very slowly
> (select with a simple "where" on a non-indexed column on a table with about
> 5000 records takes 1,5s, a complicated hibernate select with 7 joins on
> tables of about 5000 records takes about 15s, insert or update on a table
> with 35000 records takes up to 20 mins).
>
> The tables and indexes on those tables are bloated to the point where this
> query: https://wiki.postgresql.org/wiki/Show_database_bloat shows wasted
> bytes in hundreds of MB.
>
> For whatever reason there is also no data in pg_stat* tables.
>
> So due to the long query times, there are multiple errors in my
> application logs like "No free connection available" or "Could not
> synchronize database state with session", or "Failed to rollback
> transaction" and the application fails to start in the required time.
>
> The only thing that helps fix the situation seems to be vacuum full of the
> entire database. Regular vacuum doesn't even lower the dead tuples count
> (which appear by the thousands during application launching). Reindex of
> all the indexes in the database didn't help as well. All autovacuum
> parameters are default.
>
> There doesn't seem to be any issues with disk space, memory or CPU, as
> neither of those is even 50% used (as per df and top).
>
> Is there any good tool that will monitor the queries and generate a report
> with useful information on what might be the problem? I tried pg_badger,
> but all I got were specific queries and their times, but the long query
> times are just one of the symptoms of what's wrong with the database, not
> the cause.
>
> Perhaps I'm missing some indexes on the tables (creating them on the
> columns on which the where clause was used in the long queries seemed to
> halve their times). Also how can I monitor my transactions and if they are
> closed properly?
>
> I will be grateful for any help and if you need more details I can provide
> them if possible.
>
> Best regards.
> Lukasz
>


Re: [GENERAL] foreign keys to foreign tables

2015-06-22 Thread William Dunn
Hello Rick,

As I understand it you are correct. Oracle, DB2, and Postgres (and, I believe,
the SQL standard) do not implement constraints against tables on foreign
servers. Although it would be possible to develop a DBMS to handle such
constraints in a heterogeneous distributed environment, it would be unwise
because of the poor performance and reliability of data sent over networks,
so DBMSs do not implement it. You would, as you suspected, have to use
so DBMSs do not implement it. You would, as you suspected, have to use
stored procedures to emulate some of the functionality of a foreign key but
definitely think twice about the performance bottlenecks you would
introduce. A more clever thing to do is use Slony, BDR, or triggers to
replicate the foreign table and create the constraint against the local
copy. In some other DBMSs the clever thing to do is create a materialized
view and constraints against the materialized view (which achieves the
same) but Postgres does not yet support such constraints against
materialized views.
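
If you do go the stored procedure/trigger route, a rough sketch of the
insert-side check could look like the following (keeping the table and column
names from the question; note it does nothing about rows later deleted on the
foreign server, which is part of why a real foreign key is not supported):

CREATE OR REPLACE FUNCTION check_some_column_id_exists() RETURNS trigger AS $$
BEGIN
    -- reject the row if the referenced id is not present in the foreign table
    IF NOT EXISTS (SELECT 1 FROM myforeigntable f
                    WHERE f.some_column_id = NEW.some_column_id) THEN
        RAISE EXCEPTION 'some_column_id % not found in myforeigntable',
            NEW.some_column_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER mytable_some_column_id_fk
    AFTER INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE check_some_column_id_exists();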

*Will J. Dunn*
*willjdunn.com *

On Mon, Jun 22, 2015 at 12:21 PM, Tom Lane  wrote:

> Rick Otten  writes:
> > Hello pgsql-general,
> > I'd like to set up a foreign key constraint to a foreign table from a
> local
> > table.
>
> > ie, I have a column in a local table that I'd like to ensure has a value
> in
> > the foreign table.
>
> > alter mytable
> > add column  some_column_id   uuid references
> myforeigntable(some_column_id)
> > ;
>
> > Unfortunately I get a "not a table" error when I try this.
>
> > ERROR:  referenced relation "myforeigntable" is not a table
>
> > I'm thinking I'll have to write a function that checks for existance of
> the
> > ids in the foreign table, and then put a CHECK constraint on using that
> > function, but I thought I'd as first if there was a better way.
>
> What's going to happen when the foreign server decides to delete some rows
> from its table?
>
> regards, tom lane
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] pg_class.reltuples VS pg_stat_all_tables.n_live_tup for estimation of table

2015-06-17 Thread William Dunn
Thanks so much Tom!

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Wed, Jun 17, 2015 at 3:48 PM, Tom Lane  wrote:

> William Dunn  writes:
> > Does anyone which is a more accurate estimate of a table's live
> > rows: pg_class.reltuples (
> > http://www.postgresql.org/docs/current/static/catalog-pg-class.html)
> > OR pg_stat_all_tables.n_live_tup (
> >
> http://www.postgresql.org/docs/current/static/monitoring-stats.html#PG-STAT-ALL-TABLES-VIEW
> )?
>
> They're both inaccurate with different sources of inaccuracy.  I dunno
> that you should assume that one is necessarily better than the other.
>
> >- Is pg_class.reltuples an estimation of live tuples only, or is it of
> >all tuples (both live and dead)? I would guess it's live only but
> that is a
> >guess
>
> Hm.  I'm pretty sure the planner takes it as counting live tuples only,
> but it looks like VACUUM thinks it includes recently-dead-but-not-yet-
> removable tuples.  We might need to do some adjustment there.
>
> regards, tom lane
>


[GENERAL] pg_class.reltuples VS pg_stat_all_tables.n_live_tup for estimation of table

2015-06-17 Thread William Dunn
Hello,

Does anyone which is a more accurate estimate of a table's live
rows: pg_class.reltuples (
http://www.postgresql.org/docs/current/static/catalog-pg-class.html)
OR pg_stat_all_tables.n_live_tup (
http://www.postgresql.org/docs/current/static/monitoring-stats.html#PG-STAT-ALL-TABLES-VIEW)?
In my test database their values are not the same.

   - Is pg_class.reltuples an estimation of live tuples only, or is it of
   all tuples (both live and dead)? I would guess it's live only but that is a
   guess
   - In the database I am testing:
   - pg_class.reltuples <> pg_stat_all_tables.n_live_tup
  - pg_class.reltuples <> (pg_stat_all_tables.n_live_tup
  + pg_stat_all_tables.n_dead_tup)


I understand that pg_stat_all_tables is probably the more expensive view to
query but I am already querying both so I only care which is more accurate.
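
(A simple way to see the two estimates side by side is a query along these
lines, shown here only as a sketch:

SELECT c.relname,
       c.reltuples,
       s.n_live_tup,
       s.n_dead_tup
  FROM pg_class c
  JOIN pg_stat_all_tables s ON s.relid = c.oid
 ORDER BY c.relname;
)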


Thanks!
Will

*Will J. Dunn*
*willjdunn.com *


Re: [GENERAL] Indexed views like SQL Server - NOT Materialized Views

2015-06-09 Thread William Dunn
Though I'm sure you've already looked into it, for your specific issue of
getting row counts:
- In PostgreSQL 9.2 and above this operation can be made much faster with
index-only scans, so ensure you are on a recent version and do your count on
an indexed column of a candidate key (for example, the primary key)
- An approximate row count is maintained in pg_stat_user_tables; if an
approximate value is acceptable you can obtain one there very quickly (see
the sketch below)
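
For illustration, both approaches side by side (the "topics"/"topic_id" names
are hypothetical, loosely based on the example in your post):

-- exact count; on 9.2+ this can use an index-only scan on the primary key
SELECT count(topic_id) FROM topics;

-- approximate count maintained by the statistics collector
SELECT n_live_tup FROM pg_stat_user_tables WHERE relname = 'topics';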

As for PostgreSQL implementing Microsoft SQL Server features:
In general, Microsoft SQL Server is famous for its lack of standards
compliance, while PostgreSQL is famously ANSI/ISO standards compliant. If a
SQL Server non-standard feature is not adopted by Oracle and/or DB2 and/or
the standards it is unlikely PostgreSQL will adopt it unless the feature is
very highly desired or a contributor has a deep interest. However it is
more likely for non-standard features to be implemented as a PostgreSQL
plug-in.
On Jun 9, 2015 7:28 PM, "inspector morse" 
wrote:

> SQL Server has a feature called Indexed Views that are similiar to
> materialized views.
>
> Basically, the Indexed View supports COUNT/SUM aggregate queries. You
> create a unique index on the Indexed View and SQL Server automatically
> keeps the COUNT/SUM upto date.
>
> Example:
> CREATE VIEW ForumTopicCounts
> AS
> SELECT  ForumId, COUNT_BIG(*) AS TopicsCount
> FROM Topics
> GROUP BY ForumId
>
> CREATE UNIQUE CLUSTERED INDEX idx ON ForumTopicCounts(ForumId);
>
> After doing that, if you add or delete a topic from the Topics Table, SQL
> Server automatically keeps the count updated.and it's fast because of
> the unique index.
>
>
> Doing the same thing in Postgresql using Materialized views is slow and
> the developer has to manually issue a "refresh materialized view" command.
> The alternative is to write additional sql to update count
> columnsuneccessary work.
>
>
> Do you know when Postgresql will implement such a feature? Counting is
> already slow in Postgresql, adding similiar feature like SQL Server will
> really help.
>


Re: [GENERAL] Database designpattern - product feature

2015-06-04 Thread William Dunn
Hello Adrian,

Sorry for the late response. I've not used Django so I can't provide
specific guidance on how you would implement it with that but in SQLAlchemy
that model is called "joined-table-inheritance":
http://docs.sqlalchemy.org/en/latest/orm/inheritance.html#joined-table-inheritance

That is also the behavior of Hibernate (Java) when using "table per
subclass" mapping.

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Wed, Jun 3, 2015 at 2:50 AM, Adrian Stern 
wrote:

> Hi William, thanks for joining the conversation.
>
> 1) We do hope for constraints since a connection to an ERP system is
> possible in the future. We want to plan ahead.
>
> 2) As for the subclass approach: I would need about 30 subclasses and it
> will get really hard to add new products since a change in the database
> will be necessary each time. That's why we want a more generic approach.
>
> Maybe I don't understand you right, because of the language barrier. Can
> you provide me a link to a subclassing example?
> ->
> https://docs.djangoproject.com/en/1.8/topics/db/models/#multi-table-inheritance
> ?
>
> ORM is a given in my case. This is not a high performance application.
>
> Freundliche Grüsse
>
> Adrian Stern
> unchained - web solutions
>
> adrian.st...@unchained.ch
> +41 79 292 83 47
>
> On Tue, Jun 2, 2015 at 5:35 PM, William Dunn  wrote:
>
>> Hello Adrian,
>>
>> May I ask why you need a non-standard model? By standard models I mean
>> the following:
>>
>> 1) When you don't need to have subclass specific database constraints:
>> All subclasses in the same table, subclasses that do not have an attribute
>> have that column null. This has the best performance because no joins are
>> required when querying both superclass and subclass attributes, and all the
>> data for an object will be in the same block on disk. The disadvantage of
>> this is that you cannot enforce constraints, such as not-null, on subclass
>> specific attributes columns because the constraints would also be applied
>> to the superclasses. If you can ensure that your application is the only
>> way data can be inserted those constraints will naturally be enforced there.
>>
>> 2) When you need subclass specific database constraints: Use an ORM such
>> as Django's ORM or SQLAlchemy which has one table with the superclass
>> attributes and a table for each subclass with their subclass specific
>> attributes. This is slower because joins will be needed and the data for an
>> object will be in 2 different blocks of disk but it allows you to enforce
>> constraints within the database which will be checked whenever any
>> application tries to insert values. There is a lot of complexity added
>> because there will be so many small tables and indexes but the ORM takes
>> care of that for you.
>>
>> *Will J. Dunn*
>> *willjdunn.com <http://willjdunn.com>*
>>
>> On Mon, Jun 1, 2015 at 10:35 AM, Adrian Stern 
>> wrote:
>>
>>> Hi, I'm new
>>>
>>> I've been working as the sole administrator of various postgresql
>>> projects for a while now. All of which where django projects.
>>> Since a new project is starting and we've found the need for a more
>>> generic approach I would like to ask a few questions.
>>>
>>> I would like to implement a pattern similar to the product feature
>>> pattern explained in the silverstone book - the data model resource book
>>> vol 1. It is simply explained. There is a Table PRODUCT holding the fields
>>> all the products share, then there is the table PRODUCT_FEATURE, both of
>>> them in a “many to many“ relationship.
>>>
>>> PRODUCT <--- m  n ---> PRODUCT_FEATURE (a table in between of
>>> course)
>>>
>>> PRODUCT_FEATURE --> PF
>>> PRODUCT --> P
>>> TABLE IN BETWEEN --> TIB
>>>
>>> PF defines the feature Type while P stands for the product the feature
>>> is applied to. Some of these PF can have values of different types (text,
>>> numbers, floating, blob, ...) which would be applied to TIB.
>>>
>>> I don't like the idea of having numerous empty fields prepared in TIB,
>>> just to store occasional values of different types, therefore I need to
>>> specialize those TIB Values.
>>>
>>> Now how would I do That?
>>>
>>> I could create some tables solely for the means of holding [NUM],
>>> [TEXT], [BLOB], [ETC] and reference them with the TIB PK. When using them I
>>> cou

Re: [GENERAL] Database designpattern - product feature

2015-06-02 Thread William Dunn
Hello Adrian,

May I ask why you need a non-standard model? By standard models I mean the
following:

1) When you don't need to have subclass specific database constraints: All
subclasses in the same table, subclasses that do not have an attribute have
that column null. This has the best performance because no joins are
required when querying both superclass and subclass attributes, and all the
data for an object will be in the same block on disk. The disadvantage of
this is that you cannot enforce constraints, such as not-null, on subclass
specific attributes columns because the constraints would also be applied
to the superclasses. If you can ensure that your application is the only
way data can be inserted those constraints will naturally be enforced there.

2) When you need subclass specific database constraints: Use an ORM such as
Django's ORM or SQLAlchemy which has one table with the superclass
attributes and a table for each subclass with their subclass specific
attributes. This is slower because joins will be needed and the data for an
object will be in 2 different blocks of disk but it allows you to enforce
constraints within the database which will be checked whenever any
application tries to insert values. There is a lot of complexity added
because there will be so many small tables and indexes but the ORM takes
care of that for you.
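
As a rough SQL-level sketch of option 2 (the table and column names are made
up purely for illustration):

CREATE TABLE product (
    product_id serial PRIMARY KEY,
    name       text NOT NULL        -- superclass attributes shared by all products
);

CREATE TABLE software_product (
    product_id  integer PRIMARY KEY REFERENCES product (product_id),
    license_key text NOT NULL       -- subclass-specific attribute with its own constraint
);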

*Will J. Dunn*
*willjdunn.com *

On Mon, Jun 1, 2015 at 10:35 AM, Adrian Stern 
wrote:

> Hi, I'm new
>
> I've been working as the sole administrator of various postgresql projects
> for a while now. All of which where django projects.
> Since a new project is starting and we've found the need for a more
> generic approach I would like to ask a few questions.
>
> I would like to implement a pattern similar to the product feature pattern
> explained in the silverstone book - the data model resource book vol 1. It
> is simply explained. There is a Table PRODUCT holding the fields all the
> products share, then there is the table PRODUCT_FEATURE, both of them in a
> “many to many“ relationship.
>
> PRODUCT <--- m  n ---> PRODUCT_FEATURE (a table in between of
> course)
>
> PRODUCT_FEATURE --> PF
> PRODUCT --> P
> TABLE IN BETWEEN --> TIB
>
> PF defines the feature Type while P stands for the product the feature is
> applied to. Some of these PF can have values of different types (text,
> numbers, floating, blob, ...) which would be applied to TIB.
>
> I don't like the idea of having numerous empty fields prepared in TIB,
> just to store occasional values of different types, therefore I need to
> specialize those TIB Values.
>
> Now how would I do That?
>
> I could create some tables solely for the means of holding [NUM], [TEXT],
> [BLOB], [ETC] and reference them with the TIB PK. When using them I could
> create a view TIBV containing all of [NUM, TEXT, BLOB, ETC] in the same
> column called Value, and join it with TIB to get the value of a PF.
>
> But is this a good idea?
> Is there a better way?
>
> Also, I would have to create a pivot table in order to list all the
> products with all the features. As this is implemented in C (afaik) I
> suppose it is rather fast or at least fast enough, but I do not actually
> know. What I know is, there are about 30 Product Types and around 50
> possible product features. One product can have up to approximately 25 PF
> but are mostly around 5 to 10.
>
> Do you think a pivot table is a good idea?
> What alternative do i have?
>
> There is room for caching since the dataset is not updated too often.
>
> regards, adrian
>


Re: [GENERAL] odbc to emulate mysql for end programs

2015-06-01 Thread William Dunn
Hello,

PostgreSQL has a fully standards-compliant ODBC driver (see:
https://odbc.postgresql.org/). Any application designed to communicate with
a DBMS over an ODBC connection should be able to use that driver to
communicate with PostgreSQL. Most applications that interact with databases
come with ODBC drivers pre-installed for the most common databases (MySQL,
Microsoft SQL Server, Oracle, etc.) but also allow you to set up an ODBC
driver for another DBMS.
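
For example, a minimal odbc.ini entry for the psqlODBC driver might look
like the following (host, database, and credentials are placeholders, and
exact key names can vary slightly between driver versions):

[myapp_pg]
Driver     = PostgreSQL Unicode
Servername = db.example.com
Port       = 5432
Database   = appdb
Username   = appuser
Password   = secret

The application would then be pointed at the 'myapp_pg' DSN. Note that this
only makes the application talk ODBC to PostgreSQL; it does not make
PostgreSQL pretend to be MySQL, so the application still has to issue SQL
that PostgreSQL understands.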


*Will J. Dunn*
*willjdunn.com *

On Mon, Jun 1, 2015 at 3:04 PM, Mimiko  wrote:

> Hello.
>
> I have several programs which can use mysql,access,oracle DB as a
> front-end DB via odbc. Is there a method to emulate or hide the back-end DB
> so for program it will be seen as mysql, but real DB will be on postgres?
>
> Thank you.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Is my standby fully connected?

2015-06-01 Thread William Dunn
In 9.1+ you can monitor the state of your slave easily with the state
field of pg_stat_replication on the master (note that the process ID column
is named procpid in 9.1 and pid in 9.2+):
SELECT pid,
       usesysid,
       usename,
       client_addr,
       client_port,
       state
FROM pg_stat_replication;

If the standby is not connected at all the query will not return any row
for that standby.
If the standby is connected, possible values of state are streaming (the
ideal state), catchup (the standby has fallen behind), startup, and backup.
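
To see how far behind an asynchronous standby is, a sketch like the
following can be used (pg_xlog_location_diff requires 9.2+; where to run
each query is noted in the comments):

-- on the master: bytes of WAL not yet replayed by each connected standby
SELECT client_addr,
       state,
       pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
FROM pg_stat_replication;

-- on the standby (9.1+): approximate replay delay as an interval
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;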

*Will J. Dunn*
*willjdunn.com *

On Mon, Jun 1, 2015 at 2:13 PM, Guillaume Lelarge 
wrote:

> Please, keep the list posted.
> 2015-06-01 20:09 GMT+02:00 Ivann Ruiz :
>
>> I get 1 row, a walreceiver and my slave IP,
>>
>>
> Then all is good.
>
>
>> i was expecting something like:
>>
>>  pg_current_xlog_location
>> --
>>  0/200
>> (1 row)
>>
>>
>> When I ran ->   psql -c "SELECT pg_current_xlog_location()"
>> -h192.168.0.10 (primary host)
>>
>
> Well, you can't execute it on a slave. You can use
> pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on a
> slave.
>
>
>>
>> On Mon, Jun 1, 2015 at 2:04 PM, Guillaume Lelarge > > wrote:
>>
>>> Hi,
>>>
>>> 2015-06-01 19:57 GMT+02:00 Ivann Ruiz :
>>>
 I just setup my databases and got this LOG:

 LOG:  database system was shut down in recovery at 2015-06-01 13:31:21
 EDT
 LOG:  entering standby mode
 LOG:  redo starts at 0/67000278
 LOG:  consistent recovery state reached at 0/67000350
 LOG:  record with zero length at 0/67000350
 LOG:  database system is ready to accept read only connections
 LOG:  started streaming WAL from primary at 0/6700 on timeline 1


>>> Sounds good to me.
>>>
>>>
 And it always stays like this, i never get a message like:
 LOG:   streaming replication successfully connected to primary

 Additionally, i cannot run queries such as:
 psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10

 ...to calculate replication lag, I get the following message:
 ERROR: recovery is in progress
 HINT: WAL control functions cannot be executed during recovery

 Can someone please tell me what is going on?

>>>
>>> This is just what it's supposed to be. And it tells you exactly that.
>>> You cannot use pg_current_xlog_location() on a slave.
>>>
>>> Check pg_stat_replication view on the master.
>>>
>>> --
>>> Guillaume.
>>>   http://blog.guillaume.lelarge.info
>>>   http://www.dalibo.com
>>>
>>
>>
>
>
> --
> Guillaume.
>   http://blog.guillaume.lelarge.info
>   http://www.dalibo.com
>


Re: [GENERAL] Can we simulate Oracle Flashback with pg_export_snapshot()?

2015-05-26 Thread William Dunn
Hello Laurenz,

Thanks for your feedback. When I said that the same overhead occurs in
Oracle I was referring to bloat in the UNDO logs, which, similar to
PostgreSQL dead tuples, has some impact on size/performance and is the main
trade-off considered when deciding whether to implement Flashback.

Thank you for pointing out HOT updates! I had not read about them before,
as they don't seem to be covered in the official documentation. I found
info on them in a README in the source code (
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD)
and hope to read through it soon.

Compared to a lagging replica, Flashback has the benefit of being
transparent to the user (the user can query over the same database
connection, etc.), and it does not incur the full cost of having a replica...

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Fri, May 22, 2015 at 4:50 AM, Albe Laurenz 
wrote:

> William Dunn wrote:
> > Just had an idea and could use some feedback. If we start a transaction,
> leave it idle, and use
> > pg_export_snapshot() to get its snapshot_id MVCC will hold all the
> tuples as of that transaction's
> > start and any other transaction can see the state of the database as of
> that time using SET
> > TRANSACTION SNAPSHOT snapshot_id?
> >
> >
> http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION
> >
> >
> > I'm thinking of setting up automation to ~every half hour open a
> transaction as such, close any that
> > have been open over an hour, and store the snapshot_id. However, I don't
> have a busy system that I can
> > test it on.
> >
> > Of course this would cause some extra bloat because those tuples cannot
> get autovacuumed until the
> > transaction closes but that is also the case in Oracle. Is there
> anything else I am missing or a
> > reason that this would not be possible?
>
> Oracle does not have that issue because modifications cannot bloat the
> table (the bloat is in
> what is called the "UNDO tablespace").
>
> What you suggest would allow you to look at the data as they were at
> specific times (of the snapshots).
> But the price on a busy system where data are modified would be
> considerable; your tables might
> become quite bloated, and you could not use HOT updates.
>
> If you want to look into the past, wouldn't it be much more useful to have
> a standby server
> that is lagging behind?  There is an option for that
> (recovery_min_apply_delay) from
> PostgreSQL 9.4 on.
>
> Yours,
> Laurenz Albe
>


Re: [GENERAL] Queries for unused/useless indexes

2015-05-26 Thread William Dunn
The query I previously sent was table level. Here is an index level one:
SELECT pg_stat_user_indexes.schemaname,
   pg_stat_user_indexes.relname,
   pg_stat_user_indexes.indexrelid,
   pg_stat_user_indexes.indexrelname,
   pg_stat_user_indexes.idx_scan,
   pg_stat_user_tables.seq_scan,
   (100 * pg_stat_user_indexes.idx_scan / (pg_stat_user_tables.seq_scan
+ pg_stat_user_indexes.idx_scan)) AS perc_idx_used
FROM pg_stat_user_indexes INNER JOIN pg_stat_user_tables ON
pg_stat_user_indexes.relid = pg_stat_user_tables.relid
WHERE pg_relation_size(pg_stat_user_indexes.relid)>(5*8192)
  AND NOT ((pg_stat_user_indexes.idx_scan=0
OR pg_stat_user_indexes.idx_scan IS NULL)
   AND pg_stat_user_tables.seq_scan=0)
ORDER BY perc_idx_used;

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Tue, May 26, 2015 at 10:31 AM, William Dunn  wrote:

> Melvin - thanks for sharing.
>
> Here is the query I use which lists the percent of queries against the
> table which use the index ordered by least used first.
>
> The 'pg_relation_size(relid)>(5*8192)' is used to remove any tables that
> would be so small the optimizer would just choose a table scan.
>
> SELECT schemaname,
>relname,
>idx_scan,
>seq_scan,
>(100 * idx_scan / (seq_scan + idx_scan)) AS perc_idx_used
> FROM pg_stat_user_tables
> WHERE pg_relation_size(relid)>(5*8192)
>   AND NOT ((idx_scan=0
> OR idx_scan IS NULL)
>AND seq_scan=0)
> ORDER BY perc_idx_used;
>
> *Will J. Dunn*
> *willjdunn.com <http://willjdunn.com>*
>
> On Mon, May 25, 2015 at 12:39 PM, Peter J. Holzer  wrote:
>
>> On 2015-05-25 12:25:01 -0400, Melvin Davidson wrote:
>> > I'm not sure why you are using "pg_stat_user_indexes".
>>
>> Because you did. I didn't change that.
>>
>> > My original query below
>> > uses "pg_stat_all_indexes" and the schema names are joined and it does
>> work.
>>
>> I'm not sure what you mean by "original", but this:
>>
>> > SELECT n.nspname as schema,
>> >i.relname as table,
>> >i.indexrelname as index,
>> >i.idx_scan,
>> >i.idx_tup_read,
>> >i.idx_tup_fetch,
>> >pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
>> > quote_ident(i.relname))) AS table_size,
>> >pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
>> > quote_ident(i.indexrelname))) AS index_size,
>> >pg_get_indexdef(idx.indexrelid) as idx_definition
>> >   FROM pg_stat_all_indexes i
>> >   JOIN pg_class c ON (c.oid = i.relid)
>> >   JOIN pg_namespace n ON (n.oid = c.relnamespace)
>> >   JOIN pg_index idx ON (idx.indexrelid =  i.indexrelid )
>> >  WHERE i.idx_scan < 200
>> >AND NOT idx.indisprimary
>> >AND NOT idx.indisunique
>> >  ORDER BY 1, 2, 3;
>>
>> is not the query you posted in your original message.
>>
>> Here is what you posted:
>>
>> > On Mon, May 25, 2015 at 10:41 AM, Peter J. Holzer 
>> wrote:
>> >
>> > On 2015-05-22 09:41:57 -0400, Melvin Davidson wrote:
>> > > I'd like to share those queries with the community, as I know
>> there must
>> > be
>> > > others out there with the same problem.
>> > >
>> > > /* useless_indexes.sql */
>> > > SELECT
>> > >idstat.schemaname AS schema,
>> > >idstat.relname AS table_name,
>> > >indexrelname AS index_name,
>> > >idstat.idx_scan AS times_used,
>> > >
>> pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
>> > '.' ||
>> > > quote_ident(idstat.relname))) AS table_size,
>> > >
>> pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
>> > '.' ||
>> > > quote_ident(indexrelname))) AS index_size,
>> > >n_tup_upd + n_tup_ins + n_tup_del as num_writes,
>> > >indexdef AS definition
>> > > FROM pg_stat_user_indexes AS idstat
>> > > JOIN pg_indexes ON indexrelname = indexname
>> > > JOIN pg_stat_user_tables AS tabstat ON idstat.relname =
>> tabstat.relname
>> > > WHERE idstat.idx_scan < 200
>> > > AND indexdef !~* 'unique'
>> > > ORDER BY idstat.schemaname,
>> > >  idstat.relname,
>> > >  indexrelname;
>>
>> --
>>_  | Peter J. Holzer| I want to forget all about both belts and
>> |_|_) || suspenders; instead, I want to buy pants
>> | |   | h...@hjp.at | that actually fit.
>> __/   | http://www.hjp.at/ |   -- http://noncombatant.org/
>>
>
>


Re: [GENERAL] Queries for unused/useless indexes

2015-05-26 Thread William Dunn
Melvin - thanks for sharing.

Here is the query I use which lists the percent of queries against the
table which use the index ordered by least used first.

The 'pg_relation_size(relid)>(5*8192)' is used to remove any tables that
would be so small the optimizer would just choose a table scan.

SELECT schemaname,
   relname,
   idx_scan,
   seq_scan,
   (100 * idx_scan / (seq_scan + idx_scan)) AS perc_idx_used
FROM pg_stat_user_tables
WHERE pg_relation_size(relid)>(5*8192)
  AND NOT ((idx_scan=0
OR idx_scan IS NULL)
   AND seq_scan=0)
ORDER BY perc_idx_used;

*Will J. Dunn*
*willjdunn.com *

On Mon, May 25, 2015 at 12:39 PM, Peter J. Holzer  wrote:

> On 2015-05-25 12:25:01 -0400, Melvin Davidson wrote:
> > I'm not sure why you are using "pg_stat_user_indexes".
>
> Because you did. I didn't change that.
>
> > My original query below
> > uses "pg_stat_all_indexes" and the schema names are joined and it does
> work.
>
> I'm not sure what you mean by "original", but this:
>
> > SELECT n.nspname as schema,
> >i.relname as table,
> >i.indexrelname as index,
> >i.idx_scan,
> >i.idx_tup_read,
> >i.idx_tup_fetch,
> >pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
> > quote_ident(i.relname))) AS table_size,
> >pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
> > quote_ident(i.indexrelname))) AS index_size,
> >pg_get_indexdef(idx.indexrelid) as idx_definition
> >   FROM pg_stat_all_indexes i
> >   JOIN pg_class c ON (c.oid = i.relid)
> >   JOIN pg_namespace n ON (n.oid = c.relnamespace)
> >   JOIN pg_index idx ON (idx.indexrelid =  i.indexrelid )
> >  WHERE i.idx_scan < 200
> >AND NOT idx.indisprimary
> >AND NOT idx.indisunique
> >  ORDER BY 1, 2, 3;
>
> is not the query you posted in your original message.
>
> Here is what you posted:
>
> > On Mon, May 25, 2015 at 10:41 AM, Peter J. Holzer 
> wrote:
> >
> > On 2015-05-22 09:41:57 -0400, Melvin Davidson wrote:
> > > I'd like to share those queries with the community, as I know
> there must
> > be
> > > others out there with the same problem.
> > >
> > > /* useless_indexes.sql */
> > > SELECT
> > >idstat.schemaname AS schema,
> > >idstat.relname AS table_name,
> > >indexrelname AS index_name,
> > >idstat.idx_scan AS times_used,
> > >
> pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
> > '.' ||
> > > quote_ident(idstat.relname))) AS table_size,
> > >
> pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
> > '.' ||
> > > quote_ident(indexrelname))) AS index_size,
> > >n_tup_upd + n_tup_ins + n_tup_del as num_writes,
> > >indexdef AS definition
> > > FROM pg_stat_user_indexes AS idstat
> > > JOIN pg_indexes ON indexrelname = indexname
> > > JOIN pg_stat_user_tables AS tabstat ON idstat.relname =
> tabstat.relname
> > > WHERE idstat.idx_scan < 200
> > > AND indexdef !~* 'unique'
> > > ORDER BY idstat.schemaname,
> > >  idstat.relname,
> > >  indexrelname;
>
> --
>_  | Peter J. Holzer| I want to forget all about both belts and
> |_|_) || suspenders; instead, I want to buy pants
> | |   | h...@hjp.at | that actually fit.
> __/   | http://www.hjp.at/ |   -- http://noncombatant.org/
>


[GENERAL] Can we simulate Oracle Flashback with pg_export_snapshot()?

2015-05-20 Thread William Dunn
Hello,

Just had an idea and could use some feedback. If we start a transaction,
leave it idle, and use pg_export_snapshot() to get its snapshot_id MVCC
will hold all the tuples as of that transaction's start and any other
transaction can see the state of the database as of that time using SET
TRANSACTION SNAPSHOT snapshot_id?

http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION

I'm thinking of setting up automation to ~every half hour open a
transaction as such, close any that have been open over an hour, and store
the snapshot_id. However, I don't have a busy system that I can test it on.

Of course this would cause some extra bloat because those tuples cannot get
autovacuumed until the transaction closes but that is also the case in
Oracle. Is there anything else I am missing or a reason that this would not
be possible?
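
For reference, a minimal sketch of the mechanism (the snapshot ID and table
name are placeholders; pg_export_snapshot() returns the real ID):

-- session 1: open a transaction, export its snapshot, and leave it idle
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns something like '000003A1-1'

-- session 2: later, query the database as of that snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';
SELECT count(*) FROM some_table;   -- sees the data as of the exported snapshot
COMMIT;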

Thanks!

*Will J. Dunn*
*willjdunn.com *


Re: [GENERAL] Optimizing a read-only database

2015-05-18 Thread William Dunn
On Mon, May 18, 2015 at 10:54 AM, François Battail <
francois.batt...@sipibox.fr> wrote:

> Le 18/05/2015 16:38, William Dunn a écrit :
>
>   * You can also run a CLUSTER command on one of your indexes to group
>> data that is frequently accessed together into the same segment of
>> disk so you can get more of it in a single IO operation.
>>
>
> Hum... I was planning to put indexes and data on different disks (SSD) /
> controller to maximize bandwith use, am I wrong?
>

Hello François - the CLUSTER command doesn't have to do with where your
indexes are. What the CLUSTER command does is physically sort the table
data based on the index (Doc:
http://www.postgresql.org/docs/devel/static/sql-cluster.html). So for
example if you have a column called 'region_code' with an index and CLUSTER
the table by that index all the rows for 'region_code'=15 will be located
next to each other on disk and can be read in the same IO operation. The
usual disadvantage of CLUSTER is that it does not maintain itself, but
since your data is read-only that doesn't matter. And yes you can still
have the index on an SSD and the source table on slower storage.
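
For example, a one-time reorder of a read-only table by that column might
look like this (table and index names are placeholders):

CREATE INDEX measurements_region_idx ON measurements (region_code);
CLUSTER measurements USING measurements_region_idx;
ANALYZE measurements;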

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Mon, May 18, 2015 at 10:54 AM, François Battail <
francois.batt...@sipibox.fr> wrote:

> Le 18/05/2015 16:38, William Dunn a écrit :
>
> Thank you William,
>
>* With read-only work loads you can make shared_buffers very large,
>> like 40% of RAM available to the database. Usually you would keep it
>> lower because in a write heavy workload large shared_buffers causes
>> checkpoints to have huge IO, but since you are not making changes in
>> shared_buffers this will not happen.
>>
>
> Yes, good idea.
>
>* You can also increase checkpoint_timeout to a very large value to
>> prevent checkpoints, since you don't need them. WAL level can be
>> minimal as well.
>>
>
> Already set to 5 min with 50 segments and 0.9 completion target (but used
> also for the bulk loading). But of course I will set it to 1 hour when in
> read only mode.
>
>* You can also run a CLUSTER command on one of your indexes to group
>> data that is frequently accessed together into the same segment of
>> disk so you can get more of it in a single IO operation.
>>
>
> Hum... I was planning to put indexes and data on different disks (SSD) /
> controller to maximize bandwith use, am I wrong?
>
>* You can also run the VACUUM FULL command during off-hours to get
>> your tables vacuumed and statistics up-to-date. It's usually too
>> much overhead to be worthwhile but since you are not doing updates
>> you only have to do it once then don't need to worry about
>> autovacuum being aggressive enough.
>>
>
> Vacuum is done at the end of the import and then set to off.
>
>* I don't think that removing locks will provide any benefit if your
>> queries are truly read-only since ordinary read-only transactions do
>> not require any locks
>>
>
> At least a read write lock should be needed, but you're right: better take
> a look at the source code to be sure.
>
>
> Best regards
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] My index doesn't write anymore but read

2015-05-18 Thread William Dunn
Hello Ben,

Looks like you need to tune autovacuum to be more aggressive. Make sure
autovacuum=ON (the default), increase autovacuum_max_workers (at least 1
per database, more if autovacuum is falling behind), and decrease
autovacuum_vacuum_scale_factor to roughly half of the default (it can also
be set per table, lower for large tables).

You can run the following SQL to see the last time each table was analyzed
(manually or by autoanalyze). If that is still not frequent enough you can
try decreasing autovacuum_naptime.
This SQL is from Bucardo's check_postgres:
SELECT current_database() AS datname,
   nspname AS sname,
   relname AS tname,
   CASE
   WHEN v IS NULL THEN -1
   ELSE round(extract(epoch
  FROM now()-v))
   END AS ltime,
   CASE
   WHEN v IS NULL THEN '?'
   ELSE TO_CHAR(v, 'HH24:MI FMMonth DD, ')
   END AS ptime
FROM
  (SELECT nspname,
  relname,
  GREATEST(pg_stat_get_last_analyze_time(c.oid),
pg_stat_get_last_autoanalyze_time(c.oid)) AS v
   FROM pg_class c,
 pg_namespace n
   WHERE relkind = 'r'
 AND n.oid = c.relnamespace
 AND n.nspname <> 'information_schema'
   ORDER BY 3) AS foo;


And review the bloat of each table (should be ~1. If far above 1 vacuum is
falling behind):
SELECT schemaname,
   relname,
   (pg_relation_size(relid)) AS table_bytes,
   n_live_tup,
   n_dead_tup,
   (n_live_tup::float+n_dead_tup)/n_live_tup AS bloat
FROM pg_stat_user_tables
WHERE
n_live_tup>(current_setting('autovacuum_vacuum_threshold')::bigint*10)+1;


*Will J. Dunn*
*willjdunn.com *

On Mon, May 18, 2015 at 3:51 AM, ben.play 
wrote:

> Thank you for your quick answer !
>
> And ... you are a genius :)
>
> A simple "analyse
> " resolved my problem.
> Do We have to do it regularly ?
>
> Thank you a lot !
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/My-index-doesn-t-write-anymore-but-read-tp5849689p5849699.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Optimizing a read-only database

2015-05-18 Thread William Dunn
Hello François,

   - With read-only workloads you can make shared_buffers very large, like
   40% of the RAM available to the database. Usually you would keep it lower
   because in a write-heavy workload large shared_buffers causes checkpoints
   to have huge IO, but since you are not making changes in shared_buffers
   this will not happen (see the config sketch after this list).

   - You can also increase checkpoint_timeout to a very large value to
   prevent checkpoints, since you don't need them. WAL level can be minimal as
   well.

   - You can also run a CLUSTER command on one of your indexes to group
   data that is frequently accessed together into the same segment of disk so
   you can get more of it in a single IO operation.

   - You can also run the VACUUM FULL command during off-hours to get your
   tables vacuumed and statistics up-to-date. It's usually too much overhead
   to be worthwhile but since you are not doing updates you only have to do it
   once then don't need to worry about autovacuum being aggressive enough.

   - I don't think that removing locks will provide any benefit if your
   queries are truly read-only since ordinary read-only transactions do not
   require any locks
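
As a rough illustration, the corresponding postgresql.conf settings might
look like this (values are placeholders to size to your hardware;
checkpoint_timeout is capped at 1 hour on older releases):

shared_buffers = 16GB        # e.g. ~40% of RAM on a dedicated read-only box
checkpoint_timeout = 60min   # effectively suppresses timed checkpoints
wal_level = minimal          # no replication or PITR needed for a read-only copy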


That's all I can think of right now!
Will

*Will J. Dunn*
*willjdunn.com *

On Mon, May 18, 2015 at 10:07 AM, François Battail <
francois.batt...@sipibox.fr> wrote:

> Dear List,
>
> I would like to know if somebody is aware of tricks for optimizing
> PostgreSQL settings for a read-only database.
> I have a big read-only database (> 1.10^9 records splitted into ~ 10
> tables) using GiST and Btree indexes, no foreign keys on tables at all.
>
> I believe that not doing locks on pages could save some time as there's a
> lot of simultaneaous readers, but so far I've found nothing about this
> specific case in official documentation...
>
> Best regards
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL]

2015-05-17 Thread William Dunn
Hello Sachin,

I hate to respond by suggesting an alternative, but it may be good to try
using pg_basebackup (doc:
http://www.postgresql.org/docs/9.1/static/app-pgbasebackup.html) to back up
your database. It copies the file system files rather than querying the
data over an ordinary connection, which is much faster. A straight file
system backup is fast too, but it can be more difficult to restore the
database to a consistent state. pg_dump remains very good for exporting
specific tables or schemas, or for loading data into another DBMS, but it
is slow and has a lot of overhead because of MVCC. pg_basebackup is faster,
but it does require you to back up the entire cluster.
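
For example, a full base backup could be taken with something like the
following (host, user, and target directory are placeholders; the user
needs the REPLICATION privilege and a replication entry in pg_hba.conf, and
on 9.1 the -x flag is needed to include the WAL required for a consistent
restore):

pg_basebackup -h primary.example.com -U replicator -D /backups/base_20150517 -F t -z -P -x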

*Will J. Dunn*
*willjdunn.com *

On Fri, May 15, 2015 at 8:09 AM, Albe Laurenz 
wrote:

> Sachin Srivastava wrote:
> > How can I fast my daily pg_dump backup. Can I use parallel option(Which
> is introduced in Postgres 9.3)
> > with Postgres 9.1. There is any way I can use this is for 9.1 database.
>
> You cannot do that.
>
> Switch to file system backup, that is much faster.
>
> Yours,
> Laurenz Albe
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] SELECT .. WHERE id IN(..)

2015-05-17 Thread William Dunn
Hello Maks,

As I think Sam suspects, the issue might be that you don't have enough RAM,
that not enough RAM is allocated to shared_buffers, or that this table's
data is being evicted from shared_buffers by other queries. In that case
the index scan identifies the matching rows quickly, but retrieving the
records from disk is very slow. You might want to investigate your shared
buffers like so:

Buffer hit rate for the table:
SELECT heap_blks_hit::float/(heap_blks_hit+heap_blks_read) AS buffer_hit_ratio
FROM pg_statio_user_tables
WHERE relname='users';

Buffer hit rate for the whole database:
SELECT sum(heap_blks_hit)::float/(sum(heap_blks_hit)+sum(heap_blks_read)) AS buffer_hit_ratio
FROM pg_statio_user_tables;

Get the current shared buffers setting:
SELECT current_setting('shared_buffers');

If you are going to use 'EXPLAIN (ANALYZE 1,BUFFERS 1)' as Sam suggested,
it might be good to use a different set of commonly accessed profile_ids
than the ones you just used, because those are now more likely than usual
to be in the buffers (since you just ran the same query via EXPLAIN ANALYZE).
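
For example (table and column names are from this thread; the profile_id
values are placeholders):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users_user WHERE profile_id IN (1234567, 2345678, 3456789);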


*Will J. Dunn*
*willjdunn.com *

On Sun, May 17, 2015 at 12:49 AM, Samuel Smith  wrote:

> On 05/16/2015 10:44 PM, Maks Materkov wrote:
>
>> I have a database, table "users", with column "profile_id", and the
>> following query:
>>
>>
>> EXPLAIN ANALYZE SELECT * FROM users_user WHERE profile_id IN (...50
>> ids...);
>>
>>
>> Result:
>>
>>
>> Index Scan using users_user_83a0eb3f on users_user  (cost=0.50..292.22
>> rows=50
>>
>> width=633) (actual time=0.039..0.622 rows=44 loops=1)
>>
>> Index Cond: (profile_id = ANY ('{2445564,... 50 ids}'::integer[]))
>>
>>   Planning time: 0.322 ms
>>
>>   Execution time: 5192.321 ms
>>
>> This query takes ~5 seconds. (about ~5 million rows in table). I have a
>> btree index on this column. Is there any way to speed up query?
>>
>>
> That does not seem right. Try again only using 'EXPLAIN (ANALYZE 1,BUFFERS
> 1)'
> Also what version are you on and what are shared_buffers set to?
>
> --Sam
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: R: [GENERAL] Index on integer or on string field

2015-05-15 Thread William Dunn
Hello Francesco,

You should probably set timing on, run an explain analyze, and use pgbadger
to diagnose your performance issue.

While comparisons in the index might be slightly faster with integer keys
than with strings, those in-memory operations are extremely fast, and it is
likely that the seek in that index is the fastest part of your query. And
since you only have 50 distinct values the btree is probably extremely
shallow and there will be very few comparisons anyway.

I don't know much about your query but I suspect that the issue is that
your index scan is not selective enough so Postgres needs to scan a lot
from disk (which is *extremely* slow). If you want to improve the
performance you should first try to make the query as selective as
possible, and try to put an index on a more selective column of the WHERE
clause.

If you really must rely primarily on that column which has only 50 distinct
values you can try periodically running a CLUSTER command on the table for
that column index (doc:
http://www.postgresql.org/docs/devel/static/sql-cluster.html), or partition
the table (doc:
http://www.postgresql.org/docs/devel/static/ddl-partitioning.html) so that
the data you are scanning is close together on disk and you can get as much
of it per IO operation as possible.
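
If you do want to try Arthur's btree_gin suggestion, a minimal sketch would
be the following (table and column names are placeholders; the btree_gin
extension provides the GIN operator classes for scalar types such as text):

CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE INDEX items_category_gin_idx ON items USING gin (category);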

*Will J. Dunn*
*willjdunn.com *

On Fri, May 15, 2015 at 1:32 PM, Arthur Silva  wrote:

> Yes that's my suggestion. Btree-Gin deals with lots of repeated values
> much better than the Btree index as repeated keys are only stored once.
> Em 15/05/2015 12:38, "Job"  escreveu:
>
>>  Hello Arthur!
>>
>> So, i read that btree-gin have got "the ability to enforce uniqueness".
>>
>> If in this 10.millions long table i have, in index, 50 recurring values,
>> i can leave the alphabetical field and change to btree-gin the index on it?!
>>
>> Thank you!
>> Francesco
>>
>>  --
>> *Da:* Arthur Silva [arthur...@gmail.com]
>> *Inviato:* venerdì 15 maggio 2015 17.26
>> *A:* Job
>> *Cc:* pgsql-general@postgresql.org
>> *Oggetto:* Re: [GENERAL] Index on integer or on string field
>>
>>   You should probably experiment with a btree-gin index on those.
>> Em 15/05/2015 12:22, "Job"  escreveu:
>>
>>> Hello,
>>>
>>> i have a table of about 10 millions of records, with the index on a
>>> string field.
>>> Actually is alphabetical; since queries are about 100/200 per seconds, i
>>> was looking for a better way to improve performance and reduce workload.
>>>
>>> The unique values, of that fields, are about the 50 (category name), and
>>> we could create a second table to codify, with numerical integer values,
>>> the 50 recurring names.
>>>
>>> Is index are integer and not characteral, performance are better and
>>> workload reduces?
>>>
>>> Is there any comparisons?
>>>
>>> Thank you!
>>> Francesco
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>>
>>


Re: [GENERAL] finding tables about to be vacuum freezed

2015-05-12 Thread William Dunn
Hello Steve,

Great monitoring query (https://gist.github.com/skehlet/36aad599171b25826e82).
I suggest modifying the value "autovacuum_freeze_table_age" to
"LEAST(autovacuum_freeze_table_age,(0.95*autovacuum_freeze_max_age))
AS autovacuum_freeze_table_age" since PostgreSQL implicitly
limits vacuum_freeze_table_age to 95% of autovacuum_freeze_max_age (as
documented at:
http://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-VACUUM-FREEZE-TABLE-AGE
).

It might also be cool to add something like:
  (((2^32)/2)-1-100) AS wraparound_dbfreeze_age,
  ROUND(100.0 * age(relfrozenxid) / (((2^32)/2)-1-100)::numeric, 1) ||
'%' AS "% til wraparound db freeze"
to monitor how close it is getting to the point at which it gets close to
going into safety shutdown mode (as documented in
http://www.postgresql.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND)
in case there is a problem with autovacuum (rare, but if you are already
running that query adding it is cheap and nice for completeness).

You have the count of rows there already, but it might also be nice (albeit
more computationally expensive) to add the size of the table so you can
evaluate if it's really worth doing a manual off-hours vacuum of the table
to avoid the scan happening during peak hours:
(pg_relation_size(pg_class.oid)) AS table_bytes


For your situation it might be good to set vacuum_freeze_min_age to a very
low value, which makes it more likely that ordinary vacuums will freeze
rows and advance relfrozenxid, and to keep autovacuum tuned aggressively
(for bloat/space reclamation purposes). This favors many short, cheap
operations and helps avoid the long, lock-heavy vacuum freeze operations.
Increasing autovacuum_max_workers will also help avoid having the freeze
operation of one table monopolize the autovacuum workers.
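
For example, these settings can also be overridden per table as storage
parameters (the table name and values below are placeholders to illustrate
the syntax):

ALTER TABLE big_history_table SET (
    autovacuum_freeze_min_age = 1000000,
    autovacuum_freeze_table_age = 100000000,
    autovacuum_vacuum_scale_factor = 0.01
);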

*Will J. Dunn*
*willjdunn.com *

On Tue, May 12, 2015 at 12:51 PM, Steve Kehlet 
wrote:

> On Wed, May 6, 2015 at 7:24 PM Jeff Janes  wrote:
>
>> I've booked-marked these but haven't really looked into them to any
>> extent.  It would be awesome if you put the SQL one somewhere on
>> http://wiki.postgresql.org.  That way it is easier to find, and anyone
>> who finds it can contribute explanations, corrections, and update it to
>> keep up with changes to the database.
>>
>
> I'll see if I can do this!
>
>
>> I don't have any experience with 3.5TB databases, but I certainly think
>> that that is something to worry about.
>>
>
> We did hit the autovacuum_freeze_max_age threshold on Saturday and the
> autovacuumer has been running for days now, slowly cranking through each
> table. Fortunately, I had autovacuum_vacuum_cost_delay set so the IO impact
> isn't bad: no complaints from the customer, and our ops group says the IO
> load is okay. So Postgres is just quietly doing its thing. This has clearly
> happened numerous times before, and explains a few mysterious incidents in
> the past where a nightly analyze script has hung for several days. It's
> really great to understand this better now.
>
>
>>  There are two main problems you are likely to encounter (from what I
>> know):
>>
>> One is that the autovacuum scheduler deals poorly with a database
>> exceeding autovacuum_freeze_max_age.  It forces all available to autovacuum
>> resources to be directed to that database, starving any other database of
>> attention.  If you have multiple active databases, by the time one database
>> has been frozen enough to no longer exceed autovacuum_freeze_max_age, the
>> other one(s) might be horribly bloated.  If your cluster only has one
>> active database in it, this won't be a problem.  The one that gets all the
>> attention is the one that needs all the attention.  But if you have
>> multiple active databases in your cluster, this could be a problem.
>>
>
> Fortunately in this case it's just one database, but good to know.
>
>
>> The other problem is that autovac takes a fairly strong lock out on the
>> table while it is vacuuming it.  Normally it relinquishes the lock once it
>> realizes someone else is waiting on it.  But in the case of a forced
>> full-table scan (either autovacuum_freeze_max_age or
>> vacuum_freeze_table_age is exceeded), it refuses to relinquish the lock.
>> This means that any process which needs a strong-ish table lock (add
>> column, drop column, create index, drop index, cluster, truncate, reindex,
>> etc.) is going to block for potentially a very very long time.  This is
>> only a problem if you actually have such processes.  If all you do is
>> select, insert, update, delete, none of those things will be blocked.
>>
>
> You're right, that was the exact problem that got me going down this path.
> Some of our guys were doing a software update and some CREATE INDEX
> operations it wanted to do were blocked by the autovacuumer. Fortunately,
> we don't do software updates all the time, but it's falling onto me to
> figure out how t

Re: [GENERAL] PGFoundry Sample databases (particularly world)?

2015-05-05 Thread William Dunn
PgFoundry.org went down some months ago, I contacted webmaster Marc
Fournier and he was able to get it back up but a lot of it no longer works
and I don't think he responded to my follow-up. For the most part top pages
are broken but sub-pages are still there (just very hard to navigate to and
find).

If you are looking for something specific you will need to google for it
rather than navigating through pgfoundry.org. For your example (Sample
Databases) I googled "site:PgFoundry.org sample databases" and I think the
first hit is what you are looking for:
http://pgfoundry.org/frs/?group_id=1000150. You'll notice that you can
navigate through the other sub-pages in that project but if you go to the
Summary page it will be broken.

*Will J. Dunn*
*willjdunn.com *

On Tue, May 5, 2015 at 7:45 AM, Vincent Veyron  wrote:

> On Mon, 4 May 2015 23:05:11 -0700
> Guyren Howe  wrote:
>
> > And what is going on at pgfoundry?
> >
>
> I don't follow them personally, but I saw this recent thread, which does
> not bode well :
>
> http://www.postgresql.org/message-id/20150409153813.ga7...@gmail.com
>
>
> --
> Salutations, Vincent Veyron
>
> https://marica.fr/
> Gestion des contentieux, des dossiers de sinistres assurance et des
> contrats pour le service juridique
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] PostgreSQL HA config recommendations

2015-04-30 Thread William Dunn
Alex,
Note that you should be wary of suggestions to make your replication
synchronous. Synchronous replication is rarely used for this kind of use
case (Cisco Jabber) where the most complete durability of the standby is
not of the utmost concern (as it would be in a banking application). Not
only will it decrease performance, but since you expect to have only one
local standby it could actually decrease your availability because if your
standby went down no transactions would be able to commit on the master.
See the Synchronous Replication section of the docs for more details (
http://www.postgresql.org/docs/devel/static/warm-standby.html).
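
For reference, whether streaming replication is synchronous is controlled
on the master roughly like this (the standby name is a placeholder and
would have to match the application_name the standby connects with; leaving
synchronous_standby_names empty, the default, keeps replication
asynchronous):

# postgresql.conf on the master
synchronous_standby_names = ''          # empty = asynchronous (the default)
#synchronous_standby_names = 'standby1' # uncomment to make 'standby1' synchronous
synchronous_commit = on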

Also note that the suggestion provided by Fabio that you should not have
your application commit more than one transaction per user operation is
only applicable in synchronous replication (though since this is for a Cisco
Jabber, where you neither have control over nor much information regarding
the number of commits sent by the transaction per user operation, that
suggestion is not applicable anyway...). In the case of asynchronous
master-slave replication the typical issue with streaming replication
latency is that you have transactions going to the master and then the
application sends a read only transaction to the slave before the slave
receives the transaction. So long as you don't have the application
consider the user operation completed before all the transactions are
committed I don't think having multiple transactions would make your
replication latency issue any less.

For example, if you had a calendar application where a user enters event
details and creates an event for the calendar. The application may be set
up to execute 2 transactions, 1) Add the event and details to the calendar
events table and 2) once the event creation transaction returns add the
current user as an attendee for that event. In this case both transactions
would be going against the master, so how far the slave is behind wouldn't
be a factor. Of course it would be faster overall to send the inserts as a
single database procedure, but that all goes against the master database so
the streaming replication is not a factor in that consideration.


*William J. Dunn*
*willjdunn.com *

*William J. Dunn*
*P* 978-844-4427 |  *dunn...@gmail.com *
*du...@bu.edu *

On Thu, Apr 30, 2015 at 9:02 AM, Fabio Ugo Venchiarutti 
wrote:

> > WAN delays can cause problems for any replication system; you just have
> > to be aware of that and not push things too hard (or try and violate the
> > laws of physics). For example, streaming replication set to be
> > synchronous crossing the planet is something you'd probably be rather
> > unhappy with. :)
>
>
> In my experience streaming replication fits most use cases due to inherent
> its simplicity and robustness, but you might need to adjust your software
> design to get the best out of it.
>
>
> More specifically, latency issues can be heavily mitigated by having
> application software commit no more than one transaction per user
> operation, provided 1 x "master<->sync_slave round trip time" is acceptable
> delay when they submit forms or the like.
>
> It can get much worse if the application server is on a different
> geographical node than the DB master. In such case it is realistically
> beneficial to batch multiple write operations in a single STATEMENT instead.
> If the replication synchronous slave is on yet another node, the best case
> (single statement) scenario would be 2 x round trip time. This
> configuration is more common than you might think as some setups feature
> remote app servers reading off synchronous slaves at their own physical
> location but committing against a master that is somewhere else.
>
>
> Cheers
>
>
>
>
>
>
>
> On 30/04/15 11:06, Jim Nasby wrote:
>
>> On 4/29/15 1:13 PM, Alex Gregory wrote:
>>
>>> I was thinking that I could use Slony but then I read that it does not
>>> like WAN replication.  I have also read about streaming replication
>>> native to Postgres but was not sure how that would work over the WAN.
>>> Bucardo seems better for Data Warehousing or multimaster situations
>>> which this is not.  That leaves pgpool ii which seems like it would
>>> add an extra layer of complexity.
>>>
>>
>> WAN delays can cause problems for any replication system; you just have
>> to be aware of that and not push things too hard (or try and violate the
>> laws of physics). For example, streaming replication set to be
>> synchronous crossing the planet is something you'd probably be rather
>> unhappy with. :)
>>
>> I haven't played with Slony in forever, but when I did it loved to lock
>> things. That would not play well with high latency.
>>
>> I have run londiste between sites within the same city, and that worked
>> well.
>>
>> Bucardo and pg_pool are both based on the idea of replaying SQL
>> statements instead of replicating actual data. They have their uses, but
>> I personally distrust that idea, espe

Re: [GENERAL] PostgreSQL HA config recommendations

2015-04-30 Thread William Dunn
Alex,
Note that you should be wary of suggestions to make your replication
synchronous. Synchronous replication is rarely used for this kind of use
case (Cisco Jabber) where the most complete durability of the standby is
not of the utmost concern (as it would be in a banking application). Not
only will it decrease performance, but since you expect to have only one
local standby it could actually decrease your availability because if your
standby went down no transactions would be able to commit on the master.
See the Synchronous Replication section of the docs for more details (
http://www.postgresql.org/docs/devel/static/warm-standby.html).

Also note that the suggestion provided by Fabio that you should not have
your application commit more than one transaction per user operation is
only applicable in synchronous replication (though since this is for a Cisco
Jabber, where you neither have control over nor much information regarding
the number of commits sent by the transaction per user operation, that
suggestion is not applicable anyway...). In the case of asynchronous
master-slave replication the typical issue with streaming replication
latency is that you have transactions going to the master and then the
application sends a read only transaction to the slave before the slave
receives the transaction. So long as you don't have the application
consider the user operation completed before all the transactions are
committed I don't think having multiple transactions would make your
replication latency issue any less.

For example, if you had a calendar application where a user enters event
details and creates an event for the calendar. The application may be set
up to execute 2 transactions, 1) Add the event and details to the calendar
events table and 2) once the event creation transaction returns add the
current user as an attendee for that event. In this case both transactions
would be going against the master, so how far the slave is behind wouldn't
be a factor. Of course it would be faster overall to send the inserts as a
single database procedure, but that all goes against the master database so
the streaming replication is not a factor in that consideration.


*William J. Dunn*
*P* 978-844-4427 |  *dunn...@gmail.com *
*du...@bu.edu *

On Thu, Apr 30, 2015 at 9:02 AM, Fabio Ugo Venchiarutti 
wrote:

> > WAN delays can cause problems for any replication system; you just have
> > to be aware of that and not push things too hard (or try and violate the
> > laws of physics). For example, streaming replication set to be
> > synchronous crossing the planet is something you'd probably be rather
> > unhappy with. :)
>
>
> In my experience streaming replication fits most use cases due to inherent
> its simplicity and robustness, but you might need to adjust your software
> design to get the best out of it.
>
>
> More specifically, latency issues can be heavily mitigated by having
> application software commit no more than one transaction per user
> operation, provided 1 x "master<->sync_slave round trip time" is acceptable
> delay when they submit forms or the like.
>
> It can get much worse if the application server is on a different
> geographical node than the DB master. In such case it is realistically
> beneficial to batch multiple write operations in a single STATEMENT instead.
> If the replication synchronous slave is on yet another node, the best case
> (single statement) scenario would be 2 x round trip time. This
> configuration is more common than you might think as some setups feature
> remote app servers reading off synchronous slaves at their own physical
> location but committing against a master that is somewhere else.
>
>
> Cheers
>
>
>
>
>
>
>
> On 30/04/15 11:06, Jim Nasby wrote:
>
>> On 4/29/15 1:13 PM, Alex Gregory wrote:
>>
>>> I was thinking that I could use Slony but then I read that it does not
>>> like WAN replication.  I have also read about streaming replication
>>> native to Postgres but was not sure how that would work over the WAN.
>>> Bucardo seems better for Data Warehousing or multimaster situations
>>> which this is not.  That leaves pgpool ii which seems like it would
>>> add an extra layer of complexity.
>>>
>>
>> WAN delays can cause problems for any replication system; you just have
>> to be aware of that and not push things too hard (or try and violate the
>> laws of physics). For example, streaming replication set to be
>> synchronous crossing the planet is something you'd probably be rather
>> unhappy with. :)
>>
>> I haven't played with Slony in forever, but when I did it loved to lock
>> things. That would not play well with high latency.
>>
>> I have run londiste between sites within the same city, and that worked
>> well.
>>
>> Bucardo and pg_pool are both based on the idea of replaying SQL
>> statements instead of replicating actual data. They have their uses, but
>> I personally distrust that idea, especially for DR.
>>
>>  When it comes down to to there are s

Re: [GENERAL] PostgreSQL HA config recommendations

2015-04-29 Thread William Dunn
The streaming replication built into PostgreSQL would work fine for your
use case, assuming that you are OK with having only one primary supporting
writes and any slaves being read only as it currently (9.0-9.4) only
supports a single master. This will put minimal load on your primary server
and in most cases will get you what you need. An excellent benefit of using
the built in streaming replication in PostgreSQL 9.4 or newer for WAN
replication is that with Replication Slots the master will keep track of
when the slave gets disconnected or falls behind and will retain WAL logs
as necessary. It puts minimal load on the master as the WAL logs are
written regardless and adding additional details to them is cheap. Slony
and Bucardo use triggers which put transactional load on the master, and
aren't really feasible over a distant WAN.
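
For illustration, a minimal 9.4 replication-slot setup might look roughly
like this (slot name, host, and user are placeholders):

-- on the master
SELECT pg_create_physical_replication_slot('standby1_slot');

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator application_name=standby1'
primary_slot_name = 'standby1_slot'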

A common configuration is to have master-slave replication set up via
streaming replication and using pgpool-II to load balance. pgpool-II can be
configured to send all the writes to the master and distribute selects to
both. However, this will not get you all the desired HA you want because
pgpool-II does not have any logic to promote the slave to become the new
master if the master goes down. It is very easy to promote a slave to be a
master (you simply create a file that triggers auto-promote, then
reconfigure pgpool or do a DNS switch to point the application there) but
to have failover completely automated is much more complicated and
pgpool-II will not get you there.

The built in streaming replication can only replicate your entire
PostgreSQL cluster, so if you need finer grain control over what to
replicate (for example only a particular database of the cluster) you will
need to look to one of the other tools, such as Slony.

I would also recommend you take a look at the BDR Project from 2ndQuadrant.
The docs are located at http://bdr-project.org/docs/stable/index.html. Like
Bucardo it provides master-master replication capabilities with conflict
resolution which may allow you to avoid having logic differentiating write
transactions and directing them to a particular PostgreSQL Cluster, and
will avoid you having to have logic to promote slaves in the case of
failure of the master. Unlike Bucardo it uses streaming replication rather
than triggers so load on the master is minimal. The primary components of
the BDR project are being incorporated into core PostgreSQL and will very
likely be part of the standard streaming replication in PostgreSQL 9.5 and
above.

*Will J. Dunn*
willjdunn.com

On Wed, Apr 29, 2015 at 2:57 PM, Joshua D. Drake 
wrote:

>
> On 04/29/2015 10:53 AM, Alex Gregory wrote:
>
>> Hello-
>>
>> I have been doing lots of reading and I really want to make sure that I
>> get this HA architecture I am working on correct.  I figured the best way
>> would be to reach out to the community for advice.
>>
>> I am installing Cisco Jabber and want to use Postgres for the back end.
>> The Postgres servers will be running on CentOS.
>>
>> The requirement is to have two servers in HA using a database stored on
>> shared NetApp Filer storage in California.  A third server will be in
>> Ireland for DR purposes.  There only really needs to be one writeable
>> server in California if it keeps things simple.  Automatic conflict
>> resolution should be easier this way causing lower overhead.
>>
>> I was thinking that I could use Slony but then I read that it does not
>> like WAN replication.  I have also read about streaming replication native
>> to Postgres but was not sure how that would work over the WAN.  Bucardo
>> seems better for Data Warehousing or multimaster situations which this is
>> not.  That leaves pgpool ii which seems like it would add an extra layer of
>> complexity.
>>
>> When it comes down to to there are so many choices I am not sure if I
>> need one or a combination of two.Any help you could provide could be
>> greatly appreciated.
>>
>
> This is a can of worms topic but:
>
> You can use streaming replication (or log shipping) asynchronously which
> will allow you to use it over  WAN just fine.
>
> Other than that, use the Linux HA suite. That is what it is there for.
>
> JD
>
>
> --
> Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
> Announcing "I'm offended" is basically telling the world you can't
> control your own emotions, so everyone else should do it for you.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: R: [GENERAL] DB on mSATA SSD

2015-04-23 Thread William Dunn
Additional things to consider for decreasing pressure on the cheap drives:


   - Another configuration parameter to look into
   is effective_io_concurrency. For SSD we typically set it to 1 io per
   channel of controller card not including the RAID parity drives. If you
   decrease this value PostgreSQL will not try to initiate as many parallel io
   operations.
   http://www.postgresql.org/docs/8.4/static/runtime-config-resource.html


   - A strategy Instagram used is to (manually) vertically partition tables
   that have some, but not all, columns updated frequently. When PostgreSQL
   updates a data value for a column it writes a new copy of the entire row
   and marks the original row as garbage. If you have tables with many rows
   but many of them are updated infrequently (Instagram's example was a table
   with user information where "last login" was updated very frequently but
   other information about the user was rarely updated) you could split the
   frequently updated columns into a separate table to reduce io. However note
   that PostgreSQL does not have features to support vertical partitioning
   directly, so to employ this technique you would need to manually partition
   and update your SQL code accordingly (a minimal sketch follows after this
   list).


   - Also check into your WAL and logging settings to make sure you aren't
   writing more to them than you need to.
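
As referenced above, a minimal sketch of manual vertical partitioning
(table and column names are made up, loosely following the Instagram
example):

CREATE TABLE app_user (
    user_id  bigint PRIMARY KEY,
    username text NOT NULL,
    email    text
);

-- hot, frequently updated columns split into a 1:1 companion table
CREATE TABLE app_user_activity (
    user_id    bigint PRIMARY KEY REFERENCES app_user (user_id),
    last_login timestamptz
);

-- updating only the hot column now rewrites a much smaller row
UPDATE app_user_activity SET last_login = now() WHERE user_id = 42;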

*Will J Dunn*

On Thu, Apr 23, 2015 at 10:38 AM, Jim Nasby 
wrote:

> On 4/23/15 8:36 AM, Job wrote:
>
>> Hello, thank you first of all for your wonder help!
>>
>> Tomas, regarding:
>>
>>  There are ways to make the writes less frequent, both at the database
>>> and OS level. We don't know what's your PostgreSQL config, but making
>>> the checkpoints less frequent and tuning the kernel/mount options may
>>> help a lot.
>>>
>>
>> We can raise up checkpoints, at kernel-side which options do you suggest
>> to tune for minimize disk writing?
>>
>
> Certainly disable atime updates if you haven't already.
>
> Having a long checkpoint period is somewhat similar to running in a ram
> disk and doing periodic backups. BTW, if you want to try using backups, I
> recommend you setup actual PITR archiving to the SSD. That will write data
> sequentially and in larger chunks, which should help the SSD better deal
> with the writes. This will give you more control over how much data you
> lose during a crash. Though, if all you do is a single large update once a
> day you're probably better off just taking a backup right after the update.
> I would also look at the backup size and recovery time of pg_dump vs PITR
> or a filesystem snapshot; it could be significantly smaller. It might take
> longer to restore though.
>
> BTW, if you go the ramdisk route you should turn off fsync; there's no
> point in the extra calls to the kernel. Only do that if the ENTIRE database
> is in a ramdisk though.
>
>  We have a table, about 500Mb, that is updated and written every day.
>> When machines updates, table is truncated and then re-populated with
>> pg_bulk.
>> But i think we strongly writes when importing new data tables..
>>
>
> That depends on how much data has actually changed. If most of the data in
> the table is changed then truncate and load will be your best bet. OTOH if
> relatively little of the data has changed you'll probably get much better
> results by loading the data into a loading table and then updating changed
> data, deleting data that shouldn't be there anymore, and inserting new data.
>
> You definitely want the loading table to not be on SSD, and to be
> unlogged. That means it needs to go in a tablespace on a ram disk. True
> temporary tables (CREATE TEMP TABLE) are not logged, but I don't know that
> that will work well with pg_bulk. You can use a real table with the
> unlogged option to the same effect (though, I'm not sure if unlogged is
> available in 8.4).
>
> You also need to consider the indexes. First, make absolutely certain you
> need all of them. Get rid of every one that isn't required. Second, you
> need to test the amount of data that's written during an update with the
> indexes in place *and doing a subsequent VACCUM* compared to dropping all
> the indexes and re-creating them.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] PL\pgSQL 'ERROR: invalid input syntax for type oid:' [PostgreSQL 9.3.6 and 9.4]

2015-04-17 Thread William Dunn
Thanks Adrian!  Changing the declaration row_data to be of type RECORD
(rather than pg_catalog.pg_class%ROWTYPE) resolved the error :)
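
For anyone finding this thread later, here is a minimal DO-block sketch of
the corrected approach (simplified from the function quoted below; the
RECORD variable adapts to whatever columns the query returns):

DO $$
DECLARE
    row_data RECORD;  -- RECORD instead of pg_catalog.pg_class%ROWTYPE
BEGIN
    FOR row_data IN
        SELECT c.relname, n.nspname
        FROM pg_class c
        JOIN pg_namespace n ON c.relnamespace = n.oid
        WHERE c.relkind = 'r'
          AND c.reltablespace <> 0
          AND c.reltablespace NOT IN (SELECT oid FROM pg_tablespace
                                      WHERE spcname IN ('pg_default', 'pg_global'))
    LOOP
        EXECUTE 'ALTER TABLE ' || quote_ident(row_data.nspname) || '.'
                || quote_ident(row_data.relname) || ' SET TABLESPACE pg_default';
    END LOOP;
END $$;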

- Will

*Will J Dunn*
*willjdunn.com <http://willjdunn.com/>*

On Thu, Apr 16, 2015 at 4:36 PM, Adrian Klaver 
wrote:

> On 04/16/2015 07:52 AM, William Dunn wrote:
>
>> Hello list,
>>
>> I am creating a plpgsql procedure in Postgres 9.4 (also testing in
>> 9.3.6) to move all of the tables that are not in a default tablespace
>> (pg_default, pg_global, or 0) into the tablespace pg_default. However
>> when it executes I get an error 'ERROR:  invalid input syntax for type
>> oid:' which I do not know how to resolve..
>>
>> The procedure executes the following select query, which returns the
>> /relname
>> <http://www.postgresql.org/docs/devel/static/catalog-pg-class.html>
>> /(tablename, type /name/) and /nspname
>> <http://www.postgresql.org/docs/devel/static/catalog-pg-namespace.html>
>> /(schema name, type /name/) of each table that are not in the default
>> tablespaces, into a variable called /row_data/ (of type
>>
>> pg_catalog.pg_class%ROWTYPE):
>>
>> SELECT pg_class.relname, pg_namespace.nspname
>> FROM pg_class INNER JOIN pg_namespace ON
>> pg_class.relnamespace=pg_namespace.oid
>> WHERE pg_class.reltablespace<>(SELECT oid FROM pg_tablespace WHERE
>> spcname='pg_default')
>> AND pg_class.reltablespace<>(SELECT oid FROM pg_tablespace WHERE
>> spcname='pg_global')
>> AND pg_class.reltablespace<>0
>> AND pg_class.relkind='r'
>> ORDER BY pg_class.relname;
>>
>> Using the example database EDBSTORE (example database provided by
>> Enterprise DB) the query returned the table 'inventory' which was in
>> schema 'edbstore' (which I had stored on tablespace 'edbstore', not
>> pg_default):
>>relname  | nspname
>> ---+--
>>   inventory | edbstore
>> (1 row)
>>
>>
>> The procedure loops through each returned row and executes an ALTER
>> TABLE command to move them to the tablespace pg_default:
>> EXECUTE 'ALTER TABLE '|| row_data.nspname || '.' || row_data.relname ||'
>> SET TABLESPACE pg_default';
>>
>> (so in the above edbstore example it should execute "ALTER TABLE
>> edbstore.inventory SET TABLESPACE pg_default;")
>>
>> However, when I run the procedure it is returning the following error:
>> ERROR:  invalid input syntax for type oid: "edbstore"
>> CONTEXT:  PL/pgSQL function move_table_tablespaces_to_pg_default()
>> line 18 at FOR over SELECT rows
>>
>> Does anyone understand this error?
>>
>
> pg_class has a hidden field oid:
>
> http://www.postgresql.org/docs/9.3/interactive/catalog-pg-class.html
>
> When you are doing:
>
> row_data pg_catalog.pg_class%ROWTYPE;
>
> that is saying you want the whole row type for pg_class:
>
>
> http://www.postgresql.org/docs/9.3/interactive/plpgsql-declarations.html#PLPGSQL-DECLARATION-ROWTYPES
>
> You are not supplying the oid or the columns other than relname and
> nspname, so the error is expected.
>
> If it were me I would use a RECORD type:
>
>
> http://www.postgresql.org/docs/9.3/interactive/plpgsql-declarations.html#PLPGSQL-DECLARATION-RECORDS
>
> It will adapt to the columns actually returned.
>
>
>> The full plpgsql function is as follows:
>> ~~~
>> CREATE FUNCTION move_table_tablespaces_to_pg_default () RETURNS INTEGER
>> AS $$
>> -- Loops through the tables not in the tablespace pg_default, pg_global,
>> or the default tablespace and moves them to the pg_default tablespace
>> -- Returns the number of tables that were moved
>>
>> DECLARE
>>
>>-- Declare a variable to hold the counter of tables moved
>>objects_affected INTEGER = 0;
>>
>>-- Declare a variable to hold rows from the pg_class table
>>row_data pg_catalog.pg_class%ROWTYPE;
>>
>> BEGIN
>>
>>-- Iterate through the results of a query which lists all of the
>> tables not in the tablespace pg_default, pg_global, or the default
>> tablespace
>>FOR row_data IN (SELECT pg_class.relname, pg_namespace.nspname
>> FROM pg_class INNER JOIN pg_namespace ON
>> pg_class.relnamespace=pg_namespace.oid
>>
>> WHERE pg_class.reltablespace<>(SELECT oid FROM pg_tablespace WHERE
>> spcname='pg_default' ...

[GENERAL] PL\pgSQL 'ERROR: invalid input syntax for type oid:' [PostgreSQL 9.3.6 and 9.4]

2015-04-16 Thread William Dunn
Hello list,

I am creating a plpgsql procedure in Postgres 9.4 (also testing in 9.3.6)
to move all of the tables that are not in a default tablespace (pg_default,
pg_global, or 0) into the tablespace pg_default. However when it executes I
get an error 'ERROR:  invalid input syntax for type oid:' which I do not
know how to resolve.

The procedure executes the following select query, which returns the
*relname* (tablename, type *name*) and *nspname* (schema name, type *name*)
of each table that is not in the default tablespaces, into a variable called
*row_data* (of type pg_catalog.pg_class%ROWTYPE):

   SELECT pg_class.relname, pg_namespace.nspname
   FROM pg_class INNER JOIN pg_namespace ON
pg_class.relnamespace=pg_namespace.oid
   WHERE pg_class.reltablespace<>(SELECT oid FROM pg_tablespace WHERE
spcname='pg_default')
   AND pg_class.reltablespace<>(SELECT oid FROM pg_tablespace WHERE
spcname='pg_global')
   AND pg_class.reltablespace<>0
   AND pg_class.relkind='r'
   ORDER BY pg_class.relname;

Using the example database EDBSTORE (example database provided by
Enterprise DB) the query returned the table 'inventory' which was in schema
'edbstore' (which I had stored on tablespace 'edbstore', not pg_default):
  relname  | nspname
---+--
 inventory | edbstore
(1 row)


The procedure loops through each returned row and executes an ALTER TABLE
command to move them to the tablespace pg_default:
EXECUTE 'ALTER TABLE '|| row_data.nspname || '.' || row_data.relname ||'
SET TABLESPACE pg_default';

(so in the above edbstore example it should execute "ALTER TABLE
edbstore.inventory SET TABLESPACE pg_default;")
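As a side note, unrelated to the error below: concatenating identifiers
directly only works for simple lower-case names; quote_ident is the usual
safeguard, roughly:

    -- same statement, with schema and table names quoted defensively
    EXECUTE 'ALTER TABLE ' || quote_ident(row_data.nspname) || '.'
         || quote_ident(row_data.relname) || ' SET TABLESPACE pg_default';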

However, when I run the procedure it is returning the following error:
   ERROR:  invalid input syntax for type oid: "edbstore"
   CONTEXT:  PL/pgSQL function move_table_tablespaces_to_pg_default() line
18 at FOR over SELECT rows

Does anyone understand this error?

The full plpgsql function is as follows:
~~~
CREATE FUNCTION move_table_tablespaces_to_pg_default () RETURNS INTEGER AS
$$
-- Loops through the tables not in the tablespace pg_default, pg_global, or
the default tablespace and moves them to the pg_default tablespace
-- Returns the number of tables that were moved

   DECLARE

  -- Declare a variable to hold the counter of tables moved
  objects_affected INTEGER = 0;

  -- Declare a variable to hold rows from the pg_class table
  row_data pg_catalog.pg_class%ROWTYPE;

   BEGIN

  -- Iterate through the results of a query which lists all of the
tables not in the tablespace pg_default, pg_global, or the default
tablespace
  FOR row_data IN (SELECT pg_class.relname, pg_namespace.nspname FROM
pg_class INNER JOIN pg_namespace ON pg_class.relnamespace=pg_namespace.oid

WHERE pg_class.reltablespace<>(SELECT oid FROM pg_tablespace WHERE
spcname='pg_default') AND pg_class.reltablespace<>(SELECT oid FROM
pg_tablespace WHERE

spcname='pg_global') AND pg_class.reltablespace<>0 AND pg_class.relkind='r'
ORDER BY pg_class.relname)  LOOP

 -- execute ALTER TABLE statement on that table to move it to
tablespace pg_default
 EXECUTE 'ALTER TABLE '|| row_data.nspname || '.' ||
row_data.relname ||' SET TABLESPACE pg_default';

 -- increment count of tables moved
 objects_affected := objects_affected + 1;
  END LOOP;

  -- Return count of tables moved
  RETURN objects_affected;
   END;
$$ LANGUAGE 'plpgsql';

Thanks!!
Will

*Will J Dunn*
*willjdunn.com <http://willjdunn.com/>*