Re: [GENERAL] optimizing a query

2016-06-22 Thread David G. Johnston
On Wed, Jun 22, 2016 at 2:23 PM, Jonathan Vanasco  wrote:

>
> Note 2:
>
> This is odd, but this index is used by the planner:
> CREATE INDEX idx_partial_fkey_id ON table_a(fkey_1, id)
> WHERE col_partial IS NOT FALSE;
>
> but this index is never used:
> CREATE INDEX idx_partial_id_fkey ON table_a(id, fkey_1)
> WHERE col_partial IS NOT FALSE;
>
> I honestly don't know why the second index would not be used.  The
> query time doubled without it when run on a table with 6million rows and
> about 20 columns.
>
> ---
>
> The indexes I tested on:
>
> CREATE INDEX idx_fkey_1 ON table_a(fkey_1);
> CREATE INDEX idx_partial_fkey ON table_a(fkey_1) WHERE col_partial
> IS NOT FALSE;
> CREATE INDEX idx_partial_fkey_id ON table_a(fkey_1, id) WHERE
> col_partial IS NOT FALSE;
> CREATE INDEX idx_partial_id_fkey ON table_a(id, fkey_1) WHERE
> col_partial IS NOT FALSE;
> CREATE INDEX idx_partial_fkey_partial ON table_a(fkey_1,
> col_partial) WHERE col_partial IS NOT FALSE;
> CREATE INDEX idx_partial_fkey_id_partial ON table_a(fkey_1, id,
> col_partial) WHERE col_partial IS NOT FALSE;
>
>
There is no relation named "table_a" anywhere in your past emails.

Your "t_a" relation is (id : int, col_1 : int, col_2 : bool), not (id :
int?, fkey_1 : int?, col_partial : bool), and given that t_a is an end
relation it doesn't typically have a foreign key.

1) t_a is going to use the indexes appropriate to its where clauses to
select records.  For the (id, fkey_1) index, unless there is a dependency [*]
(id => fkey_1), you have to scan the entire index to know that you've
located all records containing a given fkey_1 value when id is unspecified
(as it is in your query).

* even if there were a dependency, I don't believe PostgreSQL would be
able to use that knowledge during planning.

2) to join t_a2b an index on t_a2b having a_id as the first column would
likely be used to quickly locate matching records from [1].  No additional
indexes on t_a are going to help here.
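A rough sketch of both points, reusing the t_a / t_a2b names from the example
schema elsewhere in this thread (so the exact column names are an assumption):

    -- Point 1: the query filters on the fkey column but not on id, so only an
    -- index whose *leading* column is the filtered column can be probed
    -- directly; an (id, fkey_1) index would have to be scanned end to end.
    CREATE INDEX idx_fkey_leading ON table_a (fkey_1, id)
        WHERE col_partial IS NOT FALSE;   -- usable: fkey_1 leads

    -- Point 2: the join is driven by a_id, so an index on t_a2b led by a_id is
    -- what gets probed; the example schema's PRIMARY KEY (a_id, b_id) already
    -- provides exactly that.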

You have a good head for this, and maybe I'm missing something obvious here
- this is not my strong suit.  The random, though similar, naming of
objects in your posts makes it a bit hard to follow.  As you found, the
partial indexes might or might not be influencing the results.  And note
that the missing column influences the use of an index-only scan - if you
are going to get one of those anyway it's quite possible a given index will
be worse than some non-index alternative.

David J.


Re: [GENERAL] optimizing a query

2016-06-22 Thread Jonathan Vanasco

On Jun 22, 2016, at 2:38 PM, David G. Johnston wrote:
> What query?  A self-contained email would be nice.

This was the same query as in the previous email in the thread.  I didn't think 
to repeat it.  I did include it below.

> https://www.postgresql.org/docs/9.6/static/indexes-index-only-scans.html
> 
> Note especially:
> 
> "Visibility information is not stored in index entries, only in heap 
> entries; ..."
> 
> The check against the heap isn't for the truthiness of the predicate but the 
> visibility of the row.

Thanks for this link. 

The table I worked on hasn't had any writes since a server restart, and
according to those docs the queries should have been served off the visibility
map, not the heap.
However, the amount of time to search is not in line with expectations for the
visibility map.
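Two quick checks can rule the visibility map in or out here - note that the map
bits are only set by (auto)vacuum, not merely by an absence of writes, so a
manual VACUUM may be needed first (the table name below is the placeholder used
in this thread):

    -- How much of the table the planner believes is all-visible; relallvisible
    -- close to relpages means index-only scans can mostly skip the heap.
    SELECT relpages, relallvisible
    FROM pg_class
    WHERE relname = 'table_a';

    -- Set visibility-map bits for pages that are already all-visible.
    VACUUM table_a;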

After reading the last paragraph about some index optimizations in 9.6 that
looked related, I installed the RC on another machine and dumped 2 tables
from production to see if I would qualify for any improvements.

>>> But there's a problem: the WHERE clause refers to success which is not 
>>> available as a result column of the index. Nonetheless, an index-only scan 
>>> is possible because the plan does not need to recheck that part of the 
>>> WHERE clause at runtime: all entries found in the index necessarily have 
>>> success = true so this need not be explicitly checked in the plan. 
>>> PostgreSQL versions 9.6 and later will recognize such cases and allow 
>>> index-only scans to be generated, but older versions will not.

The 9.6 planner optimizes my query and realizes that it doesn't need
to recheck the table:

So while this index is necessary on 9.5:
CREATE INDEX idx__9_5 ON table_a(column_1, id, column_2) WHERE column_2 
IS NOT FALSE;

This smaller index suffices on 9.6:
CREATE INDEX idx__9_6 ON table_a(column_1, id) WHERE column_2 IS NOT 
FALSE;

Considering I have several million rows, this has a noticeable effect.
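For reference, the query shape this helps is the aggregate-over-join pattern
posted earlier in the thread; a rough sketch with the column names used above
(mapping column_1/column_2 onto that example is my assumption):

    -- On 9.6 the partial-index predicate (column_2 IS NOT FALSE) no longer has
    -- to be an indexed column for an index-only scan to be chosen.
    SELECT a2b.b_id, count(a2b.b_id) AS counted
    FROM t_a2b AS a2b
    JOIN table_a AS a ON a2b.a_id = a.id
    WHERE a.column_1 = 730
      AND a.column_2 IS NOT FALSE
      AND a2b.col_a = 1
    GROUP BY a2b.b_id
    ORDER BY counted DESC, a2b.b_id ASC
    LIMIT 25;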

Combined with the various improvements on 9.6, there is a huge difference in 
query speed:

9.6 runs the query with the smaller index in an average of 1200ms
9.5 runs the query with the larger index in an average of 2700ms


> This one requires knowledge of the query; but I am not surprised that 
> reversing the order of columns in a b-tree index has an impact.

I expected this to impact the decision on which index to use when multiple ones 
are available, or to offer poor performance -- but not to discount using the 
index entirely.


> All at once?

No.  I dropped all indexes to test, then for each column combination did:

CREATE INDEX foo_idx ON foo (...);
ANALYZE foo;
EXPLAIN ANALYZE <the query>;
DROP INDEX foo_idx;

I ran EXPLAIN ANALYZE manually once for the plan, then via script 25x to
average out execution times and account for cold-start vs having loaded all the
indexes.  I shut down all other user processes on the machine as well.
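For what it's worth, a minimal version of that timing loop (mydb and
explain_query.sql are placeholders; explain_query.sql holds the EXPLAIN ANALYZE
statement):

    # Repeat EXPLAIN ANALYZE 25 times and collect the reported execution times.
    for i in $(seq 1 25); do
        psql -d mydb -f explain_query.sql
    done | grep 'Execution time' >> timings.txt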







Re: [GENERAL] Help on recovering my standby

2016-06-22 Thread Patrick B
I had the same issue... A slave server had missing wal_files... and it
wasn't synced.

I had to re-sync the whole DB by running the pg_basebackup command.

So, basically, what I did was:

1 - Ensure that the wal_files are being shipped to the slave
2 - Back up the recovery.conf, postgresql.conf and pg_hba.conf
3 - Delete all of the current data folder, by doing: rm -rf
/var/lib/pgsql/9.2/data/*
4 - Run the pg_basebackup command to re-sync the DB from another slave
to the slave that I want to fix (see the sketch below)
5 - Put the backed-up .conf files into the new data folder
6 - Start postgres

And it worked nicely.
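A rough sketch of step 4, with the host name, replication user and data
directory as placeholders (exact flags depend on the setup and version):

    # Rebuild the broken standby's data directory from another standby.
    pg_basebackup -h other-slave.example.com -U replicator \
        -D /var/lib/pgsql/9.2/data \
        --xlog-method=stream -P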

Patrick


Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Bill Moran
On Wed, 22 Jun 2016 10:20:38 +
Sameer Kumar  wrote:

> On Wed, Jun 22, 2016 at 6:08 PM Vlad Arkhipov  wrote:
> 
> > I am running PostgreSQL 9.5.
> >
> > CREATE TABLE t (id BIGINT NOT NULL PRIMARY KEY, name TEXT);
> >
> > The constraint that the data must satisfy is `there is no more than 3
> > records with the same name`.
> >
> > I am not in control of queries that modify the table, so advisory locks
> > can hardly be of help to me.
> >
> 
> Define a function which does a count of the rows and if count is 3 it
> return false if count is less it returns true.

An exclusion constraint might be a better solution.

-- 
Bill Moran




Re: [GENERAL] Postgres 9.5.2 upgrade to 9.6

2016-06-22 Thread David G. Johnston
On Wed, Jun 22, 2016 at 2:36 PM, Michelle Schwan 
wrote:

> I have a question about the upgrade from 9.5.2 to 9.6
>
>
>
> I know 9.6 is still in beta – but I’m trying to be ahead of the game!
>
>
>
> I found instructions on how to move from 9.4 to 9.5.  I will try testing
> these steps with 9.5 to 9.6
>
>
>
> However, I was wondering if there is a “nicer” way to upgrade – similar to
> the GUI installation.  I want to be able to have a “nice” recommendation
> for our clients when we request that they upgrade!  I would like this for
> the Major1 and Major2 updates.
>
>
>
> Is this possible or even available now?
>
>
>

That would depend largely on your O/S and, relatedly, your installation method.

You mention you found instructions.  It would help if you disclosed them so
we have some baseline understanding of your existing knowledge.

David J.


Re: [GENERAL] optimizing a query

2016-06-22 Thread David G. Johnston
On Wed, Jun 22, 2016 at 2:23 PM, Jonathan Vanasco  wrote:

>
> On Jun 22, 2016, at 4:25 AM, Erik Gustafson wrote:
>
> > don't you want an index on t_a2b.col_a, maybe partial where col_a=1 ?
>
> that table has indexes on all columns.  they're never referenced because
> the rows are so short.  this was just an example query too, col_a has 200k
> variations
>
> After a lot of testing, I think I found a not-bug but possible
> area-for-improvement in the planner when joining against a table for
> filtering (using my production 9.5.2 box)
>
> I checked a query


What query?  A self-contained email would be nice.

against multiple possible indexes using the related columns.  only one of
> indexes was on the table for each series of tests, and I analyzed the table
> after the drop/create of indexes.
>
>
> Note 1: The only time an index-only scan is used, is on this form:
>
> CREATE INDEX idx_partial_fkey_id_partial ON
> table_a(fkey_1, id, col_partial) WHERE col_partial IS NOT FALSE;
>
> Omitting the col_partial from being indexed will trigger a Bitmap
> Heap Scan on the full table with a recheck condition:
>
> CREATE INDEX idx_partial_fkey_id ON table_a(fkey_1, id)
> WHERE col_partial IS NOT FALSE;
>
> This shouldn't be necessary.  the planner knew that `col_partial`
> fulfilled the WHERE clause when it used the index, but scanned the table to
> check it anyways.
>
> On most tables the heap scan was negligible, but on a few larger
> tables it accounted a 20% increase in execution.
>
>
https://www.postgresql.org/docs/9.6/static/indexes-index-only-scans.html

Note especially:

​"Visibility information is not stored in index entries, only in heap
entries; ..."

The check against the heap isn't for the truthiness of the predicate but
the visibility of the row.


> Note 2:
>
> This is odd, but this index is used by the planner:
> CREATE INDEX idx_partial_fkey_id ON table_a(fkey_1, id)
> WHERE col_partial IS NOT FALSE;
>
> but this index is never used:
> CREATE INDEX idx_partial_id_fkey ON table_a(id, fkey_1)
> WHERE col_partial IS NOT FALSE;
>
> I honestly don't know why the second index would not be used.  The
> query time doubled without it when run on a table with 6million rows and
> about 20 columns.
>
>
This one requires knowledge of the query; but I am not surprised that
reversing the order of columns in a b-tree index has an impact.

---
>
> The indexes I tested on:
>
> CREATE INDEX idx_fkey_1 ON table_a(fkey_1);
> CREATE INDEX idx_partial_fkey ON table_a(fkey_1) WHERE col_partial
> IS NOT FALSE;
> CREATE INDEX idx_partial_fkey_id ON table_a(fkey_1, id) WHERE
> col_partial IS NOT FALSE;
> CREATE INDEX idx_partial_id_fkey ON table_a(id, fkey_1) WHERE
> col_partial IS NOT FALSE;
> CREATE INDEX idx_partial_fkey_partial ON table_a(fkey_1,
> col_partial) WHERE col_partial IS NOT FALSE;
> CREATE INDEX idx_partial_fkey_id_partial ON table_a(fkey_1, id,
> col_partial) WHERE col_partial IS NOT FALSE;


All at once?

David J.


[GENERAL] Postgres 9.5.2 upgrade to 9.6

2016-06-22 Thread Michelle Schwan
I have a question about the upgrade from 9.5.2 to 9.6

I know 9.6 is still in beta - but I'm trying to be ahead of the game!

I found instructions on how to move from 9.4 to 9.5.  I will try testing these 
steps with 9.5 to 9.6

However, I was wondering if there is a "nicer" way to upgrade - similar to the 
GUI installation.  I want to be able to have a "nice" recommendation for our 
clients when we request that they upgrade!  I would like this for the Major1 
and Major2 updates.

Is this possible or even available now?

Thanks,
Michelle Schwan
Database Developer  |  Architecture
msch...@opentext.com
Phone: (519) 888 7111 ext 3241
Website: www.opentext.com



This email message is confidential, may be privileged, and is intended for the 
exclusive use of the addressee. Any other person is strictly prohibited from 
disclosing or reproducing it. If the addressee cannot be reached or is unknown 
to you, please inform the sender by return email and delete this email message 
and all copies immediately.



Re: [GENERAL] optimizing a query

2016-06-22 Thread Jonathan Vanasco

On Jun 22, 2016, at 4:25 AM, Erik Gustafson wrote:

> don't you want an index on t_a2b.col_a, maybe partial where col_a=1 ?  

That table has indexes on all columns.  They're never referenced because the
rows are so short.  This was just an example query too; col_a has 200k
variations.

After a lot of testing, I think I found a not-bug but possible
area-for-improvement in the planner when joining against a table for filtering
(using my production 9.5.2 box).

I checked a query against multiple possible indexes using the related columns.
Only one of the indexes was on the table for each series of tests, and I
analyzed the table after the drop/create of indexes.


Note 1: The only time an index-only scan is used is with this form:

CREATE INDEX idx_partial_fkey_id_partial ON table_a(fkey_1, id, 
col_partial) WHERE col_partial IS NOT FALSE;

Omitting col_partial from the indexed columns will trigger a Bitmap Heap
Scan on the full table with a recheck condition:

CREATE INDEX idx_partial_fkey_id ON table_a(fkey_1, id) WHERE 
col_partial IS NOT FALSE;

This shouldn't be necessary.  The planner knew that `col_partial` 
fulfilled the WHERE clause when it used the index, but scanned the table to 
check it anyway.

On most tables the heap scan was negligible, but on a few larger tables 
it accounted for a 20% increase in execution time.

Note 2:

This is odd, but this index is used by the planner:
CREATE INDEX idx_partial_fkey_id ON table_a(fkey_1, id) WHERE 
col_partial IS NOT FALSE;

but this index is never used:
CREATE INDEX idx_partial_id_fkey ON table_a(id, fkey_1) WHERE 
col_partial IS NOT FALSE;

I honestly don't know why the second index would not be used.  The 
query time doubled without it when run on a table with 6 million rows and about 
20 columns.

---

The indexes I tested on:

CREATE INDEX idx_fkey_1 ON table_a(fkey_1);
CREATE INDEX idx_partial_fkey ON table_a(fkey_1) WHERE col_partial IS 
NOT FALSE;
CREATE INDEX idx_partial_fkey_id ON table_a(fkey_1, id) WHERE 
col_partial IS NOT FALSE;
CREATE INDEX idx_partial_id_fkey ON table_a(id, fkey_1) WHERE 
col_partial IS NOT FALSE;
CREATE INDEX idx_partial_fkey_partial ON table_a(fkey_1, col_partial) 
WHERE col_partial IS NOT FALSE;
CREATE INDEX idx_partial_fkey_id_partial ON table_a(fkey_1, id, 
col_partial) WHERE col_partial IS NOT FALSE;





Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread John R Pierce

On 6/22/2016 3:07 AM, Vlad Arkhipov wrote:


CREATE TABLE t (id BIGINT NOT NULL PRIMARY KEY, name TEXT);

The constraint that the data must satisfy is `there is no more than 3 
records with the same name`.


I am not in control of queries that modify the table, so advisory 
locks can hardly be of help to me.


select name from t group by name having count(id)>3

will return all names with more than 3 records in a single query...  Now
the question is, what do you want to do with this information?




--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] Help on recovering my standby

2016-06-22 Thread Melvin Davidson
On Wed, Jun 22, 2016 at 12:22 PM, Alan Hodgson 
wrote:

> On Tuesday 21 June 2016 19:34:18 Ramalingam, Sankarakumar wrote:
> > Hi I have my standby (streaming replication) down due to missing wal
> files.
> > You would see the same error in the logs stating "cannot find the wal
> file
> > ..." What is the best way to get it going so that when we switch between
> > standby and primary once in a while they are in sync?
> >
> > Currently I am working on a CERT server and hence there is no outage
> > concerns. I need to repeat the same process on prod once I get it going
> > successfully. Any help is appreciated.
> >
>
> You should keep your WAL files from the master for at least as long as the
> slave might be offline (plus startup time), somewhere the slave can copy
> them
> from when needed (shared file system, object store, scp target, whatever).
>
> See the postgresql.conf parameter archive_command and the corresponding
> recovery.conf parameter restore_command.
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

It would be really helpful if you included PostgreSQL version and O/S in
your problem description, but since you have not, I will give a "generic"
fix.

It is doubtful, but you can check the pg_xlog on the master for the
"missing" WAL files and, if they are there, simply rsync them to the standby.
If you are truly missing WAL files on your slave/standby, then you need to
rebuild the slave as per standard procedures.
Make sure you change the wal_keep_segments value on the master to be
sufficiently high so that the problem does not occur again.
Once you make the change, be sure to reload the config file on the master,
either with
SELECT pg_reload_conf();
or
pg_ctl reload -D your_data_dir
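A sketch of the rsync and reload steps, with the host and paths as placeholders:

    # Copy any WAL segments still present on the master over to the standby.
    rsync -av /var/lib/pgsql/9.x/data/pg_xlog/ \
        standby.example.com:/var/lib/pgsql/9.x/data/pg_xlog/

    # After raising wal_keep_segments in postgresql.conf on the master:
    psql -U postgres -c "SELECT pg_reload_conf();"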



-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Help on recovering my standby

2016-06-22 Thread Alan Hodgson
On Tuesday 21 June 2016 19:34:18 Ramalingam, Sankarakumar wrote:
> Hi I have my standby (streaming replication) down due to missing wal files.
> You would see the same error in the logs stating "cannot find the wal file
> ..." What is the best way to get it going so that when we switch between
> standby and primary once in a while they are in sync?
> 
> Currently I am working on a CERT server and hence there is no outage
> concerns. I need to repeat the same process on prod once I get it going
> successfully. Any help is appreciated.
> 

You should keep your WAL files from the master for at least as long as the 
slave might be offline (plus startup time), somewhere the slave can copy them 
from when needed (shared file system, object store, scp target, whatever).

See the postgresql.conf parameter archive_command and the corresponding  
recovery.conf parameter restore_command.
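For example (the archive directory is a placeholder; any shared location the
standby can read will do):

    # postgresql.conf on the master
    archive_mode = on
    archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'

    # recovery.conf on the standby
    standby_mode = 'on'
    restore_command = 'cp /mnt/wal_archive/%f %p'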





Re: ***SPAM*** Re: [GENERAL] random huge delay when recreate a VIEW or FUNCTION

2016-06-22 Thread Catalin Maftei

  
  
well, thx for your reply,

we have PG on a linux ubuntu 14.04 distribution, but pls remember this issue
started when our DB size increased to 3-5GB, now it is 16GB.
this happened only when we use "CREATE OR REPLACE".

since I can run a query and get a reply all the time in less than 1 sec, why do
you consider "CREATE A VIEW" on the remote server is a bad idea?
I have a 1Gb connection between my laptop (windows 7/10) and the remote server.

Best regards,
Catalin Maftei
www.plationline.eu
www.livrarionline.ro
www.c-solution.biz

Skype: c-solution
Skype: catalinmaftei
Mobile: +40723 338 598

On 6/17/2016 2:55 AM, Melvin Davidson wrote:

> On Wed, Jun 15, 2016 at 12:49 AM, Catalin Maftei wrote:
>
>> hei Adrian,
>>
>> thx for your reply,
>>
>> this is an example:
>>
>> CREATE OR REPLACE VIEW feedback.get_answers_set AS
>>  SELECT f.awb || '/' || r.dulapid as "awb",
>>     q.qid,
>>     q.question,
>>     a.aid,
>>     a.answer,
>>     q.questionareid,
>>     f.stamp_created,
>>     c.first_name,
>>     l.referinta_expeditor
>>  FROM feedback.answers a
>>      JOIN feedback.questions q ON a.qid = q.qid
>>      JOIN feedback.feedback f ON f.qid = q.qid AND f.aid = a.aid
>>      JOIN public.livrari_details ld ON ld.awb = f.awb
>>      JOIN public.livrari l ON l.livrareid = ld.livrareid
>>      JOIN public.customers c ON l.shipto_custkey = c.custkey
>>      JOIN dulap.rezervare r ON r.rezid = l.rezervareid;
>>
>> ALTER TABLE feedback.get_answers_set
>>   OWNER TO postgres;
>>
>> I use PGADMIN 1.22.1
>>
>> my server is remote and is replicated Master-Slave.
>>
>> my team report this random delay all the time when we recreate VIEWS and
>> FUNCTIONS.
>>
>> On 6/15/2016 7:36 AM, Adrian Klaver wrote:
>>> On 06/14/2016 02:59 PM, Catalin Maftei wrote:
>>>> when I recreate a VIEW or FUNCTION with a small change I get:
>>>
>>> What are the actual statements you are running?
>>>
>>> What program are you running them from?
>>>
>>> Is the server you are running the command against local or remote?
>>>
>>>> Query returned successfully with no result in 03:58 minutes.
>>>>
>>>> we have pg 9.4
>
> You have not provided information as to the O/S of the PostgreSQL server or
> your PgAdmin remote system (I suspect Windows), but regardless, you are just
> asking for trouble by doing development over a remote connection.
> You should verify there is no delay by connecting directly to the PostgreSQL
> server and testing the CREATE OR REPLACE there. I suspect you will have none
> or very minimal delay.
> Hopefully PostgreSQL is on a Linux O/S and you can use Putty for Windows
> http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
> to connect directly and develop.
>
> --
> Melvin Davidson
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.

[GENERAL] Help on recovering my standby

2016-06-22 Thread Ramalingam, Sankarakumar
Hi, I have my standby (streaming replication) down due to missing wal files. You
would see the same error in the logs stating "cannot find the wal file ..."
What is the best way to get it going so that when we switch between standby and
primary once in a while they are in sync?

Currently I am working on a CERT server and hence there are no outage concerns.
I need to repeat the same process on prod once I get it going successfully. Any
help is appreciated.


Thanks
Kumar Ramalingam
Global Database Administration
Elavon, Atlanta , GA

678 731 5288

The information contained in this e-mail and in any attachments is intended 
only for the person or entity to which it is addressed and may contain 
confidential and/or privileged material. Any review, retransmission, 
dissemination or other use of, or taking of any action in reliance upon, this 
information by persons or entities other than the intended recipient is 
prohibited. This message has been scanned for known computer viruses.


Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

2016-06-22 Thread Erdmann, Markus @ Bellevue

On Jun 20, 2016, at 1:36 PM, David G. Johnston 
> wrote:

Please don't top-post.

Yes, using "[VACUUM] ANALYZE" is what was meant (the VACUUM doesn't hurt - and 
possibly helps - but wasn't required).  ANALYZE recomputes the statistics for 
your database.  The apparent problem was that those statistics were wrong, which 
caused the planner to choose the wrong plan.

EXPLAIN ANALYZE doesn't help in that scenario - the explain and planner both 
end up using the bad statistics.  The addition of ANALYZE to EXPLAIN simply 
tells the system to not only explain your query but to execute it as well (but 
discard the results).  Aside from sharing the same 7 characters the two words 
have nothing in common.

I'm not sure how one would "...includ[e] ANALYZE in EXPLAIN ANALYZE" - you only 
get to explain one statement at a time.
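To make the distinction concrete (the table name is just an illustration):

    -- Recompute planner statistics for one table; this affects future planning.
    ANALYZE my_table;

    -- Plan *and* execute the statement, discarding its result rows, and report
    -- the actual run time alongside the plan.
    EXPLAIN ANALYZE SELECT count(*) FROM my_table;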

David J.


Thank you, David. My confusion originated from a lack of familiarity with the 
ANALYZE command. Your responses and Tom’s response have been very enlightening.

The head-scratcher for us is that our statistics became so out of date even 
though we have the autovacuum daemon enabled in RDS, and according to the docs 
that does run ANALYZE periodically. Now we know (thanks to your help) to check 
for this issue immediately when the planner is showing a large disparity 
between the estimated and actual cost.

Markus E.


Re: [GENERAL] pg_restore error-s after pg_dump

2016-06-22 Thread Adrian Klaver

On 06/22/2016 04:00 AM, SDAG wrote:

Hi
Postgres version :
*PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-3), 64-bit*

We transfer from one server to another (with better cpu an ram) our system
using vm converter and when I try to backup database have an error :


The above is going to need more explanation.

Did you convert the VM in place or move from one VM to another?

Is the architecture of the new VM the same as the old?

Did the conversion take place on a live VM or was it stopped?


Just to be clear the dump below was taken after the conversion, correct?



*pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading extensions
pg_dump: reading user-defined functions
pg_dump: reading user-defined types
pg_dump: reading procedural languages
pg_dump: reading user-defined aggregate functions
pg_dump: reading user-defined operators
pg_dump: reading user-defined operator classes
pg_dump: reading user-defined operator families
pg_dump: reading user-defined text search parsers
pg_dump: reading user-defined text search templates
pg_dump: reading user-defined text search dictionaries
pg_dump: reading user-defined text search configurations
pg_dump: reading user-defined foreign-data wrappers
pg_dump: reading user-defined foreign servers
pg_dump: reading default privileges
pg_dump: reading user-defined collations
pg_dump: reading user-defined conversions
pg_dump: reading type casts
pg_dump: reading table inheritance information
pg_dump: reading rewrite rules
pg_dump: finding extension members
pg_dump: finding inheritance relationships
pg_dump: reading column info for interesting tables
pg_dump: finding the columns and types of table "account_account"
pg_dump: [archiver (db)] query failed: ERROR:  missing chunk number 0 for
toast value 3297740 in pg_toast_2619
pg_dump: [archiver (db)] query was: SELECT a.attnum, a.attname, a.atttypmod,
a.attstattarget, a.attstorage, t.typstorage, a.attnotnull, a.atthasdef,
a.attisdropped, a.attlen, a.attalign, a.attislocal,
pg_catalog.format_type(t.oid,a.atttypmod) AS atttypname,
array_to_string(a.attoptions, ', ') AS attoptions, CASE WHEN a.attcollation
<> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation,
pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name)
|| ' ' || pg_catalog.quote_literal(option_value) FROM
pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E',
') AS attfdwoptions FROM pg_catalog.pg_attribute a LEFT JOIN
pg_catalog.pg_type t ON a.atttypid = t.oid WHERE a.attrelid =
'274619'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2 ORDER BY
a.attrelid, a.attnum*

I solve it using *reindexdb *,after I try* pg_dump -U postgres my_db >
/home/mydb.backup* and it was successfull. Then I try to restore database to
ensure that backup is valid
*psql -U postgres new_db < /home/mydb.backup*

And have an errors :


Does table tbl1 have an id column?



*ERROR : extra data after last expected column
Context: COPY tbl1, line1: "1 2013-12-02 2013-12-02  9387.74
9775.4621148549086494"   6180.95   80262 "
ERROR : column "id" of relation "tbl1" does not exists
invalid command \N
invalid command \N
invalid command \N
.
invalid command \N
invalid command \.
ERROR:  syntax error at or near "87685"
LINE 1: 87685 SO87690 1 170468 2015-05-30 2015 05 30
^
invalid command \.



The below looks like you are trying to restore over existing data.



ERROR:  duplicate key value violates unique constraint "ir_act_client_pkey"
DETAIL:  Key (id)=(103) already exists.
CONTEXT:  COPY ir_act_client, line 21: ""
ERROR:  duplicate key value violates unique constraint
"ir_act_report_xml_pkey"
DETAIL:  Key (id)=(733) already exists.
CONTEXT:  COPY ir_act_report_xml, line 59: ""
ERROR:  duplicate key value violates unique constraint "ir_act_server_pkey"
DETAIL:  Key (id)=(703) already exists.
CONTEXT:  COPY ir_act_server, line 6: ""
ERROR:  duplicate key value violates unique constraint "ir_act_window_pkey"
DETAIL:  Key (id)=(1) already exists.
CONTEXT:  COPY ir_act_window, line 235: "334Last Product Inventories
ir.actions.act_window   \N  1   2013-07-03 10:39:48.399509
2013-12-16 16:38:..."
ERROR:  duplicate key value violates unique constraint
"ir_act_window_group_rel_act_id_gid_key"
DETAIL:  Key (act_id, gid)=(76, 1) already exists.
CONTEXT:  COPY ir_act_window_group_rel, line 14: ""
ERROR:  duplicate key value violates unique constraint
"ir_act_window_view_pkey"
DETAIL:  Key (id)=(100) already exists.
CONTEXT:  COPY ir_act_window_view, line 88: ""*


Any advice to solve this problem ?



--
View this message in context: 
http://postgresql.nabble.com/pg-restore-error-s-after-pg-dump-tp5909084.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.





--
Adrian Klaver
adrian.kla...@aklaver.com



Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Jehan-Guillaume de Rorthais
Le Wed, 22 Jun 2016 18:07:46 +0800,
Vlad Arkhipov  a écrit :

> I am running PostgreSQL 9.5.
> 
> CREATE TABLE t (id BIGINT NOT NULL PRIMARY KEY, name TEXT);

I guess this is not your definitive definition of the table and you might
have some other fields, is that right?

I can see multiple ways to change this schema, which seems broken, but we
probably lack information to pick the right one...

> The constraint that the data must satisfy is `there is no more than 3 
> records with the same name`.
> 
> I am not in control of queries that modify the table, so advisory locks 
> can hardly be of help to me.
> 
> On 06/22/2016 05:20 PM, Sameer Kumar wrote:
> >
> >
> > On Wed, Jun 22, 2016 at 5:10 PM Vlad Arkhipov  > > wrote:
> >
> > Hello,
> >
> > I have a constraint that requires a table to be locked before checking
> > it (i.e. no more than 2 records with the same value in the same
> > column).
> > If I lock the table in the SHARE ROW EXCLUSIVE mode, any vacuuming (or
> > autovacuuming) process prevents me from checking the constraint. What
> > are possible solutions?
> >
> >
> > May be you would like to share-
> > - Table Structure
> > - PostgreSQL version
> >
> > This will help people who would try to help you.
> >
> > I think you might want to consider an optimistic way of locking your 
> > records, instead of locking them. Or look at advisory locks (but that 
> > depends on your Postgres version).




Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Jehan-Guillaume de Rorthais
Le Wed, 22 Jun 2016 10:49:13 +,
Albe Laurenz  a écrit :

> Sameer Kumar wrote:
> > On Wed, Jun 22, 2016 at 6:08 PM Vlad Arkhipov  wrote:
> >> I am running PostgreSQL 9.5.
> >> 
> >> CREATE TABLE t (id BIGINT NOT NULL PRIMARY KEY, name TEXT);
> >> 
> >> The constraint that the data must satisfy is `there is no more than 3
> >> records with the same name`.
> >> 
> >> I am not in control of queries that modify the table, so advisory locks
> >> can hardly be of help to me.
> > 
> > 
> > Define a function which does a count of the rows and if count is 3 it
> > return false if count is less it returns true.
> > 
> > Use check constraint with this function. I have not tried this so not sure
> > if you can use function with SELECT on same table in CHECK constraint. So
> > test it out first.
> > 
> > If this works, any insert trying to get the 4th record in table would fail.
> 
> You cannot use subqueries in a check constraint:
> 
> ALTER TABLE t
>ADD CONSTRAINT name_count
>   CHECK ((SELECT count(*) FROM t t1 WHERE t1.name = name) <= 3);
> ERROR:  cannot use subquery in check constraint
> 
> > A last resort could be using triggers. But either of these approaches will
> > cause issues if you have high concurrency.
> 
> Yes, triggers is the way to go:
> 
> CREATE OR REPLACE FUNCTION check_t() RETURNS trigger LANGUAGE plpgsql AS
> $$BEGIN
>IF (SELECT count(*) FROM t WHERE name = NEW.name) > 3 THEN
>   RAISE EXCEPTION 'More than three values!';
>END IF;
>RETURN NEW;
> END;$$;
> 
> CREATE TRIGGER check_t AFTER INSERT OR UPDATE ON t FOR EACH ROW
>EXECUTE PROCEDURE check_t();
> 
> But be warned that this will only work if all transactions involved use
> the isolation level SERIALIZABLE.
> 
> Otherwise two concurrent INSERTs would not see each other's entry, and the
> triggers would not raise an error even if there are more than three entries
> after COMMIT.

Use advisory locks to be able to use this in any isolation level:

  CREATE OR REPLACE FUNCTION check_t() RETURNS trigger LANGUAGE plpgsql AS
  $$BEGIN
 PERFORM pg_advisory_xact_lock(hashtext(NEW.name));

 IF (SELECT count(*) FROM t WHERE name = NEW.name) > 3 THEN
RAISE EXCEPTION 'More than three values!';
 END IF;
 RETURN NEW;
  END;$$;
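Attaching it is the same as in the quoted message:

   CREATE TRIGGER check_t AFTER INSERT OR UPDATE ON t FOR EACH ROW
      EXECUTE PROCEDURE check_t();

The pg_advisory_xact_lock() call serializes concurrent inserts of the same
name until commit, so the count the trigger sees is current even under READ
COMMITTED.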




[GENERAL] pg_restore error-s after pg_dump

2016-06-22 Thread SDAG
Hi
Postgres version : 
*PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-3), 64-bit*

We transferred our system from one server to another (with better CPU and RAM)
using a VM converter, and when I try to back up the database I get an error:

*pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading extensions
pg_dump: reading user-defined functions
pg_dump: reading user-defined types
pg_dump: reading procedural languages
pg_dump: reading user-defined aggregate functions
pg_dump: reading user-defined operators
pg_dump: reading user-defined operator classes
pg_dump: reading user-defined operator families
pg_dump: reading user-defined text search parsers
pg_dump: reading user-defined text search templates
pg_dump: reading user-defined text search dictionaries
pg_dump: reading user-defined text search configurations
pg_dump: reading user-defined foreign-data wrappers
pg_dump: reading user-defined foreign servers
pg_dump: reading default privileges
pg_dump: reading user-defined collations
pg_dump: reading user-defined conversions
pg_dump: reading type casts
pg_dump: reading table inheritance information
pg_dump: reading rewrite rules
pg_dump: finding extension members
pg_dump: finding inheritance relationships
pg_dump: reading column info for interesting tables
pg_dump: finding the columns and types of table "account_account"
pg_dump: [archiver (db)] query failed: ERROR:  missing chunk number 0 for
toast value 3297740 in pg_toast_2619
pg_dump: [archiver (db)] query was: SELECT a.attnum, a.attname, a.atttypmod,
a.attstattarget, a.attstorage, t.typstorage, a.attnotnull, a.atthasdef,
a.attisdropped, a.attlen, a.attalign, a.attislocal,
pg_catalog.format_type(t.oid,a.atttypmod) AS atttypname,
array_to_string(a.attoptions, ', ') AS attoptions, CASE WHEN a.attcollation
<> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation,
pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name)
|| ' ' || pg_catalog.quote_literal(option_value) FROM
pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E',
') AS attfdwoptions FROM pg_catalog.pg_attribute a LEFT JOIN
pg_catalog.pg_type t ON a.atttypid = t.oid WHERE a.attrelid =
'274619'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2 ORDER BY
a.attrelid, a.attnum*

I solved it using *reindexdb*; after that I tried *pg_dump -U postgres my_db >
/home/mydb.backup* and it was successful. Then I tried to restore the database
to ensure that the backup is valid:
*psql -U postgres new_db < /home/mydb.backup*

And have an errors :

*ERROR : extra data after last expected column
Context: COPY tbl1, line1: "1 2013-12-02 2013-12-02  9387.74 
9775.4621148549086494"   6180.95   80262 "
ERROR : column "id" of relation "tbl1" does not exists
invalid command \N
invalid command \N
invalid command \N
. 
invalid command \N
invalid command \.
ERROR:  syntax error at or near "87685"
LINE 1: 87685 SO87690 1 170468 2015-05-30 2015 05 30 
^
invalid command \.

ERROR:  duplicate key value violates unique constraint "ir_act_client_pkey"
DETAIL:  Key (id)=(103) already exists.
CONTEXT:  COPY ir_act_client, line 21: ""
ERROR:  duplicate key value violates unique constraint
"ir_act_report_xml_pkey"
DETAIL:  Key (id)=(733) already exists.
CONTEXT:  COPY ir_act_report_xml, line 59: ""
ERROR:  duplicate key value violates unique constraint "ir_act_server_pkey"
DETAIL:  Key (id)=(703) already exists.
CONTEXT:  COPY ir_act_server, line 6: ""
ERROR:  duplicate key value violates unique constraint "ir_act_window_pkey"
DETAIL:  Key (id)=(1) already exists.
CONTEXT:  COPY ir_act_window, line 235: "334Last Product Inventories   
ir.actions.act_window   \N  1   2013-07-03 10:39:48.399509 
2013-12-16 16:38:..."
ERROR:  duplicate key value violates unique constraint
"ir_act_window_group_rel_act_id_gid_key"
DETAIL:  Key (act_id, gid)=(76, 1) already exists.
CONTEXT:  COPY ir_act_window_group_rel, line 14: ""
ERROR:  duplicate key value violates unique constraint
"ir_act_window_view_pkey"
DETAIL:  Key (id)=(100) already exists.
CONTEXT:  COPY ir_act_window_view, line 88: ""*


Any advice to solve this problem ?



--
View this message in context: 
http://postgresql.nabble.com/pg-restore-error-s-after-pg-dump-tp5909084.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.




Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Vlad Arkhipov
That is why I need to lock the table beforehand. The transactions are 
running at the READ COMMITTED isolation level.


On 06/22/2016 06:49 PM, Albe Laurenz wrote:

But be warned that this will only work if all transactions involved use
the isolation level SERIALIZABLE.






Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Albe Laurenz
Sameer Kumar wrote:
> On Wed, Jun 22, 2016 at 6:08 PM Vlad Arkhipov  wrote:
>> I am running PostgreSQL 9.5.
>> 
>> CREATE TABLE t (id BIGINT NOT NULL PRIMARY KEY, name TEXT);
>> 
>> The constraint that the data must satisfy is `there is no more than 3 
>> records with the same name`.
>> 
>> I am not in control of queries that modify the table, so advisory locks can 
>> hardly be of help to me.
> 
> 
> Define a function which does a count of the rows and if count is 3 it return 
> false if count is less it
> returns true.
> 
> Use check constraint with this function. I have not tried this so not sure if 
> you can use function
> with SELECT on same table in CHECK constraint. So test it out first.
> 
> If this works, any insert trying to get the 4th record in table would fail.

You cannot use subqueries in a check constraint:

ALTER TABLE t
   ADD CONSTRAINT name_count
  CHECK ((SELECT count(*) FROM t t1 WHERE t1.name = name) <= 3);
ERROR:  cannot use subquery in check constraint

> A last resort could be using triggers. But either of these approaches will 
> cause issues if you have
> high concurrency.

Yes, triggers is the way to go:

CREATE OR REPLACE FUNCTION check_t() RETURNS trigger LANGUAGE plpgsql AS
$$BEGIN
   IF (SELECT count(*) FROM t WHERE name = NEW.name) > 3 THEN
  RAISE EXCEPTION 'More than three values!';
   END IF;
   RETURN NEW;
END;$$;

CREATE TRIGGER check_t AFTER INSERT OR UPDATE ON t FOR EACH ROW
   EXECUTE PROCEDURE check_t();

But be warned that this will only work if all transactions involved use
the isolation level SERIALIZABLE.

Otherwise two concurrent INSERTs would not see each other's entry, and the
triggers would not raise an error even if there are more than three entries
after COMMIT.

Yours,
Laurenz Albe



Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Sameer Kumar
On Wed, Jun 22, 2016 at 6:08 PM Vlad Arkhipov  wrote:

> I am running PostgreSQL 9.5.
>
> CREATE TABLE t (id BIGINT NOT NULL PRIMARY KEY, name TEXT);
>
> The constraint that the data must satisfy is `there is no more than 3
> records with the same name`.
>
> I am not in control of queries that modify the table, so advisory locks
> can hardly be of help to me.
>

Define a function which does a count of the rows: if the count is 3 it
returns false, and if the count is less it returns true.

Use a check constraint with this function. I have not tried this, so I am not
sure whether you can use a function with a SELECT on the same table in a CHECK
constraint. So test it out first.

If this works, any insert trying to get the 4th record in table would fail.

A last resort could be using triggers. But either of these approaches will
cause issues if you have high concurrency.



>
>
> On 06/22/2016 05:20 PM, Sameer Kumar wrote:
>
>
>
> On Wed, Jun 22, 2016 at 5:10 PM Vlad Arkhipov 
> wrote:
>
>> Hello,
>>
>> I have a constraint that requires a table to be locked before checking
>> it (i.e. no more than 2 records with the same value in the same column).
>> If I lock the table in the SHARE ROW EXCLUSIVE mode, any vacuuming (or
>> autovacuuming) process prevents me from checking the constraint. What
>> are possible solutions?
>>
>
> May be you would like to share-
> - Table Structure
> - PostgreSQL version
>
> This will help people who would try to help you.
>
> I think you might want to consider an optimistic way of locking your
> records, instead of locking them. Or look at advisory locks (but that
> depends on your Postgres version).
>
>
>>
>>
>> --
>> Sent via pgsql-general mailing list ( 
>> pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
> --
> --
> Best Regards
> Sameer Kumar | DB Solution Architect
> *ASHNIK PTE. LTD.*
>
> 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
>
> T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com
>
>
> --
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com


Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Vlad Arkhipov

I am running PostgreSQL 9.5.

CREATE TABLE t (id BIGINT NOT NULL PRIMARY KEY, name TEXT);

The constraint that the data must satisfy is `there is no more than 3 
records with the same name`.


I am not in control of queries that modify the table, so advisory locks 
can hardly be of help to me.


On 06/22/2016 05:20 PM, Sameer Kumar wrote:



On Wed, Jun 22, 2016 at 5:10 PM Vlad Arkhipov wrote:


Hello,

I have a constraint that requires a table to be locked before checking
it (i.e. no more than 2 records with the same value in the same
column).
If I lock the table in the SHARE ROW EXCLUSIVE mode, any vacuuming (or
autovacuuming) process prevents me from checking the constraint. What
are possible solutions?


May be you would like to share-
- Table Structure
- PostgreSQL version

This will help people who would try to help you.

I think you might want to consider an optimistic way of locking your 
records, instead of locking them. Or look at advisory locks (but that 
depends on your Postgres version).





--
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com





Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Albe Laurenz
Vlad Arkhipov wrote:
> I have a constraint that requires a table to be locked before checking
> it (i.e. no more than 2 records with the same value in the same column).
> If I lock the table in the SHARE ROW EXCLUSIVE mode, any vacuuming (or
> autovacuuming) process prevents me from checking the constraint. What
> are possible solutions?

Can you describe your check in more detail?
Why don't you use simple unique constraints?

Yours,
Laurenz Albe



Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Sameer Kumar
On Wed, Jun 22, 2016 at 5:10 PM Vlad Arkhipov  wrote:

> Hello,
>
> I have a constraint that requires a table to be locked before checking
> it (i.e. no more than 2 records with the same value in the same column).
> If I lock the table in the SHARE ROW EXCLUSIVE mode, any vacuuming (or
> autovacuuming) process prevents me from checking the constraint. What
> are possible solutions?
>

May be you would like to share-
- Table Structure
- PostgreSQL version

This will help people who would try to help you.

I think you might want to consider an optimistic way of locking your
records, instead of locking them. Or look at advisory locks (but that
depends on your Postgres version).


>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
-- 
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com


[GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Vlad Arkhipov

Hello,

I have a constraint that requires a table to be locked before checking 
it (i.e. no more than 2 records with the same value in the same column). 
If I lock the table in the SHARE ROW EXCLUSIVE mode, any vacuuming (or 
autovacuuming) process prevents me from checking the constraint. What 
are possible solutions?





Re: [GENERAL] optimizing a query

2016-06-22 Thread Erik Gustafson
Hi,

don't you want an index on t_a2b.col_a, maybe partial where col_a=1 ?



On Wed, Jun 22, 2016 at 2:07 AM, Jonathan Vanasco  wrote:

>
> On Jun 21, 2016, at 6:55 PM, David G. Johnston wrote:
>
> ​Aside from the name these indexes are identical...​
>
>
> sorry. tired eyes copy/pasting between windows and trying to 'average' out
> 40 similar queries.
>
> ​These two items combined reduce the desirability of diagnosing this...it
> doesn't seem like you've faithfully recreated the scenario for us to
> evaluate.
>
> Your post is also not self-contained and you haven't provided the actual
> EXPLAINs you are getting.
>
>
> I played around with some more indexes, creating and disabling them on one
> specific query
> Eventually i found some index formats that didn't pull in the whole table.
> They gave approximately the same results as the other selects, with some
> differences in reporting.  the heap scan on the table was negligible.  the
> big hit was off the outer hash join.
> the formatting in explain made a negligible check look like it was the
> root issue
>
>
> CREATE TABLE t_a (id SERIAL PRIMARY KEY,
>  col_1 INT NOT NULL,
>  col_2 BOOLEAN DEFAULT NULL
>  );
> CREATE INDEX test_idx__t_a_col1_col2__v1 on t_a (col_1) WHERE col_2 IS NOT
> FALSE;
> CREATE INDEX test_idx__t_a_col1_col2__v2 on t_a (col_1, id) WHERE col_2 IS
> NOT FALSE;
> CREATE INDEX test_idx__t_a_col1_col2__v3 on t_a (id, col_1) WHERE col_2 IS
> NOT FALSE;
> CREATE INDEX test_idx__t_a_col1_col2__v4 on t_a (id, col_1, col_2) WHERE
> col_2 IS NOT FALSE;
> CREATE INDEX test_idx__t_a_col1_col2__v5 on t_a (col_1, col_2) WHERE col_2
> IS NOT FALSE;
>
> CREATE TABLE t_b (id SERIAL PRIMARY KEY,
>  col_1 INT NOT NULL,
>  col_2 BOOLEAN DEFAULT NULL
>  );
> CREATE TABLE t_a2b (a_id INT NOT NULL REFERENCES t_a(id),
>b_id INT NOT NULL REFERENCES t_b(id),
>col_a INT NOT NULL,
>PRIMARY KEY (a_id, b_id)
>);
>
> EXPLAIN ANALYZE
> SELECT t_a2b.b_id AS t_a2b_b_id,
>   count(t_a2b.b_id) AS counted
> FROM t_a2b
> JOIN t_a ON t_a2b.a_id = t_a.id
> WHERE t_a.col_1 = 730
>  AND t_a2b.col_a = 1
>  AND (t_a.col_2 IS NOT False)
> GROUP BY t_a2b.b_id
> ORDER BY counted DESC,
> t_a2b.b_id ASC
> LIMIT 25
> OFFSET 0
> ;
>
>
>
>QUERY PLAN
>
> 
>  Limit  (cost=270851.55..270851.62 rows=25 width=4) (actual
> time=1259.950..1259.953 rows=25 loops=1)
>->  Sort  (cost=270851.55..270863.43 rows=4750 width=4) (actual
> time=1259.945..1259.945 rows=25 loops=1)
>  Sort Key: (count(t_a2b.b_id)) DESC, t_a2b.b_id
>  Sort Method: top-N heapsort  Memory: 26kB
>  ->  HashAggregate  (cost=270670.01..270717.51 rows=4750 width=4)
> (actual time=1259.430..1259.769 rows=1231 loops=1)
>Group Key: t_a2b.b_id
>->  Hash Join  (cost=171148.45..270516.71 rows=30660
> width=4) (actual time=107.662..1230.481 rows=124871 loops=1)
>  Hash Cond: (t_a2b.a_id = t_a.id)
>  ->  Seq Scan on t_a2b  (cost=0.00..89741.18
> rows=2485464 width=8) (actual time=0.011..661.978 rows=2492783 loops=1)
>Filter: (col_a = 1)
>Rows Removed by Filter: 2260712
>  ->  Hash  (cost=170446.87..170446.87 rows=56126
> width=4) (actual time=107.409..107.409 rows=48909 loops=1)
>Buckets: 65536  Batches: 1  Memory Usage: 2232kB
>->  Bitmap Heap Scan on t_a
>  (cost=1055.41..170446.87 rows=56126 width=4) (actual time=18.243..94.470
> rows=48909 loops=1)
>  Recheck Cond: ((col_1 = 730) AND (col_2
> IS NOT FALSE))
>  Heap Blocks: exact=43972
>  ->  Bitmap Index Scan on
> test_idx__t_a_col1_col2__v2  (cost=0.00..1041.38 rows=56126 width=0)
> (actual time=8.661..8.661 rows=48909 loops=1)
>Index Cond: (col_1 = 730)
>  Planning time: 0.796 ms
>  Execution time: 1260.092 ms
>
>
>   QUERY PLAN
>
> --
>  Limit  (cost=208239.59..208239.65 rows=25 width=4) (actual
> time=1337.739..1337.743 rows=25 loops=1)
>->  Sort  (cost=208239.59..208251.47 rows=4750 width=4) (actual
> time=1337.737..1337.739 rows=25 loops=1)
>  Sort Key: (count(t_a2b.b_id)) DESC, t_a2b.b_id
>  Sort Method: top-N heapsort  Memory: 26kB
>  ->  HashAggregate  (cost=208058.05..208105.55 rows=4750 width=4)
> (actual time=1337.183..1337.556 rows=1231 loops=1)
>Group Key: t_a2b.b_id
>->  Hash Join  (cost=108628.33..207935.37 rows=24537
> width=4) 

Re: [GENERAL] Vacuum full: alternatives?

2016-06-22 Thread Job
Excellent Scott!
Thank you!
Francesco


From: Scott Marlowe [scott.marl...@gmail.com]
Sent: Tuesday, 21 June 2016 2:06 AM
To: Job
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Vacuum full: alternatives?

On Mon, Jun 20, 2016 at 3:18 AM, Job  wrote:
>
> Hello,
>
> we have a table with heavy traffic of pg_bulkload loads and deletes of records.
> The size goes, in only one day, from for example 1Gb to 4Gb and then back to 1Gb.
>
> We have significant problems with size, and the only way to regain free space is
> issuing a VACUUM FULL.
> But the operation is very slow, sometimes 2-4 hours, and the table is not
> available for services as it is locked.
>
> We do not delete everything at once (in that case TRUNCATE would resolve
> the problem).
>
> The autovacuum is not able (same for normal vacuum) to free the space.
>
> Are there some suggestions or another way to manage this?

First off, from your comments I'm not sure you really get postgresql's
way of freeing space and reusing it via autovacuum. Basically, postgresql's
autovacuum process marks space as free, and the backend
writes new data (inserts or updates) into the free space. You
eventually reach an equilibrium of a sort when vacuum is freeing up
space as quickly as it's being consumed, or faster. The problem occurs
when vacuum can't keep up with your delete / write and update rate
combined. If this is happening you need to:

A: Make sure your IO subsystem is fast enough to handle BOTH your
update rate AND the vacuuming needed to keep up. You're better off
with a machine that can do 15,000 transactions per second running a
load of 1,000 than trying to handle it with a machine that can do
1,500 tps etc. Sizing the hardware is a whole other conversation.

AND

B: Make your autovacuum aggressive enough to NOT fall behind.

It's important to remember that autovacuum was built and designed in a
time when most databases lived on spinning media. It's designed to not
overload spinning discs with too much random IO. A super fast RAID-10
array from that time period could do 200 to 1,000 transactions per
second and that only with a top notch RAID controller etc. Regular
spinning discs have a maximum random write ops per second that measure
in the 100 per second range.

My laptop, with no pgsql optimizations, can do 850 tps on its SSD. A
server with 10 SSDs in RAID-5 can do 15,000 tps.  If you have a fast
IO subsystem and wish to utilize it with pgsql you're going to have to
examine whether or not autovacuum with default settings is fast enough
to keep up. Remember, autovacuum, by design, is slow and tries not to
get in the way. It's fast enough for most mundane uses, but can't keep
up with a fast machine running hard. The default settings for
autovacuum to look at here are first these two:

autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = 200

They govern how hard autovac works. By default autovac doesn't work
hard. Making it work too hard for a given machine can cause system
performance issues. I.e. it "gets in the way".

Lowering cost_delay is usually enough. As you approach 1ms autovac
starts to use a lot more bandwidth. I find that even on pretty fast
machines that are routinely doing 1,000 writes per second or more, 3ms
is fast enough to keep up with a cost limit of 200. 5ms is a good
compromise without getting too aggressive.

In contrast to autovacuum, REGULAR vacuum, by default, runs at full
throttle. It hits your db server hard, performance-wise. It has  zero
cost delay, so it works very hard. If you run it midday on a hard
working server you will almost certainly see the performance drop. The
difference between regular vacuum with a delay time of 0 and autovac
with a delay of 20ms is huge.

These settings become important if you have a LOT of tables or dbs.
Otherwise they're probably fine.

autovacuum_max_workers =3 # Adjust this last, unless you have
thousands of tables or dbs.
autovacuum_naptime = 1 min # How long to wait before checking the next
db. Default is usually fine unless you have a lot of dbs.

These settings tell autovacuum when to kick in. Keeping these low
enough to keep autovac busy is a good idea too:

autovacuum_vacuum_scale_factor
autovacuum_vacuum_threshold
autovacuum_analyze_scale_factor
autovacuum_analyze_threshold

I tend to go for threshold, which is an absolute number of rows
changed before autovac kicks off. Scale factor can be dangerous
because what seems small at the beginning gets big fast. If it's 0.1
then that's 10%. 10% of 1,000 is 100 rows, but 10% of 1,000,000 is
100,000, which is a LOT of rows to ignore until you have more than
that that need vacuuming. Setting it to something like 100 or 1,000
will keep your db from growing hundreds of thousands of dead tuples
in a big table.

Either way you need to make sure your autovacuum is aggressive enough
to keep up with your db's throughput.
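Pulling those suggestions into one postgresql.conf sketch (the numbers are
illustrative and should be tuned to the hardware and workload):

    # Let autovacuum work harder than the defaults (default delay is 20ms).
    autovacuum_vacuum_cost_delay = 5ms
    autovacuum_vacuum_cost_limit = 200

    # Prefer an absolute row-count trigger over a pure percentage on big tables.
    autovacuum_vacuum_threshold = 1000
    autovacuum_analyze_threshold = 1000
    autovacuum_vacuum_scale_factor = 0.01
    autovacuum_analyze_scale_factor = 0.01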

Checking for bloat. You can see what parts of