Aggregate not using BRIN index on timestamp

2019-08-05 Thread Jeremy Finzel
Hello -

I have started to make much more use of BRIN indexes on timestamp fields on
tables which are insert-only.  I have seen great performance with these and
of course far less overhead.

However, I am noticing that a simple aggregate is not using the index.  I
don't find anything obvious in the docs as to why, and I am not sure if the
operator is not actually supported, or for some reason it is not choosing
it because of the estimate.

I have a very large table with 4 billion rows and a BRIN index on timestamp
spanning from 2013 to present.  I am running this simple query:

SELECT MIN(created_at) FROM table;

It is choosing a parallel seq scan as opposed to a BRIN bitmap scan.

Please note also that queries like the following are using the index with
great performance:
SELECT * FROM table WHERE created_at > '2013-04-01' AND created_at <=
'2013-04-08';

I can provide more info.  But first - am I missing something obvious?

Thanks,
Jeremy


Re: Aggregate not using BRIN index on timestamp

2019-08-05 Thread Jeremy Finzel
>
> Yes: BRIN indexes don't provide any ordering information.  A btree
> index on created_at could be used to optimize this query, but without
> one of those, seqscanning the whole table is the only possibility.
>

Thanks Tom.  So, this is a very general question, but would it be possible
to develop that feature into BRIN, given what it stores?  Even if it does
not have ordering information, doesn't it know which blocks would contain
the lowest values, so it could choose to do a "bitmap scan ordered sort" or
something, depending on the number of rows sought?  Or is the problem that
it has no way of determining what value actually would be the "minimum"
without the query specifying a particular date, such as less than
"2013-04-01"?

Thanks!
Jeremy


How to use brin_summarize_range

2019-08-22 Thread Jeremy Finzel
Good afternoon!

I am finding it difficult to understand from the docs how to maintain my BRIN
index.  Specifically, this is the documentation for the
function brin_summarize_range, which isn't clear to me:

brin_summarize_range(index regclass, blockNumber bigint) integer

   - summarize the page range covering the given block, if not already
   summarized

There is no information on how a user is to actually find blockNumber,
especially what blockNumber she might be interested in (like the end of the
table).  On my table, my BRIN index is all of a sudden all out of whack and
I'm trying to figure out why.  The planner doesn't choose it.  Even if I
force a BRIN scan, it estimates way wrong, and performs terribly.  I do not
have autosummarize on.  I am curious if vacuum somehow invalidated
everything?
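
For example (my guess at the intended usage, not something spelled out in the
docs), I would expect something like this to summarize the range covering the
end of the table:

-- the last block number is relation size / block size - 1 (block size normally 8192)
SELECT brin_summarize_range(
    'my_table_insert_time_brin'::regclass,   -- index name here is hypothetical
    pg_relation_size('my_table') / current_setting('block_size')::bigint - 1
);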

When I ran brin_summarize_new_values, it immediately returned 0.  This
table has 292 million rows, and a straightforward insert-only pattern, but
we also prune data older than 1 year old. The BRIN index is on insert
time.  It was working great up until just a bit ago.

Any direction on using these brin functions would be very appreciated.

Thanks,
Jeremy


Re: How to use brin_summarize_range

2019-08-23 Thread Jeremy Finzel
On Thu, Aug 22, 2019 at 1:46 PM Jeremy Finzel  wrote:

> Good afternoon!
>
> I am finding it difficult to understand how to maintain my BRIN index from
> the docs.  Specifically, this is the documentation on the
> function brin_summarize_range which isn't clear to me:
>
> brin_summarize_range(index regclass, blockNumber bigint) integer
>
>- summarize the page range covering the given block, if not already
>summarized
>
I answered my own question (I think).  blockNumber corresponds, I believe,
to pages_per_range.  So if I choose 64 as that value, I can run the above
function on 64 possible values.  But perhaps I'm wrong about that?


> There is no information on how a user is to actually find blockNumber,
> especially what blockNumber she might be interested in (like the end of the
> table).  On my table, my BRIN index is all of a sudden all out of whack and
> I'm trying to figure out why.  The planner doesn't choose it.  Even if I
> force a BRIN scan, it estimates way wrong, and performs terribly.  I do not
> have autosummarize on.  I am curious if vacuum somehow invalidated
> everything?
>
> When I ran brin_summarize_new_values, it immediately returned 0.  This
> table has 292 million rows, and a straightforward insert-only pattern, but
> we also prune data older than 1 year old. The BRIN index is on insert
> time.  It was working great up until just a bit ago.
>
> Any direction on using these brin functions would be very appreciated.
>

I am also noticing bad plan choices with BRIN indexes in the above scenario.  I
have tried creating said index with pages_per_range values of 64, 128, 500,
1000, and 1.  The 1000 value works best and executed in 11 seconds.

However, regardless of pages_per_range, the planner is still choosing a
btree index on the (id, id_2, insert_time) fields, which takes 30 seconds
to execute.  I have to SET enable_indexscan TO false to get the BRIN index
used, which is 3x faster.  What gives?
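
For reference, this is roughly how I am forcing the comparison (the query shown
is a simplified, hypothetical stand-in for the real one):

BEGIN;
SET LOCAL enable_indexscan TO false;   -- discourage the btree plan, this transaction only
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM my_table
WHERE insert_time >= now() - interval '1 day';
ROLLBACK;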

Any suggestions?

Thanks,
Jeremy


Re: DDL support for logical replication

2019-10-10 Thread Jeremy Finzel
On Thu, Oct 10, 2019 at 3:09 PM Lev Kokotov  wrote:

> Hi Miles,
>
> One issue is keeping the subscriber and the publisher schema identical.
> Running migrations on both the publisher and subscriber does not seem
> atomic to me, therefore I don't have a way to enforce consistency between
> the two. The use case is simple: schemas change all the time, and keeping
> two databases (or more!) in sync manually is tough.
>

We had this requirement as well, and this is why I wrote pgl_ddl_deploy
on top of pglogical to handle the issue.  It's by no means an all-encompassing
solution, but it's worked really well for us.  It only supports pglogical
currently (which could be expanded for in-core logical... but we have not
tried yet).

It basically uses event triggers, inspecting what is being executed, to decide
how to propagate DDL.
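
As a very rough illustration of the idea only - this is NOT pgl_ddl_deploy's
actual code, just a minimal sketch of capturing DDL with an event trigger and
queuing it for replay elsewhere:

CREATE TABLE ddl_queue (
    id        bigserial PRIMARY KEY,
    ddl       text NOT NULL,
    queued_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION capture_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- record the statement that fired the event trigger
    INSERT INTO ddl_queue (ddl) VALUES (current_query());
END;
$$;

CREATE EVENT TRIGGER capture_ddl_trigger
    ON ddl_command_end
    EXECUTE PROCEDURE capture_ddl();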

However, I am very interested in the future of this feature for in-core
logical replication in a much more natively-supported way.  That said, I have
not heard anything about it for a while.  Schema mapping for logical
replication is another feature we really would want, and it would also relate
very much to DDL replication.  @Alvaro Herrera did a lot of great work on
support for in-core DDL replication a few years back.  Here is that thread:
https://postgr.es/m/cacaco5qquav+n4gi+ya1jf_a+qenr6sjup8cydpsrxka+fh...@mail.gmail.com

Thanks,
Jeremy


Cases where alter table set type varchar(longer length) still needs table rewrite

2020-02-17 Thread Jeremy Finzel
Good morning!

We are a little bit puzzled because running the following command on a 9.6
cluster is apparently requiring a table rewrite, or at least a very long
operation of some kind, even though the docs say that as of 9.2:


   - Increasing the length limit for a varchar or varbit column, or removing
   the limit altogether, no longer requires a table rewrite. Similarly,
   increasing the allowable precision of a numeric column, or changing a
   column from constrained numeric to unconstrained numeric, no longer
   requires a table rewrite. Table rewrites are also avoided in similar cases
   involving the interval, timestamp, and timestamptz types.

I have a table foo with 100 million rows, and a column:

   - id character varying(20)

The following command is the one that we expect to execute very quickly (we
are not seeing any locking), but it is instead taking a very long time:

   - ALTER TABLE foo ALTER COLUMN id TYPE varchar(100);

I attempted the same operation casting to text instead - same problem.

Why do the docs seem wrong in our case?  I have a guess: if the table was
created prior to version 9.2, perhaps its columns are not treated as binary
coercible to text after 9.2?  In any case, I would be very grateful for an
explanation!
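
For what it's worth, one way to sanity-check whether a rewrite actually occurs
(a sketch, run against a throwaway copy of the table; the relfilenode changes
if the table is rewritten):

SELECT relfilenode FROM pg_class WHERE relname = 'foo';
ALTER TABLE foo ALTER COLUMN id TYPE varchar(100);
SELECT relfilenode FROM pg_class WHERE relname = 'foo';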


Thank you!
Jeremy


Re: Cases where alter table set type varchar(longer length) still needs table rewrite

2020-02-17 Thread Jeremy Finzel
On Mon, Feb 17, 2020 at 8:21 AM Tom Lane  wrote:

> Jeremy Finzel  writes:
> > I have a table foo with 100 million rows, and a column:
> >- id character varying(20)
> > The following command is the one that we expect to execute very quickly
> (we
> > are not seeing any locking), but it is instead taking a very long time:
> >- ALTER TABLE foo ALTER COLUMN id TYPE varchar(100);
>
> Hm, the code is supposed to avoid a table rewrite, but I wonder if
> there's something else that's not being avoided, such as an index
> rebuild or foreign-key verification.  Could we see the whole table
> definition, eg from psql \d+ ?
>
> regards, tom lane
>

Based on your feedback, I quickly identified that indeed, the following
index is causing the re-type to be slow:

"id_idx" btree ("substring"(id::text, 4, 7))

I'm still not sure why a rebuild of this index would be required,
technically speaking.  But perhaps in any case the docs should say something
to the effect that expression indexes may require a rebuild under specific
circumstances?
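
For anyone else hitting this, here is a rough sketch (my own attempt, untested
beyond this case, using the table and column names from the example above) to
list the indexes that reference the column being altered, i.e. the ones that
may need a rebuild:

SELECT d.objid::regclass AS index_name
FROM pg_depend d
JOIN pg_attribute a
  ON a.attrelid = d.refobjid AND a.attnum = d.refobjsubid
WHERE d.classid = 'pg_class'::regclass      -- dependent object is a relation (the index)
  AND d.refclassid = 'pg_class'::regclass   -- it depends on our table...
  AND d.refobjid = 'foo'::regclass
  AND a.attname = 'id';                     -- ...specifically on this column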

Thanks!
Jeremy


Re: Cases where alter table set type varchar(longer length) still needs table rewrite

2020-02-17 Thread Jeremy Finzel
On Mon, Feb 17, 2020 at 10:46 AM Adrian Klaver 
wrote:

>
> How about?:
>
> https://www.postgresql.org/docs/9.6/sql-altertable.html
>
> "Adding a column with a DEFAULT clause or changing the type of an
> existing column will require the entire table and its indexes to be
> rewritten. As an exception when changing the type of an existing column,
> if the USING clause does not change the column contents and the old type
> is either binary coercible to the new type or an unconstrained domain
> over the new type, a table rewrite is not needed; but any indexes on the
> affected columns must still be rebuilt. Adding or removing a system oid
> column also requires rewriting the entire table. Table and/or index
> rebuilds may take a significant amount of time for a large table; and
> will temporarily require as much as double the disk space."
>
> >
> > Thanks!
> > Jeremy
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com


You mean the part "any indexes on the affected columns must still be
rebuilt"?  Yes, I guess that is pretty clear.  Thanks,

Jeremy


Dependency tree to tie type/function deps to a table

2017-12-13 Thread Jeremy Finzel
It looks like the very useful dependency tree shown when using DROP CASCADE
is written in C in dependency.c, but there is no way to leverage this
within Postgres to actually query an object's dependencies.  Can we get at
this somehow from a Postgres client in SQL?

One of the problems we face over and over is that we want precisely this
list of dependencies, for example so that we can recreate a base table with
all of the dependencies on top of it.

The pg_depend_display function is not perfect, for example at tying together
functions that have table dependencies.  Currently, if I have a function foo()
that returns type setof foo where foo is a foreign table, the pg_depend_display
function shows no connection between the function foo(), the type foo, and the
table foo.

Is there any way we can get that dependency.c code in a consumable form inside
postgres, for example as an ordered list (JSON, say) of dependent objects that
could be passed to pg_dump so the objects can be recreated in the right order
after recreating a table?

Really appreciate any feedback.

Thanks!
Jeremy


Re: Dependency tree to tie type/function deps to a table

2017-12-13 Thread Jeremy Finzel
On Wed, Dec 13, 2017 at 9:54 AM, Melvin Davidson 
wrote:

>
>
> On Wed, Dec 13, 2017 at 10:20 AM, Tom Lane  wrote:
>
>> Jeremy Finzel  writes:
>> > It looks like the very useful dependency tree shown when using DROP
>> CASCADE
>> > is written in C in dependency.c, but there is no way to leverage this
>> > within Postgres to actually query an object's dependencies.  Can we get
>> > this somehow as a Postgres client in SQL?
>>
>
Thanks Tom, I am working on this.


>
>> Seems like you could build a query for that easily enough using a
>> recursive union over pg_depend plus pg_describe_object() to produce
>> text descriptions of the entries.
>>
>> regards, tom lane
>>
>>
>
> Jeremy ,
>
> per Tom
>
> >Seems like you could build a query...
>
> Attached is the query that I use. Hope that helps you.
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>

I appreciate that, Melvin.  However, this doesn't do the recursive part.
It doesn't show me type or function dependencies, for example:

CREATE TEMP TABLE foo1 (id int);
CREATE TEMP TABLE foo2 (id int);
CREATE VIEW pg_temp.foo3 AS
SELECT f.id, f2.id AS id2
FROM foo1 f CROSS JOIN foo2 f2;
CREATE VIEW pg_temp.foo4 AS
SELECT f.id, f2.id AS id2
FROM foo1 f CROSS JOIN foo3 f2;

CREATE FUNCTION foo() RETURNS SETOF foo1 AS 'SELECT * FROM foo1;' LANGUAGE
SQL;

Your function only shows:
  schema   |  parent   | dep_schema | dependent | type
---+---++---+---
 pg_temp_4 | foo1  | pg_temp_4  | foo3  | view
 pg_temp_4 | foo1  | pg_temp_4  | foo4  | view
 pg_temp_4 | foo2  | pg_temp_4  | foo3  | view
 pg_temp_4 | foo3  | pg_temp_4  | foo4  | view
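
For the record, here is the rough shape of the recursive query Tom suggested
that I am working toward (an untested sketch; the column choices and the depth
limit are mine, the latter just to avoid looping on circular dependencies):

WITH RECURSIVE deps AS (
    SELECT classid, objid, objsubid, 1 AS level
    FROM pg_depend
    WHERE refclassid = 'pg_class'::regclass
      AND refobjid = 'foo1'::regclass
    UNION
    SELECT d.classid, d.objid, d.objsubid, deps.level + 1
    FROM pg_depend d
    JOIN deps ON d.refclassid = deps.classid
             AND d.refobjid = deps.objid
    WHERE deps.level < 10
)
SELECT DISTINCT level, pg_describe_object(classid, objid, objsubid) AS object
FROM deps
ORDER BY level;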

Thanks,
Jeremy


Re: Dependency tree to tie type/function deps to a table

2017-12-15 Thread Jeremy Finzel
4
view foo4 view public.foo4 2618 24176 foo4 3
view foo3 view public.foo3 2618 24172 foo3 2

If I drop these in order of appearance, it all works and finally lets me
drop table foo1 without cascade.

Thanks,
Jeremy

On Wed, Dec 13, 2017 at 1:31 PM, Alvaro Herrera 
wrote:

> Jeremy Finzel wrote:
>
> > I appreciate that, Melvin.  However, this doesn't do the recursive part.
> > It doesn't show me type or function dependencies, for example:
>
> You need a WITH RECURSIVE query ...
>
> If you do figure it out, please publish it as in this section of the wiki
> https://wiki.postgresql.org/wiki/Category:Snippets
>
> (In general, it would be good to have lots of contents in Snippets, so
> feel free to add stuff that you think may be of general usefulness.)
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Dependency tree to tie type/function deps to a table

2017-12-17 Thread Jeremy Finzel
>
> Perhaps you'll find the version on the wiki useful as reference, it's an
> older version of the dependencies, and can be found at
> https://wiki.postgresql.org/wiki/Pg_depend_display​
>

This is where I started.  However, I noticed that it do everything I need.
As I said in my original post, as an example if I have a function foo()
that returns type setof foo where foo is a foreign table, the
pg_depend_display functions shows no connection between the function foo()
and the type foo to the table foo.


Re: Dependency tree to tie type/function deps to a table

2017-12-17 Thread Jeremy Finzel
On Sun, Dec 17, 2017 at 3:31 PM, Jeremy Finzel  wrote:

> Perhaps you'll find the version on the wiki useful as reference, it's an
>> older version of the dependencies, and can be found at
>> https://wiki.postgresql.org/wiki/Pg_depend_display​
>>
>
> This is where I started.  However, I noticed that it do everything I
> need.  As I said in my original post, as an example if I have a function
> foo() that returns type setof foo where foo is a foreign table, the
> pg_depend_display functions shows no connection between the function foo()
> and the type foo to the table foo.
>

Sorry - Haste makes waste - I meant to say "I noticed that it *doesn't* do
everything I need".


Re: reclaiming space from heavily used tables?

2017-12-18 Thread Jeremy Finzel
On Mon, Dec 18, 2017 at 1:03 PM, Rob Nikander 
wrote:

> Hi,
>
> I've got a large table from which I'd like to completely reclaim space. I
> read the docs and it sounds like I can’t run `vacuum full`, because this
> table is accessed constantly and can’t have downtime. Assuming that’s true,
> what do you think of the following idea? Is there a better alternative?
>
> 1. Replace the table T with two tables T1 and T2 and a view T that is
> `select * from T1 union T2`.
> 2. Write only to T1, and slowly move records from T2 to T1.
> 3. When T2 is empty, redefine the view T to be simply `select * from T1`.
> 4. Recreate or vacuum full T2, so space is fully reclaimed.
> 5. Redefine view as the union select and repeat process going other
> direction from T1 to T2, as needed.
>
> I guess one question is whether I can do 1 and 3 (renaming tables,
> redefining views) atomically without disturbing concurrent select queries.
>
>
This is what you want: https://github.com/reorg/pg_repack

This has been around for many years and is a very trusted extension (when
will it be in core?).


> Rob
>
>
>


Re: reclaiming space from heavily used tables?

2017-12-18 Thread Jeremy Finzel
>
> It's been around, but is it trusted?  I for one do not trust it.  See
> for example
> https://www.postgresql.org/message-id/CA+TgmoaWoU+BpBG4nwz1L
> gkmdsbl6_hm9r8jnqte-5vrdls...@mail.gmail.com
>
>
Needs some discussion.  Has anyone actually reported corruption related to
this?  I don't doubt the opinion, but I do wonder why I have never seen such
reports when people have readily reported other pg_repack issues in years
past.  If it (apparently) works so well in practice for so many companies,
but has this bug, why not leverage what they have done and fix its bugs,
given that it serves such an important function?  The extension serves a huge
need on many fronts.


Re: reclaiming space from heavily used tables?

2017-12-18 Thread Jeremy Finzel
>
> Maybe the nature of the corruption caused is different.  It took months
> of running large databases on production for corruption to become
> apparent from multixact bugs, for example.  Or maybe because the
> relfrozenxid is fixed by other activity in the system, any bugs are
> masked -- but that you could get in trouble if autovacuum is disabled,
> perhaps.  Or maybe it's masked even in that case, since autovacuum
> observes that the table has a very old frozenxid and runs a forced
> vacuum on that table anyway.


Or maybe there isn’t actual corruption.  There is debate about this in the Git
issue discussed... I’m not sure.  But whatever the case may be, I don’t
find it useful to dismiss a tool out of hand because of what in theory
appears to be a huge bug but which in reality has not bitten a very
broad install base of users.  pg_repack appears to be the number one way
people solve this kind of problem in very critical production environments,
so I don’t find it reasonable to just throw out the tool.

>
> > If it (apparently) works so well in practice for so many companies,
> > but has this bug, why not leverage what they have done [...] ?
>
> What have they done?
>

Made a tool that can cluster tables, remove bloat and reindex, and move
tables between tablespaces with no downtime.


> > [...] and fix its bugs when it serves such an important function?  The
> > extension serves a huge need on many fronts.
>
> It's not PGDG's tool, so these are not "our" bugs to fix.  Though it's
> pretty clear that we need *something* to cover those needs ... but
> nothing has been submitted to commitfests.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


I’m not trying to point fingers or say who should do what.  My point is
that this tool solves some important production challenges, and it would be
better to leverage this code base and fix some issues than to try to write
something completely new.  That is largely because there has already been
huge production usage of this tool for years, which has already exposed many
issues that have since been fixed.
Many thanks for the feedback,
Jeremy


Re: Deadlock with one table - PostgreSQL is doing it right

2017-12-21 Thread Jeremy Finzel
It's hard to follow how the 2 videos relate, because you don't run the same
SQL in both places.  You first update where i = 2 in Postgres but i = 1 in
Oracle.

On Thu, Dec 21, 2017 at 4:37 AM, Hans Schou  wrote:

> Hi
>
> FYI - if it has any interest
>
> During my preparation for describing what happens when two processes
> update the same row in a table, I came across that PostgreSQL is doing
> right and Oracle is doing it wrong.
>
> The situation is a process which get a deadlock, but because it is a
> script, it sends a commit anyway. This is bad behavior by humans but that's
> how they are.
>
> After both processes commit's the table should be:
>  i |  n
> ---+---
>  1 | 11
>  2 | 21
> in Oracle it is:
>  i |  n
> ---+---
>  1 | 11
>  2 | 22
>
> PostgreSQL: https://youtu.be/rH-inFRMcvQ
> Oracle: https://youtu.be/l2IGoaWql64
>
> PostgreSQL:
> A
> select * from t;
> begin;
> update t set n=n+1 where i=2;
>
> B
> begin;
> update t set n=n+1 where i=1;
> update t set n=n+1 where i=2;
>
> A
> update t set n=n+1 where i=1;
>
> B
> commit;
>
> A
> commit;
>
> best regards
> hans
>


Problem with pgq3 packages missing pgq_node

2017-12-22 Thread Jeremy Finzel
Posting here because it seems the pgsql-pkg-debian list is not very
responsive.

This recent package appears to be broken for pgq3, used for Skytools
replication:
https://www.postgresql.org/message-id/E1eR2Lv-0003h7-FC@atalia.postgresql.org

There is no pgq_node.control file (postgresql-9.6-pgq3), meaning the
package won't update properly.  Since this is only a package version change
this seems like a mistake.

Has anyone else come across this?  It could break production very easily.
It was only just released 12-18.

Thanks,
Jeremy


Deadlock between concurrent index builds on different tables

2017-12-22 Thread Jeremy Finzel
I am attempting to build several indexes in parallel, guaranteeing that I
never build more than one on the same table at a time.  I understand I can't
build two on the same table at once or I will get a deadlock.  However, I am
also getting a deadlock when doing several in parallel on different tables.

Here is an example of the error I am getting:

2017-12-22 15:48:07.669 CST,"CREATE INDEX",2017-12-22 15:48:02
CST,8/32,0,ERROR,40P01,"deadlock detected","Process 4470 waits for
ShareLock on virtual transaction 4/262; blocked by process 4466.
Process 4466 waits for ShareLock on virtual transaction 8/32; blocked by
process 4470.
Process 4470: CREATE INDEX CONCURRENTLY index_foo_on_created_at ON foo
USING btree (created_at);
Process 4466: CREATE INDEX CONCURRENTLY index_bar_on_id ON bar USING btree
(id);","See server log for query details.""CREATE INDEX CONCURRENTLY
index_foo_on_created_at ON foo USING btree (created_at);",,,""

Here is my process:

   - Kick off each index build in the background 1 second apart, using a queue
   table to determine which indexes to build and which are finished
   - Before building an index, I first check whether another build is already
   in progress on the target table (see the sketch after this list)
   - After the index is built in each workflow, I mark it as built in the
   queue table
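
The check looks something like this - a simplified sketch; in reality it is
driven from the queue table, and the table name here is hypothetical:

SELECT count(*)
FROM pg_stat_activity
WHERE state = 'active'
  AND query ILIKE 'CREATE INDEX CONCURRENTLY%'
  AND query ILIKE '% ON foo %';   -- the target table of the build we want to start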

I assume that one of the index builds (the earlier transaction) is taking so
long that several finished builds end up waiting on it, and there is perhaps
a limit to that?

Any insight appreciated.

Thanks,
Jeremy


Re: Deadlock between concurrent index builds on different tables

2017-12-23 Thread Jeremy Finzel
>
>
>> Each index build needs to wait for all other transactions (Including the
> ones used by the other index build) to finish.  So I don't think a deadlock
> here is unexpected.
>
> Cheers,
>
> Jeff
>
 Does that mean I should never build more than one concurrent index at a
time within the entire cluster? If so, that is not clear from the
documentation.

So if I start 2 index builds within a second of each other, you are saying
that the first will not finish, even though it started first, because there
is a long-running transaction from the second index build? I would expect many
more deadlocks in this case, but building 5 indexes concurrently at a time out
of a total of 15 indexes, I only get one or two deadlocks. I would expect many
more if I can’t even have 2 at once...

Thanks,
Jeremy


Re: Deadlock between concurrent index builds on different tables

2017-12-23 Thread Jeremy Finzel
>
> The only thing I can think of is that there's a foreign key from foo to
> bar(id), but the create index on bar shouldn't prevent a share lock on
> foo, even if such a restriction exists.
>
> --
> Martín Marqués                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
No, there are no foreign keys at all in this db. It is a logical replica.

Thanks,
Jeremy


Re: Deadlock between concurrent index builds on different tables

2017-12-26 Thread Jeremy Finzel
On Tue, Dec 26, 2017 at 10:28 AM, Alvaro Herrera 
wrote:

> Jeremy Finzel wrote:
> > >
> > >
> > >> Each index build needs to wait for all other transactions
> > >> (Including the ones used by the other index build) to finish.
> > >> So I don't think a deadlock here is unexpected.
>
> > Does that mean I should never build more than one concurrent index at
> > a time within the entire cluster? If so, that is not clear from the
> > documentation.
>
> No, there is no such expectation.  Jeff analyzed your scenario,
> discovered a bug and sent a patch to fix it -- care to test it and
> report back?  You can get it from here:
>
> https://www.postgresql.org/message-id/CAMkU=1ztk3TpQdcUNbxq93pc80FrXUjpDWL
> gmevbdx71ghn...@mail.gmail.com
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
I would be thrilled to review it, but I need a little direction as I have
not done a patch review before.  I have been reading through some of the
developer FAQ and patch info. The attached file is simply a git diff, so
I'm not sure how I am to use this?  Is there a specific source version I
can download and compile?  I know where to get the current master, etc.,
from git, but where can I get the patched version or what is the proper way
to apply the patch to current master?

Thanks!
Jeremy


Re: Deadlock between concurrent index builds on different tables

2017-12-26 Thread Jeremy Finzel
On Tue, Dec 26, 2017 at 11:11 AM, Alvaro Herrera 
wrote:

> Jeremy Finzel wrote:
> > On Tue, Dec 26, 2017 at 10:28 AM, Alvaro Herrera <
> alvhe...@alvh.no-ip.org>
> > wrote:
> >
> > > Jeremy Finzel wrote:
> > > > >
> > > > >> Each index build needs to wait for all other transactions
> > > > >> (Including the ones used by the other index build) to finish.
> > > > >> So I don't think a deadlock here is unexpected.
> > >
> > > > Does that mean I should never build more than one concurrent index at
> > > > a time within the entire cluster? If so, that is not clear from the
> > > > documentation.
> > >
> > > No, there is no such expectation.  Jeff analyzed your scenario,
> > > discovered a bug and sent a patch to fix it -- care to test it and
> > > report back?  You can get it from here:
> > >
> > > https://www.postgresql.org/message-id/CAMkU=
> 1ztk3TpQdcUNbxq93pc80FrXUjpDWL
> > > gmevbdx71ghn...@mail.gmail.com
>
> > I would be thrilled to review it, but I need a little direction as I have
> > not done a patch review before.  I have been reading through some of the
> > developer FAQ and patch info. The attached file is simply a git diff, so
> > I'm not sure how I am to use this?  Is there a specific source version I
> > can download and compile?  I know where to get the current master, etc.,
> > from git, but where can I get the patched version or what is the proper
> way
> > to apply the patch to current master?
>
> I don't think a patch review as such is necessary -- that code is very
> complex and you'd need to learn about a lot of internals (though I won't
> stop if you want to learn).  I was thinking about testing it, instead.
>
> To create a patched build,
> 1. get a clone with the branch you're on.  Assuming you're on 9.6, it'd
>be like this
>git clone  -b REL9_6_STABLE
>
> 2. apply the patch on top
>cd postgresql
>patch -p1 < /path/to/file.diff
># you could use "git apply" instead (or "git am", but not with this one)
>
> 3. configure and make
>./configure 
>make
>make install
>
> 4. run it
>initdb -D# to create a fresh datadir
>pg_ctl 
>
> You may need additional packages (zlib devel, readline devel, others;
> see https://www.postgresql.org/docs/9.6/static/installation.html)
>
> For the options in step 3 you could use whatever your current server
> has; use "pg_config --configure" to find these out.  You're gonna need
> same flags if you want to use your existing data directory.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Many thanks for the great and simple explanation.

I was able to get this compiled, and ran the test first on stock 9.6.6,
then on this patched version.  I reproduced the deadlock on 9.6.6, and the
patched version indeed fixes my issue!

I am indeed very interested in learning more about the whole patch review
process, but I will have to save that for another day!

Let me know if you need me to check anything else!

Thanks,
Jeremy


Mimic parse bind execute in psql?

2017-12-29 Thread Jeremy Finzel
If I want to mimic a log statement that is from an application that uses
PARSE/BIND/EXECUTE, is there any way to mimic this behavior in psql, for
example to troubleshoot execution plans, other than writing a named
prepared statement, or manually subbing the parameters?

It would be nice to take the commands just as they appear in the log
files and execute them in psql as PARSE/BIND/EXECUTE.  Perhaps I'm
missing something?
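
For reference, the named-prepared-statement workaround I mean looks like this
(table, parameter types and values are hypothetical, just to show the shape):

PREPARE stmt (int, text) AS
    SELECT * FROM my_table WHERE id = $1 AND name = $2;
EXECUTE stmt (42, 'abc');
DEALLOCATE stmt;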

Thanks,
Jeremy


Alter table set logged hanging after writing out all WAL

2018-02-05 Thread Jeremy Finzel
We are running:

PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.7.2-5)
4.7.2, 64-bit

The table I am setting to logged is 32GB with indexes.  I see it writing
WAL files like crazy, but after about an hour and a half it has written out
some 2500 WAL segments; then it just sits and continues to run as "active",
but no more WAL files are being created.  There are no blocking locks, and no
other transactions running in the system.
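
For what it's worth, this is the sort of check I am running against the session
(a simple sketch against pg_stat_activity; the filter is just a rough match on
the statement text):

SELECT pid, state, wait_event_type, wait_event, now() - xact_start AS runtime
FROM pg_stat_activity
WHERE query ILIKE 'ALTER TABLE%SET LOGGED%';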

Any ideas?  Is it still doing something that I need to wait for?

Thanks,
Jeremy


Re: Alter table set logged hanging after writing out all WAL

2018-02-05 Thread Jeremy Finzel
Here is the basic structure - is the gist index significant?:

CREATE UNLOGGED TABLE foo (
as_of_date daterange NOT NULL,
customer_id integer,
bunch_of_fields_here);

ALTER TABLE ONLY foo
ADD CONSTRAINT foo_as_of_date_excl EXCLUDE USING gist (customer_id WITH
=, as_of_date WITH &&);

CREATE UNIQUE INDEX foo_idx1 ON foo USING btree (customer_id) WHERE
(upper(as_of_date) = 'infinity'::date);

CREATE INDEX foo_idx2 ON foo USING btree (customer_id, lower(as_of_date))
WHERE (upper(as_of_date) = 'infinity'::date);

CREATE UNIQUE INDEX foo_idx3 ON foo USING btree (customer_id,
lower(as_of_date));


This is all I see - please help me if there's a better command I can run:

strace -p 6497
read(32, "\347\0\0\0pi\3\222p\371\0\0\254\0H\1\0 \4
\0\0\0\\237\240\1X\236\260\1"..., 8192) = 8192
read(32, "\347\0\0\0\340\214\3\222\314x\0\0\254\0H\1\0 \4
\0\0\0\0(\237\260\1P\236\260\1"..., 8192) = 8192
read(32, "\347\0\0\0(\260\3\222\242A\0\0\254\0p\1\0 \4
\0\0\0\0(\237\260\1X\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0\220\323\3\222Kg\0\0\254\0P\1\0 \4
\0\0\0\\237\240\1`\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0\340\366\3\222\v|\0\0\254\0h\1\0 \4
\0\0\0\0(\237\260\1X\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0`\32\4\222\230m\0\0\254\0008\1\0 \4
\0\0\0\\237\240\1X\236\260\1"..., 8192) = 8192
read(32, "\347\0\0\0\350=\4\222\235\342\0\0\254\\1\0 \4
\0\0\0\\237\240\1`\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0 b\4\222\224f\0\0\260\0\270\0\0 \4
\0\0\0\\237\240\1`\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0(\206\4\222\301\373\0\0\260\0\320\0\0 \4
\0\0\0\0(\237\260\1X\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0\260\251\4\222Hx\0\0\254\\1\0 \4
\0\0\0\0(\237\260\1H\236\300\1"..., 8192) = 8192
read(32, "\347\0\0\0\260\315\4\222\254d\0\0\260\0\330\0\0 \4
\0\0\0\\237\240\1`\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0\370\360\4\222\357\235\0\0\254\0p\1\0 \4
\0\0\0\\237\240\1`\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0 \25\5\222\250k\0\0\260\0\260\0\0 \4
\0\0\0\0(\237\260\1X\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0\0209\5\222\353\1\0\0\260\0\350\0\0 \4
\0\0\0\\237\240\1X\236\260\1"..., 8192) = 8192
read(32, "\347\0\0\0\30]\5\222UQ\0\0\260\0\320\0\0 \4
\0\0\0\\237\240\1`\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0P\201\5\222g\226\0\0\260\0\270\0\0 \4
\0\0\0\\237\240\1`\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0\210\244\5\222\332\324\0\0\254\0\200\1\0 \4
\0\0\0\\237\240\1`\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0\310\307\5\222Lr\0\0\254\0x\1\0 \4
\0\0\0\0(\237\260\1`\236\220\1"..., 8192) = 8192
read(32, "\347\0\0\0\0\353\5\2225\346\0\0\254\0\200\1\0 \4
\0\0\0\0(\237\260\1P\236\260\1"..., 8192) = 8192
read(32, "\347\0\0\0p\16\6\222\36w\0\0\254\0H\1\0 \4
\0\0\0\0(\237\260\1P\236\260\1"..., 8192) = 8192
read(32, "\347\0\0\0\2302\6\222I\251\0\0\260\0\260\0\0 \4
\0\0\0\0(\237\260\1X\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0008V\6\222{N\0\0\254\0\30\1\0 \4
\0\0\0\0(\237\260\1P\236\260\1"..., 8192) = 8192
read(32, "\347\0\0\0py\6\222}t\0\0\254\0\200\1\0 \4
\0\0\0\0(\237\260\1P\236\260\1"..., 8192) = 8192
read(32, "\347\0\0\0\220\235\6\222\1\17\0\0\260\0\270\0\0 \4
\0\0\0\0(\237\260\1X\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0\10\301\6\222\236\352\0\0\254\0X\1\0 \4 \0\0\0\0
\237\300\1H\236\260\1"..., 8192) = 8192
read(32, "\347\0\0\0\210\344\6\2226%\0\0\254\0008\1\0 \4
\0\0\0\\237\240\1X\236\260\1"..., 8192) = 8192
read(32, "\347\0\0\0\330\7\7\222\1\360\0\0\254\0h\1\0 \4
\0\0\0\\237\240\1`\236\240\1"..., 8192) = 8192
read(32, "\347\0\0\0\0,\7\222G'\0\0\260\0\260\0\0 \4
\0\0\0\\237\240\1`\236\240\1"..., 8192) = 8192

On Tue, Feb 6, 2018 at 1:19 AM, Michael Paquier 
wrote:

> On Tue, Feb 06, 2018 at 12:50:56AM -0600, Jeremy Finzel wrote:
> > The table I am setting to logged is 32GB with indexes.  I see it writing
> > WAL files like crazy but after about an hour and a half, it has written
> out
> > some 2500 WAL segments, then it just sits and continues to run as
> "active",
> > but no more WAL files are being created.  There are no locks, and no
> other
> > transactions in the system running.
>
> Switching an unlogged table to be logged causes an entire image of the
> table to be WAL-logged so as the operation is consistent in case of a
> crash.
>
> > Any ideas?  Is it still doing something that I need to wait for?
>
> Do you have a backtrace with the process doing the ALTER TABLE hanging?
> How is structured you table with its indexes?  It is a bit hard to guess
> much without more information.
> --
> Michael
>


Re: Alter table set logged hanging after writing out all WAL

2018-02-06 Thread Jeremy Finzel
On Tue, Feb 6, 2018 at 6:02 PM Michael Paquier 
wrote:

> On Tue, Feb 06, 2018 at 01:36:04AM -0600, Jeremy Finzel wrote:
> > Here is the basic structure - is the gist index significant?:
> >
> > CREATE UNLOGGED TABLE foo (
> > as_of_date daterange NOT NULL,
> > customer_id integer,
> > bunch_of_fields_here);
> >
> > ALTER TABLE ONLY foo
> > ADD CONSTRAINT foo_as_of_date_excl EXCLUDE USING gist (customer_id
> WITH
> > =, as_of_date WITH &&);
> >
> > CREATE UNIQUE INDEX foo_idx1 ON foo USING btree (customer_id) WHERE
> > (upper(as_of_date) = 'infinity'::date);
> >
> > CREATE INDEX foo_idx2 ON foo USING btree (customer_id, lower(as_of_date))
> > WHERE (upper(as_of_date) = 'infinity'::date);
> >
> > CREATE UNIQUE INDEX foo_idx3 ON foo USING btree (customer_id,
> > lower(as_of_date));
>
> I am not sure, but I would think about something related to gist here
> when heavy insertions are done on it...  I cannot put my finger on the
> thread though.
>
> > This is all I see - please help me if there's a better command I can
> > run:
>
> If the process is still running, can you attach gdb to it and then run
> the command bt? You may need to install debugging symbols to make the
> trace readable.
> --
> Michael


I am trying a few other scenarios to see if I can reproduce. I was able to
set to logged a copy of the table with no indexes. I am now attempting the
same with only the gist index. If I can reproduce it on a non-production
server I will try gdb.

Thank you much for the follow up.

Jeremy


Re: Alter table set logged hanging after writing out all WAL

2018-02-07 Thread Jeremy Finzel
On Tue, Feb 6, 2018 at 9:48 PM, Jeremy Finzel  wrote:

> On Tue, Feb 6, 2018 at 6:02 PM Michael Paquier 
> wrote:
>
>> On Tue, Feb 06, 2018 at 01:36:04AM -0600, Jeremy Finzel wrote:
>> > Here is the basic structure - is the gist index significant?:
>> >
>> > CREATE UNLOGGED TABLE foo (
>> > as_of_date daterange NOT NULL,
>> > customer_id integer,
>> > bunch_of_fields_here);
>> >
>> > ALTER TABLE ONLY foo
>> > ADD CONSTRAINT foo_as_of_date_excl EXCLUDE USING gist (customer_id
>> WITH
>> > =, as_of_date WITH &&);
>> >
>> > CREATE UNIQUE INDEX foo_idx1 ON foo USING btree (customer_id) WHERE
>> > (upper(as_of_date) = 'infinity'::date);
>> >
>> > CREATE INDEX foo_idx2 ON foo USING btree (customer_id,
>> lower(as_of_date))
>> > WHERE (upper(as_of_date) = 'infinity'::date);
>> >
>> > CREATE UNIQUE INDEX foo_idx3 ON foo USING btree (customer_id,
>> > lower(as_of_date));
>>
>> I am not sure, but I would think about something related to gist here
>> when heavy insertions are done on it...  I cannot put my finger on the
>> thread though.
>>
>> > This is all I see - please help me if there's a better command I can
>> > run:
>>
>> If the process is still running, can you attach gdb to it and then run
>> the command bt? You may need to install debugging symbols to make the
>> trace readable.
>> --
>> Michael
>
>
> I am trying a few other scenarios to see if I can reproduce. I was able to
> set to logged a copy of the table with no indexes. I am now attempting same
> with only the gist index. If I can reproduce it on a non production server
> I will try gdb.
>
> Thank you much for the follow up.
>
> Jeremy
>

I was able to get it to finish by just waiting a while.  To give you an
idea, the table with no indexes was set logged in 7 minutes.  With the gist
index, it took 3 hours but finally finished.  It only wrote WAL for about
the first 30 minutes; after that it apparently wrote no more WAL but took a
very long time to finish.

Thanks,
Jeremy


Re: pglogical in postgres 9.6

2018-02-13 Thread Jeremy Finzel
On Tue, Feb 13, 2018 at 1:46 PM greigwise  wrote:

> No need!  I figured it out.
>
> Had to put this "synchronize_data := false" on the create_subscription
> call.
> Weird that there seem to be redundant parameters for this; one on the
> replication set add and one on the create subscription.  Maybe I'm not
> quite
> understanding the usage on those or something.   If anyone knows the
> difference, I'd be interested to hear.
>
> Greig
>
>
>
> --
> Sent from:
> http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
>

Yes. When you create a subscription with synchronize_data = true, all tables in
the replication set(s) will be synced. But suppose you want to add a table to
the replication set later: synchronize_data = true on the add-table call will
sync only that one table. So the latter is more granular.
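
Roughly, the two calls look like this (a sketch from memory - double-check the
parameter names against the pglogical docs; the dsn and table names are
hypothetical):

-- syncs every table in the subscribed replication sets at subscription time
SELECT pglogical.create_subscription(
    subscription_name := 'sub1',
    provider_dsn      := 'host=provider dbname=mydb',
    synchronize_data  := true
);

-- later, syncs just the one table being added to the set
SELECT pglogical.replication_set_add_table(
    set_name         := 'default',
    relation         := 'public.new_table',
    synchronize_data := true
);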

Jeremy


Enforce primary key on every table during dev?

2018-02-28 Thread Jeremy Finzel
We want to enforce a policy, partly just to protect those who might forget,
for every table in a particular schema to have a primary key.  This can't
be done with event triggers as far as I can see, because it is quite
legitimate to do:

BEGIN;
CREATE TABLE foo (id int);
ALTER TABLE foo ADD PRIMARY KEY (id);
COMMIT;

It would be nice to have some kind of "deferrable event trigger" or some
way to enforce that no transaction commits which added a table without a
primary key.
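
Short of that, one fallback would be an out-of-band audit (a sketch; the schema
name is just an example) that reports tables lacking a primary key, run
periodically or in CI rather than enforced at commit time:

SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'myschema'
  AND c.relkind = 'r'
  AND NOT EXISTS (
      SELECT 1 FROM pg_constraint con
      WHERE con.conrelid = c.oid AND con.contype = 'p'
  );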

Any ideas?

Thanks,
Jeremy


ERROR could not access transaction/Could not open file pg_commit_ts

2018-03-09 Thread Jeremy Finzel
Hello -

Here is our cluster setup:

cluster_a 9.5.11 Ubuntu 16.04.4 LTS
 --> cluster_b (streamer) 9.5.11 Ubuntu 16.04.4 LTS
 --> cluster_c (streamer) 9.5.11 Ubuntu 16.04.4 LTS

Very recently, we started seeing these errors when running a query on a
specific table on the streamer:

2018-03-09 08:28:16.280
CST,"uname","foo",18692,"0.0.0.0:0",5aa29292.4904,4,"SELECT",2018-03-09
07:56:34 CST,18/15992,0,ERROR,58P01,"could not access status of
transaction 1035047007","Could not open file ""pg_commit_ts/9A45"": No
such file or directory."

A little history on the cluster:

   - The most recent change we made was a point release upgrade from 9.5.5
   to 9.5.11 on the master, and 9.5.9 to 9.5.11 for the 2 streamers
   - It is a very high WAL traffic reporting system.
   - We actually have synchronous_commit set to off.  It's possible this
   could have bitten us and we are just now seeing issues, however there have
   been no crashes since the table in question was created.
   - We have run pg_repack on many tables on this cluster, but that also
   has not happened since over a month
   - We had a similar error of missing pg_commit_ts file over a year ago
   after an actual crash.  We had serious issues getting the cluster to start,
   and had to resort to recreating the missing pg_commit_ts with null bytes
   (IIRC, we had a snapshot of the system which still showed the file), which
   worked but left us questioning what really caused the issue.


The table that is causing the error has been in production and used fine
since 2/15/2018 when it was created. It is fed by pglogical replication (v.
2.1.1 on subscriber) from a system running 9.6.1 and pglogical v. 1.2.1.
The point release upgrade from earlier 9.5 did take place *after* this.

However, we *only* just started seeing errors in the past 12 hours.  The
table was autovacuumed on the master at 2018-03-08 18:18:15.532137-06, which
was about 3 hours before the first user query errored.  However, I saw that
2 hours after the autovacuum there was another user query that worked
successfully on the table.  Not sure if that is related?

Any insight/ideas would be much appreciated!

Thanks,
Jeremy


Re: ERROR could not access transaction/Could not open file pg_commit_ts

2018-03-09 Thread Jeremy Finzel
On Fri, Mar 9, 2018 at 10:43 AM, Jeremy Finzel  wrote:

> Hello -
>
> Here is our cluster setup:
>
> cluster_a 9.5.11 Ubuntu 16.04.4 LTS
>  --> cluster_b (streamer) 9.5.11 Ubuntu 16.04.4 LTS
>  --> cluster_c (streamer) 9.5.11 Ubuntu 16.04.4 LTS
>
> Very recently, we started seeing these errors when running a query on a
> specific table on the streamer:
>
> 2018-03-09 08:28:16.280 CST,"uname","foo",18692,"0.0.0.0:0
> ",5aa29292.4904,4,"SELECT",2018-03-09 07:56:34 CST,18/15992,0,*ERROR*
> ,58P01,"*could not access status of transaction 1035047007*","*Could not
> open file ""pg_commit_ts/9A45*"": No such file or directory."
>
> A little history on the cluster:
>
>- The most recent change we made was a point release upgrade
>from 9.5.5 to 9.5.11 on the master, and 9.5.9 to 9.5.11 for the 2 streamers
>- It is a very high WAL traffic reporting system.
>- We actually have synchronous_commit set to off.  It's possible this
>could have bitten us and we are just now seeing issues, however there have
>been no crashes since the table in question was created.
>- We have run pg_repack on many tables on this cluster, but that also
>has not happened since over a month
>- We had a similar error of missing pg_commit_ts file over a year ago
>after an actual crash.  We had serious issues getting the cluster to start,
>and had to resort to recreating the missing pg_commit_ts with null
>bytes (IIRC, we had a snapshot of the system which still showed the file),
>which worked but left us questioning what really caused the issue.
>
>
> The table that is causing the error has been in production and used fine
> since 2/15/2018 when it was created. It is fed by pglogical replication (v.
> 2.1.1 on subscriber) from a system running 9.6.1 and pglogical v. 1.2.1.
> The point release upgrade from earlier 9.5 did take place *after* this.
>
> However, we *only* just started seeing errors in the past 12 hours.  The
> table was autovacuumed on master at 2018-03-08 18:18:15.532137-06, which
> was about 3 hours before the first user query errored, however, I saw that
> 2 hours after the autovac, there was another user query that worked
> successfully on the table.  Not sure if related?
>
> Any insight/ideas would be much appreciated!
>
> Thanks,
> Jeremy
>

UPDATE: what is actually failing is a call to
pg_xact_commit_timestamp(xmin) against one of the tables underneath the view
being queried.  We still think we must have some corruption in pg_commit_ts,
though.
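
Reduced to its simplest shape, the failing call looks like this (table and
column names hypothetical):

SELECT pg_xact_commit_timestamp(xmin) AS committed_at, id
FROM some_table
LIMIT 10;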


Primary key gist index?

2018-03-14 Thread Jeremy Finzel
Hello!  From all that I can tell, it is not possible to use a btree_gist
index as a primary key.  If so, why not?  I have a table with the following
gist index, which truly ought to be its primary key.  as_of_date is of
daterange type:

EXCLUDE USING gist (id WITH =, as_of_date WITH &&)

Any direction here would be much appreciated.

Right now, I am forced to create a redundant btree index UNIQUE, btree (id,
lower(as_of_date)) in order to have a primary key on the table.
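
Reduced to its essentials, the setup looks like this (the column list is
trimmed; an illustration, not the real DDL):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE foo (
    id         integer NOT NULL,
    as_of_date daterange NOT NULL,
    EXCLUDE USING gist (id WITH =, as_of_date WITH &&)
);

-- the "redundant" unique btree index mentioned above
CREATE UNIQUE INDEX foo_id_lower_as_of_date ON foo (id, lower(as_of_date));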

Thanks!
Jeremy


Re: Primary key gist index?

2018-03-14 Thread Jeremy Finzel
On Wed, Mar 14, 2018 at 8:33 AM, Adrian Klaver 
wrote:

> On 03/14/2018 06:19 AM, Jeremy Finzel wrote:
>
>> Hello!  From all that I can tell, it is not possible using a btree_gist
>> index as a primary key.  If so, why not?  I have a table with this gist
>>
>
> https://www.postgresql.org/docs/10/static/btree-gist.html
>
> "In general, these operator classes will not outperform the equivalent
> standard B-tree index methods, and they lack one major feature of the
> standard B-tree code: the ability to enforce uniqueness."
>
>
> index which truly ought to be its primary key.  as_of_date is of range
>> date type:
>>
>> EXCLUDE USING gist (id WITH =, as_of_date WITH &&)
>>
>> Any direction here would be much appreciated.
>>
>> Right now, I am forced to create a redundant btree index UNIQUE, btree
>> (id, lower(as_of_date)) in order to have a primary key on the table.
>>
>> Thanks!
>> Jeremy
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>

Thank you for the ref.  But I don't understand how an exclusion constraint
does not have "the ability to enforce uniqueness", unless they just mean
that is the case "under the covers" of postgres.  That is exactly what it
does, right?  By the definition of the exclusion constraint I have above,
there cannot be more than one row with the same id and an overlapping
as_of_date.

Thanks,
Jeremy


Re: Primary key gist index?

2018-03-14 Thread Jeremy Finzel
On Wed, Mar 14, 2018 at 1:29 PM Paul Jungwirth 
wrote:

> On 03/14/2018 06:19 AM, Jeremy Finzel wrote:
> > Hello!  From all that I can tell, it is not possible using a btree_gist
> > index as a primary key.  If so, why not?  I have a table with this gist
> > index which truly ought to be its primary key.  as_of_date is of range
> > date type:
> >
> > EXCLUDE USING gist (id WITH =, as_of_date WITH &&)
>
> I'm curious why you need a primary key on this table, especially if the
> exclusion constraint is already preventing duplicate/overlapping records?
>
> Technically I think an exclusion constraint (or at least this one)
> fulfills the formal requirements of a primary key (is unique, isn't
> null), but maybe there are other primary-key duties it doesn't meet,
> like defining foreign keys that reference it. I've been on-and-off
> building an extension for temporal foreign keys at [1]. That is pretty
> new, but perhaps it will be useful/interesting to you. And if you have
> any feedback, I'd love to hear it!
>
> But anyway, maybe if you shared why the table needs a real PRIMARY KEY,
> people here can suggest something.
>
> [1] https://github.com/pjungwir/time_for_keys
>
> Yours,
>
> --
> Paul  ~{:-)
> pj@ 


Because many extensions require primary keys. I also infer primary keys for
various purposes.





found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-19 Thread Jeremy Finzel
Getting some concerning errors in one of our databases that is on 9.5.11,
during autovacuum of pg_authid and pg_auth_members in the template0 database.
I only saw some notes on the list about this error related to materialized
views.  FWIW, we did use pg_upgrade to upgrade this database from 9.4 to
9.5.  Here is an example:

2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19 12:08:33
CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before relfrozenxid
740087784","automatic vacuum of table
""template0.pg_catalog.pg_authid"""""
2018-03-19 12:08:33.957 CDT,,,14892,,5aafee91.3a2c,2,,2018-03-19 12:08:33
CDT,59/340953,0,ERROR,XX001,"found xmin 2906288383 from before relfrozenxid
740087784","automatic vacuum of table
""template0.pg_catalog.pg_auth_members"""""


Any insight would be much appreciated.

Thanks,
Jeremy


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-19 Thread Jeremy Finzel
pg_control version number:942
Catalog version number:   201510051
Database system identifier:   6351536019599012028
Database cluster state:   in production
pg_control last modified: Mon 19 Mar 2018 12:56:10 PM CDT
Latest checkpoint location:   262BE/FE96240
Prior checkpoint location:262BA/623D5E40
Latest checkpoint's REDO location:262BA/F5499E98
Latest checkpoint's REDO WAL file:0001000262BA00F5
Latest checkpoint's TimeLineID:   1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  16/3132524419
Latest checkpoint's NextOID:  1090653331
Latest checkpoint's NextMultiXactId:  2142
Latest checkpoint's NextMultiOffset:  5235
Latest checkpoint's oldestXID:1829964553
Latest checkpoint's oldestXID's DB:   12376
Latest checkpoint's oldestActiveXID:  3131774441
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16400
Latest checkpoint's oldestCommitTsXid:1829964553
Latest checkpoint's newestCommitTsXid:3132524418
Time of latest checkpoint:Mon 19 Mar 2018 12:54:08 PM CDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline:   0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no
wal_level setting:logical
wal_log_hints setting:off
max_connections setting:  2000
max_worker_processes setting: 10
max_prepared_xacts setting:   0
max_locks_per_xact setting:   64
track_commit_timestamp setting:   on
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Size of a large-object chunk: 2048
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value
Data page checksum version:   0

On Mon, Mar 19, 2018 at 12:46 PM, Alvaro Herrera 
wrote:

> Jeremy Finzel wrote:
> > Getting some concerning errors in one of our databases that is on 9.5.11,
> > on autovacuum from template0 database pg_authid and pg_auth_members.  I
> > only saw some notes on the list about this error related to materialized
> > views.  FWIW, we did use pg_upgrade to upgrade this database from 9.4 to
> > 9.5.  Here is an example:
> >
> > 2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19
> 12:08:33
> > CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before
> relfrozenxid
> > 740087784","automatic vacuum of table
> > ""template0.pg_catalog.pg_authid"""""
>
> Can you please supply output of pg_controldata?
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-19 Thread Jeremy Finzel
On Mon, Mar 19, 2018 at 12:46 PM, Alvaro Herrera 
wrote:

> Jeremy Finzel wrote:
> > Getting some concerning errors in one of our databases that is on 9.5.11,
> > on autovacuum from template0 database pg_authid and pg_auth_members.  I
> > only saw some notes on the list about this error related to materialized
> > views.  FWIW, we did use pg_upgrade to upgrade this database from 9.4 to
> > 9.5.  Here is an example:
> >
> > 2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19
> 12:08:33
> > CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before
> relfrozenxid
> > 740087784","automatic vacuum of table
> > ""template0.pg_catalog.pg_authid"""""
>
> Can you please supply output of pg_controldata?
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Please forgive my accidental top-post.  Here:

pg_control version number:942
Catalog version number:   201510051
Database system identifier:   6351536019599012028
Database cluster state:   in production
pg_control last modified: Mon 19 Mar 2018 12:56:10 PM CDT
Latest checkpoint location:   262BE/FE96240
Prior checkpoint location:262BA/623D5E40
Latest checkpoint's REDO location:262BA/F5499E98
Latest checkpoint's REDO WAL file:0001000262BA00F5
Latest checkpoint's TimeLineID:   1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  16/3132524419
Latest checkpoint's NextOID:  1090653331
Latest checkpoint's NextMultiXactId:  2142
Latest checkpoint's NextMultiOffset:  5235
Latest checkpoint's oldestXID:1829964553
Latest checkpoint's oldestXID's DB:   12376
Latest checkpoint's oldestActiveXID:  3131774441
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16400
Latest checkpoint's oldestCommitTsXid:1829964553
Latest checkpoint's newestCommitTsXid:3132524418
Time of latest checkpoint:Mon 19 Mar 2018 12:54:08 PM CDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline:   0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no
wal_level setting:logical
wal_log_hints setting:off
max_connections setting:  2000
max_worker_processes setting: 10
max_prepared_xacts setting:   0
max_locks_per_xact setting:   64
track_commit_timestamp setting:   on
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Size of a large-object chunk: 2048
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value
Data page checksum version:   0


Thanks,
Jeremy


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-19 Thread Jeremy Finzel
On Mon, Mar 19, 2018 at 1:17 PM, Andres Freund  wrote:

> Hi Jeremy, Alvaro,
>
> On 2018-03-19 13:00:13 -0500, Jeremy Finzel wrote:
> > On Mon, Mar 19, 2018 at 12:46 PM, Alvaro Herrera <
> alvhe...@alvh.no-ip.org>
> > wrote:
> >
> > > Jeremy Finzel wrote:
> > > > Getting some concerning errors in one of our databases that is on
> 9.5.11,
> > > > on autovacuum from template0 database pg_authid and
> pg_auth_members.  I
> > > > only saw some notes on the list about this error related to
> materialized
> > > > views.  FWIW, we did use pg_upgrade to upgrade this database from
> 9.4 to
> > > > 9.5.  Here is an example:
> > > >
> > > > 2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19
> > > 12:08:33
> > > > CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before
> > > relfrozenxid
> > > > 740087784","automatic vacuum of table
> > > > ""template0.pg_catalog.pg_authid"""""
> > >
> > > Can you please supply output of pg_controldata?
>
> > Latest checkpoint's NextXID:  16/3132524419
> > Latest checkpoint's NextMultiXactId:  2142
> > Latest checkpoint's NextMultiOffset:  5235
> > Latest checkpoint's oldestXID:1829964553
> > Latest checkpoint's oldestXID's DB:   12376
> > Latest checkpoint's oldestActiveXID:  3131774441
> > Latest checkpoint's oldestMultiXid:   1
> > Latest checkpoint's oldestMulti's DB: 16400
>
> Hm, based on these it doesn't look like multixacts were involved (based
> on oldestMultiXid it's highly unlikley there've multi wraparound, and
> there's not much multixact usage on system tables anyway).  Which
> suggests that there might have been actual corrpution here.
>
> Jeremy:
> - which version of 9.4 and 9.5 ran on this? Do you know?
>

We upgraded to 9.5.5, and today we are running 9.5.11.  And actually we
upgraded from 9.3, not 9.4.  We are still trying to figure out which point
release we were on at 9.3.


> - Can you install the pageinspect extension? If so, it might be a
>   CREATE EXTENSION pageinspect;
>   CREATE OR REPLACE FUNCTION check_rel(rel regclass, OUT blockno int8, OUT
> lp int2, OUT xmin xid)
> RETURNS SETOF RECORD
> LANGUAGE SQL
> AS $$
> SELECT blockno, lp, t_xmin
> FROM
> generate_series(0, pg_relation_size($1::text) / 8192 - 1) blockno,
> -- every block in the relation
> heap_page_items(get_raw_page($1::text, blockno::int4)) -- every
> item on the page
> WHERE
> t_xmin IS NOT NULL -- filter out empty items
> AND t_xmin != 1 -- filter out bootstrap
> AND t_xmin != 2 -- filter out frozen transaction id
> AND (t_infomask & ((x'0100' | x'0200')::int)) != ((x'0100' |
> x'0200')::int) -- filter out frozen rows with xid present
> AND age(t_xmin) > age((SELECT relfrozenxid FROM pg_class WHERE oid
> = $1)) -- xid cutoff filter
> $$;
>   SELECT * FROM check_rel('pg_authid') LIMIT 100;
>

Small note - Needs to be this because != is not supported for xid:

AND NOT t_xmin = 1 -- filter out bootstrap
AND NOT t_xmin = 2 -- filter out frozen transaction id


>
>   and then display all items for one of the affected pages like
>   SELECT * FROM heap_page_items(get_raw_page('pg_authid', 34343));
>
>
> Alvaro:
> - Hm, we talked about code adding context for these kind of errors,
>   right? Is that just skipped for csvlog?
> - Alvaro, does the above check_rel() function make sense?
>
> Greetings,
>
> Andres Freund
>

The function does NOT show any issue with either of those tables.

One very interesting thing that is puzzling us - we have taken several SAN
snapshots of the live system, all running the exact same version 9.5.11, and
they do NOT show the same error when we vacuum these tables.  It makes us
wonder if simply a db restart would solve the issue.

We will continue to investigate, but we are interested in your feedback about
what we have seen thus far.

Thanks,
Jeremy


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-19 Thread Jeremy Finzel
On Mon, Mar 19, 2018 at 2:41 PM, Andres Freund  wrote:

> On 2018-03-19 14:37:24 -0500, Jeremy Finzel wrote:
> > We upgraded to 9.5.5, and today we are running 9.5.11.  And actually we
> > upgraded from 9.3, not 9.4.  We are still trying to figure out which
> point
> > release we were on at 9.3.
>
> Ok.  IIRC there used to be a bug a few years back that sometimes lead to
> highly contended pages being skipped during vacuum, and we'd still
> update relfrozenxid. IIRC it required the table to be extended at the
> same time or something?
>
>
> >
> > > - Can you install the pageinspect extension? If so, it might be a
> > >   CREATE EXTENSION pageinspect;
> > >   CREATE OR REPLACE FUNCTION check_rel(rel regclass, OUT blockno int8,
> OUT
> > > lp int2, OUT xmin xid)
> > > RETURNS SETOF RECORD
> > > LANGUAGE SQL
> > > AS $$
> > > SELECT blockno, lp, t_xmin
> > > FROM
> > > generate_series(0, pg_relation_size($1::text) / 8192 - 1)
> blockno,
> > > -- every block in the relation
> > > heap_page_items(get_raw_page($1::text, blockno::int4)) --
> every
> > > item on the page
> > > WHERE
> > > t_xmin IS NOT NULL -- filter out empty items
> > > AND t_xmin != 1 -- filter out bootstrap
> > > AND t_xmin != 2 -- filter out frozen transaction id
> > > AND (t_infomask & ((x'0100' | x'0200')::int)) != ((x'0100' |
> > > x'0200')::int) -- filter out frozen rows with xid present
> > > AND age(t_xmin) > age((SELECT relfrozenxid FROM pg_class WHERE
> oid
> > > = $1)) -- xid cutoff filter
> > > $$;
> > >   SELECT * FROM check_rel('pg_authid') LIMIT 100;
> > >
> >
> > Small note - Needs to be this because != is not supported for xid:
> >
> > AND NOT t_xmin = 1 -- filter out bootstrap
> > AND NOT t_xmin = 2 -- filter out frozen transaction id
>
> Only on older releases ;). But yea, that looks right.
>
>
>
> > >   and then display all items for one of the affected pages like
> > >   SELECT * FROM heap_page_items(get_raw_page('pg_authid', 34343));
> > >
> > >
> > > Alvaro:
> > > - Hm, we talked about code adding context for these kind of errors,
> > >   right? Is that just skipped for csvlog?
> > > - Alvaro, does the above check_rel() function make sense?
> > >
> > > Greetings,
> > >
> > > Andres Freund
> > >
> >
> > The function does NOT show any issue with either of those tables.
>
> Uh, huh?  Alvaro, do you see a bug in my query?
>
> Greetings,
>
> Andres Freund
>

FWIW, if I remove the last filter, I get these rows and I believe row 7/57/
2906288382 is the one generating error:

SELECT * FROM check_rel('pg_authid') LIMIT 100;
 blockno | lp |xmin
-++
   7 |  4 | 2040863716
   7 |  5 | 2040863716
   7 |  8 | 2041172882
   7 |  9 | 2041172882
   7 | 12 | 2041201779
   7 | 13 | 2041201779
   7 | 16 | 2089742733
   7 | 17 | 2090021318
   7 | 18 | 2090021318
   7 | 47 | 2090021898
   7 | 48 | 2090021898
   7 | 49 | 2102749003
   7 | 50 | 2103210571
   7 | 51 | 2103210571
   7 | 54 | 2154640913
   7 | 55 | 2163849781
   7 | 56 | 2295315714
   7 | 57 | 2906288382
   7 | 58 | 2906329443
   7 | 60 | 3131766386
   8 |  1 | 2089844462
   8 |  2 | 2089844462
   8 |  3 | 2089844463
   8 |  6 | 2089844463
   8 |  9 | 2295318868
(25 rows)


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-19 Thread Jeremy Finzel
On Mon, Mar 19, 2018 at 2:56 PM, Andres Freund  wrote:

> Hi,
>
> On 2018-03-19 14:53:58 -0500, Jeremy Finzel wrote:
> > FWIW, if I remove the last filter, I get these rows and I believe row
> 7/57/
> > 2906288382 is the one generating error:
>
> Oh, yea, that makes sense. It's wrapped around and looks like it's from
> the future.
>
> > SELECT * FROM check_rel('pg_authid') LIMIT 100;
> >  blockno | lp |xmin
> > -++
> >7 |  4 | 2040863716
> >7 |  5 | 2040863716
> >7 |  8 | 2041172882
> >7 |  9 | 2041172882
> >7 | 12 | 2041201779
> >7 | 13 | 2041201779
> >7 | 16 | 2089742733
> >7 | 17 | 2090021318
> >7 | 18 | 2090021318
> >7 | 47 | 2090021898
> >7 | 48 | 2090021898
> >7 | 49 | 2102749003
> >7 | 50 | 2103210571
> >7 | 51 | 2103210571
> >7 | 54 | 2154640913
> >7 | 55 | 2163849781
> >7 | 56 | 2295315714
> >7 | 57 | 2906288382
> >7 | 58 | 2906329443
> >7 | 60 | 3131766386
> >8 |  1 | 2089844462
> >8 |  2 | 2089844462
> >8 |  3 | 2089844463
> >8 |  6 | 2089844463
> >8 |  9 | 2295318868
> > (25 rows)
>
> Could you show the contents of those two pages with a query like I had
> in an earlier email?
>
> Greetings,
>
> Andres Freund
>

SELECT heap_page_items(get_raw_page('pg_authid', 7));


heap_page_items

 
(1,4720,1,108,1897434979,0,0,"(7,1)",11,2825,32,1101110011100000,507769370)
 
(2,4608,1,108,1897442758,0,18,"(7,2)",11,2825,32,100011111011001000100000,507776451)
 
(3,4496,1,108,1897442758,0,20,"(7,3)",11,2825,32,101000111011001000100000,507776452)
 
(4,4384,1,108,2040863716,0,37,"(7,4)",11,2313,32,1011011011101110001100101000,525105004)
 
(5,4272,1,108,2040863716,0,39,"(7,5)",11,2313,32,10001011011011101110001100101000,525105005)
 (6,0,3,0,)
 (7,0,3,0,)
 
(8,4160,1,108,2041172882,0,49,"(7,8)",11,2313,32,1111010010101100011100101000,525219118)
 
(9,4048,1,108,2041172882,0,51,"(7,9)",11,2313,32,1000010010101100011100101000,525219119)
 (10,0,3,0,)
 (11,0,3,0,)
 
(12,3936,1,108,2041201779,0,181,"(7,12)",11,2313,32,1000110101000100011100101000,525236779)
 
(13,3824,1,108,2041201779,0,183,"(7,13)",11,2313,32,101101000100011100101000,525236780)
 (14,0,3,0,)
 (15,0,3,0,)
 
(16,3712,1,108,2089742733,0,0,"(7,16)",11,2313,32,110001011100111000111000,532706210)
 
(17,3600,1,108,2090021318,0,1,"(7,17)",11,2313,32,11001111010010111000,532753458)
 
(18,3488,1,108,2090021318,0,3,"(7,18)",11,2313,32,100011001111010010111000,532753459)
 
(19,8080,1,108,753125991,0,2,"(7,19)",11,2825,32,101100010101110010111111,236796556)
 
(20,7968,1,108,753125991,0,4,"(7,20)",11,2825,32,1000101100010101110010111111,236796557)
 
(21,7856,1,108,753125992,0,19,"(7,21)",11,2825,32,1000110010010101110010111111,236796563)
 
(22,7744,1,108,753125992,0,21,"(7,22)",11,2825,32,101010010101110010111111,236796564)
 
(23,7632,1,108,753125993,0,36,"(7,23)",11,2825,32,110110010101110010111111,236796570)
 
(24,7520,1,108,753125993,0,38,"(7,24)",11,2825,32,1000110110010101110010111111,236796571)
 
(25,7408,1,108,753125994,0,53,"(7,25)",11,2825,32,100011010101110010111111,236796577)
 
(26,7296,1,108,753125994,0,55,"(7,26)",11,2825,32,110001010101110010111111,236796578)
 
(27,7184,1,108,753125995,0,70,"(7,27)",11,2825,32,1001010101

Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-19 Thread Jeremy Finzel
On Mon, Mar 19, 2018 at 3:01 PM, Jeremy Finzel  wrote:

>
>
> On Mon, Mar 19, 2018 at 2:56 PM, Andres Freund  wrote:
>
>> Hi,
>>
>> On 2018-03-19 14:53:58 -0500, Jeremy Finzel wrote:
>> > FWIW, if I remove the last filter, I get these rows and I believe row
>> 7/57/
>> > 2906288382 is the one generating error:
>>
>> Oh, yea, that makes sense. It's wrapped around and looks like it's from
>> the future.
>>
>> > SELECT * FROM check_rel('pg_authid') LIMIT 100;
>> >  blockno | lp |xmin
>> > -++
>> >7 |  4 | 2040863716
>> >7 |  5 | 2040863716
>> >7 |  8 | 2041172882
>> >7 |  9 | 2041172882
>> >7 | 12 | 2041201779
>> >7 | 13 | 2041201779
>> >7 | 16 | 2089742733
>> >7 | 17 | 2090021318
>> >7 | 18 | 2090021318
>> >7 | 47 | 2090021898
>> >7 | 48 | 2090021898
>> >7 | 49 | 2102749003
>> >7 | 50 | 2103210571
>> >7 | 51 | 2103210571
>> >7 | 54 | 2154640913
>> >7 | 55 | 2163849781
>> >7 | 56 | 2295315714
>> >7 | 57 | 2906288382
>> >7 | 58 | 2906329443
>> >7 | 60 | 3131766386
>> >8 |  1 | 2089844462
>> >8 |  2 | 2089844462
>> >8 |  3 | 2089844463
>> >8 |  6 | 2089844463
>> >8 |  9 | 2295318868
>> > (25 rows)
>>
>> Could you show the contents of those two pages with a query like I had
>> in an earlier email?
>>
>> Greetings,
>>
>> Andres Freund
>>
>
> SELECT heap_page_items(get_raw_page('pg_authid', 7));
>
>
> heap_page_items
> 
> 
> 
>  (1,4720,1,108,1897434979,0,0,"(7,1)",11,2825,32,
> 110111001100
> 00100000,507769370)
>  (2,4608,1,108,1897442758,0,18,"(7,2)",11,2825,32,
> 1000111110110010
> 00100000,507776451)
>  (3,4496,1,108,1897442758,0,20,"(7,3)",11,2825,32,
> 1010001110110010
> 00100000,507776452)
>  (4,4384,1,108,2040863716,0,37,"(7,4)",11,2313,32,
> 10110110111011100011
> 00101000,525105004)
>  (5,4272,1,108,2040863716,0,39,"(7,5)",11,2313,32,
> 100010110110111011100011
> 00101000,525105005)
>  (6,0,3,0,)
>  (7,0,3,0,)
>  (8,4160,1,108,2041172882,0,49,"(7,8)",11,2313,32,
> 11110100101011000111
> 00101000,525219118)
>  (9,4048,1,108,2041172882,0,51,"(7,9)",11,2313,32,
> 10000100101011000111
> 00101000,525219119)
>  (10,0,3,0,)
>  (11,0,3,0,)
>  (12,3936,1,108,2041201779,0,181,"(7,12)",11,2313,32,
> 10001101010001000111
> 00101000,525236779)
>  (13,3824,1,108,2041201779,0,183,"(7,13)",11,2313,32,
> 1011010001000111
> 00101000,525236780)
>  (14,0,3,0,)
>  (15,0,3,0,)
>  (16,3712,1,108,2089742733,0,0,"(7,16)",11,2313,32,
> 1100010111001110
> 00111000,532706210)
>  (17,3600,1,108,2090021318,0,1,"(7,17)",11,2313,32,
> 1100111101001000
> 00111000,532753458)
>  (18,3488,1,108,2090021318,0,3,"(7,18)",11,2313,32,
> 10001100111101001000
> 00111000,532753459)
>  (19,8080,1,108,753125991,0,2,"(7,19)",11,2825,32,
> 10110001010111001011
> 1111,236796556)
>  (20,7968,1,108,753125991,0,4,"(7,20)",11,2825,32,
> 100010110001010111001011
> 1111,236796557)
>  (21,7856,1,108,753125992,0,19,"(7,21)",11,2825,32,
> 100

Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-19 Thread Jeremy Finzel
On Mon, Mar 19, 2018 at 3:53 PM, Peter Geoghegan  wrote:

> On Mon, Mar 19, 2018 at 1:01 PM, Jeremy Finzel  wrote:
> > SELECT heap_page_items(get_raw_page('pg_authid', 7));
>
> Can you post this?
>
> SELECT * FROM page_header(get_raw_page('pg_authid', 7));
>
> --
> Peter Geoghegan
>

@Peter :

staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7));
  lsn   | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
+--+---+---+---+-+--+-+---
 262B4/10FDC478 |0 | 1 |   304 |  2224 |8192 | 8192 |
 4 | 0
(1 row)

@Andres :

This is from snapshot (on 9.5.12, but we didn't have the error either on a
9.5.11 snap):

 heap_page_items
-
 (1,0,0,0,)
 (2,0,0,0,)
 (3,0,0,0,)
 (4,0,0,0,)
 (5,0,0,0,)
 
(6,8080,1,108,3137434815,0,0,"(7,6)",11,10505,32,111011001001,2166427518)
 
(7,7936,1,144,3137434816,0,0,"(7,7)",11,10507,32,1100011101110111010110111011,3504005358)
 (8,0,0,0,)
 (9,0,0,0,)
 
(10,7792,1,144,3137434817,0,0,"(7,10)",11,10507,32,1100010101100011011101101000,401353834)
 
(11,7680,1,108,3137434818,0,0,"(7,11)",11,10505,32,111000110100100100010111,2248708806)
 (12,0,0,0,)
 (13,0,0,0,)
 
(14,7568,1,108,3137434819,0,0,"(7,14)",11,10505,32,1111000011011011100010100101,2770187214)
 
(15,7456,1,108,3137434820,0,0,"(7,15)",11,10505,32,100000010100010100001011,2235343503)
 (16,0,0,0,)
 (17,0,0,0,)
 (18,0,0,0,)
 (19,0,0,0,)
 (20,0,0,0,)
 (21,0,0,0,)
 (22,0,0,0,)
 (23,0,0,0,)
 (24,0,0,0,)
 (25,0,0,0,)
 (26,0,0,0,)
 (27,0,0,0,)
 (28,0,0,0,)
 (29,0,0,0,)
 (30,0,0,0,)
 (31,0,0,0,)
 (32,0,0,0,)
 (33,0,0,0,)
 (34,0,0,0,)
 (35,0,0,0,)
 (36,0,0,0,)
 (37,0,0,0,)
 (38,0,0,0,)
 (39,0,0,0,)
 (40,0,0,0,)
 (41,0,0,0,)
 (42,0,0,0,)
 (43,0,0,0,)
 (44,0,0,0,)
 (45,0,0,0,)
 (46,0,0,0,)
 (47,0,0,0,)
 (48,0,0,0,)
 (49,0,0,0,)
 (50,0,0,0,)
 (51,0,0,0,)
 
(52,7344,1,108,3137434821,0,0,"(7,52)",11,10505,32,1000110110111000101001010101,2191859675)
 
(53,7232,1,108,3137434822,0,0,"(7,53)",11,10505,32,1110101101000110011011100100,661027542)
 (54,0,0,0,)
 (55,0,0,0,)
 (56,0,0,0,)
 (57,0,0,0,)
 (58,0,0,0,)
 (59,0,0,0,)
 (60,0,0,0,)
 
(61,7120,1,108,3137434823,0,0,"(7,61)",11,10505,32,1001011101011101010111010100,732568296)
 
(62,6976,1,144,3137434824,0,0,"(7,62)",11,10507,32,1100101001100100101011010100,674571301)
 
(63,6864,1,108,3137434825,0,0,"(7,63)",11,10505,32,1000100110110101111001010001,2319695577)
 
(64,6720,1,144,3137434826,0,0,"(7,64)",11,10507,32,1100011001100011100100101000,345892418)
 
(65,6608,1,108,3137434827,0,0,"(7,65)",11,10505,32,1101010100010010101011001010,1398049410)
 
(66,6496,1,108,3137434828,0,0,"(7,66)",11,10505,32,1010100000110101001000101010,1414188820)
 
(67,6384,1,108,3137434829,0,0,"(7,67)",11,10505,32,1011010101110011001110101001,2513301164)
 (68,0,0,0,)
 (69,0,0,0,)
 (70,0,0,0,)
(70 rows)


Thanks,
Jeremy


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-19 Thread Jeremy Finzel
On Mon, Mar 19, 2018 at 4:12 PM, Peter Geoghegan  wrote:

> On Mon, Mar 19, 2018 at 1:55 PM, Jeremy Finzel  wrote:
> > @Peter :
> >
> > staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7));
> >   lsn   | checksum | flags | lower | upper | special | pagesize |
> > version | prune_xid
> > +--+---+---+---+
> -+--+-+---
> >  262B4/10FDC478 |0 | 1 |   304 |  2224 |8192 | 8192 |
> > 4 | 0
> > (1 row)
>
> Thanks.
>
> That looks normal. I wonder if the contents of that page looks
> consistent with the rest of the table following manual inspection,
> though. I recently saw system catalog corruption on a 9.5 instance
> where an entirely different relation's page ended up in pg_attribute
> and pg_depend. They were actually pristine index pages from an
> application index. I still have no idea why this happened.
>
> This is very much a guess, but it can't hurt to check if the contents
> of the tuples themselves are actually sane by inspecting them with
> "SELECT * FROM pg_authid". heap_page_items() doesn't actually care
> about the shape of the tuples in the page, so this might have been
> missed.
>
> --
> Peter Geoghegan
>

The data all looks fine.  I even forced an index scan on each of the two
indexes, which also look fine.
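
For reference, this is roughly how I forced the index path (session-local
settings only; the two predicates just exercise each of the pg_authid indexes):

BEGIN;
SET LOCAL enable_seqscan = off;
SET LOCAL enable_bitmapscan = off;
EXPLAIN ANALYZE SELECT * FROM pg_authid WHERE rolname = 'postgres';
EXPLAIN ANALYZE SELECT * FROM pg_authid WHERE oid = 10;
ROLLBACK;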

Thanks,
Jeremy


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-20 Thread Jeremy Finzel
On Mon, Mar 19, 2018 at 3:55 PM, Jeremy Finzel  wrote:

>
>
> On Mon, Mar 19, 2018 at 3:53 PM, Peter Geoghegan  wrote:
>
>> On Mon, Mar 19, 2018 at 1:01 PM, Jeremy Finzel  wrote:
>> > SELECT heap_page_items(get_raw_page('pg_authid', 7));
>>
>> Can you post this?
>>
>> SELECT * FROM page_header(get_raw_page('pg_authid', 7));
>>
>> --
>> Peter Geoghegan
>>
>
> @Peter :
>
> staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7));
>   lsn   | checksum | flags | lower | upper | special | pagesize |
> version | prune_xid
> +--+---+---+---+
> -+--+-+---
>  262B4/10FDC478 |0 | 1 |   304 |  2224 |8192 | 8192 |
>  4 | 0
> (1 row)
>
> @Andres :
>
> This is from snapshot (on 9.5.12, but we didn't have the error either on a
> 9.5.11 snap):
>
>
>  heap_page_items
> 
> 
> -
>  (1,0,0,0,)
>  (2,0,0,0,)
>  (3,0,0,0,)
>  (4,0,0,0,)
>  (5,0,0,0,)
>  (6,8080,1,108,3137434815,0,0,"(7,6)",11,10505,32,
> 11101000
> 01001001,2166427518)
>  (7,7936,1,144,3137434816,0,0,"(7,7)",11,10507,32,
> 11000111011101110101
> 10111011,3504005358)
>  (8,0,0,0,)
>  (9,0,0,0,)
>  (10,7792,1,144,3137434817,0,0,"(7,10)",11,10507,32,
> 11000101011000110111
> 01101000,401353834)
>  (11,7680,1,108,3137434818,0,0,"(7,11)",11,10505,32,
> 11100011010010010001
> 0111,2248708806)
>  (12,0,0,0,)
>  (13,0,0,0,)
>  (14,7568,1,108,3137434819,0,0,"(7,14)",11,10505,32,
> 1111000011011011
> 100010100101,2770187214)
>  (15,7456,1,108,3137434820,0,0,"(7,15)",11,10505,32,
> 10000001010001010011
> 11001011,2235343503)
>  (16,0,0,0,)
>  (17,0,0,0,)
>  (18,0,0,0,)
>  (19,0,0,0,)
>  (20,0,0,0,)
>  (21,0,0,0,)
>  (22,0,0,0,)
>  (23,0,0,0,)
>  (24,0,0,0,)
>  (25,0,0,0,)
>  (26,0,0,0,)
>  (27,0,0,0,)
>  (28,0,0,0,)
>  (29,0,0,0,)
>  (30,0,0,0,)
>  (31,0,0,0,)
>  (32,0,0,0,)
>  (33,0,0,0,)
>  (34,0,0,0,)
>  (35,0,0,0,)
>  (36,0,0,0,)
>  (37,0,0,0,)
>  (38,0,0,0,)
>  (39,0,0,0,)
>  (40,0,0,0,)
>  (41,0,0,0,)
>  (42,0,0,0,)
>  (43,0,0,0,)
>  (44,0,0,0,)
>  (45,0,0,0,)
>  (46,0,0,0,)
>  (47,0,0,0,)
>  (48,0,0,0,)
>  (49,0,0,0,)
>  (50,0,0,0,)
>  (51,0,0,0,)
>  (52,7344,1,108,3137434821,0,0,"(7,52)",11,10505,32,
> 10001101101110001010
> 01010101,2191859675)
>  (53,7232,1,108,3137434822,0,0,"(7,53)",11,10505,32,
> 1110101101000110
> 011011100100,661027542)
>  (54,0,0,0,)
>  (55,0,0,0,)
>  (56,0,0,0,)
>  (57,0,0,0,)
>  (58,0,0,0,)
>  (59,0,0,0,)
>  (60,0,0,0,)
>  (61,7120,1,108,3137434823,0,0,"(7,61)",11,10505,32,
> 1001011101011101
> 010111010100,732568296)
>  (62,6976,1,144,3137434824,0,0,"(7,62)",11,10507,32,
> 11001010011001001010
> 11010100,674571301)
>  (63,6864,1,108,3137434825,0,0,"(7,63)",11,10505,32,
> 10001001101101011100
> 001001010001,2319695577)
>  (64,6720,1,144,3137434826,0,0,"(7,64)",11,10507,32,
> 1100011001100011
> 100100101000,345892418)
>  (65,6608,1,108,3137434827,0,0,"(7,65)",11,10505,32,
> 1101010100010010
> 101011001010,1398049410)
>

Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-21 Thread Jeremy Finzel
On Tue, Mar 20, 2018 at 11:19 AM, Jeremy Finzel  wrote:

>
>
> On Mon, Mar 19, 2018 at 3:55 PM, Jeremy Finzel  wrote:
>
>>
>>
>> On Mon, Mar 19, 2018 at 3:53 PM, Peter Geoghegan  wrote:
>>
>>> On Mon, Mar 19, 2018 at 1:01 PM, Jeremy Finzel 
>>> wrote:
>>> > SELECT heap_page_items(get_raw_page('pg_authid', 7));
>>>
>>> Can you post this?
>>>
>>> SELECT * FROM page_header(get_raw_page('pg_authid', 7));
>>>
>>> --
>>> Peter Geoghegan
>>>
>>
>> @Peter :
>>
>> staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7));
>>   lsn   | checksum | flags | lower | upper | special | pagesize |
>> version | prune_xid
>> +--+---+---+---+
>> -+--+-+---
>>  262B4/10FDC478 |0 | 1 |   304 |  2224 |8192 | 8192
>> |   4 | 0
>> (1 row)
>>
>> @Andres :
>>
>> This is from snapshot (on 9.5.12, but we didn't have the error either on
>> a 9.5.11 snap):
>>
>>
>>  heap_page_items
>> 
>> 
>> -
>>  (1,0,0,0,)
>>  (2,0,0,0,)
>>  (3,0,0,0,)
>>  (4,0,0,0,)
>>  (5,0,0,0,)
>>  (6,8080,1,108,3137434815,0,0,
>> "(7,6)",11,10505,32,1000
>> 011011001001,2166427518)
>>  (7,7936,1,144,3137434816,0,0,
>> "(7,7)",11,10507,32,1100
>> 011101110111010110111011,3504005358)
>>  (8,0,0,0,)
>>  (9,0,0,0,)
>>  (10,7792,1,144,3137434817,0,0
>> ,"(7,10)",11,10507,32,11
>> 00010101100011011101101000,401353834)
>>  (11,7680,1,108,3137434818,0,0
>> ,"(7,11)",11,10505,32,10
>> 00011000110100100100010111,2248708806)
>>  (12,0,0,0,)
>>  (13,0,0,0,)
>>  (14,7568,1,108,3137434819,0,0
>> ,"(7,14)",11,10505,32,10
>> 000111000011011011100010100101,2770187214)
>>  (15,7456,1,108,3137434820,0,0
>> ,"(7,15)",11,10505,32,10
>> 0000010100010100001011,2235343503)
>>  (16,0,0,0,)
>>  (17,0,0,0,)
>>  (18,0,0,0,)
>>  (19,0,0,0,)
>>  (20,0,0,0,)
>>  (21,0,0,0,)
>>  (22,0,0,0,)
>>  (23,0,0,0,)
>>  (24,0,0,0,)
>>  (25,0,0,0,)
>>  (26,0,0,0,)
>>  (27,0,0,0,)
>>  (28,0,0,0,)
>>  (29,0,0,0,)
>>  (30,0,0,0,)
>>  (31,0,0,0,)
>>  (32,0,0,0,)
>>  (33,0,0,0,)
>>  (34,0,0,0,)
>>  (35,0,0,0,)
>>  (36,0,0,0,)
>>  (37,0,0,0,)
>>  (38,0,0,0,)
>>  (39,0,0,0,)
>>  (40,0,0,0,)
>>  (41,0,0,0,)
>>  (42,0,0,0,)
>>  (43,0,0,0,)
>>  (44,0,0,0,)
>>  (45,0,0,0,)
>>  (46,0,0,0,)
>>  (47,0,0,0,)
>>  (48,0,0,0,)
>>  (49,0,0,0,)
>>  (50,0,0,0,)
>>  (51,0,0,0,)
>>  (52,7344,1,108,3137434821,0,0
>> ,"(7,52)",11,10505,32,10
>> 00110110111000101001010101,2191859675)
>>  (53,7232,1,108,3137434822,0,0
>> ,"(7,53)",11,10505,32,10
>> 000110101101000110011011100100,661027542)
>>  (54,0,0,0,)
>>  (55,0,0,0,)
>>  (56,0,0,0,)
>>  (57,0,0,0,)
>>  (58,0,0,0,)
>>  (59,0,0,0,)
>>  (60,0,0,0,)
>>  (61,7120,1,108,3137434823,0,0
>> ,"(7,61)",11,10505,32,10
>> 01011101011101010111010100,732568296)
>>  (62,6976,1,144,3137434824,0,0
>> ,"(7,62)",11,10507,32,1100

Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-22 Thread Jeremy Finzel
On Wed, Mar 21, 2018 at 4:29 PM, Peter Geoghegan  wrote:

> On Wed, Mar 21, 2018 at 1:38 PM, Jeremy Finzel  wrote:
> > A server restart and upgrade to 9.5.12 (at the same time), as expected,
> made
> > the issue go away.  Still doesn't give us any answers as to what
> happened or
> > if it would happen again!  Thanks for the feeback.
>
> You may still want to use amcheck to look for problems. The version on
> Github works with 9.5, and there are Redhat and Debian pgdg packages.
> See:
>
> https://github.com/petergeoghegan/amcheck
>
> The "heapallindexed" option will be of particular interest to you -
> that option verifies that the table has matching rows for a target
> index (in addition to testing the structure of a target B-Tree index
> itself). This is probably the best general test for corruption that is
> available. There is a fair chance that this will reveal new
> information.
>
> --
> Peter Geoghegan
>

Thank you for the recommendation.  I ran both amcheck functions on all 4
indexes of those 2 tables with heapallindexed = true, but no issues were
found.
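
For the archives, the calls were roughly as follows (the four system indexes on
those two tables; the second argument is heapallindexed):

SELECT bt_index_check('pg_authid_rolname_index'::regclass, true);
SELECT bt_index_check('pg_authid_oid_index'::regclass, true);
SELECT bt_index_check('pg_auth_members_role_member_index'::regclass, true);
SELECT bt_index_check('pg_auth_members_member_role_index'::regclass, true);
-- and the same four calls again with bt_index_parent_check(..., true)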

Thanks,
Jeremy


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-03-22 Thread Jeremy Finzel
On Thu, Mar 22, 2018 at 3:20 PM, Peter Geoghegan  wrote:

> On Thu, Mar 22, 2018 at 12:27 PM, Jeremy Finzel  wrote:
> > Thank you for the recommendation.  I ran both amcheck functions on all 4
> > indexes of those 2 tables with heapallindexed = true, but no issues were
> > found.
>
> Probably wouldn't hurt to run it against all indexes, if you can make
> time for that. If you can generalize from the example query that calls
> the bt_index_check() function, but set
> "heapallindexed=>i.indisprimary" and remove "n.nspname =
> 'pg_catalog'", as well as "LIMIT 10".  This will test tables and
> indexes from all schemas, which might be interesting.
> --
> Peter Geoghegan
>

I am running this on a SAN snapshot of our production system.  I assume
that this will give me a valid check for file-system-level corruption.  I
am going to kick it off and see if I find anything interesting.
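
The generalized query I am kicking off is roughly the example from the amcheck
documentation with Peter's changes applied (heapallindexed only for primary
keys, no schema filter, no LIMIT):

SELECT bt_index_check(index => c.oid, heapallindexed => i.indisprimary),
       c.relname,
       c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
-- don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;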

Thanks,
Jeremy


Re: Concurrent CTE

2018-04-04 Thread Jeremy Finzel
On Wed, Apr 4, 2018 at 3:20 AM Artur Formella  wrote:

> Hello!
> We have a lot of big CTE (~40 statements, ~1000 lines) for very dynamic
> OLTP content and avg response time 50-300ms. Our setup has 96 threads
> (Intel Xeon Gold 6128), 256 GB RAM and 12 SSD (3 tablespaces). DB size <
> RAM.
> Simplifying the problem:
>
> WITH aa as (
>SELECT * FROM table1
> ), bb (
>SELECT * FROM table2
> ), cc (
>SELECT * FROM table3
> ), dd (
>SELECT * FROM aa,bb
> ), ee (
>SELECT * FROM aa,bb,cc
> ), ff (
>SELECT * FROM ee,dd
> ), gg (
>SELECT * FROM table4
> ), hh (
>SELECT * FROM aa
> )
> SELECT * FROM gg,hh,ff /* primary statement */
>
> Execution now:
> time-->
> Thread1: aa | bb | cc | dd | ee | ff | gg | hh | primary
>
> And the question: is it possible to achieve more concurrent execution
> plan to reduce the response time? For example:
> Thread1: aa | dd | ff | primary
> Thread2: bb | ee | gg
> Thread3: cc | -- | hh
>
> Table1, table2 and table3 are located on separate tablespaces and are
> independent.
> Partial results (aa,bb,cc,dd,ee) are quite big and slow (full text
> search, arrays, custom collations, function scans...).
>
> We consider resigning from the CTE and rewrite to RX Java but we are
> afraid of downloading partial results and sending it back with WHERE
> IN(...).
>
> Thanks!
>
> Artur Formella


It is very difficult from your example to tell just what kind of data you
are querying and why you are doing it this way. I will give it a try.

If you are filtering any of this data later, each CTE acts as an optimization
fence, so those filters cannot be pushed down into it. Also, in your example it
makes no sense to have CTE aa when you could just cross join table1 directly in
all your other CTEs (and the same goes for bb and cc).
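
For example, a rough sketch of what I mean - reference the tables directly
instead of wrapping each one in its own pass-through CTE:

WITH dd AS (
   SELECT * FROM table1, table2
), ee AS (
   SELECT * FROM table1, table2, table3
)
SELECT * FROM dd, ee;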

In my experience, you are also not going to get a great query plan with that
many CTEs. Are you using functions or prepared statements, or are you paying
the price of planning this query every time?

It is hard to tell, but your example leads me to question whether there are
some serious issues in your db design. Where are your joins, and where are you
leveraging indexes?  It is also very easy to misuse RAISE and function scans
and make performance even worse.

Thanks,
Jeremy

>


Query function arg data types ONLY (no arg names)

2018-04-28 Thread Jeremy Finzel
It appears that neither pg_get_function_arguments
nor pg_get_function_identity_arguments could be used for this.  I want to
get function argument data types from the catalog by ordinal position,
without the argument name.

For example, I want the same information for these 2 functions:

foo(p_1 int, p_2 text)

- {int, text}

foo(int, text)

- {int, text}

Any suggestions as to how to use the catalogs or built-in postgres
functions to query this?

Thank you!
Jeremy


Re: Query function arg data types ONLY (no arg names)

2018-04-28 Thread Jeremy Finzel
On Sat, Apr 28, 2018 at 12:01 PM Pavel Stehule 
wrote:

> Hi
>
> 2018-04-28 18:52 GMT+02:00 Jeremy Finzel :
>
>> It appears that neither pg_get_function_arguments
>> nor pg_get_function_identity_arguments could be used for this.  I want to
>> get function argument data types from the catalog by ordinal position,
>> without the argument name.
>>
>> For example, I want the same information for these 2 functions:
>>
>> foo(p_1 int, p_2 text)
>>
>> - {int, text}
>>
>> foo(int, text)
>>
>> - {int, text}
>>
>> Any suggestions as to how to use the catalogs or built-in postgres
>> functions to query this?
>>
>
>  CREATE OR REPLACE FUNCTION public.foo(a integer, b integer, c text)
>  RETURNS text
>  LANGUAGE sql
> AS $function$ select 'hi'; $function$
>
> postgres=# select (proargtypes::regtype[])[0:] from pg_proc where proname
> = 'foo';
> ┌─[ RECORD 1 ]┬┐
> │ proargtypes │ {integer,integer,text} │
> └─┴┘
>
> Regards
>
> Pavel
>
>
>> Thank you!
>> Jeremy
>>
>
This is perfect - thank you!
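
For the archives, a small variation on this in case someone needs one row per
ordinal position (0-based, matching proargtypes) instead of a single array:

SELECT p.proname,
       a.ord - 1 AS arg_position,
       a.arg_type
FROM pg_proc p,
     unnest(p.proargtypes::regtype[]) WITH ORDINALITY AS a(arg_type, ord)
WHERE p.proname = 'foo';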


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-23 Thread Jeremy Finzel
On Tue, May 22, 2018 at 2:42 PM, Maxim Boguk  wrote:

>
>
> On Tue, May 22, 2018 at 10:30 PM, Andres Freund 
> wrote:
>
>> Hi,
>>
>> On 2018-05-22 22:18:15 +0300, Maxim Boguk wrote:
>> > On Tue, May 22, 2018 at 9:47 PM, Andres Freund 
>> wrote:
>> > > >  select relfrozenxid from pg_class where relname='pg_authid';
>> > > >  relfrozenxid
>> > > > --
>> > > >2863429136
>>
>> > ​select txid_current();
>> >  txid_current
>> > --
>> >   41995913769
>>
>> So that's an xid of 3341208114, if you leave the epoch out. What's
>> ​​
>> autovacuum_freeze_max_age set to in that cluster?
>>
>
> ​postgres=# show autovacuum_freeze_max_age;
>  autovacuum_freeze_max_age
> ---
>  2
> (default value I think)​
>
>
>
>> Can you show pg_controldata output, and
>> ​​
>> relminmxid from that cluster?
>>
>
> ​postgres@db:~$ /usr/lib/postgresql/9.6/bin/pg_controldata  -D
> /var/lib/postgresql/9.6/main
> pg_control version number:960
> Catalog version number:   201608131
> Database system identifier:   6469368654711450114
> Database cluster state:   in production
> pg_control last modified: Tue 22 May 2018 10:20:14 PM MSK
> Latest checkpoint location:   CCB5/F9C37950
> Prior checkpoint location:CCB0/43F316B0
> Latest checkpoint's REDO location:CCB1/6706BD88
> Latest checkpoint's REDO WAL file:0001CCB10067
> Latest checkpoint's TimeLineID:   1
> Latest checkpoint's PrevTimeLineID:   1
> Latest checkpoint's full_page_writes: on
> Latest checkpoint's NextXID:  9:3341161759
> Latest checkpoint's NextOID:  190071899
> Latest checkpoint's NextMultiXactId:  59416233
> Latest checkpoint's NextMultiOffset:  215588532
> Latest checkpoint's oldestXID:2814548646
> Latest checkpoint's oldestXID's DB:   16400
> Latest checkpoint's oldestActiveXID:  3341161759
> Latest checkpoint's oldestMultiXid:   54264778
> Latest checkpoint's oldestMulti's DB: 16400
> Latest checkpoint's oldestCommitTsXid:2814548646
> Latest checkpoint's newestCommitTsXid:3341161758
> Time of latest checkpoint:Tue 22 May 2018 10:05:16 PM MSK
> Fake LSN counter for unlogged rels:   0/1
> Minimum recovery ending location: 0/0
> Min recovery ending loc's timeline:   0
> Backup start location:0/0
> Backup end location:  0/0
> End-of-backup record required:no
> wal_level setting:replica
> wal_log_hints setting:on
> max_connections setting:  2000
> max_worker_processes setting: 8
> max_prepared_xacts setting:   0
> max_locks_per_xact setting:   64
> track_commit_timestamp setting:   on
> Maximum data alignment:   8
> Database block size:  8192
> Blocks per segment of large relation: 131072
> WAL block size:   8192
> Bytes per WAL segment:16777216
> Maximum length of identifiers:64
> Maximum columns in an index:  32
> Maximum size of a TOAST chunk:1996
> Size of a large-object chunk: 2048
> Date/time type storage:   64-bit integers
> Float4 argument passing:  by value
> Float8 argument passing:  by value
> Data page checksum version:   0
> ​
> postgres=# select datname,datfrozenxid,datminmxid from pg_database order
> by datname;
>   datname  | datfrozenxid | datminmxid
> ---+--+
>  **|   2815939794 |   54265194
>  postgres  |   2863429136 |   54280819
>  template0 |   3148297669 |   59342106
>  template1 |   2816765546 |   59261794
>
>
>
>
>
>>
>> I might be daft here, but it's surely curious that the relfrozenxid from
>> the error and pg_catalog are really different (number of digits):
>> catalog: 2863429136
>> error:248712603
>>
>>
>> > ​About gdb bt - it's tricky because it is mission critical master db of
>> > huge project.
>> > I'll will try promote backup replica and check is issue persist there
>> and
>> > if yes - we will have our playground for a while, but it will require
>> > sometime to arrange.​
>>
>> You should be ok to just bt that in the running cluster, but I
>> definitely understand if you don't want to do that...  I'd appreciate if
>> you set up the a playground, because this seems like something that'll
>> reappear.
>>
>
> ​My gdb kunf-fu exceptionally rusty now, so I better play in sandbox.​
> ​And if error did't exist on fresh promoted replica it will give us useful
> info as well.​
>
> --
> Maxim Boguk
> Senior Postgresql DBA
> http://dataegret.com/ 
>
> Phone RU: +7  985 433 
> Phone UA: +380 99 143 
> Phone AU: +61  45 218 5678
>
> LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
> Skype: maxim.boguk
>
> "Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно
> когда я так делаю ещё раз?"
>
>

Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-24 Thread Jeremy Finzel
>
> BTW is it just a coincidence or are all the affected tables pg_authid?
> Maybe the problem is shared relations ..?  Maybe the fact that they have
> separate relfrozenxid (!?) in different databases?
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
We have had this problem twice, and both times on both pg_authid and
pg_auth_members.

Thanks,

Jeremy


Slow planning time for simple query

2018-06-06 Thread Jeremy Finzel
Hello -

We have an odd scenario on one of our OLTP systems, which behaves the same
way on a streamer, of a 700-1000ms planning time for a query like this:

SELECT *
FROM table1
WHERE  source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

The actual execution time is sub-ms.

We initially thought maybe catalog bloat?  But we were able to reindex all
of the highly churned catalog tables, and I even did VACUUM FULL on
pg_attribute and pg_statistic, to no avail.

There are no custom settings for pg_attribute for the given tables either.

Interestingly, the problem goes away on a SAN snapshot of the target system.

Any ideas of what else we could try?  A PL function that caches the query
plan works, but that is just a workaround.
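
For reference, the workaround looks roughly like this (table and column names
are placeholders, as above); plpgsql can reuse a cached plan for the embedded
query instead of re-planning it on every call:

CREATE OR REPLACE FUNCTION get_table1_for_customer(p_customer_id int)
RETURNS SETOF table1 AS $$
BEGIN
  RETURN QUERY
  SELECT t.*
  FROM table1 t
  WHERE t.source_id IN (SELECT id FROM table2 WHERE customer_id = p_customer_id);
END;
$$ LANGUAGE plpgsql;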

Thanks!
Jeremy


Re: Slow planning time for simple query

2018-06-06 Thread Jeremy Finzel
On Wed, Jun 6, 2018 at 12:12 PM, Tom Lane  wrote:

> Jeremy Finzel  writes:
> > We have an odd scenario on one of our OLTP systems, which behaves the
> same
> > way on a streamer, of a 700-1000ms planning time for a query like this:
>
> > SELECT *
> > FROM table1
> > WHERE  source_id IN (SELECT id FROM table2 WHERE customer_id = $1);
>
> Hm.  Is this the first query executed in a session?  If so maybe it's
> got something to do with populating caches and other session spin-up
> overhead.
>
> Another theory is it's some sort of locking issue.  Turning on
> log_lock_waits, and setting deadlock_timeout to say 100ms, would help
> in investigating that.
>
> regards, tom lane
>

I have run it over and over with no improvement in the planning time, so I
don't think it's related to being the first query in a session.  I can only
make it faster with a pl function so far.

We have log_lock_waits on and nothing shows, and turning down
deadlock_timeout also doesn't do anything.

Thanks,
Jeremy


Re: Slow planning time for simple query

2018-06-06 Thread Jeremy Finzel
On Wed, Jun 6, 2018 at 1:13 PM, Jeremy Finzel  wrote:

>
>
> On Wed, Jun 6, 2018 at 12:12 PM, Tom Lane  wrote:
>
>> Jeremy Finzel  writes:
>> > We have an odd scenario on one of our OLTP systems, which behaves the
>> same
>> > way on a streamer, of a 700-1000ms planning time for a query like this:
>>
>> > SELECT *
>> > FROM table1
>> > WHERE  source_id IN (SELECT id FROM table2 WHERE customer_id = $1);
>>
>> Hm.  Is this the first query executed in a session?  If so maybe it's
>> got something to do with populating caches and other session spin-up
>> overhead.
>>
>> Another theory is it's some sort of locking issue.  Turning on
>> log_lock_waits, and setting deadlock_timeout to say 100ms, would help
>> in investigating that.
>>
>> regards, tom lane
>>
>
> I have run it over and over with no improvement in the planning time, so I
> don't thing it's first in session-related.  I can only make it faster with
> a pl function so far.
>
> We have log_lock_waits on and nothing shows, and turning down
> deadlock_timeout also doesn't do anything.
>
> Thanks,
> Jeremy
>

Forgot to mention, this is running on this version:
PostgreSQL 9.6.8 on x86_64-pc-linux-gnu (Ubuntu 9.6.8-1.pgdg16.04+1),
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit


Thanks,
Jeremy


Re: ERROR: found multixact from before relminmxid

2018-06-08 Thread Jeremy Finzel
On Tue, Jun 5, 2018 at 8:42 PM, Alexandre Arruda  wrote:

> Em seg, 28 de mai de 2018 às 16:44, Andres Freund 
> escreveu:
> >
> > Hi,
> >
> > I think I found the bug, and am about to post a fix for it belo
> > https://postgr.es/m/20180525203736.crkbg36muzxrj...@alap3.anarazel.de.
> >
> > Greetings,
> >
> > Andres Freund
>
> Hi Andres,
>
> In end of April we did a complete dump/reload in database to version 10.3.
> Today, the problem returns:
>
> production=# vacuum verbose co27t;
> INFO:  vacuuming "public.co27t"
> ERROR:  found multixact 81704071 from before relminmxid 107665371
> production=# vacuum full verbose co27t;
> INFO:  vacuuming "public.co27t"
> ERROR:  found multixact 105476076 from before relminmxid 107665371
> production=# cluster co27t;
> ERROR:  found multixact 105476076 from before relminmxid 107665371
>
> But this time, regular vacuum versus full/cluster are different in
> multixact number.
> Your patch is applicable to this issue and is in 10.4 ?
>
> Best regards,
>
> Alexandre
>
>
We encountered this issue ourselves for the first time on a busy OLTP
system.  It is at 9.6.8.  We found that patching to 9.6.9 on a snapshot of
this system did not fix the problem, but I assume that is because the patch
in 9.6.9 only prevents the problem moving forward.  Is that accurate?

Before we take an outage for this patch, we want as much information as
possible on if this is indeed likely to be our issue.

Like the other people on this thread, amcheck didn't show anything on the
snap:
db=# select bt_index_parent_check(indexrelid,true) FROM
pg_stat_user_indexes WHERE relname = 'mytable';
 bt_index_parent_check
---





(5 rows)

db=# select bt_index_check(indexrelid,true) FROM pg_stat_user_indexes WHERE
relname = 'mytable';
 bt_index_check






(5 rows)


Not surprisingly, I can get the problem to go away in production if I use
pg_repack to rebuild the table.  But we are interested of course in solving
this problem permanently.

Thanks,
Jeremy


Re: ERROR: found multixact from before relminmxid

2018-06-08 Thread Jeremy Finzel
On Fri, Jun 8, 2018 at 1:08 PM Andres Freund  wrote:

> On 2018-06-08 12:38:03 -0500, Jeremy Finzel wrote:
> > On Tue, Jun 5, 2018 at 8:42 PM, Alexandre Arruda 
> wrote:
> >
> > > Em seg, 28 de mai de 2018 às 16:44, Andres Freund 
> > > escreveu:
> > > >
> > > > Hi,
> > > >
> > > > I think I found the bug, and am about to post a fix for it belo
> > > >
> https://postgr.es/m/20180525203736.crkbg36muzxrj...@alap3.anarazel.de.
> > > >
> > > > Greetings,
> > > >
> > > > Andres Freund
> > >
> > > Hi Andres,
> > >
> > > In end of April we did a complete dump/reload in database to version
> 10.3.
> > > Today, the problem returns:
> > >
> > > production=# vacuum verbose co27t;
> > > INFO:  vacuuming "public.co27t"
> > > ERROR:  found multixact 81704071 from before relminmxid 107665371
> > > production=# vacuum full verbose co27t;
> > > INFO:  vacuuming "public.co27t"
> > > ERROR:  found multixact 105476076 from before relminmxid 107665371
> > > production=# cluster co27t;
> > > ERROR:  found multixact 105476076 from before relminmxid 107665371
> > >
> > > But this time, regular vacuum versus full/cluster are different in
> > > multixact number.
> > > Your patch is applicable to this issue and is in 10.4 ?
> > >
> > > Best regards,
> > >
> > > Alexandre
> > >
> > >
> > We encountered this issue ourselves for the first time on a busy OLTP
> > system.  It is at 9.6.8.  We found that patching to 9.6.9 on a snapshot
> of
> > this system did not fix the problem, but I assume that is because the
> patch
> > in 9.6.9 only prevents the problem moving forward.  Is that accurate?
>
> Which patch in 9.6.9 are you referring to?  The patch I linked to above
> hasn't yet been merged, much less been released.


No, I was referring to this from the documentation:


   -

   Avoid spuriously marking pages as all-visible (Dan Wood, Pavan Deolasee,
   Álvaro Herrera)

   This could happen if some tuples were locked (but not deleted). While
   queries would still function correctly, vacuum would normally ignore such
   pages, with the long-term effect that the tuples were never frozen. In
   recent releases this would eventually result in errors such as "found
   multixact n from before relminmxid n".



Thanks,

Jeremy

>


Re: ERROR: found multixact from before relminmxid

2018-06-11 Thread Jeremy Finzel
>
> Is there any chance I could get access
> to a copy of the data? It's very hard to debug something like this
> without something that can reproduce the issue...
>

It would be very difficult for us to clean and anonymize this data and provide
a snapshot publicly.  But I am very willing to hook up a debugger on a snapshot
that reproduces the issue if you can guide me on what exactly I should look for
here.

Thanks,
Jeremy


Partitioning with range types

2018-06-15 Thread Jeremy Finzel
Several months ago we had some detailed discussions about whether to use
separate date columns to indicate a date range, or to use the daterange
data type.  We opted for the latter because this type is specifically
designed for this use case - a table that has a range of valid dates for
the data it contains.  It also has some great operators and functions.

But I recently discovered that daterange is not supported in any way as a
partition key because it depends on an expression.  I was excited about
this possibility in pg11 with unique constraints on the parent table, but
now it appears it may have instead been to our advantage if we had two
separate date columns instead, so that we could use UPSERT transparently
for date-ranged tables.

Is there any possibility of this feature coming for range types, or, if we
really want to partition using daterange, should we look instead at two
separate date columns?

Thanks,
Jeremy


Re: Partitioning with range types

2018-06-15 Thread Jeremy Finzel
On Fri, Jun 15, 2018 at 11:23 AM, Adrian Klaver 
wrote:

> On 06/15/2018 08:26 AM, Jeremy Finzel wrote:
>
>> Several months ago we had some detailed discussions about whether to use
>> separate date columns to indicate a date range, or to use the daterange
>> data type.  We opted for the latter because this type is specifically
>> designed for this use case - a table that has a range of valid dates for
>> the data it contains.  It also has some great operators and functions.
>>
>> But I recently discovered that daterange is not supported in any way as a
>> partition key because it depends on an expression.  I was excited
>>
>
> A quick test:
>
> Postgres 10.4
>
> create table dr_partition(id integer, dr daterange) PARTITION BY LIST(dr);
>
>  \d dr_partition
>  Table "public.dr_partition"
>  Column |   Type| Collation | Nullable | Default
> +---+---+--+-
>  id | integer   |   |  |
>  dr | daterange |   |  |
> Partition key: LIST (dr)
>
>
> create table dr_1 PARTITION OF dr_partition FOR VALUES IN ('[06/01/2018,
> 06/30/2018]');
>
> \d dr_1
>  Table "public.dr_1"
>  Column |   Type| Collation | Nullable | Default
> +---+---+--+-
>  id | integer   |   |  |
>  dr | daterange |   |  |
> Partition of: dr_partition FOR VALUES IN ('[2018-06-01,2018-07-01)')
>
>
>
> about this possibility in pg11 with unique constraints on the parent
>> table, but now it appears it may have instead been to our advantage if we
>> had two separate date columns instead, so that we could use UPSERT
>> transparently for date-ranged tables.
>>
>> Is there any possibility of this feature coming for range types, or, if
>> we really want to partition using daterange, should we look instead at two
>> separate date columns?
>>
>> Thanks,
>> Jeremy
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>

Let me clarify - what I said was not accurate.  What I meant is that an UPSERT
against a partitioned parent is not supported with range types, specifically
because we can't create unique indexes involving expressions on partitioned
parents:

CREATE UNIQUE INDEX ON foo (id, lower(as_of_date));
ERROR:  unsupported UNIQUE constraint with partition key definition
DETAIL:  UNIQUE constraints cannot be used when partition keys include
expressions.

Workaround is of course not to use UPSERT, but we all know the advantages
of using UPSERT to handle concurrency and the like and to make our queries
simpler.  We are currently using UPSERT for many of these tables, but they
are not partitioned yet.
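
For what it's worth, the two-date-column alternative we are weighing would look
roughly like this on PG 11 (names made up); because the unique constraint
includes the partition key column directly, the parent-level constraint and
therefore UPSERT are allowed:

CREATE TABLE foo (
    id int NOT NULL,
    valid_from date NOT NULL,
    valid_to date NOT NULL DEFAULT 'infinity',
    payload text,
    PRIMARY KEY (id, valid_from)
) PARTITION BY RANGE (valid_from);

CREATE TABLE foo_2018 PARTITION OF foo
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');

INSERT INTO foo (id, valid_from, payload)
VALUES (1, '2018-06-15', 'x')
ON CONFLICT (id, valid_from)
DO UPDATE SET payload = EXCLUDED.payload;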

Thanks,
Jeremy


Clarifying "timestamp with time zone"

2018-06-15 Thread Jeremy Finzel
Hello!

We often prefer to use timestamptz or "timestamp with time zone" in our
environment because it actually stores "objective time" with respect to UTC.
But in my own work experience, business users and software engineers almost
always assume it means the opposite.

When I say "timestamp with time zone", people think the data is saved *in* a
specific time zone, whereas in reality the opposite is true.  It is really more
like "timestamp UTC", or you could even say "timestamp at UTC".  When you query
it, the value is of course displayed with an offset based on your client
timezone setting.

I do believe this is part of the SQL standard, but I have found that it
creates great confusion.  I think many devs choose timestamp instead of
timestamptz because they don't really understand that timestamptz gives you
UTC time storage built-in.

That of course means that if you have multiple servers running in different
time zones, and you want to replicate that data to a centralized location, you
can easily figure out the objective time a record changed, for instance,
without knowing anything about what time zone the source system is in.

So it seems to me that "timestamp with time zone" is a misnomer in a big way,
and perhaps it's worth at least clarifying the docs about this, or even
renaming the type or providing an aliased type that means the same thing,
something like timestamputc.  Maybe I'm crazy, but I would appreciate any
feedback on this and on how easily it causes confusion.
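
A quick illustration of what I mean (example zones; both literals denote the
same stored instant, only the display changes with the session setting):

SET timezone = 'America/Chicago';
SELECT '2018-06-15 12:00:00+00'::timestamptz;   -- 2018-06-15 07:00:00-05
SET timezone = 'UTC';
SELECT '2018-06-15 07:00:00-05'::timestamptz;   -- 2018-06-15 12:00:00+00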

Thanks,
Jeremy


Re: Clarifying "timestamp with time zone"

2018-06-18 Thread Jeremy Finzel
On Fri, Jun 15, 2018 at 2:57 PM, Adrian Klaver 
wrote:

> On 06/15/2018 12:24 PM, Jeremy Finzel wrote:
>
>> Hello!
>>
>> We often prefer to use timestamptz or "timestamp with time zone" in our
>> environment because of its actually storing "objective time" with respect
>> to UTC.  But in my own work experience, I have scarcely encountered a case
>> where business users, and software engineers, do not actually think it
>> means the opposite.
>>
>> When I say "timestamp with time zone", people think the data is saved
>> *in* a specific time zone, whereas in reality, the opposite is true.  It is
>> really more like "timestamp UTC" or you even could say "timestamp at UTC".
>> When you query this of course, then it shows you the time offset based on
>> your client timezone setting.
>>
>> I do believe this is part of the SQL standard, but I have found that it
>> creates great confusion.  I think many devs choose timestamp instead of
>> timestamptz because they don't really understand that timestamptz gives you
>> UTC time storage built-in.
>>
>> That of course means that if you have multiple servers that run in a
>> different time zone, and you want to replicate that data to a centralized
>> location, you can easily figure out what objective time a record changed,
>> for instance, not knowing anything about what time zone the source system
>> is in.
>>
>> So it seems to me that "timestamp with time zone" is a misnomer in a big
>>
>
> It actually is. It is just one timezone though, UTC.
>
> way, and perhaps it's worth at least clarifying the docs about this, or
>>
>
> https://www.postgresql.org/docs/10/static/datatype-datetime.
> html#DATATYPE-TIMEZONES
>
> "For timestamp with time zone, the internally stored value is always in
> UTC (Universal Coordinated Time, traditionally known as Greenwich Mean
> Time, GMT). An input value that has an explicit time zone specified is
> converted to UTC using the appropriate offset for that time zone. If no
> time zone is stated in the input string, then it is assumed to be in the
> time zone indicated by the system's TimeZone parameter, and is converted to
> UTC using the offset for the timezone zone."
>
> How should the above be clarified?


Actually, that is a really good description.  But I would say the problem is
that it does not have a prominent place on the page, and that many people
reading the docs will have already made their assumptions about the data types
before they get down to this part of the page.  What is presented as
nitty-gritty detail far down the page should be essential reading for any user
of postgres deciding between timestamp and timestamptz.

There are some descriptions that tend to mislead and that perhaps could be
clarified.  For example, at the top of the page, timestamptz is described as
"both date and time, with time zone".  Given what we all seem to acknowledge is
a misleading description, I think we ought to change this to summarize the
above very helpful description, perhaps to "both date and time, in UTC" or some
such idea.

I like the idea of making that note that is now nested deep in the example
section very prominent at the top of the page, perhaps as one of these
special notes, given how critical timestamps are for nearly any relational
database use.

Thoughts?

Thanks,
Jeremy


Re: Open Source tool to deploy/promote PostgreSQL DDL

2018-07-11 Thread Jeremy Finzel
On Tue, Jul 10, 2018 at 5:13 PM Hustler DBA  wrote:

> Hi Community,
> A client of mine is looking for an open source tool to deploy and promote
> PostgreSQL DDL changes through database environments as part of SDLC. What
> tools (open source) does the community members use? I normally use scripts,
> but they want something open source.
>
> Thanks,
> Neil Barrett
>

A popular one is ActiveRecord Migrations. It is open source. Although
ActiveRecord is also an ORM, you are not required to use the ORM part - you can
just use SQL. FWIW:
http://edgeguides.rubyonrails.org/active_record_migrations.html

>


Suboptimal GIST index?

2022-12-06 Thread Jeremy Finzel
Greetings!

I am getting the following message using DEBUG logging, telling me that a
particular GIST index is suboptimal.  This is for a table with ids and date
ranges that are supposed to be non-overlapping.  Here is the index def:

"my_fact_table_id_as_of_date_excl" EXCLUDE USING gist (id WITH =,
as_of_date WITH &&)

This is on PG 13.9, but I got this exact same message on the same cluster
when it was on version 10.  The fact table has about 250m rows and is about
275GB in size.

Here is the approximate log message:

2022-12-06 06:00:00.152
GMT,"me","foo",3559072,"0.0.0.0:5000",638ead26.364ea0,27080,"SELECT",2022-12-06
02:47:02 GMT,30/296653,2464130682,DEBUG,XX000,"picksplit method for column
2 of index ""my_fact_table_id_as_of_date_excl"" failed",,"The index is not
optimal. To optimize it, contact a developer, or try to use the column as
the second one in the CREATE INDEX command.",,,"SQL statement ""WITH
ended_last_fact AS
(UPDATE my_fact_table
SET as_of_date = daterange(lower(as_of_date), v_as_of_date_start)
  , updated = v_updated
WHERE id = v_id
  AND lower(as_of_date) <> v_as_of_date_start
  AND upper(as_of_date) = 'infinity'
RETURNING *)

INSERT INTO my_fact_table AS f (
  as_of_date
  , customer_id
  , foo
  , id
  updated)
  SELECT
daterange(v_as_of_date_start, 'infinity')
, v_customer_id
, 'mock' AS foo
, v_id
, v_updated
  FROM (SELECT v_id AS id) nes
  LEFT JOIN ended_last_fact ON nes.id = ended_last_fact.id
ON CONFLICT (id, lower(daterange(as_of_date)))
DO UPDATE
SET
  foo = f.foo
  , updated = f.updated;



So I suppose this means I could be getting better performance but in any
case, I don't understand the hint suggested.  Any insight is much
appreciated.  Thanks!

Jeremy


Unkillable processes creating millions of tiny temp files

2021-03-05 Thread Jeremy Finzel
Greetings!

We are running postgres 11.9 (were running 11.7 prior to recent restart) on
a large db (10s of TB) with 5 or 6 tablespaces and 1000s of tables/indexes.

Within the past few days we have started to see a few queries running for
over 8 hours which we then attempt to terminate, but which will not terminate.
These queries are also generating hundreds of thousands of tiny/empty temp
files.  In fact, before the restart there were over 23 million files in
pgsql_tmp, which were removed.  We have also verified that no server settings
have changed for at least a couple of weeks, well before this issue started
happening in the past few days.

Looking back awhile at the same query a couple weeks ago, we see this
(large file sizes):
2021-02-18 12:01:59.195
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,3,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.0"", size
21299836",,"
2021-02-18 12:01:59.446
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,4,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.5"", size
4138272",,"
2021-02-18 12:01:59.496
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,5,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.1"", size
10706416",,"
2021-02-18 12:01:59.747
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,6,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.3"", size
4150920",,"
2021-02-18 12:01:59.797
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,7,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.2"", size
10660908",,"
2021-02-18 12:02:00.050
GMT,"foo_user","mydb",81780,"0.0.0.0:49926",602e5716.13f74,8,"SELECT",2021-02-18
12:01:26 GMT,164/1009590,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp81780.4"", size
4132224",,"

But here is what we see presently (tiny files):
2021-03-05 21:30:52.712
GMT,"foo_user","mydb",67713,"0.0.0.0:48482",60424a1d.10881,73562,"SELECT",2021-03-05
15:11:25 GMT,80/2860,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp67713.177717"", size
24",,"
2021-03-05 21:30:52.735
GMT,"foo_user","mydb",67713,"0.0.0.0:48482",60424a1d.10881,73563,"SELECT",2021-03-05
15:11:25 GMT,80/2860,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp67713.990067"", size
92",,"
2021-03-05 21:30:52.950
GMT,"foo_user","mydb",67713,"0.0.0.0:48482",60424a1d.10881,73564,"SELECT",2021-03-05
15:11:25 GMT,80/2860,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp67713.490"", size
24",,"
2021-03-05 21:30:53.072
GMT,"foo_user","mydb",67713,"0.0.0.0:48482",60424a1d.10881,73565,"SELECT",2021-03-05
15:11:25 GMT,80/2860,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp67713.800016"", size
140",,"
2021-03-05 21:30:53.522
GMT,"foo_user","mydb",67713,"0.0.0.0:48482",60424a1d.10881,73566,"SELECT",2021-03-05
15:11:25 GMT,80/2860,0,LOG,0,"temporary file: path
""pg_tblspc/16428/PG_11_201809051/pgsql_tmp/pgsql_tmp67713.500266"", size
48",,"

After the restart of the db and upgrade to 11.9, we see the same problem.

There are 2 separate queries that have had this behavior with seemingly
unrelated objects.  Here is the plan for one of them:


QUERY PLAN
--
 Aggregate  (cost=2654520.75..2654520.76 rows=1 width=336)
   ->  Hash Left Join  (cost=1087907.49..2019675.12 rows=14107680 width=69)
 Hash Cond: (t1.team_stadium_id = ra.team_stadium_id)
 ->  Hash Left Join  (cost=1079740.68..1955870.37 rows=14107680
width=69)
   Hash Cond: (t1.team_stadium_id = frg.team_stadium_id)
   ->  Hash Right Join  (cost=1073614.70..1894916.60
rows=14107680 width=69)
 Hash Cond: (sm_1.id = t1.id)
 ->  Hash Join  (cost=47373.90..695007.85 rows=170011
width=4)
   Hash Cond: (sm_1.id = l.id)
   ->  Index Scan using
index_foo_fact_on_id_first_win on foo_fact sm_1  (cost=0.42..647177.59
rows=174010 width=12)
 Filter: (CASE WHEN
(basketball_renewal_date <= now()) THEN true ELSE false END AND (NOT CASE
WHEN (basketball_renewal_date <= (now() - '5 days'::interval)) THEN false
ELSE true END))
   ->  Hash  (cost=34408.

Re: PostgreSQL internal scheduler?

2018-09-05 Thread Jeremy Finzel
If I follow your use case, we have written something that just may fit your
scenario and plan to open source it rather soon.

It has several layers but let me boil it down. First we use an open sourced
auditing system to log changes to the source tables. This becomes your
queue. A postgres background worker will asynchronously process these
changes based on your configuration, which is highly configurable. It also
handles the concurrency you are questioning.

This allows you to build history tables without requiring you, for example,
to do it directly via a trigger. It also removes redundancy if you have the
same key updated multiple times. It assumes we are fine with the data built
not being 100% up to date, because these updates obviously don’t all
happen in the same transaction as the source data change.
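
To give a rough idea of the shape (this is only a minimal sketch, not our
actual implementation, and every table, column, and function name here is
made up): a trigger appends changed rows to a queue table, and a separate
worker drains that queue asynchronously to maintain the history table.

CREATE TABLE change_queue (
    queue_id   bigserial PRIMARY KEY,
    table_name text NOT NULL,
    row_data   jsonb NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now(),
    processed  boolean NOT NULL DEFAULT false
);

CREATE FUNCTION log_change() RETURNS trigger AS $body$
BEGIN
    -- record the new row state; the worker decides later how to fold
    -- it into the history table
    INSERT INTO change_queue (table_name, row_data)
    VALUES (TG_TABLE_NAME, to_jsonb(NEW));
    RETURN NEW;
END;
$body$ LANGUAGE plpgsql;

CREATE TRIGGER source_table_log_change
    AFTER INSERT OR UPDATE ON source_table
    FOR EACH ROW EXECUTE PROCEDURE log_change();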

Let me know if this interests you and I can share more.

Thanks,
Jeremy

On Wed, Sep 5, 2018 at 10:07 AM Thiemo Kellner 
wrote:

>
> Hi all
>
> I am designing a framework for historisation implementation (SCD). One
> feature I would like to provide is a table in that the actual state of
> an entity is put and if this is complete, this history table is
> "updated":
>
>      --------------       ---------------
> ==>  | ENTITY_ACT |  ==>  | ENTITY_HIST |
>      --------------       ---------------
>
> I plan to use instead-of-triggers on the hist table that read the
> actual table and perform all necessary inserts and updates on the
> history table. If I want the termination of a record version (actually
> the record of a specific business key with a specific payload) to get
> propagated up and/or down referential integrities (no overlapping
> validities) I have to make sure that only one of those processes is
> modifying a table. I was thinking of a scheduler queue where the
> trigger would put a process request and PostgreSQL would work through.
> Is there a scheduler within PostgreSQL? I read the documentation and
> searched the web but could not find a hint. But before going another
> road or implementing something myself, I ask. Maybe this design is no
> good at all.
>
> Kind regards
>
> Thiemo
>
>
>


Determine last LSN before promotion?

2018-09-17 Thread Jeremy Finzel
Is there any easy way to determine what the last LSN was prior to
promotion, except for (as I understand it), using pg_xlogdump on the
.partial WAL file after promotion and find the last LSN?  Am I missing
something obvious?

The question I am trying to answer has to do with replication slots and how
to determine when new activity begins on the master just following the
promotion point on the standby.

Thanks,
Jeremy


postgres_fdw chooses remote index scan too rarely

2018-09-19 Thread Jeremy Finzel
First of all, my local system which has postgres_fdw installed is this
version:
PostgreSQL 9.6.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit

Here are my server configurations (I have tweaked these with no improvement
to my issue):

   - fetch_size=100
   - fdw_startup_cost=1000
   - fdw_tuple_cost=10
   - use_remote_estimate=true
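
For reference, a sketch of how these options might be applied to the foreign
server (the server name "remote_server" is made up; use ADD instead of SET
for any option that has not been set before):

ALTER SERVER remote_server OPTIONS (
    SET fetch_size '100',
    SET fdw_startup_cost '1000',
    SET fdw_tuple_cost '10',
    SET use_remote_estimate 'true'
);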

I am seeing some really odd behavior and I'm not sure how I can most easily
help our users to write queries that actually hit an index scan on the
remote server.  In one example, I have:

   - a temp table with 33 rows
   - a text column instead of integer (integer being the remote table's data
   type)
   - it is freshly analyzed

Joining to the table, doing id IN (list), or doing EXISTS all yield a full
remote table scan:

select *
from remote.customers c
inner join abc t on t.customer_id::int=c.customer_id;

select *
from remote.customers c
where c.customer_id in (SELECT customer_id::int FROM abc);

select *
from remote.customers c
where exists (SELECT 1 FROM abc WHERE customer_id::int = c.customer_id);

 QUERY PLAN
-
 Hash Join  (cost=1002.47..1170208178867.64 rows=5851034 width=3113)
   Hash Cond: (c.customer_id = (abc.customer_id)::integer)
   ->  Foreign Scan on customers c  (cost=1000.43..1170208089344.77
rows=11702069 width=902)
   ->  Hash  (cost=1.69..1.69 rows=28 width=7)
 ->  HashAggregate  (cost=1.41..1.69 rows=28 width=7)
   Group Key: (abc.customer_id)::integer
   ->  Seq Scan on abc  (cost=0.00..1.33 rows=33 width=7)
(7 rows)

However, I can get the index scan 2 ways:

   - converting the temp table data type to int
   - keeping the data type as is, but adding DISTINCT to the IN list

Neither of these makes any sense to me because the planner knows there are
only 33 rows in the temp table.  It should always do an index scan.  And
why should converting the data type to int be any different than casting it
to int?  I understand the planner stats on that field are not for an
integer, but it's still only 33 rows.

So these 2 versions get a good plan and actually run very quickly:

select *
from remote.customers c
where c.customer_id in (SELECT DISTINCT customer_id::int FROM abc);

WITH distinctified AS (SELECT DISTINCT customer_id::int FROM abc)

select *
from remote.customers c
INNER JOIN distinctified t on t.customer_id::int=c.customer_id;

  QUERY PLAN
---
 Nested Loop  (cost=1002.03..5628140.32 rows=5851034 width=3113)
   ->  HashAggregate  (cost=1.58..2.00 rows=28 width=4)
 Group Key: (abc.customer_id)::integer
 ->  Seq Scan on abc  (cost=0.00..1.50 rows=33 width=4)
   ->  Foreign Scan on customers c  (cost=1000.45..201004.91 rows=2
width=902)
(5 rows)


Likewise if I alter the type it works without any special treatment:

ALTER TABLE abc ALTER COLUMN customer_id TYPE int USING customer_id::INT;
ANALYZE abc;

EXPLAIN
select *
from remote.customers c
inner join abc t using (customer_id);

  QUERY PLAN
---
 Nested Loop  (cost=1000.45..6633164.02 rows=1930841 width=3194)
   ->  Seq Scan on abc t  (cost=0.00..1.33 rows=33 width=85)
   ->  Foreign Scan on customers c  (cost=1000.45..201004.91 rows=2
width=902)
(3 rows)

Any insight appreciated!

Thanks,
Jeremy


Re: Comparing dates in DDL

2019-01-04 Thread Jeremy Finzel
On Fri, Jan 4, 2019 at 4:19 PM Rich Shepard 
wrote:

> On Fri, 4 Jan 2019, David G. Johnston wrote:
>
> > That would be the decision to make - does your toolkit support (or can be
> > made to support) the type and are you willing to choose a sub-optimal
> > database model because one or more applications happen to do things
> > differently?
> >
> > IMO the daterange datatype is the best type you can choose for the model;
> > now you have to figure out and decide where any tradeoffs are and if they
> > are worth it given your specific circumstances.
>
> David,
>
>Thanks for the insights.
>
> Regards,
>
> Rich



Another suggestion which hasn’t been mentioned is using 'infinity' as the
end date. I like this because, IMO, it indicates more clearly than NULL that
the record is the current valid record.

But I’m not sure exactly what you are trying to do either. If you are
creating a new record with changes and closing (ending) the range of the
original record, then a GIST exclusion index would ensure you have no
overlapping date ranges for all historical records put together.
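
A minimal sketch of that kind of exclusion constraint (table and column names
are made up; the btree_gist extension is needed so the = operator on the key
column can be used in a GiST index):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE record_history (
    record_id int NOT NULL,
    valid     daterange NOT NULL DEFAULT daterange(current_date, 'infinity'),
    EXCLUDE USING gist (record_id WITH =, valid WITH &&)
);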

Thanks,
Jeremy

>


Get LSN at which a cluster was promoted on previous timeline

2019-01-04 Thread Jeremy Finzel
I am having a hard time finding out how I can easily determine at which LSN
a streamer was promoted *after* promotion.  A related question is that if I
pause wal replay, I am able to see the last replayed LSN but I can't simply
immediately promote.
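
For reference, a sketch of the pause-and-check approach mentioned above (these
are the PG 10+ function names; 9.x spells them pg_xlog_replay_pause() and
pg_last_xlog_replay_location()):

SELECT pg_wal_replay_pause();
SELECT pg_last_wal_replay_lsn();
SELECT pg_wal_replay_resume();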

I want to know the state of the streamer post-promotion with respect to the
master by getting the LSN at which it was promoted.

Thanks,
Jeremy


Re: PostgreSQL logical replication depends on WAL segments?

2019-01-22 Thread Jeremy Finzel
>
> Thanks, I see... So if I understand it correctly - since I have quite big
> partitions like ~30 GB each in one parent table and from ~1GB to ~5 GB in
> several others I presume I had to set wal_keep_segments to some really high
> number and stop our security cronjob cleaning old WAL segments (because we
> already had some problems with almost full disk due to old WAL segments)
> until the whole transfer of snapshot is done. Because only after the whole
> snapshot is transferred logical replication workers start to transfer WAL
> logs reflecting changes done from the moment snapshot was taken...
>
> jm
>

Understand there are other downsides to just keeping around a huge amount
of WAL segments apart from only taking up disk space.  None of the data
held in those WAL segments can be vacuumed away while they are left around,
which can lead to significant bloat and performance issues over time.

I'm not exactly clear on your use case, but if you need to just
resynchronize data for a single table, there is a built-in way to do that
(actually would be nice if the docs spelled this out).

On publisher:

ALTER PUBLICATION mypub DROP TABLE old_data_table;

On subscriber:

ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH ( COPY_DATA = true);

On publisher:

ALTER PUBLICATION mypub ADD TABLE old_data_table;

On subscriber:

ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH ( COPY_DATA = true);

The last command will resync the table from the current table data,
regardless of the WAL file situation.  This is the "normal" way you would
go about resynchronizing data between clusters when a long time has passed,
rather than trying to keep all that WAL around!

So far as I can tell from testing, the above pattern is the easiest way to do
this, and it will not resynchronize any of the other tables in your
subscription.

P.S. do heed the advice of the others and get more familiar with the docs
around WAL archiving.

Thanks,
Jeremy


Re: PostgreSQL logical replication depends on WAL segments?

2019-01-22 Thread Jeremy Finzel
>
> That is news to me. Can you provide a citation for this?
>

I can see the confusion in what I said.  To clarify, I really meant that in
order to retrieve the data you need from really old WAL segments, you
need to keep your replication slot in a position that will hold that WAL in
place.  And that is what will prevent old rows from being vacuumed away.
There is no way to create a replication slot in the past, so the only way
to actually use the old WAL for logical replication is to have an old slot
in place with an LSN position from that long time ago.

The bit about rows not being removed is found here:
https://www.postgresql.org/docs/9.4/warm-standby.html#STREAMING-REPLICATION-SLOTS

Thanks,
Jeremy


Re: PostgreSQL logical replication depends on WAL segments?

2019-01-22 Thread Jeremy Finzel
On Tue, Jan 22, 2019 at 12:52 PM Andres Freund  wrote:

> Hi,
>
> On 2019-01-22 11:10:27 -0600, Jeremy Finzel wrote:
> > P.S. do heed the advice of the others and get more familiar with the docs
> > around WAL archiving.
>
> Logical replication doesn't normally interact with WAL archiving in any
> way, so that seems orthogonal.
>

But he is manually removing, via cron, WAL files that would normally be
held until the replication slot no longer needs them.  I do believe that
is why he is getting errors.  The suggestion is to rely on Postgres' WAL
archiving rather than removing WAL files manually...

Thanks,
Jeremy


Re: PostgreSQL logical replication depends on WAL segments?

2019-01-22 Thread Jeremy Finzel
>
> Note replication slots only prevent old *catalog* rows from being
> removed, not old row versions in user created tables.
>

Thank you for that clarification.  I can see this is noted clearly in the
CAUTION statement here:
https://www.postgresql.org/docs/current/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS

Thanks,
Jeremy


Re: PostgreSQL logical replication depends on WAL segments?

2019-01-22 Thread Jeremy Finzel
>
> wal retention in pg_wal and wal archiving are different things. The OP got
> problems cause he deliberately went on to delete files in pg_wal which
> means he used the wrong method to address a wrong situation.
>
> However, if the OP has still those WAL files archived he can use them to
> bring logical replication back in synch. Don't ask me how I know :
> https://www.postgresql.org/message-id/ae8812c3-d138-73b7-537a-a273e15ef6e1%40matrix.gatewaynet.com
>
I don't think we are disagreeing.  My point is that WAL will be retained in
pg_wal if it is still needed by a replication slot, and any one of those
WAL files will not be archived and recycled as long as it no longer needed
by a replication slot.  So yes, they are different things, but they are
related in that sense.
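
As an aside, a sketch of how one might check how much WAL each slot is
currently holding back (using the PG 10+ function names; 9.x spells these
pg_current_xlog_location() and pg_xlog_location_diff()):

SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
           AS retained_wal
FROM pg_replication_slots;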

It sounded to me like he had an old slot in place pointing to old WAL
files, which the cron job removed, thus leading to his error.

Sure, he can recover from this by moving those WAL files back into pg_wal.
But my point is simply that he should rely on WAL archiving rather than
archiving via cron because Postgres handles knowing whether or not a
replication slot still needs a WAL file.

Thanks,
Jeremy


Re: PostgreSQL logical replication depends on WAL segments?

2019-01-22 Thread Jeremy Finzel
On Tue, Jan 22, 2019 at 2:16 PM Jeremy Finzel  wrote:

> any one of those WAL files will not be archived and recycled as long as it
>> no longer needed by a replication slot
>>
>
:(.  s/no longer/still.  I meant to say any one of those WAL files will not
be archived and recycled as long as it still needed by a replication slot.


Re: PostgreSQL logical replication depends on WAL segments?

2019-01-22 Thread Jeremy Finzel
On Tue, Jan 22, 2019 at 8:19 PM Achilleas Mantzios <
ach...@matrix.gatewaynet.com> wrote:

>
> On 22/1/19 10:18 μ.μ., Jeremy Finzel wrote:
>
> On Tue, Jan 22, 2019 at 2:16 PM Jeremy Finzel  wrote:
>
>> any one of those WAL files will not be archived and recycled as long as
>>> it no longer needed by a replication slot
>>>
>>
> :(.  s/no longer/still.  I meant to say any one of those WAL files will
> not be archived and recycled as long as it still needed by a replication
> slot.
>
>
> A WAL segment will be archived as soon as it is completed.
> https://www.postgresql.org/docs/11/continuous-archiving.html
>

Yep, you’re right. Let me try this again. It will be archived, but the
segment won’t be reused or removed as long as the slot needs it. Is that an
accurate statement?

Thanks,
Jeremy


Re: Geographical multi-master replication

2019-01-24 Thread Jeremy Finzel
>
> The problem is that the version for BDR 1.0.7, which has an implementation
> for postgres 9.4, will be on end of live at the end of this year.
> Unfortunately the paid solution is out of our budget, so we currently have
> two options: find an alternative or remove the multi-region implementation.
> We are currently looking for alternatives.
>

You are missing all of the alternatives here.  Why don't you consider
upgrading from postgres 9.4 and with it to a supported version of BDR?
There is nothing better you can do to keep your infrastructure up to date,
performant, secure, and actually meet your multi-master needs than to
upgrade to a newer version of postgres which does have BDR support.

Even "stock" postgres 9.4 is set for end of life soon.  Upgrade!

Thanks,
Jeremy


Re: Geographical multi-master replication

2019-01-28 Thread Jeremy Finzel
On Mon, Jan 28, 2019 at 3:32 PM Ruben Rubio Rey  wrote:

> Hi Jeremy,
>
> > Why don't you consider upgrading from postgres 9.4 and with it to a
> supported version of BDR?  There is nothing better you can do to keep your
> infrastructure up to date, performant, secure, and actually meet your
> multi-master needs than to upgrade to a newer version of postgres which
> does have BDR support.
>
> This is something that we have definitely thought of. That would be a good
> alternative but I have an strong feeling that the business won't be willing
> to pay for it. The cost of the paid BDR version, in comparison with the
> current fixed expenses of the infrastructure, are just too large. That's
> why we are researching alternatives.
>

When I said “supported version” of BDR, I mostly meant a newer version of
Postgres. I know you still have to decide on paid BDR support, but regardless
of what route you take, part of your plan needs to be > 9.4 :).

Thanks,
Jeremy


Size estimation of postgres core files

2019-02-15 Thread Jeremy Finzel
I am trying to determine the upper size limit of a core file generated for
any given cluster.  Is it feasible that it could actually be the entire
size of the system memory + shared buffers (i.e. really huge)?

I've done a little bit of testing of this myself, but want to be sure I am
clear on this for planning to have enough free space for postgres core
files in case of potential crashes.

Thanks!
Jeremy


Re: Size estimation of postgres core files

2019-02-15 Thread Jeremy Finzel
>
> In Linux, yes.  Not sure about other OSes.
>
> You can turn off the dumping of shared memory with some unusably
> unfriendly bitwise arithmetic using the "coredump_filter" file in /proc
> for the process.  (It's inherited by children, so you can just set it
> once for postmaster at server start time).
>

Yes Linux.  This is very helpful, thanks.  A follow-up question - will it
take postgres a really long time to crash (and hopefully recover) if I have
say 1T of RAM because it has to write that all out to a core file first?

Thanks,
Jeremy


Re: Size estimation of postgres core files

2019-02-15 Thread Jeremy Finzel
>
> It doesn't write out all of RAM, only the amount in use by the
> particular backend that crashed (plus all the shared segments attached
> by that backend, including the main shared_buffers, unless you disable
> that as previously mentioned).
>
> And yes, it can take a long time to generate a large core file.
>
> --
> Andrew (irc:RhodiumToad)
>

Based on Alvaro's response, I thought it was reasonably possible that the
core file *could* include nearly all of RAM, because that was my original
question.  If shared buffers is say 50G and my OS has 1T, shared buffers is
a small portion of that.  But really my question is what should we
reasonably assume is possible - meaning what kind of space should I
provision for a volume to be able to contain the core dump in case of
crash?  The time of writing the core file would definitely be a concern if
it could indeed be that large.

Could someone provide more information on exactly how to do that
coredump_filter?

We are looking to enable core dumps to aid in case of unexpected crashes
and wondering if there are any recommendations in general in terms of
balancing costs/benefits of enabling core dumps.

Thank you!
Jeremy


idle_in_transaction_session_timeout for a set of SQL statements

2019-02-27 Thread Jeremy Finzel
I was hoping to use idle_in_transaction_session_timeout to prevent schema
change migrations from running too long and thereby locking up the
application for an extended period even if any one statement in the
migration is very short.

I am not finding predictable behavior
using idle_in_transaction_session_timeout.  If I create a simple SQL file
with two ALTER TABLE statements, and intentionally block the first
statement long enough to exceed idle_in_transaction_session_timeout,
usually once I unblock the first statement, the migration succeeds.  I want
it to actually be killed once it has exceeded
idle_in_transaction_session_timeout and finished executing one SQL
statement and is about to move to another.

On one of my tries, it actually did somehow exceed it and terminate, with the
exact same test:
$ psql test -f foo
SET
BEGIN
ALTER TABLE
ALTER TABLE
psql:foo:11: FATAL:  terminating connection due to idle-in-transaction
timeout
psql:foo:12: SSL connection has been closed unexpectedly
psql:foo:12: connection to server was lost

However, I only got that to happen once; usually it just executes fine,
which I don't want.

Session 1:
SET idle_in_transaction_session_timeout = 1;
BEGIN;
ALTER TABLE foo ADD COLUMN bar text; -- block this for >
idle_in_transaction_session_timeout

-- I was hoping it would timeout here

ALTER TABLE bar ADD COLUMN foo text;
COMMIT;

Session 2:
BEGIN;
SELECT * FROM foo;
... wait then abort


Granted this example is contrived, but the goal is again to avoid allowing
a migration with many individual statements from taking longer than say 5
seconds to execute, locking up the application.

Is there any way to timeout a long transaction or any clarity around how
idle_in_transaction_session_timeout works when executing a file with
multiple SQL statements?

Thanks,
Jeremy


Re: idle_in_transaction_session_timeout for a set of SQL statements

2019-02-27 Thread Jeremy Finzel
>
> SET lock_timeout TO '1s';
>

No, my assumption is that lock_timeout does not cover my use case here.  My
point is actually that any one statement is not hitting lock_timeout, but
as a whole the transaction takes too long.  For example if I set
lock_timeout to 1 second, but my migration actually has 10 SQL statements
each of which take just under a second, I have a total now of a near
10-second lockout.

Regardless of whether or not I am technically going idle, I want to be able
to abort based on transaction time length.  I do believe I can handle this
externally via the timeout command, but curious still if there is any way
to reliably do it from within postgres.

Thanks,
Jeremy


Do all superuser processes count toward superuser_reserved_connections?

2019-03-14 Thread Jeremy Finzel
I don't find a clear mention in the docs of superuser processes that are
exempt from counting toward superuser_reserved_connections.  So I would
think that it's possible that postgres autovac workers ought to count
toward that.  Am I wrong about that?  I actually have the same question
about pglogical replication background workers and manager, which also run
as postgres.

Do I need to increase my limit to account for these workers, or are some or
all of these processes ignored by superuser_reserved_connections?
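
For context, here is a sketch of one way to see what kinds of backends are
running (backend_type is available in pg_stat_activity from PG 10 on); whether
each of these types actually consumes a regular connection slot is exactly
what I am unsure about:

SELECT backend_type, count(*)
FROM pg_stat_activity
GROUP BY backend_type
ORDER BY count(*) DESC;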

Many thanks!
Jeremy


Re: Do all superuser processes count toward superuser_reserved_connections?

2019-03-15 Thread Jeremy Finzel
On Fri, Mar 15, 2019 at 9:48 AM Adrian Klaver 
wrote:

> On 3/14/19 8:23 AM, Jeremy Finzel wrote:
> > I don't find a clear mention in the docs of superuser processes that are
> > exempt from counting toward superuser_reserved_connections.  So I would
> > think that it's possible that postgres autovac workers ought to count
> > toward that.  Am I wrong about that?  I actually have the same question
>
> AFAICK autovacuum workers do not use the connections referred to above.
> The details can be found here:
>
> https://doxygen.postgresql.org/autovacuum_8c.html


Not sure I can really grok that and how it answers the question.  Are you
saying if you have max_connections set to 10, you could theoretically have
20 autovac processes still?


> > about pglogical replication background workers and manager, which also
> > run as postgres.
>
> But the actual connection can be by a different user:
>
> https://www.postgresql.org/docs/11/logical-replication-security.html
>
>
But I am speaking of pglogical, which does require superuser, last I
checked :). 

It does use replication slots, but there are processes corresponding to
each subscription.  I have some databases with dozens of them.

Thanks,
Jeremy


Why does log_error_verbosity not apply to server logs?

2019-04-22 Thread Jeremy Finzel
I have a DO block which is raising a log message with number of rows
deleted.  It also shows CONTEXT messages every time, which I don't want.
But setting in the client log_error_verbosity = terse does not work to get
rid of the messages.  I can't get it to work even setting it on a per-user
level.

My client shows terse verbosity as expected, but the server logs always no
matter what have CONTEXT messages.

Is there any way to suppress these when called in a DO block like this?

Thanks,
Jeremy


Re: Why does log_error_verbosity not apply to server logs?

2019-04-22 Thread Jeremy Finzel
On Mon, Apr 22, 2019 at 3:47 PM Tom Lane  wrote:

> Jeremy Finzel  writes:
> > I have a DO block which is raising a log message with number of rows
> > deleted.  It also shows CONTEXT messages every time, which I don't want.
> > But setting in the client log_error_verbosity = terse does not work to
> get
> > rid of the messages.  I can't get it to work even setting it on a
> per-user
> > level.
>
> > My client shows terse verbosity as expected, but the server logs always
> no
> > matter what have CONTEXT messages.
>
> Sure sounds to me like what you are setting is something client-side,
> not the server's log verbosity.  It works for me:
>
> regression=# do $$ declare x int; y int = 0; begin x := 1/y; end$$;
> psql: ERROR:  division by zero
> CONTEXT:  PL/pgSQL function inline_code_block line 1 at assignment
> regression=# set log_error_verbosity = terse;
> SET
> regression=# do $$ declare x int; y int = 0; begin x := 1/y; end$$;
> psql: ERROR:  division by zero
> CONTEXT:  PL/pgSQL function inline_code_block line 1 at assignment
>
> after which I see this in the postmaster log:
>
> 2019-04-22 16:40:38.300 EDT [25788] ERROR:  division by zero
> 2019-04-22 16:40:38.300 EDT [25788] CONTEXT:  PL/pgSQL function
> inline_code_block line 1 at assignment
> 2019-04-22 16:40:38.300 EDT [25788] STATEMENT:  do $$ declare x int; y int
> = 0; begin x := 1/y; end$$;
> 2019-04-22 16:40:51.654 EDT [25788] ERROR:  division by zero
> 2019-04-22 16:40:51.654 EDT [25788] STATEMENT:  do $$ declare x int; y int
> = 0; begin x := 1/y; end$$;
>
> Note that this changed the server log verbosity but *not*
> what was displayed on the client side.
>
> (BTW, if you want to get rid of the statement logging as well,
> see log_min_error_statement.)
>
> Also note that adjusting log_error_verbosity on the fly
> like this requires being superuser, which isn't really
> a good way to run in production.  I'd expect though that
> you could apply it with ALTER USER SET.
>
> regards, tom lane
>

I am running it differently - explicitly raising a LOG level message, not
an ERROR.  The line of interest is the following:

do $$
..
raise log 'pruned % rows from table', rows;
...
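
(A complete minimal version of what I am running looks roughly like this; the
table name and prune condition are placeholders:)

do $$
declare rows bigint;
begin
    delete from some_table where created_at < now() - interval '1 year';
    get diagnostics rows = row_count;
    raise log 'pruned % rows from table', rows;
end
$$;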

Even run as a superuser, it doesn't work. I have run it just as you did
above - setting it client side.  Also done it on a per-role basis and it
didn't work.

Thanks,
Jeremy


Re: Why does log_error_verbosity not apply to server logs?

2019-04-23 Thread Jeremy Finzel
>
> My question was about how you (think you are) setting log_error_verbosity,
> not what you're doing to produce the message.  The decisions in elog.c
> about which fields to print in the server log do not depend on the latter.
>

Roger that.


> However... looking again at what elog.c does, I'm prompted to ask whether
> you're using plain or CSV log output.  The CSV log format doesn't
> consider log_error_verbosity when deciding whether to output this field.
>

Yup, we are using CSV log format.  So in that case, is there no way around
the context messages if I still want the log message?

Thanks,
Jeremy


Re: Bulk inserts into two (related) tables

2019-05-21 Thread Jeremy Finzel
On Tue, May 21, 2019 at 12:24 PM Rich Shepard 
wrote:

> On Tue, 21 May 2019, Francisco Olarte wrote:
>
> > From how you say it, I assume you have some data in your original
> > dumps which can relate boths, lets assume it's org_name, but may be an
> > org-code. If you do not have it it means you cannot match people to
> > orgs in your data, all is lost.
>
> Francisco,
>
> Not yet with these new data.
>
> I'll manually insert the org_id numbers from the organizations table into
> the people table.
>

To me, this is the key to your problem and what will either make this a time
saver or a time waster.  Somehow you are accounting for what uniquely
identifies organizations, right?

Say there are 5 fields that correspond to an organization.  I assume then
you are creating only one new org_id for each unique combination of these
fields?

Then take Francisco's suggestion, only use an md5 of the organization
fields to create yourself a unique identifier.  Then you can use ctid
(unique internal identifier for each row) to join back.  You use SQL like
this:

SELECT md5(row(org_name, org_stuff_1, org_stuff_2)::text) AS hash_identifier
FROM table;

Assume in example below that your unique "org" rows are the first 3 fields:

1. Load your data file into a loading table like so:
CREATE TABLE loader (org_name text, org_stuff_1 text, org_stuff_2 text,
person_name text);
\copy loader from 'my_data.csv' with csv header
ALTER TABLE loader ADD COLUMN org_id INT;

Example data:
INSERT INTO loader VALUES ('a', ' ', ' ', 'Jerry');
INSERT INTO loader VALUES ('a', ' ', 'b', 'Bob');
INSERT INTO loader VALUES ('a', ' ', 'b', 'Janice');
INSERT INTO loader VALUES ('a', ' ', 'c', 'Chris');
INSERT INTO loader VALUES ('b', ' ', 'c', 'Jason');
INSERT INTO loader VALUES ('a', ' ', ' ', 'Alice');

2. Load org table:
test=# CREATE TABLE organizations (org_id serial primary key, org_name
text, org_stuff_1 text, org_stuff_2 text);
CREATE TABLE
test=# INSERT INTO organizations (org_name, org_stuff_1, org_stuff_2)
test-# SELECT DISTINCT org_name, org_stuff_1, org_stuff_2
test-# FROM loader;
INSERT 0 4

3. Build mapping directly and update:
-- build hash of org fields in loader table, take ctid in order to map back
later
WITH map_source AS (
SELECT ctid, md5(row(org_name, org_stuff_1, org_stuff_2)::text) AS
hash_identifier
FROM loader)

-- build hash of org fields in organizations table to join back to loader
and bring in org_id of course
, map_org AS (
SELECT org_id, md5(row(org_name, org_stuff_1, org_stuff_2)::text) AS
hash_identifier
FROM organizations)

-- map by joining together on hash_identifier
, final_map AS (
SELECT org_id, ctid
FROM map_source l
INNER JOIN map_org o USING (hash_identifier)
)

-- Perform update
UPDATE loader l
SET org_id = fm.org_id
FROM final_map fm
WHERE fm.ctid = l.ctid;

Final data ready for the person table to be populated:
test=# table organizations;
 org_id | org_name | org_stuff_1 | org_stuff_2
--------+----------+-------------+-------------
      1 | a        |             | b
      2 | a        |             |
      3 | a        |             | c
      4 | b        |             | c
(4 rows)

test=# table loader;
 org_name | org_stuff_1 | org_stuff_2 | person_name | org_id
----------+-------------+-------------+-------------+--------
 a        |             |             | Jerry       |      2
 a        |             | b           | Bob         |      1
 a        |             | b           | Janice      |      1
 a        |             | c           | Chris       |      3
 b        |             | c           | Jason       |      4
 a        |             |             | Alice       |      2
(6 rows)


Hope this helps!
Thanks,
Jeremy


Re: Bulk inserts into two (related) tables

2019-05-22 Thread Jeremy Finzel
On Wed, May 22, 2019 at 7:40 AM Rich Shepard 
wrote:

> On Wed, 22 May 2019, Francisco Olarte wrote:
>
> > I'm curious, what org_id do you put (manually) to the people? you must
> > have some way to match it ( like, say, "I have an organization line,
> > followed by lines for people in this organization"
>
> Francisco,
>
> The first data entered was in small chunks so I'd add a few rows to the
> organization table, output a file of org_id and org_name, then use the
> source data to associate that org_id to the people associated with it.
>
> With this new data source I want to populate the two tables more quickly. I
> think the time consuming part is associating people with their
> organization.
> I'll do more thinking about this.
>
> I don't do perl, though.
>

There's absolutely no need to use anything beyond SQL here, though you
could if you want to.

I really wonder how much we are just talking past each other simply because
we don't know what your data looks like, so we can't show you how our
examples apply to your use case.  If you provided a sample scrubbed data
file, this whole thread probably would have been much shorter :).  Can you
do that?

You said here again the most time consuming part is associating people with
their organization.  Well, that's the whole question I was trying to
optimize on.  You told Francisco that the data file does not have a unique
org name that could be used as a unique organization identifier.  However
you seem to have contradicted that by responding favorably to this solution:

INSERT INTO PEOPLE (id, name, org_id) VALUES (1, 'Bob', (SELECT org_id FROM
org WHERE org_name='Main Office'))

How can this solution work if you said org_name cannot be used to link a
person in the data file?

So, the question again becomes: does your data file have a combination of
org fields that allows you to assign a person to a unique organization?

   - If the answer is still no, how could you ever assign people to a
   unique organization?  In that case you need to massage your data file first
   before trying to build a database schema, as Francisco noted.  This is
   basically a non-starter for your database schema.
   - If the answer is yes, that means you can use the combination of those
   fields to uniquely identify an organization, and thus link people to it
   later.  That's the reason for the md5 - to easily use many fields in
   combination as a unique id

Thanks,
Jeremy


Re: Bulk inserts into two (related) tables

2019-05-22 Thread Jeremy Finzel
On Wed, May 22, 2019 at 11:07 AM Rich Shepard 
wrote:

> On Wed, 22 May 2019, Francisco Olarte wrote:
>
> > You are not reading what we write to you. Note YOU AND ONLY YOU are the
> > one speaking of PK. We are speaking of "unique identifier" ( that would
> > be, IIRC, "candidate keys", you can peek any as your PK, or even
> introduce
> > a new synthetic one with a sequence, or a femto second exact timestamp or
> > whatever ).
>
> Francisco,
>
> Let me clarify.
>
> The organizations table has org_id (an integer) as PK.
>
> The people table has person_id (an interger) as PK and org_id as the
> reference to organization.org_id.
>
> Does this help?
>

Francisco hit just the point I was making.  I never said org_name should be
the primary key.  We all acknowledge org_id will be the primary key.  But
the whole question that would solve your issue is what other field or
fields identify an organization in your data file.

You need two unique identifiers:

   1. A "natural" one from your data file that identifies unique
   information about an organization.  This is NOT the primary key on
   organizations table
   2. A "synthetic" one - org_id, generated in the organizations table.

The whole point of the (1) natural unique identifier is this is how you can
*easily* link org_id back to the person record in your data file.

You say we don't need to see your data file to answer the question.  To me,
it's still clear that if you were to share a sample of your data file
(obviously, with fake data), all of this back and forth would quickly end.
This is all way too abstract.

Thanks,
Jeremy


Re: Bulk inserts into two (related) tables

2019-05-22 Thread Jeremy Finzel
On Wed, May 22, 2019 at 12:53 PM Rich Shepard 
wrote:

> On Wed, 22 May 2019, Francisco Olarte wrote:
>
> > Also, when I speak of "unique identifier" I'm not speaking of the one if
> > your FINAL tables, I assume you would have at least the *_id field as
> > PKEY, so nothing else needed, but the one in your SOURCE data set (it can
> > be anything, like the row number in the original excel).
>
> Francisco/Jeremy,
>
> I'm grateful for your patient help. The 'unique identifier' in the source
> file has been provided (just now) using nl.
> The syntax I used is:
>
> nl -b a -n ln -s , -v 339 source.txt > out.txt
>
> because the organizations table has 338 as the maximum org_id number.
>
> I believe this fulfills the need for a known unique ID in the source file,
> and when I parse each row using gawk to create the two files for table
> input
> I can use it in both the organizations table (as the PK) and the people
> table (as the FK referring to the organizations table). I can let postgres
> assign the unique ID for the new rows in the people table.
>
> Am I still missing something critical?


Are you saying your database already has an organizations table, and this
data file is appending to it with all-brand-new organizations and people?

Say you have 800 people.  Are you honestly saying that there are also 800
organizations?  If so, I guess your solution would work.  However, I
thought that it was the case that multiple people belong to the same
organization.  Hence, the entire problem of mapping we have been discussing.

The data model doesn't make much sense if that's true.

Also, you said previously that some fields from your data file would be
populated in *both tables*.  That is a fundamental violation of DB design.
Why are you duplicating that data in two places?

Thanks,
Jeremy


Re: Bulk inserts into two (related) tables

2019-05-22 Thread Jeremy Finzel
>
> A sample of the data you are cleaning up.
>
> I think what people are trying to wrap there head around is how 800
> lines in the file is being split into two subsets: the organization data
> and the people data. In particular how that is being done to preserve
> the relationship between organizations and people? This is before it
> ever gets to the database.
>

I agree.  We keep going in circles and until we have a sample of the data,
it's not a good use of time for us to keep guessing at what none of us is
clear about but you - what the data actually looks like.

Thanks,
Jeremy


Questions about btree_gin vs btree_gist for low cardinality columns

2019-05-24 Thread Jeremy Finzel
I have been hoping for clearer direction from the community about
specifically btree_gin indexes for low cardinality columns (as well as low
cardinality multi-column indexes).  In general there is very little
discussion about this both online and in the docs.  Rather, the emphasis
for GIN indexes discussed is always on full text search of JSON indexing,
not btree_gin indexes.

However, I have never been happy with the options open to me for indexing
low cardinality columns and was hoping this could be a big win.  Often I
use partial indexes as a solution, but I really want to know how many use
cases btree_gin could solve better than either a normal btree or a partial
index.
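
For concreteness, the kind of index I mean is simply this (a minimal sketch
with made-up table and column names; it requires the btree_gin extension):

CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE INDEX ON big_table USING gin (account_code);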

Here are my main questions:

1.

"The docs say regarding *index only scans*: The index type must support
index-only scans. B-tree indexes always do. GiST and SP-GiST indexes
support index-only scans for some operator classes but not others. Other
index types have no support. The underlying requirement is that the index
must physically store, or else be able to reconstruct, the original data
value for each index entry. As a counterexample, GIN indexes cannot support
index-only scans because each index entry typically holds only part of the
original data value."

This is confusing to say "B-tree indexes always do" and "GIN indexes cannot
support index-only scans", when we have a btree_gin index type.
Explanation please ???

Is it true that for a btree_gin index on a regular column, "each index
entry typically holds only part of the original data value"?  Do these
still not support index only scans?  Could they?  I can't see why they
shouldn't be able to for a single indexed non-expression field?

2.

Lack of index only scans is definitely a downside.  However, I see
basically identical performance, but way less memory and space usage, for
gin indexes.  In terms of read-only performance, if index only scans are
not a factor, why not always recommend btree_gin indexes instead of regular
btree for low cardinality fields, which will yield similar performance but
use far, far less space and resources?

3.

This relates to 2.  I understand the write overhead can be much greater for
GIN indexes, which is why the fastupdate feature exists.  But again, in
those discussions in the docs, it appears to me they are emphasizing that
penalty more for full text or json GIN indexes.  Does the same overhead
apply to a btree_gin index on a regular column with no expressions?
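
For reference, fastupdate is just a storage parameter that can be set on any
GIN index; a sketch with made-up names:

CREATE INDEX ON big_table USING gin (account_code) WITH (fastupdate = off);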

Those are my questions.

FYI, I can see an earlier thread about this topic (
https://www.postgresql.org/message-id/flat/E260AEE7-95B3-4142-9A4B-A4416F1701F0%40aol.com#5def5ce1864298a3c0ba2d4881a660c2),
but a few questions were left unanswered and unclear there.

I first started seriously considering using btree_gin indexes for low
cardinality columns, for example some text field with 30 unique values
across 100 million rows, after reading a summary of index types from
Bruce's article: https://momjian.us/main/writings/pgsql/indexing.pdf

This article was also helpful but yet again I wonder it's broader
viability:
http://hlinnaka.iki.fi/2014/03/28/gin-as-a-substitute-for-bitmap-indexes/


Thank you!
Jeremy


Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-05-29 Thread Jeremy Finzel
Could anyone shed any light on these questions?  I appreciate it.

Thanks,
Jeremy


Re: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-04 Thread Jeremy Finzel
On Sun, Jun 2, 2019 at 6:07 PM Tom Lane  wrote:

> Anyway, the larger point here is that right now btree_gin is just a quick
> hack, and it seems like it might be worth putting some more effort into
> it, because the addition of duplicate-compression changes the calculus
> for whether it's useful.
>

Thank you to all for the thoughtful and thorough replies!

To clarify, I am definitely more interested in "low cardinality" in the
sense NOT of a boolean or very few values, but rather enough values to
where:

   1. It's selective enough that having *some kind of index* actually will
   significantly speed up queries vs. a sequential scan
   2. There are too many values to use partial indexes easily without it
   becoming a kind of maintenance nightmare

In our environment, we happen to have this kind of situation a lot.  For
example, account codes, or other foreign key ids to lookup tables that
categorize data in some way that is frequently queried, exist on tables
with upwards of 100 million rows.  Typically it may be something like 50 to
500 unique values.

Historically, we have just had a bunch of regular btree indexes on these,
and there are quite a lot of them indeed.

Here is my specific example:

   - Table has 127 million rows, including a toast field.  The table is
   270GB
   - The filter is on a field with only 16 unique values.
   - The actual filter condition is filtering a join to 4 of the 16 unique
   values

As I said, performance was nearly identical for btree vs. gin.  But I was
much happier with the memory usage of GIN given its tiny size:

   - Memory for join using btree: Buffers: shared hit=12 read=328991
   - Memory for join using gin: Buffers: shared hit=12 read=13961

The btree index here is 3.8GB, whereas the gin index is only 200MB.  But I
have not tested how either index handles bloat.
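
For reference, a sketch of how I am comparing the on-disk sizes (the index
names here are placeholders):

SELECT relname, pg_size_pretty(pg_relation_size(oid))
FROM pg_class
WHERE relname IN ('foo_fact_field_btree', 'foo_fact_field_gin');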

Morris, I think the reason your gin index was clearly slower was precisely
because your example uses an index-only scan - which gin does not support.

Thanks,
Jeremy