[GENERAL] Changing the location of the default data directory on PG 9.6.6 (CentOS 7)?

2017-11-17 Thread Robert Gordon
I'm trying to identify which postgresql.conf file I should be editing in order 
to change the default database file location for Postgres 9.6.6, when 
installed on CentOS 7.x.

Is the best method for changing the default data directory to supply the $PGDATA 
variable at database initialization, such as:

su - postgres -c '/usr/pgsql-9.6/bin/initdb --pgdata=$PGDATA', where $PGDATA is 
the directory path in which I want the PostgreSQL database files to reside?
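
On CentOS 7 with the PGDG packages, the data directory the service uses is
typically taken from the PGDATA environment variable in the systemd unit rather
than from postgresql.conf, so a common approach is roughly the sketch below.
The unit, setup-script and path names reflect the PGDG postgresql96-server
packaging and are assumptions, not something verified against this install:

    # Create the new location and hand it to the postgres user.
    mkdir -p /data/pgsql/9.6/data
    chown -R postgres:postgres /data/pgsql/9.6

    # Point the service at it with a systemd drop-in instead of editing the unit file.
    systemctl edit postgresql-9.6
    #   [Service]
    #   Environment=PGDATA=/data/pgsql/9.6/data

    # Initialize the cluster in the new location and start the service.
    /usr/pgsql-9.6/bin/postgresql96-setup initdb
    systemctl enable postgresql-9.6
    systemctl start postgresql-9.6

If the setup script does not pick up the drop-in, running initdb directly as
postgres with --pgdata, as in the question, works just as well; the important
part is that the service's PGDATA matches the directory that was initialized.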




[GENERAL] ERROR: invalid input syntax for integer: "INSERT"

2017-11-04 Thread Robert Lakes
Guys,
New to Postgres - here's my code inside a trigger function:
ELSIF (TG_OP = 'INSERT') THEN
EXECUTE format('INSERT INTO %I SELECT statement_timestamp(),
''INSERT'', $1.*', TG_TABLE_NAME || '_cdc')
USING NEW;
RETURN NEW;


Here's the error I am receiving when attempting to insert a record
into a table:

ERROR:  invalid input syntax for integer: "INSERT"
LINE 1: ...T INTO listings_cdc SELECT statement_timestamp(), 'INSERT', ...
 ^
QUERY:  INSERT INTO listings_cdc SELECT statement_timestamp(), 'INSERT',
$1.*
CONTEXT:  PL/pgSQL function audit_func() line 28 at EXECUTE
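
The error suggests the second column of listings_cdc is an integer, so the
string 'INSERT' is landing in a column of the wrong type. A self-contained
sketch (the column layout here is hypothetical) in which an explicit target
column list keeps the selected values lined up with the audit columns:

    CREATE TABLE listings (id integer, title text);
    CREATE TABLE listings_cdc (changed_at timestamptz, op text, id integer, title text);

    CREATE OR REPLACE FUNCTION audit_func() RETURNS trigger AS $$
    BEGIN
      IF (TG_OP = 'INSERT') THEN
        -- Naming the audit columns explicitly keeps statement_timestamp() and
        -- the 'INSERT' tag out of columns that expect the table's own data.
        EXECUTE format(
          'INSERT INTO %I (changed_at, op, id, title)
           SELECT statement_timestamp(), ''INSERT'', $1.*',
          TG_TABLE_NAME || '_cdc')
        USING NEW;
      END IF;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER listings_audit AFTER INSERT ON listings
      FOR EACH ROW EXECUTE PROCEDURE audit_func();

    INSERT INTO listings VALUES (1, 'first listing');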


[GENERAL] Adding 'serial' to existing column

2017-11-03 Thread Robert Lakes
I am new to Postgres and I am trying to build this SQL statement in my SQL
script:
ALTER TABLE listings_cdc ALTER COLUMN table_id SET DEFAULT
nextval('tab_id_seq');

I am trying to build the above-stated command as a dynamic SQL statement:
 EXECUTE 'ALTER TABLE listings_cdc ALTER COLUMN table_id SET DEFAULT
nextval('||quote_ident(tab_id_seq)||')';

I received an error when running the above-stated command in the script:
 ERROR:  column "tab_id_seq" does not exist
LINE 1: ...OLUMN table_id SET DEFAULT nextval('||quote_ident(tab_id_seq...

I need help writing the above-stated command as a dynamic SQL
statement.

In addition, do you know of a tutorial or a book that I can purchase that
teaches how to build dynamic SQL statements?
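
For what it's worth, that error comes from tab_id_seq being parsed as a column
reference in the expression that builds the string; the sequence name has to
end up inside nextval(...) as a quoted literal, which format() with %L (or
quote_literal) produces. A minimal sketch, assuming the table and sequence
above already exist:

    DO $$
    BEGIN
      -- %L quotes the value as a literal, producing:
      --   ALTER TABLE listings_cdc ALTER COLUMN table_id SET DEFAULT nextval('tab_id_seq')
      EXECUTE format(
        'ALTER TABLE listings_cdc ALTER COLUMN table_id SET DEFAULT nextval(%L)',
        'tab_id_seq');
    END;
    $$;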


Re: [GENERAL] Confused about max_standby_streaming_delay

2017-09-07 Thread Robert Inder
On 6 September 2017 at 20:47, Jeff Janes  wrote:

>
>> Have I misunderstood something?  Or is Postgres not actually configured
>> the way I think it is?
>>
>
> The standby will wait for ten minutes to obtain the lock it wishes to
> obtain.  In 9.4, if something other than dump of database b was already
> blocking it for 8 minutes before the dump starts, then the dump of database
> b will only have 2 minutes, not 10, before it gets cancelled.
>

H...
You're saying that the time for dumping database b may be 8 minutes spent
waiting on a lock and then 2 minutes actually dumping.

But would it not still be at least 10 minutes elapsed between the finish
time of the previous dump and the following dump starting (let alone
finishing)?  And that's what I'm not seeing...

When one dump fails, there is no 10-minute gap in the modification times of the
other successful dump files.


> So, are there any long running jobs in database b other than the pg_dump?
>
There shouldn't be.


The standby server is (well, should be!) essentially idle: we're using it to do
the dumps, so that they don't get under the feet of live queries, but (I think --
I'll have to check!) that is all.


>
> Cheers,
>
> Jeff
>

Robert.

-- 
Robert Inder,   0131 229 1052 / 07808 492 213
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words


[GENERAL] Confused about max_standby_streaming_delay

2017-09-06 Thread Robert Inder
A server running PostgreSQL 9.4 on CentOS 6.8, with a live server and a hot
standby, is supporting about 20 customer organisations, each with their own
Linux user and its own installation/copy of the system, talking to its own
database.

The system has a backup script which is a wrapper round pg_dump.
For most installations, it takes "a couple of minutes" to run, though
others take longer.

On the standby server, we have a directory full of cron jobs -- one for
each installation -- to invoke this backup script.
Each script is run as soon as the previous one is finished. So when all
goes to plan, we get a dump of
 database_a at 5 minutes past the hour,
 database_b at 7 minutes past,
database_c at 9 minutes past
and so on.

However, we occasionally have a dump die with a message like

   Error message from server: ERROR:  canceling statement due to
conflict with recovery

I believe this means that an update from the live server wants to change a
table that is being dumped, and so the dump is aborted.
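
One way to confirm that the cancellations really are recovery conflicts, and of
what kind, is the standard per-database conflict counters (a sketch; run it on
the standby itself):

    SELECT datname,
           confl_lock,
           confl_snapshot,
           confl_bufferpin,
           confl_deadlock,
           confl_tablespace
      FROM pg_stat_database_conflicts
     ORDER BY datname;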

And I've read that the answer to this is to set max_standby_streaming_delay
in postgresql94.conf.
So I've set it to "600s" -- ten minutes.

I thought this would mean that when there was a conflict with an update
from the live server, Postgres would give the dump 10 minutes "grace" in
which to finish before killing it.

Ten minutes may or may not be enough.  But in a case where it isn't enough,
and the dump is abandoned, I would expect to see something like

the dump of database_a finishing at 5 minutes past the hour,
the dump of database_b
  starting after the dump of database_a,
  having a conflict,
  being given 10 minutes to complete, and then
  being abandoned
the dump of database_c starting after the dump of database_b and
finishing (say) 3 minutes later

So the dump of database_c should finish at around 18 minutes past the hour.

BUT that is not what I am seeing.
On occasions where the dump of database_b is being abandoned, the successful
dump of database_c is timestamped less than 10 minutes after the dump of
database_a.

Which does not fit with the dump of database_b being given 10 minutes in
which to finish.

Have I misunderstood something?  Or is Postgres not actually configured the
way I think it is?

Robert.

-- 
Robert Inder,   0131 229 1052 / 07808 492 213
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words


Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100

2017-06-09 Thread Robert Lakes
Hi guys,
I am new to Postgres and I am trying to write my first function to insert,
update or delete, and to trap errors if the table does not exist or the
columns do not exist, or, for any other error, simply pass back the SQLSTATE.
Here's my code -- can you help?
CREATE OR REPLACE FUNCTION listings_audit() RETURNS TRIGGER AS
$listings_audit$
  BEGIN
IF (TG_OP = 'DELETE') THEN
 IF (EXISTS (
  SELECT 1
  FROM pg_catalog.pg_class c
  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname = 'schema_name'
  AND   c.relname = 'table_name'
  AND   c.relkind = 'r' -- only tables
)) THEN
   INSERT INTO listings_changes
 SELECT now(), 'DELETE', OLD.*;
   RETURN OLD;
   ELSE RAISE EXCEPTION 'Table does not exists';
 END IF;
ELSIF (TG_OP = 'UPDATE') THEN
  IF (EXISTS (
  SELECT 1
  FROM pg_catalog.pg_class c
  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname = 'schema_name'
  AND   c.relname = 'table_name'
  AND   c.relkind = 'r' -- only tables
)) THEN
   INSERT INTO listings_changes
 SELECT now(), 'UPDATE', NEW.*;
   RETURN NEW;
   ELSE RAISE EXCEPTION 'Table does not exists';
 END IF;
ELSEIF (TG_OP = 'INSERT') THEN

   INSERT INTO listings_changes
 SELECT now(), 'INSERT', NEW.*;
   RETURN NEW;

END IF;
EXCEPTION
WHEN SQLSTATE '42611' THEN
  RAISE EXCEPTION 'Columns do not match audit file does not match user
file';
WHEN SQLSTATE '42P16' THEN
  RAISE EXCEPTION 'Table does not exists';
WHEN OTHERS THEN
  RAISE EXCEPTION 'PostgresSQL error code that has occurred';
RETURN SQLSTATE;
END;
$listings_audit$ LANGUAGE plpgsql;

On Thu, Jun 8, 2017 at 12:49 PM, Tom Lane  wrote:

> Harry Ambrose  writes:
> > Please find the jar attached (renamed with a .txt extension as I know
> some
> > email services deem jars a security issue).
>
> Hmm, the output from this script reminds me quite a lot of one I was
> sent in connection with bug #1 awhile back:
> https://www.postgresql.org/message-id/20161201165505.4360.28203%40wrigleys.postgresql.org
> Was that a colleague of yours?
>
> Anyway, the bad news is I couldn't reproduce the problem then and I can't
> now.  I don't know if it's a timing issue or if there's something critical
> about configuration that I'm not duplicating.  Can you explain what sort
> of platform you're testing on, and what nondefault configuration settings
> you're using?
>
> regards, tom lane
>
>
>


Re: [GENERAL] Call for users to talk about table partitioning

2017-05-18 Thread Robert Eckhardt
All the code for creating and managing partitions is part of the core
Postgres code. What we are interested in looking into is what that work
flow might look like and how that workflow can be supported with a GUI
management tool.

-- Rob

On Thu, May 18, 2017 at 3:21 PM, Melvin Davidson 
wrote:

> Shirley,
> I am a bit confused. AFAIK, it is PostgreSQL that is responsible for table
> partitioning. PgAdmin4 is just an administrative tool.
> Are you saying that PgAdmin4 now can make partition tables automatically?
>
> On Thu, May 18, 2017 at 2:10 PM, Shirley Wang  wrote:
>
>> Hello!
>>
>> We're a team from Pivotal, working with members of the Postgres community
>> on table partitioning in pgAdmin4. We're looking to chat with some Postgres
>> users on their expectations with table partitioning within pgAdmin4.
>>
>> If you have some availability next week, we'd love to set up some time to
>> learn about your needs with this feature. Let us know some times that work
>> for you and we'll send over details for the call.
>>
>> Thanks!
>> Shirley
>>
>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>


Re: [HACKERS] REFERENCES privilege should not be symmetric (was Re: [GENERAL] Postgres Permissions Article)

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 11:29 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Mar 30, 2017 at 4:45 PM, Tom Lane  wrote:
>>> In short, it seems like this statement in the docs is correctly describing
>>> our code's behavior, but said behavior is wrong and should be changed.
>>> I'd propose fixing it like that in HEAD; I'm not sure if the back branches
>>> should also be changed.
>
>> Sounds reasonable, but I don't see much advantage to changing it in
>> the back-branches.
>
> Well, it's a SQL-compliance bug, and we often back-patch bug fixes.

Personally, I'd be more inclined to view it as a definitional change.
Maybe we picked the wrong definition before, but it's well-established
behavior at this point.  The SQL standard also says that identifiers
are supposed to be folded to upper case, so I don't think the theory
that every deviation from the standard should be fixed and
back-patched holds a lot of water.

> The argument for not back-patching a bug fix usually boils down to
> fear of breaking existing applications, but it's hard to see how
> removal of a permission check could break a working application ---
> especially when the permission check is as hard to trigger as this one.
> How many table owners ever revoke their own REFERENCES permission?

Sure, but that argument cuts both ways.  If nobody ever does that, who
will be helped by back-patching this?

I certainly agree that back-patching this change is pretty low risk.
I just don't think it has any real benefits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] REFERENCES privilege should not be symmetric (was Re: [GENERAL] Postgres Permissions Article)

2017-03-31 Thread Robert Haas
On Thu, Mar 30, 2017 at 4:45 PM, Tom Lane  wrote:
> In short, it seems like this statement in the docs is correctly describing
> our code's behavior, but said behavior is wrong and should be changed.
> I'd propose fixing it like that in HEAD; I'm not sure if the back branches
> should also be changed.

Sounds reasonable, but I don't see much advantage to changing it in
the back-branches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] Moving pg_xlog

2016-12-03 Thread Robert Inder
Thanks, everyone, for your comments.

I think I've got a clearer idea of what's going on now...

Robert.


On 1 December 2016 at 13:55, Robert Inder  wrote:
> I'm running  Postgres9.4 in master/hot-standby mode on a few pairs of servers.
>
> While recovering from A Bit Of Bother last week, I came across a
> posting saying that pg_xlog should be on a separate partition.
>
> I tried to find out more about this, by consulting the PostgresQL
> documentation (i.e.
> https://www.postgresql.org/docs/9.4/static/index.html )
> But all I could find was a mention that "It is advantageous if the log
> is located on a different disk from the main database files".
>
> The questions:
> 1. WHY is this good?  Is it (just) to stop pg_xlog filling the
> database disk/partition?  Or are there performance implications?
> SPECIFICALLY: my database is currently in "/", which is on SSD.  Is it
> better to move pg_xlog to another partition on the same SSD?  Or to a
> physical disk or SAN?
>
> 2. What are the implications for doing a base backup?  I believe I
> read that putting pg_xlog on a different partition meant it would be
> omitted from a file-system bulk copy (e.g. rsync), and this was a GOOD
> thing, because the copy operation would be faster -- not copying
> pg_xlog would not prevent the standby server from starting, because
> the information it needed would be in the WAL files that would be
> shipped separately.  Have I got that right?
>
> Finally, the suggestion.
>
> I'd really like to read an explicit discussion of this in the official
> documentation, rather than just glean what I can from answers to
> questions.
> The possibility of moving pg_xlog to another disk is mentioned in the
> documentation, but I almost missed it because it is in "the wrong
> place".  It is in Section 29.5 -- "Reliability and the Write Ahead
> Log" / "WAL Internals".  But I wasn't interested in anything INTERNAL:
> I wanted to know where I should try to locate it/them.  So I'd looked
> in "the obvious places" -- Section 18 (Server configuration), and in
> particular 18.2 "File Locations".  Could I suggest that the motivation
> for doing this, and the consequences for backups, should be discussed
> in "the right place" -- in or near the section that talks about file
> locations in the context of server configuration.
>
> Robert.
>
> --
> Robert Inder,    0131 229 1052 / 07808 492 213
> Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
> Registered in Scotland, Company no. SC 150689
>Interactions speak louder than 
> words



-- 
Robert Inder,0131 229 1052 / 07808 492 213
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words




[GENERAL] Moving pg_xlog

2016-12-01 Thread Robert Inder
I'm running Postgres 9.4 in master/hot-standby mode on a few pairs of servers.

While recovering from A Bit Of Bother last week, I came across a
posting saying that pg_xlog should be on a separate partition.

I tried to find out more about this by consulting the PostgreSQL
documentation (i.e.
https://www.postgresql.org/docs/9.4/static/index.html ),
but all I could find was a mention that "It is advantageous if the log
is located on a different disk from the main database files".

The questions:
1. WHY is this good?  Is it (just) to stop pg_xlog filling the
database disk/partition?  Or are there performance implications?
SPECIFICALLY: my database is currently in "/", which is on SSD.  Is it
better to move pg_xlog to another partition on the same SSD?  Or to a
physical disk or SAN?

2. What are the implications for doing a base backup?  I believe I
read that putting pg_xlog on a different partition meant it would be
omitted from a file-system bulk copy (e.g. rsync), and this was a GOOD
thing, because the copy operation would be faster -- not copying
pg_xlog would not prevent the standby server from starting, because
the information it needed would be in the WAL files that would be
shipped separately.  Have I got that right?
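
(For reference, the mechanics of the move itself are usually just a symlink,
done while the cluster is stopped; the paths below are placeholders for a 9.4
install, and the open questions above are about why and with what backup
consequences, not about how:

    pg_ctl -D /var/lib/pgsql/9.4/data stop
    mv /var/lib/pgsql/9.4/data/pg_xlog /wal/pg_xlog
    ln -s /wal/pg_xlog /var/lib/pgsql/9.4/data/pg_xlog
    pg_ctl -D /var/lib/pgsql/9.4/data start
)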

Finally, the suggestion.

I'd really like to read an explicit discussion of this in the official
documentation, rather than just glean what I can from answers to
questions.
The possibility of moving pg_xlog to another disk is mentioned in the
documentation, but I almost missed it because it is in "the wrong
place".  It is in Section 29.5 -- "Reliability and the Write Ahead
Log" / "WAL Internals".  But I wasn't interested in anything INTERNAL:
I wanted to know where I should try to locate it/them.  So I'd looked
in "the obvious places" -- Section 18 (Server configuration), and in
particular 18.2 "File Locations".  Could I suggest that the motivation
for doing this, and the consequences for backups, should be discussed
in "the right place" -- in or near the section that talks about file
locations in the context of server configuration.

Robert.

-- 
Robert Inder,0131 229 1052 / 07808 492 213
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words




[GENERAL] Invoice Table Design

2016-11-24 Thread Robert Heinen
I was wondering if anyone might be able to help me out with a table design
question.

A quick intro -- I'm helping a company switch from a mongo database over to
postgresql (yay!). The company is a marketplace app for musicians and
hosts. The basic idea is that a host can book a musician for an event, like
a wedding or a birthday. Also, an artist and a host can be either basic or
"pro" accounts -- if they're "pro" then they pay a little bit more and get
some extra features.

The design I'm struggling with is how to handle invoices and transactions
in postgres. In mongo, everything is stuffed into a single 'invoices' table
that includes sender and receiver addresses, the amount of the invoice,
taxes, etc. It also contains a reference to the booked event, the artist
and the host, as well as some state information through nullable columns --
created date, sent date, paid date.

At the same time the table also tracks the above-mentioned "pro"
subscriptions by utilizing a type field (so 'concertfee' vs
'subscription'). So both types of invoices are stuffed into the table and
it's up to the application to understand the difference in the types.

To translate this to postgres, I'm leaning towards breaking out the
different types of invoices into their own tables but keeping the basics of
an invoice (sender, receiver, amount) and then referencing from specific
tables like -- subscription_invoices and event_invoices.

so tables would be:
invoices (invoice_uuid primary key)
event_invoices (invoice_uuid FK, event_uuid FK)
artist_subscription_invoices (invoice_uuid FK, artist_uuid FK)
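
A minimal DDL sketch of that layout (column names and types are placeholders):

    CREATE TABLE invoices (
        invoice_uuid  uuid PRIMARY KEY,
        sender        text NOT NULL,
        receiver      text NOT NULL,
        amount        numeric(12,2) NOT NULL,
        created_at    timestamptz NOT NULL DEFAULT now()
    );

    CREATE TABLE event_invoices (
        invoice_uuid  uuid PRIMARY KEY REFERENCES invoices,
        event_uuid    uuid NOT NULL   -- FK to the events table, omitted here
    );

    CREATE TABLE artist_subscription_invoices (
        invoice_uuid  uuid PRIMARY KEY REFERENCES invoices,
        artist_uuid   uuid NOT NULL   -- FK to the artists table, omitted here
    );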

There is one last interesting part. When an event is booked, two invoices
are generated -- one from the artist to the host for the payment of the
concert, and then a second one from my company to the artist for the
booking fee. Again, these seem like two separate tables, with, I suppose,
 a kind of a parent-child relationship (we can't have a booking fee unless
we have the original invoice for the booking).

Thanks for reading --any insight, comments, or questions are appreciated!

Rob


Re: [GENERAL] Graphical entity relation model

2016-10-04 Thread Robert Stone
Hello,

If you have JDK 1.8 or above installed, go to www.executequery.org and
download the latest jar file. Download the JDBC driver from Postgres and
set it up. It's open source.
It has an ERD generator but obviously depends on having all your foreign
keys declared in order to link tables, etc. After generating the ERD you
then have to re-organise it by dragging and dropping so that when you
print, the diagram is readable.
I use it all the time for testing, etc.

HTH,
Rob

On 1 October 2016 at 04:45,  wrote:

> > Does anybody know a Software for generating graphical entity relation
> models from existing postgresql databases?
> >
> > Best regards Johannes
>
> I use dbWrench (dbwrench.com). It's not free, but they do have a free
> trial version so you can see if you like it before you buy it. It's also
> not expensive compared to many of these sorts of tools. It also runs on all
> 3 major platforms (it's written in Java) and the developer is responsive if
> you find a problem.
>
> If money is no object, you can look at Power Designer (by Sybase). I used
> to use it years ago and liked it even if it was MS-Windows only, but the
> price has gone up so much only companies can really afford it now, IMO.
>
> HTH,
> Kevin
>
>
>


Re: [GENERAL] [HACKERS] pg_dumping extensions having sequences with 9.6beta3

2016-07-29 Thread Robert Haas
On Wed, Jul 27, 2016 at 2:24 AM, Michael Paquier
 wrote:
> On Wed, Jul 27, 2016 at 8:07 AM, Stephen Frost  wrote:
>> That'd be great.  It's definitely on my list of things to look into, but
>> I'm extremely busy this week.  I hope to look into it on Friday, would
>> be great to see what you find.
>
> Sequences that are directly defined in extensions do not get dumped,
> and sequences that are part of a serial column in an extension are
> getting dumped. Looking into this problem, getOwnedSeqs() is visibly
> doing an incorrect assumption: sequences owned by table columns are
> dumped unconditionally, but this is not true for sequences that are
> part of extensions. More precisely, dobj->dump is being enforced to
> DUMP_COMPONENT_ALL, which makes the sequence definition to be dumped.
> Oops.
>
> The patch attached fixes the problem for me. I have added as well
> tests in test_pg_dump in the shape of sequences defined in an
> extension, and sequences that are part of a serial column. This patch
> is also able to work in the case where a sequence is created as part
> of a serial column, and gets removed after, say with ALTER EXTENSION
> DROP SEQUENCE. The behavior for sequences and serial columns that are
> not part of extensions is unchanged.
>
> Stephen, it would be good if you could check the correctness of this
> patch as you did all this refactoring of pg_dump to support catalog
> ACLs. I am sure by the way that checking for (owning_tab->dobj.dump &&
> DUMP_COMPONENT_DEFINITION) != 0 is not good because of for example the
> case of a serial column created in an extension where the sequence is
> dropped from the extension afterwards.

Stephen, is this still on your list of things for today?  Please
provide a status update.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] PgQ and pg_dump

2016-06-21 Thread Robert Haas
On Thu, Jun 16, 2016 at 1:46 PM, Martín Marqués  wrote:
> The comment is accurate on what is going to be dumpable and what's not
> from the code. In our case, as the pgq schema is not dumpable because
> it comes from an extension, other objects it contains will not be
> dumpable either.
>
> That's the reason why the PgQ event tables created by
> pgq.create_queue() are not dumped.

That sucks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: R: [GENERAL] Vacuum full: alternatives?

2016-06-20 Thread Robert Wysocki
On Mon, 2016-06-20 at 11:43 +0200, Job wrote:
> Hi Andreas,
> 
> >I would suggest run only autovacuum, and with time you will see a not 
> >more growing table. There is no need for vacuum full.
> 
> So new record, when will be pg_bulkloaded, will replace "marked-free" 
> location?

Yes, but you may have to make the autovacuum process more aggressive. Even
then it might not cope with the frequency of your bulk I/U/D.

There are many tools to use instead of VACUUM FULL, though; have a look
at pg_reorg and pgcompact, for example. Do not be afraid to use an
awesome tool called Google as well ;-) (This is like the very basic
problem everyone asks about, so you'll find many more in-depth answers
and articles; the phrase you want to google for is "postgresql bloat".)

Cheers,
R.




Re: [GENERAL] [HACKERS] Change in order of criteria - reg

2016-06-03 Thread Robert Haas
On Wed, Jun 1, 2016 at 5:22 AM, Amit Langote
 wrote:
> On 2016/06/01 13:07, sri harsha wrote:
>> Hi,
>>
>> In PostgreSQL , does the order in which the criteria is given matter ??
>> For example
>>
>> Query 1 : Select * from TABLE where a > 5 and b < 10;
>>
>> Query 2 : Select * from TABLE where b <10 and a > 5;
>>
>> Are query 1 and query 2 the same in PostgreSQL or different ?? If its
>> different , WHY ??
>
> tl;dr they are the same.  As in they obviously produce the same result and
> result in invoking the same plan.
>
> Internally, optimizer will order application of those quals in resulting
> plan based on per-tuple cost of individual quals.  So a cheaper, more
> selective qual might result in short-circuiting of relatively expensive
> quals for a large number of rows in the table saving some cost in
> run-time.  Also, if index scan is chosen and quals pushed down, the
> underlying index method might know to order quals smartly.
>
> However, the cost-markings of operators/functions involved in quals better
> match reality.  By default, most operators/functions in a database are
> marked with cost of 1 unit.  Stable sorting used in ordering of quals
> would mean the order of applying quals in resulting plan matches the
> original order (ie, the order in which they appear in the query).  So, if
> the first specified qual really happens to be an expensive qual but marked
> as having the same cost as other less expensive quals, one would have to
> pay the price of evaluating it for all the rows.  Whereas, correctly
> marking the costs could have avoided that (as explained above).  Note that
> I am not suggesting that ordering quals in query by their perceived cost
> is the solution.  Keep optimizer informed by setting costs appropriately
> and it will do the right thing more often than not. :)

I think that if the costs are actually identical, the system will keep
the quals in the same order they were written - so then the order does
matter, a little bit.
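
A minimal sketch of the cost-marking idea from the quoted explanation, using a
hypothetical user-defined predicate:

    -- Stand-in for something genuinely expensive.
    CREATE FUNCTION slow_check(integer) RETURNS boolean
        LANGUAGE sql STABLE AS $$ SELECT $1 % 97 = 0 $$;

    -- Tell the planner this predicate costs ~1000x the default unit, so it is
    -- evaluated after cheaper quals when quals are ordered by per-tuple cost.
    ALTER FUNCTION slow_check(integer) COST 1000;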

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] Create index concurrently hanging with big table on pgsql 9.3.12

2016-05-09 Thread Robert Anderson
> You still haven't stated why you think it is blocked?

Output from iotop:

3990 be/4 postgres0.00 B/s0.00 B/s  0.00 %  0.00 % postgres:
postgres flip [local] CREATE INDEX

The process isn't reading or writing anything for many hours, but it's
using almost 90% of CPU.

> How long has it been taking?

backend_start| 2016-05-07 11:48:39.218398-03

More than 50 hours.

> What is your maintenance_work_mem set to?

maintenance_work_mem = 352MB



2016-05-09 14:34 GMT-03:00 Joshua D. Drake :

> On 05/09/2016 10:32 AM, Robert Anderson wrote:
>
>> Only one line returned:
>>
>> postgres=# select * from pg_stat_activity where pid=3990;
>> -[ RECORD 1 ]+
>> datid| 16434
>> datname  | flip
>> pid  | 3990
>> usesysid | 10
>> usename  | postgres
>> application_name | psql
>> client_addr  |
>> client_hostname  |
>> client_port  | -1
>> backend_start| 2016-05-07 11:48:39.218398-03
>> xact_start   | 2016-05-07 11:48:43.417734-03
>> query_start  | 2016-05-07 11:48:43.417734-03
>> state_change | 2016-05-07 11:48:43.417742-03
>> waiting  | f
>> state| active
>> query| CREATE INDEX CONCURRENTLY index_texto
>>   |   ON flip_pagina_edicao
>>   |   USING hash
>>   |   (texto COLLATE pg_catalog."default");
>>
>
> So it isn't blocked by a lock. You still haven't stated why you think it
> is blocked? How long has it been taking? What is your maintenance_work_mem
> set to?
>
>
> JD
>
>
>
> --
> Command Prompt, Inc.  http://the.postgres.company/
> +1-503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
> Everyone appreciates your honesty, until you are honest with them.
>


Re: [GENERAL] Create index concurrently hanging with big table on pgsql 9.3.12

2016-05-09 Thread Robert Anderson
Only one line returned:

postgres=# select * from pg_stat_activity where pid=3990;
-[ RECORD 1 ]+
datid| 16434
datname  | flip
pid  | 3990
usesysid | 10
usename  | postgres
application_name | psql
client_addr  |
client_hostname  |
client_port  | -1
backend_start| 2016-05-07 11:48:39.218398-03
xact_start   | 2016-05-07 11:48:43.417734-03
query_start  | 2016-05-07 11:48:43.417734-03
state_change | 2016-05-07 11:48:43.417742-03
waiting  | f
state| active
query| CREATE INDEX CONCURRENTLY index_texto
 |   ON flip_pagina_edicao
 |   USING hash
 |   (texto COLLATE pg_catalog."default");

postgres=#

2016-05-09 14:20 GMT-03:00 Tom Lane :

> Robert Anderson  writes:
> > There aren't transactions blocking:
>
> > postgres=# SELECT
> > postgres-#w.query as waiting_query,
> > postgres-#w.pid as waiting_pid,
> > postgres-#w.usename as w_user,
> > postgres-#l.pid as blocking_pid,
> > postgres-#l.usename as blocking_user,
> > postgres-#t.schemaname || '.' || t.relname as tablename
> > postgres-#FROM pg_stat_activity w
> > postgres-#JOIN pg_locks l1 ON (w.pid = l1.pid and not l1.granted)
> > postgres-#JOIN pg_locks l2 on (l1.relation = l2.relation and
> l2.granted)
> > postgres-#JOIN pg_stat_activity l ON (l2.pid = l.pid)
> > postgres-#JOIN pg_stat_user_tables t ON (l1.relation = t.relid)
> > postgres-#WHERE w.waiting;
> >  waiting_query | waiting_pid | w_user | blocking_pid | blocking_user |
> > tablename
> >
> ---+-++--+---+---
> > (0 rows)
>
> This test proves little, because that last JOIN will discard locks on
> non-table objects, and what CREATE INDEX CONCURRENTLY would be most
> likely to be blocked on is transaction VXIDs.  However, since
> pg_stat_activity claims that "waiting" is false, probably there isn't
> anything in pg_locks.  Still, it'd be better to do
> "select * from pg_stat_activity where pid = 3990" and be sure.
>
> regards, tom lane
>
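
Following up on the virtual-transaction-ID point, a sketch of checking directly
in pg_locks whether the index build (pid 3990 above) is waiting on anything:

    SELECT locktype, virtualxid, virtualtransaction, mode, granted
      FROM pg_locks
     WHERE pid = 3990
       AND NOT granted;

An empty result matches the "waiting = f" shown by pg_stat_activity.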


Re: [GENERAL] Create index concurrently hanging with big table on pgsql 9.3.12

2016-05-09 Thread Robert Anderson
Hi,

There aren't transactions blocking:

postgres=# SELECT
postgres-#w.query as waiting_query,
postgres-#w.pid as waiting_pid,
postgres-#w.usename as w_user,
postgres-#l.pid as blocking_pid,
postgres-#l.usename as blocking_user,
postgres-#t.schemaname || '.' || t.relname as tablename
postgres-#FROM pg_stat_activity w
postgres-#JOIN pg_locks l1 ON (w.pid = l1.pid and not l1.granted)
postgres-#JOIN pg_locks l2 on (l1.relation = l2.relation and l2.granted)
postgres-#JOIN pg_stat_activity l ON (l2.pid = l.pid)
postgres-#JOIN pg_stat_user_tables t ON (l1.relation = t.relid)
postgres-#WHERE w.waiting;
 waiting_query | waiting_pid | w_user | blocking_pid | blocking_user |
tablename
---+-++--+---+---
(0 rows)

How long I'm waiting:

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_activity where query like 'CREATE%';
-[ RECORD 1 ]+
datid| 16434
datname  | flip
pid  | 3990
usesysid | 10
usename  | postgres
application_name | psql
client_addr  |
client_hostname  |
client_port  | -1
backend_start| 2016-05-07 11:48:39.218398-03
xact_start   | 2016-05-07 11:48:43.417734-03
query_start  | 2016-05-07 11:48:43.417734-03
state_change | 2016-05-07 11:48:43.417742-03
waiting  | f
state| active
query| CREATE INDEX CONCURRENTLY index_texto
 |   ON flip_pagina_edicao
 |   USING hash
 |   (texto COLLATE pg_catalog."default");

postgres=#


In attachment follows a strace sample of the running process.

2016-05-09 13:25 GMT-03:00 Melvin Davidson :

> Try the following query. See if it shows you if another transaction is
> blocking the needed locks to create the index.
>
> SELECT
>w.query as waiting_query,
>w.pid as waiting_pid,
>w.usename as w_user,
>l.pid as blocking_pid,
>l.usename as blocking_user,
>t.schemaname || '.' || t.relname as tablename
>FROM pg_stat_activity w
>JOIN pg_locks l1 ON (w.pid = l1.pid and not l1.granted)
>JOIN pg_locks l2 on (l1.relation = l2.relation and l2.granted)
>JOIN pg_stat_activity l ON (l2.pid = l.pid)
>JOIN pg_stat_user_tables t ON (l1.relation = t.relid)
>WHERE w.waiting;
>
>
> On Mon, May 9, 2016 at 11:50 AM, Joshua D. Drake 
> wrote:
>
>> On 05/09/2016 05:04 AM, Robert Anderson wrote:
>>
>>> Hi,
>>>
>>> We are trying to create a index concurrently but, at least apparently,
>>> it hangs in a infinite loop and never ends.
>>>
>>
>> Apparently how?
>>
>> How long did you wait?
>>
>> JD
>>
>>
>> --
>> Command Prompt, Inc.  http://the.postgres.company/
>> +1-503-667-4564
>> PostgreSQL Centered full stack support, consulting and development.
>> Everyone appreciates your honesty, until you are honest with them.
>>
>>
>>
>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>


strace.txt.tar.gz
Description: GNU Zip compressed data



[GENERAL] Create index concurrently hanging with big table on pgsql 9.3.12

2016-05-09 Thread Robert Anderson
Hi,

We are trying to create an index concurrently but, at least apparently, it
hangs in an infinite loop and never ends.

Our version:

flip=# select version();
version

 PostgreSQL 9.3.12 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
2012
0313 (Red Hat 4.4.7-16), 64-bit
(1 row)


Index creation:

CREATE INDEX CONCURRENTLY index_texto
  ON flip_pagina_edicao
  USING hash
  (texto COLLATE pg_catalog."default");  -- texto is a text data type.

Size of the table:

flip=# select pg_size_pretty(pg_total_relation_size('flip_pagina_edicao'));
 pg_size_pretty

 956 GB
(1 row)


Process strace:


semop(622611, {{6, 1, 0}}, 1)   = 0
semop(655380, {{5, -1, 0}}, 1)  = 0
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
semop(655380, {{5, -1, 0}}, 1)  = 0
semop(622611, {{6, 1, 0}}, 1)   = 0
semop(655380, {{5, -1, 0}}, 1)  = 0
semop(622611, {{6, 1, 0}}, 1)   = 0
semop(622611, {{6, 1, 0}}, 1)   = 0
semop(622611, {{6, 1, 0}}, 1)   = 0
semop(655380, {{5, -1, 0}}, 1)  = 0
semop(622611, {{6, 1, 0}}, 1)   = 0
semop(655380, {{5, -1, 0}}, 1)  = 0
semop(622611, {{6, 1, 0}}, 1)   = 0
semop(557073, {{2, 1, 0}}, 1)   = 0
semop(655380, {{5, -1, 0}}, 1)  = 0
semop(655380, {{5, -1, 0}}, 1)  = 0
semop(655380, {{5, -1, 0}}, 1)  = 0
semop(655380, {{5, -1, 0}}, 1)  = 0
...


Thanks in advance.

Robert


Re: [GENERAL] Request - repeat value of \pset title during \watch interations

2016-03-21 Thread Robert Haas
On Mon, Mar 21, 2016 at 2:09 PM, David G. Johnston
 wrote:
> On Monday, March 21, 2016, Tom Lane  wrote:
>> "David G. Johnston"  writes:
>> > I'd rather not omit sleep but removing "Watch every" is fine (preferred
>> > actually), so:
>> > Title Is Here Mon Mar 21 15:05:06 2016 (5s)
>>
>> Meh ... seems a bit awkward to me.  Couldn't you include " (5s)" in the
>> title, if you want that info?  If it's variable, you could still
>> accommodate that:
>
> Actually, only if it's a variable that you setup and repeat and you show.  A
> bit cumbersome and mixes the parts that are title and those that are present
> only because you are watching.

Ah, come on.  This doesn't really seem like an issue we should spend
more time quibbling about.  I think Tom's version is fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] Request - repeat value of \pset title during \watch interations

2016-03-21 Thread Robert Haas
On Mon, Mar 21, 2016 at 11:17 AM, David G. Johnston
 wrote:
>> And does everybody agree that this is a desirable change?
>
> Adding the title is desirable.  While I'm inclined to bike-shed this
> anything that gets it in I can live with and so I'm content letting the
> author/committer decide where exactly things (including whitespace) appear.
>
> It is a bit odd that the "Watch every %s" gets centered if the result is
> wide but that the title remains left-aligned.

Well, the title isn't normally centered, but yeah, that is odd.
Come to think of it, I think I might have expected the
title to appear *above* "Watch every %s", not below it.  That might
decrease the oddness.

As for letting the committer decide, I don't care about this
personally at all, so I'm only looking at it to be nice to the people
who do.  Whatever is the consensus is OK with me.  I just don't want
to get yelled at later for committing something here, so it would be
nice to see a few votes for whatever we're gonna do here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] Request - repeat value of \pset title during \watch interations

2016-03-21 Thread Robert Haas
On Sun, Mar 20, 2016 at 9:31 AM, Michael Paquier
 wrote:
> And the patch attached gives the following output:
> With title:
> =# \watch 1
> Watch every 1sSun Mar 20 22:28:38 2016
> popo
>  a
> ---
>  1
> (1 row)
>
> And without title:
> Watch every 1sSun Mar 20 22:29:31 2016
>
>  a
> ---
>  1
> (1 row)

And does everybody agree that this is a desirable change?

As for the patch itself, you could replace all this:

+   /*
+* Take into account any title present in the user setup as a part of
+* what is printed for each iteration by using it as a header.
+*/
+   if (myopt.title)
+   {
+   title_len = strlen(myopt.title);
+   title = pg_malloc(title_len + 50);
+   head_title = pg_strdup(myopt.title);
+   }
+   else
+   {
+   title_len = 0;
+   title = pg_malloc(50);
+   head_title = pg_strdup("");
+   }

...with:

head_title = pg_strdup(myopt.title != NULL ? myopt.title : "");
title_len = strlen(head_title);
title = pg_malloc(title_len + 50);

Better yet, include the + 50 in title_len, and then you don't need to
reference the number 50 again further down.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] Request - repeat value of \pset title during \watch interations

2016-03-19 Thread Robert Haas
On Thu, Mar 10, 2016 at 1:40 AM, David G. Johnston
 wrote:
> Adding -hackers for consideration in the Commitfest.

I don't much like how this patch uses the arbitrary constant 50 in no
fewer than 5 locations.

Also, it seems like we could arrange for head_title to be "" rather
than NULL when myopt.title is NULL.  Then instead of this:

+if (head_title)
+snprintf(title, strlen(myopt.title) + 50,
+ _("Watch every %lds\t%s\n%s"),
+ sleep, asctime(localtime(&timer)), head_title);
+else
+snprintf(title, 50, _("Watch every %lds\t%s"),
+ sleep, asctime(localtime(&timer)));

...we could just the first branch of that if all the time.

 if (res == -1)
+{
+pg_free(title);
+pg_free(head_title);
 return false;
+}

Instead of repeating the cleanup code, how about making this break;
then, change the return statement at the bottom of the function to
return (res != -1).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] vacuum - reclaiming disk space.

2016-03-19 Thread Robert McAlpine
Just to throw some extreme ideas out there, you could stand up a postgres
on some other server, pg_dump your current database and use that dump to
build up your second postgres. Use that new postgres when your system goes
live again after downtime. Restoring from a dump means your database would
not take up as much space since I assume your issue is that all that space
was allocated to postgres for the purposes of your large number of table
updates.
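
A sketch of that approach, with placeholder host and database names:

    # Dump from the current server in custom format...
    pg_dump -Fc -h old-db -U postgres mydb -f mydb.dump

    # ...and rebuild it, bloat-free, on the second instance.
    createdb -h new-db -U postgres mydb
    pg_restore -h new-db -U postgres -d mydb -j 4 mydb.dump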

On Thu, Mar 17, 2016 at 11:34 AM, Melvin Davidson 
wrote:

>
>
> On Thu, Mar 17, 2016 at 10:57 AM, bricklen  wrote:
>
>> On Thu, Mar 17, 2016 at 7:27 AM, Mike Blackwell 
>> wrote:
>>
>>> I have a large table with numerous indexes which has approximately
>>> doubled in size after adding a column - every row was rewritten and 50% of
>>> the tuples are dead.  I'd like to reclaim this space, but VACUUM FULL
>>> cannot seem to finish within the scheduled downtime.
>>>
>>> Any suggestions for reclaiming the space without excessive downtime?
>>>
>>
>> pg_repack is a good tool for removing bloat.
>> https://github.com/reorg/pg_repack
>>
>>
> "I have a large table with numerous indexes :
> My first thought is, "DEFINE NUMEROUS". How many indexes do you actually
> have? How many of those indexes are actually used? In addition to VACUUMing
> the table, it also needs to go through every index you have.
> So find out if you have any unneeded indexes with:
>
> SELECT n.nspname as schema,
>i.relname as table,
>i.indexrelname as index,
>i.idx_scan,
>i.idx_tup_read,
>i.idx_tup_fetch,
>pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
> quote_ident(i.relname))) AS table_size,
>pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
> quote_ident(i.indexrelname))) AS index_size,
>pg_get_indexdef(idx.indexrelid) as idx_definition
>   FROM pg_stat_all_indexes i
>   JOIN pg_class c ON (c.oid = i.relid)
>   JOIN pg_namespace n ON (n.oid = c.relnamespace)
>   JOIN pg_index idx ON (idx.indexrelid =  i.indexrelid )
>  WHERE i.idx_scan = 0
>AND NOT idx.indisprimary
>    AND NOT idx.indisunique
>  ORDER BY 1, 2, 3;
>
> Then drop any index that shows up!
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>



-- 
Robert McAlpine
DevOps Engineer
Perfecta Federal <http://www.perfectafederal.com/>
6506 Loisdale Road
Springfield, VA 22150
O: 202.888.4949 ext 1005
C: 757.620.3503
r...@pfcta.com


Re: [GENERAL] Cannot create role, no default superuser role exists

2016-03-10 Thread Robert McAlpine
Thank you for the quick reply.

I very recently had some success by doing a complete purge of
postgresql-9.5, reinstalling and then copying in the original data
directory (or I guess I could have just pointed to it, but w/e). I did not
expect this to work, as I thought it would just point to the table holding
the roles

But to answer your questions anyways:


Did you install postgres-xc over your existing instance?

I think that is probably what happened. The reason I installed it is that I
tried to run the command 'pg_ctl' on my box and couldn't find it. I had
not used pg_ctl before, so I assumed it was a utility package, and
installed it via postgres-xc.


Which data directory, the postgres-xc one or the original Postgres one?

The original.


Installed from a package or source?

From a package, specifically apt-get install postgresql-9.5


The dump was from a pre-xc version of Postgres?

No, my postgres version has not changed recently


before or after the reinstall?

After. I wanted to see if apt-get remove and then apt-get install would be
enough of a jolt to fix the issue, as I didn't yet want to try apt-get
purge, which would remove all my data.





On Thu, Mar 10, 2016 at 7:44 PM, Adrian Klaver 
wrote:

> On 03/10/2016 04:11 PM, Robert McAlpine wrote:
>
>>
>> Postgresql 9.5, Ubuntu 14.04.
>>
>> I broke my ability to access postgres after attemping to install
>> postgres-xc (ironic, since I installed that to use pg_ctl to reload my
>> pg_hba.conf without restarting the entirety of postgres).
>>
>
> That is available with the stock Postgres, unless I am missing something:
>
> http://www.postgresql.org/docs/9.5/interactive/auth-pg-hba-conf.html
> "The pg_hba.conf file is read on start-up and when the main server process
> receives a SIGHUP signal. If you edit the file on an active system, you
> will need to signal the postmaster (using pg_ctl reload or kill -HUP) to
> make it re-read the file."
>
>
> Did you install postgres-xc over your existing instance?
>
>
>> After doing so I can no longer access my databases via psql or any other
>> utilities, getting the error:
>>
>> FATAL:  role "postgres" does not exist
>>
>> Unfortunately the same error gets thrown if I attempt to use 'createuser'.
>>
>> The postgres user and role were the only ones able to access postgresql
>> server, so altering the permissions in my pg_hba.conf also got me nowhere.
>>
>> I made a complete copy of the data directory, took a snapshot (it is on
>>
>
> Which data directory, the postgres-xc one or the original Postgres one?
>
> a vm), then purged postgresql-9.5 and reinstalled, restoring to a dump
>> that is a few days old. I thought about copying all the files from the
>>
>
> Installed from a package or source?
>
> The dump was from a pre-xc version of Postgres?
>
> It would seem to me if you reinstalled in default manner you would have a
> postgres user available. So where did you get:
>
> FATAL:  role "postgres" does not exist
>
> before or after the reinstall?
>
> data directory (except for the roles table file) into a fresh install
>> with a correct default postgres role, but my gut tells me that screwing
>> around with those files is doomed to fail.
>>
>
> Yeah, I would hold off doing that until it is clearer what is going on.
>
>
>> I would appreciate any help or thoughts on how to recover access to the
>> data.
>> --
>> Robert McAlpine
>> r...@pfcta.com <mailto:r...@pfcta.com>
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



-- 
Robert McAlpine
DevOps Engineer
Perfecta Federal <http://www.perfectafederal.com/>
6506 Loisdale Road
Springfield, VA 22150
O: 202.888.4949 ext 1005
C: 757.620.3503
r...@pfcta.com


[GENERAL] Cannot create role, no default superuser role exists

2016-03-10 Thread Robert McAlpine
Postgresql 9.5, Ubuntu 14.04.

I broke my ability to access postgres after attempting to install
postgres-xc (ironic, since I installed that to use pg_ctl to reload my
pg_hba.conf without restarting the entirety of postgres).

After doing so I can no longer access my databases via psql or any other
utilities, getting the error:

FATAL:  role "postgres" does not exist

Unfortunately the same error gets thrown if I attempt to use 'createuser'.

The postgres user and role were the only ones able to access postgresql
server, so altering the permissions in my pg_hba.conf also got me nowhere.

I made a complete copy of the data directory, took a snapshot (it is on a
vm), then purged postgresql-9.5 and reinstalled, restoring to a dump that
is a few days old. I thought about copying all the files from the data
directory (except for the roles table file) into a fresh install with a
correct default postgres role, but my gut tells me that screwing around
with those files is doomed to fail.

I would appreciate any help or thoughts on how to recover access to the
data.
-- 
Robert McAlpine
r...@pfcta.com


[GENERAL] ts_rank and ts_rank_cd with multiple search terms

2015-07-25 Thread Robert Nikander
Hi,

I’m reading about the ranking functions [1], and I have a couple questions… 

1. Is ts_rank taking proximity of terms into account? It seems like it is, but 
the docs suggest that only ts_rank_cd does that.
2. Is there a way to search multiple terms like ‘a | b | c …’ but score higher 
when multiple match, AND take into account distance between words? It doesn’t 
seem like basic use of ts_rank or ts_rank_cd is doing this.  Do you recommend a 
custom ranking function here?

For example, I want to search for “black bear” and get better results ordered 
so that documents with both words close together score highest, and the 
document with only “bear" is the last.

create table search_test ( title text, body text, vec tsvector );
— These 3 have “black” and “bear” at different distances from each other
insert into search_test values ('close', 'The black bear sat on a brown 
rock and ate a barrel of red berries.');
insert into search_test values ('medium', 'The brown bear sat on a black 
rock and ate a barrel of red berries.');
insert into search_test values ('far', 'The brown bear sat on a red rock 
and ate a barrel of black berries.');
— This one has the word “bear”, but not “black"
insert into search_test values ('only bear', 'The brown bear sat on a red 
rock and ate a barrel of orange berries.');
update search_test set vec = to_tsvector(body);

Now a query:

select title, ts_rank(vec, q) as rank
from search_test, to_tsquery('black & bear') q
where vec @@ q order by rank desc;

That surprises me by scoring close > medium > far. Hence, my question #1.  
Substituting ts_rank_cd also works, as expected.

If I change the query to `black | bear`, to try to match “only bear” as well, 
then both ts_rank and ts_rank_cd return equal rankings for “close”, “medium” 
and “far”.

Any recommendations? 

thanks,
Rob 



[1] 
http://www.postgresql.org/docs/9.4/static/textsearch-controls.html#TEXTSEARCH-RANKING



Re: [GENERAL] Bounded Zone Offset Query

2015-07-10 Thread Robert DiFalco
On Fri, Jul 10, 2015 at 9:40 AM, John McKown 
wrote:

> On Fri, Jul 10, 2015 at 11:15 AM, Robert DiFalco  > wrote:
>
>> I have a table something like this:
>>
>> CREATE TABLE devices (
>>   owner_idBIGINT NOT NULL,
>>   utc_offset_secs INT,
>>   PRIMARY KEY (uid, platform),
>>   FOREIGN KEY (owner_id) REFERENCES users(id) ON DELETE CASCADE
>> );
>>
>>
>> I want to do a query from an application that returns all devices who's
>> time is between 10am or 10pm for a given instant in time.
>>
>> For example:
>>
>> SELECT *
>> FROM devices
>> WHERE :utcSecondsOfDay + utc_offset_secs BETWEEEN 10am AND 10pm
>>
>>
>>
>> In the above query assume the correct "seconds of day" values for 10am
>> and 10pm. The problem is that I have to do addition on each record to do
>> the above query and I can't imagine that would be efficient. Also I think
>> it this example query will only work in some cases. For example what if the
>> utcSecondsOfDay is 360 (i.e. 1am) and the utc_offset_secs is -5 hours?
>>
>> Thanks
>>
>
> I'm not sure exactly what :utSecondsOfDay really is. I guess it is an
> integer which is a "time" value, such as "seconds after midnight" and thus
> would range from 0 to 24*60*60=86400 (actually 86399, I guess). In this
> notation, 10 am would be 10*60*60 or 36000 and 10pm would be 22*60*60 or
> 79200. How about calculating, in your application code, two different
> values: utcSecondsLower and utSecondsHigher. utcSecondsLower would be
> 36000-utcSecondsOfDay. utcSecondsHigher would be 79200-utSecondsOfDay.
> Change the SELECT to be:
>
> SELECT *
> FROM devices
> WHERE utc_offset_secs BETWEEN :utcSecondsLower AND :utcSecondsHigher;
>
> I am not sure, but I think that is legal. Or maybe it gives you another
> approach.
>
>
> --
>
> Schrodinger's backup: The condition of any backup is unknown until a
> restore is attempted.
>
> Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
>
> He's about as useful as a wax frying pan.
>
> 10 to the 12th power microphones = 1 Megaphone
>
> Maranatha! <><
> John McKown
>


Thanks John, let me revise my original query to give a CORRECT and working
example, maybe this will help. I've created a query that actually works,
it's just ugly and I'd like to figure out how to make it like the example
you gave (i.e. no math on the utc_offset field, just comparisons).

 select *
 from devices d
 where (now() at time zone 'UTC' + make_interval(hours :=
d.utc_offset))::time
BETWEEN time '10:00' AND time '22:00';


[GENERAL] Bounded Zone Offset Query

2015-07-10 Thread Robert DiFalco
I have a table something like this:

CREATE TABLE devices (
  owner_idBIGINT NOT NULL,
  utc_offset_secs INT,
  PRIMARY KEY (uid, platform),
  FOREIGN KEY (owner_id) REFERENCES users(id) ON DELETE CASCADE
);


I want to do a query from an application that returns all devices who's
time is between 10am or 10pm for a given instant in time.

For example:

SELECT *
FROM devices
WHERE :utcSecondsOfDay + utc_offset_secs BETWEEN 10am AND 10pm



In the above query assume the correct "seconds of day" values for 10am and
10pm. The problem is that I have to do addition on each record to do the
above query and I can't imagine that would be efficient. Also I think this
example query will only work in some cases. For example, what if the
utcSecondsOfDay is 3600 (i.e. 1am) and the utc_offset_secs is -5 hours?

Thanks


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Robert DiFalco
>
> I am fairly certain this does not give you the correct results.
> Specifically, the minimum value for each cDate is going to be 1 since
> count(*) counts NULLs.  count(u) should probably work.
>

Yes, you are right, I forgot to change COUNT(*) to COUNT(id); as you
mention, COUNT(u.*) will also work. I just couldn't get the idea of
generating a sequence from 0 to 6 to work correctly. The approach I'm using
seems to give the correct results (with COUNT(u.id)).


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Robert DiFalco
Paul, I'm sure I'm missing something but it seems like your approach will
not work. It's because the LEFT OUTER JOIN is on the numeric day of the
week. So if you had this query going over weeks or months of data wouldn't
you have the same issue with the days that had no new users not being
factored into the AVG?  I ended up doing something like this, which seems
to work pretty well.

WITH usersByDay AS (
SELECT cDate, COUNT(*) AS total
FROM (
SELECT generate_series(
{CALENDAR_INTERVAL.START}::DATE,
{CALENDAR_INTERVAL.END}::DATE,
interval '1 day')::DATE AS cDate
) AS c
LEFT OUTER JOIN users u ON u.created::DATE = c.cDate
GROUP BY cDate),
avgUsersByDOW AS (
SELECT extract('dow' FROM cDate) AS nDay,
 to_char(cDate,'Dy') AS "Day",
ROUND(AVG(total), 2) AS "New Users"
FROM usersByDay
GROUP BY 1, 2
ORDER BY 1)
SELECT "Day", "New Users" FROM avgUsersByDOW ORDER BY nDay




On Mon, Jul 6, 2015 at 11:30 AM, Paul Jungwirth  wrote:

>> Thanks Paul, I guess I'm not sure how a generate_series between 0 to 6
>> would solve this problem. Wouldn't I have to generate a series based on
>> the date range (by day) and then group by DOW _after_ that? Can you give
>> me an example of how I'd do it with a series based on 0 to 6?
>>
>
> Looks like David Johnston beat me to it! :-) But this is what I had in
> mind:
>
> SELECT  s.d AS dow,
> COUNT(u.id) c
> FROMgenerate_series(0, 6) s(d)
> LEFT OUTER JOIN users u
> ON  EXTRACT(dow FROM created) = s.d
> GROUP BY dow
> ORDER BY dow
> ;
>
> You can also get human-readable DOW names by creating a 7-row CTE table
> and joining to it based on the numeric dow.
>
> Paul
>
>


Re: [GENERAL] Average New Users Per DOW

2015-07-06 Thread Robert DiFalco
Thanks Paul, I guess I'm not sure how a generate_series between 0 to 6
would solve this problem. Wouldn't I have to generate a series based on the
date range (by day) and then group by DOW _after_ that? Can you give me an
example of how I'd do it with a series based on 0 to 6?

On Mon, Jul 6, 2015 at 10:58 AM, Paul Jungwirth  wrote:

> > I'm not sure how to create a result where I get the average number of
> > new users per day of the week. My issues are that days that did not
> > have any new users will not be factored into the average
>
> This is a pretty common problem with time-series queries when there is
> sparse data. My go-to solution is to use generate_series---in your case
> from 0 to 6---then do a left join from there to your actual data.
>
> Paul
>
>
>
>
>


[GENERAL] Average New Users Per DOW

2015-07-06 Thread Robert DiFalco
I'm not sure how to create a result where I get the average number of new
users per day of the week. My issue is that days that did not have any
new users will not be factored into the average, giving an overinflated
result.

This is what I started with:

WITH userdays AS (
    SELECT u.created::DATE AS created,
           to_char(u.created, 'Dy') AS d,
           COUNT(*) AS total
    FROM users u
    GROUP BY 1, 2),
userdays_avg AS (
    SELECT extract('dow' FROM created) AS nDay,
           d AS "Day",
           AVG(total) AS "New Users"
    FROM userdays
    GROUP BY 1, 2
    ORDER BY 1)
SELECT "Day", "New Users"
FROM userdays_avg
ORDER BY nDay;


But you can see it won't give correct results since (for example) Mondays
with no new users will not be counted in the average as 0.

TIA

R.


[GENERAL] use null or 0 in foreign key column, to mean "no value"?

2015-06-26 Thread Robert Nikander
Hi,

(Maybe my subject line should be: `is not distinct from` and indexes.)

In Postgres 9.4, I’ve got a table of ‘items’ that references a table ‘colors’.  
Not all items have colors, so I created a nullable column in items like:

  color_id bigint references colors  

There is also an index on color_id:

  create index on items (color_id);

I thought this was the right way to do it, but now I’m not so sure... In
application code, prepared statements want to say `select * from items where
color_id = ?`, and that `?` might be an int or null, so that doesn’t work.  I
used `is not distinct from` instead of `=`, which has the right meaning, but now
I notice it doesn’t use the index for queries that replace `=` with `is not
distinct from`, and queries run much slower.  Using `explain` confirms it: it’s
doing sequential scans where `=` was using the index.

So… is it bad DB design to use null to mean that an item has no color? Should
I instead put a special row in `colors`, maybe with id = 0, to represent the
“no color” value?  Or is there some way to make an index work with nulls and
`is not distinct from`?

thank you,
Rob
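
Two index-friendly workarounds, sketched against the schema above (neither
needs a sentinel row in colors). A btree index in PostgreSQL does store NULLs,
so `color_id IS NULL` can use the same index; the problem is only that
`is not distinct from` is not treated as an indexable condition.

-- Variant 1: branch in application code and send one of two indexable predicates.
SELECT * FROM items WHERE color_id = $1;      -- a color id was supplied
SELECT * FROM items WHERE color_id IS NULL;   -- "no color" was meant

-- Variant 2: one prepared statement; each arm stays indexable, and at most
-- one arm can return rows for any given parameter value.
SELECT * FROM items WHERE color_id = $1
UNION ALL
SELECT * FROM items WHERE $1 IS NULL AND color_id IS NULL;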







Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-15 Thread Robert Haas
On Fri, Jun 12, 2015 at 7:27 PM, Steve Kehlet  wrote:
> Just wanted to report that I rolled back my VM to where it was with 9.4.2
> installed and it wouldn't start. I installed 9.4.4 and now it starts up just
> fine:
>
>> 2015-06-12 16:05:58 PDT [6453]: [1-1] LOG:  database system was shut down
>> at 2015-05-27 13:12:55 PDT
>> 2015-06-12 16:05:58 PDT [6453]: [2-1] LOG:  MultiXact member wraparound
>> protections are disabled because oldest checkpointed MultiXact 1 does not
>> exist on disk
>> 2015-06-12 16:05:58 PDT [6457]: [1-1] LOG:  autovacuum launcher started
>> 2015-06-12 16:05:58 PDT [6452]: [1-1] LOG:  database system is ready to
>> accept connections
>>  done
>> server started
>
> And this is showing up in my serverlog periodically as the emergency
> autovacuums are running:
>
>> 2015-06-12 16:13:44 PDT [6454]: [1-1] LOG:  MultiXact member wraparound
>> protections are disabled because oldest checkpointed MultiXact 1 does not
>> exist on disk
>
> **Thank you Robert and all involved for the resolution to this.**
>
>> With the fixes introduced in this release, such a situation will result in
>> immediate emergency autovacuuming until a correct oldestMultiXid value can
>> be determined
>
> Okay, I notice these vacuums are of the "to prevent wraparound" type (like
> VACUUM FREEZE), that do hold locks preventing ALTER TABLEs and such. Good to
> know, we'll plan our software updates accordingly.
>
> Is there any risk until these autovacuums finish?

As long as you see only a modest number of files in
pg_multixact/members, you're OK.  But in theory, until that emergency
autovacuuming finishes, there's nothing keeping that directory from
wrapping around.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
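
One hedged way to keep an eye on that directory from SQL while the emergency
autovacuums run (pg_ls_dir is superuser-only; each file in pg_multixact/members
is one 256 kB SLRU segment):

SELECT count(*) AS member_segments
FROM pg_ls_dir('pg_multixact/members') AS f;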




[GENERAL] SET LOCAL synchronous_commit TO OFF

2015-06-11 Thread Robert DiFalco
I want to make sure I understand the repercussions of this before making it
a global setting.

As far as I can tell this will not put data/referential integrity at risk. It
only means that there is a period of time (maybe 600 msecs) between when a
commit occurs and when that data is safe in the case of a server crash.

Is there anything else I'm missing?
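
For what it's worth, a sketch of scoping the setting to individual transactions
instead of changing it globally (table name hypothetical). Turning
synchronous_commit off cannot leave the database inconsistent or break
referential integrity; a crash can only lose the last few hundred milliseconds
of acknowledged commits, which is the window described above.

BEGIN;
SET LOCAL synchronous_commit TO OFF;   -- applies to this transaction only
INSERT INTO activity_log (msg) VALUES ('low-value event');
COMMIT;   -- returns before the WAL record is flushed to disk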


Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Robert Haas
On Mon, Jun 8, 2015 at 1:23 PM, Alvaro Herrera  wrote:
> Andres Freund wrote:
>> On June 8, 2015 7:06:31 PM GMT+02:00, Alvaro Herrera 
>>  wrote:
>> >I might be misreading the code, but PMSIGNAL_START_AUTOVAC_LAUNCHER
>> >only causes things to happen (i.e. a new worker to be started) when
>> >autovacuum is disabled.  If autovacuum is enabled, postmaster
>> >receives the signal and doesn't do anything about it, because the
>> >launcher is already running.  Of course, regularly scheduled autovac
>> >workers will eventually start running, but perhaps this is not good
>> >enough.
>>
>> Well that's just the same for the plain xid precedent? I'd not mind
>> improving further, but that seems like a separate thing.
>
> Sure.  I just concern that we might be putting excessive trust on
> emergency workers being launched at a high pace.  With normally
> configured systems (naptime=1min) it shouldn't be a problem, but we have
> seen systems with naptime set to one hour or so, and those might feel
> some pain; and it would get worse the more databases you have, because
> people might feel the need to space the autovac runs even more.
>
> (My personal alarm bells go off when I see autovac_naptime=15min or
> more, but apparently not everybody sees things that way.)

Uh, I'd echo that sentiment if you did s/15min/1min/

I think Andres's patch is just improving the existing mechanism so
that it's reliable, and you're proposing something notably different
which might be better, but which is really a different proposal
altogether.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Robert Haas
On Fri, Jun 5, 2015 at 4:40 PM, Andres Freund  wrote:
>>I think we would be foolish to rush that part into the tree.  We
>>probably got here in the first place by rushing the last round of
>>fixes too much; let's try not to double down on that mistake.
>
> My problem with that approach is that I think the code has gotten 
> significantly more complex in the last few weeks. I have very little trust 
> that the interactions between vacuum, the deferred truncations in the 
> checkpointer, the state management in shared memory and recovery are correct. 
> There's just too many non-local subtleties here.
>
> I don't know what the right thing to do here is.

That may be true, but we don't need to get to perfect to be better
than 9.4.2 and 9.4.3, where some people can't start the database.

I will grant you that, if the patch I committed today introduces some
regression that is even worse, life will suck.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Robert Haas
On Fri, Jun 5, 2015 at 2:36 PM, Alvaro Herrera  wrote:
> Tom Lane wrote:
>> Robert Haas  writes:
>
>> > There are at least two other known issues that seem like they should
>> > be fixed before we release:
>>
>> > 1. The problem that we might truncate an SLRU members page away when
>> > it's in the buffers, but not drop it from the buffers, leading to a
>> > failure when we try to write it later.
>>
>> > 2. Thomas's bug fix for another longstanding bug that occurs when you
>> > run his checkpoint-segment-boundary.sh script.
>>
>> > I think we might want to try to fix one or both of those before
>> > cutting a new release.  I'm less sold on the idea of installing
>> > WAL-logging in this minor release.  That probably needs to be done,
>> > but right now we've got stuff that worked in early 9.3.X release and
>> > is now broken, and I'm in favor of fixing that first.
>>
>> Okay, but if we're not committing today to a release wrap on Monday,
>> I don't see it happening till after PGCon.
>
> In that case, I think we should get a release out next week.  The
> current situation is rather badly broken and dangerous, and the above
> two bugs are nowhere as problematic.  If we can get fixes for these over
> the weekend, that would be additional bonus.

Yeah, I think I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Robert Haas
On Fri, Jun 5, 2015 at 2:47 PM, Andres Freund  wrote:
> On 2015-06-05 14:33:12 -0400, Tom Lane wrote:
>> Robert Haas  writes:
>> > 1. The problem that we might truncate an SLRU members page away when
>> > it's in the buffers, but not drop it from the buffers, leading to a
>> > failure when we try to write it later.
>
> I've got a fix for this, and about three other issues I found during
> development of the new truncation codepath.
>
> I'll commit the fix tomorrow.

OK.  Then I think we should release next week, so we get the fixes we
have out before PGCon.  The current situation is not good.

>> > I think we might want to try to fix one or both of those before
>> > cutting a new release.  I'm less sold on the idea of installing
>> > WAL-logging in this minor release.  That probably needs to be done,
>> > but right now we've got stuff that worked in early 9.3.X release and
>> > is now broken, and I'm in favor of fixing that first.
>
> I've implemented this, and so far it removes more code than it
> adds. It's imo also a pretty clear win in how understandable the code
> is.  The remaining work, besides testing, is primarily going over lots
> of comments and updating them. Some of them are outdated by the patch,
> and some already were.
>
> Will post tonight, together with the other fixes, after I get back from
> climbing.
>
> My gut feeling right now is that it's a significant improvement, and
> that it'll be reasonable to include it. But I'd definitely like some
> independent testing for it, and I'm not sure if that's doable in time
> for the wrap.

I think we would be foolish to rush that part into the tree.  We
probably got here in the first place by rushing the last round of
fixes too much; let's try not to double down on that mistake.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Robert Haas
On Fri, Jun 5, 2015 at 12:00 PM, Andres Freund  wrote:
> On 2015-06-05 11:43:45 -0400, Tom Lane wrote:
>> Robert Haas  writes:
>> > On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch  wrote:
>> >> I read through this version and found nothing to change.  I encourage 
>> >> other
>> >> hackers to study the patch, though.  The surrounding code is challenging.
>>
>> > Andres tested this and discovered that my changes to
>> > find_multixact_start() were far more creative than intended.
>> > Committed and back-patched with a trivial fix for that stupidity and a
>> > novel-length explanation of the changes.
>>
>> So where are we on this?  Are we ready to schedule a new set of
>> back-branch releases?  If not, what issues remain to be looked at?
>
> We're currently still doing bad things while the database is in an
> inconsistent state (i.e. read from SLRUs and truncate based on the
> results of that). It's quite easy to reproduce base backup startup
> failures.
>
> On the other hand, that's not new. And the fix requires, afaics, a new
> type of WAL record (issued very infrequently). I'll post a first version
> of the patch, rebased ontop of Robert's commit, tonight or tomorrow. I
> guess we can then decide what we'd like to do.

There are at least two other known issues that seem like they should
be fixed before we release:

1. The problem that we might truncate an SLRU members page away when
it's in the buffers, but not drop it from the buffers, leading to a
failure when we try to write it later.

2. Thomas's bug fix for another longstanding bug that occurs when you
run his checkpoint-segment-boundary.sh script.

I think we might want to try to fix one or both of those before
cutting a new release.  I'm less sold on the idea of installing
WAL-logging in this minor release.  That probably needs to be done,
but right now we've got stuff that worked in early 9.3.X release and
is now broken, and I'm in favor of fixing that first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Robert Haas
On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch  wrote:
> On Thu, Jun 04, 2015 at 05:29:51PM -0400, Robert Haas wrote:
>> Here's a new version with some more fixes and improvements:
>
> I read through this version and found nothing to change.  I encourage other
> hackers to study the patch, though.  The surrounding code is challenging.

Andres tested this and discovered that my changes to
find_multixact_start() were far more creative than intended.
Committed and back-patched with a trivial fix for that stupidity and a
novel-length explanation of the changes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-04 Thread Robert Haas
On Thu, Jun 4, 2015 at 5:29 PM, Robert Haas  wrote:
> - Forces aggressive autovacuuming when the control file's
> oldestMultiXid doesn't point to a valid MultiXact and enables member
> wraparound at the next checkpoint following the correction of that
> problem.

Err, enables member wraparound *protection* at the next checkpoint,
not the wraparound itself.

It's worth noting that every startup will now include one of the
following two messages:

LOG:  MultiXact member wraparound protections are now enabled

Or:

LOG:  MultiXact member wraparound protections are disabled because
oldest checkpointed MultiXact %u does not exist on disk
...where %u is probably 1

If you get the second one, you'll get the first one later after vacuum
has done its thing and a checkpoint has happened.

This is, obviously, some log chatter for people who don't have a
problem and never have, but I think it's worth emitting the first
message at startup even when there's no problem, so that people don't
have to make inferences from the absence of a message.  We can tell
people very simply that (1) if they see the first message, everything
is fine; (2) if they see the second message, autovacuum is going to
clean things up and they will eventually see the first message; and
(3) if they see neither message, they haven't upgraded to a fixed
version yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-04 Thread Robert Haas
On Thu, Jun 4, 2015 at 12:57 PM, Robert Haas  wrote:
> On Thu, Jun 4, 2015 at 9:42 AM, Robert Haas  wrote:
>> Thanks for the review.
>
> Here's a new version.  I've fixed the things Alvaro and Noah noted,
> and some compiler warnings about set but unused variables.
>
> I also tested it, and it doesn't quite work as hoped.  If started on a
> cluster where oldestMultiXid is incorrectly set to 1, it starts up and
> indicates that the member wraparound guards are disabled.  But even
> after everything is fixed, they don't get enabled until after the next
> full restart.  I think that's because TruncateMultiXact() bails out
> too early, without calling DetermineSafeOldestOffset.
>
> My attempt at a quick fix for that problem didn't work out, so I'm
> posting this version for now to facilitate further review and testing.

Here's a new version with some more fixes and improvements:

- SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset
when the oldest offset became known if the now-known value happened to
be zero.  Fixed.

- SetOffsetVacuumLimit now logs useful information at the DEBUG1
level, so that you can see that it's doing what it's supposed to.

- TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the
offsetStopLimit even if it can't truncate anything.  This seems
useless, but it's not, because it may be that the last checkpoint
advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real
value, and now we can actually set offsetStopLimit properly.

- TruncateMultiXact no longer calls find_multixact_start when there
are no remaining multixacts.  This is actually a completely separate
bug that goes all the way back to 9.3.0 and can potentially cause
TruncateMultiXact to remove every file in pg_multixact/offsets.
Restarting the cluster becomes impossible because TrimMultiXact barfs.

- TruncateMultiXact now logs a message if the oldest multixact does
not precede the earliest one on disk and is not equal to the next
multixact and yet does not exist.  The value of the log message is
that it discovered the bug mentioned in the previous line, so I think
it's earning its keep.

With this version, I'm able to see that when you start up a
9.3.latest+this patch with a cluster that has a bogus value of 1 in
relminmxid, datminmxid, and the control file, autovacuum vacuums
everything in sight, all the values get set back to the right thing,
and the next checkpoint enables the member-wraparound guards.  This
works with both autovacuum=on and autovacuum=off; the emergency
mechanism kicks in as intended.  We'll want to warn people with big
databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may
want to pre-vacuum those tables before upgrading to avoid a vacuum
storm.  But generally I'm pretty happy with this: forcing those values
to get fixed so that we can guard against member-space wraparound
seems like the right thing to do.

So, to summarize, this patch does the following:

- Fixes the failure-to-start problems introduced in 9.4.2 in
complicated pg_upgrade scenarios.
- Prevents the new calls to find_multixact_start we added in 9.4.2
from happening during recovery, where they can only create failure
scenarios.  The call in TruncateMultiXact that has been there all
along is not eliminated, but now handles failure more gracefully.
- Fixes possible incorrect removal of every single
pg_multixact/offsets file when no multixacts exist; one file should be
kept.
- Forces aggressive autovacuuming when the control file's
oldestMultiXid doesn't point to a valid MultiXact and enables member
wraparound at the next checkpoint following the correction of that
problem.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
commit 87aa15fe5257060e0c971e135dd9f460fdc00bd0
Author: Robert Haas 
Date:   Thu Jun 4 11:58:49 2015 -0400

bar

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9568ff1..7c457a6 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -198,13 +198,24 @@ typedef struct MultiXactStateData
 	/* next-to-be-assigned offset */
 	MultiXactOffset nextOffset;
 
+	/* Have we completed multixact startup? */
+	bool		finishedStartup;
+
 	/*
-	 * Oldest multixact that is still on disk.  Anything older than this
-	 * should not be consulted.  These values are updated by vacuum.
+	 * Oldest multixact that is still potentially referenced by a relation.
+	 * Anything older than this should not be consulted.  These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
+
+	/*
+	 * Oldest multixact offset that is potentially referenced by a
+	 * multixact referenced by a relation.  We don't always know this value,
+	 * so there's a flag here to ind

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-04 Thread Robert Haas
On Thu, Jun 4, 2015 at 1:27 PM, Andres Freund  wrote:
> On 2015-06-04 12:57:42 -0400, Robert Haas wrote:
>> + /*
>> +  * Do we need an emergency autovacuum?  If we're not sure, assume yes.
>> +  */
>> + return !oldestOffsetKnown ||
>> + (nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
>
> I think without teaching autovac about those rules, this might just lead
> to lots of autovac processes starting without knowing they should do
> something? They know about autovacuum_multixact_freeze_age, but they
> know neither about !oldestOffsetKnown nor, afaics, about pending offset
> wraparounds?

You're right, but that's why the latest patch has changes in
MultiXactMemberFreezeThreshold.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-04 Thread Robert Haas
On Thu, Jun 4, 2015 at 9:42 AM, Robert Haas  wrote:
> Thanks for the review.

Here's a new version.  I've fixed the things Alvaro and Noah noted,
and some compiler warnings about set but unused variables.

I also tested it, and it doesn't quite work as hoped.  If started on a
cluster where oldestMultiXid is incorrectly set to 1, it starts up and
indicates that the member wraparound guards are disabled.  But even
after everything is fixed, they don't get enabled until after the next
full restart.  I think that's because TruncateMultiXact() bails out
too early, without calling DetermineSafeOldestOffset.

My attempt at a quick fix for that problem didn't work out, so I'm
posting this version for now to facilitate further review and testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
commit eb39cf10e4ff853ed4b9d3fab599cf42911e6f70
Author: Robert Haas 
Date:   Thu Jun 4 11:58:49 2015 -0400

bar

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 699497c..209d3e6 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -196,13 +196,24 @@ typedef struct MultiXactStateData
 	/* next-to-be-assigned offset */
 	MultiXactOffset nextOffset;
 
+	/* Have we completed multixact startup? */
+	bool		finishedStartup;
+
 	/*
-	 * Oldest multixact that is still on disk.  Anything older than this
-	 * should not be consulted.  These values are updated by vacuum.
+	 * Oldest multixact that is still potentially referenced by a relation.
+	 * Anything older than this should not be consulted.  These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
+
+	/*
+	 * Oldest multixact offset that is potentially referenced by a
+	 * multixact referenced by a relation.  We don't always know this value,
+	 * so there's a flag here to indicate whether or not we currently do.
+	 */
 	MultiXactOffset oldestOffset;
+	bool		oldestOffsetKnown;
 
 	/*
 	 * This is what the previous checkpoint stored as the truncate position.
@@ -219,6 +230,7 @@ typedef struct MultiXactStateData
 
 	/* support for members anti-wraparound measures */
 	MultiXactOffset offsetStopLimit;
+	bool offsetStopLimitKnown;
 
 	/*
 	 * Per-backend data starts here.  We have two arrays stored in the area
@@ -348,10 +360,11 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
 		MultiXactOffset offset2);
 static void ExtendMultiXactOffset(MultiXactId multi);
 static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static void DetermineSafeOldestOffset(MultiXactId oldestMXact);
+static void DetermineSafeOldestOffset(MultiXactOffset oldestMXact);
 static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
 		 MultiXactOffset start, uint32 distance);
-static MultiXactOffset find_multixact_start(MultiXactId multi);
+static bool SetOffsetVacuumLimit(bool finish_setup);
+static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
 static void WriteMZeroPageXlogRec(int pageno, uint8 info);
 
 
@@ -960,7 +973,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 * against catastrophic data loss due to multixact wraparound.  The basic
 	 * rules are:
 	 *
-	 * If we're past multiVacLimit or the safe threshold for member storage space,
+	 * If we're past multiVacLimit or the safe threshold for member storage
+	 * space, or we don't know what the safe threshold for member storage is,
 	 * start trying to force autovacuum cycles.
 	 * If we're past multiWarnLimit, start issuing warnings.
 	 * If we're past multiStopLimit, refuse to create new MultiXactIds.
@@ -969,6 +983,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 *--
 	 */
 	if (!MultiXactIdPrecedes(result, MultiXactState->multiVacLimit) ||
+		!MultiXactState->oldestOffsetKnown ||
 		(MultiXactState->nextOffset - MultiXactState->oldestOffset
 			> MULTIXACT_MEMBER_SAFE_THRESHOLD))
 	{
@@ -1083,7 +1098,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 *--
 	 */
 #define OFFSET_WARN_SEGMENTS	20
-	if (MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
+	if (MultiXactState->offsetStopLimitKnown &&
+		MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
  nmembers))
 		ereport(ERROR,
 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
@@ -1095,7 +,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 		   MultiXactState->offsetStopLimit - nextOffset - 1),
  errhint("Execute a database-wide VACUUM in database with OID %u with reduced vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age settings.",
 		 MultiXactState->oldestMultiXactDB)));
-	else if (MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
+	else if (M

Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-04 Thread Robert Haas
On Thu, Jun 4, 2015 at 2:42 AM, Noah Misch  wrote:
> I like that change a lot.  It's much easier to seek forgiveness for wasting <=
> 28 GiB of disk than for deleting visibility information wrongly.

I'm glad you like it.  I concur.

>> 2. If setting the offset stop limit (the point where we refuse to
>> create new multixact space), we don't arm the stop point.  This means
>> that if you're in this situation, you run without member wraparound
>> protection until it's corrected.  A message gets logged once per
>> checkpoint telling you that you have this problem, and another message
>> gets logged when things get straightened out and the guards are
>> enabled.
>>
>> 3. If setting the vacuum force point, we assume that it's appropriate
>> to immediately force vacuum.
>
> Those seem reasonable, too.

Cool.

>> I've only tested this very lightly - this is just to see what you and
>> Noah and others think of the approach.  As compared with the previous
>> approach, it has the advantage of making minimal assumptions about the
>> sanity of what's on disk.  It has the disadvantage that, for some
>> people, the member-wraparound guard won't be enabled at startup -- but
>> note that those people can't start 9.3.7/9.4.2 *at all*, so currently
>> they are either running without member wraparound protection anyway
>> (if they haven't upgraded to those releases) or they're down entirely.
>
> That disadvantage is negligible, considering.

All right.

>> Another disadvantage is that we'll be triggering what may be quite a
>> bit of autovacuum activity for some people, which could be painful.
>> On the plus side, they'll hopefully end up with sane relminmxid and
>> datminmxid guards afterwards.
>
> That sounds good so long as each table requires just one successful emergency
> autovacuum.  I'm not seeing code to ensure that the launched autovacuum will
> indeed perform a full-table scan and update relminmxid; is it there?

No.  Oops.

> For sites that can't tolerate an autovacuum storm, what alternative can we
> provide?  Is "SET vacuum_multixact_freeze_table_age = 0; VACUUM " of
> every table, done before applying the minor update, sufficient?

I don't know.  In practical terms, they probably need to ensure that
if pg_multixact/offsets/ does not exist, no relations have
relminmxid = 1 and no remaining databases have datminmxid = 1.
Exactly what it will take to get there is possibly dependent on which
minor release you are running; on current minor releases, I am hopeful
that what you propose is sufficient.
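
A sketch of what that check-and-vacuum pass might look like (catalog columns as
in 9.3+; whether it is sufficient is exactly the open question here):

-- Look for the suspicious value 1 left behind by old pg_upgrade runs.
SELECT datname, datminmxid FROM pg_database WHERE datminmxid = '1';
SELECT relname, relminmxid FROM pg_class WHERE relminmxid = '1';

-- Then, in each affected database, force full-table vacuums before the update.
SET vacuum_multixact_freeze_table_age = 0;
SET vacuum_multixact_freeze_min_age = 0;
VACUUM;   -- or VACUUM the affected tables individually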

>>  static void
>> -DetermineSafeOldestOffset(MultiXactId oldestMXact)
>> +DetermineSafeOldestOffset(MultiXactOffset oldestMXact)
>
> Leftover change from an earlier iteration?  The values passed continue to be
> MultiXactId values.

Oopsie.

>>   /* move back to start of the corresponding segment */
>> - oldestOffset -= oldestOffset %
>> - (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
>> + offsetStopLimit = oldestOffset - (oldestOffset %
>> + (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
>> + /* always leave one segment before the wraparound point */
>> + offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * 
>> SLRU_PAGES_PER_SEGMENT);
>> +
>> + /* if nothing has changed, we're done */
>> + if (prevOffsetStopLimitKnown && offsetStopLimit == prevOffsetStopLimit)
>> + return;
>>
>>   LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
>> - /* always leave one segment before the wraparound point */
>> - MultiXactState->offsetStopLimit = oldestOffset -
>> - (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
>> + MultiXactState->offsetStopLimit = oldestOffset;
>
> That last line needs s/oldestOffset/offsetStopLimit/, I presume.

Another oops.

Thanks for the review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Robert Haas
On Wed, Jun 3, 2015 at 8:24 AM, Robert Haas  wrote:
> On Tue, Jun 2, 2015 at 5:22 PM, Andres Freund  wrote:
>>> > Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
>>> > the disk it'll always get one at a segment boundary, right? I'm not sure
>>> > that's actually ok; because the value at the beginning of the segment
>>> > can very well end up being a 0, as MaybeExtendOffsetSlru() will have
>>> > filled the page with zeros.
>>> >
>>> > I think that should be harmless, the worst that can happen is that
>>> > oldestOffset errorneously is 0, which should be correct, even though
>>> > possibly overly conservative, in these cases.
>>>
>>> Uh oh.  That seems like a real bad problem for this approach.  What
>>> keeps that from being the opposite of too conservative?  There's no
>>> "safe" value in a circular numbering space.
>>
>> I think it *might* (I'm really jetlagged) be fine because that'll only
>> happen after a upgrade from < 9.3. And in that case we initialize
>> nextOffset to 0. That ought to safe us?
>
> That's pretty much betting the farm on the bugs we know about today
> being the only ones there are.  That seems imprudent.

So here's a patch taking a different approach.  In this approach, if
the multixact whose members we want to look up doesn't exist, we don't
use a later one (that might or might not be valid).  Instead, we
attempt to cope with the unknown.  That means:

1. In TruncateMultiXact(), we don't truncate.

2. If setting the offset stop limit (the point where we refuse to
create new multixact space), we don't arm the stop point.  This means
that if you're in this situation, you run without member wraparound
protection until it's corrected.  A message gets logged once per
checkpoint telling you that you have this problem, and another message
gets logged when things get straightened out and the guards are
enabled.

3. If setting the vacuum force point, we assume that it's appropriate
to immediately force vacuum.

I've only tested this very lightly - this is just to see what you and
Noah and others think of the approach.  As compared with the previous
approach, it has the advantage of making minimal assumptions about the
sanity of what's on disk.  It has the disadvantage that, for some
people, the member-wraparound guard won't be enabled at startup -- but
note that those people can't start 9.3.7/9.4.2 *at all*, so currently
they are either running without member wraparound protection anyway
(if they haven't upgraded to those releases) or they're down entirely.
Another disadvantage is that we'll be triggering what may be quite a
bit of autovacuum activity for some people, which could be painful.
On the plus side, they'll hopefully end up with sane relminmxid and
datminmxid guards afterwards.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9568ff1..4400fc5 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -198,13 +198,24 @@ typedef struct MultiXactStateData
 	/* next-to-be-assigned offset */
 	MultiXactOffset nextOffset;
 
+	/* Have we completed multixact startup? */
+	bool		finishedStartup;
+
 	/*
-	 * Oldest multixact that is still on disk.  Anything older than this
-	 * should not be consulted.  These values are updated by vacuum.
+	 * Oldest multixact that is still potentially referenced by a relation.
+	 * Anything older than this should not be consulted.  These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
+
+	/*
+	 * Oldest multixact offset that is potentially referenced by a
+	 * multixact referenced by a relation.  We don't always know this value,
+	 * so there's a flag here to indicate whether or not we currently do.
+	 */
 	MultiXactOffset oldestOffset;
+	bool		oldestOffsetKnown;
 
 	/*
 	 * This is what the previous checkpoint stored as the truncate position.
@@ -221,6 +232,7 @@ typedef struct MultiXactStateData
 
 	/* support for members anti-wraparound measures */
 	MultiXactOffset offsetStopLimit;
+	bool offsetStopLimitKnown;
 
 	/*
 	 * Per-backend data starts here.  We have two arrays stored in the area
@@ -350,10 +362,11 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
 		MultiXactOffset offset2);
 static void ExtendMultiXactOffset(MultiXactId multi);
 static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static void DetermineSafeOldestOffset(MultiXactId oldestMXact);
+static void DetermineSafeOldestOffset(MultiXactOffset oldestMXact);
 stat

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Robert Haas
On Tue, Jun 2, 2015 at 5:22 PM, Andres Freund  wrote:
>> > Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
>> > the disk it'll always get one at a segment boundary, right? I'm not sure
>> > that's actually ok; because the value at the beginning of the segment
>> > can very well end up being a 0, as MaybeExtendOffsetSlru() will have
>> > filled the page with zeros.
>> >
>> > I think that should be harmless, the worst that can happen is that
>> oldestOffset erroneously is 0, which should be correct, even though
>> > possibly overly conservative, in these cases.
>>
>> Uh oh.  That seems like a real bad problem for this approach.  What
>> keeps that from being the opposite of too conservative?  There's no
>> "safe" value in a circular numbering space.
>
> I think it *might* (I'm really jetlagged) be fine because that'll only
> happen after an upgrade from < 9.3. And in that case we initialize
> nextOffset to 0. That ought to safe us?

That's pretty much betting the farm on the bugs we know about today
being the only ones there are.  That seems imprudent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Robert Haas
On Wed, Jun 3, 2015 at 4:48 AM, Thomas Munro
 wrote:
> On Wed, Jun 3, 2015 at 3:42 PM, Alvaro Herrera  
> wrote:
>> Thomas Munro wrote:
>>> On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera  
>>> wrote:
>>> > My guess is that the file existed, and perhaps had one or more pages,
>>> > but the wanted page doesn't exist, so we tried to read but got 0 bytes
>>> > back.  read() returns 0 in this case but doesn't set errno.
>>> >
>>> > I didn't find a way to set things so that the file exists but is of
>>> > shorter contents than oldestMulti by the time the checkpoint record is
>>> > replayed.
>>>
>>> I'm just starting to learn about the recovery machinery, so forgive me
>>> if I'm missing something basic here, but I just don't get this.  As I
>>> understand it, offsets/0046 should either have been copied with that
>>> page present in it if it existed before the backup started (apparently
>>> not in this case), or extended to contain it by WAL records that come
>>> after the backup label but before the checkpoint record that
>>> references it (also apparently not in this case).
>>
>> Exactly --- that's the spot at which I am, also.  I have had this
>> spinning in my head for three days now, and tried every single variation
>> that I could think of, but like you I was unable to reproduce the issue.
>> However, our customer took a second base backup and it failed in exactly
>> the same way, modulo some changes to the counters (the file that
>> didn't exist was 004B rather than 0046).  I'm still at a loss at what
>> the failure mode is.  We must be missing some crucial detail ...
>
> I have finally reproduced that error!  See attached repro shell script.
>
> The conditions are:
>
> 1.  next multixact == oldest multixact (no active multixacts, pointing
> past the end)
> 2.  next multixact would be the first item on a new page (multixact % 2048 == 
> 0)
> 3.  the page must not be the first in a segment (or we'd get the
> read-zeroes case)
>
> That gives you odds of 1/2048 * 31/32 * (probability of a wraparound
> vacuum followed by no multixact creations right before your backup
> checkpoint).  That seems like reasonably low odds... if it happened
> twice in a row, maybe I'm missing something here and there is some
> other way to get this...
>
> I realise now that this is actually a symptom of a problem spotted by
> Noah recently:
>
> http://www.postgresql.org/message-id/20150601045534.gb23...@tornado.leadboat.com
>
> He noticed the problem for segment boundaries, when not in recovery.
> In recovery, segment boundaries don't raise an error (the read-zeroes
> case applies), but page boundaries do.  The fix is probably to do
> nothing if they are the same, as we do elsewhere, like in the attached
> patch.

Actually, we still need to call DetermineSafeOldestOffset in that
case.  Otherwise, if someone goes from lots of multixacts to none, the
stop point won't advance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Robert Haas
On Tue, Jun 2, 2015 at 4:19 PM, Andres Freund  wrote:
> I'm not really convinced tying things closer to having done trimming is
> easier to understand than tying things to recovery having finished.
>
> E.g.
> if (did_trim)
> oldestOffset = GetOldestReferencedOffset(oldest_datminmxid);
> imo is harder to understand than if (!InRecovery).
>
> Maybe we could just name it finishedStartup and rename the functions 
> accordingly?

Basing that particular call site on InRecovery doesn't work, because
InRecovery isn't set early enough.  But I'm fine to rename it to
whatever.

> Maybe it's worthwhile to add a 'NB: At this stage the data directory is
> not yet necessarily consistent' StartupMultiXact's comments, to avoid
> reintroducing problems like this?

Sure.

>>   /*
>> +  * We can read this without a lock, because it only changes when 
>> nothing
>> +  * else is running.
>> +  */
>> + did_trim = MultiXactState->didTrimMultiXact;
>
> Err, Hot Standby? It might be ok to not lock, but the comment is
> definitely wrong. I'm inclined to simply use locking, this doesn't look
> sufficiently critical performancewise.

/me nods.  Good point.

> Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
> the disk it'll always get one at a segment boundary, right? I'm not sure
> that's actually ok; because the value at the beginning of the segment
> can very well end up being a 0, as MaybeExtendOffsetSlru() will have
> filled the page with zeros.
>
> I think that should be harmless, the worst that can happen is that
> oldestOffset erroneously is 0, which should be correct, even though
> possibly overly conservative, in these cases.

Uh oh.  That seems like a real bad problem for this approach.  What
keeps that from being the opposite of too conservative?  There's no
"safe" value in a circular numbering space.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Robert Haas
On Tue, Jun 2, 2015 at 11:44 AM, Andres Freund  wrote:
> On 2015-06-02 11:37:02 -0400, Robert Haas wrote:
>> The exact circumstances under which we're willing to replace a
>> relminmxid with a newly-computed one that differs are not altogether
>> clear to me, but there's an "if" statement protecting that logic, so
>> there are some circumstances in which we'll leave the existing value
>> intact.
>
> I guess we'd have to change that then.

Yeah, but first we'd need to assess why it's like that.  Tom was the
one who installed the current logic, but I haven't been able to fully
understand it.

>> It would similarly do so when the oldest MXID reference in the
>> relation is in fact 1, but that value can't be vacuumed away yet.
>
> I'd thought of just forcing consumption of one multixact in that
> case. Not pretty, but imo acceptable.

What if multixact 1 still has living members?

>> Also, the database might be really big.  Even if it were true that a
>> full scan of every table would get us out of this state, describing
>> the time that it would take to do that as "relatively short" seems to
>> me to be considerably understating the impact of a full-cluster
>> VACUUM.
>
> Well. We're dealing with a corrupted cluster. Having a way out that's
> done automatically, even if it takes a long while, is pretty good
> already. In many cases the corruption (i.e. pg_upgrade) happened long
> ago, so the table's relminmxid will already have been recomputed.  I
> think that's acceptable.

I'm a long way from being convinced the automated recovery is
possible.  There are so many different scenarios here that it's very
difficult to reason generally about what the "right" thing to do is.
I agree it would be nice if we had it, though.

>> With regard to the more general question of WAL-logging this, are you
>> going to work on that?  Are you hoping Alvaro or I will work on that?
>> Should we draw straws?  It seems like somebody needs to do it.
>
> I'm willing to invest the time to develop an initial version, but I'll
> need help evaluating it. I don't have many testing resources available
> atm, and I'm not going to trust stuff I developed while travelling by
> just looking at the code.

I'm willing to help with that.  Hopefully I'm not the only one, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Robert Haas
On Tue, Jun 2, 2015 at 11:36 AM, Andres Freund  wrote:
>> That would be a departure from the behavior of every existing release
>> that includes this code based on, to my knowledge, zero trouble
>> reports.
>
> On the other hand we're now at about bug #5 attributable to the odd way
> truncation works for multixacts. It's obviously complex and hard to get
> right. It makes it harder to cope with the wrong values left in
> datminmxid etc. So I'm still wondering whether fixing this for good isn't
> the better approach.

It may well be.  But I think we should do something more surgical
first.  Perhaps we can justify the pain and risk of making changes to
the WAL format in the back-branches, but let's not do it in a rush.
If we can get this patch to a state where it undoes the damage
inflicted in 9.3.7/9.4.2, then we will be in a state where we have as
much reliability as we had in 9.3.6 plus the protections against
member-space wraparound added in 9.3.7 - which, like the patch I'm
proposing now, were directly motivated by multiple, independent bug
reports.  That seems like a good place to get to.  If nothing else, it
will buy us some time to figure out what else we want to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Robert Haas
On Tue, Jun 2, 2015 at 11:27 AM, Andres Freund  wrote:
> On 2015-06-02 11:16:22 -0400, Robert Haas wrote:
>> I'm having trouble figuring out what to do about this.  I mean, the
>> essential principle of this patch is that if we can't count on
>> relminmxid, datminmxid, or the control file to be accurate, we can at
>> least look at what is present on the disk.  If we also cannot count on
>> that to be accurate, we are left without any reliable source of
>> information.  Consider a hypothetical cluster where all our stored
>> minmxids of whatever form are corrupted (say, all change to 1) and in
>> addition there are stray files in pg_multixact.  I don't think there's
>> really any way to get ourselves out of trouble in that scenario.
>
> If we were to truncate after vacuum, and only on the primary (via WAL
> logging), we could, afaics, just rely on all the values to be
> recomputed. I mean relminmxid will be recomputed after a vacuum, and
> thus, after some time, will datminmxid and the control file value.  We
> could just force a value of 1 to always trigger anti-wraparound vacuums
> (or wait for that to happen implicitly, to delay the impact?). That'll
> then should then fix the problem in a relatively short amount of time?

The exact circumstances under which we're willing to replace a
relminmxid with a newly-computed one that differs are not altogether
clear to me, but there's an "if" statement protecting that logic, so
there are some circumstances in which we'll leave the existing value
intact.  If we force non-stop vacuuming in that scenario, autovacuum
will just run like crazy without accomplishing anything, which
wouldn't be good.  It would similarly do so when the oldest MXID
reference in the relation is in fact 1, but that value can't be
vacuumed away yet.

Also, the database might be really big.  Even if it were true that a
full scan of every table would get us out of this state, describing
the time that it would take to do that as "relatively short" seems to
me to be considerably understating the impact of a full-cluster
VACUUM.

With regard to the more general question of WAL-logging this, are you
going to work on that?  Are you hoping Alvaro or I will work on that?
Should we draw straws?  It seems like somebody needs to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Robert Haas
On Tue, Jun 2, 2015 at 8:56 AM, Andres Freund  wrote:
> But what *definitely* looks wrong to me is that a TruncateMultiXact() in
> this scenario now (since a couple weeks ago) does a
> SimpleLruReadPage_ReadOnly() in the members slru via
> find_multixact_start(). That just won't work acceptably when we're not
> yet consistent. There very well could not be a valid members segment at
> that point?  Am I missing something?

Yes: that code isn't new.

TruncateMultiXact() called SimpleLruReadPage_ReadOnly() directly in
9.3.0 and every subsequent release until 9.3.7/9.4.2.  The only thing
that's changed is that we've moved that logic into a function called
find_multixact_start() instead of having it directly inside that
function.  We did that because we needed to use the same logic in some
other places.  The reason why 9.3.7/9.4.2 are causing problems for
people that they didn't have previously is because those new,
additional call sites were poorly chosen and didn't include adequate
protection against calling that function with an invalid input value.
What this patch is about is getting back to the situation that we were
in from 9.3.0 - 9.3.6 and 9.4.0 - 9.4.1, where TruncateMultiXact() did
the thing that you're complaining about here but no one else did.

From my point of view, I think that you are absolutely right to
question what's going on in TruncateMultiXact().  It's kooky, and
there may well be bugs buried there.  But I don't think fixing that
should be the priority right now, because we have zero reports of
problems attributable to that logic.  I think the priority should be
on undoing the damage that we did in 9.3.7/9.4.2, when we made other
places do the same thing.  We started getting trouble reports
attributable to those changes *almost immediately*, which means that
whether or not TruncateMultiXact() is broken, these new call sites
definitely are.  I think we really need to fix those new places ASAP.

> I think at the very least we'll have to skip this step while not yet
> consistent. That really sucks, because we'll possibly end up with
> multixacts that are completely filled by the time we've reached
> consistency.

That would be a departure from the behavior of every existing release
that includes this code based on, to my knowledge, zero trouble
reports.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-02 Thread Robert Haas
On Tue, Jun 2, 2015 at 1:21 AM, Noah Misch  wrote:
> On Mon, Jun 01, 2015 at 02:06:05PM -0400, Robert Haas wrote:
>> On Mon, Jun 1, 2015 at 12:46 AM, Noah Misch  wrote:
>> > On Fri, May 29, 2015 at 03:08:11PM -0400, Robert Haas wrote:
>> >> SetMultiXactIdLimit() bracketed certain parts of its
>> >> logic with if (!InRecovery), but those guards were ineffective because
>> >> it gets called before InRecovery is set in the first place.
>> >
>> > SetTransactionIdLimit() checks InRecovery for the same things, and it is
>> > called at nearly the same moments as SetMultiXactIdLimit().  Do you have a
>> > sense of whether it is subject to similar problems as a result?
>>
>> Well, I think it's pretty weird that those things will get done before
>> beginning recovery, even on an inconsistent cluster, but not during
>> recovery.  That is pretty strange.  I don't see that it can actually
>> do any worse than emit a few log messages at the start of recovery
>> that won't show up again until the end of recovery, though.
>
> Granted.  Would it be better to update both functions at the same time, and
> perhaps to make that a master-only change?  Does the status quo cause more
> practical harm via SetMultiXactIdLimit() than via SetTransactionIdLimit()?

It does in the case of the call to find_multixact_start().  If that
fails, we take the server down for no good reason, as demonstrated by
the original report. I'll revert the changes to the other two things
in this function that I changed to be protected by did_trim.  There's
no special reason to think that's a necessary change.

>> >> 1. Moves the call to DetermineSafeOldestOffset() that appears in
>> >> StartupMultiXact() to TrimMultiXact(), so that we don't try to do this
>> >> until we're consistent.  Also, instead of passing
>> >> MultiXactState->oldestMultiXactId, pass the newer of that value and
>> >> the earliest offset that exists on disk.  That way, it won't try to
>> >> read data that's not there.
>> >
>> > Perhaps emit a LOG message when we do that, since it's our last
>> > opportunity to point to the potential data loss?
>>
>> If the problem is just that somebody's minmxid got set to 1 instead of
>> the real value, I think that there is no data loss, because none of
>> those older values are actually present there.  But we could add a LOG
>> message anyway.  How do you suggest that we phrase that?
>
> There's no data loss if 1 <= true_minmxid <= 2^31 at the time minmxid got set
> to 1.  Otherwise, data loss is possible.

Yes, but in that scenario, the log message you propose wouldn't be
triggered.  If true_minmxid > 2^31, then the stored minmxid will not
precede the files on disk; it will follow them (assuming the older stuff
hasn't been truncated, as is likely).  So the message would be
essentially:

LOG: you didn't lose data.  but if exactly the opposite of what this
message is telling you about had happened, then you would have.
DETAIL: Have a nice day.

> I don't hope for an actionable
> message, but we might want a reporter to grep logs for it when we diagnose
> future reports.  Perhaps this:
>
>   "missing pg_multixact/members files; begins at MultiXactId %u, expected %u"

This seems misleading.  In the known failure case, it's not that the
pg_multixact files are unexpectedly missing; it's that we incorrectly
think that they should still be there.  Maybe:

oldest MultiXactId on disk %u follows expected oldest MultiXact %u

> For good measure, perhaps emit this when lastCheckpointedOldest > earliest by
> more than one segment:
>
>   "excess pg_multixact/members files; begins at MultiXactId %u, expected %u"

So, this scenario will happen whenever the system was interrupted in
the middle of a truncation, or when the system is started from a base
backup and a truncation happened after files were copied.  I'm wary of
giving users the idea that this is an atypical event.  Perhaps a
message at DEBUG1?
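
For illustration, the two messages might come out of the truncation code
looking something like this (a sketch only, not from any actual patch;
"earliest" stands for the oldest segment actually found on disk,
"oldestMXact" for the value we expected from shared memory, and the
"more than one segment" slack test is left out for brevity):

if (MultiXactIdPrecedes(oldestMXact, earliest))
    ereport(LOG,
            (errmsg("oldest MultiXactId on disk %u follows expected oldest MultiXact %u",
                    earliest, oldestMXact)));
else if (MultiXactIdPrecedes(earliest, oldestMXact))
    ereport(DEBUG1,
            (errmsg("excess pg_multixact/offsets files; begins at MultiXactId %u, expected %u",
                    earliest, oldestMXact)));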

>> I'm not sure what you mean about it becoming too old.  At least with
>> that fix, it should get updated to exactly the first file that we
>> didn't remove.  Isn't that right?
>
> Consider a function raw_GOMXOD() that differs from GetOldestMultiXactOnDisk()
> only in that it never reads or writes the cache.  I might expect
> oldestMultiXactOnDisk==raw_GOMXOD() if oldestMultiXactOnDiskValid, and that
> does hold most of the time.  It does not always hold between the start of the
> quoted code's SimpleLruTruncate() and its oldestMultiXactOnDisk assignment.
> That...

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-01 Thread Robert Haas
On Mon, Jun 1, 2015 at 4:58 AM, Andres Freund  wrote:
>> I'm probably biased here, but I think we should finish reviewing,
>> testing, and committing my patch before we embark on designing this.
>
> Probably, yes. I am wondering whether doing this immediately won't end
> up making some things simpler and more robust though.

I'm open to being convinced of that, but as of this moment I'm not
seeing any clear-cut evidence that we need to go so far.

>> So far we have no reports of trouble attributable to the lack of the
>> WAL-logging support discussed here, as opposed to several reports of
>> trouble from the status quo within days of release.
>
> The lack of WAL logging actually has caused problems in the 9.3.3 (?)
> era, where we didn't do any truncation during recovery...

Right, but now we're piggybacking on the checkpoint records, and I
don't have any evidence that this approach can't be made robust.  It's
possible that it can't be made robust, but that's not currently clear.

>> By the time we've reached the minimum recovery point, they will have
>> been recreated by the same WAL records that created them in the first
>> place.
>
> I'm not sure that's true. I think we could end up erroneously removing
> files that were included in the base backup. Anyway, let's focus on your
> patch for now.

OK, but I am interested in discussing the other thing too.  I just
can't piece together the scenario myself - there may well be one.  The
base backup will begin replay from the checkpoint caused by
pg_start_backup() and remove anything that wasn't there at the start
of the backup.  But all of that stuff should get recreated by the time
we reach the minimum recovery point (end of backup).

>> If, in the previous
>> replay, we had wrapped all the way around, some of the stuff we keep
>> may actually already have been overwritten by future WAL records, but
>> they'll be overwritten again.  Now, that could mess up our
>> determination of which members to remove, I guess, but I'm not clear
>> that actually matters either: if the offsets space has wrapped around,
>> the members space will certainly have wrapped around as well, so we
>> can remove anything we like at this stage and we're still OK.  I agree
>> this is ugly the way it is, but where is the actual bug?
>
> I'm more worried about the cases where we didn't ever actually "badly
> wrap around" (i.e. overwrite needed data); but where that's not clear on
> the standby because the base backup isn't in a consistent state.

I agree. The current patch tries to make it so that we never call
find_multixact_start() while in recovery, but it doesn't quite
succeed: the call in TruncateMultiXact still happens during recovery,
but only once we're sure that the mxact we plan to call it on actually
exists on disk.  That won't be called until we replay the first
checkpoint, but that might still be prior to consistency.

Since I forgot to attach the revised patch with fixes for the points
Noah mentioned to that email, here it is attached to this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
commit d33b4eb0167f465edb00bd6c0e1bcaa67dd69fe9
Author: Robert Haas 
Date:   Fri May 29 14:35:53 2015 -0400

foo

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9568ff1..aca829d 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -199,8 +199,9 @@ typedef struct MultiXactStateData
 	MultiXactOffset nextOffset;
 
 	/*
-	 * Oldest multixact that is still on disk.  Anything older than this
-	 * should not be consulted.  These values are updated by vacuum.
+	 * Oldest multixact that may still be referenced from a relation.
+	 * Anything older than this should not be consulted.  These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
@@ -213,6 +214,18 @@ typedef struct MultiXactStateData
 	 */
 	MultiXactId lastCheckpointedOldest;
 
+	/*
+	 * This is the oldest file that actually exist on the disk.  This value
+	 * is initialized by scanning pg_multixact/offsets, and subsequently
+	 * updated each time we complete a truncation.  We need a flag to
+	 * indicate whether this has been initialized yet.
+	 */
+	MultiXactId oldestMultiXactOnDisk;
+	bool		oldestMultiXactOnDiskValid;
+
+	/* Has TrimMultiXact been called yet? */
+	bool		didTrimMultiXact;
+
 	/* support for anti-wraparound measures */
 	MultiXactId multiVacLimit;
 	MultiXactId multiWarnLimit;
@@ -344,6 +357,8 @@ static char *mxstatus_to_string(MultiXactStatus status);
 /* management of SLRU infrastructure */
 static int	ZeroMultiXactOffsetPage(int page

Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-01 Thread Robert Haas
On Mon, Jun 1, 2015 at 12:46 AM, Noah Misch  wrote:
> Incomplete review, done in a relative rush:

Thanks.

> On Fri, May 29, 2015 at 03:08:11PM -0400, Robert Haas wrote:
>> OK, here's a patch.  Actually two patches, differing only in
>> whitespace, for 9.3 and for master (ha!).  I now think that the root
>> of the problem here is that DetermineSafeOldestOffset() and
>> SetMultiXactIdLimit() were largely ignorant of the possibility that
>> they might be called at points in time when the cluster was
>> inconsistent.
>
> A cause perhaps closer to the root is commit f741300 moving truncation from
> VACUUM to checkpoints.  CLOG has given us deep experience with VACUUM-time
> truncation.  Commit f6a6c46d and this patch are about bringing CHECKPOINT-time
> truncation up to the same level.
>
> Speaking of commit f6a6c46d, it seems logical that updating allocation stop
> limits should happen proximate to truncation.  That's currently the case for
> CLOG (vac_truncate_clog() does both) and pg_multixact/members (checkpoint's
> TruncateMultiXact() call does both).  However, pg_multixact/offsets is
> truncated from TruncateMultiXact(), but vac_truncate_clog() updates its limit.
> I did not distill an errant test case, but this is fishy.

Good point.  Because we confine ourselves to using half the offset
space, it seems much harder for us to get into trouble here than it is
with members. The first scenario that occurred to me is that the SLRU
might actually wrap.  That seems tough, though: between one checkpoint
and the next, vacuum would need to advance oldest_datminmxid by 2^31
MXIDs while generating 2^31 new ones, or something like that.  That
doesn't seem real plausible. But then it occurred to me that it's
probably sufficient to advance the head of the SLRU far enough that
TruncateMultiXact thinks that the tail is in the future instead of in
the past.  I see no reason why that couldn't happen.  Then we'd end up
leaving some files behind that we should have removed.  I'm not sure
exactly what problem that would cause; would they just get overwritten
on the next pass through the space, or would they cause errors?  I
have not had time to check.
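
To make that "tail looks like it's in the future" hazard concrete, here
is a tiny standalone illustration, written in the style of
MultiXactIdPrecedes() but not the actual function: the comparison is
modulo 2^32, so once the head gets more than 2^31 ahead of the tail the
sign of the wrapped difference flips and the old tail suddenly compares
as newer.

#include <stdint.h>
#include <stdio.h>

static int
precedes(uint32_t a, uint32_t b)
{
    /* circular comparison: "a is older than b" iff the wrapped difference is negative */
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

int
main(void)
{
    uint32_t tail = 100;

    printf("%d\n", precedes(tail, tail + 10));          /* 1: tail is older than the head */
    printf("%d\n", precedes(tail, tail + 0x80000001U)); /* 0: tail now appears to follow the head */
    return 0;
}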

>> SetMultiXactIdLimit() bracketed certain parts of its
>> logic with if (!InRecovery), but those guards were ineffective because
>> it gets called before InRecovery is set in the first place.
>
> SetTransactionIdLimit() checks InRecovery for the same things, and it is
> called at nearly the same moments as SetMultiXactIdLimit().  Do you have a
> sense of whether it is subject to similar problems as a result?

Well, I think it's pretty weird that those things will get done before
beginning recovery, even on an inconsistent cluster, but not during
recovery.  That is pretty strange.  I don't see that it can actually
do any worse than emit a few log messages at the start of recovery
that won't show up again until the end of recovery, though.

>> 1. Moves the call to DetermineSafeOldestOffset() that appears in
>> StartupMultiXact() to TrimMultiXact(), so that we don't try to do this
>> until we're consistent.  Also, instead of passing
>> MultiXactState->oldestMultiXactId, pass the newer of that value and
>> the earliest offset that exists on disk.  That way, it won't try to
>> read data that's not there.
>
> Perhaps emit a LOG message when we do that, since it's our last opportunity to
> point to the potential data loss?

If the problem is just that somebody's minmxid got set to 1 instead of
the real value, I think that there is no data loss, because none of
those older values are actually present there.  But we could add a LOG
message anyway.  How do you suggest that we phrase that?

>> +  * PostgreSQL 9.3.0 through 9.3.6 and PostgreSQL 9.4.0 through 9.4.1
>> +  * had bugs that could allow users who reached those release through
>
> s/release/releases/

Fixed.

>> @@ -2859,6 +2947,14 @@ TruncateMultiXact(void)
>>   SimpleLruTruncate(MultiXactOffsetCtl,
>> 
>> MultiXactIdToOffsetPage(oldestMXact));
>>
>> + /* Update oldest-on-disk value in shared memory. */
>> + earliest = range.rangeStart * MULTIXACT_OFFSETS_PER_PAGE;
>> + if (earliest < FirstMultiXactId)
>> + earliest = FirstMultiXactId;
>> + LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
>> + Assert(MultiXactState->oldestMultiXactOnDiskValid);
>> + MultiXactState->oldestMultiXactOnDiskValid = earliest;
>
> That last line needs s/Valid//, I presume.  Is it okay that
> oldestMultiXactOnDisk becomes too-old during TruncateMultiXact(), despite its
> Valid indicator remaining true?

Ay yai 

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-31 Thread Robert Haas
On Sat, May 30, 2015 at 8:55 PM, Andres Freund  wrote:
> Is oldestMulti, nextMulti - 1 really suitable for this? Are both
> actually guaranteed to exist in the offsets slru and be valid?  Hm. I
> guess you intend to simply truncate everything else, but just in
> offsets?

oldestMulti in theory is the right thing, I think, but in actuality we
know that some people have 1 here instead of the correct value.

>> One argument against this idea is that we may not want to keep a full
>> set of member files on standbys (due to disk space usage), but that's
>> what will happen unless we truncate during replay.
>
> I think that argument is pretty much the death-knell.

Yes.  Truncating on the standby is really not optional.

>> > I think at least for 9.5+ we should a) invent proper truncation records
>> > for pg_multixact b) start storing oldestValidMultiOffset in pg_control.
>> > The current hack of scanning the directories to get knowledge we should
>> > have is a pretty bad hack, and we should not continue using it forever.
>> > I think we might end up needing to do a) even in the backbranches.
>>
>> Definitely agree with WAL-logging truncations; also +1 on backpatching
>> that to 9.3.  We already have experience with adding extra WAL records
>> on minor releases, and it didn't seem to have bitten too hard.
>
> I'm inclined to agree. My only problem is that I'm not sure whether we
> can find a way of doing all this without adding a pg_control field. Let
> me try to sketch this out:
>
> 1) We continue determining the oldest segment via
> SlruScanDirectory(SlruScanDirCbFindEarliest)
>on the master to find the oldest offsets segment to
>truncate. Alternatively, if we determine it to be safe, we could use
>oldestMulti to find that.
> 2) SlruScanDirCbRemoveMembers is changed to return the range of members
>to remove, instead of doing itself
> 3) We wal log [oldest offset segment guaranteed to not be alive,
>nextmulti) for offsets, and [oldest members segment guaranteed to not be 
> alive,
>nextmultioff), and redo truncations for the entire range during
>recovery.
>
> I'm pretty tired right now, but this sounds doable.

I'm probably biased here, but I think we should finish reviewing,
testing, and committing my patch before we embark on designing this.
So far we have no reports of trouble attributable to the lack of the
WAL-logging support discussed here, as opposed to several reports of
trouble from the status quo within days of release.

I'm having trouble reconstructing the series of events where what
you're worried about here really becomes a problem, and I think we
ought to have a very clear idea about that before back-patching
changes of this type.  It's true that if the state of the SLRU
directory is in the future, because recovery was restarted from an
earlier checkpoint, we might replay a checkpoint and remove some of
those files from the future.  But so what?  By the time we've reached
the minimum recovery point, they will have been recreated by the same
WAL records that created them in the first place.  If, in the previous
replay, we had wrapped all the way around, some of the stuff we keep
may actually already have been overwritten by future WAL records, but
they'll be overwritten again.  Now, that could mess up our
determination of which members to remove, I guess, but I'm not clear
that actually matters either: if the offsets space has wrapped around,
the members space will certainly have wrapped around as well, so we
can remove anything we like at this stage and we're still OK.  I agree
this is ugly the way it is, but where is the actual bug?

As far as your actual outline goes, I think if we do this, we need to
be very careful about step #2.  Right now, we decide what we need to
keep and then remove everything else, but that's kind of wonky because
new stuff may be getting created at the same time, so we keep
adjusting our idea of exactly what needs to be removed.  It would be
far better to invert that logic: decide what needs to be removed -
presumably, everything from the oldest member that now exists up until
some later point - and then remove precisely that stuff and nothing
else.
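
Shape-wise, I mean something like this toy illustration (clearly not
PostgreSQL code; segment numbers are plain integers and "removal" is
just a printf): the half-open range to delete is fixed up front, and
exactly that range is removed, no matter what new segments appear in
the meantime.

#include <stdint.h>
#include <stdio.h>

static void
remove_member_segments(uint32_t oldest_on_disk, uint32_t keep_from)
{
    /* the range [oldest_on_disk, keep_from) is decided once, before any removal happens */
    for (uint32_t seg = oldest_on_disk; seg != keep_from; seg++)
        printf("would remove members segment %u\n", seg);
}

int
main(void)
{
    remove_member_segments(16, 21);     /* remove segments 16..20, keep 21 onward */
    return 0;
}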

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-29 Thread Robert Haas
On Fri, May 29, 2015 at 9:46 PM, Andres Freund  wrote:
> On 2015-05-29 15:08:11 -0400, Robert Haas wrote:
>> It seems pretty clear that we can't effectively determine anything
>> about member wraparound until the cluster is consistent.
>
> I wonder if this doesn't actually hints at a bigger problem.  Currently,
> to determine where we need to truncate SlruScanDirectory() is
> used. That, afaics, could actually be a problem during recovery when
> we're not consistent.

I agree.  I actually meant to mention this in my previous email, but,
owing to exhaustion and burnout, didn't.

> I think at least for 9.5+ we should a) invent proper truncation records
> for pg_multixact b) start storing oldestValidMultiOffset in pg_control.
> The current hack of scanning the directories to get knowledge we should
> have is a pretty bad hack, and we should not continue using it forever.
> I think we might end up needing to do a) even in the backbranches.

That may be the right thing to do.  I'm concerned that changing the
behavior of master too much will make every subsequent fix twice as
hard, because we'll have to do one fix in master and another fix in
the back-branches.  I'm also concerned that it will create even more
convoluted failure scenarios. The failure-to-start problem discussed
on this thread requires a chain of four (maybe three) different
PostgreSQL versions in order to create it, and the more things we go
change, the harder it's going to be to reason about this stuff.

The diseased and rotting elephant in the room here is that clusters
with bogus relminmxid, datminmxid, and/or oldestMultiXid values may
exist in the wild and we really have no plan to get rid of them.
78db307bb may have helped somewhat - although I haven't grokked what
it's about well enough to be sure - but it's certainly not a complete
solution, as this bug report itself illustrates rather well.  Unless
we figure out some clever solution that is not now apparent to me, or
impose a hard pg_upgrade compatibility break at some point, we
basically can't count on pg_control's "oldest multixact" information
to be correct ever again.  We may be running into clusters 15 years
from now that have problems that are just holdovers from what was
fixed in 9.3.5.

One thing I think we should definitely do is add one or two additional
fields to pg_controldata that get filled in by pg_upgrade.  One of
them should be "the oldest known catversion in the lineage of this
cluster" and the other should be "the most recent catverson in the
lineage of this cluster before this one".   Or maybe we should store
PG_VERSION_NUM values.  Or store both things.  I think that would make
troubleshooting this kind of problem a lot easier - just from the
pg_controldata output, you'd be able to tell whether the cluster had
been pg_upgraded, whether it had been pg_upgraded once or multiple
times, and at least some of the versions involved, without relying on
the user's memory of what they did and when.  Fortunately, Steve
Kehlet had a pretty clear idea of what his history was, but not all
users know that kind of thing, and I've wanted it more than once while
troubleshooting.
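
For concreteness, a toy sketch of the sort of fields I mean (the names
are invented here; this is not a proposal for pg_control's actual
layout):

#include <stdint.h>

typedef struct ClusterLineageInfo
{
    uint32_t    oldest_catversion;      /* oldest catversion ever seen in this cluster's lineage */
    uint32_t    previous_catversion;    /* most recent catversion before the current one */
    uint32_t    oldest_pg_version_num;  /* e.g. 90304, if we decide to store PG_VERSION_NUM too */
} ClusterLineageInfo;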

Another thing I think we should do is add a field to pg_class that is
propagated by pg_upgrade and stores the most recent PG_VERSION_NUM
that is known to have performed a scan_all vacuum of the table.  This
would allow us to do things in the future like (a) force a full-table
vacuum of any table that hasn't been vacuumed since $BUGGYRELEASE or
(b) advise users to manually inspect the values and manually perform
said vacuum or (c) only believe that certain information about a table
is accurate if it's been full-scanned by a vacuum newer than
$BUGGYRELEASE.  It could also be used as part of a strategy for
reclaiming HEAP_MOVED_IN/HEAP_MOVED_OFF; e.g. you can't upgrade to
10.5, which repurposes those bits, unless you've done a scan_all
vacuum on every table with a release new enough to guarantee that
they're not used for their historical purpose.

> This problem isn't conflicting with most of the fixes you describe, so
> I'll continue with reviewing those.

Thank you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-29 Thread Robert Haas
On Fri, May 29, 2015 at 3:08 PM, Robert Haas  wrote:
> It won't fix the fact that pg_upgrade is putting
> a wrong value into everybody's datminmxid field, which should really
> be addressed too, but I've been working on this for about three days
> virtually non-stop and I don't have the energy to tackle it right now.
> If anyone feels the urge to step into that breech, I think what it
> needs to do is: when upgrading from a 9.3-or-later instance, copy over
> each database's datminmxid into the corresponding database in the new
> cluster.

Bruce was kind enough to spend some time on IM with me this afternoon,
and I think this may actually be OK.  What pg_upgrade does is:

1. First, put next-xid into the relminmxid for all tables, including
catalog tables.  This is the correct behavior for upgrades from a
pre-9.3 release, and is correct for catalog tables in general.

2. Next, restoring the schema dump will set the relminmxid values for
all non-catalog tables to the value dumped from the old cluster.  At
this point, everything is fine provided that we are coming from a
release 9.3 or newer.  But if the old cluster is pre-9.3, it will have
dumped *zero* values for all of its relminmxid values; so all of the
user tables go from the correct value they had after step 1 to an
incorrect value.

3. Finally, if the old cluster is pre-9.3, repeat step 1, undoing the
damage done in step 2.

This is a bit convoluted, but I don't know of a reason why it
shouldn't work.  Sorry for the false alarm.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-29 Thread Robert Haas
On Fri, May 29, 2015 at 12:43 PM, Robert Haas  wrote:
> Working on that now.

OK, here's a patch.  Actually two patches, differing only in
whitespace, for 9.3 and for master (ha!).  I now think that the root
of the problem here is that DetermineSafeOldestOffset() and
SetMultiXactIdLimit() were largely ignorant of the possibility that
they might be called at points in time when the cluster was
inconsistent.  SetMultiXactIdLimit() bracketed certain parts of its
logic with if (!InRecovery), but those guards were ineffective because
it gets called before InRecovery is set in the first place.

It seems pretty clear that we can't effectively determine anything
about member wraparound until the cluster is consistent.  Before then,
there might be files missing from the offsets or members SLRUs which
get put back during replay.  There could even be gaps in the sequence
of files, with some things having made it to disk before the crash (or
having made it into the backup) and others not.  So all the work of
determining what the safe stop points and vacuum thresholds for
members are needs to be postponed until TrimMultiXact() time.  And
that's fine, because we don't need this information in recovery anyway
- it only affects behavior in normal running.

So this patch does the following:

1. Moves the call to DetermineSafeOldestOffset() that appears in
StartupMultiXact() to TrimMultiXact(), so that we don't try to do this
until we're consistent.  Also, instead of passing
MultiXactState->oldestMultiXactId, pass the newer of that value and
the earliest offset that exists on disk.  That way, it won't try to
read data that's not there.  Note that the second call to
DetermineSafeOldestOffset() in TruncateMultiXact() doesn't need a
similar guard, because we already bail out of that function early if
the multixacts we're going to truncate away don't exist.

2. Adds a new flag MultiXactState->didTrimMultiXact indicate whether
we've finished TrimMultiXact(), and arranges for SetMultiXactIdLimit()
to use that rather than InRecovery to test whether it's safe to do
complicated things that might require that the cluster is consistent.
This is a slight behavior change, since formerly we would have tried
to do that stuff very early in the startup process, and now it won't
happen until somebody completes a vacuum operation.  If that's a
problem, we could consider doing it in TrimMultiXact(), but I don't
think it's safe the way it was.  The new flag also prevents
oldestOffset from being set while in recovery; I think it would be
safe to do that in recovery once we've reached consistency, but I
don't believe it's necessary.

3. Arranges for TrimMultiXact() to set oldestOffset.  This is
necessary because, previously, we relied on SetMultiXactIdLimit doing
that during early startup or during recovery, and that's no longer
true.  Here too we set oldestOffset keeping in mind that our notion of
the oldest multixact may point to something that doesn't exist; if so,
we use the oldest MXID that does.

4. Modifies TruncateMultiXact() so that it doesn't re-scan the SLRU
directory on every call to find the oldest file that exists.  Instead,
it arranges to remember the value from the first scan and then updates
it thereafter to reflect its own truncation activity.  This isn't
absolutely necessary, but because this oldest-file logic is used in
multiple places (TrimMultiXact, SetMultiXactIdLimit, and
TruncateMultiXact all need it directly or indirectly) caching the
value seems like a better idea than recomputing it frequently.

I have tested that this patch fixes Steve Kehlet's problem, or at
least what I believe to be Steve Kehlet's problem based on the
reproduction scenario I described upthread.  I believe it will also
fix the problems with starting up from a base backup with Alvaro
mentioned upthread.  It won't fix the fact that pg_upgrade is putting
a wrong value into everybody's datminmxid field, which should really
be addressed too, but I've been working on this for about three days
virtually non-stop and I don't have the energy to tackle it right now.
If anyone feels the urge to step into that breech, I think what it
needs to do is: when upgrading from a 9.3-or-later instance, copy over
each database's datminmxid into the corresponding database in the new
cluster.

Aside from that, it's very possible that despite my best efforts this
has serious bugs.  Review and testing would be very much appreciated.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 699497c..8d28a5c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -197,8 +197,9 @@ typedef struct MultiXactStateData
 	Mu

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-29 Thread Robert Haas
On Fri, May 29, 2015 at 10:17 AM, Tom Lane  wrote:
> Thomas Munro  writes:
>> On Fri, May 29, 2015 at 11:24 AM, Robert Haas  wrote:
>>> B. We need to change find_multixact_start() to fail softly.
>
>> Here is an experimental WIP patch that changes StartupMultiXact and
>> SetMultiXactIdLimit to find the oldest multixact that exists on disk
>> (by scanning the directory), and uses that if it is more recent than
>> the oldestMultiXactId from shmem,
>
> Not sure about the details of this patch, but I was planning to propose
> what I think is the same thing: the way to make find_multixact_start()
> fail softly is to have it find the oldest actually existing file if the
> one that should be there isn't.

Working on that now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-28 Thread Robert Haas
at we're doing right now.
The current logic purports to leave a one-file gap in the member
space, but there's no guarantee that the gap really exists on disk the
way we think it does.  With this approach, we can be certain that
there is a gap.  And that is a darned good thing to be certain about.

>> C. I think we should also change TruncateMultiXact() to truncate
>> offsets first, and then members.  As things stand, if we truncate
>> members first, we increase the risk of seeing an offset that will fail
>> when passed to find_multixact_start(), because TruncateMultiXact()
>> might get interrupted before it finishes.  That seems like an
>> unnecessary risk.
>
> Not sure about this point.  We did it the way you propose previously,
> and found it to be a problem because sometimes we tried to read an
> offset file that was no longer there.  Do we really read member files
> anywhere?  I thought we only tried to read offset files.  If we remove
> member files, what is it that we try to read and find not to be present?

Do you have a link to the previous discussion?

I mean, the problem we're having right now is that sometimes we have
an offset, but the corresponding member isn't there.  So clearly
offsets reference members.  Do members also reference offsets?  I
didn't think so, but life is full of surprises.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-28 Thread Robert Haas
On Thu, May 28, 2015 at 4:06 PM, Joshua D. Drake  wrote:
> FTR: Robert, you have been a Samurai on this issue. Our many thanks.

Thanks!  I really appreciate the kind words.

So, in thinking through this situation further, it seems to me that
the situation is pretty dire:

1. If you pg_upgrade to 9.3 before 9.3.5, then you may have relminmxid
or datminmxid values which are 1 instead of the correct value.
Setting the value to 1 was too far in the past if your MXID counter is
< 2B, and too far in the future if your MXID counter is > 2B.

2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid
values which are equal to the next-mxid counter instead of the correct
value; in other words, they are too new.

3. If you pg_upgrade to 9.3.5, 9.3.6, 9.4.0, or 9.4.1, then you will
have the first problem for tables in template databases, and the
second one for the rest. (See 866f3017a.)

4. Wrong relminmxid or datminmxid values can eventually propagate into
the control file, as demonstrated in my previous post.  Therefore, we
can't count on relminmxid to be correct, we can't count on datminmxid
to be correct, and we can't count on the control file to be correct.
That's a sack of sad.

5. If the values are too far in the past, then nothing really terrible
will happen unless you upgrade to 9.3.7 or 9.4.2, at which point the
system will refuse to start.  Forcing a VACUUM FREEZE on every
database, including the unconnectable ones, should fix this and allow
you to upgrade safely - which you want to do, because 9.3.7 and 9.4.2
fix a different set of multixact data loss bugs.

6. If the values are too far in the future, the system may fail to
prevent wraparound, leading to data loss.  I am not totally clear on
whether a VACUUM FREEZE will fix this problem.  It seems like the
chances are better if you are running at least 9.3.5+ or 9.4.X,
because of 78db307bb.  But I'm not sure how complete a fix that is.

So what do we do about this?  I have a few ideas:

A. Most obviously, we should fix pg_upgrade so that it installs
chkpnt_oldstMulti instead of chkpnt_nxtmulti into datfrozenxid, so
that we stop creating new instances of this problem.  That won't get
us out of the hole we've dug for ourselves, but we can at least try to
stop digging.  (This is assuming I'm right that chkpnt_nxtmulti is the
wrong thing - anyone want to double-check me on that one?)

B. We need to change find_multixact_start() to fail softly.  This is
important because it's legitimate for it to fail in recovery, as
discussed upthread, and also because we probably want to eliminate the
fail-to-start hazard introduced in 9.4.2 and 9.3.7.
find_multixact_start() is used in three places, and they each require
separate handling:

- In SetMultiXactIdLimit, find_multixact_start() is used to set
MultiXactState->oldestOffset, which is used to determine how
aggressively to vacuum.  If find_multixact_start() fails, we don't
know how aggressively we need to vacuum to prevent members wraparound;
it's probably best to decide to vacuum as aggressively as possible.
Of course, if we're in recovery, we won't vacuum either way; the fact
that it fails softly is good enough.

- In DetermineSafeOldestOffset, find_multixact_start() is used to set
MultiXactState->offsetStopLimit.  If it fails here, we don't know when
to refuse multixact creation to prevent wraparound.  Again, in
recovery, that's fine.  If it happens in normal running, it's not
clear what to do.  Refusing multixact creation is an awfully blunt
instrument.  Maybe we can scan pg_multixact/offsets to determine a
workable stop limit: the first file greater than the current file that
exists, minus two segments, is a good stop point.  Perhaps we ought to
use this mechanism here categorically, not just when
find_multixact_start() fails.  It might be more robust than what we
have now (a rough sketch of that directory scan appears below, after
point C).

- In TruncateMultiXact, find_multixact_start() is used to set the
truncation point for the members SLRU.  If it fails here, I'm guessing
the right solution is not to truncate anything - instead, rely on
intense vacuuming to eventually advance oldestMXact to a value whose
member data still exists; truncate then.

C. I think we should also change TruncateMultiXact() to truncate
offsets first, and then members.  As things stand, if we truncate
members first, we increase the risk of seeing an offset that will fail
when passed to find_multixact_start(), because TruncateMultiXact()
might get interrupted before it finishes.  That seems like an
unnecessary risk.
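
Here is the rough sketch of the pg_multixact/offsets directory scan
mentioned under point B.  It is a standalone toy rather than PostgreSQL
code; wraparound and error handling are ignored so the idea stays
visible: find the first segment after the current one that actually
exists, then back off two segments to leave a gap.

#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint32_t
sketch_stop_segment(const char *dir, uint32_t current_seg)
{
    DIR        *d = opendir(dir);
    struct dirent *de;
    uint32_t    best = 0;
    int         found = 0;

    if (d == NULL)
        return current_seg;

    while ((de = readdir(d)) != NULL)
    {
        char       *end;
        unsigned long seg = strtoul(de->d_name, &end, 16);

        /* skip anything that is not a pure hex segment name */
        if (end == de->d_name || *end != '\0')
            continue;
        if (seg > current_seg && (!found || seg < best))
        {
            best = (uint32_t) seg;
            found = 1;
        }
    }
    closedir(d);

    /* stop two segments short of the first surviving later segment */
    return found ? best - 2 : current_seg;
}

int
main(void)
{
    printf("stop at segment %u\n",
           sketch_stop_segment("pg_multixact/offsets", 10));
    return 0;
}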

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-28 Thread Robert Haas
On Thu, May 28, 2015 at 8:51 AM, Robert Haas  wrote:
> [ speculation ]

OK, I finally managed to reproduce this, after some off-list help from
Steve Kehlet (the reporter), Alvaro, and Thomas Munro.  Here's how to
do it:

1. Install any pre-9.3 version of the server and generate enough
multixacts to create at least TWO new segments.  When you shut down
the server, all segments except for the most current one will be
removed.  At this point, the only thing in
$PGDATA/pg_multixact/offsets should be a single file, and the name of
that file should not be 0000 or 0001.

2. Use pg_upgrade to upgrade to 9.3.4.  It is possible that versions <
9.3.4 will also work here, but you must not use 9.3.5 or higher,
because 9.3.5 includes Bruce's commit 3d2e18510, which arranged to
preserve relminmxid and datminmxid values.   At this point,
pg_controldata on the new cluster should show an oldestMultiXid value
greater than 1 (copied from the old cluster), but all the datminmxid
values are 1.  Also, initdb will have left behind a bogus 0000 file in
pg_multixact/offsets.

3. Move to 9.3.5 (or 9.3.6), not via pg_upgrade, but just by dropping
in the new binaries.  Follow the instructions in the 9.3.5 release
notes; since you created at least TWO new segments in step one, there
will be no 0001 file, and the query there will say that you should
remove the bogus 0000 file.  So do that, leaving just the good file in
pg_multixact/offsets.  At this point, pg_multixact/offsets is OK, and
pg_controldata still says that oldestMultiXid > 1, so that is also OK.
The only problem is that we've got some bogus datminmxid values
floating around.  Our next step will be to convince vacuum to
propagate the bogus datminmxid values back into pg_controldata.

4. Consume at least one transaction ID (e.g. SELECT txid_current())
and then do this:

postgres=# set vacuum_freeze_min_age = 0;
SET
postgres=# set vacuum_freeze_table_age = 0;
SET
postgres=# vacuum;
VACUUM

Setting the GUCs forces full table scans, so that we advance
relfrozenxid.  But notice that we were careful not to just run VACUUM
FREEZE, which would have also advanced relminmxid, which, for purposes
of reproducing this bug, is not what we want to happen.  So relminmxid
is still (incorrectly) set to 1 for every database.  However, since
the vacuum did advance relfrozenxid, it will call vac_truncate_clog,
which will call SetMultiXactIdLimit, which will propagate the bogus
datminmxid = 1 setting into shared memory.

(In my testing, this step doesn't work if performed on 9.3.4; you have
to do it on 9.3.5.  I think that's because of Tom's commit 78db307bb,
but I believe in a more complex test scenario you might be able to get
this to happen on 9.3.4 also.)

I believe it's the case that an autovacuum of even a single table can
substitute for this step if it happens to advance relfrozenxid but not
relminmxid.

5. The next checkpoint, or the shutdown checkpoint in any event, will
propagate the bogus value of 1 from shared memory back into the
control file.

6. Now try to start 9.3.7.  It will see the bogus oldestMultiXid = 1
value in the control file, attempt to read the corresponding offsets
file, and die.

In the process of investigating this, we found a few other things that
seem like they may also be bugs:

- As noted upthread, replaying an older checkpoint after a newer
checkpoint has already happened may lead to similar problems.  This
may be possible when starting from an online base backup; or when
restarting a standby that did not perform a restartpoint when
replaying the last checkpoint before the shutdown.

- pg_upgrade sets datminmxid =
old_cluster.controldata.chkpnt_nxtmulti, which is correct only if
there are ZERO multixacts in use at the time of the upgrade.  It would
be best, I think, to set this to the same value it had in the old
cluster, but if we're going to use a blanket value, I think it needs
to be chkpnt_oldstMulti.

- There's a third possible problem related to boundary cases in
SlruScanDirCbRemoveMembers, but I don't understand that one well
enough to explain it.  Maybe Thomas can jump in here and explain the
concern.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-28 Thread Robert Haas
On Thu, May 28, 2015 at 8:03 AM, Robert Haas  wrote:
>> Steve, is there any chance we can get your pg_controldata output and a
>> list of all the files in pg_clog?
>
> Err, make that pg_multixact/members, which I assume is at issue here.
> You didn't show us the DETAIL line from this message, which would
> presumably clarify:
>
> FATAL:  could not access status of transaction 1

And I'm still wrong, probably.  The new code in 9.4.2 cares about
being able to look at an *offsets* file to find the corresponding
member offset.  So most likely it is an offsets file that is missing
here.  The question is, how are we ending up with an offsets file that
is referenced by the control file but not actually present on disk?

It seems like it would be good to compare the pg_controldata output to
what is actually present in pg_multixact/offsets (hopefully that's the
right directory, now that I'm on my third try) and try to understand
what is going on here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-28 Thread Robert Haas
On Thu, May 28, 2015 at 8:01 AM, Robert Haas  wrote:
> On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera
>  wrote:
>> Steve Kehlet wrote:
>>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
>>> just dropped new binaries in place) but it wouldn't start up. I found this
>>> in the logs:
>>>
>>> waiting for server to start2015-05-27 13:13:00 PDT [27341]: [1-1] LOG:
>>>  database system was shut down at 2015-05-27 13:12:55 PDT
>>> 2015-05-27 13:13:00 PDT [27342]: [1-1] FATAL:  the database system is
>>> starting up
>>> .2015-05-27 13:13:00 PDT [27341]: [2-1] FATAL:  could not access status of
>>> transaction 1
>>
>> I am debugging today a problem currently that looks very similar to
>> this.  AFAICT the problem is that WAL replay of an online checkpoint in
>> which multixact files are removed fails because replay tries to read a
>> file that has already been removed.
>
> Wait a minute, wait a minute.  There's a serious problem with this
> theory, at least in Steve's scenario.  This message:
>
> 2015-05-27 13:13:00 PDT [27341]: [1-1] LOG: database system was shut
> down at 2015-05-27
>
> That message implies a *clean shutdown*.  If he had performed an
> immediate shutdown or just pulled the plug, it would have said
> "database system was interrupted" or some such.
>
> There may be bugs in redo, also, but they don't explain what happened to 
> Steve.
>
> Steve, is there any chance we can get your pg_controldata output and a
> list of all the files in pg_clog?

Err, make that pg_multixact/members, which I assume is at issue here.
You didn't show us the DETAIL line from this message, which would
presumably clarify:

FATAL:  could not access status of transaction 1

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-28 Thread Robert Haas
On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera
 wrote:
> Steve Kehlet wrote:
>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
>> just dropped new binaries in place) but it wouldn't start up. I found this
>> in the logs:
>>
>> waiting for server to start2015-05-27 13:13:00 PDT [27341]: [1-1] LOG:
>>  database system was shut down at 2015-05-27 13:12:55 PDT
>> 2015-05-27 13:13:00 PDT [27342]: [1-1] FATAL:  the database system is
>> starting up
>> .2015-05-27 13:13:00 PDT [27341]: [2-1] FATAL:  could not access status of
>> transaction 1
>
> I am debugging today a problem currently that looks very similar to
> this.  AFAICT the problem is that WAL replay of an online checkpoint in
> which multixact files are removed fails because replay tries to read a
> file that has already been removed.

Wait a minute, wait a minute.  There's a serious problem with this
theory, at least in Steve's scenario.  This message:

2015-05-27 13:13:00 PDT [27341]: [1-1] LOG: database system was shut
down at 2015-05-27

That message implies a *clean shutdown*.  If he had performed an
immediate shutdown or just pulled the plug, it would have said
"database system was interrupted" or some such.

There may be bugs in redo, also, but they don't explain what happened to Steve.

Steve, is there any chance we can get your pg_controldata output and a
list of all the files in pg_clog?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-28 Thread Robert Haas
On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera
 wrote:
> Steve Kehlet wrote:
>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
>> just dropped new binaries in place) but it wouldn't start up. I found this
>> in the logs:
>>
>> waiting for server to start2015-05-27 13:13:00 PDT [27341]: [1-1] LOG:
>>  database system was shut down at 2015-05-27 13:12:55 PDT
>> 2015-05-27 13:13:00 PDT [27342]: [1-1] FATAL:  the database system is
>> starting up
>> .2015-05-27 13:13:00 PDT [27341]: [2-1] FATAL:  could not access status of
>> transaction 1
>
> I am debugging today a problem currently that looks very similar to
> this.  AFAICT the problem is that WAL replay of an online checkpoint in
> which multixact files are removed fails because replay tries to read a
> file that has already been removed.

Steve: Can you tell us more about how you shut down the old cluster?
Did you by any chance perform an immediate shutdown?  Do you have the
actual log messages that were written when the system was shut down
for the upgrade?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-27 Thread Robert Haas
On Wed, May 27, 2015 at 10:14 PM, Alvaro Herrera
 wrote:
> Well I'm not very clear on what's the problematic case.  The scenario I
> actually saw this first reported was a pg_basebackup taken on a very
> large database, so the master could have truncated multixact and the
> standby receives a truncated directory but actually tries to apply a
> checkpoint that is much older than what the master currently has
> transmitted as pg_multixact contents.

OK, that makes sense.

>> That might be an OK fix, but this implementation doesn't seem very
>> clean.  If we're going to remove the invariant that
>> MultiXactState->oldestOffset will always be valid after replaying a
>> checkpoint, then we should be explicit about that and add a flag
>> indicating whether or not it's currently valid.  Shoving nextOffset in
>> there and hoping that's good enough seems like a bad idea to me.
>>
>> I think we should modify the API for find_multixact_start.  Let's have
>> it return a Boolean and return oldestOffset via an out parameter.  If
>> !InRecovery, it will always return true and set the out parameter; but
>> if in recovery, it is allowed to return false without setting the out
>> parameter.  Both values can get stored in MultiXactState, and we can
>> adjust the logic elsewhere to disregard oldestOffset when the
>> accompanying flag is false.
>
> Sounds good.  I think I prefer that multixact creation is rejected
> altogether if the new flag is false.  Is that what you mean when you say
> "adjust the logic"?

No.  I'm not sure quite what you mean here.  We can't reject multixact
creation during normal running, and during recovery, we won't create
any really new mulitxacts, but we must replay the creation of
multixacts.  What I meant was stuff like this:

if (!MultiXactIdPrecedes(result, MultiXactState->multiVacLimit) ||
(MultiXactState->nextOffset - MultiXactState->oldestOffset
> MULTIXACT_MEMBER_SAFE_THRESHOLD))

I meant that we'd change the second prong of the test to check
MultiXactState->oldestOffsetValid && MultiXactState->nextOffset -
MultiXactState->oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD.  And
likewise change anything else that relies on oldestOffset.  Or else we
guarantee that we can't reach those points until the oldestOffset is
valid, and then check that it is with an Assert() or elog().
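
Spelled out, the adjusted test might look something like this (a sketch
only; "oldestOffsetValid" just stands in for whatever we end up calling
the flag):

if (!MultiXactIdPrecedes(result, MultiXactState->multiVacLimit) ||
    (MultiXactState->oldestOffsetValid &&
     MultiXactState->nextOffset - MultiXactState->oldestOffset
         > MULTIXACT_MEMBER_SAFE_THRESHOLD))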

>> This still leaves open an ugly possibility: can we reach normal
>> running without a valid oldestOffset?  If so, until the next
>> checkpoint happens, autovacuum has no clue whether it needs to worry.
>> There's got to be a fix for that, but it escapes me at the moment.
>
> I think the fix to that issue is to set the oldest offset on
> TrimMultiXact.  That way, once WAL replay finished we're certain that we
> have a valid oldest offset to create new multixacts with.
>
> I'm also wondering whether the call to DetermineSafeOldestOffset on
> StartupMultiXact is good.  At that point, we haven't replayed any WAL
> yet, so the oldest multi might be pointing at a file that has already
> been removed -- again considering the pg_basebackup scenario where the
> multixact files are copied much later than pg_control, so the checkpoint
> to replay is old but the pg_multixact contents have already been
> truncated in the master and are copied truncated.

Moving the call from StartupMultiXact() to TrimMultiXact() seems like
a good idea.  I'm not sure why we didn't do that before.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-05-27 Thread Robert Haas
On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera
 wrote:
> Steve Kehlet wrote:
>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
>> just dropped new binaries in place) but it wouldn't start up. I found this
>> in the logs:
>>
>> waiting for server to start2015-05-27 13:13:00 PDT [27341]: [1-1] LOG:
>>  database system was shut down at 2015-05-27 13:12:55 PDT
>> 2015-05-27 13:13:00 PDT [27342]: [1-1] FATAL:  the database system is
>> starting up
>> .2015-05-27 13:13:00 PDT [27341]: [2-1] FATAL:  could not access status of
>> transaction 1
>
> I am debugging today a problem currently that looks very similar to
> this.  AFAICT the problem is that WAL replay of an online checkpoint in
> which multixact files are removed fails because replay tries to read a
> file that has already been removed.

Hmm, so what exactly is the sequence of events here?  It's possible
that I'm not thinking clearly just now, but it seems to me that if
we're replaying the same checkpoint we replayed previously, the offset
of the oldest multixact will be the first file that we didn't remove.
However, I can see that there could be a problem if we try to replay
an older checkpoint after having already replayed a new one - for
example, if a standby replays checkpoint A truncating the members
multixact and performs a restart point, and then replays checkpoint B
truncating the members multixact again but without performing a
restartpoint, and then is shut down, it will resume replay from
checkpoint A, and trouble will ensue.  Is that the scenario, or is
there something else?

> I think the fix to this is to verify whether the file exists on disk
> before reading it; if it doesn't, assume the truncation has already
> happened and that it's not necessary to remove it.

That might be an OK fix, but this implementation doesn't seem very
clean.  If we're going to remove the invariant that
MultiXactState->oldestOffset will always be valid after replaying a
checkpoint, then we should be explicit about that and add a flag
indicating whether or not it's currently valid.  Shoving nextOffset in
there and hoping that's good enough seems like a bad idea to me.

I think we should modify the API for find_multixact_start.  Let's have
it return a Boolean and return oldestOffset via an out parameter.  If
!InRecovery, it will always return true and set the out parameter; but
if in recovery, it is allowed to return false without setting the out
parameter.  Both values can get stored in MultiXactState, and we can
adjust the logic elsewhere to disregard oldestOffset when the
accompanying flag is false.
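
In rough outline, the shape I have in mind is something like this (a
sketch, not a finished patch; the caller-side names are placeholders):

static bool
find_multixact_start(MultiXactId multi, MultiXactOffset *result)
{
    /*
     * Same lookup as today, except: if the needed offsets page is not
     * on disk (which is legitimate while in recovery), return false
     * without setting *result; otherwise fill *result and return true.
     */
}

/* caller side, e.g. in SetMultiXactIdLimit(): */
MultiXactState->oldestOffsetValid =
    find_multixact_start(oldestMXact, &MultiXactState->oldestOffset);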

This still leaves open an ugly possibility: can we reach normal
running without a valid oldestOffset?  If so, until the next
checkpoint happens, autovacuum has no clue whether it needs to worry.
There's got to be a fix for that, but it escapes me at the moment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[GENERAL] Setting up replication

2015-03-17 Thread Robert Fitzpatrick
I have a question about setting up replication between my 
postgresql-9.3.6 servers. If I'm using pg_basebackup on my FreeBSD 10.1 
slave server, the postgresql.conf file is in the data directory, which 
pg_basebackup insists must be empty. I can't find any info about how to 
relocate the postgresql.conf file and tell the init script its new 
location. If I set up PITR prior to replication setup and share the same 
postgresql.conf file transferred by pg_basebackup, how can both servers 
see the archive directory? It is local for the slave, nfs mount for the 
master.


Obviously there is something I'm missing or not understanding, can 
anyone help? Thanks!


--
Robert





Re: [GENERAL] How do I calculate the sum of a field filtered by multiple windows defined by another field?

2015-03-15 Thread Robert James
Version 9.2.4

On 3/15/15, David G. Johnston  wrote:
> On Sunday, March 15, 2015, Robert James  wrote:
>
>> How do I calculate the sum of a field filtered by multiple windows
>> defined by another field?
>>
>> I have table event with fields event_date, num_events, site_id. I can
>> easily use aggregate SQL to do SELECT SUM(num_events) GROUP BY
>> site_id.
>>
>> But I also have another table site with fields site_id, target_date.
>> I'd like to do a JOIN, showing the SUM of num_events within 60 days of
>> the target_date, 90 days, 120 days, etc. I thought this could easily
>> be done using a WHERE clause in the aggregate SQL. However, this is
>> complicated by two challenges:
>>
>> 1. The target_date is not fixed, but varies for each site_id
>>
>> 2. I'd like multiple date ranges to be outputed in the same table; so
>> I can't do a simple WHERE to exclude records falling outside the range
>> from the event table
>>
>> One workaround I've thought of is to simply make several queries, one
>> for each date range, using a different WHERE clause for each, and then
>> use a view to paste them together. Is there a simpler, better, or more
>> elegant way to achieve my goals?
>>
>>
> I suggest you create and post a sample query, ideally using WITH & VALUES
> for sample data, that gets you the answer using UNION.  You will then have
> something to compare against and others can know exactly what you want
> instead of trying to figure it out from your limited description.
>
> What version does the solution need to work for?
>
> David J.
>




[GENERAL] Preserving formatting and comments in a VIEW definition

2015-03-15 Thread Robert James
When I save a VIEW, Postgres seems to convert it to a different
format, functionally equivalent but unrecognizable (whitespace,
comments, adds lots of casts, etc.)

Is there any simple way to preserve my original code?




[GENERAL] How do I calculate the sum of a field filtered by multiple windows defined by another field?

2015-03-15 Thread Robert James
How do I calculate the sum of a field filtered by multiple windows
defined by another field?

I have table event with fields event_date, num_events, site_id. I can
easily use aggregate SQL to do SELECT SUM(num_events) GROUP BY
site_id.

But I also have another table site with fields site_id, target_date.
I'd like to do a JOIN, showing the SUM of num_events within 60 days of
the target_date, 90 days, 120 days, etc. I thought this could easily
be done using a WHERE clause in the aggregate SQL. However, this is
complicated by two challenges:

1. The target_date is not fixed, but varies for each site_id

2. I'd like multiple date ranges to be output in the same table; so
I can't do a simple WHERE to exclude records falling outside the range
from the event table

One workaround I've thought of is to simply make several queries, one
for each date range, using a different WHERE clause for each, and then
use a view to paste them together. Is there a simpler, better, or more
elegant way to achieve my goals?
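
[Editorial note: a minimal single-query sketch, assuming the column names above
and that "within N days" means on or before target_date + N (adjust the
predicate for a symmetric window). Each range becomes a conditional aggregate,
so the event table is scanned once; CASE is used rather than FILTER so it also
works on 9.2:

-- One pass over event: each date range is a conditional aggregate per site.
SELECT s.site_id,
       SUM(CASE WHEN e.event_date <= s.target_date + 60  THEN e.num_events ELSE 0 END) AS events_60,
       SUM(CASE WHEN e.event_date <= s.target_date + 90  THEN e.num_events ELSE 0 END) AS events_90,
       SUM(CASE WHEN e.event_date <= s.target_date + 120 THEN e.num_events ELSE 0 END) AS events_120
  FROM site s
  JOIN event e ON e.site_id = s.site_id
 GROUP BY s.site_id;
]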




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-13 Thread Robert Inder
On 12 March 2015 at 12:31, Thomas Kellerer  wrote:

> 8.4 cannot run queries on the standby, you need to upgrade to a 
> supported/maintained version for this
> (this feature was introduced in 9.0)
>
> In 9.x you can start the slave as a "hot standby" to allow read only queries 
> which is what pg_dump needs.
>
> You should really upgrade to a current version 9.4 or 9.3

Thought you were going to say that:-(

Well, I guess we have to do it some time, and now there is a reason
for it to happen sooner rather than later...

But even if (OK, "though") I'm doing that, Steven's suggestion of
making the dump to a RAM file system, then filing it as a separate
step, looks simple enough to be worth trying as a stop-gap...

Robert.

-- 
Robert Inder,
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-12 Thread Robert Inder
Hi, Francisco,

On 11 March 2015 at 17:32, Francisco Olarte  wrote:

> This is, build an streaming replication slave, pg_dump from the slave. If
> needed, restore in the master.

I really like the idea of running pg_dump on the slave, but I don't understand
how I could do it.

Postgres on our live machine is regularly pushing WAL files to the standby.

Postgres on the standby machine is continually reading those files.
But that is all it will do. "pg_dump" just says "The database is starting up".

Could/should I have something configured differently?

Or Is this something that has changed with Postgres 9?
We're currently running Postgres 8.4.
Is this my specific reason to embark on an upgrade?

Robert.

-- 
Robert Inder,
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words




Re: [GENERAL] Basic Question on Point In Time Recovery

2015-03-12 Thread Robert Inder
Thanks for your comments (so far:-)

I guess I'm pleased that nobody has said that I'm doing something stupid!

I'll certainly look at Slony and Barman.

And Stephane's suggestion of doing regular basebackups and keeping the
WAL files seems neat.  If I understand it, we'd use the/a standby
server to "replay" the entire installation up to the point when the
problem occurs, and then use pg_dump to dump just the database we
need.

I'm wondering just how the size of a day's worth of WAL files would
compare to a whole slew of hourly dumps.

The other issue would be how long the "replay" would take.  But, I
realise, that's not a major concern: the delay would only be seen by a
client that
had had a major problem.  Everyone else would see service as normal.

I think I'll be doing some experiments to find out:-)

Robert.

-- 
Robert Inder,
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words




[GENERAL] Basic Question on Point In Time Recovery

2015-03-11 Thread Robert Inder
We are developing a new software system which is now used by a number
of independent clients for gathering and storing live data as part of
their day to day work.

We have a number of clients sharing a single server.  It is running
one Postgres service, and each client is a separate user with access
to their own database.  Each client's database will contain "hundreds
of thousands" of records, and will be supporting occasional queries by
a small number of users.   So the system is currently running on
"modest" hardware.

To guard against the server failing, we have a standby server being
updated by WAL files, so if the worst comes to the worst we'll only
lose "a few minutes" work.  No problems there.

But, at least while the system is under rapid development, we also
want to have a way to roll a particular client's database back to a
(recent) "known good" state, but without affecting any other client.

My understanding is that the WAL files mechanism is installation-wide
-- it will affect all clients alike.

So to allow us to restore data for an individual client, we're running
"pg_dump" once an hour on each database in turn.  In the event of a
problem with one client's system, we can restore just that one
database, without affecting any other client.

The problem is that we're finding that as the number of clients grows,
and with it the amount of data, pg_dump is becoming more intrusive.
Our perception is that when pg_dump is running for any database,
performance on all databases is reduced.  I'm guessing this is because
the dump is making heavy use of the disk.

There is obviously scope for improving performance by getting using
more, or more powerful, hardware.  That's obviously going to be
necessary at some point, but it is obviously an expense that our
client would like to defer as long as possible.

So before we go down that route, I'd like to check that we're not
doing something dopey.

Is our current "frequent pg_dump" approach a sensible way to go about
things?  Or are we missing something?  Is there some other way to
restore one database without affecting the others?

Thanks in advance.

Robert.

-- 
Robert Inder,
Interactive Information Ltd,   3, Lauriston Gardens, Edinburgh EH3 9HH
Registered in Scotland, Company no. SC 150689
   Interactions speak louder than words




Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-19 Thread Robert DiFalco
I don't think an advisory lock would remove the deadlock.

On Sun, Jan 18, 2015 at 10:33 PM, Roxanne Reid-Bennett 
wrote:

> On 1/16/2015 2:41 AM, Jim Nasby wrote:
>
>> On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote:
>>
>>>
>>> try this:  (if you still get deadlocks, uncomment the advisory lock
>>> [thanks Daniel] and try again)
>>> Logically I suppose it might run faster to do the select, then insert
>>> "if".  I almost always write these as insert first - because it's the more
>>> restrictive lock.
>>>
>>> CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
>>> RETURNS
>>> INTEGER AS
>>> $BODY$
>>> DECLARE
>>>  v_id integer;
>>> BEGIN
>>> --perform pg_advisory_xact_lock(hashtext(hometown_name));
>>>BEGIN
>>>  insert into hometowns (name)
>>>  select hometown_name where not exists (select id from hometowns
>>> where name = hometown_name)
>>>  returning id into v_id;
>>>
>>
>> That has a race condition. The only safe way to do this (outside of SSI)
>> is using the example code at http://www.postgresql.org/
>> docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
>>
>
> And if the advisory lock is used?  That presumably creates an exclusive
> lock on the asset "hometown_name". [in most examples given "Portland,
> OR".]  Would not any other process that runs (this function) on the same
> asset have to wait for this specific transaction to commit or roll back -
> blocking the race condition?
>
> Roxanne
> (sorry, I was out of town)
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-19 Thread Robert DiFalco
Hometowns get selected, and possibly inserted, in unpredictable ways even
from multiple concurrent sessions. The only way I could figure out how to
solve it was to force each INSERT into hometowns to be in its own transaction.

On Mon, Jan 19, 2015 at 1:56 PM, Robert DiFalco 
wrote:

> I don't think an advisory lock would remove the deadlock.
>
> On Sun, Jan 18, 2015 at 10:33 PM, Roxanne Reid-Bennett 
> wrote:
>
>> On 1/16/2015 2:41 AM, Jim Nasby wrote:
>>
>>> On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote:
>>>
>>>>
>>>> try this:  (if you still get deadlocks, uncomment the advisory lock
>>>> [thanks Daniel] and try again)
>>>> Logically I suppose it might run faster to do the select, then insert
>>>> "if".  I almost always write these as insert first - because it's the more
>>>> restrictive lock.
>>>>
>>>> CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
>>>> RETURNS
>>>> INTEGER AS
>>>> $BODY$
>>>> DECLARE
>>>>  v_id integer;
>>>> BEGIN
>>>> --perform pg_advisory_xact_lock(hashtext(hometown_name));
>>>>BEGIN
>>>>  insert into hometowns (name)
>>>>  select hometown_name where not exists (select id from
>>>> hometowns where name = hometown_name)
>>>>  returning id into v_id;
>>>>
>>>
>>> That has a race condition. The only safe way to do this (outside of SSI)
>>> is using the example code at http://www.postgresql.org/
>>> docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
>>>
>>
>> And if the advisory lock is used?  That presumably creates an exclusive
>> lock on the asset "hometown_name". [in most examples given "Portland,
>> OR".]  Would not any other process that runs (this function) on the same
>> asset have to wait for this specific transaction to commit or roll back -
>> blocking the race condition?
>>
>> Roxanne
>> (sorry, I was out of town)
>>
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>


Re: [GENERAL] asynchronous commit

2015-01-19 Thread Robert DiFalco
Andreas, I think UNLOGGED would be something different, but I'm not totally
clear. However, it seems to me that an unlogged table would simply
disappear (be truncated) after a server crash. That means instead of maybe
losing a record or two, I could lose a ton of records. But maybe my
understanding is off.
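
[Editorial note: that understanding matches the documentation: an unlogged table
skips WAL entirely, so it is fast but is emptied during crash recovery and is
not replicated to standbys, whereas synchronous_commit = off only risks the
last few transactions. A minimal sketch with a hypothetical logging table:

-- Unlogged: no WAL, cheap inserts, but the table is truncated after a crash.
CREATE UNLOGGED TABLE activity_log (
    user_id BIGINT,
    event   TEXT,
    at      TIMESTAMPTZ DEFAULT now()
);
]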

On Mon, Jan 19, 2015 at 11:10 AM, Andreas Kretschmer <
akretsch...@spamfence.net> wrote:

> Robert DiFalco  wrote:
>
> > I have several tables that I use for logging and real-time stats. These
> are not
> > critical and since they are a bottleneck I want transactions against
> them to
> > always be asynchronous. Is there a way to specify this at a table level
> or do I
> > have to make sure to call set synchronous_commit='off'  every time I
> insert or
> > update to them? And presumably remember to turn it back on again for
> safety.
>
> I think, you can use unlogged tables instead.
>
>
> Andreas
> --
> Really, I'm not out to destroy Microsoft. That will just be a completely
> unintentional side effect.  (Linus Torvalds)
> "If I was god, I would recompile penguin with --enable-fly."   (unknown)
> Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] asynchronous commit

2015-01-19 Thread Robert DiFalco
I have several tables that I use for logging and real-time stats. These are
not critical and since they are a bottleneck I want transactions against
them to always be asynchronous. Is there a way to specify this at a table
level, or do I have to make sure to call SET synchronous_commit = 'off' every
time I insert or update them? And presumably remember to turn it back on
again for safety.
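
[Editorial note: synchronous_commit has no per-table setting, but it can be
scoped to a transaction with SET LOCAL (which reverts automatically at commit
or rollback), or attached to a role or function used only for these writes.
A minimal sketch; the table and role names here are hypothetical:

BEGIN;
SET LOCAL synchronous_commit = off;   -- applies only to this transaction
INSERT INTO activity_log (user_id, event) VALUES (42, 'login');
COMMIT;

-- Or pin it to the role that only ever writes the logging/stats rows:
ALTER ROLE stats_writer SET synchronous_commit = off;
]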


Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-16 Thread Robert DiFalco
Is there a way to force a new private transaction in a FUNCTION? That seems
like it would be a good solution here because I could simply do the insert
in the RAISE within its own private transaction. Then on the next iteration
of the loop (as long as I don't have repeatable reads) it should be picked
up. And there should only be a quick recoverable deadlock.
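
[Editorial note: plpgsql has no autonomous transactions, so a function cannot
open a private one directly; a common workaround is to push the statement
through dblink, which runs in its own session and commits independently of the
calling transaction. A minimal sketch, assuming the dblink extension is
installed and a local loopback connection is permitted:

CREATE EXTENSION IF NOT EXISTS dblink;

-- The INSERT commits in the dblink session even if the caller later rolls back.
SELECT dblink_exec('dbname=' || current_database(),
                   format('INSERT INTO hometowns (name) VALUES (%L)',
                          'Portland, OR'));
]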

On Fri, Jan 16, 2015 at 7:49 AM, Daniel Verite 
wrote:

>     Robert DiFalco wrote:
>
> > I must be doing something wrong because both of these approaches are
> giving
> > me deadlock exceptions.
>
> Deadlocks are to be expected if the INSERTs are batched within a single
> transaction and there are several sessions doing this in parallel.
>
> Given that there's an unique constraint on hometowns(name), if this
> sequence
> happens (not even considering the "users" table to simplify):
>
> Session #1:  begin;
> Session #2:  begin;
> Session #1:  insert into hometowns(name) values('City1');
> Session #2:  insert into hometowns(name) values('City2');
> Session #1:  insert into hometowns(name) values('City2');
>   => Session #1 is put to wait until Session #2 commits or rollbacks
> Session #2:  insert into hometowns(name) values('City1');
>   => Session #2 should wait for Session #1 which is already waiting for
> Session #2: that's a deadlock
>
>
> It does not mean that the code meant to insert one user and the town
> without
> race condition is incorrect by itself. The unique_violation handler is not
> called in this scenario because the candidate row is not yet committed by
> the
> other session. This would work in an OLTP scenario when each "user" is
> commited after processing it.
>
> Avoiding deadlocks between parallel batches is a different problem than
> avoiding race conditions. If you have the above issue, I don't think you
> may
> solve it by tweaking the code of an individual process. It needs to be
> rethought at the level above, the one that initiates this in parallel and
> dictates the commit strategy.
>
> Best regards,
> --
> Daniel
> PostgreSQL-powered mail user agent and storage:
> http://www.manitou-mail.org
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-16 Thread Robert DiFalco
The code shown in the Doc (I think) will still give you deadlock in the
case where you have two sessions concurrently trying to insert the same
'hometown'. For example:

  INSERT INTO users VALUES('Tom', select_hometown_id('Portland, OR'));
  INSERT INTO users VALUES('Waits', select_hometown_id('Portland, OR'));

The LOOP removes the race condition BUT with my limited knowledge I can
only think of two ways to prevent the deadlock. Some kind of lock (not sure
which) or force a separate transaction for the hometowns INSERT (also not
sure how to do this in a FUNCTION). Is there an accepted approach here?
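
[Editorial note: one commonly used pre-9.5 pattern is Roxanne's function with
the advisory lock left uncommented: pg_advisory_xact_lock serializes concurrent
callers on the same name, which removes the duplicate-key race on a single
hometown; the cross-name deadlock Daniel describes for batched transactions is
a separate ordering problem and still calls for short, per-user commits.
A minimal sketch, using the table and column names from this thread:

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
RETURNS INTEGER AS $$
DECLARE
  v_id INTEGER;
BEGIN
  -- Held until the surrounding transaction ends; hashtext() collisions only
  -- cause harmless extra waiting.
  PERFORM pg_advisory_xact_lock(hashtext(hometown_name));

  SELECT id INTO v_id FROM hometowns WHERE name = hometown_name;
  IF v_id IS NULL THEN
    INSERT INTO hometowns (name) VALUES (hometown_name) RETURNING id INTO v_id;
  END IF;
  RETURN v_id;
END;
$$ LANGUAGE plpgsql;
]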



On Thu, Jan 15, 2015 at 11:41 PM, Jim Nasby 
wrote:

> On 1/15/15 10:57 PM, Roxanne Reid-Bennett wrote:
>
>>
>> try this:  (if you still get deadlocks, uncomment the advisory lock
>> [thanks Daniel] and try again)
>> Logically I suppose it might run faster to do the select, then insert
>> "if".  I almost always write these as insert first - because it's the more
>> restrictive lock.
>>
>> CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
>> RETURNS
>> INTEGER AS
>> $BODY$
>> DECLARE
>>  v_id integer;
>> BEGIN
>> --perform pg_advisory_xact_lock(hashtext(hometown_name));
>>BEGIN
>>  insert into hometowns (name)
>>  select hometown_name where not exists (select id from hometowns
>> where name = hometown_name)
>>  returning id into v_id;
>>
>
> That has a race condition. The only safe way to do this (outside of SSI)
> is using the example code at http://www.postgresql.org/
> docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-15 Thread Robert DiFalco
FWIW I was using the select_hometown_id FUNCTION like this:

INSERT INTO users(...) values(..., select_hometown_id('Portland, OR'));



On Tue, Jan 13, 2015 at 2:21 PM, Brian Dunavant  wrote:

> The loop to run it twice handles that yes.  I don't think that buys
> you anything over a more traditional non-cte method though.  I'd run
> them a few thousand times to see if there's any difference in runtimes
> but my guess is the CTE version would be slightly slower here.
>
> v_id integer;
> BEGIN;
>   select id into v_id from hometowns where name = hometown_name;
>   BEGIN
> insert into hometowns (name)
> select hometown_name where v_id is null
> returning id into v_id;
>   EXCEPTION WHEN unique_violation
>   THEN
>  select id into v_id from hometowns where name = hometown_name;
>   END;
>   insert into users (name, hometown_id)
>   values ('Robert', v_id);
> END;
>
> On Tue, Jan 13, 2015 at 5:06 PM, Robert DiFalco
>  wrote:
> > This seems to get rid of the INSERT race condition.
> >
> > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
> RETURNS
> > INTEGER AS $
> > DECLARE hometown_id INTEGER;
> > BEGIN
> >   LOOP
> > BEGIN
> >   WITH sel AS (
> >   SELECT id FROM hometowns WHERE name = hometown_name
> >   ), ins AS (
> > INSERT INTO hometowns (name)
> >   SELECT hometown_name
> >   WHERE NOT EXISTS(SELECT 1 FROM sel)
> > RETURNING id
> >   )
> >   SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
> >   RETURN hometown_id;
> >
> > EXCEPTION WHEN unique_violation
> >   THEN
> > END;
> >   END LOOP;
> > END;
> > $ LANGUAGE plpgsql;
> >
> >
> > On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant 
> wrote:
> >>
> >> With the single CTE I don't believe you can do a full upsert loop.  If
> >> you're doing this inside of a postgres function, your changes are
> >> already atomic, so I don't believe by switching you are buying
> >> yourself much (if anything) by using a CTE query instead of something
> >> more traditional here.
> >>
> >> The advantages of switching to a CTE would be if this code was all
> >> being done inside of the app code with multiple queries.
> >>
> >> On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
> >>  wrote:
> >> > Well, traditionally I would create a LOOP where I tried the SELECT, if
> >> > there
> >> > was nothing I did the INSERT, if that raised an exception I would
> repeat
> >> > the
> >> > LOOP.
> >> >
> >> > What's the best way to do it with the CTE? Currently I have the
> >> > following
> >> > which gives me Duplicate Key Exceptions when two sessions try to
> insert
> >> > the
> >> > same record at the same time.
> >> >
> >> > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
> >> > RETURNS
> >> > INTEGER AS $
> >> > DECLARE hometown_id INTEGER;
> >> > BEGIN
> >> >   WITH sel AS (
> >> >   SELECT id FROM hometowns WHERE name = hometown_name
> >> >   ), ins AS (
> >> > INSERT INTO hometowns (name)
> >> >   SELECT hometown_name
> >> >   WHERE NOT EXISTS(SELECT 1 FROM sel)
> >> > RETURNING id
> >> >   )
> >> >   SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM
> sel;
> >> >   RETURN hometown_id;
> >> > END;
> >> > $ LANGUAGE plpgsql;
> >> >
> >> > And that is no bueno. Should I just put the whole thing in a LOOP?
> >
> >
>


Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-15 Thread Robert DiFalco
I must be doing something wrong because both of these approaches are giving
me deadlock exceptions.

On Tue, Jan 13, 2015 at 2:21 PM, Brian Dunavant  wrote:

> The loop to run it twice handles that yes.  I don't think that buys
> you anything over a more traditional non-cte method though.  I'd run
> them a few thousand times to see if there's any difference in runtimes
> but my guess is the CTE version would be slightly slower here.
>
> v_id integer;
> BEGIN;
>   select id into v_id from hometowns where name = hometown_name;
>   BEGIN
> insert into hometowns (name)
> select hometown_name where v_id is null
> returning id into v_id;
>   EXCEPTION WHEN unique_violation
>   THEN
>  select id into v_id from hometowns where name = hometown_name;
>   END;
>   insert into users (name, hometown_id)
>   values ('Robert', v_id);
> END;
>
> On Tue, Jan 13, 2015 at 5:06 PM, Robert DiFalco
>  wrote:
> > This seems to get rid of the INSERT race condition.
> >
> > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
> RETURNS
> > INTEGER AS $
> > DECLARE hometown_id INTEGER;
> > BEGIN
> >   LOOP
> > BEGIN
> >   WITH sel AS (
> >   SELECT id FROM hometowns WHERE name = hometown_name
> >   ), ins AS (
> > INSERT INTO hometowns (name)
> >   SELECT hometown_name
> >   WHERE NOT EXISTS(SELECT 1 FROM sel)
> > RETURNING id
> >   )
> >   SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
> >   RETURN hometown_id;
> >
> > EXCEPTION WHEN unique_violation
> >   THEN
> > END;
> >   END LOOP;
> > END;
> > $ LANGUAGE plpgsql;
> >
> >
> > On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant 
> wrote:
> >>
> >> With the single CTE I don't believe you can do a full upsert loop.  If
> >> you're doing this inside of a postgres function, your changes are
> >> already atomic, so I don't believe by switching you are buying
> >> yourself much (if anything) by using a CTE query instead of something
> >> more traditional here.
> >>
> >> The advantages of switching to a CTE would be if this code was all
> >> being done inside of the app code with multiple queries.
> >>
> >> On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
> >>  wrote:
> >> > Well, traditionally I would create a LOOP where I tried the SELECT, if
> >> > there
> >> > was nothing I did the INSERT, if that raised an exception I would
> repeat
> >> > the
> >> > LOOP.
> >> >
> >> > What's the best way to do it with the CTE? Currently I have the
> >> > following
> >> > which gives me Duplicate Key Exceptions when two sessions try to
> insert
> >> > the
> >> > same record at the same time.
> >> >
> >> > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
> >> > RETURNS
> >> > INTEGER AS $
> >> > DECLARE hometown_id INTEGER;
> >> > BEGIN
> >> >   WITH sel AS (
> >> >   SELECT id FROM hometowns WHERE name = hometown_name
> >> >   ), ins AS (
> >> > INSERT INTO hometowns (name)
> >> >   SELECT hometown_name
> >> >   WHERE NOT EXISTS(SELECT 1 FROM sel)
> >> > RETURNING id
> >> >   )
> >> >   SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM
> sel;
> >> >   RETURN hometown_id;
> >> > END;
> >> > $ LANGUAGE plpgsql;
> >> >
> >> > And that is no bueno. Should I just put the whole thing in a LOOP?
> >
> >
>


Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-13 Thread Robert DiFalco
Good points. I guess my feeling is that if there can be a race condition on
INSERT then the CTE version is not truly atomic, hence the LOOP.

On Tue, Jan 13, 2015 at 3:11 PM, Brian Dunavant  wrote:

> A very good point, but it does not apply as here (and in my article)
> we are not using updates, only insert and select.
>
>
>
> On Tue, Jan 13, 2015 at 6:03 PM, Thomas Kellerer 
> wrote:
> > Brian Dunavant wrote on 13.01.2015 22:33:
> >>
> >> What issue are you having?  I'd imagine you have a race condition on
> >> the insert into hometowns, but you'd have that same race condition in
> >> your app code using a more traditional 3 query version as well.
> >>
> >> I often use CTEs like this to make things atomic.  It allows me to
> >> remove transactional code out of the app and also to increase
> >> performance by reducing the back-and-forth to the db.
> >> http://omniti.com/seeds/writable-ctes-improve-performance
> >>
> >
> > Craig Ringer explained some of the pitfalls of this approach here:
> >
> >
> http://dba.stackexchange.com/questions/78510/why-is-cte-open-to-lost-updates
> >
> > which is a follow up question based on this:
> > http://stackoverflow.com/a/8702291/330315
> >
> > Thomas
> >
> >
> >
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-13 Thread Robert DiFalco
This seems to get rid of the INSERT race condition.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
RETURNS INTEGER AS $$
DECLARE hometown_id INTEGER;
BEGIN
  LOOP
BEGIN
  WITH sel AS (
  SELECT id FROM hometowns WHERE name = hometown_name
  ), ins AS (
INSERT INTO hometowns (name)
  SELECT hometown_name
  WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
  )
  SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
  RETURN hometown_id;

EXCEPTION WHEN unique_violation
  THEN
END;
  END LOOP;
END;
$$ LANGUAGE plpgsql;


On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant  wrote:

> With the single CTE I don't believe you can do a full upsert loop.  If
> you're doing this inside of a postgres function, your changes are
> already atomic, so I don't believe by switching you are buying
> yourself much (if anything) by using a CTE query instead of something
> more traditional here.
>
> The advantages of switching to a CTE would be if this code was all
> being done inside of the app code with multiple queries.
>
> On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
>  wrote:
> > Well, traditionally I would create a LOOP where I tried the SELECT, if
> there
> > was nothing I did the INSERT, if that raised an exception I would repeat
> the
> > LOOP.
> >
> > What's the best way to do it with the CTE? Currently I have the following
> > which gives me Duplicate Key Exceptions when two sessions try to insert
> the
> > same record at the same time.
> >
> > CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
> RETURNS
> > INTEGER AS $
> > DECLARE hometown_id INTEGER;
> > BEGIN
> >   WITH sel AS (
> >   SELECT id FROM hometowns WHERE name = hometown_name
> >   ), ins AS (
> > INSERT INTO hometowns (name)
> >   SELECT hometown_name
> >   WHERE NOT EXISTS(SELECT 1 FROM sel)
> > RETURNING id
> >   )
> >   SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
> >   RETURN hometown_id;
> > END;
> > $ LANGUAGE plpgsql;
> >
> > And that is no bueno. Should I just put the whole thing in a LOOP?
>


Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-13 Thread Robert DiFalco
Well, traditionally I would create a LOOP where I tried the SELECT, if
there was nothing I did the INSERT, if that raised an exception I would
repeat the LOOP.

What's the best way to do it with the CTE? Currently I have the following
which gives me Duplicate Key Exceptions when two sessions try to insert the
same record at the same time.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
RETURNS INTEGER AS $$
DECLARE hometown_id INTEGER;
BEGIN
  WITH sel AS (
  SELECT id FROM hometowns WHERE name = hometown_name
  ), ins AS (
INSERT INTO hometowns (name)
  SELECT hometown_name
  WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
  )
  SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
  RETURN hometown_id;
END;
$$ LANGUAGE plpgsql;

And that is no bueno. Should I just put the whole thing in a LOOP?
​


Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-13 Thread Robert DiFalco
This CTE approach doesn't appear to play well with multiple concurrent
transactions/connections.

On Tue, Jan 13, 2015 at 10:05 AM, John McKown 
wrote:

> On Tue, Jan 13, 2015 at 11:45 AM, Robert DiFalco  > wrote:
>
>> Thanks John. I've been seeing a lot of examples like this lately. Does
>> the following approach have any advantages over traditional approaches?
>> ​​
>>
>> WITH sel AS (
>> SELECT id FROM hometowns WHERE name = 'Portland'
>> ), ins AS (
>>   INSERT INTO hometowns(name)
>> SELECT 'Portland'
>> WHERE NOT EXISTS (SELECT 1 FROM sel)
>>   RETURNING id
>> )
>> INSERT INTO users(name, hometown_id)
>> VALUES ('Robert', SELECT id FROM ins UNION ALL SELECT id FROM sel);
>>
>>
>>
> ​Oh, that is very clever. I've not see such a thing before. Thanks.​
>
> ​I've added it to my stable of "tricks". Which aren't really tricks, just
> really nice new methods to do something.
>
> ​The main advantage that I can see is that it is a single SQL statement to
> send to the server. That makes it "self contained" so that it would be more
> difficult for someone to accidentally mess it up. On the other hand, CTEs
> are still a bit new (at least to me) and so the "why it works" might not be
> very obvious to other programmers who might need to maintain the
> application.​ To many this "lack of obviousness" is a detriment. To me, it
> means "update your knowledge". But then, I am sometimes a arrogant BOFH.
> Add that to my being an surly old curmudgeon, and you can end up with some
> bad advice when in a "corporate" environment. The minus, at present, is
> that it is "clever" and so may violate corporate coding standards due to
> "complexity". Or maybe I just work for a staid company.
>
> --
> ​
> While a transcendent vocabulary is laudable, one must be eternally careful
> so that the calculated objective of communication does not become ensconced
> in obscurity.  In other words, eschew obfuscation.
>
> 111,111,111 x 111,111,111 = 12,345,678,987,654,321
>
> Maranatha! <><
> John McKown
>


Re: [GENERAL] Simple Atomic Relationship Insert

2015-01-13 Thread Robert DiFalco
Thanks John. I've been seeing a lot of examples like this lately. Does the
following approach have any advantages over traditional approaches?

WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
  INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
  RETURNING id
)
INSERT INTO users(name, hometown_id)
VALUES ('Robert', (SELECT id FROM ins UNION ALL SELECT id FROM sel));


On Tue, Jan 13, 2015 at 8:50 AM, John McKown 
wrote:

> On Tue, Jan 13, 2015 at 10:42 AM, Robert DiFalco  > wrote:
>
>> Let's say I have two tables like this (I'm leaving stuff out for
>> simplicity):
>>
>> CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
>> CREATE TABLE hometowns (
>>   id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
>>   name VARCHAR,
>>   PRIMARY KEY (id),
>>   UNIQUE(name)
>> );
>>
>> CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
>> CREATE TABLE users (
>>   id  BIGINT DEFAULT nextval('USER_SEQ_GEN'),
>>   hometown_id INTEGER,
>>   nameVARCHAR NOT NULL,
>>   PRIMARY KEY (id),
>>   FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
>> );
>>
>> The hometowns table is populate as users are created.  For example, a
>> client may submit {"name":"Robert", "hometown":"Portland"}.
>>
>> The hometowns table will never be updated, only either queries or
>> inserted.
>>
>> So given this I need to INSERT a row into "users" and either SELECT the
>> hometowns.id that matches "Portland" or if it doesn't exist I INSERT it
>> returning the hometowns.id".
>>
>> Normally I would do by first doing a SELECT on hometown. If I don't get
>> anything I do an INSERT into hometown RETURNING the id. If THAT throws an
>> error then I do the SELECT again. Now I'm finally ready to INSERT into
>> users using the hometowns.id from the above steps.
>>
>> But wow, that seems like a lot of code for a simple "Add if doesn't
>> exist" foreign key relationship -- but this is how I've always done.
>>
>> So my question. Is there a simpler, more performant, or thread-safe way
>> to do this?
>>
>> Thanks!
>>
>>
> ​What occurs to me is to simply do an INSERT into the "hometowns" table
> and just ignore the "already exists" return indication. Then do a SELECT to
> get the hometowns​ id which now exists, then INSERT the users. but I could
> easily be overlooking some reason why this wouldn't work properly.
>
>
> --
> ​
> While a transcendent vocabulary is laudable, one must be eternally careful
> so that the calculated objective of communication does not become ensconced
> in obscurity.  In other words, eschew obfuscation.
>
> 111,111,111 x 111,111,111 = 12,345,678,987,654,321
>
> Maranatha! <><
> John McKown
>


[GENERAL] Simple Atomic Relationship Insert

2015-01-13 Thread Robert DiFalco
Let's say I have two tables like this (I'm leaving stuff out for
simplicity):

CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE hometowns (
  id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
  name VARCHAR,
  PRIMARY KEY (id),
  UNIQUE(name)
);

CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE users (
  id  BIGINT DEFAULT nextval('USER_SEQ_GEN'),
  hometown_id INTEGER,
  nameVARCHAR NOT NULL,
  PRIMARY KEY (id),
  FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
);

The hometowns table is populated as users are created.  For example, a
client may submit {"name":"Robert", "hometown":"Portland"}.

The hometowns table will never be updated, only queried or inserted into.

So given this I need to INSERT a row into "users" and either SELECT the
hometowns.id that matches "Portland" or if it doesn't exist I INSERT it
returning the hometowns.id".

Normally I would do this by first doing a SELECT on hometown. If I don't get
anything I do an INSERT into hometown RETURNING the id. If THAT throws an
error then I do the SELECT again. Now I'm finally ready to INSERT into
users using the hometowns.id from the above steps.

But wow, that seems like a lot of code for a simple "Add if doesn't exist"
foreign key relationship -- but this is how I've always done it.

So my question. Is there a simpler, more performant, or thread-safe way to
do this?

Thanks!
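
[Editorial note: on PostgreSQL 9.5 and later (which postdates this thread) the
whole dance collapses to INSERT ... ON CONFLICT. A minimal sketch using the
tables above:

-- Create the hometown if it is new; do nothing if it already exists.
INSERT INTO hometowns (name) VALUES ('Portland')
ON CONFLICT (name) DO NOTHING;

-- The row is now guaranteed to exist, so the user insert can just look it up.
INSERT INTO users (name, hometown_id)
SELECT 'Robert', id FROM hometowns WHERE name = 'Portland';
]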


Re: [GENERAL] Combining two queries

2014-12-18 Thread Robert DiFalco
The INNER JOIN to itself with a count turns out to have a lower cost query
plan than the INTERSECT approach. On the ROW approach, it also seems to
take longer than the simple EXISTS query. But I suppose I can put both of
those into CTEs for convenience. I guess I was just hoping there was a
lower cost approach than what I was already doing.

On Thu, Dec 18, 2014 at 2:07 PM, David Johnston 
wrote:
>
> On Thu, Dec 18, 2014 at 3:02 PM, Robert DiFalco 
> wrote:
>
>> Is the intersect any better than what I originally showed? On the ROW
>> approach, I'm not sure where the context for that is coming from since it
>> may not be in the intersection. Consider n1 and n2 are NOT friends but they
>> have >0 mutual friends between them.
>>
>>
> ​The INTERSECT is a lot more direct about finding mutual friends.  The
> ROW() = ROW() piece is independent of the mutual friends question - it
> should be put in a WHERE clause and you can test whether a row is returned
> which, if one is, means the two people are friends.​
>
> ​"One Query" does not mean you need to do everything​ all-at-once.  I
> suggest you make use of CTEs (WITH) subqueries for each distinct
> calculation you need then join all of the CTE items together in a final
> query the outputs the data in the format desired.
>
> David J.
>
>


Re: [GENERAL] Combining two queries

2014-12-18 Thread Robert DiFalco
Is the intersect any better than what I originally showed? On the ROW
approach, I'm not sure where the context for that is coming from since it
may not be in the intersection. Consider n1 and n2 are NOT friends but they
have >0 mutual friends between them.
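
[Editorial note: a minimal sketch of how the two suggestions combine into one
statement: the ROW comparison becomes an EXISTS test for the friendship itself,
and the INTERSECT (in a CTE) yields the mutual-friend count whether or not the
two users are friends. The ids are the ones used earlier in the thread:

WITH common AS (
    SELECT friend_id FROM friends WHERE user_id = 166324
    INTERSECT
    SELECT friend_id FROM friends WHERE user_id = 166325
)
SELECT EXISTS (SELECT 1 FROM friends
                WHERE user_id = 166324 AND friend_id = 166325) AS friends,
       (SELECT count(*) FROM common)                           AS mutual;
]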

On Thu, Dec 18, 2014 at 1:29 PM, David G Johnston <
david.g.johns...@gmail.com> wrote:
>
> Robert DiFalco wrote
> > For 2 arbitrary ids, I need a query to get two pieced of data:
> >* Are the two users friends?
>
> This seems easy...ROW(u_id, f_id) = ROW(n1, n2)
>
>
> >* How many friends do the two users have in common.
>
> SELECT f_id FROM [...] WHERE u_id = n1
> INTERSECT
> SELECT f_id FROM [...] WHERE u_id = n2
>
> Put those into WITH/CTE and use the main query to combine them in whatever
> way seems appropriate.
>
> David J.
>
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/Combining-two-queries-tp5831378p5831391.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] Combining two queries

2014-12-18 Thread Robert DiFalco
I have a table called friends with a user_id and a friend_id (both of these
relate to an id in a users table).

For each friend relationship there are two rows. There are currently ONLY
reciprocal relationships. So if user ids 1 and 2 are friends there will be
two rows (1,2) and (2,1).

For 2 arbitrary ids, I need a query to get two pieces of data:
   * Are the two users friends?
   * How many friends do the two users have in common.

Is there a way to do this with one query? Currently I've only been able to
figure out how to do it with two.

SELECT
  EXISTS(
SELECT 1
FROM friends
WHERE user_id = 166324 AND friend_id = 166325) AS friends,
  (SELECT COUNT(1)
FROM friends f1 JOIN friends f2 ON f1.friend_id = f2.friend_id
WHERE f1.user_id = 166324 AND f2.user_id = 166325) AS mutual;

I'm wondering if there is a better way to do this using only one query.
I've tried a couple of GROUP BY approaches but they haven't worked.


Re: [GENERAL] Help Optimizing a Summary Query

2014-12-11 Thread Robert DiFalco
Thanks Arthur. I don't think there is as big a difference between BIGINT and
INTEGER as you think there is. In fact with an extended filesystem you
might not see any difference at all.

As I put in the first email, I am using a GiST index on user.name.

I was really more interested in the LEFT OUTER JOINs vs EXISTS queries and
if there was a better alternative I had not considered.
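
[Editorial note: a leading-wildcard search such as LIKE '%John%' can only use an
index if it is a trigram index from the pg_trgm extension, which is presumably
what the GiST index on user.name is. A minimal sketch, assuming pg_trgm is
available; USING gin (name gin_trgm_ops) is the other common choice:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index: lets LIKE '%John%' / ILIKE use an index scan instead of a
-- sequential scan over users.
CREATE INDEX users_name_trgm_idx ON users USING gist (name gist_trgm_ops);
]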

On Tue, Dec 9, 2014 at 11:44 AM, Arthur Silva  wrote:

> On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco 
> wrote:
>
>> I'm sorry, I missed a JOIN on the second variation. It is:
>>
>> SELECT u.id, u.name, u.imageURL, u.bio,
>>CASE
>>   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>>   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>>   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>>   ELSE 'none'
>>END AS 'friendStatus',
>>(SELECT COUNT(1) AS d
>>   FROM friends f1
>>  JOIN friends f2 ON f1.fiend_id = f2.friend_id
>>   WHERE f1.user_id = 33 AND f2.user_id = u.id)
>> FROM users u
>> *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id
>> <http://u.id>*
>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>> WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;
>>
>>
>> On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco > > wrote:
>>
>>> I have users, friends, and friend_requests. I need a query that
>>> essentially returns a summary containing:
>>>
>>> * user (name, imageURL, bio, ...)
>>> * Friend status (relative to an active user)
>>>* Is the user a friend of the active user?
>>>* Has the user sent a friend request to the active user?
>>>* Has the user received a friend request from the active user?
>>> * # of mutualFriends
>>> * Exclude the active user from the result set.
>>>
>>> So I have mocked this up two ways but both have complicated query plans
>>> that will be problematic with large data sets. I'm thinking that my lack of
>>> deep SQL knowledge is making me miss the obvious choice.
>>>
>>> Here's my two query examples:
>>>
>>> SELECT u.id, u.name, u.imageURL, u.bio,
>>>CASE
>>>   WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
>>> f.friend_id = u.id)   THEN 'isFriend'
>>>   WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33
>>> AND s.from_id = u.id) THEN 'hasSentRequest'
>>>   WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
>>> AND r.from_id = 33)   THEN 'hasReceivedRequest'
>>>   ELSE 'none'
>>>END AS "friendStatus",
>>>(SELECT COUNT(1)
>>>   FROM friends f1
>>>  JOIN friends f2 ON f1.friend_id = f2.friend_id
>>>   WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
>>> FROM users u
>>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>>
>>> SELECT u.id, u.name, u.imageURL, u.bio,
>>>CASE
>>>   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>>>   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>>>   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>>>   ELSE 'none'
>>>END AS 'friendStatus',
>>>(SELECT COUNT(1) AS d
>>>   FROM friends f1
>>>  JOIN friends f2 ON f1.fiend_id = f2.friend_id
>>>   WHERE f1.user_id = 33 AND f2.user_id = u.id)
>>> FROM users u
>>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>>
>>> 33 is just the id of the active user I am using for testing. The WHERE
>>> clause could be anything. I'm just using "u.name" here but I'm more
>>> concerned about the construction of the result set than the WHERE clause.
>>> These have more or less similar query plans, nothing that would change
>>> things factorially. Is this the best I can do or am I missing the obvious?
>>>
>>> Here are the tables:
>>>
>>>
>>> CREATE TABLE users (
>>>   idBIGINT,
>>>   name  VAR
