Re: [GENERAL] Combining two queries

2014-12-18 Thread Patrick Krecker
On Thu, Dec 18, 2014 at 1:57 PM, Robert DiFalco wrote:
> Thanks! So how would I combine them so that I would get a single row with
> the mutual friend count and isFriends for a given pair of users? I can't
> figure out how to modify what you've posted so that it gives the results
> like the compound query I quoted in my original post.
>
> On Thu, Dec 18, 2014 at 1:31 PM, Patrick Krecker wrote:
>>
>> On Thu, Dec 18, 2014 at 12:10 PM, Robert DiFalco wrote:
>> > I have a table called friends with a user_id and a friend_id (both of
>> > these relate to an id in a users table).
>> >
>> > For each friend relationship there are two rows. There are currently
>> > ONLY reciprocal relationships. So if user ids 1 and 2 are friends there
>> > will be two rows (1,2) and (2,1).
>> >
>> > For 2 arbitrary ids, I need a query to get two pieces of data:
>> >    * Are the two users friends?
>> >    * How many friends do the two users have in common?
>> >
>> > Is there a way to do this with one query? Currently I've only been able
>> > to figure out how to do it with two.
>> >
>> > SELECT
>> >   EXISTS(
>> > SELECT 1
>> > FROM friends
>> > WHERE user_id = 166324 AND friend_id = 166325) AS friends,
>> >   (SELECT COUNT(1)
>> > FROM friends f1 JOIN friends f2 ON f1.friend_id = f2.friend_id
>> > WHERE f1.user_id = 166324 AND f2.user_id = 166325) AS mutual;
>> >
>> > I'm wondering if there is a better way to do this using only one query.
>> > I've tried a couple of GROUP BY approaches but they haven't worked.
>>
>> Assuming the friendships are not repeated (that is, if 1 is friends
>> with 3, then the tuple (1, 3) appears only once), you can find just
>> the mutual friends by using this one:
>>
>> (This would be for users 1 and 2):
>>
>> SELECT friend_id FROM friends WHERE user_id IN (1, 2) GROUP BY
>> friend_id HAVING count(friend_id) > 1;
>>
>> You can additionally test if 1 and 2 are friends by doing:
>>
>> SELECT friend_id FROM friends WHERE user_id IN (1, 2) OR (user_id = 1
>> AND friend_id = 2) GROUP BY friend_id HAVING (count(friend_id) > 1 OR
>> friend_id = 1);
>>
>> If 1 appears in the list, then 1 and 2 are friends. Any other rows are
>> the mutual friends.

(adding back psql-general)

Well, it would not be a single SELECT statement anymore :)

There are probably other ways of doing this; I just came up with this
one off the top of my head:

SELECT is_user, c FROM (
    SELECT friend_id = USER1 AS is_user,
           count(friend_id) OVER (PARTITION BY friend_id = USER1) AS c
    FROM (
        SELECT friend_id FROM friends
        WHERE user_id IN (USER1, USER2)
           OR (user_id = USER1 AND friend_id = USER2)
        GROUP BY friend_id
        HAVING (count(friend_id) > 1 OR friend_id = USER1)
    ) AS t1
) AS t2 GROUP BY is_user, c;

It should return up to 2 rows, one with is_user = t and one with is_user =
f. The is_user = t row will be present if the two users are friends, and
will always have c = 1. The is_user = f row will be present if there are
mutual friends, and c will be the number of mutual friends.




Re: [GENERAL] Combining two queries

2014-12-18 Thread Patrick Krecker
On Thu, Dec 18, 2014 at 12:10 PM, Robert DiFalco wrote:
> I have a table called friends with a user_id and a friend_id (both of these
> relate to an id in a users table).
>
> For each friend relationship there are two rows. There are currently ONLY
> reciprocal relationships. So if user ids 1 and 2 are friends there will be
> two rows (1,2) and (2,1).
>
> For 2 arbitrary ids, I need a query to get two pieces of data:
>    * Are the two users friends?
>    * How many friends do the two users have in common?
>
> Is there a way to do this with one query? Currently I've only been able to
> figure out how to do it with two.
>
> SELECT
>   EXISTS(
> SELECT 1
> FROM friends
> WHERE user_id = 166324 AND friend_id = 166325) AS friends,
>   (SELECT COUNT(1)
> FROM friends f1 JOIN friends f2 ON f1.friend_id = f2.friend_id
> WHERE f1.user_id = 166324 AND f2.user_id = 166325) AS mutual;
>
> I'm wondering if there is a better way to do this using only one query. I've
> tried a couple of GROUP BY approaches but they haven't worked.

Assuming the friendships are not repeated (that is, if 1 is friends
with 3, then the tuple (1, 3) appears only once), you can find just
the mutual friends by using this one:

(This would be for users 1 and 2):

SELECT friend_id FROM friends WHERE user_id IN (1, 2) GROUP BY
friend_id HAVING count(friend_id) > 1;

You can additionally test if 1 and 2 are friends by doing:

SELECT friend_id FROM friends WHERE user_id IN (1, 2) OR (user_id = 1
AND friend_id = 2) GROUP BY friend_id HAVING (count(friend_id) > 1 OR
friend_id = 1);

If 1 appears in the list, then 1 and 2 are friends. Any other rows are
the mutual friends.
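For illustration, a minimal end-to-end sketch of the above (the table and
ids are hypothetical; reciprocal rows as described in the original post):

-- 1 and 2 are friends; 3 is a friend of both
CREATE TABLE friends (user_id int, friend_id int);
INSERT INTO friends VALUES
    (1, 2), (2, 1),
    (1, 3), (3, 1),
    (2, 3), (3, 2);

-- mutual friends of 1 and 2: returns 3
SELECT friend_id FROM friends WHERE user_id IN (1, 2)
GROUP BY friend_id HAVING count(friend_id) > 1;

-- mutual friends plus the friendship test: returns 1 (they are friends)
-- and 3 (the one mutual friend)
SELECT friend_id FROM friends WHERE user_id IN (1, 2) OR (user_id = 1
AND friend_id = 2) GROUP BY friend_id HAVING (count(friend_id) > 1 OR
friend_id = 1);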




Re: [GENERAL] Re: Strange error message when reference non-existent column foo."count"

2014-12-17 Thread Patrick Krecker
On Wed, Dec 17, 2014 at 3:11 PM, David G Johnston <david.g.johns...@gmail.com> wrote:
>
> Patrick Krecker wrote
> > I encountered this today and it was quite surprising:
> >
> > select version();
> >                            version
> > ----------------------------------------------------------------------------------------------------
> >  PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
> >
> > create table foo as (select generate_series(1,3));
> >
> > As expected, the following fails:
> >
> > select count from foo;
> > ERROR:  column "count" does not exist
> > LINE 1: select count from foo;
> >                ^
> > But if I change the syntax to something I thought was equivalent:
> >
> > select foo."count" from foo;
> >  count
> > -------
> >      3
> > (1 row)
> >
> > It works! This was quite surprising to me. Is this expected behavior,
> > that you can call an aggregate function without any parentheses (I can't
> > find any other syntax that works for count() sans parentheses, and this
> > behavior doesn't occur for any other aggregate)?
>
> That fact that this is an aggregate function is beside the point - the
> syntax works for any function.
>
> The following two expressions are equivalent:
>
> count(foo) = foo.count
>
> I do not immediately recall where this is documented but it is.  It should
> probably be documented or cross-referenced at:
>
>
> http://www.postgresql.org/docs/9.3/static/sql-syntax-calling-funcs.html#SQL-SYNTAX-CALLING-FUNCS-NAMED
>
> but alas that is not so.
>
> The basic idea is to hide the function invocation and allow for
> syntactically similar derived columns to be described.
>
> (goes looking)
>
> 4.2.6 - the note therein:
>
> http://www.postgresql.org/docs/9.3/static/sql-expressions.html#FIELD-SELECTION
> pointing to 35.4.3
>
> http://www.postgresql.org/docs/9.3/static/xfunc-sql.html#XFUNC-SQL-COMPOSITE-FUNCTIONS
>
> This relies on the rule that every table automatically has an implicit type
> created and so a "composite function" can act on that type.  The "foo."
> reference in your example is technically referring to the type "foo" and
> not the table "foo".
>
> David J.
>
>
>
>
>
>


Well, that clears it up. Thanks!
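For illustration, a short psql session demonstrating the equivalence David
describes (a hypothetical sketch against the foo table above; my_func is an
invented example function, not from the thread):

select count(foo) from foo;   -- returns 3 (one row)
select foo.count from foo;    -- the same call, spelled as field selection

-- the syntax works for any function taking the table's row type:
create function my_func(foo) returns int as $$ select 42 $$ language sql;
select foo.my_func from foo;  -- returns 42 once per row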


Re: [GENERAL] Strange error message when reference non-existent column foo."count"

2014-12-17 Thread Patrick Krecker
Sorry, I changed the email as I was writing it but I forgot to change the
subject line. An appropriate subject would be 'Strange behavior when
referencing non-existent column foo."count".'

On Wed, Dec 17, 2014 at 2:50 PM, Patrick Krecker wrote:
>
> I encountered this today and it was quite surprising:
>
> select version();
>                            version
> ----------------------------------------------------------------------------------------------------
>  PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
>
> create table foo as (select generate_series(1,3));
>
> As expected, the following fails:
>
> select count from foo;
> ERROR:  column "count" does not exist
> LINE 1: select count from foo;
>                ^
> But if I change the syntax to something I thought was equivalent:
>
> select foo."count" from foo;
>  count
> -------
>      3
> (1 row)
>
> It works! This was quite surprising to me. Is this expected behavior, that
> you can call an aggregate function without any parentheses (I can't find
> any other syntax that works for count() sans parentheses, and this behavior
> doesn't occur for any other aggregate)?
>


[GENERAL] Strange error message when reference non-existent column foo."count"

2014-12-17 Thread Patrick Krecker
I encountered this today and it was quite surprising:

select version();
                           version
----------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit

create table foo as (select generate_series(1,3));

As expected, the following fails:

select count from foo;
ERROR:  column "count" does not exist
LINE 1: select count from foo;
               ^
But if I change the syntax to something I thought was equivalent:

select foo."count" from foo;
 count
-------
     3
(1 row)

It works! This was quite surprising to me. Is this expected behavior, that
you can call an aggregate function without any parentheses (I can't find
any other syntax that works for count() sans parentheses, and this behavior
doesn't occur for any other aggregate)?


Re: [GENERAL] Pausing log shipping for streaming replication

2014-12-15 Thread Patrick Krecker
On Mon, Dec 15, 2014 at 10:29 AM, Joseph Kregloh wrote:
>
>
> On Mon, Dec 15, 2014 at 12:59 PM, Patrick Krecker wrote:
>>
>> On Mon, Dec 15, 2014 at 9:12 AM, Joseph Kregloh wrote:
>> > Hello,
>> >
>> > I have a master multi-slave streaming replication setup. One master
>> > and two slaves. I need to do some maintenance on one of the slaves as
>> > one of the drives died; however, there are some other weird things
>> > going on in that array that I would need to investigate. So I am
>> > expecting the machine to be down at least two hours.
>> >
>> > I remember reading that if a master cannot connect to the slave it would
>> > hold the log file from shipping. Is there any other way to hold the file
>> > until the slave comes back online? Would it affect both slaves not
>> > getting their files shipped over?
>> >
>> > The good thing is that the slave in question is not serving any
>> > connections.
>> >
>> > From what I remember emptying out the archive_command would pause log
>> > shipping. Can the same be done by issuing a pg_stop_backup()?
>> >
>> > Thanks,
>> > -Joseph Kregloh
>>
>> I think you will need to change your archive_command so it saves the
>> WALs to a location reachable by both slaves and the master, and have
>> both slaves pull from the same location. I don't think
>> pg_stop_backup() is useful in this situation.
>>
>
> Currently my archive_command calls a sh script which internally does an
> rsync. It actually rsyncs to both slaves and then a Barman location. If I
> fail the archive_command, then I'll have a problem because my primary slave
> serves read-only queries, so it might start serving out stale data.
>
> What I was thinking is shipping the log files that would go to the second
> slave to another machine or location on the master. Then once I am done with
> the maintenance I'll move those files over to the incoming folder. That
> would hopefully contain all the WAL files for the slave to catch up. Any
> thoughts against this?

Seems OK as long as you have the disk space to support the
accumulation of WALs (allowing for the situation where the downtime
is much longer than anticipated).

When you say "i'll move those files over to the incoming folder," what
do you mean? I think that restore_command should be used on the slave
to retrieve the WALs from the archive location. Once the secondary has
caught up, you can change the configuration back to the old setup and
remove the accumulated WALs from the temporary location.
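For illustration, the catch-up configuration could be as small as this on
the slave being caught up (a sketch; the holding-directory path is
hypothetical):

# recovery.conf on the slave being caught up (sketch; the holding
# directory path is an assumption)
standby_mode = 'on'
restore_command = 'cp /var/lib/postgresql/wal_holding/%f "%p"'

Once it has replayed everything, point it back at the master and delete
the holding directory.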

>
>>
>> The master will hold the logs as long as archive_command fails [1]. If
>> your archive_command involves connecting to the slave, then yes,
>> Postgres will hold the WAL segments while the slave is down.
>> There are (at least) two reasons that saving the archives to some
>> other location is useful:
>>
>> 1) You don't risk running out of disk on the master due to batched up
>> WALs if a slave goes down.
>> 2) The backup of logs can be used to aid in point-in-time recovery.
>>
>> [1] http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
>
>




Re: [GENERAL] Pausing log shipping for streaming replication

2014-12-15 Thread Patrick Krecker
On Mon, Dec 15, 2014 at 9:12 AM, Joseph Kregloh  wrote:
> Hello,
>
> I have a master multi-slave streaming replication setup. One master and two
> slaves. I need to do some maintenance on one of the slaves as one of the
> drives died; however, there are some other weird things going on in that
> array that I would need to investigate. So I am expecting the machine to be
> down at least two hours.
>
> I remember reading that if a master cannot connect to the slave it would
> hold the log file from shipping. Is there any other way to hold the file
> until the slave comes back online? Would it affect both slaves not getting
> their files shipped over?
>
> The good thing is that the slave in question is not serving any connections.
>
> From what I remember emptying out the archive_command would pause log
> shipping. Can the same be done by issuing a pg_stop_backup()?
>
> Thanks,
> -Joseph Kregloh

I think you will need to change your archive_command so it saves the
WALs to a location reachable by both slaves and the master, and have
both slaves pull from the same location. I don't think
pg_stop_backup() is useful in this situation.

The master will hold the logs as long as archive_command fails [1]. If
your archive_command involves connecting to the slave, then yes,
Postgres will hold the WAL segments while the slave is down.
There are (at least) two reasons that saving the archives to some
other location is useful:

1) You don't risk running out of disk on the master due to batched up
WALs if a slave goes down.
2) The backup of logs can be used to aid in point-in-time recovery.

[1] http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
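For illustration, a minimal version of that setup, along the lines of the
example in [1] (a sketch; /mnt/wal_archive is a hypothetical shared
location reachable by the master and both slaves):

# postgresql.conf on the master
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'

# recovery.conf on each slave
restore_command = 'cp /mnt/wal_archive/%f "%p"'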




Re: [GENERAL] is there a warm standby sync trigger?

2014-11-26 Thread Patrick Krecker
On Tue, Nov 25, 2014 at 6:37 PM, Sameer Kumar  wrote:
>
> On Fri, Oct 24, 2014 at 8:01 PM, Michael Paquier wrote:
>>
>> On Fri, Oct 24, 2014 at 12:30 AM, John Smith wrote:
>> >
>> > I want to set up a warm standby that listens 24/7 but only syncs when
>> > told to (i.e. only when I am OK with the database updates will I
>> > trigger the sync).
>> > Can I?
>> >
>> > I don't want to manually backup and restore like I do now.
>>
>> That's what pause_at_recovery_target is for:
>> http://www.postgresql.org/docs/devel/static/recovery-target-settings.html
>> Simply set up the recovery target you want to check, and use
>> pause_at_recovery_target to put the standby in a latent state you can
>> check. If the state of your server does not satisfy your needs, shut
>> down the server and change the target. Note that recovery cannot move
>> backward, by the way.
>
>
> If you were using hot standby, you could have used pg_pause_recovery() /
> pg_resume_recovery() to pause when needed and continue later, e.g. when
> you are doing bulk loads or during a window when you see network
> congestion.
>
>
> Best Regards,
>
> Sameer Kumar | Database Consultant
>

I have never heard of pg_pause_recovery() or pg_resume_recovery().
However, you can use pg_xlog_replay_pause() and
pg_xlog_replay_resume() to pause and resume recovery. As Sameer
mentioned, they require the system to be in hot standby mode to use
them.
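For illustration, a short session with the real functions (a sketch; run
as superuser on the hot standby):

-- on the hot standby
SELECT pg_is_xlog_replay_paused();  -- f
SELECT pg_xlog_replay_pause();      -- replay stops; read-only queries continue
-- ... bulk load / congestion window on the master ...
SELECT pg_xlog_replay_resume();     -- replay catches back up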




Re: [GENERAL] WAL receive process dies

2014-09-01 Thread Patrick Krecker
On Sat, Aug 30, 2014 at 7:04 AM, Chris Hundt  wrote:

> (One of Patrick's coworkers here; thanks a lot for your assistance)
>
> Just in case you wanted this as well, I ran
>
> psql 'replication=1 dbname=XXX host=127.0.0.1 port=5432 user=XXX
> password=XXX' -c 'IDENTIFY_SYSTEM;'
>
> (5432 is the stuck replica and 5445 is the pipe to the working replica)
>
>       systemid       | timeline |   xlogpos
> ---------------------+----------+--------------
>  5964163898407843711 |        1 | 174/B76D16A8
>
>
> On Fri, Aug 29, 2014 at 7:26 PM, Patrick Krecker wrote:
>
>>
>>
>>
>> On Fri, Aug 29, 2014 at 3:46 PM, Andres Freund wrote:
>>
>>> [FWIW: proper quoting makes answering easier and thus more likely]
>>>
>>> On 2014-08-29 15:37:51 -0700, Patrick Krecker wrote:
>>> > I ran the following on the local endpoint of spiped:
>>> >
>>> > while [ true ]; do psql -h localhost -p 5445 judicata -U marbury -c
>>> > "select modtime, pg_last_xlog_receive_location(),
>>> > pg_last_xlog_replay_location() from replication_time"; done;
>>> >
>>> > And the same command on production and I was able to verify that the
>>> > xlogs for a given point in time were the same (modtime is updated every
>>> > second by an upstart job):
>>> >
>>> > spiped from office -> production:
>>> >           modtime           | pg_last_xlog_receive_location | pg_last_xlog_replay_location
>>> > ----------------------------+-------------------------------+------------------------------
>>> >  2014-08-29 15:23:25.563766 | 177/2E80C9F8                  | 177/2E80C9F8
>>> >
>>> > Ran directly on production replica:
>>> >           modtime           | pg_last_xlog_receive_location | pg_last_xlog_replay_location
>>> > ----------------------------+-------------------------------+------------------------------
>>> >  2014-08-29 15:23:25.563766 | 177/2E80C9F8                  | 177/2E80C9F8
>>> >
>>> > To me, this is sufficient proof that spiped is indeed talking to the
>>> > machine I think it's talking to (also lsof reports the correct
>>> > hostname).
>>> >
>>> > I created another basebackup from the currently stuck postgres instance
>>> > on another machine and I also get this error:
>>> >
>>> > 2014-08-29 15:27:30 PDT FATAL:  could not receive data from WAL stream:
>>> > ERROR:  requested starting point 177/2D00 is ahead of the WAL flush
>>> > position of this server 174/B76D16A8
>>>
>>> Uh. this indicates that the machine you're talking to is *not* one of
>>> the above as it has a flush position of '174/B76D16A8' - not something
>>> that's really possible when the node actually is at '177/2E80C9F8'.
>>>
>>> Could you run, on the standby that's having problems, the following
>>> command:
>>> psql 'host=127.0.0.1 port=5445 user=XXX password=XXX' -c
>>> 'IDENTIFY_SYSTEM;'
>>>
>>> Greetings,
>>>
>>> Andres Freund
>>>
>>> --
>>>  Andres Freund http://www.2ndQuadrant.com/
>>>  PostgreSQL Development, 24x7 Support, Training & Services
>>>
>>
>> RE: quoting, I wonder if Gmail is messing it up somehow? Or am I doing
>> something else wrong? Sorry :(
>>
>>  First, I apologize for the misleading information, but when I made
>> another basebackup and tried to use it, I configured the machine to cascade
>> from the stuck replica, *not* from the spiped endpoint. When I properly
>> connected it to the spiped endpoint it synced up fine, giving this log line:
>>
>> 2014-08-29 16:16:21 PDT LOG:  started streaming WAL from primary at
>> 177/4F00 on timeline 1
>>
>> The command as you gave it reported a syntax error, but I googled a
>> little bit and ran this one:
>>
>> psql 'replication=1 dbname=XXX host=127.0.0.1 port=5445 user=XXX
>> password=XXX' -c 'IDENTIFY_SYSTEM;'
>>
>> And it gave me this output:
>>
>>       systemid       | timeline |   xlogpos
>> ---------------------+----------+--------------
>>  5964163898407843711 |        1 | 177/53091990
>>
>
>
I restarted the server and it promptly started fetching WAL archives from
our backups [1]. I am still at a loss as to why the binary replication
couldn't reconnect and why Postgres didn't fall back to fetching WAL
archives with its restore_command.

[1] http://pastebin.com/ZxM1LsNw


Re: [GENERAL] WAL receive process dies

2014-08-29 Thread Patrick Krecker
On Fri, Aug 29, 2014 at 3:46 PM, Andres Freund wrote:

> [FWIW: proper quoting makes answering easier and thus more likely]
>
> On 2014-08-29 15:37:51 -0700, Patrick Krecker wrote:
> > I ran the following on the local endpoint of spiped:
> >
> > while [ true ]; do psql -h localhost -p 5445 judicata -U marbury -c
> > "select modtime, pg_last_xlog_receive_location(),
> > pg_last_xlog_replay_location() from replication_time"; done;
> >
> > And the same command on production and I was able to verify that the
> > xlogs for a given point in time were the same (modtime is updated every
> > second by an upstart job):
> >
> > spiped from office -> production:
> >           modtime           | pg_last_xlog_receive_location | pg_last_xlog_replay_location
> > ----------------------------+-------------------------------+------------------------------
> >  2014-08-29 15:23:25.563766 | 177/2E80C9F8                  | 177/2E80C9F8
> >
> > Ran directly on production replica:
> >           modtime           | pg_last_xlog_receive_location | pg_last_xlog_replay_location
> > ----------------------------+-------------------------------+------------------------------
> >  2014-08-29 15:23:25.563766 | 177/2E80C9F8                  | 177/2E80C9F8
> >
> > To me, this is sufficient proof that spiped is indeed talking to the
> > machine I think it's talking to (also lsof reports the correct hostname).
> >
> > I created another basebackup from the currently stuck postgres instance on
> > another machine and I also get this error:
> >
> > 2014-08-29 15:27:30 PDT FATAL:  could not receive data from WAL stream:
> > ERROR:  requested starting point 177/2D00 is ahead of the WAL flush
> > position of this server 174/B76D16A8
>
> Uh. this indicates that the machine you're talking to is *not* one of
> the above as it has a flush position of '174/B76D16A8' - not something
> that's really possible when the node actually is at '177/2E80C9F8'.
>
> Could you run, on the standby that's having problems, the following
> command:
> psql 'host=127.0.0.1 port=5445 user=XXX password=XXX' -c 'IDENTIFY_SYSTEM;'
>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

RE: quoting, I wonder if Gmail is messing it up somehow? Or am I doing
something else wrong? Sorry :(

First, I apologize for the misleading information, but when I made another
basebackup and tried to use it, I configured the machine to cascade from
the stuck replica, *not* from the spiped endpoint. When I properly
connected it to the spiped endpoint it synced up fine, giving this log line:

2014-08-29 16:16:21 PDT LOG:  started streaming WAL from primary at
177/4F00 on timeline 1

The command as you gave it reported a syntax error, but I googled a
little bit and ran this one:

psql 'replication=1 dbname=XXX host=127.0.0.1 port=5445 user=XXX
password=XXX' -c 'IDENTIFY_SYSTEM;'

And it gave me this output:

      systemid       | timeline |   xlogpos
---------------------+----------+--------------
 5964163898407843711 |        1 | 177/53091990


Re: [GENERAL] WAL receive process dies

2014-08-29 Thread Patrick Krecker
On Fri, Aug 29, 2014 at 2:11 PM, Andres Freund wrote:

> On 2014-08-29 13:04:43 -0700, Patrick Krecker wrote:
> > Hi Craig -- Sorry for the late response, I've been tied up with some
> > other things for the last day. Just to give some context, this is a
> > machine that sits in our office and replicates from another read slave
> > in production via a tunnel set up with spiped. The spiped tunnel is
> > working and postgres is still stuck (it has been stuck since 8-25).
> >
> > The last moment that replication was working was 2014-08-25
> > 22:06:05.03972. We have a table called replication_time with one column
> > and one row that has a timestamp that is updated every second, so it's
> > easy to tell the last time this machine was in sync with production.
> >
> > recovery.conf: http://pastie.org/private/dfmystgf0wxgtmahiita
> > logs: http://pastie.org/private/qt1ixycayvdsxafrzj0l0q
>
> The problem is this log entry:
>
> 2014-08-27 18:44:27 PDT ERROR:  requested starting point 175/2800 is
> ahead of the WAL flush position of this server 174/B76D16A8
>
> That's the walreceiver connecting to the upstream node and asking for
> WAL. Somehow the standby has gotten *considerably* ahead of the node
> it's trying to receive WAL from.
>
> Are you sure primary_conninfo / the spiped tunnel connects to the right
> postgres instance? Did you possibly have a failover or something like that?
>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

We didn't have a failover. We just have one master and replica in different
AZs on Amazon EC2.

I ran the following on the local endpoint of spiped:

while [ true ]; do psql -h localhost -p 5445 judicata -U marbury -c "select
modtime, pg_last_xlog_receive_location(), pg_last_xlog_replay_location()
from replication_time";  done;

And the same command on production and I was able to verify that the xlogs
for a given point in time were the same (modtime is updated every second by
an upstart job):

spiped from office -> production:
           modtime           | pg_last_xlog_receive_location | pg_last_xlog_replay_location
-----------------------------+-------------------------------+------------------------------
 2014-08-29 15:23:25.563766  | 177/2E80C9F8                  | 177/2E80C9F8

Ran directly on production replica:
           modtime           | pg_last_xlog_receive_location | pg_last_xlog_replay_location
-----------------------------+-------------------------------+------------------------------
 2014-08-29 15:23:25.563766  | 177/2E80C9F8                  | 177/2E80C9F8

To me, this is sufficient proof that spiped is indeed talking to the
machine I think it's talking to (also lsof reports the correct hostname).

I created another basebackup from the currently stuck postgres instance on
another machine and I also get this error:

2014-08-29 15:27:30 PDT FATAL:  could not receive data from WAL stream:
ERROR:  requested starting point 177/2D00 is ahead of the WAL flush
position of this server 174/B76D16A8

However, this new instance is able to fetch logs from S3 and replay them
without issue.

Is it possible that the data dir on the stuck instance is just corrupt? It
is not impossible for this to have happened at some point in the past due
to the race condition in fetching logs from S3 I mentioned above.
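For what it's worth, one way to see where the stuck instance's data
directory actually is, independent of the walreceiver, is pg_controldata
(a sketch; the binary and data-directory paths are the Debian/Ubuntu
defaults and are an assumption here):

# compare the stuck instance's on-disk positions with the upstream's
# flush position
/usr/lib/postgresql/9.3/bin/pg_controldata /var/lib/postgresql/9.3/main \
    | grep -E 'Latest checkpoint location|Minimum recovery ending location'

Comparing those locations against the upstream would show whether the data
dir really is ahead of 174/B76D16A8.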


Re: [GENERAL] WAL receive process dies

2014-08-29 Thread Patrick Krecker
Hi Craig -- Sorry for the late response, I've been tied up with some other
things for the last day. Just to give some context, this is a machine that
sits in our office and replicates from another read slave in production via
a tunnel set up with spiped. The spiped tunnel is working and postgres is
still stuck (it has been stuck since 8-25).

The last moment that replication was working was  2014-08-25
22:06:05.03972. We have a table called replication_time with one column and
one row that has a timestamp that is updated every second, so it's easy to
tell the last time this machine was in sync with production.

recovery.conf: http://pastie.org/private/dfmystgf0wxgtmahiita
logs: http://pastie.org/private/qt1ixycayvdsxafrzj0l0q

Currently the WAL receive process is still not running. Interestingly,
another pg instance running on the same machine is replicating just fine.

A note about that: there is another instance running on that machine and a
definite race condition with restore_wal_s3.py, which writes the file to
/tmp before copying it to the destination requested by postgres (I just
discovered this today, this is not generally how we run our servers). So,
if both are restoring at the same time, they will step on the WAL archives
being unzipped in /tmp and bad things will happen. But, interestingly, I
checked the logs for the other machine and there is no activity on that
day. It does not appear that the WAL replay was invoked or that the WAL
receive timed out.

As for enabling the core dump, it seems that it needs to be done when
Postgres starts, and I thought I would leave it running in its "stuck" state
for now. However, if you know how to enable it on a running process, let me
know. We are running Ubuntu 13.10.
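(A sketch of one way to take a core from an already-running process
without restarting it, using gdb's gcore; the pidfile path is the
Debian/Ubuntu default and is an assumption:)

# read the postmaster PID from its pidfile, then snapshot a core of the
# running process without stopping it (requires gdb)
PID=$(head -1 /var/lib/postgresql/9.3/main/postmaster.pid)
sudo gcore -o /tmp/postgres-core "$PID"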


On Wed, Aug 27, 2014 at 11:30 PM, Craig Ringer 
wrote:

> On 08/28/2014 09:39 AM, Patrick Krecker wrote:
> > We have a periodic network connectivity issue (unrelated to Postgres)
> > that is causing the replication to fail.
> >
> > We are running Postgres 9.3 using streaming replication. We also have
> > WAL archives available to be replayed with restore_command. Typically
> > when I bring up a slave it copies over WAL archives for a while before
> > connecting via streaming replication.
> >
> > When I notice the machine is behind in replication, I also notice that
> > the WAL receiver process has died. There didn't seem to be any
> > information in the logs about it.
>
> What did you search for?
>
> Do you have core dumps enabled? That'd be a good first step. (Exactly
> how to do this depends on the OS/distro/version, but you basically want
> to set "ulimit -c unlimited" on some ancestor of the postmaster).
>
> > 1. It seems that Postgres does not fall back to copying WAL archives
> > with its restore_command. I just want to confirm that this is what
> > Postgres is supposed to do when its connection via streaming replication
> > times out.
>
> It should fall back.
>
> > 2. Is it possible to restart replication after the WAL receiver process
> > has died without restarting Postgres?
>
> PostgreSQL should do so its self.
>
> Please show your recovery.conf (appropriately redacted) and
> postgresql.conf for the replica, and complete logs for the time period
> of interest. You'll want to upload the logs somewhere then link to them,
> do not attach them to an email to the list.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


[GENERAL] WAL receive process dies

2014-08-27 Thread Patrick Krecker
We have a periodic network connectivity issue (unrelated to Postgres) that
is causing the replication to fail.

We are running Postgres 9.3 using streaming replication. We also have WAL
archives available to be replayed with restore_command. Typically when I
bring up a slave it copies over WAL archives for a while before connecting
via streaming replication.

When I notice the machine is behind in replication, I also notice that the
WAL receiver process has died. There didn't seem to be any information in
the logs about it. The solution thus far has been to restart Postgres. I
have two questions:

1. It seems that Postgres does not fall back to copying WAL archives with
its restore_command. I just want to confirm that this is what Postgres is
supposed to do when its connection via streaming replication times out.
2. Is it possible to restart replication after the WAL receiver process has
died without restarting Postgres?

Thanks!
Patrick


Re: [GENERAL] Getting "cache lookup failed for aggregate" error

2014-06-24 Thread Patrick Krecker
On Tue, Jun 24, 2014 at 4:25 PM, Tom Lane  wrote:

> Patrick Krecker  writes:
> > Hello everyone -- We received a strange error today on our production
> > write master. During a routine maintenance script, we got the following
> > error:
>
> > "ERROR:  cache lookup failed for aggregate 5953992"
>
> > I think I found the culprit. We have a cron script that (among other
> > things) recreates the aggregate function array_cat_aggregate() once every
> > minute.
>
> Um.  Why's it do that?
>

Well, it's admittedly a hack. The cron script uses the function just after
creating it. It was an easy way of guaranteeing that the database will have
the function when it's necessary. However, I suppose that assumption is
wrong because you have confirmed that aggregate functions are not part of
the snapshot provided by beginning a transaction.


>
> > My guess is that a) transactions do not guarantee a snapshot of
> > custom functions and b) we got unlucky and the aggregate was deleted
> > during the execution of the UPDATE.
>
> Yeah, there's no guarantee that dropping the aggregate wouldn't leave a
> window for this type of failure.
>
> 9.4 might be a little better about this because it doesn't use SnapshotNow
> for catalog fetches anymore, but I think you'd still be at some risk.
>
> > Here is the query that produced the error:
>
> It'd be more interesting to see what the cron script was doing to the
> aggregate definition.
>

FWIW, the SQL is:

DROP AGGREGATE IF EXISTS array_cat_aggregate(anyarray);
CREATE AGGREGATE array_cat_aggregate(anyarray) (
    SFUNC = array_cat,
    STYPE = anyarray,
    INITCOND = '{}'
);

Followed by the other statement given in my previous email. But I think
you've thoroughly answered my question. Thanks!


>
> regards, tom lane
>
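For reference, one way to avoid the drop/create window on a 9.3-era server
is to create the aggregate only when it is missing, so concurrent queries
never see it vanish -- a sketch (the proisagg catalog check is 9.x-era; it
was replaced by prokind in later releases):

DO $$
BEGIN
    -- only create the aggregate if it is absent
    IF NOT EXISTS (
        SELECT 1 FROM pg_proc
        WHERE proname = 'array_cat_aggregate' AND proisagg
    ) THEN
        CREATE AGGREGATE array_cat_aggregate(anyarray) (
            SFUNC = array_cat,
            STYPE = anyarray,
            INITCOND = '{}'
        );
    END IF;
END
$$;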


[GENERAL] Getting "cache lookup failed for aggregate" error

2014-06-24 Thread Patrick Krecker
Hello everyone -- We received a strange error today on our production write
master. During a routine maintenance script, we got the following error:

"ERROR:  cache lookup failed for aggregate 5953992"

I think I found the culprit. We have a cron script that (among other
things) recreates the aggregate function array_cat_aggregate() once every
minute. My guess is that a) transactions do not guarantee a snapshot of
custom functions and b) we got unlucky and the aggregate was deleted during
the execution of the UPDATE.

I was just wondering if my analysis is correct, so I can make the necessary
changes to production to prevent this from happening again.

Here is the query that produced the error:

UPDATE marbury_case SET
  components_vector = (
SELECT array_cat_aggregate(component) FROM (
  SELECT ARRAY[
id,
type_id,
"offset",
length,
internal_id,
parent_id,
right_sibling_id] AS component
  FROM marbury_component WHERE case_id = marbury_case.id ORDER BY id)
AS foo),
  attributes_json = (
SELECT array_to_json(array_agg(attributes || hstore('_ind',
ind::text))) FROM (
  SELECT (rank() OVER (ORDER BY id)) - 1 AS ind, attributes
  FROM marbury_component
  WHERE case_id = marbury_case.id ORDER BY id)
AS foo WHERE attributes IS NOT NULL
 AND array_length(akeys(attributes), 1) > 0),
  vectors_updated = timeofday()::timestamp
WHERE id = 71865


Re: [GENERAL] Curious unnest behavior

2013-01-03 Thread Patrick Krecker
I have to say, this seems straightforward to me.  An array with N elements
gets N rows in the result set.  I'm curious what other behavior would be
more reasonable.
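For what it's worth, if the rows with empty arrays need to survive the
unnest, one workaround is substituting a one-element NULL array -- a
sketch against Jeff's table below:

-- keeps one output row per input row, with NULL where b was empty
select a, unnest(case when b = '{}' then '{NULL}'::int[] else b end)
from x;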


On Thu, Jan 3, 2013 at 11:22 AM, Jeff Trout wrote:

> I just ran into an interesting thing with unnest and empty arrays.
>
> create table x (
> a int,
> b int[]
> );
>
> insert into x(a,b) values (1, '{}');
> insert into x(a,b) values (1, '{}');
> insert into x(a,b) values (1, '{}');
>
> select a, b from x;
> select a, unnest(b) from x;
>
> insert into x(a,b) values (2, '{5,6}');
> select a, unnest(b) from x;
>
> drop table x;
>
> gives me:
> CREATE TABLE
> INSERT 0 1
> INSERT 0 1
> INSERT 0 1
>  a | b
> ---+----
>  1 | {}
>  1 | {}
>  1 | {}
> (3 rows)
>
>  a | unnest
> ---+--------
> (0 rows)
>
> INSERT 0 1
>  a | unnest
> ---+--------
>  2 |  5
>  2 |  6
> (2 rows)
>
> DROP TABLE
>
> I can understand the likely reasoning behind the behavior but perhaps a
> note in the documentation about it might be of use for others that may get
> bit by this functionality.  (especially given the structure of the query,
> had I been doing select * from unnest(arr) that would be more intuitive,
> but given the query structure of select with no where the results can be
> surprising.)
>
> thanks
>
> --
> Jeff Trout 
>
>
>
>
>