Re: [GENERAL] bdr_init_copy fails when starting 2nd BDR node

2014-12-30 Thread John Casey

> What was your bdr config at this point? The error message indicates that
> it tries to connect to port 5432 on localhost - but the copy was taken
> from 'main_node_ip'. Perhaps you forgot to specify the host in the config?

# Here is my conf on the DR server (where I am running bdr_init_copy)
bdr.connections = 'primary'
bdr.primary_dsn = 'dbname=my_db host=primary_ip user=my_username  port=5432'
bdr.primary_init_replica = on
bdr.primary_replica_local_dsn = 'dbname=my_db user=my_username port=5432'

# For reference, here is the conf on my Primary server:
bdr.connections = 'dr'
bdr.dr_dsn = 'dbname=my_db host=dr_ip user=my_username  port=5432'

> What does 'git describe --tags' return?

bdr-pg/REL9_4beta3-1-120-ga2725dd

-----Original Message-----
From: Andres Freund [mailto:and...@2ndquadrant.com] 
Sent: Tuesday, December 30, 2014 12:57 PM
To: John Casey
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] bdr_init_copy fails when starting 2nd BDR node

Hi,

On 2014-12-29 23:51:05 -0500, John Casey wrote:
> I've been having issues while attempting to begin BDR replication. If 
> I set up the main node, then use bdr_init_copy, it always fails on the
> second node, as shown below.
> 
>  
> 
> postgres$ rm -Rf $PGDATA
> 
> postgres$ echo db_password | pg_basebackup -X stream -h main_node_ip 
> -p 5432 -U username -D $PGDATA
> 
> postgres$ cp $HOME/backup/postgresql.conf $PGDATA
> 
> postgres$ bdr_init_copy -U username -D $PGDATA
> 
> bdr_init_copy: starting...
> 
> Assigning new system identifier: 6098464173726284030...
> 
> Creating primary replication slots...
> 
> Creating restore point...
> 
> Could not connect to the remote server: could not connect to server: 
> No such file or directory
> 
> Is the server running locally and accepting
> 
> connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

What was your bdr config at this point? The error message indicates that it
tries to connect to port 5432 on localhost - but the copy was taken from
'main_node_ip'. Perhaps you forgot to specify the host in the config?

What does 'git describe --tags' return?

Greetings,

Andres Freund



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Improving performance of merging data between tables

2014-12-30 Thread Maxim Boguk
On Wed, Dec 31, 2014 at 11:10 AM, Pawel Veselov wrote:
>
>
> [skipped]
>
> 2) try pg_stat_statements, setting "pg_stat_statements.track = all".  see:
>>> http://www.postgresql.org/docs/9.4/static/pgstatstatements.html
>>>
>>> I have used this to profile some functions, and it worked pretty well.
>>> Mostly I use it on a test box, but once ran it on the live, which was
>>> scary, but worked great.
>>>
>>
>> That looks promising. Turned it on, waiting for when I can restart the
>> server at the next "quiet time".
>>
>
> I have to say this turned out to be a bit of a disappointment for this use
> case. It only measures total time spent in a call, so it surfaces
> operations that simply waited a long time on some lock. It's good, but it
> would be great if total_time were provided along with wait_time (and maybe
> io_time as well, since I also see operations that just naturally have to
> fetch a lot of data).
>

1) pg_stat_statements provides information about the io_time of each
statement, but you need track_io_timing enabled for that.

2) As for locking, I suggest enabling log_lock_waits and setting
deadlock_timeout to, say, 100ms (just for testing purposes); then any lock
wait longer than 100ms will be logged with some useful additional info.
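For example, something along these lines (a sketch only; note that
pg_stat_statements must be preloaded, which requires a server restart):

```sql
-- postgresql.conf (pg_stat_statements must be preloaded; requires restart):
--   shared_preload_libraries = 'pg_stat_statements'
--   pg_stat_statements.track = all
--   track_io_timing = on
--   log_lock_waits = on
--   deadlock_timeout = 100ms

-- then compare total time vs. time spent on I/O, per statement:
SELECT query, calls, total_time,
       blk_read_time + blk_write_time AS io_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
```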

PS: your setup looks pretty complicated and is hard to analyze without
seeing all involved table structures, the transaction/query flow, and
(especially) the source code of the involved procedures.

PPS: btw, please check the database logs for deadlock messages; the part of
your setup around "and then call a pgsql function to merge the data from its
tables into the common tables" could easily be deadlock prone.

PPPS: one last suggestion: after the application has finished with the
"write all the data into its own tables" stage, it should ANALYZE those
tables (otherwise you could get weird/inefficient plans during the last
stage).



-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ 

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: [GENERAL] Improving performance of merging data between tables

2014-12-30 Thread Pawel Veselov
On Mon, Dec 29, 2014 at 9:29 PM, Pawel Veselov wrote:

[skipped]


>>> 1) How do I find out what exactly is consuming the CPU in a PL/pgSQL
>>> function? All I see is that the calls to merge_all() function take long
>>> time, and the CPU is high while this is going on.
>>>
>>>
[skipped]

2) try pg_stat_statements, setting "pg_stat_statements.track = all".  see:
>> http://www.postgresql.org/docs/9.4/static/pgstatstatements.html
>>
>> I have used this to profile some functions, and it worked pretty well.
>> Mostly I use it on a test box, but once ran it on the live, which was
>> scary, but worked great.
>>
>
> That looks promising. Turned it on, waiting for when I can restart the
> server at the next "quiet time".
>

I have to say this turned out to be a bit of a disappointment for this use
case. It only measures total time spent in a call, so it surfaces operations
that simply waited a long time on some lock. It's good, but it would be great
if total_time were provided along with wait_time (and maybe io_time as well,
since I also see operations that just naturally have to fetch a lot of data).

[skipped]


Re: [HACKERS] [GENERAL] ON_ERROR_ROLLBACK

2014-12-30 Thread Andrew Dunstan


On 12/30/2014 09:20 AM, Tom Lane wrote:

Bernd Helmle  writes:

--On 29. Dezember 2014 12:55:11 -0500 Tom Lane  wrote:

Given the lack of previous complaints, this probably isn't backpatching
material, but it sure seems like a bit of attention to consistency
would be warranted here.

Now that i read it i remember a client complaining about this some time
ago. I forgot about it, but i think there's value in it to backpatch.

Hm.  Last night I wrote the attached draft patch, which I was intending
to apply to HEAD only.  The argument against back-patching is basically
that this might change the interpretation of scripts that had been
accepted silently before.  For example
\set ECHO_HIDDEN NoExec
will now select "noexec" mode whereas before you silently got "on" mode.
In one light this is certainly a bug fix, but in another it's just
definitional instability.

If we'd gotten a field bug report we might well have chosen to back-patch,
though, and perhaps your client's complaint counts as that.

Opinions anyone?



I got caught by this with ON_ERROR_ROLLBACK on 9.3 just this afternoon 
before remembering this thread. So there's a field report :-)
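For anyone unfamiliar with the variable (table name made up; behaviour as
described in the docs): when ON_ERROR_ROLLBACK is on, psql issues an implicit
SAVEPOINT before each statement in a transaction block and rolls back to it
on error, so the rest of the transaction survives:

```sql
\set ON_ERROR_ROLLBACK on
BEGIN;
INSERT INTO t VALUES (1);
SELECT 1/0;   -- error is ignored: psql rolls back to the implicit savepoint
INSERT INTO t VALUES (2);
COMMIT;       -- both inserts are committed
```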


+0.75 for backpatching (It's hard to imagine someone relying on the bad 
behaviour, but you never know).


cheers

andrew





Re: [GENERAL] bdr_init_copy fails when starting 2nd BDR node

2014-12-30 Thread Andres Freund
Hi,

On 2014-12-29 23:51:05 -0500, John Casey wrote:
> I've been having issues while attempting to begin BDR replication. If I set
> up the main node, then use bdr_init_copy, it always fails on the second
> node, as shown below.
> 
>  
> 
> postgres$ rm -Rf $PGDATA
> 
> postgres$ echo db_password | pg_basebackup -X stream -h main_node_ip -p 5432
> -U username -D $PGDATA
> 
> postgres$ cp $HOME/backup/postgresql.conf $PGDATA
> 
> postgres$ bdr_init_copy -U username -D $PGDATA
> 
> bdr_init_copy: starting...
> 
> Assigning new system identifier: 6098464173726284030...
> 
> Creating primary replication slots...
> 
> Creating restore point...
> 
> Could not connect to the remote server: could not connect to server: No such
> file or directory
> 
> Is the server running locally and accepting
> 
> connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

What was your bdr config at this point? The error message indicates that
it tries to connect to port 5432 on localhost - but the copy was taken
from 'main_node_ip'. Perhaps you forgot to specify the host in the
config?

What does 'git describe --tags' return?

Greetings,

Andres Freund




Re: [GENERAL] extra function calls from query returning composite type

2014-12-30 Thread Merlin Moncure
On Mon, Dec 29, 2014 at 9:06 AM, Ronald Peterson wrote:
> I added a 'raise notice' to a plpgsql function I was working on
> recently, and noticed that my notification was being raised more often
> than I'd expect.  The notification is raised in a function ('getone'
> in my example below) that returns a single composite value.  This
> function is then called by another function ('getset') that returns a
> setof that composite value.  It appears that 'getone' is called once
> for each column of my composite type.  I whittled this down to the
> following example.
>
> I get the expected result from my query.  But it appears to me (and
> what do I know) that perhaps something inefficient is happening in the
> way this query is running.  Or maybe this is doing exactly what it
> should, in which case this is just a curiosity question - I don't
> understand why my 'getone' notice is called twice as often as I'd
> expect.

This was answered pretty clearly above.  The problem is (func()).*
syntax and how it's expanded in the query.  This was a longstanding
issue with set returning functions until 9.3 nailed it completely with
LATERAL.  Consider (func()).* to be deprecated syntax.
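In other words (names invented for illustration):

```sql
-- old style: getone() is re-evaluated once per column of the composite
SELECT (getone(t.id)).* FROM things t;

-- 9.3+: LATERAL calls the function exactly once per input row
SELECT g.* FROM things t, LATERAL getone(t.id) g;
```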

merlin




[GENERAL] bdr_init_copy fails when starting 2nd BDR node

2014-12-30 Thread John Casey
I've been having issues while attempting to begin BDR replication. If I set
up the main node, then use bdr_init_copy, it always fails on the second
node, as shown below.

 

postgres$ rm -Rf $PGDATA

postgres$ echo db_password | pg_basebackup -X stream -h main_node_ip -p 5432
-U username -D $PGDATA

postgres$ cp $HOME/backup/postgresql.conf $PGDATA

postgres$ bdr_init_copy -U username -D $PGDATA

bdr_init_copy: starting...

Assigning new system identifier: 6098464173726284030...

Creating primary replication slots...

Creating restore point...

Could not connect to the remote server: could not connect to server: No such
file or directory

Is the server running locally and accepting

connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

 

If I start both servers simply with pg_ctl, using a conf set up for
replication, I get the following error on the main node:

 

LOG:  starting background worker process "bdr
(6098483684958107256,1,16384,): dr: apply"

CONTEXT:  slot "bdr_16384_6098483684958107256_1_16384__", output plugin
"bdr", in the startup callback

ERROR:  data stream ended

LOG:  worker process: bdr (6098483684958107256,1,16384,): dr: apply (PID
6294) exited with exit code 1

 

... and I get the following error on the second node:

 

ERROR:  bdr output plugin: slot creation rejected, bdr.bdr_nodes entry for
local node (sysid=6098483778037269710, timelineid=1, dboid=16384):
status='i', bdr still starting up: applying initial dump of remote node

HINT:  Monitor pg_stat_activity and the logs, wait until the node has caught
up

CONTEXT:  slot "bdr_16384_6098483684958107256_1_16384__", output plugin
"bdr", in the startup callback

LOG:  could not receive data from client: Connection reset by peer

 

It will keep cycling these errors indefinitely.

 

I have gotten this working off and on, but I keep running into this issue.
I am on CentOS 6.5.  Both servers can execute psql against the databases on
other nodes when not configured for replication, so it is not a connectivity
or firewall issue. I have installed using the beta2 rpm as well as built it
from source for rc1 (bdr stable).

 

Any ideas?

 



Re: [HACKERS] [GENERAL] ON_ERROR_ROLLBACK

2014-12-30 Thread David Johnston
On Tue, Dec 30, 2014 at 8:54 AM, Adrian Klaver wrote:

> On 12/30/2014 07:43 AM, David G Johnston wrote:
>
>> Tom Lane-2 wrote
>>
>>> Bernd Helmle writes:
>>>
>>>> --On 29. Dezember 2014 12:55:11 -0500 Tom Lane wrote:
>>>>
>>>>> Given the lack of previous complaints, this probably isn't backpatching
>>>>> material, but it sure seems like a bit of attention to consistency
>>>>> would be warranted here.
>>>>
>>>> Now that i read it i remember a client complaining about this some time
>>>> ago. I forgot about it, but i think there's value in it to backpatch.
>>>
>>> Hm.  Last night I wrote the attached draft patch, which I was intending
>>> to apply to HEAD only.  The argument against back-patching is basically
>>> that this might change the interpretation of scripts that had been
>>> accepted silently before.  For example
>>> \set ECHO_HIDDEN NoExec
>>> will now select "noexec" mode whereas before you silently got "on" mode.
>>> In one light this is certainly a bug fix, but in another it's just
>>> definitional instability.
>>>
>>> If we'd gotten a field bug report we might well have chosen to
>>> back-patch,
>>> though, and perhaps your client's complaint counts as that.
>>>
>>> Opinions anyone?
>>>
>>
>> -0.5 for back patching
>>
>> The one thing supporting this is that we'd potentially be fixing scripts
>> that are broken but don't know it yet.  But the downside of changing
>> active
>> settings for working scripts - even if they are only accidentally working
>> -
>> is enough to counter that for me.  Being more liberal in our acceptance of
>> input is more feature than bug fix even if we document that we accept more
>> items.
>>
>
> It is more about being consistent than liberal. Personally I think a
> situation where for one variable 0 = off but for another 0 = on is a bug.
>
>
​I can sorta buy the consistency angle but what will seal it for me is
script portability - the ability to write a script and instructions using
the most current release and have it run on previous versions without
having to worry about this kind of incompatibility.

So, +1 for back patching from me.

David J.​


Re: [HACKERS] [GENERAL] ON_ERROR_ROLLBACK

2014-12-30 Thread Adrian Klaver

On 12/30/2014 07:43 AM, David G Johnston wrote:

Tom Lane-2 wrote

Bernd Helmle writes:

--On 29. Dezember 2014 12:55:11 -0500 Tom Lane wrote:

Given the lack of previous complaints, this probably isn't backpatching
material, but it sure seems like a bit of attention to consistency
would be warranted here.



Now that i read it i remember a client complaining about this some time
ago. I forgot about it, but i think there's value in it to backpatch.


Hm.  Last night I wrote the attached draft patch, which I was intending
to apply to HEAD only.  The argument against back-patching is basically
that this might change the interpretation of scripts that had been
accepted silently before.  For example
\set ECHO_HIDDEN NoExec
will now select "noexec" mode whereas before you silently got "on" mode.
In one light this is certainly a bug fix, but in another it's just
definitional instability.

If we'd gotten a field bug report we might well have chosen to back-patch,
though, and perhaps your client's complaint counts as that.

Opinions anyone?


-0.5 for back patching

The one thing supporting this is that we'd potentially be fixing scripts
that are broken but don't know it yet.  But the downside of changing active
settings for working scripts - even if they are only accidentally working -
is enough to counter that for me.  Being more liberal in our acceptance of
input is more feature than bug fix even if we document that we accept more
items.


It is more about being consistent than liberal. Personally I think a
situation where for one variable 0 = off but for another 0 = on is a bug.


 That said it may be worth a documentation change and release note

that those options are not liberal currently so as to help those relying on
issues find and fix them proactively.

David J.




--
View this message in context: 
http://postgresql.nabble.com/ON-ERROR-ROLLBACK-tp5832298p5832448.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [HACKERS] [GENERAL] ON_ERROR_ROLLBACK

2014-12-30 Thread David G Johnston
Tom Lane-2 wrote
> Bernd Helmle writes:
>> --On 29. Dezember 2014 12:55:11 -0500 Tom Lane wrote:
>>> Given the lack of previous complaints, this probably isn't backpatching
>>> material, but it sure seems like a bit of attention to consistency
>>> would be warranted here.
> 
>> Now that i read it i remember a client complaining about this some time 
>> ago. I forgot about it, but i think there's value in it to backpatch.
> 
> Hm.  Last night I wrote the attached draft patch, which I was intending
> to apply to HEAD only.  The argument against back-patching is basically
> that this might change the interpretation of scripts that had been
> accepted silently before.  For example
>   \set ECHO_HIDDEN NoExec
> will now select "noexec" mode whereas before you silently got "on" mode.
> In one light this is certainly a bug fix, but in another it's just
> definitional instability.
> 
> If we'd gotten a field bug report we might well have chosen to back-patch,
> though, and perhaps your client's complaint counts as that.
> 
> Opinions anyone?

-0.5 for back patching

The one thing supporting this is that we'd potentially be fixing scripts
that are broken but don't know it yet.  But the downside of changing active
settings for working scripts - even if they are only accidentally working -
is enough to counter that for me.  Being more liberal in our acceptance of
input is more feature than bug fix even if we document that we accept more
items.  That said it may be worth a documentation change and release note
that those options are not liberal currently so as to help those relying on
issues find and fix them proactively.

David J.




--
View this message in context: 
http://postgresql.nabble.com/ON-ERROR-ROLLBACK-tp5832298p5832448.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.




Re: [GENERAL] vacuum vs pg_repack vs pg_reorg

2014-12-30 Thread Kevin Grittner
sramay  wrote:

> I have a database of size 1.5 TB.   The attachments are stored in bytea.
> The attachment table is consuming maximum space.   The database version is
> 9.1.x and Streaming Replication is set up.  Now, I have to remove old
> records to make way for new records without increasing SAN space.
>
> Which will be a better option
>
> pg_repack  - I have primary key on that table
>
> or
> Vacuum
>
> or
> pg_reorg
>
> If you can suggest a better option out of the above, it will simplify my
> life.  At any cost, streaming replication should not be disturbed.

I strongly recommend just plain VACUUM ANALYZE (or leave it to
autovacuum).  The others cause the table to give space back to the
OS filesystem (in your case on a SAN), and that means that before
that space can be used again it will need to be allocated from the
OS again.  This is slower and can cause concurrency issues on
extending the table.

You may see a transient increase in space required for WAL archives
during any action that affects a lot of data.  If that's a problem
you may want to delete rows in batches big enough to stay ahead of
the insertions, but small enough to limit the archive size --
assuming you clean up the archives.
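If you go the batch route, a sketch (table and column names invented; repeat
until a run deletes zero rows):

```sql
-- delete up to 10000 of the oldest expired rows per run;
-- modest batches bound the WAL volume generated per transaction
DELETE FROM attachments
WHERE id IN (
    SELECT id
    FROM attachments
    WHERE created < now() - interval '1 year'
    ORDER BY created
    LIMIT 10000
);
```

Plain autovacuum (or a manual VACUUM ANALYZE between batches) then makes the
freed space reusable in place, without shrinking the file.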

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] ON_ERROR_ROLLBACK

2014-12-30 Thread Tom Lane
Bernd Helmle  writes:
> --On 29. Dezember 2014 12:55:11 -0500 Tom Lane  wrote:
>> Given the lack of previous complaints, this probably isn't backpatching
>> material, but it sure seems like a bit of attention to consistency
>> would be warranted here.

> Now that i read it i remember a client complaining about this some time 
> ago. I forgot about it, but i think there's value in it to backpatch.

Hm.  Last night I wrote the attached draft patch, which I was intending
to apply to HEAD only.  The argument against back-patching is basically
that this might change the interpretation of scripts that had been
accepted silently before.  For example
\set ECHO_HIDDEN NoExec
will now select "noexec" mode whereas before you silently got "on" mode.
In one light this is certainly a bug fix, but in another it's just
definitional instability.

If we'd gotten a field bug report we might well have chosen to back-patch,
though, and perhaps your client's complaint counts as that.

Opinions anyone?

regards, tom lane

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index d29dfa2..bdfb67c 100644
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
*** EOF
*** 173,180 
Echo the actual queries generated by \d and other backslash
commands. You can use this to study psql's
internal operations. This is equivalent to
!   setting the variable ECHO_HIDDEN from within
!   psql.


  
--- 173,179 
Echo the actual queries generated by \d and other backslash
commands. You can use this to study psql's
internal operations. This is equivalent to
!   setting the variable ECHO_HIDDEN to on.


  
*** EOF
*** 333,340 
quietly. By default, it prints welcome messages and various
informational output. If this option is used, none of this
happens. This is useful with the -c option.
!   Within psql you can also set the
!   QUIET variable to achieve the same effect.


  
--- 332,339 
quietly. By default, it prints welcome messages and various
informational output. If this option is used, none of this
happens. This is useful with the -c option.
!   This is equivalent to setting the variable QUIET
!   to on.


  
*** bar
*** 2884,2891 
  ECHO_HIDDEN
  
  
! When this variable is set and a backslash command queries the
! database, the query is first shown. This way you can study the
  PostgreSQL internals and provide
  similar functionality in your own programs. (To select this behavior
  on program start-up, use the switch -E.)  If you set
--- 2883,2891 
  ECHO_HIDDEN
  
  
! When this variable is set to on and a backslash command
! queries the database, the query is first shown.
! This feature helps you to study
  PostgreSQL internals and provide
  similar functionality in your own programs. (To select this behavior
  on program start-up, use the switch -E.)  If you set
*** bar
*** 3046,3061 

  
  
! When on, if a statement in a transaction block
  generates an error, the error is ignored and the transaction
! continues. When interactive, such errors are only
  ignored in interactive sessions, and not when reading script
! files. When off (the default), a statement in a
  transaction block that generates an error aborts the entire
! transaction. The on_error_rollback-on mode works by issuing an
  implicit SAVEPOINT for you, just before each command
! that is in a transaction block, and rolls back to the savepoint
! on error.
  
  

--- 3046,3061 

  
  
! When set to on, if a statement in a transaction block
  generates an error, the error is ignored and the transaction
! continues. When set to interactive, such errors are only
  ignored in interactive sessions, and not when reading script
! files. When unset or set to off, a statement in a
  transaction block that generates an error aborts the entire
! transaction. The error rollback mode works by issuing an
  implicit SAVEPOINT for you, just before each command
! that is in a transaction block, and then rolling back to the
! savepoint if the command fails.
  
  

*** bar
*** 3065,3071 
  
  
  By default, command processing continues after an error.  When this
! variable is set, it will instead stop immediately.  In interactive mode,
  psql will return to the command prompt;
  otherwise, psql 

Re: [GENERAL] Improving performance of merging data between tables

2014-12-30 Thread Andy Colson

On 12/29/2014 11:29 PM, Pawel Veselov wrote:


Andy,

thanks for looking into this.

On Mon, Dec 29, 2014 at 9:00 AM, Andy Colson mailto:a...@squeakycode.net>> wrote:

On 12/28/2014 3:49 PM, Pawel Veselov wrote:

Hi.

I was wondering if anybody would have any ideas on how to improve
certain operations that we are having.




Besides "can somebody please look at this and let me know if I'm doing
something utterly stupid", here are my questions.

1) How do I find out what exactly is consuming the CPU in a PL/pgSQL
function? All I see is that the calls to merge_all() function take long
time, and the CPU is high while this is going on.



First, I'll admit I didn't read your entire post.

I can think of a couple methods:

1) try each of the statements in merge_all by hand with an "explain 
analyze" in front to see which is slow.  Look for things that hit big tables without 
an index.  Check that fk lookups are indexed.


If I didn't miss anything, that seems to be OK, even on function-based queries.

2) try pg_stat_statements, setting "pg_stat_statements.track = all".  see:
http://www.postgresql.org/docs/9.4/static/pgstatstatements.html


I have used this to profile some functions, and it worked pretty well. 
Mostly I use it on a test box, but once ran it on the live, which was scary, 
but worked great.


That looks promising. Turned it on, waiting for when I can restart the server
at the next "quiet time".

3) try auto-explain:
http://www.postgresql.org/docs/9.4/static/auto-explain.html


I've never used it, so don't know if it'll show each statement inside a 
function.  Dumps stuff to the log AFAIK, so you'll have to dig out the info by 
hand.

> 2) Is there a better way to merge individual rows, except doing
> UPDATE/INSERT in a loop, and would that be CPU expensive?
>

Not that I know of.  I use pretty much the same thing.  Soon!  we will have 
merge/upsert support.  Hopefully it'll be fast.


Well, anytime I cancelled the PID that was executing this whole mess, it would 
always stop at UPDATE ... SET ... WHERE on the main table. Which does make me 
believe that bulk update would really help.

> 3) Is there a better way to merge whole tables? However, note that I
> need to translate primary keys from node main table into the common main
> table, as they are used as foreign keys, hence the loops. I suspect the
> looping is CPU intensive.

Avoiding loops and doing things as sets is the best way.  If possible. The 
only loop I saw was looping over the merge_xxx tables, which is probably the 
only way.


There is an endless loop that is just a device for merging, but then there are 
loops going over each record in all the tables that are being merged, feeding
them into the function that actually does the merge. That table iteration is 
what I want to eliminate (especially if I knew it would help :) )

If possible (if you haven't already) you could add an extra column to your
secondary table that you can set as the main table's key.

bulk insert into second;
update second set magic = (select key from main where ... );

Then, maybe, you can do two ops in batch:

update main (where key exists in main)
insert into main (where key not exists in main)


I was thinking along the same lines. I can't really do bulk insert, at any 
point, because any key can be inserted by another process at any time, and with 
a good probability. However, there will be a lot less inserts than updates. So, 
in general, I'm making it do this:



What about transactions?  I assume you do something like:

begin;
merge_all;
commit;


Depending on your transaction isolation level, you could ensure that
nobody could insert while you are inserting.  I've never used the different 
isolation levels, so not 100% sure.  Even then, maybe a lock on the table 
itself, like:

begin;
lock;
merge_all;
unlock;
commit;

This way only one at a time can work do insert/update, but you can do them in 
batch and not in a loop.  It might be faster that way.  Other processes might 
wait for the lock a little bit, but if merge_all was faster in general, the 
time to wait for the lock would be less than the entire merge process itself.

I'm totally guessing here.
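Something like this, maybe (pure sketch, all names made up):

```sql
BEGIN;
-- SHARE ROW EXCLUSIVE conflicts with other writers,
-- so only one merge runs at a time
LOCK TABLE main_table IN SHARE ROW EXCLUSIVE MODE;

-- batch-update the keys that already exist
UPDATE main_table m
SET    total = m.total + s.total
FROM   second_table s
WHERE  m.key = s.key;

-- then insert the rest in one pass
INSERT INTO main_table (key, total)
SELECT s.key, s.total
FROM   second_table s
WHERE  NOT EXISTS (SELECT 1 FROM main_table m WHERE m.key = s.key);

COMMIT;
```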

-Andy




Re: [HACKERS] [GENERAL] ON_ERROR_ROLLBACK

2014-12-30 Thread Bernd Helmle



--On 29. Dezember 2014 12:55:11 -0500 Tom Lane  wrote:


Given the lack of previous complaints, this probably isn't backpatching
material, but it sure seems like a bit of attention to consistency
would be warranted here.


Now that i read it i remember a client complaining about this some time 
ago. I forgot about it, but i think there's value in it to backpatch.


--
Thanks

Bernd

