Re: [HACKERS] Compression of full-page-writes

2013-10-15 Thread KONDO Mitsumasa

(2013/10/15 13:33), Amit Kapila wrote:

Snappy is good mainly for un-compressible data, see the link below:
http://www.postgresql.org/message-id/CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4yebmoa...@mail.gmail.com

This result was obtained on an ARM architecture, which is not a typical CPU.
Please see this document for details:
http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9

I found a compression algorithm benchmark for HBase. I have not read it in
detail, but it indicates that the snappy algorithm gets the best performance.

http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of

In fact, most modern NoSQL storage systems use snappy, because it has good
performance and a good license (BSD license).



I think it is a bit difficult to prove that any one algorithm is best
for all kinds of loads.
I think it is better for the community to make its best effort than for me
alone to make the best choice through strict testing.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CF 2013-09 Wrap Up

2013-10-15 Thread Noah Misch
On Mon, Oct 14, 2013 at 01:56:42PM -0500, Mike Blackwell wrote:
 CF 2013-09 will be wrapping up this week.  As a reminder, beginning on the
 official CF end date (11/15), patches Waiting for Author will be Returned
 with Feedback.  Authors are welcome to add their patch to the next CF
 (2013-11).

Sounds good.

 Any patches marked Needs Review will be automatically moved to the next CF.
  We will try to make sure that all patches in the current CF have received
 at least one review.

The combined effect of those two statements is not clear to me.  Does that
mean you'll retain never-reviewed patches and automatically move patches that
have received at least one review?

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add use of asprintf()

2013-10-15 Thread Asif Naeem
On Tue, Oct 15, 2013 at 10:55 AM, David Rowley dgrowle...@gmail.com wrote:

 Though this is not yet enough to get the windows build work with me... I'm
 still getting link failures for isolationtester.c


 D:\Postgres\c\pgsql.sln (default target) (1) ->
 D:\Postgres\c\isolationtester.vcxproj (default target) (89) ->
 (Link target) ->
   isolationtester.obj : error LNK2019: unresolved external symbol
 _pg_strdup referenced in function _try_complete_step
 [D:\Postgres\c\isolationtester.vcxproj]
   isolationtester.obj : error LNK2019: unresolved external symbol
 _pg_asprintf referenced in function _try_complete_step
 [D:\Postgres\c\isolationtester.vcxproj
 ]
   .\Release\isolationtester\isolationtester.exe : fatal error LNK1120: 2
 unresolved externals [D:\Postgres\c\isolationtester.vcxproj]

 1 Warning(s)

 I guess this is down to a make file error somewhere.


Can you please try the attached patch? I hope it will solve the problem.



 David






Win_isolationtester_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Amit Kapila
On Mon, Oct 14, 2013 at 11:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On further analysis, I found that the hang occurs in some Windows
 APIs (FindFirstFile, RemoveDirectory) when the symlink path
 (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these
 APIs. For the above testcase, it will hang in the path
 destroy_tablespace_directories->ReadDir->readdir->FindFirstFile

 Well, that sucks.  So it's a Windows bug.

 It's not clear to me that we should do anything about this at all,
 except perhaps document that people should avoid long tablespace
 path names on an unknown set of Windows versions.

There are a few more relatively minor issues with long paths on Windows.
For example:
In the function CreateTableSpace(), the check below prevents creating a
tablespace on an overly long path.

if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
    OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS > MAXPGPATH)
  ereport(ERROR,
          (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
           errmsg("tablespace location \"%s\" is too long",
                  location)));

MAXPGPATH is defined to be 1024, whereas the Windows APIs used in PG
have a limit of 260, so the error comes directly from the API call
rather than from the above check.
So, one of the changes I am thinking of is to define MAXPGPATH separately
for Windows.
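
A minimal sketch of that idea (the 260 value mirrors the classic Windows
MAX_PATH limit; the exact cutoff and where to put the #ifdef are assumptions,
not part of the proposal above):

/* Hypothetical: cap MAXPGPATH to the platform's real path limit on Windows */
#ifdef WIN32
#define MAXPGPATH	260		/* classic Windows MAX_PATH */
#else
#define MAXPGPATH	1024
#endif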

 We should not
 be in the business of working around any and every bug coming out
 of Redmond.

This bug leads to an uninterruptible hang (I am not able to kill the
process via Task Manager or any other way) and the corresponding
backend starts consuming ~100% of the CPU, so the user doesn't have much
option but to restart the machine. Any form of shutdown of PG is also
unsuccessful.
I had proposed fixing this issue based on its severity, but if you
feel that we should keep the onus of such usage on the user, then I think
I can try to fix the other relatively minor problems with the usage of long
paths.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add use of asprintf()

2013-10-15 Thread David Rowley
On Tue, Oct 15, 2013 at 8:00 PM, Asif Naeem anaeem...@gmail.com wrote:




 On Tue, Oct 15, 2013 at 10:55 AM, David Rowley dgrowle...@gmail.comwrote:

 Though this is not yet enough to get the windows build work with me...
 I'm still getting link failures for isolationtester.c


 D:\Postgres\c\pgsql.sln (default target) (1) ->
 D:\Postgres\c\isolationtester.vcxproj (default target) (89) ->
 (Link target) ->
   isolationtester.obj : error LNK2019: unresolved external symbol
 _pg_strdup referenced in function _try_complete_step
 [D:\Postgres\c\isolationtester.vcxproj]
   isolationtester.obj : error LNK2019: unresolved external symbol
 _pg_asprintf referenced in function _try_complete_step
 [D:\Postgres\c\isolationtester.vcxproj
 ]
   .\Release\isolationtester\isolationtester.exe : fatal error LNK1120: 2
 unresolved externals [D:\Postgres\c\isolationtester.vcxproj]

 1 Warning(s)

 I guess this is down to a make file error somewhere.


 Can you please try the attached patch ?. I hope it will solve the problem.


Thanks, that combined with my patch above seems to get the build running
again.
In summary, I've attached a patch which is both of these combined.

I've not run any regression tests yet, as that seems to be broken, but
perhaps it is something that changed in my build environment. The attached is
probably worth applying to get the Windows build farm members building
again, to see if they'll go green.

Regards

David Rowley






 David







asprintf_windows_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Description on bind message

2013-10-15 Thread Tatsuo Ishii
In section 48.5 ("Message Formats") of the manual, there is a description
of the Bind message.

Int16[C]
The parameter format codes. Each must presently be zero (text) or one 
(binary).

This field can be completely absent in the current
implementation of PostgreSQL. I think that fact is not very clear from
the description. It would be nice if the description were something like:

Int16[C]
The parameter format codes. Each must presently be zero (text) or one 
(binary).
This field does not exist if the number of parameter values is 0.

The same thing can be said of the following:

Int16[R]
The result-column format codes. Each must presently be zero (text) or one 
(binary).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Description on bind message

2013-10-15 Thread Marko Tiikkaja

On 10/15/13 11:02 AM, Tatsuo Ishii wrote:

In section 48.5 ("Message Formats") of the manual, there is a description
of the Bind message.

Int16[C]
 The parameter format codes. Each must presently be zero (text) or one 
(binary).

This field can be completely absent in the current
implementation of PostgreSQL. I think that fact is not very clear from
the description. It would be nice if the description were something like:

Int16[C]
 The parameter format codes. Each must presently be zero (text) or one 
(binary).
This field does not exist if the number of parameter values is 0.


This is already explicitly said in the description of the previous field:

The number of parameter format codes that follow (denoted C below). 
This can be zero to indicate that there are no parameters or that the 
parameters all use the default format (text); or one, in which case the 
specified format code is applied to all parameters; or it can equal the 
actual number of parameters.


Also, the documentation for the array notation explains this: "An array
of k n-bit integers, each in network byte order. The array length k is
always determined by an earlier field in the message. Eg. Int16[M]."


This part seems clear enough to me, and I don't think repeating that 
*three times* is necessary.  We already say it twice.



Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Description on bind message

2013-10-15 Thread Tatsuo Ishii
 This is already explicitly said in the description of the previous
 field:
 
 The number of parameter format codes that follow (denoted C
 below). This can be zero to indicate that there are no parameters or
 that the parameters all use the default format (text); or one, in
 which case the specified format code is applied to all parameters; or
 it can equal the actual number of parameters.

I know it. The confusing part is this:

  This can be zero to indicate that there are no parameters or
  that the parameters all use the default format (text);

Here, zero has a double meaning. I was not convinced of what this actually
means until I looked into the source code.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-15 Thread Haribabu kommi
On 12 October 2013 11:30 Tom Lane wrote: 
Haribabu kommi haribabu.ko...@huawei.com writes:
 To handle the above case, instead of directly resetting the dead tuples
 to zero, how about removing the exact number of dead tuples from the table
 stats? With this approach vacuum gets triggered more frequently, thus
 reducing the bloat.

This does not seem like a very good idea as-is, because it will mean that 
n_dead_tuples can diverge arbitrarily far from reality over time, as a result 
of accumulation of errors.  It also doesn't seem 
like a very good idea that VACUUM sets n_live_tuples while only adjusting 
n_dead_tuples incrementally; ideally those counters should move in the same 
fashion.
In short, I think this patch will create at least as many problems as it fixes.

What would make more sense to me is for VACUUM to estimate the number of 
remaining dead tuples somehow and send that in its message.  However, since 
the whole point here is that we aren't accounting for 
transactions that commit while VACUUM runs, it's not very clear how to do that.

Another way to look at it is that we want to keep any increments to 
n_dead_tuples that occur after VACUUM takes its snapshot.  Maybe we could have 
VACUUM copy the n_dead_tuples value as it exists when 
VACUUM starts, and then send that as the value to subtract when it's done?

Taking a copy of n_dead_tuples and passing it at the end of vacuum to subtract
from the table stats may not be correct, as vacuum may not have cleaned all the
dead tuples because of tuple visibility to other transactions. How about
resetting n_dead_tuples to zero if it goes negative because of errors?
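
A minimal sketch of that clamping idea in the stats collector (the message
field name here is an assumption for illustration, not taken from any posted
patch):

/* Subtract what VACUUM reported, but clamp at zero so accumulated
 * estimation errors cannot drive the counter negative. */
tabentry->n_dead_tuples -= msg->m_dead_tuples;	/* hypothetical field name */
if (tabentry->n_dead_tuples < 0)
	tabentry->n_dead_tuples = 0;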


Regards,
Hari babu.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Standby catch up state change

2013-10-15 Thread Pavan Deolasee
Hello,

I wonder if there is an issue with the way state change happens
from WALSNDSTATE_CATCHUP to WALSNDSTATE_STREAMING. Please note my question
is solely based on a strange behavior reported by a colleague and my own
limited code reading. The colleague is trying out replication with a
networking middleware and noticed that the master logs the debug message
about standby catching up, but the write_location in the
pg_stat_replication view takes minutes to reflect the actual catch up
location.

ISTM that the following code in walsender.c assumes that the standby has
caught up once the master has sent all the required WAL.

/* Do we have any work to do? */
Assert(sentPtr <= SendRqstPtr);
if (SendRqstPtr <= sentPtr)
{
	*caughtup = true;
	return;
}

But what if the standby has not yet received all the WAL data sent by the
master? That can happen for various reasons, such as caching at the OS level
or the network layer on the sender machine, or at any other intermediate hop.

Should we not instead wait for the standby to have received all the WAL
before declaring that it has caught up? If a failure happens while the
data is still in the sender's buffer, the standby may not actually catch up
to the desired point, contrary to the LOG message displayed on the master.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Andres Freund
On 2013-10-15 15:51:46 +0530, Pavan Deolasee wrote:
 Should we not instead wait for the standby to have received all the WAL
 before declaring that it has caught up ? If a failure happens while the
 data is still in the sender's buffer, the standby may not actually catch up
 to the desired point contrary to the LOG message displayed on the master.

I don't think that'd be a good idea - the caughtup logic is used to
determine whether we need to wait for further wal to be generated
locally if we haven't got anything else to do. And we only need to do so
when we've reached the end of the WAL.

Also, we'd have to reset caughtup every time we send data (in
XLogSend()), and that'd be horrible.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Pavan Deolasee
On Tue, Oct 15, 2013 at 3:59 PM, Andres Freund and...@2ndquadrant.comwrote:


 I don't think that'd be a good idea - the caughtup logic is used to
 determine whether we need to wait for further wal to be generated
 locally if we haven't got anything else to do. And we only need to do so
 when we reached the end of the WAL.


Obviously I do not understand the caughtup logic fully, but don't you think
the log message about the standby having caught up with the master, while it
hasn't because the sender has buffered a lot of data, is wrong? Or are you
saying those are two different things really?


 Also, we'd have to reset caughtup every time we send data (in
 XLogSend()), and that'd be horrible.


Sorry, I did not get that. I was only arguing that the log message about
the standby having caught up with the master should be delayed until the
standby has actually received the WAL; I wasn't saying much about the actual
implementation.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Andres Freund
On 2013-10-15 16:12:56 +0530, Pavan Deolasee wrote:
 On Tue, Oct 15, 2013 at 3:59 PM, Andres Freund and...@2ndquadrant.comwrote:
 
 
  I don't think that'd be a good idea - the caughtup logic is used to
  determine whether we need to wait for further wal to be generated
  locally if we haven't got anything else to do. And we only need to do so
  when we've reached the end of the WAL.
 
 
 Obviously I do not understand the caughtup logic fully, but don't you think
 the log message about the standby having caught up with the master, while it
 hasn't because the sender has buffered a lot of data, is wrong? Or are you
 saying those are two different things really?

The message is logged when the state changes because the state is
important for the behaviour of replication (e.g. that node becomes
eligible for sync rep). I don't think delaying the message is a good
idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-10-15 Thread Vik Fearing
On 10/09/2013 04:19 PM, Pavel Stehule wrote:

 I checked conformance with ANSI SQL and I didn't find any issues.

 I found that the following error message is not too friendly (mainly
 because this functionality will be new)

 postgres=# select dense_rank(3,3,2) within group (order by num
 desc, odd) from test4;
 ERROR:  Incorrect number of arguments for hypothetical set function
 LINE 1: select dense_rank(3,3,2) within group (order by num desc,
 od...
^
 postgres=# select dense_rank(3,3,2) within group (order by num
 desc) from test4;
 ERROR:  Incorrect number of arguments for hypothetical set function
 LINE 1: select dense_rank(3,3,2) within group (order by num desc)
 fr...
^
 postgres=# select dense_rank(3,3) within group (order by num desc)
 from test4;
 ERROR:  Incorrect number of arguments for hypothetical set function
 LINE 1: select dense_rank(3,3) within group (order by num desc)
 from...
^
 postgres=# select dense_rank(3,3) within group (order by num desc,
 num) from test4;
  dense_rank
 
   3
 (1 row)

 Probably some hint should be there?



In addition to Pavel's review, I have finally finished reading the
patch. Here are some notes, mainly on style:

First of all, it no longer compiles on HEAD because commit
4d212bac1752e1bad6f3aa6242061c393ae93a0a stole oid 3968.  I modified
that locally to be able to continue my review.

Some of the error messages do not comply with project style.  That is,
they begin with a capital letter.

Ordered set functions cannot have transition functions
Ordered set functions must have final functions
Invalid argument types for hypothetical set function
Invalid argument types for ordered set function
Incompatible change to aggregate definition
Too many arguments to ordered set function
Ordered set finalfns must not be strict
Cannot have multiple ORDER BY clauses with WITHIN GROUP
Cannot have DISTINCT and WITHIN GROUP together
Incorrect number of arguments for hypothetical set function
Incorrect number of direct arguments to ordered set function %s

And in pg_aggregate.c I found a comment with a similar problem that
doesn't match its surrounding code:
Oid		transsortop = InvalidOid;	/* Can be omitted */

I didn't find any more examples like that, but I did see several block
comments that weren't complete sentences whereas I think they should
be.  Also, a lot of the code comments say "I", and I don't recall seeing
that elsewhere.  I may be wrong, but I would prefer if they were more
neutral.

The documentation has a number of issues.

collateindex.pl complains of duplicated index entries for PERCENTILE
CONTINUOUS and PERCENTILE DISCRETE.  This is because the index markup is
used for both overloaded versions.  This is the same mistake Bruce made
and then corrected in commit 5dcc48c2c76cf4b2b17c8e14fe3e588ae0c8eff3.

"if there are multiple equally good result" should have an "s" on the end
in func.sgml.

Table 9-49 has an extra empty column.  That should either be removed, or
filled in with some kind of comment text like other similar tables.

Apart from that, it looks good.  There are some mismatched coding styles
in there, but the next run of pgindent should catch them, so it's no big deal.

I haven't yet exercised the actual functionality of the new functions,
nor have I tried to create my own.  Andrew alerted me to a division by
zero bug in one of them, so I'll be looking forward to catching that.

So, more review to come.

-- 
Vik



Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Pavan Deolasee
On Tue, Oct 15, 2013 at 4:16 PM, Andres Freund and...@2ndquadrant.comwrote:

  I don't think delaying the message is a good
  idea.


Comment in walsender.c says:

/*
 * If we're in catchup state, move to streaming.  This is an
 * important state change for users to know about, since before
 * this point data loss might occur if the primary dies and we
 * need to failover to the standby.
 */

IOW it claims no data loss will occur after this point. But if the WAL is
cached on the master side, isn't this a false claim, i.e. data loss can
still occur even after the master outputs the log message and changes the
state to streaming? Or am I still getting it wrong?

Thanks,
Pavan
-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Andres Freund
On 2013-10-15 16:29:47 +0530, Pavan Deolasee wrote:
 On Tue, Oct 15, 2013 at 4:16 PM, Andres Freund and...@2ndquadrant.comwrote:
 
   I don't think delaying the message is a good
   idea.
 
 
 Comment in walsender.c says:
 
 /*
  * If we're in catchup state, move to streaming.  This is an
  * important state change for users to know about, since before
  * this point data loss might occur if the primary dies and we
  * need to failover to the standby.
  */
 
 IOW it claims no data loss will occur after this point. But if the WAL is
 cached on the master side, isn't this a false claim, i.e. data loss can
 still occur even after the master outputs the log message and changes the
 state to streaming? Or am I still getting it wrong?

I think you're over-interpreting it. We don't actually rely on the data
being confirmed received anywhere. And the message doesn't say anything
about everything safely being written out.
So, if you want to adjust that comment, go for it, but I am pretty
firmly convinced that this isn't worth changing the logic for.

Note that the ready_to_stop logic *does* make sure everything's flushed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL renegotiation

2013-10-15 Thread Vik Fearing
On 09/23/2013 10:51 PM, Alvaro Herrera wrote:
 + /* are we in the middle of a renegotiation? */
 + static bool in_ssl_renegotiation = false;
 + 

Since this was committed, I'm getting the following warning:

be-secure.c:105:13: warning: ‘in_ssl_renegotiation’ defined but not used
[-Wunused-variable]

-- 
Vik



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Statistics collection for CLUSTER command

2013-10-15 Thread Vik Fearing
On 09/16/2013 08:26 AM, Satoshi Nagayasu wrote:
 (2013/08/08 20:52), Vik Fearing wrote:
 As part of routine maintenance monitoring, it is interesting for us to
 have statistics on the CLUSTER command (timestamp of last run, and
 number of runs since stat reset) like we have for (auto)ANALYZE and
 (auto)VACUUM.  Patch against today's HEAD attached.

 I would add this to the next commitfest but I seem to be unable to log
 in with my community account (I can log in to the wiki).  Help
 appreciated.

 I have reviewed the patch.

Thank you for your review.

 Succeeded to build with the latest HEAD, and passed the regression
 tests.

 Looks good enough, and I'd like to add a test case here, not only
 for the view definition, but also to check that it works correctly.

 Please take a look at attached one.

Looks good to me.  Attached is a rebased patch with those tests added.

-- 
Vik

*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
***
*** 979,984  postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re
--- 979,989 
daemon</entry>
  </row>
  <row>
+  <entry><structfield>last_cluster</></entry>
+  <entry><type>timestamp with time zone</></entry>
+  <entry>Last time at which <command>CLUSTER</> was issued on this table</entry>
+ </row>
+ <row>
   <entry><structfield>vacuum_count</></entry>
   <entry><type>bigint</></entry>
   <entry>Number of times this table has been manually vacuumed
***
*** 1001,1006  postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re
--- 1006,1016 
   <entry>Number of times this table has been analyzed by the autovacuum
daemon</entry>
  </row>
+ <row>
+  <entry><structfield>cluster_count</></entry>
+  <entry><type>bigint</></entry>
+  <entry>Number of times <command>CLUSTER</> has been issued on this table</entry>
+ </row>
 </tbody>
 </tgroup>
</table>
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***
*** 410,419  CREATE VIEW pg_stat_all_tables AS
  pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
  pg_stat_get_last_analyze_time(C.oid) as last_analyze,
  pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze,
  pg_stat_get_vacuum_count(C.oid) AS vacuum_count,
  pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count,
  pg_stat_get_analyze_count(C.oid) AS analyze_count,
! pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count
  FROM pg_class C LEFT JOIN
   pg_index I ON C.oid = I.indrelid
   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
--- 410,421 
  pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
  pg_stat_get_last_analyze_time(C.oid) as last_analyze,
  pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze,
+ pg_stat_get_last_cluster_time(C.oid) as last_cluster,
  pg_stat_get_vacuum_count(C.oid) AS vacuum_count,
  pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count,
  pg_stat_get_analyze_count(C.oid) AS analyze_count,
! pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count,
! pg_stat_get_cluster_count(C.oid) AS cluster_count
  FROM pg_class C LEFT JOIN
   pg_index I ON C.oid = I.indrelid
   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***
*** 35,40 
--- 35,41 
  #include "commands/vacuum.h"
  #include "miscadmin.h"
  #include "optimizer/planner.h"
+ #include "pgstat.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
  #include "storage/predicate.h"
***
*** 407,412  cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose,
--- 408,417 
  	 verbose);
  
  	/* NB: rebuild_relation does heap_close() on OldHeap */
+ 
+ 	/* Report CLUSTER to the stats collector, but not VACUUM FULL */
+ 	if (indexOid != InvalidOid)
+ 		pgstat_report_cluster(OldHeap);
  }
  
  /*
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***
*** 292,297  static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, in
--- 292,298 
  static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
  static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
  static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
+ static void pgstat_recv_cluster(PgStat_MsgCluster *msg, int len);
  static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
  static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
  static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
***
*** 1385,1390  pgstat_report_analyze(Relation rel,
--- 1386,1412 
  }
  
  /* 
+  * pgstat_report_cluster() -
+  *
+  *	Tell the collector about the table we just CLUSTERed.
+  * 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Robert Haas
On Fri, Oct 11, 2013 at 2:30 PM, Peter Geoghegan p...@heroku.com wrote:
 But that's simpler than any of the alternatives that I see.
 Does there really need to be a new snapshot type with one tiny
 difference that apparently doesn't actually affect conventional
 clients of MVCC snapshots?

 I think that's the wrong way of thinking about it.  If you're
 introducing a new type of snapshot, or tinkering with the semantics of
 an existing one, I think that's a reason to reject the patch straight
 off.  We should be looking for a design that doesn't require that.  If
 we can't find one, I'm not sure we should do this at all.

 I'm confused by this. We need to lock a row not visible to our
 snapshot under conventional rules. I think we can rule out
 serialization failures at read committed. That just leaves changing
 something about the visibility rules of an existing snapshot type, or
 creating a new snapshot type, no?

 It would also be unacceptable to update a tuple, and not have the new
 row version (which of course will still have information from the
 future) visible to our snapshot - what would regular RETURNING
 return? So what do you have in mind? I don't think that locking a row
 and updating it are really that distinct anyway. The benefit of
 locking is that we don't have to update. We can delete, for example.

Well, the SQL standard way of doing this type of operation is MERGE.
The alternative we know exists in other databases is REPLACE; there's
also INSERT .. ON DUPLICATE KEY update.  In all of those cases,
whatever weirdness exists around MVCC is confined to that one command.
 I tend to think we should do similarly, with the goal that
HeapTupleSatisfiesMVCC need not change at all.

I don't have the only vote here, of course, but my feeling is that
that's more likely to be a good route.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] removing old ports and architectures

2013-10-15 Thread Andres Freund
On 2013-10-13 16:56:12 +0200, Tom Lane wrote:
 More to the point for this specific case, it seems like our process
 ought to be
 (1) select a preferably-small set of gcc atomic intrinsics that we
 want to use.

I suggest:
* pg_atomic_load_u32(uint32 *)
* uint32 pg_atomic_store_u32(uint32 *)
* uint32 pg_atomic_exchange_u32(uint32 *ptr, uint32 val)
* bool pg_atomic_compare_exchange_u32(uint32 *ptr, uint32 *expected, uint32 
newval)
* uint32 pg_atomic_fetch_add_u32(uint32 *ptr, uint32 add)
* uint32 pg_atomic_fetch_sub_u32(uint32 *ptr, uint32 add)
* uint32 pg_atomic_fetch_and_u32(uint32 *ptr, uint32 add)
* uint32 pg_atomic_fetch_or_u32(uint32 *ptr, uint32 add)
* u64 variants of the above
* bool pg_atomic_test_set(void *ptr)
* void pg_atomic_clear(void *ptr)

Ontop of that we can generically implement:
* pg_atomic_add_until_u32(uint32 *ptr, uint32 val, uint32 limit)
* pg_atomic_(add|sub|and|or)_fetch_u32()
* u64 variants of the above

We might also want to provide a generic implementation of the math
operations based on pg_atomic_compare_exchange() to make it easier to
bring up a new architecture.
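
As a rough illustration (not part of the proposal above; the function name is
invented, and it assumes compare-exchange refreshes *expected with the current
value on failure, as in the C11-style API), such a generic fallback could look
like this:

static inline uint32
pg_atomic_fetch_add_u32_generic(uint32 *ptr, uint32 add)
{
	uint32 old = pg_atomic_load_u32(ptr);

	/* Retry until no concurrent modification happened between our read
	 * and the compare-exchange; 'old' is refreshed on each failure. */
	while (!pg_atomic_compare_exchange_u32(ptr, &old, old + add))
		;

	return old;
}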

I think we should leave 64bit support optional for now.

Opinions?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread MauMau

From: Magnus Hagander mag...@hagander.net

On Oct 12, 2013 2:13 AM, MauMau maumau...@gmail.com wrote:

I'm not sure if many use XA features, but I saw the questions and answer

a few times, IIRC.  In the trouble situation, PostgreSQL outputs an
intuitive message like "increase max_prepared_transactions", so many users
might possibly have been able to change the setting and solve the problem
themselves without asking for help, feeling stress like "Why do I have to
set this?"  For example, max_prepared_transactions is called a "hideous
creature" on the following page:


https://community.jboss.org/wiki/InstallPostgreSQLOnFedora?_sscc=t


Anybody who follows that page is screwed anyway. I notice they recommend
running regular VACUUM FULL across the whole database, so it's obvious 
they

know nothing about postgresql. There's nothing we can do about what people
write on random pages around the Internet.


Regular VACUUM FULL is certainly overkill.  Apart from that, having to set
max_prepared_transactions seems to make PostgreSQL difficult for people with
that level of knowledge, doesn't it?  I wonder if there are other major
DBMSs which require parameter configuration and a server restart to use
distributed transactions.





According to the below page, the amount of memory consumed for this is

(770 + 270 * max_locks_per_transaction) * max_prepared_transactions.
 With the default setting of max_connections=100 and
max_locks_per_transaction=64, this is only 180KB.  So the overhead is
negligible.

You are assuming memory is the only overhead. I don't think it is.


Having a quick look at the source code, just setting 
max_prepared_transactions to non-zero seems to produce almost no processing 
overhead.



If the goal is to make PostgreSQL more friendly and run smoothly without

frustration from the start and not perfect tuning, I think
max_prepared_transactions=max_connections is an easy and good item.  If 
the

goal is limited to auto-tuning memory sizes, this improvement can be
treated separately.

Frankly, I think we'd help 1000 times more users if we enabled a few wal
writers by default and jumped the wal level. Mainly so they could run a
one-off base backup. That's used by orders of magnitude more users than XA.


Agreed.  The default of non-zero max_wal_senders and a wal_level of at least
'archive' would be beneficial for more users.  Likewise, a non-zero
max_prepared_transactions would improve the impression of PostgreSQL (for a
limited number of users, though), and it wouldn't do any harm.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Mon, Oct 14, 2013 at 9:51 AM, Andres Freund and...@2ndquadrant.com wrote:
 Well, I just think relying on specific symbol names in the .so file is
 kind of unfortunate.  It means that, for example, you can't have
 multiple output plugins provided by a single .so.  And in general I
 think it's something that we've tried to minimize.

 But that's not really different when you rely on _PG_init doing it's
 thing, right?

Sure, that's true.  But in general I think magic symbol names aren't a
particularly good design.

 But there's only so much information available here.  Why not just
 have a format that logs it all?

 Because we do not know what all is? Also, how would we handle
 replication sets and such that all of the existing replication solutions
 have generically?

I don't see how you can fail to know what all is.  There's only a
certain set of facts available.  I mean you could log irrelevant crap
like a random number that you just picked or the sum of all numeric
values in the column, but nobody's likely to want that.  What people
are going to want is the operation performed (insert, update, or
delete), all the values in the new tuple, the key values from the old
tuple, the transaction ID, and maybe some meta-information about the
transaction (such as the commit timestamp).  What I'd probably do is
emit the data in CSV format, with the first column of each line being
a single character indicating what sort of row this is: H means a
header row, defining the format of subsequent rows
(H,table_name,new_column1,...,new_columnj,old_key_column1,...,old_key_columnk;
a new header row is emitted only when the column list changes); I, U,
or D means an insert, update, or delete, with column 2 being the
transaction ID, column 3 being the table name, and the remaining
columns matching the last header row emitted for that table, T
means meta-information about a transaction, whatever we have (e.g.
T,txn_id,commit_time).  There's probably some further tweaking of that
that could be done, and I might be overlooking some salient details,
like maybe we want to indicate the column types as well as their
names, but the range of things that someone can want to do here is not
unlimited.  The point, for me anyway, is that someone can write a
crappy Perl script to apply changes from a file like this in a day.
My contention is that there are a lot of people who will want to do
just that, for one reason or another.  The plugin interface has
awesome power and flexibility, and really high-performance replication
solutions will really benefit from that.  But regular people don't
want to write C code; they just want to write a crappy Perl script.
And I think we can facilitate that without too much work.
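
For illustration only (the table and column names below are invented here,
not taken from this thread), output in that style might look like:

H,accounts,id,balance,id
I,7001,accounts,1,100.00,
U,7002,accounts,1,150.00,1
D,7003,accounts,,,1
T,7003,2013-10-15 12:00:00+00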

 Oh, yuck.  So that means you have to write an extra WAL record for
 EVERY heap insert, update, or delete to a catalog table?  OUCH.

 Yes. We could integrate it into the main record without too many
 problems, but it didn't seem like an important optimization and it would
 have higher chances of slowing down wal_level  logical.

Hmm.  I don't know whether that's an important optimization or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Andres Freund
On 2013-10-15 21:41:18 +0900, MauMau wrote:
 Likewise, non-zero max_prepared_transactons would improve the
 impression of PostgreSQL (for limited number of users, though), and it
 wouldn't do any harm.

I've seen several sites shutting down because of forgotten prepared
transactions causing bloat and anti-wraparound shutdowns.

A big, big -1 for changing that default.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread MauMau

From: Dimitri Fontaine dimi...@2ndquadrant.fr

The reason why that parameter default has changed from 5 to 0 is that
some people would mistakenly use a prepared transaction without a
transaction manager. Few only people are actually using a transaction
manager that it's better to have them have to set PostgreSQL.


I guess this problem is not unique to PostgreSQL.  I think PostgreSQL can be 
more friendly for normal users (who use external transaction manager), and 
does not need to be too conservative because of people who do irregular 
things.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Mon, Oct 14, 2013 at 5:07 PM, Andres Freund and...@2ndquadrant.com wrote:
 So, see the attatched benchmark skript. I've always done using a disk
 bound and a memory bound (using eatmydata, preventing fsyncs) run.

 * unpatched run, wal_level = hot_standby, eatmydata
 * unpatched run, wal_level = hot_standby

 * patched run, wal_level = hot_standby, eatmydata
 * patched run, wal_level = hot_standby

 * patched run, wal_level = logical, eatmydata
 * patched run, wal_level = logical

 Based on those results, there's no difference above noise for
 wal_level=hot_standby, with or without the patch. With wal_level=logical
 there's a measurable increase in wal traffic (~12-17%), but no
 performance decrease above noise.

 From my POV that's ok, those are really crazy catalog workloads.

Any increase in WAL traffic will translate into a performance hit once
the I/O channel becomes saturated, but I agree those numbers don't
sound terrible for that faily-brutal test case. Actually, I was more
concerned about the hit on non-catalog workloads.  pgbench isn't a
good test because the key column is so narrow; but suppose we have a
table like (a text, b integer, c text) where (a, c) is the primary key
and those strings are typically pretty long - say just short enough
that we can still index the column.  It'd be worth testing both
workloads where the primary key doesn't change (so the only overhead
is figuring out that we need not log it) and those where it does
(where we're double-logging most of the tuple).  I assume the latter
has to produce a significant hit to WAL volume, and I don't think
there's much we can do about that; but the former had better be nearly
free.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Magnus Hagander
On Tue, Oct 15, 2013 at 2:47 PM, MauMau maumau...@gmail.com wrote:
 From: Dimitri Fontaine dimi...@2ndquadrant.fr

 The reason why that parameter default has changed from 5 to 0 is that
 some people would mistakenly use a prepared transaction without a
 transaction manager. Few only people are actually using a transaction
 manager that it's better to have them have to set PostgreSQL.


 I guess this problem is not unique to PostgreSQL.  I think PostgreSQL can be
 more friendly for normal users (who use external transaction manager), and
 does not need to be too conservative because of people who do irregular
 things.

I would say *using* an external transaction manager *is* the irregular
thing. The current default *is* friendly for normal users, for example
see the comments from Andres about what happens if you make a mistake.
So I definitely agree with your sentiment that we should be more
friendly for normal users - but in this case we are.

If I look through all the customers I've worked with, only a handful
have actually used a transaction manager. And of those, at least half
of them were using it even though they didn't need it, because they
didn't know what it was.

But the argument about being friendly for new users should definitely
have us change wal_level and max_wal_senders.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Robert Haas
On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, that sucks.  So it's a Windows bug.

 It's not clear to me that we should do anything about this at all,
 except perhaps document that people should avoid long tablespace
 path names on an unknown set of Windows versions.  We should not
 be in the business of working around any and every bug coming out
 of Redmond.

It's sort of incomprehensible to me that Microsoft has a bug like this
and apparently hasn't fixed it.  But I think I still favor trying to
work around it.  When people try to use a long data directory name and
it freezes the system, some of them will blame us rather than
Microsoft.  We've certainly gone to considerable lengths to work
around extremely strange bugs in various compiler toolchains, even
relatively obscure ones.  I don't particularly see why we shouldn't do
the same here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Magnus Hagander
On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, that sucks.  So it's a Windows bug.

 It's not clear to me that we should do anything about this at all,
 except perhaps document that people should avoid long tablespace
 path names on an unknown set of Windows versions.  We should not
 be in the business of working around any and every bug coming out
 of Redmond.

 It's sort of incomprehensible to me that Microsoft has a bug like this
 and apparently hasn't fixed it.  But I think I still favor trying to
 work around it.  When people try to use a long data directory name and
 it freezes the system, some of them will blame us rather than
 Microsoft.  We've certainly gone to considerable lengths to work
 around extremely strange bugs in various compiler toolchains, even
 relatively obscure ones.  I don't particularly see why we shouldn't do
 the same here.

I agree we'll probably want to work around it in the end, but I still
think it should be put to Microsoft PSS if we can. The usual - have we
actually produced a self-contained example that does just this (and
doesn't include the full postgres support) and submitted it to
*microsoft* for comments? Not talking about their end user forums, but
the actual microsoft support services? (AFAIK at least EDB, and
probably other pg companies as well, have agreements with MS that lets
you get access to their real support. I know I used to have it at my
last job, and used it a number of times during the initial porting
work. The people backing that one are generally pretty good)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2013-10-15 Thread k...@rice.edu
On Tue, Oct 15, 2013 at 03:11:22PM +0900, KONDO Mitsumasa wrote:
 (2013/10/15 13:33), Amit Kapila wrote:
 Snappy is good mainly for un-compressible data, see the link below:
 http://www.postgresql.org/message-id/CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4yebmoa...@mail.gmail.com
 This result was obtained on an ARM architecture, which is not a typical CPU.
 Please see this document for details:
 http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9
 
 I found a compression algorithm benchmark for HBase. I have not read it in
 detail, but it indicates that the snappy algorithm gets the best performance.
 http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of
 
 In fact, most modern NoSQL storage systems use snappy, because it has
 good performance and a good license (BSD license).
 
 I think it is a bit difficult to prove that any one algorithm is best
 for all kinds of loads.
 I think it is better for the community to make its best effort than for me
 alone to make the best choice through strict testing.
 
 Regards,
 -- 
 Mitsumasa KONDO
 NTT Open Source Software Center
 

Google's lz4 is also a very nice algorithm with 33% better compression
performance than snappy and 2X the decompression performance in some
benchmarks also with a bsd license:

https://code.google.com/p/lz4/

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 08:42:20 -0400, Robert Haas wrote:
 On Mon, Oct 14, 2013 at 9:51 AM, Andres Freund and...@2ndquadrant.com wrote:
  Well, I just think relying on specific symbol names in the .so file is
  kind of unfortunate.  It means that, for example, you can't have
  multiple output plugins provided by a single .so.  And in general I
  think it's something that we've tried to minimize.
 
  But that's not really different when you rely on _PG_init doing it's
  thing, right?
 
 Sure, that's true.  But in general I think magic symbol names aren't a
 particularly good design.

It allows you to use the shared library both as a normal extension loaded
via shared_preload_libraries or ad hoc, and as an output plugin, which seems
like a sensible goal.
We could have a single _PG_init_output_plugin() symbol that fills in
such a struct, which would then not conflict with using the .so
independently. If you prefer that, I'll change things around.
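
A minimal sketch of that single-entry-point variant (the struct and callback
names below are invented for illustration; only _PG_init_output_plugin comes
from the suggestion above, so don't read this as the patch's actual API):

typedef struct OutputPluginCallbacks
{
	void	(*begin_cb) (void *ctx);
	void	(*change_cb) (void *ctx, void *change);
	void	(*commit_cb) (void *ctx);
} OutputPluginCallbacks;

static void my_begin(void *ctx) { }
static void my_change(void *ctx, void *change) { }
static void my_commit(void *ctx) { }

/* The walsender would look up just this one symbol and let it fill in the
 * callbacks, so the .so can still be loaded as a plain extension too. */
void
_PG_init_output_plugin(OutputPluginCallbacks *cb)
{
	cb->begin_cb = my_begin;
	cb->change_cb = my_change;
	cb->commit_cb = my_commit;
}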

We can't do something like 'output_plugin_in_progress' before calling
_PG_init() because _PG_init() won't be called again if the shared object
is already loaded...

  But there's only so much information available here.  Why not just
  have a format that logs it all?
 
  Because we do not know what all is? Also, how would we handle
  replication sets and such that all of the existing replication solutions
  have generically?
 
 I don't see how you can fail to know what all is.  There's only a
 certain set of facts available.  I mean you could log irrelevant crap
 like a random number that you just picked or the sum of all numeric
 values in the column, but nobody's likely to want that.  What people
 are going to want is the operation performed (insert, update, or
 delete), all the values in the new tuple, the key values from the old
 tuple, the transaction ID, and maybe some meta-information about the
 transaction (such as the commit timestamp).

Some will want all column names included because that makes replication
into different schemas/databases easier, others won't because it makes
replicating the data more complicated and expensive.
Lots will want the primary key as a separate set of columns even for
inserts, others not.
There are also the datatypes of the values and the null representation to consider.

 What I'd probably do is
 emit the data in CSV format, with the first column of each line being
 a single character indicating what sort of row this is: H means a
 header row, defining the format of subsequent rows
 (H,table_name,new_column1,...,new_columnj,old_key_column1,...,old_key_columnk;
 a new header row is emitted only when the column list changes); I, U,
 or D means an insert, update, or delete, with column 2 being the
 transaction ID, column 3 being the table name, and the remaining
 columns matching the last header row for emitted for that table, T
 means meta-information about a transaction, whatever we have (e.g.
 T,txn_id,commit_time).

There's two issues I have with this:
a) CSV seems like a bad format for this. If a transaction inserts into
multiple tables the number of columns will constantly change. Many CSV
parsers don't deal with that all too gracefully. E.g. you can't even
load the data into another postgres database as an audit log.

If we go for CSV I think we should put the entire primary key as one
column (containing all the columns) and the entire row another.

We also don't have any nice facilities for actually writing CSV - so
we'll need to start extracting escaping code from COPY. In the end all
that will make the output plugin very hard to use as an example because
the code will get more complicated.

b) Emitting new row descriptors every time the schema changes will
require keeping track of the schema. I think that won't be trivial. It
also makes consumption of the data more complicated in comparison to
including the description with every row.

Both are even more true once we extend the format to support streaming
of transactions while they are performed.

 But regular people don't want to write C code; they just want to write
 a crappy Perl script.  And I think we can facilitate that without too
 much work.

I think the generic output plugin should be a separate one from the
example one (which is the one included in the patchset).

  Oh, yuck.  So that means you have to write an extra WAL record for
  EVERY heap insert, update, or delete to a catalog table?  OUCH.
 
  Yes. We could integrate it into the main record without too many
  problems, but it didn't seem like an important optimization and it would
  have higher chances of slowing down wal_level  logical.
 
 Hmm.  I don't know whether that's an important optimization or not.

Based on the benchmark I'd say no. If we discover we need to go there we
can do so later. I don't forsee this to be really problematic.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list 

Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 08:49:26 -0400, Robert Haas wrote:
 On Mon, Oct 14, 2013 at 5:07 PM, Andres Freund and...@2ndquadrant.com wrote:
  So, see the attatched benchmark skript. I've always done using a disk
  bound and a memory bound (using eatmydata, preventing fsyncs) run.
 
  * unpatched run, wal_level = hot_standby, eatmydata
  * unpatched run, wal_level = hot_standby
 
  * patched run, wal_level = hot_standby, eatmydata
  * patched run, wal_level = hot_standby
 
  * patched run, wal_level = logical, eatmydata
  * patched run, wal_level = logical
 
  Based on those results, there's no difference above noise for
  wal_level=hot_standby, with or without the patch. With wal_level=logical
  there's a measurable increase in wal traffic (~12-17%), but no
  performance decrease above noise.
 
  From my POV that's ok, those are really crazy catalog workloads.
 
 Any increase in WAL traffic will translate into a performance hit once
 the I/O channel becomes saturated, but I agree those numbers don't
 sound terrible for that faily-brutal test case.

Well, the parallel workloads were fsync saturated although probably not
throughput, that's why I added them. But yes, it's not the same as a
throughput saturated IO channel.
Probably the worst case real-world workload is one that uses lots and
lots of ON COMMIT DROP temporary tables.

 Actually, I was more concerned about the hit on non-catalog workloads.  
 pgbench isn't a
 good test because the key column is so narrow; but suppose we have a
 table like (a text, b integer, c text) where (a, c) is the primary key
 and those strings are typically pretty long - say just short enough
 that we can still index the column.  It'd be worth testing both
 workloads where the primary key doesn't change (so the only overhead
 is figuring out that we need not log it) and those where it does
 (where we're double-logging most of the tuple).  I assume the latter
 has to produce a significant hit to WAL volume, and I don't think
 there's much we can do about that; but the former had better be nearly
 free.

Ah, ok. Then I misunderstood you.

Is there a specific overhead you are afraid of in the
pkey-doesn't-change scenario? The changed wal logging (buffer in a
separate rdata entry) or the check whether the primary key has changed?

The only way I have been able to measure differences in that scenario
was to load a table with a low fillfactor and wide tuples, checkpoint,
and then update lots of rows. On wal_level=logical that will result in
full-page-images and tuple data being logged which can be noticeable if
you have really large tuples, even if the pkey doesn't change.

We could optimize that by not actually logging the tuple data in that
case but just include the tid so we could extract things from the Bkp
block ourselves. But that will complicate the code and doesn't yet seem
warranted.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Triggers on foreign tables

2013-10-15 Thread Robert Haas
On Mon, Oct 14, 2013 at 5:24 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 And, I also want some comments from committers, not only from mine.

 +1

 +1

/me pokes head up.  I know I'm going to annoy people with this
comment, but I feel like it's going to have to be made at some point
by somebody, so here goes: I don't see the point of this feature.  If
you want a trigger on a table, why not set it on the remote side?  A
trigger on the foreign table won't be enforced consistently; it'll
only work when the update is routed through the foreign table, not
when people access the underlying table on the remote side through any
other mechanism.  The number of useful things you can do this way
seems fairly small.  Perhaps you could use a row-level trigger for
RLS, to allow only certain rows on the foreign side to be updated, but
even that seems like a slightly strange design: generally it'd be
better to enforce the security as close to the target object as
possible.

There's another issue that concerns me here also: performance.  IIUC,
an update of N tuples on the remote side currently requires N+1 server
round-trips.  That is unspeakably awful, and we really oughta be
looking for ways to make that number go down, by pushing the whole
update to the remote side.  But obviously that won't be possible if
there's a per-row trigger that has to be evaluated on the local side.
Now, assuming somebody comes up with code that implements that
optimization, we can just disable it when there are local-side
triggers.  But, then you're back to having terrible performance.  So
even if the use case for this seemed really broad, I tend to think
performance concerns would sink most of the possible real-world uses.

I could, of course, be all wet

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 15:17:58 +0200, Andres Freund wrote:
 If we go for CSV I think we should put the entire primary key as one
 column (containing all the columns) and the entire row another.

What about columns like:
* action B|I|U|D|C

* xid
* timestamp

* tablename

* key name
* key column names
* key column types

* new key column values

* column names
* column types
* column values

* candidate_key_changed?
* old key column values

And have output plugin options
* include-column-types
* include-column-names
* include-primary-key

If something isn't included it's simply left out.

What still needs to be determined is:
* how do we separate and escape multiple values in one CSV column
* how do we represent NULLs
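
Just to make that concrete, here is a rough Python sketch of what a consumer
of such rows could look like, picking arbitrary answers to the two open
questions above: normal CSV quoting for the outer columns, space-separated
lists inside the multi-value columns, and an empty string for NULL/absent.
The sample row and all of those conventions are made up purely for
illustration; the real answers are exactly what still needs to be decided.

import csv
import io

# Hypothetical sample row following the column list above: action, xid,
# timestamp, table, key name/columns/types, new key values, column
# names/types/values, candidate_key_changed, old key values.
sample = ('U,1234,2013-10-15 16:00:00,public.accounts,accounts_pkey,'
          'id,integer,42,"id name","integer text","42 foo",f,')

def parse_change(line):
    (action, xid, ts, table,
     key_name, key_cols, key_types, new_key_vals,
     col_names, col_types, col_vals,
     key_changed, old_key_vals) = next(csv.reader(io.StringIO(line)))
    return {'action': action,                     # B|I|U|D|C
            'xid': int(xid),
            'timestamp': ts,
            'table': table,
            'key': dict(zip(key_cols.split(), new_key_vals.split())),
            'columns': dict(zip(col_names.split(), col_vals.split())),
            'key_changed': key_changed == 't'}

print(parse_change(sample))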

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Triggers on foreign tables

2013-10-15 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 /me pokes head up.  I know I'm going to annoy people with this
 comment, but I feel like it's going to have to be made at some point

Perhaps some folks will be annoyed- I'm not annoyed, but I don't
really agree. :)

 by somebody, so here goes: I don't see the point of this feature.  If
 you want a trigger on a table, why not set it on the remote side?  A
 trigger on the foreign table won't be enforced consistently; it'll
 only work when the update is routed through the foreign table, not
 when people access the underlying table on the remote side through any
 other mechanism.  The number of useful things you can do this way
 seems fairly small.  Perhaps you could use a row-level trigger for
 RLS, to allow only certain rows on the foreign side to be updated, but
 even that seems like a slightly strange design: generally it'd be
 better to enforce the security as close to the target object as
 possible.

I can certainly see use-cases for this, a very simple one being a way to
keep track of what's been updated/inserted/whatever through this
particular foreign table (essentially, an auditing table).  The *remote*
side might not be ideal for tracking that information and you might want
the info locally and remotely anyway.

 There's another issue that concerns me here also: performance.  IIUC,
 an update of N tuples on the remote side currently requires N+1 server
 round-trips.  That is unspeakably awful, and we really oughta be
 looking for ways to make that number go down, by pushing the whole
 update to the remote side.  But obviously that won't be possible if
 there's a per-row trigger that has to be evaluated on the local side.
 Now, assuming somebody comes up with code that implements that
 optimization, we can just disable it when there are local-side
 triggers.  But, then you're back to having terrible performance.  So
 even if the use case for this seemed really broad, I tend to think
 performance concerns would sink most of the possible real-world uses.

Performance, while a concern, should probably be secondary when there
are valid use-cases for this where the performance wouldn't be a problem
for users.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 9:17 AM, Andres Freund and...@2ndquadrant.com wrote:
 It allows you to use the shared libary both as a normal extension loaded
 via shared_preload_library or adhoc and as an output plugin which seems
 like a sensible goal.
 We could have a single _PG_init_output_plugin() symbol that fills in
 such a struct which would then not conflict with using the .so
 independently. If you prefer that I'll change things around.

I think part of the problem may be that you're using the library name
to identify the output plugin.  I'm not excited about that design.
For functions, you give the function a name and that is a pointer to
where to actually find the function, which may be a 2-tuple
<library-name, function-name>, or perhaps just a 1-tuple
<builtin-function-name>, or maybe the whole text of a PL/pgsql
procedure that should be compiled.

Perhaps this ought to work similarly.  Create a function in pg_proc
which returns the structure containing the function pointers.  Then,
when that output plugin is selected, it'll automatically trigger
loading the correct shared library if that's needed; and the shared
library name may (but need not) match the output plugin name.

 What I'd probably do is
 emit the data in CSV format, with the first column of each line being
 a single character indicating what sort of row this is: H means a
 header row, defining the format of subsequent rows
 (H,table_name,new_column1,...,new_columnj,old_key_column1,...,old_key_columnk;
 a new header row is emitted only when the column list changes); I, U,
 or D means an insert, update, or delete, with column 2 being the
 transaction ID, column 3 being the table name, and the remaining
 columns matching the last header row for emitted for that table, T
 means meta-information about a transaction, whatever we have (e.g.
 T,txn_id,commit_time).

 There's two issues I have with this:
 a) CSV seems like a bad format for this. If a transaction inserts into
 multiple tables the number of columns will constantly change. Many CSV
 parsers don't deal with that all too gracefully. E.g. you can't even
 load the data into another postgres database as an audit log.

We can pick some other separator.  I don't think ragged CSV is a big
problem; I'm actually more worried about having an easy way to handle
embedded commas and newlines and so on.  But I'd be fine with
tab-separated data or something too, if you think that's better.  What
I want is something that someone can parse with a script that can be
written in a reasonable amount of time in their favorite scripting
language.  I predict that if we provide something like this we'll
vastly expand the number of users who can make use of this new
functionality.

User: So, what's new in PostgreSQL 9.4?
Hacker: Well, now we have logical replication!
User: Why is that cool?
Hacker: Well, streaming replication is awesome for HA, but it has
significant limitations.  And trigger-based systems are very mature,
but the overhead is high and their lack of core integration makes them
hard to use.  With this technology, you can build systems that will
replicate individual tables or even parts of tables, multi-master
systems, and lots of other cool stuff.
User: Wow, that sounds great.  How do I use it?
Hacker: Well, first you write an output plugin in C using a special API.
User: Hey, do you know whether the MongoDB guys came to this conference?

Let's try that again.

User: Wow, that sounds great.  How do I use it?
Hacker: Well, currently, the output gets dumped as a series of text
files that are designed to be parsed using a scripting language.  We
have sample parsers written in Perl and Python that you can use as-is
or hack up to meet your needs.

Now, some users are still going to head for the hills.  But at least
from where I sit it sounds a hell of a lot better than the first
answer.  We're not going to solve all of the tooling problems around
this technology in one release, for sure.  But as far as 95% of our
users are concerned, a C API might as well not exist at all.  People
WILL try to machine parse the output of whatever demo plugins we
provide; so I think we should try hard to provide at least one such
plugin that is designed to make that as easy as possible.
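
To be concrete about how low I think the bar can be: against the
H/I/U/D/T layout I sketched above (which is, to be clear, just a
straw-man format), the whole consumer could be something like this bit
of Python, and the Perl equivalent would be about the same size.

import csv
import sys

def consume(stream):
    headers = {}                        # table name -> column list from last H row
    for row in csv.reader(stream):
        if not row:
            continue
        kind = row[0]
        if kind == 'H':                 # H,table,col1,col2,...
            headers[row[1]] = row[2:]
        elif kind in ('I', 'U', 'D'):   # I,xid,table,val1,val2,...
            xid, table = row[1], row[2]
            values = dict(zip(headers.get(table, []), row[3:]))
            yield kind, int(xid), table, values
        elif kind == 'T':               # T,xid,commit_time,...
            yield kind, int(xid), None, {'meta': row[2:]}

if __name__ == '__main__':
    for change in consume(sys.stdin):
        print(change)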

 If we go for CSV I think we should put the entire primary key as one
 column (containing all the columns) and the entire row another.

 We also don't have any nice facilities for actually writing CSV - so
 we'll need to start extracting escaping code from COPY. In the end all
 that will make the output plugin very hard to use as an example because
 the code will get more complicated.

 b) Emitting new row descriptors every time the schema changes will
 require keeping track of the schema. I think that won't be trivial. It
 also makes consumption of the data more complicated in comparison to
 including the description with every row.

 Both are even more true once we extend the format to support streaming
 of transactions while they are still in progress.

Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Hannu Krosing
On 10/15/2013 01:42 PM, Robert Haas wrote:
 On Mon, Oct 14, 2013 at 9:51 AM, Andres Freund and...@2ndquadrant.com wrote:
 Well, I just think relying on specific symbol names in the .so file is
 kind of unfortunate.  It means that, for example, you can't have
 multiple output plugins provided by a single .so.  And in general I
 think it's something that we've tried to minimize.
 But that's not really different when you rely on _PG_init doing it's
 thing, right?
 Sure, that's true.  But in general I think magic symbol names aren't a
 particularly good design.

 But there's only so much information available here.  Why not just
 have a format that logs it all?
 Because we do not know what all is? Also, how would we handle
 replication sets and such that all of the existing replication solutions
 have generically?
 I don't see how you can fail to know what all is.  
We instinctively know what all is - as in the famous case of the buddhist
ordering a hamburger - Make me All with Everything :) - but the
requirements of different replication systems vary wildly.

 ...
 What people
 are going to want is the operation performed (insert, update, or
 delete), all the values in the new tuple, the key values from the old
 tuple, 
For multi-master / conflict resolution you may also want all old
values to make sure that they have not changed on target.

The difference in WAL volume can be really significant, especially
in the case of DELETE, where there are no new columns.

for some forms of conflict resolution we may even want to know
the database user who initiated the operation. and possibly even
some session variables like very_important=yes.

 ... The point, for me anyway, is that someone can write a
 crappy Perl script to apply changes from a file like this in a day.
 My contention is that there are a lot of people who will want to do
 just that, for one reason or another.  The plugin interface has
 awesome power and flexibility, and really high-performance replication
 solutions will really benefit from that.  But regular people don't
 want to write C code; they just want to write a crappy Perl script.
 And I think we can facilitate that without too much work.
just provide a to-csv or to-json plugin and the crappy perl guys
are happy.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Hannu Krosing
On 10/15/2013 02:47 PM, Andres Freund wrote:
 On 2013-10-15 15:17:58 +0200, Andres Freund wrote:
 If we go for CSV I think we should put the entire primary key as one
 column (containing all the columns) and the entire row another.
just use JSON :)
 What about columns like:
 * action B|I|U|D|C

 * xid
 * timestamp

 * tablename

 * key name
 * key column names
 * key column types

 * new key column values

 * column names
 * column types
 * column values

 * candidate_key_changed?
 * old key column values

 And have output plugin options
 * include-column-types
 * include-column-names
 * include-primary-key

 If something isn't included it's simply left out.

 What still needs to be determined is:
 * how do we separate and escape multiple values in one CSV column
 * how do we represent NULLs

or borrow whatever possible from pg_dump as they have
needed to solve most of the same problems already and
consistency is good in general

 Greetings,

 Andres Freund




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 9:47 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-10-15 15:17:58 +0200, Andres Freund wrote:
 If we go for CSV I think we should put the entire primary key as one
 column (containing all the columns) and the entire row another.

 What about columns like:
 * action B|I|U|D|C

BEGIN and COMMIT?

 * xid
 * timestamp

 * tablename

 * key name
 * key column names
 * key column types

 * new key column values

 * column names
 * column types
 * column values

 * candidate_key_changed?
 * old key column values

Repeating the column names for every row strikes me as a nonstarter.
If the plugin interface isn't rich enough to provide a convenient way
to avoid that, then it needs to be fixed so that it is, because it
will be a common requirement.  Sure, some people may want JSON or XML
output that reiterates the labels every time, but for a lot of people
that's going to greatly increase the size of the output and be
undesirable for that reason.

 What still needs to be determined is:
 * how do we separate and escape multiple values in one CSV column
 * how do we represent NULLs

I consider the escaping a key design decision.  Ideally, it should be
something that's easy to reverse from a scripting language; ideally
also, it should be something similar to how we handle COPY.  These
goals may be in conflict; we'll have to pick something.

I'm not sure that having multiple values in one column is a good plan,
because now you need multiple levels of parsing to unpack the row.
I'd rather just have a flat column list with a key somewhere
explaining how to interpret the data.  But I'm prepared to give in on
that point so long as we can demonstrate that the format can be easily
parsed.
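
For what it's worth, "similar to how we handle COPY" could be as simple
as reusing COPY's text-format conventions: tab between columns, newline
between rows, \N for NULL, and backslash escapes for the separator
characters.  A rough Python sketch of the reverse direction (ignoring
the octal/hex escapes COPY also accepts) is only a handful of lines,
which is about the level of effort I'd like consumers to need:

ESCAPES = {'n': '\n', 't': '\t', 'r': '\r',
           'b': '\b', 'f': '\f', 'v': '\v', '\\': '\\'}

def copy_unescape(field):
    if field == r'\N':
        return None                    # NULL marker
    out, i = [], 0
    while i < len(field):
        c = field[i]
        if c == '\\' and i + 1 < len(field):
            out.append(ESCAPES.get(field[i + 1], field[i + 1]))
            i += 2
        else:
            out.append(c)
            i += 1
    return ''.join(out)

def parse_copy_line(line):
    return [copy_unescape(f) for f in line.rstrip('\n').split('\t')]

print(parse_copy_line('42\tfoo\\tbar\t\\N\n'))   # ['42', 'foo\tbar', None]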

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 10:09 AM, Hannu Krosing ha...@krosing.net wrote:
 I don't see how you can fail to know what all is.
 We instinctively know what all is - as in the famous case of the buddhist
 ordering a hamburger - Make me All with Everything :) - but the
 requirements of different replication systems vary wildly.

That's true to some degree, but let's not exaggerate the degree to
which it is true.

 For multi-master / conflict resolution you may also want all old
 values to make sure that they have not changed on target.

The patch as proposed doesn't make that information available.  If you
want that to be an option, now would be the right time to argue for
it.

 for some forms of conflict resolution we may even want to know
 the database user who initiated the operation. and possibly even
 some session variables like very_important=yes.

Well, if you have requirements like logging very_important=yes, then
you're definitely into the territory where you need your own output
plugin.  I have no problem telling people who want that sort of thing
that they've got to go write C code.  What I'm trying to do, as Larry
Wall once said, is to make simple things simple and hard things
possible.  The output plugin interface accomplishes the latter, but,
by itself, not the former.

 And I think we can facilitate that without too much work.
 just provide a to-csv or to-json plugin and the crappy perl guys
 are happy.

Yep, that's exactly what I'm advocating for.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 10:20:55 -0400, Robert Haas wrote:
  For multi-master / conflict resolution you may also want all old
  values to make sure that they have not changed on target.
 
 The patch as proposed doesn't make that information available.  If you
 want that to be an option, now would be the right time to argue for
 it.

I don't think you necessarily want it for most MM solutions, but I agree
it will be useful for some scenarios.

I think the ReorderBufferChange struct needs a better way to distinguish
between old-key and old-tuple now, but I'd rather implement the
facility for logging the full old tuple in a separate patch. The
patchset is big enough as is, lets not tack on more features.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 10:09:05 -0400, Robert Haas wrote:
 On Tue, Oct 15, 2013 at 9:17 AM, Andres Freund and...@2ndquadrant.com wrote:
  It allows you to use the shared libary both as a normal extension loaded
  via shared_preload_library or adhoc and as an output plugin which seems
  like a sensible goal.
  We could have a single _PG_init_output_plugin() symbol that fills in
  such a struct which would then not conflict with using the .so
  independently. If you prefer that I'll change things around.
 
 I think part of the problem may be that you're using the library name
 to identify the output plugin.  I'm not excited about that design.
 For functions, you give the function a name and that is a pointer to
 where to actually find the function, which may be a 2-tuple
 <library-name, function-name>, or perhaps just a 1-tuple
 <builtin-function-name>, or maybe the whole text of a PL/pgsql
 procedure that should be compiled.

That means you allow trivial remote code execution since you could try
to load system() or something else that's available in every shared
object. Now you can argue that that's OK since we have special checks
for replication connections, but I'd rather not go there.

 Perhaps this ought to work similarly.  Create a function in pg_proc
 which returns the structure containing the function pointers.  Then,
 when that output plugin is selected, it'll automatically trigger
 loading the correct shared library if that's needed; and the shared
 library name may (but need not) match the output plugin name.

I'd like to avoid relying on inserting stuff into pg_proc because that
makes it harder to extract WAL from a HS standby. Requiring to configure
that on the primary to extract data on the standby seems confusing to
me.

But perhaps that's the correct solution :/

 Now, some users are still going to head for the hills.  But at least
 from where I sit it sounds a hell of a lot better than the first
 answer.  We're not going to solve all of the tooling problems around
 this technology in one release, for sure.  But as far as 95% of our
 users are concerned, a C API might as well not exist at all.  People
 WILL try to machine parse the output of whatever demo plugins we
 provide; so I think we should try hard to provide at least one such
 plugin that is designed to make that as easy as possible.

Well, just providing the C API + an example in a first step didn't work
out too badly for FDWs. I am pretty sure that once released there will
soon be extensions for it on PGXN or whatever for special usecases.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 12:13 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 If we think this way, then maybe we should have max_user_connections
 instead of max_connections and then max_wal_connections. But still
 there are others like pg_basebackup that need connections, and
 tomorrow there can be new such entities which need connections.
 Also we might need to have different infrastructure in code to make
 these options available to users.
 I think having different parameters to configure maximum connections
 for different entities can complicate both code as well as user's job.

Renaming max_connections is far too big a compatibility break to
consider without far more benefit than what this patch is aiming at.
I'm not prepared to endure the number of beatings I'd have to take if
we did that.

But I also agree that making max_wal_senders act as both a minimum and
a maximum is no good.  +1 to everything Josh Berkus said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Andres Freund
On 2013-10-15 10:29:58 -0400, Robert Haas wrote:
 On Tue, Oct 15, 2013 at 12:13 AM, Amit Kapila amit.kapil...@gmail.com wrote:
  If we think this way, then maybe we should have max_user_connections
  instead of max_connections and then max_wal_connections. But still
  there are others like pg_basebackup that need connections, and
  tomorrow there can be new such entities which need connections.
  Also we might need to have different infrastructure in code to make
  these options available to users.
  I think having different parameters to configure maximum connections
  for different entities can complicate both code as well as user's job.
 
 Renaming max_connections is far too big a compatibility break to
 consider without far more benefit than what this patch is aiming at.
 I'm not prepared to endure the number of beatings I'd have to take if
 we did that.

+many

 But I also agree that making max_wal_senders act as both a minimum and
 a maximum is no good.  +1 to everything Josh Berkus said.

Josh said we should treat replication connections in a separate pool
from normal database connections, right? So you withdraw your earlier
objection to that?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 10:27 AM, Andres Freund and...@2ndquadrant.com wrote:
 I think part of the problem may be that you're using the library name
 to identify the output plugin.  I'm not excited about that design.
 For functions, you give the function a name and that is a pointer to
 where to actually find the function, which may be a 2-tuple
 <library-name, function-name>, or perhaps just a 1-tuple
 <builtin-function-name>, or maybe the whole text of a PL/pgsql
 procedure that should be compiled.

 That means you allow trivial remote code execution since you could try
 to load system() or something else that's available in every shared
 object. Now you can argue that that's OK since we have special checks
 for replication connections, but I'd rather not go there.

Well, obviously you can't let somebody load any library they want.
But that's pretty much true anyway; LOAD had better be confined to
superusers unless there is something (like a pg_proc entry) that
provides prior authorization for that specific load.

 Perhaps this ought to work similarly.  Create a function in pg_proc
 which returns the structure containing the function pointers.  Then,
 when that output plugin is selected, it'll automatically trigger
 loading the correct shared library if that's needed; and the shared
 library name may (but need not) match the output plugin name.

 I'd like to avoid relying on inserting stuff into pg_proc because that
 makes it harder to extract WAL from a HS standby. Requiring to configure
 that on the primary to extract data on the standby seems confusing to
 me.

 But perhaps that's the correct solution :/

That's a reasonable concern.  I don't have another idea at the moment,
unless we want to allow replication connections to issue LOAD
commands.  Then you can LOAD the library, so that the plug-in is
registered under the well-known name you expect it to have, and then
use that name to start replication.

 Now, some users are still going to head for the hills.  But at least
 from where I sit it sounds a hell of a lot better than the first
 answer.  We're not going to solve all of the tooling problems around
 this technology in one release, for sure.  But as far as 95% of our
 users are concerned, a C API might as well not exist at all.  People
 WILL try to machine parse the output of whatever demo plugins we
 provide; so I think we should try hard to provide at least one such
 plugin that is designed to make that as easy as possible.

 Well, just providing the C API + an example in a first step didn't work
 out too badly for FDWs. I am pretty sure that once released there will
 soon be extensions for it on PGXN or whatever for special usecases.

I suspect so, too.  But I also think that if that's the only thing
available in the first release, a lot of users will get a poor initial
impression.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Release note fix for timeline item

2013-10-15 Thread Bruce Momjian
On Tue, Oct 15, 2013 at 02:32:47PM +0900, KONDO Mitsumasa wrote:
 Sorry for my reply late...
 
 (2013/10/08 23:26), Bruce Momjian wrote:
  First, I want to apologize for not completing the release notes earlier
  so that others could review them.  I started working on the release
  notes on Friday, but my unfamiliarity with the process and fear of
  making a mistake caused many delays.  I have improved the documentation
  on the process which will hopefully help next time.
 There isn't anything in particular that I was dissatisfied about.
 
  You are right that there are a lot of details skipped in the release note
 text.  I have developed the attached patch which I think does a better
 job.  Is it OK?
 Yes, of course!
 Thanks for your sincere action!

Thanks, patch applied back through 9.1.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 10:34 AM, Andres Freund and...@2ndquadrant.com wrote:
 But I also agree that making max_wal_senders act as both a minimum and
 a maximum is no good.  +1 to everything Josh Berkus said.

 Josh said we should treat replication connections in a separate pool
 from normal database connections, right? So you withdraw your earlier
 objection to that?

I don't think that's what he said.  Here's what I was referring to:

$ Changing max_wal_senders requires a restart.  As such, we currently
$ advise users to set the setting generously: as many replication
$ connections as you think you'll ever need, plus two.  If
$ max_wal_senders is a reservation which could cause the user to run out
$ of other connections sooner than expected, then the user is faced with a
$ new hard to set parameter: they don't want to set it too high *or* too
$ low.
$
$ This would result in a lot of user frustration as they try to get their
$ connection configuration right and have to restart the server multiple
$ times.  I find few new features worth making it *harder* to configure
$ PostgreSQL, and reserved replication connections certainly don't qualify.
$
$ If it's worth having reserved replication connections (and I can see
$ some reasons to want it), then we need a new GUC for this:
$ reserved_walsender_connections

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Andres Freund
On 2013-10-15 10:36:41 -0400, Robert Haas wrote:
 On Tue, Oct 15, 2013 at 10:34 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  But I also agree that making max_wal_senders act as both a minimum and
  a maximum is no good.  +1 to everything Josh Berkus said.
 
  Josh said we should treat replication connections in a separate pool
  from normal database connections, right? So you withdraw your earlier
  objection to that?
 
 I don't think that's what he said.  Here's what I was referring to:
 
 $ Changing max_wal_senders requires a restart.  As such, we currently
 $ advise users to set the setting generously: as many replication
 $ connections as you think you'll ever need, plus two.  If
 $ max_wal_senders is a reservation which could cause the user to run out
 $ of other connections sooner than expected, then the user is faced with a
 $ new hard to set parameter: they don't want to set it too high *or* too
 $ low.
 $
 $ This would result in a lot of user frustration as they try to get their
 $ connection configuration right and have to restart the server multiple
 $ times.  I find few new features worth making it *harder* to configure
 $ PostgreSQL, and reserved replication connections certainly don't qualify.
 $
 $ If it's worth having reserved replication connections (and I can see
 $ some reasons to want it), then we need a new GUC for this:
 $ reserved_walsender_connections

I am referring to
http://archives.postgresql.org/message-id/525C31D3.3010006%40agliodbs.com
:
 On 10/14/2013 10:51 AM, Andres Freund wrote:
  Imo the complications around this prove my (way earlier) point that it'd
  be much better to treat replication connections as something entirely
  different to normal SQL connections. There's really not much overlap
  here and while there's some philosophical point to be made about it all
  being connections, from a practical POV treating them separately seems
  better.
 
 Given that replication connections don't even appear in pg_stat_activity
 now, I'd agree with you.


I still fail to see what the point is in treating those classes of
connections together. It only serves to confuse users and makes
considerations way more complicated. There's no need for reserved
replication connections or anything like it if we separate the pools.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doc Patch: Subquery section to say that subqueries can't modify data

2013-10-15 Thread Vik Fearing
On 08/06/2013 11:03 PM, Karl O. Pinc wrote:
 The attached documentation patch, doc-subqueries-v1.patch,
 applies against head.

 I wanted to document that subqueries can't modify data.
 This is mentioned in the documentation for SELECT and
 implied elsewhere but I was looking for something more
 than an 'in-passing' mention.  

 (I wrote a bad query,
 modifying data in a subquery, couldn't recall where
 it was documented that you can't do this, and couldn't
 find the answer from the TOC or the index.  Now that
 there's lots of statements with RETURNING clauses
 it's natural to want to use them in subqueries.)

Hello, I am (finally) reviewing this patch.

After reading your reasoning, David's rebuttal, and the patch itself,
I'm wondering if this is needed or wanted at all.

Supposing it is wanted, it creates more questions than it answers.  The
two biggies are:

* In what other contexts can tabular subqueries be used?
* What are other ways of integrating data returned by data modification
statements?

On a superficial level I find the number of commas a bit clunky, and
parentheses is misspelled.

 The last 2 sentences of the first paragraph are
 something in the way of helpful hints and may not
 be appropriate, or even accurate.  I've left them in 
 for review.

I think the last sentence (of the first paragraph) is a bit much, but
the penultimate seems fine.

I'm attaching an updated patch that I think is an improvement but it's
still at a draft level and needs more copyediting.  This new patch does
not attempt to answer the two questions above.

-- 
Vik

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index c32c857..b134b66 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -549,7 +549,7 @@ SELECT * FROM my_table AS m WHERE my_table.a &gt; 5;-- wrong
 SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;
 </programlisting>
  Additionally, an alias is required if the table reference is a
- subquery (see <xref linkend=queries-subqueries>).
+ subquery (see <xref linkend=queries-subquery-derived-tables>).
 </para>
 
 <para>
@@ -590,10 +590,10 @@ SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
 </para>
 </sect3>
 
-   <sect3 id=queries-subqueries>
-<title>Subqueries</title>
+   <sect3 id=queries-subquery-derived-tables>
+<title>Subquery Derived Tables</title>
 
-<indexterm zone=queries-subqueries>
+<indexterm zone=queries-subquery-derived-tables>
  <primary>subquery</primary>
 </indexterm>
 
@@ -1315,6 +1315,44 @@ SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceab
  </sect1>
 
 
+ <sect1 id=queries-subqueries>
+  <title>Subqueries</title>
+
+  <indexterm zone=queries-subqueries>
+   <primary>subquery</primary>
+  </indexterm>
+
+  <indexterm zone=queries-subqueries>
+   <primary>sub-select</primary>
+  </indexterm>
+
+  <para>
+   Subqueries, also called sub-selects, are queries written within
+   parentheses in the text of larger queries.  The values produced by
+   subqueries may be scalar or tabular.  Scalar subqueries are used within expressions as described
+   in <xref linkend=sql-syntax-scalar-subqueries>.
+   Tabular subqueries may substitute for tables as described
+   in <xref linkend=queries-subquery-derived-tables>, generate array
+   content as described
+   in <xref linkend=sql-syntax-array-constructors>, have their
+   result content tested within expressions as described
+   in <xref linkend=functions-subquery>, or be used in other
+   contexts.  Often either joins or subqueries can be used to produce
+   different query plans yielding identical output.
+  </para>
+
+  <para>
+   Subqueries may not modify database
+   content.  <link linkend=queries-with>Common Table
+   Expressions</link> are one way to integrate data returned by data
+   modification statements,
+   i.e. <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command>
+   statements with <literal>RETURNING</literal> clauses, into larger
+   queries.
+  </para>
+ </sect1>
+
+
  <sect1 id=queries-union>
  <title>Combining Queries</title>
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 10:15:14 -0400, Robert Haas wrote:
 On Tue, Oct 15, 2013 at 9:47 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-10-15 15:17:58 +0200, Andres Freund wrote:
  If we go for CSV I think we should put the entire primary key as one
  column (containing all the columns) and the entire row another.
 
  What about columns like:
  * action B|I|U|D|C
 
 BEGIN and COMMIT?

That's B and C, yes. You'd rather not have them? When would you replay
the commit without an explicit message telling you to?

 Repeating the column names for every row strikes me as a nonstarter.
 [...]
 Sure, some people may want JSON or XML
 output that reiterates the labels every time, but for a lot of people
 that's going to greatly increase the size of the output and be
 undesirable for that reason.

But I argue that most simpler users - which are exactly the ones a
generic output plugin is aimed at - will want all column names since it
makes replay far easier.

 If the plugin interface isn't rich enough to provide a convenient way
 to avoid that, then it needs to be fixed so that it is, because it
 will be a common requirement.

Oh, it surely is possible to avoid repeating it. The output plugin
interface simply gives you a relcache entry, that contains everything
necessary.
The output plugin would need to keep track of whether it has output data
for a specific relation and it would need to check whether the table
definition has changed, but I don't see how we could avoid that?

  What still need to be determined is:
  * how do we separate and escape multiple values in one CSV column
  * how do we represent NULLs
 
 I consider the escaping a key design decision.  Ideally, it should be
 something that's easy to reverse from a scripting language; ideally
 also, it should be something similar to how we handle COPY.  These
 goals may be in conflict; we'll have to pick something.

Note that parsing COPYs is a major PITA from most languages...

Perhaps we should make the default output json instead? With every
action terminated by a nullbyte?
That's probably easier to parse from various scripting languages than
anything else.
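
E.g. something like the following Python would be the entire consumer
for that hypothetical framing (one JSON document per change, each
terminated by a null byte); the framing itself is of course still up
for discussion.

import json

def changes(stream, bufsize=8192):
    # Yield decoded change records from a NUL-terminated JSON stream.
    buf = b''
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            break
        buf += chunk
        while b'\x00' in buf:
            doc, _, buf = buf.partition(b'\x00')
            yield json.loads(doc.decode('utf-8'))

# for change in changes(open('decoded_changes', 'rb')):
#     handle(change)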

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 10:34:53 -0400, Robert Haas wrote:
 On Tue, Oct 15, 2013 at 10:27 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I think part of the problem may be that you're using the library name
  to identify the output plugin.  I'm not excited about that design.
  For functions, you give the function a name and that is a pointer to
  where to actually find the function, which may be a 2-tuple
  <library-name, function-name>, or perhaps just a 1-tuple
  <builtin-function-name>, or maybe the whole text of a PL/pgsql
  procedure that should be compiled.
 
  That means you allow trivial remote code execution since you could try
  to load system() or something else that's available in every shared
  object. Now you can argue that that's OK since we have special checks
  for replication connections, but I'd rather not go there.
 
 Well, obviously you can't let somebody load any library they want.
 But that's pretty much true anyway; LOAD had better be confined to
 superusers unless there is something (like a pg_proc entry) that
 provides prior authorization for that specific load.

Currently you can create users that have permissions for replication but
which are not superusers. I think we should strive to providing that
capability for changeset extraction as well.

  Perhaps this ought to work similarly.  Create a function in pg_proc
  which returns the structure containing the function pointers.  Then,
  when that output plugin is selected, it'll automatically trigger
  loading the correct shared library if that's needed; and the shared
  library name may (but need not) match the output plugin name.
 
  I'd like to avoid relying on inserting stuff into pg_proc because that
  makes it harder to extract WAL from a HS standby. Requiring to configure
  that on the primary to extract data on the standby seems confusing to
  me.
 
  But perhaps that's the correct solution :/
 
 That's a reasonable concern.  I don't have another idea at the moment,
 unless we want to allow replication connections to issue LOAD
 commands.  Then you can LOAD the library, so that the plug-in is
 registered under the well-known name you expect it to have, and then
 use that name to start replication.

But what's the advantage of that over the current situation or one where
PG_load_output_plugin() is called? The current and related
implementations allow you to only load libraries in some designated
postgres directories and it doesn't allow you to call any arbitrary
functions in there.

Would you be content with a symbol PG_load_output_plugin being called
that fills out the actual callbacks?

  Now, some users are still going to head for the hills.  But at least
  from where I sit it sounds a hell of a lot better than the first
  answer.  We're not going to solve all of the tooling problems around
  this technology in one release, for sure.  But as far as 95% of our
  users are concerned, a C API might as well not exist at all.  People
  WILL try to machine parse the output of whatever demo plugins we
  provide; so I think we should try hard to provide at least one such
  plugin that is designed to make that as easy as possible.
 
  Well, just providing the C API + an example in a first step didn't work
  out too badly for FDWs. I am pretty sure that once released there will
  soon be extensions for it on PGXN or whatever for special usecases.
 
 I suspect so, too.  But I also think that if that's the only thing
 available in the first release, a lot of users will get a poor initial
 impression.

I think lots of people will expect a builtin logical replication
solution :/. Which seems a tad unlikely to arrive in 9.4.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 5:15 AM, Robert Haas robertmh...@gmail.com wrote:
 Well, the SQL standard way of doing this type of operation is MERGE.
 The alternative we know exists in other databases is REPLACE; there's
 also INSERT .. ON DUPLICATE KEY update.  In all of those cases,
 whatever weirdness exists around MVCC is confined to that one command.
  I tend to think we should do similarly, with the goal that
 HeapTupleSatisfiesMVCC need not change at all.

I don't think that it's very pragmatic to define success in terms of
not modifying a single visibility function. I feel it would be more
useful to define it as providing acceptable, non-surprising semantics,
while not regressing performance in other areas.

The fact remains that you're going to have to create a new snapshot
type even for this special case, so I don't see any win as regards
managing invasiveness here. Quite the contrary, in fact.

 I don't have the only vote here, of course, but my feeling is that
 that's more likely to be a good route.

Naturally we all want MERGE. It seems self-defeating to insist on
something significantly harder that there is significant less demand
for, though. I thought that there was at least informal agreement that
this sort of approach was preferable to MERGE in its full generality,
based on feedback at the 2012 developer meeting. I really don't think
that what I've done here is any worse than INSERT...ON DUPLICATE KEY
UPDATE in any of the areas you express concern about here. REPLACE has
some serious problems, and I just don't see it as a viable alternative
at all - just ask any MySQL user.

MERGE is of course more flexible than what I have here in some ways, but
actually less flexible in other ways. I think that the real point of
MERGE is that it's defined in a way that serves data warehousing use
cases very well: the semantics constrain things such that the executor
only has to execute a single ModifyTable node that does inserts,
updates and deletes in a single scan. That's great, but what if it's
useful to do that CRUD (yes, this can include selects) to entirely
different tables? Or what if the relevant DML will only come in a
later statement in the same transaction?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 10:56 AM, Andres Freund and...@2ndquadrant.com wrote:
  That means you allow trivial remote code execution since you could try
  to load system() or something else that's available in every shared
  object. Now you can argue that that's OK since we have special checks
  for replication connections, but I'd rather not go there.

 Well, obviously you can't let somebody load any library they want.
 But that's pretty much true anyway; LOAD had better be confined to
 superusers unless there is something (like a pg_proc entry) that
 provides prior authorization for that specific load.

 Currently you can create users that have permissions for replication but
 which are not superusers. I think we should strive to providing that
 capability for changeset extraction as well.

I agree.

  Perhaps this ought to work similarly.  Create a function in pg_proc
  which returns the structure containing the function pointers.  Then,
  when that output plugin is selected, it'll automatically trigger
  loading the correct shared library if that's needed; and the shared
  library name may (but need not) match the output plugin name.
 
  I'd like to avoid relying on inserting stuff into pg_proc because that
  makes it harder to extract WAL from a HS standby. Requiring to configure
  that on the primary to extract data on the standby seems confusing to
  me.
 
  But perhaps that's the correct solution :/

 That's a reasonable concern.  I don't have another idea at the moment,
 unless we want to allow replication connections to issue LOAD
 commands.  Then you can LOAD the library, so that the plug-in is
 registered under the well-known name you expect it to have, and then
 use that name to start replication.

 But what's the advantage of that over the current situation or one where
 PG_load_output_plugin() is called? The current and related
 implementations allow you to only load libraries in some designated
 postgres directories and it doesn't allow you to call any arbitrary
 functions in there.

Well, I've already said why I don't like conflating the library name
and the plugin name.  It rules out core plugins and libraries that
provide multiple plugins.  I don't have anything to add to that.

 Would you be content with a symbol PG_load_output_plugin being called
 that fills out the actual callbacks?

Well, it doesn't fix the muddling of library names with output plugin
names, but I suppose I'd find it a modest improvement.

  Well, just providing the C API + an example in a first step didn't work
  out too badly for FDWs. I am pretty sure that once released there will
  soon be extensions for it on PGXN or whatever for special usecases.

 I suspect so, too.  But I also think that if that's the only thing
 available in the first release, a lot of users will get a poor initial
 impression.

 I think lots of people will expect a builtin logical replication
 solution :/. Which seems a tad unlikely to arrive in 9.4.

Yep.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread k...@rice.edu
On Tue, Oct 15, 2013 at 11:02:39AM -0400, Robert Haas wrote:
  goals may be in conflict; we'll have to pick something.
 
  Note that parsing COPYs is a major PITA from most languages...
 
  Perhaps we should make the default output json instead? With every
  action terminated by a nullbyte?
  That's probably easier to parse from various scripting languages than
  anything else.
 
 I could go for that.  It's not quite as compact as I might hope, but
 JSON does seem to make people awfully happy.
 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
 

Feeding such a JSON stream into a compression algorithm like lz4 or
snappy should result in a pretty compact stream. The latest lz4 updates
also have the ability to use a pre-existing dictionary, which would really
help remove the redundant pieces.
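
Python's zlib (standing in here for lz4/snappy, whose bindings expose the
same preset-dictionary idea) shows the effect: seed the compressor with a
dictionary containing the labels that repeat in every change record, and a
stream of short, similar JSON documents compresses far better than the
documents would individually.  This is only an illustration with made-up
record contents, not a suggestion that we actually use zlib.

import json
import zlib

# A preset dictionary containing strings expected to recur in every record.
DICTIONARY = b'{"action": "xid": "table": "columns": "values": "id", "name"'

def compress_changes(changes):
    comp = zlib.compressobj(6, zdict=DICTIONARY)
    out = b''
    for change in changes:
        out += comp.compress(json.dumps(change).encode('utf-8') + b'\x00')
    return out + comp.flush()
    # (the other side would use zlib.decompressobj(zdict=DICTIONARY))

sample = [{'action': 'I', 'xid': 1000 + i, 'table': 'accounts',
           'columns': ['id', 'name'], 'values': [i, 'user%d' % i]}
          for i in range(1000)]
compressed = compress_changes(sample)
raw = sum(len(json.dumps(c)) + 1 for c in sample)
print('raw: %d bytes, compressed: %d bytes' % (raw, len(compressed)))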

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Andres Freund
On 2013-10-15 11:11:24 -0400, Robert Haas wrote:
 I'm not saying go implement MERGE.  I'm saying, make the
 insert-or-update operation a single statement, using some syntax TBD,
 instead of requiring the use of a new insert statement that makes
 invisible rows visible as a side effect, so that you can wrap that in
 a CTE and feed it to an update statement.  That's complex and, AFAICS,
 unlike how any other database product handles this.

I think we most definitely should provide a single statement
variant. That's the one users yearn for.

I also would like a variant where I can lock a row on conflict, for
multimaster scenarios, but that doesn't necessarily have to be exposed
to SQL.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CF 2013-09 Wrap Up

2013-10-15 Thread Mike Blackwell
On Tue, Oct 15, 2013 at 1:39 AM, Noah Misch n...@leadboat.com wrote:

 On Mon, Oct 14, 2013 at 01:56:42PM -0500, Mike Blackwell wrote: Any
 patches marked Needs Review will be automatically moved to the next CF.
   We will try to make sure that all patches in the current CF have
 received
  at least one review.

 The combined effect of those two statements is not clear to me.  Does that
 mean you'll retain never-reviewed patches and automatically move patches
 that
 have received at least one review?


Yes on the latter part.  We will try to get a quick review for
not-yet-reviewed patches and move or return them based on the result of
that review.  If we fail to find a reviewer, the patches will get moved to
the next CF.

For those following along, here are the patches still needing a first look.
 They are for the most part performance or internals patches and could use
the eye of someone more experienced.  Please consider a quick review of one
of them if you fit that description.  We'd like everyone to get a fair
shake here. ^_^

HStore Gin Speedup - https://commitfest.postgresql.org/action/patch_view?id=1203
Performance Improvement by reducing WAL for Update Operation - https://commitfest.postgresql.org/action/patch_view?id=1209
[PoC] pgstattuple2: block sampling to reduce physical read - https://commitfest.postgresql.org/action/patch_view?id=1226
ECPG cursor readahead - https://commitfest.postgresql.org/action/patch_view?id=1195


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 11:02:39 -0400, Robert Haas wrote:
  If the plugin interface isn't rich enough to provide a convenient way
  to avoid that, then it needs to be fixed so that it is, because it
  will be a common requirement.
 
  Oh, it surely is possible to avoid repeating it. The output plugin
  interface simply gives you a relcache entry, that contains everything
  necessary.
  The output plugin would need to keep track of whether it has output data
  for a specific relation and it would need to check whether the table
  definition has changed, but I don't see how we could avoid that?
 
 Well, it might be nice if there were a callback for, hey, schema has
 changed!  Seems like a lot of plugins will want to know that for one
 reason or another, and rechecking for every tuple sounds expensive.

I don't really see how we could provide that in any useful manner. We
could provide a callback that is called whenever another transaction has
changed the schema, but there's nothing that can easily be done about schema
changes by the replayed transaction itself. And those are the only ones
where meaningful schema changes can happen since the locks the source
transaction has held will prevent most other schema changes.

As much as I hate such code, I guess checking (and possibly storing) the
ctid||xmin of the pg_class row is the easiest thing we could do :(.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 8:07 AM, Peter Geoghegan p...@heroku.com wrote:
 Naturally we all want MERGE. It seems self-defeating to insist on
 something significantly harder that there is significantly less demand
 for, though.

I hasten to add: which is not to imply that you're insisting rather
than expressing a sentiment.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 10:48 AM, Andres Freund and...@2ndquadrant.com wrote:
  What about columns like:
  * action B|I|U|D|C

 BEGIN and COMMIT?

 That's B and C, yes. You'd rather not have them? When would you replay
 the commit without an explicit message telling you to?

No, BEGIN and COMMIT sounds good, actually.  Just wanted to make sure
I understood.

 Repeating the column names for every row strikes me as a nonstarter.
 [...]
 Sure, some people may want JSON or XML
 output that reiterates the labels every time, but for a lot of people
 that's going to greatly increase the size of the output and be
 undesirable for that reason.

 But I argue that most simpler users - which are exactly the ones a
 generic output plugin is aimed at - will want all column names since it
 makes replay far easier.

Meh, maybe.

 If the plugin interface isn't rich enough to provide a convenient way
 to avoid that, then it needs to be fixed so that it is, because it
 will be a common requirement.

 Oh, it surely is possible to avoid repeating it. The output plugin
 interface simply gives you a relcache entry, that contains everything
 necessary.
 The output plugin would need to keep track of whether it has output data
 for a specific relation and it would need to check whether the table
 definition has changed, but I don't see how we could avoid that?

Well, it might be nice if there were a callback for, hey, schema has
changed!  Seems like a lot of plugins will want to know that for one
reason or another, and rechecking for every tuple sounds expensive.

  What still need to be determined is:
  * how do we separate and escape multiple values in one CSV column
  * how do we represent NULLs

 I consider the escaping a key design decision.  Ideally, it should be
 something that's easy to reverse from a scripting language; ideally
 also, it should be something similar to how we handle COPY.  These
 goals may be in conflict; we'll have to pick something.

 Note that parsing COPYs is a major PITA from most languages...

 Perhaps we should make the default output json instead? With every
 action terminated by a nullbyte?
 That's probably easier to parse from various scripting languages than
 anything else.

I could go for that.  It's not quite as compact as I might hope, but
JSON does seem to make people awfully happy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 11:07 AM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Oct 15, 2013 at 5:15 AM, Robert Haas robertmh...@gmail.com wrote:
 Well, the SQL standard way of doing this type of operation is MERGE.
 The alternative we know exists in other databases is REPLACE; there's
 also INSERT .. ON DUPLICATE KEY update.  In all of those cases,
 whatever weirdness exists around MVCC is confined to that one command.
  I tend to think we should do similarly, with the goal that
 HeapTupleSatisfiesMVCC need not change at all.

 I don't think that it's very pragmatic to define success in terms of
 not modifying a single visibility function. I feel it would be more
 useful to define it as providing acceptable, non-surprising semantics,
 while not regressing performance in other areas.

 The fact remains that you're going to have to create a new snapshot
 type even for this special case, so I don't see any win as regards
 managing invasiveness here. Quite the contrary, in fact.

Well, we might have to agree to disagree.

 I don't have the only vote here, of course, but my feeling is that
 that's more likely to be a good route.

 Naturally we all want MERGE. It seems self-defeating to insist on
 something significantly harder that there is significantly less demand
 for, though. I thought that there was at least informal agreement that
 this sort of approach was preferable to MERGE in its full generality,
 based on feedback at the 2012 developer meeting. I really don't think
 that what I've done here is any worse than INSERT...ON DUPLICATE KEY
 UPDATE in any of the areas you express concern about here. REPLACE has
 some serious problems, and I just don't see it as a viable alternative
 at all - just ask any MySQL user.

 MERGE is of course more flexible than what I have here in some ways, but
 actually less flexible in other ways. I think that the real point of
 MERGE is that it's defined in a way that serves data warehousing use
 cases very well: the semantics constrain things such that the executor
 only has to execute a single ModifyTable node that does inserts,
 updates and deletes in a single scan. That's great, but what if it's
 useful to do that CRUD (yes, this can include selects) to entirely
 different tables? Or what if the relevant DML will only come in a
 later statement in the same transaction?

I'm not saying go implement MERGE.  I'm saying, make the
insert-or-update operation a single statement, using some syntax TBD,
instead of requiring the use of a new insert statement that makes
invisible rows visible as a side effect, so that you can wrap that in
a CTE and feed it to an update statement.  That's complex and, AFAICS,
unlike how any other database product handles this.

Again, other people can have different opinions on this, and that's
fine.  I'm just giving you mine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Josh Berkus
On 10/15/2013 05:52 AM, Magnus Hagander wrote:
 But the argument about being friendly for new users should definitely
 have us change wal_level and max_wal_senders.

+1 for having replication supported out-of-the-box aside from pg_hba.conf.

To put it another way: users are more likely to care about replication
than they are about IO overhead on a non-replicated server.  And for the
users who care about IO overhead, they are more likely to muck about in
pg.conf *anyway* in order to set a slew of performance-tuning settings.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Andres Freund
On 2013-10-15 10:19:06 -0700, Josh Berkus wrote:
 On 10/15/2013 05:52 AM, Magnus Hagander wrote:
  But the argument about being friendly for new users should definitely
  have us change wal_level and max_wal_senders.
 
 +1 for having replication supported out-of-the-box aside from pg_hba.conf.
 
 To put it another way: users are more likely to care about replication
 than they are about IO overhead on a non-replicated server.  And for the
 users who care about IO overhead, they are more likely to muck about in
 pg.conf *anyway* in order to set a slew of performance-tuning settings.

But it will hurt people restoring backups using pg_restore -j. I think
people might be rather disappointed if that slows down by a factor of
three.

I think we really need to get to the point where we increase the wal
level on demand...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Magnus Hagander
On Tue, Oct 15, 2013 at 7:26 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-10-15 10:19:06 -0700, Josh Berkus wrote:
 On 10/15/2013 05:52 AM, Magnus Hagander wrote:
  But the argument about being friendly for new users should definitely
  have us change wal_level and max_wal_senders.

 +1 for having replication supported out-of-the-box aside from pg_hba.conf.

 To put it another way: users are more likely to care about replication
 than they are about IO overhead on a non-replicated server.  And for the
  users who care about IO overhead, they are more likely to muck about in
 pg.conf *anyway* in order to set a slew of performance-tuning settings.

 But it will hurt people restoring backups using pg_restore -j. I think
  people might be rather disappointed if that slows down by a factor of
 three.

 I think we really need to get to the point where we increase the wal
  level on demand...

Yeah, there are really two things.

If we can increase wal_level on demand, that would solve one of them.
Turning that into a SIGHUP parameter would be great. I have no idea
how hard it would be. In theory, couldn't we let it be sighup and then
just have do_pg_start_backup() block until all backends have
acknowledged that they are on the new WAL level somehow? (Yes, I
realize this might be a big simplification, but I'm allowed to hope,
no?)

The other problem is max_wal_senders. I think that's a much smaller
problem - setting that one to 5 or so by default shouldn't have a big
impact. But without the wal_level changes, it would also be mostly
pointless...
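
(For reference, the settings being discussed, with their stock defaults; both
currently require a restart to change:)

    SHOW wal_level;        -- 'minimal' on a default install; streaming or base
                           -- backups need at least 'archive' or 'hot_standby'
    SHOW max_wal_senders;  -- 0 by default, so no walsender can connect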

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Andres Freund
On 2013-10-15 10:19:17 -0700, Peter Geoghegan wrote:
 On Tue, Oct 15, 2013 at 9:56 AM, Robert Haas robertmh...@gmail.com wrote:
  Well, I don't know that any of us can claim to have a lock on what the
  syntax should look like.
 
 Sure. But it's not just syntax. We're talking about functional
 differences too, since you're talking about mandating an update, which
 is a not the same as an update locked row only conditionally, or a
 delete.

I think anything that only works by breaking visibility rules that way
is a nonstarter. Doing that from the C level is one thing, exposing it
this way seems a bad idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Andres Freund
On 2013-10-15 19:29:50 +0200, Magnus Hagander wrote:
 On Tue, Oct 15, 2013 at 7:26 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-10-15 10:19:06 -0700, Josh Berkus wrote:
  On 10/15/2013 05:52 AM, Magnus Hagander wrote:
   But the argument about being friendly for new users should definitely
   have us change wal_level and max_wal_senders.
 
  +1 for having replication supported out-of-the-box aside from pg_hba.conf.
 
  To put it another way: users are more likely to care about replication
  than they are about IO overhead on a non-replicated server.  And for the
   users who care about IO overhead, they are more likely to muck about in
  pg.conf *anyway* in order to set a slew of performance-tuning settings.
 
  But it will hurt people restoring backups using pg_restore -j. I think
   people might be rather disappointed if that slows down by a factor of
  three.
 
  I think we really need to get to the point where we increase the wal
   level on demand...
 
 Yeah, there are really two things.
 
 If we can increase wal_level on demand, that would solve one of them.
 Turning that into a SIGHUP parameter would be great. I have no idea
 how hard it would be. In theory, couldn't we let it be sighup and then
 just have do_pg_start_backup() block until all backends have
 acknowledged that they are on the new WAL level somehow? (Yes, I
 realize this might be a big simplification, but I'm allowed to hope,
 no?)

Depends on what you want to support. For basebackups, that should be
doable with some pullups.
It's unfortunately more complex than that for streaming rep - we really
need persistent standby registration there. Otherwise the wal_level will
fall back to minimal when the standby disconnects which will obviously
break the standby.

 The other problem is max_wal_senders. I think that's a much smaller
 problem - setting that one to 5 or so by default shouldn't have a big
 impact. But without the wal_level changes, it would also be mostly
 pointless...

Well, you currently cannot even set it when the wal_level isn't set
appropriately, but that should be easy enough to change.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Josh Berkus
On 10/15/2013 07:36 AM, Robert Haas wrote:
 On Tue, Oct 15, 2013 at 10:34 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
 Josh said we should treat replication connections in a separate pool
 from normal database connections, right? So you withdraw your earlier
 objection to that?
 
 I don't think that's what he said.  Here's what I was referring to:

To clarify: I do, indeed, support the idea of treating replication
connections as a pool outside of max_connections.  Here's why:

FATAL: connection limit exceeded for non-superusers

SHOW max_connections;

100

SELECT COUNT(*) FROM pg_stat_activity;

94

SHOW superuser_reserved_connections;

3



... search around quite a bit,  eventually figure out that you have
three replication connections open.  We've already set up an illogical
and hard-to-troubleshoot situation where replication connections do not
appear in pg_stat_activity, yet they are counted against max_connections.

You could argue that the same is true of superuser_reserved_connections,
but there's a couple reasons why it isn't:

1) if superusers are actually connected, that shows up in
pg_stat_activity (and given how many of our users run their apps as
superuser, they get to max_connections out anyway).

2) the error message spells out that there may be superuser connections
available.
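
(For anyone troubleshooting the scenario above: the walsenders are at least
visible in pg_stat_replication, which is where the three missing connections
can be found:)

    SELECT * FROM pg_stat_replication;   -- lists the active walsender connections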

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Magnus Hagander
On Tue, Oct 15, 2013 at 7:32 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-10-15 19:29:50 +0200, Magnus Hagander wrote:
 On Tue, Oct 15, 2013 at 7:26 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-10-15 10:19:06 -0700, Josh Berkus wrote:
  On 10/15/2013 05:52 AM, Magnus Hagander wrote:
   But the argument about being friendly for new users should definitely
   have us change wal_level and max_wal_senders.
 
  +1 for having replication supported out-of-the-box aside from pg_hba.conf.
 
  To put it another way: users are more likely to care about replication
  than they are about IO overhead on a non-replicated server.  And for the
   users who care about IO overhead, they are more likely to muck about in
  pg.conf *anyway* in order to set a slew of performance-tuning settings.
 
  But it will hurt people restoring backups using pg_restore -j. I think
   people might be rather disappointed if that slows down by a factor of
  three.
 
  I think we really need to get to the point where we increase the wal
   level on demand...

 Yeah, there are really two things.

 If we can increase wal_level on demand, that would solve one of them.
 Turning that into a SIGHUP parameter would be great. I have no idea
 how hard it would be. In theory, couldn't we let it be sighup and then
 just have do_pg_start_backup() block until all backends have
 acknowledged that they are on the new WAL level somehow? (Yes, I
 realize this might be a big simplification, but I'm allowed to hope,
 no?)

 Depends on what you want to support. For basebackups, that should be
 doable with some pullups.
 It's unfortunately more complex than that for streaming rep - we really
 need persistent standby registration there. Otherwise the wal_level will
 fall back to minimal when the standby disconnects which will obviously
 break the standby.

I was actually thinking the easier step might not be to do it
dynamically as the standby registers - just allow it to be a SIGHUP
parameter. So you'd still change it in postgresql.conf, but it would
be ok with a reload rather than a restart.

Yes, fully dynamic would be better, so if we could combine those two,
that would make us require nothing for pg_basebackup, and just a
reload for replication slaves. The point being that we wouldn't need
a *restart* at any point - and that alone would be a big improvement.


 The other problem is max_wal_senders. I think that's a much smaller
 problem - setting that one to 5 or so by default shouldn't have a big
 impact. But without the wal_level changes, it would also be mostly
 pointless...

 Well, you currently cannot even set it when the wal_level isn't set
  appropriately, but that should be easy enough to change.

Yes, it would be a trivial change to allow that parameter to be set
and then just give an error if you try to initiate streaming in that
case.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 10:29 AM, Andres Freund and...@2ndquadrant.com wrote:
 I think anything that only works by breaking visibility rules that way
 is a nonstarter. Doing that from the C level is one thing, exposing it
 this way seems a bad idea.

What visibility rule is that? Upsert *has* to do effectively the same
thing as what I've proposed - there is no getting away from it. So
maybe the visibility rulebook (which as far as I can tell is the way
things work today) needs to be updated. If we did, say, INSERT...ON
DUPLICATE KEY UPDATE, we'd have to update a row with potentially no
visible-to-snapshot version *at all*, and make a new version of that
visible. That's just what it takes. What's the difference between that
and just locking? If the only difference is that it isn't necessary to
modify tqual.c because you're passing a tid directly, that isn't a
user-visible difference - the rule has been broken just the same.
Arguably, it's even more of a hack, since it's a special, out-of-band
visibility exception. I'm happy to have total scrutiny of changes to
tqual.c, but I'm surprised that the mere fact of it having been
modified is being weighed so heavily.

Another thing that I'm not clear on is how an update can be backed out
of if the row is modified by another xact. As I think I've already
illustrated, the row locking that takes place has to be kind of
opportunistic. I'm sure you could do it, but it would probably be
quite invasive.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Josh Berkus
On 10/15/2013 08:11 AM, Robert Haas wrote:
 I'm not saying go implement MERGE.  I'm saying, make the
 insert-or-update operation a single statement, using some syntax TBD,
 instead of requiring the use of a new insert statement that makes
 invisible rows visible as a side effect, so that you can wrap that in
 a CTE and feed it to an update statement.  That's complex and, AFAICS,
 unlike how any other database product handles this.

Hmmm.  Is the plan NOT to eventually get to a single-statement upsert?
If not, then I'm not that keen on this feature.  I can't say that
anybody I know who's migrating from MySQL would use a 2-statement
version of upsert; if they were prepared for that, then they'd be
prepared to just rewrite their stuff as proper insert/updates anyway.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 10:58 AM, Josh Berkus j...@agliodbs.com wrote:
 Hmmm.  Is the plan NOT to eventually get to a single-statement upsert?
 If not, then I'm not that keen on this feature.

See the original e-mail in the thread for what I imagine idiomatic
usage will look like.

http://www.postgresql.org/message-id/cam3swzthwrktvurf1awaih8qthgnmzafydcnw8qju7pqhk5...@mail.gmail.com
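
(Roughly, the idiom described there, reconstructed with illustrative table and
column names; the exact spelling may differ from the patch:)

    WITH rej AS (
        INSERT INTO upsert (key, val)
        VALUES (1, 'inserted')
        ON DUPLICATE KEY LOCK FOR UPDATE
        RETURNING REJECTS *
    )
    UPDATE upsert SET val = 'updated'
    FROM rej
    WHERE upsert.key = rej.key;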

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 11:05 AM, Peter Geoghegan p...@heroku.com wrote:
 See the original e-mail in the thread for what I imagine idiomatic
 usage will look like.

 http://www.postgresql.org/message-id/cam3swzthwrktvurf1awaih8qthgnmzafydcnw8qju7pqhk5...@mail.gmail.com


Note also that this doesn't preclude a variant with a more direct
update part (not that I think that's all that compelling). Doing
things this way was motivated by:

1) Serving the needs of logical changeset generation plugins, even if
Andres doesn't think that needs to be exposed through SQL. He and I
both want something that does this with low overhead (in particular,
no subtransactions).

2) Getting something effective into the next release. MERGE-like
flexibility seems like a very desirable thing. And the
implementation's infrastructure can be used by an eventual MERGE
implementation.

3) Being simple enough that huge bike shedding over syntax might not
be necessary. Making insert statements grow an update tumor is likely
to get messy fast. I know because I tried it myself.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Josh Berkus
Peter,

 Note also that this doesn't preclude a variant with a more direct
 update part (not that I think that's all that compelling). Doing
 things this way was motivated by:

I can see the value in the CTE format for this for existing PostgreSQL
users.

(although, AFAICT it doesn't allow for the implementation of one of my
personal desires, which is UPDATE ... ON NOT FOUND INSERT, for cases
where updates are expected to occur 95% of the time, but that's another
topic. Unless rejects for an Update could be the leftover rows, but
then we're getting into full MERGE.).

I'm just pointing out that this doesn't do much for the MySQL migration
case; the rewrite is too complex to automate.  I'd been assuming that we
had some plans to implement a MySQL-friendly syntax for 9.5, and this
version was a stepping stone to that.

Does this version make a distinction between PRIMARY KEY constraints and
UNIQUE indexes?  If not, how does it pick among keys?  If so, what about
tables with no PRIMARY KEY for various reasons (like unique GiST indexes?)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 11:23 AM, Josh Berkus j...@agliodbs.com wrote:
 (although, AFAICT it doesn't allow for the implementation of one of my
 personal desires, which is UPDATE ... ON NOT FOUND INSERT, for cases
 where updates are expected to occur 95% of the time, but that's another
 topic. Unless rejects for an Update could be the leftover rows, but
 then we're getting into full MERGE.).

This isn't really all that inefficient for that case. Certainly, the
balance in cost between mostly-insert cases and mostly-update cases is
a strength of my basic approach over others.

 Does this version make a distinction between PRIMARY KEY constraints and
 UNIQUE indexes?  If not, how does it pick among keys?  If so, what about
 tables with no PRIMARY KEY for various reasons (like unique GiST indexes?)

We thought about prioritizing where to look (mostly as a performance
optimization), but right now no. It works with amcanunique methods,
which in practice means btrees. There is no such thing as a GiST
unique index, so I guess you're referring to an exclusion constraint
on an equality operator. That doesn't work with this, but why would
you want it to? As for generalizing this to work with exclusion
constraints, which I guess you might have also meant, that's a much
more difficult and much less compelling proposition, in my opinion.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Josh Berkus
On 10/15/2013 11:38 AM, Peter Geoghegan wrote:
 We thought about prioritizing where to look (mostly as a performance
 optimization), but right now no. It works with amcanunique methods,
 which in practice means btrees. There is no such thing as a GiST
 unique index, so I guess you're referring to an exclusion constraint
 on an equality operator. That doesn't work with this, but why would
 you want it to? As for generalizing this to work with exclusion
 constraints, which I guess you might have also meant, that's a much
 more difficult and much less compelling proposition, in my opinion.

Yeah, that was one thing I was thinking of.

Also, because you can't INDEX CONCURRENTLY a PK, I've been building a
lot of databases which have no PKs, only UNIQUE indexes.  Historically,
this hasn't been an issue because aside from wonky annoyances (like the
CONCURRENTLY case), Postgres doesn't distinguish between UNIQUE indexes
and PRIMARY KEYs -- as, indeed, it shouldn't, since they're both keys,
and the whole concept of a primary key is a legacy of index-organized
databases, which PostgreSQL is not.

However, it does seem like the new syntax could be extended with an
optional USING unique_index_name in the future (9.5), no?
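
(Purely hypothetical spelling, just to make the suggestion concrete; nothing
like this exists in the patch today:)

    INSERT INTO upsert (key, val) VALUES (1, 'inserted')
    ON DUPLICATE KEY LOCK FOR UPDATE USING upsert_key_idx
    RETURNING REJECTS *;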

I'm just checking that we're not painting ourselves into a corner with
this particular implementation.  It's OK if it doesn't implement most
things now; it's bad if it is impossible to build on and we have to
support it forever.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 11:55 AM, Josh Berkus j...@agliodbs.com wrote:
 However, it does seem like the new syntax could be extended with an
 optional USING unique_index_name in the future (9.5), no?

There is no reason why we couldn't do that and just consider that one
unique index. Whether we should is another question - I certainly
think that mandating it would be very bad.

 I'm just checking that we're not painting ourselves into a corner with
 this particular implementation.  It's OK if it doesn't implement most
 things now; it's bad if it is impossible to build on and we have to
 support it forever.

I don't believe it does. In essence this simply inserts a row and,
rather than throwing a unique constraint violation, locks the conflicting
row for any tuple proposed for insertion that cannot proceed. That's all.
You can build
lots of things with it that you can't today. Or you can not use it at
all. So that covers semantics, I'd say.

As for implementation: I believe that the implementation is by far the
most forward thinking (in terms of building infrastructure for a
proper MERGE) of any proposal to date.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Josh Berkus
On 10/15/2013 12:03 PM, Peter Geoghegan wrote:
 On Tue, Oct 15, 2013 at 11:55 AM, Josh Berkus j...@agliodbs.com wrote:
 However, it does seem like the new syntax could be extended with an
 optional USING unique_index_name in the future (9.5), no?
 
 There is no reason why we couldn't do that and just consider that one
 unique index. Whether we should is another question - 

What's the "shouldn't" argument, if any?

 I certainly
 think that mandating it would be very bad.

Agreed.  If there is a PK, we should allow the user to use it implicitly.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background workers, round three

2013-10-15 Thread Kohei KaiGai
2013/10/14 Robert Haas robertmh...@gmail.com:
 * ephemeral-precious-v1.patch
 AtEOXact_BackgroundWorker() is located around other AtEOXact_*
 routines. Doesn't it make resource management complicated?
 In the case when the main process goes into the error handler but the
 worker process is still healthy and running, it may continue to calculate
 something and put its results on the shared memory segment, even
 though the main process has asked the postmaster to kill it.

 Since I wrote this patch set, I've been thinking a lot more about
 error recovery.  Obviously, one of the big problems as we think about
 parallel query is that you've now got multiple backends floating
 around, and if the transaction aborts (in any backend), the other
 backends don't automatically know that; they need some way to know
 that they, too, should abort processing.  There are a lot of details to
 get right here, and the time I've spent on it so far convinces me that
 the problem is anything but easy.

 Having said that, I'm not too concerned about the particular issue
 that you raise here.  The resources that need to be cleaned up during
 transaction abort are backend-private resources.  If, for example, the
 user backend detaches a dynamic shared memory segment that is being
 used for a parallel computation, they're not actually *destroying* the
 segment; they are just detaching it *from their address space*.  The
 last process to detach it will also destroy it.  So the ordering in
 which the various processes detach it doesn't matter much.

 One of the things I think is necessary is a set of on_dsm_detach
 callbacks that work pretty much the way that on_shmem_exit callbacks
 work today.  Just as we can't detach from the main shared memory
 segment without releasing locks and buffer pins and lwlocks and our
 PGXACT, we can't release from a dynamic shared memory segment without
 performing any similar cleanup that is needed.  I'm currently working
 on a patch for that.

Hmm. It would probably allow cleaning up a smaller fraction of the data
structures constructed on the dynamic shared memory segment, if we map /
unmap for each transaction.

 All the ResourceOwnerRelease() callbacks are located prior to
 AtEOXact_BackgroundWorker(), so it is hard to release resources
 that are in use by a background worker, because the worker keeps
 running until it receives the termination signal, which is sent later.
 In addition, it makes the implementation complicated if we need to
 design background workers to release resources if and when they
 are terminated. I don't think it is good coding style if we need
 to release resources in different locations depending on context.

 Which specific resources are you concerned about?

Given my use case, I was assuming smaller chunks allocated on a static or
dynamic shared memory segment, used to communicate between the main
process and the worker processes.
When we move a chunk of data to a co-processor using an asynchronous
DMA transfer, the API requires the source buffer to be mlock()'ed to avoid
an unintentional swap-out during the transfer. On the other hand, the
cost of the mlock() operation is not negligible, so it may be a reasonable
design to lock a shared memory segment at start-up time and then
continue to use it, without unmapping.
So I wondered how to handle the situation where an extension tries
to manage a resource at a smaller granularity than the one
managed by PostgreSQL core.

 So, I'd like to propose adding a new invocation point of
 ResourceOwnerRelease() after all AtEOXact_* jobs, with a
 new label, something like RESOURCE_RELEASE_FINAL.

 In addition, AtEOXact_BackgroundWorker() does not synchronize the
 termination of the background worker processes being killed.
 Of course it depends on the situation, but I think it is a good idea to
 wait for the worker processes to finish terminating, to ensure the
 resources to be released are given back to the main process if the above
 ResourceOwnerRelease() does the job.

 Again, which resources are we talking about here?  I tend to think
 it's an essential property of the system that we *shouldn't* have to
 care about the order in which processes are terminated.  First, that
 will be difficult to control; if an ERROR or FATAL condition has
 occurred and we need to terminate, then there are real limits to what
 guarantees we can provide after that point.  Second, it's also
 *expensive*.  The point of parallelism is to make things faster; any
 steps we add that involve waiting for other processes to do things
 will eat away at the available gains.  For a query that'll run for an
 hour that hardly matters, but for short queries it's important to
 avoid unnecessary overhead.

Indeed, you are right. The error path has to terminate soon.
Probably the ResourceOwnerRelease() callback needs some way to inform a
still-running worker process that the transaction got aborted, and thus
that there is no need to return its calculation result, if I implement it.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Amit Kapila
On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, that sucks.  So it's a Windows bug.

 It's not clear to me that we should do anything about this at all,
 except perhaps document that people should avoid long tablespace
 path names on an unknown set of Windows versions.  We should not
 be in the business of working around any and every bug coming out
 of Redmond.

 It's sort of incomprehensible to me that Microsoft has a bug like this
 and apparently hasn't fixed it.  But I think I still favor trying to
 work around it.  When people try to use a long data directory name and
 it freezes the system, some of them will blame us rather than
 Microsoft.  We've certainly gone to considerable lengths to work
 around extremely strange bugs in various compiler toolchains, even
 relatively obscure ones.  I don't particularly see why we shouldn't do
 the same here.

 I agree we'll probably want to work around it in the end, but I still
 think it should be put to Microsoft PSS if we can. The usual - have we
 actually produced a self-contained example that does just this (and
 doesn't include the full postgres support) and submitted it to
 *microsoft* for comments?

  I have written a self contained win32 console application with which
the issue can be reproduced.
  The application project is attached with this mail.

  Here is a brief description of the project:
  This project was created using MSVC 2010, but even if somebody
doesn't have this version of VC, the functions in the file long_path.cpp
can be copied and used in a new project.
  In the project settings, I have changed "Character Set" to "Use
Multi-Byte Character Set", which is what Postgres uses.

  It takes 3 parameters as input:
  existingpath - path for which the link will be created. This should be
                 an already existing path, one level less than the actual
                 path. For example, if we want to create a link for path
                 E:/PG_Patch/Long_Path/path_dir/version_dir, then this
                 should be E:/PG_Patch/Long_Path/path_dir.
  newpath      - path where the link needs to be created. It should be a
                 non-absolute path of the format
                 linked_path_dir/test_version.
  curpath      - path to set as the current working directory, i.e. the
                 location to prepend to newpath.

 Currently I have used input parameters as
 E:/PG_Patch/Long_Path/path_dir
 linked_path_dir/test_version
E:/PG_Patch/Long_Path/

 The long path is much less than the 260 char limit on Windows; I have
observed this problem with path length > 130 (approx.)


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


long_path.rar
Description: application/rar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Triggers on foreign tables

2013-10-15 Thread Kohei KaiGai
2013/10/15 Robert Haas robertmh...@gmail.com:
 On Mon, Oct 14, 2013 at 5:24 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 And, I also want some comments from committers, not only from mine.

 +1

 +1

 /me pokes head up.  I know I'm going to annoy people with this
 comment, but I feel like it's going to have to be made at some point
 by somebody, so here goes: I don't see the point of this feature.  If
 you want a trigger on a table, why not set it on the remote side?  A
 trigger on the foreign table won't be enforced consistently; it'll
 only work when the update is routed through the foreign table, not
 when people access the underlying table on the remote side through any
 other mechanism.  The number of useful things you can do this way
 seems fairly small.  Perhaps you could use a row-level trigger for
 RLS, to allow only certain rows on the foreign side to be updated, but
 even that seems like a slightly strange design: generally it'd be
 better to enforce the security as close to the target object as
 possible.

One reason we should support local triggers is that not every data
source behind an FDW supports remote triggers. That would require FDW
drivers to have an RDBMS as their backend, which is not a realistic
assumption. For example, file_fdw has no way to implement remote triggers.

One thing I'd like to know is where the goal of the FDW feature lies.
It seems to me that FDW is turning into a feature for managing external
data sets as if they were regular tables. If that understanding is right,
the things we try to support on foreign tables are the things we already
support on regular tables, such as triggers.
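
(To make the feature under discussion concrete, this is the kind of statement
being proposed; names are illustrative and the interface was not committed at
the time of writing:)

    CREATE TRIGGER audit_remote_orders
        AFTER INSERT OR UPDATE OR DELETE ON remote_orders  -- a foreign table
        FOR EACH ROW EXECUTE PROCEDURE audit_row_change();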

 There's another issue that concerns me here also: performance.  IIUC,
 an update of N tuples on the remote side currently requires N+1 server
 round-trips.  That is unspeakably awful, and we really oughta be
 looking for ways to make that number go down, by pushing the whole
 update to the remote side.  But obviously that won't be possible if
 there's a per-row trigger that has to be evaluated on the local side.
 Now, assuming somebody comes up with code that implements that
 optimization, we can just disable it when there are local-side
 triggers.  But, then you're back to having terrible performance.  So
 even if the use case for this seemed really broad, I tend to think
 performance concerns would sink most of the possible real-world uses.

We often have cases where we cannot apply the fully optimized path for
various reasons, such as a view having a security barrier, or a qualifier
containing volatile functions, and so on...
A trigger may be a factor that prevents the fully optimized path; however,
it depends on the situation which one should be prioritized: performance
or functionality.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 4:14 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, that sucks.  So it's a Windows bug.

 It's not clear to me that we should do anything about this at all,
 except perhaps document that people should avoid long tablespace
 path names on an unknown set of Windows versions.  We should not
 be in the business of working around any and every bug coming out
 of Redmond.

 It's sort of incomprehensible to me that Microsoft has a bug like this
 and apparently hasn't fixed it.  But I think I still favor trying to
 work around it.  When people try to use a long data directory name and
 it freezes the system, some of them will blame us rather than
 Microsoft.  We've certainly gone to considerable lengths to work
 around extremely strange bugs in various compiler toolchains, even
 relatively obscure ones.  I don't particularly see why we shouldn't do
 the same here.

 I agree we'll probably want to work around it in the end, but I still
 think it should be put to Microsoft PSS if we can. The usual - have we
 actually produced a self-contained example that does just this (and
 doesn't include the full postgres support) and submitted it to
 *microsoft* for comments?

   I have written a self contained win32 console application with which
 the issue can be reproduced.
   The application project is attached with this mail.

   Here is brief description of the project:
   This project is created using MSVC 2010, but even if somebody
 doesn't have this version of VC, functions in file long_path.cpp can
 be copied and
   used in new project.
   In project settings, I have changed Character Set to Use Multi-Byte
 Character Set which is what Postgres uses.

   It takes 3 parameters as input:
   existingpath - path for which link will be created. this path should
 be an already
   existing path with one level less than actual
 path. For example,
   if we want to create a link for path
 E:/PG_Patch/Long_Path/path_dir/version_dir,
   then this should be E:/PG_Patch/Long_Path/path_dir.
   newpath  - path where link needs to be created. it should be
 non-absolute path
   of format linked_path_dir/test_version
   curpath   - path to set as current working directory path, it
 should be the
   location to prepend to newpath

  Currently I have used input parameters as
  E:/PG_Patch/Long_Path/path_dir
  linked_path_dir/test_version
 E:/PG_Patch/Long_Path/

  Long path is much less than 260 char limit on windows, I have
  observed this problem with path length > 130 (approx.)

And this reliably reproduces the hang?  On which Windows version(s)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Andres Freund
On 2013-10-15 10:53:35 -0700, Peter Geoghegan wrote:
 On Tue, Oct 15, 2013 at 10:29 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I think anything that only works by breaking visibility rules that way
  is a nonstarter. Doing that from the C level is one thing, exposing it
  this way seems a bad idea.
 
 What visibility rule is that?

The early return you added to HTSMVCC.

At the very least it opens you up to lots of Halloween-problem-like
scenarios.

 Upsert *has* to do effectively the same thing as what I've proposed -
 there is no getting away from it. So maybe the visibility rulebook
 (which as far as I can tell is the way things work today) needs to
 be updated. If we did, say, INSERT...ON DUPLICATE KEY UPDATE, we'd
 have to update a row with potentially no visible-to-snapshot version
 *at all*, and make a new version of that visible. That's just what it
 takes. What's the difference between that and just locking? If the
 only difference is that it isn't necessary to modify tqual.c because
 you're passing a tid directly, that isn't a user-visible difference -
 the rule has been broken just the same.  Arguably, it's even more of
 a hack, since it's a special, out-of-band visibility exception.

No, doing it in special case code is fundamentally different since those
locations deal only with one row at a time. There are no scans that can
pass over that row.
That's why I think exposing the on conflict lock logic to anything but
C isn't going to fly btw.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Andres Freund
On 2013-10-15 11:23:44 -0700, Josh Berkus wrote:
 (although, AFAICT it doesn't allow for the implementation of one of my
 personal desires, which is UPDATE ... ON NOT FOUND INSERT, for cases
 where updates are expected to occur 95% of the time, but that's another
 topic. Unless rejects for an Update could be the leftover rows, but
 then we're getting into full MERGE.).

FWIW I can't see the above syntax as something working very well - you
fundamentally have to SET every column and it only makes sense in
UPDATEs that provably affect only one row.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Andres Freund
On 2013-10-15 11:55:06 -0700, Josh Berkus wrote:
 Also, because you can't INDEX CONCURRENTLY a PK, I've been building a
 lot of databases which have no PKs, only UNIQUE indexes.

You know that you can add prebuilt primary keys using ALTER TABLE
... ADD CONSTRAINT ... PRIMARY KEY (...) USING indexname?
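
(For anyone who has not used it, the sequence is roughly the following; table
and index names are illustrative:)

    CREATE UNIQUE INDEX CONCURRENTLY accounts_id_idx ON accounts (id);
    ALTER TABLE accounts
        ADD CONSTRAINT accounts_pkey PRIMARY KEY USING INDEX accounts_id_idx;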

 Postgres doesn't distinguish between UNIQUE indexes
 and PRIMARY KEYs -- as, indeed, it shouldn't, since they're both keys,
 and the whole concept of a primary key is a legacy of index-organized
 databases, which PostgreSQL is not.

There are some other differences: for one, primary keys are automatically
picked up by foreign keys if the referenced columns aren't specified;
for another, we do not yet automatically recognize NOT NULL UNIQUE
columns in GROUP BY.

 However, it does seem like the new syntax could be extended with an
 optional USING unique_index_name in the future (9.5), no?

Yes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Josh Berkus
On 10/15/2013 07:56 AM, Andres Freund wrote:
 Well, just providing the C API + an example in a first step didn't work
   out too badly for FDWs. I am pretty sure that once released there will
   soon be extensions for it on PGXN or whatever for special use cases.
  
  I suspect so, too.  But I also think that if that's the only thing
  available in the first release, a lot of users will get a poor initial
  impression.
 I think lots of people will expect a builtin logical replication
 solution :/. Which seems a tad unlikely to arrive in 9.4.

Well, last I checked the Slony team is hard at work on building
something which will be based on logical changesets.  So there will
likely be at least one tool available shortly after 9.4 is released.

A good and flexible API is, IMHO, more important than having any
finished solution.  The whole reason why logical replication was outside
core PG for so long is that replication systems have differing and
mutually incompatible goals.  A good API can support all of those goals;
a user-level tool, no matter how good, can't.

And, frankly, once the API is built, how hard will it be to write a
script which does the simplest replication approach (replay all
statements on slave)?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Josh Berkus
On 10/15/2013 02:31 PM, Andres Freund wrote:
 On 2013-10-15 11:55:06 -0700, Josh Berkus wrote:
 Also, because you can't INDEX CONCURRENTLY a PK, I've been building a
 lot of databases which have no PKs, only UNIQUE indexes.
 
 You know that you can add prebuilt primary keys using ALTER TABLE
 ... ADD CONSTRAINT ... PRIMARY KEY (...) USING indexname?

That still requires an ACCESS EXCLUSIVE lock, and then can't be dropped
using DROP INDEX CONCURRENTLY.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read

2013-10-15 Thread Mark Kirkwood
On 11/10/13 17:49, Mark Kirkwood wrote:
 On 11/10/13 17:08, Satoshi Nagayasu wrote:
 (2013/10/11 7:32), Mark Kirkwood wrote:
 On 11/10/13 11:09, Mark Kirkwood wrote:
 On 16/09/13 16:20, Satoshi Nagayasu wrote:
 (2013/09/15 11:07), Peter Eisentraut wrote:
 On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
 I'm looking forward to seeing more feedback on this approach,
 in terms of design and performance improvement.
 So, I have submitted this for the next CF.
 Your patch fails to build:

 pgstattuple.c: In function ‘pgstat_heap_sample’:
 pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
 this function)
 pgstattuple.c:737:13: note: each undeclared identifier is reported
 only once for each function it appears in
 Thanks for checking. Fixed to eliminate SnapshotNow.

 This seems like a cool idea! I took a quick look, and initially
 replicated the sort of improvement you saw:


 bench=# explain analyze select * from pgstattuple('pgbench_accounts');
 QUERY PLAN

 
 Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
 time=786.368..786.369 rows=1 loops=1)
 Total runtime: 786.384 ms
 (2 rows)

 bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
 NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
 dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
 QUERY PLAN

 
 Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
 time=12.004..12.005 rows=1 loops=1)
 Total runtime: 12.019 ms
 (2 rows)



 I wondered what sort of difference eliminating caching would make:

 $ sudo sysctl -w vm.drop_caches=3

 Repeating the above queries:


 bench=# explain analyze select * from pgstattuple('pgbench_accounts');
 QUERY PLAN

 
 Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
 time=9503.774..9503.776 rows=1 loops=1)
 Total runtime: 9504.523 ms
 (2 rows)

 bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
 NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
 dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
 QUERY PLAN

 
 Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
 time=12330.630..12330.631 rows=1 loops=1)
 Total runtime: 12331.353 ms
 (2 rows)


 So the sampling code seems *slower* when the cache is completely cold -
 is that expected? (I have not looked at how the code works yet - I'll
 dive in later if I get a chance)!
 Thanks for testing that. It would be very helpful to improve the
 performance.

 Quietly replying to myself - looking at the code the sampler does 3000
 random page reads... I guess this is slower than 163935 (number of pages
 in pgbench_accounts) sequential page reads thanks to os readahead on my
 type of disk (WD Velociraptor). Tweaking the number of random reads (i.e
 the sample size) down helps - but obviously that can impact estimation
 accuracy.

 Thinking about this a bit more, I guess the elapsed runtime is not the
 *only* thing to consider - the sampling code will cause way less
 disruption to the os page cache (3000 pages vs possibly lots more than
 3000 for reading an entire relation).

 Thoughts?
 I think it could be improved by sorting sample block numbers
 *before* physical block reads in order to eliminate random access
 on the disk.

 pseudo code:
 --
 for (i=0 ; i < SAMPLE_SIZE ; i++)
 {
 sample_block[i] = random();
 }

 qsort(sample_block);

 for (i=0 ; i < SAMPLE_SIZE ; i++)
 {
 buf = ReadBuffer(rel, sample_block[i]);

 do_some_stats_stuff(buf);
 }
 --

 I guess it would be helpful for reducing random access thing.

 Any comments?
 Ah yes - that's a good idea (rough patch to your patch attached)!

 bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
 NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
 dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
 QUERY PLAN

 
 Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
 time=9968.318..9968.319 rows=1 loops=1)
 Total runtime: 9968.443 ms
 (2 rows)




Actually - correcting my compare function to sort the blocks in
*increasing* order (doh), gets a better result:

bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
QUERY PLAN
--
Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 

Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read

2013-10-15 Thread Mark Kirkwood

On 11/10/13 17:33, Jaime Casanova wrote:
also the name pgstattuple2, doesn't convince me... maybe you can use 
pgstattuple() if you use a second argument (percentage of the sample) 
to overload the function 


+1, that seems much nicer.
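
(For illustration, the overloaded call might look something like this; the
second argument is only the suggestion above, not an implemented interface:)

    SELECT * FROM pgstattuple('pgbench_accounts');        -- existing: scans the whole relation
    SELECT * FROM pgstattuple('pgbench_accounts', 5.0);   -- suggested: sample roughly 5% of the blocks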

Regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Gibheer
On Mon, 14 Oct 2013 11:52:57 +0530
Amit Kapila amit.kapil...@gmail.com wrote:

 On Sun, Oct 13, 2013 at 2:08 PM, Gibheer gibh...@zero-knowledge.org
 wrote:
  On Sun, 13 Oct 2013 11:38:17 +0530
  Amit Kapila amit.kapil...@gmail.com wrote:
 
  On Thu, Oct 10, 2013 at 3:17 AM, Gibheer
  gibh...@zero-knowledge.org wrote:
   On Mon, 7 Oct 2013 11:39:55 +0530
   Amit Kapila amit.kapil...@gmail.com wrote:
   Robert Haas wrote:
   On Mon, Aug 5, 2013 at 2:04 AM, Andres Freund
   andres(at)2ndquadrant(dot)com wrote:
    Hmm.  It seems like this patch is making MaxConnections no
longer mean the maximum number of connections, but rather
the maximum number of non-replication connections.  I don't
think I support that definitional change, and I'm kinda
surprised if this is sufficient to implement it anyway
(e.g. see InitProcGlobal()).
   
I don't think the implementation is correct, but why don't
you like the definitional change? The set of things you can
do from replication connections are completely different
from a normal connection. So using separate pools for them
seems to make sense. That they end up allocating similar
internal data seems to be an implementation detail to me.
  
Because replication connections are still connections.  If I
tell the system I want to allow 100 connections to the server,
it should allow 100 connections, not 110 or 95 or any other
number.
  
   I think that to reserve connections for replication, mechanism
   similar to superuser_reserved_connections be used rather than
   auto vacuum workers or background workers.
   This won't change the definition of MaxConnections. Another
   thing is that rather than introducing new parameter for
   replication reserved connections, it is better to use
   max_wal_senders as it can serve the purpose.
  
   Review for replication_reserved_connections-v2.patch,
   considering we are going to use mechanism similar to
   superuser_reserved_connections and won't allow definition of
   MaxConnections to change.
  
 
  
   Hi,
  
   I took the time and reworked the patch with the feedback till
   now. Thank you very much Amit!
  
   So this patch uses max_wal_senders together with the idea of the
   first patch I sent. The error messages are also adjusted to make
   it obvious, how it is supposed to be and all checks work, as far
   as I could tell.
 
  If I understand correctly, now the patch has an implementation such
  that a. if the number of connections left is (ReservedBackends +
  max_wal_senders), then only superuser or replication connections
  will be allowed
  b. if the number of connections left is ReservedBackends, then only
  superuser connections will be allowed.
 
  That is correct.
 
  So it will ensure that max_wal_senders is used to reserve
  connection slots from being used by non-superuser connections. I
  find the new usage of max_wal_senders acceptable; if anyone else thinks
  otherwise, please let us know.
 
 
  1.
  +<varname>superuser_reserved_connections</varname>
  +<varname>max_wal_senders</varname> only superuser and WAL
  connections
  +are allowed.
 
  Here a minus seems to be missing before max_wal_senders, and I think
  it would be better to use "replication connections" rather than "WAL
  connections".
 
  This is fixed.
 
  2.
  -new replication connections will be accepted.
  +new WAL or other connections will be accepted.
 
  I think as per new implementation, we don't need to change this
  line.
 
  I reverted that change.
 
  3.
  + * reserved slots from max_connections for wal senders. If the
  number of free
  + * slots (max_connections - max_wal_senders) is depleted.
 
   Above calculation (max_connections - max_wal_senders) needs to
  include super user reserved connections.
 
  My first thought was that I would not add it here. When superuser
  reserved connections are not set, then only max_wal_senders would
  count.
  But you are right, it has to be set, as 3 connections are reserved
  by default for superusers.
 
 + * slots (max_connections - superuser_reserved_connections - max_wal_senders)
 Here it should be ReservedBackends rather than superuser_reserved_connections.

fixed

  4.
  + /*
  +  * Although replication connections currently require superuser
  +  * privileges, we don't allow them to consume the superuser reserved
  +  * slots, which are intended for interactive use.
     */
    if ((!am_superuser || am_walsender) &&
        ReservedBackends > 0 &&
        !HaveNFreeProcs(ReservedBackends))
        ereport(FATAL,
                (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
  -              errmsg("remaining connection slots are reserved for non-replication superuser connections")));
  +              errmsg("remaining connection slots are reserved for superuser connections")));
 
  Will there be any problem if we do the above check before the check
  for wal senders and reserved replication connections (+
  !HaveNFreeProcs(max_wal_senders + ReservedBackends))) and 
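
(To make the intended semantics easy to check, below is a small standalone
toy model of the two-tier reservation being discussed - only a sketch under
my own assumptions, not the actual patch: the last ReservedBackends +
max_wal_senders slots are kept for superusers and walsenders, and the last
ReservedBackends slots for superusers only.)

--
#include <stdbool.h>
#include <stdio.h>

/* stand-ins for the GUCs discussed above */
static int MaxConnections = 100;
static int ReservedBackends = 3;    /* superuser_reserved_connections */
static int max_wal_senders = 5;

static int active_backends = 0;

/* crude stand-in for HaveNFreeProcs(): true if admitting one more
 * backend still leaves at least n slots free */
static bool
have_n_free_procs(int n)
{
    return (MaxConnections - active_backends) > n;
}

static bool
can_connect(bool am_superuser, bool am_walsender)
{
    /* tier 1: the last (ReservedBackends + max_wal_senders) slots are
     * for superusers and replication connections only */
    if (!am_superuser && !am_walsender &&
        !have_n_free_procs(ReservedBackends + max_wal_senders))
        return false;

    /* tier 2: the last ReservedBackends slots are for superusers only,
     * so walsenders may not consume them either */
    if ((!am_superuser || am_walsender) &&
        !have_n_free_procs(ReservedBackends))
        return false;

    return true;
}

int
main(void)
{
    /* admit normal connections until they start being refused */
    while (can_connect(false, false))
        active_backends++;
    printf("normal connections admitted: %d\n", active_backends);

    /* walsenders can still get in, up to their reserved band */
    while (can_connect(false, true))
        active_backends++;
    printf("after walsenders, total admitted: %d\n", active_backends);

    printf("another walsender admitted now? %s\n",
           can_connect(false, true) ? "yes" : "no");
    printf("a superuser admitted now? %s\n",
           can_connect(true, false) ? "yes" : "no");
    return 0;
}
--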

Re: [HACKERS] PSQL return coder

2013-10-15 Thread James Sewell
I was avoiding ON_ERROR_STOP because I was using ON_ERROR_ROLLBACK, but
have just realised that if I encase my SQL in a transaction then rollback
will still happen.

Perfect!


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 * **W* www.lisasoft.com  *F *(+61) 3 8370 8099



On Fri, Oct 11, 2013 at 12:25 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, Oct 10, 2013 at 1:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  James Sewell james.sew...@lisasoft.com writes:
  My question is in a rollback scenario is it possible to get PSQL to
 return
  a non 0 exit status?
 
  Maybe you could use -c instead of -f?
 
  $ psql -c 'select 1; select 1/0' regression
  ERROR:  division by zero
  $ echo $?
  1
 
  You won't need explicit BEGIN/END because this is already a single
  transaction.

 According to the man page,
 EXIT STATUS
psql returns 0 to the shell if it finished normally, 1 if a fatal
 error
of its own (out of memory, file not found) occurs, 2 if the
  connection
to the server went bad and the session was not interactive, and 3
 if an
error occurred in a script and the variable ON_ERROR_STOP was set.

 So for a longer script ON_ERROR_STOP might be the ticket (which is
 usually a good idea anyways).

 merlin


-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread David Fetter
On Tue, Oct 15, 2013 at 10:09:05AM -0400, Robert Haas wrote:
 On Tue, Oct 15, 2013 at 9:17 AM, Andres Freund and...@2ndquadrant.com wrote:
 User: So, what's new in PostgreSQL 9.4?
 Hacker: Well, now we have logical replication!
 User: Why is that cool?
 Hacker: Well, streaming replication is awesome for HA, but it has
 significant limitations.  And trigger-based systems are very mature,
 but the overhead is high and their lack of core integration makes them
 hard to use.  With this technology, you can build systems that will
 replicate individual tables or even parts of tables, multi-master
 systems, and lots of other cool stuff.
 User: Wow, that sounds great.  How do I use it?
 Hacker: Well, first you write an output plugin in C using a special API.
 User: Hey, do you know whether the MongoDB guys came to this conference?
 
 Let's try that again.
 
 User: Wow, that sounds great.  How do I use it?
 Hacker: Well, currently, the output gets dumped as a series of text
 files that are designed to be parsed using a scripting language.  We
 have sample parsers written in Perl and Python that you can use as-is
 or hack up to meet your needs.

My version:

Hacker: the output gets dumped as a series of JSON files.  We have
docs for this rev of the format and examples of consumers in Perl and
Python you can use as-is or hack up to meet your needs.
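
(For anyone wondering what "a special API" means at the C level, a rough
output plugin skeleton is sketched below. It is illustrative only - the
callback names and signatures are my assumptions from reading the proposed
API and may not match the v6.2 patch - but it shows that the C side can
stay small while the text or JSON it emits is what most consumers would
actually parse.)

--
#include "postgres.h"
#include "fmgr.h"
#include "lib/stringinfo.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "utils/rel.h"

PG_MODULE_MAGIC;

extern void _PG_output_plugin_init(OutputPluginCallbacks *cb);

static void demo_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn);
static void demo_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                        Relation relation, ReorderBufferChange *change);
static void demo_commit(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                        XLogRecPtr commit_lsn);

void
_PG_output_plugin_init(OutputPluginCallbacks *cb)
{
    cb->begin_cb = demo_begin;
    cb->change_cb = demo_change;
    cb->commit_cb = demo_commit;
}

/* emit one line of text per decoded event; a JSON-emitting plugin would
 * build a JSON document into ctx->out instead */
static void
demo_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
    OutputPluginPrepareWrite(ctx, true);
    appendStringInfoString(ctx->out, "BEGIN");
    OutputPluginWrite(ctx, true);
}

static void
demo_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
            Relation relation, ReorderBufferChange *change)
{
    OutputPluginPrepareWrite(ctx, true);
    appendStringInfo(ctx->out, "change in relation \"%s\"",
                     RelationGetRelationName(relation));
    OutputPluginWrite(ctx, true);
}

static void
demo_commit(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
            XLogRecPtr commit_lsn)
{
    OutputPluginPrepareWrite(ctx, true);
    appendStringInfoString(ctx->out, "COMMIT");
    OutputPluginWrite(ctx, true);
}
--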

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 7:09 AM, Robert Haas robertmh...@gmail.com wrote:
 Let's try that again.

 User: Wow, that sounds great.  How do I use it?
 Hacker: Well, currently, the output gets dumped as a series of text
 files that are designed to be parsed using a scripting language.  We
 have sample parsers written in Perl and Python that you can use as-is
 or hack up to meet your needs.

Have you heard of multicorn? Plugin authors can write a wrapper that
spits out JSON or whatever other thing they like, which can be
consumed by non C-hackers.

 Now, some users are still going to head for the hills.  But at least
 from where I sit it sounds a hell of a lot better than the first
 answer.  We're not going to solve all of the tooling problems around
 this technology in one release, for sure.  But as far as 95% of our
 users are concerned, a C API might as well not exist at all.  People
 WILL try to machine parse the output of whatever demo plugins we
 provide; so I think we should try hard to provide at least one such
 plugin that is designed to make that as easy as possible.

I agree that this is important, but I wouldn't like to weigh it too heavily.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Amit Kapila
On Wed, Oct 16, 2013 at 2:04 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Oct 15, 2013 at 4:14 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, that sucks.  So it's a Windows bug.

 It's not clear to me that we should do anything about this at all,
 except perhaps document that people should avoid long tablespace
 path names on an unknown set of Windows versions.  We should not
 be in the business of working around any and every bug coming out
 of Redmond.

 It's sort of incomprehensible to me that Microsoft has a bug like this
 and apparently hasn't fixed it.  But I think I still favor trying to
 work around it.  When people try to use a long data directory name and
 it freezes the system, some of them will blame us rather than
 Microsoft.  We've certainly gone to considerable lengths to work
 around extremely strange bugs in various compiler toolchains, even
 relatively obscure ones.  I don't particularly see why we shouldn't do
 the same here.

 I agree we'll probably want to work around it in the end, but I still
 think it should be put to Microsoft PSS if we can. The usual - have we
 actually produced a self-contained example that does just this (and
 doesn't include the full postgres support) and submitted it to
 *microsoft* for comments?

   I have written a self-contained win32 console application with which
 the issue can be reproduced.
   The application project is attached to this mail.

   Here is a brief description of the project:
   This project was created using MSVC 2010, but even if somebody
 doesn't have this version of VC, the functions in file long_path.cpp can
 be copied and used in a new project.
   In the project settings, I have changed Character Set to "Use Multi-Byte
 Character Set", which is what Postgres uses.

   It takes 3 parameters as input:
   existingpath - path for which the link will be created. This should be an
                  already existing path, one level less than the actual path.
                  For example, if we want to create a link for path
                  E:/PG_Patch/Long_Path/path_dir/version_dir,
                  then this should be E:/PG_Patch/Long_Path/path_dir.
   newpath      - path where the link needs to be created. It should be a
                  non-absolute path of the format linked_path_dir/test_version.
   curpath      - path to set as the current working directory; it should be
                  the location to prepend to newpath.

  Currently I have used these input parameters:
  E:/PG_Patch/Long_Path/path_dir
  linked_path_dir/test_version
 E:/PG_Patch/Long_Path/

  The long path is much less than the 260 character limit on Windows; I have
 observed this problem with path lengths > 130 (approx.).

 And this reliably reproduces the hang?

   Yes, it produces the hang whenever the length of the 'curpath' parameter is
greater than 130 (approx.). In the above example, I used a curpath of length
159.

 On which Windows version(s)?

   I used Windows 7 64-bit to reproduce it. However, the original user
has reported this issue on Windows 2008 64-bit, so this application
should hang on other Windows 2008 64-bit systems as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve setup for documentation building with FOP

2013-10-15 Thread Peter Eisentraut

On 9/16/13 12:19 PM, Alvaro Herrera wrote:
 The FOP-based build works fine for me.  I gave the output a look.  I
 like that text formatted with fixed-width font wraps at the right
 margin, instead of continuing beyond it; there are some strange
 artifacts about it (such as addition of hyphens in some places, say in
 the middle of a configure option); but I think it's nicer than the
 output of the other toolchain nevertheless.  This willingness to break
 in the middle of preformatted elements looks weird in some places;
 for example table 17.2 SSL Server File Usage; the old output puts the option
 name on the first line and the file path on the second line; the new output
 puts the option name on top, and the file name is split in half.

The wrapping and hyphenation behavior is a complex issue.  I think we agree that
it's generally better to do suboptimal wrapping than to have the content
run off the page.  Everything beyond that is, AFAICT, a matter of
fine-tuning various parameters and attributes.

 In the US size PDF, look at page 386; below the gmake install-world
 there's the bottom-of-page horizontal line, but the paragraph continues
 below that.  Bug?

Yeah, no idea how that can happen.

 Section 17.8 Encryption Options is formatted differently; previous
 method used indentation for the paragraph after each item, new one uses
 a table.  I like the old method better.  This is notoriously bad in the
 superuser_reserved_connections entry in 18.3.1. Connection Settings,
 and others.  Also, the synopsis in PQconnectStartParams entry in 31.1
 Database Connection Control Functions looks a lot worse.  This
 manifests in many places, and while it's an improvement in a minority of
 them, I think it's a net loss overall.

Yep.  Fixed by setting an option that turns it back to the old way.

 I like that the new output has horizontal lines at top and bottom of the
 text area of the page.  In the old method, the page heading (above that
 line) contains the chapter number and title, while in FOP it only has
 title (no number).  I find that number useful.  Also, limiting the space
 for the title and wrapping if the title is too long seems pointless; see
 example of this in chapter High Availability, Load Balancing and
 Replication, where even the word Bal-ancing has been hyphenated.

Made a note to fix that later.

 Formatting of note, tip and such seems a lot better in FOP.  For
 warning, the old method used a surrounding box; the new one just has a
 Warning title and indented paragraph, just like for note et al.
 Doesn't look like a problem to me.  It'd be nice to have pretty icons
 for these areas.

Could be fine-tuned later.

 sidebar renders as grey-background box in FOP, white-background in old
 tool.  Not sure about this; it looks like the only place with grey
 background in the whole manual.  We only have one sidebar in the
 entire SGML source.

Hmm, maybe it's there to keep the printers honest. ;-)  It would be easy
to change this if desired.

 Example 19.1 Example pg_hba.conf Entries is completely broken (no page
 break); renders normally in old method.  There are other cases of this,
 including libpq sample programs and ECPG.

I can't reproduce that.  What version of FOP are you using?  I have 1.1.

 url renders differently: it used to produce a footnote.  FOP instead
 puts the link text inline.  Not really sure about this.

Changed back to footnote.

 The table 25.1 High Availability, Load Balancing, and Replication
 Feature Matrix contains a lot of [bull], which look odd.  Presumably
 an image of a bull head would be better?

This is an artifact of the SGML to XML conversion.  Already fixed in
separate commit.

 Tables 27-13 and 27-14 are misformatted in both implementations.  Surely
 we can do better than overlaying the contents of cells ...

This also needs to be addressed by adjusting the wrapping and
hyphenation rules.  Or we could manually tweak it by adding some zero
width space characters.  It would need to be checked how that would
affect other output formats, however.

 The START_REPLICATION stuff in the Frontend/Backend Protocol chapter is
 completely broken.  Maybe wrong markup?  Also in StartupMessage.

Same issue as with Encryption Options etc. above.

 Seems <author> resulted in nothing in the old toolchain, but now it does
 print the name of the author.  There are only two authors mentioned, in
 NLS and GEQO, though.  In the GEQO case it's a bit funny because the
 author is now mentioned twice.

I propose to remove those two author sections.  In the GEQO chapter, the
information is already present, and in the NLS case, well, I don't care.

 Speaking of GEQO: the links in its 53.4 Further Reading section don't
 look good in the FOP output.  And the bibliography looks completely
 different.

I see.  That is customizable, but maybe not the way it looks in the old
output.  We might need to reassess the entire bibliography at some point
anyway.  It's not really well-maintained.

 Oh, the index at the end is not output.

Fixed in separate 

Re: [HACKERS] buildfarm failures on smew and anole

2013-10-15 Thread Peter Eisentraut
On Mon, 2013-10-14 at 18:14 -0400, Robert Haas wrote:
  I cleaned the semaphores on smew, but they came back.  Whatever is
  crashing is leaving the semaphores lying around.
 
 Ugh.  When did you do that exactly?  I thought I fixed the problem
 that was causing that days ago, and the last 4 days worth of runs all
 show the too many clients error.

I did it a few times over the weekend.  At least twice less than 4 days
ago.  There are currently no semaphores left around, so whatever
happened in the last run cleaned it up.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_sleep(interval)

2013-10-15 Thread Vik Fearing
On 09/30/2013 01:47 PM, Vik Fearing wrote:
 Yes, I understand you are trying to help, and I appreciate it!  My
 opinion, and that of others as well from the original thread, is that
 this patch should either go in as is and break that one case, or not go
 in at all.  I'm fine with either (although clearly I would prefer it
 went in otherwise I wouldn't have written the patch).

I see this is marked as rejected in the commitfest app, but I don't see
any note about who did it or why.  I don't believe there is consensus
for rejection on this list. In fact I think the opposite is true.

May we have an explanation please from the person who rejected this
without comment?

-- 
Vik



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 2:25 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-10-15 10:53:35 -0700, Peter Geoghegan wrote:
 On Tue, Oct 15, 2013 at 10:29 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I think anything that only works by breaking visibility rules that way
  is a nonstarter. Doing that from the C level is one thing, exposing it
  this way seems a bad idea.

 What visibility rule is that?

 The early return you added to HTSMVCC.

 At the very least it opens you up to lots of Halloween-problem-like
 scenarios.

The term visibility rule as you've used it here is suggestive of
some authoritative rule that should obviously never even be bent. I'd
suggest that what Postgres does isn't very useful as an authority on
this matter, because Postgres doesn't have upsert. Besides, today
Postgres doesn't just bend the rules (that is, some kind of classic
notion of MVCC as described in Concurrency Control in Distributed
Database Systems or something), it totally breaks them, at least in
READ COMMITTED mode (and what I've proposed here just occurs in RC
mode).

It is not actually in evidence that this approach introduces Halloween
problems. In order for HTSMVCC to controversially indicate visibility
under my scheme, it is not sufficient for the row version to just be
exclusive locked by our xact without otherwise being visible - it must
also *not be updated*. Now, I'll freely admit that this could still be
problematic - there might have been a subtlety I missed. But since an
actual example of where this is problematic hasn't been forthcoming, I
take it that it isn't obvious to either yourself or Robert that it
actually is. Any scheme that involves playing cute tricks with
visibility (which is to say, any credible upsert implementation) needs
very careful thought.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2013-10-15 Thread KONDO Mitsumasa

(2013/10/15 22:01), k...@rice.edu wrote:

lz4 is also a very nice algorithm with 33% better compression
performance than snappy and 2X the decompression performance in some
benchmarks, also with a BSD license:

https://code.google.com/p/lz4/

If we judge only by performance, we will select lz4. However, we should also
consider other important factors such as software robustness, track record,
bug-fix history, and so on. If we hit unknown bugs, can we fix them or improve
the algorithm? That seems very difficult, because we only use the library and
don't fully understand its algorithms. Therefore, I think we had better select
robust software with a larger user base.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Pavan Deolasee
On Tue, Oct 15, 2013 at 4:51 PM, Andres Freund and...@2ndquadrant.comwrote:



 I think you're over-interpreting it.


I think you are right. Someone who understands the replication code very
well advised us to use that log message as a way to measure how much time
it takes to send all the missing WAL to a remote standby on a slow WAN
link. While it worked well for all measurements, when we use a middleware
which caches a lot of traffic on the sender side, this log message was very
counter intuitive. It took several more minutes for the standby to actually
receive all the WAL files and catch up after the message was displayed on
the master side. But then as you said, may be relying on the message was
not the best way to measure the time.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee