Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-08-25 Thread Victor Wagner
On Fri, 26 Aug 2016 10:10:33 +0530
Mithun Cy  wrote:

> On Thu, Mar 17, 2016 at 4:47 AM, David Steele 
> wrote:
> > Since there has been no response from the author I have marked this
> > patch "returned with feedback".  Please feel free to resubmit for 9.7!
> I have started to work on this patch, and tried to fix some of the
> issues discussed above. The most recent patch 06 has fixed many
> issues that were raised previously, including indefinite looping and
> crashes. Some of the issues that remain pending are:

Although I have not resubmitted my patch yet, I've continued to work on
it, so you can see my latest version at

https://www.wagner.pp.ru/fossil/pgfailover.

It seems that I've already fixed some of the issues you mentioned below.
I've also added a test suite, which is incomplete.

> JFYI: interestingly, the PostgreSQL JDBC driver has the option
> targetServerType=any|master|slave|preferSlave for the same purpose.

Probably we should have a compatibility alias.


-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-08-25 Thread Mithun Cy
On Thu, Mar 17, 2016 at 4:47 AM, David Steele  wrote:
> Since there has been no response from the author I have marked this patch
> "returned with feedback".  Please feel free to resubmit for 9.7!
I have started to work on this patch, and tried to fix some of the issues
discussed above. The most recent patch 06 has fixed many issues that were
raised previously, including indefinite looping and crashes. Some of the
issues that remain pending are:

1. The connection status functions PQport and PQhost do not give the
corresponding values of the established connection. -- I have attached a
patch for this.

2. The default value of the readonly parameter is 0, which means connect to
the master only. So the absence of the parameter in a connection string in
simple cases like "./psql -p PORT database" fails to connect to a hot
standby server.  Since readonly=1 means connect to any server and
readonly=0 means connect to the master only, I think we should change the
default value to 1 to handle the cases where the parameter is not specified.

JFYI: interestingly, the PostgreSQL JDBC driver has the option
targetServerType=any|master|slave|preferSlave for the same purpose.
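For illustration, here is a rough Python model of the multi-host selection semantics discussed in this thread (per-host ports with a default fallback, and hostorder=sequential|random). The function names are mine, this is not the patch's actual libpq implementation, and bracketed IPv6 literals are deliberately not handled:

```python
import random

def parse_hosts(hostspec, default_port=5432):
    """Split a comma-separated host list such as 'node1,node2:5433,node3'
    into (host, port) pairs.  A host without an explicit port falls back
    to the default port, as the patch's documentation describes.
    Bracketed IPv6 literals are ignored here for brevity."""
    pairs = []
    for item in hostspec.split(","):
        if ":" in item:
            host, port = item.rsplit(":", 1)
            pairs.append((host, int(port)))
        else:
            pairs.append((item, default_port))
    return pairs

def candidate_order(pairs, hostorder="sequential"):
    """Order in which the hosts should be tried: 'sequential' keeps the
    list order (pure failover), 'random' shuffles it (load balancing)."""
    pairs = list(pairs)
    if hostorder == "random":
        random.shuffle(pairs)
    return pairs

print(parse_hosts("node1,node2:5433,node3"))
# [('node1', 5432), ('node2', 5433), ('node3', 5432)]
```

Under this model, JDBC's loadBalanceHosts=true would simply map to hostorder=random, matching the compatibility translation proposed in the patch's documentation.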

Also did a little bit of patch clean-up.

Also found some minor issues related to the build, which I am working on.
1. make check-world in src/interfaces/ecpg/test/connect fails with the
following error:
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-g -O0 -pthread -D_REENTRANT -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS
test1.o -L../../ecpglib -L../../pgtypeslib
-L../../../../../src/interfaces/libpq
-L../../../../../src/port -L../../../../../src/common -Wl,--as-needed
-Wl,-rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags  -lecpg
-lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm   -o test1
../../../../../src/interfaces/libpq/libpq.so: undefined reference to
`pg_usleep'
collect2: error: ld returned 1 exit status
make[3]: *** [test1] Error 1
make[3]: Leaving directory `/home/mithun/mynewpost/p1/
src/interfaces/ecpg/test/connect'

The patch uses pg_usleep(), which is in -L../../../../../src/port; I think
the dependency is not captured correctly.

Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index f22e3da..835061d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=random&readonly=1
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhost&port=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhost&port=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be several host specifications, optionally accompanied
+   by a port, separated by commas.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+   There can be more than one host parameter in
+   the connect string. In this case these hosts are considered
+   alternate entries for the same database, and if connecting to the
+   first one fails, the library tries to connect to the second, and
+   so on. This can be used for a high-availability cluster or for
+   load balancing. See the  parameter.
+   
+   
+   A network host name can be accompanied by a port number, separated
+   by a colon. If so, this port number is used only when connecting to
+   this host. If there is no port number, the port specified in the
+    parameter would be used.
+   
   
  
 
@@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname


   
-
+
+  
+  hostorder
+  
+  
+  Specifies how to choose a host from the list of alternate hosts
+  specified in the  parameter.
+  
+  
+  If the value of this argument is sequential (the
+  default), the library connects to the hosts in the order they are
+  specified, and tries to connect to the second one only if 

Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Andres Freund
On 2016-08-26 13:07:09 +0900, Michael Paquier wrote:
> On Fri, Aug 26, 2016 at 12:54 PM, Amit Kapila  wrote:
> > On Fri, Aug 26, 2016 at 9:04 AM, Michael Paquier
> >  wrote:
> >> On Fri, Aug 26, 2016 at 12:25 PM, Amit Kapila  
> >> wrote:
> >>> If we change the default to 64MB, then I think it won't allow using
> >>> old databases as-is, because we store it in pg_control (I think one
> >>> will get the error below [1] for old databases if we just change the
> >>> default and don't do anything else).  Do you have a way to address it,
> >>> or do you think it is okay?
> >>
> >> Those would still be able to work with ./configure
> >> --with-wal-segsize=16, so that's not really an issue.
> >>
> >
> > Right, but do we need to suggest users do so?  The question/point was
> > that if we deliver the server with a default value of 64MB, then it
> > won't allow starting an old database.
> 
> Right, pg_upgrade could be made smarter by enforcing a conversion with
> a dedicated option: we could get away by filling the existing segments
> with zeros and add an XLOG switch record at the end of each segments
> formerly at 16MB converted to 64MB. That would still be better than
> converting each page LSN :(

Maybe I'm missing something here - but why would we need to do any of
that? The WAL already isn't compatible between versions, and we don't
reuse the old server's WAL anyway? Isn't all that's needed relaxing some
error check?




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-08-25 Thread Vinayak Pokale
Hi All,

Ashutosh proposed the feature 2PC for FDW for achieving atomic commits
across multiple foreign servers.
If a transaction makes changes on two or more foreign servers, the current
implementation in postgres_fdw doesn't ensure that either all of them
commit or all of them roll back their changes.

We (Masahiko Sawada and I) are reopening this thread and trying to
contribute to it.

2PC for FDW

The patch provides support for atomic commit for transactions involving
foreign servers. When the transaction makes changes on foreign servers,
either all the changes to all the foreign servers commit, or they all roll
back.

The new patch 2PC for FDW includes the following:
1. Patch 0001 introduces a generic feature. All kinds of FDWs that support
2PC, such as oracle_fdw, mysql_fdw, postgres_fdw, etc., can take part in
the transaction.

Currently we can push some conditions down to shard nodes; in particular,
9.6 introduced the direct-modify feature. But such a transaction modifying
data on a shard node is not guaranteed to complete atomically.
With the 0002 patch, that modification is executed with 2PC. This means we
can almost provide a sharding solution using multiple PostgreSQL servers
(one parent node and several shard nodes).

For multi-master, we definitely need a transaction manager, but the
transaction manager can probably use this 2PC-for-FDW feature to manage
distributed transactions.

2. Patch 0002 makes it possible for postgres_fdw to use 2PC.

It makes postgres_fdw use the APIs below. These APIs are generic features
which can be used by all kinds of FDWs.

a. Execute PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED instead of
COMMIT/ABORT on foreign servers which support 2PC.
b. Manage information about foreign prepared transactions and their resolver
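As a sketch only (the real patches work inside libpq/postgres_fdw in C; these class and function names are invented), the two-phase flow in (a) amounts to:

```python
class ForeignServer:
    """Toy stand-in for a 2PC-capable foreign server."""
    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.state = "active"
    def prepare(self, gid):
        # Phase 1: PREPARE TRANSACTION <gid> on the foreign server.
        if self.fail_prepare:
            self.state = "aborted"
            raise RuntimeError(f"{self.name}: PREPARE TRANSACTION failed")
        self.state = f"prepared:{gid}"
    def commit_prepared(self, gid):
        # Phase 2: COMMIT PREPARED <gid>.
        self.state = "committed"
    def rollback_prepared(self, gid):
        # Phase 2 (failure path): ROLLBACK PREPARED <gid>.
        self.state = "aborted"

def atomic_commit(servers, gid):
    """Phase 1: prepare everywhere.  If any server fails to prepare,
    roll back the already-prepared ones; otherwise commit them all."""
    prepared = []
    try:
        for s in servers:
            s.prepare(gid)
            prepared.append(s)
    except RuntimeError:
        for s in prepared:
            s.rollback_prepared(gid)
        return False
    for s in servers:
        s.commit_prepared(gid)
    return True
```

A real implementation must also cope with the coordinator or a foreign server crashing between PREPARE and COMMIT PREPARED, which is presumably why the generic feature tracks prepared-transaction information for a resolver to clean up later.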

Masahiko Sawada will post the patch.

Suggestions and comments would be helpful for implementing this feature.

Regards,

Vinayak Pokale

On Mon, Feb 1, 2016 at 11:14 PM, Alvaro Herrera 
wrote:

> Alvaro Herrera wrote:
> > Ashutosh Bapat wrote:
> >
> > > Here's updated patch. I didn't use version numbers in file names in my
> > > previous patches. I am starting from this onwards.
> >
> > Um, I tried this patch and it doesn't apply at all.  There's a large
> > number of conflicts.  Please update it and resubmit to the next
> > commitfest.
>
> Also, please run "git show --check" of "git diff origin/master --check"
> and fix the whitespace problems that it shows.  It's an easy thing but
> there's a lot of red squares in my screen.
>
> --
> Álvaro Herrera   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-25 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Aug 26, 2016 at 11:39:29AM +0900, Michael Paquier wrote:
> > Now, one of the things discussed as well was that we may want to still
> > keep pg_xlog, and soft-link to pg_journal or whatever-the-new-name is
> > to not break the existing tools. Personally, I'd prefer a hard break.
> > That would not be complicated to fix for any tool maintainers.
> 
> I agree on a hard break, unless we get pushback from users, and even
> then, they can create the symlinks themselves.

I'm in favor of the hard break generally, but I'd like David's input on
it, to be sure we're not missing something.

Thanks!

Stephen




Re: [HACKERS] Showing parallel status in \df+

2016-08-25 Thread Pavel Stehule
2016-08-24 15:42 GMT+02:00 Tom Lane :

> Peter Eisentraut  writes:
> > On 8/22/16 1:52 PM, Pavel Stehule wrote:
> >> If I understand to purpose of this patch - it is compromise - PL source
> >> is removed from table, but it is printed in result.
>
> > What does it do if you are displaying more than one function?
>
> It prints more than one footer.  It's very much like the way that, say,
> rules are printed for tables by \d.
>
>
Using the footer for this purpose is a little bit strange. What about the
following design?

1. Move the source code of PL functions out of \df+.
2. Allow a non-unique filter in \sf and allow displaying multiple functions.

Regards

Pavel




> regards, tom lane
>


Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Michael Paquier
On Fri, Aug 26, 2016 at 12:54 PM, Amit Kapila  wrote:
> On Fri, Aug 26, 2016 at 9:04 AM, Michael Paquier
>  wrote:
>> On Fri, Aug 26, 2016 at 12:25 PM, Amit Kapila  
>> wrote:
>>> If we change the default to 64MB, then I think it won't allow using
>>> old databases as-is, because we store it in pg_control (I think one
>>> will get the error below [1] for old databases if we just change the
>>> default and don't do anything else).  Do you have a way to address it,
>>> or do you think it is okay?
>>
>> Those would still be able to work with ./configure
>> --with-wal-segsize=16, so that's not really an issue.
>>
>
> Right, but do we need to suggest users do so?  The question/point was
> that if we deliver the server with a default value of 64MB, then it
> won't allow starting an old database.

Right, pg_upgrade could be made smarter by enforcing a conversion with
a dedicated option: we could get away by filling the existing segments
with zeros and add an XLOG switch record at the end of each segments
formerly at 16MB converted to 64MB. That would still be better than
converting each page LSN :(
-- 
Michael




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-25 Thread Fujii Masao
On Fri, Aug 26, 2016 at 11:55 AM, Bruce Momjian  wrote:
> On Fri, Aug 26, 2016 at 11:39:29AM +0900, Michael Paquier wrote:
>> Now, one of the things discussed as well was that we may want to still
>> keep pg_xlog, and soft-link to pg_journal or whatever-the-new-name is
>> to not break the existing tools. Personally, I'd prefer a hard break.
>> That would not be complicated to fix for any tool maintainers.
>
> I agree on a hard break, unless we get pushback from users, and even
> then, they can create the symlinks themselves.

I strongly prefer the symlink approach, so as not to break many existing
tools and scripts.

Regards,

-- 
Fujii Masao




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Amit Kapila
On Fri, Aug 26, 2016 at 9:04 AM, Michael Paquier
 wrote:
> On Fri, Aug 26, 2016 at 12:25 PM, Amit Kapila  wrote:
>> If we change the default to 64MB, then I think it won't allow using
>> old databases as-is, because we store it in pg_control (I think one
>> will get the error below [1] for old databases if we just change the
>> default and don't do anything else).  Do you have a way to address it,
>> or do you think it is okay?
>
> Those would still be able to work with ./configure
> --with-wal-segsize=16, so that's not really an issue.
>

Right, but do we need to suggest users do so?  The question/point was
that if we deliver the server with a default value of 64MB, then it
won't allow starting an old database.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Alvaro Herrera
Gavin Flower wrote:
> On 26/08/16 05:43, Josh Berkus wrote:

> >The one thing I'd be worried about with the increase in size is folks
> >using PostgreSQL for very small databases.  If your database is only
> >30MB or so in size, the increase in size of the WAL will be pretty
> >significant (+144MB for the base 3 WAL segments).  I'm not sure this is
> >a real problem which users will notice (in today's scales, 144MB ain't
> >much), but if it turns out to be, it would be nice to have a way to
> >switch it back *just for them* without recompiling.
> >
> Let such folk use Microsoft Access???  
> 
> 
> More seriously:
> Surely most such people would be using very old hardware & not likely to be
> upgrading to the most recent version of pg in the near future?  And for the
> ones using modern hardware: either they have enough resources not to notice,
> or very probably will know enough to hunt round for a way to reduce the WAL
> size - I strongly suspect.

I've seen people with unusual environments, such as running Pg in some
embedded platform with minimal resources, where they were baffled that
Postgres used so much disk space on files that were barely written to
and never read.  It wasn't a question of there being "large" drives to
buy, but one of not wanting to have a drive in the first place.  Now, I
grant that this was a few years ago already and disk tech (SSDs) has
changed that world; maybe that argument doesn't apply anymore.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Michael Paquier
On Fri, Aug 26, 2016 at 12:25 PM, Amit Kapila  wrote:
> If we change the default to 64MB, then I think it won't allow using
> old databases as-is, because we store it in pg_control (I think one
> will get the error below [1] for old databases if we just change the
> default and don't do anything else).  Do you have a way to address it,
> or do you think it is okay?

Those would still be able to work with ./configure
--with-wal-segsize=16, so that's not really an issue.
-- 
Michael




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Amit Kapila
On Thu, Aug 25, 2016 at 10:29 PM, Robert Haas  wrote:
> On Thu, Aug 25, 2016 at 11:21 AM, Simon Riggs  wrote:
>> On 25 August 2016 at 02:31, Robert Haas  wrote:
>>> Furthermore, there is an enforced, synchronous fsync at the end of
>>> every segment, which actually does hurt performance on write-heavy
>>> workloads.[2] Of course, if that were the only reason to consider
>>> increasing the segment size, it would probably make more sense to just
>>> try to push that extra fsync into the background, but that's not
>>> really the case.  From what I hear, the gigantic number of files is a
>>> bigger pain point.
>>
>> I think we should fully describe the problem before finding a solution.
>
> Sure, that's usually a good idea.  I attempted to outline all of the
> possible issues of which I am aware in my original email, but of
> course you may know of considerations which I overlooked.
>
>> This is too big a change to just tweak a value without discussing the
>> actual issue.
>
> Again, I tried to make sure I was discussing the actual issues in my
> original email.  In brief: having to run archive_command multiple
> times per second imposes very tight latency requirements on it;
> directories with hundreds of thousands or millions of files are hard
> to manage; enforced synchronous fsyncs at the end of each segment hurt
> performance.
>
>> And if the problem is as described, how can a change of x4 be enough
>> to make it worth the pain of change? I think you're already admitting
>> it can't be worth it by discussing initdb configuration.
>
> I guess it depends on how much pain of change you think there will be.
> I would expect a change from 16MB -> 64MB to be fairly painless, but
> (1) it might break tools that aren't designed to cope with differing
> segment sizes and (2) it will increase disk utilization for people who
> have such low velocity systems that they never end up with more than 2
> WAL segments, and now those segments are bigger.  If you know of other
> impacts or have reason to believe those problems will be serious,
> please fill in the details.
>
> Despite the fact that initdb configuration has dominated this thread,
> I mentioned it only in the very last sentence of my email and only as
> a possibility.  I believe that a 4x change will be good enough for the
> majority of people for whom this is currently a pain point.  However,
> yes, I do believe that there are some people for whom it won't be
> sufficient.  And I believe that as we continue to enhance PostgreSQL
> to support higher and higher transaction rates, the number of people
> who need an extra-large WAL segment size will increase.  As I see it,
> there are four options here:
>
> 1. Do nothing.  So far, I don't see anybody arguing for that.
>
> 2. Change the default to 64MB and call it good.  This idea seems to
> have considerable support.
>
> 3. Allow initdb-time configurability but keep the default at 16MB.  I
> don't see any support for this.  There is clearly support for
> configurability, but I don't see anyone arguing that the current
> default is preferable, unless that is what you are arguing.
>
> 4. Change the default to 64MB and also allow initdb-time
> configurability.  This option also appears to enjoy substantial
> support, perhaps more than #2.  Magnus seemed to be arguing that this
> is preferable to #2, because then it's easier for people to change the
> setting back if someone discovers a case where the higher default is a
> problem; Tom, on the other hand, seems to think this is overkill.
>

If we change the default to 64MB, then I think it won't allow using
old databases as-is, because we store it in pg_control (I think one
will get the error below [1] for old databases if we just change the
default and don't do anything else).  Do you have a way to address it,
or do you think it is okay?

[1] -
FATAL:  database files are incompatible with server
DETAIL:  The database cluster was initialized with XLOG_SEG_SIZE
16777216, but the server was compiled with XLOG_SEG_SIZE 67108864.
HINT:  It looks like you need to recompile or initdb.
LOG:  database system is shut down
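To make the trade-off discussed upthread concrete (segment-switch frequency drives archive_command invocations and end-of-segment fsyncs, and segment size drives file counts), here is a back-of-envelope calculation. The 80 MB/s write rate and 100 GB retention figures are purely illustrative, not taken from this thread:

```python
def wal_pressure(write_mb_per_s, segment_mb, retained_gb):
    """Return (segment switches per second, number of WAL files kept)
    for a given WAL write rate, segment size, and amount of retained WAL.
    Each switch implies one archive_command call and one forced fsync."""
    switches_per_s = write_mb_per_s / segment_mb
    files = retained_gb * 1024 // segment_mb
    return switches_per_s, files

# Illustrative numbers only: 80 MB/s of WAL, 100 GB of retained WAL.
for seg in (16, 64):
    sw, files = wal_pressure(80, seg, 100)
    print(f"{seg}MB segments: {sw:.2f} switches/s, {files} files")
# 16MB segments: 5.00 switches/s, 6400 files
# 64MB segments: 1.25 switches/s, 1600 files
```

So the proposed 4x increase cuts both the archive_command rate and the directory size by 4x; whether that is "enough" for high-velocity systems is exactly the open question in this thread.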

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2016-08-25 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> 9.3 has addressed that by allowing streaming standbys to perform timeline
> jumps via the replication protocol. Doesn't this problem enter in this area?

IIUC, that new feature enables timeline switch without disconnecting the 
standby and without WAL archive.  I think 9.2 can perform timeline switch with 
WAL archive.  In fact, the customer said they hit the problem only once in many 
occurrences of the same test.  The bug seems to emerge depending on the timing.

Regards
Takayuki Tsunakawa






Re: [HACKERS] Why is a newly created index contains the invalid LSN?

2016-08-25 Thread Amit Kapila
On Thu, Aug 25, 2016 at 7:55 PM, Yury Zhuravlev
 wrote:
> Hello hackers.
>
> I have a small question. While working on an incremental backup I noticed a
> strange thing.
> A newly created index contains an invalid LSN (0/0).
> Example:
> postgres=# select lsn from page_header(get_raw_page('test_a_idx2',0));
>  lsn
> -----
>  0/0
> (1 row)
>
> Can you explain me why?
>

For some indexes, like btree, which are built outside shared
buffers, we don't write WAL unless wal_level >= REPLICA.  I think
Robert has explained very well how we handle the crash-recovery
situation for such indexes.  However, for some other indexes which
don't bypass shared buffers, like BRIN and GIN, we do write WAL in
such cases as well, so you should see an LSN for those types of
indexes.  I am less sure whether there would be any problem if we
didn't write WAL for those indexes as well when wal_level < REPLICA.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-25 Thread Bruce Momjian
On Fri, Aug 26, 2016 at 11:39:29AM +0900, Michael Paquier wrote:
> Now, one of the things discussed as well was that we may want to still
> keep pg_xlog, and soft-link to pg_journal or whatever-the-new-name is
> to not break the existing tools. Personally, I'd prefer a hard break.
> That would not be complicated to fix for any tool maintainers.

I agree on a hard break, unless we get pushback from users, and even
then, they can create the symlinks themselves.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2016-08-25 Thread Michael Paquier
On Fri, Aug 26, 2016 at 11:33 AM, Tsunakawa, Takayuki
 wrote:
> Our customer hit a problem of cascading replication, and we found the cause.  
> They are using the latest PostgreSQL 9.2.18.  The bug seems to have been 
> fixed in 9.4 and higher during the big modification of xlog.c, but it's not 
> reflected in older releases.
>
> The attached patch is for 9.2.18.  This just borrows the idea from 9.4 and 
> higher.
>
> But we haven't been able to reproduce the problem.  Could you review the 
> patch and help to test it?  I would very much appreciate it if you could 
> figure out how to reproduce the problem easily.
> [...]
> LOG:  out-of-sequence timeline ID 140 (after 141) in log file 652, segment 
> 117, offset 0

9.3 has addressed that by allowing streaming standbys to perform
timeline jumps via the replication protocol. Doesn't this problem
enter in this area?
-- 
Michael




[HACKERS] Renaming of pg_xlog and pg_clog

2016-08-25 Thread Michael Paquier
Hi all,

I am relaunching $subject as 10 development will begin soon. As far as
I know, there is agreement that we can do something here. Among the
different proposals I have found:
- pg_clog renamed to pg_commit_status, pg_xact or pg_commit
- pg_xlog renamed to pg_xjournal, pg_wal or pg_journal

Another idea from Stephen
(https://www.postgresql.org/message-id/20160826003715.gg4...@tamriel.snowman.net)
would be to put everything that is temporary and not WAL-logged into a
single place to facilitate the filtering work of backup tools.

A straight renaming would be a simple patch (including pg_upgrade
part), and if we actually do it for 10.0 it would be good to do it now
instead of in 3 months. I don't mind writing a patch for it.

Now, one of the things discussed as well was that we may want to still
keep pg_xlog, and soft-link to pg_journal or whatever-the-new-name is
to not break the existing tools. Personally, I'd prefer a hard break.
That would not be complicated to fix for any tool maintainers.

Thoughts?
-- 
Michael




[HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2016-08-25 Thread Tsunakawa, Takayuki
Hello,

Our customer hit a problem of cascading replication, and we found the cause.  
They are using the latest PostgreSQL 9.2.18.  The bug seems to have been fixed 
in 9.4 and higher during the big modification of xlog.c, but it's not reflected 
in older releases.

The attached patch is for 9.2.18.  This just borrows the idea from 9.4 and 
higher.

But we haven't been able to reproduce the problem.  Could you review the patch 
and help to test it?  I would very much appreciate it if you could figure out 
how to reproduce the problem easily.


PROBLEM


The customer's configuration consists of three nodes: node1 is a primary, node2 
is a synchronous standby, and node3 is a cascading standby.  The primary 
archives WAL to a shared (network?) storage and the standbys read archived WAL 
from there with restore_command.  recovery_target_timeline is set to 'latest' 
on the standbys.

When node1 dies and node2 is promoted to a primary, node3 cannot catch up 
with node2 forever, emitting the following message repeatedly:

LOG:  out-of-sequence timeline ID 140 (after 141) in log file 652, segment 117, 
offset 0

The expected behavior is that node3 catches up with node2 and stays synchronized.


CAUSE


The processing went as follows.

1. node1's timeline is 140.  It wrote a WAL record at the end of WAL segment 
117.  The WAL record didn't fit the last page, so it was split across segments 
117 and 118.

2. WAL segment 117 was archived.

3. node1 got down, and node2 was promoted.

4. As part of the recovery process, node2 retrieves WAL segment 117 from 
archive.  It found a WAL record fragment at the end of the segment but could 
not find the remaining fragment in segment 118, so node2 stops recovery there.

LOG:  restored log file "008C028C0075" from archive
LOG:  received promote request
LOG:  redo done at 28C/75FFF738

5. node2 becomes the primary, and its timeline becomes 141.  node3 is 
disconnected by node2 (but later reconnects to node2).

LOG:  terminating all walsender processes to force cascaded standby(s) to 
update timeline and reconnect

6. node3 retrieves and applies WAL segment 117 from archive.

LOG:  restored log file "008C028C0075" from archive

7. node3 found the .history file for timeline 141 and renews its timeline to 141.

8. Because node3 found a WAL record fragment at the end of segment 117, it 
expects to find the remaining fragment at the beginning of WAL segment 118 
streamed from node2.  But there was a fragment of a different WAL record, 
because node2 wrote a different WAL record spanning the end of segment 117 
into 118.

LOG:  invalid contrecord length 5892 in log file 652, segment 118, offset 0

9. node3 then retrieves segment 117 from archive again to get the WAL record at 
the end of segment 117.  However, as node3's timeline is already 141, it 
complains about the older timeline when it sees the timeline 140 at the 
beginning of segment 117.

LOG:  out-of-sequence timeline ID 140 (after 141) in log file 652, segment 117, 
offset 0
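The check node3 keeps tripping over can be modeled roughly like this (a deliberate simplification of the 9.2-era page-header validation in xlog.c; the function name is invented):

```python
def check_page_timeline(page_tli, last_tli, logfile, seg):
    """A WAL page's timeline ID must never be lower than the last one
    already seen.  Once node3 has switched to timeline 141, re-reading
    segment 117 (whose pages are still stamped with timeline 140) trips
    this check on every retry, producing the endless loop above."""
    if page_tli < last_tli:
        raise ValueError(
            f"out-of-sequence timeline ID {page_tli} (after {last_tli}) "
            f"in log file {logfile}, segment {seg}, offset 0")
    return max(page_tli, last_tli)
```

The 9.4-and-higher fix that the attached patch borrows presumably makes the reader aware that older segments may legitimately carry an older timeline, instead of treating that as corruption.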



Regards
Takayuki Tsunakawa



cascading_standby_stuck.patch
Description: cascading_standby_stuck.patch



Re: [HACKERS] patch proposal

2016-08-25 Thread Stephen Frost
* Venkata B Nagothi (nag1...@gmail.com) wrote:
> On Thu, Aug 25, 2016 at 10:59 PM, Stephen Frost  wrote:
> > I'm not a fan of the "recovery_target" option, particularly as it's only
> > got one value even though it can mean two things (either "immediate" or
> > "not set"), but we need a complete solution before we can consider
> > deprecating it.  Further, we could consider making it an alias for
> > whatever better name we come up with.
> 
> The new parameter will accept options : "pause", "shutdown" and "promote"
> 
> *"promote"*
> 
> This option will ensure the database starts up once the "immediate"
> consistent recovery point is reached, even if it is well before the
> specified recovery target point (XID, name, or time).
> This behaviour will be similar to that of recovery_target="immediate" and
> can be aliased.

I don't believe we're really going at this the right way.  Clearly,
there will be cases where we'd like promotion at the end of the WAL
stream (as we currently have) even if the recovery point is not found,
but if the new option's "promote" is the same as "immediate" then we
don't have that.

We need to break this down into all the different possible combinations
and then come up with names for the options to define them.  I don't
believe a single option is going to be able to cover all of the cases.

The cases which I'm considering are:

recovery target is immediate (as soon as we have consistency)
recovery target is a set point (name, xid, time, whatever)

action to take if recovery target is found
action to take if recovery target is not found

Generally, "action" is one of "promote", "pause", or "shutdown".
Clearly, not all actions are valid for all recovery target cases- in
particular, "immediate" with "recovery target not found" cannot support
the "promote" or "pause" options.  Otherwise, we can support:

Recovery Target  |  Found  |  Action
-|-|--
immediate|  Yes| promote
immediate|  Yes| pause
immediate|  Yes| shutdown

immediate|  No | shutdown

name/xid/time|  Yes| promote
name/xid/time|  Yes| pause
name/xid/time|  Yes| shutdown

name/xid/time|  No | promote
name/xid/time|  No | pause
name/xid/time|  No | shutdown

We could clearly support this with these options:

recovery_target = immediate, other
recovery_action_target_found = promote, pause, shutdown
recovery_action_target_not_found = promote, pause, shutdown
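For illustration only, the combinations above might map onto a hypothetical
recovery.conf fragment like this (none of these parameter names exist today;
they are just the spellings proposed in this mail):

```
# Hypothetical recovery.conf sketch -- parameter names are the ones proposed
# above, not an existing interface.
recovery_target_time             = '2016-08-25 12:00:00'
recovery_action_target_found     = 'pause'       # promote | pause | shutdown
recovery_action_target_not_found = 'shutdown'    # promote | pause | shutdown
```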

One question to ask is if we need to support an option for xid and time
related to when we realize that we won't find the recovery target.  If
consistency is reached at a time which is later than the recovery target
for time, what then?  Do we go through the rest of the WAL and perform
the "not found" action at the end of the WAL stream?  If we use that
approach, then at least all of the recovery target types are handled the
same, but I can definitely see cases where an administrator might prefer
an "error" option.

I'd suggest we attempt to support that also.

Thanks!

Stephen




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-08-25 Thread Amit Kapila
On Fri, Aug 26, 2016 at 4:59 AM, neha khatri  wrote:

> Hello,
>
> I noticed that a small optimization is possible in the flow of wait stat
> reporting for the LWLocks, when the pgstat_track_activities is disabled.
> If the check for pgstat_track_activities is done before invoking
>  LWLockReportWaitStart() instead of inside the pgstat_report_wait_start(),
> it can save some of the statements execution where the end result of
> LWLockReportWaitStart() is a NO-OP because pgstat_track_activities = false.
>
>
This is only called in the slow path, which means when we have to wait or
sleep, so saving a few instructions will not make much difference.  Note that
both of the functions you have mentioned are inline functions.  However, if
you want, you can try it that way and see if this leads to any gain.


> I also see, that there are other callers of pgstat_report_wait_start() as
> well, which would also have to change to add a check for the
> pg_stat_activities, if the check is removed from the
> pgstat_report_wait_start(). Is the pg_stat_activities check inside
> pgstat_report_wait_start() because of some protocol being followed?
>

Not as such, but that variable is mainly used in pgstat.c/.h only.


> Would it be worth making that change.
>
>
-1 for the proposed change.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] patch proposal

2016-08-25 Thread Venkata B Nagothi
On Thu, Aug 25, 2016 at 10:59 PM, Stephen Frost  wrote:

> * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > *Query 1*
> >
> > What about the existing parameter called "recovery_target" which accepts
> > only one value "immediate", which will be similar to the "promote" option
> > with the to-be-introduced new parameter.
> > Since this parameter's behaviour will be incorporated into the new
> > parameter, I think, this parameter can be deprecated from the next
> > PostgreSQL version ?
>
> I don't think we can really consider that without having a good answer
> for what the "new parameter" is, in particular...
>

Sure, I agree.


> > *Query 2*
> >
> > I am thinking that the new parameter name should be
> > "recovery_target_incomplete" or "recovery_target_incomplete_action"
> which
> > (by name) suggests that recovery target point is not yet reached and
> > accepts options "pause","promote" and "shutdown".
> >
> > The other alternative name i thought of was -
> > "recovery_target_immediate_action", which (by name) suggests the action
> to
> > be taken when the recovery does not reach the actual set recovery target
> > and reaches immediate consistent point.
>
> I don't really care for any of those names.  Note that "immediate" and
> "the point at which we realize that we didn't find the recovery target"
> are *not* necessarily the same.  Whatever we do here needs to also cover
> the 'recovery_target_name' option, where we're only going to realize
> that we didn't find the restore point when we reach the end of the WAL
> stream.
>

Yes, all the recovery targets (including recovery_target_name) will be
taken into consideration.
The whole idea of this patch is to ensure PG realises that the recovery
is incomplete if the specified recovery target point is not found at the
end of the WAL stream.


> I'm not a fan of the "recovery_target" option, particularly as it's only
> got one value even though it can mean two things (either "immediate" or
> "not set"), but we need a complete solution before we can consider
> deprecating it.  Further, we could consider making it an alias for
> whatever better name we come up with.
>

The new parameter will accept the options "pause", "shutdown" and "promote"

*"promote"*

This option will ensure the database starts up once the "immediate" consistent
recovery point is reached, even if it is well before the specified recovery
target point (XID, Name or time).
This behaviour will be similar to that of recovery_target="immediate" and
can be aliased.

*"pause"*

This option ensures database recovery pauses if the specified recovery
target ("XID", "Name" or "time") is not reached, so that WAL replay can
be resumed if required.

*"shutdown"*

This option ensures the database shuts down if the specified recovery
target point (XID, Name or time) is not reached.


Regards,
Venkata B N

Fujitsu Australia


Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Thu, Aug 25, 2016 at 10:25 PM, Bruce Momjian  wrote:
> > On Wed, Aug 24, 2016 at 10:40:06PM -0300, Claudio Freire wrote:
> >> > time instead of requiring a recompile; we probably don't save any
> >> > significant number of cycles by compiling this into the server.
> >>
> >> FWIW, +1
> >>
> >> We're already hurt by the small segments due to a similar phenomenon
> >> as the ssh case: TCP slow start. Designing the archive/recovery
> >> command to work around TCP slow start is quite complex, and bigger
> >> segments would just be a better thing.
> >>
> >> Not to mention that bigger segments compress better.
> >
> > This would be a good time to rename the pg_xlog and pg_clog directories too.
> 
> That would be an excellent time to do so. The first CF is close by,
> and such a change would be better made at the beginning of the
> development cycle.

If we're going to be renaming things, we might wish to consider further
changes, such as putting everything that's temporary & not WAL-logged
into "pgsql_tmp" directories, so we don't need lists of "directories to
exclude" in things like the pg_basebackup-related code.

We should really have an independent thread about this though, as it's
not what Robert's asking about here.

Thanks!

Stephen




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Michael Paquier
On Thu, Aug 25, 2016 at 10:25 PM, Bruce Momjian  wrote:
> On Wed, Aug 24, 2016 at 10:40:06PM -0300, Claudio Freire wrote:
>> > time instead of requiring a recompile; we probably don't save any
>> > significant number of cycles by compiling this into the server.
>>
>> FWIW, +1
>>
>> We're already hurt by the small segments due to a similar phenomenon
>> as the ssh case: TCP slow start. Designing the archive/recovery
>> command to work around TCP slow start is quite complex, and bigger
>> segments would just be a better thing.
>>
>> Not to mention that bigger segments compress better.
>
> This would be a good time to rename the pg_xlog and pg_clog directories too.

That would be an excellent time to do so. The first CF is close by,
and such a change would be better made at the beginning of the
development cycle.
-- 
Michael




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-08-25 Thread neha khatri
Hello,

I noticed that a small optimization is possible in the flow of wait stat
reporting for the LWLocks, when the pgstat_track_activities is disabled.
If the check for pgstat_track_activities is done before invoking
 LWLockReportWaitStart() instead of inside the pgstat_report_wait_start(),
it can save some of the statements execution where the end result of
LWLockReportWaitStart() is a NO-OP because pgstat_track_activities = false.

I also see, that there are other callers of pgstat_report_wait_start() as
well, which would also have to change to add a check for the
pg_stat_activities, if the check is removed from the
pgstat_report_wait_start(). Is the pg_stat_activities check inside
pgstat_report_wait_start() because of some protocol being followed? Would
it be worth making that change.

Regards,
Neha


On Thu, Mar 10, 2016 at 12:31 AM, Amit Kapila 
wrote:

> On Fri, Mar 4, 2016 at 7:11 PM, Alexander Korotkov 
> wrote:
> >
> >>
> >> If yes, then the only slight worry is that there will lot of repetition
> in wait_event_type column, otherwise it is okay.
> >
> >
> > There is morerows attribute of entry tag in Docbook SGML, it behaves
> like rowspan in HTML.
> >
>
> Thanks for the suggestion.  I have updated the patch to include
> wait_event_type information in the wait_event table.
>
> As asked above by Robert, below is performance data with the patch.
>
> M/C Details
> --
> IBM POWER-8 24 cores, 192 hardware threads
> RAM = 492GB
>
> Performance Data
> 
> min_wal_size=15GB
> max_wal_size=20GB
> checkpoint_timeout=15min
> maintenance_work_mem = 1GB
> checkpoint_completion_target = 0.9
>
>
> pgbench read-only (median of 3, 5-min runs)
>
> clients   BASE            PATCH           %
> 1         19703.549206    19992.141542    1.4646718364
> 8         120105.542849   127717.835367   6.3380026745
> 64        487334.338764   495861.7211254  1.7498012521
>
>
> The read-only data shows some improvement with the patch, but I think this is
> mostly attributable to run-to-run variation.
>
> pgbench read-write (median of 3, 30-min runs)
>
> clients   BASE           PATCH          %
> 1         1703.275728    1696.568881    -0.3937616729
> 8         8884.406185    9442.387472    6.2804567394
> 64        32648.82798    32113.002416   -1.6411785572
>
>
> In the above data, the read-write case shows a small regression (1.6%) at
> the higher client count, but when I ran that test individually, the difference
> was 0.5%. I think it is mostly attributable to run-to-run variation, as we see
> with the read-only tests.
>
>
> Thanks to Mithun C Y for doing performance testing of this patch.
>
> As this patch is adding 4-byte variable to shared memory structure PGPROC,
> so this is susceptible to memory alignment issues for shared buffers as
> discussed in thread [1], but in general the performance data doesn't
> indicate any regression.
>
> [1] - http://www.postgresql.org/message-id/CAA4eK1+ZeB8PMwwktf+
> 3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>


Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Gavin Flower

On 26/08/16 05:43, Josh Berkus wrote:

On 08/25/2016 01:12 PM, Robert Haas wrote:

I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried

at all about the risk of master/slave sync thing, per previous statement.
But if it does have performance implications, per Andres suggestion, then
making it configurable at initdb time probably comes with a cost that's not
worth paying.

At this point it's hard to judge, because we don't have any idea what
the cost might be.  I guess if we want to pursue this approach,
somebody will have to code it up and benchmark it.  But what I'm
inclined to do for starters is put together a patch to go from 16MB ->
64MB.  Committing that early this cycle will give us time to
reconsider if that turns out to be painful for reasons we haven't
thought of yet.  And give tool authors time to make adjustments, if
any are needed.

The one thing I'd be worried about with the increase in size is folks
using PostgreSQL for very small databases.  If your database is only
30MB or so in size, the increase in size of the WAL will be pretty
significant (+144MB for the base 3 WAL segments).  I'm not sure this is
a real problem which users will notice (in today's scales, 144MB ain't
much), but if it turns out to be, it would be nice to have a way to
switch it back *just for them* without recompiling.


Let such folk use Microsoft Access???  


More seriously:
Surely most such people would be using very old hardware and are not likely 
to be upgrading to the most recent version of pg in the near future.  And 
for the ones using modern hardware: either they have enough resources 
not to notice, or they very probably know enough to hunt around for a way 
to reduce the WAL size - I strongly suspect.


Currently, I don't support pg in any production environment; I use it for 
testing and for keeping up to date with pg.  So it would affect me - 
however, I have enough resources, so it is no problem in practice.




Cheers,
Gavin





Re: [HACKERS] [COMMITTERS] pgsql: Add the "snapshot too old" feature

2016-08-25 Thread Alvaro Herrera
Kevin Grittner wrote:
> On Thu, Aug 25, 2016 at 2:56 PM, Alvaro Herrera
>  wrote:
> 
> > I'm wondering about the TestForOldSnapshot call in scanPendingInsert().
> > Why do we apply it to the metapage buffer (line 1717 in master)?
> 
> If there is any chance that GinPageGetMeta(page)->head could have
> changed from a valid block number to InvalidBlockNumber or a
> different pending-list page due to a vacuum freeing pages from the
> pending-list, the metapage must be checked -- there is no other way
> to detect that data might have disappeared.

Hmm ... but the disappearing data is moved to regular GIN pages, right?
It doesn't just go away.  I suppose that the error would be raised as
soon as we scan a GIN data page that, because of receiving data from the
pending list, has a newer LSN.  I don't know GIN in detail but perhaps
it's possible that the data is inserted into data pages in lsn A, then
removed from the pending list in lsn B (where A < B).  If the snapshot's
LSN lies in between, a spurious error would be raised.

> > FWIW I like the "revert" commit, because it easily shows me in what
> > places you considered a snapshot-too-old test and decided not to add
> > one.  Bare BufferGetPage calls (the current situation) don't indicate that.
> 
> I'm glad there is some value from having done that little dance.  :-)

I'm sure that was an annoying thing to do, so I agree.

> Thanks for looking this over so closely!

You're welcome.  I'm not actually reviewing this patch specifically,
just trying to figure out a different new feature and reading a few AMs
code while at it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Add the "snapshot too old" feature

2016-08-25 Thread Kevin Grittner
On Thu, Aug 25, 2016 at 2:56 PM, Alvaro Herrera
 wrote:

> I'm wondering about the TestForOldSnapshot call in scanPendingInsert().
> Why do we apply it to the metapage buffer (line 1717 in master)?

If there is any chance that GinPageGetMeta(page)->head could have
changed from a valid block number to InvalidBlockNumber or a
different pending-list page due to a vacuum freeing pages from the
pending-list, the metapage must be checked -- there is no other way
to detect that data might have disappeared.

> Shouldn't we apply it to the pending-list pages themselves only, if any?

If the pending-list pages are never freed, we could drop the
metapage test.

> (If there are no pending-list pages, surely the age of the snapshot used
> to read the metapage doesn't matter; and if there are, then the age of
> the pending-list pages will fail the test.)

True as long as no pending-list page is ever removed from the
pending-list.  If we take out the test, it might be worth a comment
explaining why it's not needed and that it will be needed if
someone modifies the AM to recycle empty pages from the list for
reuse.

> FWIW I like the "revert" commit, because it easily shows me in what
> places you considered a snapshot-too-old test and decided not to add
> one.  Bare BufferGetPage calls (the current situation) don't indicate that.

I'm glad there is some value from having done that little dance.  :-)

Thanks for looking this over so closely!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Peter Geoghegan
On Wed, Aug 24, 2016 at 6:31 PM, Robert Haas  wrote:
> 3. archive_timeout is no longer a frequently used option.  Obviously,
> if you are frequently archiving partial segments, you don't want the
> segment size to be too large, because if it is, each forced segment
> switch potentially wastes a large amount of space (and bandwidth).
> But given streaming replication and pg_receivexlog, the use case for
> archiving partial segments is, at least according to my understanding,
> a lot narrower than it used to be.  So, I think we don't have to worry
> as much about keeping forced segment switches cheap as we did during
> the 8.x series.

Heroku uses archive_timeout. It is considered important, because S3
archives are more reliable than EBS storage. We want to cap how much
time can pass before WAL is shipped to S3, to some degree. It's weird
to talk about degrees of durability, since we tend to assume that it's
either/or, but distinctions like that start to matter when you have an
enormous number of databases. S3 has an extremely good track record,
reliability-wise.

We're not too concerned about the overhead of all of this, I think,
because WAL segments consist of zeroes at the end when archive_timeout
is applied (at least from 9.4 on). We compress the WAL segments, and
many zeroes compress very well.

I admit that I haven't looked at it in much detail, but that is my
current understanding.

-- 
Peter Geoghegan




[HACKERS] Fwd: [Snowball-discuss] Greek stemmer

2016-08-25 Thread Oleg Bartunov
This is a chance to add default configuration for Greek language if
somebody with good knowledge could follow this development.

Oleg

-- Forwarded message --
From: Oleg Smirnov 
Date: Thu, Aug 25, 2016 at 5:26 PM
Subject: [Snowball-discuss] Greek stemmer
To: "snowball-discu." 


Hi all,

 I have implemented a stemmer for Modern Greek language [1] based on a
thesis by G. Ntais [2] with improvements proposed by S. Saroukos [3]

 I'm pretty new to Snowball so it will be great if someone could
review my code. Any feedback is much appreciated.

 1. https://github.com/snowballstem/snowball/pull/44
 2. http://sais.se/mthprize/2007/ntais2007.pdf
 3. http://tampub.uta.fi/bitstream/handle/10024/80480/gradu03463.pdf

--
Regards,
Oleg Smirnov

___
Snowball-discuss mailing list
snowball-disc...@lists.tartarus.org
http://lists.tartarus.org/mailman/listinfo/snowball-discuss




Re: [HACKERS] [COMMITTERS] pgsql: Add the "snapshot too old" feature

2016-08-25 Thread Kevin Grittner
On Thu, Aug 25, 2016 at 2:56 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>> Add the "snapshot too old" feature
>
>> src/backend/access/gin/ginbtree.c  |   9 +-
>> src/backend/access/gin/gindatapage.c   |   7 +-
>> src/backend/access/gin/ginget.c|  22 +-
>> src/backend/access/gin/gininsert.c |   2 +-
>
> I'm wondering about the TestForOldSnapshot call in scanPendingInsert().
> Why do we apply it to the metapage buffer (line 1717 in master)?
> Shouldn't we apply it to the pending-list pages themselves only, if any?
> (If there are no pending-list pages, surely the age of the snapshot used
> to read the metapage doesn't matter; and if there are, then the age of
> the pending-list pages will fail the test.)
>
>
> FWIW I like the "revert" commit, because it easily shows me in what
> places you considered a snapshot-too-old test and decided not to add
> one.  Bare BufferGetPage calls (the current situation) don't indicate that.

What about the state after pending-list entries are applied?  Would
the change you suggest still run across a page with an appropriate
LSN?

I will go review what I did in the gin AM, but wanted to put that
question out there first...

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Alvaro Herrera
Robert Haas wrote:
> On Thu, Aug 25, 2016 at 3:21 PM, Alvaro Herrera
>  wrote:

> > Does it work to set the minimum to one WAL segment, i.e. 64MB?  guc.c
> > has a hardcoded minimum of 2, but I couldn't find an explanation for it.
> 
> Well, I think that when you overrun the end of one segment, you're
> never going to be able to wrap around to the start of the same
> segment; you're going to get sucked into needing another file.

Sure, but that's a transient situation; after a couple of checkpoints,
the old segment can be removed without any danger, leaving only the
active segment.  [thinks]  Ah, on reflection, there's no way that this
buys anything: it is always critical to have enough disk space to have
one more segment to switch to.  So even under tight disk constraints, you 
cannot afford to allocate space for only a single segment, because when the 
need comes to create the next one to switch to, you will just not have the 
space.

If we were to use the WAL space in a way different than the POSIX
file interface, we could probably do better.  But that seems too
onerous.

I suppose the only option is to keep the minimum at 2.  I don't see any
point in forcing the minimum to be more than that, however.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Peter Geoghegan
On Thu, Aug 25, 2016 at 12:59 PM, Peter Geoghegan  wrote:
> Maybe we should change the ordering of those IndexInfo structs to
> something more suitable, but it must be immutable (it cannot hinge
> upon the details of one particular DML statement).

I meant that it must be stable (not immutable), in the specific sense
that it cannot change without an ALTER TABLE or similar. The order of
insertion should be consistent among all backends that might be
performing DML against the table at any given time, since, in general,
to do otherwise risks deadlocking (perhaps only with hash indexes, or
a much earlier version of the nbtree am that still used heavyweight
locks).

-- 
Peter Geoghegan




Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Peter Geoghegan
On Thu, Aug 25, 2016 at 12:16 PM, Tom Lane  wrote:
> I'm not following.  The only insertions happening in this test case
> are coming from various clients doing the same INSERT ON CONFLICT UPDATE.
> If one of them has successfully inserted "42" into the arbiter index,
> how is it that other ones are not seeing that?

It's complicated. One backend may completely overtake another in this
race. We don't give up early within ExecInsertIndexTuples(), because
there isn't any useful distinction made between the speculative
insertion case and the deferred constraint case. The ordering, as
such, is therefore irrelevant.

Fortunately, I posted a fix, of sorts, more than a year ago:

https://commitfest.postgresql.org/10/403/

It never occurred to me to make this argument for it.

There is a separate question of how to make the ordering avoid
problems if it did matter (if that patch ever got committed). I think
it would fix this exact test case, but only accidentally, because the
executor gets a list of IndexInfo structs from the relcache in a
consistent though fairly insignificant order (ordered by OID). I don't
think you can change that property within the relcache, because the
ordering must be consistent (to avoid deadlocks, perhaps elsewhere).

Maybe we should change the ordering of those IndexInfo structs to
something more suitable, but it must be immutable (it cannot hinge
upon the details of one particular DML statement). I actually also
wrote a patch to prefer insertion into the primary key first, which
also went nowhere (I gave up on that one, to be fair).

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Add the "snapshot too old" feature

2016-08-25 Thread Alvaro Herrera
Kevin Grittner wrote:
> Add the "snapshot too old" feature

> src/backend/access/gin/ginbtree.c  |   9 +-
> src/backend/access/gin/gindatapage.c   |   7 +-
> src/backend/access/gin/ginget.c|  22 +-
> src/backend/access/gin/gininsert.c |   2 +-

I'm wondering about the TestForOldSnapshot call in scanPendingInsert().
Why do we apply it to the metapage buffer (line 1717 in master)?
Shouldn't we apply it to the pending-list pages themselves only, if any?
(If there are no pending-list pages, surely the age of the snapshot used
to read the metapage doesn't matter; and if there are, then the age of
the pending-list pages will fail the test.)


FWIW I like the "revert" commit, because it easily shows me in what
places you considered a snapshot-too-old test and decided not to add
one.  Bare BufferGetPage calls (the current situation) don't indicate that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 3:21 PM, Alvaro Herrera
 wrote:
> Yeah, and it's also related to the point Josh Berkus was making about
> clusters with little activity.

Right.

> Does it work to set the minimum to one WAL segment, i.e. 64MB?  guc.c
> has a hardcoded minimum of 2, but I couldn't find an explanation for it.

Well, I think that when you overrun the end of one segment, you're
never going to be able to wrap around to the start of the same
segment; you're going to get sucked into needing another file.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Bruce Momjian
On Thu, Aug 25, 2016 at 04:21:33PM +0100, Simon Riggs wrote:
> If we do have the pain of change, should we also consider making WAL
> files variable length? What do we gain by having the files all the
> same size? ISTM better to have WAL files that vary in length up to 1GB
> in size.
> 
> (This is all about XLOG_SEG_SIZE; I presume XLOG_BLCKSZ can stay as it
> is, right?)

I think having WAL use variable-length files would add complexity to
recycling WAL files.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Konstantin Knizhnik

On 08/25/2016 10:08 PM, Peter Geoghegan wrote:

On Thu, Aug 25, 2016 at 11:49 AM, Tom Lane  wrote:

I think the point is that given the way he's set up the test case,
there should be no duplicate violation in the plain unique index
unless there is one in the arbiter index.  So assuming that INSERT
tests the arbiter indexes first, there shouldn't be an error.
Maybe it doesn't do that, but it seems like it would be a good idea
if it did.

Oh, yeah. This is arguably an example of inference failing to infer
multiple unique indexes as arbiters. Inference could, in principle,
recognize that the second unique index is equivalent to the first, but
doesn't. (I don't think that it matters which order anything is tested
in, though, because not finding a dup value in the arbiter index does
not guarantee that there won't be one in the other index. There is no
locking when no conflict is initially found, and so no guarantees
here.)

Anyway, I don't have a lot of sympathy for this point of view, because
the scenario is completely contrived. You have to draw the line
somewhere.


I do not think that this scenario is completely contrived: cases where a 
table has more than one unique key are quite common.
For example, a "user" may have a unique e-mail address, phone number and login.
Also, as far as I know, this is not an artificial example, but a real case taken 
from a customer's application...

I am not sure whether it's really a bug or a feature, but the user's intention 
was obvious: locate the record by one of the unique keys and, if such a record 
already exists, increment a counter (do an update instead of an insert).  But 
there are also good arguments why upsert may report a conflict in this case...

If such UPSERT behavior is assumed to be correct, what is the best workaround 
for the problem if we really need to have two separate indexes and want to 
enforce a unique constraint on both keys?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Alvaro Herrera
Robert Haas wrote:
> On Thu, Aug 25, 2016 at 2:49 PM, Alvaro Herrera
>  wrote:

> > I think the relevant one for that case is the minimum, though:
> >
> > #min_wal_size = 80MB
> >
> > which corresponds to 5 segments.  I suppose the default value for this
> > minimum would change to some multiple of 64MB.
> 
> Yeah, Andres made the same point, although it looks like he
> erroneously stated that the minimum was 48MB whereas you have it as
> 80MB, which seems to be the actual value.  I assume we would have to
> raise that to either 128MB or 192MB, which does feel like a bit of a
> hefty increase.  It doesn't matter if you're going to make extensive
> use of the cluster, but if somebody's spinning up hundreds of clusters
> each of which has very little activity it might not be an altogether
> welcome change.

Yeah, and it's also related to the point Josh Berkus was making about
clusters with little activity.

Does it work to set the minimum to one WAL segment, i.e. 64MB?  guc.c
has a hardcoded minimum of 2, but I couldn't find an explanation for it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Tom Lane
Peter Geoghegan  writes:
> (I don't think that it matters which order anything is tested
> in, though, because not finding a dup value in the arbiter index does
> not guarantee that there won't be one in the other index. There is no
> locking when no conflict is initially found, and so no guarantees
> here.)

I'm not following.  The only insertions happening in this test case
are coming from various clients doing the same INSERT ON CONFLICT UPDATE.
If one of them has successfully inserted "42" into the arbiter index,
how is it that other ones are not seeing that?

> Anyway, I don't have a lot of sympathy for this point of view, because
> the scenario is completely contrived.

Well, I agree this particular test case looks contrived, but it might be
a simplification of a more convincing real-world case.

regards, tom lane




Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Peter Geoghegan
On Thu, Aug 25, 2016 at 11:49 AM, Tom Lane  wrote:
> I think the point is that given the way he's set up the test case,
> there should be no duplicate violation in the plain unique index
> unless there is one in the arbiter index.  So assuming that INSERT
> tests the arbiter indexes first, there shouldn't be an error.
> Maybe it doesn't do that, but it seems like it would be a good idea
> if it did.

Oh, yeah. This is arguably an example of inference failing to infer
multiple unique indexes as arbiters. Inference could, in principle,
recognize that the second unique index is equivalent to the first, but
doesn't. (I don't think that it matters which order anything is tested
in, though, because not finding a dup value in the arbiter index does
not guarantee that there won't be one in the other index. There is no
locking when no conflict is initially found, and so no guarantees
here.)

Anyway, I don't have a lot of sympathy for this point of view, because
the scenario is completely contrived. You have to draw the line
somewhere.

-- 
Peter Geoghegan




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 2:49 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Thu, Aug 25, 2016 at 1:43 PM, Josh Berkus  wrote:
>
>> > The one thing I'd be worried about with the increase in size is folks
>> > using PostgreSQL for very small databases.  If your database is only
>> > 30MB or so in size, the increase in size of the WAL will be pretty
>> > significant (+144MB for the base 3 WAL segments).  I'm not sure this is
>> > a real problem which users will notice (in today's scales, 144MB ain't
>> > much), but if it turns out to be, it would be nice to have a way to
>> > switch it back *just for them* without recompiling.
>>
>> I think you may be forgetting that "the base 3 WAL segments" is no
>> longer the default configuration.  checkpoint_segments=3 is history;
>> we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
>> not 3.
>
> I think the relevant one for that case is the minimum, though:
>
> #min_wal_size = 80MB
>
> which corresponds to 5 segments.  I suppose the default value for this
> minimum would change to some multiple of 64MB.

Yeah, Andres made the same point, although it looks like he
erroneously stated that the minimum was 48MB whereas you have it as
80MB, which seems to be the actual value.  I assume we would have to
raise that to either 128MB or 192MB, which does feel like a bit of a
hefty increase.  It doesn't matter if you're going to make extensive
use of the cluster, but if somebody's spinning up hundreds of clusters
each of which has very little activity it might not be an altogether
welcome change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-25 Thread Tom Lane
BTW, this example also exposes another problem, in that the report is

>> ERREUR:  unknown error severity
>> CONTEXT:  parallel worker

Since, in fact, this error was thrown from leader code, that CONTEXT
report is outright misleading.  It's easy to figure that out in this
particular case because the error could only have come from the leader
side, but errors reported from further down inside pq_parse_errornotice
(eg, out-of-memory in pstrdup) would not be so easy.

We could reduce this problem by narrowing the scope over which 
HandleParallelMessage activates the ParallelErrorContext callback.
But I'm inclined to get rid of that callback entirely and instead
do something like this:

    /* Parse ErrorResponse or NoticeResponse. */
    pq_parse_errornotice(msg, &edata);

    /* Death of a worker isn't enough justification for suicide. */
    edata.elevel = Min(edata.elevel, ERROR);

+   /* If desired, tag the message with context. */
+   if (force_parallel_mode != FORCE_PARALLEL_REGRESS)
+   {
+       if (edata.context)
+           edata.context = psprintf("%s\n%s", edata.context,
+                                    _("parallel worker"));
+       else
+           edata.context = pstrdup(_("parallel worker"));
+   }
+
    /* Rethrow error or notice. */
    ThrowErrorData(&edata);

This knows a little bit more than before about the conventions for
combining context strings, but we can be sure that "parallel worker"
will be appended to exactly the messages that come back from the
parallel worker, not anything else.

Barring objections I'll push something like this tomorrow.

regards, tom lane




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Alvaro Herrera
Robert Haas wrote:
> On Thu, Aug 25, 2016 at 1:43 PM, Josh Berkus  wrote:

> > The one thing I'd be worried about with the increase in size is folks
> > using PostgreSQL for very small databases.  If your database is only
> > 30MB or so in size, the increase in size of the WAL will be pretty
> > significant (+144MB for the base 3 WAL segments).  I'm not sure this is
> > a real problem which users will notice (in today's scales, 144MB ain't
> > much), but if it turns out to be, it would be nice to have a way to
> > switch it back *just for them* without recompiling.
> 
> I think you may be forgetting that "the base 3 WAL segments" is no
> longer the default configuration.  checkpoint_segments=3 is history;
> we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
> not 3.

I think the relevant one for that case is the minimum, though:

#min_wal_size = 80MB

which corresponds to 5 segments.  I suppose the default value for this
minimum would change to some multiple of 64MB.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Aug 25, 2016 at 7:12 AM, Ivan Frolkov  wrote:
>> So, if we have a primary key and a unique constraint on a table, then upsert
>> will not work as would be expected.

> Why is this unexpected?

> You only take the alternative path (UPDATE) in the event of a would-be
> duplicate violation. You can't upsert while using more than one index
> as an arbiter index. This is true unless they're more or less
> equivalent, in which case multiple arbiter indexes can be inferred,
> but that clearly doesn't apply here.

I think the point is that given the way he's set up the test case,
there should be no duplicate violation in the plain unique index
unless there is one in the arbiter index.  So assuming that INSERT
tests the arbiter indexes first, there shouldn't be an error.
Maybe it doesn't do that, but it seems like it would be a good idea
if it did.

regards, tom lane




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 1:50 PM, Andres Freund  wrote:
> On 2016-08-25 13:45:29 -0400, Robert Haas wrote:
>> I think you may be forgetting that "the base 3 WAL segments" is no
>> longer the default configuration.  checkpoint_segments=3 is history;
>> we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
>> not 3.
>
> Well, but min_wal_size still is 48MB. So sure, if you consistently have
> a high WAL throughput, it'll be bigger. But otherwise pg_xlog will
> shrink again.

Hmm, yeah.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Andres Freund
On 2016-08-25 13:45:29 -0400, Robert Haas wrote:
> I think you may be forgetting that "the base 3 WAL segments" is no
> longer the default configuration.  checkpoint_segments=3 is history;
> we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
> not 3.

Well, but min_wal_size still is 48MB. So sure, if you consistently have
a high WAL throughput, it'll be bigger. But otherwise pg_xlog will
shrink again.




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Magnus Hagander
On Thu, Aug 25, 2016 at 7:45 PM, Robert Haas  wrote:

> On Thu, Aug 25, 2016 at 1:43 PM, Josh Berkus  wrote:
> > On 08/25/2016 01:12 PM, Robert Haas wrote:
> >>> I agree that #4 is best. I'm not sure it's worth the cost. I'm not
> worried
> >>> > at all about the risk of master/slave sync thing, per previous
> statement.
> >>> > But if it does have performance implications, per Andres suggestion,
> then
> >>> > making it configurable at initdb time probably comes with a cost
> that's not
> >>> > worth paying.
> >> At this point it's hard to judge, because we don't have any idea what
> >> the cost might be.  I guess if we want to pursue this approach,
> >> somebody will have to code it up and benchmark it.  But what I'm
> >> inclined to do for starters is put together a patch to go from 16MB ->
> >> 64MB.  Committing that early this cycle will give us time to
> >> reconsider if that turns out to be painful for reasons we haven't
> >> thought of yet.  And give tool authors time to make adjustments, if
> >> any are needed.
> >
> > The one thing I'd be worried about with the increase in size is folks
> > using PostgreSQL for very small databases.  If your database is only
> > 30MB or so in size, the increase in size of the WAL will be pretty
> > significant (+144MB for the base 3 WAL segments).  I'm not sure this is
> > a real problem which users will notice (in today's scales, 144MB ain't
> > much), but if it turns out to be, it would be nice to have a way to
> > switch it back *just for them* without recompiling.
>
> I think you may be forgetting that "the base 3 WAL segments" is no
> longer the default configuration.  checkpoint_segments=3 is history;
> we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
> not 3.
>
>
And obviously that'd be 16 files if we increase the WAL segment size to 64MB. So
the actual maximum size doesn't change, except that you can currently set
max_wal_size to something lower than 64MB. If we change, the minimum value
would become 64MB, I assume.
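For reference, the segment counts being tossed around in this thread fall out of
simple division; a quick sanity check, runnable in any psql session (the values
are the defaults quoted above):

```sql
-- max_wal_size = 1GB, min_wal_size = 80MB (current defaults quoted above)
SELECT 1024 / 16 AS max_segments_at_16mb,  -- 64 segments
       1024 / 64 AS max_segments_at_64mb,  -- 16 segments
       80 / 16   AS min_segments_at_16mb;  -- 5 segments
```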


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 1:43 PM, Josh Berkus  wrote:
> On 08/25/2016 01:12 PM, Robert Haas wrote:
>>> I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried
>>> > at all about the risk of master/slave sync thing, per previous statement.
>>> > But if it does have performance implications, per Andres suggestion, then
>>> > making it configurable at initdb time probably comes with a cost that's 
>>> > not
>>> > worth paying.
>> At this point it's hard to judge, because we don't have any idea what
>> the cost might be.  I guess if we want to pursue this approach,
>> somebody will have to code it up and benchmark it.  But what I'm
>> inclined to do for starters is put together a patch to go from 16MB ->
>> 64MB.  Committing that early this cycle will give us time to
>> reconsider if that turns out to be painful for reasons we haven't
>> thought of yet.  And give tool authors time to make adjustments, if
>> any are needed.
>
> The one thing I'd be worried about with the increase in size is folks
> using PostgreSQL for very small databases.  If your database is only
> 30MB or so in size, the increase in size of the WAL will be pretty
> significant (+144MB for the base 3 WAL segments).  I'm not sure this is
> a real problem which users will notice (in today's scales, 144MB ain't
> much), but if it turns out to be, it would be nice to have a way to
> switch it back *just for them* without recompiling.

I think you may be forgetting that "the base 3 WAL segments" is no
longer the default configuration.  checkpoint_segments=3 is history;
we now have max_wal_size=1GB, which is a maximum of 64 WAL segments,
not 3.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 1:19 PM, Tom Lane  wrote:
>> It's probably best to try
>> to hack things somehow so that the worker localizes nothing and the
>> leader localizes everything.
>
> No way that's gonna work.  For example, the expected report in
> English for the example above is
> ERROR:  invalid input syntax for integer: "BA"
> That doesn't match anything in gettext's database, because we
> already substituted something for the %s.  Basically, localization
> always has to happen before error message construction, not later.

Oh, right.

>> Or we could add another field to the
>> message the worker sends that includes the error level as an integer.
>
> The two alternatives that seem reasonable to me after a bit of reflection
> are:
>
> 1. Hack elog.c to not localize the severity field when in a worker
> process.  (It'd still have to localize all the other fields, because
> of the %s problem.)  This would be a very localized fix, but it still
> seems mighty ugly.
>
> 2. Add another field to error messages, which would be a never-localized
> copy of the severity string.  This might be the best long-run solution,
> especially if Jakob can present a convincing argument why clients might
> want it.  We would be taking some small chance of breaking existing
> clients, but if it only happens in a new major release (ie 9.6) then
> that doesn't seem like a problem.  And anyway the protocol spec has
> always counseled clients to be prepared to ignore unrecognized fields
> in an error message.
>
> We could do a variant 2a in which we invent an additional field but
> only allow workers to send it, which is more or less the same as your
> idea (though I'd still prefer string to integer).  I don't find that
> very attractive though.  If we're going to hack elog.c to have different
> sending behavior in a worker, we might as well do #1.

I don't have strong feelings about this.  Technically, this issue
affects 9.5 also, because pqmq.c was introduced in that release.  I
don't think we want to add another error field in a released branch.
However, since there's no parallel query in 9.5, only people who are
accessing that functionality via extension code would be affected,
which might be nobody and certainly isn't a lot of people, so we could
just leave this unfixed in 9.5.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Josh Berkus
On 08/25/2016 01:12 PM, Robert Haas wrote:
>> I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried
>> > at all about the risk of master/slave sync thing, per previous statement.
>> > But if it does have performance implications, per Andres suggestion, then
>> > making it configurable at initdb time probably comes with a cost that's not
>> > worth paying.
> At this point it's hard to judge, because we don't have any idea what
> the cost might be.  I guess if we want to pursue this approach,
> somebody will have to code it up and benchmark it.  But what I'm
> inclined to do for starters is put together a patch to go from 16MB ->
> 64MB.  Committing that early this cycle will give us time to
> reconsider if that turns out to be painful for reasons we haven't
> thought of yet.  And give tool authors time to make adjustments, if
> any are needed.

The one thing I'd be worried about with the increase in size is folks
using PostgreSQL for very small databases.  If your database is only
30MB or so in size, the increase in size of the WAL will be pretty
significant (+144MB for the base 3 WAL segments).  I'm not sure this is
a real problem which users will notice (in today's scales, 144MB ain't
much), but if it turns out to be, it would be nice to have a way to
switch it back *just for them* without recompiling.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-25 Thread Tom Lane
Robert Haas  writes:
> On Thu, Aug 25, 2016 at 11:43 AM, Tom Lane  wrote:
>> Ooops.  Indeed, that is broken:
>> postgres=# select stringu1::int2 from tenk1 where unique1 = 1;
>> ERREUR:  unknown error severity
>> CONTEXT:  parallel worker

> Uggh.  Obviously, I failed to realize that those strings were
> localized.  Leaving aside the question of this particular matching
> problem, I wonder if we are localizing everything twice right now,
> once in the worker and once in the leader.

Hm.  It's possible we're passing already-localized strings through
gettext a second time in the leader, but that basically does nothing
(unless somehow you got a match, but the probability seems tiny).
I rather doubt that is happening though, because of the next point:

> It's probably best to try
> to hack things somehow so that the worker localizes nothing and the
> leader localizes everything.

No way that's gonna work.  For example, the expected report in
English for the example above is
ERROR:  invalid input syntax for integer: "BA"
That doesn't match anything in gettext's database, because we
already substituted something for the %s.  Basically, localization
always has to happen before error message construction, not later.

> Or we could add another field to the
> message the worker sends that includes the error level as an integer.

The two alternatives that seem reasonable to me after a bit of reflection
are:

1. Hack elog.c to not localize the severity field when in a worker
process.  (It'd still have to localize all the other fields, because
of the %s problem.)  This would be a very localized fix, but it still
seems mighty ugly.

2. Add another field to error messages, which would be a never-localized
copy of the severity string.  This might be the best long-run solution,
especially if Jakob can present a convincing argument why clients might
want it.  We would be taking some small chance of breaking existing
clients, but if it only happens in a new major release (ie 9.6) then
that doesn't seem like a problem.  And anyway the protocol spec has
always counseled clients to be prepared to ignore unrecognized fields
in an error message.

We could do a variant 2a in which we invent an additional field but
only allow workers to send it, which is more or less the same as your
idea (though I'd still prefer string to integer).  I don't find that
very attractive though.  If we're going to hack elog.c to have different
sending behavior in a worker, we might as well do #1.

regards, tom lane




Re: [HACKERS] Why is a newly created index contains the invalid LSN?

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 1:13 PM, Robert Haas  wrote:
> On Thu, Aug 25, 2016 at 10:25 AM, Yury Zhuravlev
>  wrote:
>> I have a small question. While working on an incremental backup I noticed a
>> strange thing.
>> A newly created index contains an invalid LSN (0/0).
>> Example:
>> postgres=# select lsn from page_header(get_raw_page('test_a_idx2',0));
>>  lsn 
>> -----
>>  0/0
>> (1 row)
>>
>> Can you explain me why?
>
> Why not?

Hmm, maybe I can do better than that.  In general, the reason why we
set the page LSN is to prevent the page from being written before the
WAL record that most recently modified it is flushed to disk; this is
a necessary invariant of write-ahead logging.  But for an index build
we don't need to generate any WAL records: if the system crashes, the
entire transaction will be considered to have aborted and the
relfilenode in which the new index was being written will be ignored,
so it doesn't matter whether we recover any of the contents of that
file.  Since there's no WAL being generated, there's no need to set
LSNs on the pages.
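A rough way to observe this, assuming a scratch database with the default
wal_level = minimal and the pageinspect extension installed (the table and
index names here are invented for illustration):

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

CREATE TABLE demo (a int);
INSERT INTO demo SELECT generate_series(1, 1000);  -- WAL-logged, sets page LSNs
CREATE INDEX demo_a_idx ON demo (a);               -- built without WAL here

-- heap page: LSN reflects the last WAL record that touched it
SELECT lsn FROM page_header(get_raw_page('demo', 0));

-- index page: no WAL was generated during the build, so the LSN stays 0/0
SELECT lsn FROM page_header(get_raw_page('demo_a_idx', 0));
```

Note that a higher wal_level forces the index build to be WAL-logged (it must
reach standbys), in which case the index pages do get real LSNs.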

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 1:05 PM, Magnus Hagander  wrote:
>> 1. Do nothing.  So far, I don't see anybody arguing for that.
>>
>> 2. Change the default to 64MB and call it good.  This idea seems to
>> have considerable support.
>>
>> 3. Allow initdb-time configurability but keep the default at 16MB.  I
>> don't see any support for this.  There is clearly support for
>> configurability, but I don't see anyone arguing that the current
>> default is preferable, unless that is what you are arguing.
>>
>> 4. Change the default to 64MB and also allow initdb-time
>> configurability.  This option also appears to enjoy substantial
>> support, perhaps more than #2.  Magnus seemed to be arguing that this
>> is preferable to #2, because then it's easier for people to change the
>> setting back if someone discovers a case where the higher default is a
>> problem; Tom, on the other hand, seems to think this is overkill.
>
> I was not arguing for #4 over #2, at least not strongly. I think #2 is fine,
> and I think #4 is fine. #4 allows a way out, but it's not *that* important
> unless we go *beyond* 64MB.

OK, thanks for clarifying.  I can't see going beyond 64MB by default
when we're shipping max_wal_size=1GB.  In another 20 years when
PB-size thumb drives are commonplace we might reconsider.

> I was mainly arguing that we can't claim "it has a configure switch so it's
> kinda configurable" as a way out. If we want it configurable *at all*, it
> should be an initdb switch. If we are confident in our defaults, it doesn't
> have to be.
>
> I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried
> at all about the risk of master/slave sync thing, per previous statement.
> But if it does have performance implications, per Andres suggestion, then
> making it configurable at initdb time probably comes with a cost that's not
> worth paying.

At this point it's hard to judge, because we don't have any idea what
the cost might be.  I guess if we want to pursue this approach,
somebody will have to code it up and benchmark it.  But what I'm
inclined to do for starters is put together a patch to go from 16MB ->
64MB.  Committing that early this cycle will give us time to
reconsider if that turns out to be painful for reasons we haven't
thought of yet.  And give tool authors time to make adjustments, if
any are needed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why is a newly created index contains the invalid LSN?

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 10:25 AM, Yury Zhuravlev
 wrote:
> I have a small question. While working on an incremental backup I noticed a
> strange thing.
> A newly created index contains an invalid LSN (0/0).
> Example:
> postgres=# select lsn from page_header(get_raw_page('test_a_idx2',0));
>  lsn 
> -----
>  0/0
> (1 row)
>
> Can you explain me why?

Why not?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Peter Geoghegan
On Thu, Aug 25, 2016 at 7:12 AM, Ivan Frolkov  wrote:
> So, if we have a primary key and a unique constraint on a table, then upsert
> will not work as would be expected.

Why is this unexpected?

You only take the alternative path (UPDATE) in the event of a would-be
duplicate violation. You can't upsert while using more than one index
as an arbiter index. This is true unless they're more or less
equivalent, in which case multiple arbiter indexes can be inferred,
but that clearly doesn't apply here.
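A minimal sketch of the kind of schema under discussion (the original test case
is upthread and may differ in detail; these names are invented):

```sql
CREATE TABLE counters (
    id int PRIMARY KEY,   -- the inferred arbiter index
    k  int UNIQUE,        -- a second unique index, not an arbiter
    n  int NOT NULL DEFAULT 1
);

-- run concurrently from many sessions; id and k are always equal here,
-- so a duplicate in one index implies a duplicate in the other
INSERT INTO counters (id, k) VALUES (42, 42)
ON CONFLICT (id) DO UPDATE SET n = counters.n + 1;

-- only the primary key is inferred as the arbiter, so a concurrent insert
-- can still raise a duplicate-key error on the second unique index
-- instead of taking the UPDATE path
```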


-- 
Peter Geoghegan




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Magnus Hagander
On Thu, Aug 25, 2016 at 6:59 PM, Robert Haas  wrote:

> On Thu, Aug 25, 2016 at 11:21 AM, Simon Riggs 
> wrote:
> > On 25 August 2016 at 02:31, Robert Haas  wrote:
> >> Furthermore, there is an enforced, synchronous fsync at the end of
> >> every segment, which actually does hurt performance on write-heavy
> >> workloads.[2] Of course, if that were the only reason to consider
> >> increasing the segment size, it would probably make more sense to just
> >> try to push that extra fsync into the background, but that's not
> >> really the case.  From what I hear, the gigantic number of files is a
> >> bigger pain point.
> >
> > I think we should fully describe the problem before finding a solution.
>
> Sure, that's usually a good idea.  I attempted to outline all of the
> possible issues of which I am aware in my original email, but of
> course you may know of considerations which I overlooked.
>
> > This is too big a change to just tweak a value without discussing the
> > actual issue.
>
> Again, I tried to make sure I was discussing the actual issues in my
> original email.  In brief: having to run archive_command multiple
> times per second imposes very tight latency requirements on it;
> directories with hundreds of thousands or millions of files are hard
> to manage; enforced synchronous fsyncs at the end of each segment hurt
> performance.
>
> > And if the problem is as described, how can a change of x4 be enough
> > to make it worth the pain of change? I think you're already admitting
> > it can't be worth it by discussing initdb configuration.
>
> I guess it depends on how much pain of change you think there will be.
> I would expect a change from 16MB -> 64MB to be fairly painless, but
> (1) it might break tools that aren't designed to cope with differing
> segment sizes and (2) it will increase disk utilization for people who
> have such low velocity systems that they never end up with more than 2
> WAL segments, and now those segments are bigger.  If you know of other
> impacts or have reason to believe those problems will be serious,
> please fill in the details.
>
> Despite the fact that initdb configuration has dominated this thread,
> I mentioned it only in the very last sentence of my email and only as
> a possibility.  I believe that a 4x change will be good enough for the
> majority of people for whom this is currently a pain point.  However,
> yes, I do believe that there are some people for whom it won't be
> sufficient.  And I believe that as we continue to enhance PostgreSQL
> to support higher and higher transaction rates, the number of people
> who need an extra-large WAL segment size will increase.  As I see it,
> there are three options here:
>
> 1. Do nothing.  So far, I don't see anybody arguing for that.
>
> 2. Change the default to 64MB and call it good.  This idea seems to
> have considerable support.
>
> 3. Allow initdb-time configurability but keep the default at 16MB.  I
> don't see any support for this.  There is clearly support for
> configurability, but I don't see anyone arguing that the current
> default is preferable, unless that is what you are arguing.
>
> 4. Change the default to 64MB and also allow initdb-time
> configurability.  This option also appears to enjoy substantial
> support, perhaps more than #2.  Magnus seemed to be arguing that this
> is preferable to #2, because then it's easier for people to change the
> setting back if someone discovers a case where the higher default is a
> problem; Tom, on the other hand, seems to think this is overkill.


> Personally, I believe option #4 is for the best.  I believe that the
> great majority of users will be better off with 64MB than with 16MB,
> but I like the idea of allowing for smaller values (for people with
> really low-velocity instances) and larger ones (for people with really
> high-velocity instances).
>

I was not arguing for #4 over #2, at least not strongly. I think #2 is
fine, and I think #4 is fine. #4 allows a way out, but it's not *that*
important unless we go *beyond* 64MB.

I was mainly arguing that we can't claim "it has a configure switch so it's
kinda configurable" as a way out. If we want it configurable *at all*, it
should be an initdb switch. If we are confident in our defaults, it doesn't
have to be.

I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried
at all about the risk of master/slave sync thing, per previous statement.
But if it does have performance implications, per Andres suggestion, then
making it configurable at initdb time probably comes with a cost that's not
worth paying.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 11:43 AM, Tom Lane  wrote:
> Ooops.  Indeed, that is broken:
>
> postgres=# select 1/0;  -- using French locale
> ERREUR:  division par zéro
> postgres=# set force_parallel_mode=1;
> SET
> postgres=# select stringu1::int2 from tenk1 where unique1 = 1;
> ERREUR:  unknown error severity
> CONTEXT:  parallel worker
>
> Not sure what we ought to do about that, but we need to do something.

Uggh.  Obviously, I failed to realize that those strings were
localized.  Leaving aside the question of this particular matching
problem, I wonder if we are localizing everything twice right now,
once in the worker and once in the leader.  It's probably best to try
to hack things somehow so that the worker localizes nothing and the
leader localizes everything.  Or we could add another field to the
message the worker sends that includes the error level as an integer.
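Purely as an illustration of that last idea (the names and layout here are hypothetical, not the actual parallel-worker message protocol), carrying the numeric elevel alongside the text means the leader never has to match a possibly-localized severity string:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch only: attach the numeric severity to the message
 * so the reader never parses a localized string.  Field names and the
 * value 20 (mirroring elog.h's ERROR) are illustrative. */
typedef struct
{
    int  elevel;          /* numeric severity, e.g. the elog.h constant */
    char text[64];        /* localized severity text, for display only */
} SketchErrorField;

static int parse_severity(const SketchErrorField *f)
{
    /* no strcmp against "ERROR" / "ERREUR" / "FEHLER" / ... */
    return f->elevel;
}
```

With this layout, the localized text becomes purely cosmetic and the strcmp()-based failure Tom demonstrated cannot occur.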

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-25 Thread Alvaro Herrera
Kuntal Ghosh wrote:

> 4. For Speculative Heap tuple insert operation, there was
> inconsistency in t_ctid value. So, I've modified the t_ctid value (in
> backup page) to current block number and offset number. Need
> suggestions!!

In speculative insertions, t_ctid is used to store the speculative
token.  I think you should just mask that field out in that case (which
you can recognize because ip_posid is set to magic value 0xfffe).
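A minimal sketch of that suggestion (simplified struct; in PostgreSQL the magic value is SpecTokenOffsetNumber, 0xfffe, from htup_details.h):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for ItemPointerData. */
typedef struct
{
    uint32_t ip_blkid;
    uint16_t ip_posid;
} FakeItemPointer;

#define SPEC_TOKEN_OFFSET 0xfffe    /* mirrors SpecTokenOffsetNumber */

/* Rather than rewriting t_ctid to the tuple's own position, recognize a
 * speculative-insertion token by its magic offset and zero the field
 * out before comparing the two page images. */
static void mask_speculative_ctid(FakeItemPointer *ctid)
{
    if (ctid->ip_posid == SPEC_TOKEN_OFFSET)
        memset(ctid, 0, sizeof(*ctid));
}
```

Masking only when the magic offset is present leaves ordinary t_ctid values intact, so real inconsistencies in non-speculative tuples are still caught.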

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 11:21 AM, Simon Riggs  wrote:
> On 25 August 2016 at 02:31, Robert Haas  wrote:
>> Furthermore, there is an enforced, synchronous fsync at the end of
>> every segment, which actually does hurt performance on write-heavy
>> workloads.[2] Of course, if that were the only reason to consider
>> increasing the segment size, it would probably make more sense to just
>> try to push that extra fsync into the background, but that's not
>> really the case.  From what I hear, the gigantic number of files is a
>> bigger pain point.
>
> I think we should fully describe the problem before finding a solution.

Sure, that's usually a good idea.  I attempted to outline all of the
possible issues of which I am aware in my original email, but of
course you may know of considerations which I overlooked.

> This is too big a change to just tweak a value without discussing the
> actual issue.

Again, I tried to make sure I was discussing the actual issues in my
original email.  In brief: having to run archive_command multiple
times per second imposes very tight latency requirements on it;
directories with hundreds of thousands or millions of files are hard
to manage; enforced synchronous fsyncs at the end of each segment hurt
performance.

> And if the problem is as described, how can a change of x4 be enough
> to make it worth the pain of change? I think you're already admitting
> it can't be worth it by discussing initdb configuration.

I guess it depends on how much pain of change you think there will be.
I would expect a change from 16MB -> 64MB to be fairly painless, but
(1) it might break tools that aren't designed to cope with differing
segment sizes and (2) it will increase disk utilization for people who
have such low velocity systems that they never end up with more than 2
WAL segments, and now those segments are bigger.  If you know of other
impacts or have reason to believe those problems will be serious,
please fill in the details.

Despite the fact that initdb configuration has dominated this thread,
I mentioned it only in the very last sentence of my email and only as
a possibility.  I believe that a 4x change will be good enough for the
majority of people for whom this is currently a pain point.  However,
yes, I do believe that there are some people for whom it won't be
sufficient.  And I believe that as we continue to enhance PostgreSQL
to support higher and higher transaction rates, the number of people
who need an extra-large WAL segment size will increase.  As I see it,
there are four options here:

1. Do nothing.  So far, I don't see anybody arguing for that.

2. Change the default to 64MB and call it good.  This idea seems to
have considerable support.

3. Allow initdb-time configurability but keep the default at 16MB.  I
don't see any support for this.  There is clearly support for
configurability, but I don't see anyone arguing that the current
default is preferable, unless that is what you are arguing.

4. Change the default to 64MB and also allow initdb-time
configurability.  This option also appears to enjoy substantial
support, perhaps more than #2.  Magnus seemed to be arguing that this
is preferable to #2, because then it's easier for people to change the
setting back if someone discovers a case where the higher default is a
problem; Tom, on the other hand, seems to think this is overkill.

Personally, I believe option #4 is for the best.  I believe that the
great majority of users will be better off with 64MB than with 16MB,
but I like the idea of allowing for smaller values (for people with
really low-velocity instances) and larger ones (for people with really
high-velocity instances).

> If we do have the pain of change, should we also consider making WAL
> files variable length? What do we gain by having the files all the
> same size? ISTM better to have WAL files that vary in length up to 1GB
> in size.

This seems like an odd comment because the whole way we address WAL
positions is based on the fact that segments are fixed size, as I
would have thought you would know better than I.  The file that
contains a particular byte of WAL is based on lsn/XLOG_SEG_SIZE and
the position within the file is lsn%XLOG_SEG_SIZE.  Making files
variable-size would vastly complicate this addressing scheme and maybe
hurt performance in the process.  I can't see any compelling reason to
go there.
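The fixed-size addressing described above can be sketched as follows (simplified names; the real macros live in xlog_internal.h):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the XLByteToSeg / offset arithmetic;
 * 16MB is the current default segment size. */
#define XLOG_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

static uint64_t wal_segment_no(uint64_t lsn)
{
    return lsn / XLOG_SEG_SIZE;     /* which segment file holds the byte */
}

static uint64_t wal_segment_offset(uint64_t lsn)
{
    return lsn % XLOG_SEG_SIZE;     /* byte position within that file */
}
```

With variable-length segments this O(1) mapping disappears: locating the file containing a given LSN would require per-file metadata or a search, which is the complication objected to above.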

> (This is all about XLOG_SEG_SIZE; I presume XLOG_BLCKSZ can stay as it
> is, right?)

Yep.  Or at least, any discussion of changing the default XLOG block
size would be a completely separate from the issues raised here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Claudio Freire
On Thu, Aug 25, 2016 at 12:21 PM, Simon Riggs  wrote:
> On 25 August 2016 at 02:31, Robert Haas  wrote:
>
>> Furthermore, there is an enforced, synchronous fsync at the end of
>> every segment, which actually does hurt performance on write-heavy
>> workloads.[2] Of course, if that were the only reason to consider
>> increasing the segment size, it would probably make more sense to just
>> try to push that extra fsync into the background, but that's not
>> really the case.  From what I hear, the gigantic number of files is a
>> bigger pain point.
>
> I think we should fully describe the problem before finding a solution.
>
> This is too big a change to just tweak a value without discussing the
> actual issue.
>
> And if the problem is as described, how can a change of x4 be enough
> to make it worth the pain of change? I think you're already admitting
> it can't be worth it by discussing initdb configuration.
>
> If we do have the pain of change, should we also consider making WAL
> files variable length? What do we gain by having the files all the
> same size? ISTM better to have WAL files that vary in length up to 1GB
> in size.
>
> (This is all about XLOG_SEG_SIZE; I presume XLOG_BLCKSZ can stay as it
> is, right?)

Avoiding variable sizes does avoid some failure modes on the
filesystem side in the face of crashes/power loss.

So making them variable size, while possible, wouldn't be simple at
all (it would involve figuring out how filesystems behave when they
crash while a file is changing in size).




Re: [HACKERS] WAL consistency check facility

2016-08-25 Thread Kuntal Ghosh
Hi,

I've added the feature in the CF app. Following are the testing details:

1. In master, I've enabled following configurations:

* wal_level = replica
* max_wal_senders = 3
* wal_keep_segments = 4000
* hot_standby = on
* wal_consistency_mask = 511  /* Enable consistency check mask bit*/

2. In slave, I've enabled following configurations:

* standby_mode = on
* wal_consistency_mask = 511 /* Enable consistency check mask bit*/

3. Then, I performed gmake installcheck in master. I didn't get any
warning regarding WAL inconsistency in slave.
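For reference, the mask value used above can be read as one enable bit per resource manager in the patch's hard-coded list (HEAP2, HEAP, BTREE, HASH, GIN, GIST, SEQ, SPGIST, BRIN — nine in all):

```c
#include <assert.h>

/* 511 = 0x1FF = nine low bits set, i.e. consistency checking enabled
 * for all nine resource managers the patch currently knows about. */
enum { N_CHECKED_RMGRS = 9 };

static int all_rmgrs_mask(void)
{
    return (1 << N_CHECKED_RMGRS) - 1;   /* 511 */
}
```

Item 3 of the pending work below would replace this positional bitmask with a list of rmgr names, which is less fragile if the list ever changes order.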

I've made following changes to the attached patch:

1. For BRIN pages, I've masked the unused space, PD_PAGE_FULL and
PD_HAS_FREE_LINES flags.
2. For Btree pages, I've masked BTP_HALF_DEAD, BTP_SPLIT_END,
BTP_HAS_GARBAGE and BTP_INCOMPLETE_SPLIT flags.
3. For GIN_DELETED page, I've masked the entire page since the page is
always initialized during recovery.
4. For Speculative Heap tuple insert operation, there was
inconsistency in t_ctid value. So, I've modified the t_ctid value (in
backup page) to current block number and offset number. Need
suggestions!!


What needs to be done:
1. Add support for other Resource Managers.
2. Modify masking techniques for existing Resource Managers (if required).
3. Modify the GUC parameter which will accept a list of rmgr names.
4. Modify the technique for identifying rmgr names for which the
feature should be enabled.
5. Generalize the page type identification technique.


On Wed, Aug 24, 2016 at 2:14 PM, Simon Riggs  wrote:
> On 22 August 2016 at 16:56, Simon Riggs  wrote:
>> On 22 August 2016 at 13:44, Kuntal Ghosh  wrote:
>>
>>> Please let me know your thoughts on this.
>>
>> Do the regression tests pass with this option enabled?
>
> Hi,
>
> I'd like to be a reviewer on this. Please can you add this onto the CF
> app so we can track the review?
>
> Please supply details of the testing and test coverage.
>
> Thanks
>
> --
> Simon Riggs                  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Thanks & Regards,
Kuntal Ghosh
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f13f9c1..7b64167 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -25,6 +25,7 @@
 #include "access/commit_ts.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
+#include "access/rmgr.h"
 #include "access/subtrans.h"
 #include "access/timeline.h"
 #include "access/transam.h"
@@ -52,7 +53,9 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/barrier.h"
+#include "storage/bufmask.h"
 #include "storage/bufmgr.h"
+#include "storage/bufpage.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/large_object.h"
@@ -94,6 +97,7 @@ bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
 bool		wal_compression = false;
+int		wal_consistency_mask = 0;
 bool		log_checkpoints = false;
 int			sync_method = DEFAULT_SYNC_METHOD;
 int			wal_level = WAL_LEVEL_MINIMAL;
@@ -867,6 +871,9 @@ static void WALInsertLockAcquireExclusive(void);
 static void WALInsertLockRelease(void);
 static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
 
+void checkWALConsistency(XLogReaderState *xlogreader);
+void checkWALConsistencyForBlock(XLogReaderState *record, uint8 block_id);
+
 /*
  * Insert an XLOG record represented by an already-constructed chain of data
  * chunks.  This is a low-level routine; to construct the WAL record header
@@ -6868,6 +6875,12 @@ StartupXLOG(void)
 				/* Now apply the WAL record itself */
 				RmgrTable[record->xl_rmid].rm_redo(xlogreader);
 
+				/*
+				 * Check whether the page associated with WAL record is consistent
+				 * with the existing page
+				 */
+				checkWALConsistency(xlogreader);
+
 				/* Pop the error context stack */
 				error_context_stack = errcallback.previous;
 
@@ -11626,3 +11639,161 @@ XLogRequestWalReceiverReply(void)
 {
 	doRequestWalReceiverReply = true;
 }
+
+/*
+ * Check whether the page associated with WAL record is consistent with the
+ * existing page or not.
+ */
+void checkWALConsistency(XLogReaderState *xlogreader)
+{
+	RmgrIds rmid = (RmgrIds) XLogRecGetRmid(xlogreader);
+	int block_id;
+	int enableWALConsistencyMask = 1;
+	RmgrIds rmids[] = {RM_HEAP2_ID,RM_HEAP_ID,RM_BTREE_ID,RM_HASH_ID,RM_GIN_ID,RM_GIST_ID,RM_SEQ_ID,RM_SPGIST_ID,RM_BRIN_ID};
+	int size = sizeof(rmids)/sizeof(rmid);
+	int i;
+
+	for (i = 0; i < size; i++)
+	{
+		if (rmids[i]==rmid && (wal_consistency_mask & enableWALConsistencyMask))
+		{
+			for (block_id = 0; block_id <= xlogreader->max_block_id; block_id++)
+				checkWALConsistencyForBlock(xlogreader, block_id);
+			break;
+		}
+		/*
+		 * Enable checking for the next bit
+		 */
+		enableWALConsistencyMask <<= 1;
+	}
+}
+void checkWALConsistencyForBlock(XLogReaderState 

Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-25 Thread Tom Lane
Jakob Egger  writes:
> My PostgreSQL client checks the PG_DIAG_SEVERITY error field to determine the 
> error level.
> However, I have now learned that this field is localized. This means that a 
> server configured with --enable-nls might for example return the string 
> ERREUR instead of ERROR.

Check.

> So if I want to determine the error level, do I need to compare against all 
> localised variations in my app? Or is there another way?

Generally, we've presumed that clients don't really need to know the
difference between error levels, beyond the error-versus-notice
distinction that's embedded in the message type.  If you have an
application where that's actually important, it would be interesting to
know what it is.

> I browsed through the PostgreSQL source and discovered that 
> pq_parse_errornotice() (in src/backend/libpq/pqmq.c) uses the same naive 
> strcmp() approach to determine error level. This means that it will fail when 
> the server is compiled with --enable-nls. I am not sure what the impact of 
> this is, since I couldn't really figure out where that function is used.

Ooops.  Indeed, that is broken:

postgres=# select 1/0;  -- using French locale
ERREUR:  division par zéro
postgres=# set force_parallel_mode=1;
SET
postgres=# select stringu1::int2 from tenk1 where unique1 = 1;
ERREUR:  unknown error severity
CONTEXT:  parallel worker

Not sure what we ought to do about that, but we need to do something.

regards, tom lane




Re: [HACKERS] [PATCH v2] Add overflow checks to money type input function

2016-08-25 Thread Fabien COELHO


Hello Peter,

My 0.02€ (not $0.02:-) comments on this patch:

Patch applies and "make check" is ok. I see no issue with the code.

A few comments below.

The regression tests are clearer & commented, it is an improvement.

While you are at it, maybe you could consider adding tests for more 
features, eg ',' skipping and '(' as negative sign:


 SELECT '(1)'::MONEY;
 SELECT '($123,456.78)'::MONEY;

The code does what it advertises. Note that this patch only tests 
overflows when parsing a string. It does not detect overflows during 
operations.




The money type input function did not have any overflow checks at all.
There were some regression tests that purported to check for overflow,
but they actually checked for the overflow behavior of the int8 type
before casting to money.  Remove those unnecessary checks and add some
that actually check the money input function.


I think that the lack of generality of the MONEY type makes it nearly 
unusable (I do not think that it is the place of the database to 
pretty-print the currency, especially with a '$' sign which happens not to 
be the currency of 95% of the world population; the precision is hardwired 
to 2 figures after the unit; the convention to use '(' for negative 
numbers is rather an anglo-saxon accounting one; ...), so I would not have 
bothered. This type should really be named "DOLLAR" or "USD".



+   /*
+* We accumulate the absolute amount in "value" and then apply the sign at
+* the end.  (The sign can appear before or after the digits, so it would
+* be more complicated to do otherwise.)  Because of the larger range of
+* negative signed integers, we build "value" in the negative and then
+* flip the sign at the end,

Argh. A trick!


catching most-negative-number overflow if
+* necessary.
+*/
+
for (; *s; s++)
{
/* we look for digits as long as we have found less */
/* than the required number of decimal places */
if (isdigit((unsigned char) *s) && (!seen_dot || dec < fpoint))
{
-   value = (value * 10) + (*s - '0');
+   Cash newvalue = (value * 10) - (*s - '0');
+
+   if (newvalue / 10 != value)


I would have done "if (newvalue > 0)" because / used to be expensive and 
the overflow materializes as a sign inversion, but I understand Tom 
commented against that, so this is fine.
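A self-contained sketch of the accumulate-in-the-negative approach under discussion (illustrative only; unlike the patch's divide-back test, this variant pre-checks the bound so it does not depend on wraparound semantics such as gcc's -fwrapv):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t Cash;

/* Accumulate one decimal digit into a negatively-built value.  Building
 * in the negative lets the result reach INT64_MIN, whose magnitude
 * exceeds INT64_MAX.  Returns false if the digit would overflow. */
static bool accumulate_digit(Cash *value, int digit)
{
    /* Same effect as the patch's "newvalue / 10 != value" check:
     * reject if value * 10 - digit would fall below INT64_MIN.
     * C's truncating division makes the right-hand side a ceiling
     * for the negative quotient. */
    if (*value < (INT64_MIN + digit) / 10)
        return false;
    *value = *value * 10 - digit;
    return true;
}
```

Accumulating "…5808" succeeds right down to INT64_MIN itself, while "…5809" is rejected, which is exactly the most-negative-number edge the patch's final sign flip has to worry about.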



/* round off if there's another digit */
if (isdigit((unsigned char) *s) && *s >= '5')
-   value++;
+   value--;


Positive/negative trick again. A reminder comment?


+   if (value > 0)


Trick again...

Ok, this test seems to be necessary just for a min int value that would 
have been badly rounded down by the preceding increment.




+   ereport(ERROR,
+   (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+errmsg("value \"%s\" is out of range for type money",
+   str)));

/* adjust for less than required decimal places */
for (; dec < fpoint; dec++)
-   value *= 10;
+   {
+   Cash newvalue = value * 10;
+
+   if (newvalue / 10 != value)
+   ereport(ERROR,
+   (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+errmsg("value \"%s\" is out of range for type money",
+   str)));
+
+   value = newvalue;
+   }

/*
 * should only be trailing digits followed by whitespace, right paren,
@@ -247,7 +280,17 @@ cash_in(PG_FUNCTION_ARGS)
str)));
}

-   result = value * sgn;
+   if (sgn > 0)
+   {
+   result = -value;


The code looks a little bit strange because of the above negative value 
trick. Maybe there could be a reminder comment?


--
Fabien.


Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Aug 25, 2016 at 10:34 AM, Stephen Frost  wrote:
> >> My point wasn't really that archive_command should actually be simple.
> >> My point was that if it's being run multiple times per second, there
> >> are additional challenges that wouldn't arise if it were being run
> >> only every 5-10 seconds.
> >
> > My point was that the concerns about TCP/ssh startup costs, which was
> > part of your point #1 in your initial justification for the change,
> > have been addressed through tooling.
> 
> It's good to know that some tool sets have addressed that, but I'm
> pretty certain that not every tool set has done so, probably not even
> all of the ones in common use.  Anyway, I think the requirements we
> impose on archive_command today are just crazy.  All other things
> being equal, changes that make it easier to write a decent one are
> IMHO going in the right direction.

Agreed, but, unfortunately, this isn't an "all other things being equal"
case, or we wouldn't be having this discussion.  Increasing the WAL
segment size means it'll be longer before archive_command is called
which means there's a larger amount of potential data loss for users who
are using it without any other archiving/replication solution, along
with the other concerns about it possibly resulting in a higher disk
space cost.

I agree that increasing it makes sense and that 64MB is a good number,
but I wouldn't want to go much higher than that.  That doesn't
completely solve the TCP/SSH start-up cost penalty as there will be
environments where that is still expensive even with 64MB WAL segments,
but it will certainly be reduced.

To try to summarize, I don't think we should be trying to solve the
TCP/SSH start-up penalty issue for all users by encouraging them to
increase the WAL segment size, at least not without covering the
trade-offs.  That isn't to say we shouldn't change the default, I agree
that we should, but I believe we should keep it a reasonably
conservative change and if we make it user-configurable then we need
to be sure to document the trade-offs.

Thanks!

Stephen




[HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-25 Thread Jakob Egger
Hi,

My PostgreSQL client checks the PG_DIAG_SEVERITY error field to determine the 
error level.

However, I have now learned that this field is localized. This means that a 
server configured with --enable-nls might for example return the string ERREUR 
instead of ERROR.

So if I want to determine the error level, do I need to compare against all 
localised variations in my app? Or is there another way?

I browsed through the PostgreSQL source and discovered that 
pq_parse_errornotice() (in src/backend/libpq/pqmq.c) uses the same naive 
strcmp() approach to determine error level. This means that it will fail when 
the server is compiled with --enable-nls. I am not sure what the impact of this 
is, since I couldn't really figure out where that function is used.

Best regards,
Jakob



Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Simon Riggs
On 25 August 2016 at 02:31, Robert Haas  wrote:

> Furthermore, there is an enforced, synchronous fsync at the end of
> every segment, which actually does hurt performance on write-heavy
> workloads.[2] Of course, if that were the only reason to consider
> increasing the segment size, it would probably make more sense to just
> try to push that extra fsync into the background, but that's not
> really the case.  From what I hear, the gigantic number of files is a
> bigger pain point.

I think we should fully describe the problem before finding a solution.

This is too big a change to just tweak a value without discussing the
actual issue.

And if the problem is as described, how can a change of x4 be enough
to make it worth the pain of change? I think you're already admitting
it can't be worth it by discussing initdb configuration.

If we do have the pain of change, should we also consider making WAL
files variable length? What do we gain by having the files all the
same size? ISTM better to have WAL files that vary in length up to 1GB
in size.

(This is all about XLOG_SEG_SIZE; I presume XLOG_BLCKSZ can stay as it
is, right?)

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 9:34 AM, Magnus Hagander  wrote:
> Because it comes with the cluster during replication. I think it's more
> likely that you accidentally end up with two instances compiled with
> different values than that you get an issue from this.

I hadn't thought about it that way, but I think you're right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 10:39 AM, Bruce Momjian  wrote:
> Another issue is that many users are coming from database products that
> have significant performance hits in switching WAL files so they might
> be tempted to set very high segment sizes in inappropriate cases.

Well, we have some hit there, too.  It may be smaller, but it's
certainly not zero.

I'm generally in favor of preventing people from setting ridiculous
values for settings; we shouldn't let somebody set the WAL segment
size to 8kB or something silly like that.  But it's more important to
enable legitimate uses than it is to prohibit inappropriate uses.  If
a particular value of a particular setting may be legitimately useful
to some users, we should allow it, even if some other user might
choose that value under false assumptions.

In short, let's eschew nannyism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Bruce Momjian
On Wed, Aug 24, 2016 at 08:52:20PM -0700, Andres Freund wrote:
> On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
> > On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund  wrote:
> > > and I'm also rather doubtful that it's actually without overhead.
> > 
> > Really?  Where do you think the overhead would come from?
> 
> ATM we do a math involving XLOG_BLCKSZ in a bunch of places (including
> doing a lot of %). Some of that happens with exclusive lwlocks held, and
> some even with a spinlock held IIRC. Making that variable won't be
> free. Whether it's actually measurabel - hard to say. I do remember
> Heikki fighting hard to simplify some parts of the critical code during
> xlog scalability stuff, and that that even involved moving minor amounts
> of math out of critical sections.

I think Robert made a good case that high-volume servers might want a
larger WAL segment size, but as Andres pointed out, there are
performance concerns.  Those might be minimized by requiring the segment
size to be a 2x multiple of 16MB.

Another issue is that many users are coming from database products that
have significant performance hits in switching WAL files so they might
be tempted to set very high segment sizes in inappropriate cases.
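One reason to keep the allowed sizes at powers of two, as suggested above, is that the hot-path arithmetic Andres worries about can stay a mask and a shift even once the segment size becomes a run-time value (a sketch, not the actual xlog code):

```c
#include <assert.h>
#include <stdint.h>

/* For a power-of-two segment size, lsn % seg_size reduces to a bitmask
 * (and lsn / seg_size to a shift), so making the size configurable at
 * initdb time need not reintroduce a division in the critical path. */
static uint64_t seg_offset(uint64_t lsn, uint64_t seg_size)
{
    return lsn & (seg_size - 1);    /* valid only when seg_size is 2^k */
}
```

The mask could be computed once at startup from the control-file value, keeping per-record cost identical to the compile-time-constant case.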

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 10:34 AM, Stephen Frost  wrote:
>> My point wasn't really that archive_command should actually be simple.
>> My point was that if it's being run multiple times per second, there
>> are additional challenges that wouldn't arise if it were being run
>> only every 5-10 seconds.
>
> My point was that the concerns about TCP/ssh startup costs, which was
> part of your point #1 in your initial justification for the change,
> have been addressed through tooling.

It's good to know that some tool sets have addressed that, but I'm
pretty certain that not every tool set has done so, probably not even
all of the ones in common use.  Anyway, I think the requirements we
impose on archive_command today are just crazy.  All other things
being equal, changes that make it easier to write a decent one are
IMHO going in the right direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Aug 25, 2016 at 9:48 AM, Stephen Frost  wrote:
> > * Robert Haas (robertmh...@gmail.com) wrote:
> >> Meanwhile, we'll significantly help people who are currently
> >> generating painfully large but not totally insane numbers of WAL
> >> segments.  Someone who is currently generating 32,768 WAL segments per
> >> day - about one every 2.6 seconds - will have a significantly easier
> >> time if they start generating 8,192 WAL segments per day - about one
> >> every 10.5 seconds - instead.  It's just much easier for a reasonably
> >> simple archive command to keep up, "ls" doesn't have as many directory
> >> entries to sort, etc.
> >
> > I'm generally on-board with increasing the WAL segment size, and I can
> > see the point that we might want to make it more easily configurable as
> > it's valuable to set it differently on a small database vs. a large
> > database, but I take exception with the notion that a "simple archive
> > command" is ever appropriate.
> 
> My point wasn't really that archive_command should actually be simple.
> My point was that if it's being run multiple times per second, there
> are additional challenges that wouldn't arise if it were being run
> only every 5-10 seconds.

My point was that the concerns about TCP/ssh startup costs, which was
part of your point #1 in your initial justification for the change,
have been addressed through tooling.

> I guess I should have said "simpler" rather than "reasonably simple",
> because there's nothing simple about setting archive_command properly.

Agreed.

> I mean, it could only actually be simple if somebody had a good a
> backup tool that provided an archive_command that you could just drop
> in place.  But I'm sure if somebody had such a tool, they'd take every
> opportunity to bring it up, so we doubtless would have heard about it
> by now.  Right?  :-)

Thankfully there's actually multiple good open source and freely
available tools that address this issue (albeit, through different
mechanisms).

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] Why is a newly created index contains the invalid LSN?

2016-08-25 Thread Yury Zhuravlev

Hello hackers.

I have a small question. While working on an incremental backup I noticed a 
strange thing.

A newly created index contains the invalid LSN (0/0).
Example:
postgres=# select lsn from page_header(get_raw_page('test_a_idx2',0));
 lsn
-----
 0/0
(1 row)
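For comparison, the heap page of a freshly populated table typically carries a
real LSN while the new index page reads 0/0; a quick sketch with pageinspect
(table/index names taken from the example above, setup assumed):

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;
CREATE TABLE test (a int);
INSERT INTO test SELECT generate_series(1, 1000);
CREATE INDEX test_a_idx2 ON test (a);

SELECT lsn FROM page_header(get_raw_page('test', 0));        -- heap: non-zero LSN
SELECT lsn FROM page_header(get_raw_page('test_a_idx2', 0)); -- fresh index: 0/0
```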

Can you explain why?

Thanks.

--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 9:48 AM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> Meanwhile, we'll significantly help people who are currently
>> generating painfully large but not totally insane numbers of WAL
>> segments.  Someone who is currently generating 32,768 WAL segments per
>> day - about one every 2.6 seconds - will have a significantly easier
>> time if they start generating 8,192 WAL segments per day - about one
>> every 10.5 seconds - instead.  It's just much easier for a reasonably
>> simple archive command to keep up, "ls" doesn't have as many directory
>> entries to sort, etc.
>
> I'm generally on-board with increasing the WAL segment size, and I can
> see the point that we might want to make it more easily configurable as
> it's valuable to set it differently on a small database vs. a large
> database, but I take exception with the notion that a "simple archive
> command" is ever appropriate.

My point wasn't really that archive_command should actually be simple.
My point was that if it's being run multiple times per second, there
are additional challenges that wouldn't arise if it were being run
only every 5-10 seconds.

I guess I should have said "simpler" rather than "reasonably simple",
because there's nothing simple about setting archive_command properly.
I mean, it could only actually be simple if somebody had a good
backup tool that provided an archive_command that you could just drop
in place.  But I'm sure if somebody had such a tool, they'd take every
opportunity to bring it up, so we doubtless would have heard about it
by now.  Right?  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] UPSERT strange behavior

2016-08-25 Thread Ivan Frolkov

Suppose we have some table

create table cnt( 
 usr_id int primary key, 
 usr_doc_ref text not null, 
 cnt int, 
 sum int 
);

And going to run some insert on conflict update on it (pgbench script):

\setrandom id 1 50 
insert into cnt as c(usr_id,usr_doc_ref, cnt) values(:id, '#'||:id, 1) on 
conflict(usr_id) do update set cnt=c.cnt+1; 

Run it:

 pgbench -c 16 -j 2 -t 5 -n -h localhost -p 5432 -U postgres -f 
upsert2-ok.pgb  work 
transaction type: Custom query 
scaling factor: 1 
query mode: simple 
number of clients: 16 
number of threads: 2 
number of transactions per client: 5 
number of transactions actually processed: 80/80 
latency average: 0.000 ms 
tps = 36475.782816 (including connections establishing) 
tps = 36483.759765 (excluding connections establishing) 

All ok.
Then add a unique constraint to the table.

alter table cnt add constraint usr_doc_ref_uq unique(usr_doc_ref) 

Run pgbench again:

pgbench -c 16 -j 2 -t 5 -n -h localhost -p 5432 -U postgres -f 
upsert2-ok.pgb work
client 2 aborted in state 2: ERROR: duplicate key value violates unique 
constraint "usr_doc_ref_uq"
DETAIL: Key (usr_doc_ref)=(#39) already exists.
client 6 aborted in state 2: ERROR: duplicate key value violates unique 
constraint "usr_doc_ref_uq"
DETAIL: Key (usr_doc_ref)=(#16) already exists.
client 9 aborted in state 2: ERROR: duplicate key value violates unique 
constraint "usr_doc_ref_uq"
DETAIL: Key (usr_doc_ref)=(#28) already exists.

So, if we have both a primary key and a unique constraint on a table, then upsert
will not work as would be expected.
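For what it's worth, ON CONFLICT can arbitrate only the one constraint named in
its conflict target (usr_id here); a concurrent insert that collides on a second
unique constraint surfaces as an ordinary unique-violation error. A rough
application-side workaround is to retry on that error (a sketch only, using a
hard-coded id instead of the pgbench variable):

```sql
DO $$
BEGIN
    LOOP
        BEGIN
            INSERT INTO cnt AS c (usr_id, usr_doc_ref, cnt)
            VALUES (42, '#42', 1)
            ON CONFLICT (usr_id) DO UPDATE SET cnt = c.cnt + 1;
            EXIT;  -- upsert succeeded
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- lost a race on usr_doc_ref_uq; loop and retry
        END;
    END LOOP;
END
$$;
```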




Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> Meanwhile, we'll significantly help people who are currently
> generating painfully large but not totally insane numbers of WAL
> segments.  Someone who is currently generating 32,768 WAL segments per
> day - about one every 2.6 seconds - will have a significantly easier
> time if they start generating 8,192 WAL segments per day - about one
> every 10.5 seconds - instead.  It's just much easier for a reasonably
> simple archive command to keep up, "ls" doesn't have as many directory
> entries to sort, etc.

I'm generally on-board with increasing the WAL segment size, and I can
see the point that we might want to make it more easily configurable as
it's valuable to set it differently on a small database vs. a large
database, but I take exception with the notion that a "simple archive
command" is ever appropriate.  Heikki's excellent talk at PGCon '15
(iirc) goes over why our archive command example is about as terrible as
you can get, and that's primarily because it's just a simple 'cp'.

archive_command needs to be doing things like fsync'ing the WAL file
after it's been copied away, probably fsync'ing the directory the WAL
file has been copied into, returning the correct exit code to PG, etc.

Thankfully, there are backup/WAL archive utilities which do this
correctly and are even built to handle a large rate of WAL files for
high transaction systems (including keeping open a long-running ssh/TCP
to address the startup costs of both).  Switching to 64MB would still be
nice to simply reduce the number of files you have to deal with, and I'm
all for it for that reason, but the ssh/TCP startup cost reasons aren't
good ones for the switch as people shouldn't be using a "simple" command
anyway and the good tools for WAL archiving have already addressed those
issues.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Magnus Hagander
On Thu, Aug 25, 2016 at 5:32 AM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Wed, Aug 24, 2016 at 10:33 PM, Tom Lane  wrote:
> >> ... but I think this is just folly.  You'd have to do major amounts
> >> of work to keep, eg, slave servers on the same page as the master
> >> about what the segment size is.
>
> > I said an initdb-time parameter, meaning not capable of being changed
> > within the lifetime of the cluster.  So I don't see how the slave
> > servers would get out of sync?
>
> The point is that that now becomes something to worry about.  I do not
> think I have to exhibit a live bug within five minutes' thought before
> saying that it's a risk area.  It's something that we simply have not
> worried about before, and IME that generally means there's some squishy
> things there.
>

If we ignore the possible performance implications (which we shouldn't, of
course, but for the sake of argument), I think having it as a configurable
parameter in initdb would make it *less* of something to worry about.

Because it comes with the cluster during replication. I think it's more
likely that you accidentally end up with two instances compiled with
different values than that you get an issue from this.

That said, I think it also has to be a *very* bad pain point for somebody to
care about changing it if it requires recompilation. The vast majority of
users run the packaged versions, and they don't want to run anything else.
So you will have whatever the RPMs or the DEBs or installers pick for you.
Anything that is a ./configure-time option is something we should expect
almost nobody to change.

Changing the default will of course help/hurt those as well. But if we
change the default to something high and say "hey those of you who just run
it on a smaller system should recompile with a different --configure", we
are being *very* user-unfriendly. Or the other way around.

That doesn't mean we shouldn't change the default. We just need to be a lot
more careful about what we change it to if it takes ./configure to reset it.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Robert Haas
On Thu, Aug 25, 2016 at 1:04 AM, Wolfgang Wilhelm
 wrote:
> What would happen if there's a database on a server with initdb (or
> whatever) parameter -with-wal-size=64MB and later someone decides to make it
> the master in a replicated system and has a slave without that parameter?
> Would the slave work with the "different" wal size of the master? How could
> be guaranteed that in such a scenario the replication either works correctly
> or failes with a meaningful error message?

You make reference to an "initdb (or whatever) parameter" but actually
there is a big difference between the "initdb" case and the "whatever"
case.  If the parameter is fixed at initdb time, then the master and
the slave will definitely agree: the slave had to be created by
copying the master, and that means the control file that contains the
size was also copied.  Neither can have been changed afterwards.
That's what an initdb-time parameter means.  On the other hand, if the
parameter is, say, a GUC, then you would have exactly the kinds of
problems that you are talking about here.  I am not keen to solve any
of those problems, which is why I am not proposing to go any further
than an initdb-time parameter.

> But in general I thing a more flexible WAL size is a good idea.
> To answer Andres: You have found one of the (few?) users to adjust initdb
> parameters.

Good to know, thanks.

In further defense of the idea that making this more configurable
isn't nuts, it's worth noting that the history here is:

* When Vadim originally added XLogSegSize in
30659d43eb73272e20f2eb1d785a07ba3b553ed8 (September 1999), it was a
constant.
* In c3c09be34b6b0d7892f1087a23fc6eb93f3c4f04 (February 2004), this
became configurable via pg_config_manual.h.
* In cf9f6c8d8e9df28f3fbe1850ca7f042b2c01252e (May 2008), Tom made
this configurable via configure.

So there's a well-established history of making this gradually easier
for users to change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Intermittent "cache lookup failed for type" buildfarm failures

2016-08-25 Thread Tom Lane
I wrote:
> There is something rotten in the state of Denmark.  Here are four recent
> runs that failed with unexpected "cache lookup failed for type "
> errors:

> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grouse&dt=2016-08-16%2008%3A39%3A03
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nudibranch&dt=2016-08-13%2009%3A55%3A09
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2016-08-09%2001%3A46%3A17
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2016-08-09%2000%3A44%3A18

I believe I've figured this out.

I realized that all the possible instances of "cache lookup failed for
type" are reporting failures of SearchSysCache1(TYPEOID, ...) or related
calls, and therefore I could narrow this down by setting a breakpoint
there on the combination of cacheId = TYPEOID and key1 > 16384 (since the
OIDs reported for the failures are clearly for some non-builtin type).
After a bit of logging it became clear that the only such calls occurring
in the statements that are failing in the buildfarm are coming from the
parser's attempts to resolve an operator name.  And then it was blindingly
obvious what changed recently: commits f0c7b789a et al added a test case
in case.sql that creates and then drops both an '=' operator and the type
it's for.  And that runs in parallel with the failing tests, which all
need to resolve operators named '='.  So in the other sessions, the parser
is seeing that transient '=' operator as a possible candidate, but then
when it goes to test whether that operator could match the actual inputs,
the type is already gone (causing a failure in getBaseType or
get_element_type or possibly other places).

The best short-term fix, and the only one I'd consider back-patching,
is to band-aid the test to prevent this problem, probably by wrapping
that whole test case in BEGIN ... ROLLBACK so that concurrent tests
never see the transient '=' operator.
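Concretely, the band-aid would look something like this in case.sql (the type
and operator definitions below are hypothetical stand-ins, not the test's
actual ones):

```sql
BEGIN;
CREATE DOMAIN casetest_dom AS numeric;
CREATE FUNCTION casetest_eq(casetest_dom, integer) RETURNS boolean
    LANGUAGE sql AS 'SELECT false';
CREATE OPERATOR = (PROCEDURE = casetest_eq,
                   LEFTARG = casetest_dom, RIGHTARG = integer);
-- ... queries exercising the transient '=' operator ...
ROLLBACK;  -- concurrent sessions never see the transient operator
```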

In the long run, it'd be nice if we were more robust about such
situations, but I have to admit I have no idea how to go about
making that so.  Certainly, just letting the parser ignore catalog
lookup failures doesn't sound attractive.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Bruce Momjian
On Wed, Aug 24, 2016 at 10:40:06PM -0300, Claudio Freire wrote:
> > time instead of requiring a recompile; we probably don't save any
> > significant number of cycles by compiling this into the server.
> 
> FWIW, +1
> 
> We're already hurt by the small segments due to a similar phenomenon
> as the ssh case: TCP slow start. Designing the archive/recovery
> command to work around TCP slow start is quite complex, and bigger
> segments would just be a better thing.
> 
> Not to mention that bigger segments compress better.

This would be a good time to rename the pg_xlog and pg_clog directories too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-25 Thread Alvaro Herrera
Amit Kapila wrote:
> On Wed, Aug 24, 2016 at 11:46 PM, Alvaro Herrera
>  wrote:

> > Can you split the new xlog-related stuff to a new file, say hash_xlog.h,
> > instead of cramming it in hash.h?  Removing the existing #include
> > "xlogreader.h" from hash.h would be nice.  I volunteer for pushing any
> > preliminary header cleanup commits.
> 
> So, what you are expecting here is that we create hash_xlog.h and move
> necessary functions like hash_redo(), hash_desc() and hash_identify()
> to it. Then do whatever else is required to build it successfully.
> Once that is done, I can build my patches on top of it.  Is that
> right?  If yes, I can work on it.

Yes, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to do failover in pglogical replication?

2016-08-25 Thread Umair Shahid
Github tracker: https://github.com/2ndQuadrant/pglogical/issues

You can also send an email to pglogical-l...@2ndquadrant.com.

- Umair

On Thu, Aug 25, 2016 at 7:01 AM, Muhammed Roshan 
wrote:

> Hi Craig,
>
> Could you please let me know how to access the github tracker?
>
> Regards,
> Muhammed Roshan
>
> On Thu, Aug 25, 2016 at 9:25 AM, Craig Ringer 
> wrote:
>
>> On 24 August 2016 at 19:42, roshan_myrepublic 
>> wrote:
>>
>> > Now, I am able to see the last added row in my subscriber table. All the
>> > other 4 rows which were added in the beginning are still missing. What
>> am I
>> > doing wrong here?
>>
>> Hi. This isn't really on topic for the pgsql-hackers mailing list.
>> We're adding a pglogical mailing list now that it's clear it's not
>> going into core, but in the mean time feel free to raise this on the
>> github tracker for the project.
>>
>> --
>>  Craig Ringer   http://www.2ndQuadrant.com/
>>  PostgreSQL Development, 24x7 Support, Training & Services
>>
>
>


Re: [HACKERS] How to do failover in pglogical replication?

2016-08-25 Thread Muhammed Roshan
Hi Craig,

Could you please let me know how to access the github tracker?

Regards,
Muhammed Roshan

On Thu, Aug 25, 2016 at 9:25 AM, Craig Ringer  wrote:

> On 24 August 2016 at 19:42, roshan_myrepublic 
> wrote:
>
> > Now, I am able to see the last added row in my subscriber table. All the
> > other 4 rows which were added in the beginning are still missing. What
> am I
> > doing wrong here?
>
> Hi. This isn't really on topic for the pgsql-hackers mailing list.
> We're adding a pglogical mailing list now that it's clear it's not
> going into core, but in the mean time feel free to raise this on the
> github tracker for the project.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] patch proposal

2016-08-25 Thread Stephen Frost
* Venkata B Nagothi (nag1...@gmail.com) wrote:
> *Query 1*
> 
> What about the existing parameter called "recovery_target" which accepts
> only one value "immediate", which will be similar to the "promote" option
> with the to-be-introduced new parameter.
> Since this parameter's behaviour will be incorporated into the new
> parameter, I think, this parameter can be deprecated from the next
> PostgreSQL version ?

I don't think we can really consider that without having a good answer
for what the "new parameter" is, in particular...

> *Query 2*
> 
> I am thinking that the new parameter name should be
> "recovery_target_incomplete" or "recovery_target_incomplete_action" which
> (by name) suggests that recovery target point is not yet reached and
> accepts options "pause","promote" and "shutdown".
> 
> The other alternative name I thought of was -
> "recovery_target_immediate_action", which (by name) suggests the action to
> be taken when the recovery does not reach the actual set recovery target
> and reaches immediate consistent point.

I don't really care for any of those names.  Note that "immediate" and
"the point at which we realize that we didn't find the recovery target"
are *not* necessarily the same.  Whatever we do here needs to also cover
the 'recovery_target_name' option, where we're only going to realize
that we didn't find the restore point when we reach the end of the WAL
stream.

I'm not a fan of the "recovery_target" option, particularly as it's only
got one value even though it can mean two things (either "immediate" or
"not set"), but we need a complete solution before we can consider
deprecating it.  Further, we could consider making it an alias for
whatever better name we come up with.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Logical decoding restart problems

2016-08-25 Thread Stas Kelvich
> On 20 Aug 2016, at 15:59, Craig Ringer  wrote:
> 
> I'll wait for a test case or some more detail.

Thanks for clarification about how restart_lsn is working.

Digging slightly deeper into this topic revealed that the problem was in two-phase
decoding, not in logical decoding itself.
While writing DecodePrepare() I copied the call to
SnapBuildCommitTxn() from DecodeCommit(),
which removes the current transaction from the running list; that's obviously
the wrong thing to do for a prepared tx.

So I end up with following workaround:

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -950,7 +950,7 @@ SnapBuildAbortTxn(SnapBuild *builder, XLogRecPtr lsn,
  */
 void
 SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
-				   int nsubxacts, TransactionId *subxacts)
+				   int nsubxacts, TransactionId *subxacts, bool isCommit)
 {
 	int			nxact;
 
@@ -1026,7 +1026,8 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 	 * Make sure toplevel txn is not tracked in running txn's anymore, switch
 	 * state to consistent if possible.
 	 */
-	SnapBuildEndTxn(builder, lsn, xid);
+	if (isCommit)
+		SnapBuildEndTxn(builder, lsn, xid);


Calling SnapBuildCommitTxn() with isCommit=true from commit and false from
prepare. However, while I'm no longer observing partially decoded transactions,
I'm not sure that this is the right way to build a proper snapshot and that
something else isn't broken.

Also, while playing with the psycopg2 logical decoding client, I do see empty
transactions containing only
BEGIN/COMMIT (with the test_decoding output plugin and current postgres master).
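For reference, the empty transactions are visible from plain SQL as well; a
minimal sketch with the test_decoding SQL interface (slot name arbitrary):

```sql
SELECT pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-- a transaction whose changes all get filtered out shows up as an
-- empty BEGIN/COMMIT pair in the output:
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
SELECT pg_drop_replication_slot('regression_slot');
```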


-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \timing interval

2016-08-25 Thread Gerdan Santos
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

I did some tests and found nothing unusual. The stated feature is implemented
correctly.
The patch passes all regression tests and enables the use of the new features
specified.


The new status of this patch is: Ready for Committer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \timing interval

2016-08-25 Thread Gerdan Rezende dos Santos
OK. As I said, I just got confused about whether there was any way to disable
it. Anyway, the code is OK, does what it says, and is well formatted.
I will change the status now!
So sorry for my confusion.

Thanks!

On Thursday, August 25, 2016, Corey Huinker <
corey.huin...@gmail.com> wrote:

>
>
> On Wed, Aug 24, 2016 at 10:36 PM, Gerdan Santos  > wrote:
>
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  tested, passed
>> Implements feature:   tested, passed
>> Spec compliant:   tested, passed
>> Documentation:tested, passed
>>
>> Sorry, my mistake!
>>
>> I could not find a way to disable this functionality , I see that the
>> impact can be big as it is changed the output structure \timing without a
>> mode to disable it. I even finding great functionality but need a way to
>> set to default.
>>
>>
>>
> Thanks for reviewing! I'm not really sure how to proceed, so I'll try to
> summarize where it stands. Apologies if I 
> mischaracterize/misattribute/misremember
> someone's position.
>
> Generally speaking, people disliked the third mode for \timing, and were
> generally fine with AndrewG's idea of printing the timing in both raw
> milliseconds and a more human-digestible format, which means that we can:
>
> 1. keep the format exactly as is, ignoring locale issues
>  + It's already done
>  + lightweight
>  +TomL believes there will be no confusion
>  - others disagree
> 2. we fish out the proper locale-specific abbreviations for
> days/hours/minutes/seconds
>  + no additional settings
>  + locale stuff can't be that hard
>  - I've never dealt with it (American, surprise)
> 3. ignore locales and fall back to a left-trimmed DDD HH:MM:SS.mmm format
>  + Easy to revert to that code
>  + My original format and one PeterE advocated
>  - others disliked
> 4. we have a \pset that sets fixed scale (seconds, minutes, hours, days),
> sliding scale (what's displayed now), or interval
>  + some flexibility with some easy config values
>  - still have the locale issue
>  -  likely will miss a format somebody wanted
5. The \pset option is a time format string like "%d %h:%m:%s".
>  + maximum flexibility
>  + sidesteps the locale issue by putting it in the user's hands
>  - those format strings are sometimes tough for users to grok
>  - ISO 8601 isn't much of a help as it doesn't handle milliseconds
>  - additional config variable
>  - documentation changes
>  - debate about what the default should be. GOTO 1.
>
> I personally would be happy with any of these options, so I think we get
> some more feedback to see if a consensus emerges. It's a tiny patch and
> trivial to test, so we have time(ing) on our side.
>
>


-- 
Gerdan Rezende dos Santos
PostgreSQL & EnterpriseDB Specialist, Support, Training & Services
+55 (61) 9645-1525


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-25 Thread Michael Paquier
On Thu, Aug 25, 2016 at 10:25 AM, Martín Marqués  wrote:
> 2016-08-24 21:34 GMT-03:00 Michael Paquier :
>>
>> Yes, you are right. If I look at the diffs this morning I am seeing
>> the ACLs being dumped for this aggregate. So we could just fix the
>> test and be done with it. I did not imagine the index issue though,
>> and this is broken for some time, so that's not exclusive to 9.6 :)
>
> Do you see any easier way than what I mentioned earlier (adding a
> selectDumpableIndex() function) to fix the index dumping issue?

Yes, we are going to need something across those lines. And my guess
is that this is going to be rather close to getOwnedSeqs() in terms of
logic.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in to_timestamp().

2016-08-25 Thread amul sul
On Thu, Aug 25, 2016 at 3:41 PM, Artur Zakirov  wrote:
>>> You are right. I assigned NODE_TYPE_SPACE to prev_type to be able to
>>> execute such a query:
>>>
>>>
>>> SELECT to_timestamp('---2000JUN', ' MON');
>>>
>>>
>>> Will it be proper behaviour?
>>
>>
>>
>> Looks good to me; no one will complain if something works on PG but not
>> on Oracle.
>
>
> Thanks. I've created the entry in https://commitfest.postgresql.org/10/713/
> . You can add yourself as a reviewer.
>

Done, added myself as reviewer & changed status to "Ready for
Committer". Thanks !

Regards,
Amul Sul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in to_timestamp().

2016-08-25 Thread Artur Zakirov

You are right. I assigned NODE_TYPE_SPACE to prev_type to be able to
execute such a query:


SELECT to_timestamp('---2000JUN', ' MON');


Will it be proper behaviour?



Looks good to me; no one will complain if something works on PG but not on
Oracle.


Thanks. I've created the entry in 
https://commitfest.postgresql.org/10/713/ . You can add yourself as a 
reviewer.


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Checksum error and VACUUM FULL

2016-08-25 Thread Tatsuki Kadomoto
Hello, gurus,

I hit an incorrect checksum on "global/pg_filenode.map" at exactly the moment
"VACUUM FULL" was executed, and the session was aborted.

Aug 16 20:51:19 postgres[22329]: [2-1] FATAL:  relation mapping file 
"global/pg_filenode.map" contains incorrect checksum


Aug 16 20:51:19 postgres[22329]: [2-2] STATEMENT:  SELECT 
id,readbm,writebm,survbm,timeout FROM Users WHERE username='packetlogicd' AND 
password=md5('x')


I'm reading the comment in src/backend/utils/cache/relmapper.c .

===
Therefore mapped catalogs can only be relocated by operations such as VACUUM 
FULL and CLUSTER, which make no transactionally-significant changes: it must be 
safe for the new file to replace the old, even if the transaction itself aborts.
===

Does this comment mean it's expected to get this kind of checksum error if the 
timing is really bad?

Regards,
Tatsuki



Re: [HACKERS] Bug in to_timestamp().

2016-08-25 Thread amul sul
On Thursday, August 25, 2016 1:56 PM, Artur Zakirov  
wrote:
>> #2. Warning at compilation;
>>
>> formatting.c: In function ‘do_to_timestamp’:
>> formatting.c:3049:37: warning: ‘prev_type’ may be used uninitialized in this 
>> function [-Wmaybe-uninitialized]
>> if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
>> ^
>> formatting.c:2988:5: note: ‘prev_type’ was declared here
>> prev_type;
>> ^
>>
>> You can avoid this by assigning zero (or introducing NODE_TYPE_INVAL) to
>> prev_type at the following line:
>>
>> 256 +   prev_type;
>
>
>You are right. I assigned NODE_TYPE_SPACE to prev_type to be able to
>execute such a query:
>
>
>SELECT to_timestamp('---2000JUN', ' MON');
>
>
>Will it be proper behaviour?


Looks good to me; no one will complain if something works on PG but not on
Oracle.


Thanks & Regards,
Amul Sul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in to_timestamp().

2016-08-25 Thread Artur Zakirov

Hi,


#1.
Whitespace @ line # 317.


Sorry, fixed.


#2. Warning at compilation;

formatting.c: In function ‘do_to_timestamp’:
formatting.c:3049:37: warning: ‘prev_type’ may be used uninitialized in this 
function [-Wmaybe-uninitialized]
if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
^
formatting.c:2988:5: note: ‘prev_type’ was declared here
prev_type;
^

You can avoid this by assigning zero (or introducing NODE_TYPE_INVAL) to
prev_type at the following line:

256 +   prev_type;


You are right. I assigned NODE_TYPE_SPACE to prev_type to be able to
execute such a query:


SELECT to_timestamp('---2000JUN', ' MON');

Will it be proper behaviour?

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5c1c4f6..36d8b3e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6146,9 +6146,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
  
   
to_timestamp and to_date
-   skip multiple blank spaces in the input string unless the
+   skip multiple blank spaces and printable non letter and non digit
+   characters in the input string and in the formatting string unless the
FX option is used. For example,
-   to_timestamp('2000JUN', ' MON') works, but
+   to_timestamp('2000JUN', ' MON'),
+   to_timestamp('2000JUN', ' MON')
+   and to_timestamp('2000JUN', 'MON') work, but
to_timestamp('2000JUN', 'FX MON') returns an error
because to_timestamp expects one space only.
FX must be specified as the first item in
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index bbd97dc..7430013 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -169,6 +169,8 @@ struct FormatNode
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -947,6 +949,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(char *str, const KeyWord *kw,
  const int *index);
 static const KeySuffix *suff_search(char *str, const KeySuffix *suf, int type);
+static bool is_char_separator(char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -963,7 +966,6 @@ static void dump_node(FormatNode *node, int max);
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
 static int	adjust_partial_year_to_2020(int year);
-static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
 static void from_char_set_int(int *dest, const int value, const FormatNode *node);
 static int	from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node);
@@ -1036,6 +1038,17 @@ suff_search(char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_char_separator(char *str)
+{
+	return ((pg_mblen(str) == 1) &&
+			/* printable character, but not letter and digit */
+			((*str >= '!' && *str <= '/') ||
+			 (*str >= ':' && *str <= '@') ||
+			 (*str >= '[' && *str <= '`') ||
+			 (*str >= '{' && *str <= '~')));
+}
+
 /* --
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * --
@@ -1237,9 +1250,10 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 {
 	const KeySuffix *s;
 	FormatNode *n;
+	bool		in_text = false,
+in_backslash = false;
 	int			node_set = 0,
-suffix,
-last = 0;
+suffix;
 
 #ifdef DEBUG_TO_FROM_CHAR
 	elog(DEBUG_elog_output, "to_char/number(): run parser");
@@ -1251,6 +1265,55 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 	{
 		suffix = 0;
 
+		/* Previous character was a backslash */
+		if (in_backslash)
+		{
+			/* After backslash should go non-space character */
+			if (isspace(*str))
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("invalid escape sequence")));
+			in_backslash = false;
+
+			n->type = NODE_TYPE_CHAR;
+			n->character = *str;
+			n->key = NULL;
+			n->suffix = 0;
+			n++;
+			str++;
+			continue;
+		}
+		/* Previous character was a quote */
+		else if (in_text)
+		{
+			if (*str == '"')
+			{
+str++;
+in_text = false;
+			}
+			else if (*str == '\\')
+			{
+str++;
+in_backslash = true;
+			}
+			else
+			{
+if (ver == DCH_TYPE && is_char_separator(str))
+	n->type = NODE_TYPE_SEPARATOR;
+else if (isspace(*str))
+	n->type = NODE_TYPE_SPACE;
+else
+	n->type = NODE_TYPE_CHAR;
+
+n->character = *str;
+n->key = NULL;
+n->suffix = 0;
+n++;
+str++;
+			}
+			continue;
+		}
+
 		/*
 		 

Re: [HACKERS] Declarative partitioning - another take

2016-08-25 Thread Ashutosh Bapat
On Thu, Aug 25, 2016 at 12:22 PM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> On 2016/08/22 13:51, Ashutosh Bapat wrote:
> > The parent-child relationship of multi-level partitioned tables is not
> > retained when creating the AppendRelInfo nodes. We create RelOptInfo
> nodes
> > for all the leaf and intermediate tables. The AppendRelInfo nodes created
> > for these RelOptInfos set the topmost table as the parent of all the leaf
> > and child tables. Since partitioning scheme/s at each level is/are
> > associated with the parent/s at that level, we lose information about
> the
> > immediate parents and thus it becomes difficult to identify which leaf
> node
> > falls where in the partition hierarchy. This stops us from doing any
> > lump-sum partition pruning where we can eliminate all the partitions
> under
> > a given parent-partition if that parent-partition gets pruned. It also
> > restricts partition-wise join technique from being applied to partial
> > partition hierarchy when the whole partitioning scheme of joining tables
> > does not match. Maintaining a RelOptInfo hierarchy should not create
> > corresponding Append (all kinds) plan hierarchy since
> > accumulate_append_subpath() flattens any such hierarchy while creating
> > paths. Can you please consider this point in your upcoming patch?
>
> I agree.  So there seem to be two things here:  a) when expanding a
> partitioned table inheritance set, do it recursively such that resulting
> AppendRelInfos preserve *immediate* parent-child relationship info.


Right.


> b)
> when accumulating append subpaths, do not flatten a subpath that is itself
> an append when ((AppendPath *) subpath)->path.parent is a RelOptInfo with
> non-NULL partitioning info.  Is the latter somehow necessary for
> pairwise-join considerations?
>

I don't think you need to do anything in the path creation code for this.
As it is, it flattens all AppendPath hierarchies, whether for partitioning or
inheritance or subqueries. We should leave it as it is.



> I think I can manage to squeeze in (a) in the next version patch and will
> also start working on (b), mainly the part about RelOptInfo getting some
> partitioning info.
>

I am fine with (b), where you would include some partitioning information in
RelOptInfo. But you don't need to do what you said in (b) above.

In a private conversation Robert Haas suggested a way slightly different
from what my patch for partition-wise join does. He suggested that the
partitioning schemes, i.e. the strategy, number of partitions and bounds of
the partitioned relations involved in the query, should be stored in
PlannerInfo in the form of a list. Each partitioning scheme is annotated
with the relids of the partitioned relations. The RelOptInfo of a
partitioned relation will point to the partitioning scheme in PlannerInfo.
Along with that, each RelOptInfo will need to store the partition keys for
the corresponding relation. This simplifies matching the partitioning
schemes of the joining relations. It also reduces the number of copies of
partition bounds floating around, as we expect that a query will involve
multiple partitioned tables following similar partitioning schemes. Maybe
you want to consider this idea while working on (b).


> Thanks,
> Amit
>
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning - another take

2016-08-25 Thread Amit Langote
On 2016/08/22 13:51, Ashutosh Bapat wrote:
> The parent-child relationship of multi-level partitioned tables is not
> retained when creating the AppendRelInfo nodes. We create RelOptInfo nodes
> for all the leaf and intermediate tables. The AppendRelInfo nodes created
> for these RelOptInfos set the topmost table as the parent of all the leaf
> and child tables. Since partitioning scheme/s at each level is/are
> associated with the parent/s at that level, we lose information about the
> immediate parents and thus it becomes difficult to identify which leaf node
> falls where in the partition hierarchy. This stops us from doing any
> lump-sum partition pruning where we can eliminate all the partitions under
> a given parent-partition if that parent-partition gets pruned. It also
> restricts partition-wise join technique from being applied to partial
> partition hierarchy when the whole partitioning scheme of joining tables
> does not match. Maintaining a RelOptInfo hierarchy should not create
> corresponding Append (all kinds) plan hierarchy since
> accumulate_append_subpath() flattens any such hierarchy while creating
> paths. Can you please consider this point in your upcoming patch?

I agree.  So there seem to be two things here:  a) when expanding a
partitioned table inheritance set, do it recursively such that resulting
AppendRelInfos preserve *immediate* parent-child relationship info.  b)
when accumulating append subpaths, do not flatten a subpath that is itself
an append when ((AppendPath *) subpath)->path.parent is a RelOptInfo with
non-NULL partitioning info.  Is the latter somehow necessary for
pairwise-join considerations?

I think I can manage to squeeze in (a) in the next version patch and will
also start working on (b), mainly the part about RelOptInfo getting some
partitioning info.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for lazy_scan_heap

2016-08-25 Thread Masahiko Sawada
On Thu, Aug 25, 2016 at 1:41 AM, Anastasia Lubennikova
 wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
>
> Hi,
> I haven't tested the performance yet, but the patch itself looks pretty good
> and like a reasonable improvement.
> I have a question about removing the comment. It seems to be a really tricky
> moment. How do we know that an all-frozen block hasn't changed since the
> moment we checked it?

I think that we don't need to check whether an all-frozen block has
changed since we checked it.
Even if the block has changed after we saw it as an all-frozen page, we
can still update relfrozenxid, because any new XIDs added to that page
are newer than the GlobalXmin we computed.

Am I missing something?

In this patch, since we count the number of all-frozen pages even
during a non-aggressive scan, I thought that we don't need to re-check
whether these blocks were all-frozen pages.

> I'm going to test the performance this week.
> I wonder if you could send a test script or describe the steps to test it?

This optimization reduces the number of visibility map scans when the
table is almost all-visible or all-frozen.
So it would be effective for a very large table (more than several
hundred GB) whose pages are almost all all-visible or all-frozen.

For example:
1. Create a very large table.
2. Execute VACUUM FREEZE to freeze all pages of the table.
3. Measure the execution time of a second VACUUM FREEZE.
I hope that the second VACUUM FREEZE becomes a little faster.

I modified the comment, and attached v2 patch.

Regards,

--
Masahiko Sawada
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 231e92d..226dc83 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -471,6 +471,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
PROGRESS_VACUUM_MAX_DEAD_TUPLES
};
int64   initprog_val[3];
+   BlockNumber n_skipped;
+   BlockNumber n_all_frozen;
 
	pg_rusage_init(&ru0);
 
@@ -501,6 +503,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
initprog_val[2] = vacrelstats->max_dead_tuples;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
+   n_skipped = 0;
+   n_all_frozen = 0;
+
/*
 * Except when aggressive is set, we want to skip pages that are
 * all-visible according to the visibility map, but only when we can skip
@@ -558,14 +563,20 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
{
if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
break;
+   n_all_frozen++;
}
else
{
if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
break;
+
+   /* Count the number of all-frozen pages */
+   if (vmstatus & VISIBILITYMAP_ALL_FROZEN)
+   n_all_frozen++;
}
vacuum_delay_point();
next_unskippable_block++;
+   n_skipped++;
}
}
 
@@ -574,7 +585,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
else
skipping_blocks = false;
 
-   for (blkno = 0; blkno < nblocks; blkno++)
+   blkno = 0;
+   while (blkno < nblocks)
{
Buffer  buf;
Pagepage;
@@ -592,18 +604,22 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
TransactionId visibility_cutoff_xid = InvalidTransactionId;
 
/* see note above about forcing scanning of last page */
-#define FORCE_CHECK_PAGE() \
-   (blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
+#define FORCE_CHECK_PAGE(blk) \
+   ((blk) == nblocks - 1 && should_attempt_truncation(vacrelstats))
 
	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
if (blkno == next_unskippable_block)
{
+   n_skipped = 0;
+   n_all_frozen = 0;
+
/* Time to advance next_unskippable_block */
next_unskippable_block++;
if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
-   while (next_unskippable_block < nblocks)
+   while (next_unskippable_block < nblocks 

Re: Collective typos in contrib/postgres_fdw (Was: Re: [HACKERS] Incorrect comment in contrib/postgres_fdw/deparse.c)

2016-08-25 Thread Etsuro Fujita

On 2016/08/25 5:29, Robert Haas wrote:

On Wed, Aug 24, 2016 at 2:41 AM, Etsuro Fujita
 wrote:

On 2016/04/04 20:11, Etsuro Fujita wrote:

I found an incorrect comment in contrib/postgres_fdw/deparse.c: s/FOR
SELECT or FOR SHARE/FOR UPDATE or FOR SHARE/  Attached is a patch to fix
that comment.



Other than this, I ran into some typos in contrib/postgres_fdw, while
working on join pushdown improvements.  Please find attached an updated
version of the patch.



Thanks, committed.


Thanks for picking this up!

Best regards,
Etsuro Fujita






Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-25 Thread Etsuro Fujita

On 2016/08/25 1:08, Robert Haas wrote:

On Tue, Aug 23, 2016 at 11:18 PM, Etsuro Fujita
 wrote:

OK, I think we should fix the issue that postgres_fdw produces incorrect
aliases for joining relations shown in the Relations line in EXPLAIN for a
join pushdown query like the above in advance of the 9.6 release, so I'll
add this to the 9.6 open items.



Isn't it a bit late for that?

I'm not eager to have 9.6 get further delayed while we work on this
issue, and I definitely don't believe that this is a sufficiently
important issue to justify reverting join pushdown in its entirety.
We're talking about a minor detail of the EXPLAIN output that we'd
like to fix, but for which we have no consensus on exactly how to fix
it, and the fix may involve some redesign.  That does not seem like a
good thing to the week before rc1.  Let's just leave this well enough
alone and work on it for v10.


That's fine with me.

You said upthread, "I don't want to change it at all, neither in 10 or 
any later version."  That was the reason why I proposed to fix this in 9.6.


Best regards,
Etsuro Fujita



