Re: [HACKERS] PATCH: Batch/pipelining support for libpq
On 23 August 2016 at 08:27, Craig Ringer wrote:
> On 10 August 2016 at 14:44, Michael Paquier wrote:
>
>> I am looking a bit more seriously at this patch and assigned myself as
>> a reviewer.
>
> Much appreciated.
>
>> macos complains here. You may want to replace %06lds by just %06d.
>
> Yeah, or cast to a type known to be big enough. Will amend.

I used an upcast to (long), because on Linux it's a long. I don't see the
point of messing about adding a configure test for something this trivial.

>> This patch generates a core dump; use for example pg_ctl start -w and
>> you'll bump into the trace above. There is something wrong with the
>> queue handling.
>
> Huh. I didn't see that here (Fedora 23). I'll look more closely.
>
>> Do you have plans for a more generic structure for the command queue list?
>
> No plans, no. This was a weekend experiment that turned into a useful
> patch and I'm having to scrape up time for it amongst much more important
> things like logical failover / sequence decoding and various other
> replication work.
>
> Thanks for the docs review too, will amend.
>
>> + fprintf(stderr, "internal error, COPY in batch mode");
>> + abort();
>> I don't think that's a good idea.
>
> My thinking there was that it's a "shouldn't happen" case. It's a
> problem in library logic. I'd use an Assert() here in the backend.

I could printfPQExpBuffer(...) an error and return failure instead if you
think that's more appropriate. I'm not sure how the app would handle it
correctly, but OTOH it's generally better for libraries not to call
abort(). So I'll do that, but since it's an internal error that's not
meant to happen I'll skip the gettext calls.

>> Error messages should also use libpq_gettext, and perhaps
>> be stored in conn->errorMessage as we do so for OOMs happening on
>> client-side and reporting them back even if they are not expected
>> (those are blocked PQsendQueryStart in your patch).
I didn't get that last part, re PQsendQueryStart.

>> src/test/examples is a good idea to show people what this new API can
>> do, but this is never getting compiled. It could as well be possible
>> to include tests in src/test/modules/, in the same shape as what
>> postgres_fdw is doing by connecting to itself and link it to libpq. As
>> this patch complicates quite a lot fe-exec.c, I think that this would
>> be worth it. Thoughts?
>
> I think it makes sense to use the TAP framework.

Added src/test/modules/test_libpq/ with a test for async mode. Others can
be added/migrated based on that. I thought it made more sense for the
tests to live there than in src/interfaces/libpq/ since they need test
client programs and shouldn't pollute the library directory.

I've made the docs changes too. Thanks.

I fixed the list handling error. I'm amazed it appears to run fine, and
without complaint from valgrind, here, since it was an accidentally
_deleted_ line.

Re lists, I looked at simple_list.c and it's exceedingly primitive. Using
it would mean more malloc()ing, since we'll have a list cell and then a
struct pointed to by it, and we won't recycle members, but... whatever.
It's not going to matter a great deal. The reason I did it with an
embedded list originally was because that's how it's done for PGnotify,
but that's not exactly new code.

The bigger problem is that simple_list also uses pg_malloc, which won't
set conn->errorMessage; it'll just fprintf() and exit(1). I'm not
convinced it's appropriate to use that for libpq.

For now I've left list handling unchanged. If it's to move to a generic
list, it should probably be one that knows how to use
pg_malloc_extended(size, MCXT_ALLOC_NO_OOM) and emit its own
libpq-error-handling-aware error. I'm not sure whether that list should
use cell heads embedded in the structures it manages or pointing to them,
either.

Updated patch attached.
--
Craig Ringer   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From fad0def5570907f5c8e3b6d65d57e5e7678e7383 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Fri, 20 May 2016 12:45:18 +0800
Subject: [PATCH] Pipelining (batch) support for libpq

Allow libpq clients to avoid excessive round trips by pipelining multiple
commands into batches. A sync is only sent at the end of a batch.
Commands in a batch succeed or fail together.

Adds TAP tests for libpq at src/test/modules/test_libpq .

Includes a test program in src/test/modules/test_libpq/testlibpqbatch.c
---
 doc/src/sgml/libpq.sgml             | 478
 src/interfaces/libpq/.gitignore     |   1 +
 src/interfaces/libpq/Makefile       |   5 +
 src/interfaces/libpq/exports.txt    |   7 +
 src/interfaces/libpq/fe-connect.c   |  17 +
 src/interfaces/libpq/fe-exec.c      | 572 -
 src/interfaces/libpq/fe-protocol2.c |   6 +
 src/interfaces/libpq/fe-protocol3.c
Re: [HACKERS] Write Ahead Logging for Hash Indexes
Hi All,

Following are the steps that I have followed to verify the WAL logging of
hash indexes:

1. I used Mithun's patch to improve coverage of hash index code [1] to
verify the WAL logging of hash indexes. First I confirmed whether all the
XLOG records associated with hash indexes are covered by this patch, and
added a testcase for any hash index XLOG record that was not. I found
that one XLOG record, 'XLOG_HASH_MOVE_PAGE_CONTENTS', was not being
covered and added a small testcase for it. The patch for this is
available at [2].

2. I executed the regression test suite and found all the hash indexes
that get created as part of the regression test suite using the query
below:

SELECT t.relname index_name, t.oid
FROM pg_class t
  JOIN pg_am idx ON idx.oid = t.relam
WHERE idx.amname = 'hash';

3. Thirdly, I calculated the number of pages associated with each hash
index and compared every page of the index on the master and standby
servers using the pg_filedump tool. For example, if the number of pages
associated with 'con_hash_index' is 10, here is what I did:

On master:
----------
select pg_relation_filepath('con_hash_index');
 pg_relation_filepath
----------------------
 base/16408/16433
(1 row)

./pg_filedump -if -R 0 9 /home/edb/git-clone-postgresql/postgresql/TMP/postgres/master/base/16408/16433 > /tmp/file1

On Slave:
---------
select pg_relation_filepath('con_hash_index');
 pg_relation_filepath
----------------------
 base/16408/16433
(1 row)

./pg_filedump -if -R 0 9 /home/edb/git-clone-postgresql/postgresql/TMP/postgres/standby/base/16408/16433 > /tmp/file2

I then compared file1 and file2 using a diff tool.

Following is the list of hash indexes that got created inside the
regression database when the regression test suite was executed on the
master server:

hash_i4_index
hash_name_index
hash_txt_index
hash_f8_index
con_hash_index
hash_idx

In short, this is all I did, and I found no issues during testing. Please
let me know if you need any further details.
I would like to thank Amit for his support and guidance during the
testing phase.

[1] - https://www.postgresql.org/message-id/CAA4eK1JOBX%3DYU33631Qh-XivYXtPSALh514%2BjR8XeD7v%2BK3r_Q%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAE9k0PkNjryhSiG53mjnKFhi%2BMipJMjSa%3DYkH-UeW3bfr1HPJQ%40mail.gmail.com

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] "Some tests to cover hash_index"
Hi,

I missed attaching the patch in my previous mail. Here I attach the
patch.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Tue, Aug 23, 2016 at 11:47 AM, Ashutosh Sharma wrote:
> Hi All,
>
> I have reverified the code coverage for hash index code using the test
> file (commit-hash_coverage_test) attached with this mailing list and
> have found that some of the code in the _hash_squeezebucket() function
> flow is not being covered. For this I have added a small testcase on top
> of the 'commit hash_coverage_test' patch. I have done this mainly to
> test Amit's WAL for hash index patch [1].
>
> I have also removed the warning message that we used to get for hash
> indexes, 'WARNING: hash indexes are not WAL-logged and their use is
> discouraged', as this message is no longer emitted for hash indexes
> after the WAL patch for hash indexes. Please have a look and let me know
> your thoughts.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1JOBX%3DYU33631Qh-XivYXtPSALh514%2BjR8XeD7v%2BK3r_Q%40mail.gmail.com
>
> With Regards,
> Ashutosh Sharma
> EnterpriseDB: http://www.enterprisedb.com
>
> On Sat, Aug 6, 2016 at 9:41 AM, Amit Kapila wrote:
>
>> On Thu, Aug 4, 2016 at 7:24 PM, Mithun Cy wrote:
>>
>>> I am attaching the patch to improve some coverage of hash index code
>>> [1]. I have added some basic tests, which mainly cover overflow
>>> pages. It took 2 sec extra time on my machine in the parallel
>>> schedule.
>>>
>>>                                            Hit    Total   Coverage
>>> old tests    Line Coverage                 780    1478    52.7
>>>              Function Coverage              63      85    74.1
>>> improvement  Line Coverage                1181    1478    79.9 %
>>> after tests  Function Coverage              78      85    91.8 %
>>
>> I think the code coverage improvement for hash index with these tests
>> seems to be quite good; however, the time for the tests seems to be
>> slightly on the higher side. Does anybody have better suggestions for
>> these tests?
>> diff --git a/src/test/regress/sql/concurrent_hash_index.sql
>> b/src/test/regress/sql/concurrent_hash_index.sql
>>
>> I wonder why you have included a new file for these tests; why can't
>> these be added to the existing hash_index.sql?
>>
>> +--
>> +-- Cause some overflow insert and splits.
>> +--
>> +CREATE TABLE con_hash_index_table (keycol INT);
>> +CREATE INDEX con_hash_index on con_hash_index_table USING HASH (keycol);
>>
>> The relation names con_hash_index* chosen in the above tests don't
>> seem to be appropriate; how about hash_split_heap* or something like
>> that?
>>
>> Register your patch in the latest CF (https://commitfest.postgresql.org/10/)
>>
>> --
>> With Regards,
>> Amit Kapila.
>> EnterpriseDB: http://www.enterprisedb.com

diff --git a/src/test/regress/expected/concurrent_hash_index.out b/src/test/regress/expected/concurrent_hash_index.out
index c3b8036..60191c0 100644
--- a/src/test/regress/expected/concurrent_hash_index.out
+++ b/src/test/regress/expected/concurrent_hash_index.out
@@ -3,7 +3,6 @@
 --
 CREATE TABLE con_hash_index_table (keycol INT);
 CREATE INDEX con_hash_index on con_hash_index_table USING HASH (keycol);
-WARNING: hash indexes are not WAL-logged and their use is discouraged
 INSERT INTO con_hash_index_table VALUES (1);
 INSERT INTO con_hash_index_table SELECT * from con_hash_index_table;
 INSERT INTO con_hash_index_table SELECT * from con_hash_index_table;
@@ -75,5 +74,15 @@ DROP TABLE hash_ovfl_temp_heap CASCADE;
 CREATE TABLE hash_ovfl_heap_float4 (x float4, y int);
 INSERT INTO hash_ovfl_heap_float4 VALUES (1.1,1);
 CREATE INDEX hash_idx ON hash_ovfl_heap_float4 USING hash (x);
-WARNING: hash indexes are not WAL-logged and their use is discouraged
 DROP TABLE hash_ovfl_heap_float4 CASCADE;
+--
+-- Test hash index insertion with squeeze bucket (XLOG_HASH_MOVE_PAGE_CONTENTS
+-- WAL record type).
+--
+CREATE TABLE hash_split_buckets (seqno int4, random int4);
+CREATE INDEX hash_idx ON hash_split_buckets USING hash (random int4_ops)
+with (fillfactor = 10);
+INSERT INTO hash_split_buckets (seqno, random) SELECT a, a*5 FROM
+GENERATE_SERIES(1, 10) a;
+REINDEX INDEX hash_idx;
+DROP TABLE hash_split_buckets;
diff --git a/src/test/regress/sql/concurrent_hash_index.sql b/src/test/regress/sql/concurrent_hash_index.sql
index 8f930d5..d3b09d0 100644
--- a/src/test/regress/sql/concurrent_hash_index.sql
+++ b/src/test/regress/sql/concurrent_hash_index.sql
@@ -78,3 +78,15 @@ CREATE TABLE hash_ovfl_heap_float4 (x float4, y int);
 INSERT INTO hash_ovfl_heap_float4 VALUES (1.1,1);
 CREATE INDEX hash_idx ON hash_ovfl_heap_float4 USING hash (x);
 DROP TABLE hash_ovfl_heap_float4 CASCADE;
+
+--
+-- Test hash index insertion with squeeze bucket (XLOG_HASH_MOVE_PAGE_CONTENTS
+-- WAL record type).
+--
+CREATE TABLE hash_split_buckets (seqno int4, random int4);
+CREATE INDEX hash_idx ON hash_split_buckets USING hash (random int4_ops)
+with (fillfactor = 10);
+INSERT INTO hash_split_buckets (seqno, random) SELECT a, a*5 FROM
+GENERATE_SERIES(1, 10) a;
+REINDEX INDEX hash_idx;
+DROP TABLE hash_split_buckets;
Re: [HACKERS] "Some tests to cover hash_index"
Hi All,

I have reverified the code coverage for hash index code using the test
file (commit-hash_coverage_test) attached with this mailing list and have
found that some of the code in the _hash_squeezebucket() function flow is
not being covered. For this I have added a small testcase on top of the
'commit hash_coverage_test' patch. I have done this mainly to test Amit's
WAL for hash index patch [1].

I have also removed the warning message that we used to get for hash
indexes, 'WARNING: hash indexes are not WAL-logged and their use is
discouraged', as this message is no longer emitted for hash indexes after
the WAL patch for hash indexes. Please have a look and let me know your
thoughts.

[1] - https://www.postgresql.org/message-id/CAA4eK1JOBX%3DYU33631Qh-XivYXtPSALh514%2BjR8XeD7v%2BK3r_Q%40mail.gmail.com

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Sat, Aug 6, 2016 at 9:41 AM, Amit Kapila wrote:
> On Thu, Aug 4, 2016 at 7:24 PM, Mithun Cy wrote:
>
>> I am attaching the patch to improve some coverage of hash index code
>> [1]. I have added some basic tests, which mainly cover overflow pages.
>> It took 2 sec extra time on my machine in the parallel schedule.
>>
>>                                           Hit    Total   Coverage
>> old tests    Line Coverage                 780    1478    52.7
>>              Function Coverage              63      85    74.1
>> improvement  Line Coverage                1181    1478    79.9 %
>> after tests  Function Coverage              78      85    91.8 %
>
> I think the code coverage improvement for hash index with these tests
> seems to be quite good; however, the time for the tests seems to be
> slightly on the higher side. Does anybody have better suggestions for
> these tests?
>
> diff --git a/src/test/regress/sql/concurrent_hash_index.sql
> b/src/test/regress/sql/concurrent_hash_index.sql
>
> I wonder why you have included a new file for these tests; why can't
> these be added to the existing hash_index.sql?
>
> +--
> +-- Cause some overflow insert and splits.
> +--
> +CREATE TABLE con_hash_index_table (keycol INT);
> +CREATE INDEX con_hash_index on con_hash_index_table USING HASH (keycol);
>
> The relation names con_hash_index* chosen in the above tests don't seem
> to be appropriate; how about hash_split_heap* or something like that?
>
> Register your patch in the latest CF (https://commitfest.postgresql.org/10/)
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)
Thanks for committing it, Tom! Thank you all for your help. I really like
the Postgres project! If there's anything that especially needs work, let
me know; I'd love to help.

Best,
Ryan
Re: [HACKERS] Tracking wait event for latches
On Tue, Aug 23, 2016 at 12:44 AM, Robert Haas wrote:
> On Mon, Aug 22, 2016 at 9:49 AM, Michael Paquier wrote:
>
>> The reason why I chose this way is that there are a lot of them. It is
>> painful to maintain the order of the array elements in perfect mapping
>> with the list of IDs...
>
> You can use stupid macro tricks to help with that problem...

Yeah. Still, after thinking about it, I think I would just go with an
array like the lock types and be done with it. With a comment to mention
that the order should be respected, that would be enough...

--
Michael
Re: [HACKERS] WAL consistency check facility
On Tue, Aug 23, 2016 at 10:57 AM, Michael Paquier wrote:
>
> Also, what's the use case of allowing only a certain set of rmgrs to
> be checked? Wouldn't a simple on/off switch be simpler?

I think there should be a way to test the WAL for one particular resource
manager. For example, if someone develops a new index or some other heap
storage, only that particular module can be verified. Generating WAL for
all the resource managers together can also serve the purpose, but it
will be slightly more difficult to verify.

> As presented, wal_consistency_mask is also going to be quite confusing
> for users. You should not need to apply some maths to set up this
> parameter; a list of rmgr names may be more adapted if this level of
> tuning is needed.

Yeah, that can be better.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Bug in to_timestamp().
Hi Artur Zakirov,

Please see the following review comments for
"0001-to-timestamp-format-checking-v2.patch" and share your thoughts:

#1.

15 + to_timestamp('2000JUN', 'YYYY MON')

Documented as a working case, but unfortunately it does not work:

postgres=# SELECT to_timestamp('2000JUN', 'YYYY MON');
ERROR:  invalid value "---" for "MON"
DETAIL:  The given value did not match any of the allowed values for this field.

#2.

102 +         /* Previous character was a quote */
103 +         else if (in_text)
104 +         {
105 +             if (*str == '"')
106 +             {
107 +                 str++;
108 +                 in_text = false;
109 +             }
110 +             else if (*str == '\\')
111 +             {
112 +                 str++;
113 +                 in_backslash = true;
114 +             }
115 +             else
116 +             {
117 +                 n->type = NODE_TYPE_CHAR;
118 +                 n->character = *str;
119 +                 n->key = NULL;
120 +                 n->suffix = 0;
121 +                 n++;
122 +                 str++;
123 +             }
124 +             continue;
125 +         }
126 +

The NODE_TYPE_CHAR assumption in the else block seems incorrect. What if
we have a space after the closing double quote? It will again produce
incorrect output without any error; see below:

postgres=# SELECT to_timestamp('Year: 1976, Month: May, Day: 16',
postgres(#        '"Year:" YYYY, "Month:" FMMonth, "Day:" DD');
         to_timestamp
------------------------------
 0006-05-16 00:00:00-07:52:58
(1 row)

I guess we might need NODE_TYPE_SEPARATOR and NODE_TYPE_SPACE checks as
well?

#3.

296 -     /* Ignore spaces before fields when not in FX (fixed width) mode */
297 +     /* Ignore spaces before fields when not in FX (fixed * width) mode */
298       if (!fx_mode && n->key->id != DCH_FX)
299       {
300 -         while (*s != '\0' && isspace((unsigned char) *s))
301 +         while (*s != '\0' && (isspace((unsigned char) *s)))
302               s++;

Unnecessary hunk? We should not touch any code unless it is necessary to
implement the proposed feature; otherwise it adds an unnecessary diff to
the patch and, eventually, an extra burden on reviewing the code.
Similarly, the hunk in the patch at lines 313-410 has nothing to do with
the to_timestamp behaviour improvement, IIUC. If you think these changes
need to be in, please submit a separate cleanup patch.
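To make comment #2 concrete, here is a hedged, standalone sketch (editor's illustration, not the patch's actual code) of a format-string scanner that keeps quoted literals as character nodes while classifying unquoted spaces and separators as their own node types — the distinction the patch's else branch misses:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Node kinds for a to_timestamp-style format string (names invented). */
typedef enum NodeType
{
	NODE_CHAR,					/* literal character from a quoted section */
	NODE_SPACE,					/* whitespace outside quotes */
	NODE_SEPARATOR,				/* punctuation outside quotes */
	NODE_FORMAT					/* anything else: part of a format keyword */
} NodeType;

/* Classify one unquoted character; quoted characters are always NODE_CHAR. */
static NodeType
classify(char c)
{
	if (isspace((unsigned char) c))
		return NODE_SPACE;
	if (strchr(",-./;:", c) != NULL)
		return NODE_SEPARATOR;
	return NODE_FORMAT;
}

/*
 * Scan a format string, tracking whether we are inside a double-quoted
 * section.  The point: a space after a closing quote must come out as
 * NODE_SPACE rather than NODE_CHAR, or matching against the input value
 * silently drifts, as in the 0006-05-16 example above.  Returns the
 * number of nodes written to out[].
 */
static int
scan_format(const char *fmt, NodeType *out, int maxn)
{
	bool		in_text = false;
	int			n = 0;

	for (const char *p = fmt; *p != '\0' && n < maxn; p++)
	{
		if (in_text)
		{
			if (*p == '"')
				in_text = false;
			else
				out[n++] = NODE_CHAR;
		}
		else if (*p == '"')
			in_text = true;
		else
			out[n++] = classify(*p);
	}
	return n;
}
```

With this classification, the parser can require a space or separator in the input wherever the format has one, instead of treating it as just another literal character to consume.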
> I attached second patch "0002-to-timestamp-validation-v2.patch". With it
> PostgreSQL performs additional checks for date and time. But as I wrote,
> there is another patch in the thread "to_date_valid()" which differs
> from this patch.

@community: I am not sure what to do with this patch; should we keep it
as a separate enhancement?

Regards,
Amul Sul
Re: [HACKERS] WAL consistency check facility
On Tue, Aug 23, 2016 at 1:32 PM, Amit Kapila wrote:
> On Mon, Aug 22, 2016 at 9:16 PM, Robert Haas wrote:
>> On Mon, Aug 22, 2016 at 9:25 AM, Michael Paquier wrote:
>>> Another pain point is: given a certain page, how do we identify which
>>> type it is? One possibility would be again to extend the AM handler
>>> with some kind of is_self function with a prototype like this:
>>> bool handler->is_self(Page);
>>> If the page is of the type of the handler, this returns true, and
>>> false otherwise. Still, here performance would suck.
>>>
>>> At the end, what we want is a clean interface, and more thought put
>>> into it.
>>
>> I think that it makes sense to filter based on the resource manager
>> ID
>
> +1.

Yes, actually that's better. That's simple enough and removes any need
for looking at pd_special.

> I think the patch currently addresses only a subset of resource
> manager IDs (mainly the heap and index resource manager IDs). Do we
> want to handle the complete resource manager list as defined in
> rmgrlist.h?

Not all of them generate FPWs. I don't think it matters much.

> Another thing that needs some thought is the UI of this patch;
> currently it is using an integer mask, which might not be the best way,
> but again, as it is intended to be mainly used for tests, it might be
> okay.

What we'd want to have is a way to visualize differences between pages
easily. Any ideas other than MASK_MARKER would be welcome, of course.

> Do we want to enable some tests in the regression suite by using this
> option?

We could get out a recovery test that sets up a standby/master pair and
runs the tests of src/test/regress with pg_regress with this parameter
enabled.

+ * bufmask.c
+ *	  Routines for buffer masking, used to ensure that buffers used for
+ *	  comparison across nodes are in a consistent state.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California

Copyright notices need to be updated.
(It's already been 2 years!!)

Also, what's the use case of allowing only a certain set of rmgrs to be
checked? Wouldn't a simple on/off switch be simpler? As presented,
wal_consistency_mask is also going to be quite confusing for users. You
should not need to apply some maths to set up this parameter; a list of
rmgr names may be more adapted if this level of tuning is needed. Still,
it seems to me that we don't need this much.

--
Michael
Re: [HACKERS] WAL consistency check facility
Yes, I've verified the outputs and log contents after running gmake
installcheck and gmake installcheck-world. The status was marked as pass
for all the testcases.

On Mon, Aug 22, 2016 at 9:26 PM, Simon Riggs wrote:
> On 22 August 2016 at 13:44, Kuntal Ghosh wrote:
>
>> Please let me know your thoughts on this.
>
> Do the regression tests pass with this option enabled?
>
> --
> Simon Riggs   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Thanks & Regards,
Kuntal Ghosh
Re: [HACKERS] WAL consistency check facility
On Mon, Aug 22, 2016 at 9:16 PM, Robert Haas wrote:
> On Mon, Aug 22, 2016 at 9:25 AM, Michael Paquier wrote:
>
>> Another pain point is: given a certain page, how do we identify which
>> type it is? One possibility would be again to extend the AM handler
>> with some kind of is_self function with a prototype like this:
>> bool handler->is_self(Page);
>> If the page is of the type of the handler, this returns true, and
>> false otherwise. Still, here performance would suck.
>>
>> At the end, what we want is a clean interface, and more thought put
>> into it.
>
> I think that it makes sense to filter based on the resource manager
> ID

+1. I think the patch currently addresses only a subset of resource
manager IDs (mainly the heap and index resource manager IDs). Do we want
to handle the complete resource manager list as defined in rmgrlist.h?

Another thing that needs some thought is the UI of this patch; currently
it is using an integer mask, which might not be the best way, but again,
as it is intended to be mainly used for tests, it might be okay. Do we
want to enable some tests in the regression suite by using this option?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Tue, Aug 23, 2016 at 8:54 AM, Amit Kapila wrote:
> $SUBJECT will make hash indexes reliable and usable on standbys.
> AFAIU, currently hash indexes are not recommended for use in production
> mainly because they are not crash-safe, and with this patch I hope we
> can address that limitation and recommend them for use in production.
>
> This patch is built on my earlier patch [1] making hash indexes
> concurrent. The main reason for doing so is that the earlier patch
> allows the split operation to be completed and uses light-weight
> locking, due to which operations can be logged at a granular level.
>
> WAL for different operations:
>
> This has been explained in the README as well, but I am writing it here
> again for people's convenience.
>
> ..
>
> One of the challenges in writing this patch was that the current code
> was not written with the mindset that we would need to write WAL for
> different operations. A typical example is _hash_addovflpage(), where
> pages are modified across different function calls and all
> modifications need to be done atomically, so I had to refactor some
> code so that the operations can be logged sensibly.

This patch has not done handling for OldSnapshot. Previously, we haven't
done TestForOldSnapshot() checks in hash indexes as they were not logged,
but now with this patch it makes sense to perform such checks.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Forbid use of LF and CR characters in database and role names
Peter Geoghegan writes:
> On Mon, Aug 22, 2016 at 6:28 PM, Michael Paquier wrote:
>> There is no need to put restrictions on those I think, and they are
>> actually supported.

> Bi-directional text support (i.e., the use of right-to-left control
> characters) is known to have security implications, FWIW. There is an
> interesting discussion of the matter here:
> http://www.unicode.org/reports/tr36/#Bidirectional_Text_Spoofing

The problem with implementing anything like that is that it requires
assumptions about what encoding we're dealing with, which would be
entirely not based in fact. (The DB encoding is not a good guide to what
global names are encoded as, much less what encoding some shell might
think it's using.)

			regards, tom lane
Re: [HACKERS] Forbid use of LF and CR characters in database and role names
On Mon, Aug 22, 2016 at 6:28 PM, Michael Paquier wrote:
> There is no need to put restrictions on those I think, and they are
> actually supported.

Bi-directional text support (i.e., the use of right-to-left control
characters) is known to have security implications, FWIW. There is an
interesting discussion of the matter here:
http://www.unicode.org/reports/tr36/#Bidirectional_Text_Spoofing

--
Peter Geoghegan
Re: [HACKERS] Exporting more function in libpq
Alvaro Herrera writes:
> Craig Ringer wrote:
>> Shouldn't that generally be done by extending libpq to add the
>> required functionality?

> The thought that came to me was that maybe we need a separate library
> that handles the lower level operations (a "fe/be" library, if you
> will) which can be exported for others to use and is used by libpq to
> implement the slightly-higher-level functionality.

If you wanted a library that exposed something close to the wire-level
protocol, I do not think that tearing out some of the oldest and
cruftiest parts of libpq and exposing them verbatim is really the best
way to go about it.

			regards, tom lane
Re: [HACKERS] Forbid use of LF and CR characters in database and role names
On Tue, Aug 23, 2016 at 10:19 AM, Peter Geoghegan wrote:
> I haven't looked at the patch, but offhand I wonder if it's worth
> considering control characters added by Unicode, if you haven't
> already.

There is no need to put restrictions on those I think, and they are
actually supported. Look for example at pg_upgrade/test.sh; we are
already testing them with database names :) Not BEL of course, but that
works.

--
Michael
Re: [HACKERS] Forbid use of LF and CR characters in database and role names
I haven't looked at the patch, but offhand I wonder if it's worth
considering control characters added by Unicode, if you haven't already.

--
Peter Geoghegan
Re: [HACKERS] Forbid use of LF and CR characters in database and role names
On Fri, Aug 12, 2016 at 10:12 AM, Michael Paquier wrote:
> Note that pg_dump[all] and pg_upgrade already have safeguards against
> those things per the same routines putting quotes for execution as
> commands into psql and shell. So attached is a patch to implement this
> restriction in the backend, and I am adding that to the next CF for
> 10.0. Attached is as well a script able to trigger those errors.
> Thoughts?

I am re-sending the patch. For a reason escaping me, it is showing up as
'invalid/octet-stream'... (Thanks Bruce for noting that)

--
Michael

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index c1c0223..5746958 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -77,6 +77,7 @@ typedef struct
 } movedb_failure_params;

 /* non-export function prototypes */
+static void check_db_name(const char *dbname);
 static void createdb_failure_callback(int code, Datum arg);
 static void movedb(const char *dbname, const char *tblspcname);
 static void movedb_failure_callback(int code, Datum arg);
@@ -456,6 +457,9 @@ createdb(const CreatedbStmt *stmt)
 		/* Note there is no additional permission check in this path */
 	}

+	/* do sanity checks on database name */
+	check_db_name(dbname);
+
 	/*
 	 * Check for db name conflict. This is just to give a more friendly error
 	 * message than "unique index violation". There's a race condition but
@@ -745,6 +749,22 @@ check_encoding_locale_matches(int encoding, const char *collate, const char *cty
 				pg_encoding_to_char(collate_encoding)));
 }

+/*
+ * Perform sanity checks on the database name.
+ */
+static void
+check_db_name(const char *dbname)
+{
+	if (strchr(dbname, '\n') != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("database name cannot use LF character")));
+	if (strchr(dbname, '\r') != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("database name cannot use CR character")));
+}
+
 /* Error cleanup callback for createdb */
 static void
 createdb_failure_callback(int code, Datum arg)
@@ -949,6 +969,9 @@ RenameDatabase(const char *oldname, const char *newname)
 	int			npreparedxacts;
 	ObjectAddress address;

+	/* check format of new name */
+	check_db_name(newname);
+
 	/*
 	 * Look up the target database's OID, and get exclusive lock on it. We
 	 * need this for the same reasons as DROP DATABASE.
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index b6ea950..8954e16 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -57,6 +57,21 @@ static void DelRoleMems(const char *rolename, Oid roleid, bool admin_opt);

+/* Do sanity checks on role name */
+static void
+check_role_name(const char *rolname)
+{
+	if (strchr(rolname, '\n') != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("role name cannot use LF character")));
+	if (strchr(rolname, '\r') != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("role name cannot use CR character")));
+}
+
+
 /* Check if current user has createrole privileges */
 static bool
 have_createrole_privilege(void)
@@ -111,6 +126,9 @@ CreateRole(CreateRoleStmt *stmt)
 	DefElem    *dvalidUntil = NULL;
 	DefElem    *dbypassRLS = NULL;

+	/* sanity check for role name */
+	check_role_name(stmt->role);
+
 	/* The defaults can vary depending on the original statement type */
 	switch (stmt->stmt_type)
 	{
@@ -1137,6 +1155,9 @@ RenameRole(const char *oldname, const char *newname)
 	ObjectAddress address;
 	Form_pg_authid authform;

+	/* sanity check for role name */
+	check_role_name(newname);
+
 	rel = heap_open(AuthIdRelationId, RowExclusiveLock);
 	dsc = RelationGetDescr(rel);
[HACKERS] Better locale-specific-character-class handling for regexps
I got tired of hearing complaints about the issue described in this thread:
https://www.postgresql.org/message-id/flat/24241.1329347196%40sss.pgh.pa.us

Here's a proposed fix. I've not done extensive performance testing, but it
seems to be as fast or faster than the old code in cases where there are not
too many "large" characters in the input. And, more to the point, it gets
the right answer for such large characters.

I'll add this to the upcoming commitfest.

			regards, tom lane

diff --git a/src/backend/regex/README b/src/backend/regex/README
index 6c9f483..b4a7ad7 100644
*** a/src/backend/regex/README
--- b/src/backend/regex/README
*************** and similarly additional source files re
*** 27,39 ****
  regexec. This was done to avoid exposing internal symbols globally;
  all functions not meant to be part of the library API are static.
  
! (Actually the above is a lie in one respect: there is one more global
! symbol, pg_set_regex_collation in regcomp. It is not meant to be part of
! the API, but it has to be global because both regcomp and regexec call it.
! It'd be better to get rid of that, as well as the static variables it
! sets, in favor of keeping the needed locale state in the regex structs.
! We have not done this yet for lack of a design for how to add
! application-specific state to the structs.)
  
  What's where in src/backend/regex/:
--- 27,40 ----
  regexec. This was done to avoid exposing internal symbols globally;
  all functions not meant to be part of the library API are static.
  
! (Actually the above is a lie in one respect: there are two more global
! symbols, pg_set_regex_collation and pg_reg_getcolor in regcomp. These are
! not meant to be part of the API, but they have to be global because both
! regcomp and regexec call them. It'd be better to get rid of
! pg_set_regex_collation, as well as the static variables it sets, in favor of
! keeping the needed locale state in the regex structs. We have not done this
! yet for lack of a design for how to add application-specific state to the
! structs.)
  
  What's where in src/backend/regex/:
*************** colors:
*** 274,301 ****
  an existing color has to be subdivided.
  
  The last two of these are handled with the "struct colordesc" array and
! the "colorchain" links in NFA arc structs. The color map proper (that
! is, the per-character lookup array) is handled as a multi-level tree,
! with each tree level indexed by one byte of a character's value. The
! code arranges to not have more than one copy of bottom-level tree pages
! that are all-the-same-color.
! 
! Unfortunately, this design does not seem terribly efficient for common
! cases such as a tree in which all Unicode letters are colored the same,
! because there aren't that many places where we get a whole page all the
! same color, except at the end of the map. (It also strikes me that given
! PG's current restrictions on the range of Unicode values, we could use a
! 3-level rather than 4-level tree; but there's not provision for that in
! regguts.h at the moment.)
! 
! A bigger problem is that it just doesn't seem very reasonable to have to
! consider each Unicode letter separately at regex parse time for a regex
! such as "\w"; more than likely, a huge percentage of those codes will
! never be seen at runtime. We need to fix things so that locale-based
! character classes are somehow processed "symbolically" without making a
! full expansion of their contents at parse time. This would mean that we'd
! have to be ready to call iswalpha() at runtime, but if that only happens
! for high-code-value characters, it shouldn't be a big performance hit.
  
  Detailed semantics of an NFA
--- 275,330 ----
  an existing color has to be subdivided.
  
  The last two of these are handled with the "struct colordesc" array and
! the "colorchain" links in NFA arc structs.
! 
! Ideally, we'd do the first two operations using a simple linear array
! storing the current color assignment for each character code.
! Unfortunately, that's not terribly workable for large charsets such as
! Unicode. Our solution is to divide the color map into two parts. A simple
! linear array is used for character codes up to MAX_SIMPLE_CHR, which can be
! chosen large enough to include all popular characters (so that the
! significantly-slower code paths about to be described are seldom invoked).
! Characters above that need be considered at compile time only if they
! appear explicitly in the regex pattern. We store each such mentioned
! character or character range as an entry in the "colormaprange" array in
! the colormap. (Overlapping ranges are split into unique subranges, so that
! each range in the finished list needs only a single color that describes
! all its characters.) When mapping a character above MAX_SIMPLE_CHR to a
! color at runtime, we search this list of ranges explicitly.
! 
! That's still not quite enough, though, because of lo
Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
On 23 August 2016 at 01:03, Robert Haas wrote:
>
> I think you should use underscores to separate all of the words
> instead of only some of them.
>
> ifassigned => if_assigned    ifrecent => if_recent

Updated patch series attached. As before, 0-4 intended for commit, 5 just
because it'll be handy to have around for people doing wraparound-related
testing.

Again, thanks for taking a look.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

From 81cbe525261a15a21415af361b3421038eccc895 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Fri, 19 Aug 2016 14:44:15 +0800
Subject: [PATCH 1/4] Introduce txid_status(bigint) to get status of an xact

If an application is disconnected while a COMMIT request is in flight, or
the backend crashes mid-commit, the application may not be sure whether the
commit completed successfully or was rolled back. While two-phase commit
solves this, it does so at a considerable overhead, so introduce a lighter
alternative.

txid_status(bigint) lets an application determine the status of a commit
based on an xid-with-epoch as returned by txid_current() or similar. Status
may be committed, aborted, in-progress (including prepared xacts) or null
if the xact is too old for its commit status to still be retained because
it has passed the wrap-around epoch boundary.

Applications must call txid_current() in their transactions to make much
use of this, since PostgreSQL does not automatically report an xid to the
client when one is assigned.
---
 doc/src/sgml/func.sgml               | 28 +++-
 src/backend/access/transam/clog.c    | 23 --
 src/backend/catalog/system_views.sql | 20 +
 src/backend/utils/adt/txid.c         | 82 ++++++++++
 src/include/access/clog.h            | 23 ++
 src/include/catalog/pg_proc.h        |  2 +
 src/test/regress/expected/txid.out   | 50 ++++++
 src/test/regress/sql/txid.sql        | 35 +++
 8 files changed, 239 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 169a385..8edf490 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17139,6 +17139,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
     txid_visible_in_snapshot
    
+   
+    txid_status
+   
 
    The functions shown in  provide server transaction information in an
    exportable form. The main
@@ -17157,7 +17161,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
        txid_current()
        bigint
-       get current transaction ID, assigning a new one if the current transaction does not have one
+       get current 64-bit transaction ID with epoch, assigning a new one if the current transaction does not have one
 
        txid_current_snapshot()
@@ -17184,6 +17188,11 @@ SELECT collation for ('foo' COLLATE "de_DE");
        boolean
        is transaction ID visible in snapshot? (do not use with subtransaction ids)
 
+      
+       txid_status(bigint)
+       txid_status
+       report the status of the given xact - committed, aborted, in-progress, or null if the xid is too old
+      
 
@@ -17254,6 +17263,23 @@ SELECT collation for ('foo' COLLATE "de_DE");
 
+   
+    txid_status(bigint) reports the commit status of a recent
+    transaction. Any recent transaction can be identified as one of
+     in-progress
+     committed
+     aborted
+    Prepared transactions are identified as in-progress.
+    The commit status of transactions older than the transaction ID wrap-around
+    threshold is no longer known by the system, so txid_status
+    returns NULL for such transactions. Applications may use
+    txid_status to determine whether a transaction committed
+    or aborted when the application and/or database server crashed or lost
+    connection while a COMMIT command was in progress.
+   
 
    The functions shown in  provide information about transactions that have
    been already committed. These functions mainly provide information about
    when the transactions

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 2634476..1a6e26d 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -41,29 +41,6 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 
-/*
- * Defines for CLOG page sizes. A page is the same BLCKSZ as is used
- * everywhere else in Postgres.
- *
- * Note: because TransactionIds are 32 bits and wrap around at 0xFFFFFFFF,
- * CLOG page numbering also wraps around at 0xFFFFFFFF/CLOG_XACTS_PER_PAGE,
- * and CLOG segment numbering at
- * 0xFFFFFFFF/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need take no
- * explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCLOG (see CLOGPagePrecedes).
- */
-
-/* We need two bits per xact, so four xacts
[HACKERS] Re: [BUGS] Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
On Wed, Mar 23, 2016 at 10:46 AM, Tom Lane wrote:
> Robert Haas writes:
>> Are you still in information-gathering mode, or are you going to issue
>> a recommendation on how we should proceed here, or what?
>
> If I had to make a recommendation right now, I would go for your
> option #4, ie shut 'em all down Scotty. We do not know the full extent
> of the problem but it looks pretty bad, and I think our first priority
> has to be to guarantee data integrity. I do not have a lot of faith in
> the proposition that glibc's is the only buggy implementation, either.

For the record, I have been able to determine by using amcheck on the
Heroku platform that en_US.UTF-8 cases are sometimes affected by an
inconsistency between strcoll() and strxfrm() behavior, which was
previously an open question. I saw only two instances of this across many
thousands of servers. For some reason, both cases involved strings with
code points from the Arabic alphabet, even though each case was from a
totally unrelated customer database.

I'll go update the Wiki page for this [1] now.

[1] https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issue
-- 
Peter Geoghegan
Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
On 23 August 2016 at 01:03, Robert Haas wrote: > I think you should use underscores to separate all of the words > instead of only some of them. > Right. Will fix. Thanks for taking a look. > Also, note that the corresponding internal function is > GetTopTransactionIdIfAny(), which might suggest txid_current_if_any() > rather than txid_current_if_assigned(), but you could argue that your > naming is better. Yeah, I do argue that in this case. Not a hugely strong opinion, but we refer to txid_current() in the docs as: "get current transaction ID, assigning a new one if the current transaction does not have one" so I thought it'd be worth being consistent with that. It's not like txid_current() mirrors the name of GetTopTransactionId() after all ;) and I think it's more important to be consistent with what the user sees than with the code. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PATCH: Batch/pipelining support for libpq
On 10 August 2016 at 14:44, Michael Paquier wrote:
> On Fri, Jun 3, 2016 at 8:51 PM, Dmitry Igrishin wrote:
>> BTW, I've published the HTML-ified SGML docs to
>> http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a preview.
>
> Typo detected: "Returns 1 if the batch curently being received" --
> "curently".
>
> I am looking a bit more seriously at this patch and assigned myself as
> a reviewer.

Much appreciated.

> testlibpqbatch.c:1239:73: warning: format specifies type 'long' but
> the argument has type '__darwin_suseconds_t' (aka 'int') [-Wformat]
> printf("batch insert elapsed: %ld.%06lds\n",
> elapsed_time.tv_sec, elapsed_time.tv_usec);
>
> macos complains here. You may want to replace %06lds by just %06d.

Yeah, or cast to a type known to be big enough. Will amend.

> This patch generates a core dump, use for example pg_ctl start -w and
> you'll bump into the trace above. There is something wrong with the
> queue handling.

Huh. I didn't see that here (Fedora 23). I'll look more closely.

> Do you have plans for a more generic structure for the command queue list?

No plans, no. This was a weekend experiment that turned into a useful
patch and I'm having to scrape up time for it amongst much more important
things like logical failover / sequence decoding and various other
replication work.

Thanks for the docs review too, will amend.

> + fprintf(stderr, "internal error, COPY in batch mode");
> + abort();
>
> I don't think that's a good idea. defaultNoticeProcessor can be
> overridden to allow applications to have error messages sent
> elsewhere. Error messages should also use libpq_gettext, and perhaps
> be stored in conn->errorMessage as we do so for OOMs happening on
> client-side and reporting them back even if they are not expected
> (those are blocked by PQsendQueryStart in your patch).
>
> src/test/examples is a good idea to show people what this new API can
> do, but this is never getting compiled.
> It could as well be possible
> to include tests in src/test/modules/, in the same shape as what
> postgres_fdw is doing by connecting to itself and link it to libpq. As
> this patch complicates fe-exec.c quite a lot, I think that this would
> be worth it. Thoughts?

I didn't think it added much complexity to fe-exec.c personally. A lot of
the appeal is that it has very minor impact on anything that isn't using
it.

I think it makes sense to (ab)use the recovery module tests for this,
invoking the test program from there.

Ideally I'd like to teach psql and pg_restore how to use async mode, but
that's a whole separate patch.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?
On Mon, Aug 15, 2016 at 10:19:12AM -0400, Tom Lane wrote:
> Andrew Dunstan writes:
> > On 08/14/2016 04:38 PM, Tom Lane wrote:
> >> I did a trial run following the current pgindent README procedure, and
> >> noticed that the perltidy step left me with a pile of '.bak' files
> >> littering the entire tree. This seems like a pretty bad idea because
> >> a naive "git add ." would have committed them. It's evidently because
> >> src/tools/pgindent/perltidyrc includes --backup-and-modify-in-place.
>
> BTW, after experimenting with this, I did not find any way to get perltidy
> to overwrite the original files without making a backup file.

Yep, that's why --backup-and-modify-in-place had to be used. I have a
local script to remove files with the specified extensions, but didn't
document that cleanup step.

-- 
 Bruce Momjian  http://momjian.us
 EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
On Tue, Aug 23, 2016 at 8:41 AM, Robert Haas wrote:
> We could test to see how much it slows things down. But it
> may be worth paying the cost even if it ends up being kinda expensive.

Here are some numbers from a Xeon E7-8830 @ 2.13GHz running Linux 3.10
running the attached program. It's fairly noisy and I didn't run super
long tests with many repeats, but the general theme is visible. If you're
actually going to USE the memory, it's only a small extra cost to have
reserved seats. But if there's a strong chance you'll never access most of
the memory, you might call it expensive.

Segment size 1MB:
  base = shm_open + ftruncate + mmap + munmap + close = 5us
  base + fallocate = 38us
  base + memset = 332us
  base + fallocate + memset = 346us

Segment size 1GB:
  base = shm_open + ftruncate + mmap + munmap + close = 10032us
  base + fallocate = 30774us
  base + memset = 602925us
  base + fallocate + memset = 655433us

-- 
Thomas Munro
http://www.enterprisedb.com

#define _GNU_SOURCE				/* for fallocate() */
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SEGMENT_NAME "/my_test_segment"

int
main(int argc, char *argv[])
{
	int			loops,
				i;
	off_t		size;
	bool		fallocatep;
	bool		memsetp;
	bool		hugep;
	void	   *mem;

	if (argc != 6)
	{
		fprintf(stderr, "Usage: %s <loops> <size_mb> <fallocate> <memset> <huge>\n",
				argv[0]);
		return EXIT_FAILURE;
	}
	loops = atoi(argv[1]);
	size = atoi(argv[2]) * 1024 * 1024;
	fallocatep = atoi(argv[3]) != 0;
	memsetp = atoi(argv[4]) != 0;
	hugep = atoi(argv[5]) != 0;

	for (i = 0; i < loops; ++i)
	{
		int			fd;

		fd = shm_open(SEGMENT_NAME, O_CREAT | O_RDWR, S_IWUSR | S_IRUSR);
		if (fd < 0)
		{
			perror("shm_open");
			goto cleanup;
		}
		if (ftruncate(fd, size))
		{
			perror("ftruncate");
			goto cleanup;
		}
		if (fallocatep && fallocate(fd, 0, 0, size))
		{
			perror("fallocate");
			goto cleanup;
		}
		mem = mmap(0, size, PROT_READ | PROT_WRITE,
				   MAP_SHARED | (hugep ? MAP_HUGETLB : 0), fd, 0);
		if (mem == MAP_FAILED)
		{
			fprintf(stderr, "mmap failed");
			goto cleanup;
		}
		if (memsetp)
			memset(mem, 0, size);
		munmap(mem, size);
		close(fd);
	}
	shm_unlink(SEGMENT_NAME);
	return EXIT_SUCCESS;

cleanup:
	shm_unlink(SEGMENT_NAME);
	return EXIT_FAILURE;
}
Re: [HACKERS] Logical decoding of sequence advances, part II
On 23 Aug 2016 05:43, "Kevin Grittner" wrote:
>
> On Mon, Aug 22, 2016 at 3:29 PM, Robert Haas wrote:
>
> > it seems to me that
> > this is just one facet of a much more general problem: given two
> > transactions T1 and T2, the order of replay must match the order of
> > commit unless you can prove that there are no dependencies between
> > them. I don't see why it matters whether the operations are sequence
> > operations or data operations; it's just a question of whether they're
> > modifying the same "stuff".

It matters because sequence operations aren't transactional in pg. Except
when they are - operations on a newly CREATEd sequence or one where we did
a TRUNCATE ... RESTART IDENTITY.

But we don't store the xid of the xact associated with a transactional
sequence update along with the sequence update anywhere. We just rely on
no other xact knowing to look at the sequence relfilenode we're changing.
Doesn't work so well in logical rep.

We also don't store knowledge of whether or not the sequence advance is
transactional. Again important, because for two xacts t1 and t2:

* Sequence last value is 50
* T1 calls nextval. Needs a new chunk because all cached values have been
  used. Writes sequence WAL advancing seq last_value to 100, returns 51.
* T2 calls nextval, gets cached value 52.
* T2 commits
* Master crashes and we fail over to replica.

This is fine for physical rep. We replay the sequence advance and all is
well.

But for logical rep the sequence can't be treated as part of t1. If t1
rolls back or we fail over before replaying it we might return value 52
from nextval even though we replayed and committed t2 that used value 52.
Oops.

However if some xact t3 creates a sequence we can't replay updates to it
until the sequence relation is committed. And it's even more fun with
TRUNCATE ... RESTART IDENTITY where we need rollback behaviour too.

Make sense?
It's hard because sequences are sometimes, but not always, exempt from
transactional behaviour, and pg doesn't record when, since it can rely on
physical WAL redo order and can apply sequence advances before the
sequence relation is committed yet.

> > The commit order is the simplest and safest *unless* there is a
> > read-write anti-dependency a/k/a read-write dependency a/k/a
> > rw-conflict: where a read from one transaction sees the "before"
> > version of data modified by the other transaction. In such a case
> > it is necessary for correct serializable transaction behavior for
> > the transaction that read the "before" image to be replayed before
> > the write it didn't see, regardless of commit order. If you're not
> > trying to avoid serialization anomalies, it is less clear to me
> > what is best.

Could you provide an example of a case where xacts replayed in commit
order will produce incorrect results?

Remember that we aren't doing statement-based replication in pg logical
decoding/replication. We don't care how a row got changed, only that we
make consistent transitions from before state to after state for each
transaction, such that the data committed and visible on the master is
visible on the standby and no uncommitted or not-yet-visible data on the
master is committed/visible on the replica. The replica should have
visible committed data matching the master as it was when it originally
executed the xact we most recently replayed.

No locking is decoded or replayed. It is not expected that a normal
non-replication client executing some other concurrent xact will have the
same effect if run on standby as on master. It's replication, not tightly
coupled clustering.

If/when we have things like parallel decoding and replay of concurrent
xacts then issues like the dependencies you mention will start to become a
concern. We are a long way from there.
For sequences the requirement IMO is that the sequence advances on the
replica to or past the position it was at on the master when the first
xact that saw those sequence values committed. We should never see the
sequence 'behind' such that calling nextval on the replica can produce a
value already seen and stored by some committed xact on the replica.
Being a bit ahead is OK, much like pg discards sequence values on crash.

That's not that hard. The problems arise when the sequence itself isn't
committed yet, per above.
Re: [HACKERS] UTF-8 docs?
> On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
>> I don't know what kind of problem you are seeing with encoding
>> handling, but at least UTF-8 is working for Japanese, French and
>> Russian.
>
> Those translations are using DocBook XML.

But in the meantime I can create UTF-8 HTML files like this:

make html
[snip]
/bin/mkdir -p html
SP_CHARSET_FIXED=1 SP_ENCODING=UTF-8 openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -c /usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t sgml -i output-html -i include-index postgres.sgml

Best regards,
-- 
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] distinct estimate of a hard-coded VALUES list
On 08/22/2016 07:42 PM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane wrote:
>>> Jeff Janes writes:
>>>> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane wrote:
>>>>> It does know it, what it doesn't know is how many duplicates there
>>>>> are.
>>>>
>>>> Does it know whether the count comes from a parsed query-string
>>>> list/array, rather than being an estimate from something else? If it
>>>> came from a join, I can see why it would be dangerous to assume they
>>>> are mostly distinct. But if someone throws 6000 things into a query
>>>> string and only 200 distinct values among them, they have no one to
>>>> blame but themselves when it makes bad choices off of that.
>>>
>>> I am not exactly sold on this assumption that applications have
>>> de-duplicated the contents of a VALUES or IN list. They haven't been
>>> asked to do that in the past, so why do you think they are doing it?
>>
>> It's hard to know, but my intuition is that most people would
>> deduplicate. I mean, nobody is going to want their query generator to
>> send X IN (1, 1, ) to the server if it could have just sent X IN (1).
>
> Also, if we patch it this way and somebody has a slow query because of a
> lot of duplicate values, it's easy to solve the problem by
> de-duplicating. But with the current code, people that have the opposite
> problem have no way to work around it.

I certainly agree it's better when a smart user can fix his query plan by
deduplicating the values than when we end up generating a poor plan due
to assuming some users are somewhat dumb.

I wonder how expensive it would be to actually count the number of
distinct values - there certainly are complex data types where the
comparisons are fairly expensive, but I would not expect those to be used
in explicit VALUES lists.

Also, maybe there's some sufficiently accurate estimation approach - e.g.
for a small number of values we can compute the number of distinct values
directly (and it's still going to be fairly cheap), while for larger
numbers we could probably sample the values similarly to what ANALYZE
does.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] pg_receivexlog does not report flush position with --synchronous
Hi guys,

while adding synchronous WAL streaming to Barman, I noticed that
pg_receivexlog - unless a replication slot is specified and --synchronous
is passed - does not become a synchronous receiver (even if its
application_name is in the synchronous_standby_names value). I was a bit
surprised by this behaviour.

By reading the pg_receivexlog documentation, I assumed that:

1) if I set application_name properly for pg_receivexlog (let's say
   'barman_receive_wal')
2) then I set synchronous_standby_names so that barman_receive_wal is
   first in the list
3) then I run pg_receivexlog with --synchronous

I would find pg_receivexlog in the 'sync' state in the pg_stat_replication
view on the master. However, I kept receiving the 'async' state.

After looking at the documentation once more, I noticed that
'--synchronous' was mentioned also in the '--slot-name' option but the
explanation - at least to me - was not very clear. I tried again by
creating a replication slot and passing it to pg_receivexlog, and this
time I could see 'sync' as the streaming state.

Looking at the code in more detail I see that, unless replication slots
are enabled, pg_receivexlog does not report the flush position (this is a
precaution that was taken when '--synchronous' was probably not around).

Please find this very short patch which - in case replication slots are
not present but --synchronous is - reports the flush position. I am not
sure if it is a bug or not. In any case I guess we should probably improve
the documentation - it's a bit late here so maybe I can try better
tomorrow with a fresher mind.
:)

Thanks,
Gabriele

-- 
 Gabriele Bartolini - 2ndQuadrant Italia - Director
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it

0001-pg_receivexlog-does-not-report-flush-position-with-s.patch
Description: Binary data
Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
On Tue, Aug 23, 2016 at 8:41 AM, Robert Haas wrote:
> On Tue, Aug 16, 2016 at 7:41 PM, Thomas Munro wrote:
>> I still think it's worth thinking about something along these lines on
>> Linux only, where holey Swiss tmpfs files can bite you. Otherwise
>> disabling overcommit on your OS isn't enough to prevent something
>> which is really a kind of deferred overcommit with a surprising
>> failure mode (SIGBUS rather than OOM SIGKILL).
>
> Yeah, I am inclined to agree. I mean, creating a DSM is fairly
> heavyweight already, so one extra system call isn't (I hope) a crazy
> overhead. We could test to see how much it slows things down. But it
> may be worth paying the cost even if it ends up being kinda expensive.
> We don't really have any way of knowing whether the caller's request
> is reasonable relative to the amount of virtual memory available, and
> converting a possible SIGBUS into an ereport(ERROR, ...) is a big win.

Here's a version of the patch that only does something special if the
following planets are aligned:

* Linux only: for now, there doesn't seem to be any reason to assume that
  other operating systems share this file-with-holes implementation quirk,
  or that posix_fallocate would work on such a fd, or which errno values
  to tolerate if it doesn't. From what I can tell, Solaris, FreeBSD etc
  either don't overcommit or do normal non-stealth overcommit with the
  usual out-of-swap failure mode for shm_open memory, with a way to turn
  overcommit off. So I put a preprocessor test in to do this just for
  __linux__, and I used "fallocate" (a non-standard Linux syscall) instead
  of "posix_fallocate".

* Glibc version >= 2.10: ancient versions and other libc implementations
  don't have fallocate, so I put a test into the configure script.

* Kernel version >= 2.6.23: the man page says that ancient kernels don't
  provide the syscall, and that glibc sets errno to ENOSYS in that case,
  so I put a check in to keep calm and carry on.
I don't know if any distros ever shipped with an old enough kernel and new
enough glibc for ENOSYS to happen in the wild; for example RHEL5 had
neither kernel nor glibc support, and RHEL6 had both. I haven't personally
tested that path.

Maybe it would be worth thinking about whether this is a condition that
should cause dsm_create to return NULL rather than ereporting, depending
on a flag along the lines of the existing DSM_CREATE_NULL_IF_MAXSEGMENTS.
But that could be a separate patch if it turns out to be useful.

-- 
Thomas Munro
http://www.enterprisedb.com

fallocate.patch
Description: Binary data
Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
On 23/08/16 09:40, Andres Freund wrote:

Hi,

as noted in [1] I started hacking on removing the current implementation
of SRFs in the targetlist (tSRFs henceforth). IM discussion brought the
need for a description of the problem, need and approach to light.

There are several reasons for wanting to get rid of tSRFs. The primary
ones in my opinion are that the current behaviour of several SRFs in one
targetlist is confusing, and that the implementation burden currently is
all over the executor. Especially the latter is what is motivating me
working on this, because it blocks my work on making the executor faster
for queries involving significant amounts of tuples. Batching is hard if
random places in the querytree can increase the number of tuples.

The basic idea, hinted at in several threads, is, at plan time, to convert
a query like

SELECT generate_series(1, 10);

into

SELECT generate_series FROM ROWS FROM(generate_series(1, 10));

thereby avoiding the complications in the executor (c.f. execQual.c
handling of isDone/ExprMultipleResult and supporting code in many executor
nodes / node->*.ps.ps_TupFromTlist).

There are several design questions along the way:

1) How to deal with the least-common-multiple behaviour of tSRFs. E.g.

=# SELECT generate_series(1, 3), generate_series(1,2);

returning

┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               1 │
│               2 │               2 │
│               3 │               1 │
│               1 │               2 │
│               2 │               1 │
│               3 │               2 │
└─────────────────┴─────────────────┘
(6 rows)

but

=# SELECT generate_series(1, 3), generate_series(5,7);

returning

┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               5 │
│               2 │               6 │
│               3 │               7 │
└─────────────────┴─────────────────┘

discussion in this thread came, according to my reading, to the conclusion
that that behaviour is just confusing and that the ROWS FROM behaviour of

=# SELECT * FROM ROWS FROM(generate_series(1, 3), generate_series(1,2));
┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               1 │
│               2 │               2 │
│               3 │          (null) │
└─────────────────┴─────────────────┘
(3 rows)

makes more sense.
I had always implicitly assumed that having 2 generated sequences would act as equivalent to:

SELECT sa, sb FROM ROWS FROM(generate_series(1, 3)) AS sa, ROWS FROM(generate_series(5, 7)) AS sb ORDER BY sa, sb;

 sa | sb
----+----
  1 |  5
  1 |  6
  1 |  7
  2 |  5
  2 |  6
  2 |  7
  3 |  5
  3 |  6
  3 |  7

Obviously I was wrong - but to me, my implicit assumption makes more sense! [...] Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Hi, On 2016-08-22 16:20:58 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2016-08-17 17:41:28 -0700, Andres Freund wrote: > >> Tom, do you think this is roughly going in the right direction? > > I've not had time to look at this patch, I'm afraid. If you still > want me to, I can make time in a day or so. That'd be greatly appreciated. I think polishing the POC up to a committable patch will be a considerable amount of work, and I'd like design feedback before that. > > I'm working on these. Atm ExecMakeTableFunctionResult() resides in > > execQual.c - I'm inlining it into nodeFunctionscan.c now, because > > there's no other callers, and having it separate seems to bring no > > benefit. > > > Please speak up soon if you disagree. > > I think ExecMakeTableFunctionResult was placed in execQual.c because > it seemed to belong there alongside the support for SRFs in tlists. > If that's going away then there's no good reason not to move the logic > to where it's used. Cool, then we agree. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Mon, Aug 1, 2016 at 3:18 PM, Peter Geoghegan wrote: > Attached WIP patch series: This has bitrot, since commit da1c9163 changed the interface for checking parallel safety. I'll have to fix that, and will probably take the opportunity to change how workers have maintenance_work_mem apportioned while I'm at it. To recap, it would probably be better if maintenance_work_mem remained a high watermark for the entire CREATE INDEX, rather than applying as a per-worker allowance. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
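The apportioning change Peter describes can be stated concretely. This is only an illustrative sketch (the actual patch divides memory in C inside tuplesort; the function name here is hypothetical): treating maintenance_work_mem as a high watermark means the per-worker budgets sum to the limit, instead of each worker receiving the full allowance.

```python
def per_worker_budget_kb(maintenance_work_mem_kb, nworkers):
    # High-watermark interpretation: the whole CREATE INDEX, summed
    # across all workers, stays within maintenance_work_mem.
    return maintenance_work_mem_kb // nworkers

# A per-worker-allowance interpretation would instead let total usage
# reach nworkers * maintenance_work_mem_kb.
```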
Re: [HACKERS] Logical decoding of sequence advances, part II
On Mon, Aug 22, 2016 at 3:29 PM, Robert Haas wrote: > it seems to me that > this is just one facet of a much more general problem: given two > transactions T1 and T2, the order of replay must match the order of > commit unless you can prove that there are no dependencies between > them. I don't see why it matters whether the operations are sequence > operations or data operations; it's just a question of whether they're > modifying the same "stuff". The commit order is the simplest and safest *unless* there is a read-write anti-dependency a/k/a read-write dependency a/k/a rw-conflict: where a read from one transaction sees the "before" version of data modified by the other transaction. In such a case it is necessary for correct serializable transaction behavior for the transaction that read the "before" image to be replayed before the write it didn't see, regardless of commit order. If you're not trying to avoid serialization anomalies, it is less clear to me what is best. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
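The constraint Kevin describes can be sketched as a toy model (hypothetical names, nothing from the actual replication code): replay in commit order, except that a transaction which read the "before" image must be replayed ahead of the writer whose change it did not see.

```python
def replay_order(commit_order, rw_conflicts):
    # commit_order: transaction ids in commit order.
    # rw_conflicts: (reader, writer) pairs where the reader saw the
    # "before" image of data the writer modified; the reader must then
    # replay before the writer, overriding commit order.
    # Assumes the conflict graph is acyclic (a cycle would be a
    # serialization anomaly that serializable mode aborts anyway).
    order = list(commit_order)
    changed = True
    while changed:
        changed = False
        for reader, writer in rw_conflicts:
            ri, wi = order.index(reader), order.index(writer)
            if ri > wi:
                order.pop(ri)
                order.insert(wi, reader)
                changed = True
    return order
```

With no rw-conflicts this degenerates to plain commit order, which matches Kevin's point that commit order is the simplest and safest default.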
[HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
Hi, as noted in [1] I started hacking on removing the current implementation of SRFs in the targetlist (tSRFs henceforth). IM discussion brought the need for a description of the problem, need and approach to light. There are several reasons for wanting to get rid of tSRFs. The primary ones in my opinion are that the current behaviour of several SRFs in one targetlist is confusing, and that the implementation burden currently is all over the executor. Especially the latter is what is motivating me to work on this, because it blocks my work on making the executor faster for queries involving significant amounts of tuples. Batching is hard if random places in the querytree can increase the number of tuples. The basic idea, hinted at in several threads, is, at plan time, to convert a query like

SELECT generate_series(1, 10);

into

SELECT generate_series FROM ROWS FROM(generate_series(1, 10));

thereby avoiding the complications in the executor (c.f. execQual.c handling of isDone/ExprMultipleResult and supporting code in many executor nodes / node->*.ps.ps_TupFromTlist). There are several design questions along the way:

1) How to deal with the least-common-multiple behaviour of tSRFs. E.g.

=# SELECT generate_series(1, 3), generate_series(1,2);

returning

┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               1 │
│               2 │               2 │
│               3 │               1 │
│               1 │               2 │
│               2 │               1 │
│               3 │               2 │
└─────────────────┴─────────────────┘
(6 rows)

but

=# SELECT generate_series(1, 3), generate_series(5,7);

returning

┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               5 │
│               2 │               6 │
│               3 │               7 │
└─────────────────┴─────────────────┘

discussion in this thread came, according to my reading, to the conclusion that that behaviour is just confusing and that the ROWS FROM behaviour of

=# SELECT * FROM ROWS FROM(generate_series(1, 3), generate_series(1,2));

┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               1 │
│               2 │               2 │
│               3 │          (null) │
└─────────────────┴─────────────────┘
(3 rows)

makes more sense. 
We also discussed erroring out if two SRFs return differing numbers of rows, but that does not seem to be preferred so far, and we can easily add it if we want.

2) A naive conversion to ROWS FROM, like in the example in the introductory paragraph, can change the output when implemented as a join from ROWS FROM to the rest of the query, rather than the other way round. E.g.

=# EXPLAIN SELECT * FROM few, ROWS FROM(generate_series(1,10));
┌──────────────────────────────────────────────────────────────────────────────┐
│                                  QUERY PLAN                                  │
├──────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=0.00..36.03 rows=2000 width=8)                            │
│   ->  Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=4) │
│   ->  Materialize  (cost=0.00..1.03 rows=2 width=4)                          │
│         ->  Seq Scan on few  (cost=0.00..1.02 rows=2 width=4)                │
└──────────────────────────────────────────────────────────────────────────────┘
(4 rows)

=# SELECT * FROM few, ROWS FROM(generate_series(1,3));
┌────┬─────────────────┐
│ id │ generate_series │
├────┼─────────────────┤
│  1 │               1 │
│  2 │               1 │
│  1 │               2 │
│  2 │               2 │
│  1 │               3 │
│  2 │               3 │
└────┴─────────────────┘
(6 rows)

surely isn't what was intended. So the join order needs to be enforced.

3) tSRFs are evaluated after GROUP BY and window functions:

=# SELECT generate_series(1, count(*)) FROM (VALUES(1),(2),(10)) f;
┌─────────────────┐
│ generate_series │
├─────────────────┤
│               1 │
│               2 │
│               3 │
└─────────────────┘

which means we have to push the "original" query into a subquery, with the ROWS FROM laterally referencing the subquery:

SELECT generate_series FROM (SELECT count(*) FROM (VALUES(1),(2),(10)) f) s, ROWS FROM (generate_series(1,s.count));

4) The evaluation order of tSRFs in combination with ORDER BY is a bit confusing. Namely, tSRFs are evaluated after ORDER BY has been evaluated, unless the ORDER BY references the SRF. E.g.

=# SELECT few.id, generate_series FROM ROWS FROM(generate_series(1,3)),few ORDER BY few.id DESC;

might return

┌────┬─────────────────┐
│ id │ generate_series │
├────┼─────────────────┤
│ 24 │               3 │
│ 24 │               2 │
│ 24 │               1 │
..

instead of ┌┬
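The two behaviours contrasted in point 1 can be modelled in a few lines of Python (a hypothetical sketch, not PostgreSQL code): historical tSRF evaluation runs to the least common multiple of the SRF output lengths, cycling each column independently, while ROWS FROM runs the functions in lockstep and NULL-pads the shorter ones.

```python
from itertools import zip_longest
from math import lcm

def tsrf_rows(*cols):
    # Historical targetlist-SRF behaviour: produce lcm(len(c1), len(c2), ...)
    # rows, cycling each column independently.
    n = lcm(*(len(c) for c in cols))
    return [tuple(c[i % len(c)] for c in cols) for i in range(n)]

def rows_from(*cols):
    # ROWS FROM(...) behaviour: advance all functions in lockstep and
    # pad exhausted ones with NULL (None here).
    return list(zip_longest(*cols))

# Matching the tables in the mail: equal lengths give 3 rows either
# way, but lengths 3 and 2 give 6 rows under tSRF rules and only 3
# (with a NULL) under ROWS FROM.
```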
Re: [HACKERS] pg_bsd_indent - improvements around offsetof and sizeof
Bruce Momjian wrote: > On Tue, Aug 16, 2016 at 11:47:09AM -0700, Andres Freund wrote: > > On 2016-08-15 18:09:02 +, Piotr Stefaniak wrote: > > > There are more fixes I intend to do, of which the most relevant for > > > Postgres are: > > > 1) fixing "function pointer typedef formatting" > > > > This alone would warrant a bottle of something rather expensive. > > Agreed. I was kind of hoping we could use this for the pgindent run we > just did, but that is being done just before 9.6 final, which seems too > close. I suggest we run it once everything is ready, and run it on all > back-branches so we can backpatch things. The ideal time would probably > be right after we have done minor releases. The problem is that this is > going to break queued-up patches, so maybe it has to be done right > before 10.0 beta, and again, to all back branches too. I think it doesn't really matter -- surely we don't want to do it just before some important release, but other than that I don't think there are any constraints. The amount of pain for large patch maintainers is unrelated to the timing. (I sketched a way to mechanically rebase patches across a pgindent run; I haven't had the chance to try it.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)
On 23/08/16 08:27, Tom Lane wrote: Andres Freund writes: On 2016-08-22 13:54:43 -0400, Robert Haas wrote: On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane wrote: I'm inclined to suggest you forget this approach and propose a single counter for "SQL commands executed", which avoids all of the above definitional problems. People who need more detail than that are probably best advised to look to contrib/pg_stat_statements, anyway. I disagree. I think SQL commands executed, lumping absolutely everything together, really isn't much use. I'm inclined to agree. I think that's a quite useful stat when looking at an installation one previously didn't have a lot of interaction with. Well, let's at least have an "other" category so you can add up the counters and get a meaningful total. regards, tom lane Initially I thought of something I intended to be facetious, but then realized it might actually be both practicable & useful... How about 2 extra categories (if appropriate!!!): 1. Things that might fit into 2 or more of the existing categories, or where there is ambiguity as to which category is appropriate. 2. Things that don't fit into any existing category. This was inspired by a real use case, in a totally unrelated area - but where I attempted to ensure counts were in the most useful categories I was able to provide. The user had listed categories, but I found that the data didn't always fit neatly into them as specified. Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sslmode=require fallback
On Fri, Aug 19, 2016 at 09:22:32AM -0700, Jeff Janes wrote: > On Sat, Jul 30, 2016 at 11:18 AM, Bruce Momjian wrote: > > On Fri, Jul 29, 2016 at 11:27:06AM -0400, Peter Eisentraut wrote: > > On 7/29/16 11:13 AM, Bruce Momjian wrote: > > > Yes, I am thinking of a case where Postgres is down but a malevolent > > > user starts a Postgres server on 5432 to gather passwords. Verifying > > > against an SSL certificate would avoid this problem, so there is some > > > value in using SSL on localhost. (There is no such security available > > > for Unix-domain socket connections.) > > > > Sure, there is the requirepeer connection option for that. > > Oh, nice, I had not seen that. > > > > Hi Bruce, > > There is a typo in the doc patch you just committed > > "On way to prevent spoofing of" > > s/On/One/ Oops, thanks, fixed. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_bsd_indent - improvements around offsetof and sizeof
On Tue, Aug 16, 2016 at 11:47:09AM -0700, Andres Freund wrote: > On 2016-08-15 18:09:02 +, Piotr Stefaniak wrote: > > There are more fixes I intend to do, of which the most relevant for > > Postgres are: > > 1) fixing "function pointer typedef formatting" > > This alone would warrant a bottle of something rather expensive. Agreed. I was kind of hoping we could use this for the pgindent run we just did, but that is being done just before 9.6 final, which seems too close. I suggest we run it once everything is ready, and run it on all back-branches so we can backpatch things. The ideal time would probably be right after we have done minor releases. The problem is that this is going to break queued-up patches, so maybe it has to be done right before 10.0 beta, and again, to all back branches too. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On 2016-08-22 17:50:11 -0300, Alvaro Herrera wrote: > > 2. You can't write to unlogged tables on standby servers, so this > > doesn't help solve the problem of wanting to use temporary tables on > > standbys. > > Check. We could think about relaxing this restriction, which would > enable the feature to satisfy that use case. (I think the main > complication there is the init fork of btrees on those catalogs; other > relations could just be truncated to empty on restart.) Isn't the main complication that visibility currently requires xids to be assigned? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add -c to rsync commands on SR tutorial wiki page
On Thu, Aug 18, 2016 at 2:27 PM, Jim Nasby wrote: > I don't think it's any great leap for someone to think they can use those > commands incrementally. It's certainly one of the first things you think of > when using rsync. AFAIK there's no downside at all to using -c when it is a > brand new copy, so I'm thinking we should just put it in there, especially > considering what the potential downside is. I think that's right. The suggestion that people might use some backup tool is a good one, but that's not a reason not to add -c here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
Robert Haas wrote: > However: > > 1. The number of tables for which we would need to add a duplicate, > unlogged table is formidable. You need pg_attribute, pg_attrdef, > pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc. > And the backend changes needed so that we used the unlogged copy for > temp tables and the permanent copy for regular tables is probably > really large. Check. This is the most serious issue, IMV. > 2. You can't write to unlogged tables on standby servers, so this > doesn't help solve the problem of wanting to use temporary tables on > standbys. Check. We could think about relaxing this restriction, which would enable the feature to satisfy that use case. (I think the main complication there is the init fork of btrees on those catalogs; other relations could just be truncated to empty on restart.) > 3. While it makes creating temporary tables a lighter-weight > operation, because you no longer need to write WAL for the catalog > entries, there's probably still substantially more overhead than just > stuffing them in backend-local RAM. So the performance benefits are > probably fairly modest. You also save catalog bloat ... These benefits may not be tremendous, but I think they may be good enough for many users. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
On Tue, Aug 16, 2016 at 7:41 PM, Thomas Munro wrote: > I still think it's worth thinking about something along these lines on > Linux only, where holey Swiss tmpfs files can bite you. Otherwise > disabling overcommit on your OS isn't enough to prevent something > which is really a kind of deferred overcommit with a surprising > failure mode (SIGBUS rather than OOM SIGKILL). Yeah, I am inclined to agree. I mean, creating a DSM is fairly heavyweight already, so one extra system call isn't (I hope) a crazy overhead. We could test to see how much it slows things down. But it may be worth paying the cost even if it ends up being kinda expensive. We don't really have any way of knowing whether the caller's request is reasonable relative to the amount of virtual memory available, and converting a possible SIGBUS into an ereport(ERROR, ...) is a big win. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
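The idea Thomas and Robert discuss — paying one extra system call at segment creation so that a lack of backing storage surfaces as an ordinary error, instead of SIGBUS on first touch of a sparse page — can be sketched like this (Python on Linux purely for illustration; the real DSM code is C, and the function name here is hypothetical):

```python
import os
import tempfile

def create_backed_file(size):
    # Create an anonymous temp file and reserve its blocks up front.
    # posix_fallocate() allocates real storage, so an out-of-space
    # condition is reported here as OSError (ENOSPC) rather than as
    # SIGBUS later, when a mapped-but-holey page is first written.
    fd, path = tempfile.mkstemp()
    os.unlink(path)  # anonymous: name gone, the fd keeps it alive
    try:
        os.posix_fallocate(fd, 0, size)
    except OSError:
        os.close(fd)
        raise
    return fd
```

The same pattern applied to a tmpfs-backed DSM segment is what turns the "deferred overcommit with a surprising failure mode" into an error the caller can handle, at the cost of one system call per segment.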
Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On Tue, Aug 16, 2016 at 8:03 PM, Jim Nasby wrote: > On 8/16/16 11:59 AM, Robert Haas wrote: > ... >> >> That doesn't really solve the problem, because OTHER backends won't be >> able to see them. So, if I create a fast temporary table in one >> session that depends on a permanent object, some other session can >> drop the permanent object. If there were REAL catalog entries, that >> wouldn't work, because the other session would see the dependency. > > Some discussion about TEMP functions is happening on -general right now, and > there's other things where temp objects are good to have, so it'd be nice to > have a more generic fix for this stuff. Is the idea of "partitioning" the > catalogs to store temp objects separate from permanent fatally flawed? I wouldn't say it's fatally flawed. But you might need a world-renowned team of physicians working round the clock for days in a class 1 trauma center to save it. If you imagine that you have a permanent pg_class which holds permanent tables and a temporary pg_class per-backend which stores temporary tables, then you very quickly end up with the same deadly flaw as in Aleksander's design: other backends cannot see all of the dependency entries and can drop things that they shouldn't be permitted to drop. However, you could have a permanent pg_class which holds the records for permanent tables and an *unlogged* table, say pg_class_unlogged, which holds records for temporary tables. Now everybody can see everybody else's data, yet we don't have to create permanent catalog entries. So we are not dead. All of the temporary catalog tables vanish on a crash, too, and in a very clean way, which is great. However: 1. The number of tables for which we would need to add a duplicate, unlogged table is formidable. You need pg_attribute, pg_attrdef, pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc. 
And the backend changes needed so that we used the unlogged copy for temp tables and the permanent copy for regular tables is probably really large. 2. You can't write to unlogged tables on standby servers, so this doesn't help solve the problem of wanting to use temporary tables on standbys. 3. While it makes creating temporary tables a lighter-weight operation, because you no longer need to write WAL for the catalog entries, there's probably still substantially more overhead than just stuffing them in backend-local RAM. So the performance benefits are probably fairly modest. Overall I feel like the development effort that it would take to make this work would almost certainly be better-expended elsewhere. But of course I'm not in charge of how people who work for other companies spend their time... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding of sequence advances, part II
On 2016-08-22 16:29:12 -0400, Robert Haas wrote: > So, I wish I could give you some better advice on this topic, but > sadly I am not an expert in this area. However, it seems to me that > this is just one facet of a much more general problem: given two > transactions T1 and T2, the order of replay must match the order of > commit unless you can prove that there are no dependencies between > them. I don't see why it matters whether the operations are sequence > operations or data operations; it's just a question of whether they're > modifying the same "stuff". > > Of course, it's possible I'm missing something important here... Maybe that logical decoding normally outputs changes in commit order? Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding of sequence advances, part II
On Sun, Aug 21, 2016 at 11:13 PM, Craig Ringer wrote: > If the sequence is created in the current xact (i.e. uncommitted) we have to > add the sequence updates to that xact to be replayed only if it commits. The > sequence is visible only to the toplevel xact that created the sequence so > advances of it can only come from that xact and its children. The actual > CREATE SEQUENCE is presumed to be handled separately by an event trigger or > similar. > > If the new sequence is committed we must replay sequence advances > immediately and non-transactionally to ensure they're not lost due to xact > rollback or replayed in the wrong order due to xact commit order. So, I wish I could give you some better advice on this topic, but sadly I am not an expert in this area. However, it seems to me that this is just one facet of a much more general problem: given two transactions T1 and T2, the order of replay must match the order of commit unless you can prove that there are no dependencies between them. I don't see why it matters whether the operations are sequence operations or data operations; it's just a question of whether they're modifying the same "stuff". Of course, it's possible I'm missing something important here... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)
Andres Freund writes: > On 2016-08-22 13:54:43 -0400, Robert Haas wrote: >> On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane wrote: >>> I'm inclined to suggest you forget this approach and propose a single >>> counter for "SQL commands executed", which avoids all of the above >>> definitional problems. People who need more detail than that are >>> probably best advised to look to contrib/pg_stat_statements, anyway. >> I disagree. I think SQL commands executed, lumping absolutely >> everything together, really isn't much use. > I'm inclined to agree. I think that's a quite useful stat when looking > at an installation one previously didn't have a lot of interaction with. Well, let's at least have an "other" category so you can add up the counters and get a meaningful total. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] distinct estimate of a hard-coded VALUES list
Robert Haas writes: > On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane wrote: >> I am not exactly sold on this assumption that applications have >> de-duplicated the contents of a VALUES or IN list. They haven't been >> asked to do that in the past, so why do you think they are doing it? > It's hard to know, but my intuition is that most people would > deduplicate. I mean, nobody is going to want their query generator > to send X IN (1, 1, ) to the server if it > could have just sent X IN (1). I dunno, these are the very same people who send "WHERE 1=1" so that they can save one line of code to decide whether to append AND or not before the first real condition. Still, maybe we should optimize for smart queries rather than stupid ones. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
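The client-side de-duplication both Tom and Robert are assuming is cheap for a query generator to do; a minimal sketch (hypothetical helper, psycopg-style %s placeholders assumed):

```python
def in_clause(column, values):
    # De-duplicate before building X IN (...): the number of list
    # entries the planner sees then matches the number of distinct
    # values, and the query string stays small.
    distinct = sorted(set(values))
    placeholders = ", ".join(["%s"] * len(distinct))
    return f"{column} IN ({placeholders})", distinct
```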
Re: [HACKERS] Bug in abbreviated keys abort handling (found with amcheck)
On Mon, Aug 22, 2016 at 12:34 PM, Robert Haas wrote: > Ugh, that sucks. Thanks for the report and patch. Committed and > back-patched to 9.5. Thanks. Within Heroku, there is a lot of enthusiasm for the idea of sharing hard data about the prevalence of problems like this. I hope to be able to share figures in the next few weeks, when I finish working through the backlog. Separately, I would like amcheck to play a role in how we direct users to REINDEX, as issues like this come to light. It would be much more helpful if we didn't have to be so conservative. I hesitate to say that amcheck will detect cases where this bug led to corruption with 100% reliability, but I think that any case that one can imagine in which amcheck fails here is unlikely in the extreme. The same applies to the glibc abbreviated keys issue. I actually didn't find any glibc strxfrm() issues yet, even though any instances of corruption of text indexes I've seen originated before the point release in which strxfrm() became distrusted. I guess that not that many Heroku users use the "C" locale, which would still be affected with the latest point release. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Andres Freund writes: > On 2016-08-17 17:41:28 -0700, Andres Freund wrote: >> Tom, do you think this is roughly going in the right direction? I've not had time to look at this patch, I'm afraid. If you still want me to, I can make time in a day or so. > I'm working on these. Atm ExecMakeTableFunctionResult() resides in > execQual.c - I'm inlining it into nodeFunctionscan.c now, because > there's no other callers, and having it separate seems to bring no > benefit. > Please speak up soon if you disagree. I think ExecMakeTableFunctionResult was placed in execQual.c because it seemed to belong there alongside the support for SRFs in tlists. If that's going away then there's no good reason not to move the logic to where it's used. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in abbreviated keys abort handling (found with amcheck)
On Fri, Aug 19, 2016 at 6:07 PM, Peter Geoghegan wrote: > I found another bug as a result of using amcheck on Heroku customer > databases. This time, the bug is in core Postgres. It's one of mine. > > There was a thinko in tuplesort's abbreviation abort logic, causing > certain SortTuples to be spuriously marked NULL (and so, subsequently > sorted as a NULL tuple, despite not actually changing anything about > the representation of caller tuples). The attached patch fixes this > bug. Ugh, that sucks. Thanks for the report and patch. Committed and back-patched to 9.5. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Hi, On 2016-05-23 09:26:03 +0800, Craig Ringer wrote: > SRFs-in-tlist are a lot faster for lockstep iteration etc. They're also > much simpler to write, though if the resulting rowcount differs > unexpectedly between the functions you get exciting and unexpected > behaviour. > > WITH ORDINALITY provides what I think is the last of the functionality > needed to replace SRFs-in-from, but at a syntactic complexity and > performance cost. The following example demonstrates that, though it > doesn't do anything that needs LATERAL etc. I'm aware the following aren't > semantically identical if the rowcounts differ. I think here you're just missing ROWS FROM (generate_series(..), generate_series(...)) Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exporting more function in libpq
Craig Ringer wrote: > On 19 August 2016 at 14:17, Tatsuo Ishii wrote: > > > I would like to propose to export these functions in libpq. > > > > pqPutMsgStart > > pqPutMsgEnd > > pqPutc > > pqPuts > > pqPutInt > > pqPutnchar > > pqFlush > > pqHandleSendFailure > > > > I think this would be useful to create a tool/library which needs to > > handle frontend/backend protocol messages in detail. > > Shouldn't that generally be done by extending libpq to add the required > functionality? The thought that came to me was that maybe we need a separate library that handles the lower level operations (a "fe/be" library, if you will) which can be exported for others to use and is used by libpq to implement the slightly-higher-level functionality. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 2016-08-17 17:41:28 -0700, Andres Freund wrote: > Tom, do you think this is roughly going in the right direction? My plan > here is to develop two patches, to come before this: > > a) Allow to avoid using a tuplestore for SRF_PERCALL SRFs in ROWS FROM - >otherwise our performance would regress noticeably in some cases. > b) Allow ROWS FROM() to return SETOF RECORD type SRFs as one column, >instead of expanded. That's important to be able move SETOF RECORD >returning functions in the targetlist into ROWS FROM, which otherwise >requires an explicit column list. I'm working on these. Atm ExecMakeTableFunctionResult() resides in execQual.c - I'm inlining it into nodeFunctionscan.c now, because there's no other callers, and having it separate seems to bring no benefit. Please speak up soon if you disagree. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Most efficient way for libPQ .. PGresult serialization
Oh I see. I just read more on the use cases of PgBouncer, but it seems like it can't be used for my project. The reason is that I need my middleware to have full control over each transaction. That is, it must be able to decide if it's going to commit or abort a single query (the reason why libpq is used in the middleware), and it must be able to decide when to send back the result. Also it does things like load balancing with its algorithm. So, what the middleware does is (simplified, ignoring other details) 1. listens for queries and does load balancing 2. executes the query on behalf of the client against the server with libpq (does not have to be libpq). 3. serializes the result and sends it back And #3 is why I asked for ways to serialize PGresult (of libpq). The client app will deserialize the result and thus be able to interpret PGresult as if it used libpq itself. Thanks! On Thu, Aug 18, 2016 at 9:05 PM, Craig Ringer wrote: > > On 19 August 2016 at 03:08, Joshua Bay wrote: > >> Thanks, >> But I don't think my question was clear enough. >> >> I already managed the connection pooling, and what I need is to serialize >> the result. >> >> If PGresult was a contiguous block, I could have just created a buffer and >> called memcpy for serialization, but the structure of the result seems much more >> complicated. >> >> So, I was asking if there is an easy way to achieve serialization >> > > Its wire format is a serialization. That's kind of the point. > > I don't understand what you're trying to do here, so it's hard to give a > better answer. > > -- > Craig Ringer http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: [HACKERS] Most efficient way for libPQ .. PGresult serialization
No, it can be anything else. Please correct me if I'm wrong, but to me, PgPool-II looks like a proxy server that forwards all the data without interpretation. Can I intercept in the middle and control the flow between client and server? E.g., I need to control when the result of a transaction is sent back to the client. On Sat, Aug 20, 2016 at 2:39 AM, Craig Ringer wrote: > On 19 August 2016 at 22:16, Joshua Bay wrote: > >> Oh I see. >> I just read more on the use cases for PgBouncer, but it seems it can't be used >> for my project. >> The reason is that I need my middleware to have full control over >> each transaction. >> That is, it must be able to decide if it's going to commit or abort a >> single query (the reason why libpq is used in the middleware), and it must be >> able to decide when to send back the result. Also it does things like load >> balancing with its algorithm. >> >> So, what the middleware does is (simplified, ignoring other details) >> 1. listens for a query and does load balancing >> 2. executes the query on behalf of the client with libpq (does not have >> to be libpq). >> 3. serializes the result and sends it back >> >> And #3 is why I asked for ways to serialize PGresult (of libpq) >> >> The client app will deserialize the result and thus be able to interpret >> PGresult as if it used libpq itself. >> >> > Surely the app should just use libpq, and your middleware should be a > proxy? > > Like, say, PgPool-II? > > Otherwise you'll have to extract all the results handling parts of libpq > into some smaller cut-down library and graft on > serialization/deserialization code. There's nothing built-in for that, since > the natural and logical serialization for query results is the > PostgreSQL wire protocol. > > -- > Craig Ringer http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: [HACKERS] distinct estimate of a hard-coded VALUES list
Robert Haas wrote: > On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane wrote: > > Jeff Janes writes: > >> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane wrote: > >>> It does know it, what it doesn't know is how many duplicates there are. > > > >> Does it know whether the count comes from a parsed query-string list/array, > >> rather than being an estimate from something else? If it came from a join, > >> I can see why it would be dangerous to assume they are mostly distinct. > >> But if someone throws 6000 things into a query string and only 200 distinct > >> values among them, they have no one to blame but themselves when it makes > >> bad choices off of that. > > > > I am not exactly sold on this assumption that applications have > > de-duplicated the contents of a VALUES or IN list. They haven't been > > asked to do that in the past, so why do you think they are doing it? > > It's hard to know, but my intuition is that most people would > deduplicate. I mean, nobody is going to want their query generator > to send X IN (1, 1, ) to the server if it > could have just sent X IN (1). Also, if we patch it this way and somebody has a slow query because of a lot of duplicate values, it's easy to solve the problem by de-duplicating. But with the current code, people who have the opposite problem have no way to work around it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)
On 2016-08-22 13:54:43 -0400, Robert Haas wrote: > On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane wrote: > > I'm inclined to suggest you forget this approach and propose a single > > counter for "SQL commands executed", which avoids all of the above > > definitional problems. People who need more detail than that are > > probably best advised to look to contrib/pg_stat_statements, anyway. > > I disagree. I think SQL commands executed, lumping absolutely > everything together, really isn't much use. I'm inclined to agree. I think that's a quite useful stat when looking at an installation one previously didn't have a lot of interaction with. > Haribabu's categorization > scheme seems to need some work, but the idea of categorizing > statements by type and counting executions per type seems very > reasonable. I'd consider instead using something like COALESCE(commandType, nodeTag(Query->utilityStmt)) as the categories. Not sure if I'd even pivot that. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)
On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane wrote: > I'm inclined to suggest you forget this approach and propose a single > counter for "SQL commands executed", which avoids all of the above > definitional problems. People who need more detail than that are > probably best advised to look to contrib/pg_stat_statements, anyway. I disagree. I think SQL commands executed, lumping absolutely everything together, really isn't much use. Haribabu's categorization scheme seems to need some work, but the idea of categorizing statements by type and counting executions per type seems very reasonable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Showing parallel status in \df+
2016-08-22 18:19 GMT+02:00 Robert Haas: > On Mon, Aug 22, 2016 at 4:49 AM, Pavel Stehule > wrote: > > This feature shows the source code for a PL function when the \df statement is > used. > > I am not too sure if this functionality is necessary - but I don't see > any > > argument against. Sometimes it can be useful, mainly when we work with > > overloaded functions. > > Wait, really? I thought Peter was complaining about the fact that it > *removed* that from the display. > > He also complained about the fact that the subject line of this thread > and what the patch actually does have diverged considerably, which I > think is a fair complaint. > If I understand the purpose of this patch, it is a compromise: the PL source is removed from the table, but it is printed in the result. I am sure there is little benefit in displaying the body of a PL function inside the table. But I see some benefit in Tom's design. We cannot simply show the source code of multiple functions; \sf doesn't support it. The source is displayed at the end, so there is little impact on the result. Regards Pavel > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: [HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
On 2016-08-22 13:49:47 -0400, Robert Haas wrote: > On Mon, Aug 22, 2016 at 1:46 PM, Andres Freund wrote: > > I don't think the runtime overhead is likely to be all that high - if > > you look at valgrind.supp the performance-critical parts basically are: > > - pgstat_send - the context switching is going to drown out some zeroing > > - xlog insertions - making the crc computation more predictable would > > actually be nice > > - reorderbuffer serialization - zeroing won't be a material part of the > > cost > > > > The rest is mostly bootstrap or python related. > > > > There might be cases where we *don't* unconditionally do the zeroing - > > e.g. I'm doubtful about the sinval stuff where we currently only > > conditionally clear - but the stuff in valgrind.supp seems fine. > > Naturally you'll be wanting to conclusively demonstrate this with > benchmarks on multiple workloads, platforms, and concurrency levels. > Right? :-) Pah ;) I do think some micro-benchmarks aiming at the individual costs make sense, we're only talking about ~three places in the code - don't think concurrency plays a large role though ;) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
On Mon, Aug 22, 2016 at 1:46 PM, Andres Freund wrote: > I don't think the runtime overhead is likely to be all that high - if > you look at valgrind.supp the performance-critical parts basically are: > - pgstat_send - the context switching is going to drown out some zeroing > - xlog insertions - making the crc computation more predictable would > actually be nice > - reorderbuffer serialization - zeroing won't be a material part of the > cost > > The rest is mostly bootstrap or python related. > > There might be cases where we *don't* unconditionally do the zeroing - > e.g. I'm doubtful about the sinval stuff where we currently only > conditionally clear - but the stuff in valgrind.supp seems fine. Naturally you'll be wanting to conclusively demonstrate this with benchmarks on multiple workloads, platforms, and concurrency levels. Right? :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 2016-08-22 13:41:57 -0400, Robert Haas wrote: > On Mon, Aug 22, 2016 at 1:32 PM, Heikki Linnakangas wrote: > >> But what about the best case? If we create a scenario where there are > >> no open read-write transactions at all and (somehow) lots and lots of > >> ProcArrayLock contention, how much does this help? > > > > I ran some quick pgbench tests on my laptop, but didn't see any meaningful > > benefit. I think the best I could see is about 5% speedup, when running > > "pgbench -S", with 900 idle connections sitting in the background. On the > > positive side, I didn't see much slowdown either. (Sorry, I didn't record > > the details of those tests, as I was testing many different options and I > > didn't see a clear difference either way.) > > That's not very exciting. I think it's neither exciting nor worrying - the benefit of the pgproc batch clearing itself wasn't apparent on small hardware either... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
On 2016-08-22 13:16:34 -0400, Robert Haas wrote: > On Sat, Aug 20, 2016 at 3:46 PM, Tom Lane wrote: > > So to me, it seems like the core of this complaint boils down to "my > > sanitizer doesn't understand the valgrind exclusion patterns that have > > been created for Postgres". We can address that to some extent by trying > > to reduce the number of valgrind exclusions we need, but it's unlikely to > > be practical to get that to zero, and it's not very clear that adding > > runtime cycles is a good tradeoff for it either. So maybe we need to push > > back on the assumption that people should expect their sanitizers to > > produce zero warnings without having made some effort to adapt the > > valgrind rules. I don't think the runtime overhead is likely to be all that high - if you look at valgrind.supp the performance-critical parts basically are: - pgstat_send - the context switching is going to drown out some zeroing - xlog insertions - making the crc computation more predictable would actually be nice - reorderbuffer serialization - zeroing won't be a material part of the cost The rest is mostly bootstrap or python related. There might be cases where we *don't* unconditionally do the zeroing - e.g. I'm doubtful about the sinval stuff where we currently only conditionally clear - but the stuff in valgrind.supp seems fine. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On Mon, Aug 22, 2016 at 1:32 PM, Heikki Linnakangas wrote: >> But what about the best case? If we create a scenario where there are >> no open read-write transactions at all and (somehow) lots and lots of >> ProcArrayLock contention, how much does this help? > > I ran some quick pgbench tests on my laptop, but didn't see any meaningful > benefit. I think the best I could see is about 5% speedup, when running > "pgbench -S", with 900 idle connections sitting in the background. On the > positive side, I didn't see much slowdown either. (Sorry, I didn't record > the details of those tests, as I was testing many different options and I > didn't see a clear difference either way.) That's not very exciting. > It seems that Amit's PGPROC batch clearing patch was very effective. I > remember seeing ProcArrayLock contention very visible earlier, but I can't > hit that now. I suspect you'd still see contention on bigger hardware, > though, my laptop has oly 4 cores. I'll have to find a real server for the > next round of testing. It's good that those patches were effective, and I bet that approach could be further refined, too. However, I think Amit may have mentioned in an internal meeting that he was able to generate some pretty serious ProcArrayLock contention with some of his hash index patches applied. I don't remember the details, though. >> Because there's only a purpose to trying to minimize the losses if >> there are some gains to which we can look forward. > > Aside from the potential performance gains, this slashes a lot of > complicated code: > > 70 files changed, 2429 insertions(+), 6066 deletions(-) > > That removed code is quite mature at this point, and I'm sure we'll add some > code back to this patch as it evolves, but still. That's interesting, but it might just mean we're replacing well-tested code with new, buggy code. By the time you fix all the performance regressions, those numbers could be a lot closer together. 
> Also, I'm looking forward for a follow-up patch, to track snapshots in > backends at a finer level, so that vacuum could remove tuples more > aggressively, if you have pg_dump running for days. CSN snapshots isn't a > strict requirement for that, but it makes it simpler, when you can represent > a snapshot with a small fixed-size integer. That would certainly be nice, but I think we need to be careful not to sacrifice too much trying to get there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
Hi, On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote: > I ran some quick pgbench tests on my laptop, but didn't see any meaningful > benefit. I think the best I could see is about 5% speedup, when running > "pgbench -S", with 900 idle connections sitting in the background. On the > positive side, I didn't see much slowdown either. (Sorry, I didn't record > the details of those tests, as I was testing many different options and I > didn't see a clear difference either way.) Hm. Does the picture change if those are idle in transaction, after assigning an xid? > It seems that Amit's PGPROC batch clearing patch was very effective. It usually breaks down if you have a mixed read/write workload - might be worthwhile prototyping that. > I > remember seeing ProcArrayLock contention very visible earlier, but I can't > hit that now. I suspect you'd still see contention on bigger hardware, > though, my laptop has only 4 cores. I'll have to find a real server for the > next round of testing. Yea, I think that's true. I can just about see ProcArrayLock contention on my more powerful laptop, to see it really bad you need bigger hardware / higher concurrency. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)
On Sat, Aug 20, 2016 at 3:46 PM, Tom Lane wrote: > Noah Misch writes: >> On Fri, Aug 19, 2016 at 07:22:02PM -0400, Tom Lane wrote: >>> So maybe we ought to work towards taking those out? > >> Maybe. It's a policy question boiling down to our willingness to spend >> cycles >> zeroing memory in order to limit trust in the confidentiality of storage >> backing the data directory. Think of "INSERT INTO t VALUES(my_encrypt('key', >> 'cleartext'))", subsequent to which bits of the key or cleartext leak to disk >> by way of WAL padding bytes. A reasonable person might expect that not to >> happen; GNU Privacy Guard and a few like-minded programs prevent it. I'm on >> the fence regarding whether PostgreSQL should target this level of vigilance. >> An RDBMS is mainly a tool for managing persistent data, and PostgreSQL will >> never be a superior tool for data that _must not_ persist. Having said that, >> the runtime cost of zeroing memory and the development cost of reviewing the >> patches to do so is fairly low. > > [ after thinking some more about this... ] > > FWIW, I put pretty much zero credence in the proposition that junk left in > padding bytes in WAL or data files represents a meaningful security issue. > An attacker who has access to those files will probably find much more > that is of interest in the non-pad data. My only interest here is in > making the code sanitizer-clean, which seems like it is useful for > debugging purposes independently of any security arguments. > > So to me, it seems like the core of this complaint boils down to "my > sanitizer doesn't understand the valgrind exclusion patterns that have > been created for Postgres". We can address that to some extent by trying > to reduce the number of valgrind exclusions we need, but it's unlikely to > be practical to get that to zero, and it's not very clear that adding > runtime cycles is a good tradeoff for it either. 
So maybe we need to push > back on the assumption that people should expect their sanitizers to > produce zero warnings without having made some effort to adapt the > valgrind rules. One idea is to protect additional memory-clearing operations with #ifdef SANITIZER_CLEAN or something of that sort. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 08/22/2016 07:49 PM, Robert Haas wrote: Nice to see you working on this again. On Mon, Aug 22, 2016 at 12:35 PM, Heikki Linnakangas wrote: A sequential scan of a table like that with 10 million rows took about 700 ms on my laptop, when the hint bits are set, without this patch. With this patch, if there's a snapshot holding back the xmin horizon, so that we need to check the CSN log for every XID, it took about 3 ms. So we have some optimization work to do :-). I'm not overly worried about that right now, as I think there's a lot of room for improvement in the SLRU code. But that's the next thing I'm going to work on. So the worst case for this patch is obviously bad right now and, as you say, that means that some optimization work is needed. But what about the best case? If we create a scenario where there are no open read-write transactions at all and (somehow) lots and lots of ProcArrayLock contention, how much does this help? I ran some quick pgbench tests on my laptop, but didn't see any meaningful benefit. I think the best I could see is about 5% speedup, when running "pgbench -S", with 900 idle connections sitting in the background. On the positive side, I didn't see much slowdown either. (Sorry, I didn't record the details of those tests, as I was testing many different options and I didn't see a clear difference either way.) It seems that Amit's PGPROC batch clearing patch was very effective. I remember seeing ProcArrayLock contention very visible earlier, but I can't hit that now. I suspect you'd still see contention on bigger hardware, though, my laptop has only 4 cores. I'll have to find a real server for the next round of testing. Because there's only a purpose to trying to minimize the losses if there are some gains to which we can look forward. 
Aside from the potential performance gains, this slashes a lot of complicated code: 70 files changed, 2429 insertions(+), 6066 deletions(-) That removed code is quite mature at this point, and I'm sure we'll add some code back to this patch as it evolves, but still. Also, I'm looking forward for a follow-up patch, to track snapshots in backends at a finer level, so that vacuum could remove tuples more aggressively, if you have pg_dump running for days. CSN snapshots isn't a strict requirement for that, but it makes it simpler, when you can represent a snapshot with a small fixed-size integer. Yes, seeing some direct performance gains would be nice too. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] distinct estimate of a hard-coded VALUES list
On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane wrote: > Jeff Janes writes: >> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane wrote: >>> It does know it, what it doesn't know is how many duplicates there are. > >> Does it know whether the count comes from a parsed query-string list/array, >> rather than being an estimate from something else? If it came from a join, >> I can see why it would be dangerous to assume they are mostly distinct. >> But if someone throws 6000 things into a query string and only 200 distinct >> values among them, they have no one to blame but themselves when it makes >> bad choices off of that. > > I am not exactly sold on this assumption that applications have > de-duplicated the contents of a VALUES or IN list. They haven't been > asked to do that in the past, so why do you think they are doing it? It's hard to know, but my intuition is that most people would deduplicate. I mean, nobody is going to want their query generator to send X IN (1, 1, ) to the server if it could have just sent X IN (1). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
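The de-duplication Robert expects from query generators is trivial to do on the client side. A hypothetical illustration (table and values made up):

```sql
-- Instead of sending the raw, duplicate-heavy literal list:
--   SELECT * FROM t WHERE x IN (1, 1, 1, 2, 2, 3);
-- a query generator can send the reduced list, which also gives the
-- planner an accurate distinct count for free:
SELECT * FROM t WHERE x IN (1, 2, 3);
```

The planner question in this thread is what to assume when the list has *not* been reduced — whether 6000 supplied values should be estimated as roughly 6000 distinct values.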
Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
On Sat, Aug 20, 2016 at 9:46 AM, Craig Ringer wrote: > Ahem. Forgot to squash in a fixup commit. Updated patch of > txid_status(bigint) attached. > > A related patch follows, adding a new txid_current_ifassigned(bigint) > function as suggested by Jim Nasby. It's usefully connected to txid_status() > and might as well be added at the same time. > > Since it builds on the same history I've also attached an updated version of > txid_recent(bigint) now called txid_convert_ifrecent(bigint), per the > above-linked thread. > > Finally, and not intended for commit, is a useful test function I wrote to > cause extremely rapid xid wraparound, bundled up into a src/test/regress > test case. txid_incinerate() can jump the server about UINT32/2 xids in ~2 > seconds if fsync is off, making it handy for testing. Posting so others can > use it for their own test needs later and because it's useful for testing > these patches that touch on the xid epoch. I think you should use underscores to separate all of the words instead of only some of them. Also, note that the corresponding internal function is GetTopTransactionIdIfAny(), which might suggest txid_current_if_any() rather than txid_current_if_assigned(), but you could argue that your naming is better... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
Nice to see you working on this again. On Mon, Aug 22, 2016 at 12:35 PM, Heikki Linnakangas wrote: > A sequential scan of a table like that with 10 million rows took about 700 > ms on my laptop, when the hint bits are set, without this patch. With this > patch, if there's a snapshot holding back the xmin horizon, so that we need > to check the CSN log for every XID, it took about 3 ms. So we have some > optimization work to do :-). I'm not overly worried about that right now, as > I think there's a lot of room for improvement in the SLRU code. But that's > the next thing I'm going to work. So the worst case for this patch is obviously bad right now and, as you say, that means that some optimization work is needed. But what about the best case? If we create a scenario where there are no open read-write transactions at all and (somehow) lots and lots of ProcArrayLock contention, how much does this help? Because there's only a purpose to trying to minimize the losses if there are some gains to which we can look forward. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 08/22/2016 07:35 PM, Heikki Linnakangas wrote: And here's a new patch version... And here's the attachment I forgot, *sigh*.. - Heikki csn-4.patch.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
And here's a new patch version. Still lots of work to do, especially in performance testing, and minimizing the worst-case performance hit. On 08/09/2016 03:16 PM, Heikki Linnakangas wrote: Next steps: * Hot standby feedback is broken, now that CSN != LSN again. Will have to switch this back to using an "oldest XID", rather than a CSN. * I plan to replace pg_subtrans with a special range of CSNs in the csnlog. Something like, start the CSN counter at 2^32 + 1, and use CSNs < 2^32 to mean "this is a subtransaction, parent is XXX". One less SLRU to maintain. * Put per-proc xmin back into procarray. I removed it, because it's not necessary for snapshots or GetOldestSnapshot() (which replaces GetOldestXmin()) anymore. But on second thoughts, we still need it for deciding when it's safe to truncate the csnlog. * In this patch, HeapTupleSatisfiesVacuum() is rewritten to use an "oldest CSN", instead of "oldest xmin", but that's not strictly necessary. To limit the size of the patch, I might revert those changes for now. I did all of the above. This patch is now much smaller, as I didn't change all the places that used to deal with global-xmin's, like I did earlier. The oldest-xmin is now computed pretty like it always has been. * Rewrite the way RecentGlobalXmin is updated. As Alvaro pointed out in his review comments two years ago, that was quite complicated. And I'm worried that the lazy scheme I had might not allow pruning fast enough. I plan to make it more aggressive, so that whenever the currently oldest transaction finishes, it's responsible for advancing the "global xmin" in shared memory. And the way it does that, is by scanning the csnlog, starting from the current "global xmin", until the next still in-progress XID. That could be a lot, if you have a very long-running transaction that ends, but we'll see how it performs. I ripped out all that, and created a GetRecentGlobalXmin() function that computes a global-xmin value when needed, like GetOldestXmin() does. 
Seems most straightforward. Since we no longer get a RecentGlobalXmin value essentially for free in GetSnapshotData(), as we no longer scan the proc array, it's better to compute the value only when needed. * Performance testing. Clearly this should have a performance benefit, at least under some workloads, to be worthwhile. And not regress. I wrote a little C module to create a "worst-case" table. Every row in the table has a different xmin, and the xmin values are shuffled across the table, to defeat any caching. A sequential scan of a table like that with 10 million rows took about 700 ms on my laptop, when the hint bits are set, without this patch. With this patch, if there's a snapshot holding back the xmin horizon, so that we need to check the CSN log for every XID, it took about 3 ms. So we have some optimization work to do :-). I'm not overly worried about that right now, as I think there's a lot of room for improvement in the SLRU code. But that's the next thing I'm going to work on. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Showing parallel status in \df+
On Mon, Aug 22, 2016 at 4:49 AM, Pavel Stehule wrote: > This feature shows source code for PL function when \df statement was used. > I am not too sure, if this functionality is necessary - but I don't see any > argument against. Sometimes it can be useful, mainly when we work with > overloaded functions. Wait, really? I thought Peter was complaining about the fact that it *removed* that from the display. He also complained about the fact that the subject line of this thread and what the patch actually does have diverged considerably, which I think is a fair complaint. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typo in comment to function LockHasWaitersRelation() [master branch]
On Mon, Aug 22, 2016 at 9:01 AM, Dmitry Ivanov wrote: >> Hi hackers, >> >> I've found a typo in a comment to function LockHasWaitersRelation() [lmgr.c >> : >> 271, master branch]: >> >> This is a functiion to check > > Attached a patch. Thanks. Committed with a bit of additional wordsmithing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On 22 August 2016 at 13:44, Kuntal Ghosh wrote: > Please let me know your thoughts on this. Do the regression tests pass with this option enabled? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LSN as a recovery target
On Mon, Aug 22, 2016 at 8:28 AM, Michael Paquier wrote: > On Mon, Aug 22, 2016 at 7:12 PM, Adrien Nayrat > wrote: >> As Julien said, there is nothing to indicate that the error comes from >> recovery.conf. >> My fear would be that a user encounters an error like this. It will be >> difficult to link it to the recovery.conf. > > Thinking a bit wider than that, we may want to know such context for > normal GUC parameters as well, and that's not the case now. Perhaps > there is actually a reason why that's not done for GUCs, but it seems > that it would be useful there as well. That would give another reason > to move all that under the GUC umbrella. Maybe so, but that's been tried multiple times without success. If you think an error context is useful here, and I bet it is, I'd say just add it and be done with it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Mon, Aug 22, 2016 at 9:25 AM, Michael Paquier wrote: > Another pin-point is: given a certain page, how do we identify > which type it is? One possibility would be again to extend the AM > handler with some kind of is_self function with a prototype like that: > bool handler->is_self(Page); > If the page is of the type of the handler, this returns true, and > false otherwise. Still here performance would suck. > > At the end, what we want is a clean interface, and more thoughts into it. I think that it makes sense to filter based on the resource manager ID, but I can't see how we could reasonably filter based on the AM name. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Tracking wait event for latches
On Mon, Aug 22, 2016 at 9:49 AM, Michael Paquier wrote: > The reason why I chose this way is that there are a lot of them. It is > painful to maintain the order of the array elements in perfect mapping > with the list of IDs... You can use stupid macro tricks to help with that problem... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] UTF-8 docs?
On 8/22/16 9:32 AM, Tatsuo Ishii wrote: > I don't know what kind of problem you are seeing with encoding > handling, but at least UTF-8 is working for Japanese, French and > Russian. Those translations are using DocBook XML. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] improved DefElem list processing
Peter Eisentraut wrote: > I'm not happy that utils/acl.h has prototypes for aclchk.c, because > acl.h is included all over the place. Perhaps I should make a > src/include/catalog/aclchk.h to clean that up. I've been bothered by that too in the past. +1 for the cleanup. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] improved DefElem list processing
On Mon, Aug 22, 2016 at 10:41 PM, Pavel Stehule wrote: > 1. This patch introduces location in the DefElem node and injects ParserState > into SQL commands where ParserState was not used. It allows showing the > position of an error. This patch is not small, but almost all changes are > trivial. > > 2. There are no problems with patching, compiling, tests - all tests passed. > > 3. There is not any new functionality, so new tests and new documentation are > not necessary. > > I'll mark this patch as ready for committer. Now that I look at those patches, +1 for both. Particularly the redundant-option checks will remove a lot of boring code. -- Michael
Re: [HACKERS] dsm_unpin_segment
On Mon, Aug 22, 2016 at 6:06 PM, Thomas Munro wrote: > On Tue, Aug 23, 2016 at 12:07 AM, Amit Kapila wrote: >> + int control_slot = -1; >> ... >> + if (control_slot == -1) >> + elog(ERROR, "cannot unpin unknown segment handle"); >> >> Isn't it better to use INVALID_CONTROL_SLOT for control_slot and use >> datatype as uint32 (same is used for dsm_segment->control_slot and >> nitems)? > > Yes, it is better. New version attached. > This version of patch looks good to me. I have marked it as Ready For Committer. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Tracking wait event for latches
On Mon, Aug 22, 2016 at 6:09 PM, Alexander Korotkov wrote: > Hi, Michael! > > On Thu, Aug 4, 2016 at 8:26 AM, Michael Paquier > wrote: > I took a look at your patch. Couple of notes from me. Thanks! >> const char * >> GetEventIdentifier(uint16 eventId) >> { >> const char *res; >> switch (eventId) >> { >> case EVENT_ARCHIVER_MAIN: >> res = "ArchiverMain"; >> break; >> ... long long list of events ... >> case EVENT_WAL_SENDER_WRITE_DATA: >> res = "WalSenderWriteData"; >> break; >> default: >> res = "???"; >> } >> return res; >> } > > > Would it be better to use an array here? The reason why I chose this way is that there are a lot of them. It is painful to maintain the order of the array elements in perfect mapping with the list of IDs... >> typedef enum EventIdentifier >> { > > > EventIdentifier seems too general name for me, isn't it? Could we name it > WaitEventIdentifier? Or WaitEventId for shortcut? OK. So WaitEventIdentifier? The reason to include Identifier is for consistency with lwlock structure notation. -- Michael
Re: [HACKERS] improved DefElem list processing
Hi 2016-08-11 17:32 GMT+02:00 Peter Eisentraut < peter.eisentr...@2ndquadrant.com>: > On 8/5/16 11:25 AM, Peter Eisentraut wrote: > > On 8/4/16 2:21 PM, Tom Lane wrote: > >> Forgot to mention: seems like you should have added a location > >> argument to makeDefElem. > > > > I was hesitating to do that lest it break extensions or something, but I > > guess we break bigger things than that all the time. I'll change it. > > In order not to work on two patches that directly conflict with each > other, I have proceeded with the location patch and postponed the > duplicate checking patch. > > Attached is a biggish patch to review. It adds location information to > all places DefElems are created in the parser and then adds errposition > information in a lot of places, but surely not all of them. That can be > improved over time. > > I'm not happy that utils/acl.h has prototypes for aclchk.c, because > acl.h is included all over the place. Perhaps I should make a > src/include/catalog/aclchk.h to clean that up. > > Here are some example commands to try for getting suitable error messages: > > create collation foo (foo = bar, bar = foo); > copy test from stdin (null 'x', null 'x'); > create function foo (a int, b int) returns int as $$ select a+b $$ > language sql language sql; > create function foo (a int, b int) returns int as $$ select a+b $$ > language sql volatile stable; > create function foo (a int, b int) returns int as $$ select a+b $$ > language sql with (foo = bar); > create sequence foo minvalue 1 minvalue 2; > create type foo (foo = bar); > create user foo createdb nocreatedb; > explain (foo, bar) select 1; > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > I am sending a review of this patch: 1. This patch introduces location in the DefElem node and injects ParserState into SQL commands where ParserState was not used. It allows showing the position of an error.
This patch is not small, but almost all changes are trivial. 2. There are no problems with patching, compiling, tests - all tests passed. 3. There is not any new functionality, so new tests and new documentation are not necessary. I'll mark this patch as ready for committer. Regards Pavel
Re: [HACKERS] WAL consistency check facility
On Mon, Aug 22, 2016 at 9:44 PM, Kuntal Ghosh wrote: > Please let me know your thoughts on this. Since custom AMs have been introduced, I have kept that in a corner of my mind and thought about it a bit. And while the goal of this patch is clearly worth it, I don't think that the page masking interface is clear at all. For example, your patch completely ignores contrib/bloom, and we surely want to do something about it. The idea would be to add a page masking routine in IndexAmRoutine and heap to be able to perform page masking operations directly with that. This would also allow one to perform page masking for bloom or any custom access method, and this will allow this sanity check to be more generic as well. Another pin-point is: given a certain page, how do we identify which type it is? One possibility would be again to extend the AM handler with some kind of is_self function with a prototype like that: bool handler->is_self(Page); If the page is of the type of the handler, this returns true, and false otherwise. Still here performance would suck. At the end, what we want is a clean interface, and more thoughts into it. -- Michael
Re: [HACKERS] UTF-8 docs?
> On 8/22/16 1:16 AM, Tatsuo Ishii wrote: >> Just out of curiosity, I wonder why we can't make the encoding of >> SGML docs to be UTF-8, rather than the current ISO-8859-1. > > Encoding handling in DocBook SGML is weird, and making it work robustly > will either fail or might be more work than just completing the > conversion to XML. I don't know what kind of problem you are seeing with encoding handling, but at least UTF-8 is working for Japanese, French and Russian. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Mon, Aug 22, 2016 at 10:09 PM, Amit Kapila wrote: > Won't a similar problem exist for nonExclusiveBackups? Basically, > in a similar situation, the count for nonExclusiveBackups will be > decremented and if it hits pg_start_backup_callback(), the following > Assertion can fail. > pg_start_backup_callback() > { > .. > Assert(XLogCtl->Insert.nonExclusiveBackups > 0); > .. > } This cannot be hit for non-exclusive backups as pg_start_backup() and pg_stop_backup() need to be launched from the same session: their calls will never overlap across sessions, and this overlap is the reason why exclusive backups are exposed to the problem. -- Michael
Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
On Mon, Aug 22, 2016 at 7:13 AM, Michael Paquier wrote: > On Sat, Aug 6, 2016 at 6:35 PM, Andreas Seltenreich > wrote: >> Michael Paquier writes: >> >>> Andreas, with the patch attached is the assertion still triggered? >>> [2. text/x-diff; base-backup-crash-v2.patch] >> >> I didn't observe the crashes since applying this patch. There should >> have been about five by the amount of fuzzing done. > > I have reworked the patch, replacing those two booleans with a single > enum. That's definitely clearer. Also, I have added this patch to the > CF to not lose track of it: > https://commitfest.postgresql.org/10/731/ > Horiguchi-san, I have added you as a reviewer as you provided some > input. I hope you don't mind. > Won't a similar problem exist for nonExclusiveBackups? Basically, in a similar situation, the count for nonExclusiveBackups will be decremented and if it hits pg_start_backup_callback(), the following Assertion can fail. pg_start_backup_callback() { .. Assert(XLogCtl->Insert.nonExclusiveBackups > 0); .. } -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] LSN as a recovery target
On 8/22/16 8:28 AM, Michael Paquier wrote: > Thinking a bit wider than that, we may want to know such context for > normal GUC parameters as well, and that's not the case now. Perhaps > there is actually a reason why that's not done for GUCs, but it seems > that it would be useful there as well. GUC parsing generally needs, or used to need, to work under more constrained circumstances, e.g., no full memory management. That's not a reason not to try this, but there might be non-obvious problems. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Typo in comment to function LockHasWaitersRelation() [master branch]
> Hi hackers, > > I've found a typo in a comment to function LockHasWaitersRelation() [lmgr.c > : > 271, master branch]: > >> This is a functiion to check Attached a patch. -- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c index 7b08555..d0fdb94 100644 --- a/src/backend/storage/lmgr/lmgr.c +++ b/src/backend/storage/lmgr/lmgr.c @@ -268,7 +268,7 @@ UnlockRelation(Relation relation, LOCKMODE lockmode) /* * LockHasWaitersRelation * - * This is a functiion to check if someone else is waiting on a + * This is a function to check if someone else is waiting on a * lock, we are currently holding. */ bool
[HACKERS] Typo in comment to function LockHasWaitersRelation() [master branch]
Hi hackers, I've found a typo in a comment to function LockHasWaitersRelation() [lmgr.c : 271, master branch]: >> This is a functiion to check -- Dmitry Ivanov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [HACKERS] UTF-8 docs?
On 8/22/16 1:16 AM, Tatsuo Ishii wrote: > Just out of curiosity, I wonder why we can't make the encoding of > SGML docs to be UTF-8, rather than the current ISO-8859-1. Encoding handling in DocBook SGML is weird, and making it work robustly will either fail or might be more work than just completing the conversion to XML. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] WAL consistency check facility
Hi, I've attached a patch to check whether the current page is equal to the FPW after applying WAL on it. This is how the patch works: 1. When a WAL record is inserted, a FPW is done for that operation, but a flag is kept to indicate whether that page needs to be restored. 2. During recovery, when a redo operation is done, we compare the FPW contained in the WAL record with the current page in the buffer. For this purpose, I've used Michael's patch, with minor changes, to check whether two pages are actually equal or not. 3. I've also added a GUC variable (wal_consistency_mask) to indicate the operations (HEAP, BTREE, HASH, GIN etc.) for which this feature (always FPW and consistency check) is to be enabled. How to use the patch: 1. Apply the patch. 2. In the postgresql.conf file, set the wal_consistency_mask variable accordingly. For debug messages, set log_min_messages = debug1. Michael's patch: https://www.postgresql.org/message-id/CAB7nPqR4vxdKijP%2BDu82vOcOnGMvutq-gfqiU2dsH4bsM77hYg%40mail.gmail.com Reference thread: https://www.postgresql.org/message-id/flat/CAB7nPqR4vxdKijP%2BDu82vOcOnGMvutq-gfqiU2dsH4bsM77hYg%40mail.gmail.com#cab7npqr4vxdkijp+du82vocongmvutq-gfqiu2dsh4bsm77...@mail.gmail.com Please let me know your thoughts on this.
-- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index f13f9c1..9380079 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -25,6 +25,7 @@ #include "access/commit_ts.h" #include "access/multixact.h" #include "access/rewriteheap.h" +#include "access/rmgr.h" #include "access/subtrans.h" #include "access/timeline.h" #include "access/transam.h" @@ -52,7 +53,9 @@ #include "replication/walreceiver.h" #include "replication/walsender.h" #include "storage/barrier.h" +#include "storage/bufmask.h" #include "storage/bufmgr.h" +#include "storage/bufpage.h" #include "storage/fd.h" #include "storage/ipc.h" #include "storage/large_object.h" @@ -94,6 +97,7 @@ bool EnableHotStandby = false; bool fullPageWrites = true; bool wal_log_hints = false; bool wal_compression = false; +int wal_consistency_mask = 0; bool log_checkpoints = false; int sync_method = DEFAULT_SYNC_METHOD; int wal_level = WAL_LEVEL_MINIMAL; @@ -867,6 +871,9 @@ static void WALInsertLockAcquireExclusive(void); static void WALInsertLockRelease(void); static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt); +void checkWALConsistency(XLogReaderState *xlogreader); +void checkWALConsistencyForBlock(XLogReaderState *record, uint8 block_id); + /* * Insert an XLOG record represented by an already-constructed chain of data * chunks. 
This is a low-level routine; to construct the WAL record header @@ -6868,6 +6875,12 @@ StartupXLOG(void) /* Now apply the WAL record itself */ RmgrTable[record->xl_rmid].rm_redo(xlogreader); +/* + * Check whether the page associated with WAL record is consistent + * with the existing page + */ +checkWALConsistency(xlogreader); + /* Pop the error context stack */ error_context_stack = errcallback.previous; @@ -11626,3 +11639,160 @@ XLogRequestWalReceiverReply(void) { doRequestWalReceiverReply = true; } + +/* + * Check whether the page associated with WAL record is consistent with the + * existing page or not. + */ +void checkWALConsistency(XLogReaderState *xlogreader) +{ + RmgrIds rmid = (RmgrIds) XLogRecGetRmid(xlogreader); + int block_id; + int enableWALConsistencyMask = 1; + RmgrIds rmids[] = {RM_HEAP2_ID,RM_HEAP_ID,RM_BTREE_ID,RM_HASH_ID,RM_GIN_ID,RM_GIST_ID,RM_SEQ_ID,RM_SPGIST_ID,RM_BRIN_ID}; + int size = sizeof(rmids)/sizeof(rmid); + int i; + for (i = 0; i < size; i++) + { + if ((rmid == rmids[i]) && (wal_consistency_mask & enableWALConsistencyMask)) + { + for (block_id = 0; block_id <= xlogreader->max_block_id; block_id++) +checkWALConsistencyForBlock(xlogreader,block_id); + break; + } + /* + * Enable checking for the next bit + */ + enableWALConsistencyMask <<= 1; + } +} +void checkWALConsistencyForBlock(XLogReaderState *record, uint8 block_id) +{ + Buffer buf; + char *ptr; + DecodedBkpBlock *bkpb; + char tmp[BLCKSZ]; + RelFileNode rnode; + ForkNumber forknum; + BlockNumber blkno; + Page page; + + if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno)) + { + /* Caller specified a bogus block_id. Don't do anything.
*/ + return; + } + buf = XLogReadBufferExtended(rnode, forknum, blkno, + RBM_WAL_CHECK); + page = BufferGetPage(buf); + + bkpb = &record->blocks[block_id]; + if(bkpb->bkp_image!=NULL) + ptr = bkpb->bkp_image; + else + { + elog(WARNING, + "No page found in WAL for record %X/%X, rel %u/%u/%u, " + "forknum %u, blkno %u", + (uint32) (record->ReadRecPtr>> 32), (uint32) record->ReadRecPtr , + rnode.spcNode, rnode.dbNode, rnode.relNode, + forknum, blkno); + return; + } + + if (bkpb->bimg_info & BKPIMAGE_IS_COMPRESSED) + { + /* If a backup block image is compressed, decompress it */ + if (pglz_decompress(ptr, bkpb->bimg_len, tmp,
Re: [HACKERS] dsm_unpin_segment
On Tue, Aug 23, 2016 at 12:07 AM, Amit Kapila wrote: > + int control_slot = -1; > ... > + if (control_slot == -1) > + elog(ERROR, "cannot unpin unknown segment handle"); > > Isn't it better to use INVALID_CONTROL_SLOT for control_slot and use > datatype as uint32 (same is used for dsm_segment->control_slot and > nitems)? Yes, it is better. New version attached. > Apart from this, I have verified your patch on Windows using attached > dsm_demo module. Basically, by using dsm_demo_create(), I have pinned > the segment and noticed that Handle count of postmaster is incremented > by 1 and then by using dsm_demo_unpin_segment() unpinned the segment > which decrements the Handle count in Postmaster. Thanks! -- Thomas Munro http://www.enterprisedb.com dsm-unpin-segment-v4.patch Description: Binary data
Re: [HACKERS] LSN as a recovery target
On Mon, Aug 22, 2016 at 7:12 PM, Adrien Nayrat wrote: > As Julien said, there is nothing to notice that the error comes from > recovery.conf. > My fear would be that a user encounters an error like this. It will be > difficult to link it to the recovery.conf. Thinking a bit wider than that, we may want to know such context for normal GUC parameters as well, and that's not the case now. Perhaps there is actually a reason why that's not done for GUCs, but it seems that it would be useful there as well. That would give another reason to move all that under the GUC umbrella. > For the other settings (xid, timeline, name) there is an explicit > message noticing that the error is in recovery.conf. > > I see it is not the case for recovery_target_time. Yes, in this case the parameter value is parsed using a datatype's _in function call, and the error message depends on that. > Should we modify only the documentation or should we try to find a > solution to point to the origin of the error? The patch as proposed is complicated enough I think, and it would be good to keep things simple if we can. So having something in the docs looks fine to me, and that's actually the reference to pg_lsn used to parse the parameter value. -- Michael
Re: [HACKERS] Binary I/O for isn extension
Hello Shay, Attached is a new version of the patch, adding an upgrade script and the rest of it. Note that because, as Fabien noted, there doesn't seem to be a way to add send/receive functions with ALTER TYPE, I did that by updating pg_type directly - hope that's OK. This patch does not apply anymore, because there has been an update in between to mark relevant contrib functions as "parallel". Could you update the patch? -- Fabien.
Re: [HACKERS] dsm_unpin_segment
On Mon, Aug 22, 2016 at 5:24 AM, Thomas Munro wrote: > On Sat, Aug 20, 2016 at 11:37 PM, Amit Kapila wrote: >> On Tue, Aug 9, 2016 at 10:07 AM, Thomas Munro >> wrote: >>> On Tue, Aug 9, 2016 at 12:53 PM, Tom Lane wrote: The larger picture here is that Robert is exhibiting a touching but unfounded faith that extensions using this feature will contain zero bugs. IMO there needs to be some positive defense against mistakes in using the pin/unpin API. As things stand, multiple pin requests don't have any fatal consequences (especially not on non-Windows), so I have little confidence that it's not happening in the field. I have even less confidence that there wouldn't be too many unpin requests. >>> >>> Ok, here is a version that defends against invalid sequences of >>> pin/unpin calls. I had to move dsm_impl_pin_segment into the block >>> protected by DynamicSharedMemoryControlLock, so that it could come >>> after the already-pinned check, but before updating any state, since >>> it makes a Windows syscall that can fail. That said, I've only tested >>> on Unix and will need to ask someone to test on Windows. >>> >> >> Few review comments: > > Thanks for the review! > >> 1. >> + /* Note that 1 means no references (0 means unused slot). */ >> + if (--dsm_control->item[i].refcnt == 1) >> + destroy = true; >> + >> + /* >> + * Allow implementation-specific code to run. We have to do this before >> + * releasing the lock, because impl_private_pm_handle may get modified by >> + * dsm_impl_unpin_segment. >> + */ >> + if (control_slot >= 0) >> + dsm_impl_unpin_segment(handle, >> + &dsm_control->item[control_slot].impl_private_pm_handle); >> >> If there is an error in dsm_impl_unpin_segment(), then we don't need >> to decrement the reference count. Isn't it better to do it after the >> dsm_impl_unpin_segment() is successful. Similarly, I think pinned >> should be set to false after dsm_impl_unpin_segment(). > > Hmm. Yeah, OK. 
Things are in pretty bad shape if you fail to unpin > despite having run the earlier checks, but you're right, it's better > that way. New version attached. > + int control_slot = -1; ... + if (control_slot == -1) + elog(ERROR, "cannot unpin unknown segment handle"); Isn't it better to use INVALID_CONTROL_SLOT for control_slot and use datatype as uint32 (same is used for dsm_segment->control_slot and nitems)? Apart from this, I have verified your patch on Windows using attached dsm_demo module. Basically, by using dsm_demo_create(), I have pinned the segment and noticed that Handle count of postmaster is incremented by 1 and then by using dsm_demo_unpin_segment() unpinned the segment which decrements the Handle count in Postmaster. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com dsm_demo_v1.patch Description: Binary data
Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)
On Mon, Aug 22, 2016 at 8:45 PM, Tom Lane wrote: > Likely it would be better to refactor all of this so we get just one > reference to libpq and one reference to libpgport, but that'd require > a more thoroughgoing cleanup than you have here. (Now that I think > about it, adding -L/-l to LDFLAGS is pretty duff coding style to > begin with --- we should be adding those things to LIBS, or at least > putting them just before LIBS in the command lines.) Agreed, this needs more thought. As that's messy, I won't hijack this thread further and will begin a new one with a more fully-formed patch once I get time to look at it. -- Michael
Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)
Michael Paquier writes: > On Sun, Aug 21, 2016 at 3:02 AM, Tom Lane wrote: >> I looked into this and soon found that fe_utils/string_utils.o has >> dependencies on libpq that are much wider than just pqexpbuffer :-(. > pqexpbuffer.c is an independent piece of facility, so we could move it > to src/common and leverage the dependency a bit, and have libpq link > to the source file itself at build phase. The real problem is the call > to PQmblen in psqlscan.l... And this, I am not sure how this could be > refactored cleanly. I see all of these as libpq dependencies in string_utils.o: U PQclientEncoding U PQescapeStringConn U PQmblen U PQserverVersion Maybe we could split that file into two parts (libpq-dependent and not) but it would be pretty arbitrary. > And actually, I had a look at the build failure that you reported in > 3855.1471713...@sss.pgh.pa.us. While that was because of a copy of > libpq.so that I had in my own LD_LIBRARY_PATH, shouldn't all the other > frontend utilities depending on fe_utils also use $(libpq_pgport) > instead of -lpq? All the rest of them already have that, because their link commands look like, eg for psql, LDFLAGS += -L$(top_builddir)/src/fe_utils -lpgfeutils -lpq psql: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils $(CC) $(CFLAGS) $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) ^^ The extra reference to -lpq just makes sure that references to libpq from libpgfeutils get resolved properly. (And yeah, there are some platforms that need that :-(.) We don't need an extra copy of the -L flag. This is all pretty messy, not least because of the way libpq_pgport is set up; as Makefile.global notes, # ... This does cause duplicate -lpgport's to appear # on client link lines. Likely it would be better to refactor all of this so we get just one reference to libpq and one reference to libpgport, but that'd require a more thoroughgoing cleanup than you have here. 
(Now that I think about it, adding -L/-l to LDFLAGS is pretty duff coding style to begin with --- we should be adding those things to LIBS, or at least putting them just before LIBS in the command lines.) You're right that I missed the desirability of invoking submake-libpq and submake-libpgfeutils in initdb's build. Will fix that. regards, tom lane