from:"Beena Emerson"

Re: [HACKERS] Runtime Partition Pruning

2017-11-09 Thread Beena Emerson

Hello all,

Here is the updated patch which is rebased over v10 of Amit Langote's
path towards faster pruning patch [1]. It modifies the PartScanKeyInfo
struct to hold expressions which is then evaluated by the executor to
fetch the correct partitions using the function.

The code still chooses the custom plan instead of the generic plan for
the prepared statements. I am working on it. The following output is
after adding a hack in the code forcing selection of generic plan.

postgres=# EXPLAIN EXECUTE prstmt_select(7);
QUERY PLAN
--
 Append  (cost=0.00..1732.25 rows=2 width=8)
   ->  Seq Scan on tprt_1  (cost=0.00..847.00 rows=16667 width=8)
 Filter: ($1 > col1)
   ->  Seq Scan on tprt_2  (cost=0.00..847.00 rows=16667 width=8)
 Filter: ($1 > col1)
(5 rows)

postgres=# EXPLAIN EXECUTE prstmt_select(20);
QUERY PLAN
--
 Append  (cost=0.00..1732.25 rows=3 width=8)
   ->  Seq Scan on tprt_1  (cost=0.00..847.00 rows=16667 width=8)
 Filter: ($1 > col1)
   ->  Seq Scan on tprt_2  (cost=0.00..847.00 rows=16667 width=8)
 Filter: ($1 > col1)
   ->  Seq Scan on tprt_3  (cost=0.00..38.25 rows=753 width=8)
 Filter: ($1 > col1)
(7 rows)


[1] 
https://www.postgresql.org/message-id/b8094e71-2c73-ed8e-d8c3-53f232c8c049%40lab.ntt.co.jp

Tested on commit: 9b9cb3c4534d717c1c95758670198ebbf8a20af2

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0001-Implement-runtime-partiton-pruning.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] path toward faster partition pruning

2017-10-25 Thread Beena Emerson

Hello,

On Wed, Oct 25, 2017 at 1:07 PM, Amit Langote
<langote_amit...@lab.ntt.co.jp> wrote:
> On 2017/10/25 15:47, Amit Langote wrote:
>> On 2017/10/24 1:38, Beena Emerson wrote:
>>> I had noticed this and also that this crash:
>>>
>>> tprt PARTITION BY RANGE(Col1)
>>>tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
>>>   tprt_11 FOR VALUES FROM (1) TO (1),
>>>   tprt_1d DEFAULT
>>>tprt_2 FOR VALUES FROM (50001) TO (11)
>>>
>>> EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 2 AND 7;
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> !>
>>
>> ...and this (crash) were due to bugs in the 0005 patch.
>
> [  ]
>
>> Should be fixed in the attached updated version.
>
> Oops, not quite.  The crash that Beena reported wasn't fixed (or rather
> reintroduced by some unrelated change after once confirming it was fixed).
>
> Really fixed this time.

Some minor comments:

1. wrong function name (0003)

The comment on function get_partitions_from_clauses_guts uses wrong name:
instead of "_from_", "_using_" is written.

 /*
+ * get_partitions_using_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *

2. typo information (0003)

+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * Nullness constraints and return that informatin in the output argument

3. misspell admissible (0003)
+* columns, whereas a prefix of all partition key columns is addmissible
+* as min and max keys.

4. double and? (0002)
+* as part of the operator family, check if its negator
+* exists and and that the latter is compatible with

5. typo inequality (0002)
+* (key < val OR key > val), if the partitioning method
+* supports such notion of inequlity.


6. typo output (0005)
+* return it as the only scannable partition, that means the query
+* doesn't want null values in its outout.

7. typo provide (0005)
+   /* Valid keys->eqkeys must provoide all partition keys. */
+   Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);

8. comment of struct PartClause (0003)
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid repeated recomputation in remove_redundant_clauses().
+ */

Instead of repeated recomputation, we can use just  " repeated
computation" or just " recomputation"



-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] path toward faster partition pruning

2017-10-25 Thread Beena Emerson

Hello Amit,

Thanks for the updated patches

On Wed, Oct 25, 2017 at 1:07 PM, Amit Langote
<langote_amit...@lab.ntt.co.jp> wrote:
> On 2017/10/25 15:47, Amit Langote wrote:
>> On 2017/10/24 1:38, Beena Emerson wrote:
>>> I had noticed this and also that this crash:
>>>
>>> tprt PARTITION BY RANGE(Col1)
>>>tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
>>>   tprt_11 FOR VALUES FROM (1) TO (1),
>>>   tprt_1d DEFAULT
>>>tprt_2 FOR VALUES FROM (50001) TO (11)
>>>
>>> EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 2 AND 7;
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> !>
>>
>> ...and this (crash) were due to bugs in the 0005 patch.
>
> [  ]
>
>> Should be fixed in the attached updated version.
>
> Oops, not quite.  The crash that Beena reported wasn't fixed (or rather
> reintroduced by some unrelated change after once confirming it was fixed).
>
> Really fixed this time.
>

The crashes are fixed. However, handling of DEFAULT partition in
various queries is not proper.

Case 1: In this case default should be selected.
 DROP TABLE tprt;
  CREATE TABLE tprt (col1 int, col2 int) PARTITION BY range(col1);
  CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (50001)
PARTITION BY list(col1);
  CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES IN (2, 25000);
  CREATE TABLE tprt_12 PARTITION OF tprt_1 FOR VALUES IN (5, 35000);
  CREATE TABLE tprt_13 PARTITION OF tprt_1 FOR VALUES IN (1);
  CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;


postgres=#  EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 < 1;
QUERY PLAN
--
 Result
   One-Time Filter: false
(2 rows)


Case 2: In this case DEFAULT need not be selected.

DROP TABLE  tprt;
  CREATE TABLE tprt (col1 int, col2 int) PARTITION BY range(col1);
  CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (50001)
PARTITION BY range(col1);
  CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES FROM (1) TO (1);
  CREATE TABLE tprt_12 PARTITION OF tprt_1 FOR VALUES FROM (1) TO (2);
  CREATE TABLE tprt_13 PARTITION OF tprt_1 FOR VALUES FROM (2) TO (3);
  CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;
  INSERT INTO tprt SELECT generate_series(1,5), generate_series(1,5);

postgres=#  EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 < 1;
       QUERY PLAN

 Append
   ->  Seq Scan on tprt_11
 Filter: (col1 < 1)
   ->  Seq Scan on tprt_1d
 Filter: (col1 < 1)
(5 rows)


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] path toward faster partition pruning

2017-10-23 Thread Beena Emerson

On Mon, Oct 23, 2017 at 3:24 PM, Rajkumar Raghuwanshi
<rajkumar.raghuwan...@enterprisedb.com> wrote:
>
> On Mon, Oct 23, 2017 at 1:12 PM, Amit Langote
> <langote_amit...@lab.ntt.co.jp> wrote:
>>
>> The compiler I have here (gcc (GCC) 6.2.0) didn't complain like that for
>> this typedef redefinition introduced by the 0002 patch, but it seems that
>> it's not needed anyway, so got rid of that line in the attached updated
>> patch.
>>
>> Fixed one more useless diff in 0002, but no changes in any other patch
>
>
> Thanks for updated patches, I am able to compile it on head.
>
> While testing this, I got an observation, pruning is not scanning default
> partition leading to wrong output. below is test to reproduce this.
>
> create table rp (a int, b varchar) partition by range (a);
> create table rp_p1 partition of rp default;
> create table rp_p2 partition of rp for values from (1) to (10);
> create table rp_p3 partition of rp for values from (10) to (maxvalue);
>
> insert into rp values (-1,'p1');
> insert into rp values (1,'p2');
> insert into rp values (11,'p3');
>
> postgres=# select tableoid::regclass,* from rp;
>  tableoid | a  | b
> --++
>  rp_p2|  1 | p2
>  rp_p3| 11 | p3
>  rp_p1| -1 | p1
> (3 rows)
>
> --with pruning
> postgres=# explain (costs off) select * from rp where a <= 1;
> QUERY PLAN
> --
>  Append
>->  Seq Scan on rp_p2
>  Filter: (a <= 1)
> (3 rows)
>
> postgres=# select * from rp where a <= 1;
>  a | b
> ---+
>  1 | p2
> (1 row)
>

I had noticed this and also that this crash:

tprt PARTITION BY RANGE(Col1)
   tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
  tprt_11 FOR VALUES FROM (1) TO (1),
  tprt_1d DEFAULT
   tprt_2 FOR VALUES FROM (50001) TO (11)

EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 2 AND 7;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>



-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Runtime Partition Pruning

2017-09-25 Thread Beena Emerson

I have been working on implementing the runtime partition pruning
which would increase the performance of queries involving partitioned
table to a great extent.

PFA the POC which can be applied over Amit's patch for faster
partition pruning [1] and Dilip's refactor patch [2] on commit
2c74e6c1dcc5002fa8b822e5757f6c95d899fb7a.

[1] 
https://www.postgresql.org/message-id/e02923ea-a117-a6ad-6a3e-ea5e1ba41ece%40lab.ntt.co.jp

[2] 
https://www.postgresql.org/message-id/CAFiTN-tGnQzF_4QtbOHT-3hE%3DOvNaMfbbeRxa4UY0CQyF0G8gQ%40mail.gmail.com

There were a couple of things that need improvement/opinion:
In get_rel_partition_info, we store minop and maxop for each partition
key. For the equality case, which is most common, both would store the
same value. We could make it better by storing equal (bound, bound,
) instead repeating the same values.

get_partitions_for_keys currently returns the list of partitions valid
for the given keys but for a table with many partitions this list
would be very long so maybe for range qual ( key > a & key < b ) we
could only store the min and max partition number and increment
as_whichplan by 1 till we reach max partition number. For
non-continuous partitions, we would still need the list.

Currently, the partitions numbers are recalculated whenever the
ChgParam is set, This can be optimised by skipping this step when only
a non-partition key column has changed; reusing the existing
partitions selected.

Others:
- better handling of multiple key
- allow use of expression in the quals.
- To use min_incl, max_incl properly in get_partitions_for_keys.
- pruning during function calls.


Currently with patch, during NestLoop:
Nested Loop
-> SeqScan tbl1
-> Append
  -> Index Scan p01
  -> Index Scan p02
  -> Index Scan p03

For each tuple from tbl1, only the relevant partition (p01or p02 or
p03) will be scanned.


--- Prepared Statement Behaviour with patch---

Table Descritpion:
   Table "public.tprt"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats
target | Description
+-+---+--+-+-+--+-
 col1   | integer |   |  | | plain   |  |
 col2   | integer |   |  | | plain   |  |
Partition key: RANGE (col1)
Partitions: tprt_1 FOR VALUES FROM (1) TO (50001),
tprt_2 FOR VALUES FROM (50001) TO (11),
tprt_3 FOR VALUES FROM (11) TO (21)

EXPLAIN EXECUTE prstmt_select(15);

QUERY PLAN
--
 Append  (cost=0.00..1736.55 rows=1 width=8)
   ->  Seq Scan on tprt_1  (cost=0.00..849.15 rows=16724 width=8)
 Filter: (col1 < $1)
(3 rows)

EXPLAIN EXECUTE prstmt_select(6);
QUERY PLAN
--
 Append  (cost=0.00..1736.55 rows=2 width=8)
   ->  Seq Scan on tprt_1  (cost=0.00..849.15 rows=16724 width=8)
 Filter: (col1 < $1)
   ->  Seq Scan on tprt_2  (cost=0.00..849.15 rows=16724 width=8)
     Filter: (col1 < $1)
(5 rows)


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0001-POC-Implement-runtime-partiton-pruning.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-09-05 Thread Beena Emerson

Hello all,

This is to inform all that this patch has been merged with default
list partition patch [1]. There will be no further updates here. The
status of this will be updated on the commitfest according to progres
on that thread.


[1] 
https://www.postgresql.org/message-id/CAOgcT0ONgwajdtkoq%2BAuYkdTPY9cLWWLjxt_k4SXue3eieAr%2Bg%40mail.gmail.com

Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-08-24 Thread Beena Emerson

Hello,

On Thu, Aug 24, 2017 at 3:08 PM, Jeevan Ladhe
<jeevan.la...@enterprisedb.com> wrote:
> Hi Beena,
>
>
> On Tue, Aug 22, 2017 at 5:22 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
>>
>> On Tue, Aug 22, 2017 at 4:20 PM, Beena Emerson <memissemer...@gmail.com>
>> wrote:
>> > Hi Jeevan,
>> >
>> > On Tue, Aug 22, 2017 at 7:53 AM, Jeevan Ladhe
>> > <jeevan.la...@enterprisedb.com> wrote:
>> >>
>> >>
>> >> 4.
>> >>  static List *
>> >> -get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
>> >> +get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
>> >> +  bool for_default)
>> >>  {
>> >>
>> >> The addition and the way flag ‘for_default’ is being used is very
>> >> confusing.
>> >> At this moment I am not able to think about a alternate solution to the
>> >> way you have used this flag. But will try and see if I can come up with
>> >> an alternate approach.
>> >
>> > Well, we need to skip the get_range_nulltest while collecting the qual
>> > of other partition to make one for default. We could skip this flag
>> > and remove the NullTest from the qual returned for each partition but
>> > I am not sure if thats a good way to go about it.
>> >
>> >
>>
>> Please find attached a patch which removes the for_default flag from
>> the get_qual_for_range function. This patch is just to show an idea
>> and should be applied over my earlier patch. There could be a better
>> way to do this. Let me know your opinion.
>>
>
> +
> + foreach (lc, list_tmp)
> + {
> + Node *n = (Node *) lfirst(lc);
> +
> + if (IsA(n, NullTest))
> + {
> + list_delete(part_qual, n);
> + count++;
> + }
> + }
> +
>
> I think its an unnecessary overhead to have the nulltest prepared first
> and then search for it and remove it from the partition qual. This is very
> ugly.

Yes, I felt the same but just put it out there to see if someone can
improve on this.

> I think the original idea is still better compared to this. May be we can
> rename
> the 'for_default' flag to something like 'part_of_default_qual'.
>

Ya. I think that would work.



-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-08-22 Thread Beena Emerson

On Tue, Aug 22, 2017 at 4:20 PM, Beena Emerson <memissemer...@gmail.com> wrote:
> Hi Jeevan,
>
> On Tue, Aug 22, 2017 at 7:53 AM, Jeevan Ladhe
> <jeevan.la...@enterprisedb.com> wrote:
>>
>>
>> 4.
>>  static List *
>> -get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
>> +get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
>> +  bool for_default)
>>  {
>>
>> The addition and the way flag ‘for_default’ is being used is very confusing.
>> At this moment I am not able to think about a alternate solution to the
>> way you have used this flag. But will try and see if I can come up with
>> an alternate approach.
>
> Well, we need to skip the get_range_nulltest while collecting the qual
> of other partition to make one for default. We could skip this flag
> and remove the NullTest from the qual returned for each partition but
> I am not sure if thats a good way to go about it.
>
>

Please find attached a patch which removes the for_default flag from
the get_qual_for_range function. This patch is just to show an idea
and should be applied over my earlier patch. There could be a better
way to do this. Let me know your opinion.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


remove_default_flag.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-08-22 Thread Beena Emerson

Hi Jeevan,

On Tue, Aug 22, 2017 at 7:53 AM, Jeevan Ladhe
<jeevan.la...@enterprisedb.com> wrote:
>
> Hi Beena,
>
> On Thu, Aug 17, 2017 at 4:59 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
>>
>> PFA the patch rebased over v25 patches of default list partition [1]
>
>
> Thanks for rebasing.
>
> Range partition review:

Thank you for your review.

>
> 3.
> @@ -2396,6 +2456,8 @@ make_one_range_bound(PartitionKey key, int index, List
> *datums, bool lower)
> ListCell   *lc;
> int i;
>
> +   Assert(datums != NULL);
> +
> bound = (PartitionRangeBound *) palloc0(sizeof(PartitionRangeBound));
> bound->index = index;
> bound->datums = (Datum *) palloc0(key->partnatts * sizeof(Datum));
>
> I am not really convinced, why should we have above Assert.

The function should never be called for default partition where datums = NULL.

>
> 4.
>  static List *
> -get_qual_for_range(PartitionKey key, PartitionBoundSpec *spec)
> +get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
> +  bool for_default)
>  {
>
> The addition and the way flag ‘for_default’ is being used is very confusing.
> At this moment I am not able to think about a alternate solution to the
> way you have used this flag. But will try and see if I can come up with
> an alternate approach.

Well, we need to skip the get_range_nulltest while collecting the qual
of other partition to make one for default. We could skip this flag
and remove the NullTest from the qual returned for each partition but
I am not sure if thats a good way to go about it.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-08-17 Thread Beena Emerson

PFA the patch rebased over v25 patches of default list partition [1]

[1] 
https://www.postgresql.org/message-id/CAOgcT0NwqnavYtu-QM-DAZ6N%3DwTiqKgy83WwtO2x94LSLZ1-Sw%40mail.gmail.com




-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v11_rebase.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-08-17 Thread Beena Emerson

Hello,

PFA the updated patch which returns NULL instead of true when the
default partition has no constraints and also have modified the output
as discussed above.

This applies over v24 patch [1] of default list partition. I will
rebase over the next version when it is updated.

[1] 
https://www.postgresql.org/message-id/CAOgcT0OVwDu%2BbeChWb5R5s6rfKLCiWcZT5617hqu7T3GdA1hAw%40mail.gmail.com
-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v11.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] expanding inheritance in partition bound order

2017-08-10 Thread Beena Emerson

Hi Amit,

On Thu, Aug 10, 2017 at 7:41 AM, Amit Langote
<langote_amit...@lab.ntt.co.jp> wrote:
> On 2017/08/05 2:25, Robert Haas wrote:
>> Concretely, my proposal is:
>>
>> P.S. While I haven't reviewed 0002 in detail, I think the concept of
>> minimizing what needs to be built in RelationGetPartitionDispatchInfo
>> is a very good idea.
>
> I put this patch ahead in the list and so it's now 0001.
>

FYI, 0001 patch throws the warning:

execMain.c: In function ‘ExecSetupPartitionTupleRouting’:
execMain.c:3342:16: warning: ‘next_ptinfo’ may be used uninitialized
in this function [-Wmaybe-uninitialized]
 next_ptinfo->parentid != ptinfo->parentid)

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-08-09 Thread Beena Emerson

Hello Rajkumar,

On Wed, Aug 9, 2017 at 12:37 PM, Rajkumar Raghuwanshi
<rajkumar.raghuwan...@enterprisedb.com> wrote:
>
> Hi Beena,
>
> I have applied Jeevan's v24 patches and then your v9 patch over commit
> 5ff3d73813ebcc3ff80be77c30b458d728951036.
> and while testing I got a server crash. below is sql to reproduce it.
>
> postgres=# CREATE TABLE rp (a int, b int) PARTITION by range (a);
> CREATE TABLE
> postgres=# CREATE TABLE rp_p1 PARTITION OF rp DEFAULT partition by range(a);
> CREATE TABLE
> postgres=# CREATE TABLE rp_p11 PARTITION OF rp_p1 FOR VALUES FROM (1) TO
> (15);
> CREATE TABLE
> postgres=# CREATE TABLE rp_p12 PARTITION OF rp_p1 DEFAULT;
> CREATE TABLE
> postgres=# insert into rp select i,i from generate_series(1,15) i;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>

Thank you for testing. It seems I made a mistake in the assert
condition. I have corrected it in this patch.



Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v10.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-08-08 Thread Beena Emerson

On Fri, Aug 4, 2017 at 7:48 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Jul 31, 2017 at 8:28 AM, Beena Emerson <memissemer...@gmail.com> 
> wrote:
>
> Why do we need to introduce PARTITION_RANGE_DATUM_DEFAULT at all?  It
> seems to me that the handling of default range partitions ought to be
> similar to the way a null-accepting list partition is handled -
> namely, it wouldn't show up in the "datums" or "kind" array at all,
> instead just showing up in PartitionBoundInfoData's default_index
> field.
>

I have updated the patch to make it similar to the way default/null is
handled in list partition, removing the PARTITION_RANGE_DATUM_DEFAULT.
This is to be applied over v24 patches shared by Jeevan [1] which
applies on commit id 5ff3d73813ebcc3ff80be77c30b458d728951036.

The RelationBuildPartitionDesc has been modified a lot, especially the
way all_bounds, ndatums and rbounds are set.

[1] 
https://www.postgresql.org/message-id/CAOgcT0OVwDu%2BbeChWb5R5s6rfKLCiWcZT5617hqu7T3GdA1hAw%40mail.gmail.com

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-07-31 Thread Beena Emerson

On Wed, Jul 26, 2017 at 7:05 PM, Jeevan Ladhe
<jeevan.la...@enterprisedb.com> wrote:
> Hi Beena,
>
> I have posted the rebased patches[1] for default list partition.
> Your patch also needs a rebase.
>
> [1]
> https://www.postgresql.org/message-id/CAOgcT0OVwDu%2BbeChWb5R5s6rfKLCiWcZT5617hqu7T3GdA1hAw%40mail.gmail.com
>

Thanks for informing.
PFA the updated patch.
I have changed the numbering of enum PartitionRangeDatumKind since I
have to include DEFAULT as well. If you have better ideas, let me
know.



-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Minor comment update in partition.c

2017-07-31 Thread Beena Emerson

The commit d363d42bb9a4399a0207bd3b371c966e22e06bd3 changed
RangeDatumContent *content to PartitionRangeDatumKind *kind but a
comment on function partition_rbound_cmp was left unedited and it
still mentions content1 instead of kind1.

PFA the patch to fix this.

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


fix_comment.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

2017-07-14 Thread Beena Emerson

Hello,

On Thu, Jul 13, 2017 at 1:22 AM, Jeevan Ladhe
<jeevan.la...@enterprisedb.com> wrote:
>
>> - Should probably be merged with the patch to add default partitioning
>> for ranges.
>
>
> Beena is already rebasing her patch on my latest patches, so I think getting
> them merged here won't be an issue, mostly will be just like one more patch
> on top my patches.
>

I have posted the updated patch which can be applied over the v22
patches submitted here.
https://www.postgresql.org/message-id/CAOG9ApGEZxSQD-ZD3icj_CwTmprSGG7sZ_r3k9m4rmcc6ozr%3Dg%40mail.gmail.com

Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-07-14 Thread Beena Emerson

On Tue, Jul 4, 2017 at 4:21 PM, Rahila Syed <rahilasye...@gmail.com> wrote:
>
> Hello Beena,
>
> Thanks for the updated patch. It passes the basic tests which I performed.
>
> Few comments:
> 1. In   check_new_partition_bound(),
>
>> /* Default case is not handled here */
>>if (spec->is_default)
>>break;
> The default partition check here can be performed in common for both range
> and list partition.

Removed the check here.

>
> 2. RANGE_DATUM_FINITE = 0, /* actual datum stored elsewhere */
> +   RANGE_DATUM_DEFAULT,/* default */
>
> More elaborative comment?

I am not sure what to add. Do you have something in mind?

>
> 3.  Inside make_one_range_bound(),
>
>>+
>>+   /* datums are NULL for default */
>>+   if (datums == NULL)
>>+   for (i = 0; i < key->partnatts; i++)
>>+   bound->content[i] = RANGE_DATUM_DEFAULT;
>>+
> IMO, returning bound at this stage will make code clearer as further
> processing
> does not happen for default partition.

Done.

>
> 4.
> @@ -549,6 +568,7 @@ RelationBuildPartitionDesc(Relation rel)
> sizeof(RangeDatumContent
> *));
> boundinfo->indexes = (int *) palloc((ndatums + 1) *
> sizeof(int));
> +   boundinfo->default_index = -1;
> This should also be done commonly for both default list and range partition.

Removed the line here. Common allocation is done by Jeevan's patch.

>
> 5.
>   if (!spec->is_default && partdesc->nparts > 0)
> {
> ListCell   *cell;
>
> Assert(boundinfo &&
>boundinfo->strategy == PARTITION_STRATEGY_LIST &&
>(boundinfo->ndatums > 0 ||
> partition_bound_accepts_nulls(boundinfo) ||
> partition_bound_has_default(boundinfo)));
> The assertion condition partition_bound_has_default(boundinfo) is rendered
> useless
> because of if condition change and can be removed perhaps.

This cannot be removed.
This check is required when this code is run for a non-default
partition and default is the only existing partition.

>
> 6. I think its better to have following  regression tests coverage
>
> --Testing with inserting one NULL and other non NULL values for multicolumn
> range partitioned table.
>
> postgres=# CREATE TABLE range_parted (
> postgres(# a int,
> postgres(# b int
> postgres(# ) PARTITION BY RANGE (a, b);
> CREATE TABLE
> postgres=#
> postgres=# CREATE TABLE part1 (
> postgres(# a int NOT NULL CHECK (a = 1),
> postgres(# b int NOT NULL CHECK (b >= 1 AND b <= 10)
> postgres(# );
> CREATE TABLE
>
> postgres=# ALTER TABLE range_parted ATTACH PARTITION part1 FOR VALUES FROM
> (1, 1) TO (1, 10);
> ALTER TABLE
> postgres=# CREATE TABLE partr_def1 PARTITION OF range_parted DEFAULT;
> CREATE TABLE
>
> postgres=# insert into range_parted values (1,NULL);
> INSERT 0 1
> postgres=# insert into range_parted values (NULL,9);
> INSERT 0 1

added.





-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-07-14 Thread Beena Emerson

Thank you for your review Dilip and Rahila.

PFA the updated patch which is rebased to Jeevan's latest list
partition patch [1] and also handles your comments.

https://www.postgresql.org/message-id/CAOgcT0OARciE2X%2BU0rjSKp9VuC279dYcCGkc3nCWKhHQ1_m2rw%40mail.gmail.com

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-07-14 Thread Beena Emerson

On Mon, Jul 3, 2017 at 8:00 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Fri, Jun 30, 2017 at 11:56 AM, Beena Emerson <memissemer...@gmail.com> 
> wrote:
>> Hello Dilip,
>>
>> On Wed, Jun 21, 2017 at 6:27 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>>> On Tue, Jun 20, 2017 at 6:57 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>>>> This is basically crashing in RelationBuildPartitionDesc, so I think
>
>>
>> Thank you for your review and analysis.
>>
>> I have updated the patch.
>> - bound->content is set to RANGE_DATUM_DEFAULT for each of the keys
>> and not just the first one.
>> - Improve the way of handling DEFAULT bounds in RelationBuildPartitionDesc,
>>
>> There is a test for multiple column range in alter_table.sql
>
> Thanks for update patch,  After this fix segmentation fault is solved.
>
> I have some more comments.
>
> 1.
>
> lower = make_one_range_bound(key, -1, spec->lowerdatums, true);
>   upper = make_one_range_bound(key, -1, spec->upperdatums, false);
>
> + /* Default case is not handled here */
> + if (spec->is_default)
> + break;
> +
>
> I think we can move default check just above the make range bound it
> will avoid unnecessary processing.
>

Removed the check here. Default is checked beforehand.

>
> 2.
> RelationBuildPartitionDesc
> {
> 
>
>/*
> * If either of them has infinite element, we can't equate
> * them.  Even when both are infinite, they'd have
> * opposite signs, because only one of cur and prev is a
> * lower bound).
> */
> if (cur->content[j] != RANGE_DATUM_FINITE ||
>   prev->content[j] != RANGE_DATUM_FINITE)
> {
> is_distinct = true;
> break;
> }
> After default range partitioning patch the above comment doesn't sound
> very accurate and
> can lead to the confusion, now here we are not only considering
> infinite bounds but
> there is also a possibility that prev bound can be DEFAULT, so
> comments should clearly
> mention that.

Modified.

>
> 3.
>
> + bound->datums = datums ? (Datum *) palloc0(key->partnatts * sizeof(Datum))
> + : NULL;
>   bound->content = (RangeDatumContent *) palloc0(key->partnatts *
> sizeof(RangeDatumContent));
>   bound->lower = lower;
>
>   i = 0;
> +
> + /* datums are NULL for default */
> + if (datums == NULL)
> + for (i = 0; i < key->partnatts; i++)
> + bound->content[i] = RANGE_DATUM_DEFAULT;
>
> To me, it will look cleaner if we keep bound->content=NULL for
> default, instead of allocating memory
> and initializing each element,  But it's just a suggestions and you
> can decide whichever way
> seems better to you.  Then the other places e.g.
> + if (cur->content[i] == RANGE_DATUM_DEFAULT)
> + {
> + /*
>
> will change like
>
> + if (cur->content == NULL)
> + {
> + /*

I disagree. We use the content value RANGE_DATUM_DEFAULT during
comparing in partition_rbound_cmp and the code will not be very
intuiative if we use NULL instead of DEFAULT.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-07-06 Thread Beena Emerson

On Thu, Jul 6, 2017 at 3:21 PM, tushar <tushar.ah...@enterprisedb.com> wrote:
> On 07/06/2017 12:04 PM, Beena Emerson wrote:
>>
>> The 04-initdb-walsegsize_v2.patch has the following improvements:
>> - Rebased over new 03 patch
>> - Pass the wal-segsize intidb option as command-line option rathern
>> than in an environment variable.
>> - Since new function check_wal_size had only had two checks and was
>> sed once, moved the code to ReadControlFile where it is used and
>> removed this function.
>> - improve comments and add validations where required.
>> - Use DEFAULT_XLOG_SEG_SIZE to set the min_wal_size and
>> max_wal_size,instead of the value 16.
>> - Use XLogSegMaxSize and XLogSegMinSize to calculate the range of guc
>> wal_segment_size instead 16 - INT_MAX.
>
> Thanks Beena. I tested with below following scenarios  and all are working
> as expected
> .)Different WAL segment size i.e 1,2,8,16,32,64,512,1024 at the time of
> initdb
> .)SR setup in place.
> .)Combinations of  max/min_wal_size in postgresql.conf with different
> wal_segment_size.
> .)shutdown the server forcefully (kill -9) / promote slave / to make sure
> -recovery happened successfully.
> .)with different utilities like pg_resetwal/pg_upgrade/pg_waldump
> .)running pg_bench for substantial workloads (~ 1 hour)
> .)WAL segment size is not default (changed at the time of ./configure) +
> different wal_segment_size (at the time of initdb) .
>
> Things looks fine.
>

Thank you Tushar.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-07-06 Thread Beena Emerson

Hello,



On Tue, Mar 28, 2017 at 1:06 AM, Beena Emerson <memissemer...@gmail.com> wrote:
> Hello,
>
> On Sat, Mar 25, 2017 at 10:32 PM, Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com> wrote:
>>
>> At this point, I suggest splitting this patch up into several
>> potentially less controversial pieces.
>>
>> One big piece is that we currently don't support segment sizes larger
>> than 64 GB, for various internal arithmetic reasons.  Your patch appears
>> to address that.  So I suggest isolating that.  Assuming it works
>> correctly, I think there would be no great concern about it.
>>
>> The next piece would be making the various tools aware of varying
>> segment sizes without having to rely on a built-in value.
>>
>> The third piece would then be the rest that allows you to set the size
>> at initdb
>>
>> If we take these in order, we would make it easier to test various sizes
>> and see if there are any more unforeseen issues when changing sizes.  It
>> would also make it easier to do performance testing so we can address
>> the original question of what the default size should be.
>
>
> PFA the patches divided into 3 parts:
>
> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
> the internal representation of max_wal_size and min_wal_size to mb.

Already committed.

>
> 03-modify-tools.patch - Makes XLogSegSize into a variable, currently set as
> XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
> inbuilt value.
>

The updated 03-modify-tools_v2.patch has the following changes:
 - Rebased over current HEAD
 - Impoverished comments
 - Adding error messages where applicable.
 - Replace XLOG_SEG_SIZE in the tools and xlog_internal.h to
XLogSegSize. XLOG_SEG_SIZE is the wal_segment_size the server was
compiled with and XLogSegSize is the wal_segment_size of the target
server on which the tool is run. When the binaries used and the target
server are compiled with different wal_segment_size, the calculations
would be be affected and the tool would crash. To avoid it, all the
calculations used by tool should use XLogSegSize.
 - pg_waldump : The  fuzzy_open_file is split into two functions -
open_file_in_directory and identify_target_directory so that code can
be reused when determining the XLogSegSize from the WAL file header.
 - IsValidXLogSegSize macro is moved from 04 to here so that we can
use it for validating the size in all the tools.


> 04-initdb-walsegsize.patch - Adds the initdb option to set wal-segsize and
> make related changes. Update pg_test_fsync to use DEFAULT_XLOG_SEG_SIZE
> instead of XLOG_SEG_SIZE

The 04-initdb-walsegsize_v2.patch has the following improvements:
- Rebased over new 03 patch
- Pass the wal-segsize intidb option as command-line option rathern
than in an environment variable.
- Since new function check_wal_size had only had two checks and was
sed once, moved the code to ReadControlFile where it is used and
removed this function.
- improve comments and add validations where required.
- Use DEFAULT_XLOG_SEG_SIZE to set the min_wal_size and
max_wal_size,instead of the value 16.
- Use XLogSegMaxSize and XLogSegMinSize to calculate the range of guc
wal_segment_size instead 16 - INT_MAX.


>
>>
>> One concern I have is that your patch does not contain any tests.  There
>> should probably be lots of tests.
>
>
> 05-initdb_tests.patch adds tap tests to initialize cluster with different
> wal_segment_size and then check the config values. What other tests do you
> have in mind? Checking the various tools?
>
>




-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


04-initdb-walsegsize_v2.patch
Description: Binary data


03-modify-tools_v2.patch
Description: Binary data


01-add-XLogSegmentOffset-macro_rebase.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-06-30 Thread Beena Emerson

Hello Dilip,

On Wed, Jun 21, 2017 at 6:27 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Tue, Jun 20, 2017 at 6:57 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>> This is basically crashing in RelationBuildPartitionDesc, so I think
>> we don't have any test case for testing DEFAULT range partition where
>> partition key has more than one attribute.  So I suggest we can add
>> such test case.
>
> Some more comments.
>
> 
> - bound->datums = (Datum *) palloc0(key->partnatts * sizeof(Datum));
> - bound->content = (RangeDatumContent *) palloc0(key->partnatts *
> -   sizeof(RangeDatumContent));
> + bound->datums = datums ? (Datum *) palloc0(key->partnatts * sizeof(Datum))
> + : NULL;
> + bound->content = (RangeDatumContent *) palloc0(
> + (datums ? key->partnatts : 1) * sizeof(RangeDatumContent));
>   bound->lower = lower;
>
>   i = 0;
> +
> + /* If default, datums are NULL */
> + if (datums == NULL)
> + bound->content[i] = RANGE_DATUM_DEFAULT;
> 
>
> For the default partition we are only setting bound->content[0] to
> default, but content for others key
> attributes are not initialized.  But later in the code, if the content
> of the first key is RANGE_DATUM_DEFAULT then it should not access the
> next content,  but I see there are some exceptions.  Which can access
> uninitialized value?
>
> for example see below code
>
> 
> for (i = 0; i < key->partnatts; i++)
>   {
> + if (rb_content[i] == RANGE_DATUM_DEFAULT)   --> why it's going to
> content for next parttion attribute, we never initialized that?
> + continue;
> +
>   if (rb_content[i] != RANGE_DATUM_FINITE)
>   return rb_content[i] == RANGE_DATUM_NEG_INF ? -1 : 1;
> 
>
> Also
> In RelatiobBuildPartitionDesc
>
> 
> for (j = 0; j < key->partnatts; j++)
> {
> -- Suppose first is RANGE_DATUM_DEFAULT then we should not check next
>because that is never initialized.  I think this is the cause of
> the crash also what I have reported above.
> 
> if (rbounds[i]->content[j] == RANGE_DATUM_FINITE)
> boundinfo->datums[i][j] =
> datumCopy(rbounds[i]->datums[j],
>  key->parttypbyval[j],
>  key->parttyplen[j]);
> /* Remember, we are storing the tri-state value. */
> boundinfo->content[i][j] = rbounds[i]->content[j];
> 
>

Thank you for your review and analysis.

I have updated the patch.
- bound->content is set to RANGE_DATUM_DEFAULT for each of the keys
and not just the first one.
- Improve the way of handling DEFAULT bounds in RelationBuildPartitionDesc,

There is a test for multiple column range in alter_table.sql

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-06-28 Thread Beena Emerson

Hello Rahila,

On Wed, Jun 28, 2017 at 12:34 PM, Rahila Syed <rahilasye...@gmail.com> wrote:
> Hi Beena,
>
> I started testing and reviewing the patch. Can you update the patch as v5
> patch does not apply cleanly on master?
>

Thanks for looking into this.

The patch is to be applied on Jeevn's partition patch
(http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg316818.html)
which can be applied over commit
a12c09ad86e60a8acb269744b8ee86429dda2cd8.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-06-28 Thread Beena Emerson

On Wed, Jun 28, 2017 at 12:34 PM, Rahila Syed <rahilasye...@gmail.com> wrote:
> Hi Beena,
>
> I started testing and reviewing the patch. Can you update the patch as v5
> patch does not apply cleanly on master?
>

I am currently working on Dilip's comments, I will update the patch soon.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-06-14 Thread Beena Emerson

Hello,

PFA the updated patch.
This is rebased over v21 patches of list partition.
(http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg316818.html)


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-06-06 Thread Beena Emerson

Hello Dilip,

On Mon, Jun 5, 2017 at 8:44 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Mon, Jun 5, 2017 at 1:43 AM, Beena Emerson <memissemer...@gmail.com> wrote:
>> The new patch is rebased over default_partition_v18.patch
>> [http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg315831.html]
>
> I have done the initial review of the patch, I have few comments.
>
Thank you.
>
> + if ((lower->content[0] == RANGE_DATUM_DEFAULT))
> + {
> + if (partition_bound_has_default(partdesc->boundinfo))
> + {
> + overlap = true;
> + with = partdesc->boundinfo->default_index;
> + }
>
> I think we can change if ((lower->content[0] == RANGE_DATUM_DEFAULT))
> check to if (spec->is_default) that way it will be consistent with the
> check in the PARTITION_STRATEGY_LIST.  Or we can move this complete
> check outside of the switch.

I have now moved the is_default check for both list and range outside
the switch case.

>
> -
>
> - PartitionRangeDatum *datum = castNode(PartitionRangeDatum, lfirst(lc));
> + Node   *node = lfirst(lc);
> + PartitionRangeDatum *datum;
> +
> + datum = castNode(PartitionRangeDatum, node);
>
> Why do you need to change this?

I forgot to remove it.
It was needed for previous version of patch but no longer needed and
hence reverted this change.

>
> --
>
> + if (!datums)
> + bound->content[i] = RANGE_DATUM_DEFAULT;
> +
>
> Better to check if (datums != NULL) for non boolean types.

done.

>
> -
> + if (content1[i] == RANGE_DATUM_DEFAULT ||
> + content2[i] == RANGE_DATUM_DEFAULT)
> + {
> + if (content1[i] == content2[i])
> + return 0;
> + else if (content1[i] == RANGE_DATUM_DEFAULT)
> + return -1;
>
> I don't see any comments why default partition should be considered
> smallest in the index comparison. For negative infinity, it's pretty
> obvious by the enum name itself.

Default could be lowest or highest, no specific reason for putting it lowest.
I have not added any comments in this version.

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-06-04 Thread Beena Emerson

Hello,

On Sun, Jun 4, 2017 at 9:26 AM, Rafia Sabih
<rafia.sa...@enterprisedb.com> wrote:
> On Fri, Jun 2, 2017 at 5:48 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Fri, Jun 2, 2017 at 8:09 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> I think if you have found spelling mistakes unrelated to this patch,
>>> then it is better to submit those as a separate patch in a new thread.
>>
>> +1.
>
> Sure, attached is the version without those changes.
>

Thank you for your comments. I have applied most of the changes. The
regression comment change from 'fail' -> ' Following statement should
fail' and 'ok' -> 'Following should complete successfully' is ignored
since other tests in the file had similar comments

The new patch is rebased over default_partition_v18.patch
[http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg315831.html]


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-04 Thread Beena Emerson

On Mon, Jun 5, 2017 at 12:14 AM, Jeevan Ladhe
<jeevan.la...@enterprisedb.com> wrote:
>
>
>>
>> What is the reason the new patch does not mention of violating rows
>> when a new partition overlaps with default?
>> Is it because more than one row could be violating the condition?
>
>
> This is because, for reporting the violating error, I had to function
> ExecBuildSlotValueDescription() public. Per Amit's comment I have
> removed this change and let the overlapping error without row contains.
> I think this is analogus to other functions that are throwing violation
> error
> but are not local to execMain.c.
>

ok thanks.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-04 Thread Beena Emerson

Hello,

On Fri, Jun 2, 2017 at 1:05 AM, Jeevan Ladhe
<jeevan.la...@enterprisedb.com> wrote:
> Hi,
>
> I have addressed Ashutosh's and Amit's comments in the attached patch.
>
> Please let me know if I have missed anything and any further comments.
>
> PFA.
>
> Regards,
> Jeevan Ladhe
>

What is the reason the new patch does not mention of violating rows
when a new partition overlaps with default?
Is it because more than one row could be violating the condition?

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-05-31 Thread Beena Emerson

Hello,

PFA the updated patch.
Dependent patch default_partition_v17.patch [1]
This patch adds regression tests and removes the separate function to
get default partition qual.


On Mon, May 29, 2017 at 3:22 PM, Jeevan Ladhe
<jeevan.la...@enterprisedb.com> wrote:
> Hi Beena,
>
> I went through your patch, and here are some of my comments:
>
> - For generating a qual for default range partition, instead of scanning for
> all
> the children and collecting all the boundspecs, how about creating and
> negating
> an expression from the lists of lowerdatums and upperdatums in boundinfo.
>

Unlike list, range partition can be for multiple columns and the
expressions get complicated. I have used the same logic of looping
through different partitions and negating the ORed expr. However, this
is now done under get_qual_for_range.


> - Wrong comment:
> + int default_index; /* Index of the default list partition. */

Comment is made generic in the dependent patch.

>
> - Suggested by Robert earlier on default list partitioning thread, instead
> of
> abbreviating is_def/found_def use is(found)_default etc.

corrected.

>
> - unrelated change:
> - List   *all_parts;
> + List   *all_parts;
>

undone.

> - typo: partiton should be partition, similar typo at other places too.
> +  * A non-default range partiton table does not currently allow partition
> keys
>

rectified.

> - Useless hunk for this patch:
> - Oiddefid;
> + Oid defid;

undone.

>
> - better to use IsA here, instead of doing explicit check:
> + bound->content[i] = (datum->type == T_DefElem)
> + ? RANGE_DATUM_DEFAULT

Modified

>
> - It is better to use head of either lowerdatums or upperdatums list to
> verify
> if its a default partition and get rid of the constant PARTITION_DEFAULT
> altogether.
>

modified this part as necessary.


> - In the function get_qual_from_partbound() is_def is always going to be
> false
> for range partition, the function get_qual_for_range can be directly passed
> false instead.
>
> - Following comment for function get_qual_for_range_default() implies that
> this
> function returns bool, but the actually it returns a List.
> + *
> + * If DEFAULT is the only partiton for the table then this returns TRUE.
> + *
>
Updated.

[1] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg315573.html

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-30 Thread Beena Emerson

On Wed, May 31, 2017 at 8:13 AM, Amit Langote
<langote_amit...@lab.ntt.co.jp> wrote:
> On 2017/05/31 9:33, Amit Langote wrote:
>
>
> In get_rule_expr():
>
>  case PARTITION_STRATEGY_LIST:
>  Assert(spec->listdatums != NIL);
>
> +/*
> + * If the boundspec is of Default partition, it does
> + * not have list of datums, but has only one node to
> + * indicate its a default partition.
> + */
> +if (isDefaultPartitionBound(
> +(Node *) linitial(spec->listdatums)))
> +{
> +appendStringInfoString(buf, "DEFAULT");
> +break;
> +}
> +
>
> How about adding this part before the switch (key->strategy)?  That way,
> we won't have to come back and add this again when we add range default
> partitions.

I think it is best that we add a bool is_default to PartitionBoundSpec
and then have a general check for both list and range. Though
listdatums, upperdatums and lowerdatums are set to default for a
DEFAULt partition, it does not seem proper that we check listdatums
for range as well.




-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-05-30 Thread Beena Emerson

On Mon, May 29, 2017 at 3:22 PM, Jeevan Ladhe
<jeevan.la...@enterprisedb.com> wrote:
> Hi Beena,
>
> I went through your patch, and here are some of my comments:
>

Thank you for your comments. I  will take care of them in the next
version of patch.

> - I am sorry, but I could not understand following hunk. Does this change
> really
> belongs to this patch? If not, it will be better to handle it separately.
>
> @@ -2242,33 +2387,16 @@ get_partition_for_tuple(PartitionDispatch *pd,
>   ecxt->ecxt_scantuple = slot;
>   FormPartitionKeyDatum(parent, slot, estate, values, isnull);
>
> - if (key->strategy == PARTITION_STRATEGY_RANGE)
> + if (key->strategy == PARTITION_STRATEGY_LIST && isnull[0])
>   {
>   /*
> - * Since we cannot route tuples with NULL partition keys through
> - * a range-partitioned table, simply return that no partition
> - * exists
> + * A null partition key is only acceptable if null-accepting list
> + * partition exists.
>   */

In RANGE, initially NULL was not allowed, now  NULL is routed to
default. I have only removed the check for null in RANGE and kept the
check for null partition in case of list.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-29 Thread Beena Emerson

On Mon, May 29, 2017 at 9:33 PM, Jeevan Ladhe
<jeevan.la...@enterprisedb.com> wrote:
> Hi,
>
> I have rebased the patch on latest commit with few cosmetic changes.
>
> The patch fix_listdatums_get_qual_for_list_v3.patch [1]  needs to be applied
> before applying this patch.
>
> [1] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg315490.html
>


This needs a rebase again.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-29 Thread Beena Emerson

On Thu, May 25, 2017 at 3:03 PM, Jeevan Ladhe
<jeevan.la...@enterprisedb.com> wrote:
>
> Forgot to attach the patch.
> PFA.
>
> On Thu, May 25, 2017 at 3:02 PM, Jeevan Ladhe <jeevan.la...@enterprisedb.com> 
> wrote:
>>
>> Hi Rajkumar,
>>
>>> postgres=# CREATE TEMP TABLE temp_list_part (a int) PARTITION BY LIST (a);
>>> CREATE TABLE
>>> postgres=# CREATE TEMP TABLE temp_def_part (a int);
>>> CREATE TABLE
>>> postgres=# ALTER TABLE temp_list_part ATTACH PARTITION temp_def_part 
>>> DEFAULT;
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> !>
>>
>>
>> Thanks for reporting.
>> PFA patch that fixes above issue.
>>


The existing comment is not valid
/*
 * A null partition key is only acceptable if null-accepting list
 * partition exists.
 */
as we allow NULL to be stored in default. It should be updated.

DROP TABLE list1;
CREATE TABLE list1 (a int) PARTITION BY LIST (a);
CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE  list1 ATTACH PARTITION list1_1 FOR VALUES IN (2);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
INSERT INTO list1 VALUES (NULL);
SELECT * FROM list1_def;
 a
---

(1 row)


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-05-23 Thread Beena Emerson

Hello,

On Mon, May 22, 2017 at 11:29 AM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Mon, May 22, 2017 at 7:27 AM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> Would it be more readable if this reads as NOT
> (constraint_for_partition_1 || constraint_for_partition_2 ||
> constraint_for_partition_3)? That way, the code to create
> constraint_for_partition_n can be reused and there's high chance that
> we will end up with consistent constraints?
>

PFA the patch which gives the default partition constraint as you have
suggested.


>
> >
> > It still needs more work:
> > 1. Handling addition of new partition after default, insertion of data,
> few
> > more bugs
> > 2. Documentation
> > 3. Regression tests
> >
>
> I think, the default partition support for all the strategies
> supporting it should be a single patch since most of the code will be
> shared?
>
>
Dependency on list default patch:
gram.y : adding the syntax
partition.c:
- default_index member in PartitionBoundInfoData;
- check_new_partition_bound : the code for adding a partition after default
has been completely reused.
- isDefaultPartitionBound function is used.

The structures are same  but the handling of list and range is very
different and the code mostly has  the switch case construct to handle the
2 separately. I feel it could be taken separately.

As suggested in the default list thread, I have created
a partition_bound_has_default macro and avoided usage of has_default in
range code. This has to be used for list as well.
Another suggestion for list which has to be implemented in this patch in
removal of PARTITION_DEFAULT. Ii have not done this in this version.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-21 Thread Beena Emerson

Hello,

Patch for default range partition has been added. PFA the rebased v12 patch
for the same.
I have not removed the has_default variable yet.

Default range partition:
https://www.postgresql.org/message-id/CAOG9ApEYj34fWMcvBMBQ-YtqR9fTdXhdN82QEKG0SVZ6zeL1xg%40mail.gmail.com
-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_partition_v12_rebase.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Default Partition for Range

2017-05-21 Thread Beena Emerson

Hello,

Many were in favour of the default partition for tables partitioned by
range [1].
Please find attached the WIP patch for the same which can be applied over
the default_partition_v12.patch.

Syntax: Same as agreed for list:
CREATE PARTITION  PARTITION OF  DEFAULT;

Default Constraint:
Negation constraints of all existing partitions.

One case:
CREATE TABLE range1 (a int, b int) PARTITION by range (a);
CREATE TABLE range1_1 PARTITION OF range1 FOR VALUES FROM (1) TO (5);
CREATE TABLE range1_2 PARTITION OF range1 FOR VALUES FROM (7) TO (10);
CREATE TABLE range1_def PARTITION OF range1 DEFAULT;
\d+ range1_def
Table "public.range1_def"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target
| Description
+-+---+--+-+
-+--+-
 a  | integer |   | not null | | plain   |
 |
 b  | integer |   |  | | plain   |
 |
Partition of: range1 DEFAULT
Partition constraint: (((a < 1) OR (a >= 5)) AND ((a < 7) OR (a >= 10)))

It still needs more work:
1. Handling addition of new partition after default, insertion of data, few
more bugs
2. Documentation
3. Regression tests

[1] Discussion on default partition:
*https://www.postgresql.org/message-id/flat/CAOgcT0PLPge%3D5U6%3DGU5SnC3_8yutCbWWOiUva3Cw94M9zpbvgQ%40mail.gmail.com
<https://www.postgresql.org/message-id/flat/CAOgcT0PLPge%3D5U6%3DGU5SnC3_8yutCbWWOiUva3Cw94M9zpbvgQ%40mail.gmail.com>*

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_WIP.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-12 Thread Beena Emerson

Hello,


On Fri, May 12, 2017 at 5:30 PM, Rahila Syed <rahilasye...@gmail.com> wrote:

> Hello,
>
> >(1) With the new patch, we allow new partitions when there is overlapping
> data with default partition. The entries in default are ignored when
> running queries satisfying the new partition.
> This was introduced in latest version. We are not allowing adding a
> partition when entries with same key value exist in default partition. So
> this scenario should not
> come in picture. Please find attached an updated patch which corrects this.
>

Thank you for the updated patch. However, now I cannot create a partition
after default.

CREATE TABLE list1 (
a int,
b int
) PARTITION BY LIST (a);

CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE  list1 ATTACH PARTITION list1_1 FOR VALUES IN (1);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
CREATE TABLE list1_5 PARTITION OF list1 FOR VALUES IN (3);

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>


>
>
> >(2) I get the following warning:
>
> >partition.c: In function ‘check_new_partition_bound’:
> >partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in
> this function [-Wmaybe-uninitialized]
> >   && boundinfo->has_default)
>^
> >preproc.y:3250.2-8: warning: type clash on default action:  != <>
> I failed to notice this warning. I will look into it.
>
> >This is incredibly ugly.  I don't know exactly what should be done
> >about it, but I think PARTITION_DEFAULT is a bad idea and has got to
> >go.  Maybe add a separate isDefault flag to PartitionBoundSpec
> Will look at other ways to do it.
>
> >Doesn't it strike you as a bit strange that get_qual_for_default()
> >doesn't return a qual?  Functions should generally have names that
> >describe what they do.
> Will fix this.
>
> >There's an existing function that you can use to concatenate two lists
> >instead of open-coding it.
> Will check this.
>
> >you should really add the documentation and
> >regression tests which you mentioned as a TODO.  And run the code
> >through pgindent
> I will also update the next version with documentation and regression tests
> and run pgindent
>
> Thank you,
> Rahila Syed
>
> On Fri, May 12, 2017 at 4:33 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
>
>>
>>
>> On Thu, May 11, 2017 at 7:37 PM, Rahila Syed <rahilasye...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Please find attached an updated patch with review comments and bugs
>>> reported till date implemented.
>>>
>>
>> Hello Rahila,
>>
>> Tested on "efa2c18 Doc fix: scale(numeric) returns integer, not numeric."
>>
>> (1) With the new patch, we allow new partitions when there is overlapping
>> data with default partition. The entries in default are ignored when
>> running queries satisfying the new partition.
>>
>> DROP TABLE list1;
>> CREATE TABLE list1 (
>> a int,
>> b int
>> ) PARTITION BY LIST (a);
>> CREATE TABLE list1_1 (LIKE list1);
>> ALTER TABLE  list1 ATTACH PARTITION list1_1 FOR VALUES IN (1);
>> CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
>> INSERT INTO list1 SELECT generate_series(1,2),1;
>> -- Partition overlapping with DEF
>> CREATE TABLE list1_2 PARTITION OF list1 FOR VALUES IN (2);
>> INSERT INTO list1 SELECT generate_series(2,3),2;
>>
>> postgres=# SELECT * FROM list1 ORDER BY a,b;
>>  a | b
>> ---+---
>>  1 | 1
>>  2 | 1
>>  2 | 2
>>  3 | 2
>> (4 rows)
>>
>> postgres=# SELECT * FROM list1 WHERE a=2;
>>  a | b
>> ---+---
>>  2 | 2
>> (1 row)
>>
>> This ignores the a=2 entries in the DEFAULT.
>>
>> postgres=# SELECT * FROM list1_def;
>>  a | b
>> ---+---
>>  2 | 1
>>  3 | 2
>> (2 rows)
>>
>>
>> (2) I get the following warning:
>>
>> partition.c: In function ‘check_new_partition_bound’:
>> partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in
>> this function [-Wmaybe-uninitialized]
>>&& boundinfo->has_default)
>>^
>> preproc.y:3250.2-8: warning: type clash on default action:  != <>
>>
>>
>>> >1.
>>> >In following block, we can just do with def_index, and we do not need
>>> found_def
>>> >flag. We can check if def_index is -1 or not to decide if default
>>> partition is
>>> >present.
>>> found_def is used to set boundinfo->has_default which is used at couple
>>> of other places to check if default partition exists. The implementation
>>> is similar
>>> to has_null.
>>>
>>> >3.
>>> >In following function isDefaultPartitionBound, first statement "return
>>> false"
>>> >is not needed.
>>> It is needed to return false if the node is not DefElem.
>>>
>>> Todo:
>>> Add regression tests
>>> Documentation
>>>
>>> Thank you,
>>> Rahila Syed
>>>
>>>
>>>
>


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-12 Thread Beena Emerson

On Thu, May 11, 2017 at 7:37 PM, Rahila Syed  wrote:

> Hello,
>
> Please find attached an updated patch with review comments and bugs
> reported till date implemented.
>

Hello Rahila,

Tested on "efa2c18 Doc fix: scale(numeric) returns integer, not numeric."

(1) With the new patch, we allow new partitions when there is overlapping
data with default partition. The entries in default are ignored when
running queries satisfying the new partition.

DROP TABLE list1;
CREATE TABLE list1 (
a int,
b int
) PARTITION BY LIST (a);
CREATE TABLE list1_1 (LIKE list1);
ALTER TABLE  list1 ATTACH PARTITION list1_1 FOR VALUES IN (1);
CREATE TABLE list1_def PARTITION OF list1 DEFAULT;
INSERT INTO list1 SELECT generate_series(1,2),1;
-- Partition overlapping with DEF
CREATE TABLE list1_2 PARTITION OF list1 FOR VALUES IN (2);
INSERT INTO list1 SELECT generate_series(2,3),2;

postgres=# SELECT * FROM list1 ORDER BY a,b;
 a | b
---+---
 1 | 1
 2 | 1
 2 | 2
 3 | 2
(4 rows)

postgres=# SELECT * FROM list1 WHERE a=2;
 a | b
---+---
 2 | 2
(1 row)

This ignores the a=2 entries in the DEFAULT.

postgres=# SELECT * FROM list1_def;
 a | b
---+---
 2 | 1
 3 | 2
(2 rows)


(2) I get the following warning:

partition.c: In function ‘check_new_partition_bound’:
partition.c:882:15: warning: ‘boundinfo’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
   && boundinfo->has_default)
   ^
preproc.y:3250.2-8: warning: type clash on default action:  != <>


> >1.
> >In following block, we can just do with def_index, and we do not need
> found_def
> >flag. We can check if def_index is -1 or not to decide if default
> partition is
> >present.
> found_def is used to set boundinfo->has_default which is used at couple
> of other places to check if default partition exists. The implementation
> is similar
> to has_null.
>
> >3.
> >In following function isDefaultPartitionBound, first statement "return
> false"
> >is not needed.
> It is needed to return false if the node is not DefElem.
>
> Todo:
> Add regression tests
> Documentation
>
> Thank you,
> Rahila Syed
>
>
>

Re: [HACKERS] multi-column range partition constraint

2017-05-02 Thread Beena Emerson

On Tue, May 2, 2017 at 2:47 PM, Amit Langote <langote_amit...@lab.ntt.co.jp>
wrote:

> Hi Beena,
>
> On 2017/05/02 17:47, Beena Emerson wrote:
> > Hello Amit,
> >
> > On Tue, May 2, 2017 at 12:21 PM, Amit Langote <
> langote_amit...@lab.ntt.co.jp
> >> wrote:
> >
> >> Per an off-list report from Olaf Gawenda (thanks Olaf), it seems that
> the
> >> range partition's constraint is sometimes incorrect, at least in the
> case
> >> of multi-column range partitioning.  See below:
> >>
> >> create table p (a int, b int) partition by range (a, b);
> >> create table p1 partition of p for values from (1, 1) to (10 ,10);
> >> create table p2 partition of p for values from (11, 1) to (20, 10);
> >>
> >> Perhaps unusual, but it's still a valid definition.  Tuple-routing puts
> >> rows where they belong correctly.
> >>
> >> -- ok
> >> insert into p values (10, 9);
> >> select tableoid::regclass, * from p;
> >>  tableoid | a  | b
> >> --++---
> >>  p1   | 10 | 9
> >> (1 row)
> >>
> >> -- but see this
> >> select tableoid::regclass, * from p where a = 10;
> >>  tableoid | a | b
> >> --+---+---
> >> (0 rows)
> >>
> >> explain select tableoid::regclass, * from p where a = 10;
> >> QUERY PLAN
> >> ---
> >>  Result  (cost=0.00..0.00 rows=0 width=12)
> >>One-Time Filter: false
> >> (2 rows)
> >>
> >> -- or this
> >> insert into p1 values (10, 9);
> >> ERROR:  new row for relation "p1" violates partition constraint
> >> DETAIL:  Failing row contains (10, 9).
> >>
> >> This is because of the constraint being generated is not correct in this
> >> case.  p1's constraint is currently:
> >>
> >>   a >= 1 and a < 10
> >>
> >> where it should really be the following:
> >>
> >>   (a > 1  OR (a = 1  AND b >= 1))
> >> AND
> >>   (a < 10 OR (a = 10 AND b < 10))
> >>
> >
> >
> > IIUC, when we say range 1 to 10 we allow values from 1 to 9. Here we are
> > allowing a=10 be stored in p1 Is it okay?
>
> Yes it is.  Consider that the multi-column range partitioning uses
> tuple-comparison logic to determine if a row's partition key is >=
> lower_bound and < upper_bound tuple.
>
> In this case, p1's lower bound is a "tuple" (1, 1) and (10, 10) its upper
> bound.  Consider an input row with (2, -1) as its partition key.
> Tuple-comparison logic puts this row into p1, because:
>
> select (1, 1) <= (2, -1) and (2, -1) < (10, 10);
>  ?column?
> --
>  t
> (1 row)
>
> When performing tuple-comparison with the lower bound, since 2 > 1, the
> tuple comparison is done and the whole tuple is concluded to be > (1, 1);
> no attention is paid to the second column (or to the fact that -1 not >=
> 1).


> Now consider an input row with (10, 9) as its partition key, which too
> fits in p1, because:
>
> select (1, 1) <= (10, 9) and (10, 9) < (10, 10);
>  ?column?
> --
>  t
> (1 row)
>
> In this case, since 10 = 10, tuple-comparison considers the next column to
> determine the result.  In this case, since the second column 9 < 10, the
> whole tuple is concluded to be < (10, 10).
>
> However, neither of (1, 0), (10, 10), or (10, 11) is admissible into p1 by
> the above comparison logic:
>
> select (1, 1) <= (1, 0) and (1, 0) < (10, 10);
>  ?column?
> --
>  f
> (1 row)
>
> select (1, 1) <= (10, 10) and (10, 10) < (10, 10);
>  ?column?
> --
>  f
> (1 row)
>
> select (1, 1) <= (10, 11) and (10, 11) < (10, 10);
>  ?column?
> --
>  f
> (1 row)
>
> I havent been following these partition mails much. Sorry if I am missing
> > something obvious.
>
> No problem.
>

I have recently started looking at partition. I was wondering why 2nd
column is ignored and the exceptions.

For following partitions:
create table p1 partition of p for values from (1, 1) to (10 ,10);
create table p2 partition of p for values from (11, 1) to (20, 10);

IIUC, we can store values a = 1 to 9 , 11 to 19 and any value in b. But can
store 10 and 20 only when b <=9.

This still seems a bit confusing to me but thank you for your explanation.
These are just rules u have to abide by I guess!


>
> >> Attached patch rewrites get_qual_for_range() for the same, along with
> some
> >> code rearrangement for reuse.  I also added some new tests to insert.sql
> >> and inherit.sql, but wondered (maybe, too late now) whether there should
> >> really be a declarative_partition.sql for these, moving in some of the
> old
> >> tests too.
> >>
> > I got the following warning on compiling:
> > partition.c: In function ‘make_partition_op_expr’:
> > partition.c:1267:2: warning: ‘result’ may be used uninitialized in this
> > function [-Wmaybe-uninitialized]
> >   return result;
>
> Oops, fixed.  Updated patch attached.
>
>
Thank you,


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] multi-column range partition constraint

2017-05-02 Thread Beena Emerson

Hello Amit,

On Tue, May 2, 2017 at 12:21 PM, Amit Langote <langote_amit...@lab.ntt.co.jp
> wrote:

> Per an off-list report from Olaf Gawenda (thanks Olaf), it seems that the
> range partition's constraint is sometimes incorrect, at least in the case
> of multi-column range partitioning.  See below:
>
> create table p (a int, b int) partition by range (a, b);
> create table p1 partition of p for values from (1, 1) to (10 ,10);
> create table p2 partition of p for values from (11, 1) to (20, 10);
>
> Perhaps unusual, but it's still a valid definition.  Tuple-routing puts
> rows where they belong correctly.
>
> -- ok
> insert into p values (10, 9);
> select tableoid::regclass, * from p;
>  tableoid | a  | b
> --++---
>  p1   | 10 | 9
> (1 row)
>
> -- but see this
> select tableoid::regclass, * from p where a = 10;
>  tableoid | a | b
> --+---+---
> (0 rows)
>
> explain select tableoid::regclass, * from p where a = 10;
> QUERY PLAN
> ---
>  Result  (cost=0.00..0.00 rows=0 width=12)
>One-Time Filter: false
> (2 rows)
>
> -- or this
> insert into p1 values (10, 9);
> ERROR:  new row for relation "p1" violates partition constraint
> DETAIL:  Failing row contains (10, 9).
>
> This is because of the constraint being generated is not correct in this
> case.  p1's constraint is currently:
>
>   a >= 1 and a < 10
>
> where it should really be the following:
>
>   (a > 1  OR (a = 1  AND b >= 1))
> AND
>   (a < 10 OR (a = 10 AND b < 10))
>


IIUC, when we say range 1 to 10 we allow values from 1 to 9. Here we are
allowing a=10 be stored in p1 Is it okay?

I havent been following these partition mails much. Sorry if I am missing
something obvious.


>
> Attached patch rewrites get_qual_for_range() for the same, along with some
> code rearrangement for reuse.  I also added some new tests to insert.sql
> and inherit.sql, but wondered (maybe, too late now) whether there should
> really be a declarative_partition.sql for these, moving in some of the old
> tests too.
>
> Adding to the open items list.
>
> Thanks,
> Amit
>
> PS: due to vacation, I won't be able to reply promptly until Monday 05/08.
>
>
I got the following warning on compiling:
partition.c: In function ‘make_partition_op_expr’:
partition.c:1267:2: warning: ‘result’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
  return result;


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Crash when partition column specified twice

2017-04-28 Thread Beena Emerson

Hello Amit,

The extra n->is_from_type = false; seems to be added by mistake?

@@ -11888,6 +11891,8 @@ TableFuncElement:   ColId Typename
opt_collate_clause
n->is_local = true;
n->is_not_null = false;
n->is_from_type = false;
+   n->is_from_type = false;
+   n->is_from_parent = false;
n->storage = 0;
n->raw_default = NULL;
n->cooked_default = NULL;



On Fri, Apr 28, 2017 at 6:08 AM, Amit Langote <langote_amit...@lab.ntt.co.jp
> wrote:

> On 2017/04/27 12:36, Amit Langote wrote:
> > Noticed that a crash occurs if a column is specified twice when creating
> a
> > partition:
> >
> > create table p (a int) partition by list (a);
> >
> > -- crashes
> > create table p1 partition of parent (
> >   a not null,
> >   a default 1
> > ) for values in (1);
> >
> > The logic in MergeAttributes() that merged partition column options with
> > those of the parent didn't properly check for column being specified
> twice
> > and instead tried to delete the same ColumnDef from a list twice, causing
> > the crash.
> >
> > Attached fixes that.
>
> Patch rebased, because of a conflict with b9a3ef55b2.
>
> Thanks,
> Amit
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-04-07 Thread Beena Emerson

I ran tests and following are the details:

Machine details:
Architecture:  ppc64le
Byte Order:Little Endian
CPU(s):192
On-line CPU(s) list:   0-191
Thread(s) per core:8
Core(s) per socket:1
Socket(s): 24
NUMA node(s):  4
Model: IBM,8286-42A

clients>  16  32   64
 128
size
16MB  18895.63486 28799.48759 37855.39521 27968.88309
32MB  18313.1461  29201.44954 40733.80051 32458.74147
64 MB18055.73141 30875.28687 42713.54447 38009.60542
128MB   18234.31424 33208.65419 48604.5593  45498.27689
256MB19524.36498 35740.19032 54686.16898 54060.11168
512MB 20351.90719 37426.72174 55045.60719 56194.99349
1024MB   19667.67062 35696.19194 53666.60373 54353.0614

I did not get any degradation, in fact, higher values showed performance
improvement for higher client count.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread Beena Emerson

On Fri, Apr 7, 2017 at 2:35 AM, Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:

> On 04/06/2017 08:33 PM, David Steele wrote:
>>
>>
>> I'm in favor of 16,64,256,1024.
>>
>>
> I don't see a particular reason for this, TBH. The sweet spots will be
> likely dependent hardware / OS configuration etc. Assuming there actually
> are sweet spots - no one demonstrated that yet.
>
> Also, I don't see how supporting additional WAL sizes increases chance of
> incompatibility. We already allow that, so either the tools (e.g. backup
> solutions) assume WAL segments are always 16MB (in which case are
> essentially broken) or support valid file sizes (in which case they should
> have no issues with the new ones).
>
> If we're going to do this, I'm in favor of deciding some reasonable upper
> limit (say, 1GB or 2GB sounds good), and allowing all 2^n values up to that
> limit.


I think the majority consensus is to use all valid values. Since 1GB is
what we have finalized as the upper limit, lets continue with that for now.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread Beena Emerson

Hello,

On Wed, Apr 5, 2017 at 6:06 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

>
> (Roughly speaking, to get started, this would mean compiling with
> --with-wal-segsize 16, 32, 64, 128, 256, run make check-world both
> sequentially and in parallel, and take note of a) passing, b) run time,
> c) disk space.)
>
>
The attached patch updates a pg_upgrade test which fails for higher segment
values: The output of SELECT restart_lsn FROM pg_replication_slots WHERE
slot_name = 'slot1'}.

The following are the results of the installcheck-world execution.

wal_size time   cluster_size   pg_wal   files
16 5m32.761s   539M 417M  26
32 5m32.618s   545M 417M 13
64 5m39.685s   571M 449M  7
128   5m52.961s641M 513M  4
256   6m13.402s   635M 512M   2
512   7m3.252s 1.2G  1G   2
1024 9m0.205s 1.2G   1G  1


wal_size time   cluster_size   pg_wal   files
16 5m31.137s   542M 417M  26
32 5m29.532s  539M 417M 13
64 5m36.189s   571M 449M  7
128   5m50.027s635M 513M  4
256   6m13.603s   635M 512M   2
512   7m4.154s 1.2G   1G   2
1024 8m55.357s1.2G   1G  1

For every test, except for connect/test5 in src/interfaces/ecpg, all else
passed.

We can see that smaller chunks take lesser time for the same amount of WAL
(128 and 256, 512 and 1024). But these tests are not large enough to
conclude.


Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


initdb_update_regress.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread Beena Emerson

Hello,

On Wed, Apr 5, 2017 at 6:06 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:e format and expand the range?
>
>
> I don't think me saying it felt a bit slow around 256 MB is a proper
> technical analysis that should lead to the conclusion that that upper
> limit should be 128 MB. ;-)
>

I ran a couple of tests for 16MB and 1GB and found less than 4% performance
dip. I am currently running test to check consistency of the results and
for various sizes. I will update soon.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread Beena Emerson

Hello,

On Wed, Apr 5, 2017 at 4:59 PM, Simon Riggs <si...@2ndquadrant.com> wrote:

> On 5 April 2017 at 06:04, Beena Emerson <memissemer...@gmail.com> wrote:
>
> I see various issues raised but not properly addressed
>
> 1. we would need to drop support for segment sizes < 16MB unless we
> adopt a new incompatible filename format.
> I think at 16MB the naming should be the same as now and that
> WALfilename -> LSN is very important.
> For this release, I think we should just allow >= 16MB and avoid the
> issue thru lack of time.
>
> 2. It's not clear to me the advantage of being able to pick varying
> filesizes. I see great disadvantage in having too many options, which
> greatly increases the chance of incompatibility, annoyance and
> breakage. I favour a small number of values that have been shown by
> testing to be sweet spots in performance and usability. (1GB has been
> suggested)
>

Does the options 16, 64 and 1024 seem good.
We can remove sizes below 16 as most have agreed and as per the discussion,
64MB and 1GB seems favoured. We could probably allow 32MB since it was an
already allowed size?

> 3. New file allocation has been a problem raised with this patch for
> some months now.
>

This did not seem to be an open issue, at least there was not many
disagreements. Higher the server load, more WAL generated. For the same
load, the frequency of file allocation reduces for higher wal-segsize
values. Overall it is either filling up many files or few larger files.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-04-05 Thread Beena Emerson

Hello,

On Wed, Apr 5, 2017 at 9:29 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 4/4/17 22:47, Amit Kapila wrote:
> >> Committed first part to allow internal representation change (only).
> >>
> >> No commitment yet to increasing wal-segsize in the way this patch has
> it.
> >>
> >
> > What part of patch you don't like and do you have any suggestions to
> > improve the same?
>
> I think there are still some questions and disagreements about how it
> should behave.
>

The  WALfilename - LSN mapping disruption for higher values you mean? Is
there anything else I have missed?


>
> I suggest the next step is to dial up the allowed segment size in
> configure and run some tests about what a reasonable maximum value could
> be.  I did a little bit of that, but somewhere around 256 MB, things got
> really slow.
>

Would it be better if just increase the limit to 128MB for now?
In next we can change the WAL file name format and expand the range?

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-03-30 Thread Beena Emerson

Hello,

On Fri, Mar 31, 2017 at 11:20 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
wrote:

> On Fri, Mar 31, 2017 at 10:40 AM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> > On 30 Mar 2017 15:10, "Kuntal Ghosh" <kuntalghosh.2...@gmail.com> wrote:
>
> > I do not think a generalised function is a good idea. Besides, I feel the
> > respective approaches are best kept in the modules used also because the
> > internal code is not easily accessible by utils.
> >
> Ahh, I wonder what the reason can be. Anyway, I'll leave that decision
> for the committer. I'm moving the status to Ready for committer.
>
>
Thank you.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-03-30 Thread Beena Emerson

Hello,

Thanks for testing my patch.

On 30 Mar 2017 15:10, "Kuntal Ghosh" <kuntalghosh.2...@gmail.com> wrote:

On Tue, Mar 28, 2017 at 1:06 AM, Beena Emerson <memissemer...@gmail.com>
wrote:
> On Sat, Mar 25, 2017 at 10:32 PM, Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com> wrote:
>>
>> At this point, I suggest splitting this patch up into several
>> potentially less controversial pieces.
>>
>> One big piece is that we currently don't support segment sizes larger
>> than 64 GB, for various internal arithmetic reasons.  Your patch appears
>> to address that.  So I suggest isolating that.  Assuming it works
>> correctly, I think there would be no great concern about it.
>>
>> The next piece would be making the various tools aware of varying
>> segment sizes without having to rely on a built-in value.
>>
>> The third piece would then be the rest that allows you to set the size
>> at initdb
>>
>> If we take these in order, we would make it easier to test various sizes
>> and see if there are any more unforeseen issues when changing sizes.  It
>> would also make it easier to do performance testing so we can address
>> the original question of what the default size should be.
>
>
> PFA the patches divided into 3 parts:
Thanks for splitting the patches.
01-add-XLogSegmentOffset-macro.patch is same as before and it looks good.

> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
> the internal representation of max_wal_size and min_wal_size to mb.
looks good.

> 03-modify-tools.patch - Makes XLogSegSize into a variable, currently set
as
> XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
> inbuilt value.
Several methods are declared and defined in different tools to fetch
the size of wal-seg-size.
In pg_standby.c,
RetrieveXLogSegSize() - /* Set XLogSegSize from the WAL file header */

In pg_basebackup/streamutil.c,
 on behaRetrieveXLogSegSize(PGconn *conn) - /* set XLogSegSize using
SHOW wal_segment_size */

In pg_waldump.c,
ReadXLogFromDir(char *archive_loc)
RetrieveXLogSegSize(char *archive_path) /* Scan through the archive
location to set XLogSegsize from the first WAL file */

IMHO, it's better to define a single method in xlog.c and based on the
different strategy, it can retrieve the XLogSegsize on behalf of
different modules. I've suggested the same in my first set review and
I'll still vote for it. For example, in xlog.c, you can define
something as following:
bool RetrieveXLogSegSize(RetrieveStrategy rs, void* ptr)

Now based on the RetrieveStrategy(say Conn, File, Dir), you can cast
the void pointer to the appropriate type. So, when a new tool needs to
retrieve XLogSegSize, it can just call this function instead of
defining a new RetrieveXLogSegSize method.

It's just a suggestion from my side. Is there anything I'm missing
which can cause the aforesaid approach not to be working?
Apart from that, I've nothing to add here.



I do not think a generalised function is a good idea. Besides, I feel the
respective approaches are best kept in the modules used also because the
internal code is not easily accessible by utils.


> 04-initdb-walsegsize.patch - Adds the initdb option to set wal-segsize and
> make related changes. Update pg_test_fsync to use DEFAULT_XLOG_SEG_SIZE
> instead of XLOG_SEG_SIZE
looks good.

>>
>> One concern I have is that your patch does not contain any tests.  There
>> should probably be lots of tests.
>
>
> 05-initdb_tests.patch adds tap tests to initialize cluster with different
> wal_segment_size and then check the config values. What other tests do you
> have in mind? Checking the various tools?
Nothing from me to add here.

I've nothing to add here for the attached set of patches. On top of
these, David's patch can be used for preserving LSNs in the file
naming for all segment sizes.

Re: [HACKERS] increasing the default WAL segment size

2017-03-27 Thread Beena Emerson

Hello,

On Sat, Mar 25, 2017 at 10:32 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> At this point, I suggest splitting this patch up into several
> potentially less controversial pieces.
>
> One big piece is that we currently don't support segment sizes larger
> than 64 GB, for various internal arithmetic reasons.  Your patch appears
> to address that.  So I suggest isolating that.  Assuming it works
> correctly, I think there would be no great concern about it.
>
> The next piece would be making the various tools aware of varying
> segment sizes without having to rely on a built-in value.
>
> The third piece would then be the rest that allows you to set the size
> at initdb
>
> If we take these in order, we would make it easier to test various sizes
> and see if there are any more unforeseen issues when changing sizes.  It
> would also make it easier to do performance testing so we can address
> the original question of what the default size should be.
>

PFA the patches divided into 3 parts:

02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
the internal representation of max_wal_size and min_wal_size to mb.

03-modify-tools.patch - Makes XLogSegSize into a variable, currently set as
XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
inbuilt value.

04-initdb-walsegsize.patch - Adds the initdb option to set wal-segsize and
make related changes. Update pg_test_fsync to use DEFAULT_XLOG_SEG_SIZE
instead of XLOG_SEG_SIZE


> One concern I have is that your patch does not contain any tests.  There
> should probably be lots of tests.


05-initdb_tests.patch adds tap tests to initialize cluster with different
wal_segment_size and then check the config values. What other tests do you
have in mind? Checking the various tools?

--
Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


01-add-XLogSegmentOffset-macro.patch
Description: Binary data


02-increase-max-wal-segsize.patch
Description: Binary data


03-modify-tools.patch
Description: Binary data


05-initdb_tests.patch
Description: Binary data


04-initdb-walsegsize.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-03-24 Thread Beena Emerson

Hello,

On Wed, Mar 22, 2017 at 9:41 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
wrote:

> On Wed, Mar 22, 2017 at 3:14 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> > PFA an updated patch which fixes a minor bug I found. It only increases
> the
> > string size in pretty_wal_size function.
> > The 01-add-XLogSegmentOffset-macro.patch has also been rebased.
> Thanks for the updated versions. Here is a partial review of the patch:
>
> In pg_standby.c and pg_waldump.c,
> + XLogPageHeader hdr = (XLogPageHeader) buf;
> + XLogLongPageHeader NewLongPage = (XLogLongPageHeader) hdr;
> +
> + XLogSegSize = NewLongPage->xlp_seg_size;
> It waits until the file is at least XLOG_BLCKSZ, then gets the
> expected final size from XLogPageHeader. This looks really clean
> compared to the previous approach.
>

thank you for testing. PFA the rebased patch incorporating your comments.


>
> + * Verify that the min and max wal_size meet the minimum requirements.
> Better to write min_wal_size and max_wal_size.
>

Updated wherever applicable.


>
> + errmsg("Insufficient value for \"min_wal_size\"")));
> "min_wal_size %d is too low" may be? Use lower case for error
> messages. Same for max_wal_size.
>

updated.


>
> + /* Set the XLogSegSize */
> + XLogSegSize = ControlFile->xlog_seg_size;
> +
> A call to IsValidXLogSegSize() will be good after this, no?
>

Is it necessary? We already have the CRC check for ControlFiles. Is that
not enough?


>
> + /* Update variables using XLogSegSize */
> + check_wal_size();
> The method name looks somewhat misleading compared to the comment for
> it, doesn't it?
>

The method name is correct since it only checks if the value provided is
sufficient (at least 2 segment size). I have updated the comment to say
Check and update variables dependent on XLogSegSize


> This patch introduces a new guc_unit having values in MB for
> max_wal_size and min_wal_size. I'm not sure about the upper limit
> which is set to INT_MAX for 32-bit systems as well. Is it needed to
> define something like MAX_MEGABYTES similar to MAX_KILOBYTES?
> It is worth mentioning that GUC_UNIT_KB can't be used in this case
> since MAX_KILOBYTES is INT_MAX / 1024(<2GB) on 32-bit systems. That's
> not a sufficient value for min_wal_size/max_wal_size.
>

You are right. I can use the same value as upper limit for GUC_UNIT_MB, we
could probably change the name of MAX_KILOBYTES to something more general
for both GUC_UNIT_MB and GUC_UNIT_KB. The max size in 32-bit systems would
be 2TB.


> While testing with pg_waldump, I got the following error.
> bin/pg_waldump -p master/pg_wal/ -s 0/0100
> Floating point exception (core dumped)
> Stack:
> #0  0x004039d6 in ReadPageInternal ()
> #1  0x00404c84 in XLogFindNextRecord ()
> #2  0x00401e08 in main ()
> I think that the problem is in following code:
> /* parse files as start/end boundaries, extract path if not specified */
> if (optind < argc)
> {
> ....
> + if (!RetrieveXLogSegSize(full_path))
> ...
> }
> In this case, RetrieveXLogSegSize is conditionally called. So, if the
> condition is false, XLogSegSize will not be initialized.
>
>
pg_waldump code has been updated.




-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


02-initdb-walsegsize-v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-03-22 Thread Beena Emerson

Hello,


On Wed, Mar 22, 2017 at 9:19 AM, Robert Haas <robertmh...@gmail.com> wrote:

>
> I'm a little worried that this whole question of changing the file
> naming scheme is a diversion which will result in torpedoing any
> chance of getting some kind of improvement here for v11.  I don't
> think the patch is all that far from being committable but it's not
> going to get there if we start redesigning the world around it.
>
>
As stated above, the default 16MB has not changed and so we can take this
separately and not as part of this patch.

PFA an updated patch which fixes a minor bug I found. It only increases the
string size in pretty_wal_size function.
The 01-add-XLogSegmentOffset-macro.patch has also been rebased.

-- 
Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


02-initdb-walsegsize-v7.patch
Description: Binary data


01-add-XLogSegmentOffset-macro.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread Beena Emerson

PFA an updated patch.

This fixes an issue reported by Tushar internally. Since the patch changes
the way min and max wal_size is stored internally from segment count to
size in kb, it limited the maximum size of min and max_wal_size to 2GB in
32 bit systems.

The minimum required segment is 2 and the minimum wal size is 1MB. So the
lowest possible value of the min/max wal_size is 2MB. Hence, I have changed
the internal representation to MB instead of KB so that we can increase the
range. Also, for any wal-seg-size, it retains the default seg count as 5
and 64 for min and max wal_size. Consequently, the size of the variables
increase automatically according to the wal_segment_size. This behaviour is
similar to that of existing code.

I have also run pg_indent on the files.


On Mon, Mar 20, 2017 at 11:37 PM, Beena Emerson <memissemer...@gmail.com>
wrote:

> Hello,
>
> PFA the updated patch.
>
> On Fri, Mar 17, 2017 at 6:40 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
>
>> On Tue, Mar 14, 2017 at 1:44 AM, Beena Emerson <memissemer...@gmail.com>
>> wrote:
>> > Attached is the updated patch. It fixes the issues and also updates few
>> code
>> > comments.
>>
>> I did an initial readthrough of this patch tonight just to get a
>> feeling for what's going on.  Based on that, here are a few review
>> comments:
>>
>> The changes to pg_standby seem to completely break the logic to wait
>> until the file has attained the correct size.  I don't know how to
>> salvage that logic off-hand, but just breaking it isn't acceptable.
>>
>
> Using, the XLogLongPageHeader->xlp_seg_size, all the original checks have
> been retained. This methid is even used in pg_waldump.
>
>
>
>>
>> + Note that changing this value requires an initdb.
>>
>> Instead, maybe say something like "Note that this value is fixed for
>> the lifetime of the database cluster."
>>
>
> Corrected.
>
>
>>
>> -intmax_wal_size = 64;/* 1 GB */
>> -intmin_wal_size = 5;/* 80 MB */
>> +intwal_segment_size = 2048;/* 16 MB */
>> +intmax_wal_size = 1024 * 1024;/* 1 GB */
>> +intmin_wal_size = 80 * 1024;/* 80 MB */
>>
>> If wal_segment_size is now measured in multiple of XLOG_BLCKSZ, then
>> it's not the case that 2048 is always 16MB.  If the other values are
>> now measured in kB, perhaps rename the variables to add _kb, to avoid
>> confusion with the way it used to work (and in general).  The problem
>> with leaving this as-is is that any existing references to
>> max_wal_size in core or extension code will silently break; you want
>
> it to break in a noticeable way so that it gets fixed.
>>
>>
> The  wal_segment_size  now is DEFAULT_XLOG_SEG_SIZE / XLOG_BLCKSZ;
> min and max wal_size  have _kb postfix
>
>
>> + * UsableBytesInSegment: It is set in assign_wal_segment_size and stores
>> the
>> + * number of bytes in a WAL segment usable for WAL data.
>>
>> The comment doesn't need to say where it gets set, and it doesn't need
>> to repeat the variable name.  Just say "The number of bytes in a..."
>>
>
> Done.
>
>
>>
>> +assign_wal_segment_size(int newval, void *extra)
>>
>> Why does a PGC_INTERNAL GUC need an assign hook?  I think the GUC
>> should only be there to expose the value; it shouldn't have
>> calculation logic associated with it.
>>
>
> Removed the function and called the functions in ReadControlFile.
>
>
>>
>>  /*
>> + * initdb passes the WAL segment size in an environment variable. We
>> don't
>> + * bother doing any sanity checking, we already check in initdb that
>> the
>> + * user gives a sane value.
>> + */
>> +XLogSegSize = pg_atoi(getenv("XLOG_SEG_SIZE"), sizeof(uint32), 0);
>>
>> I think we should bother.  I don't like the idea of the postmaster
>> crashing in flames without so much as a reasonable error message if
>> this parameter-passing mechanism goes wrong.
>>
>
> I have rechecked the XLogSegSize.
>
>
>>
>> +{"wal-segsize", required_argument, NULL, 'Z'},
>>
>> When adding an option with no documented short form, generally one
>> picks a number that isn't a character for the value at the end.  See
>> pg_regress.c or initdb.c for examples.
>>
>
> Done.
>
>
>>
>> +   wal_segment_size = atoi(str_wal_segment_size);
>>
>> So, you're comfortable interpreting --wal-segsize=1TB or
>> --wa

Re: [HACKERS] increasing the default WAL segment size

2017-03-20 Thread Beena Emerson

Hello,

PFA the updated patch.

On Fri, Mar 17, 2017 at 6:40 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Mar 14, 2017 at 1:44 AM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> > Attached is the updated patch. It fixes the issues and also updates few
> code
> > comments.
>
> I did an initial readthrough of this patch tonight just to get a
> feeling for what's going on.  Based on that, here are a few review
> comments:
>
> The changes to pg_standby seem to completely break the logic to wait
> until the file has attained the correct size.  I don't know how to
> salvage that logic off-hand, but just breaking it isn't acceptable.
>

Using, the XLogLongPageHeader->xlp_seg_size, all the original checks have
been retained. This methid is even used in pg_waldump.



>
> + Note that changing this value requires an initdb.
>
> Instead, maybe say something like "Note that this value is fixed for
> the lifetime of the database cluster."
>

Corrected.


>
> -intmax_wal_size = 64;/* 1 GB */
> -intmin_wal_size = 5;/* 80 MB */
> +intwal_segment_size = 2048;/* 16 MB */
> +intmax_wal_size = 1024 * 1024;/* 1 GB */
> +intmin_wal_size = 80 * 1024;/* 80 MB */
>
> If wal_segment_size is now measured in multiple of XLOG_BLCKSZ, then
> it's not the case that 2048 is always 16MB.  If the other values are
> now measured in kB, perhaps rename the variables to add _kb, to avoid
> confusion with the way it used to work (and in general).  The problem
> with leaving this as-is is that any existing references to
> max_wal_size in core or extension code will silently break; you want

it to break in a noticeable way so that it gets fixed.
>
>
The  wal_segment_size  now is DEFAULT_XLOG_SEG_SIZE / XLOG_BLCKSZ;
min and max wal_size  have _kb postfix


> + * UsableBytesInSegment: It is set in assign_wal_segment_size and stores
> the
> + * number of bytes in a WAL segment usable for WAL data.
>
> The comment doesn't need to say where it gets set, and it doesn't need
> to repeat the variable name.  Just say "The number of bytes in a..."
>

Done.


>
> +assign_wal_segment_size(int newval, void *extra)
>
> Why does a PGC_INTERNAL GUC need an assign hook?  I think the GUC
> should only be there to expose the value; it shouldn't have
> calculation logic associated with it.
>

Removed the function and called the functions in ReadControlFile.


>
>  /*
> + * initdb passes the WAL segment size in an environment variable. We
> don't
> + * bother doing any sanity checking, we already check in initdb that
> the
> + * user gives a sane value.
> + */
> +XLogSegSize = pg_atoi(getenv("XLOG_SEG_SIZE"), sizeof(uint32), 0);
>
> I think we should bother.  I don't like the idea of the postmaster
> crashing in flames without so much as a reasonable error message if
> this parameter-passing mechanism goes wrong.
>

I have rechecked the XLogSegSize.


>
> +{"wal-segsize", required_argument, NULL, 'Z'},
>
> When adding an option with no documented short form, generally one
> picks a number that isn't a character for the value at the end.  See
> pg_regress.c or initdb.c for examples.
>

Done.


>
> +   wal_segment_size = atoi(str_wal_segment_size);
>
> So, you're comfortable interpreting --wal-segsize=1TB or
> --wal-segsize=1GB as 1?  Implicitly, 1MB?
>

Imitating the current behaviour of config option --with-wal-segment, I have
used strtol to throw an error if the value is not only integers.


>
> + * ControlFile is not accessible here so use SHOW wal_segment_size command
> + * to set the XLogSegSize
>
> Breaks compatibility with pre-9.6 servers.
>

Added check for the version, the SHOW command will be run only in v10 and
above. Previous versions do not need this.


>
> --
Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


02-initdb-walsegsize-v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-03-17 Thread Beena Emerson

Hello,

Thank you for your comments, I will post an updated patch soon.

On Fri, Mar 17, 2017 at 6:40 AM, Robert Haas <robertmh...@gmail.com> wrote:
>
>
> +assign_wal_segment_size(int newval, void *extra)
>
> Why does a PGC_INTERNAL GUC need an assign hook?  I think the GUC
> should only be there to expose the value; it shouldn't have
> calculation logic associated with it.
>

The Checkpoint Segments and the UsableBytesInSegment had to be changed
depending on the value of  wal_segment_size set during initdb. I will
figure out another way to assign these values without using this
assign_hook.

> +   wal_segment_size = atoi(str_wal_segment_size);
>
> So, you're comfortable interpreting --wal-segsize=1TB or
> --wal-segsize=1GB as 1?  Implicitly, 1MB?
>

The option was intended to only accept values in MB as the original  config
--with-wal-segsize option, unfortunately, the patch does not throw error as
in the config option when the units are specified.

Error with config option --with-wal-segsize=1MB
configure: error: Invalid WAL segment size. Allowed values are
1,2,4,8,16,32,64.

Should we imitate this behaviour and just add a check to see if it only
contains numbers? or would it be better to allow the use of the units and
make appropriate code changes?

-- 
Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-03-13 Thread Beena Emerson

Hello,

Attached is the updated patch. It fixes the issues and also updates few
code comments.

On Fri, Mar 10, 2017 at 1:09 PM, tushar <tushar.ah...@enterprisedb.com>
wrote:
>
>
> 1)at the time of initdb, we have set - "--wal-segsize 4"  ,so all the WAL
> file size should be 4 MB each  but in the postgresql.conf file , it is
> mentioned
>
> #wal_keep_segments = 0  # in logfile segments,* 16MB each*; 0
> disables
>

> so the comment  (16MB ) mentioned against parameter 'wal_keep_segments'
> looks wrong , either we should remove this or modify it .
>

Removed.


>
> 2)Getting "Aborted (core dumped)"  error at the time of running
> pg_basebackup  ,
> *(this issue is only coming on Linux32 ,not on Linux64) * we have  double
> check to confirm it .
>

 Can you please check with the new patch?

-- 
Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


02-initdb-walsegsize-v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-03-13 Thread Beena Emerson

Hello,

On Mon, Mar 13, 2017 at 1:49 PM, Prabhat Sahu <prabhat.s...@enterprisedb.com
> wrote:

> Hi,
>
> 2)Getting "Aborted (core dumped)"  error at the time of running
> pg_basebackup  , *(this issue is only coming on Linux32 ,not on Linux64)*
>  we have  double check to confirm it .
>
> Steps to reproduce on Linux32
>
>> ===
>>> fetch the sources
>>> apply both the patches
>>>  ./configure --with-zlib   --enable-debug  --enable-cassert
>>> --enable-depend --prefix=$PWD/edbpsql --with-openssl CFLAGS="-g -O0"; make
>>> all install
>>> Performed initdb with switch "--wal-segsize 4"
>>>
>>
>> Does the crash occur with only size 4?
>>
>
>
> Crash occurs for the value of *"--wal-segsize " 1, 2, 4, 8*  with stack
> details as below :
>
> For value the value of *"--wal-segsize " 16, 32, 64*... (all multiple of
> 16)  we are getting "Segmentation fault" message as below:
> [bin]$ ./pg_basebackup -v -D /tmp/slave16
> Segmentation fault (core dumped)
>
> and for all other values of* "--wal-segsize " 3, 5, 7, 9, 10, 11, ... 15,
> 17, 18,* ...  we are getting invalid message during "initdb":
> [bin]$ ./initdb -D data1 --wal-segsize=17
> initdb: Invalid WAL segment size 17
>

 The permissible values for  the wal-segment size is power of 2 from 1 to
1024. Hence the Invalid message is expected behaviour.

Just to summarize, In Linux32, values 1 to 8 crashed and 16 to 1024 gave
segmentation fault.


>
>>
>>> start the server
>>> run pg_basebackup
>>>
>>> [centos@tushar-centos bin]$ ./pg_basebackup -v -D /tmp/myslave
>>> *** glibc detected *** ./pg_basebackup: free(): invalid pointer:
>>> 0x08da7f00 ***
>>>
>>> [centos@tushar-centos bin]$
>>>
>>> same scenario is working fine against HEAD (v10 ) on Linux32 [i.e no
>>> patch applied]
>>>
>>> [centos@tushar-centos bin]$ ./pg_basebackup --verbose -D /tmp/slave11
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: transaction log start point: 0/2800024 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: transaction log end point: 0/28000E4
>>> pg_basebackup: waiting for background process to finish streaming ...
>>> pg_basebackup: base backup completed
>>> [centos@tushar-centos bin]$
>>>
>>
>> Just to confirm, was this done with configure flag --with-wal-segsize=4 ?
>>
>
> we also have configure with the option "*--with-wal-segsize=4*" and
> getting warning.
> ./configure --with-zlib   --enable-debug  --enable-cassert
> --enable-depend --prefix=$PWD/inst --with-openssl CFLAGS="-g -O0"
> *--with-wal-segsize=4*
>
> configure: WARNING: unrecognized options: --with-wal-segsize
>

 configure option was for the HEAD, without the patch applied.

I guess, I am missing something regarding the 32 bit machines, I am looking
into it.


Thank you,

--

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-03-13 Thread Beena Emerson

Hello,

Thank you testing. I just wanted to confirm few things since I do not have
linux32 setup yet.

On Fri, Mar 10, 2017 at 1:09 PM, tushar <tushar.ah...@enterprisedb.com>
wrote:

> On 03/10/2017 11:23 AM, Beena Emerson wrote:
>
>
>> Thank you for your reviews Kuntal, Jim, Ashutosh
>
> Attached in an updated 02 patch which:
>
>1. Call RetrieveXLogSegSize(conn) in pg_receivewal.c
>2. Remove the warning in Windows
>3. Change PATH_MAX in pg_waldump with MAXPGPATH
>
> Regarding the usage of the wal file size as the XLogSegSize, I agree with
> what Robert has said. Generally, the wal size will be of the expected
> wal_segment_size and to have it any other size, esspecially of a valid
> power2 value is extremely rare and I feel it is not a major cause of
> concern.
>
> We (Prabhat and I) have started basic  testing of this feature -
> 2 quick issue -
>
> 1)at the time of initdb, we have set - "--wal-segsize 4"  ,so all the WAL
> file size should be 4 MB each  but in the postgresql.conf file , it is
> mentioned
>
> #wal_keep_segments = 0  # in logfile segments,* 16MB each*; 0
> disables
>
> so the comment  (16MB ) mentioned against parameter 'wal_keep_segments'
> looks wrong , either we should remove this or modify it .
>
> 2)Getting "Aborted (core dumped)"  error at the time of running
> pg_basebackup  ,
> *(this issue is only coming on Linux32 ,not on Linux64) * we have  double
> check to confirm it .
>
> Steps to reproduce on Linux32
> ===
> fetch the sources
> apply both the patches
>  ./configure --with-zlib   --enable-debug  --enable-cassert
> --enable-depend --prefix=$PWD/edbpsql --with-openssl CFLAGS="-g -O0"; make
> all install
> Performed initdb with switch "--wal-segsize 4"
>

Does the crash occur with only size 4?


> start the server
> run pg_basebackup
>
> [centos@tushar-centos bin]$ ./pg_basebackup -v -D /tmp/myslave
> *** glibc detected *** ./pg_basebackup: free(): invalid pointer:
> 0x08da7f00 ***
>
> [centos@tushar-centos bin]$
>
> same scenario is working fine against HEAD (v10 ) on Linux32 [i.e no patch
> applied]
>
> [centos@tushar-centos bin]$ ./pg_basebackup --verbose -D /tmp/slave11
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: transaction log start point: 0/2800024 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: transaction log end point: 0/28000E4
> pg_basebackup: waiting for background process to finish streaming ...
> pg_basebackup: base backup completed
> [centos@tushar-centos bin]$
>

Just to confirm, was this done with configure flag --with-wal-segsize=4 ?


Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-03-09 Thread Beena Emerson

Hello,

On Tue, Mar 7, 2017 at 10:46 AM, Ashutosh Sharma <ashu.coe...@gmail.com>
wrote:

> Hi,
>
> I took a look at this patch. Overall, the patch looks good to me.
> However, there are some review comments that I would like to share,
>
> 1. I think the macro 'PATH_MAX' used in pg_waldump.c file is specific
> to Linux. It needs to be changed to some constant value or may be
> MAXPGPATH inorder to make it platform independent.
>
> 2. As already mentioned by Jim and Kuntal upthread, you are trying to
> detect the configured WAL segment size in pg_waldump.c and
> pg_standby.c files based on the size of the random WAL file which
> doesn't look like a good idea. But, then I think the only option we
> have is to pass the location of pg_control file to pg_waldump module
> along with the start and end wal segments.
>
> 3. When trying to compile '02-initdb-walsegsize-v2.patch' on Windows,
> I got this warning message,
>
> Warning1warning C4005: 'DEFAULT_XLOG_SEG_SIZE' : macro
> redefinition
> c:\users\ashu\postgresql\src\include\pg_config_manual.h20
>
> Apart from these, I am not having any comments as of now. I am still
> validating the patch on Windows. If I find any issues i will update
> it.
>
> Thank you for your reviews Kuntal, Jim, Ashutosh

Attached in an updated 02 patch which:

   1. Call RetrieveXLogSegSize(conn) in pg_receivewal.c
   2. Remove the warning in Windows
   3. Change PATH_MAX in pg_waldump with MAXPGPATH

Regarding the usage of the wal file size as the XLogSegSize, I agree with
what Robert has said. Generally, the wal size will be of the expected
wal_segment_size and to have it any other size, esspecially of a valid
power2 value is extremely rare and I feel it is not a major cause of
concern.

Thank you,

Beena Emerson
EnterpriseDB: http://www.enterprisedb.com


01-add-XLogSegmentOffset-macro.patch
Description: Binary data


02-initdb-walsegsize-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-02-27 Thread Beena Emerson

On Tue, Feb 28, 2017 at 9:45 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:

> On 2/24/17 6:30 AM, Kuntal Ghosh wrote:
>
>> * You're considering any WAL file with a power of 2 as valid. Suppose,
>> the correct WAL seg size is 64mb. For some reason, the server
>> generated a 16mb invalid WAL file(maybe it crashed while creating the
>> WAL file). Your code seems to treat this as a valid file which I think
>> is incorrect. Do you agree with that?
>>
>
> Detecting correct WAL size based on the size of a random WAL file seems
> like a really bad idea to me.

> I also don't see the reason for #2... or is that how initdb writes out the
> correct control file?

The initdb module reads the size from the option provided and sets the
environment variable. This variable is read
in src/backend/access/transam/xlog.c and the ControlFile written.
Unlike pg_resetwal and pg_rewind, pg_basebackup cannot access the Control
file. It only accesses the wal log folder.  So we get the XLogSegSize from
the SHOW command using  replication connection.
As Kuntal pointed out, I might need to set it from  pg_receivewal.c as
well.

Thank you,

Beena Emerson

EnterpriseDB: https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-02-26 Thread Beena Emerson

Hello,

Since the following commit, does not enable us to apply the patch cleanly,
I have attached a rebased patch 02.  patch 01 does not have any problems.
commit 9e3755ecb2d058f7d123dd35a2e1784006190962
Author: Tom Lane <t...@sss.pgh.pa.us>
Date:   Sat Feb 25 16:12:24 2017 -0500

Remove useless duplicate inclusions of system header files.


-- 
Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


02-initdb-walsegsize-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-02-23 Thread Beena Emerson

On Mon, Feb 20, 2017 at 2:47 PM, Beena Emerson <memissemer...@gmail.com>
wrote:

> Hello,
>
> On Thu, Feb 16, 2017 at 3:37 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
> wrote:
>
>> On Mon, Feb 6, 2017 at 11:09 PM, Beena Emerson <memissemer...@gmail.com>
>> wrote:
>> >
>> > Hello,
>> > PFA the updated patches.
>> I've started reviewing the patches.
>> 01-add-XLogSegmentOffset-macro.patch looks clean to me. I'll post my
>> detailed review after looking into the second patch. But, both the
>> patches needs a rebase based on the commit 85c11324cabaddcfaf3347df7
>> (Rename user-facing tools with "xlog" in the name to say "wal").
>>
>>
>
> PFA the rebased patches.
>
>
> Hello,

The recent commit  c29aff959dc64f7321062e7f33d8c6ec23db53d has again
changed the code and the second patch cannot be applied cleanly. Please
find attached the rebased 02 patch. 01 patch is the same .



-- 
Thank you,

Beena Emerson

Have a Great Day!


01-add-XLogSegmentOffset-macro.patch
Description: Binary data


02-initdb-walsegsize-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-02-20 Thread Beena Emerson

Hello,

On Thu, Feb 16, 2017 at 3:37 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
wrote:

> On Mon, Feb 6, 2017 at 11:09 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> >
> > Hello,
> > PFA the updated patches.
> I've started reviewing the patches.
> 01-add-XLogSegmentOffset-macro.patch looks clean to me. I'll post my
> detailed review after looking into the second patch. But, both the
> patches needs a rebase based on the commit 85c11324cabaddcfaf3347df7
> (Rename user-facing tools with "xlog" in the name to say "wal").
>
>

PFA the rebased patches.


Thank you,

Beena Emerson

Have a Great Day!


01-add-XLogSegmentOffset-macro.patch
Description: Binary data


02-initdb-walsegsize-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-08 Thread Beena Emerson

Hello,

On Wed, Feb 8, 2017 at 3:40 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Tue, Feb 7, 2017 at 5:14 PM, Mithun Cy <mithun...@enterprisedb.com>
> wrote:
> > On Tue, Feb 7, 2017 at 12:24 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> >> On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> >>> Are 2 workers required?
> >>>
> >>
> >> I think in the new design there is a provision of launching the worker
> >> dynamically to dump the buffers, so there seems to be a need of
> >> separate workers for loading and dumping the buffers.  However, there
> >> is no explanation in the patch or otherwise when and why this needs a
> >> pair of workers.  Also, if the dump interval is greater than zero,
> >> then do we really need to separately register a dynamic worker?
> >
> > We have introduced a new value  -1 for pg_prewarm.dump_interval this
> > means we will not dump at all, At this state, I thought auto
> > pg_prewarm process need not run at all, so I coded to exit the auto
> > pg_prewarm immediately. But If the user decides to start the auto
> > pg_prewarm to dump only without restarting the server, I have
> > introduced a launcher function "launch_pg_prewarm_dump" to restart the
> > auto pg_prewarm only to dump. Since now we can launch worker only to
> > dump, I thought we can redistribute the code between two workers, one
> > which only does prewarm (load only) and another dumps periodically.
> > This helped me to modularize and reuse the code. So once load worker
> > has finished its job, it registers a dump worker and then exists.
> > But if max_worker_processes is not enough to launch the "auto
> > pg_prewarm dump" bgworker
> > We throw an error
> > 2017-02-07 14:51:59.789 IST [50481] ERROR:  registering dynamic
> > bgworker "auto pg_prewarm dump" failed c
> > 2017-02-07 14:51:59.789 IST [50481] HINT:  Consider increasing
> > configuration parameter "max_worker_processes".
> >
> > Now thinking again instead of such error and then correcting same by
> > explicitly launching the auto pg_prewarm dump bgwroker through
> > launch_pg_prewarm_dump(), I can go back to original design where there
> > will be one worker which loads and then dumps periodically. And
> > launch_pg_prewarm_dump will relaunch dump only activity of that
> > worker. Does this sound good?
> >
>
> Won't it be simple if you consider -1 as a value to just load library?
>  For *_interval = -1, it will neither load nor dump.
>
>
+1
That is what I thought was the behaviour we decided upon for -1.



-- 
Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-07 Thread Beena Emerson

Hello,

On Tue, Feb 7, 2017 at 5:14 PM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

> On Tue, Feb 7, 2017 at 12:24 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> > On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> >> Are 2 workers required?
> >>
> >
> > I think in the new design there is a provision of launching the worker
> > dynamically to dump the buffers, so there seems to be a need of
> > separate workers for loading and dumping the buffers.  However, there
> > is no explanation in the patch or otherwise when and why this needs a
> > pair of workers.  Also, if the dump interval is greater than zero,
> > then do we really need to separately register a dynamic worker?
>
> We have introduced a new value  -1 for pg_prewarm.dump_interval this
> means we will not dump at all, At this state, I thought auto
> pg_prewarm process need not run at all, so I coded to exit the auto
> pg_prewarm immediately. But If the user decides to start the auto
> pg_prewarm to dump only without restarting the server, I have
> introduced a launcher function "launch_pg_prewarm_dump" to restart the
> auto pg_prewarm only to dump. Since now we can launch worker only to
> dump, I thought we can redistribute the code between two workers, one
> which only does prewarm (load only) and another dumps periodically.
> This helped me to modularize and reuse the code. So once load worker
> has finished its job, it registers a dump worker and then exists.
> But if max_worker_processes is not enough to launch the "auto
> pg_prewarm dump" bgworker
> We throw an error
> 2017-02-07 14:51:59.789 IST [50481] ERROR:  registering dynamic
> bgworker "auto pg_prewarm dump" failed c
> 2017-02-07 14:51:59.789 IST [50481] HINT:  Consider increasing
> configuration parameter "max_worker_processes".
>
> Now thinking again instead of such error and then correcting same by
> explicitly launching the auto pg_prewarm dump bgwroker through
> launch_pg_prewarm_dump(), I can go back to original design where there
> will be one worker which loads and then dumps periodically. And
> launch_pg_prewarm_dump will relaunch dump only activity of that
> worker. Does this sound good?
>

 Yes it would be better to have only one pg_prewarm worker as the loader is
idle for the entire server run time after the initial load activity of few
secs.

-- 
Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-07 Thread Beena Emerson

Hello,

On Tue, Feb 7, 2017 at 5:52 PM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

> Thanks Beena,
>
> On Tue, Feb 7, 2017 at 4:46 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> > Few more comments:
> >
> > = Background worker messages:
> >
> > - Workers when launched, show messages like: "logical replication
> launcher
> > started”, "autovacuum launcher started”. We should probably have a
> similar
> > message to show that the pg_prewarm load and dump bgworker has started.
>
> -- Thanks, I will add startup and shutdown message.
>
> > - "auto pg_prewarm load: number of buffers to load x”, other messages
> show
> > space before and after “:”, we should keep it consistent through out.
> >
>
> -- I think you are testing patch 03. The latest patch_04 have
> corrected same. Can you please re-test it.
>

I had initially written comments for 03 and then again tested for 04 and
retained comments valid for it. I guess I missed to removed this. Sorry.


>
> >
> > = Action of -1.
> > I thought we decided that interval value of -1 would mean that the auto
> > prewarm worker will not be run at all. With current code, -1 is
> explained to
> > mean it will not dump. I noticed that reloading with new option as -1
> stops
> > both the workers but restarting loads the data and then quits. Why does
> it
> > allow loading with -1? Please explain this better in the documents.
> >
>
> -- '-1' means we do not want to dump at all. So we decide not to
> continue with launched bgworker and it exits. As per your first
> comment, if I register the startup and shutdown messages for auto
> pg_prewarm I think it will look better. Will try to explain it in a
> better way in documentation. The "auto pg_prewarm load" will not be
> affected with dump_interval value. It will always start, load(prewarm)
> and then exit.
>
> >
> > = launch_pg_prewarm_dump()
>
> > =# SELECT launch_pg_prewarm_dump();
> >  launch_pg_prewarm_dump
> > 
> >   53552
> > (1 row)
> >
> > $ ps -ef | grep 53552
> > b_emers+  53555   4391  0 16:21 pts/100:00:00 grep --color=auto 53552
>
> -- If dump_interval = -1 "auto pg_prewarm dump" will exit immediately.
>
> > = Function names
> > - load_now could be better named as load_file, load_dumpfile or similar.
> > - dump_now -> dump_buffer or  better?
>
> I did choose load_now and dump_now to indicate we are doing it
> immediately as invoking them was based on a timer/state. Probably we
> can improve that but dump_buffer, load_file may not be the right
> replacement.
>
> >
> > = Corrupt file
> > if the dump file is corrupted, the system crashes and the prewarm
> bgworkers
> > are not restarted. This needs to be handled better.
> >
> > WARNING:  terminating connection because of crash of another server
> process
> > 2017-02-07 16:36:58.680 IST [54252] DETAIL:  The postmaster has commanded
> > this server process to roll back the current transaction and exit,
> because
> > another server process exited abnormally and possibly corrupted shared
> > memory
>
> --- Can you please paste you autopgprewarm.save file, I edited the
> file manually to some illegal entry but did not see any crash.  Only
> we failed to load as block number were invalid. Please share your
> tests so that I can reproduce same.
>

I only changed the fork number from 0 to 10 in one of the entry.


> > = Documentation
> >
> > I feel the documentation needs to be improved greatly.
> >
> > - The first para in pg_prewarm should mention the autoload feature too.
> >
> > - The new section should be named “The pg_prewarm autoload” or something
> > better. "auto pg_prewarm bgworker” does not seem appropriate.  The
> > configuration parameter should also be listed out clearly like in
> auth-delay
> > page. The new function launch_pg_prewarm_dump() should be listed under
> > Functions.
>
> -- Thanks I will try to improve the documentation. And, put more details
> there.
>
>

--
Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-07 Thread Beena Emerson

On Tue, Feb 7, 2017 at 3:01 PM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

> On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> > launched by other  applications. Also with max_worker_processes = 2 and
> > restart, the system crashes when the 2nd worker is not launched:
> > 2017-02-07 11:36:39.132 IST [20573] LOG:  auto pg_prewarm load : number
> of
> > buffers actually tried to load 64
> > 2017-02-07 11:36:39.143 IST [18014] LOG:  worker process: auto pg_prewarm
> > load (PID 20573) was terminated by signal 11: Segmentation fault
>
> SEGFAULT was the coding mistake I have called the C-language function
> directly without initializing the functioncallinfo. Thanks for
> raising. Below patch fixes same.
>
> --
> Thanks and Regards
> Mithun C Y
> EnterpriseDB: http://www.enterprisedb.com
>


Few more comments:

= Background worker messages:

- Workers when launched, show messages like: "logical replication launcher
started”, "autovacuum launcher started”. We should probably have a similar
message to show that the pg_prewarm load and dump bgworker has started.

- "auto pg_prewarm load: number of buffers to load x”, other messages show
space before and after “:”, we should keep it consistent through out.


= Action of -1.
I thought we decided that interval value of -1 would mean that the auto
prewarm worker will not be run at all. With current code, -1 is explained
to mean it will not dump. I noticed that reloading with new option as -1
stops both the workers but restarting loads the data and then quits. Why
does it allow loading with -1? Please explain this better in the documents.


= launch_pg_prewarm_dump()
With dump_interval=-1, Though function returns a pid, this worker is not
running in the 04 patch. 03 version it was launching. Dumping is not done
now.

=# SELECT launch_pg_prewarm_dump();
 launch_pg_prewarm_dump

  53552
(1 row)

$ ps -ef | grep 53552
b_emers+  53555   4391  0 16:21 pts/100:00:00 grep --color=auto 53552


= Function names
- load_now could be better named as load_file, load_dumpfile or similar.
- dump_now -> dump_buffer or  better?


= Corrupt file
if the dump file is corrupted, the system crashes and the prewarm bgworkers
are not restarted. This needs to be handled better.

WARNING:  terminating connection because of crash of another server process
2017-02-07 16:36:58.680 IST [54252] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory


= Documentation

I feel the documentation needs to be improved greatly.

- The first para in pg_prewarm should mention the autoload feature too.

- The new section should be named “The pg_prewarm autoload” or something
better. "auto pg_prewarm bgworker” does not seem appropriate.  The
configuration parameter should also be listed out clearly like in
auth-delay page. The new function launch_pg_prewarm_dump() should be listed
under Functions.

-- 
Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-06 Thread Beena Emerson

Hello,

Thank you for the updated patch.

On Tue, Feb 7, 2017 at 10:44 AM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

> Hi all,
> Here is the new patch which fixes all of above comments, I changed the
> design a bit now as below
>
> What is it?
> ===
> A pair of bgwrokers one which automatically dumps buffer pool's block
> info at a given interval and another which loads those block into
> buffer pool when
> the server restarts.
>

Are 2 workers required? This would reduce the number of workers to be
launched by other  applications. Also with max_worker_processes = 2 and
restart, the system crashes when the 2nd worker is not launched:
2017-02-07 11:36:39.132 IST [20573] LOG:  auto pg_prewarm load : number of
buffers actually tried to load 64
2017-02-07 11:36:39.143 IST [18014] LOG:  worker process: auto pg_prewarm
load (PID 20573) was terminated by signal 11: Segmentation fault

I think the document should also mention that an appropriate
max_worker_processes should be set else the dump worker will not be
launched at all.

-- 
Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

2017-02-06 Thread Beena Emerson

Hello,
PFA the updated patches.

On Fri, Jan 27, 2017 at 2:17 PM, Beena Emerson <memissemer...@gmail.com>
wrote:

> Hello Andres,
>
> Thank you for your review.
>
> On Fri, Jan 27, 2017 at 12:39 AM, Andres Freund <and...@anarazel.de>
> wrote:
>
>> Hi,
>>
>> On 2017-01-23 11:35:11 +0530, Beena Emerson wrote:
>> > Please find attached an updated WIP patch. I have incorporated almost
>> all
>> > comments. This is to be applied over Robert's patches. I will post
>> > performance results later on.
>> >
>> > 1. shift (>>) and AND (&) operations: The assign hook of
>> wal_segment_size
>> > sets the WalModMask and WalShiftBit. All the modulo and division
>> operations
>> > using XLogSegSize has been replaced with these. However, there are many
>> > preprocessors which divide with XLogSegSize in xlog_internal.h. I have
>> not
>> > changed these because it would mean I will have to reassign the
>> WalShiftBit
>> > along with XLogSegSize in all the modules which use these macros. That
>> does
>> > not seem to be a good idea. Also, this means shift operator can be used
>> > only in couple of places.
>>
>> I think it'd be better not to have XLogSegSize anymore. Silently
>> changing a macros behaviour from being a compile time constant to
>> something runtime configurable is a bad idea.
>>
>
> I dont think I understood u clearly. You mean convert the macros using
> XLogSegSize to functions?
>

I have moved the ModMask related changes to a separate patch
 01-add-XLogSegmentOffset-macro.patch. This creates the macro
XLogSegmentOffset and replaces all "% XLogSegSize" and "% XLOG_SEG_SIZE"
with this macro.
I have not included the shift operator because the changes only apply to
about 4 lines did not give any performance boost or such.

Hm. Are GUC hooks a good way to compute the masks?  Interdependent GUCs
>> are unfortunately not working well, and several GUCs might end up
>> depending on these.  I think it might be better to assign the variables
>> somewhere early in StartupXLOG() or such.
>>
>
> I am not sure about these interdependent GUCs. I need to study this better
> and make changes as required.
>
>
The process flow is such thatthe Control File which sets the XLogSegSIze is
read after the GUC options are initialized. StartupXLOG is called by
StartupProcessMain() which restores the XLOG and then exits. Hence he value
initialised here are not persistent throughout the postgres run. It throws
error during pg_ctl stop.
The XLogSegSize adjustment in assign hooks have been removed and a new
macro ConvertToXSegs is used to convert the min and max wal_size to the
segment count when required. wal_segment_size set from ReadControlFile also
affects the Checkpointsegment value and hence the assign_wal_segment_size
calls CalculateCheckpointSegments.

Documentation is updated


Performance Tests:

I ran pgbench tests for different wal segment size on database of scale
factor 300 with shared_buffers of 8GB. Each of the tests ran for 10 min and
a median of 3 readings were considered. The following table shows the
performance of the patch wrt the HEAD for different client count for
various wal-segsize value. We could say that there is not performance
difference.

16 32 64 128





8MB -1.36 0.02 0.43 -0.24
16MB -0.38 0.18 -0.09 0.4
32MB -0.52 0.29 0.39 0.59
64MB -0.15 0.04 0.52 0.38



-- 
Thank you,

Beena Emerson

Have a Great Day!


01-add-XLogSegmentOffset-macro.patch
Description: Binary data


02-initdb-walsegsize-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-01-27 Thread Beena Emerson

Hello Andres,

Thank you for your review.

On Fri, Jan 27, 2017 at 12:39 AM, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2017-01-23 11:35:11 +0530, Beena Emerson wrote:
> > Please find attached an updated WIP patch. I have incorporated almost all
> > comments. This is to be applied over Robert's patches. I will post
> > performance results later on.
> >
> > 1. shift (>>) and AND (&) operations: The assign hook of wal_segment_size
> > sets the WalModMask and WalShiftBit. All the modulo and division
> operations
> > using XLogSegSize has been replaced with these. However, there are many
> > preprocessors which divide with XLogSegSize in xlog_internal.h. I have
> not
> > changed these because it would mean I will have to reassign the
> WalShiftBit
> > along with XLogSegSize in all the modules which use these macros. That
> does
> > not seem to be a good idea. Also, this means shift operator can be used
> > only in couple of places.
>
> I think it'd be better not to have XLogSegSize anymore. Silently
> changing a macros behaviour from being a compile time constant to
> something runtime configurable is a bad idea.
>

I dont think I understood u clearly. You mean convert the macros using
XLogSegSize to functions?


> > 8. Declaring XLogSegSize: There are 2 internal variables for the same
> > parameter. In original code XLOG_SEG_SIZE is defined in the
> auto-generated
> > file src/include/pg_config.h. And xlog_internal.h defines:
> >
> > #define XLogSegSize ((uint32) XLOG_SEG_SIZE)
> >
> > To avoid renaming all parts of code, I made the following change in
> > xlog_internal.h
> >
> > + extern uint32 XLogSegSize;
> >
> > +#define XLOG_SEG_SIZE XLogSegSize
> >
> >  would it be better to just use one variable XLogSegSize everywhere. But
> > few external modules could be using XLOG_SEG_SIZE. Thoughts?
>
> They'll quite possibly break with configurable size anyway.  So I'd
> rather have those broken explicitly.
>

Ok. I will remove the XLOG_SEGS_SIZE variable then?


> > +/*
> > + * These variables are set in assign_wal_segment_size
> > + *
> > + * WalModMask: It is an AND mask for XLogSegSize to allow for faster
> modulo
> > + *   operations using it.
> > + *
> > + * WalShiftBit: It is an shift bit for XLogSegSize to allow for faster
> > + *   division operations using it.
> > + *
> > + * UsableBytesInSegment: It is the number of bytes in a WAL segment
> usable for
> > + *   WAL data.
> > + */
> > +uint32   WalModMask;
> > +static int   UsableBytesInSegment;
> > +static int   WalShiftBit;
>
> This could use some editorializing. "Faster modulo operations" isn't an
> explaining how/why it's actually being used. Same for WalShiftBit.
>

 I will change these comments.


>
> >  /*
> >   * Private, possibly out-of-date copy of shared LogwrtResult.
> > @@ -957,6 +975,7 @@ XLogInsertRecord(XLogRecData *rdata,
> >   if (!XLogInsertAllowed())
> >   elog(ERROR, "cannot make new WAL entries during recovery");
> >
> > +
> >   /*--
> >*
>
> Spurious newline change.
>
> >   if (ptr % XLOG_BLCKSZ == SizeOfXLogShortPHD &&
> > - ptr % XLOG_SEG_SIZE > XLOG_BLCKSZ)
> > + (ptr & WalModMask) > XLOG_BLCKSZ)
> >   initializedUpto = ptr - SizeOfXLogShortPHD;
> >   else if (ptr % XLOG_BLCKSZ == SizeOfXLogLongPHD &&
> > -  ptr % XLOG_SEG_SIZE < XLOG_BLCKSZ)
> > +  (ptr & WalModMask) < XLOG_BLCKSZ)
> >   initializedUpto = ptr - SizeOfXLogLongPHD;
> >   else
> >   initializedUpto = ptr;
>
> How about we introduce a XLogSegmentOffset(XLogRecPtr) function like
> macro in a first patch?  That'll reduce the amount of change in the
> commit actually changing things quite noticeably, and makes it easier to
> adjust things later.  I see very little benefit for in-place usage of
> either % XLOG_SEG_SIZE or & WalModMask.
>

I will check this.


>
>
> > @@ -1794,6 +1813,7 @@ XLogBytePosToRecPtr(uint64 bytepos)
> >   uint32  seg_offset;
> >   XLogRecPtr  result;
> >
> > +
> >   fullsegs = bytepos / UsableBytesInSegment;
> >   bytesleft = bytepos % UsableBytesInSegment;
>
> spurious change.
>
> > @@ -1878,7 +1898,7 @@ XLogRe

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-26 Thread Beena Emerson

On Fri, Jan 27, 2017 at 8:14 AM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Thu, Jan 26, 2017 at 8:45 PM, Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com> wrote:
> > On 1/24/17 3:26 AM, Mithun Cy wrote:
> >> In my code by default, we only dump at shutdown time. If we want to
> >> dump at regular interval then we need to set the GUC
> >> pg_autoprewarm.buff_dump_interval to > 0.
> >
> > Just a thought with an additional use case:  If I want to set up a
> > standby for offloading queries, could I take the dump file from the
> > primary or another existing standby, copy it to the new standby, and
> > have it be warmed up to the state of the other instance from that?
> >
> > In my experience, that kind of use is just as interesting as preserving
> > the buffers across a restart.
> >
>
> An interesting use case.  I am not sure if somebody has tried that way
> but it appears to me that the current proposed patch should work for
> this use case.
>
>
Even I feel this should work.
In that case, we could add the file location parameter.  By default it
would store in the cluster directory else in the location provided. We can
update this parameter in standby for it to access the file.
Thoughts?


-- 
Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] \h tab-completion

2017-01-24 Thread Beena Emerson

On Wed, Jan 25, 2017 at 11:43 AM, Michael Paquier <michael.paqu...@gmail.com
> wrote:

> On Wed, Jan 25, 2017 at 3:03 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> > I think the following change in tab-complete.c would do the trick.
> >
> > -   else if (Matches1("ALTER"))
> > +   else if (TailMatches1("ALTER"))
>
> Nope. This change breaks a bunch of subcommands, take for example
> ALTER TABLE foo ALTER, which would be completed to all the potential
> objects of ALTER commands with your patch, but in this case for
> example we just need to look at the column names, CONSTRAINT and
> COLUMN. CREATE is not part of any subcommands so that's easy to see it
> work with \h. What I think you should do is making the code path of
> \\h smarter with some exceptions by using TailMatchesCS2() for ALTER.
> There is as well the case of DROP commands that should be treated by
> the way.
>
>
I feared the same.
I will check this.

-- 
Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] \h tab-completion

2017-01-24 Thread Beena Emerson

On Wed, Jan 25, 2017 at 9:03 AM, Stephen Frost <sfr...@snowman.net> wrote:

> All,
>
> I'm not really inclined to do it myself right now, but it'd be awful
> nice if we had better table completion for \h.
>
> Right now, '\h alter' returns nothing, and '\h alter' returns a
> *bunch* of stuff.
>
> Yet, we happily support '\h alter view' and friends, returning just the
> info relevant for that particular command.
>
> This would be a good starter project for someone new to work on, imv,
> tho it could also go on the to-do list.
>
> Thanks!
>
>

I think the following change in tab-complete.c would do the trick.

-   else if (Matches1("ALTER"))
+   else if (TailMatches1("ALTER"))


postgres=# \h ALTER
AGGREGATE DOMAINFUNCTION
 MATERIALIZED VIEW RULE  SYSTEMTYPE
COLLATION EVENT TRIGGER GROUP OPERATOR
 SCHEMATABLE USER
CONVERSIONEXTENSION INDEX POLICY
 SEQUENCE  TABLESPACEUSER MAPPING FOR
DATABASE  FOREIGN DATA WRAPPER  LANGUAGE
 PUBLICATION   SERVERTEXT SEARCH   VIEW
DEFAULT PRIVILEGESFOREIGN TABLE LARGE OBJECT  ROLE
 SUBSCRIPTION  TRIGGER


-- 
Thank you,

Beena Emerson

Have a Great Day!


tab-complete-hALTER.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-24 Thread Beena Emerson

On Wed, Jan 25, 2017 at 10:36 AM, Jim Nasby <jim.na...@bluetreble.com>
wrote:

> On 1/24/17 2:26 AM, Mithun Cy wrote:
>
>> Thanks for looking into this patch, I just downloaded the patch and
>> applied same to the latest code, I can see file " autoprewarm.save" in
>> $PGDATA which is created and dumped at shutdown time and an activity
>> is logged as below
>> 2017-01-24 13:22:25.012 IST [91755] LOG:  Buffer Dump: saved metadata
>> of 59 blocks.
>>
>
> Yeah, I wasn't getting that at all, though I did see the shared library
> being loaded. If I get a chance I'll try it again.



Hope u added the following to the conf file:

shared_preload_libraries = 'pg_autoprewarm' # (change requires restart)
pg_autoprewarm.buff_dump_interval=20

Even after this u could not see the message then that's strange.

-- 
Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-24 Thread Beena Emerson

On Tue, Jan 24, 2017 at 1:56 PM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

> On Tue, Jan 24, 2017 at 5:07 AM, Jim Nasby <jim.na...@bluetreble.com>
> wrote:
> > I took a look at this again, and it doesn't appear to be working for me.
> The library is being loaded during startup, but I don't see any further
> activity in the log, and I don't see an autoprewarm file in $PGDATA.
>
> Hi Jim,
> Thanks for looking into this patch, I just downloaded the patch and
> applied same to the latest code, I can see file " autoprewarm.save" in
> $PGDATA which is created and dumped at shutdown time and an activity
> is logged as below
> 2017-01-24 13:22:25.012 IST [91755] LOG:  Buffer Dump: saved metadata
> of 59 blocks.
>
> In my code by default, we only dump at shutdown time. If we want to
> dump at regular interval then we need to set the GUC
> pg_autoprewarm.buff_dump_interval to > 0. I think I am missing
> something while trying to recreate the bug reported above. Can you
> please let me know what exactly you mean by the library is not
> working.
>
> > There needs to be some kind of documentation change as part of this
> patch.
> Thanks, I will add a sgml for same.
>
> For remaining suggestions, I will try to address in my next patch
> based on its feasibility.
>
>
>
The patch works for me too.

Few initial comments:

1. I think the README was maintained as is from the 1st version and says
pg_autoprewarm is a contrib module. It should be rewritten to
pg_autoprewarm is a part of pg_prewarm module. The documentation should be
added to pgprewarm.sgml instead of the README

2. buff_dump_interval could be renamed to just dump_interval or
buffer_dump_interval. Also, since it is now part of pg_prewarm. I think it
makes sense to have the conf parameter be: pg_prewarm.xxx instead of
pg_autoprewarm.xxx

3. During startup we get the following message:

2017-01-24 16:13:57.615 IST [90061] LOG:  Num buffers : 272

I could be better written as “pg_prewarm: 272 blocks loaded from dump” or
something similar.

4. Also, the message while dumping says:

2017-01-24 16:15:17.712 IST [90061] LOG:  Buffer Dump: saved metadata of
272 blocks

It would be better to write the module name in message instead of "Buffer
Dump"

Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

2017-01-22 Thread Beena Emerson

Hello,

Please find attached an updated WIP patch. I have incorporated almost all
comments. This is to be applied over Robert's patches. I will post
performance results later on.

1. shift (>>) and AND (&) operations: The assign hook of wal_segment_size
sets the WalModMask and WalShiftBit. All the modulo and division operations
using XLogSegSize has been replaced with these. However, there are many
preprocessors which divide with XLogSegSize in xlog_internal.h. I have not
changed these because it would mean I will have to reassign the WalShiftBit
along with XLogSegSize in all the modules which use these macros. That does
not seem to be a good idea. Also, this means shift operator can be used
only in couple of places.

2. pg_standby: it deals with WAL files, so I have used the file size to set
the XLogSegSize (similar to pg_xlogdump).  Also, macro
MaxSegmentsPerLogFile using  XLOG_SEG_SIZE  is now defined
in SetWALFileNameForCleanup where it is used. Since XLOG_SEG_SIZE is not
preset, the code which throws an message if the file size is greater than
XLOG_SEG_SIZE had to be removed.

3. XLOGChooseNumBuffers: This function, called during the creation of
Shared Memory Segment, requires XLogSegSize which is set from the
ControlFile. Hence temporarily read the ControlFile in XLOGShmemSize before
invoking  XLOGChooseNumBuffer. The ControlFile is read again and stored on
the Shared Memory later on.

4. IsValidXLogSegSize: This is a macro to verify the XLogSegSize. This is
used in initdb, pg_xlogdump, pg_standby.

5. Macro for power2: There were couple of ideas to make it centralised.
For now, I have just defined it in xlog_internal.

6. Since CRC is used to verify the ControlFile before reading all the
contents from it as is and I do not see the need to re-verify the
xlog_seg_size.

7. min/max_wal_size still take values in KB unit and internally store it as
segment count. Though the calculation is now shifted to their respective
assign hook as the GUC_UNIT_XSEGS had to be removed.

8. Declaring XLogSegSize: There are 2 internal variables for the same
parameter. In original code XLOG_SEG_SIZE is defined in the auto-generated
file src/include/pg_config.h. And xlog_internal.h defines:

#define XLogSegSize ((uint32) XLOG_SEG_SIZE)

To avoid renaming all parts of code, I made the following change in
xlog_internal.h

+ extern uint32 XLogSegSize;

+#define XLOG_SEG_SIZE XLogSegSize

 would it be better to just use one variable XLogSegSize everywhere. But
few external modules could be using XLOG_SEG_SIZE. Thoughts?

9. Documentation will be added in next version of patch.
-- 

Beena Emerson

On Sat, Jan 21, 2017 at 5:30 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Sat, Jan 21, 2017 at 4:50 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Fri, Jan 20, 2017 at 2:34 AM, Michael Paquier
> > <michael.paqu...@gmail.com> wrote:
> >> On Fri, Jan 20, 2017 at 12:49 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> >>> On Wed, Jan 18, 2017 at 12:42 PM, Robert Haas <robertmh...@gmail.com>
> wrote:
> >>>> Problems 2-4 actually have to do with a DestReceiver of type
> >>>> DestRemote really, really wanting to have an associated Portal and
> >>>> database connection, so one approach is to create a stripped-down
> >>>> DestReceiver that doesn't care about those things and then passing
> >>>> that to GetPGVariable.
> >>>
> >>> I tried that and it worked out pretty well, so I'm inclined to go with
> >>> this approach.  Proposed patches attached.  0001 adds the new
> >>> DestReceiver type, and 0002 is a revised patch to implement the SHOW
> >>> command itself.
> >>>
> >>> Thoughts, comments?
> >>
> >> This looks like a sensible approach to me. DestRemoteSimple could be
> >> useful for background workers that are not connected to a database as
> >> well. Isn't there a problem with PGC_REAL parameters?
> >
> > No, because the output of SHOW is always of type text, regardless of
> > the type of the GUC.
>
> Thinking about that over night, that looks pretty nice. What would be
> nicer though would be to add INT8OID and BYTEAOID in the list, and
> convert as well the other replication commands. At the end, I think
> that we should finish by being able to remove all pq_* routine
> dependencies in walsender.c, saving quite a couple of lines.
> --
> Michael
>

-- 
Thank you,

Beena Emerson

Have a Great Day!

initdb-walsegsize-v2_WIP.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-01-17 Thread Beena Emerson

On Tue, Jan 17, 2017 at 12:50 PM, Beena Emerson <memissemer...@gmail.com>
wrote:

>
>
> On Tue, Jan 17, 2017 at 12:38 PM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>> On Tue, Jan 17, 2017 at 4:06 PM, Beena Emerson <memissemer...@gmail.com>
>> wrote:
>> > I have already made patch for the generic SHOW replication command
>> > (attached) and am working on the new initdb patch based on that.
>> > I have not yet fixed the pg_standby issue. I am trying to address all
>> the
>> > comments and bugs still.
>>
>> Having documentation for this patch in protocol.sgml would be nice.
>>
>
> Yes. I will add that.
>
>
>
PFA the patch with the documentation included.


-- 
Thank you,

Beena Emerson

Have a Great Day!


repl_show_cmd_with_docs.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increasing the default WAL segment size

2017-01-16 Thread Beena Emerson

On Tue, Jan 17, 2017 at 12:38 PM, Michael Paquier <michael.paqu...@gmail.com
> wrote:

> On Tue, Jan 17, 2017 at 4:06 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> > I have already made patch for the generic SHOW replication command
> > (attached) and am working on the new initdb patch based on that.
> > I have not yet fixed the pg_standby issue. I am trying to address all the
> > comments and bugs still.
>
> Having documentation for this patch in protocol.sgml would be nice.
>

Yes. I will add that.

-- 
Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

2017-01-16 Thread Beena Emerson

On Tue, Jan 17, 2017 at 12:18 PM, Michael Paquier <michael.paqu...@gmail.com
> wrote:

> On Sun, Jan 8, 2017 at 9:52 AM, Jim Nasby <jim.na...@bluetreble.com>
> wrote:
> >> I agree pretty_print_kb would have been a better for this function.
> >> However, I have realised that using the show hook and this function is
> >> not suitable and have found a better way of handling the removal of
> >> GUC_UNIT_XSEGS which no longer needs this function : using the
> >> GUC_UNIT_KB, convert the value in bytes to wal segment count instead in
> >> the assign hook. The next version of patch will use this.
> >
> >
> > ... it sounds like you're going back to exposing KB to users, and that's
> all
> > that really matters.
> >
> >> IMHO it'd be better to use the n & (n-1) check detailed at [3].
>
> That would be better.
>
> So I am looking at the proposed patch, though there have been reviews
> the patch was in "Needs Review" state, and as far as I can see it is a
> couple of things for frontends. Just by grepping for XLOG_SEG_SIZE I
> have spotted the following problems:
> - pg_standby uses it to know about the next segment available.
>

Yes. I am aware of this and had mentioned it in my post.


> - pg_receivexlog still uses it in segment handling.
> It may be a good idea to just remove XLOG_SEG_SIZE and fix the code
> paths that fail to compile without it, frontend utilities included
> because a lot of them now rely on the value coded in xlog_internal.h,
> but with this patch the value is set up in the context of initdb. And
> this would induce major breakages in many backup tools, pg_rman coming
> first in mind... We could replace it with for example a macro that
> frontends could use to check if the size of the WAL segment is in a
> valid range if the tool does not have direct access to the Postgres
> instance (aka the size of the WAL segment used there) as there are as
> well offline tools.
>

I will see whats the best way to do this.


>
> -#define XLogSegSize((uint32) XLOG_SEG_SIZE)
> +
> +extern uint32 XLogSegSize;
> +#define XLOG_SEG_SIZE XLogSegSize
> This bit is really bad for frontend declaring xlog_internal.h...
>
> --- a/src/bin/pg_test_fsync/pg_test_fsync.c
> +++ b/src/bin/pg_test_fsync/pg_test_fsync.c
> @@ -62,7 +62,7 @@ static const char *progname;
>
>  static int secs_per_test = 5;
>  static int needs_unlink = 0;
> -static char full_buf[XLOG_SEG_SIZE],
> +static char full_buf[DEFAULT_XLOG_SEG_SIZE],
> This would make sense as a new option of pg_test_fsync.
>
> A performance study would be a good idea as well. Regarding the
> generic SHOW command in the replication protocol, I may do it for next
> CF, I have use cases for it in my pocket.
>
>
Thank you for your review.

I have already made patch for the generic SHOW replication command
(attached) and am working on the new initdb patch based on that.
I have not yet fixed the pg_standby issue. I am trying to address all the
comments and bugs still.


-- 


Beena Emerson

Have a Great Day!


repl_show_cmd.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-11 Thread Beena Emerson

Hello,

On Wed, Jan 11, 2017 t 6:06 PM, Beena Emerson <memissemer...@gmail.com>
wrote:

>
>
> On Fri, Jan 6, 2017 at 11:54 AM, Ryan Murphy <ryanfmur...@gmail.com>
> wrote:
>
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  tested, failed
>> Implements feature:   tested, passed
>> Spec compliant:   tested, passed
>> Documentation:tested, passed
>>
>> (Though I could not check "make installcheck-world" as passed because it
>> failed 1 test, I think it basically SHOULD pass - see my comment below.)
>>
>> Patch looks good to me and does what we talked about, and Docs seem clear
>> and correct.
>>
>> I was able to build Postgres and run pg_ctl and observe that it waited by
>> default for the 'start' action, which addresses my original concern.
>>
>> `make` and `make install` went fine, and `make check` did as well, but
>> `make installcheck-world` said (after a while):
>>
>> ===
>>  1 of 55 tests failed.
>> ===
>>
>>
>>
> I am sure you would get this error even without the patch.
>
>
>
The patch is good. I do not have any comments to make about the patch.

Ryan try to run 'make install-world' then 'make -i installcheck-world', -i
option will ignore the error and proceed. You can check if any other tests
fails. This is a separate issue, unrelated to this patch. I do not think we
should stop from changing the status because of this.

The status is now updated to 'Ready for committer'

Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-11 Thread Beena Emerson

On Fri, Jan 6, 2017 at 11:54 AM, Ryan Murphy <ryanfmur...@gmail.com> wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
>
> (Though I could not check "make installcheck-world" as passed because it
> failed 1 test, I think it basically SHOULD pass - see my comment below.)
>
> Patch looks good to me and does what we talked about, and Docs seem clear
> and correct.
>
> I was able to build Postgres and run pg_ctl and observe that it waited by
> default for the 'start' action, which addresses my original concern.
>
> `make` and `make install` went fine, and `make check` did as well, but
> `make installcheck-world` said (after a while):
>
> ===
>  1 of 55 tests failed.
> ===
>
>
>
I am sure you would get this error even without the patch.



-- 
Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

2017-01-06 Thread Beena Emerson

On Fri, Jan 6, 2017 at 11:36 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Thu, Jan 5, 2017 at 8:39 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> > On Tue, Jan 3, 2017 at 5:46 PM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >> Actually, why not just having an equivalent of the SQL
> >> command and be able to query parameter values?
> >
> > This patch only needed the wal_segment_size and hence I made this
> specific
> > command.
> > How often and why would we need other parameter values in the replication
> > connection?
> > Making it a more general command to fetch any parameter can be a separate
> > topic. If it gets consensus, maybe it could be done and used here.
>
> I concur that for this patch it may not be necessary. But let's not
> narrow us in a corner when designing things. Being able to query the
> value of parameters is something that I think is actually useful for
> cases where custom GUCs are loaded on the server's
> shared_preload_libraries to do validation checks (one case is a
> logical decoder on backend, with streaming receiver as client
> expecting the logical decoder to do a minimum). This can allow a
> client to do checks only using a replication stream. Another case that
> I have in mind is for utilities like pg_rewind, we have been
> discussing about being able to not need a superuser when querying the
> target server. Having such a command would allow for example pg_rewind
> to do a 'SHOW full_page_writes' without the need of an extra
> connection.
>
>
I see the point. I will change the SHOW_WAL_SEGSZ to a general SHOW command
in the next version of the patch.

Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

2017-01-05 Thread Beena Emerson

Hello,

On Tue, Jan 3, 2017 at 5:46 PM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Tue, Jan 3, 2017 at 6:23 AM, Jim Nasby <jim.na...@bluetreble.com>
> wrote:
> > +   /* Check if wal_segment_size is in the power of 2 */
> > +   for (i = 0;; i++, pow2 = pow(2, i))
> > +   if (pow2 >= wal_segment_size)
> > +   break;
> > +
> > +   if (wal_segment_size != 1 && pow2 > wal_segment_size)
> > +   {
> > +   fprintf(stderr, _("%s: WAL segment size must be
> in the power of 2\n"), progname);
> > +   exit(1);
> > +   }
>
> I recall taht pow(x, 2) and x * x result usually in the same assembly
> code, but pow() can never be more optimal than a simple
> multiplication. So I'd think that it is wiser to avoid it in this code
> path. Documentation is missing for the new replication command
> SHOW_WAL_SEG.

As mentioned earlier, documents are not fully updated.

> Actually, why not just having an equivalent of the SQL
> command and be able to query parameter values?
>

This patch only needed the wal_segment_size and hence I made this specific
command.
How often and why would we need other parameter values in the replication
connection?
Making it a more general command to fetch any parameter can be a separate
topic. If it gets consensus, maybe it could be done and used here.

Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

2017-01-05 Thread Beena Emerson

Hello,

Thank you for your review.

On Tue, Jan 3, 2017 at 2:53 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
>
> General comments:
> There was some discussion about the impact of this on small installs,
> particularly around min_wal_size. The concern was that changing the default
> segment size to 64MB would significantly increase min_wal_size in terms of
> bytes. The default value for min_wal_size is 5 segments, so 16MB->64MB
> would mean going from 80MB to 320MB. IMHO if you're worried about that then
> just initdb with a smaller segment size. There's probably a number of other
> changes a small environment wants to make besides that. Perhaps it'd be
> worth making DEFAULT_XLOG_SEG_SIZE a configure option to better support
> that.
>

The patch maintains the current XLOG_SEG_SIZE of 16MB as the default. Only
the capability to change its value has been moved for configure to initdb.


>
> It's not clear from the thread that there is consensus that this feature
> is desired. In particular, the performance aspects of changing segment size
> from a C constant to a variable are in question. Someone with access to
> large hardware should test that. Andres[1] and Robert[2] did suggest that
> the option could be changed to a bitshift, which IMHO would also solve some
> sanity-checking issues.
>
> +* initdb passes the WAL segment size in an environment variable.
> We don't
> +* bother doing any sanity checking, we already check in initdb
> that the
> +* user gives a sane value.
>
> That doesn't seem like a good idea to me. If anything, the backend should
> sanity-check and initdb just rely on that. Perhaps this is how other initdb
> options work, but it still seems bogus. In particular, verifying the size
> is a power of 2 seems important, as failing that would probably be
> ReallyBad(tm).
>
> The patch also blindly trusts the value read from the control file; I'm
> not sure if that's standard procedure or not, but ISTM it'd be worth
> sanity-checking that as well.
>

There is a CRC check to detect error in the file. I think all the
ControlFile values are used directly and not re-verified.


>
> The patch leaves the base GUC units for min_wal_size and max_wal_size as
> the # of segments. I'm not sure if that's a great idea.
>

I think we can leave it as is. This is used in CalculateCheckpontSegments
and in XLOGfileslop to calculate the segment numbers.


> + * convert_unit
> + *
> + * This takes the value in kbytes and then returns value in user-readable
> format
>
> This function needs a more specific name, such as pretty_print_kb().
>

I agree pretty_print_kb would have been a better for this function.
However, I have realised that using the show hook and this function is not
suitable and have found a better way of handling the removal of
GUC_UNIT_XSEGS which no longer needs this function : using the GUC_UNIT_KB,
convert the value in bytes to wal segment count instead in the assign hook.
The next version of patch will use this.



>
> +   /* Check if wal_segment_size is in the power of 2 */
> +   for (i = 0;; i++, pow2 = pow(2, i))
> +   if (pow2 >= wal_segment_size)
> +   break;
> +
> +   if (wal_segment_size != 1 && pow2 > wal_segment_size)
> +   {
> +   fprintf(stderr, _("%s: WAL segment size must be in
> the power of 2\n"), progname);
> +   exit(1);
> +   }
>
> IMHO it'd be better to use the n & (n-1) check detailed at [3].
>

Yes, even I had come across it. I will incorporate this in the next version
of the patch.


>
> Actually, there's got to be other places that need to check this, so it'd
> be nice to just create a function that verifies a number is a power of 2.
>
> +   if (log_fname != NULL)
> +   XLogFromFileName(log_fname, , );
> +
>
> Please add a comment about why XLogFromFileName has to be delayed.
>

Oh yes!.


>
>  /*
> + * DEFAULT_XLOG_SEG_SIZE is the size of a single WAL file.  This must be
> a power
> + * of 2 and larger than XLOG_BLCKSZ (preferably, a great deal larger than
> + * XLOG_BLCKSZ).
> + *
> + * Changing DEFAULT_XLOG_SEG_SIZE requires an initdb.
> + */
> +#define DEFAULT_XLOG_SEG_SIZE  (16*1024*1024)
>
> That comment isn't really accurate. It would be more useful to explain
> that DEFAULT_XLOG_SEG_SIZE is the default size of a WAL segment used by
> initdb if a different value isn't specified.
>

I will correct this comment


The new version of the patch will be posted soon.


Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

2016-12-20 Thread Beena Emerson

Hello Andres,

On Tue, Dec 20, 2016 at 1:58 PM, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2016-12-19 15:14:50 +0530, Beena Emerson wrote:
> > The attached patch removes --with-wal-segsize configure option and adds a
> > new initdb option --wal-segsize. The module initdb passes the wal-segsize
> > value into an environment variable which is used to overwrite the guc
> value
> > wal_ segment_size and set the internal variables : XLogSegSize and
> > XLOG_SEG_SIZE (xlog_internal.h). The default wal_segment_size is not
> > changed but I have increased the maximum size to 1GB.
> >
> > Since  XLOG_SEG_SIZE is now variable, it could not be used directly in
> > src/bin modules and few macros and few changes had to be made:
>
> I do think this has the potential for negative performance
> implications. Consider code like
> /* skip over the page header */
> if (CurrPos % XLogSegSize == 0)
> {
> CurrPos += SizeOfXLogLongPHD;
> currpos += SizeOfXLogLongPHD;
> }
> else
> right now that's doable in an efficient manner, because XLogSegSize is
> constant. If it's a variable and the compiler doesn't even know it's a
> power-of-two, it'll have to do a full "div" - and that's quite easily
> noticeable in a lot of cases.
>
> Now it could entirely be that the costs of this will be swamped by
> everything else, but I'd not want to rely on it.
>
> I think we need tests with concurrent large-file copies. And then also
> look at the profile to see whether the relevant places become new
> hotspots (not that we introduce something that's just hidden for now).


> We might be able to do a bit better, efficency wise, by storing
> XLogSegSize as a "shift factor". I.e. the 16M setting would be 24
> (i.e. XLogSegSize would be defined as 1 << 24).
>
>
Thank you.
I did not realize we could face such problems. I will perform the tests to
check the performance and do the required changes.


--

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

2016-12-19 Thread Beena Emerson

Hello all,

On Mon, Dec 19, 2016 at 3:14 PM, Beena Emerson <memissemer...@gmail.com>
wrote:

> Hello all,
>
> Please find attached a patch to make wal segment size initdb configurable.
>
> The attached patch removes --with-wal-segsize configure option and adds a
> new initdb option --wal-segsize. The module initdb passes the wal-segsize
> value into an environment variable which is used to overwrite the guc value
> wal_ segment_size and set the internal variables : XLogSegSize and
> XLOG_SEG_SIZE (xlog_internal.h). The default wal_segment_size is not
> changed but I have increased the maximum size to 1GB.
>
> Since  XLOG_SEG_SIZE is now variable, it could not be used directly in
> src/bin modules and few macros and few changes had to be made:
>   -  in guc.c , remove GUC_UNIT_XSEGS which used XLOG_SEG_SIZE and
> introduce show functions for the guc which used the unit (min_wal_size and
> max_wal_size).
>   -  For pg_basebackup, add new replication command SHOW_WAL_SEGSZ to
> fetch the wal_segment_size in bytes.
>   - pg_controldata, pg_resetxlog, pg_rewind, fetch the xlog_seg_size from
> the ControlFile.
>   - Since pg_xlogdump reads the wal files, it uses the file size to
> determine the xlog_seg_size.
>   - In pg_test_fsync, a buffer of size XLOG_SEG_SIZE was created, filled
> with random data and written to a temporary file to check for any
> write/fsync error before performing the tests. Since it does not affect the
> actual performance results, the XLOG_SEG_SIZE in the module is replaced
> with the default value (16MB).
>
> Please note that the documents are not updated in this patch.
>
> Feedback and suggestions are welcome.
>

This patch has been added to the commit fest (https://commitfest.
postgresql.org/12/921/)

After further testing, I found that pg_standby contrib module does not work
with the changes. I will fix it in the next version of the patch. Comments
on the current patch are welcome.


Thank you,

Beena Emerson

Have a Great Day!

Re: [HACKERS] increasing the default WAL segment size

2016-12-19 Thread Beena Emerson

Hello all,

Please find attached a patch to make wal segment size initdb configurable.

The attached patch removes --with-wal-segsize configure option and adds a
new initdb option --wal-segsize. The module initdb passes the wal-segsize
value into an environment variable which is used to overwrite the guc value
wal_ segment_size and set the internal variables : XLogSegSize and
XLOG_SEG_SIZE (xlog_internal.h). The default wal_segment_size is not
changed but I have increased the maximum size to 1GB.

Since  XLOG_SEG_SIZE is now variable, it could not be used directly in
src/bin modules and few macros and few changes had to be made:
  -  in guc.c , remove GUC_UNIT_XSEGS which used XLOG_SEG_SIZE and
introduce show functions for the guc which used the unit (min_wal_size and
max_wal_size).
  -  For pg_basebackup, add new replication command SHOW_WAL_SEGSZ to fetch
the wal_segment_size in bytes.
  - pg_controldata, pg_resetxlog, pg_rewind, fetch the xlog_seg_size from
the ControlFile.
  - Since pg_xlogdump reads the wal files, it uses the file size to
determine the xlog_seg_size.
  - In pg_test_fsync, a buffer of size XLOG_SEG_SIZE was created, filled
with random data and written to a temporary file to check for any
write/fsync error before performing the tests. Since it does not affect the
actual performance results, the XLOG_SEG_SIZE in the module is replaced
with the default value (16MB).

Please note that the documents are not updated in this patch.

Feedback and suggestions are welcome.
--

Beena Emerson

Have a Great Day!


initdb-walsegsize_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Specifying the log file name of pgbench -l option

2016-11-07 Thread Beena Emerson

Hello,

On Mon, Nov 7, 2016 at 8:42 PM, Masahiko Sawada <sawada.m...@gmail.com>
wrote:

> On Mon, Nov 7, 2016 at 10:52 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> > Hello Sawada-san,
> >
> > On Mon, Nov 7, 2016 at 4:47 PM, Masahiko Sawada <sawada.m...@gmail.com>
> > wrote:
> >>
> >> On Wed, Nov 2, 2016 at 3:57 PM, Beena Emerson <memissemer...@gmail.com>
> >> wrote:
> >> > Hello,
> >> >
> >> > On Wed, Nov 2, 2016 at 4:16 AM, Fabien COELHO <coe...@cri.ensmp.fr>
> >> > wrote:
> >> >>
> >> >>
> >> >> Hello Masahiko,
> >> >>
> >> >>>> So I would suggest to:
> >> >>>>  - fix the compilation issue
> >> >>>>  - leave -l/--log as it is, i.e. use "pgbench_log" as a prefix
> >> >>>>  - add --log-prefix=... (long option only) for changing this prefix
> >> >>>
> >> >>>
> >> >>> I agree. It's better to add the separated option to specify the
> prefix
> >> >>> of log file instead of changing the existing behaviour. Attached
> >> >>> latest patch incorporated review comments.
> >> >>> Please review it.
> >> >>
> >> >>
> >> >> Patch applies, compiles and works for me.
> >> >
> >> >
> >> > It works for me as well.
> >> >
> >> >>
> >> >>
> >> >> This new option is for benchmarking, so "benchmarking_option_set"
> >> >> should
> >> >> be set to true.
> >> >>
> >> >> To simplify the code, I would suggest to initialize explicitely
> >> >> "logfile_prefix" to the default value which is then overriden when
> the
> >> >> option is set, which allows to get rid of the "prefix" variable
> later.
> >> >>
> >> >> There is no reason to change the documentation by breaking a line at
> >> >> another place if the text is not changed (where ... with 1).
> >> >>
> >> >> I would suggest to simplify a little bit the documentation:
> >> >>   "prefix of log file ..." ->
> >> >>   "default log file prefix that can be changed with ..."
> >> >>
> >> >> Otherwise the explanations seem clear enough to me. If a native
> English
> >> >> speaker could check them though, it would be nice.
> >> >
> >> >
> >> > For the explanation of the option --log-prefix, I feel it would be
> >> > better to
> >> > say "Custom prefix for transaction log file. Default is pgbench_log"
> >> >
> >> >
> >> > -   pgbench_log.nnn, where
> >> > +
> >> >
> >> > pgbench_log.<
> replaceable>nnn,
> >> > +   where pgbench_log is the prefix of log
> >> > file
> >> > that can
> >> > +   be changed by specifying --log-prefix, and where
> >> >
> >> > It could just say "the default prefix pgbench_log can be changed with
> >> > option
> >> > --log-prefix, and " we need not use where again.
> >> >
> >> > Also the error message is made similar to that of aggregate-interval
> but
> >> > I
> >> > think the one in sampling-rate is better:
> >> >
> >> > $ pgbench --log-prefix=chk -t 20
> >> > log file prefix (--log-prefix) is allowed only when actually logging
> >> > transactions
> >> >
> >> > pgbench  --sampling-rate=1 -t 20
> >> > log sampling (--sampling-rate) is allowed only when logging
> transactions
> >> > (-l)
> >> >
> >>
> >> Thank you for reviewing this patch!
> >>
> >> Attached latest patch incorporated comments.
> >> Please review it.
> >>
> >
> > Thank you for updating the patch.
> >
> > I note that the current changes, break the behaviour when we do not use
> -l
> > option.
> >
> > Since the log_prefix variable is set to default value, the check " if
> > (!use_log && logfile_prefix)"  always returns true. This throws error
> even
> > when we have not used the -l and --log-prefix option as shown below.
> >
> > $ pgbench -T 50
> > log file prefix (--log-prefix) is allowed only when logging transactions
> > (-l)
> >
>
> Thanks.
> Attached latest patch.
> Please review it.
>

Thank you for the update.

I checked. It works as expected. I have no more comments.
If its okay with Fabien, we can mark it ready for committer.

Thanks,

Beena Emerson

Have a Great Day!

Re: [HACKERS] Specifying the log file name of pgbench -l option

2016-11-07 Thread Beena Emerson

Hello Sawada-san,

On Mon, Nov 7, 2016 at 4:47 PM, Masahiko Sawada <sawada.m...@gmail.com>
wrote:

> On Wed, Nov 2, 2016 at 3:57 PM, Beena Emerson <memissemer...@gmail.com>
> wrote:
> > Hello,
> >
> > On Wed, Nov 2, 2016 at 4:16 AM, Fabien COELHO <coe...@cri.ensmp.fr>
> wrote:
> >>
> >>
> >> Hello Masahiko,
> >>
> >>>> So I would suggest to:
> >>>>  - fix the compilation issue
> >>>>  - leave -l/--log as it is, i.e. use "pgbench_log" as a prefix
> >>>>  - add --log-prefix=... (long option only) for changing this prefix
> >>>
> >>>
> >>> I agree. It's better to add the separated option to specify the prefix
> >>> of log file instead of changing the existing behaviour. Attached
> >>> latest patch incorporated review comments.
> >>> Please review it.
> >>
> >>
> >> Patch applies, compiles and works for me.
> >
> >
> > It works for me as well.
> >
> >>
> >>
> >> This new option is for benchmarking, so "benchmarking_option_set" should
> >> be set to true.
> >>
> >> To simplify the code, I would suggest to initialize explicitely
> >> "logfile_prefix" to the default value which is then overriden when the
> >> option is set, which allows to get rid of the "prefix" variable later.
> >>
> >> There is no reason to change the documentation by breaking a line at
> >> another place if the text is not changed (where ... with 1).
> >>
> >> I would suggest to simplify a little bit the documentation:
> >>   "prefix of log file ..." ->
> >>   "default log file prefix that can be changed with ..."
> >>
> >> Otherwise the explanations seem clear enough to me. If a native English
> >> speaker could check them though, it would be nice.
> >
> >
> > For the explanation of the option --log-prefix, I feel it would be
> better to
> > say "Custom prefix for transaction log file. Default is pgbench_log"
> >
> >
> > -   pgbench_log.nnn, where
> > +
> > pgbench_log.<
> replaceable>nnn,
> > +   where pgbench_log is the prefix of log
> file
> > that can
> > +   be changed by specifying --log-prefix, and where
> >
> > It could just say "the default prefix pgbench_log can be changed with
> option
> > --log-prefix, and " we need not use where again.
> >
> > Also the error message is made similar to that of aggregate-interval but
> I
> > think the one in sampling-rate is better:
> >
> > $ pgbench --log-prefix=chk -t 20
> > log file prefix (--log-prefix) is allowed only when actually logging
> > transactions
> >
> > pgbench  --sampling-rate=1 -t 20
> > log sampling (--sampling-rate) is allowed only when logging transactions
> > (-l)
> >
>
> Thank you for reviewing this patch!
>
> Attached latest patch incorporated comments.
> Please review it.
>
>
Thank you for updating the patch.

I note that the current changes, break the behaviour when we do not use -l
option.

Since the log_prefix variable is set to default value, the check " if
(!use_log && logfile_prefix)"  always returns true. This throws error even
when we have not used the -l and --log-prefix option as shown below.

$ pgbench -T 50
log file prefix (--log-prefix) is allowed only when logging transactions
(-l)

Kindly fix this.

Thank you,


--

Beena Emerson

Have a Great Day!

Re: [HACKERS] Specifying the log file name of pgbench -l option

2016-11-02 Thread Beena Emerson

Hello,

On Wed, Nov 2, 2016 at 4:16 AM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:

>
> Hello Masahiko,
>
> So I would suggest to:
>>>  - fix the compilation issue
>>>  - leave -l/--log as it is, i.e. use "pgbench_log" as a prefix
>>>  - add --log-prefix=... (long option only) for changing this prefix
>>>
>>
>> I agree. It's better to add the separated option to specify the prefix
>> of log file instead of changing the existing behaviour. Attached
>> latest patch incorporated review comments.
>> Please review it.
>>
>
> Patch applies, compiles and works for me.
>

It works for me as well.


>
> This new option is for benchmarking, so "benchmarking_option_set" should
> be set to true.
>
> To simplify the code, I would suggest to initialize explicitely
> "logfile_prefix" to the default value which is then overriden when the
> option is set, which allows to get rid of the "prefix" variable later.
>
> There is no reason to change the documentation by breaking a line at
> another place if the text is not changed (where ... with 1).
>
> I would suggest to simplify a little bit the documentation:
>   "prefix of log file ..." ->
>   "default log file prefix that can be changed with ..."
>
> Otherwise the explanations seem clear enough to me. If a native English
> speaker could check them though, it would be nice.


For the explanation of the option --log-prefix, I feel it would be better
to say "Custom prefix for transaction log file. Default is pgbench_log"


-   pgbench_log.nnn, where
+
pgbench_log.nnn,
+   where pgbench_log is the prefix of log file
that can
+   be changed by specifying --log-prefix, and where

It could just say "the default prefix pgbench_log can be changed with
option --log-prefix, and " we need not use where again.

Also the error message is made similar to that of aggregate-interval but I
think the one in sampling-rate is better:

$ pgbench --log-prefix=chk -t 20
log file prefix (--log-prefix) is allowed only when actually logging
transactions

pgbench  --sampling-rate=1 -t 20
log sampling (--sampling-rate) is allowed only when logging transactions
(-l)


Thanks,

Beena Emerson

Have a Great Day!

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-10-20 Thread Beena Emerson

On Mon, Oct 19, 2015 at 8:47 PM, Masahiko Sawada <sawada.m...@gmail.com>
wrote:

>
> Hi,
>
> Attached patch is a rough patch which supports multi sync replication
> by another approach I sent before.
>
> The new GUC parameters are:
> * synchronous_standby_num, which specifies the number of standby
> servers using sync rep. (default is 0)
> * synchronous_replication_method, which specifies replication method;
> priority or quorum. (default is priority)
>
> The behaviour of 'priority' and 'quorum' are same as what we've been
> discussing.
> But I write overview of these here again here.
>
> [Priority Method]
> The standby server has each different priority, and the active standby
> servers having the top N priroity are become sync standby server.
> If synchronous_standby_names = '*', the all active standby server
> would be sync standby server.
> If you want to set up standby like 9.5 or before, you can set
> synchronous_standby_num = 1.
>
>

I used the following setting with 2 servers A and D connected:

synchronous_standby_names = 'A,B,C,D'
synchronous_standby_num = 2
synchronous_replication_method = 'priority'

Though s_r_m = 'quorum' worked fine, changing it to 'priority' caused
segmentation fault.

Regards,

Beena Emerson

Have a Great Day!

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-10-14 Thread Beena Emerson

On Wed, Oct 14, 2015 at 10:38 AM, Michael Paquier  wrote:

> On Wed, Oct 14, 2015 at 3:02 AM, Masahiko Sawada wrote:
>
> > It would be good even if there are some restriction such as the
> > nesting level, the group setting.
> > The another new approach that I came up with is,
> > * Add new parameter synchronous_replication_method (say s_r_method)
> > which can have two names: 'priority', 'quorum'
> > * If s_r_method = 'priority', the value of s_s_names (e.g. 'n1,n2,n3')
> > is handled using priority. It's same as '[n1,n2,n3]' in dedicated
> > language.
> > * If s_r_method = 'quorum', the value of s_s_names is handled using
> > quorum commit, It's same as '(n1,n2,n3)' in dedicated language.
> > * Setting of synchronous_standby_names is same as today. That is, the
> > storing the nesting value is not supported.
> > * If we want to support more complex syntax like what we are
> > discussing, we can add the new value to s_r_method, for example
> > 'complex', 'json'.
>
> If we go that path, I think that we still would need an extra
> parameter to control the number of nodes that need to be taken from
> the set defined in s_s_names whichever of quorum or priority is used.
> Let's not forget that in the current configuration the first node
> listed in s_s_names and *connected* to the master will be used to
> acknowledge the commit.
>

Would it be better to just use a simple language instead of 3 different
parameters?

s_s_names = 2[X,Y,Z]  # 2 priority
s_s_names = 1(A,B,C) # 1 quorum
s_s_names = R,S,T # default behavior: 1 priorty?

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-10-08 Thread Beena Emerson

Amir Rohan wrote:
> But implementing a mechanism that can be used by other features in
> the future seems the deciding factor here, rather then the brevity of a 
> bespoke mini-language.

One decision to be taken is which among JSON or mini-language is better for
the SR setting.
Mini language can fit into the postgresql.conf single line.

For JSON currently a different file is used. But as said earlier, in case
composite types are required in future for other parameters then having
multiple .conf files does not make sense. To avoid this we can:
-   support multi-line GUC which would be helpful for other comma-separated
conf values along with s_s_names.  (This can make mini-language more
readable as well)
-   Allow JSON support in postgresql.conf. So that other parameters in 
future
can use JSON as well within postgresql.conf. 

What are the chances of future data requiring JSON? I think rare.

> > And I'm really not much in favor of a separate file; if we go that way
> > then we're going to have to reinvent a huge amount of infrastructure
> > that already exists for GUCs.
>
> Adding support for JSON objects (or some other kind of composite data
> type) 
> to the .conf parser would negate the need for one, and would also solve
> the
> problem being discussed for future cases.

With the current pg_syncinfo file, the only code added was to check the
pg_syncinfo file in the specified path and read the entire content of the
file into a variable which was used for further parsing which could have
been avoided with multi-line GUC.




-
Beena Emerson

--
View this message in context: 
http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5869285.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-10-08 Thread Beena Emerson

Hello,

The JSON method was used in the patch because it seemed to be the group
consensus.

Requirement:
- Grouping : Specify a list of node names with the required number of
ACK for the group. We could have priority or quorum group. Quorum treats
all the standby in same level and ACK from any k can be considered. In
priority behavior, ACK must be received from the specified k lowest priority
servers for a successful transaction.
- Group names to enable easier status reporting for group. The
topmost group may not be named. It will be assigned a default name. All the
sub groups are to be compulsorily named.
- Not more than 3 groups with 1 level of nesting expected

Behavior in submitted patch:
- The name of the top most group is named ‘Default Group”. All the
other standby_names or groups will have to be listed within this.
- When more than 1 connected standby has the same name then the
highest LSN among them is chosen. Example: 2 priority in (X,Y,Z). If there 2
nodes X connected, even though both X have returned ACK, the server will
wait for ACK from Y.
- There are no “potential” standbys. In quorum behavior, there are
no fixed standbys which are to be in sync, all members are equal. ACK from
any specified n nodes from a set is considered success.

Further:
- improvements to pg_stat_replication to give the node tree and
status?
- Manipulate/Edit conf setting using functions.
- Regression tests

Mini-lang:
[] - to specify prioirty
() - to specify quorum
Format - : []
Not specifying count defaults to 1.
Ex: s_s_names = '2(cluster1: 1(A,B), cluster2: 2[X,Y,Z], U)'

JSON
It would contain 2 main keys: "sync_info" and "groups"
The "sync_info" would consist of "quorum"/"priority" with the count and
"nodes"/"group" with the group name or node list.
The optional "groups" key would list out all the "group" mentioned within
"sync_info" along with the node list.
Ex: {
"sync_info":
{
"quorum":2,
"nodes":
[
{"quorum":1,"group":"cluster1"},
{"prioirty":2,"group": "cluster2"},
"U"
]
},
"groups":
{
"cluster1":["A","B"],
"cluster2":["X","Y","z"]
}
}

JSON and mini-language:
- JSON is more verbose
- You can define a group and use it multiple times in sync settings
but since no many levels or nesting is expected I am not sure how useful
this will be.
- Though JSON parser is inbuilt, additional code is required to check
for the required format of JSON. For mini-language, new parser will have to
be written.

Despite all, I feel the mini-language is better mainly for its brevity.
Also, it will not require additional GUC parser support (multi line).

-
Beena Emerson

--
View this message in context:
http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5869286.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-10-08 Thread Beena Emerson

Sawada Masahiko wrote:
>
> I agree with adding support for multi-line GUC parameters.
> But I though it is:
> param = 'param1,
> param2,
> param3'
>
> This reads as 'value1,value2,value3'.

Use of '\' ensures that omission the closing quote does not break the entire
file.





-
Beena Emerson

--
View this message in context: 
http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5869289.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-09-15 Thread Beena Emerson

Hello,

Thank you Thomas and Sameer for checking the patch and giving your comments!

I will post an updated patch soon.

Regards,

Beena Emerson

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-09-10 Thread Beena Emerson

Hello,

Please find attached the WIP patch for the proposed feature. It is built
based on the already discussed design.

Changes made:
- add new parameter "sync_file" to provide the location of the pg_syncinfo
file. The default is 'ConfigDir/pg_syncinfo.conf', same as for pg_hba and
pg_ident file.
- pg_syncinfo file will hold the sync rep information in the approved JSON
format.
- synchronous_standby_names can be set to 'pg_syncinfo.conf'  to read the
JSON value stored in the file.
- All the standbys mentioned in the s_s_names or the pg_syncinfo file
currently get the priority as 1 and all others as  0 (async)
- Various functions in syncrep.c to read the json file and store the values
in a struct to be used in checking the quorum status of syncrep standbys
(SyncRepGetQuorumRecPtr function).

It does not support the current behavior for synchronous_standby_names =
'*'. I am yet to thoroughly test the patch.

Thoughts?

--
Beena Emerson


WIP_multiple_syncrep.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-08-04 Thread Beena Emerson

Michael Paquier wrote:
 And this is the case of any format as well. String format validation
 for a GUC occurs when server is reloaded or restarted, one advantage
 of JSON is that the parser validator is already here, so we don't need
 to reinvent a new machinery for that.

IIUC correctly, we would also have to add additional code to check that that
given JSON has the required keys and entries. For ex: The group mentioned
in the s_s_names  should be definied in the groups section, etc.





-
Beena Emerson

--
View this message in context: 
http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5860758.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 >

1 - 100 of 121 matches

Mail list logo