Re: A problem in ExecModifyTable

2021-08-17 Thread ()
Hi, David

Your answer explains that we still need the ModifyTable node even when no leaf
partitions remain.
You are right about this.

 But please review the source code again:
```
/*
 * Fetch rows from subplan, and execute the required table modification
 * for each row.
 */
for (;;)
{
    ...
    /* No more tuples to process? */
    if (TupIsNull(planSlot))
        break;

    /*
     * For UPDATE/DELETE, fetch the row identity info for the tuple to be
     * updated/deleted.  For a heap relation, that's a TID; otherwise we
     * may have a wholerow junk attr that carries the old tuple in toto.
     * Keep this in step with the part of ExecInitModifyTable that sets up
     * ri_RowIdAttNo.
     */
    if (operation == CMD_UPDATE || operation == CMD_DELETE)
    {
        char        relkind;
        Datum       datum;
        bool        isNull;

        relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
        if (relkind == RELKIND_RELATION ||
            relkind == RELKIND_MATVIEW ||
            relkind == RELKIND_PARTITIONED_TABLE)
            ...
    }
```
 We can see that if the scanned tuple is NULL, the loop breaks.
 Therefore the later step that extracts the ctid is never reached.

>We'll need some sort of ResultRelInfo in the case that all partitions
>are pruned.  

 In that case, the tuple returned by the subplan should be null.

>Using the one for the partitioned table is convenient.  I
>believe that's required if there are any statement-level triggers on
>the partitioned table or if there's a RETURNING clause.

 If the tuple to be modified is null, there is no need to process the RETURNING
clause.
 Statement-level triggers cannot use the tuple, and triggers on partitioned
tables cannot have transition tables.
 I added Assert(relkind != RELKIND_PARTITIONED_TABLE); to the code, but it is
never reached.

 Did I misunderstand your explanation?
 Could you provide a case to verify it?
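
 As a point of reference, a case along these lines (table and column names are
illustrative, not from the thread) would prune every leaf partition while still
carrying a RETURNING clause:
```
-- Hypothetical tables: no partition accepts a = 3, so every leaf partition is
-- pruned, yet the ModifyTable node still needs a ResultRelInfo for RETURNING.
create table pruned (a int, b int) partition by list (a);
create table pruned_1 partition of pruned for values in (1);
create table pruned_2 partition of pruned for values in (2);
update pruned set b = 0 where a = 3 returning *;
```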

 Thank you very much for your answer.
 
 Regards & Thanks Adger
 

 


A problem in ExecModifyTable

2021-08-16 Thread ()
Hi hackers,
Recently, I noticed a great patch in PG 14:
"Rework planning and execution of UPDATE and DELETE."
(86dc90056dfdbd9d1b891718d2e5614e3e432f35)
This changes DML execution on partitioned tables and makes it friendlier.
 But I am very confused about the following change:
```
+   relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
+   if (relkind == RELKIND_RELATION ||
+       relkind == RELKIND_MATVIEW ||
+       relkind == RELKIND_PARTITIONED_TABLE)
    {
-       char        relkind;
-       Datum       datum;
-       bool        isNull;
-
-       relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
-       if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
-       {
-           datum = ExecGetJunkAttribute(slot,
-                                        junkfilter->jf_junkAttNo,
-                                        &isNull);
-           /* shouldn't ever get a null result... */
```
According to my understanding, the parent of a partitioned table does not store
any tuples.
Then why is relkind == RELKIND_PARTITIONED_TABLE suddenly added here?

There is no comment on this point in the code.
Could you clear up my confusion? I would be deeply grateful.

Regards & Thanks Adger


 




Re: BEFORE ROW triggers for partitioned tables

2021-01-18 Thread ()
Hi committer,

Many of our customers expect to use BR triggers on partitioned tables.
After following your discussion, I also reviewed your patch.
Here are two questions that confuse me:

1. Your modification removes the check that forbade BR triggers on partitioned
tables, and a friendlier error message is added to ExecInsert and ExecUpdate.
You are correct that ExecInsert does not re-route to partitions.
However, when ExecUpdate modifies partition keys, it is almost equivalent to
ExecDelete plus ExecInsert, and the tuple is re-routed (ExecPrepareTupleRouting)
once before ExecInsert. So why should an error be thrown in ExecUpdate?
Let's look at a case:
```
postgres=# create table parted (a int, b int, c text) partition by list (a);
CREATE TABLE
postgres=# create table parted_1 partition of parted for values in (1);
CREATE TABLE
postgres=# create table parted_2 partition of parted for values in (2);
CREATE TABLE
postgres=# create function parted_trigfunc() returns trigger
           language plpgsql as $$
           begin
             new.a = new.a + 1;
             return new;
           end;
           $$;
CREATE FUNCTION
postgres=# insert into parted values (1, 1, 'uno uno v1');
INSERT 0 1
postgres=# create trigger t before update on parted
           for each row execute function parted_trigfunc();
CREATE TRIGGER
postgres=# update parted set c = c || 'v3';
```
If there were no check in ExecUpdate,
the UPDATE above would execute successfully.
However, with your code, it fails.
So, what is the reasoning behind this?
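
To make the expected outcome concrete, if the UPDATE above were allowed to
proceed and be re-routed (which is exactly the behavior in question), one could
check which partition the row ends up in:
```
-- tableoid shows which partition physically holds each row; the trigger bumps
-- a from 1 to 2, so the row would be expected to move from parted_1 to parted_2.
select tableoid::regclass as part, a, b, c from parted;
```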

2. In this patch, you only removed the restriction on BR triggers against the
partitioned table itself, but did not fundamentally solve the problem caused by
modifying partition keys in a BR trigger. What are the difficulties in solving
this problem fundamentally? We plan to implement it.
Can you give us some suggestions?

We look forward to your reply.
Thank you very much,
 Regards,  Adger





--
From: Alvaro Herrera
Sent: Monday, January 18, 2021 20:36
To: Ashutosh Bapat
Cc: Ashutosh Bapat; Pg Hackers
Subject: Re: BEFORE ROW triggers for partitioned tables

Thanks for the reviews; I have pushed it now.

(I made the doc edits you mentioned too.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Re: Re: Re: how to create index concurrently on partitioned table

2020-06-17 Thread ()
> Not sure I am following.  In the case of REINDEX, it seems to me that
> the calls to validate_index() and index_concurrently_build() can
> happen in a separate transaction for each index, as long as all the
> calls to index_concurrently_swap() are grouped together in the same
> transaction to make sure that index partition trees are switched
> consistently when all entries are swapped from an invalid state to a
> valid state, because the swapping phase is also when we attach a fresh
> index to a partition tree.  See also index_concurrently_create_copy()
> where we don't set parentIndexRelid for the lower call to
> index_create().  It would be good of course to check that when
> swapping we have the code to handle that for a lot of indexes at
> once.

Let's look at this example:
A partitioned table has five partitions,
parttable: part1, part2, part3, part4, part5
We simply abstract the following definitions:
 phase 1: index_create(), the index is only registered in the catalogs.
 phase 2: index_concurrently_build(), build the index.
 phase 3: validate_index(), insert any missing index entries and mark the index
as valid.

(scheme 1)
```
StartTransaction one
    parttable  phase 1
    part 1     phase 1
    part 2     phase 1
    part 3     phase 1
    part 4     phase 1
    part 5     phase 1
CommitTransaction

StartTransaction two
    parttable  phase 2
    part 1     phase 2
    part 2     phase 2
    part 3     phase 2  (error occurred)
    part 4     phase 2
    part 5     phase 2
CommitTransaction

StartTransaction three
    parttable  phase 3
    part 1     phase 3
    part 2     phase 3
    part 3     phase 3
    part 4     phase 3
    part 5     phase 3
CommitTransaction
...
```
Now, the subsequent steps cannot continue because of the error in transaction
two; transaction two rolls back, and transaction three never starts.
All of our indexes are invalid. In this way,
we ensure strong consistency of the indexes in the partition tree.
However, we then need to rebuild all the indexes when we reindex.

(scheme 2)
```
StartTransaction one
    parttable  phase 1
    part 1     phase 1
    part 2     phase 1
    part 3     phase 1
    part 4     phase 1
    part 5     phase 1
CommitTransaction

StartTransaction two
    part 1     phase 2
    part 1     phase 3
CommitTransaction

StartTransaction three
    part 2     phase 2
    part 2     phase 3
CommitTransaction

StartTransaction four
    part 3     phase 2  (error occurred)
    part 3     phase 3
CommitTransaction

StartTransaction five
    part 4     phase 2
    part 4     phase 3
CommitTransaction

StartTransaction six
    part 5     phase 2
    part 5     phase 3
CommitTransaction

StartTransaction seven
    parttable  phase 2
    parttable  phase 3
CommitTransaction
```

Now, the subsequent steps cannot continue because of the error in transaction
four; transaction four rolls back, and the transactions after it never start.
The indexes on partitions part1 and part2 are available; the other indexes are
invalid.
On reindex, we can skip rebuilding part1 and part2.
This seems better, although it is inconsistent.
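
As a side note, a quick way to see which indexes in the partition tree ended up
valid after such a failure is to look at pg_index (the name pattern here is
illustrative):
```
select c.relname as index_name, i.indisvalid
from pg_index i
join pg_class c on c.oid = i.indexrelid
where c.relname like 'part%';
```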

Do you think this scheme is more suitable for CIC?


Thank you very much,
 Regards,  Adger








--
From: Michael Paquier
Sent: Thursday, June 18, 2020 10:41
To: 李杰(慎追)
Cc: Justin Pryzby; pgsql-hackers; 曾文旌(义从); Alvaro Herrera
Subject: Re: Re: Re: Re: how to create index concurrently on partitioned table

On Wed, Jun 17, 2020 at 10:22:28PM +0800, 李杰(慎追) wrote:
> However, I found a problem. If there are many partitions, 
> we may need to handle too many missing index entries when
> validate_index().  Especially for the first partition, the time may
> have been long and many entries are missing.  In this case, why
> don't we put the second and third phase together into a transaction
> for each partition? 

Not sure I am following.  In the case of REINDEX, it seems to me that
the calls to validate_index() and index_concurrently_build() can
happen in a separate transaction for each index, as long as all the
calls to index_concurrently_swap() are grouped together in the same
transaction to make sure that index partition trees are switched
consistently when all entries are swapped from an invalid state to a
valid state, because the swapping phase is also when we attach a fresh
index to a partition tree.  See also index_concurrently_create_copy()
where we don't set parentIndexRelid for the lower call to
index_create().  It would be good of course to check that when
swapping we have the code to handle that for a lot of indexes at
once.
--
Michael


Re: Re: Re: how to create index concurrently on partitioned table

2020-06-17 Thread ()
> We can refer to the implementation in the ReindexRelationConcurrently,
> in the three phases of the REINDEX CONCURRENTLY,
> all indexes of the partitions are operated one by one in each phase.
> In this way, we can maintain the consistency of the entire partitioned table 
> index.
> After we implement CIC in this way, we can also complete reindex partitioned 
> table index concurrently (this is not done now.)

 After careful analysis, I found two choices that leave me torn.
 We could handle the entire partition tree with one transaction in each of the
three phases of CIC, just as for ordinary tables.

However, I found a problem: if there are many partitions,
we may need to handle too many missing index entries in validate_index().
Especially for the first partition, a long time may have passed and many
entries may be missing.
In that case, why don't we put the second and third phases together into one
transaction per partition?

So, which scheme do you think is better?
Maintain consistency across all three phases,
or only in the first phase?


Thank you very much,
 Regards,  Adger





Re: Re: Re: how to create index concurrently on partitioned table

2020-06-15 Thread ()
> I am afraid that this makes the error handling more complicated, with
> risks of having inconsistent partition trees.  That's the point you
> raised.  This one is going to need more thoughts.
> CIC is an operation that exists while allowing read and writes to
> still happen in parallel, so that's fine IMO if it takes time.  Now it
> is true that it may not scale well so we should be careful with the
> approach taken.  Also, I think that the case of REINDEX would require
> less work overall because we already have some code in place to gather
> multiple indexes from one or more relations and work on these
> consistently, all at once.


I'm with you on that.
(Scheme 1)
We can follow the implementation of ReindexRelationConcurrently:
in each of the three phases of REINDEX CONCURRENTLY,
all indexes of the partitions are processed one by one.
In this way, we maintain the consistency of the whole partitioned table's
index.
Once CIC is implemented this way, we could also implement REINDEX CONCURRENTLY
for partitioned table indexes (which is not done today).

Looking back, let's talk about our original scheme.
(Scheme 2)
If CIC is performed partition by partition,
how can we let it continue with the remaining partitions when an error occurs
on the second one?
If that problem can be solved, I could accept this scheme,
because a CIC error on one partition is acceptable: you can reindex it later.
The index on the partitioned table itself is useless for real queries,
but the indexes on the partitions are genuinely useful.

Scheme 1 is more elegant and would also enable REINDEX CONCURRENTLY on
partitioned tables,
but the implementation is more complex.
Scheme 2 is simpler to implement, but not as friendly.

Hi Justin,
Which scheme do you think is more helpful for implementing CIC?






----------
From: Michael Paquier
Sent: Tuesday, June 16, 2020 09:02
To: 李杰(慎追)
Cc: Justin Pryzby; pgsql-hackers; 曾文旌(义从); Alvaro Herrera
Subject: Re: Re: Re: how to create index concurrently on partitioned table

On Mon, Jun 15, 2020 at 09:33:05PM +0800, 李杰(慎追) wrote:
> This is a good idea.
> We should maintain the consistency of the entire partition table.
> However, this is not a small change in the code.
> May be we need to make a new design for DefineIndex function

Indeed.  I have looked at the patch set a bit and here is the related
bit for CIC in 0001, meaning that you handle the basic index creation
for a partition tree within one transaction for each:
+   /*
+* CIC needs to mark a partitioned table as VALID, which itself
+* requires setting READY, which is unset for CIC (even though
+* it's meaningless for an index without storage).
+*/
+   if (concurrent)
+   {
+   PopActiveSnapshot();
+   CommitTransactionCommand();
+   StartTransactionCommand();
+   index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
+
+   CommandCounterIncrement();
+   index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
}

I am afraid that this makes the error handling more complicated, with
risks of having inconsistent partition trees.  That's the point you
raised.  This one is going to need more thoughts.

> But most importantly, if we use three steps to implement CIC, 
> we will spend more time than ordinary tables, especially in high
> concurrency cases.  To wait for one of partitions which the users to
> use  frequently, the creation of other partition indexes is delayed. 
> Is it worth doing this?

CIC is an operation that exists while allowing read and writes to
still happen in parallel, so that's fine IMO if it takes time.  Now it
is true that it may not scale well so we should be careful with the
approach taken.  Also, I think that the case of REINDEX would require
less work overall because we already have some code in place to gather
multiple indexes from one or more relations and work on these
consistently, all at once.
--
Michael


Re: Re: how to create index concurrently on partitioned table

2020-06-15 Thread ()
> That's a problem.  I really think that we should make the steps of the
> concurrent operation consistent across all relations, meaning that all
> the indexes should be created as invalid for all the parts of the
> partition tree, including partitioned tables as well as their
> partitions, in the same transaction.  Then a second new transaction
> gets used for the index build, followed by a third one for the
> validation that switches the indexes to become valid.

This is a good idea.
We should maintain the consistency of the entire partitioned table.
However, this is not a small change to the code;
maybe we need a new design for the DefineIndex function.

But most importantly, if we use three steps to implement CIC,
it will take more time than for ordinary tables, especially under high
concurrency.
While waiting on one partition that users access frequently,
the creation of the other partition indexes is delayed.
Is it worth doing this?

 Regards,  Adger





--
From: Michael Paquier
Sent: Monday, June 15, 2020 20:37
To: 李杰(慎追)
Cc: Justin Pryzby; pgsql-hackers; 曾文旌(义从); Alvaro Herrera
Subject: Re: Re: how to create index concurrently on partitioned table

On Mon, Jun 15, 2020 at 08:15:05PM +0800, 李杰(慎追) wrote:
> As shown above, an error occurred while creating an index in the second 
> partition. 
> It can be clearly seen that the index of the partitioned table is invalid 
> and the index of the first partition is normal, the second partition is 
> invalid, 
> and the Third Partition index does not exist at all.

That's a problem.  I really think that we should make the steps of the
concurrent operation consistent across all relations, meaning that all
the indexes should be created as invalid for all the parts of the
partition tree, including partitioned tables as well as their
partitions, in the same transaction.  Then a second new transaction
gets used for the index build, followed by a third one for the
validation that switches the indexes to become valid.
--
Michael


Re: how to create index concurrently on partitioned table

2020-06-15 Thread ()
 
```
postgres=# \d+ prt1_p1
                                   Table "public.prt1_p1"
 Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer           |           |          |         | plain    |              |
 b      | integer           |           |          |         | plain    |              |
 c      | character varying |           |          |         | extended |              |
Partition of: prt1 FOR VALUES FROM (0) TO (25)
Partition constraint: ((a IS NOT NULL) AND (a >= 0) AND (a < 25))
Indexes:
    "prt1_p1_a_idx" UNIQUE, btree (a)
Access method: heap

postgres=# \d+ prt1_p2
                                   Table "public.prt1_p2"
 Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer           |           |          |         | plain    |              |
 b      | integer           |           |          |         | plain    |              |
 c      | character varying |           |          |         | extended |              |
Partition of: prt1 FOR VALUES FROM (25) TO (50)
Partition constraint: ((a IS NOT NULL) AND (a >= 25) AND (a < 50))
Indexes:
    "prt1_p2_a_idx" UNIQUE, btree (a)
Access method: heap

postgres=# \d+ prt1_p3
                                   Table "public.prt1_p3"
 Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
 a      | integer           |           |          |         | plain    |              |
 b      | integer           |           |          |         | plain    |              |
 c      | character varying |           |          |         | extended |              |
Partition of: prt1 FOR VALUES FROM (50) TO (75)
Partition constraint: ((a IS NOT NULL) AND (a >= 50) AND (a < 75))
Access method: heap
```

Now we can see that the first two partitions have indexes,
but the third partition has no index because of the error.
So the outcome of this first approach is not what we expected: the third
partition is left without an index.
In other words, when CIC goes wrong, it should either roll back everything or
keep going, rather than stop in the middle.
This is just my humble opinion, for your reference.

Thank you very much,
 Regards,  Adger



----------
From: Justin Pryzby
Sent: Saturday, June 13, 2020 02:15
To: Michael Paquier; 李杰(慎追)
Cc: pgsql-hackers; 曾文旌(义从); Alvaro Herrera
Subject: Re: how to create index concurrently on partitioned table

On Fri, Jun 12, 2020 at 04:17:34PM +0800, 李杰(慎追) wrote:
> As we all know, CIC has three transactions. If we recursively in n 
> partitioned tables, 
> it will become 3N transactions. If an error occurs in these transactions, we 
> have too many things to deal...
> 
> If an error occurs when an index is created in one of the partitions, 
> what should we do with our new index?

My (tentative) understanding is that these types of things should use a
"subtransaction" internally.  So if the toplevel transaction rolls back, its
changes are lost.  In some cases, it might be desirable to not roll back, in
which case the user (client) should first create indexes (concurrently if
needed) on every child, and then later create the index on the parent (that has
the advantage of working on older servers, too).
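
A rough sketch of that manual sequence, assuming a parent t partitioned into t1
and t2 with an integer column i (matching the session below):
```
-- Build each child's index concurrently first ...
create index concurrently t1_i_idx on t1 (i);
create index concurrently t2_i_idx on t2 (i);
-- ... then create the parent index non-concurrently; existing matching
-- partition indexes are attached instead of being rebuilt.
create index t_i_idx on t (i);
```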

postgres=# SET client_min_messages=debug;
postgres=# CREATE INDEX ON t(i);
DEBUG:  building index "t1_i_idx" on table "t1" with request for 1 parallel 
worker
DEBUG:  index "t1_i_idx" can safely use deduplication
DEBUG:  creating and filling new WAL file
DEBUG:  done creating and filling new WAL file
DEBUG:  creating and filling new WAL file
DEBUG:  done creating and filling new WAL file
DEBUG:  building index "t2_i_idx" on table "t2" with request for 1 parallel 
worker
^C2020-06-12 13:08:17.001 CDT [19291] ERROR:  canceling statement due to user 
request
2020-06-12 13:08:17.001 CDT [19291] STATEMENT:  CREATE INDEX ON t(i);
2020-06-12 13:08:17.001 CDT [27410] FATAL:  terminating connection due to 
administrator command
2020-06-12 13:08:17.001 CDT [27410] STATEMENT:  CREATE INDEX ON t(i);
Cancel request sent

If the index creation is interrupted at this point, no indexes will exist.

On Fri, Jun 12, 2020 at 04:06:28PM +0800, 李杰(慎追) wrote:
> >On Sat, Jun 06, 2020 at 09:23:32AM -0500, Justin Pryzby wrote:
> > I looked at CIC now and came up with the attached.  All that's needed to 
> > allow
> > this case is to close the relation before recursing to partitions - it 
> > needs to
> > be closed before calling CommitTransactionCommand().  There's probably a 
> > better
> > way to write this, but I can't see that there's anything complicated about
> > handling partitioned tables.
> 
> I'm so sorry about getting back late.
> Thank you very much for helping me consider this issue.
> I compiled the patch v1 you provided, and applied patch v2-001 to postgresql
> as well.
> I got a coredump that was easy to reproduce. As follows:

> I have been trying to get familiar with the source code of create index.
> Can you solve this bug first? I will try my best to implement CIC with you.
> Next, I will read your patch

Re: how to create index concurrently on partitioned table

2020-06-12 Thread ()
Hi Justin,

> Maybe I'm wrong, but I don't think there's any known difficulty - just that
> nobody did it yet.  You should pay attention to what happens on error, but
> hopefully you wouldn't need to add much code and can rely on existing code
> paths to handle that right.

Yes, I am trying to learn the index definition code.

> I think you'd look at the commits and code implementing indexes on partitioned
> tables and CREATE INDEX CONCURRENTLY.  And maybe any following commits with
> fixes.

> You'd first loop around all children (recursively if there are partitions 
> which
> are themselves partitioned) and create indexes concurrently. 

As we all know, CIC uses three transactions. If we recurse into N partitions,
that becomes 3N transactions. If an error occurs in one of these transactions,
we have too many things to deal with...

If an error occurs while an index is being created in one of the partitions,
what should we do with our new index?


Thank you very much,
 Regards,  Adger








Re: how to create index concurrently on partitioned table

2020-06-12 Thread ()

>On Sat, Jun 06, 2020 at 09:23:32AM -0500, Justin Pryzby wrote:
> > On Wed, Jun 03, 2020 at 08:22:29PM +0800, 李杰(慎追) wrote:
> > > Partitioning is necessary for very large tables.
> > >  However, I found that postgresql does not support create index 
> > > concurrently on partitioned tables.
> > > The document show that we need to create an index on each partition 
> > > individually and then finally create the partitioned index 
> > > non-concurrently. 
> > > This is undoubtedly a complex operation for DBA, especially when there 
> > > are many partitions. 
> > 
> > > Therefore, I wonder why pg does not support concurrent index creation on 
> > > partitioned tables? 
> > > What are the difficulties of this function? 
> > > If I want to implement it, what should I pay attention?
> > 
> > Maybe I'm wrong, but I don't think there's any known difficulty - just that
> > nobody did it yet.

> I said that but I was actually thinking about the code for "REINDEX
> CONCURRENTLY" (which should also handle partitioned tables).

> I looked at CIC now and came up with the attached.  All that's needed to allow
> this case is to close the relation before recursing to partitions - it needs 
> to
> be closed before calling CommitTransactionCommand().  There's probably a 
> better
> way to write this, but I can't see that there's anything complicated about
> handling partitioned tables.

Hi Justin,

I'm sorry for getting back to you late.
Thank you very much for helping me look into this issue.
I compiled the patch v1 you provided, and applied patch v2-001 to postgresql as
well.
I got a core dump that is easy to reproduce, as follows:

#0  PopActiveSnapshot () at snapmgr.c:822
#1  0x005ca687 in DefineIndex (relationId=relationId@entry=16400, 
stmt=stmt@entry=0x1aa5e28, indexRelationId=16408, indexRelationId@entry=0, 
parentIndexId=parentIndexId@entry=16406,
  parentConstraintId=0, is_alter_table=is_alter_table@entry=false,
 check_rights=true, check_not_in_use=true, skip_build=false, quiet=false) 
at indexcmds.c:1426
#2  0x005ca5ab in DefineIndex (relationId=relationId@entry=16384, 
stmt=stmt@entry=0x1b35278, indexRelationId=16406, indexRelationId@entry=0,
 parentIndexId=parentIndexId@entry=0,
  parentConstraintId=parentConstraintId@entry=0, is_alter_table=
is_alter_table@entry=false, check_rights=true, check_not_in_use=true,
 skip_build=false, quiet=false) at indexcmds.c:1329
#3  0x0076bf80 in ProcessUtilitySlow (pstate=pstate@entry=0x1b350c8, 
pstmt=pstmt@entry=0x1a2bf40,
  queryString=queryString@entry=0x1a2b2c8 "create index CONCURRENTLY 
idxpart_a_idx on idxpart (a);", context=context@entry=PROCESS_UTILITY_TOPLEVEL, 
params=params@entry=0x0,
  queryEnv=queryEnv@entry=0x0, qc=0x7ffc86cc7630, dest=0x1a2c200) at 
utility.c:1474
#4  0x0076afeb in standard_ProcessUtility (pstmt=0x1a2bf40, 
queryString=0x1a2b2c8 "create index CONCURRENTLY idxpart_a_idx on idxpart (a);",
 context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
  queryEnv=0x0, dest=0x1a2c200, qc=0x7ffc86cc7630) at utility.c:1069
#5  0x00768992 in PortalRunUtility (portal=0x1a8d1f8, pstmt=0x1a2bf40, 
isTopLevel=, setHoldSnapshot=, dest=,
 qc=0x7ffc86cc7630) at pquery.c:1157
#6  0x007693f3 in PortalRunMulti (portal=portal@entry=0x1a8d1f8, 
isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, 
dest=dest@entry=0x1a2c200,
  altdest=altdest@entry=0x1a2c200, qc=qc@entry=0x7ffc86cc7630) at pquery.c:1310
#7  0x00769ed3 in PortalRun (portal=portal@entry=0x1a8d1f8, count=count
@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once
@entry=true, dest=dest@entry=0x1a2c200,
  altdest=altdest@entry=0x1a2c200, qc=0x7ffc86cc7630) at pquery.c:779
#8  0x00765b06 in exec_simple_query (query_string=0x1a2b2c8 "create 
index CONCURRENTLY idxpart_a_idx on idxpart (a);") at postgres.c:1239
#9  0x00767de5 in PostgresMain (argc=, argv=argv@entry
=0x1a552c8, dbname=, username=) at postgres.c:4315
#10 0x006f2b23 in BackendRun (port=0x1a4d1e0, port=0x1a4d1e0) at 
postmaster.c:4523
#11 BackendStartup (port=0x1a4d1e0) at postmaster.c:4215
#12 ServerLoop () at postmaster.c:1727
#13 0x006f3a1f in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x1a25ea0) 
at postmaster.c:1400
#14 0x004857f9 in main (argc=3, argv=0x1a25ea0) at main.c:210


You can reproduce it like this:
```
create table idxpart (a int, b int, c text) partition by range (a);
create table idxpart1 partition of idxpart for values from (0) to (10);
create table idxpart2 partition of idxpart for values from (10) to (20);
create index CONCURRENTLY idxpart_a_idx on idxpart (a);
```

I have been trying to get familiar with the source code of CREATE INDEX.
Can you fix this bug first? I will try my best to implement CIC with you.
Next, I will read your patches v2-002 and v2-003.

Thank you very much,
 Regards,  Adger





how to create index concurrently on partitioned table

2020-06-03 Thread ()
Hi hackers,

Partitioning is necessary for very large tables.
However, I found that PostgreSQL does not support CREATE INDEX CONCURRENTLY on
partitioned tables.
The documentation says that we need to create an index on each partition
individually and then finally create the partitioned index non-concurrently.
This is undoubtedly a complex operation for DBAs, especially when there are
many partitions.

Therefore, I wonder why PG does not support concurrent index creation on
partitioned tables?
What are the difficulties with this feature?
If I want to implement it, what should I pay attention to?

Sincerely look forward to your reply. 

Regards & Thanks Adger

Re: Re: Re: Bug about drop index concurrently

2019-10-23 Thread ()
…and the most unbearable thing is that it delays log apply on the standby,
resulting in inconsistent data between the master and the standby.

All in all, I think this bug is a flaw in the postgres design. We need to think
carefully about how to handle it better; we could even learn from other database
products. I hope I can help you.

Thank you very much for your attention.

Regards.

adger.





--
From: Tomas Vondra
Sent: Thursday, October 24, 2019 06:04
To: 李杰(慎追)
Cc: pgsql-hackers
Subject: Re: Re: Re: Bug about drop index concurrently

On Wed, Oct 23, 2019 at 02:38:45PM +0800, 李杰(慎追) wrote:
>>
>>I'm a bit confused. You shouldn't see any crashes and/or core files in
>>this scenario, for two reasons. Firstly, I assume you're running a
>>regular build without asserts. Secondly, I had to add an extra assert
>>to trigger the failure. So what core are you talking about?
>>
>Sorry, I should explain it more clearly.  I saw the core file because I
>modified the postgres source code and added Assert to it.
>>

OK

>>Also, it's not clear to me what do you mean by "bug in the standby" or
>>no lock in the drop index concurrently. Can you explain?
>>
>"bug in the standby" means that we built a master-slave instance, when
>we executed a large number of queries on the standby, we executed 'drop
>index concurrently' on the master so that get ‘error’ in the standby.
>Although it is not 100%, it will appear.  no lock in the drop index
>concurrently :::  I think this is because there are not enough advanced
>locks when executing ‘ drop index concurrently’.
>

OK, thanks for the clarification. Yes, it won't appear every time, it's
likely timing-sensitive (I'll explain in a minute).

>>Hmm, so you observe the issue with regular queries, not just EXPLAIN
>>ANALYZE?
>
>yeah, we have seen this error frequently.
>

That suggests you're doing a lot of 'drop index concurrently', right?

>>>Of course, we considered applying the method of waiting to detect the
>>>query lock on the master to the standby, but worried about affecting
>>>the standby application log delay, so we gave up that.
>>>
>>I don't understand? What method?
>>
>
>I analyzed this problem, I used to find out the cause of this problem,
>I also executed 'drop index concurrently' and ‘explain select * from
>xxx’  on the master, but the bug did not appear as expected.  So I went
>to analyze the source code. I found that there is such a mechanism on
>the master that when the 'drop index concurrently' is execute, it wait
>will every transaction that saw the old index state has finished.
>source code is as follows follow as:
>
>WaitForLockers(heaplocktag, AccessExclusiveLock);
>
>Therefore, I think that if this method is also available in standby,
>then the error will not appear. but I worried about affecting the
>standby application log delay, so we gave up that.
>

Yes, but we can't really do that, I'm afraid.

We certainly can't do that on the master because we simply don't have
the necessary information about locks from the standby, and we really
don't want to have it, because with a busy standby that might be quite a
bit of data (plust the standby would have to wait for the master to
confirm each lock acquisition, I think which seems pretty terrible).

On the standby, we don't really have an idea that there's a drop
index running - we only get information about AE locks, and a bunch of
catalog updates. I don't think we have a way to determine this is a drop
index in concurrent mode :-(

More precisely, the master sends information about AccessExclusiveLock
in XLOG_STANDBY_LOCK wal record (in xl_standby_locks struct). And when
the standby replays that, it should acquire those locks.

For regular DROP INDEX we send this:

rmgr: Standby ... desc: LOCK xid  db 16384 rel 16385 
rmgr: Standby ... desc: LOCK xid  db 16384 rel 20573 
... catalog changes ...
rmgr: Transaction ... desc: COMMIT 2019-10-23 22:42:27.950995 CEST;
rels: base/16384/20573; inval msgs: catcache 32 catcache 7 catcache
6 catcache 50 catcache 49 relcache 20573 relcache 16385 snapshot 2608

while for DROP INDEX CONCURRENTLY we send this

rmgr: Heap... desc: INPLACE ... catalog update
rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
  relcache 21288 relcache 16385
rmgr: Heap... desc: INPLACE ... catalog update
rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
  relcache 21288 relcache 16385
rmgr: Standby ... desc: LOCK xid 10326 db 16384 rel 21288 
... catalog updates ...
rmgr: Transaction ... desc: COMMIT 2019-10-23 23:47:10.04256

Re: Bug about drop index concurrently

2019-10-22 Thread ()
Ah ha ha, this is great. I am ashamed that my English did not let you clearly
understand my mail.

Now I am very glad that you understand it. I sincerely hope that I can help
you. I am also a postgres fan, a freshly graduated student.

We have both confirmed that this bug only appears on the standby and does not
appear on the master. But it does affect the use of the standby.

For this bug, I proposed two options: one is to disable this feature (DROP
INDEX CONCURRENTLY); the other is to make the standby wait for running SELECTs
the way the master does, but that may delay log application. Because this bug
appears on the standby, I think both methods have advantages and disadvantages,
so I hope you will discuss it thoroughly; that will help a lot.

Sincerely 

adger



--
From: Tomas Vondra
Sent: Wednesday, October 23, 2019 06:49
To: 李杰(慎追)
Cc: pgsql-hackers
Subject: Re: Bug about drop index concurrently

On Fri, Oct 18, 2019 at 05:00:54PM +0200, Tomas Vondra wrote:
>Hi,
>
>I can trivially reproduce this - it's enough to create a master-standby
>setup, and then do this on the master
>
> CREATE TABLE t (a int, b int);
> INSERT INTO t SELECT i, i FROM generate_series(1,1) s(i);
>
>and run pgbench with this script
>
> DROP INDEX CONCURRENTLY IF EXISTS t_a_idx;
> CREATE INDEX t_a_idx ON t(a);
>
>while on the standby there's another pgbench running this script
>
> EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1;
>
>and it fails pretty fast for me. With an extra assert(false) added to
>src/backend/access/common/relation.c I get a backtrace like this:
>
>   Program terminated with signal SIGABRT, Aborted.
>   #0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
>   Missing separate debuginfos, use: dnf debuginfo-install 
> glibc-2.29-22.fc30.x86_64
>   (gdb) bt
>   #0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
>   #1  0x7c32e457a895 in abort () from /lib64/libc.so.6
>   #2  0x00a58579 in ExceptionalCondition (conditionName=0xacd9bc 
> "!(0)", errorType=0xacd95b "FailedAssertion", fileName=0xacd950 "relation.c", 
> lineNumber=64) at assert.c:54
>   #3  0x0048d1bd in relation_open (relationId=38216, lockmode=1) at 
> relation.c:64
>   #4  0x005082e4 in index_open (relationId=38216, lockmode=1) at 
> indexam.c:130
>   #5  0x0080ac3f in get_relation_info (root=0x21698b8, 
> relationObjectId=16385, inhparent=false, rel=0x220ce60) at plancat.c:196
>   #6  0x008118c6 in build_simple_rel (root=0x21698b8, relid=1, 
> parent=0x0) at relnode.c:292
>   #7  0x007d485d in add_base_rels_to_query (root=0x21698b8, 
> jtnode=0x2169478) at initsplan.c:114
>   #8  0x007d48a3 in add_base_rels_to_query (root=0x21698b8, 
> jtnode=0x21693e0) at initsplan.c:122
>   #9  0x007d8fad in query_planner (root=0x21698b8, 
> qp_callback=0x7ded97 , qp_extra=0x7fffa4834f10) at 
> planmain.c:168
>   #10 0x007dc316 in grouping_planner (root=0x21698b8, 
> inheritance_update=false, tuple_fraction=0) at planner.c:2048
>   #11 0x007da7ca in subquery_planner (glob=0x220d078, 
> parse=0x2168f78, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at 
> planner.c:1012
>   #12 0x007d942c in standard_planner (parse=0x2168f78, 
> cursorOptions=256, boundParams=0x0) at planner.c:406
>   #13 0x007d91e8 in planner (parse=0x2168f78, cursorOptions=256, 
> boundParams=0x0) at planner.c:275
>   #14 0x008e1b0d in pg_plan_query (querytree=0x2168f78, 
> cursorOptions=256, boundParams=0x0) at postgres.c:878
>   #15 0x00658683 in ExplainOneQuery (query=0x2168f78, 
> cursorOptions=256, into=0x0, es=0x220cd90, queryString=0x21407b8 "explain 
> analyze select * from t where a = 1;", params=0x0, queryEnv=0x0) at 
> explain.c:367
>   #16 0x00658386 in ExplainQuery (pstate=0x220cc28, stmt=0x2141728, 
> queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
> params=0x0, queryEnv=0x0, dest=0x220cb90) at explain.c:255
>   #17 0x008ea218 in standard_ProcessUtility (pstmt=0x21425c0, 
> queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90,
>   completionTag=0x7fffa48355c0 "") at utility.c:675
>   #18 0x008e9a45 in ProcessUtility (pstmt=0x21425c0, 
> queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90,
>   completionTag=0x7fffa48355c0 "") at utility.c:360
>   #19 0x008e8a0c in PortalRunUtility 

Re: Re: Bug about drop index concurrently

2019-10-22 Thread ()
>
>I'm a bit confused. You shouldn't see any crashes and/or core files in
>this scenario, for two reasons. Firstly, I assume you're running a
>regular build without asserts. Secondly, I had to add an extra assert to
>trigger the failure. So what core are you talking about?
>
Sorry, I should explain it more clearly.
I saw the core file because I modified the postgres source code and added 
Assert to it.
>
>Also, it's not clear to me what do you mean by "bug in the standby" or
>no lock in the drop index concurrently. Can you explain?
>
"bug in the standby" means that we built a master-slave instance, when we 
executed a large number of queries on the standby, we executed 'drop index 
concurrently' on the master so that get ‘error’ in the standby. Although it is 
not 100%, it will appear.
no lock in the drop index concurrently :::  I think this is because there are 
not enough advanced locks when executing ‘ drop index concurrently’.

>Hmm, so you observe the issue with regular queries, not just EXPLAIN
>ANALYZE?

yeah, we have seen this error frequently.

>>Of course, we considered applying the method of waiting to detect the
>>query lock on the master to the standby, but worried about affecting
>>the standby application log delay, so we gave up that.
>>
>I don't understand? What method?
>

I analyzed this problem. To find its cause, I also executed 'drop index
concurrently' and 'explain select * from xxx' on the master, but the bug did
not appear as expected.
So I went to analyze the source code. I found that there is a mechanism on the
master whereby, when 'drop index concurrently' is executed, it waits until
every transaction that saw the old index state has finished. The source code is
as follows:



WaitForLockers(heaplocktag, AccessExclusiveLock);

Therefore, I think that if this mechanism were also available on the standby,
the error would not appear. But I worried about affecting the standby's log
application delay, so we gave up on that.






--
From: Tomas Vondra
Sent: Wednesday, October 23, 2019 01:47
To: 李杰(慎追)
Cc: pgsql-hackers
Subject: Re: Re: Bug about drop index concurrently

On Mon, Oct 21, 2019 at 10:36:04AM +0800, 李杰(慎追) wrote:
>Thanks for the quick reply.  And sorry I haven’t got back to you sooner
>.
>
>I have seen this backtrace in the core file, and I also looked at the
>bug in the standby because there is no lock in the drop index
>concurrently.
>

I'm a bit confused. You shouldn't see any crashes and/or core files in
this scenario, for two reasons. Firstly, I assume you're running a
regular build without asserts. Secondly, I had to add an extra assert to
trigger the failure. So what core are you talking about?

Also, it's not clear to me what do you mean by "bug in the standby" or
no lock in the drop index concurrently. Can you explain?

>However, when our business will perform a large number of queries in
>the standby, this problem will occur more frequently. So we are trying
>to solve this problem, and the solution we are currently dealing with
>is to ban it.
>

Hmm, so you observe the issue with regular queries, not just EXPLAIN
ANALYZE?

>Of course, we considered applying the method of waiting to detect the
>query lock on the master to the standby, but worried about affecting
>the standby application log delay, so we gave up that.
>

I don't understand? What method?

>If you have a better solution in the future, please push it to the new
>version, or email it, thank you very much.
>


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Bug about drop index concurrently

2019-10-22 Thread ()
Hi all,

I am sorry to bother you again.

I want to ask again about the DROP INDEX bug I raised last time: are you going
to deal with it in the future? Will it be disallowed, or will a fix be
proposed? What is your next plan?

Sincerely look forward to your reply and thanks.

adger.


--
From: Tomas Vondra
Sent: Saturday, October 19, 2019 02:00
To: 李杰(慎追)
Cc: pgsql-hackers
Subject: Re: Bug about drop index concurrently

Hi,

I can trivially reproduce this - it's enough to create a master-standby
setup, and then do this on the master

  CREATE TABLE t (a int, b int);
  INSERT INTO t SELECT i, i FROM generate_series(1,1) s(i);

and run pgbench with this script

  DROP INDEX CONCURRENTLY IF EXISTS t_a_idx;
  CREATE INDEX t_a_idx ON t(a);

while on the standby there's another pgbench running this script

  EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1;

and it fails pretty fast for me. With an extra assert(false) added to
src/backend/access/common/relation.c I get a backtrace like this:

Program terminated with signal SIGABRT, Aborted.
#0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install 
glibc-2.29-22.fc30.x86_64
(gdb) bt
#0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
#1  0x7c32e457a895 in abort () from /lib64/libc.so.6
#2  0x00a58579 in ExceptionalCondition (conditionName=0xacd9bc 
"!(0)", errorType=0xacd95b "FailedAssertion", fileName=0xacd950 "relation.c", 
lineNumber=64) at assert.c:54
#3  0x0048d1bd in relation_open (relationId=38216, lockmode=1) at 
relation.c:64
#4  0x005082e4 in index_open (relationId=38216, lockmode=1) at 
indexam.c:130
#5  0x0080ac3f in get_relation_info (root=0x21698b8, 
relationObjectId=16385, inhparent=false, rel=0x220ce60) at plancat.c:196
#6  0x008118c6 in build_simple_rel (root=0x21698b8, relid=1, 
parent=0x0) at relnode.c:292
#7  0x007d485d in add_base_rels_to_query (root=0x21698b8, 
jtnode=0x2169478) at initsplan.c:114
#8  0x007d48a3 in add_base_rels_to_query (root=0x21698b8, 
jtnode=0x21693e0) at initsplan.c:122
#9  0x007d8fad in query_planner (root=0x21698b8, 
qp_callback=0x7ded97 , qp_extra=0x7fffa4834f10) at 
planmain.c:168
#10 0x007dc316 in grouping_planner (root=0x21698b8, 
inheritance_update=false, tuple_fraction=0) at planner.c:2048
#11 0x007da7ca in subquery_planner (glob=0x220d078, 
parse=0x2168f78, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at 
planner.c:1012
#12 0x007d942c in standard_planner (parse=0x2168f78, 
cursorOptions=256, boundParams=0x0) at planner.c:406
#13 0x007d91e8 in planner (parse=0x2168f78, cursorOptions=256, 
boundParams=0x0) at planner.c:275
#14 0x008e1b0d in pg_plan_query (querytree=0x2168f78, 
cursorOptions=256, boundParams=0x0) at postgres.c:878
#15 0x00658683 in ExplainOneQuery (query=0x2168f78, 
cursorOptions=256, into=0x0, es=0x220cd90, queryString=0x21407b8 "explain 
analyze select * from t where a = 1;", params=0x0, queryEnv=0x0) at 
explain.c:367
#16 0x00658386 in ExplainQuery (pstate=0x220cc28, stmt=0x2141728, 
queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
params=0x0, queryEnv=0x0, dest=0x220cb90) at explain.c:255
#17 0x008ea218 in standard_ProcessUtility (pstmt=0x21425c0, 
queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90,
completionTag=0x7fffa48355c0 "") at utility.c:675
#18 0x008e9a45 in ProcessUtility (pstmt=0x21425c0, 
queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90,
completionTag=0x7fffa48355c0 "") at utility.c:360
#19 0x008e8a0c in PortalRunUtility (portal=0x219c278, 
pstmt=0x21425c0, isTopLevel=true, setHoldSnapshot=true, dest=0x220cb90, 
completionTag=0x7fffa48355c0 "") at pquery.c:1175
#20 0x008e871a in FillPortalStore (portal=0x219c278, 
isTopLevel=true) at pquery.c:1035
#21 0x008e8075 in PortalRun (portal=0x219c278, 
count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x21efb90, 
altdest=0x21efb90, completionTag=0x7fffa48357b0 "") at pquery.c:765
#22 0x008e207c in exec_simple_query (query_string=0x21407b8 
"explain analyze select * from t where a = 1;") at postgres.c:1215
#23 0x008e636e in PostgresMain (argc=1, argv=0x216c600, 
dbname=0x216c4e0 "test", username=0x213c3f8 "user") at postgres.c:4236
#24 0x0083c71e in BackendRun (port=0x2165850) at postmaster.c:4437
#2

Re: Bug about drop index concurrently

2019-10-20 Thread ()
Thanks for the quick reply,
and sorry I haven't gotten back to you sooner.

I have seen this backtrace in the core file, and I also looked into the bug on
the standby, since drop index concurrently takes no lock that blocks it there.

However, when our workload performs a large number of queries on the standby,
this problem occurs more frequently. So we are trying to solve it, and the
solution we are currently considering is to disallow the operation.

Of course, we considered applying the master's approach of waiting for queries
that hold locks to the standby as well, but we worried about delaying log
application on the standby, so we gave up on that.

If you come up with a better solution in the future, please push it in a new
version or send it by email. Thank you very much.

regards.

adger.


--
From: Tomas Vondra
Sent: Saturday, October 19, 2019 02:00
To: 李杰(慎追)
Cc: pgsql-hackers
Subject: Re: Bug about drop index concurrently

Hi,

I can trivially reproduce this - it's enough to create a master-standby
setup, and then do this on the master

  CREATE TABLE t (a int, b int);
  INSERT INTO t SELECT i, i FROM generate_series(1,1) s(i);

and run pgbench with this script

  DROP INDEX CONCURRENTLY IF EXISTS t_a_idx;
  CREATE INDEX t_a_idx ON t(a);

while on the standby there's another pgbench running this script

  EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1;

and it fails pretty fast for me. With an extra assert(false) added to
src/backend/access/common/relation.c I get a backtrace like this:

Program terminated with signal SIGABRT, Aborted.
#0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install 
glibc-2.29-22.fc30.x86_64
(gdb) bt
#0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
#1  0x7c32e457a895 in abort () from /lib64/libc.so.6
#2  0x00a58579 in ExceptionalCondition (conditionName=0xacd9bc 
"!(0)", errorType=0xacd95b "FailedAssertion", fileName=0xacd950 "relation.c", 
lineNumber=64) at assert.c:54
#3  0x0048d1bd in relation_open (relationId=38216, lockmode=1) at 
relation.c:64
#4  0x005082e4 in index_open (relationId=38216, lockmode=1) at 
indexam.c:130
#5  0x0080ac3f in get_relation_info (root=0x21698b8, 
relationObjectId=16385, inhparent=false, rel=0x220ce60) at plancat.c:196
#6  0x008118c6 in build_simple_rel (root=0x21698b8, relid=1, 
parent=0x0) at relnode.c:292
#7  0x007d485d in add_base_rels_to_query (root=0x21698b8, 
jtnode=0x2169478) at initsplan.c:114
#8  0x007d48a3 in add_base_rels_to_query (root=0x21698b8, 
jtnode=0x21693e0) at initsplan.c:122
#9  0x007d8fad in query_planner (root=0x21698b8, 
qp_callback=0x7ded97 , qp_extra=0x7fffa4834f10) at 
planmain.c:168
#10 0x007dc316 in grouping_planner (root=0x21698b8, 
inheritance_update=false, tuple_fraction=0) at planner.c:2048
#11 0x007da7ca in subquery_planner (glob=0x220d078, 
parse=0x2168f78, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at 
planner.c:1012
#12 0x007d942c in standard_planner (parse=0x2168f78, 
cursorOptions=256, boundParams=0x0) at planner.c:406
#13 0x007d91e8 in planner (parse=0x2168f78, cursorOptions=256, 
boundParams=0x0) at planner.c:275
#14 0x008e1b0d in pg_plan_query (querytree=0x2168f78, 
cursorOptions=256, boundParams=0x0) at postgres.c:878
#15 0x00658683 in ExplainOneQuery (query=0x2168f78, 
cursorOptions=256, into=0x0, es=0x220cd90, queryString=0x21407b8 "explain 
analyze select * from t where a = 1;", params=0x0, queryEnv=0x0) at 
explain.c:367
#16 0x00658386 in ExplainQuery (pstate=0x220cc28, stmt=0x2141728, 
queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
params=0x0, queryEnv=0x0, dest=0x220cb90) at explain.c:255
#17 0x008ea218 in standard_ProcessUtility (pstmt=0x21425c0, 
queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90,
completionTag=0x7fffa48355c0 "") at utility.c:675
#18 0x008e9a45 in ProcessUtility (pstmt=0x21425c0, 
queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90,
completionTag=0x7fffa48355c0 "") at utility.c:360
#19 0x008e8a0c in PortalRunUtility (portal=0x219c278, 
pstmt=0x21425c0, isTopLevel=true, setHoldSnapshot=true, dest=0x220cb90, 
completionTag=0x7fffa48355c0 "") at pquery.c:1175
#20 0x008e871a in FillPortalStore (portal=0x219c278, 
isTopLevel=true) at pquery.c:1035
#21 0x008e8075 in PortalRun (portal=0x219c278, 
count=9223372036854775807, isTopLevel=true, run_once=true, dest

Bug about drop index concurrently

2019-10-16 Thread ()

Hi hackers,

Recently, I discovered a postgres bug, and I hope I can ask you for the best
solution.
The problem is as follows:

postgres=# explain analyze select * from xxx where a=500;
ERROR: could not open relation with OID 25989
The structure of my table is as follows:
postgres=# \d xxx
              Table "public.xxx"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
 b      | text    |           |          |

postgres=# select count(*) from xxx;
 count
-------
    80
(1 row)

postgres=# select * from xxx limit 3;
 a |                b
---+----------------------------------
 1 | 203c51477570aa517cfa317155dcc52c
 2 | b58da31baa5c78fee4642fd398bd5909
 3 | c7c475bf0a3ca2fc2afc4812a4d44c58

I opened the log file and saw that the index on table xxx had been dropped:

postgres=# drop index CONCURRENTLY idx_xxx ;
DROP INDEX

In order to reproduce this bug, I created and dropped the index again and again
on the master.
What is hard to understand is that this bug cannot be reproduced 100% of the
time.
I wrote a script that loops on the master and runs the following two
statements:

postgres=# create index idx_xxx on xxx (a);
postgres=# drop index CONCURRENTLY idx_xxx ;
postgres=# create index idx_xxx on xxx (a);
postgres=# drop index CONCURRENTLY idx_xxx ;
...
...
...
At the same time, I started two clients on the standby,
each executing the following SQL against table xxx:

postgres=# explain analyze select * from xxx where a=500;
postgres=# \watch 0.1

After a few minutes, the bug will appear.

I finally confirmed my guess: the query on the standby used an index scan,
while the index was being dropped on the master at the same time.
Curious, I went to read the Postgres source code. I found that
a regular DROP INDEX takes an AccessExclusiveLock on the table,
while DROP INDEX CONCURRENTLY only takes a ShareUpdateExclusiveLock.

As we all know, AccessShareLock, the lock a SELECT takes, conflicts only with
AccessExclusiveLock, so AccessShareLock cannot block (or be blocked by)
ShareUpdateExclusiveLock.
This is very strange and undesirable.
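
One way to see this difference is to query pg_locks from another session while
each form of DROP INDEX is running against table xxx:
```
-- While "DROP INDEX idx_xxx" or "DROP INDEX CONCURRENTLY idx_xxx" runs in
-- another session, inspect the lock held on the table.
select locktype, relation::regclass, mode, granted
from pg_locks
where relation = 'xxx'::regclass;
-- The regular form shows AccessExclusiveLock; the CONCURRENTLY form shows
-- ShareUpdateExclusiveLock, which does not conflict with a SELECT's
-- AccessShareLock.
```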

Of course, the developers must have thought of this, so we can see in the
source code that, before DROP INDEX CONCURRENTLY proceeds, it waits for all
transactions that are using this index to finish.

 But this only happens on the master, and my query is executed on the standby.
 I used the pg_waldump tool to parse the WAL files and analyzed the startup
process,
 and I found that there is no similar operation on the standby. So it can happen
that, while I execute the query on the standby, the index is dropped by someone
else.


I think this is a bug that affects the user experience, and we need to fix it.
 I have considered implementing on the standby the logic that detects the query
transactions and waits for them to end, but that may increase the
log-application delay, and the extra delay would widen the gap between the
master and the standby.
This is not desirable if the query concurrency is high.

All in all, I hope you can provide a solution that allows DROP INDEX
CONCURRENTLY to be used without affecting the master-standby delay.

Sincerely look forward to your reply and thanks.

adger