Hello,

I'll add this to CF2017-09.

At Tue, 27 Jun 2017 16:27:18 +0900, Amit Langote 
<langote_amit...@lab.ntt.co.jp> wrote in 
<75fe42df-b1d8-89ff-596d-d9da0749e...@lab.ntt.co.jp>
> On 2017/06/26 18:44, Kyotaro HORIGUCHI wrote:
> > At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote:
> >>
> >> I recall I had proposed a fix for the same thing some time ago [1].
> > 
> > Wow. About 1.5 years ago and stuck? I have a concrete case where
> > this harms  so I'd like to fix that this time. How can we move on?
> 
> Agreed that this should be fixed.
> 
> Your proposed approach #1 to recheck the inheritance after obtaining the
> lock on the child table seems to be a good way forward.
> 
> Approach #2 of reordering locking is a simpler solution, but does not
> completely prevent the problem, because DROP TABLE child can still cause
> it to occur, as you mentioned.
> 
> >>> The cause is that NO INHERIT doesn't take an exlusive lock on the
> >>> parent. This allows expand_inherited_rtentry to add the child
> >>> relation into appendrel after removal from the inheritance but
> >>> still exists.
> >>
> >> Right.
> >>
> >>> I see two ways to fix this.
> >>>
> >>> The first patch adds a recheck of inheritance relationship if the
> >>> corresponding attribute is missing in the child in
> >>> make_inh_translation_list(). The recheck is a bit complex but it
> >>> is not performed unless the sequence above is happen. It checks
> >>> duplication of relid (or cycles in inheritance) following
> >>> find_all_inheritors (but doing a bit different) but I'm not sure
> >>> it is really useful.
> >>
> >> I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but
> >> I guess your hash table based solution will do the job as far as
> >> performance of this check is concerned, although I haven't checked the
> >> code closely.
> > 
> > The hash table is not crucial in the patch. Substantially it just
> > recurs down looking up pg_inherits for the child. I considered
> > the new index but abandoned because I thought that such case
> > won't occur so frequently.
> 
> Agreed.  BTW, the hash table in patch #1 does not seem to be really
> helpful.  In the while loop in is_descendant_of_internal(), does
> hash_search() ever return found = true?  AFAICS, it does not.
> 
> >>> The second patch lets ALTER TABLE NO INHERIT to acquire locks on
> >>> the parent first.
> >>>
> >>> Since the latter has a larger impact on the current behavior and
> >>> we already treat "DROP TABLE child" case in the similar way, I
> >>> suppose that the first approach would be preferable.
> >>
> >> That makes sense.

So, I attached only the first patch, rebased on the current
master (It actually failed to apply on it.) and fixed a typo in a
comment.

This still runs a closed-loop test using temporary hash but it
seems a bit paranoic. (this is the same check with what
find_all_inheritors is doing)


<< the following is another topic >>

> >> BTW, in the partitioned table case, the parent is always locked first
> >> using an AccessExclusiveLock.  There are other considerations in that case
> >> such as needing to recreate the partition descriptor upon termination of
> >> inheritance (both the DETACH PARTITION and also DROP TABLE child cases).
> > 
> > Apart from the degree of concurrency, if we keep parent->children
> > order of locking, such recreation does not seem to be
> > needed. Maybe I'm missing something.
> 
> Sorry to have introduced that topic in this thread, but I will try to
> explain anyway why things are the way they are currently:
> 
> Once a table is no longer a partition of the parent (detached or dropped),
> we must make sure that the next commands in the transaction don't see it
> as one.  That information is currently present in the relcache
> (rd_partdesc), which is used by a few callers, most notably the
> tuple-routing code.  Next commands must recreate the entry so that the
> correct thing happens based on the updated information.  More precisely,
> we must invalidate the current entry.  RelationClearRelation() will either
> delete the entry or rebuild it.  If it's being referenced somewhere, it
> will be rebuilt.  The place holding the reference may also be looking at
> the content of rd_partdesc, which we don't require them to make a copy of,
> so we must preserve its content while other fields of RelationData are
> being read anew from the catalog.  We don't have to preserve it if there
> has been any change (partition added/dropped), but to make such a change
> one would need to take a strong enough lock on the relation (parent).  We
> assume here that anyone who wants to reference rd_partdesc takes at least
> AccessShareLock lock on the relation, and anyone who wants to change its
> content must take a lock that will conflict with it, so
> AccessExclusiveLock.  Note that in all of this, we are only talking about
> one relation, that is the parent, so parent -> child ordering of taking
> locks may be irrelevant.

I think I understand this, anyway DropInherit and DropPartition
is different-but-at-the-same-level operations so surely needs
amendment for drop/detach cases. Is there already a solution? Or
reproducing steps?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/catalog/pg_inherits.c
--- b/src/backend/catalog/pg_inherits.c
***************
*** 42,47 **** typedef struct SeenRelsEntry
--- 42,49 ----
  	ListCell   *numparents_cell;	/* corresponding list cell */
  } SeenRelsEntry;
  
+ static bool is_descendent_of_internal(Oid parentId, Oid childId,
+ 									  HTAB *seen_rels);
  /*
   * find_inheritance_children
   *
***************
*** 400,402 **** typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
--- 402,472 ----
  
  	return result;
  }
+ 
+ /*
+  * Check if the child is really a descendent of the parent
+  */
+ bool
+ is_descendent_of(Oid parentId, Oid childId)
+ {
+ 	HTAB	   *seen_rels;
+ 	HASHCTL		ctl;
+ 	bool		ischild = false;
+ 
+ 	memset(&ctl, 0, sizeof(ctl));
+ 	ctl.keysize = sizeof(Oid);
+ 	ctl.entrysize = sizeof(Oid);
+ 	ctl.hcxt = CurrentMemoryContext;
+ 
+ 	seen_rels = hash_create("is_descendent_of temporary table",
+ 							32, /* start small and extend */
+ 							&ctl,
+ 							HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ 
+ 	ischild = is_descendent_of_internal(parentId, childId, seen_rels);
+ 
+ 	hash_destroy(seen_rels);
+ 
+ 	return ischild;
+ }
+ 
+ static bool
+ is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels)
+ {
+ 	Relation	inhrel;
+ 	SysScanDesc scan;
+ 	ScanKeyData key[1];
+ 	bool		ischild = false;
+ 	HeapTuple	inheritsTuple;
+ 
+ 	inhrel = heap_open(InheritsRelationId, AccessShareLock);
+ 	ScanKeyInit(&key[0], Anum_pg_inherits_inhparent,
+ 				BTEqualStrategyNumber, F_OIDEQ,	ObjectIdGetDatum(parentId));
+ 	scan = systable_beginscan(inhrel, InheritsParentIndexId, true,
+ 							  NULL, 1, key);
+ 
+ 	while ((inheritsTuple = systable_getnext(scan)) != NULL)
+ 	{
+ 		bool found;
+ 		Oid inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
+ 
+ 		hash_search(seen_rels, &inhrelid, HASH_ENTER, &found);
+ 
+ 		/*
+ 		 * Recursively check into children. Although there can't theoretically
+ 		 * be any cycles in the inheritance graph, check the cycles following
+ 		 * find_all_inheritors.
+ 		 */
+ 		if (inhrelid == childId ||
+ 			(!found && is_descendent_of_internal(inhrelid, childId, seen_rels)))
+ 		{
+ 			ischild = true;
+ 			break;
+ 		}
+ 	}
+ 
+ 	systable_endscan(scan);
+ 	heap_close(inhrel, AccessShareLock);
+ 
+ 	return ischild;
+ }
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
***************
*** 100,109 **** static List *generate_append_tlist(List *colTypes, List *colCollations,
  static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
  static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
  						 Index rti);
! static void make_inh_translation_list(Relation oldrelation,
  						  Relation newrelation,
! 						  Index newvarno,
! 						  List **translated_vars);
  static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
  					List *translated_vars);
  static Node *adjust_appendrel_attrs_mutator(Node *node,
--- 100,108 ----
  static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
  static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
  						 Index rti);
! static List *make_inh_translation_list(Relation oldrelation,
  						  Relation newrelation,
! 						  Index newvarno);
  static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
  					List *translated_vars);
  static Node *adjust_appendrel_attrs_mutator(Node *node,
***************
*** 1508,1513 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
--- 1507,1527 ----
  		 */
  		if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
  		{
+ 			List *translated_vars =
+ 				make_inh_translation_list(oldrelation, newrelation,
+ 										  childRTindex);
+ 
+ 			if (!translated_vars)
+ 			{
+ 				/*
+ 				 * This childrel is no longer a child of the parent.
+ 				 * Close the child relation releasing locks.
+ 				 */
+ 				if (childOID != parentOID)
+ 					heap_close(newrelation, lockmode);
+ 				continue;
+ 			}
+ 
  			/* Remember if we saw a real child. */
  			if (childOID != parentOID)
  				has_child = true;
***************
*** 1517,1524 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
  			appinfo->child_relid = childRTindex;
  			appinfo->parent_reltype = oldrelation->rd_rel->reltype;
  			appinfo->child_reltype = newrelation->rd_rel->reltype;
! 			make_inh_translation_list(oldrelation, newrelation, childRTindex,
! 									  &appinfo->translated_vars);
  			appinfo->parent_reloid = parentOID;
  			appinfos = lappend(appinfos, appinfo);
  
--- 1531,1537 ----
  			appinfo->child_relid = childRTindex;
  			appinfo->parent_reltype = oldrelation->rd_rel->reltype;
  			appinfo->child_reltype = newrelation->rd_rel->reltype;
! 			appinfo->translated_vars = translated_vars;
  			appinfo->parent_reloid = parentOID;
  			appinfos = lappend(appinfos, appinfo);
  
***************
*** 1623,1636 **** expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
  /*
   * make_inh_translation_list
   *	  Build the list of translations from parent Vars to child Vars for
!  *	  an inheritance child.
   *
   * For paranoia's sake, we match type/collation as well as attribute name.
   */
! static void
  make_inh_translation_list(Relation oldrelation, Relation newrelation,
! 						  Index newvarno,
! 						  List **translated_vars)
  {
  	List	   *vars = NIL;
  	TupleDesc	old_tupdesc = RelationGetDescr(oldrelation);
--- 1636,1649 ----
  /*
   * make_inh_translation_list
   *	  Build the list of translations from parent Vars to child Vars for
!  *	  an inheritance child. Returns NULL if the two relations are no longer in
!  *	  a inheritance relationship.
   *
   * For paranoia's sake, we match type/collation as well as attribute name.
   */
! static List *
  make_inh_translation_list(Relation oldrelation, Relation newrelation,
! 						  Index newvarno)
  {
  	List	   *vars = NIL;
  	TupleDesc	old_tupdesc = RelationGetDescr(oldrelation);
***************
*** 1700,1707 **** make_inh_translation_list(Relation oldrelation, Relation newrelation,
--- 1713,1730 ----
  					break;
  			}
  			if (new_attno >= newnatts)
+ 			{
+ 				/*
+ 				 * It's possible that newrelation is no longer a child of the
+ 				 * oldrelation. We just ingore it for the case.
+ 				 */
+ 				if (!is_descendent_of(oldrelation->rd_id, newrelation->rd_id))
+ 					return NULL;
+ 
+ 				/* If still a child, something's wrong. */
  				elog(ERROR, "could not find inherited attribute \"%s\" of relation \"%s\"",
  					 attname, RelationGetRelationName(newrelation));
+ 			}
  		}
  
  		/* Found it, check type and collation match */
***************
*** 1720,1726 **** make_inh_translation_list(Relation oldrelation, Relation newrelation,
  									 0));
  	}
  
! 	*translated_vars = vars;
  }
  
  /*
--- 1743,1749 ----
  									 0));
  	}
  
! 	return vars;
  }
  
  /*
*** a/src/include/catalog/pg_inherits_fn.h
--- b/src/include/catalog/pg_inherits_fn.h
***************
*** 23,27 **** extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
--- 23,28 ----
  extern bool has_subclass(Oid relationId);
  extern bool has_superclass(Oid relationId);
  extern bool typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId);
+ extern bool is_descendent_of(Oid parentId, Oid childId);
  
  #endif							/* PG_INHERITS_FN_H */
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to