Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

Tom Lane Thu, 17 Jan 2019 12:42:25 -0800

Peter Geoghegan <p...@bowt.ie> writes:
> On Tue, Dec 18, 2018 at 2:26 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Hm, that definitely leads me to feel that we've got bug(s) in
>> dependency.c.  I'll take a look sometime soon.


> Thanks!
> I'm greatly relieved that I probably won't have to become an expert on
> dependency.c after all.   :-)

So I poked around for awhile with running the regression tests under
ignore_system_indexes.  There seem to be a number of issues involved
here.  To a significant extent, they aren't bugs, at least not according
to the original conception of the dependency code: it was not a design
goal that different dependencies of the same object-to-be-deleted would
be processed in a fixed order.  That leads to the well-known behavior
that cascade drops report the dropped objects in an unstable order.

Now, perhaps we should make such stability a design goal, as it'd allow
us to get rid of some "suppress the cascade outputs" hacks in the
regression tests.  But it's a bit of a new feature.  If we wanted to
do that, I'd be inclined to do it by absorbing all the pg_depend entries
for a particular object into an ObjectAddress array and then sorting
them before we process them.  The main stumbling block here is "what
would the sort order be?".  The best idea I can come up with offhand
is to sort by OID, which at least for regression test purposes would
mean objects would be listed/processed more or less in creation order.

However, there are a couple of places where the ignore_system_indexes
output does something weird like

 DROP TABLE PKTABLE;
 ERROR:  cannot drop table pktable because other objects depend on it
-DETAIL:  constraint constrname2 on table fktable depends on table pktable
+DETAIL:  constraint constrname2 on table fktable depends on index pktable_pkey
 HINT:  Use DROP ... CASCADE to drop the dependent objects too.

The reason for this is that the reported "nearest dependency" is the
object that we first reached the named object by recursing from.
If there are multiple dependency paths to the same object then it's
order-traversal-dependent which path we take first.  Sorting the
pg_depend entries before processing, as I suggested above, would
remove the instability, but it's not real clear whether we'd get a
desirable choice of reported object that way.  Perhaps we could
improve matters by having the early exits from findDependentObjects
(at stack_address_present_add_flags and object_address_present_add_flags)
consider replacing the already-recorded dependee with the current
source object, according to some set of rules.  One rule that I think
would be useful is to compare dependency types: I think a normal
dependency is more interesting than an auto dependency which is
more interesting than an internal one.  Beyond that, though, I don't
see anything we could do but compare OIDs.  (If we do compare OIDs,
then the result should be stable with or without pre-sorting of the
pg_depend entries.)

I also noticed some places where the output reports a different number
of objects dropped by a cascade.  This happens when a table column
is reached by some dependency path, while the whole table is reached
by another one.  If we find the whole table first, then when we find
the table column we just ignore it.  But in the other order, they're
reported as two separate droppable objects.  The reasoning is
explained in object_address_present_add_flags:

                 * We get here if we find a need to delete a whole table after
                 * having already decided to drop one of its columns.  We
                 * can't report that the whole object is in the array, but we
                 * should mark the subobject with the whole object's flags.
                 *
                 * It might seem attractive to physically delete the column's
                 * array entry, or at least mark it as no longer needing
                 * separate deletion.  But that could lead to, e.g., dropping
                 * the column's datatype before we drop the table, which does
                 * not seem like a good idea.  This is a very rare situation
                 * in practice, so we just take the hit of doing a separate
                 * DROP COLUMN action even though we know we're gonna delete
                 * the table later.

I think this reasoning is sound, and we should continue to do the separate
DROP COLUMN step, but it occurs to me that just because we drop the column
separately doesn't mean we have to report it separately to the user.
I propose that we handle this case by adding a new DEPFLAG_IS_SUBOBJECT
flag to the column object's flags, denoting that we know the whole table
will be dropped later.  The only effect of this flag is to suppress
reporting of the column object in reportDependentObjects.

Another order-dependent effect that can be seen in the regression tests
comes from the fact that the first loop in findDependentObjects (over
the target's referenced objects) errors out immediately on the first
DEPENDENCY_INTERNAL, DEPENDENCY_INTERNAL_AUTO, or DEPENDENCY_EXTENSION
entry it finds.  When this code was written, that was fine because the
only possibility was DEPENDENCY_INTERNAL, and there can be only one
DEPENDENCY_INTERNAL dependency for an object.  The addition of
DEPENDENCY_EXTENSION makes that shaky: if you have a couple of
internally-related objects inside an extension, and you try to drop the
dependent one, in theory you might get told either to drop the extension
or to drop the parent object.  However, I believe we generally avoid
making DEPENDENCY_EXTENSION entries for internally-dependent objects
(they usually don't have their own owner dependencies either) so I'm not
sure the case arises in practice, or needs to arise in practice.  It'd
be reasonable to expect that any given object has at most one INTERNAL+
EXTENSION dependency.  (Whether it's reasonable to try to enforce that
is hard to say, though.)

DEPENDENCY_INTERNAL_AUTO, however, broke this completely, as the code
has no hesitation about making multiple entries of that kind.   After
rather cursorily looking at that code, I'm leaning to the position
that DEPENDENCY_INTERNAL_AUTO is broken-by-design and needs to be
nuked from orbit.  In the cases where it's being used, such as
partitioned indexes, I think that probably the right design is one
DEPENDENCY_INTERNAL dependency on the partition master index, and
then one DEPENDENCY_AUTO dependency on the matching partitioned table.
It does not make sense to claim that an object is "part of the
implementation of" more than one thing.

If we can't do that, then ensuring stability of the report would again
require sorting of the referenced objects before we examine them.

I've not attempted to write patches for any of these ideas yet;
I thought I'd throw them out for comments first.

                        regards, tom lane

Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

Reply via email to