On Sat, Mar 14, 2026 at 12:36 AM Corey Huinker <[email protected]> wrote:
> On Fri, Mar 13, 2026 at 8:17 AM Etsuro Fujita <[email protected]> wrote:
>> On Fri, Mar 13, 2026 at 4:42 AM Corey Huinker <[email protected]> 
>> wrote:
>> >>
>> >> By the errors-to-warnings change, even when those libpq functions
>> >> fail, we fall back to the normal ANALYZE processing, but I don't think
>> >> that that is OK, because if those functions fail for some reasons
>> >> (network-related issues like network disconnection or memory-related
>> >> issues like out of memory), then libpq functions called later for the
>> >> normal processing would also be likely to fail for the same reasons,
>> >> causing the same failure again, which I don't think is good.  To avoid
>> >> that, when libpq functions in fetch_relstats() and fetch_attstats()
>> >> fail, shouldn't we just throw an error, as before?
>>
>> > Right now the code doesn't differentiate on the kind of error that was 
>> > encountered. If it was a network error, then not only do we expect the 
>> > fallback to fail, but simple FDW queries will fail as well. If, 
>> > however, it was a permission issue (no SELECT permission on pg_stats, for 
>> > instance, or pg_stats doesn't exist because the remote database is not a 
>> > regular Postgres, like Redshift or something), then I definitely want to 
>> > fall back. Currently, the code makes no judgement on the details of the 
>> > error; it just trusts that fallback-analyze will either succeed (because 
>> > the error was permissions-related) or quickly encounter the same 
>> > insurmountable error, and one extra PQsendQuery isn't that much overhead.
>>
>> I'm not sure that's really true; if the error is one that is quick to
>> detect, the extra function call probably wouldn't be a problem, but if
>> it is one that takes a long time to detect, the extra call would make
>> the situation worse.
>
> I'm struggling to think of what kind of error would take a long time to 
> detect. Network timeouts could be one, but that situation would affect 
> normal query operations as well, not just ANALYZE. If you have a specific 
> error in mind, please let me know.

I too was thinking of network timeouts (and in such a case I thought
that even if PQsendQuery ran successfully, the following PQgetResult
would have to block for a long time).  And yes, that situation can
occur for normal queries too, but for such a query the extra function
calls are made only when aborting the remote transaction, in a way
that libpq functions don't have to wait for a network timeout.

>> > If you think we should inspect the error we get and match it against a 
>> > list of errors that should skip the retry or skip the fallback, I'd be in 
>> > favor of that. It's probably easier to start with a list of error codes 
>> > that we feel are hopeless and should remain ERRORs, not WARNINGs.
>>
>> That would be an improvement, but I think it's nice-to-have, not
>> must-have.  As there's not much time left until the feature freeze,
>> I'd like to propose making the first cut as simple as possible and
>> leaving that for future work, if this is intended for v19.  If there
>> are no objections from you (or anyone else), I'd like to update the
>> patch as I proposed.
>
> I'd like to get this into 19 as well, so I'm likely to agree to your proposal.

OK, will update the patch.

>> Another thing I noticed is related to this comment in fetch_relstats():
>>
>> +   /*
>> +    * Before v14, a reltuples value of 0 was ambiguous: it could either mean
>> +    * the relation is empty, or it could mean that it hadn't yet been
>> +    * vacuumed or analyzed.  (Newer versions use -1 for the latter case).
>> +    *
>> +    * We can ignore this change, because if the remote table wasn't 
>> analyzed,
>> +    * then it would have no attribute stats, and thus we wouldn't have stats
>> +    * that we would try to import. So we can take the reltuples value as-is.
>> +    */
>>
>> I might have misread the patch, but for v14 or later the patch tries
>> to import remote stats even when we have reltuples = -1.  Is that
>> intentional?  In that case the remote table has never yet been
>> vacuumed/analyzed, so I think we should just fall back to the normal
>> processing.
>
> For cases where we affirmatively have -1, it makes sense to immediately go to 
> the ANALYZE option (if available) and fall back if not. I'll make that change.
>
>>   Also, for v13 or earlier it also tries to import remote
>> stats even when we have reltuples = 0, but in that case the remote
>> table might not have been vacuumed/analyzed, so to avoid possibly
>> useless processing, I think we should just fall back in that case as
>> well.
>
> +1

I forgot to mention this, but when we have reltuples = 0 for v14 or
later, the patch tries to import both relstats and attstats; in that
case I think it would be OK to import only the former, for consistency
with the normal processing.  What do you think about that?

Best regards,
Etsuro Fujita

