RE: BUG #18016: REINDEX TABLE failure

2023-07-11 Thread Richard Veselý
Hi Gurjeet,

Thank you for the follow-up. I was worried my message got buried in the middle 
of the thread. I also appreciate your work on the patch to fix/improve the 
REINDEX TABLE behavior even though most people would never encounter it in the 
wild.

As a preface I would first like to say that I can appreciate the emphasis on 
general maintainability of the codebase, trying to avoid having some overly 
clever hacks that might impede understanding, having ideally one way of doing 
things like having a common database page structure, etc. The more one keeps to 
this "happy" path the better the general state of the project end up by keeping 
it accessible to the rest of the community and attracting more contributions in 
turn.

That being said, PostgreSQL can be extremely conservative in scenarios where it 
might not be warranted while giving a limited opportunity to influence said 
behavior. This often leads to a very low hardware resource utilization. You can 
easily find many instances across StackOverflow, dba.stackexchange.com, 
/r/postgres and pgsql-performance where people run into ingress/egress 
bottlenecks even though their hardware can trivially support much larger 
workload.

In my experience, you can be very hard-pressed in many cases to saturate even a 
modest enterprise HDD while observing the official guidelines 
(https://www.postgresql.org/docs/current/populate.html), e.g. minimal WAL and 
host of other configuration optimizations, having no indexes and constraints 
and creating table and filling it with binary COPY within the same transaction. 
And the situation with pg_dump/pg_restore is often much worse.

Is there an interest in improving the current state of affairs? I will be 
rewriting the indexing first to get the whole picture, but I can already tell 
you that there is a -lot- of performance left on the table even considering the 
effort to improve COPY performance in PostgreSQL 16. Given sufficient hardware, 
you should always be heavily IO-bound without exception and saturate any 
reasonable number of NVMe SSDs.

Best regards,
Richard

-Original Message-
From: Gurjeet Singh  
Sent: Monday, July 10, 2023 6:44 PM
To: Richard Veselý ; Postgres Hackers 

Cc: Tom Lane ; Michael Paquier 
Subject: Re: BUG #18016: REINDEX TABLE failure

On Sun, Jul 9, 2023 at 7:21 AM Richard Veselý  wrote:
>
> ... there's no shortage of people that suffer from sluggish 
> pg_dump/pg_restore cycle and I imagine there are any number of people that 
> would be interested in improving bulk ingestion which is often a bottleneck 
> for analytical workloads as you are well aware. What's the best place to 
> discuss this topic further - pgsql-performance or someplace else?

(moved conversation to -hackers, and moved -bugs to BCC)

> I was dissatisfied with storage layer performance, especially during 
> the initial database population, so I rewrote it for my use case. I'm 
> done with the heap, but for the moment I still rely on PostgreSQL to 
> build indexes,

It sounds like you've developed a method to speed up loading of tables, and 
might have ideas/suggestions for speeding up CREATE INDEX/REINDEX. The -hackers 
list feels like a place to discuss such changes.

Best regards,
Gurjeet
http://Gurje.et


Re: BUG #18016: REINDEX TABLE failure

2023-07-11 Thread Gurjeet Singh
On Sun, Jul 9, 2023 at 7:18 AM Tom Lane  wrote:
>
> Michael Paquier  writes:
> > That should be OK, I assume.  However, if this is improved and
> > something we want to support in the long-run I guess that a TAP test
> > may be appropriate.
>
> I do not see the point of a TAP test.  It's not like the code isn't
> covered perfectly well.

Please find attached the patch that makes REINDEX TABLE perform
reindex on toast table before reindexing the main table's indexes.

The code block movement involved slightly more thought and care than I
had previously imagined. As explained in comments in the patch, the
enumeration and suppression of indexes on the main table must happen
before any CommandCounterIncrement() call, hence the
reindex-the-toast-table-if-any code had to be placed after that
enumeration.

In support of the argument above, the patch does not include any TAP
tests. Reliably reproducing the original error message involves
restarting the database, and since that can't be done via SQL
commands, no sql tests are included, either.

The patch also includes minor wordsmithing, and benign whitespace
changes in neighboring code.

Best regards,
Gurjeet
http://Gurje.et


v1-0001-Reindex-toast-table-s-index-before-main-table-s-i.patch
Description: Binary data


Re: BUG #18016: REINDEX TABLE failure

2023-07-10 Thread Gurjeet Singh
On Sun, Jul 9, 2023 at 7:21 AM Richard Veselý  wrote:
>
> ... there's no shortage of people that suffer from sluggish 
> pg_dump/pg_restore cycle and I imagine there are any number of people that 
> would be interested in improving bulk ingestion which is often a bottleneck 
> for analytical workloads as you are well aware. What's the best place to 
> discuss this topic further - pgsql-performance or someplace else?

(moved conversation to -hackers, and moved -bugs to BCC)

> I was dissatisfied with storage layer performance, especially during the 
> initial database population, so I rewrote it for my use case. I'm done with 
> the heap, but for the moment I still rely on PostgreSQL to build indexes,

It sounds like you've developed a method to speed up loading of
tables, and might have ideas/suggestions for speeding up CREATE
INDEX/REINDEX. The -hackers list feels like a place to discuss such
changes.

Best regards,
Gurjeet
http://Gurje.et