my-ship-it commented on issue #1648:
URL: https://github.com/apache/cloudberry/issues/1648#issuecomment-4167022056
Hi @adnanhamdussalam,
Thanks for the thorough investigation and detailed logs — they were very
helpful in pinpointing the issue.
### What's Happening
You're right: this is a confirmed bug. The problem is that every mirror
recovery via `gprecoverseg -F` (or `gpaddmirrors`) ends up running
`pg_basebackup` **twice** per segment:
1. The first attempt copies all the data (~1TB in your case) and reaches
100%, but then fails at the WAL streaming phase because
`internal_wal_replication_slot` doesn't exist on the primary.
2. `pg_basebackup` then removes the entire data directory it just copied.
3. A second attempt is launched with `--create-slot`, which starts the full
copy all over again from scratch — and this one succeeds.
So for your 16 segments, you're effectively doing **32 full base backups**
instead of 16, doubling the time and I/O required.
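The cost of this retry pattern can be illustrated with a small simulation. All function names here are illustrative stand-ins for the real logic in `gpsegrecovery.py`; the point is only that the missing-slot failure surfaces *after* the full copy:

```python
# Simulation of the current recovery flow: the first pg_basebackup run
# discovers the missing slot only after the full data copy, so every
# segment pays for two complete copies. All names are illustrative.

def basebackup(slot_exists, create_slot, copies):
    copies.append(1)                 # one full (~1TB) data copy happens first
    if not slot_exists and not create_slot:
        return False                 # WAL streaming fails; datadir is removed
    return True

def recover_segment(slot_exists, copies):
    # Current behavior: optimistic attempt, then retry with --create-slot.
    if not basebackup(slot_exists, create_slot=False, copies=copies):
        basebackup(slot_exists, create_slot=True, copies=copies)

copies = []
for _ in range(16):                  # 16 segments, none with the slot
    recover_segment(slot_exists=False, copies=copies)
print(len(copies))                   # 32 full copies instead of 16
```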
### Root Cause
The recovery code in `gpsegrecovery.py` assumes the first attempt without
`--create-slot` will "fail quickly" if the slot doesn't exist. However, the
slot check only happens during the WAL streaming phase — **after** the full
data copy has already completed. This makes the retry extremely expensive for
large segments.
There is actually a `GPDB_12_MERGE_FIXME` comment in the code acknowledging
this needs to be fixed.
### Workaround
Until we get a proper fix, you can avoid the double-copy by manually
creating the replication slot on each primary before running `gprecoverseg`.
Note that you need to connect in **utility mode** since direct connections to
primary segments are not allowed by default:
```bash
PGOPTIONS='-c gp_role=utility' psql -h sky-cbseg03 -p 50002 -d postgres -c \
  "SELECT pg_create_physical_replication_slot('internal_wal_replication_slot');"
```
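With many segments, it may be easier to generate the per-primary commands from the coordinator's `gp_segment_configuration` catalog. A small sketch (the host/port pairs below are placeholders; substitute the rows returned by the catalog query in the comment):

```python
# Build the utility-mode psql command for each primary segment.
# The (hostname, port) pairs would come from the coordinator:
#   SELECT hostname, port FROM gp_segment_configuration
#   WHERE role = 'p' AND content >= 0;
# The values below are placeholders for illustration.
primaries = [("sky-cbseg03", 50002), ("sky-cbseg04", 50003)]

SQL = "SELECT pg_create_physical_replication_slot('internal_wal_replication_slot');"

for host, port in primaries:
    cmd = (
        f"PGOPTIONS='-c gp_role=utility' "
        f'psql -h {host} -p {port} -d postgres -c "{SQL}"'
    )
    print(cmd)
```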
To check which primaries already have the slot (this can be run from the
coordinator):
```sql
SELECT gp_segment_id, slot_name
FROM gp_dist_random('pg_replication_slots')
WHERE slot_name = 'internal_wal_replication_slot';
```
### Next Steps
I've created #1654 to track the fix. The plan is to check whether the
replication slot exists on the primary before running `pg_basebackup`, and
create it if needed — so the first attempt always succeeds.
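Continuing the simulation above, a sketch of that plan (again with hypothetical helper names, not the actual `gpsegrecovery.py` API):

```python
# Proposed flow: probe the primary for the slot first (a cheap
# pg_replication_slots lookup), then run pg_basebackup exactly once,
# passing --create-slot only when the slot is actually missing.
# All names are illustrative.

def basebackup(slot_exists, create_slot, copies):
    copies.append(1)                    # one full data copy
    # Streaming succeeds if the slot exists or we just created it.
    return slot_exists or create_slot

def recover_segment_fixed(slot_exists, copies):
    need_slot = not slot_exists         # result of the pre-flight slot check
    ok = basebackup(slot_exists, create_slot=need_slot, copies=copies)
    assert ok                           # first attempt always succeeds

copies = []
for _ in range(16):
    recover_segment_fixed(slot_exists=False, copies=copies)
print(len(copies))                      # 16 copies, one per segment
```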
Thanks again for reporting this!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]