my-ship-it commented on issue #1648:
URL: https://github.com/apache/cloudberry/issues/1648#issuecomment-4167022056

   Hi @adnanhamdussalam,
   
   Thanks for the thorough investigation and detailed logs — they were very 
helpful in pinpointing the issue.
   
   ### What's Happening
   
   You're right, this is a confirmed bug. The problem is that every mirror 
recovery via `gprecoverseg -F` (or `gpaddmirrors`) ends up running 
`pg_basebackup` **twice** per segment:
   
   1. The first attempt copies all the data (~1TB in your case) and reaches 
100%, but then fails at the WAL streaming phase because 
`internal_wal_replication_slot` doesn't exist on the primary.
   2. `pg_basebackup` then removes the entire data directory it just copied.
   3. A second attempt is launched with `--create-slot`, which starts the full 
copy all over again from scratch — and this one succeeds.
   
   So for your 16 segments, you're effectively doing **32 full base backups** 
instead of 16, doubling the time and I/O required.
   
   ### Root Cause
   
   The recovery code in `gpsegrecovery.py` assumes the first attempt without 
`--create-slot` will "fail quickly" if the slot doesn't exist. However, the 
slot check only happens during the WAL streaming phase — **after** the full 
data copy has already completed. This makes the retry extremely expensive for 
large segments.
   
   A `GPDB_12_MERGE_FIXME` comment in the code already acknowledges that 
this needs to be fixed.
   
   ### Workaround
   
   Until we get a proper fix, you can avoid the double-copy by manually 
creating the replication slot on each primary before running `gprecoverseg`. 
Note that you need to connect in **utility mode** since direct connections to 
primary segments are not allowed by default:
   
   ```bash
   PGOPTIONS='-c gp_role=utility' psql -h sky-cbseg03 -p 50002 -d postgres -c \
     "SELECT pg_create_physical_replication_slot('internal_wal_replication_slot');"
   ```
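   
   With 16 segments, running that command by hand for each primary gets
   tedious. The following is a minimal sketch of a loop over all primaries,
   not a tested tool. The helper names `slot_sql` and `create_slots` and the
   `DRY_RUN` variable are hypothetical, and it assumes the hostnames in
   `gp_segment_configuration` are resolvable from the host where you run it.
   The `WHERE NOT EXISTS` guard is there so that re-running it skips
   primaries that already have the slot.
   
   ```bash
   #!/usr/bin/env bash
   # Sketch only: pre-create the replication slot on every primary.
   SLOT_NAME='internal_wal_replication_slot'
   
   # SQL that creates the slot only when it is missing.
   slot_sql() {
     printf "SELECT pg_create_physical_replication_slot('%s') WHERE NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = '%s');" \
       "$SLOT_NAME" "$SLOT_NAME"
   }
   
   # Reads "host port" pairs on stdin, one primary per line.
   # DRY_RUN defaults to 1 (hypothetical switch): commands are only printed.
   create_slots() {
     while read -r host port; do
       [ -n "$host" ] || continue
       if [ "${DRY_RUN:-1}" = "1" ]; then
         printf '%s\n' "PGOPTIONS='-c gp_role=utility' psql -h $host -p $port -d postgres -c \"$(slot_sql)\""
       else
         # </dev/null keeps psql from consuming the list we are reading.
         PGOPTIONS='-c gp_role=utility' psql -h "$host" -p "$port" \
           -d postgres -c "$(slot_sql)" </dev/null
       fi
     done
   }
   
   # Example (run on the coordinator; review the dry-run output first):
   #   psql -d postgres -At -F ' ' -c \
   #     "SELECT hostname, port FROM gp_segment_configuration
   #      WHERE role = 'p' AND content >= 0" | DRY_RUN=0 create_slots
   ```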
   
   To check which primaries already have the slot (this can be run from the 
coordinator):
   
   ```sql
   SELECT gp_segment_id, slot_name
   FROM gp_dist_random('pg_replication_slots')
   WHERE slot_name = 'internal_wal_replication_slot';
   ```
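   
   That query only lists the primaries that have the slot. If you want the
   ones that are still missing it, one option is to compare its output with
   the full list of primary content IDs. A small sketch, where
   `missing_slots` and the two file names are hypothetical:
   
   ```bash
   # Sketch only: print content IDs listed in $1 (all primaries) that do
   # not appear in $2 (primaries that already have the slot).
   missing_slots() {
     awk 'FILENAME == ARGV[1] { have[$1] = 1; next } !($1 in have)' "$2" "$1"
   }
   
   # Example (run on the coordinator):
   #   psql -d postgres -At -c \
   #     "SELECT content FROM gp_segment_configuration
   #      WHERE role = 'p' AND content >= 0" > all_primaries.txt
   #   psql -d postgres -At -c \
   #     "SELECT gp_segment_id FROM gp_dist_random('pg_replication_slots')
   #      WHERE slot_name = 'internal_wal_replication_slot'" > with_slot.txt
   #   missing_slots all_primaries.txt with_slot.txt
   ```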
   
   ### Next Steps
   
   I've created #1654 to track the fix. The plan is to check whether the 
replication slot exists on the primary before running `pg_basebackup`, and 
create it if needed — so the first attempt always succeeds.
   
   Thanks again for reporting this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

