On Fri, Jan 23, 2026 at 7:27 AM Amul Sul <[email protected]> wrote:
> Another option I previously considered was adding the filtration logic
> inside the archive streamer itself. However, since the very first read
> is required to calculate the WAL segment size, the filter check cannot
> be performed immediately. However, we could send a signal to the
> archive streamer via privateInfo (e.g., a read_any_wal or
> skip_wal_check boolean flag) to disable the filtration check until the
> size is calculated. But that approach isn't very elegant; if the first
> WAL page we read belongs to a segment we actually want to skip, we
> would still have to run the filter check and handle the skip/removal
> logic outside of the streamer (i.e., inside init_archive_reader()).
> This would result in performing the same filtration check in two
> different places.

I mean, I don't really buy this logic. If the information added to privateInfo is "here's the LSN before which you can remove stuff," and it starts out initialized to 0/0, then the read of the first WAL page causes no problem at all, because nothing is before 0/0. After it gets updated to some non-zero value, the next call to astreamer_waldump_content() can handle discarding any data we don't need. IMHO, the best argument for keeping things as they are is that in some cases, that decision might result in a bit of delay in discarding old data, but I don't think that really matters. I think all that we care about is the peak memory utilization of an operation, and I don't think that an explicit signaling system should increase that at all.

That said, I'm certainly willing to consider other ideas about how this can work. However, I feel strongly that the logic needs to be not only correct, but clear and well-explained. Setting cur_wal to NULL to make the astreamer skip without adequate comments doesn't meet that standard. Maybe with some better comments it's all right, but frankly I'm a bit skeptical.

Right now, you're using whether or not cur_wal is NULL as a signal to skip data or not skip data. How is that better than passing down the LSN and TLI that you want to read next and letting the astreamer figure out what to do itself? It's a signaling mechanism either way, but it seems a lot easier to figure out whether we always keep the LSN and TLI updated properly than to figure out whether cur_wal is always NULL at exactly the right times.

--
Robert Haas
EDB: http://www.enterprisedb.com
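
For illustration, the LSN-based signaling described above might look roughly like the following. This is only a sketch under stated assumptions, not code from the patch: the struct WalStreamerState, its field discard_before, and the helper segment_is_needed() are hypothetical names, the typedef stands in for PostgreSQL's XLogRecPtr so the fragment compiles on its own, and timeline (TLI) handling is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr so the sketch is self-contained. */
typedef uint64_t XLogRecPtr;

#define InvalidXLogRecPtr ((XLogRecPtr) 0)	/* i.e. 0/0 */

/*
 * Hypothetical state reachable through the streamer's private data.
 * The reader updates discard_before; the streamer only consults it.
 */
typedef struct WalStreamerState
{
	XLogRecPtr	discard_before;	/* WAL ending at or before this is unneeded */
} WalStreamerState;

/*
 * Hypothetical helper: decide whether a segment covering the LSN range
 * [seg_start, seg_end) must be kept.
 *
 * While discard_before is still 0/0, no segment can end at or before it,
 * so the very first read (used to learn the segment size) is always kept
 * without needing a separate "skip the check" flag.
 */
bool
segment_is_needed(const WalStreamerState *state,
				  XLogRecPtr seg_start, XLogRecPtr seg_end)
{
	assert(seg_start < seg_end);

	return seg_end > state->discard_before;
}
```

With this shape, the only invariant to verify is that discard_before is advanced whenever the reader moves on, which is the property the paragraph above argues is easier to reason about than a pointer that must be NULL at exactly the right times.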
