Hello, Matthias and others!
Realized the new horizon was applied only during the validation phase (once
the index is marked as ready).
Now it is applied while the index is not yet marked as valid.
Updated version attached.
--
> I think the best way for this to work
Hi again!
I made an error in the `GlobalVisHorizonKindForRel` logic, and it was caught
by a new test.
Fixed version attached.
>
From 9a8ea366f6d2d144979e825c4ac0bdd2937bf7c1 Mon Sep 17 00:00:00 2001
From: nkey
Date: Tue, 7 May 2024 22:10:56 +0200
Subject: [PATCH v3] WIP: fix d9d076222f5b "VACUUM:
Hello, Matthias and others!
Updated WIP attached.
Changes are:
* Renaming - it reads better to me now
* A more reliable approach in `GlobalVisHorizonKindForRel`: call
`RelationGetIndexList` if required, to make sure we have not missed
`rd_safeindexconcurrentlybuilding`
* Optimization to avoid
Hello, Matthias!
I just realized there is a much simpler and safer way to deal with the
problem.
So, d9d076222f5b94a85e0e318339cfc44b8f26022d(1) had a bug because the scan
was not protected by a snapshot. At the same time, we want this snapshot to
affect not all relations, but only a subset
Hello, Matthias!
> We can just release the current snapshot, and get a new one, right? I
> mean, we don't actually use the transaction for much else than
> visibility during the first scan, and I don't think there is a need
> for an actual transaction ID until we're ready to mark the index entry
Hello!
> I'm not a fan of this approach. Changing visibility and cleanup
> semantics to only benefit R/CIC sounds like a pain to work with in
> essentially all visibility-related code. I'd much rather have to deal
> with another index AM, even if it takes more time: the changes in
> semantics
Hi!
> How do you suppose this would work differently from a long-lived
> normal snapshot, which is how it works right now?
The difference is the ability to take a new visibility snapshot
periodically during the second phase, while rechecking the visibility of
each tuple against the "reference" snapshot taken at its start
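For illustration only, the recheck can be modeled like this (a toy Python
sketch, not the actual C implementation; a "snapshot" here is just an
(xmin, xmax) pair, and `recheck_second_phase` is an invented name):

```python
# Toy model: a "snapshot" is an (xmin, xmax) pair; a tuple created by
# transaction `xid` is treated as visible if it committed before the
# snapshot started. Names are illustrative, not PostgreSQL internals.

def visible(xid, snapshot):
    xmin, _xmax = snapshot
    return xid < xmin

def recheck_second_phase(xid, reference_snapshot, fresh_snapshot):
    # The fresh snapshot is re-taken periodically so the horizon can
    # advance, but a tuple is only processed if the reference snapshot
    # (taken at the start of the phase) would also have seen it.
    return visible(xid, fresh_snapshot) and visible(xid, reference_snapshot)

reference = (100, 110)  # taken once at the start of the second scan
fresh = (150, 160)      # periodically refreshed during the scan

print(recheck_second_phase(90, reference, fresh))   # True: both see it
print(recheck_second_phase(120, reference, fresh))  # False: too new for reference
```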
One more idea: simply forbid HOT pruning while the second phase is
running. It cannot happen currently anyway because of the snapshot being held.
Possible enhancements:
* we may apply the restriction only to particular tables
* we may apply the restriction only to a subset of the tables (those not yet
scanned by R/CICs).
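The second enhancement could be sketched roughly as follows (a toy Python
model; `horizon_for_table` and the table names are invented for
illustration): tables not yet scanned by a running R/CIC keep the
conservative horizon, while everything else can be pruned normally.

```python
# Toy model: pick a pruning horizon per table. Tables still waiting to
# be scanned by a concurrent (re)index build get the older, conservative
# horizon; all other tables use the fresh one.
# All names are illustrative, not PostgreSQL internals.

def horizon_for_table(table, not_yet_scanned, old_horizon, fresh_horizon):
    return old_horizon if table in not_yet_scanned else fresh_horizon

not_yet_scanned = {"t2", "t3"}  # R/CIC has not reached these yet
print(horizon_for_table("t1", not_yet_scanned, 100, 200))  # 200
print(horizon_for_table("t2", not_yet_scanned, 100, 200))  # 100
```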
Hello!
> I think the best way for this to work would be an index method that
> exclusively stores TIDs, and of which we can quickly determine new
> tuples, too. I was thinking about something like GIN's format, but
> using (generation number, tid) instead of ([colno, colvalue], tid) as
> key data
> > > I just realised there is one issue with this design: We can't cheaply
> > > reset the snapshot during the second table scan:
> > > It is critically important that the second scan of R/CIC uses an index
> > > contents summary (made with index_bulk_delete) that was created while
> > > the
Hello, Melanie!
Sorry to interrupt you, just a quick question.
> Correct, but there are changes being discussed where we would freeze
> tuples during pruning as well [0], which would invalidate that
> implementation detail. And, if I had to choose between improved
> opportunistic freezing and
Hello!
> Correct, but there are changes being discussed where we would freeze
> tuples during pruning as well [0], which would invalidate that
> implementation detail. And, if I had to choose between improved
> opportunistic freezing and improved R/CIC, I'd probably choose
> improved freezing
Hello!
It seems like the "old" snapshot idea is still a valid one.
> Should this deal with any potential XID wraparound, too?
As far as I understand, we are not affected by this in any way in our case.
Vacuum on our table is not possible because of the lock held, so nothing
can be frozen (see
Hello.
I realized my last idea is invalid (because tuples are frozen using a
dynamically calculated horizon) - so, don't waste your time on it :)
Need to think a little bit more here.
Thanks,
Mikhail.
> Yes, good catch.
> Assuming we have somehow prevented vac_truncate_clog from occurring
> during CIC, we can leave frozen and potentially frozen
> (xmin
Hello!
> How would this deal with tuples not visible to the old snapshot?
> Presumably we can assume they're newer than that snapshot (the old
> snapshot didn't have it, but the new one does, so it's committed after
> the old snapshot, making them newer), so that backend must have
> inserted it
Hello!
> Also, feels like we may apply this to both phases (first and the second
> scans).
> The original patch (1) was helping only to the second one (after call
> to set_indexsafe_procflags).
Oops, I was wrong here. The original version of the patch also applied to
both phases.
> Note
Hello!
> I've thought about alternative solutions, too: how about getting a new
> snapshot every so often?
> We don't really care about the liveness of the already-scanned data; the
> snapshots used for RIC
> are used only during the scan. C/RIC's relation's lock level means vacuum
> can't run
Hello, hackers!
I am thinking about revisiting (1) ({CREATE INDEX, REINDEX} CONCURRENTLY
improvements) in some lighter form.
Yes, a serious bug (2) was caused by this optimization, and it has now been
reverted. But what about a safer idea in the same direction:
1) add a new horizon which ignores PROC_IN_SAFE_IC
Thanks everyone for help!
Hello, good question!
Thanks for your edits.
As an answer: we probably need to change
"If we know that we're holding ProcArrayLock exclusively, we don't
need the read barrier."
to
"If we're removing xid, we don't need the read barrier because only
the startup process can remove and add xids to
Hello!
An updated version (with read barriers) is attached.
> One remaining question I have is whether it is okay if we see an updated value
> for one of the head/tail variables but not the other. It looks like the
> tail variable is only updated with ProcArrayLock held exclusively, which
> IIUC
Hello, Nathan.
> What sort of benefits do you see from this patch? It might be worthwhile
> in itself to remove spinlocks when possible, but IME it's much easier to
> justify such changes when there is a tangible benefit we can point to.
Oh, it is not an easy question :)
The answer, probably,
%40mail.gmail.com#cc4827dee902978f93278732435e8521
From d968645551412abdb3fac6b8514c3d6746e8ac3d Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Sat, 18 Mar 2023 21:28:00 +0300
Subject: [PATCH v2] Removes known_assigned_xids_lck, using the write memory
barrier as suggested earlier.
---
src
Hello, Andres.
> Not sure I like TransactionIdRetreatSafely() as a name. Maybe
> TransactionIdRetreatClamped() is better?
I think it is better to simply replace TransactionIdRetreatedBy.
It is no longer used after
`v3-0001-WIP-Fix-corruption-due-to-vacuum_defer_cleanup_ag.patch` -
so, it is
Hello.
I have registered it as a patch in the commitfest:
https://commitfest.postgresql.org/42/4138/
Best regards,
Michail.
Hello.
A few things I have found so far:
1) Ordering by random() is not required to reproduce the issue - it
can be done using queries like:
BEGIN;
SELECT omg.*
FROM something_is_wrong_here AS omg
ORDER BY value -- change is here
LIMIT 1
FOR
Hello!
Thanks for checking the issue!
> FWIW, I find that the "Assert(ItemIdIsNormal(lp));" at the top of
> heap_lock_tuple() is the first thing that fails on my assert-enabled
> build.
Yes, the same for me:
TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c",
Line: 4292, PID:
Hello, Andres.
I apologize for the direct ping, but I think your snapshot scalability
work in PG14 could be related to the issue.
The TransactionIdRetreatedBy implementation looks correct... But with
txid_current=212195 I see errors like "could not access status of
transaction 58643736"...
So,
Hello, hackers.
It seems like PG 14 works incorrectly with vacuum_defer_cleanup_age
(or rows are just not cleaned up - I am not sure) with SELECT FOR UPDATE + UPDATE.
I am not certain, but hot_standby_feedback is probably able to cause the
same issues.
Steps to reproduce:
1) Start Postgres like this:
> Does that by any chance mean you are using a non-community version of
> Postgres which has some other changes?
It is a managed Postgres service in a public cloud. Usually, such
providers apply some custom minor patches.
The only one I know about concerns forbidding the cancellation of queries
while
Hello, Amit.
> The point which is not completely clear from your description is the
> timing of missing records. In one of your previous emails, you seem to
> have indicated that the data missed from Table B is from the time when
> the initial sync for Table B was in-progress, right? Also, from
Hello.
> None of these entries are from the point mentioned by you [1]
> yesterday where you didn't find the corresponding data in the
> subscriber. How did you identify that the entries corresponding to
> that timing were missing?
Some of them are from before the interval, some after... But the source
Hello, Amit!
> IUC, this is the time when only table B's initial sync was
> in-progress. Table A's initial sync was finished by that time and for
> Table C, it is yet not started.
Yes, that is correct. C had started too, but unsuccessfully (it was restarted
later, see below).
> During the time of
> the
Hello again.
Just a small fix for:
> 2022-12-14 09:21:25.705 to
> 2022-12-14 09:49:20.664 (after synchronization start, but before finish).
Correct values are:
2022-12-14 09:49:31.340
2022-12-14 09:49:41.683
So, it looks like we lost about 10 seconds of one table's WAL.
Hello.
Just a short story about a small data loss with logical replication.
We were logically replicating a 4 TB database from
> PostgreSQL 12.12 (Ubuntu 12.12-201-yandex.49163.d86383ed5b) on
> x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0,
> 64-bit
to
> PostgreSQL
Hello, Tom.
> Since we're running out of time in the current commitfest,
> I went ahead and changed that, and made the cosmetic fixes
> I wanted, and pushed.
Great, thanks!
The one small thing I was considering adding to KnownAssignedXidsCompress is
an assertion like
Assert(MyBackendType ==
[2]:
https://www.postgresql.org/message-id/flat/CANtu0oiPoSdQsjRd6Red5WMHi1E83d2%2B-bM9J6dtWR3c5Tap9g%40mail.gmail.com#cc4827dee902978f93278732435e8521
--
Michail Nikolaev
From 926403598e9506edfa32c9534be9a4e48f756002 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Wed, 23 No
Oh, it seems like it is not my day :) The image is fixed again.
Oops, wrong image - this is the correct one. But these are single-run
tests, so it shows only a basic correlation,
t-only for pgbench).
If such an approach looks committable, I could do more careful
performance testing to find the best value for
WASTED_SNAPSHOT_WORK_LIMIT_TO_COMPRESS.
[1]: https://gist.github.com/michail-nikolaev/e1dfc70bdd7cfd1b902523dbb3db2f28
--
Michail Nikolaev
Index: src/backend/storage/ip
xcept non-startup
processes (approach with offsets to skip gaps while building snapshot).
[1]: https://gist.github.com/michail-nikolaev/e1dfc70bdd7cfd1b902523dbb3db2f28
[2]:
https://www.postgresql.org/message-id/flat/CANtu0ogzo4MsR7My9%2BNhu3to5%3Dy7G9zSzUbxfWYOn9W5FfHjTA%40mail.gmail.com#341a3c3b033f69b26
Hello.
Thanks to everyone for the review.
> It seems to me storing the index itself is simpler and maybe faster by
> the cycles to perform addition.
Yes, the first version used 1 byte per offset, with a maximum value of 255.
Agreed - it looks like there is no point in storing offsets now.
> A simple patch
Hello, Andrey.
> I've looked into v5.
Thanks!
The patch is updated according to your remarks.
Best regards,
Michail.
From 1301a262dea7f541c11092851e82f10932150ee3 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Tue, 19 Jul 2022 23:50:53 +0300
Subject: [PATCH v6] Curren
Hello, Simon.
Sorry for contacting you directly, but you know the subject better than
anyone else. This is related to your work from 2010 - replacing
KnownAssignedXidsHash with the KnownAssignedXids array.
I have added an additional optimization to the data structure you
implemented. Initially, it was
Hello, hackers.
While working on (1), I noticed in commit
2871b4618af1acc85665eec0912c48f8341504c4 (2) from 2010 that Simon
Riggs was already thinking about using a memory barrier instead of
spinlocks for KnownAssignedXids access.
> We could dispense with the spinlock if we were to
> create suitable memory
Hello, Tom.
>> This is a pretty broad claim to make on the basis of one undocumented
>> test case on one unmentioned platform.
> I'll try to use pg_stat_kcache to check the difference between Wall
> and CPU for my case.
In my case, I see a pretty high correlation between pg_stat_kcache and
Hello, Tom.
> This is a pretty broad claim to make on the basis of one undocumented
> test case on one unmentioned platform.
I'll try to use pg_stat_kcache to check the difference between Wall
and CPU for my case.
> On what grounds do you claim getrusage will be better? One thing we
> can be
Hello, Thomas.
> This might be interesting:
> https://github.com/powa-team/pg_stat_kcache
Oh, nice - it looks like it could help me reduce CPU usage and test my
assumption (using exec_user_time and exec_system_time).
BTW, do you know why the extension is not in standard contrib (it looks mature)?
Best
Hello, hackers.
Today I was doing some aggregation over pg_stat_statements in order to
find the types of queries consuming most of the CPU. The aggregates were
made over two pg_stat_statements snapshots taken 30 seconds apart.
The sum(total_time) had the biggest value for a very frequent query
with about 10ms
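For illustration, that aggregation boils down to a per-query delta between
the two snapshots (a Python sketch with made-up numbers; the real snapshots
would be rows keyed by queryid, not this tiny dict):

```python
# Two pg_stat_statements snapshots taken ~30 s apart, reduced to
# queryid -> total_time. The numbers are made up for illustration.
snap_t0 = {"q_frequent": 1000.0, "q_rare": 500.0}
snap_t1 = {"q_frequent": 1900.0, "q_rare": 510.0}

# Per-query time spent during the interval.
delta = {q: snap_t1[q] - snap_t0.get(q, 0.0) for q in snap_t1}

top_query = max(delta, key=delta.get)
print(top_query, delta[top_query])  # q_frequent 900.0
```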
Hello, David.
Thanks for your review!
> As a specific recommendation here - submit patches with a complete commit
> message.
> Tweak it for each new version so that any prior discussion that informed the
> general design of
> the patch is reflected in the commit message.
Yes, agreed. Applied
Hello, Peter.
> The simple answer is: I don't know. I could probably come up with a
> better answer than that, but it would take real effort, and time.
I remember you had an idea about using the LP_REDIRECT bit in btree
indexes as some kind of “recently dead” flag (1).
Is this idea still in
Hello.
Just an updated commit message.
Thanks,
Michail.
From 934d649a18c66f8b448463e57375865b33ce53e7 Mon Sep 17 00:00:00 2001
From: nkey
Date: Fri, 1 Apr 2022 02:17:55 +0300
Subject: [PATCH v5] Optimize KnownAssignedXidsGetAndSetXmin by maintaining
offset to next valid xid.
MIME-Version: 1.0
UPD:
> I was thinking it is safe to have additional hint bits
> on primary, but it seems like no.
Oh, sorry for the mistake - it is about the standby, of course.
> BTW I am wondering if it is possible
> to achieve the same situation by pg_rewind and standby promotion…
It looks like it is impossible,
Hello, Peter.
Thanks for your review!
> I doubt that the patch's use of pg_memory_barrier() in places like
> _bt_killitems() is correct. There is no way to know for sure if this
> novel new lockless algorithm is correct or not, since it isn't
> explained anywhere.
The memory barrier is used
Hello, Greg.
> I'm seeing a recovery test failure. Not sure if this represents an
> actual bug or just a test that needs to be adjusted for the new
> behaviour.
Thanks for letting me know. It is a failure of a test added by the patch.
It is a little hard to make it stable (because it depends on
Hello, Peter.
>> * Add code to _bt_killitems() that detects if it has generated an FPI,
>> just to set some LP_DEAD bits.
>> * Instead of avoiding the FPI when this happens, proactively call
>> _bt_simpledel_pass() just before _bt_killitems() returns. Accept the
>> immediate cost of setting an
ci.com/build/5599876384817152), so I am moving it back to
"Ready for Committer".
Best regards,
Michail.
From 9ecb33a54971cfa1c766ed9d129c6abb44e39f98 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Sat, 15 Jan 2022 16:21:51 +0300
Subject: [PATCH v10 1/3] code
---
src/backend/access/
Hello, Peter.
Thanks for your comments.
> There is one FPI per checkpoint for any leaf page that is modified
> during that checkpoint. The difference between having that happen once
> or twice per leaf page and having that happen many more times per leaf
> page could be very large.
Yes, I am
Hello, Peter.
> * Instead of avoiding the FPI when this happens, proactively call
> _bt_simpledel_pass() just before _bt_killitems() returns. Accept the
> immediate cost of setting an LP_DEAD bit, just like today, but avoid
> repeated FPIs.
Hm, not sure here.
AFAIK the current implementation does not
Hello, Andrey.
Thanks for your efforts.
> Patch on barrier seems too complicated to me right now. I’d propose to focus
> on KnowAssignedXidsNext patch: it’s clean, simple and effective.
I'll extract it into a separate patch later.
> I’ve rebased the patch so that it does not depend on
Hello, Yura.
One additional point:
> 1332: Assert((oldFlags & (BM_PIN_COUNT_WAITER | BM_IO_IN_PROGRESS)) == 0);
> 1333: CLEAR_BUFFERTAG(buf->tag);
> 1334: buf_state &= ~(BUF_FLAG_MASK | BUF_USAGECOUNT_MASK);
> 1335: UnlockBufHdr(buf, buf_state);
I think there is no point in unlocking the buffer
Hello, Yura.
Test results look promising. But it seems like the naming and the dynahash
API change are a little confusing.
1) I think it is better to split the main part and the atomic nentries
optimization into separate commits.
2) Also, it would be nice to fix the hash_update_hash_key bug :)
3) Do we
Hello, Julien.
> I rebased the pathset in attached v9. Please double check that I didn't miss
> anything in the rebase.
Thanks a lot for your help.
> I will let you mark the patch as Ready for Committer once you validate that
> the
> rebase was ok.
Yes, rebase looks good.
Best regards,
back to "Ready for Committer" once it passes tests.
Best regards,
Michail.
From a46315fd96b5432241ab6c67c37493ef41d7dc73 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Sun, 23 Jan 2022 20:47:51 +0300
Subject: [PATCH v8 2/3] test
---
src/test/recovery/Makefile
> I'm switching this patch on Waiting on Author.
I have tested it multiple times in my GitHub repo; it seems to be stable now.
Switching back to "Ready for Committer".
Best regards.
Michail.
From 9372bac9b56d27cf993e9d1fa66127c86b51f25c Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Sat, 15 Ja
Hello.
It could also be related -
https://www.postgresql.org/message-id/flat/20220112112425.pgzymqcgdy62e7m3%40jrouhaud#097b54a539ac3091ca4e4ed8ce9ab89c
(both Windows and Linux cases).
Best regards,
Michail.
Hello.
It looks like logical replication is also affected:
[08:26:35.599] # poll_query_until timed out executing this query:
[08:26:35.599] # SELECT count(1) = 0 FROM pg_subscription_rel WHERE
srsubstate NOT IN ('r', 's');
[08:26:35.599] # expecting this output:
[08:26:35.599] # t
[08:26:35.599] #
Hello, Andrey.
> Write barrier must be issued after write, not before.
> Don't we need to issue read barrier too?
The write barrier is issued after the changes to the KnownAssignedXidsNext
and KnownAssignedXidsValid arrays and before the update of
headKnownAssignedXids.
So, it seems to be correct.
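That ordering is the classic single-writer publish pattern, which can be
illustrated with a toy model (Python here, so the barriers are not actually
needed; the comments only mark where pg_write_barrier/pg_read_barrier sit
in the C code, and all names are invented):

```python
# Toy single-writer ring: the writer fills the entry arrays first and
# only then advances head; a reader that loads head before reading the
# arrays therefore never observes a partially written entry.

class KnownAssignedToy:
    def __init__(self, size):
        self.xids = [0] * size
        self.valid = [False] * size
        self.head = 0

    def add(self, xid):
        i = self.head
        self.xids[i] = xid        # 1. write the entry
        self.valid[i] = True
        # --- pg_write_barrier() would go here in C ---
        self.head = i + 1         # 2. publish it

    def snapshot(self):
        h = self.head             # read head first
        # --- pg_read_barrier() would go here in C ---
        return [self.xids[i] for i in range(h) if self.valid[i]]

ka = KnownAssignedToy(4)
ka.add(101)
ka.add(105)
print(ka.snapshot())  # [101, 105]
```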
e
and some additional investigation had been done. So, I think I’ll
re-add the patch to the commitfest app.
Thanks,
Michail
From 94cd2fbf37b5f0b824e0f9a9bc23f762a8bb19b5 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Sun, 21 Nov 2021 21:23:02 +0300
Subject: [PATCH v3 1/2] memory barrier instead
ot lean on the compiler here
because of `volatile` args.
Also, I have added some comments.
Best regards,
Michail.
From 1d55c6fae8cc160eadd705da0d70d9e7fb5bc00f Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Wed, 10 Nov 2021 00:02:18 +0300
Subject: [PATCH v2] known assignment xid next
s feature in the commitfest app works
in a different way :)
Best regards,
Michail.
From 02b0dd27944c37007d8a92905a14e6b3e8e50fa8 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Tue, 9 Nov 2021 21:43:58 +0300
Subject: [PATCH v6 2/3] test
---
src/test/recovery/Makefile | 1 +
.../r
Woo-hoo :)
> Attached is a proposal for a minor addition that would make sense to me, add
> it if you think it's appropriate.
Yes, I'll add it to the patch.
> I think I've said enough, changing the status to "ready for committer" :-)
Thanks a lot for your help and attention!
Best regards,
I have changed the approach, so it is better to start from this email:
https://www.postgresql.org/message-id/flat/CANtu0ohHu1r1xQfTzEJuxeaOMYncG7xRxUQWdH%3DcMXZSf%2Bnzvg%40mail.gmail.com#4c81a4d623d8152f5e8889e97e750eec
be” LSN-related logic to the test.
Thanks a lot,
Michail.
From f8a87a2329e81b55b484547dd50edfd97a722ad2 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Fri, 5 Nov 2021 19:28:12 +0300
Subject: [PATCH v5 3/3] doc
---
src/backend/access/nbtree/README | 35 ++--
sr
Hello, Andres.
Could you please clarify how best to deal with the situation?
According to your previous email, I think there was some
misunderstanding regarding the latest patch version (but I am not
sure), because, as far as I understand, the provided optimization (lazily
calculated optional offset
nt bits itself.
It is tested on a different standby, see
> is(hints_num($node_standby_2), qq(10), 'index hint bits already
set on second standby 2');
Also, I added checks for BTP_LP_SAFE_ON_STANDBY to make sure
everything in the test goes according to the scenario.
Thanks a lot,
Michail.
From cfb45d1a9cbf30be6098b2
Hello, Andres.
Thanks for the feedback again.
> The problem is that we don't want to add a lot of work to
> KnownAssignedXidsAdd/Remove, because very often nobody will build a snapshot
> for that moment and building a sorted, gap-free, linear array of xids isn't
> cheap. In my experience it's more
Hello, Andres.
Thanks for your feedback.
>> Maybe use a hashtable of running transactions? It will be slightly faster
>> when adding\removing single transactions. But much worse when doing
>> KnownAssignedXidsRemove().
> Why would it be worse for KnownAssignedXidsRemove()? Were you intending to
Hello.
> I have tried such an approach but looks like it is not effective,
> probably because of CPU caching issues.
That was a mistake on my part. I have retried the approach and got good
results with a small and non-invasive patch.
The main idea is a simple optimistic optimization - store the offset to
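The message breaks off here, but the optimization (discussed later in the
thread as an offset to the next valid xid) can be sketched as follows (a
toy Python model; the real code is C inside KnownAssignedXids handling,
and `get_valid_slots` is an invented name):

```python
# Toy model of the "offset to next valid xid" optimization: each slot
# stores an optimistic jump distance, initially 1, widened lazily while
# scanning so that later scans skip whole runs of invalid entries.
# Slots are only ever invalidated, never revalidated, so the widened
# jumps stay safe.

def get_valid_slots(valid, nxt):
    result = []
    prev = -1
    i = 0
    while i < len(valid):
        if valid[i]:
            result.append(i)
            if prev >= 0 and nxt[prev] < i - prev:
                nxt[prev] = i - prev  # next scan jumps straight here
            prev = i
        i += nxt[i]
    return result

valid = [True, False, False, True, False, True]
nxt = [1] * len(valid)

print(get_valid_slots(valid, nxt))  # [0, 3, 5] - first scan visits every slot
print(nxt)                          # [3, 1, 1, 2, 1, 1] - offsets widened
print(get_valid_slots(valid, nxt))  # [0, 3, 5] - now skips invalid runs
```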
Hello.
I have seen this error multiple times while trying to replicate a 2-3 TB
server (version 11 to version 12). I was unable to find any explanation for
this error.
Thanks,
Michail.
Hello, Kirill.
> Also, maybe it is better to reduce the invasivity by using a more
> simple approach. For example, use the first bit to mark xid as valid
> and the last 7 bit (128 values) as an optimistic offset to the next
> valid xid (jump by 127 steps in the worse scenario).
> What do you
Hello.
> I recently ran into a problem in one of our production postgresql cluster.
> I had noticed lock contention on procarray lock on standby, which causes WAL
> replay lag growth.
Yes, I saw the same issue on my production cluster.
> 1) set max_connections to big number, like 10
I
Hello.
Added a check for standby promotion with a long-running transaction to the
test (code and docs are unchanged).
Thanks,
Michail.
From c5e1053805c537b50b0922151bcf127754500adb Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Fri, 14 May 2021 00:32:30 +0300
Subject: [PATCH v3 3/4] test
e."
Fixed.
Updated version in attach.
Thanks a lot,
Michail.
From 004b2dea9b700d890147b840573bb5b796c1f96a Mon Sep 17 00:00:00 2001
From: Michail Nikolaev
Date: Wed, 12 May 2021 22:56:18 +0300
Subject: [PATCH v2 4/4] doc
---
src/backend/access/nbtree/README | 35 ++-
Hello, Antonin.
> Sorry, I missed the fact that your example can be executed inside BEGIN - END
> block, in which case minRecoveryPoint won't advance after each command.
No, the block is not executed as a single transaction; all commands
are separate transactions (see below)
> Actually I think
Hello, Antonin.
> I'm trying to review the patch, but not sure if I understand this problem,
> please see my comment below.
Thanks a lot for your attention. I strongly recommend looking at
version N3 (1), because it is a much more elegant, simpler, and more
reliable solution :) But the
Hello, Peter.
If you are interested, a possible patch (based on masking FPIs during
replay) was sent with some additional explanation and graphics to (1).
At the moment I am unable to find any "incorrectness" in it.
Thanks again for your comments.
Michail.
[1]
Hello, Peter.
> AFAICT that's not true, at least not in any practical sense. See the
> comment in the middle of MarkBufferDirtyHint() that begins with "If we
> must not write WAL, due to a relfilenode-specific...", and see the
> "Checksums" section at the end of src/backend/storage/page/README.
Hello, Peter.
Thanks a lot for your comments.
Here are some of my thoughts related to the "masked bits" solution and
your comments:
> During recovery, we will probably always have to consider the
> possibility that LP_DEAD bits that get set on the primary may be
> received by a replica through
Hello, Peter.
> Yeah, it would help a lot. But those bits are precious. So it makes
> sense to think about what to do with both of them in index AMs at the
> same time. Otherwise we risk missing some important opportunity.
Hm. I was trying to "expand the scope" as you said and got an idea...
Hello, Peter.
> I wonder if it would help to not actually use the LP_DEAD bit for
> this. Instead, you could use the currently-unused-in-indexes
> LP_REDIRECT bit.
Hm... Sounds very promising - an additional bit is a lot in this situation.
> Whether or not "recently dead" means "dead to my
>
Hello, hackers.
Sorry for necroposting, but in case anyone is interested - I hope the patch
is now ready, and it is available in the other thread (1).
Thanks,
Michail.
[1]
https://www.postgresql.org/message-id/flat/CANtu0oiP18H31dSaEzn0B0rW6tA_q1G7%3D9Y92%2BUS_WHGOoQevg%40mail.gmail.com
Hello, hackers.
I think I was able to fix the issue related to minRecoveryPoint and crash
recovery. To make sure the standby will be consistent after crash recovery,
we need to take the current value of minRecoveryPoint into account while
setting LP_DEAD hints (almost the same way as it is done for
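A minimal sketch of that rule, under my understanding (a toy Python model;
`can_set_lp_dead_on_standby` is an invented name, not the actual C code):
a hint bit is only safe if crash recovery, which is guaranteed only up to
minRecoveryPoint, would still replay the record that justifies it.

```python
# Toy model: on a standby, setting LP_DEAD does not advance the page
# LSN, so after a crash the bit may survive while the WAL that
# justified it is only guaranteed to be replayed up to
# minRecoveryPoint. Only allow the hint if the justifying record is
# already at or below that point. (Illustrative, not PostgreSQL code.)

def can_set_lp_dead_on_standby(justifying_lsn, min_recovery_point):
    return justifying_lsn <= min_recovery_point

print(can_set_lp_dead_on_standby(90, 100))   # True: replayed before consistency
print(can_set_lp_dead_on_standby(110, 100))  # False: might be lost after a crash
```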
Hello, everyone.
Oh, I just realized that it seems I was too naive in allowing the
standby to set LP_DEAD bits this way.
There is a possible consistency problem in the case of a low
minRecoveryPoint value (because hint bits do not move the page LSN
forward).
Something like this:
LSN=10 STANDBY INSERTS
Hello, hackers.
[ABSTRACT]
Executing queries on a hot standby is one of the most popular ways to
scale application workload. Most modern Postgres installations
have two standby nodes for high-availability support. So, utilization
of the replicas' CPU seems a reasonable idea.
At the
Hello, Peter.
Thanks for your explanation. One of the reasons I was asking is an idea
to use the same technique in the "LP_DEAD index hint bits on standby" WIP
patch to reduce the amount of additional WAL.
Now I am sure such an optimization should work correctly.
Thanks,
Michail.
Hello, hackers.
Working on some stuff, I realized I do not understand why
latestRemovedXid|cutoff_xid (in different types of WAL records) is
sent every time it appears on the primary side.
latestRemovedXid|cutoff_xid is used to call
ResolveRecoveryConflictWithSnapshot and cancel conflicting
Hello, Peter.
> Attached is a revised version of your patch
Thanks for your work, the patch is looking better now.
Michail.
Hello, Peter.
Thanks for the update.
Yes, it is the right decision.
I have started to spot that bug only while working on a faster scan
using hint bits on replicas [1], so it is unlikely to hit it in
production at the moment.
Thanks,
Michail.
[1]: