truncation mechanism will only ever
kick in during a VACUUM operation.
--
Peter Geoghegan
anti-wraparound/aggressive autovacuum
immediately afterwards. We want that second anti-wraparound VACUUM to
hurry from the very start of lazy_scan_heap().
--
Peter Geoghegan
On Mon, Apr 5, 2021 at 4:29 PM Tom Lane wrote:
> Peter Geoghegan writes:
> > Do you think that it's okay that we rely on the propagation of global
> > state to parallel workers on Postgres 13? Don't we need something like
> > my fixup commit 49f49def on Postgr
> because most people don't run manual VACUUM with cost limits turned
> on. And autovacuum doesn't use parallelism.
Oh yeah. "static BufferAccessStrategy vac_strategy" is guaranteed to
be initialized to 0, simply because it's static and global. That
explains it.
--
Peter Geoghegan
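
For illustration, a minimal standalone sketch (hypothetical variable
names, not PostgreSQL code) of the C guarantee being relied on here:
objects with static storage duration are zero-initialized before the
program starts, so a file-scope "static" pointer is NULL without an
explicit initializer.

#include <assert.h>

/* File-scope variables have static storage duration, so the C standard
 * guarantees they start out zeroed -- just like vac_strategy above. */
static void *file_scope_ptr;
static int file_scope_counter;

int
main(void)
{
    assert(file_scope_ptr == 0);        /* NULL without an initializer */
    assert(file_scope_counter == 0);
    return 0;
}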
On Mon, Apr 5, 2021 at 5:09 PM Peter Geoghegan wrote:
> Oh yeah. "static BufferAccessStrategy vac_strategy" is guaranteed to
> be initialized to 0, simply because it's static and global. That
> explains it.
So do we need to allocate a strategy in workers now, or leave
autovacuum_freeze_max_age. How about the following: "...
> >VACUUM< will use the higher of this value and 105% of
> >guc-autovacuum-freeze-max-age<, so that only ...". It's only slightly
> easier to read, but at least it conveys that values lower than 105% of
> autovacuum_freeze_max_age are not considered. The same can be said for
> the multixact guc documentation.
This does need work too.
I'm going to push 0002- and 0003- tomorrow morning pacific time. I'll
publish a new set of patches tomorrow, once I've finished that up. The
last 2 patches will require a lot of focus to get over the line for
Postgres 14.
--
Peter Geoghegan
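
To make the proposed wording concrete, a minimal sketch (hypothetical
function and parameter names, not the actual PostgreSQL implementation)
of the rule the documentation is trying to convey: the effective cutoff
is the higher of the GUC's own value and 105% of
autovacuum_freeze_max_age, so lower settings are effectively ignored.

/* Sketch of the documented clamping rule; names here are hypothetical. */
static int
effective_failsafe_age(int failsafe_age, int autovacuum_freeze_max_age)
{
    double  floor_age = autovacuum_freeze_max_age * 1.05;

    if ((double) failsafe_age > floor_age)
        return failsafe_age;
    return (int) floor_age;             /* lower settings are ignored */
}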
GUCs to reflect the fact that we do everything possible to advance
relfrozenxid in the case where the failsafe mechanism kicks in -- not
just skipping index vacuuming. It also incorporates your most recent
round of feedback.
Thanks
--
Peter Geoghegan
v11-0001-Truncate-line-pointer-arra
on of the patch that I just pushed. We have run out of time to
consider calling PageTruncateLinePointerArray() in more places. I
think that the most important thing is that we have *some* protection
against line pointer bloat.
--
Peter Geoghegan
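
For reference, a rough sketch of the idea behind
PageTruncateLinePointerArray() -- the real function in bufpage.c
preserves more invariants (it keeps at least one line pointer, and is
only safe during VACUUM, per the above). Trailing LP_UNUSED slots are
given back to the page's free space by lowering pd_lower:

#include "postgres.h"
#include "storage/bufpage.h"
#include "storage/itemid.h"

/* Simplified sketch, not the real implementation. */
static void
truncate_line_pointers_sketch(Page page)
{
    OffsetNumber nline = PageGetMaxOffsetNumber(page);

    /* walk backwards past trailing unused line pointers */
    while (nline > 0 && !ItemIdIsUsed(PageGetItemId(page, nline)))
        nline--;

    ((PageHeader) page)->pd_lower =
        SizeOfPageHeaderData + nline * sizeof(ItemIdData);
}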
On Wed, Apr 7, 2021 at 8:52 AM Peter Geoghegan wrote:
> All of the changes from your fixup patch are clear improvements, and
> so I'll include them in the final commit. Thanks!
I did change the defaults of the GUCs to 1.6 billion, though.
All patches in the patch series have
curitup, while this one is about curitemid. While I have no problem
silencing this compiler warning now, I don't see any reason to not
just do the same thing again. Which is to initialize the pointer to
NULL.
--
Peter Geoghegan
) -- the failing
visibilitymap_clear() call must occur inside a critical section, and
all such calls are made within heapam.c (only VACUUM ANALYZE uses a
transaction and does writes). It cannot be the two calls to
visibilitymap_clear() inside vacuumlazy.c.
I suspect that you've figured this much already. Just pointing it out.
--
Peter Geoghegan
flag won't change underneath it, and if it did, it's clear how this
> symptom would ensue.
Does this patch seem to fix the problem?
--
Peter Geoghegan
0001-Acquire-super-exclusive-lock-in-lazy_vacuum_heap_rel.patch
Description: Binary data
r a long
time.
How about temporarily committing this patch, just to review how it
affects the buildfarm?
--
Peter Geoghegan
On Sun, Apr 11, 2021 at 9:10 AM Peter Geoghegan wrote:
> I don't have any reason to believe that using a super-exclusive lock
> during heap page vacuuming is necessary. My guess is that returning to
> doing it that way might make the buildfarm green again. That would at
>
to maintain safely. We need to simplify it.
It is way too complicated. I don't think that I quite understand your
first proposal right now, so I'll need to go think about it.
--
Peter Geoghegan
exorcised from vacuumlazy.c
(in the same commit that stopped using a superexclusive lock). It was
also described as an optimization that wasn't quite worth it. But I
don't quite buy that. ISTM that there is a better explanation: it
evolved the appearance of being an optimization that might make sense.
Which was just camouflage.
--
Peter Geoghegan
https://postgr.es/m/capphfdtseohx8dsk9qp+z++i4bgqoffkip6jdwngea+g7z-...@mail.gmail.com
--
Peter Geoghegan
new documentation should address.
The documentation would still have plenty to say about HOT, though. It
would also have something to say about bottom-up index deletion, which
can be thought of as avoiding problems when HOT doesn't or can't be
applied very often.
--
Peter Geoghegan
, since it's likely to change in
Postgres 15 and Postgres 16 anyway.
--
Peter Geoghegan
> options that come with the feature (?):
> - A threshold, as an integer, to define a number of pages.
> - A scale factor to define a percentage of pages.
Why?
--
Peter Geoghegan
footgun. It might
appear to help at first, but risks destabilizing things much later on.
--
Peter Geoghegan
es, then we must also assume that your
fix has negligible risk in the backbranches, because of how it is
structured. And so I agree -- I lean towards backpatching.
--
Peter Geoghegan
Heap truncation is exactly the kind of thing that
risks adding significant, unpredictable delay at a time when we need
to advance relfrozenxid as quickly as possible.
I pushed a trivial commit that makes the failsafe bypass heap
truncation as well just now.
Thanks
--
Peter Geoghegan
uiring a pin or a lock is made more clear. The added commentary is
> descriptive and appears grammatically correct, at least to a non native
> speaker.
I didn't see this review until now because it ended up in gmail's spam
folder. :-(
Thanks for taking a look at it!
--
Peter Geoghegan
On Fri, Jul 17, 2020 at 5:53 PM Peter Geoghegan wrote:
> Pushed the first patch just now, and intend to push the other one soon.
> Thanks!
Pushed the second piece of this (the nbtree patch) just now.
Thanks for the review!
--
Peter Geoghegan
_bt_lockbuf() wrapper function instead, so that the new Valgrind
instrumentation is used. This shouldn't be hard.
Perhaps you can use Valgrind to verify that this patch doesn't have
any unsafe buffer accesses. I recall problems like that in earlier
versions of the patch series. Valgrind should be able to detect most
bugs like that (though see comments within _bt_lockbuf() for details
of a bug in this area that Valgrind cannot detect even now).
--
Peter Geoghegan
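
For anyone following along, the wrapper in question looks roughly like
this (the real _bt_lockbuf() is in nbtpage.c; this is a simplified
sketch of the lock-side half): the Valgrind client request marks the
page defined once the lock is held, and a matching unlock wrapper marks
it NOACCESS again, which is what lets Valgrind catch unsafe buffer
accesses.

#include "postgres.h"
#include "storage/bufmgr.h"
#include "utils/memdebug.h"
#include "utils/rel.h"

static void
bt_lockbuf_sketch(Relation rel, Buffer buf, int access)
{
    LockBuffer(buf, access);

    /* Local (temp relation) buffers are never marked NOACCESS */
    if (!RelationUsesLocalBuffers(rel))
        VALGRIND_MAKE_MEM_DEFINED(BufferGetPage(buf), BLCKSZ);
}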
On Tue, Jul 21, 2020 at 1:30 PM Bruce Momjian wrote:
> On Tue, Jul 14, 2020 at 03:49:40PM -0700, Peter Geoghegan wrote:
> > Maybe I missed your point here. The problem is not so much that we'll
> > get HashAggs that spill -- there is nothing intrinsically wrong with
> >
waiting to hear back from
Tomas about that. Tomas?
Thanks
--
Peter Geoghegan
v3-0001-Remove-hashagg_avoid_disk_plan-GUC.patch
Description: Binary data
v3-0002-Add-hash_mem_multiplier-GUC.patch
Description: Binary data
_TLIST.
That does make it sound like the costs of the hash agg aren't being
represented. I suppose it isn't clear if this is a costing issue
because it isn't clear if the execution time performance itself is
pathological or is instead something that must be accepted as the cost
of spilling the hash agg in a general kind of way.
--
Peter Geoghegan
ere
> explaining where such tuples would come from.
There could be an opportunity to put this on a formal footing by doing
something in the amcheck heap checker patch.
--
Peter Geoghegan
+ * Too few might mean a lot of wasted I/O from repeated spilling of the same tuples. Too
+ * many will result in lots of memory wasted buffering the spill files (which
+ * could instead be spent on a larger hash table).
+ */
+#define HASHAGG_PARTITION_FACTOR 1.50
+#define HASHAGG_MIN_PARTITIONS 4
+#define HASHAGG_MAX_PARTITIONS 1024
--
Peter Geoghegan
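
A simplified sketch of how these three constants fit together (the real
logic is hash_choose_num_partitions() in nodeAgg.c, which additionally
rounds to a power of two; the function and parameter names below are
otherwise hypothetical):

/* Over-partition by HASHAGG_PARTITION_FACTOR, then clamp to the range. */
static int
choose_num_partitions_sketch(double input_bytes, double memory_limit_bytes)
{
    double  dpartitions =
        (input_bytes / memory_limit_bytes) * HASHAGG_PARTITION_FACTOR;

    if (dpartitions < HASHAGG_MIN_PARTITIONS)
        return HASHAGG_MIN_PARTITIONS;
    if (dpartitions > HASHAGG_MAX_PARTITIONS)
        return HASHAGG_MAX_PARTITIONS;
    return (int) dpartitions;
}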
sophisticated cost model for this in
cost_agg(). Maybe the "pages_written" calculation could be pessimized.
However, it doesn't seem like this is precisely an issue with I/O
costs.
--
Peter Geoghegan
agg might write out much less data than the total
memory used for the equivalent "peak optimal nospill" hash agg case --
or much more. (Again, reiterating what I said in my last e-mail.)
--
Peter Geoghegan
On Fri, Jul 24, 2020 at 12:55 PM Peter Geoghegan wrote:
> Could that be caused by clustering in the data?
>
> If the input data is in totally random order then we have a good
> chance of never having to spill skewed "common" values. That is, we're
> bound to
On Sat, Jul 25, 2020 at 9:39 AM Peter Geoghegan wrote:
> "Peak Memory Usage: 1605334kB"
>
> Hash agg avoids spilling entirely (so the planner gets it right this
> time around). It even uses notably less memory.
I guess that this is because the reported memory usage doesn
20MB work_mem to complete without
spilling at all -- so it's kind of extreme, though not quite as
extreme as many of the similar test results from Tomas.)
--
Peter Geoghegan
t affects simple in-memory hash
aggregation, though.)
--
Peter Geoghegan
On Sat, Jul 25, 2020 at 1:10 PM Jeff Davis wrote:
> On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote:
> > What worries me a bit is the sharp discontinuities when spilling with
> > significantly less work_mem than the "optimal" amount. For example,
> > with
On Sat, Jul 25, 2020 at 12:41 PM Peter Geoghegan wrote:
> I have added a new open item for this separate
> LookupTupleHashEntryHash()/lookup_hash_entry() pipeline-stall issue.
Attached is a rebased version of Andres' now-bitrot 2020-06-12 patch
("aggspeed.diff").
I find
e it doesn't report this excess), or the random case really
doesn't exceed work_mem but has a surprising advantage over the sorted
case.
--
Peter Geoghegan
I'm glad that this better principled approach is possible. It's hard
to judge how much of a problem this really is, though. We'll need to
think about this aspect some more.
Thanks
--
Peter Geoghegan
On Sat, Jul 25, 2020 at 5:31 PM Peter Geoghegan wrote:
> I'm glad that this better principled approach is possible. It's hard
> to judge how much of a problem this really is, though. We'll need to
> think about this aspect some more.
BTW, your HLL patch ameliorates the
On Sun, Jul 26, 2020 at 4:17 PM Jeff Davis wrote:
> I saw less of an improvement than you or Andres (perhaps just more
> noise). But given that both you and Andres are reporting a measurable
> improvement, then I went ahead and committed it. Thank you.
Thanks!
--
Peter Geoghegan
The memory usage is reported on accurately by
EXPLAIN ANALYZE.
--
Peter Geoghegan
On Mon, Jul 27, 2020 at 10:30 AM Alvaro Herrera
wrote:
> On 2020-Jul-23, Peter Geoghegan wrote:
> I notice you put the prototype for get_hash_mem in nodeHash.h. This
> would be fine if not for the fact that optimizer needs to call the
> function too, which means now optimizer hav
the attached
revision. No other changes.
The v4-0001-Remove-hashagg_avoid_disk_plan-GUC.patch changes are
surprisingly complicated. It would be nice if you could take a look at
that aspect (or confirm that it's included in your review).
--
Peter Geoghegan
v4-0001-Remove-hashagg_avoid_disk_plan-GUC.patch
On Mon, Jul 27, 2020 at 12:52 PM Alvaro Herrera
wrote:
> On 2020-Jul-27, Peter Geoghegan wrote:
> > The v4-0001-Remove-hashagg_avoid_disk_plan-GUC.patch changes are
> > surprisingly complicated. It would be nice if you could take a look at
> > that aspect (or confirm that
nse.
> Anyway, the patch looks good to me. We can have a separate discussion
> about pessimizing the costing, if necessary.
Pushed the hashagg_avoid_disk_plan patch -- thanks!
--
Peter Geoghegan
earlier on the incremental sort thread, so I thought I'd ask
again here, while I was reminded of that issue.)
--
Peter Geoghegan
which is the kind of thing that could only
really happen inside tuplesort.c. This is clear because some of the
variables have the tell-tale 0x7f7f7f pattern that is written by
CLOBBER_FREED_MEMORY builds when memory is freed.
--
Peter Geoghegan
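
For readers unfamiliar with the 0x7f7f7f pattern: CLOBBER_FREED_MEMORY
builds wipe memory before returning it to the allocator, so a stale
pointer reads back poison instead of plausible-looking data. A minimal
standalone illustration of the technique (not PostgreSQL's actual code;
see wipe_mem() in utils/memdebug.h for that):

#include <stdlib.h>
#include <string.h>

/* Poison freed memory so use-after-free reads the 0x7f7f7f... pattern. */
static void
clobbering_free(void *ptr, size_t size)
{
    memset(ptr, 0x7F, size);
    free(ptr);
}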
On Tue, Jul 28, 2020 at 8:47 AM Peter Geoghegan wrote:
> This is very likely to be related to incremental sort because it's a
> use-after-free issue, which is the kind of thing that could only
> really happen inside tuplesort.c. This is clear because some of the
> variables h
On Tue, Jul 28, 2020 at 10:06 AM Peter Geoghegan wrote:
> I wrote the assertion that fails here with the bug that I fixed in
> commit 4974d7f87e62a58e80c6524e49677cb25cc10e12 in mind specifically.
> That was a bug that involved a scan that returned duplicate tuples due
> to
On Tue, Jul 28, 2020 at 10:37 AM Peter Geoghegan wrote:
> It's starting to look more like that. I can reproduce the bug by
> running the REINDEX in a tight loop while "make installcheck" runs. It
> looks as if the two tuples passed to comparetup_index_btree() are
> sepa
7nga6e...@mail.gmail.com
We see a violation of the HOT invariant in this case, though only for
a system catalog index, and only in fairly particular circumstances
involving concurrency.
--
Peter Geoghegan
On Tue, Jul 28, 2020 at 12:00 PM Peter Geoghegan wrote:
> I still don't know what's going on here, but I find it suspicious that
> some relevant tuples go through the HEAPTUPLE_INSERT_IN_PROGRESS +
> !TransactionIdIsCurrentTransactionId() path of
> heapam_index_build_range
th (i.e. the path that uses the
root_offsets array).
I notice that the root tuple of the hot chain is marked HEAP_COMBOCID
(and xmin == xmax for the HOT chain tuple). The xmin for the successor
(which matches xmin and xmax for root tuple) exactly matches the
REINDEX/crashing session's OldestXmin.
--
Peter Geoghegan
On Tue, Jul 28, 2020 at 1:26 PM Peter Geoghegan wrote:
> On Tue, Jul 28, 2020 at 1:04 PM Tom Lane wrote:
> > No, I don't think so. It was designed for the case of unique key X
> > being inserted immediately after a deletion of the same key. The
> > deleted tuple is p
On Tue, Jul 28, 2020 at 2:46 PM Peter Geoghegan wrote:
> The fact remains that this function (originally known as
> IndexBuildHeapScan(), now heapam_index_build_range_scan()) did not
> care about whether or not the index is unique for about 3 years
> (excluding the tupleIsAlive stuf
documentation link for the GUC would be very helpful.
Thanks
--
Peter Geoghegan
e
that it's worth backpatching to 13 now. Assuming it is a really
measurable cost in practice.
--
Peter Geoghegan
On Mon, Jul 27, 2020 at 5:55 PM Peter Geoghegan wrote:
> Pushed the hashagg_avoid_disk_plan patch -- thanks!
Pushed the hash_mem_multiplier patch as well -- thanks again!
As I've said before, I am not totally opposed to adding a true escape
hatch. That has not proven truly necessary
of
hash_mem_multiplier. This allows hash aggregation to use more memory
without affecting competing query operations that are generally less
likely to put any additional memory to good use.
--
Peter Geoghegan
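
A sketch of the mechanism (the real get_hash_mem() lives in nodeHash.c;
the parameters and the overflow guard here are illustrative):
hash-based operations size their memory budget as work_mem, in
kilobytes, scaled by hash_mem_multiplier, while everything else keeps
plain work_mem.

#include <limits.h>

/* Simplified sketch; parameters stand in for the GUC variables. */
static int
get_hash_mem_sketch(int work_mem_kb, double hash_mem_multiplier)
{
    double  hash_mem = (double) work_mem_kb * hash_mem_multiplier;

    if (hash_mem >= (double) INT_MAX)
        return INT_MAX;                 /* clamp the kilobyte budget */
    return (int) hash_mem;
}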
put it next to the hash agg item itself. It's more
likely to be noticed there, and highlighting it a little seems
warranted.
OTOH, this may not be a problem at all for many individual users.
Framing it as a tip rather than a compatibility item gets that across.
--
Peter Geoghegan
), which can be either memory or disk space. Technically they
don't violate the letter of the law that you cite. (Though admittedly
this is a legalistic loophole -- the "space" value presumably has to
be interpreted according to different rules for programs that consume
non-text EXPLAIN output.)
Have I missed something?
--
Peter Geoghegan
On Wed, Jul 29, 2020 at 9:05 PM Justin Pryzby wrote:
> So my 2ndary suggestion is to conditionalize based on the method rather than
> value of space used.
What's the advantage of doing it that way?
--
Peter Geoghegan
On Thu, Jul 30, 2020 at 10:32 AM Bruce Momjian wrote:
> I came up with a more verbose documentation suggestion, attached.
I'm okay with this.
Thanks
--
Peter Geoghegan
"sparse allocation".
Although maybe the preallocation stuff should somehow be rolled into
the much older nHoleBlocks stuff. Unsure.
--
Peter Geoghegan
something that we have to live with for the foreseeable future. I
don't think that this is a bad example -- we don't output
maxDiskSpaceUsed or maxMemorySpaceUsed at the conceptual level.
--
Peter Geoghegan
access to the sort, and it happens to spill -- we can
randomly access the final tape, but cannot do a final on-the-fly
merge. Say for a merge join.
--
Peter Geoghegan
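
To make the distinction concrete, a sketch against the PG13-era
tuplesort API (argument setup elided; the helper name is hypothetical):
passing randomAccess=true is what forces a spilling sort to materialize
a single final tape, giving up the final on-the-fly merge in exchange
for the ability to rescan and reposition.

#include "postgres.h"
#include "utils/tuplesort.h"

static Tuplesortstate *
begin_mergejoin_sort_sketch(TupleDesc tupDesc, int nkeys,
                            AttrNumber *attNums, Oid *sortOperators,
                            Oid *sortCollations, bool *nullsFirstFlags,
                            int workMem)
{
    /* randomAccess=true: caller may rescan/reposition, e.g. merge join */
    return tuplesort_begin_heap(tupDesc, nkeys, attNums, sortOperators,
                                sortCollations, nullsFirstFlags, workMem,
                                NULL, true);
}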
On Thu, Jul 30, 2020 at 6:40 PM Justin Pryzby wrote:
> Feel free to close it out. I'm satisfied that we've had a discussion about
> it.
Closed it out.
--
Peter Geoghegan
On Wed, May 13, 2020 at 4:06 PM Peter Geoghegan wrote:
> On Mon, May 11, 2020 at 5:56 AM Alexander Korotkov
> wrote:
> > Thank you. 2nd patch is proposed for master and makes btree page
> > unlink remove all the items from the page being deleted.
>
> This looks
ch --
I have only made superficial adjustments. I think that I will be able to
commit this next week, barring objections.
--
Peter Geoghegan
v3-0001-Avoid-backwards-scan-page-deletion-standby-race.patch
Description: Binary data
the table or other indexes) at the first sign of trouble,
anyway? It makes sense for the heap structure, but not for indexes.
--
Peter Geoghegan
On Thu, Jul 30, 2020 at 10:59 AM Robert Haas wrote:
> I don't really like the name, either. I get that it's probably
> inspired by Perl, but I think it should be given a less-clever name
> like report_corruption() or something.
+1 -- confess() is an awful name for this.
--
Peter Geoghegan
during index verification? That's what it would take to do
this in the general case, I guess. More fundamentally, I wonder how
many inconsistencies one should imagine that this index has, before we
even get into talking about the implementation.
--
Peter Geoghegan
On Thu, Jul 30, 2020 at 10:45 AM Peter Geoghegan wrote:
> On Thu, Jul 30, 2020 at 10:32 AM Bruce Momjian wrote:
> > I came up with a more verbose documentation suggestion, attached.
>
> I'm okay with this.
Are you going to push this soon, Bruce?
--
Peter Geoghegan
On Mon, Jul 20, 2020 at 11:46 AM Andrey M. Borodin wrote:
> In this thread [0] we decided that lock coupling is necessary for
> btree_xlog_unlink_page().
> So, maybe let's revive this patch?
Yes, let's do that. Can you resubmit it, please?
Peter Geoghegan
On Sun, Aug 2, 2020 at 9:07 AM Michail Nikolaev
wrote:
> Thanks for your work, the patch is looking better now.
Pushed -- thanks!
--
Peter Geoghegan
On Tue, Jul 28, 2020 at 3:09 PM Peter Geoghegan wrote:
> Perhaps 2011's commit 520bcd9c9bb missed similar
> HEAPTUPLE_INSERT_IN_PROGRESS issues that manifest themselves within
> Justin's test case now?
Any further thoughts on this, Tom?
This is clearly a live bug that we sh
her
probably contain no needles at all. (OTOH, once you find one needle
all bets are off, and you could very well go on to find a huge number
of them.)
--
Peter Geoghegan
ety. When some (though not all) of the problems came to light a
few years later, 520bcd9c9bb didn't go far enough. We know that
1ddc2703a93 got the DELETE_IN_PROGRESS stuff wrong -- why doubt that
it also got the INSERT_IN_PROGRESS stuff wrong?
--
Peter Geoghegan
pivot tuple on the page. This keeps things simple
for WAL consistency checking."
(Just a suggestion.)
Thanks!
--
Peter Geoghegan
On Tue, Aug 4, 2020 at 4:18 PM Alexander Korotkov wrote:
> Pushed. Comment is changed as you suggested. But I've replaced "last
> pivot tuple" with "remaining tuples", because the page can also have a
> high key, which is also a tuple.
You're right, of course.
Thanks again
--
Peter Geoghegan
trivial code that only gets used in circumstances that by
definition should never happen. If users really needed to recover the
data in the index then maybe it would happen -- but they don't.
The biggest problem that amcheck currently has is that it isn't used
enough, because it isn't positioned as a general purpose tool at all.
I'm hoping that the work from Mark helps with that.
--
Peter Geoghegan
Callaghan's blog is a pretty good resource for learning about
LSMs [2] (perhaps you've heard of him?). He wrote a bunch of stuff
about Postgres recently, which I enjoyed.
[1] http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf
[2] https://smalldatum.blogspot.com/
--
Peter Geoghegan
that the
level of interest in that tool doesn't seem to be all that great,
though I myself have used it to investigate corruption to great
effect. But I suppose there is no way to know how it's being used.
--
Peter Geoghegan
by not using the word hybrid
in the name.
--
Peter Geoghegan
sibling link -- we should avoid a self-deadlock in
the event of a page that is corrupt in just the wrong way.
* Updated obsolescent comments that claimed that we never couple
buffer locks in amcheck.
I would like to commit something like this in the next day or two.
Thoughts?
--
Peter Geoghegan
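
The core of the sibling-link invariant being checked is simple, even if
the locking is not. A minimal sketch (the real checks live in amcheck's
verify_nbtree.c, which also handles half-dead and deleted pages; the
helper name is hypothetical): a page's right sibling must point back at
the page via its left link.

#include "postgres.h"
#include "access/nbtree.h"

/* Simplified sketch: does the right sibling's left link point back? */
static bool
right_sibling_points_back_sketch(BlockNumber target, Page rightsib_page)
{
    BTPageOpaque ropaque =
        (BTPageOpaque) PageGetSpecialPointer(rightsib_page);

    return ropaque->btpo_prev == target;
}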
now has a new check for page
deletion -- that was missing from v4.
--
Peter Geoghegan
v5-0002-Adjust-btree_xlog_split-locking.patch
Description: Binary data
v5-0001-Add-amcheck-sibling-link-checks.patch
Description: Binary data
suggest breaking out the series into distinct patch files using
"git format-patch master".
--
Peter Geoghegan
e at some point.
Commit 3bbf668d certainly creates that impression. But I might have
missed something.
[1]
postgr.es/m/CAH2-Wzm3=SLwu5=z8qG6UBpCemZW3dUNXWbX-cpXCgb=y3o...@mail.gmail.com
--
Peter Geoghegan
d
> didn't need to bother with locks at all. I think that it did take some
> locks even then, but that was because of code sharing with the primary
> execution path rather than being something we wanted.
Makes sense.
--
Peter Geoghegan
On Thu, Aug 6, 2020 at 7:00 PM Peter Geoghegan wrote:
> On Thu, Aug 6, 2020 at 6:08 PM Tom Lane wrote:
> > +1 for making this more like what happens in original execution ("on the
> > primary", to use your wording). Perhaps what you suggest here is still
> >
version of this patch just now. I added some
commentary about this canonical example in header comments for the new
function.
Thanks
--
Peter Geoghegan
> script at
> https://postgr.es/m/20170622210845.d2hsbqv6rxu2tiye%40alap3.anarazel.de
> can give you results like e.g.
> https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg
It seems to have bitrot. Do you have a more recent version of the script?
--
Peter Geoghegan
On Wed, Aug 12, 2020 at 5:39 PM Andres Freund wrote:
> Attached. Needed one python3 fix, and to be adapted so it works with
> futex based semaphores. Seems to work for both sysv and posix semaphores
> now, based on a very short test.
Great, thanks!
--
Peter Geoghegan
g" when the value gets copied around.
--
Peter Geoghegan
rather than incrementally rewriting the authoritative
page in shared memory, which sidestepped the problem.
I definitely think that we should have something like this, though.
It's a relatively easy win. There are plenty of workloads that spend
lots of time on pruning.
--
Peter Geoghegan
that the extra free space
left behind is used as intended, for updates of tuples located on the
same heap page?
Thoughts?
[1]
https://github.com/petergeoghegan/benchmarksql/commit/3ef4fe71077b40f56b91286d4b874a15835c241e
--
Peter Geoghegan
> heap have
> the most dead space, and focus I/O efforts on those blocks first. It
> may or may not be worth the extra complexity by itself, but it seems
> it would work synergistically with your proposal: Updating related
> tuples would concentrate dead tuples on fewer pages, and vacuums would
>