On Sat, 2007-06-09 at 09:58 -0400, Tom Lane wrote:
Jeff Davis [EMAIL PROTECTED] writes:
* For a large table, do lazy_scan_heap, scan_heap, and a sequential
scan usually progress at approximately the same rate?
scan_heap would probably be faster than a regular seqscan, since it isn't
Tom Lane wrote:
Jeff Davis [EMAIL PROTECTED] writes:
* Just adding in the syncscan to scan_heap and lazy_scan_heap seems
very easy at first thought. Are there any complications that I'm
missing?
I believe there are assumptions buried in both full and lazy vacuum that
blocks are scanned in
Heikki Linnakangas [EMAIL PROTECTED] writes:
I don't think sync-scanning vacuum is worth pursuing, though, because of the
other issues: index scans, vacuum cost accounting, and the fact that the 2nd
pass would be harder to synchronize. There are a lot of other interesting ideas
for vacuum that
Jeff Davis [EMAIL PROTECTED] writes:
I'm sure this has been brought up before, does someone have a pointer to
a discussion about doing VACUUM-like work in a sequential scan?
Yeah, it's been discussed before; try looking for "incremental vacuum"
and such phrases.
The main stumbling block is
Tom Lane wrote:
Jeff Davis [EMAIL PROTECTED] writes:
I'm sure this has been brought up before, does someone have a pointer to
a discussion about doing VACUUM-like work in a sequential scan?
Yeah, it's been discussed before; try looking for "incremental vacuum"
and such phrases.
The main
Jeff Davis [EMAIL PROTECTED] writes:
* For a large table, do lazy_scan_heap, scan_heap, and a sequential
scan usually progress at approximately the same rate?
scan_heap would probably be faster than a regular seqscan, since it isn't
doing any where-clause-checking or data output. Except if
Tom Lane [EMAIL PROTECTED] writes:
The vacuum-cost-limit issue may be sufficient reason to kill this idea;
not sure.
We already have a much higher cost for blocks that cause i/o than blocks which
don't. I think if we had zero cost for blocks which don't cause i/o it would
basically work unless
Gregory Stark [EMAIL PROTECTED] writes:
Tom Lane [EMAIL PROTECTED] writes:
The vacuum-cost-limit issue may be sufficient reason to kill this idea;
not sure.
We already have a much higher cost for blocks that cause i/o than
blocks which don't. I think if we had zero cost for blocks which
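The cost-accounting point above can be made concrete with a toy model. This is an illustrative Python sketch of vacuum's cost-based delay budget, assuming the classic GUC defaults (vacuum_cost_page_hit=1, vacuum_cost_page_miss=10, vacuum_cost_limit=200); the function name and structure are invented, not the actual C code:

```python
# Toy model of vacuum's cost-based delay accounting. Assumed defaults:
# vacuum_cost_page_hit=1, vacuum_cost_page_miss=10, vacuum_cost_limit=200.
VACUUM_COST_PAGE_HIT = 1    # block already in shared buffers (no I/O)
VACUUM_COST_PAGE_MISS = 10  # block had to be read from disk
VACUUM_COST_LIMIT = 200     # budget consumed before the worker naps

def pages_before_nap(hit_ratio):
    """Pages vacuum can process before exceeding the cost limit,
    given the fraction of pages found in the buffer cache."""
    cost_per_page = (hit_ratio * VACUUM_COST_PAGE_HIT
                     + (1 - hit_ratio) * VACUUM_COST_PAGE_MISS)
    return VACUUM_COST_LIMIT / cost_per_page
```

With these numbers an all-miss scan naps every 20 pages while an all-hit scan goes 200 pages, which is the "much higher cost for blocks that cause i/o" Greg refers to; setting the hit cost to zero would make cached pages free, as he suggests.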
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
I fixed a little off-by-one in the backward scan, not-inited branch, but I
was unable to test it. It seems that code is actually never used because
that case is optimized to a rewind in the executor. I marked those
seemingly unreachable
Tom Lane wrote:
Jeff Davis [EMAIL PROTECTED] writes:
Just to be sure: a backwards-started scan is currently unreachable code,
correct?
[ yawn... ] I think so, but I wouldn't swear to it right at the moment.
In any case it doesn't seem like a path that we need to optimize.
Agreed, let's
Heikki Linnakangas [EMAIL PROTECTED] writes:
Tom Lane wrote:
It occurs to me that there's an actual bug here for catalog access.
The code assumes that it can measure rs_nblocks only once and not worry
about tuples added beyond that endpoint. But this is only true when
using an MVCC-safe
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
BTW: Should we do the synchronization in the non-page-at-a-time mode?
It's not many lines of code to do so, but IIRC that codepath is only
used for catalog access. System tables really shouldn't grow that big,
and if they do we
Heikki Linnakangas [EMAIL PROTECTED] writes:
Tom Lane wrote:
Jeff Davis [EMAIL PROTECTED] writes:
Just to be sure: a backwards-started scan is currently unreachable code,
correct?
[ yawn... ] I think so, but I wouldn't swear to it right at the moment.
In any case it doesn't seem like a
On Fri, 2007-06-08 at 11:05 +0100, Heikki Linnakangas wrote:
BTW: Should we do the synchronization in the non-page-at-a-time mode?
It's not many lines of code to do so, but IIRC that codepath is only
used for catalog access. System tables really shouldn't grow that big,
and if they do we
Jeff Davis [EMAIL PROTECTED] writes:
On Fri, 2007-06-08 at 11:05 +0100, Heikki Linnakangas wrote:
BTW: Should we do the synchronization in the non-page-at-a-time mode?
http://archives.postgresql.org/pgsql-hackers/2006-09/msg01199.php
There is a very minor assumption there that scans on
On Fri, 2007-06-08 at 12:22 -0400, Tom Lane wrote:
Now that I'm awake, it is reachable code, per this comment:
* Note: when we fall off the end of the scan in either direction, we
* reset rs_inited. This means that a further request with the same
* scan direction will restart the scan,
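The comment Tom quotes can be illustrated with a toy cursor. This is a hypothetical Python sketch of the rs_inited semantics, not the heapam.c code; all names are invented:

```python
# Toy heapscan cursor: falling off either end clears `inited`, so the
# next fetch in the same direction restarts the scan from scratch,
# mirroring the quoted comment about rs_inited.
class ToyScan:
    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.inited = False
        self.cur = None

    def next_block(self, forward=True):
        if not self.inited:
            # Restart: begin at block 0 going forward, last block backward.
            self.cur = 0 if forward else self.nblocks - 1
            self.inited = True
            return self.cur
        self.cur += 1 if forward else -1
        if self.cur < 0 or self.cur >= self.nblocks:
            self.inited = False  # fell off the end; next call restarts
            return None
        return self.cur
```

This is why the backward-started case is reachable: after a forward scan runs off the end, a backward request re-enters the not-inited branch.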
Heikki Linnakangas [EMAIL PROTECTED] writes:
Here's an update of the patch. I reverted the behavior at end of scan
back to the way it was in Jeff's original patch, and disabled reporting
the position when moving backwards.
Applied with minor editorializations --- notably, I got rid of the
Jeff Davis [EMAIL PROTECTED] writes:
On Fri, 2007-06-08 at 12:22 -0400, Tom Lane wrote:
Now that I'm awake, it is reachable code, per this comment:
* Note: when we fall off the end of the scan in either direction, we
* reset rs_inited. This means that a further request with the same
* scan
On Fri, 2007-06-08 at 11:57 -0700, Jeff Davis wrote:
On Fri, 2007-06-08 at 14:36 -0400, Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
Here's an update of the patch. I reverted the behavior at end of scan
back to the way it was in Jeff's original patch, and disabled
On Thu, 2007-06-07 at 22:52 -0400, Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
I fixed a little off-by-one in the backward scan, not-inited branch, but I
was unable to test it. It seems that code is actually never used because
that case is optimized to a rewind in the
Jeff Davis [EMAIL PROTECTED] writes:
Just to be sure: a backwards-started scan is currently unreachable code,
correct?
[ yawn... ] I think so, but I wouldn't swear to it right at the moment.
In any case it doesn't seem like a path that we need to optimize.
regards,
Tom Lane wrote:
But note that barring backend crash, once all the scans are done it is
guaranteed that the hint will be removed --- somebody will be last to
update the hint, and therefore will remove it when they do heap_endscan,
even if others are not quite done. This is good in the sense that
On Mon, 2007-06-04 at 21:39 -0400, Tom Lane wrote:
idea of deleting the hint. But if we could change the hint behavior to
say start reading here, successive short LIMITed reads would all start
reading from the same point, which fixes both my reproducibility concern
and Heikki's original point
Jeff Davis [EMAIL PROTECTED] writes:
That's how it works now. Small limit queries don't change the location
in the hint, so if you repeat them, the queries keep starting from the
same place, and fetching the same tuples.
OK, maybe the problem's not as severe as I thought then.
On Mon, 2007-06-04 at 10:53 +0100, Heikki Linnakangas wrote:
I'm now done with this patch and testing it.
One difference between our patches is that, in my patch, the ending
condition of the scan is after the hint is set back to the starting
position.
That means, in my patch, if you do:
I'm now done with this patch and testing it.
I fixed a little off-by-one in the backward scan, not-inited branch, but I
was unable to test it. It seems that code is actually never used because
that case is optimized to a rewind in the executor. I marked those
seemingly unreachable places in the
Heikki Linnakangas [EMAIL PROTECTED] writes:
For the record, this patch has a small negative impact on scans like
SELECT * FROM foo LIMIT 1000. If such a scan is run repeatedly, in CVS
HEAD the first 1000 rows will stay in buffer cache, but with the patch
each scan will start from roughly
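The behavior Heikki describes can be modeled in a few lines. This is a toy Python simulation of a per-table start-position hint, with invented names and none of the patch's real machinery, just to show why repeated LIMIT scans stop hitting the same cached blocks:

```python
# Toy model: a per-table hint stores where the last scan stopped, and
# each new scan starts at the hint instead of block 0. Invented names.
TABLE_BLOCKS = 1000

def run_limited_scans(nscans, blocks_per_scan, use_hint):
    """Return the starting block of each of nscans LIMITed scans."""
    hint = 0
    starts = []
    for _ in range(nscans):
        start = hint if use_hint else 0
        starts.append(start)
        if use_hint:
            # Scan stops early (LIMIT) and leaves the hint where it stopped.
            hint = (start + blocks_per_scan) % TABLE_BLOCKS
    return starts
```

Without the hint every scan starts at block 0, so the first blocks stay hot in cache; with it, each scan resumes roughly where the previous one stopped, walking around the table.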
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
For the record, this patch has a small negative impact on scans like
SELECT * FROM foo LIMIT 1000. If such a scan is run repeatedly, in CVS
HEAD the first 1000 rows will stay in buffer cache, but with the patch
each scan will start
Heikki Linnakangas wrote:
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
For the record, this patch has a small negative impact on scans like
SELECT * FROM foo LIMIT 1000. If such a scan is run repeatedly, in CVS
HEAD the first 1000 rows will stay in buffer cache, but with
Bruce Momjian [EMAIL PROTECTED] writes:
As I understand it, the problem is that while currently LIMIT without
ORDER BY always starts at the beginning of the table, it will not with
this patch. I consider that acceptable.
It's definitely going to require stronger warnings than we have now
Tom Lane wrote:
Bruce Momjian [EMAIL PROTECTED] writes:
As I understand it, the problem is that while currently LIMIT without
ORDER BY always starts at the beginning of the table, it will not with
this patch. I consider that acceptable.
It's definitely going to require stronger warnings than
On Mon, 2007-06-04 at 10:53 +0100, Heikki Linnakangas wrote:
I'm now done with this patch and testing it.
Great!
For the record, this patch has a small negative impact on scans like
SELECT * FROM foo LIMIT 1000. If such a scan is run repeatedly, in CVS
HEAD the first 1000 rows will stay
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
For the record, this patch has a small negative impact on scans like
SELECT * FROM foo LIMIT 1000. If such a scan is run repeatedly, in CVS
HEAD the first 1000 rows will stay in buffer cache, but with the patch
each scan will
Alvaro Herrera wrote:
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
For the record, this patch has a small negative impact on scans like
SELECT * FROM foo LIMIT 1000. If such a scan is run repeatedly, in CVS
HEAD the first 1000 rows will stay in buffer cache, but with the patch
Jeff Davis wrote:
No surprise here, as you and Bruce have already pointed out.
If we wanted to reduce the occurrence of this phenomenon, we could
perhaps time out the hints so that it's impossible to pick up a hint
from a scan that finished 5 minutes ago.
It doesn't seem helpful to further
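Jeff's timeout idea amounts to stamping each hint and ignoring stale ones. A minimal sketch, with invented names and a hypothetical 5-minute cutoff taken from the discussion:

```python
import time

# Sketch of the "time out stale hints" idea: store a timestamp with
# each hint and ignore entries older than the cutoff. Invented names.
HINT_TIMEOUT = 300.0  # seconds; the "5 minutes" from the discussion

hints = {}  # relfilenode -> (block, stamp)

def set_hint(rel, block, now=None):
    hints[rel] = (block, time.time() if now is None else now)

def get_hint(rel, now=None):
    entry = hints.get(rel)
    if entry is None:
        return None
    block, stamp = entry
    t = time.time() if now is None else now
    if t - stamp > HINT_TIMEOUT:
        return None  # stale hint: fall back to starting at block 0
    return block
```

A new scan that finds no (fresh) hint would simply start at block 0, so a hint left behind by a long-finished scan could not redirect it.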
Heikki Linnakangas [EMAIL PROTECTED] writes:
I don't think anyone can reasonably expect to get the same ordering when
the same query is issued twice in general, but within the same transaction
it wouldn't be that unreasonable. If we care about that, we could keep
track of starting locations
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
I don't think anyone can reasonably expect to get the same ordering when
the same query is issued twice in general, but within the same transaction
it wouldn't be that unreasonable. If we care about that, we could keep
track of
On Jun 4, 2007, at 15:24 , Heikki Linnakangas wrote:
I don't think anyone can reasonably expect to get the same ordering
when the same query is issued twice in general, but within the same
transaction it wouldn't be that unreasonable.
The order rows are returned without an ORDER BY clause
On Mon, 2007-06-04 at 16:42 -0400, Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
I don't think anyone can reasonably expect to get the same ordering when
the same query is issued twice in general, but within the same transaction
it wouldn't be that unreasonable. If we care
On Mon, 2007-06-04 at 22:09 +0100, Heikki Linnakangas wrote:
I think the real problem here is that the first scan is leaving state
behind that changes the behavior of the next scan. Which can have no
positive benefit, since obviously the first scan is not still
proceeding; the best you
On Jun 4, 2007, at 16:34 , Heikki Linnakangas wrote:
LIMIT without ORDER BY is worse because it not only returns tuples
in different order, but it can return different tuples altogether
when you run it multiple times.
Wouldn't DISTINCT ON suffer from the same issue without ORDER BY?
Jeff Davis wrote:
On Mon, 2007-06-04 at 22:09 +0100, Heikki Linnakangas wrote:
I think the real problem here is that the first scan is leaving state
behind that changes the behavior of the next scan. Which can have no
positive benefit, since obviously the first scan is not still
proceeding;
Jeff Davis [EMAIL PROTECTED] writes:
My thought was that every time the location was reported by a backend,
it would store 3 pieces of information, not 2:
* relfilenode
* the PID of the backend that created or updated this particular hint
last
* the location
Then, on heap_endscan() (if
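The bookkeeping Jeff proposes, with Tom's later observation that the last updater removes the hint at heap_endscan, can be sketched as follows. This is an illustrative Python model of the idea under discussion, not the patch; all names are invented:

```python
# Per-relation hint holding (pid of last updater, scan position).
# Only the backend whose PID matches removes the entry at end-of-scan,
# so barring a crash the hint always disappears once all scans finish.
hints = {}  # relfilenode -> (pid, block)

def report_location(relfilenode, pid, block):
    """A scanning backend reports its position, claiming the hint."""
    hints[relfilenode] = (pid, block)

def heap_endscan(relfilenode, pid):
    """Remove the hint only if this backend was the last to update it."""
    entry = hints.get(relfilenode)
    if entry is not None and entry[0] == pid:
        del hints[relfilenode]
```

If backend A ends its scan after backend B has taken over the hint, A's endscan is a no-op and B removes the entry when it finishes, which is the "somebody will be last" guarantee quoted elsewhere in the thread.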
Heikki Linnakangas [EMAIL PROTECTED] writes:
Were you thinking of storing the PID of the backend that originally created
the hint, or updating the PID every time the hint is updated? In any case,
we still wouldn't know if there are other scanners still running.
My reaction was if you
On Mon, 2007-06-04 at 18:25 -0400, Tom Lane wrote:
But note that barring backend crash, once all the scans are done it is
guaranteed that the hint will be removed --- somebody will be last to
update the hint, and therefore will remove it when they do heap_endscan,
even if others are not quite
Jeff Davis [EMAIL PROTECTED] writes:
The problem is, I think people would be more frustrated by 1 in 1000
queries starting the scan in the wrong place because a hint was deleted,
Yeah --- various people have been complaining recently about how we have
good average performance and bad worst