Re: [GENERAL] clustering without locking

2008-05-03 Thread Tom Lane
Craig Ringer [EMAIL PROTECTED] writes:
> Later on, though, less new space would have to be allocated because more
> and more of the space allocated earlier to hold moved tuples would be
> being freed up in useful chunks that could be reused.

I don't see how that works.  If the minimum size of the table is X
pages, ISTM that the first pass has to push everything up to pages above
X.  You can't put any temporary copies in pages <= X because you might
need that space when it comes time to make the clustering happen.  So
the table is going to bloat to (at least) 2X pages.  The extra pages
will be *mostly* empty when you're done, but probably not *entirely*
empty if there have been concurrent insertions --- and you'll never be
able to clean them out without taking exclusive lock.

If you could accurately predict a tuple's final position, you could
maybe get away with putting it temporarily in a page above that one
but still less than X.  I don't see how you do that though, especially
not in the face of concurrent insertions.  (In fact, given concurrent
insertions I don't even see how to know what X is.)

regards, tom lane



Re: [GENERAL] clustering without locking

2008-05-03 Thread Craig Ringer

Tom Lane wrote:

> Craig Ringer [EMAIL PROTECTED] writes:
>> Later on, though, less new space would have to be allocated because more
>> and more of the space allocated earlier to hold moved tuples would be
>> being freed up in useful chunks that could be reused.
>
> I don't see how that works.  If the minimum size of the table is X
> pages, ISTM that the first pass has to push everything up to pages above
> X.


What I was suggesting was to essentially cluster the table in small
parts. Rather than two huge passes (move all tuples to free / newly
allocated space at the end of the table; move them back into their old
locations in order) it'd be done in a series of smaller moves.


After moving a chunk out of the way and into free/new space at the end
of the table, data would be copied from later in the table into the
freed space. That space could then be re-used to hold data from the next
chunk that needs to be moved out of the way.


I'm just trying to understand if it can actually work. Sorry if my 
attempted explanations are unclear; I'm probably doing a terrible job, 
and it's probably actually a stupid idea anyway (if nothing else it 
might just be too slow). Nonetheless I'm curious. Maybe I can explain 
another way.


Visually:

`0' to `9' : tuples. Desired eventual cluster order is face value.
`.' : Dead/obsoleted tuple not yet marked reusable by VACUUM
` ' : free space

Initial state:

---------
584736120
---------

Begin a transaction and free the first chunk (2 tuples in this case, but 
obviously many more in a real case):


-----------
..473612058
-----------

Use that freed space to store the first ordered tuples:

-----------
014736.2.58
-----------

Commit, and when the last transaction to which the `.' tuples above are
still visible completes, mark them as free with VACUUM or similar.


-----------
014736 2 58
-----------

Repeat, but now use the freed space to store the next set of tuples that 
must be moved rather than extending the table:


-----------
01..3642758
-----------
-----------
0123.64.758
-----------
-----------
0123 64 758
-----------

During the next pass someone inserts `9' after tuples have been moved to
make a hole and others have been moved into the hole, but before the
old locations of the moved tuples are marked as free:


-----------
0123 .46758
-----------
-----------
012345.67.8
-----------

012345.67.89   <- INSERT 9


012345 67 89


You'd never land up with this sort of convenient ordering half way 
through in a real case with realistic numbers of tuples, so it'd keep 
going, doing small chunks each time, until the whole table had been 
processed.
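For concreteness, here's a throwaway Python sketch of the same game: the
table is a list of slots, `.' is a dead tuple and ` ' is reusable space.
It cheats in exactly the way Tom points out downthread, overwriting dead
slots immediately instead of waiting for them to become invisible to all
transactions first.

def free_slot(table, after):
    """Find reusable space at or after `after`, extending the table
    if there is none -- the 'allocate scratch space' part."""
    for i in range(after, len(table)):
        if table[i] == ' ':
            return i
    table.append(' ')
    return len(table) - 1

def step(table, p, size):
    """One pass: clear slots [p, p+size), then fill them in cluster order."""
    want = [str(d) for d in range(p, min(p + size, 9))]
    # "txn 1": move whatever occupies the target slots out of the way
    for slot in range(p, min(p + size, len(table))):
        if table[slot] not in (' ', '.') and table[slot] not in want:
            dest = free_slot(table, p + size)
            table[dest] = table[slot]
            table[slot] = '.'              # dead version left behind
    # "txn 2": pull the wanted tuples into their final slots
    srcs = {d: table.index(d) for d in want}
    for d in want:
        table[srcs[d]] = '.'               # out-of-place copies die
    for slot, d in zip(range(p, p + size), want):
        table[slot] = d
    # "vacuum": dead versions become reusable free space
    for i, c in enumerate(table):
        if c == '.':
            table[i] = ' '

table = list('584736120')
for p in range(0, 9, 2):
    step(table, p, 2)
    print(''.join(table))

The printed states match the diagrams above (minus the concurrent
insert of `9').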


So, the table does grow, and its final state does contain dead space at 
the end, but not *too* much of it:



----------
0123456789
----------




If low values are inserted late in the progressive cluster they can 
just stay at the end of the table. They'll get moved if the user runs a 
progressive cluster operation again later. However, since you're ideally 
doing this with a non-100% fillfactor, they should land up in roughly 
the right place on initial insert rather than at the end of the table, 
avoiding the issue (and helping avoid having newly inserted tuples 
prevent table truncation by vacuum when the progressive clustering 
finishes).


Say a table containing the range 1-9 was being clustered with a 50%
fillfactor and was about halfway through the process:


------------
 1 2 3 47986
------------

and someone inserts `0', it should ideally land up in the right place
anyway (as a bit of reading suggests that insert tries to respect
cluster order):


------------
01 2 3 47986
------------


If you're re-clustering a table with a fillfactor set you may not need 
to extend it at all, because you can use already allocated free space to 
store tuples temporarily while moving them around.



> You can't put any temporary copies in pages <= X because you might
> need that space when it comes time to make the clustering happen.  So
> the table is going to bloat to (at least) 2X pages.


Yes, if you perform the whole process in a single huge chunk. What I was 
hoping was that it was possible to do it in a series of smaller 
operations instead, avoiding the need for such a huge amount of bloat at 
the end of the table.


Say you're clustering in 10 steps. You need X*0.1 worth of scratch space,
created by extending the table if there's not already appropriate free
space late in the table. Assuming you can reclaim all the space you've
used for temporary copies of tuples after each pass, your ideal final
table size is X*1.1+(space used by new inserts), of which X*0.1 is free
space at the end of the table. That free space probably has scattered
rows from concurrent inserts, but if you're using a non-100% fillfactor
it might not.
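Just to make the arithmetic concrete (the page count is a made-up
figure, and this assumes perfect reclamation after every pass):

X = 100_000              # minimum table size in pages (made-up)
steps = 10               # number of progressive-cluster passes

scratch = X / steps      # scratch pages needed, reused by every pass
final_size = X + scratch
print(f"scratch per pass: {scratch:.0f} pages")            # 10000
print(f"final size: {final_size:.0f} pages "
      f"({final_size / X - 1:.0%} bloat vs. 100% for one big pass)")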


It seems like it should work, *if* space used for temporary storage can
be efficiently reclaimed for reuse (or new inserts) after each pass.
Even if it can work, though, it might still be too slow to be worth it.

Re: [GENERAL] clustering without locking

2008-05-03 Thread Tom Lane
Craig Ringer [EMAIL PROTECTED] writes:
> Begin a transaction and free the first chunk (2 tuples in this case, but
> obviously many more in a real case):
>
> -----------
> ..473612058
> -----------
>
> Use that freed space to store the first ordered tuples:
>
> -----------
> 014736.2.58
> -----------
>
> Commit, and when the last transaction to which the `.' tuples above are
> still visible completes, mark them as free with VACUUM or similar.
>
> -----------
> 014736 2 58
> -----------

Oh, the problem is that you're misexplaining this.  You can't do it like
that: you can't overwrite the moved-up `.' tuples until after they
aren't visible to any other transaction.  So you need two transactions
to do the above.  I'm not sure if you need two waits for all others or
just one --- it's not clear to me what's the urgency of clearing out the
moved-down tuples after they're moved down.  (You *would* want to vacuum
every so often, but this is to reclaim index space, not so much heap
space because you'll reuse that anyway.)
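The ordering constraint in toy form (pure bookkeeping; nothing here is a
real PostgreSQL or driver API -- it just makes the "wait until nobody
can see the old versions" rule concrete):

import itertools

_xids = itertools.count(1)
_active = set()                    # xids of running transactions

def begin():
    xid = next(_xids)
    _active.add(xid)
    return xid

def commit(xid):
    _active.discard(xid)

def oldest_active():
    return min(_active, default=float('inf'))

reader = begin()   # long-running txn that can still see old versions
mover = begin()    # txn 1: copies the in-the-way tuples to free space
commit(mover)

# The dead originals may be overwritten only once every transaction
# older than the mover has finished:
assert oldest_active() <= mover    # reader still running -- must wait
commit(reader)
assert oldest_active() > mover     # now txn 2 may reuse those slots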

Anyway I think the main practical problem would be with deadlocks
against other transactions trying to update/delete tuples at the same
times you need to move them.  Dealing with uncommitted insertions would
be tricky too --- I think you'd need to wait out the inserting
transaction, which would add more possibilities of deadlock.

regards, tom lane



Re: [GENERAL] clustering without locking

2008-05-03 Thread Craig Ringer

Tom Lane wrote:


> Anyway I think the main practical problem would be with deadlocks
> against other transactions trying to update/delete tuples at the same
> times you need to move them.  Dealing with uncommitted insertions would
> be tricky too --- I think you'd need to wait out the inserting
> transaction, which would add more possibilities of deadlock.


I really appreciate your taking the time to think about and explain 
this. It's very helpful, as I'm trying to understand some of the basics 
of PostgreSQL's underlying operation.


I'd completely missed thinking about uncommitted inserts - I never
normally need to think about them so they just didn't cross my mind. I
guess it'd either have to do the equivalent of a SELECT FOR UPDATE
NOWAIT on all tuples in the pages to be freed before doing anything
else, or would have to take out an EXCLUSIVE table lock while freeing a
chunk of pages.


I can also vaguely see how problems would arise with concurrent
multi-tuple updates grabbing locks in a different order to the
progressive cluster and deadlocking; again, I hadn't even thought about
that.


I guess it might be OK if the progressive cluster attempted to get row 
exclusive locks on all tuples in the contiguous range of pages to be 
freed, and if it failed to get even one it released them all and retried 
that whole step. It sounds like it could be slow and inefficient, 
though, possibly so much so as to defeat the point of the clustering 
operation in the first place.
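Client-side, the all-or-nothing lock grab might look something like the
sketch below. The table name, key range and DSN are placeholders, and a
real in-core implementation would lock by physical page rather than by
key -- this is just the retry shape:

import time
import psycopg2

def lock_chunk_or_retry(dsn, lo, hi, max_tries=10):
    conn = psycopg2.connect(dsn)
    try:
        for attempt in range(max_tries):
            cur = conn.cursor()
            try:
                cur.execute(
                    "SELECT 1 FROM some_table "
                    "WHERE id BETWEEN %s AND %s "
                    "FOR UPDATE NOWAIT",           # fail fast on any conflict
                    (lo, hi),
                )
            except psycopg2.errors.LockNotAvailable:
                conn.rollback()                    # release every lock we got
                time.sleep(0.1 * (attempt + 1))    # back off, retry the step
                continue
            # ... move the now-locked tuples here ...
            conn.commit()
            return True
        return False                               # give up on this chunk
    finally:
        conn.close()

As feared, under contention this could spend most of its time retrying,
which is the inefficiency worry above.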


Thanks again for taking the time to go over that - it's extremely 
helpful and much appreciated.


--
Craig Ringer



Re: [GENERAL] clustering without locking

2008-05-02 Thread Martijn van Oosterhout
On Thu, May 01, 2008 at 05:12:52PM -0700, fschmidt wrote:
 
> An implementation of clustering without locking would start by comparing the
> index to the table from the beginning to find the first mismatch.  Rows
> before the mismatch are fine, and can be left alone.  From here on, go
> through the index and rewrite each row in order.  This will put the rows at
> the end of the table in cluster order.  When done, vacuum the table.  This
> will result in a clustered table without any locking needed.  Those few
> records that were updated while clustering was happening will be out of
> order, but that should only be a few.

Huh? If I'm understanding you correctly you'll end up with rows in
order, but with a really big hole in the middle of the table. I'm not
sure if that qualifies as clustered.

> So, could this work?  I could really use clustering without locking.

Nice idea, but I don't think it's going to work.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.




Re: [GENERAL] clustering without locking

2008-05-02 Thread Fujii Masao
On Fri, May 2, 2008 at 9:12 AM, fschmidt [EMAIL PROTECTED] wrote:

> An implementation of clustering without locking would start by comparing the
> index to the table from the beginning to find the first mismatch.  Rows
> before the mismatch are fine, and can be left alone.  From here on, go
> through the index and rewrite each row in order.  This will put the rows at
> the end of the table in cluster order.  When done, vacuum the table.  This
> will result in a clustered table without any locking needed.  Those few
> records that were updated while clustering was happening will be out of
> order, but that should only be a few.

In your proposal, a large number of dead tuples can be generated in both
the table and its indexes. This is a serious problem that spoils the
effect of clustering.

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [GENERAL] clustering without locking

2008-05-02 Thread Scott Ribe
> Huh? If I'm understanding you correctly you'll end up with rows in
> order, but with a really big hole in the middle of the table. I'm not
> sure if that qualifies as clustered.

That's why he said vacuum when done. Anyway, I'm not sure that a big
*contiguous* hole in the middle of the table would matter as much for
queries, because most rows would still be close to each other--most queries
would pull from one side or the other of the hole, and even for those that
didn't, it would be one seek across the hole, not seeking all over the
place?

-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice





Re: [GENERAL] clustering without locking

2008-05-02 Thread Scott Ribe
> Wouldn't new / updated tuples just get put in the hole, fairly rapidly
> un-clustering the table again?

How is that different than putting them in newly-allocated space at the end
of the table?

-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice





Re: [GENERAL] clustering without locking

2008-05-02 Thread Craig Ringer

Scott Ribe wrote:

>> Huh? If I'm understanding you correctly you'll end up with rows in
>> order, but with a really big hole in the middle of the table. I'm not
>> sure if that qualifies as clustered.
>
> That's why he said vacuum when done. Anyway, I'm not sure that a big
> *contiguous* hole in the middle of the table would matter as much for
> queries, because most rows would still be close to each other--most queries
> would pull from one side or the other of the hole, and even for those that
> didn't, it would be one seek across the hole, not seeking all over the
> place?


Wouldn't new / updated tuples just get put in the hole, fairly rapidly 
un-clustering the table again?


I guess you could also have a fillfactor to pad out the newly clustered 
data and just accept huge disk space use.


When you ran the lockless cluster again it could also fill the hole in 
partly.


--
Craig Ringer




Re: [GENERAL] clustering without locking

2008-05-02 Thread Tom Lane
Scott Ribe [EMAIL PROTECTED] writes:
>> Huh? If I'm understanding you correctly you'll end up with rows in
>> order, but with a really big hole in the middle of the table. I'm not
>> sure if that qualifies as clustered.

> That's why he said vacuum when done.

Huh?  A plain vacuum wouldn't fix that; a vacuum full would close up the
hole, but (a) it'd not preserve the row ordering, and (b) it'd take an
exclusive lock.

regards, tom lane



Re: [GENERAL] clustering without locking

2008-05-02 Thread Scott Ribe
> Huh?  A plain vacuum wouldn't fix that; a vacuum full would close up the
> hole, but (a) it'd not preserve the row ordering, and (b) it'd take an
> exclusive lock.

OK.

-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice





Re: [GENERAL] clustering without locking

2008-05-02 Thread Craig Ringer

Scott Ribe wrote:

>> Wouldn't new / updated tuples just get put in the hole, fairly rapidly
>> un-clustering the table again?
>
> How is that different than putting them in newly-allocated space at the end
> of the table?

It isn't; I just wasn't thinking straight.

This is probably a stupid idea as I'm fairly clueless about Pg's
innards, but I find myself wondering about doing a non-exclusive cluster
in small progressive chunks in a series of short transactions.


In each transaction tuples from a smallish contiguous chunk of pages 
(beginning at the start of the table and moving onward as the cluster 
progresses) are copied to free or newly allocated space later in the 
table and given the xid of the transaction. The old versions are marked 
as having been obsoleted by this transaction. The transaction then 
commits. This step is much like a standalone UPDATE that's not actually 
changing the field values and that has some weird tuple selection criteria.


Progressive clustering then waits until the last transaction that could
see the old copies of the tuples that were just relocated finishes. When
the space that was occupied by the moved tuples becomes reclaimable (i.e.
vacuum could mark it as free if it were to run) progressive clustering
resumes. A new transaction is begun. It looks up tuples from the index
being clustered on in order and reads enough to fill the space just 
freed into RAM. It then writes those tuples into the freed space 
(without ever actually marking it as free, so no other transactions 
could ever write to it), giving them the xid of the writing transaction. 
The old copies are marked as obsoleted by this transaction and the 
transaction commits. Again, it's like an UPDATE, but combined with a 
small selective vacuum that never bothers to update the free space map 
because it instantly reuses the space.


Now a small chunk of the table, at the start, is in order. The process 
can now begin again with the next chunk of unordered pages. It never 
needs to create a huge hole in the table or expand the table massively. 
It can even space the data out if a fillfactor has been set by leaving 
gaps as it writes the replacement data into the freed chunks and 
updating the free space map.


The progressive cluster would also need to be able to run a periodic
vacuum (or do the equivalent internally) with a restriction that
prevented it from examining pages in the range the progressive cluster
was currently trying to free. Otherwise the space taken up by old
versions of rows that're now neatly ordered in the table would never be
freed and couldn't be used as scratch space.


In my ignorance of Pg's innards it seems like all this should work, 
though it'd basically never finish if there were long running 
transactions touching the table being clustered. The other potential 
problem I wonder about is bloating the indexes, since every record in 
the table is being moved twice over the course of the progressive 
cluster operation.


Oh: There's actually only a need for one transaction per step:

begin transaction
move ordered tuples into just-freed hole
move tuples from next block of pages to free space later in table
commit
wait for all transactions that can see the old versions to complete
repeat
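Skeleton of that loop, with no-op stand-ins for the actual work (every
name below is illustrative only -- none of this is a real API, it's just
the control flow):

def move_ordered_tuples_into_hole(step): pass   # fill space freed last time
def move_next_chunk_out(step): pass             # open the next hole
def wait_for_old_snapshots(): pass              # let old row versions die

def progressive_cluster(n_steps):
    for step in range(n_steps):
        # one short transaction per step:
        move_ordered_tuples_into_hole(step)
        move_next_chunk_out(step)
        # commit, then wait before the freed space can be overwritten
        wait_for_old_snapshots()

progressive_cluster(10)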

So ... is this crazy? Concurrently clustering the table by moving each 
record *twice*, in batches, with pauses to allow old versions to cease 
being visible by any live transaction? Or can it actually work?


--
Craig Ringer



Re: [GENERAL] clustering without locking

2008-05-02 Thread Tom Lane
Craig Ringer [EMAIL PROTECTED] writes:
> So ... is this crazy? Concurrently clustering the table by moving each
> record *twice*, in batches, with pauses to allow old versions to cease
> being visible by any live transaction? Or can it actually work?

It seems to me you'd have a pretty horrid bloat problem: at completion,
the table and all its indexes must be at least twice the minimum size,
and it's unlikely that you'd be able to compact them much afterward.
(If any other insertions were in progress they'd have probably ended up
near the end of the enlarged table, so just truncating is unlikely to
work for the table --- and for the indexes it's a lost cause entirely.)

Now admittedly this is no worse than what you get after a full-table
UPDATE, but for a maintenance operation that is supposed to be improving
efficiency, it doesn't seem like a pleasant property.

regards, tom lane



Re: [GENERAL] clustering without locking

2008-05-02 Thread Craig Ringer

Tom Lane wrote:

> Craig Ringer [EMAIL PROTECTED] writes:
>> So ... is this crazy? Concurrently clustering the table by moving each
>> record *twice*, in batches, with pauses to allow old versions to cease
>> being visible by any live transaction? Or can it actually work?
>
> It seems to me you'd have a pretty horrid bloat problem: at completion,
> the table and all its indexes must be at least twice the minimum size,


I was hoping that wouldn't be the case for the table itself, though I
expected the indexes would be pretty messed up and need their own
cleanup process.


I'll try to explain my thinking, as I'm curious about what I've 
misunderstood.


The documentation on VACUUM (well, on pg_freespacemap actually) suggests 
that vacuum can recover space from pages that contain some free space 
but are not wholly free. Is that right?


In this theoretical progressive cluster, tuples are moved in chunks from 
their original locations to free space later in the table (probably 
newly allocated at the end). However, if vacuum can recover partial 
pages shouldn't it be marking some of the scratch space and originally 
allocated space as free (thus permitting its reuse as scratch space) as 
tuples are picked out and moved back to the start of the table in order?


Guesswork:

If the initial order of tuples in the table before clustering starts is
completely random then early on the table would expand by a full
progressive cluster chunk size each step, because the tuples being moved
back to the start would be from all over the place and the space being
freed would be very scattered and hard to reclaim.


Later on, though, less new space would have to be allocated because more 
and more of the space allocated earlier to hold moved tuples would be 
being freed up in useful chunks that could be reused. That'd also permit 
inserts and updates unrelated to the ongoing progressive clustering 
process to be written inside already allocated space rather than being 
appended to the table after all the progressive clustering scratch space.


So, if I understand vacuum's reclaiming correctly then even starting off
with a completely random record order it should expand the table somewhat
for scratch space, but to less than double its original size. How much
less, and how much could be truncated at the end, would depend on how
good vacuum is at finding small holes to shove new/moved tuples into,
and how similar the tuple sizes are. Right?


That assumes that the initial ordering of tuples is in fact random. If 
you're re-clustering a table it's probably far from random, and many of 
the tuples will already be in roughly the right areas. That should 
permit a much smaller allocation of scratch space, since much more of 
the data from the scratch area will be copied back and marked as free 
for reuse (for new unrelated inserts or for more moved tuples) within 
the next few steps of the progressive cluster. Especially if there's a 
non-100% fillfactor it should also be possible to truncate most of the 
newly allocated space at the end, as new inserts can be put in sensible 
places in the table rather than at the end.


Now, even if that's right the indexes will presumably be in an awful
state. I've noticed that PostgreSQL has `CREATE INDEX CONCURRENTLY' but
not `REINDEX CONCURRENTLY'. That's not too surprising, as nobody's
trying to use an index while you're initially creating it. If there's no
way to clean up the indexes after an operation like this then it's
probably not worth it ... so, is there any way to clean up / optimize
an index that doesn't require a long exclusive lock? A REINDEX
CONCURRENTLY equivalent?


--
Craig Ringer



[GENERAL] clustering without locking

2008-05-01 Thread fschmidt

An implementation of clustering without locking would start by comparing the
index to the table from the beginning to find the first mismatch.  Rows
before the mismatch are fine, and can be left alone.  From here on, go
through the index and rewrite each row in order.  This will put the rows at
the end of the table in cluster order.  When done, vacuum the table.  This
will result in a clustered table without any locking needed.  Those few
records that were updated while clustering was happening will be out of
order, but that should only be a few.
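A quick Python toy of the idea, for concreteness (digits stand in for
rows, and rewriting a row behaves like an UPDATE: the old version dies
in place, shown as `.', and the new version is appended at the end):

def cluster_without_locking(heap, index_order):
    # walk from the front until physical order diverges from index order
    mismatch = 0
    while mismatch < len(heap) and heap[mismatch] == index_order[mismatch]:
        mismatch += 1
    # rewrite every remaining row, in index order
    for tup in index_order[mismatch:]:
        heap[heap.index(tup)] = '.'    # old version becomes dead
        heap.append(tup)               # new version at the end of the heap
    return heap

heap = list('13245')
print(''.join(cluster_without_locking(heap, list('12345'))))
# -> 1....2345 : the live rows are in order, but note the hole of dead
# space left in the middle -- see the replies above about whether a
# plain vacuum can do anything useful with it.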

So, could this work?  I could really use clustering without locking.


