Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-29 Thread PFC

On Sun, 27 May 2007 19:34:30 +0200, PFC [EMAIL PROTECTED] wrote:

On Sun, 27 May 2007 17:53:38 +0200, Jim C. Nasby [EMAIL PROTECTED] wrote:

On Tue, May 22, 2007 at 09:29:00AM +0200, PFC wrote:

This does not run a complete sort on the table. It would be about as fast as your seq scan disk throughput. Obviously, the end result is not as good as a real CLUSTER, since the table will be made up of several ordered chunks; a range lookup on the clustered columns would therefore need at most N seeks, versus 1 for a really clustered table. But it only scans the table once and writes it once, even counting index rebuild.


Do you have any data that indicates such an arrangement would be
substantially better than less-clustered data?


While the little benchmark that will answer your question is running, I'll add a few comments:


Alright, so far:

This is a simulated forum workload, so it's mostly post insertions, some edits, and some topic deletes. It will give results applicable to forums, obviously, but also to anything that works on the same schema:

- topics + posts
- blog articles + comments
- e-commerce sites where users can post their reviews

Since the trend is to let users participate, this kind of workload will become more and more relevant for websites.


So, how do we cluster the posts table on (topic_id, post_id), so that all the posts on the same web page can be fetched in one seek?


I am benchmarking the following:
- CLUSTER, obviously
- creating a new table with INSERT .. SELECT ORDER BY topic_id, post_id, then reindexing etc. (this and plain CLUSTER are sketched just below)
- not doing anything (just vacuuming all tables)
- not even vacuuming the posts table
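
Concretely, the two serious contenders look like this (table and index names are made up for the example; the CLUSTER syntax shown is the 8.2-era form):

    CLUSTER posts_topic_post_idx ON posts;            -- plain CLUSTER

    -- versus the manual rewrite:
    CREATE TABLE posts_sorted AS
        SELECT * FROM posts ORDER BY topic_id, post_id;
    -- ... then recreate the indexes on posts_sorted and rename it over posts.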

I am also trying the following, more exotic, approaches:

* chunked sort:

Well, sorting 1GB of data when your work_mem is only 512 MB needs several passes, hence a lot of disk IO. The more data, the more IO.

So, instead of doing this, I will:
- grab about 250 MB of posts from the table
- sort them by (topic_id, post_id)
- insert them into a new table
- repeat
- then reindex etc. and replace the old table with the new one
(reindexing is very fast: since the table is now nicely defragmented, I get full disk speed. However, I would like to be able to create 2 indexes with ONE table scan!)

I'm trying 2 different ways to do that, with plpgsql and cursors.
It is much faster than sorting the whole data set, because the sorts are done entirely in memory (hence the chunks).
So far, it seems a database clustered this way is about as fast as one clustered with CLUSTER, but the clustering operation itself is faster.
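
For the record, a minimal sketch of the chunking idea in plpgsql — this is not my actual cursor-based code; it chunks on post_id ranges and assumes roughly 1M consecutive post_ids weigh about 250 MB, with posts_sorted pre-created with the same structure as posts:

    CREATE OR REPLACE FUNCTION chunked_cluster(ids_per_chunk integer, n_chunks integer)
    RETURNS void AS $$
    DECLARE
        i integer;
    BEGIN
        FOR i IN 0 .. n_chunks - 1 LOOP
            INSERT INTO posts_sorted
                SELECT * FROM posts
                 WHERE post_id >  i * ids_per_chunk
                   AND post_id <= (i + 1) * ids_per_chunk
                 ORDER BY topic_id, post_id;   -- each chunk sorts in work_mem
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;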

More results in about 3 days when the benchmarks finish.

* other dumb stuff

I'll try DELETEing the last 250 MB of records, stuffing them into a temp table, vacuuming, and re-inserting them in order.
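
Something like this (the post_id cutoff standing in for "the last 250 MB" is made up):

    CREATE TEMP TABLE tail AS
        SELECT * FROM posts WHERE post_id > 9000000;
    DELETE FROM posts WHERE post_id > 9000000;
    VACUUM posts;                                  -- reclaim the freed space
    INSERT INTO posts
        SELECT * FROM tail ORDER BY topic_id, post_id;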







Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-29 Thread Jim Nasby

On May 27, 2007, at 12:34 PM, PFC wrote:

On Sun, 27 May 2007 17:53:38 +0200, Jim C. Nasby [EMAIL PROTECTED] wrote:

On Tue, May 22, 2007 at 09:29:00AM +0200, PFC wrote:

This does not run a complete sort on the table. It would be about as fast as your seq scan disk throughput. Obviously, the end result is not as good as a real CLUSTER, since the table will be made up of several ordered chunks; a range lookup on the clustered columns would therefore need at most N seeks, versus 1 for a really clustered table. But it only scans the table once and writes it once, even counting index rebuild.


Do you have any data that indicates such an arrangement would be
substantially better than less-clustered data?

While the little benchmark that will answer your question is running, I'll add a few comments:


I have been creating a new benchmark for PostgreSQL and MySQL, that I will call the Forum Benchmark. It mimics the activity of a forum.
So far, I have got interesting results about Postgres and InnoDB, and will publish an extensive report with lots of nasty stuff in it in, say, 2 weeks, since I'm doing this in my spare time.

Anyway, forums like clustered tables, specifically clustering posts on (topic_id, post_id), in order to be able to display a page with one disk seek instead of one seek per post.
PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster since I run it on a dual core; InnoDB uses only one core). However, InnoDB can automatically cluster tables without maintenance. This means InnoDB will, even though it sucks and is awfully bloated, run a lot faster than Postgres if things become IO-bound, i.e. if the dataset is larger than RAM.
Postgres needs to cluster the posts table in order to keep going. CLUSTER is very slow. I tried inserting into a new posts table, ordering by (post_id, topic_id), then renaming the new table in place of the old. It is faster, but still slow when handling lots of data.
I am trying other approaches, some quite hack-ish, and will report my findings.


I assume you meant topic_id, post_id. :)

The problem with your proposal is that it does nothing to ensure that posts for a topic stay together once the table is large enough that you can't sort it in a single pass. If you've got a long-running thread, it's still going to get spread throughout the table.


What you really want is CLUSTER CONCURRENTLY, which I believe is on the TODO list. BUT... there's another caveat here: for any post where the row ends up being larger than 2k, the text is going to get TOASTed anyway, which means it's going to be in a separate table, in a different ordering. I don't know of a good way to address that; you can cluster the toast table, but you'll be clustering on an OID, which isn't going to help you.
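
(For anyone who wants to look: the TOAST table backing a given table can be located like this, "posts" being the hypothetical table from this thread. Per the caveat above, its contents keep their own ordering no matter how the main table is clustered.)

    SELECT reltoastrelid::regclass AS toast_table
      FROM pg_class
     WHERE relname = 'posts';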

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-29 Thread PFC



I assume you meant topic_id, post_id. :)


Um, yes ;)

The problem with your proposal is that it does nothing to ensure that posts for a topic stay together once the table is large enough that you can't sort it in a single pass. If you've got a long-running thread, it's still going to get spread throughout the table.


I completely agree with you.
However, you have to consider the use cases for clustered tables.

Suppose you want to cluster on (A,B). A can be a topic, category, month, store, whatever; B can be post_id, product_id, etc.


Now consider these cases:

1- A fully clustered table in a defragmented file, with the TOAST table in the same order as the main table.
CLUSTER does not do the second part, but [INSERT INTO new_table SELECT * FROM old_table ORDER BY a,b] also fills the TOAST table in the right order.

2- A totally unclustered table.

3- InnoDB, which is an index-organized table, i.e. a BTree with the data in the leaves; this means clustering is automatic.

4- My partial cluster proposal, i.e. the table and its TOAST table have been clustered in chunks of, say, 500 MB.


* You always want to get ALL records with a specific value of A.

In this case, a fully clustered table will obviously be the best choice: 1 seek, then a seq scan; Bitmap Index Scan rules.

You might think that the InnoDB case would perform the same. However, after some time the table and the files on disk will be very fragmented, and btree pages with the same value of A will be everywhere, as they have been split and joined by insertions and deletions, so your theoretical sequential scan might well translate into 1 seek per page. Obviously, since all the rows in a page will be of interest to you, this is still better than 1 seek per row, but you get the idea.


* You want the records with a specific value of A, and B inside a range.

Examples:
- topic_id = X AND post_id BETWEEN start_of_page AND end_of_page (see the sketch below)
- get the sales records for January, grouped by store
- get the first 10 comments of a blog post
- etc.

I would bet that this use case happens much more often than the previous one.
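
In SQL, the first example is the classic "one forum page" lookup (schema and numbers assumed):

    CREATE INDEX posts_topic_post_idx ON posts (topic_id, post_id);

    SELECT *
      FROM posts
     WHERE topic_id = 1234                 -- the topic being displayed
       AND post_id BETWEEN 50 AND 75       -- the posts on this page
     ORDER BY post_id;

On a table clustered on (topic_id, post_id), those rows sit in a few adjacent pages, so the whole page costs roughly one seek.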


In this case, a fully clustered table will obviously, again, be the best choice.
A randomly organized table will cost 1 seek per row, i.e. buy more RAM or get fired.
InnoDB will not do that badly: the number of seeks will be (number of rows wanted) / (rows per page).


However, how would the chunked clustered case work?

In the worst case, if the table has been sorted in N chunks, you'll need N seeks.
However, since people generally cluster their stuff on some temporally related column (posts and comments are sorted by insertion time), you can safely bet that most of the rows you want will end up in the same chunk, or maybe in 2-3 chunks, reducing the number of seeks to something between 1 and the number of chunks.


The fact is, sometimes people don't use CLUSTER when it would really help, because it takes too long.


Suppose you have a 3GB table and your work_mem is 512MB.

CLUSTER will take forever.
A hypothetical new implementation of CLUSTER which did an on-disk sort would create several sort bins, then combine them.
A chunked cluster like I'm proposing would be several times faster, since it would operate at roughly the raw disk IO speed (Postgres sorting is so fast when it fits in RAM...).
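
You can watch the spill happen for yourself (settings and table are illustrative; the "Sort Method" line only shows up in EXPLAIN ANALYZE output from 8.3 on):

    SET work_mem = '512MB';
    EXPLAIN ANALYZE SELECT * FROM posts ORDER BY topic_id, post_id;
    -- ... Sort Method: external merge  Disk: ...kB   <- multi-pass, heavy IO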


So, having both a full and a chunked cluster would allow users to run the full cluster maybe once a month, and the chunked cluster maybe once every few days.
And the chunked CLUSTER would find most of the rows already in order, so its end result would be very close to a full CLUSTER, at a fraction of the runtime.


An added bonus is for index rebuild.
Instead of first clustering the table and then rebuilding the indexes, this could be done:

- initialize sort buffers for the index builds
- loop until all data is processed:
  - grab 500 MB of data from the table
  - sort it
  - insert it into the new table
  - while the data is still in RAM, extract the indexed columns and shove them into each index's sort buffer
- now all the indexed columns are ready to be used without rescanning the table! Index rebuild will be much faster (see the contrast below).
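
For contrast, today each index build after the rewrite is its own full scan of the new table (names assumed; the proposal would feed both index sorts from the single rewrite pass instead):

    CREATE INDEX posts_topic_post_idx ON posts_sorted (topic_id, post_id);  -- scan 1
    CREATE INDEX posts_author_idx     ON posts_sorted (author_id);          -- scan 2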


I see VACUUM FULL is scheduled for a reimplementation in a future version.
This could be the way: with the same code path, this could do VACUUM FULL, CLUSTER, and chunked CLUSTER, just by changing if/how the sort step is done, giving the user a nice performance / maintenance-time tradeoff.
Getting this in maybe 8.5 is better than CLUSTER CONCURRENTLY in 8.10, I would dare say.

And the original table can still be read from.

Benchmarks are running. I will back this with figures in a few days.

Another thing to do for CLUSTER would 

Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-27 Thread Jim C. Nasby
On Tue, May 22, 2007 at 09:29:00AM +0200, PFC wrote:

This does not run a complete sort on the table. It would be about as fast as your seq scan disk throughput. Obviously, the end result is not as good as a real CLUSTER, since the table will be made up of several ordered chunks; a range lookup on the clustered columns would therefore need at most N seeks, versus 1 for a really clustered table. But it only scans the table once and writes it once, even counting index rebuild.

Do you have any data that indicates such an arrangement would be
substantially better than less-clustered data?
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-27 Thread PFC
On Sun, 27 May 2007 17:53:38 +0200, Jim C. Nasby [EMAIL PROTECTED] wrote:

On Tue, May 22, 2007 at 09:29:00AM +0200, PFC wrote:

This does not run a complete sort on the table. It would be about as fast as your seq scan disk throughput. Obviously, the end result is not as good as a real CLUSTER, since the table will be made up of several ordered chunks; a range lookup on the clustered columns would therefore need at most N seeks, versus 1 for a really clustered table. But it only scans the table once and writes it once, even counting index rebuild.


Do you have any data that indicates such an arrangement would be
substantially better than less-clustered data?


While the little benchmark that will answer your question is running, I'll add a few comments:


I have been creating a new benchmark for PostgreSQL and MySQL, that I will call the Forum Benchmark. It mimics the activity of a forum.
So far, I have got interesting results about Postgres and InnoDB, and will publish an extensive report with lots of nasty stuff in it in, say, 2 weeks, since I'm doing this in my spare time.

Anyway, forums like clustered tables, specifically clustering posts on (topic_id, post_id), in order to be able to display a page with one disk seek instead of one seek per post.
PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster since I run it on a dual core; InnoDB uses only one core). However, InnoDB can automatically cluster tables without maintenance. This means InnoDB will, even though it sucks and is awfully bloated, run a lot faster than Postgres if things become IO-bound, i.e. if the dataset is larger than RAM.
Postgres needs to cluster the posts table in order to keep going. CLUSTER is very slow. I tried inserting into a new posts table, ordering by (post_id, topic_id), then renaming the new table in place of the old. It is faster, but still slow when handling lots of data.
I am trying other approaches, some quite hack-ish, and will report my findings.


Regards






Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-27 Thread Alexander Staubo

On 5/27/07, PFC [EMAIL PROTECTED] wrote:

PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster since I run it on a dual core; InnoDB uses only one core). However, InnoDB can automatically cluster tables without maintenance.


How does it know what to cluster by? Does it gather statistics about
query patterns on which it can decide an optimal clustering, or does
it merely follow a clustering previously set up by the user?

Alexander.



Re: [PERFORM] Feature suggestion : FAST CLUSTER

2007-05-27 Thread PFC




How does it know what to cluster by? Does it gather statistics about
query patterns on which it can decide an optimal clustering, or does
it merely follow a clustering previously set up by the user?


Nothing fancy: InnoDB ALWAYS clusters on the primary key, whatever it is. So, if you can hack your stuff into having a primary key that clusters nicely, good for you. If not, well...
So, I used (topic_id, post_id) as the PK, even though it isn't the real PK (that should be post_id alone)...
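
In DDL terms (illustrative MySQL-side schema): InnoDB stores rows in primary-key order, so this PK clusters each topic's posts together "for free":

    CREATE TABLE posts (
        topic_id INT NOT NULL,
        post_id  INT NOT NULL,
        body     TEXT,
        PRIMARY KEY (topic_id, post_id)  -- not the "real" key, but it clusters nicely
    ) ENGINE=InnoDB;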




[PERFORM] Feature suggestion : FAST CLUSTER

2007-05-22 Thread PFC



Well, CLUSTER is so slow (and it doesn't cluster the TOAST tables associated with the table being clustered).

However, when people use CLUSTER, they use it to speed up their queries.
For that, the table does not need to be perfectly in order.

So, here is a new idea for CLUSTER:

- choose a chunk size (about 50% of your RAM)
- set up disk sorts for all indexes
- seq scan the table:
  - take a chunk of chunk_size
  - sort it (in memory)
  - write it into the new table file
  - while we have the data on hand, also send the indexed-column data into the corresponding disk sorts
- finish the index disk sorts and rebuild the indexes

This does not run a complete sort on the table. It would be about as fast as your seq scan disk throughput. Obviously, the end result is not as good as a real CLUSTER, since the table will be made up of several ordered chunks; a range lookup on the clustered columns would therefore need at most N seeks, versus 1 for a really clustered table. But it only scans the table once and writes it once, even counting index rebuild.


I would think that, with this approach, if people can CLUSTER a large table in 5 minutes instead of hours, they will use it instead of not using it. Therefore, even if the resulting table is not as optimal as a fully clustered one, it will still be much better than the non-clustered case.




