Re: [GENERAL] new index type with clustering in mind.
Currently, one issue you're going to face is that brin doesn't rescan a range to find the tighest possible summary tuple. That's going to be an issue I think, thanks for mentioning it. We'd need some sort of mechanism for achieving this without a complete REINDEX, even if it only reset the min/max when all the blocks in the range are entirely cleared out. Ah well :) Another issue is how to find the best possible ordering. For minmax opclasses it's easy, but for other opclass designs it's less clear what to do. Even for minmax you need to find some way to communicate to the system what's the order to follow ... Do you mean the ordering for the clustered table tuples or the ordering of index tuples in the BRIN index? I'm the former because I'm also assuming you always scan an entire BRIN index as there isn't a trivial way of optimizing the index scan for ranges (unless you 'granulate' the ranges along the lines of this: http://dba.stackexchange.com/a/22295/1396)? If you mean the clustering order, for the use cases I'm concerned with it isn't important - as long as tuples with the same cluster key gravitate towards the same blocks, it often doesn't matter what order those blocks are in because the main mission is to reduce the number of blocks scanned. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
in 9.4, GIN indexes are pretty close to this already Do I understand correctly that BRIN indexes will be even closer to this? Kindest regards Jack -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: 24 May 2014 22:46 To: Martijn van Oosterhout Cc: Jack Douglas; pgsql-general@postgresql.org Subject: Re: [GENERAL] new index type with clustering in mind. Martijn van Oosterhout klep...@svana.org writes: On Sat, May 24, 2014 at 05:58:37PM +0100, Jack Douglas wrote: Would the following be practical to implement: A btree-like index type that points to *pages* rather than individual rows. It's an interesting idea, but, how can you *ever* delete index entries? I.e. is there a way to maintain the index without rebuilding it regularly? The discussions at PGCon pointed out that with the posting-list compression logic added in 9.4, GIN indexes are pretty close to this already. Multiple items on the same heap page will typically only take one byte of index space per item; but there is an identifiable entry, so you don't get into these questions of when VACUUM should remove entries, and it's not lossy so you're not forced to pay the overhead of rechecking every entry on the linked-to page. Not to say that 9.4 GIN is necessarily the last word on the subject, but it would be worth testing it out before deciding that we need something better. (beta1 is out. It needs testing. Hint hint.) regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
Jack Douglas wrote: in 9.4, GIN indexes are pretty close to this already Do I understand correctly that BRIN indexes will be even closer to this? Yeah, in a way. You could say they are closer from the opposite end. There is one index tuple in a BRIN index for each page range (contiguous set of pages); each index tuple contains a summary of what in that page range. There are no exact entries. If the values are randomly scattered, the index is useless; all page ranges will have to be scanned for possibly matching tuples. If the values are perfectly clustered, the index is optimal because you scan the minimal set of pages. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
If the values are perfectly clustered, the index is optimal because you scan the minimal set of pages. That's the bit I'm particularly interested in, as my plan would be to keep the pages well clustered: http://dba.stackexchange.com/a/66293/1396 Do you see any blocker preventing BRIN being used for a continuous background re-clustering job (in parallel with or as part of vacuum), similar to the mechanism I experimented with before? If not is this something there might be support for adding to the TODO list? Kindest regards Jack -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
Jack Douglas wrote: If the values are perfectly clustered, the index is optimal because you scan the minimal set of pages. That's the bit I'm particularly interested in, as my plan would be to keep the pages well clustered: http://dba.stackexchange.com/a/66293/1396 Do you see any blocker preventing BRIN being used for a continuous background re-clustering job (in parallel with or as part of vacuum), similar to the mechanism I experimented with before? If not is this something there might be support for adding to the TODO list? In principle, CLUSTER sucks, and having data clustered is clearly good, so improvements in this area are certainly welcome. If you were to propose some general mechanism that works for any index, I don't see that we would reject having it work specifically for BRIN; having it for BRIN only would be strange. (I guess it's good enough if it works for btree and BRIN. Not sure about GiST and SP-GiST; GIN clearly is of no use here.) Currently, one issue you're going to face is that brin doesn't rescan a range to find the tighest possible summary tuple. Thinking in min/max terms, once a tuple with a very high or very low is inserted, the range doesn't get any smaller once that tuple is deleted from the range. You would need to find a way to fix that. (The easiest way is to REINDEX the whole thing, of course, but that processes the whole table and not just some portion of it.) Another issue is how to find the best possible ordering. For minmax opclasses it's easy, but for other opclass designs it's less clear what to do. Even for minmax you need to find some way to communicate to the system what's the order to follow ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
To reduce complexity (eg MVCC/snapshot related issues), index entries would be added when a row is inserted, but they would not be removed when the row is updated/deleted (or when an insert is rolled back): It's an interesting idea, but, how can you *ever* delete index entries? I.e. is there a way to maintain the index without rebuilding it regularly? The discussions at PGCon pointed out that with the posting-list compression logic added in 9.4, GIN indexes are pretty close to this already. Multiple items on the same heap page will typically only take one byte of index space per item; but there is an identifiable entry, so you don't get into these questions of when VACUUM should remove entries, and it's not lossy so you're not forced to pay the overhead of rechecking every entry on the linked-to page. The README file in the source code for GIN indexes says: Note: There is no delete operation in the key (entry) tree. The reason for this is that in our experience, the set of distinct words in a large corpus changes very slowly. This greatly simplifies the code and concurrency algorithms.. Does that mean that for the case where GIN is used as a simple replacement for btree, my initial suggestion above (...would be added when a row is inserted, but they would not be removed...) is effectively what happens already with GIN indexes? --- I've written up a test to demonstrate the principle of this 'clustering lite': http://dba.stackexchange.com/q/66292/1396 In brief, it shows the benefit of this sort of clustering (much lower io versus unclustered, and no exclusive lock or heavy WAL generation versus current clustering implementations) with a workable way of achieving it in the current release. The basic idea is: 1. turn off autovacuum for the table 2. check each block to determine the degree of clustering 3. delete and re-insert all the rows from blocks below a clustering threshold 4. manually vacuum to free those (complete) blocks 5. repeat steps 2-4 as regularly as necessary A weakness is that is requires a full table scan is required. All that is needed to avoid the full-scan would be a way of getting `blkno` from an index-only scan, which as far as I can tell is not currently possible, is it? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
The discussions at PGCon pointed out that with the posting-list compression logic added in 9.4, GIN indexes are pretty close to this already. Multiple items on the same heap page will typically only take one byte of index space per item; but there is an identifiable entry, so you don't get into these questions of when VACUUM should remove entries, and it's not lossy so you're not forced to pay the overhead of rechecking every entry on the linked-to page. Not to say that 9.4 GIN is necessarily the last word on the subject, but it would be worth testing it out before deciding that we need something better. (beta1 is out. It needs testing. Hint hint.) Hint taken, and first impressions are positive: the compression is very efficient for the kind of scenario I'm imagining where the key is deliberately chosen so that the average page has one distinct key. I have a 25Mb gin 'cluster' index on a table where an equivalent regular btree index is 10 times as large. So the questions are, a) is this kind of clustering broadly useful (ie not just to me), b) how much effort will it be to implement a 'vacuum-like' operation that scans a designated index and performs the relevant delete/inserts to achieve this kind of clustering? And c) if it is broadly useful and not a major implementation mountain to climb, is it something that might be added to the todo list? If someone can tell me how to decode a `ctid` into a page number (discarding the row number portion - is there a better way than ` (replace(replace(ctid::text,'(','{'),')','}')::integer[])[1]`), I should be able to show some analysis demonstrating this working, albeit inefficiently as I'll have to scan the table itself for the page/key statistics. Would that sort of analysis be helpful? Kindest regards Jack PS It occurs to me that the btree_gin documentation page for 9.4, http://www.postgresql.org/docs/9.4/static/btree-gin.html, might benefit from including some mention of index compression when discussing the relative performance of regular and gin btree indexes. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
On Sat, May 24, 2014 at 05:58:37PM +0100, Jack Douglas wrote: Would the following be practical to implement: A btree-like index type that points to *pages* rather than individual rows. Ie if there are many rows in a page with the same data (in the indexed columns), only one index entry will exist. In its normal use case, this index would be much smaller than a regular index on the same columns which would contain one entry for each individual row. To reduce complexity (eg MVCC/snapshot related issues), index entries would be added when a row is inserted, but they would not be removed when the row is updated/deleted (or when an insert is rolled back): this would cause index bloat over time in volatile tables but this would be acceptable for the use case I have in mind. So in essence, an entry in the index would indicate that there *may* be matching rows in the page, not that there actually are. It's an interesting idea, but, how can you *ever* delete index entries? I.e. is there a way to maintain the index without rebuilding it regularly? Maybe there's something you could do with tracking all the entries that point to one page or something, or a counter. Because really, the fact that the item pointer in a btree index includes the item number is only really needed for deletion. Postgres always has to read in the whole page anyway, so if you can find a way around that it might be an interesting improvement. Mind you, hash indexes could get this almost free, except they're not crash safe. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
Re: [GENERAL] new index type with clustering in mind.
Martijn van Oosterhout klep...@svana.org writes: On Sat, May 24, 2014 at 05:58:37PM +0100, Jack Douglas wrote: Would the following be practical to implement: A btree-like index type that points to *pages* rather than individual rows. It's an interesting idea, but, how can you *ever* delete index entries? I.e. is there a way to maintain the index without rebuilding it regularly? The discussions at PGCon pointed out that with the posting-list compression logic added in 9.4, GIN indexes are pretty close to this already. Multiple items on the same heap page will typically only take one byte of index space per item; but there is an identifiable entry, so you don't get into these questions of when VACUUM should remove entries, and it's not lossy so you're not forced to pay the overhead of rechecking every entry on the linked-to page. Not to say that 9.4 GIN is necessarily the last word on the subject, but it would be worth testing it out before deciding that we need something better. (beta1 is out. It needs testing. Hint hint.) regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general