Re: [GENERAL] new index type with clustering in mind.
Currently, one issue you're going to face is that brin doesn't rescan a range to find the tighest possible summary tuple. That's going to be an issue I think, thanks for mentioning it. We'd need some sort of mechanism for achieving this without a complete REINDEX, even if it only reset the min/max when all the blocks in the range are entirely cleared out. Ah well :) Another issue is how to find the best possible ordering. For minmax opclasses it's easy, but for other opclass designs it's less clear what to do. Even for minmax you need to find some way to communicate to the system what's the order to follow ... Do you mean the ordering for the clustered table tuples or the ordering of index tuples in the BRIN index? I'm the former because I'm also assuming you always scan an entire BRIN index as there isn't a trivial way of optimizing the index scan for ranges (unless you 'granulate' the ranges along the lines of this: http://dba.stackexchange.com/a/22295/1396)? If you mean the clustering order, for the use cases I'm concerned with it isn't important - as long as tuples with the same cluster key gravitate towards the same blocks, it often doesn't matter what order those blocks are in because the main mission is to reduce the number of blocks scanned. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
in 9.4, GIN indexes are pretty close to this already Do I understand correctly that BRIN indexes will be even closer to this? Kindest regards Jack -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: 24 May 2014 22:46 To: Martijn van Oosterhout Cc: Jack Douglas; pgsql-general@postgresql.org Subject: Re: [GENERAL] new index type with clustering in mind. Martijn van Oosterhout klep...@svana.org writes: On Sat, May 24, 2014 at 05:58:37PM +0100, Jack Douglas wrote: Would the following be practical to implement: A btree-like index type that points to *pages* rather than individual rows. It's an interesting idea, but, how can you *ever* delete index entries? I.e. is there a way to maintain the index without rebuilding it regularly? The discussions at PGCon pointed out that with the posting-list compression logic added in 9.4, GIN indexes are pretty close to this already. Multiple items on the same heap page will typically only take one byte of index space per item; but there is an identifiable entry, so you don't get into these questions of when VACUUM should remove entries, and it's not lossy so you're not forced to pay the overhead of rechecking every entry on the linked-to page. Not to say that 9.4 GIN is necessarily the last word on the subject, but it would be worth testing it out before deciding that we need something better. (beta1 is out. It needs testing. Hint hint.) regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
If the values are perfectly clustered, the index is optimal because you scan the minimal set of pages. That's the bit I'm particularly interested in, as my plan would be to keep the pages well clustered: http://dba.stackexchange.com/a/66293/1396 Do you see any blocker preventing BRIN being used for a continuous background re-clustering job (in parallel with or as part of vacuum), similar to the mechanism I experimented with before? If not is this something there might be support for adding to the TODO list? Kindest regards Jack -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
To reduce complexity (eg MVCC/snapshot related issues), index entries would be added when a row is inserted, but they would not be removed when the row is updated/deleted (or when an insert is rolled back): It's an interesting idea, but, how can you *ever* delete index entries? I.e. is there a way to maintain the index without rebuilding it regularly? The discussions at PGCon pointed out that with the posting-list compression logic added in 9.4, GIN indexes are pretty close to this already. Multiple items on the same heap page will typically only take one byte of index space per item; but there is an identifiable entry, so you don't get into these questions of when VACUUM should remove entries, and it's not lossy so you're not forced to pay the overhead of rechecking every entry on the linked-to page. The README file in the source code for GIN indexes says: Note: There is no delete operation in the key (entry) tree. The reason for this is that in our experience, the set of distinct words in a large corpus changes very slowly. This greatly simplifies the code and concurrency algorithms.. Does that mean that for the case where GIN is used as a simple replacement for btree, my initial suggestion above (...would be added when a row is inserted, but they would not be removed...) is effectively what happens already with GIN indexes? --- I've written up a test to demonstrate the principle of this 'clustering lite': http://dba.stackexchange.com/q/66292/1396 In brief, it shows the benefit of this sort of clustering (much lower io versus unclustered, and no exclusive lock or heavy WAL generation versus current clustering implementations) with a workable way of achieving it in the current release. The basic idea is: 1. turn off autovacuum for the table 2. check each block to determine the degree of clustering 3. delete and re-insert all the rows from blocks below a clustering threshold 4. manually vacuum to free those (complete) blocks 5. repeat steps 2-4 as regularly as necessary A weakness is that is requires a full table scan is required. All that is needed to avoid the full-scan would be a way of getting `blkno` from an index-only scan, which as far as I can tell is not currently possible, is it? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] new index type with clustering in mind.
The discussions at PGCon pointed out that with the posting-list compression logic added in 9.4, GIN indexes are pretty close to this already. Multiple items on the same heap page will typically only take one byte of index space per item; but there is an identifiable entry, so you don't get into these questions of when VACUUM should remove entries, and it's not lossy so you're not forced to pay the overhead of rechecking every entry on the linked-to page. Not to say that 9.4 GIN is necessarily the last word on the subject, but it would be worth testing it out before deciding that we need something better. (beta1 is out. It needs testing. Hint hint.) Hint taken, and first impressions are positive: the compression is very efficient for the kind of scenario I'm imagining where the key is deliberately chosen so that the average page has one distinct key. I have a 25Mb gin 'cluster' index on a table where an equivalent regular btree index is 10 times as large. So the questions are, a) is this kind of clustering broadly useful (ie not just to me), b) how much effort will it be to implement a 'vacuum-like' operation that scans a designated index and performs the relevant delete/inserts to achieve this kind of clustering? And c) if it is broadly useful and not a major implementation mountain to climb, is it something that might be added to the todo list? If someone can tell me how to decode a `ctid` into a page number (discarding the row number portion - is there a better way than ` (replace(replace(ctid::text,'(','{'),')','}')::integer[])[1]`), I should be able to show some analysis demonstrating this working, albeit inefficiently as I'll have to scan the table itself for the page/key statistics. Would that sort of analysis be helpful? Kindest regards Jack PS It occurs to me that the btree_gin documentation page for 9.4, http://www.postgresql.org/docs/9.4/static/btree-gin.html, might benefit from including some mention of index compression when discussing the relative performance of regular and gin btree indexes. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Data Checksum feature and streaming replication
I posted this question on dba.se (http://dba.stackexchange.com/q/65821/1396) and was advised to try asking here. If I'm using the Data Checksum feature (new to 9.3: https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.3#Data_Checksu ms), and in the event of a checksum failure on a replication master, will that corrupt data replicate to slaves or will replication halt. Does it depend on the setting of ignore_checksum_failure? Thanks Jack
[GENERAL] new index type with clustering in mind.
Hi Please forgive this question if it is naïve. Would the following be practical to implement: A btree-like index type that points to *pages* rather than individual rows. Ie if there are many rows in a page with the same data (in the indexed columns), only one index entry will exist. In its normal use case, this index would be much smaller than a regular index on the same columns which would contain one entry for each individual row. To reduce complexity (eg MVCC/snapshot related issues), index entries would be added when a row is inserted, but they would not be removed when the row is updated/deleted (or when an insert is rolled back): this would cause index bloat over time in volatile tables but this would be acceptable for the use case I have in mind. So in essence, an entry in the index would indicate that there *may* be matching rows in the page, not that there actually are. For data where there is a high degree of clustering on the columns of the index, this index would be quite efficient at locating matching rows: the index itself would be small and most pages would contain largely rows that match. (In practice it would behave similar to a cluster index in Oracle, without the guarantee of clustering). This brings me on to my use case: a method to efficiently keep a table largely clustered (with clustered more loosely defined than the usual Postgres definition) without locking more than a page at a time. The above index could be scanned periodically for entries where multiple keys point to the same page. Of that set of keys/pages, the subset where the same key appears in multiple pages can be clustered by simply deleting and reinserting the rows, key by key. In this case, the pages affected by the deletes could be locked briefly in order to safely delete the corresponding index entries without conflict with other transactions. If a lock cant be obtained the process could simply skip and move on to the next candidate key/pages. Although this kind of clustered data is less ordered that a normal Postgres cluster, it retains some of the useful properties: particularly that data for a particular key is likely to be found in a small number of pages rather than scattered evenly through the entire table, requiring potentially far more io to select. Many thanks in advance and my apologies once again if this kind of thing has been rejected before or is obviously impractical. Jack
[GENERAL] returning clause and source columns
Hi Related to this post: http://dba.stackexchange.com/q/50693/1396 Would it be a major change to allow the `returning` clause of `insert` to return values that are *not* inserted? It seems you can already do so for literals (and perhaps that should be made explicit in the docs here: http://www.postgresql.org/docs/9.3/static/sql-insert.html#VARIABLELIST). For example: insert into t1(id) select id from t2 returning *, t2.foo; Kindest regards Jack
[GENERAL] interval hour to minute or interval day to minute
Hi I discovered the 'fields' option of 'interval', but i can't figure out from the docs how it is supposed to work. Are hour to minute and day to minute really the same thing? And if not, in what circumstances are they treated differently? psql (8.4.7) Type help for help. postgres= select '3 years 2 months - 1 day + 10 hours 5 minutes'::interval day to minute; interval -- 3 years 2 mons -1 days +10:05:00 (1 row) postgres= select '3 years 2 months - 1 day + 10 hours 5 minutes'::interval hour to minute; interval -- 3 years 2 mons -1 days +10:05:00 (1 row) Warm regards Jack Douglas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general