Re: [GENERAL] BRIN indexes and ORDER BY
On Wed, Oct 5, 2016 at 3:27 PM, Stephen Frost wrote: > Darren, > > * Darren Lafreniere (dlafreni...@onezero.com) wrote: >> Tom Lane wrote: >> > > Gavin Wahl wrote: >> > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You >> > >> just find the page range with the largest/smallest value, and then only >> > >> scan that one. Would that be hard to implement? I'm interested in >> > working >> > >> on it if someone can give me some pointers. >> > >> > I think this proposal is fairly broken anyway. The page range with the >> > largest max-value may once have contained the largest live row, but >> > there's no guarantee that it still does. It might even be completely >> > empty. You could imagine an algorithm like this: >> > >> > 1. Find page-range with largest max. Scan it to identify live row with >> > largest value. If *no* live values, find page-range with next largest >> > max, repeat until no page ranges remain (whereupon return NULL). >> > >> > 2. For each remaining page-range whose indexed max exceeds the value >> > currently in hand, scan that page-range to see if any value exceeds >> > the one in hand, replacing the value if so. >> > >> > This'd probably allow you to omit scanning some of the page-ranges >> > in the table, but in a lot of cases you'd end up scanning many of them; >> > and you'd need a lot of working state to remember which ranges you'd >> > already looked at. It'd certainly always be a lot more expensive than >> > answering the same question with a btree index, because in no case do >> > you get to avoid scanning the entire contents of the index. > [...] >> A b-tree index would certainly be faster for ordering. But in scenarios >> where you have huge datasets that can't afford the space or update time >> required for b-tree, could such a BRIN-accelerated ordering algorithm at >> least be faster than ordering with no index? > > For at least some of the common BRIN use-cases, where the rows are > inserted in-order and never/very-rarely modified or deleted, this > approach would work very well. > > Certainly, using this would be much cheaper than a seqscan/top-N sort, > for small values of 'N', relative to the number of rows in the table, > in those cases. > > In general, I like the idea of supporting this as BRIN indexes strike me > as very good for very large tables which have highly clumped data in > them and being able to do a top-N query on those can be very useful at > times. Yeah. If the brin average page overlap and % dead tuple coefficients are low it absolutely makes sense to drive top N with brin. It will never beat a btree but typically brin is used when the btree index is no good for various reasons. brin indexes are pretty neat; they can provide stupefying amounts of optimization in many common warehousing workloads. They even beat out index only scans for a tiny fraction of the storage. Of course, you have to work around the limitations... :-) merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] BRIN indexes and ORDER BY
Stephen Frost wrote: > For at least some of the common BRIN use-cases, where the rows are > inserted in-order and never/very-rarely modified or deleted, this > approach would work very well. > Thanks Stephen, this is exactly our use case.
Re: [GENERAL] BRIN indexes and ORDER BY
Darren, * Darren Lafreniere (dlafreni...@onezero.com) wrote: > Tom Lane wrote: > > > Gavin Wahl wrote: > > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You > > >> just find the page range with the largest/smallest value, and then only > > >> scan that one. Would that be hard to implement? I'm interested in > > working > > >> on it if someone can give me some pointers. > > > > I think this proposal is fairly broken anyway. The page range with the > > largest max-value may once have contained the largest live row, but > > there's no guarantee that it still does. It might even be completely > > empty. You could imagine an algorithm like this: > > > > 1. Find page-range with largest max. Scan it to identify live row with > > largest value. If *no* live values, find page-range with next largest > > max, repeat until no page ranges remain (whereupon return NULL). > > > > 2. For each remaining page-range whose indexed max exceeds the value > > currently in hand, scan that page-range to see if any value exceeds > > the one in hand, replacing the value if so. > > > > This'd probably allow you to omit scanning some of the page-ranges > > in the table, but in a lot of cases you'd end up scanning many of them; > > and you'd need a lot of working state to remember which ranges you'd > > already looked at. It'd certainly always be a lot more expensive than > > answering the same question with a btree index, because in no case do > > you get to avoid scanning the entire contents of the index. [...] > A b-tree index would certainly be faster for ordering. But in scenarios > where you have huge datasets that can't afford the space or update time > required for b-tree, could such a BRIN-accelerated ordering algorithm at > least be faster than ordering with no index? For at least some of the common BRIN use-cases, where the rows are inserted in-order and never/very-rarely modified or deleted, this approach would work very well. Certainly, using this would be much cheaper than a seqscan/top-N sort, for small values of 'N', relative to the number of rows in the table, in those cases. In general, I like the idea of supporting this as BRIN indexes strike me as very good for very large tables which have highly clumped data in them and being able to do a top-N query on those can be very useful at times. Thanks! Stephen signature.asc Description: Digital signature
Re: [GENERAL] BRIN indexes and ORDER BY
Tom Lane wrote: > > Gavin Wahl wrote: > >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You > >> just find the page range with the largest/smallest value, and then only > >> scan that one. Would that be hard to implement? I'm interested in > working > >> on it if someone can give me some pointers. > > I think this proposal is fairly broken anyway. The page range with the > largest max-value may once have contained the largest live row, but > there's no guarantee that it still does. It might even be completely > empty. You could imagine an algorithm like this: > > 1. Find page-range with largest max. Scan it to identify live row with > largest value. If *no* live values, find page-range with next largest > max, repeat until no page ranges remain (whereupon return NULL). > > 2. For each remaining page-range whose indexed max exceeds the value > currently in hand, scan that page-range to see if any value exceeds > the one in hand, replacing the value if so. > > This'd probably allow you to omit scanning some of the page-ranges > in the table, but in a lot of cases you'd end up scanning many of them; > and you'd need a lot of working state to remember which ranges you'd > already looked at. It'd certainly always be a lot more expensive than > answering the same question with a btree index, because in no case do > you get to avoid scanning the entire contents of the index. > > regards, tom lane > Thanks Tom, A b-tree index would certainly be faster for ordering. But in scenarios where you have huge datasets that can't afford the space or update time required for b-tree, could such a BRIN-accelerated ordering algorithm at least be faster than ordering with no index? Darren Lafreniere
Re: [GENERAL] BRIN indexes and ORDER BY
Alvaro Herrera writes: > Darren Lafreniere wrote: >> We found a pgsql-hackers thread from about a year ago about optimizing >> ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it: >> https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us > Tom said he was working on some infrastructure planner changes > ("upper-planner path-ification"), not that he was working on improving > usage of BRIN indexes. As far as I know, nobody has worked on that. Alvaro's reading is correct; I wasn't planning to work on any such thing, and still am not. Looking again at the original thread: > Gavin Wahl wrote: >> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You >> just find the page range with the largest/smallest value, and then only >> scan that one. Would that be hard to implement? I'm interested in working >> on it if someone can give me some pointers. I think this proposal is fairly broken anyway. The page range with the largest max-value may once have contained the largest live row, but there's no guarantee that it still does. It might even be completely empty. You could imagine an algorithm like this: 1. Find page-range with largest max. Scan it to identify live row with largest value. If *no* live values, find page-range with next largest max, repeat until no page ranges remain (whereupon return NULL). 2. For each remaining page-range whose indexed max exceeds the value currently in hand, scan that page-range to see if any value exceeds the one in hand, replacing the value if so. This'd probably allow you to omit scanning some of the page-ranges in the table, but in a lot of cases you'd end up scanning many of them; and you'd need a lot of working state to remember which ranges you'd already looked at. It'd certainly always be a lot more expensive than answering the same question with a btree index, because in no case do you get to avoid scanning the entire contents of the index. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] BRIN indexes and ORDER BY
Ahh, yes. I misread that. Thank you for the clarification. On Wed, Oct 5, 2016 at 2:27 PM, Alvaro Herrera wrote: > Darren Lafreniere wrote: > > > "In addition to simply finding the rows to be returned by a query, an > index > > may be able to deliver them in a specific sorted order. This allows a > > query's ORDER BY specification to be honored without a separate sorting > > step. Of the index types currently supported by PostgreSQL, only B-tree > can > > produce sorted output — the other index types return matching rows in an > > unspecified, implementation-dependent order." > > > > We found a pgsql-hackers thread from about a year ago about optimizing > > ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it: > > https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us > > Tom said he was working on some infrastructure planner changes > ("upper-planner path-ification"), not that he was working on improving > usage of BRIN indexes. As far as I know, nobody has worked on that. > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [GENERAL] BRIN indexes and ORDER BY
Darren Lafreniere wrote: > "In addition to simply finding the rows to be returned by a query, an index > may be able to deliver them in a specific sorted order. This allows a > query's ORDER BY specification to be honored without a separate sorting > step. Of the index types currently supported by PostgreSQL, only B-tree can > produce sorted output — the other index types return matching rows in an > unspecified, implementation-dependent order." > > We found a pgsql-hackers thread from about a year ago about optimizing > ORDER BY for BRIN indexes. Tom Lane suggested that he was working on it: > https://www.postgresql.org/message-id/11881.1443393360%40sss.pgh.pa.us Tom said he was working on some infrastructure planner changes ("upper-planner path-ification"), not that he was working on improving usage of BRIN indexes. As far as I know, nobody has worked on that. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general