On 1/16/17 10:09 PM, Haribabu Kommi wrote:
Yes, that's correct. Currently, with this approach it is not possible to ditch the heap completely. This approach is useful for cases where the user wants to store only some columns as part of a clustered index.

Ahh, that's unfortunate. Billion-row+ tables are becoming rather common, and at 24 bytes of tuple header per row, that 24GB of heap overhead starts becoming very painful. It's actually a lot worse considering there will be at least one index on the table, so 100GB+ of overhead isn't that uncommon.

    Another complication is that one of the big advantages of a CSTORE
    is allowing analysis to be done efficiently on a column-by-column
    (as opposed to row-by-row) basis. Does your patch by chance provide
    that?

Not in the base patch that I shared, but the follow-up patches provide column-by-column data access using the custom plan methods.

Great, that's something else that a column store really needs to be successful. Something else I suspect is necessary is a faster/better way to eliminate chunks of rows from scans.

Just as an example, with my simple array-based approach, you can store a range type along with each array that contains the min and max values for the array. That means any query that wants values between 50 and 100 can include a clause that filters on range types that overlap with [50,100]. That can be indexed very efficiently and is fast to run checks against.
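
To make that concrete, here's a rough sketch of what I mean (the table and column names are made up for illustration):

    -- Each row stores a chunk of values plus a range covering the
    -- chunk's min and max.
    CREATE TABLE measurements_cstore (
        chunk_id  bigserial PRIMARY KEY,
        vals      int[] NOT NULL,
        val_range int4range NOT NULL  -- int4range(min(vals), max(vals), '[]')
    );
    CREATE INDEX ON measurements_cstore USING gist (val_range);

    -- A query for values between 50 and 100 can prune whole chunks
    -- via the GiST index before ever unnesting an array:
    SELECT m.chunk_id, v
      FROM measurements_cstore m,
           LATERAL unnest(m.vals) AS v
     WHERE m.val_range && int4range(50, 100, '[]')
       AND v BETWEEN 50 AND 100;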

    Generally speaking, I do think the idea of adding support for this
    as an "index" is a really good starting point, since that part of

... as discussed elsewhere in the thread, adding a bunch of hooks is probably not a good way to do this. :/

    That would be a great way to gain knowledge on what users would want
    to see in a column store, something else I suspect we need. It would
    also be far less code than what you or Alvaro are proposing. When it
    comes to large changes that don't have crystal-clear requirements, I
    think that's really important.

The main use case of this patch is to support mixed-workload environments, where both OLTP and OLAP queries are possible. The advantage of the proposed patch design is that it provides good performance for OLAP queries without affecting OLTP.

Yeah, that's a big part of what I was envisioning with my array-based approach. In simple terms, there would be a regular row-based table and an array-based table, with a view that allows seamless querying of both (re-presenting the array storage on a per-row basis). There would be a periodic process that moves entire sets of rows from row storage into array storage.
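
A minimal sketch of that arrangement (names are hypothetical):

    CREATE TABLE events_row (
        id  bigint NOT NULL,
        val int
    );

    -- Element i of ids pairs with element i of vals.
    CREATE TABLE events_array (
        ids  bigint[] NOT NULL,
        vals int[]    NOT NULL
    );

    -- The view re-presents the array storage on a per-row basis.
    CREATE VIEW events AS
        SELECT id, val FROM events_row
        UNION ALL
        SELECT u.id, u.val
          FROM events_array a,
               LATERAL unnest(a.ids, a.vals) AS u(id, val);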

If you updated or deleted a row that was part of an array, the contents of the entire array could be moved back into row-based storage. After a period of time, rows would get moved back into array storage. Or the array could be modified in place, but you need to be very careful about bloating the array storage if you do that.
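
Continuing the sketch above, "exploding" a chunk back into row storage before touching one of its rows could look something like this (the id 42 is just a placeholder):

    -- Pull the chunk containing row 42 out of array storage and
    -- re-insert its contents as plain rows; the UPDATE/DELETE then
    -- runs against events_row as usual.
    WITH moved AS (
        DELETE FROM events_array
         WHERE ids @> ARRAY[42::bigint]
        RETURNING ids, vals
    )
    INSERT INTO events_row (id, val)
    SELECT u.id, u.val
      FROM moved,
           LATERAL unnest(moved.ids, moved.vals) AS u(id, val);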

The big missing piece here is getting the planner to intelligently handle a mixed row/column store. As I mentioned, you can easily add range type fields to greatly increase performance, but they won't do any good unless the appropriate filters get added. It's not THAT hard to do that by hand, but it'd be great if there was a more automated method. Such a method might also be very useful for transforming expressions like date_part('quarter', ...) into something that could use existing indexes.
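
As a hand-done example of that kind of transform (assuming a hypothetical table orders with a timestamptz column ts and a plain btree index on ts):

    -- Before: the expressions hide ts from the index.
    SELECT count(*) FROM orders
     WHERE date_part('year', ts) = 2017
       AND date_part('quarter', ts) = 2;

    -- After: an equivalent sargable range that can use the index.
    SELECT count(*) FROM orders
     WHERE ts >= '2017-04-01' AND ts < '2017-07-01';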
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

