On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>
>> On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas <
>> hlinnakan...@vmware.com> wrote:
>>
>>> On 12/12/2013 06:44 PM, Alexander Korotkov wrote:
>>>
>>>> When values are packed into small groups, we have to either insert an
>>>> inefficiently encoded value or re-encode the whole right part of values.
>>>
>>> It would probably be simplest to store newly inserted items uncompressed,
>>> in a separate area in the page. For example, grow the list of uncompressed
>>> items downwards from pd_upper, and the compressed items upwards from
>>> pd_lower. When the page fills up, re-encode the whole page.
>
> I hacked together an implementation of a variant of Simple9, to see how it
> performs. Insertions are handled per the above scheme.
>
> In a limited pg_trgm test case I've been using a lot for this, this
> reduces the index size by about 20%, compared to varbyte encoding. It might
> be possible to squeeze it a bit more: I handcrafted the "selectors" in the
> encoding algorithm to suit our needs, but I don't actually have a good
> idea of how to choose them optimally. Also, the encoding can encode 0
> values, but we never need to do that, so you could take advantage of that
> to pack items tighter.
>
> Compression and decompression speed seems to be about the same.
>
> Patch attached if you want to play with it. WAL replay is still broken,
> and there are probably bugs.
>
>> Good idea. But:
>> 1) We'll still need item indexes at the end of the page for fast scan.
>
> Sure.
>
>> 2) Storage would be easily extendable to hold additional information as
>> well. Better compression shouldn't block more serious improvements.
>
> I'm not sure I agree with that. For all the cases where you don't care
> about additional information - which covers all existing users, for example -
> reducing disk size is pretty important.
> How are you planning to store the additional information, and how does
> using another encoding get in the way of that?

I planned to store additional-information datums between varbyte-encoded
TIDs, and I expected that would be hard to do with PFOR. However, I don't see
significant problems with your Simple9 implementation. I'm going to dig deeper
into your version of the patch.

------
With best regards,
Alexander Korotkov.