On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas <hlinnakan...@vmware.com
> wrote:

> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>
>> On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas <
>> hlinnakan...@vmware.com
>>
>>> wrote:
>>>
>>
>>  On 12/12/2013 06:44 PM, Alexander Korotkov wrote:
>>>
>>>   When values are packed into small groups, we have to either insert
>>>
>>>> inefficiently encoded value or re-encode whole right part of values.
>>>>
>>>
>>> It would probably be simplest to store newly inserted items uncompressed,
>>> in a separate area in the page. For example, grow the list of
>>> uncompressed
>>> items downwards from pg_upper, and the compressed items upwards from
>>> pg_lower. When the page fills up, re-encode the whole page.
>>>
>>
> I hacked together an implementation of a variant of Simple9, to see how it
> performs. Insertions are handled per the above scheme.
>
> In a limited pg_trgm test case I've been using a lot for this, this
> reduces the index size about 20%, compared to varbyte encoding. It might be
> possible to squeeze it a bit more, I handcrafted the "selectors" in the
> encoding algorithm to suite our needs, but I don't actually have a good
> idea of how to choose them optimally. Also, the encoding can encode 0
> values, but we never need to do that, so you could take advantage of that
> to pack items tighter.
>
> Compression and decompression speed seems to be about the same.
>
> Patch attached if you want to play with it. WAL replay is still broken,
> and there are probably bugs.
>
>
>  Good idea. But:
>> 1) We'll still need item indexes in the end of page for fast scan.
>>
>
> Sure.
>
>
>  2) Storage would be easily extendable to hold additional information as
>> well.
>> Better compression shouldn't block more serious improvements.
>>
>
> I'm not sure I agree with that. For all the cases where you don't care
> about additional information - which covers all existing users for example
> - reducing disk size is pretty important. How are you planning to store the
> additional information, and how does using another encoding gets in the way
> of that?


I was planned to store additional information datums between
varbyte-encoded tids. I was expected it would be hard to do with PFOR.
However, I don't see significant problems in your implementation of Simple9
encoding. I'm going to dig deeper in your version of patch.

------
With best regards,
Alexander Korotkov.

Reply via email to