Re: [DISCUSS] CEP-25: Trie-indexed SSTable format

Josh McKenzie Tue, 22 Nov 2022 02:17:17 -0800

Strong +1 for the proposal here.

> One of the questions that we want to ask is whether anyone objects to 
> maintaining full compatibility with existing files created by DataStax 
> Enterprise.
No concerns here. So long as it's clear in the implementation what it is and 
why it's there I don't see a problem; I think encouraging this kind of bridge 
work can only benefit the project as it'll encourage more upstreaming from 
forks where more aggressive innovation might be taking place.


Regarding testing, I'd recommend we start with enabling the index + memtables 
together in the utests-trie target rather than trying to multiplex everything 
together.


On Tue, Nov 22, 2022, at 4:18 AM, Jacek Lewandowski wrote:
> +1 for the proposal !
> 
> btw, regarding tests: perhaps we will have to let the Python DTests run with 
> either the new or the old format
> 
> thanks
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
> 
> 
> On Mon, Nov 21, 2022 at 3:06 PM Benedict <[email protected]> wrote:
>> 
>> Yes of course, this was absolutely just a query and not a precondition for 
>> this work. It stands on its own in my view, and I’m already ready to +1 the 
>> proposal.
>> 
>> 
>>> On 21 Nov 2022, at 13:55, Branimir Lambov <[email protected]> wrote:
>>> 
>>> I see. This does make a lot of sense for full row indexing, and also if one 
>>> can specify sub-kb granularity (at the current default we just won't have 
>>> an index in these cases). How does opening a ticket to do these two* after 
>>> the current code is committed sound?
>>> 
>>> * embedded index for sub-X-byte partitions + granularity in bytes
>>> 
>>> On Mon, Nov 21, 2022 at 3:38 PM Benedict <[email protected]> wrote:
>>>> 
>>>> Buffering on write up to at most one page seems fine? Once you are past a 
>>>> single page it’s fine to write either to the end of the partition or to a 
>>>> separate file, since there’s little to be gained by prepending at that point; 
>>>> but for small partitions especially, there’s likely significant value in 
>>>> prepending it?
>>>> 
>>>> It might be preferable to retain the separate index for those that 
>>>> overflow this buffer, and simply encode in the partition index whether the 
>>>> row index is inline or in the separate file.
>>>> 
>>>> 
>>>>> On 21 Nov 2022, at 13:29, Branimir Lambov <[email protected]> wrote:
>>>>> 
>>>>> There is no intention to introduce any new versions of the format 
>>>>> specifically for DSE. If there are any further changes to the format, 
>>>>> they will be OSS-first. In other words this support only extends to 
>>>>> preexisting versions of the format.
>>>>> 
>>>>> Inline row index in the data file is not something we have implemented, 
>>>>> and it's currently not in any plans. I personally am not sure how it could 
>>>>> be done in a way that provides a benefit: if we place it at the end of a partition, 
>>>>> it does not help much compared to a separate file; if we place it in 
>>>>> front, we have to buffer the partition content, which will affect write 
>>>>> performance. In either case it may be harder to cache. Do you have 
>>>>> something different in mind?
>>>>> 
>>>>> Regards,
>>>>> Branimir
>>>>> 
>>>>> On Mon, Nov 21, 2022 at 3:01 PM Benedict <[email protected]> wrote:
>>>>>> 
>>>>>> Personally very pleased to see this proposal, and I’m not opposed to 
>>>>>> easing your migration by maintaining some light support for internal 
>>>>>> file versions - though would prefer the support have some version limit 
>>>>>> where it can be excised (maybe for one minor version bump?)
>>>>>> 
>>>>>> One implementation question: are there any plans to support inline row 
>>>>>> index in the big sstable format files? Is this something DSE supports, 
>>>>>> and on the roadmap just not for initial work, or currently not 
>>>>>> envisioned?
>>>>>> 
>>>>>> I would anticipate significant advantage to this for many workloads, and 
>>>>>> no downside (except for streaming - which could be resolved fairly 
>>>>>> easily by skipping over these sections when streaming to an old node, 
>>>>>> but since we don’t generally stream between versions I don’t see any 
>>>>>> major issue anyway).
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 21 Nov 2022, at 12:43, Branimir Lambov <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi everyone,
>>>>>>> 
>>>>>>> We would like to put CEP-25 for discussion.
>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-25%3A+Trie-indexed+SSTable+format
>>>>>>> 
>>>>>>> The proposal describes DSE's Big Trie-indexed SSTable format, which 
>>>>>>> replaces the primary index with on-disk tries to improve lookup 
>>>>>>> performance and index size, better handle wide partitions, and remove 
>>>>>>> the need to manage key caching and index summaries.
>>>>>>> 
>>>>>>> We would like to discuss this proposal with you.
>>>>>>> 
>>>>>>> One of the questions that we want to ask is whether anyone objects to 
>>>>>>> maintaining full compatibility with existing files created by DataStax 
>>>>>>> Enterprise.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Branimir
>>>>> 
>>>>> 
>>>>>
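The CEP summary says the format replaces the primary index with on-disk tries to improve lookup performance and index size. As a hedged, in-memory illustration of one reason a trie index can be compact, the sketch below stores each partition key only up to the shortest prefix that separates it from its sorted neighbours, so lookups return a candidate position that must be confirmed against the full key. It is not the CEP-25 on-disk layout: `PrefixTrieIndex`, the NUL-terminated UTF-8 encoding (standing in for a prefix-free key encoding) and the data-file positions are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory illustration; not the CEP-25 on-disk format. */
final class PrefixTrieIndex {
    // A trie node: children keyed by unsigned byte value, plus an optional payload.
    private static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
        long position = -1; // -1 means "no payload"
    }

    private final Node root = new Node();

    // Assumed key encoding: UTF-8 bytes plus a trailing 0x00 terminator. This makes the
    // key set prefix-free as long as no key contains a NUL character.
    static byte[] encode(String key) {
        byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(raw, raw.length + 1); // copyOf pads the extra byte with 0
    }

    // Length of the longest common prefix of two byte arrays.
    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) i++;
        return i;
    }

    // Keys must be distinct and sorted in unsigned byte order of their encoding (for
    // ASCII keys, ordinary String order). positions[i] is the hypothetical data-file
    // offset of sortedKeys.get(i).
    PrefixTrieIndex(List<String> sortedKeys, long[] positions) {
        byte[][] enc = new byte[sortedKeys.size()][];
        for (int i = 0; i < enc.length; i++) enc[i] = encode(sortedKeys.get(i));
        for (int i = 0; i < enc.length; i++) {
            int prev = i > 0 ? commonPrefix(enc[i - 1], enc[i]) : 0;
            int next = i + 1 < enc.length ? commonPrefix(enc[i], enc[i + 1]) : 0;
            // Shortest prefix that separates key i from both sorted neighbours, and
            // therefore from every other key in the set.
            int len = Math.max(prev, next) + 1;
            Node node = root;
            for (int d = 0; d < len; d++) {
                node = node.children.computeIfAbsent(enc[i][d] & 0xFF, unused -> new Node());
            }
            node.position = positions[i];
        }
    }

    // Returns a candidate position, or -1 if no indexed prefix matches. Because only
    // prefixes are stored, the caller must compare the full key at that position.
    long lookup(String key) {
        Node node = root;
        for (byte b : encode(key)) {
            node = node.children.get(b & 0xFF);
            if (node == null) return -1;
            if (node.position >= 0) return node.position;
        }
        return -1;
    }
}
```

For the keys "apple", "apricot" and "banana", the trie holds only "app", "apr" and "b". A lookup for "blueberry" therefore returns banana's position, and it is the full-key comparison at that position that rejects it.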

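Benedict's suggestion in the thread is to buffer at most one page on write, prepend the row index for partitions that fit, fall back to the separate row index file otherwise, and record in the partition index which case applies. The sketch below is a rough illustration of that idea under stated assumptions. Everything in it is hypothetical: `InlineRowIndexSketch`, the 4096-byte `PAGE_SIZE`, the low-bit flag in the partition-index entry, and the row-index layout (a row count followed by per-row offsets, i.e. full row indexing) are not part of CEP-25 or the existing format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an inline-or-separate row index writer; not Cassandra code. */
final class InlineRowIndexSketch {
    static final int PAGE_SIZE = 4096; // assumed page size, not a Cassandra setting

    final ByteArrayOutputStream dataFile = new ByteArrayOutputStream();
    final ByteArrayOutputStream rowIndexFile = new ByteArrayOutputStream();

    // Assumed partition-index entry encoding: position shifted left by one bit, with the
    // low bit set when the row index is inline in the data file.
    static long entry(long position, boolean inline) { return (position << 1) | (inline ? 1L : 0L); }
    static boolean isInline(long entry) { return (entry & 1L) != 0; }
    static long position(long entry) { return entry >>> 1; }

    // Serialized row index: row count followed by each row's offset from the first row.
    private static byte[] serializeIndex(List<Integer> offsets) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * offsets.size());
        buf.putInt(offsets.size());
        for (int off : offsets) buf.putInt(off);
        return buf.array();
    }

    // Writes one partition and returns its partition-index entry.
    long writePartition(List<byte[]> rows) {
        long partitionStart = dataFile.size();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        List<Integer> rowOffsets = new ArrayList<>();
        int written = 0;
        boolean overflowed = false;
        for (byte[] row : rows) {
            rowOffsets.add(written);
            if (!overflowed && buffer.size() + row.length > PAGE_SIZE) {
                // Past one page: flush what was buffered and stream the rest directly.
                dataFile.writeBytes(buffer.toByteArray());
                overflowed = true;
            }
            (overflowed ? dataFile : buffer).writeBytes(row);
            written += row.length;
        }
        if (!overflowed) {
            // The rows fit in one page (index size ignored for simplicity), so the row
            // index is prepended to the partition in the data file.
            dataFile.writeBytes(serializeIndex(rowOffsets));
            dataFile.writeBytes(buffer.toByteArray());
            return entry(partitionStart, true);
        }
        // Overflow: use the separate row index file, recording where the partition starts.
        long indexStart = rowIndexFile.size();
        rowIndexFile.writeBytes(ByteBuffer.allocate(8).putLong(partitionStart).array());
        rowIndexFile.writeBytes(serializeIndex(rowOffsets));
        return entry(indexStart, false);
    }
}
```

Capping the buffer at one page bounds the extra memory and copying per in-flight partition, which speaks to the write-performance concern Branimir raises: only partitions that stay under the cap get an inline index, and larger ones fall back to the separate row index file.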