Re: [DISCUSS] CEP-7 Storage Attached Index

Mike Adamson Thu, 16 Sep 2021 06:38:29 -0700

Hi,

Just to keep this thread up to date with development progress, we will be 
adding row-aware support to SAI in the next few weeks. This is currently going 
through the final stages of review and testing.


This feature also adds on-disk versioning to SAI. This allows SAI to support 
multiple on-disk formats during upgrades. 
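
To illustrate what supporting multiple on-disk formats can look like, here is a minimal Java sketch. It assumes that each SSTable's index components record the format version they were written with, and that reads dispatch on that version. `IndexFormatVersion`, `IndexComponentReader` and `IndexReaders` are hypothetical names for illustration only. They are not SAI's actual classes.

```java
// Hypothetical format versions recorded alongside each SSTable's index
// components (names are illustrative only).
enum IndexFormatVersion {
    V1_PARTITION_AWARE,
    V2_ROW_AWARE;

    // Version used for newly written components in this sketch.
    static final IndexFormatVersion CURRENT = V2_ROW_AWARE;
}

interface IndexComponentReader {
    String describe();
}

final class IndexReaders {
    private IndexReaders() {}

    // Chooses a reader for whatever version is found on disk, so components
    // written before an upgrade stay readable alongside newer ones.
    static IndexComponentReader open(IndexFormatVersion onDisk) {
        switch (onDisk) {
            case V1_PARTITION_AWARE:
                return () -> "partition-aware reader";
            case V2_ROW_AWARE:
                return () -> "row-aware reader";
            default:
                throw new IllegalArgumentException("Unsupported index version: " + onDisk);
        }
    }

    // New components are always written in the current version.
    static IndexFormatVersion versionForWrite() {
        return IndexFormatVersion.CURRENT;
    }
}
```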

I am mentioning this now because the CEP mentions “Partition Based Iteration” 
as an initial feature. We will change that to “Row Based Iteration” when the 
feature is merged.
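
The difference between partition-based and row-based iteration can be sketched in Java with a toy in-memory model. The sketch assumes that a row is identified by (partition key, clustering) and that the index returns either partition keys or row ids. `Row`, `ReadResult` and `IterationSketch` are hypothetical. The point is only how many rows must be read when a single row of a wide partition matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model: a row is (partition key, clustering, value).
final class Row {
    final String partitionKey;
    final int clustering;
    final String value;

    Row(String partitionKey, int clustering, String value) {
        this.partitionKey = partitionKey;
        this.clustering = clustering;
        this.value = value;
    }
}

final class ReadResult {
    final List<Row> rows = new ArrayList<>();
    int rowsRead; // rows materialized from "storage"
}

final class IterationSketch {
    private IterationSketch() {}

    // Partition-based: the index returns partition keys only, so every row in
    // each matching partition is read and then post-filtered.
    static ReadResult partitionBased(Map<String, List<Row>> partitions,
                                     List<String> matchingKeys,
                                     Predicate<Row> filter) {
        ReadResult result = new ReadResult();
        for (String key : matchingKeys) {
            for (Row row : partitions.getOrDefault(key, List.of())) {
                result.rowsRead++;
                if (filter.test(row)) {
                    result.rows.add(row);
                }
            }
        }
        return result;
    }

    // Row-based: the index returns row ids that resolve directly to rows,
    // so only the matching rows are read.
    static ReadResult rowBased(List<Row> rowsById, List<Integer> matchingRowIds) {
        ReadResult result = new ReadResult();
        for (int rowId : matchingRowIds) {
            result.rowsRead++;
            result.rows.add(rowsById.get(rowId));
        }
        return result;
    }
}
```

With one matching row in a 1,000-row partition, the partition-based path reads all 1,000 rows and the row-based path reads one.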

MikeA

> On 15 Sep 2021, at 19:42, Caleb Rackliffe <[email protected]> wrote:
> 
> Hey there,
> 
> In the spirit of trying to get as many possible objections to a successful
> vote out of the way, I've added a "Challenges" section to the CEP:
> 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index#CEP7:StorageAttachedIndex-Challenges
> 
> Most of you will be familiar with these, but I think we need to be as
> open/candid as possible about the potential risk they pose to SAI's broader
> usability. I've described them from the point of view that they are not
> intractable, but if anyone thinks they are, let's hash that disagreement
> out.
> 
> Thanks!
> 
> On Thu, Sep 9, 2021 at 11:13 AM Patrick McFadin <[email protected]> wrote:
> 
>> +1 on introducing this in an incremental manner and after reading through
>> CASSANDRA-16092 that seems like a perfect place to start. I see that work
>> on that Jira has stopped until direction for CEP-7 has been voted in.
>> 
>> I say start the vote and let's get this really valuable developer feature
>> underway.
>> 
>> Patrick
>> 
>> On Tue, Sep 7, 2021 at 10:40 AM Caleb Rackliffe <[email protected]>
>> wrote:
>> 
>>> So this thread stalled almost a year ago. (Wow, time flies when you're
>>> trying to release 4.0.) My synthesis of the conversation to this point is
>>> that while there are some open questions about testing
>>> methodology/"definition of done" and our choice of particular on-disk data
>>> structures, neither of these should be a serious obstacle to moving forward
>>> w/ a vote. Having said that, is there anything left around the CEP that we
>>> feel should prevent it from moving to a vote?
>>>
>>> In terms of how we would proceed from the point a vote passes, it seems
>>> like there have been enough concerns around the proposed/necessary breaking
>>> changes to the 2i API that we will start development by introducing
>>> components as incrementally as possible into a long-running feature branch
>>> off trunk. (This work would likely start w/ *CASSANDRA-16092*
>>> <https://issues.apache.org/jira/browse/CASSANDRA-16092>, which we could
>>> resolve as a sub-task of the SAI epic without interfering with other trunk
>>> development likely destined for a 4.x minor, etc.)
>>> 
>>> On Thu, Sep 24, 2020 at 2:47 AM Jasonstack Zhao Yang <[email protected]> wrote:
>>> 
>>>>>> Question is: is this planned as a next step?
>>>>>> If yes, how are we going to mark SAI as experimental until it gets
>>>>>> row offsets? Also, it is likely that index format is going to change
>>>>>> when row offsets are added, so my concern is that we may have to
>>>>>> support two versions of a format for a smooth migration.
>>>> 
>>>> The goal is to support a row-level index when merging SAI; I will update
>>>> the CEP accordingly.
>>>> 
>>>>>> I think switching to row
>>>>>> offsets also has a huge impact on interaction with SPRC and has some
>>>>>> potential for optimisations.
>>>> 
>>>> Can you share more details on the optimizations?
>>>> 
>>>> 
>>>> 
>>>> On Thu, 24 Sep 2020 at 15:20, Oleksandr Petrov <[email protected]>
>>>> wrote:
>>>> 
>>>>>> But for improving overall index read performance, I think improving
>>>>>> base table read perf (because SAI/SASI executes LOTS of
>>>>>> SinglePartitionReadCommand after searching on-disk index) is more
>>>>>> effective than switching from Trie to Prefix BTree.
>>>>>
>>>>> I haven't suggested switching to Prefix B-Tree or any other structure;
>>>>> the question was about the rationale and motivation for picking one over
>>>>> the other, which I am curious about for personal reasons/interests that
>>>>> lie outside of Cassandra. Having this listed in the CEP could have been
>>>>> helpful for future guidance. It's ok if this question is outside of the
>>>>> CEP scope.
>>>>>
>>>>> I also agree that there are many areas that require improvement around
>>>>> the read/write path and 2i, many of which (even outside of base table
>>>>> format or read perf) can yield positive performance results.
>>>>>
>>>>>> FWIW, I personally look forward to receiving that contribution when the
>>>>>> time is right.
>>>>>
>>>>> I am very excited for this contribution, too, and it looks like very
>>>>> solid work.
>>>>>
>>>>> I have one more question, about "Upon resolving partition keys, rows are
>>>>> loaded using Cassandra’s internal partition read command across SSTables
>>>>> and are post filtered". One of the criticisms of SASI and reasons for
>>>>> marking it as experimental was CASSANDRA-11990. I think switching to row
>>>>> offsets also has a huge impact on interaction with SPRC and has some
>>>>> potential for optimisations. Question is: is this planned as a next step?
>>>>> If yes, how are we going to mark SAI as experimental until it gets row
>>>>> offsets? Also, it is likely that index format is going to change when row
>>>>> offsets are added, so my concern is that we may have to support two
>>>>> versions of a format for a smooth migration.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Sep 24, 2020 at 6:53 AM Jasonstack Zhao Yang <[email protected]> wrote:
>>>>>
>>>>>>>> I think CEP should be more upfront with "eventually replace
>>>>>>>> it" bit, since it raises the question about what the people who are
>>>>>>>> using other index implementations can expect.
>>>>>>
>>>>>> Will update the CEP to emphasize: SAI will replace other indexes.
>>>>>>
>>>>>>>> Unfortunately, I do not have an
>>>>>>>> implementation sitting around for a direct comparison, but I can
>>>>>>>> imagine situations when B-Trees may perform better because of simpler
>>>>>>>> construction. Maybe we should even consider prototyping a prefix B-Tree
>>>>>>>> to have a more fair comparison.
>>>>>>
>>>>>> As long as a prefix BTree supports range/prefix aggregation (which is
>>>>>> used to speed up range/prefix queries when matching an entire subtree),
>>>>>> we can plug it in and compare. It won't affect the CEP design, which
>>>>>> focuses on sharing data across indexes and posting aggregation.
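
A minimal Java sketch of what "range/prefix aggregation" could mean, assuming that every trie node keeps the union of the row ids of all keys in its subtree. Under that assumption, a prefix query that matches an entire subtree is answered from a single node without visiting the keys below it. `AggregatingTrie` is hypothetical. Storing a full aggregate at every node is the simplest possible form and costs extra space, and SAI's actual scheme may differ.

```java
import java.util.BitSet;
import java.util.TreeMap;

final class AggregatingTrie {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        // Union of the row ids of every key stored in this node's subtree.
        final BitSet subtreePostings = new BitSet();
    }

    private final Node root = new Node();

    void insert(String key, int rowId) {
        Node node = root;
        node.subtreePostings.set(rowId);
        for (char c : key.toCharArray()) {
            node = node.children.computeIfAbsent(c, ch -> new Node());
            node.subtreePostings.set(rowId); // aggregate on the way down
        }
    }

    // Returns all row ids whose key starts with prefix by reading the
    // aggregate on the prefix node instead of walking its subtree.
    BitSet prefixQuery(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return new BitSet();
            }
        }
        return (BitSet) node.subtreePostings.clone();
    }
}
```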
>>>>>> 
>>>>>> But for improving overall index read performance, I think improving
>>>>>> base table read perf (because SAI/SASI executes LOTS of
>>>>>> SinglePartitionReadCommand after searching on-disk index) is more
>>>>>> effective than switching from Trie to Prefix BTree.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, 24 Sep 2020 at 05:33, Benedict Elliott Smith <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> FWIW, I personally look forward to receiving that contribution when the
>>>>>>> time is right.
>>>>>>>
>>>>>>> On 23/09/2020, 18:45, "Josh McKenzie" <[email protected]> wrote:
>>>>>>>
>>>>>>>    talking about that would involve some bits of information DataStax
>>>>>>>    might not be ready to share?
>>>>>>>
>>>>>>>    At the risk of derailing, I've been poking and prodding us
>>>>>>>    contributors at DS this week to get our act together w/ a draft CEP
>>>>>>>    for donating the trie-based indices to the ASF project.
>>>>>>>
>>>>>>>    More to come; the intention is certainly to contribute that code. The
>>>>>>>    lack of a destination to merge it into (i.e. no 5.0-dev branch) is
>>>>>>>    removing significant urgency from the process as well (not to open a
>>>>>>>    3rd Pandora's box), but there's certainly an interrelatedness to the
>>>>>>>    conversations going on.
>>>>>>> 
>>>>>>>    ---
>>>>>>>    Josh McKenzie
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>    On Wed, Sep 23, 2020 at 12:48 PM, Caleb Rackliffe <[email protected]>
>>>>>>>    wrote:
>>>>>>>
>>>>>>>> As long as we can construct the on-disk indexes efficiently/directly
>>>>>>>> from a Memtable-attached index on flush, there's room to try other data
>>>>>>>> structures. Most of the innovation in SAI is around the layout of
>>>>>>>> postings (something we can expand on if people are interested) and
>>>>>>>> having a natively row-oriented design that scales w/ multiple indexed
>>>>>>>> columns on single SSTables. There are some broader implications of
>>>>>>>> using the trie that reach outside SAI itself, but talking about that
>>>>>>>> would involve some bits of information DataStax might not be ready to
>>>>>>>> share?
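
The postings layout is not described in detail here, so the following is only a generic Java sketch. It assumes a Memtable-attached index kept as a sorted map from term to ascending row ids, flushed in term order, with each postings list delta-encoded as unsigned varints. Delta plus varint is a common postings technique and not necessarily SAI's format. `MemtableIndexSketch` is a hypothetical name. The sketch also shows why small, dense row ids tend to compress well: consecutive ids produce tiny gaps that fit in one byte each.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MemtableIndexSketch {
    // Sorted term -> row ids, so a flush can emit terms in order directly.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    // Assumes row ids are assigned in increasing order as rows arrive.
    void add(String term, int rowId) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(rowId);
    }

    // Emits every term in sorted order with its delta-encoded postings.
    Map<String, byte[]> flush() {
        Map<String, byte[]> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            out.put(e.getKey(), encodeDeltas(e.getValue()));
        }
        return out;
    }

    static byte[] encodeDeltas(List<Integer> sortedRowIds) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int previous = 0;
        for (int rowId : sortedRowIds) {
            writeVarint(buf, rowId - previous); // small gaps need few bytes
            previous = rowId;
        }
        return buf.toByteArray();
    }

    static List<Integer> decodeDeltas(byte[] bytes) {
        List<Integer> rowIds = new ArrayList<>();
        int value = 0, shift = 0, previous = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) { // last byte of this varint
                previous += value;
                rowIds.add(previous);
                value = 0;
                shift = 0;
            } else {
                shift += 7;
            }
        }
        return rowIds;
    }

    // Unsigned LEB128-style varint: 7 payload bits per byte, high bit = more.
    private static void writeVarint(ByteArrayOutputStream buf, int v) {
        while ((v & ~0x7F) != 0) {
            buf.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        buf.write(v);
    }
}
```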
>>>>>>>> 
>>>>>>>> On Wed, Sep 23, 2020 at 11:00 AM Jeremiah D Jordan <jeremiah.jordan@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Short question: looking forward, how are we going to maintain three 2i
>>>>>>>> implementations: SASI, SAI, and 2i?
>>>>>>>>
>>>>>>>> I think one of the goals stated in the CEP is for SAI to have parity
>>>>>>>> with 2i such that it could eventually replace it.
>>>>>>>> 
>>>>>>>> On Sep 23, 2020, at 10:34 AM, Oleksandr Petrov <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Short question: looking forward, how are we going to maintain three 2i
>>>>>>>> implementations: SASI, SAI, and 2i?
>>>>>>>>
>>>>>>>> Another thing I think this CEP is missing is the rationale and
>>>>>>>> motivation for why trie-based indexes were chosen over, say, B-Tree. We
>>>>>>>> did have a short discussion about this on Slack, but both arguments that
>>>>>>>> I've heard (space-saving and keeping a small subset of nodes in memory)
>>>>>>>> work only for the most primitive implementation of a B-Tree. A
>>>>>>>> fully-occupied prefix B-Tree can have similar properties. There's been a
>>>>>>>> lot of research on B-Trees and optimisations in those. Unfortunately, I
>>>>>>>> do not have an implementation sitting around for a direct comparison,
>>>>>>>> but I can imagine situations when B-Trees may perform better because of
>>>>>>>> simpler construction.
>>>>>>>>
>>>>>>>> Maybe we should even consider prototyping a prefix B-Tree to have a
>>>>>>>> more fair comparison.
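
For reference, here is a tiny Java sketch of the trie side of the space-saving argument. Shared key prefixes are stored once, so the node count can be well below the total number of key characters. `TrieSketch` is hypothetical. It says nothing about how a fully-occupied prefix B-Tree would compare, which is the open question above.

```java
import java.util.List;
import java.util.TreeMap;

final class TrieSketch {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
    }

    private final Node root = new Node();
    private int nodeCount; // excludes the root

    // Walks the key, creating a node only where no shared prefix exists yet.
    void insert(String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            Node child = node.children.get(c);
            if (child == null) {
                child = new Node();
                node.children.put(c, child);
                nodeCount++;
            }
            node = child;
        }
    }

    int nodeCount() {
        return nodeCount;
    }

    // Characters needed if every key were stored in full, with no sharing.
    static int totalKeyChars(List<String> keys) {
        int total = 0;
        for (String k : keys) {
            total += k.length();
        }
        return total;
    }
}
```

For the keys "cat", "car" and "cart", the trie holds 5 nodes while the keys contain 10 characters in total.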
>>>>>>>> 
>>>>>>>> Thank you,
>>>>>>>> -- Alex
>>>>>>>> 
>>>>>>>> On Thu, Sep 10, 2020 at 9:12 AM Jasonstack Zhao Yang <jasonstack.zhao@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thank you, Patrick, for hosting the Cassandra Contributor Meeting for
>>>>>>>> CEP-7 SAI.
>>>>>>>>
>>>>>>>> The recorded video is available here:
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-01+Apache+Cassandra+Contributor+Meeting
>>>>>>>> 
>>>>>>>> On Tue, 1 Sep 2020 at 14:34, Jasonstack Zhao Yang <jasonstack.zhao@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thank you, Charles and Patrick
>>>>>>>>
>>>>>>>> On Tue, 1 Sep 2020 at 04:56, Charles Cao <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Thank you, Patrick!
>>>>>>>> 
>>>>>>>> On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin <[email protected]> wrote:
>>>>>>>>
>>>>>>>> I just moved it to 8AM for this meeting to better accommodate APAC.
>>>>>>>> Please see the update here:
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>>>>>>>>
>>>>>>>> Patrick
>>>>>>>> 
>>>>>>>> On Mon, Aug 31, 2020 at 10:04 AM Charles Cao <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Patrick,
>>>>>>>>
>>>>>>>> 11AM PST is a bad time for the people in the APAC timezone. Can we move
>>>>>>>> it to 7 or 8AM PST in the morning to accommodate their needs?
>>>>>>>>
>>>>>>>> ~Charles
>>>>>>>> 
>>>>>>>> On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Meeting scheduled.
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>>>>>>>>
>>>>>>>> Tuesday September 1st, 11AM PST. I added a basic bullet for the agenda,
>>>>>>>> but if there is more, edit away.
>>>>>>>>
>>>>>>>> Patrick
>>>>>>>> 
>>>>>>>> On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang <jasonstack.zhao@gmail.com> wrote:
>>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova <[email protected]> wrote:
>>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe <[email protected]> wrote:
>>>>>>>>
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin <[email protected]> wrote:
>>>>>>>>
>>>>>>>> This is related to the discussion Jordan and I had about the contributor
>>>>>>>> Zoom call. Instead of open mic for any issue, call it based on a
>>>>>>>> discussion thread or threads for higher bandwidth discussion.
>>>>>>>>
>>>>>>>> I would be happy to schedule one for next week to specifically discuss
>>>>>>>> CEP-7. I can attach the recorded call to the CEP after.
>>>>>>>>
>>>>>>>> +1 or -1?
>>>>>>>>
>>>>>>>> Patrick
>>>>>>>> 
>>>>>>>> On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Does the community plan to open another discussion or CEP on
>>>>>>>> modularization?
>>>>>>>>
>>>>>>>> We probably should have a discussion on the ML or monthly contrib call
>>>>>>>> about it first to see how aligned the interested contributors are.
>>>>>>>> Could do that through CEP as well, but CEPs (at least thus far sans k8s
>>>>>>>> operator) tend to start with a strong, deeply thought out point of view
>>>>>>>> being expressed.
>>>>>>>> 
>>>>>>>> On Tue, Aug 25, 2020 at 3:26 AM Jasonstack Zhao Yang <[email protected]> wrote:
>>>>>>>>
>>>>>>>> SASI's performance, specifically the search in the B+ tree component,
>>>>>>>> depends a lot on the component file's header being available in the
>>>>>>>> pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI
>>>>>>>> bound to this same or similar limitation?
>>>>>>>>
>>>>>>>> SAI also benefits from larger memory because SAI puts block info on heap
>>>>>>>> for searching on-disk components, and having cross-index files in page
>>>>>>>> cache improves read performance of different indexes on the same table.
>>>>>>>>
>>>>>>>> Flushing of SASI can be CPU+IO intensive, to the point of saturation,
>>>>>>>> pauses, and crashes on the node. SSDs are a must, along with a bit of
>>>>>>>> tuning, just to avoid bringing down your cluster. Beyond reducing space
>>>>>>>> requirements, does SAI improve on these things? Like SASI, how does SAI,
>>>>>>>> in its own way, change/narrow the recommendations on node hardware
>>>>>>>> specs?
>>>>>>>>
>>>>>>>> SAI won't crash the node during compaction and requires less CPU/IO.
>>>>>>>>
>>>>>>>> * SAI defines a global memory limit for compaction instead of the
>>>>>>>> per-index memory limit used by SASI.
>>>>>>>>
>>>>>>>> For example, if compactions are running on 10 tables and each has 10
>>>>>>>> indexes, SAI will cap the memory usage with the global limit, while SASI
>>>>>>>> may use up to 100 * per-index limit.
>>>>>>>>
>>>>>>>> * After flushing in-memory segments to disk, SAI won't merge on-disk
>>>>>>>> segments, while SASI attempts to merge them at the end.
>>>>>>>>
>>>>>>>> There are pros and cons of not merging segments:
>>>>>>>>
>>>>>>>> ** Pros: compaction runs faster and requires fewer resources.
>>>>>>>>
>>>>>>>> ** Cons: small segments reduce compression ratio.
>>>>>>>>
>>>>>>>> * SAI on-disk format with row ids compresses better.
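
The arithmetic in the compaction example, and the idea of one shared cap, can be sketched in Java. This minimal illustration assumes a single node-wide byte budget that index builders reserve from and release to. `GlobalMemoryLimiter` and its methods are hypothetical and are not SAI's actual API. With a per-index limit, 10 tables × 10 indexes means up to 100 × that limit in the worst case, while the shared budget never exceeds its configured total.

```java
import java.util.concurrent.atomic.AtomicLong;

final class GlobalMemoryLimiter {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    GlobalMemoryLimiter(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Tries to reserve bytes against the shared cap. Returns false if that
    // would exceed it; in this sketch the caller would then flush its
    // in-memory segment to disk and release its reservation.
    boolean tryReserve(long bytes) {
        while (true) {
            long current = usedBytes.get();
            long next = current + bytes;
            if (next > limitBytes) {
                return false;
            }
            if (usedBytes.compareAndSet(current, next)) {
                return true;
            }
        }
    }

    void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    long used() {
        return usedBytes.get();
    }

    // Worst case under a per-index limit: every concurrently built index
    // may hold its full allowance at the same time.
    static long perIndexWorstCase(int tables, int indexesPerTable, long perIndexLimit) {
        return (long) tables * indexesPerTable * perIndexLimit;
    }
}
```

The compare-and-set loop keeps the check and the update atomic, so concurrent compactions cannot jointly overshoot the cap.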
>>>>>>>> 
>>>>>>>> I understand the desire in keeping out of scope the longer term
>>>>>>>> deprecation and migration plan, but… if SASI provides functionality
>>>>>>>> that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet
>>>>>>>> introduces a body of code ~somewhat similar, shouldn't we be roughly
>>>>>>>> sketching out how to reduce the maintenance surface area?
>>>>>>>>
>>>>>>>> Agreed that we should reduce maintenance area if possible, but only a
>>>>>>>> very limited part of the code base (e.g. RangeIterator, QueryPlan) can
>>>>>>>> be shared. The rest of the code base is quite different because of the
>>>>>>>> on-disk format and cross-index files.
>>>>>>>>
>>>>>>>> The goal of this CEP is to get community buy-in on SAI's design.
>>>>>>>> Tokenization and DelimiterAnalyzer should be straightforward to
>>>>>>>> implement on top of SAI.
>>>>>>>>
>>>>>>>> Can we list what configurations of SASI will become deprecated once SAI
>>>>>>>> becomes non-experimental?
>>>>>>>>
>>>>>>>> Except for "Like", "Tokenisation", and "DelimiterAnalyzer", the rest of
>>>>>>>> SASI can be replaced by SAI.
>>>>>>>>
>>>>>>>> Given a few bugs are open against 2i and SASI, can we provide some
>>>>>>>> overview, or rough indication, of how many of them we could "triage
>>>>>>>> away"?
>>>>>>>>
>>>>>>>> I believe most of the known bugs in 2i/SASI either have been addressed
>>>>>>>> in SAI or don't apply to SAI.
>>>>>>>>
>>>>>>>> And, is it time for the project to start introducing new SPI
>>>>>>>> implementations as separate sub-modules and jar files that are only
>>>>>>>> loaded at runtime based on configuration settings? (sorry for the
>>>>>>>> conflation on this one, but maybe it's the right time to raise it
>>>>>>>> :shrug:)
>>>>>>>>
>>>>>>>> Agreed that modularization is the way to go and will speed up module
>>>>>>>> development.
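
Java's standard `java.util.ServiceLoader` is one existing mechanism for the kind of runtime loading discussed here. The sketch below assumes a hypothetical `IndexImplementation` SPI and selects a provider by a configured name. In a real deployment the providers would come from jars on the classpath that register themselves under `META-INF/services`. Nothing here reflects an agreed design for Cassandra.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical SPI for an index implementation discoverable at runtime.
interface IndexImplementation {
    String name();
}

final class IndexImplementations {
    private IndexImplementations() {}

    // Production path (assumption): providers shipped as separate jars and
    // registered via META-INF/services are discovered here.
    static Iterable<IndexImplementation> discovered() {
        return ServiceLoader.load(IndexImplementation.class);
    }

    // Selects the implementation named in configuration, if one is present.
    static Optional<IndexImplementation> select(Iterable<IndexImplementation> candidates,
                                                String configuredName) {
        for (IndexImplementation impl : candidates) {
            if (impl.name().equals(configuredName)) {
                return Optional.of(impl);
            }
        }
        return Optional.empty();
    }
}
```

Passing the candidates as an `Iterable` keeps the selection logic testable without any classpath setup. In production, `discovered()` would be passed in instead of an explicit list.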
>>>>>>>> 
>>>>>>>> Does the community plan to open another discussion or CEP on
>>>>>>>> modularization?
>>>>>>>> 
>>>>>>>> On Mon, 24 Aug 2020 at 16:43, Mick Semb Wever <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Adding to Duy's questions…
>>>>>>>>
>>>>>>>> * Hardware specs
>>>>>>>>
>>>>>>>> SASI's performance, specifically the search in the B+ tree component,
>>>>>>>> depends a lot on the component file's header being available in the
>>>>>>>> pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI
>>>>>>>> bound to this same or similar limitation?
>>>>>>>>
>>>>>>>> Flushing of SASI can be CPU+IO intensive, to the point of saturation,
>>>>>>>> pauses, and crashes on the node. SSDs are a must, along with a bit of
>>>>>>>> tuning, just to avoid bringing down your cluster. Beyond reducing space
>>>>>>>> requirements, does SAI improve on these things? Like SASI, how does SAI,
>>>>>>>> in its own way, change/narrow the recommendations on node hardware
>>>>>>>> specs?
>>>>>>>>
>>>>>>>> * Code Maintenance
>>>>>>>>
>>>>>>>> I understand the desire in keeping out of scope the longer term
>>>>>>>> deprecation and migration plan, but… if SASI provides functionality
>>>>>>>> that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet
>>>>>>>> introduces a body of code ~somewhat similar, shouldn't we be roughly
>>>>>>>> sketching out how to reduce the maintenance surface area?
>>>>>>>>
>>>>>>>> Can we list what configurations of SASI will become deprecated once SAI
>>>>>>>> becomes non-experimental?
>>>>>>>>
>>>>>>>> Given a few bugs are open against 2i and SASI, can we provide some
>>>>>>>> overview, or rough indication, of how many of them we could "triage
>>>>>>>> away"?
>>>>>>>>
>>>>>>>> And, is it time for the project to start introducing new SPI
>>>>>>>> implementations as separate sub-modules and jar files that are only
>>>>>>>> loaded at runtime based on configuration settings? (sorry for the
>>>>>>>> conflation on this one, but maybe it's the right time to raise it
>>>>>>>> :shrug:)
>>>>>>>>
>>>>>>>> regards,
>>>>>>>> Mick
>>>>>>>> 
>>>>>>>> On Tue, 18 Aug 2020 at 13:05, DuyHai Doan <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Thank you, Zhao Yang, for starting this topic.
>>>>>>>>
>>>>>>>> After reading the short design doc, I have a few questions:
>>>>>>>>
>>>>>>>> 1) SASI was pretty inefficient at indexing wide partitions because the
>>>>>>>> index structure only retains the partition token, not the clustering
>>>>>>>> columns. As per the design doc, SAI has a row id mapping to partition
>>>>>>>> offset; can we hope that indexing wide partitions will be more
>>>>>>>> efficient with SAI? One detail that worries me is that at the beginning
>>>>>>>> of the design doc, it is said that the matching rows are post-filtered
>>>>>>>> while scanning the partition. Can you confirm or refute that SAI is
>>>>>>>> efficient with wide partitions and provides the partition offsets to
>>>>>>>> the matching rows?
>>>>>>>>
>>>>>>>> 2) About space efficiency, one of the biggest drawbacks of SASI was the
>>>>>>>> huge space required for the index structure when using CONTAINS logic,
>>>>>>>> because of the decomposition of text columns into n-grams. Will SAI
>>>>>>>> suffer from the same issue in future iterations? I'm anticipating a
>>>>>>>> bit.
>>>>>>>>
>>>>>>>> 3) If I'm querying using SAI and providing the complete partition key,
>>>>>>>> will it be more efficient than querying without the partition key? In
>>>>>>>> other words, does SAI provide any optimisation when the partition key
>>>>>>>> is specified?
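
To put rough numbers on question 2, here is a Java sketch that assumes a CONTAINS-style index stores every suffix of a value as its own term. The thread calls this n-gram decomposition, and the exact scheme is an assumption here. A value of n characters yields n suffixes totalling n(n+1)/2 characters. A 100-character value therefore already produces 5,050 characters of index terms before any prefix sharing or compression. `SuffixCost` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class SuffixCost {
    private SuffixCost() {}

    // Every suffix of the value becomes its own indexed term (assumption).
    static List<String> suffixes(String value) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < value.length(); i++) {
            out.add(value.substring(i));
        }
        return out;
    }

    // Total characters stored across all suffixes: n + (n-1) + ... + 1.
    static long totalSuffixChars(String value) {
        long total = 0;
        for (String s : suffixes(value)) {
            total += s.length();
        }
        return total;
    }
}
```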
>>>>>>>> 
>>>>>>>> Regards
>>>>>>>> 
>>>>>>>> Duy Hai DOAN
>>>>>>>> 
>>>>>>>> On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever <[email protected]> wrote:
>>>>>>>>
>>>>>>>> We are looking forward to the community's feedback and suggestions.
>>>>>>>>
>>>>>>>> What comes immediately to mind is testing requirements. It has been
>>>>>>>> mentioned already that the project's testability and QA guidelines are
>>>>>>>> inadequate to successfully introduce new features and refactorings to
>>>>>>>> the codebase. During the 4.0 beta phase this was intended to be
>>>>>>>> addressed, i.e. defining more specific QA guidelines for 4.0-rc. This
>>>>>>>> would be an important step towards QA guidelines for all changes and
>>>>>>>> CEPs post-4.0.
>>>>>>>>
>>>>>>>> Questions from me:
>>>>>>>>
>>>>>>>> - How will this be tested, and how will its QA status and lifecycle be
>>>>>>>> defined? (per above)
>>>>>>>>
>>>>>>>> - With existing C* code needing to be changed, what is the proposed plan
>>>>>>>> for making those changes while maintaining QA, e.g. are there separate
>>>>>>>> QA cycles planned for altering the SPI before adding a new SPI
>>>>>>>> implementation?
>>>>>>>>
>>>>>>>> - Despite being out of scope, it would be nice to have some idea from
>>>>>>>> the CEP author of when users might still choose afresh 2i or SASI over
>>>>>>>> SAI.
>>>>>>>>
>>>>>>>> - Who fills the roles involved? Who are the contributors in this
>>>>>>>> DataStax team? Who is the shepherd? Are there other stakeholders willing
>>>>>>>> to be involved?
>>>>>>>>
>>>>>>>> - Is there a preference to use gdoc instead of the project's wiki, and
>>>>>>>> why? (the CEP process suggests a wiki page, and feedback on why another
>>>>>>>> approach is considered better helps evolve the CEP process itself)
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Mick
>>>>>>>> 
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>> For additional commands, e-mail: [email protected]
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> alex p
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> alex p
