Re: [DISCUSS] CEP-39: Cost Based Optimizer

Benjamin Lerer Wed, 20 Dec 2023 07:53:13 -0800

>
> If we are to address that within the CEP itself then we should discuss it
> here, as I would like to fully understand the approach as well as how it
> relates to consistency of execution and the idea of triggering
> re-optimisation.



Sure, that was my plan.

> I’m not sold on the proposed set of characteristics, and think my coupling
> an execution plan to a given prepared statement for clients to supply is
> perhaps simpler to implement and maintain, and has corollary benefits -
> such as providing a mechanism for users to specify their own execution plan.
>
> Note, my proposal cuts across all of these elements of the CEP. There is no
> obvious need for a cross-cluster re-optimisation event or cross cluster
> statistic management.
>

I think that I am missing one part of your proposal. How do you plan to
build the initial execution plan for a prepared statement?

On Wed, 20 Dec 2023 at 14:05, Benedict <[email protected]> wrote:

> If we are to address that within the CEP itself then we should discuss it
> here, as I would like to fully understand the approach as well as how it
> relates to consistency of execution and the idea of triggering
> re-optimisation. These ideas are all interrelated.
>
> I’m not sold on the proposed set of characteristics, and think my coupling
> an execution plan to a given prepared statement for clients to supply is
> perhaps simpler to implement and maintain, and has corollary benefits -
> such as providing a mechanism for users to specify their own execution plan.
>
> Note, my proposal cuts across all of these elements of the CEP. There is
> no obvious need for a cross-cluster re-optimisation event or cross cluster
> statistic management.
>
> We still also need to discuss more concretely how the base statistics
> themselves will be derived, as there is little detail here today in the
> proposal.
>
> On 20 Dec 2023, at 12:58, Benjamin Lerer <[email protected]> wrote:
>
>
> After the second phase of the CEP, we will have two optimizer
> implementations. One will be similar to what we have today and the other
> one will be the CBO. As both implementations will sit behind the new
> Optimizer API interfaces, they will both support EXPLAIN and both
> benefit from the simplification/normalization rules, such as the
> ones that David mentioned.
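As a minimal Java sketch of "two optimizer implementations behind the new Optimizer API interfaces", assuming a single planning method whose string result doubles as EXPLAIN output: `Optimizer`, `HeuristicOptimizer`, `CostBasedOptimizer` and `Optimizers.fromConfig` are hypothetical names, not Cassandra's actual API, and the selectivity map stands in for real statistics. Defaulting to the heuristic implementation is one way to provide the "replace it with noop or disable it" switch requested elsewhere in this thread.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical optimizer contract; the returned string doubles as EXPLAIN output. */
interface Optimizer {
    String name();

    String plan(String table, List<String> restrictedColumns, List<String> indexedColumns);
}

/** Stand-in for today's behaviour: fixed rules, no statistics, fully deterministic. */
final class HeuristicOptimizer implements Optimizer {
    public String name() { return "heuristic"; }

    public String plan(String table, List<String> restricted, List<String> indexed) {
        // Take the first restricted column that has an index, in query order.
        for (String column : restricted) {
            if (indexed.contains(column)) return "IndexScan(" + table + "." + column + ")";
        }
        return "TableScan(" + table + ")";
    }
}

/** Toy CBO: picks the indexed column with the lowest estimated selectivity. */
final class CostBasedOptimizer implements Optimizer {
    private final Map<String, Double> selectivity; // hypothetical per-column estimates in [0, 1]

    CostBasedOptimizer(Map<String, Double> selectivity) { this.selectivity = selectivity; }

    public String name() { return "cost_based"; }

    public String plan(String table, List<String> restricted, List<String> indexed) {
        String best = null;
        double bestSelectivity = Double.MAX_VALUE;
        for (String column : restricted) {
            double s = selectivity.getOrDefault(column, 1.0); // unknown columns assumed unselective
            if (indexed.contains(column) && s < bestSelectivity) {
                best = column;
                bestSelectivity = s;
            }
        }
        return best == null ? "TableScan(" + table + ")" : "IndexScan(" + table + "." + best + ")";
    }
}

final class Optimizers {
    /** The heuristic optimizer is the default, so the CBO is strictly opt-in. */
    static Optimizer fromConfig(String configured, Supplier<Optimizer> costBased) {
        return "cost_based".equals(configured) ? costBased.get() : new HeuristicOptimizer();
    }
}
```

Without statistics the toy CBO makes the same choice as the heuristic one, which mirrors Benedict's point that there should be an implementation reproducing the status quo deterministically.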
>
> Regarding functions, we are already able to determine which ones are
> deterministic (
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/Function.java#L55).
> We simply do not take advantage of it.
>
> I removed the ALLOW FILTERING part and will open a discussion about it at
> the beginning of next year.
>
> Regarding the statistics management part, I would like to try to address
> it within the CEP itself, if feasible. If it turns out to be too
> complicated, I will separate it into its own CEP.
>
> On Tue, 19 Dec 2023 at 22:23, David Capwell <[email protected]> wrote:
>
>> even if the only outcome of all this work were to tighten up
>> inconsistencies in our grammar and provide more robust EXPLAIN and EXPLAIN
>> ANALYZE functionality to our end users, I think that would be highly
>> valuable
>>
>>
>> In my mental model a no-op optimizer just becomes what we have today
>> (since all new features really should be disabled by default, I would hope
>> we support this), so we benefit from having a logical AST + ability to
>> mutate it before we execute it and we can use this to make things nicer for
>> users (as you are calling out)
>>
>> Here is one example that stands out to me in Accord:
>>
>> LET a = (select * from tbl where pk=0);
>> Insert into tbl2 (pk, …) values (a.pk, …); -- this is not allowed as we
>> don’t know the primary key… but this could trivially be rewritten to replace
>> a.pk with 0…
>>
>> With this work we could also rethink which functions are deterministic and
>> which ones are not (not trying to bikeshed)… A simple example is “now”
>> (select now() from tbl; -- each row will have a different timestamp): if we
>> make this deterministic we can avoid calling it for each row and instead
>> just replace it with a constant for the query…
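A minimal Java sketch of the two rewrites David describes, applied once at planning time: replacing `a.pk` with the literal it was selected by, and evaluating `now()` once per query instead of once per row. The AST types (`Expr`, `Const`, `Ref`, `Call`) and `ConstantRewriter` are hypothetical; treating `now()` as query-constant is the semantic change under discussion, not current behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical, minimal expression AST for CQL value expressions. */
interface Expr {}

/** A literal value. */
record Const(Object value) implements Expr {}

/** A reference to a field of a LET-bound row, e.g. a.pk. */
record Ref(String binding, String field) implements Expr {}

/** A zero-argument function call such as now(). */
record Call(String function) implements Expr {}

/** Rewrites expressions once, at planning time, before any row is processed. */
final class ConstantRewriter {
    private final Map<String, Map<String, Object>> knownRows;      // LET values fixed by "where pk = <literal>"
    private final Map<String, Supplier<Object>> perQueryFunctions; // functions treated as constant per query
    private final Map<String, Object> evaluated = new HashMap<>();

    ConstantRewriter(Map<String, Map<String, Object>> knownRows,
                     Map<String, Supplier<Object>> perQueryFunctions) {
        this.knownRows = knownRows;
        this.perQueryFunctions = perQueryFunctions;
    }

    Expr rewrite(Expr e) {
        if (e instanceof Ref ref) {
            Map<String, Object> row = knownRows.get(ref.binding());
            // a.pk becomes the literal it was selected with, e.g. 0 for "where pk=0".
            if (row != null && row.containsKey(ref.field())) return new Const(row.get(ref.field()));
        } else if (e instanceof Call call && perQueryFunctions.containsKey(call.function())) {
            // Evaluate once and reuse the same value for every row of the query.
            Object value = evaluated.computeIfAbsent(call.function(), f -> perQueryFunctions.get(f).get());
            return new Const(value);
        }
        return e; // anything else is left untouched
    }
}
```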
>>
>> Even if the CBO is dropped in favor of no-op (what we do today), I still
>> see value in this work.
>>
>> I do think that the CBO really doesn’t solve the fact some features don’t
>> work well; if anything it could just mask it until it’s too late… If a user
>> builds an app using filtering and everything is going well in QA, but once
>> they see a spike in traffic in prod we start rejecting… this is a bad
>> user experience IMO… we KNOW you must think about this before you go this
>> route, so a CBO letting you ignore it till you hit a wall I don’t think is
>> the best (not saying ALLOW FILTERING is the solution to this… but it at
>> least is a signal to users to think through their data model).
>>
>>
>> On Dec 15, 2023, at 6:38 PM, Josh McKenzie <[email protected]> wrote:
>>
>> Goals
>>
>>    - Introduce a Cascades(2) query optimizer with rules easily
>>    extendable
>>    - Improve query performance for most common queries
>>    - Add support for EXPLAIN and EXPLAIN ANALYZE to help with query
>>    optimization and troubleshooting
>>    - Lay the groundwork for the addition of features like joins,
>>    subqueries, OR/NOT and index ordering
>>    - Put in place some performance benchmarks to validate query
>>    optimizations
>>
>> I think these are sensible goals. We're possibly going to face a
>> chicken-or-egg problem with a feature like this that so heavily intersects
>> with other as-yet unwritten features where much of the value is in the
>> intersection of them; if we continue down the current "one heuristic to
>> rule them all" query planning approach we have now, we'll struggle to
>> meaningfully explore or conceptualize the value of potential alternatives
>> different optimizers could present us. Flip side, to Benedict's point,
>> until SAI hits and/or some other potential future things we've all talked
>> about, this CBO would likely fall directly into the same path that we
>> effectively have hard-coded today (primary index path only).
>>
>> One thing I feel pretty strongly about: even if the only outcome of all
>> this work were to tighten up inconsistencies in our grammar and provide
>> more robust EXPLAIN and EXPLAIN ANALYZE functionality to our end users, I
>> think that would be highly valuable. This path of "only" would be
>> predicated on us not having successful introduction of a robust secondary
>> index implementation and a variety of other things we have a lot of
>> interest in, so I find it unlikely, but worth calling out.
>>
>> re: the removal of ALLOW FILTERING - is there room for compromise here
>> and instead converting it to a guardrail that defaults to being enabled?
>> That could theoretically give us a more gradual path to migration to a
>> cost-based guardrail for instance, and would preserve the current
>> robustness of the system while making it at least a touch more configurable.
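A sketch of Josh's suggestion, assuming the guardrail exposes three modes and that the cost-based mode receives an estimated number of rows scanned from the optimizer. `FilteringGuardrail`, its modes and the row threshold are hypothetical, not existing Cassandra guardrail settings; the default mode keeps today's ALLOW FILTERING behaviour.

```java
import java.util.Optional;

/** Hypothetical guardrail that could replace the hard-coded ALLOW FILTERING check. */
final class FilteringGuardrail {
    enum Mode {
        REQUIRE_ALLOW_FILTERING, // today's behaviour, and the default
        COST_BASED,              // reject only when the estimated scan is too large
        DISABLED                 // accept every filtering query
    }

    private final Mode mode;
    private final long maxEstimatedRowsScanned; // only read in COST_BASED mode

    FilteringGuardrail(Mode mode, long maxEstimatedRowsScanned) {
        this.mode = mode;
        this.maxEstimatedRowsScanned = maxEstimatedRowsScanned;
    }

    static FilteringGuardrail defaults() {
        return new FilteringGuardrail(Mode.REQUIRE_ALLOW_FILTERING, Long.MAX_VALUE);
    }

    /** Returns a rejection message, or empty if the query may run. */
    Optional<String> check(boolean needsFiltering, boolean allowFilteringGiven, long estimatedRowsScanned) {
        if (!needsFiltering || mode == Mode.DISABLED) return Optional.empty();
        if (mode == Mode.REQUIRE_ALLOW_FILTERING) {
            return allowFilteringGiven ? Optional.empty()
                    : Optional.of("query requires ALLOW FILTERING");
        }
        // COST_BASED: the keyword is irrelevant, only the optimizer's estimate matters.
        return estimatedRowsScanned <= maxEstimatedRowsScanned ? Optional.empty()
                : Optional.of("estimated " + estimatedRowsScanned + " rows scanned exceeds "
                        + maxEstimatedRowsScanned);
    }
}
```

Migration then becomes a configuration change (from `REQUIRE_ALLOW_FILTERING` to `COST_BASED`) rather than a grammar change.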
>>
>> On Fri, Dec 15, 2023, at 11:03 AM, Chris Lohfink wrote:
>>
>> Thanks for taking the time to address the concerns. At least with initial versions,
>> as long as there is a way to replace it with noop or disable it I would be
>> happy. This is pretty standard practice with features nowadays but I wanted
>> to highlight it as this might require some pretty tight coupling.
>>
>> Chris
>>
>> On Fri, Dec 15, 2023 at 7:57 AM Benjamin Lerer <[email protected]> wrote:
>>
>> Hey Chris,
>> You raise some valid points.
>>
>> I believe that there are 3 points that you mentioned:
>> 1) CQL restrictions are some form of safety net and should be kept
>> 2) A lot of Cassandra features do not scale and/or are too easy to use in
>> a wrong way that can make the whole system collapse. We should not add more
>> to that list. Especially not joins.
>> 3) Should we not start to fix features like secondary index rather than
>> adding new ones? Which is heavily linked to 2).
>>
>> Feel free to correct me if I got them wrong or missed one.
>>
>> Regarding 1), I believe that you refer to the "Removing unnecessary CQL
>> query limitations and inconsistencies" section. We are not planning to
>> remove any safety net here.
>> What we want to remove is a certain amount of limitations which make
>> things confusing for a user trying to write a query for no good reason.
>> Like "why can I define a column alias but not use it anywhere in my query?"
>> or "Why can I not create a list with 2 bind parameters?". While refactoring
>> some CQL code, I kept on finding those types of exceptions that we can
>> easily remove while simplifying the code at the same time.
>>
>> For 2), I agree that at a certain scale or for some scenarios, some
>> features simply do not scale or catch users by surprise. The goal of the
>> CEP is to improve things in 2 ways. One is by making Cassandra smarter in
>> the way it chooses how to process queries, hopefully improving its overall
>> scalability. The other is by being transparent about how Cassandra will
>> execute the queries through the use of EXPLAIN. One problem of GROUP BY for
>> example is that most users do not realize what is actually happening under
>> the hood and therefore its limitations. I do not believe that EXPLAIN will
>> change everything but it will help people to get a better understanding of
>> the limitations of some features.
>>
>> I do not know which features will be added in the future to C*. That will
>> be discussed through some future CEPs. Nevertheless, I do not believe that
>> it makes sense to write a CEP for a query optimizer without taking into
>> account that we might at some point add some level of support for joins or
>> subqueries. We have been too often delivering features without looking at
>> what could be the possible evolutions which resulted in code where adding
>> new features was more complex than it should have been. I do not want to
>> make the same mistake. I want to create an optimizer that can be improved
>> easily, and considering joins or other features simply helps to build things
>> in a more generic way.
>>
>> Regarding feature stabilization, I believe that it is happening. I have
>> heard plans for how to solve MVs, range queries, hot partitions, ... and
>> there was a lot of thinking behind those plans. Secondary indexes are being
>> worked on. We hope that the optimizer will also help with some index
>> queries.
>>
>> It seems to me that this proposal is going toward the direction that you
>> want without introducing new problems for scalability.
>>
>>
>>
>>
>> On Thu, 14 Dec 2023 at 16:47, Chris Lohfink <[email protected]> wrote:
>>
>> I don't wanna be a blocker for this CEP or anything but did want to put
>> my 2 cents in. This CEP is horrifying to me.
>>
>> I have seen thousands of clusters across multiple companies and helped
>> them get working successfully. A vast majority of that involved blocking
>> the use of MVs, GROUP BY, secondary indexes, and even just simple _range
>> queries_. The "unnecessary restrictions of CQL" are not only necessary IMHO;
>> more restrictions are necessary to be successful at scale. The idea of just
>> opening up CQL to general purpose relational queries, and lines like
>> "supporting queries with joins in an efficient way" ... I would really like us to
>> make secondary indexes be a viable option before we start opening up
>> floodgates on stuff like this.
>>
>> Chris
>>
>> On Thu, Dec 14, 2023 at 9:37 AM Benedict <[email protected]> wrote:
>>
>>
>> > So yes, this physical plan is the structure that you have in mind but
>> the idea of sharing it is not part of the CEP.
>>
>> I think it should be. This should form a major part of the API on which
>> any CBO is built.
>>
>> > It seems that there is a difference between the goal of your proposal
>> and the one of the CEP. The goal of the CEP is first to ensure optimal
>> performance. It is ok to change the execution plan for one that delivers
>> better performance. What we want to minimize is having a node performing
>> queries in an inefficient way for a long period of time.
>>
>> You have made a goal of the CEP synchronising summary statistics across
>> the whole cluster in order to achieve some degree of uniformity of query
>> plan. So this is explicitly a goal of the CEP, and synchronising summary
>> statistics is a hard problem and won’t provide strong guarantees.
>>
>> > The client side proposal targets consistency for a given query on a
>> given driver instance. In practice, it would be possible to have 2 similar
>> queries with 2 different execution plans on the same driver
>>
>> This would only be possible if the driver permitted it. A driver could
>> (and should) enforce that it only permits one query plan per query.
>>
>> The opposite is true for your proposal: some queries may begin degrading
>> because they touch specific replicas that optimise the query differently,
>> and this will be hard to debug.
>>
>>
>>
>> On 14 Dec 2023, at 15:30, Benjamin Lerer <[email protected]> wrote:
>>
>>
>> The binding of the parser output to the schema (what is today the
>> Raw.prepare call) will create the logical plan, expressed as a tree of
>> relational operators. Simplification and normalization will happen on that
>> tree to produce a new equivalent logical plan. That logical plan will be
>> used as input to the optimizer. The output will be a physical plan
>> producing the output specified by the logical plan: a tree of physical
>> operators specifying how the operations should be performed.
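The pipeline described here (bind to schema, normalize the logical tree, then produce a physical tree) can be sketched in Java as below, assuming predicates are plain strings and that at most one predicate is served from an index. All operator types and the `Planner` class are hypothetical illustrations, not the CEP's actual classes; the normalization rule shown (merging nested filters) is just one example of a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical logical operators: what the query computes. */
interface LogicalNode {}
record LogicalScan(String table) implements LogicalNode {}
record LogicalFilter(LogicalNode input, List<String> predicates) implements LogicalNode {}

/** Hypothetical physical operators: how it is computed; explain() is the EXPLAIN text. */
interface PhysicalNode { String explain(); }
record TableScan(String table, List<String> filter) implements PhysicalNode {
    public String explain() { return "TableScan(" + table + ") filter " + filter; }
}
record IndexScan(String table, String indexed, List<String> residual) implements PhysicalNode {
    public String explain() { return "IndexScan(" + table + ", " + indexed + ") filter " + residual; }
}

final class Planner {
    /** Normalization rule: merge nested filters into a single conjunctive filter. */
    static LogicalNode normalize(LogicalNode node) {
        if (!(node instanceof LogicalFilter outer)) return node;
        LogicalNode input = normalize(outer.input());
        if (input instanceof LogicalFilter inner) {
            List<String> merged = new ArrayList<>(inner.predicates());
            merged.addAll(outer.predicates());
            return new LogicalFilter(inner.input(), merged);
        }
        return new LogicalFilter(input, outer.predicates());
    }

    /** Physical planning: serve one indexed predicate from the index, apply the rest as a filter. */
    static PhysicalNode toPhysical(LogicalNode node, Set<String> indexedPredicates) {
        if (node instanceof LogicalScan scan) return new TableScan(scan.table(), List.of());
        if (node instanceof LogicalFilter f && f.input() instanceof LogicalScan source) {
            for (String p : f.predicates()) {
                if (indexedPredicates.contains(p)) {
                    List<String> residual = new ArrayList<>(f.predicates());
                    residual.remove(p);
                    return new IndexScan(source.table(), p, residual);
                }
            }
            return new TableScan(source.table(), f.predicates());
        }
        throw new IllegalArgumentException("unsupported plan shape: " + node);
    }
}
```

A real optimizer would enumerate several physical alternatives and compare their costs; this sketch only shows where the logical and physical trees sit relative to each other.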
>>
>> That physical plan will be stored as part of the statements
>> (SelectStatement, ModificationStatement, ...) in the prepared statement
>> cache. Upon execution, variables will be bound and the
>> RangeCommands/Mutations will be created based on the physical plan.
>>
>> The string representation of a physical plan will effectively represent
>> the output of an EXPLAIN statement but outside of that the physical plan
>> will stay encapsulated within the statement classes.
>> Hints will be parameters provided to the optimizer to enforce some
>> specific choices, like always using an Index Scan instead of a Table Scan,
>> ignoring the cost comparison.
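A sketch of a hint short-circuiting the cost comparison, under an assumed toy cost model: a table scan reads every row once, while an index scan reads only matching rows but pays a random-read penalty per row. `AccessPath`, `AccessPathChooser` and the cost formulas are hypothetical.

```java
import java.util.Optional;

/** Hypothetical access paths considered for a single-table read. */
enum AccessPath { TABLE_SCAN, INDEX_SCAN }

final class AccessPathChooser {
    // Toy cost model (an assumption, not CEP-39's): a table scan reads every row once;
    // an index scan reads only matching rows but pays randomReadFactor per row.
    static double tableScanCost(long rows) {
        return rows;
    }

    static double indexScanCost(long rows, double selectivity, double randomReadFactor) {
        return rows * selectivity * randomReadFactor;
    }

    /** A hint, when present, is applied as-is and the costs are never compared. */
    static AccessPath choose(long rows, double selectivity, double randomReadFactor,
                             Optional<AccessPath> hint) {
        if (hint.isPresent()) return hint.get();
        return indexScanCost(rows, selectivity, randomReadFactor) < tableScanCost(rows)
                ? AccessPath.INDEX_SCAN
                : AccessPath.TABLE_SCAN;
    }
}
```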
>>
>> So yes, this physical plan is the structure that you have in mind but the
>> idea of sharing it is not part of the CEP. I did not document it because it
>> will simply be a tree of physical operators used internally.
>>
>> > My proposal is that the execution plan of the coordinator that prepares a
>> > query gets serialised to the client, which then provides the execution plan
>> > to all future coordinators, and coordinators provide it to replicas as
>> > necessary.
>> >
>> > This means it is not possible for any conflict to arise for a single
>> > client. It would guarantee consistency of execution for any single client
>> > (and avoid any drift over the client’s sessions), without necessarily
>> > guaranteeing consistency for all clients.
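A minimal sketch of the client side of Benedict's proposal, assuming the plan travels as an opaque byte payload keyed by prepared-statement id and that the driver keeps only the first plan it sees for a statement (one way a driver could "only permit one query plan per query"). `PlanCodec` and `ClientPlanCache` are hypothetical; no existing driver API or protocol message is implied.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical serialization; a real one would use a compact binary form of the operator tree. */
final class PlanCodec {
    static byte[] encode(String explainText) { return explainText.getBytes(StandardCharsets.UTF_8); }

    static String decode(byte[] payload) { return new String(payload, StandardCharsets.UTF_8); }
}

/** Hypothetical driver-side cache: exactly one opaque plan per prepared statement id. */
final class ClientPlanCache {
    private final Map<String, byte[]> plans = new ConcurrentHashMap<>();

    /** Called with the payload returned at PREPARE time; the first plan seen wins. */
    byte[] register(String statementId, byte[] payload) {
        byte[] existing = plans.putIfAbsent(statementId, payload);
        return existing != null ? existing : payload;
    }

    /** Attached to every EXECUTE, so any coordinator (and its replicas) runs the same plan. */
    byte[] payloadFor(String statementId) {
        byte[] payload = plans.get(statementId);
        if (payload == null) throw new IllegalStateException("not prepared: " + statementId);
        return payload;
    }
}
```

The trade-off discussed in the thread shows up directly: consistency holds per `ClientPlanCache` instance, while two clients may still hold different plans for the same query.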
>>
>>
>> It seems that there is a difference between the goal of your proposal
>> and the one of the CEP. The goal of the CEP is first to ensure optimal
>> performance. It is ok to change the execution plan for one that delivers
>> better performance. What we want to minimize is having a node performing
>> queries in an inefficient way for a long period of time.
>>
>> The client side proposal targets consistency for a given query on a given
>> driver instance. In practice, it would be possible to have 2 similar
>> queries with 2 different execution plans on the same driver making things
>> really confusing. Identifying the source of an inefficient query will also
>> be pretty hard.
>>
>> Interestingly, having 2 nodes with 2 different execution plans might not
>> be a serious problem. It simply means that based on cardinality at t1, the
>> optimizer on node 1 chose plan 1 while the one on node 2 chose plan 2 at
>> t2. In practice, if the cost estimates properly reflect the actual cost,
>> those 2 plans should have pretty similar efficiency. The problem is more
>> about the fact that you would ideally want a uniform behavior around your
>> cluster.
>> Changes of execution plan should only occur at certain points (where the
>> estimated costs of two candidate plans cross). So the main problematic
>> scenario is when the data distribution is around one of those points, which
>> is also the point where the change should have the least impact.
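To make the switch-point argument concrete, assume a toy per-row cost model where a table scan costs 1 and an index scan costs `selectivity * f`, with `f` a random-read penalty. The plans swap at selectivity `1/f`, and the relative extra cost of picking the "wrong" plan shrinks to zero at that point. `PlanSwitchPoint` and its formulas are hypothetical illustrations only.

```java
/** Toy cost model (assumption): per row, a table scan costs 1, an index scan costs selectivity * f. */
final class PlanSwitchPoint {
    /** Selectivity at which both plans have the same estimated cost. */
    static double crossover(double randomReadFactor) {
        return 1.0 / randomReadFactor;
    }

    /** Relative extra cost of the worse plan over the better one; requires selectivity > 0. */
    static double regret(double selectivity, double randomReadFactor) {
        double table = 1.0;
        double index = selectivity * randomReadFactor;
        return Math.abs(index - table) / Math.min(index, table);
    }
}
```

With `f = 10`, two nodes whose statistics straddle selectivity 0.1 may choose different plans, but near that point the worse plan costs only a few percent more; far from it (e.g. selectivity 0.01) the nodes agree anyway.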
>>
>>
>>
>> On Thu, 14 Dec 2023 at 11:38, Benedict <[email protected]> wrote:
>>
>>
>> There surely needs to be a more succinct and abstract representation in
>> order to perform transformations on the query plan? You don’t intend to
>> manipulate the object graph directly as you apply any transformations when
>> performing simplification or cost based analysis? This would also (I
>> expect) be the form used to support EXPLAIN functionality, and probably
>> also HINTs etc. This would ideally *not* be coupled to the CBO itself,
>> and would ideally be succinctly serialised.
>>
>> I would very much expect the query plan to be represented abstractly as
>> part of this work, and for there to be a mechanism that translates this
>> abstract representation into the object graph that executes it.
>>
>> If I’m incorrect, could you please elaborate more specifically how you
>> intend to go about this?
>>
>>
>> On 14 Dec 2023, at 10:33, Benjamin Lerer <[email protected]> wrote:
>>
>>
>>
>> > I mean that an important part of this work - not specified in the CEP
>> > (AFAICT) - should probably be to define some standard execution model, that
>> > we can manipulate and serialise, for use across (and without) optimisers.
>>
>>
>> I am confused because for me an execution model defines how operations
>> are executed within the database in a conceptual way, which is not
>> something that this CEP intends to change. Do you mean the
>> physical/execution plan?
>> Today this plan is, to some extent, represented for reads by the SelectStatement
>> and its components (Selections, StatementRestrictions, ...). It is then
>> converted at execution time after parameter binding into a ReadCommand
>> which is sent to the replicas.
>> We plan to refactor SelectStatement and its components but the
>> ReadCommands change should be relatively small. What you are proposing is
>> not part of the scope of this CEP.
>>
>> On Thu, 14 Dec 2023 at 10:24, Benjamin Lerer <[email protected]> wrote:
>>
>> > Can you share the reasons why Apache Calcite is not suitable for this
>> > case and why it was rejected?
>>
>>
>> My understanding is that Calcite was made for two main things: to help
>> with optimizing SQL-like languages and to let people query different kinds
>> of data sources together.
>>
>> We could think about using it for our needs, but there are some big
>> problems:
>>
>>    1. CQL is not SQL. There are significant differences between the 2
>>       languages.
>>    2. Cassandra has its own specificities that will influence the cost
>>       model and the way we deal with optimizations: partitions, replication
>>       factors, consistency levels, LSM tree storage, ...
>>    3. Every framework comes with its own limitations and additional cost.
>>
>>    Every framework comes with its own limitations and additional cost
>>
>> From my view, there are too many big differences between what Calcite
>> does and what we need in Cassandra. If we used Calcite, it would also mean
>> relying a lot on another system that everyone would have to learn and
>> adjust to. The problems and extra work this would bring don't seem worth
>> the benefits we might get.
>>
>>
>> On Wed, 13 Dec 2023 at 18:06, Benjamin Lerer <[email protected]> wrote:
>>
>> One thing that I did not mention is the fact that this CEP is only a high
>> level proposal. There will be deeper discussions on the dev list around the
>> different parts of this proposal when we reach those parts and have enough
>> details to make those discussions more meaningful.
>>
>>
>> > The maintenance and distribution of summary statistics in particular is
>> > worthy of its own CEP, and it might be preferable to split it out.
>>
>>
>> For maintaining node statistics the idea is to re-use the current
>> Memtable/SSTable mechanism and rely on mergeable statistics. That will
>> allow us to easily build node level statistics for a given table by merging
>> all the statistics of its memtable and SSTables. For the distribution of
>> these node statistics we are still exploring different options. We can come
>> back with a precise proposal once we have hammered out all the details.
>> Is this a blocker for this CEP for you, or do you just want to make sure
>> that this part is discussed in more detail before we implement it?
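A sketch of "mergeable statistics", assuming each memtable and SSTable keeps, per column, a row count, min/max and an equi-width histogram over a value range fixed in advance; the fixed range is what makes the merge an element-wise sum. `ColumnStats` and its layout are hypothetical. One limitation of this naive merge: a row present in several SSTables (overwrites, tombstones) is counted once per component, so merged counts are upper bounds.

```java
import java.util.List;

/** Hypothetical per-component column statistics; all components share [lo, hi] and bucket count. */
record ColumnStats(long rows, long min, long max, long[] buckets) {
    /** Builds statistics for one memtable or SSTable from its values. */
    static ColumnStats of(long lo, long hi, int nBuckets, long... values) {
        long[] b = new long[nBuckets];
        long mn = Long.MAX_VALUE, mx = Long.MIN_VALUE; // identities for min/max when empty
        for (long v : values) {
            int i = (int) ((v - lo) * nBuckets / (hi - lo + 1));
            b[Math.min(Math.max(i, 0), nBuckets - 1)]++; // clamp out-of-range values
            mn = Math.min(mn, v);
            mx = Math.max(mx, v);
        }
        return new ColumnStats(values.length, mn, mx, b);
    }

    /** Merging is associative and commutative, so components can be combined in any order. */
    ColumnStats merge(ColumnStats o) {
        long[] b = new long[buckets.length];
        for (int i = 0; i < b.length; i++) b[i] = buckets[i] + o.buckets[i];
        return new ColumnStats(rows + o.rows, Math.min(min, o.min), Math.max(max, o.max), b);
    }

    /** Node-level view of a table: merge the memtable's and every SSTable's statistics. */
    static ColumnStats mergeAll(List<ColumnStats> components) {
        return components.stream().reduce(ColumnStats::merge).orElseThrow();
    }
}
```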
>>
>>
>>
>>
>> > The proposal also seems to imply we are aiming for coordinators to all
>> > make the same decision for a query, which I think is challenging, and it
>> > would be worth fleshing out the design here a little (perhaps just in Jira).
>>
>>
>>
>> The goal is that the large majority of nodes preparing a query at a given
>> point in time should make the same decision and that over time all nodes
>> should converge toward the same decision. This part is dependent on the
>> node statistics distribution, the cost model and the triggers for
>> re-optimization (that will require some experimentation).
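One possible re-optimization trigger, sketched under the assumption that each cached plan remembers the row estimate it was planned with and is re-planned once the estimate drifts by more than a fixed factor. `CachedPlan` and the drift rule are hypothetical. Because the trigger depends only on the statistics, nodes with similar merged statistics would tend to re-plan at similar moments, which is the kind of convergence this paragraph describes.

```java
/** Hypothetical prepared-statement cache entry that remembers the statistics it was planned with. */
final class CachedPlan {
    final String plan;
    final long estimatedRowsAtPlanning;

    CachedPlan(String plan, long estimatedRowsAtPlanning) {
        this.plan = plan;
        this.estimatedRowsAtPlanning = estimatedRowsAtPlanning;
    }

    /**
     * Re-optimize once the table's estimated size has drifted by more than driftFactor in
     * either direction; the factor is an assumed tuning knob to be set experimentally.
     */
    boolean needsReoptimization(long currentEstimatedRows, double driftFactor) {
        double planned = Math.max(1, estimatedRowsAtPlanning); // avoid dividing by zero
        double current = Math.max(1, currentEstimatedRows);
        return Math.max(planned, current) / Math.min(planned, current) > driftFactor;
    }
}
```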
>>
>> > There’s also not much discussion of the execution model: I think it would
>> > make most sense for this to be independent of any cost and optimiser models
>> > (though they might want to operate on them), so that EXPLAIN and hints can
>> > work across optimisers (a suitable hint might essentially bypass the
>> > optimiser, if the optimiser permits it, by providing a standard execution
>> > model)
>>
>>
>> It is not clear to me what you mean by "a standard execution model"?
>> Otherwise, we were not planning to have the execution model or the hints
>> depending on the optimizer.
>>
>> > I think it would be worth considering providing the execution plan to the
>> > client as part of query preparation, as an opaque payload to supply to
>> > coordinators on first contact, as this might simplify the problem of
>> > ensuring queries behave the same without adopting a lot of complexity for
>> > synchronising statistics (which will never provide strong guarantees). Of
>> > course, re-preparing a query might lead to a new plan, though any
>> > coordinators with the query in their cache should be able to retrieve it
>> > cheaply. If the execution model is efficiently serialised this might have
>> > the ancillary benefit of improving the occupancy of our prepared query
>> > cache.
>>
>>
>> I am not sure that I understand your proposal. If 2 nodes build a
>> different execution plan how do you solve that conflict?
>>
>> On Wed, 13 Dec 2023 at 09:55, Benedict <[email protected]> wrote:
>>
>>
>> A CBO can only make worse decisions than the status quo for what I
>> presume are the majority of queries - i.e. those that touch only primary
>> indexes. In general, there are plenty of use cases that prefer determinism.
>> So I agree that there should at least be a CBO implementation that makes
>> the same decisions as the status quo, deterministically.
>>
>> I do support the proposal, but would like to see some elements discussed
>> in more detail. The maintenance and distribution of summary statistics in
>> particular is worthy of its own CEP, and it might be preferable to split it
>> out. The proposal also seems to imply we are aiming for coordinators to all
>> make the same decision for a query, which I think is challenging, and it
>> would be worth fleshing out the design here a little (perhaps just in Jira).
>>
>> While I’m not a fan of ALLOW FILTERING, I’m not convinced that this CEP
>> deprecates it. It is a concrete qualitative guard rail, that I expect some
>> users will prefer to a cost-based guard rail. Perhaps this could be left to
>> the CBO to decide how to treat.
>>
>> There’s also not much discussion of the execution model: I think it would
>> make most sense for this to be independent of any cost and optimiser models
>> (though they might want to operate on them), so that EXPLAIN and hints can
>> work across optimisers (a suitable hint might essentially bypass the
>> optimiser, if the optimiser permits it, by providing a standard execution
>> model)
>>
>> I think it would be worth considering providing the execution plan to the
>> client as part of query preparation, as an opaque payload to supply to
>> coordinators on first contact, as this might simplify the problem of
>> ensuring queries behave the same without adopting a lot of complexity for
>> synchronising statistics (which will never provide strong guarantees). Of
>> course, re-preparing a query might lead to a new plan, though any
>> coordinators with the query in their cache should be able to retrieve it
>> cheaply. If the execution model is efficiently serialised this might have
>> the ancillary benefit of improving the occupancy of our prepared query
>> cache.
>>
>>
>> On 13 Dec 2023, at 00:44, Jon Haddad <[email protected]> wrote:
>>
>>
>> I think it makes sense to see what the actual overhead is of CBO before
>> making the assumption it'll be so high that we need to have two code
>> paths.  I'm happy to provide thorough benchmarking and analysis when it
>> reaches a testing phase.
>>
>> I'm excited to see where this goes.  I think it sounds very forward
>> looking and opens up a lot of possibilities.
>>
>> Jon
>>
>> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell <[email protected]> wrote:
>>
>> Nothing expresses my thoughts better than +1;
>> it feels like it means a lot to Cassandra.
>>
>> I have a question. Is it easy to turn off the CBO or bypass it in
>> some way? Some simple read and write requests will have better
>> performance without the CBO, which is also an advantage of Cassandra compared
>> to some RDBMSs.
>>
>> On Wed, 13 Dec 2023 at 3:37 AM, David Capwell <[email protected]> wrote:
>>
>> Overall LGTM.
>>
>>
>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer <[email protected]> wrote:
>>
>> Hi everybody,
>>
>> I would like to open the discussion on the introduction of a cost based
>> optimizer to allow Cassandra to pick the best execution plan based on the
>> data distribution, thereby improving overall query performance.
>>
>> This CEP should also lay the groundwork for the future addition of
>> features like joins, subqueries, OR/NOT and index ordering.
>>
>> The proposal is here:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>
>> Thank you in advance for your feedback.
>>
>>
>>
