> If we are to address that within the CEP itself then we should discuss it here, as I would like to fully understand the approach as well as how it relates to consistency of execution and the idea of triggering re-optimisation.

Sure, that was my plan.

> I’m not sold on the proposed set of characteristics, and think my proposal of coupling an execution plan to a given prepared statement, for clients to supply, is perhaps simpler to implement and maintain, and has corollary benefits - such as providing a mechanism for users to specify their own execution plan.
>
> Note, my proposal cuts across all of these elements of the CEP. There is no obvious need for a cross-cluster re-optimisation event or cross-cluster statistics management.

I think that I am missing one part of your proposal. How do you plan to build the initial execution plan for a prepared statement?

On Wed, 20 Dec 2023 at 14:05, Benedict <bened...@apache.org> wrote:

> If we are to address that within the CEP itself then we should discuss it here, as I would like to fully understand the approach as well as how it relates to consistency of execution and the idea of triggering re-optimisation. These ideas are all interrelated.
>
> I’m not sold on the proposed set of characteristics, and think my proposal of coupling an execution plan to a given prepared statement, for clients to supply, is perhaps simpler to implement and maintain, and has corollary benefits - such as providing a mechanism for users to specify their own execution plan.
>
> Note, my proposal cuts across all of these elements of the CEP. There is no obvious need for a cross-cluster re-optimisation event or cross-cluster statistics management.
>
> We still also need to discuss more concretely how the base statistics themselves will be derived, as there is little detail here today in the proposal.
>
> On 20 Dec 2023, at 12:58, Benjamin Lerer <b.le...@gmail.com> wrote:
>
> After the second phase of the CEP, we will have two optimizer implementations. One will be similar to what we have today and the other one will be the CBO.
> As those implementations will be behind the new Optimizer API interfaces, they will both have support for EXPLAIN and they will both benefit from the simplification/normalization rules, such as the ones that David mentioned.
>
> Regarding functions, we are already able to determine which ones are deterministic (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/Function.java#L55). We simply do not take advantage of it.
>
> I removed the ALLOW FILTERING part and will open a discussion about it at the beginning of next year.
>
> Regarding the statistics management part, I would like to try to address it within the CEP itself, if feasible. If it turns out to be too complicated, I will separate it into its own CEP.
>
> On Tue, 19 Dec 2023 at 22:23, David Capwell <dcapw...@apple.com> wrote:
>
>> > even if the only outcome of all this work were to tighten up inconsistencies in our grammar and provide more robust EXPLAIN and EXPLAIN ANALYZE functionality to our end users, I think that would be highly valuable
>>
>> In my mental model a no-op optimizer just becomes what we have today (since all new features really should be disabled by default, I would hope we support this), so we benefit from having a logical AST + ability to mutate it before we execute it and we can use this to make things nicer for users (as you are calling out)
>>
>> Here is one example that stands out to me in Accord
>>
>> LET a = (select * from tbl where pk=0);
>> Insert into tbl2 (pk, …) values (a.pk, …); -- this is not allowed as we don’t know the primary key… but this could trivially be rewritten to replace a.pk with 0…
>>
>> With this work we could also rethink what functions are deterministic and which ones are not (not trying to bike shed)… simple example is “now” (select now() from tbl; -- each row will have a different timestamp), if we make this deterministic we can avoid calling it for each row and instead just replace it with a constant for the query…
>>
>> Even if the CBO is dropped in favor of no-op (what we do today), I still see value in this work.
>>
>> I do think that the CBO really doesn’t solve the fact some features don’t work well, if anything it could just mask it until it’s too late…. If a user builds an app using filtering and everything is going well in QA, but once they see a spike in traffic in prod we start rejecting… this is a bad user experience IMO… we KNOW you must think about this before you go this route, so a CBO letting you ignore it till you hit a wall I don’t think is the best (not saying ALLOW FILTERING is the solution to this… but it at least is a signal to users to think through their data model).
>>
>> On Dec 15, 2023, at 6:38 PM, Josh McKenzie <jmcken...@apache.org> wrote:
>>
>> Goals
>>
>> - Introduce a Cascades(2) query optimizer with rules easily extendable
>> - Improve query performance for most common queries
>> - Add support for EXPLAIN and EXPLAIN ANALYZE to help with query optimization and troubleshooting
>> - Lay the groundwork for the addition of features like joins, subqueries, OR/NOT and index ordering
>> - Put in place some performance benchmarks to validate query optimizations
>>
>> I think these are sensible goals. We're possibly going to face a chicken-or-egg problem with a feature like this that so heavily intersects with other as-yet unwritten features where much of the value is in the intersection of them; if we continue down the current "one heuristic to rule them all" query planning approach we have now, we'll struggle to meaningfully explore or conceptualize the value of potential alternatives different optimizers could present us.
>> Flip side, to Benedict's point, until SAI hits and/or some other potential future things we've all talked about, this CBO would likely fall directly into the same path that we effectively have hard-coded today (primary index path only).
>>
>> One thing I feel pretty strongly about: even if the only outcome of all this work were to tighten up inconsistencies in our grammar and provide more robust EXPLAIN and EXPLAIN ANALYZE functionality to our end users, I think that would be highly valuable. This path of "only" would be predicated on us not having successful introduction of a robust secondary index implementation and a variety of other things we have a lot of interest in, so I find it unlikely, but worth calling out.
>>
>> re: the removal of ALLOW FILTERING - is there room for compromise here, instead converting it to a guardrail that defaults to being enabled? That could theoretically give us a more gradual path to migration to a cost-based guardrail for instance, and would preserve the current robustness of the system while making it at least a touch more configurable.
>>
>> On Fri, Dec 15, 2023, at 11:03 AM, Chris Lohfink wrote:
>>
>> Thanks for taking the time to address concerns. At least with initial versions, as long as there is a way to replace it with a no-op or disable it I would be happy. This is pretty standard practice with features nowadays but I wanted to highlight it as this might require some pretty tight coupling.
>>
>> Chris
>>
>> On Fri, Dec 15, 2023 at 7:57 AM Benjamin Lerer <ble...@apache.org> wrote:
>>
>> Hey Chris,
>> You raise some valid points.
>>
>> I believe that there are 3 points that you mentioned:
>> 1) CQL restrictions are some form of safety net and should be kept
>> 2) A lot of Cassandra features do not scale and/or are too easy to use in a wrong way that can make the whole system collapse. We should not add more to that list. Especially not joins.
>> 3) Should we not start to fix features like secondary indexes rather than adding new ones? Which is heavily linked to 2).
>>
>> Feel free to correct me if I got them wrong or missed one.
>>
>> Regarding 1), I believe that you refer to the "Removing unnecessary CQL query limitations and inconsistencies" section. We are not planning to remove any safety net here.
>> What we want to remove is a number of limitations which, for no good reason, make things confusing for a user trying to write a query. Like "Why can I define a column alias but not use it anywhere in my query?" or "Why can I not create a list with 2 bind parameters?". While refactoring some CQL code, I kept on finding those types of exceptions that we can easily remove while simplifying the code at the same time.
>>
>> For 2), I agree that at a certain scale or for some scenarios, some features simply do not scale or catch users by surprise. The goal of the CEP is to improve things in 2 ways. One is by making Cassandra smarter in the way it chooses how to process queries, hopefully improving its overall scalability. The other is by being transparent about how Cassandra will execute the queries through the use of EXPLAIN. One problem of GROUP BY, for example, is that most users do not realize what is actually happening under the hood and therefore do not realize its limitations. I do not believe that EXPLAIN will change everything but it will help people to get a better understanding of the limitations of some features.
>>
>> I do not know which features will be added in the future to C*. That will be discussed through some future CEPs. Nevertheless, I do not believe that it makes sense to write a CEP for a query optimizer without taking into account that we might at some point add some level of support for joins or subqueries.
>> We have too often delivered features without looking at what could be the possible evolutions, which resulted in code where adding new features was more complex than it should have been. I do not want to make the same mistake. I want to create an optimizer that can be improved easily, and considering joins or other features simply helps to build things in a more generic way.
>>
>> Regarding feature stabilization, I believe that it is happening. I have heard plans for how to solve MVs, range queries, hot partitions, ... and there was a lot of thinking behind those plans. Secondary indexes are being worked on. We hope that the optimizer will also help with some index queries.
>>
>> It seems to me that this proposal is going in the direction that you want without introducing new problems for scalability.
>>
>> On Thu, 14 Dec 2023 at 16:47, Chris Lohfink <clohfin...@gmail.com> wrote:
>>
>> I don't wanna be a blocker for this CEP or anything but did want to put my 2 cents in. This CEP is horrifying to me.
>>
>> I have seen thousands of clusters across multiple companies and helped get them working successfully. A vast majority of that involved blocking the use of MVs, GROUP BY, secondary indexes, and even just simple _range queries_. The "unnecessary restrictions of cql" are not only necessary IMHO, more restrictions are necessary to be successful at scale. The idea of just opening up CQL to general purpose relational queries and lines like "supporting queries with joins in an efficient way" ... I would really like us to make secondary indexes be a viable option before we start opening up floodgates on stuff like this.
>>
>> Chris
>>
>> On Thu, Dec 14, 2023 at 9:37 AM Benedict <bened...@apache.org> wrote:
>>
>> > So yes, this physical plan is the structure that you have in mind but the idea of sharing it is not part of the CEP.
>>
>> I think it should be.
>> This should form a major part of the API on which any CBO is built.
>>
>> > It seems that there is a difference between the goal of your proposal and the one of the CEP. The goal of the CEP is first to ensure optimal performance. It is ok to change the execution plan for one that delivers better performance. What we want to minimize is having a node performing queries in an inefficient way for a long period of time.
>>
>> You have made a goal of the CEP synchronising summary statistics across the whole cluster in order to achieve some degree of uniformity of query plan. So this is explicitly a goal of the CEP, and synchronising summary statistics is a hard problem and won’t provide strong guarantees.
>>
>> > The client side proposal targets consistency for a given query on a given driver instance. In practice, it would be possible to have 2 similar queries with 2 different execution plans on the same driver
>>
>> This would only be possible if the driver permitted it. A driver could (and should) enforce that it only permits one query plan per query.
>>
>> The opposite is true for your proposal: some queries may begin degrading because they touch specific replicas that optimise the query differently, and this will be hard to debug.
>>
>> On 14 Dec 2023, at 15:30, Benjamin Lerer <b.le...@gmail.com> wrote:
>>
>> The binding of the parser output to the schema (what is today the Raw.prepare call) will create the logical plan, expressed as a tree of relational operators. Simplification and normalization will happen on that tree to produce a new equivalent logical plan. That logical plan will be used as input to the optimizer. The output will be a physical plan producing the output specified by the logical plan: a tree of physical operators specifying how the operations should be performed.
>>
>> That physical plan will be stored as part of the statements (SelectStatement, ModificationStatement, ...) in the prepared statement cache. Upon execution, variables will be bound and the RangeCommands/Mutations will be created based on the physical plan.
>>
>> The string representation of a physical plan will effectively represent the output of an EXPLAIN statement but outside of that the physical plan will stay encapsulated within the statement classes.
>> Hints will be parameters provided to the optimizer to enforce some specific choices, such as always using an Index Scan instead of a Table Scan, ignoring the cost comparison.
>>
>> So yes, this physical plan is the structure that you have in mind but the idea of sharing it is not part of the CEP. I did not document it because it will simply be a tree of physical operators used internally.
>>
>> > My proposal is that the execution plan of the coordinator that prepares a query gets serialised to the client, which then provides the execution plan to all future coordinators, and coordinators provide it to replicas as necessary.
>> >
>> > This means it is not possible for any conflict to arise for a single client. It would guarantee consistency of execution for any single client (and avoid any drift over the client’s sessions), without necessarily guaranteeing consistency for all clients.
>>
>> It seems that there is a difference between the goal of your proposal and the one of the CEP. The goal of the CEP is first to ensure optimal performance. It is ok to change the execution plan for one that delivers better performance. What we want to minimize is having a node performing queries in an inefficient way for a long period of time.
>>
>> The client side proposal targets consistency for a given query on a given driver instance. In practice, it would be possible to have 2 similar queries with 2 different execution plans on the same driver, making things really confusing. Identifying the source of an inefficient query will also be pretty hard.
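
To make the pipeline described in this message easier to picture (logical plan in, physical plan out, string form used for EXPLAIN, hints overriding the cost comparison), here is a minimal sketch. Every name in it (`LogicalSelect`, `PhysicalPlan`, `optimize`, `force_index`) is hypothetical and does not come from Cassandra or the CEP, and the cost formula is an invented toy, not the CEP's cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalSelect:
    """Logical plan: states what to return, not how to fetch it."""
    table: str
    column: str
    value: object


@dataclass(frozen=True)
class PhysicalPlan:
    """Physical plan: states how the logical plan will be executed."""
    operator: str  # "IndexScan" or "TableScan"
    table: str
    predicate: str

    def explain(self) -> str:
        # The string form of the physical plan doubles as EXPLAIN output.
        return f"{self.operator}({self.table}, {self.predicate})"


def optimize(query, table_rows, matching_rows, indexed, force_index=False):
    """Pick a physical operator for a single-predicate SELECT."""
    predicate = f"{query.column} = {query.value!r}"
    if not indexed:
        # Without an index there is only one candidate plan.
        return PhysicalPlan("TableScan", query.table, predicate)
    # Toy cost model (assumption): a table scan touches every row, while an
    # index scan pays a 4x penalty per matching row for the extra lookup.
    table_cost = table_rows
    index_cost = 4 * matching_rows
    # A hint bypasses the cost comparison entirely.
    if force_index or index_cost < table_cost:
        return PhysicalPlan("IndexScan", query.table, predicate)
    return PhysicalPlan("TableScan", query.table, predicate)


query = LogicalSelect("users", "city", "Paris")
print(optimize(query, table_rows=1000, matching_rows=10, indexed=True).explain())
```

In this sketch a selective predicate picks the index scan, a non-selective one falls back to the table scan, and `force_index=True` plays the role of a hint that ignores the comparison.
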
>>
>> Interestingly, having 2 nodes with 2 different execution plans might not be a serious problem. It simply means that based on cardinality at t1, the optimizer on node 1 chose plan 1 while the one on node 2 chose plan 2 at t2. In practice, if the cost estimates properly reflect the actual cost, those 2 plans should have pretty similar efficiency. The problem is more about the fact that you would ideally want a uniform behavior across your cluster.
>> Changes of execution plans should only occur at certain points. So the main problematic scenario is when the data distribution is around one of those points, which is also the point where the change should have the least impact.
>>
>> On Thu, 14 Dec 2023 at 11:38, Benedict <bened...@apache.org> wrote:
>>
>> There surely needs to be a more succinct and abstract representation in order to perform transformations on the query plan? You don’t intend to manipulate the object graph directly as you apply any transformations when performing simplification or cost based analysis? This would also (I expect) be the form used to support EXPLAIN functionality, and probably also HINTs etc. This would ideally *not* be coupled to the CBO itself, and would ideally be succinctly serialised.
>>
>> I would very much expect the query plan to be represented abstractly as part of this work, and for there to be a mechanism that translates this abstract representation into the object graph that executes it.
>>
>> If I’m incorrect, could you please elaborate more specifically how you intend to go about this?
>>
>> On 14 Dec 2023, at 10:33, Benjamin Lerer <b.le...@gmail.com> wrote:
>>
>> > I mean that an important part of this work - not specified in the CEP (AFAICT) - should probably be to define some standard execution model, that we can manipulate and serialise, for use across (and without) optimisers.
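
As an illustration of what "an abstract plan representation that can be manipulated and serialised, independent of any optimiser" could look like, here is a small sketch. It is only one possible reading of the idea: `PlanNode`, `to_payload`, `from_payload` and `plan_id` are hypothetical names, and JSON plus SHA-256 are stand-ins chosen for brevity, not anything the thread or the CEP specifies.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanNode:
    """One operator in an optimiser-independent plan tree."""
    operator: str
    args: tuple = ()
    children: tuple = ()


def to_payload(node: PlanNode) -> bytes:
    """Serialise a plan tree to a canonical byte string (opaque to clients)."""
    def encode(n):
        return {
            "op": n.operator,
            "args": list(n.args),
            "children": [encode(c) for c in n.children],
        }
    # sort_keys and fixed separators keep the encoding deterministic.
    text = json.dumps(encode(node), sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def from_payload(payload: bytes) -> PlanNode:
    """Rebuild the plan tree on whichever coordinator receives the payload."""
    def decode(d):
        children = tuple(decode(c) for c in d["children"])
        return PlanNode(d["op"], tuple(d["args"]), children)
    return decode(json.loads(payload.decode("utf-8")))


def plan_id(payload: bytes) -> str:
    """Stable identifier, so equal plans can share one cache entry."""
    return hashlib.sha256(payload).hexdigest()


plan = PlanNode("Limit", ("10",), (PlanNode("IndexScan", ("users", "city_idx")),))
payload = to_payload(plan)
print(plan_id(payload)[:12], from_payload(payload) == plan)
```

Because the payload is deterministic, two coordinators that decode it rebuild the same tree, and a digest of the payload could serve as a cache key. Whether such a structure belongs in the CEP is exactly what the thread is debating.
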
>>
>> I am confused because for me an execution model defines how operations are executed within the database in a conceptual way, which is not something that this CEP intends to change. Do you mean the physical/execution plan?
>> Today this plan is, in a sense, represented for reads by the SelectStatement and its components (Selections, StatementRestrictions, ...); it is then converted at execution time, after parameter binding, into a ReadCommand which is sent to the replicas.
>> We plan to refactor SelectStatement and its components but the ReadCommands change should be relatively small. What you are proposing is not part of the scope of this CEP.
>>
>> On Thu, 14 Dec 2023 at 10:24, Benjamin Lerer <b.le...@gmail.com> wrote:
>>
>> > Can you share the reasons why Apache Calcite is not suitable for this case and why it was rejected
>>
>> My understanding is that Calcite was made for two main things: to help with optimizing SQL-like languages and to let people query different kinds of data sources together.
>>
>> We could think about using it for our needs, but there are some big problems:
>>
>> 1. CQL is not SQL. There are significant differences between the 2 languages
>> 2. Cassandra has its own specificities that will influence the cost model and the way we deal with optimizations: partitions, replication factors, consistency levels, LSM tree storage, ...
>> 3. Every framework comes with its own limitations and additional cost
>>
>> From my view, there are too many big differences between what Calcite does and what we need in Cassandra. If we used Calcite, it would also mean relying a lot on another system that everyone would have to learn and adjust to. The problems and extra work this would bring don't seem worth the benefits we might get
>>
>> On Wed, 13 Dec 2023 at 18:06, Benjamin Lerer <b.le...@gmail.com> wrote:
>>
>> One thing that I did not mention is the fact that this CEP is only a high level proposal. There will be deeper discussions on the dev list around the different parts of this proposal when we reach those parts and have enough details to make those discussions more meaningful.
>>
>> > The maintenance and distribution of summary statistics in particular is worthy of its own CEP, and it might be preferable to split it out.
>>
>> For maintaining node statistics the idea is to re-use the current Memtable/SSTable mechanism and rely on mergeable statistics. That will allow us to easily build node level statistics for a given table by merging all the statistics of its memtable and SSTables. For the distribution of these node statistics we are still exploring different options. We can come back with a precise proposal once we have hammered out all the details.
>> Is it a blocker for you for this CEP, or do you just want to make sure that this part is discussed in greater detail before we implement it?
>>
>> > The proposal also seems to imply we are aiming for coordinators to all make the same decision for a query, which I think is challenging, and it would be worth fleshing out the design here a little (perhaps just in Jira).
>>
>> The goal is that the large majority of nodes preparing a query at a given point in time should make the same decision and that over time all nodes should converge toward the same decision. This part is dependent on the node statistics distribution, the cost model and the triggers for re-optimization (that will require some experimentation).
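
The "mergeable statistics" idea mentioned in this message can be sketched as follows: each memtable or SSTable carries a small summary, and the node-level summary is obtained by merging them. The `ColumnStats` class, the fixed-width histogram and the helper `stats_for` are assumptions made for illustration; the thread does not say which statistics or sketch structures the CEP would actually use.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class ColumnStats:
    """Per-memtable or per-SSTable summary of one integer column."""
    row_count: int
    min_value: int
    max_value: int
    bucket_counts: tuple  # histogram over shared, fixed bucket boundaries

    def merge(self, other: "ColumnStats") -> "ColumnStats":
        # Every field merges with an associative, commutative operation,
        # so summaries can be combined in any order.
        return ColumnStats(
            self.row_count + other.row_count,
            min(self.min_value, other.min_value),
            max(self.max_value, other.max_value),
            tuple(a + b for a, b in zip(self.bucket_counts, other.bucket_counts)),
        )


def stats_for(values, bucket_width=10, buckets=4):
    """Build a summary for a non-empty list of non-negative integers."""
    counts = [0] * buckets
    for v in values:
        # Values past the last boundary fall into the final bucket.
        counts[min(v // bucket_width, buckets - 1)] += 1
    return ColumnStats(len(values), min(values), max(values), tuple(counts))


memtable = [1, 5, 12]
sstables = [[25, 38, 3], [40, 7]]
node_stats = reduce(ColumnStats.merge, (stats_for(s) for s in sstables), stats_for(memtable))
print(node_stats)
```

One caveat the sketch ignores: the same row can exist in several SSTables (overwrites, tombstones), so merged counts are estimates rather than exact figures.
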
>>
>> > There’s also not much discussion of the execution model: I think it would make most sense for this to be independent of any cost and optimiser models (though they might want to operate on them), so that EXPLAIN and hints can work across optimisers (a suitable hint might essentially bypass the optimiser, if the optimiser permits it, by providing a standard execution model)
>>
>> It is not clear to me what you mean by "a standard execution model".
>> Otherwise, we were not planning to have the execution model or the hints depend on the optimizer.
>>
>> > I think it would be worth considering providing the execution plan to the client as part of query preparation, as an opaque payload to supply to coordinators on first contact, as this might simplify the problem of ensuring queries behave the same without adopting a lot of complexity for synchronising statistics (which will never provide strong guarantees). Of course, re-preparing a query might lead to a new plan, though any coordinators with the query in their cache should be able to retrieve it cheaply. If the execution model is efficiently serialised this might have the ancillary benefit of improving the occupancy of our prepared query cache.
>>
>> I am not sure that I understand your proposal. If 2 nodes build a different execution plan, how do you solve that conflict?
>>
>> On Wed, 13 Dec 2023 at 09:55, Benedict <bened...@apache.org> wrote:
>>
>> A CBO can only make worse decisions than the status quo for what I presume are the majority of queries - i.e. those that touch only primary indexes. In general, there are plenty of use cases that prefer determinism. So I agree that there should at least be a CBO implementation that makes the same decisions as the status quo, deterministically.
>>
>> I do support the proposal, but would like to see some elements discussed in more detail.
>> The maintenance and distribution of summary statistics in particular is worthy of its own CEP, and it might be preferable to split it out. The proposal also seems to imply we are aiming for coordinators to all make the same decision for a query, which I think is challenging, and it would be worth fleshing out the design here a little (perhaps just in Jira).
>>
>> While I’m not a fan of ALLOW FILTERING, I’m not convinced that this CEP deprecates it. It is a concrete qualitative guard rail, that I expect some users will prefer to a cost-based guard rail. Perhaps this could be left to the CBO to decide how to treat.
>>
>> There’s also not much discussion of the execution model: I think it would make most sense for this to be independent of any cost and optimiser models (though they might want to operate on them), so that EXPLAIN and hints can work across optimisers (a suitable hint might essentially bypass the optimiser, if the optimiser permits it, by providing a standard execution model)
>>
>> I think it would be worth considering providing the execution plan to the client as part of query preparation, as an opaque payload to supply to coordinators on first contact, as this might simplify the problem of ensuring queries behave the same without adopting a lot of complexity for synchronising statistics (which will never provide strong guarantees). Of course, re-preparing a query might lead to a new plan, though any coordinators with the query in their cache should be able to retrieve it cheaply. If the execution model is efficiently serialised this might have the ancillary benefit of improving the occupancy of our prepared query cache.
>>
>> On 13 Dec 2023, at 00:44, Jon Haddad <j...@jonhaddad.com> wrote:
>>
>> I think it makes sense to see what the actual overhead is of CBO before making the assumption it'll be so high that we need to have two code paths.
>> I'm happy to provide thorough benchmarking and analysis when it reaches a testing phase.
>>
>> I'm excited to see where this goes. I think it sounds very forward looking and opens up a lot of possibilities.
>>
>> Jon
>>
>> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell <cclive1...@gmail.com> wrote:
>>
>> Nothing expresses my thoughts better than +1. It feels like it means a lot to Cassandra.
>>
>> I have a question. Is it easy to turn off the CBO or bypass it in some way? Some simple read and write requests will have better performance without the CBO, which is also an advantage of Cassandra compared to some RDBMSs.
>>
>> On Wed, 13 Dec 2023 at 3:37 AM, David Capwell <dcapw...@apple.com> wrote:
>>
>> Overall LGTM.
>>
>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer <ble...@apache.org> wrote:
>>
>> Hi everybody,
>>
>> I would like to open the discussion on the introduction of a cost based optimizer to allow Cassandra to pick the best execution plan based on the data distribution, thereby improving the overall query performance.
>>
>> This CEP should also lay the groundwork for the future addition of features like joins, subqueries, OR/NOT and index ordering.
>>
>> The proposal is here: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>
>> Thank you in advance for your feedback.