wall of text inc.
*tl;dr:* Aiming to come to some conclusions about what we are doing with
MV's and how we are going to make them stable in production. But really
just trying to raise awareness/involvement for MV's.

It seems we've got an excess of MV bugs that pretty much make them
completely unusable in production, or at least incredibly risky and also
limited. It also appears that we don't have many people totally across MV's
either (or at least a lack of people currently looking at them). To avoid
us "forgetting" about MV's I'd like to raise the current issues and get
opinions on the direction we should go with MV's. I know historically there
was a lot of discussion about this, but it seems a lot of the people
originally involved are currently less involved, and thus before making wild
changes to MV's it might be worth going back to the start and thinking
through the original requirements and implementation.

Probably worth summarising the original goals of MV's:

   - Maintain eventual consistency between base table and view tables
   - Provide mechanisms to repair consistency between base and views
   - Aim to keep convergence between base and view fast without sacrificing
   availability (low MTTR)

Goals that weren't explicitly mentioned but more or less implied:

   - Performance must be at least good enough to justify using them over
   rolling-your-own. (we haven't really tried to measure this yet - only
   measured in comparison to not-a-MV)
   - Allow a user to redefine their partitioning key

And also a quick summary of *some* of the limitations in our implementation
(there are more, but the majority of our current problems revolve around these):

   1. Primary key of the base table must be included in the view,
   optionally one non-primary key column can be included in the view primary
   key.
   2. All columns in the view primary key must be declared NOT NULL.
   3. Base tables and views are one-to-one. That is, a *primary key* in a
   base maps to exactly one *primary key* in the view. Therefore you should
   never expect multiple rows in the view for a partition with multiple rows
   in the base.
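
To make limitations 1 and 2 concrete, here is a rough sketch of the check
they imply. This is not Cassandra's validation code; the function name and
its parameters are made up for illustration, and columns are just strings.

```python
def check_view_primary_key(base_pk, base_columns, view_pk, not_null_columns):
    """Return a list of violations of limitations 1 and 2 (hypothetical helper)."""
    problems = []

    # Limitation 1a: every base primary key column must be in the view primary key.
    missing = [c for c in base_pk if c not in view_pk]
    if missing:
        problems.append(f"view PK is missing base PK columns: {missing}")

    # Limitation 1b: at most one non-primary-key base column in the view primary key.
    extra = [c for c in view_pk if c in base_columns and c not in base_pk]
    if len(extra) > 1:
        problems.append(f"view PK has more than one non-PK column: {extra}")

    # Limitation 2: every view primary key column must be declared NOT NULL.
    nullable = [c for c in view_pk if c not in not_null_columns]
    if nullable:
        problems.append(f"view PK columns not declared NOT NULL: {nullable}")

    return problems


# A base table keyed on "id" with regular columns "a" and "b".
ok = check_view_primary_key(["id"], ["id", "a", "b"], ["a", "id"], {"a", "id"})
bad = check_view_primary_key(["id"], ["id", "a", "b"], ["a", "b", "id"], {"a", "b", "id"})
```

Here `ok` comes back empty, while `bad` reports the second non-PK column.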


I've summarised the bulk of the outstanding bugs below (may have missed
some), but notably it would be useful to get some decision-making happening
on them. Fixing these bugs is a bit more involved and there are likely a few
possible solutions, each with its own implications. Also they all pretty much
touch the same parts of the code, so there needs to be some collaboration
across the patches (part of the reason I'm trying to bring more attention to
them).

CASSANDRA-13657 <https://issues.apache.org/jira/browse/CASSANDRA-13657> -
Using a non-PK column in the view PK means that you can TTL that column in
the base without TTLing the resulting view row. Potential solution is to
change the definition of liveness info for view rows. This would probably
work but makes moving away from the NOT NULL requirement on view PK's
harder. Need to decide if that's what we want to do or if we pursue a
different solution.
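
A toy model of the problem, to make it easier to discuss. It is only a
sketch under assumptions: it is not how the storage engine represents
liveness, all names are invented, and the "alternative" expiry is just one
reading of "change the definition of liveness info for view rows".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: str
    expires_at: Optional[int] = None  # None means no TTL was set

    def is_live(self, now: int) -> bool:
        return self.expires_at is None or now < self.expires_at


def view_row_should_exist(base_row: dict, view_key_column: str, now: int) -> bool:
    # The view row is only meaningful while the base column used in the
    # view primary key still has a live value.
    cell = base_row.get(view_key_column)
    return cell is not None and cell.is_live(now)


def view_row_is_live(view_expires_at: Optional[int], now: int) -> bool:
    return view_expires_at is None or now < view_expires_at


# Base row: key "id" has no TTL, regular column "c" expires at t=10.
base_row = {"id": Cell("1"), "c": Cell("x", expires_at=10)}

# Assumption: the view row takes its liveness from the base row's key (no TTL).
view_expiry_from_key = None
# Assumption: an alternative where the view row follows the TTL'd column.
view_expiry_from_column = base_row["c"].expires_at

now = 20
orphaned = view_row_is_live(view_expiry_from_key, now) and not view_row_should_exist(base_row, "c", now)
consistent = view_row_is_live(view_expiry_from_column, now) == view_row_should_exist(base_row, "c", now)
```

With the first choice the view row outlives the value it is keyed on
(`orphaned` is true); with the second the two expire together.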

CASSANDRA-13127 <https://issues.apache.org/jira/browse/CASSANDRA-13127> -
Inserting a key with a TTL and then updating the TTL on a base column that
doesn't exist in the view doesn't update the liveness of the row in the MV,
and thus the MV row expires before the base row. The currently proposed
solution should work but will increase the number of cases where we need to
read the existing data. Needs some reviewing and wouldn't hurt to benchmark
the changes.

CASSANDRA-13547 <https://issues.apache.org/jira/browse/CASSANDRA-13547> -
Being able to leave a column out of your SELECT while including it in the
view filters causes some serious issues. The proposed fix is to force the
user to select all columns that are also included in the WHERE clause. This
will potentially be a compatibility issue but *should* be fine, as it is only
checked on MV creation, so people upgrading shouldn't be affected (needs
reviewing). The patch also addresses another issue regarding timestamps: the
choice of timestamps led to rows not being deleted in the view. This comes
back to the fact that we allow a non-PK column in the view PK. Needs more
reviewing. Also related somewhat to CASSANDRA-11500.

CASSANDRA-13409 <https://issues.apache.org/jira/browse/CASSANDRA-13409> -
Issues with shadowable tombstones. Has a patch but not sure if resolved
based on Zhao's last comment. Another case of bringing data back in the
view and thus making base and view inconsistent. Needs reviewing.

CASSANDRA-11500 <https://issues.apache.org/jira/browse/CASSANDRA-11500>
CASSANDRA-10965 <https://issues.apache.org/jira/browse/CASSANDRA-10965> -
Both these appear to be instances of the same issue. Got a couple of
potential solutions. Back to that problem of shadowable tombstones and
timestamps. Pretty involved and would require an in-depth review as
decisions could greatly impact the complexity/usefulness of MV's.

CASSANDRA-13069 <https://issues.apache.org/jira/browse/CASSANDRA-13069> -
Node movements can cause inconsistencies. Paulo has written a patch but
Sylvain has raised some concerns about our use of the local batchlog.
Haven't confirmed myself, but the belief is that our eventual consistency
guarantee is broken... :/ needs reviewing...

CASSANDRA-12888 <https://issues.apache.org/jira/browse/CASSANDRA-12888> -
Most people are probably aware of this one. Losing the repaired_at status
for all MV streams as they are replayed through the write path. Has a
potential solution in place for 4.x, but we need to commit to a workaround
for 3.11.x at least.

CASSANDRA-12730 <https://issues.apache.org/jira/browse/CASSANDRA-12730> -
This touches on some very common repair issues that we should probably look
at, but I don't think it directly relates to MV's anymore. Might be worth
removing the Materialized View component. (but this ticket probably still
deserves a bit of attention).

If anyone has been working on any of these tickets and no longer is able
to, either update the ticket or let me know and I'll either take over or find
some other poor soul to have a stab at it.
It would also be nice to get some volunteers who are familiar with MV's to
review the above tickets.

Another thing I'm not sure of: we are aiming to guarantee eventual
consistency between base and view, but my understanding is that even with
the batchlog we can't achieve this without some tool to synchronise the
base with the view. I don't think this tool currently exists, and
CASSANDRA-10346 <https://issues.apache.org/jira/browse/CASSANDRA-10346>
seems to agree... Can anyone clarify whether such a tool is actually a
requirement for eventual consistency?

My general advice these days is for users to steer clear of MV's for the
moment, but we have no clear plan for when these will really be stable.
Since some of the changes to fix MV's may potentially require a major
version change, I think we should at least aim to get all of those in for 4.0
(although we still need to figure out exactly which changes these are).
Interested to hear people's thoughts.
