Re: DateTieredCompactionStrategy and static columns

Benedict Elliott Smith Fri, 01 May 2015 07:15:15 -0700

It also doesn't solve the atomicity problem, which is its own challenge. We
would probably need to merge the memtables for the entire keyspace/node,
and split them out into their own sstables on flush. Or introduce mutual
exclusion at the partition key level for the node.
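
To make the second option concrete, here is a minimal sketch of mutual exclusion at the partition key level. It is a toy model, not Cassandra code: the names `PartitionLocks` and `ToyNode`, the stripe count, and the two in-memory dicts standing in for separately managed static and regular data are all assumptions made for illustration.

```python
import threading


class PartitionLocks:
    """Striped locks: each partition key hashes onto one of a fixed set of locks."""

    def __init__(self, stripes=64):
        self._locks = [threading.Lock() for _ in range(stripes)]

    def lock_for(self, partition_key):
        # Keys that hash to the same stripe share a lock (hypothetical scheme).
        return self._locks[hash(partition_key) % len(self._locks)]


class ToyNode:
    """Toy node that keeps static cells and regular rows in separate stores."""

    def __init__(self):
        self.locks = PartitionLocks()
        self.static_cells = {}   # partition key -> {cell name: value}
        self.regular_rows = {}   # partition key -> {clustering key: row}

    def apply(self, partition_key, static_updates, row_updates):
        # Both stores are updated while holding the partition's lock, so a
        # reader taking the same lock sees both updates or neither.
        with self.locks.lock_for(partition_key):
            self.static_cells.setdefault(partition_key, {}).update(static_updates)
            self.regular_rows.setdefault(partition_key, {}).update(row_updates)

    def read(self, partition_key):
        # Return copies so callers cannot mutate state outside the lock.
        with self.locks.lock_for(partition_key):
            return (
                dict(self.static_cells.get(partition_key, {})),
                dict(self.regular_rows.get(partition_key, {})),
            )
```

Readers and writers of the same partition key serialize on one stripe, so a reader never sees the static update without the matching row update. Unrelated keys mostly proceed in parallel, though keys that share a stripe still contend.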


On Fri, May 1, 2015 at 3:01 PM, Jonathan Ellis <[email protected]> wrote:

> I'm down for adding JOIN support within a partition, eventually.  I can see
> a lot of stuff I'd rather prioritize higher in the short term though.
>
> On Fri, May 1, 2015 at 8:44 AM, Jonathan Haddad <[email protected]> wrote:
>
> > I think what Benedict has described feels very much like a very specialized
> > version of the following:
> >
> > 1. Updates to different tables in a batch become atomic if the node is a
> > replica for the partition
> > 2. Supporting inner joins if the partition key is the same in both tables.
> >
> > I'd rather see join support personally :)
> >
> > Jon
> >
> > On Fri, May 1, 2015 at 6:38 AM graham sanderson <[email protected]> wrote:
> >
> > > > I 100% agree with Benedict, but just to be clear about my use case:
> > > >
> > > > 1) We have the state of, let's say, real estate listings
> > > > 2) We get field-level deltas for them
> > > > 3) Previously we would store the base state and all the deltas in the
> > > > partition and roll them up from the beginning of time (this was a
> > > > prototype and silly, since there was no expiration strategy)
> > > > 4) The preferred plan is to keep current state in a static map (i.e. one
> > > > delta field only updates one cell) - we are MVCC, but in the common case
> > > > the latest version will be what we want
> > > > 5) However, we require history, so we’d use the partition to keep TTL'ed
> > > > deltas going backwards from the now state - this seems like a common
> > > > pattern people would want. Note also that sometimes we might need to
> > > > apply reverse deltas if C* is ahead of our SOLR indexes
> > >
> > > > The static columns and the regular columns ARE completely different in
> > > > behavior/lifecycle, so I’d definitely vote for them being treated as such.
> > >
> > >
> > > > > On May 1, 2015, at 7:27 AM, Benedict Elliott Smith <[email protected]> wrote:
> > > >
> > > >>
> > > >> How would it be different from creating an actual real extra table
> > > instead?
> > > >
> > > >
> > > > There's nothing that warrants making the codebase more complex to
> > > >> accomplish something it already does.
> > > >
> > > >
> > > > > As far as I was aware, the only point of static columns was to support
> > > > > the thrift ability to mutate and read them in the same expression, with
> > > > > atomicity and isolation. As to whether or not it is more complex, I'm
> > > > > not at all convinced that it would be. We have had a lot of unexpected
> > > > > special casing added to ensure they behave correctly (e.g. paging is
> > > > > broken), and have complicated the comparison/slice logic to accommodate
> > > > > them, so that it is harder to reason about (and to optimise). They also
> > > > > have very different compaction characteristics, so the complexity on
> > > > > the user is increased without their necessarily realising it. All told,
> > > > > it introduces a lot more subtlety of behaviour than there would be with
> > > > > a separate set of sstables, or perhaps a separate file attached to each
> > > > > sstable.
> > > >
> > > > > Of course, we've already implemented it as a specialisation of the
> > > > > slice/comparator, I think because it seemed like the least frictional
> > > > > path to do so, but that doesn't mean it is the least complex. It does
> > > > > mean it's the least work (assuming we're now on top of the bugs), which
> > > > > is its own virtue.
> > > >
> > > > > There are some advantages to having them managed separately, and
> > > > > advantages to having them combined. Combined, for small partitions,
> > > > > they can be read in the same seek. However for large partitions this is
> > > > > no longer true, and we may behave much worse by polluting the page
> > > > > cache with lots of unwanted data that is adjacent to the static
> > > > > columns. If they were managed separately, the page cache would be
> > > > > populated mostly with other static columns, which may be more likely of
> > > > > use. We could quite easily have a "static column" cache, also, and
> > > > > completely avoid merging them. Or at least we could easily read them
> > > > > with collectTimeOrderedData instead of collectAllData semantics.
> > > >
> > > > > All told, it certainly isn't a terrible idea, and shouldn't be
> > > > > dismissed so readily. Personally I think in the long run whether or not
> > > > > we manage static columns together with non-static columns is dependent
> > > > > on if we intend to add tiered "static" columns (i.e., if each level of
> > > > > clustering component can have columns associated with it). If we do, we
> > > > > should definitely keep it all inline. If not, it probably permits a lot
> > > > > better behaviour to separate them, since it's easier to reason about
> > > > > and improve their distinct characteristics.
> > > >
> > > >
> > > > > On Fri, May 1, 2015 at 1:24 AM, graham sanderson <[email protected]> wrote:
> > > >
> > > >> Well you lose the atomicity and isolation, but in this case that is
> > > >> probably fine
> > > >>
> > > > >> That said, in every interaction I’ve had with static columns, they
> > > > >> seem to be an odd duck (e.g. adding or complicating range slices),
> > > > >> perhaps worthy of their own code path and sstables. Just food for
> > > > >> thought.
> > > >>
> > > > >>> On Apr 30, 2015, at 7:13 PM, Jonathan Haddad <[email protected]> wrote:
> > > >>>
> > > > >>> If you want it in a separate sstable, just use a separate table.
> > > > >>> There's nothing that warrants making the codebase more complex to
> > > > >>> accomplish something it already does.
> > > >>>
> > > > >>> On Thu, Apr 30, 2015 at 5:07 PM graham sanderson <[email protected]> wrote:
> > > >>>
> > > > >>>> Anyone here have an opinion; how realistic would it be to have a
> > > > >>>> separate memtable/sstable for static columns?
> > > >>>>
> > > >>>> Begin forwarded message:
> > > >>>>
> > > >>>> *From: *Jonathan Haddad <[email protected]>
> > > >>>> *Subject: **Re: DateTieredCompactionStrategy and static columns*
> > > >>>> *Date: *April 30, 2015 at 3:55:46 PM CDT
> > > >>>> *To: *[email protected]
> > > >>>> *Reply-To: *[email protected]
> > > >>>>
> > > >>>>
> > > > >>>> I suspect this will kill the benefit of DTCS, but I haven't tested
> > > > >>>> it, so I can't be 100% sure.
> > > >>>>
> > > > >>>> The benefit of DTCS is that sstables are selected for compaction
> > > > >>>> based on the age of the data, not their size.  When you mix TTL'ed
> > > > >>>> data and non-TTL'ed data, you end up screwing with the "drop the
> > > > >>>> entire SSTable" optimization.  I don't believe this is any
> > > > >>>> different just because you're mixing in static columns.  What I
> > > > >>>> think will happen is you'll end up with an sstable that's almost
> > > > >>>> entirely TTL'ed with a few static columns that will never get
> > > > >>>> compacted or dropped.  Pretty much the worst scenario I can think
> > > > >>>> of.
> > > >>>>
> > > >>>>
> > > >>>>
> > > > >>>> On Thu, Apr 30, 2015 at 11:21 AM graham sanderson <[email protected]> wrote:
> > > >>>>
> > > > >>>>> I have a potential use case I haven’t had a chance to prototype
> > > > >>>>> yet, which would normally be a good candidate for DTCS (i.e. data
> > > > >>>>> delivered in order and a fixed TTL), however with every write
> > > > >>>>> we’d also be updating some static cells (namely a few key/values
> > > > >>>>> in a static map<text, text> CQL column). There could also be
> > > > >>>>> explicit deletes of keys in the static map, though that’s not
> > > > >>>>> 100% necessary.
> > > >>>>>
> > > > >>>>> Since those columns don’t have a TTL, without reading through the
> > > > >>>>> code and/or trying it, I have no idea what effect this has on
> > > > >>>>> DTCS (perhaps it needs to use separate sstables for static
> > > > >>>>> columns). Has anyone tried this? If not, I eventually will, and
> > > > >>>>> will report back.
> > > >>>>
> > > >>>>
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>
