Re: Future of storage in Aurora

Erb, Stephan Mon, 02 Oct 2017 04:24:42 -0700

What do you have in mind for the in-memory replacement? Revert back to the 
usage of thrift objects within plain Java containers like we do for the task 
store?


On 02.10.17, 00:59, "Bill Farner" <[email protected]> wrote:

    I would like to revive this discussion in light of some work i have been
    doing around the storage system.  The fruits of the DB storage system will
    require a lot of additional effort to reach the beneficial outcomes i laid
    out above, and i agree that we should cut our losses.
    
    I plan to introduce patches soon to introduce non-H2 in-memory store
    implementations.  *If anyone disagrees with removing the H2 implementations
    as well, please chime in here.*
    
    Disclaimer - i may propose an alternative for the persistent storage in the
    near future.
    
    On Mon, Apr 3, 2017 at 9:40 AM, Stephan Erb <[email protected]> wrote:
    
    > H2 could give us fine granular data access. However, most of our code
    > performs massive joins to reconstruct fully hydrated thrift objects.
    > Most of the time we are then only interested in very few properties of
    > those thrift structs. This applies to internal usage, but also how we
    > use the API.
    >
    > I therefore believe we have to improve and refine our domain model in
    > order to significantly improve the storage situation.
    >
    > I really liked Maxim's proposal from last year, and I think it is worth
    > reconsidering: https://docs.google.com/document/d/1myYX3yuofGr8JIzud98x
    > Xd5mqgpZ8q_RqKBpSff4-WE/edit
    >
    > Best regards,
    > Stephan
    >
    > On Thu, 2017-03-30 at 15:53 -0700, David McLaughlin wrote:
    > > So it sounds like before we make any decisions around removing the
    > > work
    > > done in H2 so far, we should figure out what is remaining to move to
    > > external storage (or if it's even still a goal).
    > >
    > > I may still play around with reviving the in-memory stores, but will
    > > separate that work from any goal to remove the H2 layer. Since it's
    > > motivated by performance, I'd verify there is a benefit before
    > > submitting
    > > any review.
    > >
    > > Thanks all for the feedback.
    > >
    > >
    > > On Thu, Mar 30, 2017 at 12:08 PM, Bill Farner <[email protected]
    > > m>
    > > wrote:
    > >
    > > > Adding some background - there were several motivators to using SQL
    > > > that
    > > > come to mind:
    > > > a) well-understood transaction isolation guarantees leading to a
    > > > simpler
    > > > programming model w.r.t. concurrency
    > > > b) ability to offload storage to a separate system (e.g. Postgres)
    > > > and
    > > > scale it separately
    > > > c) relief of computational burden of performing snapshots and
    > > > backups due
    > > > to (b)
    > > > d) simpler code and operations model due to (b)
    > > > e) schema backwards compatibility guarantees due to persistence-
    > > > friendly
    > > > migration-scripts
    > > > f) straightforward normalization to facilitate sharing of
    > > > otherwise-redundant state (I.e. TaskConfig)
    > > >
    > > > The storage overhaul comes with a huge caveat requiring the
    > > > approach to
    > > > scheduling rounds to change. I concur that the current model is
    > > > hostile to
    > > > offloaded storage, as ~all state must be read every scheduling
    > > > round. If
    > > > that cannot be worked around with lazy state or best-effort
    > > > concurrency
    > > > (I.e. in-memory caching), the approach is indeed flawed.
    > > >
    > > > On Mar 30, 2017, 10:29 AM -0700, Joshua Cohen <[email protected]>,
    > > > wrote:
    > > > > My understanding of the H2-backed stores is that at least part of
    > > > > the
    > > > > original rationale behind them was that they were meant to be an
    > > > > interim
    > > > > point on the way to external SQL-backed stores which should
    > > > > theoretically
    > > > > provide significant benefits w.r.t. to GC (obviously unproven,
    > > > > especially
    > > > > at scale).
    > > > >
    > > > > I don't disagree that the H2 stores themselves are problematic
    > > > > (to say
    > > >
    > > > the
    > > > > least); do we have evidence that returning to memory based stores
    > > > > will be
    > > > > an improvement on that?
    > > > >
    > > > > On Thu, Mar 30, 2017 at 12:16 PM, David McLaughlin <
    > > >
    > > > [email protected]
    > > > > wrote:
    > > > >
    > > > > > Hi all,
    > > > > >
    > > > > > I'd like to start a discussion around storage in Aurora.
    > > > > >
    > > > > > I think one of the biggest mistakes we made in migrating our
    > > > > > storage
    > > >
    > > > to H2
    > > > > > was deleting the memory stores as we moved. We made a pretty
    > > > > > big bet
    > > >
    > > > that
    > > > > > we could eventually make H2/relational databases work. I don't
    > > > > > think
    > > >
    > > > that
    > > > > > bet has paid off and that we need to revisit the direction
    > > > > > we're
    > > >
    > > > taking.
    > > > > >
    > > > > > My belief is that the current H2/MyBatis approach is untenable
    > > > > > for
    > > >
    > > > large
    > > > > > production clusters, at least without changing our current
    > > >
    > > > single-master
    > > > > > architecture. At Twitter we are already having to fight to keep
    > > > > > GC
    > > > > > manageable even without DbTaskStore enabled, so I don't see a
    > > > > > path
    > > >
    > > > forward
    > > > > > where we could eventually enable that. So far experiments with
    > > > > > H2
    > > >
    > > > off-heap
    > > > > > storage have provided marginal (if any) gains.
    > > > > >
    > > > > > Would anyone object to restoring the in-memory stores and
    > > > > > creating new
    > > > > > implementations for the missing ones (UpdateStore)? I'd even go
    > > >
    > > > further and
    > > > > > propose that we consider in-memory H2 and MyBatis a failed
    > > > > > experiment
    > > >
    > > > and
    > > > > > we drop that storage layer completely.
    > > > > >
    > > > > > Cheers,
    > > > > > David
    > > > > >
    >

Re: Future of storage in Aurora

Reply via email to