What do you have in mind for the in-memory replacement? Revert to using thrift objects within plain Java containers, as we do for the task store?
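For reference, here is a minimal sketch of what "thrift objects within plain Java containers" might look like. `TaskRecord` is a hypothetical stand-in for a thrift-generated struct, and `InMemoryTaskStore` is a hypothetical name, not the API of Aurora's actual task store. The sketch assumes stored objects are immutable.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for an immutable thrift-generated struct (not Aurora's real ScheduledTask).
final class TaskRecord {
  final String taskId;
  final String jobKey;
  final String status;

  TaskRecord(String taskId, String jobKey, String status) {
    this.taskId = taskId;
    this.jobKey = jobKey;
    this.status = status;
  }
}

// Hypothetical in-memory store: immutable objects held in a plain Java map keyed by task ID.
final class InMemoryTaskStore {
  private final Map<String, TaskRecord> tasks = new ConcurrentHashMap<>();

  // Inserts or replaces a task. Values are immutable, so readers never see a half-updated object.
  void save(TaskRecord task) {
    tasks.put(task.taskId, task);
  }

  Optional<TaskRecord> fetch(String taskId) {
    return Optional.ofNullable(tasks.get(taskId));
  }

  // Queries are a linear scan with a predicate; no joins are needed because objects are stored fully hydrated.
  List<TaskRecord> query(Predicate<TaskRecord> filter) {
    return tasks.values().stream().filter(filter).collect(Collectors.toList());
  }

  void delete(String taskId) {
    tasks.remove(taskId);
  }
}
```

Each `put` on a `ConcurrentHashMap` is atomic per key, so this gives per-object consistency only. It does not by itself provide the multi-object transaction isolation listed as motivation (a) for SQL, so a real store would likely still need coarser locking around multi-object writes.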
On 02.10.17, 00:59, "Bill Farner" <wfar...@apache.org> wrote:

I would like to revive this discussion in light of some work I have been doing around the storage system. The fruits of the DB storage system will require a lot of additional effort to reach the beneficial outcomes I laid out above, and I agree that we should cut our losses. I plan to send patches soon that introduce non-H2 in-memory store implementations. *If anyone disagrees with removing the H2 implementations as well, please chime in here.*

Disclaimer: I may propose an alternative for the persistent storage in the near future.

On Mon, Apr 3, 2017 at 9:40 AM, Stephan Erb <s...@apache.org> wrote:
> H2 could give us fine-grained data access. However, most of our code performs massive joins to reconstruct fully hydrated thrift objects, and most of the time we are then only interested in very few properties of those thrift structs. This applies to internal usage, but also to how we use the API.
>
> I therefore believe we have to improve and refine our domain model in order to significantly improve the storage situation.
>
> I really liked Maxim's proposal from last year, and I think it is worth reconsidering: https://docs.google.com/document/d/1myYX3yuofGr8JIzud98xXd5mqgpZ8q_RqKBpSff4-WE/edit
>
> Best regards,
> Stephan
>
> On Thu, 2017-03-30 at 15:53 -0700, David McLaughlin wrote:
> > So it sounds like, before we make any decisions about removing the work done in H2 so far, we should figure out what remains to be done to move to external storage (or whether that is even still a goal).
> >
> > I may still play around with reviving the in-memory stores, but I will separate that work from any goal to remove the H2 layer. Since it's motivated by performance, I'd verify there is a benefit before submitting any review.
> >
> > Thanks all for the feedback.
> >
> > On Thu, Mar 30, 2017 at 12:08 PM, Bill Farner <wfarnerapa...@gmail.com> wrote:
> > > Adding some background: there were several motivations for using SQL that come to mind:
> > > a) well-understood transaction isolation guarantees, leading to a simpler programming model w.r.t. concurrency
> > > b) the ability to offload storage to a separate system (e.g. Postgres) and scale it separately
> > > c) relief from the computational burden of performing snapshots and backups, due to (b)
> > > d) a simpler code and operations model, due to (b)
> > > e) schema backwards-compatibility guarantees, due to persistence-friendly migration scripts
> > > f) straightforward normalization to facilitate sharing of otherwise-redundant state (i.e. TaskConfig)
> > >
> > > The storage overhaul comes with a huge caveat: it requires changing our approach to scheduling rounds. I concur that the current model is hostile to offloaded storage, as ~all state must be read every scheduling round. If that cannot be worked around with lazy state or best-effort concurrency (i.e. in-memory caching), the approach is indeed flawed.
> > >
> > > On Mar 30, 2017, 10:29 AM -0700, Joshua Cohen <jco...@apache.org> wrote:
> > > > My understanding of the H2-backed stores is that at least part of the original rationale behind them was that they were meant to be an interim point on the way to external SQL-backed stores, which should theoretically provide significant benefits w.r.t. GC (obviously unproven, especially at scale).
> > > >
> > > > I don't disagree that the H2 stores themselves are problematic (to say the least); do we have evidence that returning to memory-based stores will be an improvement on that?
> > > >
> > > > On Thu, Mar 30, 2017 at 12:16 PM, David McLaughlin <dmclaugh...@apache.org> wrote:
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a discussion around storage in Aurora.
> > > > >
> > > > > I think one of the biggest mistakes we made in migrating our storage to H2 was deleting the memory stores as we moved. We made a pretty big bet that we could eventually make H2/relational databases work. I don't think that bet has paid off, and I think we need to revisit the direction we're taking.
> > > > >
> > > > > My belief is that the current H2/MyBatis approach is untenable for large production clusters, at least without changing our current single-master architecture. At Twitter we are already having to fight to keep GC manageable even without DbTaskStore enabled, so I don't see a path forward where we could eventually enable that. So far, experiments with H2 off-heap storage have provided marginal (if any) gains.
> > > > >
> > > > > Would anyone object to restoring the in-memory stores and creating new implementations for the missing ones (UpdateStore)? I'd even go further and propose that we consider in-memory H2 and MyBatis a failed experiment and drop that storage layer completely.
> > > > >
> > > > > Cheers,
> > > > > David
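Point (f) in Bill's list, sharing otherwise-redundant state such as TaskConfig, is something an in-memory replacement would have to handle without SQL normalization. A minimal sketch, assuming value-equal config objects can safely be canonicalized to a single shared instance; `Config` and `ConfigInterner` are hypothetical names, not Aurora classes.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable config value; equals/hashCode make value-equal configs interchangeable.
final class Config {
  final String image;
  final int cpus;

  Config(String image, int cpus) {
    this.image = image;
    this.cpus = cpus;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Config)) return false;
    Config other = (Config) o;
    return cpus == other.cpus && image.equals(other.image);
  }

  @Override
  public int hashCode() {
    return Objects.hash(image, cpus);
  }
}

// Hypothetical interner: returns one canonical instance per distinct config value,
// so many tasks from the same job can share a single Config object in memory.
final class ConfigInterner {
  private final ConcurrentMap<Config, Config> canonical = new ConcurrentHashMap<>();

  Config intern(Config config) {
    // putIfAbsent returns the existing canonical instance, or null if this one was stored.
    Config existing = canonical.putIfAbsent(config, config);
    return existing != null ? existing : config;
  }

  int size() {
    return canonical.size();
  }
}
```

This map holds strong references and never evicts, so configs would stay in memory after their last task is gone; a weak interner such as Guava's `Interners.newWeakInterner()` is one way to avoid that.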