Re: Dynamic Scaling of Accumulo

Christopher Thu, 23 Mar 2023 15:58:52 -0700

What do you mean by "when not used in this manner"? What other way is
there to use that feature? Do you mean simply never being brought
online?


Would it be possible to support (external) compactions for an offline table?

I feel like that's a pretty useful feature to revert, and I would want
to consider alternatives.

On Thu, Mar 23, 2023 at 6:39 PM Dave Marion <[email protected]> wrote:
>
> Keith and I had a discussion today (that included some user input)
> regarding table operations with the new OnDemand table concept. I have put
> the notes up on the wiki at:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=247828052.
> One thing that came out of that is that we may want to revert the change in
> the new bulk import code that allows a user to import into an offline
> table. The feature allows a user to create a table that is initially
> offline, bulk import data into it, then bring it online. However, when not
> used in this manner the number of bulk import files would continue to grow
> because compactions are never run on the table.
>
> On Mon, Mar 20, 2023 at 9:37 AM Dave Marion <[email protected]> wrote:
>
> > Following up on this. Discussion and design documents are up on the
> > wiki[1]. There is a GitHub project[2] for planning out some of the tasks,
> > which are then turned into issues. Some of the issues have draft PRs
> > submitted for them.
> >
> > [1] https://cwiki.apache.org/confluence/display/ACCUMULO/Elasticity
> > [2] https://github.com/orgs/apache/projects/164
> >
> > On Wed, Feb 22, 2023 at 2:35 PM Dave Marion <[email protected]> wrote:
> >
> >> Except for the new bulk import code, Accumulo requires that tables are in
> >> an online state to work with them (ingest, scan, compact, split, etc.). In
> >> some cases this could become cost prohibitive and resource inefficient as
> >> resources necessary to keep the tables online might be unused. I'd like to
> >> propose a new capability for Accumulo - the ability to work with tables
> >> that are not online. This could either mean working with tables in an
> >> offline state, or maybe the ability to assign/host tables/tablets on
> >> demand.
> >>
> >> At a high level the two ideas currently being discussed are below. I
> >> think in both cases the root and metadata tables must be online, table
> >> management functions move to manager components, and compactions of offline
> >> tables move to the external compaction processes. In addition, new metrics
> >> would need to be emitted so that an external resource scheduler could spin
> >> up/down server processes as demand changes.
> >>
> >>
> >> *Offline Operations*
> >>
> >> This approach allows all operations to occur on offline tables at the
> >> cost of eventually consistent data at scan time (via Scan
> >> Servers only). Live ingest could be supported through the creation of an
> >> ingest server component that just receives mutations and minor compacts.
> >>
> >>
> >>
> >> *On-demand Tables*
> >> This approach allows for user tables to be offline and un-hosted, but
> >> hosts them on demand for the purpose of live ingest and immediate scans at
> >> the latency cost of possibly assigning and hosting the tablet.
> >>
> >> We have a few releases (1.10.3, 2.1.1, and 3.0.0) likely coming up in the
> >> next month or two, but after that I'd like to start implementing something
> >> to address this. Please contribute to the discussion if you have thoughts
> >> on requirements, design, etc.
> >>
> >>
> >>
> >>
