Yes, if the table is never brought online. I believe that Keith said that
the table could still be scanned when offline with existing MapReduce code
or the OfflineScanner, which presents an issue that is not currently
handled. I think we discussed today that the same thing could be achieved
with tables in the on-demand state. The reason not to modify an offline
table is the export case, where the table needs to be immutable until the
files are copied.

On Thu, Mar 23, 2023, 6:58 PM Christopher <ctubb...@apache.org> wrote:

> What do you mean by "when not used in this manner"? What other way is
> there to use that feature? Do you mean simply never being brought
> online?
>
> Would it be possible to support (external) compactions for an offline
> table?
>
> I feel like that's a pretty useful feature to revert, and would want
> to consider alternatives.
>
> On Thu, Mar 23, 2023 at 6:39 PM Dave Marion <dmario...@gmail.com> wrote:
> >
> > > Keith and I had a discussion today (that included some user input)
> > > regarding table operations with the new OnDemand table concept. I have put
> > > the notes up on the wiki at:
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=247828052 .
> > > One thing that came out of that is that we may want to revert the change in
> > > the new bulk import code that allows a user to import into an offline
> > > table. The feature allows a user to create a table that is initially
> > > offline, bulk import data into it, then bring it online. However, when not
> > > used in this manner the number of bulk import files would continue to grow
> > > because compactions are never run on the table.
> >
> > On Mon, Mar 20, 2023 at 9:37 AM Dave Marion <dmario...@gmail.com> wrote:
> >
> > > > Following up on this. Discussion and design documents are up on the
> > > > wiki[1]. There is a GitHub project[2] for planning out some of the tasks,
> > > > which are then turned into issues. Some of the issues have draft PRs
> > > > submitted for them.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/ACCUMULO/Elasticity
> > > [2] https://github.com/orgs/apache/projects/164
> > >
> > > > On Wed, Feb 22, 2023 at 2:35 PM Dave Marion <dmario...@gmail.com> wrote:
> > >
> > > >> Except for the new bulk import code, Accumulo requires that tables are in
> > > >> an online state to work with them (ingest, scan, compact, split, etc.). In
> > > >> some cases this could become cost prohibitive and resource inefficient as
> > > >> resources necessary to keep the tables online might be unused. I'd like to
> > > >> propose a new capability for Accumulo - the ability to work with tables
> > > >> that are not online. This could either mean working with tables in an
> > > >> offline state, or maybe the ability to assign/host tables/tablets on
> > > >> demand.
> > >>
> > > >> At a high level the two ideas currently being discussed are below. I
> > > >> think in both cases the root and metadata tables must be online, table
> > > >> management functions move to manager components, and compactions of offline
> > > >> tables move to the external compaction processes. In addition, new metrics
> > > >> would need to be emitted so that an external resource scheduler could spin
> > > >> up/down server processes as demand changes.
> > >>
> > >>
> > >> *Offline Operations*
> > >>
> > > >> This approach allows all operations to occur on offline tables at the
> > > >> cost of having eventual consistency to the data at scan time (via Scan
> > > >> Servers only). Live ingest could be supported through the creation of an
> > > >> ingest server component that just receives mutations and minor compacts.
> > >>
> > >>
> > >>
> > >> *On-demand Tables*
> > > >> This approach allows for user tables to be offline and un-hosted, but
> > > >> hosts them on demand for the purpose of live ingest and immediate scans at
> > > >> the latency cost of possibly assigning and hosting the tablet.
> > >>
> > > >> We have a few releases (1.10.3, 2.1.1, and 3.0.0) coming up in likely the
> > > >> next month or two, but after that I'd like to start implementing something
> > > >> to address this. Please contribute to the discussion if you have thoughts
> > > >> on requirements, design, etc.
> > >>
> > >>
> > >>
> > >>
>
