> I think we should deprecate support for offline table scanning, since it
> shouldn't be needed with the availability of ScanServers.

Just making sure I understand your suggestion: you mean removing the
OfflineScanner and the ability to scan offline tables in the MapReduce
code, but continuing our efforts to allow Scan Servers to scan offline
tables, right?

> As for "ondemand" table state, from a user perspective, I'm not sure what
> it means...

I have been thinking about it as "online" means always hosted, "ondemand"
means hosted as needed, and "offline" means never hosted.
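A minimal Java sketch of that three-state model, assuming the hosting decision can be reduced to a single flag saying whether some operation currently needs a tablet. `TableHostingSketch`, `TableState`, and `shouldHost` are hypothetical names for illustration only, not Accumulo APIs.

```java
/**
 * Hypothetical sketch (not an Accumulo API) of the three table states as
 * described in this thread: "online" = always hosted, "ondemand" = hosted as
 * needed, "offline" = never hosted.
 */
public class TableHostingSketch {

    public enum TableState {
        ONLINE, ONDEMAND, OFFLINE;

        /**
         * Whether a tablet of a table in this state should be hosted.
         * Assumption: "needed" is reduced to one flag saying whether some
         * client operation currently requires the tablet.
         */
        public boolean shouldHost(boolean tabletNeeded) {
            switch (this) {
                case ONLINE:
                    return true;          // always hosted
                case ONDEMAND:
                    return tabletNeeded;  // hosted only while something needs it
                default:
                    return false;         // OFFLINE: never hosted
            }
        }
    }

    public static void main(String[] args) {
        // Print the hosting decision for each state, with and without demand.
        for (TableState state : TableState.values()) {
            System.out.printf("%-8s needed=%-5b idle=%b%n",
                state, state.shouldHost(true), state.shouldHost(false));
        }
    }
}
```

Framed this way, the only difference between the states is how demand affects hosting, which is why the open question below is which operations count as "needing" a tablet.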

> is the "on-demand availability" applicable only for live ingest /
> immediate consistency? Is it still "always available" for bulk import /
> ScanServers? Or does "on-demand availability" somehow apply to all
> interactions, including bulk import and ScanServer reads?

We tried to reason about that in
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=247828052

Regarding the rest of your email, I think removing the ondemand state would
be ok. The ondemand commits added a new property for the user to specify
which tablet unloader class [1] to use, with the default being [2]. We could
add a new default implementation that does not unload, and users would have
to opt in to unloading by setting the property for their online tables.
However, there is some code surrounding the new ondemand state that we would
need to address. For example, when a TabletServer is low on memory, it
doesn't call the specified TabletUnloader; it just unloads a Tablet.

[1]
https://github.com/apache/accumulo/blob/elasticity/core/src/main/java/org/apache/accumulo/core/spi/ondemand/OnDemandTabletUnloader.java
[2]
https://github.com/apache/accumulo/blob/elasticity/core/src/main/java/org/apache/accumulo/core/spi/ondemand/DefaultOnDemandTabletUnloader.java
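To make the opt-in idea concrete, here is a minimal, self-contained Java sketch. It does not use the real OnDemandTabletUnloader SPI linked above; `TabletUnloader`, `NeverUnload`, `IdleUnloader`, `forTable`, and the `unloader.idle.millis` property are all hypothetical names, and it assumes the unloader's input can be reduced to a map of tablet id to last-access time.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the opt-in proposal; NOT the actual
 * OnDemandTabletUnloader SPI. Tablets are identified by plain strings.
 */
public class UnloaderSketch {

    /** Hypothetical plugin: given tablet -> last access time (ms), pick tablets to unload. */
    public interface TabletUnloader {
        Set<String> tabletsToUnload(Map<String, Long> lastAccessMillis, long nowMillis);
    }

    /** Proposed new default: never unload, so online tables stay fully hosted. */
    public static class NeverUnload implements TabletUnloader {
        @Override
        public Set<String> tabletsToUnload(Map<String, Long> lastAccessMillis, long nowMillis) {
            return Set.of();
        }
    }

    /** Opt-in behavior: unload tablets that have been idle longer than a threshold. */
    public static class IdleUnloader implements TabletUnloader {
        private final long maxIdleMillis;

        public IdleUnloader(long maxIdleMillis) {
            this.maxIdleMillis = maxIdleMillis;
        }

        @Override
        public Set<String> tabletsToUnload(Map<String, Long> lastAccessMillis, long nowMillis) {
            Set<String> result = new HashSet<>();
            for (Map.Entry<String, Long> e : lastAccessMillis.entrySet()) {
                if (nowMillis - e.getValue() > maxIdleMillis) {
                    result.add(e.getKey());
                }
            }
            return result;
        }
    }

    /** Picks the unloader from per-table settings; no setting means the no-op default. */
    public static TabletUnloader forTable(Map<String, String> tableProps) {
        String idle = tableProps.get("unloader.idle.millis"); // hypothetical property name
        return idle == null ? new NeverUnload() : new IdleUnloader(Long.parseLong(idle));
    }

    public static void main(String[] args) {
        Map<String, Long> lastAccess = Map.of("t1", 0L, "t2", 9_000L);
        long now = 10_000L;
        // Default (no property set): nothing is unloaded.
        System.out.println(forTable(Map.of()).tabletsToUnload(lastAccess, now));
        // Opted in with a 5s idle threshold: only t1 has been idle long enough.
        System.out.println(forTable(Map.of("unloader.idle.millis", "5000"))
            .tabletsToUnload(lastAccess, now));
    }
}
```

A sketch like this only covers the normal unload path; as noted above, the low-memory path in the TabletServer unloads tablets without consulting the configured unloader, so that code would still need separate attention.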

On Tue, Mar 28, 2023 at 10:27 AM Christopher <ctubb...@apache.org> wrote:

> I think we should deprecate support for offline table scanning, since
> it shouldn't be needed with the availability of ScanServers. Any
> MapReduce that previously relied on scanning offline tables could be
> made to use that instead.
>
> I agree there is a need to have an immutable table state, for which it
> is possible to read, but no changes can be made. However, even in that
> "locked" state, one should still be able to perform surgery on its
> metadata, or manually / surgically compact files (with the
> understanding that doing so will interfere with any concurrent export
> or scan operations that are relying on it being immutable, which I
> think is a tolerable amount of risk, when actually in a situation
> where such surgery is needed).
>
> As for "ondemand" table state, from a user perspective, I'm not sure
> what it means... is the "on-demand availability" applicable only for
> live ingest / immediate consistency? Is it still "always available"
> for bulk import / ScanServers? Or does "on-demand availability"
> somehow apply to all interactions, including bulk import and
> ScanServer reads?
>
> I think the "ondemand" state is confusing, because it's exposing
> internal state through to the user, and in a way that isn't as clear
> as the simple "online/offline" states used to be. Previously, users
> didn't need to understand what was going on internally... "online"
> just meant "I can interact with this table", and "offline" meant "I
> can't interact with this table". The user wasn't required to
> understand what a tablet was, or how it was hosted, or anything of
> that nature. As we started adding support for "offline" features, the
> lines separating "online and offline" meaning "available and
> unavailable" became blurred. As we proceed adding elasticity, I think
> we should work to make things more clear and explicit again... and I
> think "ondemand" as a table state makes things even less clear when
> the concept is exposed to the user as a separate table state.
>
> I do think we need some kind of on-demand availability for live-ingest
> and immediate consistency in order to be more elastic, and from the
> discussion, it's obvious we need an immutable table state, but I think
> it's a mistake to expose the on-demand availability for live-ingest
> and immediate consistency as a new table state. I think that should be
> left as either some kind of automatic internal behavior, or as a
> secondary fine-grained control over an online table (like pinned
> tablets, either permanently pinned or temporarily pinned, based on
> activity).
>
> On Tue, Mar 28, 2023 at 9:51 AM Drew Farris <d...@apache.org> wrote:
> >
> > On Mon, Mar 27, 2023 at 2:16 PM Keith Turner <ke...@deenlo.com> wrote:
> >
> > > One realization that came out of examining the different table states is
> > > that export table currently relies on the fact that offline tables
> > > will not delete files.  If we enable compactions on offline tables
> > > then that could cause files to be deleted which would break the
> > > expectation of export table.
> > >
> >
> > This is a good point. I hadn't considered the potential breakage to
> > export table. I suspect another concern could be the Hadoop input format
> > that operates over the rfiles in an offline table, and can do so
> > relatively safely because the table is not expected to change while it
> > is offline.
> >
> > So, it would seem that there is value in having an 'immutable' table
> > state in the form of an offline table. Perhaps 'ondemand' is the
> > alternate state that lets us do things like import, split, compact,
> > merge, etc.
>
