I think we should deprecate support for offline table scanning, since
it shouldn't be needed now that ScanServers are available. Any
MapReduce job that previously relied on scanning offline tables could
be made to use ScanServers instead.

I agree there is a need for an immutable table state, in which the
table can be read but no changes can be made. However, even in that
"locked" state, one should still be able to perform surgery on its
metadata, or manually / surgically compact files (with the
understanding that doing so will interfere with any concurrent export
or scan operations that rely on the table being immutable, which I
think is a tolerable amount of risk in a situation where such surgery
is actually needed).

As for "ondemand" table state, from a user perspective, I'm not sure
what it means... is the "on-demand availability" applicable only for
live ingest / immediate consistency? Is it still "always available"
for bulk import / ScanServers? Or does "on-demand availability"
somehow apply to all interactions, including bulk import and
ScanServer reads?

I think the "ondemand" state is confusing, because it's exposing
internal state through to the user, and in a way that isn't as clear
as the simple "online/offline" states used to be. Previously, users
didn't need to understand what was going on internally... "online"
just meant "I can interact with this table", and "offline" meant "I
can't interact with this table". The user wasn't required to
understand what a tablet was, or how it was hosted, or anything of
that nature. As we started adding support for "offline" features, the
line between "online" meaning "available" and "offline" meaning
"unavailable" became blurred. As we proceed with adding elasticity, I
think we should work to make things clear and explicit again, and I
think exposing "ondemand" to the user as a separate table state makes
things even less clear.

I do think we need some kind of on-demand availability for live-ingest
and immediate consistency in order to be more elastic, and from the
discussion, it's obvious we need an immutable table state, but I think
it's a mistake to expose the on-demand availability for live-ingest
and immediate consistency as a new table state. I think that should be
left as either some kind of automatic internal behavior, or as a
secondary fine-grained control over an online table (like pinned
tablets, either permanently pinned or temporally pinned, based on
activity).
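
To make that distinction concrete, here is a rough sketch of the shape
I have in mind. Everything in it (TableState, TabletAvailability, pin,
availability_of) is hypothetical and for illustration only; none of it
is an existing Accumulo API, and it leaves out the immutable state
entirely.

```python
from dataclasses import dataclass, field
from enum import Enum


class TableState(Enum):
    # Hypothetical user-facing states; "ondemand" is deliberately absent.
    ONLINE = "online"
    OFFLINE = "offline"


class TabletAvailability(Enum):
    # Hypothetical per-tablet control, secondary to the table state.
    ON_DEMAND = "on_demand"  # hosted only while there is activity
    PINNED = "pinned"        # kept hosted regardless of activity


@dataclass
class Table:
    name: str
    state: TableState = TableState.ONLINE
    # Maps a tablet's end row to its availability.
    # Tablets not listed here default to ON_DEMAND.
    availability: dict = field(default_factory=dict)

    def pin(self, end_row: str) -> None:
        """Pin one tablet; only allowed while the table is online."""
        if self.state is not TableState.ONLINE:
            raise ValueError("cannot pin tablets of a table that is not online")
        self.availability[end_row] = TabletAvailability.PINNED

    def availability_of(self, end_row: str) -> TabletAvailability:
        """Return the availability of a tablet, defaulting to ON_DEMAND."""
        return self.availability.get(end_row, TabletAvailability.ON_DEMAND)
```

The point of the sketch is only that the user still sees "online" or
"offline" on the table, while pinning is an optional refinement applied
to an online table rather than a third state.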

On Tue, Mar 28, 2023 at 9:51 AM Drew Farris <d...@apache.org> wrote:
>
> On Mon, Mar 27, 2023 at 2:16 PM Keith Turner <ke...@deenlo.com> wrote:
>
> > One realization that came out examining the different table states is
> > that export table currently relies on the fact that offline tables
> > will not delete files.  If we enable compactions on offline tables
> > then that could cause files to be deleted which would break the
> > expectation of export table.
> >
>
> This is a good point. I hadn't considered the potential breakage to export
> table. I suspect another concern could be the hadoop input format that
> operates over the rfiles in an offline table - and can do so relatively
> safely because the table is not expected to change while it is offline.
>
> So, it would seem that there is value in having an 'immutable' table state
> in the form of an offline table. Perhaps 'ondemand' is the alternate state
> that lets us do things like import, split, compact, merge, etc.
