I think we should deprecate support for offline table scanning, since it shouldn't be needed now that ScanServers are available. Any MapReduce job that previously relied on scanning offline tables could be changed to read through ScanServers instead.
I agree there is a need for an immutable table state: one that can be read, but in which no changes can be made. However, even in that "locked" state, one should still be able to perform surgery on its metadata, or manually compact files. Doing so would interfere with any concurrent export or scan operations that rely on the table being immutable, but I think that is a tolerable risk when one is actually in a situation where such surgery is needed.

As for the "ondemand" table state, I'm not sure what it means from a user's perspective. Does "on-demand availability" apply only to live ingest / immediate consistency, with the table still "always available" for bulk import and ScanServers? Or does it somehow apply to all interactions, including bulk import and ScanServer reads? I think the "ondemand" state is confusing, because it exposes internal state to the user, and does so less clearly than the simple "online/offline" states used to. Previously, users didn't need to understand what was going on internally: "online" just meant "I can interact with this table", and "offline" meant "I can't interact with this table". The user wasn't required to understand what a tablet was, how it was hosted, or anything of that nature. As we started adding support for "offline" features, the line between "online/offline" and "available/unavailable" became blurred. As we proceed with adding elasticity, I think we should work to make things clear and explicit again, and I think exposing "ondemand" to the user as a separate table state makes things even less clear.
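To make the distinction concrete, here is a minimal sketch of user-facing states defined purely by what a user may do, with no mention of tablets or hosting. Everything in it is hypothetical: `ProposedTableState`, `TableOp`, and `permits` are illustrative names, not Accumulo APIs, and the exact set of operations allowed per state is an assumption based on the description above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical user-level operations against a table (not an Accumulo API).
enum TableOp { READ, WRITE, BULK_IMPORT, SPLIT, MERGE, METADATA_SURGERY, MANUAL_COMPACTION }

// Hypothetical user-facing table states; deliberately NOT Accumulo's internal TableState enum.
enum ProposedTableState {
  // "I can interact with this table": every operation is allowed.
  ONLINE(EnumSet.allOf(TableOp.class)),
  // Immutable ("locked"): readable, no normal changes, but metadata surgery and
  // manual compactions remain possible, at the risk of disturbing exports or
  // scans that rely on the table not changing.
  LOCKED(EnumSet.of(TableOp.READ, TableOp.METADATA_SURGERY, TableOp.MANUAL_COMPACTION)),
  // The original, simple meaning: "I can't interact with this table".
  OFFLINE(EnumSet.noneOf(TableOp.class));

  private final Set<TableOp> allowed;

  ProposedTableState(Set<TableOp> allowed) {
    this.allowed = allowed;
  }

  // True if a user may perform the given operation while the table is in this state.
  boolean permits(TableOp op) {
    return allowed.contains(op);
  }
}
```

With this shape, `ProposedTableState.LOCKED.permits(TableOp.MANUAL_COMPACTION)` is true while `ProposedTableState.LOCKED.permits(TableOp.WRITE)` is false, so the surgery escape hatch is an explicit part of the state rather than a side effect of the table being offline.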
I do think we need some kind of on-demand availability for live ingest and immediate consistency in order to be more elastic, and from the discussion, it's obvious we need an immutable table state. But I think it's a mistake to expose the on-demand availability for live ingest and immediate consistency as a new table state. I think that should be left either as some kind of automatic internal behavior, or as a secondary, fine-grained control over an online table (like pinned tablets, either permanently pinned or temporarily pinned based on activity).

On Tue, Mar 28, 2023 at 9:51 AM Drew Farris <d...@apache.org> wrote:
>
> On Mon, Mar 27, 2023 at 2:16 PM Keith Turner <ke...@deenlo.com> wrote:
>
> > One realization that came out examining the different table states is
> > that export table currently relies on the fact that offline tables
> > will not delete files. If we enable compactions on offline tables
> > then that could cause files to be deleted which would break the
> > expectation of export table.
> >
>
> This is a good point. I hadn't considered the potential breakage to export
> table. I suspect another concern could be the hadoop input format that
> operates over the rfiles in an offline table - and can do so relatively
> safely because the table is not expected to change while it is offline.
>
> So, it would seem that there is value in having an 'immutable' table state
> in the form of an offline table. Perhaps 'ondemand' is the alternate state
> that lets us do things like import, split, compact, merge, etc.