joellubi commented on code in PR #38385:
URL: https://github.com/apache/arrow/pull/38385#discussion_r1389316958
##########
format/FlightSql.proto:
##########
@@ -1778,6 +1794,47 @@ message CommandPreparedStatementUpdate {
bytes prepared_statement_handle = 1;
}
+/*
+ * Represents a bulk ingestion request. Used in the command member of
FlightDescriptor
+ * for the the RPC call DoPut to cause the server load the contents of the
stream's
+ * FlightData into the target destination.
+ */
+message CommandStatementIngest {
+ option (experimental) = true;
+
+ // Describes the behavior for loading bulk data.
+ enum IngestMode {
+ // Ingestion behavior unspecified.
+ INGEST_MODE_UNSPECIFIED = 0;
+ // Create the target table. Fail if the target table already exists.
+ INGEST_MODE_CREATE = 1;
+ // Append to an existing target table. Fail if the target table does not
exist.
+ INGEST_MODE_APPEND = 2;
+ // Drop the target table if it exists. Then follow INGEST_MODE_CREATE
behavior.
+ INGEST_MODE_REPLACE = 3;
+ // Create the target table if it does not exist. Then follow
INGEST_MODE_APPEND behavior.
+ INGEST_MODE_CREATE_APPEND = 4;
+ }
+
+ // The ingestion behavior.
+ IngestMode mode = 1;
+ // The table to load data into.
+ string table = 2;
+ // The db_schema of the destination table to load data into. If unset, a
backend-specific default may be used.
+ optional string schema = 3;
+ // The catalog of the destination table to load data into. If unset, a
backend-specific default may be used.
+ optional string catalog = 4;
+ // Use a temporary table.
+ optional bool temporary = 5;
Review Comment:
I agree with this. There is a remaining edge case with `APPEND` that I think
is worth clarifying:
1. Create and write into a temp table `foo` (mode=`CREATE`, table=`foo`,
temporary=`true`)
2a. Append to `foo` without specifying temporary in the request
(mode=`APPEND`, table=`foo`), or...
2b. Append to `foo` with temporary `false` (mode=`APPEND`, table=`foo`,
temporary=`false`)
Even if there is an existing non-temp table `foo`, I think most clients
would expect the creation in step 1 to succeed, shadowing the original table.
Subsequent appends that omit the `temporary` arg (case 2a) would continue to
exhibit this shadowing behavior, targeting the temp table as long as it exists
in the session. This behavior would be identical in this case to setting
temporary=`true`, except when it is explicitly `true` an additional check is
performed to ensure the table is temporary, failing if it is not.
If subsequent appends explicitly set temporary=`false` (case 2b), then the
data is appended to the non-temp table with that name. If a non-temp table with
that name does not exist, the request fails.
A concrete example (with equivalent pseudo-SQL):
```proto
// Create temp and non-temp
CommandStatementIngest(mode=CREATE, table=foo) -> CREATE TABLE foo
CommandStatementIngest(mode=CREATE, table=foo, temporary=true) -> CREATE
TEMP TABLE foo
// Only create non-temp
CommandStatementIngest(mode=CREATE, table=bar) -> CREATE TABLE bar
// Only create temp
CommandStatementIngest(mode=CREATE, table=baz, temporary=true) -> CREATE
TEMP TABLE baz
// Temp unspecified
CommandStatementIngest(mode=APPEND, table=foo) -> INSERT INTO temp.foo //
shadowing public.foo
CommandStatementIngest(mode=APPEND, table=bar) -> INSERT INTO public.bar
CommandStatementIngest(mode=APPEND, table=baz) -> INSERT INTO temp.baz
// Temp true
CommandStatementIngest(mode=APPEND, table=foo, temporary=true) -> INSERT
INTO temp.foo
CommandStatementIngest(mode=APPEND, table=bar, temporary=true) -> ERROR //
not temporary
CommandStatementIngest(mode=APPEND, table=baz, temporary=true) -> INSERT
INTO temp.baz
// Temp false
CommandStatementIngest(mode=APPEND, table=foo, temporary=false) -> INSERT
INTO public.foo
CommandStatementIngest(mode=APPEND, table=bar, temporary=false) -> INSERT
INTO public.bar
CommandStatementIngest(mode=APPEND, table=baz, temporary=false) -> ERROR //
only temporary
```
I think this behavior would be the most intuitive based on the arguments we
expose. The last case where temporary=`false` is currently achieved in most
systems by explicitly specifying the namespace of the table (e.g. `public.foo`
instead of `foo`). So it is worth pointing out that we would be adding a
feature that isn't necessarily common (explicitly unshadow a non-temp table
without knowing the original schema name), but it may be useful and likely
wouldn't be difficult to implement even if not directly supported by the
underlying database.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]