joellubi commented on code in PR #38385:
URL: https://github.com/apache/arrow/pull/38385#discussion_r1389316958


##########
format/FlightSql.proto:
##########
@@ -1778,6 +1794,47 @@ message CommandPreparedStatementUpdate {
   bytes prepared_statement_handle = 1;
 }
 
+/*
+ * Represents a bulk ingestion request. Used in the command member of 
FlightDescriptor
+ * for the the RPC call DoPut to cause the server load the contents of the 
stream's
+ * FlightData into the target destination.
+ */
+message CommandStatementIngest {
+  option (experimental) = true;
+
+  // Describes the behavior for loading bulk data.
+  enum IngestMode {
+    // Ingestion behavior unspecified.
+    INGEST_MODE_UNSPECIFIED = 0;
+    // Create the target table. Fail if the target table already exists.
+    INGEST_MODE_CREATE = 1;
+    // Append to an existing target table. Fail if the target table does not 
exist.
+    INGEST_MODE_APPEND = 2;
+    // Drop the target table if it exists. Then follow INGEST_MODE_CREATE 
behavior.
+    INGEST_MODE_REPLACE = 3;
+    // Create the target table if it does not exist. Then follow 
INGEST_MODE_APPEND behavior.
+    INGEST_MODE_CREATE_APPEND = 4;
+  }
+
+  // The ingestion behavior.
+  IngestMode mode = 1;
+  // The table to load data into.
+  string table = 2;
+  // The db_schema of the destination table to load data into. If unset, a 
backend-specific default may be used.
+  optional string schema = 3;
+  // The catalog of the destination table to load data into. If unset, a 
backend-specific default may be used.
+  optional string catalog = 4;
+  // Use a temporary table.
+  optional bool temporary = 5;

Review Comment:
   I agree with this. There is a remaining edge case with `APPEND` that I think 
is worth clarifying:
   1. Create and write into a temp table `foo` (mode=`CREATE`, table=`foo`, 
temporary=`true`)
   2a. Append to `foo` without specifying temporary in the request 
(mode=`APPEND`, table=`foo`), or...
   2b. Append to `foo` with temporary `false` (mode=`APPEND`, table=`foo`, 
temporary=`false`)
   
   Even if there is an existing non-temp table `foo`, I think most clients 
would expect the creation in step 1 to succeed, shadowing the original table.
   
   Subsequent appends that omit the `temporary` arg (case 2a) would continue to 
exhibit this shadowing behavior, targeting the temp table as long as it exists 
in the session. This behavior would be identical in this case to setting 
temporary=`true`, except when it is explicitly `true` an additional check is 
performed to ensure the table is temporary, failing if it is not.
   
   If subsequent appends explicitly set temporary=`false` (case 2b), then the 
data is appended to the non-temp table with that name. If a non-temp table with 
that name does not exist, the request fails.
   
   A concrete example (with equivalent pseudo-SQL):
   ```proto
   // Create temp and non-temp
   CommandStatementIngest(mode=CREATE, table=foo) -> CREATE TABLE foo
   CommandStatementIngest(mode=CREATE, table=foo, temporary=true) -> CREATE 
TEMP TABLE foo
   
   // Only create non-temp
   CommandStatementIngest(mode=CREATE, table=bar) -> CREATE TABLE bar
   
   // Only create temp
   CommandStatementIngest(mode=CREATE, table=baz, temporary=true) -> CREATE 
TEMP TABLE baz
   
   // Temp unspecified
   CommandStatementIngest(mode=APPEND, table=foo) -> INSERT INTO temp.foo // 
shadowing public.foo
   CommandStatementIngest(mode=APPEND, table=bar) -> INSERT INTO public.bar
   CommandStatementIngest(mode=APPEND, table=baz) -> INSERT INTO temp.baz
   
   // Temp true
   CommandStatementIngest(mode=APPEND, table=foo, temporary=true) -> INSERT 
INTO temp.foo
   CommandStatementIngest(mode=APPEND, table=bar, temporary=true) -> ERROR // 
not temporary
   CommandStatementIngest(mode=APPEND, table=baz, temporary=true) -> INSERT 
INTO temp.baz
   
   // Temp false
   CommandStatementIngest(mode=APPEND, table=foo, temporary=false) -> INSERT 
INTO public.foo
   CommandStatementIngest(mode=APPEND, table=bar, temporary=false) -> INSERT 
INTO public.bar
   CommandStatementIngest(mode=APPEND, table=baz, temporary=false) -> ERROR // 
only temporary
   ```
   
   I think this behavior would be the most intuitive based on the arguments we 
expose. The last case where temporary=`false` is currently achieved in most 
systems by explicitly specifying the namespace of the table (e.g. `public.foo` 
instead of `foo`). So it is worth pointing out that we would be adding a 
feature that isn't necessarily common (explicitly unshadow a non-temp table 
without knowing the original schema name), but it may be useful and likely 
wouldn't be difficult to implement even if not directly supported by the 
underlying database.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to