Re: [PR] GH-38255: [Go][C++] Implement Flight SQL Bulk Ingestion [arrow]

via GitHub Thu, 09 Nov 2023 16:19:39 -0800


joellubi commented on code in PR #38385:
URL: https://github.com/apache/arrow/pull/38385#discussion_r1388725396



##########
format/FlightSql.proto:
##########
@@ -1778,6 +1794,47 @@ message CommandPreparedStatementUpdate {
   bytes prepared_statement_handle = 1;
 }
 
+/*
+ * Represents a bulk ingestion request. Used in the command member of FlightDescriptor
+ * for the RPC call DoPut to cause the server to load the contents of the stream's
+ * FlightData into the target destination.
+ */
+message CommandStatementIngest {
+  option (experimental) = true;
+
+  // Describes the behavior for loading bulk data.
+  enum IngestMode {
+    // Ingestion behavior unspecified.
+    INGEST_MODE_UNSPECIFIED = 0;
+    // Create the target table. Fail if the target table already exists.
+    INGEST_MODE_CREATE = 1;
+    // Append to an existing target table. Fail if the target table does not exist.
+    INGEST_MODE_APPEND = 2;
+    // Drop the target table if it exists. Then follow INGEST_MODE_CREATE behavior.
+    INGEST_MODE_REPLACE = 3;
+    // Create the target table if it does not exist. Then follow INGEST_MODE_APPEND behavior.
+    INGEST_MODE_CREATE_APPEND = 4;
+  }
+
+  // The ingestion behavior.
+  IngestMode mode = 1;
+  // The table to load data into.
+  string table = 2;
+  // The db_schema of the destination table to load data into. If unset, a backend-specific default may be used.
+  optional string schema = 3;
+  // The catalog of the destination table to load data into. If unset, a backend-specific default may be used.
+  optional string catalog = 4;
+  // Use a temporary table.
+  optional bool temporary = 5;

Review Comment:
   I took some time to map out how this could work in the various ingestion 
scenarios and ran into some trouble with certain edge cases. After reviewing 
how several DB engines I'm familiar with handle these scenarios, I can see the 
benefit of making an opinionated choice regarding the semantics.
   
   Based on my observations, there are at least **two** qualities of temporary 
tables that most clients would expect:
   1. They are dropped automatically by the DB at the end of the session. The 
scope of the session could be a connection, transaction, etc. The specific 
scope seems to be backend-specific, and sometimes configurable.
   2. Another quality that seems common to implementations I've seen is that 
temp tables are treated _exactly the same_ as a "regular" table after they've 
been created, for the duration of the session. This means that the only time a 
client would specify that a table is temporary is during creation (i.e. 
`CREATE` vs `CREATE TEMP`). For this reason I take back my statement implying 
that `temporary` is unrelated to table creation. On the contrary, it seems that 
table creation is the only place where `temporary` should have an effect.
   
   This is based on DBs I'm familiar with, so I would appreciate your 
perspectives on whether these expectations are broadly applicable.
   
   Putting these observations together, I would propose the following semantics 
for each mode when `temporary` is set:
   - `INGEST_MODE_CREATE`: Create a temp table. Fail if a temp table with that 
name already exists. Should succeed even if a non-temp table with the same name 
already exists.
   - `INGEST_MODE_APPEND`: Ignore the `temporary` argument. Append to the 
specified table; the backend can tell whether it's temporary or not. (I'm OK 
with returning an error here instead if there's a good reason to do so.)
   - `INGEST_MODE_REPLACE`: Drop the table if it exists, then follow 
INGEST_MODE_CREATE behavior. Fail if the existing target is not a temp table 
(this might help avoid unexpected data loss).
   - `INGEST_MODE_CREATE_APPEND`: Create a temp table if it does not exist. 
Then follow INGEST_MODE_APPEND behavior.
   
   Thoughts on these conclusions?


