The description I sent covers the planner, but there is of course also a run-time component, which would consist of a 'RecordWriter' for the underlying DB. In the case of MapR-DB, this RecordWriter would simply call the underlying PUT or bulk PUT API. In addition, we need to figure out the tablet/region affinity. For single-row inserts, doing a remote write from the foreman node may be okay, but for INSERT - SELECT type operations, where the SELECT side produces millions of rows and has already been parallelized, the rows need to be inserted through a parallel bulk insert. So we would want to range-partition the rows based on the tablet rowid ranges, such that rows belonging to the same tablet are somewhat 'grouped together' and two minor fragments in Drill don't try to write to the same tablet.
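To make the range-partitioning idea concrete, here is a rough sketch in plain Java. It is not Drill or MapR-DB code: the class TabletRangePartitioner and its methods tabletFor, partition and writerFor are hypothetical names, row keys are modelled as Strings rather than byte arrays, and the tablet start keys are assumed to be already known to the planner. It only illustrates routing each row to the tablet whose key range contains it, and giving each tablet exactly one writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: group row keys by the tablet whose key range contains them. */
final class TabletRangePartitioner {
    // Inclusive start key of each tablet -> tablet id. Assumption: the first
    // tablet starts at the empty key, so every row key falls in some range.
    // Real stores compare keys byte-wise; String ordering is a simplification.
    private final TreeMap<String, Integer> startKeyToTablet = new TreeMap<>();

    TabletRangePartitioner(List<String> sortedStartKeys) {
        for (int i = 0; i < sortedStartKeys.size(); i++) {
            startKeyToTablet.put(sortedStartKeys.get(i), i);
        }
    }

    /** Returns the id of the tablet with the greatest start key <= rowKey. */
    int tabletFor(String rowKey) {
        Map.Entry<String, Integer> entry = startKeyToTablet.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalArgumentException("No tablet covers key: " + rowKey);
        }
        return entry.getValue();
    }

    /** Groups row keys by tablet id, preserving input order within each group. */
    Map<Integer, List<String>> partition(List<String> rowKeys) {
        Map<Integer, List<String>> byTablet = new TreeMap<>();
        for (String key : rowKeys) {
            byTablet.computeIfAbsent(tabletFor(key), t -> new ArrayList<>()).add(key);
        }
        return byTablet;
    }

    /** Maps each tablet to exactly one writer, so no two writers share a tablet. */
    static int writerFor(int tabletId, int numWriters) {
        return tabletId % numWriters;
    }
}
```

For example, with tablet start keys "", "g" and "p", the key "melon" lands in tablet 1 because "g" is the greatest start key not larger than it. A real implementation would take the ranges from the table's tablet metadata and would likely also consider data locality when assigning tablets to minor fragments.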
Aman

On Tue, May 28, 2019 at 12:50 PM Aman Sinha <[email protected]> wrote:

> Yes, Calcite already supports the INSERT/UPSERT syntax. Within Drill, you
> would need to 'unblock' this syntax (not all of it but whatever variation
> we may want to support). You can take a look at DrillParserImpl.java
> (SqlInsert() method) which is actually a generated file from JavaCC.
>
> We would need to look at the Calcite logical plan that is created for the
> DML statements such as this and then determine the corresponding Drill
> logical/physical plan. Since I haven't seen a Calcite logical plan with
> DML operators yet, I am not completely sure but if it follows the standard
> logical plan, then in Drill we need the following:
>
>    - since this would only be supported for a specific storage/format
>    plugin, there should be an early validation check of data source (in the
>    FROM clause) to ensure if it qualifies
>    - a logical rule that converts the Calcite logical plan node to Drill
>    logical plan node (for example, see *DrillProjectRule*.java)
>    - a logical rel that represents the plan node .. e.g DrillInsertRel
>    (for example see existing *DrillProjectRel*).
>    - a physical rule (e.g see *ProjectPrule*)
>    - a physical rel (e.g see *ProjectPrel*)
>    - optimizer rules for any plugin specific pushdown are implemented
>    within the plugin and added to the list of rules for that plugin (e.g see
>    *MapRDBFormatPlugin.getOptimizerRules()*). These are then automatically
>    picked up by Drill.
>
> Aman
>
>
> On Mon, May 27, 2019 at 11:12 PM Ted Dunning <[email protected]>
> wrote:
>
>> Yes. CTAS should be a similar problem to unsafe inserts.
>>
>> We have a few people interested in the work. What is needed more is
>> pointers to where to find out about the details.
>>
>> 1. How can we enable the syntax?
>>
>> 2. What operators are really necessary?
>>
>> 3. How should writers inject insert optimizer rules to allow insert or
>> update operator pushdown?
>>
>>
>>
>> On Mon, May 27, 2019 at 9:42 PM Paul Rogers <[email protected]>
>> wrote:
>>
>> > Hi Ted,
>> >
>> > Drill can do a CTAS today, which uses a writer provided by the format
>> > plugin. One would think this same structure could work for an INSERT
>> > operation, with a writer provided by the storage plugin. The devil, of
>> > course, is always in the details. And in finding resources to do the
>> > work...
>> >
>> > Thanks,
>> > - Paul
>> >
>> >
>> >
>> >    On Monday, May 27, 2019, 5:28:27 PM PDT, Ted Dunning <
>> > [email protected]> wrote:
>> >
>> >  I have in mind the ability to push rows to an underlying DB without any
>> > transactional support.
