Related work in Iceberg. Worth a read: https://docs.google.com/document/d/1Pk34C3diOfVCRc-sfxfhXZfzvxwum1Odo-6Jj9mwK38/edit#
On Tue, May 28, 2019 at 2:17 PM Aman Sinha <[email protected]> wrote:

> The description I sent is for the planner, but there is of course a
> run-time component, which would consist of a 'RecordWriter' for the
> underlying DB. In the case of MapR-DB, this RecordWriter would simply
> call the underlying PUT or bulk PUT API. In addition, we need to figure
> out the tablet/region affinity. For single-row inserts, doing a remote
> write from the foreman node may be okay, but for INSERT ... SELECT type
> operations, where the SELECT side is producing millions of rows and has
> already been parallelized, the rows need to be inserted through a
> parallel bulk insert. So we would want to range-partition the rows based
> on the tablet rowid ranges, such that rows belonging to the same tablet
> are grouped together and two minor fragments in Drill don't try to write
> to the same tablet.
>
> Aman
>
> On Tue, May 28, 2019 at 12:50 PM Aman Sinha <[email protected]> wrote:
>
> > Yes, Calcite already supports the INSERT/UPSERT syntax. Within Drill,
> > you would need to 'unblock' this syntax (not all of it, but whatever
> > variation we may want to support). You can take a look at
> > DrillParserImpl.java (the SqlInsert() method), which is a generated
> > file from JavaCC.
> >
> > We would need to look at the Calcite logical plan that is created for
> > DML statements such as this and then determine the corresponding Drill
> > logical/physical plan.
> > Since I haven't seen a Calcite logical plan with DML operators yet, I
> > am not completely sure, but if it follows the standard logical plan,
> > then in Drill we need the following:
> >
> >    - Since this would only be supported for a specific storage/format
> >    plugin, there should be an early validation check of the data source
> >    (in the FROM clause) to ensure it qualifies.
> >    - A logical rule that converts the Calcite logical plan node to a
> >    Drill logical plan node (for example, see *DrillProjectRule*.java).
> >    - A logical rel that represents the plan node, e.g. DrillInsertRel
> >    (for example, see the existing *DrillProjectRel*).
> >    - A physical rule (e.g. see *ProjectPrule*).
> >    - A physical rel (e.g. see *ProjectPrel*).
> >    - Optimizer rules for any plugin-specific pushdown are implemented
> >    within the plugin and added to the list of rules for that plugin
> >    (e.g. see *MapRDBFormatPlugin.getOptimizerRules()*). These are then
> >    automatically picked up by Drill.
> >
> > Aman
> >
> > On Mon, May 27, 2019 at 11:12 PM Ted Dunning <[email protected]>
> > wrote:
> >
> > > Yes. CTAS should be a similar problem to unsafe inserts.
> > >
> > > We have a few people interested in the work. What is needed more is
> > > pointers to where to find out about the details.
> > >
> > > 1. How can we enable the syntax?
> > >
> > > 2. What operators are really necessary?
> > >
> > > 3. How should writers inject insert optimizer rules to allow insert
> > > or update operator pushdown?
> > >
> > > On Mon, May 27, 2019 at 9:42 PM Paul Rogers <[email protected]>
> > > wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > Drill can do a CTAS today, which uses a writer provided by the
> > > > format plugin. One would think this same structure could work for
> > > > an INSERT operation, with a writer provided by the storage plugin.
> > > > The devil, of course, is always in the details. And in finding
> > > > resources to do the work...
> > > > Thanks,
> > > > - Paul
> > > >
> > > > On Monday, May 27, 2019, 5:28:27 PM PDT, Ted Dunning
> > > > <[email protected]> wrote:
> > > >
> > > > I have in mind the ability to push rows to an underlying DB without
> > > > any transactional support.
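Aman's point about range-partitioning rows by tablet rowid ranges, so that rows for the same tablet are grouped and no two minor fragments write to the same tablet, can be sketched roughly as follows. This is a toy illustration in plain Java under stated assumptions: the class name, the string row keys, and the sorted-split-key representation are hypothetical, not Drill's actual partitioning/exchange APIs.

```java
import java.util.*;

/** Toy sketch of range-partitioning row keys by tablet split keys.
 *  Hypothetical names; Drill's real partitioner/exchange APIs differ. */
public class TabletRangePartitioner {
    // Sorted, upper-exclusive tablet boundaries; N split keys => N+1 tablets.
    private final List<String> splitKeys;

    public TabletRangePartitioner(List<String> splitKeys) {
        this.splitKeys = new ArrayList<>(splitKeys);
        Collections.sort(this.splitKeys);
    }

    /** Tablet index for a row key: the number of split keys <= rowKey. */
    public int tabletFor(String rowKey) {
        int idx = Collections.binarySearch(splitKeys, rowKey);
        // Exact match belongs to the tablet starting at that split key;
        // otherwise binarySearch returns -(insertionPoint) - 1.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    /** Group row keys by tablet so each group can go through one bulk PUT. */
    public Map<Integer, List<String>> partition(List<String> rowKeys) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String key : rowKeys) {
            groups.computeIfAbsent(tabletFor(key), t -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Tablets: [min, "g"), ["g", "p"), ["p", max)
        TabletRangePartitioner p = new TabletRangePartitioner(Arrays.asList("g", "p"));
        System.out.println(p.partition(Arrays.asList("apple", "kiwi", "zebra", "grape")));
        // prints {0=[apple], 1=[kiwi, grape], 2=[zebra]}
    }
}
```

In a real plan, the grouping would be a range-partitioning exchange between the SELECT side and the parallel writers, rather than an in-memory map.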

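The run-time 'RecordWriter' component that Aman and Paul describe, which would buffer incoming rows and push them to the underlying DB via a bulk PUT, might look roughly like this minimal sketch. The interface and all names here are hypothetical illustrations, not Drill's actual RecordWriter contract or the MapR-DB PUT API; the bulk PUT is stubbed as a callback.

```java
import java.util.*;

/** Toy sketch of a RecordWriter-style buffering insert writer.
 *  Hypothetical interface; Drill's RecordWriter and the real bulk
 *  PUT APIs differ. */
public class BufferingInsertWriter {
    /** Stand-in for the underlying DB's bulk PUT call. */
    public interface BulkPut { void put(List<String> rows); }

    private final BulkPut sink;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    public BufferingInsertWriter(BulkPut sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Buffer one row; flush as a bulk PUT when the batch fills up. */
    public void write(String row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) flush();
    }

    /** Send any buffered rows as one bulk PUT. */
    public void flush() {
        if (!buffer.isEmpty()) {
            sink.put(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    /** Called when the fragment finishes, so no rows are left behind. */
    public void close() { flush(); }
}
```

The CTAS path Paul mentions already routes rows through a format-plugin-provided writer, so an INSERT implementation would presumably supply something of this shape from the storage plugin instead.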