Related work in Iceberg. Worth a read:
https://docs.google.com/document/d/1Pk34C3diOfVCRc-sfxfhXZfzvxwum1Odo-6Jj9mwK38/edit#


On Tue, May 28, 2019 at 2:17 PM Aman Sinha <[email protected]> wrote:

> The description I sent is for the planner, but there's of course a run-time
> component, which would consist of a 'RecordWriter' for the underlying DB.
> In the case of MapR-DB, this RecordWriter would simply call the underlying
> PUT or bulk PUT API. In addition, we need to figure out the tablet/region
> affinity. For single-row inserts, doing a remote write from the foreman
> node may be okay, but for INSERT-SELECT type operations, where the SELECT
> side is producing millions of rows and has already been parallelized,
> these rows need to be inserted through a parallel bulk insert. So we would
> want to range-partition the rows based on the tablet rowid ranges, such
> that rows belonging to the same tablet are grouped together and two minor
> fragments in Drill don't try to write to the same tablet.
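[Editor's note: the range-partitioning idea described above could be sketched roughly as follows. This is an illustration only, not MapR-DB or Drill code; the class name, the string row keys, and the tablet boundary values are all hypothetical.]

```java
import java.util.Arrays;

// Sketch: route each row key to the tablet whose key range contains it,
// so parallel minor fragments can group rows by destination tablet.
public class TabletRangePartitioner {
    private final String[] tabletStartKeys; // sorted start key of each tablet

    public TabletRangePartitioner(String[] tabletStartKeys) {
        this.tabletStartKeys = tabletStartKeys.clone();
        Arrays.sort(this.tabletStartKeys);
    }

    /** Returns the index of the tablet whose key range contains rowKey. */
    public int partitionFor(String rowKey) {
        int pos = Arrays.binarySearch(tabletStartKeys, rowKey);
        // An exact match lands on that tablet; otherwise the insertion
        // point minus one gives the tablet whose range covers the key.
        return pos >= 0 ? pos : Math.max(0, -pos - 2);
    }

    public static void main(String[] args) {
        // Three hypothetical tablets: ["", "g"), ["g", "p"), ["p", ∞)
        TabletRangePartitioner p =
            new TabletRangePartitioner(new String[] {"", "g", "p"});
        System.out.println(p.partitionFor("apple"));  // 0
        System.out.println(p.partitionFor("mango"));  // 1
        System.out.println(p.partitionFor("zebra"));  // 2
    }
}
```

An exchange/range-sender in the physical plan would call something like partitionFor() per row so that all rows for one tablet flow to the same writer fragment.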
>
> Aman
>
>
> On Tue, May 28, 2019 at 12:50 PM Aman Sinha <[email protected]> wrote:
>
> > Yes, Calcite already supports the INSERT/UPSERT syntax. Within Drill,
> > you would need to 'unblock' this syntax (not all of it, but whatever
> > variation we may want to support). You can take a look at
> > DrillParserImpl.java (the SqlInsert() method), which is actually a file
> > generated from JavaCC.
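[Editor's note: a toy illustration of the "unblocking" described above, not Drill's actual parser code. The real gate lives in the JavaCC-generated DrillParserImpl; the class, enum, and variant names below are hypothetical stand-ins.]

```java
// Sketch: let the DML variants we choose to support through to planning,
// and keep rejecting the rest with a clear error.
public class InsertSyntaxGate {
    enum Variant { INSERT_VALUES, INSERT_SELECT, UPSERT }

    /** Returns true if this DML variant should be allowed to plan. */
    static boolean isSupported(Variant v) {
        switch (v) {
            case INSERT_VALUES:
            case INSERT_SELECT:
                return true;   // variants unblocked first
            default:
                return false;  // e.g. UPSERT stays blocked for now
        }
    }

    public static void main(String[] args) {
        System.out.println(isSupported(Variant.INSERT_SELECT)); // true
        System.out.println(isSupported(Variant.UPSERT));        // false
    }
}
```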
> >
> > We would need to look at the Calcite logical plan that is created for
> > DML statements such as this and then determine the corresponding Drill
> > logical/physical plan. Since I haven't seen a Calcite logical plan with
> > DML operators yet, I am not completely sure, but if it follows the
> > standard logical plan, then in Drill we need the following:
> >
> >   - since this would only be supported for a specific storage/format
> > plugin, there should be an early validation check of the data source (in
> > the FROM clause) to ensure that it qualifies
> >   - a logical rule that converts the Calcite logical plan node to a
> > Drill logical plan node (for example, see *DrillProjectRule*.java)
> >   - a logical rel that represents the plan node, e.g. DrillInsertRel
> > (for example, see the existing *DrillProjectRel*)
> >   - a physical rule (e.g. see *ProjectPrule*)
> >   - a physical rel (e.g. see *ProjectPrel*)
> >   - optimizer rules for any plugin-specific pushdown are implemented
> > within the plugin and added to the list of rules for that plugin (e.g.
> > see *MapRDBFormatPlugin.getOptimizerRules()*). These are then
> > automatically picked up by Drill.
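[Editor's note: the last point, plugin-contributed optimizer rules being picked up automatically, can be sketched as the pattern below. The real types are Calcite's RelOptRule and Drill's plugin interfaces; these simplified interface and rule names are hypothetical.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: each plugin contributes its own pushdown rules, and the planner
// gathers them from every registered plugin without knowing the specifics.
public class RuleRegistry {
    interface OptimizerRule { String name(); }
    interface Plugin { List<OptimizerRule> getOptimizerRules(); }

    /** Collects every rule contributed by the registered plugins. */
    static List<String> collectRuleNames(List<Plugin> plugins) {
        List<String> names = new ArrayList<>();
        for (Plugin p : plugins) {
            for (OptimizerRule r : p.getOptimizerRules()) {
                names.add(r.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical rule name, standing in for a plugin's insert pushdown.
        OptimizerRule rule = () -> "MapRDBInsertPushdownRule";
        Plugin maprdb = () -> Arrays.asList(rule);
        System.out.println(collectRuleNames(Arrays.asList(maprdb)));
    }
}
```

The point of the pattern is that a new INSERT pushdown rule only touches the plugin that owns it; nothing in the core planner changes.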
> >
> > Aman
> >
> >
> > On Mon, May 27, 2019 at 11:12 PM Ted Dunning <[email protected]>
> > wrote:
> >
> >> Yes. CTAS should be a similar problem to unsafe inserts.
> >>
> >> We have a few people interested in the work. What is needed most is
> >> pointers to where to find out about the details.
> >>
> >> 1. How can we enable the syntax?
> >>
> >> 2. What operators are really necessary?
> >>
> >> 3. How should writers inject insert optimizer rules to allow insert or
> >> update operator pushdown?
> >>
> >>
> >>
> >> On Mon, May 27, 2019 at 9:42 PM Paul Rogers <[email protected]>
> >> wrote:
> >>
> >> > Hi Ted,
> >> >
> >> > Drill can do a CTAS today, which uses a writer provided by the format
> >> > plugin. One would think this same structure could work for an INSERT
> >> > operation, with a writer provided by the storage plugin. The devil, of
> >> > course, is always in the details. And in finding resources to do the
> >> work...
> >> >
> >> > Thanks,
> >> > - Paul
> >> >
> >> >
> >> >
> >> >     On Monday, May 27, 2019, 5:28:27 PM PDT, Ted Dunning <
> >> > [email protected]> wrote:
> >> >
> >> >  I have in mind the ability to push rows to an underlying DB without
> >> > any transactional support.
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >
>
