@Julian - thank you for the review & for confirming.

Hi Clint,
Thank you, I appreciate the response. I have responded inline with some questions, and I've also restated things in my own words to confirm that I understand.

> In the mid term, I think that some of us have been thinking that moving
> system tables into the Druid native query engine is the way to go, and have
> been working on resolving a number of hurdles that are required to make
> this happen. One of the main motivators to do this is so that we have just
> the Druid query path in the planner in the Calcite layer, and deprecating
> and eventually dropping the "bindable" path completely, described in
> https://github.com/apache/druid/issues/9896. System tables would be pushed
> into Druid Datasource implementations, and queries would be handled in the
> native engine. Gian has even made a prototype of what this might look like,
> https://github.com/apache/druid/compare/master...gianm:sql-sys-table-native
> since much of the ground work is now in place, though it takes a hard-line
> approach of completely removing bindable instead of hiding it behind a
> flag, and doesn't implement all of the system tables yet, at least last
> time I looked at it.

Looking over the changes, it seems that:

- A new VirtualDataSource is introduced, which the Druid non-SQL processing
  engine can process, and which can wrap an Iterable. This exposes a lazy
  segment & iterable using InlineDataSource.
- The SegmentsTable has been converted from a ScannableTable to a DruidTable,
  and a ScannableTableIterator is introduced to generate an iterable
  containing the rows; the new VirtualDataSource can be used to access the
  rows of this table.
- Finally, the Bindable convention is discarded from DruidPlanner and the
  planner rules.

> I think there are a couple of remaining parts to resolve that would make
> this feasible. The first is native scan queries need support for ordering
> by arbitrary columns, instead of just time, so that we can retain
> capabilities of the existing system tables.
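To make sure I'm reading the VirtualDataSource part correctly, here is a toy sketch of what I understand the wrapper to be doing (class and method names are mine, not the prototype's): a datasource-like wrapper around a lazily produced Iterable of rows, so nothing is materialized until the engine actually iterates.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Toy sketch, NOT the prototype's actual class: a wrapper around a lazily
// produced Iterable of rows, in the spirit of VirtualDataSource wrapping an
// Iterable via InlineDataSource.
final class LazyIterableSource<T> implements Iterable<T> {
    private final Supplier<Iterable<T>> rowSupplier;

    LazyIterableSource(Supplier<Iterable<T>> rowSupplier) {
        this.rowSupplier = rowSupplier;
    }

    @Override
    public Iterator<T> iterator() {
        // Rows are produced only when iteration actually starts.
        return rowSupplier.get().iterator();
    }
}
```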
It seems you want to use native queries to support the ordering; do you mean the underlying SegmentsTable here, or something in the Druid engine? Currently, the SegmentsTable and friends rely on, as you say, the bindable convention to provide the sort. If it were a DruidTable, then it seems that sorting gets pushed into PartialDruidQuery -> DruidQuery, which conceptually is able to do a sort, but as described in [1] and [2] the ordering is not supported by the underlying Druid engine [3]. This would mean that an ORDER BY / sort / limit query would not be supported on any of the migrated sys.* tables until Druid has a way to perform the sort on a ScanQuery.

[1] https://druid.apache.org/docs/latest/querying/scan-query.html#time-ordering
[2] https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java#L1075-L1078
[3] https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/scan/ScanQueryEngine.java

> This isn't actually a blocker
> for adding native system table queries, but rather a blocker for replacing
> the bindable convention by default so that there isn't a loss (or rather
> trade) of functionality. Additionally, I think there is maybe some matters
> regarding authorization of system tables when handled by the native engine
> that will need resolved, but this can be done while adding the native
> implementations.

It looks like the port of the tables from the classic ScannableTable to a DruidTable is itself straightforward. However, it seems this PR doesn't bring them across from the SQL domain to be available in any native queries. I'm not sure if this is expected, an interim step, or if I have misunderstood the goal.
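For the sake of discussion, the restriction described in [1] and [2] amounts to something like the following toy check (my own illustrative code, not DruidQuery's actual logic): a scan query can be generated only when the ORDER BY is absent or on the `__time` column, so any other sort has to be rejected or handled elsewhere.

```java
import java.util.List;

// Toy model of the restriction: native scan queries support no ordering,
// or ordering on the time column only. This is illustrative, not Druid code.
final class ScanOrderingCheck {
    static boolean canUseScanQuery(List<String> orderByColumns) {
        return orderByColumns.isEmpty()
            || (orderByColumns.size() == 1 && orderByColumns.get(0).equals("__time"));
    }
}
```

Under that model, an `ORDER BY segment_id` on a migrated sys.segments table simply has no native home today.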
> I think there are some various ideas and experiments underway of how to do
> sorting on scan queries at normal Druid datasource scale, which is sort of
> a big project, but in the short term we might be able to do something less
> ambitious that works well enough at system tables scale to allow this plan
> to fully proceed.

One possible way, which I think leads in the correct direction:

1) We have an existing rule for a LogicalTable with a DruidTable to
   DruidQueryRel, which can eventually construct a DruidQuery.
2) The VirtualDataSource, created during SQL parsing, takes an
   already-constructed Iterable; so we need to have already performed the
   filter/sort before creating the VirtualDataSource (and DruidQuery). This
   means the push-down filter logic has to happen during sql/ stage setup,
   before handoff to the processing/ engine.
3) Perhaps a new VirtualDruidTable subclassing DruidTable, with a RelOptRule
   that can identify a LogicalXxx above a VirtualDruidTable and push it down?
   Then our SegmentsTable and friends can expose the correct Iterable.

This should allow us to solve the performance concerns, and would allow us to present a correctly constructed VirtualDataSource. Sort from SQL _should_ be supported (I think), as the planner can push the sort etc. down to these nodes directly.

In this approach, the majority of the work would have to happen before the Druid engine, in sql/, and so Druid core doesn't actually need to know anything about these changes. On the other hand, whilst it keeps the pathway open, I'm not sure this does any of the actual work to make the sys.* tables available as native tables. If we are to try to make these into truly native tables, without a native sort, and remove their implementation from sql/, the DruidQuery in the planner would need to be configured to pass the ScanQuery sort to the processing engine _but only for sys.* tables_, and then the processing engine would need to know how to find these tables. (I haven't explored this.)
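To make point 2 concrete, here is a minimal sketch (hypothetical names throughout, rows and columns invented for illustration) of applying the pushed-down filter/sort/limit while building the Iterable, before anything resembling a VirtualDataSource is constructed:

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical row type for a sys.segments-style table.
final class SegmentRow {
    final String segmentId;
    final long size;
    SegmentRow(String segmentId, long size) { this.segmentId = segmentId; this.size = size; }
}

// Sketch: because the VirtualDataSource wraps an already-constructed
// Iterable, the pushed-down filter/sort/limit must be applied here, in the
// sql/ stage, while materializing the rows the wrapper will expose.
final class SegmentsIterableBuilder {
    static List<SegmentRow> build(List<SegmentRow> rows,
                                  Predicate<SegmentRow> pushedFilter,
                                  Comparator<SegmentRow> pushedSort,
                                  long pushedLimit) {
        return rows.stream()
                   .filter(pushedFilter)
                   .sorted(pushedSort)
                   .limit(pushedLimit)
                   .collect(Collectors.toList());
    }
}
```

The resulting list is what the VirtualDataSource-style wrapper would hand to the engine, which then only needs to scan it.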
As you mention, implementing native sort across multiple data sources seems like a more ambitious piece of work. As another idea, we could consider creating a bridge Bindable/EnumerableToDruid rule that would allow Druid to embed these tables, move them out of sql/ into processing/ exposed as Iterable/Enumerable, and make them available in queries if that is a goal. I'm not really sure that adds anything to the overall goals, though.

> Does this approach make sense? I don't believe Gian is actively working on
> this at the moment, so I think if you're interested in moving along this
> approach and want to start laying the groundwork I'm happy to provide
> guidance and help out.

I am interested. For my current work, I do want to keep the focus on the sys.* performance work. If there's a way to do that and lay the groundwork, or even get all the work done, then I am 100% for that. Looking at what you want to do to convert these sys.* tables to native tables, if we have a viable solution or are comfortable with my suggestions above, I'd be happy to build it out.

Thanks
Jason

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org