Jason,

> I'm new to Calcite (and Druid) so if I have some terminology
> incorrect, please point it out.

From a Calcite perspective, I can tell you that your terminology (and ideas) 
seem spot on.

I can’t say whether they make sense in Druid (or are easy to achieve).

Julian


> On May 13, 2021, at 4:21 PM, Jason Koch <jk...@netflix.com.INVALID> wrote:
> 
> Hi all,
> 
> I'm looking to implement push-down for some operations in the
> SystemSchema class, and looking for your input on the best way to
> tackle this.
> 
> With profiling, we have found some UI slowness related to large
> segment counts and task counts. Inspecting the code, it seems that
> much of the data is fully materialized before often being discarded
> [1], which makes it a good opportunity for a pushdown optimization.
> This would make for a snappier UI experience for segments, tasks,
> and so on. I believe it would also address #6827 [2].
> 
> In looking at the code I can see a couple of approaches that might be 
> sensible:
> 
> - Modify the tables to support the linq4j Queryable interface, and
> have all inputs provided in a single pass. This would be a
> (relatively) straightforward way of fixing this specific problem;
> however, I am not sure how extensible/reusable this is, or whether I
> have a correct understanding of Queryable.
> 
> - Build up a custom RelNode structure for the sys. tables, along with
> some rules, that could perform the required Logical operations in a
> single pass on the underlying structures. Some of the rules might be
> more reusable and more in line with the existing Druid query
> architecture, but this seems like a more complex solution.
> I think a starting point would be to convert these tables to
> `ProjectableFilterableTable` and then develop a RelOptRule that
> matches a LogicalSort over these tables and pushes down the required
> additional sort comparators. On scan(), the sort, projection, and
> filter details would then all be available to perform a single pass.
> 
> - Any other suggestions or pointers?
> 
> Other thoughts:
> - Are there any common query use cases in these schemas that you think
> we should target as a goal for opt rules? If I know in advance then I
> can use that to guide the work.
> - If complexity goes up a lot, especially for the second option, it
> might be beneficial to move the rules and configuration to a new
> package (org.apache.druid.sql.calcite.schema.sys?).
> - It seems that these queries are currently performed in the Bindable
> convention which would be a little slower than the Enumerable
> convention. Is there any appetite to switch? I did not identify any
> negative consequences from my reading.
> 
> I'm new to Calcite (and Druid) so if I have some terminology
> incorrect, please point it out.
> 
> [1] 
> https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java#L292-L373
> - For example, "select * from sys.segments order by date desc limit 25"
> requires full materialization of all fields of all objects
> (.toString()) in order to sort correctly, after which we pick the first
> 25 rows and most of the data is no longer needed. Ideally we could
> sort on the underlying timestamp and materialize results only
> for the first 25 discovered rows.
> [2] https://github.com/apache/druid/issues/6827
> 
> Thanks
> Jason
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> For additional commands, e-mail: dev-h...@druid.apache.org
> 
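
The "order by ... limit 25" case in [1] can be illustrated with a small, self-contained sketch. It is not Druid or Calcite code: TopNSketch, Segment, and topN are hypothetical names invented for this example, and the segment fields are assumptions. It only shows the general idea of sorting on a cheap underlying key with a bounded heap and running the expensive row materialization for the surviving rows alone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Function;

final class TopNSketch {

    /** Hypothetical stand-in for a segment; only the sort key is read while scanning. */
    static final class Segment {
        final String id;
        final long endMillis;

        Segment(String id, long endMillis) {
            this.id = id;
            this.endMillis = endMillis;
        }
    }

    /**
     * Returns the first `limit` items of `source` under `order`, calling
     * `materialize` only for those items rather than for every input.
     */
    static <T, R> List<R> topN(Iterable<T> source, Comparator<T> order,
                               int limit, Function<T, R> materialize) {
        if (limit <= 0) {
            return new ArrayList<>();
        }
        // The heap head is the kept item that sorts last under `order`,
        // so it is the one to evict when a better candidate arrives.
        PriorityQueue<T> heap = new PriorityQueue<>(limit, order.reversed());
        for (T item : source) {
            if (heap.size() < limit) {
                heap.add(item);
            } else if (order.compare(item, heap.peek()) < 0) {
                heap.poll();
                heap.add(item);
            }
        }
        // At most `limit` items remain; sort them and materialize each once.
        List<T> kept = new ArrayList<>(heap);
        kept.sort(order);
        List<R> rows = new ArrayList<>(kept.size());
        for (T item : kept) {
            rows.add(materialize.apply(item));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            segments.add(new Segment("seg-" + i, i * 1000L));
        }
        Comparator<Segment> oldestFirst = Comparator.comparingLong(s -> s.endMillis);
        Comparator<Segment> newestFirst = oldestFirst.reversed();

        int[] materialized = {0};
        List<String> rows = topN(segments, newestFirst, 25, s -> {
            materialized[0]++; // counts how often the expensive step runs
            return s.id + "@" + s.endMillis;
        });
        System.out.println(rows.get(0));      // prints "seg-999@999000"
        System.out.println(materialized[0]);  // prints "25"
    }
}
```

With 1000 input segments, the materializing function runs 25 times instead of 1000, and memory use is bounded by the limit rather than by the segment count. Whether the sort key and limit can actually reach the scan depends on the planner rules discussed above.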

