GitHub user mengw15 edited a comment on the discussion: Design: interactive 
grid for the operator result pane

Thanks for putting this together - a few comments / questions:

1. Sort can't be pushed down (Iceberg has no order-by), and the residual 
filters (contains / endsWith / row-search) can't be pruned by file stats 
either. Even the ops that can be pushed down (=, <, >, in, startsWith) only
skip files when the data is clustered by that column. Operator results are
written in arrival order, so per-file min/max ranges overlap and pruning is
usually weak. On a large output this can mean scanning most of the table. Do we have a sense of
the actual latency and compute cost there — how long does a user wait for a 
sort or row-search to come back? And if a user accidentally sorts/filters a 
huge result, is there a way to cancel an in-flight query (or a timeout) so the 
panel doesn't hang?

2. View vs. dataflow semantics. We're a workflow system, so I'm assuming the 
filter/sort here only changes what's shown in the panel — the data passed to 
the downstream operator is still the full, unfiltered output. If so, could this 
mislead users into thinking they've filtered the actual data? 

3. Persistence of the query state. Are the filter/sort (and their results) 
persisted? If a user sets a filter, switches away, and re-opens the operator, 
do they get the filtered view back, or does it re-scan and re-filter from 
scratch? 

4. Overlap with the Filter / Selection operators. We already have Filter and 
Selection operators. For a dataflow system, the more intuitive way to persist a
filter is an operator: its output flows downstream (which also addresses the
second point) and stays semantically consistent with the rest of the system.
If the operator-result cache (cc @Xiao-zhen-Liu) is enabled, the cost should
also be comparable, since the upstream is cached. Curious how you're drawing
that line.
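
To make the pruning concern in point 1 concrete, here is a toy sketch of file-level min/max pruning. It is not Iceberg's planner and makes no claim about real latency; `FileStats`, `write_files`, and `files_to_scan` are hypothetical names, and the row counts and value range are arbitrary assumptions. It only compares how many files an equality predicate can skip when rows land in arrival order versus when they are clustered by the filtered column.

```python
import random
from dataclasses import dataclass


@dataclass
class FileStats:
    lo: int  # smallest column value stored in this (hypothetical) data file
    hi: int  # largest column value stored in this (hypothetical) data file


def write_files(values, rows_per_file):
    """Split values into fixed-size 'files' and record per-file min/max."""
    files = []
    for i in range(0, len(values), rows_per_file):
        chunk = values[i:i + rows_per_file]
        files.append(FileStats(min(chunk), max(chunk)))
    return files


def files_to_scan(files, target):
    """Count files whose [lo, hi] range could contain `column == target`."""
    return sum(1 for f in files if f.lo <= target <= f.hi)


random.seed(0)  # fixed seed so the synthetic data is reproducible
values = [random.randrange(1_000_000) for _ in range(100_000)]

arrival = write_files(values, 1_000)            # rows written as they arrive
clustered = write_files(sorted(values), 1_000)  # rows clustered by the column

target = 500_000
arrival_hits = files_to_scan(arrival, target)
clustered_hits = files_to_scan(clustered, target)

print(f"arrival order: scan {arrival_hits} of {len(arrival)} files")
print(f"clustered:     scan {clustered_hits} of {len(clustered)} files")
```

In this synthetic setup every arrival-order file spans nearly the whole value range, so none can be skipped, while the clustered layout touches at most a couple of files. Real operator output may be less uniformly random than this, so the gap could be smaller in practice.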

GitHub link: 
https://github.com/apache/texera/discussions/5395#discussioncomment-17287802
