xudong963 commented on issue #15177:
URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2718074072

   ClickHouse has a similar idea called `PREWHERE`:
https://clickhouse.com/docs/sql-reference/statements/select/prewhere
   
   Although `PREWHERE` targets filtering rather than top-k, the idea is the same. For example:
   
   Table `t` has 100 columns, one of which is `a`. For the query
`select * from t where a = 1`, it takes the following steps:
   1. Read only the data in column `a`.
   2. Apply the `a = 1` filter to find the matching rows.
   3. Read the remaining 99 columns only for those matching rows.
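
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table is a hypothetical in-memory dict of column lists, and `prewhere_scan` is a made-up helper name, not a ClickHouse or DataFusion API. A real engine would read column chunks from storage rather than index into lists.

```python
# Hypothetical in-memory columnar table: column name -> list of values.
def prewhere_scan(columns, filter_col, predicate):
    # Step 1: read only the filter column.
    filter_values = columns[filter_col]
    # Step 2: evaluate the predicate and record positions of matching rows.
    matching = [i for i, v in enumerate(filter_values) if predicate(v)]
    # Step 3: fetch every column, but only at the matching positions.
    return [
        {name: values[i] for name, values in columns.items()}
        for i in matching
    ]


table = {
    "a": [1, 5, 1, 7],
    "b": ["w", "x", "y", "z"],
    "c": [10.0, 20.0, 30.0, 40.0],
}
rows = prewhere_scan(table, "a", lambda v: v == 1)
```

The saving comes from step 3: the wide columns are touched only for rows that survive the filter, so the more selective the predicate, the less data is read.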
   
   
   Back to top-k, for `select * from t order by a limit 10`:
   1. Read only the data in column `a`.
   2. Sort to find the `row_id` of the top 10 rows.
   3. Use the row identifiers from step 2 to selectively read only the other
columns of these 10 rows.
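
As a rough sketch of the top-k variant, assuming the same kind of hypothetical in-memory columnar table, where the list position stands in for `row_id`. The helper name `topk_late_materialize` is invented for illustration and is not part of DataFusion.

```python
import heapq


def topk_late_materialize(columns, sort_col, k):
    # Step 1: read only the sort column.
    sort_values = columns[sort_col]
    # Step 2: find the row ids (list positions here) of the k smallest values.
    top_ids = heapq.nsmallest(
        k, range(len(sort_values)), key=lambda i: sort_values[i]
    )
    # Step 3: read the other columns only for those k rows.
    return [
        {name: values[i] for name, values in columns.items()}
        for i in top_ids
    ]


table = {
    "a": [4, 1, 3, 2, 5],
    "b": ["p", "q", "r", "s", "t"],
}
top2 = topk_late_materialize(table, "a", 2)
```

Only `k` rows of the wide columns are ever materialized, regardless of how many rows the table holds.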
   
   The idea can be expressed as the following query:
   ```sql
   -- Assumes t exposes a row_id column that identifies each row.
   WITH ids AS (SELECT row_id, a FROM t ORDER BY a LIMIT 10)
   SELECT t.* FROM t WHERE t.row_id IN (SELECT row_id FROM ids) ORDER BY t.a
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

