xudong963 commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2718074072
There is a similar idea called `prewhere`: https://clickhouse.com/docs/sql-reference/statements/select/prewhere. Even though it targets filtering, the idea is similar. For example, table `t` has 100 columns, one of which is `a`. For the query `select * from t where a = 1`, it does the following:

1. First, read only the data in column `a`.
2. Apply the `a = 1` filter to identify the matching rows.
3. Read the remaining 99 columns only for those matching rows.

Back to topk, for `select * from t order by a limit 10`:

1. First, read only the data in column `a`.
2. Perform a sort to find the `row_id` of the top 10 rows.
3. Using the row identifiers from step 2, selectively read the other columns for only these 10 rows.

We can express the idea as a rewritten query:

```sql
-- Steps 1 and 2: read only column a and pick the row ids of the top 10 rows
WITH ids AS (SELECT row_id, a FROM t ORDER BY a LIMIT 10)
-- Step 3: fetch the remaining columns only for those rows
SELECT t.* FROM t
WHERE t.row_id IN (SELECT row_id FROM ids)
ORDER BY t.a
```
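To make the three read steps concrete, here is a minimal Python sketch of both patterns. It is only an illustration under stated assumptions: a dict of lists stands in for columnar storage, the list index plays the role of `row_id`, and the function names `prewhere_scan` and `top_k_late_materialization` are hypothetical. It is not how ClickHouse or DataFusion implement this.

```python
import heapq


def prewhere_scan(columns, filter_col, predicate):
    """Filter case: `columns` maps column name -> list of values, one per row."""
    # Step 1: read only the filter column.
    keys = columns[filter_col]
    # Step 2: apply the predicate to collect the matching row ids.
    row_ids = [i for i, value in enumerate(keys) if predicate(value)]
    # Step 3: read the remaining columns only for the matching rows.
    return [{name: values[i] for name, values in columns.items()} for i in row_ids]


def top_k_late_materialization(columns, sort_col, k):
    """Top-k case: same layout; returns the k rows with the smallest sort key."""
    # Step 1: read only the sort column.
    keys = columns[sort_col]
    # Step 2: find the row ids of the k smallest keys (ties broken by row id).
    row_ids = heapq.nsmallest(k, range(len(keys)), key=lambda i: (keys[i], i))
    # Step 3: read the remaining columns only for those k rows.
    return [{name: values[i] for name, values in columns.items()} for i in row_ids]


# Hypothetical three-column table standing in for the 100-column table `t`.
table = {
    "a": [5, 1, 4, 2, 3],
    "b": ["v", "w", "x", "y", "z"],
    "c": [50, 10, 40, 20, 30],
}

filtered = prewhere_scan(table, "a", lambda value: value == 1)
top2 = top_k_late_materialization(table, "a", 2)
```

In both functions only one column is scanned in full; the other columns are touched just for the selected row ids, which is where the savings would come from on a wide table.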