zhuqi-lucas commented on PR #7454:
URL: https://github.com/apache/arrow-rs/pull/7454#issuecomment-2841893849

   > > It helps with part of the original regression, which came from read-record/skip-record selections being too dense.
   > > Here is the result for the page cache approach without this PR: [#7363 (comment)](https://github.com/apache/arrow-rs/issues/7363#issuecomment-2769850463)
   > > The regression was in Q24 to Q28 and Q30 to Q31.
   > > With the current PR, Q30/Q31 no longer regress:
   > > ```text
   > > │ QQuery 30    │  420.68ms │                       431.94ms │     no change │
   > > │ QQuery 31    │  571.58ms │                       528.87ms │ +1.08x faster │
   > > ```
   > 
   > https://github.com/apache/datafusion/blob/7b370e26fea75fcd17121272eec1bd9447b2cb8f/benchmarks/queries/clickbench/queries.sql#L31-L32
   > 
   > These queries have a predicate like
   > 
   > ```sql
   > WHERE "SearchPhrase" <> ''
   > ```
   > 
   > But `SearchPhrase` is used only for filtering (i.e., it is not in the projection).
   > 
   > For example
   > 
   > ```sql
   > SELECT "SearchEngineID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchEngineID", "ClientIP" ORDER BY c DESC LIMIT 10;
   > ```
   > 
   > > But Q24 to Q28 still regress, the same as in the original result:
   > > ```text
   > > │ QQuery 24    │  273.40ms │                       386.72ms │  1.41x slower │
   > > │ QQuery 25    │  274.14ms │                       370.83ms │  1.35x slower │
   > > │ QQuery 26    │  320.12ms │                       435.73ms │  1.36x slower │
   > > │ QQuery 27    │  900.06ms │                      1354.63ms │  1.51x slower │
   > > │ QQuery 28    │ 7812.82ms │                      9813.62ms │  1.26x slower │
   > > ```
   > 
   > https://github.com/apache/datafusion/blob/7b370e26fea75fcd17121272eec1bd9447b2cb8f/benchmarks/queries/clickbench/queries.sql#L25-L29
   > 
   > These queries have the same predicate
   > 
   > ```sql
   > WHERE "SearchPhrase" <> ''
   > ```
   > 
   > But in this case `SearchPhrase` is also used in the rest of the query (and thus in the projection).
   > 
   > For example
   > 
   > ```sql
   > SELECT "SearchPhrase" FROM hits WHERE "SearchPhrase" <> '' ORDER BY "EventTime" LIMIT 10;
   > ```
   
   Thank you @alamb, good finding! So in theory we can combine the unified select (this PR) with the page cache, which should give us the best performance so far. I will try to do a PoC.
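A minimal sketch of why the two query groups may behave differently, assuming the page cache only pays off for columns that appear both in the predicate and in the output projection (so their pages would otherwise be decoded once for filtering and again for output). The helper `columns_to_cache` and the column indices are hypothetical and are not part of the arrow-rs API.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: returns the columns that are both filtered on and projected.
/// Under the assumption above, only these columns would benefit from caching
/// decoded pages between predicate evaluation and output decoding.
fn columns_to_cache(predicate_cols: &[usize], projection_cols: &[usize]) -> BTreeSet<usize> {
    let predicate: BTreeSet<usize> = predicate_cols.iter().copied().collect();
    projection_cols
        .iter()
        .copied()
        .filter(|c| predicate.contains(c))
        .collect()
}

fn main() {
    // Hypothetical indices: 0 = SearchPhrase, 1 = EventTime, 2 = SearchEngineID, 3 = ClientIP.
    // First example query: SearchPhrase is only filtered, so nothing needs caching.
    let filter_only = columns_to_cache(&[0], &[2, 3]);
    assert!(filter_only.is_empty());

    // Second example query: SearchPhrase is filtered and projected, so it is a cache candidate.
    let filter_and_project = columns_to_cache(&[0], &[0, 1]);
    assert_eq!(filter_and_project.into_iter().collect::<Vec<_>>(), vec![0]);
}
```

This matches the pattern in the numbers above: the filter-only queries no longer regress with unified select alone, while the queries that also project `SearchPhrase` are the ones a cache could help.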

