alamb commented on issue #11567:
URL: https://github.com/apache/datafusion/issues/11567#issuecomment-2258049992

   I looked at some short queries and found one potential improvement: https://github.com/apache/datafusion/issues/11719
   
   I also looked at Q38:
   ```sql
   SELECT "URL", COUNT(*) AS PageViews
   FROM hits
   WHERE "CounterID" = 62
     AND "EventDate"::INT::DATE >= '2013-07-01' AND "EventDate"::INT::DATE <= '2013-07-31'
     AND "IsRefresh" = 0 AND "IsLink" <> 0 AND "IsDownload" = 0
   GROUP BY "URL"
   ORDER BY PageViews DESC
   LIMIT 10 OFFSET 1000;
   ```
   
   ```shell
   $ cargo run --release --bin dfbench -- clickbench --iterations 100 --path benchmarks/data/hits_partitioned --query 38
   ```
   
   More than 50% of the time is spent doing Snappy decoding (which we aren't likely to be able to improve)
   
   <img width="1728" alt="Screenshot 2024-07-30 at 6 40 44 AM" src="https://github.com/user-attachments/assets/a1e53db1-6e67-4014-b4f5-77308a581c76">
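   
   To double-check which codec each column actually uses, something like the following could be run in `datafusion-cli`. This is only a sketch: it assumes the `parquet_metadata` table function is available in your build, and the file name `hits_0.parquet` is a guess at how the partitioned files are named, so adjust the path as needed.
   
   ```sql
   -- Sketch: summarize codec and sizes per column for one partition file.
   -- The file name below is an assumption, not taken from the comment above.
   SELECT path_in_schema,
          compression,
          SUM(total_compressed_size)   AS compressed_bytes,
          SUM(total_uncompressed_size) AS uncompressed_bytes
   FROM parquet_metadata('benchmarks/data/hits_partitioned/hits_0.parquet')
   GROUP BY path_in_schema, compression
   ORDER BY compressed_bytes DESC;
   ```
   
   Columns with the largest compressed size are the ones where decompression cost would be expected to concentrate.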
   
   - 12% of the time is spent reading string data from Parquet (maybe `StringView` will help)
   - 10% of the time is spent decoding Parquet metadata
   
   <img width="1728" alt="Screenshot 2024-07-30 at 6 44 17 AM" src="https://github.com/user-attachments/assets/c95aac55-0cac-4be7-a981-a3e3ce8c79ac">
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

