shehabgamin commented on PR #17022:
URL: https://github.com/apache/datafusion/pull/17022#issuecomment-3148410589

   I'm not seeing any benefit when testing with S3 data. cc @alamb
   
   > 
   > Tested on Derived TPC-H (100 GB) querying S3 from EC2.
   > 
   > EC2 Tab 1:
   > 
   > ```
   > env RUSTFLAGS="-C target-cpu=native" cargo build -r -p sail-cli --bins --target-dir target/parquet-metadata-cache
   > 
   > env \
   > RUST_LOG=info \
   > SAIL_PARQUET__FILE_METADATA_CACHE=true \
   > target/parquet-metadata-cache/release/sail spark server
   > ```
   > 
   > EC2 Tab 2:
   > 
   > ```
   > python python/pysail/examples/spark/tpch.py \
   > --data-path s3://BUCKET-PATH-HERE \
   > --query-path python/pysail/data/tpch/queries \
   > --query-all \
   > --num-runs 3
   > 
   > Run 1 Total time for all queries: 180.1174988746643 seconds.
   > 
   > Run 2 Total time for all queries: 184.3733410835266 seconds.
   > 
   > Run 3 Total time for all queries: 176.67709589004517 seconds.
   > ```
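
For context on how much run-to-run noise these numbers carry, here is a minimal sketch that summarizes the three totals reported above. It only uses the figures from this comment; no baseline (cache disabled) numbers were posted, so none are assumed here. The variable names are illustrative and not part of any tool.

```python
from statistics import mean, stdev

# Total times in seconds, copied from the three runs reported above.
run_totals = [180.1174988746643, 184.3733410835266, 176.67709589004517]

avg = mean(run_totals)                       # average total time per run
spread = max(run_totals) - min(run_totals)   # slowest run minus fastest run
rel_spread = spread / avg                    # spread as a fraction of the mean

print(f"mean: {avg:.2f} s")
print(f"range: {spread:.2f} s ({rel_spread:.1%} of mean)")
print(f"sample stdev: {stdev(run_totals):.2f} s")
```

The spread between the fastest and slowest run is roughly 4% of the mean, so with only three runs, any difference smaller than that would be hard to tell apart from noise.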
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

