[GitHub] [arrow] lidavidm commented on pull request #11535: ARROW-14429: [C++] Speed up IPC file reader on high-latency filesystems

GitBox Mon, 25 Oct 2021 08:33:55 -0700


lidavidm commented on pull request #11535:
URL: https://github.com/apache/arrow/pull/11535#issuecomment-951049087



   I tested this with MinIO behind toxiproxy, configured with `toxiproxy-cli-linux-amd64 toxic add -n latency -t latency --attribute latency=100 s3` (100 ms of added latency). This is rather unrealistic: it is far more latency than you should expect from S3 unless you're doing a cross-region read. It does, however, highlight the cost of I/O in this case.
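As a rough back-of-the-envelope model (an assumption for illustration, not something measured in this PR), suppose every request pays a fixed added latency, and transfer and CPU time are negligible. Wall-clock time is then dominated by the number of round trips that must happen one after another. The sketch below uses hypothetical helper names and relies on the fact that toxiproxy's `latency` attribute is in milliseconds:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model: each serialized request waits `latency_ms` before
// its data arrives. Transfer time and CPU time are ignored, and requests that
// are issued concurrently (overlapping in time) are not counted separately.
double EstimateSeconds(int64_t sequential_requests, double latency_ms) {
  assert(sequential_requests >= 0 && latency_ms >= 0.0);
  return static_cast<double>(sequential_requests) * latency_ms / 1000.0;
}

// Inverse of the model: how many serialized round trips a measured time
// would correspond to if latency were the only cost.
double ImpliedRoundTrips(double measured_seconds, double latency_ms) {
  assert(latency_ms > 0.0);
  return measured_seconds * 1000.0 / latency_ms;
}
```

For example, `ImpliedRoundTrips(5.54072, 100.0)` is about 55.4. Under this crude model, the baseline iterator median reported below behaves as if it serialized roughly 55 round trips. Because the model ignores every other cost, treat this only as an upper bound on the number of latency-bound requests.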
   
   Median times are given below. Three methods are compared: iterating through all record batches with the plain iterator, iterating through all batches with the generator (which also uses read coalescing), and reading the data as a table via Datasets (async scanner).
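The read coalescing used by the generator can be illustrated with a simplified sketch. This is not Arrow's actual implementation. It sorts byte ranges, then merges neighbours whose gap is at most `hole_size_limit`, as long as the merged range stays within `range_size_limit`, so several small reads become fewer, larger reads (fewer round trips). The parameter names mirror the fields of Arrow's `io::CacheOptions`, but `ReadRange` and `CoalesceRanges` here are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A byte range within a file: [offset, offset + length).
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Simplified coalescing: merge ranges separated by small holes (or overlapping)
// as long as the merged range does not exceed range_size_limit bytes.
std::vector<ReadRange> CoalesceRanges(std::vector<ReadRange> ranges,
                                      int64_t hole_size_limit,
                                      int64_t range_size_limit) {
  assert(hole_size_limit >= 0 && range_size_limit > 0);
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) { return a.offset < b.offset; });
  std::vector<ReadRange> out;
  for (const ReadRange& r : ranges) {
    if (!out.empty()) {
      ReadRange& last = out.back();
      const int64_t last_end = last.offset + last.length;
      const int64_t gap = r.offset - last_end;  // negative if ranges overlap
      const int64_t merged_end = std::max(last_end, r.offset + r.length);
      if (gap <= hole_size_limit && merged_end - last.offset <= range_size_limit) {
        last.length = merged_end - last.offset;  // extend the previous read
        continue;
      }
    }
    out.push_back(r);  // start a new read
  }
  return out;
}
```

Two reads of 100 and 50 bytes separated by a 20-byte hole become a single 170-byte read when `hole_size_limit` is 64. A distant range stays separate. The trade-off is reading a few unneeded bytes in exchange for one less round trip, which pays off when latency dominates.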
   
   ```
   Baseline:
   Iterator: 5.54072s
   Generator: 0.560195s
   Datasets: 1.39329s
   
   With the IPC message optimization:
   Iterator: 2.95526s
   Generator: 0.561748s
   Datasets: 1.39662s
   
   With the IPC message optimization and the footer optimization:
   Iterator: 2.84875s
   Generator: 0.456949s
   Datasets: 1.08955s
   ```
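To put these medians in relative terms, the small helper below computes the speedup (baseline time divided by optimized time) from the numbers above. The struct and constant names are hypothetical, and the ratios are only as precise as the reported medians:

```cpp
#include <cassert>

// Median wall-clock times in seconds, copied from the results above.
struct Timings {
  double iterator;
  double generator;
  double datasets;
};

const Timings kBaseline{5.54072, 0.560195, 1.39329};
const Timings kMessageOpt{2.95526, 0.561748, 1.39662};
const Timings kMessageAndFooterOpt{2.84875, 0.456949, 1.08955};

// Speedup > 1 means the optimized run was faster than the baseline.
double Speedup(double baseline_s, double optimized_s) {
  assert(optimized_s > 0.0);
  return baseline_s / optimized_s;
}
```

With both optimizations, this gives roughly 1.94x for the iterator, 1.23x for the generator, and 1.28x for Datasets. The IPC message optimization alone roughly halves the iterator time (about 1.87x) but leaves the generator and Datasets medians essentially unchanged.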


