MisterRaindrop commented on PR #61: URL: https://github.com/apache/cloudberry-pxf/pull/61#issuecomment-3888340431
All deployments are local.

Sizes: 100MB, 1GB, 10GB
Workers: 4

| Size  | Rows       | Workers | COUNT seq (ms) | COUNT par (ms) | COUNT speedup | SUM seq (ms) | SUM par (ms) | SUM speedup |
|-------|------------|---------|----------------|----------------|---------------|--------------|--------------|-------------|
| 100MB | 487,700    | 4       | 282            | 311            | 0.91x         | 290          | 188          | 1.54x       |
| 1GB   | 4,994,140  | 4       | 2352           | 1514           | 1.55x         | 2448         | 1314         | 1.86x       |
| 10GB  | 49,941,480 | 4       | 21524          | 11589          | 1.86x         | 21954        | 11547        | 1.90x       |

The good news from this exploration is that parallelization does improve query time. For small data volumes the gain is not obvious, and the parallel run can even be slower than the sequential one (COUNT on 100MB: 0.91x). The improvement only becomes noticeable at larger data volumes.

However, the current improvement still falls short of the expected level. In theory, the speedup should be close to the number of workers (4), but the best result here is 1.90x. The gap may be caused by I/O or CPU bottlenecks. I will investigate this further.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
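As a rough way to quantify the gap between the measured and the ideal speedup in the benchmark table, the sketch below recomputes the COUNT speedups and the per-worker efficiency from the reported timings. It also estimates the serial fraction that Amdahl's law would imply. Amdahl's law is an assumption brought in here for illustration, not an analysis from the PR, and the names `Run`, `speedup`, `efficiency`, and `implied_serial_fraction` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark row: sequential vs. parallel wall time in milliseconds."""
    size: str
    workers: int
    seq_ms: float
    par_ms: float


def speedup(run: Run) -> float:
    """Sequential time divided by parallel time."""
    return run.seq_ms / run.par_ms


def efficiency(run: Run) -> float:
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup(run) / run.workers


def implied_serial_fraction(run: Run) -> float:
    """Serial fraction f implied by Amdahl's law, S = 1 / (f + (1 - f) / N).

    Assumption: the workload fits Amdahl's model. A result above 1 means
    parallel overhead dominates and the model does not apply to that run.
    """
    n = run.workers
    if n < 2:
        raise ValueError("need at least 2 workers to estimate a serial fraction")
    return (1 / speedup(run) - 1 / n) / (1 - 1 / n)


# COUNT timings copied from the benchmark table.
COUNT_RUNS = [
    Run("100MB", 4, 282, 311),
    Run("1GB", 4, 2352, 1514),
    Run("10GB", 4, 21524, 11589),
]

if __name__ == "__main__":
    for run in COUNT_RUNS:
        print(
            f"{run.size:>6}: speedup={speedup(run):.2f}x "
            f"efficiency={efficiency(run):.0%} "
            f"implied serial fraction={implied_serial_fraction(run):.2f}"
        )
```

Under this model, the 10GB COUNT run behaves as if roughly 38% of the work were not parallelized, which is one way to frame where to look for the I/O or CPU bottleneck. The 100MB run yields a value above 1, which only indicates that fixed parallel overhead outweighs the benefit at that size.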
