MisterRaindrop commented on PR #61:
URL: https://github.com/apache/cloudberry-pxf/pull/61#issuecomment-3888340431

   All deployments are local.
   Sizes:     100MB, 1GB, 10GB
   Workers:     4
   
   Size  |       Rows | Workers | COUNT seq (ms) | COUNT par (ms) | COUNT speedup | SUM seq (ms) | SUM par (ms) | SUM speedup
   ------|------------|---------|----------------|----------------|---------------|--------------|--------------|------------
   100MB |    487,700 |       4 |            282 |            311 |         0.91x |          290 |          188 |       1.54x
   1GB   |  4,994,140 |       4 |           2352 |           1514 |         1.55x |         2448 |         1314 |       1.86x
   10GB  | 49,941,480 |       4 |          21524 |          11589 |         1.86x |        21954 |        11547 |       1.90x
   
   
   The good news is that parallelization does improve query time. On small 
inputs the gain is marginal and can even be negative: at 100MB the parallel 
COUNT is slower than the sequential one (0.91x). The benefit only becomes 
clear at larger sizes, reaching 1.86x to 1.90x at 10GB.
   
   However, the speedup still falls short of what I expected. In the ideal 
case it would be close to the number of workers (4x here), but the best 
result so far is about 1.9x. The gap may be caused by an I/O or CPU 
bottleneck. I will investigate this further.
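
   To put a number on that gap, here is a rough sketch that recomputes the 
speedup and per-worker efficiency from the SUM timings in the table above, 
then inverts Amdahl's law to estimate what fraction of the work actually runs 
in parallel. Amdahl's law is only an assumed model here, not something that 
was measured, and the function names are made up for illustration.

```python
# Timings in ms copied from the benchmark table: (sequential, parallel).
WORKERS = 4
SUM_TIMINGS = {
    "100MB": (290, 188),
    "1GB": (2448, 1314),
    "10GB": (21954, 11547),
}


def speedup(seq_ms, par_ms):
    """Sequential time divided by parallel time."""
    return seq_ms / par_ms


def efficiency(seq_ms, par_ms, workers):
    """Speedup per worker; 1.0 would be perfect linear scaling."""
    return speedup(seq_ms, par_ms) / workers


def amdahl_parallel_fraction(s, workers):
    """Solve Amdahl's law S = 1 / ((1 - p) + p / N) for p.

    Assumption: the run behaves like a fixed serial part plus a part
    that scales perfectly with the number of workers.
    """
    return (1 - 1 / s) / (1 - 1 / workers)


for size, (seq_ms, par_ms) in SUM_TIMINGS.items():
    s = speedup(seq_ms, par_ms)
    e = efficiency(seq_ms, par_ms, WORKERS)
    p = amdahl_parallel_fraction(s, WORKERS)
    print(f"{size:>5}: speedup {s:.2f}x, efficiency {e:.0%}, "
          f"estimated parallel fraction {p:.2f}")
```

   For the 10GB SUM run this gives an efficiency a little under 50% and an 
estimated parallel fraction of roughly 0.63. If that model holds, adding more 
workers would cap out at about 2.7x, so finding the serial part matters more 
than raising the worker count.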
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

