louislepage commented on PR #1946:
URL: https://github.com/apache/systemds/pull/1946#issuecomment-1868308483

   I was able to run it on 5,000 to 50,000 samples and, as expected, it scaled 
linearly.
   However, this also showed that the Python implementation has a large constant 
overhead for small sample sizes but scales at virtually the same rate.
   
   
![image](https://github.com/apache/systemds/assets/66960491/baa8b3d3-e723-44fe-8fd0-ff1177eeff97)
   
   
   And rbind() is still the slowest single operation for very large 
sample sizes, but I was unable to directly write to rows of a pre-allocated 
matrix, which could be beneficial in this case.
   
   ```
   Heavy hitter instructions:
     #  Instruction                    Time(s)  Count
     1  shapley_sampling                56.988      1
     2  shapley_sampling_prepare        29.756    107
     3  rbind                           22.870      2
     4  !                               10.097    107
     5  append                           6.878    219
     6  rand                             4.483    219
     7  leftIndex                        3.574    321
     ```
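
   For context, the pattern I was aiming for looks roughly like the DML sketch 
below (`n_samples`, `n_features`, and `compute_row` are placeholder names, not 
part of the actual script). Left-indexing writes each row into an existing 
allocation, whereas rbind in a loop copies the growing result on every iteration:

   ```
   # sketch: pre-allocate the result once, then fill rows via left-indexing
   R = matrix(0, rows=n_samples, cols=n_features);
   for (i in 1:n_samples) {
     R[i,] = compute_row(i);  # placeholder for the per-sample computation
   }
   ```

   If this worked as intended, the leftIndex instruction already visible in the 
heavy hitters would absorb the cost currently attributed to rbind and append.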

