louislepage commented on PR #1946: URL: https://github.com/apache/systemds/pull/1946#issuecomment-1868308483
I was able to run it on 5,000 to 50,000 samples and, as expected, it scaled linearly. However, this also showed that the Python implementation has a large fixed overhead for small sample sizes but scales at virtually the same rate.

rbind() is still the slowest single operation for very large sample sizes, but I was unable to write directly to rows of a pre-allocated matrix, which could be beneficial in this case.

```
Heavy hitter instructions:
  #  Instruction               Time(s)  Count
  1  shapley_sampling           56.988      1
  2  shapley_sampling_prepare   29.756    107
  3  rbind                      22.870      2
  4  !                          10.097    107
  5  append                      6.878    219
  6  rand                        4.483    219
  7  leftIndex                   3.574    321
```
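
For anyone who wants to check the "fixed overhead, same rate" observation against their own timings, here is a minimal sketch in plain Python. It is not part of this PR: the function name `fit_overhead_and_rate` is made up for illustration, and the timings in the usage lines are synthetic placeholders generated from a known line, not measurements from these runs. It fits `runtime = overhead + rate * n` by ordinary least squares, so two implementations can be compared on their intercept (overhead) and slope (per-sample cost) separately.

```python
def fit_overhead_and_rate(sample_counts, runtimes):
    """Least-squares fit of runtime = overhead + rate * n.

    Returns (overhead, rate): the intercept in seconds and the
    slope in seconds per sample.
    """
    n = len(sample_counts)
    if n < 2 or n != len(runtimes):
        raise ValueError("need at least two (sample count, runtime) pairs")

    mean_x = sum(sample_counts) / n
    mean_y = sum(runtimes) / n

    # Sum of squared deviations of x, and of the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in sample_counts)
    if sxx == 0:
        raise ValueError("sample counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sample_counts, runtimes))

    rate = sxy / sxx
    overhead = mean_y - rate * mean_x
    return overhead, rate


# Synthetic placeholder timings (assumed 2 s overhead, 1 ms per sample).
# These are NOT results from this PR; replace them with measured runtimes.
counts = [5_000, 10_000, 20_000, 50_000]
synthetic_runtimes = [2.0 + 0.001 * c for c in counts]

overhead, rate = fit_overhead_and_rate(counts, synthetic_runtimes)
print(f"overhead ~ {overhead:.3f} s, rate ~ {rate:.6f} s/sample")
```

If two implementations come out with similar `rate` values but very different `overhead` values, that would match the behaviour described above: the gap matters for small sample sizes and becomes proportionally smaller as the sample count grows.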
