louislepage commented on PR #1946: URL: https://github.com/apache/systemds/pull/1946#issuecomment-1849490290
I rewrote the generic Shapley value computation to use sampling. I also added an example script and a Jupyter notebook in which I compare the results of the official SHAP package with those of my SystemDS implementation. The results look good, at least for scaled data. However, I found that the rbind() calls used to prepare the instances matrix are very slow and account for most of the runtime. I would like to revisit this and optimize it further, but I think I need some advice on how to make these appends/writes to large matrices faster.

Here is a quick plot of the computed values for the 107 features of the transformencoded adult dataset. Both implementations used the full ~32,000 samples as background data and ran 10,000 iterations for each sample.
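For comparison, here is a minimal NumPy sketch of a permutation-sampling Shapley estimator. It is not the DML code from this PR. The function name `shapley_permutation_sampling` and its parameters are hypothetical. The sketch shows one way to avoid growing the instances matrix with repeated appends. Each rbind copies everything accumulated so far, which makes the loop roughly quadratic. Instead, a batch of permutations builds all of its `m * (d + 1)` masked instances in a single broadcasted `np.where`, and the model is called once per batch. `batch_size` bounds memory: with d = 107 and 256 permutations per batch, that is about 3M values. I am assuming, without having tested it, that the same idea (allocate once, vectorize the masking) carries over to DML.

```python
import numpy as np

def shapley_permutation_sampling(model, x, background, n_perms, batch_size=256, seed=0):
    """Hypothetical sketch: Monte Carlo (permutation-sampling) Shapley values for one instance.

    model:      callable mapping an (n, d) array to n predictions
    x:          (d,) instance to explain
    background: (b, d) background rows supplying values for "absent" features
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    phi = np.zeros(d)
    steps = np.arange(d + 1)[None, :, None]  # (1, d+1, 1): coalition sizes 0..d
    done = 0
    while done < n_perms:
        m = min(batch_size, n_perms - done)
        # ranks[i, f] = position of feature f in the i-th random permutation
        ranks = np.argsort(rng.random((m, d)), axis=1)
        # one sampled background row per permutation
        z = background[rng.integers(0, background.shape[0], size=m)]  # (m, d)
        # mask[i, k, f] is True if feature f is among the first k features of permutation i
        mask = ranks[:, None, :] < steps  # (m, d+1, d)
        # build every instance of the batch in one vectorized step (no row-by-row appends)
        inst = np.where(mask, x[None, None, :], z[:, None, :])
        preds = np.asarray(model(inst.reshape(-1, d))).reshape(m, d + 1)
        # diffs[i, k]: change in prediction when the k-th feature of permutation i is switched to x
        diffs = preds[:, 1:] - preds[:, :-1]  # (m, d)
        # map each step back to its feature: contrib[i, f] = diffs[i, ranks[i, f]]
        phi += np.take_along_axis(diffs, ranks, axis=1).sum(axis=0)
        done += m
    return phi / n_perms
```

A useful sanity check: for every permutation, the contributions telescope to `f(x) - f(z)`. The estimates therefore always sum to `f(x)` minus the mean prediction on the sampled background rows. For a linear model, each feature's value reduces to `w_f * (x_f - mean(z_f))`.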
