louislepage commented on PR #1946:
URL: https://github.com/apache/systemds/pull/1946#issuecomment-1849490290

   I rewrote the generic Shapley value computation with sampling and added an 
example script, as well as a Jupyter notebook in which I compare the results 
of the official SHAP package with those of my SystemDS implementation.
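For readers unfamiliar with sampling-based Shapley values, here is a minimal sketch of the classic Monte Carlo permutation-sampling estimator. It is not the DML code from this PR. The function name `shapley_sampling`, its parameters, and the choice of one random background row per iteration are assumptions made for illustration. For each feature j, it averages the change in model output when j switches from a background value to the explained instance's value, given that a random subset of other features (those preceding j in a random order) already take the instance's values.

```python
import random
from typing import Callable, List, Sequence


def shapley_sampling(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    background: Sequence[Sequence[float]],
    iterations: int,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo permutation-sampling estimate of the Shapley values of x.

    Hypothetical helper for illustration; not the SystemDS implementation.
    """
    rng = random.Random(seed)
    d = len(x)
    phi = [0.0] * d
    for j in range(d):
        total = 0.0
        for _ in range(iterations):
            # Draw a random feature order and a random background row.
            order = list(range(d))
            rng.shuffle(order)
            z = background[rng.randrange(len(background))]
            # Features that come before j in this order take their value from x.
            preceding = set(order[: order.index(j)])
            x_minus = [x[k] if k in preceding else z[k] for k in range(d)]
            # x_plus differs from x_minus only in feature j, which now comes from x.
            x_plus = list(x_minus)
            x_plus[j] = x[j]
            total += model(x_plus) - model(x_minus)
        # Average marginal contribution of feature j over the sampled coalitions.
        phi[j] = total / iterations
    return phi


if __name__ == "__main__":
    weights = [2.0, -1.0, 0.5]

    def linear(v: Sequence[float]) -> float:
        return sum(w * vi for w, vi in zip(weights, v))

    # With a linear model and a single background row z, every sampled
    # difference is exactly w_j * (x_j - z_j), so this prints [2.0, -2.0, 1.5].
    print(shapley_sampling(linear, [1.0, 2.0, 3.0], [[0.0, 0.0, 0.0]], iterations=50))
```

The linear case is a convenient sanity check because the estimate is exact there; for nonlinear models the result only converges to the true Shapley values as `iterations` grows, which is why sample counts such as the 10000 iterations per sample below matter.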
   
   The results (at least for scaled data) look good. However, I found that the 
rbind() calls during preparation of the instances matrix are very slow and are 
the most time-consuming part of the computation. Therefore, I would like to 
revisit this for further optimization, but I need some advice on how to make 
these appends/writes to large matrices faster.
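One common reason repeated appends are slow is that each rbind() produces a new, larger matrix, so all previously appended rows are copied again on every iteration, giving quadratic total copying. The sketch below illustrates that effect with plain Python lists as an analog; it is not SystemDS code. The row layout produced by the hypothetical `masked_rows` helper (row p takes x on the first p features of a permutation and the background row elsewhere) is an assumption, not the PR's actual instances matrix, and both builders assume one permutation per background row. In DML terms, the second pattern would roughly correspond to allocating once with `matrix(0, rows=..., cols=...)` and filling row blocks via left indexing such as `X[a:b, ] = block`; whether that actually beats rbind() in SystemDS would need to be measured.

```python
from typing import List, Sequence

Row = List[float]


def masked_rows(x: Sequence[float], z: Sequence[float], order: Sequence[int]) -> List[Row]:
    """Hypothetical block for one permutation: row p takes x on the first p
    features of `order` and the background row z on all other features."""
    d = len(x)
    rows = []
    for p in range(d + 1):
        chosen = set(order[:p])
        rows.append([x[k] if k in chosen else z[k] for k in range(d)])
    return rows


def build_by_appending(x, background, orders) -> List[Row]:
    """Analog of rbind() in a loop: each step builds a new, larger list,
    so all earlier rows are copied again (quadratic total copying)."""
    result: List[Row] = []
    for z, order in zip(background, orders):
        result = result + masked_rows(x, z, order)
    return result


def build_preallocated(x, background, orders) -> List[Row]:
    """Allocate the full result once, then write each block into its slot."""
    block = len(x) + 1
    # Placeholders only; every entry is overwritten by the slice assignment below.
    result: List[Row] = [[]] * (block * len(orders))
    for i, (z, order) in enumerate(zip(background, orders)):
        start = i * block
        result[start:start + block] = masked_rows(x, z, order)
    return result


if __name__ == "__main__":
    x = [1.0, 2.0]
    background = [[0.0, 0.0], [5.0, 5.0]]
    orders = [[0, 1], [1, 0]]
    a = build_by_appending(x, background, orders)
    b = build_preallocated(x, background, orders)
    print(a == b, len(b))  # True 6
```

Both builders produce the same matrix; only the preallocated version avoids re-copying earlier rows as the result grows.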
   
   Here is a quick plot of the computed values for the 107 features of the 
adult dataset after transformencode.
   Both implementations used the full ~32,000 samples as background data and ran 
10,000 iterations per sample.
   
![image](https://github.com/apache/systemds/assets/66960491/e2117874-92c7-4230-b420-75c50d4ae5d2)
   

