janniklinde commented on PR #2376: URL: https://github.com/apache/systemds/pull/2376#issuecomment-3669261021
LGTM @mboehm7 - one possible concern I have both with the single-threaded and multi-threaded implementation is that `clen == 1` has to guarantee that `in.getDenseBlockValues()` does not throw an exception or returns only the first row. Because the single-threaded case already had that implementation, I am not sure if it is intentional or an issue. Thank you again for your contribution, bugfixes, and detailed tests and benchmarks @Biranavan-Parameswaran. It appears that `System.arraycopy` already saturates memory bandwidth (shown by your single column benchmarks). An optional improvement I could see for `DenseBlockFP64` is to always perform two large arraycopies instead of many small ones (currently this is only limited to `clen==1`). Due to the contiguous row-major layout of some dense block implementations, performing two large arraycopies should be possible. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
