ramesesz opened a new pull request, #1959:
URL: https://github.com/apache/systemds/pull/1959
This patch improves the builtin dist function by removing the outer product
operator. For 100 function calls on a random matrix with 4000 rows and 800
columns, the new dist function reduces the runtime from 66.541s to 60.268s
(about 9%). The following experiment was run with varying numbers of rows and
columns:
```
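X = rand(rows=4000, cols=800, min=-1, max=1, seed=42)

# call the function 100 times to get a measurable runtime
for (i in 1:100) {
  Y = new_distance_matrix(X)
}
print( sum(Y) )

new_distance_matrix = function(matrix[double] X)
  return (matrix[double] out)
{
  n = nrow(X)
  # squared row norms, one entry per row of X
  s = rowSums(X * X)
  # s (column vector) and t(s) (row vector) are added to the matrix directly
  out = - 2*X %*% t(X) + s + t(s)
  out = replace(target = out, pattern=NaN, replacement = 0);
}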
```
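
As a sanity check on the expression used above, the following is a minimal sketch in plain Python (not DML, and not part of the patch) that compares `-2*X %*% t(X) + s + t(s)` against directly computed squared Euclidean distances between rows. The helper names `squared_distances_expanded`, `squared_distances_direct`, and `max_abs_difference` are hypothetical and exist only for this illustration; the small 5 x 8 input is likewise an assumption chosen to keep the check fast.

```python
import random


def squared_distances_expanded(X):
    """Entry (i, j) is -2 * <x_i, x_j> + s_i + s_j, with s_i the squared norm of row i."""
    s = [sum(v * v for v in row) for row in X]
    n = len(X)
    return [
        [-2.0 * sum(a * b for a, b in zip(X[i], X[j])) + s[i] + s[j] for j in range(n)]
        for i in range(n)
    ]


def squared_distances_direct(X):
    """Entry (i, j) is the squared Euclidean distance between rows i and j."""
    n = len(X)
    return [
        [sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
        for i in range(n)
    ]


def max_abs_difference(A, B):
    """Largest absolute elementwise difference between two equally sized matrices."""
    return max(abs(a - b) for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


# Small random input in [-1, 1], mirroring the value range of the experiment.
rng = random.Random(42)
X = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(5)]

expanded = squared_distances_expanded(X)
direct = squared_distances_direct(X)

# The two results should agree up to floating-point rounding.
print(max_abs_difference(expanded, direct) < 1e-9)
```

The check relies only on the identity ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>; it says nothing about runtime, which is what the experiment above measures.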
Terminal output from the experiments, collected with the _time_ prefix and the
_-stats_ argument of the systemds CLI, is available
[here](https://glossy-flame-9af.notion.site/Distance-function-improvement-751d04bfe5c9458e9cf51a6d99c5d4c6).