Mike Dusenberry created SYSTEMML-510:
----------------------------------------
Summary: Generalize All Existing wdivmm Patterns
Key: SYSTEMML-510
URL: https://issues.apache.org/jira/browse/SYSTEMML-510
Project: SystemML
Issue Type: Sub-task
Reporter: Mike Dusenberry
If we look at the inner loop of Poisson nonnegative matrix factorization (PNMF)
in general, we update the factors as
{code}
H = (H * (t(W) %*% (V/(W%*%H + 1e-17))))/t(colSums(W)
W = (W * ((V/(W%*%H + 1e-17)) %*% t(H)))/t(rowSums(H))
{code}.
Notice the addition of the "1e-17" epsilon term in the denominators.
Mathematically, we need this in case any cell of W%*%H evaluates to zero so
that we can avoid dividing by zero. R needs this, but SystemML technically
does not due to a fused operator, "wdivmm", that takes care of these situations
(or this may be done in the general case?). This fused operator is currently
applied to the pattern {{t(W) %*% (V / %* (W %*% H))}}, amongst other similar
patterns. Ideally, this would easily apply to {{t(W) %*% (V/(W%*%H + 1e-17)}},
regardless of the unneeded epsilon term. Currently, the addition of the
epsilon term causes the algorithm to run in non-linear time (quad or
exponential). Initially, the behavior pointed towards the possibility of the
optimizer avoiding the rewrite to the fused operator, resulting in naive
computation, and non-linear growth in training time. Further exploration seems
to show that the rewrite is indeed still being applied, but there seems to also
be a recursive nesting of the same rewrite over various regions of the above
statements that is not found when the epsilon term is removed.
The following is the full PNMF DML script used:
{code}
V = read($X)
max_iteration = $maxiter
rank = $rank
n = nrow(V)
m = ncol(V)
range = 0.01
W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
loglik0 = sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H))
i=0
while(i < max_iteration) {
# Addition of epsilon (1e-17) term causes script to run in non-linear time:
H = (H * (t(W) %*% (V/(W%*%H + 1e-17))))/t(colSums(W))
W = (W * ((V/(W%*%H + 1e-17)) %*% t(H)))/t(rowSums(H))
# Removal of epsilon works correctly:
#H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
#W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
i = i + 1;
print(i + "")
}
loglik = sum(V*log(W%*%H+1e-17)) - as.scalar(colSums(W)%*%rowSums(H))
print("pnmf: " + loglik0 + " -> " + loglik)
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)