[jira] [Commented] (SYSTEMML-510) Generalized wdivmm w/ eps all patterns
    [ https://issues.apache.org/jira/browse/SYSTEMML-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177285#comment-15177285 ]

Matthias Boehm commented on SYSTEMML-510:
-----------------------------------------

excellent [~gweidner] - yes that would be very helpful.

> Generalized wdivmm w/ eps all patterns
> --------------------------------------
>
> Key: SYSTEMML-510
> URL: https://issues.apache.org/jira/browse/SYSTEMML-510
> Project: SystemML
> Issue Type: Task
> Components: Compiler, Parser, Runtime
> Reporter: Mike Dusenberry
> Original Estimate: 6h
> Remaining Estimate: 6h
>
> If we look at the inner loop of Poisson nonnegative matrix factorization
> (PNMF) in general, we update the factors as
> {code}
> H = (H * (t(W) %*% (V/(W%*%H + 1e-17))))/t(colSums(W))
> W = (W * ((V/(W%*%H + 1e-17)) %*% t(H)))/t(rowSums(H))
> {code}
> Notice the addition of the "1e-17" epsilon term in the denominators.
> Mathematically, we need this in case any cell of {code}W%*%H{code} evaluates
> to zero, so that we can avoid dividing by zero. R needs this, but SystemML
> technically does not, due to a fused operator, "wdivmm", that takes care of
> these situations (or this may be done in the general case?). This fused
> operator is currently applied to the pattern
> {code}t(W) %*% (V / (W %*% H)){code}, amongst other similar patterns.
> Ideally, it would also apply to {code}t(W) %*% (V/(W%*%H + 1e-17)){code},
> regardless of the unneeded epsilon term. Currently, the addition of the
> epsilon term causes the algorithm to run in non-linear time (quadratic or
> exponential). Initially, the behavior pointed towards the possibility of the
> optimizer avoiding the rewrite to the fused operator, resulting in naive
> computation and non-linear growth in training time. Further exploration
> seems to show that the rewrite is indeed still being applied, but there also
> seems to be a recursive nesting of the same rewrite over various regions of
> the above statements that is not found when the epsilon term is removed.
>
> The following is the full PNMF DML script used:
> {code}
> V = read($X)
> max_iteration = $maxiter
> rank = $rank
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> loglik0 = sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H))
> i=0
> while(i < max_iteration) {
>   # Addition of epsilon (1e-17) term causes script to run in non-linear time:
>   H = (H * (t(W) %*% (V/(W%*%H + 1e-17))))/t(colSums(W))
>   W = (W * ((V/(W%*%H + 1e-17)) %*% t(H)))/t(rowSums(H))
>   # Removal of epsilon works correctly:
>   #H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   #W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> loglik = sum(V*log(W%*%H+1e-17)) - as.scalar(colSums(W)%*%rowSums(H))
> print("pnmf: " + loglik0 + " -> " + loglik)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
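For readers who want to trace the two update rules quoted in the SYSTEMML-510 description by hand, here is a minimal plain-Python sketch of one PNMF iteration with the epsilon guard. It is an illustration only, not SystemML code, and it does not model the fused "wdivmm" operator; the helper names (`matmul`, `transpose`, `ratio`, `pnmf_step`) and the tiny example matrices are hypothetical. It assumes no column of W and no row of H sums to zero.

```python
def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def ratio(V, WH, eps):
    """Cell-wise V / (WH + eps)."""
    return [[v / (wh + eps) for v, wh in zip(v_row, wh_row)]
            for v_row, wh_row in zip(V, WH)]


def pnmf_step(V, W, H, eps=1e-17):
    """One iteration: update H first, then W using the updated H."""
    # H = (H * (t(W) %*% (V/(W%*%H + eps)))) / t(colSums(W))
    grad_h = matmul(transpose(W), ratio(V, matmul(W, H), eps))
    col_sums_w = [sum(col) for col in zip(*W)]  # one entry per factor
    H = [[h * g / col_sums_w[k] for h, g in zip(H[k], grad_h[k])]
         for k in range(len(H))]

    # W = (W * ((V/(W%*%H + eps)) %*% t(H))) / t(rowSums(H))
    grad_w = matmul(ratio(V, matmul(W, H), eps), transpose(H))
    row_sums_h = [sum(row) for row in H]  # one entry per factor
    W = [[w * g / row_sums_h[k] for k, (w, g) in enumerate(zip(w_row, g_row))]
         for w_row, g_row in zip(W, grad_w)]
    return W, H


# Hypothetical data: the second row of W is zero, so W %*% H has zero cells.
V = [[1.0, 0.0], [0.0, 2.0]]
W = [[0.5], [0.0]]
H = [[0.3, 0.4]]

# With eps > 0 every denominator is nonzero and the step completes.
W1, H1 = pnmf_step(V, W, H)
```

Calling `pnmf_step(V, W, H, eps=0.0)` on the same data fails, because plain Python float division raises `ZeroDivisionError` on a zero denominator (R would instead produce `Inf` or `NaN`). That is the situation the epsilon term in the description guards against.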
[jira] [Created] (SYSTEMML-549) Update OptimizerUtils.OptimizationLevel javadocs
Deron Eriksson created SYSTEMML-549:
---------------------------------------

Summary: Update OptimizerUtils.OptimizationLevel javadocs
Key: SYSTEMML-549
URL: https://issues.apache.org/jira/browse/SYSTEMML-549
Project: SystemML
Issue Type: Task
Components: Documentation
Reporter: Deron Eriksson

The javadoc description for the OptimizerUtils.OptimizationLevel enum lists the following 5 levels with descriptions.
{code}
O0 STATIC
O1 MEMORY_BASED
O2 MEMORY_BASED
O3 GLOBAL TIME_MEMORY_BASED
O4 DEBUG MODE
{code}

However, the actual 6 optimization levels are:
{code}
O0_LOCAL_STATIC
O1_LOCAL_MEMORY_MIN
O2_LOCAL_MEMORY_DEFAULT
O3_LOCAL_RESOURCE_TIME_MEMORY
O4_GLOBAL_TIME_MEMORY
O5_DEBUG_MODE
{code}

Some minor HTML tags would also help readability in the javadocs.
[jira] [Comment Edited] (SYSTEMML-510) Generalized wdivmm w/ eps all patterns
    [ https://issues.apache.org/jira/browse/SYSTEMML-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176552#comment-15176552 ]

Glenn Weidner edited comment on SYSTEMML-510 at 3/2/16 9:56 PM:
----------------------------------------------------------------

I can also work on a separate PR for step 2 to add a similar pattern for cross entropy wcemm (sum(X*log(U%*%t(V) + eps))).

was (Author: gweidner):
I can also work on a separate PR for step 2 to add a similar pattern for the cross entropy case (sum(X*log(U%*%t(V) + eps))).

> Generalized wdivmm w/ eps all patterns
> --------------------------------------
>
> Key: SYSTEMML-510
> URL: https://issues.apache.org/jira/browse/SYSTEMML-510
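The cross-entropy expression named in the comment, sum(X*log(U%*%t(V) + eps)), can be evaluated either over every cell of X or only over its nonzero cells, since a zero in X contributes nothing to the sum. The plain-Python sketch below shows both forms agreeing on a small example. It assumes that visiting only the nonzeros is the point of a fused operator for this pattern; it is not SystemML's wcemm implementation, and the function names and data are hypothetical.

```python
import math


def cross_entropy_dense(X, U, V, eps=0.0):
    """sum(X * log(U %*% t(V) + eps)), visiting every cell of X."""
    total = 0.0
    for i, x_row in enumerate(X):
        for j, x in enumerate(x_row):
            # Cell (i, j) of U %*% t(V) is the dot product of row i of U
            # with row j of V.
            uv = sum(a * b for a, b in zip(U[i], V[j]))
            total += x * math.log(uv + eps)
    return total


def cross_entropy_nonzeros(X, U, V, eps=0.0):
    """Same sum, but the product U %*% t(V) is only evaluated where X != 0."""
    total = 0.0
    for i, x_row in enumerate(X):
        for j, x in enumerate(x_row):
            if x != 0.0:
                uv = sum(a * b for a, b in zip(U[i], V[j]))
                total += x * math.log(uv + eps)
    return total


# Hypothetical data: X is mostly zero, U is 2x2, V is 3x2.
X = [[0.0, 2.0, 0.0],
     [1.0, 0.0, 0.0]]
U = [[0.2, 0.1],
     [0.4, 0.3]]
V = [[0.5, 0.6],
     [0.7, 0.8],
     [0.9, 1.0]]

dense = cross_entropy_dense(X, U, V, eps=1e-17)
sparse = cross_entropy_nonzeros(X, U, V, eps=1e-17)
```

A real sparse implementation would iterate a sparse representation of X directly rather than scanning all cells; the scan is kept here only to make the skipped work easy to see.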
[jira] [Commented] (SYSTEMML-510) Generalized wdivmm w/ eps all patterns
    [ https://issues.apache.org/jira/browse/SYSTEMML-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176552#comment-15176552 ]

Glenn Weidner commented on SYSTEMML-510:
----------------------------------------

I can also work on a separate PR for step 2 to add a similar pattern for the cross entropy case (sum(X*log(U%*%t(V) + eps))).

> Generalized wdivmm w/ eps all patterns
> --------------------------------------
>
> Key: SYSTEMML-510
> URL: https://issues.apache.org/jira/browse/SYSTEMML-510
[jira] [Created] (SYSTEMML-548) Add Python examples to Spark MLContext Programming Guide
Deron Eriksson created SYSTEMML-548:
---------------------------------------

Summary: Add Python examples to Spark MLContext Programming Guide
Key: SYSTEMML-548
URL: https://issues.apache.org/jira/browse/SYSTEMML-548
Project: SystemML
Issue Type: Task
Components: Documentation
Reporter: Deron Eriksson
Assignee: Deron Eriksson

Add Python examples to the Spark MLContext Programming Guide. The first example can be Python independent of a notebook. The second example can be Python in a notebook.

[~mwdus...@us.ibm.com] is currently developing working Python examples including a notebook, so this work can serve as the basis for this documentation.