[jira] [Commented] (SYSTEMML-510) Generalized wdivmm w/ eps all patterns

2016-03-02 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177285#comment-15177285
 ] 

Matthias Boehm commented on SYSTEMML-510:
-

excellent [~gweidner] - yes that would be very helpful.

> Generalized wdivmm w/ eps all patterns
> --
>
> Key: SYSTEMML-510
> URL: https://issues.apache.org/jira/browse/SYSTEMML-510
> Project: SystemML
> Issue Type: Task
> Components: Compiler, Parser, Runtime
> Reporter: Mike Dusenberry
> Original Estimate: 6h
> Remaining Estimate: 6h
>
> If we look at the inner loop of Poisson nonnegative matrix factorization 
> (PNMF) in general, we update the factors as 
> {code}
> H = (H * (t(W) %*% (V/(W%*%H + 1e-17))))/t(colSums(W))
> W = (W * ((V/(W%*%H + 1e-17)) %*% t(H)))/t(rowSums(H))
> {code}.
> Notice the addition of the "1e-17" epsilon term in the denominators.  
> Mathematically, we need this in case any cell of {code}W%*%H{code} evaluates 
> to zero so that we can avoid dividing by zero.  R needs this, but SystemML 
> technically does not due to a fused operator, "wdivmm", that takes care of 
> these situations (or this may be done in the general case?).  This fused 
> operator is currently applied to the pattern {code}t(W) %*% (V / (W %*% 
> H)){code}, amongst other similar patterns.  Ideally, this would easily apply 
> to {code}t(W) %*% (V/(W%*%H + 1e-17)){code}, regardless of the unneeded 
> epsilon term.  Currently, the addition of the epsilon term causes the 
> algorithm to run in non-linear time (quadratic or exponential).  Initially, the 
> behavior pointed towards the possibility of the optimizer avoiding the 
> rewrite to the fused operator, resulting in naive computation, and non-linear 
> growth in training time.  Further exploration seems to show that the rewrite 
> is indeed still being applied, but there seems to also be a recursive nesting 
> of the same rewrite over various regions of the above statements that is not 
> found when the epsilon term is removed.
> The following is the full PNMF DML script used:
> {code}
> V = read($X)
> max_iteration = $maxiter
> rank = $rank
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> loglik0 = sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H))
> i=0
> while(i < max_iteration) {
>   # Addition of epsilon (1e-17) term causes script to run in non-linear time:
>   H = (H * (t(W) %*% (V/(W%*%H + 1e-17))))/t(colSums(W))
>   W = (W * ((V/(W%*%H + 1e-17)) %*% t(H)))/t(rowSums(H))
>   # Removal of epsilon works correctly:
>   #H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   #W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> loglik = sum(V*log(W%*%H+1e-17)) - as.scalar(colSums(W)%*%rowSums(H))
> print("pnmf: " + loglik0 + " -> " + loglik)
> {code}
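
The description above says the fused operator makes the epsilon term unnecessary. A minimal sketch in plain Python (not DML, and not SystemML's actual wdivmm implementation) of why that can hold, assuming the fused evaluation visits only the non-zero cells of V: the dense form divides in every cell and so needs the epsilon wherever W %*% H is zero, while the sparse-driven form never touches those cells. The function names and the tiny matrices are hypothetical and chosen only for illustration.

```python
EPS = 1e-17


def matmul(A, B):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dense_with_eps(W, V, H, eps=EPS):
    """t(W) %*% (V / (W %*% H + eps)), materializing the full product W %*% H."""
    WH = matmul(W, H)
    Q = [[v / (wh + eps) for v, wh in zip(v_row, wh_row)]
         for v_row, wh_row in zip(V, WH)]
    return matmul(transpose(W), Q)


def sparse_driven(W, V, H):
    """Same quantity without eps, visiting only the non-zero cells of V.

    Assumption: this mirrors the idea behind a fused operator, not its code.
    """
    rank, m = len(H), len(H[0])
    out = [[0.0] * m for _ in range(rank)]
    for i, v_row in enumerate(V):
        for j, v in enumerate(v_row):
            if v == 0:
                continue  # zero cells of V contribute nothing, so skip the division
            wh = sum(W[i][k] * H[k][j] for k in range(rank))
            q = v / wh
            for k in range(rank):
                out[k][j] += W[i][k] * q
    return out


# Hypothetical 3x2 example; W %*% H is zero exactly where V is zero.
W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
H = [[0.0, 1.0], [3.0, 0.0]]
V = [[0.0, 4.0], [6.0, 0.0], [3.0, 2.0]]

print(dense_with_eps(W, V, H))
print(sparse_driven(W, V, H))
```

In this example both calls agree, while `dense_with_eps(W, V, H, eps=0.0)` raises `ZeroDivisionError` in Python because it evaluates 0/0 in the empty cells. The sketch does not cover the case where V is non-zero but W %*% H is zero; there the division fails in either form unless an epsilon is added.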



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SYSTEMML-549) Update OptimizerUtils.OptimizationLevel javadocs

2016-03-02 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-549:
---

Summary: Update OptimizerUtils.OptimizationLevel javadocs
Key: SYSTEMML-549
URL: https://issues.apache.org/jira/browse/SYSTEMML-549
Project: SystemML
Issue Type: Task
Components: Documentation
Reporter: Deron Eriksson


The javadoc description for the OptimizerUtils.OptimizationLevel enum lists the 
following 5 levels with descriptions.
{code}
O0 STATIC
O1 MEMORY_BASED
O2 MEMORY_BASED
O3 GLOBAL TIME_MEMORY_BASED
O4 DEBUG MODE
{code}

However, the actual 6 optimization levels are:
{code}
O0_LOCAL_STATIC
O1_LOCAL_MEMORY_MIN
O2_LOCAL_MEMORY_DEFAULT
O3_LOCAL_RESOURCE_TIME_MEMORY
O4_GLOBAL_TIME_MEMORY
O5_DEBUG_MODE
{code}

Some minor HTML tags would also help readability in the javadocs.






[jira] [Comment Edited] (SYSTEMML-510) Generalized wdivmm w/ eps all patterns

2016-03-02 Thread Glenn Weidner (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176552#comment-15176552
 ] 

Glenn Weidner edited comment on SYSTEMML-510 at 3/2/16 9:56 PM:


I can also work on a separate PR for step 2 to add a similar pattern for cross 
entropy wcemm (sum(X*log(U%*%t(V) + eps))).
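
The cross entropy expression in this comment can be sketched in plain Python as follows. This is an illustration only, assuming the fused pattern needs to evaluate the logarithm just for the non-zero cells of X; it is not SystemML's wcemm code, and the function name and the example matrices are hypothetical.

```python
import math


def cross_entropy_sparse_driven(X, U, V, eps=0.0):
    """sum(X * log(U %*% t(V) + eps)), visiting only the non-zero cells of X.

    U is n x r and V is m x r, so cell (i, j) of U %*% t(V) is dot(U[i], V[j]).
    """
    total = 0.0
    for i, x_row in enumerate(X):
        for j, x in enumerate(x_row):
            if x == 0:
                continue  # a zero weight contributes nothing, so log() is never taken here
            uv = sum(u * v for u, v in zip(U[i], V[j]))
            total += x * math.log(uv + eps)
    return total


# Hypothetical 2x2 example; U %*% t(V) is zero exactly where X is zero.
U = [[1.0, 0.0], [0.0, 2.0]]
V = [[0.0, 1.0], [3.0, 0.0]]
X = [[0.0, 5.0], [4.0, 0.0]]

print(cross_entropy_sparse_driven(X, U, V))
print(cross_entropy_sparse_driven(X, U, V, eps=1e-17))
```

Because the zero cells of X are skipped, log(0) is never evaluated in this example, with or without the epsilon.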


was (Author: gweidner):
I can also work on separate PR for step 2 to add similar pattern for cross 
entropy case (sum(X*log(U%*%t(V) + eps))).



[jira] [Commented] (SYSTEMML-510) Generalized wdivmm w/ eps all patterns

2016-03-02 Thread Glenn Weidner (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176552#comment-15176552
 ] 

Glenn Weidner commented on SYSTEMML-510:


I can also work on separate PR for step 2 to add similar pattern for cross 
entropy case (sum(X*log(U%*%t(V) + eps))).



[jira] [Created] (SYSTEMML-548) Add Python examples to Spark MLContext Programming Guide

2016-03-02 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-548:
---

Summary: Add Python examples to Spark MLContext Programming Guide
Key: SYSTEMML-548
URL: https://issues.apache.org/jira/browse/SYSTEMML-548
Project: SystemML
Issue Type: Task
Components: Documentation
Reporter: Deron Eriksson
Assignee: Deron Eriksson


Add Python examples to the Spark MLContext Programming Guide. The first example 
can be Python independent of a notebook. The second example can be Python in a 
notebook.

[~mwdus...@us.ibm.com] is currently developing working Python examples 
including a notebook, so this work can serve as the basis for this 
documentation.


