[jira] [Updated] (SYSTEMML-590) Assume Parent's Namespace for Nested UDF calls.

2016-03-22 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-590:
-
Description: 
Currently, if a UDF body involves calling another UDF, the default global 
namespace is assumed, unless a namespace is explicitly indicated.  This becomes 
a problem when a file contains UDFs, and is then sourced from another script.

Imagine a file {{funcs.dml}} as follows:
{code}
f = function(double x, int a) return (double ans) {
  x2 = g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Then, let's try to call {{f}}:
{code}
script = """
source ("funcs.dml") as funcs

ans = funcs::f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

This results in an error since {{f}} is in the {{funcs}} namespace, but the 
call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
user intends to use the {{g}} located in the same file.

Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
assume that {{f}} and {{g}} are in a {{funcs}} namespace:
{code}
f = function(double x, int a) return (double ans) {
  x2 = funcs::g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Instead, it would be better to simply first look for {{g}} in its parent's 
namespace.  In this case, the "parent" would be the function {{f}}, and the 
namespace we have selected is {{funcs}}.  Then, namespace assumptions would not 
be necessary.
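The proposed resolution order can be sketched abstractly. This is a hypothetical model of the lookup rule, not SystemML's actual parser internals; the namespace table and names below are illustrative:

```python
# Hypothetical sketch of the proposed lookup rule, not SystemML's actual
# parser internals: an unqualified call inside a UDF is first resolved in
# the namespace the calling UDF belongs to, then in the default namespace.

DEFAULT_NS = ".default"  # stand-in name for the global namespace

def resolve(call_name, caller_ns, namespaces):
    """Resolve an unqualified function call made inside a UDF."""
    # 1. Prefer the caller's own namespace (e.g. `funcs` for funcs::f).
    if call_name in namespaces.get(caller_ns, {}):
        return (caller_ns, namespaces[caller_ns][call_name])
    # 2. Fall back to the default global namespace.
    if call_name in namespaces.get(DEFAULT_NS, {}):
        return (DEFAULT_NS, namespaces[DEFAULT_NS][call_name])
    raise NameError("undefined function: " + call_name)

# Mirror of the funcs.dml example: f and g both live in the `funcs`
# namespace, so the call to g inside f finds funcs::g without a prefix.
namespaces = {DEFAULT_NS: {}, "funcs": {"f": "<f body>", "g": "<g body>"}}
```

Under this rule, the unqualified call to {{g}} inside {{funcs::f}} finds {{funcs::g}} first, and the explicit {{funcs::}} prefix becomes unnecessary.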

  was:
Currently, if a UDF body involves calling another UDF, the default global 
namespace is assumed, unless a namespace is explicitly indicated.  This becomes 
a problem when a file contains UDFs, and is then sourced from another script.

Imagine a file {{funcs.dml}} as follows:
{code}
f = function(double x, int a) return (double ans) {
  x2 = g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Then, let's try to call {{f}}:
{code}
script = """
source ("funcs.dml") as funcs

ans = funcs::f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

This results in an error since {{f}} is in the {{funcs}} namespace, but the 
call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
user intends to use the {{g}} located in the same file.

Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
assume that {{f}} and {{g}} are in a {{funcs}} namespace:
{code}
f = function(double x, int a) return (double ans) {
  x2 = funcs::g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Instead, it would be better to simply first look for {{g}} in its parent's 
namespace.  In this case, the "parent" would be the function {{f}}, and the 
namespace we have selected is {{funcs}}.  Then, namespace assumptions would not 
be necessary.


> Assume Parent's Namespace for Nested UDF calls.
> ---
>
> Key: SYSTEMML-590
> URL: https://issues.apache.org/jira/browse/SYSTEMML-590
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>
> Currently, if a UDF body involves calling another UDF, the default global 
> namespace is assumed, unless a namespace is explicitly indicated.  This 
> becomes a problem when a file contains UDFs, and is then sourced from another 
> script.
> Imagine a file {{funcs.dml}} as follows:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Then, let's try to call {{f}}:
> {code}
> script = """
> source ("funcs.dml") as funcs
> ans = funcs::f(3, 1)
> print(ans)
> """
> ml.reset()
> ml.executeScript(script)
> {code}
> This results in an error since {{f}} is in the {{funcs}} namespace, but the 
> call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
> user intends to use the {{g}} located in the same file.
> Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
> assume that {{f}} and {{g}} are in a {{funcs}} namespace:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = funcs::g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Instead, it would be better to simply first look for {{g}} in its parent's 
> namespace.  In this case, the "parent" would be the function {{f}}, and the 
> namespace we have selected is {{funcs}}.  Then, namespace assumptions would 
> not be necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SYSTEMML-589) Add Default Parameter Values to UDFs

2016-03-22 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-589:


 Summary: Add Default Parameter Values to UDFs
 Key: SYSTEMML-589
 URL: https://issues.apache.org/jira/browse/SYSTEMML-589
 Project: SystemML
  Issue Type: Sub-task
Reporter: Mike Dusenberry


This task aims to add default parameter values to UDFs for scalar and boolean 
types.  There may already be runtime support, but the grammar does not seem to 
allow it.

Example that currently works:
{code}
script = """
f = function(double x, int a) return (double ans) {
  ans = a * x
}

ans = f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

Example that would be nice:
{code}
script = """
f = function(double x, int a=1) return (double ans) {
  ans = a * x
}

ans = f(3)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}
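The binding rule such a grammar change implies can be sketched outside of SystemML. A hypothetical helper, not the actual grammar or runtime change:

```python
# Hypothetical sketch of default-argument binding, not SystemML code:
# positional arguments fill parameters left to right, and any parameter
# left unbound takes its declared default value.

def bind_args(params, defaults, args):
    """params: ordered parameter names; defaults: name -> default value."""
    bound = dict(zip(params, args))          # bind what the caller passed
    for name in params[len(args):]:          # fill the rest from defaults
        if name not in defaults:
            raise TypeError("missing required argument: " + name)
        bound[name] = defaults[name]
    return bound

# f = function(double x, int a=1): the call f(3) binds x=3 and a=1.
binding = bind_args(["x", "a"], {"a": 1}, [3])
```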





[jira] [Created] (SYSTEMML-588) Improve UDFs

2016-03-22 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-588:


 Summary: Improve UDFs
 Key: SYSTEMML-588
 URL: https://issues.apache.org/jira/browse/SYSTEMML-588
 Project: SystemML
  Issue Type: Epic
Reporter: Mike Dusenberry


This epic aims to improve the state of user-defined functions (UDFs) in DML & 
PyDML.





[jira] [Created] (SYSTEMML-587) Improvements Triggered By Deep Learning Work

2016-03-22 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-587:


 Summary: Improvements Triggered By Deep Learning Work
 Key: SYSTEMML-587
 URL: https://issues.apache.org/jira/browse/SYSTEMML-587
 Project: SystemML
  Issue Type: Umbrella
Reporter: Mike Dusenberry
Priority: Minor


This convenience umbrella tracks all improvements triggered by the work on deep 
learning (SYSTEMML-540), but not directly related to it.





[jira] [Commented] (SYSTEMML-580) Add Scala LogisticRegression API For Spark ML Pipeline

2016-03-20 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197797#comment-15197797
 ] 

Mike Dusenberry commented on SYSTEMML-580:
--

[PR 70 | https://github.com/apache/incubator-systemml/pull/70] merged as 
[commit 7ce19c8097f3d24d07be87d9427890834f9a9bea | 
https://github.com/apache/incubator-systemml/commit/7ce19c8097f3d24d07be87d9427890834f9a9bea].

> Add Scala LogisticRegression API For Spark ML Pipeline
> --
>
> Key: SYSTEMML-580
> URL: https://issues.apache.org/jira/browse/SYSTEMML-580
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Tommy Yu
>Assignee: Tommy Yu
>
> I wrote a Scala ML pipeline wrapper for the LogisticRegression model as an 
> example for Scala users.
> I propose a Scala version of the example because of some weaknesses in the 
> Java version.
> It is not natural to extend a Scala class in Java code; we need to know the 
> post-compilation form of each function, like:
> @Override
> public void 
> org$apache$spark$ml$param$shared$HasElasticNetParam$setter$elasticNetParam_$eq(DoubleParam
>  arg0) {}
> I assume this is a setter function, but it does nothing here.
> It is also hard to follow the ML parameter style; instead, parameters must be 
> defined like:
> private IntParam icpt = new IntParam(this, "icpt", "Value of intercept");
> private DoubleParam reg = new DoubleParam(this, "reg", "Value of 
> regularization parameter");





[jira] [Commented] (SYSTEMML-580) Add Scala LogisticRegression API For Spark ML Pipeline

2016-03-19 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197785#comment-15197785
 ] 

Mike Dusenberry commented on SYSTEMML-580:
--

[PR 70 | https://github.com/apache/incubator-systemml/pull/70] submitted.

> Add Scala LogisticRegression API For Spark ML Pipeline
> --
>
> Key: SYSTEMML-580
> URL: https://issues.apache.org/jira/browse/SYSTEMML-580
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Tommy Yu
>Assignee: Tommy Yu
>
> I wrote a Scala ML pipeline wrapper for the LogisticRegression model as an 
> example for Scala users.
> I propose a Scala version of the example because of some weaknesses in the 
> Java version.
> It is not natural to extend a Scala class in Java code; we need to know the 
> post-compilation form of each function, like:
> @Override
> public void 
> org$apache$spark$ml$param$shared$HasElasticNetParam$setter$elasticNetParam_$eq(DoubleParam
>  arg0) {}
> I assume this is a setter function, but it does nothing here.
> It is also hard to follow the ML parameter style; instead, parameters must be 
> defined like:
> private IntParam icpt = new IntParam(this, "icpt", "Value of intercept");
> private DoubleParam reg = new DoubleParam(this, "reg", "Value of 
> regularization parameter");





[jira] [Updated] (SYSTEMML-580) Add Scala LogisticRegression API For Spark ML Pipeline

2016-03-19 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-580:
-
Summary: Add Scala LogisticRegression API For Spark ML Pipeline  (was: Add 
Scala LogisticRegression API For Spark Pipeline)

> Add Scala LogisticRegression API For Spark ML Pipeline
> --
>
> Key: SYSTEMML-580
> URL: https://issues.apache.org/jira/browse/SYSTEMML-580
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Tommy Yu
>Assignee: Tommy Yu
>
> I wrote a Scala ML pipeline wrapper for the LogisticRegression model as an 
> example for Scala users.
> I propose a Scala version of the example because of some weaknesses in the 
> Java version.
> It is not natural to extend a Scala class in Java code; we need to know the 
> post-compilation form of each function, like:
> @Override
> public void 
> org$apache$spark$ml$param$shared$HasElasticNetParam$setter$elasticNetParam_$eq(DoubleParam
>  arg0) {}
> I assume this is a setter function, but it does nothing here.
> It is also hard to follow the ML parameter style; instead, parameters must be 
> defined like:
> private IntParam icpt = new IntParam(this, "icpt", "Value of intercept");
> private DoubleParam reg = new DoubleParam(this, "reg", "Value of 
> regularization parameter");





[jira] [Created] (SYSTEMML-580) Add Scala LogisticRegression API For Spark Pipeline

2016-03-19 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-580:


 Summary: Add Scala LogisticRegression API For Spark Pipeline
 Key: SYSTEMML-580
 URL: https://issues.apache.org/jira/browse/SYSTEMML-580
 Project: SystemML
  Issue Type: New Feature
Reporter: Tommy Yu
Assignee: Tommy Yu


I wrote a Scala ML pipeline wrapper for the LogisticRegression model as an 
example for Scala users.

I propose a Scala version of the example because of some weaknesses in the 
Java version.

It is not natural to extend a Scala class in Java code; we need to know the 
post-compilation form of each function, like:
@Override
public void 
org$apache$spark$ml$param$shared$HasElasticNetParam$setter$elasticNetParam_$eq(DoubleParam
 arg0) {}

I assume this is a setter function, but it does nothing here.

It is also hard to follow the ML parameter style; instead, parameters must be 
defined like:

private IntParam icpt = new IntParam(this, "icpt", "Value of intercept");
private DoubleParam reg = new DoubleParam(this, "reg", "Value of regularization 
parameter");





[jira] [Commented] (SYSTEMML-540) Deep Learning

2016-03-19 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198220#comment-15198220
 ] 

Mike Dusenberry commented on SYSTEMML-540:
--

Update: I'm working on an experimental, layers-based framework directly in DML 
to contain layer abstractions with simple forward/backward APIs for affine, 
convolution (start with 2D), max-pooling, non-linearities (relu, sigmoid, 
softmax, etc.), dropout, loss functions, and other layers.  As part of this 
experiment, I'm starting by implementing as much as possible in DML, and then 
will move to built-in functions as necessary.
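The forward/backward contract described above can be illustrated with a minimal layer. A ReLU sketch in plain Python, illustrative only and not the experimental DML framework itself:

```python
# Minimal illustration of a layer with the simple forward/backward API
# described above, using a ReLU non-linearity on a flat list of values.
# Illustrative sketch only, not the experimental DML framework.

def relu_forward(X):
    """Forward pass: out = max(0, x), elementwise."""
    return [max(0.0, x) for x in X]

def relu_backward(dout, X):
    """Backward pass: dX = dout where x > 0, else 0."""
    return [d if x > 0 else 0.0 for d, x in zip(dout, X)]
```

Each layer exposes only these two functions, so layers compose into deeper nets by chaining forward calls and reversing the chain for backward.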

> Deep Learning
> -
>
> Key: SYSTEMML-540
> URL: https://issues.apache.org/jira/browse/SYSTEMML-540
> Project: SystemML
>  Issue Type: Epic
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>
> This epic covers the addition of deep learning to SystemML, including:
> * Core DML layer abstractions for deep (convolutional, recurrent) neural 
> nets, with simple forward/backward API: affine, convolution (start with 2D), 
> max-pooling, non-linearities (relu, sigmoid, softmax), dropout, loss 
> functions.
> * Modularized DML optimizers: (mini-batch, stochastic) gradient descent (w/ 
> momentum, etc.).
> * Additional DML language support as necessary (tensors, built-in functions 
> such as convolution, function pointers, list structures, etc.).
> * Integration with other deep learning frameworks (Caffe, Torch, Theano, 
> TensorFlow, etc.) via automatic DML code generation.
> * etc.





[jira] [Created] (SYSTEMML-582) Determine If Multiple Builds Are Needed For Different Scala Versions.

2016-03-19 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-582:


 Summary: Determine If Multiple Builds Are Needed For Different 
Scala Versions.
 Key: SYSTEMML-582
 URL: https://issues.apache.org/jira/browse/SYSTEMML-582
 Project: SystemML
  Issue Type: New Feature
Reporter: Mike Dusenberry








[jira] [Resolved] (SYSTEMML-580) Add Scala LogisticRegression API For Spark ML Pipeline

2016-03-19 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry resolved SYSTEMML-580.
--
Resolution: Fixed

> Add Scala LogisticRegression API For Spark ML Pipeline
> --
>
> Key: SYSTEMML-580
> URL: https://issues.apache.org/jira/browse/SYSTEMML-580
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Tommy Yu
>Assignee: Tommy Yu
>
> I wrote a Scala ML pipeline wrapper for the LogisticRegression model as an 
> example for Scala users.
> I propose a Scala version of the example because of some weaknesses in the 
> Java version.
> It is not natural to extend a Scala class in Java code; we need to know the 
> post-compilation form of each function, like:
> @Override
> public void 
> org$apache$spark$ml$param$shared$HasElasticNetParam$setter$elasticNetParam_$eq(DoubleParam
>  arg0) {}
> I assume this is a setter function, but it does nothing here.
> It is also hard to follow the ML parameter style; instead, parameters must be 
> defined like:
> private IntParam icpt = new IntParam(this, "icpt", "Value of intercept");
> private DoubleParam reg = new DoubleParam(this, "reg", "Value of 
> regularization parameter");





[jira] [Updated] (SYSTEMML-579) Packing our algorithm scripts into JAR

2016-03-19 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-579:
-
Description: 
Pack our algorithm scripts into the JAR without looking into the user's filesystem.

We should look into the possibility of packing our algorithm scripts into the 
JAR during build time as perhaps a Maven "resource" that would be available to 
the Java process without needing to look into the user's filesystem.  This 
should help with the Scala API introduced in SYSTEMML-580.  One issue I see 
with the current approach is if a user wishes to attach the SystemML JAR to a 
cloud notebook (such as Databricks Cloud) in which an environment variable may 
not be able to be set, the API will not function.
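The proposal amounts to reading scripts from inside the archive rather than from the filesystem; in Java this would mean packing the scripts as Maven resources and reading them through the classloader. The pattern can be sketched with an in-memory zip standing in for the JAR (illustrative only; the script path is hypothetical):

```python
# Illustration of the "scripts packed inside the archive" idea, with an
# in-memory zip standing in for the JAR; the real change would add the
# .dml scripts as Maven resources read via the Java classloader.
import io
import zipfile

def pack_scripts(scripts):
    """Pack a {path: text} mapping into an in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, text in scripts.items():
            zf.writestr(path, text)
    return buf.getvalue()

def load_script(archive_bytes, path):
    """Read a packed script without touching the user's filesystem."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(path).decode("utf-8")

# Hypothetical script path, for illustration only.
jar = pack_scripts({"scripts/LinearReg.dml": "# algorithm here"})
```

Because the archive travels with the library, no environment variable or filesystem layout is required at the point of use, which is exactly the cloud-notebook case described above.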

  was:Packing our algorithm to JAR without look into the user's filesystem.


> Packing our algorithm scripts into JAR
> --
>
> Key: SYSTEMML-579
> URL: https://issues.apache.org/jira/browse/SYSTEMML-579
> Project: SystemML
>  Issue Type: Task
>  Components: Algorithms, APIs
>Affects Versions: SystemML 0.9
>Reporter: Tommy Yu
>Priority: Minor
>
> Pack our algorithm scripts into the JAR without looking into the user's filesystem.
> We should look into the possibility of packing our algorithm scripts into the 
> JAR during build time as perhaps a Maven "resource" that would be available 
> to the Java process without needing to look into the user's filesystem.  This 
> should help with the Scala API introduced in SYSTEMML-580.  One issue I see 
> with the current approach is if a user wishes to attach the SystemML JAR to a 
> cloud notebook (such as Databricks Cloud) in which an environment variable 
> may not be able to be set, the API will not function.





[jira] [Closed] (SYSTEMML-545) Document Scala build support in Eclipse

2016-03-19 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry closed SYSTEMML-545.


> Document Scala build support in Eclipse
> ---
>
> Key: SYSTEMML-545
> URL: https://issues.apache.org/jira/browse/SYSTEMML-545
> Project: SystemML
>  Issue Type: Improvement
>  Components: Build
>Reporter: Glenn Weidner
>Assignee: Glenn Weidner
>
> In preparation for [SYSTEMML-543 Refactor MLContext in 
> Scala|https://issues.apache.org/jira/browse/SYSTEMML-543], the project build 
> needs to support Scala in Eclipse.  Initial investigation and discussion can 
> be found in [PR70|https://github.com/apache/incubator-systemml/pull/70].





[jira] [Created] (SYSTEMML-581) Add Scala API Tests to Maven Test Suites

2016-03-18 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-581:


 Summary: Add Scala API Tests to Maven Test Suites
 Key: SYSTEMML-581
 URL: https://issues.apache.org/jira/browse/SYSTEMML-581
 Project: SystemML
  Issue Type: New Feature
Reporter: Mike Dusenberry
Priority: Minor








[jira] [Updated] (SYSTEMML-577) Add High-Level "executeScript" API to Python MLContext

2016-03-15 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-577:
-
Description: 
This adds the {{executeScript(...)}} function to the Python MLContext API, and 
in the process hides the need to use {{registerInput(...)}} and 
{{registerOutput(...)}} by allowing the user to pass in a dictionary of 
key:value inputs of any type, and an array of outputs to keep.

Example:
{code}
pnmf = """ // script here """
outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
["W", "H", "negloglik"])
{code}
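A sketch of what {{executeScript(...)}} hides; hypothetical internals, not the actual MLContext implementation:

```python
# Hypothetical sketch of what executeScript(...) does under the hood, not
# the actual MLContext implementation: register each input and requested
# output, run the script, and hand back only the kept outputs.

class SketchMLContext:
    def __init__(self):
        self.inputs = {}
        self.outputs = []

    def registerInput(self, name, value):
        self.inputs[name] = value

    def registerOutput(self, name):
        self.outputs.append(name)

    def executeScript(self, script, inputs=None, outputs=None):
        for name, value in (inputs or {}).items():
            self.registerInput(name, value)
        for name in (outputs or []):
            self.registerOutput(name)
        # Stand-in for actually executing the DML script: return a
        # placeholder result per registered output.
        return {name: "<result: " + name + ">" for name in self.outputs}
```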

  was:
This adds the {{executeScript(...)}} function to the Python MLContext API, and 
in the process hides the need to use {{registerInput(...)}} and 
{{registerOutput(...)}} by allowing the user to pass in a dictionary of 
key:value inputs of any type, and an array of outputs to keep.

Example:
{code}
pnmf = """ // script here """
outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
["W", "H", "negloglik"])
{code}}


> Add High-Level "executeScript" API to Python MLContext
> --
>
> Key: SYSTEMML-577
> URL: https://issues.apache.org/jira/browse/SYSTEMML-577
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>
> This adds the {{executeScript(...)}} function to the Python MLContext API, 
> and in the process hides the need to use {{registerInput(...)}} and 
> {{registerOutput(...)}} by allowing the user to pass in a dictionary of 
> key:value inputs of any type, and an array of outputs to keep.
> Example:
> {code}
> pnmf = """ // script here """
> outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
> ["W", "H", "negloglik"])
> {code}





[jira] [Commented] (SYSTEMML-543) Refactor MLContext in Scala

2016-03-03 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179337#comment-15179337
 ] 

Mike Dusenberry commented on SYSTEMML-543:
--

[~tommy_cug] Thanks for reaching out!  I haven't started on this, so please 
feel free to work on it.  However, I think that the redesign will rely on what 
[~deron] is working on with SYSTEMML-544, so please coordinate with him! :)

> Refactor MLContext in Scala
> ---
>
> Key: SYSTEMML-543
> URL: https://issues.apache.org/jira/browse/SYSTEMML-543
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> Our {{MLContext}} API relies on a myriad of optional parameters as 
> conveniences for end-users, which has led to our Java implementation growing 
> in size.  Moving to Scala will allow us to use default parameters and 
> continue to expand the capabilities of the API in a clean way.





[jira] [Updated] (SYSTEMML-540) Deep Learning

2016-02-25 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-540:
-
Description: 
This epic covers the addition of deep learning to SystemML, including:

* Core DML layer abstractions for deep (convolutional) neural nets.
* DML language support as necessary.
* DML code generation (Caffe, Torch, Theano, TensorFlow, etc. integration)
* etc.

  was:
This epic covers the addition of deep learning to SystemML, including:

* Core DML layer abstractions for deep (convolutional) neural nets.
* DML language support as necessary.
* DML code generation (Caffe, Theano, etc. integration)
* etc.


> Deep Learning
> -
>
> Key: SYSTEMML-540
> URL: https://issues.apache.org/jira/browse/SYSTEMML-540
> Project: SystemML
>  Issue Type: Epic
>Reporter: Mike Dusenberry
>
> This epic covers the addition of deep learning to SystemML, including:
> * Core DML layer abstractions for deep (convolutional) neural nets.
> * DML language support as necessary.
> * DML code generation (Caffe, Torch, Theano, TensorFlow, etc. integration)
> * etc.





[jira] [Commented] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-02-18 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152761#comment-15152761
 ] 

Mike Dusenberry commented on SYSTEMML-512:
--

[~mboehm7] Confirmed -- the OOM issue is indeed related to the young generation 
heap size.  Setting -Xmn=100M with driver memory still set to 1G allows the 
script to run.  Is there anything we can do internally to avoid this?

For clarity to anyone else reading this, the long runtime issue is still 
present.

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: test1.scala, test2.scala
>
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java out of memory heap error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1212)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1103)
>   at 
> org.apache.sysml.runtime.instructions.cp.MatrixMatrixArithmeticCPInstruction.processInstruction(MatrixMatrixArithmeticCPInstruction.java:60)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>   at 
> org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:183)
>   at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
>   at 
> 

[jira] [Comment Edited] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-02-17 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151623#comment-15151623
 ] 

Mike Dusenberry edited comment on SYSTEMML-512 at 2/18/16 2:52 AM:
---

[~mboehm7] I've added two Scala files with code that expresses the issue.  
{{test1.scala}} works correctly, and {{test2.scala}} has the issue described 
above.  The only difference is the PNMF script stored in {{val pnmf = ...}}.  
To replicate this, I used {{$SPARK_HOME/bin/spark-shell --master local[*] 
--driver-memory 1G --jars $SYSTEMML_HOME/target/SystemML.jar}}, and then 
{{:load test1.scala}} and {{:load test2.scala}} to run the scripts.  You will 
need the Amazon data in the same directory.

Also, smaller data sizes (2000) will allow {{test2.scala}} to run to 
completion, but it will run much slower than {{test1.scala}}.


was (Author: mwdus...@us.ibm.com):
[~mboehm7] I've added two Scala files with code that expresses the issue.  
{{test1.scala}} works correctly, and {{test2.scala}} has the issue described 
above.  The only difference is the PNMF script stored in {{val pnmf = ...}}.  
To replicate this, I used {{$SPARK_HOME/bin/spark-shell --master local[*] 
--driver-memory 1G --jars $SYSTEMML_HOME/target/SystemML.jar}}, and then 
{{:load test1.scala}} and {{:load test2.scala}} to run the scripts.  You will 
need the Amazon data in the same directory.

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: test1.scala, test2.scala
>
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java out of memory heap error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at 
> 

[jira] [Commented] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-02-17 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151623#comment-15151623
 ] 

Mike Dusenberry commented on SYSTEMML-512:
--

[~mboehm7] I've added two Scala files that demonstrate the issue: 
{{test1.scala}} works correctly, while {{test2.scala}} exhibits the problem 
described above.  The only difference between them is the PNMF script stored in 
{{val pnmf = ...}}.  
To replicate this, I used {{$SPARK_HOME/bin/spark-shell --master local[*] 
--driver-memory 1G --jars $SYSTEMML_HOME/target/SystemML.jar}}, and then 
{{:load test1.scala}} and {{:load test2.scala}} to run the scripts.  You will 
need the Amazon data in the same directory.

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: test1.scala, test2.scala
>
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java heap out-of-memory error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1212)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1103)
>   at 
> org.apache.sysml.runtime.instructions.cp.MatrixMatrixArithmeticCPInstruction.processInstruction(MatrixMatrixArithmeticCPInstruction.java:60)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>   at 
> org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:183)
>   at 
> 
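[Editor's note] For readers less familiar with DML, the multiplicative updates and the negative log-likelihood in the quoted PNMF script can be sketched in NumPy. This is a simplified, hypothetical re-implementation for illustration only, not the SystemML runtime; function names are invented here.

```python
import numpy as np

def pnmf(V, rank, max_iteration, seed=0):
    """Poisson NMF via multiplicative updates, mirroring the quoted DML loop."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # W and H initialized uniformly in [0, 0.01), as in the DML Rand() calls
    W = rng.uniform(0.0, 0.01, size=(n, rank))
    H = rng.uniform(0.0, 0.01, size=(rank, m))
    for _ in range(max_iteration):
        # H = (H * (t(W) %*% (V / (W %*% H)))) / t(colSums(W))
        H = H * (W.T @ (V / (W @ H))) / W.sum(axis=0)[:, None]
        # W = (W * ((V / (W %*% H)) %*% t(H))) / t(rowSums(H))
        W = W * ((V / (W @ H)) @ H.T) / H.sum(axis=1)[None, :]
    return W, H

def negloglik(V, W, H):
    # -1 * (sum(V * log(W %*% H)) - colSums(W) %*% rowSums(H))
    return -(np.sum(V * np.log(W @ H)) - W.sum(axis=0) @ H.sum(axis=1))
```

Note that each loop iteration touches only matrix products and elementwise operations on V-sized intermediates, which is why the monolithic DML script scales linearly; the reported OOM arises only once the same loop is moved into a UDF.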

[jira] [Updated] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-516:
-
Description: 
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[,2:]  # select all rows, and all columns except the first one
{code}.

  was:
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified.  It would be useful to be able to specify *either* a lower *or* 
upper bound, with the missing bound implicitly added internally.  This would 
allow for scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[,2:]  # select all rows, and all columns except the first one
{code}.


> Index Range Slicing Should Allow Implicit Upper Or Lower Bounds
> ---
>
> Key: SYSTEMML-516
> URL: https://issues.apache.org/jira/browse/SYSTEMML-516
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
> 2:6]}}.  However, this currently requires that *both* a lower *and* upper 
> bound be specified.
> It would be useful to be able to specify *either* a lower *or* upper bound, 
> with the missing bound implicitly added internally.  This would allow for 
> scenarios such as selecting all columns *except* the first one, as in
> {code}
> data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
> X = data[,2:]  # select all rows, and all columns except the first one
> {code}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
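[Editor's note] For comparison, NumPy's slicing already supports exactly this kind of implicit bound (keeping in mind that NumPy is 0-based with an exclusive stop index, unlike DML's 1-based inclusive ranges):

```python
import numpy as np

X = np.arange(20.0).reshape(4, 5)

# Omitting the stop bound implicitly extends the range to the last index:
tail_cols = X[:, 1:]   # all rows, all columns except the first
# Omitting the start bound implicitly begins the range at the first index:
head_rows = X[:2, :]   # first two rows, all columns

assert tail_cols.shape == (4, 4)
assert head_rows.shape == (2, 5)
```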


[jira] [Updated] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-516:
-
Description: 
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[1:4, 2:]  # select rows 1 to 4, and columns 2 to ncol(X)
{code}.

This is the same functionality that [NumPy provides 
|http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html].

  was:
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[1:4, 2:]  # select rows 1 to 4, and columns 2 to ncol(X)
{code}.


> Index Range Slicing Should Allow Implicit Upper Or Lower Bounds
> ---
>
> Key: SYSTEMML-516
> URL: https://issues.apache.org/jira/browse/SYSTEMML-516
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
> 2:6]}}.  However, this currently requires that *both* a lower *and* upper 
> bound be specified for a given row or column range.
> It would be useful to be able to specify *either* a lower *or* upper bound, 
> with the missing bound implicitly added internally.  This would allow for 
> scenarios such as selecting all columns *except* the first one, as in
> {code}
> data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
> X = data[1:4, 2:]  # select rows 1 to 4, and columns 2 to ncol(X)
> {code}.
> This is the same functionality that [NumPy provides 
> |http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html].





[jira] [Updated] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-516:
-
Description: 
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[1:4, 2:]  # select rows 1-4, and columns 2-numColumns
{code}.

  was:
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[,2:]  # select all rows, and all columns except the first one
{code}.


> Index Range Slicing Should Allow Implicit Upper Or Lower Bounds
> ---
>
> Key: SYSTEMML-516
> URL: https://issues.apache.org/jira/browse/SYSTEMML-516
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
> 2:6]}}.  However, this currently requires that *both* a lower *and* upper 
> bound be specified for a given row or column range.
> It would be useful to be able to specify *either* a lower *or* upper bound, 
> with the missing bound implicitly added internally.  This would allow for 
> scenarios such as selecting all columns *except* the first one, as in
> {code}
> data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
> X = data[1:4, 2:]  # select rows 1-4, and columns 2-numColumns
> {code}.





[jira] [Updated] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-516:
-
Description: 
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[1:4, 2:]  # select rows 1 to 4, and columns 2 to numColumns
{code}.

  was:
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[1:4, 2:]  # select rows 1-4, and columns 2-numColumns
{code}.


> Index Range Slicing Should Allow Implicit Upper Or Lower Bounds
> ---
>
> Key: SYSTEMML-516
> URL: https://issues.apache.org/jira/browse/SYSTEMML-516
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
> 2:6]}}.  However, this currently requires that *both* a lower *and* upper 
> bound be specified for a given row or column range.
> It would be useful to be able to specify *either* a lower *or* upper bound, 
> with the missing bound implicitly added internally.  This would allow for 
> scenarios such as selecting all columns *except* the first one, as in
> {code}
> data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
> X = data[1:4, 2:]  # select rows 1 to 4, and columns 2 to numColumns
> {code}.





[jira] [Updated] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-516:
-
Description: 
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[1:4, 2:]  # select rows 1 to 4, and columns 2 to ncol(X)
{code}.

  was:
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[1:4, 2:]  # select rows 1 to 4, and columns 2 to numColumns
{code}.


> Index Range Slicing Should Allow Implicit Upper Or Lower Bounds
> ---
>
> Key: SYSTEMML-516
> URL: https://issues.apache.org/jira/browse/SYSTEMML-516
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
> 2:6]}}.  However, this currently requires that *both* a lower *and* upper 
> bound be specified for a given row or column range.
> It would be useful to be able to specify *either* a lower *or* upper bound, 
> with the missing bound implicitly added internally.  This would allow for 
> scenarios such as selecting all columns *except* the first one, as in
> {code}
> data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
> X = data[1:4, 2:]  # select rows 1 to 4, and columns 2 to ncol(X)
> {code}.





[jira] [Created] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-516:


 Summary: Index Range Slicing Should Allow Implicit Upper Or Lower 
Bounds
 Key: SYSTEMML-516
 URL: https://issues.apache.org/jira/browse/SYSTEMML-516
 Project: SystemML
  Issue Type: Improvement
Reporter: Mike Dusenberry


DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified.  It would be useful to be able to specify *either* a lower *or* 
upper bound, with the missing bound implicitly added internally.  This would 
allow for scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = data[,2:]  # select all rows, and all columns except the first one
{code}.





[jira] [Updated] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-02-11 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-512:
-
Summary: DML Script With UDFs Results In Out Of Memory Error As Compared to 
Without UDFs  (was: DML Script With UDFs Results In Out Of Memory Error)

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java heap out-of-memory error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1212)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1103)
>   at 
> org.apache.sysml.runtime.instructions.cp.MatrixMatrixArithmeticCPInstruction.processInstruction(MatrixMatrixArithmeticCPInstruction.java:60)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>   at 
> org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:183)
>   at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
>   at 
> org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
> 
