[jira] [Updated] (SYSTEMML-1703) Create tiny sample notebook to import Caffe VGG model and do prediction.

2017-07-12 Thread Arvind Surve (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Surve updated SYSTEMML-1703:
---
Sprint: Sprint 2  (was: Sprint 1)

> Create tiny sample notebook to import Caffe VGG model and do prediction.
> 
>
> Key: SYSTEMML-1703
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1703
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Arvind Surve
>Assignee: Arvind Surve
>
> Once this notebook is ready and functional, it will demonstrate the following:
>   - A Caffe VGG model can be imported into SystemML
>   - The imported model can be used to classify an image (prediction)
> Assumptions:
>   - The functionality to import a Caffe model works reasonably well
>   - Caffe is not installed on the system
>   
>   



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1742) Do transfer learning using imported VGG model

2017-07-12 Thread Arvind Surve (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Surve updated SYSTEMML-1742:
---
Sprint:   (was: Sprint 2)

> Do transfer learning using imported VGG model
> -
>
> Key: SYSTEMML-1742
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1742
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Arvind Surve
>Assignee: Arvind Surve
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1465) Add stain normalization to preprocessing.

2017-07-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1465:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: SYSTEMML-1185)

> Add stain normalization to preprocessing.
> -
>
> Key: SYSTEMML-1465
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1465
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1465) Add stain normalization to preprocessing.

2017-07-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1465:
--
Issue Type: New Feature  (was: Bug)

> Add stain normalization to preprocessing.
> -
>
> Key: SYSTEMML-1465
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1465
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1462) Extract preprocessing code into a Python package + script

2017-07-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1462:
--
Issue Type: New Feature  (was: Sub-task)
Parent: (was: SYSTEMML-1185)

> Extract preprocessing code into a Python package + script
> -
>
> Key: SYSTEMML-1462
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1462
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
> Fix For: SystemML 0.14
>
>
> Currently, the breast cancer preprocessing code is contained entirely within 
> a Jupyter notebook ({{Preprocessing.ipynb}}).  As the code has clearly 
> outgrown the notebook, this task aims to extract the functions into a new 
> {{breastcancer}} Python package, along with a {{preprocess.py}} script.  The 
> notebook should have all functions removed, leaving only the settings and 
> execution of the preprocessing, and the new script will essentially be a copy 
> of that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1466) Update `convnet.dml` to use distributed SGD.

2017-07-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1466:
--
Issue Type: New Feature  (was: Sub-task)
Parent: (was: SYSTEMML-1185)

> Update `convnet.dml` to use distributed SGD.
> 
>
> Key: SYSTEMML-1466
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1466
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
> Fix For: SystemML 1.0
>
> Attachments: convnet_distrib_sgd.dml, run_convnet_distrib_sgd.py, 
> run_convnet_distrib_sgd-stats.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1612) Add preprocessing unit tests.

2017-07-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1612:
--
Issue Type: Test  (was: Sub-task)
Parent: (was: SYSTEMML-1185)

> Add preprocessing unit tests.
> -
>
> Key: SYSTEMML-1612
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1612
> Project: SystemML
>  Issue Type: Test
>Reporter: Mike Dusenberry
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1185) SystemML Breast Cancer Project

2017-07-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1185:
--
Epic Name: SystemML Breast Cancer Project

> SystemML Breast Cancer Project
> --
>
> Key: SYSTEMML-1185
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1185
> Project: SystemML
>  Issue Type: Epic
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
> Attachments: approach.svg
>
>
> h1. Predicting Breast Cancer Proliferation Scores with Apache Spark and 
> Apache SystemML
> h3. Overview
> The [Tumor Proliferation Assessment Challenge 2016 (TUPAC16) | 
> http://tupac.tue-image.nl/] is a "Grand Challenge" that was created for the 
> [2016 Medical Image Computing and Computer Assisted Intervention (MICCAI 
> 2016) | http://miccai2016.org/en/] conference.  In this challenge, the goal 
> is to develop state-of-the-art algorithms for automatic prediction of tumor 
> proliferation scores from whole-slide histopathology images of breast tumors.
> h3. Background
> Breast cancer is the leading cause of cancerous death in women in 
> less-developed countries, and is the second leading cause of cancerous deaths 
> in developed countries, accounting for 29% of all cancers in women within the 
> U.S. \[1]. Survival rates increase as early detection increases, giving 
> incentive for pathologists and the medical world at large to develop improved 
> methods for even earlier detection \[2].  There are many forms of breast 
> cancer including Ductal Carcinoma in Situ (DCIS), Invasive Ductal Carcinoma 
> (IDC), Tubular Carcinoma of the Breast, Medullary Carcinoma of the Breast, 
> Invasive Lobular Carcinoma, Inflammatory Breast Cancer and several others 
> \[3]. Within all of these forms of breast cancer, the rate at which breast 
> cancer cells grow (proliferation) is a strong indicator of a patient’s 
> prognosis. Although there are many means of determining the presence of 
> breast cancer, tumor proliferation speed has been proven to help pathologists 
> determine the treatment for the patient. The most common technique for 
> determining the proliferation speed is through mitotic count (mitotic index) 
> estimates, in which a pathologist counts the dividing cell nuclei in 
> hematoxylin and eosin (H&E) stained slide preparations to determine the 
> number of mitotic bodies.  Given this, the pathologist produces a 
> proliferation score of either 1, 2, or 3, ranging from better to worse 
> prognosis \[4]. Unfortunately, this approach is known to have reproducibility 
> problems due to the variability in counting, as well as the difficulty in 
> distinguishing between different grades.
> References:  
> \[1] http://emedicine.medscape.com/article/1947145-overview#a3  
> \[2] http://emedicine.medscape.com/article/1947145-overview#a7  
> \[3] http://emedicine.medscape.com/article/1954658-overview  
> \[4] http://emedicine.medscape.com/article/1947145-workup#c12  
> h3. Goal & Approach
> In an effort to automate the process of classification, this project aims to 
> develop a large-scale deep learning approach for predicting tumor scores 
> directly from the pixels of whole-slide histopathology images.  Our proposed 
> approach is based on a recent research paper from Stanford \[1].  Starting 
> with 500 extremely high-resolution tumor slide images with accompanying score 
> labels, we aim to make use of Apache Spark in a preprocessing step to cut and 
> filter the images into smaller square samples, generating 4.7 million samples 
> for a total of ~7TB of data \[2].  We then utilize Apache SystemML on top of 
> Spark to develop and train a custom, large-scale, deep convolutional neural 
> network on these samples, making use of the familiar linear algebra syntax 
> and automatically-distributed execution of SystemML \[3].  Our model takes as 
> input the pixel values of the individual samples, and is trained to predict 
> the correct tumor score classification for each one.  In addition to 
> distributed linear algebra, we aim to exploit task-parallelism via parallel 
> for-loops for hyperparameter optimization, as well as hardware acceleration 
> for faster training via a GPU-backed runtime.  Ultimately, we aim to develop 
> a model that is sufficiently stronger than existing approaches for the task 
> of breast cancer tumor proliferation score classification.
> References:  
> \[1] https://web.stanford.edu/group/rubinlab/pubs/2243353.pdf  
> \[2] See [{{Preprocessing.ipynb}} | 
> https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/Preprocessing.ipynb].
>   
> \[3] See [{{MachineLearning.ipynb}} | 
> https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/MachineLearning.ipynb],
>  [{{softmax_clf.dml}} | 
> https://github.com/apache/incubator-sy

[jira] [Assigned] (SYSTEMML-1756) Potential infinite recursion in Explain#explain(DMLProgram, Program, ExplainType)

2017-07-12 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson reassigned SYSTEMML-1756:


Assignee: Ted Yu

> Potential infinite recursion in Explain#explain(DMLProgram, Program, 
> ExplainType)
> -
>
> Key: SYSTEMML-1756
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1756
> Project: SystemML
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>
> Here is related code:
> {code}
> public static String explain(DMLProgram prog, Program rtprog, 
> ExplainType type)
> throws HopsException, DMLRuntimeException, LanguageException {
> return explain(prog, rtprog, type);
> {code}
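
The snippet above re-dispatches to itself, so every call recurses until a StackOverflowError. Below is a minimal, self-contained sketch of the pattern and of one way to break the cycle; the names are hypothetical and this is not the SystemML source, so the actual fix may differ.

{code}
// Hypothetical, self-contained illustration of the bug pattern above.
public class ExplainRecursionDemo {

    // Buggy shape: the 3-argument overload calls itself with the same
    // arguments, so it never terminates (StackOverflowError at runtime).
    static String explainBuggy(String prog, String rtprog, String type) {
        return explainBuggy(prog, rtprog, type);
    }

    // One way to break the cycle: delegate to a different overload
    // (here, one taking an extra argument) that does the actual work.
    static String explain(String prog, String rtprog, String type) {
        return explain(prog, rtprog, type, null);
    }

    static String explain(String prog, String rtprog, String type, Object counts) {
        return "EXPLAIN (" + type + "): " + prog + " / " + rtprog;
    }

    public static void main(String[] args) {
        System.out.println(explain("dmlProgram", "runtimeProgram", "HOPS"));
    }
}
{code}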



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Description: 
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 there is a 
{{java.lang.NullPointerException}} error when reshaping the sparse matrix. The 
involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not exist in the {{HashMap rix}}. 

To reproduce the error, the attached Scala file {{MNIST_Distrib_Sgd.scala}} 
can be used to run the distributed MNIST example.

If code is added to ignore the null output matrix block returned by {{MatrixBlock 
out = rix.get(ixtmp)}}, the distributed MNIST example runs in Spark 
mode, but the result may not be correct.


  was:
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 there is a 
{{java.lang.NullPointerException}} error when reshaping the sparse matrix. The 
involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not exist in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  


> Improve the robustness of sparse matrix reshape function for the Spark mode
> ---
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  there is a 
> {{java.lang.NullPointerException}} error when reshaping the sparse matrix. 
> The involved function is 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not exist in the {{HashMap rix}}. 
> To reproduce the error, the attached Scala file {{MNIST_Distrib_Sgd.scala}} 
> can be used to run the distributed MNIST example.  
> If code is added to ignore the null output matrix block returned by 
> {{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example runs 
> in Spark mode, but the result may not be correct.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Description: 
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 there is a 
{code:java}
java.lang.NullPointerException
{code}
 error when reshaping the sparse matrix. The involved function is 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse(org.apache.sysml.runtime.matrix.data.MatrixBlock,
 long, long, 
java.util.HashMap,
 long, long, long, long, int, int, boolean)` . The reason is that the output 
matrixIndex computed by 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex` 
does not exist in the `HashMap rix`. 

To reproduce the error, the attached scala file `MNIST_Distrib_Sgd` could be 
used to run the distributed MNIST example.  

  was:
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 there is a `java.lang.NullPointerException` error when reshaping the sparse 
matrix. The involved function is 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse(org.apache.sysml.runtime.matrix.data.MatrixBlock,
 long, long, 
java.util.HashMap,
 long, long, long, long, int, int, boolean)` . The reason is that the output 
matrixIndex computed by 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex` 
does not exist in the `HashMap rix`. 

To reproduce the error, the attached scala file `MNIST_Distrib_Sgd` could be 
used to run the distributed MNIST example.  


> Improve the robustness of sparse matrix reshape function for the Spark mode
> ---
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  there is a 
> {code:java}
> java.lang.NullPointerException
> {code}
>  error when reshaping the sparse matrix. The involved function is 
> `org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse(org.apache.sysml.runtime.matrix.data.MatrixBlock,
>  long, long, 
> java.util.HashMap,
>  long, long, long, long, int, int, boolean)` . The reason is that the output 
> matrixIndex computed by 
> `org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex` 
> does not exist in the `HashMap rix`. 
> To reproduce the error, the attached scala file `MNIST_Distrib_Sgd` could be 
> used to run the distributed MNIST example.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)
Fei Hu created SYSTEMML-1762:


 Summary: Improve the robustness of sparse matrix reshape function 
for the Spark mode
 Key: SYSTEMML-1762
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
 Project: SystemML
  Issue Type: Bug
  Components: Algorithms, ParFor, Runtime
Reporter: Fei Hu
 Attachments: MNIST_Distrib_Sgd.scala

When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 there is a `java.lang.NullPointerException` error when reshaping the sparse 
matrix. The involved function is 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse(org.apache.sysml.runtime.matrix.data.MatrixBlock,
 long, long, 
java.util.HashMap,
 long, long, long, long, int, int, boolean)`. The reason is that the output 
matrix index computed by 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex` 
does not exist in the `HashMap rix`. 

To reproduce the error, the attached Scala file `MNIST_Distrib_Sgd.scala` can be 
used to run the distributed MNIST example.
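
For illustration only, here is a small self-contained Java sketch of the failure pattern described above: a computed block index that is absent from the map makes the lookup return null, and the later use of that value throws the NullPointerException. Names are hypothetical; this is not the LibMatrixReorg code.

{code}
import java.util.HashMap;

// Hypothetical sketch of the lookup pattern described above (not SystemML code).
public class ReshapeLookupDemo {

    // Simplified stand-in for computeResultBlockIndex: maps a (row, column)
    // block position to a single linearized index.
    static long computeResultBlockIndex(long brow, long bcol, long numColBlocks) {
        return brow * numColBlocks + bcol;
    }

    public static void main(String[] args) {
        HashMap<Long, double[]> rix = new HashMap<>();
        rix.put(0L, new double[]{1, 2, 3}); // only block 0 was pre-allocated

        long ixtmp = computeResultBlockIndex(1, 2, 4); // index 6 is not in the map
        double[] out = rix.get(ixtmp);                 // returns null

        // Without this guard, the next use of 'out' throws a NullPointerException,
        // mirroring the error seen in reshapeSparse. Failing fast with a clear
        // message (or allocating the missing block) is one way to harden the lookup.
        if (out == null)
            throw new IllegalStateException("missing output block for index " + ixtmp);
        System.out.println(out.length);
    }
}
{code}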



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Description: 
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 there is a 
{{java.lang.NullPointerException}} error when reshaping the sparse matrix. The 
involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not exist in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

  was:
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 there is a 
{{java.lang.NullPointerException}} error when reshaping the sparse matrix. The 
involved function is 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse(org.apache.sysml.runtime.matrix.data.MatrixBlock,
 long, long, 
java.util.HashMap,
 long, long, long, long, int, int, boolean)` . The reason is that the output 
matrixIndex computed by 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex` 
does not exist in the `HashMap rix`. 

To reproduce the error, the attached scala file `MNIST_Distrib_Sgd` could be 
used to run the distributed MNIST example.  


> Improve the robustness of sparse matrix reshape function for the Spark mode
> ---
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  there is a 
> {{java.lang.NullPointerException}} error when reshaping the sparse matrix. 
> The involved function is 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not exist in the {{HashMap rix}}. 
> To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
> could be used to run the distributed MNIST example.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084521#comment-16084521
 ] 

Fei Hu commented on SYSTEMML-1762:
--

The error messages are as follows:

{code:java}
17/07/12 12:04:47 ERROR TaskSetManager: Task 1 in stage 177.0 failed 1 times; 
aborting job
17/07/12 12:04:47 INFO TaskSetManager: Lost task 3.0 in stage 177.0 (TID 528) 
on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 
1]
17/07/12 12:04:47 INFO TaskSchedulerImpl: Cancelling stage 177
17/07/12 12:04:47 INFO TaskSchedulerImpl: Stage 177 was cancelled
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 2.0 in stage 
177.0 (TID 527)
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 0.0 in stage 
177.0 (TID 525)
17/07/12 12:04:47 INFO DAGScheduler: ShuffleMapStage 177 (flatMapToPair at 
MatrixReshapeSPInstruction.java:106) failed in 0.016 s due to Job aborted due 
to stage failure: Task 1 in stage 177.0 failed 1 times, most recent failure: 
Lost task 1.0 in stage 177.0 (TID 526, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
17/07/12 12:04:47 INFO DAGScheduler: Job 139 failed: fold at 
RDDAggregateUtils.java:137, took 0.018972 s
17/07/12 12:04:47 INFO Executor: Executor killed task 0.0 in stage 177.0 (TID 
525)
17/07/12 12:04:47 ERROR ParWorker: Failed to execute task (type=SET, 
iterations={[j=3]}), retry:0
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program 
block generated from statement block between lines 0 and 0 -- Error evaluating 
instruction: 
SPARK°uark+°_mVar3618·MATRIX·DOUBLE°_mVar3619·MATRIX·DOUBLE°SINGLE_BLOCK
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:316)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:217)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:163)
at 
org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeSetTask(ParWorker.java:167)
at 
org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeTask(ParWorker.java:136)
at 
org.apache.sysml.runtime.controlprogram.parfor.LocalParWorker.run(LocalParWorker.java:122)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1 in stage 177.0 failed 1 times, most recent failure: Lost task 1.0 in 
stage 177.0 (TID 526, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
 

[jira] [Comment Edited] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084521#comment-16084521
 ] 

Fei Hu edited comment on SYSTEMML-1762 at 7/12/17 7:10 PM:
---

The error messages are as follows:

{code:java}
17/07/12 12:04:47 ERROR TaskSetManager: Task 1 in stage 177.0 failed 1 times; 
aborting job
17/07/12 12:04:47 INFO TaskSetManager: Lost task 3.0 in stage 177.0 (TID 528) 
on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 
1]
17/07/12 12:04:47 INFO TaskSchedulerImpl: Cancelling stage 177
17/07/12 12:04:47 INFO TaskSchedulerImpl: Stage 177 was cancelled
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 2.0 in stage 
177.0 (TID 527)
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 0.0 in stage 
177.0 (TID 525)
17/07/12 12:04:47 INFO DAGScheduler: ShuffleMapStage 177 (flatMapToPair at 
MatrixReshapeSPInstruction.java:106) failed in 0.016 s due to Job aborted due 
to stage failure: Task 1 in stage 177.0 failed 1 times, most recent failure: 
Lost task 1.0 in stage 177.0 (TID 526, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
17/07/12 12:04:47 INFO DAGScheduler: Job 139 failed: fold at 
RDDAggregateUtils.java:137, took 0.018972 s
17/07/12 12:04:47 INFO Executor: Executor killed task 0.0 in stage 177.0 (TID 
525)
17/07/12 12:04:47 ERROR ParWorker: Failed to execute task (type=SET, 
iterations={[j=3]}), retry:0
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program 
block generated from statement block between lines 0 and 0 -- Error evaluating 
instruction: 
SPARK°uark+°_mVar3618·MATRIX·DOUBLE°_mVar3619·MATRIX·DOUBLE°SINGLE_BLOCK
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:316)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:217)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:163)
at 
org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeSetTask(ParWorker.java:167)
at 
org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeTask(ParWorker.java:136)
at 
org.apache.sysml.runtime.controlprogram.parfor.LocalParWorker.run(LocalParWorker.java:122)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1 in stage 177.0 failed 1 times, most recent failure: Lost task 1.0 in 
stage 177.0 (TID 526, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.It

[jira] [Assigned] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu reassigned SYSTEMML-1762:


Assignee: Fei Hu

> Improve the robustness of sparse matrix reshape function for the Spark mode
> ---
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  there is a 
> {{java.lang.NullPointerException}} error when reshaping the sparse matrix. 
> The involved function is 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not exist in the {{HashMap rix}}. 
> To reproduce the error, the attached Scala file {{MNIST_Distrib_Sgd.scala}} 
> can be used to run the distributed MNIST example.  
> If code is added to ignore the null output matrix block returned by 
> {{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example runs 
> in Spark mode, but the result may not be correct.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Description: 
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 there is a 
{{java.lang.NullPointerException}} error when reshaping the sparse matrix. The 
involved function is 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse(org.apache.sysml.runtime.matrix.data.MatrixBlock,
 long, long, 
java.util.HashMap,
 long, long, long, long, int, int, boolean)` . The reason is that the output 
matrixIndex computed by 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex` 
does not exist in the `HashMap rix`. 

To reproduce the error, the attached scala file `MNIST_Distrib_Sgd` could be 
used to run the distributed MNIST example.  

  was:
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 there is a 
{code:java}
java.lang.NullPointerException
{code}
 error when reshaping the sparse matrix. The involved function is 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse(org.apache.sysml.runtime.matrix.data.MatrixBlock,
 long, long, 
java.util.HashMap,
 long, long, long, long, int, int, boolean)` . The reason is that the output 
matrixIndex computed by 
`org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex` 
does not exist in the `HashMap rix`. 

To reproduce the error, the attached scala file `MNIST_Distrib_Sgd` could be 
used to run the distributed MNIST example.  


> Improve the robustness of sparse matrix reshape function for the Spark mode
> ---
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  there is a 
> {{java.lang.NullPointerException}} error when reshaping the sparse matrix. 
> The involved function is 
> `org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse(org.apache.sysml.runtime.matrix.data.MatrixBlock,
>  long, long, 
> java.util.HashMap,
>  long, long, long, long, int, int, boolean)` . The reason is that the output 
> matrixIndex computed by 
> `org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex` 
> does not exist in the `HashMap rix`. 
> To reproduce the error, the attached scala file `MNIST_Distrib_Sgd` could be 
> used to run the distributed MNIST example.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1763) Fix Explain countCompiledInstructions for CP

2017-07-12 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-1763:


 Summary: Fix Explain countCompiledInstructions for CP
 Key: SYSTEMML-1763
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1763
 Project: SystemML
  Issue Type: Bug
Reporter: Deron Eriksson
Assignee: Deron Eriksson
Priority: Minor


In the countCompiledInstructions method of the Explain class, counts.numCPInst 
should be incremented based on the boolean CP parameter, not the boolean 
SP parameter.

{code}
private static int countCompiledInstructions( ArrayList<Instruction> instSet, ExplainCounts counts, boolean MR, boolean CP, boolean SP )
{
	int ret = 0;

	for( Instruction inst : instSet )
	{
		if( MR && inst instanceof MRJobInstruction ) 
			counts.numJobs++;
		else if( SP && inst instanceof CPInstruction )
			counts.numCPInst++;
		else if( SP && inst instanceof SPInstruction )
			counts.numJobs++;

		//keep track of reblocks (in order to prevent unnecessary spark context creation)
		if( SP && (inst instanceof CSVReblockSPInstruction || inst instanceof ReblockSPInstruction) )
			counts.numReblocks++;
	}

	return ret;
}
{code}

Also, the return value is never used, so the method return type should be 
changed to void and ret should be removed.
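
For reference, a sketch of how the corrected method might look based on the description above; type and field names are taken from the snippet, and this is not necessarily the committed fix.

{code}
// Sketch of the described fix: guard the CP-instruction count with the CP flag
// and drop the unused return value.
private static void countCompiledInstructions( ArrayList<Instruction> instSet, ExplainCounts counts, boolean MR, boolean CP, boolean SP )
{
	for( Instruction inst : instSet )
	{
		if( MR && inst instanceof MRJobInstruction ) 
			counts.numJobs++;
		else if( CP && inst instanceof CPInstruction )   // was: SP && ...
			counts.numCPInst++;
		else if( SP && inst instanceof SPInstruction )
			counts.numJobs++;

		//keep track of reblocks (in order to prevent unnecessary spark context creation)
		if( SP && (inst instanceof CSVReblockSPInstruction || inst instanceof ReblockSPInstruction) )
			counts.numReblocks++;
	}
}
{code}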




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1185) SystemML Breast Cancer Project

2017-07-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1185:
--
Issue Type: Epic  (was: New Feature)

> SystemML Breast Cancer Project
> --
>
> Key: SYSTEMML-1185
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1185
> Project: SystemML
>  Issue Type: Epic
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
> Attachments: approach.svg
>
>
> h1. Predicting Breast Cancer Proliferation Scores with Apache Spark and 
> Apache SystemML
> h3. Overview
> The [Tumor Proliferation Assessment Challenge 2016 (TUPAC16) | 
> http://tupac.tue-image.nl/] is a "Grand Challenge" that was created for the 
> [2016 Medical Image Computing and Computer Assisted Intervention (MICCAI 
> 2016) | http://miccai2016.org/en/] conference.  In this challenge, the goal 
> is to develop state-of-the-art algorithms for automatic prediction of tumor 
> proliferation scores from whole-slide histopathology images of breast tumors.
> h3. Background
> Breast cancer is the leading cause of cancerous death in women in 
> less-developed countries, and is the second leading cause of cancerous deaths 
> in developed countries, accounting for 29% of all cancers in women within the 
> U.S. \[1]. Survival rates increase as early detection increases, giving 
> incentive for pathologists and the medical world at large to develop improved 
> methods for even earlier detection \[2].  There are many forms of breast 
> cancer including Ductal Carcinoma in Situ (DCIS), Invasive Ductal Carcinoma 
> (IDC), Tubular Carcinoma of the Breast, Medullary Carcinoma of the Breast, 
> Invasive Lobular Carcinoma, Inflammatory Breast Cancer and several others 
> \[3]. Within all of these forms of breast cancer, the rate at which breast 
> cancer cells grow (proliferation) is a strong indicator of a patient’s 
> prognosis. Although there are many means of determining the presence of 
> breast cancer, tumor proliferation speed has been proven to help pathologists 
> determine the treatment for the patient. The most common technique for 
> determining the proliferation speed is through mitotic count (mitotic index) 
> estimates, in which a pathologist counts the dividing cell nuclei in 
> hematoxylin and eosin (H&E) stained slide preparations to determine the 
> number of mitotic bodies.  Given this, the pathologist produces a 
> proliferation score of either 1, 2, or 3, ranging from better to worse 
> prognosis \[4]. Unfortunately, this approach is known to have reproducibility 
> problems due to the variability in counting, as well as the difficulty in 
> distinguishing between different grades.
> References:  
> \[1] http://emedicine.medscape.com/article/1947145-overview#a3  
> \[2] http://emedicine.medscape.com/article/1947145-overview#a7  
> \[3] http://emedicine.medscape.com/article/1954658-overview  
> \[4] http://emedicine.medscape.com/article/1947145-workup#c12  
> h3. Goal & Approach
> In an effort to automate the process of classification, this project aims to 
> develop a large-scale deep learning approach for predicting tumor scores 
> directly from the pixels of whole-slide histopathology images.  Our proposed 
> approach is based on a recent research paper from Stanford \[1].  Starting 
> with 500 extremely high-resolution tumor slide images with accompanying score 
> labels, we aim to make use of Apache Spark in a preprocessing step to cut and 
> filter the images into smaller square samples, generating 4.7 million samples 
> for a total of ~7TB of data \[2].  We then utilize Apache SystemML on top of 
> Spark to develop and train a custom, large-scale, deep convolutional neural 
> network on these samples, making use of the familiar linear algebra syntax 
> and automatically-distributed execution of SystemML \[3].  Our model takes as 
> input the pixel values of the individual samples, and is trained to predict 
> the correct tumor score classification for each one.  In addition to 
> distributed linear algebra, we aim to exploit task-parallelism via parallel 
> for-loops for hyperparameter optimization, as well as hardware acceleration 
> for faster training via a GPU-backed runtime.  Ultimately, we aim to develop 
> a model that is sufficiently stronger than existing approaches for the task 
> of breast cancer tumor proliferation score classification.
> References:  
> \[1] https://web.stanford.edu/group/rubinlab/pubs/2243353.pdf  
> \[2] See [{{Preprocessing.ipynb}} | 
> https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/Preprocessing.ipynb].
>   
> \[3] See [{{MachineLearning.ipynb}} | 
> https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/MachineLearning.ipynb],
>  [{{softmax_clf.dml}} | 
> https://github.com/apache/incubator-systemm

[jira] [Resolved] (SYSTEMML-1756) Potential infinite recursion in Explain#explain(DMLProgram, Program, ExplainType)

2017-07-12 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1756.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Fixed by [PR568|https://github.com/apache/systemml/pull/568].

> Potential infinite recursion in Explain#explain(DMLProgram, Program, 
> ExplainType)
> -
>
> Key: SYSTEMML-1756
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1756
> Project: SystemML
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: SystemML 1.0
>
>
> Here is related code:
> {code}
> public static String explain(DMLProgram prog, Program rtprog, 
> ExplainType type)
> throws HopsException, DMLRuntimeException, LanguageException {
> return explain(prog, rtprog, type);
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1756) Potential infinite recursion in Explain#explain(DMLProgram, Program, ExplainType)

2017-07-12 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1756.


> Potential infinite recursion in Explain#explain(DMLProgram, Program, 
> ExplainType)
> -
>
> Key: SYSTEMML-1756
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1756
> Project: SystemML
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: SystemML 1.0
>
>
> Here is related code:
> {code}
> public static String explain(DMLProgram prog, Program rtprog, 
> ExplainType type)
> throws HopsException, DMLRuntimeException, LanguageException {
> return explain(prog, rtprog, type);
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Description: 
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there is a 
{{java.lang.NullPointerException}} error when reshaping the sparse matrix. The 
involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not exist in the {{HashMap rix}}. 

To reproduce the error, the attached Scala file {{MNIST_Distrib_Sgd.scala}} 
can be used to run the distributed MNIST example.

In addition, if code is added to ignore the null output matrix block returned by 
{{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example runs 
in Spark mode, but the result may not be correct.


  was:
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there is a 
{{java.lang.NullPointerException}} error when reshaping the sparse matrix. The 
involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not exist in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

If adding some codes to ignore the null output matrix block from {{MatrixBlock 
out = rix.get(ixtmp)}},  the distributed MNIST example could run in the Spark 
mode, but the result may not be right. 



> Improve the robustness of sparse matrix reshape function for the Spark mode
> ---
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there is a 
> {{java.lang.NullPointerException}} error when reshaping the sparse matrix. 
> The involved function is 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not exist in the {{HashMap rix}}. 
> To reproduce the error, the attached Scala file {{MNIST_Distrib_Sgd.scala}} 
> can be used to run the distributed MNIST example.  
> In addition, if code is added to ignore the null output matrix block returned by 
> {{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example runs 
> in Spark mode, but the result may not be correct.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Description: 
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there is a 
{{java.lang.NullPointerException}} error when reshaping the sparse matrix. The 
involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not exist in the {{HashMap rix}}. 

To reproduce the error, the attached Scala file {{MNIST_Distrib_Sgd.scala}} 
can be used to run the distributed MNIST example.

If code is added to ignore the null output matrix block returned by {{MatrixBlock 
out = rix.get(ixtmp)}}, the distributed MNIST example runs in Spark 
mode, but the result may not be correct.


  was:
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 there is a 
{{java.lang.NullPointerException}} error when reshaping the sparse matrix. The 
involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not exist in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

If adding some codes to ignore the null output matrix block from {{MatrixBlock 
out = rix.get(ixtmp)}},  the distributed MNIST example could run in the Spark 
mode, but the result may not be right. 



> Improve the robustness of sparse matrix reshape function for the Spark mode
> ---
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there is a 
> {{java.lang.NullPointerException}} error when reshaping the sparse matrix. 
> The involved function is 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not exist in the {{HashMap rix}}. 
> To reproduce the error, the attached Scala file {{MNIST_Distrib_Sgd.scala}} 
> can be used to run the distributed MNIST example.  
> If code is added to ignore the null output matrix block returned by 
> {{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example runs 
> in Spark mode, but the result may not be correct.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1764) Fix cbind value in AppendGAlignedSP constructor

2017-07-12 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-1764:


 Summary: Fix cbind value in AppendGAlignedSP constructor
 Key: SYSTEMML-1764
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1764
 Project: SystemML
  Issue Type: Bug
Reporter: Deron Eriksson
Assignee: Deron Eriksson
Priority: Minor


The _cbind field in AppendGAlignedSP is hardcoded to true in the constructor 
rather than being set by the cbind parameter.

{code}
public AppendGAlignedSP(Lop input1, Lop input2, Lop input3, DataType dt, ValueType vt, boolean cbind) 
{
	super(Lop.Type.Append, dt, vt); 
	init(input1, input2, input3, dt, vt);

	_cbind = true;
}
{code}
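
For reference, a sketch of the constructor with the assignment described above; it mirrors the snippet and is not necessarily the committed fix.

{code}
public AppendGAlignedSP(Lop input1, Lop input2, Lop input3, DataType dt, ValueType vt, boolean cbind) 
{
	super(Lop.Type.Append, dt, vt); 
	init(input1, input2, input3, dt, vt);

	_cbind = cbind; // was hardcoded to true
}
{code}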




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1765) Reading of dml scripts from object stores (main, mlcontext)

2017-07-12 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1765:


 Summary: Reading of dml scripts from object stores (main, 
mlcontext)
 Key: SYSTEMML-1765
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1765
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1766) Move experimental breast cancer project code into main repo

2017-07-12 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-1766:
-

 Summary: Move experimental breast cancer project code into main 
repo
 Key: SYSTEMML-1766
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1766
 Project: SystemML
  Issue Type: New Feature
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry


This task aims to consolidate and clean up the experimental breast cancer 
project code in the main repo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1319) Statistical estimates over compressed matrix blocks

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1319.
--
Resolution: Done
  Assignee: Matthias Boehm

> Statistical estimates over compressed matrix blocks
> ---
>
> Key: SYSTEMML-1319
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1319
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Statistical estimates like moment, cov, aggregate, table, median, and 
> quantiles can be efficiently computed over compressed matrix blocks by 
> mapping distinct items + counts to weighted statistical estimates.
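
As a minimal, hypothetical illustration of the idea (plain Java, not SystemML code): with a compressed column represented as distinct values plus occurrence counts, aggregates such as the mean and the second central moment can be computed directly from the (value, count) pairs instead of decompressing every cell.

{code}
// Hypothetical illustration: weighted statistics from distinct values + counts.
public class WeightedMomentDemo {
    public static void main(String[] args) {
        double[] distinct = {1.0, 3.0, 7.0}; // distinct values of a compressed column
        long[]   counts   = {4,   2,   1};   // how often each value occurs

        long n = 0;
        double sum = 0;
        for (int i = 0; i < distinct.length; i++) {
            n   += counts[i];
            sum += counts[i] * distinct[i];
        }
        double mean = sum / n;

        // second central moment as a weighted sum over the distinct values
        double m2 = 0;
        for (int i = 0; i < distinct.length; i++) {
            double d = distinct[i] - mean;
            m2 += counts[i] * d * d;
        }
        System.out.println("mean=" + mean + ", centralMoment2=" + (m2 / n));
    }
}
{code}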



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1538) Improved dynamic recompilation (size update after rewrites)

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1538.
--
Resolution: Done
  Assignee: Matthias Boehm

> Improved dynamic recompilation (size update after rewrites)
> ---
>
> Key: SYSTEMML-1538
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1538
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Dynamic recompilation currently first updates matrix characteristics and 
> subsequently applies dynamic rewrites and operator selection, which depend on 
> the updated statistics. However, there are various scenarios where applied 
> rewrites simplify the propagation of statistics. Hence, we should 
> additionally update statistics after rewrites in order to increase the 
> potential of subsequent operator selection and code generation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1555) Decouple literal replacement from in-place recompilation

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1555.


> Decouple literal replacement from in-place recompilation
> 
>
> Key: SYSTEMML-1555
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1555
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> The current literal replacement framework contains basic scalar literal 
> replacement as well as the replacement of small matrix operations with their 
> literal results. If this framework is invoked with temporary matrix objects 
> created during size propagation, any matrix operation would obviously fail. So 
> far, this has created no problems because literal replacement was tied to 
> recompilations that are not in-place, i.e., recompilations that create a deep 
> copy of the hop dag, which in turn only happens for single-dag recompilations.
> This task aims to decouple the literal replacement from in-place 
> recompilations in order to increase the literal replacement potential and 
> allow for a more flexible use of this literal replacement framework.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1289) Support compressed matrix blocks

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1289.
--
Resolution: Done
  Assignee: Matthias Boehm

> Support compressed matrix blocks
> 
>
> Key: SYSTEMML-1289
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1289
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support all fused operator templates over compressed matrix 
> blocks, without decompression.
> 1) Cellwise and multi-aggregate operator templates (column-wise processing)
> 2) Row-wise operator templates (row decompression)
> 3) Outer-product operator templates (column-wise processing)
> 4) Exploitation of distinct tuples whenever safe to do so.
> 5) Side input handling with partial decompression (e.g., leverage random 
> access of DDC groups) 
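A toy sketch of point 4 above, assuming only an array of distinct values and their
counts (not the actual column-group classes): a cell-wise operation followed by a
full aggregate needs to be evaluated only once per distinct value.

{code:java}
// Toy sketch, not the actual column-group classes: sum(X^2) over a compressed
// column, evaluated once per distinct value and scaled by its count.
import java.util.function.DoubleUnaryOperator;

public class DistinctTupleAggSketch {
    static double cellwiseSum(double[] distinct, long[] counts, DoubleUnaryOperator op) {
        double agg = 0;
        for (int i = 0; i < distinct.length; i++)
            agg += op.applyAsDouble(distinct[i]) * counts[i]; // one op evaluation per distinct value
        return agg;
    }

    public static void main(String[] args) {
        double[] vals = {0.0, 1.0, 3.0};
        long[] cnts = {90, 8, 2};                                 // 100 cells, 3 distinct values
        System.out.println(cellwiseSum(vals, cnts, x -> x * x)); // prints 26.0
    }
}
{code}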



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1289) Support compressed matrix blocks

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1289.


> Support compressed matrix blocks
> 
>
> Key: SYSTEMML-1289
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1289
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support all fused operator templates over compressed matrix 
> blocks, without decompression.
> 1) Cellwise and multi-aggregate operator templates (column-wise processing)
> 2) Row-wise operator templates (row decompression)
> 3) Outer-product operator templates (column-wise processing)
> 4) Exploitation of distinct tuples whenever safe to do so.
> 5) Side input handling with partial decompression (e.g., leverage random 
> access of DDC groups) 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1538) Improved dynamic recompilation (size update after rewrites)

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1538.


> Improved dynamic recompilation (size update after rewrites)
> ---
>
> Key: SYSTEMML-1538
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1538
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Dynamic recompilation currently first updates matrix characteristics and 
> subsequently applies dynamic rewrites and operator selection, which depend on 
> the updated stats. However, there are various scenarios where applied 
> rewrites simplify the propagation of statistics. Hence, we should 
> additionally update statistics after rewrites in order to increase the 
> potential of subsequent operator selection and code generation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1555) Decouple literal replacement from in-place recompilation

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1555.
--
Resolution: Done
  Assignee: Matthias Boehm

> Decouple literal replacement from in-place recompilation
> 
>
> Key: SYSTEMML-1555
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1555
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> The current literal replacement framework covers basic scalar literal 
> replacement as well as the replacement of small matrix operations with their 
> literal results. If this framework is invoked with temporary matrix objects 
> created during size propagation, any matrix operation would obviously fail. So 
> far, this has created no problems because literal replacement was tied to 
> recompilations that are not in-place, i.e., recompilations that create a deep 
> copy of the hop dag, which in turn only happens for single-dag recompilations.
> This task aims to decouple literal replacement from in-place 
> recompilations in order to increase the literal replacement potential and 
> allow for a more flexible use of the framework.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1766) Move experimental breast cancer project code into main repo

2017-07-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1766:
--
Sprint: Sprint 2

> Move experimental breast cancer project code into main repo
> ---
>
> Key: SYSTEMML-1766
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1766
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>
> This aims to consolidate and clean up experimental breast cancer project code 
> in the main repo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084943#comment-16084943
 ] 

Mike Dusenberry commented on SYSTEMML-1762:
---

cc [~mboehm7]

> Improve the robustness of sparse matrix reshape function for the Spark mode
> ---
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there is a 
> {{java.lang.NullPointerException}} error when reshaping the sparse matrix. 
> The involved function is 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not exist in the {{HashMap rix}}. 
> To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
> could be used to run the distributed MNIST example.  
> In addition, if adding some codes to ignore the null output matrix block from 
> {{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could 
> run in the Spark mode, but the result may not be right. 
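For context, a toy sketch (not SystemML code) of the index mapping behind a blocked
reshape: a cell's row-major offset is preserved and its destination block is derived
from the new coordinates; a mismatch between such a computed block index and the set
of pre-created output blocks is what surfaces as the NullPointerException above.

{code:java}
// Toy sketch of blocked reshape index mapping; not the actual LibMatrixReorg code.
public class ReshapeIndexSketch {
    static long[] mapCell(long r, long c, long colsIn, long colsOut, int blen) {
        long linear = r * colsIn + c;          // row-major offset in the old shape
        long rOut = linear / colsOut;          // new row index
        long cOut = linear % colsOut;          // new column index
        long blockRow = rOut / blen;           // destination block coordinates
        long blockCol = cOut / blen;
        return new long[] { rOut, cOut, blockRow, blockCol };
    }

    public static void main(String[] args) {
        // cell (2, 5) of a 4 x 8 matrix, reshaped to 8 x 4, with block size 1000
        long[] m = mapCell(2, 5, 8, 4, 1000);
        System.out.printf("cell -> (%d,%d), block (%d,%d)%n", m[0], m[1], m[2], m[3]);
    }
}
{code}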



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Description: 
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there are some errors 
about
{{java.lang.NullPointerException}}  and 
{{java.lang.ArrayIndexOutOfBoundsException: 1000}}when reshaping the sparse 
matrix. The involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not match the keys in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

In addition, if adding some codes to ignore the null output matrix block from 
{{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could run 
in the Spark mode, but the result may not be right. 


  was:
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there is a 
{{java.lang.NullPointerException}} error when reshaping the sparse matrix. The 
involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not exist in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

In addition, if adding some codes to ignore the null output matrix block from 
{{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could run 
in the Spark mode, but the result may not be right. 



> Improve the robustness of sparse matrix reshape function for the Spark mode
> ---
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there are some 
> errors about
> {{java.lang.NullPointerException}}  and 
> {{java.lang.ArrayIndexOutOfBoundsException: 1000}}when reshaping the sparse 
> matrix. The involved function is 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not match the keys in the {{HashMap rix}}. 
> To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
> could be used to run the distributed MNIST example.  
> In addition, if adding some codes to ignore the null output matrix block from 
> {{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could 
> run in the Spark mode, but the result may not be right. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1767) Performance issues codegen rowwise (column aggregation) w/ wide matrices

2017-07-12 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1767:


 Summary: Performance issues codegen rowwise (column aggregation) 
w/ wide matrices
 Key: SYSTEMML-1767
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1767
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


In scenarios with wide matrices of millions of features, the codegen rowwise 
template shows performance issues due to unnecessary multi-threading: each 
thread requires additional memory for partial aggregation, which leads to 
cache thrashing. Similar to the mmchain operator, we should establish a 
threshold for the maximum size of temporary results and fall back to 
sequential operations if this threshold is exceeded.
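A hypothetical sketch of such a safeguard (names and the threshold value are
illustrative only, not the actual codegen implementation): estimate the per-thread
memory for partial column aggregates and fall back to a single thread when it
exceeds a budget.

{code:java}
// Hypothetical sketch: choose the degree of parallelism for a column-aggregating
// row template based on the memory needed for per-thread partial aggregates.
public class RowTemplateParallelismSketch {
    static final long MAX_PARTIAL_AGG_BYTES = 32L * 1024 * 1024; // assumed budget

    static int chooseDegreeOfParallelism(long numCols, int maxThreads) {
        // each thread keeps a dense partial aggregate of one double per column
        long totalBytes = numCols * 8L * maxThreads;
        return (totalBytes > MAX_PARTIAL_AGG_BYTES) ? 1 : maxThreads;
    }

    public static void main(String[] args) {
        System.out.println(chooseDegreeOfParallelism(10_000_000L, 16)); // wide matrix  -> 1
        System.out.println(chooseDegreeOfParallelism(1_000L, 16));      // narrow matrix -> 16
    }
}
{code}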



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Fix the sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Summary: Fix the sparse matrix reshape function for the Spark mode  (was: 
Improve the robustness of sparse matrix reshape function for the Spark mode)

> Fix the sparse matrix reshape function for the Spark mode
> -
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there are some 
> errors about
> {{java.lang.NullPointerException}}  and 
> {{java.lang.ArrayIndexOutOfBoundsException: 1000}}when reshaping the sparse 
> matrix. The involved function is 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not match the keys in the {{HashMap rix}}. 
> To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
> could be used to run the distributed MNIST example.  
> In addition, if adding some codes to ignore the null output matrix block from 
> {{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could 
> run in the Spark mode, but the result may not be right. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Fix the sparse matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Description: 
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there are some errors 
about
{{java.lang.NullPointerException}}  and 
{{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the sparse 
matrix. The involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not match the keys in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

In addition, if adding some codes to ignore the null output matrix block from 
{{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could run 
in the Spark mode, but the result may not be right. 


  was:
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there are some errors 
about
{{java.lang.NullPointerException}}  and 
{{java.lang.ArrayIndexOutOfBoundsException: 1000}}when reshaping the sparse 
matrix. The involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not match the keys in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

In addition, if adding some codes to ignore the null output matrix block from 
{{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could run 
in the Spark mode, but the result may not be right. 



> Fix the sparse matrix reshape function for the Spark mode
> -
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there are some 
> errors about
> {{java.lang.NullPointerException}}  and 
> {{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the sparse 
> matrix. The involved function is 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not match the keys in the {{HashMap rix}}. 
> To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
> could be used to run the distributed MNIST example.  
> In addition, if adding some codes to ignore the null output matrix block from 
> {{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could 
> run in the Spark mode, but the result may not be right. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1762) Fix the matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084968#comment-16084968
 ] 

Fei Hu commented on SYSTEMML-1762:
--

When setting the training parameters as follows:

{code:java}
val N = 64
val Nval = 64
val Ntest = 1
val C = 3
val Hin = 224
val Win = 224
val K = 10
val batchSize = 32
val paralellBatches = 4
val epochs = 1
{code}

the errors come from the dense matrix as follows:

{code:java}
17/07/12 17:20:40 INFO DAGScheduler: ShuffleMapStage 111 (flatMapToPair at 
MatrixReshapeSPInstruction.java:106) failed in 0.290 s due to Job aborted due 
to stage failure: Task 3 in stage 111.0 failed 1 times, most recent failure: 
Lost task 3.0 in stage 111.0 (TID 331, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeDense(LibMatrixReorg.java:1550)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:506)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}







> Fix the matrix reshape function for the Spark mode
> --
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there are some 
> errors about
> {{java.lang.NullPointerException}}  and 
> {{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the matrix. 
> The involved functions are 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} and 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeDense}}. The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not match the keys in the {{HashMap rix}}. 
> To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
> could be used to run the distributed MNIST example.  
> In addition, if adding some codes to ignore the null output matrix block from 
> {{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could 
> run in the Spark mode, but the result may not be right. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (SYSTEMML-1762) Fix the matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084968#comment-16084968
 ] 

Fei Hu edited comment on SYSTEMML-1762 at 7/13/17 12:31 AM:


When setting the training parameters as follows:

{code:java}
val N = 64
val Nval = 64
val Ntest = 1
val C = 3
val Hin = 224
val Win = 224
val K = 10
val batchSize = 32
val paralellBatches = 4
val epochs = 1
{code}

the errors come from the dense matrix reshape as follows:

{code:java}
17/07/12 17:20:40 INFO DAGScheduler: ShuffleMapStage 111 (flatMapToPair at 
MatrixReshapeSPInstruction.java:106) failed in 0.290 s due to Job aborted due 
to stage failure: Task 3 in stage 111.0 failed 1 times, most recent failure: 
Lost task 3.0 in stage 111.0 (TID 331, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeDense(LibMatrixReorg.java:1550)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:506)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}








was (Author: tenma):
When setting the training parameters as follows:

{code:java}
val N = 64
val Nval = 64
val Ntest = 1
val C = 3
val Hin = 224
val Win = 224
val K = 10
val batchSize = 32
val paralellBatches = 4
val epochs = 1
{code}

the errors come from the dense matrix as follows:

{code:java}
17/07/12 17:20:40 INFO DAGScheduler: ShuffleMapStage 111 (flatMapToPair at 
MatrixReshapeSPInstruction.java:106) failed in 0.290 s due to Job aborted due 
to stage failure: Task 3 in stage 111.0 failed 1 times, most recent failure: 
Lost task 3.0 in stage 111.0 (TID 331, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeDense(LibMatrixReorg.java:1550)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:506)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}







> Fix the matrix reshape function for the Spark mode
> --
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse

[jira] [Comment Edited] (SYSTEMML-1762) Fix the matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084521#comment-16084521
 ] 

Fei Hu edited comment on SYSTEMML-1762 at 7/13/17 12:31 AM:


When setting the training parameters as follows:
{code:java}
  val N = 1
val Nval = 1
val Ntest = 1
val C = 3
val Hin = 224
val Win = 224
val K = 10
val batchSize = 32
val paralellBatches = 4
val epochs = 1
{code}

the errors come from the sparse matrix reshape, and the error messages are as 
follows:

{code:java}
17/07/12 12:04:47 ERROR TaskSetManager: Task 1 in stage 177.0 failed 1 times; 
aborting job
17/07/12 12:04:47 INFO TaskSetManager: Lost task 3.0 in stage 177.0 (TID 528) 
on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 
1]
17/07/12 12:04:47 INFO TaskSchedulerImpl: Cancelling stage 177
17/07/12 12:04:47 INFO TaskSchedulerImpl: Stage 177 was cancelled
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 2.0 in stage 
177.0 (TID 527)
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 0.0 in stage 
177.0 (TID 525)
17/07/12 12:04:47 INFO DAGScheduler: ShuffleMapStage 177 (flatMapToPair at 
MatrixReshapeSPInstruction.java:106) failed in 0.016 s due to Job aborted due 
to stage failure: Task 1 in stage 177.0 failed 1 times, most recent failure: 
Lost task 1.0 in stage 177.0 (TID 526, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
17/07/12 12:04:47 INFO DAGScheduler: Job 139 failed: fold at 
RDDAggregateUtils.java:137, took 0.018972 s
17/07/12 12:04:47 INFO Executor: Executor killed task 0.0 in stage 177.0 (TID 
525)
17/07/12 12:04:47 ERROR ParWorker: Failed to execute task (type=SET, 
iterations={[j=3]}), retry:0
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program 
block generated from statement block between lines 0 and 0 -- Error evaluating 
instruction: 
SPARK°uark+°_mVar3618·MATRIX·DOUBLE°_mVar3619·MATRIX·DOUBLE°SINGLE_BLOCK
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:316)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:217)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:163)
at 
org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeSetTask(ParWorker.java:167)
at 
org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeTask(ParWorker.java:136)
at 
org.apache.sysml.runtime.controlprogram.parfor.LocalParWorker.run(LocalParWorker.java:122)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1 in stage 177.0 failed 1 times, most recent failure: Lost task 1.0 in 
stage 177.0 (TID 526, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixRes

[jira] [Updated] (SYSTEMML-1762) Fix the matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Description: 
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there are some errors 
about
{{java.lang.NullPointerException}}  and 
{{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the matrix. 
The involved functions are 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} and 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeDense(org.apache.sysml.runtime.matrix.data.MatrixBlock,
 long, long, 
java.util.HashMap,
 long, long, long, long, int, int, boolean)}}. The reason is that the output 
matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not match the keys in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

In addition, if adding some codes to ignore the null output matrix block from 
{{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could run 
in the Spark mode, but the result may not be right. 


  was:
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there are some errors 
about
{{java.lang.NullPointerException}}  and 
{{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the sparse 
matrix. The involved function is 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not match the keys in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

In addition, if adding some codes to ignore the null output matrix block from 
{{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could run 
in the Spark mode, but the result may not be right. 



> Fix the matrix reshape function for the Spark mode
> --
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there are some 
> errors about
> {{java.lang.NullPointerException}}  and 
> {{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the matrix. 
> The involved functions are 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} and 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeDense(org.apache.sysml.runtime.matrix.data.MatrixBlock,
>  long, long, 
> java.util.HashMap,
>  long, long, long, long, int, int, boolean)}}. The reason is that the output 
> matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not match the keys in the {{HashMap rix}}. 
> To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
> could be used to run the distributed MNIST example.  
> In addition, if adding some codes to ignore the null output matrix block from 
> {{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could 
> run in the Spark mode, but the result may not be right. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Fix the matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Summary: Fix the matrix reshape function for the Spark mode  (was: Fix the 
sparse matrix reshape function for the Spark mode)

> Fix the matrix reshape function for the Spark mode
> --
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there are some 
> errors about
> {{java.lang.NullPointerException}}  and 
> {{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the sparse 
> matrix. The involved function is 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} . The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not match the keys in the {{HashMap rix}}. 
> To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
> could be used to run the distributed MNIST example.  
> In addition, if adding some codes to ignore the null output matrix block from 
> {{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could 
> run in the Spark mode, but the result may not be right. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1762) Fix the matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Description: 
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there are some errors 
about
{{java.lang.NullPointerException}}  and 
{{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the matrix. 
The involved functions are 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} and 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeDense}}. The 
reason is that the output matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not match the keys in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

In addition, if adding some codes to ignore the null output matrix block from 
{{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could run 
in the Spark mode, but the result may not be right. 


  was:
When running the [distributed MNIST LeNet example | 
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
 it works well in the hybrid mode. But in the Spark mode, there are some errors 
about
{{java.lang.NullPointerException}}  and 
{{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the matrix. 
The involved functions are 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} and 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeDense(org.apache.sysml.runtime.matrix.data.MatrixBlock,
 long, long, 
java.util.HashMap,
 long, long, long, long, int, int, boolean)}}. The reason is that the output 
matrix index computed by 
{{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} 
does not match the keys in the {{HashMap rix}}. 

To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
could be used to run the distributed MNIST example.  

In addition, if adding some codes to ignore the null output matrix block from 
{{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could run 
in the Spark mode, but the result may not be right. 



> Fix the matrix reshape function for the Spark mode
> --
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there are some 
> errors about
> {{java.lang.NullPointerException}}  and 
> {{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the matrix. 
> The involved functions are 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} and 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeDense}}. The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not match the keys in the {{HashMap rix}}. 
> To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
> could be used to run the distributed MNIST example.  
> In addition, if adding some codes to ignore the null output matrix block from 
> {{MatrixBlock out = rix.get(ixtmp)}},  the distributed MNIST example could 
> run in the Spark mode, but the result may not be right. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (SYSTEMML-1762) Fix the matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084521#comment-16084521
 ] 

Fei Hu edited comment on SYSTEMML-1762 at 7/13/17 12:28 AM:


When setting the training parameters as follows:
{code:java}
  val N = 1
val Nval = 1
val Ntest = 1
val C = 3
val Hin = 224
val Win = 224
val K = 10
val batchSize = 32
val paralellBatches = 4
val epochs = 1
{code}

the error messages are as follows:

{code:java}
17/07/12 12:04:47 ERROR TaskSetManager: Task 1 in stage 177.0 failed 1 times; 
aborting job
17/07/12 12:04:47 INFO TaskSetManager: Lost task 3.0 in stage 177.0 (TID 528) 
on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 
1]
17/07/12 12:04:47 INFO TaskSchedulerImpl: Cancelling stage 177
17/07/12 12:04:47 INFO TaskSchedulerImpl: Stage 177 was cancelled
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 2.0 in stage 
177.0 (TID 527)
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 0.0 in stage 
177.0 (TID 525)
17/07/12 12:04:47 INFO DAGScheduler: ShuffleMapStage 177 (flatMapToPair at 
MatrixReshapeSPInstruction.java:106) failed in 0.016 s due to Job aborted due 
to stage failure: Task 1 in stage 177.0 failed 1 times, most recent failure: 
Lost task 1.0 in stage 177.0 (TID 526, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
17/07/12 12:04:47 INFO DAGScheduler: Job 139 failed: fold at 
RDDAggregateUtils.java:137, took 0.018972 s
17/07/12 12:04:47 INFO Executor: Executor killed task 0.0 in stage 177.0 (TID 
525)
17/07/12 12:04:47 ERROR ParWorker: Failed to execute task (type=SET, 
iterations={[j=3]}), retry:0
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program 
block generated from statement block between lines 0 and 0 -- Error evaluating 
instruction: 
SPARK°uark+°_mVar3618·MATRIX·DOUBLE°_mVar3619·MATRIX·DOUBLE°SINGLE_BLOCK
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:316)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:217)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:163)
at 
org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeSetTask(ParWorker.java:167)
at 
org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeTask(ParWorker.java:136)
at 
org.apache.sysml.runtime.controlprogram.parfor.LocalParWorker.run(LocalParWorker.java:122)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1 in stage 177.0 failed 1 times, most recent failure: Lost task 1.0 in 
stage 177.0 (TID 526, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshap

[jira] [Updated] (SYSTEMML-1766) Move experimental breast cancer project code into main repo

2017-07-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1766:
--
Description: This aims to consolidate and cleanup experimental breast 
cancer project code, and move it into the main repo.  (was: This aims to 
consolidate and cleanup experimental breast cancer project code in the main 
repo.)

> Move experimental breast cancer project code into main repo
> ---
>
> Key: SYSTEMML-1766
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1766
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>
> This aims to consolidate and clean up experimental breast cancer project code, 
> and move it into the main repo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1763) Fix Explain countCompiledInstructions for CP

2017-07-12 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1763.


> Fix Explain countCompiledInstructions for CP
> 
>
> Key: SYSTEMML-1763
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1763
> Project: SystemML
>  Issue Type: Bug
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>Priority: Minor
> Fix For: SystemML 1.0
>
>
> The counts.numCPInst++ in the countCompiledInstructions method of the Explain 
> class should be incremented based on the boolean CP parameter, not the 
> boolean SP parameter.
> {code}
>   private static int countCompiledInstructions( ArrayList 
> instSet, ExplainCounts counts, boolean MR, boolean CP, boolean SP )
>   {
>   int ret = 0;
>   
>   for( Instruction inst : instSet )
>   {
>   if( MR && inst instanceof MRJobInstruction ) 
>   counts.numJobs++;
>   else if( SP && inst instanceof CPInstruction )
>   counts.numCPInst++;
>   else if( SP && inst instanceof SPInstruction )
>   counts.numJobs++;
>   
>   //keep track of reblocks (in order to prevent 
> unnecessary spark context creation)
>   if( SP && (inst instanceof CSVReblockSPInstruction || 
> inst instanceof ReblockSPInstruction) )
>   counts.numReblocks++;
>   }
>   
>   return ret;
>   }
> {code}
> Also, the return value is irrelevant so the method return type should be 
> changed to void and ret should be removed.
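For reference, a sketch of the method with the two changes described above applied
(the CP guard and the void return type); the element type of instSet is stripped in
the quoted snippet and is assumed here to be Instruction.

{code:java}
// Sketch of the corrected method as described above: the CP counter is guarded
// by the CP flag, and the unused return value is dropped.
private static void countCompiledInstructions( ArrayList<Instruction> instSet,
    ExplainCounts counts, boolean MR, boolean CP, boolean SP )
{
    for( Instruction inst : instSet )
    {
        if( MR && inst instanceof MRJobInstruction )
            counts.numJobs++;
        else if( CP && inst instanceof CPInstruction )   // was: SP
            counts.numCPInst++;
        else if( SP && inst instanceof SPInstruction )
            counts.numJobs++;

        //keep track of reblocks (in order to prevent unnecessary spark context creation)
        if( SP && (inst instanceof CSVReblockSPInstruction || inst instanceof ReblockSPInstruction) )
            counts.numReblocks++;
    }
}
{code}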



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1763) Fix Explain countCompiledInstructions for CP

2017-07-12 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1763.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Fixed by [PR569|https://github.com/apache/systemml/pull/569].

> Fix Explain countCompiledInstructions for CP
> 
>
> Key: SYSTEMML-1763
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1763
> Project: SystemML
>  Issue Type: Bug
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>Priority: Minor
> Fix For: SystemML 1.0
>
>
> The counts.numCPInst++ in the countCompiledInstructions method of the Explain 
> class should be incremented based on the boolean CP parameter, not the 
> boolean SP parameter.
> {code}
>   private static int countCompiledInstructions( ArrayList 
> instSet, ExplainCounts counts, boolean MR, boolean CP, boolean SP )
>   {
>   int ret = 0;
>   
>   for( Instruction inst : instSet )
>   {
>   if( MR && inst instanceof MRJobInstruction ) 
>   counts.numJobs++;
>   else if( SP && inst instanceof CPInstruction )
>   counts.numCPInst++;
>   else if( SP && inst instanceof SPInstruction )
>   counts.numJobs++;
>   
>   //keep track of reblocks (in order to prevent 
> unnecessary spark context creation)
>   if( SP && (inst instanceof CSVReblockSPInstruction || 
> inst instanceof ReblockSPInstruction) )
>   counts.numReblocks++;
>   }
>   
>   return ret;
>   }
> {code}
> Also, the return value is irrelevant so the method return type should be 
> changed to void and ret should be removed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1764) Fix cbind value in AppendGAlignedSP constructor

2017-07-12 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1764.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Fixed by [PR571|https://github.com/apache/systemml/pull/571].

> Fix cbind value in AppendGAlignedSP constructor
> ---
>
> Key: SYSTEMML-1764
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1764
> Project: SystemML
>  Issue Type: Bug
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>Priority: Minor
> Fix For: SystemML 1.0
>
>
> The _cbind field in AppendGAlignedSP is hardcoded to true in the constructor 
> rather than being set by the cbind parameter.
> {code}
>   public AppendGAlignedSP(Lop input1, Lop input2, Lop input3, DataType 
> dt, ValueType vt, boolean cbind) 
>   {
>   super(Lop.Type.Append, dt, vt); 
>   init(input1, input2, input3, dt, vt);
>   
>   _cbind = true;
>   }
> {code}
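For reference, a sketch of the constructor with the described fix applied, i.e.,
_cbind taken from the cbind parameter instead of being hardcoded:

{code:java}
// Sketch of the corrected constructor as described above.
public AppendGAlignedSP(Lop input1, Lop input2, Lop input3, DataType dt,
    ValueType vt, boolean cbind)
{
    super(Lop.Type.Append, dt, vt);
    init(input1, input2, input3, dt, vt);

    _cbind = cbind;   // was: _cbind = true;
}
{code}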



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1764) Fix cbind value in AppendGAlignedSP constructor

2017-07-12 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1764.


> Fix cbind value in AppendGAlignedSP constructor
> ---
>
> Key: SYSTEMML-1764
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1764
> Project: SystemML
>  Issue Type: Bug
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>Priority: Minor
> Fix For: SystemML 1.0
>
>
> The _cbind field in AppendGAlignedSP is hardcoded to true in the constructor 
> rather than being set by the cbind parameter.
> {code}
>   public AppendGAlignedSP(Lop input1, Lop input2, Lop input3, DataType 
> dt, ValueType vt, boolean cbind) 
>   {
>   super(Lop.Type.Append, dt, vt); 
>   init(input1, input2, input3, dt, vt);
>   
>   _cbind = true;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1319) Statistical estimates over compressed matrix blocks

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1319.


> Statistical estimates over compressed matrix blocks
> ---
>
> Key: SYSTEMML-1319
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1319
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Statistical estimates like moment, cov, aggregate, table, median, and 
> quantiles can be efficiently computed over compressed matrix blocks by 
> mapping distinct items + counts to weighted statistical estimates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1766) Move experimental breast cancer project code into main repo

2017-07-12 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085026#comment-16085026
 ] 

Mike Dusenberry commented on SYSTEMML-1766:
---

[PR 573 | https://github.com/apache/systemml/pull/573] submitted.

> Move experimental breast cancer project code into main repo
> ---
>
> Key: SYSTEMML-1766
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1766
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>
> This aims to consolidate and clean up experimental breast cancer project code 
> in the main repo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1768) Cleanup SystemML-config.xml

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1768:
-
Labels: beginner  (was: )

> Cleanup SystemML-config.xml
> ---
>
> Key: SYSTEMML-1768
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1768
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Compiler, Runtime
>Reporter: Matthias Boehm
>  Labels: beginner
> Fix For: SystemML 1.0
>
>
> cp.parallel.matrixmult -> cp.parallel.ops
> cp.parallel.textio -> cp.parallel.io



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1426) Rename builtin function ceil to ceiling

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1426:
-
Labels: beginner  (was: )

> Rename builtin function ceil to ceiling
> ---
>
> Key: SYSTEMML-1426
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1426
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Compiler, Runtime
>Reporter: Matthias Boehm
>  Labels: beginner
> Fix For: SystemML 1.0
>
>
> The builtin function ceil unnecessarily differs from R's ceiling, which might 
> cause confusion. Hence, this task aims to rename ceil to ceiling.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1768) Cleanup SystemML-config.xml

2017-07-12 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1768:


 Summary: Cleanup SystemML-config.xml
 Key: SYSTEMML-1768
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1768
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm


cp.parallel.matrixmult -> cp.parallel.ops
cp.parallel.textio -> cp.parallel.io



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (SYSTEMML-1762) Fix the matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084968#comment-16084968
 ] 

Fei Hu edited comment on SYSTEMML-1762 at 7/13/17 3:04 AM:
---

When setting the training parameters as follows:

{code:java}
val N = 64
val Nval = 64
val Ntest = 1
val C = 3
val Hin = 224
val Win = 224
val K = 10
val batchSize = 32
val paralellBatches = 4
val epochs = 1
{code}

the errors come from the dense matrix reshape as follows:

{code:java}
17/07/12 17:20:40 INFO DAGScheduler: ShuffleMapStage 111 (flatMapToPair at 
MatrixReshapeSPInstruction.java:106) failed in 0.290 s due to Job aborted due 
to stage failure: Task 3 in stage 111.0 failed 1 times, most recent failure: 
Lost task 3.0 in stage 111.0 (TID 331, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeDense(LibMatrixReorg.java:1550)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:506)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}








was (Author: tenma):
When setting the training parameters as follows:

{code:java}
val N = 64
val Nval = 64
val Ntest = 1
val C = 3
val Hin = 224
val Win = 224
val K = 10
val batchSize = 32
val paralellBatches = 4
val epochs = 1
{code}

the errors come from the dense matrix reshape, as follows:

{code:java}
17/07/12 17:20:40 INFO DAGScheduler: ShuffleMapStage 111 (flatMapToPair at 
MatrixReshapeSPInstruction.java:106) failed in 0.290 s due to Job aborted due 
to stage failure: Task 3 in stage 111.0 failed 1 times, most recent failure: 
Lost task 3.0 in stage 111.0 (TID 331, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeDense(LibMatrixReorg.java:1550)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:506)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}







> Fix the matrix reshape function for the Spark mode
> --
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/j

[jira] [Comment Edited] (SYSTEMML-1762) Fix the matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084521#comment-16084521
 ] 

Fei Hu edited comment on SYSTEMML-1762 at 7/13/17 3:05 AM:
---

When setting the training parameters as follows:
{code:java}
val N = 1
val Nval = 1
val Ntest = 1
val C = 3
val Hin = 224
val Win = 224
val K = 10
val batchSize = 32
val paralellBatches = 4
val epochs = 1
{code}

the errors come from the sparse matrix reshape; the error messages are as follows:

{code:java}
17/07/12 12:04:47 ERROR TaskSetManager: Task 1 in stage 177.0 failed 1 times; 
aborting job
17/07/12 12:04:47 INFO TaskSetManager: Lost task 3.0 in stage 177.0 (TID 528) 
on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 
1]
17/07/12 12:04:47 INFO TaskSchedulerImpl: Cancelling stage 177
17/07/12 12:04:47 INFO TaskSchedulerImpl: Stage 177 was cancelled
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 2.0 in stage 
177.0 (TID 527)
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 0.0 in stage 
177.0 (TID 525)
17/07/12 12:04:47 INFO DAGScheduler: ShuffleMapStage 177 (flatMapToPair at 
MatrixReshapeSPInstruction.java:106) failed in 0.016 s due to Job aborted due 
to stage failure: Task 1 in stage 177.0 failed 1 times, most recent failure: 
Lost task 1.0 in stage 177.0 (TID 526, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
17/07/12 12:04:47 INFO DAGScheduler: Job 139 failed: fold at 
RDDAggregateUtils.java:137, took 0.018972 s
17/07/12 12:04:47 INFO Executor: Executor killed task 0.0 in stage 177.0 (TID 
525)
17/07/12 12:04:47 ERROR ParWorker: Failed to execute task (type=SET, 
iterations={[j=3]}), retry:0
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program 
block generated from statement block between lines 0 and 0 -- Error evaluating 
instruction: 
SPARK°uark+°_mVar3618·MATRIX·DOUBLE°_mVar3619·MATRIX·DOUBLE°SINGLE_BLOCK
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:316)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:217)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:163)
at 
org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeSetTask(ParWorker.java:167)
at 
org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeTask(ParWorker.java:136)
at 
org.apache.sysml.runtime.controlprogram.parfor.LocalParWorker.run(LocalParWorker.java:122)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1 in stage 177.0 failed 1 times, most recent failure: Lost task 1.0 in 
stage 177.0 (TID 526, localhost, executor driver): 
java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
at 
org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
at 
org.apache.sysml.runtime.instructions.spark.MatrixRes

[jira] [Updated] (SYSTEMML-1762) Improve the matrix reshape function for the Spark mode

2017-07-12 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1762:
-
Summary: Improve the matrix reshape function for the Spark mode  (was: Fix 
the matrix reshape function for the Spark mode)

> Improve the matrix reshape function for the Spark mode
> --
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> The [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml]
>  works well in the hybrid mode. In the Spark mode, however, it fails with 
> {{java.lang.NullPointerException}} and 
> {{java.lang.ArrayIndexOutOfBoundsException: 1000}} errors when reshaping the matrix. 
> The involved functions are 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} and 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeDense}}. The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not match the keys in the {{HashMap rix}}. 
> To reproduce the error, the attached Scala file {{MNIST_Distrib_Sgd.scala}} 
> can be used to run the distributed MNIST example.  
> In addition, if code is added to ignore a null output matrix block returned by 
> {{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example can 
> run in the Spark mode, but the result may not be correct (see the sketch below). 
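
A hedged sketch of the workaround mentioned in the last sentence above, with 
hypothetical stand-ins for the SystemML types; this is not the actual 
LibMatrixReorg code, it only illustrates why skipping missing blocks avoids the 
exception but can silently drop output cells:

{code:java}
import java.util.{HashMap => JHashMap}

// Hypothetical stand-ins for MatrixIndexes/MatrixBlock, for illustration only.
case class MatrixIndexes(rowIndex: Long, colIndex: Long)
class MatrixBlock

object SkipNullBlockSketch {
  def main(args: Array[String]): Unit = {
    val rix = new JHashMap[MatrixIndexes, MatrixBlock]()   // reshaped output blocks by index
    val ixtmp = MatrixIndexes(1, 13)                        // an index missed by computeResultBlockIndex
    val out = rix.get(ixtmp)                                // java.util.HashMap returns null for a missing key
    if (out == null) {
      // Skipping the block avoids the NullPointerException, but the cells that
      // belong in this block are silently dropped, hence the possibly wrong result.
      println(s"skipping missing output block $ixtmp")
    } else {
      // normal path: copy the reshaped cells into `out`
      println(s"writing into output block $ixtmp")
    }
  }
}
{code}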



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1762) Improve the matrix reshape function for the Spark mode

2017-07-12 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085249#comment-16085249
 ] 

Matthias Boehm commented on SYSTEMML-1762:
--

Thanks [~Tenma] for catching this issue. As it turned out, the issue occurs in 
the special case where, for a given input block, we create at least three output 
blocks and the first and last output blocks have the same row index. For 
example, if the output matrix has 13 column blocks and we computed (1,12) 
and (1,1) as the first and last output block indexes, we missed the middle index 
(1,13).
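
A toy Scala sketch of the case described above; it is not SystemML's 
LibMatrixReorg code, and the concrete column-block sequence is assumed purely 
for illustration:

{code:java}
object MissedBlockIndexSketch {
  def main(args: Array[String]): Unit = {
    // Suppose the output matrix has 13 column blocks and the cells of one input
    // block land in column blocks 12, 13 and 1 (wrapping past the last block).
    val covered = Seq(12, 13, 1)

    // Deriving the set of output blocks only from the first and last computed
    // indexes (12 and 1) misses the middle block 13 ...
    val fromCornersOnly = Set(covered.head, covered.last)

    // ... whereas enumerating every covered block does not.
    val fromFullEnumeration = covered.toSet

    println(s"corner indexes only: $fromCornersOnly")       // block 13 missing
    println(s"full enumeration:    $fromFullEnumeration")   // includes block 13
  }
}
{code}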

> Improve the matrix reshape function for the Spark mode
> --
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> The [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml]
>  works well in the hybrid mode. In the Spark mode, however, it fails with 
> {{java.lang.NullPointerException}} and 
> {{java.lang.ArrayIndexOutOfBoundsException: 1000}} errors when reshaping the matrix. 
> The involved functions are 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} and 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeDense}}. The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not match the keys in the {{HashMap rix}}. 
> To reproduce the error, the attached Scala file {{MNIST_Distrib_Sgd.scala}} 
> can be used to run the distributed MNIST example.  
> In addition, if code is added to ignore a null output matrix block returned by 
> {{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example can 
> run in the Spark mode, but the result may not be correct. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)