Re: machine learning - Some tests failure when build systemML project -Stack Overflow

2016-04-11 Thread 281165273
Thanks for your quick response. But I saw many different errors occurring on different 
test cases. I have archived the failsafe report. Would you like to take a look at it?




-- Original --
From:  "Matthias Boehm"
Date:  Tue, Apr 12, 2016 11:31 AM
To:  "dev"; 
Cc:  <281165...@qq.com>; 
Subject:  Re: machine learning - Some tests failure when build systemML project 
-Stack Overflow




Well, the error is not coming from R but from SystemML's runtime. Could you 
please provide the full stack trace so we can see what is going on here?

Regards,
Matthias


From: 281165...@qq.com
To: "dev" 
Date: 04/11/2016 08:22 PM
Subject: machine learning - Some tests failure when build systemML 
project - Stack Overflow





Sorry to bother you guys, I am a developer from IBM and am interested in this 
project. I just asked a question 
(http://stackoverflow.com/questions/36562951/some-tests-failure-when-build-systemml-project)
 on Stack Overflow, but I couldn't find a tag for this project. Could I post it 
on this mailing list?

Re: Fw: Updating documentation for notebook

2016-04-11 Thread Niketan Pansare

Hi Deron,

I too like the idea of having a single command, but rather than supporting
web datasets in read(), how about having a Java/Scala wrapper (see point 1
below)?

1. Let's have a wrapper org.apache.sysml.api.Datasets which has the following
methods:
a. load_*(), similar to
http://scikit-learn.org/stable/datasets/#toy-datasets. These methods
download the toy dataset (if not already downloaded), put it in a
configurable tmp directory, and push it to the underlying FS.
b. make_*(), similar to
http://scikit-learn.org/stable/datasets/#sample-generators. These methods
call the DML scripts in the folder
https://github.com/apache/incubator-systemml/tree/master/scripts/datagen
using MLContext/JMLC.

The load_*() methods help create interesting demos (which will likely
run in CP), whereas make_*() will test the scalability of SystemML :)
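To make the proposal concrete, here is a rough sketch of what the
download-and-cache skeleton of such a load_*() method might look like. All
names here (the Datasets class, loadDataset, setCacheDir) are hypothetical
illustrations of the idea, not existing SystemML API; the actual HTTP fetch
and the push to the underlying FS are stubbed out:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the proposed org.apache.sysml.api.Datasets wrapper.
class Datasets {

    // Configurable tmp directory for cached toy datasets.
    private static Path cacheDir =
        Paths.get(System.getProperty("java.io.tmpdir"), "sysml-datasets");

    public static void setCacheDir(Path dir) {
        cacheDir = dir;
    }

    // load_*-style method: fetch the dataset only if it is not cached yet,
    // then return the local path (a real version would also push it to HDFS).
    public static Path loadDataset(String name) throws IOException {
        Files.createDirectories(cacheDir);
        Path local = cacheDir.resolve(name + ".csv");
        if (!Files.exists(local)) {
            // Placeholder for the real HTTP download of the toy dataset.
            Files.write(local, new byte[0]);
        }
        return local;
    }
}
```

A second call to loadDataset("iris") would then hit the cache instead of
downloading again, which is the behavior the scikit-learn loaders have.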

2. We need to embed all our existing DML scripts into the jar with an
option for the user to provide a custom script directory. This allows the
user to simply import the jar (without downloading the scripts) and run one
of our wrappers, e.g. org.apache.sysml.api.ml.LogisticRegression.
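The lookup order described in point 2 (user-supplied script directory first,
bundled copy inside the jar as the fallback) could follow the usual
classpath-fallback pattern. This is a generic sketch under that assumption,
not SystemML's actual loader, and the names are invented:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Generic sketch of a script resolver: prefer a user-supplied script
// directory if one is set, otherwise fall back to the copy packaged
// on the classpath (i.e. inside the jar).
class ScriptResolver {

    private static Path customScriptDir = null;  // optional user override

    public static void setCustomScriptDir(Path dir) {
        customScriptDir = dir;
    }

    public static InputStream openScript(String name) throws IOException {
        if (customScriptDir != null) {
            Path candidate = customScriptDir.resolve(name);
            if (Files.exists(candidate)) {
                return Files.newInputStream(candidate);
            }
        }
        // Fall back to the script bundled in the jar under /scripts/.
        InputStream in = ScriptResolver.class.getResourceAsStream("/scripts/" + name);
        if (in == null) {
            throw new IOException("script not found: " + name);
        }
        return in;
    }
}
```

With this shape, a wrapper like LogisticRegression would only ever call
openScript("..."), and the user could swap in modified scripts without
rebuilding the jar.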

3. MLPipeline wrappers need to be implemented for the scripts in
https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
. A sample implementation is available at
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegression.java

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar




Re: Fw: Updating documentation for notebook

2016-04-11 Thread Deron Eriksson
Hi Niketan,

I think a separate section for Notebooks is a great idea since, as you
point out, they are hidden under the MLContext section. Also, I really like
the idea of making it as easy as possible for a new user to try out
SystemML in a Notebook. Very good points.

Tutorials for all the algorithms using real-world data would be fantastic.
Personally, I would also like to see single-line algorithm invocations (possibly
with generated data) that can be copy/pasted and run with no
modifications needed by the user. This would probably mean either including
small sets of example data in the project or allowing data to be read
from URLs.

It would be nice to take something like these 5 commands:
---
$ wget https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/datagen/genRandData4Univariate.dml
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f genRandData4Univariate.dml -exec hybrid_spark -args 100 100 10 1 2 3 4 uni.mtx
$ echo '1' > uni-types.csv
$ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f $SYSTEMML_HOME/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt
---
and reduce them to 1 command (in the documentation) that the user can
copy/paste, so that the algorithm runs without any additional work on the
user's part:
---
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f $SYSTEMML_HOME/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=http://www.example.com/uni.mtx TYPES=http://www.example.com/uni-types.csv STATS=uni-stats.txt
---
If we had this for each of the main algorithms, this would give the users
working examples to start with, which is easier than trying to figure out
this kind of thing by reading the comments in the DML algorithm files.
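One way to approximate the URL-based invocation above without changing
read() itself would be a small pre-fetch step that copies each remote input
to a local file before the spark-submit call. A generic sketch of that idea
follows; the UrlFetcher class and method names are invented for illustration
and are not part of SystemML:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Generic sketch: fetch a dataset from a URL into a local file so that
// an unmodified DML script can read it as a plain local input.
class UrlFetcher {

    public static Path fetchToLocal(String url, Path target) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

The documentation's one-liner could then be generated by fetching uni.mtx,
uni-types.csv, and uni-types.csv.mtd this way and invoking Univar-Stats.dml
on the local copies.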

Deron


On Fri, Apr 8, 2016 at 4:51 PM, Niketan Pansare  wrote:

> Hi all,
>
> As per Luciano's suggestion, I have created a PR with bluemix/datascientist
> tutorial and have flagged it with "Please DONOT push this PR until the
> discussion on dev mailing list is complete." :)
>
> Also, I apologize for the incorrect indentation in my last email. Here is another
> attempt:
> - How do you want to try SystemML?
> --+ Notebook on cloud
> * Bluemix
> -- + Zeppelin
> --- Using Python Kernel
>  + Learn how to write DML program--(something along the lines
> of
> http://apache.github.io/incubator-systemml/beginners-guide-to-dml-and-pydml.html
> )
>  + Try out pre-packaged algorithms on real-world dataset
> -- * Linear Regression
> -- * GLM
> -- * ALS
> -- * ...
>  + Learn how to pass RDD/DataFrame to SystemML
>  + Learn how to use SystemML as MLPipeline
> estimator/transformer
>  + Learn how to use SystemML with existing Python packages
> --- Using Scala Kernel
>  + ... similar to Python kernel
> --- Using DML Kernel
>  + Learn how to write DML program
> -- + Jupyter
> - Using Python Kernel
> - Using Scala Kernel
> - Using DML Kernel
> * Data scientist's work bench
> * Databricks cloud
> * ...
> --+ Notebook on laptop/cluster
> * Zeppelin
> * Jupyter
> --+ Laptop
> * Run SystemML as Standalone jar:
> http://apache.github.io/incubator-systemml/quick-start-guide.html
> * Embed SystemML into other Java program:
> http://apache.github.io/incubator-systemml/jmlc.html
> * Debug a DML script:
> http://apache.github.io/incubator-systemml/debugger-guide.html
> * Spark local mode
> --+ Spark Cluster
> * Batch invocation
> * Using Spark REPL
> --+ Learn how to pass RDD/DataFrame to SystemML
> --+ Learn how to use SystemML as MLPipeline estimator/transformer
> * Using PySpark REPL
> --+ Learn how to pass RDD/DataFrame to SystemML
> --+ Learn how to use SystemML as MLPipeline estimator/transformer
> --+ Hadoop Cluster
> --+ Spark Cluster on EC2
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> - Forwarded by Niketan Pansare/Almaden/IBM on 04/08/2016 04:48 PM
> -
>
>
>
> *Fw: Updating documentation for notebook*
>
> *Niketan Pansare *
> to:
> dev
> 04/08/2016 01:11 PM
>
>
>
>
> From:
> Niketan Pansare/Almaden/IBM
>
>
>
>
> To:
> dev 
>
> Hi all,
>
> Here are a few suggestions to get things started:
> 1. Have a "Quick Start" (or "Get Started") button beside "Get SystemML"
> on http://systemml.apache.org/.
>
> 2. Then the user can go through the following questionnaire/bulleted list, which
> points people to the appropriate link:
> - How do you want to try SystemML?
> + Notebook on cloud
> * Bluemix
> + Zeppelin
> - Using Python Kernel
> + Learn how to write DML program (something along the