Re: machine learning - Some test failures when building the SystemML project - Stack Overflow
Thanks for your quick response. However, I see many different errors across different test cases. I have archived the failsafe report. Could you take a look at it?

------ Original ------
From: "Matthias Boehm"
Date: Tue, Apr 12, 2016 11:31 AM
To: "dev"
Cc: <281165...@qq.com>
Subject: Re: machine learning - Some test failures when building the SystemML project - Stack Overflow

Well, the error is not coming from R but from SystemML's runtime. Could you please provide the full stack trace to see what is going on here?

Regards,
Matthias

From: 281165...@qq.com
To: "dev"
Date: 04/11/2016 08:22 PM
Subject: machine learning - Some test failures when building the SystemML project - Stack Overflow

Sorry to bother you guys. I am a developer at IBM and am interested in this project. I just asked a question (http://stackoverflow.com/questions/36562951/some-tests-failure-when-build-systemml-project) on Stack Overflow, but I couldn't find any tag for this project — could I post it to this mailing list?
Re: Fw: Updating documentation for notebook
Hi Deron,

I too like the idea of having a single command, but rather than supporting web datasets in read(), how about having a Java/Scala wrapper (see point 1 below)?

1. Let's have a wrapper org.apache.sysml.api.Datasets with the following methods:
   a. load_*(), similar to http://scikit-learn.org/stable/datasets/#toy-datasets. These methods download a toy dataset (if not already downloaded), put it in a configurable tmp directory, and push it to the underlying FS.
   b. make_*(), similar to http://scikit-learn.org/stable/datasets/#sample-generators. These methods call the DML scripts in the folder https://github.com/apache/incubator-systemml/tree/master/scripts/datagen using MLContext/JMLC.
   The load_*() methods help create interesting demos (which will likely run in CP), whereas make_*() will test the scalability of SystemML :)

2. We need to embed all our existing DML scripts into the jar, with an option for the user to provide a custom script directory. This allows the user to simply import the jar (without downloading the scripts) and run one of our wrappers, e.g. org.apache.sysml.api.ml.LogisticRegression.

3. MLPipeline wrappers need to be implemented for the scripts in https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms. A sample implementation is available at https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegression.java

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

From: Deron Eriksson
To: dev@systemml.incubator.apache.org
Date: 04/11/2016 10:46 AM
Subject: Re: Fw: Updating documentation for notebook

Hi Niketan,

I think a separate section for Notebooks is a great idea since, as you point out, they are hidden under the MLContext section. Also, I really like the idea of making it as easy as possible for a new user to try out SystemML in a Notebook. Very good points.
Tutorials for all the algorithms using real-world data would be fantastic. I would also like to see single-line algorithm invocations (possibly with generated data) that could be copied/pasted and that work with no modifications needed by the user. This would probably mean either including small sets of example data in the project, or allowing the reading of data from URLs. It would be nice to take something like these 5 commands:

---
$ wget https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/datagen/genRandData4Univariate.dml
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f genRandData4Univariate.dml -exec hybrid_spark -args 100 100 10 1 2 3 4 uni.mtx
$ echo '1' > uni-types.csv
$ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f $SYSTEMML_HOME/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt
---

and reduce them to 1 command (in the documentation) that the user can copy/paste so that the algorithm runs without any additional work needed by the user:

---
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f $SYSTEMML_HOME/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=http://www.example.com/uni.mtx TYPES=http://www.example.com/uni-types.csv STATS=uni-stats.txt
---

If we had this for each of the main algorithms, it would give users working examples to start with, which is easier than trying to figure this kind of thing out by reading the comments in the DML algorithm files.

Deron

On Fri, Apr 8, 2016 at 4:51 PM, Niketan Pansare wrote:
> Hi all,
>
> As per Luciano's suggestion, I have created a PR with the bluemix/datascientist
> tutorial and have flagged it with "Please DO NOT push this PR until the
> discussion on the dev mailing list is complete." :)
>
> Also, I apologize for the incorrect indentation in my last email. Here is another
> attempt:
>
> - How do you want to try SystemML?
>   + Notebook on cloud
>     * Bluemix
>       + Zeppelin
>         - Using Python Kernel
>           + Learn how to write a DML program (something along the lines of
>             http://apache.github.io/incubator-systemml/beginners-guide-to-dml-and-pydml.html)
>           + Try out pre-packaged algorithms on a real-world dataset
>             * Linear Regression
>             * GLM
>             * ALS
>             * ...
>           + Learn how to pass an RDD/DataFrame to SystemML
>           + Learn how to use SystemML as an MLPipeline estimator/transformer
>           + Learn how to use SystemML with existing Python packages
>         - Using Scala Kernel
>           + ... similar to the Python kernel
>         - Using DML Kernel
>           + Learn how to write a DML program
>       + Jupyter
>         - Using Python Kernel
>         - Using Scala Kernel
>         - Using DML Kernel
>     * Data
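The org.apache.sysml.api.Datasets wrapper proposed in point 1 above does not exist yet; the following is a hypothetical sketch only, with illustrative class and method names. The real make_*() methods would invoke the scripts/datagen DML scripts via MLContext/JMLC, and load_*() would download a toy dataset and push it to the underlying FS; both are stubbed here in plain Java:

```java
import java.util.Random;

// Hypothetical sketch of the proposed org.apache.sysml.api.Datasets wrapper.
// All names and signatures here are illustrative assumptions, not an existing API.
public class Datasets {

    // make_*-style sample generator (cf. scikit-learn's sample generators):
    // returns a rows x cols matrix of uniform random values. The real method
    // would instead call a scripts/datagen DML script through MLContext/JMLC.
    public static double[][] makeRandomMatrix(int rows, int cols, long seed) {
        Random rnd = new Random(seed);
        double[][] m = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                m[i][j] = rnd.nextDouble();
        return m;
    }

    // load_*-style toy-dataset loader (cf. scikit-learn's toy datasets):
    // would download the named dataset into a configurable tmp directory and
    // push it to the underlying FS; stubbed here to just return the target path.
    public static String loadToyDataset(String name, String tmpDir) {
        return tmpDir + "/" + name + ".csv";
    }

    public static void main(String[] args) {
        double[][] x = makeRandomMatrix(100, 10, 42L);
        System.out.println(x.length + " x " + x[0].length); // prints "100 x 10"
        System.out.println(loadToyDataset("iris", "/tmp")); // prints "/tmp/iris.csv"
    }
}
```

A seeded generator keeps demos reproducible, mirroring the random_state convention of the scikit-learn generators the proposal references.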
Re: Fw: Updating documentation for notebook
Hi Niketan,

I think a separate section for Notebooks is a great idea since, as you point out, they are hidden under the MLContext section. Also, I really like the idea of making it as easy as possible for a new user to try out SystemML in a Notebook. Very good points.

Tutorials for all the algorithms using real-world data would be fantastic. I would also like to see single-line algorithm invocations (possibly with generated data) that could be copied/pasted and that work with no modifications needed by the user. This would probably mean either including small sets of example data in the project, or allowing the reading of data from URLs. It would be nice to take something like these 5 commands:

---
$ wget https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/datagen/genRandData4Univariate.dml
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f genRandData4Univariate.dml -exec hybrid_spark -args 100 100 10 1 2 3 4 uni.mtx
$ echo '1' > uni-types.csv
$ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f $SYSTEMML_HOME/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt
---

and reduce them to 1 command (in the documentation) that the user can copy/paste so that the algorithm runs without any additional work needed by the user:

---
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f $SYSTEMML_HOME/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=http://www.example.com/uni.mtx TYPES=http://www.example.com/uni-types.csv STATS=uni-stats.txt
---

If we had this for each of the main algorithms, it would give users working examples to start with, which is easier than trying to figure this kind of thing out by reading the comments in the DML algorithm files.
Deron

On Fri, Apr 8, 2016 at 4:51 PM, Niketan Pansare wrote:
> Hi all,
>
> As per Luciano's suggestion, I have created a PR with the bluemix/datascientist
> tutorial and have flagged it with "Please DO NOT push this PR until the
> discussion on the dev mailing list is complete." :)
>
> Also, I apologize for the incorrect indentation in my last email. Here is another
> attempt:
>
> - How do you want to try SystemML?
>   + Notebook on cloud
>     * Bluemix
>       + Zeppelin
>         - Using Python Kernel
>           + Learn how to write a DML program (something along the lines of
>             http://apache.github.io/incubator-systemml/beginners-guide-to-dml-and-pydml.html)
>           + Try out pre-packaged algorithms on a real-world dataset
>             * Linear Regression
>             * GLM
>             * ALS
>             * ...
>           + Learn how to pass an RDD/DataFrame to SystemML
>           + Learn how to use SystemML as an MLPipeline estimator/transformer
>           + Learn how to use SystemML with existing Python packages
>         - Using Scala Kernel
>           + ... similar to the Python kernel
>         - Using DML Kernel
>           + Learn how to write a DML program
>       + Jupyter
>         - Using Python Kernel
>         - Using Scala Kernel
>         - Using DML Kernel
>     * Data scientist's work bench
>     * Databricks cloud
>     * ...
>   + Notebook on laptop/cluster
>     * Zeppelin
>     * Jupyter
>   + Laptop
>     * Run SystemML as a standalone jar:
>       http://apache.github.io/incubator-systemml/quick-start-guide.html
>     * Embed SystemML into another Java program:
>       http://apache.github.io/incubator-systemml/jmlc.html
>     * Debug a DML script:
>       http://apache.github.io/incubator-systemml/debugger-guide.html
>     * Spark local mode
>   + Spark Cluster
>     * Batch invocation
>     * Using the Spark REPL
>       + Learn how to pass an RDD/DataFrame to SystemML
>       + Learn how to use SystemML as an MLPipeline estimator/transformer
>     * Using the PySpark REPL
>       + Learn how to pass an RDD/DataFrame to SystemML
>       + Learn how to use SystemML as an MLPipeline estimator/transformer
>   + Hadoop Cluster
>   + Spark Cluster on EC2
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> ----- Forwarded by Niketan Pansare/Almaden/IBM on 04/08/2016 04:48 PM -----
>
> Fw: Updating documentation for notebook
> Niketan Pansare to: dev, 04/08/2016 01:11 PM
>
> From: Niketan Pansare/Almaden/IBM
> To: dev
>
> Hi all,
>
> Here are a few suggestions to get things started:
>
> 1. Have a "Quick Start" (or "Get Started") button beside "Get SystemML"
>    on http://systemml.apache.org/.
>
> 2. Then the user can go through the following questionnaire/bulleted list, which
>    points people to the appropriate link:
>
>    - How do you want to try SystemML?
>      + Notebook on cloud
>        * Bluemix
>          + Zeppelin
>            - Using Python Kernel
>              + Learn how to write a DML program (something along the
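Deron's single-command idea above hinges on the launcher (or read()) accepting URLs as well as local paths. As a hypothetical sketch, not existing SystemML code, an argument resolver could detect remote inputs and download each one to a temp file before the script runs; class and method names here are illustrative assumptions:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: resolve -nvargs values that are URLs into local files,
// so "X=http://www.example.com/uni.mtx" behaves like a local "X=uni.mtx".
public class UrlArgResolver {

    // Treat http(s) values as remote; anything else is assumed to be a
    // local or FS path and is passed through untouched.
    static boolean isRemote(String value) {
        return value.startsWith("http://") || value.startsWith("https://");
    }

    // If the value is a URL, download it to a temp file and return the local
    // path; otherwise return the value unchanged.
    static String resolve(String value) throws IOException {
        if (!isRemote(value)) {
            return value;
        }
        Path tmp = Files.createTempFile("sysml-data-", ".dat");
        try (InputStream in = new URL(value).openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(resolve("uni.mtx")); // prints "uni.mtx" (pass-through)
        System.out.println(isRemote("http://www.example.com/uni.mtx")); // prints "true"
    }
}
```

In a real integration the downloaded file would also need its .mtd metadata fetched alongside it, and cleanup of the temp files after the run.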