Fw: Updating documentation for notebook

Niketan Pansare Fri, 08 Apr 2016 16:52:49 -0700


Hi all,


As per Luciano's suggestion, I have created a PR with the bluemix/datascientist
tutorial and have flagged it with "Please DONOT push this PR until the
discussion on dev mailing list is complete." :)

Also, I apologize for the incorrect indentation in my last email. Here is
another attempt:
- How do you want to try SystemML?
--+ Notebook on cloud
----* Bluemix
------ + Zeppelin
----------- Using Python Kernel
------------ + Learn how to write DML program (something along the lines of
http://apache.github.io/incubator-systemml/beginners-guide-to-dml-and-pydml.html)
------------ + Try out pre-packaged algorithms on real-world dataset
-------------- * Linear Regression
-------------- * GLM
-------------- * ALS
-------------- * ...
------------ + Learn how to pass RDD/DataFrame to SystemML
------------ + Learn how to use SystemML as MLPipeline estimator/transformer
------------ + Learn how to use SystemML with existing Python packages
----------- Using Scala Kernel
------------ + ... similar to Python kernel
----------- Using DML Kernel
------------ + Learn how to write DML program
------ + Jupyter
--------- Using Python Kernel
--------- Using Scala Kernel
--------- Using DML Kernel
----* Data scientist's work bench
----* Databricks cloud
----* ...
--+ Notebook on laptop/cluster
----* Zeppelin
----* Jupyter
--+ Laptop
----* Run SystemML as Standalone jar:
http://apache.github.io/incubator-systemml/quick-start-guide.html
----* Embed SystemML into other Java program:
http://apache.github.io/incubator-systemml/jmlc.html
----* Debug a DML script:
http://apache.github.io/incubator-systemml/debugger-guide.html
----* Spark local mode
--+ Spark Cluster
----* Batch invocation
----* Using Spark REPL
------+ Learn how to pass RDD/DataFrame to SystemML
------+ Learn how to use SystemML as MLPipeline estimator/transformer
----* Using PySpark REPL
------+ Learn how to pass RDD/DataFrame to SystemML
------+ Learn how to use SystemML as MLPipeline estimator/transformer
--+ Hadoop Cluster
--+ Spark Cluster on EC2
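
As an illustration only, a questionnaire like the one above could be backed by a nested mapping from choices to links. This is a minimal Python sketch, not part of the PR: the names `GUIDE` and `find_link` are hypothetical, only the "Laptop" branch carries links because those are the only URLs listed in the outline, and deeper levels (kernels, sub-topics) are omitted for brevity.

```python
# Hypothetical sketch: the questionnaire as a nested mapping from choices to links.
# Leaves are either a URL taken from the outline above or None (no link listed yet).
BASE = "http://apache.github.io/incubator-systemml/"

GUIDE = {
    "Notebook on cloud": {
        "Bluemix": None,
        "Data scientist's work bench": None,
        "Databricks cloud": None,
    },
    "Notebook on laptop/cluster": {"Zeppelin": None, "Jupyter": None},
    "Laptop": {
        "Run SystemML as Standalone jar": BASE + "quick-start-guide.html",
        "Embed SystemML into other Java program": BASE + "jmlc.html",
        "Debug a DML script": BASE + "debugger-guide.html",
        "Spark local mode": None,
    },
    "Spark Cluster": {
        "Batch invocation": None,
        "Using Spark REPL": None,
        "Using PySpark REPL": None,
    },
    "Hadoop Cluster": {},
    "Spark Cluster on EC2": {},
}


def find_link(path, tree=GUIDE):
    """Follow a list of choices through the tree.

    Returns the URL at the end of the path, or None when the path ends on a
    branch or on a leaf that has no link yet. Raises KeyError for a choice
    that is not in the outline.
    """
    node = tree
    for choice in path:
        if not isinstance(node, dict) or choice not in node:
            raise KeyError(f"unknown choice: {choice!r}")
        node = node[choice]
    return node if isinstance(node, str) else None
```

For example, `find_link(["Laptop", "Debug a DML script"])` returns the debugger guide URL, while `find_link(["Spark Cluster", "Batch invocation"])` returns `None` because the outline lists no link for that entry yet.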

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

----- Forwarded by Niketan Pansare/Almaden/IBM on 04/08/2016 04:48 PM -----

From: Niketan Pansare/Almaden/IBM
To: dev <[email protected]>
Date: 04/08/2016 01:11 PM
Subject: Fw: Updating documentation for notebook


Hi all,

Here are a few suggestions to get things started:
1. Have a "Quick Start" (or "Get Started") button beside "Get SystemML" on
http://systemml.apache.org/.

2. The user can then go through the following questionnaire/bulleted list,
which points people to the appropriate link:
- How do you want to try SystemML?
  + Notebook on cloud
    * Bluemix
      + Zeppelin
        - Using Python Kernel
          + Learn how to write DML program (something along the lines of
            http://apache.github.io/incubator-systemml/beginners-guide-to-dml-and-pydml.html)
          + Try out pre-packaged algorithms on real-world dataset
            * Linear Regression
            * GLM
            * ALS
            * ...
          + Learn how to pass RDD/DataFrame to SystemML (for example:
            http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html)
          + Learn how to use SystemML as MLPipeline estimator/transformer
          + Learn how to use SystemML with existing Python packages
        - Using Scala Kernel
          + ... similar to Python kernel
        - Using DML Kernel
          + Learn how to write DML program
      + Jupyter
        - Using Python Kernel
        - Using Scala Kernel
        - Using DML Kernel
    * Data scientist's work bench
    * Databricks cloud
    * ...

  + Notebook on laptop/cluster
    * Zeppelin using docker images (for example:
      http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html#zeppelin-notebook-example---linear-regression-algorithm)
    * Jupyter (for example:
      http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html#jupyter-pyspark-notebook-example---poisson-nonnegative-matrix-factorization)

  + Laptop
    * Run SystemML as Standalone jar:
      http://apache.github.io/incubator-systemml/quick-start-guide.html
    * Embed SystemML into other Java program:
      http://apache.github.io/incubator-systemml/jmlc.html
    * Debug a DML script:
      http://apache.github.io/incubator-systemml/debugger-guide.html
    * Spark local mode

  + Spark Cluster
    * Batch invocation
    * Using Spark REPL
      + Learn how to pass RDD/DataFrame to SystemML
      + Learn how to use SystemML as MLPipeline estimator/transformer
    * Using PySpark REPL
      + Learn how to pass RDD/DataFrame to SystemML
      + Learn how to use SystemML as MLPipeline estimator/transformer

  + Hadoop Cluster
  + Spark Cluster on EC2

3. Add links to SystemML presentations:
https://www.youtube.com/watch?v=n3JJP6UbH6Q
https://www.youtube.com/watch?v=6VpiJK8Jydw
https://www.youtube.com/watch?v=PV-5pZboo4A
https://www.youtube.com/watch?v=7Zrc5EzOTjg
https://www.youtube.com/watch?v=3T32lweGxOA

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

----- Forwarded by Niketan Pansare/Almaden/IBM on 04/08/2016 01:03 PM -----

From: Niketan Pansare
To: dev
Date: 04/08/2016 10:47 AM
Subject: Re: Updating documentation for notebook
Please respond to dev


Thanks Abhishek. I am glad it was helpful :)

Luciano: I agree with you about having a central place for documentation.
Before cleaning up the tutorial and putting it into our documentation, I
wanted to:
1. Have a discussion about which setup we should use to introduce SystemML:
command-line standalone, command-line spark/pyspark REPL (yarn/standalone),
command-line hadoop, or scala/python notebook (an online notebook, or requiring
the user to set up jupyter/zeppelin).
2. Encourage other contributors to come up with intellectually stimulating
tutorials using real-world datasets and our existing DML algorithms. This
means creating JIRAs that people can work on. My repository is only a POC
to facilitate discussion and will be deleted after that.
3. If we do decide to go with an online notebook-based tutorial, have a
discussion on how to structure the tutorial:
- How to support a variety of hosting sites (bluemix / datascientist
workbench / databricks cloud / azureml / aws / ...).
- Python or Scala as the primary language.
- Jupyter or Zeppelin as the primary notebook.
- DML kernel, MLContext-based, or JMLC-based examples.
- Any standard tutorial (or textbook) we should use as an example for choosing
the dataset.
- Whether the emphasis should be on learning DML or on building a larger data
pipeline (for example: our MLPipeline-wrapper).

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar


From: Abhishek Srivastava <[email protected]>
To: [email protected]
Date: 04/08/2016 08:55 AM
Subject: Re: Updating documentation for notebook



Great job Niketan, I had been searching for such a document of late.

Regards,
Abhishek Srivastava
Fellowship Scholar , IIM Ranchi
Skype : abhi.sri3

On Fri, Apr 8, 2016 at 6:34 AM, Niketan Pansare <[email protected]> wrote:

>
>
> Hi all,
>
> Here is a suggestion for reducing the barrier to entry for SystemML:
> "Have a detailed quickstart guide/video using Notebook on free (or
> trial-based) hosting solution like IBM Bluemix or Data Scientist Workbench".
>
> I have created a sample tutorial:
> https://github.com/niketanpansare/systemml_tutorial
>
> Missing items in above tutorial:
> 1. Create a separate section for Notebook rather than have it hidden under
> MLContext Programming guide (
> http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html
> ).
> 2. Add Python Notebooks (This requires attaching both jars and python
> MLContext to Zeppelin or Jupyter context).
> 3. Allow users to use jars from our nightly build (see my jupyter example)
> as well as released version (see my zeppelin example).
> 4. Tutorials for all our algorithms using real world dataset. Example:
> https://www.ibm.com/support/knowledgecenter/SSPT3X_2.1.2/com.ibm.swg.im.infosphere.biginsights.tut.doc/doc/tut_Mod_BigR.html
> .
> 5. DML Kernel for Zeppelin (see
> https://issues.apache.org/jira/browse/SYSTEMML-542).
> 6. Other hosting services such as AzureML.
> 7. Tutorial that shows SystemML's integration with MLPipeline.
>
> These missing items can be broken down into relatively small tasks with
> detailed specification that external contributors can work on. Any
> thoughts ?
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
