Repository: incubator-systemml
Updated Branches:
  refs/heads/gh-pages 940e1ea73 -> 254f9e5da
[SYSTEMML-1088] [SYSTEMML-1090] Remove the need for label mapping for Scala wrappers

- Also, removed the need to set SparkContext if already created (for example in interactive pyspark shell).
- Updated the documentation.

Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/254f9e5d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/254f9e5d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/254f9e5d

Branch: refs/heads/gh-pages
Commit: 254f9e5dadd601122fe861da4f1018dd2780e826
Parents: 940e1ea
Author: Niketan Pansare <npan...@us.ibm.com>
Authored: Wed Nov 16 15:26:42 2016 -0800
Committer: Niketan Pansare <npan...@us.ibm.com>
Committed: Wed Nov 16 15:28:31 2016 -0800

----------------------------------------------------------------------
 beginners-guide-python.md | 60 ++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 35 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/254f9e5d/beginners-guide-python.md
----------------------------------------------------------------------
diff --git a/beginners-guide-python.md b/beginners-guide-python.md
index 65eef50..8d597bf 100644
--- a/beginners-guide-python.md
+++ b/beginners-guide-python.md
@@ -72,54 +72,41 @@ brew install apache-spark16

 #### Step 1: Install SystemML Python package

+We are working towards uploading the python package on pypi. Until then, please use following commands:
+
 ```bash
-pip install systemml
+git checkout https://github.com/apache/incubator-systemml.git
+cd incubator-systemml
+mvn post-integration-test -P distribution -DskipTests
+pip install src/main/python/dist/systemml-incubating-0.11.0.dev1.tar.gz
 ```

-#### Step 2: Download SystemML Java binaries
-
-SystemML Python package downloads the corresponding Java binaries (along with algorithms) and places them
-into the installed location. To find the location of the downloaded Java binaries, use the following command:
-
+The above commands will install Python package and place the corresponding Java binaries (along with algorithms) into the installed location.
+To find the location of the downloaded Java binaries, use the following command:
 ```bash
 python -c 'import imp; import os; print os.path.join(imp.find_module("systemml")[1], "systemml-java")'
 ```

-#### Step 3: (Optional but recommended) Set SYSTEMML_HOME environment variable
-<div class="codetabs">
-<div data-lang="OSX" markdown="1">
-```bash
-SYSTEMML_HOME=`python -c 'import imp; import os; print os.path.join(imp.find_module("systemml")[1], "systemml-java")'`
-# If you are using zsh or ksh or csh, append it to ~/.zshrc or ~/.profile or ~/.login respectively.
-echo '' >> ~/.bashrc
-echo 'export SYSTEMML_HOME='$SYSTEMML_HOME >> ~/.bashrc
-```
-</div>
-<div data-lang="Linux" markdown="1">
-```bash
-SYSTEMML_HOME=`python -c 'import imp; import os; print os.path.join(imp.find_module("systemml")[1], "systemml-java")'`
-# If you are using zsh or ksh or csh, append it to ~/.zshrc or ~/.profile or ~/.login respectively.
-echo '' >> ~/.bashrc
-echo 'export SYSTEMML_HOME='$SYSTEMML_HOME >> ~/.bashrc
-```
-</div>
-</div>
-
 Note: the user is free to either use the prepackaged Java binaries
 or download them from [SystemML website](http://systemml.apache.org/download.html)
 or build them from the [source](https://github.com/apache/incubator-systemml).

+To uninstall SystemML, please use following command:
+```bash
+pip uninstall systemml-incubating
+```
+
 ### Start Pyspark shell

 <div class="codetabs">
 <div data-lang="OSX" markdown="1">
 ```bash
-pyspark --master local[*] --driver-class-path $SYSTEMML_HOME"/SystemML.jar"
+pyspark --master local[*]
 ```
 </div>
 <div data-lang="Linux" markdown="1">
 ```bash
-pyspark --master local[*] --driver-class-path $SYSTEMML_HOME"/SystemML.jar"
+pyspark --master local[*]
 ```
 </div>
 </div>
@@ -131,7 +118,6 @@ To get started with SystemML, let's try few elementary matrix multiplication ope
 ```python
 import systemml as sml
 import numpy as np
-sml.setSparkContext(sc)
 m1 = sml.matrix(np.ones((3,3)) + 2)
 m2 = sml.matrix(np.ones((3,3)) + 3)
 m2 = m1 * (m2 + m1)
@@ -166,7 +152,6 @@ X_test = diabetes_X[-20:]
 y_train = diabetes.target[:-20]
 y_test = diabetes.target[-20:]
 # Train Linear Regression model
-sml.setSparkContext(sc)
 X = sml.matrix(X_train)
 y = sml.matrix(y_train)
 A = X.transpose().dot(X)
@@ -236,7 +221,7 @@ from pyspark.sql import SQLContext
 sqlCtx = SQLContext(sc)
 digits = datasets.load_digits()
 X_digits = digits.data
-y_digits = digits.target + 1
+y_digits = digits.target
 n_samples = len(X_digits)
 X_train = X_digits[:.9 * n_samples]
 y_train = y_digits[:.9 * n_samples]
@@ -260,18 +245,23 @@ To train the above algorithm on larger dataset, we can load the dataset into Dat
 from sklearn import datasets, neighbors
 from systemml.mllearn import LogisticRegression
 from pyspark.sql import SQLContext
+import pandas as pd
+from sklearn.metrics import accuracy_score
 import systemml as sml
 sqlCtx = SQLContext(sc)
 digits = datasets.load_digits()
 X_digits = digits.data
-y_digits = digits.target + 1
+y_digits = digits.target
 n_samples = len(X_digits)
 # Split the data into training/testing sets and convert to PySpark DataFrame
 df_train = sml.convertToLabeledDF(sqlContext, X_digits[:.9 * n_samples], y_digits[:.9 * n_samples])
-X_test = X_digits[.9 * n_samples:]
-y_test = y_digits[.9 * n_samples:]
+X_test = sqlCtx.createDataFrame(pd.DataFrame(X_digits[.9 * n_samples:]))
 logistic = LogisticRegression(sqlCtx)
-print('LogisticRegression score: %f' % logistic.fit(df_train).score(X_test, y_test))
+logistic.fit(df_train)
+y_predicted = logistic.predict(X_test)
+y_predicted = y_predicted.select('prediction').toPandas().as_matrix().flatten()
+y_test = y_digits[.9 * n_samples:]
+print('LogisticRegression score: %f' % accuracy_score(y_test, y_predicted))
 ```

 Output: