This is an automated email from the ASF dual-hosted git repository.
janardhan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemds.git
The following commit(s) were added to refs/heads/master by this push:
new 0b55591 [MINOR] Fix broken links and update version names (#1259)
0b55591 is described below
commit 0b555912411acc975ad95209f6a2f88ac96cf1b5
Author: j143 <[email protected]>
AuthorDate: Thu May 6 00:14:00 2021 +0530
[MINOR] Fix broken links and update version names (#1259)
* Add permalinks for the haberman.data
* Use the latest spark 2.4.7 version
* Make sure the content is fresh and relevant.
---
notebooks/systemds_dev.ipynb | 160 ++++++++++++++-----------------------------
1 file changed, 50 insertions(+), 110 deletions(-)
diff --git a/notebooks/systemds_dev.ipynb b/notebooks/systemds_dev.ipynb
index dd9d706..9ba218f 100644
--- a/notebooks/systemds_dev.ipynb
+++ b/notebooks/systemds_dev.ipynb
@@ -5,7 +5,9 @@
"colab": {
"name": "SystemDS on Colaboratory.ipynb",
"provenance": [],
- "collapsed_sections": []
+ "collapsed_sections": [],
+ "toc_visible": true,
+ "include_colab_link": true
},
"kernelspec": {
"name": "python3",
@@ -16,8 +18,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "XX60cA7YuZsw",
- "colab_type": "text"
+ "id": "XX60cA7YuZsw"
},
"source": [
"##### Copyright © 2020 The Apache Software Foundation."
@@ -27,9 +28,7 @@
"cell_type": "code",
"metadata": {
"id": "8GEGDZ9GuZGp",
- "colab_type": "code",
- "cellView": "form",
- "colab": {}
+ "cellView": "form"
},
"source": [
"# @title Apache Version 2.0 (The \"License\");\n",
@@ -60,8 +59,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "_BbCdLjRoy2A",
- "colab_type": "text"
+ "id": "_BbCdLjRoy2A"
},
"source": [
"### Developer notebook for Apache SystemDS"
@@ -70,8 +68,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "zhdfvxkEq1BX",
- "colab_type": "text"
+ "id": "zhdfvxkEq1BX"
},
"source": [
"Run this notebook online at [Google Colab
↗](https://colab.research.google.com/github/apache/systemds/blob/master/notebooks/systemds_dev.ipynb).\n",
@@ -82,8 +79,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "efFVuggts1hr",
- "colab_type": "text"
+ "id": "efFVuggts1hr"
},
"source": [
"This Jupyter/Colab-based tutorial will interactively walk through
development setup and running SystemDS in both the\n",
@@ -99,8 +95,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "vBC5JPhkGbIV",
- "colab_type": "text"
+ "id": "vBC5JPhkGbIV"
},
"source": [
"#### Download and Install the dependencies\n",
@@ -113,8 +108,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "VkLasseNylPO",
- "colab_type": "text"
+ "id": "VkLasseNylPO"
},
"source": [
"##### Setup\n",
@@ -125,9 +119,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "4Wmf-7jfydVH",
- "colab_type": "code",
- "colab": {}
+ "id": "4Wmf-7jfydVH"
},
"source": [
"# Run and print a shell command.\n",
@@ -142,8 +134,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "kvD4HBMi0ohY",
- "colab_type": "text"
+ "id": "kvD4HBMi0ohY"
},
"source": [
"##### Install Java\n",
@@ -153,11 +144,10 @@
{
"cell_type": "code",
"metadata": {
- "id": "8Xnb_ePUyQIL",
- "colab_type": "code",
- "colab": {}
+ "id": "8Xnb_ePUyQIL"
},
"source": [
+ "!apt-get update\n",
"!apt-get install openjdk-8-jdk-headless -qq > /dev/null\n",
"\n",
"# run the below command to replace the existing installation\n",
@@ -174,8 +164,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "BhmBWf3u3Q0o",
- "colab_type": "text"
+ "id": "BhmBWf3u3Q0o"
},
"source": [
"##### Install Apache Maven\n",
@@ -188,9 +177,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "I81zPDcblchL",
- "colab_type": "code",
- "colab": {}
+ "id": "I81zPDcblchL"
},
"source": [
"# Download the maven source.\n",
@@ -214,8 +201,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "Xphbe3R43XLw",
- "colab_type": "text"
+ "id": "Xphbe3R43XLw"
},
"source": [
"##### Install Apache Spark (Optional, if you want to work with spark
backend)\n"
@@ -224,8 +210,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "_WgEa00pTs3w",
- "colab_type": "text"
+ "id": "_WgEa00pTs3w"
},
"source": [
"NOTE: If spark is not downloaded. Let us make sure the version we are
trying to download is officially supported at\n",
@@ -235,18 +220,16 @@
{
"cell_type": "code",
"metadata": {
- "id": "3zdtkFkLnskx",
- "colab_type": "code",
- "colab": {}
+ "id": "3zdtkFkLnskx"
},
"source": [
"# Spark and Hadoop version\n",
- "spark_version = 'spark-2.4.6'\n",
+ "spark_version = 'spark-2.4.7'\n",
"hadoop_version = 'hadoop2.7'\n",
"spark_path = f\"/opt/{spark_version}-bin-{hadoop_version}\"\n",
"if not os.path.exists(spark_path):\n",
" run(f\"wget -q -nc -O apache-spark.tgz
https://downloads.apache.org/spark/{spark_version}/{spark_version}-bin-{hadoop_version}.tgz\")\n",
- " run('tar zxf apache-spark.tgz -C /opt')\n",
+ " run('tar zxfv apache-spark.tgz -C /opt')\n",
" run('rm -f apache-spark.tgz')\n",
"\n",
"os.environ[\"SPARK_HOME\"] = spark_path\n",
@@ -258,8 +241,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "91pJ5U8k3cjk",
- "colab_type": "text"
+ "id": "91pJ5U8k3cjk"
},
"source": [
"#### Get Apache SystemDS\n",
@@ -270,9 +252,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "SaPIprmg3lKE",
- "colab_type": "code",
- "colab": {}
+ "id": "SaPIprmg3lKE"
},
"source": [
"!git clone https://github.com/apache/systemds systemds --depth=1\n",
@@ -284,8 +264,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "40Fo9tPUzbWK",
- "colab_type": "text"
+ "id": "40Fo9tPUzbWK"
},
"source": [
"##### Build the project"
@@ -294,9 +273,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "s0Iorb0ICgHa",
- "colab_type": "code",
- "colab": {}
+ "id": "s0Iorb0ICgHa"
},
"source": [
"# Logging flags: -q only for ERROR; -X for DEBUG; -e for ERROR\n",
@@ -312,8 +289,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "SUGac5w9ZRBQ",
- "colab_type": "text"
+ "id": "SUGac5w9ZRBQ"
},
"source": [
"### A. Working with SystemDS in **standalone** mode\n",
@@ -325,8 +301,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "g5Nk2Bb4UU2O",
- "colab_type": "text"
+ "id": "g5Nk2Bb4UU2O"
},
"source": [
"##### 1. Set SystemDS environment variables\n",
@@ -337,9 +312,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "2ZnSzkq8UT32",
- "colab_type": "code",
- "colab": {}
+ "id": "2ZnSzkq8UT32"
},
"source": [
"!export SYSTEMDS_ROOT=$(pwd)\n",
@@ -351,8 +324,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "zyLmFCv6ZYk5",
- "colab_type": "text"
+ "id": "zyLmFCv6ZYk5"
},
"source": [
"##### 2. Download Haberman data\n",
@@ -373,9 +345,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "ZrQFBQehV8SF",
- "colab_type": "code",
- "colab": {}
+ "id": "ZrQFBQehV8SF"
},
"source": [
"!mkdir ../data"
@@ -386,12 +356,10 @@
{
"cell_type": "code",
"metadata": {
- "id": "E1ZFCTFmXFY_",
- "colab_type": "code",
- "colab": {}
+ "id": "E1ZFCTFmXFY_"
},
"source": [
- "!wget -P ../data/
http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data"
+ "!wget -P ../data/
https://web.archive.org/web/20200725014530/https://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data"
],
"execution_count": null,
"outputs": []
@@ -399,9 +367,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "FTo8Py_vOGpX",
- "colab_type": "code",
- "colab": {}
+ "id": "FTo8Py_vOGpX"
},
"source": [
"# Display first 10 lines of the dataset\n",
@@ -414,8 +380,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "Oy2kgVdkaeWK",
- "colab_type": "text"
+ "id": "Oy2kgVdkaeWK"
},
"source": [
"##### 2.1 Set `metadata` for the data\n",
@@ -428,9 +393,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "vfypIgJWXT6K",
- "colab_type": "code",
- "colab": {}
+ "id": "vfypIgJWXT6K"
},
"source": [
"# generate metadata file for the dataset\n",
@@ -446,8 +409,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "7Vis3V31bA53",
- "colab_type": "text"
+ "id": "7Vis3V31bA53"
},
"source": [
"##### 3. Find the algorithm to run with `systemds`"
@@ -456,9 +418,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "L_0KosFhbhun",
- "colab_type": "code",
- "colab": {}
+ "id": "L_0KosFhbhun"
},
"source": [
"# Inspect the directory structure of systemds code base\n",
@@ -470,9 +430,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "R7C5DVM7YfTb",
- "colab_type": "code",
- "colab": {}
+ "id": "R7C5DVM7YfTb"
},
"source": [
"# List all the scripts (also called top level algorithms!)\n",
@@ -484,9 +442,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "5PrxwviWJhNd",
- "colab_type": "code",
- "colab": {}
+ "id": "5PrxwviWJhNd"
},
"source": [
"# Lets choose univariate statistics script.\n",
@@ -500,9 +456,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "zv_7wRPFSeuJ",
- "colab_type": "code",
- "colab": {}
+ "id": "zv_7wRPFSeuJ"
},
"source": [
"!./bin/systemds ./scripts/algorithms/Univar-Stats.dml -nvargs
X=../data/haberman.data TYPES=../data/types.csv STATS=../data/univarOut.mtx
CONSOLE_OUTPUT=TRUE"
@@ -513,8 +467,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "IqY_ARNnavrC",
- "colab_type": "text"
+ "id": "IqY_ARNnavrC"
},
"source": [
"##### 3.1 Let us inspect the output data"
@@ -523,9 +476,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "k-_eQg9TauPi",
- "colab_type": "code",
- "colab": {}
+ "id": "k-_eQg9TauPi"
},
"source": [
"# output first 10 lines only.\n",
@@ -537,8 +488,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "o5VCCweiDMjf",
- "colab_type": "text"
+ "id": "o5VCCweiDMjf"
},
"source": [
"#### B. Run SystemDS with Apache Spark"
@@ -547,8 +497,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "6gJhL7lc1vf7",
- "colab_type": "text"
+ "id": "6gJhL7lc1vf7"
},
"source": [
"#### Playground for DML scripts\n",
@@ -559,8 +508,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "zzqeSor__U6M",
- "colab_type": "text"
+ "id": "zzqeSor__U6M"
},
"source": [
"##### A test `dml` script to prototype algorithms\n",
@@ -572,9 +520,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "t59rTyNbOF5b",
- "colab_type": "code",
- "colab": {}
+ "id": "t59rTyNbOF5b"
},
"source": [
"%%writefile ../test.dml\n",
@@ -590,8 +536,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "VDfeuJYE1JfK",
- "colab_type": "text"
+ "id": "VDfeuJYE1JfK"
},
"source": [
"Submit the `dml` script to Spark with `spark-submit`.\n",
@@ -601,9 +546,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "YokktyNE1Cig",
- "colab_type": "code",
- "colab": {}
+ "id": "YokktyNE1Cig"
},
"source": [
"!$SPARK_HOME/bin/spark-submit \\\n",
@@ -615,8 +558,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "gCMkudo_-8_8",
- "colab_type": "text"
+ "id": "gCMkudo_-8_8"
},
"source": [
"##### Run a binary classification example with sample data\n",
@@ -627,9 +569,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "OSLq2cZb_SUl",
- "colab_type": "code",
- "colab": {}
+ "id": "OSLq2cZb_SUl"
},
"source": [
"# Example binary classification task with sample data.\n",