[13/15] madlib-site git commit: jupyter notebooks for 1.14 release
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/418f361c/community-artifacts/Decision-trees-v1.ipynb -- diff --git a/community-artifacts/Decision-trees-v1.ipynb b/community-artifacts/Decision-trees-v1.ipynb new file mode 100644 index 000..e97b943 --- /dev/null +++ b/community-artifacts/Decision-trees-v1.ipynb @@ -0,0 +1,1590 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ +"# Decision trees\n", +"\n", +"A decision tree is a supervised learning method that can be used for classification and regression. Its internal nodes represent tests on attributes, and the branches leaving each node represent the outcomes of those tests. Each leaf node holds a class label (classification) or a predicted value (regression), and the paths from the root to the leaf nodes define the set of classification or regression rules." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ +{ + "name": "stdout", + "output_type": "stream", + "text": [ + "The sql extension is already loaded. 
To reload it, use:\n", + " %reload_ext sql\n" + ] +} + ], + "source": [ +"%load_ext sql" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ +{ + "data": { + "text/plain": [ + "u'Connected: fmcquillan@madlib'" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" +} + ], + "source": [ +"# Greenplum Database 5.4.0 on GCP (demo machine)\n", +"#%sql postgresql://gpadmin@35.184.253.255:5432/madlib\n", +"\n", +"# PostgreSQL local\n", +"%sql postgresql://fmcquillan@localhost:5432/madlib\n", +"\n", +"# Greenplum Database 4.3.10.0\n", +"#%sql postgresql://gpdbchina@10.194.10.68:61000/madlib" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ +"%sql select madlib.version();\n", +"#%sql select version();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ +"# Decision tree classification examples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ +"# 1. Load data\n", +"Data set related to whether to play golf or not." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ +{ + "name": "stdout", + "output_type": "stream", + "text": [ + "Done.\n", + "Done.\n", + "14 rows affected.\n", + "14 rows affected.\n" + ] +}, +{ + "data": { + "text/html": [ + "\n", + "\n", + "id\n", + "OUTLOOK\n", + "temperature\n", + "humidity\n", + "Temp_Humidity\n", + "clouds_airquality\n", + "windy\n", + "class\n", + "observation_weight\n", + "\n", + "\n", + "1\n", + "sunny\n", + "85.0\n", + "85.0\n", + "[85.0, 85.0]\n", + "[u'none', u'unhealthy']\n", + "False\n", + "Don't Play\n", + "5.0\n", + "\n", + "\n", + "2\n", + "sunny\n", + "80.0\n", + "90.0\n", + "[80.0, 90.0]\n", + "[u'none', u'moderate']\n", + "True\n", + "Don't Play\n", + "5.0\n", + "\n", + "\n", + "3\n", + "overcast\n", + "83.0\n", + "78.0\n", + "[83.0, 78.0]\n", + "[u'low', u'moderate']\n", + "False\n", + "Play\n", + "1.5\n", + "\n", + "\n", + "4\n", + "rain\n", + "70.0\n", + "96.0\n", + "[70.0, 96.0]\n", + "[u'low', u'moderate']\n", + "False\n", + "Play\n", + "1.0\n", + "\n", + "\n", + "5\n", + "rain\n", + "68.0\n", + "80.0\n", + "[68.0, 80.0]\n", + "[u'medium', u'good']\n", + "False\n", + "Play\n", + "1.0\n", + "\n", + "\n", + "6\n", + "rain\n", + "65.0\n", + "70.0\n", + "[65.0, 70.0]\n", + "[u'low', u'unhealthy']\n", + "True\n", + "Don't Play\n", + "1.0\n", + "\n", + "\n", + "7\n", + "overcast\n", + "64.0\n", + "65.0\n", +
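The notebook's opening markdown cell describes a tree as a set of attribute tests (internal nodes) ending in predictions (leaves). As an illustration of how such a test is chosen, here is a minimal plain-Python sketch, not MADlib's implementation, that scores candidate splits by weighted Gini impurity. Assumptions: Gini is only one possible split criterion; `observation_weight` is treated as a per-row weight; only rows 1 to 6 of the golf table are used because the output above is cut off at row 7; the names `ROWS`, `gini` and `split_impurity` are hypothetical helpers introduced here.

```python
from collections import defaultdict

# Hypothetical data: the six golf rows that are fully visible in the output above.
# Tuple layout: (OUTLOOK, windy, class, observation_weight)
ROWS = [
    ("sunny", False, "Don't Play", 5.0),
    ("sunny", True, "Don't Play", 5.0),
    ("overcast", False, "Play", 1.5),
    ("rain", False, "Play", 1.0),
    ("rain", False, "Play", 1.0),
    ("rain", True, "Don't Play", 1.0),
]


def gini(rows):
    """Weighted Gini impurity 1 - sum_k p_k^2, where p_k is class k's share of total weight."""
    total = sum(r[3] for r in rows)
    by_class = defaultdict(float)
    for r in rows:
        by_class[r[2]] += r[3]
    return 1.0 - sum((w / total) ** 2 for w in by_class.values())


def split_impurity(rows, attr_index):
    """Weight-averaged impurity of the child nodes created by testing one categorical attribute."""
    total = sum(r[3] for r in rows)
    children = defaultdict(list)
    for r in rows:
        children[r[attr_index]].append(r)  # one branch per distinct attribute value
    return sum(
        sum(r[3] for r in child) / total * gini(child)
        for child in children.values()
    )


if __name__ == "__main__":
    print("root impurity   :", round(gini(ROWS), 4))
    print("split on OUTLOOK:", round(split_impurity(ROWS, 0), 4))
    print("split on windy  :", round(split_impurity(ROWS, 1), 4))
```

On these six rows, testing OUTLOOK leaves lower impurity than testing windy, so a greedy learner using this criterion would place OUTLOOK at the root; the full 14-row table could change that outcome.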