Repository: incubator-systemml
Updated Branches:
  refs/heads/master 266d4c8ce -> efa8c4984


[SYSTEMML-594] Updating the flight delay demo notebook

This modification allows the notebook to be executed on Data Scientist
Workbench, which runs Spark 1.4.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/efa8c498
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/efa8c498
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/efa8c498

Branch: refs/heads/master
Commit: efa8c49840953eec16ffe78775c42b0150877417
Parents: 266d4c8
Author: Niketan Pansare <npan...@us.ibm.com>
Authored: Wed Jun 22 16:55:05 2016 -0700
Committer: Niketan Pansare <npan...@us.ibm.com>
Committed: Wed Jun 22 16:55:05 2016 -0700

----------------------------------------------------------------------
 samples/jupyter-notebooks/Flight_Delay_Demo.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/efa8c498/samples/jupyter-notebooks/Flight_Delay_Demo.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Flight_Delay_Demo.ipynb b/samples/jupyter-notebooks/Flight_Delay_Demo.ipynb
index 877d110..db153b1 100644
--- a/samples/jupyter-notebooks/Flight_Delay_Demo.ipynb
+++ b/samples/jupyter-notebooks/Flight_Delay_Demo.ipynb
@@ -1 +1 @@
-{"nbformat_minor": 0, "cells": [{"source": "# Flight Delay Prediction Demo 
Using SystemML", "cell_type": "markdown", "metadata": {}}, {"source": "This 
notebook is based on datascientistworkbench.com's tutorial notebook for 
predicting flight delay.", "cell_type": "markdown", "metadata": {}}, {"source": 
"## Loading SystemML ", "cell_type": "markdown", "metadata": {}}, {"source": 
"To use one of the released version, use \"%AddDeps org.apache.systemml 
systemml 0.9.0-incubating\". To use nightly build, \"%AddJar 
https://sparktc.ibmcloud.com/repo/latest/SystemML.jar\"\n\nOr you provide 
SystemML.jar and dependency through commandline when starting the notebook (for 
example: --packages com.databricks:spark-csv_2.10:1.4.0 --jars SystemML.jar)", 
"cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": 
"code", "source": "%AddJar 
https://sparktc.ibmcloud.com/repo/latest/SystemML.jar", "outputs": 
[{"output_type": "stream", "name": "stdout", "text": "Using cached version of 
SystemML.jar\n"}], "metadata": {"collapsed": false, "trusted": true}}, 
{"source": "Use Spark's CSV package for loading the CSV file", "cell_type": 
"markdown", "metadata": {}}, {"execution_count": 2, "cell_type": "code", 
"source": "%AddDeps com.databricks spark-csv_2.10 1.4.0", "outputs": 
[{"output_type": "stream", "name": "stdout", "text": ":: loading settings :: 
url = 
jar:file:/usr/local/spark-kernel/lib/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml\n::
 resolving dependencies :: com.ibm.spark#spark-kernel;working [not 
transitive]\n\tconfs: [default]\n\tfound com.databricks#spark-csv_2.10;1.4.0 in 
central\n:: resolution report :: resolve 98ms :: artifacts dl 5ms\n\t:: modules 
in use:\n\tcom.databricks#spark-csv_2.10;1.4.0 from central in 
[default]\n\t---------------------------------------------------------------------\n\t|
                  |            modules            ||   artifacts   |\n\t|       
conf       | number| search|dwnlded|evicted|| number|dwnlded|\n\t---------------------------------------------------------------------\n\t|      
default     |   1   |   0   |   0   |   0   ||   1   |   0   
|\n\t---------------------------------------------------------------------\n:: 
retrieving :: com.ibm.spark#spark-kernel\n\tconfs: [default]\n\t0 artifacts 
copied, 1 already retrieved (0kB/10ms)\n"}], "metadata": {"collapsed": false, 
"trusted": true}}, {"source": "## Import Data", "cell_type": "markdown", 
"metadata": {"collapsed": true}}, {"source": "Download the airline dataset from 
stat-computing.org if not already downloaded", "cell_type": "markdown", 
"metadata": {}}, {"execution_count": 3, "cell_type": "code", "source": "import 
sys.process._\nimport java.net.URL\nimport java.io.File\nval url = 
\"http://stat-computing.org/dataexpo/2009/2007.csv.bz2\"\nval localFilePath = 
\"airline2007.csv.bz2\"\nif(!new java.io.File(localFilePath).exists) {\n    new 
URL(url) #> new File(localFilePath) !!\n}", "outputs": [], "metadata": 
{"collapsed": false, "trusted": true}}, {"source": "Load the dataset into DataFrame using Spark CSV 
package", "cell_type": "markdown", "metadata": {}}, {"execution_count": null, 
"cell_type": "code", "source": "import org.apache.spark.sql.SQLContext\nval 
sqlContext = new SQLContext(sc)\nval fmt = 
sqlContext.read.format(\"com.databricks.spark.csv\")\nval opt = 
fmt.options(Map(\"header\"->\"true\", \"inferSchema\"->\"true\"))\nval airline 
= opt.load(localFilePath).na.replace( \"*\", Map(\"NA\" -> \"0.0\") )", 
"outputs": [], "metadata": {"collapsed": false, "trusted": true}}, 
{"execution_count": null, "cell_type": "code", "source": "airline.printSchema", 
"outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": 
"## Data Exploration\nWhich airports have the most delays?", "cell_type": 
"markdown", "metadata": {}}, {"execution_count": null, "cell_type": "code", 
"source": "airline.registerTempTable(\"airline\")\nsqlContext.sql(\"\"\"SELECT 
Origin, count(*) conFlight, avg(DepDelay) delay\n         
            FROM airline\n                    GROUP BY Origin\n                 
   ORDER BY delay DESC\"\"\").show", "outputs": [], "metadata": {"collapsed": 
false, "trusted": true}}, {"source": "## Modeling: Logistic 
Regression\n\nPredict departure delays of greater than 15 of flights from JFK", 
"cell_type": "markdown", "metadata": {}}, {"execution_count": null, 
"cell_type": "code", "source": "sqlContext.udf.register(\"checkDelay\", 
(depDelay:String) => try { if(depDelay.toDouble > 15) 1.0 else 2.0 } catch { 
case e:Exception => 1.0 })\nval smallAirlineData = sqlContext.sql(\"SELECT *, 
checkDelay(DepDelay) label FROM airline WHERE Origin = 'JFK'\")\nval datasets = 
smallAirlineData.randomSplit(Array(0.7, 0.3))\nval trainDataset = 
datasets(0).cache\nval testDataset = 
datasets(1).cache\ntrainDataset.count\ntestDataset.count", "outputs": [], 
"metadata": {"collapsed": false, "trusted": true}}, {"source": "### Feature 
selection", "cell_type": "markdown", "metadata": {}}, {"source": "Encode 
the destination using one-hot encoding and include the columns Year, Month, 
DayofMonth, DayOfWeek, Distance", "cell_type": "markdown", "metadata": {}}, 
{"execution_count": null, "cell_type": "code", "source": "import 
org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, 
VectorAssembler}\n\nval indexer = new 
StringIndexer().setInputCol(\"Dest\").setOutputCol(\"DestIndex\") // 
.setHandleInvalid(\"skip\") // Only works on Spark 1.6 or later\nval encoder = 
new OneHotEncoder().setInputCol(\"DestIndex\").setOutputCol(\"DestVec\")\nval 
assembler = new 
VectorAssembler().setInputCols(Array(\"Year\",\"Month\",\"DayofMonth\",\"DayOfWeek\",\"Distance\",\"DestVec\")).setOutputCol(\"features\")",
 "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": 
"### Build the model: Use SystemML's MLPipeline wrapper. \n\nThis wrapper 
invokes MultiLogReg.dml (for training) and GLM-predict.dml (for prediction). 
These DML algorithms are available at 
https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms", "cell_type": "markdown", 
"metadata": {}}, {"execution_count": 9, "cell_type": "code", "source": "import 
org.apache.spark.ml.Pipeline\nimport 
org.apache.sysml.api.ml.LogisticRegression\n\nval lr = new 
LogisticRegression(\"log\", 
sc).setRegParam(1e-4).setTol(1e-2).setMaxInnerIter(5).setMaxOuterIter(5)\n\nval 
pipeline = new Pipeline().setStages(Array(indexer, encoder, assembler, 
lr))\nval model = pipeline.fit(trainDataset)", "outputs": [{"output_type": 
"stream", "name": "stdout", "text": "BEGIN MULTINOMIAL LOGISTIC REGRESSION 
SCRIPT\nReading X...\nReading Y...\n-- Initially:  Objective = 
61247.87116863789,  Gradient Norm = 4.86977583580406E7,  Trust Delta = 
0.0013049801226298022\n-- Outer Iteration 1: Had 1 CG iterations\n   -- 
Obj.Reduction:  Actual = 10085.186599774679,  Predicted = 9703.748642685421  
(A/P: 1.0393),  Trust Delta = 4.148360874623699E-4\n   -- New Objective = 
51162.68456886321,  Beta Change Norm = 3.9852958205347075E-4,  
Gradient Norm = 3857928.1315281712\n \n-- Outer Iteration 2: Had 2 CG 
iterations\n   -- Obj.Reduction:  Actual = 140.7058278433251,  Predicted = 
138.05703502188976  (A/P: 1.0192),  Trust Delta = 4.148360874623699E-4\n   -- 
New Objective = 51021.978741019884,  Beta Change Norm = 1.251386078420451E-4,  
Gradient Norm = 110384.84459596197\nTermination / Convergence condition 
satisfied.\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"source": 
"### Evaluate the model \n\nOutput RMS error on test data", "cell_type": 
"markdown", "metadata": {}}, {"execution_count": null, "cell_type": "code", 
"source": "val predictions = 
model.transform(testDataset.withColumnRenamed(\"label\", 
\"OriginalLabel\"))\npredictions.registerTempTable(\"predictions\")\nsqlContext.sql(\"SELECT
 sqrt(avg(pow(OriginalLabel - label, 2.0))) FROM predictions\").show", 
"outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": 
"### Perform k-fold cross-validation to tune the hyperparameters\n\nPerform 
cross-validation to tune the regularization parameter for Logistic 
regression.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 
null, "cell_type": "code", "source": "import 
org.apache.spark.ml.evaluation.BinaryClassificationEvaluator\nimport 
org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}\n\nval crossval = 
new CrossValidator().setEstimator(pipeline).setEvaluator(new 
BinaryClassificationEvaluator)\nval paramGrid = new 
ParamGridBuilder().addGrid(lr.regParam, Array(0.1, 1e-3, 
1e-6)).build()\ncrossval.setEstimatorParamMaps(paramGrid)\ncrossval.setNumFolds(2)
 // Setting k = 2\nval cvmodel = crossval.fit(trainDataset)", "outputs": [], 
"metadata": {"collapsed": true, "trusted": true}}, {"source": "### Evaluate the 
cross-validated model", "cell_type": "markdown", "metadata": {}}, 
{"execution_count": null, "cell_type": "code", "source": "val cvpredictions = 
cvmodel.transform(testDataset.withColumnRenamed(\"label\", 
\"OriginalLabel\"))\ncvpredictions.registerTempTable(\"cvpredictions\")\nsqlContext.sql(\"SELECT 
sqrt(avg(pow(OriginalLabel - label, 2.0))) FROM cvpredictions\").show", 
"outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": 
"## Homework ;)\n\nRead 
http://apache.github.io/incubator-systemml/algorithms-classification.html#multinomial-logistic-regression
 and perform cross validation on other hyperparameters: for example: icpt, tol, 
maxOuterIter, maxInnerIter", "cell_type": "markdown", "metadata": {}}], 
"nbformat": 4, "metadata": {"kernelspec": {"display_name": "Scala 2.10.4 (Spark 
1.5.2)", "name": "spark", "language": "scala"}, "language_info": {"name": 
"scala"}}}
\ No newline at end of file
+{"nbformat_minor": 0, "cells": [{"source": "# Flight Delay Prediction Demo 
Using SystemML", "cell_type": "markdown", "metadata": {}}, {"source": "This 
notebook is based on datascientistworkbench.com's tutorial notebook for 
predicting flight delay.", "cell_type": "markdown", "metadata": {}}, {"source": 
"## Loading SystemML ", "cell_type": "markdown", "metadata": {}}, {"source": 
"To use one of the released version, use \"%AddDeps org.apache.systemml 
systemml 0.9.0-incubating\". To use nightly build, \"%AddJar 
https://sparktc.ibmcloud.com/repo/latest/SystemML.jar\"\n\nOr you provide 
SystemML.jar and dependency through commandline when starting the notebook (for 
example: --packages com.databricks:spark-csv_2.10:1.4.0 --jars SystemML.jar)", 
"cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": 
"code", "source": "%AddJar 
https://sparktc.ibmcloud.com/repo/latest/SystemML.jar", "outputs": 
[{"output_type": "stream", "name": "stdout", "text": "Using cached version of 
SystemML.jar\n"}], "metadata": {"collapsed": false, "trusted": true}}, 
{"source": "Use Spark's CSV package for loading the CSV file", "cell_type": 
"markdown", "metadata": {}}, {"execution_count": 2, "cell_type": "code", 
"source": "%AddDeps com.databricks spark-csv_2.10 1.4.0", "outputs": 
[{"output_type": "stream", "name": "stdout", "text": ":: loading settings :: 
url = 
jar:file:/usr/local/spark-kernel/lib/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml\n::
 resolving dependencies :: com.ibm.spark#spark-kernel;working [not 
transitive]\n\tconfs: [default]\n\tfound com.databricks#spark-csv_2.10;1.4.0 in 
central\ndownloading 
https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.4.0/spark-csv_2.10-1.4.0.jar
 ...\n\t[SUCCESSFUL ] com.databricks#spark-csv_2.10;1.4.0!spark-csv_2.10.jar 
(68ms)\n:: resolution report :: resolve 642ms :: artifacts dl 72ms\n\t:: 
modules in use:\n\tcom.databricks#spark-csv_2.10;1.4.0 from central in 
[default]\n\t---------------------------------------------------------------------\n\t|                  |            
modules            ||   artifacts   |\n\t|       conf       | number| 
search|dwnlded|evicted|| 
number|dwnlded|\n\t---------------------------------------------------------------------\n\t|
      default     |   1   |   1   |   1   |   0   ||   1   |   1   
|\n\t---------------------------------------------------------------------\n:: 
retrieving :: com.ibm.spark#spark-kernel\n\tconfs: [default]\n\t1 artifacts 
copied, 0 already retrieved (153kB/9ms)\n"}], "metadata": {"collapsed": false, 
"trusted": true}}, {"source": "## Import Data", "cell_type": "markdown", 
"metadata": {"collapsed": true}}, {"source": "Download the airline dataset from 
stat-computing.org if not already downloaded", "cell_type": "markdown", 
"metadata": {}}, {"execution_count": 3, "cell_type": "code", "source": "import 
sys.process._\nimport java.net.URL\nimport java.io.File\nval url = 
\"http://stat-computing.org/dataexpo/2009/2007.csv.bz2\"\nval localFilePath = 
\"airline2007.csv.bz2\"\nif(!new 
java.io.File(localFilePath).exists) {\n    new URL(url) #> new 
File(localFilePath) !!\n}", "outputs": [], "metadata": {"collapsed": false, 
"trusted": true}}, {"source": "Load the dataset into DataFrame using Spark CSV 
package", "cell_type": "markdown", "metadata": {}}, {"execution_count": 4, 
"cell_type": "code", "source": "import org.apache.spark.sql.SQLContext\nimport 
org.apache.spark.storage.StorageLevel\nval sqlContext = new SQLContext(sc)\nval 
fmt = sqlContext.read.format(\"com.databricks.spark.csv\")\nval opt = 
fmt.options(Map(\"header\"->\"true\", \"inferSchema\"->\"true\"))\nval airline 
= opt.load(localFilePath).na.replace( \"*\", Map(\"NA\" -> \"0.0\") )", 
"outputs": [], "metadata": {"collapsed": false, "trusted": true}}, 
{"execution_count": 5, "cell_type": "code", "source": "airline.printSchema", 
"outputs": [{"output_type": "stream", "name": "stdout", "text": "root\n |-- 
Year: integer (nullable = true)\n |-- Month: integer 
(nullable = true)\n |-- DayofMonth: integer (nullable = true)\n |-- DayOfWeek: 
integer (nullable = true)\n |-- DepTime: string (nullable = true)\n |-- 
CRSDepTime: integer (nullable = true)\n |-- ArrTime: string (nullable = true)\n 
|-- CRSArrTime: integer (nullable = true)\n |-- UniqueCarrier: string (nullable 
= true)\n |-- FlightNum: integer (nullable = true)\n |-- TailNum: string 
(nullable = true)\n |-- ActualElapsedTime: string (nullable = true)\n |-- 
CRSElapsedTime: string (nullable = true)\n |-- AirTime: string (nullable = 
true)\n |-- ArrDelay: string (nullable = true)\n |-- DepDelay: string (nullable 
= true)\n |-- Origin: string (nullable = true)\n |-- Dest: string (nullable = 
true)\n |-- Distance: integer (nullable = true)\n |-- TaxiIn: integer (nullable 
= true)\n |-- TaxiOut: integer (nullable = true)\n |-- Cancelled: integer 
(nullable = true)\n |-- CancellationCode: string (nullable = true)\n |-- 
Diverted: integer (nullable = true)\n |-- CarrierDelay: integer (nullable = 
true)\n |-- WeatherDelay: integer (nullable = true)\n |-- NASDelay: integer (nullable 
= true)\n |-- SecurityDelay: integer (nullable = true)\n |-- LateAircraftDelay: 
integer (nullable = true)\n\n"}], "metadata": {"collapsed": false, "trusted": 
true}}, {"source": "## Data Exploration\nWhich airports have the most delays?", 
"cell_type": "markdown", "metadata": {}}, {"execution_count": 6, "cell_type": 
"code", "source": 
"airline.registerTempTable(\"airline\")\nsqlContext.sql(\"\"\"SELECT Origin, 
count(*) conFlight, avg(DepDelay) delay\n                    FROM airline\n     
               GROUP BY Origin\n                    ORDER BY delay 
DESC\"\"\").show", "outputs": [{"output_type": "stream", "name": "stdout", 
"text": "+------+---------+------------------+\n|Origin|conFlight|             
delay|\n+------+---------+------------------+\n|   PIR|        4|              
45.5|\n|   ACK|      314|45.296178343949045|\n|   SOP|      195| 
34.02051282051282|\n|   HHH|      997| 22.58776328986961|\n|   
MCN|      992|22.496975806451612|\n|   AKN|      235|21.123404255319148|\n|   
CEC|     1055|20.807582938388627|\n|   GNV|     1927| 20.69797612869746|\n|   
EYW|     1052|20.224334600760457|\n|   ACY|      735|20.141496598639456|\n|   
SPI|     1745|19.545558739255014|\n|   GST|       90|19.233333333333334|\n|   
EWR|   154113|18.800853918877706|\n|   BRW|      726| 18.02754820936639|\n|   
AGS|     2286|17.728346456692915|\n|   ORD|   375784|17.695756072637472|\n|   
TRI|     1207| 17.63628831814416|\n|   SBN|     5128|17.505850234009362|\n|   
FAY|     2185| 17.48970251716247|\n|   PHL|   
104063|17.067776250924922|\n+------+---------+------------------+\nonly showing 
top 20 rows\n\n"}], "metadata": {"collapsed": false, "trusted": true}}, 
{"source": "## Modeling: Logistic Regression\n\nPredict departure delays of 
greater than 15 of flights from JFK", "cell_type": "markdown", "metadata": {}}, 
{"execution_count": 8, "cell_type": "code", "source": 
"sqlContext.udf.register(\"checkDelay\", 
(depDelay:String) => try { if(depDelay.toDouble > 15) 1.0 else 2.0 } catch { 
case e:Exception => 1.0 })\nval tempSmallAirlineData = sqlContext.sql(\"SELECT 
*, checkDelay(DepDelay) label FROM airline WHERE Origin = 
'JFK'\").persist(StorageLevel.MEMORY_AND_DISK)\nval popularDest = 
tempSmallAirlineData.select(\"Dest\").map(y => (y.get(0).toString, 
1)).reduceByKey(_ + _).filter(_._2 > 
1000).collect.toMap\nsqlContext.udf.register(\"onlyUsePopularDest\", (x:String) 
=> 
popularDest.contains(x))\ntempSmallAirlineData.registerTempTable(\"tempAirline\")\nval
 smallAirlineData = sqlContext.sql(\"SELECT * FROM tempAirline WHERE 
onlyUsePopularDest(Dest)\")\n\nval datasets = 
smallAirlineData.randomSplit(Array(0.7, 0.3))\nval trainDataset = 
datasets(0).cache\nval testDataset = 
datasets(1).cache\ntrainDataset.count\ntestDataset.count", "outputs": 
[{"execution_count": 8, "output_type": "execute_result", "data": {"text/plain": 
"34773"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, 
{"source": "### Feature selection", "cell_type": "markdown", "metadata": 
{}}, {"source": "Encode the destination using one-hot encoding and include the 
columns Year, Month, DayofMonth, DayOfWeek, Distance", "cell_type": "markdown", 
"metadata": {}}, {"execution_count": 9, "cell_type": "code", "source": "import 
org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, 
VectorAssembler}\n\nval indexer = new 
StringIndexer().setInputCol(\"Dest\").setOutputCol(\"DestIndex\") // 
.setHandleInvalid(\"skip\") // Only works on Spark 1.6 or later\nval encoder = 
new OneHotEncoder().setInputCol(\"DestIndex\").setOutputCol(\"DestVec\")\nval 
assembler = new 
VectorAssembler().setInputCols(Array(\"Year\",\"Month\",\"DayofMonth\",\"DayOfWeek\",\"Distance\",\"DestVec\")).setOutputCol(\"features\")",
 "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": 
"### Build the model: Use SystemML's MLPipeline wrapper. \n\nThis wrapper 
invokes MultiLogReg.dml (for training) and GLM-predict.dml (for prediction). 
These DML algorithms are available at 
https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms", 
"cell_type": "markdown", "metadata": {}}, {"execution_count": 10, "cell_type": 
"code", "source": "import org.apache.spark.ml.Pipeline\nimport 
org.apache.sysml.api.ml.LogisticRegression\n\nval lr = new 
LogisticRegression(\"log\", 
sc).setRegParam(1e-4).setTol(1e-2).setMaxInnerIter(0).setMaxOuterIter(100)\n\nval
 pipeline = new Pipeline().setStages(Array(indexer, encoder, assembler, 
lr))\nval model = pipeline.fit(trainDataset)", "outputs": [{"output_type": 
"stream", "name": "stdout", "text": "BEGIN MULTINOMIAL LOGISTIC REGRESSION 
SCRIPT\nReading X...\nReading Y...\n-- Initially:  Objective = 
56433.27085246851,  Gradient Norm = 4.469119635504498E7,  Trust Delta = 
0.001024586722033724\n-- Outer Iteration 1: Had 1 CG iterations\n   -- 
Obj.Reduction:  Actual = 9262.13484840509,  Predicted = 8912.05664442707  (A/P: 
1.0393),  Trust Delta = 4.1513539310828525E-4\n   -- New Objective = 47171.13600406342,  Beta Change Norm = 
3.9882828705797336E-4,  Gradient Norm = 3491408.311614066\n \n-- Outer 
Iteration 2: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
107.11137476684962,  Predicted = 105.31921188128369  (A/P: 1.017),  Trust Delta 
= 4.1513539310828525E-4\n   -- New Objective = 47064.02462929657,  Beta Change 
Norm = 1.0302143846288746E-4,  Gradient Norm = 84892.35372269012\nTermination / 
Convergence condition satisfied.\n"}], "metadata": {"scrolled": true, 
"collapsed": false, "trusted": true}}, {"source": "### Evaluate the model 
\n\nOutput RMS error on test data", "cell_type": "markdown", "metadata": {}}, 
{"execution_count": 11, "cell_type": "code", "source": "val predictions = 
model.transform(testDataset.withColumnRenamed(\"label\", 
\"OriginalLabel\"))\npredictions.select(\"prediction\", 
\"OriginalLabel\").show\nsqlContext.udf.register(\"square\", (x:Double) => 
Math.pow(x, 2.0))", "outputs": [{"output_type": "stream", "name": "stdout", "text": 
"+----------+-------------+\n|prediction|OriginalLabel|\n+----------+-------------+\n|
       1.0|          2.0|\n|       1.0|          1.0|\n|       1.0|          
2.0|\n|       1.0|          2.0|\n|       1.0|          2.0|\n|       1.0|      
    2.0|\n|       1.0|          2.0|\n|       1.0|          2.0|\n|       1.0|  
        1.0|\n|       1.0|          2.0|\n|       1.0|          1.0|\n|       
1.0|          2.0|\n|       1.0|          2.0|\n|       1.0|          2.0|\n|   
    1.0|          1.0|\n|       1.0|          2.0|\n|       1.0|          
2.0|\n|       1.0|          1.0|\n|       1.0|          1.0|\n|       1.0|      
    1.0|\n+----------+-------------+\nonly showing top 20 rows\n\n"}, 
{"execution_count": 11, "output_type": "execute_result", "data": {"text/plain": 
"UserDefinedFunction(<function1>,DoubleType,List())"}, "metadata": {}}], 
"metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 12, 
"cell_type": "code", "source": 
"predictions.registerTempTable(\"predictions\")\nsqlContext.sql(\"SELECT 
sqrt(avg(square(OriginalLabel - prediction))) FROM predictions\").show", 
"outputs": [{"output_type": "stream", "name": "stdout", "text": 
"+------------------+\n|               
_c0|\n+------------------+\n|0.8557362892866146|\n+------------------+\n\n"}], 
"metadata": {"collapsed": false, "trusted": true}}, {"source": "### Perform 
k-fold cross-validation to tune the hyperparameters\n\nPerform cross-validation 
to tune the regularization parameter for Logistic regression.", "cell_type": 
"markdown", "metadata": {}}, {"execution_count": 13, "cell_type": "code", 
"source": "import 
org.apache.spark.ml.evaluation.BinaryClassificationEvaluator\nimport 
org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}\n\nval crossval = 
new CrossValidator().setEstimator(pipeline).setEvaluator(new 
BinaryClassificationEvaluator)\nval paramGrid = new 
ParamGridBuilder().addGrid(lr.regParam, Array(0.1, 1e-3, 
1e-6)).build()\ncrossval.setEstimatorParamMaps(paramGrid)\ncrossval.setNumFolds(2) // Setting k = 2\nval cvmodel = 
crossval.fit(trainDataset)", "outputs": [{"output_type": "stream", "name": 
"stdout", "text": "BEGIN MULTINOMIAL LOGISTIC REGRESSION SCRIPT\nReading 
X...\nReading Y...\n-- Initially:  Objective = 28202.772482623055,  Gradient 
Norm = 2.221087060254761E7,  Trust Delta = 0.001024586722033724\n-- Outer 
Iteration 1: Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 
4576.927438869821,  Predicted = 4405.651264293149  (A/P: 1.0389),  Trust Delta 
= 4.127578309122139E-4\n   -- New Objective = 23625.845043753234,  Beta Change 
Norm = 3.9671126297839183E-4,  Gradient Norm = 1718538.331150294\nTermination / 
Convergence condition satisfied.\nBEGIN MULTINOMIAL LOGISTIC REGRESSION 
SCRIPT\nReading X...\nReading Y...\n-- Initially:  Objective = 
28202.772482623055,  Gradient Norm = 2.221087060254761E7,  Trust Delta = 
0.001024586722033724\n-- Outer Iteration 1: Had 1 CG iterations\n   -- 
Obj.Reduction:  Actual = 4576.927438878782,  Predicted = 4405.651264300938  (A/P: 1.0389),  Trust Delta = 
4.127578309130283E-4\n   -- New Objective = 23625.845043744273,  Beta Change 
Norm = 3.967112629790933E-4,  Gradient Norm = 1718538.3311583179\n \n-- Outer 
Iteration 2: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
52.06267761322306,  Predicted = 51.207226997373795  (A/P: 1.0167),  Trust Delta 
= 4.127578309130283E-4\n   -- New Objective = 23573.78236613105,  Beta Change 
Norm = 1.0195505438829344E-4,  Gradient Norm = 41072.985998067124\n \n-- Outer 
Iteration 3: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
0.03776156834283029,  Predicted = 0.037741389955733964  (A/P: 1.0005),  Trust 
Delta = 4.127578309130283E-4\n   -- New Objective = 23573.744604562708,  Beta 
Change Norm = 3.3257729178954336E-6,  Gradient Norm = 
3559.0088415221207\nTermination / Convergence condition satisfied.\nBEGIN 
MULTINOMIAL LOGISTIC REGRESSION SCRIPT\nReading X...\nReading Y...\n-- 
Initially:  Objective = 28202.772482623055,  Gradient 
Norm = 2.221087060254761E7,  Trust Delta = 0.001024586722033724\n-- Outer 
Iteration 1: Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 
4576.927438878873,  Predicted = 4405.651264301018  (A/P: 1.0389),  Trust Delta 
= 4.1275783091303654E-4\n   -- New Objective = 23625.845043744182,  Beta Change 
Norm = 3.9671126297910036E-4,  Gradient Norm = 1718538.331158408\n \n-- Outer 
Iteration 2: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
52.062677613230335,  Predicted = 51.20722699738286  (A/P: 1.0167),  Trust Delta 
= 4.1275783091303654E-4\n   -- New Objective = 23573.782366130952,  Beta Change 
Norm = 1.0195505438831547E-4,  Gradient Norm = 41072.98599806662\n \n-- Outer 
Iteration 3: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
0.03776156833919231,  Predicted = 0.037741389955751575  (A/P: 1.0005),  Trust 
Delta = 4.1275783091303654E-4\n   -- New Objective = 23573.744604562613,  Beta 
Change Norm = 3.3257729178972746E-6,  Gradient Norm = 3559.008841523661\n \n-- 
Outer Iteration 4: Had 3 CG iterations, trust bound REACHED\n   -- Obj.Reduction:  Actual = 
1.3742707646742929,  Predicted = 1.374282851981874  (A/P: 1.0),  Trust Delta = 
0.0016510313236521462\n   -- New Objective = 23572.37033379794,  Beta Change 
Norm = 4.1275783091303654E-4,  Gradient Norm = 23218.782943544382\n \n-- Outer 
Iteration 5: Had 3 CG iterations, trust bound REACHED\n   -- Obj.Reduction:  
Actual = 5.475667862796399,  Predicted = 5.475595423716493  (A/P: 1.0),  Trust 
Delta = 0.006604125294608585\n   -- New Objective = 23566.894665935142,  Beta 
Change Norm = 0.0016510313236521464,  Gradient Norm = 3400.306136071355\n \n-- 
Outer Iteration 6: Had 3 CG iterations, trust bound REACHED\n   -- 
Obj.Reduction:  Actual = 19.796611347293947,  Predicted = 19.796668922654057  
(A/P: 1.0),  Trust Delta = 0.02641650117843434\n   -- New Objective = 
23547.09805458785,  Beta Change Norm = 0.006604125294608585,  Gradient Norm = 
12384.979229404262\n \n-- Outer Iteration 7: Had 3 CG iterations, trust bound 
REACHED\n   -- Obj.Reduction:  Actual = 48.9038754012945,  Predicted = 
48.86479486479853  (A/P: 1.0008),  Trust Delta = 0.039975464358656405\n   -- 
New Objective = 23498.194179186554,  Beta Change Norm = 0.026416501178434335,  
Gradient Norm = 25887.667183269536\n \n-- Outer Iteration 8: Had 1 CG 
iterations\n   -- Obj.Reduction:  Actual = 0.007870123248721939,  Predicted = 
0.007868226951946769  (A/P: 1.0002),  Trust Delta = 0.039975464358656405\n   -- 
New Objective = 23498.186309063305,  Beta Change Norm = 6.078745447586554E-7,  
Gradient Norm = 1345.8027775103888\n \n-- Outer Iteration 9: Had 5 CG 
iterations, trust bound REACHED\n   -- Obj.Reduction:  Actual = 
25.04238552428069,  Predicted = 25.024767443519863  (A/P: 1.0007),  Trust Delta 
= 0.0405590959281579\n   -- New Objective = 23473.143923539024,  Beta Change 
Norm = 0.039975464358656405,  Gradient Norm = 63769.52436782582\n \n-- Outer 
Iteration 10: Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 
0.04773861860303441,  Predicted = 0.04771039962536379  (A/P: 1.0006),  Trust Delta = 0.0405590959281579\n   -- 
New Objective = 23473.09618492042,  Beta Change Norm = 1.4963385754664812E-6,  
Gradient Norm = 720.8018323328566\n \n-- Outer Iteration 11: Had 5 CG 
iterations, trust bound REACHED\n   -- Obj.Reduction:  Actual = 
8.123822556943196,  Predicted = 8.128868676639112  (A/P: 0.9994),  Trust Delta 
= 0.10966765508915642\n   -- New Objective = 23464.972362363478,  Beta Change 
Norm = 0.040559095928157894,  Gradient Norm = 72691.91595482397\n \n-- Outer 
Iteration 12: Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 
0.06196295309564448,  Predicted = 0.061921093377362  (A/P: 1.0007),  Trust 
Delta = 0.10966765508915642\n   -- New Objective = 23464.910399410383,  Beta 
Change Norm = 1.7036583109418734E-6,  Gradient Norm = 482.30416635512506\n \n-- 
Outer Iteration 13: Had 6 CG iterations, trust bound REACHED\n   -- 
Obj.Reduction:  Actual = 17.71440401360087,  Predicted = 17.616303961789683  
(A/P: 1.0056),  Trust Delta = 0.16941777360208057\n   -- New Objective = 23447.19599539678,  Beta Change 
Norm = 0.10966765508915642,  Gradient Norm = 448422.2320019876\n \n-- Outer 
Iteration 14: Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 
2.386916461367946,  Predicted = 2.397254649433668  (A/P: 0.9957),  Trust Delta 
= 0.16941777360208057\n   -- New Objective = 23444.809078935414,  Beta Change 
Norm = 1.0691952710422448E-5,  Gradient Norm = 2940.4721234861527\n \n-- Outer 
Iteration 15: Had 4 CG iterations\n   -- Obj.Reduction:  Actual = 
4.294265273932979,  Predicted = 4.301599925371988  (A/P: 0.9983),  Trust Delta 
= 0.16941777360208057\n   -- New Objective = 23440.51481366148,  Beta Change 
Norm = 0.018008719957742635,  Gradient Norm = 4590.1170762087395\n \n-- Outer 
Iteration 16: Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 
2.4845889129210263E-4,  Predicted = 2.4844829761319425E-4  (A/P: 1.0),  Trust 
Delta = 0.16941777360208057\n   -- New Objective = 23440.51456520259,  Beta 
Change Norm = 1.0825357762700158E-7,  Gradient Norm = 280.5707172598387\n \n-- Outer Iteration 17: Had 
8 CG iterations, trust bound REACHED\n   -- Obj.Reduction:  Actual = 
22.440803682489786,  Predicted = 22.42170069553472  (A/P: 1.0009),  Trust Delta 
= 0.2496076412979077\n   -- New Objective = 23418.0737615201,  Beta Change Norm 
= 0.16941777360208057,  Gradient Norm = 37677.05806399844\n \n-- Outer 
Iteration 18: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
0.15241017882726737,  Predicted = 0.15239595431754965  (A/P: 1.0001),  Trust 
Delta = 0.2496076412979077\n   -- New Objective = 23417.921351341272,  Beta 
Change Norm = 8.477249180981066E-6,  Gradient Norm = 707.427496995126\n \n-- 
Outer Iteration 19: Had 8 CG iterations, trust bound REACHED\n   -- 
Obj.Reduction:  Actual = 36.817799356838805,  Predicted = 36.84419020002096  
(A/P: 0.9993),  Trust Delta = 0.3890684157185231\n   -- New Objective = 
23381.103551984434,  Beta Change Norm = 0.2496076412979077,  Gradient Norm = 
181659.30511599063\n \n-- Outer Iteration 20: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
3.9036142495642707,  Predicted = 3.907242243615839  (A/P: 0.9991),  Trust Delta 
= 0.3890684157185231\n   -- New Objective = 23377.19993773487,  Beta Change 
Norm = 4.3252276508826854E-5,  Gradient Norm = 4562.596683929567\n \n-- Outer 
Iteration 21: Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 
2.4621394186397083E-4,  Predicted = 2.462032554160668E-4  (A/P: 1.0),  Trust 
Delta = 0.3890684157185231\n   -- New Objective = 23377.199691520927,  Beta 
Change Norm = 1.0792242771895522E-7,  Gradient Norm = 293.5155793389021\n \n-- 
Outer Iteration 22: Had 8 CG iterations, trust bound REACHED\n   -- 
Obj.Reduction:  Actual = 32.60430984508639,  Predicted = 32.63142558199526  
(A/P: 0.9992),  Trust Delta = 0.6911480264449816\n   -- New Objective = 
23344.59538167584,  Beta Change Norm = 0.38906841571852313,  Gradient Norm = 
13358.735388646046\n \n-- Outer Iteration 23: Had 1 CG iterations\n   -- 
Obj.Reduction:  Actual = 0.0021210133490967564,  Predicted = 0.002120754723733256  (A/P: 1.0001),  Trust 
Delta = 0.6911480264449816\n   -- New Objective = 23344.593260662492,  Beta 
Change Norm = 3.175083062930857E-7,  Gradient Norm = 969.5458081582332\n \n-- 
Outer Iteration 24: Had 6 CG iterations\n   -- Obj.Reduction:  Actual = 
1.0072309033639613,  Predicted = 1.0078398039430247  (A/P: 0.9994),  Trust 
Delta = 0.6911480264449816\n   -- New Objective = 23343.586029759128,  Beta 
Change Norm = 0.008749259137025917,  Gradient Norm = 1067.7896535923433\n \n-- 
Outer Iteration 25: Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 
1.3547600246965885E-5,  Predicted = 1.3547465425594469E-5  (A/P: 1.0),  Trust 
Delta = 0.6911480264449816\n   -- New Objective = 23343.586016211528,  Beta 
Change Norm = 2.5374783095185467E-8,  Gradient Norm = 83.20291366858535\n \n-- 
Outer Iteration 26: Had 12 CG iterations\n   -- Obj.Reduction:  Actual = 
15.302215361618437,  Predicted = 15.310868474305936  (A/P: 0.9994),  Trust 
Delta = 0.6911480264449816\n   -- New Objective = 23328.28380084991,  Beta Change Norm = 
0.5120342239089952,  Gradient Norm = 15756.152919911565\n \n-- Outer Iteration 
27: Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 0.0029535907960962504,  
Predicted = 0.002953150612459315  (A/P: 1.0001),  Trust Delta = 
0.6911480264449816\n   -- New Objective = 23328.280847259113,  Beta Change Norm 
= 3.74856810221399E-7,  Gradient Norm = 933.6635694330404\n \n-- Outer 
Iteration 28: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
1.0478267358848825E-4,  Predicted = 1.0478219919535331E-4  (A/P: 1.0),  Trust 
Delta = 0.6911480264449816\n   -- New Objective = 23328.28074247644,  Beta 
Change Norm = 2.2480413822676833E-7,  Gradient Norm = 
5.538385572102319\nTermination / Convergence condition satisfied.\nBEGIN 
MULTINOMIAL LOGISTIC REGRESSION SCRIPT\nReading X...\nReading Y...\n-- 
Initially:  Objective = 28230.498369845453,  Gradient Norm = 
2.248032584752783E7,  Trust Delta = 0.001024586722033724\n-- Outer I
 teration 1: Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 
4685.514381090245,  Predicted = 4506.656096079343  (A/P: 1.0397),  Trust Delta 
= 4.1751229311831877E-4\n   -- New Objective = 23544.983988755208,  Beta Change 
Norm = 4.0094223959613487E-4,  Gradient Norm = 1773112.5532909825\nTermination 
/ Convergence condition satisfied.\nBEGIN MULTINOMIAL LOGISTIC REGRESSION 
SCRIPT\nReading X...\nReading Y...\n-- Initially:  Objective = 
28230.498369845453,  Gradient Norm = 2.248032584752783E7,  Trust Delta = 
0.001024586722033724\n-- Outer Iteration 1: Had 1 CG iterations\n   -- 
Obj.Reduction:  Actual = 4685.51438109942,  Predicted = 4506.6560960873  (A/P: 
1.0397),  Trust Delta = 4.17512293119143E-4\n   -- New Objective = 
23544.983988746033,  Beta Change Norm = 4.0094223959684285E-4,  Gradient Norm = 
1773112.553299248\n \n-- Outer Iteration 2: Had 2 CG iterations\n   -- 
Obj.Reduction:  Actual = 55.08478867724625,  Predicted = 54.14637164341834  
(A/P: 1.0173),  Trust Delta = 4.17512293119143E-4\n   -- New Objective = 23489.899200068787,  Beta Change Norm = 
1.0409436207463608E-4,  Gradient Norm = 43863.264421495034\n \n-- Outer 
Iteration 3: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
0.0425455416625482,  Predicted = 0.0425210724118125  (A/P: 1.0006),  Trust 
Delta = 4.17512293119143E-4\n   -- New Objective = 23489.856654527124,  Beta 
Change Norm = 3.4860035525762597E-6,  Gradient Norm = 
3473.0626928235138\nTermination / Convergence condition satisfied.\nBEGIN 
MULTINOMIAL LOGISTIC REGRESSION SCRIPT\nReading X...\nReading Y...\n-- 
Initially:  Objective = 28230.498369845453,  Gradient Norm = 
2.248032584752783E7,  Trust Delta = 0.001024586722033724\n-- Outer Iteration 1: 
Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 4685.514381099514,  
Predicted = 4506.65609608738  (A/P: 1.0397),  Trust Delta = 
4.175122931191516E-4\n   -- New Objective = 23544.98398874594,  Beta Change 
Norm = 4.0094223959685E-4,  Gradient Norm = 1773112.5532993283\n \n-- Outer 
Iteration 2:
  Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 55.08478867725353,  
Predicted = 54.14637164342853  (A/P: 1.0173),  Trust Delta = 
4.175122931191516E-4\n   -- New Objective = 23489.899200068685,  Beta Change 
Norm = 1.0409436207466114E-4,  Gradient Norm = 43863.264421514185\n \n-- Outer 
Iteration 3: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
0.0425455416625482,  Predicted = 0.04252107241182405  (A/P: 1.0006),  Trust 
Delta = 4.175122931191516E-4\n   -- New Objective = 23489.856654527022,  Beta 
Change Norm = 3.486003552576232E-6,  Gradient Norm = 3473.0626928274914\n \n-- 
Outer Iteration 4: Had 3 CG iterations, trust bound REACHED\n   -- 
Obj.Reduction:  Actual = 1.3618165665211563,  Predicted = 1.3618300786307123  
(A/P: 1.0),  Trust Delta = 0.0016700491724766064\n   -- New Objective = 
23488.4948379605,  Beta Change Norm = 4.1751229311915155E-4,  Gradient Norm = 
22750.17168667339\n \n-- Outer Iteration 5: Had 3 CG iterations, trust bound 
REACHED\n   -- Obj.Reduction:  Actual
  = 5.399070530791505,  Predicted = 5.398983505048864  (A/P: 1.0),  Trust Delta 
= 0.006680196689906426\n   -- New Objective = 23483.09576742971,  Beta Change 
Norm = 0.0016700491724766064,  Gradient Norm = 3277.243187563727\n \n-- Outer 
Iteration 6: Had 3 CG iterations, trust bound REACHED\n   -- Obj.Reduction:  
Actual = 19.04347611745834,  Predicted = 19.043530665204045  (A/P: 1.0),  Trust 
Delta = 0.026720786759625702\n   -- New Objective = 23464.05229131225,  Beta 
Change Norm = 0.006680196689906425,  Gradient Norm = 12014.210859652962\n \n-- 
Outer Iteration 7: Had 3 CG iterations, trust bound REACHED\n   -- 
Obj.Reduction:  Actual = 41.1452816738456,  Predicted = 41.09983187966176  
(A/P: 1.0011),  Trust Delta = 0.03287099410333282\n   -- New Objective = 
23422.907009638406,  Beta Change Norm = 0.0267207867596257,  Gradient Norm = 
30568.57509207747\n \n-- Outer Iteration 8: Had 1 CG iterations\n   -- 
Obj.Reduction:  Actual = 0.011022901580872713,  Predicted = 0.01101972206380871 
 (A/P:
  1.0003),  Trust Delta = 0.03287099410333282\n   -- New Objective = 
23422.895986736825,  Beta Change Norm = 7.209836919526366E-7,  Gradient Norm = 
1251.6678613601161\n \n-- Outer Iteration 9: Had 8 CG iterations, trust bound 
REACHED\n   -- Obj.Reduction:  Actual = 13.978709434930352,  Predicted = 
13.974847661855666  (A/P: 1.0003),  Trust Delta = 0.033257209609599145\n   -- 
New Objective = 23408.917277301895,  Beta Change Norm = 0.03287099410333282,  
Gradient Norm = 15328.859090870203\n \n-- Outer Iteration 10: Had 2 CG 
iterations\n   -- Obj.Reduction:  Actual = 0.004639191432943335,  Predicted = 
0.004638318429644279  (A/P: 1.0002),  Trust Delta = 0.033257209609599145\n   -- 
New Objective = 23408.91263811046,  Beta Change Norm = 1.0519781798129972E-6,  
Gradient Norm = 335.02440722968106\n \n-- Outer Iteration 11: Had 4 CG 
iterations, trust bound REACHED\n   -- Obj.Reduction:  Actual = 
6.3662166226313275,  Predicted = 6.366164181244294  (A/P: 1.0),  Trust Delta = 
0.06697443441569934\n
    -- New Objective = 23402.54642148783,  Beta Change Norm = 
0.033257209609599145,  Gradient Norm = 2307.51433331859\n \n-- Outer Iteration 
12: Had 7 CG iterations, trust bound REACHED\n   -- Obj.Reduction:  Actual = 
11.15761233725425,  Predicted = 11.149031539741129  (A/P: 1.0008),  Trust Delta 
= 0.10211243265236637\n   -- New Objective = 23391.388809150576,  Beta Change 
Norm = 0.06697443441569932,  Gradient Norm = 71503.76594916714\n \n-- Outer 
Iteration 13: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
0.600488582651451,  Predicted = 0.6001508149708464  (A/P: 1.0006),  Trust Delta 
= 0.10211243265236637\n   -- New Objective = 23390.788320567925,  Beta Change 
Norm = 1.6834966454979097E-5,  Gradient Norm = 840.347770623361\n \n-- Outer 
Iteration 14: Had 8 CG iterations, trust bound REACHED\n   -- Obj.Reduction:  
Actual = 19.757560698417365,  Predicted = 19.765740859017424  (A/P: 0.9996),  
Trust Delta = 0.24398632984391763\n   -- New Objective = 23371.030759869507,  
Beta Change
  Norm = 0.10211243265236637,  Gradient Norm = 48752.608649999434\n \n-- Outer 
Iteration 15: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
0.2778570437403687,  Predicted = 0.2779044747609064  (A/P: 0.9998),  Trust 
Delta = 0.24398632984391763\n   -- New Objective = 23370.752902825767,  Beta 
Change Norm = 1.1465782794751552E-5,  Gradient Norm = 490.74546662109907\n \n-- 
Outer Iteration 16: Had 7 CG iterations, trust bound REACHED\n   -- 
Obj.Reduction:  Actual = 35.87021488765458,  Predicted = 35.87139479548606  
(A/P: 1.0),  Trust Delta = 0.5998608188063514\n   -- New Objective = 
23334.882687938112,  Beta Change Norm = 0.24398632984391766,  Gradient Norm = 
114111.92221839691\n \n-- Outer Iteration 17: Had 2 CG iterations\n   -- 
Obj.Reduction:  Actual = 1.5378956803469919,  Predicted = 1.5387644534721423  
(A/P: 0.9994),  Trust Delta = 0.5998608188063514\n   -- New Objective = 
23333.344792257765,  Beta Change Norm = 2.7062912410241883E-5,  Gradient Norm = 
1827.5390228667288\n \n-- Outer Iteration 18: Had 8 CG iterations, trust bound REACHED\n   -- 
Obj.Reduction:  Actual = 55.357956099222065,  Predicted = 55.4569565918232  
(A/P: 0.9982),  Trust Delta = 0.8894009952541146\n   -- New Objective = 
23277.986836158543,  Beta Change Norm = 0.5998608188063514,  Gradient Norm = 
30684.985380679016\n \n-- Outer Iteration 19: Had 2 CG iterations\n   -- 
Obj.Reduction:  Actual = 0.017656232350418577,  Predicted = 
0.017644837185100737  (A/P: 1.0006),  Trust Delta = 0.8894009952541146\n   -- 
New Objective = 23277.969179926193,  Beta Change Norm = 1.984483688888249E-6,  
Gradient Norm = 137.4544897991739\n \n-- Outer Iteration 20: Had 10 CG 
iterations\n   -- Obj.Reduction:  Actual = 13.663528841007064,  Predicted = 
13.567360160458493  (A/P: 1.0071),  Trust Delta = 0.8894009952541146\n   -- New 
Objective = 23264.305651085186,  Beta Change Norm = 0.4790943358344082,  
Gradient Norm = 15753.857353150117\n \n-- Outer Iteration 21: Had 1 CG 
iterations\n   -- Obj.Reduction:  Actual = 0.002973383649077732,  Predicted = 0.002972929227391132  (A/P: 1.0002),  Trust 
Delta = 0.8894009952541146\n   -- New Objective = 23264.302677701537,  Beta 
Change Norm = 3.774223875140864E-7,  Gradient Norm = 1264.8256951027395\n \n-- 
Outer Iteration 22: Had 2 CG iterations\n   -- Obj.Reduction:  Actual = 
1.9038948812521994E-4,  Predicted = 1.9038853266221582E-4  (A/P: 1.0),  Trust 
Delta = 0.8894009952541146\n   -- New Objective = 23264.30248731205,  Beta 
Change Norm = 3.019597152404477E-7,  Gradient Norm = 
10.843636813611397\nTermination / Convergence condition satisfied.\nBEGIN 
MULTINOMIAL LOGISTIC REGRESSION SCRIPT\nReading X...\nReading Y...\n-- 
Initially:  Objective = 56433.27085246851,  Gradient Norm = 
4.469119635504498E7,  Trust Delta = 0.001024586722033724\n-- Outer Iteration 1: 
Had 1 CG iterations\n   -- Obj.Reduction:  Actual = 9262.134848396847,  
Predicted = 8912.05664441991  (A/P: 1.0393),  Trust Delta = 
4.151353931079128E-4\n   -- New Objective = 47171.13600407166,  Beta 
 Change Norm = 3.9882828705765304E-4,  Gradient Norm = 
3491408.3116066065\nTermination / Convergence condition satisfied.\n"}], 
"metadata": {"collapsed": false, "trusted": true}}, {"source": "### Evaluate 
the cross-validated model", "cell_type": "markdown", "metadata": {}}, 
{"execution_count": 1, "cell_type": "code", "source": "val cvpredictions = 
cvmodel.transform(testDataset.withColumnRenamed(\"label\", 
\"OriginalLabel\"))\ncvpredictions.registerTempTable(\"cvpredictions\")\nsqlContext.sql(\"SELECT
 sqrt(avg(square(OriginalLabel - prediction))) FROM cvpredictions\").show", 
"outputs": [{"output_type": "stream", "name": "stdout", "text": 
"+------------------+\n|               
_c0|\n+------------------+\n|0.8557362892866146|\n+------------------+\n\n"}], 
"metadata": {"collapsed": false, "trusted": true}}, {"source": "## Homework 
;)\n\nRead 
http://apache.github.io/incubator-systemml/algorithms-classification.html#multinomial-logistic-regression
 and perform cross validation on other hyperparameters: for example: icpt, tol, maxOuterIter, maxInnerIter", "cell_type": 
"markdown", "metadata": {}}], "nbformat": 4, "metadata": {"kernelspec": 
{"display_name": "Scala 2.10.4 (Spark 1.5.2)", "name": "spark", "language": 
"scala"}, "language_info": {"name": "scala"}}}
\ No newline at end of file

