[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-10 Thread JeremyNixon
Github user JeremyNixon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55645928
  
--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrates applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
--- End diff --

Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-09 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194713380
  
@JeremyNixon thanks! Merged to master.





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-09 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55642006
  
--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrates applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
--- End diff --

I just cleaned up the import and comment for Pipelines, since it's not 
actually used in this example.
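
As an aside, an unused import like this one can be caught mechanically. Below is a rough sketch using only Python's standard `ast` module; `unused_imports` is a hypothetical helper written for illustration (it is not part of Spark or its lint tooling), and it only looks at top-level name usage, so treat it as an approximation.

```python
import ast


def unused_imports(source):
    """Return imported names that are never referenced as a plain name."""
    tree = ast.parse(source)  # parses only; nothing is actually imported
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports cannot be checked this way
                # "import a.b" binds "a"; "import a.b as c" binds "c"
                imported.add((alias.asname or alias.name).split(".")[0])
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


# A cut-down stand-in for the example file under review.
snippet = """
from pyspark.ml import Pipeline
from pyspark.ml.regression import LinearRegression

lr = LinearRegression(maxIter=10, regParam=0.1)
"""

print(unused_imports(snippet))
```

Because the source is only parsed, the sketch runs without PySpark installed.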





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11547





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194599374
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194599376
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52786/
Test PASSed.





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194599129
  
**[Test build #52786 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52786/consoleFull)**
 for PR 11547 at commit 
[`c813a93`](https://github.com/apache/spark/commit/c813a931637cd8a5966ca22d4ce1f8beefb45950).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194591420
  
**[Test build #52786 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52786/consoleFull)**
 for PR 11547 at commit 
[`c813a93`](https://github.com/apache/spark/commit/c813a931637cd8a5966ca22d4ce1f8beefb45950).





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-09 Thread JeremyNixon
Github user JeremyNixon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55617350
  
--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/train_validation_split.py
+"""
+
+if __name__ == "__main__":
+    sc = SparkContext(appName="TrainValidationSplit")
+    sqlContext = SQLContext(sc)
+    # $example on$
+    # Prepare training and test data.
+    data = sqlContext.read.format("libsvm")\
+        .load("data/mllib/sample_linear_regression_data.txt")
+    train, test = data.randomSplit([0.7, 0.3])
+    lr = LinearRegression(maxIter=10, regParam=0.1)
+
+    # We use a ParamGridBuilder to construct a grid of parameters to search over.
+    # TrainValidationSplit will try all combinations of values and determine best model using
+    # the evaluator.
+    paramGrid = ParamGridBuilder()\
+        .addGrid(lr.regParam, [0.1, 0.01]) \
+        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])\
+        .build()
+
+    # In this case the estimator is simply the linear regression.
+    # A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
+    tvs = TrainValidationSplit(estimator=lr,
+                               estimatorParamMaps=paramGrid,
+                               evaluator=RegressionEvaluator(),
+                               # 80% of the data will be used for training, 20% for validation.
+                               trainRatio=0.8)
+
+    # Run TrainValidationSplit, chosing the set of parameters that optimizes the evaluator.
+    model = tvs.fit(train)
+    # Make predictions on test data. model is the model with combination of parameters
--- End diff --

I went ahead and made the comments completely consistent with the existing
Scala and Java examples; it seems reasonable to have the examples resemble one
another exactly.
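
For readers following the quoted diff: the grid built there combines two `regParam` values with three `elasticNetParam` values, so six combinations are tried. A minimal pure-Python sketch of that expansion follows. `build_param_grid` is a hypothetical helper written for illustration, not Spark's `ParamGridBuilder`, and plain string keys stand in for Spark's `Param` objects.

```python
from itertools import product


def build_param_grid(grid):
    """Expand {name: [values]} into a list of {name: value} dicts (Cartesian product)."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


# Same values as the grid in the quoted example.
param_grid = build_param_grid({
    "regParam": [0.1, 0.01],
    "elasticNetParam": [0.0, 0.5, 1.0],
})

print(len(param_grid))
for params in param_grid:
    print(params)
```

Each dict plays the role of one parameter map that the search would fit and evaluate.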





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-09 Thread JeremyNixon
Github user JeremyNixon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55617330
  
--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/train_validation_split.py
+"""
+
+if __name__ == "__main__":
+    sc = SparkContext(appName="TrainValidationSplit")
+    sqlContext = SQLContext(sc)
+    # $example on$
+    # Prepare training and test data.
+    data = sqlContext.read.format("libsvm")\
+        .load("data/mllib/sample_linear_regression_data.txt")
+    train, test = data.randomSplit([0.7, 0.3])
+    lr = LinearRegression(maxIter=10, regParam=0.1)
+
+    # We use a ParamGridBuilder to construct a grid of parameters to search over.
+    # TrainValidationSplit will try all combinations of values and determine best model using
+    # the evaluator.
+    paramGrid = ParamGridBuilder()\
+        .addGrid(lr.regParam, [0.1, 0.01]) \
+        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])\
+        .build()
+
+    # In this case the estimator is simply the linear regression.
+    # A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
+    tvs = TrainValidationSplit(estimator=lr,
+                               estimatorParamMaps=paramGrid,
+                               evaluator=RegressionEvaluator(),
+                               # 80% of the data will be used for training, 20% for validation.
+                               trainRatio=0.8)
+
+    # Run TrainValidationSplit, chosing the set of parameters that optimizes the evaluator.
--- End diff --

I went ahead and made the comments completely consistent with the existing
Scala and Java examples; it seems reasonable to have the examples resemble one
another exactly.





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-09 Thread JeremyNixon
Github user JeremyNixon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55617143
  
--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
--- End diff --

Thanks, updated.





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-08 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194153088
  
Made a few minor comments; pending those, LGTM.





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-08 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55480608
  
--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/train_validation_split.py
+"""
+
+if __name__ == "__main__":
+    sc = SparkContext(appName="TrainValidationSplit")
+    sqlContext = SQLContext(sc)
+    # $example on$
+    # Prepare training and test data.
+    data = sqlContext.read.format("libsvm")\
+        .load("data/mllib/sample_linear_regression_data.txt")
+    train, test = data.randomSplit([0.7, 0.3])
+    lr = LinearRegression(maxIter=10, regParam=0.1)
+
+    # We use a ParamGridBuilder to construct a grid of parameters to search over.
+    # TrainValidationSplit will try all combinations of values and determine best model using
+    # the evaluator.
+    paramGrid = ParamGridBuilder()\
+        .addGrid(lr.regParam, [0.1, 0.01]) \
+        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])\
+        .build()
+
+    # In this case the estimator is simply the linear regression.
+    # A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
+    tvs = TrainValidationSplit(estimator=lr,
+                               estimatorParamMaps=paramGrid,
+                               evaluator=RegressionEvaluator(),
+                               # 80% of the data will be used for training, 20% for validation.
+                               trainRatio=0.8)
+
+    # Run TrainValidationSplit, chosing the set of parameters that optimizes the evaluator.
--- End diff --

Perhaps we could change this comment to something like `Running 
TrainValidationSplit returns the model with the combination of parameters that 
performed best on the validation set.`
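
To make the suggested wording concrete, here is a small pure-Python sketch of the idea: hold out part of the data, score each candidate parameter on the held-out part, and return the model for the best one. Everything in it is an assumption made for illustration: `fit_ridge`, `rmse` and `train_validation_split` are hypothetical helpers, the model is a one-feature ridge regression without intercept, and the data is synthetic. It is not how Spark implements `TrainValidationSplit`.

```python
import random


def fit_ridge(rows, reg_param):
    """One-feature ridge regression without intercept: w = sum(x*y) / (sum(x*x) + reg_param)."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + reg_param)


def rmse(w, rows):
    """Root mean squared error of the prediction w * x on the given rows."""
    return (sum((w * x - y) ** 2 for x, y in rows) / len(rows)) ** 0.5


def train_validation_split(rows, reg_params, train_ratio=0.8, seed=42):
    """Score each candidate on a held-out split and return the best one."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train, validation = shuffled[:cut], shuffled[cut:]
    # Lower RMSE on the validation rows is better.
    scores = {p: rmse(fit_ridge(train, p), validation) for p in reg_params}
    best = min(scores, key=scores.get)
    # This sketch refits the winning parameter on all rows before returning it.
    return best, fit_ridge(rows, best), scores


# Synthetic data: y is roughly 2 * x plus a little Gaussian noise.
noise = random.Random(0)
rows = [(i / 10, 2.0 * (i / 10) + noise.gauss(0, 0.1)) for i in range(1, 101)]

best_reg, best_w, scores = train_validation_split(rows, [0.1, 0.01])
print(best_reg, best_w, scores)
```

With `train_ratio=0.8`, 80 of the 100 rows are used for fitting and 20 for validation, matching the ratio in the quoted example.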





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-08 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55480665
  
--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/train_validation_split.py
+"""
+
+if __name__ == "__main__":
+    sc = SparkContext(appName="TrainValidationSplit")
+    sqlContext = SQLContext(sc)
+    # $example on$
+    # Prepare training and test data.
+    data = sqlContext.read.format("libsvm")\
+        .load("data/mllib/sample_linear_regression_data.txt")
+    train, test = data.randomSplit([0.7, 0.3])
+    lr = LinearRegression(maxIter=10, regParam=0.1)
+
+    # We use a ParamGridBuilder to construct a grid of parameters to search over.
+    # TrainValidationSplit will try all combinations of values and determine best model using
+    # the evaluator.
+    paramGrid = ParamGridBuilder()\
+        .addGrid(lr.regParam, [0.1, 0.01]) \
+        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])\
+        .build()
+
+    # In this case the estimator is simply the linear regression.
+    # A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
+    tvs = TrainValidationSplit(estimator=lr,
+                               estimatorParamMaps=paramGrid,
+                               evaluator=RegressionEvaluator(),
+                               # 80% of the data will be used for training, 20% for validation.
+                               trainRatio=0.8)
+
+    # Run TrainValidationSplit, chosing the set of parameters that optimizes the evaluator.
+    model = tvs.fit(train)
+    # Make predictions on test data. model is the model with combination of parameters
--- End diff --

... and here we can then simply have `Make predictions on test data using 
the model`





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-08 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55479803
  
--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
--- End diff --

Typo: `demonstrats`





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193740426
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52661/
Test PASSed.





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193740423
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193740105
  
**[Test build #52661 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52661/consoleFull)**
 for PR 11547 at commit 
[`4a92dfd`](https://github.com/apache/spark/commit/4a92dfd7c24a4834ecec214c807820d07ee8fd69).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-08 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193732035
  
ok to test





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193733671
  
**[Test build #52661 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52661/consoleFull)**
 for PR 11547 at commit 
[`4a92dfd`](https://github.com/apache/spark/commit/4a92dfd7c24a4834ecec214c807820d07ee8fd69).





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-07 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193521102
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-07 Thread jodersky
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193517578
  
Looks good





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-07 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55284873
  
--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/train_validation_split.py
+"""
+
+if __name__ == "__main__":
+    sc = SparkContext(appName="TrainValidationSplit")
+    sqlContext = SQLContext(sc)
--- End diff --

Running the example as stated in the comment above, I get a "NameError: 
name 'SQLContext' is not defined". Could it be that an import is missing?
Other than that, LGTM
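
The error reported above is ordinary Python name resolution: the module calls `SQLContext(sc)` without ever binding the name `SQLContext`. The following is a minimal sketch of that failure and of what binding the name changes. It assumes nothing about the PR beyond the quoted line; the `SQLContext` class here is a hypothetical stand-in so the snippet runs without Spark installed, and `run_snippet` is an illustrative helper, not part of the example file. In the real file the binding would presumably come from `from pyspark.sql import SQLContext`.

```python
# Hypothetical stand-in for pyspark.sql.SQLContext, so this runs without Spark.
class SQLContext:
    def __init__(self, sc):
        self.sc = sc


# The line from the diff that raised the error.
SNIPPET = "sqlContext = SQLContext(sc)"


def run_snippet(namespace):
    """Execute SNIPPET in the given namespace; return the NameError text or None."""
    try:
        exec(SNIPPET, namespace)
    except NameError as err:
        return str(err)
    return None


# Name not bound (no import): the lookup fails, as in the review comment.
without_import = run_snippet({"sc": object()})

# Name bound (the effect an import would have): the line executes.
with_import = run_snippet({"sc": object(), "SQLContext": SQLContext})

print(without_import)
print(with_import)
```

The first call returns the same message quoted in the comment, and the second returns `None` because the name resolves.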





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-192921452
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...

2016-03-06 Thread JeremyNixon
GitHub user JeremyNixon opened a pull request:

https://github.com/apache/spark/pull/11547

[SPARK-13706] [ML] Add Python Example for Train Validation Split

## What changes were proposed in this pull request?

This pull request adds a Python example for TrainValidationSplit.

## How was this patch tested?

Style was checked with lint-python, the change was tested with 
./dev/run-tests, and the example was run in notebook and shell environments. 
The docs were viewed locally with jekyll serve.

This contribution is my original work and I license it to Spark under its 
open source license.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JeremyNixon/spark tvs_example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11547.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11547


commit 07103fa6c86e5e7d951c8f166b3516d001cd
Author: JeremyNixon 
Date:   2016-03-06T14:24:09Z

add train-validation-split example to docs

commit 994204a0a93e8c6fa58ef1134159f9ffbcce4a34
Author: JeremyNixon 
Date:   2016-03-06T15:05:32Z

clean lint

commit 9bf5f4e7a4e65046d7b4daeb3aef5c9beb0b92f3
Author: JeremyNixon 
Date:   2016-03-06T15:34:17Z

remove pipeline



