[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user JeremyNixon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55645928

--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrates applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
--- End diff --

Thanks!

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194713380

@JeremyNixon thanks! Merged to master.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55642006

--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrates applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
--- End diff --

I just cleaned up the import and comment for Pipelines, since it's not actually used in this example.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11547

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194599374

Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194599376

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52786/
Test PASSed.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194599129

**[Test build #52786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52786/consoleFull)** for PR 11547 at commit [`c813a93`](https://github.com/apache/spark/commit/c813a931637cd8a5966ca22d4ce1f8beefb45950).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194591420

**[Test build #52786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52786/consoleFull)** for PR 11547 at commit [`c813a93`](https://github.com/apache/spark/commit/c813a931637cd8a5966ca22d4ce1f8beefb45950).

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user JeremyNixon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55617350

--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/train_validation_split.py
+"""
+
+if __name__ == "__main__":
+    sc = SparkContext(appName="TrainValidationSplit")
+    sqlContext = SQLContext(sc)
+    # $example on$
+    # Prepare training and test data.
+    data = sqlContext.read.format("libsvm")\
+        .load("data/mllib/sample_linear_regression_data.txt")
+    train, test = data.randomSplit([0.7, 0.3])
+    lr = LinearRegression(maxIter=10, regParam=0.1)
+
+    # We use a ParamGridBuilder to construct a grid of parameters to search over.
+    # TrainValidationSplit will try all combinations of values and determine best model using
+    # the evaluator.
+    paramGrid = ParamGridBuilder()\
+        .addGrid(lr.regParam, [0.1, 0.01]) \
+        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])\
+        .build()
+
+    # In this case the estimator is simply the linear regression.
+    # A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
+    tvs = TrainValidationSplit(estimator=lr,
+                               estimatorParamMaps=paramGrid,
+                               evaluator=RegressionEvaluator(),
+                               # 80% of the data will be used for training, 20% for validation.
+                               trainRatio=0.8)
+
+    # Run TrainValidationSplit, chosing the set of parameters that optimizes the evaluator.
+    model = tvs.fit(train)
+    # Make predictions on test data. model is the model with combination of parameters
--- End diff --

I went ahead and made the comments completely consistent with the existing Scala and Java examples; it seems reasonable to have the examples resemble one another exactly.

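For readers following the quoted diff, the grid it builds can be pictured as a Cartesian product of the listed values. The sketch below is plain Python and is not how `ParamGridBuilder` is implemented; `build_param_grid` is a hypothetical helper, and it keys the maps by strings, whereas the PySpark example passes `Param` objects such as `lr.regParam`.

```python
from itertools import product


def build_param_grid(grid):
    """Expand {name: [values]} into one {name: value} dict per combination.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


# Values copied from the quoted diff.
param_grid = build_param_grid({
    "regParam": [0.1, 0.01],
    "elasticNetParam": [0.0, 0.5, 1.0],
})

for params in param_grid:
    print(params)
print(len(param_grid))
```

With two values for `regParam` and three for `elasticNetParam`, the search covers 2 x 3 = 6 candidate settings, each of which is fitted and evaluated once.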
[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user JeremyNixon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55617330

--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/train_validation_split.py
+"""
+
+if __name__ == "__main__":
+    sc = SparkContext(appName="TrainValidationSplit")
+    sqlContext = SQLContext(sc)
+    # $example on$
+    # Prepare training and test data.
+    data = sqlContext.read.format("libsvm")\
+        .load("data/mllib/sample_linear_regression_data.txt")
+    train, test = data.randomSplit([0.7, 0.3])
+    lr = LinearRegression(maxIter=10, regParam=0.1)
+
+    # We use a ParamGridBuilder to construct a grid of parameters to search over.
+    # TrainValidationSplit will try all combinations of values and determine best model using
+    # the evaluator.
+    paramGrid = ParamGridBuilder()\
+        .addGrid(lr.regParam, [0.1, 0.01]) \
+        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])\
+        .build()
+
+    # In this case the estimator is simply the linear regression.
+    # A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
+    tvs = TrainValidationSplit(estimator=lr,
+                               estimatorParamMaps=paramGrid,
+                               evaluator=RegressionEvaluator(),
+                               # 80% of the data will be used for training, 20% for validation.
+                               trainRatio=0.8)
+
+    # Run TrainValidationSplit, chosing the set of parameters that optimizes the evaluator.
--- End diff --

I went ahead and made the comments completely consistent with the existing Scala and Java examples; it seems reasonable to have the examples resemble one another exactly.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user JeremyNixon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55617143

--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
--- End diff --

Thanks, updated.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-194153088

Made a few minor comments, pending those LGTM.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55480608

--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/train_validation_split.py
+"""
+
+if __name__ == "__main__":
+    sc = SparkContext(appName="TrainValidationSplit")
+    sqlContext = SQLContext(sc)
+    # $example on$
+    # Prepare training and test data.
+    data = sqlContext.read.format("libsvm")\
+        .load("data/mllib/sample_linear_regression_data.txt")
+    train, test = data.randomSplit([0.7, 0.3])
+    lr = LinearRegression(maxIter=10, regParam=0.1)
+
+    # We use a ParamGridBuilder to construct a grid of parameters to search over.
+    # TrainValidationSplit will try all combinations of values and determine best model using
+    # the evaluator.
+    paramGrid = ParamGridBuilder()\
+        .addGrid(lr.regParam, [0.1, 0.01]) \
+        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])\
+        .build()
+
+    # In this case the estimator is simply the linear regression.
+    # A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
+    tvs = TrainValidationSplit(estimator=lr,
+                               estimatorParamMaps=paramGrid,
+                               evaluator=RegressionEvaluator(),
+                               # 80% of the data will be used for training, 20% for validation.
+                               trainRatio=0.8)
+
+    # Run TrainValidationSplit, chosing the set of parameters that optimizes the evaluator.
--- End diff --

Perhaps we could change this comment to something like `Running TrainValidationSplit returns the model with the combination of parameters that performed best on the validation set.`

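The wording suggested above ("the combination of parameters that performed best on the validation set") can be illustrated with a minimal sketch in plain Python. This is an assumption-laden toy, not Spark's code: `fit_ridge_1d`, `rmse`, and `train_validation_select` are hypothetical helpers, the model is a one-weight regression through the origin, and RMSE (lower is better) stands in for the evaluator.

```python
import math
import random


def fit_ridge_1d(rows, reg):
    """Fit y ~ w * x with an L2 penalty of strength reg; returns the weight w."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + reg)


def rmse(w, rows):
    """Root mean squared error of weight w on rows; lower is better."""
    return math.sqrt(sum((y - w * x) ** 2 for x, y in rows) / len(rows))


def train_validation_select(rows, reg_values, train_ratio=0.8, seed=0):
    """Split once, fit every candidate on the training part, and return
    (best_reg, best_weight, best_rmse) as judged on the validation part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train, validation = shuffled[:cut], shuffled[cut:]

    best = None
    for reg in reg_values:
        w = fit_ridge_1d(train, reg)
        score = rmse(w, validation)
        if best is None or score < best[2]:
            best = (reg, w, score)
    return best


# Toy data lying exactly on y = 2x, so the unregularised fit is perfect.
rows = [(float(x), 2.0 * x) for x in range(1, 21)]
best_reg, best_w, best_score = train_validation_select(rows, [100.0, 1.0, 0.0])
print(best_reg, best_w, best_score)
```

Unlike k-fold cross-validation, each candidate is evaluated on a single split, which is cheaper but gives a noisier estimate when the data set is small.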
[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55480665

--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
+Run with:
+
+  bin/spark-submit examples/src/main/python/ml/train_validation_split.py
+"""
+
+if __name__ == "__main__":
+    sc = SparkContext(appName="TrainValidationSplit")
+    sqlContext = SQLContext(sc)
+    # $example on$
+    # Prepare training and test data.
+    data = sqlContext.read.format("libsvm")\
+        .load("data/mllib/sample_linear_regression_data.txt")
+    train, test = data.randomSplit([0.7, 0.3])
+    lr = LinearRegression(maxIter=10, regParam=0.1)
+
+    # We use a ParamGridBuilder to construct a grid of parameters to search over.
+    # TrainValidationSplit will try all combinations of values and determine best model using
+    # the evaluator.
+    paramGrid = ParamGridBuilder()\
+        .addGrid(lr.regParam, [0.1, 0.01]) \
+        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])\
+        .build()
+
+    # In this case the estimator is simply the linear regression.
+    # A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
+    tvs = TrainValidationSplit(estimator=lr,
+                               estimatorParamMaps=paramGrid,
+                               evaluator=RegressionEvaluator(),
+                               # 80% of the data will be used for training, 20% for validation.
+                               trainRatio=0.8)
+
+    # Run TrainValidationSplit, chosing the set of parameters that optimizes the evaluator.
+    model = tvs.fit(train)
+    # Make predictions on test data. model is the model with combination of parameters
--- End diff --

... and here we can then simply have `Make predictions on test data using the model`

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11547#discussion_r55479803

--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+from pyspark.sql import SQLContext
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
--- End diff --

Typo: `demonstrats`

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193740426

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52661/
Test PASSed.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193740423

Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193740105

**[Test build #52661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52661/consoleFull)** for PR 11547 at commit [`4a92dfd`](https://github.com/apache/spark/commit/4a92dfd7c24a4834ecec214c807820d07ee8fd69).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193732035

ok to test

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193733671

**[Test build #52661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52661/consoleFull)** for PR 11547 at commit [`4a92dfd`](https://github.com/apache/spark/commit/4a92dfd7c24a4834ecec214c807820d07ee8fd69).

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193521102

Jenkins, test this please

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user jodersky commented on the pull request:

https://github.com/apache/spark/pull/11547#issuecomment-193517578

Looks good

[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user jodersky commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11547#discussion_r55284873

--- Diff: examples/src/main/python/ml/train_validation_split.py ---
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark import SparkContext
+# $example on$
+from pyspark.ml import Pipeline
+from pyspark.ml.evaluation import RegressionEvaluator
+from pyspark.ml.regression import LinearRegression
+from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
+# $example off$
+
+"""
+This example demonstrats applying TrainValidationSplit to split data
+and preform model selection, as well as applying Pipelines.
+Run with:
+
+    bin/spark-submit examples/src/main/python/ml/train_validation_split.py
+"""
+
+if __name__ == "__main__":
+    sc = SparkContext(appName="TrainValidationSplit")
+    sqlContext = SQLContext(sc)
--- End diff --

Running the example as stated in the comment above, I get a "NameError: name 'SQLContext' is not defined". Could it be that an import is missing?

Other than that, LGTM
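The NameError reported in this review comment is the kind of problem a static pass over the file can catch before the script is submitted to Spark. The sketch below is not part of the pull request or of Spark's lint tooling. It is a minimal, standard-library-only illustration: the helper `undefined_names` and the `SNIPPET` constant are hypothetical names introduced here, and the check ignores function and class definitions, so it only suits flat scripts like this one. The source is parsed with `ast`, never executed, so pyspark does not need to be installed.

```python
import ast
import builtins

# Imports and main block as quoted in the diff under review.
# The text is only parsed, never run, so pyspark is not required.
SNIPPET = '''
from pyspark import SparkContext
from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import LinearRegression
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

if __name__ == "__main__":
    sc = SparkContext(appName="TrainValidationSplit")
    sqlContext = SQLContext(sc)
'''


def undefined_names(source):
    """Return the sorted names that are read but never imported or assigned.

    Simplification (assumption): scopes, function definitions and class
    definitions are ignored, so this is only meaningful for flat scripts.
    """
    tree = ast.parse(source)
    defined = set(dir(builtins)) | {"__name__", "__file__", "__doc__"}
    used = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.append(node.id)
    return sorted({name for name in used if name not in defined})


if __name__ == "__main__":
    # The file as reviewed: SQLContext is used without being imported.
    print(undefined_names(SNIPPET))
    # With the import that the later revision of the diff adds.
    print(undefined_names("from pyspark.sql import SQLContext\n" + SNIPPET))
```

The second call prepends `from pyspark.sql import SQLContext`, which is the import that appears in the later revision of this diff, and the missing-name list becomes empty.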
[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11547#issuecomment-192921452 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-13706] [ML] Add Python Example for Trai...
GitHub user JeremyNixon opened a pull request:

    https://github.com/apache/spark/pull/11547

    [SPARK-13706] [ML] Add Python Example for Train Validation Split

    ## What changes were proposed in this pull request?

    This pull request adds a python example for train validation split.

    ## How was this patch tested?

    This was style tested through lint-python, generally tested with ./dev/run-tests, and run in notebook and shell environments. It was viewed in docs locally with jekyll serve.

    This contribution is my original work and I license it to Spark under its open source license.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JeremyNixon/spark tvs_example

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11547.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11547

commit 07103fa6c86e5e7d951c8f166b3516d001cd
Author: JeremyNixon
Date: 2016-03-06T14:24:09Z

    add train-validation-split example to docs

commit 994204a0a93e8c6fa58ef1134159f9ffbcce4a34
Author: JeremyNixon
Date: 2016-03-06T15:05:32Z

    clean lint

commit 9bf5f4e7a4e65046d7b4daeb3aef5c9beb0b92f3
Author: JeremyNixon
Date: 2016-03-06T15:34:17Z

    remove pipeline
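For context on the technique the example targets: a train-validation split holds out one fixed portion of the data, fits a model for every candidate parameter setting on the remainder, and keeps the setting that scores best on the held-out portion. The following is a plain-Python sketch of that idea only. It is not Spark's implementation and does not use the pyspark API; every function name, the one-parameter ridge model, and the toy data are assumptions made for illustration.

```python
import random


def train_validation_split(rows, train_ratio, seed):
    """Shuffle once and cut the rows into (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_ridge_slope(train, reg_param):
    """Closed-form slope for y ~ w * x with an L2 penalty on w."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + reg_param)


def rmse(slope, rows):
    """Root-mean-square error of the slope-only model on the given rows."""
    return (sum((y - slope * x) ** 2 for x, y in rows) / len(rows)) ** 0.5


def select_model(rows, reg_grid, train_ratio=0.8, seed=42):
    """Return (best_reg_param, validation_rmse) over the parameter grid."""
    train, validation = train_validation_split(rows, train_ratio, seed)
    scored = [(rmse(fit_ridge_slope(train, reg), validation), reg)
              for reg in reg_grid]
    best_rmse, best_reg = min(scored)
    return best_reg, best_rmse


if __name__ == "__main__":
    # Toy data lying exactly on y = 2x, so no regularization fits best.
    data = [(float(x), 2.0 * x) for x in range(1, 21)]
    print(select_model(data, [0.0, 10.0, 1000.0]))
```

Unlike k-fold cross-validation, each parameter setting is evaluated only once, which is cheaper but gives a noisier estimate when the data set is small.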