[ https://issues.apache.org/jira/browse/SPARK-15254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344304#comment-15344304 ]
Krishna Kalyan edited comment on SPARK-15254 at 6/22/16 1:29 PM:
-----------------------------------------------------------------

Can I take up this task if no one is working on it?

From what I understand, these are the locations that need more information:

`Scaladoc`
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.CrossValidatorModel
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.CrossValidator

`PyDoc`
http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml (sections below)
- CrossValidator
- CrossValidatorModel

Can you please confirm, [~holdenk] / [~mlnick], so that I can start working on the pull request?

CrossValidator
CrossValidator begins by splitting the dataset into a set of folds, which are used as separate training and test datasets; e.g., with k=3 folds, CrossValidator will generate 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing.

CrossValidatorModel
An important task in ML is model selection, or using data to find the best model or parameters for a given task. This is also called tuning. Pipelines facilitate model selection by making it easy to tune an entire Pipeline at once, rather than tuning each element in the Pipeline separately.


was (Author: krishnakalyan3):
Can I take up this task if no one is working on it?

From what I understand, these are the locations that need more information:

`Scaladoc`
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.CrossValidatorModel
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.CrossValidator

`PyDoc`
http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml (sections below)
- CrossValidator
- CrossValidatorModel

Can you please confirm, @holdenk / [~mlnick], so that I can start working on the pull request?

CrossValidator
CrossValidator begins by splitting the dataset into a set of folds, which are used as separate training and test datasets; e.g., with k=3 folds, CrossValidator will generate 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing.

CrossValidatorModel
An important task in ML is model selection, or using data to find the best model or parameters for a given task. This is also called tuning. Pipelines facilitate model selection by making it easy to tune an entire Pipeline at once, rather than tuning each element in the Pipeline separately.


> Improve ML pipeline Cross Validation Scaladoc & PyDoc
> -----------------------------------------------------
>
>                 Key: SPARK-15254
>                 URL: https://issues.apache.org/jira/browse/SPARK-15254
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, ML
>            Reporter: holdenk
>            Priority: Minor
>
> The ML pipeline Cross Validation Scaladoc & PyDoc is very sparse - we should
> fill this out with a more concrete description.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
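
For illustration, the fold splitting described in the CrossValidator paragraph of the comment can be sketched in plain Python. This is only a conceptual sketch, not Spark's implementation: `k_fold_pairs` is a hypothetical helper, and it assigns rows to folds deterministically (row i goes to fold i % k) as a stand-in for whatever assignment Spark actually performs.

```python
def k_fold_pairs(rows, k):
    """Return k (training, test) pairs; each row lands in exactly one test set."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Hypothetical assignment rule: row i goes to fold i % k.
    folds = [rows[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        test = folds[i]
        # Training data is every fold except the one held out for testing.
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        pairs.append((training, test))
    return pairs


# With k=3 and 9 rows, each pair trains on 2/3 of the rows and tests on 1/3.
pairs = k_fold_pairs(list(range(9)), k=3)
for training, test in pairs:
    print(len(training), len(test))
```

Each of the three pairs here holds 6 training rows and 3 test rows, which matches the 2/3 and 1/3 proportions quoted in the comment.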