[jira] [Comment Edited] (SPARK-15254) Improve ML pipeline Cross Validation Scaladoc & PyDoc

Krishna Kalyan (JIRA) Wed, 22 Jun 2016 06:30:19 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-15254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344304#comment-15344304
 ]


Krishna Kalyan edited comment on SPARK-15254 at 6/22/16 1:29 PM:
-----------------------------------------------------------------

Can I take up this task if no one is working on it?

From what I understand, these are the locations that need more information:

`Scaladoc`
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.CrossValidatorModel
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.CrossValidator

`PyDoc`
http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml
 (sections below)
- CrossValidator
- CrossValidatorModel

Can you please confirm, [~holdenk] / [~mlnick], so that I can start working on 
the pull request?

CrossValidator
CrossValidator begins by splitting the dataset into a set of folds which are 
used as separate training and test datasets; e.g., with k=3 folds, 
CrossValidator will generate 3 (training, test) dataset pairs, each of which 
uses 2/3 of the data for training and 1/3 for testing.
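
To make the fold arithmetic above concrete, here is a minimal plain-Python sketch of how k folds turn into k (training, test) pairs. It is not Spark's implementation: the function name `k_fold_pairs` is hypothetical, and the round-robin fold assignment is a simplification chosen so the sizes are easy to check by hand.

```python
def k_fold_pairs(rows, k):
    """Split rows into k folds and return k (training, test) pairs.

    Hypothetical helper for illustration only; folds are assigned
    round-robin here, which is an assumption made for readability.
    """
    folds = [rows[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        test = folds[i]
        # Training data is every fold except the one held out for testing.
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        pairs.append((training, test))
    return pairs


pairs = k_fold_pairs(list(range(9)), k=3)
for training, test in pairs:
    # With 9 rows and k=3, each pair uses 6 rows (2/3) for training
    # and 3 rows (1/3) for testing.
    print(len(training), len(test))
```

Each row lands in exactly one test set, so every row is used for evaluation once and for training k-1 times.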

CrossValidatorModel
An important task in ML is model selection, or using data to find the best 
model or parameters for a given task. This is also called tuning. Pipelines 
facilitate model selection by making it easy to tune an entire Pipeline at 
once, rather than tuning each element in the Pipeline separately.
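
As a rough illustration of tuning a whole pipeline at once, the plain-Python sketch below searches a joint grid over the parameters of two stages and picks the combination with the lowest average held-out error. Everything in it is an assumption made for the example: the two-stage "pipeline" (`fit_pipeline`), the parameter names `scale` and `reg`, the toy data, and the mean-squared-error metric. It does not use or reproduce the Spark API.

```python
from itertools import product


def fit_pipeline(train, scale, reg):
    """Hypothetical two-stage pipeline: scale x, then fit y ~ w * x with an L2 penalty."""
    sxy = sum((x * scale) * y for x, y in train)
    sxx = sum((x * scale) ** 2 for x, _ in train)
    w = sxy / (sxx + reg)
    return lambda x: w * (x * scale)


def mse(model, rows):
    """Mean squared error of model predictions on (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def cross_validate(rows, grid, k=3):
    """Return the best parameter combination and a model refit on all rows."""
    folds = [rows[i::k] for i in range(k)]  # round-robin folds, for simplicity
    best_params, best_score = None, float("inf")
    for params in grid:
        scores = []
        for i in range(k):
            train = [r for j, f in enumerate(folds) if j != i for r in f]
            scores.append(mse(fit_pipeline(train, **params), folds[i]))
        avg = sum(scores) / k
        if avg < best_score:
            best_params, best_score = params, avg
    # Refit on the full dataset using the winning parameters.
    return best_params, fit_pipeline(rows, **best_params)


# Toy data following y = 2x, and a joint grid over both stages' parameters.
rows = [(float(x), 2.0 * x) for x in range(1, 10)]
grid = [{"scale": s, "reg": r} for s, r in product([0.5, 1.0], [0.0, 10.0])]
best_params, best_model = cross_validate(rows, grid)
print(best_params, best_model(3.0))
```

The point of the joint grid is that parameters of different stages are evaluated together, so an interaction between stages is scored directly rather than assumed away by tuning each stage in isolation.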





> Improve ML pipeline Cross Validation Scaladoc & PyDoc
> -----------------------------------------------------
>
>                 Key: SPARK-15254
>                 URL: https://issues.apache.org/jira/browse/SPARK-15254
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, ML
>            Reporter: holdenk
>            Priority: Minor
>
> The ML pipeline Cross Validation Scaladoc & PyDoc is very sparse - we should 
> fill this out with a more concrete description.


