[ 
https://issues.apache.org/jira/browse/SPARK-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944128#comment-14944128
 ] 

Xiangrui Meng commented on SPARK-9941:
--------------------------------------

Created https://issues.apache.org/jira/browse/SPARK-10935. I think we could 
start by importing the datasets using spark-csv.
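
As a rough illustration of that first step, here is a minimal PySpark sketch of reading a competition CSV through the spark-csv data source. It assumes an existing SQLContext (Spark 1.4 or later) and that the spark-csv package was added when the shell was launched. The helper name load_kaggle_csv and the file name train.csv are hypothetical and are not part of Spark or spark-csv.

```python
# Sketch only: assumes an existing SQLContext and that spark-csv is on the
# classpath, e.g. a shell started with
#   pyspark --packages com.databricks:spark-csv_2.10:<version>

SPARK_CSV_FORMAT = "com.databricks.spark.csv"


def load_kaggle_csv(sql_context, path, header=True, infer_schema=True):
    """Read a CSV file into a DataFrame through the spark-csv data source.

    load_kaggle_csv is a hypothetical helper name used for illustration.
    """
    reader = sql_context.read.format(SPARK_CSV_FORMAT).options(
        # Treat the first line of the file as column names.
        header=str(header).lower(),
        # Ask spark-csv to guess column types (costs an extra pass over the data).
        inferSchema=str(infer_schema).lower(),
    )
    return reader.load(path)
```

In a PySpark shell this might be called as train = load_kaggle_csv(sqlContext, "train.csv"), followed by train.printSchema() to check the inferred column types before building a pipeline.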

> Try ML pipeline API on Kaggle competitions
> ------------------------------------------
>
>                 Key: SPARK-9941
>                 URL: https://issues.apache.org/jira/browse/SPARK-9941
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> This is an umbrella JIRA to track some fun tasks :)
> We have built many features under the ML pipeline API, and we want to see how 
> it works on real-world datasets, e.g., Kaggle competition datasets 
> (https://www.kaggle.com/competitions). We want to invite community members to 
> help test. The goal is NOT to win the competitions but to provide code 
> examples and to identify missing features and other issues that can help 
> shape the roadmap.
> For people who are interested, please do the following:
> 1. Create a subtask (or leave a comment if you cannot create a subtask) to 
> claim a Kaggle dataset.
> 2. Use the ML pipeline API to build and tune an ML pipeline that works for 
> the Kaggle dataset.
> 3. Post the code as a gist (https://gist.github.com/) and provide the link 
> here.
> 4. Report missing features, issues, running times, and accuracy.



