[ https://issues.apache.org/jira/browse/SPARK-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944128#comment-14944128 ]
Xiangrui Meng commented on SPARK-9941:
--------------------------------------

Created https://issues.apache.org/jira/browse/SPARK-10935. I think we could start by importing the datasets using spark-csv.

> Try ML pipeline API on Kaggle competitions
> ------------------------------------------
>
>                 Key: SPARK-9941
>                 URL: https://issues.apache.org/jira/browse/SPARK-9941
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> This is an umbrella JIRA to track some fun tasks :)
> We have built many features under the ML pipeline API, and we want to see how
> it works on real-world datasets, e.g., Kaggle competition datasets
> (https://www.kaggle.com/competitions). We want to invite community members to
> help test. The goal is NOT to win the competitions but to provide code
> examples and to find out missing features and other issues to help shape the
> roadmap.
> For people who are interested, please do the following:
> 1. Create a subtask (or leave a comment if you cannot create a subtask) to
> claim a Kaggle dataset.
> 2. Use the ML pipeline API to build and tune an ML pipeline that works for
> the Kaggle dataset.
> 3. Paste the code to gist (https://gist.github.com/) and provide the link
> here.
> 4. Report missing features, issues, running times, and accuracy.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org