[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706722#comment-14706722 ]

Manoj Kumar commented on SPARK-6192:

[~rxin] It ends a few hours from now. I have written a blog post summarizing the work I did this summer: https://manojbits.wordpress.com/2015/08/21/google-summer-of-code-wrapup/

I think this can be marked as resolved :)

cc: [~josephkb]

Enhance MLlib's Python API (GSoC 2015)

Key: SPARK-6192
URL: https://issues.apache.org/jira/browse/SPARK-6192
Project: Spark
Issue Type: Umbrella
Components: ML, MLlib, PySpark
Reporter: Xiangrui Meng
Assignee: Manoj Kumar
Labels: gsoc, gsoc2015, mentor

This is an umbrella JIRA for [~MechCoder]'s GSoC 2015 project. The main theme is to enhance MLlib's Python API to bring it on par with the Scala/Java API. The main tasks are:

1. For all models in MLlib, provide save/load methods. This also includes save/load in Scala.
2. Python API for evaluation metrics.
3. Python API for streaming ML algorithms.
4. Python API for distributed linear algebra.
5. Simplify MLLibPythonAPI using DataFrames. Currently, we use customized serialization, which makes MLLibPythonAPI hard to maintain. It would be nice to use DataFrames for serialization.

I'll link the JIRAs for each of the tasks. Note that this doesn't mean all these JIRAs are pre-assigned to [~MechCoder]. The TODO list will be dynamic based on the backlog.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707233#comment-14707233 ]

Joseph K. Bradley commented on SPARK-6192:

I'll mark it resolved. Thanks again for all of your help this summer; we really appreciate it. Good luck with everything, and I hope you're able to keep contributing.
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706191#comment-14706191 ]

Xiangrui Meng commented on SPARK-6192:

Not yet, officially.
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706197#comment-14706197 ]

Xiangrui Meng commented on SPARK-6192:

[~srblakcHwak] As I mentioned above, it would be great if you could start with some small features or help review others' PRs. We need to know each other before we can plan a GSoC project. This is a good place to start: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702528#comment-14702528 ]

Reynold Xin commented on SPARK-6192:

Is this one completely done?
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638078#comment-14638078 ]

K S Sreenivasa Raghavan commented on SPARK-6192:

I am interested in the Python API for MLlib. I took 2 courses in PySpark and am quite good at Python. Is there any possibility of collaborating?
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581736#comment-14581736 ]

Manoj Kumar commented on SPARK-6192:

[~mengxr] I have linked other ongoing issues as well.
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560528#comment-14560528 ]

Xiangrui Meng commented on SPARK-6192:

[~bu_min] Sorry for my late response! As I mentioned above, it would be great if you could start with some small features or help review others' PRs. We need to know each other before we can plan a GSoC project. This is a good place to start: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379777#comment-14379777 ]

min cheng commented on SPARK-6192:

Hello all,

I am a candidate for a professional master's degree at the University of Chinese Academy of Sciences, majoring in cloud computing and machine learning. I am now preparing my application for this GSoC 2015 project. I have a good foundation in Python and a good understanding of all the common machine learning algorithms. I have also built some machine learning applications, and I participated in the ALIDATA DISCOVERY competition last year. In addition, I have 3 years of experience using Hadoop platforms and am proficient in the MapReduce computing framework. I have also been learning Spark for half a year.

Do you think I am suitable to apply for this GSoC 2015 project? I look forward to your advice. Thanks!
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376329#comment-14376329 ]

Manoj Kumar commented on SPARK-6192:

[~mengxr] Sorry for spamming, but do you have anything else to add? (The deadline is a few days away.)
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377197#comment-14377197 ]

Xiangrui Meng commented on SPARK-6192:

Thanks for the update! The current version looks good to me. Please keep me updated on key events.
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367701#comment-14367701 ]

Manoj Kumar commented on SPARK-6192:

Thanks for your feedback. I've updated the proposal (same link) and added an Importance section that explains why the project matters. Let me know if there is anything else to be done.
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365918#comment-14365918 ]

Xiangrui Meng commented on SPARK-6192:

[~MechCoder] Please be a little (but not too) specific in the proposal. For example, you should mention Python in the title of the proposal, which sets the theme of the project. Scala/Java will definitely be involved, but the goal is better coverage of MLlib's Python API. This also helps reviewers understand the scope of the proposal and rate it. You should also mention in the proposal that if the features are implemented by others, we will create new tasks within the theme of the project. That way it is good for both MLlib and GSoC.
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363767#comment-14363767 ]

Manoj Kumar commented on SPARK-6192:

[~mengxr] Google Summer of Code applications opened today. I have submitted my proposal here: http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/manojkumar/5654792596619264

It would be great if you could register as a mentor and complete the steps described here: https://community.apache.org/mentee-ranking-process.html. Thanks!
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353347#comment-14353347 ]

Xiangrui Meng commented on SPARK-6192:

[~Manglano] and [~leckie-chn] Thanks for your interest in GSoC Spark MLlib! As [~MechCoder] mentioned, this JIRA was created for him based on his past experience and recent contributions to Spark MLlib. We tried to set a theme for the project but keep the actual tasks flexible, so we are not blocking others from implementing these features. You can contribute any of these features at any time. It would be great if you could start with some small features or help review others' PRs. We need to know each other before we can plan a GSoC project, but I'm afraid we may not have enough time to make that happen this year. Anyway, this is a good place to start: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352459#comment-14352459 ]

Yan Ni commented on SPARK-6192:

Hello, I am a senior undergraduate student with experience in machine learning in Python. I am now interested in distributed platforms like Spark but don't have any experience with them. I would like to use this project as my starting point with Spark. Any advice?
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352546#comment-14352546 ]

Manoj Kumar commented on SPARK-6192:

[~Manglano] [~leckie-chn] Hi, I am actually not a mentor but the student to whom Xiangrui pre-assigned this GSoC project (since I've been working on the Spark codebase for a couple of months now). This project idea actually came out of brainstorming across different pull requests. I would suggest you look at different issues; that will help you gain familiarity with the API and write a project proposal. Hope that helps.
[jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
[ https://issues.apache.org/jira/browse/SPARK-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351030#comment-14351030 ]

David J. Manglano commented on SPARK-6192:

Hello, I am experienced in Python, have an interest in machine learning, and have some knowledge of the graph and probability theory involved. I am also interested in the use of cluster computing in scientific data analysis. I would like to work on this project for GSoC 2015. What skills would be required, and what would be the next step? Thanks!