[jira] [Commented] (SPARK-18813) MLlib 2.2 Roadmap

Joseph K. Bradley (JIRA) Mon, 12 Dec 2016 15:32:27 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743513#comment-15743513
 ]


Joseph K. Bradley commented on SPARK-18813:
-------------------------------------------

Those are definitely some of the top items on my mind too.  Personally, I plan 
to focus on feature parity + Python parity, as well as ML persistence 
improvements, but please do add items which you're able to shepherd.

> MLlib 2.2 Roadmap
> -----------------
>
>                 Key: SPARK-18813
>                 URL: https://issues.apache.org/jira/browse/SPARK-18813
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, MLlib
>            Reporter: Joseph K. Bradley
>            Priority: Blocker
>              Labels: roadmap
>
> *PROPOSAL: This includes a proposal for the 2.2 roadmap process for MLlib.*
> The roadmap process described below is significantly updated since the 2.1 
> roadmap [SPARK-15581].  Please refer to [SPARK-15581] for more discussion on 
> the basis for this proposal, and comment in this JIRA if you have suggestions 
> for improvements.
> h1. Roadmap process
> This roadmap is a master list for MLlib improvements we are working on during 
> this release.  This includes ML-related changes in PySpark and SparkR.
> *What is planned for the next release?*
> * This roadmap lists issues which at least one Committer has prioritized.  
> See details below in "Instructions for committers."
> * This roadmap only lists larger or more critical issues.
> *How can contributors influence this roadmap?*
> * If you believe an issue should be in this roadmap, please discuss the issue 
> on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
> least one must agree to shepherd the issue.
> * For general discussions, use this JIRA or the dev mailing list.  For 
> specific issues, please comment on those issues or the mailing list.
> h2. Target Version and Priority
> This section describes the meaning of Target Version and Priority.  _These 
> meanings have been updated in this proposal for the 2.2 process._
> || Category || Target Version || Priority || Shepherd || Put on roadmap? || In next release? ||
> | 1 | next release | Blocker | *must* | *must* | *must* |
> | 2 | next release | Critical | *must* | yes, unless small | *best effort* |
> | 3 | next release | Major | *must* | optional | *best effort* |
> | 4 | next release | Minor | optional | no | maybe |
> | 5 | next release | Trivial | optional | no | maybe |
> | 6 | (empty) | (any) | yes | no | maybe |
> | 7 | (empty) | (any) | no | no | maybe |
> The *Category* in the table above has the following meaning:
> 1. A committer has promised to see this issue to completion for the next 
> release.  Contributions *will* receive attention.
> 2-3. A committer has promised to see this issue to completion for the next 
> release.  Contributions *will* receive attention.  The issue may slip to a 
> later release if development is slower than expected.
> 4-5. A committer has promised interest in this issue.  Contributions *will* 
> receive attention.  The issue may slip to another release.
> 6. A committer has promised interest in this issue and should respond, but no 
> promises are made about priorities or releases.
> 7. This issue is open for discussion, but it needs a committer to promise 
> interest to proceed.
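The category table above can be read as a small lookup rule. The following is a minimal sketch of that rule, not part of any Spark or JIRA tooling; the function name `roadmap_category` and its parameters are hypothetical, and it assumes that a targeted issue in categories 1-3 without a Shepherd should be treated as an error because the table marks the Shepherd as a *must* there.

```python
from typing import Optional

# Priority -> category for issues whose Target Version is the next release,
# following rows 1-5 of the table.
_PRIORITY_TO_CATEGORY = {
    "Blocker": 1,
    "Critical": 2,
    "Major": 3,
    "Minor": 4,
    "Trivial": 5,
}


def roadmap_category(target_version: Optional[str],
                     priority: str,
                     has_shepherd: bool) -> int:
    """Return the roadmap category (1-7) for an issue (hypothetical helper)."""
    if not target_version:
        # Rows 6 and 7: no Target Version, any Priority.
        return 6 if has_shepherd else 7
    category = _PRIORITY_TO_CATEGORY[priority]
    # Assumption: categories 1-3 list the Shepherd as "must", so a missing
    # Shepherd is reported rather than silently accepted.
    if category <= 3 and not has_shepherd:
        raise ValueError(
            f"{priority} issues targeted at a release must have a Shepherd")
    return category
```

For example, a Blocker targeted at 2.2.0 with a Shepherd falls in category 1, while an issue with no Target Version and no Shepherd falls in category 7.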
> h1. Instructions
> h2. For contributors
> Getting started
> * Please read http://spark.apache.org/contributing.html carefully. Code 
> style, documentation, and unit tests are important.
> * If you are a first-time contributor, please always start with a small 
> [starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather 
> than a larger feature.
> Coordinating on JIRA
> * Never work silently. Let everyone know on the corresponding JIRA page when 
> you start work. This is to avoid duplicate work. For small patches, you do 
> not need to get the JIRA assigned to you to begin work.
> * For medium/large features or features with dependencies, please get 
> assigned first before coding and keep the ETA updated on the JIRA. If there 
> is no activity on the JIRA page for a certain amount of time, the JIRA should 
> be released for other contributors.
> * Do not claim more than 3 JIRAs at the same time. Try to finish them one 
> after another.
> * Do not set these fields: Target Version, Fix Version, or Shepherd.  Only 
> Committers should set those.
> Writing and reviewing PRs
> * Remember to add the `@Since("VERSION")` annotation to new public APIs.
> * *Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code 
> review greatly helps to improve others' code as well as yours.*
> h2. For Committers
> Adding to this roadmap
> * You can update the roadmap by (a) adding issues to this list and (b) 
> setting Target Versions.  Only Committers may make these changes.
> * *If you add an issue to this roadmap or set a Target Version, you _must_ 
> assign yourself or another Committer as Shepherd.*
> * This list should be actively managed during the release.
> * If you target a significant item for the next release, please list the item 
> on this roadmap.
> * If you commit to shepherding a new public API, you implicitly commit to 
> shepherding the follow-up issues as well (Python/R APIs, docs).
> Creating JIRA issues
> * Try to break down big features into small and specific JIRA tasks and link 
> them properly.
> * Add a "starter" label to starter tasks.
> * Give a rough time estimate for medium/big features and track progress.
> * Set Priority carefully.  Priority should not be conflated with the size of 
> the implementation effort.
> Managing JIRA issues and PRs
> * Please add yourself to the Shepherd field on JIRA if you start reviewing a 
> PR.
> * If the code looks good to you, please comment "LGTM". For non-trivial PRs, 
> please ping a Committer experienced with the relevant code to make a final 
> pass.
> Follow-up issues: *After merging a PR, create and link the necessary 
> follow-up JIRAs.*
> * For a new Scala/Java API
> ** Create issues for adding analogous Python and R APIs
> ** Create issues for adding example code and documentation
> * For a new Python/R API
> ** Create issues for adding example code and documentation
> h1. Roadmap for this release
> This roadmap only includes larger, more critical tasks targeted at the next 
> release.  To find all issues targeted for the next release, use the links 
> listed below.
> Notes
> * We will prioritize API parity, bug fixes, and improvements over new 
> features.
> * The RDD-based API (`spark.mllib`) is in maintenance mode now.  We will 
> accept bug fixes for it, but new features, APIs, and improvements will only 
> be added to the DataFrame-based API (`spark.ml`).
> *WIP: This section is still being updated, pending confirmation of the 
> Roadmap Process described above.*
> h2. Critical feature parity in DataFrame-based API
> * Umbrella JIRA: [SPARK-4591]
> h2. Persistence
> * Complete persistence within MLlib
> ** Python tuning (SPARK-13786)
> * MLlib in R format: compatibility with other languages (SPARK-15572)
> * Impose backwards compatibility for persistence (SPARK-15573)
> h2. SparkR
> * Release SparkR on CRAN [SPARK-15799]
> h2. Other prioritized issues: links for searching JIRA
> This section provides links to help people identify smaller patches targeted 
> at the next release, as well as patches for major areas within MLlib.
> * [All MLlib, SparkR, GraphX JIRAs with Target Version 2.2 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20component%20in%20(ML%2C%20MLlib%2C%20SparkR%2C%20GraphX)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0%20AND%20(fixVersion%20is%20EMPTY%20OR%20fixVersion%20!%3D%202.2.0)%20ORDER%20BY%20priority]
> * [MLlib, SparkR, GraphX Umbrella JIRAs (regardless of Target Version) | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20component%20in%20(ML%2C%20MLlib%2C%20SparkR%2C%20GraphX)%20AND%20Type%20%3D%20%22Umbrella%22%20AND%20Status%20in%20(%22Open%22%2C%20%22In%20Progress%22%2C%20%22Reopened%22)]
> h1. Long-term roadmap
> This section lists long-term or constant efforts.  For example, Python/R API 
> parity with Scala/Java will always be a priority, but we do not promise exact 
> parity with each release.
> h2. Python and R feature parity
> Python feature parity: The main goal of the Python API is to have feature 
> parity with the Scala/Java API. You can find a [complete list of Python MLlib 
> issues targeted at the next release here| 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20AND%20component%20in%20(PySpark)%20AND%20"Target%20Version%2Fs"%20%3D%202.2.0%20ORDER%20BY%20priority%20DESC].
> R feature parity: We are building towards feature parity in SparkR as well. 
> You can find a [complete list of SparkR MLlib issues targeted at the next 
> release here| 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20AND%20component%20in%20(SparkR)%20AND%20"Target%20Version%2Fs"%20%3D%202.2.0%20ORDER%20BY%20priority%20DESC].
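The JIRA search links in the description are JQL queries percent-encoded into the `jql` URL parameter. As a rough sketch of how a similar link could be built for a different filter, the snippet below uses Python's standard `urllib.parse.quote`. The helper name `jira_search_url` is hypothetical, and the query shown is a simplified variant of the Python parity search (it omits the status filter), not a copy of any link above.

```python
from urllib.parse import quote

JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="


def jira_search_url(jql: str) -> str:
    """Build a JIRA issue-search URL from a JQL string (hypothetical helper)."""
    # Leave parentheses and "!" bare, as the links in the description do;
    # spaces, commas, "=", "/" and double quotes are percent-encoded.
    return JIRA_SEARCH + quote(jql, safe="()!")


# Simplified example query: Python MLlib issues targeted at 2.2.0.
jql = (
    'project = SPARK AND component in (ML, MLlib) '
    'AND component in (PySpark) '
    'AND "Target Version/s" = 2.2.0 '
    'ORDER BY priority DESC'
)
print(jira_search_url(jql))
```

Encoding the double quotes as `%22` keeps the link intact in mail clients that would otherwise cut the URL at the first quote character.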



