[jira] [Commented] (SPARK-23758) MLlib 2.4 Roadmap

Marco Gaido (JIRA) Tue, 16 Jul 2019 12:38:47 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-23758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886406#comment-16886406
 ]


Marco Gaido commented on SPARK-23758:
-------------------------------------

[~dongjoon] seems weird to set the affected version to 3.0 for this one. Shall 
we rather close it?

> MLlib 2.4 Roadmap
> -----------------
>
>                 Key: SPARK-23758
>                 URL: https://issues.apache.org/jira/browse/SPARK-23758
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: Joseph K. Bradley
>            Priority: Major
>
> h1. Roadmap process
> This roadmap is a master list for MLlib improvements we are working on during 
> this release.  This includes ML-related changes in PySpark and SparkR.
> *What is planned for the next release?*
> * This roadmap lists issues which at least one Committer has prioritized.  
> See details below in "Instructions for committers."
> * This roadmap only lists larger or more critical issues.
> *How can contributors influence this roadmap?*
> * If you believe an issue should be in this roadmap, please discuss the issue 
> on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
> least one must agree to shepherd the issue.
> * For general discussions, use this JIRA or the dev mailing list.  For 
> specific issues, please comment on those issues or the mailing list.
> * Vote for & watch issues which are important to you.
> ** MLlib, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC]
> ** SparkR, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20Watchers%20DESC]
> h2. Target Version and Priority
> This section describes the meaning of Target Version and Priority.
> || Category | Target Version | Priority | Shepherd | Put on roadmap? | In 
> next release? ||
> | [1 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Blocker | *must* | *must* | *must* |
> | [2 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Critical%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Critical | *must* | yes, unless small | *best effort* |
> | [3 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Major%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Major | *must* | optional | *best effort* |
> | [4 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Minor%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Minor | optional | no | maybe |
> | [5 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Trivial%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Trivial | optional | no | maybe |
> | [6 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20"Target%20Version%2Fs"%20in%20(EMPTY)%20AND%20Shepherd%20not%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC]
>  | (empty) | (any) | yes | no | maybe |
> | [7 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(EMPTY)%20AND%20Shepherd%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC]
>  | (empty) | (any) | no | no | maybe |
> The *Category* in the table above has the following meaning:
> 1. A committer has promised to see this issue to completion for the next 
> release.  Contributions *will* receive attention.
> 2-3. A committer has promised to see this issue to completion for the next 
> release.  Contributions *will* receive attention.  The issue may slip to the 
> next release if development is slower than expected.
> 4-5. A committer has promised interest in this issue.  Contributions *will* 
> receive attention.  The issue may slip to another release.
> 6. A committer has promised interest in this issue and should respond, but no 
> promises are made about priorities or releases.
> 7. This issue is open for discussion, but it needs a committer to promise 
> interest to proceed.
> h1. Instructions
> h2. For contributors
> Getting started
> * Please read http://spark.apache.org/contributing.html carefully. Code 
> style, documentation, and unit tests are important.
> * If you are a first-time contributor, please always start with a small 
> [starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather 
> than a larger feature.
> Coordinating on JIRA
> * Never work silently. Let everyone know on the corresponding JIRA page when 
> you start work. This is to avoid duplicate work. For small patches, you do 
> not need to get the JIRA assigned to you to begin work.
> * For medium/large features or features with dependencies, please get 
> assigned first before coding and keep the ETA updated on the JIRA. If there 
> is no activity on the JIRA page for a certain amount of time, the JIRA should 
> be released for other contributors.
> * Do not claim multiple (>3) JIRAs at the same time. Try to finish them one 
> after another.
> * Do not set these fields: Target Version, Fix Version, or Shepherd.  Only 
> Committers should set those.
> Writing and reviewing PRs
> * Remember to add the `@Since("VERSION")` annotation to new public APIs.
> * *Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code 
> review greatly helps to improve others' code as well as yours.*
> h2. For Committers
> Adding to this roadmap
> * You can update the roadmap by (a) adding issues to this list and (b) 
> setting Target Versions.  Only Committers may make these changes.
> * *If you add an issue to this roadmap or set a Target Version, you _must_ 
> assign yourself or another Committer as Shepherd.*
> * This list should be actively managed during the release.
> * If you target a significant item for the next release, please list the item 
> on this roadmap.
> * If you commit to shepherding a new public API, you implicitly commit to 
> shepherding the follow-up issues as well (Python/R APIs, docs).
> Creating JIRA issues
> * Try to break down big features into small and specific JIRA tasks and link 
> them properly.
> * Add a "starter" label to starter tasks.
> * Put a rough time estimate for medium/big features and track the progress.
> * Set Priority carefully.  Priority should not be mixed with size of effort 
> for implementation.
> Managing JIRA issues and PRs
> * Please add yourself to the Shepherd field on JIRA if you start reviewing a 
> PR.
> * If the code looks good to you, please comment "LGTM". For non-trivial PRs, 
> please ping a Committer experienced with the relevant code to make a final 
> pass.
> Follow-up issues: *After merging a PR, create and link the necessary 
> follow-up JIRAs.*
> * For a new Scala/Java API
> ** Create issues for adding analogous Python and R APIs
> ** Create issues for adding example code and documentation
> * For a new Python/R API
> ** Create issues for adding example code and documentation
> h1. Roadmap for this release
> This roadmap only includes larger, more critical tasks targeted at the next 
> release.  To find all issues targeted for the next release, use the links 
> listed below.
> Notes
> * We will prioritize API parity, bug fixes, and improvements over new 
> features.
> * The RDD-based API (`spark.mllib`) is in maintenance mode now.  We will 
> accept bug fixes for it, but new features, APIs, and improvements will only 
> be added to the DataFrame-based API (`spark.ml`).
> *WIP: This section is still being updated, pending confirmation of the 
> Roadmap Process described above.*
> h2. Critical feature parity in DataFrame-based API
> * [SPARK-14501]: Frequent Pattern Mining
> * [SPARK-15784]: Power Iteration Clustering (PIC)
> The umbrella JIRA for feature parity is [SPARK-4591], though it includes 
> items not targeted for this release.
> h2. Other prioritized issues: links for searching JIRA
> This section provides links to help people identify smaller patches targeted 
> at the next release, as well as patches for major areas within MLlib.
> * [All MLlib, SparkR, GraphX JIRAs with Target Version | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20component%20in%20(ML%2C%20MLlib%2C%20SparkR%2C%20GraphX)%20AND%20"Target%20Version%2Fs"%20in%20(2.4.0%2C%203.0.0)%20AND%20(fixVersion%20is%20EMPTY)%20ORDER%20BY%20priority]
> * [MLlib, SparkR, GraphX Umbrella JIRAs (regardless of Target Version) | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20component%20in%20(ML%2C%20MLlib%2C%20SparkR%2C%20GraphX)%20AND%20Type%20%3D%20%22Umbrella%22%20AND%20Status%20in%20(%22Open%22%2C%20%22In%20Progress%22%2C%20%22Reopened%22)]
> * [MLlib, SparkR, GraphX bugs | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20issuetype%20%3D%20Bug%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20ORDER%20BY%20priority%20DESC]
> h1. Long-term roadmap
> This section lists long-term or constant efforts.  For example, Python/R API 
> parity with Scala/Java will always be a priority, but we do not promise exact 
> parity with each release.
> h2. Python and R feature parity
> Python feature parity: The main goal of the Python API is to have feature 
> parity with the Scala/Java API. You can find a [complete list of open Python 
> MLlib issues here| 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20AND%20component%20in%20(PySpark)%20ORDER%20BY%20priority%20DESC].
> R feature parity: We are building towards feature parity in SparkR as well. 
> You can find a [complete list of SparkR MLlib issues here| 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20priority%20DESC].



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-23758) MLlib 2.4 Roadmap

Reply via email to