[ https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743513#comment-15743513 ]
Joseph K. Bradley commented on SPARK-18813:
-------------------------------------------

Those are definitely some top items in my mind too. Personally, I plan to focus on feature parity + Python parity, as well as ML persistence improvements, but please do add items which you're able to shepherd.

> MLlib 2.2 Roadmap
> -----------------
>
> Key: SPARK-18813
> URL: https://issues.apache.org/jira/browse/SPARK-18813
> Project: Spark
> Issue Type: Umbrella
> Components: ML, MLlib
> Reporter: Joseph K. Bradley
> Priority: Blocker
> Labels: roadmap
>
> *PROPOSAL: This includes a proposal for the 2.2 roadmap process for MLlib.*
> The roadmap process described below is significantly updated since the 2.1
> roadmap [SPARK-15581]. Please refer to [SPARK-15581] for more discussion on
> the basis for this proposal, and comment in this JIRA if you have suggestions
> for improvements.
> h1. Roadmap process
> This roadmap is a master list for MLlib improvements we are working on during
> this release. This includes ML-related changes in PySpark and SparkR.
> *What is planned for the next release?*
> * This roadmap lists issues which at least one Committer has prioritized.
> See details below in "Instructions for committers."
> * This roadmap only lists larger or more critical issues.
> *How can contributors influence this roadmap?*
> * If you believe an issue should be in this roadmap, please discuss the issue
> on JIRA and/or the dev mailing list. Make sure to ping Committers since at
> least one must agree to shepherd the issue.
> * For general discussions, use this JIRA or the dev mailing list. For
> specific issues, please comment on those issues or the mailing list.
> h2. Target Version and Priority
> This section describes the meaning of Target Version and Priority. _These
> meanings have been updated in this proposal for the 2.2 process._
> || Category | Target Version | Priority | Shepherd | Put on roadmap? | In next release? ||
> | 1 | next release | Blocker | *must* | *must* | *must* |
> | 2 | next release | Critical | *must* | yes, unless small | *best effort* |
> | 3 | next release | Major | *must* | optional | *best effort* |
> | 4 | next release | Minor | optional | no | maybe |
> | 5 | next release | Trivial | optional | no | maybe |
> | 6 | (empty) | (any) | yes | no | maybe |
> | 7 | (empty) | (any) | no | no | maybe |
> The *Category* in the table above has the following meaning:
> 1. A committer has promised to see this issue to completion for the next
> release. Contributions *will* receive attention.
> 2-3. A committer has promised to see this issue to completion for the next
> release. Contributions *will* receive attention. The issue may slip to the
> next release if development is slower than expected.
> 4-5. A committer has promised interest in this issue. Contributions *will*
> receive attention. The issue may slip to another release.
> 6. A committer has promised interest in this issue and should respond, but no
> promises are made about priorities or releases.
> 7. This issue is open for discussion, but it needs a committer to promise
> interest to proceed.
> h1. Instructions
> h2. For contributors
> Getting started
> * Please read http://spark.apache.org/contributing.html carefully. Code
> style, documentation, and unit tests are important.
> * If you are a first-time contributor, please always start with a small
> [starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather
> than a larger feature.
> Coordinating on JIRA
> * Never work silently. Let everyone know on the corresponding JIRA page when
> you start work. This is to avoid duplicate work. For small patches, you do
> not need to get the JIRA assigned to you to begin work.
> * For medium/large features or features with dependencies, please get
> assigned first before coding and keep the ETA updated on the JIRA. If there
> is no activity on the JIRA page for a certain amount of time, the JIRA should
> be released for other contributors.
> * Do not claim multiple (>3) JIRAs at the same time. Try to finish them one
> after another.
> * Do not set these fields: Target Version, Fix Version, or Shepherd. Only
> Committers should set those.
> Writing and reviewing PRs
> * Remember to add the `@Since("VERSION")` annotation to new public APIs.
> * *Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code
> review greatly helps to improve others' code as well as yours.*
> h2. For Committers
> Adding to this roadmap
> * You can update the roadmap by (a) adding issues to this list and (b)
> setting Target Versions. Only Committers may make these changes.
> * *If you add an issue to this roadmap or set a Target Version, you _must_
> assign yourself or another Committer as Shepherd.*
> * This list should be actively managed during the release.
> * If you target a significant item for the next release, please list the item
> on this roadmap.
> * If you commit to shepherding a new public API, you implicitly commit to
> shepherding the follow-up issues as well (Python/R APIs, docs).
> Creating JIRA issues
> * Try to break down big features into small and specific JIRA tasks and link
> them properly.
> * Add a "starter" label to starter tasks.
> * Put a rough time estimate for medium/big features and track the progress.
> * Set Priority carefully. Priority should not be mixed with size of effort
> for implementation.
> Managing JIRA issues and PRs
> * Please add yourself to the Shepherd field on JIRA if you start reviewing a
> PR.
> * If the code looks good to you, please comment "LGTM". For non-trivial PRs,
> please ping a Committer experienced with the relevant code to make a final
> pass.
> Follow-up issues: *After merging a PR, create and link the necessary
> follow-up JIRAs.*
> * For a new Scala/Java API
> ** Create issues for adding analogous Python and R APIs
> ** Create issues for adding example code and documentation
> * For a new Python/R API
> ** Create issues for adding example code and documentation
> h1. Roadmap for this release
> This roadmap only includes larger, more critical tasks targeted at the next
> release. To find all issues targeted for the next release, use the links
> listed below.
> Notes
> * We will prioritize API parity, bug fixes, and improvements over new
> features.
> * The RDD-based API (`spark.mllib`) is in maintenance mode now. We will
> accept bug fixes for it, but new features, APIs, and improvements will only
> be added to the DataFrame-based API (`spark.ml`).
> *WIP: This section is still being updated, pending confirmation of the
> Roadmap Process described above.*
> h2. Critical feature parity in DataFrame-based API
> * Umbrella JIRA: [SPARK-4591]
> h2. Persistence
> * Complete persistence within MLlib
> ** Python tuning (SPARK-13786)
> * MLlib in R format: compatibility with other languages (SPARK-15572)
> * Impose backwards compatibility for persistence (SPARK-15573)
> h2. SparkR
> * Release SparkR on CRAN [SPARK-15799]
> h2. Other prioritized issues: links for searching JIRA
> This section provides links to help people identify smaller patches targeted
> at the next release, as well as patches for major areas within MLlib.
> * [All MLlib, SparkR, GraphX JIRAs with Target Version 2.2 |
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20component%20in%20(ML%2C%20MLlib%2C%20SparkR%2C%20GraphX)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0%20AND%20(fixVersion%20is%20EMPTY%20OR%20fixVersion%20!%3D%202.2.0)%20ORDER%20BY%20priority]
> * [MLlib, SparkR, GraphX Umbrella JIRAs (regardless of Target Version) |
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20component%20in%20(ML%2C%20MLlib%2C%20SparkR%2C%20GraphX)%20AND%20Type%20%3D%20%22Umbrella%22%20AND%20Status%20in%20(%22Open%22%2C%20%22In%20Progress%22%2C%20%22Reopened%22)]
> h1. Long-term roadmap
> This section lists long-term or constant efforts. For example, Python/R API
> parity with Scala/Java will always be a priority, but we do not promise exact
> parity with each release.
> h2. Python and R feature parity
> Python feature parity: The main goal of the Python API is to have feature
> parity with the Scala/Java API. You can find a [complete list of Python MLlib
> issues targeted at the next release here|
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20AND%20component%20in%20(PySpark)%20AND%20"Target%20Version%2Fs"%20%3D%202.2.0%20ORDER%20BY%20priority%20DESC].
> R feature parity: We are building towards feature parity in SparkR as well.
> You can find a [complete list of SparkR MLlib issues targeted at the next
> release here|
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20AND%20component%20in%20(SparkR)%20AND%20"Target%20Version%2Fs"%20%3D%202.2.0%20ORDER%20BY%20priority%20DESC].
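
The search links in the description above are JQL queries percent-encoded into the `jql` parameter of the JIRA issue-search URL. For anyone who wants a variant of those searches (another Target Version or component set), here is a minimal sketch of building such a link with the Python standard library. The helper names `targeted_jql` and `jql_search_url` are hypothetical, made up for this illustration; the JQL field names are copied from the links above. The generated URL should be equivalent to the hand-written links but is not byte-identical, since `urllib` also escapes characters such as parentheses.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Base of the issue-search URL, as it appears in the links above.
JIRA_SEARCH = "https://issues.apache.org/jira/issues/"


def targeted_jql(components, target_version):
    """Build the 'targeted but not yet fixed' JQL used by the first link above.

    Hypothetical helper: the query text mirrors the decoded link, with the
    component list and version made into parameters.
    """
    comps = ", ".join(components)
    return (
        f"project = SPARK AND component in ({comps}) "
        f'AND "Target Version/s" = {target_version} '
        f"AND (fixVersion is EMPTY OR fixVersion != {target_version}) "
        "ORDER BY priority"
    )


def jql_search_url(jql):
    """Return an issue-search URL carrying the given JQL string."""
    # quote_via=quote encodes spaces as %20 (as in the links above) instead of '+'.
    return JIRA_SEARCH + "?" + urlencode({"jql": jql}, quote_via=quote)


if __name__ == "__main__":
    jql = targeted_jql(["ML", "MLlib", "SparkR", "GraphX"], "2.2.0")
    url = jql_search_url(jql)
    print(url)
    # Round trip: decoding the query string gives back the original JQL.
    assert parse_qs(urlsplit(url).query)["jql"] == [jql]
```

Swapping the component list for `["ML", "MLlib"]` plus a PySpark or SparkR clause would give queries along the lines of the Python and R parity links, assuming the same field names.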