[jira] [Commented] (SPARK-4591) Algorithm/model parity for spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882286#comment-16882286 ]

Sean Owen commented on SPARK-4591:
----------------------------------

What else would go under this umbrella?

> Algorithm/model parity for spark.ml (Scala)
> -------------------------------------------
>
>                 Key: SPARK-4591
>                 URL: https://issues.apache.org/jira/browse/SPARK-4591
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> This is an umbrella JIRA for porting spark.mllib implementations to use the
> DataFrame-based API defined under spark.ml. We want to achieve critical
> feature parity for the next release.
>
> h3. Instructions for 3 subtask types
>
> *Review tasks*: detailed review of a subpackage to identify feature gaps
> between spark.mllib and spark.ml.
> * Should be listed as a subtask of this umbrella.
> * Review subtasks cover major algorithm groups. To pick up a review subtask,
> please:
> ** Comment that you are working on it.
> ** Compare the public APIs of spark.ml vs. spark.mllib.
> ** Comment on all missing items within spark.ml: algorithms, models, methods,
> features, etc.
> ** Check for existing JIRAs covering those items. If there is no existing
> JIRA, create one, and link it to your comment.
>
> *Critical tasks*: higher priority missing features which are required for
> this umbrella JIRA.
> * Should be linked as "requires" links.
>
> *Other tasks*: lower priority missing features which can be completed after
> the critical tasks.
> * Should be linked as "contains" links.
>
> h4. Excluded items
>
> This does *not* include:
> * Python: We can compare Scala vs. Python in spark.ml itself.
> * Moving linalg to spark.ml: [SPARK-13944]
> * Streaming ML: Requires stabilizing some internal APIs of structured
> streaming first
>
> h3. TODO list
>
> *Critical issues*
> * [SPARK-14501]: Frequent Pattern Mining
> * [SPARK-14709]: linear SVM
> * [SPARK-15784]: Power Iteration Clustering (PIC)
>
> *Lower priority issues*
> * Missing methods within algorithms (see Issue Links below)
> * evaluation submodule
> * stat submodule (should probably be covered in DataFrames)
> * Developer-facing submodules:
> ** optimization (including [SPARK-17136])
> ** random, rdd
> ** util
>
> *To be prioritized*
> * single-instance prediction: [SPARK-10413]
> * pmml [SPARK-11171]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4591) Algorithm/model parity for spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520635#comment-16520635 ]

Joseph K. Bradley commented on SPARK-4591:
------------------------------------------

There are still a few contained tasks which are incomplete. I'd like to leave this open for now.
[jira] [Commented] (SPARK-4591) Algorithm/model parity for spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512153#comment-16512153 ]

Lee Dongjin commented on SPARK-4591:
------------------------------------

[~josephkb] Excuse me. Since SPARK-14376 was resolved recently, I think we should resolve this issue as well.
[jira] [Commented] (SPARK-4591) Algorithm/model parity for spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923294#comment-15923294 ]

Joseph K. Bradley commented on SPARK-4591:
------------------------------------------

For the record:
* Kernel Density: later, I'd say
* Multivariate: Now under [SPARK-19634]
[jira] [Commented] (SPARK-4591) Algorithm/model parity for spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866288#comment-15866288 ]

Timothy Hunter commented on SPARK-4591:
---------------------------------------

[~josephkb] do you also want some subtasks for KernelDensity and multivariate summaries? They are in the stat module but not covered.
[jira] [Commented] (SPARK-4591) Algorithm/model parity for spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746473#comment-15746473 ]

Joseph K. Bradley commented on SPARK-4591:
------------------------------------------

I also removed the target version since this includes non-2.2 subtasks.
[jira] [Commented] (SPARK-4591) Algorithm/model parity for spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746471#comment-15746471 ]

Joseph K. Bradley commented on SPARK-4591:
------------------------------------------

I just updated this a bit. I did not finish linking all issues mentioned in Review subtasks yet.
[jira] [Commented] (SPARK-4591) Algorithm/model parity for spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746350#comment-15746350 ]

Joseph K. Bradley commented on SPARK-4591:
------------------------------------------

Good point. It should be. I'll add it.

> Algorithm/model parity for spark.ml (Scala)
> -------------------------------------------
>
>                 Key: SPARK-4591
>                 URL: https://issues.apache.org/jira/browse/SPARK-4591
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> This is an umbrella JIRA for porting spark.mllib implementations to use the
> DataFrame-based API defined under spark.ml. We want to achieve feature
> parity for the next release.
>
> Subtasks cover major algorithm groups. To pick up a review subtask, please:
> * Comment that you are working on it.
> * Compare the public APIs of spark.ml vs. spark.mllib.
> * Comment on all missing items within spark.ml: algorithms, models, methods,
> features, etc.
> * Check for existing JIRAs covering those items. If there is no existing
> JIRA, create one, and link it to your comment.
>
> This does *not* include:
> * Python: We can compare Scala vs. Python in spark.ml itself.
> * single-Row prediction: [SPARK-10413]
>
> Also, this does not include the following items (but will eventually):
> * User-facing:
> ** Streaming ML
> ** evaluation
> ** pmml
> ** stat
> ** linalg [SPARK-13944]
> * Developer-facing:
> ** optimization
> ** random, rdd
> ** util
[jira] [Commented] (SPARK-4591) Algorithm/model parity for spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744481#comment-15744481 ]

Felix Cheung commented on SPARK-4591:
-------------------------------------

Is SVM part of this?
[jira] [Commented] (SPARK-4591) Algorithm/model parity in spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233104#comment-15233104 ]

Joseph K. Bradley commented on SPARK-4591:
------------------------------------------

Note: I am leaving this task targeted at 2.0 to bring attention to it. However, we will not achieve full parity for 2.0.

> Algorithm/model parity in spark.ml (Scala)
> ------------------------------------------
>
>                 Key: SPARK-4591
>                 URL: https://issues.apache.org/jira/browse/SPARK-4591
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> This is an umbrella JIRA for porting spark.mllib implementations to use the
> DataFrame-based API defined under spark.ml. We want to achieve feature
> parity for the next release.
>
> Subtasks cover major algorithm groups. To pick up a review subtask, please:
> * Comment that you are working on it.
> * Compare the public APIs of spark.ml vs. spark.mllib.
> * Comment on all missing items within spark.ml: algorithms, models, methods,
> features, etc.
> * Check for existing JIRAs covering those items. If there is no existing
> JIRA, create one, and link it to your comment.
>
> This does *not* include:
> * Python: We can compare Scala vs. Python in spark.ml itself.
> * single-Row prediction: [SPARK-10413]
>
> Also, this does not include the following items (but will eventually):
> * User-facing:
> ** Streaming ML
> ** evaluation
> ** fpm
> ** pmml
> ** stat
> ** linalg [SPARK-13944]
> * Developer-facing:
> ** optimization
> ** random, rdd
> ** util
[jira] [Commented] (SPARK-4591) Algorithm/model parity in spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227445#comment-15227445 ]

Joseph K. Bradley commented on SPARK-4591:
------------------------------------------

We will; eventually, we should support everything. I just noted the highest priority items first.

> Algorithm/model parity in spark.ml (Scala)
> ------------------------------------------
>
>                 Key: SPARK-4591
>                 URL: https://issues.apache.org/jira/browse/SPARK-4591
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> This is an umbrella JIRA for porting spark.mllib implementations to use the
> DataFrame-based API defined under spark.ml. We want to achieve feature
> parity for the next release.
>
> Subtasks cover major algorithm groups. To pick up a review subtask, please:
> * Comment that you are working on it.
> * Compare the public APIs of spark.ml vs. spark.mllib.
> * Comment on all missing items within spark.ml: algorithms, models, methods,
> features, etc.
> * Check for existing JIRAs covering those items. If there is no existing
> JIRA, create one, and link it to your comment.
>
> This does *not* include:
> * Python: We can compare Scala vs. Python in spark.ml itself.
> * single-Row prediction: [SPARK-10413]
>
> Also, this does not include the following items:
> * User-facing:
> ** Streaming ML (to be done under structured streaming in the 2.x line)
> ** evaluation
> ** fpm
> ** pmml
> ** stat
> ** linalg [SPARK-13944]
> * Developer-facing:
> ** optimization
> ** random, rdd
> ** util
[jira] [Commented] (SPARK-4591) Algorithm/model parity in spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226849#comment-15226849 ]

Nick Pentreath commented on SPARK-4591:
---------------------------------------

Are we explicitly not porting FPM models to ML?
[jira] [Commented] (SPARK-4591) Algorithm/model parity in spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225133#comment-15225133 ]

Joseph K. Bradley commented on SPARK-4591:
------------------------------------------

Would others like to help review for parity? Thanks!
[jira] [Commented] (SPARK-4591) Algorithm/model parity in spark.ml (Scala)
[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225018#comment-15225018 ]

Joseph K. Bradley commented on SPARK-4591:
------------------------------------------

I created subtasks for reviewing the major algorithm classes. We can later decide how to handle the following items.

User-facing:
* Streaming ML (to be done under structured streaming in the 2.x line)
* evaluation
* fpm
* pmml
* stat

Developer-facing:
* optimization
* random, rdd
* util

Note that linalg is being handled separately: [SPARK-13944]

> Algorithm/model parity in spark.ml (Scala)
> ------------------------------------------
>
>                 Key: SPARK-4591
>                 URL: https://issues.apache.org/jira/browse/SPARK-4591
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> This is an umbrella JIRA for porting spark.mllib implementations to use the
> DataFrame-based API defined under spark.ml. We want to achieve feature
> parity for the next release.
>
> Create or link subtasks for:
> * missing algorithms or models (However, this does NOT include stats or
> linear algebra; those will be handled separately.)
> * existing algorithms or models which are missing features, params, etc.
>
> This only covers Scala since we can compare Scala vs. Python in spark.ml
> itself.
>
> _Note: Please search JIRA for existing issues to avoid duplicates._