[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475860#comment-15475860 ]
Peng Meng commented on SPARK-6160:
Hi [~GayathriMurali], are you still working on this? If not, I can work on it. Thanks.
> ChiSqSelector should keep test statistic info
> -
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but
> these data are thrown out when constructing the ChiSqSelectorModel. The data
> are expensive to recompute, so the ChiSqSelectorModel should store and expose
> them.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
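For context on why the issue calls these statistics expensive: ChiSqSelector ranks each categorical feature by Pearson's chi-squared test of independence against the label, which requires a contingency table built over the whole dataset. The following is a minimal plain-Scala sketch of that statistic for a single feature, assuming small in-memory (featureValue, label) pairs. `ChiSqStat` and `pearsonChiSq` are hypothetical names, not Spark API, and p-values are left out because they need a chi-squared CDF.

```scala
// Hypothetical per-feature result; not Spark's ChiSqTestResult.
final case class ChiSqStat(statistic: Double, degreesOfFreedom: Int)

object ChiSqSketch {
  /** Pearson chi-squared statistic for independence of one categorical feature and the label. */
  def pearsonChiSq(pairs: Seq[(Double, Double)]): ChiSqStat = {
    require(pairs.nonEmpty, "need at least one (feature, label) pair")
    val featureValues = pairs.map(_._1).distinct.sorted
    val labels = pairs.map(_._2).distinct.sorted
    val n = pairs.size.toDouble

    // Observed contingency table: count of each (featureValue, label) combination.
    val observed: Map[(Double, Double), Int] =
      pairs.groupBy(identity).map { case (k, v) => k -> v.size }
    // Marginal totals per feature value (rows) and per label (columns).
    val rowTotals = pairs.groupBy(_._1).map { case (k, v) => k -> v.size.toDouble }
    val colTotals = pairs.groupBy(_._2).map { case (k, v) => k -> v.size.toDouble }

    // Sum over every cell of (observed - expected)^2 / expected,
    // where expected = rowTotal * colTotal / n under independence.
    val statistic = (for {
      f <- featureValues
      l <- labels
    } yield {
      val expected = rowTotals(f) * colTotals(l) / n
      val obs = observed.getOrElse((f, l), 0).toDouble
      (obs - expected) * (obs - expected) / expected
    }).sum

    ChiSqStat(statistic, (featureValues.size - 1) * (labels.size - 1))
  }
}
```

Running this once per feature over the full training data is the work that is lost when only the selected indices are kept.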
[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475769#comment-15475769 ]
Peng Meng commented on SPARK-6160:
Hi [~josephkb], I have had some discussion with [~srowen] about keeping the test statistic info of ChiSqSelector in PR https://github.com/apache/spark/pull/14597. Can you review that PR and this commit: https://github.com/apache/spark/pull/14597/commits/3d6aecb8441503c9c3d62a2d8a3d48824b9d6637
[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475709#comment-15475709 ]
Peng Meng commented on SPARK-6160:
Hi Joseph K. Bradley
[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228990#comment-15228990 ]
Joseph K. Bradley commented on SPARK-6160:
Update: This should be done within spark.ml, rather than spark.mllib. If this requires changes within spark.mllib, then we should move the implementation to spark.ml (in a separate JIRA) before working on this task. [~GayathriMurali] I don't think anyone is working on this. Let's figure out a good API for those results within spark.ml, and then it should be clear how to persist them (as a set of DataFrame columns, presumably). It may not make sense to use ChiSqTestResult within spark.ml, so we should decide on a good API there.
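Following the suggestion to expose the results in spark.ml as DataFrame columns rather than as ChiSqTestResult objects, one possible layout is a single row per input feature. The sketch below builds such rows in plain Scala, without Spark; `ChiSqResultRow`, `ChiSqResultTable.toRows` and the column names are hypothetical. With Spark SQL available, a `Seq` of case-class rows like these can be turned into a DataFrame with `createDataFrame` and written out with the standard DataFrame writers (for example as Parquet).

```scala
// Hypothetical row layout: one row per input feature, one field per column.
final case class ChiSqResultRow(
    featureIndex: Int,
    statistic: Double,
    degreesOfFreedom: Int,
    selected: Boolean)

object ChiSqResultTable {
  /**
   * Flatten per-feature (statistic, degreesOfFreedom) pairs, given in feature order,
   * into rows, flagging the features the selector kept.
   */
  def toRows(stats: Seq[(Double, Int)], selectedFeatures: Set[Int]): Seq[ChiSqResultRow] =
    stats.zipWithIndex.map { case ((stat, df), i) =>
      ChiSqResultRow(i, stat, df, selectedFeatures.contains(i))
    }
}
```

A flat table like this keeps the results queryable (filter on `selected`, sort on `statistic`) without committing spark.ml to the spark.mllib result class.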
[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173163#comment-15173163 ]
Gayathri Murali commented on SPARK-6160:
[~josephkb] Should the test statistics results be stored as a text/Parquet file, or can they just be stored in a local array?
> ChiSqSelector should keep test statistic info
> -
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but
> these data are thrown out when constructing the ChiSqSelectorModel. The data
> are expensive to recompute, so the ChiSqSelectorModel should store and expose
> them.
[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166419#comment-15166419 ]
Gayathri Murali commented on SPARK-6160:
Is anyone working on this? If not, I can.
[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372256#comment-14372256 ]
Joseph K. Bradley commented on SPARK-6160:
I think that sounds reasonable.
[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info
[ https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371172#comment-14371172 ]
Vikas Veshishth commented on SPARK-6160:
Do you want the Array[ChiSqTestResult] set within a new ChiSqSelectorModel?
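The proposal of keeping the result array on the model could look roughly like the sketch below: the model holds the full per-feature result array next to the selected indices instead of dropping it after ranking. This is plain Scala with hypothetical names (`FeatureTestResult`, `SelectorModelSketch`, `fitTopK`) standing in for Spark's `ChiSqTestResult` and `ChiSqSelectorModel`. The ranking mirrors what spark.mllib's `ChiSqSelector` does (order features by descending statistic and keep the top `numTopFeatures` indices), but it is a sketch, not Spark's implementation.

```scala
// Hypothetical stand-in for a per-feature test result.
final case class FeatureTestResult(statistic: Double, pValue: Double, degreesOfFreedom: Int)

// Hypothetical model: selected feature indices plus the results that justified the selection.
final class SelectorModelSketch(
    val selectedFeatures: Array[Int],
    val testResults: Array[FeatureTestResult]) {

  /** Results for the selected features only, in selectedFeatures order. */
  def selectedResults: Array[FeatureTestResult] = selectedFeatures.map(i => testResults(i))

  /** Keep only the selected entries of a dense feature vector. */
  def transform(features: Array[Double]): Array[Double] = selectedFeatures.map(i => features(i))
}

object SelectorModelSketch {
  /** Rank features by descending statistic, keep the top k indices, and retain every result. */
  def fitTopK(results: Array[FeatureTestResult], numTopFeatures: Int): SelectorModelSketch = {
    val indices = results.zipWithIndex
      .sortWith((a, b) => a._1.statistic > b._1.statistic) // highest statistic first
      .take(numTopFeatures)
      .map(_._2)
      .sorted // return indices in ascending feature order
    new SelectorModelSketch(indices, results)
  }
}
```

Keeping `testResults` for every feature, not only the selected ones, lets a caller explain why a feature was dropped as well as why one was kept, at the cost of storing one record per input feature.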