[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-09-08 Thread Peng Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475860#comment-15475860
 ] 

Peng Meng commented on SPARK-6160:
--

hi [~GayathriMurali], are you still working on this, if not, I can work on it. 
thanks.

> ChiSqSelector should keep test statistic info
> -
>
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but 
> these data are thrown out when constructing the ChiSqSelectorModel.  The data 
> are expensive to recompute, so the ChiSqSelectorModel should store and expose 
> them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-09-08 Thread Peng Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475769#comment-15475769
 ] 

Peng Meng commented on SPARK-6160:
--

Hi [~josephkb], I have some discussion with [~srowen] about keeping test 
statistic info of ChiSqSelector in PR: 
https://github.com/apache/spark/pull/14597. 
Can you review that PR, and commit: 
https://github.com/apache/spark/pull/14597/commits/3d6aecb8441503c9c3d62a2d8a3d48824b9d6637


> ChiSqSelector should keep test statistic info
> -
>
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but 
> these data are thrown out when constructing the ChiSqSelectorModel.  The data 
> are expensive to recompute, so the ChiSqSelectorModel should store and expose 
> them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-09-08 Thread Peng Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475709#comment-15475709
 ] 

Peng Meng commented on SPARK-6160:
--

hi Joseph K. Bradley

> ChiSqSelector should keep test statistic info
> -
>
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but 
> these data are thrown out when constructing the ChiSqSelectorModel.  The data 
> are expensive to recompute, so the ChiSqSelectorModel should store and expose 
> them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-04-06 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228990#comment-15228990
 ] 

Joseph K. Bradley commented on SPARK-6160:
--

Update: This should be done within spark.ml, rather than spark.mllib.  If this 
requires changes within spark.mllib, then we should move the implementation to 
spark.ml (in a separate JIRA) before working on this task.

[~GayathriMurali]  I don't think anyone is working on this.  Let's figure out a 
good API for those results within spark.ml, and then it should be clear how to 
persist them (as a set of DataFrame columns, presumably).  It may not make 
sense to use ChiSqTestResult within spark.ml, so we should decide on a good API 
there.

> ChiSqSelector should keep test statistic info
> -
>
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but 
> these data are thrown out when constructing the ChiSqSelectorModel.  The data 
> are expensive to recompute, so the ChiSqSelectorModel should store and expose 
> them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-02-29 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173163#comment-15173163
 ] 

Gayathri Murali commented on SPARK-6160:


[~josephkb] Should the test statistics result be stored as a text/parquet file? 
or Can it just be stored in a local array? 

> ChiSqSelector should keep test statistic info
> -
>
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but 
> these data are thrown out when constructing the ChiSqSelectorModel.  The data 
> are expensive to recompute, so the ChiSqSelectorModel should store and expose 
> them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-02-24 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166419#comment-15166419
 ] 

Gayathri Murali commented on SPARK-6160:


Is anyone working on this? If not, I can.

> ChiSqSelector should keep test statistic info
> -
>
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but 
> these data are thrown out when constructing the ChiSqSelectorModel.  The data 
> are expensive to recompute, so the ChiSqSelectorModel should store and expose 
> them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2015-03-20 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372256#comment-14372256
 ] 

Joseph K. Bradley commented on SPARK-6160:
--

I think that sounds reasonable.

> ChiSqSelector should keep test statistic info
> -
>
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but 
> these data are thrown out when constructing the ChiSqSelectorModel.  The data 
> are expensive to recompute, so the ChiSqSelectorModel should store and expose 
> them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2015-03-20 Thread Vikas Veshishth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371172#comment-14371172
 ] 

Vikas Veshishth commented on SPARK-6160:


Do you want the Array[ChiSqTestResult] set within a new ChiSqSelectorModel ?

> ChiSqSelector should keep test statistic info
> -
>
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but 
> these data are thrown out when constructing the ChiSqSelectorModel.  The data 
> are expensive to recompute, so the ChiSqSelectorModel should store and expose 
> them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org