[jira] [Commented] (SPARK-18088) ChiSqSelector FPR PR cleanups

Sean Owen (JIRA) Tue, 25 Oct 2016 05:53:14 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-18088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605239#comment-15605239
 ]


Sean Owen commented on SPARK-18088:
-----------------------------------

[~josephkb] could we pause and discuss this? I'm not sure I agree with some of 
your assertions here. It might be useful to review the discussion on the 
previous changes. For example, it's actually comparing on the raw statistic 
that's incorrect in the context of Spark's implementation.

> ChiSqSelector FPR PR cleanups
> -----------------------------
>
>                 Key: SPARK-18088
>                 URL: https://issues.apache.org/jira/browse/SPARK-18088
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>
> There are several cleanups I'd like to make as a follow-up to the PRs from 
> [SPARK-17017]:
> * Rename selectorType values to match corresponding Params
> * Add Since tags where missing
> * a few minor cleanups
> One major item: FPR is not implemented correctly.  Testing against only the 
> p-value and not the test statistic does not really tell you anything.  We 
> should follow sklearn, which allows a p-value threshold for any selection 
> method: 
> [http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFpr.html]
> * In this PR, I'm just going to remove FPR completely.  We can add it back in 
> a follow-up PR.
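
For reference, here is a minimal sketch of what FPR-style selection does, in the spirit of sklearn's SelectFpr: keep every feature whose p-value is below a threshold alpha. It is not Spark's code. The (statistic, degrees of freedom) pairs are made-up illustration values, the helper names are hypothetical, and the closed-form survival function only covers even degrees of freedom. The sketch assumes features may have different degrees of freedom, in which case ranking by raw statistic and ranking by p-value can disagree.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only covers positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def select_fpr(p_values, alpha):
    """Return indices of features whose p-value is below alpha."""
    return [i for i, p in enumerate(p_values) if p < alpha]


# Hypothetical (chi-square statistic, degrees of freedom) per feature.
features = [(7.0, 2), (9.0, 6), (12.0, 2)]
p_values = [chi2_sf_even_df(stat, df) for stat, df in features]
selected = select_fpr(p_values, alpha=0.05)

# Feature 1 has a larger raw statistic than feature 0, yet a larger
# p-value, because it has more degrees of freedom.
rank_by_stat = sorted(range(len(features)), key=lambda i: -features[i][0])
rank_by_p = sorted(range(len(features)), key=lambda i: p_values[i])
```

With these values, feature 1 is dropped at alpha = 0.05 even though its raw statistic exceeds that of feature 0, so the two rankings differ.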



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
