GitHub user sethah opened a pull request:

    https://github.com/apache/spark/pull/15721

    [SPARK-17772][ML][TEST] Add test functions for ML sample weights

    ## What changes were proposed in this pull request?
    
    More and more ML algorithms accept sample weights, but they have been 
tested inconsistently and with duplicated code. This patch adds 
extensible helper methods to `MLTestingUtils` that can be reused by any 
algorithm that accepts sample weights. So far, a few tests appear to have 
been implemented in common across suites:
    
    * Check that oversampling is the same as giving the instances sample 
weights proportional to the number of samples
    * Check that outliers with tiny sample weights do not affect the 
algorithm's performance
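
To make the two checks above concrete, here is a minimal sketch in plain Python. It is not the Spark helper code: the weighted line fit, the toy data, the repeat counts, and the tolerances are all assumptions chosen for illustration.

```python
import math


def fit_weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def close(p, q, tol):
    """True if two coefficient tuples agree within an absolute tolerance."""
    return all(math.isclose(u, v, abs_tol=tol) for u, v in zip(p, q))


# Hypothetical toy data and repeat counts (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.2, 7.1]
counts = [1, 2, 3, 1]

# Check 1: repeating each point `count` times with weight 1.0 should give
# the same fit as using each point once with weight equal to `count`.
over_xs = [x for x, c in zip(xs, counts) for _ in range(c)]
over_ys = [y for y, c in zip(ys, counts) for _ in range(c)]
oversampled_fit = fit_weighted_line(over_xs, over_ys, [1.0] * len(over_xs))
weighted_fit = fit_weighted_line(xs, ys, [float(c) for c in counts])
oversampling_matches = close(oversampled_fit, weighted_fit, 1e-9)

# Check 2: an extreme outlier with a tiny weight should barely move the fit
# relative to the fit on the clean data.
clean_fit = fit_weighted_line(xs, ys, [1.0] * len(xs))
outlier_fit = fit_weighted_line(xs + [10.0], ys + [-1000.0],
                                [1.0] * len(xs) + [1e-9])
outlier_ignored = close(clean_fit, outlier_fit, 1e-4)
```

The second check uses a looser tolerance on purpose, because a small but nonzero weight still shifts the coefficients slightly.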
    
    This patch adds an additional test:
    
    * Check that algorithms are invariant to constant scaling of the sample 
weights, i.e. uniform sample weights with `w_i = 1.0` are effectively the same 
as uniform sample weights with `w_i = 10000` or `w_i = 0.0001`
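
The scaling check can be sketched the same way. The snippet below is only an illustration in plain Python, using a weighted mean as a stand-in estimator; the function name, data, and tolerance are assumptions and not part of `MLTestingUtils`.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean; multiplying all weights by a constant cancels out."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data (illustrative values only).
values = [2.0, 3.5, 7.25, 1.0]

# Baseline: uniform weights w_i = 1.0.
baseline = weighted_mean(values, [1.0] * len(values))

# Refit with uniform weights scaled up and down by a constant factor.
scaled = {c: weighted_mean(values, [c] * len(values))
          for c in (10000.0, 0.0001)}

# The estimates should agree up to floating-point rounding.
scale_invariant = all(math.isclose(m, baseline, rel_tol=1e-12)
                      for m in scaled.values())
```

A relative tolerance is used rather than exact equality because scaling by `0.0001` introduces rounding error even though the result is mathematically identical.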
    
    Instances of these tests existed in the LinearRegression, NaiveBayes, and 
LogisticRegression suites. Those tests have been removed or modified to use 
the new helper methods. The helper functions will also be useful when 
[SPARK-9478](https://issues.apache.org/jira/browse/SPARK-9478) is implemented.
    
    ## How was this patch tested?
    
    This patch only involves modifying test suites.
    
    ## Other notes
    
    Both IsotonicRegression and GeneralizedLinearRegression also extend 
`HasWeightCol`. I did not modify their test suites because leaving them out 
makes this patch easier to review, and because they did not duplicate the 
same tests as the three suites that were modified. If we want to change them 
later, we can create a JIRA for that now; this is open for debate.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sethah/spark SPARK-17772

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15721.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15721
    
----
commit e10be455ee943230a96e57370b718683647e6f03
Author: sethah <seth.hendrickso...@gmail.com>
Date:   2016-10-18T21:27:02Z

    add sample weight helper tests

----


