[ https://issues.apache.org/jira/browse/METRON-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768438#comment-15768438 ]

ASF GitHub Bot commented on METRON-637:
---------------------------------------

Github user mattf-horton commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/401#discussion_r93537846
  
    --- Diff: metron-analytics/metron-statistics/src/test/java/org/apache/metron/statistics/StellarStatisticsFunctionsTest.java ---
    @@ -356,6 +357,44 @@ public void testSkewness() throws Exception {
         assertEquals(stats.getSkewness(), (Double) actual, 0.1);
       }
     
    +  @Test
    +  public void testStatsBin() throws Exception {
    +    statsInit(windowSize);
    +    statsBinRunner(StellarStatisticsFunctions.Bin.BinSplits.QUARTILE.split);
    +    statsBinRunner(StellarStatisticsFunctions.Bin.BinSplits.QUARTILE.split, "'QUARTILE'");
    +    statsBinRunner(StellarStatisticsFunctions.Bin.BinSplits.QUINTILE.split, "'QUINTILE'");
    +    statsBinRunner(StellarStatisticsFunctions.Bin.BinSplits.DECILE.split, "'DECILE'");
    +    statsBinRunner(ImmutableList.of(25.0, 50.0, 75.0), "[25.0, 50.0, 75.0]");
    +  }
    +
    +  public void statsBinRunner(List<Double> splits) throws Exception {
    +    statsBinRunner(splits, null);
    +  }
    +
    +  public void statsBinRunner(List<Double> splits, String splitsName) throws Exception {
    +    int bin = 0;
    +    for(Double d : stats.getSortedValues()) {
    +      StatisticsProvider provider = (StatisticsProvider)variables.get("stats");
    +      if(bin < splits.size()) {
    +        double percentileOfBin = provider.getPercentile(splits.get(bin));
    +        if (d > percentileOfBin) {
    +          //we aren't the right bin, so let's find the right one.
    +          // Keep in mind that this value could be more than one bin away from the last good bin.
    +          for(;bin < splits.size() && d > provider.getPercentile(splits.get(bin));bin++) {
    +
    --- End diff --
    
    Yup, thanks.  Then the block (lines 378-387) could be better stated as
    ```
    while (bin < splits.size() && d > provider.getPercentile(splits.get(bin))) {
      // increment the bin number until it includes the target value, or we run out of bins
      bin++;
    }
    ```
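
A self-contained Java sketch of the suggested `while` loop, lifted out of the test harness for illustration. The class and method names (`BinWalkSketch`, `assignBins`) are hypothetical, and the cutoffs are assumed to be precomputed value thresholds (in the test they come from `provider.getPercentile(...)`). Because both the values and the cutoffs are ascending, the bin counter only ever moves forward, and a single value may advance it past several bins at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BinWalkSketch {

  /**
   * Hypothetical helper: assigns a bin index to each value of an
   * ascending-sorted list, given ascending cutoffs. A value equal to a
   * cutoff stays in the lower bin; values above the last cutoff get
   * index cutoffs.size().
   */
  public static List<Integer> assignBins(List<Double> sortedValues, List<Double> cutoffs) {
    List<Integer> result = new ArrayList<>();
    int bin = 0;
    for (double d : sortedValues) {
      // Increment the bin number until it includes d, or we run out of bins.
      // d may be more than one bin away from the previous value's bin.
      while (bin < cutoffs.size() && d > cutoffs.get(bin)) {
        bin++;
      }
      result.add(bin);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Double> cutoffs = Arrays.asList(10.0, 20.0, 30.0);
    List<Double> values = Arrays.asList(5.0, 10.0, 11.0, 35.0);
    System.out.println(assignBins(values, cutoffs)); // prints [0, 0, 1, 3]
  }
}
```

With cutoffs 10, 20 and 30, the value 35 moves the counter from bin 1 straight to bin 3, which is the "more than one bin away" case the comment in the diff warns about.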


> Add a STATS_BIN function to Stellar.
> ------------------------------------
>
>                 Key: METRON-637
>                 URL: https://issues.apache.org/jira/browse/METRON-637
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When passing parameters to models, it's often useful to pass the binned
> representation of a variable based on an empirical statistical distribution,
> rather than the actual variable.  This function should accept a set of
> percentile bins, a statistical sketch, and a value.  It should return the
> index of the bin into which the value's percentile falls.
> For instance, consider the value 17, whose percentile is 27.  If we use 25, 75,
> 95 to define our bins, this function would return 1, because its percentile,
> 27, is between 25 and 75.
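
To make the requested semantics concrete, here is a minimal Java sketch. It is not Metron's STATS_BIN implementation: the statistical sketch is replaced by a plain in-memory sample, and `percentileRank` (defined here as the share of samples strictly below the value, one of several possible conventions), `binIndex` and `statsBin` are hypothetical helpers. A percentile equal to a boundary is placed in the lower bin, matching the `d > ...` comparison in the test above, and anything above the last boundary gets index `bins.size()`.

```java
import java.util.Arrays;
import java.util.List;

final class StatsBinSketch {

  /**
   * Hypothetical helper: empirical percentile rank of value, computed as
   * 100 * (number of samples strictly less than value) / n.
   * Assumes a non-empty sample.
   */
  public static double percentileRank(double[] sample, double value) {
    long below = Arrays.stream(sample).filter(x -> x < value).count();
    return 100.0 * below / sample.length;
  }

  /**
   * Returns the index of the bin containing the given percentile.
   * bins must be ascending; a percentile equal to a boundary falls in the
   * lower bin, and one above the last boundary gets bins.size().
   */
  public static int binIndex(List<Double> bins, double percentile) {
    int bin = 0;
    while (bin < bins.size() && percentile > bins.get(bin)) {
      bin++;
    }
    return bin;
  }

  /** Hypothetical STATS_BIN-like entry point: value -> percentile -> bin index. */
  public static int statsBin(double[] sample, double value, List<Double> bins) {
    return binIndex(bins, percentileRank(sample, value));
  }

  public static void main(String[] args) {
    List<Double> bins = Arrays.asList(25.0, 75.0, 95.0);
    // The issue's example: a value whose percentile is 27 lies between 25 and 75.
    System.out.println(binIndex(bins, 27.0)); // prints 1

    // Empirical sample 1.0, 2.0, ..., 100.0: exactly 27 samples lie below 28.0.
    double[] sample = new double[100];
    for (int i = 0; i < sample.length; i++) {
      sample[i] = i + 1;
    }
    System.out.println(statsBin(sample, 28.0, bins)); // prints 1
    System.out.println(statsBin(sample, 99.0, bins)); // prints 3
  }
}
```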



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
