[ https://issues.apache.org/jira/browse/METRON-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768374#comment-15768374 ]
ASF GitHub Bot commented on METRON-637:
---------------------------------------

Github user mattf-horton commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/401#discussion_r93530871

    --- Diff: metron-analytics/metron-statistics/src/test/java/org/apache/metron/statistics/StellarStatisticsFunctionsTest.java ---
    @@ -356,6 +357,44 @@ public void testSkewness() throws Exception {
         assertEquals(stats.getSkewness(), (Double) actual, 0.1);
       }
     
    +  @Test
    +  public void testStatsBin() throws Exception {
    +    statsInit(windowSize);
    +    statsBinRunner(StellarStatisticsFunctions.Bin.BinSplits.QUARTILE.split);
    +    statsBinRunner(StellarStatisticsFunctions.Bin.BinSplits.QUARTILE.split, "'QUARTILE'");
    +    statsBinRunner(StellarStatisticsFunctions.Bin.BinSplits.QUINTILE.split, "'QUINTILE'");
    +    statsBinRunner(StellarStatisticsFunctions.Bin.BinSplits.DECILE.split, "'DECILE'");
    +    statsBinRunner(ImmutableList.of(25.0, 50.0, 75.0), "[25.0, 50.0, 75.0]");
    +  }
    +
    +  public void statsBinRunner(List<Double> splits) throws Exception {
    +    statsBinRunner(splits, null);
    +  }
    +
    +  public void statsBinRunner(List<Double> splits, String splitsName) throws Exception {
    +    int bin = 0;
    +    for(Double d : stats.getSortedValues()) {
    +      StatisticsProvider provider = (StatisticsProvider)variables.get("stats");
    +      if(bin < splits.size()) {
    +        double percentileOfBin = provider.getPercentile(splits.get(bin));
    +        if (d > percentileOfBin) {
    --- End diff --

    Sorry if I'm wrong here, but I couldn't find the definition of stats.getSortedValues(). Isn't this line 380 comparing a raw value `d` to a percentile value `percentileOfBin`?


> Add a STATS_BIN function to Stellar.
> ------------------------------------
>
>                 Key: METRON-637
>                 URL: https://issues.apache.org/jira/browse/METRON-637
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When passing parameters to models, it's often useful to pass the binned
> representation of a variable based on an empirical statistical distribution,
> rather than the actual variable. This function should accept a set of
> percentile bins, a statistical sketch, and a value. It should return the
> index of the bin into which the value's percentile falls.
> For instance, consider the value 17, which is at the 27th percentile. If we
> use 25, 75, 95 to define our bins, this function would return 1, because its
> percentile, 27, is between 25 and 75.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
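The binning rule described in the issue can be sketched in plain Java. This is a hypothetical, standalone illustration, not Metron's STATS_BIN implementation: the names `StatsBinSketch`, `empiricalPercentile`, and `binIndex` are invented here, a raw sample array stands in for the statistical sketch, and the tie rule (a percentile exactly equal to a split goes to the higher bin) is an assumption, since the issue does not specify it.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the STATS_BIN rule from METRON-637.
 * Not the Metron implementation: a raw sample stands in for a statistical sketch.
 */
public class StatsBinSketch {

  /**
   * Percentile (0-100) of value within sample, defined here as the share of
   * sample values strictly less than value. This is one of several common
   * definitions and is an assumption of this sketch.
   */
  static double empiricalPercentile(double[] sample, double value) {
    int below = 0;
    for (double x : sample) {
      if (x < value) {
        below++;
      }
    }
    return 100.0 * below / sample.length;
  }

  /**
   * Index of the bin a percentile falls into, given ascending split points.
   * Bin 0 lies below the first split and bin splits.size() lies above the last.
   * A percentile equal to a split is placed in the higher bin (assumption).
   */
  static int binIndex(double percentile, List<Double> splits) {
    int bin = 0;
    while (bin < splits.size() && percentile >= splits.get(bin)) {
      bin++;
    }
    return bin;
  }

  public static void main(String[] args) {
    List<Double> splits = Arrays.asList(25.0, 75.0, 95.0);

    // The issue's example: percentile 27 lies between 25 and 75, so bin 1.
    System.out.println(binIndex(27.0, splits)); // prints 1

    // Sample 1..100: exactly 27 values are below 28, so 28 sits at percentile 27.
    double[] sample = new double[100];
    for (int i = 0; i < sample.length; i++) {
      sample[i] = i + 1;
    }
    double p = empiricalPercentile(sample, 28.0);
    System.out.println(p); // prints 27.0
    System.out.println(binIndex(p, splits)); // prints 1
  }
}
```

Because `binIndex` compares a percentile against percentile split points, both sides are on the same 0-100 scale. A loop that walks raw values instead would first need each split converted into the value at that percentile before comparing.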