Sean Owen created SPARK-17768:
---------------------------------

             Summary: Small {Sum,Count,Mean}Evaluator problems and suboptimalities
                 Key: SPARK-17768
                 URL: https://issues.apache.org/jira/browse/SPARK-17768
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.1
            Reporter: Sean Owen
            Assignee: Sean Owen


This tracks a few related issues with 
org.apache.spark.partial.(Count,Mean,Sum)Evaluator and their "Grouped" 
counterparts:

- GroupedMeanEvaluator and GroupedSumEvaluator are unused, as is the 
StudentTCacher support class
- CountEvaluator can return a lower bound < 0, even though counts can't be 
negative
- MeanEvaluator fails when given exactly 1 datum (this yields a t-test with 0 
degrees of freedom)
- CountEvaluator uses a normal distribution, which may be an inappropriate 
approximation (and is what leads to the negative lower bound above)
- CountEvaluator, MeanEvaluator have no unit tests to catch these
- Duplication across CountEvaluator, GroupedCountEvaluator
- SumEvaluator might have an issue related to the CountEvaluator one (or could 
it delegate, computing its result as CountEvaluator times MeanEvaluator?)
- The stats in each could use a bit of documentation as I had to guess at them
- (Code could use a few cleanups and optimizations too)
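
To illustrate the negative lower bound point, here is a minimal sketch in 
Python rather than Spark's Scala. It is not the CountEvaluator code: the 
function names and the mean/variance estimates are assumptions chosen only to 
show how a symmetric normal interval around a small extrapolated count can dip 
below zero, and clamping at 0 is just one possible mitigation.

```python
from math import sqrt
from statistics import NormalDist


def normal_count_interval(observed, fraction_seen, confidence=0.95):
    """Extrapolate a total count from a partial count (hypothetical helper).

    The mean and variance below are illustrative assumptions, not the
    formulas used by Spark's CountEvaluator.
    """
    if not 0.0 < fraction_seen <= 1.0:
        raise ValueError("fraction_seen must be in (0, 1]")
    p = fraction_seen
    mean = observed / p                            # scale up the partial count
    variance = (observed + 1) * (1 - p) / (p * p)  # assumed variance estimate
    # Two-sided z value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(variance)
    # A symmetric interval knows nothing about counts being non-negative.
    return mean - half_width, mean + half_width


def clamp_at_zero(low, high):
    """One possible mitigation: a count can never be below zero."""
    return max(0.0, low), high


low, high = normal_count_interval(observed=1, fraction_seen=0.1)
print(low < 0)                      # the raw lower bound is negative here
print(clamp_at_zero(low, high)[0])  # clamped lower bound
```

With a small observed count and a small fraction of the data seen, the interval 
is wide relative to the mean, which is exactly when the normal approximation is 
least appropriate.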





