Sean Owen created SPARK-17768:
---------------------------------

             Summary: Small {Sum,Count,Mean}Evaluator problems and suboptimalities
                 Key: SPARK-17768
                 URL: https://issues.apache.org/jira/browse/SPARK-17768
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.1
            Reporter: Sean Owen
            Assignee: Sean Owen

This tracks a few related issues with org.apache.spark.partial.(Count,Mean,Sum)Evaluator and their "Grouped" counterparts:

- GroupedMeanEvaluator and GroupedSumEvaluator are unused, as is the StudentTCacher support class
- CountEvaluator can return a lower bound < 0, when counts can't be negative
- MeanEvaluator will actually fail on exactly 1 datum (yields t-test with 0 DOF)
- CountEvaluator uses a normal distribution, which may be an inappropriate approximation (leading to above)
- CountEvaluator, MeanEvaluator have no unit tests to catch these
- Duplication across CountEvaluator, GroupedCountEvaluator
- SumEvaluator might have an issue related to CountEvaluator (or could delegate to compute CountEvaluator times MeanEvaluator?)
- The stats in each could use a bit of documentation as I had to guess at them
- (Code could use a few cleanups and optimizations too)
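
To make two of the points above concrete, here is a rough, standalone sketch in Python (not Spark's code, and not the evaluators' actual formulas). It assumes a generic symmetric normal-approximation interval around a small count estimate, and uses the usual n - 1 degrees of freedom for a one-sample t interval. The helper names normal_interval and degrees_of_freedom, and the input values, are made up for illustration.

```python
import math
from statistics import NormalDist, StatisticsError, variance


def normal_interval(estimate, var, confidence=0.95):
    """Symmetric normal-approximation interval: estimate +/- z * sqrt(var).

    Hypothetical helper; nothing here stops the lower bound going below 0.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(var)
    return estimate - half_width, estimate + half_width


def degrees_of_freedom(samples):
    """Degrees of freedom for a one-sample t interval: n - 1."""
    return len(samples) - 1


# A small count estimate with comparable variance (illustrative values only):
# the symmetric interval extends below zero, which is impossible for a count.
low, high = normal_interval(estimate=2.0, var=2.0)
print(low < 0)

# Exactly one datum leaves 0 degrees of freedom, and the sample variance
# needed for a t interval is undefined.
single = [3.0]
print(degrees_of_freedom(single))
try:
    variance(single)
    variance_defined = True
except StatisticsError:
    variance_defined = False
print(variance_defined)
```

Unit tests for CountEvaluator and MeanEvaluator along these lines (a small count with few partitions merged, and a single observed value) would likely have caught both cases.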