[ https://issues.apache.org/jira/browse/SPARK-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod KC updated SPARK-7438:
----------------------------
    Description:
Eg Code:
val a = sc.parallelize(1 to 10000, 20)
val b = a ++ a ++ a ++ a ++ a
b.countApproxDistinct(0.38)
"java.lang.IllegalArgumentException: requirement failed: p (3) must be at least 4"

Issue 1: When relative accuracy >= 0.38, an IAE is thrown, as the precision p evaluates to 3.
However, the same input works fine in countApproxDistinctByKey(0.38). Usage of relativeSD should be consistent in both countApproxDistinct and countApproxDistinctByKey.
Issue 2: The validation error message "p (3) must be at least 4" gives no clue about what went wrong.
Issue 3: When relative accuracy < 0.000017, countApproxDistinct does not show a proper validation error message.

  was:
Eg Code:
val a = sc.parallelize(1 to 10000, 20)
val b = a++a++a++a++a
b.countApproxDistinct(0.38)
"java.lang.IllegalArgumentException: requirement failed: p (3) must be at least 4"

Issue 1: When relative accuracy >= 0.38, an IAE is thrown, as the precision p evaluates to 3.
However, the same input works fine in countApproxDistinctByKey(0.38). Usage of relativeSD should be consistent in both countApproxDistinct and countApproxDistinctByKey.
Issue 2: The validation error message "p (3) must be at least 4" gives no clue about what went wrong.
Issue 3: When relative accuracy < 0.000017, countApproxDistinct does not show a proper validation error message.


> Validation Error while running countApproxDistinct with relative accuracy >= 0.38
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-7438
>                 URL: https://issues.apache.org/jira/browse/SPARK-7438
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Vinod KC
>            Priority: Minor
>
> Eg Code:
> val a = sc.parallelize(1 to 10000, 20)
> val b = a ++ a ++ a ++ a ++ a
> b.countApproxDistinct(0.38)
> "java.lang.IllegalArgumentException: requirement failed: p (3) must be at least 4"
>
> Issue 1: When relative accuracy >= 0.38, an IAE is thrown, as the precision p evaluates to 3.
> However, the same input works fine in countApproxDistinctByKey(0.38). Usage of relativeSD should be consistent in both countApproxDistinct and countApproxDistinctByKey.
> Issue 2: The validation error message "p (3) must be at least 4" gives no clue about what went wrong.
> Issue 3: When relative accuracy < 0.000017, countApproxDistinct does not show a proper validation error message.
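
The precision value mentioned in Issue 1 can be reproduced without a Spark cluster. The sketch below assumes that p is derived from relativeSD as ceil(2 * log2(1.054 / relativeSD)); the ticket does not state this formula, so it should be checked against the Spark source before relying on it. PrecisionSketch and precisionFor are hypothetical names used only for illustration.

```scala
object PrecisionSketch {
  // Assumed mapping from relative accuracy to precision p.
  // This formula is an assumption, not something stated in the ticket.
  def precisionFor(relativeSD: Double): Int =
    math.ceil(2.0 * math.log(1.054 / relativeSD) / math.log(2)).toInt

  def main(args: Array[String]): Unit = {
    // Values on either side of the 0.38 boundary, plus the small value from Issue 3.
    for (sd <- Seq(0.05, 0.37, 0.38, 0.000017)) {
      val p = precisionFor(sd)
      val note = if (p < 4) "below the minimum of 4" else "at least 4"
      println(s"relativeSD=$sd -> p=$p ($note)")
    }
  }
}
```

Under that assumption, a relativeSD of 0.38 maps to p = 3 while 0.37 maps to p = 4, which would be consistent with the "p (3) must be at least 4" failure reported for countApproxDistinct(0.38).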