[ https://issues.apache.org/jira/browse/SPARK-26899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-26899.
-------------------------------
    Resolution: Not A Problem

This isn't "Major", and I don't think it's a doc problem in a comment, and I don't think it's wrong: it's just stating what count min sketch is for in general.

> CountMinSketchAgg ExpressionDescription is not so correct
> ---------------------------------------------------------
>
>                 Key: SPARK-26899
>                 URL: https://issues.apache.org/jira/browse/SPARK-26899
>             Project: Spark
>          Issue Type: Documentation
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: tomzhu
>            Priority: Major
>
> Hi, all, there are some not-so-correct comment in CountMinSketchAgg.scala, the ExpressionDescription says:
> {code:java}
> @ExpressionDescription(
>   usage = """
>     _FUNC_(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given esp,
>       confidence and seed. The result is an array of bytes, which can be deserialized to a
>       `CountMinSketch` before usage. Count-min sketch is a probabilistic data structure used for
>       cardinality estimation using sub-linear space.
>   """,
>   since = "2.2.0")
> {code}
> , *the Count-min sketch is a probabilistic data structure used for cardinality estimation*, actually, Count-min sketch is mainly used for point query, self_join size query, how can it support cardinality estimation? a fix might be better.
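
For readers following the thread, the "point query" the reporter mentions can be illustrated with a minimal, self-contained sketch. This is not Spark's `CountMinSketch` implementation: the class name `TinyCountMinSketch`, the hashing scheme, and the sizing formulas (width about 2/eps, depth about log2(1/(1 - confidence))) are all assumptions chosen for illustration, and only the parameter names eps, confidence and seed are taken from the quoted description.

```java
import java.util.Random;

/** Hypothetical minimal count-min sketch for illustration; not Spark's class. */
final class TinyCountMinSketch {
    private static final long PRIME = 2147483647L; // 2^31 - 1

    private final int width;
    private final int depth;
    private final long[][] table;
    private final long[] a; // per-row hash multipliers
    private final long[] b; // per-row hash offsets

    TinyCountMinSketch(double eps, double confidence, int seed) {
        // Assumed sizing: width ~ 2/eps, depth ~ log2(1 / (1 - confidence)).
        this.width = (int) Math.ceil(2.0 / eps);
        this.depth = (int) Math.ceil(-Math.log(1.0 - confidence) / Math.log(2.0));
        this.table = new long[depth][width];
        this.a = new long[depth];
        this.b = new long[depth];
        Random rng = new Random(seed);
        for (int i = 0; i < depth; i++) {
            a[i] = 1 + rng.nextInt((int) PRIME - 1); // in [1, PRIME - 1]
            b[i] = rng.nextInt((int) PRIME);         // in [0, PRIME - 1]
        }
    }

    /** Maps an item to a column of the given row with a simple linear hash. */
    private int bucket(int row, String item) {
        long h = item.hashCode() & 0xffffffffL; // non-negative 32-bit value
        return (int) (((a[row] * h + b[row]) % PRIME) % width);
    }

    /** Records one occurrence of the item in every row. */
    void add(String item) {
        for (int i = 0; i < depth; i++) {
            table[i][bucket(i, item)]++;
        }
    }

    /** Point query: minimum counter over all rows; never below the true count. */
    long estimateCount(String item) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            min = Math.min(min, table[i][bucket(i, item)]);
        }
        return min;
    }

    public static void main(String[] args) {
        TinyCountMinSketch sketch = new TinyCountMinSketch(0.01, 0.99, 42);
        for (String word : new String[] {"spark", "sql", "spark", "spark"}) {
            sketch.add(word);
        }
        System.out.println("estimated count of 'spark': " + sketch.estimateCount("spark"));
    }
}
```

The structure answers "roughly how many times was this item seen?" and can overestimate because of hash collisions, but it does not undercount. It keeps no direct record of how many distinct items were added, which is the quantity usually meant by cardinality estimation.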