Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19774#discussion_r152264097

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -689,6 +689,11 @@ case class DescribeColumnCommand(
           buffer += Row("distinct_count", cs.map(_.distinctCount.toString).getOrElse("NULL"))
           buffer += Row("avg_col_len", cs.map(_.avgLen.toString).getOrElse("NULL"))
           buffer += Row("max_col_len", cs.map(_.maxLen.toString).getOrElse("NULL"))
    +      buffer ++= cs.flatMap(_.histogram.map { hist =>
    +        val header = Row("histogram", s"height: ${hist.height}, num_of_bins: ${hist.bins.length}")
    +        Seq(header) ++ hist.bins.map(bin =>
    +          Row("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
    --- End diff --

    @gatorsmile In Hive, there is not yet a histogram implementation (HIVE-3526). In Oracle and MySQL, this information is stored in metadata tables which can be queried (https://docs.oracle.com/cloud/latest/db112/REFRN/statviews_2106.htm#REFRN20279), and there we have mainly two pieces of information:
    - the cumulative count so far;
    - the endpoint value of the current bin.
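To make the rendering concrete, here is a minimal, self-contained sketch of how the added lines format histogram bins into rows. `Bin` and the two-field rows are hypothetical stand-ins for Spark's `HistogramBin` and `Row`, used only to illustrate the output shape; this is not the actual Spark implementation.

```scala
// Hypothetical stand-in for Spark's HistogramBin.
case class Bin(lo: Double, hi: Double, ndv: Long)

object HistogramRows {
  // Mirrors the diff: one header row, then one row per bin
  // with an empty first column.
  def format(height: Double, bins: Seq[Bin]): Seq[(String, String)] = {
    val header = ("histogram", s"height: $height, num_of_bins: ${bins.length}")
    header +: bins.map(bin =>
      ("", s"lower_bound: ${bin.lo}, upper_bound: ${bin.hi}, distinct_count: ${bin.ndv}"))
  }

  def main(args: Array[String]): Unit = {
    val rows = format(2.5, Seq(Bin(0.0, 10.0, 4), Bin(10.0, 20.0, 3)))
    rows.foreach { case (name, value) => println(s"$name\t$value") }
  }
}
```

Each equi-height bin carries its own lower bound, upper bound, and distinct count, which is richer per-bin detail than the cumulative-count/endpoint pair Oracle exposes.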