amaliujia commented on code in PR #38938: URL: https://github.com/apache/spark/pull/38938#discussion_r1041691534
########## python/pyspark/sql/connect/dataframe.py: ########## @@ -1239,6 +1239,16 @@ def summary(self, *statistics: str) -> "DataFrame": session=self._session, ) + def describe(self, *cols: str) -> "DataFrame": + _cols: List[str] = list(cols) Review Comment: ```suggestion """Computes basic statistics for numeric and string columns .. versionadded:: 3.4.0 This include count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical or string columns. Notes ----- This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting :class:`DataFrame`. Use summary for expanded statistics and control over which statistics to compute. Parameters ---------- cols : str, list, optional Column name or list of column names to describe by (default All columns). Returns ------- :class:`DataFrame` A new DataFrame that describes (provides statistics) given DataFrame. """ _cols: List[str] = list(cols) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org