Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20487#discussion_r166547502 --- Diff: python/pyspark/sql/session.py --- @@ -646,6 +646,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr except Exception: has_pandas = False if has_pandas and isinstance(data, pandas.DataFrame): + from pyspark.sql.utils import require_minimum_pandas_version + require_minimum_pandas_version() --- End diff -- I don't think I exactly know all the places exactly. For now, I can think of: createDataFrame with Pandas DataFrame input, toPandas and pandas_udf for APIs, and some places in `session.py` / `types.py` for internal methods like `_check*` family or `*arrow*` or `*pandas*`. I was thinking of working on putting those into a single module (file) after 2.3.0. Will cc you and @ueshin there.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org