Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r166547502
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -646,6 +646,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
             except Exception:
                 has_pandas = False
             if has_pandas and isinstance(data, pandas.DataFrame):
    +            from pyspark.sql.utils import require_minimum_pandas_version
    +            require_minimum_pandas_version()
    --- End diff --
    
    I don't think I know all the places exactly. For now, I can think of: `createDataFrame` with a Pandas DataFrame input, `toPandas`, and `pandas_udf` for APIs, plus some places in `session.py` / `types.py` for internal methods like the `_check*` family or `*arrow*` / `*pandas*` helpers.
    
    I was thinking of putting those checks into a single module (file) after 2.3.0. I'll cc you and @ueshin there.
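    For context, a centralized check along these lines could live in one module. The sketch below is an assumption about the shape of such a helper, not the actual `pyspark.sql.utils` implementation; the `0.19.2` floor mirrors the pandas minimum documented for Spark 2.3, and `_version_tuple` is a hypothetical helper using a deliberately crude parse:

    ```python
    import re

    # Assumed minimum; Spark 2.3 documents pandas >= 0.19.2 for Arrow-related features.
    MINIMUM_PANDAS_VERSION = "0.19.2"


    def _version_tuple(version):
        """Crude version parse: keep the first three numeric components
        ("0.25.0rc0" -> (0, 25, 0)). Real packaging tools are more thorough."""
        return tuple(int(x) for x in re.findall(r"\d+", version)[:3])


    def require_minimum_pandas_version():
        """Raise ImportError unless a sufficiently new pandas is importable."""
        try:
            import pandas
        except ImportError:
            raise ImportError(
                "pandas >= %s must be installed; it was not found."
                % MINIMUM_PANDAS_VERSION)
        if _version_tuple(pandas.__version__) < _version_tuple(MINIMUM_PANDAS_VERSION):
            raise ImportError(
                "pandas >= %s must be installed; your version was %s."
                % (MINIMUM_PANDAS_VERSION, pandas.__version__))
    ```

    Each pandas-dependent entry point (such as the `createDataFrame` branch in the diff above) would then call `require_minimum_pandas_version()` before touching pandas, so the error message is consistent everywhere.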

