HyukjinKwon commented on a change in pull request #27109: [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package URL: https://github.com/apache/spark/pull/27109#discussion_r363555414
########## File path: python/pyspark/sql/dataframe.py ########## @@ -31,23 +31,23 @@ from pyspark import copy_func, since, _NoValue from pyspark.rdd import RDD, _load_from_socket, _local_iterator_from_socket, \ - ignore_unicode_prefix, PythonEvalType -from pyspark.serializers import ArrowCollectSerializer, BatchedSerializer, PickleSerializer, \ + ignore_unicode_prefix +from pyspark.serializers import BatchedSerializer, PickleSerializer, \ UTF8Deserializer from pyspark.storagelevel import StorageLevel from pyspark.traceback_utils import SCCallSiteSync from pyspark.sql.types import _parse_datatype_json_string from pyspark.sql.column import Column, _to_seq, _to_list, _to_java_column from pyspark.sql.readwriter import DataFrameWriter from pyspark.sql.streaming import DataStreamWriter -from pyspark.sql.types import IntegralType from pyspark.sql.types import * -from pyspark.util import _exception_message +from pyspark.sql.pandas.conversion import PandasConversionMixin +from pyspark.sql.pandas.map_ops import PandasMapOpsMixin __all__ = ["DataFrame", "DataFrameNaFunctions", "DataFrameStatFunctions"] -class DataFrame(object): +class DataFrame(PandasMapOpsMixin, PandasConversionMixin): Review comment: Yeah, ```python def __init__(self, ...): ... self._pandasMapOpsMixin = PandasMapOpsMixin(self) ... def mapInPandas(self, udf): return self._pandasMapOpsMixin.mapInPandas(udf) ``` This one, the `DataFrame` or `SparkSession` have to expose the method themselves in their own classes. So, if you want to add some new pandas APIs, then it should be added into `PandasMapOpsMixin` and then also `DataFrame`. Using `__getattr__` looks to me overkill. The point I wanted to say is, the current `DataFrame` with mixin approaches is closer to the Scala side's `Dataset` since it does not expose such APIs. Also, I believe what I am doing is what self type trait is supposed to be doing. It's coupled to specific type and other types cannot implement this trait. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org