HyukjinKwon commented on a change in pull request #32835:
URL: https://github.com/apache/spark/pull/32835#discussion_r648793340
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -5,23 +5,23 @@ Best Practices
 Leverage PySpark APIs
 ---------------------
 
-Koalas uses Spark under the hood; therefore, many features and performance optimization are available
-in Koalas as well. Leverage and combine those cutting-edge features with Koalas.
+Pandas APIs on Spark uses Spark under the hood; therefore, many features and performance optimization are available
+in pandas APIs on Spark as well. Leverage and combine those cutting-edge features with pandas APIs on Spark.
 
-Existing Spark context and Spark sessions are used out of the box in Koalas. If you already have your own
-configured Spark context or sessions running, Koalas uses them.
+Existing Spark context and Spark sessions are used out of the box in pandas APIs on Spark. If you already have your own
+configured Spark context or sessions running, pandas APIs on Spark uses them.
 If there is no Spark context or session running in your environment (e.g., ordinary Python interpreter),
 such configurations can be set to ``SparkContext`` and/or ``SparkSession``.
-Once Spark context and/or session is created, Koalas can use this context and/or session automatically.
+Once Spark context and/or session is created, pandas APIs on Spark can use this context and/or session automatically.
 For example, if you want to configure the executor memory in Spark, you can do as below:
 
 .. code-block:: python
 
     from pyspark import SparkConf, SparkContext
     conf = SparkConf()
     conf.set('spark.executor.memory', '2g')
-    # Koalas automatically uses this Spark context with the configurations set.
+    # Pandas APIs on Spark automatically uses this Spark context with the configurations set.

Review comment:
```suggestion
    # Pandas APIs on Spark automatically use this Spark context with the configurations set.
```

##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -33,23 +33,23 @@ it can be set into Spark session as below:
 .. code-block:: python
 
     from pyspark.sql import SparkSession
-    builder = SparkSession.builder.appName("Koalas")
+    builder = SparkSession.builder.appName("pandas-on-spark")
     builder = builder.config("spark.sql.execution.arrow.enabled", "true")
-    # Koalas automatically uses this Spark session with the configurations set.
+    # Pandas APIs on Spark automatically uses this Spark session with the configurations set.

Review comment:
```suggestion
    # Pandas APIs on Spark automatically use this Spark session with the configurations set.
```

##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -33,23 +33,23 @@ it can be set into Spark session as below:
 .. code-block:: python
 
     from pyspark.sql import SparkSession
-    builder = SparkSession.builder.appName("Koalas")
+    builder = SparkSession.builder.appName("pandas-on-spark")
     builder = builder.config("spark.sql.execution.arrow.enabled", "true")
-    # Koalas automatically uses this Spark session with the configurations set.
+    # Pandas APIs on Spark automatically uses this Spark session with the configurations set.
     builder.getOrCreate()
 
     import pyspark.pandas as ks
     ...
 
-All Spark features such as history server, web UI and deployment modes can be used as are with Koalas.
+All Spark features such as history server, web UI and deployment modes can be used as are with pandas APIs on Spark.
 If you are interested in performance tuning, please see also `Tuning Spark <https://spark.apache.org/docs/latest/tuning.html>`_.
 
 
 Check execution plans
 ---------------------
 
 Expensive operations can be predicted by leveraging PySpark API `DataFrame.spark.explain()`
-before the actual computation since Koalas is based on lazy execution. For example, see below.
+before the actual computation since pandas APIs on Spark is based on lazy execution. For example, see below.

Review comment:
```suggestion
before the actual computation since pandas APIs on Spark are based on lazy execution. For example, see below.
```
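For readers following along, here is a minimal sketch of the `DataFrame.spark.explain()` pattern the section above describes, assuming a running Spark session; the frame contents and column names are made up for illustration and are not from the documented example:

```python
import pyspark.pandas as ks

# A small pandas-on-Spark frame; contents are hypothetical.
kdf = ks.DataFrame({'id': [1, 2, 3], 'value': [10.0, 20.0, 30.0]})

# Operations only build up a lazy Spark plan; nothing runs yet.
filtered = kdf[kdf.value > 15.0]

# Print the underlying Spark execution plan before any computation happens.
filtered.spark.explain()
```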
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -65,14 +65,14 @@ before the actual computation since Koalas is based on lazy execution. For examp
 Whenever you are not sure about such cases, you can check the actual execution plans and
 foresee the expensive cases.
 
-Even though Koalas tries its best to optimize and reduce such shuffle operations by leveraging Spark
+Even though pandas APIs on Spark tries its best to optimize and reduce such shuffle operations by leveraging Spark

Review comment:
```suggestion
Even though pandas APIs on Spark try its best to optimize and reduce such shuffle operations by leveraging Spark
```

##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -157,14 +157,14 @@ as it is less expensive because data can be distributed and computed for each gr
 Avoid reserved column names
 ---------------------------
 
-Columns with leading ``__`` and trailing ``__`` are reserved in Koalas. To handle internal behaviors for, such as, index,
-Koalas uses some internal columns. Therefore, it is discouraged to use such column names and not guaranteed to work.
+Columns with leading ``__`` and trailing ``__`` are reserved in pandas APIs on Spark. To handle internal behaviors for, such as, index,
+pandas APIs on Spark uses some internal columns. Therefore, it is discouraged to use such column names and not guaranteed to work.
 
 
 Do not use duplicated column names
 ----------------------------------
 
-It is disallowed to use duplicated column names because Spark SQL does not allow this in general. Koalas inherits
+It is disallowed to use duplicated column names because Spark SQL does not allow this in general. Pandas APIs on Spark inherits

Review comment:
```suggestion
It is disallowed to use duplicated column names because Spark SQL does not allow this in general. Pandas APIs on Spark inherit
```

##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -175,7 +175,7 @@ this behavior. For instance, see below:
 ...
 Reference 'a' is ambiguous, could be: a, a.;
 
-Additionally, it is strongly discouraged to use case sensitive column names. Koalas disallows it by default.
+Additionally, it is strongly discouraged to use case sensitive column names. Pandas APIs on Spark disallows it by default.

Review comment:
```suggestion
Additionally, it is strongly discouraged to use case sensitive column names. Pandas APIs on Spark disallow it by default.
```
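As a quick illustration of the duplicated and case sensitive column name restriction discussed above, a minimal sketch, assuming a running Spark session; the column names are hypothetical, and `spark.sql.caseSensitive` is the standard Spark SQL switch rather than anything specific to this PR:

```python
from pyspark.sql import SparkSession
import pyspark.pandas as ks

spark = SparkSession.builder.getOrCreate()

# Column names differing only by case ('a' vs 'A') are ambiguous to
# Spark SQL by default, so this raises an error along the lines of:
#   Reference 'a' is ambiguous, could be: a, a.;
try:
    kdf = ks.DataFrame({'a': [1, 2], 'A': [3, 4]})
except Exception as e:
    print(e)

# Spark SQL's standard case-sensitivity option relaxes the check,
# at your own risk:
# spark.conf.set('spark.sql.caseSensitive', True)
```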