HyukjinKwon commented on a change in pull request #32835:
URL: https://github.com/apache/spark/pull/32835#discussion_r648800503
##########
File path: python/docs/source/user_guide/pandas_on_spark/typehints.rst
##########
@@ -1,36 +1,36 @@
-====================
-Type Hints In Koalas
-====================
+==================================
+Type Hints In pandas APIs on Spark
+==================================

 .. currentmodule:: pyspark.pandas

-Koalas, by default, infers the schema by taking some top records from the output,
-in particular, when you use APIs that allow users to apply a function against Koalas DataFrame
+Pandas APIs on Spark, by default, infers the schema by taking some top records from the output,
+in particular, when you use APIs that allow users to apply a function against pandas APIs on Spark DataFrame
 such as :func:`DataFrame.transform`, :func:`DataFrame.apply`, :func:`DataFrame.koalas.apply_batch`,
 :func:`DataFrame.koalas.apply_batch`, :func:`Series.koalas.apply_batch`, etc.

 However, this is potentially expensive. If there are several expensive operations such as a shuffle
-in the upstream of the execution plan, Koalas will end up with executing the Spark job twice, once
+in the upstream of the execution plan, pandas APIs on Spark will end up with executing the Spark job twice, once
 for schema inference, and once for processing actual data with the schema.

-To avoid the consequences, Koalas has its own type hinting style to specify the schema to avoid
-schema inference. Koalas understands the type hints specified in the return type and converts it
+To avoid the consequences, pandas APIs on Spark has its own type hinting style to specify the schema to avoid
+schema inference. Pandas APIs on Spark understands the type hints specified in the return type and converts it
 as a Spark schema for pandas UDFs used internally. The way of type hinting has been evolved over
 the time.

 In this chapter, it covers the recommended way and the supported ways in details.

 .. note::
-    The variadic generics support is experimental and unstable in Koalas.
+    The variadic generics support is experimental and unstable in pandas APIs on Spark.
     The way of typing can change between minor releases without a warning.
     See also `PEP 646 <https://www.python.org/dev/peps/pep-0646/>`_ for variadic generics in Python.

-Koalas DataFrame and Pandas DataFrame
--------------------------------------
+Pandas APIs on Spark DataFrame and Pandas DataFrame
+---------------------------------------------------

-In the early Koalas version, it was introduced to specify a type hint in the function in order to use
-it as a Spark schema. As an example, you can specify the return type hint as below by using Koalas
+In the early pandas APIs on Spark version, it was introduced to specify a type hint in the function in order to use
+it as a Spark schema. As an example, you can specify the return type hint as below by using pandas APIs on Spark

Review comment:
pandas-on-Spark DataFrame