[ https://issues.apache.org/jira/browse/SPARK-40178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582932#comment-17582932 ]
Apache Spark commented on SPARK-40178:
--------------------------------------

User 'mhconradt' has created a pull request for this issue:
https://github.com/apache/spark/pull/37616

> Rebalance/Repartition Hints Not Working in PySpark
> --------------------------------------------------
>
> Key: SPARK-40178
> URL: https://issues.apache.org/jira/browse/SPARK-40178
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
> Environment: Mac OSX 11.4 Big Sur
> Python 3.9.7
> Spark version >= 3.2.0 (perhaps before as well).
> Reporter: Maxwell Conradt
> Priority: Major
> Fix For: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.4.0, 3.3.1
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Partitioning hints in PySpark do not work because the column parameters are
> not converted to Catalyst `Expression` instances before being passed to the
> hint resolver.
> The behavior of the hints is documented
> [here|https://spark.apache.org/docs/3.3.0/sql-ref-syntax-qry-select-hints.html#partitioning-hints-types].
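The root cause described above can be illustrated with a small, hypothetical Python model. It assumes that a plain string parameter such as "id" reaches the hint resolver as a constant (literal) expression, while a column object is first converted into a column reference, and that the resolver accepts only column references. The classes `Literal`, `UnresolvedAttribute` and `Column` and the functions `to_hint_expressions` and `resolve_rebalance` are illustrative stand-ins, not PySpark or Catalyst APIs.

```python
# Hypothetical model of the SPARK-40178 failure; these classes are
# illustrative stand-ins, NOT PySpark or Catalyst APIs.
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass(frozen=True)
class Literal:
    """Models a constant expression (what a plain str parameter becomes)."""
    value: Any


@dataclass(frozen=True)
class UnresolvedAttribute:
    """Models a column reference, the only kind the modelled resolver accepts."""
    name: str


@dataclass(frozen=True)
class Column:
    """Models a column object that knows how to produce its expression."""
    name: str

    def to_expr(self) -> UnresolvedAttribute:
        return UnresolvedAttribute(self.name)


Expression = Union[Literal, UnresolvedAttribute]


def to_hint_expressions(parameters: List[Any]) -> List[Expression]:
    """Assumed conversion: Column -> column reference, anything else -> Literal."""
    return [p.to_expr() if isinstance(p, Column) else Literal(p) for p in parameters]


def resolve_rebalance(exprs: List[Expression]) -> List[str]:
    """Modelled resolver check: every REBALANCE parameter must be a column."""
    invalid = [e for e in exprs if not isinstance(e, UnresolvedAttribute)]
    if invalid:
        found = ", ".join(str(e.value) for e in invalid)
        raise ValueError(
            f"REBALANCE Hint parameter should include columns, but {found} found"
        )
    return [e.name for e in exprs]


# A plain string reaches the modelled resolver as a literal and is rejected.
try:
    resolve_rebalance(to_hint_expressions(["id"]))
except ValueError as err:
    print(err)  # REBALANCE Hint parameter should include columns, but id found

# Converting the parameter to a column expression first passes the check.
print(resolve_rebalance(to_hint_expressions([Column("id")])))  # ['id']
```

The model only reproduces the shape of the reported error message; it says nothing about how the fix in the linked pull request is actually implemented.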
>
> Example:
>
> {code:python}
> >>> df = spark.range(1024)
> >>>
> >>> df
> DataFrame[id: bigint]
> >>> df.hint("rebalance", "id")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
>     jdf = self._jdf.hint(name, self._jseq(parameters))
>   File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
>   File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
>     raise converted from None
> pyspark.sql.utils.AnalysisException: REBALANCE Hint parameter should include columns, but id found
> >>> df.hint("repartition", "id")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint
>     jdf = self._jdf.hint(name, self._jseq(parameters))
>   File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
>   File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco
>     raise converted from None
> pyspark.sql.utils.AnalysisException: REPARTITION Hint parameter should include columns, but id found
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)