[ https://issues.apache.org/jira/browse/SPARK-29317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-29317.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 25989
[https://github.com/apache/spark/pull/25989]

> Avoid inheritance hierarchy in pandas CoGroup arrow runner and its plan
> -----------------------------------------------------------------------
>
>                 Key: SPARK-29317
>                 URL: https://issues.apache.org/jira/browse/SPARK-29317
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Some refactoring was done in SPARK-27463, which introduced two common
> abstract base classes:
>
> 1. {{BaseArrowPythonRunner}}
>
> Before:
> {code}
> └── BasePythonRunner
>     ├── ArrowPythonRunner
>     ├── CoGroupedArrowPythonRunner
>     ├── PythonRunner
>     └── PythonUDFRunner
> {code}
>
> After:
> {code}
> BasePythonRunner
> ├── BaseArrowPythonRunner
> │   ├── ArrowPythonRunner
> │   └── CoGroupedArrowPythonRunner
> ├── PythonRunner
> └── PythonUDFRunner
> {code}
>
> The problem is that the R code path is matched with the Python side:
> {code}
> └── BaseRRunner
>     ├── ArrowRRunner
>     └── RRunner
> {code}
>
> I would like to keep the hierarchies matched and decouple the rest for now.
> Ideally, we should deduplicate both code paths. The internal implementations
> are also intentionally similar.
>
> 2. {{BasePandasGroupExec}}
>
> Before:
> {code}
> ├── FlatMapGroupsInPandasExec
> └── FlatMapCoGroupsInPandasExec
> {code}
>
> After:
> {code}
> └── BasePandasGroupExec
>     ├── FlatMapGroupsInPandasExec
>     └── FlatMapCoGroupsInPandasExec
> {code}
>
> The problem is that R (with Arrow optimization, in particular) duplicates
> some code with pandas UDFs:
>
> {{FlatMapGroupsInRWithArrowExec}} <> {{FlatMapGroupsInPandasExec}}
> {{MapPartitionsInRWithArrowExec}} <> {{ArrowEvalPythonExec}}
>
> To prepare for deduplication here as well, it might be better to avoid
> changing the hierarchy on the Python side alone and instead just decouple it.
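
For illustration only, the decoupling idea in the description can be sketched in Python. This is a minimal sketch under stated assumptions, not the change made in pull request 25989: `read_arrow_output` is a hypothetical helper standing in for whatever Arrow-specific logic the two runners share, and the classes only borrow the names from the issue. The shared logic sits in a standalone function, so both Arrow runners stay direct children of `BasePythonRunner`, which keeps the tree the same shape as the `BaseRRunner` one.

```python
from abc import ABC, abstractmethod


def read_arrow_output(batches):
    """Hypothetical shared helper: flattens a list of batches into rows.

    It stands in for the Arrow-specific logic that an intermediate
    base class would otherwise hold.
    """
    return [row for batch in batches for row in batch]


class BasePythonRunner(ABC):
    """Stand-in for the common base; only the shape of the tree matters here."""

    @abstractmethod
    def compute(self, data):
        ...


class ArrowPythonRunner(BasePythonRunner):
    # Direct child of the base: reuses the helper instead of inheriting it.
    def compute(self, data):
        return read_arrow_output(data)


class CoGroupedArrowPythonRunner(BasePythonRunner):
    # Also a direct child: handles a (left, right) pair of batch lists.
    def compute(self, data):
        left, right = data
        return read_arrow_output(left), read_arrow_output(right)
```

With this layout, no intermediate class sits between the base and the two Arrow runners, so a later deduplication with the R side can replace the helper without touching the hierarchy.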