[ https://issues.apache.org/jira/browse/SPARK-37022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maciej Szymkiewicz updated SPARK-37022:
---------------------------------------
    Description:
[{{black}}|https://github.com/psf/black] is a popular Python code formatter. It is used by a number of projects, both small and large, including prominent ones like pandas, scikit-learn, Django, and SQLAlchemy. Black is already used to format {{pyspark.pandas}} and (though not enforced) stub files.

We should consider using black to enforce formatting of all PySpark files. There are multiple reasons to do that:
 - Consistency: black is already used in the existing codebase, and black-formatted chunks of code are already being added to modules other than pyspark.pandas as a result of type hint inlining (SPARK-36845).
 - Lower cost of contributing and reviewing: formatting can be automatically enforced and applied.
 - Simpler reviews: in general, black-formatted code produces small and highly readable diffs.
 - Reduced effort to maintain patched forks: smaller diffs and predictable formatting.

Risks:
 - Initial reformatting requires quite significant changes.
 - Applying black will break blame in the GitHub UI (for git in general, see [Avoiding ruining git blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]).

Additional steps:
 - To simplify backporting, black will have to be applied to all active branches.

  was:
[{{black}}|https://github.com/psf/black] is a popular Python code formatter. It is used by a number of projects, both small and large, including prominent ones like pandas, scikit-learn, Django, and SQLAlchemy. Black is already used to format {{pyspark.pandas}} and (though not enforced) stub files.

We should consider using black to enforce formatting of all PySpark files. There are multiple reasons to do that:
 - Consistency: black is already used in the existing codebase, and black-formatted chunks of code are already being added to modules other than pyspark.pandas as a result of type hint inlining (SPARK-36845).
 - Lower cost of contributing and reviewing: formatting can be automatically enforced and applied.
 - Simpler reviews: in general, black-formatted code produces small and highly readable diffs.

Risks:
 - Initial reformatting requires quite significant changes.
 - Applying black will break blame in the GitHub UI (for git in general, see [Avoiding ruining git blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]).

Additional steps:
 - To simplify backporting, black will have to be applied to all active branches.


> Use black as a formatter for the whole PySpark codebase.
> --------------------------------------------------------
>
>                 Key: SPARK-37022
>                 URL: https://issues.apache.org/jira/browse/SPARK-37022
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Maciej Szymkiewicz
>            Priority: Major
>         Attachments: black-diff-stats.txt, pyproject.toml