Re: [DISCUSS] Autoformat python code with Black

Łukasz Gajowy Thu, 30 May 2019 01:50:27 -0700

+1 for any autoformatter for Python SDK that does the job. My experience is
that since spotless in Java SDK I would never start a new Java project
without it. So many great benefits not only for one person coding but for
all community.


It is a GitHub UI issue that you cannot easily browse past the reformat. It
is not actually that hard, but does take a couple extra clicks to get
GitHub to display blame before a reformat. It is easier with the command
line. I do a lot of code history digging and the global Java reformat is
not really a problem.

It's actually one more click on Github but I agree it's not the best way to
search the history. The most convenient and clear one I've found so far is
in Jetbrains IDEs (Intelij) where you can:

right click on line number -> "annotate" -> click again -> "annotate
previous revision" -> ...

You can also use "compare with" to see the diff between two revisions.

Łukasz





czw., 30 maj 2019 o 06:15 Kenneth Knowles <k...@apache.org> napisał(a):

> +1 pending good enough tooling (I can't quite tell - seems there are some
> issues?)
>
> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
> ka.kucharc...@gmail.com> wrote:
>
>> What else actually we gain? My guess is faster PR review iteration. We
>> will skip some of conversations about code style.
>>
> ...
>
>> Last but not least, new contributor may be less discouraged. When I
>> started contribute I didn’t know how to format my code and I lost a lot of
>> time to add pylint and adjust IntelliJ. I eventually failed. Currently I
>> write code intuitively and when I don’t forget I rerun tox.
>>
>
> This is a huge benefit. This is why I supported it so much for Java. It is
> a community benefit. You do not have to be a contributor to the Python SDK
> to support this. That is why I am writing here. Just eliminate all
> discussion of formatting. It doesn't really matter what the resulting
> format is, if it is not crazy to read. I strongly oppose maintaining a
> non-default format.
>
> Reformating 20k lines or 200k is not hard. The Java global reformat
> touched 50k lines. It does not really matter how big it is. Definitely do
> it all at once if you think the tool is good enough. And you should pin a
> version, so churn is not a problem. You can upgrade the version and
> reformat in a PR later and that is also easy.
>
> It is a GitHub UI issue that you cannot easily browse past the reformat.
> It is not actually that hard, but does take a couple extra clicks to get
> GitHub to display blame before a reformat. It is easier with the command
> line. I do a lot of code history digging and the global Java reformat is
> not really a problem.
>
> Kenn
>
>
>
>> Also everything will be formatted in a same way, so eventually it would
>> be easier to read.
>>
>> Moreover, as it was mentioned in previous emails - a lot of Jenkins
>> failures won’t take place, so we save time and resources.
>>
>>
>> One of disadvantages is that our pipelines has custom syntax and after
>> formatting they looks a little bit weird, but maybe extending the only
>> configurable option in Black - lines, from 88 to 110 would be solution.
>>
>> Second one is that Black requires Python 3 to be run. I don’t know how
>> big obstacle it would be.
>>
>> I believe there are two options how it would be possible to introduce
>> Black. First: just do it, it will hurt but then it would be ok (same as a
>> dentist appointment). Of course it may require some work to adjust linters.
>> On the other hand we can do it gradually and start including sdk parts one
>> by one - maybe it will be less painful?
>>
>> As an example I can share one of projects [2] I know that uses Black
>> (they use also other cool checkers and pre-commit [3]). This is how looks
>> their build with all checks [4].
>>
>> To sum up I believe that if we want improve our coding experience, we
>> should improve our toolset. Black seems be recent and quite popular tool
>> what makes think they won’t stop developing it.
>>
>> [1]
>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>
>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>
>> [3] https://pre-commit.com
>>
>> [4]
>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>
>>
>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> Reformatting to 4 spaces seems a non-starter to me, as it would change
>>> nearly every single line in the codebase (and the loss of all context as
>>> well as that particular line).
>>>
>>> This is probably why the 2-space fork exists. However, we don't conform
>>> to that either--we use 2 spaces for indentation, but 4 for continuation
>>> indentation. (As for the history of this, this goes back to Google's
>>> internal style guide, probably motivated by consistency with C++, Java, ...
>>> and the fact that with an indent level of 4 one ends up wrapping lines
>>> quite frequently (it's telling that black's default line length is 88)).
>>> This turns out to be an easy change to the codebase.
>>>
>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this tool
>>> introduces a huge amount of vertical whitespace (e.g. closing parentheses
>>> on their own line), e.g.
>>>
>>> def foo(
>>>     args
>>> ):
>>>   if (
>>>       long expression)
>>>   ):
>>>     func(
>>>         args
>>>     )
>>>
>>> I wrote a simple post-processor to put closing parentheses on the same
>>> lines, as well as omit the newline after "if (", and disabling formatting
>>> of strings, which reduce the churn in our codebase to 15k lines (adding
>>> about 4k) out of 200k total.
>>>
>>> https://github.com/apache/beam/pull/8712/files
>>>
>>> It's still very opinionated, often in different ways then me, and
>>> doesn't understand the semantics of the code, but possibly something we
>>> could live with given the huge advantages of an autoformatter.
>>>
>>> An intermediate point would be to allow, but not require, autoformatting
>>> of changed lines.
>>>
>>> As for being beta quality, it looks like it's got a decent number of
>>> contributors and in my book being in the python github project is a strong
>>> positive signal. But, due to the above issues, I think we'd have to
>>> maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue
>>> is a 2-line change, and the rest implemented as a post-processing step (for
>>> now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>
>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>> >
>>> > > I think the question is if it can be configured in a way to fit our
>>> > > current linter's style. I don't think it is feasible to reformat the
>>> > > entire Python SDK.
>>> >
>>> > It cannot be configured to do what we actually do because Black is
>>> > configurable only to support the standard python codestyle guidelines
>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>> > python world use.
>>> >
>>> > > Reformatted lines don't allow quick access to the Git history. This
>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>> > > that this might be less of a problem with Python because the linter
>>> has
>>> > > more rules than Checkstyle had.
>>> >
>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>> > to deal with.
>>> >
>>> >
>>> >
>>> >
>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <m...@apache.org>
>>> wrote:
>>> > >
>>> > > I think the question is if it can be configured in a way to fit our
>>> > > current linter's style. I don't think it is feasible to reformat the
>>> > > entire Python SDK.
>>> > >
>>> > > Reformatted lines don't allow quick access to the Git history. This
>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>> > > that this might be less of a problem with Python because the linter
>>> has
>>> > > more rules than Checkstyle had.
>>> > >
>>> > > -Max
>>> > >
>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>> > > >> My concerns are:
>>> > > >> - The product is clearly marked as beta with a big warning.
>>> > > >> - It looks like mostly a single person project. For the same
>>> reason I also strongly prefer not using a fork for a specific setting. Fork
>>> will only have less people looking at it.
>>> > > >
>>> > > > I suppose the project is marked as beta because it is recent, it
>>> was
>>> > > > presented in 2018’s pycon, and because some things can change since
>>> > > > auto-formatters are pretty tricky beasts, I think beta in that
>>> case is
>>> > > > like our own ‘@Experimental’. If you look at the contribution page
>>> [1]
>>> > > > you can notice that it is less and less a single person project,
>>> there
>>> > > > have been 93 independent contributions since the project became
>>> > > > public, and the fact that it is hosted in the python organization
>>> > > > github [2] gives some confidence on the project continuity.
>>> > > >
>>> > > > You are right however about the fact that the main author seems to
>>> be
>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>>> > > > arbitrary, but he is just following pep8 style guide
>>> recommendations
>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>>> > > > recommendation of PEP-8 or even Google's own Python style guide
>>> [4],
>>> > > > So, probably it should be to us to reconsider the current policy to
>>> > > > adapt to the standards (and the tool).
>>> > > >
>>> > > > I did a quick run of black with python 2.7 compatibility on
>>> > > > sdks/python and got only 4 parsing errors which is positive given
>>> the
>>> > > > size of our code base.
>>> > > >
>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed to
>>> reformat.
>>> > > >
>>> > > > error: cannot format
>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>> > > > Cannot parse: 47:22:   _display_progress = print
>>> > > > error: cannot format
>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>> > > > error: cannot format
>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>> file=sys.stderr)
>>> > > > error: cannot format
>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '
>>>  ',
>>> > > > end=' ')
>>> > > >
>>> > > > I still think this can be positive for the project but well I am
>>> > > > barely a contributor to the python code base so I let you the
>>> python
>>> > > > maintainers to reconsider this, in any case it seems like a good
>>> > > > improvement for the project.
>>> > > >
>>> > > > [1] https://github.com/python/black/graphs/contributors
>>> > > > [2] https://github.com/python
>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>> > > > [4]
>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>> > > >
>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <al...@google.com>
>>> wrote:
>>> > > >>
>>> > > >> I am in the same boat with Robert, I am in favor of
>>> autoformatters but I am not familiar with this one. My concerns are:
>>> > > >> - The product is clearly marked as beta with a big warning.
>>> > > >> - It looks like mostly a single person project. For the same
>>> reason I also strongly prefer not using a fork for a specific setting. Fork
>>> will only have less people looking at it.
>>> > > >>
>>> > > >> IMO, this is in an early stage for us. That said lint issues are
>>> real as pointed in the thread. If someone would like to give it a try and
>>> see how it would look like for us that would be interesting.
>>> > > >>
>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>> ka.kucharc...@gmail.com> wrote:
>>> > > >>>
>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are
>>> caused by lint problems.
>>> > > >>> I think it would be great to have something similar to Spotless
>>> in Java SDK (I heard there is problem with configuring Black with IntelliJ).
>>> > > >>>
>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>> rober...@google.com> wrote:
>>> > > >>>>
>>> > > >>>> I'm generally in favor of autoformatters, though I haven't
>>> looked at
>>> > > >>>> how well this particular one works. We might have to go with
>>> > > >>>> https://github.com/desbma/black-2spaces given
>>> > > >>>> https://github.com/python/black/issues/378 .
>>> > > >>>>
>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>> pabl...@google.com> wrote:
>>> > > >>>>>
>>> > > >>>>> This looks pretty good:) I know at least a couple people
>>> (myself included) who've been annoyed by having to take care of lint issues
>>> that maybe a code formatter could save us.
>>> > > >>>>> Thanks for sharing Ismael.
>>> > > >>>>> -P.
>>> > > >>>>>
>>> > > >>>>>
>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <ieme...@gmail.com>
>>> wrote:
>>> > > >>>>>>
>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto
>>> formatter that
>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and
>>> wanted to
>>> > > >>>>>> bring to the ML Is there interest from the python people to
>>> get this
>>> > > >>>>>> into the build?
>>> > > >>>>>>
>>> > > >>>>>> The introduction of spotless for Java has been a good
>>> improvement and
>>> > > >>>>>> maybe the python code base may benefit of this too.
>>> > > >>>>>>
>>> > > >>>>>> WDYT?
>>> > > >>>>>>
>>> > > >>>>>> [1] https://github.com/python/black
>>>
>>

Re: [DISCUSS] Autoformat python code with Black

Reply via email to