Re: Gradle Task Configuration Avoidance

2022-12-05 Thread Kenneth Knowles
Nice!

I believe at some point in the past we made a pass to try to convert our
stuff to this model. I wonder if we can prevent regressions proactively
somehow, like disabling the legacy way of creating tasks or something.

Kenn

On Mon, Dec 5, 2022 at 6:25 AM Kerry Donny-Clark via dev <
dev@beam.apache.org> wrote:

> Thanks Damon! I really appreciate how clear your emails are here. Instead
> of my usual feeling of "I don't quite understand, and don't have time to
> get context" I can read all the context in the mail.
> This error message had confused me, so I really appreciate the cleanup and
> explanation.
>
> On Fri, Dec 2, 2022, 7:28 PM Damon Douglas via dev wrote:
>
>> Hello Everyone,
>>
>> *If you are new to Beam and coming from non-Java language conventions, it
>> is likely you are new to gradle.  At the end of this email is a list of
>> definitions and references to help understand this email.*
>>
>> *Short Version (For those who know gradle)*:
>> A pull request [1] may fix the continual error message "Error: Backend
>> initialization required, please run "terraform init"".  The PR applies Task
>> Configuration Avoidance [2] by changing a few tasks from task(String) to
>> tasks.register(String).
>>
>> *Long Version (For those who are not as familiar with gradle)*:
>>
>> I write this not as an expert but as someone still learning.  Gradle [3]
>> is the software we use in the Beam repository to automate many needed tasks
>> associated with building and testing code.  It is typically used in Java
>> projects but can be extended for other purposes.  We store code related to
>> our Beam Playground [4] that also uses gradle though it is not mainly a
>> Java project.  The unit of work for Gradle is what is called a task.  To
>> run a task you open a terminal and type "./gradlew nameOfMyTask".  There
>> are two main ways to create a custom task in our build.gradle files.  One
>> is writing task("doSomething") and the other is
>> tasks.register("doSomethingElse").  According to [2], the recommendation is
>> to use tasks.register("doSomething").  This defers other work (the task's
>> configuration, but don't worry about that for now) until one runs the
>> doSomething task or runs another task that depends on it.
>>
>> So why were we seeing this "Error: Backend initialization required"
>> message all the time?  The reason is that tasks were configured as
>> task("doSomething").  All I had to do was change this to
>> tasks.register("doSomething") and it removed the message.
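>>
>> A minimal sketch of the difference, assuming a Groovy build.gradle; the
>> task names and println messages are hypothetical and are not taken from
>> the pull request:
>>
>> ```groovy
>> // Eager: the task is created and this block runs while Gradle configures
>> // the project, on every invocation, even if doSomething is never requested.
>> task("doSomething") {
>>     println "configuring doSomething"
>>     doLast {
>>         println "running doSomething"
>>     }
>> }
>>
>> // Lazy: the block runs only when the build actually needs doSomethingElse.
>> tasks.register("doSomethingElse") {
>>     println "configuring doSomethingElse"
>>     doLast {
>>         println "running doSomethingElse"
>>     }
>> }
>> ```
>>
>> With this sketch, and nothing else in the build forcing task creation,
>> "./gradlew help" should print only "configuring doSomething", because the
>> registered task is never needed.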
>>
>> *Definitions/References*
>>
>> 1. https://github.com/apache/beam/pull/24509
>> 2.
>> https://docs.gradle.org/current/userguide/task_configuration_avoidance.html
>> 3. https://docs.gradle.org/current/userguide/what_is_gradle.html
>> 4. https://play.beam.apache.org/
>>
>> *Suggested Learning Path To Understand This Email*
>> 1.
>> https://docs.gradle.org/current/samples/sample_building_java_libraries.html
>> 2. https://docs.gradle.org/current/userguide/build_lifecycle.html
>> 3. https://docs.gradle.org/current/userguide/tutorial_using_tasks.html
>> 4.
>> https://docs.gradle.org/current/userguide/task_configuration_avoidance.html
>>
>> Best,
>>
>> Damon
>>
>>


Re: Achievement unlocked: fully triaged

2022-12-05 Thread Kenneth Knowles
I definitely think reducing the label zoo could help. We have a lot of
labels that are decompositions of what used to be Jira components.

Kenn

On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev wrote:

> > Previously, we had automation that would automatically mark
> > self-assigned self-reported issues as triaged. That is probably a third of
> > issues or more.
>
> I believe that automation exists now[1], but it wasn't retroactively
> applied to old issues.
>
> > One issue is that a lot of triage work is getting the labels right (a
> > lot of things end up in beam-model or beam-community)
>
> Do you think it would help to cut down on our label options?
> beam-community might be popular because it's the default option, so
> reducing options might not help that much unfortunately.
>
> [1] example - https://github.com/apache/beam/issues/24521
>
> On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles  wrote:
>
>> Previously, we had automation that would automatically mark self-assigned
>> self-reported issues as triaged. That is probably a third of issues or
>> more. I'm not sure what else. I appreciate Valentyn keeping an eye on the
>> Python label. One issue is that a lot of triage work is getting the labels
>> right (a lot of things end up in beam-model or beam-community)
>>
>> Kenn
>>
>> On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev <
>> dev@beam.apache.org> wrote:
>>
>>> This is a glorious achievement Kenn! To keep things clean going forward
>>> are there any improvements we can make in our issue creation flow?
>>>
>>> On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles  wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've finally done it! I've emptied the label "awaiting triage". Help me
>>>> keep it that way! This ensures that we actually at least *look* at each
>>>> issue once, preferably soon after it is filed. The idea is that you make
>>>> sure the priority and other labels are right, since users are not expected
>>>> to know how we use labels.
>>>>
>>>>
>>>> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3A%22awaiting+triage%22
>>>>
>>>> Kenn
>>>>
>>>


Re: Achievement unlocked: fully triaged

2022-12-05 Thread Danny McCormick via dev
> Previously, we had automation that would automatically mark self-assigned
> self-reported issues as triaged. That is probably a third of issues or
> more.

I believe that automation exists now[1], but it wasn't retroactively
applied to old issues.

> One issue is that a lot of triage work is getting the labels right (a lot
> of things end up in beam-model or beam-community)

Do you think it would help to cut down on our label options? beam-community
might be popular because it's the default option, so reducing options might
not help that much unfortunately.

[1] example - https://github.com/apache/beam/issues/24521

On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles  wrote:

> Previously, we had automation that would automatically mark self-assigned
> self-reported issues as triaged. That is probably a third of issues or
> more. I'm not sure what else. I appreciate Valentyn keeping an eye on the
> Python label. One issue is that a lot of triage work is getting the labels
> right (a lot of things end up in beam-model or beam-community)
>
> Kenn
>
> On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev <
> dev@beam.apache.org> wrote:
>
>> This is a glorious achievement Kenn! To keep things clean going forward
>> are there any improvements we can make in our issue creation flow?
>>
>> On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles  wrote:
>>
>>> Hi all,
>>>
>>> I've finally done it! I've emptied the label "awaiting triage". Help me
>>> keep it that way! This ensures that we actually at least *look* at each
>>> issue once, preferably soon after it is filed. The idea is that you make
>>> sure the priority and other labels are right, since users are not expected
>>> to know how we use labels.
>>>
>>>
>>> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3A%22awaiting+triage%22
>>>
>>> Kenn
>>>
>>


Re: Achievement unlocked: fully triaged

2022-12-05 Thread Robert Burke
I regularly look at the Go label myself, partly because that's the best
way to learn what to fix next.

On Mon, Dec 5, 2022, 11:57 AM Kenneth Knowles  wrote:

> Previously, we had automation that would automatically mark self-assigned
> self-reported issues as triaged. That is probably a third of issues or
> more. I'm not sure what else. I appreciate Valentyn keeping an eye on the
> Python label. One issue is that a lot of triage work is getting the labels
> right (a lot of things end up in beam-model or beam-community)
>
> Kenn
>
> On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev <
> dev@beam.apache.org> wrote:
>
>> This is a glorious achievement Kenn! To keep things clean going forward
>> are there any improvements we can make in our issue creation flow?
>>
>> On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles  wrote:
>>
>>> Hi all,
>>>
>>> I've finally done it! I've emptied the label "awaiting triage". Help me
>>> keep it that way! This ensures that we actually at least *look* at each
>>> issue once, preferably soon after it is filed. The idea is that you make
>>> sure the priority and other labels are right, since users are not expected
>>> to know how we use labels.
>>>
>>>
>>> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3A%22awaiting+triage%22
>>>
>>> Kenn
>>>
>>


Re: Achievement unlocked: fully triaged

2022-12-05 Thread Kenneth Knowles
Previously, we had automation that would automatically mark self-assigned
self-reported issues as triaged. That is probably a third of issues or
more. I'm not sure what else. I appreciate Valentyn keeping an eye on the
Python label. One issue is that a lot of triage work is getting the labels
right (a lot of things end up in beam-model or beam-community)

Kenn

On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev <
dev@beam.apache.org> wrote:

> This is a glorious achievement Kenn! To keep things clean going forward
> are there any improvements we can make in our issue creation flow?
>
> On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> I've finally done it! I've emptied the label "awaiting triage". Help me
>> keep it that way! This ensures that we actually at least *look* at each
>> issue once, preferably soon after it is filed. The idea is that you make
>> sure the priority and other labels are right, since users are not expected
>> to know how we use labels.
>>
>>
>> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3A%22awaiting+triage%22
>>
>> Kenn
>>
>


Re: Gradle Task Configuration Avoidance

2022-12-05 Thread Kerry Donny-Clark via dev
Thanks Damon! I really appreciate how clear your emails are here. Instead
of my usual feeling of "I don't quite understand, and don't have time to
get context" I can read all the context in the mail.
This error message had confused me, so I really appreciate the cleanup and
explanation.

On Fri, Dec 2, 2022, 7:28 PM Damon Douglas via dev wrote:

> Hello Everyone,
>
> *If you are new to Beam and coming from non-Java language conventions, it
> is likely you are new to gradle.  At the end of this email is a list of
> definitions and references to help understand this email.*
>
> *Short Version (For those who know gradle)*:
> A pull request [1] may fix the continual error message "Error: Backend
> initialization required, please run "terraform init"".  The PR applies Task
> Configuration Avoidance [2] by changing a few tasks from task(String) to
> tasks.register(String).
>
> *Long Version (For those who are not as familiar with gradle)*:
>
> I write this not as an expert but as someone still learning.  Gradle [3]
> is the software we use in the Beam repository to automate many needed tasks
> associated with building and testing code.  It is typically used in Java
> projects but can be extended for other purposes.  We store code related to
> our Beam Playground [4] that also uses gradle though it is not mainly a
> Java project.  The unit of work for Gradle is what is called a task.  To
> run a task you open a terminal and type "./gradlew nameOfMyTask".  There
> are two main ways to create a custom task in our build.gradle files.  One
> is writing task("doSomething") and the other is
> tasks.register("doSomethingElse").  According to [2], the recommendation is
> to use tasks.register("doSomething").  This defers other work (the task's
> configuration, but don't worry about that for now) until one runs the
> doSomething task or runs another task that depends on it.
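>
> The "depends on it" case can be sketched as follows, assuming a Groovy
> build.gradle; the task names prepare and deploy are hypothetical:
>
> ```groovy
> // Registered lazily: nothing in this block runs until prepare is needed.
> def prepare = tasks.register("prepare") {
>     println "configuring prepare"
>     doLast {
>         println "running prepare"
>     }
> }
>
> // deploy depends on prepare, so requesting deploy also configures and
> // runs prepare first.
> tasks.register("deploy") {
>     dependsOn prepare
>     doLast {
>         println "running deploy"
>     }
> }
> ```
>
> Here "./gradlew deploy" should configure and run both tasks, while an
> unrelated invocation such as "./gradlew help" should leave both
> unconfigured.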
>
> So why were we seeing this "Error: Backend initialization required"
> message all the time?  The reason is that tasks were configured as
> task("doSomething").  All I had to do was change this to
> tasks.register("doSomething") and it removed the message.
>
> *Definitions/References*
>
> 1. https://github.com/apache/beam/pull/24509
> 2.
> https://docs.gradle.org/current/userguide/task_configuration_avoidance.html
> 3. https://docs.gradle.org/current/userguide/what_is_gradle.html
> 4. https://play.beam.apache.org/
>
> *Suggested Learning Path To Understand This Email*
> 1.
> https://docs.gradle.org/current/samples/sample_building_java_libraries.html
> 2. https://docs.gradle.org/current/userguide/build_lifecycle.html
> 3. https://docs.gradle.org/current/userguide/tutorial_using_tasks.html
> 4.
> https://docs.gradle.org/current/userguide/task_configuration_avoidance.html
>
> Best,
>
> Damon
>
>


Re: Achievement unlocked: fully triaged

2022-12-05 Thread Kerry Donny-Clark via dev
This is a glorious achievement Kenn! To keep things clean going forward are
there any improvements we can make in our issue creation flow?

On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles  wrote:

> Hi all,
>
> I've finally done it! I've emptied the label "awaiting triage". Help me
> keep it that way! This ensures that we actually at least *look* at each
> issue once, preferably soon after it is filed. The idea is that you make
> sure the priority and other labels are right, since users are not expected
> to know how we use labels.
>
>
> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3A%22awaiting+triage%22
>
> Kenn
>


Beam High Priority Issue Report (57)

2022-12-05 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/24383 [Bug]: Daemon will be stopped at 
the end of the build after the daemon was no longer found in the daemon registry
https://github.com/apache/beam/issues/24367 [Bug]: workflow.tar.gz cannot be 
passed to flink runner
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to 
lock gradle
https://github.com/apache/beam/issues/24263 [Bug]: Remote call on 
apache-beam-jenkins-3 failed. The channel is closing down or has closed down
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of 
`DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently 
skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21561 
ExternalPythonTransformTest.trivialPythonTransform flaky
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21113 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20975 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM 
on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19734 
WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp 
order (sickbayed)
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests 
should export the pipeline Job ID and console output to Jenkins Test Result 
section


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/24100 [Bug]: `Filter.whereFieldName` 
appears in docs but not available
https://github.com/apache/beam/issues/23906 [Bug]: Dataflow jpms tests fail on 
the 2.43.0 release branch
https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true 
for unequal rows
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/23489 [Bug]: add DebeziumIO to the 
connectors page
https://github.com/apache/beam/issues/23306 [Bug]: BigQueryBatchFileLoads in 
python loses data when using WRITE_TRUNCATE
https://github.com/apache/beam/issues/23286 [Bug]: 
beam_PerformanceTests_InfluxDbIO_IT Flaky > 50 % Fa