Re: Achievement unlocked: fully triaged

2022-12-06 Thread Robert Burke
+1 to letting conjunctions form naturally.

In the bikeshedding discussion:

That would mean I'm biased toward giving the reduced label set a reduced set
of colours, one per general category.

E.g. an SDK colour, a runner colour, a Beam resources colour, with IO being
its own special unique colour and awaiting triage being unique as well.

This would make triage checking a bit more glanceable, except for very
particular issues that might warrant "several SDKs" or "several runners".

On Tue, Dec 6, 2022, 11:13 AM Danny McCormick via dev 
wrote:

> I like that idea (and the list) as well.
>
> On Tue, Dec 6, 2022 at 1:59 PM Kerry Donny-Clark 
> wrote:
>
>> I really like the idea of multi-select and automatic "awaiting triage".
>> Kenn, I think the list you have looks good to me.
>>
>> On Tue, Dec 6, 2022 at 1:55 PM Kenneth Knowles  wrote:
>>
>>> Noting that what you've listed are the options in the issue template,
>>> which are then expanded to multiple labels. So focusing on the issue
>>> template, I like the general idea, but maybe we can simplify it even more:
>>>
>>> When a user is filing a bug, I think a good outcome is for it to get
>>> into the right person's saved search (like Go, Python, etc) while still
>>> having the "awaiting triage" label on it.
>>>
>>> What if we just went all the way simple and had checkboxes for just the
>>> highest level. Something like the following:
>>>
>>> Which language SDK or feature is related to your report? (check all that
>>> apply)
>>> [ ] Python
>>> [ ] Java
>>> [ ] Go
>>> [ ] Typescript
>>> [ ] IO connector
>>> [ ] Beam examples
>>> [ ] Beam playground
>>> [ ] Beam katas
>>> [ ] Website
>>> [ ] Spark Runner
>>> [ ] Flink Runner
>>> [ ] Samza Runner
>>> [ ] Twister2 Runner
>>> [ ] Hazelcast Jet Runner
>>> [ ] Google Cloud Dataflow Runner
>>>
>>> We could even trim it even further to just language, and let the person
>>> doing triage handle the rest.
>>>
>>> Kenn
>>>
>>> On Tue, Dec 6, 2022 at 9:11 AM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
 > Is it possible to not have a default option?

 Sadly, no AFAIK. I agree this would help. We could try things like
 making the default " " and auto-closing issues that don't pick something
 other than the default, that's a pretty rough experience though and not
 worth it IMO.

 > I definitely think reducing the label zoo could help.

 What's our desired end state here? I put together a doc with my
 suggested labels -
 https://docs.google.com/document/d/1FpaFr_Sdg217ogd5oMDRX4uLIMSatKLF_if9CzLg9tM/edit?usp=sharing
  -
 listed below as well for convenience. Please comment in the doc if you have
 thoughts/labels you care about, or continue the email thread if you have
 bigger ideas (e.g. getting rid of labels, changing our templates entirely
 instead, etc...).

 *Danny's Proposed Labels:*


    - beam-community
    - beam-playground
    - community-metrics
    - cross-language
    - examples-java
    - examples-python
    - extensions
    - infrastructure
    - io-go
    - io-ideas
    - io-java
    - io-py
    - katas
    - release
    - run-inference
    - runner
    - runner-dataflow
    - runner-direct
    - runner-flink
    - runner-samza
    - runner-spark
    - runner-universal
    - sdk-go
    - sdk-ideas
    - sdk-java
    - sdk-py
    - sdk-typescript
    - test-failures
    - website


 On Tue, Dec 6, 2022 at 11:17 AM Bjorn Pedersen <
 bjornpeder...@google.com> wrote:

> As someone still newer to Beam, I can attest that the number of labels
> can be overwhelming.
>
> Is it possible to not have a default option? Even just getting people
> to interact with the dropdown might go a long way, especially if the labels
> were fewer and clearer.
>
> Bjorn
>
> On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles 
> wrote:
>
>> I definitely think reducing the label zoo could help. We have a lot
>> of labels that are decompositions of what used to be Jira components.
>>
>> Kenn
>>
>> On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> > Previously, we had automation that would automatically mark
>>> self-assigned self-reported issues as triaged. That is probably a third of
>>> issues or more.
>>>
>>> I believe that automation exists now[1], but it wasn't retroactively
>>> 

Re: Achievement unlocked: fully triaged

2022-12-06 Thread Danny McCormick via dev
I like that idea (and the list) as well.

On Tue, Dec 6, 2022 at 1:59 PM Kerry Donny-Clark  wrote:

> I really like the idea of multi-select and automatic "awaiting triage".
> Kenn, I think the list you have looks good to me.
>
> On Tue, Dec 6, 2022 at 1:55 PM Kenneth Knowles  wrote:
>
>> Noting that what you've listed are the options in the issue template,
>> which are then expanded to multiple labels. So focusing on the issue
>> template, I like the general idea, but maybe we can simplify it even more:
>>
>> When a user is filing a bug, I think a good outcome is for it to get into
>> the right person's saved search (like Go, Python, etc) while still having
>> the "awaiting triage" label on it.
>>
>> What if we just went all the way simple and had checkboxes for just the
>> highest level. Something like the following:
>>
>> Which language SDK or feature is related to your report? (check all that
>> apply)
>> [ ] Python
>> [ ] Java
>> [ ] Go
>> [ ] Typescript
>> [ ] IO connector
>> [ ] Beam examples
>> [ ] Beam playground
>> [ ] Beam katas
>> [ ] Website
>> [ ] Spark Runner
>> [ ] Flink Runner
>> [ ] Samza Runner
>> [ ] Twister2 Runner
>> [ ] Hazelcast Jet Runner
>> [ ] Google Cloud Dataflow Runner
>>
>> We could even trim it even further to just language, and let the person
>> doing triage handle the rest.
>>
>> Kenn
>>
>> On Tue, Dec 6, 2022 at 9:11 AM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> > Is it possible to not have a default option?
>>>
>>> Sadly, no AFAIK. I agree this would help. We could try things like
>>> making the default " " and auto-closing issues that don't pick something
>>> other than the default, that's a pretty rough experience though and not
>>> worth it IMO.
>>>
>>> > I definitely think reducing the label zoo could help.
>>>
>>> What's our desired end state here? I put together a doc with my
>>> suggested labels -
>>> https://docs.google.com/document/d/1FpaFr_Sdg217ogd5oMDRX4uLIMSatKLF_if9CzLg9tM/edit?usp=sharing
>>>  -
>>> listed below as well for convenience. Please comment in the doc if you have
>>> thoughts/labels you care about, or continue the email thread if you have
>>> bigger ideas (e.g. getting rid of labels, changing our templates entirely
>>> instead, etc...).
>>>
>>> *Danny's Proposed Labels:*
>>>
>>>
>>>    - beam-community
>>>    - beam-playground
>>>    - community-metrics
>>>    - cross-language
>>>    - examples-java
>>>    - examples-python
>>>    - extensions
>>>    - infrastructure
>>>    - io-go
>>>    - io-ideas
>>>    - io-java
>>>    - io-py
>>>    - katas
>>>    - release
>>>    - run-inference
>>>    - runner
>>>    - runner-dataflow
>>>    - runner-direct
>>>    - runner-flink
>>>    - runner-samza
>>>    - runner-spark
>>>    - runner-universal
>>>    - sdk-go
>>>    - sdk-ideas
>>>    - sdk-java
>>>    - sdk-py
>>>    - sdk-typescript
>>>    - test-failures
>>>    - website
>>>
>>>
>>> On Tue, Dec 6, 2022 at 11:17 AM Bjorn Pedersen 
>>> wrote:
>>>
 As someone still newer to Beam, I can attest that the number of labels
 can be overwhelming.

 Is it possible to not have a default option? Even just getting people
 to interact with the dropdown might go a long way, especially if the labels
 were fewer and clearer.

 Bjorn

 On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles  wrote:

> I definitely think reducing the label zoo could help. We have a lot of
> labels that are decompositions of what used to be Jira components.
>
> Kenn
>
> On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> > Previously, we had automation that would automatically mark
>> self-assigned self-reported issues as triaged. That is probably a third of
>> issues or more.
>>
>> I believe that automation exists now[1], but it wasn't retroactively
>> applied to old issues.
>>
>> > One issue is that a lot of triage work is getting the labels right
>> (a lot of things end up in beam-model or beam-community)
>>
>> Do you think it would help to cut down on our label options?
>> beam-community might be popular because it's the default option, so
>> reducing options might not help that much unfortunately.
>>
>> [1] example - https://github.com/apache/beam/issues/24521
>>
>> On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles 
>> wrote:
>>
>>> Previously, we had automation that would automatically mark
>>> self-assigned self-reported issues as triaged. That is probably a third of
>>> issues or more. I'm not sure what else. I appreciate Valentyn

Re: Achievement unlocked: fully triaged

2022-12-06 Thread Kerry Donny-Clark via dev
I really like the idea of multi-select and automatic "awaiting triage".
Kenn, I think the list you have looks good to me.

On Tue, Dec 6, 2022 at 1:55 PM Kenneth Knowles  wrote:

> Noting that what you've listed are the options in the issue template,
> which are then expanded to multiple labels. So focusing on the issue
> template, I like the general idea, but maybe we can simplify it even more:
>
> When a user is filing a bug, I think a good outcome is for it to get into
> the right person's saved search (like Go, Python, etc) while still having
> the "awaiting triage" label on it.
>
> What if we just went all the way simple and had checkboxes for just the
> highest level. Something like the following:
>
> Which language SDK or feature is related to your report? (check all that
> apply)
> [ ] Python
> [ ] Java
> [ ] Go
> [ ] Typescript
> [ ] IO connector
> [ ] Beam examples
> [ ] Beam playground
> [ ] Beam katas
> [ ] Website
> [ ] Spark Runner
> [ ] Flink Runner
> [ ] Samza Runner
> [ ] Twister2 Runner
> [ ] Hazelcast Jet Runner
> [ ] Google Cloud Dataflow Runner
>
> We could even trim it even further to just language, and let the person
> doing triage handle the rest.
>
> Kenn
>
> On Tue, Dec 6, 2022 at 9:11 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> > Is it possible to not have a default option?
>>
>> Sadly, no AFAIK. I agree this would help. We could try things like making
>> the default " " and auto-closing issues that don't pick something other
>> than the default, that's a pretty rough experience though and not worth it
>> IMO.
>>
>> > I definitely think reducing the label zoo could help.
>>
>> What's our desired end state here? I put together a doc with my suggested
>> labels -
>> https://docs.google.com/document/d/1FpaFr_Sdg217ogd5oMDRX4uLIMSatKLF_if9CzLg9tM/edit?usp=sharing
>>  -
>> listed below as well for convenience. Please comment in the doc if you have
>> thoughts/labels you care about, or continue the email thread if you have
>> bigger ideas (e.g. getting rid of labels, changing our templates entirely
>> instead, etc...).
>>
>> *Danny's Proposed Labels:*
>>
>>
>>    - beam-community
>>    - beam-playground
>>    - community-metrics
>>    - cross-language
>>    - examples-java
>>    - examples-python
>>    - extensions
>>    - infrastructure
>>    - io-go
>>    - io-ideas
>>    - io-java
>>    - io-py
>>    - katas
>>    - release
>>    - run-inference
>>    - runner
>>    - runner-dataflow
>>    - runner-direct
>>    - runner-flink
>>    - runner-samza
>>    - runner-spark
>>    - runner-universal
>>    - sdk-go
>>    - sdk-ideas
>>    - sdk-java
>>    - sdk-py
>>    - sdk-typescript
>>    - test-failures
>>    - website
>>
>>
>> On Tue, Dec 6, 2022 at 11:17 AM Bjorn Pedersen 
>> wrote:
>>
>>> As someone still newer to Beam, I can attest that the number of labels
>>> can be overwhelming.
>>>
>>> Is it possible to not have a default option? Even just getting people to
>>> interact with the dropdown might go a long way, especially if the labels
>>> were fewer and clearer.
>>>
>>> Bjorn
>>>
>>> On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles  wrote:
>>>
 I definitely think reducing the label zoo could help. We have a lot of
 labels that are decompositions of what used to be Jira components.

 Kenn

 On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev <
 dev@beam.apache.org> wrote:

> > Previously, we had automation that would automatically mark
> self-assigned self-reported issues as triaged. That is probably a third of
> issues or more.
>
> I believe that automation exists now[1], but it wasn't retroactively
> applied to old issues.
>
> > One issue is that a lot of triage work is getting the labels right
> (a lot of things end up in beam-model or beam-community)
>
> Do you think it would help to cut down on our label options?
> beam-community might be popular because it's the default option, so
> reducing options might not help that much unfortunately.
>
> [1] example - https://github.com/apache/beam/issues/24521
>
> On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles 
> wrote:
>
>> Previously, we had automation that would automatically mark
>> self-assigned self-reported issues as triaged. That is probably a third of
>> issues or more. I'm not sure what else. I appreciate Valentyn keeping an
>> eye on the Python label. One issue is that a lot of triage work is getting
>> the labels right (a lot of things end up in beam-model or beam-community)
>>
>> Kenn
>>
>> On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev <
>> dev@beam.apache.org> wrote:

Re: Achievement unlocked: fully triaged

2022-12-06 Thread Kenneth Knowles
Noting that what you've listed are the options in the issue template, which
are then expanded to multiple labels. So focusing on the issue template, I
like the general idea, but maybe we can simplify it even more:

When a user is filing a bug, I think a good outcome is for it to get into
the right person's saved search (like Go, Python, etc) while still having
the "awaiting triage" label on it.

What if we just went all the way simple and had checkboxes for just the
highest level? Something like the following:

Which language SDK or feature is related to your report? (check all that
apply)
[ ] Python
[ ] Java
[ ] Go
[ ] Typescript
[ ] IO connector
[ ] Beam examples
[ ] Beam playground
[ ] Beam katas
[ ] Website
[ ] Spark Runner
[ ] Flink Runner
[ ] Samza Runner
[ ] Twister2 Runner
[ ] Hazelcast Jet Runner
[ ] Google Cloud Dataflow Runner

We could trim it even further to just language, and let the person
doing triage handle the rest.
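
As a rough illustration of the outcome described above (an issue lands in the right saved search while still carrying "awaiting triage"), here is a minimal sketch in Java. The class name, method name, and the option-to-label mapping are all hypothetical; the label names are borrowed from the proposed label list in this thread, and options without an obvious label there are simply left for the person doing triage.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TemplateLabelSketch {
  public static final String AWAITING_TRIAGE = "awaiting triage";

  // Hypothetical mapping from checkbox text to a label; not the real template config.
  private static final Map<String, String> OPTION_TO_LABEL = new LinkedHashMap<>();

  static {
    OPTION_TO_LABEL.put("Python", "sdk-py");
    OPTION_TO_LABEL.put("Java", "sdk-java");
    OPTION_TO_LABEL.put("Go", "sdk-go");
    OPTION_TO_LABEL.put("Typescript", "sdk-typescript");
    OPTION_TO_LABEL.put("Website", "website");
    OPTION_TO_LABEL.put("Spark Runner", "runner-spark");
    OPTION_TO_LABEL.put("Flink Runner", "runner-flink");
    OPTION_TO_LABEL.put("Samza Runner", "runner-samza");
    OPTION_TO_LABEL.put("Google Cloud Dataflow Runner", "runner-dataflow");
  }

  /** Every new issue gets "awaiting triage" plus one label per recognised checkbox. */
  public static Set<String> labelsFor(List<String> checkedOptions) {
    Set<String> labels = new LinkedHashSet<>();
    labels.add(AWAITING_TRIAGE);
    for (String option : checkedOptions) {
      String label = OPTION_TO_LABEL.get(option);
      if (label != null) {
        labels.add(label);
      }
      // Unmapped options are ignored here and left to whoever triages the issue.
    }
    return labels;
  }

  public static void main(String[] args) {
    System.out.println(labelsFor(List.of("Python", "Flink Runner")));
  }
}
```

Because "awaiting triage" is added unconditionally, an issue with no boxes checked still shows up in the triage queue.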

Kenn

On Tue, Dec 6, 2022 at 9:11 AM Danny McCormick via dev 
wrote:

> > Is it possible to not have a default option?
>
> Sadly, no AFAIK. I agree this would help. We could try things like making
> the default " " and auto-closing issues that don't pick something other
> than the default, that's a pretty rough experience though and not worth it
> IMO.
>
> > I definitely think reducing the label zoo could help.
>
> What's our desired end state here? I put together a doc with my suggested
> labels -
> https://docs.google.com/document/d/1FpaFr_Sdg217ogd5oMDRX4uLIMSatKLF_if9CzLg9tM/edit?usp=sharing
>  -
> listed below as well for convenience. Please comment in the doc if you have
> thoughts/labels you care about, or continue the email thread if you have
> bigger ideas (e.g. getting rid of labels, changing our templates entirely
> instead, etc...).
>
> *Danny's Proposed Labels:*
>
>
>    - beam-community
>    - beam-playground
>    - community-metrics
>    - cross-language
>    - examples-java
>    - examples-python
>    - extensions
>    - infrastructure
>    - io-go
>    - io-ideas
>    - io-java
>    - io-py
>    - katas
>    - release
>    - run-inference
>    - runner
>    - runner-dataflow
>    - runner-direct
>    - runner-flink
>    - runner-samza
>    - runner-spark
>    - runner-universal
>    - sdk-go
>    - sdk-ideas
>    - sdk-java
>    - sdk-py
>    - sdk-typescript
>    - test-failures
>    - website
>
>
> On Tue, Dec 6, 2022 at 11:17 AM Bjorn Pedersen 
> wrote:
>
>> As someone still newer to Beam, I can attest that the number of labels
>> can be overwhelming.
>>
>> Is it possible to not have a default option? Even just getting people to
>> interact with the dropdown might go a long way, especially if the labels
>> were fewer and clearer.
>>
>> Bjorn
>>
>> On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles  wrote:
>>
>>> I definitely think reducing the label zoo could help. We have a lot of
>>> labels that are decompositions of what used to be Jira components.
>>>
>>> Kenn
>>>
>>> On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
 > Previously, we had automation that would automatically mark
 self-assigned self-reported issues as triaged. That is probably a third of
 issues or more.

 I believe that automation exists now[1], but it wasn't retroactively
 applied to old issues.

 > One issue is that a lot of triage work is getting the labels right (a
 lot of things end up in beam-model or beam-community)

 Do you think it would help to cut down on our label options?
 beam-community might be popular because it's the default option, so
 reducing options might not help that much unfortunately.

 [1] example - https://github.com/apache/beam/issues/24521

 On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles  wrote:

> Previously, we had automation that would automatically mark
> self-assigned self-reported issues as triaged. That is probably a third of
> issues or more. I'm not sure what else. I appreciate Valentyn keeping an
> eye on the Python label. One issue is that a lot of triage work is getting
> the labels right (a lot of things end up in beam-model or beam-community)
>
> Kenn
>
> On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev <
> dev@beam.apache.org> wrote:
>
>> This is a glorious achievement Kenn! To keep things clean going
>> forward are there any improvements we can make in our issue creation flow?
>>
>> On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles  wrote:
>>
>>> Hi all,
>>>
>>> I've finally done it! I've emptied the label "awaiting triage". Help
>>> me keep it that way! This ensures that we actually at least *look* at each
>>> issue once, 

Re: Configuration Driven File Writes

2022-12-06 Thread Damon Douglas via dev
*(For those on the Beam learning path: I supplemented the original email
with additional references/definitions that could help in understanding
this reply.)*

Hello Robert,

Thank you for reading the document and taking your time to review.  The
numbered answers below correspond to the numbered questions.

1) A machine is the intended configuration producer and SchemaTransform
output consumer.  However, I started with the perspective of a human user
as an automated process would serve those needs.  User stories are best
written in collaboration with intended users.  Yet, not having this I had
to draw on my experience working with various Beam and non-Beam customers.
I derived the following user story which is also in the document.  The key
phrase is underlined.

*As a Beam IO Developer, …*


*I want a schema aware configuration to produce a file writing PTransform,
so that I can unify and normalize a single point of entry to write Row
elements to a file or object system.*


*I want the provider to decide the intended format and file type based on
configured inputs so I don’t need to write code to support this.*


To prevent messy if/then statements and an enumeration mapping each format
to the resulting PTransform<PCollection<Row>, PDone>, I implemented
FileWriteSchemaTransformFormatProvider [1] as an extension of
Providers.Identifyable [2], where implementations are annotated with
@AutoService [3].
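
The lookup-by-identifier idea can be sketched in plain Java without any Beam dependency. This is only an illustration under stated assumptions: FormatProvider, CsvProvider, XmlProvider, byIdentifier and writerFor are hypothetical names, the "writer" is a toy function that renders one row as text rather than a PTransform, and the provider list is passed in explicitly where the real code would rely on discovery of @AutoService-registered implementations.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class FormatLookupSketch {

  /** Hypothetical stand-in for an identifiable provider: each one names its format. */
  public interface FormatProvider {
    String identifier();

    /** Toy stand-in for building a write transform: renders one row as text. */
    Function<List<String>, String> buildWriter();
  }

  public static final class CsvProvider implements FormatProvider {
    @Override
    public String identifier() {
      return "csv";
    }

    @Override
    public Function<List<String>, String> buildWriter() {
      // No quoting or escaping; enough to show the dispatch only.
      return row -> String.join(",", row);
    }
  }

  public static final class XmlProvider implements FormatProvider {
    @Override
    public String identifier() {
      return "xml";
    }

    @Override
    public Function<List<String>, String> buildWriter() {
      return row -> {
        StringBuilder sb = new StringBuilder("<row>");
        for (String value : row) {
          sb.append("<v>").append(value).append("</v>");
        }
        return sb.append("</row>").toString();
      };
    }
  }

  /** Indexes providers by identifier, rejecting duplicates so each format maps 1 to 1. */
  public static Map<String, FormatProvider> byIdentifier(Iterable<FormatProvider> providers) {
    Map<String, FormatProvider> result = new HashMap<>();
    for (FormatProvider provider : providers) {
      if (result.putIfAbsent(provider.identifier(), provider) != null) {
        throw new IllegalStateException("duplicate format: " + provider.identifier());
      }
    }
    return result;
  }

  /** Resolves the configured format string with a map lookup instead of an if/then chain. */
  public static Function<List<String>, String> writerFor(
      Map<String, FormatProvider> providers, String format) {
    FormatProvider provider = providers.get(format);
    if (provider == null) {
      throw new IllegalArgumentException("unsupported format: " + format);
    }
    return provider.buildWriter();
  }

  public static void main(String[] args) {
    // Assumption: the list is supplied by hand here; discovery is out of scope.
    Map<String, FormatProvider> providers =
        byIdentifier(List.of(new CsvProvider(), new XmlProvider()));
    System.out.println(writerFor(providers, "csv").apply(List.of("a", "b")));
    System.out.println(writerFor(providers, "xml").apply(List.of("a", "b")));
  }
}
```

Adding a new format in this sketch means adding one provider class; the lookup code itself does not change.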

2) The Read side could adopt the same.  We decided to put the code in its
own module and will coordinate efforts with the individual who volunteered
on the Read side implementation to maintain consistency.  Potentially, the
same format String parameter could map to the appropriate
PTransform<PBegin, PCollection<Row>> using the same mechanism.

*References / Definitions*

1. FileWriteSchemaTransformFormatProvider - an interface extension of
Providers.Identifyable [2].  The intended cardinality of a file format such
as Json, Avro, XML, etc and a FileWriteSchemaTransformFormatProvider is 1
to 1.
See
https://github.com/apache/beam/blob/master/sdks/java/io/fileschematransform/src/main/java/org/apache/beam/sdk/io/fileschematransform/FileWriteSchemaTransformFormatProvider.java
and
https://github.com/apache/beam/blob/master/sdks/java/io/fileschematransform/src/main/java/org/apache/beam/sdk/io/fileschematransform/FileWriteSchemaTransformFormatProviders.java

2. Providers.Identifyable - allows us to use a string value to map to a
class in Beam.
See
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/io/Providers.Identifyable.html

3. @AutoService - a Java class annotation that allows us to list all
classes that are annotated as implementations of a particular class or
interface.  The practical implication is that AutoService gives us a list.
In combination with Providers.Identifyable, it gives us a convenient lookup
Map.
See https://www.baeldung.com/google-autoservice for a tutorial on
AutoService.

4. PBegin - the "input" to a root PTransform used typically in transforms
that read from sources.
See
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PBegin.html

5. PDone - the "output" of a PTransform typically in transforms that write
to sinks.
See
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PDone.html

On Fri, Dec 2, 2022 at 1:41 PM Robert Bradshaw via dev 
wrote:

> Thanks for looking into this and the careful writeup. I've read the
> design doc and it looks great, but have a couple of questions.
>
> (1) Why did you decide on having a single top-level FileWrite
> transform whose config is ([common_parameters], [xml-params],
> [csv-params], ...) rather than separate schema transforms for each?
> (2) Is there a plan to do a similar thing for the Read side?
>
> On Fri, Dec 2, 2022 at 9:48 AM Damon Douglas 
> wrote:
> >
> > Hello Everyone,
> >
> > For those new to Beam, even if this is your first day, consider
> yourselves a welcome contributor to this conversation.  I remember what it
> was like first learning Beam on my own and I am passionate about everyone's
> learning experience.  Below are definitions/references and a suggested
> learning path to understand this email.
> >
> > Short Version (assumes Beam knowledge):  Could someone review
> https://github.com/apache/beam/pull/24479? Based on the design document
> [1], It's the first of a series of pull requests that enable FileIO.Write
> [2] support for Schema Transforms [3].
> >
> > Long Version (for those first learning Beam):
> >
> > Explaining this without using Beam specific language.
> >
> > Suppose my team needs to quickly write to a file or object storage
> system without writing the specific code to accomplish this final step.
> This pull request begins work in enabling such ability.  I can specify the
> format such as avro, json, xml, etc in the configuration file and a backend
> service will deal with the remaining details of how to achieve this at
> scale.
> >
> > If you are interested in how this works, please see the design 

Re: Achievement unlocked: fully triaged

2022-12-06 Thread Danny McCormick via dev
> Is it possible to not have a default option?

Sadly, no AFAIK. I agree this would help. We could try things like making
the default " " and auto-closing issues that don't pick something other
than the default, but that's a pretty rough experience and not worth it
IMO.

> I definitely think reducing the label zoo could help.

What's our desired end state here? I put together a doc with my suggested
labels -
https://docs.google.com/document/d/1FpaFr_Sdg217ogd5oMDRX4uLIMSatKLF_if9CzLg9tM/edit?usp=sharing
-
listed below as well for convenience. Please comment in the doc if you have
thoughts/labels you care about, or continue the email thread if you have
bigger ideas (e.g. getting rid of labels, changing our templates entirely
instead, etc...).

*Danny's Proposed Labels:*


   - beam-community
   - beam-playground
   - community-metrics
   - cross-language
   - examples-java
   - examples-python
   - extensions
   - infrastructure
   - io-go
   - io-ideas
   - io-java
   - io-py
   - katas
   - release
   - run-inference
   - runner
   - runner-dataflow
   - runner-direct
   - runner-flink
   - runner-samza
   - runner-spark
   - runner-universal
   - sdk-go
   - sdk-ideas
   - sdk-java
   - sdk-py
   - sdk-typescript
   - test-failures
   - website


On Tue, Dec 6, 2022 at 11:17 AM Bjorn Pedersen 
wrote:

> As someone still newer to Beam, I can attest that the number of labels can
> be overwhelming.
>
> Is it possible to not have a default option? Even just getting people to
> interact with the dropdown might go a long way, especially if the labels
> were fewer and clearer.
>
> Bjorn
>
> On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles  wrote:
>
>> I definitely think reducing the label zoo could help. We have a lot of
>> labels that are decompositions of what used to be Jira components.
>>
>> Kenn
>>
>> On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> > Previously, we had automation that would automatically mark
>>> self-assigned self-reported issues as triaged. That is probably a third of
>>> issues or more.
>>>
>>> I believe that automation exists now[1], but it wasn't retroactively
>>> applied to old issues.
>>>
>>> > One issue is that a lot of triage work is getting the labels right (a
>>> lot of things end up in beam-model or beam-community)
>>>
>>> Do you think it would help to cut down on our label options?
>>> beam-community might be popular because it's the default option, so
>>> reducing options might not help that much unfortunately.
>>>
>>> [1] example - https://github.com/apache/beam/issues/24521
>>>
>>> On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles  wrote:
>>>
 Previously, we had automation that would automatically mark
 self-assigned self-reported issues as triaged. That is probably a third of
 issues or more. I'm not sure what else. I appreciate Valentyn keeping an
 eye on the Python label. One issue is that a lot of triage work is getting
 the labels right (a lot of things end up in beam-model or beam-community)

 Kenn

 On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev <
 dev@beam.apache.org> wrote:

> This is a glorious achievement Kenn! To keep things clean going
> forward are there any improvements we can make in our issue creation flow?
>
> On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> I've finally done it! I've emptied the label "awaiting triage". Help
>> me keep it that way! This ensures that we actually at least *look* at each
>> issue once, preferably soon after it is filed. The idea is that you make
>> sure the priority and other labels are right, since users are not expected
>> to know how we use labels.
>>
>>
>> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3A%22awaiting+triage%22
>>
>> Kenn
>>
>


Re: Achievement unlocked: fully triaged

2022-12-06 Thread Bjorn Pedersen via dev
As someone still newer to Beam, I can attest that the number of labels can
be overwhelming.

Is it possible to not have a default option? Even just getting people to
interact with the dropdown might go a long way, especially if the labels
were fewer and clearer.

Bjorn

On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles  wrote:

> I definitely think reducing the label zoo could help. We have a lot of
> labels that are decompositions of what used to be Jira components.
>
> Kenn
>
> On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> > Previously, we had automation that would automatically mark
>> self-assigned self-reported issues as triaged. That is probably a third of
>> issues or more.
>>
>> I believe that automation exists now[1], but it wasn't retroactively
>> applied to old issues.
>>
>> > One issue is that a lot of triage work is getting the labels right (a
>> lot of things end up in beam-model or beam-community)
>>
>> Do you think it would help to cut down on our label options?
>> beam-community might be popular because it's the default option, so
>> reducing options might not help that much unfortunately.
>>
>> [1] example - https://github.com/apache/beam/issues/24521
>>
>> On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles  wrote:
>>
>>> Previously, we had automation that would automatically mark
>>> self-assigned self-reported issues as triaged. That is probably a third of
>>> issues or more. I'm not sure what else. I appreciate Valentyn keeping an
>>> eye on the Python label. One issue is that a lot of triage work is getting
>>> the labels right (a lot of things end up in beam-model or beam-community)
>>>
>>> Kenn
>>>
>>> On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev <
>>> dev@beam.apache.org> wrote:
>>>
 This is a glorious achievement Kenn! To keep things clean going forward
 are there any improvements we can make in our issue creation flow?

 On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles  wrote:

> Hi all,
>
> I've finally done it! I've emptied the label "awaiting triage". Help
> me keep it that way! This ensures that we actually at least *look* at each
> issue once, preferably soon after it is filed. The idea is that you make
> sure the priority and other labels are right, since users are not expected
> to know how we use labels.
>
>
> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3A%22awaiting+triage%22
>
> Kenn
>



Re: Beam High Priority Issue Report (60)

2022-12-06 Thread Kenneth Knowles
Now that we've managed to triage all our incoming issues, perhaps the next
easiest step is to get updates on the *assigned* P1s. After all, in theory
people are working on these and they are quite urgent. If you are assigned
a P1 and not working on it, go ahead and unassign it so we have an accurate
view of the state of bugs.

Beam High Priority Issue Report (60)

2022-12-06 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/24537 [Bug]: python flink runner is not 
compatible with Azure blob file system in Java
https://github.com/apache/beam/issues/24535 [Bug]: Bigquery Load jobs with 
WRITE_TRUNCATE disposition may truncate valid records.
https://github.com/apache/beam/issues/24415 [Bug]: Cannot find a matching 
Calcite SqlTypeName for Beam type: LOGICAL_TYPE seen in 2.44.0 SNAPSHOT
https://github.com/apache/beam/issues/24383 [Bug]: Daemon will be stopped at 
the end of the build after the daemon was no longer found in the daemon registry
https://github.com/apache/beam/issues/24367 [Bug]: workflow.tar.gz cannot be 
passed to flink runner
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to 
lock gradle
https://github.com/apache/beam/issues/24263 [Bug]: Remote call on 
apache-beam-jenkins-3 failed. The channel is closing down or has closed down
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of 
`DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently 
skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21561 
ExternalPythonTransformTest.trivialPythonTransform flaky
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
is flaky
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
flakey
https://github.com/apache/beam/issues/21113 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
is flaky
https://github.com/apache/beam/issues/20975 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM 
on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19734 
WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp 
order (sickbayed)
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests 
should export the pipeline Job ID and console output to Jenkins Test Result 
section


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/24100 [Bug]: `Filter.whereFieldName` 
appears in docs but not available
https://github.com/apache/beam/issues/23906 [Bug]: Dataflow jpms tests fail on 
the 2.43.0 release branch
https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true 
for unequal rows