Re: Achievement unlocked: fully triaged
+1 to letting conjunctions form naturally.

On the bikeshedding discussion: I'm biased toward giving the reduced label set a reduced set of colours, one per general category. E.g. an SDK colour, a Runner colour, a Beam resources colour, with IO getting its own unique colour and "awaiting triage" being unique as well. This would make triage checking a bit more glanceable, except for the very particular issues that might warrant "several SDKs" or "several runners".

On Tue, Dec 6, 2022, 11:13 AM Danny McCormick via dev wrote: > I like that idea (and the list) as well. > > On Tue, Dec 6, 2022 at 1:59 PM Kerry Donny-Clark > wrote: > >> I really like the idea of multi-select and automatic "awaiting triage". >> Kenn, I think the list you have looks good to me. >> >> On Tue, Dec 6, 2022 at 1:55 PM Kenneth Knowles wrote: >> >>> Noting that what you've listed are the options in the issue template, >>> which are then expanded to multiple labels. So focusing on the issue >>> template, I like the general idea, but maybe we can simplify it even more: >>> >>> When a user is filing a bug, I think a good outcome is for it to get >>> into the right person's saved search (like Go, Python, etc) while still >>> having the "awaiting triage" label on it. >>> >>> What if we just went all the way simple and had checkboxes for just the >>> highest level. Something like the following: >>> >>> Which language SDK or feature is related to your report? (check all that >>> apply) >>> [ ] Python >>> [ ] Java >>> [ ] Go >>> [ ] Typescript >>> [ ] IO connector >>> [ ] Beam examples >>> [ ] Beam playground >>> [ ] Beam katas >>> [ ] Website >>> [ ] Spark Runner >>> [ ] Flink Runner >>> [ ] Samza Runner >>> [ ] Twister2 Runner >>> [ ] Hazelcast Jet Runner >>> [ ] Google Cloud Dataflow Runner >>> >>> We could even trim it even further to just language, and let the person >>> doing triage handle the rest. 
>>> >>> Kenn >>> >>> On Tue, Dec 6, 2022 at 9:11 AM Danny McCormick via dev < >>> dev@beam.apache.org> wrote: >>> > Is it possible to not have a default option? Sadly, no AFAIK. I agree this would help. We could try things like making the default " " and auto-closing issues that don't pick something other than the default, that's a pretty rough experience though and not worth it IMO. > I definitely think reducing the label zoo could help. What's our desired end state here? I put together a doc with my suggested labels - https://docs.google.com/document/d/1FpaFr_Sdg217ogd5oMDRX4uLIMSatKLF_if9CzLg9tM/edit?usp=sharing - listed below as well for convenience. Please comment in the doc if you have thoughts/labels you care about, or continue the email thread if you have bigger ideas (e.g. getting rid of labels, changing our templates entirely instead, etc...). *Danny's Proposed Labels:* - beam-community - beam-playground - community-metrics - cross-language - examples-java - examples-python - extensions - infrastructure - io-go - io-ideas - io-java - io-py - katas - release - run-inference - runner - runner-dataflow - runner-direct - runner-flink - runner-samza - runner-spark - runner-universal - sdk-go - sdk-ideas - sdk-java - sdk-py - sdk-typescript - test-failures - website On Tue, Dec 6, 2022 at 11:17 AM Bjorn Pedersen < bjornpeder...@google.com> wrote: > As someone still newer to Beam, I can attest that the number of labels > can be overwhelming. > > Is it possible to not have a default option? Even just getting people > to interact with the dropdown might go a long way, especially if the > labels > were fewer and clearer. > > Bjorn > > On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles > wrote: > >> I definitely think reducing the label zoo could help. We have a lot >> of labels that are decompositions of what used to be Jira components. 
>> >> Kenn >> >> On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev < >> dev@beam.apache.org> wrote: >> >>> > Previously, we had automation that would automatically mark >>> self-assigned self-reported issues as triaged. That is probably a third >>> of >>> issues or more. >>> >>> I believe that automation exists now[1], but it wasn't retroactively >>>
Re: Achievement unlocked: fully triaged
I like that idea (and the list) as well. On Tue, Dec 6, 2022 at 1:59 PM Kerry Donny-Clark wrote: > I really like the idea of multi-select and automatic "awaiting triage". > Kenn, I think the list you have looks good to me. > > On Tue, Dec 6, 2022 at 1:55 PM Kenneth Knowles wrote: > >> Noting that what you've listed are the options in the issue template, >> which are then expanded to multiple labels. So focusing on the issue >> template, I like the general idea, but maybe we can simplify it even more: >> >> When a user is filing a bug, I think a good outcome is for it to get into >> the right person's saved search (like Go, Python, etc) while still having >> the "awaiting triage" label on it. >> >> What if we just went all the way simple and had checkboxes for just the >> highest level. Something like the following: >> >> Which language SDK or feature is related to your report? (check all that >> apply) >> [ ] Python >> [ ] Java >> [ ] Go >> [ ] Typescript >> [ ] IO connector >> [ ] Beam examples >> [ ] Beam playground >> [ ] Beam katas >> [ ] Website >> [ ] Spark Runner >> [ ] Flink Runner >> [ ] Samza Runner >> [ ] Twister2 Runner >> [ ] Hazelcast Jet Runner >> [ ] Google Cloud Dataflow Runner >> >> We could even trim it even further to just language, and let the person >> doing triage handle the rest. >> >> Kenn >> >> On Tue, Dec 6, 2022 at 9:11 AM Danny McCormick via dev < >> dev@beam.apache.org> wrote: >> >>> > Is it possible to not have a default option? >>> >>> Sadly, no AFAIK. I agree this would help. We could try things like >>> making the default " " and auto-closing issues that don't pick something >>> other than the default, that's a pretty rough experience though and not >>> worth it IMO. >>> >>> > I definitely think reducing the label zoo could help. >>> >>> What's our desired end state here? 
I put together a doc with my >>> suggested labels - >>> https://docs.google.com/document/d/1FpaFr_Sdg217ogd5oMDRX4uLIMSatKLF_if9CzLg9tM/edit?usp=sharing >>> - >>> listed below as well for convenience. Please comment in the doc if you have >>> thoughts/labels you care about, or continue the email thread if you have >>> bigger ideas (e.g. getting rid of labels, changing our templates entirely >>> instead, etc...). >>> >>> *Danny's Proposed Labels:* >>> >>> >>>- >>> >>>beam-community >>>- >>> >>>beam-playground >>>- >>> >>>community-metrics >>>- >>> >>>cross-language >>>- >>> >>>examples-java >>>- >>> >>>examples-python >>>- >>> >>>extensions >>>- >>> >>>infrastructure >>>- >>> >>>io-go >>>- >>> >>>io-ideas >>>- >>> >>>io-java >>>- >>> >>>io-py >>>- >>> >>>katas >>>- >>> >>>release >>>- >>> >>>run-inference >>>- >>> >>>runner >>>- >>> >>>runner-dataflow >>>- >>> >>>runner-direct >>>- >>> >>>runner-flink >>>- >>> >>>runner-samza >>>- >>> >>>runner-spark >>>- >>> >>>runner-universal >>>- >>> >>>sdk-go >>>- >>> >>>sdk-ideas >>>- >>> >>>sdk-java >>>- >>> >>>sdk-py >>>- >>> >>>sdk-typescript >>>- >>> >>>test-failures >>>- >>> >>>website >>> >>> >>> On Tue, Dec 6, 2022 at 11:17 AM Bjorn Pedersen >>> wrote: >>> As someone still newer to Beam, I can attest that the number of labels can be overwhelming. Is it possible to not have a default option? Even just getting people to interact with the dropdown might go a long way, especially if the labels were fewer and clearer. Bjorn On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles wrote: > I definitely think reducing the label zoo could help. We have a lot of > labels that are decompositions of what used to be Jira components. > > Kenn > > On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev < > dev@beam.apache.org> wrote: > >> > Previously, we had automation that would automatically mark >> self-assigned self-reported issues as triaged. That is probably a third >> of >> issues or more. 
>> >> I believe that automation exists now[1], but it wasn't retroactively >> applied to old issues. >> >> > One issue is that a lot of triage work is getting the labels right >> (a lot of things end up in beam-model or beam-community) >> >> Do you think it would help to cut down on our label options? >> beam-community might be popular because it's the default option, so >> reducing options might not help that much unfortunately. >> >> [1] example - https://github.com/apache/beam/issues/24521 >> >> On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles >> wrote: >> >>> Previously, we had automation that would automatically mark >>> self-assigned self-reported issues as triaged. That is probably a third >>> of >>> issues or more. I'm not sure what else. I appreciate Valentyn
Re: Achievement unlocked: fully triaged
I really like the idea of multi-select and automatic "awaiting triage". Kenn, I think the list you have looks good to me. On Tue, Dec 6, 2022 at 1:55 PM Kenneth Knowles wrote: > Noting that what you've listed are the options in the issue template, > which are then expanded to multiple labels. So focusing on the issue > template, I like the general idea, but maybe we can simplify it even more: > > When a user is filing a bug, I think a good outcome is for it to get into > the right person's saved search (like Go, Python, etc) while still having > the "awaiting triage" label on it. > > What if we just went all the way simple and had checkboxes for just the > highest level. Something like the following: > > Which language SDK or feature is related to your report? (check all that > apply) > [ ] Python > [ ] Java > [ ] Go > [ ] Typescript > [ ] IO connector > [ ] Beam examples > [ ] Beam playground > [ ] Beam katas > [ ] Website > [ ] Spark Runner > [ ] Flink Runner > [ ] Samza Runner > [ ] Twister2 Runner > [ ] Hazelcast Jet Runner > [ ] Google Cloud Dataflow Runner > > We could even trim it even further to just language, and let the person > doing triage handle the rest. > > Kenn > > On Tue, Dec 6, 2022 at 9:11 AM Danny McCormick via dev < > dev@beam.apache.org> wrote: > >> > Is it possible to not have a default option? >> >> Sadly, no AFAIK. I agree this would help. We could try things like making >> the default " " and auto-closing issues that don't pick something other >> than the default, that's a pretty rough experience though and not worth it >> IMO. >> >> > I definitely think reducing the label zoo could help. >> >> What's our desired end state here? I put together a doc with my suggested >> labels - >> https://docs.google.com/document/d/1FpaFr_Sdg217ogd5oMDRX4uLIMSatKLF_if9CzLg9tM/edit?usp=sharing >> - >> listed below as well for convenience. 
Please comment in the doc if you have >> thoughts/labels you care about, or continue the email thread if you have >> bigger ideas (e.g. getting rid of labels, changing our templates entirely >> instead, etc...). >> >> *Danny's Proposed Labels:* >> >> >>- >> >>beam-community >>- >> >>beam-playground >>- >> >>community-metrics >>- >> >>cross-language >>- >> >>examples-java >>- >> >>examples-python >>- >> >>extensions >>- >> >>infrastructure >>- >> >>io-go >>- >> >>io-ideas >>- >> >>io-java >>- >> >>io-py >>- >> >>katas >>- >> >>release >>- >> >>run-inference >>- >> >>runner >>- >> >>runner-dataflow >>- >> >>runner-direct >>- >> >>runner-flink >>- >> >>runner-samza >>- >> >>runner-spark >>- >> >>runner-universal >>- >> >>sdk-go >>- >> >>sdk-ideas >>- >> >>sdk-java >>- >> >>sdk-py >>- >> >>sdk-typescript >>- >> >>test-failures >>- >> >>website >> >> >> On Tue, Dec 6, 2022 at 11:17 AM Bjorn Pedersen >> wrote: >> >>> As someone still newer to Beam, I can attest that the number of labels >>> can be overwhelming. >>> >>> Is it possible to not have a default option? Even just getting people to >>> interact with the dropdown might go a long way, especially if the labels >>> were fewer and clearer. >>> >>> Bjorn >>> >>> On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles wrote: >>> I definitely think reducing the label zoo could help. We have a lot of labels that are decompositions of what used to be Jira components. Kenn On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev < dev@beam.apache.org> wrote: > > Previously, we had automation that would automatically mark > self-assigned self-reported issues as triaged. That is probably a third of > issues or more. > > I believe that automation exists now[1], but it wasn't retroactively > applied to old issues. > > > One issue is that a lot of triage work is getting the labels right > (a lot of things end up in beam-model or beam-community) > > Do you think it would help to cut down on our label options? 
> beam-community might be popular because it's the default option, so > reducing options might not help that much unfortunately. > > [1] example - https://github.com/apache/beam/issues/24521 > > On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles > wrote: > >> Previously, we had automation that would automatically mark >> self-assigned self-reported issues as triaged. That is probably a third >> of >> issues or more. I'm not sure what else. I appreciate Valentyn keeping an >> eye on the Python label. One issue is that a lot of triage work is >> getting >> the labels right (a lot of things end up in beam-model or beam-community) >> >> Kenn >> >> On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev < >> dev@beam.apache.org> wrote:
Re: Achievement unlocked: fully triaged
Noting that what you've listed are the options in the issue template, which are then expanded to multiple labels. So focusing on the issue template, I like the general idea, but maybe we can simplify it even more:

When a user is filing a bug, I think a good outcome is for it to get into the right person's saved search (like Go, Python, etc) while still having the "awaiting triage" label on it.

What if we just went all the way simple and had checkboxes for just the highest level? Something like the following:

Which language SDK or feature is related to your report? (check all that apply)
[ ] Python
[ ] Java
[ ] Go
[ ] Typescript
[ ] IO connector
[ ] Beam examples
[ ] Beam playground
[ ] Beam katas
[ ] Website
[ ] Spark Runner
[ ] Flink Runner
[ ] Samza Runner
[ ] Twister2 Runner
[ ] Hazelcast Jet Runner
[ ] Google Cloud Dataflow Runner

We could trim it even further to just language, and let the person doing triage handle the rest.

Kenn

On Tue, Dec 6, 2022 at 9:11 AM Danny McCormick via dev wrote: > > Is it possible to not have a default option? > > Sadly, no AFAIK. I agree this would help. We could try things like making > the default " " and auto-closing issues that don't pick something other > than the default, that's a pretty rough experience though and not worth it > IMO. > > > I definitely think reducing the label zoo could help. > > What's our desired end state here? I put together a doc with my suggested > labels - > https://docs.google.com/document/d/1FpaFr_Sdg217ogd5oMDRX4uLIMSatKLF_if9CzLg9tM/edit?usp=sharing > - > listed below as well for convenience. Please comment in the doc if you have > thoughts/labels you care about, or continue the email thread if you have > bigger ideas (e.g. getting rid of labels, changing our templates entirely > instead, etc...). 
> > *Danny's Proposed Labels:* > > >- > >beam-community >- > >beam-playground >- > >community-metrics >- > >cross-language >- > >examples-java >- > >examples-python >- > >extensions >- > >infrastructure >- > >io-go >- > >io-ideas >- > >io-java >- > >io-py >- > >katas >- > >release >- > >run-inference >- > >runner >- > >runner-dataflow >- > >runner-direct >- > >runner-flink >- > >runner-samza >- > >runner-spark >- > >runner-universal >- > >sdk-go >- > >sdk-ideas >- > >sdk-java >- > >sdk-py >- > >sdk-typescript >- > >test-failures >- > >website > > > On Tue, Dec 6, 2022 at 11:17 AM Bjorn Pedersen > wrote: > >> As someone still newer to Beam, I can attest that the number of labels >> can be overwhelming. >> >> Is it possible to not have a default option? Even just getting people to >> interact with the dropdown might go a long way, especially if the labels >> were fewer and clearer. >> >> Bjorn >> >> On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles wrote: >> >>> I definitely think reducing the label zoo could help. We have a lot of >>> labels that are decompositions of what used to be Jira components. >>> >>> Kenn >>> >>> On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev < >>> dev@beam.apache.org> wrote: >>> > Previously, we had automation that would automatically mark self-assigned self-reported issues as triaged. That is probably a third of issues or more. I believe that automation exists now[1], but it wasn't retroactively applied to old issues. > One issue is that a lot of triage work is getting the labels right (a lot of things end up in beam-model or beam-community) Do you think it would help to cut down on our label options? beam-community might be popular because it's the default option, so reducing options might not help that much unfortunately. 
[1] example - https://github.com/apache/beam/issues/24521 On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles wrote: > Previously, we had automation that would automatically mark > self-assigned self-reported issues as triaged. That is probably a third of > issues or more. I'm not sure what else. I appreciate Valentyn keeping an > eye on the Python label. One issue is that a lot of triage work is getting > the labels right (a lot of things end up in beam-model or beam-community) > > Kenn > > On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev < > dev@beam.apache.org> wrote: > >> This is a glorious achievement Kenn! To keep things clean going >> forward are there any improvements we can make in our issue creation >> flow? >> >> On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles wrote: >> >>> Hi all, >>> >>> I've finally done it! I've emptied the label "awaiting triage". Help >>> me keep it that way! This ensures that we actually at least *look* at >>> each >>> issue once,
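Kenn's suggestion in this message (top-level checkboxes in the issue template that expand into labels, while the issue still carries "awaiting triage") could be sketched roughly as below. This is only an illustration under assumptions: the function name `labels_for_checkboxes` and the checkbox-to-label mapping are hypothetical, with label names borrowed from Danny's proposed list only where one obviously fits. It is not Beam's actual automation.

```python
# Hypothetical mapping from issue-template checkboxes to labels.
# Checkbox text comes from Kenn's list; label names come from Danny's
# proposed list. Checkboxes with no obvious label are left unmapped
# so that the person doing triage decides.
CHECKBOX_TO_LABEL = {
    "Python": "sdk-py",
    "Java": "sdk-java",
    "Go": "sdk-go",
    "Typescript": "sdk-typescript",
    "Beam playground": "beam-playground",
    "Beam katas": "katas",
    "Website": "website",
    "Spark Runner": "runner-spark",
    "Flink Runner": "runner-flink",
    "Samza Runner": "runner-samza",
    "Google Cloud Dataflow Runner": "runner-dataflow",
}

AWAITING_TRIAGE = "awaiting triage"


def labels_for_checkboxes(checked):
    """Return the labels for a newly filed issue.

    Every new issue gets "awaiting triage"; each recognised checkbox
    adds one more label, so several boxes can be ticked at once.
    """
    labels = [AWAITING_TRIAGE]
    for box in checked:
        label = CHECKBOX_TO_LABEL.get(box)
        if label is not None and label not in labels:
            labels.append(label)
    return labels
```

With this shape, an issue that ticks both "Python" and "Flink Runner" lands in both saved searches and still shows up in the triage queue.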
Re: Configuration Driven File Writes
*(To those who identify on the Beam learning path, I supplemented the original email with additional references/definitions that could help understand this reply.)*

Hello Robert,

Thank you for reading the document and taking your time to review. The numbered answers below correspond to the numbered questions.

1) A machine is the intended configuration producer and SchemaTransform output consumer. However, I started with the perspective of a human user as an automated process would serve those needs. User stories are best written in collaboration with intended users. Yet, not having this I had to draw on my experience working with various Beam and non-Beam customers. I derived the following user story which is also in the document. The key phrase is underlined.

*As a Beam IO Developer, …*

*I want a schema aware configuration to produce a file writing PTransform, so that I can unify and normalize a single point of entry to write Row elements to a file or object system.*

*I want the provider to decide the intended format and file type based on configured inputs so I don’t need to write code to support this.*

To prevent messy if/then statements and enumeration of mapping a format to the resulting PTransform<PCollection<Row>, PDone> transform, I implemented FileWriteSchemaTransformFormatProvider [1] as an extension of Providers.Identifyable [2], with implementations annotated with @AutoService [3].

2) The Read side could adopt the same. We decided to put the code in its own module and will coordinate efforts with the individual who volunteered on the Read side implementation to maintain consistency. Potentially, the same format String parameter could map to the appropriate PTransform<PBegin, PCollection<Row>> using the same mechanism.

*References / Definitions*

1. FileWriteSchemaTransformFormatProvider - an interface extension of Providers.Identifyable [2]. The intended cardinality of a file format such as Json, Avro, XML, etc and a FileWriteSchemaTransformFormatProvider is 1 to 1. See https://github.com/apache/beam/blob/master/sdks/java/io/fileschematransform/src/main/java/org/apache/beam/sdk/io/fileschematransform/FileWriteSchemaTransformFormatProvider.java and https://github.com/apache/beam/blob/master/sdks/java/io/fileschematransform/src/main/java/org/apache/beam/sdk/io/fileschematransform/FileWriteSchemaTransformFormatProviders.java
2. Providers.Identifyable - allows us to use a string value to map to a class in Beam. See https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/io/Providers.Identifyable.html
3. @AutoService - a Java class annotation that allows us to list all classes that are annotated with a particular class. The practical implication is that AutoService gives us a list. In combination with Providers.Identifyable, it gives us a convenient lookup Map. See https://www.baeldung.com/google-autoservice for a tutorial on AutoService.
4. PBegin - the "input" to a root PTransform, typically used in transforms that read from sources. See https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PBegin.html
5. PDone - the "output" of a PTransform, typically in transforms that write to sinks. See https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PDone.html

On Fri, Dec 2, 2022 at 1:41 PM Robert Bradshaw via dev wrote: > Thanks for looking into this and the careful writeup. I've read the > design doc and it looks great, but have a couple of questions. > > (1) Why did you decide on having a single top-level FileWrite > transform whose config is ([common_parameters], [xml-params], > [csv-params], ...) rather than separate schema transforms for each. > (2) Is there a plan to do a similar thing for the Read side? > > On Fri, Dec 2, 2022 at 9:48 AM Damon Douglas > wrote: > > > > Hello Everyone, > > > > For those new to Beam, even if this is your first day, consider > yourselves a welcome contributor to this conversation. 
I remember what it > was like first learning Beam on my own and I am passionate about everyone's > learning experience. Below are definitions/references and a suggested > learning path to understand this email. > > > > Short Version (assumes Beam knowledge): Could someone review > https://github.com/apache/beam/pull/24479? Based on the design document > [1], It's the first of a series of pull requests that enable FileIO.Write > [2] support for Schema Transforms [3]. > > > > Long Version (for those first learning Beam): > > > > Explaining this without using Beam specific language. > > > > Suppose my team needs to quickly write to a file or object storage > system without writing the specific code to accomplish this final step. > This pull request begins work in enabling such ability. I can specify the > format such as avro, json, xml, etc in the configuration file and a backend > service will deal with the remaining details of how to achieve this at > scale. > > > > If you are interested in how this works, please see the design
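The lookup Damon describes in his reply (a format string such as "json" or "csv" selecting exactly one provider, with no if/then chain) can be illustrated with a small language-neutral sketch. The real code is Java and discovers its providers through @AutoService; the Python below only mimics the idea with an explicit list, and every class and function name in it is hypothetical rather than part of Beam.

```python
import json
from abc import ABC, abstractmethod


class FormatProvider(ABC):
    """Hypothetical stand-in for a format provider: one per file format."""

    @abstractmethod
    def identifier(self):
        """Return the format string this provider answers to."""

    @abstractmethod
    def build_writer(self, config):
        """Return a function that turns a list of row dicts into text lines."""


class JsonProvider(FormatProvider):
    def identifier(self):
        return "json"

    def build_writer(self, config):
        return lambda rows: [json.dumps(row, sort_keys=True) for row in rows]


class CsvProvider(FormatProvider):
    def identifier(self):
        return "csv"

    def build_writer(self, config):
        delimiter = config.get("delimiter", ",")
        return lambda rows: [
            delimiter.join(str(value) for value in row.values()) for row in rows
        ]


def load_providers(providers):
    """Build the identifier -> provider map, enforcing the 1-to-1 rule."""
    by_id = {}
    for provider in providers:
        key = provider.identifier()
        if key in by_id:
            raise ValueError(f"duplicate provider for format {key!r}")
        by_id[key] = provider
    return by_id


# In the Java code this list would come from service discovery; here it is
# written out by hand, which is an assumption made for the sketch.
PROVIDERS = load_providers([JsonProvider(), CsvProvider()])


def build_writer(config):
    """Pick the provider named by config["format"] and let it build the writer."""
    fmt = config["format"]
    if fmt not in PROVIDERS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return PROVIDERS[fmt].build_writer(config)
```

Adding a new format then means adding one provider class to the list; `build_writer` itself never changes, which is the point of replacing the if/then enumeration with a map.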
Re: Achievement unlocked: fully triaged
> Is it possible to not have a default option?

Sadly, no AFAIK. I agree this would help. We could try things like making the default " " and auto-closing issues that don't pick something other than the default, that's a pretty rough experience though and not worth it IMO.

> I definitely think reducing the label zoo could help.

What's our desired end state here? I put together a doc with my suggested labels - https://docs.google.com/document/d/1FpaFr_Sdg217ogd5oMDRX4uLIMSatKLF_if9CzLg9tM/edit?usp=sharing - listed below as well for convenience. Please comment in the doc if you have thoughts/labels you care about, or continue the email thread if you have bigger ideas (e.g. getting rid of labels, changing our templates entirely instead, etc...).

*Danny's Proposed Labels:*

- beam-community
- beam-playground
- community-metrics
- cross-language
- examples-java
- examples-python
- extensions
- infrastructure
- io-go
- io-ideas
- io-java
- io-py
- katas
- release
- run-inference
- runner
- runner-dataflow
- runner-direct
- runner-flink
- runner-samza
- runner-spark
- runner-universal
- sdk-go
- sdk-ideas
- sdk-java
- sdk-py
- sdk-typescript
- test-failures
- website

On Tue, Dec 6, 2022 at 11:17 AM Bjorn Pedersen wrote: > As someone still newer to Beam, I can attest that the number of labels can > be overwhelming. > > Is it possible to not have a default option? Even just getting people to > interact with the dropdown might go a long way, especially if the labels > were fewer and clearer. > > Bjorn > > On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles wrote: > >> I definitely think reducing the label zoo could help. We have a lot of >> labels that are decompositions of what used to be Jira components. >> >> Kenn >> >> On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev < >> dev@beam.apache.org> wrote: >> >>> > Previously, we had automation that would automatically mark >>> self-assigned self-reported issues as triaged. That is probably a third of >>> issues or more. 
>>> >>> I believe that automation exists now[1], but it wasn't retroactively >>> applied to old issues. >>> >>> > One issue is that a lot of triage work is getting the labels right (a >>> lot of things end up in beam-model or beam-community) >>> >>> Do you think it would help to cut down on our label options? >>> beam-community might be popular because it's the default option, so >>> reducing options might not help that much unfortunately. >>> >>> [1] example - https://github.com/apache/beam/issues/24521 >>> >>> On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles wrote: >>> Previously, we had automation that would automatically mark self-assigned self-reported issues as triaged. That is probably a third of issues or more. I'm not sure what else. I appreciate Valentyn keeping an eye on the Python label. One issue is that a lot of triage work is getting the labels right (a lot of things end up in beam-model or beam-community) Kenn On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev < dev@beam.apache.org> wrote: > This is a glorious achievement Kenn! To keep things clean going > forward are there any improvements we can make in our issue creation flow? > > On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles wrote: > >> Hi all, >> >> I've finally done it! I've emptied the label "awaiting triage". Help >> me keep it that way! This ensures that we actually at least *look* at >> each >> issue once, preferably soon after it is filed. The idea is that you make >> sure the priority and other labels are right, since users are not >> expected >> to know how we use labels. >> >> >> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3A%22awaiting+triage%22 >> >> Kenn >> >
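The rule discussed in this thread, treating an issue that its reporter assigns to themselves as already triaged, might look like the following. The dictionary shape (`author`, `assignees`, `labels`) and the function names are assumptions made for illustration; this is not the actual Beam automation.

```python
TRIAGE_LABEL = "awaiting triage"


def is_self_triaged(issue):
    """True when the reporter is also one of the assignees."""
    return issue["author"] in issue.get("assignees", [])


def labels_after_filing(issue):
    """Return the labels a freshly filed issue should carry.

    Self-assigned, self-reported issues skip the triage queue; every
    other issue gets the "awaiting triage" label if it lacks it.
    """
    labels = list(issue.get("labels", []))
    if is_self_triaged(issue):
        return [label for label in labels if label != TRIAGE_LABEL]
    if TRIAGE_LABEL not in labels:
        labels.append(TRIAGE_LABEL)
    return labels
```

Running the same function over already open issues would be one way to apply the rule retroactively, which the thread notes was not done for old issues.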
Re: Achievement unlocked: fully triaged
As someone still newer to Beam, I can attest that the number of labels can be overwhelming. Is it possible to not have a default option? Even just getting people to interact with the dropdown might go a long way, especially if the labels were fewer and clearer. Bjorn On Mon, Dec 5, 2022 at 6:46 PM Kenneth Knowles wrote: > I definitely think reducing the label zoo could help. We have a lot of > labels that are decompositions of what used to be Jira components. > > Kenn > > On Mon, Dec 5, 2022 at 12:17 PM Danny McCormick via dev < > dev@beam.apache.org> wrote: > >> > Previously, we had automation that would automatically mark >> self-assigned self-reported issues as triaged. That is probably a third of >> issues or more. >> >> I believe that automation exists now[1], but it wasn't retroactively >> applied to old issues. >> >> > One issue is that a lot of triage work is getting the labels right (a >> lot of things end up in beam-model or beam-community) >> >> Do you think it would help to cut down on our label options? >> beam-community might be popular because it's the default option, so >> reducing options might not help that much unfortunately. >> >> [1] example - https://github.com/apache/beam/issues/24521 >> >> On Mon, Dec 5, 2022 at 2:57 PM Kenneth Knowles wrote: >> >>> Previously, we had automation that would automatically mark >>> self-assigned self-reported issues as triaged. That is probably a third of >>> issues or more. I'm not sure what else. I appreciate Valentyn keeping an >>> eye on the Python label. One issue is that a lot of triage work is getting >>> the labels right (a lot of things end up in beam-model or beam-community) >>> >>> Kenn >>> >>> On Mon, Dec 5, 2022 at 6:23 AM Kerry Donny-Clark via dev < >>> dev@beam.apache.org> wrote: >>> This is a glorious achievement Kenn! To keep things clean going forward are there any improvements we can make in our issue creation flow? 
On Fri, Dec 2, 2022, 6:44 PM Kenneth Knowles wrote: > Hi all, > > I've finally done it! I've emptied the label "awaiting triage". Help > me keep it that way! This ensures that we actually at least *look* at each > issue once, preferably soon after it is filed. The idea is that you make > sure the priority and other labels are right, since users are not expected > to know how we use labels. > > > https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3A%22awaiting+triage%22 > > Kenn >
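The saved search Kenn links to is an ordinary GitHub issue query, so a similar link can be built for any label. A small sketch using only the Python standard library follows; the helper name `issue_search_url` is made up for this example.

```python
from urllib.parse import urlencode


def issue_search_url(label, repo="apache/beam"):
    """Build a GitHub search URL for open issues carrying the given label."""
    # The label is quoted because label names may contain spaces.
    query = f'is:issue is:open label:"{label}"'
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'.
    return f"https://github.com/{repo}/issues?" + urlencode({"q": query})


print(issue_search_url("awaiting triage"))
```

For "awaiting triage" this reproduces the link from the original message, and passing another label name (for example one of the SDK labels) gives the per-area view that a saved search would show.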
Re: Beam High Priority Issue Report (60)
Now that we've managed to triage all our incoming issues, perhaps the next easiest step is to get updates on the *assigned* P1s. After all, in theory people are working on these and they are quite urgent. If you are assigned a P1 and not working on it, go ahead and unassign it so we have an accurate view of the state of bugs.

On Tue, Dec 6, 2022 at 2:03 AM wrote:

> This is your daily summary of Beam's current high priority issues that may need attention.
>
> See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.
>
> Unassigned P1 Issues:
>
> https://github.com/apache/beam/issues/24537 [Bug]: python flink runner is not compatible with Azure blob file system in Java
> https://github.com/apache/beam/issues/24535 [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.
> https://github.com/apache/beam/issues/24415 [Bug]: Cannot find a matching Calcite SqlTypeName for Beam type: LOGICAL_TYPE seen in 2.44.0 SNAPSHOT
> https://github.com/apache/beam/issues/24383 [Bug]: Daemon will be stopped at the end of the build after the daemon was no longer found in the daemon registry
> https://github.com/apache/beam/issues/24367 [Bug]: workflow.tar.gz cannot be passed to flink runner
> https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
> https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to lock gradle
> https://github.com/apache/beam/issues/24263 [Bug]: Remote call on apache-beam-jenkins-3 failed. The channel is closing down or has closed down
> https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
> https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
> https://github.com/apache/beam/issues/22969 Discrepancy in behavior of `DoFn.process()` when `yield` is combined with `return` statement, or vice versa
> https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently skips most of records without job fail
> https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
> https://github.com/apache/beam/issues/22321 PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing on jenkins
> https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output to Failed Inserts PCollection
> https://github.com/apache/beam/issues/21561 ExternalPythonTransformTest.trivialPythonTransform flaky
> https://github.com/apache/beam/issues/21480 flake: FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
> https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
> https://github.com/apache/beam/issues/21462 Flake in org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
> https://github.com/apache/beam/issues/21261 org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer is flaky
> https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
> https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
> https://github.com/apache/beam/issues/21113 testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky
> https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
> https://github.com/apache/beam/issues/20975 org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: false] is flaky
> https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK harness startup
> https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM on Flink
> https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should
> https://github.com/apache/beam/issues/19814 Flink streaming flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
> https://github.com/apache/beam/issues/19734 WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp order (sickbayed)
> https://github.com/apache/beam/issues/19465 Explore possibilities to lower in-use IP address quota footprint.
> https://github.com/apache/beam/issues/19241 Python Dataflow
Beam High Priority Issue Report (60)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/24537 [Bug]: python flink runner is not compatible with Azure blob file system in Java
https://github.com/apache/beam/issues/24535 [Bug]: Bigquery Load jobs with WRITE_TRUNCATE disposition may truncate valid records.
https://github.com/apache/beam/issues/24415 [Bug]: Cannot find a matching Calcite SqlTypeName for Beam type: LOGICAL_TYPE seen in 2.44.0 SNAPSHOT
https://github.com/apache/beam/issues/24383 [Bug]: Daemon will be stopped at the end of the build after the daemon was no longer found in the daemon registry
https://github.com/apache/beam/issues/24367 [Bug]: workflow.tar.gz cannot be passed to flink runner
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to lock gradle
https://github.com/apache/beam/issues/24263 [Bug]: Remote call on apache-beam-jenkins-3 failed. The channel is closing down or has closed down
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of `DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22321 PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing on jenkins
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output to Failed Inserts PCollection
https://github.com/apache/beam/issues/21561 ExternalPythonTransformTest.trivialPythonTransform flaky
https://github.com/apache/beam/issues/21480 flake: FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21462 Flake in org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21261 org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer is flaky
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21113 testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky
https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
https://github.com/apache/beam/issues/20975 org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: false] is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19734 WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp order (sickbayed)
https://github.com/apache/beam/issues/19465 Explore possibilities to lower in-use IP address quota footprint.
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests should export the pipeline Job ID and console output to Jenkins Test Result section

P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/24100 [Bug]: `Filter.whereFieldName` appears in docs but not available
https://github.com/apache/beam/issues/23906 [Bug]: Dataflow jpms tests fail on the 2.43.0 release branch
https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true for unequal rows