Thanks, Danny!

> On 24 Jun 2022, at 19:23, Danny McCormick <> wrote:
> Sure, I put up a fix - 
> <>
> On Fri, Jun 24, 2022 at 1:20 PM Alexey Romanenko < 
> <>> wrote:
>> > 2. The links in this report start with api.github.* and don’t take us 
>> > directly to the issues.
>> > Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
>> This is already fixed - Pablo actually beat me to it! 
>> <>
> It also adds a colon after the URL, and some mail clients treat it as part of 
> the URL, which breaks the link.
> Should we just remove the colon or add a space before it?
> —
> Alexey
>> Thanks,
>> Danny
>> On Thu, Jun 23, 2022 at 8:30 PM Brian Hulette < 
>> <>> wrote:
>> +1 for that proposal!
>> > 1. P2 and P3 issues should be noticed and resolved as well. Shall we have 
>> > a longer time window for the remaining untriaged or stagnant issues and 
>> > include them?
>> I worry these lists would get _very_ long and wouldn't be actionable. But 
>> maybe it's worth reporting something like "There are 376 P2's with no update 
>> in the last 6 months" with a link to a query?
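A minimal sketch of how such a "link to a query" could be built, assuming the priorities are GitHub labels named like `P2` and that "6 months" is approximated as 182 days. The repository slug and the function name are hypothetical; the `is:`, `label:` and `updated:<DATE` qualifiers are standard GitHub issue search syntax.

```python
from datetime import date, timedelta
from urllib.parse import quote_plus


def stale_query_url(repo, label, today, days=182):
    """Build a GitHub issue-search link for open issues idle for `days` days."""
    cutoff = today - timedelta(days=days)
    # Search qualifiers: open issues with the label, last updated before cutoff.
    query = f"is:issue is:open label:{label} updated:<{cutoff.isoformat()}"
    return f"https://github.com/{repo}/issues?q={quote_plus(query)}"


# "example-org/example-repo" is a placeholder, not the real repository.
url = stale_query_url("example-org/example-repo", "P2", date(2022, 6, 24))
print(url)
```

The report would then need only one summary line per priority plus this link, rather than a long list.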
>> > 2. The links in this report start with api.github.* and don’t take us 
>> > directly to the issues.
>> Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
>> On Thu, Jun 23, 2022 at 2:37 PM Pablo Estrada < 
>> <>> wrote:
>> Thanks. I like the proposal, and I've found the emails useful.
>> Best
>> -P.
>> On Thu, Jun 23, 2022 at 2:33 PM Manu Zhang < 
>> <>> wrote:
>> Sounds good! It’s like our internal reports of JIRA tickets that exceed 
>> their SLA time with no response from engineers. We either resolve them or 
>> downgrade the priority to extend the time window.
>> Besides,
>> 1. P2 and P3 issues should be noticed and resolved as well. Shall we have a 
>> longer time window for the remaining untriaged or stagnant issues and 
>> include them?
>> 2. The links in this report start with api.github.* and don’t take us 
>> directly to the issues.
>> Danny McCormick < 
>> <>> wrote on Fri, Jun 24, 2022 at 04:48:
>> That generally sounds right to me - I also would vote that we consolidate to 
>> 1 email and stop distinguishing between flaky P1s and normal P1s.
>> So the single daily report would be:
>> - Unassigned P0s
>> - P0s with no update in the last 36 hours
>> - Unassigned P1s
>> - P1s with no update in the last 7 days
>> I think that will generate a pretty good list of issues that require some 
>> kind of action.
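The four report sections above could be selected roughly as follows. This is only a sketch under stated assumptions: the `Issue` class, `build_report` and the section titles are hypothetical and are not the actual automation. Only the 36-hour and 7-day thresholds come from the proposal.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Thresholds come from the proposal above; all names below are hypothetical.
STALE_AFTER = {"P0": timedelta(hours=36), "P1": timedelta(days=7)}


@dataclass
class Issue:
    number: int
    priority: str  # "P0", "P1", ...
    updated_at: datetime
    assignees: list = field(default_factory=list)


def build_report(issues, now):
    """Return {section title: [issue numbers]} for the four proposed sections."""
    report = {}
    for priority, limit in STALE_AFTER.items():
        matching = [i for i in issues if i.priority == priority]
        report[f"Unassigned {priority}s"] = [
            i.number for i in matching if not i.assignees
        ]
        report[f"Stale {priority}s"] = [
            i.number for i in matching if now - i.updated_at > limit
        ]
    return report


now = datetime(2022, 6, 24, tzinfo=timezone.utc)
issues = [
    Issue(1, "P0", now - timedelta(hours=40), ["alice"]),  # stale P0
    Issue(2, "P1", now - timedelta(days=2)),               # unassigned P1
    Issue(3, "P1", now - timedelta(days=1), ["bob"]),      # needs no action
    Issue(4, "P2", now - timedelta(days=90)),              # not reported
]
report = build_report(issues, now)
```

An issue that is both unassigned and stale would appear in two sections here. Whether to deduplicate is a design choice the thread does not settle.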
>> On Thu, Jun 23, 2022 at 4:43 PM Kenneth Knowles < 
>> <>> wrote:
>> Sounds good to me. Perhaps P0s > 36 hours ago (presumably they are more like 
>> ~hours for true outages of CI/website/etc) and P1s > 7 days?
>> On Thu, Jun 23, 2022 at 1:27 PM Brian Hulette < 
>> <>> wrote:
>> I think that Danny's alternate proposal (a daily email that shows only issues 
>> last updated >7 days ago, and those with no assignee) fits well with the two 
>> goals you describe, if we include "triage needed" issues in the latter 
>> category. Maybe we also explicitly separate these two concerns in the report?
>> On Thu, Jun 23, 2022 at 1:14 PM Kenneth Knowles < 
>> <>> wrote:
>> Forking thread because lots of people may just ignore this topic, per the 
>> discussion :-)
>> (sometimes Gmail doesn't fork threads properly, but here's hoping...)
>> I'll add some other outcomes of these emails:
>>  - people file P0s that are not outages and P1s that are not data loss and I 
>> downgrade them
>>  - I randomly open up a few flaky test bugs and see if I can fix them really 
>> quick
>>  - people file legit P0s and P1s and I subscribe and follow them
>> Of these, only the last one seems important (not just that *I* follow them, 
>> but that new P0s and P1s get immediate attention from many eyes)
>> So maybe one take on the goal is to:
>>  - have new P0s and P1s evaluated quickly: P0s are an outage or outage-like 
>> occurrence that needs immediate remedy, and P1s need to be evaluated for 
>> release blocking, etc.
>>  - make sure P0s and P1s get attention appropriate to their priority
>> It can also be helpful to just state the failure modes which would happen by 
>> default if we don't have a good process or automation:
>>  - Real P0 gets filed and not noticed or fixed in a timely manner, blocking 
>> users and/or community in real time
>>  - Real P1 gets filed and not noticed, so release goes out with known data 
>> loss bug or other total loss of functionality
>>  - Non-real P0s and P1s accumulate, throwing off our data and making it hard 
>> to find the real problems
>>  - Flakes are never fixed
>> WDYT?
>> If we have P0s and P1s in the "awaiting triage" state, those are the ones we 
>> need to notice. Then for a P0 or P1 outside of that state, we just need some 
>> way of making sure it doesn't stagnate. Or if it does stagnate, that 
>> empirically demonstrates it isn't really P1 (just like our P2 to P3 
>> downgrade automation). If everything is P1, nothing is, as they say.
>> Kenn
>> On Thu, Jun 23, 2022 at 10:01 AM Danny McCormick < 
>> <>> wrote:
>> > Maybe it would be helpful to sort these by last update time (and 
>> > potentially include that information in the email). Then we can at least 
>> > prioritize them instead of looking at a big wall of issues.
>> I agree that this is a good idea (and pretty trivial to do). I'll update the 
>> automation to do that once we get consensus on an approach.
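For illustration, sorting by last update time and including it in each line might look like the sketch below. The tuple layout, the function name and the line format are assumptions, not the current automation's output. The least recently updated issues come first.

```python
from datetime import datetime, timedelta, timezone


def format_report_lines(issues, now):
    """issues: iterable of (link, title, updated_at); oldest update first."""
    lines = []
    for link, title, updated_at in sorted(issues, key=lambda item: item[2]):
        idle_days = (now - updated_at).days
        lines.append(f"{link} {title} (last updated {idle_days} days ago)")
    return lines


now = datetime(2022, 6, 23, tzinfo=timezone.utc)
lines = format_report_lines(
    [
        ("<link-a>", "Recently touched", now - timedelta(days=1)),
        ("<link-b>", "Long idle", now - timedelta(days=30)),
    ],
    now,
)
```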
>> > I think the motivation for daily emails is that per the priorities guide 
>> > [1] P1 issues should be getting "continuous status updates". If these 
>> > issues aren't actually that important, I think the noise is good as it 
>> > should motivate us to prioritize them correctly. In practice that hasn't 
>> > been happening though...
>> I guess the questions here are:
>> 1) What is the goal of this email?
>> 2) Is it effective at accomplishing that goal?
>> I think you're saying that the goal (or a goal) is to highlight issues that 
>> aren't getting the attention they need; if that's our goal, then I don't 
>> think this is a particularly effective mechanism for it because (a) it's very 
>> unclear which issues fall into that category and (b) there are too many to 
>> manually go through on a daily basis. From the email alone, it's not clear 
>> to me that any of the issues above "shouldn't" be P1s (though I'd guess 
>> you're right that some/many of them don't belong since most were created 
>> before the Jira -> GH migration based on the titles). I'd also argue that a 
>> daily email just desensitizes us to them since there almost always will be 
>> some valid P1s that don't need extra attention.
>> I do still think this could have value as a weekly email, with the goal 
>> being "it's probably a good idea for someone to take a look at each of 
>> these". Another option would be to only include issues with no action in the 
>> last 7 days and/or no assignees and keep it daily.
>> A couple side notes:
>> - No matter what we do, if we keep the current automation in any form we 
>> should fix the url from 
>> <> to 
>> <> - the current links are very 
>> annoying.
>> - After I send this, I will do a pass of the current P1s since it does 
>> indeed seem like too many are P1s and many should actually be P2s (or lower).
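On the first side note, a rough sketch of the link fix, assuming the automation reads issues from GitHub's REST API. Issue objects there carry both a `url` field (the api.github.com address) and an `html_url` field (the browser address). The helper name and the repository slug in the example are hypothetical.

```python
API_PREFIX = "https://api.github.com/repos/"
WEB_PREFIX = "https://github.com/"


def browser_url(issue):
    """Return a link that opens the issue in a browser."""
    # Prefer the field GitHub provides for exactly this purpose.
    if issue.get("html_url"):
        return issue["html_url"]
    # Fallback: rewrite the API address into the equivalent web address.
    url = issue["url"]
    if url.startswith(API_PREFIX):
        return WEB_PREFIX + url[len(API_PREFIX):]
    return url


# Placeholder repository and issue number, for illustration only.
link = browser_url(
    {"url": "https://api.github.com/repos/example-org/example-repo/issues/123"}
)
```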
>> Thanks,
>> Danny
>> On Thu, Jun 23, 2022 at 12:21 PM Brian Hulette < 
>> <>> wrote:
>> I think the motivation for daily emails is that per the priorities guide [1] 
>> P1 issues should be getting "continuous status updates". If these issues 
>> aren't actually that important, I think the noise is good as it should 
>> motivate us to prioritize them correctly. In practice that hasn't been 
>> happening though...
>> Maybe it would be helpful to sort these by last update time (and potentially 
>> include that information in the email). Then we can at least prioritize them 
>> instead of looking at a big wall of issues.
>> Brian
>> [1] 
>> <>
>> On Thu, Jun 23, 2022 at 6:07 AM Danny McCormick < 
>> <>> wrote:
>> I think a weekly summary seems like a good idea for the P1 issues and flaky 
>> tests, though daily still seems appropriate for P0 issues. I put up 
>> <> to just send the P1/flaky test 
>> reports on Wednesdays, if anyone objects please let me know - I'll wait on 
>> merging til tomorrow to leave time for feedback (and it's always reversible 
>> 🙂).
>> Thanks,
>> Danny
>> On Wed, Jun 22, 2022 at 7:05 PM Manu Zhang < 
>> <>> wrote:
>> Hi all,
>> What is this daily summary intended for? Not all issues look like P1. And 
>> would a weekly summary be less noisy?
>> < <>> wrote on Wed, Jun 22, 2022 at 23:45:
>> This is your daily summary of Beam's current P1 issues, not including flaky 
>> tests.
>>     See 
>> <> for the 
>> meaning and expectations around P1 issues.
>> <>: [Playground] 
>> Implement Share Any Code feature on the frontend
>> <>: [Bug]: No way to 
>> read or write to file when running Beam in Flink
>> <>: [Bug]: Reject 
>> illformed GBK Coders
>> <>: [Feature Request]: 
>> Flink runner savepoint backward compatibility 
>> <>: [Bug]: BigQuery 
>> Storage Write API implementation does not support table partitioning
>> <>: Dataflow runner 
>> creates a new timer whenever the output timestamp is change
>> <>: [Playground Task]: 
>> Migrate from Google Analytics to Matomo Cloud
>> <>: Data missing when 
>> using CassandraIO.Read
>> <>: 404s in BigQueryIO 
>> don't get output to Failed Inserts PCollection
>> <>: Python Streaming 
>> job failing to drain with BigQueryIO write errors
>> <>: 
>> pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 and V2
>> <>: SpannerWriteIT 
>> failing in beam PostCommit Java V1
>> <>: 
>> --dataflowServiceOptions=use_runner_v2 is broken
>> <>: 
>> DataflowPipelineResult does not raise exception for unsuccessful states.
>> <>: BigQuery Storage 
>> API insert with writeResult retry and write to error table
>> <>: Install Python 
>> wheel and dependencies to local venv in SDK harness
>> <>: 
>> doesn't pick up new TopicPartitions
>> <>: Add integration 
>> testing for BQ Storage API  write modes
>> <>: WriteToBigQuery 
>> Dynamic table destinations returns wrong tableId
>> <>: Beam x-lang 
>> Dataflow tests failing due to _InactiveRpcError
>> <>: 
>> PVR_Spark2_Streaming perma-red
>> <>: Simplify version 
>> override for Dev versions of the Go SDK.
>> <>: Kafka commit offset 
>> drop data on failure for runners that have non-checkpointing shuffle
>> <>: Delete orphaned 
>> files
>> <>: Race between member 
>> variable being accessed due to leaking uninitialized state via 
>> OutboundObserverFactory
>> <>: WriteToBigQuery 
>> submits a duplicate BQ load job if a 503 error code is returned from 
>> googleapi
>> <>: 
>> apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
>>  'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
>> <>: (Broken Pipe 
>> induced) Bricked Dataflow Pipeline 
>> <>: Python AfterAny, 
>> AfterAll do not follow spec
>> <>: Python DirectRunner 
>> does not emit data at GC time
>> <>: Consumer group with 
>> random prefix
>> <>: Dataflow error in 
>> CombinePerKey operation
>> <>: Either Create or 
>> DirectRunner fails to produce all elements to the following transform
>> <>: Multiple jobs 
>> running on Flink session cluster reuse the persistent Python environment.
>> <>: Migrate to the next 
>> version of Python `requests` when released
>> <>: "Java IO IT Tests" 
>> - missing data in grafana
>> <>: JdbcIO date 
>> conversion is sensitive to OS
>> <>: Dataflow 
>> SocketException (SSLException) error while trying to send message from Cloud 
>> Pub/Sub to BigQuery
>> <>: Java creates an 
>> incorrect pipeline proto when core-construction-java jar is not in the 
>> <>: codecov/patch has 
>> poor behavior
>> <>: SDF BoundedSource 
>> seems to execute significantly slower than 'normal' BoundedSource
>> <>: 
>> With Flink Kafka
>> <>: Portable runners 
>> should be able to issue checkpoints to Splittable DoFn
>> <>: 
>> PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some 
>> Avro logical types
>> <>: Python Beam SDK 
>> Harness hangs when installing pip packages
>> <>: XmlIO.Read does not 
>> handle XML encoding per spec
>> <>: JmsIO is not 
>> acknowledging messages correctly
>> <>: No trigger early 
>> repeatedly for session windows
>> <>: Cross-language 
>> consistency (RequiresStableInputs) is quietly broken (at least on portable 
>> flink runner)
>> <>: Timer with dataflow 
>> runner can be set multiple times (dataflow runner)
>> <>: Beam metrics should 
>> be displayed in Flink UI "Metrics" tab
>> <>: Kafka 
>> commitOffsetsInFinalize OOM on Flink
>> <>: Support for coder 
>> argument in WriteToBigQuery
>> <>: FileBasedSink: 
>> allow setting temp directory provider per dynamic destination
>> <>: Make non-portable 
>> Splittable DoFn the only option when executing Java "Read" transforms
>> <>: SpannerIO tests 
>> don't actually assert anything.
>> <>: python 
>> CombineGlobally().with_fanout() cause duplicate combine results for sliding 
>> windows
>> <>: 
>> beam_PerformanceTests_Kafka_IO failing due to " provided port is already 
>> allocated"
>> <>: FileIO writeDynamic 
>> with AvroIO.sink not writing all data
>> <>: Remove insecure ssl 
>> options from MongoDBIO
>> <>: SortValues should 
>> fail if SecondaryKey coder is not deterministic
>> <>: Python direct 
>> runner doesn't emit empty pane when it should
>> <>: 
>> Environment-sensitive provisioning for Dataflow
>> <>: [SQL] Some Hive 
>> tests throw NullPointerException, but get marked as passing (Direct Runner)
>> <>: datetime and 
>> decimal should be logical types
>> <>: Add support for 
>> remaining data types in python RowCoder 
>> <>: PubsubIO returns 
>> empty message bodies for all messages read
>> <>: User reports 
>> protobuf ClassChangeError running against 2.6.0 or above
>> <>: KafkaIO doesn't 
>> commit offsets while being used as bounded source
>> <>: [Bug]: Java 
>> Precommit permared
