Re: [DISCUSS] Jenkins -> GitHub Actions ?

2022-11-18 Thread Damon Douglas
Adding more helpful information about Cloud Build.  With its additional
features, we can:

1. Define GitHub triggers on certain branches, optionally with multiple
wildcarded include and exclude paths.
2. Flag whether to trigger on pull requests or on specific branch commits.
3. Version control the Terraform code that provisions the triggers;
it is typically applied once per GCP project setup.
4. See progress on GitHub as the trigger executes, along with its
success or failure.
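
To make item 1 concrete, here is a minimal Python sketch of how include and
exclude path filters could decide whether a trigger fires for a given set of
changed files. It is an illustration only, not Cloud Build's implementation:
the function name `trigger_fires`, its parameters, and the sample paths are
hypothetical, and Cloud Build's real glob semantics may differ from Python's
`fnmatch` (where `*` also matches `/`).

```python
from fnmatch import fnmatchcase


def trigger_fires(changed_files, included=(), ignored=()):
    """Return True if any changed file passes the path filters.

    A file passes when it matches no `ignored` glob and, if `included`
    is non-empty, matches at least one `included` glob.
    Hypothetical helper for illustration; not a Cloud Build API.
    """
    for path in changed_files:
        # Ignored paths never cause the trigger to fire.
        if any(fnmatchcase(path, pattern) for pattern in ignored):
            continue
        # With no include filter, any non-ignored file is enough.
        if not included or any(fnmatchcase(path, pattern) for pattern in included):
            return True
    return False


if __name__ == "__main__":
    changed = ["sdks/python/apache_beam/io/gcp/bigquery.py", "README.md"]
    print(trigger_fires(changed, included=["sdks/python/*"], ignored=["*.md"]))
```

With filters like these, a docs-only change would not start the Google Cloud
integration tests, while a change under the included paths would.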

On Fri, Nov 18, 2022, 4:20 PM Damon Douglas  wrote:

> I wonder if some integration tests could be offloaded to their respective
> cloud provider. For example, the Google Cloud-related integration tests
> could be executed on Cloud Build.  Cloud Build's service account, or a
> custom one, could have the minimally necessary IAM roles to access Google
> Cloud resources as part of its execution.  The 'build' in its name shouldn't
> mislead one to think it's only for building.  It's essentially just a
> container, and when it is connected to the GitHub repository, its triggers
> can report back the success or failure of a run.  As a bonus: no need for
> static service account keys.
>
> For those reading this who do not know what Cloud Build is, please see
> https://youtu.be/Bvo6jzC3J_A
>
> For information about service accounts, please see
> https://youtu.be/xXk1YlkKW_k
>
> For information about service account keys, please see
> https://youtu.be/SDhMwyyd9_0
>
> And finally, for IAM permissions, please see https://youtu.be/Sdt-i-Q7tyA
>
> On Wed, Oct 19, 2022, 8:32 AM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> As you probably noticed, there's a lot of work going on around adding
>> more GitHub Actions workflows.
>>
>> Can we fully migrate to GitHub Actions? Similar to our GitHub Issues
>> migration (but less user-facing) it would bring us on to "default"
>> infrastructure that more people understand and is maintained by GitHub.
>>
>> So far we have hit some serious roadblocks. It isn't just a simple
>> migration. We have to weigh doing the work to get there.
>>
>> I started a document with a table of the things we get from Jenkins that
>> we need to be sure to have for GitHub Actions before we could think about
>> migrating:
>>
>> https://s.apache.org/beam-jenkins-to-gha
>>
>> Can you please help me by adding things that we get from Jenkins, and if
>> you know how to get them from GitHub Actions add that too.
>>
>> Thanks!
>>
>> Kenn
>>
>


Re: [DISCUSS] Jenkins -> GitHub Actions ?

2022-11-18 Thread Damon Douglas
I wonder if some integration tests could be offloaded to their respective
cloud provider. For example, the Google Cloud-related integration tests
could be executed on Cloud Build.  Cloud Build's service account, or a
custom one, could have the minimally necessary IAM roles to access Google
Cloud resources as part of its execution.  The 'build' in its name shouldn't
mislead one to think it's only for building.  It's essentially just a
container, and when it is connected to the GitHub repository, its triggers
can report back the success or failure of a run.  As a bonus: no need for
static service account keys.

For those reading this who do not know what Cloud Build is, please see
https://youtu.be/Bvo6jzC3J_A

For information about service accounts, please see
https://youtu.be/xXk1YlkKW_k

For information about service account keys, please see
https://youtu.be/SDhMwyyd9_0

And finally, for IAM permissions, please see https://youtu.be/Sdt-i-Q7tyA

On Wed, Oct 19, 2022, 8:32 AM Kenneth Knowles  wrote:

> Hi all,
>
> As you probably noticed, there's a lot of work going on around adding more
> GitHub Actions workflows.
>
> Can we fully migrate to GitHub Actions? Similar to our GitHub Issues
> migration (but less user-facing) it would bring us on to "default"
> infrastructure that more people understand and is maintained by GitHub.
>
> So far we have hit some serious roadblocks. It isn't just a simple
> migration. We have to weigh doing the work to get there.
>
> I started a document with a table of the things we get from Jenkins that
> we need to be sure to have for GitHub Actions before we could think about
> migrating:
>
> https://s.apache.org/beam-jenkins-to-gha
>
> Can you please help me by adding things that we get from Jenkins, and if
> you know how to get them from GitHub Actions add that too.
>
> Thanks!
>
> Kenn
>


[GitHub] [beam-site] lukecwik commented on pull request #637: Fix previously released Javadocs to not contain -SNAPSHOT in the version.

2022-11-18 Thread GitBox


lukecwik commented on PR #637:
URL: https://github.com/apache/beam-site/pull/637#issuecomment-1320601743

   R: @slilichenko @kileys 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [beam-site] lukecwik opened a new pull request, #637: Fix previously released Javadocs to not contain -SNAPSHOT in the version.

2022-11-18 Thread GitBox


lukecwik opened a new pull request, #637:
URL: https://github.com/apache/beam-site/pull/637

   Ran `find ./ -type f -exec sed -i 's/-SNAPSHOT//g' {} \;` to search and 
remove -SNAPSHOT
   
   This is to apply https://github.com/apache/beam/pull/24269 to previous 
javadoc documentation releases for https://github.com/apache/beam/issues/24266
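
For anyone who wants to preview the effect of that one-liner before rewriting
files in place, a rough Python equivalent with a dry-run mode might look like
the sketch below. This is not what was run for the PR; the function name
`strip_snapshot` and its parameters are hypothetical. Like the `find`/`sed`
command, it visits every regular file under the root, including hidden
directories, so it should be pointed only at the generated Javadoc tree.

```python
from pathlib import Path


def strip_snapshot(root, dry_run=True, marker="-SNAPSHOT"):
    """Remove every occurrence of `marker` from all files under `root`.

    Returns the list of files that contain the marker. When `dry_run`
    is True (the default), files are only reported, not modified.
    Hypothetical helper for illustration.
    """
    needle = marker.encode()
    affected = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Work on raw bytes so files of any encoding are handled uniformly.
        data = path.read_bytes()
        if needle not in data:
            continue
        affected.append(path)
        if not dry_run:
            path.write_bytes(data.replace(needle, b""))
    return affected


if __name__ == "__main__":
    # List affected files first; pass dry_run=False to rewrite them.
    for changed in strip_snapshot("."):
        print(changed)
```

Running the dry run first and reviewing the list is a cheap way to confirm
that only documentation files would be touched.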





[ANNOUNCE] Apache Beam 2.43.0 Released

2022-11-18 Thread Chamikara Jayalath via dev
The Apache Beam team is pleased to announce the release of version 2.43.0.

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes bug fixes, features, and improvements detailed on the
Beam blog: https://beam.apache.org/blog/beam-2.43.0/ and the GitHub
release page: https://github.com/apache/beam/releases/tag/v2.43.0

Thanks to everyone who contributed to this release, and we hope you enjoy
using Beam 2.43.0.

-- Cham, on behalf of The Apache Beam team


Re: [DISCUSS] Jenkins -> GitHub Actions ?

2022-11-18 Thread Tomo Suzuki via dev
Kenn, thank you for the summary. SGTM. Looking forward to GitHub Actions.

On Mon, Nov 7, 2022 at 12:58 PM Kenneth Knowles  wrote:

> OK, it seems like there is general consensus. Not too much action on the
> document. I will summarize the gaps that don't have an answer in the doc,
> and my new opinion of how important they are:
>
>  - [required] Run specific non-default workflow on PR
>  - [required] View history of a workflow
>  - [required] Publish nightly snapshots
>  - [required] Run workflow on dedicated worker pool for performance testing
>  - [important but not required] Summarize flakiness statistics of one or
> all workflows
>  - [important but not required] History of all/many workflows in a single
> view
>  - [nice to have] History of specific test case (not just the workflow
> level)
>
> Do any of these seem like I got the importance wrong?
>
> Kenn
>
> On Mon, Nov 7, 2022 at 9:09 AM Austin Bennett  wrote:
>
>> +1
>>
>> Also would help address a good amount of what concerns me that was
>> [sorta] raised by
>> https://lists.apache.org/thread/7jr99nc5xsb3ft1d75kb0ml32bzw89rv
>>
>>
>> Once we think this is something we want to do, but might be
>> blocked/concerned because of lack of definitively comparable features, I'd
>> be happy to take a look at what exists in the wider ecosystem or could be
>> built.
>>
>> Cheers -
>>
>>
>>
>> On Fri, Oct 21, 2022 at 11:10 AM Ismaël Mejía  wrote:
>>
>>> +1 Github Actions are more intuitive and easy to modify and test for
>>> everyone.
>>> Also Beam wins because that makes one less system to maintain.
>>>
>>> Regards,
>>> Ismaël
>>>
>>> On Wed, Oct 19, 2022 at 5:50 PM Danny McCormick via dev
>>>  wrote:
>>> >
>>> > Thanks for kicking this conversation off. I'm +1 on migrating, but
>>> only once we've found a specific replacement for easy observability (which
>>> workflows have been failing lately, and how often) and trigger phrases (for
>>> retries and workflows that aren't automatically kicked off but should be
>>> run for extra validation, e.g. postcommits). Until we have viable
>>> replacements, I don't think we should make the move. Publishing nightly
>>> snapshots is eventually also a must to fully migrate, but probably doesn't
>>> need to block us from making progress here.
>>> >
>>> > With those caveats, the reason that I'm +1 on moving is that our
>>> Jenkins reliability has been rough. Since I joined the project in January,
>>> I can think of 3 different incidents that significantly harmed our ability
>>> to do work.
>>> >
>>> > 1. Jenkins triggers cause multi-day outage - this led to a multi-day
>>> code freeze, and we lost our trigger functionality for days afterwards.
>>> Investigating/restoring our state ate up a pretty full week for me.
>>> > 2. Jenkins plugin cause multi-day outage - this led to multiple days
>>> of Jenkins downtime before eventually being resolved by Infra.
>>> > 3. Cert issues cause many workers to go down - I don't have a thread
>>> for this because I handled most of the investigation the day of, but many
>>> of our workers went down for around a day and nobody noticed until queue
>>> time reached 6+ hours for each workflow.
>>> >
>>> > There may be others that I'm overlooking.
>>> >
>>> > GitHub Actions isn't a magic bullet to fix these problems, but it
>>> minimizes the amount of infra that we're maintaining ourselves, increases
>>> the isolation between workflows (catastrophic failure is less likely), has
>>> uptime guarantees, and is more likely to receive investment going forward
>>> (we're likely to get increasing benefits over time for free). We've also
>>> done a lot of exploration in this area already, so we're not starting from
>>> scratch.
>>> >
>>> > Thanks,
>>> > Danny
>>> >
>>> > On Wed, Oct 19, 2022 at 11:32 AM Kenneth Knowles 
>>> wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> As you probably noticed, there's a lot of work going on around adding
>>> more GitHub Actions workflows.
>>> >>
>>> >> Can we fully migrate to GitHub Actions? Similar to our GitHub Issues
>>> migration (but less user-facing) it would bring us on to "default"
>>> infrastructure that more people understand and is maintained by GitHub.
>>> >>
>>> >> So far we have hit some serious roadblocks. It isn't just a simple
>>> migration. We have to weigh doing the work to get there.
>>> >>
>>> >> I started a document with a table of the things we get from Jenkins
>>> that we need to be sure to have for GitHub Actions before we could think
>>> about migrating:
>>> >>
>>> >> https://s.apache.org/beam-jenkins-to-gha
>>> >>
>>> >> Can you please help me by adding things that we get from Jenkins, and
>>> if you know how to get them from GitHub Actions add that too.
>>> >>
>>> >> Thanks!
>>> >>
>>> >> Kenn
>>>
>>

-- 
Regards,
Tomo


Re: [DISCUSS] Avro dependency update, design doc

2022-11-18 Thread Alexey Romanenko
Since there are no objections in principle to the proposed option 2 (extract 
Avro-related code from “core” to an Avro extension but keep it in “core” for 
some time during the transition period), we will try to move forward and take 
this path. 

I’m pretty sure that we will face some hidden issues while working on this, so 
I’ll keep you posted =)

—
Alexey

> On 11 Nov 2022, at 18:05, Austin Bennett  wrote:
> 
> @Moritz: I *think* it should be fine, and I don't have anything specific to 
> offer for what might go wrong throughout the process.  :-) :shrug:
> 
> 
> 
> On Fri, Nov 11, 2022 at 2:07 AM Moritz Mack wrote:
>> Thanks a lot for the feedback so far! I can only second Alexey. It was 
>> painful to come to realize that the only feasible option seems to be copying 
>> a lot of code during the transition phase.
>> 
>> For that reason, it will be critical to be disciplined about the removal of 
>> the to-be deprecated code in core and, ahead of time, agree on when to 
>> remove it again. Any thought on how long the transition phase should be?
>> 
>>  
>> 
>>  I am concerned of what could go wrong for users in the 
>> in-between/transition state while more slowly transitioning avro to 
>> extension.
>> 
>>  
>> 
>> @Austin Do you have any specific concern in mind here?
>> 
>> To minimize this risk, we propose that all APIs should be kept as is to make 
>> the migration as easy as possible and kick off with the Avro version used in 
>> core. The only  thing that changes will be package names.
>> 
>>  
>> 
>> / Moritz
>> 
>>  
>> 
>> On 10.11.22, 22:46, "Kenneth Knowles" wrote:
>> 
>>  
>> 
>> Thank you for writing this document. It really helps to understand the 
>> options. I agree that option 2 (make a new extension and deprecate from 
>> core) seems best. I think +Reuven Lax  might have 
>> the most context on any technical issue we will encounter around schema 
>> codegen.
>> 
>>  
>> 
>> Kenn
>> 
>>  
>> 
>> On Thu, Nov 10, 2022 at 7:24 AM Alexey Romanenko wrote:
>> 
>> Personally, I think that keeping two mostly identical versions of 
>> Avro-related code in two different places (“core” and “extension”) is rather 
>> bad practice, especially if some issues there need to be fixed - though 
>> the risk is very low since this code is quite mature and is not 
>> touched often. On the other hand, it should give users time (several 
>> Beam releases) to update their code and use Avro from the extension artifact 
>> instead of core.
>> 
>>  
>> 
>> Though, if we accept that this breaking change at compile time is allowable, 
>> then this process of transition should be much faster and can be performed 
>> within only one Beam release. Our main concern here is runtime breaking 
>> changes that we can miss but must be avoided by all means. 
>> 
>>  
>> 
>> —
>> 
>> Alexey
>> 
>> 
>> 
>> 
>> On 9 Nov 2022, at 18:47, Austin Bennett wrote:
>> 
>>  
>> 
>> Being tied to a specific version of a dependency, and esp. one that is 
>> not-[actually-long-term]critical, sounds like a problem.  It doesn't seem 
>> like Avro needs to be in core.  I am in favor of about any path someone 
>> wants to address towards removing that from core [ #2 in the design doc 
>> seems reasonable ].  
>> 
>>  
>> 
>> Naturally, having ways to more easily change versions [esp. to remediate 
>> CVEs, but for any specific reason ], seems very valuable.
>> 
>>  
>> 
>> It reads as a significant problem; I wouldn't take issue with a breaking [ 
>> compile time ] change, if that got things addressed and somewhat 
>> straightforwardly - I am concerned of what could go wrong for users in the 
>> in-between/transition state while more slowly transitioning avro to 
>> extension.
>> 
>>  
>> 
>> On Wed, Nov 9, 2022 at 5:43 AM Alexey Romanenko wrote:
>> 
>> Any thoughts on this? For now, we'd need to decide which path finally to 
>> take to move forward.
>> 
>>  
>> 
>> Thanks in advance!
>> 
>>  
>> 
>> —
>> 
>> Alexey
>> 
>> 
>> 
>> 
>> On 4 Nov 2022, at 16:44, Alexey Romanenko wrote:
>> 
>>  
>> 
>> Hi all,
>> 
>>  
>> 
>> Following-up an Avro dependency update discussion [1] that showed a lot of 
>> uncertainties to move forward, Moritz and I decided to create a design 
>> document [2] with potential options, that we believe, can be considered and 
>> used further. Unfortunately, all solutions lead to breaking changes in some 
>> way, though, for some of them the negative effect can be reduced by 
>> preparing users for this in advance and make this 

Beam High Priority Issue Report (53)

2022-11-18 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23745 [Bug]: Samza 
AsyncDoFnRunnerTest.testSimplePipeline is flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of 
`DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21561 
ExternalPythonTransformTest.trivialPythonTransform flaky
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21113 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20975 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM 
on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19734 
WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp 
order (sickbayed)
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests 
should export the pipeline Job ID and console output to Jenkins Test Result 
section


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/23906 [Bug]: Dataflow jpms tests fail on 
the 2.43.0 release branch
https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true 
for unequal rows
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/23489 [Bug]: add DebeziumIO to the 
connectors page
https://github.com/apache/beam/issues/23306 [Bug]: BigQueryBatchFileLoads in 
python loses data when using WRITE_TRUNCATE
https://github.com/apache/beam/issues/23286 [Bug]: 
beam_PerformanceTests_InfluxDbIO_IT Flaky > 50 % Fail 
https://github.com/apache/beam/issues/22891 [Bug]: 
beam_PostCommit_XVR_PythonUsingJavaDataflow is flaky
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/22115 [Bug]: 
apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses
 is flaky
https://github.com/apache/beam/issues/22011 [Bug]: 
org.apache.beam.sdk.io.aws2.kinesis.KinesisIOWriteTest.testWriteFailure flaky
https://github.com/apache/beam/issues/21709 
beam_PostCommit_Java_ValidatesRunner_Samza Failing
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently