Re: [NEED HELP] Populating the change list for 2.30.0 release

2021-05-27 Thread Ahmet Altay
Feel free to use whatever is already in CHANGES.md under the 2.30.0
section. We can always update it during voting, or even after the release.

On Thu, May 27, 2021 at 11:50 AM Heejong Lee wrote:

> Hi Beam developers,
>
> I'm gathering information about the changes in the 2.30.0 release. If
> you know of any important *new features* / *breaking changes* /
> *deprecations* / *known issues* for the 2.30.0 release, please note them
> down in CHANGES.md or just let me know.
>
> Thanks!
>


Re: Add Member

2021-05-27 Thread Kenneth Knowles
Welcome!

I added Jira user pawas2...@gmail.com to the "Contributors" role so you can
be assigned tickets.

Kenn

On Thu, May 27, 2021 at 5:18 PM Pawas Chhokra wrote:

> Hi Beam Team,
>
> I am working for the Samza team at LinkedIn and I would like to contribute
> to the Samza Runner in Beam. My GitHub username is *PawasChhokra*. Could
> I please get the permission to add/assign tickets in the Beam Jira?
>
> Thanks & Regards,
> Pawas Chhokra
>


Re: [DISCUSS] Client SDK/Job Server/Worker Pool Lifecycle Management on Kubernetes

2021-05-27 Thread Ke Wu
Good to know. We are working on running Java portable pipelines on the Samza 
runner, and I believe we could take on the task of enhancing the Java workflow 
to support timeouts, retries, etc. on gRPC calls. 

Created BEAM-12419 to track the work.

Best,
Ke

> On May 27, 2021, at 4:30 PM, Kyle Weaver wrote:
> 
> I don't think there's any specific reason we don't set a timeout; I'm 
> guessing it was just never worth the effort of implementing. If it's stuck, it 
> should be pretty obvious from the logs: "Still waiting for startup of 
> environment from {} for worker id {}"
> 
> On Thu, May 27, 2021 at 4:04 PM Ke Wu wrote:
> Hi Kyle,
> 
> Thank you for the prompt response, and apologies for the late reply. 
> 
> [1] seems to be available only in the Python portable_runner and not in the 
> Java PortableRunner. Is that intended, or could we add similar changes in 
> Java as well?
> 
> It makes sense for [2] to block, since the wait/retry is handled in the 
> preceding prepare(). However, is there any specific reason why we do not want 
> to support a timeout in the start worker request?
> 
> Best,
> Ke
> 
>> On May 14, 2021, at 11:25 AM, Kyle Weaver wrote:
>> 
>> 1. and 2. are both facilitated by gRPC, which takes care of most of the 
>> retry/wait logic. In some places we have a configurable timeout (which 
>> defaults to 60s) [1], while in other places we block [2][3].
>> 
>> [1] https://issues.apache.org/jira/browse/BEAM-7933
>> [2] 
>> https://github.com/apache/beam/blob/51541a595b09751dd3dde2c50caf2a968ac01b68/sdks/python/apache_beam/runners/portability/portable_runner.py#L238-L242
>> [3] 
>> https://github.com/apache/beam/blob/9601bdef8870bc6acc7895c06252e43ec040bd8c/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ExternalEnvironmentFactory.java#L115
>> 
>> On Fri, May 14, 2021 at 10:51 AM Ke Wu wrote:
>> Hello All,
>> 
>> I came across this question while reading Beam on Flink on Kubernetes 
>> and flink-on-k8s-operator, and realized that there seems to be no 
>> retry/wait logic built into PortableRunner or ExternalEnvironmentFactory 
>> (correct me if I am wrong), which implies that:
>> 
>> 1. The Job Server needs to be ready to accept requests before the SDK 
>> Client can submit one.
>> 2. The External Worker Pool Service needs to be ready to accept start/stop 
>> worker requests before the runner starts sending them.
>> 
>> This may bring some challenges on k8s, since Flink opts to use a 
>> multi-container pattern when bringing up a Beam portable pipeline. In 
>> addition, I don’t find any special lifecycle management in place to 
>> guarantee the order, e.g. that the External Worker Pool Service container 
>> is started and ready before the task manager container starts making 
>> requests. 
>> 
>> I am wondering whether I missed something that guarantees the readiness of 
>> the dependent service, or whether we are relying on the dependent 
>> containers being much more lightweight, so that they are, most of the 
>> time, ready before the other container starts making requests. 
>> 
>> Best,
>> Ke
>> 
> 
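For illustration only, here is a minimal Python sketch of the timeout/retry behaviour discussed in this thread: a call is retried with exponential backoff until it succeeds or an overall deadline passes. It is not Beam's implementation and does not use gRPC; the helper name `call_with_retry`, its parameters, and the choice of retryable exceptions are all assumptions made for the example.

```python
import time


def call_with_retry(call, timeout_s=60.0, initial_delay_s=0.5, max_delay_s=8.0,
                    retry_on=(ConnectionError, TimeoutError),
                    sleep=time.sleep, clock=time.monotonic):
    """Retry `call` with exponential backoff until it succeeds or the deadline passes.

    Hypothetical helper: `call` stands in for any request to a service that may
    not be ready yet (for example, a job submission).
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    attempt = 0
    while True:
        attempt += 1
        try:
            return call()
        except retry_on as err:
            remaining = deadline - clock()
            if remaining <= 0:
                # Out of time: surface a timeout that chains the last failure.
                raise TimeoutError(
                    f"gave up after {attempt} attempt(s) in {timeout_s}s") from err
            # Never sleep past the deadline; double the delay up to a cap.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting; a caller would normally leave them at their defaults.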



Add Member

2021-05-27 Thread Pawas Chhokra
Hi Beam Team,

I am working for the Samza team at LinkedIn and I would like to contribute
to the Samza Runner in Beam. My GitHub username is *PawasChhokra*. Could I
please get the permission to add/assign tickets in the Beam Jira?

Thanks & Regards,
Pawas Chhokra


Re: Out of band pickling in Python (pickle5)

2021-05-27 Thread Stephan Hoyer
I'm unlikely to have bandwidth to take this one on, but I do think it would
be quite valuable!

On Thu, May 27, 2021 at 4:42 PM Brian Hulette wrote:

> I filed https://issues.apache.org/jira/browse/BEAM-12418 for this. Would
> you have any interest in taking it on?
>
> On Tue, May 25, 2021 at 3:09 PM Brian Hulette wrote:
>
>> Hm this would definitely be of interest for the DataFrame API, which is
>> shuffling pandas objects. This issue [1] confirms what you suggested above,
>> that pandas supports out-of-band pickling since DataFrames are mostly just
>> collections of numpy arrays.
>>
>> Brian
>>
>> [1] https://github.com/pandas-dev/pandas/issues/34244
>>
>> On Tue, May 25, 2021 at 2:59 PM Stephan Hoyer wrote:
>>
>>> Beam's PickleCoder would need to be updated to pass the
>>> "buffer_callback" argument into pickle.dumps() and the "buffers" argument
>>> into pickle.loads(). I expect this would be relatively straightforward.
>>>
>>> Then it should "just work", assuming that data is stored in objects
>>> (like NumPy arrays or wrappers of NumPy arrays) that implement the
>>> out-of-band Pickle protocol.
>>>
>>>
>>> On Tue, May 25, 2021 at 2:50 PM Brian Hulette wrote:
>>>
>>>> I'm not aware of anyone looking at it.
>>>>
>>>> Will out-of-band pickling "just work" in Beam for types that implement
>>>> the correct interface in Python 3.8?
>>>>
>>>> On Tue, May 25, 2021 at 2:43 PM Evan Galpin wrote:
>>>>
>>>>> +1
>>>>>
>>>>> FWIW I recently ran into the exact case you described (high
>>>>> serialization cost). The solution was to implement some not-so-intuitive
>>>>> alternative transforms in my case, but I would have very much appreciated
>>>>> faster serialization performance.
>>>>>
>>>>> Thanks,
>>>>> Evan
>>>>>
>>>>> On Tue, May 25, 2021 at 15:26 Stephan Hoyer wrote:
>>>>>
>>>>>> Has anyone looked into out of band pickling for Beam's Python SDK,
>>>>>> i.e., Pickle protocol version 5?
>>>>>> https://www.python.org/dev/peps/pep-0574/
>>>>>> https://docs.python.org/3/library/pickle.html#out-of-band-buffers
>>>>>>
>>>>>> For Beam pipelines passing around NumPy arrays (or collections of
>>>>>> NumPy arrays, like pandas or Xarray) I've noticed that serialization
>>>>>> costs can be significant. Beam seems to currently incur at least one
>>>>>> (maybe two) unnecessary memory copies.
>>>>>>
>>>>>> Pickle protocol version 5 exists for solving exactly this problem.
>>>>>> You can serialize collections of arbitrary Python objects in a fully
>>>>>> streaming fashion using memory buffers. This is a Python 3.8 feature, but
>>>>>> the "pickle5" library provides a backport to Python 3.6 and 3.7. It has
>>>>>> been supported by NumPy since version 1.16, released in January 2019.
>>>>>>
>>>>>> Cheers,
>>>>>> Stephan
>>>>>>
>>>>>
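As a minimal sketch of the mechanism discussed in this thread (not Beam's PickleCoder), the following shows the `buffer_callback` and `buffers` arguments of the standard-library `pickle` module on Python 3.8+. The helper names `dumps_out_of_band` and `loads_out_of_band` are made up for the example, and a plain `bytearray` wrapped in `pickle.PickleBuffer` stands in for the memory of a NumPy array.

```python
import pickle


def dumps_out_of_band(value):
    """Pickle `value` with protocol 5, collecting large buffers separately."""
    buffers = []
    # list.append returns None (a false value), which tells pickle to keep the
    # buffer out of band instead of copying it into the pickle stream.
    payload = pickle.dumps(value, protocol=5, buffer_callback=buffers.append)
    return payload, buffers


def loads_out_of_band(payload, buffers):
    """Rebuild the object from the pickle stream plus its out-of-band buffers."""
    return pickle.loads(payload, buffers=buffers)


# A 1 MB block of memory standing in for array data.
raw = bytearray(b"\x00" * 1_000_000)
value = {"name": "example", "data": pickle.PickleBuffer(raw)}

payload, buffers = dumps_out_of_band(value)
restored = loads_out_of_band(payload, buffers)
```

Here `payload` holds only the small metadata stream, while the megabyte of data travels in `buffers` without being copied into the pickle. How those buffers would be transmitted between workers is outside the scope of this sketch.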


Re: Lots of VMs

2021-05-27 Thread Ahmet Altay
Folks, I will stop the VMs I listed in my first email sometime next week.
Feel free to resume them if you need them.

And if you have any other unused VMs please stop or delete them.

Thank you,
Ahmet

On Mon, May 10, 2021 at 10:08 AM Ahmet Altay wrote:

> That is a good suggestion. I can do that sometime after 2 weeks for the
> VMs I listed. There are close to 200 VMs though, there might be more unused
> or underused VMs.
>
> On Mon, May 10, 2021 at 9:22 AM Brian Hulette wrote:
>
>> Perhaps we should give people a week or two to justify keeping these
>> online and if we don't hear anything go ahead and shut them down?
>>
>> Brian
>>
>> On Fri, May 7, 2021 at 5:52 PM Ahmet Altay wrote:
>>
>>> Hello,
>>>
>>> It looks like we have accumulated a bunch of running VMs with very low
>>> utilization. Some of them have temp, test, etc. in their names and are
>>> probably no longer needed. If you started a one-time-use VM and no longer
>>> need it, could you stop or delete it?
>>>
>>> A few that could be potentially deleted:
>>> beam-jenkins-clang-format
>>> new-ci-node-test
>>> temporary-jenkins-node-tmp-cleanup
>>> tmp-jenkins-node
>>> temporal-beam-jenkins-1
>>> temporal-beam-jenkins-2
>>> temporary-jenkins-node-tmp-cleanup
>>>
>>> List is here:
>>> http://console.cloud.google.com/compute/instances?project=apache-beam-testing=(%22instances%22:(%22s%22:%5B(%22i%22:%22recommendationSortKey%22,%22s%22:%220%22),(%22i%22:%22name%22,%22s%22:%220%22)%5D,%22p%22:0))
>>>
>>> Thank you!
>>> Ahmet
>>>
>>


Re: Out of band pickling in Python (pickle5)

2021-05-27 Thread Brian Hulette
I filed https://issues.apache.org/jira/browse/BEAM-12418 for this. Would
you have any interest in taking it on?

On Tue, May 25, 2021 at 3:09 PM Brian Hulette wrote:

> Hm this would definitely be of interest for the DataFrame API, which is
> shuffling pandas objects. This issue [1] confirms what you suggested above,
> that pandas supports out-of-band pickling since DataFrames are mostly just
> collections of numpy arrays.
>
> Brian
>
> [1] https://github.com/pandas-dev/pandas/issues/34244
>
> On Tue, May 25, 2021 at 2:59 PM Stephan Hoyer wrote:
>
>> Beam's PickleCoder would need to be updated to pass the "buffer_callback"
>> argument into pickle.dumps() and the "buffers" argument into
>> pickle.loads(). I expect this would be relatively straightforward.
>>
>> Then it should "just work", assuming that data is stored in objects (like
>> NumPy arrays or wrappers of NumPy arrays) that implement the out-of-band
>> Pickle protocol.
>>
>>
>> On Tue, May 25, 2021 at 2:50 PM Brian Hulette wrote:
>>
>>> I'm not aware of anyone looking at it.
>>>
>>> Will out-of-band pickling "just work" in Beam for types that implement
>>> the correct interface in Python 3.8?
>>>
>>> On Tue, May 25, 2021 at 2:43 PM Evan Galpin wrote:
>>>
>>>> +1
>>>>
>>>> FWIW I recently ran into the exact case you described (high
>>>> serialization cost). The solution was to implement some not-so-intuitive
>>>> alternative transforms in my case, but I would have very much appreciated
>>>> faster serialization performance.
>>>>
>>>> Thanks,
>>>> Evan
>>>>
>>>> On Tue, May 25, 2021 at 15:26 Stephan Hoyer wrote:
>>>>
>>>>> Has anyone looked into out of band pickling for Beam's Python SDK,
>>>>> i.e., Pickle protocol version 5?
>>>>> https://www.python.org/dev/peps/pep-0574/
>>>>> https://docs.python.org/3/library/pickle.html#out-of-band-buffers
>>>>>
>>>>> For Beam pipelines passing around NumPy arrays (or collections of
>>>>> NumPy arrays, like pandas or Xarray) I've noticed that serialization costs
>>>>> can be significant. Beam seems to currently incur at least one (maybe
>>>>> two) unnecessary memory copies.
>>>>>
>>>>> Pickle protocol version 5 exists for solving exactly this problem. You
>>>>> can serialize collections of arbitrary Python objects in a fully streaming
>>>>> fashion using memory buffers. This is a Python 3.8 feature, but the
>>>>> "pickle5" library provides a backport to Python 3.6 and 3.7. It has been
>>>>> supported by NumPy since version 1.16, released in January 2019.
>>>>>
>>>>> Cheers,
>>>>> Stephan
>>>>>



Re: [DISCUSS] Client SDK/Job Server/Worker Pool Lifecycle Management on Kubernetes

2021-05-27 Thread Kyle Weaver
I don't think there's any specific reason we don't set a timeout; I'm
guessing it was just never worth the effort of implementing. If it's stuck,
it should be pretty obvious from the logs: "Still waiting for startup of
environment from {} for worker id {}"

On Thu, May 27, 2021 at 4:04 PM Ke Wu wrote:

> Hi Kyle,
>
> Thank you for the prompt response, and apologies for the late reply.
>
> [1] seems to be available only in the Python portable_runner and not in the
> Java PortableRunner. Is that intended, or could we add similar changes in
> Java as well?
>
> It makes sense for [2] to block, since the wait/retry is handled in the
> preceding prepare(). However, is there any specific reason why we do not
> want to support a timeout in the start worker request?
>
> Best,
> Ke
>
> On May 14, 2021, at 11:25 AM, Kyle Weaver wrote:
>
> 1. and 2. are both facilitated by gRPC, which takes care of most of the
> retry/wait logic. In some places we have a configurable timeout (which
> defaults to 60s) [1], while in other places we block [2][3].
>
> [1] https://issues.apache.org/jira/browse/BEAM-7933
> [2]
> https://github.com/apache/beam/blob/51541a595b09751dd3dde2c50caf2a968ac01b68/sdks/python/apache_beam/runners/portability/portable_runner.py#L238-L242
> [3]
> https://github.com/apache/beam/blob/9601bdef8870bc6acc7895c06252e43ec040bd8c/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ExternalEnvironmentFactory.java#L115
>
> On Fri, May 14, 2021 at 10:51 AM Ke Wu wrote:
>
>> Hello All,
>>
>> I came across this question while reading Beam on Flink on Kubernetes
>> and flink-on-k8s-operator, and realized that there seems to be no
>> retry/wait logic built into PortableRunner or ExternalEnvironmentFactory
>> (correct me if I am wrong), which implies that:
>>
>> 1. The Job Server needs to be ready to accept requests before the SDK
>> Client can submit one.
>> 2. The External Worker Pool Service needs to be ready to accept start/stop
>> worker requests before the runner starts sending them.
>>
>> This may bring some challenges on k8s, since Flink opts to use a
>> multi-container pattern when bringing up a Beam portable pipeline. In
>> addition, I don’t find any special lifecycle management in place to
>> guarantee the order, e.g. that the External Worker Pool Service container
>> is started and ready before the task manager container starts making
>> requests.
>>
>> I am wondering whether I missed something that guarantees the readiness of
>> the dependent service, or whether we are relying on the dependent
>> containers being much more lightweight, so that they are, most of the
>> time, ready before the other container starts making requests.
>>
>> Best,
>> Ke
>>
>>
>
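To make the trade-off in this message concrete, here is a small Python sketch of a startup wait that still logs periodically while blocked but gives up after a configurable timeout. It is an assumption-laden illustration, not the code in ExternalEnvironmentFactory: the function name `wait_for_startup`, the `threading.Event` used as the readiness signal, and the default values are all hypothetical, and the log text is only modelled on the message quoted above.

```python
import logging
import threading
import time

LOG = logging.getLogger(__name__)


def wait_for_startup(ready, worker_id, timeout_s=60.0, log_every_s=5.0):
    """Block until `ready` (a threading.Event) is set, or raise TimeoutError.

    While waiting, emit a "still waiting" log line every `log_every_s` seconds
    so that a stuck startup stays visible in the logs.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        # Wait for one logging interval, but never past the deadline.
        if ready.wait(max(0.0, min(log_every_s, remaining))):
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"worker id {worker_id} did not start within {timeout_s}s")
        LOG.info("Still waiting for startup of environment for worker id %s",
                 worker_id)
```

A caller that prefers the current blocking behaviour could pass a very large `timeout_s`; the periodic log line is kept either way.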


Re: [DISCUSS] Client SDK/Job Server/Worker Pool Lifecycle Management on Kubernetes

2021-05-27 Thread Ke Wu
Hi Kyle,

Thank you for the prompt response, and apologies for the late reply. 

[1] seems to be available only in the Python portable_runner and not in the 
Java PortableRunner. Is that intended, or could we add similar changes in Java 
as well?

It makes sense for [2] to block, since the wait/retry is handled in the 
preceding prepare(). However, is there any specific reason why we do not want 
to support a timeout in the start worker request?

Best,
Ke

> On May 14, 2021, at 11:25 AM, Kyle Weaver wrote:
> 
> 1. and 2. are both facilitated by gRPC, which takes care of most of the 
> retry/wait logic. In some places we have a configurable timeout (which 
> defaults to 60s) [1], while in other places we block [2][3].
> 
> [1] https://issues.apache.org/jira/browse/BEAM-7933
> [2] 
> https://github.com/apache/beam/blob/51541a595b09751dd3dde2c50caf2a968ac01b68/sdks/python/apache_beam/runners/portability/portable_runner.py#L238-L242
> [3] 
> https://github.com/apache/beam/blob/9601bdef8870bc6acc7895c06252e43ec040bd8c/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ExternalEnvironmentFactory.java#L115
> 
> On Fri, May 14, 2021 at 10:51 AM Ke Wu wrote:
> Hello All,
> 
> I came across this question while reading Beam on Flink on Kubernetes 
> and flink-on-k8s-operator, and realized that there seems to be no retry/wait 
> logic built into PortableRunner or ExternalEnvironmentFactory (correct me if 
> I am wrong), which implies that:
> 
> 1. The Job Server needs to be ready to accept requests before the SDK Client 
> can submit one.
> 2. The External Worker Pool Service needs to be ready to accept start/stop 
> worker requests before the runner starts sending them.
> 
> This may bring some challenges on k8s, since Flink opts to use a 
> multi-container pattern when bringing up a Beam portable pipeline. In 
> addition, I don’t find any special lifecycle management in place to guarantee 
> the order, e.g. that the External Worker Pool Service container is started 
> and ready before the task manager container starts making requests. 
> 
> I am wondering whether I missed something that guarantees the readiness of 
> the dependent service, or whether we are relying on the dependent containers 
> being much more lightweight, so that they are, most of the time, ready 
> before the other container starts making requests. 
> 
> Best,
> Ke
> 
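One way to picture the ordering concern above is a readiness check run by the dependent container before it sends its first request. The Python sketch below polls a TCP port until it accepts connections or a timeout expires. It is only an illustration under stated assumptions: the names `port_is_open` and `wait_for_port` are hypothetical, and an open port shows that something is listening, not that the service behind it can already handle requests.

```python
import socket
import time


def port_is_open(host, port, connect_timeout_s=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout_s):
            return True
    except OSError:
        return False


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll until the port accepts connections; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if port_is_open(host, port):
            return True
        # Stop if another sleep would take us past the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

For example, a task manager container could call `wait_for_port` against the worker pool's address before issuing start worker requests, and fail fast with a clear error if it returns False.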



Re: One Pager - Test Command Line Discoverability in Beam

2021-05-27 Thread Kenneth Knowles
On Wed, May 26, 2021 at 5:05 PM Kyle Weaver wrote:

>> (I assume the "job_" prefix is arbitrary; I'm not sure why we use a
>> filename prefix instead of a proper subfolder).
>>
Tangent: Totally agree this is odd. I believe it was because we failed to
figure out how to import utility libraries such as
CommonJobProperties.groovy from another folder.


>> In Jenkins, the job name is prefixed by "beam_" (I assume for historical
>> reasons) and postfixed for variants like "_cron" or "_phrase".
>>
Quick explanation of why we need both the _commit and the _phrase variants
(for precommits): the _commit variant only runs if certain files are touched.
Even with a phrase trigger it will choose not to run. The _phrase variant
never runs by default, but it has no such filter and will run when
requested. It is all encapsulated in the builder classes though, I hope.

Kenn
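The rule described above can be sketched as a small decision function. This is a hypothetical Python model for illustration, not the Groovy builder classes used for Beam's Jenkins jobs; the function name `should_run`, its parameters, and the glob-style path patterns are all assumptions.

```python
from fnmatch import fnmatch


def should_run(variant, changed_files, trigger_patterns, phrase_requested=False):
    """Model of when a precommit job variant runs.

    variant: "commit" or "phrase" (hypothetical labels for the two variants).
    changed_files: paths touched by the pull request.
    trigger_patterns: glob patterns for the files the job cares about.
    phrase_requested: True if the trigger phrase was posted as a comment.
    """
    touched = any(fnmatch(path, pattern)
                  for path in changed_files
                  for pattern in trigger_patterns)
    if variant == "commit":
        # Filtered by touched files; a trigger phrase does not override this.
        return touched
    if variant == "phrase":
        # Never runs by default, but has no file filter when requested.
        return phrase_requested
    raise ValueError(f"unknown variant: {variant!r}")
```

With this model, a website-only change plus a trigger phrase leaves the _commit variant idle and runs the _phrase variant, which matches the behaviour described in the message.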



>> 2. The GitHub comment trigger, e.g. "Run Java Dataflow V2 ValidatesRunner"
>> 3. The job name displayed on GitHub, e.g. "Google Cloud Dataflow Runner
>> V2 Java ValidatesRunner Tests"
>>
>> The latter two seem redundant.
>>
>> On Wed, May 26, 2021 at 3:47 PM Kenneth Knowles wrote:
>>
>>> Yea, big picture I agree with Kyle. "./gradlew $MODULE:test" and
>>> "./gradlew $MODULE:integrationTest" should be catch-alls ideally. With
>>> reasonable error messages if integrationTest has required parameters.
>>>
>>> Also just naming Jenkins jobs exactly by the command to run them would
>>> go a long way for me, personally.
>>>
>>> Kenn
>>>
>>> On Tue, May 25, 2021 at 2:46 PM Austin Bennett <
>>> whatwouldausti...@gmail.com> wrote:
>>>
>>>> Cool; will be good to have and make things clearer!
>>>>
>>>> On Tue, May 25, 2021 at 2:39 PM Kyle Weaver wrote:
>>>>
>>>>> I left some comments. In summary, I think this is mostly a
>>>>> documentation problem. If running a test isn't as easy as "./gradlew
>>>>> $MODULE:integrationTest", there should be instructions in the test class's
>>>>> javadoc.
>>>>>
>>>>> On Tue, May 25, 2021 at 2:05 PM Udi Meiri wrote:
>>>>>
>>>>>> My first place to go would be here:
>>>>>> https://cwiki.apache.org/confluence/display/BEAM/Java+Tips (although
>>>>>> it doesn't document your use-case)
>>>>>>
>>>>>> You are right that finding the correct gradle task or jenkins job is
>>>>>> not straightforward.
>>>>>>
>>>>>>
>>>>>> On Tue, May 25, 2021 at 12:48 PM Alex Amato wrote:
>>>>>>
>>>>>>> Friendly ping. I'll wait for more suggestions by the end of the
>>>>>>> week. Then close it out.
>>>>>>>
>>>>>>> -- Forwarded message -
>>>>>>> From: Alex Amato
>>>>>>> Date: Fri, May 21, 2021 at 2:54 PM
>>>>>>> Subject: One Pager - Test Command Line Discoverability in Beam
>>>>>>> To: dev
>>>>>>>
>>>>>>>
>>>>>>> Hi, I have had some issues determining how to run Beam tests. I have
>>>>>>> written a one pager for review and would like your feedback, to solve
>>>>>>> the problem:
>>>>>>>
>>>>>>> "A Beam developer is looking at a test file, such as
>>>>>>> “BigQueryTornadoesIT.java” and wants to run this test. But they do not
>>>>>>> know the command line they need to type to run this test."
>>>>>>>
>>>>>>> I would like your feedback, to get toward a more concrete proposal.
>>>>>>> A few solutions are possible for this, mentioned in the proposal. But
>>>>>>> any solution that makes it very easy to understand how to run the test
>>>>>>> is a viable option as well.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Alex
>>>>>>>
>>>>>>


[NEED HELP] Populating the change list for 2.30.0 release

2021-05-27 Thread Heejong Lee
Hi Beam developers,

I'm gathering information about the changes in the 2.30.0 release. If you
know of any important *new features* / *breaking changes* /
*deprecations* / *known issues* for the 2.30.0 release, please note them
down in CHANGES.md or just let me know.

Thanks!


Flaky test issue report (40)

2021-05-27 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-12322: 
FnApiRunnerTestWithGrpcAndMultiWorkers flaky (py precommit) (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12309: 
PubSubIntegrationTest.test_streaming_data_only flake (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12307: 
PubSubBigQueryIT.test_file_loads flake (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12303: Flake in 
PubSubIntegrationTest.test_streaming_with_attributes (created 2021-05-06)
https://issues.apache.org/jira/browse/BEAM-12293: 
FlinkSavepointTest.testSavepointRestoreLegacy flakes due to 
FlinkJobNotFoundException (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits 
flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13)
https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL 
failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27)
https://issues.apache.org/jira/browse/BEAM-12019: 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky (created 2021-03-18)
https://issues.apache.org/jira/browse/BEAM-11792: Python precommit failed 
(flaked?) installing package  (created 2021-02-10)
https://issues.apache.org/jira/browse/BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (created 2021-01-20)
https://issues.apache.org/jira/browse/BEAM-11662: elasticsearch tests 
failing (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest 
flake: network not found (py38 postcommit) (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11645: beam_PostCommit_XVR_Flink 
failing (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11541: 
testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. 
(created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-11540: Linter sometimes flakes 
on apache_beam.dataframe.frames_test (created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-10995: Java + Universal Local 
Runner: WindowingTest.testWindowPreservation fails (created 2020-09-30)
https://issues.apache.org/jira/browse/BEAM-10987: 
stager_test.py::StagerTest::test_with_main_session flaky on windows py3.6,3.7 
(created 2020-09-29)
https://issues.apache.org/jira/browse/BEAM-10968: flaky test: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (created 2020-09-25)
https://issues.apache.org/jira/browse/BEAM-10955: Flink Java Runner test 
flake: Could not find Flink job  (created 2020-09-23)
https://issues.apache.org/jira/browse/BEAM-10866: 
PortableRunnerTestWithSubprocesses.test_register_finalizations flaky on macOS 
(created 2020-09-09)
https://issues.apache.org/jira/browse/BEAM-10504: Failure / flake in 
ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn (created 
2020-07-15)
https://issues.apache.org/jira/browse/BEAM-10501: 
CheckGrafanaStalenessAlerts and PingGrafanaHttpApi fail with Connection refused 
(created 2020-07-15)
https://issues.apache.org/jira/browse/BEAM-10485: Failure / flake: 
ElasticsearchIOTest > testWriteWithIndexFn (created 2020-07-14)
https://issues.apache.org/jira/browse/BEAM-9649: 
beam_python_mongoio_load_test started failing due to mismatched results 
(created 2020-03-31)
https://issues.apache.org/jira/browse/BEAM-9392: TestStream tests are all 
flaky (created 2020-02-27)
https://issues.apache.org/jira/browse/BEAM-9232: 
BigQueryWriteIntegrationTests is flaky coercing to Unicode (created 2020-01-31)
https://issues.apache.org/jira/browse/BEAM-9119: 
apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest[...].test_large_elements
 is flaky (created 2020-01-14)
https://issues.apache.org/jira/browse/BEAM-8101: Flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful for 
Direct, Spark, Flink (created 2019-08-27)
https://issues.apache.org/jira/browse/BEAM-8035: 
[beam_PreCommit_Java_Phrase] [WatchTest.testMultiplePollsWithManyResults]  
Flake: Outputs must be in timestamp order (created 2019-08-22)
https://issues.apache.org/jira/browse/BEAM-7992: Unhandled type_constraint 
in 

P1 issues report (39)

2021-05-27 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake)).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-12398: 
beam_PostRelease_NightlySnapshot timing out (created 2021-05-24)
https://issues.apache.org/jira/browse/BEAM-12396: 
beam_PostCommit_XVR_Direct failed (flaked?) (created 2021-05-24)
https://issues.apache.org/jira/browse/BEAM-12389: 
beam_PostCommit_XVR_Dataflow flaky: Expand method not found (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12387: beam_PostCommit_Python* 
timing out (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12386: 
beam_PostCommit_Py_VR_Dataflow(_V2) failing metrics tests (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12380: Go SDK Kafka IO Transform 
implemented via XLang (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12374: Spark postcommit failing 
ResumeFromCheckpointStreamingTest (created 2021-05-20)
https://issues.apache.org/jira/browse/BEAM-12337: Replace invalid UW 
container name for Java SDK (created 2021-05-14)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12316: LGPL in bundled 
dependencies (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12310: 
beam_PostCommit_Java_DataflowV2 failing (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12279: Implement 
destination-dependent sharding in FileIO.writeDynamic (created 2021-05-04)
https://issues.apache.org/jira/browse/BEAM-12256: 
PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some 
Avro logical types (created 2021-04-29)
https://issues.apache.org/jira/browse/BEAM-12231: 
beam_PostRelease_NightlySnapshot failing (created 2021-04-27)
https://issues.apache.org/jira/browse/BEAM-11959: Python Beam SDK Harness 
hangs when installing pip packages (created 2021-03-11)
https://issues.apache.org/jira/browse/BEAM-11906: No trigger early 
repeatedly for session windows (created 2021-03-01)
https://issues.apache.org/jira/browse/BEAM-11875: XmlIO.Read does not 
handle XML encoding per spec (created 2021-02-26)
https://issues.apache.org/jira/browse/BEAM-11828: JmsIO is not 
acknowledging messages correctly (created 2021-02-17)
https://issues.apache.org/jira/browse/BEAM-11755: Cross-language 
consistency (RequiresStableInputs) is quietly broken (at least on portable 
flink runner) (created 2021-02-05)
https://issues.apache.org/jira/browse/BEAM-11578: `dataflow_metrics` 
(python) fails with TypeError (when int overflowing?) (created 2021-01-06)
https://issues.apache.org/jira/browse/BEAM-11434: Expose Spanner 
admin/batch clients in Spanner Accessor (created 2020-12-10)
https://issues.apache.org/jira/browse/BEAM-11148: Kafka 
commitOffsetsInFinalize OOM on Flink (created 2020-10-28)
https://issues.apache.org/jira/browse/BEAM-11017: Timer with dataflow 
runner can be set multiple times (dataflow runner) (created 2020-10-05)
https://issues.apache.org/jira/browse/BEAM-10670: Make non-portable 
Splittable DoFn the only option when executing Java "Read" transforms (created 
2020-08-10)
https://issues.apache.org/jira/browse/BEAM-10617: python 
CombineGlobally().with_fanout() cause duplicate combine results for sliding 
windows (created 2020-07-31)
https://issues.apache.org/jira/browse/BEAM-10569: SpannerIO tests don't 
actually assert anything. (created 2020-07-23)
https://issues.apache.org/jira/browse/BEAM-10288: Quickstart documents are 
out of date (created 2020-06-19)
https://issues.apache.org/jira/browse/BEAM-10244: Populate requirements 
cache fails on poetry-based packages (created 2020-06-11)
https://issues.apache.org/jira/browse/BEAM-10100: FileIO writeDynamic with 
AvroIO.sink not writing all data (created 2020-05-27)
https://issues.apache.org/jira/browse/BEAM-9564: Remove insecure ssl 
options from MongoDBIO (created 2020-03-20)
https://issues.apache.org/jira/browse/BEAM-9455: Environment-sensitive 
provisioning for Dataflow (created 2020-03-05)
https://issues.apache.org/jira/browse/BEAM-9293: Python direct runner 
doesn't emit empty pane when it should (created 2020-02-11)
https://issues.apache.org/jira/browse/BEAM-8986: SortValues may not work 
correct for numerical types (created 2019-12-17)
https://issues.apache.org/jira/browse/BEAM-8985: SortValues should fail if 
SecondaryKey coder is not deterministic (created 2019-12-17)
https://issues.apache.org/jira/browse/BEAM-8407: [SQL] Some Hive tests 

Re: Compatibility Check Badges need attention

2021-05-27 Thread Ahmet Altay
Is this compatibility checking tool still supported? If not, maybe we can
remove these badges?

On Wed, May 19, 2021 at 2:43 PM Lily Li wrote:

> I have not worked on this project for quite some time.
>
> +Colin Nelson 
>
> On Tue, May 18, 2021 at 4:00 PM Valentyn Tymofieiev wrote:
>
>> Looks like they were added in https://github.com/apache/beam/pull/8791.
>> +Lily Li do you know by chance if this tooling is
>> still maintained?
>>
>>
>> On Tue, May 18, 2021 at 2:27 PM Brian Hulette 
>> wrote:
>>
>>> Hi all,
>>> I just noticed that the two "compatibility check" badges on
>>> github.com/apache/beam are in a bad state [1,2]. Does anyone have
>>> context on these? What do we need to do to fix them?
>>>
>>> It looks like it may be a configuration issue. Some of the complaints
>>> are related to python 2. It's also notable that in one of the Python 3
>>> sections it states "The package does not support this version of python."
>>>
>>> Thanks,
>>> Brian
>>>
>>> [1]
>>> https://python-compatibility-tools.appspot.com/one_badge_target?package=apache-beam%5Bgcp%5D
>>> [2]
>>> https://python-compatibility-tools.appspot.com/one_badge_target?package=git%2Bgit%3A//github.com/apache/beam.git%23subdirectory%3Dsdks/python
>>>
>>