Re: [I] The PostCommit XVR Samza job is flaky [beam]

2024-05-25 Thread via GitHub


github-actions[bot] commented on issue #30601:
URL: https://github.com/apache/beam/issues/30601#issuecomment-2132049111

   Reopening since the workflow is still flaky


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Changed the retry order for test_big_query_write_temp_table_append_schema_update [beam]

2024-05-25 Thread via GitHub


github-actions[bot] commented on PR #31407:
URL: https://github.com/apache/beam/pull/31407#issuecomment-2131456730

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @jrmccluskey for label python.
   R: @chamikaramj for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).





Re: [PR] Changed the retry order for test_big_query_write_temp_table_append_schema_update [beam]

2024-05-25 Thread via GitHub


liferoad commented on PR #31407:
URL: https://github.com/apache/beam/pull/31407#issuecomment-2131407578

   @tvalentyn I'm not sure whether you can trigger the Python PostCommits with
my PR. They are currently broken due to this retry:
https://github.com/apache/beam/actions/runs/9235128248





[PR] Changed the retry order for test_big_query_write_temp_table_append_schema_update [beam]

2024-05-25 Thread via GitHub


liferoad opened a new pull request, #31407:
URL: https://github.com/apache/beam/pull/31407

   After https://github.com/apache/beam/pull/31364 was merged, we started seeing 
this:
   
   ```
   
   === FAILURES ===
   _ BigQueryWriteIntegrationTests.test_big_query_write_temp_table_append_schema_update _
   [gw5] linux -- Python 3.10.14 /runner/_work/beam/beam/build/gradleenv/417525523/bin/python3.10

   args = (,)
   kw = {}

       @functools.wraps(
           f, functools.WRAPPER_ASSIGNMENTS + ("__defaults__", "__kwdefaults__")
       )
       def wrapped_f(*args: t.Any, **kw: t.Any) -> t.Any:
   >       return self(f, *args, **kw)

   ../../build/gradleenv/417525523/lib/python3.10/site-packages/tenacity/__init__.py:330:
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   ../../build/gradleenv/417525523/lib/python3.10/site-packages/tenacity/__init__.py:467: in __call__
       do = self.iter(retry_state=retry_state)
   ../../build/gradleenv/417525523/lib/python3.10/site-packages/tenacity/__init__.py:368: in iter
       result = action(retry_state)
   ../../build/gradleenv/417525523/lib/python3.10/site-packages/tenacity/__init__.py:410: in exc_check
       raise retry_exc.reraise()
   ../../build/gradleenv/417525523/lib/python3.10/site-packages/tenacity/__init__.py:183: in reraise
       raise self.last_attempt.result()
   /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/concurrent/futures/_base.py:451: in result
       return self.__get_result()
   /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/concurrent/futures/_base.py:403: in __get_result
       raise self._exception
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

   self = , wait=, before=, after=)>
   fn = None
   args = (,)
   kwargs = {}
   retry_state =
   do =

       def __call__(
           self,
           fn: t.Callable[..., WrappedFnReturnT],
           *args: t.Any,
           **kwargs: t.Any,
       ) -> WrappedFnReturnT:
           self.begin()

           retry_state = RetryCallState(retry_object=self, fn=fn, args=args, kwargs=kwargs)
           while True:
               do = self.iter(retry_state=retry_state)
               if isinstance(do, DoAttempt):
                   try:
   >                   result = fn(*args, **kwargs)
   E                   TypeError: 'NoneType' object is not callable

   ../../build/gradleenv/417525523/lib/python3.10/site-packages/tenacity/__init__.py:470: TypeError
   
   
   
   ```
   
   
https://ge.apache.org/s/t3t76xr2mbgdo/console-log/task/:sdks:python:test-suites:direct:py310:postCommitIT?anchor=203=1
   
   I suspect the retry order might cause this issue.
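
   A minimal sketch of how decorator order alone can produce a
`'NoneType' object is not callable` error like the one above. It assumes the
retry decorator ends up wrapping the result of another decorator that returns
`None`; `register_as` and `simple_retry` are hypothetical stand-ins written for
this illustration, not tenacity or the decorators used in the Beam test.

```python
registry = {}  # collects the undecorated functions by name


def register_as(name):
    """Hypothetical decorator factory: records fn in `registry`, returns None."""
    def decorator(fn):
        registry[name] = fn
        return None  # the decorated name in the enclosing scope becomes None
    return decorator


def simple_retry(attempts):
    """Hypothetical stand-in for a retry decorator (not tenacity itself)."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(attempts):
                try:
                    return fn(*args, **kwargs)  # fails if fn is None
                except Exception as exc:
                    last_exc = exc
            raise last_exc  # re-raise the last failure once attempts run out
        return wrapper
    return decorator


# Problematic order: register_as runs first and hands None to simple_retry.
@simple_retry(3)
@register_as("broken")
def broken():
    return "ok"


# Safe order: simple_retry wraps the real function, then it is registered.
@register_as("fixed")
@simple_retry(3)
def fixed():
    return "ok"
```

   Decorators apply bottom-up, so in the first ordering the retry wrapper
receives `None` and every attempt fails with a `TypeError`; in the second
ordering the retry wrapper receives the real function, and the registered entry
`registry["fixed"]` is callable.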
   
   
   

Re: [PR] [flink] #31390 emit watermark with empty source [beam]

2024-05-25 Thread via GitHub


je-ik merged PR #31391:
URL: https://github.com/apache/beam/pull/31391





Re: [I] The PostCommit XVR Spark3 job is flaky [beam]

2024-05-25 Thread via GitHub


github-actions[bot] commented on issue #30602:
URL: https://github.com/apache/beam/issues/30602#issuecomment-2131310534

   Reopening since the workflow is still flaky





Re: [PR] 31112 drop flink 1.14 [beam]

2024-05-25 Thread via GitHub


je-ik merged PR #31394:
URL: https://github.com/apache/beam/pull/31394





Re: [I] [Task]: Remove Flink 1.14 and cleanup [beam]

2024-05-25 Thread via GitHub


je-ik closed issue #31112: [Task]: Remove Flink 1.14 and cleanup
URL: https://github.com/apache/beam/issues/31112





Re: [PR] Parse YAML ExpansionService configs directly using SnakeYAML [beam]

2024-05-25 Thread via GitHub


github-actions[bot] commented on PR #31406:
URL: https://github.com/apache/beam/pull/31406#issuecomment-2131302369

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @kennknowles for label java.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).





Re: [PR] Parse YAML ExpansionService configs directly using SnakeYAML [beam]

2024-05-25 Thread via GitHub


chamikaramj commented on PR #31406:
URL: https://github.com/apache/beam/pull/31406#issuecomment-2131302126

   assign set of reviewers





Re: [PR] 31112 drop flink 1.14 [beam]

2024-05-25 Thread via GitHub


Abacn commented on PR #31394:
URL: https://github.com/apache/beam/pull/31394#issuecomment-2131269856

   Python ML PreCommit was recently added and has been flaky. The TypeScript
test has been permanently red for a long time. Neither is related to this PR.





Re: [PR] Revert #28614: Add UseDataStreamForBatch option to Flink runner [beam]

2024-05-25 Thread via GitHub


github-actions[bot] commented on PR #29993:
URL: https://github.com/apache/beam/pull/29993#issuecomment-2131242056

   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.





Re: [PR] add ExternalTransformProvider example [beam]

2024-05-25 Thread via GitHub


github-actions[bot] commented on PR #30666:
URL: https://github.com/apache/beam/pull/30666#issuecomment-2131242047

   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.





Re: [PR] Parse YAML ExpansionService configs directly using SnakeYAML [beam]

2024-05-25 Thread via GitHub


github-actions[bot] commented on PR #31406:
URL: https://github.com/apache/beam/pull/31406#issuecomment-2131112044

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`





Re: [I] [Bug]: KinesisIO source on FlinkRunner initializes the same splits twice [beam]

2024-05-25 Thread via GitHub


akashk99 commented on issue #31313:
URL: https://github.com/apache/beam/issues/31313#issuecomment-2131023652

   Thanks for the suggestions, I will give them a try. I believe the first 
comment of the ticket provides a simple pipeline that exhibits this behavior on 
the Flink runner, but if that doesn't work, I'm happy to provide another. The 
example also submits the job in detached mode, which may be related, although 
I have seen similar behavior without it. I appreciate your help looking into 
this; if there's anything I can assist with, please let me know.





Re: [PR] 31112 drop flink 1.14 [beam]

2024-05-25 Thread via GitHub


je-ik commented on PR #31394:
URL: https://github.com/apache/beam/pull/31394#issuecomment-2130996448

   @Abacn I'm not sure why the Python ML tests fail; should this be ignored for 
this PR? Do the TypeScript tests still generally fail?





Re: [I] [Bug]: KinesisIO source on FlinkRunner initializes the same splits twice [beam]

2024-05-25 Thread via GitHub


je-ik commented on issue #31313:
URL: https://github.com/apache/beam/issues/31313#issuecomment-2130993220

   > I'm surprised this is a bug, considering that restoring from a Flink 
savepoint is a pretty common use case. Is it possible there is some 
configuration missing somewhere? I haven't been able to find anyone else online 
experiencing this same issue, but I was able to replicate it using both Kinesis 
and Kafka. Given how common a use case it is, I'm not 100% sure this is in fact 
a bug; it is most likely some user error on my part.
   
   Can you please provide a minimal example and setup to reproduce the behavior?
   
   > I can make do without savepoints by utilizing Kafka offset commits and 
consumer groups to ensure no data is lost, but I can't figure out a way to not 
lose data that is windowed but not triggered when the Flink application is 
stopped. Maybe you know of a solution to that problem?
   
   You can drain the pipeline; see 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/cli/#terminating-a-job
   
   > It seems like a lot of the subtasks aren't being utilized when stripping 
IDs with beam_fn_api, despite the number of shards being 20 and parallelism 
being 24 (in theory there should only be 4 idle subtasks).
   
   This is related to how Flink computes target splits. It is affected by the 
maximum parallelism (which is computed automatically if not specified). You 
can try increasing it via `--maxParallelism=32768` (32768 is the maximum 
value); this could make the assignment more balanced.
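
   A rough sketch of why a larger maximum parallelism can even out the
assignment. It assumes the assignment follows Flink's key-group range scheme
(key group `k` goes to subtask `k * parallelism // max_parallelism`, and the
default maximum parallelism is derived from the operator parallelism with a
lower bound of 128). The function names are illustrative, and the real split
assignment in the runner may differ.

```python
from collections import Counter

LOWER_BOUND_MAX_PARALLELISM = 128    # assumed Flink default lower bound
UPPER_BOUND_MAX_PARALLELISM = 32768  # 1 << 15, the maximum value noted above


def round_up_to_power_of_two(x):
    """Smallest power of two that is >= x (for x >= 1)."""
    return 1 << (x - 1).bit_length()


def default_max_parallelism(parallelism):
    """Max parallelism chosen automatically when none is configured."""
    candidate = round_up_to_power_of_two(parallelism + parallelism // 2)
    return min(max(candidate, LOWER_BOUND_MAX_PARALLELISM),
               UPPER_BOUND_MAX_PARALLELISM)


def operator_index(max_parallelism, parallelism, key_group):
    """Subtask index that owns the given key group."""
    return key_group * parallelism // max_parallelism


def key_groups_per_subtask(max_parallelism, parallelism):
    """Count how many key groups each subtask owns."""
    return Counter(
        operator_index(max_parallelism, parallelism, kg)
        for kg in range(max_parallelism)
    )


auto = default_max_parallelism(24)
coarse = key_groups_per_subtask(auto, 24)
fine = key_groups_per_subtask(UPPER_BOUND_MAX_PARALLELISM, 24)
print(auto, min(coarse.values()), max(coarse.values()))
print(min(fine.values()), max(fine.values()))
```

   Under these assumptions, parallelism 24 gets 128 key groups by default, so
each subtask owns 5 or 6 of them (a 20% spread). With 32768 key groups each
subtask owns 1365 or 1366, so the ranges are nearly uniform.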
   





Re: [PR] [flink] #31390 emit watermark with empty source [beam]

2024-05-25 Thread via GitHub


je-ik commented on PR #31391:
URL: https://github.com/apache/beam/pull/31391#issuecomment-2130976269

   > * This sounds similar to [[runners-flink] Fix watermark emission for 
empty splits (#29816) #30969](https://github.com/apache/beam/pull/30969). What 
is the difference here?
   
   The fix in #30969 was related, but different. A source can be empty either 
_temporarily_ or _permanently_. A permanently empty source is signaled by the 
watermark advancing to infinity. The split can then be closed, which moves the 
watermark forward, because a closed split no longer holds the watermark back.
   
   This PR fixes the other case: the source is not emitting any data but 
_does not_ move the watermark to infinity, and instead relies on some idle 
source policy. Before this PR, no watermark was emitted downstream _until at 
least one element was emitted from the source_. This is fixed now.
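
   A toy model of the behavior described above, not FlinkRunner code. It
assumes a reader that is polled repeatedly and reports an optional element plus
its current watermark; `emitted_watermarks` and the `gate_on_first_element`
flag are hypothetical names used only to contrast the two behaviors.

```python
def emitted_watermarks(polls, gate_on_first_element):
    """Return the watermarks forwarded downstream for a sequence of polls.

    polls: list of (element_or_None, source_watermark) pairs.
    gate_on_first_element: if True, hold back every watermark until the
        source has produced at least one element (the pre-fix behavior
        described above); if False, forward watermarks regardless.
    """
    seen_element = False
    forwarded = []
    for element, watermark in polls:
        if element is not None:
            seen_element = True
        if seen_element or not gate_on_first_element:
            forwarded.append(watermark)
    return forwarded


# An idle source that advances its watermark before producing any data.
polls = [(None, 10), (None, 20), ("a", 30)]

before_fix = emitted_watermarks(polls, gate_on_first_element=True)
after_fix = emitted_watermarks(polls, gate_on_first_element=False)
```

   In this model the gated variant forwards nothing until the first element
arrives, so downstream windows cannot fire while the source is idle; the
ungated variant forwards every watermark the idle policy produces.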
   
   > 
   > * I also observed a similar issue with JmsIO on the Dataflow runner 
("watermark does not increase when there is no incoming data for a while"), and 
the fix [[DRAFT] Attempt fix Jms watermark 
#30337](https://github.com/apache/beam/pull/30337) didn't work. I am wondering 
whether [[Bug]: FlinkRunner does not emit watermark with empty source 
#31390](https://github.com/apache/beam/issues/31390) is generic at the SDK 
level and whether a fix could be proposed in general?
   
   All these fixes relate to Flink only. These issues were introduced by the 
source refactoring in FlinkRunner, so there is nothing here that can be 
extended to the general case.

