It looks like the only runner left to make work is Flink. Let's take the LeaderBoardIT [1] test as an example: when calling the runLeaderBoard method from LeaderBoard, all the reading, processing, and writing completes successfully. I can see the BigQuery dataset and table get created and populated. Nonetheless, the pipeline never finishes, so the validation step is never reached and the test fails after two hours. Is there a Flink-specific way to force the pipeline to finish?
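For reference, one runner-agnostic option is to bound the wait and cancel the job before running the validation. This is only a sketch and has not been verified against the Flink runner. In Beam this would presumably map to `PipelineResult.waitUntilFinish(Duration)`, which returns `null` when the timeout elapses, followed by `PipelineResult.cancel()`, which some runners may not support. The snippet below models that pattern with plain `java.util.concurrent` so it runs without Beam. The class, enum, and method names are hypothetical stand-ins, and the interrupt-driven loop stands in for an unbounded streaming job.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedStreamingRun {

    /** Hypothetical terminal states, loosely mirroring the idea of a pipeline result state. */
    enum State { DONE, CANCELLED }

    /**
     * Runs a job that may never terminate on its own, waits at most maxWaitMillis,
     * and cancels it if it is still running. Returns the resulting terminal state.
     */
    static State runWithDeadline(Runnable job, long maxWaitMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> handle = executor.submit(job);
            try {
                // Bounded wait: analogous to waitUntilFinish(Duration) returning a state.
                handle.get(maxWaitMillis, TimeUnit.MILLISECONDS);
                return State.DONE;
            } catch (TimeoutException e) {
                // Still running after the deadline: interrupt it, analogous to cancel().
                handle.cancel(true);
                return State.CANCELLED;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for an unbounded streaming job: it only stops when interrupted.
        Runnable neverEnding = () -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so the loop condition ends the job.
                    Thread.currentThread().interrupt();
                }
            }
        };
        // Stand-in for a bounded job that finishes on its own.
        Runnable bounded = () -> { };

        System.out.println(runWithDeadline(neverEnding, 200)); // prints CANCELLED
        System.out.println(runWithDeadline(bounded, 200));     // prints DONE
    }
}
```

In an IT built this way, the validation queries would run only after the bounded wait returns, regardless of which state it reports; the timeout would need to be long enough for the expected output to land in BigQuery.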
Thanks!

[1] https://github.com/apache/beam/blob/b165d1fb092e4ba4ad1174692bfd9022465c6d5d/examples/java/src/test/java/org/apache/beam/examples/complete/game/LeaderBoardIT.java

On Tue, Mar 22, 2022 at 1:51 PM Fer Morales Martinez wrote:
> Great!
> I will change the log retention policy just to double check they are passing.
>
> Thanks again, Kyle!
>
> On Tue, Mar 22, 2022 at 12:51 PM Kyle Weaver wrote:
>
>> Those log messages are only printed by the Dataflow runner, so we shouldn't expect to see them printed by other runners.
>> https://github.com/apache/beam/blob/9a36490de8129359106baf37e1d0b071e19e303a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/TestDataflowRunner.java#L275
>>
>> Not sure about the Dataflow job, but I don't think there's sufficient evidence to show they're not passing. The success logs are likely just covered up by the subsequent cleanup logs.
>>
>> If you really want to make sure, you can temporarily change the Jenkins job definition to retain logs. This will require rerunning the seed job, though. https://stackoverflow.com/a/54720951
>>
>> PS I believe there is work in progress to somehow correlate Jenkins jobs with Dataflow job IDs (aside from just logs)?
>>
>>
>> On Tue, Mar 22, 2022 at 11:51 AM Fer Morales Martinez wrote:
>>
>>> For some tests [1], [2] I see something along the lines of:
>>>
>>> INFO: Job 2022-03-17_09_56_46-14482025605245680862 finished with status DONE.
>>> Mar 17, 2022 5:02:44 PM org.apache.beam.runners.dataflow.TestDataflowRunner checkForPAssertSuccess
>>> INFO: Success result for Dataflow job 2022-03-17_09_56_46-14482025605245680862. Found 0 success, 0 failures out of 0 expected assertions.
>>> Mar 17, 2022 5:02:44 PM org.apache.beam.runners.dataflow.DataflowPipelineJob logTerminalState
>>> INFO: Job 2022-03-17_09_56_46-14482025605245680862 finished with status DONE.
>>>
>>> But for others [3], I'm not seeing those success and DONE messages, which led me to believe they fail at some point during execution.
>>>
>>> [1] https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_V2_PR/60/testReport/org.apache.beam.examples.complete.game/UserScoreIT/testE2EUserScore/
>>> [2] https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_V2_PR/60/testReport/org.apache.beam.examples.complete.game/HourlyTeamScoreIT/testE2EHourlyTeamScore/
>>> [3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_V2_PR/60/testReport/org.apache.beam.examples.complete.game/LeaderBoardIT/testE2ELeaderBoard/
>>>
>>> On Tue, Mar 22, 2022 at 10:55 AM Kyle Weaver wrote:
>>>
>>>> Are you sure about the false positives? I believe Jenkins truncates JUnit logs intentionally for passing tests to save disk space.
>>>> https://plugins.jenkins.io/junit/
>>>>
>>>> On Tue, Mar 22, 2022 at 10:51 AM Fer Morales Martinez wrote:
>>>>
>>>>> Totally!
>>>>>
>>>>> Flink
>>>>>
>>>>> https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Flink/8/testReport/org.apache.beam.examples.complete.game/
>>>>>
>>>>> Direct
>>>>>
>>>>> https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Direct/10/testReport/org.apache.beam.examples.complete.game/
>>>>>
>>>>> Dataflow
>>>>>
>>>>> https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_V2_PR/60/testReport/org.apache.beam.examples.complete.game/
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On Tue, Mar 22, 2022 at 10:24 AM Kyle Weaver wrote:
>>>>>
>>>>>> Can you send the links to the Jenkins jobs?
>>>>>>
>>>>>> On Tue, Mar 22, 2022 at 9:28 AM Fer Morales Martinez wrote:
>>>>>>
>>>>>>> Thanks for the reply, Kyle!
>>>>>>>
>>>>>>> I will skip streaming tests for the Spark runner for now.
>>>>>>>
>>>>>>> After *disabling* the *GameStatsIT* test, these are the results. At first sight it looks like all three runners successfully execute some or all tests, but upon closer inspection I realized those were just false positives.
>>>>>>>
>>>>>>> Flink
>>>>>>>
>>>>>>> LeaderBoardIT: OutOfMemoryError: GC overhead limit exceeded
>>>>>>>
>>>>>>> StatefulTeamScoreIT: OutOfMemoryError: GC overhead limit exceeded
>>>>>>>
>>>>>>> Would increasing the memory of the Jenkins runner help?
>>>>>>>
>>>>>>>
>>>>>>> The successful executions of HourlyTeamScoreIT and UserScoreIT are false positives. The logs get truncated, but I see a message about setting the fasterCopy option for better performance. I will try that and let you know my findings.
>>>>>>>
>>>>>>> Direct
>>>>>>>
>>>>>>> All tests fail in Jenkins, but for some reason they get reported as successful. It looks like something fails, but the log gets truncated.
>>>>>>>
>>>>>>> Locally, all run successfully.
>>>>>>>
>>>>>>> Dataflow
>>>>>>>
>>>>>>> StatefulTeamScoreIT and LeaderBoardIT are failing, but the log file gets truncated.
>>>>>>>
>>>>>>> HourlyTeamScoreIT and UserScoreIT are successful.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 21, 2022 at 9:50 AM Kyle Weaver wrote:
>>>>>>>
>>>>>>>> These examples are all streaming pipelines, so I'm not surprised the Spark runner has issues there. I would describe the Spark runner's streaming mode as sketchy at best.
>>>>>>>>
>>>>>>>> Not sure what is happening with other runners. Your latest email summary doesn't seem to match the test results on https://github.com/apache/beam/pull/17015.
>>>>>>>>
>>>>>>>> On Fri, Mar 18, 2022 at 8:17 PM Ahmet Altay wrote:
>>>>>>>>
>>>>>>>>> Adding @Kyle Weaver and @Kenneth Knowles - they might have some idea.
>>>>>>>>>
>>>>>>>>> On Fri, Mar 18, 2022 at 2:48 PM Fer Morales Martinez wrote:
>>>>>>>>>
>>>>>>>>>> Hello everyone,
>>>>>>>>>>
>>>>>>>>>> These are my latest findings. If anyone has a clue as to what's happening, any help would be greatly appreciated.
>>>>>>>>>>
>>>>>>>>>> Spark
>>>>>>>>>>
>>>>>>>>>> *StatefulTeamScoreIT*
>>>>>>>>>>
>>>>>>>>>> According to the compatibility matrix [1], stateful processing is not yet supported, hence the error. I will sickbay this test and wait for that to be implemented.
>>>>>>>>>>
>>>>>>>>>> *LeaderBoardIT*
>>>>>>>>>>
>>>>>>>>>> It looks like PubSubClient is being shut down before reading can complete.
>>>>>>>>>>
>>>>>>>>>> *GameStatsIT*
>>>>>>>>>>
>>>>>>>>>> No TransformEvaluator registered for UNBOUNDED transform View.CreatePCollectionView.
>>>>>>>>>>
>>>>>>>>>> Flink
>>>>>>>>>>
>>>>>>>>>> The runner dies after two hours, possibly due to the GameStats example. I will disable that test in order to complete the other ones.
>>>>>>>>>>
>>>>>>>>>> Direct
>>>>>>>>>>
>>>>>>>>>> All tests fail in Jenkins. Locally, only GameStats fails.
>>>>>>>>>>
>>>>>>>>>> Dataflow
>>>>>>>>>>
>>>>>>>>>> Contrary to the previous email, StatefulTeamScoreIT, LeaderBoardIT, and GameStatsIT are failing, but the log file gets truncated, so there are no further details at the moment.
>>>>>>>>>>
>>>>>>>>>> HourlyTeamScoreIT and UserScoreIT are successful.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Again, the PR is here [2] in case you want to take a look at how the integration test scaffolding is being built.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1] https://beam.apache.org/documentation/runners/capability-matrix/what-is-being-computed/
>>>>>>>>>>
>>>>>>>>>> [2] https://github.com/apache/beam/pull/17015/files
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 16, 2022 at 4:03 PM Fer Morales Martinez wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi team!
>>>>>>>>>>>
>>>>>>>>>>> I've been implementing integration tests [1] for the examples under the beam/examples/complete/game folder.
>>>>>>>>>>> Most of the integration tests (the one for *GameStats* being the only outlier so far) run successfully when executed *locally* with the *Direct* runner.
>>>>>>>>>>> However, I've found that when executed against the Spark, Flink and Direct runners in Jenkins, they fail with a different error depending on the runner.
>>>>>>>>>>> The Direct runner on Jenkins plays mostly well [2] with all the ITs; again, *GameStats* is the only one that for some reason gets stuck when trying to get a fixedWindow [3].
>>>>>>>>>>> With Spark [4], three tests fail, two of which are executed successfully by the Direct runner.
>>>>>>>>>>> Flink [5] gives up after two hours of execution, possibly due to the same *GameStats* issue.
>>>>>>>>>>> Fortunately, Dataflow executes all the ITs successfully [6], even the *GameStats* one.
>>>>>>>>>>>
>>>>>>>>>>> I guess my question would be: what could be improved, integration-testing-wise? How would you pinpoint the root cause in this kind of scenario?
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>> - Fer
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/apache/beam/pull/17015/files
>>>>>>>>>>> [2] https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Direct/9/testReport/
>>>>>>>>>>> [3] https://github.com/apache/beam/blob/3e6e8a7e2afa08552849518439577eee07700f77/examples/java/src/main/java/org/apache/beam/examples/complete/game/GameStats.java#L292
>>>>>>>>>>> [4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Spark/7/testReport/
>>>>>>>>>>> [5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Flink/7/
>>>>>>>>>>> [6] https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_V2_PR/59/testReport/
>>>>>>>>>>
>>>>>>>>>> *This email and its contents (including any attachments) are being sent to you on the condition of confidentiality and may be protected by legal privilege. Access to this email by anyone other than the intended recipient is unauthorized. If you are not the intended recipient, please immediately notify the sender by replying to this message and delete the material immediately from your system. Any further use, dissemination, distribution or reproduction of this email is strictly prohibited. Further, no representation is made with respect to any content contained in this email.*
