Filed https://issues.apache.org/jira/browse/BEAM-7016 with the details.
On Fri, Apr 5, 2019 at 1:47 PM Lukasz Cwik <[email protected]> wrote: > Yes, it seems like the reset() method resets System.out even if it never > was initialized. Seems like a simple fix to have reset() be safe to call at > all times. Csaba or Michael, would either of you like to open a PR and send > it my way? > > On Fri, Apr 5, 2019 at 1:39 PM Michael Luckey <[email protected]> wrote: > >> Ah... >> >> Did not yet debug. But wouldn't [1] mean setting system.out to 'null' on >> first call to @setup ? As there was no previous call to >> DataflowWorkerLoggingInitializer.initialize? >> >> >> https://github.com/apache/beam/blame/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java#L81 >> >> On Fri, Apr 5, 2019 at 10:12 PM Lukasz Cwik <[email protected]> wrote: >> >>> We replace System.out/err to capture user logs and forward the logs for >>> the Dataflow worker[1]. It could be that this test[2] is not resetting it >>> afterwards which leaves it at null and then some future code causes it to >>> fail. >>> >>> 1: >>> https://github.com/apache/beam/blob/e69d69d72dc5b9c3d6069c0b71825c3c2b0b4e61/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializer.java#L132 >>> 2: >>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java >>> >>> On Fri, Apr 5, 2019 at 1:42 AM Michael Luckey <[email protected]> >>> wrote: >>> >>>> FWIW, the TimerRecieverTest is also failing consistently on my macOS. >>>> Running on my ubuntu VM, they pass. >>>> >>>> Now the stacktrace indicates an NullPinterException thrown out of the >>>> finally block [1] >>>> >>>> As this is really bad and of course would hide the cause, I added some >>>> >>>> diff --git >>>> a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java >>>> b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java >>>> >>>> index 708b669112..8c21928da1 100644 >>>> >>>> --- >>>> a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java >>>> >>>> +++ >>>> b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java >>>> >>>> @@ -169,7 +169,12 @@ public class FnHarness { >>>> >>>> LOG.info("Entering instruction processing loop"); >>>> >>>> control.processInstructionRequests(options.as >>>> (GcsOptions.class).getExecutorService()); >>>> >>>> } finally { >>>> >>>> - System.out.println("Shutting SDK harness down."); >>>> >>>> + try { >>>> >>>> + System.out.println("Shutting SDK harness down."); >>>> >>>> + } catch (NullPointerException npe) { >>>> >>>> + LOG.warn("NPE sys.out=" + System.out, npe); >>>> >>>> + } >>>> >>>> } >>>> >>>> } >>>> >>>> } >>>> >>>> No my test shows outputs >>>> >>>> Apr 05, 2019 9:29:59 AM org.apache.beam.fn.harness.FnHarness main >>>> WARNING: NPE sys.out=null >>>> java.lang.NullPointerException >>>> at org.apache.beam.fn.harness.FnHarness.main(FnHarness.java:173) >>>> at >>>> org.apache.beam.runners.dataflow.worker.fn.control.TimerReceiverTest.lambda$setUp$0(TimerReceiverTest.java:123) >>>> at >>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) >>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) >>>> at java.lang.Thread.run(Thread.java:748) >>>> >>>> >>>> >>>> and pass (sic!) >>>> >>>> Something weird is going on here.... >>>> >>>> Now replacing that 'System.out' with 'LOG.info' seems also to be >>>> working. At least I could not reproduce the failure trying several times. I >>>> am lost here, as there is probably a good reason to use sys out here. >>>> >>>> Btw. After the first failure with NullPointerExceptions. successive >>>> runs seem to fail for different reasons. Getting timeout in test setup. >>>> Unsure, might indicate some grpc port/server startup issue because previous >>>> run did not do proper cleanup. >>>> >>>> best, >>>> >>>> michel >>>> >>>> [1] >>>> https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java#L172 >>>> >>>> On Thu, Apr 4, 2019 at 10:42 PM Lukasz Cwik <[email protected]> wrote: >>>> >>>>> I looked at the failures you were experiencing and the error message >>>>> doesn't provide enough information to figure out why it is failing. >>>>> >>>>> On Wed, Apr 3, 2019 at 9:23 PM Csaba Kassai <[email protected]> wrote: >>>>> >>>>>> Oh, I just missed it then :) >>>>>> Thank you Lukasz for connecting us. >>>>>> >>>>>> Yeah, the two TimerReceiverTest tests fail reliably for me. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Tue, 2 Apr 2019 at 23:53, Lukasz Cwik <[email protected]> wrote: >>>>>> >>>>>>> +Ahmed >>>>>>> >>>>>>> I have added you as a contributor. >>>>>>> >>>>>>> It seems as though Ahmed had just picked up BEAM-3489 yesterday. >>>>>>> Reach out to Ahmed if you would like to help them out with the task. >>>>>>> >>>>>>> Was TimerReceiverTest failing reliably when performing a parallel >>>>>>> build or is it flaky? >>>>>>> >>>>>>> I have asked Chamikara to take a look for PR 8180. >>>>>>> >>>>>>> >>>>>>> On Tue, Apr 2, 2019 at 8:33 AM Csaba Kassai <[email protected]> wrote: >>>>>>> >>>>>>>> Hi All! >>>>>>>> >>>>>>>> I am Csabi, I would be happy to contribute to Beam. >>>>>>>> Could you grant me contributor role and assign issue BEAM-3489 >>>>>>>> <https://issues.apache.org/jira/browse/BEAM-3489> to me? My user >>>>>>>> name is "csabakassai". >>>>>>>> >>>>>>>> After I checked out the code and tried to do a gradle check I found >>>>>>>> these issues: >>>>>>>> >>>>>>>> 1. *jUnit tests fails:* the TimerReceiverTest fails in the >>>>>>>> ":beam-runners-google-cloud-dataflow-java-fn-api-worker:test" and >>>>>>>> the >>>>>>>> ":beam-runners-google-cloud-dataflow-java-legacy-worker:test" >>>>>>>> tasks. When I >>>>>>>> execute tests independently everything is fine, so I disabled the >>>>>>>> parallel >>>>>>>> build and this solves the problem. I have not investigated further, >>>>>>>> do you >>>>>>>> have any more insights on this issue? I have attached the test >>>>>>>> reports. >>>>>>>> 2. *python test fail*: there is a python test which fails if >>>>>>>> the current offset of your timezone differs from the offset in >>>>>>>> 1970. In my >>>>>>>> case the Singapore is now GMT+8 and it was GMT+7:30 in 1970. I >>>>>>>> created a >>>>>>>> ticket for this issue where I I describe the problem in details: >>>>>>>> https://jira.apache.org/jira/browse/BEAM-6947. Could you assign >>>>>>>> the ticket to me? Also I created a PR with a possible fix: >>>>>>>> https://github.com/apache/beam/pull/8180. Could you suggest me >>>>>>>> a reviewer? >>>>>>>> >>>>>>>> >>>>>>>> Thank you, >>>>>>>> Csabi >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>
