opened PR https://github.com/apache/beam/pull/8245
On Fri, Apr 5, 2019 at 10:53 PM Lukasz Cwik <[email protected]> wrote:

> Filed https://issues.apache.org/jira/browse/BEAM-7016 with the details.
>
> On Fri, Apr 5, 2019 at 1:47 PM Lukasz Cwik <[email protected]> wrote:
>
>> Yes, it seems like the reset() method resets System.out even if it was
>> never initialized. Seems like a simple fix to have reset() be safe to call
>> at all times. Csaba or Michael, would either of you like to open a PR and
>> send it my way?
>>
>> On Fri, Apr 5, 2019 at 1:39 PM Michael Luckey <[email protected]>
>> wrote:
>>
>>> Ah...
>>>
>>> Did not yet debug. But wouldn't [1] mean setting System.out to 'null'
>>> on the first call to @setup, as there was no previous call to
>>> DataflowWorkerLoggingInitializer.initialize?
>>>
>>> https://github.com/apache/beam/blame/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java#L81
>>>
>>> On Fri, Apr 5, 2019 at 10:12 PM Lukasz Cwik <[email protected]> wrote:
>>>
>>>> We replace System.out/err to capture user logs and forward the logs for
>>>> the Dataflow worker [1]. It could be that this test [2] is not resetting
>>>> it afterwards, which leaves it at null, and then some later code fails
>>>> because of it.
>>>>
>>>> 1:
>>>> https://github.com/apache/beam/blob/e69d69d72dc5b9c3d6069c0b71825c3c2b0b4e61/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializer.java#L132
>>>> 2:
>>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java
>>>>
>>>> On Fri, Apr 5, 2019 at 1:42 AM Michael Luckey <[email protected]>
>>>> wrote:
>>>>
>>>>> FWIW, the TimerReceiverTest tests are also failing consistently on my
>>>>> macOS. On my Ubuntu VM, they pass.
>>>>>
>>>>> Now the stacktrace indicates a NullPointerException thrown out of the
>>>>> finally block [1].
>>>>>
>>>>> As this is really bad and of course would hide the cause, I added this
>>>>> change:
>>>>>
>>>>> diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>>>> index 708b669112..8c21928da1 100644
>>>>> --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>>>> +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>>>> @@ -169,7 +169,12 @@ public class FnHarness {
>>>>>        LOG.info("Entering instruction processing loop");
>>>>>        control.processInstructionRequests(options.as(GcsOptions.class).getExecutorService());
>>>>>      } finally {
>>>>> -      System.out.println("Shutting SDK harness down.");
>>>>> +      try {
>>>>> +        System.out.println("Shutting SDK harness down.");
>>>>> +      } catch (NullPointerException npe) {
>>>>> +        LOG.warn("NPE sys.out=" + System.out, npe);
>>>>> +      }
>>>>>      }
>>>>>    }
>>>>>  }
>>>>>
>>>>> Now my test shows this output:
>>>>>
>>>>> Apr 05, 2019 9:29:59 AM org.apache.beam.fn.harness.FnHarness main
>>>>> WARNING: NPE sys.out=null
>>>>> java.lang.NullPointerException
>>>>>     at org.apache.beam.fn.harness.FnHarness.main(FnHarness.java:173)
>>>>>     at org.apache.beam.runners.dataflow.worker.fn.control.TimerReceiverTest.lambda$setUp$0(TimerReceiverTest.java:123)
>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>     at java.lang.Thread.run(Thread.java:748)
>>>>>
>>>>> and the test passes (sic!).
>>>>>
>>>>> Something weird is going on here...
>>>>>
>>>>> Replacing that 'System.out' with 'LOG.info' also seems to work. At
>>>>> least I could not reproduce the failure after several tries. I am lost,
>>>>> as there is probably a good reason to use System.out there.
>>>>>
>>>>> Btw, after the first failure with NullPointerExceptions, successive
>>>>> runs seem to fail for different reasons: I get a timeout in the test
>>>>> setup. I am unsure, but this might indicate a gRPC port/server startup
>>>>> issue because the previous run did not clean up properly.
>>>>>
>>>>> best,
>>>>>
>>>>> michel
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java#L172
>>>>>
>>>>> On Thu, Apr 4, 2019 at 10:42 PM Lukasz Cwik <[email protected]> wrote:
>>>>>
>>>>>> I looked at the failures you were experiencing, and the error message
>>>>>> doesn't provide enough information to figure out why it is failing.
>>>>>>
>>>>>> On Wed, Apr 3, 2019 at 9:23 PM Csaba Kassai <[email protected]> wrote:
>>>>>>
>>>>>>> Oh, I just missed it then :)
>>>>>>> Thank you Lukasz for connecting us.
>>>>>>>
>>>>>>> Yeah, the two TimerReceiverTest tests fail reliably for me.
>>>>>>>
>>>>>>> On Tue, 2 Apr 2019 at 23:53, Lukasz Cwik <[email protected]> wrote:
>>>>>>>
>>>>>>>> +Ahmed
>>>>>>>>
>>>>>>>> I have added you as a contributor.
>>>>>>>>
>>>>>>>> It seems as though Ahmed had just picked up BEAM-3489 yesterday.
>>>>>>>> Reach out to Ahmed if you would like to help them out with the task.
>>>>>>>>
>>>>>>>> Was TimerReceiverTest failing reliably when performing a parallel
>>>>>>>> build, or is it flaky?
>>>>>>>>
>>>>>>>> I have asked Chamikara to take a look at PR 8180.
>>>>>>>>
>>>>>>>> On Tue, Apr 2, 2019 at 8:33 AM Csaba Kassai <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi All!
>>>>>>>>>
>>>>>>>>> I am Csabi, and I would be happy to contribute to Beam.
>>>>>>>>> Could you grant me the contributor role and assign issue BEAM-3489
>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-3489> to me? My user
>>>>>>>>> name is "csabakassai".
>>>>>>>>>
>>>>>>>>> After I checked out the code and tried to run a gradle check, I
>>>>>>>>> found these issues:
>>>>>>>>>
>>>>>>>>>    1. *JUnit tests fail:* the TimerReceiverTest fails in the
>>>>>>>>>    ":beam-runners-google-cloud-dataflow-java-fn-api-worker:test" and
>>>>>>>>>    the ":beam-runners-google-cloud-dataflow-java-legacy-worker:test"
>>>>>>>>>    tasks. When I execute the tests independently everything is fine,
>>>>>>>>>    so I disabled the parallel build, and this solves the problem. I
>>>>>>>>>    have not investigated further; do you have any more insights on
>>>>>>>>>    this issue? I have attached the test reports.
>>>>>>>>>    2. *Python test fails:* there is a Python test which fails if the
>>>>>>>>>    current offset of your timezone differs from its offset in 1970.
>>>>>>>>>    In my case, Singapore is now GMT+8 and it was GMT+7:30 in 1970. I
>>>>>>>>>    created a ticket for this issue where I describe the problem in
>>>>>>>>>    detail: https://jira.apache.org/jira/browse/BEAM-6947. Could you
>>>>>>>>>    assign the ticket to me? I also created a PR with a possible fix:
>>>>>>>>>    https://github.com/apache/beam/pull/8180. Could you suggest a
>>>>>>>>>    reviewer?
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Csabi
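The fix Lukasz suggests in the thread (making reset() safe to call at all times) can be sketched as follows. This is a minimal, hypothetical illustration, not the actual DataflowWorkerLoggingInitializer code: the class name StdStreamRedirector and its methods are assumptions. initialize() remembers the original System.out/System.err before replacing them, and reset() restores them only if initialize() actually ran, so a test setup that calls reset() before any initialize() no longer sets System.out to null.

```java
import java.io.PrintStream;

/**
 * Hypothetical stand-in for a logging initializer that swaps System.out/err.
 * Not the actual Beam class; it only illustrates a reset() that is safe to
 * call whether or not initialize() has run.
 */
final class StdStreamRedirector {
  // Streams captured by the first initialize(); null until then.
  private static PrintStream originalOut;
  private static PrintStream originalErr;

  private StdStreamRedirector() {}

  /** Saves the current streams (once) and installs the replacements. */
  static synchronized void initialize(PrintStream out, PrintStream err) {
    if (originalOut == null) {
      // Only remember the very first originals, so calling initialize() twice
      // does not record a replacement stream as the "original".
      originalOut = System.out;
      originalErr = System.err;
    }
    System.setOut(out);
    System.setErr(err);
  }

  /** Restores the saved streams; a no-op if initialize() was never called. */
  static synchronized void reset() {
    if (originalOut == null) {
      return; // Nothing was replaced, so never overwrite System.out with null.
    }
    System.setOut(originalOut);
    System.setErr(originalErr);
    originalOut = null;
    originalErr = null;
  }

  public static void main(String[] args) {
    reset(); // Safe even though initialize() has not run yet.
    System.out.println("System.out is still usable after an early reset().");
  }
}
```

Keeping only the first saved streams when initialize() runs repeatedly is a choice made for this sketch; it prevents a second call from saving the replacement stream as the "original".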
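Michael's point that the NullPointerException from the finally block hides the real cause follows from two standard Java behaviors: calling System.out.println while System.out is null throws a NullPointerException, and an exception thrown from a finally block replaces any exception that was already propagating. The self-contained sketch below shows both. The class and method names are hypothetical, the IllegalStateException stands in for whatever failed in the processing loop, and recording the shutdown error with Throwable.addSuppressed is just one possible way to keep the original failure visible.

```java
import java.io.PrintStream;

/** Hypothetical demo of a finally block masking the original exception. */
final class FinallyMaskingDemo {

  private FinallyMaskingDemo() {}

  /** A failing loop followed by a shutdown message, as in the thread. */
  static void runWithShutdownMessage() {
    try {
      throw new IllegalStateException("original failure");
    } finally {
      // With System.out == null this throws NullPointerException, and the
      // IllegalStateException above is discarded.
      System.out.println("Shutting down.");
    }
  }

  /** Same flow, but a failing shutdown message cannot hide the real error. */
  static void runPreservingCause() {
    RuntimeException primary = null;
    try {
      throw new IllegalStateException("original failure");
    } catch (RuntimeException e) {
      primary = e;
      throw e;
    } finally {
      try {
        System.out.println("Shutting down.");
      } catch (RuntimeException shutdownError) {
        if (primary == null) {
          throw shutdownError; // No earlier failure, so report this one.
        }
        primary.addSuppressed(shutdownError); // Attach instead of replacing.
      }
    }
  }

  public static void main(String[] args) {
    PrintStream saved = System.out;
    System.setOut(null); // Simulates a test that left System.out as null.
    try {
      try {
        runWithShutdownMessage();
      } catch (RuntimeException e) {
        System.err.println("masked: caught " + e.getClass().getSimpleName());
      }
      try {
        runPreservingCause();
      } catch (RuntimeException e) {
        System.err.println("preserved: caught " + e.getClass().getSimpleName()
            + " with " + e.getSuppressed().length + " suppressed exception(s)");
      }
    } finally {
      System.setOut(saved); // Always restore the real stream.
    }
  }
}
```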
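The timezone problem in item 2 of Csaba's message can be checked from Java with java.time, which uses the tz database bundled with the JDK. This is a minimal sketch, not the Python test from BEAM-6947: EpochOffsetCheck is a hypothetical name, and the Singapore values assume the JDK's tzdata records UTC+07:30 at the Unix epoch and UTC+08:00 in 2019, matching Csaba's description. Any check that assumes a zone's current offset also applied in 1970 is off by 30 minutes there.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical helper: does a zone's UTC offset differ from its 1970 one? */
final class EpochOffsetCheck {

  private EpochOffsetCheck() {}

  /** The zone's offset at the Unix epoch, 1970-01-01T00:00:00Z. */
  static ZoneOffset offsetAtEpoch(ZoneId zone) {
    return offsetAt(zone, Instant.EPOCH);
  }

  /** The zone's offset at an arbitrary instant. */
  static ZoneOffset offsetAt(ZoneId zone, Instant instant) {
    return zone.getRules().getOffset(instant);
  }

  /** True if code assuming "offset at instant == offset in 1970" would break. */
  static boolean offsetChangedSinceEpoch(ZoneId zone, Instant instant) {
    return !offsetAtEpoch(zone).equals(offsetAt(zone, instant));
  }

  public static void main(String[] args) {
    ZoneId singapore = ZoneId.of("Asia/Singapore");
    Instant now = Instant.now();
    System.out.println("offset in 1970: " + offsetAtEpoch(singapore));
    System.out.println("offset now:     " + offsetAt(singapore, now));
    System.out.println("changed:        " + offsetChangedSinceEpoch(singapore, now));
  }
}
```

Passing the instant in explicitly, rather than calling Instant.now() inside the helper, keeps the check deterministic when it is used in a test.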
