Filed https://issues.apache.org/jira/browse/BEAM-7016 with the details.

On Fri, Apr 5, 2019 at 1:47 PM Lukasz Cwik <[email protected]> wrote:

> Yes, it seems like the reset() method resets System.out even if it was never
> initialized. It seems like a simple fix to make reset() safe to call at
> all times. Csaba or Michael, would either of you like to open a PR and send
> it my way?
>
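A minimal sketch of what "safe to call at all times" could look like, assuming a simplified initializer that keeps the original streams in static fields. SafeStdStreamRedirect and its methods are hypothetical names; this is not the actual DataflowWorkerLoggingInitializer code. The idea is that reset() only restores streams that initialize() actually replaced, so calling it first is a no-op:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/**
 * Hypothetical sketch (not Beam's DataflowWorkerLoggingInitializer): redirects
 * System.out/System.err, and reset() is safe to call at any time.
 */
final class SafeStdStreamRedirect {
  private static PrintStream originalOut; // saved only when initialize() swaps the streams
  private static PrintStream originalErr;
  private static boolean initialized = false;

  static synchronized void initialize(PrintStream out, PrintStream err) {
    if (initialized) {
      return; // already redirected; keep the streams saved by the first call
    }
    originalOut = System.out;
    originalErr = System.err;
    System.setOut(out);
    System.setErr(err);
    initialized = true;
  }

  static synchronized void reset() {
    if (!initialized) {
      return; // nothing was replaced, so never overwrite System.out/err with null
    }
    System.setOut(originalOut);
    System.setErr(originalErr);
    originalOut = null;
    originalErr = null;
    initialized = false;
  }

  public static void main(String[] args) {
    reset(); // e.g. a test setUp() calling reset() before any initialize(): now a no-op
    ByteArrayOutputStream captured = new ByteArrayOutputStream();
    PrintStream capture = new PrintStream(captured, true);
    initialize(capture, capture);
    System.out.println("captured, not printed");
    reset(); // restores the real console streams
    System.out.println("captured: " + captured.toString().trim());
  }
}
```

The boolean flag, rather than a null check on the saved field, makes the "was anything replaced?" decision explicit and also makes a second reset() harmless.
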
> On Fri, Apr 5, 2019 at 1:39 PM Michael Luckey <[email protected]> wrote:
>
>> Ah...
>>
>> I have not debugged it yet. But wouldn't [1] mean setting System.out to 'null'
>> on the first call to @setup, as there was no previous call to
>> DataflowWorkerLoggingInitializer.initialize?
>>
>>
>> https://github.com/apache/beam/blame/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java#L81
>>
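To illustrate the suspected failure mode, here is a hypothetical minimal reproduction (not Beam code; UnsafeResetRepro and its methods are made up for illustration). If reset() restores a saved field that initialize() never filled in, System.setOut(null) succeeds silently, and the next System.out.println throws a NullPointerException, which is consistent with the NPE out of the FnHarness finally block reported further down the thread:

```java
import java.io.PrintStream;

/** Hypothetical minimal reproduction of the suspected bug (not Beam's actual code). */
final class UnsafeResetRepro {
  private static PrintStream originalOut; // stays null unless initialize() runs

  static void initialize(PrintStream replacement) {
    originalOut = System.out;
    System.setOut(replacement);
  }

  /** Unconditionally restores the saved stream, even if initialize() never ran. */
  static void reset() {
    System.setOut(originalOut); // System.setOut does not reject null
  }

  /** Returns true if printing after a bare reset() throws NullPointerException. */
  static boolean resetWithoutInitializeBreaksStdout() {
    PrintStream saved = System.out;
    try {
      reset(); // mimics a test setup calling reset() before any initialize()
      System.out.println("Shutting SDK harness down.");
      return false;
    } catch (NullPointerException npe) {
      return true;
    } finally {
      System.setOut(saved); // restore so the rest of the JVM can still print
    }
  }

  public static void main(String[] args) {
    System.out.println("NPE after bare reset(): " + resetWithoutInitializeBreaksStdout());
  }
}
```
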
>> On Fri, Apr 5, 2019 at 10:12 PM Lukasz Cwik <[email protected]> wrote:
>>
>>> We replace System.out/err to capture user logs and forward the logs for
>>> the Dataflow worker [1]. It could be that this test [2] is not resetting it
>>> afterwards, which leaves it null, and then some later code fails because
>>> of that.
>>>
>>> 1:
>>> https://github.com/apache/beam/blob/e69d69d72dc5b9c3d6069c0b71825c3c2b0b4e61/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializer.java#L132
>>> 2:
>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java
>>>
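A rough sketch of the general technique of capturing System.out and forwarding it to a logger, assuming java.util.logging and one log record per completed line. LoggingOutputStream is a hypothetical name, and the DataflowWorkerLoggingInitializer code linked above may work differently:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not Beam's implementation): an OutputStream that forwards each
 * completed line written to it as one java.util.logging record.
 */
final class LoggingOutputStream extends OutputStream {
  private final Logger logger;
  private final Level level;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  LoggingOutputStream(Logger logger, Level level) {
    this.logger = logger;
    this.level = level;
  }

  @Override
  public synchronized void write(int b) {
    if (b == '\n') {
      emitLine(); // a newline completes one log record
    } else {
      buffer.write(b);
    }
  }

  // flush() is intentionally inherited as a no-op: an auto-flushing PrintStream
  // flushes after every print(), and emitting there would split partial lines.

  @Override
  public synchronized void close() {
    if (buffer.size() > 0) {
      emitLine(); // don't lose a trailing line that never got its newline
    }
  }

  private void emitLine() {
    // PrintStream(OutputStream, boolean) encodes with the default charset, so decode with it too.
    String line = new String(buffer.toByteArray(), Charset.defaultCharset());
    buffer.reset();
    if (line.endsWith("\r")) {
      line = line.substring(0, line.length() - 1); // tolerate "\r\n" line separators
    }
    logger.log(level, line);
  }

  public static void main(String[] args) {
    Logger userLog = Logger.getLogger("user-stdout");
    PrintStream original = System.out;
    System.setOut(new PrintStream(new LoggingOutputStream(userLog, Level.INFO), true));
    try {
      System.out.println("this line becomes an INFO record"); // goes to the logger, not stdout
    } finally {
      System.setOut(original); // always restore the saved stream
    }
  }
}
```

Note the save/restore pairing in main: whoever swaps System.out is responsible for putting the original back, which is exactly the invariant the reset() discussion above is about.
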
>>> On Fri, Apr 5, 2019 at 1:42 AM Michael Luckey <[email protected]>
>>> wrote:
>>>
>>>> FWIW, the TimerReceiverTest is also failing consistently on my macOS
>>>> machine. On my Ubuntu VM, the tests pass.
>>>>
>>>> The stack trace indicates a NullPointerException thrown out of the
>>>> finally block [1].
>>>>
>>>> Since this is really bad and would of course hide the cause, I added the
>>>> following:
>>>>
>>>> diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>>> index 708b669112..8c21928da1 100644
>>>> --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>>> +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>>> @@ -169,7 +169,12 @@ public class FnHarness {
>>>>        LOG.info("Entering instruction processing loop");
>>>>        control.processInstructionRequests(options.as(GcsOptions.class).getExecutorService());
>>>>      } finally {
>>>> -      System.out.println("Shutting SDK harness down.");
>>>> +      try {
>>>> +        System.out.println("Shutting SDK harness down.");
>>>> +      } catch (NullPointerException npe) {
>>>> +        LOG.warn("NPE sys.out=" + System.out, npe);
>>>> +      }
>>>>      }
>>>>    }
>>>>  }
>>>>
>>>> Now my test outputs
>>>>
>>>> Apr 05, 2019 9:29:59 AM org.apache.beam.fn.harness.FnHarness main
>>>> WARNING: NPE  sys.out=null
>>>> java.lang.NullPointerException
>>>>    at org.apache.beam.fn.harness.FnHarness.main(FnHarness.java:173)
>>>>    at org.apache.beam.runners.dataflow.worker.fn.control.TimerReceiverTest.lambda$setUp$0(TimerReceiverTest.java:123)
>>>>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>    at java.lang.Thread.run(Thread.java:748)
>>>>
>>>> and passes (sic!)
>>>>
>>>> Something weird is going on here...
>>>>
>>>> Replacing that 'System.out' call with 'LOG.info' also seems to work; at
>>>> least I could not reproduce the failure in several attempts. I am lost
>>>> here, as there is probably a good reason to use System.out at this point.
>>>>
>>>> Btw, after the first failure with NullPointerExceptions, successive runs
>>>> seem to fail for different reasons: I get a timeout in test setup. I'm not
>>>> sure, but this might indicate a gRPC port/server startup issue because the
>>>> previous run did not clean up properly.
>>>>
>>>> best,
>>>>
>>>> michel
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java#L172
>>>>
>>>> On Thu, Apr 4, 2019 at 10:42 PM Lukasz Cwik <[email protected]> wrote:
>>>>
>>>>> I looked at the failures you were experiencing and the error message
>>>>> doesn't provide enough information to figure out why it is failing.
>>>>>
>>>>> On Wed, Apr 3, 2019 at 9:23 PM Csaba Kassai <[email protected]> wrote:
>>>>>
>>>>>> Oh, I just missed it then :)
>>>>>> Thank you Lukasz for connecting us.
>>>>>>
>>>>>> Yeah, the two TimerReceiverTest tests fail reliably for me.
>>>>>>
>>>>>> On Tue, 2 Apr 2019 at 23:53, Lukasz Cwik <[email protected]> wrote:
>>>>>>
>>>>>>> +Ahmed
>>>>>>>
>>>>>>> I have added you as a contributor.
>>>>>>>
>>>>>>> It seems as though Ahmed had just picked up BEAM-3489 yesterday.
>>>>>>> Reach out to Ahmed if you would like to help them out with the task.
>>>>>>>
>>>>>>> Was TimerReceiverTest failing reliably when performing a parallel
>>>>>>> build, or is it flaky?
>>>>>>>
>>>>>>> I have asked Chamikara to take a look at PR 8180.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 2, 2019 at 8:33 AM Csaba Kassai <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All!
>>>>>>>>
>>>>>>>> I am Csabi, and I would be happy to contribute to Beam.
>>>>>>>> Could you grant me the contributor role and assign issue BEAM-3489
>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-3489> to me? My user
>>>>>>>> name is "csabakassai".
>>>>>>>>
>>>>>>>> After I checked out the code and tried to run a Gradle check, I found
>>>>>>>> these issues:
>>>>>>>>
>>>>>>>>    1. *JUnit tests fail:* the TimerReceiverTest fails in the
>>>>>>>>    ":beam-runners-google-cloud-dataflow-java-fn-api-worker:test" and the
>>>>>>>>    ":beam-runners-google-cloud-dataflow-java-legacy-worker:test" tasks.
>>>>>>>>    When I execute the tests independently, everything is fine, so I
>>>>>>>>    disabled the parallel build, which solves the problem. I have not
>>>>>>>>    investigated further; do you have any more insights on this issue? I
>>>>>>>>    have attached the test reports.
>>>>>>>>    2. *Python test fails:* there is a Python test which fails if the
>>>>>>>>    current offset of your timezone differs from the offset in 1970. In
>>>>>>>>    my case, Singapore is now GMT+8 and it was GMT+7:30 in 1970. I
>>>>>>>>    created a ticket for this issue where I describe the problem in
>>>>>>>>    detail: https://jira.apache.org/jira/browse/BEAM-6947. Could you
>>>>>>>>    assign the ticket to me? Also, I created a PR with a possible fix:
>>>>>>>>    https://github.com/apache/beam/pull/8180. Could you suggest a
>>>>>>>>    reviewer?
>>>>>>>>
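For reference, the offset difference behind the second issue can be seen directly with java.time. This is a Java illustration only (the failing test itself is in the Python SDK), and HistoricalOffsetDemo is a hypothetical name. The tz database gives Asia/Singapore +07:30 at the 1970 epoch and +08:00 today, so code that converts epoch-era timestamps with the current offset ends up 30 minutes off:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Illustrative sketch: Asia/Singapore's UTC offset in 1970 differs from today's. */
final class HistoricalOffsetDemo {
  static final ZoneId SINGAPORE = ZoneId.of("Asia/Singapore");

  /** Offset in effect at the given instant, taken from the JDK's tz rules. */
  static ZoneOffset offsetAt(Instant instant) {
    return SINGAPORE.getRules().getOffset(instant);
  }

  public static void main(String[] args) {
    Instant epoch = Instant.EPOCH;
    Instant april2019 = Instant.parse("2019-04-02T00:00:00Z");
    System.out.println("offset at epoch: " + offsetAt(epoch));     // +07:30
    System.out.println("offset in 2019:  " + offsetAt(april2019)); // +08:00

    // Converting the epoch with today's offset instead of the historical one
    // yields a local time that is 30 minutes off.
    LocalDateTime correct = LocalDateTime.ofInstant(epoch, SINGAPORE);
    LocalDateTime naive = LocalDateTime.ofEpochSecond(0, 0, offsetAt(april2019));
    System.out.println(correct + " vs " + naive);
  }
}
```
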
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Csabi
