Opened PR https://github.com/apache/beam/pull/8245

On Fri, Apr 5, 2019 at 10:53 PM Lukasz Cwik <[email protected]> wrote:

> Filed https://issues.apache.org/jira/browse/BEAM-7016 with the details.
>
> On Fri, Apr 5, 2019 at 1:47 PM Lukasz Cwik <[email protected]> wrote:
>
>> Yes, it seems like the reset() method resets System.out even if it never
>> was initialized. Seems like a simple fix to have reset() be safe to call at
>> all times. Csaba or Michael, would either of you like to open a PR and send
>> it my way?
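
A minimal sketch of a reset() that is safe to call at any time, assuming the initializer remembers the original streams when it replaces them. The class name SafeStreamRedirector and its fields are hypothetical; this is not the DataflowWorkerLoggingInitializer code.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/**
 * Hypothetical stand-in for a logging initializer that swaps System.out/err.
 * Not the actual DataflowWorkerLoggingInitializer.
 */
final class SafeStreamRedirector {
  private static PrintStream originalOut;
  private static PrintStream originalErr;
  private static boolean initialized;

  private SafeStreamRedirector() {}

  /** Remembers the current streams, then installs the replacements. */
  static synchronized void initialize(PrintStream newOut, PrintStream newErr) {
    if (initialized) {
      return; // already redirected; keep the originals captured the first time
    }
    originalOut = System.out;
    originalErr = System.err;
    System.setOut(newOut);
    System.setErr(newErr);
    initialized = true;
  }

  /** Restores the saved streams; a no-op if initialize() was never called. */
  static synchronized void reset() {
    if (!initialized) {
      return; // nothing was saved, so never write null into System.out/err
    }
    System.setOut(originalOut);
    System.setErr(originalErr);
    originalOut = null;
    originalErr = null;
    initialized = false;
  }

  public static void main(String[] args) {
    PrintStream before = System.out;
    reset(); // safe: System.out is left untouched
    initialize(new PrintStream(new ByteArrayOutputStream()), System.err);
    reset(); // restores the stream captured in initialize()
    System.out.println("System.out restored: " + (System.out == before));
  }
}
```

The guard on the initialized flag is the whole point: a test can call reset() in its setup without first calling initialize(), and System.out keeps its previous value instead of becoming null.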
>>
>> On Fri, Apr 5, 2019 at 1:39 PM Michael Luckey <[email protected]>
>> wrote:
>>
>>> Ah...
>>>
>>> I have not debugged this yet, but wouldn't [1] mean setting System.out to
>>> 'null' on the first call to @setup, as there was no previous call to
>>> DataflowWorkerLoggingInitializer.initialize?
>>>
>>>
>>> https://github.com/apache/beam/blame/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java#L81
>>>
>>> On Fri, Apr 5, 2019 at 10:12 PM Lukasz Cwik <[email protected]> wrote:
>>>
>>>> We replace System.out/err to capture user logs and forward the logs for
>>>> the Dataflow worker[1]. It could be that this test[2] is not resetting it
>>>> afterwards, which leaves it as null, and then some later code fails when
>>>> it tries to use it.
>>>>
>>>> 1:
>>>> https://github.com/apache/beam/blob/e69d69d72dc5b9c3d6069c0b71825c3c2b0b4e61/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializer.java#L132
>>>> 2:
>>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java
>>>>
>>>> On Fri, Apr 5, 2019 at 1:42 AM Michael Luckey <[email protected]>
>>>> wrote:
>>>>
>>>>> FWIW, the TimerReceiverTest is also failing consistently on my macOS.
>>>>> Running on my Ubuntu VM, the tests pass.
>>>>>
>>>>> Now the stacktrace indicates a NullPointerException thrown out of the
>>>>> finally block [1]
>>>>>
>>>>> As this is really bad and of course would hide the cause, I added some
>>>>> logging:
>>>>>
>>>>> diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>>>> index 708b669112..8c21928da1 100644
>>>>> --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>>>> +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>>>> @@ -169,7 +169,12 @@ public class FnHarness {
>>>>>        LOG.info("Entering instruction processing loop");
>>>>>        control.processInstructionRequests(options.as(GcsOptions.class).getExecutorService());
>>>>>      } finally {
>>>>> -      System.out.println("Shutting SDK harness down.");
>>>>> +      try {
>>>>> +        System.out.println("Shutting SDK harness down.");
>>>>> +      } catch (NullPointerException npe) {
>>>>> +        LOG.warn("NPE sys.out=" + System.out, npe);
>>>>> +      }
>>>>>      }
>>>>>    }
>>>>>  }
>>>>>
>>>>> Now my test outputs
>>>>>
>>>>> Apr 05, 2019 9:29:59 AM org.apache.beam.fn.harness.FnHarness main
>>>>> WARNING: NPE  sys.out=null
>>>>> java.lang.NullPointerException
>>>>>   at org.apache.beam.fn.harness.FnHarness.main(FnHarness.java:173)
>>>>>   at org.apache.beam.runners.dataflow.worker.fn.control.TimerReceiverTest.lambda$setUp$0(TimerReceiverTest.java:123)
>>>>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>   at java.lang.Thread.run(Thread.java:748)
>>>>>
>>>>>
>>>>>
>>>>> and passes (sic!)
>>>>>
>>>>> Something weird is going on here....
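
The concern raised above, that an exception thrown from a finally block hides the original failure, can be illustrated with a small sketch. FinallyMaskingDemo and its methods are hypothetical names and not part of Beam; the guarded variant uses a null check, whereas the debugging diff in this message catches the NullPointerException instead.

```java
import java.io.PrintStream;

/** Hypothetical demo class; not part of Beam. */
final class FinallyMaskingDemo {
  private FinallyMaskingDemo() {}

  /** Fails in the try block, then prints from finally with no guard. */
  static void unguarded(PrintStream out) {
    try {
      throw new IllegalStateException("original failure");
    } finally {
      // If out is null this throws a NullPointerException, which replaces
      // the IllegalStateException that was already propagating.
      out.println("Shutting down.");
    }
  }

  /** Same failure, but the finally block cannot throw on a null stream. */
  static void guarded(PrintStream out) {
    try {
      throw new IllegalStateException("original failure");
    } finally {
      if (out != null) {
        out.println("Shutting down.");
      }
    }
  }

  /** Runs the body and reports which exception type reached the caller. */
  static String surfacedException(Runnable body) {
    try {
      body.run();
      return "none";
    } catch (RuntimeException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(surfacedException(() -> unguarded(null)));
    System.out.println(surfacedException(() -> guarded(null)));
  }
}
```

With a null stream, the unguarded variant surfaces only the NullPointerException, while the guarded variant lets the original IllegalStateException reach the caller.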
>>>>>
>>>>> Now replacing that 'System.out' with 'LOG.info' also seems to
>>>>> work. At least I could not reproduce the failure after several tries. I
>>>>> am lost here, as there is probably a good reason to use System.out here.
>>>>>
>>>>> Btw., after the first failure with NullPointerExceptions, successive
>>>>> runs seem to fail for different reasons: I get a timeout in test setup.
>>>>> I am unsure, but this might indicate some gRPC port/server startup issue
>>>>> because the previous run did not clean up properly.
>>>>>
>>>>> best,
>>>>>
>>>>> michel
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java#L172
>>>>>
>>>>> On Thu, Apr 4, 2019 at 10:42 PM Lukasz Cwik <[email protected]> wrote:
>>>>>
>>>>>> I looked at the failures you were experiencing and the error message
>>>>>> doesn't provide enough information to figure out why it is failing.
>>>>>>
>>>>>> On Wed, Apr 3, 2019 at 9:23 PM Csaba Kassai <[email protected]> wrote:
>>>>>>
>>>>>>> Oh, I just missed it then :)
>>>>>>> Thank you Lukasz for connecting us.
>>>>>>>
>>>>>>> Yeah, the two TimerReceiverTest tests fail reliably for me.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 2 Apr 2019 at 23:53, Lukasz Cwik <[email protected]> wrote:
>>>>>>>
>>>>>>>> +Ahmed
>>>>>>>>
>>>>>>>> I have added you as a contributor.
>>>>>>>>
>>>>>>>> It seems as though Ahmed had just picked up BEAM-3489 yesterday.
>>>>>>>> Reach out to Ahmed if you would like to help them out with the task.
>>>>>>>>
>>>>>>>> Was TimerReceiverTest failing reliably when performing a parallel
>>>>>>>> build or is it flaky?
>>>>>>>>
>>>>>>>> I have asked Chamikara to take a look for PR 8180.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Apr 2, 2019 at 8:33 AM Csaba Kassai <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi All!
>>>>>>>>>
>>>>>>>>> I am Csabi, and I would be happy to contribute to Beam.
>>>>>>>>> Could you grant me the contributor role and assign issue BEAM-3489
>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-3489> to me? My user
>>>>>>>>> name is "csabakassai".
>>>>>>>>>
>>>>>>>>> After I checked out the code and tried to do a gradle check I
>>>>>>>>> found these issues:
>>>>>>>>>
>>>>>>>>>    1. *JUnit tests fail:* the TimerReceiverTest fails in the
>>>>>>>>>    ":beam-runners-google-cloud-dataflow-java-fn-api-worker:test" and the
>>>>>>>>>    ":beam-runners-google-cloud-dataflow-java-legacy-worker:test" tasks.
>>>>>>>>>    When I execute the tests independently everything is fine, so I
>>>>>>>>>    disabled the parallel build, and this solves the problem. I have not
>>>>>>>>>    investigated further; do you have any more insights on this issue? I
>>>>>>>>>    have attached the test reports.
>>>>>>>>>    2. *Python test fails:* there is a Python test which fails if
>>>>>>>>>    the current offset of your timezone differs from its offset in 1970.
>>>>>>>>>    In my case, Singapore is now GMT+8 and it was GMT+7:30 in 1970. I
>>>>>>>>>    created a ticket for this issue where I describe the problem in detail:
>>>>>>>>>    https://jira.apache.org/jira/browse/BEAM-6947. Could you
>>>>>>>>>    assign the ticket to me? I also created a PR with a possible fix:
>>>>>>>>>    https://github.com/apache/beam/pull/8180. Could you suggest
>>>>>>>>>    a reviewer?
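
The timezone point in item 2 can be checked with a short sketch. This is a Java illustration of the offset difference only, not the failing Python test itself; EpochOffsetCheck and offsetAt are hypothetical names, and the result depends on the time-zone database shipped with the JDK.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical helper; compares a zone's UTC offset at two instants. */
final class EpochOffsetCheck {
  private EpochOffsetCheck() {}

  /** Returns the UTC offset that the given zone used at the given instant. */
  static ZoneOffset offsetAt(String zoneId, Instant instant) {
    return ZoneId.of(zoneId).getRules().getOffset(instant);
  }

  public static void main(String[] args) {
    String zone = "Asia/Singapore";
    ZoneOffset atEpoch = offsetAt(zone, Instant.EPOCH);
    ZoneOffset inApril2019 = offsetAt(zone, Instant.parse("2019-04-02T00:00:00Z"));
    System.out.println(zone + " offset at 1970-01-01: " + atEpoch);
    System.out.println(zone + " offset at 2019-04-02: " + inApril2019);
    System.out.println("Offsets differ: " + !atEpoch.equals(inApril2019));
  }
}
```

Code that converts epoch-based timestamps using the machine's current offset rather than the offset in effect at that instant will be off by the difference between the two values on such a machine.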
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Csabi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
