Suggestion #4: Create source files outside the writer thread, and then copy
them from a source directory to the watched directory. That should
atomically write the file with the already known lastModificationTime.

On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas <jklu...@mozilla.com> wrote:

> I'll work on getting a PR together this morning, probably following
> Eugene's suggestion #1.
>
> On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri <eh...@google.com> wrote:
>
>> Alex, the only way to implement my suggestion #1 (that I know of) would
>> be to write to a file and read it back.
>> I don't have good example for #2.
>>
>> Eugene's suggestion no. 1 seems like a good idea. There are some example
>> <https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340>
>> in the codebase.
>>
>> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Yeah the "List<MatchResult.Metadata> expected" is constructed
>>> from Files.getLastModifiedTime() calls before the files are actually
>>> modified, the code is basically unconditionally broken rather than merely
>>> flaky.
>>>
>>> There's several easy options:
>>> 1) Use PAssert.that().satisfies() instead of .contains(), and use
>>> assertThat().contains() inside that, with the list constructed at time the
>>> assertion is applied rather than declared.
>>> 2) Implement a Matcher<Metadata> that ignores last modified time and use
>>> that
>>>
>>> Jeff - your option #3 is unfortunately also race-prone, because the code
>>> may match the files after they have been written but before
>>> setLastModifiedTime was called.
>>>
>>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <j...@klukas.net> wrote:
>>>
>>>> Another option:
>>>>
>>>> #3 Have the writer thread call Files.setLastModifiedTime explicitly
>>>> after each File.write. Then the lastModifiedMillis can be a stable value
>>>> for each file and we can use those same static values in our expected
>>>> result. I think that would also eliminate the race condition.
>>>>
>>>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <ajam...@google.com> wrote:
>>>>
>>>>> Thanks Udi, is there a good example for either of these?
>>>>> #1 - seems like you have to rewrite your assertion logic without the
>>>>> PAssert? Is there some way to capture the pipeline output and iterate over
>>>>> it? The pattern I have seen for this in the past also has thread safety
>>>>> issues (Using a DoFn at the end of the pipeline to add the output to a
>>>>> collection is not safe since the collection can be executed concurrently)
>>>>> #2 - Would BigqueryMatcher be a good example for this? which is used
>>>>> in BigQueryTornadoesIT.java Or is there another example you would suggest
>>>>> looking at for reference?
>>>>>
>>>>>    - I guess to this you need to implement the SerializableMatcher
>>>>>    interface and use the matcher as an option in the pipeline options.
>>>>>
>>>>>
>>>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>>> Some options:
>>>>>> - You could wait to assert until after p.waitForFinish().
>>>>>> - You could PAssert using SerializableMatcher and allow any
>>>>>> lastModifiedTime.
>>>>>>
>>>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <ajam...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +Jeff, Eugene,
>>>>>>>
>>>>>>> Hi Jeff and Eugene,
>>>>>>>
>>>>>>> I've noticed that Jeff's PR
>>>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884>
>>>>>>>  introduced
>>>>>>> a race condition in this test, but its not clear exactly how to add 
>>>>>>> Jeff's
>>>>>>> test check in a thread safe way. I believe this to be the source of the
>>>>>>> flakeyness Do you have any suggestions Eugene (since you authored this
>>>>>>> test)?
>>>>>>>
>>>>>>> I added some details to this JIRA issue explaining in full
>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <ajam...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've seen this fail in a few different PRs for different
>>>>>>>> contributors, and its causing some issues during the presubmit 
>>>>>>>> process..
>>>>>>>> This is a multithreadred test with a lot of sleeps, so it looks a bit
>>>>>>>> suspicious as the source of the problem.
>>>>>>>>
>>>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>>>>
>>>>>>>> I filed a JIRA for this issue:
>>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>>
>>>>>>>>
>>>>>>>>

Reply via email to