Suggestion #4: Create source files outside the writer thread, and then copy them from a source directory to the watched directory. That should atomically write the file with the already known lastModificationTime.
On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas <[email protected]> wrote: > I'll work on getting a PR together this morning, probably following > Eugene's suggestion #1. > > On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri <[email protected]> wrote: > >> Alex, the only way to implement my suggestion #1 (that I know of) would >> be to write to a file and read it back. >> I don't have good example for #2. >> >> Eugene's suggestion no. 1 seems like a good idea. There are some example >> <https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340> >> in the codebase. >> >> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <[email protected]> >> wrote: >> >>> Yeah the "List<MatchResult.Metadata> expected" is constructed >>> from Files.getLastModifiedTime() calls before the files are actually >>> modified, the code is basically unconditionally broken rather than merely >>> flaky. >>> >>> There's several easy options: >>> 1) Use PAssert.that().satisfies() instead of .contains(), and use >>> assertThat().contains() inside that, with the list constructed at time the >>> assertion is applied rather than declared. >>> 2) Implement a Matcher<Metadata> that ignores last modified time and use >>> that >>> >>> Jeff - your option #3 is unfortunately also race-prone, because the code >>> may match the files after they have been written but before >>> setLastModifiedTime was called. >>> >>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <[email protected]> wrote: >>> >>>> Another option: >>>> >>>> #3 Have the writer thread call Files.setLastModifiedTime explicitly >>>> after each File.write. Then the lastModifiedMillis can be a stable value >>>> for each file and we can use those same static values in our expected >>>> result. I think that would also eliminate the race condition. >>>> >>>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <[email protected]> wrote: >>>> >>>>> Thanks Udi, is there a good example for either of these? >>>>> #1 - seems like you have to rewrite your assertion logic without the >>>>> PAssert? Is there some way to capture the pipeline output and iterate over >>>>> it? The pattern I have seen for this in the past also has thread safety >>>>> issues (Using a DoFn at the end of the pipeline to add the output to a >>>>> collection is not safe since the collection can be executed concurrently) >>>>> #2 - Would BigqueryMatcher be a good example for this? which is used >>>>> in BigQueryTornadoesIT.java Or is there another example you would suggest >>>>> looking at for reference? >>>>> >>>>> - I guess to this you need to implement the SerializableMatcher >>>>> interface and use the matcher as an option in the pipeline options. >>>>> >>>>> >>>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <[email protected]> wrote: >>>>> >>>>>> Some options: >>>>>> - You could wait to assert until after p.waitForFinish(). >>>>>> - You could PAssert using SerializableMatcher and allow any >>>>>> lastModifiedTime. >>>>>> >>>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> +Jeff, Eugene, >>>>>>> >>>>>>> Hi Jeff and Eugene, >>>>>>> >>>>>>> I've noticed that Jeff's PR >>>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> >>>>>>> introduced >>>>>>> a race condition in this test, but its not clear exactly how to add >>>>>>> Jeff's >>>>>>> test check in a thread safe way. I believe this to be the source of the >>>>>>> flakeyness Do you have any suggestions Eugene (since you authored this >>>>>>> test)? >>>>>>> >>>>>>> I added some details to this JIRA issue explaining in full >>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2 >>>>>>> >>>>>>> >>>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> I've seen this fail in a few different PRs for different >>>>>>>> contributors, and its causing some issues during the presubmit >>>>>>>> process.. >>>>>>>> This is a multithreadred test with a lot of sleeps, so it looks a bit >>>>>>>> suspicious as the source of the problem. >>>>>>>> >>>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/ >>>>>>>> >>>>>>>> I filed a JIRA for this issue: >>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2 >>>>>>>> >>>>>>>> >>>>>>>>
