Alex, the only way to implement my suggestion #1 (that I know of) would be to write to a file and read it back. I don't have good example for #2.
Eugene's suggestion no. 1 seems like a good idea. There are some example <https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340> in the codebase. On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <[email protected]> wrote: > Yeah the "List<MatchResult.Metadata> expected" is constructed > from Files.getLastModifiedTime() calls before the files are actually > modified, the code is basically unconditionally broken rather than merely > flaky. > > There's several easy options: > 1) Use PAssert.that().satisfies() instead of .contains(), and use > assertThat().contains() inside that, with the list constructed at time the > assertion is applied rather than declared. > 2) Implement a Matcher<Metadata> that ignores last modified time and use > that > > Jeff - your option #3 is unfortunately also race-prone, because the code > may match the files after they have been written but before > setLastModifiedTime was called. > > On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <[email protected]> wrote: > >> Another option: >> >> #3 Have the writer thread call Files.setLastModifiedTime explicitly after >> each File.write. Then the lastModifiedMillis can be a stable value for each >> file and we can use those same static values in our expected result. I >> think that would also eliminate the race condition. >> >> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <[email protected]> wrote: >> >>> Thanks Udi, is there a good example for either of these? >>> #1 - seems like you have to rewrite your assertion logic without the >>> PAssert? Is there some way to capture the pipeline output and iterate over >>> it? The pattern I have seen for this in the past also has thread safety >>> issues (Using a DoFn at the end of the pipeline to add the output to a >>> collection is not safe since the collection can be executed concurrently) >>> #2 - Would BigqueryMatcher be a good example for this? which is used in >>> BigQueryTornadoesIT.java Or is there another example you would suggest >>> looking at for reference? >>> >>> - I guess to this you need to implement the SerializableMatcher >>> interface and use the matcher as an option in the pipeline options. >>> >>> >>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <[email protected]> wrote: >>> >>>> Some options: >>>> - You could wait to assert until after p.waitForFinish(). >>>> - You could PAssert using SerializableMatcher and allow any >>>> lastModifiedTime. >>>> >>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <[email protected]> wrote: >>>> >>>>> +Jeff, Eugene, >>>>> >>>>> Hi Jeff and Eugene, >>>>> >>>>> I've noticed that Jeff's PR >>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884> >>>>> introduced >>>>> a race condition in this test, but its not clear exactly how to add Jeff's >>>>> test check in a thread safe way. I believe this to be the source of the >>>>> flakeyness Do you have any suggestions Eugene (since you authored this >>>>> test)? >>>>> >>>>> I added some details to this JIRA issue explaining in full >>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2 >>>>> >>>>> >>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <[email protected]> wrote: >>>>> >>>>>> I've seen this fail in a few different PRs for different >>>>>> contributors, and its causing some issues during the presubmit process.. >>>>>> This is a multithreadred test with a lot of sleeps, so it looks a bit >>>>>> suspicious as the source of the problem. >>>>>> >>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/ >>>>>> >>>>>> I filed a JIRA for this issue: >>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2 >>>>>> >>>>>> >>>>>>
smime.p7s
Description: S/MIME Cryptographic Signature
