Alex, the only way to implement my suggestion #1 (that I know of) would be
to write to a file and read it back.
I don't have a good example for #2.
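
That said, the idea behind #2 could be sketched in plain Java roughly as
follows. This is not Beam code: FileMeta is a hypothetical stand-in for
MatchResult.Metadata, and sameIgnoringLastModified is a made-up helper name.
A real version would presumably wrap the same comparison in a
SerializableMatcher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class IgnoreLastModified {
  /** Hypothetical stand-in for MatchResult.Metadata (not the Beam class). */
  static final class FileMeta {
    final String path;
    final long sizeBytes;
    final long lastModifiedMillis;

    FileMeta(String path, long sizeBytes, long lastModifiedMillis) {
      this.path = path;
      this.sizeBytes = sizeBytes;
      this.lastModifiedMillis = lastModifiedMillis;
    }
  }

  /** True if both lists describe the same files, ignoring order and lastModifiedMillis. */
  static boolean sameIgnoringLastModified(List<FileMeta> expected, List<FileMeta> actual) {
    return keys(expected).equals(keys(actual));
  }

  /** Builds a sorted list of "path|size" keys; the timestamp is deliberately left out. */
  static List<String> keys(List<FileMeta> metas) {
    List<String> keys = new ArrayList<>();
    for (FileMeta m : metas) {
      keys.add(m.path + "|" + m.sizeBytes);
    }
    Collections.sort(keys);
    return keys;
  }

  public static void main(String[] args) {
    List<FileMeta> expected =
        Arrays.asList(new FileMeta("a.txt", 5, 0L), new FileMeta("b.txt", 7, 0L));
    // Same files, different order, and timestamps that changed after "expected" was built.
    List<FileMeta> actual =
        Arrays.asList(new FileMeta("b.txt", 7, 2000L), new FileMeta("a.txt", 5, 1000L));
    System.out.println(sameIgnoringLastModified(expected, actual));
  }
}
```

The trade-off is that the test no longer checks lastModifiedMillis at all,
which may or may not be acceptable for what Jeff's change wanted to verify.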

Eugene's suggestion no. 1 seems like a good idea. There are some examples
<https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340>
in the codebase.
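
To illustrate why building the expected value at assertion time helps, here
is a minimal plain-Java sketch. It does not use Beam; DeferredExpectation and
run() are hypothetical names. In the real test, the deferred part would live
inside the function passed to PAssert.that(...).satisfies(...).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.function.LongSupplier;

public class DeferredExpectation {
  static long lastModifiedMillis(Path file) {
    try {
      return Files.getLastModifiedTime(file).toMillis();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns {eagerMatches, deferredMatches} for a file modified after setup. */
  static boolean[] run() {
    try {
      Path file = Files.createTempFile("watch", ".txt");

      // Eager: the expected timestamp is captured when the test is declared.
      long eagerExpected = lastModifiedMillis(file);
      // Deferred: the expected timestamp is only read when the check runs.
      LongSupplier deferredExpected = () -> lastModifiedMillis(file);

      // Simulate the writer thread touching the file later. The explicit
      // timestamp just makes the difference deterministic for this sketch.
      Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
      Files.setLastModifiedTime(file, FileTime.fromMillis(eagerExpected + 5000));

      long actual = lastModifiedMillis(file);
      boolean eagerMatches = eagerExpected == actual;
      boolean deferredMatches = deferredExpected.getAsLong() == actual;

      Files.delete(file);
      return new boolean[] {eagerMatches, deferredMatches};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] result = run();
    System.out.println("eager expectation matches: " + result[0]);
    System.out.println("deferred expectation matches: " + result[1]);
  }
}
```

The eager value goes stale as soon as the file is modified, while the deferred
one is read after all writes have happened.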

On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <[email protected]>
wrote:

> Yeah, the "List<MatchResult.Metadata> expected" is constructed
> from Files.getLastModifiedTime() calls before the files are actually
> modified, so the code is basically unconditionally broken rather than merely
> flaky.
>
> There are several easy options:
> 1) Use PAssert.that().satisfies() instead of .contains(), and use
> assertThat().contains() inside that, with the list constructed at the time the
> assertion is applied rather than declared.
> 2) Implement a Matcher<Metadata> that ignores last modified time and use
> that.
>
> Jeff - your option #3 is unfortunately also race-prone, because the code
> may match the files after they have been written but before
> setLastModifiedTime was called.
>
> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <[email protected]> wrote:
>
>> Another option:
>>
>> #3 Have the writer thread call Files.setLastModifiedTime explicitly after
>> each Files.write. Then the lastModifiedMillis can be a stable value for each
>> file and we can use those same static values in our expected result. I
>> think that would also eliminate the race condition.
>>
>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <[email protected]> wrote:
>>
>>> Thanks Udi, is there a good example for either of these?
>>> #1 - seems like you have to rewrite your assertion logic without the
>>> PAssert? Is there some way to capture the pipeline output and iterate over
>>> it? The pattern I have seen for this in the past also has thread safety
>>> issues (using a DoFn at the end of the pipeline to add the output to a
>>> collection is not safe, since the DoFn can be executed concurrently).
>>> #2 - Would BigqueryMatcher, which is used in BigQueryTornadoesIT.java,
>>> be a good example for this? Or is there another example you would suggest
>>> looking at for reference?
>>>
>>>    - I guess to do this you need to implement the SerializableMatcher
>>>    interface and use the matcher as an option in the pipeline options.
>>>
>>>
>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <[email protected]> wrote:
>>>
>>>> Some options:
>>>> - You could wait to assert until after p.waitForFinish().
>>>> - You could PAssert using SerializableMatcher and allow any
>>>> lastModifiedTime.
>>>>
>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <[email protected]> wrote:
>>>>
>>>>> +Jeff, Eugene,
>>>>>
>>>>> Hi Jeff and Eugene,
>>>>>
>>>>> I've noticed that Jeff's PR
>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884>
>>>>>  introduced
>>>>> a race condition in this test, but it's not clear exactly how to add Jeff's
>>>>> test check in a thread-safe way. I believe this to be the source of the
>>>>> flakiness. Do you have any suggestions, Eugene (since you authored this
>>>>> test)?
>>>>>
>>>>> I added some details to this JIRA issue explaining in full:
>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>
>>>>>
>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <[email protected]> wrote:
>>>>>
>>>>>> I've seen this fail in a few different PRs for different
>>>>>> contributors, and it's causing some issues during the presubmit process.
>>>>>> This is a multithreaded test with a lot of sleeps, so it looks a bit
>>>>>> suspicious as the source of the problem.
>>>>>>
>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>>
>>>>>> I filed a JIRA for this issue:
>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>
>>>>>>
>>>>>>
