Thanks for the review, Alex. I pushed an additional commit to add
commentary and to explicitly pass a copy option that indicates we want to
preserve attributes like timestamps.

On Wed, Jan 23, 2019 at 12:23 PM Alex Amato <[email protected]> wrote:

> Thanks Jeff, I reviewed your PR with one suggestion to add a comment to
> make the test more clear. I am assuming the modified times get copied, not
> re-timestamped on copy, which is why your method works.
> Otherwise looks good to me
>
> On Wed, Jan 23, 2019 at 5:49 AM Jeff Klukas <[email protected]> wrote:
>
>> Posted https://github.com/apache/beam/pull/7599
>>
>> That PR follows suggestion #4. I chose that route because it maintains
>> the PAssert containsInAnyOrder check which seems easier to read and more
>> straight-forward than PAssert satisfies.
>>
>> Do let me know if you disagree and I can switch back to Eugene's
>> suggestion #1.
>>
>> On Wed, Jan 23, 2019 at 8:12 AM Jeff Klukas <[email protected]> wrote:
>>
>>> Suggestion #4: Create source files outside the writer thread, and then
>>> copy them from a source directory to the watched directory. That should
>>> atomically write the file with the already known lastModificationTime.
>>>
>>> On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas <[email protected]> wrote:
>>>
>>>> I'll work on getting a PR together this morning, probably following
>>>> Eugene's suggestion #1.
>>>>
>>>> On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri <[email protected]> wrote:
>>>>
>>>>> Alex, the only way to implement my suggestion #1 (that I know of)
>>>>> would be to write to a file and read it back.
>>>>> I don't have good example for #2.
>>>>>
>>>>> Eugene's suggestion no. 1 seems like a good idea. There are some
>>>>> example
>>>>> <https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340>
>>>>> in the codebase.
>>>>>
>>>>> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Yeah the "List<MatchResult.Metadata> expected" is constructed
>>>>>> from Files.getLastModifiedTime() calls before the files are actually
>>>>>> modified, the code is basically unconditionally broken rather than merely
>>>>>> flaky.
>>>>>>
>>>>>> There's several easy options:
>>>>>> 1) Use PAssert.that().satisfies() instead of .contains(), and use
>>>>>> assertThat().contains() inside that, with the list constructed at time 
>>>>>> the
>>>>>> assertion is applied rather than declared.
>>>>>> 2) Implement a Matcher<Metadata> that ignores last modified time and
>>>>>> use that
>>>>>>
>>>>>> Jeff - your option #3 is unfortunately also race-prone, because the
>>>>>> code may match the files after they have been written but before
>>>>>> setLastModifiedTime was called.
>>>>>>
>>>>>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas <[email protected]> wrote:
>>>>>>
>>>>>>> Another option:
>>>>>>>
>>>>>>> #3 Have the writer thread call Files.setLastModifiedTime explicitly
>>>>>>> after each File.write. Then the lastModifiedMillis can be a stable value
>>>>>>> for each file and we can use those same static values in our expected
>>>>>>> result. I think that would also eliminate the race condition.
>>>>>>>
>>>>>>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Udi, is there a good example for either of these?
>>>>>>>> #1 - seems like you have to rewrite your assertion logic without
>>>>>>>> the PAssert? Is there some way to capture the pipeline output and 
>>>>>>>> iterate
>>>>>>>> over it? The pattern I have seen for this in the past also has thread
>>>>>>>> safety issues (Using a DoFn at the end of the pipeline to add the 
>>>>>>>> output to
>>>>>>>> a collection is not safe since the collection can be executed 
>>>>>>>> concurrently)
>>>>>>>> #2 - Would BigqueryMatcher be a good example for this? which is
>>>>>>>> used in BigQueryTornadoesIT.java Or is there another example you would
>>>>>>>> suggest looking at for reference?
>>>>>>>>
>>>>>>>>    - I guess to this you need to implement the SerializableMatcher
>>>>>>>>    interface and use the matcher as an option in the pipeline options.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Some options:
>>>>>>>>> - You could wait to assert until after p.waitForFinish().
>>>>>>>>> - You could PAssert using SerializableMatcher and allow any
>>>>>>>>> lastModifiedTime.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +Jeff, Eugene,
>>>>>>>>>>
>>>>>>>>>> Hi Jeff and Eugene,
>>>>>>>>>>
>>>>>>>>>> I've noticed that Jeff's PR
>>>>>>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884>
>>>>>>>>>>  introduced
>>>>>>>>>> a race condition in this test, but its not clear exactly how to add 
>>>>>>>>>> Jeff's
>>>>>>>>>> test check in a thread safe way. I believe this to be the source of 
>>>>>>>>>> the
>>>>>>>>>> flakeyness Do you have any suggestions Eugene (since you authored 
>>>>>>>>>> this
>>>>>>>>>> test)?
>>>>>>>>>>
>>>>>>>>>> I added some details to this JIRA issue explaining in full
>>>>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I've seen this fail in a few different PRs for different
>>>>>>>>>>> contributors, and its causing some issues during the presubmit 
>>>>>>>>>>> process..
>>>>>>>>>>> This is a multithreadred test with a lot of sleeps, so it looks a 
>>>>>>>>>>> bit
>>>>>>>>>>> suspicious as the source of the problem.
>>>>>>>>>>>
>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>>>>>>>
>>>>>>>>>>> I filed a JIRA for this issue:
>>>>>>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>

Reply via email to