FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Alex Amato
I've seen this fail in a few different PRs for different contributors, and
its causing some issues during the presubmit process.. This is a
multithreadred test with a lot of sleeps, so it looks a bit suspicious as
the source of the problem.
https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/

I filed a JIRA for this issue:
https://jira.apache.org/jira/browse/BEAM-6491?filter=-2


Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Ruoyun Huang
+1  getting the same issue as well.

I saw there were @Ignore on those tests before. If it is not critical and
just caused by the way how we do test, does it make sense to put those
@Ignores back until it's resolved?


On Tue, Jan 22, 2019 at 3:35 PM Alex Amato  wrote:

> I've seen this fail in a few different PRs for different contributors, and
> its causing some issues during the presubmit process.. This is a
> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
> the source of the problem.
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>
> I filed a JIRA for this issue:
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
>

-- 

Ruoyun  Huang


Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Alex Amato
+Jeff, Eugene,

Hi Jeff and Eugene,

I've noticed that Jeff's PR

introduced
a race condition in this test, but its not clear exactly how to add Jeff's
test check in a thread safe way. I believe this to be the source of the
flakeyness Do you have any suggestions Eugene (since you authored this
test)?

I added some details to this JIRA issue explaining in full
https://jira.apache.org/jira/browse/BEAM-6491?filter=-2


On Tue, Jan 22, 2019 at 3:34 PM Alex Amato  wrote:

> I've seen this fail in a few different PRs for different contributors, and
> its causing some issues during the presubmit process.. This is a
> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
> the source of the problem.
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>
> I filed a JIRA for this issue:
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
>


Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Udi Meiri
Some options:
- You could wait to assert until after p.waitForFinish().
- You could PAssert using SerializableMatcher and allow any
lastModifiedTime.

On Tue, Jan 22, 2019 at 3:56 PM Alex Amato  wrote:

> +Jeff, Eugene,
>
> Hi Jeff and Eugene,
>
> I've noticed that Jeff's PR
> 
>  introduced
> a race condition in this test, but its not clear exactly how to add Jeff's
> test check in a thread safe way. I believe this to be the source of the
> flakeyness Do you have any suggestions Eugene (since you authored this
> test)?
>
> I added some details to this JIRA issue explaining in full
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato  wrote:
>
>> I've seen this fail in a few different PRs for different contributors,
>> and its causing some issues during the presubmit process.. This is a
>> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
>> the source of the problem.
>>
>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>
>> I filed a JIRA for this issue:
>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>
>>
>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Alex Amato
Thanks Udi, is there a good example for either of these?
#1 - seems like you have to rewrite your assertion logic without the
PAssert? Is there some way to capture the pipeline output and iterate over
it? The pattern I have seen for this in the past also has thread safety
issues (Using a DoFn at the end of the pipeline to add the output to a
collection is not safe since the collection can be executed concurrently)
#2 - Would BigqueryMatcher be a good example for this? which is used in
BigQueryTornadoesIT.java Or is there another example you would suggest
looking at for reference?

   - I guess to this you need to implement the SerializableMatcher
   interface and use the matcher as an option in the pipeline options.


On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri  wrote:

> Some options:
> - You could wait to assert until after p.waitForFinish().
> - You could PAssert using SerializableMatcher and allow any
> lastModifiedTime.
>
> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato  wrote:
>
>> +Jeff, Eugene,
>>
>> Hi Jeff and Eugene,
>>
>> I've noticed that Jeff's PR
>> 
>>  introduced
>> a race condition in this test, but its not clear exactly how to add Jeff's
>> test check in a thread safe way. I believe this to be the source of the
>> flakeyness Do you have any suggestions Eugene (since you authored this
>> test)?
>>
>> I added some details to this JIRA issue explaining in full
>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>
>>
>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato  wrote:
>>
>>> I've seen this fail in a few different PRs for different contributors,
>>> and its causing some issues during the presubmit process.. This is a
>>> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
>>> the source of the problem.
>>>
>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>
>>> I filed a JIRA for this issue:
>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>
>>>
>>>


Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Eugene Kirpichov
Yeah the "List expected" is constructed
from Files.getLastModifiedTime() calls before the files are actually
modified, the code is basically unconditionally broken rather than merely
flaky.

There's several easy options:
1) Use PAssert.that().satisfies() instead of .contains(), and use
assertThat().contains() inside that, with the list constructed at time the
assertion is applied rather than declared.
2) Implement a Matcher that ignores last modified time and use
that

Jeff - your option #3 is unfortunately also race-prone, because the code
may match the files after they have been written but before
setLastModifiedTime was called.

On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas  wrote:

> Another option:
>
> #3 Have the writer thread call Files.setLastModifiedTime explicitly after
> each File.write. Then the lastModifiedMillis can be a stable value for each
> file and we can use those same static values in our expected result. I
> think that would also eliminate the race condition.
>
> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato  wrote:
>
>> Thanks Udi, is there a good example for either of these?
>> #1 - seems like you have to rewrite your assertion logic without the
>> PAssert? Is there some way to capture the pipeline output and iterate over
>> it? The pattern I have seen for this in the past also has thread safety
>> issues (Using a DoFn at the end of the pipeline to add the output to a
>> collection is not safe since the collection can be executed concurrently)
>> #2 - Would BigqueryMatcher be a good example for this? which is used in
>> BigQueryTornadoesIT.java Or is there another example you would suggest
>> looking at for reference?
>>
>>- I guess to this you need to implement the SerializableMatcher
>>interface and use the matcher as an option in the pipeline options.
>>
>>
>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri  wrote:
>>
>>> Some options:
>>> - You could wait to assert until after p.waitForFinish().
>>> - You could PAssert using SerializableMatcher and allow any
>>> lastModifiedTime.
>>>
>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato  wrote:
>>>
 +Jeff, Eugene,

 Hi Jeff and Eugene,

 I've noticed that Jeff's PR
 
  introduced
 a race condition in this test, but its not clear exactly how to add Jeff's
 test check in a thread safe way. I believe this to be the source of the
 flakeyness Do you have any suggestions Eugene (since you authored this
 test)?

 I added some details to this JIRA issue explaining in full
 https://jira.apache.org/jira/browse/BEAM-6491?filter=-2


 On Tue, Jan 22, 2019 at 3:34 PM Alex Amato  wrote:

> I've seen this fail in a few different PRs for different contributors,
> and its causing some issues during the presubmit process.. This is a
> multithreadred test with a lot of sleeps, so it looks a bit suspicious as
> the source of the problem.
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>
> I filed a JIRA for this issue:
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
>


Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Udi Meiri
Alex, the only way to implement my suggestion #1 (that I know of) would be
to write to a file and read it back.
I don't have good example for #2.

Eugene's suggestion no. 1 seems like a good idea. There are some example

in the codebase.

On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov 
wrote:

> Yeah the "List expected" is constructed
> from Files.getLastModifiedTime() calls before the files are actually
> modified, the code is basically unconditionally broken rather than merely
> flaky.
>
> There's several easy options:
> 1) Use PAssert.that().satisfies() instead of .contains(), and use
> assertThat().contains() inside that, with the list constructed at time the
> assertion is applied rather than declared.
> 2) Implement a Matcher that ignores last modified time and use
> that
>
> Jeff - your option #3 is unfortunately also race-prone, because the code
> may match the files after they have been written but before
> setLastModifiedTime was called.
>
> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas  wrote:
>
>> Another option:
>>
>> #3 Have the writer thread call Files.setLastModifiedTime explicitly after
>> each File.write. Then the lastModifiedMillis can be a stable value for each
>> file and we can use those same static values in our expected result. I
>> think that would also eliminate the race condition.
>>
>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato  wrote:
>>
>>> Thanks Udi, is there a good example for either of these?
>>> #1 - seems like you have to rewrite your assertion logic without the
>>> PAssert? Is there some way to capture the pipeline output and iterate over
>>> it? The pattern I have seen for this in the past also has thread safety
>>> issues (Using a DoFn at the end of the pipeline to add the output to a
>>> collection is not safe since the collection can be executed concurrently)
>>> #2 - Would BigqueryMatcher be a good example for this? which is used in
>>> BigQueryTornadoesIT.java Or is there another example you would suggest
>>> looking at for reference?
>>>
>>>- I guess to this you need to implement the SerializableMatcher
>>>interface and use the matcher as an option in the pipeline options.
>>>
>>>
>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri  wrote:
>>>
 Some options:
 - You could wait to assert until after p.waitForFinish().
 - You could PAssert using SerializableMatcher and allow any
 lastModifiedTime.

 On Tue, Jan 22, 2019 at 3:56 PM Alex Amato  wrote:

> +Jeff, Eugene,
>
> Hi Jeff and Eugene,
>
> I've noticed that Jeff's PR
> 
>  introduced
> a race condition in this test, but its not clear exactly how to add Jeff's
> test check in a thread safe way. I believe this to be the source of the
> flakeyness Do you have any suggestions Eugene (since you authored this
> test)?
>
> I added some details to this JIRA issue explaining in full
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato  wrote:
>
>> I've seen this fail in a few different PRs for different
>> contributors, and its causing some issues during the presubmit process..
>> This is a multithreadred test with a lot of sleeps, so it looks a bit
>> suspicious as the source of the problem.
>>
>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>
>> I filed a JIRA for this issue:
>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>
>>
>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-23 Thread Jeff Klukas
I'll work on getting a PR together this morning, probably following
Eugene's suggestion #1.

On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri  wrote:

> Alex, the only way to implement my suggestion #1 (that I know of) would be
> to write to a file and read it back.
> I don't have good example for #2.
>
> Eugene's suggestion no. 1 seems like a good idea. There are some example
> 
> in the codebase.
>
> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov 
> wrote:
>
>> Yeah the "List expected" is constructed
>> from Files.getLastModifiedTime() calls before the files are actually
>> modified, the code is basically unconditionally broken rather than merely
>> flaky.
>>
>> There's several easy options:
>> 1) Use PAssert.that().satisfies() instead of .contains(), and use
>> assertThat().contains() inside that, with the list constructed at time the
>> assertion is applied rather than declared.
>> 2) Implement a Matcher that ignores last modified time and use
>> that
>>
>> Jeff - your option #3 is unfortunately also race-prone, because the code
>> may match the files after they have been written but before
>> setLastModifiedTime was called.
>>
>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas  wrote:
>>
>>> Another option:
>>>
>>> #3 Have the writer thread call Files.setLastModifiedTime explicitly
>>> after each File.write. Then the lastModifiedMillis can be a stable value
>>> for each file and we can use those same static values in our expected
>>> result. I think that would also eliminate the race condition.
>>>
>>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato  wrote:
>>>
 Thanks Udi, is there a good example for either of these?
 #1 - seems like you have to rewrite your assertion logic without the
 PAssert? Is there some way to capture the pipeline output and iterate over
 it? The pattern I have seen for this in the past also has thread safety
 issues (Using a DoFn at the end of the pipeline to add the output to a
 collection is not safe since the collection can be executed concurrently)
 #2 - Would BigqueryMatcher be a good example for this? which is used in
 BigQueryTornadoesIT.java Or is there another example you would suggest
 looking at for reference?

- I guess to this you need to implement the SerializableMatcher
interface and use the matcher as an option in the pipeline options.


 On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri  wrote:

> Some options:
> - You could wait to assert until after p.waitForFinish().
> - You could PAssert using SerializableMatcher and allow any
> lastModifiedTime.
>
> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato  wrote:
>
>> +Jeff, Eugene,
>>
>> Hi Jeff and Eugene,
>>
>> I've noticed that Jeff's PR
>> 
>>  introduced
>> a race condition in this test, but its not clear exactly how to add 
>> Jeff's
>> test check in a thread safe way. I believe this to be the source of the
>> flakeyness Do you have any suggestions Eugene (since you authored this
>> test)?
>>
>> I added some details to this JIRA issue explaining in full
>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>
>>
>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato 
>> wrote:
>>
>>> I've seen this fail in a few different PRs for different
>>> contributors, and its causing some issues during the presubmit process..
>>> This is a multithreadred test with a lot of sleeps, so it looks a bit
>>> suspicious as the source of the problem.
>>>
>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>
>>> I filed a JIRA for this issue:
>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>
>>>
>>>


Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-23 Thread Jeff Klukas
Suggestion #4: Create source files outside the writer thread, and then copy
them from a source directory to the watched directory. That should
atomically write the file with the already known lastModificationTime.

On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas  wrote:

> I'll work on getting a PR together this morning, probably following
> Eugene's suggestion #1.
>
> On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri  wrote:
>
>> Alex, the only way to implement my suggestion #1 (that I know of) would
>> be to write to a file and read it back.
>> I don't have good example for #2.
>>
>> Eugene's suggestion no. 1 seems like a good idea. There are some example
>> 
>> in the codebase.
>>
>> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov 
>> wrote:
>>
>>> Yeah the "List expected" is constructed
>>> from Files.getLastModifiedTime() calls before the files are actually
>>> modified, the code is basically unconditionally broken rather than merely
>>> flaky.
>>>
>>> There's several easy options:
>>> 1) Use PAssert.that().satisfies() instead of .contains(), and use
>>> assertThat().contains() inside that, with the list constructed at time the
>>> assertion is applied rather than declared.
>>> 2) Implement a Matcher that ignores last modified time and use
>>> that
>>>
>>> Jeff - your option #3 is unfortunately also race-prone, because the code
>>> may match the files after they have been written but before
>>> setLastModifiedTime was called.
>>>
>>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas  wrote:
>>>
 Another option:

 #3 Have the writer thread call Files.setLastModifiedTime explicitly
 after each File.write. Then the lastModifiedMillis can be a stable value
 for each file and we can use those same static values in our expected
 result. I think that would also eliminate the race condition.

 On Tue, Jan 22, 2019 at 7:48 PM Alex Amato  wrote:

> Thanks Udi, is there a good example for either of these?
> #1 - seems like you have to rewrite your assertion logic without the
> PAssert? Is there some way to capture the pipeline output and iterate over
> it? The pattern I have seen for this in the past also has thread safety
> issues (Using a DoFn at the end of the pipeline to add the output to a
> collection is not safe since the collection can be executed concurrently)
> #2 - Would BigqueryMatcher be a good example for this? which is used
> in BigQueryTornadoesIT.java Or is there another example you would suggest
> looking at for reference?
>
>- I guess to this you need to implement the SerializableMatcher
>interface and use the matcher as an option in the pipeline options.
>
>
> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri  wrote:
>
>> Some options:
>> - You could wait to assert until after p.waitForFinish().
>> - You could PAssert using SerializableMatcher and allow any
>> lastModifiedTime.
>>
>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato 
>> wrote:
>>
>>> +Jeff, Eugene,
>>>
>>> Hi Jeff and Eugene,
>>>
>>> I've noticed that Jeff's PR
>>> 
>>>  introduced
>>> a race condition in this test, but its not clear exactly how to add 
>>> Jeff's
>>> test check in a thread safe way. I believe this to be the source of the
>>> flakeyness Do you have any suggestions Eugene (since you authored this
>>> test)?
>>>
>>> I added some details to this JIRA issue explaining in full
>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>
>>>
>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato 
>>> wrote:
>>>
 I've seen this fail in a few different PRs for different
 contributors, and its causing some issues during the presubmit 
 process..
 This is a multithreadred test with a lot of sleeps, so it looks a bit
 suspicious as the source of the problem.

 https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/

 I filed a JIRA for this issue:
 https://jira.apache.org/jira/browse/BEAM-6491?filter=-2





Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-23 Thread Jeff Klukas
Posted https://github.com/apache/beam/pull/7599

That PR follows suggestion #4. I chose that route because it maintains the
PAssert containsInAnyOrder check which seems easier to read and more
straight-forward than PAssert satisfies.

Do let me know if you disagree and I can switch back to Eugene's suggestion
#1.

On Wed, Jan 23, 2019 at 8:12 AM Jeff Klukas  wrote:

> Suggestion #4: Create source files outside the writer thread, and then
> copy them from a source directory to the watched directory. That should
> atomically write the file with the already known lastModificationTime.
>
> On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas  wrote:
>
>> I'll work on getting a PR together this morning, probably following
>> Eugene's suggestion #1.
>>
>> On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri  wrote:
>>
>>> Alex, the only way to implement my suggestion #1 (that I know of) would
>>> be to write to a file and read it back.
>>> I don't have good example for #2.
>>>
>>> Eugene's suggestion no. 1 seems like a good idea. There are some example
>>> 
>>> in the codebase.
>>>
>>> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov 
>>> wrote:
>>>
 Yeah the "List expected" is constructed
 from Files.getLastModifiedTime() calls before the files are actually
 modified, the code is basically unconditionally broken rather than merely
 flaky.

 There's several easy options:
 1) Use PAssert.that().satisfies() instead of .contains(), and use
 assertThat().contains() inside that, with the list constructed at time the
 assertion is applied rather than declared.
 2) Implement a Matcher that ignores last modified time and
 use that

 Jeff - your option #3 is unfortunately also race-prone, because the
 code may match the files after they have been written but before
 setLastModifiedTime was called.

 On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas  wrote:

> Another option:
>
> #3 Have the writer thread call Files.setLastModifiedTime explicitly
> after each File.write. Then the lastModifiedMillis can be a stable value
> for each file and we can use those same static values in our expected
> result. I think that would also eliminate the race condition.
>
> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato  wrote:
>
>> Thanks Udi, is there a good example for either of these?
>> #1 - seems like you have to rewrite your assertion logic without the
>> PAssert? Is there some way to capture the pipeline output and iterate 
>> over
>> it? The pattern I have seen for this in the past also has thread safety
>> issues (Using a DoFn at the end of the pipeline to add the output to a
>> collection is not safe since the collection can be executed concurrently)
>> #2 - Would BigqueryMatcher be a good example for this? which is used
>> in BigQueryTornadoesIT.java Or is there another example you would suggest
>> looking at for reference?
>>
>>- I guess to this you need to implement the SerializableMatcher
>>interface and use the matcher as an option in the pipeline options.
>>
>>
>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri  wrote:
>>
>>> Some options:
>>> - You could wait to assert until after p.waitForFinish().
>>> - You could PAssert using SerializableMatcher and allow any
>>> lastModifiedTime.
>>>
>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato 
>>> wrote:
>>>
 +Jeff, Eugene,

 Hi Jeff and Eugene,

 I've noticed that Jeff's PR
 
  introduced
 a race condition in this test, but its not clear exactly how to add 
 Jeff's
 test check in a thread safe way. I believe this to be the source of the
 flakeyness Do you have any suggestions Eugene (since you authored this
 test)?

 I added some details to this JIRA issue explaining in full
 https://jira.apache.org/jira/browse/BEAM-6491?filter=-2


 On Tue, Jan 22, 2019 at 3:34 PM Alex Amato 
 wrote:

> I've seen this fail in a few different PRs for different
> contributors, and its causing some issues during the presubmit 
> process..
> This is a multithreadred test with a lot of sleeps, so it looks a bit
> suspicious as the source of the problem.
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>
> I filed a JIRA for this issue:
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
>


Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-23 Thread Alex Amato
Thanks Jeff, I reviewed your PR with one suggestion to add a comment to
make the test more clear. I am assuming the modified times get copied, not
re-timestamped on copy, which is why your method works.
Otherwise looks good to me

On Wed, Jan 23, 2019 at 5:49 AM Jeff Klukas  wrote:

> Posted https://github.com/apache/beam/pull/7599
>
> That PR follows suggestion #4. I chose that route because it maintains the
> PAssert containsInAnyOrder check which seems easier to read and more
> straight-forward than PAssert satisfies.
>
> Do let me know if you disagree and I can switch back to Eugene's
> suggestion #1.
>
> On Wed, Jan 23, 2019 at 8:12 AM Jeff Klukas  wrote:
>
>> Suggestion #4: Create source files outside the writer thread, and then
>> copy them from a source directory to the watched directory. That should
>> atomically write the file with the already known lastModificationTime.
>>
>> On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas  wrote:
>>
>>> I'll work on getting a PR together this morning, probably following
>>> Eugene's suggestion #1.
>>>
>>> On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri  wrote:
>>>
 Alex, the only way to implement my suggestion #1 (that I know of) would
 be to write to a file and read it back.
 I don't have good example for #2.

 Eugene's suggestion no. 1 seems like a good idea. There are some
 example
 
 in the codebase.

 On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov 
 wrote:

> Yeah the "List expected" is constructed
> from Files.getLastModifiedTime() calls before the files are actually
> modified, the code is basically unconditionally broken rather than merely
> flaky.
>
> There's several easy options:
> 1) Use PAssert.that().satisfies() instead of .contains(), and use
> assertThat().contains() inside that, with the list constructed at time the
> assertion is applied rather than declared.
> 2) Implement a Matcher that ignores last modified time and
> use that
>
> Jeff - your option #3 is unfortunately also race-prone, because the
> code may match the files after they have been written but before
> setLastModifiedTime was called.
>
> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas  wrote:
>
>> Another option:
>>
>> #3 Have the writer thread call Files.setLastModifiedTime explicitly
>> after each File.write. Then the lastModifiedMillis can be a stable value
>> for each file and we can use those same static values in our expected
>> result. I think that would also eliminate the race condition.
>>
>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato 
>> wrote:
>>
>>> Thanks Udi, is there a good example for either of these?
>>> #1 - seems like you have to rewrite your assertion logic without the
>>> PAssert? Is there some way to capture the pipeline output and iterate 
>>> over
>>> it? The pattern I have seen for this in the past also has thread safety
>>> issues (Using a DoFn at the end of the pipeline to add the output to a
>>> collection is not safe since the collection can be executed 
>>> concurrently)
>>> #2 - Would BigqueryMatcher be a good example for this? which is used
>>> in BigQueryTornadoesIT.java Or is there another example you would 
>>> suggest
>>> looking at for reference?
>>>
>>>- I guess to this you need to implement the SerializableMatcher
>>>interface and use the matcher as an option in the pipeline options.
>>>
>>>
>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri  wrote:
>>>
 Some options:
 - You could wait to assert until after p.waitForFinish().
 - You could PAssert using SerializableMatcher and allow any
 lastModifiedTime.

 On Tue, Jan 22, 2019 at 3:56 PM Alex Amato 
 wrote:

> +Jeff, Eugene,
>
> Hi Jeff and Eugene,
>
> I've noticed that Jeff's PR
> 
>  introduced
> a race condition in this test, but its not clear exactly how to add 
> Jeff's
> test check in a thread safe way. I believe this to be the source of 
> the
> flakeyness Do you have any suggestions Eugene (since you authored this
> test)?
>
> I added some details to this JIRA issue explaining in full
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato 
> wrote:
>
>> I've seen this fail in a few different PRs for different
>> contributors, and its causing some issues during the presubmit 
>> process..

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-23 Thread Jeff Klukas
Thanks for the review, Alex. I pushed an additional commit to add
commentary and to explicitly pass a copy option that indicates we want to
preserve attributes like timestamps.

On Wed, Jan 23, 2019 at 12:23 PM Alex Amato  wrote:

> Thanks Jeff, I reviewed your PR with one suggestion to add a comment to
> make the test more clear. I am assuming the modified times get copied, not
> re-timestamped on copy, which is why your method works.
> Otherwise looks good to me
>
> On Wed, Jan 23, 2019 at 5:49 AM Jeff Klukas  wrote:
>
>> Posted https://github.com/apache/beam/pull/7599
>>
>> That PR follows suggestion #4. I chose that route because it maintains
>> the PAssert containsInAnyOrder check which seems easier to read and more
>> straight-forward than PAssert satisfies.
>>
>> Do let me know if you disagree and I can switch back to Eugene's
>> suggestion #1.
>>
>> On Wed, Jan 23, 2019 at 8:12 AM Jeff Klukas  wrote:
>>
>>> Suggestion #4: Create source files outside the writer thread, and then
>>> copy them from a source directory to the watched directory. That should
>>> atomically write the file with the already known lastModificationTime.
>>>
>>> On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas  wrote:
>>>
 I'll work on getting a PR together this morning, probably following
 Eugene's suggestion #1.

 On Tue, Jan 22, 2019 at 8:34 PM Udi Meiri  wrote:

> Alex, the only way to implement my suggestion #1 (that I know of)
> would be to write to a file and read it back.
> I don't have good example for #2.
>
> Eugene's suggestion no. 1 seems like a good idea. There are some
> example
> 
> in the codebase.
>
> On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov 
> wrote:
>
>> Yeah the "List expected" is constructed
>> from Files.getLastModifiedTime() calls before the files are actually
>> modified, the code is basically unconditionally broken rather than merely
>> flaky.
>>
>> There's several easy options:
>> 1) Use PAssert.that().satisfies() instead of .contains(), and use
>> assertThat().contains() inside that, with the list constructed at time 
>> the
>> assertion is applied rather than declared.
>> 2) Implement a Matcher that ignores last modified time and
>> use that
>>
>> Jeff - your option #3 is unfortunately also race-prone, because the
>> code may match the files after they have been written but before
>> setLastModifiedTime was called.
>>
>> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas  wrote:
>>
>>> Another option:
>>>
>>> #3 Have the writer thread call Files.setLastModifiedTime explicitly
>>> after each File.write. Then the lastModifiedMillis can be a stable value
>>> for each file and we can use those same static values in our expected
>>> result. I think that would also eliminate the race condition.
>>>
>>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato 
>>> wrote:
>>>
 Thanks Udi, is there a good example for either of these?
 #1 - seems like you have to rewrite your assertion logic without
 the PAssert? Is there some way to capture the pipeline output and 
 iterate
 over it? The pattern I have seen for this in the past also has thread
 safety issues (Using a DoFn at the end of the pipeline to add the 
 output to
 a collection is not safe since the collection can be executed 
 concurrently)
 #2 - Would BigqueryMatcher be a good example for this? which is
 used in BigQueryTornadoesIT.java Or is there another example you would
 suggest looking at for reference?

- I guess to this you need to implement the SerializableMatcher
interface and use the matcher as an option in the pipeline options.


 On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri  wrote:

> Some options:
> - You could wait to assert until after p.waitForFinish().
> - You could PAssert using SerializableMatcher and allow any
> lastModifiedTime.
>
> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato 
> wrote:
>
>> +Jeff, Eugene,
>>
>> Hi Jeff and Eugene,
>>
>> I've noticed that Jeff's PR
>> 
>>  introduced
>> a race condition in this test, but its not clear exactly how to add 
>> Jeff's
>> test check in a thread safe way. I believe this to be the source of 
>> the
>> flakeyness Do you have any suggestions Eugene (since you authored 
>> this
>> test)?
>>
>> I added some details t

[BEAM-7165] FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-04-26 Thread Alex Amato
https://issues.apache.org/jira/browse/BEAM-7165

https://builds.apache.org/job/beam_PreCommit_Java_Commit/5634/testReport/junit/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/

Note: This test was flakey and fixed in BEAM-6491
, filed this new ticket
since I am not sure if its the same issue.
Stacktrace

java.lang.AssertionError:
FileIO.MatchAll/Reshuffle.ViaRandomKey/Values/Values/Map/ParMultiDo(Anonymous).output:
Expected: iterable with items
[,
,
] in any
order but: not matched:
 at
org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:393) at
org.apache.beam.sdk.testing.PAssert.that(PAssert.java:385) at
org.apache.beam.sdk.io.FileIOTest.testMatchWatchForNewFiles(FileIOTest.java:262)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
at
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:265)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) at
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:349) at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314) at
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312) at
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292) at
org.junit.runners.ParentRunner.run(ParentRunner.java:396) at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl