On 8/28/2019 12:15 PM, SZEDER Gábor wrote:
> On Wed, Aug 28, 2019 at 11:39:44AM -0400, Jeff King wrote:
>> On Wed, Aug 28, 2019 at 10:54:12AM -0400, Jeff King wrote:
>>
>>>> Unfortunately, however, while running './t5516-fetch-push.sh -r 1,79
>>>> --stress' to try to reproduce a failure caused by those mingled
>>>> messages, the same check only failed for a different reason so far
>>>> (both on Linux and macOS (on Travis CI)):
>>>
>>> There's some hand-waving argument that this should be race-free in
>>> 014ade7484 (upload-pack: send ERR packet for non-tip objects,
>>> 2019-04-13), but I am not too surprised if there is a flaw in that
>>> logic.
>>
>> By the way, I've not been able to reproduce this locally after ~10
>> minutes of running "./t5516-fetch-push.sh -r 1,79 --stress" on my Linux
>> box. I wonder what's different.
>>
>> Are you running the tip of master?
>
> Yeah, but this seems to be one of those "you have to be really lucky,
> even with --stress" cases.
>
> So... I was away from keyboard for over an hour and let it run on
> 'master', but it didn't fail. Then I figured that I'd give it a try
> with Derrick's patch, because, well, why not, and then I got this
> broken pipe error in ~150 repetitions. Ran it again, same error after
> ~200 reps. However, I didn't understand how that patch could lead to
> broken pipe, so went back to stressing master... nothing. So I
> started writing the reply to that patch saying that it seems to cause
> some racy failures on Linux, and was already proofreading before
> sending when the damn thing finally did fail. Oh, well.
>
> Then tried it on macOS, and it failed fairly quickly. For lack of
> better options I used Travis CI's debug shell to access a mac VM, and
> could reproduce the failure both with and without the patch before it
> timed out.

I'm running these tests under --stress now, but not seeing the error
you saw.

However, I do have a theory: the process exits before flushing the
packet line. Adding this line before exit(1) should fix it:

	packet_writer_flush(writer);
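
For anyone following along, here is a minimal sketch of the wire framing
involved. It is not git's code: format_pkt_line() and format_flush_pkt()
are hypothetical helpers written only for illustration. It assumes the
documented pkt-line format, where each packet starts with a 4-digit hex
length that counts its own 4 bytes, and a flush is the special packet
"0000".

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): format one pkt-line into buf.
 * The 4-hex-digit length prefix counts its own 4 bytes plus the payload.
 * Returns the number of bytes in the packet, or -1 if it does not fit.
 */
static int format_pkt_line(char *buf, size_t cap, const char *payload)
{
	size_t len = strlen(payload) + 4;

	/* 65520 is the maximum pkt-line length; +1 leaves room for the NUL */
	if (len > 65520 || len + 1 > cap)
		return -1;
	snprintf(buf, cap, "%04zx%s", len, payload);
	return (int)len;
}

/*
 * Hypothetical helper: format a flush-pkt, the special packet "0000".
 * Returns 4, or -1 if buf is too small.
 */
static int format_flush_pkt(char *buf, size_t cap)
{
	if (cap < 5)
		return -1;
	memcpy(buf, "0000", 5); /* copies the terminating NUL as well */
	return 4;
}
```

With these helpers, an ERR packet carrying "ERR x\n" is framed as
"000aERR x\n", and the flush that would follow it is "0000". Whether the
missing flush is really what the client trips over is still only my
theory.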

I can send this in a v2, but it would be nice if you could test this
in your environment that already demonstrated the failure.

Thanks,
-Stolee