[
https://issues.apache.org/jira/browse/DL-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763511#comment-15763511
]
ASF GitHub Bot commented on DL-145:
-----------------------------------
GitHub user xieliang opened a pull request:
https://github.com/apache/incubator-distributedlog/pull/78
DL-145 : write requests should be errored out immediately even if the
rolling writer is still being created
All test cases passed locally; the
TestDistributedLogService#testServiceTimeout case is now stable on my box.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xieliang/incubator-distributedlog DL-145
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-distributedlog/pull/78.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #78
----
commit 6be8ca4d01c4c40947d4b901f0299c8dcc97c509
Author: xieliang <[email protected]>
Date: 2016-12-20T07:19:38Z
write requests should be errored out immediately even if the rolling
writer is still being created
----
> Fix the flaky testServiceTimeout
> --------------------------------
>
> Key: DL-145
> URL: https://issues.apache.org/jira/browse/DL-145
> Project: DistributedLog
> Issue Type: Test
> Components: distributedlog-service
> Affects Versions: 0.4.0
> Reporter: Liang Xie
> Assignee: Liang Xie
>
> The TestDistributedLogService#testServiceTimeout case is not stable, e.g.
> https://builds.apache.org/job/distributedlog-precommit-pullrequest/22/com.twitter$distributedlog-service/testReport/com.twitter.distributedlog.service/TestDistributedLogService/testServiceTimeout/
> It could be reproduced occasionally on my box, and the failures became
> stable if I lowered ServiceTimeoutMs from 200 to 150; the test always
> passed when I raised it to a larger value, e.g. 1000 (btw, my disk is an
> SSD).
> After digging into it, the failure turned out to be related to a corner
> case when starting a new log segment.
> In the good case, once a service timeout occurs, the stream status goes
> ERROR -> CLOSING -> CLOSED; calling Abortables.asyncAbort aborts the
> cached log segment, and the writeOp is then completed with an exception,
> e.g. a write-cancelled exception.
> In the bad case, no log records have been written yet, so a new log
> segment is being started asynchronously. When the timeout occurs, the
> segment creation has not finished, so there is no cached segment and
> asyncAbort has no chance to abort it.
> I think changing the test timeout to a larger value should be fine for
> this particular corner case.
> I will attach a minor patch later. Any suggestions are welcome.
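The race described in the issue can be sketched as a minimal simulation. The class and method names below are illustrative stand-ins, not the actual DistributedLog APIs; the only assumption carried over from the issue is that the abort path can only cancel a segment that has already been created and cached.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the race: asyncAbort can only abort a log segment
// that has finished creation and landed in the cache. If the service timeout
// fires while the asynchronous segment creation is still in flight, there is
// nothing in the cache to abort.
public class SegmentAbortRace {

    static final AtomicReference<String> cachedSegment = new AtomicReference<>();

    // Asynchronously "create" a new log segment, caching it once done.
    static CompletableFuture<Void> startNewSegment(long creationMs) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(creationMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            cachedSegment.set("segment-1");
        });
    }

    // Abort whatever is cached; returns whether anything was actually aborted.
    static boolean asyncAbort() {
        return cachedSegment.getAndSet(null) != null;
    }

    // Run one timeout-vs-creation scenario and report whether the abort
    // managed to cancel the segment.
    static boolean runScenario(long timeoutMs, long creationMs) throws Exception {
        cachedSegment.set(null);
        CompletableFuture<Void> creating = startNewSegment(creationMs);
        Thread.sleep(timeoutMs);        // the service timeout elapses
        boolean aborted = asyncAbort(); // the abort only sees the cache
        creating.join();                // creation may finish after the abort
        return aborted;
    }

    public static void main(String[] args) throws Exception {
        // Bad case: timeout (50 ms) fires before creation (300 ms) completes,
        // so nothing is aborted and the writeOp never gets its exception.
        System.out.println("bad case aborted:  " + runScenario(50, 300));
        // Good case: creation (50 ms) completes before the timeout (300 ms).
        System.out.println("good case aborted: " + runScenario(300, 50));
    }
}
```

In this model the bad case prints false: the segment finishes creation only after the abort has already run, which mirrors why the test hangs until the write is errored out immediately instead of waiting on the abort.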
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)