[ https://issues.apache.org/jira/browse/STORM-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498733#comment-14498733 ]

ASF GitHub Bot commented on STORM-773:
--------------------------------------

Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/526#issuecomment-93840072
  
    So after much searching and tracing through logs, with some added logging in 
the CoordinatedBolt, I found that the CoordinatedBolt was timing out the 
batch in a few cases, whenever the batch took longer than 300ms to complete. 
The timeout is set to 30 seconds by default, and 10 seconds of simulated time 
equals 100ms of wall time, so the 30-second timeout expires after only about 
300ms of real time.  When this happened the bolts would get confused 
and the batch would never be fully acked.  I am not sure why the 
coordinator/spout was not getting a timeout and replaying the batch in 
simulated time, but because this is a simulated-time issue, and it only really 
shows up in this one test, I decided to increase the timeout.  If others think we 
should dig deeper and understand why the replay is not happening, I am happy to 
hand the JIRA over to them.
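The 300ms figure follows from the conversion ratio quoted in the comment (10 seconds simulated is about 100ms of wall time, roughly 100:1). The Java snippet below is a minimal sketch of that arithmetic only. The class and method names are hypothetical and are not part of Storm's API, and the ratio is assumed from the comment rather than read from Storm's simulated-time code.

```java
// Hypothetical helper, not Storm API: converts a timeout expressed in
// simulated seconds into the wall-clock milliseconds it actually lasts.
public class SimulatedTimeout {
    // Assumed ratio from the comment: 10,000 ms simulated ~ 100 ms wall,
    // so 100 simulated ms elapse for every 1 ms of wall time.
    static final long SIMULATED_MS_PER_WALL_MS = 10_000 / 100;

    // Wall-clock duration (ms) of a timeout given in simulated seconds.
    static long wallMillis(long simulatedSeconds) {
        return simulatedSeconds * 1000 / SIMULATED_MS_PER_WALL_MS;
    }

    public static void main(String[] args) {
        long defaultTimeoutSecs = 30; // the 30-second default mentioned above
        System.out.println(wallMillis(defaultTimeoutSecs) + " ms of wall time");
    }
}
```

Under this assumed ratio, any batch that needs more than about 300ms of real time to finish is timed out by the CoordinatedBolt. Raising the timeout stretches that window proportionally.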


> backtype.storm.transactional-test fails periodically with timeout
> -----------------------------------------------------------------
>
>                 Key: STORM-773
>                 URL: https://issues.apache.org/jira/browse/STORM-773
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: failure.txt, success.txt
>
>
> I'm not totally sure what is happening here, but fairly frequently now on my 
> Mac running JDK8, backtype.storm.transactional-test will time out.
> test-transactional-topology-restart seems to be the test in there that is 
> getting the timeouts.
> I made some modifications to the test to run just that one test case and to 
> turn topology.debug on. I captured examples of it working and failing.  I'll 
> attach the log files shortly.  I have not really had much time to dig 
> into this, so I am not totally sure what is happening here.  I can see from 
> the logs that on the first run of the topology the failure case only emits 10 
> batches, whereas the successful case outputs many more.  On the second run 
> of the topology the failure case starts off at batch 11 but does not go 
> beyond it, whereas the successful case keeps going.
> I'll try to find some time to look into it more, but I'm not sure how much 
> time I will have in the near future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
