GitHub user shanthoosh opened a pull request:

    https://github.com/apache/samza/pull/260

    Fix flaky, slow integration tests in TestZkStreamProcessor and 
TestZkStreamProcessorSession

    Fix flaky and slow integration tests in TestZkStreamProcessor and 
TestZkStreamProcessorSession
    Reason for failures:
    
    There are three configurable wait times in the rebalancing phase of Samza 
standalone before consensus is achieved and processing resumes with the updated 
jobModel.
    
    * debounceTime (specified by `job.debounce.time.ms`). Upon a processor 
change, the leader waits for this interval before generating the jobModel, 
expecting the processor group to stabilize (new arrivals, deletions, etc.).
    * taskShutdownMs (specified by `task.shutdown.ms`). Wait time for 
SamzaContainer shutdown in StreamProcessor.
    * barrierWaitTimeOutMs (specified by 
`job.coordinator.zk.consensus.timeout.ms`). Wait time for all processors in the 
group to join the barrier after its creation.
    
    These wait times affect the duration of the rebalancing phase. All of them 
have defaults on the order of 40-60 seconds and are not set to low values in 
these tests.
    
    The flaky tests expect processors to come back up after the rebalancing 
phase and drain the message sources (checked via a latch count; the 
RemoteApplicationRunner integration tests do the same thing with similar logic, 
by checking directly whether the Kafka input queue is drained).
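
    As a rough illustration of the latch-based check described above, here is a 
minimal sketch. It is not the code from these tests: the class name 
`LatchDrainCheck`, the method `drained`, and the in-process "consumer" thread 
are hypothetical stand-ins. It only shows the general pattern of counting down 
once per processed message and waiting, with a timeout, for the count to reach 
zero.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not taken from TestZkStreamProcessor.
public class LatchDrainCheck {

    // Returns true if all messageCount messages were "processed" within timeoutMs.
    public static boolean drained(int messageCount, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(messageCount);

        // Stand-in for a processor: counts down once per message it handles.
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < messageCount; i++) {
                latch.countDown();
            }
        });
        consumer.start();

        try {
            // The test-side assertion: wait until the latch reaches zero or time out.
            boolean done = latch.await(timeoutMs, TimeUnit.MILLISECONDS);
            consumer.join();
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("drained: " + drained(10, 2000));
    }
}
```

    With a check of this shape, a test can only pass once the processors are 
back up and consuming, so any long rebalancing wait translates directly into 
test runtime or a timeout.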
    
    In the worst case, a rebalancing phase can last up to 3-4 minutes (making 
these tests sometimes take 10 minutes).
    
    Change:
    
    Set all of the above timeouts to 2 seconds (sufficient for the tests, and 
verified by a local build).
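
    For illustration, the overrides could be collected along these lines. This 
is a sketch under stated assumptions rather than the actual change: the class 
`ZkTestTimeouts` and the method `lowTimeoutOverrides` are hypothetical names, 
and a plain `java.util.Map` stands in for whatever config object the tests 
build. Only the three property keys and the 2-second value come from the 
description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the real tests may wire these values in differently.
public class ZkTestTimeouts {

    // 2 seconds, expressed in milliseconds as the property names require.
    static final String TIMEOUT_MS = "2000";

    public static Map<String, String> lowTimeoutOverrides() {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("job.debounce.time.ms", TIMEOUT_MS);                     // debounceTime
        overrides.put("task.shutdown.ms", TIMEOUT_MS);                         // taskShutdownMs
        overrides.put("job.coordinator.zk.consensus.timeout.ms", TIMEOUT_MS);  // barrierWaitTimeOutMs
        return overrides;
    }

    public static void main(String[] args) {
        lowTimeoutOverrides().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```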
    
    Benefits:
    
    * Faster build time (the average runtime of these individual tests was 
reduced from 1m56s to 14s).
    * More predictability in assertions (didn’t fail even once in 30-40 
attempts locally).
    
    
    
    NOTE: If this doesn’t fix TestZkStreamProcessor and 
TestZkStreamProcessorSession, the longer-term fix should be to use message 
markers in the input source and shut down the taskCoordinator from TaskImpl 
upon receiving them (or to use a bounded-collection-based pluggable 
InMemorySystemConsumer/InMemorySystemProducer).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shanthoosh/samza FIX_ZK_PROCESSOR_FLAKY_TESTS

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #260
    
----
commit 3e8b9f443ab6fcda3d66f39e765685c3edb71a53
Author: Shanthoosh Venkataraman <[email protected]>
Date:   2017-08-03T19:24:24Z

    Fix flaky and slow integration tests in TestZkStreamProcessor and 
TestZkStreamProcessorSession.
    
    Reason for failures:
    
    There are three configurable wait times in the rebalancing phase of Samza 
standalone before consensus is achieved and processing resumes with the updated 
jobModel.
    
    * debounceTime (specified by `job.debounce.time.ms`). Upon a processor 
change, the leader waits for this interval before generating the jobModel, 
expecting the processor group to stabilize (new arrivals, deletions, etc.).
    * taskShutdownMs (specified by `task.shutdown.ms`). Wait time for 
SamzaContainer shutdown in StreamProcessor.
    * barrierWaitTimeOutMs (specified by 
`job.coordinator.zk.consensus.timeout.ms`). Wait time for all processors in the 
group to join the barrier after its creation.
    
    These wait times affect the duration of the rebalancing phase. All of them 
have defaults on the order of 40-60 seconds and are not overridden to low 
values in TestZkStreamProcessor and TestZkStreamProcessorSession.
    
    The flaky tests expect processors to come back up after the rebalancing 
phase and drain the message sources (checked via a latch count; the 
RemoteApplicationRunner integration tests do the same thing with similar logic, 
by checking directly whether the Kafka input queue is drained).
    
    In the worst case, a rebalancing phase can last up to 3-4 minutes (making 
these tests sometimes take 10 minutes).
    
    Change:
    
    Set all of the above timeouts to 2 seconds (sufficient for the tests, and 
verified by a local build).
    
    Benefits:
    
    * Faster build time (the average runtime of these individual tests was 
reduced from 1m56s to 14s).
    * More predictability in assertions (didn’t fail even once in 30-40 
attempts locally).
    
    NOTE: If this doesn’t fix TestZkStreamProcessor and 
TestZkStreamProcessorSession, the longer-term fix should be to use message 
markers in the input source and shut down the taskCoordinator from TaskImpl 
upon receiving them (or to use a bounded-collection-based pluggable 
InMemorySystemConsumer/InMemorySystemProducer).

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
