GitHub user mattf-horton opened a pull request:

    https://github.com/apache/incubator-metron/pull/442

    METRON-322 Global Batching and Flushing

    DO NOT INTEGRATE YET.  This is a preliminary review request.
    This Jira Ticket is to add timeout flushing to all Writers that do batch 
writes.  Currently most of them will wait indefinitely for a batch to fill, 
resulting in message recycling due to `topology.message.timeout.secs`.  This 
changeset so far implements timeout flushing for the BulkMessageWriterBolt.
    
    I am now starting to work on unit tests, so while this code compiles and 
passes existing unit tests, the new functionality has not yet been tested.  
However, I would be grateful for a quick review of the basic approach, focusing 
on changes in:
    * BulkMessageWriterBolt
    * BulkWriterComponent
    * BatchTimeoutHelper (new), and 
    * IndexingConfigurations.  
    
    All of the other changes are basically boilerplate, just laying 
"getBatchTimeout()" facade methods alongside existing "getBatchSize()" facade 
methods.  (There's a lot of layers!)
    
    The flush-on-timeout logic is fairly straightforward.  It was implemented 
by a refactoring of BulkWriterComponent and basic Tick Tuple reception in 
BulkMessageWriterBolt.
    
    The tricky part was figuring out the appropriate setting for 
'topology.tick.tuple.freq.secs' if the administrator configures non-default 
batchTimeouts.  It is necessary to enumerate the batchTimeout settings for all 
configured sensorNames, which is implemented in 
IndexingConfigurations$getAllConfiguredTimeouts().  Then multiple other factors 
must be taken into account to determine the allowed and recommended settings, 
which is implemented in BatchTimeoutHelper.  If there are better ways to 
accomplish these things, please share your ideas.
    
    After feedback is incorporated, I need to make similar changes to the 
ParserWriter, and possibly other places in the code where batching is used and 
timeouts are not yet implemented.  That is separable work, which I'm aware also 
needs to be done.  Thanks for taking time to look at this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mattf-horton/incubator-metron METRON-322

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/442.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #442
    
----
commit 6fdf67c962a4bcfeb43da9c1aa1318ace6b194c3
Author: mattf-horton <mfo...@hortonworks.com>
Date:   2017-02-02T00:54:17Z

    METRON-322 The first easy changes: add batchTimeout most places batchSize 
is currently used.

commit b3eeb3dbbc0834c4c9f3d2fc24d099f2f8e5a3ea
Author: mattf-horton <mfo...@hortonworks.com>
Date:   2017-02-02T08:57:41Z

    mods for flush and timeout in BulkWriterComponent.  Some work in 
BulkMessageWriterBolt, but no getComponentConfiguration() yet.

commit 666f14a0ffe0eefc7de12e226d6953fd3e32d2d5
Author: mattf-horton <mfo...@hortonworks.com>
Date:   2017-02-06T06:06:07Z

    full implementation of getComponentConfiguration() method in 
BulkMessageWriterBolt, with implementation details in IndexingConfigurations 
and new BatchTimeoutHelper class.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to