GitHub user mmiklavc opened a pull request:

    https://github.com/apache/metron/pull/1045

    METRON-1594: KafkaWriter is asynchronous and may lose data on node failure

    ## Contributor Comments
    
    https://issues.apache.org/jira/browse/METRON-1594
    
    This converts the KafkaWriter from a basic MessageWriter to a
BulkMessageWriter so that producer.send() can be made synchronous. This
impacts parsers, enrichment, indexing (error topic output), and profiler.
Anything that previously used the KafkaWriter as a single-record MessageWriter
now uses it as a BulkMessageWriter.
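
    The synchronous behavior described above can be sketched roughly as follows.
    This is a minimal illustration, not the code in this PR: `SyncBatchSendSketch`,
    `writeBatch`, and the `send` function are hypothetical names, and a
    `CompletableFuture` stands in for the `Future` that an asynchronous
    `producer.send()` returns. The idea is to hand every message in the batch to
    the sender first, then block on each result so that failures surface before
    the batch is reported as written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: send a whole batch, then block until every send is confirmed. */
final class SyncBatchSendSketch {

  /**
   * Passes each message to the (stand-in) asynchronous send function, then waits
   * on every returned Future. Returns the messages whose send failed, so the
   * caller can fail those records instead of treating them as written.
   */
  static <M> List<M> writeBatch(List<M> batch, Function<M, Future<?>> send) {
    List<Future<?>> pending = new ArrayList<>();
    for (M message : batch) {
      pending.add(send.apply(message)); // asynchronous: returns immediately
    }
    List<M> failed = new ArrayList<>();
    for (int i = 0; i < pending.size(); i++) {
      try {
        pending.get(i).get(); // block until this send has completed
      } catch (ExecutionException e) {
        failed.add(batch.get(i)); // the send itself failed
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        failed.add(batch.get(i));
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    // Stand-in sender (assumption): messages starting with "bad" fail, others succeed.
    Function<String, Future<?>> send = m -> {
      CompletableFuture<Void> result = new CompletableFuture<>();
      if (m.startsWith("bad")) {
        result.completeExceptionally(new RuntimeException("broker unavailable"));
      } else {
        result.complete(null);
      }
      return result;
    };
    List<String> failed = writeBatch(List.of("a", "bad-b", "c"), send);
    System.out.println("failed: " + failed);
  }
}
```

    Collecting all futures before waiting keeps the sends pipelined, so the
    cost of blocking is paid once per batch rather than once per record.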
    
    Other:
    
    - ParserBolt lacked tick tuple handling, so I've ported the functionality
over almost verbatim from the BulkMessageWriterBolt.
    - BulkMessageWriterBolt has been generalized. Previously, it was extending 
ConfiguredIndexingBolt in both enrichments and indexing.
    - Configuring batchSize and batchTimeout for the BulkWriterComponent that 
wraps BulkMessageWriter(s):
      - parsers - stays the same, default batch size updated to 15 (unless/until
perf testing suggests a more suitable default)
      - enrichment - pulls from global config: `enrichment.writer.batchSize` 
and `enrichment.writer.batchTimeout`
      - indexing - stays the same except for the error output to kafka topic, 
which has been set to batch size 15 by default
      - profiler - Added a ProfilerWriterConfiguration to match the pattern 
used by other writers. Pulls from global config: `profiler.writer.batchSize` 
and `profiler.writer.batchTimeout`
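
    The interaction between batchSize, batchTimeout, and tick tuples listed above
    can be sketched as follows. This is an illustration under stated assumptions,
    not Metron's BulkWriterComponent: `BatchBufferSketch`, `fromGlobalConfig`, and
    `onTick` are hypothetical names, the fallback timeout of 5 and its unit
    (seconds) are assumptions, and only the config key names and the batch size
    default of 15 come from this PR. A batch is flushed when it reaches batchSize,
    or on a tick once its oldest message has waited at least batchTimeout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch of size- and timeout-based batch flushing. */
final class BatchBufferSketch<M> {
  private final int batchSize;
  private final long batchTimeoutMs;
  private final Consumer<List<M>> flusher;
  private final List<M> buffer = new ArrayList<>();
  private long oldestMs; // arrival time of the oldest buffered message

  BatchBufferSketch(int batchSize, long batchTimeoutMs, Consumer<List<M>> flusher) {
    this.batchSize = batchSize;
    this.batchTimeoutMs = batchTimeoutMs;
    this.flusher = flusher;
  }

  /** Reads "<prefix>.batchSize" and "<prefix>.batchTimeout" from a global-config-like map. */
  static <M> BatchBufferSketch<M> fromGlobalConfig(
      Map<String, ?> globalConfig, String prefix, Consumer<List<M>> flusher) {
    int size = (int) number(globalConfig, prefix + ".batchSize", 15);
    // Assumption: the timeout is expressed in seconds; the fallback of 5 is illustrative.
    long timeoutSec = number(globalConfig, prefix + ".batchTimeout", 5);
    return new BatchBufferSketch<>(size, timeoutSec * 1000L, flusher);
  }

  private static long number(Map<String, ?> config, String key, long fallback) {
    Object value = config.get(key);
    return value instanceof Number ? ((Number) value).longValue() : fallback;
  }

  /** Buffers a message and flushes as soon as the batch is full. */
  void add(M message, long nowMs) {
    if (buffer.isEmpty()) {
      oldestMs = nowMs;
    }
    buffer.add(message);
    if (buffer.size() >= batchSize) {
      flush();
    }
  }

  /** Called periodically (the role a tick tuple plays in a bolt) to flush stale batches. */
  void onTick(long nowMs) {
    if (!buffer.isEmpty() && nowMs - oldestMs >= batchTimeoutMs) {
      flush();
    }
  }

  private void flush() {
    flusher.accept(new ArrayList<>(buffer)); // hand off a copy, then reset
    buffer.clear();
  }

  public static void main(String[] args) {
    Map<String, Object> globalConfig = new HashMap<>();
    globalConfig.put("enrichment.writer.batchSize", 2);
    globalConfig.put("enrichment.writer.batchTimeout", 1);
    Consumer<List<String>> printer = batch -> System.out.println("flush " + batch);
    BatchBufferSketch<String> buffer =
        BatchBufferSketch.<String>fromGlobalConfig(globalConfig, "enrichment.writer", printer);
    buffer.add("a", 0L);
    buffer.add("b", 10L);  // reaches batchSize, flushes immediately
    buffer.add("c", 100L);
    buffer.onTick(1100L);  // timeout elapsed for "c", flushes the partial batch
  }
}
```

    Without the tick-driven path, a partially filled batch on a quiet topic
    would wait indefinitely for enough messages to arrive.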
    
    Tested in full dev; data flows into the ES indexes. Performance testing is
in progress to establish a baseline batch size that does not cause performance
regressions.
    
    ## Pull Request Checklist
    
    Thank you for submitting a contribution to Apache Metron.  
    Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
    Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  
    
    
    In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
    - [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    
    
    ### For code changes:
    - [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
      ```
      mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh
      ```
    
    - [x] Have you written or updated unit tests and/or integration tests to
verify your changes?
    - [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in
which it is rendered by building and verifying the site-book? If not then run
the following commands and then verify changes via
`site-book/target/site/index.html`:
    
      ```
      cd site-book
      mvn site
      ```
    
    #### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
    It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mmiklavc/metron kafka-writer-synchro

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1045
    
----
commit e541e33c0d1f08569921b85a0feb5996761566cf
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-22T23:04:02Z

    Update configurations. Add tests. Fix KafkaWriter to handle batches.

commit efaf5cc602c6f33cd2ffc98381151e99ff91ab2b
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-25T21:09:41Z

    Add tick tuple for batching to ParserBolt. Finish configuration changes to 
enable batching for all BulkMessageWriterBolt users.

commit dc7163ccd811e5c53d186796b2213acc69ea823b
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-29T16:47:52Z

    config test fixes

commit eb80852404350af50b9d5b7a591231e10d50cab1
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-29T20:36:11Z

    Missed flux writer config change

commit 5105e721593adde143c5c83f533cb3c1e2b051d6
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-31T02:23:27Z

    Batch config documentation. Set KafkaProducer batch size default in 
KafkaWriter.

commit 7aa1b4d12b6fb638da2bf84012e287af04525d3b
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-31T14:04:42Z

    Fix merge with master.

commit df3d29a2e4677564da89ebb40c81f5f97c206437
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-31T18:35:11Z

    Parser tests change with merge with master.

commit 07286640d244de15058129f9689f483d34ad318a
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-31T20:29:36Z

    Adjust batch sizes for integration tests

----

