[ https://issues.apache.org/jira/browse/METRON-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497166#comment-16497166 ]
ASF GitHub Bot commented on METRON-1594:
----------------------------------------

GitHub user mmiklavc opened a pull request:

    https://github.com/apache/metron/pull/1045

    METRON-1594: KafkaWriter is asynchronous and may lose data on node failure

    ## Contributor Comments

    https://issues.apache.org/jira/browse/METRON-1594

    This covers the work to convert the KafkaWriter from a basic MessageWriter to a BulkMessageWriter in order to address making producer.send() synchronous. This impacts: parsers, enrichment, indexing (error topic output), and profiler. Anything previously using the KafkaWriter as a single-record MessageWriter has been converted over to using it as a BulkMessageWriter.

    Other:
    - ParserBolt was lacking Tick Tuples so I've ported over the functionality almost verbatim from the BulkMessageWriterBolt.
    - BulkMessageWriterBolt has been generalized. Previously, it was extending ConfiguredIndexingBolt in both enrichments and indexing.
    - Configuring batchSize and batchTimeout for the BulkWriterComponent that wraps BulkMessageWriter(s):
      - parsers stays the same, default updated to 15 (unless/until perf testing suggests a different default more suitable)
      - enrichment - pulls from global config: `enrichment.writer.batchSize` and `enrichment.writer.batchTimeout`
      - indexing - stays the same except for the error output to kafka topic, which has been set to batch size 15 by default
      - profiler - Added a ProfilerWriterConfiguration to match the pattern used by other writers. Pulls from global config: `profiler.writer.batchSize` and `profiler.writer.batchTimeout`

    Tested in full dev and data flows into the ES indexes. Currently undergoing performance testing to establish a proper baseline batch size that does not result in performance regressions.

    ## Pull Request Checklist

    Thank you for submitting a contribution to Apache Metron.
    Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions.
    Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides.

    In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?

    ### For code changes:
    - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
      ```
      mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh
      ```
    - [x] Have you written or updated unit tests and or integration tests to verify your changes?
    - [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book?
      If not then run the following commands and then verify changes via `site-book/target/site/index.html`:

      ```
      cd site-book
      mvn site
      ```

    #### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mmiklavc/metron kafka-writer-synchro

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1045

----
commit e541e33c0d1f08569921b85a0feb5996761566cf
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-22T23:04:02Z

    Update configurations. Add tests. Fix KafkaWriter to handle batches.

commit efaf5cc602c6f33cd2ffc98381151e99ff91ab2b
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-25T21:09:41Z

    Add tick tuple for batching to ParserBolt. Finish configuration changes to enable batching for all BulkMessageWriterBolt users.

commit dc7163ccd811e5c53d186796b2213acc69ea823b
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-29T16:47:52Z

    config test fixes

commit eb80852404350af50b9d5b7a591231e10d50cab1
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-29T20:36:11Z

    Missed flux writer config change

commit 5105e721593adde143c5c83f533cb3c1e2b051d6
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-31T02:23:27Z

    Batch config documentation. Set KafkaProducer batch size default in KafkaWriter.

commit 7aa1b4d12b6fb638da2bf84012e287af04525d3b
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-31T14:04:42Z

    Fix merge with master.

commit df3d29a2e4677564da89ebb40c81f5f97c206437
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-31T18:35:11Z

    Parser tests change with merge with master.

commit 07286640d244de15058129f9689f483d34ad318a
Author: Michael Miklavcic <michael.miklavcic@...>
Date:   2018-05-31T20:29:36Z

    Adjust batch sizes for integration tests

----


> KafkaWriter is asynchronous and may lose data on node failure
> -------------------------------------------------------------
>
>                 Key: METRON-1594
>                 URL: https://issues.apache.org/jira/browse/METRON-1594
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Michael Miklavcic
>            Assignee: Michael Miklavcic
>            Priority: Major
>
> Currently, we do not block for data to be sent from kafka writer and we do
> not batch. Unfortunately, the send call is asynchronous, so if the node dies
> before the data is actually sent from kafka then it will drop data. It is
> highly unlikely that we will be able to make kafkawriter synchronous in a
> performant way, so we will likely need to batch in the parser and enrichment
> topologies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
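
For readers following the issue, the batching idea described above can be sketched roughly as follows. This is a minimal illustration only, not the code from the pull request: `SyncBatchWriter`, `AsyncSender`, `writeBatch`, and `inMemorySender` are hypothetical names, and `AsyncSender` merely stands in for an asynchronous producer such as `KafkaProducer`, whose `send()` returns a `Future` and which also offers `flush()`. The sketch assumes the writer sends the whole batch, flushes once, and then waits on every future before the caller acknowledges anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class SyncBatchWriter {

    /** Hypothetical stand-in for an asynchronous producer (not a Metron or Kafka type). */
    public interface AsyncSender {
        /** Returns immediately; the future completes once the message is acknowledged. */
        Future<Long> send(String message);

        /** Blocks until all buffered sends have completed. */
        void flush();
    }

    /**
     * Sends every message in the batch, then blocks until each send has either
     * been acknowledged or failed. Returns the indexes of the failed messages so
     * the caller can fail only those and acknowledge the rest.
     */
    public static List<Integer> writeBatch(AsyncSender sender, List<String> batch)
            throws InterruptedException {
        List<Future<Long>> pending = new ArrayList<>();
        for (String message : batch) {
            pending.add(sender.send(message)); // asynchronous: nothing is guaranteed yet
        }
        sender.flush(); // one blocking call per batch instead of one per message

        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < pending.size(); i++) {
            try {
                pending.get(i).get(); // surfaces the send error, if any
            } catch (ExecutionException e) {
                failed.add(i);
            }
        }
        return failed;
    }

    /** Illustrative in-memory sender: rejects empty messages, accepts everything else. */
    public static AsyncSender inMemorySender() {
        return new AsyncSender() {
            private long offset = 0;

            @Override
            public Future<Long> send(String message) {
                if (message.isEmpty()) {
                    CompletableFuture<Long> rejected = new CompletableFuture<>();
                    rejected.completeExceptionally(new IllegalArgumentException("empty message"));
                    return rejected;
                }
                return CompletableFuture.completedFuture(offset++);
            }

            @Override
            public void flush() {
                // Nothing is buffered in this in-memory stand-in.
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> failed = writeBatch(inMemorySender(), Arrays.asList("a", "", "c"));
        System.out.println("failed indexes: " + failed);
    }
}
```

Blocking once per batch rather than once per message is what would keep the cost of synchronous delivery manageable, which is presumably why the batch size defaults mentioned above are still subject to performance testing.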