[ https://issues.apache.org/jira/browse/NIFI-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Bende updated NIFI-1221:
------------------------------
    Attachment: 0001-NIFI-1221-Updating-unit-tests.patch

I finally got a chance to review this, and I definitely support the batching concept. I thought it would be a good idea to have a unit test that verifies the batching functionality, so I added one in the patch I am attaching.

While doing that, I saw that testErrorQueue was passing but not actually testing what we expected: it was failing to send any messages because it couldn't bind to the port. When I started fixing this test, I realized that the error queue was no longer being used and that the test was really verifying that a parsing exception routes to invalid. I renamed the test accordingly and removed the error queue from the processor altogether.

I'm wondering, though: should we be using the error queue? What happens when we have already removed something from the queue and then an unexpected exception happens?

I also added a 100 ms timeout to the poll() calls, because otherwise the processor yields more often than it needs to, which can degrade performance and may not be clear to users (it wasn't clear to me).

> ListenSyslog should support batching
> ------------------------------------
>
>                 Key: NIFI-1221
>                 URL: https://issues.apache.org/jira/browse/NIFI-1221
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Bryan Bende
>             Fix For: 0.4.0
>
>         Attachments:
> 0001-NIFI-1221-Support-batching-of-Syslog-messages.patch,
> 0001-NIFI-1221-Updating-unit-tests.patch,
> 0001-NIFI-1221-fixing-test-failures-contrib-failed-line-m.patch
>
>
> Currently, performance of ListenSyslog is pretty reasonable. If I configure logging to WARN level and use a Yield Duration of 0 ms, 3 threads, and a 25 ms run duration, I can push about 23,000 messages per second to a single NiFi node without any loss, with message parsing enabled.
>
> However, I think we can do a lot better than that. Since these Syslog events are just log messages, they lend themselves very well to concatenation. We should have a Max Batch Size property as well as a Message Delimiter property.
> If batching is used, though, it's important that we do not allow the Parse Messages property to be true, since it doesn't really make sense to add attributes if we have multiple messages.
> Because we cannot parse the messages once they are bundled together, we should have a separate ParseSyslog processor that does parse them. This way, we can route specific events to a ParseSyslog processor, for instance by using RouteText to pull out events of interest.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
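The batching idea from the issue, together with the poll() timeout and the error-queue question raised in the comment, can be sketched in plain Java as below. This is a minimal illustration under stated assumptions, not the code in the attached patches: the class SyslogBatcher and its methods are hypothetical, it uses java.util.concurrent queues rather than any NiFi API, and it assumes the Message Delimiter goes between messages rather than after each one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of batching for a syslog listener; not NiFi's ListenSyslog implementation.
class SyslogBatcher {
    // Assumed value, mirroring the 100 ms poll timeout mentioned in the comment.
    static final long POLL_TIMEOUT_MS = 100;

    private final BlockingQueue<String> messageQueue; // filled by the socket reader thread(s)
    private final BlockingQueue<String> errorQueue = new LinkedBlockingQueue<>(); // messages to retry
    private final int maxBatchSize;  // stands in for a "Max Batch Size" property
    private final String delimiter;  // stands in for a "Message Delimiter" property

    SyslogBatcher(BlockingQueue<String> messageQueue, int maxBatchSize, String delimiter) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be at least 1");
        }
        this.messageQueue = messageQueue;
        this.maxBatchSize = maxBatchSize;
        this.delimiter = delimiter;
    }

    // Per-message attributes only make sense when each output holds a single message.
    static boolean isValidConfig(int maxBatchSize, boolean parseMessages) {
        return maxBatchSize >= 1 && !(maxBatchSize > 1 && parseMessages);
    }

    // Returns up to maxBatchSize messages; an empty list means nothing arrived
    // within the timeout (or the thread was interrupted).
    List<String> nextBatch() {
        List<String> batch = new ArrayList<>();
        // Retry previously failed messages before taking new ones.
        errorQueue.drainTo(batch, maxBatchSize);
        if (batch.isEmpty()) {
            String first;
            try {
                // Wait briefly for a message instead of returning (and yielding) immediately.
                first = messageQueue.poll(POLL_TIMEOUT_MS, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return batch;
            }
            if (first == null) {
                return batch;
            }
            batch.add(first);
        }
        // Take whatever else is already queued, without waiting, up to the batch limit.
        messageQueue.drainTo(batch, maxBatchSize - batch.size());
        return batch;
    }

    // Concatenates a batch into one payload, with the delimiter between messages.
    byte[] join(List<String> batch) {
        return String.join(delimiter, batch).getBytes(StandardCharsets.UTF_8);
    }

    // Call this if processing fails after the batch was already removed from the queue.
    void requeue(List<String> batch) {
        errorQueue.addAll(batch);
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("<34>first example message");
        queue.add("<34>second example message");
        SyslogBatcher batcher = new SyslogBatcher(queue, 10, "\n");
        List<String> batch = batcher.nextBatch();
        System.out.println(new String(batcher.join(batch), StandardCharsets.UTF_8));
    }
}
```

Requeueing a failed batch onto an error queue is one possible answer to what happens when messages have already been removed from the queue and an unexpected exception occurs: nothing is lost, at the cost of those messages possibly being emitted out of order relative to newer ones.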