[ 
https://issues.apache.org/jira/browse/METRON-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545714#comment-16545714
 ] 

ASF GitHub Bot commented on METRON-1657:
----------------------------------------

Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1099#discussion_r202817242
  
    --- Diff: metron-platform/metron-parsers/README.md ---
    @@ -82,6 +82,12 @@ topology in kafka.  Errors are collected with the 
context of the error
     (e.g. stacktrace) and original message causing the error and sent to an
     `error` queue.  Invalid messages as determined by global validation
     functions are also treated as errors and sent to an `error` queue. 
    +
    +Multiple sensors can be aggregated into a single Storm topology. When this 
is done, there will be
    +multiple Kafka spouts, but only a single parser bolt which will handle 
delegating to the correct 
    --- End diff --
    
    @justinleet can you maybe create a data flow diagram or sequence diagram 
that shows a syslog record from the use-case flowing through this topology and 
add it to the use-case around parser chaining?
    
    It'd be something like, given a `cisco-6-302` record, it'll go:
    * From NiFi to the `pix_syslog_router` kafka topic
    * From the `pix_syslog_router` kafka topic to the `pix_syslog_router` spout 
in the aggregated storm topology
    * From the `pix_syslog_router` kafka spout to the parser bolt, which will 
run the `pix_syslog_router` Grok parser and write out to the `cisco-6-302` 
kafka topic
    * From the `cisco-6-302` kafka topic to the `cisco-6-302` spout in the 
aggregated storm topology
    * From the `cisco-6-302` kafka spout to the `cisco-6-302` Grok parser, 
which writes out to the `enrichments` kafka topic, where it's picked up by the 
enrichment topology.
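
    The hops above can be sketched roughly as follows. This is only an 
in-memory illustration, not Metron code: the topic names come from the list 
above, while the parser functions, the `PARSERS` table, and 
`run_aggregated_topology` are hypothetical stand-ins for the Grok parsers, the 
single delegating parser bolt, and the per-sensor spouts. No real Kafka or 
Storm APIs are used, and the field names in the parsed message are illustrative.

```python
from collections import defaultdict, deque

# Hypothetical stand-ins for the two Grok parsers (real Metron parsers are Java).
def pix_syslog_router(raw):
    # Assumption: the router forwards the raw record to a topic named
    # after its message type.
    return "cisco-6-302", raw

def cisco_6_302(raw):
    # Final parser: emits a parsed message bound for the enrichment topology.
    return "enrichments", {"original_string": raw, "source.type": "cisco-6-302"}

# The single "parser bolt": picks a parser by the sensor topic a record came from.
PARSERS = {"pix_syslog_router": pix_syslog_router, "cisco-6-302": cisco_6_302}

def run_aggregated_topology(topics):
    """Drain every sensor topic and return the (from_topic, to_topic) hops."""
    hops = []
    progressed = True
    while progressed:
        progressed = False
        for sensor, parser in PARSERS.items():  # one simulated spout per sensor
            queue = topics[sensor]
            while queue:
                record = queue.popleft()
                out_topic, out_record = parser(record)  # bolt delegates here
                topics[out_topic].append(out_record)    # write to next topic
                hops.append((sensor, out_topic))
                progressed = True
    return hops

topics = defaultdict(deque)
topics["pix_syslog_router"].append("<raw cisco-6-302 syslog line>")  # as if from NiFi
hops = run_aggregated_topology(topics)
for src, dst in hops:
    print(f"{src} -> {dst}")
```

    In this sketch the record passes through the intermediate `cisco-6-302` 
topic before reaching `enrichments`, which is the extra write discussed next.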
    
    Eventually, we should consider making the write to the `cisco-6-302` 
topic optional, but even then there may be value in those intermediate kafka 
topics because of how users may want to group sensors (e.g. grouping may be 
driven by velocity or scalability requirements rather than logical 
connection).


> Parser aggregation in storm
> ---------------------------
>
>                 Key: METRON-1657
>                 URL: https://issues.apache.org/jira/browse/METRON-1657
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Justin Leet
>            Assignee: Justin Leet
>            Priority: Major
>
> Currently our parsing solution requires one storm topology per sensor. There 
> have been complaints that this may waste resources and that, rather than 
> one storm topology per sensor, it would be advantageous to have multiple 
> sensors in the same topology. The benefit is that it would require fewer 
> storm slots.
> The issue with this is that whenever we've aggregated functionality like this 
> before, we've had trouble scaling storm appropriately (e.g. batch vs random 
> access indexing in the same topology).  The main way to address this is to 
> recommend that parsers with similar velocities and complexity be grouped 
> together.
> Particularly for a first cut, leave the configuration mostly as-is, while 
> allowing for comma-separated lists of sensors in start_parser_topology.sh 
> (e.g. bro,yaf creates an aggregated parser consisting of those two). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
