[jira] [Commented] (METRON-1657) Parser aggregation in storm

ASF GitHub Bot (JIRA) Mon, 16 Jul 2018 11:50:12 -0700


    [ 
https://issues.apache.org/jira/browse/METRON-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545610#comment-16545610
 ]


ASF GitHub Bot commented on METRON-1657:
----------------------------------------

Github user justinleet commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1099#discussion_r202785248
  
    --- Diff: 
metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/bolt/ParserBolt.java
 ---
    @@ -182,40 +185,61 @@ public void prepare(Map stormConf, TopologyContext 
context, OutputCollector coll
         super.prepare(stormConf, context, collector);
         messageGetStrategy = MessageGetters.DEFAULT_BYTES_FROM_POSITION.get();
         this.collector = collector;
    -    if(getSensorParserConfig() != null) {
    -      cache = 
CachingStellarProcessor.createCache(getSensorParserConfig().getCacheConfig());
    -    }
    -    initializeStellar();
    -    if(getSensorParserConfig() != null && filter == null) {
    -      
getSensorParserConfig().getParserConfig().putIfAbsent("stellarContext", 
stellarContext);
    -      if 
(!StringUtils.isEmpty(getSensorParserConfig().getFilterClassName())) {
    -        filter = Filters.get(getSensorParserConfig().getFilterClassName()
    -                , getSensorParserConfig().getParserConfig()
    -        );
    +
    +    // Build the Stellar cache
    +    Map<String, Object> cacheConfig = new HashMap<>();
    +    for (Map.Entry<String, ParserComponents> entry: 
sensorToComponentMap.entrySet()) {
    +      String sensor = entry.getKey();
    +      SensorParserConfig config = getSensorParserConfig(sensor);
    +
    +      if (config != null) {
    +        cacheConfig.putAll(config.getCacheConfig());
           }
         }
    +    cache = CachingStellarProcessor.createCache(cacheConfig);
     
    -    parser.init();
    +    // Need to prep all sensors
    +    for (Map.Entry<String, ParserComponents> entry: 
sensorToComponentMap.entrySet()) {
    +      String sensor = entry.getKey();
    +      MessageParser<JSONObject> parser = 
entry.getValue().getMessageParser();
     
    --- End diff --
    
    I left it as a single shared cache on purpose. 
    
    I don't believe that there'd be any incorrect evictions by sharing the 
cache, and I think evicting based on the overall usage in the aggregated parser 
is the appropriate place to handle it. Since the cache is (mostly) LRU, I'd 
prefer to drop the least recently used entry of all parsers rather than 
dropping for each parser.  LRU of the overall flow seems better than LRU of 
each of the sensors. Assuming a single cache, you'd bump up the cache configs 
to account for this, rather than having to optimize each config individually 
(and potentially as a group afterwards).
    
    Caching is also off by default for the parsers, so this is a case that's 
only hit if the user explicitly chooses to do so.
    
    Having said that, I do think I need to shore up the documentation around 
that logic, assuming we choose to go forward with it.  Let me know what you 
think, and I can adjust appropriately.


> Parser aggregation in storm
> ---------------------------
>
>                 Key: METRON-1657
>                 URL: https://issues.apache.org/jira/browse/METRON-1657
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Justin Leet
>            Assignee: Justin Leet
>            Priority: Major
>
> Currently our parsing solution requires one storm topology per sensor. It has 
> been complained that this may be wasteful of resources and that, rather than 
> one storm topology per sensor, it would be advantageous to have multiple 
> sensors in the same topology. The benefit to this is that it would require 
> fewer storm slots.
> The issue with this is that whenever we've aggregated functionality like this 
> before, we've run into issues appropriately being able to scale storm (e.g. 
> batch vs random access indexing in the same topology).  The main point in 
> addressing this is to recommend that parsers with similar velocities and 
> complexity are grouped together.
> Particularly for a first cut, leave the configuration mostly as-is, while 
> allowing for comma separated lists of sensors in start_parser_topology.sh 
> (e.g. bro,yaf creates a aggregated parser consisting of those two). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (METRON-1657) Parser aggregation in storm

Reply via email to