Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/609#discussion_r120932266
  
    --- Diff: metron-platform/metron-enrichment/README.md ---
    @@ -71,40 +73,94 @@ The `fieldMap`contents are of interest because they contain the routing and conf
           ]
           }
     ```
    -Based on this sample config, both ip_src_addr and ip_dst_addr will go to the `geo`, `host`, and `hbaseEnrichment` adapter bolts. For the `geo`, `host` and `hbaseEnrichment`, this is sufficient.  However, more complex enrichments may contain their own configuration.  Currently, the `stellar` enrichment requires a more complex configuration, such as:
    +Based on this sample config, both `ip_src_addr` and `ip_dst_addr` will go to the `geo`, `host`, and
    +`hbaseEnrichment` adapter bolts. 
    + 
    +#### Stellar Enrichment Configuration
    +For the `geo`, `host`, and `hbaseEnrichment` adapters, this is sufficient. However, more complex enrichments
    +may contain their own configuration.  Currently, the `stellar` enrichment is more adaptable and thus
    +requires a more nuanced configuration.
    +
    +At its most basic, we want to take a message and apply a couple of enrichments, such as converting the
    +`hostname` field to lowercase. We do this by specifying the transformation inside the
    +`config` for the `stellar` entry in the `fieldMap`.  Two syntaxes are supported.  The first specifies the transformations
    +as a map whose keys are the fields and whose values are the Stellar expressions:
     ```
         "fieldMap": {
            ...
           "stellar" : {
             "config" : {
    -          "numeric" : {
    -                      "foo": "1 + 1"
    -                      }
    -          ,"ALL_CAPS" : "TO_UPPER(source.type)"
    +          "hostname" : "TO_LOWER(hostname)"
             }
           }
         }
     ```
     
    -Whereas the simpler enrichments just need a set of fields explicitly stated so they can be separated from the message and sent to the enrichment adapter bolt for enrichment and ultimately joined back in the join bolt, the stellar enrichment has its set of required fields implicitly stated through usage.  For instance, if your stellar statement references a field, it should be included and if not, then it should not be included.  We did not want to require users to make explicit the implicit.
    +Alternatively, you can specify the transformations as a list, using the same `var := expr` syntax as
    +the Stellar REPL:
    +```
    +    "fieldMap": {
    +       ...
    +      "stellar" : {
    +        "config" : [
    +          "hostname := TO_LOWER(hostname)"
    +        ]
    +      }
    +    }
    +```
    +
    +Some Stellar enrichments may take long enough that you would prefer to split them
    +into groups and execute the groups in parallel.  For instance, you might want
    +to do an HBase enrichment and a profiler call that are independent of one another.  This use case is
    +supported by splitting the enrichments into groups.
     
    -The other way in which the stellar enrichment is somewhat more complex is in how the statements are executed.  In the general purpose case for a list of fields, those fields are used to create a message to send to the enrichment adapter bolt and that bolt's worker will handle the fields one by one in serial for a given message.  For stellar enrichment, we wanted to have a more complex design so that users could specify the groups of stellar statements sent to the same worker in the same message (and thus executed sequentially).  Consider the following configuration:
    +Consider the following example:
     ```
         "fieldMap": {
    +       ...
           "stellar" : {
             "config" : {
    -          "numeric" : {
    -                      "foo": "1 + 1"
    -                      "bar" : TO_LOWER(source.type)"
    -                      }
    -         ,"text" : {
    -                   "ALL_CAPS" : "TO_UPPER(source.type)"
    -                   }
    +          "malicious_domain_enrichment" : {
    +            "is_bad_domain" : "ENRICHMENT_EXISTS('malicious_domains', ip_dst_addr, 'enrichments', 'cf')"
    +          },
    +          "login_profile" : [
    +            "profile_window := PROFILE_WINDOW('from 6 months ago')", 
    +            "global_login_profile := PROFILE_GET('distinct_login_attempts', 'global', profile_window)",
    +            "stats := STATS_MERGE(global_login_profile)",
    +            "auth_attempts_median := STATS_PERCENTILE(stats, 0.5)", 
    +            "auth_attempts_sd := STATS_SD(stats)",
    +            "profile_window := null", 
    +            "global_login_profile := null", 
    +            "stats := null"
    +          ]
             }
           }
         }
     ```
    -We have a group called `numeric` whose stellar statements will be executed sequentially.  In parallel to that, we have the group of stellar statements under the group `text` executing.  The intent here is to allow you to not force higher latency operations to be done sequentially. You can use any name for your groupings you like. Be aware that the configuration is a map and duplicate configuration keys' values are not combined, so the duplicate configuration value will be overwritten.
    +
    +Here we want to perform two enrichments that hit HBase, and we would rather not run them in sequence.  These
    +enrichments are entirely independent of one another (i.e. neither relies on the output of the other).  In
    +this case, we've created a group called `malicious_domain_enrichment` to check whether the destination
    +address exists in the HBase enrichment table under the `malicious_domains` enrichment type.  This is a simple
    +enrichment, so we can express the enrichment group as a map with the new field `is_bad_domain` as the key
    +and the Stellar expression that computes it as the value.
    +
    +In contrast, the Stellar enrichment group `login_profile` interacts with the profiler and has multiple temporary
    +variables (i.e. `profile_window`, `global_login_profile`, and `stats`) that are useful only within the context
    +of this group of Stellar expressions.  In this case, we need to use the list construct
    +when specifying the group and remember to set the temporary variables to `null` so they are not passed along.
    +
    +In general, the things to note from this section are:
    +* The Stellar enrichments for the `stellar` enrichment adapter are specified in the `config` for the `stellar` enrichment
    +adapter in the `fieldMap`.
    +* Independent groups (i.e. no expression in one group depends on the output of an expression from another group) may be executed in parallel.
    +* If you need temporary variables, use the list construct, and ensure that you assign the variables to `null` before the end of the group.
    +* **Ensure that no field is assigned a Stellar expression whose result cannot be represented in JSON.**
    --- End diff --
    
    I agree, this was just filling in documentation that didn't exist.  There should be a follow-on ticket here for a solution.
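For anyone following this thread, here is a minimal Python model of the semantics the proposed README text describes: both `config` syntaxes are normalized to ordered assignments, each group runs its statements in order against a private copy of the message, variables reset to `null` are dropped as temporaries, and independent groups run concurrently. This is a sketch under stated assumptions, not Metron's implementation. `parse_group`, `run_group`, `enrich`, and the injected `evaluate` hook are hypothetical names. The thread pool only stands in for groups executing in parallel, and real Stellar evaluation happens inside Metron. The ungrouped `config` forms in the first two examples would correspond to a single group here, which is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple, Union

# Hypothetical model, not Metron code. A group is either a map
# {field: expression} or a list of "var := expression" statements.
Group = Union[Dict[str, str], List[str]]
# Injected evaluator standing in for Stellar: (expression, scope) -> value.
Evaluator = Callable[[str, Dict[str, Any]], Any]


def parse_group(group: Group) -> List[Tuple[str, str]]:
    """Normalize either syntax into an ordered list of (field, expression)."""
    if isinstance(group, dict):
        return list(group.items())
    pairs = []
    for statement in group:
        var, sep, expr = statement.partition(":=")
        if not sep:
            raise ValueError(f"expected 'var := expr', got {statement!r}")
        pairs.append((var.strip(), expr.strip()))
    return pairs


def run_group(group: Group, message: Dict[str, Any],
              evaluate: Evaluator) -> Dict[str, Any]:
    """Run one group's statements in order; later ones see earlier results."""
    scope = dict(message)  # private copy, so groups cannot see each other
    assigned: Dict[str, Any] = {}
    for field, expr in parse_group(group):
        value = evaluate(expr, scope)
        scope[field] = value
        assigned[field] = value
    # Variables reset to null (None) are treated as temporaries and dropped.
    return {k: v for k, v in assigned.items() if v is not None}


def enrich(message: Dict[str, Any], groups: Dict[str, Group],
           evaluate: Evaluator) -> Dict[str, Any]:
    """Run independent groups concurrently and merge their outputs."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_group, g, message, evaluate)
                   for g in groups.values()]
        results = [f.result() for f in futures]
    enriched = dict(message)
    for result in results:
        enriched.update(result)
    return enriched


if __name__ == "__main__":
    # Toy evaluator: looks expressions up in a table instead of parsing Stellar.
    table = {
        "TO_LOWER(hostname)": lambda s: s["hostname"].lower(),
        "3": lambda s: 3,
        "tmp + 1": lambda s: s["tmp"] + 1,
        "null": lambda s: None,
    }
    groups = {
        "host_group": {"hostname": "TO_LOWER(hostname)"},
        "temp_group": ["tmp := 3", "result := tmp + 1", "tmp := null"],
    }
    print(enrich({"hostname": "WWW.Example.COM"}, groups,
                 lambda expr, scope: table[expr](scope)))
    # prints {'hostname': 'www.example.com', 'result': 4}
```

If two groups write the same field, this model lets the group that appears later in `groups` win; that is an arbitrary choice of the sketch, not documented Metron behavior.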

