Github user justinleet commented on a diff in the pull request:
https://github.com/apache/metron/pull/609#discussion_r120917140
--- Diff: metron-platform/metron-enrichment/README.md ---
@@ -71,40 +73,94 @@ The `fieldMap`contents are of interest because they contain the routing and conf
]
}
```
-Based on this sample config, both ip_src_addr and ip_dst_addr will go to the `geo`, `host`, and `hbaseEnrichment` adapter bolts. For the `geo`, `host` and `hbaseEnrichment`, this is sufficient. However, more complex enrichments may contain their own configuration. Currently, the `stellar` enrichment requires a more complex configuration, such as:
+Based on this sample config, both `ip_src_addr` and `ip_dst_addr` will go to the `geo`, `host`, and
+`hbaseEnrichment` adapter bolts.
+
+#### Stellar Enrichment Configuration
+For the `geo`, `host`, and `hbaseEnrichment` adapters, this is sufficient. However, more complex enrichments
+may contain their own configuration. Currently, the `stellar` enrichment is more adaptable and thus
+requires a more nuanced configuration.
+
+At its most basic, we want to take a message and apply a couple of enrichments, such as converting the
+`hostname` field to lowercase. We do this by specifying the transformation inside the `config` of the
+`stellar` entry in the `fieldMap`. Two syntaxes are supported. The first specifies the transformations
+as a map, with the field as the key and the Stellar expression as the value:
```
"fieldMap": {
...
"stellar" : {
"config" : {
- "numeric" : {
- "foo": "1 + 1"
- }
- ,"ALL_CAPS" : "TO_UPPER(source.type)"
+ "hostname" : "TO_LOWER(hostname)"
}
}
}
```
-Whereas the simpler enrichments just need a set of fields explicitly stated so they can be separated from the message and sent to the enrichment adapter bolt for enrichment and ultimately joined back in the join bolt, the stellar enrichment has its set of required fields implicitly stated through usage. For instance, if your stellar statement references a field, it should be included and if not, then it should not be included. We did not want to require users to make explicit the implicit.
+Another approach is to specify the transformations as a list, using the same `var := expr` syntax as
+the Stellar REPL:
+```
+ "fieldMap": {
+ ...
+ "stellar" : {
+ "config" : [
+ "hostname := TO_LOWER(hostname)"
+ ]
+ }
+ }
+```
+
+Some Stellar enrichments may take long enough that you would prefer to split them into groups and
+execute the groups in parallel. For instance, you might want to do an HBase enrichment and a profiler
+call that are independent of one another. This use case is supported by splitting the enrichments
+into groups.
-The other way in which the stellar enrichment is somewhat more complex is in how the statements are executed. In the general purpose case for a list of fields, those fields are used to create a message to send to the enrichment adapter bolt and that bolt's worker will handle the fields one by one in serial for a given message. For stellar enrichment, we wanted to have a more complex design so that users could specify the groups of stellar statements sent to the same worker in the same message (and thus executed sequentially). Consider the following configuration:
+Consider the following example:
```
"fieldMap": {
+ ...
"stellar" : {
"config" : {
- "numeric" : {
- "foo": "1 + 1"
- "bar" : TO_LOWER(source.type)"
- }
- ,"text" : {
- "ALL_CAPS" : "TO_UPPER(source.type)"
- }
+ "malicious_domain_enrichment" : {
+ "is_bad_domain" : "ENRICHMENT_EXISTS('malicious_domains', ip_dst_addr, 'enrichments', 'cf')"
+ },
+ "login_profile" : [
+ "profile_window := PROFILE_WINDOW('from 6 months ago')",
+ "global_login_profile := PROFILE_GET('distinct_login_attempts', 'global', profile_window)",
+ "stats := STATS_MERGE(global_login_profile)",
+ "auth_attempts_median := STATS_PERCENTILE(stats, 0.5)",
+ "auth_attempts_sd := STATS_SD(stats)",
+ "profile_window := null",
+ "global_login_profile := null",
+ "stats := null"
+ ]
}
}
}
```
-We have a group called `numeric` whose stellar statements will be executed sequentially. In parallel to that, we have the group of stellar statements under the group `text` executing. The intent here is to allow you to not force higher latency operations to be done sequentially. You can use any name for your groupings you like. Be aware that the configuration is a map and duplicate configuration keys' values are not combined, so the duplicate configuration value will be overwritten.
+
+Here we want to perform two enrichments that hit HBase, and we would rather not run them in sequence. These
+enrichments are entirely independent of one another (i.e. neither relies on the output of the other). In
+this case, we've created a group called `malicious_domain_enrichment` to check whether the destination
+address exists in the HBase enrichment table under the `malicious_domains` enrichment type. This is a simple
+enrichment, so we can express the group as a map, with the new field `is_bad_domain` as the key and the
+associated Stellar expression as the value.
+
+In contrast, the Stellar enrichment group `login_profile` interacts with the profiler and has multiple temporary
+variables (i.e. `profile_window`, `global_login_profile`, and `stats`) that are useful only within the context
+of this group of Stellar expressions. In this case, we need to use the list construct when specifying the
+group, and remember to set the temporary variables to `null` so they are not passed along.
+
+In general, note the following from this section:
+* The Stellar enrichments for the `stellar` enrichment adapter are specified in the `config` for the `stellar` enrichment
+adapter in the `fieldMap`.
+* Independent groups (i.e. no expression in one group depends on the output of an expression in another group) may be executed in parallel.
+* If you need temporary variables, use the list construct, and assign each of them to `null` before the end of the group.
+* **Do not assign a field to a Stellar expression that returns an object which JSON cannot represent.**
--- End diff --
I don't think it's in scope for this, but it feels like this should be enforceable early. It would be nice to be able to run some high-level checks like this before accepting such a configuration.
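As a rough illustration of the kind of check meant here: the return type of an arbitrary Stellar expression generally can't be known from the config alone, so a practical early check might validate a group's output after evaluating it against a sample message (e.g. while testing a new enrichment). The following is a minimal Python sketch under that assumption. Metron itself is Java, and `StatisticsSketch`, `is_json_representable`, and `check_group_output` are hypothetical names, not Metron APIs. It also assumes, per the README text, that variables assigned `null` are temporaries that are not passed along.

```python
import math
from typing import Any, Dict, List


class StatisticsSketch:
    """Hypothetical stand-in for a value JSON cannot represent, such as the
    statistics object held by `stats` in the `login_profile` example."""

    def __init__(self, values: List[float]) -> None:
        self.values = values


def is_json_representable(value: Any) -> bool:
    """Return True if value maps directly onto a JSON type."""
    if value is None or isinstance(value, (bool, str, int)):
        return True
    if isinstance(value, float):
        # NaN and +/-Infinity have no representation in strict JSON.
        return math.isfinite(value)
    if isinstance(value, (list, tuple)):
        return all(is_json_representable(v) for v in value)
    if isinstance(value, dict):
        # Conservatively require string keys, as JSON object keys are strings.
        return all(isinstance(k, str) and is_json_representable(v)
                   for k, v in value.items())
    return False


def check_group_output(group: str, assignments: Dict[str, Any]) -> List[str]:
    """Return one error message per emitted field whose value is not JSON.

    Variables whose final value is None (Stellar null) are assumed to be
    temporaries that are dropped, so they are skipped.
    """
    errors = []
    for field, value in assignments.items():
        if value is None:
            continue  # temporary variable, not passed along
        if not is_json_representable(value):
            errors.append(
                f"group '{group}': field '{field}' has non-JSON value of type "
                f"{type(value).__name__}"
            )
    return errors


if __name__ == "__main__":
    # Output of `login_profile` if the final `stats := null` were forgotten.
    output = {
        "auth_attempts_median": 12.0,
        "auth_attempts_sd": 3.5,
        "profile_window": None,
        "global_login_profile": None,
        "stats": StatisticsSketch([10.0, 12.0, 15.0]),
    }
    for message in check_group_output("login_profile", output):
        print(message)
```

Treating `None` as "drop" mirrors the null-out convention for temporaries, so a forgotten `stats := null` is exactly what this example flags.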