GitHub user cestella opened a pull request: https://github.com/apache/incubator-metron/pull/142
METRON-204: Field Transformation Domain Specific Language

Similar to the domain specific query language, it would be nice to have a domain specific language for transformations, usable as an optional FieldTransformation implementation.

* A fixed set of transformation functions:
    * `TO_LOWER(string)` : Transforms the first argument to a lowercase string
    * `TO_UPPER(string)` : Transforms the first argument to an uppercase string
    * `TO_STRING(string)` : Transforms the first argument to a string
    * `TO_INTEGER(x)` : Transforms the first argument to an integer
    * `TO_DOUBLE(x)` : Transforms the first argument to a double
    * `TRIM(string)` : Trims whitespace from both sides of a string
    * `JOIN(list, delim)` : Joins the components of the list with the specified delimiter
    * `SPLIT(string, delim)` : Splits the string by the delimiter. Returns a list.
    * `GET_FIRST(list)` : Returns the first element of the list
    * `GET_LAST(list)` : Returns the last element of the list
    * `GET(list, i)` : Returns the i'th element of the list (i is 0-based)
    * `MAP_GET(key, map, default)` : Returns the value associated with the key in the map. If the key does not exist, the default is returned. If the default is unspecified, null is returned.
    * `DOMAIN_TO_TLD(domain)` : Returns the TLD of the domain
    * `DOMAIN_REMOVE_TLD(domain)` : Removes the TLD from the domain
    * `REMOVE_TLD(domain)` : Removes the TLD from the domain
    * `URL_TO_HOST(url)` : Returns the host from a URL
    * `URL_TO_PROTOCOL(url)` : Returns the protocol from a URL
    * `URL_TO_PORT(url)` : Returns the port from a URL
    * `URL_TO_PATH(url)` : Returns the path from a URL
    * `TO_EPOCH_TIMESTAMP(dateTime, format, timezone)` : Returns the epoch timestamp of the `dateTime` given the `format`. If the format does not include a timezone and you wish to assume one, you may optionally specify the `timezone`.
* A FieldTransformer implementation, `MTL`, which exposes the transformation language

Example MTL transformation:

Consider the following sensor parser config, which adds three new fields to a message:

* `utc_timestamp` : The Unix epoch timestamp, computed from the `timestamp` field, the `dc` field (the data center the message comes from), and a `dc2tz` map from data centers to timezones
* `url_host` : The host associated with the URL in the `url` field
* `url_protocol` : The protocol associated with the URL in the `url` field

```
{
...
    "fieldTransformations" : [
          {
           "transformation" : "MTL"
          ,"output" : [ "utc_timestamp", "url_host", "url_protocol" ]
          ,"config" : {
            "utc_timestamp" : "TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC') )"
           ,"url_host" : "URL_TO_HOST(url)"
           ,"url_protocol" : "URL_TO_PROTOCOL(url)"
                      }
          }
                           ]
   ,"parserConfig" : {
      "dc2tz" : {
                "nyc" : "EST"
               ,"la" : "PST"
               ,"london" : "UTC"
                }
                     }
}
```

Note that the `dc2tz` map is in the parser config, so it is accessible in the functions.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron METRON-204

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #142

----
commit cb9e925a199b3ee4b377f955066a959a91fc87c2
Author: cstella <ceste...@gmail.com>
Date:   2016-06-03T02:05:33Z

    METRON-204: Field Transformation Domain Specific Language

----
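For readers who want to see what the three expressions in the example config compute, here is a rough Python sketch of their semantics. It is not the Metron implementation. The helper names (`map_get`, `to_epoch_timestamp`, `apply_transformations`) and the `TZ_OFFSET_HOURS` table are hypothetical, and the timezone abbreviations are treated as fixed UTC offsets with no DST. The format string uses Python's `strptime` syntax rather than the `yyyy-MM-dd HH:mm:ss` pattern in the config, and returning milliseconds is an assumption, since the description does not state the unit.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

# Hypothetical fixed UTC offsets (in hours) for the abbreviations used in the
# example config. A real implementation would consult a timezone database.
TZ_OFFSET_HOURS = {"UTC": 0, "EST": -5, "PST": -8}


def map_get(key, mapping, default=None):
    """Sketch of MAP_GET(key, map, default): the value for key, else default."""
    return mapping.get(key, default)


def to_epoch_timestamp(date_time, fmt, tz_name="UTC"):
    """Sketch of TO_EPOCH_TIMESTAMP: parse date_time with fmt in the given zone.

    Returns milliseconds since the Unix epoch (the unit is an assumption).
    """
    zone = timezone(timedelta(hours=TZ_OFFSET_HOURS[tz_name]))
    parsed = datetime.strptime(date_time, fmt).replace(tzinfo=zone)
    return int(parsed.timestamp() * 1000)


def apply_transformations(message, parser_config):
    """Add utc_timestamp, url_host and url_protocol, mirroring the example config."""
    # MAP_GET(dc, dc2tz, 'UTC'): fall back to UTC for unknown data centers.
    tz_name = map_get(message.get("dc"), parser_config["dc2tz"], "UTC")
    url = urlparse(message["url"])

    enriched = dict(message)  # leave the input message untouched
    enriched["utc_timestamp"] = to_epoch_timestamp(
        message["timestamp"], "%Y-%m-%d %H:%M:%S", tz_name
    )
    enriched["url_host"] = url.hostname      # URL_TO_HOST(url)
    enriched["url_protocol"] = url.scheme    # URL_TO_PROTOCOL(url)
    return enriched
```

Calling `apply_transformations` with a message whose `dc` is `"nyc"` and the `dc2tz` map from the config above interprets the `timestamp` field as EST. A data center missing from the map falls back to UTC through the `MAP_GET` default.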