GitHub user cestella opened a pull request:

    https://github.com/apache/incubator-metron/pull/142

    METRON-204: Field Transformation Domain Specific Language

    Similar to the domain specific query language, it would be nice to have a 
domain specific language for transformations that can be used as an optional 
FieldTransformation implementation.
    
    * A fixed set of transformation functions:
       * `TO_LOWER(string)` : Transforms the first argument to a lowercase 
string
       * `TO_UPPER(string)` : Transforms the first argument to an uppercase 
string
       * `TO_STRING(string)` : Transforms the first argument to a string
       * `TO_INTEGER(x)` : Transforms the first argument to an integer
       * `TO_DOUBLE(x)` : Transforms the first argument to a double
       * `TRIM(string)` : Trims whitespace from both sides of a string.
       * `JOIN(list, delim)` : Joins the components of the list with the 
specified delimiter
       * `SPLIT(string, delim)` : Splits the string by the delimiter.  Returns 
a list.
       * `GET_FIRST(list)` : Returns the first element of the list
       * `GET_LAST(list)` : Returns the last element of the list
       * `GET(list, i)` : Returns the i'th element of the list (i is 0-based).
       * `MAP_GET(key, map, default)` : Returns the value associated with the 
key in the map.  If the key does not exist, the default will be returned.  If 
the default is unspecified, then null will be returned.
       * `DOMAIN_TO_TLD(domain)` : Returns the TLD of the domain.
       * `DOMAIN_REMOVE_TLD(domain)` : Removes the TLD of the domain.
       * `REMOVE_TLD(domain)` : Removes the TLD from the domain.
       * `URL_TO_HOST(url)` : Returns the host from a URL
       * `URL_TO_PROTOCOL(url)` : Returns the protocol from a URL
       * `URL_TO_PORT(url)` : Returns the port from a URL
       * `URL_TO_PATH(url)` : Returns the path from a URL
       * `TO_EPOCH_TIMESTAMP(dateTime, format, timezone)` : Returns the epoch 
timestamp of the `dateTime` given the `format`.  If the format does not have a 
timezone and you wish to assume a given timezone, you may optionally specify 
the `timezone`.
    * A FieldTransformer implementation, `MTL`, which exposes the 
transformation language
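
The intended behavior of a few of the string, list, and URL functions listed above can be approximated in Python. This is only an illustrative sketch: the function names mirror the list, but the bodies are assumptions built on Python's standard library, not the Java implementation in this pull request, and edge cases (a URL without an explicit port, an empty list, a malformed URL) may behave differently in MTL.

```python
from urllib.parse import urlsplit

# Hypothetical Python stand-ins for a handful of the MTL functions listed above.

def to_lower(s):
    return s.lower()

def trim(s):
    # Strip whitespace from both sides of the string.
    return s.strip()

def split(s, delim):
    # Split on a literal delimiter and return a list.
    return s.split(delim)

def join(items, delim):
    return delim.join(items)

def get(items, i):
    # i is 0-based, as in the description of GET.
    return items[i]

def get_first(items):
    return items[0]

def get_last(items):
    return items[-1]

def url_to_host(url):
    return urlsplit(url).hostname

def url_to_protocol(url):
    return urlsplit(url).scheme

def url_to_port(url):
    # Only meaningful here when the URL carries an explicit port.
    return urlsplit(url).port

def url_to_path(url):
    return urlsplit(url).path

if __name__ == "__main__":
    url = "http://www.example.com:8080/a/b?x=1"
    print(url_to_protocol(url), url_to_host(url), url_to_port(url), url_to_path(url))
    print(get_last(split(to_lower(trim("  A,B,C ")), ",")))
```

Nesting the calls, as in the last line, corresponds to nesting function calls inside a single MTL expression such as the `MAP_GET` inside `TO_EPOCH_TIMESTAMP` in the example config.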
    
    Example MTL transformation:
    
    Consider the following sensor parser config to add three new fields to a
    message:
    * `utc_timestamp` : The unix epoch timestamp computed from the `timestamp` 
field, the `dc` field (the data center the message comes from), and a `dc2tz` 
map from data centers to timezones
    * `url_host` : The host associated with the url in the `url` field
    * `url_protocol` : The protocol associated with the url in the `url` field
    
    ```
    {
    ...
        "fieldTransformations" : [
              {
               "transformation" : "MTL"
              ,"output" : [ "utc_timestamp", "url_host", "url_protocol" ]
              ,"config" : {
                "utc_timestamp" : "TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC') )"
               ,"url_host" : "URL_TO_HOST(url)"
               ,"url_protocol" : "URL_TO_PROTOCOL(url)"
                          }
              }
                          ]
       ,"parserConfig" : {
          "dc2tz" : {
                    "nyc" : "EST"
                   ,"la" : "PST"
                   ,"london" : "UTC"
                    }
        }
    }
    ```
    
    Note that the `dc2tz` map is in the parser config, so it is accessible
    in the functions.
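
The example configuration above can be traced by hand with a small Python sketch. It is an approximation under stated assumptions, not the code in this pull request: `enrich`, `FIXED_OFFSETS`, and the sample message are hypothetical, timezones are modeled as fixed UTC offsets without daylight saving, the Java-style pattern `yyyy-MM-dd HH:mm:ss` is hard-coded as its `strptime` equivalent, and the result is returned in milliseconds, a unit the description does not specify.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit

# Assumption: fixed UTC offsets in hours, standing in for real timezone lookup.
FIXED_OFFSETS = {"UTC": 0, "EST": -5, "PST": -8}

def map_get(key, mapping, default=None):
    # Value for key, or the default (None if unspecified) when the key is absent.
    return mapping.get(key, default)

def to_epoch_timestamp(date_time, tz_name="UTC"):
    # Interpret the wall-clock time in the given timezone, return epoch millis.
    tz = timezone(timedelta(hours=FIXED_OFFSETS[tz_name]))
    local = datetime.strptime(date_time, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(local.timestamp() * 1000)

def enrich(message, parser_config):
    # Mirrors the three entries under "config" in the example above.
    dc2tz = parser_config["dc2tz"]
    return {
        "utc_timestamp": to_epoch_timestamp(
            message["timestamp"], map_get(message.get("dc"), dc2tz, "UTC")
        ),
        "url_host": urlsplit(message["url"]).hostname,
        "url_protocol": urlsplit(message["url"]).scheme,
    }

if __name__ == "__main__":
    parser_config = {"dc2tz": {"nyc": "EST", "la": "PST", "london": "UTC"}}
    message = {
        "timestamp": "2016-06-03 02:05:33",
        "dc": "nyc",
        "url": "https://www.example.com/index.html",
    }
    print(enrich(message, parser_config))
```

A message whose `dc` value is missing from `dc2tz` falls back to `'UTC'`, which is the role of the third argument to `MAP_GET` in the config.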


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron METRON-204

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #142
    
----
commit cb9e925a199b3ee4b377f955066a959a91fc87c2
Author: cstella <ceste...@gmail.com>
Date:   2016-06-03T02:05:33Z

    METRON-204: Field Transformation Domain Specific Language

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
