[ 
https://issues.apache.org/jira/browse/METRON-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484274#comment-16484274
 ] 

ASF GitHub Bot commented on METRON-1569:
----------------------------------------

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/1022

    METRON-1569 Allow user to change field name conversion when indexing …

    The `ElasticsearchWriter` has a mechanism to transform the field names of a 
message before it is written to Elasticsearch.  Right now this mechanism is 
hard-coded to replace every dot ('.') with a colon (':').
    
    This mechanism was needed for Elasticsearch 2.x, which did not allow dots in 
field names.  Now that Metron supports Elasticsearch 5.x, this is no longer a 
problem. A user should be able to configure the field name transformation when 
writing to Elasticsearch, as needed.  
    
    While it might have been simpler to just remove the de-dotting mechanism, 
this would break backwards compatibility.  Taking this approach provides users 
with an upgrade path.
    
    ## Changes
    
    This change allows the user to configure the field name converter as part 
of the index writer configuration.  
    
    Acceptable values include the following.
    * `DEDOT`: Replaces every '.' with ':'.  This is the default, backwards 
compatible behavior.
    * `NOOP`: No field name change.
    
    If no `fieldNameConverter` is defined, the writer defaults to `DEDOT`, which 
maintains backwards compatibility.
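    
    As a rough illustration of the two options, the sketch below models them as a Java enum.  The names `ConverterSketch` and `fromConfig` are hypothetical and are not taken from the Metron code base; only the `DEDOT`/`NOOP` values and the `DEDOT` default come from the description above.  Rejecting unknown names with an exception is an assumption of this sketch.
    
    ```java
    // Hypothetical sketch; not the actual Metron implementation.
    enum ConverterSketch {

      // Replaces every '.' with ':' (the backwards compatible default).
      DEDOT {
        @Override
        String convert(String fieldName) {
          return fieldName.replace('.', ':');
        }
      },

      // Leaves the field name untouched.
      NOOP {
        @Override
        String convert(String fieldName) {
          return fieldName;
        }
      };

      abstract String convert(String fieldName);

      // Resolves the configured name; falls back to DEDOT when nothing is configured.
      // Assumption: an unknown name fails fast with an IllegalArgumentException.
      static ConverterSketch fromConfig(String configuredName) {
        if (configuredName == null || configuredName.trim().isEmpty()) {
          return DEDOT;
        }
        return valueOf(configuredName.trim());
      }
    }
    ```
    
    With this sketch, `ConverterSketch.fromConfig(null).convert("source.type")` yields `source:type`, while `ConverterSketch.fromConfig("NOOP")` leaves the name as `source.type`.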
    
    A cache of `FieldNameConverter`s is maintained since the index writer 
configuration can be changed at run-time and each sensor has its own index 
writer configuration.
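    
    To make the caching idea concrete, here is a minimal sketch of a per-sensor cache whose entries expire after a fixed time, so that a configuration change is eventually picked up.  It is not the code in this PR: the class `ConverterCacheSketch`, the `configLookup` function, the injected clock, and the choice to treat any name other than `NOOP` as `DEDOT` are all assumptions made for illustration.
    
    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;
    import java.util.function.LongSupplier;
    import java.util.function.UnaryOperator;

    // Hypothetical sketch of a per-sensor converter cache with time-based expiry.
    class ConverterCacheSketch {

      // A cached converter plus the time at which it was created.
      private static final class Entry {
        final UnaryOperator<String> converter;
        final long createdAtMillis;

        Entry(UnaryOperator<String> converter, long createdAtMillis) {
          this.converter = converter;
          this.createdAtMillis = createdAtMillis;
        }
      }

      private final Map<String, Entry> cache = new ConcurrentHashMap<>();
      private final Function<String, String> configLookup; // sensor type -> configured name, may return null
      private final long ttlMillis;
      private final LongSupplier clock;

      ConverterCacheSketch(Function<String, String> configLookup, long ttlMillis, LongSupplier clock) {
        this.configLookup = configLookup;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
      }

      // Returns the cached converter for a sensor, re-reading the configuration once the entry expires.
      UnaryOperator<String> get(String sensorType) {
        long now = clock.getAsLong();
        Entry entry = cache.compute(sensorType, (key, existing) -> {
          if (existing != null && now - existing.createdAtMillis < ttlMillis) {
            return existing;
          }
          return new Entry(create(configLookup.apply(key)), now);
        });
        return entry.converter;
      }

      // Assumption: anything other than "NOOP" (including null) falls back to de-dotting.
      private static UnaryOperator<String> create(String configuredName) {
        if ("NOOP".equals(configuredName)) {
          return name -> name;
        }
        return name -> name.replace('.', ':');
      }
    }
    ```
    
    Because the converter is only rebuilt after its entry expires, a configuration change does not take effect immediately in this sketch; the injected clock exists only to make that behavior easy to test.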
    
    An example configuration looks like the following.
    ```
        {
          "hdfs" : {
            "enabled" : false
          },
          "elasticsearch" : {
            "index" : "bro",
            "batchSize" : 5,
            "enabled" : true,
            "fieldNameConverter": "NOOP"
          },
          "solr" : {
            "enabled" : false
          }
        } 
    ```
    
    ## Code Changes
    
    * Added the `fieldNameConverter` parameter to the index writer 
configuration.
    
    * Moved the `FieldNameConverter` implementations to a dedicated package in 
`metron-common`.
    
    * Renamed `ElasticsearchFieldNameConverter` to `DeDotFieldNameConverter`.
    
    * Implemented the `NoopFieldNameConverter` which does not modify the field 
name.
    
    * Created `FieldNameConverters` class that allows a user to specify either 
`DEDOT` or `NOOP` to choose the appropriate implementation.
    
    * Implemented a `CachedFieldNameConverterFactory` that encapsulates all the 
logic for choosing and instantiating the appropriate `FieldNameConverter`.  
    
    * Updated `ElasticsearchWriter` to use the 
`CachedFieldNameConverterFactory`.
    
    * Updated the README to document the new configuration parameter.
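    
    For readers unfamiliar with what a field name converter does to a message, the following sketch applies a converter to every field name of a message held as a `Map`.  The class and method names (`RenameFieldsSketch`, `renameFields`) are hypothetical, and this is not how `ElasticsearchWriter` is necessarily written; it only illustrates the renaming step described above.
    
    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.function.UnaryOperator;

    // Hypothetical sketch; the real writer's code is not shown in this description.
    class RenameFieldsSketch {

      // Returns a copy of the message with every field name passed through the converter.
      // Values are left untouched.  If two names convert to the same key, the later one wins.
      static Map<String, Object> renameFields(Map<String, Object> message,
                                              UnaryOperator<String> converter) {
        Map<String, Object> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : message.entrySet()) {
          renamed.put(converter.apply(field.getKey()), field.getValue());
        }
        return renamed;
      }
    }
    ```
    
    Passing `name -> name.replace('.', ':')` as the converter turns a field such as `source.type` into `source:type`; passing `name -> name` leaves the message keys unchanged.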
    
    
    ## Manual Testing
    
    1. Launch a development environment and log in.
    
        ```
        vagrant ssh
        sudo su -
        source /etc/default/metron
        ```
    
    1. Validate the environment by ensuring alerts are visible in the Alerts UI 
and that the Ambari Service Check completes successfully.  This ensures that 
the change is backwards compatible.
    
    1. Log in to the Storm UI and enable DEBUG logging for 
`org.apache.metron.common` and `org.apache.metron.elasticsearch`.
    
    1. If DEBUG logging is enabled correctly, the Storm worker logs in 
`/var/log/storm/worker-artifacts/random_access_indexing*/worker.log` should 
contain log statements like the following.  This shows that the default 
`DEDOT` converter is in use.
    
        ```
        2018-05-22 14:38:... [DEBUG] Renamed dotted field; 
original=source.type, new=source:type
        2018-05-22 14:38:... [DEBUG] Renamed dotted field; 
original=adapter.geoadapter.end.ts, new=adapter:geoadapter:end:ts
        2018-05-22 14:38:... [DEBUG] Renamed dotted field; 
original=threatintelsplitterbolt.splitter.end.ts, 
new=threatintelsplitterbolt:splitter:end:ts
        2018-05-22 14:38:... [DEBUG] Renamed dotted field; 
original=adapter.threatinteladapter.begin.ts, 
new=adapter:threatinteladapter:begin:ts
        2018-05-22 14:38:... [DEBUG] Renamed dotted field; 
original=enrichments.geo.ip_dst_addr.location_point, 
new=enrichments:geo:ip_dst_addr:location_point
        2018-05-22 14:38:... [DEBUG] Renamed dotted field; 
original=adapter.threatinteladapter.end.ts, 
new=adapter:threatinteladapter:end:ts
        2018-05-22 14:38:... [DEBUG] Renamed dotted field; 
original=enrichmentsplitterbolt.splitter.end.ts, 
new=enrichmentsplitterbolt:splitter:end:ts
        ```
    
    1. Launch the REPL.
    
        ```
        ./bin/stellar -z $ZOOKEEPER
        ```
    
    1. Change the field name converter to NOOP.
    
        ```
        [Stellar]>>> conf := SHELL_EDIT()
        {
          "hdfs" : {
            "enabled" : false
          },
          "elasticsearch" : {
            "index" : "bro",
            "batchSize" : 5,
            "enabled" : true,
            "fieldNameConverter": "NOOP"
          },
          "solr" : {
            "enabled" : false
          }
        }
    
        [Stellar]>>> CONFIG_PUT("INDEXING", conf, "bro")
        ```
    
    1. It can take up to 5 minutes for the topology to pick up this change.  
The old `FieldNameConverter` needs to expire from the cache first.
    
    1. Go back to the Storm worker logs.  When the change takes effect, we 
should see a log statement like the following, indicating that the 
`NoopFieldNameConverter` was created.
    
        ```
        2018-05-22 16:... [DEBUG] Created field name converter; sensorType=bro, 
configuredName=NOOP, class=NoopFieldNameConverter
        ```
    
    1. In the same logs, we will start to see tuples fail to be indexed.  
Elasticsearch complains because the templates have been created to expect 
`source:type`, but that field no longer exists because the `FieldNameConverter` 
was changed.
    
        ```
        2018-05-22 16:0...[ERROR] Failing 1 tuples
        org.elasticsearch.index.mapper.MapperParsingException: Could not 
dynamically add mapping for field [source.type]. Existing mapping for [source] 
must be of type object but found [keyword].
                at 
org.elasticsearch.index.mapper.DocumentParser.getDynamicParentMapper(DocumentParser.java:876)
 ~[stormjar.jar:?]
                at 
org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:596)
 ~[stormjar.jar:?]
                at 
org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:396)
 ~[stormjar.jar:?]
                at 
org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:373)
 ~[stormjar.jar:?]
                at 
org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:93)
 ~[stormjar.jar:?]
                at 
org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:66)
 ~[stormjar.jar:?]
    
        ```
    
    
    ## Pull Request Checklist
    
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
    - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
    - [ ] Have you written or updated unit tests and/or integration tests to 
verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1569

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1022.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1022
    
----
commit c478de1f099d62c0f5271cd9b8ad5124089ad735
Author: Nick Allen <nick@...>
Date:   2018-05-21T20:50:25Z

    METRON-1569 Allow user to change field name conversion when indexing to 
Elasticsearch

----


> Allow user to change field name conversion when indexing to Elasticsearch
> -------------------------------------------------------------------------
>
>                 Key: METRON-1569
>                 URL: https://issues.apache.org/jira/browse/METRON-1569
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The `ElasticsearchWriter` has a mechanism to transform the field names of a 
> message before it is written to Elasticsearch.  Right now this mechanism is 
> hard-coded to replace every dot ('.') with a colon (':').
> This mechanism was needed for Elasticsearch 2.x, which did not allow dots in 
> field names.  Now that Metron supports Elasticsearch 5.x, this is no longer a 
> problem.
> A user should be able to configure the field name transformation when writing 
> to Elasticsearch, as needed.  
> While it might have been simpler to just remove the de-dotting mechanism, 
> this would break backwards compatibility.  Providing users with a means to 
> configure this mechanism provides them with an upgrade path.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
