GitHub user JonZeolla opened a pull request:

    https://github.com/apache/incubator-metron/pull/358

    METRON-517: Update elasticsearch bro templates for uri

    ## Problem
    
    [METRON-517](https://issues.apache.org/jira/browse/METRON-517)
    
    The bro `uri` field in
    [HTTP::Info](https://www.bro.org/sphinx/scripts/base/protocols/http/main.bro.html#type-HTTP::Info)
    can grow so large that the message fails to insert into Elasticsearch.  The
    related error message is:
    `IllegalArgumentException[Document contains at least one immense term in
    field=\"uri\" (whose UTF8 encoding is longer than the max length 32766), all of
    which were skipped...`
    
    ## Short-term Solution
    
    Set `ignore_above` on the `uri` field in the elasticsearch template
    ([ignore_above](https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html))
    to 10922 (32766 / 3), so that longer values are not indexed.  Note that
    `ignore_above` skips over-long values rather than truncating them.  The
    division by 3 is needed because [RFC 3986](https://tools.ietf.org/html/rfc3986)
    allows UTF-8 characters to be used in URIs.  32766 bytes is a size limitation
    built into Lucene
    ([MAX_TERM_LENGTH](https://lucene.apache.org/core/6_2_1/core/constant-values.html#org.apache.lucene.index.IndexWriter.MAX_TERM_LENGTH)),
    and each UTF-8 character can be at most 4 bytes, but in practice is
    predominantly at most [3 bytes](https://en.wikipedia.org/wiki/CJK_characters).
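
The arithmetic behind the 10922 figure can be checked with a minimal sketch. The constant and helper names below are illustrative only and do not appear in the template; the 3-bytes-per-character figure is the assumption stated above, not a guarantee.

```python
# Lucene's IndexWriter.MAX_TERM_LENGTH, in bytes (value taken from the error message).
MAX_TERM_LENGTH = 32766

# Assumption from this PR: characters in URIs take at most 3 bytes in UTF-8.
ASSUMED_MAX_BYTES_PER_CHAR = 3

# Character limit that keeps the encoded term within the byte limit
# as long as the assumption holds.
ignore_above = MAX_TERM_LENGTH // ASSUMED_MAX_BYTES_PER_CHAR


def utf8_len(text):
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# One sample character for each UTF-8 width: ASCII, Latin-1, CJK, emoji.
samples = ["a", "\u00e9", "\u6f22", "\U0001F600"]
widths = [utf8_len(ch) for ch in samples]

# Worst case under the assumption: a value made only of 3-byte characters.
worst_case_3_byte = utf8_len("\u6f22" * ignore_above)

# The case the assumption does not cover: a value made only of 4-byte characters.
worst_case_4_byte = utf8_len("\U0001F600" * ignore_above)

print(ignore_above, widths, worst_case_3_byte, worst_case_4_byte)
```

A value of 10922 three-byte characters lands exactly on the Lucene limit, while the same number of four-byte code points would exceed it. How Elasticsearch itself counts characters for `ignore_above` is not covered by this sketch.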
    
    ## Long-term Solution
    
    Currently being tracked via 
[METRON-542](https://issues.apache.org/jira/browse/METRON-542)
    
    ## Testing
    
    Inserted this into my bare metal cluster via
    `curl --data "@/root/incubator-metron/metron-deployment/roles/metron_elasticsearch_templates/files/es_templates/bro_index.template" -XPUT server5:9200/_template/bro_index`
    
    Did some Kibana queries and cluster monitoring on my bare metal cluster.
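
One further check that could be scripted is confirming that the installed template actually carries the limit. The sketch below is hypothetical: the helper name, the `bro_doc` type name, and the sample JSON are assumptions for illustration and are not copied from `bro_index.template`. In practice the JSON would come from a `GET` on `_template/bro_index`.

```python
import json


def find_ignore_above(node, field="uri"):
    """Yield the ignore_above setting (or None) for each mapping of `field`
    found under any "properties" key, without assuming the nesting above it."""
    if isinstance(node, dict):
        props = node.get("properties")
        if isinstance(props, dict) and isinstance(props.get(field), dict):
            yield props[field].get("ignore_above")
        for value in node.values():
            yield from find_ignore_above(value, field)


# Hypothetical response body; the real template layout may differ.
sample = json.loads("""
{
  "bro_index": {
    "mappings": {
      "bro_doc": {
        "properties": {
          "uri": {"type": "string", "index": "not_analyzed", "ignore_above": 10922},
          "method": {"type": "string", "index": "not_analyzed"}
        }
      }
    }
  }
}
""")

found = list(find_ignore_above(sample))
print(found)
```

Walking the structure recursively avoids hard-coding the document type name, which is the part of the layout most likely to differ from this sample.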

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JonZeolla/incubator-metron METRON-517

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/358.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #358
    
----
commit b63040f0fa3b735fc9636a5eb98b966f164fe1f3
Author: Jon Zeolla <zeo...@gmail.com>
Date:   2016-11-13T03:22:07Z

    Restrict Bro HTTP URI field to 10922 characters so it will not surpass 
32766 bytes (Assumes a UTF-8 max of 3 bytes per character)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
