GitHub user JonZeolla opened a pull request: https://github.com/apache/incubator-metron/pull/358
METRON-517: Update elasticsearch bro templates for uri

## Problem
[METRON-517](https://issues.apache.org/jira/browse/METRON-517)

The bro uri field in [HTTP::Info](https://www.bro.org/sphinx/scripts/base/protocols/http/main.bro.html#type-HTTP::Info) can grow large enough that inserting the message into Elasticsearch fails. The related error message is:

`IllegalArgumentException[Document contains at least one immense term in field=\"uri\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped...`

## Short-term Solution
Set the elasticsearch template to [limit the URI field](https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html) with `ignore_above` at 10922 (32766 / 3), as [RFC 3986](https://tools.ietf.org/html/rfc3986) allows UTF-8 characters to be used in URIs. Note that `ignore_above` does not truncate: values longer than the limit are not indexed, although they remain in `_source`. 32766 bytes is a size limitation built into Lucene ([MAX_TERM_LENGTH](https://lucene.apache.org/core/6_2_1/core/constant-values.html#org.apache.lucene.index.IndexWriter.MAX_TERM_LENGTH)). A single UTF-8 character can occupy up to 4 bytes, but the widest commonly used characters occupy [3 bytes](https://en.wikipedia.org/wiki/CJK_characters).

## Long-term Solution
Currently being tracked via [METRON-542](https://issues.apache.org/jira/browse/METRON-542)

## Testing
Loaded the updated template into my bare metal cluster via `curl --data "@/root/incubator-metron/metron-deployment/roles/metron_elasticsearch_templates/files/es_templates/bro_index.template" -XPUT server5:9200/_template/bro_index`

Then ran some kibana queries and monitored the cluster on the same bare metal setup.
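For reviewers who want to sanity-check the 10922 figure, here is a minimal Python sketch of the arithmetic. The helper names (`max_chars`, `utf8_len`) are made up for illustration and are not part of Metron or Elasticsearch. Python's `len` counts code points, which may not match exactly how Elasticsearch counts characters for `ignore_above`, so treat this as a rough check of the byte budget only.

```python
# Lucene's IndexWriter.MAX_TERM_LENGTH, in bytes (from the error message above).
LUCENE_MAX_TERM_BYTES = 32766


def max_chars(bytes_per_char: int) -> int:
    """Largest character count guaranteed to fit, assuming a worst-case width."""
    return LUCENE_MAX_TERM_BYTES // bytes_per_char


def utf8_len(text: str) -> int:
    """Size of the string once encoded as UTF-8."""
    return len(text.encode("utf-8"))


limit = max_chars(3)                 # the value used in this PR

ascii_uri = "a" * limit              # 1 byte per character
cjk_uri = "\u4e2d" * limit           # 3 bytes per character
emoji_uri = "\U0001F600" * limit     # 4 bytes per character (outside the BMP)

for name, uri in [("ascii", ascii_uri), ("cjk", cjk_uri), ("emoji", emoji_uri)]:
    size = utf8_len(uri)
    print(name, size, "fits" if size <= LUCENE_MAX_TERM_BYTES else "too long")
```

With the 3-byte assumption, a URI made entirely of 3-byte characters lands exactly on the 32766-byte limit. A string of 10922 4-byte code points would exceed it, which is the case the 3-byte assumption does not cover when counting code points.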
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JonZeolla/incubator-metron METRON-517

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/358.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #358

----
commit b63040f0fa3b735fc9636a5eb98b966f164fe1f3
Author: Jon Zeolla <zeo...@gmail.com>
Date:   2016-11-13T03:22:07Z

    Restrict Bro HTTP URI field to 10922 characters so it will not surpass 32766 bytes (Assumes a UTF-8 maxb 3 bytes per character)

----