anandsubbu commented on a change in pull request #1572: METRON-2327 - Support for SOLR time-based arrays
URL: https://github.com/apache/metron/pull/1572#discussion_r353655928
##########
File path: metron-platform/metron-solr/metron-solr-common/README.md
##########
@@ -164,3 +164,142 @@ The `create_collection.sh` script depends on schemas installed in `$METRON_HOME/
 Additional schemas should be installed in that location if using the `create_collection.sh` script. Any collection can be deleted with the `delete_collection.sh` script. These scripts use the [Solr Collection API](http://lucene.apache.org/solr/guide/7_4/collections-api.html).
+
+## Time routed alias support
+An alias is a pointer that points to a Collection. Sending a document to an alias sends it to the collection the alias points to.
+The collection an alias points to can be changed with a single, low-cost operation. Time Routed Aliases (TRAs) is a SolrCloud feature
+that manages an alias and a time sequential series of collections.
+
+A TRA automatically creates new collections and (optionally) deletes old ones as it routes documents to the correct collection
+based on the timestamp of the event. This approach allows for indefinite indexing of data without degradation of performance otherwise
+experienced due to the continuous growth of a single index.
+
+A TRA is defined with a minimum time and a defined interval period and SOLR provides a collection for each interval for a
+contiguous set of datetime intervals from the start date to the maximum received document date. Collections are created to host documents based on examining the document's event-time. If a document does not currently
+have a collection created for it, then starting at the minimum date SOLR will create a collection for each interval that does not have one
+ up until the interval period needed to store the current document.
+
+See SOLR documentation [\(1\)](https://lucene.apache.org/solr/guide/7_4/time-routed-aliases.html)
+[\(2\)](https://lucene.apache.org/solr/guide/7_4/collections-api.html#createalias) for more information.
+
+### Setting up Time routed alias support
+
+Using SOLR's time-based routing requires using SOLR's native datetime types. At the moment, Metron uses the LongTrie field type
+to store dates, which is not a SOLR native datetime type. At a later stage the Metron code-base will be changed to use SOLR native datetime types
+(as the LongTrie type is deprecated), but for now a workaround procedure has been created to allow for the use of time-based routing, while at the
+ same time allowing for Metron to continue to use the LongTrie type. This procedure only works for new collections, and is as follows:
+
+1. Add the following field type definition near the end of the schema.xml document (the entry must be inside the schema tags)
+    ```
+    <fieldType name="datetime" stored="false" indexed="false" multiValued="false" docValues="true" class="solr.DatePointField"/>
+    ```
+
+
+1. Add the following field definition near the start of the schema.xml document (the entry must be inside the schema tags)
+    ```
+    <field name="datetime" type="datetime" />
+    ```
+
+
+1. Create the configset for the collection: Assuming that the relevant collection's schema.xml and solrconfig.xml are located in
+`$METRON_HOME/config/schema/$COLLECTION_NAME` folder, use the following command:
+    ```
+    $METRON_HOME/bin/create_configset $COLLECTION_NAME
+    ```
+
+
+1. Create the time-based routing alias for the collection:
+Assuming the following values:
+    * SOLR_HOST: Host SOLR is installed on
+
+    * ALIAS_NAME: Name of the new alias

Review comment:
In my case, I created the alias with the name `bro1`, but I notice the indices are not being written. In the Solr admin UI, I see the name of the collection as `bro1_2019-12-03`. Can you provide an example for "name of the new alias"?
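
To make the alias-versus-collection distinction concrete, here is a minimal Python sketch of the behaviour the README paragraph describes: starting at the minimum date, one collection per interval is needed up to the interval that holds the document. It is not Solr's implementation. The helper names, the 2019-12-01 start date and the one-day interval are hypothetical, and the `<alias>_<yyyy-MM-dd>` naming is assumed only from the `bro1_2019-12-03` collection observed in the admin UI (finer-grained intervals are not covered).

```python
from datetime import datetime, timedelta, timezone


def collection_name(alias: str, interval_start: datetime) -> str:
    """Hypothetical helper: day-level `<alias>_<yyyy-MM-dd>` name, as seen in the admin UI."""
    return f"{alias}_{interval_start.astimezone(timezone.utc):%Y-%m-%d}"


def collections_needed(alias: str, start: datetime,
                       interval: timedelta, doc_time: datetime) -> list:
    """List one collection per interval, from `start` up to the interval holding `doc_time`."""
    if doc_time < start:
        raise ValueError("document timestamp is before the alias start time")
    names = []
    current = start
    while current <= doc_time:
        names.append(collection_name(alias, current))
        current += interval
    return names


# Assumed values for illustration only: alias `bro1`, daily intervals from 2019-12-01.
names = collections_needed(
    "bro1",
    datetime(2019, 12, 1, tzinfo=timezone.utc),
    timedelta(days=1),
    datetime(2019, 12, 3, 14, 30, tzinfo=timezone.utc),
)
print(names)  # ['bro1_2019-12-01', 'bro1_2019-12-02', 'bro1_2019-12-03']
```

Under these assumptions the alias name stays `bro1`, and the dated names are the collections behind it; the last entry is the collection that would hold the document.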