[
https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554031#comment-15554031
]
Eungsop Yoo commented on SOLR-9562:
-----------------------------------
I run a SolrCloud cluster for log indexing with a collection created daily,
which has only 1 replica and multiple shards. (But they are stored in HDFS with
3 replicas, and the autoAddReplicas feature is enabled.) In my use case query
performance doesn't matter, so 1 replica is enough. Indexing performance for
the given system resources is best with 1 replica. But in some other use cases
your idea would make sense.
Deleting TTL-expired documents
([SOLR-5795|https://issues.apache.org/jira/browse/SOLR-5795]) is not
efficient for large log data, so I create and delete a daily collection every
morning from my crontab. We need to find a smarter way to maintain
collections or shards of time series data.
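
As an illustration of the daily rotation described above, here is a minimal
sketch that builds the two Collections API requests such a cron job might
send. The base URL, the "col" prefix, the shard count and the retention period
are all assumptions made for the example, not values from my setup, and the
sketch only builds the URLs without sending them.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: local Solr node
PREFIX = "col"                            # assumption: collection name prefix
RETENTION_DAYS = 30                       # assumption: how long logs are kept


def collection_name(day, prefix=PREFIX):
    """Name of the daily collection for `day`, e.g. col_20160927."""
    return f"{prefix}_{day.strftime('%Y%m%d')}"


def rotation_urls(today, num_shards=4, retention_days=RETENTION_DAYS):
    """Return [create_url, delete_url] for the daily rotation.

    CREATE makes today's collection with a single replica per shard;
    DELETE drops the collection that has just left the retention window.
    """
    base = f"{SOLR_BASE}/admin/collections"
    create = {
        "action": "CREATE",
        "name": collection_name(today),
        "numShards": num_shards,
        "replicationFactor": 1,
    }
    expired = today - timedelta(days=retention_days)
    delete = {"action": "DELETE", "name": collection_name(expired)}
    return [f"{base}?{urlencode(create)}", f"{base}?{urlencode(delete)}"]


if __name__ == "__main__":
    for url in rotation_urls(date.today()):
        print(url)
```

Dropping a whole collection avoids the per-document deletes that TTL
expiration needs, which is why this is cheaper for large log data.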
> Minimize queried collections for time series alias
> --------------------------------------------------
>
> Key: SOLR-9562
> URL: https://issues.apache.org/jira/browse/SOLR-9562
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Eungsop Yoo
> Priority: Minor
> Attachments: SOLR-9562-v2.patch, SOLR-9562.patch
>
>
> For indexing time series data (such as large log data), we can regularly
> (hourly, daily, etc.) create a new collection with a write alias, and create
> a read alias covering all of those collections. But every collection in the
> read alias is queried even when we search over a very narrow time window. In
> that case, the matching docs may be stored in only a small portion of the
> collections, so querying all of them is unnecessary.
> I suggest this patch, which lets a read alias minimize the queried
> collections. Three parameters are added to the CREATEALIAS action.
> || Key || Type || Required || Default || Description ||
> | timeField | string | No | | The name of the time field for the time series
> data. It should be of date type. |
> | dateTimeFormat | string | No | | The format of the timestamp used when a
> collection is created. Every collection should have a suffix (starting with
> "_") in this format.
> Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927
> See
> [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html].
> |
> | timeZone | string | No | | The time zone for the dateTimeFormat
> parameter.
> Ex. GMT+9.
> See
> [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html].
> |
> Then, when we query with a filter query like "timeField:\[fromTime TO
> toTime\]", only the collections that hold docs for the given time range will
> be queried.
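
The selection idea in the description can be sketched roughly as follows. This
is not the code from the attached patch: the function names are hypothetical,
it uses Python strptime patterns ("%Y%m%d") in place of the Java
DateTimeFormatter pattern ("yyyyMMdd"), and it assumes each collection covers
exactly one day starting at the timestamp in its suffix.

```python
from datetime import datetime, timedelta, timezone

GMT_PLUS_9 = timezone(timedelta(hours=9))  # the "GMT+9" example time zone


def collection_window(name, fmt="%Y%m%d", tz=GMT_PLUS_9,
                      span=timedelta(days=1)):
    """Return the [start, end) time window covered by a collection.

    The start is parsed from the suffix after the last "_" in the name;
    the one-day span is an assumption of this sketch.
    """
    suffix = name.rsplit("_", 1)[1]
    start = datetime.strptime(suffix, fmt).replace(tzinfo=tz)
    return start, start + span


def collections_for_range(collections, from_time, to_time, **window_args):
    """Keep only the collections whose window overlaps [from_time, to_time]."""
    selected = []
    for name in collections:
        start, end = collection_window(name, **window_args)
        if start <= to_time and from_time < end:
            selected.append(name)
    return selected


if __name__ == "__main__":
    cols = ["col_20160925", "col_20160926", "col_20160927"]
    from_time = datetime(2016, 9, 26, 23, 0, tzinfo=GMT_PLUS_9)
    to_time = datetime(2016, 9, 27, 1, 0, tzinfo=GMT_PLUS_9)
    print(collections_for_range(cols, from_time, to_time))
```

With the two-hour window in the usage example, only col_20160926 and
col_20160927 overlap the range, so col_20160925 would not need to be queried.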
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)