[ https://issues.apache.org/jira/browse/SOLR-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204175#comment-16204175 ]
David Smiley commented on SOLR-11299:
-------------------------------------

Nice article on some of these ideas by [~markrmil...@gmail.com]
https://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/
4 years old but still relevant.

> Time partitioned collections (umbrella issue)
> ---------------------------------------------
>
>                 Key: SOLR-11299
>                 URL: https://issues.apache.org/jira/browse/SOLR-11299
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>
> Solr ought to have the ability to manage large-scale time-series data (think
> logs or sensor data / IOT) itself without a lot of manual/external work. The
> most naive and painless approach today is to create a collection with a high
> numShards with hash routing but this isn't as good as partitioning the
> underlying indexes by time for these reasons:
> * Easy to scale up/down horizontally as data/requirements change. (No need
> to over-provision, use shard splitting, or re-index with different config)
> * Faster queries:
> ** can search fewer shards, reducing overall load
> ** realtime search is more tractable (since most shards are stable --
> good caches)
> ** "recent" shards (that might be queried more) can be allocated to
> faster hardware
> ** aged out data is simply removed, not marked as deleted. Deleted docs
> still have search overhead.
> * Outages of a shard result in a degraded but sometimes a useful system
> nonetheless (compare to random subset missing)
> Ideally you could set this up once and then simply work with a collection
> (potentially actually an alias) in a normal way (search or update), letting
> Solr handle the addition of new partitions, removing of old ones, and
> appropriate routing of requests depending on their nature.
> This issue is an umbrella issue for the particular tasks that will make it
> all happen -- either subtasks or issue linking.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
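
To make the routing idea in the issue description a bit more concrete, here is a minimal sketch of time-based partition selection. It is not Solr code and not the design proposed in this issue. The daily granularity, the "logs" prefix, the naming scheme, and the helper names partition_for, partitions_for_range, and expired are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical naming scheme: one partition (collection) per UTC day,
# e.g. "logs_2017-10-13". Prefix and granularity are assumptions.
PREFIX = "logs"


def partition_for(ts: datetime) -> str:
    """Name of the daily partition that would hold a document stamped ts."""
    return f"{PREFIX}_{ts.astimezone(timezone.utc):%Y-%m-%d}"


def partitions_for_range(start: datetime, end: datetime) -> list:
    """Partitions a query over [start, end] needs to search, oldest first."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(f"{PREFIX}_{day:%Y-%m-%d}")
        day += timedelta(days=1)
    return names


def expired(existing, now: datetime, keep_days: int) -> list:
    """Partitions older than the retention window.

    These would be dropped whole rather than having their documents
    marked as deleted. ISO dates sort lexicographically, so a plain
    string comparison against the cutoff name is enough.
    """
    cutoff = partition_for(now - timedelta(days=keep_days))
    return sorted(name for name in existing if name < cutoff)


if __name__ == "__main__":
    now = datetime(2017, 10, 13, 23, 30, tzinfo=timezone.utc)
    print(partition_for(now))
    print(partitions_for_range(now - timedelta(hours=26), now))
    print(expired(["logs_2017-10-09", "logs_2017-10-12"], now, keep_days=2))
```

An update is routed to exactly one partition, a time-bounded query touches only the partitions overlapping its range, and retention is a matter of removing whole partitions. In a real setup the list of existing partitions would come from the cluster state rather than a hard-coded list.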