[ 
https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550996#comment-15550996
 ] 

Eungsop Yoo commented on SOLR-9562:
-----------------------------------

I see. 

I found some articles related to this issue and read them.
http://stackoverflow.com/questions/32343813/custom-sharding-or-auto-sharding-on-solrcloud
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud?focusedCommentId=61317676#comment-61317676

On manual sharding, the client should do some work related to shard for 
indexing and querying. But It seems this work can be moved to SolrCloud server 
from the client. So we can make new time series router which does the works 
related to sharding for time series data. How do you think about this approach?

> Minimize queried collections for time series alias
> --------------------------------------------------
>
>                 Key: SOLR-9562
>                 URL: https://issues.apache.org/jira/browse/SOLR-9562
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Eungsop Yoo
>            Priority: Minor
>         Attachments: SOLR-9562-v2.patch, SOLR-9562.patch
>
>
> For indexing time series data(such as large log data), we can create a new 
> collection regularly(hourly, daily, etc.) with a write alias and create a 
> read alias for all of those collections. But all of the collections of the 
> read alias are queried even if we search over very narrow time window. In 
> this case, the docs to be queried may be stored in very small portion of 
> collections. So we don't need to do that.
> I suggest this patch for read alias to minimize queried collections. Three 
> parameters for CREATEALIAS action are added.
> || Key || Type || Required || Default || Description ||
> | timeField | string | No | | The time field name for time series data. It 
> should be date type. |
> | dateTimeFormat | string | No | | The format of timestamp for collection 
> creation. Every collection should has a suffix(start with "_") with this 
> format. 
> Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927
> See 
> [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html].
>  |
> | timeZone | string | No | | The time zone information for dateTimeFormat 
> parameter.
> Ex. GMT+9. 
> See 
> [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html].
>  |
> And then when we query with filter query like this "timeField:\[fromTime TO 
> toTime\]", only the collections have the docs for a given time range will be 
> queried.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to