[jira] [Commented] (SOLR-9562) Minimize queried collections for time series alias

Erick Erickson (JIRA) Thu, 06 Oct 2016 09:25:32 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552397#comment-15552397
 ]


Erick Erickson commented on SOLR-9562:
--------------------------------------

We can already add custom properties to replicas, so a mechanism exists for 
putting this stuff in the replica information. Not quite sure if that is the 
right place for them to go though since this is more a shard-level property 
than replica-level.

I'd guess a specialized TimeSeriesUpdateHandler and TimeSeriesQueryComponent 
would keep this kind of support from interfering with the un-specialized case. 
The query component/router could then "know" what to look at to route the 
queries. However, before going there how much effort is really spent on a 
filter query that has no hits? Is it worth the complexity? Hmmm, I guess 
there'd be more work than I thought if there are multiple fq clauses.  What I'm 
wondering is if the right place to put this would be in the query routing or at 
each replica level. I.e. build a component that intelligently evaluated fq 
clauses (based on configuration) and short-circuited the rest of the query 
processing. Hmmmm2. If we ever changed fq processing to handle implicit union 
rather than intersection that'd break wouldn't it? Siiiiggggh....

As far as the auto-scaling stuff goes, there's the autoAddReplicas work, but 
that requires shared filesystems, right?

I'm a little leery of how automatically adding replicas would work in practice. 
Detecting load and adding a replica would entail replicating the entire index 
(which may be huge) precisely at a time when the system was busy. Would 
scheduling it be better? I.e. "at midnight, remove extra replicas of shards 3 
days old and add extra replicas for the most recent shard" or something. 
Operations people can break out in hives when programs start messing with their 
network unpredictably.
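
To make the scheduled variant a bit more concrete, here is a rough sketch of how a nightly job might decide on target replica counts. Everything in it is hypothetical: the class and method names, the idea of keying shards by creation date, and the "base plus extra" replica counts are assumptions for illustration only. It touches no Solr API; a real job would presumably act on the result through the Collections API (ADDREPLICA / DELETEREPLICA).

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical planner, not part of Solr: computes target replica counts once per night.
final class ReplicaSchedule {

    /**
     * Returns target replica counts only for shards that should change:
     * the newest shard gets baseReplicas + extraReplicas, and every other shard
     * that is at least coolAfterDays old drops back to baseReplicas.
     * Shards in between are left out, meaning "leave them alone".
     */
    static Map<String, Integer> nightlyTargets(Map<String, LocalDate> createdOn, LocalDate today,
                                               int baseReplicas, int extraReplicas,
                                               int coolAfterDays) {
        Map<String, Integer> targets = new TreeMap<>();
        if (createdOn.isEmpty()) {
            return targets;
        }
        LocalDate newest = Collections.max(createdOn.values());
        for (Map.Entry<String, LocalDate> e : createdOn.entrySet()) {
            long ageDays = ChronoUnit.DAYS.between(e.getValue(), today);
            if (e.getValue().equals(newest)) {
                targets.put(e.getKey(), baseReplicas + extraReplicas); // hot shard
            } else if (ageDays >= coolAfterDays) {
                targets.put(e.getKey(), baseReplicas);                 // cooled-off shard
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, LocalDate> createdOn = new LinkedHashMap<>();
        createdOn.put("shard_20161001", LocalDate.of(2016, 10, 1));
        createdOn.put("shard_20161004", LocalDate.of(2016, 10, 4));
        createdOn.put("shard_20161006", LocalDate.of(2016, 10, 6));
        System.out.println(nightlyTargets(createdOn, LocalDate.of(2016, 10, 6), 2, 2, 3));
    }
}
```

Because the plan is computed from dates rather than from measured load, the replication traffic happens at a time the operations people chose, which is the whole point of scheduling it.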

> Minimize queried collections for time series alias
> --------------------------------------------------
>
>                 Key: SOLR-9562
>                 URL: https://issues.apache.org/jira/browse/SOLR-9562
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Eungsop Yoo
>            Priority: Minor
>         Attachments: SOLR-9562-v2.patch, SOLR-9562.patch
>
>
> For indexing time series data (such as large log data), we can create a new 
> collection regularly (hourly, daily, etc.) with a write alias and create a 
> read alias for all of those collections. But all of the collections of the 
> read alias are queried even if we search over a very narrow time window. In 
> this case, the docs to be queried may be stored in a very small portion of 
> the collections, so there is no need to query all of them.
> I suggest this patch for the read alias to minimize the queried collections. 
> Three parameters are added to the CREATEALIAS action.
> || Key || Type || Required || Default || Description ||
> | timeField | string | No | | The time field name for time series data. It should be a date type. |
> | dateTimeFormat | string | No | | The timestamp format used in collection names. Every collection should have a suffix (starting with "_") in this format. Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927. See [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]. |
> | timeZone | string | No | | The time zone information for the dateTimeFormat parameter. Ex. GMT+9. See [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]. |
> Then, when we query with a filter query such as "timeField:\[fromTime TO 
> toTime\]", only the collections that hold docs for the given time range will 
> be queried.
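
The pruning described in the issue summary above can be sketched in plain Java. This is only an illustration and not the attached patch: the class and method names (`TimeSeriesAliasFilter`, `select`) are made up, it assumes one collection per day (so a date-only dateTimeFormat such as yyyyMMdd), and it keeps any collection whose suffix cannot be parsed rather than risk dropping results.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from the SOLR-9562 patch.
final class TimeSeriesAliasFilter {

    /**
     * Returns the collections whose daily window [start, start + 1 day)
     * overlaps the inclusive query range [from, to].
     * Assumption: each collection covers exactly one day, named by its "_" suffix.
     */
    static List<String> select(List<String> collections, String dateTimeFormat, String timeZone,
                               ZonedDateTime from, ZonedDateTime to) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(dateTimeFormat);
        ZoneId zone = ZoneId.of(timeZone);
        List<String> selected = new ArrayList<>();
        for (String name : collections) {
            int idx = name.lastIndexOf('_');
            if (idx < 0) {
                selected.add(name); // no suffix: cannot prune safely, so keep it
                continue;
            }
            ZonedDateTime start;
            try {
                start = LocalDate.parse(name.substring(idx + 1), fmt).atStartOfDay(zone);
            } catch (DateTimeParseException e) {
                selected.add(name); // unparseable suffix: keep it as well
                continue;
            }
            ZonedDateTime end = start.plusDays(1);
            // Overlap test on instants: window starts no later than 'to' and ends after 'from'.
            if (!start.isAfter(to) && end.isAfter(from)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("col_20160926", "col_20160927", "col_20160928");
        ZonedDateTime from = ZonedDateTime.parse("2016-09-27T10:00:00+09:00");
        ZonedDateTime to = ZonedDateTime.parse("2016-09-27T12:00:00+09:00");
        System.out.println(select(all, "yyyyMMdd", "GMT+9", from, to));
    }
}
```

An hourly dateTimeFormat would need a different window length and a date-time parse instead of `LocalDate.parse`; the overlap test itself would stay the same.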



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
