[jira] [Commented] (SOLR-9562) Minimize queried collections for time series alias
[ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554054#comment-15554054 ]

Eungsop Yoo commented on SOLR-9562:
-----------------------------------

I will come back to this sooner or later. Would it be better to open a new issue for the *time series router* instead of handling it in this one?

> Minimize queried collections for time series alias
> --------------------------------------------------
>
>                 Key: SOLR-9562
>                 URL: https://issues.apache.org/jira/browse/SOLR-9562
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Eungsop Yoo
>            Priority: Minor
>         Attachments: SOLR-9562-v2.patch, SOLR-9562.patch
>
>
> For indexing time series data (such as large log data), we can create a new
> collection regularly (hourly, daily, etc.) with a write alias and create a
> read alias for all of those collections. But all of the collections of the
> read alias are queried even if we search over a very narrow time window. In
> this case, the docs to be queried may be stored in a very small portion of
> the collections, so we don't need to query all of them.
> I suggest this patch for read aliases to minimize the queried collections.
> Three parameters are added to the CREATEALIAS action.
> || Key || Type || Required || Default || Description ||
> | timeField | string | No | | The name of the time field for the time series data. It should be a date type. |
> | dateTimeFormat | string | No | | The timestamp format used when collections are created. Every collection should have a suffix (starting with "_") in this format. Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927. See [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]. |
> | timeZone | string | No | | The time zone for the dateTimeFormat parameter. Ex. GMT+9. See [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]. |
> Then, when we query with a filter query such as
> "timeField:\[fromTime TO toTime\]", only the collections that have docs in
> the given time range will be queried.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
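The collection pruning described in the issue can be sketched roughly as follows. This is not the code from the attached patch, which is not shown in this thread. It is a minimal Python illustration that assumes daily collections, assumes each collection covers exactly one day starting at the timestamp in its suffix, and uses Python's `%Y%m%d` as a stand-in for the Java pattern `yyyyMMdd`. The function names `collection_window` and `prune_collections` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: GMT+9, matching the timeZone example in the issue description.
GMT_PLUS_9 = timezone(timedelta(hours=9))


def collection_window(name, fmt="%Y%m%d", tz=GMT_PLUS_9, span=timedelta(days=1)):
    """Return the [start, end) time window covered by a collection.

    Assumes the name ends with "_" followed by a timestamp in `fmt`,
    e.g. col_20160927, and that each collection holds `span` of data.
    """
    suffix = name.rsplit("_", 1)[1]
    start = datetime.strptime(suffix, fmt).replace(tzinfo=tz)
    return start, start + span


def prune_collections(collections, from_time, to_time, **window_args):
    """Keep only the collections whose window overlaps [from_time, to_time]."""
    selected = []
    for name in collections:
        start, end = collection_window(name, **window_args)
        # Overlap test between [start, end) and [from_time, to_time].
        if start <= to_time and from_time < end:
            selected.append(name)
    return selected
```

With a read alias over `col_20160925`, `col_20160926` and `col_20160927`, a filter on a 30-minute window inside 27 September would select only `col_20160927`, while a window that crosses midnight would select two collections.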
[jira] [Commented] (SOLR-9562) Minimize queried collections for time series alias
[ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554034#comment-15554034 ]

Eungsop Yoo commented on SOLR-9562:
-----------------------------------

Yes, AutoAddReplicas requires a shared file system. My cluster actually runs on HDFS (replication factor 3) with 1 replica and AutoAddReplicas enabled. The AutoAddReplicas feature works so-so. At first there was a bug that lost docs during failover ([SOLR-9236|https://issues.apache.org/jira/browse/SOLR-9236]), but it is fixed now. There is still a problem, though: failover takes a very long time, and transaction log replay in particular takes longer than I expected. So I now keep the tlogs as small as possible.
[jira] [Commented] (SOLR-9562) Minimize queried collections for time series alias
[ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554031#comment-15554031 ]

Eungsop Yoo commented on SOLR-9562:
-----------------------------------

I run a SolrCloud cluster for log indexing with a daily created collection that has only 1 replica and multiple shards. (The data is stored in HDFS with a replication factor of 3, and the autoAddReplicas feature is enabled.) In my use case query performance doesn't matter, so 1 replica is enough. Indexing performance for the given system resources is best with 1 replica. But in some other use cases your idea would make sense.

Deleting TTL-expired documents ([SOLR-5795|https://issues.apache.org/jira/browse/SOLR-5795]) is not efficient for large log data, so I create and delete a daily collection every morning from my crontab. We need to find a smarter way to maintain collections or shards of time series data.
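The daily crontab rollover mentioned in the comment above could look roughly like the sketch below. It only builds Collections API request URLs (the `CREATE`, `CREATEALIAS` and `DELETE` actions) and sends nothing. The host, the alias names `logs_write` and `logs_read`, the shard count and the 10-day retention are assumptions for illustration, and a real `CREATE` call may need further parameters such as a config set name.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed Solr endpoint; adjust to the real cluster.
BASE = "http://localhost:8983/solr/admin/collections"


def collection_name(day, prefix="col"):
    """Name of the daily collection for `day`, e.g. col_20160927."""
    return f"{prefix}_{day:%Y%m%d}"


def daily_rollover_urls(today, keep_days=10, num_shards=4):
    """Build the Collections API requests for one morning's rollover.

    keep_days and num_shards are illustrative defaults, not values
    taken from the cluster described in the thread.
    """
    newest = collection_name(today)
    expired = collection_name(today - timedelta(days=keep_days))
    live = [collection_name(today - timedelta(days=i)) for i in range(keep_days)]

    def call(**params):
        return BASE + "?" + urlencode(params)

    return [
        # 1. Create today's collection with a single replica.
        call(action="CREATE", name=newest, numShards=num_shards, replicationFactor=1),
        # 2. Point the write alias at the new collection.
        call(action="CREATEALIAS", name="logs_write", collections=newest),
        # 3. Point the read alias at every collection still retained.
        call(action="CREATEALIAS", name="logs_read", collections=",".join(live)),
        # 4. Drop the expired collection last, once no alias refers to it.
        call(action="DELETE", name=expired),
    ]
```

Deleting the expired collection after the read alias has been updated keeps the alias from pointing at a collection that no longer exists.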
[jira] [Commented] (SOLR-9562) Minimize queried collections for time series alias
[ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550996#comment-15550996 ]

Eungsop Yoo commented on SOLR-9562:
-----------------------------------

I see. I found and read some articles related to this issue.

http://stackoverflow.com/questions/32343813/custom-sharding-or-auto-sharding-on-solrcloud
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud?focusedCommentId=61317676#comment-61317676

With manual sharding, the client has to do some shard-related work for indexing and querying. But it seems this work can be moved from the client to the SolrCloud server. So we could make a new time series router that does the sharding-related work for time series data. What do you think about this approach?
[jira] [Updated] (SOLR-9562) Minimize queried collections for time series alias
[ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eungsop Yoo updated SOLR-9562:
------------------------------
    Description:
For indexing time series data (such as large log data), we can create a new collection regularly (hourly, daily, etc.) with a write alias and create a read alias for all of those collections. But all of the collections of the read alias are queried even if we search over a very narrow time window. In this case, the docs to be queried may be stored in a very small portion of the collections, so we don't need to query all of them.
I suggest this patch for read aliases to minimize the queried collections. Three parameters are added to the CREATEALIAS action.
|| Key || Type || Required || Default || Description ||
| timeField | string | No | | The name of the time field for the time series data. It should be a date type. |
| dateTimeFormat | string | No | | The timestamp format used when collections are created. Every collection should have a suffix (starting with "_") in this format. Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927. See [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]. |
| timeZone | string | No | | The time zone for the dateTimeFormat parameter. Ex. GMT+9. See [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]. |
Then, when we query with a filter query such as "timeField:\[fromTime TO toTime\]", only the collections that have docs in the given time range will be queried.

  was:
For indexing time series data (such as large log data), we can create a new collection regularly (hourly, daily, etc.) with a write alias and create a read alias for all of those collections. But all of the collections of the read alias are queried even if we search over a very narrow time window. In this case, the docs to be queried may be stored in a very small portion of the collections, so we don't need to query all of them.
I suggest this patch for read aliases to minimize the queried collections. Three parameters are added to the CREATEALIAS action.
|| Key || Type || Required || Default || Description ||
| timeField | string | No | | The name of the time field for the time series data. It should be a date type. |
| dateTimeFormat | string | No | | The timestamp format used when collections are created. Every collection should have a suffix (starting with "_") in this format. Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927. See [SimpleDateFormat|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html]. |
| timeZone | string | No | | The time zone for the dateTimeFormat parameter. Ex. GMT+9. See [SimpleDateFormat|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html]. |
Then, when we query with a filter query such as "timeField:\[fromTime TO toTime\]", only the collections that have docs in the given time range will be queried.
[jira] [Commented] (SOLR-9562) Minimize queried collections for time series alias
[ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15544140#comment-15544140 ]

Eungsop Yoo commented on SOLR-9562:
-----------------------------------

I backported this patch to my own cluster, which runs Solr 4.10.3-cdh5.4.9. Without this patch, a query over the last 30 minutes against 14 days of collections took over 20 seconds; now it takes only 3 seconds.
[jira] [Updated] (SOLR-9562) Minimize queried collections for time series alias
[ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eungsop Yoo updated SOLR-9562:
------------------------------
    Attachment: SOLR-9562-v2.patch

Some bugs are fixed. SimpleDateFormat is replaced with DateTimeFormatter.
[jira] [Commented] (SOLR-9562) Minimize queried collections for time series alias
[ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525038#comment-15525038 ]

Eungsop Yoo commented on SOLR-9562:
-----------------------------------

{quote}
Thanks for contributing. I'm missing something... why is this metadata on a Collection Alias? What do Collection Aliases logically have to do with this feature? Wouldn't associating with the Shard be better, assuming a design in which there is one Collection & manual sharding?
{quote}
I run a SolrCloud cluster for indexing log data, which amounts to 10 billion docs a day, and the log data is kept for 10 days. So I create a new collection for each one-day time frame and delete the oldest collection every day. Read and write aliases are created for those collections. I use [Banana|https://github.com/lucidworks/banana] to query SolrCloud through the read alias. I think a read alias is the most transparent way to roll collections for a Solr client such as Banana. That is why the metadata is added to the alias.

{quote}
BTW I consider SimpleDateFormat and friends a dead API with the advent of Java 8's new time API: https://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html
{quote}
I see.
[jira] [Updated] (SOLR-9562) Minimize queried collections for time series alias
[ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eungsop Yoo updated SOLR-9562:
------------------------------
    Attachment: SOLR-9562.patch
[jira] [Created] (SOLR-9562) Minimize queried collections for time series alias
Eungsop Yoo created SOLR-9562:
---------------------------------

             Summary: Minimize queried collections for time series alias
                 Key: SOLR-9562
                 URL: https://issues.apache.org/jira/browse/SOLR-9562
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Eungsop Yoo
            Priority: Minor
         Attachments: SOLR-9562.patch


For indexing time series data (such as large log data), we can create a new collection regularly (hourly, daily, etc.) with a write alias and create a read alias for all of those collections. But all of the collections of the read alias are queried even if we search over a very narrow time window. In this case, the docs to be queried may be stored in a very small portion of the collections, so we don't need to query all of them.
I suggest this patch for read aliases to minimize the queried collections. Three parameters are added to the CREATEALIAS action.
|| Key || Type || Required || Default || Description ||
| timeField | string | No | | The name of the time field for the time series data. It should be a date type. |
| dateTimeFormat | string | No | | The timestamp format used when collections are created. Every collection should have a suffix (starting with "_") in this format. Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927. See [SimpleDateFormat|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html]. |
| timeZone | string | No | | The time zone for the dateTimeFormat parameter. Ex. GMT+9. See [SimpleDateFormat|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html]. |
Then, when we query with a filter query such as "timeField:\[fromTime TO toTime\]", only the collections that have docs in the given time range will be queried.
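A CREATEALIAS request that carries the three proposed parameters might be assembled as in the sketch below. The parameters `timeField`, `dateTimeFormat` and `timeZone` exist only in the patch proposed in this issue, so a stock Solr release should not be expected to understand them. The host, the alias name `logs_read`, the field name `timestamp` and the helper name are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed Solr endpoint; adjust to the real cluster.
BASE = "http://localhost:8983/solr/admin/collections"


def time_series_alias_url(alias, collections, time_field, date_time_format, time_zone):
    """Build a CREATEALIAS URL that includes the parameters proposed in SOLR-9562.

    timeField, dateTimeFormat and timeZone come from the issue description;
    they are part of the proposed patch, not of a released Solr API.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
        "timeField": time_field,
        "dateTimeFormat": date_time_format,
        "timeZone": time_zone,
    }
    # urlencode percent-encodes reserved characters such as "," and "+".
    return BASE + "?" + urlencode(params)


url = time_series_alias_url(
    "logs_read",
    ["col_20160926", "col_20160927"],
    time_field="timestamp",
    date_time_format="yyyyMMdd",
    time_zone="GMT+9",
)
print(url)
```

Note that the `+` in `GMT+9` has to be percent-encoded, because an unencoded `+` in a query string is read as a space.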
[jira] [Updated] (SOLR-9236) AutoAddReplicas feature with one replica loses some documents not committed during failover
[ https://issues.apache.org/jira/browse/SOLR-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eungsop Yoo updated SOLR-9236:
------------------------------
    Description:
I need to index a huge amount of logs, so I decided to use the AutoAddReplicas feature with only one replica.
When using AutoAddReplicas with one replica, some benefits are expected:
- no redundant data files for replicas
-- saving disk usage
- best indexing performance
I expected Solr to fail over just like HBase.
The feature worked almost as expected, except for some missing documents during failover.
I found two reasons for the missing documents.
1. The leader replica does not replay any transaction logs. But when there is only one replica, it must be the leader.
So I made the leader replica replay the transaction logs when using AutoAddReplicas with one replica.
But this fix alone did not resolve the problem.
2. As failovers occurred, the transaction log directory became more deeply nested, like this: tlog/tlog/tlog/...
The transaction log could not be replayed because the transaction log directory changed during failover.
So I made the transaction log directory stay unchanged during failover.
After these fixes, AutoAddReplicas with one replica fails over well without losing any documents.

  was:
I need to index a huge amount of logs, so I decided to use the AutoAddReplicas feature with only one replica.
When using AutoAddReplicas with one replica, some benefits are expected:
- no redundant data files for replicas
-- saving disk usage
- best indexing performance
I expected Solr to fail over just like HBase.
The feature worked almost as expected, except for some missing documents during failover.
I found two regions for the missing documents.
1. The leader replica does not replay any transaction logs. But when there is only one replica, it must be the leader.
So I made the leader replica replay the transaction logs when using AutoAddReplicas with one replica.
But this fix alone did not resolve the problem.
2. As failovers occurred, the transaction log directory became more deeply nested, like this: tlog/tlog/tlog/...
The transaction log could not be replayed because the transaction log directory changed during failover.
So I made the transaction log directory stay unchanged during failover.
After these fixes, AutoAddReplicas with one replica fails over well without losing any documents.

> AutoAddReplicas feature with one replica loses some documents not committed during failover
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9236
>                 URL: https://issues.apache.org/jira/browse/SOLR-9236
>             Project: Solr
>          Issue Type: Bug
>          Components: hdfs, SolrCloud
>            Reporter: Eungsop Yoo
>            Assignee: Mark Miller
>            Priority: Minor
>         Attachments: SOLR-9236.patch, SOLR-9236.patch
[jira] [Commented] (SOLR-9236) AutoAddReplicas feature with one replica loses some documents not committed during failover
[ https://issues.apache.org/jira/browse/SOLR-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358090#comment-15358090 ]

Eungsop Yoo commented on SOLR-9236:
-----------------------------------

LGTM
[jira] [Updated] (SOLR-9236) AutoAddReplicas feature with one replica loses some documents not committed during failover
[ https://issues.apache.org/jira/browse/SOLR-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eungsop Yoo updated SOLR-9236:
-----------------------------
Attachment: SOLR-9236.patch

> AutoAddReplicas feature with one replica loses some documents not committed during failover
> -------------------------------------------------------------------------------------------
>
> Key: SOLR-9236
> URL: https://issues.apache.org/jira/browse/SOLR-9236
> Project: Solr
> Issue Type: Bug
> Components: hdfs, SolrCloud
> Reporter: Eungsop Yoo
> Priority: Minor
> Attachments: SOLR-9236.patch
>
>
> I need to index a huge amount of logs, so I decided to use the AutoAddReplicas feature with only one replica.
> Using AutoAddReplicas with one replica has some expected benefits:
> - no redundant data files for replicas
> -- saving disk usage
> - best indexing performance
> I expected Solr to fail over just like HBase.
> The feature worked almost as expected, except that some documents were missing after failover.
> I found two reasons for the missing documents.
> 1. The leader replica does not replay any transaction logs. But when there is only one replica, that replica must be the leader.
> So I made the leader replica replay the transaction logs when using AutoAddReplicas with one replica.
> But this fix alone did not resolve the problem.
> 2. When failover occurred, the transaction log directory became nested more deeply, like this: tlog/tlog/tlog/...
> The transaction log could not be replayed because the transaction log directory changed during failover.
> So I made the transaction log directory stay unchanged during failover.
> After these fixes, AutoAddReplicas with one replica fails over well without losing any documents.
[jira] [Created] (SOLR-9236) AutoAddReplicas feature with one replica loses some documents not committed during failover
Eungsop Yoo created SOLR-9236:
------------------------------

Summary: AutoAddReplicas feature with one replica loses some documents not committed during failover
Key: SOLR-9236
URL: https://issues.apache.org/jira/browse/SOLR-9236
Project: Solr
Issue Type: Bug
Components: hdfs, SolrCloud
Reporter: Eungsop Yoo
Priority: Minor

I need to index a huge amount of logs, so I decided to use the AutoAddReplicas feature with only one replica.
Using AutoAddReplicas with one replica has some expected benefits:
- no redundant data files for replicas
-- saving disk usage
- best indexing performance
I expected Solr to fail over just like HBase.
The feature worked almost as expected, except that some documents were missing after failover.
I found two reasons for the missing documents.
1. The leader replica does not replay any transaction logs. But when there is only one replica, that replica must be the leader.
So I made the leader replica replay the transaction logs when using AutoAddReplicas with one replica.
But this fix alone did not resolve the problem.
2. When failover occurred, the transaction log directory became nested more deeply, like this: tlog/tlog/tlog/...
The transaction log could not be replayed because the transaction log directory changed during failover.
So I made the transaction log directory stay unchanged during failover.
After these fixes, AutoAddReplicas with one replica fails over well without losing any documents.
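
Editor's illustration of the second cause in the report (tlog/tlog/tlog/...). This is a minimal sketch and not Solr's code or the attached patch: the class and method names are hypothetical, and it assumes the nesting comes from a step that appends "tlog" to a directory that was already resolved on a previous failover. It contrasts that with a resolution that leaves an already-resolved directory unchanged.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; none of these names come from Solr.
class TlogDirSketch {
    static final String TLOG_DIR_NAME = "tlog";

    // Assumed buggy behaviour: always appends "tlog", even when the
    // directory handed in was already resolved on an earlier failover.
    static Path naiveResolve(Path dir) {
        return dir.resolve(TLOG_DIR_NAME);
    }

    // Idempotent variant: appends "tlog" only when the path does not
    // already end in it, so repeated failovers keep the same directory.
    static Path stableResolve(Path dir) {
        Path last = dir.getFileName();
        if (last != null && last.toString().equals(TLOG_DIR_NAME)) {
            return dir;
        }
        return dir.resolve(TLOG_DIR_NAME);
    }

    public static void main(String[] args) {
        Path naive = Paths.get("data");
        Path stable = Paths.get("data");
        // Simulate three failovers, each re-resolving the previous result.
        for (int failover = 1; failover <= 3; failover++) {
            naive = naiveResolve(naive);
            stable = stableResolve(stable);
            System.out.println("failover " + failover
                    + ": naive=" + naive + " stable=" + stable);
        }
    }
}
```

With the naive variant the directory gains one more "tlog" component per simulated failover, so logs written under the earlier path are no longer where replay looks for them. The stable variant keeps the path fixed, which is the property the report describes as "the transaction log directory stay unchanged during failover".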