Import Handler using shell scripts
Is it possible to call the DataImport handler from a shell script? I have not found any documentation on this. Any pointers? -- Best, Vijay
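Since no documentation pointer turned up in the thread, here is a minimal sketch of the approach Shawn describes in the next thread: DIH is just an HTTP request handler, so any shell script can trigger it with curl. The host, port and core name "mycore" are assumptions, and the real curl calls are left commented out so the sketch is safe to run as-is.

```shell
# DIH is exposed over plain HTTP; "mycore" and localhost:8983 are
# placeholders for your deployment.
DIH_URL="http://localhost:8983/solr/mycore/dataimport"

# Build the import request; clean=false avoids wiping the index first.
REQUEST="${DIH_URL}?command=full-import&clean=false&commit=true"
echo "$REQUEST"

# In a real script you would run:
# curl -s "$REQUEST"
# curl -s "${DIH_URL}?command=status"   # poll until the import is idle
```

The status command returns the same JSON/XML the admin UI shows, so a script can poll it in a loop before starting the next import.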
Re: Clean checkbox on DIH
Even though it's a minority, I think it should be disabled by default. That's the cleaner approach. Accidentally running it in prod without unchecking 'clean' can be disastrous.

On Apr 28, 2017, at 8:01 AM, Mahmoud Almokadem wrote:
> Thanks Shawn,
>
> We have already been using shell scripts to do our imports, using the full-import command to do our delta import, and everything has worked well for several years. But the default in the UI is a full import with clean and commit. If I press the Execute button by mistake, the whole index is cleaned without any notification.
>
> Thanks,
> Mahmoud
>
> On Fri, Apr 28, 2017 at 2:51 PM, Shawn Heisey wrote:
>> On 4/28/2017 5:11 AM, Mahmoud Almokadem wrote:
>>> I'd like to request to uncheck the "Clean" checkbox by default on the DIH page, because it cleaned the whole index (about 2TB) when I clicked the Execute button by mistake. Or show a confirmation message that the whole index will be cleaned!!
>>
>> When somebody is doing a full-import, clean is what almost all users are going to want. If you're wanting to do full-import without cleaning, then you are in the minority. It is perhaps a fairly large minority, but still not the majority.
>>
>> Also, once you move into production, you should not be using the admin UI for this. You should be calling the DIH handler directly with HTTP from another source, which might be a shell script using curl, or a full-blown program in another language.
>>
>> Thanks,
>> Shawn
Re: DIH Speed
Let me clarify - DIH is running on Solr 6.5.0 and calls a different Solr instance running 4.5.0, which has 150M documents. If we try to fetch them using DIH into the new Solr cluster, wouldn't it result in deep paging on Solr 4.5.0 and drastically slow down indexing on Solr 6.5.0?

On Thu, Apr 27, 2017 at 4:40 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> I'm unclear why DIH and deep paging are mixed. DIH is indexing and deep paging is querying.
>
> If it's querying, consider cursorMark or the /export handler.
> https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> If it's DIH, please explain a bit more.
>
> Best,
> Erick
>
> On Thu, Apr 27, 2017 at 3:37 PM, Vijay Kokatnur <kokatnur.vi...@gmail.com> wrote:
>> We have a new Solr 6.5.0 cluster, for which data is being imported via DIH from another Solr cluster running version 4.5.0.
>>
>> This question comes back to deep paging: we have observed that after 30 minutes of querying, the rate of processing goes down from 400/s to about 120/s. At that point it has processed only 500K of 1.3M docs. Is there any way to speed this up?
>>
>> And I can't go back to the source for the data.
>>
>> -- Best, Vijay
Re: DIH Speed
Hey Shawn,

Unfortunately, we can't upgrade the existing cluster. That was my first approach as well. Yes, SolrEntityProcessor is used, so it results in deep paging after a certain number of rows. I have observed that if data is imported only 4 hours at a time, instead of over a larger period, the import process is much faster. Since we are importing several months of data, it would be nice if the data import could be scripted in bash or python. But I can't find any documentation on it. Any pointers?

--
From: Shawn Heisey
Sent: Thursday, April 27, 2017 5:07 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH Speed

On 4/27/2017 5:40 PM, Erick Erickson wrote:
> I'm unclear why DIH and deep paging are mixed. DIH is indexing and deep paging is querying.
>
> If it's querying, consider cursorMark or the /export handler. https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

Very likely they are using SolrEntityProcessor.

Vijay, if the source server were running 4.7 (or later) instead of 4.5, you could enable cursorMark for SolrEntityProcessor in 6.5.0 as Erick mentioned, and pagination would be immensely more efficient. Unfortunately, 4.5 doesn't support cursorMark.

https://issues.apache.org/jira/browse/SOLR-9668

Any chance you could upgrade the source server to a later 4.x version?

Thanks,
Shawn
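Scripting the windowed import in plain bash is doable, since each import is just an HTTP call. A sketch, assuming the DIH config reads the window bounds as request parameters (e.g. via `${dataimporter.request.start}` in the entity query); the host, core name and the `start`/`end` parameter names are made up for illustration.

```shell
# Walk a date range in 4-hour windows so each SolrEntityProcessor query
# stays shallow. Host, core and the start/end parameter names are
# assumptions; the DIH config must reference them, e.g. as
# ${dataimporter.request.start} in the entity definition.
DIH_URL="http://localhost:8983/solr/mycore/dataimport"
WINDOW=$((4 * 3600))              # 4 hours in seconds
START=1398643200                  # beginning of the range (epoch seconds)
END=$((START + WINDOW))

REQUEST="${DIH_URL}?command=full-import&clean=false&commit=true&start=${START}&end=${END}"
echo "$REQUEST"
# curl -s "$REQUEST"
# ...poll "${DIH_URL}?command=status" until idle, then advance the window:
START=$END
```

Wrapped in a loop, this processes one 4-hour slice per iteration, matching the observation above that short windows import much faster.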
DIH Speed
We have a new Solr 6.5.0 cluster, for which data is being imported via DIH from another Solr cluster running version 4.5.0. This question comes back to deep paging: we have observed that after 30 minutes of querying, the rate of processing goes down from 400/s to about 120/s. At that point it has processed only 500K of 1.3M docs. Is there any way to speed this up? And I can't go back to the source for the data.
Split Shard not working
We recently upgraded a 4.5 index to 6.5 using IndexUpgrader. The index size is around 600 GB on disk. When we try to split it using SPLITSHARD, it creates two new sub-shards on the node and eventually crashes before completing the split. After restart, the original shard is around 100 GB and each sub-shard is around 2 GB. They are no doubt not fully constructed. We tried different heap settings as well - 15, 20 and 30 GB - but it always crashed. RAM is about 256GB. What's going on here? Has anyone faced this situation before?
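One thing worth trying (my suggestion, not something from this thread): run the split asynchronously via the Collections API, so a long-running split on a 600 GB shard is not tied to a single HTTP request that can time out, and its progress can be polled. Collection and shard names below are placeholders.

```shell
# Async SPLITSHARD sketch; "mycoll", "shard1" and the host are placeholders.
API="http://localhost:8983/solr/admin/collections"
SPLIT="${API}?action=SPLITSHARD&collection=mycoll&shard=shard1&async=split-1"
STATUS="${API}?action=REQUESTSTATUS&requestid=split-1"
echo "$SPLIT"
# curl -s "$SPLIT"        # returns immediately; the split runs in the background
# curl -s "$STATUS"       # poll until the state is "completed" (or "failed")
```

If the node is actually crashing (OOM or similar), the async call won't prevent that, but the REQUESTSTATUS output and the Overseer logs should at least show where the split died.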
Solr Memory Usage
I am observing some weird behavior in how Solr is using memory. We are running both Solr and ZooKeeper on the same node. We tested memory settings on a SolrCloud setup of 1 shard with a 146GB index, and a 2-shard Solr setup with a 44GB index. Both are running on similar beefy machines.

After running the setup for 3-4 days, I see that a lot of memory is inactive on all the nodes:

99052952 total memory
98606256 used memory
19143796 active memory
75063504 inactive memory

And inactive memory is never reclaimed by the OS. When the total memory size is reached, latency and disk IO shoot up. We observed this behavior in both the SolrCloud setup with 1 shard and the Solr setup with 2 shards.

For the SolrCloud setup, we are running a cron job with the following command to clear out the inactive memory, and it is working as expected. Even though the index size of the cloud setup is 146GB, used memory always stays below 55GB. Our response times are better and no errors/exceptions are thrown. (This command causes issues in the 2-shard setup.)

echo 3 > /proc/sys/vm/drop_caches

We have disabled the query, doc and Solr caches in our setup. ZooKeeper is using around 10GB of memory and we are not running any other process on this system. Has anyone faced this issue before?
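For completeness, a sketch of the cron entry described above; the hourly schedule is an assumption (the post doesn't say how often the job runs), and the write to drop_caches must happen as root.

```shell
# Build the crontab line for the page-cache drop described above.
# sync first so dirty pages are flushed; 3 = drop pagecache + dentries/inodes.
# The hourly schedule is an assumption; the job needs root.
CRON_LINE='0 * * * * sync && echo 3 > /proc/sys/vm/drop_caches'
echo "$CRON_LINE"
# Install it with e.g.:  (crontab -l; echo "$CRON_LINE") | crontab -
```

Note this drops the OS page cache, which Solr relies on for index reads, so it trades the latency spike described above for cold-cache reads afterwards.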
Re: SpanQuery with Boolean Queries
Pretty neat. Thanks!

On Fri, Apr 25, 2014 at 2:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

I am not sure how OR clauses are executed. But after re-reading your mail, I think you can use a SpanOrQuery (for your q1) in your custom query parser plugin.

val q2 = new SpanOrQuery(
  new SpanTermQuery(new Term("BookingRecordId", "ID_1")),
  new SpanTermQuery(new Term("BookingRecordId", "ID_N"))
)

On Friday, April 25, 2014 3:22 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Thanks Ahmet. It worked! Does Solr execute these nested queries in parallel?

On Thu, Apr 24, 2014 at 12:53 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

Maybe you can use the _query_ hook?

_query_:"{!span}BookingRecordId:234 OrderLineType:11" OR _query_:"{!span}OrderLineType:13 + BookingRecordId:ID_N"

Ahmet

On Thursday, April 24, 2014 9:34 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Hi,

I have defined a SpanQuery for proximity search like:

val q1 = new SpanTermQuery(new Term("BookingRecordId", "234"))
val q2 = new SpanTermQuery(new Term("OrderLineType", "11"))
val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
val sp = Array[SpanQuery](q1, q2m)
val q = new SpanNearQuery(sp, -1, false)

Query: fq={!span}BookingRecordId:234 +OrderLineType:11

However, I need to look up multiple BookingRecordIds with an OR:

fq={!span}OrderLineType:13 + (BookingRecordId:ID_1 OR ... OR BookingRecordId:ID_N)

I can't specify multiple {!span} clauses in the same query like:

{!span}OrderLineType:13 + BookingRecordId:ID_1 OR ... OR {!span}OrderLineType:13 + BookingRecordId:ID_N

Is there any recommended way to achieve this?

Thanks,
Vijay
Issue with SpanQuery
I have been working on SpanQuery for some time now to look up multivalued fields and found one more issue. A document has the following lookup fields, among others:

BookingRecordId: [100268421, 190131, 8263325]
OrderLineType: [13, 1, 11]

Here is the query I construct:

val q1 = new SpanTermQuery(new Term("BookingRecordId", "100268421"))
val q2 = new SpanTermQuery(new Term("OrderLineType", "13"))
val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
val sp = Array[SpanQuery](q1, q2m)
val q = new SpanNearQuery(sp, -1, false)

A query to find the element at the first index position works fine:

{!span} BookingRecordId:100268421 +OrderLineType:13

but a query to find the element at the third index position doesn't return any result:

{!span} BookingRecordId:8263325 +OrderLineType:11

If I increase the slop to 4 then it returns the correct result. But it also matches BookingRecordId:100268421 with OrderLineType:11, which is incorrect. I thought SpanQuery worked for any multiValued field size. Any ideas how I can fix this?

Thanks,
-Vijay
Re: Issue with SpanQuery
Hey Ahmet,

Here is the field def:

<field name="BookingRecordId" type="token" indexed="true" stored="true" multiValued="true" omitTermFreqAndPositions="false"/>

<fieldType name="token" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

On Mon, Apr 28, 2014 at 3:19 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Can you paste your field definitions for BookingRecordId and OrderLineType? It could be something related to positionIncrementGap.

Ahmet

On Tuesday, April 29, 2014 12:58 AM, Ethan eh198...@gmail.com wrote:

Facing the same problem!! I have noticed it works fine as long as you're looking up the first index position. Has anyone faced a similar problem before?

On Mon, Apr 28, 2014 at 12:22 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

I have been working on SpanQuery for some time now to look up multivalued fields and found one more issue. A document has the following lookup fields, among others:

BookingRecordId: [100268421, 190131, 8263325]
OrderLineType: [13, 1, 11]

Here is the query I construct:

val q1 = new SpanTermQuery(new Term("BookingRecordId", "100268421"))
val q2 = new SpanTermQuery(new Term("OrderLineType", "13"))
val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
val sp = Array[SpanQuery](q1, q2m)
val q = new SpanNearQuery(sp, -1, false)

A query to find the element at the first index position works fine:

{!span} BookingRecordId:100268421 +OrderLineType:13

but a query to find the element at the third index position doesn't return any result:

{!span} BookingRecordId:8263325 +OrderLineType:11

If I increase the slop to 4 then it returns the correct result. But it also matches BookingRecordId:100268421 with OrderLineType:11, which is incorrect. I thought SpanQuery worked for any multiValued field size. Any ideas how I can fix this?

Thanks,
-Vijay
Re: Issue with SpanQuery
Thanks Ahmet, I'll give that a try. Do I need to re-index to add/update positionIncrementGap?

On Mon, Apr 28, 2014 at 3:31 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

I would add positionIncrementGap to the fieldType definition and experiment with different values: 0, 1 and 100.

<fieldType name="token" class="solr.TextField" omitNorms="true" positionIncrementGap="1">

Do the same with OrderLineType too.

On Tuesday, April 29, 2014 1:25 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Hey Ahmet,

Here is the field def:

<field name="BookingRecordId" type="token" indexed="true" stored="true" multiValued="true" omitTermFreqAndPositions="false"/>

<fieldType name="token" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

On Mon, Apr 28, 2014 at 3:19 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Can you paste your field definitions for BookingRecordId and OrderLineType? It could be something related to positionIncrementGap.

Ahmet

On Tuesday, April 29, 2014 12:58 AM, Ethan eh198...@gmail.com wrote:

Facing the same problem!! I have noticed it works fine as long as you're looking up the first index position. Has anyone faced a similar problem before?

On Mon, Apr 28, 2014 at 12:22 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

I have been working on SpanQuery for some time now to look up multivalued fields and found one more issue. A document has the following lookup fields, among others:

BookingRecordId: [100268421, 190131, 8263325]
OrderLineType: [13, 1, 11]

Here is the query I construct:

val q1 = new SpanTermQuery(new Term("BookingRecordId", "100268421"))
val q2 = new SpanTermQuery(new Term("OrderLineType", "13"))
val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
val sp = Array[SpanQuery](q1, q2m)
val q = new SpanNearQuery(sp, -1, false)

A query to find the element at the first index position works fine:

{!span} BookingRecordId:100268421 +OrderLineType:13

but a query to find the element at the third index position doesn't return any result:

{!span} BookingRecordId:8263325 +OrderLineType:11

If I increase the slop to 4 then it returns the correct result. But it also matches BookingRecordId:100268421 with OrderLineType:11, which is incorrect. I thought SpanQuery worked for any multiValued field size. Any ideas how I can fix this?

Thanks,
-Vijay
Re: Issue with SpanQuery
Adding positionIncrementGap="1" to the fields worked for me. I didn't re-index all the existing docs, so it works only for future documents.

On Mon, Apr 28, 2014 at 3:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

It is an index-time setting, so yes, a Solr restart and re-indexing are required. A small test case would be handy.

On Tuesday, April 29, 2014 1:35 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Thanks Ahmet, I'll give that a try. Do I need to re-index to add/update positionIncrementGap?

On Mon, Apr 28, 2014 at 3:31 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

I would add positionIncrementGap to the fieldType definition and experiment with different values: 0, 1 and 100.

<fieldType name="token" class="solr.TextField" omitNorms="true" positionIncrementGap="1">

Do the same with OrderLineType too.

On Tuesday, April 29, 2014 1:25 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Hey Ahmet,

Here is the field def:

<field name="BookingRecordId" type="token" indexed="true" stored="true" multiValued="true" omitTermFreqAndPositions="false"/>

<fieldType name="token" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

On Mon, Apr 28, 2014 at 3:19 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Can you paste your field definitions for BookingRecordId and OrderLineType? It could be something related to positionIncrementGap.

Ahmet

On Tuesday, April 29, 2014 12:58 AM, Ethan eh198...@gmail.com wrote:

Facing the same problem!! I have noticed it works fine as long as you're looking up the first index position. Has anyone faced a similar problem before?

On Mon, Apr 28, 2014 at 12:22 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

I have been working on SpanQuery for some time now to look up multivalued fields and found one more issue. A document has the following lookup fields, among others:

BookingRecordId: [100268421, 190131, 8263325]
OrderLineType: [13, 1, 11]

Here is the query I construct:

val q1 = new SpanTermQuery(new Term("BookingRecordId", "100268421"))
val q2 = new SpanTermQuery(new Term("OrderLineType", "13"))
val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
val sp = Array[SpanQuery](q1, q2m)
val q = new SpanNearQuery(sp, -1, false)

A query to find the element at the first index position works fine:

{!span} BookingRecordId:100268421 +OrderLineType:13

but a query to find the element at the third index position doesn't return any result:

{!span} BookingRecordId:8263325 +OrderLineType:11

If I increase the slop to 4 then it returns the correct result. But it also matches BookingRecordId:100268421 with OrderLineType:11, which is incorrect. I thought SpanQuery worked for any multiValued field size. Any ideas how I can fix this?

Thanks,
-Vijay
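For reference, a sketch of the schema change that resolved this thread: the fieldType from earlier in the thread with positionIncrementGap added (1 is the value reported to work here). Both fields compared by the FieldMaskingSpanQuery need the same gap so their positions line up.

```xml
<!-- positionIncrementGap inserts a position gap between successive values
     of a multiValued field, so a tight SpanNearQuery cannot accidentally
     match terms drawn from two different values. Index-time setting:
     requires re-indexing to affect existing documents. -->
<fieldType name="token" class="solr.TextField" omitNorms="true" positionIncrementGap="1">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```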
SpanQuery with Boolean Queries
Hi,

I have defined a SpanQuery for proximity search like:

val q1 = new SpanTermQuery(new Term("BookingRecordId", "234"))
val q2 = new SpanTermQuery(new Term("OrderLineType", "11"))
val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
val sp = Array[SpanQuery](q1, q2m)
val q = new SpanNearQuery(sp, -1, false)

Query: fq={!span}BookingRecordId:234 +OrderLineType:11

However, I need to look up multiple BookingRecordIds with an OR:

fq={!span}OrderLineType:13 + (BookingRecordId:ID_1 OR ... OR BookingRecordId:ID_N)

I can't specify multiple {!span} clauses in the same query like:

{!span}OrderLineType:13 + BookingRecordId:ID_1 OR ... OR {!span}OrderLineType:13 + BookingRecordId:ID_N

Is there any recommended way to achieve this?

Thanks,
Vijay
Re: SpanQuery with Boolean Queries
Thanks Ahmet. It worked! Does Solr execute these nested queries in parallel?

On Thu, Apr 24, 2014 at 12:53 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

Maybe you can use the _query_ hook?

_query_:"{!span}BookingRecordId:234 OrderLineType:11" OR _query_:"{!span}OrderLineType:13 + BookingRecordId:ID_N"

Ahmet

On Thursday, April 24, 2014 9:34 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Hi,

I have defined a SpanQuery for proximity search like:

val q1 = new SpanTermQuery(new Term("BookingRecordId", "234"))
val q2 = new SpanTermQuery(new Term("OrderLineType", "11"))
val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
val sp = Array[SpanQuery](q1, q2m)
val q = new SpanNearQuery(sp, -1, false)

Query: fq={!span}BookingRecordId:234 +OrderLineType:11

However, I need to look up multiple BookingRecordIds with an OR:

fq={!span}OrderLineType:13 + (BookingRecordId:ID_1 OR ... OR BookingRecordId:ID_N)

I can't specify multiple {!span} clauses in the same query like:

{!span}OrderLineType:13 + BookingRecordId:ID_1 OR ... OR {!span}OrderLineType:13 + BookingRecordId:ID_N

Is there any recommended way to achieve this?

Thanks,
Vijay
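For anyone finding this thread later, the combined filter can be assembled in a shell script like this. A sketch: `{!span}` is the custom parser discussed in this thread (not a stock Solr parser), the collection name and IDs are placeholders, and each `{!span}` clause has to sit inside its own quoted `_query_` to parse correctly.

```shell
# Two {!span} clauses OR'ed together via the _query_ hook.
FQ='_query_:"{!span}OrderLineType:13 + BookingRecordId:ID_1" OR _query_:"{!span}OrderLineType:13 + BookingRecordId:ID_2"'
echo "$FQ"
# curl -s "http://localhost:8983/solr/mycore/select" \
#   --data-urlencode "q=*:*" --data-urlencode "fq=$FQ"
```

Using --data-urlencode keeps the braces, quotes and spaces in the filter from needing manual escaping.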
Re: Searching multivalue fields.
Since span queries are the only way to solve the problem, I won't mind re-indexing. It's just that I have never done it before. We've got 80GB of indexed data replicated on two nodes in a cluster. Is there a preferred way to go about re-indexing?

On Tue, Apr 8, 2014 at 12:17 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Changing the value of omitTermFreqAndPositions requires re-indexing, unfortunately. And I remembered that you don't want to reindex. It looks like we are out of options.

Ahmet

On Tuesday, April 8, 2014 12:45 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Yes I did restart Solr, but did not re-index. Is that necessary? We've got 80GB of indexed data; is there a preferred way of doing it without impacting performance?

On Sat, Apr 5, 2014 at 9:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Did you restart Solr and re-index after the schema change?

On Saturday, April 5, 2014 2:39 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

I had already tested with omitTermFreqAndPositions=false. I still got the same error. Is there something that I am overlooking?

On Fri, Apr 4, 2014 at 2:45 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

Add the omitTermFreqAndPositions="false" attribute to the fieldType definitions:

<fieldType name="string" class="solr.StrField" omitTermFreqAndPositions="false" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" omitTermFreqAndPositions="false" precisionStep="0" positionIncrementGap="0"/>

You don't need termVectors for this.

1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields.

And please reply to the solr-user mailing list, so others can use the thread later on.

Ahmet

On Saturday, April 5, 2014 12:18 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Hey Ahmet,

Sorry it took some time to test this. But the schema definition seems to conflict with SpanQuery. I get the following error when I use spans:

field OrderLineType was indexed without position data; cannot run SpanTermQuery (term=11)

I changed the field definition in the schema but can't find the right attribute to set. My last attempt was with the following definition:

<field name="OrderLineType" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>

Any ideas what I am doing wrong?

Thanks,
-Vijay

On Wed, Mar 26, 2014 at 1:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

After reading the documentation, it seems the following query is what you are after. It will return OrderId:345 without matching OrderId:123:

SpanQuery q1 = new SpanTermQuery(new Term("BookingRecordId", "234"));
SpanQuery q2 = new SpanTermQuery(new Term("OrderLineType", "11"));
SpanQuery q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId");
Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);

Ahmet

On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

I personally don't understand joins very well. Just a guess: maybe FieldMaskingSpanQuery could be used? http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html

Ahmet

On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Hi,

I am bumping this thread one last time to see if anyone has a solution.

In its current state, our application stores child items as multivalue fields. Consider some orders, for example:

{ OrderId: 123, BookingRecordId: [145, 987, 234], OrderLineType: [11, 12, 13], ... }
{ OrderId: 345, BookingRecordId: [945, 882, 234], OrderLineType: [1, 12, 11], ... }
{ OrderId: 678, BookingRecordId: [444], OrderLineType: [11], ... }

If you look up an order with BookingRecordId:234 AND OrderLineType:11, you will get two orders, with OrderId 123 and 345, which is correct: both orders have arrays that satisfy this condition. However, for OrderId 123 the value at the third index of the OrderLineType array is 13, not 11 (11 at that position belongs to OrderId 345), so OrderId 123 should be excluded. This is what I am trying to achieve.

I got some suggestions from a solr-user to use FieldCollapsing, Join, Block-join or string concatenation. None of these approaches can be used without re-indexing the schema. Has anyone found a non-invasive solution for this?

Thanks,
-Vijay
Re: Searching multivalue fields.
Yes I did restart Solr, but did not re-index. Is that necessary? We've got 80GB of indexed data; is there a preferred way of doing it without impacting performance?

On Sat, Apr 5, 2014 at 9:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Did you restart Solr and re-index after the schema change?

On Saturday, April 5, 2014 2:39 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

I had already tested with omitTermFreqAndPositions=false. I still got the same error. Is there something that I am overlooking?

On Fri, Apr 4, 2014 at 2:45 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

Add the omitTermFreqAndPositions="false" attribute to the fieldType definitions:

<fieldType name="string" class="solr.StrField" omitTermFreqAndPositions="false" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" omitTermFreqAndPositions="false" precisionStep="0" positionIncrementGap="0"/>

You don't need termVectors for this.

1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields.

And please reply to the solr-user mailing list, so others can use the thread later on.

Ahmet

On Saturday, April 5, 2014 12:18 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Hey Ahmet,

Sorry it took some time to test this. But the schema definition seems to conflict with SpanQuery. I get the following error when I use spans:

field OrderLineType was indexed without position data; cannot run SpanTermQuery (term=11)

I changed the field definition in the schema but can't find the right attribute to set. My last attempt was with the following definition:

<field name="OrderLineType" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>

Any ideas what I am doing wrong?

Thanks,
-Vijay

On Wed, Mar 26, 2014 at 1:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

After reading the documentation, it seems the following query is what you are after. It will return OrderId:345 without matching OrderId:123:

SpanQuery q1 = new SpanTermQuery(new Term("BookingRecordId", "234"));
SpanQuery q2 = new SpanTermQuery(new Term("OrderLineType", "11"));
SpanQuery q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId");
Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);

Ahmet

On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

I personally don't understand joins very well. Just a guess: maybe FieldMaskingSpanQuery could be used? http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html

Ahmet

On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Hi,

I am bumping this thread one last time to see if anyone has a solution.

In its current state, our application stores child items as multivalue fields. Consider some orders, for example:

{ OrderId: 123, BookingRecordId: [145, 987, 234], OrderLineType: [11, 12, 13], ... }
{ OrderId: 345, BookingRecordId: [945, 882, 234], OrderLineType: [1, 12, 11], ... }
{ OrderId: 678, BookingRecordId: [444], OrderLineType: [11], ... }

If you look up an order with BookingRecordId:234 AND OrderLineType:11, you will get two orders, with OrderId 123 and 345, which is correct: both orders have arrays that satisfy this condition. However, for OrderId 123 the value at the third index of the OrderLineType array is 13, not 11 (11 at that position belongs to OrderId 345), so OrderId 123 should be excluded. This is what I am trying to achieve.

I got some suggestions from a solr-user to use FieldCollapsing, Join, Block-join or string concatenation. None of these approaches can be used without re-indexing the schema. Has anyone found a non-invasive solution for this?

Thanks,
-Vijay
Re: Searching multivalue fields.
I had already tested with omitTermFreqAndPositions=false. I still got the same error. Is there something that I am overlooking?

On Fri, Apr 4, 2014 at 2:45 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

Add the omitTermFreqAndPositions="false" attribute to the fieldType definitions:

<fieldType name="string" class="solr.StrField" omitTermFreqAndPositions="false" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" omitTermFreqAndPositions="false" precisionStep="0" positionIncrementGap="0"/>

You don't need termVectors for this.

1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields.

And please reply to the solr-user mailing list, so others can use the thread later on.

Ahmet

On Saturday, April 5, 2014 12:18 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Hey Ahmet,

Sorry it took some time to test this. But the schema definition seems to conflict with SpanQuery. I get the following error when I use spans:

field OrderLineType was indexed without position data; cannot run SpanTermQuery (term=11)

I changed the field definition in the schema but can't find the right attribute to set. My last attempt was with the following definition:

<field name="OrderLineType" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>

Any ideas what I am doing wrong?

Thanks,
-Vijay

On Wed, Mar 26, 2014 at 1:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

After reading the documentation, it seems the following query is what you are after. It will return OrderId:345 without matching OrderId:123:

SpanQuery q1 = new SpanTermQuery(new Term("BookingRecordId", "234"));
SpanQuery q2 = new SpanTermQuery(new Term("OrderLineType", "11"));
SpanQuery q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId");
Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);

Ahmet

On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vijay,

I personally don't understand joins very well. Just a guess: maybe FieldMaskingSpanQuery could be used? http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html

Ahmet

On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Hi,

I am bumping this thread one last time to see if anyone has a solution.

In its current state, our application stores child items as multivalue fields. Consider some orders, for example:

{ OrderId: 123, BookingRecordId: [145, 987, 234], OrderLineType: [11, 12, 13], ... }
{ OrderId: 345, BookingRecordId: [945, 882, 234], OrderLineType: [1, 12, 11], ... }
{ OrderId: 678, BookingRecordId: [444], OrderLineType: [11], ... }

If you look up an order with BookingRecordId:234 AND OrderLineType:11, you will get two orders, with OrderId 123 and 345, which is correct: both orders have arrays that satisfy this condition. However, for OrderId 123 the value at the third index of the OrderLineType array is 13, not 11 (11 at that position belongs to OrderId 345), so OrderId 123 should be excluded. This is what I am trying to achieve.

I got some suggestions from a solr-user to use FieldCollapsing, Join, Block-join or string concatenation. None of these approaches can be used without re-indexing the schema. Has anyone found a non-invasive solution for this?

Thanks,
-Vijay
Searching multivalue fields.
Hi,

I am bumping this thread one last time to see if anyone has a solution.

In its current state, our application stores child items as multivalue fields. Consider some orders, for example:

{ OrderId: 123, BookingRecordId: [145, 987, 234], OrderLineType: [11, 12, 13], ... }
{ OrderId: 345, BookingRecordId: [945, 882, 234], OrderLineType: [1, 12, 11], ... }
{ OrderId: 678, BookingRecordId: [444], OrderLineType: [11], ... }

If you look up an order with BookingRecordId:234 AND OrderLineType:11, you will get two orders, with OrderId 123 and 345, which is correct: both orders have arrays that satisfy this condition. However, for OrderId 123 the value at the third index of the OrderLineType array is 13, not 11 (11 at that position belongs to OrderId 345), so OrderId 123 should be excluded. This is what I am trying to achieve.

I got some suggestions from a solr-user to use FieldCollapsing, Join, Block-join or string concatenation. None of these approaches can be used without re-indexing the schema. Has anyone found a non-invasive solution for this?

Thanks,
-Vijay
Re: Re-index Parent-Child Schema
Hello Mikhail, Thanks for the suggestions. It took some time to get to this - 1. FieldsCollapsing cannot be done on Multivalue fields - https://wiki.apache.org/solr/FieldCollapsing 2. Join acts on documents, how can I use it to join multi-value fields in the same document? 3. Block-join requires you to index parent and child document separately using IndexWriter.addDocuments API 4. Concatenation requires me to index with those columns concatenated. This is not possible as I have around 20 multivalue fields. Is there a way to solve this without changing how it's indexed? Best, -Vijay On Thu, Mar 13, 2014 at 1:39 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Vijay, You can try FieldCollepsing, Join, Block-join, or just concatenate both field and search for concatenation. On Thu, Mar 13, 2014 at 7:16 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote: Hi, I've inherited an Solr application with a Schema that contains parent-child relationship. All child elements are maintained in multi-value fields. So an Order with 3 Order lines will result in an array of size 3 in Solr, This worked fine as long as clients queried only on Order, but with new requirements it is serving inaccurate results. Consider some orders, for example - { OrderId:123 BookingRecordId : [145, 987, *234*] OrderLineType : [11, 12, *13*] . } { OrderId:345 BookingRecordId : [945, 882, *234*] OrderLineType : [1, 12, *11*] . } { OrderId:678 BookingRecordId : [444] OrderLineType : [11] . } If you look up for an Order with BookingRecordId: 234 And OrderLineType:11. You will get two orders : 123 and 345, which is correct per Solr. You have two arrays in both the orders that satisfy this condition. However, for OrderId:123, the value at 3rd index of OrderLineType array is 13 and not 11( this is for BookingRecordId:145) this should be excluded. 
Per this blog: http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html I can't use span queries, as I have tons of child elements to query and I want to keep changes to client queries to a minimum. So is creating multiple indexes the only way? We have 3 physical boxes with SolrCloud, and at some point we would like to shard.

Appreciate any inputs.

Best,
-Vijay

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re-index Parent-Child Schema
Hi,

I've inherited a Solr application with a schema that contains a parent-child relationship. All child elements are maintained in multivalued fields, so an order with 3 order lines will result in an array of size 3 in Solr. This worked fine as long as clients queried only on orders, but with new requirements it is serving inaccurate results. Consider some orders, for example:

{ OrderId: 123, BookingRecordId: [145, 987, *234*], OrderLineType: [11, 12, *13*], ... }
{ OrderId: 345, BookingRecordId: [945, 882, *234*], OrderLineType: [1, 12, *11*], ... }
{ OrderId: 678, BookingRecordId: [444], OrderLineType: [11], ... }

If you look up an order with BookingRecordId:234 AND OrderLineType:11, you will get two orders, 123 and 345, which is correct as far as Solr is concerned: both orders have arrays that satisfy each condition. However, for OrderId:123 the value at the third index of OrderLineType is 13, not 11 (in that order, the value 11 pairs with BookingRecordId 145), so it should be excluded.

Per this blog: http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html I can't use span queries, as I have tons of child elements to query and I want to keep changes to client queries to a minimum. So is creating multiple indexes the only way? We have 3 physical boxes with SolrCloud, and at some point we would like to shard.

Appreciate any inputs.

Best,
-Vijay
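Since none of the index-side fixes are wanted here, it may help to see the mismatch concretely. The sketch below is plain Python, not Solr code; the documents are taken from the examples above, and the two helper functions are my own illustration of the two match semantics:

```python
# Each order keeps child line items as parallel multivalued fields.
orders = [
    {"OrderId": 123, "BookingRecordId": [145, 987, 234], "OrderLineType": [11, 12, 13]},
    {"OrderId": 345, "BookingRecordId": [945, 882, 234], "OrderLineType": [1, 12, 11]},
    {"OrderId": 678, "BookingRecordId": [444], "OrderLineType": [11]},
]

def solr_style_match(doc, record_id, line_type):
    # Solr evaluates each clause against the whole field independently:
    # any position may satisfy BookingRecordId, any other position OrderLineType.
    return record_id in doc["BookingRecordId"] and line_type in doc["OrderLineType"]

def positional_match(doc, record_id, line_type):
    # What is actually wanted: both values at the SAME child index.
    return any(r == record_id and t == line_type
               for r, t in zip(doc["BookingRecordId"], doc["OrderLineType"]))

solr_hits = [d["OrderId"] for d in orders if solr_style_match(d, 234, 11)]
wanted    = [d["OrderId"] for d in orders if positional_match(d, 234, 11)]
print(solr_hits)  # [123, 345] -- the false positive 123 is included
print(wanted)     # [345]
```

This is also why the concatenation workaround mentioned in the thread works: indexing a single multivalued field with values like "234|11" turns the positional pair into one token that a plain term query can match.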
Re: Date Range Query taking more time.
Maybe I spoke too soon. The second and third filter parameters, *fq={!cache=false cost=50}ClientID:4* and *fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]* above, are not getting executed unless I make one of them the first parameter. And when it is the first filter parameter, the QTime goes up to 250ms from 2ms!! Something I have noticed: Solr always respects only the first q and fq parameters; the rest of the parameters are not applied at all.

On Thu, Mar 6, 2014 at 11:55 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

That did the trick, Ahmet. The first response was around 200ms, but the subsequent queries were around 2-5ms. I tried this:

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
fq={!cache=false cost=100}Status:Booked
fq={!cache=false cost=50}ClientID:4
fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]

On Thu, Mar 6, 2014 at 11:49 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Did you try with non-cached filter queries before? Cached filter queries are useful when they are re-used. How often do you commit? I thought we could do something if we disable filter query caching and manipulate the execution order with the cost parameter. What happens with this:

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
fq={!cache=false cost=100}Status:Booked
fq={!cache=false cost=50}ClientID:4
fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

On Thursday, March 6, 2014 9:15 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Ahmet,

I have tried filter queries before to fine-tune query performance. However, whenever we use filter queries the response time goes up and remains there. With the above change, the response time was consistently around 4-5 secs. We are using the default cache settings. Is there any setting I missed?

On Thu, Mar 6, 2014 at 10:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Since your range query has NOW in it, it won't be cached meaningfully.
http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/

This is untested, but can you try this?

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
fq=Status:Booked
fq=ClientID:4
fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

On Thursday, March 6, 2014 8:29 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

I am working with a date range query that is not giving me fast response times. After modifying the date range construct based on several forums, the response time is now around 200ms, down from 2-3 secs. However, I was wondering if there is still some way to improve upon it, as queries without the date range have around 2-10ms latency.

Query (to look up upcoming booked trips for a user whenever he logs in to the app):

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 AND Status:Booked AND ClientID:4 AND StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

Date configuration in the schema:

<field name="StartDate" type="tdate" indexed="true" stored="true" />
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

Appreciate any inputs. Thanks!
Multiple fq parameters are not executed
..Spawning this off as a separate thread..

I have a filter query with multiple fq parameters; however, I have noticed that only the first fq is used for filtering. For instance, a lookup with:

fq=ClientID:2
fq=HotelID:234-PPP
fq={!cache=false}StartDate:[NOW/DAY TO *]

In the above query, results are filtered only by ClientID, not by HotelID or StartDate. The same thing happens with the q parameter. Does anyone know why?
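One thing worth ruling out when "only the first fq works" (this is my own suggestion, not something confirmed in the thread): repeated fq clauses must be sent as separate, &-joined URL parameters, each individually escaped, not pasted into a single value. A sketch of building such a request with Python's standard library:

```python
from urllib.parse import urlencode

# A sequence of (name, value) pairs lets the same parameter repeat.
params = [
    ("q", "*:*"),
    ("fq", "ClientID:2"),
    ("fq", "HotelID:234-PPP"),
    ("fq", "{!cache=false}StartDate:[NOW/DAY TO *]"),
]
# urlencode emits fq three times and percent-escapes the local-params
# prefix ({! ... }) along with the rest of each value.
query_string = urlencode(params)
print(query_string.count("fq="))  # 3
url = "http://localhost:8983/solr/collection1/select?" + query_string
```

If the three clauses were instead concatenated into one fq value (or separated by spaces rather than &), Solr would see a single filter, which would match the symptom described above.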
Re: Date Range Query taking more time.
Thanks Erick. The links you provided are invaluable. Here are our commit settings. Since we have NRT search, softCommit is set to 1000s, which explains why the cache is constantly invalidated.

<autoCommit>
  <maxTime>60</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

With constant cache invalidation it becomes almost impossible to get better response times. Is the only way to solve this to fine-tune the softCommit settings?

On Fri, Mar 7, 2014 at 6:17 PM, Erick Erickson erickerick...@gmail.com wrote:

OK, something is not right here. What are your autocommit settings? What you pasted above looks like you're looking at a searcher that has _just_ opened, which would mean either 1) you just had a hard commit with openSearcher=false happen, or 2) you just had a soft commit happen. In either case, the cache is thrown out. That said, if you have autowarming for the cache set up, you should be seeing some hits eventually. The top part is the _current_ searcher. The cumulative_* stats are all the cache results since the application started.

A couple of blogs:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/

I'm going to guess that you have soft commits, or hard commits with openSearcher=true, set to a very short interval and are having your filter caches invalidated very frequently, and that is misleading you, but that's just a guess.

Best,
Erick

On Thu, Mar 6, 2014 at 9:32 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

My initial approach was to use the filter cache for static fields. However, when a filter query is used, every query after the first has the same response time as the first. For instance, when caching is enabled in the query under review, response time shoots up to 4-5 secs and stays there.
Current filter cache stats:

lookups: 0
hits: 0
hitratio: 0
inserts: 0
evictions: 0
size: 0
warmupTime: 0
cumulative_lookups: 17135
cumulative_hits: 2465
cumulative_hitratio: 0.14
cumulative_inserts: 14670
cumulative_evictions: 0

I did not find what the cumulative_* fields mean here: http://wiki.apache.org/solr/SolrAdminStats, but it looks like nothing is being cached with fq, as the hit ratio is 0. Any idea what's happening?

On Thu, Mar 6, 2014 at 2:41 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hoss,

Thanks for the correction. I missed the /DAY part and thought it was StartDate:[NOW TO NOW+1YEAR].

Ahmet

On Friday, March 7, 2014 12:33 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: That did the trick Ahmet. The first response was around 200ms, but the
: subsequent queries were around 2-5ms.

Are you really sure you want cache=false on all of those filters? While the ClientID:4 query may be something that changes significantly enough in every query to not be useful to cache, I suspect you'd find a lot of value in going ahead and caching those Status:Booked and StartDate:[NOW/DAY TO NOW/DAY+1YEAR] clauses ... the first query to hit them might be slower, but every query after that should be fairly fast -- and if you really need them to *always* be fast, configure them as static newSearcher warming queries (or make sure you have autowarming on).

It also looks like you forgot the StartDate: part of your range query in your last test...

: fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]

And one final comment, just to make sure it doesn't slip through the cracks:

: Since your range query has NOW in it, it won't be cached meaningfully.

This is not applicable. The use of NOW in a range query doesn't mean that it can't be cached -- the problem is any time you use really precise dates (or numeric values) that *change* in every query.
If your range query uses NOW as a lower/upper end point, then it falls into that really-precise-dates situation -- but for this user, who is specifically rounding his dates to the nearest day, that advice isn't really applicable -- the date range queries can be cached and reused for an entire day.

-Hoss
http://www.lucidworks.com/
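Hoss's rounding point can be sketched in a few lines of plain Python (illustrative only; Solr's date math is evaluated server-side): NOW changes on every request, but NOW/DAY yields the same value all day, so a filter string built from it is a stable filter-cache key.

```python
from datetime import datetime, timezone

def date_math(round_to_day: bool) -> str:
    now = datetime.now(timezone.utc)
    if round_to_day:
        # What Solr's NOW/DAY rounding does: truncate to midnight (UTC).
        now = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return now.isoformat()

# Two requests moments apart produce the identical rounded value,
# so the resulting fq string hits the same filter-cache entry.
a, b = date_math(round_to_day=True), date_math(round_to_day=True)
assert a == b

# Unrounded NOW almost certainly differs between the two calls at
# microsecond precision, so each request creates a fresh cache entry.
x, y = date_math(round_to_day=False), date_math(round_to_day=False)
```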
Re: Date Range Query taking more time.
Pardon my typo. I meant 1000ms in my last mail.

Thanks,
-Vijay

On Mon, Mar 10, 2014 at 4:22 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Thanks Erick. The links you provided are invaluable. Here are our commit settings. Since we have NRT search, softCommit is set to 1000s, which explains why the cache is constantly invalidated.

<autoCommit>
  <maxTime>60</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

With constant cache invalidation it becomes almost impossible to get better response times. Is the only way to solve this to fine-tune the softCommit settings?

On Fri, Mar 7, 2014 at 6:17 PM, Erick Erickson erickerick...@gmail.com wrote:

OK, something is not right here. What are your autocommit settings? What you pasted above looks like you're looking at a searcher that has _just_ opened, which would mean either 1) you just had a hard commit with openSearcher=false happen, or 2) you just had a soft commit happen. In either case, the cache is thrown out. That said, if you have autowarming for the cache set up, you should be seeing some hits eventually. The top part is the _current_ searcher. The cumulative_* stats are all the cache results since the application started.

A couple of blogs:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/

I'm going to guess that you have soft commits, or hard commits with openSearcher=true, set to a very short interval and are having your filter caches invalidated very frequently, and that is misleading you, but that's just a guess.

Best,
Erick

On Thu, Mar 6, 2014 at 9:32 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

My initial approach was to use the filter cache for static fields. However, when a filter query is used, every query after the first has the same response time as the first. For instance, when caching is enabled in the query under review, response time shoots up to 4-5 secs and stays there.
We are using the default filter cache settings provided with the 4.5.0 distribution. Current filter cache stats:

lookups: 0
hits: 0
hitratio: 0
inserts: 0
evictions: 0
size: 0
warmupTime: 0
cumulative_lookups: 17135
cumulative_hits: 2465
cumulative_hitratio: 0.14
cumulative_inserts: 14670
cumulative_evictions: 0

I did not find what the cumulative_* fields mean here: http://wiki.apache.org/solr/SolrAdminStats, but it looks like nothing is being cached with fq, as the hit ratio is 0. Any idea what's happening?

On Thu, Mar 6, 2014 at 2:41 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hoss,

Thanks for the correction. I missed the /DAY part and thought it was StartDate:[NOW TO NOW+1YEAR].

Ahmet

On Friday, March 7, 2014 12:33 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: That did the trick Ahmet. The first response was around 200ms, but the
: subsequent queries were around 2-5ms.

Are you really sure you want cache=false on all of those filters? While the ClientID:4 query may be something that changes significantly enough in every query to not be useful to cache, I suspect you'd find a lot of value in going ahead and caching those Status:Booked and StartDate:[NOW/DAY TO NOW/DAY+1YEAR] clauses ... the first query to hit them might be slower, but every query after that should be fairly fast -- and if you really need them to *always* be fast, configure them as static newSearcher warming queries (or make sure you have autowarming on).

It also looks like you forgot the StartDate: part of your range query in your last test...

: fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]

And one final comment, just to make sure it doesn't slip through the cracks:

: Since your range query has NOW in it, it won't be cached meaningfully.

This is not applicable. The use of NOW in a range query doesn't mean that it can't be cached -- the problem is any time you use really precise dates (or numeric values) that *change* in every query.
If your range query uses NOW as a lower/upper end point, then it falls into that really-precise-dates situation -- but for this user, who is specifically rounding his dates to the nearest day, that advice isn't really applicable -- the date range queries can be cached and reused for an entire day.

-Hoss
http://www.lucidworks.com/
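To put rough numbers on the churn described in this thread, a back-of-the-envelope sketch (the 1000ms value is the autoSoftCommit maxTime quoted above; the query rate is my own assumption, not a figure from the thread):

```python
# A soft commit opens a new searcher, whose filterCache starts empty
# unless autowarming copies entries over. So a cached fq entry can be
# reused only by queries arriving before the next soft commit.
soft_commit_interval_ms = 1000   # autoSoftCommit maxTime from the config
query_rate_per_sec = 50          # hypothetical incoming query load

queries_per_searcher = query_rate_per_sec * soft_commit_interval_ms // 1000
print(queries_per_searcher)  # 50 queries, at most, share one searcher's caches

# Lengthening the soft-commit interval to 60s multiplies the reuse window 60x:
print(query_rate_per_sec * 60_000 // 1000)  # 3000
```

This is why the usual advice is to make the soft-commit interval as long as the application's freshness requirement allows, rather than as short as possible.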
Date Range Query taking more time.
I am working with a date range query that is not giving me fast response times. After modifying the date range construct based on several forums, the response time is now around 200ms, down from 2-3 secs. However, I was wondering if there is still some way to improve upon it, as queries without the date range have around 2-10ms latency.

Query (to look up upcoming booked trips for a user whenever he logs in to the app):

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 AND Status:Booked AND ClientID:4 AND StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

Date configuration in the schema:

<field name="StartDate" type="tdate" indexed="true" stored="true" />
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

Appreciate any inputs. Thanks!
Re: Date Range Query taking more time.
Ahmet,

I have tried filter queries before to fine-tune query performance. However, whenever we use filter queries the response time goes up and remains there. With the above change, the response time was consistently around 4-5 secs. We are using the default cache settings. Is there any setting I missed?

On Thu, Mar 6, 2014 at 10:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Since your range query has NOW in it, it won't be cached meaningfully.

http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/

This is untested, but can you try this?

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
fq=Status:Booked
fq=ClientID:4
fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

On Thursday, March 6, 2014 8:29 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

I am working with a date range query that is not giving me fast response times. After modifying the date range construct based on several forums, the response time is now around 200ms, down from 2-3 secs. However, I was wondering if there is still some way to improve upon it, as queries without the date range have around 2-10ms latency.

Query (to look up upcoming booked trips for a user whenever he logs in to the app):

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 AND Status:Booked AND ClientID:4 AND StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

Date configuration in the schema:

<field name="StartDate" type="tdate" indexed="true" stored="true" />
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

Appreciate any inputs. Thanks!
Re: Date Range Query taking more time.
That did the trick, Ahmet. The first response was around 200ms, but the subsequent queries were around 2-5ms. I tried this:

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
fq={!cache=false cost=100}Status:Booked
fq={!cache=false cost=50}ClientID:4
fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]

On Thu, Mar 6, 2014 at 11:49 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Did you try with non-cached filter queries before? Cached filter queries are useful when they are re-used. How often do you commit? I thought we could do something if we disable filter query caching and manipulate the execution order with the cost parameter. What happens with this:

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
fq={!cache=false cost=100}Status:Booked
fq={!cache=false cost=50}ClientID:4
fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

On Thursday, March 6, 2014 9:15 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

Ahmet,

I have tried filter queries before to fine-tune query performance. However, whenever we use filter queries the response time goes up and remains there. With the above change, the response time was consistently around 4-5 secs. We are using the default cache settings. Is there any setting I missed?

On Thu, Mar 6, 2014 at 10:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

Since your range query has NOW in it, it won't be cached meaningfully.

http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/

This is untested, but can you try this?

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
fq=Status:Booked
fq=ClientID:4
fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

On Thursday, March 6, 2014 8:29 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote:

I am working with a date range query that is not giving me fast response times. After modifying the date range construct based on several forums, the response time is now around 200ms, down from 2-3 secs.
However, I was wondering if there is still some way to improve upon it, as queries without the date range have around 2-10ms latency.

Query (to look up upcoming booked trips for a user whenever he logs in to the app):

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 AND Status:Booked AND ClientID:4 AND StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

Date configuration in the schema:

<field name="StartDate" type="tdate" indexed="true" stored="true" />
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

Appreciate any inputs. Thanks!
Re: Date Range Query taking more time.
My initial approach was to use the filter cache for static fields. However, when a filter query is used, every query after the first has the same response time as the first. For instance, when caching is enabled in the query under review, response time shoots up to 4-5 secs and stays there. We are using the default filter cache settings provided with the 4.5.0 distribution.

Current filter cache stats:

lookups: 0
hits: 0
hitratio: 0
inserts: 0
evictions: 0
size: 0
warmupTime: 0
cumulative_lookups: 17135
cumulative_hits: 2465
cumulative_hitratio: 0.14
cumulative_inserts: 14670
cumulative_evictions: 0

I did not find what the cumulative_* fields mean here: http://wiki.apache.org/solr/SolrAdminStats, but it looks like nothing is being cached with fq, as the hit ratio is 0. Any idea what's happening?

On Thu, Mar 6, 2014 at 2:41 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hoss,

Thanks for the correction. I missed the /DAY part and thought it was StartDate:[NOW TO NOW+1YEAR].

Ahmet

On Friday, March 7, 2014 12:33 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: That did the trick Ahmet. The first response was around 200ms, but the
: subsequent queries were around 2-5ms.

Are you really sure you want cache=false on all of those filters? While the ClientID:4 query may be something that changes significantly enough in every query to not be useful to cache, I suspect you'd find a lot of value in going ahead and caching those Status:Booked and StartDate:[NOW/DAY TO NOW/DAY+1YEAR] clauses ... the first query to hit them might be slower, but every query after that should be fairly fast -- and if you really need them to *always* be fast, configure them as static newSearcher warming queries (or make sure you have autowarming on).

It also looks like you forgot the StartDate: part of your range query in your last test...

: fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]

And one final comment, just to make sure it doesn't slip through the cracks:

: Since your range query has NOW in it, it won't be cached meaningfully.
This is not applicable. The use of NOW in a range query doesn't mean that it can't be cached -- the problem is any time you use really precise dates (or numeric values) that *change* in every query. If your range query uses NOW as a lower/upper end point, then it falls into that really-precise-dates situation -- but for this user, who is specifically rounding his dates to the nearest day, that advice isn't really applicable -- the date range queries can be cached and reused for an entire day.

-Hoss
http://www.lucidworks.com/
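The cache stats quoted in this thread actually decode cleanly. The arithmetic below uses only the numbers posted above (it is my own reading, not anything from Solr itself): the cumulative hit ratio is hits over lookups, and every single miss turned into an insert, which is the signature of caches being rebuilt from scratch on each new searcher:

```python
# Filter cache stats quoted earlier in the thread.
cumulative_lookups = 17135
cumulative_hits = 2465
cumulative_inserts = 14670

# cumulative_hitratio is simply hits / lookups across all searchers.
hit_ratio = cumulative_hits / cumulative_lookups
print(round(hit_ratio, 2))  # 0.14 -- matches the reported cumulative_hitratio

# Every miss (17135 - 2465 = 14670) became an insert: entries are built,
# then thrown away when the next commit opens a fresh searcher, before
# they can be reused. That is exactly the churn Erick describes.
miss_to_insert = cumulative_inserts / (cumulative_lookups - cumulative_hits)
print(miss_to_insert)  # 1.0
```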