Import Handler using shell scripts

2017-04-28 Thread Vijay Kokatnur
Is it possible to call dataimport handler from a shell script?  I have not
found any documentation regarding this. Any pointers?

-- 
Best,
Vijay
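
For reference, DIH is exposed as a plain HTTP request handler, so a shell script can drive it with curl (as Shawn notes in the "Clean checkbox on DIH" thread below). A minimal sketch, assuming the handler is registered at /dataimport and using a placeholder core name:

#!/bin/bash
# Sketch only: URL, core name and handler path are assumptions for illustration.
SOLR="http://localhost:8983/solr/mycore"

# Kick off a full import; clean=false avoids wiping the index first.
curl -s "$SOLR/dataimport?command=full-import&clean=false&commit=true&wt=json"

# Poll the handler until the import is no longer busy.
while curl -s "$SOLR/dataimport?command=status&wt=json" | grep -q '"busy"'; do
  sleep 10
done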


Re: Clean checkbox on DIH

2017-04-28 Thread Vijay Kokatnur
Even though it's a minority, I think it should be disabled by default.
That's a cleaner approach.  Accidentally running it in prod without
unchecking 'clean' can be disastrous.


On Apr 28, 2017, at 8:01AM, Mahmoud Almokadem 
wrote:
>
> Thanks Shawn,
>
> We are already using shell scripts to do our import, and we use the full-import
> command to do our delta import; everything has worked well for several years.
> But the default in the UI is full import with clean and commit. If I press
> the Execute button by mistake, the whole index is cleaned without any
> notification.
>
> Thanks,
> Mahmoud
>
>
>
>
> On Fri, Apr 28, 2017 at 2:51 PM, Shawn Heisey  wrote:
>
>  On 4/28/2017 5:11 AM, Mahmoud Almokadem wrote:
>>
>>>  I'd like to request to uncheck the "Clean" checkbox by default on DIH
>>>
>>  page,
>>
>>>  cause it cleaned the whole index about 2TB when I click Execute button by
>>>  wrong. Or show a confirmation message that the whole index will be
>>>
>>  cleaned!!
>>
>>  When somebody is doing a full-import, clean is what almost all users are
>>  going to want.  If you're wanting to do full-import without cleaning,
>>  then you are in the minority.  It is perhaps a fairly large minority,
>>  but still not the majority.
>>
>>  Also, once you move into production, you should not be using the admin
>>  UI for this.  You should be calling the DIH handler directly with HTTP
>>  from another source, which might be a shell script using curl, or a
>>  full-blown program in another language.
>>
>>  Thanks,
>>  Shawn
>
>
>
>


Re: DIH Speed

2017-04-27 Thread Vijay Kokatnur
Let me clarify -

DIH is running on Solr 6.5.0 and calls a different Solr instance running
4.5.0, which has 150M documents.  If we try to fetch them using DIH onto the
new Solr cluster, wouldn't it result in deep paging on Solr 4.5.0 and
drastically slow down indexing on Solr 6.5.0?

On Thu, Apr 27, 2017 at 4:40 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> I'm unclear why DIH and deep paging are mixed. DIH is
> indexing and deep paging is querying.
>
> If it's querying, consider cursorMark or the /export handler.
> https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> If it's DIH, please explain a bit more.
>
> Best,
> Erick
>
> On Thu, Apr 27, 2017 at 3:37 PM, Vijay Kokatnur
> <kokatnur.vi...@gmail.com> wrote:
> > We have a new solr 6.5.0 cluster, for which data is being imported via
> DIH
> > from another Solr cluster running version 4.5.0.
> >
> > This question comes back to deep paging, but we have observed that after
> 30
> > minutes of querying the rate of processing goes down from 400/s to about
> > 120/s.  At that point it has processed only 500K of 1.3M docs.  Is there
> > any way to speed this up?
> >
> > And, I can't go back to the source for the data.
> >
> > --
>



-- 
Best,
Vijay


Re: DIH Speed

2017-04-27 Thread Vijay Kokatnur
Hey Shawn,

Unfortunately, we can't upgrade the existing cluster.  That was my first
approach as well.

Yes, SolrEntityProcessor is used, so it results in deep paging after a
certain number of rows.

I have observed that instead of importing over a longer period in one go, if
data is imported only 4 hours at a time, the import process is much faster.
Since we are importing several months of data, it would be nice if the
dataimport could be scripted, in bash or python, but I can't find any
documentation on it.  Any pointers?
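
A rough bash sketch of that windowing idea, assuming the data-config reads the window boundaries from request parameters (e.g. ${dataimporter.request.start} and ${dataimporter.request.end} inside the SolrEntityProcessor query); the URL, core name, parameter names and dates below are placeholders:

#!/bin/bash
# Sketch: run one full-import per 4-hour window (GNU date assumed).
DIH="http://localhost:8983/solr/newcore/dataimport"
cur="2017-01-01T00:00:00Z"
end="2017-04-01T00:00:00Z"

while [[ "$cur" < "$end" ]]; do
  nxt=$(date -u -d "$cur + 4 hours" +%Y-%m-%dT%H:%M:%SZ)
  curl -s "$DIH?command=full-import&clean=false&commit=true&start=$cur&end=$nxt&wt=json"
  # Wait for this window to finish before starting the next one.
  while curl -s "$DIH?command=status&wt=json" | grep -q '"busy"'; do sleep 5; done
  cur="$nxt"
done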

--
*From:* Shawn Heisey 
*Sent:* Thursday, April 27, 2017 5:07 PM
*To:* solr-user@lucene.apache.org
*Subject:* Re: DIH Speed

On 4/27/2017 5:40 PM, Erick Erickson wrote:
> I'm unclear why DIH an deep paging are mixed. DIH is indexing and deep
paging is querying.
>
> If it's querying, consider cursorMark or the /export handler.
https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

Very likely they are using SolrEntityProcessor.

Vijay, if the source server were running 4.7 (or later) instead of 4.5,
you could enable cursorMark for SolrEntityProcessor in 6.5.0 as Erick
mentioned, and pagination would be immensely more efficient.
Unfortunately, 4.5 doesn't support cursorMark.

https://issues.apache.org/jira/browse/SOLR-9668

Any chance you could upgrade the source server to a later 4.x version?

Thanks,
Shawn
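
For reference, cursor-based paging against a source that does support it (4.7 or later) looks roughly like this from the command line; the host, collection name and uniqueKey field ("id") are placeholders:

#!/bin/bash
# Sketch: page through results with cursorMark instead of start/rows.
# cursorMark requires a sort that includes the uniqueKey field.
SRC="http://oldsolr:8983/solr/oldcollection/select"
cursor="*"
while : ; do
  resp=$(curl -s "$SRC" --data-urlencode "q=*:*" --data-urlencode "sort=id asc" \
    --data-urlencode "rows=1000" --data-urlencode "cursorMark=$cursor" --data-urlencode "wt=json")
  next=$(echo "$resp" | grep -o '"nextCursorMark":"[^"]*"' | cut -d'"' -f4)
  [ "$next" = "$cursor" ] && break   # same cursor returned twice means no more pages
  cursor="$next"
done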


DIH Speed

2017-04-27 Thread Vijay Kokatnur
We have a new solr 6.5.0 cluster, for which data is being imported via DIH
from another Solr cluster running version 4.5.0.

This question comes back to deep paging, but we have observed that after 30
minutes of querying the rate of processing goes down from 400/s to about
120/s.  At that point it has processed only 500K of 1.3M docs.  Is there
any way to speed this up?

And, I can't go back to the source for the data.

--


Split Shard not working

2017-04-27 Thread Vijay Kokatnur
We recently upgraded a 4.5 index to 6.5 using IndexUpgrader.  The index size
is around 600 GB on disk.  When we try to split it using SPLITSHARD, it
creates two new sub-shards on the node and eventually crashes before
completing the split.  After a restart, the original shard size is around 100
GB and each sub-shard is around 2 GB, so they are clearly not fully
constructed.

We tried different heap settings as well - 15, 20 and 30 GB - but it always
crashed.  RAM is about 256GB.

What's going on here?  Has anyone faced this situation before?
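
For reference, the asynchronous form of the split through the Collections API, which at least lets you fire the operation and poll its status separately, looks like this (collection, shard and request id are placeholders; whether it avoids the crash is a separate question):

# Sketch: asynchronous SPLITSHARD plus status polling.
curl -s "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1&async=split-1&wt=json"

# Later, check whether the request is running, completed or failed.
curl -s "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=split-1&wt=json"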


Solr Memory Usage

2014-10-29 Thread Vijay Kokatnur
I am observing some weird behavior with how Solr is using memory.  We are
running both Solr and ZooKeeper on the same node.  We tested memory
settings on a SolrCloud setup of 1 shard with a 146GB index, and a 2-shard
Solr setup with a 44GB index.  Both are running on similarly beefy
machines.

 After running the setup for 3-4 days, I see that a lot of memory is
inactive in all the nodes -

 99052952  total memory
 98606256  used memory
 19143796  active memory
 75063504  inactive memory

And the inactive memory is never reclaimed by the OS.  When total memory is
exhausted, latency and disk IO shoot up.  We observed this behavior in
both the SolrCloud setup with 1 shard and the Solr setup with 2 shards.

For the SolrCloud setup, we are running a cron job with the following command
to clear out the inactive memory.  It is working as expected.  Even though
the index size of the SolrCloud setup is 146GB, the used memory stays below 55GB.
Our response times are better and no errors/exceptions are thrown.  (This
command causes issues in the 2-shard setup.)

echo 3 > /proc/sys/vm/drop_caches
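
For completeness, the cron entry for that command would look something like this (the hourly schedule is only an assumption; the actual interval isn't stated above):

# /etc/cron.d/drop-caches -- sketch; runs as root
0 * * * * root sync && echo 3 > /proc/sys/vm/drop_caches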

We have disabled the query, doc and solr caches in our setup.  Zookeeper is
using around 10GB of memory and we are not running any other process in
this system.

Has anyone faced this issue before?


Re: SpanQuery with Boolean Queries

2014-04-28 Thread Vijay Kokatnur
Pretty neat. Thanks!


On Fri, Apr 25, 2014 at 2:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 I am not sure how OR clauses are executed.

 But after re-reading your mail, I think you can use SpanOrQuery (for your
 q1) in your custom query parser plugin.

 val q2 = new SpanOrQuery(
     new SpanTermQuery(new Term("BookingRecordId", "ID_1")),
     new SpanTermQuery(new Term("BookingRecordId", "ID_N"))
 );




 On Friday, April 25, 2014 3:22 AM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Thanks Ahmet. It worked!

 Does solr execute these nested queries in parallel?



 On Thu, Apr 24, 2014 at 12:53 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi Vijay,
 
  May be you can use _query_ hook?
 
   _query_:"{!span}BookingRecordId:234 OrderLineType:11" OR _query_:"{!span}OrderLineType:13 + BookingRecordId:ID_N"
 
  Ahmet
 
 
  On Thursday, April 24, 2014 9:34 PM, Vijay Kokatnur 
  kokatnur.vi...@gmail.com wrote:
  Hi,
 
  I have defined a SpanQuery for proximity search like -
 
  val q1 = new SpanTermQuery(new Term(BookingRecordId, 234))
  val q2 = new SpanTermQuery(new Term(OrderLineType, 11))
  val q2m = new FieldMaskingSpanQuery(q2, BookingRecordId)
  val sp = Array[SpanQuery](q1, q2m)
 
  val q = new SpanNearQuery(sp, -1, false)
 
  Query:
  *fq={!span} BookingRecordId: 234+OrderLineType11*
 
  However, I need to look up by multiple BookingRecordIds with an OR -
 
  *fq={!span}OrderLineType:13 + (BookingRecordId:ID_1 OR ... OR
  BookingRecordId:ID_N)*
 
  I can't specify multiple *span* in the same query like -
 
  *{!span} OrderLineType:13 + BookingRecordId:ID_1 OR ... OR {!span}
  OrderLineType:13 + BookingRecordId:ID_N*
 
  Is there any recommended to way to achieve this?
  Thanks, Vijay
 
 




Issue with SpanQuery

2014-04-28 Thread Vijay Kokatnur
I have been working on SpanQuery for some time now to look up multivalued
fields and found one more issue  -

Now if a document has following lookup fields among others

*BookingRecordId*: [ 100268421, 190131, 8263325 ],

*OrderLineType*: [ 13, 1, 11 ],

Here is the query I construct -

val q1 = new SpanTermQuery(new Term("BookingRecordId", "100268421"))
val q2 = new SpanTermQuery(new Term("OrderLineType", "13"))
val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
val sp = Array[SpanQuery](q1, q2m)

val q = new SpanNearQuery(sp, -1, false)

Query to find element at first index position works fine -

*{!span} BookingRecordId: 100268421 +OrderLineType:13*
but query to find element at third index position doesn't return any
result. -

*{!span} BookingRecordId: 8263325 +OrderLineType:11 *

If I increase the slop to 4 then it returns the correct result, but it also
matches BookingRecordId:100268421 with OrderLineType:11, which is incorrect.

I thought SpanQuery works for any multiValued field size.  Any ideas how I
can fix this?

Thanks,
-Vijay


Re: Issue with SpanQuery

2014-04-28 Thread Vijay Kokatnur
Hey Ahmet,

Here is the field def -

<field name="BookingRecordId" type="token" indexed="true" stored="true"
multiValued="true" omitTermFreqAndPositions="false"/>

<fieldType name="token" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>




On Mon, Apr 28, 2014 at 3:19 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Can you paste your field definition of BookingRecordId and OrderLineType?
 It could be something related to positionIncrementGap.

 Ahmet



 On Tuesday, April 29, 2014 12:58 AM, Ethan eh198...@gmail.com wrote:
 Facing the same problem!! I have noticed it works fine as long as you're
 looking up the first index position.

 Anyone faced similar problem before?



 On Mon, Apr 28, 2014 at 12:22 PM, Vijay Kokatnur
 kokatnur.vi...@gmail.comwrote:

  I have been working on SpanQuery for some time now to look up multivalued
  fields and found one more issue  -
 
  Now if a document has following lookup fields among others
 
  *BookingRecordId*: [ 100268421, 190131, 8263325 ],
 
  *OrderLineType*: [ 13, 1, 11 ],
 
  Here is the query I construct -
 
  val q1 = new SpanTermQuery(new Term(BookingRecordId, 100268421))
  val q2 = new SpanTermQuery(new Term(OrderLineType, 13))
  val q2m = new FieldMaskingSpanQuery(q2, BookingRecordId)
  val sp = Array[SpanQuery](q1, q2m)
 
  val q = new SpanNearQuery(sp, -1, false)
 
  Query to find element at first index position works fine -
 
  *{!span} BookingRecordId: 100268421 +OrderLineType:13*
  but query to find element at third index position doesn't return any
  result. -
 
  *{!span} BookingRecordId: 8263325 +OrderLineType:11 *
 
  If I increase the slope to 4 then it returns correct result. But it also
  matches BookingRecordId: 100268421 with OrderLineType:11 which is
 incorrect.
 
  I thought SpanQuery works for any multiValued field size.  Any ideas how
 I
  can fix this?
 
  Thanks,
  -Vijay
 




Re: Issue with SpanQuery

2014-04-28 Thread Vijay Kokatnur
Thanks Ahmet, I'll give that a try.  Do I need to re-index to add/update
positionIncrementGap?


On Mon, Apr 28, 2014 at 3:31 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 I would add positionIncrementGap to fieldType definitions and experiment
 with different values. 0, 1 and 100.


  <fieldType name="token" class="solr.TextField" omitNorms="true"
  positionIncrementGap="1">

 Same with OrderLineType too




 On Tuesday, April 29, 2014 1:25 AM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Hey Ehmet,

 Here is the field def -

 field name=BookingRecordId type=token indexed=true stored=true
 multiValued=true omitTermFreqAndPositions=false/

 fieldType name=token class=solr.TextField omitNorms=true analyzer
 tokenizer class=solr.KeywordTokenizerFactory/ filter
 class=solr.LowerCaseFilterFactory/ /analyzer /fieldType





 On Mon, Apr 28, 2014 at 3:19 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi,
 
  Can you paste your field definition of BookingRecordId and OrderLineType?
  It could be something related to positionIncrementGap.
 
  Ahmet
 
 
 
  On Tuesday, April 29, 2014 12:58 AM, Ethan eh198...@gmail.com wrote:
  Facing the same problem!! I have noticed it works fine as long as you're
  looking up the first index position.
 
  Anyone faced similar problem before?
 
 
 
  On Mon, Apr 28, 2014 at 12:22 PM, Vijay Kokatnur
  kokatnur.vi...@gmail.comwrote:
 
   I have been working on SpanQuery for some time now to look up
 multivalued
   fields and found one more issue  -
  
   Now if a document has following lookup fields among others
  
   *BookingRecordId*: [ 100268421, 190131, 8263325 ],
  
   *OrderLineType*: [ 13, 1, 11 ],
  
   Here is the query I construct -
  
   val q1 = new SpanTermQuery(new Term(BookingRecordId, 100268421))
   val q2 = new SpanTermQuery(new Term(OrderLineType, 13))
   val q2m = new FieldMaskingSpanQuery(q2, BookingRecordId)
   val sp = Array[SpanQuery](q1, q2m)
  
   val q = new SpanNearQuery(sp, -1, false)
  
   Query to find element at first index position works fine -
  
   *{!span} BookingRecordId: 100268421 +OrderLineType:13*
   but query to find element at third index position doesn't return any
   result. -
  
   *{!span} BookingRecordId: 8263325 +OrderLineType:11 *
  
   If I increase the slope to 4 then it returns correct result. But it
 also
   matches BookingRecordId: 100268421 with OrderLineType:11 which is
  incorrect.
  
   I thought SpanQuery works for any multiValued field size.  Any ideas
 how
  I
   can fix this?
  
   Thanks,
   -Vijay
  
 
 



Re: Issue with SpanQuery

2014-04-28 Thread Vijay Kokatnur
Adding positionIncrementGap="1" to the fields worked for me.  I didn't
re-index all the existing docs, so it works only for future documents.


On Mon, Apr 28, 2014 at 3:54 PM, Ahmet Arslan iori...@yahoo.com wrote:



 Hi Vijay,

  It is an index-time setting, so yes, a Solr restart and re-indexing are
  required.  A small test case would be handy.




 On Tuesday, April 29, 2014 1:35 AM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Thanks Ahmet, I'll give that a try.  Do I need to re-index to add/update
 positionIncrementGap?



 On Mon, Apr 28, 2014 at 3:31 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi,
 
  I would add positionIncrementGap to fieldType definitions and experiment
  with different values. 0, 1 and 100.
 
 
  fieldType name=token class=solr.TextField omitNorms=true
  positionIncrementGap=1
 
  Same with OrderLineType too
 
 
 
 
  On Tuesday, April 29, 2014 1:25 AM, Vijay Kokatnur 
  kokatnur.vi...@gmail.com wrote:
  Hey Ehmet,
 
  Here is the field def -
 
  field name=BookingRecordId type=token indexed=true stored=true
  multiValued=true omitTermFreqAndPositions=false/
 
  fieldType name=token class=solr.TextField omitNorms=true
 analyzer
  tokenizer class=solr.KeywordTokenizerFactory/ filter
  class=solr.LowerCaseFilterFactory/ /analyzer /fieldType
 
 
 
 
 
  On Mon, Apr 28, 2014 at 3:19 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
   Hi,
  
   Can you paste your field definition of BookingRecordId and
 OrderLineType?
   It could be something related to positionIncrementGap.
  
   Ahmet
  
  
  
   On Tuesday, April 29, 2014 12:58 AM, Ethan eh198...@gmail.com wrote:
   Facing the same problem!! I have noticed it works fine as long as
 you're
   looking up the first index position.
  
   Anyone faced similar problem before?
  
  
  
   On Mon, Apr 28, 2014 at 12:22 PM, Vijay Kokatnur
   kokatnur.vi...@gmail.comwrote:
  
I have been working on SpanQuery for some time now to look up
  multivalued
fields and found one more issue  -
   
Now if a document has following lookup fields among others
   
*BookingRecordId*: [ 100268421, 190131, 8263325 ],
   
*OrderLineType*: [ 13, 1, 11 ],
   
Here is the query I construct -
   
val q1 = new SpanTermQuery(new Term(BookingRecordId, 100268421))
val q2 = new SpanTermQuery(new Term(OrderLineType, 13))
val q2m = new FieldMaskingSpanQuery(q2, BookingRecordId)
val sp = Array[SpanQuery](q1, q2m)
   
val q = new SpanNearQuery(sp, -1, false)
   
Query to find element at first index position works fine -
   
*{!span} BookingRecordId: 100268421 +OrderLineType:13*
but query to find element at third index position doesn't return any
result. -
   
*{!span} BookingRecordId: 8263325 +OrderLineType:11 *
   
If I increase the slope to 4 then it returns correct result. But it
  also
matches BookingRecordId: 100268421 with OrderLineType:11 which is
   incorrect.
   
I thought SpanQuery works for any multiValued field size.  Any ideas
  how
   I
can fix this?
   
Thanks,
-Vijay
   
  
  
 




SpanQuery with Boolean Queries

2014-04-24 Thread Vijay Kokatnur
Hi,

I have defined a SpanQuery for proximity search like -

val q1 = new SpanTermQuery(new Term("BookingRecordId", "234"))
val q2 = new SpanTermQuery(new Term("OrderLineType", "11"))
val q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId")
val sp = Array[SpanQuery](q1, q2m)

val q = new SpanNearQuery(sp, -1, false)

Query:
*fq={!span} BookingRecordId:234 +OrderLineType:11*

However, I need to look up by multiple BookingRecordIds with an OR -

*fq={!span}OrderLineType:13 + (BookingRecordId:ID_1 OR ... OR
BookingRecordId:ID_N)*

I can't specify multiple *span* in the same query like -

*{!span} OrderLineType:13 + BookingRecordId:ID_1 OR ... OR {!span}
OrderLineType:13 + BookingRecordId:ID_N*

Is there any recommended way to achieve this?
Thanks, Vijay
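
For anyone reading this in the archive: the _query_ nested-query hook suggested in the replies can be sent like this; a sketch only, with a placeholder core name and assuming the custom {!span} parser plugin from this thread is installed:

# Sketch: two {!span} clauses combined with OR via the _query_ hook.
curl "http://localhost:8983/solr/orders/select" \
  --data-urlencode 'q=_query_:"{!span}OrderLineType:13 + BookingRecordId:ID_1" OR _query_:"{!span}OrderLineType:13 + BookingRecordId:ID_N"' \
  --data-urlencode 'wt=json'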


Re: SpanQuery with Boolean Queries

2014-04-24 Thread Vijay Kokatnur
Thanks Ahmet. It worked!

Does solr execute these nested queries in parallel?


On Thu, Apr 24, 2014 at 12:53 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Vijay,

 May be you can use _query_ hook?

 _query_:"{!span}BookingRecordId:234 OrderLineType:11" OR _query_:"{!span}OrderLineType:13 + BookingRecordId:ID_N"

 Ahmet


 On Thursday, April 24, 2014 9:34 PM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Hi,

 I have defined a SpanQuery for proximity search like -

 val q1 = new SpanTermQuery(new Term(BookingRecordId, 234))
 val q2 = new SpanTermQuery(new Term(OrderLineType, 11))
 val q2m = new FieldMaskingSpanQuery(q2, BookingRecordId)
 val sp = Array[SpanQuery](q1, q2m)

 val q = new SpanNearQuery(sp, -1, false)

 Query:
 *fq={!span} BookingRecordId: 234+OrderLineType11*

 However, I need to look up by multiple BookingRecordIds with an OR -

 *fq={!span}OrderLineType:13 + (BookingRecordId:ID_1 OR ... OR
 BookingRecordId:ID_N)*

 I can't specify multiple *span* in the same query like -

 *{!span} OrderLineType:13 + BookingRecordId:ID_1 OR ... OR {!span}
 OrderLineType:13 + BookingRecordId:ID_N*

 Is there any recommended to way to achieve this?
 Thanks, Vijay




Re: Searching multivalue fields.

2014-04-08 Thread Vijay Kokatnur
Since Span is the only way to solve the problem, I won't mind re-indexing.
 It's just that I have never done it before.

We've got 80G of indexed data replicated on two nodes in a cluster.  Is
there a preferred way to go about re-indexing?



On Tue, Apr 8, 2014 at 12:17 AM, Ahmet Arslan iori...@yahoo.com wrote:



 Hi,

 Changing value of omitTermFreqAndPositions requires re-indexing,
 unfortunately. And I remembered that you don't want to reindex. It looks
 like we are out of options.

 Ahmet


 On Tuesday, April 8, 2014 12:45 AM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Yes I did restart solr, but did not re-index.  Is that necessary?  We've
 got 80G of indexed data, is there a preferred way of doing it without
 impacting performance?


 On Sat, Apr 5, 2014 at 9:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi,
 
  Did restart solr and you re-index after schema change?
 On Saturday, April 5, 2014 2:39 AM, Vijay Kokatnur 
  kokatnur.vi...@gmail.com wrote:
   I had already tested with omitTermFreqAndPositions=false .  I still
  got the same error.
 
  Is there something that I am overlooking?
 
  On Fri, Apr 4, 2014 at 2:45 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi Vijay,
 
  Add omitTermFreqAndPositions=false  attribute to fieldType definitions.
 
  fieldType name=string class=solr.StrField
  omitTermFreqAndPositions=false sortMissingLast=true /
 
 fieldType name=int class=solr.TrieIntField
  omitTermFreqAndPositions=false precisionStep=0
  positionIncrementGap=0/
 
  You don't need termVectors  for this.
 
 1.2: omitTermFreqAndPositions attribute introduced, true by default
  except for text fields.
 
  And please reply to solr user mail, so others can use the threat later
 on.
 
  Ahmet
On Saturday, April 5, 2014 12:18 AM, Vijay Kokatnur 
  kokatnur.vi...@gmail.com wrote:
Hey Ahmet,
 
  Sorry it took some time to test this.  But schema definition seem to
  conflict with SpanQuery.  I get following error when I use Spans
 
   field OrderLineType was indexed without position data; cannot run
  SpanTermQuery (term=11)
 
  I changed field definition in the schema but can't find the right
  attribute to set this.  My last attempt was with following definition
 
 field name=OrderLineType type=string indexed=true stored=true
  multiValued=true *termVectors=true termPositions=true
  termOffsets=true*/
 
   Any ideas what I am doing wrong?
 
  Thanks,
  -Vijay
 
  On Wed, Mar 26, 2014 at 1:54 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi Vijay,
 
  After reading the documentation it seems that following query is what you
  are after. It will return OrderId:345 without matching OrderId:123
 
  SpanQuery q1  = new SpanTermQuery(new Term(BookingRecordId, 234));
  SpanQuery q2  = new SpanTermQuery(new Term(OrderLineType, 11));
  SpanQuery q2m new FieldMaskingSpanQuery(q2, BookingRecordId);
  Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);
 
  Ahmet
 
 
 
  On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan iori...@yahoo.com
  wrote:
  Hi Vijay,
 
  I personally don't understand joins very well. Just a guess may
  be FieldMaskingSpanQuery could be used?
 
 
 
 http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html
 
 
  Ahmet
 
 
 
 
  On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur 
  kokatnur.vi...@gmail.com wrote:
  Hi,
 
  I am bumping this thread again one last time to see if anyone has a
  solution.
 
  In it's current state, our application is storing child items as
 multivalue
  fields.  Consider some orders, for example -
 
 
  {
  OrderId:123
  BookingRecordId : [145, 987, *234*]
  OrderLineType : [11, 12, *13*]
  .
  }
  {
  OrderId:345
  BookingRecordId : [945, 882, *234*]
  OrderLineType : [1, 12, *11*]
  .
  }
  {
  OrderId:678
  BookingRecordId : [444]
  OrderLineType : [11]
  .
  }
 
 
  Here, If you look up for an Order with BookingRecordId: 234 And
  OrderLineType:11.  You will get two orders with orderId : 123 and 345,
  which is correct.  You have two arrays in both the orders that satisfy
 this
  condition.
 
  However, for OrderId:123, the value at 3rd index of OrderLineType array
 is
  13 and not 11( this is for OrderId:345).  So orderId 123 should be
  excluded. This is what I am trying to achieve.
 
  I got some suggestions from a solr-user to use FieldsCollapsing, Join,
  Block-join or string concatenation.  None of these approaches can be used
  without re-indexing schema.
 
  Has anyone found a non-invasive solution for this?
 
  Thanks,
 
  -Vijay
 
 
 
 
 
 
 
 




Re: Searching multivalue fields.

2014-04-07 Thread Vijay Kokatnur
Yes I did restart solr, but did not re-index.  Is that necessary?  We've
got 80G of indexed data, is there a preferred way of doing it without
impacting performance?


On Sat, Apr 5, 2014 at 9:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Did restart solr and you re-index after schema change?
On Saturday, April 5, 2014 2:39 AM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
  I had already tested with omitTermFreqAndPositions=false .  I still
 got the same error.

 Is there something that I am overlooking?

 On Fri, Apr 4, 2014 at 2:45 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Vijay,

 Add omitTermFreqAndPositions=false  attribute to fieldType definitions.

 fieldType name=string class=solr.StrField
 omitTermFreqAndPositions=false sortMissingLast=true /

fieldType name=int class=solr.TrieIntField
 omitTermFreqAndPositions=false precisionStep=0
 positionIncrementGap=0/

 You don't need termVectors  for this.

1.2: omitTermFreqAndPositions attribute introduced, true by default
 except for text fields.

 And please reply to the solr-user mail, so others can use the thread later on.

 Ahmet
   On Saturday, April 5, 2014 12:18 AM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
   Hey Ahmet,

 Sorry it took some time to test this.  But schema definition seem to
 conflict with SpanQuery.  I get following error when I use Spans

  field OrderLineType was indexed without position data; cannot run
 SpanTermQuery (term=11)

 I changed field definition in the schema but can't find the right
 attribute to set this.  My last attempt was with following definition

field name=OrderLineType type=string indexed=true stored=true
 multiValued=true *termVectors=true termPositions=true
 termOffsets=true*/

  Any ideas what I am doing wrong?

 Thanks,
 -Vijay

 On Wed, Mar 26, 2014 at 1:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Vijay,

 After reading the documentation it seems that following query is what you
 are after. It will return OrderId:345 without matching OrderId:123

 SpanQuery q1  = new SpanTermQuery(new Term(BookingRecordId, 234));
 SpanQuery q2  = new SpanTermQuery(new Term(OrderLineType, 11));
 SpanQuery q2m new FieldMaskingSpanQuery(q2, BookingRecordId);
 Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);

 Ahmet



 On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
 Hi Vijay,

 I personally don't understand joins very well. Just a guess may
 be FieldMaskingSpanQuery could be used?


 http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html


 Ahmet




 On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Hi,

 I am bumping this thread again one last time to see if anyone has a
 solution.

 In it's current state, our application is storing child items as multivalue
 fields.  Consider some orders, for example -


 {
 OrderId:123
 BookingRecordId : [145, 987, *234*]
 OrderLineType : [11, 12, *13*]
 .
 }
 {
 OrderId:345
 BookingRecordId : [945, 882, *234*]
 OrderLineType : [1, 12, *11*]
 .
 }
 {
 OrderId:678
 BookingRecordId : [444]
 OrderLineType : [11]
 .
 }


 Here, If you look up for an Order with BookingRecordId: 234 And
 OrderLineType:11.  You will get two orders with orderId : 123 and 345,
 which is correct.  You have two arrays in both the orders that satisfy this
 condition.

 However, for OrderId:123, the value at 3rd index of OrderLineType array is
 13 and not 11( this is for OrderId:345).  So orderId 123 should be
 excluded. This is what I am trying to achieve.

 I got some suggestions from a solr-user to use FieldsCollapsing, Join,
 Block-join or string concatenation.  None of these approaches can be used
 without re-indexing schema.

 Has anyone found a non-invasive solution for this?

 Thanks,

 -Vijay










Re: Searching multivalue fields.

2014-04-04 Thread Vijay Kokatnur
I had already tested with omitTermFreqAndPositions=false .  I still got
the same error.

Is there something that I am overlooking?

On Fri, Apr 4, 2014 at 2:45 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Vijay,

 Add omitTermFreqAndPositions=false  attribute to fieldType definitions.

 <fieldType name="string" class="solr.StrField"
 omitTermFreqAndPositions="false" sortMissingLast="true"/>

 <fieldType name="int" class="solr.TrieIntField"
 omitTermFreqAndPositions="false" precisionStep="0"
 positionIncrementGap="0"/>

 You don't need termVectors  for this.

1.2: omitTermFreqAndPositions attribute introduced, true by default
 except for text fields.

 And please reply to the solr-user mail, so others can use the thread later on.

 Ahmet
   On Saturday, April 5, 2014 12:18 AM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
   Hey Ahmet,

 Sorry it took some time to test this.  But schema definition seem to
 conflict with SpanQuery.  I get following error when I use Spans

  field OrderLineType was indexed without position data; cannot run
 SpanTermQuery (term=11)

 I changed field definition in the schema but can't find the right
 attribute to set this.  My last attempt was with following definition

 <field name="OrderLineType" type="string" indexed="true" stored="true"
 multiValued="true" termVectors="true" termPositions="true"
 termOffsets="true"/>

  Any ideas what I am doing wrong?

 Thanks,
 -Vijay

 On Wed, Mar 26, 2014 at 1:54 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Vijay,

 After reading the documentation it seems that following query is what you
 are after. It will return OrderId:345 without matching OrderId:123

 SpanQuery q1 = new SpanTermQuery(new Term("BookingRecordId", "234"));
 SpanQuery q2 = new SpanTermQuery(new Term("OrderLineType", "11"));
 SpanQuery q2m = new FieldMaskingSpanQuery(q2, "BookingRecordId");
 Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);

 Ahmet



 On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
 Hi Vijay,

 I personally don't understand joins very well. Just a guess may
 be FieldMaskingSpanQuery could be used?


 http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html


 Ahmet




 On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Hi,

 I am bumping this thread again one last time to see if anyone has a
 solution.

 In it's current state, our application is storing child items as multivalue
 fields.  Consider some orders, for example -


 {
 OrderId:123
 BookingRecordId : [145, 987, *234*]
 OrderLineType : [11, 12, *13*]
 .
 }
 {
 OrderId:345
 BookingRecordId : [945, 882, *234*]
 OrderLineType : [1, 12, *11*]
 .
 }
 {
 OrderId:678
 BookingRecordId : [444]
 OrderLineType : [11]
 .
 }


 Here, If you look up for an Order with BookingRecordId: 234 And
 OrderLineType:11.  You will get two orders with orderId : 123 and 345,
 which is correct.  You have two arrays in both the orders that satisfy this
 condition.

 However, for OrderId:123, the value at 3rd index of OrderLineType array is
 13 and not 11( this is for OrderId:345).  So orderId 123 should be
 excluded. This is what I am trying to achieve.

 I got some suggestions from a solr-user to use FieldsCollapsing, Join,
 Block-join or string concatenation.  None of these approaches can be used
 without re-indexing schema.

 Has anyone found a non-invasive solution for this?

 Thanks,

 -Vijay







Searching multivalue fields.

2014-03-26 Thread Vijay Kokatnur
Hi,

I am bumping this thread again one last time to see if anyone has a
solution.

In its current state, our application is storing child items as multivalue
fields.  Consider some orders, for example -


 {
OrderId:123
BookingRecordId : [145, 987, *234*]
OrderLineType : [11, 12, *13*]
.
}
 {
OrderId:345
BookingRecordId : [945, 882, *234*]
OrderLineType : [1, 12, *11*]
.
}
 {
OrderId:678
BookingRecordId : [444]
OrderLineType : [11]
.
}


Here, if you look up an Order with BookingRecordId:234 AND OrderLineType:11,
you will get two orders, with OrderId 123 and 345, which is correct: both
orders have arrays that satisfy this condition.

However, for OrderId:123, the value at the 3rd index of the OrderLineType
array is 13 and not 11 (the 11 at that position belongs to OrderId:345), so
OrderId 123 should be excluded.  This is what I am trying to achieve.

I got some suggestions from a solr-user to use FieldCollapsing, Join,
Block-join or string concatenation.  None of these approaches can be used
without re-indexing the schema.

Has anyone found a non-invasive solution for this?

Thanks,

-Vijay


Re: Re-index Parent-Child Schema

2014-03-25 Thread Vijay Kokatnur
Hello Mikhail,

Thanks for the suggestions.  It took some time to get to this -

1. FieldsCollapsing cannot be done on Multivalue fields -
https://wiki.apache.org/solr/FieldCollapsing

2. Join acts on documents, how can I use it to join multi-value fields in
the same document?

3. Block-join requires you to index parent and child document separately
using IndexWriter.addDocuments API

4.  Concatenation requires me to index with those columns concatenated.
 This is not possible as I have around 20 multivalue fields.

Is there a way to solve this without changing how it's indexed?

Best,
-Vijay

On Thu, Mar 13, 2014 at 1:39 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Hello Vijay,
 You can try FieldCollapsing, Join, Block-join, or just concatenate both
 fields and search for the concatenation.


 On Thu, Mar 13, 2014 at 7:16 AM, Vijay Kokatnur kokatnur.vi...@gmail.com
 wrote:

  Hi,
 
  I've inherited an Solr application with a Schema that contains
 parent-child
  relationship.  All child elements are maintained in multi-value fields.
  So an Order with 3 Order lines will result in an array of size 3 in Solr,
 
  This worked fine as long as clients queried only on Order, but with new
  requirements it is serving inaccurate results.
 
  Consider some orders, for example -
 
 
   {
  OrderId:123
  BookingRecordId : [145, 987, *234*]
  OrderLineType : [11, 12, *13*]
  .
  }
   {
  OrderId:345
  BookingRecordId : [945, 882, *234*]
  OrderLineType : [1, 12, *11*]
  .
  }
   {
  OrderId:678
  BookingRecordId : [444]
  OrderLineType : [11]
  .
  }
 
 
  If you look up for an Order with BookingRecordId: 234 And
 OrderLineType:11.
   You will get two orders : 123 and 345, which is correct per Solr.   You
  have two arrays in both the orders that satisfy this condition.
 
  However, for OrderId:123, the value at 3rd index of OrderLineType array
 is
  13 and not 11( this is for BookingRecordId:145) this should be excluded.
 
  Per this blog :
 
 
 http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
 
  I can't use span queries as I have tons of child elements to query and I
  want to keep any changes to client queries to minimum.
 
  So is creating multiple indexes is the only way? We have 3 Physical boxes
  with SolrCloud and at some point we would like to shard.
 
  Appreciate any inputs.
 
 
  Best,
 
  -Vijay
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Re-index Parent-Child Schema

2014-03-12 Thread Vijay Kokatnur
Hi,

I've inherited a Solr application with a schema that contains a parent-child
relationship.  All child elements are maintained in multi-value fields,
so an Order with 3 order lines will result in arrays of size 3 in Solr.

This worked fine as long as clients queried only on Order, but with new
requirements it is serving inaccurate results.

Consider some orders, for example -


 {
OrderId:123
BookingRecordId : [145, 987, *234*]
OrderLineType : [11, 12, *13*]
.
}
 {
OrderId:345
BookingRecordId : [945, 882, *234*]
OrderLineType : [1, 12, *11*]
.
}
 {
OrderId:678
BookingRecordId : [444]
OrderLineType : [11]
.
}


If you look up an Order with BookingRecordId:234 AND OrderLineType:11, you
will get two orders, 123 and 345, which is correct per Solr: both orders
have arrays that satisfy this condition.

However, for OrderId:123, the value at the 3rd index of the OrderLineType array is
13 and not 11 (the 13 belongs to BookingRecordId:145), so this order should be excluded.

Per this blog :
http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html

I can't use span queries as I have tons of child elements to query and I
want to keep any changes to client queries to minimum.

So is creating multiple indexes the only way? We have 3 physical boxes
with SolrCloud, and at some point we would like to shard.

Appreciate any inputs.


Best,

-Vijay


Re: Date Range Query taking more time.

2014-03-10 Thread Vijay Kokatnur
Maybe I spoke too soon.

The second and third filter parameters *fq={!cache=false cost=50}ClientID:4* and
*fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]* above are
not getting executed unless I make one of them the first parameter.  And when it's
the first filter parameter, the QTime goes up to 250ms from 2ms!!

Something I have noticed - Solr always respects only the first q and fq
parameters.  The rest of the parameters are not applied at all.





On Thu, Mar 6, 2014 at 11:55 AM, Vijay Kokatnur kokatnur.vi...@gmail.comwrote:

 That did the trick Ahmet.  The first response was around 200ms, but the
 subsequent queries were around 2-5ms.

 I tried this

 q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
 fq={!cache=false cost=100}Status:Booked
 fq={!cache=false cost=50}ClientID:4
 fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]



 On Thu, Mar 6, 2014 at 11:49 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Did you try with non-cached filter quries before?
 cached Filter queries are useful when they are re-used. How often do you
 commit?

 I thought that we can do something if we disable cache filter queries and
 manipulate their execution order with cost parameter.

 What happens with this :
 q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
 fq={!cache=false cost=100}Status:Booked
 fq={!cache=false cost=50}ClientID:4

 fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]



 On Thursday, March 6, 2014 9:15 PM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Ahmet, I have tried filter queries before to fine tune query performance.

 However, whenever we use filter queries the response time goes up and
 remains there.  With above change, the response time was consistently
 around 4-5 secs.  We are using the default cache settings.

 Is there any settings I missed?



 On Thu, Mar 6, 2014 at 10:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi,
 
  Since your range query has NOW in it, it won't be cached meaningfully.
  http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/
 
  This is untested but can you try this?
 
  q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
  fq=Status:Booked
  fq=ClientID:4
  fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]
 
 
 
 
  On Thursday, March 6, 2014 8:29 PM, Vijay Kokatnur 
  kokatnur.vi...@gmail.com wrote:
  I am working with date range query that is not giving me faster response
  times.  After modifying date range construct after reading several
 forums,
  response time now is around 200ms, down from 2-3secs.
 
  However, I was wondering if there still some way to improve upon it as
  queries without date range have around 2-10ms latency,
 
  Query : To look up upcoming booked trips for a user whenever he logs in
 to
  the app-
 
  q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 AND Status:Booked
  ANDClientID:4 AND  StartDate:[NOW/DAY TO NOW/DAY+1YEAR]
 
  Date configuration in Schema :
 
  field name=StartDate type=tdate indexed=true stored=true /
  fieldType name=tdate class=solr.TrieDateField precisionStep=6
  positionIncrementGap=0/
 
  Appreciate any inputs.
 
  Thanks!
 
 





Multiple fq parameters are not executed

2014-03-10 Thread Vijay Kokatnur
..Spawning this as a separate thread..

So I have a filter query with multiple fq parameters.  However, I have
noticed that only the first fq is used for filtering.  For instance, a
lookup with

...fq=ClientID:2
fq=HotelID:234-PPP
fq={!cache=false}StartDate:[NOW/DAY TO *]

In the above query, results are filtered only by ClientID and not by
HotelID or StartDate.  The same thing happens with the q parameter.  Does anyone
know why?
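
For comparison, this is how multiple fq parameters are normally sent from the command line, each as its own URL-encoded parameter (collection name is a placeholder); if the & separators or the encoding of characters like [ and ] get lost along the way, everything after the first fq ends up inside its value instead of reaching Solr as separate filters:

# Sketch: every fq is a separate parameter; --data-urlencode handles escaping and joining.
curl "http://localhost:8983/solr/bookings/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "fq=ClientID:2" \
  --data-urlencode "fq=HotelID:234-PPP" \
  --data-urlencode 'fq={!cache=false}StartDate:[NOW/DAY TO *]' \
  --data-urlencode "wt=json"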


Re: Date Range Query taking more time.

2014-03-10 Thread Vijay Kokatnur
Thanks Erick.  The links you provided are invaluable.

Here are our commit settings.  Since we have NRT search, softCommit is set
to 1000s which explains why cache is constantly invalidated.

 autoCommit
   maxTime60/maxTime
   openSearcherfalse/openSearcher
 /autoCommit

 autoSoftCommit
   maxTime1000/maxTime
 /autoSoftCommit


With constant cache invalidation it becomes almost impossible to get better
response times.  Is the only way to solve this to fine-tune the softCommit
settings?



On Fri, Mar 7, 2014 at 6:17 PM, Erick Erickson erickerick...@gmail.comwrote:

 OK, something is not right here. What are
 your autocommit settings? What you pasted
 above looks like you're looking at a searcher that
 has _just_ opened, which would mean either
 1 you just had a hard commit with openSearcher=false happen
 or
 2 you just had a soft commit happen

 In either case, the cache is thrown out. That said, if you
 have autowarming for the cache set up you should be
 seeing some hits eventually.

 The top part is the _current_ searcher. The cumulative_*
 is all the cache results since the application started.

 A couple of blogs:


 http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

 http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/

 I'm going to guess that you have soft commits or hard commits
 with openSearcher=true set to a very short interval and are
 having your filter caches invalidated very frequently, and that is
 misleading you, but that's just a guess.

 Best,
 Erick



 On Thu, Mar 6, 2014 at 9:32 PM, Vijay Kokatnur kokatnur.vi...@gmail.com
 wrote:
  My initial approach was to use filter cache static fields.  However when
  filter query is used, every query after the first has the same response
  time as the first.  For instance, when cache is enabled in the query
 under
  review, response time shoots up to 4-5secs and stays there.
 
  We are using default filter cache settings provided with 4.5.0
  distribution.
 
  Current Filter Cache stats :
 
  lookups:0
  hits:0
  hitratio:0
  inserts:0
  evictions:0
  size:0
  warmupTime:0
  cumulative_lookups:17135
  cumulative_hits:2465
  cumulative_hitratio:0.14
  cumulative_inserts:14670
  cumulative_evictions:0
 
  I did not find what cumulative_* fields mean
  herehttp://wiki.apache.org/solr/SolrAdminStats ,
  but it looks like nothing is being cached with fq as hit ratio is 0.
 
  Any idea whats happening?
 
 
 
  On Thu, Mar 6, 2014 at 2:41 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hoss,
 
  Thanks for the correction. I missed the /DAY part and thought as it was
   StartDate:[NOW TO NOW+1YEAR]
 
  Ahmet
 
 
  On Friday, March 7, 2014 12:33 AM, Chris Hostetter 
  hossman_luc...@fucit.org wrote:
 
  : That did the trick Ahmet.  The first response was around 200ms, but
 the
  : subsequent queries were around 2-5ms.
 
  Are you really sure you want cache=false on all of those filters?
 
  While the ClientID:4 query may by something that cahnges significantly
  enough in every query to not be useful to cache, i suspect you'd find a
  lot of value in going ahead and caching those Status:Booked and
  StartDate:[NOW/DAY TO NOW/DAY+1YEAR] clauses ... the first query to hit
  them might be slower but ever query after that should be fairly fast
 --
  and if you really need them to *always* be fast, configure them as
 static
  newSeracher warming queries (or make sure you have autowarming on.
 
  It also look like you forgot the StartDate: part of your range query
 in
  your last test...
 
  : fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]
 
  And one finally comment just to make sure it doesn't slip throug hthe
  cracks
 
 
  :   Since your range query has NOW in it, it won't be cached
  meaningfully.
 
  this is not applicable.  the use of NOW in a range query doesn't mean
  that it can't be cached -- the problem is anytime you use really precise
  dates (or numeric values) that *change* in every query.
 
  if your range query uses NOW as a lower/upper end point, then it calls
  in that really precise dates situation -- but for this user, who is
  specifically rounding his dates to hte nearest day, that advice isn't
  really applicable -- the date range queries can be cached  reused for
 an
  entire day.
 
 
 
  -Hoss
  http://www.lucidworks.com/
 
 



Re: Date Range Query taking more time.

2014-03-10 Thread Vijay Kokatnur
Pardon my typo.  I meant 1000ms in my last mail.

Thanks,
-Vijay


On Mon, Mar 10, 2014 at 4:22 PM, Vijay Kokatnur kokatnur.vi...@gmail.comwrote:

 Thanks Erick.  The links you provided are invaluable.

 Here are our commit settings.  Since we have NRT search, softCommit is set
 to 1000s which explains why cache is constantly invalidated.

  autoCommit
maxTime60/maxTime
openSearcherfalse/openSearcher
  /autoCommit

  autoSoftCommit
maxTime1000/maxTime
  /autoSoftCommit


 With constant cache invalidation it becomes almost impossible to get
 better response times.  Is the only to solve this is to fine tune
 softCommit settings?



 On Fri, Mar 7, 2014 at 6:17 PM, Erick Erickson erickerick...@gmail.comwrote:

 OK, something is not right here. What are
 your autocommit settings? What you pasted
 above looks like you're looking at a searcher that
 has _just_ opened, which would mean either
 1 you just had a hard commit with openSearcher=false happen
 or
 2 you just had a soft commit happen

 In either case, the cache is thrown out. That said, if you
 have autowarming for the cache set up you should be
 seeing some hits eventually.

 The top part is the _current_ searcher. The cumulative_*
 is all the cache results since the application started.

 A couple of blogs:


 http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

 http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/

 I'm going to guess that you have soft commits or hard commits
 with openSearcher=true set to a very short interval and are
 having your filter caches invalidated very frequently, and that is
 misleading you, but that's just a guess.

 Best,
 Erick



 On Thu, Mar 6, 2014 at 9:32 PM, Vijay Kokatnur kokatnur.vi...@gmail.com
 wrote:
  My initial approach was to use filter cache static fields.  However when
  filter query is used, every query after the first has the same response
  time as the first.  For instance, when cache is enabled in the query
 under
  review, response time shoots up to 4-5secs and stays there.
 
  We are using default filter cache settings provided with 4.5.0
  distribution.
 
  Current Filter Cache stats :
 
  lookups:0
  hits:0
  hitratio:0
  inserts:0
  evictions:0
  size:0
  warmupTime:0
  cumulative_lookups:17135
  cumulative_hits:2465
  cumulative_hitratio:0.14
  cumulative_inserts:14670
  cumulative_evictions:0
 
  I did not find what cumulative_* fields mean
  herehttp://wiki.apache.org/solr/SolrAdminStats ,
  but it looks like nothing is being cached with fq as hit ratio is 0.
 
  Any idea whats happening?
 
 
 
  On Thu, Mar 6, 2014 at 2:41 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hoss,
 
  Thanks for the correction. I missed the /DAY part and thought as it was
   StartDate:[NOW TO NOW+1YEAR]
 
  Ahmet
 
 
  On Friday, March 7, 2014 12:33 AM, Chris Hostetter 
  hossman_luc...@fucit.org wrote:
 
  : That did the trick Ahmet.  The first response was around 200ms, but
 the
  : subsequent queries were around 2-5ms.
 
  Are you really sure you want cache=false on all of those filters?
 
  While the ClientID:4 query may by something that cahnges
 significantly
  enough in every query to not be useful to cache, i suspect you'd find a
  lot of value in going ahead and caching those Status:Booked and
  StartDate:[NOW/DAY TO NOW/DAY+1YEAR] clauses ... the first query to hit
  them might be slower but ever query after that should be fairly fast
 --
  and if you really need them to *always* be fast, configure them as
 static
  newSeracher warming queries (or make sure you have autowarming on.
 
  It also look like you forgot the StartDate: part of your range query
 in
  your last test...
 
  : fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]
 
  And one finally comment just to make sure it doesn't slip throug hthe
  cracks
 
 
  :   Since your range query has NOW in it, it won't be cached
  meaningfully.
 
  this is not applicable.  the use of NOW in a range query doesn't mean
  that it can't be cached -- the problem is anytime you use really
 precise
  dates (or numeric values) that *change* in every query.
 
  if your range query uses NOW as a lower/upper end point, then it
 calls
  in that really precise dates situation -- but for this user, who is
  specifically rounding his dates to hte nearest day, that advice isn't
  really applicable -- the date range queries can be cached  reused for
 an
  entire day.
 
 
 
  -Hoss
  http://www.lucidworks.com/
 
 





Date Range Query taking more time.

2014-03-06 Thread Vijay Kokatnur
I am working with a date range query that is not giving me fast response
times.  After modifying the date range construct based on several forum posts,
the response time is now around 200ms, down from 2-3 secs.

However, I was wondering if there is still some way to improve on it, as
queries without a date range have around 2-10ms latency.

Query : To look up upcoming booked trips for a user whenever he logs in to
the app-

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 AND Status:Booked
AND ClientID:4 AND StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

Date configuration in Schema :

<field name="StartDate" type="tdate" indexed="true" stored="true"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"
positionIncrementGap="0"/>

Appreciate any inputs.

Thanks!
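
For the record, one of the query shapes suggested later in this thread (Ahmet's first suggestion, which leaves only the date range uncached) looks like this from the command line; the core name is a placeholder:

# Sketch: user-specific clause in q, reusable clauses as separate fq parameters.
curl "http://localhost:8983/solr/trips/select" \
  --data-urlencode "q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336" \
  --data-urlencode "fq=Status:Booked" \
  --data-urlencode "fq=ClientID:4" \
  --data-urlencode 'fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]' \
  --data-urlencode "wt=json"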


Re: Date Range Query taking more time.

2014-03-06 Thread Vijay Kokatnur
Ahmet, I have tried filter queries before to fine-tune query performance.

However, whenever we use filter queries the response time goes up and
stays there.  With the above change, the response time was consistently
around 4-5 secs.  We are using the default cache settings.

Are there any settings I missed?


On Thu, Mar 6, 2014 at 10:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Since your range query has NOW in it, it won't be cached meaningfully.
 http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/

 This is untested but can you try this?

 q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
 fq=Status:Booked
 fq=ClientID:4
 fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]




 On Thursday, March 6, 2014 8:29 PM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 I am working with date range query that is not giving me faster response
 times.  After modifying date range construct after reading several forums,
 response time now is around 200ms, down from 2-3secs.

 However, I was wondering if there still some way to improve upon it as
 queries without date range have around 2-10ms latency,

 Query : To look up upcoming booked trips for a user whenever he logs in to
 the app-

 q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 AND Status:Booked
 ANDClientID:4 AND  StartDate:[NOW/DAY TO NOW/DAY+1YEAR]

 Date configuration in Schema :

 field name=StartDate type=tdate indexed=true stored=true /
 fieldType name=tdate class=solr.TrieDateField precisionStep=6
 positionIncrementGap=0/

 Appreciate any inputs.

 Thanks!




Re: Date Range Query taking more time.

2014-03-06 Thread Vijay Kokatnur
That did the trick Ahmet.  The first response was around 200ms, but the
subsequent queries were around 2-5ms.

I tried this

q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
fq={!cache=false cost=100}Status:Booked
fq={!cache=false cost=50}ClientID:4
fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]



On Thu, Mar 6, 2014 at 11:49 AM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Did you try with non-cached filter quries before?
 cached Filter queries are useful when they are re-used. How often do you
 commit?

 I thought that we can do something if we disable cache filter queries and
 manipulate their execution order with cost parameter.

 What happens with this :
 q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
 fq={!cache=false cost=100}Status:Booked
 fq={!cache=false cost=50}ClientID:4

 fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]



 On Thursday, March 6, 2014 9:15 PM, Vijay Kokatnur 
 kokatnur.vi...@gmail.com wrote:
 Ahmet, I have tried filter queries before to fine tune query performance.

 However, whenever we use filter queries the response time goes up and
 remains there.  With above change, the response time was consistently
 around 4-5 secs.  We are using the default cache settings.

 Is there any settings I missed?



 On Thu, Mar 6, 2014 at 10:44 AM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi,
 
  Since your range query has NOW in it, it won't be cached meaningfully.
  http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/
 
  This is untested but can you try this?
 
  q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
  fq=Status:Booked
  fq=ClientID:4
  fq={!cache=false cost=150}StartDate:[NOW/DAY TO NOW/DAY+1YEAR]
 
 
 
 
  On Thursday, March 6, 2014 8:29 PM, Vijay Kokatnur 
  kokatnur.vi...@gmail.com wrote:
  I am working with date range query that is not giving me faster response
  times.  After modifying date range construct after reading several
 forums,
  response time now is around 200ms, down from 2-3secs.
 
  However, I was wondering if there still some way to improve upon it as
  queries without date range have around 2-10ms latency,
 
  Query : To look up upcoming booked trips for a user whenever he logs in
 to
  the app-
 
  q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 AND Status:Booked
  ANDClientID:4 AND  StartDate:[NOW/DAY TO NOW/DAY+1YEAR]
 
  Date configuration in Schema :
 
  field name=StartDate type=tdate indexed=true stored=true /
  fieldType name=tdate class=solr.TrieDateField precisionStep=6
  positionIncrementGap=0/
 
  Appreciate any inputs.
 
  Thanks!
 
 




Re: Date Range Query taking more time.

2014-03-06 Thread Vijay Kokatnur
My initial approach was to use the filter cache for static fields.  However, when
a filter query is used, every query after the first has the same response
time as the first.  For instance, when caching is enabled in the query under
review, the response time shoots up to 4-5 secs and stays there.

We are using default filter cache settings provided with 4.5.0
distribution.

Current Filter Cache stats :

lookups:0
hits:0
hitratio:0
inserts:0
evictions:0
size:0
warmupTime:0
cumulative_lookups:17135
cumulative_hits:2465
cumulative_hitratio:0.14
cumulative_inserts:14670
cumulative_evictions:0

I did not find what the cumulative_* fields mean here:
http://wiki.apache.org/solr/SolrAdminStats
but it looks like nothing is being cached with fq, as the hit ratio is 0.

Any idea what's happening?



On Thu, Mar 6, 2014 at 2:41 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hoss,

 Thanks for the correction. I missed the /DAY part and thought as it was
  StartDate:[NOW TO NOW+1YEAR]

 Ahmet


 On Friday, March 7, 2014 12:33 AM, Chris Hostetter 
 hossman_luc...@fucit.org wrote:

 : That did the trick Ahmet.  The first response was around 200ms, but the
 : subsequent queries were around 2-5ms.

 Are you really sure you want cache=false on all of those filters?

 While the ClientID:4 query may be something that changes significantly
 enough in every query to not be useful to cache, I suspect you'd find a
 lot of value in going ahead and caching those Status:Booked and
 StartDate:[NOW/DAY TO NOW/DAY+1YEAR] clauses ... the first query to hit
 them might be slower but every query after that should be fairly fast --
 and if you really need them to *always* be fast, configure them as static
 newSearcher warming queries (or make sure you have autowarming on).

 It also looks like you forgot the StartDate: part of your range query in
 your last test...

 : fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]

 And one final comment, just to make sure it doesn't slip through the
 cracks


 :   Since your range query has NOW in it, it won't be cached
 meaningfully.

 this is not applicable.  the use of NOW in a range query doesn't mean
 that it can't be cached -- the problem is anytime you use really precise
 dates (or numeric values) that *change* in every query.

 if your range query uses NOW as a lower/upper end point, then it falls
 in that really precise dates situation -- but for this user, who is
 specifically rounding his dates to the nearest day, that advice isn't
 really applicable -- the date range queries can be cached & reused for an
 entire day.



 -Hoss
 http://www.lucidworks.com/