Re: Max number of documents in update request

2020-07-07 Thread Sidharth Negi
Thanks. This was useful, really appreciate it! :)

On Tue, Jul 7, 2020, 8:07 PM Walter Underwood  wrote:

> Agreed, I do something between 20 and 1000. If the master node is not
> handling any search traffic, use twice as many client threads as there are
> CPUs in the node. That should get you close to 100% CPU utilization.
> One thread will be waiting while a batch is being processed and another
> thread will be sending the next batch so there is no pause in processing.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jul 7, 2020, at 6:12 AM, Erick Erickson wrote:
> >
> > As many as you can send before blowing up.
> >
> > Really, the question is not answerable. 1K docs? 1G docs? 1 field or 500?
> >
> > And I don’t think it’s a good use of time to pursue much. See:
> >
> > https://lucidworks.com/post/really-batch-updates-solr-2/
> >
> > If you’re looking at trying to maximize throughput, adding
> > client threads that send Solr documents is a better approach.
> >
> > All that said, I usually just pick 1,000 and don’t worry about it.
> >
> > Best,
> > Erick
> >
> >> On Jul 7, 2020, at 8:59 AM, Sidharth Negi wrote:
> >>
> >> Hi,
> >>
> >> Could someone help me with the best way to go about determining the
> >> maximum number of docs I can send in a single update call to Solr in a
> >> master / slave architecture?
> >>
> >> Thanks!
> >
>
>
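
For anyone finding this thread later, the advice above (batches of up to about 1,000 docs, sent from twice as many client threads as the Solr node has CPUs) can be sketched roughly as follows. This is only a minimal sketch: the URL, the core name "mycore", and the function names are assumptions made for illustration, and it does not issue a commit.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical update endpoint; replace host and core name with your own.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def post_batch(batch, url=SOLR_UPDATE_URL):
    """POST one batch to Solr as a JSON array of documents."""
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_all(docs, solr_cpus, batch_size=1000, send=post_batch):
    """Send batches from 2 * solr_cpus client threads.

    `solr_cpus` is the CPU count of the Solr node, not of the client.
    """
    with ThreadPoolExecutor(max_workers=2 * solr_cpus) as pool:
        # map() submits every batch up front, so very large inputs are
        # held in memory; fine for a sketch, not for an unbounded stream.
        return list(pool.map(send, batches(docs, batch_size)))
```

The `send` parameter exists only so the batching and threading can be tried out without a running Solr instance.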


Max number of documents in update request

2020-07-07 Thread Sidharth Negi
Hi,

Could someone help me with the best way to go about determining the maximum
number of docs I can send in a single update call to Solr in a master /
slave architecture?

Thanks!


Query Elevation Component

2020-02-03 Thread Sidharth Negi
Hi,

I want to use the Solr query elevation component. Let's say I want to
elevate "doc_id" when a user inputs the query "qwerty". I am able to get a
prototype to work by filling these values in elevate.xml and hitting the
Solr API with q="qwerty".

However, in our service, where I want to plug this in, the 'q' parameter
isn't as pure and looks more like q="'qwerty' (field1:value1)
(field2:value2)".

Any suggestions on the best way to go about this?

Thanks


Analysing Multivalued Fields

2019-12-30 Thread Sidharth Negi
Hi,

Is there a way to analyze how multiple values in a multivalued field are
being tokenized and processed during indexing?

The "Analysis" page in the UI treats my multiple comma-separated values as a
single value. It filters out the commas and acts as if I had specified just
one value.

Thanks in advance!


Re: Searches across Cores

2019-08-09 Thread Sidharth Negi
Hi,

If the number of cores spanned is low, I guess firing parallel queries and
taking the union or intersection of the results should work, since the
schemas are the same. Do you notice any perceivable difference in
performance?

Best,
Sidharth

On Fri, Aug 9, 2019 at 2:54 PM Komal Motwani wrote:

> Hi,
>
>
>
> I have a use case where I would like a query to span across Cores
> (Multi-Core); all the cores involved do have same schema. I have started
> using solr just recently and have been trying to find ways to achieve this
> but couldn’t find any solution so far (Distributed searches, shards are not
> what I am looking for). I remember in one of the tech talks, there was a
> mention of this feature to be included in future releases. Appreciate any
> pointers to help me progress further.
>
>
>
> Thanks,
>
> Komal Motwani
>
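
The parallel-query idea from the reply above could look something like the sketch below. It is only an illustration: the function names and the "id" unique-key field are assumptions, and it merges by document id without trying to reconcile scores across cores.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_fetch(query, rows=10):
    """Build a function that runs `query` against one core's /select handler."""
    def fetch(core_url):
        params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
        with urllib.request.urlopen(f"{core_url}/select?{params}") as response:
            return json.load(response)["response"]["docs"]
    return fetch


def search_cores(core_urls, fetch):
    """Query every core in parallel; returns one list of docs per core."""
    with ThreadPoolExecutor(max_workers=len(core_urls)) as pool:
        return list(pool.map(fetch, core_urls))


def union(results):
    """Docs found in any core, de-duplicated by the assumed 'id' field."""
    merged = {}
    for docs in results:
        for doc in docs:
            merged.setdefault(doc["id"], doc)
    return list(merged.values())


def intersection(results):
    """Docs whose 'id' appears in the results of every core."""
    if not results:
        return []
    common = {doc["id"] for doc in results[0]}
    for docs in results[1:]:
        common &= {doc["id"] for doc in docs}
    return [doc for doc in results[0] if doc["id"] in common]
```

A call would be along the lines of `union(search_cores(urls, make_fetch("qwerty")))`, where `urls` holds the base URL of each core.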


Re: Replicate Now Not Working

2019-07-23 Thread Sidharth Negi
Ah, never mind, I managed to resolve the issue.

It seems that replication only happens if the index changes. I noticed that
both master and slave had the same index version, since I had only changed
the schema.

When I modified a random field of a random document, the index versions of
master and slave became different, and replication worked as usual.

Is this common knowledge that I missed somehow?

Thanks!



On Tue, Jul 23, 2019, 7:10 PM Erick Erickson wrote:

> Are you sure that you’re _using_ schema.xml and not managed-schema? The
> default has changed. If no explicit entry is made in solrconfig.xml to
> define <schemaFactory class="ClassicIndexSchemaFactory"/>, you’ll be using
> managed-schema, not schema.xml.
>
> Best,
> Erick
>
> > On Jul 23, 2019, at 5:51 AM, Sidharth Negi wrote:
> >
> > Hi,
> >
> > The "replicateNow" button in the admin UI doesn't seem to work since the
> > "schema.xml" (which I modified on slave) is not being updated to reflect
> > that of the master. I have used this button before and it has always
> > cloned the index right away. Any ideas on what could be the possible
> > reason for this?
> >
> > The master and slave have proper "/replication" handlers and "schema.xml"
> > is in the confFiles.
> >
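> > Master's Solrconfig:
> > ---
> > <requestHandler name="/replication" class="solr.ReplicationHandler">
> >   <lst name="master">
> >     <str name="replicateAfter">commit</str>
> >     <str name="replicateAfter">startup</str>
> >     <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
> >   </lst>
> > </requestHandler>
> >
> > Slave's Solrconfig:
> > ---
> > <requestHandler name="/replication" class="solr.ReplicationHandler">
> >   <lst name="slave">
> >     <str name="masterUrl">MASTER_URL</str>
> >     <str name="pollInterval">01:00:00</str>
> >   </lst>
> > </requestHandler>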
> >
> > Thanks!
>
>
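
A quick way to confirm the observation above is to compare the index version reported by each node's replication handler. The sketch below assumes the standard `command=indexversion` call and that the JSON reply carries an `indexversion` field; the function names and URLs are made up for illustration.

```python
import json
import urllib.request


def get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def index_version(core_url, get=get_json):
    """Ask a core's /replication handler for its current index version."""
    reply = get(f"{core_url}/replication?command=indexversion&wt=json")
    # Assumption: the reply exposes the version under "indexversion".
    return reply["indexversion"]


def out_of_sync(master_url, slave_url, get=get_json):
    """True when the two cores report different index versions."""
    return index_version(master_url, get) != index_version(slave_url, get)
```

If `out_of_sync(...)` is false after a schema-only change, that matches the behaviour described in this thread: the versions are equal, so the slave sees nothing to fetch.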


Replicate Now Not Working

2019-07-23 Thread Sidharth Negi
Hi,

The "replicateNow" button in the admin UI doesn't seem to work since the
"schema.xml" (which I modified on slave) is not being updated to reflect
that of the master. I have used this button before and it has always cloned
the index right away. Any ideas on what could be the possible reason for
this?

The master and slave have proper "/replication" handlers and "schema.xml"
is in the confFiles.

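Master's Solrconfig:
---
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
  </lst>
</requestHandler>

Slave's Solrconfig:
---
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">MASTER_URL</str>
    <str name="pollInterval">01:00:00</str>
  </lst>
</requestHandler>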

Thanks!


Re: Understanding Performance of Function Query

2019-05-09 Thread Sidharth Negi
For those interested, I was able to disable the coord factor by overriding it
in a new CustomSimilarity jar file. This effectively sums the scores from
multiple edismax queries.
However, I'd be interested in any other methods that go beyond direct sums
and support other scoring logic, e.g. sqrt(q1) + sqrt(q2) + 0.6*q3.

On Wed, Apr 17, 2019 at 6:20 PM Sidharth Negi wrote:

> This does indeed reduce the time, but doesn't quite do what I wanted. This
> approach penalizes the docs based on the "coord" factor. In other words,
> for a doc with a score of 5 on just one query (and nothing on the others),
> the resulting score would now be 5/3 since only one clause matches.
>
> 1. I wonder why the above query works at all. I can't find this query
> syntax anywhere in any docs or books on Solr; can you point me to your
> source for it?
>
> 2. Which parser is used to parse the larger query? No info about the
> parser used for the larger query is given from parsedQuery field. (using
> debug=true)
>
> 3. What if I did not want to sum (the scores of q1, q2, q3) but rather
> wanted to use their values in some other way (e.g. sqrt(q1) + sqrt(q2) +
> 0.6*q3)? Is there no way of cleanly implementing a flow of computations to
> be done on sub-query scores?
>
> On Tue, Apr 9, 2019 at 7:40 PM Erik Hatcher wrote:
>
>> maybe something like q=
>>
>> ({!edismax  v=$q1} OR {!edismax  v=$q2} OR {!edismax ...
>> v=$q3})
>>
>>  and setting q1, q2, q3 as needed (or all to the same maybe with
>> different qf’s and such)
>>
>>   Erik
>>
>> > On Apr 9, 2019, at 09:12, sidharth228  wrote:
>> >
>> > I did in fact use the "bf" parameter for individual edismax queries.
>> >
>> > However, the reason I can't condense these edismax queries into a single
>> > edismax query is because each of them uses different fields in "qf".
>> >
>> > Basically what I'm trying to do is this: each of these edismax queries
>> > (q1, q2, q3) has a logic, and scores docs using it. I am then trying to
>> > combine the scores (to get an overall score) from these scores later by
>> > summing them.
>> >
>> > What options do I have of implementing this?
>> >
>> >
>> >
>> >
>> > --
>> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>


Re: Understanding Performance of Function Query

2019-04-17 Thread Sidharth Negi
This does indeed reduce the time, but doesn't quite do what I wanted. This
approach penalizes the docs based on the "coord" factor. In other words, for
a doc with a score of 5 on just one query (and nothing on the others), the
resulting score would now be 5/3 since only one clause matches.
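
To make the 5/3 arithmetic concrete, here is a simplified model of the coord penalty: the summed clause scores are multiplied by (matching clauses / total clauses). It is only a sketch of that one factor, ignores everything else that goes into a real score, and the function name is made up for illustration.

```python
def coord_adjusted_score(clause_scores):
    """Sum the matching clause scores, then scale by the coord factor.

    A clause score of 0 is treated as "this clause did not match".
    """
    if not clause_scores:
        return 0.0
    matched = [score for score in clause_scores if score > 0]
    coord = len(matched) / len(clause_scores)
    return sum(matched) * coord


# One of three clauses matches with a score of 5: 5 * (1/3) = 5/3.
print(coord_adjusted_score([5, 0, 0]))
```

When all three clauses match, the factor is 3/3 and the result is the plain sum, which is why the penalty only shows up for partial matches.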

1. I wonder why the above query works at all. I can't find this query
syntax anywhere in any docs or books on Solr; can you point me to your
source for it?

2. Which parser is used to parse the larger query? No info about the parser
used for the larger query is given from parsedQuery field. (using
debug=true)

3. What if I did not want to sum (the scores of q1, q2, q3) but rather
wanted to use their values in some other way (e.g. sqrt(q1) + sqrt(q2) +
0.6*q3)? Is there no way of cleanly implementing a flow of computations to
be done on sub-query scores?

On Tue, Apr 9, 2019 at 7:40 PM Erik Hatcher  wrote:

> maybe something like q=
>
> ({!edismax  v=$q1} OR {!edismax  v=$q2} OR {!edismax ...
> v=$q3})
>
>  and setting q1, q2, q3 as needed (or all to the same maybe with different
> qf’s and such)
>
>   Erik
>
> > On Apr 9, 2019, at 09:12, sidharth228  wrote:
> >
> > I did in fact use the "bf" parameter for individual edismax queries.
> >
> > However, the reason I can't condense these edismax queries into a single
> > edismax query is because each of them uses different fields in "qf".
> >
> > Basically what I'm trying to do is this: each of these edismax queries
> > (q1, q2, q3) has a logic, and scores docs using it. I am then trying to
> > combine the scores (to get an overall score) from these scores later by
> > summing them.
> >
> > What options do I have of implementing this?
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Understanding Performance of Function Query

2019-04-09 Thread Sidharth Negi
Hi,

I'm working with "edismax" and "function-query" parsers in Solr and have
difficulty in understanding whether the query time taken by
"function-query" makes sense. The query I'm trying to optimize looks as
follows:

q={!func}sum($q1,$q2,$q3), where q1, q2, and q3 are edismax queries.

The QTime for each of the edismax queries is well under 50ms, but the
function query seems to be the rate-determining step, since the combined
query above takes around 200-300ms. I also analyzed the performance of the
function query using only constants.

The QTime results for different q are as follows:

   - 97ms for q={!func} sum(10,20)
   - 109ms for q={!func} sum(10,20,30)
   - 127ms for q={!func} sum(10,20,30,40)
   - 145ms for q={!func} sum(10,20,30,40,50)

Does this trend make sense? Are function-queries expected to be this slow?

What makes edismax queries so much faster?

What can I do to optimize my original query (which has edismax subqueries
q1,q2,q3) to work under 100ms?

I originally posted this question on StackOverflow with no success, so any
help here would be appreciated.
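
For reference, the request described in this message can be assembled as below. This is a sketch under assumptions: it uses the {!func}sum(...) form from the timing list, the qf fields and search term are placeholders rather than the real ones, and the function name is made up for illustration.

```python
from urllib.parse import urlencode


def func_sum_params(subqueries):
    """Build request params that sum the scores of named sub-queries.

    `subqueries` maps a parameter name (e.g. "q1") to its query string.
    """
    refs = ",".join(f"${name}" for name in subqueries)
    # debug=timing asks Solr to report how long each component took.
    params = {"q": "{!func}sum(" + refs + ")", "debug": "timing"}
    params.update(subqueries)
    return params


# Placeholder edismax sub-queries; the fields here are hypothetical.
subqueries = {
    "q1": "{!edismax qf=title v='solr'}",
    "q2": "{!edismax qf=body v='solr'}",
    "q3": "{!edismax qf=tags v='solr'}",
}
query_string = urlencode(func_sum_params(subqueries))
print(query_string)
```

Keeping each sub-query in its own parameter makes it easy to time q1, q2, and q3 on their own and compare that against the combined request.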

