Re: Re: Need urgent help with Solr spatial search using SpatialRecursivePrefixTreeFieldType

2019-10-01 Thread David Smiley
Do you know how URLs are structured?  They include name=value pairs
separated by ampersands.  That structure takes precedence over the contents
of any particular name or value.  Consequently, looking at your parentheses
doesn't make sense, since the opening and closing parentheses span ampersands
and thus end up in different filter queries.  In fact, I think you can remove
those parentheses completely.  Also, try a tool like Postman to compose your
queries rather than manipulating URLs directly.

&sfield=adminLatLon
&d=80
&fq= {!geofilt pt=33.0198431,-96.6988856} OR {!geofilt
pt=50.2171726,8.265894}

Notice the leading space after 'fq='.  This is a syntax-parsing gotcha that
has to do with how embedded queries are parsed, which is what you need here
since you are composing two of them with an operator.  It'd be kinda awkward
to fix that gotcha in Solr.  There are other techniques too, but this is the
most succinct.
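A minimal sketch of that request composed with an HTTP client that handles
the URL encoding for you (Python with the requests library here; the host and
collection name are placeholders, not anything from the original question):

    import requests

    params = {
        "q": "*:*",
        "sfield": "adminLatLon",  # picked up by both {!geofilt} clauses
        "d": "80",                # likewise shared by both clauses
        # The leading space keeps the whole fq from being parsed as one big
        # local-params query, so the two embedded queries can be OR'ed.
        "fq": " {!geofilt pt=33.0198431,-96.6988856}"
              " OR {!geofilt pt=50.2171726,8.265894}",
    }
    resp = requests.get("http://localhost:8983/solr/mycollection/select",
                        params=params)
    print(resp.json()["response"]["numFound"])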

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Oct 1, 2019 at 7:34 AM anushka gupta <
anushka_gu...@external.mckinsey.com> wrote:

> Thanks,
>
> Could you please help me in combining two geofilt fqs? The following gives
> an error; it treats ")" as part of the d parameter and complains that
> 'd=80)' is not a valid param:
>
>
> ({!geofilt}&sfield=adminLatLon&pt=33.0198431,-96.6988856&d=80)+OR+({!geofilt}&sfield=adminLatLon&pt=50.2171726,8.265894&d=80)
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: filter in JSON Query DSL

2019-10-01 Thread Mikhail Khludnev
Raised  https://issues.apache.org/jira/browse/SOLR-13808. Thanks, Jochen!
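For contrast, a minimal sketch of the distinction discussed in the quoted
exchange below (Python with the requests library; the host, collection, and
the class_s field are placeholders): a top-level "filter" in the JSON Request
API behaves like fq and goes through Solr's filterCache, while a filter
clause inside the bool query parser is a Lucene-level filter that is not
cached, which is the gap SOLR-13808 tracks.

    import requests

    url = "http://localhost:8983/solr/mycollection/select"

    # Cached: a top-level JSON "filter" is equivalent to fq and goes
    # through Solr's filterCache.
    cached = {"query": "*:*", "filter": ["class_s:meta"]}

    # Not cached: a filter clause inside the bool query parser becomes a
    # Lucene-level FILTER clause.
    uncached = {"query": {"bool": {"must": "*:*", "filter": "class_s:meta"}}}

    for body in (cached, uncached):
        print(requests.post(url, json=body).json()["response"]["numFound"])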

On Mon, Sep 30, 2019 at 4:26 PM Mikhail Khludnev  wrote:

> Jochen, right! Sorry I didn't get your point earlier.  {!bool filter=...}
> means a Lucene filter, not Solr's one. I suppose a {!bool cache=true} flag
> could easily be added, but so far there is no concise syntax for it. Don't
> hesitate to raise a JIRA for it.
>
> On Mon, Sep 30, 2019 at 3:18 PM Jochen Barth 
> wrote:
>
>> Here the corrected equivalent query, giving the same results (and still
>> much faster) as JsonQueryDSL:
>>
>> +filter(+((_query_:"{!graph from=parent_ids to=id }(meta_title_txt:muller
>> meta_name_txt:muller meta_subject_txt:muller meta_shelflocator_txt:muller)"
>> _query_:"{!graph from=id to=parent_ids  traversalFilter=\"class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal\"}(meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller text_watermark_ft:muller
>> text_catalogue_ft:muller text_index_ft:muller text_tei_ft:muller
>> text_abstract_ft:muller text_pdf_ft:muller)") ) +class_s:meta )
>> -_query_:"{!join to=id from=parent_ids}(filter(+((_query_:\"{!graph
>> from=parent_ids to=id }(meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller meta_shelflocator_txt:muller)\" _query_:\"{!graph
>> from=id to=parent_ids  traversalFilter=\\\"class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal\\\"}(meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller text_watermark_ft:muller
>> text_catalogue_ft:muller text_index_ft:muller text_tei_ft:muller
>> text_abstract_ft:muller text_pdf_ft:muller)\") ) +class_s:meta ))"
>>
>> I am querying the "core" of the above query (the string before
>> »-_query_:"{!join«) for faceting;
>> then the next query is the one above [ like »+(a) -{!join...}(a)« ].
>>
>> Now the second query runs in much less time, because the result of
>> term "a" is cached.
>>
>> Caching does not seem to work with {boolean=>{must=>"*:*", filter=>...}}.
>>
>> Kind regards,
>> Jochen
>>
>>
>>
>>
>>
>>
>> Am 30.09.19 um 11:02 schrieb Jochen Barth:
>>
>> Oops... JSON is returning 48652 docs, StandardQueryParser 827...
>>
>> Must check this.
>>
>> Sorry,
>>
>> Jochen
>>
>> Am 30.09.19 um 10:39 schrieb Jochen Barth:
>>
>> the *:* in JsonQueryDSL appears two times because »filter(...)« appears
>> two times in the StandardQueryParser version.
>>
>>
>>
>> I added some System.out.println calls in FastLRUCache, LRUCache, and
>> LFUCache; here is the logging with JsonQueryDSL (Solr 8.1.1):
>>
>> Fast-get +*:* #(+(([[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> [[meta_title_txt:muller meta_name_txt:muller text_ocr_ft:muller
>> text_heidicon_ft:muller text_watermark_ft:muller text_catalogue_ft:muller
>> text_index_ft:muller text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
>> +class_s:meta) valLen=null
>>
>> Fast-get DocValuesFieldExistsQuery [field=id] valLen=38
>>
>> Fast-get DocValuesFieldExistsQuery [field=parent_ids] valLen=38
>>
>> Fast-put +*:* #(+(([[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> [[meta_title_txt:muller meta_name_txt:muller text_ocr_ft:muller
>> text_heidicon_ft:muller text_watermark_ft:muller text_catalogue_ft:muller
>> text_index_ft:muller text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
>> +class_s:meta)
>>
>> ...
>>
>> Fast(LRUCache)-get is called only once, but it should have been called two
>> times:
>> the first time to find out that this filter is not already cached, and the
>> second time for the identical part of the subquery.
>>
>>
>> So now analyzing cache access with StandardQueryParser:
>> Fast-get +(+[[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> +[[meta_title_txt:muller meta_name_txt
>> :muller text_ocr_ft:muller text_heidicon_ft:muller
>> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
>> text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>>  -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false])
>> +class_s:meta)

Re: Solrcloud export all results sorted by score

2019-10-01 Thread Walter Underwood
I had to do this recently on a Solr Cloud cluster. I wanted to export all the 
IDs, but they weren’t stored as docvalues.

The fastest approach was to fetch all the IDs in one request. First, I make a 
request for zero rows to get the numFound. Then I fetch numFound+1000 (in case 
docs were added while I wasn’t looking) in one request.
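In code, that two-step approach looks roughly like this (a sketch in Python
with the requests library; the URL, query, and field name are placeholders):

    import requests

    url = "http://localhost:8983/solr/mycollection/select"
    params = {"q": "*:*", "rows": 0, "wt": "json"}

    # Step 1: a rows=0 request just to learn numFound.
    num_found = requests.get(url, params=params).json()["response"]["numFound"]

    # Step 2: fetch numFound plus some slack, in case docs were added meanwhile.
    params.update(rows=num_found + 1000, fl="id")
    docs = requests.get(url, params=params).json()["response"]["docs"]
    print(len(docs), "ids fetched")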

I also have a hairy shell script to do /export on each leader after parsing 
cluster status. That might be a little large to post to this list, but I can do 
it if there is general interest.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 1, 2019, at 9:14 AM, Erick Erickson  wrote:
> 
> First, thanks for taking the time to ask a question with enough supporting 
> details that I can hope to be able to answer in one exchange ;). It’s a 
> pleasure to see.
> 
> Second, NP with asking on Stack Overflow, they have some excellent answers 
> there. But you’re right, this list gets more Solr-centered eyeballs.
> 
> On to your question. I think the best answer was that “/export wasn’t 
> designed to deal with scores”, which you’ll find disappointing. 
> 
> You could use the Streaming “search” expression (using qt=/select or just 
> leave qt out) but that’ll sort all of the docs you’re exporting into a huge 
> list, which may perform worse than CursorMark even if it doesn’t blow up 
> memory.
> 
> The root of this problem is that export can sort in batches since the values 
> it’s sorting on are contained in each document, so it can iterate in batches, 
> send them out, then iterate again on the remaining documents.
> 
> Score, since it’s dynamic, can’t do that. Solr has to score _all_ the docs to 
> know where a doc lands in the final set relative to any other doc, so if it 
> were going to work it’d have to have enough memory to hold the scores of all 
> the docs in an ordered list, which is very expensive. Conceptually this is an 
> ordered list up to maxDoc long. Not only does there have to be enough memory 
> to hold the entire list, every doc has to be inserted individually which can 
> kill performance. This is the “deep paging” problem.
> 
> In the usual case of returning, say, 20 docs, the sorted list only has to be 
> 20 long, higher scoring docs evict lower scoring docs.
> 
> So I think CursorMark is your best bet.
> 
> Best,
> Erick
> 
>> On Oct 1, 2019, at 3:59 AM, Edward Turner  wrote:
>> 
>> Hi all,
>> 
>> As far as I understand, SolrCloud currently does not allow the use of
>> sorting by the pseudofield, score in the /export request handler (i.e., get
>> the results in relevancy order). If we do attempt this, we get an
>> exception, "org.apache.solr.search.SyntaxError: Scoring is not currently
>> supported with xsort". We could use Solr's cursorMark, but this takes a
>> very long time ...
>> 
>> Exporting results does work, however, when exporting result sets by a
>> specific document field that has docValues set to true.
>> 
>> Question:
>> Does anyone know if/when it will be possible to sort by score in the
>> /export handler?
>> 
>> Research on the problem:
>> We've seen https://issues.apache.org/jira/browse/SOLR-5244 and
>> https://issues.apache.org/jira/browse/SOLR-8664, which are related to this
>> issue, but don't fix it. Maybe I've missed a more relevant issue?
>> 
>> Our use-case We are using Solrcloud in our team and it's added a huge
>> amount of value to our users.
>> 
>> We show a table of search results ordered by score (relevancy) that was
>> obtained from sending a query to the standard /select handler. We're
>> working in the life-sciences domain and it is common for our result sets to
>> contain many millions of results (unfortunately). After users browse their
>> results, they then may want to download the results that they see, to do
>> some post-processing. However, to do this, such that the results appear in
>> the order that the user originally saw them, we'd need to be able to export
>> results based on score/relevancy.
>> 
>> Any suggestions or advice on this would be greatly appreciated!
>> 
>> Many thanks!
>> 
>> Edd
>> 
>> PS. apologies for posting also on Stackoverflow (
>> https://stackoverflow.com/questions/58167152/solrcloud-export-all-results-sorted-by-score)
>> --
>> I only discovered the Solr mailing-list afterwards and thought it probably
>> better to reach out directly to Solr's people (I can share any answer from
>> this forum on there retrospectively).
> 



Re: Solrcloud export all results sorted by score

2019-10-01 Thread Erick Erickson
First, thanks for taking the time to ask a question with enough supporting 
details that I can hope to be able to answer in one exchange ;). It’s a 
pleasure to see.

Second, NP with asking on Stack Overflow, they have some excellent answers 
there. But you’re right, this list gets more Solr-centered eyeballs.

On to your question. I think the best answer was that “/export wasn’t designed 
to deal with scores”, which you’ll find disappointing. 

You could use the Streaming “search” expression (using qt=/select or just leave 
qt out) but that’ll sort all of the docs you’re exporting into a huge list, 
which may perform worse than CursorMark even if it doesn’t blow up memory.

The root of this problem is that export can sort in batches since the values 
it’s sorting on are contained in each document, so it can iterate in batches, 
send them out, then iterate again on the remaining documents.

Score, since it’s dynamic, can’t do that. Solr has to score _all_ the docs to 
know where a doc lands in the final set relative to any other doc, so if it 
were going to work it’d have to have enough memory to hold the scores of all 
the docs in an ordered list, which is very expensive. Conceptually this is an 
ordered list up to maxDoc long. Not only does there have to be enough memory to 
hold the entire list, every doc has to be inserted individually which can kill 
performance. This is the “deep paging” problem.

In the usual case of returning, say, 20 docs, the sorted list only has to be 20 
long, higher scoring docs evict lower scoring docs.

So I think CursorMark is your best bet.

Best,
Erick
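In practice, a cursorMark export sorted by score looks roughly like this — a
sketch in Python with the requests library that pages through /select; the
URL and query are placeholders, and it assumes the collection's uniqueKey is
"id", since cursorMark requires the sort to end with a uniqueKey tiebreak:

    import requests

    url = "http://localhost:8983/solr/mycollection/select"
    params = {
        "q": "your query here",
        "sort": "score desc, id asc",  # uniqueKey tiebreak is mandatory
        "rows": 1000,
        "fl": "id,score",
        "cursorMark": "*",
    }
    while True:
        rsp = requests.get(url, params=params).json()
        for doc in rsp["response"]["docs"]:
            pass  # write each doc to the export here
        next_cursor = rsp["nextCursorMark"]
        if next_cursor == params["cursorMark"]:
            break  # cursor stopped advancing: no more results
        params["cursorMark"] = next_cursor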

> On Oct 1, 2019, at 3:59 AM, Edward Turner  wrote:
> 
> Hi all,
> 
> As far as I understand, SolrCloud currently does not allow the use of
> sorting by the pseudofield, score in the /export request handler (i.e., get
> the results in relevancy order). If we do attempt this, we get an
> exception, "org.apache.solr.search.SyntaxError: Scoring is not currently
> supported with xsort". We could use Solr's cursorMark, but this takes a
> very long time ...
> 
> Exporting results does work, however, when exporting result sets by a
> specific document field that has docValues set to true.
> 
> Question:
> Does anyone know if/when it will be possible to sort by score in the
> /export handler?
> 
> Research on the problem:
> We've seen https://issues.apache.org/jira/browse/SOLR-5244 and
> https://issues.apache.org/jira/browse/SOLR-8664, which are related to this
> issue, but don't fix it. Maybe I've missed a more relevant issue?
> 
> Our use-case We are using Solrcloud in our team and it's added a huge
> amount of value to our users.
> 
> We show a table of search results ordered by score (relevancy) that was
> obtained from sending a query to the standard /select handler. We're
> working in the life-sciences domain and it is common for our result sets to
> contain many millions of results (unfortunately). After users browse their
> results, they then may want to download the results that they see, to do
> some post-processing. However, to do this, such that the results appear in
> the order that the user originally saw them, we'd need to be able to export
> results based on score/relevancy.
> 
> Any suggestions or advice on this would be greatly appreciated!
> 
> Many thanks!
> 
> Edd
> 
> PS. apologies for posting also on Stackoverflow (
> https://stackoverflow.com/questions/58167152/solrcloud-export-all-results-sorted-by-score)
> --
> I only discovered the Solr mailing-list afterwards and thought it probably
> better to reach out directly to Solr's people (I can share any answer from
> this forum on there retrospectively).



Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Erick Erickson
bq. My customer requested me to achieve 1000 qps with a single Solr.

You need to talk to your customer then and adjust expectations. You say you get 
60 QPS with 40% CPU utilization. So let’s say QPS is perfectly linear to CPU 
utilization and you can get CPU utilization to run at 100%.

In that case you can expect a QPS rate of 2.5 times what you get now, or 150 
QPS. Note that I don’t expect it to be perfectly linear so I don’t think you’ll 
get 150 QPS although you may get close. And that will be further adversely 
affected if there’s active indexing going on.
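As a back-of-the-envelope check (a sketch that assumes, optimistically, that
QPS scales linearly with CPU utilization):

    observed_qps = 60          # measured throughput
    cpu_utilization = 0.40     # measured CPU usage during the test
    ceiling_qps = observed_qps / cpu_utilization  # = 150 QPS at 100% CPU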

So before worrying about hitting the 1,000 QPS rate, the customer needs to 
understand that it won't be achievable on one Solr instance; until they 
decide to have more Solr instances, trying to hit 100% CPU utilization is a 
waste of time.

Best,
Erick

> On Oct 1, 2019, at 6:38 AM, Yasufumi Mizoguchi  wrote:
> 
> It is difficult to answer that for me.
> 
> My customer requested me to achieve 1000 qps with single Solr.
> 
> Thanks,
> Yasufumi.
> 
> 2019年10月1日(火) 14:59 Jörn Franke :
> 
>> Why do you need 1000 qps?
>> 
>> 
>>> Am 30.09.2019 um 07:45 schrieb Yasufumi Mizoguchi <
>> yasufumi0...@gmail.com>:
>>> 
>>> Hi,
>>> 
>>> I am trying some tests to confirm if single Solr instance can perform
>> over
>>> 1000 queries per second(!).
>>> 
>>> But now, although CPU usage is 40% or so and iowait is almost 0%,
>>> throughput does not increase over 60 queries per second.
>>> 
>>> I think there are some bottlenecks around Kernel, JVM, or Solr settings.
>>> 
>>> The values we already checked and configured are followings.
>>> 
>>> * Kernel:
>>> file descriptor
>>> net.ipv4.tcp_max_syn_backlog
>>> net.ipv4.tcp_syncookies
>>> net.core.somaxconn
>>> net.core.rmem_max
>>> net.core.wmem_max
>>> net.ipv4.tcp_rmem
>>> net.ipv4.tcp_wmem
>>> 
>>> * JVM:
>>> Heap [ -> 32GB]
>>> G1GC settings
>>> 
>>> * Solr:
>>> (Jetty) MaxThreads [ -> 2]
>>> 
>>> 
>>> And the other info is as follows.
>>> 
>>> CPU : 16 cores
>>> RAM : 128 GB
>>> Disk : SSD 500GB
>>> NIC : 10Gbps(maybe)
>>> OS : Ubuntu 14.04
>>> JVM : OpenJDK 1.8.0u191
>>> Solr : 6.2.1
>>> Index size : about 60GB
>>> 
>>> Any insights will be appreciated.
>>> 
>>> Thanks and regards,
>>> Yasufumi.
>> 



Using policy/preference/... to keep certain collections split?

2019-10-01 Thread Koen De Groote
Hello,

I'm trying to ensure that no replica of collection A exists on the same
host as a replica of collection B.

Using Solr 7.6.

Both are, or will become, large collections, and load on one should not
impact the other in any way. They will both be queried heavily,
simultaneously.

So I'm looking for a solution to that. Ideally, after the initial setup, it
would suffice to just add Solr hosts and let ZooKeeper know about the new
host(s), and the cluster would balance itself.

I've been going over the following pages:

https://lucene.apache.org/solr/guide/7_6/migrate-to-policy-rule.html
https://lucene.apache.org/solr/guide/7_6/solrcloud-autoscaling-policy-preferences.html
https://lucene.apache.org/solr/guide/7_6/solrcloud-autoscaling-api.html

But I can't seem to find what I need. Or I'm just not understanding it
properly.

The collections already exist. I'm trying to find a command I can run, with
a specification in it, so that once Solr receives it, it starts re-arranging
things on its own.

It's probably just the case that I'm not understanding what's required to
orchestrate this properly.

Thanks in advance,
Koen De Groote


Re: Re: Need urgent help with Solr spatial search using SpatialRecursivePrefixTreeFieldType

2019-10-01 Thread anushka gupta
Thanks, 

Could you please help me in combining two geofilt fqs? The following gives an
error; it treats ")" as part of the d parameter and complains that 'd=80)'
is not a valid param:

({!geofilt}&sfield=adminLatLon&pt=33.0198431,-96.6988856&d=80)+OR+({!geofilt}&sfield=adminLatLon&pt=50.2171726,8.265894&d=80)



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Toke Eskildsen
On Tue, 2019-10-01 at 19:08 +0900, Yasufumi Mizoguchi wrote:
> * The number of Firing hosts : 6
> * [Each host]ThreadGroup.num_threads : 200
> * [Each host]ThreadGroup.ramp_time : 600
> * [Each host]ThreadGroup.duration: 1800
> * [Each host]ThreadGroup.delay: 0
> * The number of sample queries: 20,000,000
> 
> And we confirmed that Jetty threads was increasing and reached the
> limit (1).
> Therefore, we raised MaxThreads value.

You have a server with 16 CPU cores and you're running more than 10K
threads on it: that's a sure way to get lower throughput, as the system
struggles to switch between the many threads. You would probably get
higher throughput by limiting the number of concurrent threads (for
once it seems that Erick and I disagree).

It's easy to try: Just set num_threads for each ThreadGroup to 20 and
see if your QPS rises above the 60 you have now.

Still, there's something off: 6*200 threads = 1,200. That should not
blow Jetty thread allocation beyond 10K. Maybe your index is sharded?
That would explain the extra thread allocation. If so, how many shards do
you have, and have you tried running with a single shard? Single-shard
indexes maximize throughput at the possible cost of latency, so that
seems fitting for your requirements.

- Toke Eskildsen, Royal Danish Library




Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Yasufumi Mizoguchi
Hi Paras,

Thank you for your advice.

I will confirm JMeter's settings in addition to JVM options.

Yes, we are using documentCache, and *after* the load test finishes we will
comment it out.
(By customer request, we cannot update the cache settings till the test
ends...)

Thanks,
Yasufumi

2019年10月1日(火) 19:38 Paras Lehana :

> Hi Yasufumi,
>
> Followings are current load test set up.
>
>
> Did you try decreasing ramp_time and increasing num_threads or the number of
> firing hosts? Out of 1800 secs, you are giving 200 threads a maximum of 600
> seconds to get ready. I wouldn't use such a long ramp time when I want to test
> parallel requests. In your case, the answer could be as simple as the
> testing threads themselves not being capable of requesting more than 60 qps. Do
> confirm this bottleneck by playing with the JMeter values.
>
> Cache hit rate is not checked now. Those small cache sizes are intended, but
> > I want to change them.
>
>
> I suggest you check the cache stats in Solr Dashboard > Select Core >
> Plugins/Stats > Cache. I'm assuming that all of your 20M queries are unique,
> though they could still use documentCache. Why not just comment out the cache
> settings in solrconfig.xml?
>
>
>
> On Tue, 1 Oct 2019 at 15:39, Yasufumi Mizoguchi 
> wrote:
>
> > Thank you for replying me.
> >
> > Followings are current load test set up.
> >
> > * Load test program : JUnit
> > * The number of Firing hosts : 6
> > * [Each host]ThreadGroup.num_threads : 200
> > * [Each host]ThreadGroup.ramp_time : 600
> > * [Each host]ThreadGroup.duration: 1800
> > * [Each host]ThreadGroup.delay: 0
> > * The number of sample queries: 20,000,000
> >
> > And we confirmed that Jetty threads was increasing and reached the limit
> > (1).
> > Therefore, we raised MaxThreads value.
> >
> > I checked GC logs and found that it happened no major GC, and almost all
> > minor GC were finished by 200ms.
> >
> > Cache hit rate is not checked now, but I think those are extremely low
> all
> > kinds of cache.
> > Because the number of sample query is big(20,000,000) compared to
> > queryResult and filter cache size(both 512) and there are few duplication
> > in fq and q parameter.
> > Those small cache size are intended, but I want to change those
> >
> > Thanks,
> > Yasufumi
> >
> >
> >
> > 2019年9月30日(月) 20:49 Erick Erickson :
> >
> > > The most basic question is how you are load-testing it? Assuming you
> have
> > > some kind of client firing queries at Solr, keep adding threads so Solr
> > is
> > > handling more and more queries in parallel. If you start to see the
> > > response time at the client get longer _and_ the  QTime in Solr’s
> > response
> > > stays about the same, then the queries are queueing up and you need to
> > see
> > > about increasing the Jetty threads handling queries.
> > >
> > > Second is whether you’re hitting GC pauses, look at the GC logs,
> > > especially for “stop the world” pauses. This is unlikely as you’re
> still
> > > getting 60 qps, but something to check.
> > >
> > > Setting your heap to 31G is good advice, but it won’t dramatically
> > > increase the throughput I’d guess.
> > >
> > > If your I/O isn’t very high, then your index is mostly
> memory-resident. A
> > > general bit of tuning advice is to _reduce_ the heap size, leaving OS
> > > memory for the index. See Uwe’s blog:
> > >
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> > .
> > > There’s a sweet spot between having too much heap and too little, and
> > > unfortunately you have to experiment to find out.
> > >
> > > But given the numbers you’re talking about, you won’t be getting 1,000
> > QPS
> > > on a single box and you’ll have to scale out with replicas to hit that
> > > number. Getting all the QPS you can out of the box is important, of
> > course.
> > > Do be careful to use enough different queries that you don’t get them
> > from
> > > the queryResultCache. I had one client who was thrilled they were
> getting
> > > 3ms response times…. by firing the same query over and over and hitting
> > the
> > > queryResultCache 99.% of the time ;).
> > >
> > > Best,
> > > Erick
> > >
> > > > On Sep 30, 2019, at 4:28 AM, Yasufumi Mizoguchi <
> > yasufumi0...@gmail.com>
> > > wrote:
> > > >
> > > > Hi, Ere.
> > > >
> > > > Thank you for valuable feedback.
> > > > I will try Xmx31G and Xms31G instead of current ones.
> > > >
> > > > Thanks and Regards,
> > > > Yasufumi.
> > > >
> > > > 2019年9月30日(月) 17:19 Ere Maijala :
> > > >
> > > >> Just a side note: -Xmx32G is really bad for performance as it forces
> > > >> Java to use non-compressed pointers. You'll actually get better
> > results
> > > >> with -Xmx31G. For more information, see e.g.
> > > >>
> > > >>
> > >
> >
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
> > > >>
> > > >> Regards,
> > > >> Ere
> > > >>
> > > >> Yasufumi Mizoguchi kirjoitti 30.9.2019 klo 11.05:
> > > >>> Hi, Deepak.
> > > >>> Thank you for replying me.
> > > >>>
> > > >>> 

Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Yasufumi Mizoguchi
It is difficult to answer that for me.

My customer requested that I achieve 1000 qps with a single Solr instance.

Thanks,
Yasufumi.

2019年10月1日(火) 14:59 Jörn Franke :

> Why do you need 1000 qps?
>
>
> > Am 30.09.2019 um 07:45 schrieb Yasufumi Mizoguchi <
> yasufumi0...@gmail.com>:
> >
> > Hi,
> >
> > I am trying some tests to confirm if single Solr instance can perform
> over
> > 1000 queries per second(!).
> >
> > But now, although CPU usage is 40% or so and iowait is almost 0%,
> > throughput does not increase over 60 queries per second.
> >
> > I think there are some bottlenecks around Kernel, JVM, or Solr settings.
> >
> > The values we already checked and configured are followings.
> >
> > * Kernel:
> > file descriptor
> > net.ipv4.tcp_max_syn_backlog
> > net.ipv4.tcp_syncookies
> > net.core.somaxconn
> > net.core.rmem_max
> > net.core.wmem_max
> > net.ipv4.tcp_rmem
> > net.ipv4.tcp_wmem
> >
> > * JVM:
> > Heap [ -> 32GB]
> > G1GC settings
> >
> > * Solr:
> > (Jetty) MaxThreads [ -> 2]
> >
> >
> > And the other info is as follows.
> >
> > CPU : 16 cores
> > RAM : 128 GB
> > Disk : SSD 500GB
> > NIC : 10Gbps(maybe)
> > OS : Ubuntu 14.04
> > JVM : OpenJDK 1.8.0u191
> > Solr : 6.2.1
> > Index size : about 60GB
> >
> > Any insights will be appreciated.
> >
> > Thanks and regards,
> > Yasufumi.
>


Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Paras Lehana
Hi Yasufumi,

Followings are current load test set up.


Did you try decreasing ramp_time and increasing num_threads or the number of
firing hosts? Out of 1800 secs, you are giving 200 threads a maximum of 600
seconds to get ready. I wouldn't use such a long ramp time when I want to test
parallel requests. In your case, the answer could be as simple as the
testing threads themselves not being capable of requesting more than 60 qps. Do
confirm this bottleneck by playing with the JMeter values.

Cache hit rate is not checked now. Those small cache sizes are intended, but
> I want to change them.


I suggest you check the cache stats in Solr Dashboard > Select Core >
Plugins/Stats > Cache. I'm assuming that all of your 20M queries are unique,
though they could still use documentCache. Why not just comment out the cache
settings in solrconfig.xml?



On Tue, 1 Oct 2019 at 15:39, Yasufumi Mizoguchi 
wrote:

> Thank you for replying me.
>
> Followings are current load test set up.
>
> * Load test program : JUnit
> * The number of Firing hosts : 6
> * [Each host]ThreadGroup.num_threads : 200
> * [Each host]ThreadGroup.ramp_time : 600
> * [Each host]ThreadGroup.duration: 1800
> * [Each host]ThreadGroup.delay: 0
> * The number of sample queries: 20,000,000
>
> And we confirmed that Jetty threads was increasing and reached the limit
> (1).
> Therefore, we raised MaxThreads value.
>
> I checked GC logs and found that it happened no major GC, and almost all
> minor GC were finished by 200ms.
>
> Cache hit rate is not checked now, but I think it is extremely low for all
> kinds of cache,
> because the number of sample queries is big (20,000,000) compared to
> the queryResult and filter cache sizes (both 512) and there is little
> duplication in the fq and q parameters.
> Those small cache sizes are intended, but I want to change them.
>
> Thanks,
> Yasufumi
>
>
>
> 2019年9月30日(月) 20:49 Erick Erickson :
>
> > The most basic question is how you are load-testing it? Assuming you have
> > some kind of client firing queries at Solr, keep adding threads so Solr
> is
> > handling more and more queries in parallel. If you start to see the
> > response time at the client get longer _and_ the  QTime in Solr’s
> response
> > stays about the same, then the queries are queueing up and you need to
> see
> > about increasing the Jetty threads handling queries.
> >
> > Second is whether you’re hitting GC pauses, look at the GC logs,
> > especially for “stop the world” pauses. This is unlikely as you’re still
> > getting 60 qps, but something to check.
> >
> > Setting your heap to 31G is good advice, but it won’t dramatically
> > increase the throughput I’d guess.
> >
> > If your I/O isn’t very high, then your index is mostly memory-resident. A
> > general bit of tuning advice is to _reduce_ the heap size, leaving OS
> > memory for the index. See Uwe’s blog:
> > https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> .
> > There’s a sweet spot between having too much heap and too little, and
> > unfortunately you have to experiment to find out.
> >
> > But given the numbers you’re talking about, you won’t be getting 1,000
> QPS
> > on a single box and you’ll have to scale out with replicas to hit that
> > number. Getting all the QPS you can out of the box is important, of
> course.
> > Do be careful to use enough different queries that you don’t get them
> from
> > the queryResultCache. I had one client who was thrilled they were getting
> > 3ms response times…. by firing the same query over and over and hitting
> the
> > queryResultCache 99.% of the time ;).
> >
> > Best,
> > Erick
> >
> > > On Sep 30, 2019, at 4:28 AM, Yasufumi Mizoguchi <
> yasufumi0...@gmail.com>
> > wrote:
> > >
> > > Hi, Ere.
> > >
> > > Thank you for valuable feedback.
> > > I will try Xmx31G and Xms31G instead of current ones.
> > >
> > > Thanks and Regards,
> > > Yasufumi.
> > >
> > > 2019年9月30日(月) 17:19 Ere Maijala :
> > >
> > >> Just a side note: -Xmx32G is really bad for performance as it forces
> > >> Java to use non-compressed pointers. You'll actually get better
> results
> > >> with -Xmx31G. For more information, see e.g.
> > >>
> > >>
> >
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
> > >>
> > >> Regards,
> > >> Ere
> > >>
> > >> Yasufumi Mizoguchi kirjoitti 30.9.2019 klo 11.05:
> > >>> Hi, Deepak.
> > >>> Thank you for replying me.
> > >>>
> > >>> JVM settings from solr.in.sh file are as follows. (Sorry, I could
> not
> > >> show
> > >>> all due to our policy)
> > >>>
> > >>> -verbose:gc
> > >>> -XX:+PrintHeapAtGC
> > >>> -XX:+PrintGCDetails
> > >>> -XX:+PrintGCDateStamps
> > >>> -XX:+PrintGCTimeStamps
> > >>> -XX:+PrintTenuringDistribution
> > >>> -XX:+PrintGCApplicationStoppedTime
> > >>> -Dcom.sun.management.jmxremote.ssl=false
> > >>> -Dcom.sun.management.jmxremote.authenticate=false
> > >>> -Dcom.sun.management.jmxremote.port=18983
> > >>> -XX:OnOutOfMemoryError=/home/solr/solr-6.2.1/bin/oom_solr.sh
> > >>> -XX:NewSize=128m

Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Yasufumi Mizoguchi
Thank you for the reply.

I will try to resize NewRatio.

Thanks,
Yasufumi.

2019年10月1日(火) 11:19 Deepak Goel :

> Hello
>
> Can you please try increasing 'new size' and 'max new size' to 1GB+?
>
> Deepak
>
> On Mon, 30 Sep 2019, 13:35 Yasufumi Mizoguchi, 
> wrote:
>
> > Hi, Deepak.
> > Thank you for replying me.
> >
> > JVM settings from solr.in.sh file are as follows. (Sorry, I could not
> show
> > all due to our policy)
> >
> > -verbose:gc
> > -XX:+PrintHeapAtGC
> > -XX:+PrintGCDetails
> > -XX:+PrintGCDateStamps
> > -XX:+PrintGCTimeStamps
> > -XX:+PrintTenuringDistribution
> > -XX:+PrintGCApplicationStoppedTime
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Dcom.sun.management.jmxremote.authenticate=false
> > -Dcom.sun.management.jmxremote.port=18983
> > -XX:OnOutOfMemoryError=/home/solr/solr-6.2.1/bin/oom_solr.sh
> > -XX:NewSize=128m
> > -XX:MaxNewSize=128m
> > -XX:+UseG1GC
> > -XX:+PerfDisableSharedMem
> > -XX:+ParallelRefProcEnabled
> > -XX:G1HeapRegionSize=8m
> > -XX:MaxGCPauseMillis=250
> > -XX:InitiatingHeapOccupancyPercent=75
> > -XX:+UseLargePages
> > -XX:+AggressiveOpts
> > -Xmx32G
> > -Xms32G
> > -Xss256k
> >
> >
> > Thanks & Regards
> > Yasufumi.
> >
> > 2019年9月30日(月) 16:12 Deepak Goel :
> >
> > > Hello
> > >
> > > Can you please share the JVM heap settings in detail?
> > >
> > > Deepak
> > >
> > > On Mon, 30 Sep 2019, 11:15 Yasufumi Mizoguchi,  >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying some tests to confirm if single Solr instance can perform
> > > over
> > > > 1000 queries per second(!).
> > > >
> > > > But now, although CPU usage is 40% or so and iowait is almost 0%,
> > > > throughput does not increase over 60 queries per second.
> > > >
> > > > I think there are some bottlenecks around Kernel, JVM, or Solr
> > settings.
> > > >
> > > > The values we already checked and configured are followings.
> > > >
> > > > * Kernel:
> > > > file descriptor
> > > > net.ipv4.tcp_max_syn_backlog
> > > > net.ipv4.tcp_syncookies
> > > > net.core.somaxconn
> > > > net.core.rmem_max
> > > > net.core.wmem_max
> > > > net.ipv4.tcp_rmem
> > > > net.ipv4.tcp_wmem
> > > >
> > > > * JVM:
> > > > Heap [ -> 32GB]
> > > > G1GC settings
> > > >
> > > > * Solr:
> > > > (Jetty) MaxThreads [ -> 2]
> > > >
> > > >
> > > > And the other info is as follows.
> > > >
> > > > CPU : 16 cores
> > > > RAM : 128 GB
> > > > Disk : SSD 500GB
> > > > NIC : 10Gbps(maybe)
> > > > OS : Ubuntu 14.04
> > > > JVM : OpenJDK 1.8.0u191
> > > > Solr : 6.2.1
> > > > Index size : about 60GB
> > > >
> > > > Any insights will be appreciated.
> > > >
> > > > Thanks and regards,
> > > > Yasufumi.
> > > >
> > >
> >
>


Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Yasufumi Mizoguchi
Thank you for your response.

Right now we have no JVM monitoring, but checking the GC logs, I found no
major GC during the load test.
As you say, the heap size might be too large, and I am planning to reduce it.

Thanks,
Yasufumi

2019年9月30日(月) 23:19 Walter Underwood :

> 31G is still a very large heap. We use 8G for all of our different
> clusters.
>
> Do you have JVM monitoring? Look at the heap used after a major GC. Use
> that number, plus some extra, for the heap size.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Sep 30, 2019, at 1:28 AM, Yasufumi Mizoguchi 
> wrote:
> >
> > Hi, Ere.
> >
> > Thank you for valuable feedback.
> > I will try Xmx31G and Xms31G instead of current ones.
> >
> > Thanks and Regards,
> > Yasufumi.
> >
> > 2019年9月30日(月) 17:19 Ere Maijala :
> >
> >> Just a side note: -Xmx32G is really bad for performance as it forces
> >> Java to use non-compressed pointers. You'll actually get better results
> >> with -Xmx31G. For more information, see e.g.
> >>
> >>
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
> >>
> >> Regards,
> >> Ere
> >>
> >> Yasufumi Mizoguchi kirjoitti 30.9.2019 klo 11.05:
> >>> Hi, Deepak.
> >>> Thank you for replying me.
> >>>
> >>> JVM settings from solr.in.sh file are as follows. (Sorry, I could not
> >> show
> >>> all due to our policy)
> >>>
> >>> -verbose:gc
> >>> -XX:+PrintHeapAtGC
> >>> -XX:+PrintGCDetails
> >>> -XX:+PrintGCDateStamps
> >>> -XX:+PrintGCTimeStamps
> >>> -XX:+PrintTenuringDistribution
> >>> -XX:+PrintGCApplicationStoppedTime
> >>> -Dcom.sun.management.jmxremote.ssl=false
> >>> -Dcom.sun.management.jmxremote.authenticate=false
> >>> -Dcom.sun.management.jmxremote.port=18983
> >>> -XX:OnOutOfMemoryError=/home/solr/solr-6.2.1/bin/oom_solr.sh
> >>> -XX:NewSize=128m
> >>> -XX:MaxNewSize=128m
> >>> -XX:+UseG1GC
> >>> -XX:+PerfDisableSharedMem
> >>> -XX:+ParallelRefProcEnabled
> >>> -XX:G1HeapRegionSize=8m
> >>> -XX:MaxGCPauseMillis=250
> >>> -XX:InitiatingHeapOccupancyPercent=75
> >>> -XX:+UseLargePages
> >>> -XX:+AggressiveOpts
> >>> -Xmx32G
> >>> -Xms32G
> >>> -Xss256k
> >>>
> >>>
> >>> Thanks & Regards
> >>> Yasufumi.
> >>>
> >>> 2019年9月30日(月) 16:12 Deepak Goel :
> >>>
>  Hello
> 
>  Can you please share the JVM heap settings in detail?
> 
>  Deepak
> 
>  On Mon, 30 Sep 2019, 11:15 Yasufumi Mizoguchi, <
> yasufumi0...@gmail.com>
>  wrote:
> 
> > Hi,
> >
> > I am trying some tests to confirm if single Solr instance can perform
>  over
> > 1000 queries per second(!).
> >
> > But now, although CPU usage is 40% or so and iowait is almost 0%,
> > throughput does not increase over 60 queries per second.
> >
> > I think there are some bottlenecks around Kernel, JVM, or Solr
> >> settings.
> >
> > The values we already checked and configured are followings.
> >
> > * Kernel:
> > file descriptor
> > net.ipv4.tcp_max_syn_backlog
> > net.ipv4.tcp_syncookies
> > net.core.somaxconn
> > net.core.rmem_max
> > net.core.wmem_max
> > net.ipv4.tcp_rmem
> > net.ipv4.tcp_wmem
> >
> > * JVM:
> > Heap [ -> 32GB]
> > G1GC settings
> >
> > * Solr:
> > (Jetty) MaxThreads [ -> 2]
> >
> >
> > And the other info is as follows.
> >
> > CPU : 16 cores
> > RAM : 128 GB
> > Disk : SSD 500GB
> > NIC : 10Gbps(maybe)
> > OS : Ubuntu 14.04
> > JVM : OpenJDK 1.8.0u191
> > Solr : 6.2.1
> > Index size : about 60GB
> >
> > Any insights will be appreciated.
> >
> > Thanks and regards,
> > Yasufumi.
> >
> 
> >>>
> >>
> >> --
> >> Ere Maijala
> >> Kansalliskirjasto / The National Library of Finland
> >>
>
>


Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Yasufumi Mizoguchi
Ah, sorry.

Not JUnit, we use JMeter.

Thanks,
Yasufumi

2019年10月1日(火) 19:08 Yasufumi Mizoguchi :

> Thank you for replying me.
>
> Followings are current load test set up.
>
> * Load test program : JUnit
> * The number of Firing hosts : 6
> * [Each host]ThreadGroup.num_threads : 200
> * [Each host]ThreadGroup.ramp_time : 600
> * [Each host]ThreadGroup.duration: 1800
> * [Each host]ThreadGroup.delay: 0
> * The number of sample queries: 20,000,000
>
> And we confirmed that Jetty threads was increasing and reached the limit
> (1).
> Therefore, we raised MaxThreads value.
>
> I checked GC logs and found that it happened no major GC, and almost all
> minor GC were finished by 200ms.
>
> Cache hit rate is not checked now, but I think those are extremely low all
> kinds of cache.
> Because the number of sample query is big(20,000,000) compared to
> queryResult and filter cache size(both 512) and there are few duplication
> in fq and q parameter.
> Those small cache size are intended, but I want to change those
>
> Thanks,
> Yasufumi
>
>
>
> 2019年9月30日(月) 20:49 Erick Erickson :
>
>> The most basic question is how you are load-testing it? Assuming you have
>> some kind of client firing queries at Solr, keep adding threads so Solr is
>> handling more and more queries in parallel. If you start to see the
>> response time at the client get longer _and_ the  QTime in Solr’s response
>> stays about the same, then the queries are queueing up and you need to see
>> about increasing the Jetty threads handling queries.
>>
>> Second is whether you’re hitting GC pauses, look at the GC logs,
>> especially for “stop the world” pauses. This is unlikely as you’re still
>> getting 60 qps, but something to check.
>>
>> Setting your heap to 31G is good advice, but it won’t dramatically
>> increase the throughput I’d guess.
>>
>> If your I/O isn’t very high, then your index is mostly memory-resident. A
>> general bit of tuning advice is to _reduce_ the heap size, leaving OS
>> memory for the index. See Uwe’s blog:
>> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html.
>> There’s a sweet spot between having too much heap and too little, and
>> unfortunately you have to experiment to find out.
>>
>> But given the numbers you’re talking about, you won’t be getting 1,000
>> QPS on a single box and you’ll have to scale out with replicas to hit that
>> number. Getting all the QPS you can out of the box is important, of course.
>> Do be careful to use enough different queries that you don’t get them from
>> the queryResultCache. I had one client who was thrilled they were getting
>> 3ms response times…. by firing the same query over and over and hitting the
>> queryResultCache 99.% of the time ;).
>>
>> Best,
>> Erick
>>
>> > On Sep 30, 2019, at 4:28 AM, Yasufumi Mizoguchi 
>> wrote:
>> >
>> > Hi, Ere.
>> >
>> > Thank you for valuable feedback.
>> > I will try Xmx31G and Xms31G instead of current ones.
>> >
>> > Thanks and Regards,
>> > Yasufumi.
>> >
>> > 2019年9月30日(月) 17:19 Ere Maijala :
>> >
>> >> Just a side note: -Xmx32G is really bad for performance as it forces
>> >> Java to use non-compressed pointers. You'll actually get better results
>> >> with -Xmx31G. For more information, see e.g.
>> >>
>> >>
>> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
>> >>
>> >> Regards,
>> >> Ere
>> >>
>> >> Yasufumi Mizoguchi kirjoitti 30.9.2019 klo 11.05:
>> >>> Hi, Deepak.
>> >>> Thank you for replying me.
>> >>>
>> >>> JVM settings from solr.in.sh file are as follows. (Sorry, I could not
>> >> show
>> >>> all due to our policy)
>> >>>
>> >>> -verbose:gc
>> >>> -XX:+PrintHeapAtGC
>> >>> -XX:+PrintGCDetails
>> >>> -XX:+PrintGCDateStamps
>> >>> -XX:+PrintGCTimeStamps
>> >>> -XX:+PrintTenuringDistribution
>> >>> -XX:+PrintGCApplicationStoppedTime
>> >>> -Dcom.sun.management.jmxremote.ssl=false
>> >>> -Dcom.sun.management.jmxremote.authenticate=false
>> >>> -Dcom.sun.management.jmxremote.port=18983
>> >>> -XX:OnOutOfMemoryError=/home/solr/solr-6.2.1/bin/oom_solr.sh
>> >>> -XX:NewSize=128m
>> >>> -XX:MaxNewSize=128m
>> >>> -XX:+UseG1GC
>> >>> -XX:+PerfDisableSharedMem
>> >>> -XX:+ParallelRefProcEnabled
>> >>> -XX:G1HeapRegionSize=8m
>> >>> -XX:MaxGCPauseMillis=250
>> >>> -XX:InitiatingHeapOccupancyPercent=75
>> >>> -XX:+UseLargePages
>> >>> -XX:+AggressiveOpts
>> >>> -Xmx32G
>> >>> -Xms32G
>> >>> -Xss256k
>> >>>
>> >>>
>> >>> Thanks & Regards
>> >>> Yasufumi.
>> >>>
>> >>> 2019年9月30日(月) 16:12 Deepak Goel :
>> >>>
>>  Hello
>> 
>>  Can you please share the JVM heap settings in detail?
>> 
>>  Deepak
>> 
>>  On Mon, 30 Sep 2019, 11:15 Yasufumi Mizoguchi, <
>> yasufumi0...@gmail.com>
>>  wrote:
>> 
>> > Hi,
>> >
>> > I am trying some tests to confirm if single Solr instance can
>> perform
>>  over
>> > 1000 queries per second(!).
>> >
>> > But now, although CPU usage is 40% or so and iowait is almost 0%,
>> > throughput does not increase over 60 queries per second.

Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Yasufumi Mizoguchi
Thank you for your response.

Our index has about 25,000,000 docs. Is it quite difficult to achieve
1000 qps with a single Solr instance even if all index files are cached in
the OS page cache?

And as you pointed out, there are lots of minor GCs during the test; I will
try to resize NewRatio.

Thanks,
Yasufumi.

2019年9月30日(月) 22:18 Shawn Heisey :

> On 9/29/2019 11:44 PM, Yasufumi Mizoguchi wrote:
> > I am trying some tests to confirm if single Solr instance can perform
> over
> > 1000 queries per second(!).
>
> In general, I would never expect a single instance to handle a large
> number of queries per second unless the index is REALLY small -- dozens
> or hundreds of very small documents.  A 60GB index definitely does not
> qualify.
>
> I don't think it will be possible to handle 1000 queries per second with
> a single server even on a really small index, but I've never actually
> tried.
>
> > But now, although CPU usage is 40% or so and iowait is almost 0%,
> > throughput does not increase over 60 queries per second.
>
> A query rate of 60 per second is pretty good with an index size of 60GB.
>   The low iowait would tend to confirm that the index is well cached by
> the OS.
>
> If you need to handle 1000 queries per second, you need more copies of
> your index on additional Solr servers, with something in the mix to
> perform load balancing.
>
> Some thoughts:
>
> With your -XX:MaxNewSize=128m setting, you are likely causing garbage
> collection to occur VERY frequently, which will slow things down.
> Solr's default GC settings include -XX:NewRatio=3 so that the new
> generation will be much larger than what you have set.  A program like
> Solr that allocates a lot of memory will need a fairly large new
> generation.
>
> I agree with the idea of setting the heap to 31GB.  Setting it to 31GB
> will actually leave more memory available to Solr than setting it to
> 32GB, because of the decreased pointer sizes.
>
> Definitely check what Erick mentioned.  If you're seeing what he
> described, adjusting how threads work might get you more throughput.
> But look into your new generation sizing first.
>
> Thanks,
> Shawn
>


Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Yasufumi Mizoguchi
Thank you for the reply.

Followings are current load test set up.

* Load test program : JUnit
* The number of Firing hosts : 6
* [Each host]ThreadGroup.num_threads : 200
* [Each host]ThreadGroup.ramp_time : 600
* [Each host]ThreadGroup.duration: 1800
* [Each host]ThreadGroup.delay: 0
* The number of sample queries: 20,000,000

And we confirmed that the Jetty thread count was increasing and reached the
limit (1).
Therefore, we raised the MaxThreads value.

I checked the GC logs and found that no major GC happened, and almost all
minor GCs finished within 200ms.

Cache hit rate is not checked now, but I think it is extremely low for all
kinds of cache,
because the number of sample queries is big (20,000,000) compared to
the queryResult and filter cache sizes (both 512) and there is little
duplication in the fq and q parameters.
Those small cache sizes are intended, but I want to change them.

Thanks,
Yasufumi



2019年9月30日(月) 20:49 Erick Erickson :

> The most basic question is how you are load-testing it? Assuming you have
> some kind of client firing queries at Solr, keep adding threads so Solr is
> handling more and more queries in parallel. If you start to see the
> response time at the client get longer _and_ the  QTime in Solr’s response
> stays about the same, then the queries are queueing up and you need to see
> about increasing the Jetty threads handling queries.
>
> Second is whether you’re hitting GC pauses, look at the GC logs,
> especially for “stop the world” pauses. This is unlikely as you’re still
> getting 60 qps, but something to check.
>
> Setting your heap to 31G is good advice, but it won’t dramatically
> increase the throughput I’d guess.
>
> If your I/O isn’t very high, then your index is mostly memory-resident. A
> general bit of tuning advice is to _reduce_ the heap size, leaving OS
> memory for the index. See Uwe’s blog:
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html.
> There’s a sweet spot between having too much heap and too little, and
> unfortunately you have to experiment to find out.
>
> But given the numbers you’re talking about, you won’t be getting 1,000 QPS
> on a single box and you’ll have to scale out with replicas to hit that
> number. Getting all the QPS you can out of the box is important, of course.
> Do be careful to use enough different queries that you don’t get them from
> the queryResultCache. I had one client who was thrilled they were getting
> 3ms response times…. by firing the same query over and over and hitting the
> queryResultCache 99.% of the time ;).
>
> Best,
> Erick
>
> > On Sep 30, 2019, at 4:28 AM, Yasufumi Mizoguchi 
> wrote:
> >
> > Hi, Ere.
> >
> > Thank you for valuable feedback.
> > I will try Xmx31G and Xms31G instead of current ones.
> >
> > Thanks and Regards,
> > Yasufumi.
> >
> > 2019年9月30日(月) 17:19 Ere Maijala :
> >
> >> Just a side note: -Xmx32G is really bad for performance as it forces
> >> Java to use non-compressed pointers. You'll actually get better results
> >> with -Xmx31G. For more information, see e.g.
> >>
> >>
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
> >>
> >> Regards,
> >> Ere
> >>
> >> Yasufumi Mizoguchi kirjoitti 30.9.2019 klo 11.05:
> >>> Hi, Deepak.
> >>> Thank you for replying me.
> >>>
> >>> JVM settings from solr.in.sh file are as follows. (Sorry, I could not
> >> show
> >>> all due to our policy)
> >>>
> >>> -verbose:gc
> >>> -XX:+PrintHeapAtGC
> >>> -XX:+PrintGCDetails
> >>> -XX:+PrintGCDateStamps
> >>> -XX:+PrintGCTimeStamps
> >>> -XX:+PrintTenuringDistribution
> >>> -XX:+PrintGCApplicationStoppedTime
> >>> -Dcom.sun.management.jmxremote.ssl=false
> >>> -Dcom.sun.management.jmxremote.authenticate=false
> >>> -Dcom.sun.management.jmxremote.port=18983
> >>> -XX:OnOutOfMemoryError=/home/solr/solr-6.2.1/bin/oom_solr.sh
> >>> -XX:NewSize=128m
> >>> -XX:MaxNewSize=128m
> >>> -XX:+UseG1GC
> >>> -XX:+PerfDisableSharedMem
> >>> -XX:+ParallelRefProcEnabled
> >>> -XX:G1HeapRegionSize=8m
> >>> -XX:MaxGCPauseMillis=250
> >>> -XX:InitiatingHeapOccupancyPercent=75
> >>> -XX:+UseLargePages
> >>> -XX:+AggressiveOpts
> >>> -Xmx32G
> >>> -Xms32G
> >>> -Xss256k
> >>>
> >>>
> >>> Thanks & Regards
> >>> Yasufumi.
> >>>
> >>> 2019年9月30日(月) 16:12 Deepak Goel :
> >>>
>  Hello
> 
>  Can you please share the JVM heap settings in detail?
> 
>  Deepak
> 
>  On Mon, 30 Sep 2019, 11:15 Yasufumi Mizoguchi, <
> yasufumi0...@gmail.com>
>  wrote:
> 
> > Hi,
> >
> > I am trying some tests to confirm if single Solr instance can perform
>  over
> > 1000 queries per second(!).
> >
> > But now, although CPU usage is 40% or so and iowait is almost 0%,
> > throughput does not increase over 60 queries per second.
> >
> > I think there are some bottlenecks around Kernel, JVM, or Solr
> >> settings.
> >
> > The values we already checked and configured are followings.
> >
> > * Kernel:
> > file descriptor

Re: Solr Spatial Search - Filter and sort on multiple cities

2019-10-01 Thread anushka gupta
Thanks,

Could you please also let me know how to combine two geofilt fqs? Because if
I use '&' like:
admin_directory_search_geolocation?q=david&fq=({!geofilt&sfield=adminLatLon&pt=33.0198431,-96.6988856&d=80})+OR+({!geofilt&sfield=adminLatLon&pt=50.2171726,8.265894&d=80})

then it gives me an error, as it takes the ')' parenthesis as part of d=80



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solrcloud export all results sorted by score

2019-10-01 Thread Edward Turner
Hi all,

As far as I understand, SolrCloud currently does not allow the use of
sorting by the pseudofield, score in the /export request handler (i.e., get
the results in relevancy order). If we do attempt this, we get an
exception, "org.apache.solr.search.SyntaxError: Scoring is not currently
supported with xsort". We could use Solr's cursorMark, but this takes a
very long time ...

Exporting results does work, however, when exporting result sets by a
specific document field that has docValues set to true.
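For reference, a sketch of such a working /export request (Python with the
requests library; the host, collection, and the assumption that "id" is a
docValues field are placeholders):

    import requests

    # /export needs sort and fl to reference docValues fields; score is not
    # allowed, per the exception mentioned above.
    params = {"q": "*:*", "sort": "id asc", "fl": "id"}
    resp = requests.get("http://localhost:8983/solr/mycollection/export",
                        params=params, stream=True)
    # For very large exports you would stream-parse the response; for a
    # sketch, reading it whole is enough.
    docs = resp.json()["response"]["docs"]
    print(len(docs), "docs exported")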

Question:
Does anyone know if/when it will be possible to sort by score in the
/export handler?

Research on the problem:
We've seen https://issues.apache.org/jira/browse/SOLR-5244 and
https://issues.apache.org/jira/browse/SOLR-8664, which are related to this
issue, but don't fix it. Maybe I've missed a more relevant issue?

Our use-case:
We are using Solrcloud in our team and it's added a huge
amount of value to our users.

We show a table of search results ordered by score (relevancy) that was
obtained from sending a query to the standard /select handler. We're
working in the life-sciences domain and it is common for our result sets to
contain many millions of results (unfortunately). After users browse their
results, they then may want to download the results that they see, to do
some post-processing. However, to do this, such that the results appear in
the order that the user originally saw them, we'd need to be able to export
results based on score/relevancy.

Any suggestions or advice on this would be greatly appreciated!

Many thanks!

Edd

PS. apologies for posting also on Stackoverflow (
https://stackoverflow.com/questions/58167152/solrcloud-export-all-results-sorted-by-score)
--
I only discovered the Solr mailing-list afterwards and thought it probably
better to reach out directly to Solr's people (I can share any answer from
this forum on there retrospectively).


Re: Throughput does not increase in spite of low CPU usage

2019-10-01 Thread Paras Lehana
Besides what all have suggested, can you share your testing setup? Are you
using JMeter? I'm asking this to confirm that your setup is "actually
trying" to generate 1000 simultaneous threads and not bottlenecking the
process. I get 170 qps easily on a simpler server but I remember that I
could not go past 200 threads due to my testing PC limitations (I was
firing queries from my PC to the server).

On Tue, 1 Oct 2019 at 11:29, Jörn Franke  wrote:

> Why do you need 1000 qps?
>
>
> > Am 30.09.2019 um 07:45 schrieb Yasufumi Mizoguchi <
> yasufumi0...@gmail.com>:
> >
> > Hi,
> >
> > I am trying some tests to confirm if single Solr instance can perform
> over
> > 1000 queries per second(!).
> >
> > But now, although CPU usage is 40% or so and iowait is almost 0%,
> > throughput does not increase over 60 queries per second.
> >
> > I think there are some bottlenecks around Kernel, JVM, or Solr settings.
> >
> > The values we already checked and configured are followings.
> >
> > * Kernel:
> > file descriptor
> > net.ipv4.tcp_max_syn_backlog
> > net.ipv4.tcp_syncookies
> > net.core.somaxconn
> > net.core.rmem_max
> > net.core.wmem_max
> > net.ipv4.tcp_rmem
> > net.ipv4.tcp_wmem
> >
> > * JVM:
> > Heap [ -> 32GB]
> > G1GC settings
> >
> > * Solr:
> > (Jetty) MaxThreads [ -> 2]
> >
> >
> > And the other info is as follows.
> >
> > CPU : 16 cores
> > RAM : 128 GB
> > Disk : SSD 500GB
> > NIC : 10Gbps(maybe)
> > OS : Ubuntu 14.04
> > JVM : OpenJDK 1.8.0u191
> > Solr : 6.2.1
> > Index size : about 60GB
> >
> > Any insights will be appreciated.
> >
> > Thanks and regards,
> > Yasufumi.
>


-- 
Regards,

*Paras Lehana* [65871]
Software Programmer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*
