Running Solr as a service

2019-08-13 Thread Zheng Lin Edwin Yeo
Hi,

Is there any way that we can run Solr as a service, and allow it to start
automatically during the startup of the system?

I have tried to set up the service by using nssm, but it only works for
ZooKeeper and not Solr.

I am using Solr 8.2.0.
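
For what it's worth, a typical nssm setup that is often used for this (a sketch only — the
install path, service name and port below are just examples) looks like:

    nssm install solr8 "C:\solr\bin\solr.cmd" "start -f -p 8983"
    nssm set solr8 AppDirectory "C:\solr"
    nssm start solr8

The "-f" (foreground) flag matters: without it, solr.cmd hands off to a background Java
process and exits, so a service manager typically concludes that Solr has stopped.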

Regards,
Edwin


Re: Indexed Data Size

2019-08-13 Thread Greg Harris
Brett, it’s probably because you hit Solr's 5 GB default maximum segment size:
before a segment that large becomes eligible for merging, a large share of the
docs inside it must be marked as deleted. So even if large numbers of docs
within the segment are deleted, the segment is still there, happily taking up
space. That could theoretically be a reason for an optimize, but you’d want to
specify maxSegments so that the entire index is not merged down to a single
segment. Ideally you should keep only as many of the logs as you actually use
(which is hopefully fewer than you are keeping now). Since the segments will be
somewhat time-based, they would eventually disappear/merge over time, hopefully
removing any reason to consider optimizing.
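
As an illustration only (the collection name and segment count below are made up), an
optimize that keeps multiple segments instead of collapsing everything into one can be
issued from SolrJ roughly like this:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class PartialOptimize {
      public static void main(String[] args) throws Exception {
        // "system_logs" is a made-up collection name
        try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
          // waitFlush=true, waitSearcher=true, maxSegments=10: merge down to at most
          // 10 segments, reclaiming space held by deleted docs without producing
          // one huge segment for the entire index
          client.optimize("system_logs", true, true, 10);
        }
      }
    }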

Greg

On Tue, Aug 13, 2019 at 3:31 PM Moyer, Brett  wrote:

> Turns out this is due to a job that indexes logs. We were able to clear
> some with another job. We are working through the value of these indexed
> logs. Thanks for all your help!
>
> Brett Moyer
> Manager, Sr. Technical Lead | TFS Technology
>   Public Production Support
>   Digital Search & Discovery
>
> 8625 Andrew Carnegie Blvd | 4th floor
> Charlotte, NC 28263
> Tel: 704.988.4508
> Fax: 704.988.4907
> bmo...@tiaa.org
>
> -Original Message-
> From: Shawn Heisey 
> Sent: Friday, August 9, 2019 2:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexed Data Size
>
> On 8/9/2019 12:17 PM, Moyer, Brett wrote:
> > The biggest is /data/solr/system_logs_shard1_replica_n1/data/index,
> > files with the extensions I stated previously. Each is 5gb and there are a
> > few hundred. Dated back to the last 3 months. I don’t understand why there
> > are so many files with such small indexes. Not sure how to clean them up.
>
> Can you get a screenshot of the core overview for that particular core?
> Solr should correctly calculate the size on the overview based on what
> files are actually in the index directory.
>
> Thanks,
> Shawn
>


RE: Indexed Data Size

2019-08-13 Thread Moyer, Brett
Turns out this is due to a job that indexes logs. We were able to clear some 
with another job. We are working through the value of these indexed logs. 
Thanks for all your help!

Brett Moyer
Manager, Sr. Technical Lead | TFS Technology
  Public Production Support
  Digital Search & Discovery

8625 Andrew Carnegie Blvd | 4th floor
Charlotte, NC 28263
Tel: 704.988.4508
Fax: 704.988.4907
bmo...@tiaa.org

-Original Message-
From: Shawn Heisey  
Sent: Friday, August 9, 2019 2:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexed Data Size

On 8/9/2019 12:17 PM, Moyer, Brett wrote:
> The biggest is /data/solr/system_logs_shard1_replica_n1/data/index, files 
> with the extensions I stated previously. Each is 5gb and there are a few 
> hundred. Dated back to the last 3 months. I don’t understand why there are so many 
> files with such small indexes. Not sure how to clean them up.

Can you get a screenshot of the core overview for that particular core? 
Solr should correctly calculate the size on the overview based on what files 
are actually in the index directory.

Thanks,
Shawn


Re: Turn off CDCR for only selected target clusters

2019-08-13 Thread Erick Erickson
You configure CDCR by _collection_, so this question really makes no sense. 
You’d never mention collection.configName. So what I suspect is that you’re
misreading the docs. 


<lst name="replica">
  <str name="zkHost">${targetZkHost1},${targetZkHost2},${targetZkHost3}</str>
  <str name="source">sourceCollection_on_local_cluster</str>
  <str name="target">targetCollection_on_targetZkHost1 2 and 3</str>
</lst>


“Turning off CDCR” selectively for ZooKeeper instances really makes no sense, as the
point of ZK ensembles is to keep running even if one node goes away.
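
(For reference, CDCR itself is started and stopped per source _collection_ through the CDCR
API, e.g. http://localhost:8983/solr/<sourceCollection>/cdcr?action=STOP, and action=START
to resume — the host and collection name here are only illustrative. As far as I know the
API takes no per-target parameter.)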

So can you rephrase the question? Or state the problem you’re trying to solve 
another way?

Best,
Erick

> On Aug 13, 2019, at 1:57 PM, Arnold Bronley  wrote:
> 
> Hi,
> 
> Is there a way to turn off the CDCR for only selected target clusters.
> 
> Say, I have a configuration like following. I have 3 target clusters
> targetZkHost1, targetZkHost2 and targetZkHost3. Is it possible to turn off
> the CDCR for targetZkHost2 and targetZkHost3 but keep it on for
> targetZkHost1?
> 
> E.g.
> 
> <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
> 
>   <lst name="replica">
>     <str name="zkHost">${targetZkHost1}</str>
>     <str name="source">${collection.configName}</str>
>     <str name="target">${collection.configName}</str>
>   </lst>
> 
>   <lst name="replica">
>     <str name="zkHost">${targetZkHost2}</str>
>     <str name="source">${collection.configName}</str>
>     <str name="target">${collection.configName}</str>
>   </lst>
> 
>   <lst name="replica">
>     <str name="zkHost">${targetZkHost3}</str>
>     <str name="source">${collection.configName}</str>
>     <str name="target">${collection.configName}</str>
>   </lst>
> 
>   <lst name="replicator">
>     <str name="threadPoolSize">8</str>
>     <str name="schedule">1000</str>
>     <str name="batchSize">128</str>
>   </lst>
> 
>   <lst name="updateLogSynchronizer">
>     <str name="schedule">1000</str>
>   </lst>
> 
>   <lst name="buffer">
>     <str name="defaultState">disabled</str>
>   </lst>
> </requestHandler>



Turn off CDCR for only selected target clusters

2019-08-13 Thread Arnold Bronley
Hi,

Is there a way to turn off the CDCR for only selected target clusters.

Say, I have a configuration like following. I have 3 target clusters
targetZkHost1, targetZkHost2 and targetZkHost3. Is it possible to turn off
the CDCR for targetZkHost2 and targetZkHost3 but keep it on for
targetZkHost1?

E.g.

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">

  <lst name="replica">
    <str name="zkHost">${targetZkHost1}</str>
    <str name="source">${collection.configName}</str>
    <str name="target">${collection.configName}</str>
  </lst>

  <lst name="replica">
    <str name="zkHost">${targetZkHost2}</str>
    <str name="source">${collection.configName}</str>
    <str name="target">${collection.configName}</str>
  </lst>

  <lst name="replica">
    <str name="zkHost">${targetZkHost3}</str>
    <str name="source">${collection.configName}</str>
    <str name="target">${collection.configName}</str>
  </lst>

  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>

  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>

  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>


Re: Slow Indexing scaling issue

2019-08-13 Thread Erick Erickson
Here’s some sample SolrJ code using Tika outside of Solr’s Extracting Request 
Handler, along with some info about why loading Solr with the job of extracting 
text is not optimal speed-wise:

https://lucidworks.com/post/indexing-with-solrj/
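
The gist of that approach — running Tika in the client and sending already-extracted text to
Solr — looks roughly like the sketch below (this is not the article's code; the field names,
collection name and URL are illustrative):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.BodyContentHandler;

    public class TikaIndexer {
      public static void main(String[] args) throws Exception {
        Path file = Paths.get(args[0]);
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
        Metadata metadata = new Metadata();
        try (InputStream in = Files.newInputStream(file)) {
          parser.parse(in, handler, metadata);   // text extraction happens client-side
        }
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", file.toString());        // field names are illustrative
        doc.addField("content", handler.toString());
        try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
          client.add("mycollection", doc);          // collection name is illustrative
          client.commit("mycollection");
        }
      }
    }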

> On Aug 13, 2019, at 12:15 PM, Jan Høydahl  wrote:
> 
> You May want to review 
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-SlowIndexing
>  for some hints.
> 
> Make sure to index with multiple parallel threads. Also remember that using 
> /extract on the solr side is resource intensive and may make your cluster 
> slow and unstable. Better to use Tika or similar on the client side and send 
> text docs to solr.
> 
> Jan Høydahl
> 
>> On 13 Aug 2019, at 16:52, Parmeshwor Thapa wrote:
>> 
>> Hi,
>> 
>> We are having some issue on scaling solr indexing. Looking for suggestion.
>> 
>> Setup : We have two solr cloud (7.4) instances running in separate cloud
>> VMs with an external zookeeper ensemble.
>> 
>> We are sending async / non-blocking HTTP requests to index documents in Solr.
>> 
>> 2 cloud VMs (4 cores * 32 GB)
>> 
>> 16 GB allocated for the JVM
>> 
>> We are sending all types of documents to Solr, which it extracts and
>> indexes using the /update/extract request handler.
>> 
>> We have stopwords.txt and dictionary (7mb) for stemming.
>> 
>> 
>> 
>> Issue : indexing speed is quite slow for us. It is taking around 2 hours to
>> index around 3 gb of data. 10,000 documents(PDF, xls, word, etc). We are
>> planning to index approximately 10 tb of data.
>> 
>> Below is the solr config setting and schema,
>> 
>> 
>> 
>> 
>> 
>>   
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> > languageSet="auto" ruleType="APPROX" concat="true"/>
>> 
>>   
>> 
>> 
>> 
>> 
>> 
>>   
>> 
>> > tokenizerModel="en-token.bin" sentenceModel="en-sent.bin"/>
>> 
>>   
>> 
>> > posTaggerModel="en-pos-maxent.bin"/>
>> 
>> > dictionary="en-lemmatizer-again.dict.txt"/>
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> > stored="false"/>
>> 
>> 
>> 
>> 
>> 
>> > indexed="true" stored="true"/>
>> 
>> > required="true" stored="true"/>
>> 
>> > indexed="true" stored="true"/>
>> 
>> > indexed="true" stored="true"/>
>> 
>> > stored="true"/>
>> 
>> > indexed="true" stored="true"/>
>> 
>> > indexed="true" stored="true" />
>> 
>> > stored="true"/>
>> 
>> > indexed="true" stored="true"/>
>> 
>> > indexed="true" stored="true"/>
>> 
>> > indexed="true" stored="false"/>
>> 
>> > indexed="true" stored="false"/>
>> 
>> > indexed="true" stored="true"/>
>> 
>> > indexed="true" stored="true"/>
>> 
>> > indexed="true" stored="true"/>
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> > docValues="false" />
>> 
>> 
>> 
>> And below is the solrConfig,
>> 
>> 
>> 
>> 
>> 
>>  BEST_COMPRESSION
>> 
>> 
>> 
>> 
>> 
>>   
>> 
>>   1000
>> 
>>   60
>> 
>>   false
>> 
>>   
>> 
>> 
>> 
>>   
>> 
>> ${solr.autoSoftCommit.maxTime:-1}
>> 
>>   
>> 
>> 
>> 
>> > 
>> startup="lazy"
>> 
>> class="solr.extraction.ExtractingRequestHandler" >
>> 
>>   
>> 
>> true
>> 
>> ignored_
>> 
>> content
>> 
>>   
>> 
>> 
>> 
>> *Thanks,*
>> 
>> *Parmeshwor Thapa*



Re: Slow Indexing scaling issue

2019-08-13 Thread Jan Høydahl
You May want to review 
https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-SlowIndexing
 for some hints.

Make sure to index with multiple parallel threads. Also remember that using 
/extract on the solr side is resource intensive and may make your cluster slow 
and unstable. Better to use Tika or similar on the client side and send text 
docs to solr.
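
One common way to get that parallelism from SolrJ is ConcurrentUpdateSolrClient, which
buffers documents and sends them on several background threads. A minimal sketch (the URL,
collection name, queue size and thread count are illustrative and need tuning):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ParallelIndexer {
      public static void main(String[] args) throws Exception {
        try (SolrClient client =
                 new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/mycollection")
                     .withQueueSize(100)   // docs buffered before an update request is sent
                     .withThreadCount(8)   // background threads sending updates in parallel
                     .build()) {
          for (int i = 0; i < 10_000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            client.add(doc);               // queued; sent asynchronously by the worker threads
          }
          client.commit();
        }
      }
    }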

Jan Høydahl

> On 13 Aug 2019, at 16:52, Parmeshwor Thapa wrote:
> 
> Hi,
> 
> We are having some issue on scaling solr indexing. Looking for suggestion.
> 
> Setup : We have two solr cloud (7.4) instances running in separate cloud
> VMs with an external zookeeper ensemble.
> 
> We are sending async / non-blocking HTTP requests to index documents in Solr.
> 
> 2 cloud VMs (4 cores * 32 GB)
> 
> 16 GB allocated for the JVM
> 
> We are sending all types of documents to Solr, which it extracts and
> indexes using the /update/extract request handler.
> 
> We have stopwords.txt and dictionary (7mb) for stemming.
> 
> 
> 
> Issue : indexing speed is quite slow for us. It is taking around 2 hours to
> index around 3 gb of data. 10,000 documents(PDF, xls, word, etc). We are
> planning to index approximately 10 tb of data.
> 
> Below is the solr config setting and schema,
> 
> 
> 
>  
> 
>
> 
>  
> 
>  
> 
>  
> 
>   languageSet="auto" ruleType="APPROX" concat="true"/>
> 
>
> 
>  
> 
>  
> 
>
> 
>   tokenizerModel="en-token.bin" sentenceModel="en-sent.bin"/>
> 
>
> 
>   posTaggerModel="en-pos-maxent.bin"/>
> 
>   dictionary="en-lemmatizer-again.dict.txt"/>
> 
> 
> 
> 
> 
>  
> 
>  
> 
> 
> 
>  
> 
> 
> 
>   stored="false"/>
> 
>  
> 
> 
> 
>   indexed="true" stored="true"/>
> 
>   required="true" stored="true"/>
> 
>   indexed="true" stored="true"/>
> 
>   indexed="true" stored="true"/>
> 
>   stored="true"/>
> 
>   indexed="true" stored="true"/>
> 
>   indexed="true" stored="true" />
> 
>   stored="true"/>
> 
>   indexed="true" stored="true"/>
> 
>   indexed="true" stored="true"/>
> 
>   indexed="true" stored="false"/>
> 
>   indexed="true" stored="false"/>
> 
>   indexed="true" stored="true"/>
> 
>   indexed="true" stored="true"/>
> 
>   indexed="true" stored="true"/>
> 
> 
> 
>  
> 
>  
> 
> 
> 
>   docValues="false" />
> 
> 
> 
> And below is the solrConfig,
> 
> 
> 
>  
> 
>   BEST_COMPRESSION
> 
>  
> 
> 
> 
>
> 
>1000
> 
>60
> 
>false
> 
>
> 
> 
> 
>
> 
>  ${solr.autoSoftCommit.maxTime:-1}
> 
>
> 
> 
> 
>   
>  startup="lazy"
> 
>  class="solr.extraction.ExtractingRequestHandler" >
> 
>
> 
>  true
> 
>  ignored_
> 
>  content
> 
>
> 
>  
> 
> *Thanks,*
> 
> *Parmeshwor Thapa*


Re: Solr cloud questions

2019-08-13 Thread Shawn Heisey

On 8/13/2019 9:28 AM, Kojo wrote:

Here are the last two gc logs:

https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ


Thank you for that.

The 20MB gc log actually shows a pretty healthy system. 
That log covers 58 hours of runtime, and everything looks very good to me.


https://www.dropbox.com/s/yu1pyve1bu9maun/gc-analysis-kojo.png?dl=0

But the small log shows a different story.  That log only covers a 
little more than four minutes.


https://www.dropbox.com/s/vkxfoihh12brbnr/gc-analysis-kojo2.png?dl=0

What happened at approximately 10:55:15 PM on the day that the smaller 
log was produced?  Whatever happened caused Solr's heap usage to 
skyrocket and require more than 6GB.


Thanks,
Shawn


Re: Solr cloud questions

2019-08-13 Thread Kojo
Shawn,
Here are the last two gc logs:

https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ


Thank you,
Koji


On Tue, Aug 13, 2019 at 09:33, Shawn Heisey wrote:

> On 8/13/2019 6:19 AM, Kojo wrote:
> > --
> > tail -f  node1/logs/solr_oom_killer-8983-2019-08-11_22_57_56.log
> > Running OOM killer script for process 38788 for Solr on port 8983
> > Killed process 38788
> > --
>
> Based on what I can see, a 6GB heap is not big enough for the setup
> you've got.  There are two ways to deal with an OOME problem.  1)
> Increase the resource that was depleted.  2) Change the configuration so
> the program needs less of that resource.
>
>
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-JavaHeap
>
> > tail -50   node1/logs/archived/solr_gc.log.4.current
>
> To be useful, we will need the entire GC log, not a 50 line subset.  In
> the subset, I can see that there was a full GC that did absolutely
> nothing -- no memory was freed.  This is evidence that your heap is too
> small.  You will need to use a file sharing site and provide a URL for
> the entire GC log - email attachments rarely make it to the list.  The
> bigger the log is, the better idea we can get about what heap size you
> need.
>
> Thanks,
> Shawn
>


Slow Indexing scaling issue

2019-08-13 Thread Parmeshwor Thapa
Hi,

We are having some issues scaling Solr indexing and are looking for suggestions.

Setup: We have two SolrCloud (7.4) instances running in separate cloud
VMs with an external ZooKeeper ensemble.

We are sending async / non-blocking HTTP requests to index documents in Solr.

2 cloud VMs (4 cores * 32 GB)

16 GB allocated for the JVM

We are sending all types of documents to Solr, which it extracts and
indexes using the /update/extract request handler.

We have stopwords.txt and a dictionary (7 MB) for stemming.



Issue: indexing speed is quite slow for us. It is taking around 2 hours to
index around 3 GB of data (10,000 documents: PDF, xls, Word, etc.). We are
planning to index approximately 10 TB of data.

Below is the solr config setting and schema,



  



  

  

  

  



  

  



  



  

  





  

  



  



  

  



  

  

  

  

  

  

  

  

  

  

  

  

  

  

  



  

  



  



And below is the solrConfig,



  

   BEST_COMPRESSION

  





1000

60

false







  ${solr.autoSoftCommit.maxTime:-1}





  



  true

  ignored_

  content



  

 *Thanks,*

*Parmeshwor Thapa*


Re: Enumerating cores via SolrJ

2019-08-13 Thread Mark H. Wood
On Fri, Aug 09, 2019 at 03:45:21PM -0600, Shawn Heisey wrote:
> On 8/9/2019 3:07 PM, Mark H. Wood wrote:
> > Did I miss something, or is there no way, using SolrJ, to enumerate
> > loaded cores, as:
> > 
> >curl 'http://solr.example.com:8983/solr/admin/cores?action=STATUS'
> > 
> > does?
> 
> This code will do so.  I tested it.
[snip]
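
A minimal SolrJ sketch along those lines (not necessarily the exact code that was snipped;
the base URL is illustrative):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.response.CoreAdminResponse;
    import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;

    public class ListCores {
      public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder("http://solr.example.com:8983/solr").build()) {
          CoreAdminRequest request = new CoreAdminRequest();
          request.setAction(CoreAdminAction.STATUS);        // equivalent of admin/cores?action=STATUS
          CoreAdminResponse response = request.process(client);
          for (int i = 0; i < response.getCoreStatus().size(); i++) {
            System.out.println(response.getCoreStatus().getName(i));  // loaded core name
          }
        }
      }
    }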

Thank you.  That was just the example I needed.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Problem with Solr alias and suggester call

2019-08-13 Thread Eileen Mosch
Hi all,

we updated Solr from version 7.3.1 to version 8.1.1 and detected a problem 
requesting suggesters via multi-collection alias. It also exists in version 
8.2.0. I think it is a bug but maybe someone can verify this?

I created an alias called „WORLD“ pointing to six collections. If I send a 
/suggest request using SolrClient I get the error message: „No live SolrServers 
available to handle this request:[http://[myIP]:8983/solr/WORLD“. If I try to 
call /suggest via Solr HTTP API, I get „Timeout occured while waiting response 
from server at: http://[myIP]:8983/solr/AT.test_shard1_replica_n1/suggest“

If I use single-collection Aliases everything is fine.

Kind Regards,
Eileen

Here are the details:

Alias creation request:

  *   
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=WORLD&collections=AT.test,BE.test,DE.test,FR.test,LU.test,NL.test

aliases.json content:
{
  "responseHeader":{
"status":0,
"QTime":0},
  "aliases":{
"AT":"AT.test",
"BE":"BE.test",
"DE":"DE.test",
"FR":"FR.test",
"LU":"LU.test",
"NL":"NL.test",
"WORLD":"AT.test,BE.test,DE.test,FR.test,LU.test,NL.test"},
  "properties":{}
}

Suggester call via http API

  *   
http://localhost:8983/solr/WORLD/suggest?suggest=true&suggest.q=Rue%20Roi%20Albert&suggest.dictionary=streetSuggester&suggest.dictionary=streetSuggesterFuzzy&suggest.build=true

Response/stacktrace of the HTTP call:

  *   see attachment


---
LOGIBALL GmbH

B.Sc. Kartographie und Geomatik Eileen Mosch
Senior Software Engineer

Am Studio 2a
12489 Berlin
Germany
phone: +49.30.6392.802.42
email: eileen.mo...@logiball.de
internet: http://www.logiball.de

MD: Dr.-Ing. Roger Müller
AG Bochum, HRB 9402
Tax.-Nr: 325/5837/0464
VAT.-Nr.: DE 164887900




Re: Solr cloud questions

2019-08-13 Thread Shawn Heisey

On 8/13/2019 6:19 AM, Kojo wrote:

--
tail -f  node1/logs/solr_oom_killer-8983-2019-08-11_22_57_56.log
Running OOM killer script for process 38788 for Solr on port 8983
Killed process 38788
--


Based on what I can see, a 6GB heap is not big enough for the setup 
you've got.  There are two ways to deal with an OOME problem.  1) 
Increase the resource that was depleted.  2) Change the configuration so 
the program needs less of that resource.


https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-JavaHeap
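
For example — assuming the stock bin/solr start scripts are in use, and the value here is only
an example — the heap is raised by setting SOLR_HEAP in solr.in.sh (solr.in.cmd on Windows):

    # solr.in.sh: give Solr a larger heap than the current 6 GB
    SOLR_HEAP="12g"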


tail -50   node1/logs/archived/solr_gc.log.4.current


To be useful, we will need the entire GC log, not a 50 line subset.  In 
the subset, I can see that there was a full GC that did absolutely 
nothing -- no memory was freed.  This is evidence that your heap is too 
small.  You will need to use a file sharing site and provide a URL for 
the entire GC log - email attachments rarely make it to the list.  The 
bigger the log is, the better idea we can get about what heap size you need.


Thanks,
Shawn


Re: Solr cloud questions

2019-08-13 Thread Kojo
Erick and Shawn,
thank you very much for the very usefull information.

When I started to move from single Solr to cloud, I was planning to use the
cluster for very large collections.

But the collection that I mentioned will not grow that much, so I will reduce
the number of shards.


Thanks for the information about load balancing. I will put one in place.



Shawn, below is the information that I hope will clarify things.

Linux CentOS
Solr 6.6
64 Gb each box
6 Gb each node

The last time the node died was on 2019-08-11. It happens a few times a week.


--
tail -f  node1/logs/solr_oom_killer-8983-2019-08-11_22_57_56.log
Running OOM killer script for process 38788 for Solr on port 8983
Killed process 38788
--


--
ls -ltr  node1/logs/archived/
total 82032
-rw-rw-r-- 1 solr solr 20973030 Aug  4 18:31 solr_gc.log.0
-rw-rw-r-- 1 solr solr 20973415 Aug  6 21:05 solr_gc.log.1
-rw-rw-r-- 1 solr solr 20971714 Aug  9 12:01 solr_gc.log.2
-rw-rw-r-- 1 solr solr 20971720 Aug 11 22:53 solr_gc.log.3
-rw-rw-r-- 1 solr solr77096 Aug 11 22:57 solr_gc.log.4.current
-rw-rw-r-- 1 solr solr  364 Aug 11 22:57 solr-8983-console.log
--


--
tail -50   node1/logs/archived/solr_gc.log.4.current
 Metaspace   used 50496K, capacity 51788K, committed 53140K, reserved
1097728K
  class spaceused 5001K, capacity 5263K, committed 5524K, reserved
1048576K
}
2019-08-11T22:57:39.231-0300: 802516.887: Total time for which application
threads were stopped: 12.5386815 seconds, Stopping threads took: 0.0001242
seconds
{Heap before GC invocations=34291 (full 252):
 par new generation   total 1310720K, used 1310719K [0x00064000,
0x0006a000, 0x0006a000)
  eden space 1048576K, 100% used [0x00064000, 0x00068000,
0x00068000)
  from space 262144K,  99% used [0x00069000, 0x00069ff8,
0x0006a000)
  to   space 262144K,   0% used [0x00068000, 0x00068000,
0x00069000)
 concurrent mark-sweep generation total 4718592K, used 4718592K
[0x0006a000, 0x0007c000, 0x0007c000)
 Metaspace   used 50496K, capacity 51788K, committed 53140K, reserved
1097728K
  class spaceused 5001K, capacity 5263K, committed 5524K, reserved
1048576K
2019-08-11T22:57:39.233-0300: 802516.889: [Full GC (Allocation Failure)
2019-08-11T22:57:39.233-0300: 802516.889: [CMS:
4718592K->4718591K(4718592K), 5.5779385 secs] 6029311K->6029311K(6029312K),
[Metaspace: 50496K->50496K(1097728K)], 5.5780863 secs] [Times: user=5.58
sys=0.00, real=5.58 secs]
Heap after GC invocations=34292 (full 253):
 par new generation   total 1310720K, used 1310719K [0x00064000,
0x0006a000, 0x0006a000)
  eden space 1048576K,  99% used [0x00064000, 0x00067f68,
0x00068000)
  from space 262144K,  99% used [0x00069000, 0x00069f18,
0x0006a000)
  to   space 262144K,   0% used [0x00068000, 0x00068000,
0x00069000)
 concurrent mark-sweep generation total 4718592K, used 4718591K
[0x0006a000, 0x0007c000, 0x0007c000)
 Metaspace   used 50496K, capacity 51788K, committed 53140K, reserved
1097728K
  class spaceused 5001K, capacity 5263K, committed 5524K, reserved
1048576K
}
2019-08-11T22:57:44.812-0300: 802522.469: Total time for which application
threads were stopped: 5.5805500 seconds, Stopping threads took: 0.0001295
seconds
{Heap before GC invocations=34292 (full 253):
 par new generation   total 1310720K, used 1310719K [0x00064000,
0x0006a000, 0x0006a000)
  eden space 1048576K, 100% used [0x00064000, 0x00068000,
0x00068000)
  from space 262144K,  99% used [0x00069000, 0x00069f98,
0x0006a000)
  to   space 262144K,   0% used [0x00068000, 0x00068000,
0x00069000)
 concurrent mark-sweep generation total 4718592K, used 4718591K
[0x0006a000, 0x0007c000, 0x0007c000)
 Metaspace   used 50496K, capacity 51788K, committed 53140K, reserved
1097728K
  class spaceused 5001K, capacity 5263K, committed 5524K, reserved
1048576K
2019-08-11T22:57:44.813-0300: 802522.470: [Full GC (Allocation Failure)
2019-08-11T22:57:44.813-0300: 802522.470: [CMS:
4718591K->4718591K(4718592K), 5.5944800 secs] 6029311K->6029311K(6029312K),
[Metaspace: 50496K->50496K(1097728K)], 5.5946363 secs] [Times: user=5.60
sys=0.00, real=5.59 secs]
Heap after GC invocations=34293 (full 254):
 par new generation   total 1310720K, used 1310719K [0x00064000,
0x0006a000, 0x0006a000)
  eden space 1048576K,  99% used [0x00064000, 0x00067fe8,
0x00068000)
  from space 262144K,  99% used [0x00069000, 0x00069f98,
0x0006a000)
  to   space 262144K,   0% used [0x00068000, 0x00068000,
0x00069000)
 concurrent mark-sweep generation total 4718592K, used 4718591K
[0x0006a

Re: Solr restricting time-consuming/heavy processing queries

2019-08-13 Thread Mark Robinson
Thank you Jan for the reply.
I will try it out.

Best,
Mark.

On Mon, Aug 12, 2019 at 6:29 PM Jan Høydahl  wrote:

> I have never used such settings, but you could check out
> https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#segmentterminateearly-parameter
> which will allow you to pre-sort the index so that any early termination
> will actually return the most relevant docs. This will probably be easier
> to setup once https://issues.apache.org/jira/browse/SOLR-13681 is done.
>
> According to that same page you will not be able to abort long-running
> faceting using timeAllowed, but there are other ways to optimize faceting,
> such as using jsonFacet, threaded execution etc.
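
For context, timeAllowed is an ordinary request parameter; from SolrJ it can be set per query
roughly like this (a sketch — the collection URL and the 1000 ms budget are illustrative, and
as noted above it does not cover every phase of a request):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TimeBoxedQuery {
      public static void main(String[] args) throws Exception {
        // "mycollection" is an illustrative collection name
        try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
          SolrQuery query = new SolrQuery("*:*");
          query.setTimeAllowed(1000);   // ask Solr to stop searching after ~1s and return partial results
          QueryResponse response = client.query(query);
          System.out.println(response.getResults().getNumFound());
        }
      }
    }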
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 12 Aug 2019, at 23:10, Mark Robinson wrote:
>
> Hi Jan,
>
> Thanks for the reply.
> Our normal search times are within 650 ms.
> We were analyzing some queries and found that a few of them took around 14675
> ms, 13767 ms, etc.
> So I was curious to see whether we have some way in SOLR to restrict a query
> from running beyond, say, 5s or some other ideal time, even if it returns only
> partial results.
>
> That is how I came across the "timeAllowed" and wanted to check on it.
> Also was curious to know whether  "shardHandler"  could be used to work
> in those lines or it is meant for a totally different functionality.
>
> Thanks!
> Best,
> Mark
>
>
> On Sun, Aug 11, 2019 at 8:17 AM Jan Høydahl  wrote:
>
>> What is the root use case you are trying to solve? What kind of solr
>> install is this and do you not have control over the clients or what is the
>> reason that users overload your servers?
>>
>> Normally you would scale the cluster to handle normal expected load
>> instead of trying to give users timeout exceptions. What kind of query
>> times do you experience that are above 1s and are these not important
>> enough to invest extra HW? Trying to understand the real reason behind your
>> questions.
>>
>> Jan Høydahl
>>
>> > On 11 Aug 2019, at 11:43, Mark Robinson wrote:
>> >
>> > Hello,
>> > Could someone share their thoughts please or point to some link that
>> helps
>> > understand my above queries?
>> > In the Solr documentation I came across a few lines on timeAllowed and
>> > shardHandler, but if there was an example scenario for both it would
>> help
>> > understand them more thoroughly.
>> > Also curious to know different ways, if any, in SOLR to restrict/limit a
>> > time-consuming query from processing for a long time.
>> >
>> > Thanks!
>> > Mark
>> >
>> > On Fri, Aug 9, 2019 at 2:15 PM Mark Robinson 
>> > wrote:
>> >
>> >>
>> >> Hello,
>> >> I have the following questions please:-
>> >>
>> >> In solrconfig.xml I created a new "/selecttimeout" handler copying
>> >> "/select" handler and added the following to my new "/selecttimeout":-
>> >>  <shardHandler class="HttpShardHandlerFactory">
>> >>    <int name="socketTimeOut">10</int>
>> >>    <int name="connTimeOut">20</int>
>> >>  </shardHandler>
>> >>
>> >> 1.
>> >> Does the above mean that if I don't get a request once in 10 ms on the
>> >> socket handling the /selecttimeout handler, that socket will be closed?
>> >>
>> >> 2.
>> >> Same with connTimeOut? i.e. the connection object remains live only if at
>> >> least one connection request comes in every 20 ms; if not, the object
>> >> gets closed?
>> >>
>> >> Suppose a time-consuming query (say, with lots of facets etc.) is fired
>> >> against SOLR. How can I prevent Solr from processing it for more than 1s?
>> >>
>> >> 3.
>> >> Is this achieved by setting timeAllowed=1000?  Or are there any other
>> ways
>> >> to do this in Solr?
>> >>
>> >> 4
>> >> For the same purpose of preventing heavy queries from overloading SOLR, does
>> >> the <shardHandler> config above help in any way, or does shardHandler have
>> >> nothing to restrict a query once it has been fired against Solr?
>> >>
>> >>
>> >> Could someone pls share your views?
>> >>
>> >> Thanks!
>> >> Mark
>> >>
>>
>
>


Re: Clustering error in Solr 8.2.0

2019-08-13 Thread Zheng Lin Edwin Yeo
For lingo3g, they have replaced commons-lang with commons-lang3 in version
1.16, which should be in line with what Solr has done.

Just that our lingo3g licence does not allow us to upgrade to the new
version 1.16, and if we stick to the older version 1.15.1, it requires the
use of commons-lang.

Regards,
Edwin



On Tue, 13 Aug 2019 at 13:16, Jörn Franke  wrote:

> Depends if they do breaking changes in common-lang or not.
>
> By using an old version of a library such as common-lang you may introduce
> security issues in your setup.
>
> > On 13.08.2019 at 06:12, Zheng Lin Edwin Yeo wrote:
> >
> > I have found that the  Lingo3GClusteringAlgorithm  will work if I copied
> > the commons-lang-2.6.jar from the previous version to
> > solr-8.2.0\server\solr-webapp\webapp\WEB-INF\lib.
> >
> > Will this work in the long run? Because our lingo3g licence is not
> eligible
> > to download the latest version of 1.16, so we are currently stuck with
> the
> > older version 1.15.1, which still uses commons-lang dependency.
> >
> > Regards,
> > Edwin
> >
> > On Tue, 13 Aug 2019 at 00:14, Zheng Lin Edwin Yeo 
> > wrote:
> >
> >> Hi Kevin,
> >>
> >> Thanks for the info.
> >>
> >> I think it should be a lingo3g problem.  The problem occurs when I use
> >> Lingo3GClusteringAlgorithm:
> >> <str name="carrot.algorithm">com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm</str>
> >>
> >> If I change back to LingoClusteringAlgorithm, it will work:
> >> <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
> >>
> >> Regards,
> >> Edwin
> >>
> >>> On Fri, 9 Aug 2019 at 10:59, Kevin Risden  wrote:
> >>>
> >>> According to the stack trace:
> >>>
> >>> java.lang.NoClassDefFoundError: org/apache/commons/lang/ObjectUtils
> >>>at lingo3g.s.hashCode(Unknown Source)
> >>>
> >>> It looks like lingo3g - lingo3g isn't on Maven central and looks like
> it
> >>> requires a license to download. You would have to contact them to see
> if
> >>> it
> >>> still uses commons-lang. You could also copy in commons-lang
> dependency.
> >>>
> >>> Kevin Risden
> >>>
> >>>
> >>> On Thu, Aug 8, 2019 at 10:23 PM Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> 
> >>> wrote:
> >>>
>  Hi Erick,
> 
>  Thanks for your reply.
> 
>  My clustering code is taken as it is from the Solr package, only the
> >>> codes
>  related to lingo3g is taken from previous version.
> 
>  Below are the 3 files that I have taken from previous version:
>  - lingo3g-1.15.0
>  - morfologik-fsa-2.1.1
>  - morfologik-stemming-2.1.1
> 
>  Does anyone of these could have caused the error?
> 
>  Regards,
>  Edwin
> 
>  On Thu, 8 Aug 2019 at 19:56, Erick Erickson 
>  wrote:
> 
> > This dependency was removed as part of
> > https://issues.apache.org/jira/browse/SOLR-9079, so my guess is
> >>> you’re
> > pointing to an old version of the clustering code.
> >
> > Best,
> > Erick
> >
> >> On Aug 8, 2019, at 4:22 AM, Zheng Lin Edwin Yeo <
> >>> edwinye...@gmail.com>
> > wrote:
> >>
> >> ObjectUtils
> >
> >
> 
> >>>
> >>
>