Re: Master/Slave

2020-10-07 Thread Jan Høydahl
The API that enables master/slave is the ReplicationHandler, where the follower 
(slave) pulls index files from the leader (master).
This same API is used in SolrCloud for the PULL replica type, and also as a 
fallback for full recovery if the transaction log is not enough. 
So I don’t see it going away anytime soon, even if the non-cloud deployment 
style is less promoted in the documentation.
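
For reference, the follower side of this is just a handler config in
solrconfig.xml. A sketch (the URL and poll interval are placeholders, and
recent 8.x releases are renaming these params to leader/follower):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- the leader core to pull index files from -->
    <str name="masterUrl">http://leader-host:8983/solr/core1/replication</str>
    <!-- how often the follower polls for a newer index generation -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>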

Jan

> 6. okt. 2020 kl. 16:25 skrev Oakley, Craig (NIH/NLM/NCBI) [C] 
> :
> 
>> it better not ever be deprecated. It has been the most reliable mechanism
>> for its purpose
> 
> I would like to know whether that is the consensus of Solr developers.
> 
> We had been scrambling to move from Master/Slave to CDCR based on the 
> assertion that CDCR support would last far longer than Master/Slave support.
> 
> Can we now safely assume that this assertion is completely moot? Can we 
> now safely assume that Master/Slave is likely to be supported for the 
> foreseeable future? Or are we forced to assume that Master/Slave support will 
> evaporate shortly after the now-evaporated CDCR support?
> 
> -Original Message-
> From: David Hastings  
> Sent: Wednesday, September 30, 2020 3:10 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Master/Slave
> 
>> whether we should expect Master/Slave replication also to be deprecated
> 
> it better not ever be deprecated. It has been the most reliable mechanism
> for its purpose. SolrCloud isn't going to replace standalone; if it does,
> that's when I guess I stop upgrading or move to Elastic.
> 
> On Wed, Sep 30, 2020 at 2:58 PM Oakley, Craig (NIH/NLM/NCBI) [C]
>  wrote:
> 
>> Based on the thread below (reading "legacy" as meaning "likely to be
>> deprecated in later versions"), we have been working to extract ourselves
>> from Master/Slave replication
>> 
>> Most of our collections need to be in two data centers (a read/write copy
>> in one local data center: the disaster-recovery-site SolrCloud could be
>> read-only). We also need redundancy within each data center for when one
>> host or another is unavailable. We implemented this by having different
>> SolrClouds in the different data centers; with Master/Slave replication
>> pulling data from one of the read/write replicas to each of the Slave
>> replicas in the disaster-recovery-site read-only SolrCloud. Additionally,
>> for some collections, there is a desire to have local read-only replicas
>> remain unchanged for querying during the loading process: for these
>> collections, there is a local read/write loading SolrCloud, a local
>> read-only querying SolrCloud (normally configured for Master/Slave
>> replication from one of the replicas of the loader SolrCloud to both
>> replicas of the query SolrCloud, but with Master/Slave disabled when the
>> load was in progress on the loader SolrCloud, and with Master/Slave resumed
>> after the loaded data passes QA checks).
>> 
>> Based on the thread below, we made an attempt to switch to CDCR. The main
>> reason for wanting to change was that CDCR was said to be the supported
>> mechanism, and the replacement for Master/Slave replication.
>> 
>> After multiple unsuccessful attempts to get CDCR to work, we ended up with
>> reproducible cases of CDCR losing data in transit. In June, I initiated a
>> thread in this group asking for clarification of how/whether CDCR could be
>> made reliable. This seemed to me to be met with deafening silence until the
>> announcement in July of the release of Solr 8.6 and the deprecation of CDCR.
>> 
>> So we are left with the question whether we should expect Master/Slave
>> replication also to be deprecated; and if so, with what is it expected to
>> be replaced (since not with CDCR)? Or is it now sufficiently safe to assume
>> that Master/Slave replication will continue to be supported after all
>> (since the assertion that it would be replaced by CDCR has been
>> discredited)? In either case, are there other suggested implementations of
>> having a read-only SolrCloud receive data from a read/write SolrCloud?
>> 
>> 
>> Thanks
>> 
>> -Original Message-
>> From: Shawn Heisey 
>> Sent: Tuesday, May 21, 2019 11:15 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrCloud (7.3) and Legacy replication slaves
>> 
>> On 5/21/2019 8:48 AM, Michael Tracey wrote:
>>> Is it possible to set up an existing SolrCloud cluster as the master for
>>> legacy replication to a slave server or two? It looks like another option
>>> is to use Uni-directional CDCR, but not sure what is the best option in
>>> this case.
>> 
>> You're asking for problems if you try to combine legacy replication with
>> SolrCloud.  The two features are not guaranteed to work together.
>> 
>> CDCR is your best bet.  This replicates from one SolrCloud cluster to
>> another.
>> 
>> Thanks,
>> Shawn
>> 



Re: Question about solr commits

2020-10-07 Thread yaswanth kumar
Thank you very much, both Erick and Shawn.

Sent from my iPhone

> On Oct 7, 2020, at 10:41 PM, Shawn Heisey  wrote:
> 
> On 10/7/2020 4:40 PM, yaswanth kumar wrote:
>> I have the below in my solrconfig.xml
>> 
>> <updateHandler class="solr.DirectUpdateHandler2">
>>   <updateLog>
>>     <str name="dir">${solr.Data.dir:}</str>
>>   </updateLog>
>>   <autoCommit>
>>     <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
>>     <openSearcher>false</openSearcher>
>>   </autoCommit>
>>   <autoSoftCommit>
>>     <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
>>   </autoSoftCommit>
>> </updateHandler>
>> Does this mean that even though we are always sending data with commit=false on
>> the update Solr API, the above should do the commit every minute (60000 ms),
>> right?
> 
> Assuming that you have not defined the "solr.autoCommit.maxTime" and/or 
> "solr.autoSoftCommit.maxTime" properties, this config has autoCommit set to 
> 60 seconds without opening a searcher, and autoSoftCommit set to 5 seconds.
> 
> So five seconds after any indexing begins, Solr will do a soft commit. When 
> that commit finishes, changes to the index will be visible to queries.  One 
> minute after any indexing begins, Solr will do a hard commit, which 
> guarantees that data is written to disk, but it will NOT open a new searcher, 
> which means that when the hard commit happens, any pending changes to the 
> index will not be visible.
> 
> It's not "every five seconds" or "every 60 seconds" ... When any changes are 
> made, Solr starts a timer.  When the timer expires, the commit is fired.  If 
> no changes are made, no commits happen, because the timer isn't started.
> 
> Thanks,
> Shawn


Re: Question about solr commits

2020-10-07 Thread Shawn Heisey

On 10/7/2020 4:40 PM, yaswanth kumar wrote:

I have the below in my solrconfig.xml

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.Data.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
  </autoSoftCommit>
</updateHandler>

Does this mean that even though we are always sending data with commit=false on
the update Solr API, the above should do the commit every minute (60000 ms),
right?


Assuming that you have not defined the "solr.autoCommit.maxTime" and/or 
"solr.autoSoftCommit.maxTime" properties, this config has autoCommit set 
to 60 seconds without opening a searcher, and autoSoftCommit set to 5 
seconds.


So five seconds after any indexing begins, Solr will do a soft commit. 
When that commit finishes, changes to the index will be visible to 
queries.  One minute after any indexing begins, Solr will do a hard 
commit, which guarantees that data is written to disk, but it will NOT 
open a new searcher, which means that when the hard commit happens, any 
pending changes to the index will not be visible.


It's not "every five seconds" or "every 60 seconds" ... When any changes 
are made, Solr starts a timer.  When the timer expires, the commit is 
fired.  If no changes are made, no commits happen, because the timer 
isn't started.
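
Those "${property:default}" fallbacks also mean the intervals can be changed
at startup without editing solrconfig.xml. A sketch (the values are just
examples):

bin/solr start -Dsolr.autoCommit.maxTime=30000 -Dsolr.autoSoftCommit.maxTime=10000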


Thanks,
Shawn


Re: Question about solr commits

2020-10-07 Thread Erick Erickson
Yes.

> On Oct 7, 2020, at 6:40 PM, yaswanth kumar  wrote:
> 
> I have the below in my solrconfig.xml
> 
> <updateHandler class="solr.DirectUpdateHandler2">
>   <updateLog>
>     <str name="dir">${solr.Data.dir:}</str>
>   </updateLog>
>   <autoCommit>
>     <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>   <autoSoftCommit>
>     <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
>   </autoSoftCommit>
> </updateHandler>
> 
> Does this mean that even though we are always sending data with commit=false on
> the update Solr API, the above should do the commit every minute (60000 ms),
> right?
> 
> -- 
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com



Re: Term too complex for spellcheck.q param

2020-10-07 Thread Walter Underwood
The spellcheck feature was replaced by the suggester in Solr 4, released in
2012, so I would not expect any changes in spellcheck.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 7, 2020, at 3:53 PM, gnandre  wrote:
> 
> Is there a way to truncate spellcheck.q param value from Solr side?
> 
> On Wed, Oct 7, 2020, 6:22 PM gnandre  wrote:
> 
>> Thanks. Is this going to be fixed in some future version?
>> 
>> On Wed, Oct 7, 2020, 4:15 PM Mike Drob  wrote:
>> 
>>> Right now the only solution is to use a shorter term.
>>> 
>>> In a fuzzy query you could also try using a lower edit distance e.g.
>>> term~1
>>> (default is 2), but I’m not sure what the syntax for a spellcheck would
>>> be.
>>> 
>>> Mike
>>> 
>>> On Wed, Oct 7, 2020 at 2:59 PM gnandre  wrote:
>>> 
 Hi,
 
 I am getting the following error when I pass '
 김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
 ' in the spellcheck.q param. How can I avoid this error? I am using Solr 8.5.2
 
 {
  "error": {
"code": 500,
"msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
 유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
"trace":
>>> "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException:
 Term too complex:
 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마\n\tat
 
 
>>> org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)\n\tat
 
 
>>> org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:365)\n\tat
 
 
>>> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:125)\n\tat
 
 
>>> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:92)\n\tat
 
 
>>> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:425)\n\tat
 
 
>>> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:376)\n\tat
 
 
>>> org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:196)\n\tat
 
 
>>> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)\n\tat
 
 
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)\n\tat
 
 
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
 
>>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
 org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
 
 
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
 
 
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
 
 
>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
 
 
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
 
 
>>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
 
 
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
 
 
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
 
 
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
 
 
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
 
 
>>> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
 
 
>>> org.eclipse.jetty.server.h

Re: Term too complex for spellcheck.q param

2020-10-07 Thread gnandre
Is there a way to truncate spellcheck.q param value from Solr side?

On Wed, Oct 7, 2020, 6:22 PM gnandre  wrote:

> Thanks. Is this going to be fixed in some future version?
>
> On Wed, Oct 7, 2020, 4:15 PM Mike Drob  wrote:
>
>> Right now the only solution is to use a shorter term.
>>
>> In a fuzzy query you could also try using a lower edit distance e.g.
>> term~1
>> (default is 2), but I’m not sure what the syntax for a spellcheck would
>> be.
>>
>> Mike
>>
>> On Wed, Oct 7, 2020 at 2:59 PM gnandre  wrote:
>>
>> > Hi,
>> >
>> > I am getting the following error when I pass '
>> > 김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
>> > ' in the spellcheck.q param. How can I avoid this error? I am using Solr 8.5.2
>> >
>> > {
>> >   "error": {
>> > "code": 500,
>> > "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
>> > 유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
>> > "trace":
>> "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException:
>> > Term too complex:
>> > 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마\n\tat
>> >
>> >
>> org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)\n\tat
>> >
>> >
>> org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:365)\n\tat
>> >
>> >
>> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:125)\n\tat
>> >
>> >
>> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:92)\n\tat
>> >
>> >
>> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:425)\n\tat
>> >
>> >
>> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:376)\n\tat
>> >
>> >
>> org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:196)\n\tat
>> >
>> >
>> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)\n\tat
>> >
>> >
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)\n\tat
>> >
>> >
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
>> > org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
>> >
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
>> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
>> >
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
>> >
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
>> >
>> >
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
>> >
>> >
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>> >
>> >
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
>> >
>> >
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>> >
>> >
>> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>> > org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat
>> >
>> >
>> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
>> >
>> org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat
>> > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat
>> >
>> >
>> org.ecli

Question about solr commits

2020-10-07 Thread yaswanth kumar
I have the below in my solrconfig.xml

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.Data.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
  </autoSoftCommit>
</updateHandler>

Does this mean that even though we are always sending data with commit=false on
the update Solr API, the above should do the commit every minute (60000 ms),
right?
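
For context, we send updates roughly like this (host and collection name are
placeholders):

curl 'http://localhost:8983/solr/mycollection/update?commit=false' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"1","title_s":"example"}]'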

-- 
Thanks & Regards,
Yaswanth Kumar Konathala.
yaswanth...@gmail.com


Re: Term too complex for spellcheck.q param

2020-10-07 Thread gnandre
Thanks. Is this going to be fixed in some future version?

On Wed, Oct 7, 2020, 4:15 PM Mike Drob  wrote:

> Right now the only solution is to use a shorter term.
>
> In a fuzzy query you could also try using a lower edit distance e.g. term~1
> (default is 2), but I’m not sure what the syntax for a spellcheck would be.
>
> Mike
>
> On Wed, Oct 7, 2020 at 2:59 PM gnandre  wrote:
>
> > Hi,
> >
> > I am getting the following error when I pass '
> > 김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
> > ' in the spellcheck.q param. How can I avoid this error? I am using Solr 8.5.2
> >
> > {
> >   "error": {
> > "code": 500,
> > "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
> > 유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
> > "trace":
> "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException:
> > Term too complex:
> > 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마\n\tat
> >
> >
> org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)\n\tat
> >
> >
> org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:365)\n\tat
> >
> >
> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:125)\n\tat
> >
> >
> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:92)\n\tat
> >
> >
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:425)\n\tat
> >
> >
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:376)\n\tat
> >
> >
> org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:196)\n\tat
> >
> >
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)\n\tat
> >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)\n\tat
> >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
> > org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
> >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
> >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
> >
> >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
> >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> >
> >
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
> >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> > org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat
> >
> >
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
> > org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat
> > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat
> >
> >
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat
> > org.eclipse.jetty.io
> >
> .AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
> > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
> > org.e

Re: Term too complex for spellcheck.q param

2020-10-07 Thread Mike Drob
Right now the only solution is to use a shorter term.

In a fuzzy query you could also try using a lower edit distance e.g. term~1
(default is 2), but I’m not sure what the syntax for a spellcheck would be.
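
If spellcheck maps the same knob, lowering maxEdits on DirectSolrSpellChecker
in solrconfig.xml might be worth a try. A sketch, assuming the stock
component layout (the field name is a placeholder):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- lower the maximum edit distance from the default of 2 -->
    <int name="maxEdits">1</int>
  </lst>
</searchComponent>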

Mike

On Wed, Oct 7, 2020 at 2:59 PM gnandre  wrote:

> Hi,
>
> I am getting the following error when I pass '
> 김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
> ' in the spellcheck.q param. How can I avoid this error? I am using Solr 8.5.2
>
> {
>   "error": {
> "code": 500,
> "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
> 유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
> "trace": "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException:
> Term too complex:
> 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마\n\tat
>
> org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)\n\tat
>
> org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:365)\n\tat
>
> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:125)\n\tat
>
> org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:92)\n\tat
>
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:425)\n\tat
>
> org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:376)\n\tat
>
> org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:196)\n\tat
>
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)\n\tat
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)\n\tat
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
>
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
>
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat
>
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
> org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat
>
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat
> org.eclipse.jetty.io
> .AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat
>
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat
>
> org.eclipse.j

Term too complex for spellcheck.q param

2020-10-07 Thread gnandre
Hi,

I am getting the following error when I pass '
김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
' in the spellcheck.q param. How can I avoid this error? I am using Solr 8.5.2

{
  "error": {
"code": 500,
"msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
"trace": "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException:
Term too complex:
김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마\n\tat
org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)\n\tat
org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:365)\n\tat
org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:125)\n\tat
org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:92)\n\tat
org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:425)\n\tat
org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:376)\n\tat
org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:196)\n\tat
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\tat
java.lang.Thread.run(Thread.java:748)\nCaused by:
org.apache.lucene.util.automaton.TooComplexToDeter

Using fromIndex for single collection

2020-10-07 Thread Irina Kamalova
I suppose my question is very simple.
Am I right that if I want to use joins within a single collection in
SolrCloud across several shards,
I need to use the "fromIndex" syntax?
According to the documentation, I should use it only if I have different
collections.
I have one single collection across multiple shards, and I didn't find a way
to join documents correctly except with the "fromIndex" syntax.

Am I correct?
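
For concreteness, the kind of join I mean looks like this (a sketch; the
field names and collection name are placeholders):

q={!join from=parent_id to=id fromIndex=mycollection}doc_type:child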

Best regards,
Irina Kamalova


Re: Java GC issue investigation

2020-10-07 Thread Walter Underwood
First thing is to stop using CMS and use G1GC.

We’ve been using these settings with over a hundred machines
in prod for nearly four years.

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 7, 2020, at 2:39 AM, Karol Grzyb  wrote:
> 
> Hi Matthew, Erick!
> 
> Thank you very much for the feedback, I'll try to convince them to
> reduce the heap size.
> 
> current GC settings:
> 
> -XX:+CMSParallelRemarkEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:+ParallelRefProcEnabled
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC
> -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:ConcGCThreads=4
> -XX:MaxTenuringThreshold=8
> -XX:NewRatio=3
> -XX:ParallelGCThreads=4
> -XX:PretenureSizeThreshold=64m
> -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90
> 
> Kind regards,
> Karol
> 
> 
> wt., 6 paź 2020 o 16:52 Erick Erickson  napisał(a):
>> 
>> 12G is not that huge, it’s surprising that you’re seeing this problem.
>> 
>> However, there are a couple of things to look at:
>> 
>> 1> If you’re saying that you have 16G total physical memory and are 
>> allocating 12G to Solr, that’s an anti-pattern. See:
>> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>> If at all possible, you should allocate between 25% and 50% of your physical 
>> memory to Solr...
>> 
>> 2> what garbage collector are you using? G1GC might be a better choice.
>> 
>>> On Oct 6, 2020, at 10:44 AM, matthew sporleder  wrote:
>>> 
>>> Your index is so small that it should easily get cached into OS memory
>>> as it is accessed.  Having a too-big heap is a known problem
>>> situation.
>>> 
>>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-HowmuchheapspacedoIneed?
>>> 
>>> On Tue, Oct 6, 2020 at 9:44 AM Karol Grzyb  wrote:
 
 Hi Matthew,
 
 Thank you for the answer. I cannot reproduce the setup locally; I'll
 try to convince them to reduce Xmx. I guess they will not agree
 to 1GB, but to something less than 12G for sure.
 And to have a proper dev setup, because for now we can only test prod
 or stage, which are difficult to adjust.
 
 Is being stuck in GC common behaviour when the index is small compared
 to available heap during bigger load? I was more worried about the
 ratio of heap to total host memory.
 
 Regards,
 Karol
 
 
 wt., 6 paź 2020 o 14:39 matthew sporleder  
 napisał(a):
> 
> You have a 12G heap for a 200MB index?  Can you just try changing Xmx
> to, like, 1g ?
> 
> On Tue, Oct 6, 2020 at 7:43 AM Karol Grzyb  wrote:
>> 
>> Hi,
>> 
>> I'm involved in investigating an issue with huge GC overhead
>> that happens during performance tests on Solr nodes. The Solr version is
>> 6.1. The last tests were done on a staging env, and we ran into problems at
>> <100 requests/second.
>> 
>> The size of the index itself is ~200MB ~ 50K docs
>> Index has small updates every 15min.
>> 
>> 
>> 
>> Queries involve sorting and faceting.
>> 
>> I've gathered some heap dumps, I can see from them that most of heap
>> memory is retained because of object of following classes:
>> 
>> -org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector
>> (>4G, 91% of heap)
>> -org.apache.lucene.search.grouping.AbstractSecondPassGroupingCollector$SearchGroupDocs
>> -org.apache.lucene.search.FieldValueHitQueue$MultiComparatorsFieldValueHitQueue
>> -org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector
>> (>3.7G 76% of heap)
>> 
>> 
>> 
>> Based on the information above, is there anything generic that can be
>> looked at as a source of potential improvement, without diving deeply
>> into the schema and queries (which may be very difficult to change at this
>> moment)? I don't see docValues being enabled - could this help? If
>> I read the docs correctly, it's specifically helpful when there are
>> many sorts/groupings/facets? Or I
>> 
>> Additionally I see that many threads are blocked on LRUCache.get;
>> should I recommend switching to FastLRUCache?
>> 
>> Also, I wonder if -Xmx12288m for java heap is not too much for 16G
>> memory? I see some (~5/s) page faults in Dynatrace during the biggest
>> traffic.
>> 
>> Thank you very much for any help,
>> Kind regards,
>> Karol
>> 



Re: Help using Noggit for streaming JSON data

2020-10-07 Thread Christopher Schultz
Yonik,

Thanks for the reply, and apologies for the long delay in responding. Also
apologies for top-posting; I'm writing from my phone. :(

Oh, of course... simply subclass the CharArr.

In my case, I should be able to immediately base64-decode the value (which
saves 1/4 of the in-memory representation) and, if I do everything correctly,
may be able to stream directly to my database.

With a *very* complicated CharArr implementation of course :)
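
Something like this minimal sketch, perhaps, assuming (per Yonik's note
below) that CharArr's write(char) and write(char[],int,int) are the methods
to override; the Writer sink standing in for my database stream is
hypothetical, and the base64 decoding is omitted:

import org.noggit.CharArr;

public class StreamingCharArr extends CharArr {
    private final java.io.Writer sink; // placeholder for the real database stream

    public StreamingCharArr(java.io.Writer sink) {
        super(8192); // small internal buffer; content is forwarded instead of accumulated
        this.sink = sink;
    }

    @Override
    public void write(char c) {
        try {
            sink.write(c); // push each decoded char straight to the sink
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void write(char[] b, int off, int len) {
        try {
            sink.write(b, off, len); // bulk path for runs without escapes
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }
}

// usage while parsing: parser.getString(new StreamingCharArr(dbWriter));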

Thanks,
-chris

> On Sep 17, 2020, at 12:22, Yonik Seeley  wrote:
> 
> See this method:
> 
>  /** Reads a JSON string into the output, decoding any escaped characters.
> */
>  public void getString(CharArr output) throws IOException
> 
> And then the idea is to create a subclass of CharArr to incrementally
> handle the string that is written to it.
> You could overload write methods, or perhaps reserve() to flush/handle the
> buffer when it reaches a certain size.
> 
> -Yonik
> 
> 
>> On Thu, Sep 17, 2020 at 11:48 AM Christopher Schultz <
>> ch...@christopherschultz.net> wrote:
>> 
>> All,
>> 
>> Is this an appropriate forum for asking questions about how to use
>> Noggit? The Github doesn't have any discussions available and filing an
>> "issue" to ask a question is kinda silly. I'm happy to be redirected to
>> the right place if this isn't appropriate.
>> 
>> I've been able to figure out most things in Noggit by reading the code,
>> but I have a new use-case where I expect that I'll have very large
>> values (base64-encoded binary) and I'd like to stream those rather than
>> calling parser.getString() and getting a potentially huge string coming
>> back. I'm streaming into a database so I never need the whole string in
>> one place at one time.
>> 
>> I was thinking something like this:
>> 
>> JSONParser p = ...;
>> 
>> int evt = p.nextEvent();
>> if(JSONParser.STRING == evt) {
>>  // Start streaming
>>  boolean eos = false;
>>  while(!eos) {
>>char c = p.getChar();
>>if(c == '"') {
>>  eos = true;
>>} else {
>>  append to stream
>>}
>>  }
>> }
>> 
>> But getChar() is not public. The only "documentation" I've really been
>> able to find for Noggit is this post from Yonik back in 2014:
>> 
>> http://yonik.com/noggit-json-parser/
>> 
>> It mostly says "Noggit is great!" and specifically mentions huge, long
>> strings but does not actually show any Java code to consume the JSON
>> data in any kind of streaming way.
>> 
>> The ObjectBuilder class is a great user of JSONParser, but it just
>> builds standard objects and would consume tons of memory in my case.
>> 
>> I know for sure that Solr consumes huge JSON documents and I'm assuming
>> that Noggit is being used in that situation, though I have not looked at
>> the code used to do that.
>> 
>> Any suggestions?
>> 
>> -chris
>> 


Solr 8.6.2 - Admin UI Issue

2020-10-07 Thread Vinay Rajput
Hi All,

We are currently using Solr 7.3.1 in cloud mode and planning to upgrade.
When I bootstrapped Solr 8.6.2 on my local machine and uploaded all
necessary configs, I noticed one issue in the Admin UI.

If I select a collection and go to Files, it shows the content tree with
all files and folders present in that collection's config. In Solr 8.6.2, it
somehow does not show the folders correctly. In my screenshot, you can see
that velocity and xslt are folders, and we have some config files inside
these two folders. Because of this issue, I can't click on folder nodes to
see their child nodes. I checked the network calls, and it looks like we are
getting the correct data from Solr, so it looks like an Admin UI issue to
me.

Does anyone know if this is a *known issue*, or am I missing something here?
Has anyone noticed a similar issue? I can confirm that it works fine
with Solr 7.3.1.

[screenshots attached: left image is the Files view in 8.6.2, right image is 7.3.1]

Thanks,
Vinay


Re: Java GC issue investigation

2020-10-07 Thread Karol Grzyb
Hi Matthew, Erick!

Thank you very much for the feedback, I'll try to convince them to
reduce the heap size.

current GC settings:

-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:ConcGCThreads=4
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:ParallelGCThreads=4
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90

Kind regards,
Karol


wt., 6 paź 2020 o 16:52 Erick Erickson  napisał(a):
>
> 12G is not that huge, it’s surprising that you’re seeing this problem.
>
> However, there are a couple of things to look at:
>
> 1> If you’re saying that you have 16G total physical memory and are 
> allocating 12G to Solr, that’s an anti-pattern. See:
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> If at all possible, you should allocate between 25% and 50% of your physical 
> memory to Solr...
>
> 2> what garbage collector are you using? G1GC might be a better choice.
>
> > On Oct 6, 2020, at 10:44 AM, matthew sporleder  wrote:
> >
> > Your index is so small that it should easily get cached into OS memory
> > as it is accessed.  Having a too-big heap is a known problem
> > situation.
> >
> > https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-HowmuchheapspacedoIneed?
> >
> > On Tue, Oct 6, 2020 at 9:44 AM Karol Grzyb  wrote:
> >>
> >> Hi Matthew,
> >>
> >> Thank you for the answer. I cannot reproduce the setup locally; I'll
> >> try to convince them to reduce Xmx. I guess they will not agree
> >> to 1GB, but to something less than 12G for sure.
> >> And to have a proper dev setup, because for now we can only test prod
> >> or stage, which are difficult to adjust.
> >>
> >> Is being stuck in GC common behaviour when the index is small compared
> >> to available heap during bigger load? I was more worried about the
> >> ratio of heap to total host memory.
> >>
> >> Regards,
> >> Karol
> >>
> >>
> >> wt., 6 paź 2020 o 14:39 matthew sporleder  
> >> napisał(a):
> >>>
> >>> You have a 12G heap for a 200MB index?  Can you just try changing Xmx
> >>> to, like, 1g ?
> >>>
> >>> On Tue, Oct 6, 2020 at 7:43 AM Karol Grzyb  wrote:
> 
>  Hi,
> 
>  I'm involved in investigating an issue with huge GC overhead
>  that happens during performance tests on Solr nodes. The Solr version is
>  6.1. The last tests were done on a staging env, and we ran into problems at
>  <100 requests/second.
> 
>  The size of the index itself is ~200MB ~ 50K docs
>  Index has small updates every 15min.
> 
> 
> 
>  Queries involve sorting and faceting.
> 
>  I've gathered some heap dumps, I can see from them that most of heap
>  memory is retained because of object of following classes:
> 
>  -org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector
>  (>4G, 91% of heap)
>  -org.apache.lucene.search.grouping.AbstractSecondPassGroupingCollector$SearchGroupDocs
>  -org.apache.lucene.search.FieldValueHitQueue$MultiComparatorsFieldValueHitQueue
>  -org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector
>  (>3.7G 76% of heap)
> 
> 
> 
>  Based on the information above, is there anything generic that can be
>  looked at as a source of potential improvement, without diving deeply
>  into the schema and queries (which may be very difficult to change at this
>  moment)? I don't see docValues being enabled - could this help? If
>  I read the docs correctly, it's specifically helpful when there are
>  many sorts/groupings/facets? Or I
> 
>  Additionally I see that many threads are blocked on LRUCache.get;
>  should I recommend switching to FastLRUCache?
> 
>  Also, I wonder if -Xmx12288m for java heap is not too much for 16G
>  memory? I see some (~5/s) page faults in Dynatrace during the biggest
>  traffic.
> 
>  Thank you very much for any help,
>  Kind regards,
>  Karol
>