Re: Exact matching without using new fields
Thanks for replying, Dave. I am afraid I am looking for a non-index-time, i.e. query-time, solution. Actually, in my case I am expecting both documents to be returned from your example. I am just trying to avoid returning documents which contain only tokenized versions of the search query when it is enclosed within double quotes to indicate an exact-match expectation. E.g., the search query "information retrieval" should match documents like the following:

doc 1: "information retrieval"
doc 2: "Advanced information retrieval with Solr"

but should NOT match documents like:

doc 3: "informed retrieval"
doc 4: "information extraction" (assuming 'extraction' was a configured synonym of 'retrieval')
doc 5: "INFORMATION RETRIEVAL"

etc. I am also OK with these documents showing up, as long as they show up at the bottom. Also, a query-time solution is a must.

On Tue, Jan 19, 2021 at 12:22 PM David R wrote:
> We had the same requirement. Just to echo back your requirements, I understand your case to be this. Given these 2 doc titles:
>
> doc 1: "information retrieval"
> doc 2: "Advanced information retrieval with Solr"
>
> You want a phrase search for "information retrieval" to find both documents, but an EXACT phrase search for "information retrieval" to find doc #1 only.
>
> If that's true, and case-sensitive search isn't a requirement, I indexed this in the token stream, with adjacent positions of course:
>
> START information retrieval END
> START advanced information retrieval with solr END
>
> And with our custom query parser, when an EXACT operator is found, I tokenize the query to match the first case. Otherwise pass it through.
>
> Needs custom analyzers on the query and index sides to generate the correct token sequences.
>
> It's worked out well for our case.
> Dave
>
> From: gnandre
> Sent: Tuesday, January 19, 2021 4:07 PM
> To: solr-user@lucene.apache.org
> Subject: Exact matching without using new fields
>
> Hi,
>
> I am aware that to do exact matching (only whatever is provided inside double quotes should be matched) in Solr, we can copy existing fields with the help of copyFields into new fields that have very minimal tokenization or no tokenization (e.g. using KeywordTokenizer or using string field type)
>
> However this solution is expensive in terms of index size because it might almost double the size of the existing index.
>
> Is there any inexpensive way of achieving exact matches from the query side. e.g. boost the original tokens more at query time compared to their tokens?
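Dave's sentinel-token scheme above can be illustrated with a tiny, self-contained sketch. This is NOT his actual custom parser or analyzer; the analyze() stub, the START/END token names, and the phrase-matching helper are all assumptions made purely for illustration of the idea:

```python
# Illustrative sketch of the sentinel-token approach for EXACT matching.
# analyze() stands in for the shared index/query analyzer (here: just
# lowercase + whitespace split); a real setup uses custom Solr analyzers.

def analyze(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def index_tokens(title):
    """Index-side token stream: sentinel-wrapped, adjacent positions."""
    return ["START"] + analyze(title) + ["END"]

def exact_query_tokens(query):
    """Query-side rewrite applied when the EXACT operator is detected."""
    return ["START"] + analyze(query) + ["END"]

def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens occur as a contiguous run in doc_tokens."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

docs = ["information retrieval",
        "Advanced information retrieval with Solr"]

# EXACT form: sentinels force the phrase to span the whole field value,
# so only the first title matches.
q_exact = exact_query_tokens("information retrieval")
matches = [d for d in docs if phrase_match(index_tokens(d), q_exact)]

# A plain phrase query (no sentinels) still matches both titles.
q_plain = analyze("information retrieval")
```

Running this, `matches` contains only the first title, while `q_plain` matches both, which is exactly the split between phrase search and EXACT search described above.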
Exact matching without using new fields
Hi, I am aware that to do exact matching (only whatever is provided inside double quotes should be matched) in Solr, we can copy existing fields, with the help of copyField directives, into new fields that have minimal or no tokenization (e.g. using KeywordTokenizer or the string field type). However, this solution is expensive in terms of index size because it might almost double the size of the existing index. Is there any inexpensive way of achieving exact matches from the query side, e.g. boosting the original tokens at query time relative to their analyzed tokens?
FST building precaution
Hi, the following comment appears in https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/fst/package-info.java:

"Input values (keys). These must be provided to Builder in Unicode code point (UTF8 or UTF32) sorted order. Note that sorting by Java's String.compareTo, which is UTF16 sorted order, is not correct and can lead to exceptions while building the FST"

Can someone please suggest how to achieve this?
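The usual way to satisfy this requirement is to sort the keys by their UTF-8 byte representations (unsigned byte order is the same as Unicode code point order) instead of with String.compareTo; in Lucene, comparing BytesRef values in their natural order does this. The small Python sketch below (illustrative only, not Lucene code) shows why the two orders disagree: a supplementary character is encoded in UTF-16 as a surrogate pair starting at 0xD800, so it sorts BEFORE a BMP character above U+E000, the opposite of code point order.

```python
# Two keys that sort differently under UTF-16 vs. code point order:
# U+FF66 (halfwidth katakana WO) and U+10000 (a supplementary character).
keys = ["\uff66", "\U00010000"]

# UTF-16 code unit order -- what Java's String.compareTo effectively does.
# U+10000 encodes as the surrogate pair D800 DC00, and 0xD800 < 0xFF66.
utf16_order = sorted(keys, key=lambda s: s.encode("utf-16-be"))

# UTF-8 byte order == Unicode code point order -- what the FST Builder
# requires. U+FF66 -> EF BD A6, U+10000 -> F0 90 80 80, and 0xEF < 0xF0.
utf8_order = sorted(keys, key=lambda s: s.encode("utf-8"))

# utf16_order puts U+10000 first; utf8_order puts U+FF66 first.
```

So if the keys are kept as byte sequences (UTF-8) and sorted by unsigned byte comparison before being fed to the Builder, the ordering requirement is met.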
distrib.requestTimes and distrib.totalTime metric always show 0 for any sub-metric
The *distrib.requestTimes and *distrib.totalTime metrics always show 0 for every sub-metric. Only the *local.requestTimes and *local.totalTime metrics have non-zero values. This is what we see when we hit the solr:8983/solr/admin/metrics endpoint, e.g.:

"QUERY./select.distrib.requestTimes":{
  "count":0, "meanRate":0.0, "1minRate":0.0, "5minRate":0.0, "15minRate":0.0,
  "min_ms":0.0, "max_ms":0.0, "mean_ms":0.0, "median_ms":0.0, "stddev_ms":0.0,
  "p75_ms":0.0, "p95_ms":0.0, "p99_ms":0.0, "p999_ms":0.0},
"QUERY./select.local.requestTimes":{
  "count":921, "meanRate":0.016278013505962197, "1minRate":0.02502213358051701,
  "5minRate":0.01792972725206014, "15minRate":0.016913129796499247,
  "min_ms":0.092099, "max_ms":27.833606, "mean_ms":1.5546483254237826,
  "median_ms":0.211898, "stddev_ms":2.353088809601306, "p75_ms":0.278897,
  "p95_ms":5.547842, "p99_ms":5.547842, "p999_ms":9.239902},
"QUERY./select.requestTimes":{
  "count":921, "meanRate":0.01627801345713971, "1minRate":0.02502213358051701,
  "5minRate":0.01792972725206014, "15minRate":0.016913129796499247,
  "min_ms":0.094899, "max_ms":27.840706, "mean_ms":1.5588447262406753,
  "median_ms":0.216198, "stddev_ms":2.352629359382386, "p75_ms":0.284497,
  "p95_ms":5.551242, "p99_ms":5.551242, "p999_ms":9.242902},

I am using Solr 8.5.2 in standalone mode. I have some queries that are distributed in the sense that they use the shards parameter to distribute the query among different cores. I was expecting the distrib metrics to have some value when I execute these distributed queries. Also, why is there a third metric (QUERY./select.requestTimes) besides local and distrib?
Duplicate entries for request handlers in Solr metric reporter
Hi, I have hooked up Grafana dashboards with the Solr 8.5.2 Prometheus exporter. For some reason, some dashboards like Requests and Timeouts are not showing any data. When I took a look at the corresponding data from the Prometheus exporter, it showed two entries per search request handler: the first with a count of 0 and the second with the correct count. I am not sure why the entry with count 0 appears for all search request handlers. I checked the configuration and there is no duplication of request handlers in solrconfig.xml. My guess is that Grafana is picking up this first entry and therefore does not show any data. E.g.:

solr_metrics_core_requests_total{category="QUERY",handler="/questions",core="answers",base_url="http://localhost:8983/solr",} 0.0
solr_metrics_core_requests_total{category="QUERY",handler="/questions",core="answers",base_url="http://localhost:8983/solr",} 4534446.0
Error false and Error true in Solr logs
Hi, what do the "Error false" and "Error true" flags shown against errors in the Solr Admin UI logging screen mean?
Re: Term too complex for spellcheck.q param
Is there a way to truncate the spellcheck.q param value from the Solr side?

On Wed, Oct 7, 2020, 6:22 PM gnandre wrote:
> Thanks. Is this going to be fixed in some future version?
>
> On Wed, Oct 7, 2020, 4:15 PM Mike Drob wrote:
>> Right now the only solution is to use a shorter term.
>>
>> In a fuzzy query you could also try using a lower edit distance e.g. term~1 (default is 2), but I’m not sure what the syntax for a spellcheck would be.
>>
>> Mike
>>
>> On Wed, Oct 7, 2020 at 2:59 PM gnandre wrote:
>>> Hi,
>>>
>>> I am getting the following error when I pass '김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마' in the spellcheck.q param. How to avoid this error? I am using Solr 8.5.2.
>>>
>>> {
>>>   "error": {
>>>     "code": 500,
>>>     "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
>>>     "trace": "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException: Term too complex: 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마 [...]
Re: Term too complex for spellcheck.q param
Thanks. Is this going to be fixed in some future version?

On Wed, Oct 7, 2020, 4:15 PM Mike Drob wrote:
> Right now the only solution is to use a shorter term.
>
> In a fuzzy query you could also try using a lower edit distance e.g. term~1 (default is 2), but I’m not sure what the syntax for a spellcheck would be.
>
> Mike
>
> On Wed, Oct 7, 2020 at 2:59 PM gnandre wrote:
>> Hi,
>>
>> I am getting the following error when I pass '김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마' in the spellcheck.q param. How to avoid this error? I am using Solr 8.5.2.
>>
>> {
>>   "error": {
>>     "code": 500,
>>     "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
>>     "trace": "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException: Term too complex: 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마 [...]
Term too complex for spellcheck.q param
Hi, I am getting following error when I pass ' 김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마 ' in spellcheck.q param. How to avoid this error? I am using Solr 8.5.2 { "error": { "code": 500, "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com 유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마", "trace": "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException: Term too complex: 김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마\n\tat org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)\n\tat org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:365)\n\tat org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:125)\n\tat org.apache.lucene.search.FuzzyTermsEnum.(FuzzyTermsEnum.java:92)\n\tat org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:425)\n\tat org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:376)\n\tat org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:196)\n\tat org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by:
Returning fields in a specific order
Hi, I have a use-case where I want to compare the stored field values of Solr documents from two different Solr instances. I can use a diff tool to compare them, but only if the fields are returned in the same order in both responses. I tried setting the fl param with all the fields specified in a particular order; however, the returned results do not follow the order given in the fl param. Is there any way to achieve this behavior in Solr?
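Since the response field order is not guaranteed, one workaround is to normalize both responses before diffing instead of trying to force Solr to order the fields. A minimal sketch (the sample documents are made up for illustration):

```python
# Canonicalize each document by serializing it with sorted keys, so that
# field-order differences between two Solr instances disappear before
# the diff. The two docs below are hypothetical examples.
import json

doc_from_instance_a = {"id": "1", "title": "foo", "body": "bar"}
doc_from_instance_b = {"body": "bar", "id": "1", "title": "foo"}

def canonical(doc):
    """Serialize a document deterministically: sorted keys, stable form."""
    return json.dumps(doc, sort_keys=True, ensure_ascii=False)

# After canonicalization, the two representations compare equal even
# though the field order in the raw responses differed.
```

Applying canonical() to every document in both responses (and writing the results to two files) makes an ordinary diff tool usable regardless of the order Solr returned the fields in.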
Re: Difference in q.op param behavior between Solr 6.3 and Solr 8.5.2
Thanks, this is helpful. I agree. q.op param should not affect fq parameter. I think this is a feature and not a bug. On Wed, Sep 23, 2020 at 4:39 PM Erik Hatcher wrote: > In 6.3 it did that? It shouldn't have. q and fq shouldn't share > parameters. fq's themselves shouldn't, IMO, have global defaults. fq's > need to be stable and often uniquely specified kinds of constraining query > parsers ({!terms/term/field,etc}) or rely on basic Lucene query parser > syntax and be able to stably rely on AND/OR. > > Relevancy tuning on q and friends, tweaking those parameters, shouldn't > affect fq's, to say it a little differently. > > One can fq={!lucene q.op=AND}id:(1 2 3) > > Erik > > > > On Sep 23, 2020, at 4:23 PM, gnandre wrote: > > > > Is there a way to set default operator as AND for fq parameter in Solr > > 8.5.2 now? > > > > On Tue, Sep 22, 2020 at 7:44 PM gnandre wrote: > > > >> In 6.3, q.op param used to affect q as well fq param behavior. E.g. if > >> q.op is set to AND and fq is set to id:(1 2 3), no results will show up > but > >> if it is set to OR then all 3 results will show up. This does not > happen in > >> Solr 8.5.2 anymore. > >> > >> Is this a bug? What does one need to do in Solr 8.5.2 to achieve the > same > >> behavior besides passing the operator directly in fq param i.e. id:(1 > OR 2 > >> OR 3) > >> > >
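Erik's per-fq local-params suggestion above can be sketched as request construction. The host, core name, and the use of Python's urllib here are illustrative assumptions, not part of the thread:

```python
# Build a request where the default operator is set per-fq via local
# params ({!lucene q.op=AND}) instead of a global q.op, per the reply
# above. The target URL in the comment is hypothetical.
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "fq": "{!lucene q.op=AND}id:(1 2 3)",
}
query_string = urlencode(params)
# query_string can then be appended to something like
# http://localhost:8983/solr/mycore/select?
```

With this form, only the fq's own parser sees q.op=AND, so relevance tuning on q and friends is unaffected.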
Re: Difference in q.op param behavior between Solr 6.3 and Solr 8.5.2
Is there a way to set default operator as AND for fq parameter in Solr 8.5.2 now? On Tue, Sep 22, 2020 at 7:44 PM gnandre wrote: > In 6.3, q.op param used to affect q as well fq param behavior. E.g. if > q.op is set to AND and fq is set to id:(1 2 3), no results will show up but > if it is set to OR then all 3 results will show up. This does not happen in > Solr 8.5.2 anymore. > > Is this a bug? What does one need to do in Solr 8.5.2 to achieve the same > behavior besides passing the operator directly in fq param i.e. id:(1 OR 2 > OR 3) >
Difference in q.op param behavior between Solr 6.3 and Solr 8.5.2
In 6.3, the q.op param used to affect the behavior of both the q and fq params. E.g., if q.op is set to AND and fq is set to id:(1 2 3), no results show up, but if it is set to OR then all 3 results show up. This no longer happens in Solr 8.5.2. Is this a bug? What does one need to do in Solr 8.5.2 to achieve the same behavior, besides passing the operator directly in the fq param, i.e. id:(1 OR 2 OR 3)?
Re: Solr 8.5.2 - Solr shards param does not work without localhost
Please ignore the spaces in the URLs. I have updated the calls by removing the spaces below:

http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=my.domain.com/solr/another_core&fl=*

http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=localhost:8983/solr/another_core&fl=*

On Thu, Aug 6, 2020 at 7:59 PM gnandre wrote:
> Hi,
>
> In Solr 6.3 I was able to use the following shards query:
>
> http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=my.domain.com/solr/another_core&fl=*
>
> It does not work in Solr 8.5.2 anymore unless I pass localhost instead of my domain in the shards param value, as follows:
>
> http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=localhost:8983/solr/another_core&fl=*
>
> This is a master-slave setup and not a cloud setup.
Solr 8.5.2 - Solr shards param does not work without localhost
Hi, in Solr 6.3 I was able to use the following shards query:

http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=my.domain.com/solr/another_core&fl=*

It does not work in Solr 8.5.2 anymore unless I pass localhost instead of my domain in the shards param value, as follows:

http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=localhost:8983/solr/another_core&fl=*

This is a master-slave setup and not a cloud setup.
Solr docker image works with image option but not with build option in docker-compose
Hi, I am using the Solr docker image 8.5.2-slim from https://hub.docker.com/_/solr. I use it as a base image and then add some more stuff to it with my custom Dockerfile. The final docker image builds successfully. After that, when I try to use it in docker-compose.yml (with the build option) to start a Solr service, it complains about not having permission to create directories under the /var/solr path. I have given the solr user read/write permission for /var/solr in the Dockerfile. Also, when I use the image option instead of the build option in docker-compose.yml for the same image, it does not throw any such errors and Solr starts without any issues. Any clue why this might be happening?
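One common cause of this symptom is a build step that runs as root and leaves files under /var/solr (or the custom additions) owned by root, so the solr user cannot create directories there at startup. A hypothetical Dockerfile sketch of the usual fix is below; the COPY path and configset name are assumptions, not from the thread:

```dockerfile
# Hypothetical sketch -- paths and names are illustrative assumptions.
FROM solr:8.5.2-slim

# Build steps that need root (package installs, COPYing files, etc.)
USER root
COPY --chown=solr:solr myconfig/ /opt/solr/server/solr/configsets/myconfig/
# Re-assert ownership of the Solr runtime directory in case an earlier
# step changed it.
RUN chown -R solr:solr /var/solr

# Drop back to the unprivileged solr user before the entrypoint runs,
# matching what the official image expects.
USER solr
```

It is also worth checking whether docker-compose mounts a host volume over /var/solr in the build case; a host directory owned by root would produce the same permission error regardless of the image contents.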
Re: Solr 8.5.2 indexing issue
It seems that the issue is not with the reference_url field itself. There is one copyField with reference_url as its source and another field, url_path, as its destination. The url_path field's type definition includes SynonymGraphFilterFactory and FlattenGraphFilterFactory. If I remove SynonymGraphFilterFactory and FlattenGraphFilterFactory from that field type then indexing works; otherwise it throws the same error (IndexOutOfBoundsException).

On Sun, Jun 28, 2020 at 9:06 AM Erick Erickson wrote:
> How are you sending this to Solr? I just tried 8.5, submitting that doc through the admin UI and it works fine.
> I defined "asset_id" as the same type as your reference_url field.
>
> And does the log on the Solr node that tries to index this give any more info?
>
> Best,
> Erick
>
>> On Jun 27, 2020, at 10:45 PM, gnandre wrote:
>>
>> {
>>   "asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",
>>   "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"}
Re: Downsides to applying WordDelimiterFilter twice in analyzer chain
Here are links to images for the Analysis tab.

https://pasteboard.co/JfFTYu6.png
https://pasteboard.co/JfFUYXf.png

On Wed, Jul 1, 2020 at 3:03 PM gnandre wrote:
> I am doing that already but it does not help.
>
> Here is the complete analyzer chain:
>
> <analyzer type="index">
>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   <filter class="solr.WordDelimiterFilterFactory" protected="protect.txt" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose"/>
>   <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> <analyzer type="query">
>   <filter class="solr.WordDelimiterFilterFactory" protected="protect.txt" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose"/>
>   <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en_query.txt" ignoreCase="true" expand="true"/>
> </analyzer>
>
> [image: image.png]
>
> [image: image.png]
>
> On Wed, Jul 1, 2020 at 12:29 PM Erick Erickson wrote:
>> Why not just specify preserveOriginal and follow by a lowerCaseFilter and use one wordDelimiterFilterFactory?
>>
>> Best,
>> Erick
>>
>>> On Jul 1, 2020, at 11:05 AM, gnandre wrote:
>>>
>>> Hi,
>>>
>>> To satisfy one use-case, I need to apply WordDelimiterFilter with splitOnCaseChange with 0 once and then with 1 again. Are there some downsides to this approach?
>>>
>>> Use-case is to be able to match results when indexed content is my.camelCase and search query is camelcase.
Re: Downsides to applying WordDelimiterFilter twice in analyzer chain
I am doing that already but it does not help. Here is the complete analyzer chain.

[image: image.png]

[image: image.png]

On Wed, Jul 1, 2020 at 12:29 PM Erick Erickson wrote:
> Why not just specify preserveOriginal and follow by a lowerCaseFilter and use one wordDelimiterFilterFactory?
>
> Best,
> Erick
>
>> On Jul 1, 2020, at 11:05 AM, gnandre wrote:
>>
>> Hi,
>>
>> To satisfy one use-case, I need to apply WordDelimiterFilter with splitOnCaseChange with 0 once and then with 1 again. Are there some downsides to this approach?
>>
>> Use-case is to be able to match results when indexed content is my.camelCase and search query is camelcase.
Downsides to applying WordDelimiterFilter twice in analyzer chain
Hi, to satisfy one use-case, I need to apply WordDelimiterFilter twice: once with splitOnCaseChange set to 0 and then again with it set to 1. Are there any downsides to this approach? The use-case is to be able to match results when the indexed content is my.camelCase and the search query is camelcase.
Solr 8.5.2 indexing issue
Hi, I have the following document which fails to get indexed:

{ "asset_id":"add-ons:576deefef7453a9189aa039b66500eb2", "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"}

I am not sure what is so special about the content in the reference_url field. reference_url field is defined as follows in schema:

It throws the following error:

{"data":{"responseHeader":{"status":400,"QTime":18},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.IndexOutOfBoundsException"],"msg":"Exception writing document id add-ons:576deefef7453a9189aa039b66500eb2 to the index; possible analysis error.","code":400}},"status":400,"statusText":"Bad Request","xhrStatus":"complete", ...}
Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr
Another alternative for master-slave nodes might be parent-child nodes. This was adopted in Python too afaik. On Fri, Jun 19, 2020, 2:07 AM gnandre wrote: > What about blacklist and whitelist for shards? May I suggest blocklist and > safelist? > > On Fri, Jun 19, 2020, 1:45 AM Thomas Corthals > wrote: > >> Since "overseer" is also problematic, I'd like to propose "orchestrator" >> as >> an alternative. >> >> Thomas >> >> Op vr 19 jun. 2020 04:34 schreef Walter Underwood > >: >> >> > We don’t get to decide whether “master” is a problem. The rest of the >> world >> > has already decided that it is a problem. >> > >> > Our task is to replace the terms “master” and “slave” in Solr. >> > >> > wunder >> > Walter Underwood >> > wun...@wunderwood.org >> > http://observer.wunderwood.org/ (my blog) >> > >> > > On Jun 18, 2020, at 6:50 PM, Rahul Goswami >> > wrote: >> > > >> > > I agree with Phill, Noble and Ilan above. The problematic term is >> "slave" >> > > (not master) which I am all for changing if it causes less regression >> > than >> > > removing BOTH master and slave. Since some people have pointed out >> Github >> > > changing the "master" terminology, in my personal opinion, it was not >> a >> > > measured response to addressing the bigger problem we are all trying >> to >> > > tackle. There is no concept of a "slave" branch, and "master" by >> itself >> > is >> > > a pretty generic term (Is someone having "mastery" over a skill a bad >> > > thing?). I fear all it would end up achieving in the end with Github >> is a >> > > mess of broken build scripts at best. >> > > So +1 on "slave" being the problematic term IMO, not "master". >> > > >> > > On Thu, Jun 18, 2020 at 8:19 PM Phill Campbell >> > > wrote: >> > > >> > >> Master - Worker >> > >> Master - Peon >> > >> Master - Helper >> > >> Master - Servant >> > >> >> > >> The term that is not wanted is “slave’. The term “master” is not a >> > problem >> > >> IMO. 
>> > >> >> > >>> On Jun 18, 2020, at 3:59 PM, Jan Høydahl >> > wrote: >> > >>> >> > >>> I support Mike Drob and Trey Grainger. We should re-use the >> > >> leader/replica >> > >>> terminology from Cloud. Even if you hand-configure a master/slave >> > cluster >> > >>> and orchestrate what doc goes to which node/shard, and hand-code >> your >> > >> shards >> > >>> parameter, you will still have a cluster where you’d send updates to >> > the >> > >> leader of >> > >>> each shard and the replicas would replicate the index from the >> leader. >> > >>> >> > >>> Let’s instead find a new good name for the cluster type. Standalone >> > kind >> > >> of works >> > >>> for me, but I see it can be confused with single-node. We have also >> > >> discussed >> > >>> replacing SolrCloud (which is a terrible name) with something more >> > >> descriptive. >> > >>> >> > >>> Today: SolrCloud vs Master/slave >> > >>> Alt A: SolrCloud vs Standalone >> > >>> Alt B: SolrCloud vs Legacy >> > >>> Alt C: Clustered vs Independent >> > >>> Alt D: Clustered vs Manual mode >> > >>> >> > >>> Jan >> > >>> >> > >>>> 18. jun. 2020 kl. 15:53 skrev Mike Drob : >> > >>>> >> > >>>> I personally think that using Solr cloud terminology for this >> would be >> > >> fine >> > >>>> with leader/follower. The leader is the one that accepts updates, >> > >> followers >> > >>>> cascade the updates somehow. The presence of ZK or election doesn’t >> > >> really >> > >>>> change this detail. >> > >>>> >> > >>>> However, if folks feel that it’s confusing, then I can’t tell them >> > that >> > >>>> they’re not confused. Especially when they’re working with others >> who &
Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr
What about blacklist and whitelist for shards? May I suggest blocklist and safelist? On Fri, Jun 19, 2020, 1:45 AM Thomas Corthals wrote: > Since "overseer" is also problematic, I'd like to propose "orchestrator" as > an alternative. > > Thomas > > Op vr 19 jun. 2020 04:34 schreef Walter Underwood : > > > We don’t get to decide whether “master” is a problem. The rest of the > world > > has already decided that it is a problem. > > > > Our task is to replace the terms “master” and “slave” in Solr. > > > > wunder > > Walter Underwood > > wun...@wunderwood.org > > http://observer.wunderwood.org/ (my blog) > > > > > On Jun 18, 2020, at 6:50 PM, Rahul Goswami > > wrote: > > > > > > I agree with Phill, Noble and Ilan above. The problematic term is > "slave" > > > (not master) which I am all for changing if it causes less regression > > than > > > removing BOTH master and slave. Since some people have pointed out > Github > > > changing the "master" terminology, in my personal opinion, it was not a > > > measured response to addressing the bigger problem we are all trying to > > > tackle. There is no concept of a "slave" branch, and "master" by itself > > is > > > a pretty generic term (Is someone having "mastery" over a skill a bad > > > thing?). I fear all it would end up achieving in the end with Github > is a > > > mess of broken build scripts at best. > > > So +1 on "slave" being the problematic term IMO, not "master". > > > > > > On Thu, Jun 18, 2020 at 8:19 PM Phill Campbell > > > wrote: > > > > > >> Master - Worker > > >> Master - Peon > > >> Master - Helper > > >> Master - Servant > > >> > > >> The term that is not wanted is “slave”. The term “master” is not a > > problem > > >> IMO. > > >> > > >>> On Jun 18, 2020, at 3:59 PM, Jan Høydahl > > wrote: > > >>> > > >>> I support Mike Drob and Trey Grainger. We should re-use the > > >> leader/replica > > >>> terminology from Cloud.
Even if you hand-configure a master/slave > > cluster > > >>> and orchestrate what doc goes to which node/shard, and hand-code your > > >> shards > > >>> parameter, you will still have a cluster where you’d send updates to > > the > > >> leader of > > >>> each shard and the replicas would replicate the index from the > leader. > > >>> > > >>> Let’s instead find a new good name for the cluster type. Standalone > > kind > > >> of works > > >>> for me, but I see it can be confused with single-node. We have also > > >> discussed > > >>> replacing SolrCloud (which is a terrible name) with something more > > >> descriptive. > > >>> > > >>> Today: SolrCloud vs Master/slave > > >>> Alt A: SolrCloud vs Standalone > > >>> Alt B: SolrCloud vs Legacy > > >>> Alt C: Clustered vs Independent > > >>> Alt D: Clustered vs Manual mode > > >>> > > >>> Jan > > >>> > > 18. jun. 2020 kl. 15:53 skrev Mike Drob : > > > > I personally think that using Solr cloud terminology for this would > be > > >> fine > > with leader/follower. The leader is the one that accepts updates, > > >> followers > > cascade the updates somehow. The presence of ZK or election doesn’t > > >> really > > change this detail. > > > > However, if folks feel that it’s confusing, then I can’t tell them > > that > > they’re not confused. Especially when they’re working with others > who > > >> have > > less Solr experience than we do and are less familiar with the > > >> intricacies. > > > > Primary/Replica seems acceptable. Coordinator instead of Overseer > > seems > > acceptable. > > > > Would love to see this in 9.0! > > > > Mike > > > > On Thu, Jun 18, 2020 at 8:25 AM John Gallagher > > wrote: > > > > > While on the topic of renaming roles, I'd like to propose finding a > > >> better > > > term than "overseer" which has historical slavery connotations as > > well. > > > Director, perhaps? 
> > > > > > > > > John Gallagher > > > > > > On Thu, Jun 18, 2020 at 8:48 AM Jason Gerlowski < > > gerlowsk...@gmail.com > > >>> > > > wrote: > > > > > >> +1 to rename master/slave, and +1 to choosing terminology distinct > > >> from what's used for SolrCloud. I could be happy with several of > > the > > >> proposed options. Since a good few have been proposed though, > maybe > > >> an eventual vote thread is the most organized way to aggregate the > > >> opinions here. > > >> > > >> I'm less positive about the prospect of changing the name of our > > >> primary git branch. Most projects that contributors might come > > from, > > >> most tutorials out there to learn git, most tools built on top of > > git > > >> - the majority are going to assume "master" as the main branch. I > > >> appreciate the change that Github is trying to effect in changing > > the > > >> default for new projects, but it'll be a long time before that > > >> competes with the huge bulk of projects, documentation, etc.
Re: Getting rid of Master/Slave nomenclature in Solr
+1 for Leader-Follower. How about Publisher-Subscriber? On Wed, Jun 17, 2020 at 5:19 PM Rahul Goswami wrote: > +1 on avoiding SolrCloud terminology. In the interest of keeping it obvious > and simple, may I please suggest primary/secondary? > > On Wed, Jun 17, 2020 at 5:14 PM Atita Arora wrote: > > > I agree with avoiding solr cloud terminology too. > > > > I may suggest going for "prime" and "clone" > > (Short and precise as Master and Slave). > > > > Best, > > Atita > > > > > > > > > > > > On Wed, 17 Jun 2020, 22:50 Walter Underwood, > > wrote: > > > > > I strongly disagree with using the Solr Cloud leader/follower > terminology > > > for non-Cloud clusters. People in my company are confused enough > without > > > using polysemous terminology. > > > > > > “This node is the leader, but it means something different than the > > leader > > > in this other cluster.” I’m dreading that conversation. > > > > > > I like “principal”. How about “clone” for the slave role? That suggests > > > that > > > it does not accept updates and that it is loosely-coupled, only > depending > > > on the state of the no-longer-called-master. > > > > > > Chegg has five production Solr Cloud clusters and one production > > > master/slave > > > cluster, so this is not a hypothetical for us. We have 100+ Solr hosts > in > > > production. > > > > > > wunder > > > Walter Underwood > > > wun...@wunderwood.org > > > http://observer.wunderwood.org/ (my blog) > > > > > > > On Jun 17, 2020, at 1:36 PM, Trey Grainger > wrote: > > > > > > > > Proposal: > > > > "A Solr COLLECTION is composed of one or more SHARDS, which each have > > one > > > > or more REPLICAS.
Each replica can have a ROLE of either: > > > > 1) A LEADER, which can process external updates for the shard > > > > 2) A FOLLOWER, which receives updates from another replica" > > > > > > > > (Note: I prefer "role" but if others think it's too overloaded due to > > the > > > > overseer role, we could replace it with "mode" or something similar) > > > > --- > > > > > > > > To be explicit with the above definitions: > > > > 1) In SolrCloud, the roles of leaders and followers can dynamically > > > change > > > > based upon the status of the cluster. In standalone mode, they can be > > > > changed by manual intervention. > > > > 2) A leader does not have to have any followers (i.e. only one active > > > > replica) > > > > 3) Each shard always has one leader. > > > > 4) A follower can also pull updates from another follower instead of > a > > > > leader (traditionally known as a REPEATER). A repeater is still a > > > follower, > > > > but would not be considered a leader because it can't process > external > > > > updates. > > > > 5) A replica cannot be both a leader and a follower. > > > > > > > > In addition to the above roles, each replica can have a TYPE of one > of: > > > > 1) NRT - which can serve in the role of leader or follower > > > > 2) TLOG - which can only serve in the role of follower > > > > 3) PULL - which can only serve in the role of follower > > > > > > > > A replica's type may be changed automatically in the event that its > > role > > > > changes. > > > > > > > > I think this terminology is consistent with the current > Leader/Follower > > > > usage while also being able to easily accommodate a rename of the > > > historical > > > > master/slave terminology without mental gymnastics or the > introduction > > of > > > > more cognitive load through new terminology. I think adopting the > > > > Primary/Replica terminology will be incredibly confusing given the > > > already > > > > specific and well established meaning of "replica" within Solr.
> > > > > > > > All the Best, > > > > > > > > Trey Grainger > > > > Founder, Searchkernel > > > > https://searchkernel.com > > > > > > > > > > > > > > > > On Wed, Jun 17, 2020 at 3:38 PM Anshum Gupta > > > > wrote: > > > > > > > >> Hi everyone, > > > >> > > > >> Moving a conversation that was happening on the PMC list to the > public > > > >> forum. Most of the following is just me recapping the conversation > > that > > > has > > > >> happened so far. > > > >> > > > >> Some members of the community have been discussing getting rid of > the > > > >> master/slave nomenclature from Solr. > > > >> > > > >> While this may require a non-trivial effort, a general consensus so > > far > > > >> seems to be to start this process and switch over incrementally, if > a > > > >> single change ends up being too big. > > > >> > > > >> There have been a lot of suggestions around what the new > nomenclature > > > might > > > >> look like, a few people don’t want to overlap the naming here with > > what > > > >> already exists in SolrCloud i.e. leader/follower. > > > >> > > > >> Primary/Replica was an option that was suggested based on what other > > > >> vendors are moving towards based on Wikipedia: > > > >> https://en.wikipedia.org/wiki/Master/slave_(technology) > > > >> , however there
Re: RankLib model output format to Solr LTR model format
Thanks Doug, this is very helpful. On Wed, Jun 17, 2020 at 1:11 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > There are several scripts for doing this. > > I might encourage you to check out our Hello LTR library of notebooks, which > has a ranklib training driver, and helpers to log training data, train a > model w/ Ranklib, and search with it. I am using this code for my LTR > contributions in AI Powered Search > > http://github.com/o19s/hello-ltr > > But if you just care about the conversion, check out this code. It's > adapted / inspired by code written by Christine Poerschke with her Ltr For > Bees demo / talk > > https://github.com/o19s/hello-ltr/blob/master/ltr/helpers/convert.py > > Best > -Doug > > > > > On Wed, Jun 17, 2020 at 12:46 PM gnandre wrote: > > > Hi, > > > > Before I start writing my own implementation for converting RankLib's > model > > output format to Solr LTR model format for my own use cases, I just > wanted > > to check if there is any work done on this front already. Any references > > are welcome. > > > > > -- > *Doug Turnbull **| CTO* | OpenSource Connections > <http://opensourceconnections.com>, LLC | 240.476.9983 > Author: Relevant Search <http://manning.com/turnbull>; Contributor: *AI > Powered Search <http://aipoweredsearch.com>* >
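For the simplest case, a linear RankLib model, the conversion Doug describes can be sketched in a few lines of Python. This is an illustrative sketch, not the convert.py linked above: the assumed RankLib output layout, the feature names, and the store name are all assumptions.

```python
def ranklib_linear_to_solr(model_text, model_name, feature_names,
                           store="myFeatureStore"):
    """Turn a RankLib linear-model dump into Solr LTR LinearModel JSON.

    Assumes the last non-comment line of the RankLib output holds
    space-separated "featureId:weight" pairs with 1-based feature ids;
    feature_names maps those ids onto Solr feature-store names.
    """
    lines = [ln for ln in model_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    weights = {}
    for pair in lines[-1].split():
        fid, weight = pair.split(":")
        weights[feature_names[int(fid) - 1]] = float(weight)
    # Dict mirrors the JSON shape Solr's LTR model store accepts
    return {
        "class": "org.apache.solr.ltr.model.LinearModel",
        "name": model_name,
        "store": store,
        "features": [{"name": n} for n in feature_names],
        "params": {"weights": weights},
    }
```

The resulting dict can be serialized with json.dumps and uploaded to the model store. Tree-based RankLib models (LambdaMART etc.) instead need a recursive translation of split nodes, which is the harder part that the linked convert.py deals with.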
RankLib model output format to Solr LTR model format
Hi, Before I start writing my own implementation for converting RankLib's model output format to Solr LTR model format for my own use cases, I just wanted to check if there is any work done on this front already. Any references are welcome.
Re: Lucene query to Solr query
Is this an odd use case where one needs to convert a Lucene query to a Solr query? Isn't this a normal use case when somebody is trying to port their Lucene code to Solr? I mean, is it like an XY problem where I should not even run into this problem in the first place? On Sun, May 31, 2020 at 9:40 AM Mikhail Khludnev wrote: > There's nothing like this now. Presumably one might visit queries and > generate Query DSL json, but it might be a challenging problem. > > On Sun, May 31, 2020 at 3:42 AM gnandre wrote: > > > I think this question here in this thread is similar to my question. > > > > > https://lucene.472066.n3.nabble.com/Lucene-Query-to-Solr-query-td493751.html > > > > > > As suggested in that thread, I do not want to use toString method for > > Lucene query to pass it to the q param in SolrQuery. > > > > I am looking for a function that accepts org.apache.lucene.search.Query > and > > returns org.apache.solr.client.solrj.SolrQuery. Is that possible? > > > > On Sat, May 30, 2020 at 8:08 AM Erick Erickson > > wrote: > > > > > edismax is quite different from straight Lucene. > > > > > > Try attaching &debug=query to the input and > > > you’ll see the difference. > > > > > > Best, > > > Erick > > > > > > > On May 30, 2020, at 12:32 AM, gnandre > wrote: > > > > > > > > Hi, > > > > > > > > I have following query which works fine as a lucene query: > > > > +(topics:132)^0.02607211 (topics:146)^0.008187325 > > > > -asset_id:doc:en:index.html > > > > > > > > But, it does not work if I use it as a solr query with lucene as > > defType. > > > > > > > > For it to work, I need to convert it like following: > > > > q=+((topics:132)^0.02607211 (topics:146)^0.008187325 > > > > +(-(asset_id:doc\:en\:index.html))&defType=edismax&q.op=OR > > > > > > > > Why does it not work as is? AFAIK syntax given in the first query is > > > > supported by edismax. > > > > > > > -- > Sincerely yours > Mikhail Khludnev >
Re: Lucene query to Solr query
I think this question here in this thread is similar to my question. https://lucene.472066.n3.nabble.com/Lucene-Query-to-Solr-query-td493751.html As suggested in that thread, I do not want to use the toString method on a Lucene query to pass it to the q param in SolrQuery. I am looking for a function that accepts org.apache.lucene.search.Query and returns org.apache.solr.client.solrj.SolrQuery. Is that possible? On Sat, May 30, 2020 at 8:08 AM Erick Erickson wrote: > edismax is quite different from straight Lucene. > > Try attaching &debug=query to the input and > you’ll see the difference. > > Best, > Erick > > > On May 30, 2020, at 12:32 AM, gnandre wrote: > > > > Hi, > > > > I have following query which works fine as a lucene query: > > +(topics:132)^0.02607211 (topics:146)^0.008187325 > > -asset_id:doc:en:index.html > > > > But, it does not work if I use it as a solr query with lucene as defType. > > > > For it to work, I need to convert it like following: > > q=+((topics:132)^0.02607211 (topics:146)^0.008187325 > > +(-(asset_id:doc\:en\:index.html))&defType=edismax&q.op=OR > > > > Why does it not work as is? AFAIK syntax given in the first query is > > supported by edismax. > >
Lucene query to Solr query
Hi, I have the following query which works fine as a lucene query: +(topics:132)^0.02607211 (topics:146)^0.008187325 -asset_id:doc:en:index.html But, it does not work if I use it as a solr query with lucene as defType. For it to work, I need to convert it like the following: q=+((topics:132)^0.02607211 (topics:146)^0.008187325 +(-(asset_id:doc\:en\:index.html))&defType=edismax&q.op=OR Why does it not work as is? AFAIK the syntax given in the first query is supported by edismax.
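The colon escaping shown above (asset_id:doc\:en\:index.html) can be generated programmatically rather than by hand. This is a small sketch mirroring the character set that SolrJ's ClientUtils.escapeQueryChars handles; the exact list here is my own approximation, not the SolrJ source:

```python
# Characters the Solr/Lucene query parsers treat as syntax; escaping them
# lets a literal value like "doc:en:index.html" be used in a field query.
# (Approximation of SolrJ's ClientUtils.escapeQueryChars.)
SPECIAL = set('\\+-!():^[]"{}~*?|&;/ ')

def escape_query_chars(value):
    """Backslash-escape query-syntax characters in a literal term."""
    return "".join("\\" + c if c in SPECIAL else c for c in value)
```

With this, the negative clause above could be built as "-asset_id:" + escape_query_chars("doc:en:index.html") instead of escaping each colon manually.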
Does Learning To Rank feature require SolrCloud?
Hi, Do the following features require SolrCloud, or do they work in master-slave mode just fine? 1. Learning to rank (LTR) 2. Distributed IDF
Re: SolrCloud upgrade concern
Thanks for all this information. It clears up a lot of the confusion surrounding the CDCR feature. Although, I should say that if CDCR functionality is so fragile in SolrCloud and not worth pursuing much, does it make sense to add some warning about its possible shortcomings in the documentation? On Thu, May 28, 2020 at 9:02 AM Jan Høydahl wrote: > I had a client who asked a lot about CDCR a few years ago, but I kept > recommending > against it and recommended them to go for Erick’s alternative (2), since > they anyway > needed to replicate their Oracle DBs in each DC as well. Much cleaner > design to let > each cluster have a local datasource and always stay in sync with local DB > than to > replicate both DB and index. > > There are of course use cases where you want to sync a read-only copy of > indices > to multiple DCs. I hope we’ll see a 3rd party tool for that some day, > something that > can sit outside your Solr clusters, monitor ZK of each cluster, and do > some magic :) > > Jan > > > 28. mai 2020 kl. 01:17 skrev Erick Erickson : > > > > The biggest issue with CDCR is it’s rather fragile and requires > monitoring, > > it’s not a “fire and forget” type of functionality. For instance, the > use of the > > tlogs as a queueing mechanism means that if, for any reason, the > communications > > between DCs is broken, the tlogs will grow forever until the connection > is > > re-established. Plus the other issues Jason pointed out. > > > > So yes, some companies do use CDCR to communicate between separate > > DCs. But they also put in some “roll your own” type of monitoring to > ensure > > things don’t go haywire. > > > > Alternatives: > > 1> use something that’s built from the ground up to provide reliable > > messaging between DCs. Kafka or similar has been mentioned. Write > > your updates to the Kafka queue and consume them in both DCs. > > These kinds of solutions have a lot more robustness.
> > > > 2> reproduce your system-of-record rather than Solr in the DCs and > > treat the DCs as separate installations. If you adopt this approach, > > some of the streaming capabilities can be used to monitor that they stay > > in sync. For instance have a background or periodic task that’ll take a > while > > for a complete run wrap two "search" streams in a "unique” decorator, > > anything except an empty result identifies docs not on both DCs. > > > > 3> Oh Dear. This one is “interesting”. Wrap a “topic" stream on DC1 in > >an update decorator for DC2 and wrap both of those in a daemon > decorator. > > That’s gobbledygook, and you’ll have to dig through the docs a bit for > > that to make sense. Essentially the topic stream is one of the very > few > > streams that does not (IIRC) require all values in the fl list be > docValues. > > It fires the first time and establishes a checkpoint, finding all docs > up to that point. > > Thereafter, it’ll get docs that have changed since the last time it > ran. It uses a tiny > > collection for record keeping. Each time the topic stream finds new > docs, it passes > > them to the update stream which sends them to another DC. Wrapping the > whole > > thing in a daemon decorator means it periodically runs in the > background. The one > > shortcoming is that this approach doesn’t propagate deletes. That’s > enough of that > > until you tell us whether it sounds worth pursuing ;) > > > > So overall, you _can_ use CDCR to connect remote DCs, but it takes time > and energy > > to make it robust. Its advantage is that it’s entirely contained within > Solr. But it’s not > > getting much attention lately, meaning nobody has decided the > functionality is important > > enough to them to donate the time/resources to make it more robust. Were > someone > > to take an active interest in it, likely it could be kept around as a > plugin that core Solr > > is not responsible for. 
> > > > Best, > > Erick > > > >> On May 27, 2020, at 4:43 PM, gnandre wrote: > >> > >> Thanks, Jason. This is very helpful. > >> > >> I should clarify though that I am not using CDCR currently with my > >> existing master-slave architecture. What I meant to say earlier was > that we > >> will be relying heavily on the CDCR feature if we migrate from solr > >> master-slave architecture to solrcloud architecture. Are there any > >> alternatives to CDCR? AFAIK, if you want to replicate between different > >> data centers then CDCR is the only option. Also, when you s
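Erick's alternative 3 (a topic stream wrapped in an update stream, wrapped in a daemon) could be sketched as a streaming expression along these lines. All the collection names, the field list, and the interval are made-up placeholders, and, as he notes, this approach would not propagate deletes:

```text
daemon(id="dc1ToDc2Sync", runInterval="60000",
  update(collectionOnDc2, batchSize=250,
    topic(checkpointCollection, collectionOnDc1,
          q="*:*",
          fl="id,title,body",
          id="dc1ToDc2Topic")))
```

The topic stream records its checkpoint in checkpointCollection, so each daemon run only forwards documents changed since the previous run, matching the incremental behavior Erick describes.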
Re: SolrCloud upgrade concern
Thanks, Jason. This is very helpful. I should clarify though that I am not using CDCR currently with my existing master-slave architecture. What I meant to say earlier was that we will be relying heavily on the CDCR feature if we migrate from solr master-slave architecture to solrcloud architecture. Are there any alternatives to CDCR? AFAIK, if you want to replicate between different data centers then CDCR is the only option. Also, when you say lot of customers are using SolrCloud successfully, how are they working around the CDCR situation? Do they not have any data center use cases? Is there some list maintained somewhere where one can find which companies are using SolrCloud successfully? On Wed, May 27, 2020 at 9:27 AM Jason Gerlowski wrote: > Hi Arnold, > > From what I saw in the community, CDCR saw an initial burst of > development around when it was contributed, but hasn't seen much > attention or improvement since. So while it's been around for a few > years, I'm not sure it's improved much in terms of stability or > compatibility with other Solr features. > > Some of the bigger ticket issues still open around CDCR: > - SOLR-11959 no support for basic-auth > - SOLR-12842 infinite retry of failed update-requests (leads to > sync/recovery problems) > - SOLR-12057 no real support for NRT/TLOG/PULL replicas > - SOLR-10679 no support for collection aliases > > These are in addition to other more architectural issues: CDCR can be > a bottleneck on clusters with high ingestion rates, CDCR uses > full-index-replication more than traditional indexing setups, which > can cause issues with modern index sizes, etc. > > So, unfortunately, no real good news in terms of CDCR maturing much in > recent releases. Joel Bernstein filed a JIRA recently suggesting its > removal entirely actually. Though I don't think it's gone anywhere. > > That said, I gather from what you said that you're already using CDCR > successfully with Master-Slave. 
If none of these pitfalls are biting > you in your current Master-Slave setup, you might not be bothered by > them any more in SolrCloud. Most of the problems with CDCR are > applicable in master-slave as well as SolrCloud. I wouldn't recommend > CDCR if you were starting from scratch, and I still recommend you > consider other options. But since you're already using it with some > success, it might be an orthogonal concern to your potential migration > to SolrCloud. > > Best of luck deciding! > > Jason > > On Fri, May 22, 2020 at 7:06 PM gnandre wrote: > > > > Thanks for this reply, Jason. > > > > I am mostly worried about CDCR feature. I am relying heavily on it. > > Although, I am planning to use Solr 8.3. It has been long time since CDCR > > was first introduced. I wonder what is the state of CDCR is 8.3. Is it > > stable now? > > > > On Wed, Jan 22, 2020, 8:01 AM Jason Gerlowski > wrote: > > > > > Hi Arnold, > > > > > > The stability and complexity issues Mark highlighted in his post > > > aren't just imagined - there are real, sometimes serious, bugs in > > > SolrCloud features. But at the same time there are many many stable > > > deployments out there where SolrCloud is a real success story for > > > users. Small example, I work at a company (Lucidworks) where our main > > > product (Fusion) is built heavily on top of SolrCloud and we see it > > > deployed successfully every day. > > > > > > In no way am I trying to minimize Mark's concerns (or David's). There > > > are stability bugs. But the extent to which those need affect you > > > depends a lot on what your deployment looks like. How many nodes? > > > How many collections? How tightly are you trying to squeeze your > > > hardware? Is your network flaky? Are you looking to use any of > > > SolrCloud's newer, less stable features like CDCR, etc.? > > > > > > Is SolrCloud better for you than Master/Slave? 
It depends on what > > > you're hoping to gain by a move to SolrCloud, and on your answers to > > > some of the questions above. I would be leery of following any > > > recommendations that are made without regard for your reason for > > > switching or your deployment details. Those things are always the > > > biggest driver in terms of success. > > > > > > Good luck making your decision! > > > > > > Best, > > > > > > Jason > > > >
Re: TimestampUpdateProcessorFactory updates the field even if the value is present
Thanks for the detailed response, Chris. I am aware of the partial (atomic) updates. Thanks for clarifying the confusion about input document vs indexed document. I was thinking that TimestampUpdateProcessorFactory checks if the value exists in the field inside the indexed document before updating it, but actually it checks whether it is present inside the input request. But then why do we require an explicit processor for that? This can be done with a simple field in the schema that has a default value of NOW. I tried your idea about MinFieldValueUpdateProcessorFactory but it does not work. Here is the configuration: index_time_stamp_create index_time_stamp_create I think MinFieldValueUpdateProcessorFactory keeps the min value in a multivalued field, which index_time_stamp_create is not. On Tue, May 26, 2020 at 2:31 PM Chris Hostetter wrote: > : Subject: TimestampUpdateProcessorFactory updates the field even if the > value > : is present > : > : Hi, > : > : Following is the update request processor chain. > : > : > < > : processor class="solr.TimestampUpdateProcessorFactory"> : "fieldName">index_time_stamp_create : "solr.LogUpdateProcessorFactory" /> : "solr.RunUpdateProcessorFactory" /> > : > : And, here is how the field is defined in schema.xml > : > : : "true" /> > : > : Every time I index the same document, above field changes its value with > : latest timestamp. According to TimestampUpdateProcessorFactory javadoc > : page, if a document does not contain a value in the timestamp field, a > new > > based on the wording of your question, I suspect you are confused about > the overall behavior of how "updating" an existing document works in solr, > and how update processors "see" an *input document* when processing an > add/update command.
> > First off, completely ignoring TimestampUpdateProcessorFactory and > assuming just the simplest possible update change, let's clarify how > "updates" work. Let's assume that when you say you "index the same > document" twice you do so with a few different field values ... > > First Time... > > { id:"x", title:"" } > > Second time... > > { id:"x", body:" xxx" } > > Solr does not implicitly know that you are trying to *update* that > document, the final result will not be a document containing both a > "title" field and "body" field in addition to the "id", it will *only* > have the "id" and "body" fields and the title field will be lost. > > The way to "update" a document *and keep existing field values* is with > one of the "Atomic Update" command options... > > > https://lucene.apache.org/solr/guide/8_4/updating-parts-of-documents.html#UpdatingPartsofDocuments-AtomicUpdates > > { id:"x", title:"" } > > Second time... > > { id:"x", body: { set: " xxx" } } > > > Now, with that background info clarified: let's talk about update > processors > > > The docs for TimestampUpdateProcessorFactory are referring to how it > modifies an *input* document that it receives (as part of the processor > chain). It adds the timestamp field if it's not already in the *input* > document, it doesn't know anything about whether that document is already > in the index, or if it has a value for that field in the index. > > > When processors like TimestampUpdateProcessorFactory (or any other > processor that modifies an *input* document) are run they don't know if the > document you are "indexing" already exists in the index or not.
Even if > you are using the "atomic update" options to set/remove/add a field value, > with the intent of preserving all other field values, the documents passed > down the processor chain don't include those values until the "document > merger" logic is run -- as part of the DistributedUpdateProcessor (which > if not explicit in your chain happens immediately before the > RunUpdateProcessorFactory) > > Off the top of my head I don't know if there is an "easy" way to have a > Timestamp added to "new" documents, but left "as is" for existing > documents. > > Untested idea: use an > explicitly configured > DistributedUpdateProcessorFactory, so that (in addition to putting > TimestampUpdateProcessorFactory before it) you can > also put MinFieldValueUpdateProcessorFactory on the timestamp field > *after* DistributedUpdateProcessorFactory (but before > RunUpdateProcessorFactory). > > I think that would work? > > Just putting TimestampUpdateProcessorFactory after the > DistributedUpdateProcessorFactory would be dangerous, because it would > introduce discrepancies -- each replica would wind up with its own > locally computed timestamp. Having the timestamp generated before the > distributed update processor ensures the value is computed only once. > > -Hoss > http://www.lucidworks.com/ >
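Spelled out in solrconfig.xml, the untested ordering Hoss describes might look like this (the chain name is a placeholder, and this remains untested, as he says):

```xml
<updateRequestProcessorChain name="add-timestamp-once">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">index_time_stamp_create</str>
  </processor>
  <!-- document merging for atomic updates happens in the distributed
       processor, so existing field values become visible below here -->
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.MinFieldValueUpdateProcessorFactory">
    <str name="fieldName">index_time_stamp_create</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The intent is that after merging, the field holds both the stored timestamp and the freshly generated one, and the min-value processor keeps only the earliest.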
Re: SolrCloud upgrade concern
Thanks for this reply, Jason. I am mostly worried about the CDCR feature; I am relying heavily on it. I am planning to use Solr 8.3, though. It has been a long time since CDCR was first introduced. I wonder what the state of CDCR is in 8.3. Is it stable now? On Wed, Jan 22, 2020, 8:01 AM Jason Gerlowski wrote: > Hi Arnold, > > The stability and complexity issues Mark highlighted in his post > aren't just imagined - there are real, sometimes serious, bugs in > SolrCloud features. But at the same time there are many, many stable > deployments out there where SolrCloud is a real success story for > users. Small example: I work at a company (Lucidworks) where our main > product (Fusion) is built heavily on top of SolrCloud, and we see it > deployed successfully every day. > > In no way am I trying to minimize Mark's concerns (or David's). There > are stability bugs. But the extent to which those need to affect you > depends a lot on what your deployment looks like. How many nodes? > How many collections? How tightly are you trying to squeeze your > hardware? Is your network flaky? Are you looking to use any of > SolrCloud's newer, less stable features like CDCR, etc.? > > Is SolrCloud better for you than Master/Slave? It depends on what > you're hoping to gain by a move to SolrCloud, and on your answers to > some of the questions above. I would be leery of following any > recommendations that are made without regard for your reason for > switching or your deployment details. Those things are always the > biggest driver in terms of success. > > Good luck making your decision! > > Best, > > Jason >
Re: TimestampUpdateProcessorFactory updates the field even if the value is present
Hi, I do not pass that field at all. Here is the document that I index again and again to test through the Solr Admin UI. { asset_id:"x:1", title:"x" } On Thu, May 21, 2020 at 5:25 PM Furkan KAMACI wrote: > Hi, > > How do you index that document? Do you index it with an empty > *index_time_stamp_create* field the second time too? > > Kind Regards, > Furkan KAMACI > > On Fri, May 22, 2020 at 12:05 AM gnandre wrote: > > > Hi, > > > > Following is the update request processor chain. > > > > <updateRequestProcessorChain> <processor class="solr.TimestampUpdateProcessorFactory"> <str name="fieldName">index_time_stamp_create</str> </processor> <processor class="solr.LogUpdateProcessorFactory" /> <processor class="solr.RunUpdateProcessorFactory" /> </updateRequestProcessorChain> > > > > And, here is how the field is defined in schema.xml: > > > > <field name="index_time_stamp_create" type="date" stored="true" /> > > > > Every time I index the same document, the above field changes its value to the > > latest timestamp. According to the TimestampUpdateProcessorFactory javadoc > > page, if a document does not contain a value in the timestamp field, a new > > Date will be generated and added as the value of that field. After the > > first indexing this document should always have a value, so why does it > > get updated later? > > > > I am using the Solr Admin UI's Documents tab to index the document for > testing. > > I am using Solr 6.3 in master-slave architecture mode. > > >
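To make the point behind Furkan's question concrete: TimestampUpdateProcessorFactory only inspects the *incoming* document, never what is already stored in the index. A hypothetical Python simplification of that behavior (not the actual Java implementation):

```python
from datetime import datetime, timezone

TIMESTAMP_FIELD = "index_time_stamp_create"

def timestamp_processor(incoming_doc):
    """Rough sketch of TimestampUpdateProcessorFactory: stamp the field
    only if the *incoming* document is missing it. It never consults
    the value already stored in the index for that document."""
    if TIMESTAMP_FIELD not in incoming_doc:
        incoming_doc[TIMESTAMP_FIELD] = datetime.now(timezone.utc).isoformat()
    return incoming_doc

# Re-indexing the same full document sends it without the timestamp
# field both times, so each pass generates a brand-new value.
first = timestamp_processor({"asset_id": "x:1", "title": "x"})
second = timestamp_processor({"asset_id": "x:1", "title": "x"})
```

This is why the field keeps changing: each full re-index is a brand-new input document with no timestamp, so the processor stamps it again.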
TimestampUpdateProcessorFactory updates the field even if the value is present
Hi, Following is the update request processor chain. <updateRequestProcessorChain> <processor class="solr.TimestampUpdateProcessorFactory"> <str name="fieldName">index_time_stamp_create</str> </processor> <processor class="solr.LogUpdateProcessorFactory" /> <processor class="solr.RunUpdateProcessorFactory" /> </updateRequestProcessorChain> And, here is how the field is defined in schema.xml: <field name="index_time_stamp_create" type="date" stored="true" /> Every time I index the same document, the above field changes its value to the latest timestamp. According to the TimestampUpdateProcessorFactory javadoc page, if a document does not contain a value in the timestamp field, a new Date will be generated and added as the value of that field. After the first indexing this document should always have a value, so why does it get updated later? I am using the Solr Admin UI's Documents tab to index the document for testing. I am using Solr 6.3 in master-slave architecture mode.