Re: Exact matching without using new fields

2021-01-19 Thread gnandre
Thanks for replying, Dave.

I am afraid I am looking for a query-time (i.e., not index-time) solution.

Actually, in my case I expect both documents from your example to be
returned. I am just trying to avoid returning documents that contain only
tokenized versions of the search query when it is enclosed in double quotes
to indicate an exact-match expectation.

e.g.
search query -> "information retrieval"

This should match documents like following:
doc 1: "information retrieval"
doc 2: "Advanced information retrieval with Solr"

but should NOT match documents like
doc 3: "informed retrieval"
doc 4: "information extraction"  (considering 'extraction' was a specified
synonym of 'retrieval' )
doc 5: "INFORMATION RETRIEVAL"

etc

I am also OK with these documents showing up, as long as they show up at
the bottom. Also, a query-time solution is a must.

On Tue, Jan 19, 2021 at 12:22 PM David R  wrote:

> We had the same requirement. Just to echo back your requirements, I
> understand your case to be this. Given these 2 doc titles:
>
> doc 1: "information retrieval"
> doc 2: "Advanced information retrieval with Solr"
>
> You want a phrase search for "information retrieval" to find both
> documents, but an EXACT phrase search for "information retrieval" to find
> doc #1 only.
>
> If that's true, and case-sensitive search isn't a requirement, I indexed
> this in the token stream, with adjacent positions of course.
>
> START information retrieval END
> START advanced information retrieval with solr END
>
> And with our custom query parser, when an EXACT operator is found, I
> tokenize the query to match the first case. Otherwise pass it through.
>
> Needs custom analyzers on the query and index sides to generate the
> correct token sequences.
>
> It's worked out well for our case.
>
> Dave
>
>
>
> 
> From: gnandre 
> Sent: Tuesday, January 19, 2021 4:07 PM
> To: solr-user@lucene.apache.org 
> Subject: Exact matching without using new fields
>
> Hi,
>
> I am aware that to do exact matching (only whatever is provided inside
> double quotes should be matched) in Solr, we can copy existing fields with
> the help of copyField directives into new fields that have minimal or no
> tokenization (e.g. using KeywordTokenizer or a string field type).
>
> However, this solution is expensive in terms of index size because it can
> almost double the size of the existing index.
>
> Is there an inexpensive way of achieving exact matches from the query
> side, e.g. by boosting the original tokens more at query time compared to
> their analyzed tokens?
>
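Dave's START/END boundary-token approach can be simulated outside Solr. The sketch below is plain Java, not Lucene code, and every name in it is made up for illustration; it only shows why the markers separate EXACT matches from ordinary phrase matches when positions are adjacent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BoundaryTokenSketch {
    // Index-time analysis: lowercase, split on whitespace, add boundary markers.
    static List<String> indexTokens(String title) {
        List<String> tokens = new ArrayList<>();
        tokens.add("START");
        tokens.addAll(Arrays.asList(title.toLowerCase().split("\\s+")));
        tokens.add("END");
        return tokens;
    }

    // Query-time analysis: wrap in the markers only when the EXACT operator is used.
    static List<String> queryTokens(String phrase, boolean exact) {
        List<String> tokens = new ArrayList<>(Arrays.asList(phrase.toLowerCase().split("\\s+")));
        if (exact) {
            tokens.add(0, "START");
            tokens.add("END");
        }
        return tokens;
    }

    // A phrase matches when its tokens occur as a contiguous subsequence
    // (adjacent positions, as in a Lucene phrase query).
    static boolean matches(List<String> docTokens, List<String> queryTokens) {
        return Collections.indexOfSubList(docTokens, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        List<String> doc1 = indexTokens("information retrieval");
        List<String> doc2 = indexTokens("Advanced information retrieval with Solr");
        // Plain phrase search finds both documents.
        if (!matches(doc1, queryTokens("information retrieval", false))) throw new AssertionError();
        if (!matches(doc2, queryTokens("information retrieval", false))) throw new AssertionError();
        // EXACT search finds only doc 1, because doc 2's START is not
        // adjacent to "information".
        if (!matches(doc1, queryTokens("information retrieval", true))) throw new AssertionError();
        if (matches(doc2, queryTokens("information retrieval", true))) throw new AssertionError();
        System.out.println("ok");
    }
}
```

In real Solr this logic would live in custom index/query analyzers plus a query parser that recognizes the EXACT operator, as Dave describes.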


Exact matching without using new fields

2021-01-19 Thread gnandre
Hi,

I am aware that to do exact matching (only whatever is provided inside
double quotes should be matched) in Solr, we can copy existing fields with
the help of copyField directives into new fields that have minimal or no
tokenization (e.g. using KeywordTokenizer or a string field type).

However, this solution is expensive in terms of index size because it can
almost double the size of the existing index.

Is there an inexpensive way of achieving exact matches from the query
side, e.g. by boosting the original tokens more at query time compared to
their analyzed tokens?


FST building precaution

2021-01-08 Thread gnandre
Hi,

The following comment appears in
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/fst/package-info.java
:


 "Input values (keys). These must be provided to Builder in Unicode code
point (UTF8 or UTF32) sorted order. Note that sorting by Java's
String.compareTo, which is UTF16 sorted order, is not correct and can lead
to exceptions while building the FST"

Can someone please suggest how to achieve this?
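The UTF-16 vs. code-point ordering difference only shows up for strings containing supplementary characters (outside the Basic Multilingual Plane). Below is an illustrative plain-Java comparator producing the code-point (UTF-32) order the FST Builder expects; it is not the Lucene utility itself, and Lucene may ship its own helpers for this in the version you use.

```java
public class CodePointOrder {
    // Compare two strings by Unicode code point (UTF-32 order) rather than
    // by UTF-16 code unit, which is what String.compareTo does.
    static int compareUtf32(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) return Integer.compare(ca, cb);
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // The shorter string (a prefix of the other) sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String supplementary = "\uD800\uDC00"; // U+10000, encoded as a surrogate pair
        String bmp = "\uFFFF";                 // U+FFFF, the highest BMP code point
        // UTF-16 order: the high surrogate 0xD800 sorts before 0xFFFF...
        if (supplementary.compareTo(bmp) >= 0) throw new AssertionError();
        // ...but in code-point order U+10000 sorts after U+FFFF, which is
        // the order the FST Builder requires.
        if (compareUtf32(supplementary, bmp) <= 0) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Sorting the input keys with a comparator like this (instead of plain String.compareTo) before feeding them to the Builder satisfies the requirement quoted above.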


distrib.requestTimes and distrib.totalTime metric always show 0 for any sub-metric

2020-12-23 Thread gnandre
The *distrib.requestTimes and *distrib.totalTime metrics always show 0 for
every sub-metric. Only the *local.requestTimes and *local.totalTime metrics
have non-zero values. This is when we hit the solr:8983/solr/admin/metrics
endpoint.

e.g.

  "QUERY./select.distrib.requestTimes":{
"count":0,
"meanRate":0.0,
"1minRate":0.0,
"5minRate":0.0,
"15minRate":0.0,
"min_ms":0.0,
"max_ms":0.0,
"mean_ms":0.0,
"median_ms":0.0,
"stddev_ms":0.0,
"p75_ms":0.0,
"p95_ms":0.0,
"p99_ms":0.0,
"p999_ms":0.0},


  "QUERY./select.local.requestTimes":{
"count":921,
"meanRate":0.016278013505962197,
"1minRate":0.02502213358051701,
"5minRate":0.01792972725206014,
"15minRate":0.016913129796499247,
"min_ms":0.092099,
"max_ms":27.833606,
"mean_ms":1.5546483254237826,
"median_ms":0.211898,
"stddev_ms":2.353088809601306,
"p75_ms":0.278897,
"p95_ms":5.547842,
"p99_ms":5.547842,
"p999_ms":9.239902},


  "QUERY./select.requestTimes":{
"count":921,
"meanRate":0.01627801345713971,
"1minRate":0.02502213358051701,
"5minRate":0.01792972725206014,
"15minRate":0.016913129796499247,
"min_ms":0.094899,
"max_ms":27.840706,
"mean_ms":1.5588447262406753,
"median_ms":0.216198,
"stddev_ms":2.352629359382386,
"p75_ms":0.284497,
"p95_ms":5.551242,
"p99_ms":5.551242,
"p999_ms":9.242902},


I am using the 8.5.2 version of Solr in standalone mode. I have some
queries that are distributed in the sense that they use the shards
parameter to distribute the query among different cores. I was expecting
the distrib metric to have some value when I execute these distributed
queries.

Also, why is there a third metric besides local and distrib at all?


Duplicate entries for request handlers in Solr metric reporter

2020-10-26 Thread gnandre
Hi,

I have hooked up Grafana dashboards with Solr 8.5.2 Prometheus exporter.
For some reason, some dashboards like Requests, Timeouts are not showing
any data. When I took a look at corresponding data from Prometheus
exporter, it showed two entries per search request handler, first with
count of 0 and the second with the correct count. I am not sure why the
entry with count 0 is appearing or all search request handlers. I checked
the configuration and there is no duplication of request handlers in
solrconfig.xml. My guest is that Grafana is picking up this first entry and
therefore does not show any data.

E.g.

solr_metrics_core_requests_total{category="QUERY",handler="/questions",core="answers",base_url="
http://localhost:8983/solr",} 0.0

solr_metrics_core_requests_total{category="QUERY",handler="/questions",core="answers",base_url="
http://localhost:8983/solr",} 4534446.0


Error false and Error true in Solr logs

2020-10-15 Thread gnandre
Hi,

What do the 'Error false' and 'Error true' flags shown against Solr errors
in the Solr admin UI log mean?


Re: Term too complex for spellcheck.q param

2020-10-07 Thread gnandre
Is there a way to truncate the spellcheck.q param value from the Solr side?
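As far as I know there is no server-side parameter for this, so a practical workaround is to truncate on the client before sending the request. The sketch below is illustrative plain Java (not a Solr API); it truncates by code points so a surrogate pair is never cut in half.

```java
public class SpellcheckTruncate {
    // Truncate a query to at most maxCodePoints code points without
    // splitting a surrogate pair.
    static String truncate(String q, int maxCodePoints) {
        int total = q.codePointCount(0, q.length());
        if (total <= maxCodePoints) return q;
        // offsetByCodePoints lands on a code-point boundary, so a
        // supplementary character is either kept whole or dropped whole.
        int end = q.offsetByCodePoints(0, maxCodePoints);
        return q.substring(0, end);
    }

    public static void main(String[] args) {
        if (!truncate("abcdef", 3).equals("abc")) throw new AssertionError();
        // A surrogate pair (U+10000) counts as one code point and is kept whole.
        String s = "ab\uD800\uDC00cd";
        if (!truncate(s, 3).equals("ab\uD800\uDC00")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The truncated value would then be set as spellcheck.q in the outgoing request.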

On Wed, Oct 7, 2020, 6:22 PM gnandre  wrote:

> Thanks. Is this going to be fixed in some future version?
>
> On Wed, Oct 7, 2020, 4:15 PM Mike Drob  wrote:
>
>> Right now the only solution is to use a shorter term.
>>
>> In a fuzzy query you could also try using a lower edit distance e.g.
>> term~1
>> (default is 2), but I’m not sure what the syntax for a spellcheck would
>> be.
>>
>> Mike
>>
>> On Wed, Oct 7, 2020 at 2:59 PM gnandre  wrote:
>>
>> > Hi,
>> >
>> > I am getting following error when I pass '
>> > 김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
>> > ' in spellcheck.q param. How to avoid this error? I am using Solr 8.5.2
>> >
>> > {
>> >   "error": {
>> > "code": 500,
>> > "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
>> > 유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
>> > [stack trace omitted; identical to the trace in the original post below]

Re: Term too complex for spellcheck.q param

2020-10-07 Thread gnandre
Thanks. Is this going to be fixed in some future version?

On Wed, Oct 7, 2020, 4:15 PM Mike Drob  wrote:

> Right now the only solution is to use a shorter term.
>
> In a fuzzy query you could also try using a lower edit distance e.g. term~1
> (default is 2), but I’m not sure what the syntax for a spellcheck would be.
>
> Mike
>
> On Wed, Oct 7, 2020 at 2:59 PM gnandre  wrote:
>
> > Hi,
> >
> > I am getting following error when I pass '
> > 김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
> > ' in spellcheck.q param. How to avoid this error? I am using Solr 8.5.2
> >
> > {
> >   "error": {
> > "code": 500,
> > "msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
> > 유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
> > [stack trace omitted; identical to the trace in the original post below]

Term too complex for spellcheck.q param

2020-10-07 Thread gnandre
Hi,

I am getting the following error when I pass '
김포오피➬유유닷컴➬✗UUDAT3.COM유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마
' in the spellcheck.q param. How can I avoid this error? I am using Solr 8.5.2.

{
  "error": {
"code": 500,
"msg": "Term too complex: 김포오피➬유유닷컴➬✗uudat3.com
유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마",
"trace": "org.apache.lucene.search.FuzzyTermsEnum$FuzzyTermsException:
Term too complex:
김포오피➬유유닷컴➬✗uudat3.com유유닷컴김포풀싸롱て김포오피ふ김포휴게텔け김포마사지❂김포립카페じ김포안마\n\tat
org.apache.lucene.search.FuzzyAutomatonBuilder.buildAutomatonSet(FuzzyAutomatonBuilder.java:63)\n\tat
org.apache.lucene.search.FuzzyTermsEnum$AutomatonAttributeImpl.init(FuzzyTermsEnum.java:365)\n\tat
org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:125)\n\tat
org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:92)\n\tat
org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:425)\n\tat
org.apache.lucene.search.spell.DirectSpellChecker.suggestSimilar(DirectSpellChecker.java:376)\n\tat
org.apache.solr.spelling.DirectSolrSpellChecker.getSuggestions(DirectSolrSpellChecker.java:196)\n\tat
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:195)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat
org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\tat
java.lang.Thread.run(Thread.java:748)\nCaused by:

Returning fields in a specific order

2020-09-28 Thread gnandre
Hi,

I have a use-case where I want to compare the stored field values of Solr
documents from two different Solr instances. I can use a diff tool to
compare them, but only if the fields are returned in a specific order in
the response. I tried setting the fl param with all the fields specified in
a particular order. However, the results that are returned do not follow
the order given in the fl param. Is there any way to achieve this behavior
in Solr?
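One workaround, given that the response field order is not guaranteed: canonicalize the field order on the client before diffing. The sketch below is illustrative Java (all names are made up) and assumes each document has already been parsed into a Map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldOrder {
    // Re-emit a document's fields in the order given by flOrder; any fields
    // not listed keep their original relative order at the end.
    static Map<String, Object> reorder(Map<String, Object> doc, List<String> flOrder) {
        LinkedHashMap<String, Object> out = new LinkedHashMap<>();
        for (String f : flOrder) {
            if (doc.containsKey(f)) out.put(f, doc.get(f));
        }
        // putIfAbsent keeps the fl-ordered values and appends the rest.
        doc.forEach(out::putIfAbsent);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "information retrieval");
        doc.put("id", "1");
        doc.put("score", 1.0);
        Map<String, Object> ordered = reorder(doc, List.of("id", "title"));
        if (!String.join(",", ordered.keySet()).equals("id,title,score")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Running both instances' responses through the same reordering (or simply sorting keys alphabetically) makes a plain text diff meaningful.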


Re: Difference in q.op param behavior between Solr 6.3 and Solr 8.5.2

2020-09-28 Thread gnandre
Thanks, this is helpful. I agree: the q.op param should not affect the fq
parameter. I think this is a feature and not a bug.

On Wed, Sep 23, 2020 at 4:39 PM Erik Hatcher  wrote:

> In 6.3 it did that?   It shouldn't have.  q and fq shouldn't share
> parameters.  fq's themselves shouldn't, IMO, have global defaults.  fq's
> need to be stable and often uniquely specified kinds of constraining query
> parsers ({!terms/term/field,etc}) or rely on basic Lucene query parser
> syntax and be able to stably rely on AND/OR.
>
> Relevancy tuning on q and friends, tweaking those parameters, shouldn't
> affect fq's, to say it a little differently.
>
> One can fq={!lucene q.op=AND}id:(1 2 3)
>
> Erik
>
>
> > On Sep 23, 2020, at 4:23 PM, gnandre  wrote:
> >
> > Is there a way to set default operator as AND for fq parameter in Solr
> > 8.5.2 now?
> >
> > On Tue, Sep 22, 2020 at 7:44 PM gnandre  wrote:
> >
> >> In 6.3, q.op param used to affect q as well fq param behavior. E.g. if
> >> q.op is set to AND and fq is set to id:(1 2 3), no results will show up
> but
> >> if it is set to OR then all 3 results will show up. This does not
> happen in
> >> Solr 8.5.2 anymore.
> >>
> >> Is this a bug? What does one need to do in Solr 8.5.2 to achieve the
> same
> >> behavior besides passing the operator directly in fq param i.e. id:(1
> OR 2
> >> OR 3)
> >>
>
>


Re: Difference in q.op param behavior between Solr 6.3 and Solr 8.5.2

2020-09-23 Thread gnandre
Is there a way to set default operator as AND for fq parameter in Solr
8.5.2 now?

On Tue, Sep 22, 2020 at 7:44 PM gnandre  wrote:

> In 6.3, q.op param used to affect q as well fq param behavior. E.g. if
> q.op is set to AND and fq is set to id:(1 2 3), no results will show up but
> if it is set to OR then all 3 results will show up. This does not happen in
> Solr 8.5.2 anymore.
>
> Is this a bug? What does one need to do in Solr 8.5.2 to achieve the same
> behavior besides passing the operator directly in fq param i.e. id:(1 OR 2
> OR 3)
>


Difference in q.op param behavior between Solr 6.3 and Solr 8.5.2

2020-09-22 Thread gnandre
In 6.3, q.op param used to affect q as well fq param behavior. E.g. if q.op
is set to AND and fq is set to id:(1 2 3), no results will show up but if
it is set to OR then all 3 results will show up. This does not happen in
Solr 8.5.2 anymore.

Is this a bug? What does one need to do in Solr 8.5.2 to achieve the same
behavior besides passing the operator directly in fq param i.e. id:(1 OR 2
OR 3)


Re: Solr 8.5.2 - Solr shards param does not work without localhost

2020-08-06 Thread gnandre
Please ignore the spaces; I have updated the calls by removing them below:

http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=my.domain.com/solr/another_core&fl=*

http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=localhost:8983/solr/another_core&fl=*


On Thu, Aug 6, 2020 at 7:59 PM gnandre  wrote:

> Hi,
>
> In Solr 6.3 I was able to use the following shards query:
>
> http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=
> my.domain.com /solr/another_core&fl=*
>
> It does not work in Solr 8.5.2 anymore unless I pass localhost instead of
> my domain in the shards param value, as follows:
> http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=
> localhost:8983  /solr/another_core&fl=*
>
> This is a master-slave setup and not a cloud setup.
>


Solr 8.5.2 - Solr shards param does not work without localhost

2020-08-06 Thread gnandre
Hi,

In Solr 6.3 I was able to use the following shards query:

http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=
my.domain.com /solr/another_core&fl=*

It does not work in Solr 8.5.2 anymore unless I pass localhost instead of
my domain in the shards param value, as follows:
http://my.domain.com/solr/core/select?q=*:*&start=0&rows=10&shards=
localhost:8983  /solr/another_core&fl=*

This is a master-slave setup and not a cloud setup.


Solr docker image works with image option but not with build option in docker-compose

2020-07-08 Thread gnandre
Hi,

I am using Solr docker image 8.5.2-slim from https://hub.docker.com/_/solr.
I use it as a base image and then add some more stuff to it with my custom
Dockerfile. When I build the final docker image, it is built successfully.
After that, when I try to use it in docker-compose.yml (with the build
option) to start a Solr service, it complains about not having permission
to create directories under the /var/solr path. I have given the solr user
read/write permission on /var/solr in the Dockerfile. Also, when I use the
image option instead of the build option in docker-compose.yml for the same
image, it does not throw any such errors and Solr starts without any
issues. Any clue why this might be happening?
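One common cause, offered as an assumption since the actual Dockerfile is not shown: the official image runs as the unprivileged solr user (uid 8983), and layers added in a derived image, or volumes set up by compose, can leave /var/solr owned by root. A minimal derived-Dockerfile sketch that resets ownership explicitly (the configset path and name are purely illustrative):

```dockerfile
FROM solr:8.5.2-slim

# Layers that need root (package installs, chown) run explicitly as root.
USER root

# Illustrative custom content; adjust source and destination to your setup.
COPY --chown=solr:solr myconfigset/ /opt/solr/server/solr/configsets/myconfigset/

# Make sure the runtime data directory is writable by the solr user.
RUN mkdir -p /var/solr && chown -R solr:solr /var/solr

# Drop back to the unprivileged user the base image expects.
USER solr
```

Note that if docker-compose bind-mounts a host directory over /var/solr, the host directory's ownership wins, so it may also need a chown to uid 8983 on the host.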


Re: Solr 8.5.2 indexing issue

2020-07-02 Thread gnandre
It seems that the issue is not with the reference_url field itself. There
is a copyField with the reference_url field as source and another field
called url_path as destination. This destination field url_path has the
following field type definition:

[fieldType definition stripped by the mailing list archive; per the text
below, its analyzer chain included SynonymGraphFilterFactory and
FlattenGraphFilterFactory]

If I remove SynonymGraphFilterFactory and FlattenGraphFilterFactory from
the above field type definition, then it works; otherwise it throws the
same error (IndexOutOfBoundsException).

On Sun, Jun 28, 2020 at 9:06 AM Erick Erickson 
wrote:

> How are you sending this to Solr? I just tried 8.5, submitting that doc
> through the admin UI and it works fine.
> I defined “asset_id” as the same type as your reference_url field.
>
> And does the log on the Solr node that tries to index this give any more
> info?
>
> Best,
> Erick
>
> > On Jun 27, 2020, at 10:45 PM, gnandre  wrote:
> >
> > {
> >"asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",
> >
> >
> "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"}
>
>


Re: Downsides to applying to WordDelimiterFilter twice in analyzer chain

2020-07-01 Thread gnandre
Here are links to images for the Analysis tab.

https://pasteboard.co/JfFTYu6.png
https://pasteboard.co/JfFUYXf.png


On Wed, Jul 1, 2020 at 3:03 PM gnandre  wrote:

> I am doing that already but it does not help.
>
> Here is the complete analyzer chain.
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" protected="protect.txt"
>             preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
>             catenateWords="1" catenateNumbers="1" catenateAll="0"
>             splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" protected="protect.txt"
>             preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
>             catenateWords="1" catenateNumbers="1" catenateAll="0"
>             splitOnCaseChange="1"/>
>     <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc" mode="compose"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en_query.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
>
> [image: image.png]
>
> [image: image.png]
>
> On Wed, Jul 1, 2020 at 12:29 PM Erick Erickson 
> wrote:
>
>> Why not just specify preserveOriginal and follow by a lowerCaseFilter and
>> use one wordDelimiterFilterFactory?
>>
>> Best,
>> Erick
>>
>> > On Jul 1, 2020, at 11:05 AM, gnandre  wrote:
>> >
>> > Hi,
>> >
>> > To satisfy one use-case, I need to apply WordDelimiterFilter with
>> > splitOnCaseChange
>> > with 0 once and then with 1 again. Are there some downsides to this
>> > approach?
>> >
>> > Use-case is to be able to match results when indexed content is
>> my.camelCase
>> > and search query is camelcase.
>>
>>


Re: Downsides to applying to WordDelimiterFilter twice in analyzer chain

2020-07-01 Thread gnandre
I am doing that already but it does not help.

Here is the complete analyzer chain:

[analyzer chain XML stripped by the mailing list archive; a partially
recoverable copy appears quoted in the reply above]

[image: image.png]

[image: image.png]

On Wed, Jul 1, 2020 at 12:29 PM Erick Erickson 
wrote:

> Why not just specify preserveOriginal and follow by a lowerCaseFilter and
> use one wordDelimiterFilterFactory?
>
> Best,
> Erick
>
> > On Jul 1, 2020, at 11:05 AM, gnandre  wrote:
> >
> > Hi,
> >
> > To satisfy one use-case, I need to apply WordDelimiterFilter with
> > splitOnCaseChange
> > with 0 once and then with 1 again. Are there some downsides to this
> > approach?
> >
> > Use-case is to be able to match results when indexed content is
> my.camelCase
> > and search query is camelcase.
>
>


Downsides to applying to WordDelimiterFilter twice in analyzer chain

2020-07-01 Thread gnandre
Hi,

To satisfy one use-case, I need to apply WordDelimiterFilter with
splitOnCaseChange set to 0 once and then set to 1 again. Are there any
downsides to this approach?

Use-case is to be able to match results when indexed content is my.camelCase
and search query is camelcase.
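A toy simulation of why a single WordDelimiterFilter pass can't serve this use-case: with splitOnCaseChange=1 the token "camelCase" never survives whole (even with preserveOriginal, only "my.camelCase", "my", "camel", "Case" are emitted), whereas a first pass with case-splitting off followed by a second pass with it on emits both variants. This is plain illustrative Java, not the Lucene filter, and it ignores positions and the catenate options.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class WdfSketch {
    // Toy word-delimiter pass: split on non-alphanumeric characters and,
    // optionally, at lower->upper case changes; always keep the original
    // token too (i.e. preserveOriginal="1").
    static Set<String> pass(Set<String> tokens, boolean splitOnCaseChange) {
        Set<String> out = new LinkedHashSet<>();
        for (String tok : tokens) {
            out.add(tok); // preserveOriginal
            String prepared = splitOnCaseChange
                    ? tok.replaceAll("(?<=\\p{Lower})(?=\\p{Upper})", ".")
                    : tok;
            for (String part : prepared.split("[^\\p{Alnum}]+")) {
                if (!part.isEmpty()) out.add(part);
            }
        }
        return out;
    }

    static Set<String> lowercase(Set<String> tokens) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : tokens) out.add(t.toLowerCase());
        return out;
    }

    public static void main(String[] args) {
        Set<String> input = new LinkedHashSet<>(List.of("my.camelCase"));
        // Single pass with splitOnCaseChange=1: "camelcase" is never produced,
        // so the lowercased query "camelcase" cannot match.
        if (lowercase(pass(input, true)).contains("camelcase")) throw new AssertionError();
        // Two passes (case-splitting off, then on): "camelCase" survives the
        // first pass whole and is preserved by the second, so the lowercased
        // index contains "camelcase".
        if (!lowercase(pass(pass(input, false), true)).contains("camelcase")) throw new AssertionError();
        System.out.println("ok");
    }
}
```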


Solr 8.5.2 indexing issue

2020-06-27 Thread gnandre
Hi,

I have the following document which fails to get indexed.

{
"asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",

"reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"}

I am not sure what is so special about the content in the reference_url
field.

reference_url field is defined as follows in schema:



It throws the following error.

Status: 
{"data":{"responseHeader":{"status":400,"QTime":18},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.IndexOutOfBoundsException"],"msg":"Exception
writing document id add-ons:576deefef7453a9189aa039b66500eb2 to the index;
possible analysis
error.","code":400}},"status":400,"config":{"method":"POST","transformRequest":[null],"transformResponse":[null],"jsonpCallbackParam":"callback","headers":{"Content-type":"application/json","Accept":"application/json,
text/plain, */*","X-Requested-With":"XMLHttpRequest"},"data":"[{\n
\"asset_id\":\"add-ons:576deefef7453a9189aa039b66500eb2\",\n
\"reference_url\":\"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html\"}]","url":"add-ons/update","params":{"wt":"json","_":1593304427428,"commitWithin":1000,"overwrite":true},"timeout":1},"statusText":"Bad
Request","xhrStatus":"complete","resource":{ [character-by-character request dump trimmed] }}


Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-19 Thread gnandre
Another alternative for master-slave nodes might be parent-child nodes.
This was adopted in Python too, AFAIK.

On Fri, Jun 19, 2020, 2:07 AM gnandre  wrote:

> What about blacklist and whitelist for shards? May I suggest blocklist and
> safelist?
>
> On Fri, Jun 19, 2020, 1:45 AM Thomas Corthals 
> wrote:
>
>> Since "overseer" is also problematic, I'd like to propose "orchestrator"
>> as
>> an alternative.
>>
>> Thomas
>>
>> Op vr 19 jun. 2020 04:34 schreef Walter Underwood > >:
>>
>> > We don’t get to decide whether “master” is a problem. The rest of the
>> world
>> > has already decided that it is a problem.
>> >
>> > Our task is to replace the terms “master” and “slave” in Solr.
>> >
>> > wunder
>> > Walter Underwood
>> > wun...@wunderwood.org
>> > http://observer.wunderwood.org/  (my blog)
>> >
>> > > On Jun 18, 2020, at 6:50 PM, Rahul Goswami 
>> > wrote:
>> > >
>> > > I agree with Phill, Noble and Ilan above. The problematic term is
>> "slave"
>> > > (not master) which I am all for changing if it causes less regression
>> > than
>> > > removing BOTH master and slave. Since some people have pointed out
>> Github
>> > > changing the "master" terminology, in my personal opinion, it was not
>> a
>> > > measured response to addressing the bigger problem we are all trying
>> to
>> > > tackle. There is no concept of a "slave" branch, and "master" by
>> itself
>> > is
>> > > a pretty generic term (Is someone having "mastery" over a skill a bad
>> > > thing?). I fear all it would end up achieving in the end with Github
>> is a
>> > > mess of broken build scripts at best.
>> > > So +1 on "slave" being the problematic term IMO, not "master".
>> > >
>> > > On Thu, Jun 18, 2020 at 8:19 PM Phill Campbell
>> > >  wrote:
>> > >
>> > >> Master - Worker
>> > >> Master - Peon
>> > >> Master - Helper
>> > >> Master - Servant
>> > >>
>> > >> The term that is not wanted is “slave”. The term “master” is not a
>> > problem
>> > >> IMO.
>> > >>
>> > >>> On Jun 18, 2020, at 3:59 PM, Jan Høydahl 
>> > wrote:
>> > >>>
>> > >>> I support Mike Drob and Trey Grainger. We should re-use the
>> > >> leader/replica
>> > >>> terminology from Cloud. Even if you hand-configure a master/slave
>> > cluster
>> > >>> and orchestrate what doc goes to which node/shard, and hand-code
>> your
>> > >> shards
>> > >>> parameter, you will still have a cluster where you’d send updates to
>> > the
>> > >> leader of
>> > >>> each shard and the replicas would replicate the index from the
>> leader.
>> > >>>
>> > >>> Let’s instead find a new good name for the cluster type. Standalone
>> > kind
>> > >> of works
>> > >>> for me, but I see it can be confused with single-node. We have also
>> > >> discussed
>> > >>> replacing SolrCloud (which is a terrible name) with something more
>> > >> descriptive.
>> > >>>
>> > >>> Today: SolrCloud vs Master/slave
>> > >>> Alt A: SolrCloud vs Standalone
>> > >>> Alt B: SolrCloud vs Legacy
>> > >>> Alt C: Clustered vs Independent
>> > >>> Alt D: Clustered vs Manual mode
>> > >>>
>> > >>> Jan
>> > >>>
>> > >>>> 18. jun. 2020 kl. 15:53 skrev Mike Drob :
>> > >>>>
>> > >>>> I personally think that using Solr cloud terminology for this
>> would be
>> > >> fine
>> > >>>> with leader/follower. The leader is the one that accepts updates,
>> > >> followers
>> > >>>> cascade the updates somehow. The presence of ZK or election doesn’t
>> > >> really
>> > >>>> change this detail.
>> > >>>>
>> > >>>> However, if folks feel that it’s confusing, then I can’t tell them
>> > that
>> > >>>> they’re not confused. Especially when they’re working with others
>> who

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-19 Thread gnandre
What about blacklist and whitelist for shards? May I suggest blocklist and
safelist?

On Fri, Jun 19, 2020, 1:45 AM Thomas Corthals  wrote:

> Since "overseer" is also problematic, I'd like to propose "orchestrator" as
> an alternative.
>
> Thomas
>
> Op vr 19 jun. 2020 04:34 schreef Walter Underwood :
>
> > We don’t get to decide whether “master” is a problem. The rest of the
> world
> > has already decided that it is a problem.
> >
> > Our task is to replace the terms “master” and “slave” in Solr.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Jun 18, 2020, at 6:50 PM, Rahul Goswami 
> > wrote:
> > >
> > > I agree with Phill, Noble and Ilan above. The problematic term is
> "slave"
> > > (not master) which I am all for changing if it causes less regression
> > than
> > > removing BOTH master and slave. Since some people have pointed out
> Github
> > > changing the "master" terminology, in my personal opinion, it was not a
> > > measured response to addressing the bigger problem we are all trying to
> > > tackle. There is no concept of a "slave" branch, and "master" by itself
> > is
> > > a pretty generic term (Is someone having "mastery" over a skill a bad
> > > thing?). I fear all it would end up achieving in the end with Github
> is a
> > > mess of broken build scripts at best.
> > > So +1 on "slave" being the problematic term IMO, not "master".
> > >
> > > On Thu, Jun 18, 2020 at 8:19 PM Phill Campbell
> > >  wrote:
> > >
> > >> Master - Worker
> > >> Master - Peon
> > >> Master - Helper
> > >> Master - Servant
> > >>
> > >> The term that is not wanted is “slave”. The term “master” is not a
> > problem
> > >> IMO.
> > >>
> > >>> On Jun 18, 2020, at 3:59 PM, Jan Høydahl 
> > wrote:
> > >>>
> > >>> I support Mike Drob and Trey Grainger. We should re-use the
> > >> leader/replica
> > >>> terminology from Cloud. Even if you hand-configure a master/slave
> > cluster
> > >>> and orchestrate what doc goes to which node/shard, and hand-code your
> > >> shards
> > >>> parameter, you will still have a cluster where you’d send updates to
> > the
> > >> leader of
> > >>> each shard and the replicas would replicate the index from the
> leader.
> > >>>
> > >>> Let’s instead find a new good name for the cluster type. Standalone
> > kind
> > >> of works
> > >>> for me, but I see it can be confused with single-node. We have also
> > >> discussed
> > >>> replacing SolrCloud (which is a terrible name) with something more
> > >> descriptive.
> > >>>
> > >>> Today: SolrCloud vs Master/slave
> > >>> Alt A: SolrCloud vs Standalone
> > >>> Alt B: SolrCloud vs Legacy
> > >>> Alt C: Clustered vs Independent
> > >>> Alt D: Clustered vs Manual mode
> > >>>
> > >>> Jan
> > >>>
> >  18. jun. 2020 kl. 15:53 skrev Mike Drob :
> > 
> >  I personally think that using Solr cloud terminology for this would
> be
> > >> fine
> >  with leader/follower. The leader is the one that accepts updates,
> > >> followers
> >  cascade the updates somehow. The presence of ZK or election doesn’t
> > >> really
> >  change this detail.
> > 
> >  However, if folks feel that it’s confusing, then I can’t tell them
> > that
> >  they’re not confused. Especially when they’re working with others
> who
> > >> have
> >  less Solr experience than we do and are less familiar with the
> > >> intricacies.
> > 
> >  Primary/Replica seems acceptable. Coordinator instead of Overseer
> > seems
> >  acceptable.
> > 
> >  Would love to see this in 9.0!
> > 
> >  Mike
> > 
> >  On Thu, Jun 18, 2020 at 8:25 AM John Gallagher
> >   wrote:
> > 
> > > While on the topic of renaming roles, I'd like to propose finding a
> > >> better
> > > term than "overseer" which has historical slavery connotations as
> > well.
> > > Director, perhaps?
> > >
> > >
> > > John Gallagher
> > >
> > > On Thu, Jun 18, 2020 at 8:48 AM Jason Gerlowski <
> > gerlowsk...@gmail.com
> > >>>
> > > wrote:
> > >
> > >> +1 to rename master/slave, and +1 to choosing terminology distinct
> > >> from what's used for SolrCloud.  I could be happy with several of
> > the
> > >> proposed options.  Since a good few have been proposed though,
> maybe
> > >> an eventual vote thread is the most organized way to aggregate the
> > >> opinions here.
> > >>
> > >> I'm less positive about the prospect of changing the name of our
> > >> primary git branch.  Most projects that contributors might come
> > from,
> > >> most tutorials out there to learn git, most tools built on top of
> > git
> > >> - the majority are going to assume "master" as the main branch.  I
> > >> appreciate the change that Github is trying to effect in changing
> > the
> > >> default for new projects, but it'll be a long time before that
> > >> competes with the huge bulk of projects, documentation, etc. 

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread gnandre
+1 for Leader-Follower. How about Publisher-Subscriber?

On Wed, Jun 17, 2020 at 5:19 PM Rahul Goswami  wrote:

> +1 on avoiding SolrCloud terminology. In the interest of keeping it obvious
> and simple, may I please suggest primary/secondary?
>
> On Wed, Jun 17, 2020 at 5:14 PM Atita Arora  wrote:
>
> > I agree avoiding using of solr cloud terminology too.
> >
> > I may suggest going for "prime" and "clone"
> (short and precise, like Master and Slave).
> >
> > Best,
> > Atita
> >
> >
> >
> >
> >
> > On Wed, 17 Jun 2020, 22:50 Walter Underwood, 
> > wrote:
> >
> > > I strongly disagree with using the Solr Cloud leader/follower
> terminology
> > > for non-Cloud clusters. People in my company are confused enough
> without
> > > using polysemous terminology.
> > >
> > > “This node is the leader, but it means something different than the
> > leader
> > > in this other cluster.” I’m dreading that conversation.
> > >
> > > I like “principal”. How about “clone” for the slave role? That suggests
> > > that
> > > it does not accept updates and that it is loosely-coupled, only
> depending
> > > on the state of the no-longer-called-master.
> > >
> > > Chegg has five production Solr Cloud clusters and one production
> > > master/slave
> > > cluster, so this is not a hypothetical for us. We have 100+ Solr hosts
> in
> > > production.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > > > On Jun 17, 2020, at 1:36 PM, Trey Grainger 
> wrote:
> > > >
> > > > Proposal:
> > > > "A Solr COLLECTION is composed of one or more SHARDS, which each have
> > one
> > > > or more REPLICAS. Each replica can have a ROLE of either:
> > > > 1) A LEADER, which can process external updates for the shard
> > > > 2) A FOLLOWER, which receives updates from another replica"
> > > >
> > > > (Note: I prefer "role" but if others think it's too overloaded due to
> > the
> > > > overseer role, we could replace it with "mode" or something similar)
> > > > ---
> > > >
> > > > To be explicit with the above definitions:
> > > > 1) In SolrCloud, the roles of leaders and followers can dynamically
> > > change
> > > > based upon the status of the cluster. In standalone mode, they can be
> > > > changed by manual intervention.
> > > > 2) A leader does not have to have any followers (i.e. only one active
> > > > replica)
> > > > 3) Each shard always has one leader.
> > > > 4) A follower can also pull updates from another follower instead of
> a
> > > > leader (traditionally known as a REPEATER). A repeater is still a
> > > follower,
> > > > but would not be considered a leader because it can't process
> external
> > > > updates.
> > > > 5) A replica cannot be both a leader and a follower.
> > > >
> > > > In addition to the above roles, each replica can have a TYPE of one
> of:
> > > > 1) NRT - which can serve in the role of leader or follower
> > > > 2) TLOG - which can only serve in the role of follower
> > > > 3) PULL - which can only serve in the role of follower
> > > >
> > > > A replica's type may be changed automatically in the event that its
> > role
> > > > changes.
> > > >
> > > > I think this terminology is consistent with the current
> Leader/Follower
> > > > usage while also being able to easily accomodate a rename of the
> > > historical
> > > > master/slave terminology without mental gymnastics or the
> introduction
> > or
> > > > more cognitive load through new terminology. I think adopting the
> > > > Primary/Replica terminology will be incredibly confusing given the
> > > already
> > > > specific and well established meaning of "replica" within Solr.
> > > >
> > > > All the Best,
> > > >
> > > > Trey Grainger
> > > > Founder, Searchkernel
> > > > https://searchkernel.com
> > > >
> > > >
> > > >
> > > > On Wed, Jun 17, 2020 at 3:38 PM Anshum Gupta  >
> > > wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> Moving a conversation that was happening on the PMC list to the
> public
> > > >> forum. Most of the following is just me recapping the conversation
> > that
> > > has
> > > >> happened so far.
> > > >>
> > > >> Some members of the community have been discussing getting rid of
> the
> > > >> master/slave nomenclature from Solr.
> > > >>
> > > >> While this may require a non-trivial effort, a general consensus so
> > far
> > > >> seems to be to start this process and switch over incrementally, if
> a
> > > >> single change ends up being too big.
> > > >>
> > > >> There have been a lot of suggestions around what the new
> nomenclature
> > > might
> > > >> look like, a few people don’t want to overlap the naming here with
> > what
> > > >> already exists in SolrCloud i.e. leader/follower.
> > > >>
> > > >> Primary/Replica was an option that was suggested based on what other
> > > >> vendors are moving towards based on Wikipedia:
> > > >> https://en.wikipedia.org/wiki/Master/slave_(technology)
> > > >> , however there 

Re: RankLib model output format to Solr LTR model format

2020-06-17 Thread gnandre
Thanks Doug, this is very helpful.

On Wed, Jun 17, 2020 at 1:11 PM Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> There are several scripts for doing this.
>
> I might encourage you to checkout our Hello LTR library of notebooks, which
> has a ranklib training driver, and helpers to log training data, train a
> model w/ Ranklib, and search with it. I am using this code for my LTR
> contributions AI Powered Search
>
> http://github.com/o19s/hello-ltr
>
> But if you just care about the conversion, check out this code. It's
> adapted / inspired by code written by Christine Poerschke with her Ltr For
> Bees demo / talk
>
> https://github.com/o19s/hello-ltr/blob/master/ltr/helpers/convert.py
>
> Best
> -Doug
>
>
>
>
> On Wed, Jun 17, 2020 at 12:46 PM gnandre  wrote:
>
> > Hi,
> >
> > Before I start writing my own implementation for converting RankLib's
> model
> > output format to Solr LTR model format for my own use cases, I just
> wanted
> > to check if there is any work done on this front already. Any references
> > are welcome.
> >
>
>
> --
> *Doug Turnbull **| CTO* | OpenSource Connections
> <http://opensourceconnections.com>, LLC | 240.476.9983
> Author: Relevant Search <http://manning.com/turnbull>; Contributor: *AI
> Powered Search <http://aipoweredsearch.com>*
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>
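For the linear-model case specifically, the conversion is small enough to sketch directly. The snippet below assumes a RankLib linear model file whose last non-comment line is space-separated "featureId:weight" pairs with 1-based feature ids, and that the target is Solr LTR's org.apache.solr.ltr.model.LinearModel JSON; both assumptions should be checked against your actual files before relying on this.

```python
def ranklib_linear_to_solr(model_text, feature_names, model_name="myModel"):
    """Convert a RankLib linear model dump to Solr LTR LinearModel JSON.

    Assumes the last non-comment line of the RankLib file holds
    space-separated "featureId:weight" pairs, with 1-based ids that
    line up positionally with feature_names (hypothetical names you
    supply from your Solr feature store).
    """
    weight_line = [line for line in model_text.splitlines()
                   if line.strip() and not line.startswith("##")][-1]
    weights = {}
    for pair in weight_line.split():
        fid, w = pair.split(":")
        # map the 1-based RankLib feature id to the Solr feature name
        weights[feature_names[int(fid) - 1]] = float(w)
    return {
        "class": "org.apache.solr.ltr.model.LinearModel",
        "name": model_name,
        "features": [{"name": n} for n in feature_names],
        "params": {"weights": weights},
    }
```

The resulting dict can be serialized with json.dumps and PUT to the LTR model store; tree-based RankLib models (LambdaMART etc.) need the fuller conversion in the convert.py script referenced above.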


RankLib model output format to Solr LTR model format

2020-06-17 Thread gnandre
Hi,

Before I start writing my own implementation for converting RankLib's model
output format to Solr LTR model format for my own use cases, I just wanted
to check if there is any work done on this front already. Any references
are welcome.


Re: Lucene query to Solr query

2020-06-01 Thread gnandre
Is this an odd use-case, where one needs to convert a Lucene query to a Solr
query? Isn't this a normal use-case when somebody is trying to port their
Lucene code to Solr?
I mean, is it like an XY problem where I should not even run into this
problem in the first place?


On Sun, May 31, 2020 at 9:40 AM Mikhail Khludnev  wrote:

> There's nothing like this now. Presumably one might visit queries and
> generate Query DSL json, but it might be a challenging problem.
>
> On Sun, May 31, 2020 at 3:42 AM gnandre  wrote:
>
> > I think this question here in this thread is similar to my question.
> >
> >
> https://lucene.472066.n3.nabble.com/Lucene-Query-to-Solr-query-td493751.html
> >
> >
> > As suggested in that thread, I do not want to use toString method for
> > Lucene query to pass it to the q param in SolrQuery.
> >
> > I am looking for a function that accepts org.apache.lucene.search.Query
> and
> > returns org.apache.solr.client.solrj.SolrQuery. Is that possible?
> >
> > On Sat, May 30, 2020 at 8:08 AM Erick Erickson 
> > wrote:
> >
> > > edismax is quite different from straight Lucene.
> > >
> > > Try attaching &debug=query to the input and
> > > you’ll see the difference.
> > >
> > > Best,
> > > Erick
> > >
> > > > On May 30, 2020, at 12:32 AM, gnandre 
> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I have following query which works fine as a lucene query:
> > > > +(topics:132)^0.02607211 (topics:146)^0.008187325
> > > > -asset_id:doc:en:index.html
> > > >
> > > > But, it does not work if I use it as a solr query with lucene as
> > defType.
> > > >
> > > > For it to work, I need to convert it like following:
> > > > q=+((topics:132)^0.02607211 (topics:146)^0.008187325
> > > > +(-(asset_id:doc\:en\:index.html))&defType=edismax&q.op=OR
> > > >
> > > > Why does it not work as is? AFAIK syntax given in the first query is
> > > > supported by edismax.
> > >
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Lucene query to Solr query

2020-05-30 Thread gnandre
I think this question here in this thread is similar to my question.
https://lucene.472066.n3.nabble.com/Lucene-Query-to-Solr-query-td493751.html


As suggested in that thread, I do not want to use toString method for
Lucene query to pass it to the q param in SolrQuery.

I am looking for a function that accepts org.apache.lucene.search.Query and
returns org.apache.solr.client.solrj.SolrQuery. Is that possible?

On Sat, May 30, 2020 at 8:08 AM Erick Erickson 
wrote:

> edismax is quite different from straight Lucene.
>
> Try attaching &debug=query to the input and
> you’ll see the difference.
>
> Best,
> Erick
>
> > On May 30, 2020, at 12:32 AM, gnandre  wrote:
> >
> > Hi,
> >
> > I have following query which works fine as a lucene query:
> > +(topics:132)^0.02607211 (topics:146)^0.008187325
> > -asset_id:doc:en:index.html
> >
> > But, it does not work if I use it as a solr query with lucene as defType.
> >
> > For it to work, I need to convert it like following:
> > q=+((topics:132)^0.02607211 (topics:146)^0.008187325
> > +(-(asset_id:doc\:en\:index.html))&defType=edismax&q.op=OR
> >
> > Why does it not work as is? AFAIK syntax given in the first query is
> > supported by edismax.
>
>


Lucene query to Solr query

2020-05-29 Thread gnandre
Hi,

I have following query which works fine as a lucene query:
+(topics:132)^0.02607211 (topics:146)^0.008187325
-asset_id:doc:en:index.html

But, it does not work if I use it as a solr query with lucene as defType.

For it to work, I need to convert it like following:
q=+((topics:132)^0.02607211 (topics:146)^0.008187325
+(-(asset_id:doc\:en\:index.html))&defType=edismax&q.op=OR

Why does it not work as is? AFAIK syntax given in the first query is
supported by edismax.
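One reason the raw clause fails under the lucene parser is that the unescaped colons in doc:en:index.html are read as extra field separators. SolrJ provides ClientUtils.escapeQueryChars for exactly this; a rough Python equivalent is sketched below (a port for illustration, not the original implementation — verify the character set against the SolrJ source you use):

```python
# Characters the classic Lucene/Solr query parser treats specially,
# mirroring the set escaped by SolrJ's ClientUtils.escapeQueryChars.
SPECIAL = set('\\+-!():^[]"{}~*?|&;/ ')

def escape_query_chars(value):
    """Backslash-escape query-syntax characters in a single term value."""
    return "".join("\\" + c if c in SPECIAL else c for c in value)
```

With this applied to the term value, the clause becomes -asset_id:doc\:en\:index.html, matching the working form shown in the rewritten query.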


Does Learning To Rank feature require SolrCloud?

2020-05-29 Thread gnandre
Hi,

Do following features require SolrCloud? Or do they work in master-slave
mode just fine?

1. Learning to rank (LTR)
2. Distributed IDF


Re: SolrCloud upgrade concern

2020-05-29 Thread gnandre
Thanks for all this information. It clears up a lot of the confusion
surrounding the CDCR feature. Although, I should say that if CDCR
functionality is so fragile in SolrCloud and not worth pursuing much, does
it make sense to add a warning about its possible shortcomings in the
documentation?

On Thu, May 28, 2020 at 9:02 AM Jan Høydahl  wrote:

> I had a client who asked a lot about CDCR a few years ago, but I kept
> recommending
> aginst it and recommended them to go for Ericks’s alternative (2), since
> they anyway
> needed to replicate their Oracle DBs in each DC as well. Much cleaner
> design to let
> each cluster have a local datasource and always stay in sync with local DB
> than to
> replicate both DB and index.
>
> There are of course use cases where you want to sync a read-only copy of
> indices
> to multiple DCs. I hope we’ll see a 3rd party tool for that some day,
> something that
> can sit outside your Solr clusters, monitor ZK of each cluster, and do
> some magic :)
>
> Jan
>
> > 28. mai 2020 kl. 01:17 skrev Erick Erickson :
> >
> > The biggest issue with CDCR is it’s rather fragile and requires
> monitoring,
> > it’s not a “fire and forget” type of functionality. For instance, the
> use of the
> > tlogs as a queueing mechanism means that if, for any reason, the
> communications
> > between DCs is broken, the tlogs will grow forever until the connection
> is
> > re-established. Plus the other issues Jason pointed out.
> >
> > So yes, some companies do use CDCR to communicate between separate
> > DCs. But they also put in some “roll your own” type of monitoring to
> insure
> > things don’t go haywire.
> >
> > Alternatives:
> > 1> use something that’s built from the ground up to provide reliable
> > messaging between DCs. Kafka or similar has been mentioned. Write
> > your updates to the Kafka queue and consume them in both DCs.
> > These kinds of solutions have a lot more robustness.
> >
> > 2> reproduce your system-of-record rather than Solr in the DCs and
> >   treat the DCs as separate installations. If you adopt this approach,
> >  some of the streaming capabilities can be used to monitor that they stay
> >  in sync. For instance have a background or periodic task that’ll take a
> while
> >  for a complete run wrap two "search" streams in a "unique” decorator,
> >  anything except an empty result identifies docs not on both DCs.
> >
> > 3> Oh Dear. This one is “interesting”. Wrap a “topic" stream on DC1 in
> >an update decorator for DC2 and wrap both of those in a daemon
> decorator.
> >   That’s gobbledygook, and you’ll have to dig through the docs a bit for
> >   that to make sense. Essentially the topic stream is one of the very
> few
> >   streams that does not (IIRC) require all values in the fl list be
> docValues.
> >   It fires the first time and establishes a checkpoint, finding all docs
> up to that point.
> >   Thereafter, it’ll get docs that have changed since the last time it
> ran. It uses a tiny
> >   collection for record keeping. Each time the topic stream finds new
> docs, it passes
> >  them to the update stream which sends them to another DC. Wrapping the
> whole
> >  thing in a daemon decorator means it periodically runs in the
> background. The one
> >  shortcoming is that this approach doesn’t propagate deletes. That’s
> enough of that
> >  until you tell us whether it sounds worth pursuing ;)
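The topic/update/daemon combination described in alternative 3 might look roughly like this as a Solr streaming expression (collection names, field list, and interval here are all placeholders, and the exact plumbing would need testing):

```
daemon(id="dcSync",
       runInterval="60000",
       update(dc2Collection,
              batchSize=250,
              topic(checkpointCollection,
                    dc1Collection,
                    q="*:*",
                    fl="id,title,body",
                    id="dcSyncTopic")))
```

As Erick notes, this doesn't propagate deletes, and the worker running the expression must be able to reach both the source and destination collections.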
> >
> > So overall, you _can_ use CDCR to connect remote DCs, but it takes time
> and energy
> > to make it robust. Its advantage is that it’s entirely contained within
> Solr. But it’s not
> > getting much attention lately, meaning nobody has decided the
> functionality is important
> > enough to them to donate the time/resources to make it more robust. Were
> someone
> > to take an active interest in it, likely it could be kept around as a
> plugin that core Solr
> > is not responsible for.
> >
> > Best,
> > Erick
> >
> >> On May 27, 2020, at 4:43 PM, gnandre  wrote:
> >>
> >> Thanks, Jason. This is very helpful.
> >>
> >> I should clarify though that I am not using CDCR currently with my
> >> existing master-slave architecture. What I meant to say earlier was
> that we
> >> will be relying heavily on the CDCR feature if we migrate from solr
> >> master-slave architecture to solrcloud architecture. Are there any
> >> alternatives to CDCR? AFAIK, if you want to replicate between different
> >> data centers then CDCR is the only option. Also, when you s

Re: SolrCloud upgrade concern

2020-05-27 Thread gnandre
Thanks, Jason. This is very helpful.

I should clarify though that I am not using CDCR currently with my
existing master-slave architecture. What I meant to say earlier was that we
will be relying heavily on the CDCR feature if we migrate from solr
master-slave architecture to solrcloud architecture. Are there any
alternatives to CDCR? AFAIK, if you want to replicate between different
data centers, then CDCR is the only option. Also, when you say a lot of
customers are using SolrCloud successfully, how are they working around the
CDCR situation? Do they not have any data center use cases? Is there some
list maintained somewhere where one can find which companies are using
SolrCloud successfully?



On Wed, May 27, 2020 at 9:27 AM Jason Gerlowski 
wrote:

> Hi Arnold,
>
> From what I saw in the community, CDCR saw an initial burst of
> development around when it was contributed, but hasn't seen much
> attention or improvement since.  So while it's been around for a few
> years, I'm not sure it's improved much in terms of stability or
> compatibility with other Solr features.
>
> Some of the bigger ticket issues still open around CDCR:
> - SOLR-11959 no support for basic-auth
> - SOLR-12842 infinite retry of failed update-requests (leads to
> sync/recovery problems)
> - SOLR-12057 no real support for NRT/TLOG/PULL replicas
> - SOLR-10679 no support for collection aliases
>
> These are in addition to other more architectural issues: CDCR can be
> a bottleneck on clusters with high ingestion rates, CDCR uses
> full-index-replication more than traditional indexing setups, which
> can cause issues with modern index sizes, etc.
>
> So, unfortunately, no real good news in terms of CDCR maturing much in
> recent releases.  Joel Bernstein filed a JIRA recently suggesting its
> removal entirely actually.  Though I don't think it's gone anywhere.
>
> That said, I gather from what you said that you're already using CDCR
> successfully with Master-Slave.  If none of these pitfalls are biting
> you in your current Master-Slave setup, you might not be bothered by
> them any more in SolrCloud.  Most of the problems with CDCR are
> applicable in master-slave as well as SolrCloud.  I wouldn't recommend
> CDCR if you were starting from scratch, and I still recommend you
> consider other options.  But since you're already using it with some
> success, it might be an orthogonal concern to your potential migration
> to SolrCloud.
>
> Best of luck deciding!
>
> Jason
>
> On Fri, May 22, 2020 at 7:06 PM gnandre  wrote:
> >
> > Thanks for this reply, Jason.
> >
> > I am mostly worried about CDCR feature. I am relying heavily on it.
> > Although, I am planning to use Solr 8.3. It has been a long time since CDCR
> > was first introduced. I wonder what the state of CDCR in 8.3 is. Is it
> > stable now?
> >
> > On Wed, Jan 22, 2020, 8:01 AM Jason Gerlowski 
> wrote:
> >
> > > Hi Arnold,
> > >
> > > The stability and complexity issues Mark highlighted in his post
> > > aren't just imagined - there are real, sometimes serious, bugs in
> > > SolrCloud features.  But at the same time there are many many stable
> > > deployments out there where SolrCloud is a real success story for
> > > users.  Small example, I work at a company (Lucidworks) where our main
> > > product (Fusion) is built heavily on top of SolrCloud and we see it
> > > deployed successfully every day.
> > >
> > > In no way am I trying to minimize Mark's concerns (or David's).  There
> > > are stability bugs.  But the extent to which those need affect you
> > > depends a lot on what your deployment looks like.  How many nodes?
> > > How many collections?  How tightly are you trying to squeeze your
> > > hardware?  Is your network flaky?  Are you looking to use any of
> > > SolrCloud's newer, less stable features like CDCR, etc.?
> > >
> > > Is SolrCloud better for you than Master/Slave?  It depends on what
> > > you're hoping to gain by a move to SolrCloud, and on your answers to
> > > some of the questions above.  I would be leery of following any
> > > recommendations that are made without regard for your reason for
> > > switching or your deployment details.  Those things are always the
> > > biggest driver in terms of success.
> > >
> > > Good luck making your decision!
> > >
> > > Best,
> > >
> > > Jason
> > >
>


Re: TimestampUpdateProcessorFactory updates the field even if the value is present

2020-05-27 Thread gnandre
Thanks for the detailed response, Chris. I am aware of the partial (atomic)
updates. Thanks for clarifying the confusion about input document vs
indexed document. I was thinking that TimestampUpdateProcessorFactory
checks if the value exists in the field inside the indexed document before
updating it, but it actually checks whether it is present inside the input
request. But then why do we require an explicit processor for that? This
can be done with a simple field in the schema that has a default value of NOW.

I tried your idea about MinFieldValueUpdateProcessorFactory but it does not
work. Here is the configuration:


<updateRequestProcessorChain>
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">index_time_stamp_create</str>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.MinFieldValueUpdateProcessorFactory">
    <str name="fieldName">index_time_stamp_create</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

I think MinFieldValueUpdateProcessorFactory keeps the min value in a
multivalued field, which index_time_stamp_create is not.

On Tue, May 26, 2020 at 2:31 PM Chris Hostetter 
wrote:

> : Subject: TimestampUpdateProcessorFactory updates the field even if the
> : value is present
> :
> : Hi,
> :
> : Following is the update request processor chain.
> :
> : <updateRequestProcessorChain>
> :   <processor class="solr.TimestampUpdateProcessorFactory">
> :     <str name="fieldName">index_time_stamp_create</str>
> :   </processor>
> :   <processor class="solr.LogUpdateProcessorFactory" />
> :   <processor class="solr.RunUpdateProcessorFactory" />
> : </updateRequestProcessorChain>
> :
> : And, here is how the field is defined in schema.xml
> :
> : <field name="index_time_stamp_create" type="pdate" indexed="true" stored="true" />
> :
> : Every time I index the same document, above field changes its value with
> : latest timestamp. According to TimestampUpdateProcessorFactory  javadoc
> : page, if a document does not contain a value in the timestamp field, a
> new
>
> based on the wording of your question, i suspect you are confused about
> the overall behavior of how "updating" an existing document works in solr,
> and how update processors "see" an *input document* when processing an
> add/update command.
>
>
> First off, completely ignoring TimestampUpdateProcessorFactory and
> assuming just the simplest possible update change, let's clarify how
> "updates" work. Let's assume that when you say you "index the same
> document" twice you do so with a few diff field values ...
>
> First Time...
>
> { id:"x", title:"xxx" }
>
> Second time...
>
> { id:"x", body:"xxx" }
>
> Solr does not implicitly know that you are trying to *update* that
> document, the final result will not be a document containing both a
> "title" field and "body" field in addition to the "id", it will *only*
> have the "id" and "body" fields and the title field will be lost.
>
> The way to "update" a document *and keep existing field values* is with
> one of the "Atomic Update" command options...
>
>
> https://lucene.apache.org/solr/guide/8_4/updating-parts-of-documents.html#UpdatingPartsofDocuments-AtomicUpdates
>
> { id:"x", title:"xxx" }
>
> Second time...
>
> { id:"x", body: { set: "xxx" } }
>
>
> Now, with that background info clarified: let's talk about update
> processors
>
>
> The docs for TimestampUpdateProcessorFactory are refering to how it
> modifies an *input* document that it recieves (as part of the processor
> chain). It adds the timestamp field if it's not already in the *input*
> document, it doesn't know anything about wether that document is already
> in the index, or if it has a value for that field in the index.
>
>
> When processors like TimestampUpdateProcessorFactory (or any other
> processor that modifies an *input* document) are run, they don't know if the
> document you are "indexing" already exists in the index or not.  Even if
> you are using the "atomic update" options to set/remove/add a field value,
> with the intent of preserving all other field values, the documents passed
> down the processor chain don't include those values until the "document
> merger" logic is run -- as part of the DistributedUpdateProcessor (which,
> if not explicit in your chain, runs immediately before the
> RunUpdateProcessorFactory).
>
> Off the top of my head I don't know if there is an "easy" way to have a
> timestamp added to "new" documents, but left "as is" for existing
> documents.
>
> Untested idea:
>
> Use an explicitly configured
> DistributedUpdateProcessorFactory, so that (in addition to putting
> TimestampUpdateProcessorFactory before it) you can
> also put MinFieldValueUpdateProcessorFactory on the timestamp field
> *after* DistributedUpdateProcessorFactory (but before
> RunUpdateProcessorFactory).
>
> I think that would work?
>
> Just putting TimestampUpdateProcessorFactory after the
> DistributedUpdateProcessorFactory would be dangerous, because it would
> introduce discrepancies -- each replica would wind up with its own
> locally computed timestamp.  Having the timestamp generated before the
> distributed update processor ensures the value is computed only once.
>
> -Hoss
> http://www.lucidworks.com/
>
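The overwrite-vs-atomic-update behavior Hoss describes above can be sketched with a small in-memory model. This is purely illustrative Python, not Solr's actual implementation; the merge logic here handles only the "set" modifier.

```python
# Hypothetical in-memory model of Solr's add vs. atomic-update behavior.
index = {}

def add(doc):
    """Plain add: the new document wholly replaces any existing
    document with the same id -- other fields are lost."""
    index[doc["id"]] = dict(doc)

def atomic_update(doc):
    """Atomic update: modifiers like {"set": ...} are merged into the
    stored document, preserving its other field values."""
    stored = dict(index.get(doc["id"], {}))
    for field, value in doc.items():
        if isinstance(value, dict) and "set" in value:
            stored[field] = value["set"]   # "set" modifier: replace this field only
        else:
            stored[field] = value
    index[doc["id"]] = stored

# Plain add twice: the second add drops the title.
add({"id": "x", "title": "aaa"})
add({"id": "x", "body": "xxx"})
print(index["x"])  # -> {'id': 'x', 'body': 'xxx'}

# Add, then atomic update: the title survives.
add({"id": "x", "title": "aaa"})
atomic_update({"id": "x", "body": {"set": "xxx"}})
print(index["x"])  # -> {'id': 'x', 'title': 'aaa', 'body': 'xxx'}
```

The key point for the timestamp question: the processor chain sees only the *input* document on the left-hand side of each call, never the merged result.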

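Expressed as a solrconfig.xml fragment, Hoss's untested idea would look roughly like this. The chain name and exact layout are assumptions reconstructed from his description, not a verified configuration:

```xml
<updateRequestProcessorChain name="timestamp-once">
  <!-- Runs before distribution, so the timestamp is computed exactly once -->
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">index_time_stamp_create</str>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <!-- After the document merger: if the stored document already had a
       timestamp, keep the smaller (i.e. original) value -->
  <processor class="solr.MinFieldValueUpdateProcessorFactory">
    <str name="fieldName">index_time_stamp_create</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

As Hoss says, this is an untested sketch; whether MinFieldValueUpdateProcessorFactory behaves as hoped after the merge would need to be verified.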

Re: SolrCloud upgrade concern

2020-05-22 Thread gnandre
Thanks for this reply, Jason.

I am mostly worried about the CDCR feature; I am relying heavily on it.
I am planning to use Solr 8.3. It has been a long time since CDCR
was first introduced. I wonder what the state of CDCR is in 8.3. Is it
stable now?

On Wed, Jan 22, 2020, 8:01 AM Jason Gerlowski  wrote:

> Hi Arnold,
>
> The stability and complexity issues Mark highlighted in his post
> aren't just imagined - there are real, sometimes serious, bugs in
> SolrCloud features.  But at the same time there are many many stable
> deployments out there where SolrCloud is a real success story for
> users.  Small example, I work at a company (Lucidworks) where our main
> product (Fusion) is built heavily on top of SolrCloud and we see it
> deployed successfully every day.
>
> In no way am I trying to minimize Mark's concerns (or David's).  There
> are stability bugs.  But the extent to which those need to affect you
> depends a lot on what your deployment looks like.  How many nodes?
> How many collections?  How tightly are you trying to squeeze your
> hardware?  Is your network flaky?  Are you looking to use any of
> SolrCloud's newer, less stable features like CDCR, etc.?
>
> Is SolrCloud better for you than Master/Slave?  It depends on what
> you're hoping to gain by a move to SolrCloud, and on your answers to
> some of the questions above.  I would be leery of following any
> recommendations that are made without regard for your reason for
> switching or your deployment details.  Those things are always the
> biggest driver in terms of success.
>
> Good luck making your decision!
>
> Best,
>
> Jason
>


Re: TimestampUpdateProcessorFactory updates the field even if the value is present

2020-05-21 Thread gnandre
Hi,

I do not pass that field at all.

Here is the document that I index again and again to test through Solr
Admin UI.
{
asset_id:"x:1",
title:"x"
}

On Thu, May 21, 2020 at 5:25 PM Furkan KAMACI 
wrote:

> Hi,
>
> How do you index that document? Do you index it with an empty
> *index_time_stamp_create* field as the second time too?
>
> Kind Regards,
> Furkan KAMACI
>
> On Fri, May 22, 2020 at 12:05 AM gnandre  wrote:
>
> > Hi,
> >
> > Following is the update request processor chain.
> >
> > <updateRequestProcessorChain>
> >   <processor class="solr.TimestampUpdateProcessorFactory">
> >     <str name="fieldName">index_time_stamp_create</str>
> >   </processor>
> >   <processor class="solr.LogUpdateProcessorFactory" />
> >   <processor class="solr.RunUpdateProcessorFactory" />
> > </updateRequestProcessorChain>
> >
> > And, here is how the field is defined in schema.xml
> >
> > <field name="index_time_stamp_create" type="date" stored="true" />
> >
> > Every time I index the same document, the above field changes its value to
> > the latest timestamp. According to the TimestampUpdateProcessorFactory
> > javadoc page, if a document does not contain a value in the timestamp
> > field, a new Date will be generated and added as the value of that field.
> > After the first indexing this document should always have a value, so why
> > does it get updated later?
> >
> > I am using Solr Admin UI's Documents tab to index the document for
> testing.
> > I am using Solr 6.3 in master-slave architecture mode.
> >
>
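The behavior asked about above follows from the fact that TimestampUpdateProcessorFactory inspects only the *input* document it receives; since the document pasted into the Admin UI never carries the timestamp field, a fresh value is generated on every submission. A rough sketch of that logic (illustrative Python, not the actual Solr source):

```python
from datetime import datetime, timezone

TIMESTAMP_FIELD = "index_time_stamp_create"

def timestamp_processor(input_doc):
    # Mimics TimestampUpdateProcessorFactory: it looks only at the *input*
    # document; it never consults the value already stored in the index.
    if TIMESTAMP_FIELD not in input_doc:
        input_doc[TIMESTAMP_FIELD] = datetime.now(timezone.utc).isoformat()
    return input_doc

# The document submitted from the Admin UI never includes the field,
# so each re-index generates a brand-new timestamp:
first = timestamp_processor({"asset_id": "x:1", "title": "x"})
second = timestamp_processor({"asset_id": "x:1", "title": "x"})

# Only a document that already carries the field in the *input* is left alone:
kept = timestamp_processor({"asset_id": "y:1",
                            TIMESTAMP_FIELD: "2020-01-01T00:00:00Z"})
```

This is why the stored value changes on every re-index even though the document "already has" a timestamp in the index.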


TimestampUpdateProcessorFactory updates the field even if the value is present

2020-05-21 Thread gnandre
Hi,

Following is the update request processor chain.

<updateRequestProcessorChain>
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">index_time_stamp_create</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

And, here is how the field is defined in schema.xml

<field name="index_time_stamp_create" type="date" stored="true" />

Every time I index the same document, the above field changes its value to
the latest timestamp. According to the TimestampUpdateProcessorFactory
javadoc page, if a document does not contain a value in the timestamp
field, a new Date will be generated and added as the value of that field.
After the first indexing this document should always have a value, so why
does it get updated later?

I am using Solr Admin UI's Documents tab to index the document for testing.
I am using Solr 6.3 in master-slave architecture mode.